Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support encrypted databases #484

Open
JeremyPasco opened this issue Dec 8, 2023 · 12 comments
Open

Support encrypted databases #484

JeremyPasco opened this issue Dec 8, 2023 · 12 comments

Comments

@JeremyPasco
Copy link

JeremyPasco commented Dec 8, 2023

Hi, I'm looking for a way to read encrypted sqlite databases.
There are many different ciphers allowing encryption of sqlite databases. One project support multiple ciphers: https://github.com/utelle/SQLite3MultipleCiphers

I am not a C programmer but as far I understand, SQLite3MultipleCiphers is a fork of sqlite and could be integrated in RSQLite instead of the original sqlite source code.

I guess this would allow users to decrypt database through simple dsn specification.
For example, accessing a sqlite dabase encrypted with sqlcipher is as simple as appeding the following to the dsn: ?_cipher=sqlcipher&_key=<passphrase>.

Edit: DNS is a exposed by some libraries in other langages, but after investigation they all translate internally DNS variables to PRAGMA statements or sqlite functions. MultipleCiphers doesn't handle translation by itself.

I would perfectly understand that dropping sqlite for one of its forks may not be relevant here but maybe providing documentation on how to get it done may be usefull to some devs?

I would gladly investigate the topic if you find it relevant. Could you just just provide me clues on how to compile this package against an sqlite fork like SQLite3MultipleCiphers?

@JeremyPasco
Copy link
Author

I found out sqlite3 has been integrated through its amalgation source code. Multicipher also provides an amalgation source. I will make a fork with it and provide feedback here.

@JeremyPasco
Copy link
Author

JeremyPasco commented Dec 10, 2023

Ok, I have been able to compile with multicipher but having hard time on some parts:

First, I cannot make it compile through a simple devtools::install_github.

Working steps under Ubuntu 23.04 to make it compile:

  1. git clone this repo
  2. build source package without modification
  3. replace src/vendor/sqlite3/sqlite3.h and src/vendor/sqlite3/sqlite3.c with their mulcticipher equivalent (from https://github.com/utelle/SQLite3MultipleCiphers/releases/download/v1.8.1/sqlite3mc-1.8.1-sqlite-3.44.2-amalgamation.zip)
  4. Rename files accordingly:
  • sqlite3mc_amalgamation.c -> sqlite3.c
  • sqlite3mc_amalgamation.h -> sqlite3.h
  1. Build one more time -> no error

Step 5) fails without execution of step 2) previously.

Then, I don't know how to trigger encryption:

library(DBI)

# does nothing, db stays in plaintext format
con <- dbConnect(RSQLite::SQLite(), "test2.db")
dbExecute(con, "PRAGMA cipher = 'sqlcipher'")
dbExecute(con, "PRAGMA key = 'passphrase'")
dbWriteTable(con, "mtcars", mtcars)
dbDisconnect(con)

I suspect some sql statements are sent by RSQLite before PRAGMA key, which is not allowed based on documentation:
"This means that the key (and any options) must be set before the first operation on the database"

@krlmlr
Copy link
Member

krlmlr commented Dec 31, 2023

Thank you for your patience.

Perhaps we need dbConnect(RSQLite(), default.extensions = FALSE, synchronous = NULL) ?

RSQLite:::dbConnect_SQLiteDriver
#> function(drv, dbname = "", ..., loadable.extensions = TRUE,
#>                                    default.extensions = loadable.extensions, cache_size = NULL,
#>                                    synchronous = "off", flags = SQLITE_RWC, vfs = NULL,
#>                                    bigint = c("integer64", "integer", "numeric", "character"),
#>                                    extended_types = FALSE) {
#>   stopifnot(length(dbname) == 1, !is.na(dbname))
#> 
#>   if (!is_url_or_special_filename(dbname)) {
#>     dbname <- normalizePath(dbname, mustWork = FALSE)
#>   }
#> 
#>   dbname <- enc2utf8(dbname)
#> 
#>   vfs <- check_vfs(vfs)
#>   stopifnot(is.integer(flags), length(flags) == 1)
#> 
#>   bigint <- match.arg(bigint)
#> 
#>   extended_types <- isTRUE(extended_types)
#>   if (extended_types) {
#>     check_suggested("hms", "dbConnect")
#>   }
#>   conn <- new("SQLiteConnection",
#>     ptr = connection_connect(dbname, loadable.extensions, flags, vfs, extended_types),
#>     dbname = dbname,
#>     flags = flags,
#>     vfs = vfs,
#>     loadable.extensions = loadable.extensions,
#>     ref = new.env(parent = emptyenv()),
#>     bigint = bigint,
#>     extended_types = extended_types
#>   )
#> 
#>   ## experimental PRAGMAs
#>   if (!is.null(cache_size)) {
#>     cache_size <- as.integer(cache_size)
#>     tryCatch(
#>       dbExecute(conn, sprintf("PRAGMA cache_size=%d", cache_size)),
#>       error = function(e) {
#>         warning("Couldn't set cache size: ", conditionMessage(e), "\n",
#>           "Use `cache_size` = NULL to turn off this warning.",
#>           call. = FALSE
#>         )
#>       }
#>     )
#>   }
#> 
#>   if (!is.null(synchronous)) {
#>     synchronous <- match.arg(synchronous, c("off", "normal", "full"))
#>     tryCatch(
#>       dbExecute(conn, sprintf("PRAGMA synchronous=%s", synchronous)),
#>       error = function(e) {
#>         warning("Couldn't set synchronous mode: ", conditionMessage(e), "\n",
#>           "Use `synchronous` = NULL to turn off this warning.",
#>           call. = FALSE
#>         )
#>       }
#>     )
#>   }
#> 
#>   if (default.extensions) {
#>     initExtension(conn)
#>   }
#> 
#>   reg.finalizer(
#>     conn@ptr,
#>     function(x) {
#>       if (dbIsValid(conn)) {
#>         warning_once("call dbDisconnect() when finished working with a connection")
#>       }
#>     }
#>   )
#> 
#>   conn
#> }
#> <environment: namespace:RSQLite>

Created on 2023-12-31 with reprex v2.0.2

@JeremyPasco
Copy link
Author

Thank you, I just tried but got no success. I made sure to retrieve the connection just after conn <- new("SQLiteConnection", ...) with parameters you provided, but the following PRAGMA had still no effect (db is still in plain text).
I don't know exactly what's happening behind a PRAGMA statement. Is it possible some R <--> C binding have to be developped specifically to handle encryption?

@krlmlr
Copy link
Member

krlmlr commented Jan 2, 2024

Would you like to point me to your implementation so that I can take a look?

@JeremyPasco
Copy link
Author

I just restarted my implementation and discovered that compilation step never finished properly. Now that it's fixed, I am able to encrypt databases!
Here is my fork: https://github.com/JeremyPasco/RSQLite

And here is a reprex to use it:

library(DBI)

# Create an encrypted database
con <- dbConnect(RSQLite::SQLite(), "test.db", synchronous = NULL)
dbExecute(con, "PRAGMA cipher = 'sqlcipher'")
#> [1] 0
dbExecute(con, "PRAGMA key = 'passphrase'")
#> [1] 0
dbWriteTable(con, "mtcars", mtcars)
dbDisconnect(con)

# Reopen it
con <- dbConnect(RSQLite::SQLite(), "test.db", synchronous = NULL)

# Fails without passphrase
dbListTables(con)
#> Error: file is not a database

# Success after passphrase
dbExecute(con, "PRAGMA cipher = 'sqlcipher'")
#> [1] 0
dbExecute(con, "PRAGMA key = 'passphrase'")
#> [1] 0
dbListTables(con)
#> [1] "mtcars"

dbDisconnect(con)

default.extensions = FALSE isn't necessary but synchronous = NULL is.

This fullfill my use case. But if you plan to integrate this feature into the package, it could benefit from certain modifications, like URI parameters handling for encryption instead of manual PRAGMA. Which could also solve the current synchronous issue IMO.

Let me know if you want me to contribute further.

@krlmlr
Copy link
Member

krlmlr commented Jan 3, 2024

Thanks, nice! What would a change look like that leaves sqlite.c and sqlite.h untouched, or changes these files only minimally? Happy to host it here as long as it's easy to maintain and understand.

@JeremyPasco
Copy link
Author

Not sure that's possible.
These 2 files were provided as is by Sqlite3MultipleCiphers.

As you can see with this commit, sqlite.h modifications consists only in 2 additions. I'm not a C dev but I guess that part could be put in a dedicated file.
But modifications of sqlite.c are more complex.

Sqlite3MultiCiphers aims to simplify the integration process by providing an amalgation source as a drop-in replacement of each new SQLite release as you can see here.

So updating Sqlite3MultipleCiphers would only require:

  1. Download latest sqlite3mc-x.x.x-sqlite-x.x.x-amalgamation.zip
  2. Extract sqlite3mc_amalgamation.c and sqlite3mc_amalgamation.h into srv/vendor/sqlite3/
  3. Rename them sqlite3.c and sqlite3.h

@krlmlr
Copy link
Member

krlmlr commented Jan 3, 2024

Thanks, that's too bad. If you want to run this from your fork for now, and perhaps deploy binaries to r-universe, we can add a link from this project to your fork.

@JeremyPasco
Copy link
Author

Ok I understand.
Just an other idea to keep it simple as possible and up-to-date with RSQLite updates: would it be possible to put the 3 needed steps into an optional script in this package and trigger it through something like install.packages("RSQLite", configure.args=c("--sqlite3MultipleCiphers"))?

@krlmlr
Copy link
Member

krlmlr commented Jan 3, 2024

I'd rather keep the two projects separate for now. Happy to reconsider at some point.

@JeremyPasco
Copy link
Author

Thank you for your time. I will make some improvements and write some docs on my fork then and I'll post update here when it is ready if you'd like to publish a link to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants