feat: add support for dumping the database to stream #4165
Conversation
}

async fn write_bytes(w: &mut (impl AsyncWrite + Unpin), b: &[u8]) -> Result<()> {
    let bytes_len = format!("{}:", b.len());
These format calls are pretty expensive, definitely would try to avoid them.
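One possible way to avoid the allocation, as a sketch: format the length into a stack buffer before writing. This assumes tokio's AsyncWriteExt and the itoa crate, which is not a dependency of this PR.

use anyhow::Result;
use tokio::io::{AsyncWrite, AsyncWriteExt};

async fn write_bytes(w: &mut (impl AsyncWrite + Unpin), b: &[u8]) -> Result<()> {
    // itoa formats the integer into a small stack buffer, avoiding a String allocation.
    let mut len_buf = itoa::Buffer::new();
    w.write_all(len_buf.format(b.len()).as_bytes()).await?;
    w.write_all(b":").await?;
    w.write_all(b).await?;
    Ok(())
}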
My 208M database serializes to a 149M bencode dump, so even without compression, replacing the database with a dump is an improvement. The additional 50M is likely all the indices, which can be recreated from the data.
pub(crate) async fn get_connection(&self) -> Result<PooledConnection> {
    let lock = self.pool.read().await;
    let pool = lock.as_ref().context("no SQL connection")?;
    pool.get().await
}
I'm a bit worried that people will use this function for writing without calling write_lock(). For now, I couldn't come up with a better idea to make this safer than a doc comment - maybe a doc comment is fine / the best we can do.
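Such a doc comment could look roughly like this (a sketch; the wording is mine, not from the PR):

/// Returns a connection from the pool.
///
/// Callers that intend to write to the database must acquire the write lock
/// via `write_lock()` first and hold the guard for the duration of the write.
pub(crate) async fn get_connection(&self) -> Result<PooledConnection> {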
pub async fn deserialize(&self, r: impl AsyncRead + Unpin) -> Result<()> {
    let mut conn = self.get_connection().await?;

    // Start a write transaction to take a database snapshot.
    let transaction = conn.transaction()?;

    let decoder = Decoder::new(r);
    decoder.deserialize(transaction).await?;

    Ok(())
}
This probably needs to call self.write_lock().
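One way that could look, as a sketch (assuming write_lock() returns a guard that can simply be held across the whole deserialization; not necessarily the final shape of the PR):

pub async fn deserialize(&self, r: impl AsyncRead + Unpin) -> Result<()> {
    // Hold the write lock for the duration of the import.
    let _guard = self.write_lock().await;
    let mut conn = self.get_connection().await?;

    let transaction = conn.transaction()?;
    let decoder = Decoder::new(r);
    decoder.deserialize(transaction).await?;

    Ok(())
}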
async fn deserialize(mut self, mut tx: Transaction<'_>) -> Result<()> {
    self.expect_dictionary().await?;

    self.expect_key("_config").await?;
Since you're holding the Transaction over an await point, the future returned by the function will not be Send. JsonRPC requires all futures to be Send, so this:
diff --git a/deltachat-jsonrpc/src/api/mod.rs b/deltachat-jsonrpc/src/api/mod.rs
index b71718299..88f19e504 100644
--- a/deltachat-jsonrpc/src/api/mod.rs
+++ b/deltachat-jsonrpc/src/api/mod.rs
@@ -146,6 +146,16 @@ async fn inner_get_backup_qr(&self, account_id: u32) -> Result<Qr> {
 #[rpc(all_positional, ts_outdir = "typescript/generated")]
 impl CommandApi {
+    async fn t(&self) {
+        let mut w = Vec::new();
+        self.get_context(0)
+            .await
+            .unwrap()
+            .sql
+            .serialize(&mut w)
+            .await;
+    }
+
     /// Test function.
     async fn sleep(&self, delay: f64) {
         tokio::time::sleep(std::time::Duration::from_secs_f64(delay)).await
diff --git a/src/context.rs b/src/context.rs
index 627e63c78..b2bf48129 100644
--- a/src/context.rs
+++ b/src/context.rs
@@ -188,7 +188,7 @@ fn deref(&self) -> &Self::Target {
 pub struct InnerContext {
     /// Blob directory path
     pub(crate) blobdir: PathBuf,
-    pub(crate) sql: Sql,
+    pub sql: Sql,
     pub(crate) smeared_timestamp: SmearedTimestamp,
     /// The global "ongoing" process state.
     ///
will give you a very long and unreadable error.
https://docs.rs/tokio/latest/tokio/task/struct.LocalSet.html#use-with-run_until looks like it might make it possible to run a non-Send function from JsonRPC, but I didn't research further for now. Maybe you know more about this, anyway.
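For reference, the documented shape of run_until is roughly this (a sketch; whether it can actually be wired into the JsonRPC dispatch loop is exactly the open question):

use tokio::task::LocalSet;

async fn run_non_send_work() {
    let local = LocalSet::new();
    local
        .run_until(async {
            // Futures awaited here do not need to be Send, e.g. a future
            // that holds a rusqlite Transaction across an await point.
        })
        .await;
}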
/// If this version is below the actual database version,
/// serialization code is outdated.
/// If this version is above the actual database version,
/// migrations have to be run first to update the database.
const SERIALIZE_DBVERSION: &str = "99";
So, we'll have to update serializing and deserializing every time we change the db schema, and (de)serializing only works between the same core version. We should make sure that everyone is OKish with this trade-off before merging.
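For illustration, the guard on deserialization could look roughly like this (check_dump_version is a hypothetical name, not the PR's actual code; it only shows how SERIALIZE_DBVERSION could be enforced):

// Hypothetical helper: refuse dumps created for a different schema version.
fn check_dump_version(dumped_dbversion: &str) -> anyhow::Result<()> {
    anyhow::ensure!(
        dumped_dbversion == SERIALIZE_DBVERSION,
        "dump was created with dbversion {}, this core (de)serializes dbversion {}",
        dumped_dbversion,
        SERIALIZE_DBVERSION
    );
    Ok(())
}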
Serialization is ready: dump <filename> in the REPL produces a dump in bencode format. Deserialization is ready too: read <filename> in the REPL reads the dump back.

The serializer holds a read transaction, so it should be possible to run serialization twice and get the same result, e.g. once for hashing and once for the actual streaming, without having to store the dump anywhere.

The blobs are currently not dumped into the same stream. Dumping the blobs into the stream would have the advantage that the dump does not contain $BLOBDIR filenames at all, so there are no assumptions about whether the blobs are stored inside the database or in the filesystem. We can rethink this if we decide to store the blobs in the database.

Note that in bencode, dictionary keys are stored in lexicographical order. This makes it very easy to parse structures stored as dictionaries, because you either know the next key if the key is required, or only need one token of lookahead (a peek() operation on the token stream) if the key is optional.
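For illustration, a bencoded dictionary with its keys in lexicographical order looks like this (the keys are made up for the example, not the actual dump layout):

d4:addr17:alice@example.org9:dbversion2:99e

A reader that requires addr and treats dbversion as optional needs only a single peek() at the next key token: because keys are sorted, anything other than dbversion in that position means the optional key is absent.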