
Compress mime_headers column with HTML emails stored in database (#4077) #4129

Merged 1 commit into master on Mar 31, 2023

Conversation

iequidoo
Collaborator

@iequidoo iequidoo commented Mar 4, 2023

For me it compresses HTML emails more than twice (e.g. 10468 -> 4988 bytes), but for some reason in my db they take 0% of total disk usage, so, @link2xt, better measure how much it helps for you. Currently I use the maximum compression level (11); it's too slow and takes much time when upgrading a db, so for your db you probably need to reduce it, maybe even to 6.

UPD: Forgot to run VACUUM on my db. Remeasured -- it shrinks my db by a factor of 1.797.
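For reference, here is a minimal sketch of what brotli helpers of this shape could look like, using the `brotli` crate with quality 11 as mentioned above. The names `buf_compress_sketch`/`buf_decompress_sketch` are illustrative; the actual helpers in `src/tools.rs` may differ.

```rust
use std::io::{Read, Write};

use anyhow::Result;

/// Sketch of a brotli compression helper (illustrative; see src/tools.rs for the real one).
fn buf_compress_sketch(buf: &[u8]) -> Result<Vec<u8>> {
    let mut compressed = Vec::new();
    {
        // Quality 11 is the maximum; lgwin 22 is a common window size.
        let mut writer = brotli::CompressorWriter::new(&mut compressed, 4096, 11, 22);
        writer.write_all(buf)?;
    } // dropping the writer flushes the brotli stream
    Ok(compressed)
}

/// Sketch of the matching decompression helper.
fn buf_decompress_sketch(buf: &[u8]) -> Result<Vec<u8>> {
    let mut decompressed = Vec::new();
    brotli::Decompressor::new(buf, 4096).read_to_end(&mut decompressed)?;
    Ok(decompressed)
}
```

Lowering the quality parameter (e.g. to 6) trades compression ratio for speed, which is the knob discussed above.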

CHANGELOG.md (outdated):
@@ -8,6 +8,7 @@
- Run `cargo-deny` in CI. #4101
- Check provider database with CI. #4099
- Switch to DEFERRED transactions #4100
- Compress `mime_headers` column with HTML emails stored in database
Collaborator Author

@iequidoo iequidoo Mar 4, 2023

Not adding a PR number; it can easily be found since the entry has the same name as the PR. Maybe we can add a CI check that a changelog entry is a substring of the PR name?
UPD: But if the PR is renamed, a recheck would be needed.

Collaborator

@link2xt link2xt left a comment

Trying to compress the whole database during the migration is not only slow, but also looks dangerous. What if we run out of space/memory or otherwise repeatedly fail to compress the whole database?

One option would be to introduce another state for the mime_modified column, so 0 means there is no raw MIME structure, 1 means there is an uncompressed mail and 2 means there is a brotli-compressed blob stored. Then it is possible to gradually compress old messages during housekeeping: try to select up to 50 uncompressed messages with mime_modified=1 and turn them into mime_modified=2 messages.
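A rough sketch of what such a housekeeping step could look like (illustrative only: it uses rusqlite directly instead of the project's Sql wrapper, and buf_compress stands for the brotli helper):

```rust
use anyhow::Result;
use rusqlite::Connection;

/// Illustrative housekeeping step for the proposal above: compress up to 50
/// messages that still carry uncompressed MIME (mime_modified=1) and mark
/// them as brotli-compressed (mime_modified=2).
fn compress_batch(
    conn: &Connection,
    buf_compress: impl Fn(&[u8]) -> Result<Vec<u8>>,
) -> Result<usize> {
    let mut stmt =
        conn.prepare("SELECT id, mime_headers FROM msgs WHERE mime_modified=1 LIMIT 50")?;
    let rows: Vec<(i64, Vec<u8>)> = stmt
        .query_map([], |row| Ok((row.get(0)?, row.get(1)?)))?
        .collect::<rusqlite::Result<_>>()?;

    for (id, mime_headers) in &rows {
        let compressed = buf_compress(mime_headers.as_slice())?;
        conn.execute(
            "UPDATE msgs SET mime_headers=?, mime_modified=2 WHERE id=?",
            rusqlite::params![compressed, id],
        )?;
    }
    Ok(rows.len())
}
```

Returning the number of processed rows lets the caller decide whether another batch is needed on the next housekeeping run.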

src/sql/migrations.rs: 3 outdated review threads (resolved)
@iequidoo
Collaborator Author

iequidoo commented Mar 4, 2023

Trying to compress the whole database during the migration is not only slow, but also looks dangerous. What if we run out of space/memory or otherwise repeatedly fail to compress the whole database?

One option would be to introduce another state for the mime_modified column, so 0 means there is no raw MIME structure, 1 means there is an uncompressed mail and 2 means there is a brotli-compressed blob stored. Then it is possible to gradually compress old messages during housekeeping: try to select up to 50 uncompressed messages with mime_modified=1 and turn them into mime_modified=2 messages.

I still think it's ok to compress the whole db in the migration:

  • It's unlikely to fail because of low disk space; apparently it needs at most as much space as the current db size. As for memory, I think SQLite also doesn't eat much here (I even think it doesn't depend on the number of rows processed in a query, it just produces dirty fs pages that go to disk on memory pressure), but I will measure.
  • It's a single transaction. If it fails, we can ask the user to provide a bit more disk space and restart the app. I don't think the DC db takes more than 1% of disk space for most users, and it's usually not a good idea to have less than 1% of free disk space anyway :)

The only thing we must take care of is the upgrade time. If it takes too long, we should go the way you suggested.

@link2xt
Collaborator

link2xt commented Mar 4, 2023

It's unlikely to fail because of low disk space; apparently it needs at most as much space as the current db size.

It definitely could have happened in my case: 500M of HTML messages and close to zero disk space. The common way to run out of disk space is to take some high-resolution photos, so when you do run out there is usually less than 100M left.

If it fails, we can ask the user to provide a bit more disk space and restart the app.

We don't even have an existing UI for this. The problem with migrations is that they happen not as a result of an explicit user action, but after an (auto-)upgrade. The user could have been messaging in some ongoing conversation, then the app restarts and demands that instead of chatting the user free up some disk space and wait until the upgrade finishes.

@iequidoo
Collaborator Author

iequidoo commented Mar 4, 2023

It definitely could have happened in my case: 500M of HTML messages and close to zero disk space. The common way to run out of disk space is to take some high-resolution photos, so when you do run out there is usually less than 100M left.

Of course, it could happen, but for users that accidentally run out of disk space the probability of this is << 1 if we assume that their db is << total disk space. But for users constantly sitting at low disk space it can be as high as 100%, yes (though I don't know what the idea of doing that is :)).

Btw, as for memory -- for me DC eats 35-55M of RAM during the migration. It would be interesting to see how much it eats on your large db. But ok, you convinced me, let's do the format change in the background.

@iequidoo iequidoo force-pushed the iequidoo/compress-mime-headers branch 2 times, most recently from 0892cff to f60b2b0 on March 9, 2023 03:11
@iequidoo iequidoo marked this pull request as ready for review March 9, 2023 03:39
src/sql/migrations.rs: outdated review thread (resolved)
@iequidoo
Collaborator Author

iequidoo commented Mar 9, 2023

@link2xt did you see https://docs.rs/sqlite-zstd/latest/sqlite_zstd/ ? It does row-level db compression.

@iequidoo iequidoo requested a review from r10s March 9, 2023 15:58
@link2xt
Collaborator

link2xt commented Mar 9, 2023

@link2xt did you see docs.rs/sqlite-zstd/latest/sqlite_zstd ? It does row-level db compression.

No. Looks interesting, but I am not sure it is compatible with our SQLCipher library or whether it is going to be maintained in the future, while brotli is definitely not going anywhere; web browsers use it.

@iequidoo iequidoo force-pushed the iequidoo/compress-mime-headers branch from f60b2b0 to c3dc8aa on March 9, 2023 22:41
Collaborator

@link2xt link2xt left a comment

This mime_compressed column that may or may not exist is too complicated IMO.
Now there is a need to construct SQL statements dynamically, consider the case when the column may or may not exist, etc.

Wouldn't it be simpler to:

  1. Use mime_modified=0 to indicate that there is no saved MIME, mime_modified=1 that there is an uncompressed MIME and mime_modified=2 to indicate that there is a brotli-compressed MIME.
  2. Store all new MIME messages in mime_modified=2 format, stop writing mime_modified=1.
  3. During housekeeping, look for mime_modified=1 messages and recompress them into the mime_modified=2 format, 50 messages at a time.

No need for new columns or the MsgsMimeUncompressed config value.

If we eventually want to get rid of the old mime_modified=1 format and the conversion code, a migration can be introduced that simply sets mime_modified=0 and removes the MIME message. If that happens in >1 year, the worst case is that something was not converted by then and its HTML part will simply be removed, not a big deal.

CHANGELOG.md: outdated review thread (resolved)
src/tools.rs: 2 outdated review threads (resolved)
@iequidoo
Collaborator Author

iequidoo commented Mar 10, 2023

This mime_compressed column that may or may not exist is too complicated IMO. Now there is a need to construct SQL statements dynamically, consider the case when the column may or may not exist, etc.

Wouldn't it be simpler to:

1. Use `mime_modified=0` to indicate that there is no saved MIME, `mime_modified=1` that there is an uncompressed MIME and `mime_modified=2` to indicate that there is a brotli-compressed MIME.

2. Store all new MIME messages in `mime_modified=2` format, stop writing `mime_modified=1`.

3. During housekeeping, look for `mime_modified=1` messages and recompress them into the `mime_modified=2` format, 50 messages at a time.

No need for new columns or the MsgsMimeUncompressed config value.

If we eventually want to get rid of the old mime_modified=1 format and the conversion code, a migration can be introduced that simply sets mime_modified=0 and removes the MIME message. If that happens in >1 year, the worst case is that something was not converted by then and its HTML part will simply be removed, not a big deal.

There's some complex logic around mime_modified in receive_imf.rs (at least I see that mime headers may also be saved if save_mime_headers is set), so I'd avoid mixing it with compression. mime_compressed by itself doesn't add complexity; we can just assume that this column always exists and get rid of the dynamic SQL statements and the MsgsMimeUncompressed config value. And then after a year, as you suggested, we may just drop the mime_compressed column and all still-uncompressed HTML parts.

Also, we can rename mime_compressed to compressed just in case we decide to compress other columns later. Then 1 would mean that mime_headers is compressed, 2 that mime_in_reply_to is also compressed, and so on.

As for 3., it's already done in housekeeping, 50 messages at a time, but in a loop until all messages are compressed. AFAIU housekeeping doesn't block the user from using DC? The only concern is that if there are tons of messages, the battery will be drained quickly.

@iequidoo iequidoo force-pushed the iequidoo/compress-mime-headers branch from c3dc8aa to 559b79c on March 10, 2023 13:43
@iequidoo iequidoo requested a review from link2xt March 10, 2023 13:43
@iequidoo iequidoo force-pushed the iequidoo/compress-mime-headers branch from 559b79c to 1b3fd0e on March 10, 2023 14:03
src/sql/migrations.rs: outdated review thread (resolved)
src/tools.rs: 2 review threads (resolved)
src/message.rs: outdated review thread (resolved)
src/receive_imf.rs: outdated review thread (resolved)
src/sql.rs (outdated):
})? {
have_uncompressed = true;
let (id, mime_headers) = row?;
let mime_headers = buf_compress(&mime_headers)?;
Collaborator

Just a comment, this is probably fine, because we cannot do this outside the transaction. And this already happens inside block_in_place.

Collaborator

On second thought, maybe select the uncompressed data, compress it in spawn_blocking and then open a transaction only when we are going to write? Just check when writing that the chat ID is not trash, i.e. the message was not deleted while we were compressing it.
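A small sketch of that flow (hypothetical names; tokio's spawn_blocking plus a rusqlite-style UPDATE; the trash chat id is passed in and the mime_compressed=1 marker are assumptions, not the PR's actual code):

```rust
use anyhow::Result;
use rusqlite::{params, Connection};

/// Sketch of the suggested flow: compress outside the transaction, then
/// UPDATE with a guard so that a message deleted in the meantime becomes a no-op.
async fn compress_and_store(
    conn: &Connection,
    msg_id: i64,
    mime_headers: Vec<u8>,
    trash_chat_id: i64, // assumption: id of the trash chat
    buf_compress: fn(&[u8]) -> Result<Vec<u8>>,
) -> Result<()> {
    // CPU-heavy compression runs off the async executor threads.
    let compressed =
        tokio::task::spawn_blocking(move || buf_compress(&mime_headers)).await??;

    // Write only if the message still exists and was not moved to trash.
    conn.execute(
        "UPDATE msgs SET mime_headers=?, mime_compressed=1 WHERE id=? AND chat_id!=?",
        params![compressed, msg_id, trash_chat_id],
    )?;
    Ok(())
}
```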

Collaborator Author

So, the problem is that we're blocking other write transactions for too long (blocking a db connection isn't a problem, we can just allocate one extra connection if needed). Maybe it's ok to process one message per transaction then? Otherwise it becomes complicated: select 50 messages, then compress them, then start a transaction to store them and also make sure there are no races (a message wasn't deleted and so on).

Collaborator Author

@iequidoo iequidoo Mar 10, 2023

I think it's better to introduce a semaphore with 2 permits around Sql::write_mtx. One permit is consumed by Sql::write_lock() and the second can be checked using available_permits() inside long-running transactions so that they can stop early if contention with other write transactions is detected.
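A toy model of that idea using tokio primitives (the names WriteGate/other_writer_waiting are made up; the real Sql struct is more involved):

```rust
use tokio::sync::{Mutex, MutexGuard, Semaphore, SemaphorePermit};

/// Toy model: a 2-permit semaphore next to the write mutex. A writer takes
/// one permit plus the mutex; a long-running transaction can look at
/// available_permits() to see whether another writer is queued.
struct WriteGate {
    write_sem: Semaphore,
    write_mtx: Mutex<()>,
}

impl WriteGate {
    fn new() -> Self {
        Self {
            write_sem: Semaphore::new(2),
            write_mtx: Mutex::new(()),
        }
    }

    async fn write_lock(&self) -> (SemaphorePermit<'_>, MutexGuard<'_, ()>) {
        // Acquire the semaphore first so that a queued writer is visible to
        // a long-running transaction even while that transaction holds the mutex.
        let permit = self.write_sem.acquire().await.expect("semaphore closed");
        let guard = self.write_mtx.lock().await;
        (permit, guard)
    }

    /// Called from inside a long-running transaction: if no permit is left,
    /// someone else is waiting to write, so it is time to commit and yield.
    fn other_writer_waiting(&self) -> bool {
        self.write_sem.available_permits() == 0
    }
}
```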

Collaborator

Otherwise it becomes complicated: select 50 messages, then compress them, then start a transaction to store them and also make sure there are no races (a message wasn't deleted and so on)

If the message was deleted, we just don't store it: UPDATE ... WHERE chat_id != trash AND id = ... or something like this; worst case the message is deleted and the UPDATE becomes a no-op.

Collaborator

A semaphore and cancelling long-running transactions definitely sounds complicated; it's easier not to have long-running transactions with computations like compression and encryption inside.

Collaborator Author

@iequidoo iequidoo Mar 11, 2023

You can check it out now, it really looks simpler than splitting the transaction. The only downside is that it may be cancelled early and update just one row, but I don't think that will happen frequently.
UPD: This mechanism can also be used in other long-running transactions if they appear in the future.
UPD: Maybe we need to grow the connection pool size to 4 then.

Collaborator Author

Increased the connection pool size to 4 and added a comment on this

@iequidoo
Collaborator Author

@link2xt, btw, maybe we should somehow check the available battery in housekeeping? And if it's < 80%, for example, skip compression. There's https://docs.rs/battery/latest/battery/ for instance.

@link2xt
Collaborator

link2xt commented Mar 10, 2023

@link2xt, btw, maybe we should somehow check the available battery in housekeeping? And if it's < 80%, for example, skip compression. There's docs.rs/battery/latest/battery for instance.

This does not support Android and iOS. If we go for this, it is easier to implement these platform-specific checks in the UI and tell the core when it can do expensive operations or should rather postpone them. But it's even easier not to do it at all; I don't think this compression is that expensive.

@iequidoo
Collaborator Author

This does not support Android and iOS. If we go for this, it is easier to implement these platform-specific checks in the UI and tell the core when it can do expensive operations or should rather postpone them. But it's even easier not to do it at all; I don't think this compression is that expensive.

But it takes significant time even for me. I think that on your 747M db it can take 15-20 minutes, and if it happens when the battery is almost exhausted, that's not good.

@link2xt
Collaborator

link2xt commented Mar 10, 2023

But it takes significant time even for me. I think that on your 747M db it can take 15-20 minutes, and if it happens when the battery is almost exhausted, that's not good.

Then maybe limit it to compressing 100 messages during one housekeeping, so it compresses 100 messages a day. It may take a while but it will catch up eventually because new messages are also compressed. And we can still speed it up in later updates if needed.

@iequidoo iequidoo force-pushed the iequidoo/compress-mime-headers branch from 1b3fd0e to 37fb59f on March 11, 2023 01:49
src/sql.rs (outdated):
// before:
pub async fn write_lock(&self) -> MutexGuard<'_, ()> {
    self.write_mtx.lock().await
// after:
pub async fn write_lock(&self) -> Result<(SemaphorePermit<'_>, MutexGuard<'_, ()>)> {
    Ok((self.write_sem.acquire().await?, self.write_mtx.lock().await))
Collaborator

Maybe split this into multiple lines, so it is obvious that the semaphore is acquired first and the mutex second.

Because if we first try to get the mutex and block there, we don't take the semaphore even when it is free, and therefore don't signal that we are waiting for the mutex.

Collaborator Author

Done. But it's quite obvious for people like me who came from C -- Rust tuples are evaluated left to right, like the C comma operator, which looks very similar syntactically :)

@link2xt
Collaborator

link2xt commented Mar 11, 2023

LGTM overall.

I wonder if the semaphore is an overkill here and a second "queue mutex" would be sufficient.

Entering the transaction:

  1. Acquire queue mutex.
  2. Acquire transaction mutex.
  3. Open the transaction.

Closing the transaction:

  1. Close the transaction.
  2. Release queue mutex.
  3. Release transaction mutex.

Checking if someone is waiting:

  1. Release the queue mutex.
  2. Try to get queue mutex back.
  3. If successfully acquired the queue mutex, continue.
  4. If fail to get the queue mutex, close the transaction and release the transaction mutex.

The queue mutex guard may be moved into the transaction closure, so it is available to the transaction code for releasing anytime and is automatically dropped at the end of transaction.
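For illustration, a rough sketch of that queue-mutex protocol with tokio mutexes (hypothetical types; this is not what the PR ended up using, and it relies on the mutex being handed over to a queued waiter on release so that try_lock() fails when someone was waiting):

```rust
use tokio::sync::Mutex;

/// Hypothetical two-mutex gate for the scheme described above.
struct TwoMutexGate {
    queue_mtx: Mutex<()>,
    txn_mtx: Mutex<()>,
}

impl TwoMutexGate {
    async fn run_long_transaction(&self, work: Vec<Vec<u8>>) {
        let mut queue_guard = self.queue_mtx.lock().await; // 1. acquire queue mutex
        let _txn_guard = self.txn_mtx.lock().await; // 2. acquire transaction mutex
        // 3. open the SQL transaction here (elided)

        for item in work {
            let _ = item; // ... process one item inside the transaction ...

            // Checking if someone is waiting:
            drop(queue_guard); // 1. release the queue mutex
            match self.queue_mtx.try_lock() {
                // 2./3. got it back: nobody was waiting, keep going
                Ok(guard) => queue_guard = guard,
                // 4. somebody grabbed it: commit/close early and return,
                //    releasing the transaction mutex on drop
                Err(_) => return,
            }
        }
        // commit here; both guards are dropped when they go out of scope
    }
}
```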

@iequidoo iequidoo force-pushed the iequidoo/compress-mime-headers branch from 37fb59f to b133d3e on March 11, 2023 13:53
@iequidoo
Collaborator Author

LGTM overall.

I wonder if the semaphore is an overkill here and a second "queue mutex" would be sufficient.

Entering the transaction:

1. Acquire queue mutex.

2. Acquire transaction mutex.

3. Open the transaction.

Closing the transaction:

1. Close the transaction.

2. Release queue mutex.

3. Release transaction mutex.

Checking if someone is waiting:

1. Release the queue mutex.

2. Try to get queue mutex back.

3. If successfully acquired the queue mutex, continue.

4. If fail to get the queue mutex, close the transaction and release the transaction mutex.

The queue mutex guard may be moved into the transaction closure, so it is available to the transaction code for releasing anytime and is automatically dropped at the end of transaction.

Replaced it with an Arc. Mutexes/semaphores are not really needed here; no one needs to wait on them.

@link2xt
Collaborator

link2xt commented Mar 17, 2023

I think it still can be improved. Looks like transactions have an acceptable overhead when we do <= 50 transactions per second.

50 transactions per second is definitely too much. Typical disk write latency is about 10ms, so 50 transactions is nearly half a second spent syncing to disk, 50% of the time doing nothing.

So, we can SELECT and compress messages for 0.02s and then run an UPDATE transaction.

I would say SELECT and compress at least for a second, then you get about 1% sync overhead.

@iequidoo
Collaborator Author

50 transactions per second is definitely too much. Typical disk write latency is about 10ms, so 50 transactions is nearly half a second spent syncing to disk, 50% of the time doing nothing.

I think optimising for rotational disks isn't worth it. But even on my SSD (non-NVMe, however) doing > 50 transactions/s is inefficient. I think there is also some overhead in SQLite (statement compilation and so on).

I would say SELECT and compress at least for a second, then you get about 1% sync overhead.

1 second is good for performance, but then a user may experience noticeable UI slowness. Better 0.1s then; it's less than the average human reaction time.

@link2xt
Collaborator

link2xt commented Mar 17, 2023

I think optimising for rotational disks isn't worth it.

SD card latencies are even worse; they can be as bad as about 100ms on average and 200ms worst case. This is what you can expect from an old Android phone.

1 second is good for performance, but then a user may experience noticeable UI slowness.

It compresses on a single thread and does not lock the database while compressing, so it should not be noticeable at all unless the system is single-core, which is rare even for phones. And this happens once a day at most.

@link2xt
Collaborator

link2xt commented Mar 17, 2023

So I would compress as much as possible within 5 seconds or whatever the housekeeping limit is, then write everything in a single transaction and do no more than a single compression transaction per housekeeping.

@iequidoo iequidoo force-pushed the iequidoo/compress-mime-headers branch from fb938f6 to 6c732e9 on March 17, 2023 03:00
@iequidoo
Collaborator Author

iequidoo commented Mar 17, 2023

So I would compress as much as possible within 5 seconds or whatever the housekeeping limit is, then write everything in a single transaction and do no more than a single compression transaction per housekeeping.

But then if we make the compression level too low or increase the time limit for this housekeeping job (for whatever reason), memory consumption may become unacceptable. So I limited transactions to 1 per second; for me it's 1.234 times as slow as a single transaction, which looks ok.

EDIT: I made a mistake in my measurements. I don't see any difference between a single transaction and 1 per second. And it looks like the bottleneck is now block_in_place() being called for each message separately.

EDIT: Removing block_in_place() changes nothing for me, so probably that's not it. But it's still ~1.2 times as slow compared to the single transaction with compression inside, and the reason is not the 1-second limit.
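For reference, a condensed sketch of the shape this converged to (illustrative: rusqlite directly, a 1-second budget, and the assumption that mime_compressed=1 marks a compressed row; the real code in src/sql.rs differs in details):

```rust
use std::time::{Duration, Instant};

use anyhow::Result;
use rusqlite::{params, Connection};

/// Collect and compress rows for up to ~1 second, then write them back in a
/// single UPDATE transaction. Returns whether any uncompressed row was seen,
/// so the caller knows if another pass is needed.
fn compress_msgs_step(
    conn: &mut Connection,
    buf_compress: impl Fn(&[u8]) -> Result<Vec<u8>>,
) -> Result<bool> {
    let mut msgs = Vec::new();
    let start_time = Instant::now();
    let mut have_uncompressed = false;
    {
        let mut stmt = conn.prepare(
            "SELECT id, mime_headers FROM msgs WHERE mime_headers!='' AND mime_compressed=0",
        )?;
        let mut rows = stmt.query([])?;
        while let Some(row) = rows.next()? {
            have_uncompressed = true;
            let id: i64 = row.get(0)?;
            let mime_headers: Vec<u8> = row.get(1)?;
            msgs.push((id, buf_compress(mime_headers.as_slice())?));
            if start_time.elapsed() > Duration::from_secs(1) {
                break;
            }
        }
    }

    // One write transaction for the whole batch.
    let tx = conn.transaction()?;
    for (id, compressed) in msgs {
        tx.execute(
            "UPDATE msgs SET mime_headers=?, mime_compressed=1 WHERE id=?",
            params![compressed, id],
        )?;
    }
    tx.commit()?;
    Ok(have_uncompressed)
}
```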

src/sql.rs (outdated):
let mut msgs = Vec::new();
let start_time = Instant::now();
let mut stmt = conn.prepare(
"SELECT id, mime_headers FROM msgs WHERE mime_headers!='' AND mime_compressed=0",
Member

@r10s r10s Mar 17, 2023

Are we sure this does not load all mime_headers into memory at once?

Surely it should not, but after the recent experiences with #4164 I am a bit sceptical ... maybe we should still add some LIMIT just to sleep better, also in case things change in the future?

Even if that is okay - how much memory is used during the 1 s where all data are collected in memory?

Member

@r10s r10s Mar 17, 2023

OK, this was not changed by the recent changes wrt transactions, so it's already tested and probably okay.

@iequidoo for reviewing in such a complex space, I prefer to have some commit history, even if force-push is used and even if the history is dirty here and there. This makes sense, e.g. to review only the changes - currently, I did not find a way to compare or test the current state against the one before @link2xt adapted the loop (I have a local backup by chance, however :)

When merging to master, commits can be squashed (which is what we usually do).

Member

OK, I measured the memory usage on my database: on my M1 Pro macOS it is max. 25 MB per second. This is probably quite okay, even for lower-budget devices - however, those devices will probably also be slower and use less memory per second.

Collaborator Author

@iequidoo iequidoo Mar 17, 2023

Are we sure this does not load all mime_headers into memory at once?

Surely it should not, but after the recent experiences with #4164 I am a bit sceptical ... maybe we should still add some LIMIT just to sleep better, also in case things change in the future?

Even if that is okay - how much memory is used during the 1 s where all data are collected in memory?

But there's no good LIMIT value; we don't know the avg size of messages, so we should reduce the time limit instead if we see huge memory usage. SQLite itself doesn't eat much memory because of a SELECT with no LIMIT -- it probably has a buffer of some reasonably limited size; the question is how much our msgs Vec eats. On my machine I don't see RES > 50M in top, but better check it on your large dbs.

And btw, I don't see any performance improvement when increasing the time limit from the current 1s to 10s, so maybe even 1s is too much, but I think it's hardware-dependent, so I'd leave it as is for now.

src/tools.rs: outdated review thread (resolved)
Member

@r10s r10s left a comment

So, I am quite fine with the PR now :) Thanks a lot for the good work!

I would do the "adaptive compression" as suggested above - this is only a little change, making e.g. slow devices twice as fast for larger blobs (small blobs are fast anyway).

As discussed in the sync call, however, we'll wait with merging until after 1.36 is released.

@link2xt
Collaborator

link2xt commented Mar 17, 2023

If compression is too slow for housekeeping, because it may be triggered at application start or wake-up, there is the option of moving it to backup export. When the user is exporting a backup there is a progress bar anyway, so if we spend some time compressing the msgs table and only then start the backup, the user is waiting anyway. And it only happens once; the second backup will not have the compression step.

@dignifiedquire
Member

I am wondering if we should really keep these in the DB. We are already seeing issues with DB sizes, so why not store the actual files on disk if we do the work of processing them all? (They could be compressed there or not.)

@iequidoo
Collaborator Author

I am wondering if we should really keep these in the DB. We are already seeing issues with DB sizes, so why not store the actual files on disk if we do the work of processing them all? (They could be compressed there or not.)

Also an option; it looks like we don't access HTML message parts frequently. But what are the problems with large DBs? Is it related to import/export / backup transfer?

@link2xt
Collaborator

link2xt commented Mar 17, 2023

But what are the problems with large DBs?

The latest problem with a large database is that we failed to remove columns here because it used >2GB of memory: #4164

Another problem is that for backup transfer we need to make a database snapshot, and currently it uses as much disk space as the database itself.

Let's keep this PR as is. I am going to implement database-to-stream serialization anyway; maybe it will solve our problems with database streaming.

@link2xt
Collaborator

link2xt commented Mar 18, 2023

As discussed in the sync call, however, we'll wait with merging until after 1.36 is released.

Maybe merge this with the housekeeping part commented out? Then new messages are already compressed, and we can decide later how to deal with old messages.

@iequidoo
Collaborator Author

Maybe merge this with the housekeeping part commented out? Then new messages are already compressed, and we can decide later how to deal with old messages.

If we want to protect against huge memory usage, there's the https://docs.rs/sysinfo/latest/sysinfo/index.html crate: we can check the available memory and, if it goes below, say, 50% of its initial value, stop collecting compressed messages and start the UPDATE transactions, each limited to 100ms so as not to cause noticeable UI slowness.

@link2xt
Collaborator

link2xt commented Mar 18, 2023

It's not only memory usage, but also iOS waking up the process and then killing it a bit later if it has not stopped within a "reasonable" time after waking up: deltachat/deltachat-ios#1202

As for memory usage, I would rather not use too much memory in the first place than attempt to detect it with a potentially buggy dependency and all the race conditions involved in attempting to stop SQLite when it has already started allocating all your memory. With zram and overcommitment involved it is not even clear how much "free" memory you have at any moment on Linux/Android.

@iequidoo
Collaborator Author

all the race conditions involved in attempting to stop SQLite when it has already started allocating all your memory

I think (and my measurements confirm) that the problem is not SQLite allocating memory (it likely has a reasonably limited buffer), but our msgs Vec. As for that problem with removing columns, it looks like SQLite doesn't implement ALTER TABLE in a good way; I don't see such behaviour for UPDATE, for example.

With zram and overcommitment involved it is not even clear how much "free" memory you have at any moment on Linux/Android.

I think using the value reported by free in -/+ buffers/cache: is ok for Linux, but I don't know which one to use on other platforms, and yes, I agree it's better to avoid potentially buggy crates. So maybe it's indeed better to comment out the code in housekeeping() for now. Another option is to compress in the background -- when a message is read from the db, compress it and store it back. And after a year, say, just remove all still-uncompressed HTML parts -- if they were not needed for a year, they're probably not needed at all. But we can postpone this decision.

@link2xt
Collaborator

link2xt commented Mar 18, 2023

@r10s What do you think about commenting out the housekeeping part and merging as is? Then we get compression for new messages and can figure out compression for old messages later, after the 1.36 release.

@iequidoo
Collaborator Author

iequidoo commented Mar 18, 2023

@r10s What do you think about commenting out the housekeeping part and merging as is? Then we get compression for new messages and can figure out compression for old messages later, after the 1.36 release.

I'd also compress the ones that are read from the db, as I wrote above. We don't access mime_headers frequently, so I don't expect huge CPU/battery usage because of that.

@iequidoo iequidoo force-pushed the iequidoo/compress-mime-headers branch from 0fb0aeb to 3fff053 on March 18, 2023 21:25
@iequidoo
Collaborator Author

Can we merge this now that the release is done? And maybe remove the commented-out code altogether? I believe that compressing only the accessed messages is enough. If we decide to compress in housekeeping() later, we can resurrect the code from here.

@link2xt
Collaborator

link2xt commented Mar 23, 2023

Can we merge this now that the release is done?

It is just the core release, not the 1.36 Delta Chat release. The 1.36 release is due at the end of this month; then we can merge it.

And maybe remove the commented-out code altogether? I believe that compressing only the accessed messages is enough. If we decide to compress in housekeeping() later, we can resurrect the code from here.

+1

We may also move the whole housekeeping to some other place, like detecting that the app has been idle for some time and then starting it. Currently housekeeping mostly happens at app start, which is not good as it delays the actual startup; on iOS it may even happen when the app is actually in the background and would better spend the time checking for new incoming messages than cleaning up the database.

@iequidoo iequidoo force-pushed the iequidoo/compress-mime-headers branch 2 times, most recently from f955d33 to ad1ad96 on March 25, 2023 19:26
@link2xt
Collaborator

link2xt commented Mar 31, 2023

@iequidoo I think this can be rebased and then merged to master once the tests pass. 1.36.0 is already released and uses the stable branch anyway, so we can start merging experimental things on the master branch.

@iequidoo iequidoo force-pushed the iequidoo/compress-mime-headers branch from ad1ad96 to 75fdc60 on March 31, 2023 20:59
@iequidoo iequidoo merged commit c401780 into master Mar 31, 2023
@iequidoo iequidoo deleted the iequidoo/compress-mime-headers branch March 31, 2023 22:45