Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

smarter termination of background fetches #1461

Closed
wants to merge 1 commit into from
Closed

Conversation

r10s
Copy link
Member

@r10s r10s commented Feb 4, 2022

this pr aims to finish background fetches as soon as possible, so when all fetching is done and all connections are "idle" again.

as a first preparation, i added some tracking for the current duration of background fetches.

currently, we do start_io(), wait 10 seconds, stop_io(), wait another second and finish.

i would have expected, that this should always be around 11-12 seconds, which is true in many some cases - but there are also durations of 1-10 minutes:

EDIT: some more data:

as the app is terminated by apple after 30 seconds, this may means that the app gets "frozen" and woke up after some time, with then either being in foreground or maybe background again, without really using every slot for doing fetches.
this may also explain that we notice wakup coming regulary, but mails not fetched in that time.

so, my suspection is that stop_io() may not interrupt fetching but wait until things are done (which would not work then).

the log is about today, i can say that no "big" messages were fetched in that time, justifying a long download, only a number of small ones, read receipts, etc, maybe 100 messages.

EDIT: stop_io() is surrounded by the log messages "⬅️ finishing fetch" and "⬅️ fetch done". i checked some logs, all of them look okay, there is only a second between these two calls, so that part looks good up to now. but there must be one call lasting some time, hm.

@link2xt can you say what stop_io() is really doing? does it interrupt ongoing fetches? or at least, finishe only the current one and exists then? or does it wait until the next idle?

once this is figured out and we have reliable 11-12 second fetches, we can do the next step and terminate sooner/later by listening to DC_EVENT_CONNECTIVITY_CHANGED somehow as described in #1157

targets #1157 , cc @cyBerta @Hocuri

@Hocuri
Copy link

Hocuri commented Feb 5, 2022

link2xt can you say what stop_io() is really doing? does it interrupt ongoing fetches? or at least, finishe only the current one and exists then? or does it wait until the next idle?

The question was not asked at me, but I was cc'ed, so I'll look at the code and answer anyway :)

The things that might take so long:

  • accounts.read() i.e. lock the RwLock
  • scheduler.read() i.e. lock the other RwLock
  • stop_sender.send(()) - the channel is bounded, so, there IIUC might be a race condition that stop_io() is called twice and this tries to send twice. But then we would probably have a deadlock, not just a 1 minute delay.
  • The ongoing fetches are cancelled at the next await in the code, i.e. almost immediately (this is done with race(fut)). The current fetch is not finished.
  • We wait for them to complete (actually twice, once via shutdown_receiver.recv().await and then via handle.await)
  • We do ephemeral_task.cancel()

All of them look like they don't take so long, but seems like one of them does...

@link2xt
Copy link
Contributor

link2xt commented Feb 5, 2022

  • The ongoing fetches are cancelled at the next await in the code, i.e. almost immediately (this is done with race(fut)). The current fetch is not finished.

Database access is blocking, so it's possible that Inbox thread is waiting to do some small transaction while Movebox thread is holding the lock, writing hundreds of messages to the database.

Mvbox is not aborted until Inbox says it has finished, because Scheduler::stop is stopping all handles in a sequence:
https://github.com/deltachat/deltachat-core-rust/blob/13a5e3cf6f32f94c66fce6b68a2eb12d83d16a47/src/scheduler.rs#L553-L558

Maybe stopping Mvbox thread first or, even better, stopping all of them concurrently (racing inbox.stop(), mvbox.stop(), sentbox.stop() against each other) could help.

@r10s r10s force-pushed the finish-bg-fetch-asap branch from 7802d16 to 7b56b1d Compare February 19, 2022 01:59
@r10s r10s force-pushed the finish-bg-fetch-asap branch 2 times, most recently from ad69f9b to d0c80f3 Compare March 23, 2022 22:50
@r10s
Copy link
Member Author

r10s commented Mar 25, 2022

on a fresh installed iphone se with a fresh testrun-account, notifications arrive as well, however, the log does not show any notify_fetch_durations:

@r10s r10s force-pushed the finish-bg-fetch-asap branch from d0c80f3 to 79219a7 Compare April 15, 2022 16:29
@r10s
Copy link
Member Author

r10s commented Apr 15, 2022

while the faster termination is still a thing to do, and the long timings need to be fixed, the approach of doing things synchronous in performFetchWithCompletionHandler resp. didReceiveRemoteNotification is wrong:

  • at least didReceiveRemoteNotification is called in the main thread (and i think, also performFetchWithCompletionHandler which is similar in many ways), i tested that by printing Thread.isMainThread
  • opening the app while being randomly in one of the fetch methods will freeze the ui until the fetch is done, this is easily reproducible in the simulator
  • there are some examples that do things synchronous, however, they are wrong; maybe just to make things simple

therefore, the previous approach, scheduling the work to a background thread is correct in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants