background handler terminated by 0xdead10cc sometimes #1202
Comments
we should check the diff between 1.19.0 and 1.19.1
this is the diff between 1.19.0 and 1.19.1:
@r10s does There are two nice articles describing similar errors to what we see: Somewhere we're keeping a file handle, probably the database while the app is about to be suspended.
This quote describes what might go wrong in our app, too: we call maybe network from a background task and mark the background task some steps afterwards as completed:
not sure, at a first glance, i did not find something, but might be, maybe some config or so. and even if not directly - it may still keep the handle open, if it takes longer for whatever reason. maybe @link2xt knows better.
@r10s said on irc:
Not sure why this happens, all From reading the links above, I get that the problem is unclosed database "connections" (file descriptors in case of SQLite) at the time of suspension. I see that you are trying to stop working by calling
thanks for the insights, @link2xt ! @cyBerta we are sharing the database between extensions, or? we did not really have a choice iirc, however, is this really what is discouraged in the links above? i thought about that always more as a "lib" doing core stuff, that is used by main-app and needed extensions. also, we did not always get the error - or maybe it was just covered by other ones? EDIT: i checked some older crash logs - it was also present there - but of course much rarer as the background fetch was called less often. so, indeed, there are some hints that it is not really related to
thinking it over - even if we schedule so, maybe instead of #1206 that schedules everything, just call maybeNetwork in the given thread as it was done before 1.19.1. i would not expect that the issue will be gone completely, but maybe it will be much less, on the level as before. even if it does not help, we would use nb: i am wondering if the crash is actually user-visible when not using testflight or a debugger. so, maybe, some mitigating is good enough here.
some more deep information: https://developer.apple.com/forums/thread/126438, https://developer.apple.com/forums/thread/655225 EDIT: and here is the official apple doc, took me a moment to find it: https://developer.apple.com/documentation/xcode/understanding-the-exception-types-in-a-crash-report#EXC_CRASH-(SIGKILL)
the issue is not fixed by #1208, maybe mitigated. for getting the error popup: you have to install from testflight. i never got the exception-popup when installing from xcode. instead i am getting the launch screen maybe once a day on starting, that might be how 0xdead10cc gets visible without testflight. i am wondering, if the issue is user visible when installing from regular app store; if not, maybe it is okay to release with the issue for now, unless we find an easy way to fix it.
also some explanation how GRDB mitigates the issue on the user API level: https://github.com/groue/GRDB.swift/blob/master/Documentation/SharingADatabase.md#how-to-limit-the-0xdead10cc-exception
If I understand the issue correctly, our problem is related to the fact that we share the database between the share extension and main app. In this case database locks are created for syncing. Both share extension and main app run as separate processes. Another way to mitigate the issue is described here: https://inessential.com/2020/02/13/how_we_fixed_the_dreaded_0xdead10cc_cras
I think this will be very cumbersome or impossible for us to implement though, because the extension needs data from the database to show the chat list, contacts etc. and to send messages.
yip, some reports point in that direction, while others do not. what i do not get: if it is really related to the database moved to shared spaces: (a) i would expect the issue to appear far more often, not only once a day or so (b) what would the shared space be good for then? :)
with latest testflight, there may be a new issue, where 0xdead10cc happens, on startup, reported by @dignifiedquire - and sth. similar happens also to me when running testflight. the new issue may be related to the fix-markseen-forground-fix, #1182 - this is the log, that points to that direction, https://gist.github.com/r10s/f93220a734a8a3e26705fed996356283, line 48, startTimer.
0xdead10cc is still there in 1.20.3, here is a recent log with a different stack trace: https://gist.github.com/r10s/ffd99677f5b6e8f1d62ae4d82238311b, we'll see if at least the new coming-to-foreground-issue is fixed and if the number of exceptions is lowered by 1.20.3 in general (i personally had none so far)
another idea: may it be that we have to do https://stackoverflow.com/questions/37321498/background-task-on-remote-notification-suspended-after-a-short-while points in this direction, but i did not find anything "official" yet. maybe check some other open source apps.
the link leads to the issue overview. Is that where you wanted to point to?
i updated the link :)
Regarding our questions on the last meeting:
from https://developer.apple.com/documentation/uikit/uiapplication/1623031-beginbackgroundtask:
How many background tasks can be called: from https://developer.apple.com/forums/thread/85066
0xdead10cc crashes in 1.28.1 are typically at the following line: https://github.com/deltachat/deltachat-ios/blob/v1.28.1/deltachat-ios/AppDelegate.swift#L460 meanwhile, however, the code is different, also, these crashes seem to be replaced by a "force suspend" when being not in testflight, see above. but we are still in the same code area and i still do not fully get the too long times from #1461 (we meanwhile just stop the last 16 wakeups - all of them should be around 11 seconds, however, some are 10 minutes or longer, so there is a wakeup in between):
also @link2xt : your suggestion from above, a dc_accounts_suspend() - how much effort would that be? not sure if that would totally fix the issue, however, it still seems the right way to go.
ftr: a maybe somehow related crosslink: deltachat/deltachat-core-rust#2955 (comment)
With current API it is possible to introduce
i am running 1.30.1 from testflight on an iphone7 that got these errors before. well, up to now, 30+ hours later, i did not encounter any crashes. however, after all, i stay skeptical 🧐
The database is going to be closed more reliably with deltachat/deltachat-core-rust#4053
nice, looking forward to try that out and if it helps on testflight builds
lets hope for the best, currently getting about 10 crashes per day..
on testflight? or otherwise debug?
on testflight
the issue is not completely fixed, however, still seems to appear on testflight only. we probably need a way to really suspend the core, the theory is still that core accesses some files a moment after we say "okay, apple, ready to suspend".
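To make the "really suspend" idea concrete, here is a minimal sketch in Rust (std only) of a gate that refuses new jobs and waits for running ones to finish before the app reports that it is ready to suspend. `SuspendGate`, `JobGuard`, `begin_job`, `suspend` and `resume` are hypothetical names for illustration, not part of the deltachat core API, and the sleeping thread only stands in for a job that touches files.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical bookkeeping for work that may touch files.
struct State {
    running: usize,  // jobs currently in flight
    suspended: bool, // once true, no new job may start
}

#[derive(Clone)]
pub struct SuspendGate {
    inner: Arc<(Mutex<State>, Condvar)>,
}

// Held while a job runs; dropping it marks the job as finished.
pub struct JobGuard {
    inner: Arc<(Mutex<State>, Condvar)>,
}

impl SuspendGate {
    pub fn new() -> Self {
        let state = State { running: 0, suspended: false };
        SuspendGate { inner: Arc::new((Mutex::new(state), Condvar::new())) }
    }

    // Register a job. Returns None once suspension has been requested.
    pub fn begin_job(&self) -> Option<JobGuard> {
        let mut st = self.inner.0.lock().unwrap();
        if st.suspended {
            return None;
        }
        st.running += 1;
        Some(JobGuard { inner: Arc::clone(&self.inner) })
    }

    // Refuse new jobs, then wait up to `timeout` for running jobs to end.
    // Returns true only if no job is left running.
    pub fn suspend(&self, timeout: Duration) -> bool {
        let (lock, cvar) = &*self.inner;
        let mut st = lock.lock().unwrap();
        st.suspended = true;
        let deadline = Instant::now() + timeout;
        while st.running > 0 {
            let now = Instant::now();
            if now >= deadline {
                return false;
            }
            let (guard, _) = cvar.wait_timeout(st, deadline - now).unwrap();
            st = guard;
        }
        true
    }

    // Allow jobs again, e.g. when the app returns to the foreground.
    pub fn resume(&self) {
        self.inner.0.lock().unwrap().suspended = false;
    }
}

impl Drop for JobGuard {
    fn drop(&mut self) {
        let (lock, cvar) = &*self.inner;
        let mut st = lock.lock().unwrap();
        st.running -= 1;
        cvar.notify_all(); // wake a waiting suspend()
    }
}

fn main() {
    let gate = SuspendGate::new();
    let job = gate.begin_job().expect("gate is open");
    let worker = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50)); // simulated file access
        drop(job);
    });
    let drained = gate.suspend(Duration::from_secs(2));
    worker.join().unwrap();
    println!("drained: {}", drained);
    println!("new job accepted: {}", gate.begin_job().is_some());
    gate.resume();
}
```

In this sketch the caller would only tell the system the background task is finished after `suspend` returned true; what to do on a timeout is left open here.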
Currently the core maintains an SQL connection pool for each Context that is never closed. Context contains If we want to close It is also technically possible to destroy async tokio Runtime which is currently lazily created by the deltachat-ffi, but it is probably not needed and would take additional effort, closing the database is the first thing to do.
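As an illustration of what "closing the database" could mean for a pool, here is a minimal Rust sketch (std only) of a pool that can drop all its connections and reopen them later. `Pool`, `Connection`, `open`, `close` and the handle counter are hypothetical stand-ins, not the core's actual types; `Connection` only counts open handles instead of talking to SQLite.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Stand-in for a real SQLite connection (assumption: a real one would
// close its file descriptor and release its locks when dropped).
struct Connection {
    open_handles: Arc<AtomicUsize>,
}

impl Connection {
    fn open(open_handles: &Arc<AtomicUsize>) -> Self {
        open_handles.fetch_add(1, Ordering::SeqCst);
        Connection { open_handles: Arc::clone(open_handles) }
    }
}

impl Drop for Connection {
    fn drop(&mut self) {
        self.open_handles.fetch_sub(1, Ordering::SeqCst);
    }
}

// Hypothetical pool that can be closed before suspension and reopened later.
pub struct Pool {
    size: usize,
    open_handles: Arc<AtomicUsize>,
    conns: Mutex<Option<Vec<Connection>>>, // None while closed
}

impl Pool {
    pub fn new(size: usize) -> Self {
        let pool = Pool {
            size,
            open_handles: Arc::new(AtomicUsize::new(0)),
            conns: Mutex::new(None),
        };
        pool.open();
        pool
    }

    // Open all connections if the pool is currently closed.
    pub fn open(&self) {
        let mut conns = self.conns.lock().unwrap();
        if conns.is_none() {
            let fresh = (0..self.size)
                .map(|_| Connection::open(&self.open_handles))
                .collect();
            *conns = Some(fresh);
        }
    }

    // Drop every connection so that no handle stays open.
    pub fn close(&self) {
        drop(self.conns.lock().unwrap().take());
    }

    pub fn is_open(&self) -> bool {
        self.conns.lock().unwrap().is_some()
    }

    pub fn open_handles(&self) -> usize {
        self.open_handles.load(Ordering::SeqCst)
    }
}

fn main() {
    let pool = Pool::new(3);
    println!("open handles after new: {}", pool.open_handles());
    pool.close(); // before telling the system we are ready to suspend
    println!("open handles after close: {}", pool.open_handles());
    pool.open(); // when the app is woken up again
    println!("pool open again: {}", pool.is_open());
}
```

The sketch never lends connections out; a real pool would also have to wait until all checked-out connections are returned before `close` can guarantee that nothing is left open.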
I think this is related to encrypted accounts and to the new lock on the account manager.
getting "Namespace RUNNINGBOARD, Code 0xdead10cc" crashes from time to time, i cannot reproduce that, but it was reported by @dignifiedquire; some logs are in App Store Connect, excerpt
full log - so, there seems sth. wrong with background thread handling, searching for "Namespace RUNNINGBOARD, Code 0xdead10cc" brings up some advice, need to investigate further.
EDIT: to sum up, the issue with 0xdead10cc is that a lock is held on some files (probably sql-database) after the app gets the signal to be suspended. reason might be dispatched jobs that are still running for a little moment, we tried to mitigate that by recent commits and to exclude this cause. however, might also be that "shared" usage of the database is the cause, that would be very hard to fix, see discussions wrt share-to-delta.
stats of recent versions:
compared with older stats, we had 700 installations and only 1 crash/7 days (at least after removing sqlx)
ftr, 0xdead10cc was also topic in #1057