-
Notifications
You must be signed in to change notification settings - Fork 501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NSE failed to persist megolm sessions, causing an explosion of UISIs #5426
Comments
So I went to look for UISIs in my rageshake:
...and then looked to see what the NSE was up to with the most common ones...
Which is interesting, as somewhere between |
So looks like the NSE deleted our own crypto store, during this run:
...which would rather explain why E2EE then exploded. |
(also, why do we keep calling |
Having read the code, I see that the NSE is trying to maintain a separate bgCryptoStore, and it's that which is being deleted rather than the main realm DB. However, there's presumably a race of some kind which means that it thinks that the app has consumed all the data from bgCryptoStore and so resets it prematurely. From a quick look at the code: if syncResponseStoreManager.syncResponseStore.syncResponseIds.count == 0 {
// To avoid dead lock between processes, we write to the cryptoStore only from only one process.
// If there is no cached sync responses, it means they have been consumed by MXSession. Now is the
// right time to clean the cryptoStore.
MXLog.debug("[MXBackgroundSyncService] updateBackgroundServiceStoresIfNeeded: Reset MXBackgroundCryptoStore")
cryptoStore.reset()
} Looks like the if clause might be incorrectly returning 0? In terms of the race, the timing here is that our push comes in at 13:01:51, concludes that the bgCryptoStore should be reset - but meanwhile the main app is fg'd and running from 13:01:28.018, doing a big slow incremental catchup that finally completes at 13:01:52.253 after taking 16.9s(!) to commit the MXFileStore. I'm failing to see how the activity in the bgCryptoStore ever gets copied over to the real CryptoStore however, so i'm unclear on how the race is actually failing. This whole thing feels like an incredibly fragile hack around the fact that our DB is too slow and inefficient to be used in NSE. Would love to know how rust-sdk and rust-sdk-crypto compare. |
+100 Damir is aware of that. He confirmed that the rust-sdk-crypto and its DB support this scenario of multi-process. |
|
I think i just saw this again in 1.8.3 TF; rageshake attached. |
I think my iPhone contacts may be running into this issue. They are unable to decrypt any messages any more. One contact ran into this state a few weeks ago and has since ditched Element and Matrix entirely (after years of use). The other has just reached it now. I have two questions:
|
Any updates/workarounds? |
Yes the issue still exists |
I can see only the one way sync from the NSE side..
and there Is no is there a way to force sync |
Just curious if this can affect the sync tokens and cryptoStore getting updated: |
Steps to reproduce decrypt error:
matrix-org/matrix-ios-sdk@13ce311 making the fileStore.loadMetaData(true) fixes this provided the discardsession happens only once.. |
This fix the issue : |
Steps to reproduce
I spent the morning on EW, and then fg'd up EI when going to lunch. However, many recent sessions (including my own) were UISI. Rageshaked on sender (EW) and receiver (EI).
Outcome
No UISIs, rather than loads of UISIs from different senders.
It looks like the NSE received these new sessions (from my EI rageshake):
but then when the main app tried to decrypt using that inbound session:
So, how did the NSE fail to store the session?
Your phone model
12 pro max
Operating system version
iOS15.1.1
Application version
1.6.13-develop-#20220120114547
Homeserver
matrix.org
Will you send logs?
Yes
The text was updated successfully, but these errors were encountered: