Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrating existing DB to an AppGroup #643

Closed
foxware00 opened this issue Nov 4, 2019 · 65 comments
Closed

Migrating existing DB to an AppGroup #643

foxware00 opened this issue Nov 4, 2019 · 65 comments
Labels

Comments

@foxware00
Copy link

foxware00 commented Nov 4, 2019

Question

I'm trying to create a Share extension that has access to my host application's database. The problem is that users already have the SQLite file created and populated. I know the path of the existing database and where the new one is.

My question is there a clean and easy way to migrate the SQLite file to another path in a one-off operation. I'm guessing there are a few files that all need to be copied and duplicated cleanly.

My initial thoughts would be to open the current DB, create a new on in a new path and copy all the data over. This might take a few seconds the first run but seems like the cleanest way to do it. I'd delete the old DB, and switch over to the new DB once the migration has completed. Or would a quicker direct file copy make more sense?

I've found a CoreData method that migrates data from one URL to another, is this something that's easy to achieve in GRDB?

Environment

GRDB flavor(s): GRDB
GRDB version: 4.1.1
Installation method: CocoaPods
Xcode version: 11.1
Swift version: 5.1
Platform(s) running GRDB: iOS
macOS version running Xcode: 10.14.5

@foxware00 foxware00 changed the title Migrating exiting DB to an AppGroup Migrating existing DB to an AppGroup Nov 4, 2019
@groue
Copy link
Owner

groue commented Nov 4, 2019

Hello @foxware00,

My advice is to use the backup API.

This sounds to me like the safest way to migrate your app in a context where several processes are attempting at using it. I hope that SQLite is able to deal with on-progress backups correctly. This is not something I have personally tested.

Now, I suggest that you:

  • Test your migration in various stages of success or error. Use https://www.sqlite.org/backup.html as a reference. Force crashes. Make your app robust.
  • Be ready for both the app and the extension to share a single database. This scenario has never been a focus of GRDB, because it is a wildly wide subject. Please refer to Dealing with External Connections.

If you have an interesting story to tell after this, please come back and share your experience! I'll answer other questions if you have any. But will have to grasp a few things about SQLite: click on the links I gave you.

@groue groue added the question label Nov 4, 2019
@foxware00
Copy link
Author

foxware00 commented Nov 4, 2019

@groue Thanks as always for a prompt and detailed response.

The backup API looks ideal, I will do some investigation. I didn't want to manually copy files if there was a better way to achieve it. It looks like a simple way to easily transfer the data. I'll be sure to open the existing connection I want to migrate in read-only mode to prevent data changing before the backup has finished. Given it's a blocking call, I'm hoping it'll be a simple as follows.

  1. Open existing connection in read-only mode
  2. Open new connection in App Group
  3. Run back up into the new connection.
  4. If successful point app at the new connection
  5. Delete and close the old connection.

I'll give those links a good read over. My current thought was to use the app extensions mainly to just query data the main app has. Given the easiest way to achieve this is by sharing my base target framework (Containing my data layer), I'm getting a lot of the data I need for free without having to re-write the queries and syncing logic.

I'm going to have a proper play at sharing the connection, but given all writes are transactional it'll be interesting to see if there are any issues. I'm currently using a database pool in WAL mode. I might consider also opening the extension's connection in readOnly mode to prevent multiple writes to the same SQL file. From the looks of it, WAL mode is designed for this purpose, with the small unknown of how IPC and shared memory working with App Extensions and the host app on iOS. So off for some in-depth testing of how it all works. From 2.2 in this link "Readers can exist in separate processes"

Is there something different in GRDB from raw SQLite that is abstracting some of the normal connection locks and transactions away from native SQLite? I guess my point is, isn't SQLite designed for this scenario when using transactions and WAL mode? Or does each process get a different WAL?

Anyway sorry for my rambling. Feel free to close the issue, and thank you for taking the time to give me a detailed response!

@groue
Copy link
Owner

groue commented Nov 4, 2019

Is there something different in GRDB from raw SQLite that is abstracting some of the normal connection locks and transactions away from native SQLite? I guess my point is, isn't SQLite designed for this scenario when using transactions and WAL mode? Or does each process get a different WAL?

GRDB is indeed a very thin layer over SQLite.

As you say, one writer and several readers are totally OK for a WAL database. A read-only pool or queue in the extension should not impact the read-write pool in the main app at all.

But we also have to deal with iOS. This and this tell us that... your rambling is normal, and I don't have a definitive answer because I haven't experienced this setup yet.

@foxware00
Copy link
Author

But we also have to deal with iOS. This and this tell us that... your rambling is normal, and I don't have a definitive answer because I haven't experienced this set up yet.

Fortunately, the Database is not encrypted nor do I target iOS 8 and below.

I'm going to have a good play today and test a few things. Many thanks for all the help.

@groue
Copy link
Owner

groue commented Nov 5, 2019

Yes, I didn't want to sound alarming, sorry :-) I just wanted to say that iOS has its word, so... yeah, test your stuff. I'm closing this issue now. Please open a new one if you have any further question!

@groue groue closed this as completed Nov 5, 2019
@klaas
Copy link

klaas commented Nov 9, 2019

@foxware00 I would really be interested in your long time experience and if you find some edge cases that have to be taken into account. Even a thumbs up or down would be great :-)

I'm facing a similar issue and right now I'm using GRDB only in the main app. The share extension and the main app communicate with one-way-only JSON files (e.g. every shared item creates a small unique file that is integrated in the database as soon as the main app gets a chance to do so).

@foxware00
Copy link
Author

@klaas I've chosen a slightly different route from the one you mention.
I'm migrating the existing database into the app group so I can access my device/user information required to complete the share without having to keep it in sync and fetch it again.

I make a simple check during the startup of the Share extension to check whether the migration has happened. Directing them to the main app first so I can perform the migration. I can also use the migrated data to validate a user being logged in being able to perform functions.

This was I can write straight to the database from the extension.

As mentioned higher up in this thread, I'm using a WAL file (database pools) to prevent read's and writes being blocked by either app. I'm keeping the logic and writing to the DB to a minimum, simply writing the state of the shared item into a pending state so when the app opens I can show the progress of the upload awaiting the real post from the server.

I've also had to migrate my keychain and user defaults.

Shout if you have any more questions or would like me to explain something in more detail.

@groue
Copy link
Owner

groue commented Nov 12, 2019

I'd like to distinguish a few cases:

  1. Read/write from the app, read/only from the extension. The extension does NOT observe the database for changes performed by the app.

    This is the most simple case. If the app uses the WAL mode (with a DatabasePool), then the app should always be able to read and write, and the extension should always be able to read, without any error, even if both processes concurrently access the database.

  2. Read/write from the app, read/write from the extension. The app and the extension do NOT observe the database for changes performed by the other process.

    Because both processes want to write in the database, they both may see SQLITE_BUSY errors, even if they use the WAL mode (with a DatabasePool). The WAL mode still does not allow two connections to write in the same db at the same time.

    SQLITE_BUSY errors are a pain, because they are a threat to all writes.

    Fortunately, SQLite has busy handlers. Those are configured in GRDB using the Configuration.busyMode property.

    There is also the threat of SQLITE_LOCKED, but I'm less familiar with this one, and I'm not sure it can happen due to concurrent writes.

  3. Read/write from the app, read/write from the extension. The app or the extension does observe the database for changes performed by the other process.

    On top of the handling of SQLITE_BUSY errors seen above, we now have to kick some database observations.

    This is a topic that nobody has ever shared any hint about. And I never had to implement it. This recent article https://www.avanderlee.com/swift/core-data-app-extension-data-sharing/ contains useful information. It is about Core Data, but it does not make it less relevant.

    The observation APIs currently provided by GRDB are all about Not! Missing! A! Single! Impactful! Transaction! Sorry for the shout, but this is very important. Being sure that not a single modification is missed is crucial for some apps, and currently all GRDB database observation tools provide this very strong guarantee. Apps process absolutely all individual transactions that have performed changes, with access to the changed data, without any coalescing or glitch, ever.

    To this end, all GRDB observations insert a "stop the world" handler that is run after each transaction that is deemed relevant has been committed on disk. During the execution of this handler, the app can't write, and the database can be queried (DatabaseQueue), or a concurrent reader can spawn and enter snapshot isolation right from the last written database state (usually faster, available only for database pools, used under the hood by ValueObservation, for example).

    This "stop-the-world" guarantee has to be dropped if we talk about change notifications between two processes. This is because when process P1 wants to perform change C1 and then change C2, we don't want to guarantee that process P2 has processed the notification of change C1 (including fetching those changes from a known database content), before P1 is allowed to proceed on to change C2.

    We'll thus need another kind of database observation, a relaxed one. This is totally uncharted territory for GRDB.

    I'd love that this topic would become better explored. 🤗

@foxware00
Copy link
Author

@groue An update on this.

I've done a bunch of testing and on the surface, everything is working perfectly. I'm able to migrate the DB, clean up the old one and move users onto the new db version.

However, I've been scratching my head for quite a while. I'm getting a bunch of crashes from our internal testers on testflight. The stack traces don't shed any light, are often random, and often the only code that crashed is "main". However after analysing a large number of crash reports. I've noticed almost always there is another thread in the background running a transaction. (I pre-fetch data, and sync small bits of data off the back of an APNS notification).

This is where it gets difficult. It's only happening on release builds as far as I can tell and only after a couple of hours. No reproduction steps, no explicit code has crashed according to the crashes being uploaded. It seems almost all of the crashes have a background thread that begun a transaction but never finished it. I don't see the line for .commit ever being called.

For additional context, the App Extension isn't actually being run. It's just the "app group" database that's being accessed.

Is it all just a coincidence that that background thread is as such. Crashes are always happening in the background, never when a user is using that app. And the only reason we'd be running in the background is to perform the sync so maybe it is just that.

I wonder if in your little reading around the subject you know of anything that might explain this?

public func inTransaction(transaction: @escaping (Database) -> Void) {
    if let pool = blinkDatabase.dbPool as? DatabasePool {
        try! pool.writeInTransaction { db in
            transaction(db) <--- last call here
            return .commit
        }
    }
}

My database config

var config = Configuration()
config.maximumReaderCount = 10
config.busyMode = .timeout(5)

@groue
Copy link
Owner

groue commented Nov 27, 2019

@foxware00, do you have any information about the crash? Could it be related to Data Protection? iOS won't let the app access the database file if it is locked according to its FileProtectionType.

@foxware00
Copy link
Author

foxware00 commented Nov 27, 2019 via email

@groue
Copy link
Owner

groue commented Nov 27, 2019

All right @foxware00, thanks for the report. I can only hope you can eventually gather or Google more information.

@foxware00
Copy link
Author

@groue I've found some more information by inspecting the crash reports manually. It seems the root cause is

Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note:  EXC_CORPSE_NOTIFY
Termination Reason: Namespace RUNNINGBOARD, Code 0xdead10cc

Which according to Apple 0xdead10cc is to do with the file lock. Interestingly this wasn't a problem before using App groups. Now to find out why. 🤔

The exception code 0xdead10cc indicates that an application has been terminated by the OS because it held on to a file lock or sqlite database lock during suspension. If your application is performing operations on a locked file or sqlite database at suspension time, it must request additional background execution time to complete those operations and relinquish the lock before suspending.

@groue
Copy link
Owner

groue commented Nov 28, 2019

Thanks @foxware00. Could it help if GRDB would wrap its transactions around a request for additional background execution? There is a related comment in the FMDB repo, even if they were not chasing after the same crash: ccgus/fmdb#754 (comment).

@foxware00
Copy link
Author

Possibly. However, it looks like you can't request it in the same way from an app extension. I wonder if there is much of an overhead to doing what you suggest.
I'm going to have a quick play and see if there is anything I'm doing wrong inside my performFetchWithCompletionHandler method from AppDelegate. Fortunately, I only have one block of code which runs in the background so wrapping it with a request for additional execution time might be the easiest thing to do. Although this seems odd from within performFetchWithCompletionHandler.

The beginBackgroundTaskWithName:expirationHandler: method cannot be called from an app extension. To request extra execution time from your app extension, call the performExpiringActivityWithReason:usingBlock: method of NSProcessInfo instead.

@groue
Copy link
Owner

groue commented Nov 28, 2019

Fortunately, I only have one block of code which runs in the background so wrapping it with a request for additional execution time might be the easiest thing to do.

Sure. Please tell us if this fixes the crash! This will be a very precious information if someone someday decides to wrap all of this is an easy-to-grasp high-level API.

@foxware00
Copy link
Author

Absolutely. Currently, I have a timeout of 29 seconds on the background task, which may be a little close to comfort to the supposed 30 seconds. I'm going to try lowering this to 20 seconds so I can cancel any pre-fetches that might be close to being overrun. My theory is that a request is in flight, and Apple kills the process just before I timeout and callback. At which point my URLSession returns and tries to hold the transaction open to write the response. If this lowers the crashes substantially. I'm going to additionally try wrapping the call in a background execution request and hopefully, the crashes will subside completely. Stay tuned!

@groue
Copy link
Owner

groue commented Nov 28, 2019

I surely stay tuned.

I was thinking several things, like:

  • we could start a background task when a transaction is active and the app enters background - whatever it means for iOS, watchOS, tvOS, extensions, etc.
  • we could interrupt the current statement/transaction when the app is notified that its background execution time is about to expire. See for example sqlite3_interrupt.

But there is no rush: we need your feedback first ;-)

@foxware00
Copy link
Author

foxware00 commented Nov 28, 2019

They both sound like great additions. Dealing with extensions is always tricky as you don't have access to UIApplication.shared which is where you determine your state.

I'm pushing a build to my internal guinea pigs so I can collect some crash statistics on the two options I've implemented. I really like the idea of a generic solution within GRDB though.

Also an interesting tidbit. Logging out UIApplication.shared.backgroundTimeRemaining inside the pre-fetch method gives me different times. 29.962309416674543, 28.535144541674526and 27.565803166653495 all of which might explain why my 29 seconds was overrunning the alotted time.

@foxware00
Copy link
Author

Seems both of my approaches didn't fully fix the issue. Crashes are still occurring with the same error. I found this link giving an example to a similar issue to the one you linked around SQLCipher a while back. Given I'm not encrypting the DB, is there something we're missing? I'm now trying moving back to a DatabaseQueue rather than a pool. This obviously won't settle for release but might see if the issue lies somewhere around that logic. I'm thinking to have something wrapping around the transaction itself might be more bulletproof. Back to the drawing board, for now...

https://github.com/signalapp/SQLCipherVsSharedData

@groue
Copy link
Owner

groue commented Nov 28, 2019

Given I'm not encrypting the DB, is there something we're missing?

Maybe it just happens that our friends work with SQLCipher. This does not mean that the issue does not happen to regular unencrypted SQLite databases.

May I ask @charlesmchen-signal and @michaelkirk-signal if you were able to overcome this trouble?

@foxware00
Copy link
Author

foxware00 commented Nov 28, 2019

I see they've forked GRDB with fixes for the state of the DB when SQLite cipher is used.

michaelkirk-signal@0c58229

They do allude to it being a problem for only encrypted databases, but it could seem there is an issue with iOS recognizing the state of the files during a suspended state. As I've been experiencing though, debug builds and simulators don't exhibit the issue which is making it all the more difficult to fix.

Set `unencryptedHeaderConfiguration` if you have a SQLCipher database in a shared app
container to avoid 0x10deadcc crashes.

As a rule, when an app or extension is suspended, if that process holds a lock on a file in
a shared container, iOS will kill that process with the exception code: 0x10deadcc.

Because SQLite's ubiquitous WAL mode requires file locking to facilitate concurrency, iOS
provides a special exemption to this rule for SQLite files. However, because SQLCipher
databases are encrypted, iOS cannot recognize them as database files. The exemption does
not apply, and iOS kills the process.

The work around is to leave the first bytes of the SQLCipher database unencrypted.
This allows iOS to recognize that the locked file is a SQLite database and avoids the
crash.

The data in the first bytes of the database file are boilerplate like "SQLite Format 3\0",
not sensitive user data.

See more in `UnencryptedHeaderConfiguration` and at:
https://www.zetetic.net/sqlcipher/sqlcipher-api/#cipher_plaintext_header_size

@groue
Copy link
Owner

groue commented Nov 28, 2019

Yes, cipher_plaintext_header_size is about solving sqlcipher/sqlcipher#255, and this is an SQLCipher-specific issue. Maybe the Signal team had to do something else in order to avoid other causes for 0x10deadcc.

@foxware00
Copy link
Author

More woes, it looks as though the closing the connection could be a candidate.

https://twitter.com/BigZaphod/status/1164924859954159618

https://bugzilla.mozilla.org/show_bug.cgi?id=1307822

https://bugzilla.mozilla.org/show_bug.cgi?id=1291304

This has a lot of interesting stuff in it. Not sure of any definitive solution that applies to GRDB

https://bugzilla.mozilla.org/show_bug.cgi?id=1317324#c10

Anyway, DatabaseQueue is uploading, so I'll see if there is something different when using WAL.

@foxware00
Copy link
Author

foxware00 commented Nov 28, 2019

Sorry for bombarding this thread. More a place for a record than pestering you for a response.
It seems WAL might be required anyway.

My train of thought is currently in two places.
Before I end the background task, I call Database.close() directly to close all connections before reporting to iOS that I'm done.

Another thing, not sure how much of an issue it is. But I don't delete the old WAL file during migration. I never open a connection to it, but maybe system is looking at the app's directory finding the old (closed) wal and seeing it closed whilst the open connection is still valid.

UPDATE. For opening a connection inside app group it looks like it must be a WAL db. Not sure why mine isn't working. But will try out removing the WAL file as well to ensure no files are hanging around

@michaelkirk-signal
Copy link
Contributor

michaelkirk-signal commented Nov 28, 2019

👋

I see they've forked GRDB with fixes for the state of the DB when SQLite cipher is used.

michaelkirk-signal/GRDB.swift@0c58229

We are not currently using a forked version of GRDB. In the past we've briefly used forked versions as we wait for changes to get upstreamed. So far we've either been able to get our changes upstreamed or worked with @groue to find some better solution (thanks @groue!).

In this particular case, rather than the specific plaintextHeader api you linked to, we worked with @groue to introduce the more general prepareDatabase method, and use it like this:

https://github.com/signalapp/Signal-iOS/blob/master/SignalServiceKit/src/Storage/Database/GRDBDatabaseStorageAdapter.swift#L437

configuration.prepareDatabase = { (db: Database) in
    let keyspec = try keyspec.fetchString()
    try db.execute(sql: "PRAGMA key = \"\(keyspec)\"")
    try db.execute(sql: "PRAGMA cipher_plaintext_header_size = 32")
}

But, back to the 10deadcc's, just to be clear about what I think we're talking about:

From https://developer.apple.com/library/archive/technotes/tn2408/_index.html

Processes using database in WAL journal mode will only be jetsam'd if they attempt to hold a write transaction open at the time of suspension.

So yes, this will affect encrypted and unencrypted db's alike. Without the cipher_plaintext_header_size configuration it will affect encrypted db's much more.

The short version is we've never completely eliminated these kinds of crashes.

  • if you have a write transaction going at the time the OS suspends the app, you will get a 0x10deadcc crash.
  • background tasks ask for but are not guaranteed any particular amount of time to continue execution. Documentation and experience says that in the usual case this is about 30s. But you can't count on it.

Additionally, in our codebase there is some amount of non-obvious async writes happening as side effects of other code. So we might be at second 29.5 of a background task when some async thing kicks off another write.

We have had a long tail of tracking down this kind of code and, where possible, ensure it doesn't run in the share extension. But sometimes these writes are critical to the functioning of the share extension. e.g. in our case sending a message through the share extension on a shoddy network might fail several times in the background and be ready for another retry at second 29.

We still want to try to get it out in that last second if we can, even though it risks running afoul of the locking rules. So to some degree we've adjusted our expectations to the rules of the platform and accepted that some level of crashing is worth it if it means we'll be more likely to get messages through. We've continued to see improvements here as we've optimized our writes to be faster.

Another thing, since it sounds like you will be doing writes from the share extension, you probably want to look at: configuration.defaultTransactionKind = .immediate

@groue
Copy link
Owner

groue commented Nov 29, 2019

Thank you @michaelkirk-signal :-)

I did setup a sample app which reproduces the crash on my device (iOS 13.2.3). A simple app, without any extension. When I open an transaction (immediate so that the lock is acquired) and send the app to background, here is what I witness in the debugger:

  • time: applicationDidEnterBackground is called, beginBackgroundTask
  • time + ~25s: end of background task, start a timer which repeatedly calls ProcessInfo.processInfo.performExpiringActivity
  • ProcessInfo.processInfo.performExpiringActivity always give false (not about to be suspended)
  • time + ~30s: Message from debugger: Terminated due to signal 9

When the app runs without debugger, Xcode is able to import a crash log which mentions 0xdead10cc:

Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note:  EXC_CORPSE_NOTIFY
Termination Reason: Namespace RUNNINGBOARD, Code 0xdead10cc
Triggered by Thread:  0

Thread 0 name:  Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0   libsystem_kernel.dylib        	0x00000001b2ef8634 mach_msg_trap + 8
1   libsystem_kernel.dylib        	0x00000001b2ef7aa0 mach_msg + 72
2   CoreFoundation                	0x00000001b309f04c __CFRunLoopServiceMachPort + 216
3   CoreFoundation                	0x00000001b309a16c __CFRunLoopRun + 1444
4   CoreFoundation                	0x00000001b30998a0 CFRunLoopRunSpecific + 464
5   GraphicsServices              	0x00000001bcff1328 GSEventRunModal + 104
6   UIKitCore                     	0x00000001b718a740 UIApplicationMain + 1936
7   AppGroup                      	0x0000000100534e9c 0x100528000 + 52892
8   libdyld.dylib                 	0x00000001b2f24360 start + 4

What bugs me is that I could never have ProcessInfo.processInfo.performExpiringActivity tell that the process is about to be suspended.

@groue
Copy link
Owner

groue commented Dec 2, 2019

The experiments app: https://github.com/groue/10deadcc

@foxware00
Copy link
Author

Thank you @michaelkirk-signal for your breakdown that makes a lot of sense. I think my train of thought was that a transaction on the database in WAL Journal mode should be allowed to complete in suspended scenarios, which isn't true 100% of the time, similar to what you mention. It looks like we've made some real headway in understanding the problem in more depth, so thank you for taking the time to explain.

So what would be gained by the OS killing a process holding a lock if that process is the only one that could ever lock the file in the first place?

I was thinking it was a performance consideration, a stricter, cancellation and interrupt policy which was only applying to app group containers. Your explanation clarifies it greatly.

There surely is something to do here. GRDB is "a toolkit for SQLite databases, with a focus on application development". This means that it is definitely in the scope of the lib to help avoiding 0xdead10cc.

This is fantastic! Thank you, is there anything I can help with? I've had a quick look through those two PR's and look great. I need to do some more bedtime reading on SQLite locking to understand the second one better and the problems you've mentioned. Happy to test anything with my internal testers.

@groue
Copy link
Owner

groue commented Dec 2, 2019

Thanks @foxware00. Yes, working on GRDB is a great opportunity to learn many things about SQLite. Like almost everybody, I used to know nothing about this database. The technique is somewhat simple: find a funny puzzle to solve, read a lot of documentation, perform experiments, discover... stuff, and eventually ship an API that is robust and makes sense for other developers ;-)

@michaelkirk-signal
Copy link
Contributor

In my tests so far, only reads in the WAL journal mode (what we get with a GRDB DatabasePool) do not trigger 10deadcc.

Ah, that very well could be. All of our use cases and testing were against WAL mode.

@groue
Copy link
Owner

groue commented Dec 2, 2019

Ah, that very well could be. All of our use cases and testing were against WAL mode.

I completely understand ;-) Just in case you'd use a plain dbQueue one day ;-)

@foxware00
Copy link
Author

  • Manage panic with performExpiringActivity and other iOS tools

Out of interest what did you have in mind for this?
I've noticed application?.applicationState == .background should only be called from the main thread. Xcode gives me warnings about calling it from a background thread. I know transactions can be requested from any thread, thus checking whether we need to start a request for background processing and having to jump back onto the main thread to check our state seems like an issue. Interestingly none of the documentation suggests what happens if you call these methods from the foreground, but it's implied that they're background only.

@groue
Copy link
Owner

groue commented Dec 5, 2019

What I have in mind is:

  1. Forbid locking the database once background task expiration has been notified, meaning that the app may enter suspended state at any time. This means interrupting currently executed database statements (Database interruption #659), and throwing an error each time the app is attempting at using the database in a way that creates a lock (Database Lock Prevention #660). It's better getting an error than crashing. Keep the database in this paranoid state until...
  2. ... we authorize locking the database again, once it is safe to do so.

First item is easy.

Second item is not easy, or rather, full of shadows. For more detail: https://forums.developer.apple.com/thread/126438

@foxware00
Copy link
Author

Second item is not easy, or rather, full of shadows. For more detail: https://forums.developer.apple.com/thread/126438

Some interesting reading. Is this because you're trying to block further access to the database when a suspension is happening? It sounds like the API's are designed as wrappers around individual functionality rather than reporting the state of the system. For point 2, if we want to know when the app is allowed to authorize locks. Isn't on the successful completion of a transaction or when the background task expires? At this stage, we're being killed anyway and any other transactions wouldn't have been allowed due to the panic. And further ones would again wrap in the same background handlers. Is it this stage, the further transactions, immediately after a handler expiring that we're not notified?

I might be missing the point, however, it's a fascinating topic and lesson in learning how these things are working.

@groue
Copy link
Owner

groue commented Dec 5, 2019

Some interesting reading. Is this because you're trying to block further access to the database when a suspension is happening? It sounds like the API's are designed as wrappers around individual functionality rather than reporting the state of the system.

Welcome to the world of modern applications.

After a background task has expired, the application is not suspended yet. Many things can happen until it gets suspended for good. Core Location can notify a location update. An application timer can fire. A network request can end. All of those events may trigger a database access.

And if we would start a background task then, we would not be notified of its expiration, because it was started too late. Yes, that means 0xdead10cc.

@groue
Copy link
Owner

groue commented Dec 5, 2019

After a background task has expired, the application is not suspended yet. Many things can happen until it gets suspended for good. Core Location can notify a location update. An application timer can fire. A network request can end. All of those events may trigger a database access.

Imagine what it would mean if your app had to deal with this, with your little hands. Your app would had to setup a background task. Wait for its expiration. At that moment, your app would have to notify all other places in your application that they must no longer access the database. Cancel timers. Cancel Core Location updates. Cancel network requests. Or let them all go, but inject somehow that database must no longer be used.

... and of course this state must be lifted eventually, because your app won't block access to its database forever, will it?

Imagine the pollution in the code of your app.

Imagine what it means if you forget to handle properly the background task expiration in one place (No 0xdead10cc from timer and core location, but you forget the network request).

@foxware00
Copy link
Author

That makes a lot of sense thank you for explaining.

And if we would start a background task then, we would not be notified of its expiration, because it was started too late. Yes, that means 0xdead10cc.

This is the interesting bit, it seems like a bug on iOS's part not notifying you the second time. With performExpiringActivity states that "When your process is in the background, the method tries to take a task assertion to ensure that your block has time to execute". Sounding like in the scenario you request additional time after the first expiration it would call straight though with expired true.

Is this not what you're seeing? Or background tasks are behaving differently?

@groue
Copy link
Owner

groue commented Dec 5, 2019

This is the interesting bit, it seems like a bug on iOS's part not notifying you the second time.

It doesn't fit our need. But I'm sure Apple people will tell it behaves as expected. If you think about it, it is exactly what you intended as well with your sentence above:

It sounds like the API's are designed as wrappers around individual functionality.

The use case they want to support is exactly this one. You wrap tasks. What happens after task expiration is none of their business, and errs on the side of programmer error. What happens if an app starts a task too late is none of their business, and errs on the side of programmer error as well.

Now we are after another use case, a use case that is explicitly not supported by Apple APIs. What do we know? GRDB is surely benevolent. But maybe Apple engineers did want to prevent aggressive applications from turning the feature we are after into real dark patterns.

@groue
Copy link
Owner

groue commented Dec 6, 2019

OK we have a clear answer now: https://forums.developer.apple.com/thread/126438.

Fun fact, Core Data is not immune against 0xdead10cc: https://blog.iconfactory.com/2019/08/the-curious-case-of-the-core-data-crash/

I believe we can do better.

@groue
Copy link
Owner

groue commented Dec 6, 2019

The blog post above contains this interesting paragraph:

Due to incorrectly placed task markers, [...] iOS would unceremoniously kill the process — making it look like it had crashed somewhere inside of Core Data..

I spent several days reworking a bunch of old code to ensure that task markers were started and ended in the proper sequence, and to ensure that the task wasn’t marked as complete until after the entire process was done.

One could not have expressed it in a better way. Having GRDB users tell how they spent several days dealing with markers is exactly what I want to avoid. What a shit work.

No. What I want is something like:

do {
    try databaseStuff()
} catch {
    // Handle eventual error due to lock prevention.
    // For example try again when database becomes available again.
}

We see that we'll have to expose this notion of "database becomes available again". In spirit it should be very close to UIApplicationProtectedDataDidBecomeAvailable.

In the same trend of thought, application developers will also have to be able to easily restart database observations that have failed because of lock prevention (and see how it fits with RxGRDB and GRDBCombine).

There's still a lot of work before we can ship the "App Group edition" of GRDB. And I'm not talking about cross-process database observation (this will probably ship in the "App Group Gold edition").

@foxware00
Copy link
Author

https://blog.iconfactory.com/2019/08/the-curious-case-of-the-core-data-crash/

This was exactly my path. Nice to know I wasn't alone.

We see that we'll have to expose this notion of "database becomes available again". In spirit it should be very close to UIApplicationProtectedDataDidBecomeAvailable.

Does this effectively notify us to unblock the DB?

The real problem here is basically that the ONLY time it's truly safe to start a background task is in EXACTLY the same runloop cycle as your app was woken by the system. In any other situation (for example, in a different thread or at some later point after you were woken), there isn't any way to guarantee that your app hasn't already started to suspend.

It sounds like we might need to live with the occasional crash. Minimising the potential impact of such. It seems like a scenario Apple has just decided to accept the fact there they might kill you at any point. An interesting stance indeed.

Interestingly in my scenario the second the app opens from didReceiveRemoteNotification or performFetchWithCompletionHandler I request a background task, only ending it when completed successfully. Obviously, it still crashes because I don't actually forcefully stop the database execution. I need to interrupt the database manually, something I need to try. Clearly this isn't the solution for everyone and is exactly what we'd ideally resolve with the handling being internal in the clean clear API you suggest. I plan on adding further background processing in the near future which would only increase the surface area of this, ideally, unnecessary work. "shit work" indeed.

I feel your pain and frustration and thank your efforts in rolling out a robust solution for the rest of us and to keep GRDB simple and effective API.

@groue
Copy link
Owner

groue commented Dec 8, 2019

The "final" PR is drafted: #663

@foxware00
Copy link
Author

The "final" PR is drafted: #663

Very exciting! I'll have a proper read through shortly

@groue
Copy link
Owner

groue commented Dec 12, 2019

#663 is able to prevent 0xdead10cc for applications, but I was not able yet to work on extensions, mainly because I don't know how to reproduce the crash in an extension. It's obviously impossible to try to prevent a crash that is impossible to reproduce.

#663 is thus on hold, but I want to merge into master the preliminary work. This is #668, flagged experimental.

@groue
Copy link
Owner

groue commented Dec 12, 2019

If you know how to reproduce the 0xdead10cc in an extension, please give your recipe!

@foxware00
Copy link
Author

@groue unfortunately I haven't experienced an extension causing 0xdead10cc. I was only seeing issues when dealing with the main app doing background fetches.

@groue
Copy link
Owner

groue commented Dec 12, 2019

There are some evidence that this crash exists in extensions, such as this tweet by @marcoarment. But it's difficult to grab any actionable information. If shipping a real app is a mandatory step, then we'll hardly see any advance on this topic in an open source project.

@foxware00
Copy link
Author

Agreed, I saw this too. From my testing it's less predictable from an app extension. What have you tried so far?
In the same way your tests hold a lock on the database during suspsension. Would simply holding a lock, then calling through to self.extensionContext?.completeRequest(returningItems: [], completionHandler: nil) trigger the same scenario. Or is this what you've already tried?

I can try test this later on today.

@groue
Copy link
Owner

groue commented Dec 12, 2019

Agreed, I saw this too. From my testing it's less predictable from an app extension. What have you tried so far?

  • Created an App Group on my personal developer account
  • Ran an iMessage extension on the device. Keep a transaction open forever. Extension always exits with status code 0 when I leave iMessage.
  • Ran a sharing extension on the device. Keep a transaction open forever. Could never witness the crash.

Would simply holding a lock, then calling through to self.extensionContext?.completeRequest(returningItems: [], completionHandler: nil) trigger the same scenario.

I don't understand. What do you have in mind?

@foxware00
Copy link
Author

Ran a sharing extension on the device. Keep a transaction open forever.

Calling self.extensionContext?.completeRequest(returningItems: [], completionHandler: nil) from the share extension closes the extension and can hand off the URL session to the containing app. My thinking way transactions held here would cause the error. It's very odd that you're seeing status code 0, when it sounds like it should almost definitely be killing the process to clear the lock.

@groue
Copy link
Owner

groue commented Dec 12, 2019

It's very odd that you're seeing status code 0, when it sounds like it should almost definitely be killing the process to clear the lock.

Yes. It could also be a side-effect of the debugger. But when I run the extension without any debugger attached, I don't see anything weird in Console.app. The Devices window of Xcode does not show any crash log.

Of course, I may just have missed something. This is a chore very long process 😅

Calling self.extensionContext?.completeRequest(returningItems: [], completionHandler: nil) from the share extension closes the extension and can hand off the URL session to the containing app. My thinking way transactions held here would cause the error.

You mean the transaction held by the extension. Thanks for the tip, this may give something. Did you make an attempt?

@foxware00
Copy link
Author

foxware00 commented Dec 12, 2019

One moment, let me give it a go...

Bingo. Is this what you're expecting?

default	14:03:13.170361 +0000	runningboardd	[xpcservice<com.companyname.app.AppNameShare>:740] Suspending task.
default	14:03:13.170482 +0000	runningboardd	Calculated state for xpcservice<com.companyname.app.AppNameShare>: running-suspended (role: None)
error	14:03:13.172627 +0000	runningboardd	not an SQLite database: /var/mobile/Containers/Data/PluginKitPlugin/6C2AFE22-6746-4EC4-B778-14AFB51721D4/Library/Caches/com.crashlytics.data/com.companyname.app.AppNameShare/v3/active/914c3246306d43ea9e8e17bad2b332e7/sdk.log
default	14:03:13.174155 +0000	mediaserverd	-CMSessionMgr- CMSessionMgrHandleApplicationStateChange: CMSession: Client com.companyname.app.AppNameShare with pid '740' is now Background Suspended. Background entitlement: NO ActiveLongFormVideoSession: NO WhitelistedLongFormVideoApp NO
default	14:03:13.186477 +0000	mediaserverd	-CMSessionMgr- CMSessionMgrHandleApplicationStateChange: CMSession: Sending stop command to com.companyname.app.AppNameShare with pid '740' because client is background suspended and there is no AirPlay video session for it
error	14:03:13.187505 +0000	runningboardd	[xpcservice<com.companyname.app.AppNameShare>:740] suspended with locked system files:
/var/mobile/Containers/Shared/AppGroup/635520FC-602F-4747-84AA-67E18F0E1ECA/appName-db.sqlite
not in allowed directories:
(null)
default	14:03:13.187558 +0000	runningboardd	[xpcservice<com.companyname.app.AppNameShare>:740] Terminating with context: <RBSTerminateContext: 0x104acf260; domain: 15; code: 0xDEAD10CC; explanation: "[xpcservice<com.companyname.app.AppNameShare>:740] was suspended with locked system files:
/var/mobile/Containers/Shared/AppGroup/635520FC-602F-4747-84AA-67E18F0E1ECA/appName-db.sqlite
not in allowed directories:
(null)"; reportType: 1; maxRole: 7; maxTerminationResistance: 3>
default	14:03:13.187593 +0000	runningboardd	[xpcservice<com.companyname.app.AppNameShare>:740] terminate_with_reason() success
default	14:03:13.187699 +0000	runningboardd	[xpcservice<com.companyname.app.AppNameShare>:740] Set darwin role to: None
default	14:03:13.187903 +0000	backboardd	Connection removed: IOHIDEventSystemConnection uuid:7C87D6B9-C72E-4D83-A850-46B488DB4552 pid:740 process:AppName type:Passive entitlements:0x0 caller:BackBoardServices: <redacted> + 380 attributes:{
    HighFrequency = 0;
    bundleID = "com.companyname.app.AppNameShare";
    pid = 740;
} inactive:0 events:8 mask:0x800
default	14:03:13.188821 +0000	SpringBoard	[xpcservice<com.companyname.app.AppNameShare>:740] Now flagged as pending exit for reason: workspace client connection invalidated
default	14:03:13.191227 +0000	runningboardd	XPC connection invalidated: [xpcservice<com.companyname.app.AppNameShare>:740]
default	14:03:13.192977 +0000	runningboardd	[xpcservice<com.companyname.app.AppNameShare>:740] Death sentinel fired!
default	14:03:13.195519 +0000	itunesstored	Updating configuration of monitor <RBSProcessMonitorConfiguration: 0x102cd8eb0; id: M139-1; qos: 17> {
    predicates = {
        <RBSProcessPredicate: 0x102cfb0c0> {
            predicate = <RBSCompoundPredicate; <RBSProcessBundleIdentifierPredicate; com.supersmashing.ios.calderstewart>; <RBSCompoundPredicate; <RBSProcessEUIDPredicate; 501>; <RBSProcessBKSLegacyPredicate: 0x102c427f0>>>;
        };
    }
    descriptor = <RBSProcessStateDescriptor: 0x102cf7690; values: 11> {
        namespaces = {
            com.apple.frontboard.visibility;
        }
    };
}
error	14:03:13.231278 +0000	runningboardd	RBSStateCapture remove item called for untracked item <RBProcessMonitorObserver: 0x104947a50; <RBProcess: 0x10494baa0; 740; identity: xpcservice<com.companyname.app.AppNameShare>>; configCount: 0>
default	14:03:13.274739 +0000	symptomsd	defusing ticker tickerFatal having seen progress by flow for com.companyname.app, rxbytes 6976 duration 7.951 seconds started at time: Thu Dec 12 14:03:05 2019
default	14:03:13.296903 +0000	runningboardd	Removing process: [xpcservice<com.companyname.app.AppNameShare>:740]
default	14:03:13.297181 +0000	runningboardd	Removing assertions for terminated process: [xpcservice<com.companyname.app.AppNameShare>:740]
default	14:03:13.297851 +0000	runningboardd	Calculated state for xpcservice<com.companyname.app.AppNameShare>: none (role: None)
default	14:03:13.304488 +0000	SpringBoard	Firing exit handlers for 740 with context <RBSProcessExitContext; unknown; terminationContext: <RBSTerminateContext: 0x28036bf40; domain: 15; code: 0xDEAD10CC; explanation: "[xpcservice<com.companyname.app.AppNameShare>:740] was suspended with locked system files:
/var/mobile/Containers/Shared/AppGroup/635520FC-602F-4747-84AA-67E18F0E1ECA/appName-db.sqlite
not in allowed directories:
(null)"; reportType: 1; maxRole: 7; maxTerminationResistance: 3>>
default	14:03:13.304572 +0000	SpringBoard	[xpcservice<com.companyname.app.AppNameShare>:740] Process exited: <RBSProcessExitContext; unknown; terminationContext: <RBSTerminateContext: 0x28036bf40; domain: 15; code: 0xDEAD10CC; explanation: "[xpcservice<com.companyname.app.AppNameShare>:740] was suspended with locked system files:
/var/mobile/Containers/Shared/AppGroup/635520FC-602F-4747-84AA-67E18F0E1ECA/appName-db.sqlite
not in allowed directories:
(null)"; reportType: 1; maxRole: 7; maxTerminationResistance: 3>>.
default	14:03:13.304623 +0000	SpringBoard	[xpcservice<com.companyname.app.AppNameShare>:740] Setting process task state to: Not Running
default	14:03:13.305487 +0000	runningboardd	Calculated state for xpcservice<com.companyname.app.AppNameShare>: none (role: None)

Looks like it worked

terminationContext: <RBSTerminateContext: 0x28036bf40; domain: 15; code: 0xDEAD10CC; explanation: "[xpcservice<com.companyname.app.AppNameShare>:740] was suspended with locked system files

To replicate this. I installed the application. Removed the debugger and opened console

This is code that caused it. I added this to a button press within the ShareExtension. The openLongRunningTransaction is stolen from your 10deadcc test repo

Database.shared.openLongRunningTransaction { result in 
    print("transaction result: \(result)")
}
self.extensionContext?.completeRequest(returningItems: [], completionHandler: nil)

@groue
Copy link
Owner

groue commented Dec 12, 2019

Thanks @foxware00. Meanwhile, I'll shortly merge and ship #668. It does not automatically suspend databases, but you can do it from your app and extension. You are warmly encouraged to perform your experiments and see if you can get rid of 0xdead10cc. And profit from the other shared database setup advice.

@foxware00
Copy link
Author

@groue Thank you I will shortly have a play. Thanks for all your hard work on the topic. Getting closer to a really fantastic and robust solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants