Faster alternative to pool.makeSnapshot which updates existing DatabaseSnapshot #619
Hello @michaelkirk-signal,
I won't discuss your architecture :-) I'm not a snapshot user myself, so issues like this one are very useful. Thanks for the detailed context: it is easy to answer your questions.
Yes, definitely: you use the synchronous snapshot creation. A transaction was just committed, so you are perfectly sure of the db content. If you had started a plain transaction instead (…), you would get a weaker guarantee. In your case, maybe, I'm not sure the stronger guarantee is even needed.
I agree that creating a new connection for each snapshot is far from ideal. Refreshing an existing one is a natural need. It's time we had a serious look at the recently introduced SQLite snapshots, to see what they allow, and how they could influence our snapshotting API. Today, I recommend that you copy the internal implementation.
I lack experience here. I know that open transactions do alter checkpointing, may return errors depending on the checkpoint mode, and that the WAL can grow really big if you keep long-running transactions.
It uses internal APIs, so it can't be copied right away. What's important is that you read something after the transaction has begun:

```swift
try db.beginTransaction(.deferred)
// Read anything
_ = try Row.fetchCursor(db, sql: "SELECT rootpage FROM sqlite_master LIMIT 1").next()
```
In our case, we accumulate details about the specific changes of a commit, and then need to communicate the aggregated details in lock step with the UI database snapshot being updated. Powering a collection view is a good example, because it not only needs to know that the db did change, but also needs some specifics of how it changed, so that we can perform an efficient update. Let me flesh out my example a little more:
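The example code itself did not survive extraction; below is a hedged reconstruction of the shape being described, under assumptions: `ChangeTracker`, the `onCommit` closure, and the [1]/[2]/[3] step markers are illustrative, not app code or GRDB API (only the `TransactionObserver` callbacks are real GRDB).

```swift
import Foundation
import GRDB

// Sketch: accumulate per-commit change details, then deliver them in
// lock step with a snapshot that reflects exactly that commit.
final class ChangeTracker: TransactionObserver {
    let pool: DatabasePool
    let onCommit: ([DatabaseEvent], DatabaseSnapshot) -> Void
    private var pendingEvents: [DatabaseEvent] = []

    init(pool: DatabasePool, onCommit: @escaping ([DatabaseEvent], DatabaseSnapshot) -> Void) {
        self.pool = pool
        self.onCommit = onCommit
    }

    func observes(eventsOfKind eventKind: DatabaseEventKind) -> Bool { true }

    // [1] Accumulate the specifics of the current transaction.
    func databaseDidChange(with event: DatabaseEvent) {
        pendingEvents.append(event.copy()) // events are only valid during this callback
    }

    func databaseDidCommit(_ db: Database) {
        let events = pendingEvents
        pendingEvents = []
        // [2] Capture a snapshot before any other write can modify the db.
        let snapshot = try! pool.makeSnapshot()
        // [3] Hand both to the UI, in lock step, e.g. for collection view updates.
        DispatchQueue.main.async { self.onCommit(events, snapshot) }
    }

    func databaseDidRollback(_ db: Database) {
        pendingEvents = []
    }
}
```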
So I don't think the weaker version would work for us. It would be problematic if at [3] the contents of the snapshot no longer corresponded to the change details we had accumulated.
Thanks!
All right, Michael! This is an important "data point". I also remember #602: changing the passphrase of an encrypted database creates some difficult concurrency-related work for the applications that use snapshots.

I'm still trying to figure out where this will drive us. A very quick glance at SQLite snapshots seems to open new possibilities. Basically, snapshotting could become independent from particular transactions in particular connections, as currently implemented in GRDB. Several SQLite connections could share the same SQLite snapshot. This could make GRDB snapshots able to perform concurrent reads. We could also have GRDB snapshots use their parent pool's reader connections, which would both solve the concurrency difficulties created with #602, and better honor the pool's configuration.

It will take some time before a plan emerges. Meanwhile, I hope you will be able to refresh your snapshots as described above!
It looks like it is true.
It looks like it is wrong. Once a SQLite connection has been put on the rails of a snapshot with sqlite3_snapshot_open, it can still jump to another snapshot (as far as I understand), but it can't jump back to regular operations. So we can't share the read-only connections of a database pool between regular reads and snapshot reads. Confirmation requested on the SQLite mailing list. Sigh.
Cool, it looks like I was wrong to think I was wrong.
SQLite snapshots can be invalidated by WAL checkpoints. This is not very cool for us, because any write can trigger an automatic checkpoint. Refactoring GRDB snapshots on top of SQLite snapshots would require disabling automatic checkpoints. I asked for a confirmation.
For information, all experiments with SQLite snapshots happen in this branch: https://github.com/groue/GRDB.swift/tree/dev/SharedDatabaseSnapshot To build and test, run …
@michaelkirk-signal, during my exploration of snapshots, I happened to face some questions about what we should allow if snapshots can be refreshed. For example, since snapshots conform to DatabaseReader, one is allowed to start a database observation from them. It's easy when snapshots are immutable (there is no change to observe in an immutable snapshot). It gets less easy if snapshots can be refreshed. FYI, I'm planning to keep snapshots totally immutable, without any refresh method, for the sake of simplicity. I just hope that refactoring GRDB snapshots on top of SQLite snapshots, and reusing the pool's reader connections, will lift the performance issues you have profiled:
I admit I'm surprised that this difference makes a difference in your app. Would you be able to elaborate a little more?
Draft PR: #625
I've put together an example project for you, if you'd like to see. It also includes a couple of screenshots from Instruments. That example is vastly simplified, but similar in spirit to how DB changes make their way to our views. You can toggle the commented-out section in Example.swift to compare the performance behavior of rebuilding snapshots vs. fast-forwarding them.
Thank you very much @michaelkirk-signal! I'll be able to check how #625 behaves as well 👍
@michaelkirk-signal, here are the results of the experiments with your sample app, on my machine.

First screenshot is with regular snapshots that are re-created (your option 1). Second screenshot is with regular snapshots that are re-used (your option 2). Third screenshot is with the new "shared snapshots" (will have to find a better name) introduced in #625. 🔥 It looks like it's the way to go 😄

I had plenty of discussions on the SQLite mailing list in order to make sure GRDB makes a correct use of SQLite snapshots. I'm now confident. The unexpected difficulty is that the system SQLite, which ships with the ability to use SQLite snapshots, misses the required declarations in <sqlite3.h> 🤦‍♂️. In order to test your SPM app, I had to manually modify GRDB (commenting out all …). What will be hard is the packaging 😒

One final note. Your sample code recreates snapshots from the main queue. This means that you do not have a snapshot in the exact state left by the transaction that has just ended: many concurrent transactions may have completed before the main queue has the opportunity to grab its snapshot. This code, in practice, runs in the "weak" mode we have described in #619 (comment), despite your claim that your app needs to process each individual change. The strong version is the following:

```swift
func databaseDidCommit(_ db: Database) {
    // Take the snapshot before any other write can modify the db
    let snapshot = try! self.storage.pool.makeSnapshot()
    DispatchQueue.main.async {
        self.latestSnapshot = snapshot
        self.updateUI()
    }
}
```

With the new shared snapshots, the strong version will read as below (and it will be fast, unless I'm mistaken):

```swift
func databaseDidCommit(_ db: Database) {
    // Take the snapshot before any other write can modify the db
    let snapshot = try! self.storage.pool.makeSharedSnapshot()
    DispatchQueue.main.async {
        self.latestSnapshot = snapshot
        self.updateUI()
    }
}
```
I also want to stress that the "shared snapshots" introduced in #625 and WAL checkpointing are incompatible. #625 prevents automatic and explicit checkpoints as long as there are living shared snapshots, so that snapshots remain reliable. There is no free lunch. With shared snapshots:
I think this information is worth considering very closely :-)
Ah yes, that was an error I introduced when writing the example app. 😓 We were in fact doing something like your strong example, which creates the snapshot synchronously in databaseDidCommit.
Obviously we'd need to checkpoint eventually. This strategy seems doable. If there are existing shared snapshots, is it that checkpointing will be unproductive, or that it will inadvertently update the db state seen by the shared snapshots?
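One way the "checkpoint eventually" idea could be reconciled with live snapshots is to checkpoint only when none are alive. A hedged sketch, not GRDB API: `liveSnapshotCount` is hypothetical bookkeeping that the app would have to maintain; `PRAGMA wal_checkpoint` is standard SQLite.

```swift
import GRDB

// Hypothetical policy: only checkpoint the WAL when no shared snapshot
// is alive, so no snapshot can be invalidated by the checkpoint.
func checkpointIfSafe(_ pool: DatabasePool, liveSnapshotCount: Int) throws {
    guard liveSnapshotCount == 0 else { return }
    try pool.writeWithoutTransaction { db in
        // TRUNCATE also resets the WAL file to zero bytes after the backfill.
        try db.execute(sql: "PRAGMA wal_checkpoint(TRUNCATE)")
    }
}
```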
I'm not familiar with the implementation of shared snapshots, but it sounds incompatible with having a share extension. Is that true? That would be a deal breaker, unfortunately.
Don't worry - I just had to point it out, just in case :-)
When snapshots are protected, automatic checkpoints are disabled, and explicit checkpoints fail with an error. Recent commits in #625 have made such protection disabled by default. "Historical snapshots" (a new name that is closer to the original SQLite documentation) are fragile by the intention of the SQLite authors. They are fragile because any checkpoint invalidates them - and it's impossible to protect them absolutely. I think this is why they are so fast: there is no need to create any lock on disk in order to create one - but external connections are unaware of their existence.

Protection for a given database pool can be turned on explicitly:

```swift
var config = Configuration()
config.fragileHistoricalSnapshots = false
let dbPool = try DatabasePool(..., configuration: config)
```

But, as you have understood:
If the extension wants to write, then yes, it can invalidate snapshots: existing snapshots will throw errors. In case you have any doubt, this fragility does not apply to the "regular" snapshots returned by makeSnapshot().
As always, thank you for the clear explanation @groue!
We have discovered a new area of SQLite :-) So it looks like you will stick with regular snapshots. There remains the question of making them mutable with a public refresh method. I'm still slightly reluctant to do it. With the transaction refresh, you're living on the edge by using private implementation details. It may break one day, but it won't break soon :-) I think that's not bad: you and I have time to gather more data and evidence; we're all learning. If you think we're done, you may close this issue :-)
What did you do?
I'm using DatabaseSnapshot as a stable data source for my UI and hitting some performance bottlenecks when repeatedly rebuilding the DatabaseSnapshot.
From what I've read, the documentation recommends building a new DatabaseSnapshot and discarding the old one when you need to update it, but this can be quite expensive.
In my app, database writes occur mostly in the background. After a write, I need to update my UI's DatabaseSnapshot, then notify any stale views to re-render with the latest snapshot. This notification occurs as part of my UIDatabaseDelegate protocol in the following example:
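The example itself did not survive extraction; below is a hedged reconstruction of what such a delegate-based scheme could look like. `UIDatabase`, `writeDidCommit()`, and the delegate method name are illustrative, not the app's actual code; only `DatabasePool`, `DatabaseSnapshot`, and `makeSnapshot()` are GRDB API.

```swift
import Foundation
import GRDB

protocol UIDatabaseDelegate: AnyObject {
    // Called on the main queue once latestSnapshot has been replaced,
    // so stale views can re-render against a consistent database state.
    func uiDatabaseSnapshotDidUpdate(_ snapshot: DatabaseSnapshot)
}

final class UIDatabase {
    let pool: DatabasePool
    private(set) var latestSnapshot: DatabaseSnapshot
    private let delegates = NSHashTable<AnyObject>.weakObjects()

    init(pool: DatabasePool) throws {
        self.pool = pool
        self.latestSnapshot = try pool.makeSnapshot()
    }

    func add(delegate: UIDatabaseDelegate) { delegates.add(delegate) }

    // Invoked after each background write commits.
    func writeDidCommit() {
        // Rebuilding the snapshot here is the expensive step this issue is about:
        // it opens a new reader connection every time.
        let snapshot = try! pool.makeSnapshot()
        DispatchQueue.main.async {
            self.latestSnapshot = snapshot
            for case let delegate as UIDatabaseDelegate in self.delegates.allObjects {
                delegate.uiDatabaseSnapshotDidUpdate(snapshot)
            }
        }
    }
}
```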
In practice the UIDatabaseDelegate is usually almost equivalent to the set of ViewControllers on the navigation stack (typically 2-4 items).
Since snapshot isolation is transaction isolation, rather than building a new snapshot, it seems like we could do something cheaper, along the lines of:
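The snippet referred to here did not survive extraction; what follows is a hedged sketch of the idea. `updateToLatest()` is a hypothetical name, and `beginSnapshotIsolation` is a GRDB-internal method, so this does not build against the public API; the [1]/[2] markers are illustrative.

```swift
import GRDB

extension DatabaseSnapshot {
    // Hypothetical: instead of opening a new connection, end this snapshot's
    // long-lived read transaction and immediately begin a new one, so the
    // snapshot fast-forwards to the latest committed database state.
    func updateToLatest() throws {
        try read { db in
            try db.commit()                  // [1] leave the old snapshot
            try db.beginSnapshotIsolation()  // [2] internal GRDB API: new deferred
                                             //     transaction plus an initial read
        }
    }
}
```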
Intuitively, and in my profiling, starting a new transaction is much faster than creating a new snapshot (which I believe opens a new database connection). However, it introduces some immediate questions:
- concurrency - can we be sure the new transaction at [2] reflects the state of the db that was just committed? I think so, because didCommit happens within the context of SchedulingWatchdog.preconditionValidQueue(database), so no interstitial write could have occurred.
- beginSnapshotIsolation is currently an internal method. Could it be public? Or could we encapsulate something like the above in a public try databaseSnapshot.updateToLatest() method? Is that something you'd be interested in supporting?
- would this have an impact on the db's ability to efficiently checkpoint?
I'm also curious if you think this is a reasonable approach generally, or if this is not how you envisioned DatabaseSnapshot being used.
I'm aware of the more targeted Observer flavors GRDB offers, e.g. each view has a view model based on the Records it uses. The same query(ies) that build the view models can be fed to an observer, etc. And although I might like to move more in that direction in the future, it doesn't mesh well with the current app architecture.
Environment
GRDB flavor(s): GRDB+SQLCipher
GRDB version: 4.3.0
Installation method: CocoaPods
Xcode version: 10.3
Swift version: 5.0.1
Platform(s) running GRDB: iOS
macOS version running Xcode: 10.14.6