-
-
Notifications
You must be signed in to change notification settings - Fork 177
Snapshots #486
Comments
Thanks for reaching out! Previous discussions: TLDR: there hasn't been a compelling use case for exposing snapshots. Can you share yours? :) Regarding "multiple get calls": we're working to expose iterators and
In what way and why was it necessary? |
See also Level/levelup#491 (comment) |
Exposing iterators in LevelUP would do the trick in some use cases but then you can't have multiple ReadStreams (they cannot share the same iterator concurrently). db.createValueStream({start:'index c', end:'index d'}) .on('data', key => { db.get('record ' + key, (err, value) => { ... }) ) |
|
Multiple get calls on one single snapshot can be achieved by creating multiple iterators at the exact same time. @frankboehmer have you tried this approach? |
@peakji Actually, I wasn't aware of this behavior. This would do the trick :). It's a weird interface, though. Having a let snap = db.snapshot() and then snap.get(), snap.createReadStream() etc., snap.close() on LevelUP would seem much more intuitive to me. And the implementation is probably quite simple, at least for the LeveDOWN backend. |
Also you'd need to create all required iterators up-front which might be ugly in some use cases. |
at least for the LeveDOWN backend is exactly right 😉 |
@frankboehmer This is just a workaround 😉. It works because each But after reading the index-retrieve use case above, I'm starting to agree that it's necessary to expose raw However, I don't think this feature should be added in +1 for implementing in |
I'd like to hear why the consistency is required. Often it's perfectly fine to do: db.createValueStream().on('data', key => {
db.get(key, (err, value) => {
if (err.notFound) {
// Just forget about this key and move on
}
})
) |
@vweevers think about the index as a reverse index of a data field that's also stored in the record. If the record gets updated in between, you get inconsistencies - the index doesn't reflect the current state of the record. You might find all sorts of workarounds to handle such a situation - however, it stays an inconsistent view. |
@vweevers Suppose we are creating a simple inverted index, and we put the index keys and documents in one single database by using different prefixes:
We can issue a prefix query for docs with words starting with |
I see your point and want to add the nuance that many people dó think about this and decide they don't care about such inconsistencies. Taking @peakji's example, which is excellent, the end result might be that an application sometimes shows a search result for "spiderman" that isn't relevant anymore. That can be acceptable. That said, I understand the use case. I'm not sure yet about:
It definitely shouldn't be a required backend feature, but that doesn't mean levelup can't wrap it. It could handle open state, read streams (like it wraps iterators on a regular db) and promises. |
@peakji just curious: why did you make this edit to your comment:
|
Just theoretically, I didn't run the benchmark yet: In the inverted index example, we can have another iterator and use |
|
Exposing snapshots can make it easier to build a transaction layer. There was some work on this back in 2015 that was never merged. PR #152 https://ongardie.net/blog/node-leveldb-transactions/ |
Is this still true if one of the iterators is done through the stream creation? const i = db.iterator();
for await (const o of db.createReadStream()) {
// use i here
// can we expect consistency between o and i?
} |
@CMCDragonkai Yes, because a stream is backed by an iterator too, which is created when you call |
Ok that's useful, I was wondering that iterators have the Does this mean that |
Just a reference. |
I've been attempting to implement snapshot isolation in leveldb, and although iterators provide a snapshot, one problem is the ability to derive further iterators from the same snapshot. Right now this can only be done by creating all the iterators up-front and at the same time. This limits the expressiveness of iterators. What would it take to bring snapshots directly to the leveldown interface (if not the abstract-level` interface?). |
At the moment, this is a documentation-only PR, acting as an RFC. In particular I'd like review of my choice to use a token-based approach (as first suggested by Rod Vagg in Level/community#45) as opposed to a dedicated snapshot API surface (as suggested by Julian Gruber in Level/community#47). Main reasons for not choosing the latter: - It would be a third API surface. We have the main database API, the sublevel API which is equal to that except for implementation- specific additional methods, and would now add another API which only has read methods (get, iterator, etc, but not put and batch). If the API was reusable, meaning you could pass around a snapshot as if it was a regular database (like you can with sublevels) then I'd be cool with it. But for that to happen, we'd have to implement transactions as well, which I consider to be out of scope although transactions are in fact a use case of snapshots. - It would have a higher complexity once you factor in sublevels. I.e. to make `db.sublevel().snapshot().get(key)` read from the snapshot but also prefix the given key. By instead doing `db.sublevel().get(key, { snapshot })`, the sublevel can just forward that snapshot option to its parent database. - Furthermore, with the token approach, you can pass a snapshot to multiple sublevels, which cleanly solves the main use case of retrieving data from an index (I included an example in the docs). I renamed the existing snapshot mechanism to "implicit snapshots" and attempted to clarify the behavior of those as well. If accepted, some related issues can be closed, because this PR: - Includes (a more complete write-up than) Level/classic-level#40. - Mentions Level/leveldown#796 (which should be moved rather than closed). - Answers Level/classic-level#28. - Solves the main use case as described in Level/leveldown#486. - Effectively completes Level/awesome#19.
Tracked in Level/community#118. It won't be added to |
Leveldb exposes a feature, called "Snapshots" that comes in handy to get consistent reads on a database. LevelDOWN makes use of snapshots to provide consistent ReadStreams. However, sometimes one needs to combine this with additional read operations - e.g. multiple ReadStreams or multiple get calls. Those won't be consistent because they do not share a common snapshot.
Here's a fork that exposes snapshots to node.js. db.snapshot() creates a snapshot which provides the normal read operations known from the LevelDOWN interface: https://github.com/frankboehmer/leveldown
It's important to close the snapshot when no longer needed because it may suck up resources.
Since the implementation of a snpashot interface would also need to be discussed with LevelUP, I did not create pull request. Even worse: the fork reintegrates parts of LevelUP.
I think, snapshots would be a nice addition to LevelDOWN / LevelUP. And I feel it's quite easy to achieve. So please feel free to look at the fork or get in touch to join forces to make this happen.
Cheers, Frank
The text was updated successfully, but these errors were encountered: