-
-
Notifications
You must be signed in to change notification settings - Fork 305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MongoDB and Redis stores #372
Conversation
Implements a key-value store using SQLite. As this is a builtin module in Python and a common database to use in various languages, this should have high utility and be very portable. Not to mention many databases provide an SQLite language on top regardless of the internal representation. So this can be a great template for users wishing to work with Zarr in their preferred database.
Try using the `SQLiteStore` everywhere one would use another store and make sure that it behaves correctly. This includes simple key-value store usage, creating hierarchies, and storing arrays.
Provide a few examples of how one might use `SQLiteStore` to store arrays or groups. These examples are taken with minor modifications from the `LMDBStore` examples.
Includes a simple example borrowed from `LMDBStore`'s tutorial example, which shows how to create and use an `SQLiteStore`.
Otherwise we may end up opening a different databases' files and try to use them with SQLite only to run into errors. This caused the doctests to fail previously. Changing the extension as we have done should avoid these conflicts.
Instead of opening, committing, and closing the SQLite database for every operation, limit these to user requested operations. Namely commit only when the user calls `flush`. Also close only when the user calls `close`. This should make operations with SQLite much more performant than when we automatically committed and closed after every user operation.
As users need to explicitly close the `SQLiteStore` to commit changes and serialize them to the SQLite database, make sure to point this out in the docs.
Appears some of these commands work without capitalization. However as the docs show commands as capitalized, ensure that we are doing the same thing as well. That way this won't run into issues with different SQL implementations or older versions of SQLite that are less forgiving. Plus this should match closer to what users familiar with SQL expect.
Make use of `in` instead of repeating the same logic in `__delitem__`. As we now keep the database open between operations, this is much simpler than duplicating the key check logic. Also makes it a bit easier to understand what is going on.
This was needed when the `import` of `sqlite3` was only here to ensure that it existed (even though it wasn't used). Now we make use of `sqlite3` where it is being imported. So there is no need to tell flake8 to not worry about the unused import as there isn't one.
Make sure that everything intended to be added to the `SQLiteStore` database has been written to disk before attempting to pickle it. That way we can be sure other processes constructing their own `SQLiteStore` have access to the same data and not some earlier representation.
No need to normalize the path when there isn't one (e.g. `:memory:`).
Fix a typo. Co-Authored-By: jakirkham <[email protected]>
Include author and original issue in changelog entry. Co-Authored-By: jakirkham <[email protected]>
The default value for `check_same_thread` was previously set to `False` when in reality we want this check enabled. So set `check_same_thread` to `True`.
As users could change the setting of things like `check_same_thread` or they may try to access the same database from multiple threads or processes, make sure to flush any changes that would mutate the database.
As we now always commit after an operation that mutates the data, there is no need to commit before pickling the `SQLiteStore` object. After all the data should already be up-to-date in the database.
As everything should already be flushed to the database whenever the state is mutated, there is no need to perform this before closing.
While there is a default implementation of `update` for `MutableMapping`s, it means that we perform multiple `__setitem__` operations. However it would be better if we could commit all key-value pairs in one operation and commit them. Hence we implement `update` for this purpose.
Simplifies `__setitem__` to an `update` operation with a dictionary that contains only one item. This works the same as before, but cuts out some redundancy, which simplifies the code a bit.
As we now make sure to commit after every mutating change to the database, disable `check_same_thread` again as it should be safe.
As some of these parameters no longer make sense to be user customizable, go ahead and just set their values as part of the `sqlite3.connect` call. This ensures that they are set the way we expect. Also it ensures that if users try to mess with them, an error will be raised due to duplicate keyword arguments. To elaborate on why these parameters are not user configurable any more, `detect_types` only makes sense if one is building their own table with specific types. Instead we build the table for users and have very generic types (i.e. text and blobs), which are not worth checking. As we commit after every modification to the database to make it more friendly for other threads and processes, the `isolation_level` might as well be to auto-commit. Setting it to anything else really has no effect. Finally there is no need for `check_same_thread` to be anything other than `False` as we are guaranteeing everything is committed after mutation, which ensures the database is thread and process safe.
As we already enable auto-committing, any mutation is automatically written after performed. So there is no need for us to commit afterwards. Besides `commit` is turned into a no-op if auto-committing is enabled.
As we auto-commit all changes, there is no need for a `flush` operation for the `SQLiteStore`. So go ahead and drop the `flush` function and its documentation.
As the default implementation of `clear` deletes each key-value pair, this will be considerably slower than an operation that can remove all of them at once. Here we do exactly that by using SQL's `DROP TABLE`. Unfortunately there is not a truncate table command, but doing a drop followed by a create has the same effect. We combine these two operations using `executescript`. Thus auto-commit won't run until after both have run, which will commit the table with all key-value pairs removed.
@jakirkham and @alimanfoo - I think this is basically done. I don't know why the coveralls report is failing. My reading of the coveralls report show that we have 100% coverage on the new bits in this PR but please let me know if I'm missing something. |
@jhamman I think coveralls is sometimes broken, I have seen that before where coverage is 100% on the new/modified files but somehow coveralls reports total coverage down. One way to check is to make a trivial commit and see if the problem persists. I'll also do some digging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good to me. A couple of very small queries, but up to you if you want to do anything about them.
Only other question is whether you want to mark these two new classes as "experimental features" or not. This is just a convention we've been using to highlight features that are new, and so might require a bit of further work before the API can be considered fully stable. I'm OK either way.
If you do want to mark as experimental then add this after the first paragraph of the docstring:
.. note:: This is an experimental feature.
zarr/tests/test_storage.py
Outdated
|
||
def create_store(self): | ||
# TODO: this is the default host for MongoDB on Travis, | ||
# we probably want to generalize this though |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO something you want to address before merge? Would we ever need to do something different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I think the more important question is how do you want to handle these tests when a mongodb or redis server is not available. How should we decide when to skip these tests?
class TestRedisStore(StoreTests, unittest.TestCase): | ||
|
||
def create_store(self): | ||
# TODO: this is the default host for Redis on Travis, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, is this TODO something that needs to be resolved before merge, or can we live with as-is?
I think the coveralls report is broken, all files have 100% coverage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think the only outstanding question here is the one you raised, which is how to deal with skipping tests if redis or mongo servers are not running locally.
Co-Authored-By: jhamman <[email protected]>
Co-Authored-By: jhamman <[email protected]>
Travis fails because of flake8:
(Aside: I'm going to enable the pep8speaks bot so we don't have to go digging through the travis logs.) |
I've fixed the pep8 problem, I think this is good to go, will merge if CI passes. |
OK, this is going in, thanks all. |
This adds two new stores to zarr, MongoDB and Redis. This is basically the same implementation that @nbren12 proposed in this gist. The PR is far from complete but I wanted to put it up for others to see, should there be interest.
This is also building on #368 so that should be merged first.
Finally, I should say that I've been meaning to learn a bit about how zarr stores are implemented so I'm using this as a learning experience.
closes #299
TODO:
tox -e docs
)