KV store read/write to disk #104

Closed
trgill opened this issue May 24, 2021 · 4 comments

trgill (Contributor) commented May 24, 2021

What are we thinking for the KV store moving forward? Are we going to consider moving to something like Lightning DB (LMDB), or extend the code we've got to sync to disk?

As for syncing to disk, I think we should be able to provide that functionality even with the existing db backend without any major problems, in the manner of the dump command. But here, instead of sending the buffer/memfd over to the client, we'd copy the buffer content to disk. The buffer_write already calls sendfile to copy the memfd to a specified fd, so that should be a pretty quick operation.

A more efficient alternative (if we wanted to avoid even the extra copying between the memfd and the file) would be for the buffer code to support writing to a file directly as a new type of buffer backend, one that uses an mmap-ed file (instead of anonymous memory, as is the case with memfd). The code would then be almost the same as the "dump" code, only changing the buffer backend to that mmap-ed file and skipping the part where the "dump" code sends the memfd to the client.

Then we'd add the part which reads the file and reconstructs the db out of it.
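
For illustration, a minimal sketch of the simpler dump-to-disk variant, assuming the serialized records already sit in a memfd (the helper name is made up, it's not the actual buffer code):

/*
 * Copy the contents of a memfd holding the serialized KV dump into a
 * regular file on disk using sendfile(2), similar to what buffer_write
 * does when it sends the memfd to a client fd.
 */
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

static int dump_memfd_to_file(int memfd, const char *path)
{
	struct stat st;
	off_t       off = 0;
	int         out_fd;

	if (fstat(memfd, &st) < 0)
		return -1;

	if ((out_fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600)) < 0)
		return -1;

	/* sendfile copies in the kernel, no bounce buffer in userspace */
	while (off < st.st_size) {
		ssize_t n = sendfile(out_fd, memfd, &off, st.st_size - off);

		if (n <= 0) {
			close(out_fd);
			return -1;
		}
	}

	return close(out_fd);
}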

However, we need to take care here because certain records/properties are not persistent across reboots, like the major:minor numbers. So whenever we reference devices in the db, we need to use persistent identifiers instead, which we don't do at the moment - we're using exactly that major:minor...

That's actually something I'm working on right now: I'm trying to generate a persistent unique identifier for a device ("unique" for the purposes of SID), which would allow us to:

  • Define device aliases (same device can be identified by more than one identifier).
  • Make the records persistent.
  • Define triggers on sets of devices in a persistent way.

(That also includes the logic to match the persistent identifier to the actual device, i.e. the major:minor as seen on the system.)

What I'm more concerned about is the ability to do efficient selection/querying on the database, which is where our current db backend falls short - it's simply a hash. We'd need to be able to select subsets of records and ranges. For that, we need a different backend than the hash (or at least a heavily extended version of the existing hash). Whether we do it on our own or reuse an existing database comes down to whether we can find a database that suits our needs without restricting us more than it benefits us. It's true that LMDB comes very close to this, but I haven't tried it in enough depth to say for sure it's the one we'd use. It's still an open question...

Originally posted by @prajnoha in #103 (comment)

trgill (Contributor, Author) commented Jul 7, 2021

This is closely related to #108

I'm currently looking at replacing the kv_store functions with calls to LMDB.

https://github.com/LMDB/lmdb.git
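
As a first feel for how the selection/range concern from the quoted comment could map onto LMDB: keys are kept in sorted order, so a cursor can walk a key range or prefix. A rough sketch, assuming an already-opened MDB_env and MDB_dbi (the function name and the prefix convention are only for illustration):

/*
 * Iterate over all keys starting with a given prefix using an LMDB
 * cursor in a read-only transaction; error handling is trimmed.
 */
#include <lmdb.h>
#include <stdio.h>
#include <string.h>

static int iterate_prefix(MDB_env *env, MDB_dbi dbi, const char *prefix)
{
	MDB_txn    *txn;
	MDB_cursor *cur;
	MDB_val     key, data;
	size_t      plen = strlen(prefix);
	int         rc;

	if ((rc = mdb_txn_begin(env, NULL, MDB_RDONLY, &txn)) != 0)
		return rc;
	if ((rc = mdb_cursor_open(txn, dbi, &cur)) != 0)
		goto out;

	/* position the cursor at the first key >= prefix */
	key.mv_size = plen;
	key.mv_data = (void *) prefix;
	rc = mdb_cursor_get(cur, &key, &data, MDB_SET_RANGE);

	while (rc == 0 && key.mv_size >= plen && !memcmp(key.mv_data, prefix, plen)) {
		printf("%.*s\n", (int) key.mv_size, (char *) key.mv_data);
		rc = mdb_cursor_get(cur, &key, &data, MDB_NEXT);
	}

	rc = (rc == MDB_NOTFOUND) ? 0 : rc;
	mdb_cursor_close(cur);
out:
	mdb_txn_abort(txn);
	return rc;
}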

trgill self-assigned this Jul 7, 2021
prajnoha (Member) commented Jul 8, 2021

If you could look into LMDB and its workings in more detail, that would be a great help! Thanks!

I think we need to research LMDB's possibilities and check whether we would be able to map our needs onto the functionality that LMDB provides. I'm specifically interested in the snapshotting. Then whether there are any restrictions posed by LMDB - possible locks, key/value lengths... anything. Then, of course, whether we would be able to somehow implement/map our KV vector values, where items might be added to/removed from the same vector by different snapshots (== workers). I mean a situation like this:

  • MY_VECTOR_VALUE = {a}
  • snapshot1/worker1 adds {b, c} to the vector
  • snapshot2/worker2 adds {d} to the vector
  • snapshot3/worker3 removes {a} from the vector
  • etc..

In the case above, we should end up with MY_VECTOR_VALUE = {b, c, d}.

These vector values are important for us to be able to define "groups of items", which is a basic building block in SID.

I haven't looked at LMDB in enough detail to say whether it's a perfect match for us or whether it would be so restricting that it's unusable for us - so that's something we simply need to research, check and try.

The best outcome would be if we didn't need to replace SID's KV store abstraction, but only added a new KV store backend...

prajnoha (Member) commented Jul 8, 2021

Then, of course, whether we would be able to somehow implement/map our KV vector values, where items might be added to/removed from the same vector by different snapshots (== workers). I mean a situation like this:

  • MY_VECTOR_VALUE = {a}
  • snapshot1/worker1 adds {b, c} to the vector
  • snapshot2/worker2 adds {d} to the vector
  • snapshot3/worker3 removes {a} from the vector
  • etc..

In the case above, we should end up with MY_VECTOR_VALUE = {b, c, d}.

...I think LMDB doesn't have vector/list value types; there are just "simple values", not typed. So the "add to the vector"/"remove from the vector" operations would need to be provided on top by us.

That means that if we're adding an item to the vector, we would:
(1) GET the old value for the provided key
(2) check whether the item we're adding to the vector value for that key is already there:
(2a) if yes, there's nothing to do, the vector value already contains the item
(2b) if not, add the item to the old vector value and PUT the result as the new value for the key

In practice this means there's the extra step (1) - before we PUT the value, we first need to GET it so we can add the item to the vector we're storing as the "value".

(The same would apply to removing an item from the vector: just check whether the item is there and, if so, remove it...)
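
A rough sketch of steps (1)-(2b) on top of LMDB, assuming the vector value is stored as a flat array of fixed-size items and that the MDB_env/MDB_dbi are already opened (the helper name and value layout are illustrative, not the actual SID code):

/*
 * GET-then-PUT "add item to vector value" on top of LMDB. The vector is
 * assumed to be a flat array of fixed-size items; most error handling
 * and the env/dbi setup are omitted.
 */
#include <lmdb.h>
#include <stdlib.h>
#include <string.h>

static int kv_vector_add(MDB_env *env, MDB_dbi dbi, const char *key_str,
                         const void *item, size_t item_size)
{
	MDB_txn *txn;
	MDB_val  key, old_val, new_val;
	char    *buf = NULL;
	size_t   i;
	int      rc;

	key.mv_size = strlen(key_str);
	key.mv_data = (void *) key_str;

	if ((rc = mdb_txn_begin(env, NULL, 0, &txn)) != 0)
		return rc;

	/* (1) GET the old vector value for the key (MDB_NOTFOUND == empty vector) */
	rc = mdb_get(txn, dbi, &key, &old_val);
	if (rc == MDB_NOTFOUND) {
		old_val.mv_size = 0;
		old_val.mv_data = NULL;
		rc              = 0;
	}
	if (rc != 0)
		goto bad;

	/* (2) check whether the item is already present in the vector */
	for (i = 0; i + item_size <= old_val.mv_size; i += item_size)
		if (!memcmp((char *) old_val.mv_data + i, item, item_size))
			/* (2a) already there - nothing to do */
			return mdb_txn_commit(txn);

	/* (2b) append the item and PUT the result as the new value for the key */
	if (!(buf = malloc(old_val.mv_size + item_size))) {
		rc = -1;
		goto bad;
	}
	if (old_val.mv_size)
		memcpy(buf, old_val.mv_data, old_val.mv_size);
	memcpy(buf + old_val.mv_size, item, item_size);

	new_val.mv_size = old_val.mv_size + item_size;
	new_val.mv_data = buf;

	if ((rc = mdb_put(txn, dbi, &key, &new_val, 0)) != 0)
		goto bad;

	free(buf);
	return mdb_txn_commit(txn);
bad:
	free(buf);
	mdb_txn_abort(txn);
	return rc;
}

Doing the GET and the PUT inside one write transaction also means another writer can't slip in between the two steps.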

With the hash backend we use today, we don't actually need the extra explicit "(1) GET" step, because when inserting a value into the hash, the hash itself already checks whether there's any value stored under the given key, and it consults the update callback about what should be done next if there is an existing value under that key:

sid/src/include/base/hash.h, lines 127 to 137 in cf867f8:

/*
 * hash_update function calls hash_update_fn callback with hash_update_fn_arg right before the update
 * and based on callback's return value, it either keeps the old data or updates with new data.
 */
int hash_update(struct hash_table *t,
                const void *       key,
                uint32_t           key_len,
                void **            data,
                size_t *           data_len,
                hash_update_fn_t   hash_update_fn,
                void *             hash_update_fn_arg);

(The data can even be edited by the hash_update_fn - that's how we support adding/removing items to vector values under the hash backend.)

But I think that if LMDB is quick (which I believe it is), then we don't need to worry about the extra "GET" for each "PUT" operation... It's just something to keep in mind and plan for.
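
For comparison, a toy illustration of that update-callback pattern (the callback type below is hypothetical, since the real hash_update_fn_t signature isn't shown in the excerpt above): the store performs the lookup itself and only asks the callback what to do when something already exists under the key, so the caller never issues a separate GET.

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_entry {
	const char *key;
	void       *data;
	size_t      data_len;
};

/* return true to store the (possibly edited) new data, false to keep the old */
typedef bool (*toy_update_fn_t)(void *old_data, size_t old_data_len,
                                void **data, size_t *data_len, void *arg);

static int toy_update(struct toy_entry *table, size_t nr_entries, const char *key,
                      void **data, size_t *data_len,
                      toy_update_fn_t update_fn, void *update_fn_arg)
{
	size_t i;

	for (i = 0; i < nr_entries; i++) {
		if (strcmp(table[i].key, key) != 0)
			continue;
		/* key exists: the callback decides (and may edit *data, e.g. to
		   merge items into a vector value) right before the update */
		if (update_fn && !update_fn(table[i].data, table[i].data_len,
		                            data, data_len, update_fn_arg))
			return 0; /* keep the old data */
		table[i].data     = *data;
		table[i].data_len = *data_len;
		return 0;
	}
	return -1; /* not found here; a real hash table would insert a new entry */
}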

prajnoha (Member) commented Feb 2, 2022

We can redirect the database dump with serialized records to a specified file and reload it on SID initialization; related patches:
91f19e8
449e21d
38f47e1
73ad629

prajnoha closed this as completed Feb 2, 2022