KV store read/write to disk #104

Closed
trgill opened this issue May 24, 2021 · 4 comments

trgill (Contributor) commented May 24, 2021

What are we thinking for the KV store moving forward? Are we going to consider moving to something like Lightning DB (LMDB), or extend the code we've got to sync to disk?

As for syncing to disk, I think we should be able to provide that functionality even with the existing db backend without any major problems, in the manner of the dump command. But here, instead of sending the buffer/memfd over to the client, we'd copy the buffer content to disk. The buffer_write already calls sendfile to copy the memfd to a specified fd, so that should be a pretty quick operation.

A more efficient alternative (if we wanted to avoid even the extra copying between the memfd and the file) would be for the buffer code to support writing to a file directly as a new type of buffer backend, one that uses an mmap-ed file (instead of anonymous memory, as is the case with memfd). The code would then be almost the same as the "dump" code, only changing the buffer backend to that mmap-ed file and skipping the part where the "dump" code sends the memfd to the client.

Then we'd add the part which reads the file and reconstructs the db out of it.
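
For illustration, a minimal sketch of the simpler dump-to-disk variant, assuming the serialized records already sit in a memfd (the helper name is made up, it's not the actual buffer code):

/*
 * Copy the contents of a memfd holding the serialized KV dump into a
 * regular file on disk using sendfile(2), similar to what buffer_write
 * does when it sends the memfd to a client fd.
 */
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

static int dump_memfd_to_file(int memfd, const char *path)
{
	struct stat st;
	off_t       off = 0;
	int         out_fd;

	if (fstat(memfd, &st) < 0)
		return -1;

	if ((out_fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600)) < 0)
		return -1;

	/* sendfile copies in the kernel, no bounce buffer in userspace */
	while (off < st.st_size) {
		ssize_t n = sendfile(out_fd, memfd, &off, st.st_size - off);

		if (n <= 0) {
			close(out_fd);
			return -1;
		}
	}

	return close(out_fd);
}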

However, we need to take care here because certain records/properties are not persistent across reboots, like the major:minor numbers. So whenever we reference devices in the db, we need to use persistent identifiers instead, which we don't do at the moment - we're using exactly that major:minor...

That's actually something I'm working on right now: I'm trying to generate a persistent unique identifier for a device ("unique" for the purposes of SID), which would allow us to:

  • Define device aliases (same device can be identified by more than one identifier).
  • Make the records persistent.
  • Define triggers on sets of devices in a persistent way.

(That also includes the logic to match the persistent identifier to the actual device, i.e. the major:minor as seen on the system.)

What I'm more concerned about is the ability to do efficient selection/querying on the database, which is where our current db backend falls short - it's simply a hash. We'd need to be able to select subsets of records and ranges. For that, we need a different backend than the hash (or at least a heavily extended version of the existing hash). Whether we do it on our own or reuse an existing database comes down to whether we can find a database that suits our needs without restricting us more than it benefits us. It's true that LMDB comes very close to this, but I haven't tried it in enough depth to say for sure it's the one we'd use. It's still an open question...

Originally posted by @prajnoha in #103 (comment)

trgill (Contributor, Author) commented Jul 7, 2021

This is closely related to #108

I'm currently looking at replacing the kv_store functions with calls to LMDB.

https://github.com/LMDB/lmdb.git
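
As a first feel for how the selection/range concern from the quoted comment could map onto LMDB: keys are kept in sorted order, so a cursor can walk a key range or prefix. A rough sketch, assuming an already-opened MDB_env and MDB_dbi (the function name and the prefix convention are only for illustration):

/*
 * Iterate over all keys starting with a given prefix using an LMDB
 * cursor in a read-only transaction; error handling is trimmed.
 */
#include <lmdb.h>
#include <stdio.h>
#include <string.h>

static int iterate_prefix(MDB_env *env, MDB_dbi dbi, const char *prefix)
{
	MDB_txn    *txn;
	MDB_cursor *cur;
	MDB_val     key, data;
	size_t      plen = strlen(prefix);
	int         rc;

	if ((rc = mdb_txn_begin(env, NULL, MDB_RDONLY, &txn)) != 0)
		return rc;
	if ((rc = mdb_cursor_open(txn, dbi, &cur)) != 0)
		goto out;

	/* position the cursor at the first key >= prefix */
	key.mv_size = plen;
	key.mv_data = (void *) prefix;
	rc = mdb_cursor_get(cur, &key, &data, MDB_SET_RANGE);

	while (rc == 0 && key.mv_size >= plen && !memcmp(key.mv_data, prefix, plen)) {
		printf("%.*s\n", (int) key.mv_size, (char *) key.mv_data);
		rc = mdb_cursor_get(cur, &key, &data, MDB_NEXT);
	}

	rc = (rc == MDB_NOTFOUND) ? 0 : rc;
	mdb_cursor_close(cur);
out:
	mdb_txn_abort(txn);
	return rc;
}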

trgill self-assigned this Jul 7, 2021
prajnoha (Member) commented Jul 8, 2021

If you could look into LMDB and its workings in more detail, that would be a great help! Thanks!

I think we need to research LMDB's possibilities and check whether we would be able to map our needs onto the functionality that LMDB provides. I'm specifically interested in the snapshotting. Then whether there are any restrictions posed by LMDB - possible locks, key/value lengths... anything. Then, of course, whether we would be able to somehow implement/map our KV vector values, where items might be added to/removed from the same vector by different snapshots (== workers). I mean a situation like this:

  • MY_VECTOR_VALUE = {a}
  • snapshot1/worker1 adds {b, c} to the vector
  • snapshot2/worker2 adds {d} to the vector
  • snapshot3/worker3 removes {a} from the vector
  • etc..

In the case above, we should end up with MY_VECTOR_VALUE = {b, c, d}.

These vector values are important for us to be able to define "groups of items", which is a basic building block in SID.

I haven't looked at LMDB in enough detail to say whether it's a perfect match for us or whether it would be so restricting that it's unusable for us - so that's something we simply need to research, check and try.

The best outcome would be if we didn't need to replace SID's KV store abstraction, but only added a new KV store backend...

prajnoha (Member) commented Jul 8, 2021

Then, of course, whether we would be able to somehow implement/map our KV vector values, where items might be added to/removed from the same vector by different snapshots (== workers). I mean a situation like this:

  • MY_VECTOR_VALUE = {a}
  • snapshot1/worker1 adds {b, c} to the vector
  • snapshot2/worker2 adds {d} to the vector
  • snapshot3/worker3 removes {a} from the vector
  • etc..

In the case above, we should end up with MY_VECTOR_VALUE = {b, c, d}.

...I think LMDB doesn't have vector/list value types; there are just "simple values", not typed. So the "add to the vector"/"remove from the vector" operations would need to be provided on top by us.

That means that if we're adding an item to the vector, we would:
(1) GET the old value for the provided key
(2) check whether the item we're adding to the vector value for that key is already there:
(2a) if yes, there's nothing to do, the vector value already contains the item
(2b) if not, add the item to the old vector value and PUT the result as the new value for the key

In practice this means there's the extra step (1) - before we PUT the value, we first need to GET it so we can add the item to the vector we're storing as the "value".

(The same would apply to removing an item from the vector: just check whether the item is there and, if so, remove it...)
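
A rough sketch of steps (1)-(2b) on top of LMDB, assuming the vector value is stored as a flat array of fixed-size items and that the MDB_env/MDB_dbi are already opened (the helper name and value layout are illustrative, not the actual SID code):

/*
 * GET-then-PUT "add item to vector value" on top of LMDB. The vector is
 * assumed to be a flat array of fixed-size items; most error handling
 * and the env/dbi setup are omitted.
 */
#include <lmdb.h>
#include <stdlib.h>
#include <string.h>

static int kv_vector_add(MDB_env *env, MDB_dbi dbi, const char *key_str,
                         const void *item, size_t item_size)
{
	MDB_txn *txn;
	MDB_val  key, old_val, new_val;
	char    *buf = NULL;
	size_t   i;
	int      rc;

	key.mv_size = strlen(key_str);
	key.mv_data = (void *) key_str;

	if ((rc = mdb_txn_begin(env, NULL, 0, &txn)) != 0)
		return rc;

	/* (1) GET the old vector value for the key (MDB_NOTFOUND == empty vector) */
	rc = mdb_get(txn, dbi, &key, &old_val);
	if (rc == MDB_NOTFOUND) {
		old_val.mv_size = 0;
		old_val.mv_data = NULL;
		rc              = 0;
	}
	if (rc != 0)
		goto bad;

	/* (2) check whether the item is already present in the vector */
	for (i = 0; i + item_size <= old_val.mv_size; i += item_size)
		if (!memcmp((char *) old_val.mv_data + i, item, item_size))
			/* (2a) already there - nothing to do */
			return mdb_txn_commit(txn);

	/* (2b) append the item and PUT the result as the new value for the key */
	if (!(buf = malloc(old_val.mv_size + item_size))) {
		rc = -1;
		goto bad;
	}
	if (old_val.mv_size)
		memcpy(buf, old_val.mv_data, old_val.mv_size);
	memcpy(buf + old_val.mv_size, item, item_size);

	new_val.mv_size = old_val.mv_size + item_size;
	new_val.mv_data = buf;

	if ((rc = mdb_put(txn, dbi, &key, &new_val, 0)) != 0)
		goto bad;

	free(buf);
	return mdb_txn_commit(txn);
bad:
	free(buf);
	mdb_txn_abort(txn);
	return rc;
}

Doing the GET and the PUT inside one write transaction also means another writer can't slip in between the two steps.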

With the hash backend we use today, we don't actually need the extra explicit "(1) GET" step, because when inserting a value into the hash, the hash itself already checks whether there's any value stored under the given key, and it consults the update callback about what should be done next if there is an existing value under that key:

sid/src/include/base/hash.h, lines 127 to 137 in cf867f8:

/*
 * hash_update function calls hash_update_fn callback with hash_update_fn_arg right before the update
 * and based on callback's return value, it either keeps the old data or updates with new data.
 */
int hash_update(struct hash_table *t,
                const void *       key,
                uint32_t           key_len,
                void **            data,
                size_t *           data_len,
                hash_update_fn_t   hash_update_fn,
                void *             hash_update_fn_arg);

(The data can even be edited by the hash_update_fn - that's how we support adding/removing items to vector values under the hash backend.)

But I think that if LMDB is quick (which I believe it is), then we don't need to worry about the extra "GET" for each "PUT" operation... It's just something to keep in mind and plan for.
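
For comparison, a toy illustration of that update-callback pattern (the callback type below is hypothetical, since the real hash_update_fn_t signature isn't shown in the excerpt above): the store performs the lookup itself and only asks the callback what to do when something already exists under the key, so the caller never issues a separate GET.

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_entry {
	const char *key;
	void       *data;
	size_t      data_len;
};

/* return true to store the (possibly edited) new data, false to keep the old */
typedef bool (*toy_update_fn_t)(void *old_data, size_t old_data_len,
                                void **data, size_t *data_len, void *arg);

static int toy_update(struct toy_entry *table, size_t nr_entries, const char *key,
                      void **data, size_t *data_len,
                      toy_update_fn_t update_fn, void *update_fn_arg)
{
	size_t i;

	for (i = 0; i < nr_entries; i++) {
		if (strcmp(table[i].key, key) != 0)
			continue;
		/* key exists: the callback decides (and may edit *data, e.g. to
		   merge items into a vector value) right before the update */
		if (update_fn && !update_fn(table[i].data, table[i].data_len,
		                            data, data_len, update_fn_arg))
			return 0; /* keep the old data */
		table[i].data     = *data;
		table[i].data_len = *data_len;
		return 0;
	}
	return -1; /* not found here; a real hash table would insert a new entry */
}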

prajnoha (Member) commented Feb 2, 2022

We can redirect the database dump with serialized records to a specified file and reload it on SID initialization; related patches:
91f19e8
449e21d
38f47e1
73ad629

prajnoha closed this as completed Feb 2, 2022