Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to abstract-level interface #11

Closed
CMCDragonkai opened this issue Mar 17, 2022 · 11 comments · Fixed by #19
Closed

Upgrade to abstract-level interface #11

CMCDragonkai opened this issue Mar 17, 2022 · 11 comments · Fixed by #19
Assignees
Labels
development Standard development epic Big issue with multiple subissues r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management

Comments

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Mar 17, 2022

Specification

The abstract-level https://github.com/Level/abstract-level is a new foundation of LevelDB. Upgrading to this will help us resolve all other issues in DB.

  • Use bufferWrap to support ArrayBuffer, Uint8Array and Buffer #3 - abstract-level supports it directly
  • Integrate DB into the level db interface so db levels are the same as db #8 - Our DB abstraction should be rebuilt as an abstract-level layer, that is a class extending AbstractLevel
    class DB extends AbstractLevel { ... }
    
    It will still need to use classic-level (which is just leveldown atm) in NodeJS because NodeJS doesn't support indexed DB atm. But this should then basically make DB just an implementation of the abstract-level interface.
  • DB would then be a specialised classic-level that sets up root sublevels of data, transactions, and index and performs semi-transparent encryption & decryption (since it doesn't have a block level yet).
  • DB would then return sublevels as wrapped DBs as well...
  • Implement True Snapshot Isolation for LevelDB #4 Should already be resolved, since we end up using iterator() for read-committed level transactions, and true snapshots aren't really needed.
  • Integrate Automatic Indexing into DB #1 becomes possible now that we have a better idea of what kind of indexing needs we need, and we can put all the index data under the index root level

#5 still cannot be solved until IDB becomes available, then we can use browser-level or level-js and have the transparent encryption be done at the block level. Or done at the C++ level inside leveldb, or rocksdb.

Wait until level has released version 8, although this can be started now, since we may not need to use the level package at all. The only dependencies of js-db may be abstract-level and classic-level.

Additional context

Tasks

  1. ...
  2. ...
  3. ...
@CMCDragonkai CMCDragonkai added development Standard development epic Big issue with multiple subissues labels Mar 17, 2022
@CMCDragonkai CMCDragonkai changed the title Upgrade to https://github.com/Level/abstract-level Upgrade to abstract-level interface Mar 17, 2022
@CMCDragonkai
Copy link
Member Author

I want to change the API interface to:

get(keys: string | Buffer | Array<string | Buffer>)

So that way get('key') or get(['level1', 'key']) can work.

And a way to create deeply nested levels more easily: Level/subleveldown#112

@CMCDragonkai
Copy link
Member Author

We should remove any usage and management of sublevels. Unless the sublevels themselves are just DB objects. But mostly it should be possible to just directly use the DB object instance and use key paths.

The APIs between DB and DBTransaction is still different because we want to eventually move to where you can just specify a key path instead of directly maintaining particular sublevel objects. Once all methods can just take a full keypath, there's no need to manage a bunch of sublevel objects, and that should reduce the object maintenance overhead.

// root iterator at dataDb
DB.iterator(options);
// same
DB.iterator(options, []);
// down level 1
DB.iterator(options, ['level1']);
DB.iterator(options, ['level1', 'level2]);
DBTransaction.iterator(options, ['level1', 'level2']);

DB.count();
DB.count(['level1', 'level2']);
DBTransaction.count();
DBTransaction.count(['level1']);

DB.clear();
DB.clear(['level1']);
DBTransaction.clear(['level1']); // this should be a transactional clear, to do this, we must iterator and then do `db.del`

DB.get('key');
DB.get(['level1', 'key']);
// these are not allowed, or they are equivalent to asking for undefined
DB.get([]);
DB.get();

DB.put('key', 'value');
DB.put(['level1', 'key'], 'value');

DB.del('key');
DB.del(['level1', 'key']);
DBTransaction.del('key');
DBTransaction.del(['level1', 'key']);

@CMCDragonkai
Copy link
Member Author

We will need to test and benchmark sublevel creation multiple times to see how fast we can do this though, because somethings like iterator and count will require creating those sublevels.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Mar 23, 2022

Some benchmarking results show that naively creating sublevels in order to support the key paths API or the domain APIs is quite slow:

  create 1 sublevels:
    87 320 ops/s, ±0.79%   | fastest

  create 2 sublevels:
    45 325 ops/s, ±0.55%   | 48.09% slower

  create 3 sublevels:
    30 294 ops/s, ±0.39%   | 65.31% slower

  create 4 sublevels:
    22 764 ops/s, ±2.34%   | slowest, 73.93% slower

  get via sublevel:
    25 055 ops/s, ±1.36%   | 71.31% slower

  get via key path concatenation:
    54 992 ops/s, ±2.68%   | 37.02% slower

Therefore it appears that if we want to have an efficient implementation, instead of just creating intermediate sublevel objects, we instead should be doing key path/prefix concatenation.

Now this is what we do when we use get and put and del, however the methods like level and count and dump and iterator are not, and are just creating sublevel objects.

For level, count, dump and iterator, we fundamentally need subleveldown to support taking the concatenated prefix, or an array type as the prefix in Level/subleveldown#112 (comment). But this is prevented by subleveldown's RangeError that is thrown Level/subleveldown#111 (comment).

Basically it seems like db.sublevel(['foo', bar']) would be implemented efficiently as db.sublevel('foo!bar') so that the key path is just concatenated rather than creating intermediate sublevel objects.

Our in our case:

subleveldown(dbLevel, `foo${utils.prefix}bar`)
// or
subleveldown(dbLevel, ['foo', 'bar'])

@CMCDragonkai
Copy link
Member Author

I suggest that we remove subleveldown entirely, and just allow key paths for both iterators and get, put, del and clear... etc. Most custom operations like dump, clear, count all rely on the iterator. And we don't need subleveldown if we run our iterator based on a specific key path for gt and lt`.

I reckon it would be something like this:

iterator({
  // gt means greater than this key
  // since !A!B! would be the domain path, any key added would always be 1 character more
  gt: '!A!B!',
  // this would have to be the very next byte number above A!B, but that which doesn't produce an extra character, we could create a buffer and just increment by 1 the last byte
  // if `!` is the separator which is 0x21, then plus 1 would be `"` which is 0x22
  lt: '!A!B"'
})

Then we probably solve a number of problems.

@CMCDragonkai
Copy link
Member Author

This script proves we can just use gt and lt to get a proper iteration:

import DB from './src/DB';

async function main () {
  const db = await DB.createDB({
    dbPath: './tmp/db',
    fresh: true
  });

  const l1 = await db.level('level1');
  await db.put(['level1'], 'k', 'v');
  await db.put(['level1'], 'a', 'v');
  await db.put(['level1'], 'c', 'v');

  await db.level('level2');
  await db.put(['level2'], 'k', 'v');
  await db.put(['level2'], 'a', 'v');
  await db.put(['level2'], 'c', 'v');

  await db.put([], '!level1"', 'v');
  await db.put([], '!level1 ', 'v');

  console.log('USING SUBLEVELDOWN');
  for await (const k of l1.createKeyStream()) {
    console.log(k.toString());
  }

  console.log('ITERATOR');
  const i = db.dataDb.iterator({
    gt: '!level1!',
    lt: '!level1"'
  });
  // @ts-ignore
  for await (const [k, v] of i) {
    console.log(k.toString());
  }

  console.log('KEY STREAM');
  for await (const k of db.dataDb.createKeyStream({
    gt: '!level1!',
    lt: '!level1"'
  })) {
    console.log(k.toString());
  }

  console.log('DUMP');
  console.log(await db.dump());

  await db.stop();
}

main();

I just realised that dump doesn't preserve order, and that it assumes keys can just be strings. We can change this to an array of tuples, where the first tuple is always a buffer, or a binary string, and the value is decrypted as usual. If I want to use binary string I can use Buffer.toString('binary'), not this is the same as if I used String.fromCharCode(...key), but I used that in js-id due to Uint8Array.

The raw === true can be used to return buffers for both.

@CMCDragonkai
Copy link
Member Author

With this prototype, I might just go ahead with getting rid of subleveldown entirely, and then that would make this 3.0.0 as the APIs are again different. Then that would affect PK quite significantly.

@CMCDragonkai
Copy link
Member Author

Even without using subleveldown, the DB can still use abstract-level, and we sort of rely less on what the underlying system can do.

@CMCDragonkai
Copy link
Member Author

I missed one thing, sublevels are actually indexed like this:

!A!!B!k

So between sublevels we have !! not just a single !.

@CMCDragonkai
Copy link
Member Author

Due to #12, the 3.0.0 DB has been released and the interface shouldn't change all that much even with upgrade to abstract-level.

@CMCDragonkai
Copy link
Member Author

#13 has enabled the ability to arbitrarily use level parts without worrying about reserved bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development epic Big issue with multiple subissues r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management
Development

Successfully merging a pull request may close this issue.

2 participants