Minimize the use of keys related to index operations #4

Open
WolfDan opened this issue Jan 27, 2019 · 2 comments
Labels
block This feature could be important to the project but is blocked due given reason

Comments

@WolfDan
Contributor

WolfDan commented Jan 27, 2019

Right now, an index operation on any data type can end up repeating the same key prefix many times, which is quite inefficient.

Take two records as an example:

user_a = %{name: "Artorias", deaths: 50}
user_b = %{name: "Chosen Undeath", deaths: 50}

The index entries for the deaths property would look like this:

("user", "deaths", 50, user_a_node_uid) = ''
("user", "deaths", 50, user_b_node_uid) = ''

As you can see, the prefix ("user", "deaths", 50), that is (node_name, property_name, property_value), is repeated in every key. That prefix is quite long by itself, since it holds bit_string data for the node_name and property_name plus the property_value. The proposal is to store the index this way instead:

("user", "deaths", 50, random_id) = [user_a_node_uid, user_b_node_uid]

The random_id is there to "extend" the index. FDB limits the size of a value, so when a value reaches that limit we split the index across a new key and keep adding uids there.

This way a single key prefix can hold a large number of node uids.
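
The splitting described above can be sketched in plain Elixir, without touching the database. This is only an illustration under assumptions: the module name `IndexChunks`, the fixed 16-byte uid width and the simple concatenation format are all hypothetical and not taken from the project. The 100,000-byte default is FoundationDB's documented maximum value size.

```elixir
defmodule IndexChunks do
  # FoundationDB rejects values larger than 100,000 bytes.
  @max_value_bytes 100_000

  # Packs binary uids into as few values as possible, in order,
  # keeping every value at or below max_bytes.
  # Hypothetical helper: each returned binary would be stored under
  # its own ("user", "deaths", 50, random_id) key.
  def pack(uids, max_bytes \\ @max_value_bytes) do
    uids
    |> Enum.reduce([<<>>], fn uid, [current | rest] ->
      if byte_size(current) + byte_size(uid) <= max_bytes do
        # Still room: append the uid to the current value.
        [current <> uid | rest]
      else
        # Value is full: start a new one.
        [uid, current | rest]
      end
    end)
    |> Enum.reverse()
  end
end

# Assumed 16-byte uids and a tiny 40-byte cap so the split is easy to see.
uids = for n <- 1..5, do: <<n::128>>
chunks = IndexChunks.pack(uids, 40)
IO.inspect(Enum.map(chunks, &byte_size/1))
# → [32, 32, 16]
```

With the real limit and 16-byte uids, one value would hold 6,250 uids before a second key is needed.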

I don't think this will have any repercussions: for any index you normally need to read all of its entries anyway in order to produce a query result.

@WolfDan
Contributor Author

WolfDan commented Feb 2, 2019

I've been running some tests on this, and there is a problem with the approach: it increases transaction conflicts.

The logical steps look like this:

  • Get the range for the node_name, property_name and property_value
  • If the range is empty, write a new index key
  • If there is data, check the size of each key's value and select a key that has space left
  • Add the uid to the value of that key and write it back
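
The steps above can be sketched against an in-memory map standing in for the database. Everything here is an assumption made for illustration: the `ChunkedIndex` module, the 4-tuple key shape, the byte-concatenated value format, and the use of `:erlang.unique_integer/1` in place of the random_id.

```elixir
defmodule ChunkedIndex do
  # Adds uid to the index for {node, prop, value}.
  # store is a plain map standing in for the database; every key is
  # assumed to be a {node, prop, value, chunk_id} tuple.
  def add(store, {node, prop, value}, uid, max_bytes) do
    # Step 1: "range read" of every key sharing the prefix.
    range =
      Enum.filter(store, fn {{n, p, v, _id}, _val} ->
        {n, p, v} == {node, prop, value}
      end)

    # Step 3: pick an existing key whose value still has room.
    with_room =
      Enum.find(range, fn {_key, val} ->
        byte_size(val) + byte_size(uid) <= max_bytes
      end)

    case with_room do
      # Step 2 (empty range) or every value is full: write a fresh key.
      nil ->
        id = :erlang.unique_integer([:positive])
        Map.put(store, {node, prop, value, id}, uid)

      # Step 4: append the uid to the chosen value and write it back.
      {key, val} ->
        Map.put(store, key, val <> uid)
    end
  end
end

# Three 16-byte uids with a 32-byte cap: two fit in the first value,
# the third forces a second key.
store =
  %{}
  |> ChunkedIndex.add({"user", "deaths", 50}, <<1::128>>, 32)
  |> ChunkedIndex.add({"user", "deaths", 50}, <<2::128>>, 32)
  |> ChunkedIndex.add({"user", "deaths", 50}, <<3::128>>, 32)

IO.inspect(map_size(store))
# → 2
```

Note that step 4 rewrites a key that step 1 has just read, which is the read-then-write pattern the rest of this comment is about.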

The problem lies in the last step: the first step reads the key range and the last step writes to one of those same keys, which creates a transaction conflict whenever another transaction updates that range concurrently.
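
To make the conflict concrete, here is a toy model of optimistic concurrency in plain Elixir. It is a deliberate simplification and not FoundationDB's actual implementation: `ToyTxn` and all of its functions are hypothetical, and it tracks single keys where the real database tracks key ranges. The rule it models is the standard one: a commit is rejected if a key the transaction read was written by a commit that landed after the transaction started.

```elixir
defmodule ToyTxn do
  # db keeps the latest commit version and, per key, the version
  # of the last commit that wrote it.
  def new_db, do: %{version: 0, last_write: %{}}

  # A transaction remembers the version it started reading at.
  def begin(db), do: %{read_version: db.version, reads: [], writes: []}

  def read(txn, key), do: %{txn | reads: [key | txn.reads]}
  def write(txn, key), do: %{txn | writes: [key | txn.writes]}

  # Rejects the commit if any key read by txn was written
  # after txn took its read version.
  def commit(db, txn) do
    stale? =
      Enum.any?(txn.reads, fn key ->
        Map.get(db.last_write, key, 0) > txn.read_version
      end)

    if stale? do
      {:conflict, db}
    else
      version = db.version + 1
      last_write = Enum.reduce(txn.writes, db.last_write, &Map.put(&2, &1, version))
      {:ok, %{db | version: version, last_write: last_write}}
    end
  end
end

db = ToyTxn.new_db()
chunk = {"user", "deaths", 50, 1}

# Two writers start from the same snapshot; both read the chunk, then update it.
t1 = db |> ToyTxn.begin() |> ToyTxn.read(chunk) |> ToyTxn.write(chunk)
t2 = db |> ToyTxn.begin() |> ToyTxn.read(chunk) |> ToyTxn.write(chunk)

{:ok, db} = ToyTxn.commit(db, t1)
{result, _db} = ToyTxn.commit(db, t2)
IO.inspect(result)
# → :conflict
```

In the same model, the original layout does not conflict: each writer sets its own ("user", "deaths", 50, node_uid) key without reading anything first, so both commits succeed.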

I'll do more testing in order to fix these issues.

WolfDan added the block label Feb 11, 2019
@WolfDan
Contributor Author

WolfDan commented Mar 12, 2019

With the rewrite, a big part of the tuple is now converted into a Directory. That part is compressed into a single small integer, and it is easier to move or rename later on.
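
As a rough illustration of why this saves space, the sketch below maps a path to a small integer and compares key sizes. It rests on several assumptions: `ToyDirectory` is a hypothetical stand-in and not the FoundationDB directory layer, the path is assumed to be the node name plus the property name, and `:erlang.term_to_binary/1` is used only as a convenient stand-in for the real tuple encoding.

```elixir
defmodule ToyDirectory do
  # Returns {prefix, directories}. The first time a path is seen it is
  # assigned the next small integer; later opens return the same one.
  def open(directories, path) do
    case Map.fetch(directories, path) do
      {:ok, prefix} ->
        {prefix, directories}

      :error ->
        prefix = map_size(directories)
        {prefix, Map.put(directories, path, prefix)}
    end
  end
end

{prefix, _dirs} = ToyDirectory.open(%{}, ["user", "deaths"])

# Same index entry, with and without the directory prefix.
long_key = {"user", "deaths", 50, <<1::128>>}
short_key = {prefix, 50, <<1::128>>}

IO.inspect(
  long: byte_size(:erlang.term_to_binary(long_key)),
  short: byte_size(:erlang.term_to_binary(short_key))
)
```

Because only the path-to-integer mapping names the directory, renaming or moving it means updating that mapping and leaving the index keys themselves untouched.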

At the moment this is the best way to reduce space usage without causing too many transaction conflicts.
