Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ask about hash expansion #97

Open
KeeProMise opened this issue Jun 26, 2022 · 7 comments
Open

Ask about hash expansion #97

KeeProMise opened this issue Jun 26, 2022 · 7 comments

Comments

@KeeProMise
Copy link

During the hash expansion, the data in the hash bucket on the disk will be moved to the new bucket. Is the performance affected by the disk IO?

@vinniefalco
Copy link
Member

I'm not sure I understand the question. NuDB only works with SSD drives, and the performance is near the theroetical maximum for what is possible with a single write queue. Are you asking me if the overall performance of the library is affected by the performance of the SSD drive? Obviously yes, so I am maybe not understanding the question fully.

@KeeProMise
Copy link
Author

I'm not sure I understand the question. NuDB only works with SSD drives, and the performance is near the theroetical maximum for what is possible with a single write queue. Are you asking me if the overall performance of the library is affected by the performance of the SSD drive? Obviously yes, so I am maybe not understanding the question fully.

I know that you use linear hash to solve hash table expansion. Do you directly use the native linear hash expansion algorithm? In my understanding, the expansion of the native linear hash table will cause the data in some slots to be moved to the new slot. I think this process will involve random io. Have you optimized this process?

@vinniefalco
Copy link
Member

vinniefalco commented Jun 26, 2022

Do you directly use the native linear hash expansion algorithm?

No. The principle is the same, but since NuDB is block/bucket based, every expansion of the hash table adds an entire block of slots and not just one. NuDB is very fast. You should run your own benchmarks to confirm that the performance meets your requirements.

There's a benchmark program that compares NuDB to other key/value stores:
https://github.com/CPPAlliance/NuDB/tree/master/bench

@KeeProMise
Copy link
Author

Do you directly use the native linear hash expansion algorithm?

No. The principle is the same, but since NuDB is block/bucket based, every expansion of the hash table adds an entire block of slots and not just one. NuDB is very fast. You should run your own benchmarks to confirm that the performance meets your requirements.

There's a benchmark program that compares NuDB to other key/value stores: https://github.com/CPPAlliance/NuDB/tree/master/bench

Thank you for your reply, I would also like to ask, is your key a fixed length? How many bits do you map the key to, and what is the hash algorithm for the key? What is the approximate number of keys you support? What is the approximate conflict rate?

@vinniefalco
Copy link
Member

vinniefalco commented Jun 26, 2022

is your key a fixed length?

Yes. You set the key size when the database is created.

How many bits do you map the key to

64 bits

what is the hash algorithm for the key

The default algorithm uses xxhasher which is fine for general purpose use. But this can be changed, if you can take advantage of how your keys are distributed.

What is the approximate number of keys you support

Because of the birthday problem, you can insert 2^32 (over 4 billion) keys before performance will begin to degrade. But since a database can be trivially split up (by dividing the key space by a power of two) the number of keys is essentially unlimited. NuDB is in production use with databases that are over 13 terabytes in size.

What is the approximate conflict rate?

The default settings aim for 50% occupancy in buckets. Less than 1% of buckets require two blocks on disk ("spills").

Remember that each bucket is designed to hold many keys not just one. And the cost for I/O is much higher than the cost for in-memory lookup.

@KeeProMise
Copy link
Author

Thank you very much for your patient answer. At present, I want to design a kv database, the key is a string, and the value is a list, which is used to store some integer values. Currently, I need to support adding kv operations and adding numbers to a list of a key. And want to get the list of a key as quickly as possible, do you mind? At present, my idea is to use linear hash to solve the expansion of the hash table when adding keys, but adding an integer value to a key to the list may cause the value of a key to be distributed in multiple disk blocks, which requires multiple disk io to obtain the complete value, so the read performance is poor.

@vinniefalco
Copy link
Member

NuDB is not going to work well for you, because you intend to change the value of already-inserted entries. Modification and deletion of existing values is not possible in NuDB.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants