-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ask about hash expansion #97
Comments
I'm not sure I understand the question. NuDB only works with SSD drives, and the performance is near the theroetical maximum for what is possible with a single write queue. Are you asking me if the overall performance of the library is affected by the performance of the SSD drive? Obviously yes, so I am maybe not understanding the question fully. |
I know that you use linear hash to solve hash table expansion. Do you directly use the native linear hash expansion algorithm? In my understanding, the expansion of the native linear hash table will cause the data in some slots to be moved to the new slot. I think this process will involve random io. Have you optimized this process? |
No. The principle is the same, but since NuDB is block/bucket based, every expansion of the hash table adds an entire block of slots and not just one. NuDB is very fast. You should run your own benchmarks to confirm that the performance meets your requirements. There's a benchmark program that compares NuDB to other key/value stores: |
Thank you for your reply, I would also like to ask, is your key a fixed length? How many bits do you map the key to, and what is the hash algorithm for the key? What is the approximate number of keys you support? What is the approximate conflict rate? |
Yes. You set the key size when the database is created.
64 bits
The default algorithm uses xxhasher which is fine for general purpose use. But this can be changed, if you can take advantage of how your keys are distributed.
Because of the birthday problem, you can insert 2^32 (over 4 billion) keys before performance will begin to degrade. But since a database can be trivially split up (by dividing the key space by a power of two) the number of keys is essentially unlimited. NuDB is in production use with databases that are over 13 terabytes in size.
The default settings aim for 50% occupancy in buckets. Less than 1% of buckets require two blocks on disk ("spills"). Remember that each bucket is designed to hold many keys not just one. And the cost for I/O is much higher than the cost for in-memory lookup. |
Thank you very much for your patient answer. At present, I want to design a kv database, the key is a string, and the value is a list, which is used to store some integer values. Currently, I need to support adding kv operations and adding numbers to a list of a key. And want to get the list of a key as quickly as possible, do you mind? At present, my idea is to use linear hash to solve the expansion of the hash table when adding keys, but adding an integer value to a key to the list may cause the value of a key to be distributed in multiple disk blocks, which requires multiple disk io to obtain the complete value, so the read performance is poor. |
NuDB is not going to work well for you, because you intend to change the value of already-inserted entries. Modification and deletion of existing values is not possible in NuDB. |
During the hash expansion, the data in the hash bucket on the disk will be moved to the new bucket. Is the performance affected by the disk IO?
The text was updated successfully, but these errors were encountered: