-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: hash: fix lost hash_node because of _kv_delta #46
Conversation
When adding a new hash_node, hash_update_binary() finds the pointer that will be set to point to that hash_node before calling _hash_update_fn(), which calls the kv_update_fn. _kv_delta() can itself add hash_nodes to the hash_table. If these nodes are in the same slot as the one in the original call to hash_update_binary(), they will set the same pointer that was found in the original call. This means that when the original hash_node is added, it will overwrite the pointer to the hash_nodes added by _kv_delta() causing them to be lost, and memory to leak. To fix this, _do_hash_insert_binary() now checks if the found pointer is no longer NULL after the kv_update_fn is called, and if so, continues down the list until if finds a NULL pointer. This will fix the issue, assuming that the kv_update_fn never adds the original key to the hash table, and that it never deletes any node from the hash table.
Nice! I wonder if we could also add a check inside hash_update_binary so we'd catch the situation where inside the hash_update_binary a hash_update_fn is called that in turn calls hash_remove or hash_update_binary again with the same key to prevent any possible problems of this kind... Maybe some form of a handle passed from one call to another to keep track of the nested calls, not sure what a good approach would be here right now to accomplish such a check. Thanks for finding and fixing this one, I'll merge the patch. |
hash_update_binary() could search through the hash table twice, the first time to get the old data for the hash_update_fn, and assuming that returns successfully, a second time to find the actual pointer to use to add or update the hash_node. The downside would be that it would be wasted work in most cases. If hash_update_fn (and by extension the kv_update_fns) returned a different value if the passed in key was added, or any hash_node was deleted, then hash_update_binary() would only have to recheck the hash table in those cases. I looked, and I'm pretty sure that none of the existing kv_update_fns can do this, which is why I just added some comments stating the requirement that they don't. Obviously, it's pretty easy to make sure that a kv_update_fn for keyX doesn't store keyX. However it's harder to make sure that a kv_update_fn for keyX doesn't store keyY whose kv_update_fn stores keyX. To do that we would have to pass a buffer that gets filled with all the keys that we've stored as we go through the kv_update_fns, or something. We could put it in kv_update_arg, but I'm not sure it's worth the effort. |
Actually, I just realized that it's really easy to check if kv_update_fn() added the key that hash_update_binary() is trying to add. If that key got added by kv_update_fn(), it must have been added to the end of the same slot, right where hash_update_binary() is planning to add that same key. So, when do_hash_insert_binary() is going through the list to find the new NULL pointer, it just needs to check if any of the new hash_nodes matches the key it's adding. However, being able to detect this doesn't tell us what to do when it happens? I would think that if a kv_update_fn() needed to change some data associated with the original key, it would do that by editing the update_spec.new* fields. If, instead, it added the key to the hash_table with different data than what's in the update_spec.new_data field, we... what? Overwrite it? Error out? I would assume that if this happens, it's an error. As far as the prohibition on removing a hash_node in hash_update_fn(), my feeling is that if there ever comes a point when it's necessary to remove a hash_node in hash_update_fn(), we can worry about how to communicate that back to hash_update_binary() then. |
Exactly, that's actually the primary reason I added the
Yup, if the data for the same key is changed by any (nested)
Definitely an error. I'd simply error out with
Maybe the |
When adding a new hash_node, hash_update_binary() finds the pointer that
will be set to point to that hash_node before calling _hash_update_fn(),
which calls the kv_update_fn. _kv_delta() can itself add hash_nodes to
the hash_table. If these nodes are in the same slot as the one in the
original call to hash_update_binary(), they will set the same pointer
that was found in the original call. This means that when the original
hash_node is added, it will overwrite the pointer to the hash_nodes
added by _kv_delta() causing them to be lost, and memory to leak.
To fix this, _do_hash_insert_binary() now checks if the found pointer is
no longer NULL after the kv_update_fn is called, and if so, continues
down the list until if finds a NULL pointer. This will fix the issue,
assuming that the kv_update_fn never adds the original key to the hash
table, and that it never deletes any node from the hash table.