Data leak with goleveldb backend? #374
Can you run the bleve_dump utility to see what data (if any) is still there? If bleve_dump doesn't show anything, you can use the lower-level leveldbutil that comes with leveldb to dump just the contents of the leveldb file. Running those after one of the iterations should give some insight.

There are a few things that are expected to be left around.
I ran the test with just one iteration, and with just 10 documents. After delete and compact, I ran bleve_dump. There are a huge number (1750, to be exact) of [...] rows. Some have binary data (like the first), and some have recognizable strings (second). Then at the very bottom, I see the definitions for the fields. And finally I see [...]. I noticed that all of the [...]. I attached the dump output if that helps.
@mschoch, any thoughts on this one?
I don't have anything really to add; what you found is definitely the case. It has to do with the fact that we are using a "merge operator" to update the dictionary rows. We built our merge operator API to emulate (and fit in with) what RocksDB had, and at the time that didn't seem to allow us to delete a row as a result of the merge. I think there is now a way to get that behavior, but our API doesn't support it. So the net effect is that even if a dictionary term count goes to 0, we don't delete it.

Even at the time we knew this was undesirable, but we also didn't expect it to be a big deal. The use case you're testing, creating docs and then deleting them all, isn't one we optimized for. Is this a particularly interesting case for you, or are you just testing different things out? My thought is to leave this open so that we eventually circle back to address it, but it's not particularly high priority for me right now.
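To make the merge-operator limitation concrete, here is a minimal pure-Go sketch (hypothetical names, not bleve's actual code): a RocksDB-style merge must always produce a new value for the key, so a dictionary term whose count reaches zero is written back as a zero-count row instead of being deleted.

```go
package main

import "fmt"

// dict models an index's term dictionary; a real backend would hold
// serialized rows in LevelDB/RocksDB, updated via a merge operator.
type dict map[string]int

// merge applies a count delta to a term. A RocksDB-style merge operator
// must return a value for the key -- it cannot signal a delete -- so a
// term decremented to zero stays in the store as a 0-count row.
func (d dict) merge(term string, delta int) {
	d[term] += delta
}

func main() {
	d := dict{}
	d.merge("sensor", 1)  // index a document containing "sensor"
	d.merge("sensor", -1) // delete that document
	// The row lingers at count 0 instead of disappearing.
	fmt.Println(len(d), d["sensor"]) // prints "1 0"
}
```

This is only a model of the semantics; the real rows are serialized key/value pairs, but the "merge can never delete" constraint is the same.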
We're building IoT gateway software that runs on small-footprint devices. For example, we're running on a device with 128MB of RAM and 128MB of flash (of which about 65% is available for our app + data). We're receiving and indexing "reports" from attached "things", and the partition runs out of space pretty quickly. So I wrote code to monitor how much space is free and delete the oldest reports (with a Compact) to avoid disk-full errors. This solution probably extends our ability to run without filling the disk from hours or days to days or weeks.

Eventually it would be great to have this fixed, but it's not urgent. If you can give me some pointers on what needs to be done, I might be able to convince my boss to let me work on fixing this.
Well, I don't think it's straightforward to fix; our current design is broken. I did another quick review, and it seems that the result of a RocksDB merge is always another row, not a delete. Even if we defined our merge wrapper to delete rows when the new row is nil, that wouldn't work correctly with RocksDB's native merge operator. The only alternative I can see is to have some sort of background process cleaning things up, but I'm not excited about that solution either.
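The workaround PRs referenced below take roughly the cleanup-pass approach: during Compact, sweep the dictionary and drop rows whose count has fallen to zero. A hypothetical pure-Go model of that sweep (a map stands in for the key/value store; this is not the actual store code):

```go
package main

import "fmt"

// sweepZeroTerms drops dictionary rows whose count has fallen to zero.
// A real implementation would iterate the store's dictionary key range
// during Compact; here a map stands in for the key/value store.
func sweepZeroTerms(dict map[string]int) int {
	removed := 0
	for term, count := range dict {
		if count == 0 {
			delete(dict, term) // deleting while ranging is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	dict := map[string]int{"sensor": 0, "gateway": 2}
	removed := sweepZeroTerms(dict)
	fmt.Println(removed, len(dict)) // prints "1 1"
}
```

Doing this inside Compact (rather than as a background goroutine) has the advantage that the store is already paying for a full pass over the data, and no extra synchronization with writers is needed.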
Remove DictionaryTerm with count 0 during compact (workaround for #374)
Compact for boltdb (workaround for #374)
I have a scenario where I add a doc, run queries on it, and then delete it again because I do not need it anymore. This is done quite frequently, using leveldb as the backend with the in-memory option when creating the index.
I've observed the behavior noted here (dictionary term not deleted when count goes to 0) on the levigo-based leveldb backend as well.
The application I'm working on runs on IoT gateways with limited resources (~100MB storage free). Therefore, I submitted #373 to be able to proactively reclaim disk space after deleting "expired" documents from indexes.
I wrote a test where I do the following:

1. Create and index 1000 documents
2. Delete all 1000 documents
3. Compact the index, using the Compact method I added in #373 (Add compact method to goleveldb store)

Here are the sizes of the index folder:

Size at start: 52K
[remaining sizes elided]

Just to be sure the Compact method was actually doing something, I commented out that line and ran again (create 1000, delete 1000, no compact). Without the call to Compact, deleting the 1000 documents actually increases the size of the index.

Is this a bug? I'm wondering if there's some set of keys that gets created when a document is indexed that is not getting deleted when the document is removed?