Remove DictionaryTerm with count 0 during compact (workaround for #374) #376

Merged: 2 commits into blevesearch:master on Jun 1, 2016

mmindenhall
Contributor

I spent a couple hours on this last night as a workaround for #374.

The implementation removes all DictionaryTerm entries with Count=0 from the index in configurable batches, each within its own transaction. I originally did this in one large transaction, but settled on batching to avoid locking out other writers for an extended period of time. A batch size of 250 seems like a good default for a server-based implementation. I'll be using batch sizes closer to 50-100 in my app (running on a single-core ARMv7).
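As a rough illustration of the batching idea (not the code in this PR), here is a minimal in-memory sketch. `termStore`, `deleteZeroBatch`, and `compactWithBatchSize` are hypothetical names, and the mutex only stands in for a store write transaction; the real change works against the KV store's own iterator and transaction API.

```go
package main

import (
	"fmt"
	"sync"
)

// termStore is a hypothetical in-memory stand-in for the index's KV store.
// It is not bleve's API; the mutex plays the role of a write transaction.
type termStore struct {
	mu     sync.Mutex
	counts map[string]uint64 // dictionary term -> number of documents using it
}

// deleteZeroBatch removes at most batchSize zero-count terms while holding
// the lock, then releases it so other writers can proceed.
func (s *termStore) deleteZeroBatch(batchSize int) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for term, n := range s.counts {
		if removed == batchSize {
			break
		}
		if n == 0 {
			delete(s.counts, term) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

// compactWithBatchSize runs short batches until no zero-count terms remain.
// It returns the number of terms removed and the number of non-empty batches.
func (s *termStore) compactWithBatchSize(batchSize int) (total, batches int) {
	if batchSize <= 0 {
		batchSize = 250 // default suggested in the description above
	}
	for {
		n := s.deleteZeroBatch(batchSize)
		if n == 0 {
			return total, batches
		}
		total += n
		batches++
	}
}

func main() {
	s := &termStore{counts: map[string]uint64{
		"apple": 0, "pear": 3, "plum": 0, "fig": 0, "kiwi": 1,
	}}
	total, batches := s.compactWithBatchSize(2)
	fmt.Printf("removed %d terms in %d batches, %d terms left\n",
		total, batches, len(s.counts))
}
```

The lock is released between batches, which is what lets other writers make progress; a smaller batch size trades total throughput for shorter pauses.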

I ran this test, which I wrote before submitting #374:

  1. Create and initialize a new index, take snapshot of size of index folder
  2. Do the following 5 times:
     1. Add 1000 documents to the index
     2. Take snapshot of size of index folder
     3. Delete all documents from the index
     4. Call the new Compact method I added in #373 (Add compact method to goleveldb store)
     5. Take snapshot of size of index folder
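The shape of that test loop can be sketched as below. This is only an outline under assumptions: `dirSize`, `runCycles`, and `demo` are hypothetical helpers, and the `add`, `deleteAll`, and `compact` hooks are fakes that write and truncate a file instead of calling a real index.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// dirSize returns the total size in bytes of the regular files under root.
func dirSize(root string) (int64, error) {
	var total int64
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			info, err := d.Info()
			if err != nil {
				return err
			}
			total += info.Size()
		}
		return nil
	})
	return total, err
}

// runCycles takes an initial snapshot, then per cycle: add, snapshot,
// delete all, compact, snapshot. The three hooks are hypothetical.
func runCycles(dir string, cycles int, add, deleteAll, compact func() error) ([]int64, error) {
	var sizes []int64
	snap := func() error {
		n, err := dirSize(dir)
		if err == nil {
			sizes = append(sizes, n)
		}
		return err
	}
	if err := snap(); err != nil {
		return nil, err
	}
	for i := 0; i < cycles; i++ {
		for _, step := range []func() error{add, snap, deleteAll, compact, snap} {
			if err := step(); err != nil {
				return nil, err
			}
		}
	}
	return sizes, nil
}

// demo wires runCycles to fake hooks: "add" writes a 4096-byte file and
// "compact" truncates it. A real test would call the index instead.
func demo(cycles int) ([]int64, error) {
	dir, err := os.MkdirTemp("", "index-size")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	data := filepath.Join(dir, "000001.ldb")
	add := func() error { return os.WriteFile(data, make([]byte, 4096), 0o644) }
	deleteAll := func() error { return nil } // deletes alone do not shrink the file here
	compact := func() error { return os.Truncate(data, 0) }
	return runCycles(dir, cycles, add, deleteAll, compact)
}

func main() {
	sizes, err := demo(5)
	if err != nil {
		panic(err)
	}
	fmt.Println(sizes) // folder size at each snapshot, in bytes
}
```

If compaction works, the snapshots taken after each compact step should return to roughly the initial size rather than growing from cycle to cycle.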

I verified that there were no dictionary term entries present after the test, and the .ldb file contained only the document mapping and fields.

I also ran a variation of the test above to ensure that documents could still be indexed during the execution of CompactWithBatchSize. Just before calling CompactWithBatchSize, I started another goroutine to index documents during the compact. I verified that the documents were indexed, document count after finishing was correct, and there were no dictionary term entries with count 0.
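A minimal sketch of that concurrent check, again against a hypothetical in-memory `index` type rather than the real store: one goroutine indexes documents while the caller runs batched compaction, and the final counts are checked afterwards. All names here are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// index is a hypothetical stand-in; the mutex models one write transaction.
type index struct {
	mu    sync.Mutex
	terms map[string]int // term -> document count
	docs  int
}

// newIndexWithStale builds an index holding n leftover zero-count terms.
func newIndexWithStale(n int) *index {
	ix := &index{terms: make(map[string]int)}
	for i := 0; i < n; i++ {
		ix.terms[fmt.Sprintf("stale-%d", i)] = 0
	}
	return ix
}

// indexDoc adds one document containing a single term.
func (ix *index) indexDoc(term string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	ix.terms[term]++
	ix.docs++
}

// compactBatch deletes up to batchSize zero-count terms under the lock.
func (ix *index) compactBatch(batchSize int) int {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	removed := 0
	for term, n := range ix.terms {
		if removed == batchSize {
			break
		}
		if n == 0 {
			delete(ix.terms, term)
			removed++
		}
	}
	return removed
}

// zeroTerms counts the zero-count terms still present.
func (ix *index) zeroTerms() int {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	zero := 0
	for _, n := range ix.terms {
		if n == 0 {
			zero++
		}
	}
	return zero
}

// compactWhileIndexing starts a writer goroutine, compacts in batches on
// the calling goroutine, and waits for the writer before returning.
func compactWhileIndexing(ix *index, newDocs, batchSize int) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < newDocs; i++ {
			ix.indexDoc(fmt.Sprintf("live-%d", i))
		}
	}()
	for ix.compactBatch(batchSize) > 0 {
	}
	wg.Wait()
}

func main() {
	ix := newIndexWithStale(1000)
	compactWhileIndexing(ix, 200, 50)
	fmt.Println("docs:", ix.docs, "zero-count terms:", ix.zeroTerms())
}
```

Because the writer and each compaction batch take the lock separately, the two interleave; the checks at the end mirror the ones described above (document count correct, no zero-count terms left).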

@mmindenhall
Contributor Author

Yes...definitely makes more sense that way. I made the change, and ran my tests again.

mschoch merged commit 92cf2a8 into blevesearch:master on Jun 1, 2016