-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve retention mark files. #3706
Conversation
I might be wrong, but since boltdb iterates over keys in order, we should be getting chunk ids in order. |
That's correct although I didn't wanted to rely on this, technically only the rotation and fill percent was required for the improvement. But I went ahead make that change just in case so we have this guarantee whatever the input. |
Ping @sandeepsukhani should we revert the mark file system or are you ok with the approach ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added a suggestion and a comment for a possible issue.
return err | ||
} | ||
binary.BigEndian.PutUint64(m.buf, id) // insert in order using sequence id. | ||
if err := m.bucket.Put(m.buf, chunkID); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think reusing a buf for keys might cause a problem because the comment here says:
Supplied value must remain valid for the life of the transaction.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the key is cloned, but I'm thinking about the chunkId.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right going to keep the fillpercent but revert this change.
This PR is two fold: 1) Rotate mark files when they reached 100k marks, this is to ensure we max out marks file and don't create file that may be too big. 2) Instead of inserting marks using the chunk id as the key, we insert using a natural sequence. This has two benefits: - Keys are ordered, and so insertion are faster. - Allows us to use boltdb fill percent to 100%, this means boltdb won't over allocate for inserting key in between unordered data. Why ? I realize that when inserting huge amount of marks, this operation can take up to hours since boltdb has to re-allocate pages over and over. This is mainly because chunkid arrive without specific order. Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
Co-authored-by: Sandeep Sukhani <[email protected]>
5f5ad9c
to
fe6e422
Compare
Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This PR is two fold:
Rotate mark files when they reached 100k marks, this is to ensure we max out marks file
and don't create file that may be too big.
Instead of inserting marks using the chunk id as the key, we insert using a natural sequence.
This has two benefits:
Why ?
I realize that when inserting huge amount of marks, this operation can take up to hours since boltdb has to re-allocate pages over and over.
This is mainly because chunkid arrive without specific order.
Signed-off-by: Cyril Tovena [email protected]