Delete commit log files that are covered by snapshot files #802
Conversation
richardartoul commented on Jul 23, 2018 (edited)
- Refactors how we list and interact with the commit log files on disk. Specifically, we no longer use the name of the file to determine its start time, nor do we rely on the current configuration to guess the commit log block size (see the sketch after this list).
- Rewrites the commit log cleanup logic to allow commit logs to be deleted if they're captured by snapshots.
- In addition to the standard unit tests that would be expected with this PR, I've added a mixed-mode read/write integration prop test that uses a random seed and property testing to verify that the cleanup logic always leaves the node in a state where, if it's restarted, all data can be recovered.
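As a rough sketch of the new listing shape: the FilePath, Start, and Duration fields appear in the diffs reviewed below, but the package layout and comments here are illustrative rather than the exact M3DB code.

package commitlog

import "time"

// File describes a commit log file on disk along with the metadata needed to
// reason about which blocks it can contain. In this sketch the start time and
// block duration come from metadata stored in the file itself, not from
// parsing the filename or from the current configuration.
type File struct {
    FilePath string
    Start    time.Time
    Duration time.Duration
}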
Codecov Report
@@            Coverage Diff             @@
##           master     #802      +/-   ##
==========================================
+ Coverage   77.98%   78.02%   +0.04%
==========================================
  Files         365      365
  Lines       31438    31479      +41
==========================================
+ Hits        24516    24561      +45
+ Misses       5262     5258       -4
  Partials     1660     1660
Continue to review full report at Codecov.
Force-pushed from b7173e9 to 8a33ff2
const maxBlockSize = 12 * time.Hour
const maxPoints = 100
const minSuccessfulTests = 8
const maxFlushWaitTime = time.Minute
nit: combine into single const block:
const (
    minBlockSize       = 15 * time.Minute
    maxBlockSize       = 12 * time.Hour
    maxPoints          = 100
    minSuccessfulTests = 8
    maxFlushWaitTime   = time.Minute
)
Done
// Files returns a slice of all available commit log files on disk along with
// their associated metadata.
func Files(opts Options) ([]File, error) {
👍
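For illustration, a hypothetical caller of this helper; it assumes the same package as Files, and the printing is made up for the example.

// printCommitLogFiles is a hypothetical usage example of the Files helper
// shown above: it lists each commit log file and the time window it covers.
func printCommitLogFiles(opts Options) error {
    files, err := Files(opts)
    if err != nil {
        return err
    }
    for _, f := range files {
        fmt.Printf("%s covers [%v, %v)\n", f.FilePath, f.Start, f.Start.Add(f.Duration))
    }
    return nil
}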
		continue
	}
-	if predicate(file, start, duration) {
+	if predicate(file.FilePath, file.Start, file.Duration) {
Would it be better to make predicate just take commitlog.File so it's easier to add new fields in the future?
good call
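Something like the following is presumably where this ends up; a sketch of the suggestion, with the filter's name and exact shape assumed rather than taken from the final code.

// Sketch of the suggestion above: the filter callback takes the whole
// commitlog.File so adding metadata fields later doesn't change its signature.
type commitLogFilePredicate func(f commitlog.File) bool

func filterCommitLogFiles(files []commitlog.File, predicate commitLogFilePredicate) []commitlog.File {
    filtered := make([]commitlog.File, 0, len(files))
    for _, f := range files {
        if predicate(f) {
            filtered = append(filtered, f)
        }
    }
    return filtered
}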
src/dbnode/storage/cleanup.go (Outdated)
var (
    ropts                      = ns.Options().RetentionOptions()
    nsBlocksStart, nsBlocksEnd = commitLogNamespaceBlockTimes(start, duration, ropts)
    needsFlush                 = ns.NeedsFlush(nsBlocksStart, nsBlocksEnd)
Hm, I wonder if we should change these types of functions so that, instead of taking start, end time.Time, they take a single xtime.Range arg?
not sure how much it buys us as all the helper functions take a start/end as well and we're not trying to calculate overlap or anything like that
Yeah, just more that the name of the data structure represents the variable itself better and it's a single var that can then be passed around. I'm not bullish though, we can revisit this later.
cool, let's leave this out in the interest of time for now
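For reference, the alternative discussed here (and left out of this PR) would look roughly like the sketch below; the databaseNamespace name and the helper wrapping are assumptions for illustration.

// Hypothetical sketch of the alternative: bundle the block window into a
// single xtime.Range and unpack it where the existing start/end-based
// helpers are called. Not adopted in this PR.
func namespaceNeedsFlushForRange(ns databaseNamespace, nsBlocks xtime.Range) bool {
    return ns.NeedsFlush(nsBlocks.Start, nsBlocks.End)
}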
src/dbnode/storage/cleanup.go (Outdated)
filesToCleanup := filterCommitLogFiles(files, func(start time.Time, duration time.Duration) bool {
    if outerErr != nil {
        return false
    }
Is this early return false necessary? It seems like we'd reach the same result whether or not we skip entries after we've seen a single error. Asking because if so, we don't have to add test coverage for these lines if it doesn't make a difference.
Yeah, I guess it's just the difference between first error wins / last error wins. Some of the operations are expensive in that they have to go to disk, so returning early is nice in that sense too, but optimizing for the error case is kind of dumb. I'll remove it.
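For context, the per-namespace check inside the predicate looks roughly like this sketch, pieced together from the snippets in this review; signatures are approximate, not the exact PR code.

// Illustrative sketch of the cleanup decision: a commit log file is only safe
// to delete if, for every namespace, the blocks it covers either no longer
// need a flush or are captured by a snapshot taken after the file ended.
// IsCapturedBySnapshot is presumably the disk-touching, expensive part
// mentioned above.
func canCleanupCommitLogFile(namespaces []databaseNamespace, f commitlog.File) (bool, error) {
    for _, ns := range namespaces {
        ropts := ns.Options().RetentionOptions()
        nsBlocksStart, nsBlocksEnd := commitLogNamespaceBlockTimes(f.Start, f.Duration, ropts)

        if !ns.NeedsFlush(nsBlocksStart, nsBlocksEnd) {
            // Data is already flushed to fileset files.
            continue
        }

        isCapturedBySnapshot, err := ns.IsCapturedBySnapshot(
            nsBlocksStart, nsBlocksEnd, f.Start.Add(f.Duration))
        if err != nil {
            return false, err
        }
        if !isCapturedBySnapshot {
            // Some data in this commit log file exists nowhere else yet.
            return false, nil
        }
    }
    return true, nil
}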
writeCommitLogDataSpecifiedTS(
    t,
    testSetup,
    testSetup.storageOpts.CommitLogOptions().SetFlushInterval(defaultIntegrationTestFlushInterval),
super nit: Maybe declare these options before the for loop? Just to avoid this super long line and all:
clOpts := testSetup.storageOpts.CommitLogOptions().
    SetFlushInterval(defaultIntegrationTestFlushInterval)

for _, clTime := range commitLogTimes {
    // Need to generate valid commit log files otherwise cleanup will fail.
    data := map[xtime.UnixNano]generate.SeriesBlock{
        xtime.ToUnixNano(clTime): nil,
    }
    writeCommitLogDataSpecifiedTS(t, testSetup, clOpts, data, ns1, clTime, false)
}
sure
writeCommitLogDataSpecifiedTS(
    t,
    testSetup,
    testSetup.storageOpts.CommitLogOptions().SetFlushInterval(defaultIntegrationTestFlushInterval),
super nit: Same here.
sure
@@ -73,7 +74,7 @@ func verifySeriesMapForRange(
 	input generate.SeriesBlock,
 	expectedDebugFilePath string,
 	actualDebugFilePath string,
-) {
+) bool {
Hm, this is probably a good change. Especially when called in an async goroutine, because otherwise the goroutine just dies when a require.Foo(...) fails. 👍
yeah, plus gopter requires you to return (false, error) when something goes wrong. If you just panic it can't really do its thing.
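A minimal sketch of the gopter shape being referred to; the property, generator, and package name here are placeholders, not the actual integration prop test.

package integration

import (
    "testing"

    "github.com/leanovate/gopter"
    "github.com/leanovate/gopter/gen"
    "github.com/leanovate/gopter/prop"
)

// TestCleanupPropertyExample shows why the bool return matters: the property
// callback signals failure by returning (false, err) so gopter can shrink
// inputs and report the seed, instead of panicking via require.* assertions
// inside a goroutine.
func TestCleanupPropertyExample(t *testing.T) {
    params := gopter.DefaultTestParameters()
    params.MinSuccessfulTests = 8
    props := gopter.NewProperties(params)

    props.Property("verification reports failures as a bool", prop.ForAll(
        func(numPoints int) (bool, error) {
            // The real test writes data, runs cleanup, restarts the node and
            // verifies all data is recoverable; here we just return a bool.
            return numPoints <= 100, nil
        },
        gen.IntRange(0, 100),
    ))

    props.TestingRun(t)
}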
src/dbnode/storage/cleanup.go (Outdated)
isCapturedBySnapshot, err := ns.IsCapturedBySnapshot(
    nsBlocksStart, nsBlocksEnd, start.Add(duration))
if err != nil {
    outerErr = err
If you want to get the first error btw, why not make it a slice var outerErrs []error, then outerErrs = append(outerErrs, err), and then if len(outerErrs) > 0 { return outerErrs[0] }?
Or even just use a multiErr here and return all errors, shrug.
I realized that I could just make filterCommitLogFiles accept a predicate function that returns (bool, error), and that makes everything clean and simple.
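Something along these lines, presumably; a sketch of the described shape that updates the earlier filter sketch, not the exact committed code.

// Sketch: filterCommitLogFiles takes a predicate returning (bool, error) so
// the first error can be propagated directly, with no outer error variable.
func filterCommitLogFiles(
    files []commitlog.File,
    predicate func(f commitlog.File) (bool, error),
) ([]commitlog.File, error) {
    filtered := make([]commitlog.File, 0, len(files))
    for _, f := range files {
        shouldInclude, err := predicate(f)
        if err != nil {
            return nil, err
        }
        if shouldInclude {
            filtered = append(filtered, f)
        }
    }
    return filtered, nil
}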
LGTM