Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete commit log files that are covered by snapshot files #802

Merged
merged 19 commits into from
Jul 26, 2018

Conversation

richardartoul
Copy link
Contributor

@richardartoul richardartoul commented Jul 23, 2018

  • Refactors how we list / interact with the commit log files on disk (specifically, we no longer use the name of the file to determine the start time nor do we rely on the current configuration to guess what the commit log block size is)
  • Rewrite commit log cleanup logic to allow commit logs to be deleted if they're captured by snapshots
  • In addition to the standard unit tests that would be expected with this P.R, I've added a mixed mode read write integration prop test that uses a random seed and property testing to verify that the cleanup logic always leave the node in a state where if its restarted all data can be recovered.

@codecov
Copy link

codecov bot commented Jul 23, 2018

Codecov Report

Merging #802 into master will increase coverage by 0.04%.
The diff coverage is 92.43%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #802      +/-   ##
==========================================
+ Coverage   77.98%   78.02%   +0.04%     
==========================================
  Files         365      365              
  Lines       31438    31479      +41     
==========================================
+ Hits        24516    24561      +45     
+ Misses       5262     5258       -4     
  Partials     1660     1660
Flag Coverage Δ
#coordinator 60.14% <ø> (-0.1%) ⬇️
#dbnode 81.56% <92.43%> (+0.06%) ⬆️
#m3ninx 72.7% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c096772...a68bb16. Read the comment docs.

@richardartoul richardartoul changed the title [WIP] - Delete commit log files that are covered by snapshot files Delete commit log files that are covered by snapshot files Jul 24, 2018
@richardartoul richardartoul force-pushed the ra/delete-snapshot-files-master branch from b7173e9 to 8a33ff2 Compare July 24, 2018 21:47
const maxBlockSize = 12 * time.Hour
const maxPoints = 100
const minSuccessfulTests = 8
const maxFlushWaitTime = time.Minute
Copy link
Collaborator

@robskillington robskillington Jul 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: combine into single const block:

const (
	minBlockSize = 15 * time.Minute
	maxBlockSize = 12 * time.Hour
	maxPoints = 100
	minSuccessfulTests = 8
	maxFlushWaitTime = time.Minute
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// Files returns a slice of all available commit log files on disk along with
// their associated metadata.
func Files(opts Options) ([]File, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

continue
}
if predicate(file, start, duration) {
if predicate(file.FilePath, file.Start, file.Duration) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to make predicate just take commitlog.File so it's easier to add new fields in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

var (
ropts = ns.Options().RetentionOptions()
nsBlocksStart, nsBlocksEnd = commitLogNamespaceBlockTimes(start, duration, ropts)
needsFlush = ns.NeedsFlush(nsBlocksStart, nsBlocksEnd)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I wonder if we should change these type of functions to instead of taking start, end time.Time we should be using a single xtime.Range arg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how much it buys us as all the helper functions take a start/end as well and we're not trying to calculate overlap or anything like that

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, just more that the name of the data structure represents the variable itself better and its a single var that can then be passed around. I'm not bullish though, we can revisit this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool lets leave this out in the interest of time for now

filesToCleanup := filterCommitLogFiles(files, func(start time.Time, duration time.Duration) bool {
if outerErr != nil {
return false
}
Copy link
Collaborator

@robskillington robskillington Jul 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a necessary early return false? It seems like we'll reach same result whether we skip entries after we've seen a single entry. Just because if so, then we don't have to add test coverage for these lines if it doesn't make a difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess its just the difference between first error wins / last error wins. Some of the operations are expensive in that they have to go to disk so returning early is nice in that sense too, but optimizing for the error case is kind of dumb. I'll remove it.

writeCommitLogDataSpecifiedTS(
t,
testSetup,
testSetup.storageOpts.CommitLogOptions().SetFlushInterval(defaultIntegrationTestFlushInterval),
Copy link
Collaborator

@robskillington robskillington Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: Maybe declare these options before the for loop? Just to avoid this super long line and all:

	clOpts := testSetup.storageOpts.CommitLogOptions().
		SetFlushInterval(defaultIntegrationTestFlushInterval)
	for _, clTime := range commitLogTimes {
		// Need to generate valid commit log files otherwise cleanup will fail.
		data := map[xtime.UnixNano]generate.SeriesBlock{
			xtime.ToUnixNano(clTime): nil,
		}
		writeCommitLogDataSpecifiedTS(t, testSetup, clOpts, data, ns1, clTime, false)
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

writeCommitLogDataSpecifiedTS(
t,
testSetup,
testSetup.storageOpts.CommitLogOptions().SetFlushInterval(defaultIntegrationTestFlushInterval),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: Same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -73,7 +74,7 @@ func verifySeriesMapForRange(
input generate.SeriesBlock,
expectedDebugFilePath string,
actualDebugFilePath string,
) {
) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm this is probably a good change. Especially when called in an async goroutine because otherwise the goroutine just dies when a require.Foo(...) fails. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah plus gopter require you to return (false, error) when something goes wrong. If you just panic it can't really do its thing

isCapturedBySnapshot, err := ns.IsCapturedBySnapshot(
nsBlocksStart, nsBlocksEnd, start.Add(duration))
if err != nil {
outerErr = err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to get the first error btw, why not make it a slice var outerErrs []error then outerErrs = append(outerErrs, err) and then if len(outerErrs) > 0 { return outerErrs[0] }?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even just use a multiErr here and return all errors shrug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that I could just make the filterCommitLogFiles accept a predicate function that returns (bool, error) and that makes everything clean and simple.

Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@richardartoul richardartoul merged commit 8968605 into master Jul 26, 2018
@justinjc justinjc deleted the ra/delete-snapshot-files-master branch August 9, 2018 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants