This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

Merge chunks together on compaction #397

Closed

Conversation

csmarchbanks
Contributor

@csmarchbanks commented Sep 26, 2018

fixes: #236

This is a first pass at merging small chunks together into larger ones. Initial query benchmark results look promising as blocks become larger. The code currently takes the 120-samples-per-2-hours target used in the head and makes the chunk size proportionally larger as the block size increases.
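
For illustration, here is a rough sketch (not the exact code in this PR) of how a per-chunk sample budget could scale with the block's time range; the function name and constants are hypothetical:

// maxSamplesPerChunk scales the head's target of 120 samples per 2h block
// proportionally with the block's time range (both in milliseconds).
// Illustrative only; the PR's actual calculation may differ.
func maxSamplesPerChunk(blockRange int64) int {
	const (
		headBlockRange = int64(2 * 60 * 60 * 1000) // 2h in ms
		headMaxSamples = 120
	)
	if blockRange <= headBlockRange {
		return headMaxSamples
	}
	return int(int64(headMaxSamples) * blockRange / headBlockRange)
}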

benchcmp results between first and last commits:

benchmark                                                          old ns/op     new ns/op     delta
BenchmarkPersistedQueries/series=10,samplesPerSeries=1000-8        103098        106463        +3.26%
BenchmarkPersistedQueries/series=10,samplesPerSeries=10000-8       468642        259983        -44.52%
BenchmarkPersistedQueries/series=10,samplesPerSeries=100000-8      3987684       209855        -94.74%
BenchmarkPersistedQueries/series=100,samplesPerSeries=1000-8       863718        852374        -1.31%
BenchmarkPersistedQueries/series=100,samplesPerSeries=10000-8      4582881       2465040       -46.21%
BenchmarkPersistedQueries/series=100,samplesPerSeries=100000-8     47530956      1918334       -95.96%

benchmark                                                          old allocs     new allocs     delta
BenchmarkPersistedQueries/series=10,samplesPerSeries=1000-8        427            427            +0.00%
BenchmarkPersistedQueries/series=10,samplesPerSeries=10000-8       1977           1107           -44.01%
BenchmarkPersistedQueries/series=10,samplesPerSeries=100000-8      17267          877            -94.92%
BenchmarkPersistedQueries/series=100,samplesPerSeries=1000-8       3738           3738           +0.00%
BenchmarkPersistedQueries/series=100,samplesPerSeries=10000-8      19238          10538          -45.22%
BenchmarkPersistedQueries/series=100,samplesPerSeries=100000-8     172145         8238           -95.21%

benchmark                                                          old bytes     new bytes     delta
BenchmarkPersistedQueries/series=10,samplesPerSeries=1000-8        70499         70500         +0.00%
BenchmarkPersistedQueries/series=10,samplesPerSeries=10000-8       194376        123395        -36.52%
BenchmarkPersistedQueries/series=10,samplesPerSeries=100000-8      1317825       92106         -93.01%
BenchmarkPersistedQueries/series=100,samplesPerSeries=1000-8       341780        341780        +0.00%
BenchmarkPersistedQueries/series=100,samplesPerSeries=10000-8      1580558       870738        -44.91%
BenchmarkPersistedQueries/series=100,samplesPerSeries=100000-8     12815166      557849        -95.65%

Contributor

@simonpasquier left a comment

Thanks for working on this. I'm not that familiar with the tsdb code, but it looks OK to me.

compact.go Outdated
func mergeChunks(chks []chunks.Meta) []chunks.Meta {
	newChks := make([]chunks.Meta, 0, len(chks))
	for i := 0; i < len(chks); i++ {
		if i < len(chks)-1 && chks[i].Chunk.NumSamples()+chks[i+1].Chunk.NumSamples() <= 480 {
Contributor

This would be easier to read like this:

for i := 0; i < len(chks); i++ {
  if i >= len(chks) - 1 || chks[i].Chunk.NumSamples()+chks[i+1].Chunk.NumSamples() > 480 {
    newChks = append(newChks, chks[i])
    continue
  }
  newChunk := chunkenc.NewXORChunk()
  ...
}

compact.go Outdated
newChunk := chunkenc.NewXORChunk()
app, err := newChunk.Appender()
if err != nil {
	return chks
Contributor

Should we also log the error? The same applies to the other places where we return chks because of an error.
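
A possible shape for that, assuming mergeChunks can reach the compactor's logger (for example by becoming a method on LeveledCompactor) and using the go-kit level helpers already used elsewhere in this repo:

newChunk := chunkenc.NewXORChunk()
app, err := newChunk.Appender()
if err != nil {
	// Fall back to the original, unmerged chunks, but surface the error
	// instead of dropping it silently.
	level.Error(c.logger).Log("msg", "create appender for merged chunk", "err", err)
	return chks
}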

@krasi-georgiev
Contributor

Once we clear some earlier PRs I will focus on this one. Sorry for the delay.

@krasi-georgiev
Contributor

Please remove the [WIP] prefix when this is ready for review.

@csmarchbanks changed the title from "[WIP] Merge chunks together on compaction" to "Merge chunks together on compaction" on Oct 13, 2018
@csmarchbanks
Contributor Author

@krasi-georgiev This is ready for a review now.

Sorry it took a while; the last two weeks have been a bit hectic for me.

@csmarchbanks
Contributor Author

Also, it seems worthwhile to run prombench for this. Should I make a PR updating tsdb in Prometheus to do that, or is there a better way to run prombench against changes in tsdb?

@krasi-georgiev
Contributor

Yeah, for now we need a PR against Prometheus, but let me review it first and I will run the tests after.
It will probably be a week or two before I get to this since we have a few other PRs to finish before this one.

logger           log.Logger
ranges           []int64
chunkPool        chunkenc.Pool
mergeSmallChunks bool
Contributor

Why do you think we should make it configurable?
What reason would we have to have it disabled?

Contributor Author

Mostly because the issue suggested it should be an option. Also, I could imagine some queries getting slower when they look at small time ranges from long ago, since seeking within larger chunks takes longer. I am planning to do some additional benchmarking in that area soon.

Contributor

OK, ping me when it is ready for another review. Thanks.

Signed-off-by: Chris Marchbanks <[email protected]>
compact_test.go Outdated
{
	{
		lset:   map[string]string{"a": "b"},
		chunks: [][]sample{{{t: 0}, {t: 10}}, {{t: 11}, {t: 20}}},
Contributor

I am also thinking that we should test that the size of the merged chunks is as expected: a merged chunk does not exceed the maxSamples boundary and is not smaller than expected. This is to prevent bugs that create chunks that are too big, or chunks smaller than the original ones.
It is probably better to add this in a separate test, and maybe also move the merging logic tested here into that new dedicated test.

We should also check that the new chunks have the expected min/max times; see the sketch below.
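
A rough sketch of such a test, assuming the two-argument mergeChunks(chks, maxSamples) variant proposed in the next comment and the repo's testutil helpers; newTestChunk is a hypothetical helper that encodes the given samples into a chunks.Meta:

func TestMergeChunksBounds(t *testing.T) {
	const maxSamples = 480

	// Build a few small input chunks from test samples.
	chks := []chunks.Meta{
		newTestChunk([]sample{{t: 0}, {t: 10}}),
		newTestChunk([]sample{{t: 11}, {t: 20}}),
	}

	merged, err := mergeChunks(chks, maxSamples)
	testutil.Ok(t, err)

	// No merged chunk may exceed the sample budget.
	for _, c := range merged {
		testutil.Assert(t, c.Chunk.NumSamples() <= maxSamples, "merged chunk is too large")
	}
	// The overall time range must be preserved.
	testutil.Equals(t, chks[0].MinTime, merged[0].MinTime)
	testutil.Equals(t, chks[len(chks)-1].MaxTime, merged[len(merged)-1].MaxTime)
}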

@krasi-georgiev
Copy link
Contributor

krasi-georgiev commented Oct 20, 2018

Four hours later 😅 and after trying a few different things, this is what I came up with:

// mergeChunks merges adjacent chunks so that each resulting chunk holds at
// most maxSamples samples. Chunks are only combined with their immediate
// successors, so the time ordering is preserved.
func mergeChunks(chks []chunks.Meta, maxSamples int) ([]chunks.Meta, error) {
	mergedChks := make([]chunks.Meta, 0, len(chks))

	// The range over the original slice only bounds the number of iterations;
	// chks itself is re-sliced below as chunks are consumed.
	for range chks {

		chkNew, totalAppended, err := appendChunks(chks, maxSamples)
		if err != nil {
			return nil, err
		}

		mergedChks = append(mergedChks, chkNew)

		// All existing chunks were merged so no need to continue.
		if len(chks) == totalAppended {
			break
		}
		// Cut the slice so that the next loop run includes
		// only chunks after the last merged chunk.
		chks = chks[totalAppended:]
	}

	return mergedChks, nil
}

// appendChunks re-encodes the leading chunks of chks into a single chunk,
// stopping before the result would exceed maxSamples. It returns the merged
// chunk and the number of input chunks that were consumed.
func appendChunks(chks []chunks.Meta, maxSamples int) (chunks.Meta, int, error) {
	if len(chks) == 1 {
		// A single remaining chunk is carried over unchanged; it still counts
		// as consumed so the caller makes progress.
		return chks[0], 1, nil
	}

	chunkNew := chunkenc.NewXORChunk()
	appNew, err := chunkNew.Appender()
	if err != nil {
		return chunks.Meta{}, 0, err
	}

	var totalAppended int
	for _, chkExisting := range chks {
		// Keep the total samples at or below maxSamples, but always consume
		// at least the first chunk.
		if totalAppended > 0 && chunkNew.NumSamples()+chkExisting.Chunk.NumSamples() > maxSamples {
			break
		}
		it := chkExisting.Chunk.Iterator()
		for it.Next() {
			appNew.Append(it.At())
		}
		if err := it.Err(); err != nil {
			return chunks.Meta{}, 0, err
		}
		totalAppended++
	}

	mergedChk := chunks.Meta{
		MinTime: chks[0].MinTime,
		MaxTime: chks[totalAppended-1].MaxTime,
		Chunk:   chunkNew,
	}
	return mergedChk, totalAppended, nil
}

Signed-off-by: Chris Marchbanks <[email protected]>
@csmarchbanks
Contributor Author

csmarchbanks commented Oct 21, 2018

I added a unit test for mergeChunks and decided to leave the compaction tests in place to supplement the dedicated unit test.

I am going to think a bit more about how to simplify mergeChunks; I am not entirely happy with either approach yet...

EDIT: I ended up doing something similar to your idea, but with a couple of shortcuts still in mergeChunks. Let me know what you think.

Signed-off-by: Chris Marchbanks <[email protected]>
@fabxc
Contributor

fabxc commented Oct 22, 2018

makes the chunk size proportionally larger as the block size increases.

That sounds wrong. The compression ratio nearly reaches its optimal average at about 60 samples, and there is virtually no further gain beyond 120 samples (Figure 6).
All we save is the indexing of those chunks, but index size has not been a major concern IIRC.

Do we have any data that would back up that bigger chunks would become more efficient?
If you keep making chunks bigger and bigger, that would effectively mean spending a ton of CPU on each compaction to re-compress virtually every series, no?
Queries will have to decompress day-long chunks just to access a small window of data within them. Thus a query over 1h and one over 4 weeks could take the same amount of time in unlucky cases (for example, at a 15s scrape interval a 1h window is only ~240 samples, yet it would still require decompressing the entire chunk that contains it).

Please correct me if I misunderstood something.
This seems like an expensive and invasive operation, and we would need some hard data on what improvement it brings. The added benchmark is fairly micro and does not reflect the big picture, I think. There are some impressive improvements for sure; it would be good to verify whether they come from samples actually being compressed in one long sequence or from a simple allocation problem that could just be optimized away.

@krasi-georgiev
Contributor

@csmarchbanks we can probably test all of this roughly with a prombench test. If you open a PR against Prometheus with these changes, I can start a test, leave it running, and look at query timings, CPU usage, etc.

@csmarchbanks
Contributor Author

@fabxc Your understanding is correct: compaction definitely uses more CPU, and I don't know whether it would make a real-world difference. I wanted to get this into a good enough state that it makes sense to test with prombench or a custom build of Prometheus in a more realistic scenario and verify whether there are any noticeable improvements. If not, I will close this PR, because the complexity and extra CPU on compaction are definitely not worth it.

I will look into some allocation optimizations as well. That sounds like an interesting path to pursue.

@krasi-georgiev I will let you know when I make a prometheus PR. Should be sometime today.

@csmarchbanks
Contributor Author

Looking at the benchmarks in prometheus/prometheus#4768, I don't think making bigger chunks is worthwhile. Query performance does not change much, and some compactions take much, much longer (40 minutes vs. 10 minutes).

An alternative could be to merge tiny chunks (say, fewer than 40 samples) into a neighboring chunk to improve compression a bit. However, in my Prometheus instance no more than 1% of chunks would be merged, so I doubt the extra complexity is worthwhile.

@csmarchbanks
Contributor Author

Not worth it
