fetcher: add filter to remove old blocks if new overlapping blocks found #2071
Conversation
Force-pushed 21f3fc4 to 2059766 (Compare)
Nice! Good work, this looks like it will work... but only for the basic case of blocks:
[A:A] [B:B] [C:C] [D:ABC]
This is great, however if we have only compaction levels higher than 2 we can have e.g.:
[H:ABC][I:DFG][J:ABCDEFG]
which will not be detected here.
I would say even for development purposes having a table unit test is really helpful for yourself! In this case I would even start with the unit test (test-driven development!). You specify the use cases, you implement the code and verify in seconds ❤️ This is what I would do (:
For your information, this is what Prometheus/TSDB does when compacting two blocks together and how it combines Sources: https://github.com/prometheus/prometheus/blob/dee6981a6c78dfc28d000f86e6a6639cc37150b3/tsdb/compact.go#L332
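For illustration only, a minimal sketch of that idea (not the exact Prometheus code; the function name is hypothetical): the compaction result's Sources ends up as the deduplicated union of the parents' Sources, kept in a deterministic order.

package block

import (
    "sort"

    "github.com/oklog/ulid"
    "github.com/prometheus/prometheus/tsdb"
)

// mergeSources sketches how a compacted block's Compaction.Sources can be
// built: the union of the parents' Sources, deduplicated and sorted.
// The real logic lives in the linked tsdb/compact.go.
func mergeSources(parents ...tsdb.BlockMeta) []ulid.ULID {
    seen := map[ulid.ULID]struct{}{}
    var out []ulid.ULID
    for _, p := range parents {
        for _, s := range p.Compaction.Sources {
            if _, ok := seen[s]; ok {
                continue
            }
            seen[s] = struct{}{}
            out = append(out, s)
        }
    }
    sort.Slice(out, func(i, j int) bool { return out[i].Compare(out[j]) < 0 })
    return out
}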
pkg/block/fetcher.go
Outdated
return &CompactedBlockFilter{}
}

// Filter filters out blocks that filters out blocks that have newer block
"Filter filters out blocks that filters out blocks"? =D
Still not addressed ):
pkg/block/fetcher.go
Outdated
@@ -410,3 +411,52 @@ func (f *LabelShardedMetaFilter) Filter(metas map[ulid.ULID]*metadata.Meta, sync
delete(metas, id)
}
}

// CompactedBlockFilter is a MetaFetcher filter that filters out blocks that have newer block
"have newer block from two or more overlappings blocks"? What does it mean for a block to have blocks? 🤔
Not addressed.
pkg/block/fetcher.go
Outdated
for id, meta := range metas {
sources := make([]ulid.ULID, len(metas))
for _, sourceID := range meta.Compaction.Sources {
if sourceID == id {
Let's use sourceID.Compare(id) == 0 to be safe maybe (:
pkg/block/fetcher.go
Outdated
if areSlicesEqual(sources, meta.Compaction.Sources) {
for _, sourceID := range meta.Compaction.Sources {
if sourceID != id {
delete(metas, sourceID)
Next to deleting, we probably want to store the deleted metas and check if we are ready to delete those blocks from the bucket (:
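For illustration, a rough sketch of that idea (the method name is hypothetical, not the final implementation): the filter remembers what it removed so a later step can decide whether those blocks can be deleted from the bucket.

import (
    "github.com/oklog/ulid"
    "github.com/thanos-io/thanos/pkg/block/metadata"
)

// Hypothetical sketch: next to deleting an entry from the metas map, the
// filter keeps the removed ID so a later step can act on it.
type DeduplicateFilter struct {
    DuplicateIDs []ulid.ULID
}

func (f *DeduplicateFilter) markDuplicate(metas map[ulid.ULID]*metadata.Meta, id ulid.ULID) {
    f.DuplicateIDs = append(f.DuplicateIDs, id)
    delete(metas, id)
}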
Force-pushed bc34b89 to 31c1507 (Compare)
Nice work!
There are some things to improve, but overall good work!
I would focus on:
- simplifying Vertex and removing the forest thing - it's just another vertex.
- the addVertex method name is not enough; it does a lot more than just add.
- avoiding shallow functions/methods.
- removing repetition from test cases.
- being consistent and clear with test case names.
But apart from that this looks amazing! (:
I think once you address those comments (of course, if you don't agree with anything, just tell me!) and we are happy with the functionality, we can look at optimizing things. I can see at least 4 places where we can save some CPU and memory, BUT micro-optimization should be done based on numbers, so we will start by creating some benchmarks and then optimize (:
Do you want to pair-program on the Go benchmark sometime this week? @khyatisoneji
cmd/thanos/compact.go
Outdated
metaFetcher, err := block.NewMetaFetcher(logger, 32, bkt, "", extprom.WrapRegistererWithPrefix("thanos_", reg),
block.NewLabelShardedMetaFilter(relabelConfig).Filter,
(&consistencyDelayMetaFilter{logger: logger, consistencyDelay: consistencyDelay}).Filter,
compactedBlockFilter.Filter,
Wonder if it wouldn't be better if the Syncer had internal Filter and DuplicatedBlocks() methods.
Syncer is not used by the store component, so we would need to add the filter there explicitly then.
pkg/block/fetcher.go
Outdated
return &CompactedBlockFilter{}
}

// Filter filters out blocks that filters out blocks that have newer block
Still not addressed ):
pkg/block/fetcher_test.go
Outdated
},
},
{
name: "3 non compacted metas and compacted meta of level 2 in bucket",
Again, a meta is never technically compacted -> we need to be consistent and clear in naming here. Also, compaction does not matter here: just which blocks contain the same sources.
For example, this case could be named: Block 4 contains 1, 2, 3.
Also, I believe:
ULID: ULID(4),
Compaction: tsdb.BlockMetaCompaction{
Sources: []ulid.ULID{ULID(1), ULID(2), ULID(3), ULID(4)},
},
... is wrong. This will never happen: since 4 is a compacted block, it will never be part of Sources, I believe (that's why I linked the Prometheus code which does it, to double check) (:
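For illustration, a sketch of what the metas for that case are expected to look like (using the ULID(n) helper from the test file; raw blocks list only themselves as a source, while the compacted block lists the blocks it was built from, not its own ULID):

metas := map[ulid.ULID]*metadata.Meta{
    // Level-1 blocks: each was written directly, so it is its own only source.
    ULID(1): {BlockMeta: tsdb.BlockMeta{ULID: ULID(1), Compaction: tsdb.BlockMetaCompaction{Level: 1, Sources: []ulid.ULID{ULID(1)}}}},
    ULID(2): {BlockMeta: tsdb.BlockMeta{ULID: ULID(2), Compaction: tsdb.BlockMetaCompaction{Level: 1, Sources: []ulid.ULID{ULID(2)}}}},
    ULID(3): {BlockMeta: tsdb.BlockMeta{ULID: ULID(3), Compaction: tsdb.BlockMetaCompaction{Level: 1, Sources: []ulid.ULID{ULID(3)}}}},
    // Level-2 block produced by compacting 1, 2 and 3: its own ULID is not in Sources.
    ULID(4): {BlockMeta: tsdb.BlockMeta{ULID: ULID(4), Compaction: tsdb.BlockMetaCompaction{Level: 2, Sources: []ulid.ULID{ULID(1), ULID(2), ULID(3)}}}},
}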
pkg/block/fetcher_test.go
Outdated
},
},
} {
f := NewCompactedBlockFilter()
Can we add a case with 2 blocks having exactly the same sources?
Yes, it's added with the test case name: two compacted blocks with same sources.
pkg/block/forest.go
Outdated
rootSources := metas[vertex.ID].Compaction.Sources
sources := metas[id].Compaction.Sources

if f.containsSources(rootSources, sources) && f.containsSources(sources, rootSources) { // a block exists with same sources.
This line is quite long, so maybe we can put the comment above? And make it a full sentence? (Capital first letter.)
pkg/block/forest.go
Outdated
for _, a := range s2 {
found := false
for _, e := range s1 {
if a.Compare(e) == 0 {
Nice. I think we can optimize a bit, but this will do for now (:
We can create a Go benchmark for this in the next PR 💪
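For reference, a minimal sketch of what such a benchmark could look like (assuming the source-containment check ends up as a plain containsSources(super, sub) function; the names here are hypothetical):

func BenchmarkContainsSources(b *testing.B) {
    // Two fully overlapping source lists, the interesting case for the filter.
    s1 := make([]ulid.ULID, 100)
    s2 := make([]ulid.ULID, 100)
    for i := range s1 {
        id := ulid.MustNew(uint64(i+1), nil)
        s1[i], s2[i] = id, id
    }
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if !containsSources(s1, s2) {
            b.Fatal("expected s1 to contain all of s2")
        }
    }
}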
Force-pushed efa1d15 to 57a001d (Compare)
@bwplotka I have addressed some of the concerns you raised:
I have one suggestion: we should address deletion in a separate PR. Let's narrow down the scope of this PR to only filtering out the duplicate blocks.
Very good idea. But in this case we should probably not hook this filter into the Compactor yet, I think. We can hook it into the Store Gateway though.
pkg/block/fetcher.go
Outdated
// Filter filters out duplicate blocks that can be formed
// from two or more overlapping blocks that fully submatches the source blocks of the older blocks.
func (f *DeduplicateFilter) Filter(metas map[ulid.ULID]*metadata.Meta, synced GaugeLabeled, _ bool) {
forest := NewVertex(ulid.MustNew(uint64(0), nil), nil)
Still the forest name; maybe just root?
pkg/block/forest.go
Outdated
)

// Vertex type implements a vertex of a Forest.
type Vertex struct {
The filename is still the old one (forest).
pkg/block/forest.go
Outdated
return ulids
}

func (f *Vertex) add(id ulid.ULID, startVertex *Vertex, metas map[ulid.ULID]*metadata.Meta) bool {
add is still cryptic. At least can we have a comment explaining what this method does? (: Because it's not just a vertex - it's a vertex that
Also:
func (f *Vertex) add(id ulid.ULID, startVertex *Vertex, metas map[ulid.ULID]*metadata.Meta) bool {
func (v *Vertex) add(id ulid.ULID, startVertex *Vertex, metas map[ulid.ULID]*metadata.Meta) bool {
pkg/block/forest.go
Outdated
return f.add(id, rootVertex, metas)
}

func (f *Vertex) containsSources(rootSources, sources []ulid.ULID) bool {
Does it have to be a method? Also, I don't get why we need this at all; this method does nothing ): It might be too shallow.
Force-pushed 32c58b6 to 4b69821 (Compare)
Nice! Let's maybe move the add method as suggested and fix the test suggestions, and we are good to start some benchmarking! What do you think? (: In the next PRs obviously... Exciting!
pkg/block/fetcher.go
Outdated
})

for _, meta := range metaSlice {
root.AddMeta(meta.ULID, metas)
Still, I would vote to move it out into a separate function (:
pkg/block/fetcher_test.go
Outdated
{
name: "two blocks with same sources",
input: map[ulid.ULID]*metadata.Meta{
ULID(3): {
not addressed
pkg/block/node.go
Outdated
"github.com/thanos-io/thanos/pkg/block/metadata" | ||
) | ||
|
||
// Node type implements a node of a Forest. |
forest still ):
Suggestions? Node of a tree?
Aren't both test cases relevant? The first one is for the case where two blocks have the same sources, and the second one is where the blocks have overlapping sources.
Force-pushed ce6ee20 to e5b6eb4 (Compare)
pkg/block/node.go
Outdated
return ulids
}

func (n *Node) add(id ulid.ULID, startNode *Node, metas map[ulid.ULID]*metadata.Meta, areEqual, isChildNode compareNodes) bool {
Ok, I said like 4 times to move this out of node and it's still there, and you did not explain why you are against this (:
Maybe I can send a PR to your PR to show you how it can be done and show the advantages? (:
The main problem is that this is just a node. It's not hierarchicalBlocksNode or anything like this. The method is also just add, which suggests nothing is done except adding a node to the tree ... somewhere. This might not be clear for readers of this code, so we need to improve it.
The second argument for why I believe it would be nice to move add to a separate function is extensibility. What if we would like to use our tree with the meta.json data struct for other algorithms? We can't, because add has certain logic baked in.
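To make that argument concrete, a rough sketch (hypothetical names; it deliberately ignores re-parenting when a new block covers existing children): Node keeps only generic tree state, and the dedup-specific traversal lives in a free function next to the filter.

// Node stays a plain tree node with no block-specific behaviour.
type Node struct {
    ID       ulid.ULID
    Children []*Node
}

// addBySources keeps the dedup-specific logic outside of Node: attach id under
// the first child whose Sources fully contain the Sources of id, otherwise
// directly under root. Other algorithms can reuse Node with different logic.
func addBySources(root *Node, id ulid.ULID, metas map[ulid.ULID]*metadata.Meta) {
    for _, child := range root.Children {
        if containsAll(metas[child.ID].Compaction.Sources, metas[id].Compaction.Sources) {
            addBySources(child, id, metas)
            return
        }
    }
    root.Children = append(root.Children, &Node{ID: id})
}

// containsAll reports whether every ULID in sub is also present in super.
func containsAll(super, sub []ulid.ULID) bool {
    for _, s := range sub {
        found := false
        for _, p := range super {
            if s.Compare(p) == 0 {
                found = true
                break
            }
        }
        if !found {
            return false
        }
    }
    return true
}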
Force-pushed e5b6eb4 to 8133cbe (Compare)
pkg/block/fetcher_test.go
Outdated
func TestDeduplicateFilter_Filter(t *testing.T) {
for _, tcase := range []struct {
name string
input map[ulid.ULID]tsdb.BlockMetaCompaction
Maybe just map[ulid.ULID][]ulid.ULID?
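For illustration, the table shape that suggestion points at (a sketch; the helper that expands each entry back into a metadata.Meta inside the test is assumed, not shown):

for _, tcase := range []struct {
    name     string
    input    map[ulid.ULID][]ulid.ULID // Block ID -> its Compaction.Sources.
    expected []ulid.ULID               // IDs that should survive the filter.
}{
    {
        name: "block 4 contains 1, 2, 3",
        input: map[ulid.ULID][]ulid.ULID{
            ULID(1): {ULID(1)},
            ULID(2): {ULID(2)},
            ULID(3): {ULID(3)},
            ULID(4): {ULID(1), ULID(2), ULID(3)},
        },
        expected: []ulid.ULID{ULID(4)},
    },
} {
    t.Run(tcase.name, func(t *testing.T) {
        // Expand tcase.input into map[ulid.ULID]*metadata.Meta here and run the filter.
    })
}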
Force-pushed 90d8998 to 4a96f13 (Compare)
Force-pushed 0b57901 to 03e85d7 (Compare)
Awesome! I think we need to fix the TODO and add more test cases, and we are good to merge and play with some benchmarking ❤️
Thanks!
pkg/block/fetcher.go
Outdated
if metas[id] != nil {
f.DuplicateIDs = append(f.DuplicateIDs, id)
}
// TODO(khyati): Increment synced metrics with proper state.
todo
pkg/block/fetcher_test.go
Outdated
{
name: "compacted block with sources in bucket",
input: map[ulid.ULID][]ulid.ULID{
ULID(4): []ulid.ULID{ULID(1), ULID(2), ULID(3)},
Let's put other blocks here that we don't expect to be compacted
{
name: "two compacted blocks with same sources",
input: map[ulid.ULID][]ulid.ULID{
ULID(3): []ulid.ULID{ULID(1), ULID(2)},
ditto, let's mix it
{
name: "two compacted blocks with overlapping sources",
input: map[ulid.ULID][]ulid.ULID{
ULID(4): []ulid.ULID{ULID(1), ULID(2)},
ditto
pkg/block/fetcher_test.go
Outdated
name: "two compacted blocks with overlapping sources", | ||
input: map[ulid.ULID][]ulid.ULID{ | ||
ULID(4): []ulid.ULID{ULID(1), ULID(2)}, | ||
ULID(5): []ulid.ULID{ULID(1), ULID(2), ULID(3)}, |
Can we reorder the ULIDs in Sources to check if that will still work?
{
name: "3 compacted blocks of level 2 and one compacted block of level 3 in bucket",
input: map[ulid.ULID][]ulid.ULID{
ULID(10): []ulid.ULID{ULID(1), ULID(2), ULID(3)},
ditto, let's mix with some block that should stay
pkg/block/fetcher_test.go
Outdated
input: map[ulid.ULID][]ulid.ULID{
ULID(1): []ulid.ULID{ULID(1)},
ULID(5): []ulid.ULID{ULID(1), ULID(2)},
ULID(6): []ulid.ULID{ULID(1), ULID(2), ULID(3), ULID(4)},
Let's reorder those a bit to make it more tricky. Yes, we put them in a map anyway, but sometimes a map tends to be... accidentally ordered (:
} {
f := NewDeduplicateFilter()
if ok := t.Run(tcase.name, func(t *testing.T) {
synced := prometheus.NewGaugeVec(prometheus.GaugeOpts{}, []string{"state"})
We can assert on this metric!
See our talk: https://are-you-testing-your-observability.now.sh/ - click right until slide 26 and check it out (:
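For example, a small sketch of such an assertion (assuming the filter counts removed blocks under a "duplicate" state on the synced gauge, which is what the TODO above is about; metas and expectedRemovals are placeholder names, promtest is github.com/prometheus/client_golang/prometheus/testutil, and testutil.Equals is the Thanos test helper):

synced := prometheus.NewGaugeVec(prometheus.GaugeOpts{}, []string{"state"})
f.Filter(metas, synced, false)

// One gauge increment per removed duplicate is what we would assert on.
testutil.Equals(t, float64(expectedRemovals), promtest.ToFloat64(synced.WithLabelValues("duplicate")))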
pkg/block/fetcher_test.go
Outdated
ULID(5): []ulid.ULID{ULID(1), ULID(2)},
ULID(6): []ulid.ULID{ULID(1), ULID(2), ULID(3), ULID(4)},
ULID(7): []ulid.ULID{ULID(1), ULID(2), ULID(3)},
ULID(8): []ulid.ULID{ULID(1), ULID(2), ULID(3), ULID(4)},
Let's also swap ULID(7) with ULID(8) (: Make it more tricky!
ULID(7): []ulid.ULID{ULID(1), ULID(2), ULID(3)},
ULID(8): []ulid.ULID{ULID(1), ULID(2), ULID(3), ULID(4)},
},
expected: []ulid.ULID{
Can we add one more case?
The case would be something like
10 containing 1,2,6,7
11 containing 1,2,6,8
(:
And maybe:
10 containing 1,2,6,7
11 containing 1,2,3
03e85d7
to
53507c0
Compare
Added the requested changes:
Force-pushed 53507c0 to 3e0f88e (Compare)
Signed-off-by: khyatisoneji <[email protected]>
Force-pushed 3e0f88e to 025b327 (Compare)
}

func compareSliceWithMapKeys(tb testing.TB, m map[ulid.ULID]*metadata.Meta, s []ulid.ULID) {
_, file, line, _ := runtime.Caller(1)
Wow, this is epic! Probably a bit over-engineered, but... thanks! (:
Changes
Add a filter to filter out old blocks if new overlapping blocks are found that fully submatch the sources of the old blocks.
Please let me know if this is the right direction to go in terms of filtering out the old blocks.