Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add thanos objmeta component for large workload. (#6468) #6553

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

hanjm
Copy link
Member

@hanjm hanjm commented Jul 23, 2023

From #6468 and the discuss at slack , we think it is nesseray to manage block metadata in another more efficiently and cheaper way when thanos blocks grow very high.
may releated issue in the past: #3712 #3150

func (c *lazyOverlapChecker) sync(ctx context.Context) error {
if err := c.bucket.Iter(ctx, "", func(path string) error {

thanos/pkg/block/fetcher.go

Lines 320 to 321 in cdba35b

for id := range ch {
meta, err := f.loadMeta(ctx, id)

thanos/pkg/block/fetcher.go

Lines 857 to 859 in cdba35b

for id := range ch {
m := &metadata.DeletionMark{}
if err := metadata.ReadMarker(ctx, f.logger, f.bkt, id.String(), m); err != nil {

thanos/pkg/compact/compact.go

Lines 1537 to 1540 in cdba35b

for id := range ch {
m := &metadata.NoCompactMark{}
// TODO(bwplotka): Hook up bucket cache here + reset API so we don't introduce API calls .
if err := metadata.ReadMarker(ctx, f.logger, f.bkt, id.String(), m); err != nil {

var list []string
err := cb.Bucket.Iter(ctx, dir, func(s string) error {
list = append(list, s)
return f(s)
}, options...)
remainingTTL := cfg.TTL - time.Since(iterTime)
if err == nil && remainingTTL > 0 {
data, encErr := cfg.Codec.Encode(list)
if encErr == nil {
cfg.Cache.Store(map[string][]byte{key: data}, remainingTTL)

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

This PR add a optional component objmeta, allow fast iter all the block and get block metadata, save the cost from object storage.
*NOTE: This component is only needed when your thanos workload is very large, how large is large? when the number of blocks in object storage is more than 100,000 or the HEAD request to object storage more than 10k per second.

Verification

  1. Unit test.
  2. LOCAL_BUCKET_ENABLED=1 OBJ_META_ENABLED=1 scripts/quickstart.sh

Benchmark

$ go test . -v -test.bench redisBackend -test.run redisBackend
goos: darwin
goarch: arm64
pkg: github.com/thanos-io/thanos/pkg/objmeta
Benchmark_redisBackend_GetBlockMeta
Benchmark_redisBackend_GetBlockMeta-10            160112              7758 ns/op            2064 B/op          7 allocs/op
Benchmark_redisBackend_ExistsBlockMeta
Benchmark_redisBackend_ExistsBlockMeta-10         170720              7168 ns/op              64 B/op          2 allocs/op
PASS
ok      github.com/thanos-io/thanos/pkg/objmeta 3.605s

@sonatype-lift
Copy link

sonatype-lift bot commented Jul 23, 2023

Sonatype Lift is retiring

Sonatype Lift will be retiring on Sep 12, 2023, with its analysis stopping on Aug 12, 2023. We understand that this news may come as a disappointment, and Sonatype is committed to helping you transition off it seamlessly. If you’d like to retain your data, please export your issues from the web console.
We are extremely grateful and thank you for your support over the years.

📖 Read about the impacts and timeline

@@ -307,17 +346,37 @@ done

sleep 0.5

if [ -n "${GCS_BUCKET}" -o -n "${S3_ENDPOINT}" ]; then
if [ -n "${GCS_BUCKET}" -o -n "${S3_ENDPOINT}" -o -n "${LOCAL_BUCKET_ENABLED}" ]; then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.


ℹ️ Expand to see all @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.

@@ -307,17 +346,37 @@ done

sleep 0.5

if [ -n "${GCS_BUCKET}" -o -n "${S3_ENDPOINT}" ]; then
if [ -n "${GCS_BUCKET}" -o -n "${S3_ENDPOINT}" -o -n "${LOCAL_BUCKET_ENABLED}" ]; then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.


ℹ️ Expand to see all @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.

@hanjm hanjm changed the title Add thanos objmeta component (#6468) Add thanos objmeta component for large workload. (#6468) Jul 23, 2023
@yeya24
Copy link
Contributor

yeya24 commented Jul 23, 2023

@hanjm Thanks for the work. Do you have any benchmark/data to share about how much this new component helps?

@hanjm
Copy link
Member Author

hanjm commented Jul 23, 2023

@yeya24 hi, i use it in compactor and thanos store, and reduce all the operation='exist' request against to object storage. the compactor run faster, the thanos store sync faster.
image

benchmark result at local.

$ go test . -v -test.bench redisBackend -test.run redisBackend
goos: darwin
goarch: arm64
pkg: github.com/thanos-io/thanos/pkg/objmeta
Benchmark_redisBackend_GetBlockMeta
Benchmark_redisBackend_GetBlockMeta-10            160112              7758 ns/op            2064 B/op          7 allocs/op
Benchmark_redisBackend_ExistsBlockMeta
Benchmark_redisBackend_ExistsBlockMeta-10         170720              7168 ns/op              64 B/op          2 allocs/op
PASS
ok      github.com/thanos-io/thanos/pkg/objmeta 3.605s

@hanjm hanjm marked this pull request as ready for review July 23, 2023 08:11
@yeya24
Copy link
Contributor

yeya24 commented Jul 23, 2023

With this change #6474, we should have less Exists call and the iteration can be cached.

@hanjm
Copy link
Member Author

hanjm commented Jul 24, 2023

With this change #6474, we should have less Exists call and the iteration can be cached.

it not solved this problem in large workload.

use objstore.WithRecursiveIter will lead to less exists call but lead to more iter call, it need iter all block index and chunks.

when number of block grow, iteration result is cached use a single value, which is a big value and cannot set success to remote cache backend.

@fpetkovski
Copy link
Contributor

That's really interesting feedback, do have some rough numbers about how big the result of a recursive iter is? Maybe there is something we can do to make it smaller so it fits in cache.

@GiedriusS
Copy link
Member

GiedriusS commented Jul 24, 2023

Yeah, this seems a bit overkill. We've heard feedback multiple times that Thanos is already too hard to understand. Having a component like this adds significant complexity. Perhaps we could fix caching instead? Redis/memcached/etc. seem perfectly fitted for such use-case IMHO. For example, we could use Redis lists to store this data.

@hanjm
Copy link
Member Author

hanjm commented Jul 24, 2023

do have some rough numbers about how big the result of a recursive iter is?

The length of block id is 26 bytes, for example 1 million blocks blocks got 24MB. it is too large for remote cache.
after #6474 will be bigger beacuse all the object names added, even we can compress it use gzip or dod, but it not solved completely when blocks number grow.

this seems a bit overkill.

For large workload it is not. so it is a optional component only need for large workload.

***NOTE:** This component is only needed when your thanos workload is very large, how large is large? when the number of blocks in object storage is more than 100,000 or the HEAD request to object storage more than 10k per second.

Perhaps we could fix caching instead?

May be, but it not solved completely.

  1. Cache interface is simply set/get bytes, not support complex data structure. memcached can not store list data structure
  2. when the list is appending but not completed, the reader in another component will got a incompleted view.
  3. More than one component write the same iter/meta cache independently, seems not a good pattern. in distributed system, the writer to storage should be Single Responsibility Principle, just like Thanos Philosophy says "Each subcommand should do one thing and do it well"

    thanos/README.md

    Lines 60 to 69 in cdba35b

    ## Thanos Philosophy
    The philosophy of Thanos and our community is borrowing much from UNIX philosophy and the golang programming language.
    * Each subcommand should do one thing and do it well
    * e.g. thanos query proxies incoming calls to known store API endpoints merging the result
    * Write components that work together
    * e.g. blocks should be stored in native prometheus format
    * Make it easy to read, write, and, run components
    * e.g. reduce complexity in system design and implementation

hanjm added 12 commits July 28, 2023 22:06
quickstart.sh support thanos objmeta
support delete-marker and no-compact-marker,  multi backend
make backend as interface

Signed-off-by: Jimmie Han <[email protected]>
change redis client
fix bucket construct

Signed-off-by: Jimmie Han <[email protected]>
Signed-off-by: Jimmie Han <[email protected]>
Signed-off-by: Jimmie Han <[email protected]>
Signed-off-by: Jimmie Han <[email protected]>
Signed-off-by: Jimmie Han <[email protected]>
Signed-off-by: Jimmie Han <[email protected]>
Signed-off-by: Jimmie Han <[email protected]>
Signed-off-by: Jimmie Han <[email protected]>
Signed-off-by: Jimmie Han <[email protected]>
@hanjm hanjm force-pushed the feature/jimmiehan-add-redis-meta-base-main branch from 8db7276 to 1692dc4 Compare July 28, 2023 14:09
@hanjm hanjm force-pushed the feature/jimmiehan-add-redis-meta-base-main branch from 1692dc4 to c4b94cc Compare July 28, 2023 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants