sidecar: Add /api/v1/flush endpoint #7359
base: main
Conversation
Signed-off-by: mluffman <[email protected]>
	BlocksUploaded int `json:"blocksUploaded"`
}

func (s *SidecarAPI) flush(r *http.Request) (interface{}, []error, *api.ApiError, func()) {
Doesn't /api/v1/snapshot just create hardlinks to existing blocks? https://github.com/prometheus/prometheus/blob/main/tsdb/block.go#L687-L696. You will have to somehow filter out the non-head blocks here. I'm not sure how to do that.
Alternatively, Prometheus could have a new parameter to take a snapshot of the head only.
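For illustration, one possible filter, assuming the hard-linked blocks keep their original ULID directory names: a block directory that exists only in the snapshot, and not in the live Prometheus data dir, must have been cut freshly from the head for this snapshot. The helper below is a sketch under that assumption, not code from this PR.

```go
package sketch

import (
	"os"
	"path/filepath"
)

// headOnlyBlocks returns the block directories that exist only in the snapshot
// and not in the live data dir, i.e. blocks that were produced for this
// snapshot rather than hard-linked from already-persisted blocks.
func headOnlyBlocks(dataDir, snapshotDir string) ([]string, error) {
	existing := map[string]struct{}{}
	entries, err := os.ReadDir(dataDir)
	if err != nil {
		return nil, err
	}
	for _, e := range entries {
		if e.IsDir() {
			existing[e.Name()] = struct{}{}
		}
	}

	var fresh []string
	snapEntries, err := os.ReadDir(snapshotDir)
	if err != nil {
		return nil, err
	}
	for _, e := range snapEntries {
		if !e.IsDir() {
			continue
		}
		// Hard-linked copies of persisted blocks keep their original ULID, so
		// they also appear as directories under dataDir; skip those.
		if _, ok := existing[e.Name()]; ok {
			continue
		}
		fresh = append(fresh, filepath.Join(snapshotDir, e.Name()))
	}
	return fresh, nil
}
```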
shipper.Sync only uploads blocks that don't already exist in the bucket: https://github.com/thanos-io/thanos/blob/main/pkg/shipper/shipper.go#L284-L289
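For readers not following the link, the check behind those lines boils down to probing for the block's meta.json in the bucket. A minimal sketch, assuming a thanos-io/objstore Bucket; the helper name and string-typed block ID are illustrative, not the shipper's actual code.

```go
package sketch

import (
	"context"
	"path"

	"github.com/thanos-io/objstore"
)

// alreadyUploaded reports whether a block can be considered present in the
// bucket: by convention a block is complete once its meta.json exists.
func alreadyUploaded(ctx context.Context, bkt objstore.Bucket, blockID string) (bool, error) {
	return bkt.Exists(ctx, path.Join(blockID, "meta.json"))
}
```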
	snapshotDir := s.dataDir + "/" + dir

	s.shipper.SetDirectoryToSync(snapshotDir)
	uploaded, err := s.shipper.Sync(r.Context())
Prometheus might produce a block in the middle of the snapshot call. I think you will have to disable the other syncer before doing anything.
Is the concern that it'd produce an overlapping block? Or that there's a potential race between two shippers Sync-ing at the same time?
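If the worry is only the second one, a blunt way to rule out concurrent Syncs would be to funnel both the periodic loop and the flush handler through one lock. A minimal sketch with hypothetical names, not code from this PR; note it does nothing about the overlapping-block concern itself.

```go
package sketch

import (
	"context"
	"sync"
)

// syncer stands in for the shipper's Sync behaviour.
type syncer interface {
	Sync(ctx context.Context) (int, error)
}

// serializedSyncer wraps a syncer so that only one Sync runs at a time,
// whether it is triggered by the periodic loop or by the flush endpoint.
type serializedSyncer struct {
	mu    sync.Mutex
	inner syncer
}

func (s *serializedSyncer) Sync(ctx context.Context) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.inner.Sync(ctx)
}
```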
	if err != nil {
		return nil, nil, &api.ApiError{Typ: api.ErrorInternal, Err: fmt.Errorf("failed to upload head block: %w", err)}, func() {}
	}
	return &flushResponse{BlocksUploaded: uploaded}, nil, nil, func() {}
Shouldn't we immediately shut down Prometheus after this is done? Otherwise, the same risk of overlapping blocks appears.
That's a good point - if the endpoint is only meant to be called once before shutting down, does it even need to be an endpoint? Should it just be the default behavior of the sidecar shutting down? That (I think) should eliminate the concern of both overlapping blocks and two shippers racing against one another.
Then again, the endpoint does alleviate concerns around ordering the shutdown: prometheus-operator could hit the endpoint and then just delete the statefulset.
I would love it if all of this were handled automatically by Sidecar when it is shutting down, but that would entail deeper integration between Sidecar & Prometheus. In Sidecar, we should only care about the head block and avoid overlaps. If only there were a way to tell Prometheus to trim data from the head block that had been uploaded by Sidecar already. :(
Maybe an alternative would be to shut down Prometheus and then read the HEAD block ourselves from Sidecar? We could produce a block from it, upload it, and then trim the head. What do you think about such an idea?
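A very rough sketch of that flow, assuming an older prometheus/tsdb API: OpenDBReadOnly and FlushWAL do exist, but their exact signatures vary across Prometheus versions, so treat this as pseudocode-grade rather than the code from the gist linked below.

```go
package main

import (
	stdlog "log"
	"os"

	kitlog "github.com/go-kit/log"
	"github.com/prometheus/prometheus/tsdb"
)

// flushHeadToDir opens the Prometheus data dir read-only (after Prometheus has
// stopped) and persists whatever is still only in the WAL/head as a regular
// block under outDir, which the shipper could then upload.
func flushHeadToDir(dataDir, outDir string) error {
	db, err := tsdb.OpenDBReadOnly(dataDir, kitlog.NewLogfmtLogger(os.Stderr))
	if err != nil {
		return err
	}
	defer db.Close()
	// FlushWAL reads the WAL/head and writes it out as a persisted block.
	return db.FlushWAL(outDir)
}

func main() {
	if err := flushHeadToDir("./data", "./flushed-blocks"); err != nil {
		stdlog.Fatal(err)
	}
}
```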
I quite like that idea - I played around with it locally and it does work!
https://gist.github.com/Nashluffy/097e7df7d0a90b0cdefd2b87fb3129c8
But there are a few issues with it, mostly that opening a ReadOnlyDB reads in the WAL before it can be persisted to disk, which depending on the size can be very time consuming & memory intensive. If we did want to take this approach, I think there'd have to be a finalizer on the prometheus-owned statefulsets, and a controller that would remove the finalizer. The controller would create a pod that mounts the volume, opens a ReadOnlyDB, and then persists to disk & uploads. But this is quite a lot of orchestration.
The more I think about this, the more I think persist-head-to-block needs to be an endpoint on prometheus/tsdb that we consume. There's quite an extensive thread about it here: prometheus-junkyard/tsdb#346
If overlapping blocks are not a problem due to vertical compaction being on, it should be clearly noted in the docs and logs.
If we want a flush API, we should probably add it to the shipper so that Ruler and Receiver also get it, right?
Changes
Adds a sidecar API with one endpoint, /api/v1/flush, which calls the TSDB snapshot endpoint on the Prometheus instance, then uploads all blocks in the snapshot that are not already present in object storage. There are a few issues that explain the motivation:
Essentially, if this is the last time the sidecar will be running (i.e. the cluster is being deleted, a shard is being removed, etc.), then without some flushing mechanism you will permanently lose up to 2 hours of data.
Verification
Besides the unit tests, running Prometheus locally and calling the endpoint works as expected.
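For example, the manual check could be a small client like the one below; the :10902 port (the sidecar's default HTTP port) and the use of POST are assumptions about a local default setup, not something this PR pins down.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Trigger the flush endpoint on a locally running sidecar.
	resp, err := http.Post("http://localhost:10902/api/v1/flush", "application/json", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// On success this should include a body carrying blocksUploaded.
	fmt.Println(resp.Status, string(body))
}
```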