Proposal: Delete series for object storage #2780
Conversation
cc @metalmatze @bwplotka :)
Some comments/suggestions but overall good work! Concise and easy to read. I don't think we should delve too much into the exact API contracts as that will most likely be done when you are going to code this. Also, I like the "undeletions" functionality. It's definitely a very nice touch!
pkg/cacheutil/memcached_client.go (Outdated)

```diff
@@ -269,7 +269,7 @@ func (c *memcachedClient) Stop() {
 	c.workers.Wait()
 }
 
-func (c *memcachedClient) SetAsync(ctx context.Context, key string, value []byte, ttl time.Duration) error {
+func (c *memcachedClient) SetAsync(_ context.Context, key string, value []byte, ttl time.Duration) error {
```
Seems like the changes in this file are unrelated? 👁️
This was done by me in PR #2775. I think this proposal branch was created from that PR branch or something like that.
* **Perspectives to deal with Downsampling of blocks having tombstones:**
    * **Block with tombstones and max duration to perform deletes passed:** The compactor should first check the maximum duration to perform deletions for that block and if the proposed maxtime passed the deletions are first performed and then the downsampling occurs.
    * **Block with tombstones and max duration still hasn’t passed:** Perform compaction.
    * **Performing deletes on already compacted blocks:** Have a threshold to perform deletions on the compacted blocks (Prometheus has 5%)
Could we elaborate more on what this threshold is?
This threshold is something that Prometheus has as well. We need to check the exact value and maybe we should link it here?
For now we want to start with the 5% that Prometheus has.
We want to follow up on that threshold on IRC/Matrix and ask why it was chosen. In the future, we can always be smarter about this.
Adding this to the Prometheus community meeting, happening next week :)
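For reference, a minimal sketch of how such a ratio check could look. The `BlockStats` type and field names are hypothetical, and the 5% figure just mirrors the Prometheus compactor heuristic mentioned above:

```go
package deletions

// BlockStats is a hypothetical summary of a block's index, used only to
// illustrate the threshold discussed above.
type BlockStats struct {
	NumSeries               uint64 // total series in the block
	NumSeriesWithTombstones uint64 // series covered by tombstones
}

// shouldRewrite reports whether applying deletions to an already
// compacted block is worthwhile: only when more than the threshold
// (5% to start, as in Prometheus) of its series are tombstoned.
func shouldRewrite(s BlockStats, threshold float64) bool {
	if s.NumSeries == 0 {
		return false
	}
	return float64(s.NumSeriesWithTombstones)/float64(s.NumSeries) > threshold
}
```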
1. External tool operating to perform deletions on the object storage. (Details still to be discussed)
It's probably unrealistic to have this since the tool would have to coordinate its actions with potentially more than 1 compactor running on a bucket. IMHO it's not worth considering this as a feasible option.
You think we should remove it from the proposal entirely?
* The details are entered via a deletions API (good to have a web UI interface) and they are processed by the compactor to create a tombstone file, if there’s a match for the details entered. Afterwards the tombstone file is uploaded to object storage making it accessible to other components.
* If the data with the requested details couldn’t be found in the storage an error message is returned back as a response to the request.
* Store Gateway masks the series on processing the tombstones from the object storage.
* **Perspectives to deal with Downsampling of blocks having tombstones:**
I guess it's also worth noting that we shouldn't consider blocks that have a deletion-mark.json for the deletion API operations, i.e. blocks that are about to be completely deleted themselves.
Yeah! Definitely that's a good point
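To illustrate the masking idea discussed here (purely an illustration with assumed types, not the actual store gateway code), a sketch of dropping samples covered by a tombstone; blocks carrying a deletion-mark.json would simply be excluded before this step:

```go
package deletions

import "github.com/prometheus/prometheus/pkg/labels"

// Tombstone is a hypothetical in-memory form of a tombstone entry:
// matchers plus the time range (in milliseconds) to mask.
type Tombstone struct {
	Matchers []*labels.Matcher
	MinTime  int64
	MaxTime  int64
}

// Sample is a single timestamp/value pair.
type Sample struct {
	T int64
	V float64
}

// maskSeries returns the series' samples with every sample removed that
// falls into a tombstone interval whose matchers all match the label set.
func maskSeries(lset labels.Labels, samples []Sample, tombstones []Tombstone) []Sample {
	out := make([]Sample, 0, len(samples))
	for _, s := range samples {
		if !deleted(lset, s.T, tombstones) {
			out = append(out, s)
		}
	}
	return out
}

func deleted(lset labels.Labels, t int64, tombstones []Tombstone) bool {
	for _, ts := range tombstones {
		if t < ts.MinTime || t > ts.MaxTime {
			continue
		}
		match := true
		for _, m := range ts.Matchers {
			if !m.Matches(lset.Get(m.Name)) {
				match = false
				break
			}
		}
		if match {
			return true
		}
	}
	return false
}
```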
* **Block with tombstones and max duration still hasn’t passed:** Perform compaction.
* **Performing deletes on already compacted blocks:** Have a threshold to perform deletions on the compacted blocks (Prometheus has 5%)
* For undoing deletions of a time series there are two proposed ways
* API to undelete a time series - maybe delete the whole tombstones file?
What if a deletion happens during a compaction pass? In that case, we most likely should return an error to the caller, right?
The compactor is continuously running compactions, and returning errors to the caller might result in a lot of errors all the time, which would be annoying.
For now, we are going to check how Prometheus handles this.
* Tombstones should be append only, so that we can solve these during compaction.
Solve these
... what is "these"? :P
tombstones, will update.
* Tombstones should be append only, so that we can solve these during compaction.
* We don’t want to add this feature to the sidecar. The sidecar is expected to be kept lightweight.
👍 concur with this
* We propose to implement deletions using the tombstones approach.
* A user is expected to enter the following details for performing deletions:
    * **external labels**
Why external labels and not metric labels?
Yes, that is absolutely correct. We actually want to use label matchers here, so that we can match metric labels as well as external labels.
* **start timestamp**
* **end timestamp** (start and end timestamps of the series data the user expects to be deleted)
* **maximum waiting duration for performing deletions** where a default value is considered if explicitly not specified by the user. (only after this time passes the deletions are performed by the compactor in the next compaction cycle)
* The details are entered via a deletions API (good to have a web UI interface) and they are processed by the compactor to create a tombstone file, if there’s a match for the details entered. Afterwards the tombstone file is uploaded to object storage making it accessible to other components.
The details are entered via a deletions API (good to have a web UI interface) and they are processed by the compactor
This means the API is exposed by the compactor. I personally see the compactor more as an "off-the-grid" component and I'm not sure that design-wise it's good for it to expose this API.
The compactor is one of the very few components (next to Sidecar, Receiver, Ruler) that has write capabilities to object storage. At the same time it is only ever running once per object storage, so that we have a single writer to the deletion tombstones.
Additionally, by design, we thought that having the compactor "off-the-grid" and thus only accessible by an administrator solves a few permission problems right away.
We'll add this point to the proposal.
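To make the inputs quoted above concrete, here is a hedged sketch of what a deletion request payload could look like. Every field name is an assumption for illustration, not a settled API:

```go
package deletions

import "time"

// DeleteRequest collects the details a user is expected to provide, as
// listed in the proposal: label matchers, the time range to delete, and
// the maximum waiting duration before the compactor acts on the request.
type DeleteRequest struct {
	// Matchers in PromQL selector syntax, e.g. `{job="app", region=~"eu-.*"}`.
	Matchers string `json:"matchers"`
	// Start and End bound the series data to be deleted.
	Start time.Time `json:"start"`
	End   time.Time `json:"end"`
	// MaxWait is the grace period before deletions are performed in the
	// next compaction cycle; a server-side default applies if zero.
	MaxWait time.Duration `json:"max_wait,omitempty"`
}
```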
* Add the deletion API (probably compactor) that only creates tombstones
* Store Gateway should be able to mask based on the tombstones from object storage
* A web UI for the deletion API?
I would keep this proposal smaller and remove the UI part from this proposal.
Totally, this proposal in itself is quite complex and we will remove the UI for now. Later, we will follow up on this.
# Delete series for object storage

**Date:** <2020-06-17>
It has to have a certain format, please see the other proposals, something like:

```yaml
---
title: Thanos Remote Write
type: proposal
menu: proposals
status: accepted
owner: brancz
---
```
@Harshitha1234, @bwplotka and I went through these comments during our weekly meeting. :)
Thank you for the feedback, all the points were really worth noting 💪
* **Why Compactor**? : The compactor is one of the very few components (next to Sidecar, Receiver, Ruler) that has write capabilities to object storage. At the same time it is only ever running once per object storage, so that we have a single writer to the deletion tombstones.
* If the data with the requested details couldn’t be found in the storage an error message is returned back as a response to the request.
* Store Gateway masks the series on processing the tombstones from the object storage.
* **Perspectives to deal with Downsampling of blocks having tombstones:**
**Perspectives to deal with Downsampling of blocks having tombstones:**
Unfinished?
Oh, you mean Compactions maybe?
My bad! 🤦
Suggested change:

```diff
-* **Perspectives to deal with Downsampling of blocks having tombstones:**
+* **Perspectives to deal with `compaction` of blocks having tombstones:**
```
* If the data with the requested details couldn’t be found in the storage an error message is returned back as a response to the request.
* Store Gateway masks the series on processing the tombstones from the object storage.
* **Perspectives to deal with Downsampling of blocks having tombstones:**
* **Block with tombstones and max duration to perform deletes passed:** The compactor should first check the maximum duration to perform deletions for that block and if the proposed maxtime passed the deletions are first performed and then the downsampling occurs.
Suggested change:

```diff
-* **Block with tombstones and max duration to perform deletes passed:** The compactor should first check the maximum duration to perform deletions for that block and if the proposed maxtime passed the deletions are first performed and then the downsampling occurs.
+* **Block with tombstones and max duration to perform deletes passed:** The compactor should first check the maximum duration to perform deletions for that block and if the proposed maxtime passed the deletions are first performed and then the compaction occurs.
```
* **label matchers**
* **start timestamp**
* **end timestamp** (start and end timestamps of the series data the user expects to be deleted)
* **maximum waiting duration for performing deletions** where a default value is considered if explicitly not specified by the user. (only after this time passes the deletions are performed by the compactor in the next compaction cycle)
I think I am missing what this is, can we reword?
Moved this feature to the future work :)
Made new changes as per the discussions in my weekly meeting with @metalmatze @bwplotka. :) TODOs from our meeting:
Hi everyone! Thanks for this proposal, it seems super well thought out! Did leave some thoughts around the proposed approach, which I hope are useful :)
Also, I have some ideas on how to improve the structure, I see we are mixing the alternatives in the "Proposed Approach" section and also having a separate "Alternatives" section. Can we move all the info regarding alternatives to the "Alternatives" section, and keep the approach, problems with approach and action plan together?
This way, we can read everything about each approach separately, but I am not super familiar with the Thanos style guide so feel free to ignore my suggestions :)
* If the data with the requested details couldn’t be found in the storage an error message is returned back as a response to the request.
* Store Gateway masks the series on processing the tombstones from the object storage.
* **Perspectives to deal with Compaction of blocks having tombstones:**
* **Block with tombstones** Have a threshold to perform deletions on the compacted blocks ([In Prometheus](https://github.com/prometheus/prometheus/blob/f0a439bfc5d1f49cec113ee9202993be4b002b1b/tsdb/compact.go#L213), the blocks with big enough time range, that have >5% tombstones, are considered for compaction.) We solve the tombstones, if the tombstones are greater than the threshold, and then perform compaction. If not, we attach the tombstone file to the new block. If multiple blocks are being compacted, we merge the tombstone files of the blocks whose threshold is not met.
If not we attach the tombstone file to the new block. If multiple blocks are being compacted, we merge the tombstone files of the blocks whose threshold is not met.

Hrm, curious why this is different from how Prometheus does it. In Prometheus, if we are compacting multiple blocks together, then we always remove the data regarding tombstones, regardless of the threshold. https://github.com/prometheus/prometheus/blob/348ff4285ffa59907c9d7fd7eb3cb7d748f42758/tsdb/compact.go#L777-L806
Yes. I think the 5% rule should apply only for blocks that have a max level of compaction. Otherwise, all should be applied during compaction, as it's essentially almost free to apply deletions during compaction.
The only caveat was about some delay time for tombstones to ensure undelete. That functionality was however decided to be handled in later efforts, right?
* **Blocks without tombstones:** Downsampling happens...

##### Problems during implementations and possible ideas to solve :
During the implementation phase of this proposal one of the first problems we came across was that we had to pull the index of all the blocks for creating or appending a tombstone.
Why is this a problem? This is what we need to do if we get a query with the matchers as well right?
Yea, the Compactor is not responsible for that. It does not load, or need to load, all the blocks in the system in order to craft tombstones. That's why we cannot use the Prometheus tombstone format and have to figure out something else. If we were required to load indexes for ALL blocks in order to create a deletion request, this would never scale.
We need to delegate the logic of figuring out which series IDs to delete to the leafs for querying, and to compaction for the rewrite.
The tricky part is knowing which blocks have the data to be deleted. We thought about a simple Series call with the skipChunks option.
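A sketch of that skipChunks idea behind a hypothetical interface (the real Thanos Series RPC differs; this only illustrates the "find affected data without touching chunk payloads" shape):

```go
package deletions

import "context"

// SeriesFinder is a hypothetical stand-in for a Store API client that can
// answer "does anything match?" without downloading chunk data.
type SeriesFinder interface {
	// Series reports whether any series matches the selectors within
	// [mint, maxt]; skipChunks asks the store to omit chunk payloads.
	Series(ctx context.Context, matchers []string, mint, maxt int64, skipChunks bool) (bool, error)
}

// matchesAnything lets the deletion path learn cheaply whether a request
// would delete any data at all, without loading block indexes itself.
func matchesAnything(ctx context.Context, f SeriesFinder, matchers []string, mint, maxt int64) (bool, error) {
	return f.Series(ctx, matchers, mint, maxt, true)
}
```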
* **Idea 3:** If we want to have an API for deletions, one way is to have the Store API configured in such a way that the compactor can use it to check for a match.
  *Edge case:* Do we rebuild all the rows of a tombstone? Because we need to be aware that the tombstone is being included (or not) in the given Store API call.

*We're starting with (2) Idea 2*
What does this mean? Are we implementing (2) Idea 2?
Yes, let's be clear about this.
*We're starting with (2) Idea 2*

#### Considerations:
Curious what this Considerations section refers to. Is this for the alternative (2 Idea 2) or for the original proposal?
It was intended for the original proposal :)
Nice, amazing! Thanks for massive input @pracucci @gouthamve on your side. Hopefully we can make something that will be useful for Cortex as well, so let's collab 🤗
## Proposed Approach

* We propose to implement deletions using the tombstones approach.
* To start off we use the Prometheus tombstone file format.
Let's make sure to point out why this is not possible, and let's put that to Alternatives
##### Problems during implementations and possible ideas to solve :
During the implementation phase of this proposal one of the first problems we came across was that we had to pull the index of all the blocks for creating or appending a tombstone.

##### Proposed Alternative Approaches:
Normally we would do an `## Alternatives` section with a couple of alternatives, but with clear separation, e.g.:

## Proposed Solution

* CLI.
* Custom format global - single file per request
* .. elaborate on how and what the challenges / open questions are.

## Alternatives

### Where the Deletion API sits

1. Compactor:
    * Pros: Easiest as it's the single writer in Thanos (but not for Cortex)
    * Cons: Does not have quick access to querying or block indexes
2. Store GW:
    * Pros: Does have quick access to querying or block indexes
    * Cons: There would be many writers, and the store was never meant to write things; it only needed read access

### How we store tombstones and in what format

1. Prometheus format per block
2. Prometheus format global
3. Custom format per block
4. Custom format global - appendable file
We can explain that the **Custom format global - single file per request** idea is good because:

- https://cloud-native.slack.com/archives/CL25937SP/p1595954850430300
- We can easily have multiple writers
- No need to have index data involved

Challenges:

- How to "compact" / "remove" those files once they are applied.
- Do we want those deletions to be applied only during compaction, or also to already compacted blocks? If yes, how? 5%? And how do we tell when it is 5%? What component would do that?
- Any rate limiting? What if there are too many files? Would that scale?
Plus other ideas to talk about later like:
Optimization for file names?
Hey, can we finalize this and make sure it's clean for other ppl? We just shared it with the Cortex devs, they would be happy to contribute (:
Yes, sure. PTAL, I've made the latest changes based on our last discussion about the proposal; if something doesn't seem right please let me know 🙂
I understand this is a first step, but to me it leaves more open questions than answers. I understand the need to "start" with something, so LGTM! In particular:
- I agree with global tombstones (well, I proposed it, so I'm just agreeing with myself)
- It's likely we'll find out that file name format may open to optimisations, but I would leave this decision to the implementation phase
Let's go! 🚀
## Considerations

* The old blocks and the tombstones are deleted during compaction time.
I think this opens up a race condition.
Above it's mentioned this:

Store Gateway masks the series on processing the global tombstone files from the object storage.

If the store gateway scans deleted tombstones before the old blocks are effectively offloaded by the store gateway (we soft delete first, apply a delay, and then hard delete them), we may open a time window during which the tombstone is deleted, the store gateway has removed it from its local map, the old block is still queried, and we re-expose deleted series.
Yes, it has to be delayed-deleted or something else... It's not easy, because:
- How do we know there is no more stuff to delete in the mentioned block?
- What if someone backfills a block that touches a deleted period?
- What if min-time is... not specified? Does it have to be?

Behaviour for all those questions has to be defined in this proposal, explicitly and simply enough for the user of the deletion API to understand, I think (:
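One way to pin down that behaviour, sketched with assumed fields: treat tombstone files like blocks and give them a soft-delete grace period, only hard-deleting a tombstone some time after it has been fully applied. Field names are illustrative, not a settled design:

```go
package deletions

import "time"

// GlobalTombstone is a hypothetical record of a tombstone file's lifecycle.
type GlobalTombstone struct {
	CreatedAt time.Time  // when the deletion request was made
	AppliedAt *time.Time // when all affected blocks were rewritten; nil while pending
}

// canHardDelete allows removing the tombstone file only after it has been
// applied everywhere and a grace period has elapsed, shrinking the window
// in which a store gateway could still serve the old, unrewritten blocks.
func canHardDelete(t GlobalTombstone, grace time.Duration, now time.Time) bool {
	return t.AppliedAt != nil && now.Sub(*t.AppliedAt) > grace
}
```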
## Proposed Approach

* We propose to implement deletions via the tombstones approach using a CLI tool.
* A tombstone is proposed to be a **Custom format global - single file per request**.
I haven't found the exact format of this file documented. Anyway, I would add the timestamp of the deletion request inside as well. That would allow building hard-deletion grace periods without having to read it from the object storage (object creation).
still missing @Harshitha1234
Would be nice to finally merge. Seems like we at least agree on the first steps so far 😊
Thanks!
Some further comments. I think we have some more questions to define in this proposal. Let's try to find a simple solution that will work (: and simplify things for the first iteration. What matters is that it has to be simple to use.
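Since the exact file format is not documented in the proposal, here is one hedged possibility for the "custom format global - single file per request" tombstone, including the request timestamp asked for above. All field names are assumptions:

```go
package deletions

import (
	"encoding/json"
	"time"
)

// TombstoneFile is a guess at a per-request global tombstone: small,
// self-contained, and uploadable as a single object per deletion request.
type TombstoneFile struct {
	Matchers     []string  `json:"matchers"`      // label matcher strings
	MinTime      int64     `json:"min_time"`      // range start, Unix ms
	MaxTime      int64     `json:"max_time"`      // range end, Unix ms
	CreationTime time.Time `json:"creation_time"` // when the request was made
	Author       string    `json:"author,omitempty"`
	Reason       string    `json:"reason,omitempty"`
}

// Encode serializes the tombstone to JSON, ready to be uploaded under a
// unique object name (one file per request).
func (t TombstoneFile) Encode() ([]byte, error) {
	return json.Marshal(t)
}
```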
* We propose to implement deletions via the tombstones approach using a CLI tool.
* A tombstone is proposed to be a **Custom format global - single file per request**.
* **Why Custom format global - single file per request**? :
    * https://cloud-native.slack.com/archives/CL25937SP/p1595954850430300
? We need more details, can we use a full sentence?
Also, the design doc is meant to make sure we have the full view, so I would rather explain it here instead of linking to some further reading (:
* During compaction or downsampling, we check if the blocks being considered have series corresponding to a tombstone file, if so we delete the data and continue with the process.

## Open Questions
* Optimization for file names?
Can you describe more why we would need this? (:
## Open Questions
* Optimization for file names?
* Do we create a tombstone irrespective of the presence of the series with the given matcher? Or should we check for the existence of series that corresponds to the deletion request?
I think this should be moved to the proposal - we would love to avoid doing this.
We could have this in alternatives
## Open Questions
* Optimization for file names?
* Do we create a tombstone irrespective of the presence of the series with the given matcher? Or should we check for the existence of series that corresponds to the deletion request?
* Should we consider having a threshold for compaction or downsampling where we check all the tombstones and calculate the percentage of deletions?
Yes - this should be covered as well here 🤔
As per the "how to delete stuff from already compacted blocks" section.
Let's get back to this... is it ready for review? 🤗
Yes, it is :)
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Still valid! @bwplotka Are there any key changes that we expect in this proposal? :)
No, just implement this 🤗
Merged! Thanks @Harshitha1234 🤗
Oh yeah! Let's do this then!! 💪
## Future Work

* Performing actual deletions in the object storage and rewriting the blocks based on tombstones by the compactor.
This question is still unknown to me. How does compactor planning work with global tombstones?
Or do we just not do anything in the planner, and only delete data when compaction triggers automatically, or perform a block rewrite using the rewrite tool?
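One possible (and purely illustrative) answer: leave the planner untouched and, whenever a compaction or an explicit rewrite runs, resolve just the tombstones that overlap the blocks at hand. All types here are hypothetical:

```go
package deletions

// BlockMeta is a hypothetical slice of block metadata relevant here.
type BlockMeta struct {
	ID      string
	MinTime int64
	MaxTime int64
}

// TombstoneRange is the time range a global tombstone covers.
type TombstoneRange struct {
	MinTime int64
	MaxTime int64
}

// overlapping returns the tombstones intersecting a block's time range;
// a compaction pass would resolve only these while rewriting the block,
// leaving the planning logic itself unchanged.
func overlapping(b BlockMeta, ts []TombstoneRange) []TombstoneRange {
	var out []TombstoneRange
	for _, t := range ts {
		if t.MinTime <= b.MaxTime && t.MaxTime >= b.MinTime {
			out = append(out, t)
		}
	}
	return out
}
```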
Changes: Design proposal for delete series #1598