Compactor: Compaction of Raw Data without cleaning local Disk #1499
Comments
When you run the sidecar to upload the blocks to remote storage, you should have compaction disabled. If you need compaction enabled for some reason, you can set the Prometheus size or time retention flag so that older blocks get deleted.
@krasi-georgiev
@Jakob3xD Oops, sorry for the confusion. (Code reference: lines 1018 to 1021 in commit 865d5ec.)
Are you sure that the compacted blocks don't exceed 1 TB, and that nothing else on that HDD is occupying the space? The cleanup happens while the compactor downloads the blocks, so in theory there could be a race where the old blocks are not yet removed while the new ones are being downloaded.
@krasi-georgiev I only see level=info logs, but I can tell that each time the message occurs, the folder gets fully cleaned up. Just to be clear: my issue is that after the compactor performs the compaction and starts deleting blocks in the object storage, it does not fully clean its local folder. The compactor deletes about 40 GB in the object storage and in its local folder, but 46 GB of data remain in the local folder and never get cleaned up. Does the compactor delete the compacted block from its folder, or just the blocks it compacted?
The code is meant to delete everything in:
To make sure I understand: do you mean the moment when the compactor starts compacting the whole level, i.e. when the log message says caller=compact.go:1035 msg="start of compaction", or the moment when a block gets compacted, i.e. caller=compact.go:440 msg="compact blocks"?
A little bit before:
Yes, that was the simplification we made: we have only one place where data is deleted (the beginning of each compactor iteration), to reduce the chance of forgetting to delete something. (: I think this issue shows that we can't simplify like this, and it would be better to clean things up as soon as possible (as well as at the beginning, in case of a crash or restart).
@krasi-georgiev @bwplotka It seems like you guys misunderstood my issue.
Today a new level 4 compaction started with Thanos v0.8.1, and the issue is still present.
I am not quite sure about all the steps performed during downsampling, because not everything is displayed in the logs. So step 3 looks like this: Do you guys understand my issue? Here are my current compactor settings:
Fixes: #1499
Signed-off-by: Bartek Plotka
Thanks, I made our e2e tests stricter and hopefully fixed this. Each compaction group run could still potentially reuse the disk: #1666
…s. (#1666)
* Fixed compactor tests; moved to full e2e compact test; cleaned metrics.
* Removed block after each compaction group run. Fixes: #1499
* Moved to label hash for dir names for compactor groups. Fixes: #1661
* Addressed comments.
* Addressed comments, rebased.
Signed-off-by: Bartek Plotka
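The fix moved compactor group directories to label-hash names, instead of embedding the raw label set in the path (the error log in this issue shows a directory like `0@{monitor="node",prom="prom-30",...}`). A rough sketch of the idea follows; the helper `groupDirName` is hypothetical, and FNV-1a from Go's standard library stands in for whichever hash Thanos actually uses. The labels are sorted first because Go map iteration order is random.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// groupDirName builds a filesystem-friendly directory name for a compaction
// group from its resolution and external labels (hypothetical helper).
func groupDirName(resolution int64, lbls map[string]string) string {
	names := make([]string, 0, len(lbls))
	for name := range lbls {
		names = append(names, name)
	}
	sort.Strings(names) // stable order so the hash does not depend on map iteration

	h := fnv.New64a()
	for _, name := range names {
		// 0xff never appears in valid UTF-8, so it safely separates fields
		// and keeps "ab"+"c" distinct from "a"+"bc".
		h.Write([]byte(name))
		h.Write([]byte{0xff})
		h.Write([]byte(lbls[name]))
		h.Write([]byte{0xff})
	}
	return fmt.Sprintf("%d@%d", resolution, h.Sum64())
}

func main() {
	lbls := map[string]string{
		"monitor": "node",
		"prom":    "prom-30",
		"region":  "fsn1",
		"replica": "a",
	}
	fmt.Println(groupDirName(0, lbls))
}
```

A hashed name avoids quotes, braces and commas in paths and keeps directory names short no matter how many external labels a group has.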
Thanos, Prometheus and Golang version used
Thanos: v0.7.0 Prometheus: latest (v2.12.0)
What happened
When compacting raw data, the local disk isn't cleaned up after a compaction is done.
In my case, this causes the compactor to use more than 1 TB of space when starting a level 4 compaction.
What you expected to happen
I expected the compactor to clear the data on its local disk after uploading the compacted block to the object storage, as it does for downsampled data.
As expected, the compactor crashed once the 1 TB disk was full.
How to reproduce it (as minimally and precisely as possible):
Have blocks from several sidecars that need to be compacted.
The single larger peak at 11:15 is the end of the level 3 compaction.
Each smaller peak is a new compaction to level 4.
Full logs to relevant components
Once the disk is full:
level=error ts=2019-09-05T16:51:30.809934432Z caller=main.go:213 msg="running command failed" err="error executing compaction: compaction failed: compaction failed for group 0@{monitor="node",prom="prom-30",region="fsn1",replica="a"}: download block 01DM09QBNY0SDHGPT6J71FPST2: copy object to file: write /tmp/thanos/compact/0@{monitor="node",prom="prom-30",region="fsn1",replica="a"}/01DM09QBNY0SDHGPT6J71FPST2/chunks/000002: no space left on device"
Anything else we need to know