Reading footer of parquet file: error in range read from s3 backend 416 Requested Range Not Satisfiable #4166
I have not seen this particular error; it suggests a corrupt block. The block has a meta.json file with two relevant fields. The "size" field should be the exact size in bytes of the data.parquet file. Is this the case? The footer size is just how far back into the file is read to get the footer. If the data.parquet file is not corrupt, then it would technically be possible to rebuild the meta.json and bloom filters by replaying the data, but I don't think we've written code to do that. It may also be interesting to try to open the data.parquet in an external tool like https://github.com/stoewer/parquet-cli to see if the parquet file itself is corrupt or the meta.json is just wrong. Unless you are just in the mood to deep dive the issue, the easiest fix would be to simply remove the block. |
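The two checks suggested above (does the "size" field match the actual file, and is the file itself intact?) can be scripted. A minimal sketch, assuming a block directory laid out with meta.json and data.parquet side by side; `validate_block` is a hypothetical helper, not Tempo code, and it only checks the trailing "PAR1" parquet magic, not the full footer:

```python
import json
import os


def validate_block(block_dir):
    """Sanity-check a block directory: the "size" field in meta.json
    should equal the byte size of data.parquet, and a valid parquet
    file always ends with the 4-byte magic b"PAR1".
    Returns a list of problems found (empty list means both checks pass)."""
    problems = []

    with open(os.path.join(block_dir, "meta.json")) as f:
        meta = json.load(f)

    parquet_path = os.path.join(block_dir, "data.parquet")
    actual = os.path.getsize(parquet_path)
    declared = meta.get("size")
    if declared != actual:
        problems.append(f"meta.json size {declared} != actual {actual}")

    with open(parquet_path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # last 4 bytes of the file
        if f.read(4) != b"PAR1":
            problems.append("data.parquet does not end with PAR1 magic")

    return problems
```

Running this over a suspect block quickly distinguishes "meta.json is just wrong" from "the parquet file is truncated or corrupt".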
Hi, I am facing a similar issue where the size field in meta.json does not match the size of data.parquet. Any idea or suggestion to fix this? |
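For context on why a mismatched size field surfaces as a 416: if, as the discussion above suggests, the reader computes the footer byte range from the size recorded in meta.json, then a declared size larger than the actual object makes the requested range start past the end of the object, and S3/MinIO answers 416 Requested Range Not Satisfiable. A minimal sketch of the arithmetic (function names are hypothetical, not Tempo's actual code):

```python
def footer_range(declared_size, footer_read_size):
    """Byte range a reader would request to fetch the parquet footer,
    computed from the size recorded in meta.json."""
    start = max(0, declared_size - footer_read_size)
    end = declared_size - 1
    return start, end


def is_range_satisfiable(start, actual_object_size):
    """Per HTTP range semantics (RFC 7233), a range is unsatisfiable
    (416) when its start offset is at or beyond the object's length."""
    return start < actual_object_size
```

For example, with a declared size of 1000 bytes and a 100-byte footer read, the request is for bytes 900-999; if the object is actually only 800 bytes long, that range is unsatisfiable and the backend returns 416.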
Technically you can just delete the block and lose the traces, but I'm surprised we're getting reports of this. We have 10s of millions of blocks in our backends and I don't think we've ever seen this. |
Hi, {"format":"vParquet3","blockID":"000530a5-1e4e-4528-82be-1ac71b093f4f", "minID":"AAApWFvHR6oWqYAmXy4FLA==", "maxID":"///4AfCcpyAPH9fOSh+3ag==", One thing we noticed: we had lots of compactor restarts; after making a few changes to the compactor it became stable, and then this issue started. Config.yaml: cache:
|
Hi everyone, this was also reported at https://community.grafana.com/t/error-reading-footer-of-parquet-file/133954 The commonality between it and the original issue was using MinIO, but the report above is S3. Still related, as all use the same MinIO S3 client. |
We have now seen this issue occur on a level 0 block. We believe this is caused when an ingester flushes a partially completed block. Can those folks who have seen this issue help confirm?
|
Sharing what we have seen to preserve thoughts while they are fresh and to see if this coincides with what others are seeing.
On startup there is no indication this block exists as a completing/WAL block, which suggests the block was completed and removed without this line being logged. In addition, the completion was only partial: the meta.json was written, but the block was not entirely flushed to disk, which suggests a bug in this logic. |
After further research we have determined this was caused by a node failure in one of our clusters. As the ingester was going down, the disk was failing and the ingester was unable to shut down cleanly. On startup the disk was in an unknown state. To fix this, we intend to do better validation on startup and, in particular, test for the case where the data.parquet file is strangely truncated. If you are seeing this issue, we would recommend checking that the ingester is shutting down cleanly. You should see a log line like this when the ingester receives SIGINT:
and a second line like this to confirm it has completely shut down:
|
Describe the bug
Tempo fails to fetch parquet objects from S3/MinIO with this error:
To Reproduce
Steps to reproduce the behavior:
This is our config:
Environment:
Additional Context
This is how the files are organized in the tempo machine
tempo.log