Flink Iceberg may produce duplicates when the data file is written and committed but the checkpoint fails #10765
Is this similar to #10526?
I have taken another look at the info you have provided. Do you have any more detailed logs, or any way to reproduce the case? As you highlighted, there are 2 commits with the same data:
This should be prevented by the
@pvary Because there were other Flink apps, there were too many logs, so I am attaching additional logs, filtered by table name.
Checking the log, it seems that there is a "committing" entry at 04:57, but there is no corresponding "committed" entry.
Which version of Flink are you using, btw?

I see this in the logs. This means that Flink doesn't know if checkpoint 19516 (snapshot) was committed or not. So when it recovers, it will find the metadata in the state for that checkpoint.

So we have to check the recovery codepath to understand what is happening: https://github.com/apache/iceberg/blob/apache-iceberg-1.4.3/flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java#L144
What Catalog are you using? Is there any cache, or something else, which might return wrong data for the table? Thanks,
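The recovery check at the linked line can be sketched as a small simulation. This is an illustrative Python stand-in, not the actual Java implementation in `IcebergFilesCommitter`: the snapshot summary keys `flink.job-id` and `flink.max-committed-checkpoint-id` are the ones the Flink sink writes, but the function names and data shapes here are invented for the sketch.

```python
# Illustrative simulation (NOT Iceberg source code) of how the Flink sink
# deduplicates commits on restore. Each snapshot committed by the sink
# carries summary entries "flink.job-id" and
# "flink.max-committed-checkpoint-id".

def max_committed_checkpoint_id(snapshots, job_id):
    """Walk snapshots from newest to oldest; return the highest checkpoint
    id this Flink job has already committed, or -1 if none is found."""
    for snap in reversed(snapshots):  # list is ordered oldest -> newest
        summary = snap["summary"]
        if summary.get("flink.job-id") == job_id:
            return int(summary["flink.max-committed-checkpoint-id"])
    return -1

def checkpoints_to_replay(pending, snapshots, job_id):
    """On restore, only replay checkpoints newer than the last committed
    one. If this filter sees stale table metadata, an already-committed
    checkpoint is replayed and its data files are committed twice, which
    is the duplication reported in this issue."""
    max_committed = max_committed_checkpoint_id(snapshots, job_id)
    return [cp for cp in pending if cp > max_committed]

snapshots = [
    {"summary": {"flink.job-id": "job-A",
                 "flink.max-committed-checkpoint-id": "19516"}},
]
assert checkpoints_to_replay([19516, 19517], snapshots, "job-A") == [19517]
```

If the metadata read during restore had not yet seen the 19516 snapshot, `max_committed_checkpoint_id` would return an older value and 19516 would be replayed, matching the double commit described above.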
I use Flink version 1.15.4.
We use the Hive catalog, and set 'engine.hive.lock-enabled=true'.

By the way, there seems to be one more strange thing in the log. After processing checkpoint 19516, there should be a log somewhere for checkpoint 19517, but there is no log at all. Because our system performs checkpoints every minute, checkpoint 19517 should have run around 04:58:32, after checkpoint 19516 ran at 04:57:32. However, there is no related log, because the taskmanager was shut down around that time. An attempt was made for 19516 at 04:59:22, and the log at 05:00:00 shows that the 19516 snapshot was successfully committed. But there is no log for 19517 anywhere, and there is no record of 19517 even in metadata.json! Is this a normal situation during recovery?
Thanks!
I think this is the log for the 19517:
The job was recovering, and not all of the operators were running, so checkpoint 19517 failed.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.
Apache Iceberg version
1.4.3
Query engine
Flink
Please describe the bug 🐞
It seems like very rare duplicates occur in Flink Iceberg.
Let me explain the situation I experienced:
Result:
Looking at the situation, the restore used a checkpoint ID whose checkpoint had completed, and the commits were performed again up to that completed checkpoint.
As a result, the commit for checkpoint ID 19516 was performed twice, pointing to the same data file.
When I read the table in Trino, the same data is read twice and appears duplicated.
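Why the same data file committed twice shows up as duplicate rows can be seen with a toy model (pure Python, not Iceberg or Trino code; all names here are invented for illustration): a scan reads every live manifest entry, so a file referenced by two commits contributes its rows twice.

```python
# Toy model (NOT Iceberg code): a table scan reads every live manifest
# entry. If two commits both appended the same data file, the file path
# appears twice among the entries and its rows are returned twice.

def scan(manifest_entries, files):
    """Return all rows reachable from the given manifest entries.
    Duplicated entries => duplicated rows in the result."""
    rows = []
    for path in manifest_entries:
        rows.extend(files[path])
    return rows

files = {"data-00001.parquet": [("k1", 1), ("k2", 2)]}

once = scan(["data-00001.parquet"], files)                        # healthy table
twice = scan(["data-00001.parquet", "data-00001.parquet"], files)  # double commit
assert len(twice) == 2 * len(once)
```

This also hints at why delete-by-predicate can report "more deleted rows than exist in the file": a positional delete written against one manifest entry does not account for the second entry referencing the same physical file.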
I tried to delete the duplicate data, but the following error occurred in Trino:
Found more deleted rows than exist in the file
rewrite_file was also performed,
but contrary to the message, the data duplication was not actually resolved.
Expiring snapshots did not work, either.
Finally, I will try to modify the manifest directly. (Addition) After rewriting the files, I found that the data could be deleted without error.
So I created a backup table, backing up the data without the duplicates.
Then I deleted the duplicated data in the original table and copied it back from the backup table.
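The backup-table workaround described above can be sketched as a three-step sequence. This is a pure-Python stand-in for illustration only; in practice each step would be a CREATE TABLE AS / DELETE / INSERT statement in Trino or Spark SQL, and the helper name below is invented.

```python
# Sketch of the backup-table workaround (NOT real table operations):
# 1. copy the de-duplicated rows to a backup table,
# 2. delete the affected rows from the original table,
# 3. copy the rows back from the backup.

def dedup_via_backup(original):
    """De-duplicate a table's rows via a backup copy, preserving order.
    `original` stands in for the original table's rows."""
    backup = list(dict.fromkeys(original))  # step 1: distinct rows, order kept
    original.clear()                        # step 2: delete from original
    original.extend(backup)                 # step 3: restore from backup
    return original

# One row appears twice because of the double commit.
table = [("k1", 1), ("k2", 2), ("k1", 1)]
assert dedup_via_backup(table) == [("k1", 1), ("k2", 2)]
```

The point of the detour through a backup table is that the new table's snapshots no longer reference the doubly-committed data file, so subsequent deletes and reads behave normally.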
Is there a better solution in this situation?
And please check this situation.
Thanks.
Willingness to contribute