Cannot handle a rare case for flash back table #1664

Closed
zanmato1984 opened this issue Mar 26, 2021 · 1 comment · Fixed by #8422

The current tombstone + flash back strategy cannot handle the following rare case:

  • Alter the table by widening a column
  • Insert a row whose value in the widened column is out of range for the type before widening
  • Drop the table
  • TiFlash does a full schema sync and directly tombstones the table (with the alter diff missed)
  • TiFlash flushes the newly inserted row into storage

In such a case, TiFlash cannot decode this row and fails with the error "overflow detected".

But as you can see, this is an extremely rare case.
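
To make the failure concrete, here is a minimal sketch of decoding a row against a stale column type. It is a toy model, not TiFlash's decoder: the `tinyint` to `int` widening, the value 1000, and the names `INT_RANGES`, `OverflowDetected`, and `decode_int` are all assumptions chosen for illustration.

```python
# Hypothetical model of decoding a row with a stale column type.
# Names, types, and ranges are illustrative, not TiFlash's actual decoder.

INT_RANGES = {
    "tinyint": (-128, 127),
    "int": (-2**31, 2**31 - 1),
}


class OverflowDetected(Exception):
    """Raised when a value does not fit the column type used for decoding."""


def decode_int(value: int, column_type: str) -> int:
    """Return the value if it fits the given integer type, otherwise raise."""
    lo, hi = INT_RANGES[column_type]
    if not lo <= value <= hi:
        raise OverflowDetected(
            f"overflow detected: {value} does not fit {column_type}"
        )
    return value


# Assumed scenario: TiDB widened the column from tinyint to int and then
# accepted the value 1000, which is valid only for the widened type.
tidb_schema = {"id": "int"}
# TiFlash tombstoned the table during a full sync and missed the alter diff,
# so it still holds the narrow type.
tiflash_schema = {"id": "tinyint"}

row = {"id": 1000}

# Decoding with the up-to-date schema succeeds.
decoded_with_fresh = decode_int(row["id"], tidb_schema["id"])

# Decoding with the stale schema fails.
try:
    decode_int(row["id"], tiflash_schema["id"])
    stale_error = None
except OverflowDetected as exc:
    stale_error = str(exc)
```

The row itself is valid; only the schema used to decode it is out of date, which is why the error appears on the TiFlash side alone.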

@zanmato1984 zanmato1984 self-assigned this Mar 26, 2021
@zanmato1984 zanmato1984 added type/bug The issue is confirmed as a bug. severity/moderate labels Mar 26, 2021

JaySon-Huang commented Mar 27, 2021

I've hit another corner case that silently loses data after flashing back a table:

How to reproduce

If the DBGInvoke __refresh_schemas() call on line 4 of the test case below is skipped, the final query returns 0 rows from TiFlash.

### Test case for applying raft snapshot for tombstoned table
mysql> create table test.t(id int);
# It is important that TiFlash has synced the table schema
>> DBGInvoke __refresh_schemas()      <--- comment out this line

# Insert some records
mysql> insert into test.t values (3),(4);

# Enable the failpoint and make it pause before applying the raft snapshot
>> DBGInvoke __init_fail_point()
>> DBGInvoke __enable_fail_point(pause_before_apply_raft_snapshot)
>> DBGInvoke __enable_fail_point(pause_until_apply_raft_snapshot)
mysql> alter table test.t set tiflash replica 1;

# Drop table and force sync schema to make sure table in TiFlash is tombstoned.
mysql> drop table test.t;
>> DBGInvoke __refresh_schemas()

# Wait for a while so that the region snapshot is sent to TiFlash by the Region leader
SLEEP 3
# Disable the failpoint to apply writes even if the storage is tombstoned.
>> DBGInvoke __disable_fail_point(pause_before_apply_raft_snapshot)

# Wait till the snapshot is applied
>> DBGInvoke __wait_fail_point(pause_until_apply_raft_snapshot)

# Recover table and force sync schema to make sure table in TiFlash is recovered.
mysql> recover table test.t;
>> DBGInvoke __refresh_schemas()

func> wait_table test t

# Read again, the records should appear.
mysql> set session tidb_isolation_read_engines='tiflash'; select * from test.t;  <-- 0 rows are returned from TiFlash
+------+
| id   |
+------+
|    3 |
|    4 |
+------+

mysql> drop table if exists test.t;

Why data is lost

  • Create a table t with some rows
  • Create a TiFlash replica for t in TiDB
  • Drop the table t
  • Somehow the table t is never synced to the TiFlash node (important)
    • Maybe the interval between creating and dropping the table is short
    • Or maybe the TiFlash node is down for a long time, and the table is created and dropped while it is down
  • The table t is tombstoned, but the placement rule is still set for t, so the Raft leader sends a snapshot to the TiFlash node
  • The TiFlash node tries to apply the snapshot to storage
    • It finds that the storage for t does not exist ("get a nullptr") and triggers a schema sync
    • SchemaSyncer gets the latest schema snapshot; t is tombstoned and absent from that snapshot, so it is not created on the TiFlash node (important)
    • It tries to decode the data again but still cannot find t in TiFlash, so it ignores the snapshot
  • Recover table t in TiDB
  • TiFlash syncs the table t, but no snapshot is applied
  • The rows stored in TiKV are not consistent with those in TiFlash (!!!)
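
The sequence above can be replayed with a small toy model. This is only a sketch under stated assumptions: the `TiDB` and `TiFlashNode` classes and their methods are hypothetical stand-ins, the schema snapshot is reduced to a set of visible table names, and the Raft snapshot is reduced to a list of rows.

```python
# Toy replay of the data-loss sequence. All classes and methods here are
# hypothetical simplifications, not TiDB or TiFlash APIs.

class TiDB:
    def __init__(self):
        self.tables = {}      # visible tables: name -> rows
        self.tombstoned = {}  # dropped but still recoverable tables

    def create(self, name, rows):
        self.tables[name] = list(rows)

    def drop(self, name):
        self.tombstoned[name] = self.tables.pop(name)

    def recover(self, name):
        self.tables[name] = self.tombstoned.pop(name)

    def schema_snapshot(self):
        # The latest schema snapshot lists only visible tables.
        return set(self.tables)


class TiFlashNode:
    def __init__(self):
        self.storages = {}  # name -> rows held by TiFlash

    def sync_schema(self, tidb):
        # Create storages only for tables present in the schema snapshot.
        for name in tidb.schema_snapshot():
            self.storages.setdefault(name, [])

    def apply_snapshot(self, tidb, name, rows):
        if name not in self.storages:
            self.sync_schema(tidb)  # storage missing: trigger a schema sync
        if name not in self.storages:
            return False            # still missing: the snapshot is ignored
        self.storages[name].extend(rows)
        return True


tidb = TiDB()
tiflash = TiFlashNode()

tidb.create("t", [3, 4])
raft_snapshot = list(tidb.tables["t"])  # rows the Raft leader will send
tidb.drop("t")                          # TiFlash never saw t before the drop

applied = tiflash.apply_snapshot(tidb, "t", raft_snapshot)

tidb.recover("t")
tiflash.sync_schema(tidb)               # t now exists in TiFlash, but empty

tikv_rows = tidb.tables["t"]
tiflash_rows = tiflash.storages["t"]
print(applied, tikv_rows, tiflash_rows)
```

In this model the snapshot is dropped because the schema sync cannot see the tombstoned table, and nothing re-sends the snapshot after recovery, so the two row sets diverge.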

ti-chi-bot bot pushed a commit that referenced this issue Jun 15, 2023
ti-chi-bot bot pushed a commit that referenced this issue Nov 30, 2023
ti-chi-bot bot pushed a commit that referenced this issue Dec 7, 2023