-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] Master stale reads: Not specifying read-hybrid-time causes SysCatalog to read stale data #17342
Comments
|
|
@amitanandaiyer I think in the last debugging I did on this, indeed, the iterator would choose a time slightly in the past, due to some safetime logic / iterator creation method argument |
From Sergei on Slack previously (https://yugabyte.slack.com/archives/C03ULJYE0KG/p1663614223304439?thread_ts=1663356281.778909&cid=C03ULJYE0KG)
|
Yes. you're right. This is the same issue you/sergei have been commenting on. SysCatalog/CatalogMaster doesn't specify the read time, so this internally gets translated to However, IMO, the issue is not with using SingleTime, but with getting the safe time with RequireLease::kFalse -- this essentially is a "read from follower" like semantics where we may not see the latest update. using RequireLease::kTrue should fix the issue. I also see a few places in SysCatalog use ReadHybridTime::Max() .. this might work too, in terms of not missing out on the latest write -- however, it seems risky to do this as I think the iterator may see other "future" writes and not get a consistent view. Probably worth revisiting its usage to see if we are ok with this? The reason these "fixes" didn't quite work when you tried is because of the RequestScope issue, which also contributes to stale reads from master. |
@deeps1991, Do you think this should be backported? If so upto what stable branch? cc @amitanandaiyer |
…pecify the ReadHybridTime Summary: Syscatalog currently does not specify a read_hybrid_time when reading the catalog version. This causes the docdb layer to pick up a read-hybrid time without a lease -- which is ok for consistent prefix/stale-reads, however it may not be the most up to date. This can cause stale data to be returned while reading the sys catalog, and may cause issues like in GH 5030. ``` docdb::ReadOperationData read_operation_data = { .deadline = deadline, .read_time = read_hybrid_time ? read_hybrid_time : ReadHybridTime::SingleTime(VERIFY_RESULT(SafeTime(RequireLease::kFalse))), }; ``` Ideally, we'd want the pg layer to specify the read-hybrid-timestamp at which the read needs to be done. Choosing the read-hybrid-time based on SingleTime(SafeTime) at the master, even if done with a lease, could cause a stale read. A transaction that has already been committed on a different node/status-tablet with a future timestamp, could go unseen. An alternative to this is to use ReadHybridTime::kMax, however this may expose future writes and give an inconsistent view of the system. This should generally be avoided. This issue is avoided in the DML case as the client specifies the read-hybrid-time accounting for clock skew `NowWithRange` and the reads are restarted. This isn't the case with sys-catalog reads as the read-timestamp is chosen at the master itself, and there are no read restarts being done. Choosing the "current time at master" as the read time has an additional complication that it may not yet be safe to read at -- thus requiring a wait for the safe-time to catch up. This can delay the read(s) by up to a heart-beat interval (500ms) if there are no active writes. -- This can be alleviated by choosing the current SafeTime as the read time, which should avoid the wait in the common case. However, if there are future intents for a committed transaction within the clock-skew window, we may still have to wait. Thus to summarize: (1) ReadHybridTime() => SafeTime(kFalse) => Consistent but possibly stale results. Safest to use and should be preferred if the clients are ok with possibly stale reads. (2) Use RHT::Max() => Possibly inconsistent if there are concurrent writes. However, should be ok if there are no concurrent writes. (3) Use NowWithRange() => Consistent and safe to use, however may cause the rpc to wait. Also, callers need to handle read-restarts. (4) Use SafeTimeWithRange => Consistent and safe to use. No waits in the common case. Callers need to handle read-restarts. Can lead to waiting for safe time in the uncommon case. Given that PG ensures that until DDL transaction verification is completed, no other DDL can occur on the table, we this diff updates them to use (4) to read the latest written values in the context of catalog reads which need the latest value.. The following clients require up-to-date state of PG catalog and have been switched to do reads with restarts: 1. YSQL catalog version reads 2. ysql_transaction_ddl These clients do not require the absolute latest state, but need consistent reads. 1. CDC 2. Tablespace refresh task 3. Backup-restore that tries to determine the tablegroup version. They have been updated to explicitly specify ReadHybridTime() as the timestamp used. This shouldn't result in any behavioral change for such use cases, as this is the same timestamp that was used prior to this diff. Test Plan: ybd release --cxx-test pg_backends-test --gtest_filter PgBackendsTest.Stress --tp 1 -n 10 Reviewers: dsrinivasan, jason, pjain, sergei, rsami, rthallam, kpopali Reviewed By: dsrinivasan, sergei Subscribers: hsunder, bogdan, jason, ybase Differential Revision: https://phorge.dev.yugabyte.com/D25342
… from ReadYsqlDBCatalogVersionImpl Summary: b7fadbe / D25342 introduced updating RestartReadHt to prevent stale reads. However, this wasn't being updated when control exits early from ReadYsqlDBCatalogVersionImpl. Update the code path to ensure that RestartReadHt is updated. Jira: DB-6543, DB-7433 Test Plan: Jenkins Reviewers: dsrinivasan Reviewed By: dsrinivasan Subscribers: bogdan, ybase Differential Revision: https://phorge.dev.yugabyte.com/D27366
a1729c3 is not present in 2.18 and prior branches. Based on chatting with @deeps1991 and @rthallamko3, we shouldn't want to backport this diff to older branches. |
Jira Link: DB-6543
Description
Master stale reads could also happen after fixing the RequestScope in issue #14926
seems like
transaction_status_cache
is returning PENDING for a transaction that was committed by the client/pg.Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: