-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] Crash in YbPgInheritsCacheDelete: Assert(entry->refcount == 1); #19807
Labels
Comments
jasonyb
added
kind/bug
This issue is a bug
area/ysql
Yugabyte SQL (YSQL)
status/awaiting-triage
Issue awaiting triage
labels
Nov 2, 2023
jasonyb
pushed a commit
that referenced
this issue
Dec 7, 2023
Summary: Test TestPgRegressIndex has been often failing since commit 465ee2c, titled [#18082] YSQL: Stop using ForeignScan for YB relations That commit changes user-initiated system table requests such as SELECT distinct(current_version = last_breaking_version) from pg_yb_catalog_version; to use YbSeqScan rather than ForeignScan. A difference between the two is that YbSeqScan does not set catalog_version in the request to DocDB. The function ybcBeginScan, which is shared by both YbSeqScan and internal system table scans, has logic to avoid setting catalog_version for system tables. This logic has been in place so that the internal system table scans don't hit catalog version mismatch error (which is actually less correct, but that is a discussion for another day). Therefore, on server-side (in this case, master), the catalog version check does not happen for this query, so catalog version mismatch is not detected. Then, the subsequent queries almost always run off an outdated catalog because the true catalog version doesn't propagate fast enough. The test relies on these queries to execute with an up-to-date catalog, so this results in failure. A simple fix for the test is to add sleeps so that the catalog version can propagate. This strategy is already being used by other tests, and this situation could be treated as not much different. Instead, bring back the old behavior of sending ysql_catalog_version in user-initiated system table requests. Do so by further requiring the scans to be internally-generated in order to skip setting catalog_version. The IsSystemRelation condition is still needed because internally-generated scans can scan user tables, such as in index build. Add the assumptions about queries causing catalog version mismatch and cache refresh in the test. Leave issue #16408 open since the overall java test is still failing due to issue #19807 among other reasons. Jira: DB-8984 Test Plan: TestPgRegressIndex is flaky for multiple reasons, such as - issue #19807 - DropTable RPC timed out - expired or aborted by a conflict: 40001 - Restart read required at - could not serialize access due to concurrent update Manually check the following: ./yb_build.sh fastdebug --gcc11 \ --java-test TestPgRegressIndex -n 10 Backport-through: 2.20 Close: #20017 Reviewers: myang Reviewed By: myang Subscribers: kfranz, amartsinchyk, yql Differential Revision: https://phorge.dev.yugabyte.com/D30412
jasonyb
pushed a commit
that referenced
this issue
Dec 8, 2023
Summary: Test TestPgRegressIndex has been often failing since commit 465ee2c, titled [#18082] YSQL: Stop using ForeignScan for YB relations That commit changes user-initiated system table requests such as SELECT distinct(current_version = last_breaking_version) from pg_yb_catalog_version; to use YbSeqScan rather than ForeignScan. A difference between the two is that YbSeqScan does not set catalog_version in the request to DocDB. The function ybcBeginScan, which is shared by both YbSeqScan and internal system table scans, has logic to avoid setting catalog_version for system tables. This logic has been in place so that the internal system table scans don't hit catalog version mismatch error (which is actually less correct, but that is a discussion for another day). Therefore, on server-side (in this case, master), the catalog version check does not happen for this query, so catalog version mismatch is not detected. Then, the subsequent queries almost always run off an outdated catalog because the true catalog version doesn't propagate fast enough. The test relies on these queries to execute with an up-to-date catalog, so this results in failure. A simple fix for the test is to add sleeps so that the catalog version can propagate. This strategy is already being used by other tests, and this situation could be treated as not much different. Instead, bring back the old behavior of sending ysql_catalog_version in user-initiated system table requests. Do so by further requiring the scans to be internally-generated in order to skip setting catalog_version. The IsSystemRelation condition is still needed because internally-generated scans can scan user tables, such as in index build. Add the assumptions about queries causing catalog version mismatch and cache refresh in the test. Leave issue #16408 open since the overall java test is still failing due to issue #19807 among other reasons. Jira: DB-8984 Test Plan: TestPgRegressIndex is flaky for multiple reasons, such as - issue #19807 - DropTable RPC timed out - expired or aborted by a conflict: 40001 - Restart read required at - could not serialize access due to concurrent update Manually check the following: ./yb_build.sh fastdebug --gcc11 \ --java-test TestPgRegressIndex -n 10 Backport-through: 2.20 Original commit: 1350cef / D30412 Reviewers: myang Reviewed By: myang Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D30851
deeps1991
pushed a commit
that referenced
this issue
Dec 18, 2023
…nt == 1) Summary: Java test TestPgRegressIndex was failing because the assert mentioned in the title was firing. This assert ensures that, when a cached entry in the YbPgInheritsCache is being removed, there are no references pointing to it at this point. In normal scenarios, the above assumption is true, because every code path accessing the cache carefully releases any references to this cache at the end of its use. However, in some cases, it is possible for a query to lookup the cache and fail with error before it could release its reference to the cache. In this case, the assertion fires and causes PG backend to crash. To fix this issue, this cache should be cleaned up the same way the relcache is cleaned. Relcache is associated with a ResourceOwner. When a transaction or subtransaction ends, the resource owners perform cleanup. At this point, if the resource owner sees objects with unreleased references, they are released silently if the transaction was aborted. However, a WARNING is printed if the transaction was committed as we expect all resources to be cleaned up neatly in case of committed transaction/subtransactions. The same reasoning is applied for the YbPgInheritsCache, and is also associated with a ResourceOwner ensuring that references here are cleaned up in the same way Since yb_pg_indexing test is used to validate the above change, added a minor unrelated fix to the yb_pg_indexing.out file. The test was failing because the out file did not contain the NOTICE that is printed when table rewrite operations are carried out. Jira: DB-8741 Test Plan: ./yb_build.sh --java-test TestPgRegressIndex : specifically yb_pg_indexing file ./yb_build.sh --java-test org.yb.pgsql.TestPgCacheConsistency#testPgInheritsCacheConsistency ./yb_build.sh --java-test org.yb.pgsql.TestPgRegressPartitions Minimal test to reproduce this issue (fetched from yb_pg_indexing) ``` create table idxpart (a int, b int, PRIMARY KEY(a,b)) partition by range (a, b); create table idxpart1 partition of idxpart for values from (0, 0) to (10, 10); create index idxpart_a_b_idx on only idxpart (a, b); create index nonconcurrently idxpart1_a_b_idx on idxpart1 (a, b); alter index idxpart_a_b_idx attach partition idxpart1_a_b_idx; create index nonconcurrently idxpart1_2_a_b on idxpart1 (a, b); -- This statement fails due to duplicate index already attached to the -- partitioned index. The failure occurs when a reference to the parent -- index cache entry is taken but not yet released. alter index idxpart_a_b_idx attach partition idxpart1_2_a_b; -- Without this fix, this statement was failing as we try to invalidate -- the cache entry and notice that the refcount != 1. With the fix, this -- statement completes successfully drop table idxpart; ``` Reviewers: dmitry Reviewed By: dmitry Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D30004
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Jira Link: DB-8741
Description
./yb_build.sh fastdebug --gcc11 --java-test TestPgRegressIndex
crashes on yb_pg_indexing test. Analyzing the coredump,See #16408 (comment).
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: