Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Enable DDL related tests to run with read committed isolation #19975

Open
1 task done
pkj415 opened this issue Nov 16, 2023 · 0 comments
Open
1 task done

[YSQL] Enable DDL related tests to run with read committed isolation #19975

pkj415 opened this issue Nov 16, 2023 · 0 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@pkj415
Copy link
Contributor

pkj415 commented Nov 16, 2023

Jira Link: DB-8949

Description

There are many DDL related tests marked with this gh issue number which have been disabled to run with read committed isolation:

  1. PgCatalogVersionTest.FixCatalogVersionTable due to reason (1), see below
  2. PgCatalogVersionTest.DBCatalogVersionGlobalDDL due to reason (1)
  3. PgIndexBackfillTest.CreateIndexSimultaneously due to reason (2)
  4. PgPackedRowTest.AlterTable/*
  5. PgLibPqTest.ConcurrentInsertTruncateForeignKey due to reason (3), only reproable in debug builds on devserver
  6. PgDDLConcurrencyTest.IndexCreation due to reason (4), only reproable in debug builds on devserver
  7. PgDdlAtomicityParallelDdlTest.TestParallelDdl due to reason (4), only reproable in debug builds on devserver
  8. YBBackupTest.TestYSQLRestoreBackupToDBCatalogVersionMode due to reason (4)
  9. Tests in pg_ddl_atomicity-test due to reason (5)

Reasons for failure:
(1)Fails with READ COMMITTED semantics don't apply to DDL transactionsbecausePgTxnManager::ResetTransactionReadPoint is called while performing catalog version increment in the final stage of a DDL.

[ts-1] 2023-10-13 01:19:17.032 UTC [144933] ERROR:  READ COMMITTED semantics don't apply to DDL transactions:     @     0x7f2251643995  yb::pggate::PgApiImpl::ResetTransactionReadPoint()
	[ts-1]      @     0x7f2251637bc9  YBCPgResetTransactionReadPoint
	[ts-1]      @     0x55a0865e4fa1  GetTransactionSnapshot
	[ts-1]      @     0x55a08629f38b  fmgr_sql
	[ts-1]      @     0x55a08613cf3f  YbCallSQLIncrementCatalogVersions
	[ts-1]      @     0x55a08613b291  YbIncrementMasterCatalogVersionTableEntry
	[ts-1]      @     0x55a0865c061b  YBDecrementDdlNestingLevel
	[ts-1]      @     0x55a0865c5e62  YBTxnDdlProcessUtility
	[ts-1]      @     0x55a086444bc8  PortalRunUtility
	[ts-1]      @     0x55a086444237  PortalRunMulti
	[ts-1]      @     0x55a086443aab  PortalRun
	[ts-1]      @     0x55a086440b06  yb_exec_simple_query_impl
	[ts-1]      @     0x55a0864410eb  yb_exec_query_wrapper_one_attempt
	[ts-1]      @     0x55a08643e0ab  PostgresMain
	[ts-1]      @     0x55a0863a8061  BackendRun
	[ts-1]      @     0x55a0863a7434  ServerLoop
	[ts-1]      @     0x55a0863a3652  PostmasterMain
	[ts-1]      @     0x55a0862fa0cf  PostgresServerProcessMain
	[ts-1]      @     0x55a0862fa6c2  main
	[ts-1]      @     0x7f2250df4d85  __libc_start_main
	[ts-1]      @     0x55a08607cf6e  _start
	[ts-1]

(2) Fails because DDL is retried in RC which is not expected by the test

[ts-1] 2023-10-17 22:09:29.994 UTC [64621] LOG:  Restarting statement due to kReadRestart/kConflict error:
[ts-1] 	Query: CREATE INDEX iii ON ttt (i)
[ts-1] 	Error: could not serialize access due to concurrent update

I1016 22:18:37.496551 16270 client-internal.cc:2593] New master addresses: [127.0.0.1:21282]
../../src/yb/yql/pgwrapper/pg_index_backfill-test.cc:984: Failure
Expected equality of these values:
		table_info->schema.version()
		  Which is: 10
		expected_schema_version
		  Which is: 2

(3) Check failed: !options.ddl_mode() Restarting a DDL transaction not supported

(4) Check failed: !IsDdlMode() READ COMMITTED semantics don't apply to DDL transactions

(5) "Transaction for catalog table write operation 'pg_type' not found"

./yb_build.sh --cxx-test pgwrapper_pg_index_backfill-test --gtest_filter PgIndexBackfillTest.CreateIndexSimultaneously

Issue (1) can be solved by moving --ddl_transaction_state.nesting_level; just before HandleYBStatus(YBCPgExitSeparateDdlTxnMode()); in YBDecrementDdlNestingLevel(). This is done in pg_yb_utils.c in https://phorge.dev.yugabyte.com/D29278?vs=148742&id=152090#toc. However, this results in issue (5) cropping up in some of the other DDL tests.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@pkj415 pkj415 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Nov 16, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Nov 16, 2023
@pkj415 pkj415 moved this to Pending in Wait-Queue Based Locking Nov 22, 2023
samiahmedsiddiqui pushed a commit to samiahmedsiddiqui/yugabyte-db that referenced this issue Nov 23, 2023
…version

Summary:
To reproduce the bug:

1. Create a cluster and convert it into per-database catalog version mode
```
yugabyte=# select * from pg_yb_catalog_version;
 db_oid | current_version | last_breaking_version
--------+-----------------+-----------------------
      1 |               1 |                     1
  13242 |               1 |                     1
  13243 |               1 |                     1
  13245 |               1 |                     1
  13246 |               1 |                     1
(5 rows)
```
The above query output indicates that the cluster is running in per-database
catalog version mode.

2. Run the following DDL global-impact DDL statement
```
yugabyte=# ALTER ROLE yugabyte SUPERUSER;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

```

In the PG log:

```
F1122 01:04:40.396021 15991 pg_txn_manager.cc:346] Check failed: !IsDdlMode() READ COMMITTED semantics don't apply to DDL transactions
```

This happens when we call `yb_increment_all_db_catalog_versions` to increment
catalog versions for all databases. This is done inside function
`YBDecrementDdlNestingLevel` which first does
```
    --ddl_transaction_state.nesting_level;
```
and then if `ddl_transaction_state.nesting_level == 0` it calls
`yb_increment_all_db_catalog_versions` if the current DDL statement has global
impact. In `GetTransactionSnapshot` we have
```
       if (IsYBReadCommitted() && YBGetDdlNestingLevel() == 0)
```
The function `yb_increment_all_db_catalog_versions` is still called as part of
the DDL statement, but `ddl_transaction_state.nesting_level` is already 0 so
`YBGetDdlNestingLevel()` returns 0. As a result the above `if` condition is true
and the DDL statement is treated as a non-DDL statement, eventually leading to
the assertion failure.

The DDL mode is already transferred to pggate when we call
`YBCPgEnterSeparateDdlTxnMode`. So I added `YBCPgIsDdlMode()` and uses it to
replace `YBGetDdlNestingLevel() == 0`.

I reverted some work arounds in `pg_catalog_version-test.cc` which used snapshot
isolation because otherwise those unit tests would fail under read committed
isolation.

Test Plan: ./yb_build.sh debug --cxx-test pg_catalog_version-test

Reviewers: tvesely, pjain

Reviewed By: pjain

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D30417
pkj415 added a commit that referenced this issue Nov 30, 2023
Summary:
TestPgRegressIndex rarely fails with the following errror if
READ COMMITTED isolation is enabled:

```
jenkins@jenkins-master ~/jobs/github-yugabyte-db-alma8-master-gcc11-fastdebug/builds$ find -name '*fatal_failure_details*txt' | xargs grep -l 'Restarting a DDL transaction not supported'
./4258/archive/java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgRegressIndex__testPgRegressIndex/org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex.fatal_failure_details.ts-1.127.52.22.121-port13683.2023-11-19T05_58_16.pid13287.txt
./4248/archive/java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgRegressIndex__testPgRegressIndex/org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex.fatal_failure_details.ts-1.127.29.54.114-port15898.2023-11-18T00_42_03.pid223695.txt
./4262/archive/java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgRegressIndex__testPgRegressIndex/org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex.fatal_failure_details.ts-1.127.206.118.77-port16906.2023-11-20T11_42_24.pid40994.txt
```

Disabling read committed in this test to avoid this. Added a
TODO to re-enable read committed for this test as part of
#19975.

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressIndex'

Reviewers: jason, tvesely

Reviewed By: tvesely

Subscribers: tvesely, yql

Differential Revision: https://phorge.dev.yugabyte.com/D30509
myang2021 added a commit that referenced this issue Dec 7, 2023
…atalog version

Summary:
To reproduce the bug:

1. Create a cluster and convert it into per-database catalog version mode
```
yugabyte=# select * from pg_yb_catalog_version;
 db_oid | current_version | last_breaking_version
--------+-----------------+-----------------------
      1 |               1 |                     1
  13242 |               1 |                     1
  13243 |               1 |                     1
  13245 |               1 |                     1
  13246 |               1 |                     1
(5 rows)
```
The above query output indicates that the cluster is running in per-database
catalog version mode.

2. Run the following DDL global-impact DDL statement
```
yugabyte=# ALTER ROLE yugabyte SUPERUSER;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

```

In the PG log:

```
F1122 01:04:40.396021 15991 pg_txn_manager.cc:346] Check failed: !IsDdlMode() READ COMMITTED semantics don't apply to DDL transactions
```

This happens when we call `yb_increment_all_db_catalog_versions` to increment
catalog versions for all databases. This is done inside function
`YBDecrementDdlNestingLevel` which first does
```
    --ddl_transaction_state.nesting_level;
```
and then if `ddl_transaction_state.nesting_level == 0` it calls
`yb_increment_all_db_catalog_versions` if the current DDL statement has global
impact. In `GetTransactionSnapshot` we have
```
       if (IsYBReadCommitted() && YBGetDdlNestingLevel() == 0)
```
The function `yb_increment_all_db_catalog_versions` is still called as part of
the DDL statement, but `ddl_transaction_state.nesting_level` is already 0 so
`YBGetDdlNestingLevel()` returns 0. As a result the above `if` condition is true
and the DDL statement is treated as a non-DDL statement, eventually leading to
the assertion failure.

The DDL mode is already transferred to pggate when we call
`YBCPgEnterSeparateDdlTxnMode`. So I added `YBCPgIsDdlMode()` and uses it to
replace `YBGetDdlNestingLevel() == 0`.

I reverted some work arounds in `pg_catalog_version-test.cc` which used snapshot
isolation because otherwise those unit tests would fail under read committed
isolation.

Original commit: 417e2de / D30417

Test Plan: ./yb_build.sh debug --cxx-test pg_catalog_version-test

Reviewers: tvesely, pjain

Reviewed By: tvesely

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30840
@robertsami robertsami assigned pkj415 and unassigned tvesely Dec 13, 2023
@sushantrmishra sushantrmishra removed the status/awaiting-triage Issue awaiting triage label Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue
Projects
Status: Pending
Development

No branches or pull requests

4 participants