[YSQL] flaky test: org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex #16408

Closed
1 task done
bmatican opened this issue Mar 13, 2023 · 4 comments
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug kind/failing-test Tests and testing infra priority/high High Priority

Comments

@bmatican
Contributor

bmatican commented Mar 13, 2023

Jira Link: DB-5819

Description

Keeps popping up on per-diff Detective for 2-3 build types.

report

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@bmatican bmatican added area/ysql Yugabyte SQL (YSQL) kind/failing-test Tests and testing infra status/awaiting-triage Issue awaiting triage labels Mar 13, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Mar 13, 2023
@m-iancu m-iancu assigned jasonyb and unassigned m-iancu May 24, 2023
@m-iancu m-iancu assigned amitanandaiyer and unassigned jasonyb Jul 13, 2023
@m-iancu
Contributor

m-iancu commented Jul 13, 2023

Re-assigning to @amitanandaiyer because, based on some of the error messages, this looks to be caused by #17342.
Will review test status after the fix for that lands to confirm (and close or reassign as needed).

@m-iancu m-iancu assigned jasonyb and andrei-mart and unassigned amitanandaiyer Sep 20, 2023
@jasonyb
Contributor

jasonyb commented Oct 31, 2023

As of 2023-10-30, the frequently failing tests are yb_pg_indexing (sometimes) and yb_reindex. At the time of the original report, this test was flaky for other reasons (I believe they were transaction-related).

  • yb_reindex: started failing after 465ee2c.
  • yb_pg_indexing: likely started failing after commit 25f5fb1. Crash stack below (an abort raised through ExceptionalCondition, i.e. a failed assertion rather than a segfault).
Stack trace of thread 3071708:                                                                                                                                                                                                                                                                                                                               
#0  0x00007f80a9770acf raise (libc.so.6)                                                                                                                                                                                                                                                                                                                     
#1  0x00007f80a9743ea5 abort (libc.so.6)                                                                                                                                                                                                                                                                                                                     
#2  0x0000000000a4344f ExceptionalCondition (postgres)                                                                                                                                                                                                                                                                                                       
#3  0x0000000000a4311a YbPgInheritsCacheDelete (postgres)                                                                                                                                                                                                                                                                                                    
#4  0x0000000000a4330a YbPgInheritsCacheInvalidate (postgres)                                                                                                                                                                                                                                                                                                
#5  0x0000000000a433ab YbPgInheritsCacheRelCallback (postgres)                                                                                                                                                                                                                                                                                               
#6  0x0000000000a22e5d LocalExecuteInvalidationMessage (postgres)                                                                                                                                                                                                                                                                                            
#7  0x0000000000a21c59 ProcessInvalidationMessages (postgres)                                                                                                                                                                                                                                                                                                
#8  0x0000000000a22537 CommandEndInvalidationMessages (postgres)                                                                                                                                                                                                                                                                                             
#9  0x000000000055c05e AtCCI_LocalCache (postgres)                                                                                                                                                                                                                                                                                                           
#10 0x00000000005a0024 deleteOneObject (postgres)                                                                                                                                                                                                                                                                                                            
#11 0x00000000005a00dc deleteObjectsInList (postgres)                                                                                                                                                                                                                                                                                                        
#12 0x00000000005a02a4 performMultipleDeletions (postgres)                                                                                                                                                                                                                                                                                                   
#13 0x00000000006c5604 RemoveRelations (postgres)                                                                                                                                                                                                                                                                                                            
#14 0x00000000008f95ef ExecDropStmt (postgres)                                                                                                                                                                                                                                                                                                               
#15 0x00000000008fcb98 ProcessUtilitySlow (postgres)                                                                                                                                                                                                                                                                                                         
#16 0x00000000008fb41d standard_ProcessUtility (postgres)                                                                                                                                                                                                                                                                                                    
#17 0x00000000008fb6db YBProcessUtilityDefaultHook (postgres)                                                                                                                                                                                                                                                                                                
#18 0x00007f80981fb681 pgss_ProcessUtility (pg_stat_statements.so)                                                                                                                                                                                                                                                                                           
#19 0x00007f80ab900540 ybpgm_ProcessUtility (/PATH/TO/REPO/build/fastdebug-gcc11-dynamic-ninja/postgres/lib/yb_pg_metrics.so)                                                                                                                                                                                               
#20 0x00007f80981d4755 pgaudit_NextProcessUtility_hook (pgaudit.so)                                                                                                                                                                                                                                                                                          
#21 0x00007f80981d5e7a pgaudit_ProcessUtility_hook (pgaudit.so)                                                                                                                                                                                                                                                                                              
#22 0x00007f80981c0b07 pg_hint_plan_ProcessUtility (pg_hint_plan.so)                                                                                                                                                                                                                                                                                         
#23 0x0000000000a7e7a6 YBTxnDdlProcessUtility (postgres)                                                                                                                                                                                                                                                                                                     
#24 0x00000000008fb71a ProcessUtility (postgres)                                                                                                                                                                                                                                                                                                             
#25 0x00000000008f72ec PortalRunUtility (postgres)                                                                                                                                                                                                                                                                                                           
#26 0x00000000008f7d6e PortalRunMulti (postgres)                                                                                                                                                                                                                                                                                                             
#27 0x00000000008f8bde PortalRun (postgres)                                                                                                                                                                                                                                                                                                                  
#28 0x00000000008f34eb exec_simple_query (postgres)                                                                                                                                                                                                                                                                                                          
#29 0x00000000008f075a yb_exec_query_wrapper_one_attempt (postgres)                                                                                                                                                                                                                                                                                          
#30 0x00000000008f1f74 yb_exec_query_wrapper (postgres)                                                                                                                                                                                                                                                                                                      
#31 0x00000000008f1fc8 yb_exec_simple_query (postgres)
#32 0x00000000008f49a7 PostgresMain (postgres)
#33 0x0000000000854a2b BackendRun (postgres)
#34 0x0000000000856eb3 PostmasterMain (postgres)
#35 0x000000000079fb34 PostgresServerProcessMain (postgres)
#36 0x000000000079fb54 main (postgres)
#37 0x00007f80a975cd85 __libc_start_main (libc.so.6)
#38 0x00000000004a88be _start (postgres)

@jasonyb
Contributor

jasonyb commented Nov 9, 2023

The reindex test failure might be an actual regression.

Here is a test case derived from yb_reindex.sql:

CREATE TEMP TABLE tmp (i int PRIMARY KEY, j int);
CREATE INDEX ON tmp (j);
INSERT INTO tmp SELECT g, -g FROM generate_series(1, 10) g;
-- Disable reads/writes to the index.
UPDATE pg_index SET indislive = false, indisready = false, indisvalid = false
    WHERE indexrelid = 'tmp_j_idx'::regclass;
--- Force cache refresh.
SELECT * from pg_yb_catalog_version;
SET yb_non_ddl_txn_for_sys_tables_allowed TO on;
UPDATE pg_yb_catalog_version SET current_version = current_version + 1;
UPDATE pg_yb_catalog_version SET last_breaking_version = current_version;
RESET yb_non_ddl_txn_for_sys_tables_allowed;
SELECT * from pg_yb_catalog_version;

With a gdb breakpoint set (b nodeModifyTable.c:1731), run: UPDATE tmp SET i = 11 WHERE j = -5;

For recent master cd69a84,

Thread 1 "postgres" hit Breakpoint 1, ExecUpdate (mtstate=mtstate@entry=0x4a77e4d14a0, tupleid=tupleid@entry=0x7ffc63ccf7ca, oldtuple=oldtuple@entry=0x0, slot=0x4a77d968dd8, 
    planSlot=planSlot@entry=0x4a77e4d1df0, epqstate=epqstate@entry=0x4a77e4d1568, estate=0x4a77e4d0120, canSetTag=true)
    at ../../../../../../src/postgres/src/backend/executor/nodeModifyTable.c:1731
1731                    if (resultRelInfo->ri_NumIndices > 0 && !HeapTupleIsHeapOnly(tuple))
(gdb) p *tuple
$5 = {t_len = 32, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 15}, t_tableOid = 16460, t_ybctid = 0, t_data = 0x4a77f1fd078}
(gdb) p *tuple->t_data
$6 = {t_choice = {t_heap = {t_xmin = 15, t_xmax = 0, t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 15, datum_typmod = 0, datum_typeid = 0}}, t_ctid = {ip_blkid = {
      bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, t_infomask2 = 32770, t_infomask = 10240, t_hoff = 24 '\030', t_bits = 0x4a77f1fd08f ""}

For recent 2.18 7c9798f,

Thread 1 "postgres" hit Breakpoint 1, ExecUpdate (mtstate=mtstate@entry=0x24c83f9c86a0, tupleid=tupleid@entry=0x7ffe232ce9ba, oldtuple=oldtuple@entry=0x0, slot=0x24c83f482040, 
    planSlot=planSlot@entry=0x24c83f9c8ff0, epqstate=epqstate@entry=0x24c83f9c8768, estate=0x24c83f9c8120, canSetTag=true)
    at ../../../../../../src/postgres/src/backend/executor/nodeModifyTable.c:1731
1731            if (resultRelInfo->ri_NumIndices > 0 && !HeapTupleIsHeapOnly(tuple))
(gdb) p *tuple
$2 = {t_len = 32, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 11}, t_tableOid = 16386, t_ybctid = 0, t_data = 0x24c83f482968}
(gdb) p *tuple->t_data
$3 = {t_choice = {t_heap = {t_xmin = 4, t_xmax = 0, t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 4, datum_typmod = 0, datum_typeid = 0}}, t_ctid = {ip_blkid = {
      bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, t_infomask2 = 2, t_infomask = 10240, t_hoff = 24 '\030', t_bits = 0x24c83f48297f ""}

Notice t_infomask2 differs

  • master: 0x8002
  • 2.18: 0x0002

The HeapTupleIsHeapOnly call diverges because of that difference: the extra 0x8000 bit is HEAP_ONLY_TUPLE, so in master execution no longer goes inside the if. I don't see much difference between the two paths here, but I suspect some other area depending on HeapTupleIsHeapOnly does make a difference.
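
To make the bit difference concrete, here is a small Python sketch that decodes the two observed t_infomask2 values. The flag constants mirror the definitions in PostgreSQL's htup_details.h; the helper name decode_infomask2 is made up for illustration and is not part of PostgreSQL or YugabyteDB.

```python
# Flag bits of t_infomask2, as defined in PostgreSQL's htup_details.h.
HEAP_NATTS_MASK = 0x07FF    # low 11 bits: number of attributes
HEAP_KEYS_UPDATED = 0x2000
HEAP_HOT_UPDATED = 0x4000
HEAP_ONLY_TUPLE = 0x8000    # the bit that HeapTupleIsHeapOnly() tests


def decode_infomask2(infomask2):
    """Hypothetical helper: split t_infomask2 into its documented fields."""
    return {
        "natts": infomask2 & HEAP_NATTS_MASK,
        "keys_updated": bool(infomask2 & HEAP_KEYS_UPDATED),
        "hot_updated": bool(infomask2 & HEAP_HOT_UPDATED),
        "heap_only": bool(infomask2 & HEAP_ONLY_TUPLE),
    }


# Values taken from the gdb sessions above.
master = decode_infomask2(32770)  # 0x8002
v2_18 = decode_infomask2(2)       # 0x0002

for name, fields in (("master", master), ("2.18", v2_18)):
    print(name, fields)
```

Both values report two attributes, which matches the two columns of tmp; the only field that differs is heap_only.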

In upstream PG 15.2, the line moved somewhere else: b heapam_handler.c:339

Breakpoint 1, heapam_tuple_update (relation=0x7f3ddff5ed28, otid=0x7ffce8253e92, slot=0x1d47a60, cid=0, snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7ffce8253ef0, lockmode=0x7ffce8253dec, update_indexes=0x7ffce8253de9) at heapam_handler.c:339
warning: Source file is more recent than executable.
339             *update_indexes = result == TM_Ok && !HeapTupleIsHeapOnly(tuple);
(gdb) p *tuple
$3 = {t_len = 32, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 11}, t_tableOid = 16850, t_data = 0x1d370c0}
(gdb) p *tuple->t_data
$4 = {t_choice = {t_heap = {t_xmin = 902, t_xmax = 0, t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 902, datum_typmod = 0, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, t_infomask2 = 2, t_infomask = 10240, t_hoff = 24 '\030', t_bits = 0x1d370d7 ""}

Notice t_infomask2 is 2, matching 2.18. So something happened in master that likely messed up this field.

@jasonyb
Contributor

jasonyb commented Nov 15, 2023

yb_reindex failure is a catalog_version and cache issue.

Since commit 6fec2ec, the condition for calling YBCPgSetCatalogCacheVersion has meant that ybc_fdw sets the catalog version in its requests, while other scans such as index scan and index only scan (and, later, ybc_remote_scan) do not set the catalog version for system relation requests.

ybcBeginScan:

/*
 * Set the current syscatalog version (will check that we are up to date).
 * Avoid it for syscatalog tables so that we can still use this for
 * refreshing the caches when we are behind.
 * Note: This works because we do not allow modifying schemas (alter/drop)
 * for system catalog tables.
 */
if (!IsSystemRelation(rel))

ybcBeginForeignScan:

/* Set the current syscatalog version (will check that we are up to date) */

Since Andrei's commit removes foreign scan, direct system table reads no longer use foreign scan and instead use yb seq scan. So they don't send catalog version and don't notice catalog version mismatch. Here is the key snippet:

SET yb_non_ddl_txn_for_sys_tables_allowed TO on;
UPDATE pg_yb_catalog_version SET current_version = current_version + 1;
UPDATE pg_yb_catalog_version SET last_breaking_version = current_version;
RESET yb_non_ddl_txn_for_sys_tables_allowed;
SELECT distinct(current_version = last_breaking_version) from pg_yb_catalog_version;
-- Show the corruption.
/*+SeqScan(tmp) */
SELECT i FROM tmp WHERE j = -5;
/*+IndexScan(tmp_j_idx) */
SELECT i FROM tmp WHERE j = -5;

Before, the SELECT from pg_yb_catalog_version gets a catalog version mismatch, which causes the remaining scans to operate on an up-to-date cache. After, it doesn't notice the mismatch (except for the rare timing where the catalog version propagates through heartbeat fast enough), and temp table scans also don't notice since they don't reach out to master/tserver, which is where catalog version mismatch checks happen. Putting a sleep before the cache-dependent select (the last select) causes the issue to go away. Putting an EXPLAIN there instead almost always shows a sequential scan being chosen instead of an index scan because, operating off an old cache, it thinks indislive, indisvalid, and indisready are still false.

One fix in this case is to send the catalog version in system relation requests (the justification in the comment seems outdated to me). But if the SELECT from pg_yb_catalog_version never existed, this would be a problem even with that fix. If we accept that catalog changes can propagate slowly over heartbeat, then having a command to explicitly clear caches would be nice. We would have to be careful about the tserver response cache, so this command should either clear both caches or clear just the pg cache and also get the latest catalog version from master. Now that I think about it, a more direct approach may be a command that forces a recheck against master's catalog version, which can hook into the existing refresh logic. If we do not accept cases where queries are completely local and can get away with not checking the catalog version, then a different, more proper solution is needed (there are other similar cases besides selecting from a temp table; cc: @deeps1991).
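
The failure mode can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions: all class and function names are hypothetical, the server check is reduced to an equality test, and heartbeat propagation and the tserver response cache are ignored. It is not YugabyteDB code.

```python
class CatalogVersionMismatch(Exception):
    """Raised by the toy server when a request carries a stale version."""


class ToyServer:
    def __init__(self, catalog_version):
        self.catalog_version = catalog_version

    def read(self, sent_version=None):
        # The version check can only happen if the request carries a version.
        if sent_version is not None and sent_version != self.catalog_version:
            raise CatalogVersionMismatch()
        return "rows"


class ToyBackend:
    def __init__(self, server, send_version_for_system_rels):
        self.server = server
        self.cached_version = server.catalog_version
        self.send_version_for_system_rels = send_version_for_system_rels

    def select_from_system_table(self):
        sent = self.cached_version if self.send_version_for_system_rels else None
        try:
            return self.server.read(sent)
        except CatalogVersionMismatch:
            # Mismatch detected: refresh the cache, then retry the read.
            self.cached_version = self.server.catalog_version
            return self.server.read(self.cached_version)

    def select_from_temp_table(self):
        # Purely local: never contacts the server, so it uses whatever
        # catalog version is cached at this moment.
        return self.cached_version


def run(send_version_for_system_rels):
    server = ToyServer(catalog_version=1)
    backend = ToyBackend(server, send_version_for_system_rels)
    server.catalog_version += 1          # the manual catalog version bump
    backend.select_from_system_table()   # SELECT ... FROM pg_yb_catalog_version
    return backend.select_from_temp_table()


print(run(send_version_for_system_rels=False))  # temp scan sees stale version 1
print(run(send_version_for_system_rels=True))   # temp scan sees refreshed version 2
```

In the model, the temp table scan only sees the new version when the preceding system table read carried a version and therefore triggered a refresh, which is the dependency the test had been relying on.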

pkj415 added a commit that referenced this issue Nov 30, 2023
Summary:
TestPgRegressIndex rarely fails with the following error if
READ COMMITTED isolation is enabled:

```
jenkins@jenkins-master ~/jobs/github-yugabyte-db-alma8-master-gcc11-fastdebug/builds$ find -name '*fatal_failure_details*txt' | xargs grep -l 'Restarting a DDL transaction not supported'
./4258/archive/java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgRegressIndex__testPgRegressIndex/org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex.fatal_failure_details.ts-1.127.52.22.121-port13683.2023-11-19T05_58_16.pid13287.txt
./4248/archive/java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgRegressIndex__testPgRegressIndex/org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex.fatal_failure_details.ts-1.127.29.54.114-port15898.2023-11-18T00_42_03.pid223695.txt
./4262/archive/java/yb-pgsql/target/surefire-reports_org.yb.pgsql.TestPgRegressIndex__testPgRegressIndex/org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex.fatal_failure_details.ts-1.127.206.118.77-port16906.2023-11-20T11_42_24.pid40994.txt
```

Disabling read committed in this test to avoid this. Added a
TODO to re-enable read committed for this test as part of
#19975.

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressIndex'

Reviewers: jason, tvesely

Reviewed By: tvesely

Subscribers: tvesely, yql

Differential Revision: https://phorge.dev.yugabyte.com/D30509
jasonyb pushed a commit that referenced this issue Dec 7, 2023
Summary:
Test TestPgRegressIndex has been often failing since commit
465ee2c, titled

    [#18082] YSQL: Stop using ForeignScan for YB relations

That commit changes user-initiated system table requests such as

    SELECT distinct(current_version = last_breaking_version) from pg_yb_catalog_version;

to use YbSeqScan rather than ForeignScan.  A difference between the two
is that YbSeqScan does not set catalog_version in the request to DocDB.
The function ybcBeginScan, which is shared by both YbSeqScan and
internal system table scans, has logic to avoid setting catalog_version
for system tables.  This logic has been in place so that the internal
system table scans don't hit catalog version mismatch error (which is
actually less correct, but that is a discussion for another day).

Therefore, on server-side (in this case, master), the catalog version
check does not happen for this query, so catalog version mismatch is not
detected.  Then, the subsequent queries almost always run off an
outdated catalog because the true catalog version doesn't propagate fast
enough. The test relies on these queries to execute with an up-to-date
catalog, so this results in failure.

A simple fix for the test is to add sleeps so that the catalog version
can propagate.  This strategy is already being used by other tests, and
this situation could be treated as not much different.

Instead, bring back the old behavior of sending ysql_catalog_version in
user-initiated system table requests.  Do so by further requiring the
scans to be internally-generated in order to skip setting
catalog_version.  The IsSystemRelation condition is still needed because
internally-generated scans can scan user tables, such as in index build.
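
As a rough illustration of the condition described above (not the actual ybcBeginScan code), the decision can be written as a two-input predicate. The function and parameter names are hypothetical.

```python
def should_set_catalog_version(is_system_relation, is_internal_scan):
    """Hypothetical predicate: skip the version only for internally
    generated scans of system relations."""
    return not (is_system_relation and is_internal_scan)


cases = [
    # (description, is_system_relation, is_internal_scan)
    ("user SELECT from pg_yb_catalog_version", True, False),
    ("internal scan of a system table", True, True),
    ("internal scan of a user table (index build)", False, True),
    ("user SELECT from a user table", False, False),
]
for description, is_sys, is_internal in cases:
    print(description, "->", should_set_catalog_version(is_sys, is_internal))
```

Only the internal scan of a system table skips the version; the other three cases send it.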

Add the assumptions about queries causing catalog version mismatch and
cache refresh in the test.

Leave issue #16408 open since the overall java test is still failing due
to issue #19807 among other reasons.

Jira: DB-8984

Test Plan:
TestPgRegressIndex is flaky for multiple reasons, such as

- issue #19807
- DropTable RPC timed out
- expired or aborted by a conflict: 40001
- Restart read required at
- could not serialize access due to concurrent update

Manually check the following:

    ./yb_build.sh fastdebug --gcc11 \
      --java-test TestPgRegressIndex -n 10

Backport-through: 2.20
Close: #20017

Reviewers: myang

Reviewed By: myang

Subscribers: kfranz, amartsinchyk, yql

Differential Revision: https://phorge.dev.yugabyte.com/D30412
jasonyb pushed a commit that referenced this issue Dec 8, 2023
Summary:
Test TestPgRegressIndex has been often failing since commit
465ee2c, titled

    [#18082] YSQL: Stop using ForeignScan for YB relations

That commit changes user-initiated system table requests such as

    SELECT distinct(current_version = last_breaking_version) from pg_yb_catalog_version;

to use YbSeqScan rather than ForeignScan.  A difference between the two
is that YbSeqScan does not set catalog_version in the request to DocDB.
The function ybcBeginScan, which is shared by both YbSeqScan and
internal system table scans, has logic to avoid setting catalog_version
for system tables.  This logic has been in place so that the internal
system table scans don't hit catalog version mismatch error (which is
actually less correct, but that is a discussion for another day).

Therefore, on server-side (in this case, master), the catalog version
check does not happen for this query, so catalog version mismatch is not
detected.  Then, the subsequent queries almost always run off an
outdated catalog because the true catalog version doesn't propagate fast
enough. The test relies on these queries to execute with an up-to-date
catalog, so this results in failure.

A simple fix for the test is to add sleeps so that the catalog version
can propagate.  This strategy is already being used by other tests, and
this situation could be treated as not much different.

Instead, bring back the old behavior of sending ysql_catalog_version in
user-initiated system table requests.  Do so by further requiring the
scans to be internally-generated in order to skip setting
catalog_version.  The IsSystemRelation condition is still needed because
internally-generated scans can scan user tables, such as in index build.

Add the assumptions about queries causing catalog version mismatch and
cache refresh in the test.

Leave issue #16408 open since the overall java test is still failing due
to issue #19807 among other reasons.

Jira: DB-8984

Test Plan:
TestPgRegressIndex is flaky for multiple reasons, such as

- issue #19807
- DropTable RPC timed out
- expired or aborted by a conflict: 40001
- Restart read required at
- could not serialize access due to concurrent update

Manually check the following:

    ./yb_build.sh fastdebug --gcc11 \
      --java-test TestPgRegressIndex -n 10

Backport-through: 2.20
Original commit: 1350cef / D30412

Reviewers: myang

Reviewed By: myang

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30851
@m-iancu m-iancu assigned myang2021 and unassigned jasonyb May 14, 2024
myang2021 added a commit that referenced this issue Jun 24, 2024
…t output

Summary:
The unit test org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex is always
failing now because the test output needs to be updated due to a recently
introduced table rewrite NOTICE output:

```
NOTICE:  table rewrite may lead to inconsistencies
DETAIL:  Concurrent DMLs may not be reflected in the new table.
HINT:  See #19860. Set 'ysql_suppress_unsafe_alter_notice' yb-tserver gflag to true to suppress this notice.
```

In addition, the test output itself isn't reliable because of the following
error for statement `drop index idxpart0_pkey;`:

```
ERROR:  cannot drop index idxpart0_pkey because index idxpart_pkey requires it
HINT:  You can drop index idxpart_pkey instead.
```
vs
```
ERROR:  cannot drop index idxpart0_pkey because constraint idxpart0_pkey on table idxpart0 requires it
HINT:  You can drop constraint idxpart0_pkey on table idxpart0 instead.
```

Both are possible and which one gets reported depends upon the order in the
pg_depend scan output. Because pg_depend does not have a primary key, it uses
YB generated uuid and therefore the scan output order isn't fixed.

I updated the expected test output.
Jira: DB-5819

Test Plan: ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressIndex#testPgRegressIndex'

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D36068
myang2021 added a commit that referenced this issue Jun 27, 2024
…ndex expected test output

Summary:
The unit test org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex is always
failing now because the test output needs to be updated due to a recently
introduced table rewrite NOTICE output:

```
NOTICE:  table rewrite may lead to inconsistencies
DETAIL:  Concurrent DMLs may not be reflected in the new table.
HINT:  See #19860. Set 'ysql_suppress_unsafe_alter_notice' yb-tserver gflag to true to suppress this notice.
```

In addition, the test output itself isn't reliable because of the following
error for statement `drop index idxpart0_pkey;`:

```
ERROR:  cannot drop index idxpart0_pkey because index idxpart_pkey requires it
HINT:  You can drop index idxpart_pkey instead.
```
vs
```
ERROR:  cannot drop index idxpart0_pkey because constraint idxpart0_pkey on table idxpart0 requires it
HINT:  You can drop constraint idxpart0_pkey on table idxpart0 instead.
```

Both are possible and which one gets reported depends upon the order in the
pg_depend scan output. Because pg_depend does not have a primary key, it uses
YB generated uuid and therefore the scan output order isn't fixed.

I updated the expected test output.
Jira: DB-5819

Original commit: e69dc27 / D36068

Test Plan: ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressIndex#testPgRegressIndex'

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36210
myang2021 added a commit that referenced this issue Jun 28, 2024
Summary:
The unit test org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex often times
out with an error like:
```
org.junit.runners.model.TestTimedOutException: test timed out after 1800 seconds
```

I split the schedule `yb_index_serial_schedule` into two so that each has a
better chance of not timing out. They each take about the same run time.
Jira: DB-5819

Test Plan:
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressIndex#testPgRegressIndex'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressIndex#testPgRegressIndex2'

On my dev vm debug build, testPgRegressIndex runs about 18:30 minutes,
testPgRegressIndex2 runs about 19 minutes.

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D36224
jasonyb pushed a commit that referenced this issue Jun 28, 2024
Summary:
 9c637e2 [PLAT-14429]: Modify Troubleshooting Platform registration workflow in YBA
 0a1406d [PLAT-14098]: Updating yb.runtime_conf_ui.tag_filter with appropriate tags value does not display the flags accordingly
 70a87f9 [PLAT-13605][PLAT-13609]: Edit Volume controls and storage type in FULL_MOVE but not in case of UPDATE
 26fbfe0 [PLAT-14515][UI] Clicking preview doesn't show the correct info and clears up the data provided while setting up the ysql_ident or ysql_hba multiline flags.- [PLAT-14514] [PLAT-14513]
 a07946b [#18233] Initial commit for yugabyted unit test framework.
 b2e8ee7 [#22842] docdb: Improve usability of stack trace tracking endpoints
 508f26e [docs] Added RN 2.20.2.3-b2 (#23042)
 214d44a [#22935] YSQL: Use db oid in the tserver's sequence cache entry key
 c47b2d9 [#22802] YSQL: Avoid renaming DocDb tables during legacy rewrite
 Excluded: 7c8343d [#22874] YSQL: Fix cascaded drops on columns
 58c8d4e [#23046] xCluster: Remove ns_replication from the code
 a70681d [#22816] YSQL: Bug fixes for replication connections in ysql connection manager
 b239e07 Doc upgrade versions (#22988)
 Excluded: ec76062 [#16408] YSQL: Split TestPgRegressIndex.testPgRegressIndex

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36247
myang2021 added a commit that referenced this issue Jun 28, 2024
…stPgRegressIndex

Summary:
Merge YB master commit ec76062 titled

    [#16408] YSQL: Split TestPgRegressIndex.testPgRegressIndex

and committed 2024-06-28T16:41:49+00:00 into YB pg15.

- yb_index_serial_schedule
  - YB pg15 9690419 adds yb_index_including
    test. YB master ec76062 split
    yb_index_serial_schedule into yb_index_serial_schedule and
    yb_index_serial2_schedule. Resolved the conflict by adding yb_index_including
    into yb_index_serial_schedule and keeping yb_index_serial2_schedule
    unchanged.
Jira: DB-5819

Test Plan:
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressIndex#testPgRegressIndex'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressIndex#testPgRegressIndex2'

Note that both tests fail. The purpose is to verify that both complete
within 30 minutes. I made a pg15 release build without my change to run

./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressIndex#testPgRegressIndex'

It also fails, and the failed tests have identical test output, so this diff
does not alter the test outputs.

Reviewers: jason, tfoucher

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D36252
myang2021 added a commit that referenced this issue Jul 1, 2024
…sIndex

Summary:
The unit test org.yb.pgsql.TestPgRegressIndex.testPgRegressIndex often times
out with an error like:
```
org.junit.runners.model.TestTimedOutException: test timed out after 1800 seconds
```

I split the schedule `yb_index_serial_schedule` into two so that each has a
better chance of not timing out. The two halves take about the same run time.
Jira: DB-5819

Original commit: ec76062 / D36224

Test Plan:
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressIndex#testPgRegressIndex'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressIndex#testPgRegressIndex2'

On my dev VM debug build, testPgRegressIndex runs for about 18:30 minutes and
testPgRegressIndex2 runs for about 19 minutes.

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36249
jasonyb pushed a commit that referenced this issue Jul 17, 2024
Summary:
 0c8e378 [#23216]yugabyted: Adding required models for source db details and target recommendation details sub pages.
 133ff1c [#23210] yugabyted: yugabyted collect_logs fails to gather logs when yugabyted is not running.
 08ca9dd [#23196] DocDB: Handle preview flags in ValidateFlagValue
 13c0ced [#23215] docdb: disable packed for colocated tables by default
 6146084 [doc][ybm] Reorg to Aeon authentication pages (#23226)
 Excluded: 56d2d2d [#16408] YSQL: add gflag TEST_generate_ybrowid_sequentially
 c8e7530 [PLAT-14672] getSessionInfo should return a valid api token
 3abd045 [#22594] YSQL: Fix flaky TestTransactionStatusTable.testCreation unit test
 d67ba12 [#22158] YSQL: Set local limit as safe time even when the read time is already set.
 1315a10 [docs] PG compatible logical replication architecture (#23220)
 88e92a0 [#23216] yugabyted: Adding a new field ObjectName to model SqlObjectMetadata.

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36661
jasonyb pushed a commit that referenced this issue Jul 18, 2024
…owid_sequentially

Summary:
master: 56d2d2d

Motivation: many ported regress tests have tables with no PK, and
upstream PG often does SELECTs with no ORDER BY on these tables.
Upstream PG's ordering is consistent in that rows allocate a ctid
sequentially.  YB, on the other hand, randomly generates a UUID for
ybrowid, and furthermore, tables with no PK are HASH sorted.

Fix those two differences via a new tserver gflag
TEST_generate_ybrowid_sequentially.  The ybrowids are generated using
MonoTime::Now() serialized to 8 bytes, compared to the usual 16-byte
UUID.  This works for regress tests as regress tests normally do not
concurrently write to the same table or use connections across different
nodes with clock skew.
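
To make the idea concrete, here is a minimal Python sketch of a time-based 8-byte row id. It is not the YB implementation: the big-endian layout and the names `encode_rowid` and `sequential_rowid` are assumptions for illustration, and Python's `time.monotonic_ns()` merely stands in for MonoTime::Now().

```python
import struct
import time


def encode_rowid(ticks):
    """Encode a non-negative clock reading as 8 bytes.

    Assumption: big-endian unsigned layout, chosen here so that comparing
    the raw bytes gives the same order as comparing the integers.
    """
    return struct.pack(">Q", ticks)


def sequential_rowid():
    """Hypothetical stand-in for a ybrowid derived from a monotonic clock."""
    return encode_rowid(time.monotonic_ns())


# Ids generated one after another on a single connection sort in
# generation order (ties are possible if the clock is coarse).
ids = [sequential_rowid() for _ in range(5)]
print(ids == sorted(ids))
```

The sketch also shows why the approach relies on a single writer and a single clock: two nodes with skewed clocks, or two concurrent writers, could produce ids whose byte order no longer matches insertion order.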

There is another unresolved difference between PG and YB's ctid/ybrowid
allocation: in PG, UPDATEs reallocate ctid.  Since doing the same would
be very intrusive to the YB model of UPDATEs, leave this out.  Most
tests do not do selective UPDATEs anyway, and for the few cases that do,
they can resort to using a ybsort column.

Update a handful of tests, particularly changing those using the ybsort
workaround to no longer use it.  Of particular note is yb_pg_with, which
still needs the ybsort column for some tables due to some UPDATEs.

Ideally, all ported regress tests turn on this flag.  But the current
state of things makes it hard to separate ported tests from non-ported
ones.  Put that work off for later, particularly the pg15 branch, where
this code is already in place in commit
fedbdac.

For here in master (currently based on PG 11), fix just the
TestPgRegressIndex test using this flag.  Split it to TestPgRegressIndex
and TestPgRegressPgIndex for non-ported and ported tests respectively.
For the ported test, set the flag.

Also note that yb_pg_indexing's output changes for an error/hint
message because the pk-less table now has ASC ybrowid instead of HASH
ybrowid.  This causes a new code path to get taken which ends up calling
RelationGetIndexList:

    ATRewriteTables > make_new_heap > yb_copy_split_options > YbRelationSetNewRelfileNode > YbGetSplitOptions > RelationGetPrimaryKeyIndex > RelationGetIndexList

So when entering FetchUniqueConstraintName for forming the message, it
hits the else block because rd_pkindex is loaded since
RelationIdGetRelation now finds the index in the scan unlike before.
This is undesirable behavior since it differs from upstream PG, but
dealing with it is out of scope.
Jira: DB-5819

MERGE:

- yb_index_schedule: master 56d2d2d
  removes yb_index_serial_schedule and yb_index_serial2_schedule and
  makes yb_index_schedule and yb_pg_index_schedule.  pg15
  9690419 adds yb_index_including test
  to yb_index_serial_schedule.  Move that test to yb_index_schedule.

Test Plan:
    ./yb_build.sh fastdebug --gcc11 --java-test TestPgRegressIndex
    ./yb_build.sh fastdebug --gcc11 --java-test TestPgRegressPgIndex \
      -n 10 --tp 1

Jenkins: rebase: pg15-cherrypicks

Reviewers: tfoucher

Reviewed By: tfoucher

Subscribers: yql, tfoucher

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36672