Skip to content

Commit

Permalink
[#25015] YSQL: replace sleep in a test function
Browse files Browse the repository at this point in the history
Summary:
LibPqTestBase::BumpCatalogVersion has a Sleep(2s) that is intended to
wait for the recently bumped catalog version to propagate.  Whether it
intends for the catalog version to propagate to all tservers or just to
the connection conn, I do not know, because tests have varying
expectations.  For example, PgLibPqTest.StaleMasterReads does not seem
care at all whether it propagated because it only checks master's
catalog version.  And PgLibPqTest.CatalogCacheMemoryLeak already has a
sleep call after calling this function.  So it looks like no tests
actually care about the catalog version propagating.

It turns out, the 2s sleep is not sufficient when writing a test that
does care about the catalog version propagating, at least to conn.  It
fails 1/100 times on TSAN, and reducing it to 1.5s makes it fail 3/50
times.  To make the wait more reliable, have wait on conn's local
catalog version directly.  This function can then be used more reliably
in cases where conn needs to have up-to-date catalog version.
Jira: DB-14156

Test Plan:
    ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_backends-test
    ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_libpq-test

Close: #25015
Depends on D40110

Reviewers: myang

Reviewed By: myang

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D40115
  • Loading branch information
jaki committed Nov 21, 2024
1 parent 77958e8 commit 0f875f5
Show file tree
Hide file tree
Showing 2 changed files with 33 additions and 10 deletions.
41 changes: 32 additions & 9 deletions src/yb/yql/pgwrapper/libpq_test_base.cc
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@

#include "yb/common/common.pb.h"
#include "yb/common/pgsql_error.h"
#include "yb/util/backoff_waiter.h"
#include "yb/util/monotime.h"
#include "yb/util/size_literals.h"
#include "yb/yql/pgwrapper/libpq_utils.h"
Expand Down Expand Up @@ -90,22 +91,44 @@ Result<PgOid> GetDatabaseOid(PGConn* conn, const std::string& db_name) {
Format("SELECT oid FROM pg_database WHERE datname = '$0'", db_name));
}

Status LibPqTestBase::BumpCatalogVersion(int num_versions, PGConn* conn,
// Bump catalog version num_bumps times using conn. After each bump, wait for the new catalog
// version to propagate to conn in order to avoid catalog version mismatch errors.
// Prerequisites:
// - conn should not be in the middle of a transaction
// - there should be no other concurrent catalog version bumps
Status LibPqTestBase::BumpCatalogVersion(int num_bumps, PGConn* conn,
const std::string& alter_value) {
LOG(INFO) << "Do " << num_versions << " breaking catalog version bumps";
const auto query = "SELECT catalog_version FROM pg_stat_activity WHERE pid = pg_backend_pid()";
auto initial_catalog_version = VERIFY_RESULT(conn->FetchRow<int64_t>(query));
LOG(INFO) << "Do " << num_bumps << " breaking catalog version bumps starting at "
<< initial_catalog_version;
if (alter_value.empty()) {
for (int i = 0; i < num_versions; ++i) {
for (int i = 1; i <= num_bumps; ++i) {
RETURN_NOT_OK(IncrementAllDBCatalogVersions(
*conn, IsBreakingCatalogVersionChange::kTrue /* is_breaking */));
// Here we call increment_all_db_catalog_versions to bump the catalog version.
// Add a sleep for the new catalog version to propagate to avoid catalog version
// mismatch error when we have back-to-back calls to increment_all_db_catalog_versions.
SleepFor(2s);
auto target_catalog_version = initial_catalog_version + i;
RETURN_NOT_OK(LoggedWaitFor(
[conn, target_catalog_version, &query]() -> Result<bool> {
auto current_catalog_version = VERIFY_RESULT(conn->FetchRow<int64_t>(query));
if (current_catalog_version == target_catalog_version) {
return true;
}
if (current_catalog_version < target_catalog_version) {
return false;
}
return STATUS_FORMAT(
IllegalState,
"unexpected catalog version $0 > target $1:"
" does the test do concurrent DDLs without synchronization?",
current_catalog_version, target_catalog_version);
},
10s,
Format("wait for catalog version $0 to propagate", target_catalog_version)));
}
return Status::OK();
}
// Some tests cannot tolerate the added sleep if using increment_all_db_catalog_versions.
SCHECK_EQ(num_versions, 1, InvalidArgument, "cannot bump more than one version with alter_value");
// Some tests cannot tolerate the added wait if using increment_all_db_catalog_versions.
SCHECK_EQ(num_bumps, 1, InvalidArgument, "cannot bump more than one version with alter_value");
return conn->ExecuteFormat("ALTER ROLE yugabyte $0", alter_value);
}

Expand Down
2 changes: 1 addition & 1 deletion src/yb/yql/pgwrapper/libpq_test_base.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ class LibPqTestBase : public PgWrapperTestBase {
Result<PGConn> ConnectToDBWithReplication(const std::string& db_name);
void SerializableColoringHelper(int min_duration_seconds = 0);
static bool TransactionalFailure(const Status& status);
static Status BumpCatalogVersion(int num_versions, PGConn* conn,
static Status BumpCatalogVersion(int num_bumps, PGConn* conn,
const std::string& alter_value = "");
static void UpdateMiniClusterFailOnConflict(ExternalMiniClusterOptions* options);
};
Expand Down

0 comments on commit 0f875f5

Please sign in to comment.