[#21841] YSQL: Disable loading replication slots from disk on startup
Summary:
In PG, the replication slot metadata, snapshots and spilled transactions are stored in the `pg_replslot/{slot_name}` directory. Upon server
startup, the metadata is read from the disk and loaded into shared memory.

In YB, we only use the disk for storing spilled transactions during streaming (https://phorge.dev.yugabyte.com/D33750). This revision disables loading replication slots from disk on startup: the slot metadata does not exist there, so attempting to load it leads to failures such as #21841. Note that in YB, the replication slot metadata is stored in yb-master and loaded on demand when streaming is requested.
Jira: DB-10741
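
To illustrate the behavior change described above, here is a minimal standalone sketch of the startup scan with and without the YB gate. It is a simplified model, not the code in `slot.c`: `SlotRegistry`, `restore_slot_from_disk` and `startup_replication_slots` are hypothetical names, and the `yb_enabled` parameter merely stands in for the result of `YBIsEnabledInPostgresEnvVar()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical registry: only counts how many slots were "restored". */
typedef struct {
    size_t restored;
} SlotRegistry;

/* Hypothetical stand-in for RestoreSlotFromDisk(); it just counts calls. */
void restore_slot_from_disk(SlotRegistry *reg, const char *name)
{
    (void) name; /* a real restore would read the slot's on-disk metadata */
    reg->restored++;
}

/*
 * Model of the startup scan: walk the directory entry names and restore
 * each one unless YB mode is on. yb_enabled is an assumption standing in
 * for YBIsEnabledInPostgresEnvVar(). Returns the number of slots restored.
 */
size_t startup_replication_slots(const char *const *names, size_t n,
                                 bool yb_enabled)
{
    SlotRegistry reg = {0};

    assert(names != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* In YB mode the directory holds no slot metadata, so skip it. */
        if (!yb_enabled)
            restore_slot_from_disk(&reg, names[i]);
    }
    return reg.restored;
}
```

In this model, a leftover `test_slot` directory is restored when `yb_enabled` is false and ignored when it is true, which is the only difference the revision is meant to introduce.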

Test Plan:
Jenkins: test regex: .*ReplicationSlot.*

There is no automated test for this because our mini cluster framework doesn't support restarting PG with existing data.

Did manual testing with the following steps:

1. Create cluster

```
bin/yugabyted start --ui=false --advertise_address=127.0.0.1 --master_flags="yb_enable_cdc_consistent_snapshot_streams=true,allowed_preview_flags_csv={yb_enable_cdc_consistent_snapshot_streams,ysql_yb_enable_replication_commands,ysql_yb_enable_replica_identity},ysql_yb_enable_replication_commands=true,ysql_TEST_enable_replication_slot_consumption=true,ysql_yb_enable_replica_identity=true" --tserver_flags="allowed_preview_flags_csv={yb_enable_cdc_consistent_snapshot_streams,ysql_yb_enable_replication_commands,ysql_yb_enable_replica_identity},ysql_yb_enable_replication_commands=true,yb_enable_cdc_consistent_snapshot_streams=true,ysql_TEST_enable_replication_slot_consumption=true,ysql_cdc_active_replication_slot_window_ms=0,ysql_sequence_cache_method=server,ysql_yb_enable_replica_identity=true"
```

2. Create a replication slot and start streaming from ysqlsh. Streaming fails because ysqlsh does not support it, but the attempt still creates the slot directory in the file system

```
> bin/ysqlsh "dbname=yugabyte replication=database"
ysqlsh (11.2-YB-2.23.0.0-b0)
Type "help" for help.

yugabyte=# CREATE_REPLICATION_SLOT test_slot LOGICAL pgoutput;
 slot_name | consistent_point |    snapshot_name    | output_plugin
-----------+------------------+---------------------+---------------
 test_slot | 0/2              | 7013592194282041344 | pgoutput
(1 row)

yugabyte=# START_REPLICATION SLOT test_slot LOGICAL 0/2;
WARNING:  Virtual WAL instance not found for the session_id: 1
ERROR:  client sent proto_version=0 but we only support protocol 1 or higher
CONTEXT:  slot "test_slot", output plugin "pgoutput", in the startup callback
```

3. Validate that the directory exists

```
> ls ~/var/data/pg_data/pg_replslot
test_slot
```

4. Stop the cluster and start it again

```
bin/yugabyted stop

bin/yugabyted start --ui=false --advertise_address=127.0.0.1 --master_flags="yb_enable_cdc_consistent_snapshot_streams=true,allowed_preview_flags_csv={yb_enable_cdc_consistent_snapshot_streams,ysql_yb_enable_replication_commands,ysql_yb_enable_replica_identity},ysql_yb_enable_replication_commands=true,ysql_TEST_enable_replication_slot_consumption=true,ysql_yb_enable_replica_identity=true" --tserver_flags="allowed_preview_flags_csv={yb_enable_cdc_consistent_snapshot_streams,ysql_yb_enable_replication_commands,ysql_yb_enable_replica_identity},ysql_yb_enable_replication_commands=true,yb_enable_cdc_consistent_snapshot_streams=true,ysql_TEST_enable_replication_slot_consumption=true,ysql_cdc_active_replication_slot_window_ms=0,ysql_sequence_cache_method=server,ysql_yb_enable_replica_identity=true"
```

5. Validate that the startup worked. Without the fix it fails; with the fix it passes

```
> bin/ysqlsh "dbname=yugabyte replication=database"
```
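
The directory check in step 3 could also be done programmatically. The following is a rough sketch, assuming a POSIX system, that counts the subdirectories under a `pg_replslot` directory such as `~/var/data/pg_data/pg_replslot`. The helper `count_slot_dirs` is hypothetical and not part of the YugabyteDB source tree.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count the subdirectories of replslot_dir (e.g. <data_dir>/pg_replslot).
 * Returns -1 if the directory cannot be opened.
 * Hypothetical helper written for illustration only.
 */
int count_slot_dirs(const char *replslot_dir)
{
    assert(replslot_dir != NULL);

    DIR *dir = opendir(replslot_dir);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        /* Skip the self and parent entries. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        char path[4096];
        int len = snprintf(path, sizeof(path), "%s/%s",
                           replslot_dir, de->d_name);
        if (len < 0 || (size_t) len >= sizeof(path))
            continue; /* path too long for this sketch; skip it */

        /* Count only entries that are directories. */
        struct stat st;
        if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
            count++;
    }
    closedir(dir);
    return count;
}
```

After step 2, a call on the cluster's `pg_replslot` path would be expected to report at least one directory (`test_slot`); the function only inspects the file system and does not talk to the database.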

Reviewers: asrinivasan

Reviewed By: asrinivasan

Subscribers: ycdcxcluster, yql

Differential Revision: https://phorge.dev.yugabyte.com/D33876
dr0pdb committed Apr 8, 2024
1 parent 15ec688 commit f2a2821
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions src/postgres/src/backend/replication/slot.c
@@ -1288,8 +1288,17 @@ StartupReplicationSlots(void)
 			continue;
 		}

-		/* looks like a slot in a normal state, restore */
-		RestoreSlotFromDisk(replication_de->d_name);
+		/*
+		 * YB Note: We do not store the replication slot metadata on disk. This
+		 * directory is only used for storing spilled large txns by the
+		 * reorderbuffer. Our source of truth for replication slots is
+		 * yb-master, so we disable loading the slot from disk here.
+		 */
+		if (!YBIsEnabledInPostgresEnvVar())
+		{
+			/* looks like a slot in a normal state, restore */
+			RestoreSlotFromDisk(replication_de->d_name);
+		}
 	}
 	FreeDir(replication_dir);
