tapdb: use the serializable isolation by default for postgres #334
Conversation
The itest failure shows we do indeed have an issue here.
Alternative approach for the last commit: https://github.com/powerman/pqx/blob/a6a4bb664f620589441e7b28d20464134af550c2/serialize.go#L16
Fixed with the latest commit: we now retry (forever). We should add logging, and also consider an upper limit on retries.
Tested with #336, and confirmed these changes fix the proof upload issues we observed before.
Need to apply the fixup commits but otherwise LGTM, nice catch on the default postgres modes 💯
Good catch! This should indeed make things safer.
Though with the current implementation, I think we might get into weird situations with the combination of goto and defer() (see inline comments).
tapdb/interfaces.go (Outdated)
// Roll back the transaction, then pop back up to try
// once again.
//
// TODO(roasbeef): needs retry limit
Should we fix this TODO to avoid endless retries?
Yeah, I think so. Open question: what's a good limit for retries? My strategy was going to be to deploy the modified version in staging/testnet, then check the logs to see how often we retry. We have @jharveyb's mega mint script, so that can be used to calibrate.
tapdb/interfaces.go (Outdated)
@@ -100,6 +101,7 @@ func NewTransactionExecutor[Querier any](db BatchedQuerier,
func (t *TransactionExecutor[Q]) ExecTx(ctx context.Context,
	txOptions TxOptions, txBody func(Q) error) error {

txStart:
This will cause a new defer with a rollback to be registered for each retry. And IIUC, Rollback() will return an error if it is called twice. So I think we need to roll this out into an actual loop, otherwise we might get weird side effects or errors.
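For illustration, a minimal sketch of the loop-based shape being suggested here, not the actual tapdb implementation; execTx, txBody, and isRetryable are hypothetical names. Each attempt owns its transaction and its single rollback, so nothing is deferred across iterations:

```go
package tapdbsketch

import (
	"context"
	"database/sql"
	"fmt"
)

// execTx runs txBody inside a transaction and retries on retryable errors.
// Every attempt begins its own transaction and rolls it back itself, so no
// deferred rollbacks pile up across retries the way they would with goto.
func execTx(ctx context.Context, db *sql.DB, opts *sql.TxOptions,
	txBody func(*sql.Tx) error, isRetryable func(error) bool,
	maxRetries int) error {

	for i := 0; i <= maxRetries; i++ {
		tx, err := db.BeginTx(ctx, opts)
		if err != nil {
			return err
		}

		if err := txBody(tx); err != nil {
			// Roll back this attempt exactly once before deciding
			// whether to retry or bail out.
			_ = tx.Rollback()

			if isRetryable(err) {
				continue
			}
			return err
		}

		err = tx.Commit()
		if err == nil {
			return nil
		}

		// Commit itself can fail with a serialization error; the tx
		// is already finalized at this point, so just retry.
		if isRetryable(err) {
			continue
		}
		return err
	}

	return fmt.Errorf("tx still failing after %d retries", maxRetries)
}
```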
I think we still need to roll back after each attempt; will do some doc/code digging.
Force-pushed from beb672a to 782c21b.
Tacked on a proper retry loop, PTAL. It also has logging now, so we can see how many times things need to be retried in practice.
Force-pushed from 0b70655 to 28bad1f.
In this commit, we start to use the serializable snapshot isolation by default for postgres. For sqlite, this is a noop as only a single writer is allowed at any time. With this mode, we should guard against all types of anomalies, at the cost of extra locking and potential perf hits. One thing we'll need to test out is if the DB driver will automatically do retries or not if a transaction cannot be serialized. This is an attempt at fixing lightninglabs#333 at a global level.
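For illustration only (a hypothetical helper, not the actual tapdb code), the per-backend default described above could be expressed with database/sql's standard TxOptions:

```go
package tapdbsketch

import "database/sql"

// txOptions picks the isolation level per backend: SERIALIZABLE for
// postgres, while for sqlite the field is left at its default since only a
// single writer is allowed at a time there anyway.
func txOptions(isPostgres, readOnly bool) *sql.TxOptions {
	opts := &sql.TxOptions{ReadOnly: readOnly}
	if isPostgres {
		opts.Isolation = sql.LevelSerializable
	}
	return opts
}
```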
In this commit, we introduce a new postgres specific error returned when a transaction cannot be serialized. We then catch this error in order to implement retry logic for a transaction that failed to be serialized. We'll retry up to 10 times, waiting a random amount up to 50 ms between each retry.
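A sketch of what catching that error and bounding the retries could look like. This assumes the pgx driver (github.com/jackc/pgconn); with lib/pq the check would target *pq.Error instead, and the constant names here are made up rather than taken from tapdb:

```go
package tapdbsketch

import (
	"errors"
	"math/rand"
	"time"

	"github.com/jackc/pgconn"
)

const (
	// pgSerializationFailure is the SQLSTATE code postgres returns when
	// a SERIALIZABLE transaction cannot be serialized.
	pgSerializationFailure = "40001"

	// maxRetries is how many times we re-run a transaction that failed
	// to serialize before giving up.
	maxRetries = 10

	// maxRetryDelay bounds the random sleep between retries.
	maxRetryDelay = 50 * time.Millisecond
)

// isSerializationError reports whether err is a postgres serialization
// failure that warrants a retry.
func isSerializationError(err error) bool {
	var pgErr *pgconn.PgError
	return errors.As(err, &pgErr) && pgErr.Code == pgSerializationFailure
}

// retryDelay returns a random delay of up to maxRetryDelay to wait before
// the next attempt.
func retryDelay() time.Duration {
	return time.Duration(rand.Int63n(int64(maxRetryDelay)))
}
```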
Force-pushed from 28bad1f to 5f11328.
Re: calling Rollback() multiple times, it looks like any second call will be a noop here due to the CAS: https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/database/sql/sql.go;l=2279-2284
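To illustrate that point with a tiny hypothetical helper: the second Rollback() fails the compare-and-swap on the transaction's done flag and simply returns sql.ErrTxDone, without another database round trip.

```go
package tapdbsketch

import (
	"database/sql"
	"errors"
)

// rollbackTwice shows that rolling back the same transaction twice is
// harmless: the first call issues the actual ROLLBACK, the second only
// returns sql.ErrTxDone.
func rollbackTwice(tx *sql.Tx) error {
	if err := tx.Rollback(); err != nil {
		return err
	}

	// Second call: no round trip, just sql.ErrTxDone.
	if err := tx.Rollback(); !errors.Is(err, sql.ErrTxDone) {
		return err
	}
	return nil
}
```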
Very nice, LGTM 🎉
Ah, yes. I overlooked that we just ignore the error returned from Rollback() anyway.