Debug deadlock while applying DB migrations #4588

bartekn · 2022-09-14T14:08:37Z

What version are you using?

2.20.0

What did you do?

Apply 58_add_index_by_id_optimization.sql migration file while ingestion is running on another ingesting instance.

What did you expect to see?

No error.

What did you see instead?

Deadlock error:

pq: deadlock detected handling 58_add_index_by_id_optimization.sql

There are two things that are important to remember: 1. the migrations are run in a db transaction and 2. because we have multiple ingesting instance, ingestion is running along db migrations (also in a DB transaction). This is the chain of events I think that leads to deadlock:

first CREATE INDEX query in the migration file acquires ShareLock on history_operation_claimable_balances (this lock is going to be released at the end of transaction).
one of ingesting instances INSERT a row into history_operation_liquidity_pools and by doing this acquires RowExclusiveLock (it conflicts with ShareLock and is also released at the end of transaction).
now, CREATE INDEX on history_operation_liquidity_pools is executed, again ShareLock acquire attempt but on history_operation_liquidity_pools table and needs to wait until ingestion commits because there’s already a conflicting lock on this table.
ingesting instance attempts to INSERT a new row into history_operation_claimable_balances but waits because in step 1 a ShareLock was acquired.
deadlock because both txs wait for the other tx lock to be released.

I created a simple local repro.

session 1:

deadlock_repro=# create table t1 (f1 int);
CREATE TABLE
deadlock_repro=# create table t2 (f1 int);
CREATE TABLE
deadlock_repro=# begin;
BEGIN
deadlock_repro=*# create index i1 on t1(f1);
CREATE INDEX
deadlock_repro=*# create index i2 on t2(f1);
CREATE INDEX
deadlock_repro=*#

session 2:

deadlock_repro=# begin;
BEGIN
deadlock_repro=*# insert into t2 (f1) values (0);
INSERT 0 1
deadlock_repro=*# insert into t1 (f1) values (0);
ERROR:  deadlock detected
LINE 1: insert into t1 (f1) values (0);
                    ^
DETAIL:  Process 80691 waits for RowExclusiveLock on relation 1638638 of database 1638622; blocked by process 80642.
Process 80642 waits for ShareLock on relation 1638641 of database 1638622; blocked by process 80691.
HINT:  See server log for query details.

(the queries within txs are run one after another in two sessions: S1, S2, S1, S2)

The text was updated successfully, but these errors were encountered:

tsachiherman · 2022-09-21T11:53:06Z

What I'm going to propose might be an overkill :
Should we require the migration transaction to take a share lock, so that all the index creation operations would block the ingestion ?

bartekn · 2022-09-21T12:16:27Z

Yes, this is exactly the purpose of #4587. There is no other way to prevent deadlocks if ingestion is running along db migations.

Shaptic · 2022-10-13T22:51:41Z

Upon releasing v2.22.0, it appears that non-ingesting instances are trying to write to the read-only database. The error during deployment via deployinator was:

pq: cannot set transaction read-write mode during recovery

Is it possible that the PR at #4587 is triggering that? The workaround was to run the Restart job after running the Install job, as it seems that restarting the service does not attempt to apply any migrations.

tamirms · 2022-10-18T14:50:50Z

Upon releasing v2.22.0, it appears that non-ingesting instances are trying to write to the read-only database. The error during deployment via deployinator was:
pq: cannot set transaction read-write mode during recovery
Is it possible that the PR at #4587 is triggering that? The workaround was to run the Restart job after running the Install job, as it seems that restarting the service does not attempt to apply any migrations.

this is actually a pre-existing issue #4580 which was fixed by #4664

bartekn added horizon bug labels Sep 14, 2022

bartekn added this to Platform Scrum Sep 14, 2022

bartekn moved this to Backlog in Platform Scrum Sep 14, 2022

bartekn moved this from Backlog to In Progress in Platform Scrum Sep 14, 2022

bartekn mentioned this issue Sep 14, 2022

services/horizon: Lock ingestion while applying DB migrations #4587

Merged

7 tasks

bartekn closed this as completed in #4587 Sep 27, 2022

Repository owner moved this from In Progress to Done in Platform Scrum Sep 27, 2022

Shaptic reopened this Oct 13, 2022

tamirms closed this as completed Oct 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Debug deadlock while applying DB migrations #4588

Debug deadlock while applying DB migrations #4588

bartekn commented Sep 14, 2022

tsachiherman commented Sep 21, 2022

bartekn commented Sep 21, 2022

Shaptic commented Oct 13, 2022 •

edited

Loading

tamirms commented Oct 18, 2022

Debug deadlock while applying DB migrations #4588

Debug deadlock while applying DB migrations #4588

Comments

bartekn commented Sep 14, 2022

What version are you using?

What did you do?

What did you expect to see?

What did you see instead?

tsachiherman commented Sep 21, 2022

bartekn commented Sep 21, 2022

Shaptic commented Oct 13, 2022 • edited Loading

tamirms commented Oct 18, 2022

Shaptic commented Oct 13, 2022 •

edited

Loading