Lock wait timeout exceeded MySQL errors #9905
Comments
Hey @nickdunn, do you have SEOmatic installed by chance? If so, can you please ensure that you are running its latest version? We have seen similar issues where that ended up being the culprit; it was fixed in its v3.4.14 release. |
@brandonkelly I've been experiencing the same on a page recently. It's a "Pruning extra revisions" job that fails. We are now running the latest version, but we noticed the issue 10 days ago after running 3.7.11 for 2 weeks. Before that we had been running 3.6.17 without any problems, so it might be a change in 3.7 causing this? Another thing I have noticed is that this often happens after saving a specific entry. The entry contains a matrix field with a bunch of blocks, but this entry has had its revisions pruned without a problem earlier.
System info: PHP 7.4.15
Plugins: AsyncQueue 2.3.0
Notes: It's an issue that occurred before AsyncQueue was installed.
Timeline: Sun Jul 04 2021: Updated to 3.6.17
Hope this helps. Let me know if you need me to test anything more. Edit: |
@lassemt Does the job finish successfully if you restart it? Or does it consistently fail? |
@brandonkelly I didn't try restarting the queue, as most of the time I had to restart the server. I have the queue.log from when it happened and you can see it tries to restart the job itself:
|
Thanks. The first and last errors are 12 minutes apart, so that probably rules out the possibility that the lock was just legitimately tied up by a separate process. Do you see any other |
@brandonkelly that makes sense. Here is a batch from queue.log from when I tried to force the error to recur by manually saving the entry a couple of times. This was done right after I restarted the server, so the first event you see is the first queued job after the restart, and the last event is from right after I triggered the server restart through Digital Ocean.
The only error in phperrors.log close to this timestamp is at … I was watching the queue in the admin when the above was happening and took a screenshot of the 2 "Pruning extra revisions" jobs that "got stuck". The images are in the order they were in the queue overview in the admin. I'm not sure if it matters, but notice that the 2 jobs are related to the same SourceID, and that the one with ID 292913, which was pushed first, failed right after the one with ID 292917 was pushed to the queue. I believe it was right after the second one was pushed to the queue that the server started to run slow. |
@brandonkelly I updated and restarted the server now. The same issue is still there and the logs are the same. But I noticed that it only happens when there are 2 "Pruning extra revisions" jobs for the same entry. Neither of the jobs completes before the server starts to slow down, and it is always the first one pushed to the queue that gets the Lock wait timeout exceeded error message. The last of the 2 jobs rarely shows up in queue.log. If there is only one "Pruning extra revisions" job for the entry, the job finishes as expected:
I had to open the entry editor in 2 tabs and save them almost simultaneously to get 2 jobs going. |
We do, and have recently updated to the latest version. We haven't seen the error since... but then again we've only seen the error a handful of times in 6 months, so it's sporadic. I don't think this was the culprit, since we've had |
@lassemt Do you have multiple queue runners? |
I'd like to chip in with a similar experience, although it's not just a single resource/element that it's happening on. We do have a fairly chunky install, with 9 sites and ~50k rows in … MariaDB pegged a single core (out of 4) once this behaviour started, which had a huge impact on DB performance, such that the CP was unusable (couldn't save entries reliably). Killing the DB process was the only way to resolve things. I have since run prune-revisions via the CLI and it ripped through pretty quickly. queue.log attached for info, not that it reports anything that hasn't been seen already (what's with the order of the entries being messed up, though? why aren't they in ascending time order?) Craft 3.6.18 queue.log: |
@brandonkelly I don't think so (I'm not too familiar with queues in PHP). As far as I know, I'm only running the default queue driver. I disabled the ostark/craft-async-queue plugin before updating and running the previous tests. |
Using Async Queue here, default 2 concurrent processes. |
@nickdunn I have a feeling multiple queue runners have a tendency to exacerbate these sorts of issues. |
Yeah, use 1 queue processor and you'll be fine. We learned this the hard way too, hence: https://twitter.com/sjcallender/status/1433238613039407105 |
Ok, we've upgraded Craft, Neo and SEOmatic to the latest versions and have dropped Async Queue to one queue processor, so we'll see if this fixes it. |
Same here! Our client's entire site went down after a few "Internal server error" messages. As the admin I could test this once, and after that the website + CP crashed. There were 2 failed tasks in the queue before it threw me out that said something about Pruning extra revisions:
We found this query task on our server. We think this was our little devil that crashed everything. (We manually killed it because there was no other option left.) Also these: … Maybe you guys can shed some light on this issue so we can prevent it in the future.
|
@kevinmu17 - This is the exact issue that forced us to use just one queue processor. We've been fine ever since. |
We didn't install any, or does craft out of the box use more than 1? :) |
By default, the queue is run when you visit the control panel. I'm not sure if two or more can ever be initiated this way. @brandonkelly? I'd recommend setting |
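For reference, and assuming the truncated recommendation above refers to Craft's `runQueueAutomatically` setting (that is my assumption, not something stated in the thread), a single dedicated queue worker can be set up roughly like this:

```php
<?php
// config/general.php - a minimal sketch, not a confirmed fix from this thread.
// Assumes the goal is a single queue worker: stop control panel requests from
// spawning queue runners, and run one worker process instead.
return [
    '*' => [
        'runQueueAutomatically' => false,
    ],
];
```

A single worker can then be kept alive by a process manager (e.g. Supervisor) running `php craft queue/listen`, so only one process handles "Pruning extra revisions" jobs at a time.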
@kevinmu17 I'd suggest updating to the latest 3.7 release as a starting point and see if the behavior changes. There have been improvements made around transactions and locking that you're missing. |
Could #9977 be a potential fix for this? Edit: unlikely — locally I've just run |
@nickdunn You updated to 3.7.19 first right? |
Ha, yes I did! The … But I'm thinking maybe 55K … I must admit I don't fully understand what the lower level … |
Ah… it’s due to Neo. Each individual Neo field/element combo is going to add its own additional structure. |
We just updated to 3.7.10.1 since we had similar problems with a version < 3.7.19, but the problem still persists, although now it does not create that massive DB load anymore, which previously could only be resolved by restarting it. Here are some facts:
I know that some of my findings above sound rather confused, but this is what I am seeing. I should mention that we have Neo installed at version 2.8.16 and we need to update it. That will happen soon and I will keep you posted. |
@hiasl We have the exact same issue. We've tried rebooting the cluster DB and still the issue persists. Did you do anything to resolve yours? |
Fixes #9905 - where a deadlock error would occur when updating a row referencing an elements table row, which may be locked by another request
We finally got to the bottom of this, with much help from @aaronbushnell 🎉 The deadlock errors occurred when updating rows in the … tables. We fixed it by dropping the foreign keys that relate these tables together, and deferring the “cascade deletes” to garbage collection. Since that involved a new DB migration, we decided to fix it for Craft 4.4 rather than the next patch releases. To test the fix early, you can change your … |
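As a rough illustration of the idea (this is only a sketch of the approach described above, not Craft's actual migration or garbage-collection code), deferring the cascade deletes means a periodic cleanup query removes rows whose element row no longer exists, instead of relying on an ON DELETE CASCADE foreign key:

```php
<?php
// Illustrative sketch only, not Craft's implementation. With the foreign key
// gone, structureelements rows pointing at hard-deleted elements are no longer
// removed automatically, so a garbage-collection pass deletes the orphans.
// MySQL multi-table DELETE syntax; table/column names (structureelements.elementId,
// elements.id) are the ones discussed in this thread.
Craft::$app->getDb()->createCommand('
    DELETE se FROM {{%structureelements}} se
    LEFT JOIN {{%elements}} e ON e.id = se.elementId
    WHERE se.elementId IS NOT NULL AND e.id IS NULL
')->execute();
```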
I tried this update in my Craft 4 branch of this project and it works, @brandonkelly! Thanks so much for the work you and Brad put into this! I presume this probably requires Craft 4 and couldn't be a fix backported to Craft 3? |
Yay, well done! Stupid question, but is it genuinely safe to remove these? Don't the FKs provide data integrity for those relationships? And is there a risk that any plugins rely on these FK rows and it may be a bigger breaking change? |
@aaronbushnell Due to the schema change, we wouldn’t backport the fix to v3 unless there was going to be a 3.8 release. So maybe down the road (e.g. if we have a need to bump the minimum PHP version), but not in the immediate future. Update: We ended up releasing 3.8 alongside 4.4. @nickdunn It can be a concern, but not in this case. Both … (Plus, elements typically get hard-deleted via garbage collection, so in most cases those extra rows will only be around for a fraction of a millisecond, between the time the elements are hard-deleted and the orphaned extra rows are deleted.) |
That makes sense, thanks for the follow-up @brandonkelly! |
Craft 3.8 and 4.4 are out now with this deadlock fix! |
@brandonkelly I'm having a lot of issues with structures being orphaned on multiple 3.8 sites. Could this change have anything to do with it? |
@boboldehampsink This change could lead to orphaned … rows. If you want to send in a DB backup + Composer files and point out the problem, we can look into it from there. Or if you can come up with steps to reproduce, post a new issue about it. |
@brandonkelly I was already mailing with support (Oli) about this. Just sent another dump where it can be reproduced by running GC, see support mail. |
Related spicywebau/craft-neo#725 |
@boboldehampsink Is your issue specific to Neo as well? |
@brandonkelly I do have this specific issue, but I had one other site that had non-Neo structure-related issues after GC - it just broke a Structure section. So my guess is it's not Neo but Craft (structure) specific. |
After an upgrade from 3.7.68 -> 3.8.3 last week we've had several structures flatten, and attempts to edit them result in server errors. This is an instance with the Neo and Navigation plugins, among others, but non-plugin structures (the page tree, for instance) were also flattened. We don't know when this background process ran, so we're rolling back (spinning up a new host based on a snapshot, rolling back the DB to before the upgrade, and restoring a pre-upgrade release). Stack: headless Craft, NextJS, Postgres |
Thanks all! I was able to reproduce with @boboldehampsink’s DB, and tracked it down to a PostgreSQL + garbage collection bug in 3.8 + 4.4. Craft 3.8.4 and 4.4.4 are out now with a fix. (96c56f1) |
@brandonkelly The flattening of structures appears to be fixed, but GC in 3.8.4 seems to be removing tag, category and asset relations. I've been able to reproduce this two times now by rolling back, looking at entries, running GC, looking again. |
Doh, sorry about that. We were so focused on this being a structure bug, I completely forgot that we made the same GC change for relations as well. Craft 3.8.5 and 4.4.5 are out with the same fix for relations. |
We'll test it out and will confirm. |
Looks good to us, thanks! |
Hi, I recently upgraded a large site from Craft CMS 3.7.55.1 to Craft CMS 4.4.11 and I am receiving the following error when trying to save changes to an entry: … I had added some test blocks to a page and forgotten to remove them. I went into the entry, deleted the test blocks and clicked save, and that is when the error happened. I now can't make any changes to the entry without receiving the error above in the logs and an Internal Server Error in the CP. I tried to add a block to other entries and they seem to be fine; however, I am concerned that this is not an isolated case and I could receive more of these errors. |
Hey @paragonn, it sounds like maybe your mutex lock didn’t clear itself for some reason. How is your |
Hey @brandonkelly -- I should also have mentioned that it is a load-balanced environment across two servers. We have a redis component configured, but no mutex. Should we add a mutex configuration? Below is the components configuration that we have in place in the config/app.php: |
@paragonn Yup! See https://craftcms.com/knowledge-base/configuring-load-balanced-environments#mutex-locks Probably worth giving the rest of that page a once-over, too. |
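For anyone else on a load-balanced setup, the shape of that mutex override looks roughly like the following. This is a hedged sketch assuming the yiisoft/yii2-redis extension with placeholder connection values, not the exact config from this thread; see the linked knowledge-base article for the authoritative version.

```php
<?php
// config/app.php - a sketch, not the exact config from the KB article.
// Assumes yiisoft/yii2-redis is installed and both web servers can reach the
// same Redis instance, so locks are shared across the load balancer.
return [
    'components' => [
        'mutex' => [
            'class' => yii\redis\Mutex::class,
            'redis' => [
                'hostname' => getenv('REDIS_HOSTNAME') ?: 'localhost',
                'port' => 6379,
                'database' => 0,
            ],
        ],
    ],
];
```

Depending on the Craft version, the article may instead show nesting this under Craft's own mutex component; the key point is that every node shares one lock store.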
Thanks @angrybrad ! |
Description
Very sporadically I am getting `Lock wait timeout exceeded` errors from MySQL. These appear in the Craft logs. Running `show engine innodb status` shows they are consistently from queries on the `structureelements` table.

The database server is pretty chunky and performs well elsewhere. The `innodb_lock_wait_timeout` setting was at the 50-second default. I have just bumped it to 100, but this feels like a sticking plaster, i.e. telling it to wait a bit longer just in case!

#5872 suggests a large number of revisions could be the issue. I only have about 6k `entries` and 4k `revisions`, while there are 57k rows in `structureelements`.

I'm conscious StackOverflow is awash with useless threads on these errors, and the usual reply is "refactor your code to make sure everything is in one transaction", which doesn't seem too useful.

The offending queries seem fairly innocuous to me. The issue is very sporadic — maybe 10 times in 3 months — and always seemingly when a content editor goes to save an entry.
Reckons?
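For anyone wanting to check the same values mentioned above from within Craft, here is a diagnostic sketch (mine, not part of the original report) using Craft's database connection, e.g. from a console controller:

```php
<?php
// Diagnostic sketch: read the current lock wait timeout and the InnoDB status
// (where the blocked structureelements queries showed up for the reporter).
$db = Craft::$app->getDb();

// The reporter's value was the 50-second default, later raised to 100.
$timeout = $db->createCommand('SELECT @@innodb_lock_wait_timeout')->queryScalar();
echo "innodb_lock_wait_timeout: {$timeout}s\n";

// Full InnoDB status text, including the latest detected lock waits/deadlocks.
$status = $db->createCommand('SHOW ENGINE INNODB STATUS')->queryOne();
echo $status['Status'] ?? '';
```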
Additional info