如何解决和调试MySQL错误:ER_LOCK_WAIT_TIMEOUT:锁等待超时; 尝试重新启动事务

15

我在这里读了几篇文章,但仍然不明白如何解决错误。另外,我该如何解密show engine innodb status的输出。我怀疑它是死锁或一个永远不提交/回滚的事务,导致它永远持有锁(这可能吗?)。

# show engine innodb status;
=====================================
2015-09-27 04:43:54 2b9cf470f700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 33 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 16140 srv_active, 0 srv_shutdown, 3941554 srv_idle
srv_master_thread log flush and writes: 3957687
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 25915
OS WAIT ARRAY INFO: signal count 25874
Mutex spin waits 1675, rounds 50212, OS waits 1581
RW-shared spins 24307, rounds 729240, OS waits 24278
RW-excl spins 16, rounds 1680, OS waits 56
Spin rounds per wait: 29.98 mutex, 30.00 RW-shared, 105.00 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 712274
Purge done for trx's n:o < 712178 undo n:o < 0 state: running but idle
History list length 1593
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 18286, OS thread handle 0x2b9cf470f700, query id 2307113 bb116-14-164-108.singnet.com.sg 116.14.164.108 meclub2359 init
show engine innodb status
---TRANSACTION 0, not started
MySQL thread id 18285, OS thread handle 0x2b9cf4444700, query id 2306954 bb116-14-164-108.singnet.com.sg 116.14.164.108 meclub2359 cleaning up
---TRANSACTION 712266, not started
MySQL thread id 18270, OS thread handle 0x2b9cf43c2700, query id 2307088 172.31.8.174 meclub2359 cleaning up
---TRANSACTION 712235, not started
MySQL thread id 18268, OS thread handle 0x2b9cf3896700, query id 2306757 172.31.8.174 meclub2359 cleaning up
---TRANSACTION 712268, not started
MySQL thread id 18267, OS thread handle 0x2b9cf4381700, query id 2307091 172.31.8.174 meclub2359 cleaning up
---TRANSACTION 712272, not started
MySQL thread id 5214, OS thread handle 0x2b9cf31dd700, query id 2307112 localhost 127.0.0.1 rdsadmin cleaning up
---TRANSACTION 712269, ACTIVE 31 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 18288, OS thread handle 0x2b9cf44c6700, query id 2307092 172.31.8.174 meclub2359 updating
UPDATE stardate SET winnersSelected=1 WHERE id='6'
------- TRX HAS BEEN WAITING 31 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 4067 page no 3 n bits 72 index `PRIMARY` of table `ebdb`.`stardate` trx id 712269 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 16; compact format; info bits 0
 0: len 4; hex 00000006; asc     ;;
 1: len 6; hex 0000000addcd; asc       ;;
 2: len 7; hex 180000021d2d57; asc      -W;;
 3: len 30; hex 68747470733a2f2f73332d61702d736f757468656173742d312e616d617a; asc https://s3-ap-southeast-1.amaz; (total 62 bytes);
 4: len 30; hex 4d75736963616c204576656e696e67207769746820456c6c696520476f75; asc Musical Evening with Ellie Gou; (total 35 bytes);
 5: len 30; hex 4d75736963616c204576656e696e67207769746820456c6c696520476f75; asc Musical Evening with Ellie Gou; (total 35 bytes);
 6: len 30; hex 4d75736963616c204576656e696e67207769746820456c6c696520476f75; asc Musical Evening with Ellie Gou; (total 255 bytes);
 7: len 22; hex 53696e6761706f72650d0a4d6172696e612042617920; asc Singapore  Marina Bay ;;
 8: len 20; hex 52443230313530394d434d4543444153434d5346; asc RD201509MCMECDASCMSF;;
 9: len 4; hex 80000002; asc     ;;
 10: len 5; hex 9997295000; asc   )P ;;
 11: len 5; hex 999702a000; asc      ;;
 12: len 4; hex 8000000c; asc     ;;
 13: len 1; hex 81; asc  ;;
 14: len 5; hex 999706e345; asc     E;;
 15: len 5; hex 99972ee820; asc   .  ;;

------------------
---TRANSACTION 712213, ACTIVE 501 sec
5 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 3
MySQL thread id 18284, OS thread handle 0x2b9cf3daa700, query id 2307007 172.31.8.174 meclub2359 cleaning up
Trx read view will not see trx with id >= 712249, sees < 712141
---TRANSACTION 712197, ACTIVE 574 sec
5 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 3
MySQL thread id 18283, OS thread handle 0x2b9cf4403700, query id 2306959 172.31.8.174 meclub2359 cleaning up
Trx read view will not see trx with id >= 712241, sees < 712141
---TRANSACTION 712184, ACTIVE 595 sec
5 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 3
MySQL thread id 18281, OS thread handle 0x2b9cf468d700, query id 2306856 172.31.8.174 meclub2359 cleaning up
Trx read view will not see trx with id >= 712240, sees < 712141
---TRANSACTION 712141, ACTIVE 935 sec
7 lock struct(s), heap size 1184, 12 row lock(s), undo log entries 19
MySQL thread id 18279, OS thread handle 0x2b9cf4485700, query id 2306383 172.31.8.174 meclub2359 cleaning up
Trx read view will not see trx with id >= 712146, sees < 712146
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
562 OS file reads, 295536 OS file writes, 202242 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.24 writes/s, 0.18 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 1241027, node heap has 2 buffer(s)
0.06 hash searches/s, 0.27 non-hash searches/s
---
LOG
---
Log sequence number 87135734
Log flushed up to   87135734
Pages flushed up to 87135734
Last checkpoint at  87135734
0 pending log writes, 0 pending chkp writes
133514 log i/o's done, 0.09 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 641744896; in additional pool allocated 0
Dictionary memory allocated 264646
Buffer pool size   38271
Free buffers       20250
Database pages     18019
Old database pages 6631
Modified db pages  0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 423, not young 82
0.00 youngs/s, 0.00 non-youngs/s
Pages read 521, created 17502, written 131006
0.00 reads/s, 0.00 creates/s, 0.12 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 18019, unzip_LRU len: 0
I/O sum[4]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
4 read views open inside InnoDB
Main thread process no. 14465, id 47953069967104, state: sleeping
Number of rows inserted 55739, updated 14635, deleted 309, read 1463461
0.00 inserts/s, 0.03 updates/s, 0.00 deletes/s, 35.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

看起来有一些查询挂起了吗?运行show processlist命令查看。

enter image description here

如果是这样,我该如何调试我的应用程序以找出原因?我正在使用AWS RDS、Elastic Beanstalk、NodeJS/SailsJS。 更新 我发现这个问题只发生在一个行上。当我尝试UPDATE stardate SET winnersSelected=1 WHERE id='6'时。 更新 我发现关闭连接的代码更改有效。但我仍然想知道,如何从上面的输出中确定原因?我认为这将是有用的。我不确定这是否是原因。
例如,我看到
---TRANSACTION 0, not started
MySQL thread id 18285, OS thread handle 0x2b9cf4444700, query id 2306954 bb116-14-164-108.singnet.com.sg 116.14.164.108 meclub2359 cleaning up

---TRANSACTION 712235, not started
MySQL thread id 18268, OS thread handle 0x2b9cf3896700, query id 2306757 172.31.8.174 meclub2359 cleaning up

为什么清理工作似乎需要这么长时间?我应该担心吗?

---TRANSACTION 712269, ACTIVE 31 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 18288, OS thread handle 0x2b9cf44c6700, query id 2307092 172.31.8.174 meclub2359 updating
UPDATE stardate SET winnersSelected=1 WHERE id='6'
------- TRX HAS BEEN WAITING 31 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 4067 page no 3 n bits 72 index `PRIMARY` of table `ebdb`.`stardate` trx id 712269 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 16; compact format; info bits 0
 0: len 4; hex 00000006; asc     ;;
 1: len 6; hex 0000000addcd; asc       ;;
 2: len 7; hex 180000021d2d57; asc      -W;;
 3: len 30; hex 68747470733a2f2f73332d61702d736f757468656173742d312e616d617a; asc https://s3-ap-southeast-1.amaz; (total 62 bytes);
 4: len 30; hex 4d75736963616c204576656e696e67207769746820456c6c696520476f75; asc Musical Evening with Ellie Gou; (total 35 bytes);
 5: len 30; hex 4d75736963616c204576656e696e67207769746820456c6c696520476f75; asc Musical Evening with Ellie Gou; (total 35 bytes);
 6: len 30; hex 4d75736963616c204576656e696e67207769746820456c6c696520476f75; asc Musical Evening with Ellie Gou; (total 255 bytes);
 7: len 22; hex 53696e6761706f72650d0a4d6172696e612042617920; asc Singapore  Marina Bay ;;
 8: len 20; hex 52443230313530394d434d4543444153434d5346; asc RD201509MCMECDASCMSF;;
 9: len 4; hex 80000002; asc     ;;
 10: len 5; hex 9997295000; asc   )P ;;
 11: len 5; hex 999702a000; asc      ;;
 12: len 4; hex 8000000c; asc     ;;
 13: len 1; hex 81; asc  ;;
 14: len 5; hex 999706e345; asc     E;;
 15: len 5; hex 99972ee820; asc   .  ;;

我可以从上面的输出中得知查询 UPDATE stardate SET winnersSelected=1 WHERE id='6' 被阻塞了吗?但是我怎么知道被什么阻塞了呢?

你可以尝试增加锁等待超时时间。 - Abhishek Ginani
@Code-Monk,我并不认为超时有所帮助,似乎更像是由其他正在睡眠的进程持有的锁?但我不确定,因为我不太理解如何解读这些信息。 - Jiew Meng
1
您可以将wait_timeout设置得更低,以关闭休眠的进程连接。该变量的默认值为8小时。 - Abhishek Ginani
@Code-Monk。我注意到这只发生在我尝试更新一个特定的行时。此外,我已经重新启动了数据库和应用服务器,因此所有连接都应该关闭。 - Jiew Meng
如果只发生在一行上,并且总是发生在该行上,则假定该行已损坏。尝试修复表格,看看是否可以解决问题。如果您无法在修复运行时将数据库脱机,则应从上次备份中复制表格,并查看新表格中是否存在错误。您还可以尝试删除该行并将其作为另一行添加回来。PS>请参阅http://dba.stackexchange.com/questions/17788/cannot-update-certain-rows-in-innodb-tables和http://dba.stackexchange.com/questions/3223/will-these-two-queries-result-in-a-deadlock-if-executed-in-sequence。 - David Soussan
清理过程太长时间不正常...你检查了慢查询日志吗?好像有些查询花费了太长时间,因此innodb在等待其完成以清理日志,从而以某种方式锁定更新查询...使用innotop观察事务和锁可能也会有所帮助... - madpoet
2个回答

2

当处理复杂的MySQL诊断问题时,很多时候需要使用gdb。您可以像这样做:

gdb -ex "set pagination 0" -ex "thread apply all bt" \
  --batch -p $(pidof mysqld)

然后你需要查看堆栈跟踪列表,通过了解源代码,你可以找出发生了什么。 "清理" 长时间的一个可能原因是某个关键锁被另一个线程占用,而且可能有多种原因。如果你粘贴堆栈跟踪信息,我就能够给你一个大概的想法。InnoDB 中的死锁实际上是一种合法情况。它是以这样一种方式设计的,即在某些情况下(希望很少),锁定算法会发现自己处于死锁状态。锁等待超时可以缓解该问题 - 达到超时时间时,事务将被中止以退出死锁状态。 理论上,理想的 InnoDB 客户端代码应编写为在由于死锁而失败的所有事务都应重试。 但在实践中,要使大型代码库符合此原则可能非常困难,但我仍建议选择一些关键战斗并重新编写最麻烦的代码以符合该原则。 为避免死锁,请频繁提交事务并避免以不可预测的顺序更新记录。当然,这还包括适当清理连接 - 如你在本例中所观察到的那样。如果不关闭连接,你可能会遇到看似无害的情况,如下所示: BEGIN SELECT 长时间没有活动 问题在于事务锁定了使用 SELECT 读取的行,并且在 COMMIT/ROLLBACK 之前不会解锁这些行。 如果你有很多难以清理的混乱代码,可以考虑不再使用连接池。MySQL 的会话启动开销非常低,并且可以很好地处理频繁的重新连接。这样,不活动的连接可能会自动在应用程序请求终止时关闭(我假设某些关于代码的事情在大量应用程序中是共同的),你就不会有处于打开状态的事务的挂起连接被返回到连接池中。

1

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接