当前位置:网站首页>Failure Analysis | A SELECT statement crashes MySQL, what happened?
Failure Analysis | A SELECT statement crashes MySQL, what happened?
2022-08-02 11:30:00 【Aikesheng open source community】
作者:刘开洋
爱可生交付服务团队北京 DBA,对数据库及周边技术有浓厚的学习兴趣,喜欢看书,追求技术.
本文来源:原创投稿
*爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源.
In the troubleshooting of many difficult problems,I came across another one recently select statement execution will result MySQL 崩溃的问题,特来分享给大家.
Look at the error first
一般来讲,As long as the database crashes,Then the error log will definitely leave clues,Let's look at the specific error first:
06:08:23 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f55ac0008c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f56f4074d80 thread_stack 0x46000
/usr/local/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x1f1b71e]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x323) [0xfcfac3]
/lib64/libpthread.so.0(+0xf630) [0x7f5c28c85630]
/usr/local/mysql/bin/mysqld(actual_key_parts(KEY const*)+0xa) [0xef55ca]
/usr/local/mysql/bin/mysqld(calculate_key_len(TABLE*, unsigned int, unsigned long)+0x28) [0x10da428]
/usr/local/mysql/bin/mysqld(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x261) [0x10dac51]
/usr/local/mysql/bin/mysqld(check_unique_constraint(TABLE*)+0xa3) [0xe620e3]
/usr/local/mysql/bin/mysqld(do_sj_dups_weedout(THD*, SJ_TMP_TABLE*)+0x111) [0xe62361]
/usr/local/mysql/bin/mysqld(WeedoutIterator::Read()+0xa9) [0x1084cd9]
/usr/local/mysql/bin/mysqld(MaterializeIterator::MaterializeQueryBlock(MaterializeIterator::QueryBlock const&, unsigned long long*)+0x17c) [0x10898bc]
/usr/local/mysql/bin/mysqld(MaterializeIterator::Init()+0x1e1) [0x108a021]
/usr/local/mysql/bin/mysqld(SELECT_LEX_UNIT::ExecuteIteratorQuery(THD*)+0x251) [0xf5d241]
/usr/local/mysql/bin/mysqld(SELECT_LEX_UNIT::execute(THD*)+0xf9) [0xf5f3f9]
/usr/local/mysql/bin/mysqld(Sql_cmd_dml::execute_inner(THD*)+0x20b) [0xeedf8b]
/usr/local/mysql/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x3e8) [0xef7418]
/usr/local/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0x39c9) [0xeab3a9]
/usr/local/mysql/bin/mysqld(mysql_parse(THD*, Parser_state*)+0x31c) [0xead0cc]
/usr/local/mysql/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x156b) [0xeaeb6b]
/usr/local/mysql/bin/mysqld(do_command(THD*)+0x174) [0xeb0104]
/usr/local/mysql/bin/mysqld() [0xfc1a08]
/usr/local/mysql/bin/mysqld() [0x23ffdec]
/lib64/libpthread.so.0(+0x7ea5) [0x7f5c28c7dea5]
/lib64/libc.so.6(clone+0x6d) [0x7f5c26db9b0d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f55ac0ca298): SELECT DISTINCT T.CUST_NO FROM testDB.TABLE_TRANSACTION T WHERE EXISTS (SELECT 1 FROM testDB.Table1 T1 WHERE T.CUST_NO = T1.CUST_NO ) AND T.AGENT_CERT_NO IS NOT NULL
Connection ID (thread ID): 65
Status: NOT_KILLED
From the output of the above error log can find a few more obvious information:
1、导致崩溃的 SQL 语句为:SELECT DISTINCT T.CUST_NO FROM testDB.TABLE_TRANSACTION T WHERE EXISTS (SELECT 1 FROM testDB.Table1 T1 WHERE T.CUST_NO = T1.CUST_NO ) AND T.AGENT_CERT_NO IS NOT NULL
2、The signal from the database is signal 11 ,即是 MySQL An incorrect memory address was accessed.
分析过程
1、查看 OS logs and system resource usage:
OS The output of the log has no effect on the troubleshooting direction,无 MySQL OOM 的现象.
Check out monitoring at MySQL The crash period does not have any abnormal output,And can be executed in the environment at any time select Trigger the database crash .
2、Get the full one from the business side SQL and table structure information.
# 完整的SQL语句:
SELECT 'testPA' AS INDIC_KEY, A.CUST_NO AS OBJ_KEY,
CASE WHEN B.CUST_NO IS NULL THEN 1 ELSE END AS INDICVAL1,'2222-06-06' AS GRADING_DATE
FROM testDB.Table1 A
LEFT JOIN (
SELECT DISTINCT T.CUST_NO
FROM testDB.TABLE_TRANSACTION T
WHERE
EXISTS (SELECT 1 FROM testDB.Table1 T1 WHERE T.CUST_NO = T1.CUST_NO)
AND T.AGENT_CERT_NO IS NOT NULL
) B ON A.CUST_NO = B.CUST_NO;
# 表结构
CREATE TABLE `TABLE_TRANSACTION` (
`cert_key` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`cust_no` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
···
CREATE TABLE `Table1` (
`CUST_NO` varchar(32) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
···
3、查看该 select 语句的执行计划
4、堆栈分析
Through the stack you can see what the optimizer will do EXISTS The subquery is transformed into semi-join
操作,Since the optimizer is selected by default DuplicateWeedout
执行策略,Therefore, the deduplication operation of the outer query record will be realized by establishing a temporary table.
The execution process can be verified through the execution plan:执行计划的 Extra The columns will drive the table display Start temporary
提示,The driven table will be displayed End temporary
提示.
5、Stack problem point output:
(/usr/local/mysql/bin/mysqld(actual_key_parts(KEY const*)+0xa) [0xef55ca]
The stack is at memory address 0xef55ca
The location collapsed,This address is passable gdb The corresponding code bits are obtained by analysis:
#4 0x0000000000ef55ce in actual_key_parts (key_info=0x7fd5241641b0) at ../../mysql-8.0.19/sql/sql_class.h:1487
sql_class.h:1487
源码地址为:
通过 inline
得知 optimizer_switch_flag
函数为 actual_key_parts
的内联调用,找到 actual_key_parts
函数的位置:
6、使用 gdb 进行调试:
6.1. 使用 gdb 的 frame 下推4layer stack to actual_key_parts
函数:
6.2. 打印 actual_key_parts
The memory address corresponding to the pointer returned by the function
6.3. 发现在使用 in_use
时返回值为空,出现 0x0
,说明 table Memory addressing error.
6.4. 由于 in_use
返回为空,在调用 in_use
后面的代码 optimizer_switch_flag
When an illegal address appears,导致数据库的 crash .
得出结论
经过分析,It is currently determined that the problem is MySQL The function memory address is messed up bug ,After polling for higher version code,发现 8.0.24 The above code has been revised since the version:
以下为该 bug 的相关描述:https://github.com/mysql/mysql-server/commit/7fde9072e1f62b1b3cf857757a3be41cec5c8e48
解决方案
in the above analysis,我们得到这个 bug 是由于使用了 semi-join
的 DuplicateWeedout
Executing the policy caused the problem to occur,If the change database cannot be upgraded within a short period of time,And I want to avoid this problem as much as possible.
On the one hand, it must be avoided by the business side SQL 的执行,从 DBA The point of view is to consider this SQL How can it be executed normally,Well verified:
The following three solutions can solve the current problem select Database crashes caused by queries.
1、Set a reasonable and uniform character set for business tables(utf8mb4)和排序规则,避免existUsed in semi-join DuplicateWeedout
策略,加快 SQL 执行效率;
2、Turn off database level DuplicateWeedout
优化策略:
SET [GLOBAL|SESSION] optimizer_switch='duplicateweedout=off';
3、升级 MySQL 版本到 8.0.24 ;
其他解决思路
1、Retrieve relevant stack codes directly on platforms like Google,查找 MySQL 类似的 bug ,Then correlate in the repaired version SQL 的验证,确认该 bug The corresponding ones have been fixed,Complete troubleshooting.
2、数据库开启 coredump Complete secondary verification of the stack.
特别鸣谢:爱可生 CTO 黄炎 先生
边栏推荐
猜你喜欢
AlphaFold又放大招,剑指整个生物界!
从众多接口中脱颖而出的最稳定的接口——淘宝详情api
Three.JS程序化建模入门
When not to use () instead of Void in Swift
使用mosquitto过程中的问题解决
Challenge LeetCode1000 questions in 365 days - Day 047 Design Circular Queue Circular Queue
QT笔记——Q_PROPERTY了解
Camera Hal OEM模块 ---- cmr_snapshot.c
Create an application operation process using the kubesphere GUI
C#为listview选中的项添加右键菜单
随机推荐
When not to use () instead of Void in Swift
基于threejs的商品VR展示平台的设计与实现思路
ECCV22|PromptDet:无需手动标注,迈向开放词汇的目标检测
Geoffery Hinton: The Next Big Thing in Deep Learning
放苹果(暑假每日一题 13)
jacoco的学习以及理解
QT笔记——Q_PROPERTY了解
注意力机制
2022年8月初济南某外包公司全栈开发面试题整理
一体化在线政务服务平台,小程序容器技术加速建设步伐
AdguardHome如何配置设置?我的AdguardHome配置内容过滤器拦截列表
Breaking the Boundary, Huawei's Storage Journey
微信小程序---组件开发与使用
pyqt5连接MYSQL数据库问题
Oracle降低高水位
npm install报错npm ERR Could not resolve dependency npm ERR peer
jvmxmx和xms参数分析(设定优化校准)
关于#oracle#的问题,如何解决?
如何在技术上来保证LED显示屏质量?
Mysql事务隔离级别与MVCC(多版本并发控制)