当前位置:网站首页>Failure Analysis | A SELECT statement crashes MySQL, what happened?
Failure Analysis | A SELECT statement crashes MySQL, what happened?
2022-08-02 11:30:00 【Aikesheng open source community】
作者:刘开洋
爱可生交付服务团队北京 DBA,对数据库及周边技术有浓厚的学习兴趣,喜欢看书,追求技术.
本文来源:原创投稿
*爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源.
In the troubleshooting of many difficult problems,I came across another one recently select statement execution will result MySQL 崩溃的问题,特来分享给大家.
Look at the error first
一般来讲,As long as the database crashes,Then the error log will definitely leave clues,Let's look at the specific error first:
06:08:23 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f55ac0008c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f56f4074d80 thread_stack 0x46000
/usr/local/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x1f1b71e]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x323) [0xfcfac3]
/lib64/libpthread.so.0(+0xf630) [0x7f5c28c85630]
/usr/local/mysql/bin/mysqld(actual_key_parts(KEY const*)+0xa) [0xef55ca]
/usr/local/mysql/bin/mysqld(calculate_key_len(TABLE*, unsigned int, unsigned long)+0x28) [0x10da428]
/usr/local/mysql/bin/mysqld(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x261) [0x10dac51]
/usr/local/mysql/bin/mysqld(check_unique_constraint(TABLE*)+0xa3) [0xe620e3]
/usr/local/mysql/bin/mysqld(do_sj_dups_weedout(THD*, SJ_TMP_TABLE*)+0x111) [0xe62361]
/usr/local/mysql/bin/mysqld(WeedoutIterator::Read()+0xa9) [0x1084cd9]
/usr/local/mysql/bin/mysqld(MaterializeIterator::MaterializeQueryBlock(MaterializeIterator::QueryBlock const&, unsigned long long*)+0x17c) [0x10898bc]
/usr/local/mysql/bin/mysqld(MaterializeIterator::Init()+0x1e1) [0x108a021]
/usr/local/mysql/bin/mysqld(SELECT_LEX_UNIT::ExecuteIteratorQuery(THD*)+0x251) [0xf5d241]
/usr/local/mysql/bin/mysqld(SELECT_LEX_UNIT::execute(THD*)+0xf9) [0xf5f3f9]
/usr/local/mysql/bin/mysqld(Sql_cmd_dml::execute_inner(THD*)+0x20b) [0xeedf8b]
/usr/local/mysql/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x3e8) [0xef7418]
/usr/local/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0x39c9) [0xeab3a9]
/usr/local/mysql/bin/mysqld(mysql_parse(THD*, Parser_state*)+0x31c) [0xead0cc]
/usr/local/mysql/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x156b) [0xeaeb6b]
/usr/local/mysql/bin/mysqld(do_command(THD*)+0x174) [0xeb0104]
/usr/local/mysql/bin/mysqld() [0xfc1a08]
/usr/local/mysql/bin/mysqld() [0x23ffdec]
/lib64/libpthread.so.0(+0x7ea5) [0x7f5c28c7dea5]
/lib64/libc.so.6(clone+0x6d) [0x7f5c26db9b0d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f55ac0ca298): SELECT DISTINCT T.CUST_NO FROM testDB.TABLE_TRANSACTION T WHERE EXISTS (SELECT 1 FROM testDB.Table1 T1 WHERE T.CUST_NO = T1.CUST_NO ) AND T.AGENT_CERT_NO IS NOT NULL
Connection ID (thread ID): 65
Status: NOT_KILLED
From the output of the above error log can find a few more obvious information:
1、导致崩溃的 SQL 语句为:SELECT DISTINCT T.CUST_NO FROM testDB.TABLE_TRANSACTION T WHERE EXISTS (SELECT 1 FROM testDB.Table1 T1 WHERE T.CUST_NO = T1.CUST_NO ) AND T.AGENT_CERT_NO IS NOT NULL
2、The signal from the database is signal 11 ,即是 MySQL An incorrect memory address was accessed.
分析过程
1、查看 OS logs and system resource usage:
OS The output of the log has no effect on the troubleshooting direction,无 MySQL OOM 的现象.
Check out monitoring at MySQL The crash period does not have any abnormal output,And can be executed in the environment at any time select Trigger the database crash .
2、Get the full one from the business side SQL and table structure information.
# 完整的SQL语句:
SELECT 'testPA' AS INDIC_KEY, A.CUST_NO AS OBJ_KEY,
CASE WHEN B.CUST_NO IS NULL THEN 1 ELSE END AS INDICVAL1,'2222-06-06' AS GRADING_DATE
FROM testDB.Table1 A
LEFT JOIN (
SELECT DISTINCT T.CUST_NO
FROM testDB.TABLE_TRANSACTION T
WHERE
EXISTS (SELECT 1 FROM testDB.Table1 T1 WHERE T.CUST_NO = T1.CUST_NO)
AND T.AGENT_CERT_NO IS NOT NULL
) B ON A.CUST_NO = B.CUST_NO;
# 表结构
CREATE TABLE `TABLE_TRANSACTION` (
`cert_key` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`cust_no` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
···
CREATE TABLE `Table1` (
`CUST_NO` varchar(32) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
···
3、查看该 select 语句的执行计划
4、堆栈分析
Through the stack you can see what the optimizer will do EXISTS The subquery is transformed into semi-join
操作,Since the optimizer is selected by default DuplicateWeedout
执行策略,Therefore, the deduplication operation of the outer query record will be realized by establishing a temporary table.
The execution process can be verified through the execution plan:执行计划的 Extra The columns will drive the table display Start temporary
提示,The driven table will be displayed End temporary
提示.
5、Stack problem point output:
(/usr/local/mysql/bin/mysqld(actual_key_parts(KEY const*)+0xa) [0xef55ca]
The stack is at memory address 0xef55ca
The location collapsed,This address is passable gdb The corresponding code bits are obtained by analysis:
#4 0x0000000000ef55ce in actual_key_parts (key_info=0x7fd5241641b0) at ../../mysql-8.0.19/sql/sql_class.h:1487
sql_class.h:1487
源码地址为:
通过 inline
得知 optimizer_switch_flag
函数为 actual_key_parts
的内联调用,找到 actual_key_parts
函数的位置:
6、使用 gdb 进行调试:
6.1. 使用 gdb 的 frame 下推4layer stack to actual_key_parts
函数:
6.2. 打印 actual_key_parts
The memory address corresponding to the pointer returned by the function
6.3. 发现在使用 in_use
时返回值为空,出现 0x0
,说明 table Memory addressing error.
6.4. 由于 in_use
返回为空,在调用 in_use
后面的代码 optimizer_switch_flag
When an illegal address appears,导致数据库的 crash .
得出结论
经过分析,It is currently determined that the problem is MySQL The function memory address is messed up bug ,After polling for higher version code,发现 8.0.24 The above code has been revised since the version:
以下为该 bug 的相关描述:https://github.com/mysql/mysql-server/commit/7fde9072e1f62b1b3cf857757a3be41cec5c8e48
解决方案
in the above analysis,我们得到这个 bug 是由于使用了 semi-join
的 DuplicateWeedout
Executing the policy caused the problem to occur,If the change database cannot be upgraded within a short period of time,And I want to avoid this problem as much as possible.
On the one hand, it must be avoided by the business side SQL 的执行,从 DBA The point of view is to consider this SQL How can it be executed normally,Well verified:
The following three solutions can solve the current problem select Database crashes caused by queries.
1、Set a reasonable and uniform character set for business tables(utf8mb4)和排序规则,避免existUsed in semi-join DuplicateWeedout
策略,加快 SQL 执行效率;
2、Turn off database level DuplicateWeedout
优化策略:
SET [GLOBAL|SESSION] optimizer_switch='duplicateweedout=off';
3、升级 MySQL 版本到 8.0.24 ;
其他解决思路
1、Retrieve relevant stack codes directly on platforms like Google,查找 MySQL 类似的 bug ,Then correlate in the repaired version SQL 的验证,确认该 bug The corresponding ones have been fixed,Complete troubleshooting.
2、数据库开启 coredump Complete secondary verification of the stack.
特别鸣谢:爱可生 CTO 黄炎 先生
边栏推荐
猜你喜欢
How to technically ensure the quality of LED display?
Three.JS程序化建模入门
【Acunetix-Forgot your password】
一体化在线政务服务平台,小程序容器技术加速建设步伐
SQL 经典50题(题目+解答)(1)
C#/VB.NET 添加多行多列图片水印到Word文档
ssm web page access database data error
基于threejs的商品VR展示平台的设计与实现思路
打破千篇一律,DIY属于自己独一无二的商城
Nanny Level Tutorial: Write Your Own Mobile Apps and Mini Programs (Part 2)
随机推荐
Jest 测试框架 beforeEach 的设计原理解析
博云入选Gartner中国DevOps代表厂商
MySQL百万数据优化总结 一
注意力机制
How to technically ensure the quality of LED display?
Problem solving in the process of using mosquitto
X86函数调用模型分析
Oracle超全SQL,细节狂魔
小程序插件的生态丰富,加速开发建设效率
一体化在线政务服务平台,小程序容器技术加速建设步伐
字母交换--字符串dp
C#/VB.NET to add more lines more columns image watermark into the Word document
C#为listview选中的项添加右键菜单
npm WARN config global `--global`, `--local` are deprecated. Use `--location解决方案
npm WARN deprecated [email protected] This version of tar is no longer supported, and will not receive
外包学生管理系统架构文档
npm run serve启动报错npm ERR Missing script “serve“
数字化转型中的低代码
Failed to configure mysql, what's going on?
Idea 全局搜索(idea如何全局搜索关键字)