Failure Analysis | A SELECT statement crashes MySQL, what happened?

2022-08-02 11:30:00 Aikesheng open source community

Author: Liu Kaiyang

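A Beijing-based DBA on the Aikesheng delivery service team, with a strong interest in databases and the technologies around them; enjoys reading and pursues technical depth.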

Source: original submission.

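*Produced by the Aikesheng open source community. Original content may not be used without authorization; to republish, please contact the editor and cite the source.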

Among the many hard problems I have troubleshot, I recently ran into another one: executing a SELECT statement crashes MySQL. I'd like to share it here.

First, look at the error

Generally speaking, whenever the database crashes, the error log will leave clues. Let's look at the specific error first:

06:08:23 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f55ac0008c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f56f4074d80 thread_stack 0x46000
/usr/local/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x1f1b71e]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x323) [0xfcfac3]
/lib64/libpthread.so.0(+0xf630) [0x7f5c28c85630]
/usr/local/mysql/bin/mysqld(actual_key_parts(KEY const*)+0xa) [0xef55ca]
/usr/local/mysql/bin/mysqld(calculate_key_len(TABLE*, unsigned int, unsigned long)+0x28) [0x10da428]
/usr/local/mysql/bin/mysqld(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x261) [0x10dac51]
/usr/local/mysql/bin/mysqld(check_unique_constraint(TABLE*)+0xa3) [0xe620e3]
/usr/local/mysql/bin/mysqld(do_sj_dups_weedout(THD*, SJ_TMP_TABLE*)+0x111) [0xe62361]
/usr/local/mysql/bin/mysqld(WeedoutIterator::Read()+0xa9) [0x1084cd9]
/usr/local/mysql/bin/mysqld(MaterializeIterator::MaterializeQueryBlock(MaterializeIterator::QueryBlock const&, unsigned long long*)+0x17c) [0x10898bc]
/usr/local/mysql/bin/mysqld(MaterializeIterator::Init()+0x1e1) [0x108a021]
/usr/local/mysql/bin/mysqld(SELECT_LEX_UNIT::ExecuteIteratorQuery(THD*)+0x251) [0xf5d241]
/usr/local/mysql/bin/mysqld(SELECT_LEX_UNIT::execute(THD*)+0xf9) [0xf5f3f9]
/usr/local/mysql/bin/mysqld(Sql_cmd_dml::execute_inner(THD*)+0x20b) [0xeedf8b]
/usr/local/mysql/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x3e8) [0xef7418]
/usr/local/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0x39c9) [0xeab3a9]
/usr/local/mysql/bin/mysqld(mysql_parse(THD*, Parser_state*)+0x31c) [0xead0cc]
/usr/local/mysql/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x156b) [0xeaeb6b]
/usr/local/mysql/bin/mysqld(do_command(THD*)+0x174) [0xeb0104]
/usr/local/mysql/bin/mysqld() [0xfc1a08]
/usr/local/mysql/bin/mysqld() [0x23ffdec]
/lib64/libpthread.so.0(+0x7ea5) [0x7f5c28c7dea5]
/lib64/libc.so.6(clone+0x6d) [0x7f5c26db9b0d]
  
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f55ac0ca298): SELECT DISTINCT T.CUST_NO FROM testDB.TABLE_TRANSACTION T  WHERE EXISTS (SELECT 1  FROM testDB.Table1 T1 WHERE T.CUST_NO = T1.CUST_NO ) AND T.AGENT_CERT_NO IS NOT NULL
Connection ID (thread ID): 65
Status: NOT_KILLED

The error log output above reveals a few fairly obvious pieces of information:

1. The SQL statement that caused the crash is: SELECT DISTINCT T.CUST_NO FROM testDB.TABLE_TRANSACTION T WHERE EXISTS (SELECT 1 FROM testDB.Table1 T1 WHERE T.CUST_NO = T1.CUST_NO ) AND T.AGENT_CERT_NO IS NOT NULL

2. The database received signal 11 (SIGSEGV), which means MySQL accessed an invalid memory address.
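When many crash reports need triage, these facts (plus the connection ID) can be extracted from the error log mechanically. The following is a minimal Python sketch that assumes the log format shown in this article (8.0.19); the function name `parse_crash_log` is hypothetical, and other MySQL versions may word these lines differently.

```python
import re


def parse_crash_log(log_text: str) -> dict:
    """Extract the signal number, the logged query and the connection ID from a mysqld crash report."""
    sig = re.search(r"mysqld got signal (\d+)", log_text)
    # The hex value in parentheses is a buffer address, e.g. "Query (7f55ac0ca298): SELECT ..."
    query = re.search(r"^Query \([0-9a-fA-F]+\): (.*)$", log_text, re.MULTILINE)
    conn = re.search(r"Connection ID \(thread ID\): (\d+)", log_text)
    return {
        "signal": int(sig.group(1)) if sig else None,
        "query": query.group(1).strip() if query else None,
        "connection_id": int(conn.group(1)) if conn else None,
    }
```

Fields that are missing from the log come back as `None` rather than raising, so partial logs (for example, when the dump aborts early) can still be processed.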

Analysis process

1. Check the OS logs and system resource usage:

The OS logs gave no useful lead for troubleshooting, and there was no sign of MySQL being killed by OOM.

Monitoring showed nothing abnormal during the period when MySQL crashed, and in this environment the crash could be triggered at any time by executing the SELECT.

2. Obtain the complete SQL and the table structures from the business side.

# Complete SQL statement:
SELECT 'testPA' AS INDIC_KEY, A.CUST_NO AS OBJ_KEY,
  CASE WHEN B.CUST_NO IS NULL THEN 1 ELSE  END AS INDICVAL1,'2222-06-06' AS GRADING_DATE
FROM testDB.Table1 A
LEFT JOIN (
  SELECT DISTINCT T.CUST_NO
  FROM testDB.TABLE_TRANSACTION T
  WHERE
    EXISTS (SELECT 1 FROM testDB.Table1 T1 WHERE T.CUST_NO = T1.CUST_NO)
  AND T.AGENT_CERT_NO IS NOT NULL
) B ON A.CUST_NO = B.CUST_NO;
  
# Table structures
CREATE TABLE `TABLE_TRANSACTION` (
  `cert_key` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `cust_no` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
 ···
CREATE TABLE `Table1` (
  `CUST_NO` varchar(32) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
 ···

3. Check the execution plan of this SELECT statement

4. Stack analysis

The stack shows that the optimizer transformed the EXISTS subquery into a semi-join. Because the optimizer chose the DuplicateWeedout execution strategy by default, duplicate outer-query rows are removed by means of a temporary table.
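To make the DuplicateWeedout idea concrete, here is a conceptual Python model of running a semi-join as an ordinary join whose duplicate outer rows are filtered by a unique check on the outer row ID. It is not MySQL's implementation: the `set` stands in for the temporary table (whose unique lookup corresponds to the `check_unique_constraint` frame in the stack), and all names are illustrative.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # (rowid, cust_no) of an outer-table row


def semijoin_with_weedout(outer_rows: Iterable[Row], inner_keys: Iterable[str]) -> List[Row]:
    """Evaluate: outer WHERE EXISTS (SELECT 1 FROM inner WHERE outer.cust_no = inner.cust_no)
    as a plain nested-loop join followed by duplicate weedout on the outer rowid."""
    inner = list(inner_keys)
    seen_rowids = set()  # stands in for the weedout temporary table (unique key on rowid)
    result: List[Row] = []
    for rowid, cust_no in outer_rows:
        for inner_cust_no in inner:
            if cust_no != inner_cust_no:
                continue
            # The join yields this outer row once per inner match;
            # the unique check lets only the first copy through.
            if rowid not in seen_rowids:
                seen_rowids.add(rowid)
                result.append((rowid, cust_no))
    return result
```

An executor could instead stop the inner loop after the first match (that is the FirstMatch strategy); weedout lets the join run in any order and removes the duplicates afterwards.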

The execution plan confirms this: in the Extra column, the driving table shows a Start temporary note and the driven table shows an End temporary note.
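If the plan is captured as rows (for example, from a client library that returns traditional EXPLAIN output as dictionaries), the weedout range can be located by scanning the Extra column. This is a sketch under that assumption; the row layout and the sample values in the usage below are hypothetical, not this case's actual plan.

```python
from typing import Dict, List, Optional, Tuple


def weedout_range(explain_rows: List[Dict[str, Optional[str]]]) -> Optional[Tuple[str, str]]:
    """Return (first_table, last_table) of a DuplicateWeedout range from traditional EXPLAIN rows,
    or None if the plan has no Start/End temporary markers."""
    start: Optional[str] = None
    end: Optional[str] = None
    for row in explain_rows:
        extra = row.get("Extra") or ""  # Extra may be NULL / None
        if start is None and "Start temporary" in extra:
            start = row["table"]
        if "End temporary" in extra:
            end = row["table"]
    if start is not None and end is not None:
        return (start, end)
    return None


# Hypothetical plan rows, shaped like the pattern described above.
plan = [
    {"table": "T", "Extra": "Using where; Start temporary"},
    {"table": "T1", "Extra": "Using where; End temporary"},
]
print(weedout_range(plan))  # ('T', 'T1')
```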

5. The problematic frame in the stack:

/usr/local/mysql/bin/mysqld(actual_key_parts(KEY const*)+0xa) [0xef55ca]

The crash occurred at memory address 0xef55ca. Analyzing this address with gdb gives the corresponding code location:

#4 0x0000000000ef55ce in actual_key_parts (key_info=0x7fd5241641b0) at ../../mysql-8.0.19/sql/sql_class.h:1487
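The `+0xa` in the log frame is an offset from the start of `actual_key_parts`, so the logged address minus the offset gives the function's entry point. The Python sketch below parses a frame line and builds, without running, an `addr2line` command (binutils: `-e` names the binary, `-f` prints function names, `-C` demangles, `-i` unwinds inlined calls). It assumes a non-PIE `mysqld` with debug symbols, which the small absolute addresses in this log appear to indicate; the helper names are hypothetical.

```python
import re
from typing import List, NamedTuple

# Matches lines like: /usr/local/mysql/bin/mysqld(actual_key_parts(KEY const*)+0xa) [0xef55ca]
FRAME_RE = re.compile(
    r"^(?P<binary>[^(]+)\((?P<func>.+)\+0x(?P<off>[0-9a-f]+)\) \[0x(?P<addr>[0-9a-f]+)\]$"
)


class Frame(NamedTuple):
    binary: str
    function: str
    offset: int
    address: int

    @property
    def function_start(self) -> int:
        # logged address = function entry point + offset
        return self.address - self.offset


def parse_frame(line: str) -> Frame:
    m = FRAME_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a symbolized frame: {line!r}")
    return Frame(m.group("binary"), m.group("func"),
                 int(m.group("off"), 16), int(m.group("addr"), 16))


def addr2line_cmd(frame: Frame) -> List[str]:
    """Build (but do not run) an addr2line call that resolves the address to function and file:line."""
    return ["addr2line", "-e", frame.binary, "-f", "-C", "-i", hex(frame.address)]
```

For `actual_key_parts`, 0xef55ca - 0xa gives an entry point of 0xef55c0. Inside gdb, `info line *0xef55ca` or `list *0xef55ca` answers the same question.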

The source code at sql_class.h:1487 is:

The inline information shows that optimizer_switch_flag is an inlined call inside actual_key_parts. Locating the actual_key_parts function:

6. Debug with gdb:

6.1. Use gdb's frame command to move down the stack to frame #4, actual_key_parts:

6.2. Print the memory addresses of the pointers that actual_key_parts dereferences.

6.3. in_use turns out to be null (0x0), which indicates a memory-addressing error on table.

6.4. Because in_use is null, the code that follows it, optimizer_switch_flag, accesses an illegal address, and the database crashes.
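The failure chain in 6.3 and 6.4 can be mimicked with a simplified Python model: `key_info` leads to `table`, `table.in_use` should point at the session, and the optimizer-switch lookup is made on that session. When `in_use` is `None`, the attribute access fails, which is the Python counterpart of the SIGSEGV. The classes are stand-ins, and the branch on an index-extensions flag is an assumption made for illustration, not a copy of the 8.0.19 source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Stand-in for THD; holds one optimizer switch, for illustration only."""
    index_extensions_on: bool = True

    def optimizer_switch_flag(self) -> bool:
        return self.index_extensions_on


@dataclass
class Table:
    """Stand-in for TABLE; in_use should point at the session using the table."""
    in_use: Optional[Session]


@dataclass
class Key:
    """Stand-in for KEY."""
    table: Table
    actual_key_parts: int
    user_defined_key_parts: int


def actual_key_parts(key_info: Key) -> int:
    # key_info -> table -> in_use -> optimizer_switch_flag(): with in_use == None the
    # attribute lookup raises AttributeError, where the C++ code dereferences a null pointer.
    if key_info.table.in_use.optimizer_switch_flag():
        return key_info.actual_key_parts
    return key_info.user_defined_key_parts
```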

Conclusion

After this analysis, we determined that the problem is a MySQL bug in which a function ends up using a bad memory address. Checking the code of later versions showed that this code was revised as of version 8.0.24.

The commit related to this bug: https://github.com/mysql/mysql-server/commit/7fde9072e1f62b1b3cf857757a3be41cec5c8e48
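When checking many instances for exposure, the version comparison is easy to script. This sketch assumes that any 8.0.x release below 8.0.24 should be treated as potentially affected; this case only confirms 8.0.19, so the threshold is a conservative assumption. Other release series are simply not flagged, which says nothing about whether they are safe. `needs_fix` is a hypothetical helper.

```python
from typing import Tuple

# 8.0.24: the release where the article found the revised code.
FIXED_IN: Tuple[int, int, int] = (8, 0, 24)


def parse_version(version: str) -> Tuple[int, int, int]:
    """'8.0.19-log' -> (8, 0, 19); anything after the first '-' is ignored."""
    core = version.strip().split("-", 1)[0]
    parts = [int(p) for p in core.split(".")[:3]]
    major, minor, patch = (parts + [0, 0, 0])[:3]
    return major, minor, patch


def needs_fix(version: str) -> bool:
    """Conservatively flag 8.0.x releases older than 8.0.24."""
    v = parse_version(version)
    return v[:2] == (8, 0) and v < FIXED_IN
```

The input is the string returned by `SELECT VERSION()`.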

Solution

The analysis above shows that this bug is triggered by the semi-join DuplicateWeedout execution strategy. If the database cannot be upgraded in the short term, we still want to avoid the problem as far as possible.

On one hand, the business side should avoid executing this SQL; on the other hand, from the DBA's point of view, we considered how this SQL could be made to execute normally, and verified the options:

Each of the following three solutions prevents the database crash caused by this SELECT query.

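1. Give the business tables a consistent character set (utf8mb4) and collation (here TABLE_TRANSACTION.cust_no is utf8mb4 while Table1.CUST_NO is utf8), so that the EXISTS subquery no longer uses the semi-join DuplicateWeedout strategy; this also speeds up SQL execution;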

2. Turn off the DuplicateWeedout optimization strategy at the database level:

SET [GLOBAL|SESSION] optimizer_switch='duplicateweedout=off';

3. Upgrade MySQL to 8.0.24 or later.
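For workaround 2, it helps to confirm what the server is currently doing. The Python sketch below parses the `name=on|off` list returned by `SELECT @@optimizer_switch` and reports whether DuplicateWeedout is eligible, which requires both `semijoin` and `duplicateweedout` to be on. Assigning only `duplicateweedout=off` leaves the other flags unchanged, which is why the SET statement in item 2 names a single flag. The function names are hypothetical.

```python
from typing import Dict


def parse_optimizer_switch(value: str) -> Dict[str, bool]:
    """Parse the 'name=on,name=off,...' string returned by SELECT @@optimizer_switch."""
    flags: Dict[str, bool] = {}
    for item in value.split(","):
        name, sep, state = item.strip().partition("=")
        if sep:
            flags[name] = state.strip() == "on"
    return flags


def weedout_possible(value: str) -> bool:
    """DuplicateWeedout is a semi-join strategy, so both flags must be on for it to be eligible."""
    flags = parse_optimizer_switch(value)
    return flags.get("semijoin", False) and flags.get("duplicateweedout", False)


def disable_weedout_sql(scope: str = "SESSION") -> str:
    """Build the workaround statement from item 2 for the given scope."""
    scope = scope.upper()
    if scope not in ("GLOBAL", "SESSION"):
        raise ValueError("scope must be GLOBAL or SESSION")
    return f"SET {scope} optimizer_switch='duplicateweedout=off'"
```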

Other approaches

1. Search for the relevant stack frames directly on Google or similar platforms to find similar MySQL bugs, then verify the SQL against the fixed version to confirm that the bug has been fixed, which completes the troubleshooting.

2. Enable coredump on the database and use the core file to verify the stack a second time.
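For approach 1, searches work better on function names than on raw addresses, because addresses and offsets change from build to build. A minimal Python sketch of building such a search signature from the error log follows; the default depth of 5 and the helper name are arbitrary choices, not a documented convention.

```python
import re
from typing import List

# Named mysqld frames, e.g. "mysqld(handler::ha_index_read_map(unsigned char*, ...)+0x261)";
# the capture stops at the first '(' or '+', dropping arguments, offset and address.
FUNC_RE = re.compile(r"mysqld\((?P<func>[A-Za-z_][\w:~]*)")
# Frames from mysqld's own crash handler rather than from the faulting code path.
HANDLER_FRAMES = {"my_print_stacktrace", "handle_fatal_signal"}


def stack_signature(log_text: str, depth: int = 5) -> List[str]:
    """Return the first `depth` named mysqld frames below the signal handler."""
    names = [m.group("func") for m in FUNC_RE.finditer(log_text)]
    return [n for n in names if n not in HANDLER_FRAMES][:depth]
```

Joining the result with spaces (here, `actual_key_parts calculate_key_len handler::ha_index_read_map ...`) gives a query suitable for a search engine or a bug tracker.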

Special thanks: Mr. Huang Yan, CTO of Aikesheng


Copyright notice
This article was created by [Aikesheng open source community]. Please include a link to the original when republishing. Thanks.
https://yzsam.com/2022/214/202208021118115582.html