TiCDC synchronization delay problem
2022-07-29 12:30:00 【InfoQ】
Author: seiang. Original source:
https://tidb.net/blog/b3ab96b6
Today I would like to share a TiCDC-to-MySQL replication delay problem that I ran into a few weeks ago. The troubleshooting took several twists and turns, and I hope the write-up is helpful.
(My knowledge is limited. If there are technical or descriptive errors in this article, please point them out. Thank you!)
Background
First, a brief introduction to the application scenario and the replication link of this TiCDC task, as shown in the figure below:

The table being replicated is stored in a TiDB cluster. It receives about 250 million records per day, which is about 80G of data per day. Because of storage space limits, historical data has to be cleaned up on a daily schedule. However, business and customer service staff still need to query that historical data, so it must be kept permanently.
For this reason, the table is replicated in real time from TiDB to MySQL through TiCDC, and the MySQL data is then synced to Hive for permanent storage.
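To put the daily volume in perspective, the quoted figures can be turned into rough averages. This is only a back-of-the-envelope sketch: it assumes the 250 million figure counts rows, reads "80G" as 80 GiB, and spreads the load evenly over the day, which understates the business peak.

```python
# Figures quoted in the article: about 250 million records and about 80G per day.
ROWS_PER_DAY = 250_000_000          # assumption: the figure counts rows
BYTES_PER_DAY = 80 * 1024**3        # assumption: "80G" read as 80 GiB
SECONDS_PER_DAY = 24 * 60 * 60

# Averages over a whole day; real traffic is bursty, so peaks are higher.
avg_rows_per_second = ROWS_PER_DAY / SECONDS_PER_DAY
avg_bytes_per_row = BYTES_PER_DAY / ROWS_PER_DAY
avg_mib_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1024**2

print(f"average rows per second: {avg_rows_per_second:.0f}")
print(f"average bytes per row:   {avg_bytes_per_row:.0f}")
print(f"average MiB per second:  {avg_mib_per_second:.2f}")
```

Under these assumptions the downstream MySQL has to absorb a few thousand row changes per second on average, so anything that makes each write more expensive can plausibly show up as replication delay.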
Analysis and resolution
The replication task had been running stably for more than a month without any problems. Starting at 2022-06-28 08:16, however, we began to receive continuous TiCDC replication delay alerts. The monitoring looked like this:

1. First, check whether the replication task was interrupted. The task state was normal, and both tso and checkpoint kept advancing, but the delay kept growing:
$ tiup ctl:v5.0.3 cdc changefeed list --pd=http://10.xx.xx.xx:2379
Starting component `ctl`: /home/tidb/.tiup/components/ctl/v5.0.3/ctl cdc changefeed list --pd=http://10.xx.xx.xx:2379
[
{
"id": "xx-xx-task",
"summary": {
"state": "normal",
"tso": 434212451741859960,
"checkpoint": "2022-06-28 09:04:12.360",
"error": null
}
}
]
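The `tso` value in this output can be decoded by hand. A TiDB TSO packs a physical timestamp in milliseconds into the high bits and an 18-bit logical counter into the low bits. The following is a minimal sketch of that decoding; the function names are illustrative and are not part of any TiCDC tool, and time zone handling is left to the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

LOGICAL_BITS = 18  # low 18 bits of a TiDB TSO hold the logical counter

def decode_tso(tso: int) -> Tuple[int, int]:
    """Split a TSO into (physical milliseconds since the Unix epoch, logical counter)."""
    physical_ms = tso >> LOGICAL_BITS
    logical = tso & ((1 << LOGICAL_BITS) - 1)
    return physical_ms, logical

def tso_to_utc(tso: int) -> datetime:
    """Convert the physical part of a TSO to a timezone-aware UTC datetime."""
    physical_ms, _ = decode_tso(tso)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=physical_ms)

# The tso value shown in the changefeed list output above.
print(decode_tso(434212451741859960))
print(tso_to_utc(434212451741859960).isoformat())
```

Comparing the decoded time with the current time, in the same time zone, gives the checkpoint lag without relying on the dashboard.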
2. In the monitoring, the Unified Sort on disk metric kept growing. Could the replication delay be caused by a large transaction?

After checking the CDC node logs and the TiDB Server node logs, and confirming with the business team, we found that the business side had made no changes and had seen no traffic surge during this period.
Here is a short explanation of some important TiCDC monitoring metrics:
- Changefeed checkpoint lag: the difference in progress between the upstream and the downstream of the replication task, measured in time
- Changefeed checkpoint: how far the replication task has synced to the downstream; under normal conditions the green bars should touch the yellow line
- Sink write duration: a histogram of the time TiCDC takes to write the changes of one transaction to the downstream
First round of troubleshooting: no related anomalies were found at the CDC or TiDB cluster level.
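The symptom seen in step 1, a checkpoint that keeps advancing while the lag keeps growing, can be expressed as a small check over successive samples. This is a sketch only: the sample format and both function names are assumptions made for illustration, not a TiCDC API.

```python
from typing import List, Tuple

def checkpoint_lag_seconds(now_ms: int, checkpoint_ms: int) -> float:
    """Lag between the current time and the checkpoint, both in epoch milliseconds."""
    return (now_ms - checkpoint_ms) / 1000.0

def is_falling_behind(samples: List[Tuple[int, int]]) -> bool:
    """Return True if the checkpoint advances at every step while the lag also grows.

    samples: (now_ms, checkpoint_ms) pairs in chronological order (hypothetical format).
    """
    if len(samples) < 2:
        return False
    lags = [checkpoint_lag_seconds(now, cp) for now, cp in samples]
    checkpoint_advancing = all(b[1] > a[1] for a, b in zip(samples, samples[1:]))
    lag_growing = all(b > a for a, b in zip(lags, lags[1:]))
    return checkpoint_advancing and lag_growing

# The task is alive (checkpoint moves) but the downstream consumes too slowly.
slow = [(1_000_000, 990_000), (1_060_000, 1_020_000), (1_120_000, 1_050_000)]
# The checkpoint keeps pace with the clock, so the lag stays constant.
healthy = [(1_000_000, 990_000), (1_060_000, 1_050_000)]
print(is_falling_behind(slow), is_falling_behind(healthy))
```

A task in this state is not stuck, which is why the changefeed state still reports normal; the bottleneck is somewhere on the write path.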
3. Next, check whether the downstream MySQL had a problem that slowed down data consumption. Below is the MySQL monitoring for recent days.
Load average:

IO Util:

The monitoring shows that the MySQL host load was abnormal, much higher than before, and that IO was essentially saturated.
But why did the MySQL load rise suddenly, and why was IO suddenly saturated? The business on the upstream TiDB had already been checked: it had been normal in recent days, and so had the traffic volume.
Next, check whether the daily tables being replicated had changed in the downstream MySQL over the last two days:
Note: the cause is not related to the columns themselves, so the specific columns of the table structure are hidden here.
mysql>show create table reXXX_20220627\G
*************************** 1. row ***************************
Table: reXXX20220627
Create Table: CREATE TABLE `reXXX20220627` (
`pid` int(10) unsigned NOT NULL,
`tid` int(10) unsigned NOT NULL,
....
PRIMARY KEY (`pid`,`mpid`,`mid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin ROW_FORMAT=COMPRESSED;
1 row in set (0.00 sec)
mysql>show create table reXXX_20220628\G
*************************** 1. row ***************************
Table: reXXX20220628
Create Table: CREATE TABLE `reXXX20220628` (
`pid` int(10) unsigned NOT NULL,
`tid` int(10) unsigned NOT NULL,
....
PRIMARY KEY (`pid`,`mpid`,`mid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
1 row in set (0.00 sec)
Comparing the daily table structures in the downstream MySQL for the last two days shows that the previous day's table and the current day's table are not the same. Specifically:
(1) The previous day's table uses the COMPRESSED row format, while the current day's table uses the default DYNAMIC row format
(2) The previous day's table uses the utf8_bin collation, while the current day's table uses the default utf8_general_ci collation
Second round of troubleshooting: the table structures in the downstream MySQL were found to be inconsistent.
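The comparison above was done by eye, but it can also be scripted. The sketch below extracts the table-level options from two `SHOW CREATE TABLE` results and reports the ones that differ. The helper names are hypothetical, the parsing is deliberately simple (it only reads the text after the last closing parenthesis), and the DDL strings are the masked ones from the article, with the elided columns left elided.

```python
import re
from typing import Dict, Optional, Tuple

def table_options(create_table_sql: str) -> Dict[str, str]:
    """Extract table-level options from the tail of a CREATE TABLE statement."""
    # Only look after the last ")" so that column definitions are ignored.
    tail = create_table_sql[create_table_sql.rfind(")") + 1:]
    pattern = r"(ENGINE|DEFAULT CHARSET|COLLATE|ROW_FORMAT)=(\w+)"
    return dict(re.findall(pattern, tail))

def diff_options(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {option: (old value, new value)} for every option that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

DDL_20220627 = """CREATE TABLE `reXXX20220627` (
`pid` int(10) unsigned NOT NULL,
`tid` int(10) unsigned NOT NULL,
....
PRIMARY KEY (`pid`,`mpid`,`mid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin ROW_FORMAT=COMPRESSED"""

DDL_20220628 = """CREATE TABLE `reXXX20220628` (
`pid` int(10) unsigned NOT NULL,
`tid` int(10) unsigned NOT NULL,
....
PRIMARY KEY (`pid`,`mpid`,`mid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8"""

# None means the option is absent from the DDL, so the server default applies.
print(diff_options(table_options(DDL_20220627), table_options(DDL_20220628)))
```

Running such a check whenever a new daily table is created would surface this kind of drift before it reaches the peak hours.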
4. Based on the findings above, the row format of the downstream MySQL table was first changed to COMPRESSED. After the change, during the next day's business peak, the CDC replication delay and the downstream MySQL load eased considerably compared with the previous day, but were still higher than before, as shown below:

These results show that changing the row format of the downstream MySQL table to COMPRESSED alleviates the downstream consumption delay during business peaks, but only alleviates it. The problem still exists.
Third round of troubleshooting: with the inconsistent table structure in the downstream MySQL, changing the row format to COMPRESSED eases the consumption delay during business peaks, but the peak-time delay remains.
5. Next, the collation of the table was changed from utf8_general_ci to utf8_bin. After several days of comparison, the delay during business peaks is about 1s, and the downstream MySQL load has also dropped considerably and is basically stable.
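The article does not show the exact statements used for steps 4 and 5. As a hedged sketch, the two changes could be generated as standard MySQL `ALTER TABLE` statements like this; the helper name is hypothetical, and the use of `CONVERT TO CHARACTER SET`, which also converts existing character columns, is an assumption about how the collation was changed.

```python
from typing import List

def alter_statements(table: str,
                     row_format: str = "COMPRESSED",
                     charset: str = "utf8",
                     collation: str = "utf8_bin") -> List[str]:
    """Build ALTER TABLE statements that align a table with the previous day's options."""
    # Quote the identifier with backticks, doubling any embedded backtick.
    quoted = "`" + table.replace("`", "``") + "`"
    return [
        f"ALTER TABLE {quoted} ROW_FORMAT={row_format};",
        # Assumption: CONVERT TO is used so existing character columns change as well.
        f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} COLLATE {collation};",
    ]

for statement in alter_statements("reXXX20220628"):
    print(statement)
```

Both statements rebuild the table, so on a table of this size they are best run outside business peaks.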


As a supplement, here is how the utf8_general_ci and utf8_bin collations perform when the MySQL table uses the COMPRESSED row format in both cases:


The comparison shows that, under the same configuration, utf8_bin performs slightly better than utf8_general_ci.
Summary
At present, TiCDC meets our business needs in most scenarios. Some of our clusters also replicate data to Kafka through TiCDC, and that setup has been running for more than a year without notable problems. We look forward to TiCDC becoming more stable and efficient, and to it supporting a wider range of big data scenarios.