Getui technology practice | Master these two tuning techniques to speed up TiDB a thousandfold!
2022-07-04 01:49:00 【Push @ little assistant】
At the Beijing Winter Olympics, athletes are competing hard, embodying the Olympic spirit of "faster, higher, stronger" through speed, power, and resilience. We salute them! In the computer world of 0s and 1s, developers and programmers face challenges of their own: to speed up systems and get the most out of their servers, they keep proposing plans, putting them into practice, breaking through bottlenecks, and solving problems.
Getui's "Big data cost reduction and efficiency improvement" series summarizes and shares the pitfalls we hit and the tuning techniques we learned in real-world big data work, as a reference for other practitioners. This article is the third in the series. In it, we share how tuning brought us a thousandfold performance improvement in TiDB.
How Getui came to TiDB
As a data intelligence company, Getui provides developer services such as message push for hundreds of thousands of apps, and also offers professional digital solutions to customers in many industries. As the business grew rapidly, so did the company's data volume. Over time, MySQL could no longer meet our need for fast data queries and analysis. We needed a new database that supports elastic horizontal scaling, copes well with high concurrency and massive data volumes, and is highly compatible with MySQL.
After in-depth research, we found that the popular database TiDB not only has these characteristics but is also a cloud-native distributed database with financial-grade high availability, strong data consistency, and support for real-time HTAP. We therefore decided to switch from MySQL to TiDB. Our goal was to keep queries fast as the stored data volume grows, so that internal and external customers can analyze data efficiently. For example, we want to give developers and users timely reports on push delivery volume, arrival rate, and related metrics to support their decisions.
After selecting the database, we started the data migration. The MySQL instances being migrated held several terabytes of data. We used Data Migration (DM), a tool from TiDB's own ecosystem, to migrate both the full and the incremental data.
- Full data migration: migrate the table structure of each table from the data source to TiDB, then read the existing data and write it to the TiDB cluster.
- Incremental data replication: after the full data migration is complete, read the changes to the corresponding tables from the data source and write them to the TiDB cluster.
Getui's data migration from MySQL to TiDB
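The two phases above can be pictured with a toy model. This is a minimal sketch, not how DM is implemented: the dict-based "databases", the change-log tuples, and the function names `full_migration` and `apply_incremental` are all assumptions made for illustration.

```python
# Toy model of a two-phase migration: a full copy, then replay of later changes.
# The dict "databases" and the (op, key, value) change log are illustrative
# assumptions; they do not reflect DM's real data structures.

def full_migration(source_rows):
    """Phase 1: copy every existing row (the stock data) into a fresh target."""
    return dict(source_rows)

def apply_incremental(target, change_log):
    """Phase 2: replay changes made on the source after the full copy."""
    for op, key, value in change_log:
        if op in ("insert", "update"):
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    return target

source = {1: "alice", 2: "bob"}
target = full_migration(source)

# Changes that reach the source after the full copy was taken.
changes = [("insert", 3, "carol"), ("update", 1, "alice-v2"), ("delete", 2, None)]
for op, key, value in changes:
    if op == "delete":
        source.pop(key, None)
    else:
        source[key] = value

apply_incremental(target, changes)
print(target == source)  # True
```

The point of the split is that the target only has to catch up on the change log, so the source can keep serving writes while the bulk copy runs.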
Once data synchronization was stable, we gradually migrated the applications to the TiDB cluster. After the last application had been migrated, we stopped the DM cluster. This completed the data migration from MySQL to TiDB.
Note: for the specific configuration and use of DM, see the official documentation.
Falling into a TiDB usage "anti-pattern"
However, once all the applications had been migrated to TiDB, a series of problems appeared: slow database responses, stalls, and unavailable applications.
As shown below:
Stalls when logging in to the database
On investigation, we found that many of the slow SQL statements came from load scripts used to import data.
Slow SQL: imports take tens of minutes
After talking with the business teams, we found that some import statements contained tens of thousands of records and took tens of minutes to run.
Previously, with MySQL, an import took only a few minutes or even tens of seconds. After the move to TiDB it took twice as long or even several times longer: a TiDB cluster made up of several machines was doing worse than a single MySQL machine.
This was clearly not the right way to use TiDB. We needed to find the cause and fix it.
The load on a single server is too high
The monitoring showed that the load was concentrated on one machine (in the figure above, the server marked with the red box bears most of the pressure). This meant we were not using all of our resources and were not getting the performance benefits of TiDB as a distributed database.
The right way to use TiDB
First, optimize the configuration parameters
How should we optimize? We started with the configuration parameters. Many parameters are left at the system defaults, which does not help us make good use of the servers' capacity. After studying the official documentation closely and running several rounds of tests, we adjusted the TiDB configuration parameters to make fuller use of server resources and bring server performance to the desired level.
The table below describes Getui's adjustments to the TiDB configuration parameters, for reference:
Focus on solving hotspot issues
Adjusting configuration parameters is only a basic step. We still had to fix the root problem of the load being concentrated on one machine. Doing so requires a good understanding of TiDB's architecture and of how TiDB stores table data internally.
In the TiDB architecture, the distributed storage engine TiKV Server is responsible for storing data. TiKV splits data by range, and the smallest unit of splitting is the Region. Each Region has a size limit (the default upper limit is 96 MB) and has multiple replicas; each set of replicas forms a Raft group. In each Raft group, the leader handles reads and writes for that block of data. The Placement Driver (PD), the management module of the whole cluster, automatically schedules leaders evenly across different physical nodes to spread read and write pressure and balance the load.
TiDB architecture diagram (image from the TiDB official website)
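To make range splitting concrete, here is a minimal sketch of a store that keeps keys in contiguous ranges and splits a range when it grows too large. It is a simplification under stated assumptions: the class name `RangeStore` is hypothetical, keys are non-negative integers, and the limit counts rows rather than bytes. Replicas, Raft, and PD scheduling are left out.

```python
import bisect

# Toy stand-in for the Region size limit: rows per Region, not bytes (assumption).
REGION_LIMIT = 4

class RangeStore:
    """Hypothetical model: each Region owns a contiguous key range [start, next start)."""

    def __init__(self):
        self.starts = [0]    # sorted start key of every Region
        self.regions = [[]]  # sorted keys currently held by each Region

    def region_index(self, key):
        # The owning Region is the last one whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key):
        i = self.region_index(key)
        bisect.insort(self.regions[i], key)
        if len(self.regions[i]) > REGION_LIMIT:
            self._split(i)

    def _split(self, i):
        # Split at the median key: the upper half becomes a new Region.
        keys = self.regions[i]
        mid = len(keys) // 2
        left, right = keys[:mid], keys[mid:]
        self.regions[i] = left
        self.regions.insert(i + 1, right)
        self.starts.insert(i + 1, right[0])

store = RangeStore()
for k in [50, 10, 40, 20, 30, 60, 70]:
    store.put(k)

print(store.starts)   # [0, 30, 50]
print(store.regions)  # [[10, 20], [30, 40], [50, 60, 70]]
```

Because a lookup only needs the sorted list of start keys, finding the Region that owns a key is a binary search, which is what makes range partitioning attractive for ordered scans.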
TiDB assigns a TableID to each table, an IndexID to each index, and a RowID to each row (by default, if the table uses an integer primary key, the primary key value is used as the RowID). The data of one table is stored in a range prefixed with the TableID, and within it rows are ordered by RowID. During inserts, if the RowID values keep increasing, new rows can only be appended at the end.
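The ordering described above can be checked with a small model. This sketch represents a row key as a `(table_id, row_id)` tuple; that tuple layout and the helper `row_key` are assumptions for illustration, since real TiDB keys are encoded byte strings.

```python
# Rows are modelled as (table_id, row_id) tuples. Real TiDB keys are encoded
# byte strings, so this layout is an illustrative assumption only.

def row_key(table_id, row_id):
    return (table_id, row_id)

keys = sorted(row_key(t, r) for t, r in [(2, 1), (1, 3), (1, 1), (2, 2), (1, 2)])
print(keys)  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]

# A row with the next-largest row_id sorts after every existing row of table 1,
# so it can only be appended at the end of that table's range.
table1 = [k for k in keys if k[0] == 1]
new_key = row_key(1, 4)
print(new_key > table1[-1])  # True
```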
When a Region reaches a certain size it splits. After the split, new rows can still only be appended at the end of the range, so inserts always land on the same Region. This creates a hotspot (excessive load at a single point) and is a TiDB usage "anti-pattern".
A typical auto-increment primary key increases sequentially. By default, when the primary key is an integer, its value is used as the RowID, so the RowID also increases sequentially and a large volume of inserts creates a write hotspot on the table. The RowID that TiDB generates implicitly also increases sequentially by default, so tables whose primary key is not an integer can run into write hotspots as well.
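The hotspot effect can be reproduced in a toy simulation. This is a sketch under stated assumptions, not a model of TiKV itself: Regions split at the median key once they exceed `REGION_LIMIT` rows (a made-up limit), and the function `tail_write_fraction` is a hypothetical helper that reports what share of writes hit the last Region.

```python
import bisect
import random

REGION_LIMIT = 100  # rows per Region before a split; a toy stand-in for the size limit

def tail_write_fraction(keys):
    """Insert keys in the given order; return the share of writes that hit the last Region."""
    starts, regions = [0], [[]]
    tail_hits = 0
    for key in keys:
        i = bisect.bisect_right(starts, key) - 1
        if i == len(regions) - 1:
            tail_hits += 1
        bisect.insort(regions[i], key)
        if len(regions[i]) > REGION_LIMIT:
            # Split at the median key; the upper half becomes a new Region.
            mid = len(regions[i]) // 2
            right = regions[i][mid:]
            regions[i] = regions[i][:mid]
            regions.insert(i + 1, right)
            starts.insert(i + 1, right[0])
    return tail_hits / len(keys)

sequential = list(range(1, 10001))  # like an auto-increment key
scattered = sequential[:]
random.Random(0).shuffle(scattered)  # same keys, arrival order dispersed

print(tail_write_fraction(sequential))  # 1.0: every write lands on the last Region
print(tail_write_fraction(scattered))   # a small fraction of the writes
```

With sequential keys, every new key is larger than all earlier ones, so no amount of splitting moves writes away from the last Region. Dispersing the keys is what spreads the writes.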
With MySQL, we were used to using an auto-increment ID as the primary key of a table for convenience. When the data was migrated from MySQL to TiDB, the table structures were left unchanged, so the auto-increment ID was still the primary key. As a result, batch imports caused write hotspots in TiDB and continuous Region splitting, which consumed a lot of resources.
To address this, we started our TiDB optimization with the table structure. We rebuilt the tables that used an auto-increment ID as the primary key, dropped the auto-increment ID, and relied on TiDB's implicit _tidb_rowid column as the row ID instead. That is, we changed
create table t (a int primary key auto_increment, b int);
to:
create table t (a int, b int) SHARD_ROW_ID_BITS=4 PRE_SPLIT_REGIONS=2;
Setting SHARD_ROW_ID_BITS scatters the RowIDs so that rows are written to multiple different Regions, which alleviates the write hotspot.
Note that the value of SHARD_ROW_ID_BITS determines the number of shards:
- SHARD_ROW_ID_BITS = 0 means 1 shard
- SHARD_ROW_ID_BITS = 4 means 16 shards
- SHARD_ROW_ID_BITS = 6 means 64 shards
Setting SHARD_ROW_ID_BITS too high multiplies the number of RPC requests and increases CPU and network overhead. We set SHARD_ROW_ID_BITS to 4.
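The shard counts listed above follow from 2^SHARD_ROW_ID_BITS. The sketch below also shows one way a shard prefix could scatter otherwise sequential row IDs. It is a simplification under stated assumptions: the 63-bit width, the function names, and the round-robin choice of shard are illustrative, and TiDB's actual method of choosing the shard value is not reproduced here.

```python
def shard_count(shard_row_id_bits):
    """Number of shards is 2 ** SHARD_ROW_ID_BITS."""
    return 1 << shard_row_id_bits

def sharded_row_id(seq, shard, shard_row_id_bits, total_bits=63):
    """Place `shard` in the top bits of the row ID and the increasing counter below it.

    total_bits=63 and this exact layout are assumptions for illustration.
    """
    low_bits = total_bits - shard_row_id_bits
    if not 0 <= shard < shard_count(shard_row_id_bits):
        raise ValueError("shard out of range")
    if not 0 <= seq < (1 << low_bits):
        raise ValueError("sequence number out of range")
    return (shard << low_bits) | seq

print(shard_count(0), shard_count(4), shard_count(6))  # 1 16 64

# Assumption: pick the shard round-robin, purely to keep the example deterministic.
bits = 4
ids = [sharded_row_id(seq, seq % shard_count(bits), bits) for seq in range(1, 33)]
prefixes = {rid >> (63 - bits) for rid in ids}
print(len(prefixes))  # 16: consecutive inserts are spread over 16 key ranges
```

Because the shard value sits in the highest bits, rows inserted one after another end up far apart in key order, so they no longer all fall at the end of a single range.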
PRE_SPLIT_REGIONS controls how the table is evenly pre-split once it has been created. The table is pre-split into 2^(PRE_SPLIT_REGIONS) Regions, so with our setting of PRE_SPLIT_REGIONS=2 it is pre-split into 4 Regions.
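As a rough picture of even pre-splitting, the sketch below divides a row ID space into 2^(PRE_SPLIT_REGIONS) equal ranges and returns the boundaries between them. The 63-bit ID space and the function name `pre_split_boundaries` are assumptions for illustration; the real split keys are computed by TiDB and may differ.

```python
def pre_split_boundaries(pre_split_regions, total_bits=63):
    """Return the split points that cut the row ID space into 2**pre_split_regions equal ranges.

    total_bits=63 is an assumption; this is not TiDB's actual split-key calculation.
    """
    region_count = 1 << pre_split_regions
    step = (1 << total_bits) // region_count
    return [i * step for i in range(1, region_count)]

boundaries = pre_split_boundaries(2)
print(len(boundaries) + 1)        # 4 Regions
print(boundaries[0] == 1 << 61)   # True: the first boundary is a quarter of the way in
```

Pre-splitting means the Regions exist before the first import starts, so early writes can be spread out instead of all landing on the table's single initial Region.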
Lessons learned
- Do not use auto-increment primary keys for new tables in future; consider using a business primary key instead
- Add the parameters SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS=2
In addition, because the TiDB optimizer differs somewhat from MySQL's, the same SQL statement can run normally in MySQL but slowly in TiDB. We analyzed specific slow SQL statements in depth and optimized their indexes accordingly, with good results.
Optimization results
The slow SQL query platform shows that, after optimization, most imports complete within seconds. Compared with the original tens of minutes, this is a thousandfold performance improvement.
Slow SQL optimization results
The performance monitoring chart also shows that when the load is high, it is high on several machines at once rather than on a single one. This indicates that the optimization works and that TiDB's advantages as a distributed database are now being realized.
After optimization, the server load is balanced
Summary
As a new kind of distributed relational database, TiDB provides a one-stop solution for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) scenarios. Getui uses TiDB for efficient queries over massive data and is also exploring real-time data analysis and insight based on TiDB.
More content from the "Big data cost reduction and efficiency improvement" series is on the way. Please follow the Getui Technology Practice WeChat official account (ID: getuitech).