当前位置：网站首页>Performance optimization of database 5- database, table and data migration

Performance optimization of database 5- database, table and data migration

2022-06-23 22:02:00 【Girlfriend in college entrance examination】

Keep creating , Accelerate growth ！ This is my participation 「 Nuggets day new plan · 6 Yuegengwen challenge 」 Of the 18 God , Click to see the event details

One 、 Database split

1. Why do you split the database

Problems with stand-alone database ？

Slave capacity 、 performance 、 It is difficult to meet the scenario of massive data in terms of availability and operation and maintenance cost .

Performance aspect , The amount of data exceeds a certain threshold ,B+ Disk access caused by increasing tree index depth IO More times , This leads to a decline in query performance .
In terms of capacity , The amount of data that a single machine can store is limited
Usability , A large number of queries fall on a single database node or a simple master-slave architecture , The database is hard to bear .
Operation and maintenance , The amount of data reaches a certain threshold , The master-slave synchronization delay is high 、 Add field index 、 Backing up these will be slow , Impact on business system .

The master-slave structure solves the problem of high availability 、 Read extension . But the single machine capacity remains unchanged , Stand alone write performance cannot be solved .

To solve these problems , We need to adopt == Sub database and sub table ==, Split the database . Reduce the write pressure of a single node , Increase the maximum data capacity of the whole system .

Extended Cube

X Axis ： adopt clone The entire system is replicated , colony
Y Axis ： Copy by decoupling different functions , Business split
Z Axis ： Expand by splitting different data , Data fragmentation

2. Split Vertically

Split Vertically , Divide databases and tables according to business latitude .

It's vertical

Put a database , Split into multiple databases with different business data processing capabilities . Such as ： Split the e-commerce library into a separate library 、 Order Library 、 Commodity bank .

Remove the meter vertically

If the amount of data in a single table is too large , You also need to split a single table . Such as ： One 200 The order master table of column , Split into more than ten sub tables ： The order sheet 、 Order details sheet 、 Order receipt information table, etc .

Advantages and disadvantages of vertical splitting ：

advantage ：

Single storehouse （ Single table ） smaller , Easy to manage
Improved performance and capacity
After break up , Reduced system and data complexity
It can be used as the basis of microservice transformation

shortcoming ：

Library change , Management becomes complex
It is highly invasive to the business system
The transformation process is complex , Easy to break down
You can't continue to split until you split it to a certain extent

3. Horizontal split

Horizontal splitting is to slice the data directly , There are two specific methods: sub database and sub table . Do not change the structure of the data itself , Just reduce the amount of data in a single node . In this way, there is no need to make special changes to the code of the business system itself , It can even be transparent based on some middleware .

For example, put a 10 Single warehouse list of 100 million recorded orders . By user id Divide 32 modulus , Split a single library into 32 Databases ; Then press the order id Divide 32 modulus , Each library is disassembled into 32 Tables . This is the 32*32=1024 Tables , The data volume of a single table is less than one million .

Horizontal sub warehouse sub table

Generally speaking, our data has creation time , Can be split by time , According to the year 、 quarter 、 month 、 Heaven can .

Or split according to the user 、 It can even be split according to some custom complex logic .

Why don't you suggest dividing tables sometimes , Only sub warehouse is recommended ？

Because the split table cannot solve the capacity problem , If the bottleneck is IO（ disk IO、 The Internet IO） On , The sub table can't solve , Because the sub table is still on the same machine , The sub library can be on two machines .

Sub database or sub table , How to choose ？

In general , If the data itself is under great reading and writing pressure , disk IO Has become a bottleneck , Then the sub database score table is better . And use different libraries , It can improve the parallel data processing capability of the whole cluster .

On the contrary , You can try to divide the table , Reduce the amount of data in a single table .

Advantages and disadvantages of horizontal database and table ： advantage ：

Solve the capacity problem
It has less impact on the system than vertical splitting
Partially improve performance and stability

shortcoming ：

Large scale of cluster , Managing complex
complex SQL Support questions
Data migration issues
Consistency issues

4. Classification management of data

Data classification management refers to improving data management ability by classifying data .

With the development of business systems 、 The analysis of the data found that , Many data have different requirements for quality .

Such as order data , Sure, consistency is the highest requirement , You can't lose . And some log data , Intermediate data , There is not so high consistency . If you lose it, you lose it .

in addition , Different strategies can also be adopted for the order data in the same table , There are many invalid orders , We can regularly transfer or remove .（== In some trading systems 80% The above are meaningless orders cancelled after placing an order , So you can clean it up ==）

If there are no invalid orders , You can also consider ：

Orders placed but unpaid in the last week , It is more likely to be queried and paid . And a little more , You can just cancel .
lately 3 Data of orders placed in last month , It is most likely to be repeatedly queried online and counted by the system .
3 Months ago -3 Data within years , The possibility of query is small , Online inquiry is not available
3 More than years of data , You can query directly without providing any way .

In this case , We can use certain means to optimize the system according to the classification ：

Define the data placed but unpaid within one week as thermal data , Put it into database and memory at the same time
Definition 3 The data within months are temperature data , Put it in the database , Provide normal query operation
Definition 3 Months to 3 The year's data are cold data , Remove from database , Archive to some cheap disk , In a compressed way （ such as MySQL Of tokuDB engine ） Storage , If the user needs to query, pick up the work order to query
Definition 3 The data of more than years are ice data , Back up to media such as tape , No query operation is provided .

5. database middleware

Technology evolution of database middleware

ShardingSphere Is a set of open source distributed database middleware solutions composed of the ecosystem , It consists of JDBC、Proxy and Sidecar（ Planning ） this 3 They are independent of each other , It can also be composed of products that can be deployed and used together . They all provide data fragmentation 、 Distributed transaction and database governance functions , It can be applied to such as Java isomorphism 、 isomerism 、 Various application scenarios such as cloud native .

ShardingSphere-JDBC

frame ShardingSphere-JDBC, It can be used directly in business code , Support common databases and JDBC. Only applicable to Java Language .

Using examples ：

Read / write separation ：github.com/mmcLine/sha…
Sub database and sub table ：gitee.com/mmcLine/sha…

Two 、 Data migration

Scheme 1 ： Total quantity

Business system downtime
Database migration , Check consistency
Business system upgrade , Access the new database

If the new database structure is the same , Sure dump After that, the full volume is imported . If it's heterogeneous , You need to program .

Option two ： Total quantity + The incremental

Time stamps that depend on the data itself

First synchronize the data to a recent timestamp （ As the day before ）
Then release the upgrade downtime maintenance
Resynchronize the change data in the last period of time
Finally, upgrade the business system , Access the new database .

Option three ：binlog+ Total quantity + The incremental

Through master library or slave library binlog To parse and reconstruct the data , To replicate .
Generally, it needs the support of Middleware Tools .

It can realize multithreading , Breakpoint continuation , Full or incremental data synchronization .

Then you can do ：

Realize custom complex heterogeneous data structure
Realize automatic capacity expansion and reduction , For example, from warehouse and table to single warehouse and single table 、 From single warehouse and single table to sub warehouse and sub table 、 branch 4 From one library to another 8 A library, etc .

Here is a migration tool ：

ShardingSphere-scaling

Download package ：archive.apache.org/dist/shardi…

introduces ： from 2 Database expansion to 4 A database

Two new databases

User3 and User4 It's new .

Configure dual master for data synchronization

When the data synchronization is complete , Configure dual master and dual write （ Synchronization because there is a delay , If there are write and update operations all the time , There will be inaccuracies ）
After the data synchronization is completely consistent , Delete dual master synchronization , Modify database configuration , And restart .

Now the expansion is complete , But there is data redundancy , We also need to write a program , Delete redundant data in the database .

User1 Remove uid % 4 = 2 The data of ;
User3 Remove uid % 4 = 0 The data of ;
User2 Remove uid % 4 = 3 The data of ;
User4 Remove uid % 4 = 1 The data of ;

This scheme can realize n Kudzu 2n Smooth expansion of Library , Increase database service capacity , Reduce the amount of data in a single database by half . The core principle is ： Double the capacity , Avoid data migration .

原网站

版权声明
本文为[Girlfriend in college entrance examination]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/174/202206232132509627.html