当前位置:网站首页>Performance optimization of database 5- database, table and data migration
Performance optimization of database 5- database, table and data migration
2022-06-23 22:02:00 【Girlfriend in college entrance examination】
Keep creating , Accelerate growth ! This is my participation 「 Nuggets day new plan · 6 Yuegengwen challenge 」 Of the 18 God , Click to see the event details
One 、 Database split
1. Why do you split the database
Problems with stand-alone database ?
Slave capacity 、 performance 、 It is difficult to meet the scenario of massive data in terms of availability and operation and maintenance cost .
- Performance aspect , The amount of data exceeds a certain threshold ,B+ Disk access caused by increasing tree index depth IO More times , This leads to a decline in query performance .
- In terms of capacity , The amount of data that a single machine can store is limited
- Usability , A large number of queries fall on a single database node or a simple master-slave architecture , The database is hard to bear .
- Operation and maintenance , The amount of data reaches a certain threshold , The master-slave synchronization delay is high 、 Add field index 、 Backing up these will be slow , Impact on business system .
The master-slave structure solves the problem of high availability 、 Read extension . But the single machine capacity remains unchanged , Stand alone write performance cannot be solved .
To solve these problems , We need to adopt == Sub database and sub table ==, Split the database . Reduce the write pressure of a single node , Increase the maximum data capacity of the whole system .
Extended Cube
- X Axis : adopt clone The entire system is replicated , colony
- Y Axis : Copy by decoupling different functions , Business split
- Z Axis : Expand by splitting different data , Data fragmentation
2. Split Vertically
Split Vertically , Divide databases and tables according to business latitude .
It's vertical
Put a database , Split into multiple databases with different business data processing capabilities . Such as : Split the e-commerce library into a separate library 、 Order Library 、 Commodity bank .
Remove the meter vertically
If the amount of data in a single table is too large , You also need to split a single table . Such as : One 200 The order master table of column , Split into more than ten sub tables : The order sheet 、 Order details sheet 、 Order receipt information table, etc .
Advantages and disadvantages of vertical splitting :
advantage :
- Single storehouse ( Single table ) smaller , Easy to manage
- Improved performance and capacity
- After break up , Reduced system and data complexity
- It can be used as the basis of microservice transformation
shortcoming :
- Library change , Management becomes complex
- It is highly invasive to the business system
- The transformation process is complex , Easy to break down
- You can't continue to split until you split it to a certain extent
3. Horizontal split
Horizontal splitting is to slice the data directly , There are two specific methods: sub database and sub table . Do not change the structure of the data itself , Just reduce the amount of data in a single node . In this way, there is no need to make special changes to the code of the business system itself , It can even be transparent based on some middleware .
For example, put a 10 Single warehouse list of 100 million recorded orders . By user id Divide 32 modulus , Split a single library into 32 Databases ; Then press the order id Divide 32 modulus , Each library is disassembled into 32 Tables . This is the 32*32=1024 Tables , The data volume of a single table is less than one million .
Horizontal sub warehouse sub table
Generally speaking, our data has creation time , Can be split by time , According to the year 、 quarter 、 month 、 Heaven can .
Or split according to the user 、 It can even be split according to some custom complex logic .
Why don't you suggest dividing tables sometimes , Only sub warehouse is recommended ?
Because the split table cannot solve the capacity problem , If the bottleneck is IO( disk IO、 The Internet IO) On , The sub table can't solve , Because the sub table is still on the same machine , The sub library can be on two machines .
Sub database or sub table , How to choose ?
In general , If the data itself is under great reading and writing pressure , disk IO Has become a bottleneck , Then the sub database score table is better . And use different libraries , It can improve the parallel data processing capability of the whole cluster .
On the contrary , You can try to divide the table , Reduce the amount of data in a single table .
Advantages and disadvantages of horizontal database and table : advantage :
- Solve the capacity problem
- It has less impact on the system than vertical splitting
- Partially improve performance and stability
shortcoming :
- Large scale of cluster , Managing complex
- complex SQL Support questions
- Data migration issues
- Consistency issues
4. Classification management of data
Data classification management refers to improving data management ability by classifying data .
With the development of business systems 、 The analysis of the data found that , Many data have different requirements for quality .
Such as order data , Sure, consistency is the highest requirement , You can't lose . And some log data , Intermediate data , There is not so high consistency . If you lose it, you lose it .
in addition , Different strategies can also be adopted for the order data in the same table , There are many invalid orders , We can regularly transfer or remove .(== In some trading systems 80% The above are meaningless orders cancelled after placing an order , So you can clean it up ==)
If there are no invalid orders , You can also consider :
- Orders placed but unpaid in the last week , It is more likely to be queried and paid . And a little more , You can just cancel .
- lately 3 Data of orders placed in last month , It is most likely to be repeatedly queried online and counted by the system .
- 3 Months ago -3 Data within years , The possibility of query is small , Online inquiry is not available
- 3 More than years of data , You can query directly without providing any way .
In this case , We can use certain means to optimize the system according to the classification :
- Define the data placed but unpaid within one week as thermal data , Put it into database and memory at the same time
- Definition 3 The data within months are temperature data , Put it in the database , Provide normal query operation
- Definition 3 Months to 3 The year's data are cold data , Remove from database , Archive to some cheap disk , In a compressed way ( such as MySQL Of tokuDB engine ) Storage , If the user needs to query, pick up the work order to query
- Definition 3 The data of more than years are ice data , Back up to media such as tape , No query operation is provided .
5. database middleware
Technology evolution of database middleware
ShardingSphere Is a set of open source distributed database middleware solutions composed of the ecosystem , It consists of JDBC、Proxy and Sidecar( Planning ) this 3 They are independent of each other , It can also be composed of products that can be deployed and used together . They all provide data fragmentation 、 Distributed transaction and database governance functions , It can be applied to such as Java isomorphism 、 isomerism 、 Various application scenarios such as cloud native .
ShardingSphere-JDBC
frame ShardingSphere-JDBC, It can be used directly in business code , Support common databases and JDBC. Only applicable to Java Language .
Using examples :
- Read / write separation :github.com/mmcLine/sha…
- Sub database and sub table :gitee.com/mmcLine/sha…
Two 、 Data migration
Scheme 1 : Total quantity
- Business system downtime
- Database migration , Check consistency
- Business system upgrade , Access the new database
If the new database structure is the same , Sure dump After that, the full volume is imported . If it's heterogeneous , You need to program .
Option two : Total quantity + The incremental
Time stamps that depend on the data itself
- First synchronize the data to a recent timestamp ( As the day before )
- Then release the upgrade downtime maintenance
- Resynchronize the change data in the last period of time
- Finally, upgrade the business system , Access the new database .
Option three :binlog+ Total quantity + The incremental
- Through master library or slave library binlog To parse and reconstruct the data , To replicate .
- Generally, it needs the support of Middleware Tools .
It can realize multithreading , Breakpoint continuation , Full or incremental data synchronization .
Then you can do :
- Realize custom complex heterogeneous data structure
- Realize automatic capacity expansion and reduction , For example, from warehouse and table to single warehouse and single table 、 From single warehouse and single table to sub warehouse and sub table 、 branch 4 From one library to another 8 A library, etc .
Here is a migration tool :
ShardingSphere-scaling
Download package :archive.apache.org/dist/shardi…
introduces : from 2 Database expansion to 4 A database
- Two new databases
User3 and User4 It's new .
- Configure dual master for data synchronization
- When the data synchronization is complete , Configure dual master and dual write ( Synchronization because there is a delay , If there are write and update operations all the time , There will be inaccuracies )
- After the data synchronization is completely consistent , Delete dual master synchronization , Modify database configuration , And restart .
- Now the expansion is complete , But there is data redundancy , We also need to write a program , Delete redundant data in the database .
- User1 Remove uid % 4 = 2 The data of ;
- User3 Remove uid % 4 = 0 The data of ;
- User2 Remove uid % 4 = 3 The data of ;
- User4 Remove uid % 4 = 1 The data of ;
This scheme can realize n Kudzu 2n Smooth expansion of Library , Increase database service capacity , Reduce the amount of data in a single database by half . The core principle is : Double the capacity , Avoid data migration .
边栏推荐
- The most common usage scenarios for redis
- Error running PyUIC: Cannot start process, the working directory ‘-m PyQt5. uic. pyuic register. ui -o
- Minimisé lorsque Outlook est allumé + éteint
- Open source C # WPF control library --newbeecoder UI User Guide (II)
- Practice of business level disaster recovery switching drill
- 从CVPR 2022看域泛化(Domain Generalization)最新研究进展
- Using barcode software to make certificates
- Flink practical tutorial: advanced 4-window top n
- Bluetooth chip | Renesas and Ti launch new Bluetooth chip, try Lenz st17h65 Bluetooth ble5.2 chip
- KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Few-Shot NLP(KnowDA:用于 Few-Shot NLP 中数据增强的多合一知识混合模型)
猜你喜欢

Installation and use of Minio

Leetcode must review six lintcode (28348455116385)

嵌入式开发:嵌入式基础——重启和重置的区别

HDLBits-> Circuits-> Arithmetic Circuitd-> 3-bit binary adder

万字长文!一文搞懂InheritedWidget 局部刷新机制

Introduction to scikit learn machine learning practice

《scikit-learn机器学习实战》简介

Error running PyUIC: Cannot start process, the working directory ‘-m PyQt5. uic. pyuic register. ui -o

CAD图在线Web测量工具代码实现(测量距离、面积、角度等)

Simple code and design concept of "back to top"
随机推荐
Question: how to understand the network protocol and why the OSI reference model is divided into seven layers
How to deploy the API gateway? Is it OK not to use the API gateway?
Code implementation of CAD drawing online web measurement tool (measuring distance, area, angle, etc.)
2021-12-22: palindrome substring. Give you a string s, please count and return
KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Few-Shot NLP(KnowDA:用于 Few-Shot NLP 中数据增强的多合一知识混合模型)
CAD图在线Web测量工具代码实现(测量距离、面积、角度等)
Build DNS server in Intranet
Flink practical tutorial: advanced 4-window top n
ECS (no matter which one, as long as it is an ordinary ECS) uses the installed version of SketchUp to install an error
Unusual transaction code mebv of SAP mm preliminary level
How to make a label for an electric fan
Selenium批量查询运动员技术等级
高阶柱状图之极环图与极扇图
Explain the rainbow ingress universal domain name resolution mechanism
The use of go unsafe
Go bubbling, cocktail, quick, insert sort
How does the hybrid cloud realize the IP sec VPN cloud networking dedicated line to realize the interworking between the active and standby intranet?
How to open a stock account? What are the main considerations for opening an account? Is there a security risk in opening an account online?
University of North China, Berkeley University of California, etc. | Domain Adaptive Text Classification with structural Knowledge from unlabeled data
The most common usage scenarios for redis