Database: Data Field Changes Under High Concurrency
2022-06-09 18:27:00 【Linux server development】
1 Background
This situation comes up often: the business has been running steadily for some time and traffic has gradually grown, and now, for some reason (a feature adjustment or business expansion, say), the data tables need to change: adding fields or modifying the table structure. Many people would say alter table add column ... / alter table modify ... and consider the problem solved. That is actually risky for a complex table holding a large volume of data. Adjusting the table structure, creating or dropping indexes, or adding triggers can lock the table, and how long the lock lasts depends on the actual state of the table. I learned this lesson the hard way: the data scale was poorly estimated at a product's first launch, and as a result business data could not be written for a long time. So what are the ways to upgrade the business tables seamlessly, making the change transparent to users? Let's discuss them one by one.
2 Add an Associated Table
The simplest approach: store the new fields in a separate secondary table, linked by a foreign key to the primary key of the main table, thereby achieving dynamic extension. After the new feature goes live, the new data is written to the secondary table and the main table needs no change at all: transparent and lossless.
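For illustration, here is a minimal sketch of such a secondary table, assuming a main table t_user(id, ...) like the one used in the next section (the table, column, and constraint names here are illustrative, not from the original):

-- An extension table keyed to the main table's primary key.
CREATE TABLE `t_user_ext` (
  `user_id` bigint(20) NOT NULL,           -- same value as t_user.id
  `tel` varchar(20) DEFAULT NULL,          -- an example of a newly required field
  PRIMARY KEY (`user_id`),
  CONSTRAINT `fk_user_ext` FOREIGN KEY (`user_id`) REFERENCES `t_user` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Reads then require a join:
SELECT u.id, u.`name`, e.tel
FROM `t_user` u LEFT JOIN `t_user_ext` e ON e.user_id = u.id;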

The problems:
Reads require a join, which is inefficient; the more data there is and the more complex it becomes, the more pronounced this drawback.
It does not solve the problem completely: when yet more new fields arrive, we again face the choice of adding another table or modifying the original one, and even appending the later fields to the secondary table still risks locking it.
The secondary table only addresses adding new fields; it does nothing for field updates (renaming a field, changing its data type, and so on).
3 Add a Generic Extension Column
Suppose our original table structure is as shown below. To keep the business sustainable, fields will need to be extended in the future, so it is worth adding a general-purpose field that can grow and shrink on demand.

Taking MySQL as an example: version 5.7 introduced the JSON field type, which makes it convenient to store complex JSON object data.
use test;
DROP TABLE IF EXISTS `t_user`;
CREATE TABLE `t_user` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `address` varchar(255) DEFAULT NULL,
  `sex` int(11) DEFAULT '1',
  `ext_data` json DEFAULT NULL COMMENT 'json string',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of t_user
-- ----------------------------
INSERT INTO `t_user` VALUES ('1', 'brand', '21', 'fuzhou', '1', '{"tel": "13212345678", "name": "brand", "address": "fuzhou"}');

In this schema, ext_data uses the JSON data type as an extensible object carrier, storing supplementary information alongside the queried data. Along with the data type itself, MySQL also provides a very powerful set of JSON functions to operate on it.
SELECT id, `name`, age, address FROM `t_user` WHERE json_extract(ext_data, '$.tel') = '13212345678';

The result is as follows:

In an earlier MySQL series on this blog, a reader asked me to summarize the usage of MySQL JSON. I have not had time to do so; the documentation on the official website covers it quite clearly.
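For reference, a couple of other commonly used JSON operations, run against the t_user table above (illustrative queries, not from the original):

-- ->> extracts and unquotes a value, shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)):
SELECT id, ext_data->>'$.tel' AS tel FROM `t_user`;

-- JSON_SET adds or replaces a single attribute in place:
UPDATE `t_user` SET ext_data = JSON_SET(ext_data, '$.age', 22) WHERE id = 1;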
JSON structures are generally backward compatible, so when designing field extensions the usual advice is to add attributes and avoid deleting old ones. This has its own problem: the more complex the business becomes, the more complex the JSON grows, and the more redundant attributes accumulate. For example, suppose our JSON has three attributes: tel, name, address. Later the business changes, tel turns out to be useless, and an age attribute is needed. Should tel be deleted? A better way is to add a version attribute to the table: each phase of the business corresponds to one version, and each version corresponds to its own JSON structure.
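A minimal sketch of the version idea (the column name and version values are illustrative; note that if the version column was not part of the original design, adding it later runs into the very ALTER problem this article is about):

-- Tag each row's ext_data with the JSON structure version it follows.
ALTER TABLE `t_user` ADD COLUMN `ext_version` int(11) NOT NULL DEFAULT 1;

-- version 1 rows: {"tel": ..., "name": ..., "address": ...}
-- version 2 rows: {"name": ..., "address": ..., "age": ...}
INSERT INTO `t_user` (`name`, age, address, sex, ext_data, ext_version)
VALUES ('june', 25, 'xiamen', 0,
        '{"name": "june", "address": "xiamen", "age": 25}', 2);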

Advantages:
Attributes can be extended dynamically at any time.
Old and new data can coexist.
Data migration is convenient: write a program that converts an old-version ext into the new-version ext and updates its version accordingly.
Drawbacks:
Fields inside ext_data cannot be indexed directly.
The keys inside ext_data take up a lot of space; keep them short.
Aggregating over a field buried in the JSON is troublesome and inefficient.
Queries are comparatively inefficient and the operations are complicated.
Updating a single field inside the JSON is inefficient, so it is not suitable for data with complex business logic.
Statistics become complicated; data that needs to be reported on should not be stored in the JSON.
Improvement:
If the attributes inside ext need to be indexed, a NoSQL store (such as MongoDB) is probably a better fit.
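As an aside (our note, not from the original): staying within MySQL, 5.7's generated columns can also give an indirect index over a single JSON attribute, at the cost of one extra column per indexed attribute:

-- A virtual column materializes the JSON attribute, and the index is built on it.
ALTER TABLE `t_user`
  ADD COLUMN `tel` varchar(20)
    GENERATED ALWAYS AS (json_unquote(json_extract(ext_data, '$.tel'))) VIRTUAL,
  ADD INDEX `idx_tel` (`tel`);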

4 New Table + Data Migration
4.1 Data Migration with Triggers

The whole process is as follows (a SQL sketch of these steps appears after the list):
1. Create a new table t_user_v1 (id, name, age, address, sex, ext_column) containing the extended field ext_column.
2. Add triggers to the existing table, so that DML on the original table (mainly INSERT, UPDATE, DELETE) carries the data over into the new table t_user_v1 as it happens.
3. Migrate the existing data in the old table step by step until it is complete.
4. Drop the triggers and remove the original table (by default, drop it).
5. Rename the new table t_user_v1 to the original name t_user.

Following these steps, the data is migrated gradually into the new table, which then replaces the old one. The whole operation needs no maintenance window and does no harm to the business.
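A minimal sketch of these steps, assuming the t_user schema from section 3 (the trigger names, batch range, and type of ext_column are illustrative; here the old table is also kept as t_user_old via an atomic rename rather than dropped outright):

-- Step 1: the new table, with the extended field ext_column.
CREATE TABLE `t_user_v1` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `address` varchar(255) DEFAULT NULL,
  `sex` int(11) DEFAULT '1',
  `ext_column` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Step 2: mirror every DML change on the old table into the new one.
CREATE TRIGGER trg_user_ins AFTER INSERT ON t_user FOR EACH ROW
  REPLACE INTO t_user_v1 (id, `name`, age, address, sex)
  VALUES (NEW.id, NEW.name, NEW.age, NEW.address, NEW.sex);
CREATE TRIGGER trg_user_upd AFTER UPDATE ON t_user FOR EACH ROW
  REPLACE INTO t_user_v1 (id, `name`, age, address, sex)
  VALUES (NEW.id, NEW.name, NEW.age, NEW.address, NEW.sex);
CREATE TRIGGER trg_user_del AFTER DELETE ON t_user FOR EACH ROW
  DELETE FROM t_user_v1 WHERE id = OLD.id;

-- Step 3: backfill old rows in small batches, advancing the id range each run.
INSERT IGNORE INTO t_user_v1 (id, `name`, age, address, sex)
SELECT id, `name`, age, address, sex FROM t_user WHERE id BETWEEN 1 AND 10000;

-- Steps 4-5: once the backfill has caught up, drop the triggers and swap the tables.
DROP TRIGGER trg_user_ins;
DROP TRIGGER trg_user_upd;
DROP TRIGGER trg_user_del;
RENAME TABLE t_user TO t_user_old, t_user_v1 TO t_user;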
4.2 Using the Binlog for Data Migration
If the database is MySQL, you can subscribe to the binlog to drive the data migration instead. The effect is the same, and compared with triggers it is more stable (this triggerless, binlog-based approach is, for example, what GitHub's gh-ost uses).

4.3 Problems
The operation is cumbersome and inefficient.
There is an operational gap between migrating the data and switching over the tables; for tables under high concurrency and high-frequency writes this carries risk and can cause brief connection failures and data inconsistencies.
For large tables, synchronization takes a long time.
5 Reserved Fields
Reserve spare fields in the table up front, together with a mapping scheme that records what each reserved field in each table is actually used for.
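What this might look like (a sketch only; the spare-column names and the mapping table are illustrative, not from the original):

-- A table designed up front with spare columns of a few common types.
CREATE TABLE `t_user_r` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) NOT NULL,
  `reserved_str1` varchar(255) DEFAULT NULL,  -- unused until claimed
  `reserved_str2` varchar(255) DEFAULT NULL,
  `reserved_int1` int(11) DEFAULT NULL,
  `reserved_int2` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- A mapping table records the business meaning once a reserved column is claimed.
CREATE TABLE `t_field_mapping` (
  `table_name` varchar(64) NOT NULL,
  `column_name` varchar(64) NOT NULL,
  `meaning` varchar(255) NOT NULL,  -- e.g. 'age of the user'
  PRIMARY KEY (`table_name`, `column_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;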

5.1 Problems
As before, query efficiency is low.
The future is unknown by default: the preset fields may prove too few, or they may sit there as redundant space.
The redundant empty columns occupy storage and get in the way of performance improvements.
The approach is rather clumsy and does not fit the way programmers think.
6 Multi-Master Mode with Staged Updates
If business traffic is small, you can add or modify fields on the table directly; a short write lock is bearable. But in a high-concurrency, clustered, distributed system, the data layer should already be managed as master-slave replicas or sharded databases and tables. Below is the process of upgrading the table structure under a typical multi-master setup.

1. In the usual dual-master mode with master-master synchronization, data middleware such as DBproxy or Fabric can be used for load balancing, or you can define your own load policies, such as Range or Hash.
2. Change the configuration so that all traffic is routed to one instance, then upgrade the data tables on the other (for example, take DB1 out of rotation and serve from DB2 only; see the sketch after this list). Remember to do this during off-peak hours, so that heavy traffic does not overload the single remaining instance.
3. Repeat the operation the other way round, except that this time DB2 does not need to be upgraded by hand: with master-master synchronization, instance DB1 already carries the new table structure, and both the schema and the data changes will be propagated to DB2.
4. Once the two database instances are consistent, change the configuration again to spread the load back over both instances, returning to the previous state.
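A rough SQL-level sketch of steps 2 and 3 (the DDL and the check are illustrative; the traffic switch itself happens in the middleware configuration, which is not shown here):

-- With all traffic routed to DB2, run the DDL on DB1:
ALTER TABLE `t_user` ADD COLUMN `ext_column` varchar(255) DEFAULT NULL;

-- Before routing traffic back, confirm each instance has caught up with its peer:
SHOW SLAVE STATUS\G   -- Seconds_Behind_Master should be 0 on both sides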
Reference material
Original article: Database Series: Data Field Changes Under High Concurrency - Hello-Brand - 博客园 (cnblogs)