How does MySQL archive data?
2022-07-28 21:56:00 【JavaShark】
Archiving MySQL data usually involves two actions:
Transfer: migrate data from the business instance to the archive instance.
Delete: delete the migrated data from the business instance.
When this kind of requirement comes up, developers usually hand it over to the DBA and let the DBA deal with it.
So many developers are curious: how does the DBA actually perform the archiving? Will the table be locked if the archiving condition has no index? Is it safe? Could data end up deleted without having been archived successfully?
To answer these questions, this article introduces pt-archiver, a handy tool for archiving MySQL data.
1. What is pt-archiver
pt-archiver is one of the tools in Percona Toolkit.
Percona Toolkit is a MySQL toolkit provided by Percona.
It contains many practical MySQL management tools.
For example, the online schema change tool pt-online-schema-change and the master-slave consistency check tool pt-table-checksum that we use all the time.
It is no exaggeration to say that being proficient with Percona Toolkit is one of the essential skills of a MySQL DBA.
2. Installation
Percona Toolkit download address: https://www.percona.com/downloads/percona-toolkit/LATEST/

The official site provides ready-made packages for multiple systems.
What I use most often is the Linux - Generic binary package.
Let's take the Linux - Generic version as an example and see how to install it.
# cd /usr/local/
# wget https://downloads.percona.com/downloads/percona-toolkit/3.3.1/binary/tarball/percona-toolkit-3.3.1_x86_64.tar.gz --no-check-certificate
# tar xvf percona-toolkit-3.3.1_x86_64.tar.gz
# cd percona-toolkit-3.3.1
# yum install perl-ExtUtils-MakeMaker perl-DBD-MySQL perl-Digest-MD5
# perl Makefile.PL
# make
# make install
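After installation, you can quickly check that the tool is available; the version printed should match the tarball you downloaded:
# pt-archiver --version
# pt-archiver --help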
3. A simple example
First, let's look at a simple archiving demo.
The test data:
mysql> show create table employees.departments\G
*************************** 1. row ***************************
Table: departments
Create Table: CREATE TABLE `departments` (
`dept_no` char(4) NOT NULL,
`dept_name` varchar(40) NOT NULL,
PRIMARY KEY (`dept_no`),
UNIQUE KEY `dept_name` (`dept_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
mysql> select * from employees.departments;
+---------+--------------------+
| dept_no | dept_name          |
+---------+--------------------+
| d009    | Customer Service   |
| d005    | Development        |
| d002    | Finance            |
| d003    | Human Resources    |
| d001    | Marketing          |
| d004    | Production         |
| d006    | Quality Management |
| d008    | Research           |
| d007    | Sales              |
+---------+--------------------+
9 rows in set (0.00 sec)
Next, we archive the data of the employees.departments table from 192.168.244.10 to 192.168.244.128.
The command is as follows:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --dest h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --where "1=1"
Three parameters are specified on the command line.
--source: the DSN of the source (business) instance.
DSN is a common concept in Percona Toolkit; it can be understood as a set of abbreviated key=value pairs describing the instance to connect to.
The abbreviations used in this article and their meanings are:
h: host
P: port
u: user
p: password
D: default database
t: table
--dest: the DSN of the destination (archive) instance.
--where: the archiving condition. "1=1" means the whole table is archived; a more realistic, time-based condition is shown below.
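In practice the condition is usually based on time. A sketch (the create_time column and the cutoff value are hypothetical; substitute your own column and boundary):
--where "create_time < '2022-01-01'"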
4. How it works
Let's look at how pt-archiver works with the help of the General log output.
Source instance log
2022-03-06T10:58:20.612857+08:00 10 Query SELECT /*!40001 SQL_NO_CACHE */ `dept_no`,`dept_name` FROM `employees`.`departments` FORCE INDEX(`PRIMARY`) WHERE (1=1) ORDER BY `dept_no` LIMIT 1
2022-03-06T10:58:20.613451+08:00 10 Query DELETE FROM `employees`.`departments` WHERE (`dept_no` = 'd001')
2022-03-06T10:58:20.620327+08:00 10 Query commit
2022-03-06T10:58:20.628409+08:00 10 Query SELECT /*!40001 SQL_NO_CACHE */ `dept_no`,`dept_name` FROM `employees`.`departments` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND ((`dept_no` >= 'd001')) ORDER BY `dept_no` LIMIT 1
2022-03-06T10:58:20.629279+08:00 10 Query DELETE FROM `employees`.`departments` WHERE (`dept_no` = 'd002')
2022-03-06T10:58:20.636154+08:00 10 Query commit
...
Target instance log
2022-03-06T10:58:20.613144+08:00 18 Query INSERT INTO `employees`.`departments`(`dept_no`,`dept_name`) VALUES ('d001','Marketing')
2022-03-06T10:58:20.613813+08:00 18 Query commit
2022-03-06T10:58:20.628843+08:00 18 Query INSERT INTO `employees`.`departments`(`dept_no`,`dept_name`) VALUES ('d002','Finance')
2022-03-06T10:58:20.629784+08:00 18 Query commit
...
Combining the logs of the source and target instances, we can see the following:
1) pt-archiver first queries one record from the source instance, then inserts that record into the target instance.
Only after the insert on the target succeeds does it delete the record from the source.
This guarantees that a record is deleted only after it has been archived successfully.
2) Looking closely at the timestamps of these operations, the order is:
Query the record on the source.
Insert the record on the target.
Delete the record on the source.
COMMIT on the target.
COMMIT on the source.
This implementation borrows from the two-phase commit algorithm used in distributed transactions.
3) The "1=1" from the --where parameter is passed through to the SELECT.
"1=1" archives the whole table; other conditions can be specified as well, such as the time-based conditions we commonly use.
4) Every query goes through the primary key index, so even if the archiving condition has no index, no full table scan is produced.
5) Every delete is done by primary key, which avoids the risk of locking the whole table when the archiving condition is not indexed.
5. Batch archiving
Archiving with the parameters from the demo above is very inefficient when the data volume is large; after all, COMMIT is an expensive operation.
So in production we usually archive in batches.
The command is as follows:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --dest h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --where "1=1" --bulk-delete --limit 1000 --commit-each --bulk-insert
Compared with the previous archiving command, this one specifies four extra parameters:
--bulk-delete: delete in batches.
--limit: number of records archived per batch.
--commit-each: COMMIT only once per batch of records.
--bulk-insert: import the archived data into the archive instance with LOAD DATA LOCAL INFILE.
Let's look at the General log produced by the command above.
Source instance
2022-03-06T12:13:56.117984+08:00 53 Query SELECT /*!40001 SQL_NO_CACHE */ `dept_no`,`dept_name` FROM `employees`.`departments` FORCE INDEX(`PRIMARY`) WHERE (1=1) ORDER BY `dept_no` LIMIT 1000
...
2022-03-06T12:13:56.125129+08:00 53 Query DELETE FROM `employees`.`departments` WHERE (((`dept_no` >= 'd001'))) AND (((`dept_no` <= 'd009'))) AND (1=1) LIMIT 1000
2022-03-06T12:13:56.130055+08:00 53 Query commit
Target instance
2022-03-06T12:13:56.124596+08:00 51 Query LOAD DATA LOCAL INFILE '/tmp/hitKctpQTipt-archiver' INTO TABLE `employees`.`departments`(`dept_no`,`dept_name`)
2022-03-06T12:13:56.125616+08:00 51 Query commit
Note:
1) To execute LOAD DATA LOCAL INFILE, the local_infile parameter on the target instance must be set to ON (see the example after this list).
2) If --bulk-insert is not specified and --commit-each is not specified either, the target instance is still inserted into and committed row by row, just like in the demo.
3) If --commit-each is not specified, then even though the 9 records in the table are deleted by a single DELETE statement, pt-archiver still executes COMMIT 9 times because 9 records are involved. The same applies to the target instance.
4) When archiving with --bulk-insert, be careful: if something goes wrong during the import, such as a primary key conflict, pt-archiver will not report an error.
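For note 1), a quick way to check and enable local_infile on the target instance (a sketch; changing the global variable requires a privilege such as SUPER or SYSTEM_VARIABLES_ADMIN):
mysql> SHOW GLOBAL VARIABLES LIKE 'local_infile';
mysql> SET GLOBAL local_infile = ON;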
6. Speed comparison of different archiving parameters
The following table compares the execution time of different parameter combinations when archiving 200,000 rows.

From the data in the table, we can draw the following conclusions:
1) The first approach is the slowest.
In that case, both the source instance and the archive instance operate and commit row by row.
2) Specifying only --bulk-delete --limit 1000 is still very slow.
In that case, the source instance deletes in batches, but the number of COMMITs does not decrease.
The archive instance still inserts and commits row by row.
3) --bulk-delete --limit 1000 --commit-each
On top of the second approach, both the source and target instances now commit in batches.
4) --limit 1000 and --limit 5000 give similar archiving performance.
5) Comparing --bulk-delete --limit 1000 --bulk-insert with --bulk-delete --limit 1000 --commit-each --bulk-insert, the former does not set --commit-each.
Although both are batch operations, the former still executes the COMMIT operation 1000 times.
From this we can see that even empty transactions are not free.
7. Other common usages
1. Deleting data
Deleting data is another common usage scenario for pt-archiver.
The command is as follows:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --where "1=1" --bulk-delete --limit 1000 --commit-each --purge --primary-key-only
--purge on the command line means delete only, without archiving.
--primary-key-only is also specified, so the SELECT queries only the primary key instead of all columns.
Next, let's look at the General log related to this delete command.
To show pt-archiver's delete logic more intuitively, --limit was set to 3 in the actual test.
# Disable autocommit (open a transaction)
set autocommit=0;
# Inspect the table structure to get the primary key
SHOW CREATE TABLE `employees`.`departments`;
# Start deleting the first batch of data
# FORCE INDEX(`PRIMARY`) forces the use of the primary key
# --primary-key-only is specified, so only the primary key column is queried
# There is actually no need to fetch all qualifying primary key values; only the minimum and maximum are needed.
SELECT /*!40001 SQL_NO_CACHE */ `dept_no` FROM `employees`.`departments` FORCE INDEX(`PRIMARY`) WHERE (1=1) ORDER BY `dept_no` LIMIT 3;
# Delete by primary key range; the --where condition is also included to avoid deleting rows by mistake
DELETE FROM `employees`.`departments` WHERE (((`dept_no` >= 'd001'))) AND (((`dept_no` <= 'd003'))) AND (1=1) LIMIT 3;
# Commit
commit;
# Delete the second batch of data
SELECT /*!40001 SQL_NO_CACHE */ `dept_no` FROM `employees`.`departments` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND ((`dept_no` >= 'd003')) ORDER BY `dept_no` LIMIT 3;
DELETE FROM `employees`.`departments` WHERE (((`dept_no` >= 'd004'))) AND (((`dept_no` <= 'd006'))) AND (1=1) LIMIT 3;
commit;
# Delete the third batch of data
SELECT /*!40001 SQL_NO_CACHE */ `dept_no` FROM `employees`.`departments` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND ((`dept_no` >= 'd006')) ORDER BY `dept_no` LIMIT 3;
DELETE FROM `employees`.`departments` WHERE (((`dept_no` >= 'd007'))) AND (((`dept_no` <= 'd009'))) AND (1=1) LIMIT 3;
commit;
# Delete the last batch of data
SELECT /*!40001 SQL_NO_CACHE */ `dept_no` FROM `employees`.`departments` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND ((`dept_no` >= 'd009')) ORDER BY `dept_no` LIMIT 3;
commit;
If we have similar deletion requirements in our business code, we might as well borrow pt-archiver's approach.
2. Archiving data to a file
Data can be archived to a database, and it can also be archived to a file.
The command is as follows:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --where "1=1" --bulk-delete --limit 1000 --file '/tmp/%Y-%m-%d-%D.%t'
Here --file is specified instead of --dest.
The file name can use date formatting specifiers; the supported specifiers and their meanings are as follows:
%d Day of the month, numeric (01..31)
%H Hour (00..23)
%i Minutes, numeric (00..59)
%m Month, numeric (01..12)
%s Seconds (00..59)
%Y Year, numeric, four digits
%D Database name
%t Table name
The generated file is a delimited text file that can later be loaded back into the database with the LOAD DATA INFILE command, as sketched below.
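A sketch of loading such a file back (the file name is what the pattern above would produce for the test date; check the actual field and line terminators in your file before running, since the statement below relies on the defaults):
LOAD DATA LOCAL INFILE '/tmp/2022-03-06-employees.departments'
INTO TABLE `employees`.`departments` (`dept_no`,`dept_name`);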
8. How to avoid master-slave delay
Whether we are archiving or deleting, DELETE operations have to be executed on the source instance.
Many people worry that deleting too many records will cause master-slave delay.
In fact, pt-archiver can automatically throttle the archiving (deleting) operation based on master-slave delay.
If the slave's delay exceeds 1s (the value of --max-lag) or its replication status is abnormal, the archiving (deleting) operation is paused until the slave recovers.
By default, pt-archiver does not check slave delay.
To check it, the slave's address must be set explicitly with --check-slave-lag, for example:
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --where "1=1" --bulk-delete --limit 1000 --commit-each --primary-key-only --purge --check-slave-lag h=192.168.244.20,P=3306,u=pt_user,p=pt_pass
Here only the delay of 192.168.244.20 is checked.
If there are multiple slaves to check, --check-slave-lag must be specified multiple times, once per slave, as shown below.
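For example (a sketch; 192.168.244.21 stands in for a hypothetical second slave):
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --where "1=1" --bulk-delete --limit 1000 --commit-each --purge --check-slave-lag h=192.168.244.20,P=3306,u=pt_user,p=pt_pass --check-slave-lag h=192.168.244.21,P=3306,u=pt_user,p=pt_pass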
9. Common parameters
--analyze
Execute ANALYZE TABLE after the archiving operation.
It can be followed by an arbitrary string: if the string contains s, ANALYZE is executed on the source instance.
If the string contains d, ANALYZE is executed on the target instance.
If it contains both d and s, ANALYZE is executed on both the source and the target instance. For example:
--analyze ds
--optimize
Execute OPTIMIZE TABLE after the archiving operation.
Usage is the same as --analyze.
--charset
Specify the character set of the connection.
Before MySQL 8.0, the default is latin1.
In MySQL 8.0, the default is utf8mb4.
Note that this default has nothing to do with the server character set character_set_server.
If this value is set explicitly, pt-archiver executes SET NAMES 'charset_name' right after the connection is established.
--[no]check-charset
Check whether the character set of the source (target) connection is consistent with the character set of the table.
If they are inconsistent, the following error is reported:
Character set mismatch: --source DSN uses latin1, table uses gbk. You can disable this check by specifying --no-check-charset.
When this happens, do not follow the hint and specify --no-check-charset to skip the check; doing so can easily produce garbled data.
For the error above, set --charset to the table's character set, as shown below.
Note that this option does not compare whether the character sets of the source and target instances are consistent.
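A sketch of rerunning with the table's character set (assuming the table really is gbk, as the error message reports):
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --dest h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --where "1=1" --charset gbk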
--[no]check-columns
Check whether the column names of the source table and the target table are consistent.
Note that only the column names are checked; column order and column data types are not.
--columns
Archive only the specified columns.
One common case involves auto-increment columns: if the auto-increment values of the source table and the target table overlap, the auto-increment column should not be archived, and --columns must be used to explicitly specify the columns to archive.
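For example, to skip a hypothetical auto-increment column id and archive only the remaining columns (a sketch; --columns takes a comma-separated list of column names, and order_no, amount, create_time are illustrative):
--columns order_no,amount,create_time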
--dry-run
Only print the SQL that would be executed, without actually executing it.
Often used before the real run, to verify that the SQL to be executed is what you expect.
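For example, a dry run of the batch archiving command from earlier (a sketch; it only prints statements and touches no data):
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --dest h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --where "1=1" --bulk-delete --limit 1000 --commit-each --bulk-insert --dry-run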
--ignore
Archive data using INSERT IGNORE.
--no-delete
Do not delete the data from the source instance.
--replace
Archive data using REPLACE.
--[no]safe-auto-increment
When archiving a table with an auto-increment primary key, the row with the largest auto-increment value is not deleted by default.
This is mainly to work around the fact that, before MySQL 8.0, the auto-increment counter was not persisted.
Keep this in mind when archiving a whole table.
If that row also needs to be deleted, specify --no-safe-auto-increment.
--source
Specifies the connection information of the source instance.
Besides the common DSN options, it also supports the following:
a: specify the default database for the connection.
b: set SQL_LOG_BIN=0.
If it is specified on the source, DELETE operations are not written to the binlog.
If it is specified on the target, INSERT operations are not written to the binlog.
i: specify the index used by the archiving operation; the default is the primary key.
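For example (a sketch; I am assuming b takes a truthy value such as 1, and i=PRIMARY simply names the primary key index explicitly):
--source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments,b=1,i=PRIMARY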
--progress
Print progress information; the value is the number of rows between progress lines.
For example, with --progress 10000, a progress line is printed every 10,000 archived (deleted) rows.
TIME                ELAPSED   COUNT
2022-03-06T18:24:19       0       0
2022-03-06T18:24:20       0   10000
2022-03-06T18:24:21       1   20000
The first column is the current time, the second column is the elapsed seconds, and the third column is the number of archived (deleted) rows.
10. Summary
Earlier, we compared the execution time of the archiving operation with different parameters.
Among them, --bulk-delete --limit 1000 --commit-each --bulk-insert is the fastest, and not specifying any batch parameters at all is the slowest.
But when using --bulk-insert, be careful: if something goes wrong during the import, pt-archiver will not report an error.
Common problems are primary key conflicts and data that does not match the data type of the target column.
If --bulk-insert is not used and data is archived with the default INSERT operation, most errors can be detected.
For example, a primary key conflict produces the following error:
DBD::mysql::st execute failed: Duplicate entry 'd001' for key 'PRIMARY' [for Statement "INSERT INTO `employees`.`departments`(`dept_no`,`dept_name`) VALUES (?,?)" with ParamValues: 0='d001', 1='Marketing'] at /usr/local/bin/pt-archiver line 6772.
Data that does not match the target column's data type produces the following error:
DBD::mysql::st execute failed: Incorrect integer value: 'Marketing' for column 'dept_name' at row 1 [for Statement "INSERT INTO `employees`.`departments`(`dept_no`,`dept_name`) VALUES (?,?)" with ParamValues: 0='d001', 1='Marketing'] at /usr/local/bin/pt-archiver line 6772.
Of course, the precondition for detecting mismatched data and types is that the archive instance's SQL_MODE is a strict mode.
If the instance being archived is MySQL 5.6, it is actually hard to enable strict SQL_MODE on the archive instance.
Because the default SQL_MODE of MySQL 5.6 is non-strict, a lot of invalid data is inevitably produced, such as 0000-00-00 00:00:00 in time fields.
Inserting such invalid data into an archive instance with strict mode enabled will simply fail.
From the perspective of data safety, the most recommended archiving approach is:
1) Archive first, but do not delete the data from the source instance.
2) Compare whether the data in the source instance and the archive instance are consistent.
3) If they are consistent, delete the archived data from the source instance.
Steps 1 and 3 can be done with pt-archiver, and step 2 with pt-table-sync.
Compared with archiving and deleting in a single pass, this approach is more cumbersome, but it is relatively safer. A sketch of the three steps is shown below.
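The sketch reuses the instances from the earlier examples; the --where condition is illustrative and the options should be adapted to your tables rather than copied verbatim:
# Step 1: archive without deleting
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --dest h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --where "1=1" --limit 1000 --commit-each --no-delete
# Step 2: compare the source and the archive; empty output means the data is consistent
pt-table-sync --print h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments h=192.168.244.128,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments
# Step 3: delete the already-archived rows from the source
pt-archiver --source h=192.168.244.10,P=3306,u=pt_user,p=pt_pass,D=employees,t=departments --where "1=1" --bulk-delete --limit 1000 --commit-each --purge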