The recommended maximum for a single MySQL table is 20 million rows (2000w). Is it reliable?
2022-07-28 09:32:00 【JD technology developer】
1 Background
As a veteran with years in back-end development, you have surely heard sayings like "a single MySQL table should not exceed 20 million rows", "once a table passes 20 million rows it is time to consider data migration", and "your table is about to hit 20 million rows, no wonder the queries are slow".
These well-known folk rules are like the group-chat rule "this group only discusses technology, no speeding, never go over 120 km/h or you will be kicked automatically": everyone has heard them, but nobody has actually tried.
So let's floor the accelerator, push it to 180, and see what happens.
2 Experiment
Let's run an experiment and take a look.
Create a table:
CREATE TABLE person(
  id int NOT NULL AUTO_INCREMENT PRIMARY KEY comment 'Primary key',
  person_id tinyint not null comment 'User id',
  person_name VARCHAR(200) comment 'User name',
  gmt_create datetime comment 'Creation time',
  gmt_modified datetime comment 'Modification time'
) comment 'Personnel information table';
Insert one row:
insert into person values(1, 1, 'user_1', NOW(), now());
Use a MySQL user variable to emulate a rownum pseudo-column, then set its starting point to 1:
select (@i:=@i+1) as rownum, person_name from person, (select @i:=100) as init;
set @i=1;
Run the following SQL repeatedly. Each execution doubles the row count, so 20 executions give 2^20, about 1 million rows, and 23 executions give 2^23, about 8 million rows. This is how tens of millions of test rows can be inserted. If you want a smaller increase rather than a full doubling, there is a trick: add a WHERE condition to the SELECT, such as id > some value, to control how many rows are added.
insert into person(id, person_id, person_name, gmt_create, gmt_modified)
select @i:=@i+1,
       left(rand()*10,10) as person_id,
       concat('user_',@i%2048),
       date_add(gmt_create, interval + @i*cast(rand()*100 as signed) SECOND),
       date_add(date_add(gmt_modified, interval + @i*cast(rand()*100 as signed) SECOND), interval + cast(rand()*1000000 as signed) SECOND)
from person;
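As a quick check of the doubling arithmetic above, here is a minimal Python sketch. It assumes the table starts with a single row and that every execution exactly doubles the row count; the function names are illustrative and not part of the original experiment.

```python
def rows_after(executions: int, initial_rows: int = 1) -> int:
    """Row count after running the doubling INSERT ... SELECT `executions` times."""
    return initial_rows * 2 ** executions


def executions_needed(target_rows: int) -> int:
    """Smallest number of doublings that reaches `target_rows`, starting from 1 row."""
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 1, with no float rounding.
    return (target_rows - 1).bit_length()


if __name__ == "__main__":
    for n in (20, 23):
        print(f"{n} executions -> {rows_after(n):,} rows")
    print("executions to reach 20 million rows:", executions_needed(20_000_000))
```

Under these assumptions, reaching 20 million rows takes 25 executions, and the last one overshoots to roughly 33.5 million, which is where the WHERE-condition trick becomes useful.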
Note that when you get close to 8 million or 10 million rows, you may hit the error "The total number of locks exceeds the lock table size". InnoDB keeps its lock table in the buffer pool, so this error means the buffer pool is too small for that many row locks. Enlarging the memory settings fixes it.
SET GLOBAL tmp_table_size = 512*1024*1024; -- 512M
SET GLOBAL innodb_buffer_pool_size = 1*1024*1024*1024; -- 1G
Let's first look at a set of test data. It was collected on MySQL 8.0 on my own computer, which was also running IDEA, a browser and other tools at the time. Neither the machine nor the database was specially configured, so treat the numbers as a reference only.


This data does seem to match the title: once the table passes 20 million rows, query time rises sharply. Is this an iron rule?
Let's look at where the recommended value of 20 million comes from.
3 Single-table row limit
First, let's think about the maximum number of rows a single table can hold.
CREATE TABLE person(
  id int(10) NOT NULL AUTO_INCREMENT PRIMARY KEY comment 'Primary key',
  person_id tinyint not null comment 'User id',
  person_name VARCHAR(200) comment 'User name',
  gmt_create datetime comment 'Creation time',
  gmt_modified datetime comment 'Modification time'
) comment 'Personnel information table';
Look at the table definition above. id is the primary key and is unique by definition, so the range of the primary key type caps the number of rows in the table. If the primary key is declared as int, it is 32 bits wide: a signed int supports up to 2^31-1, about 2.1 billion, and an unsigned int up to 2^32-1, about 4.29 billion. With bigint the limit is 2^63-1 (9223372036854775807) signed, or 2^64-1 unsigned. It is hard to picture how large that is. In practice the disk will be full long before this limit is reached.
Someone worked it out: if the auto-increment column of a table is an unsigned bigint, its maximum value is 18446744073709551615. Adding one record per second, how long would it take to use it up?
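The question above is easy to check with integer arithmetic. The sketch below assumes a 365-day year and a constant insert rate; the constant and function names are illustrative.

```python
SIGNED_INT_MAX = 2**31 - 1        # MySQL INT
UNSIGNED_BIGINT_MAX = 2**64 - 1   # MySQL BIGINT UNSIGNED
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: ignore leap years


def years_to_exhaust(max_id: int, inserts_per_second: int = 1) -> int:
    """Whole years until an auto-increment column runs out of values."""
    return max_id // (inserts_per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print("signed int:      ", years_to_exhaust(SIGNED_INT_MAX), "years")
    print("unsigned bigint: ", f"{years_to_exhaust(UNSIGNED_BIGINT_MAX):,}", "years")
```

At one insert per second, a signed int lasts about 68 years, while an unsigned bigint lasts on the order of 585 billion years, so the key type is not the practical limit for bigint tables.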

4 Tablespace
Now let's look at the structure of the index. Everything that follows is based on the InnoDB engine, and as everyone knows, InnoDB's index is a B+ tree.

The data of this table is stored on disk in a similar way. It lives in a file called person.ibd (InnoDB data), also known as the tablespace. Although the rows look as if they are linked one after another, inside the file they are divided into many small data pages, each 16 KB in size. Of course this is only our abstraction. A tablespace also contains segments, extents, groups and many other concepts, but we will step back from those here.

5 The data structure of a page
Each page is only 16 KB, so when there is a lot of data it cannot all fit in one page and spills into other pages. To link these pages together, each page records the addresses of the previous and next pages, which makes the neighbouring pages easy to find. Every page is also unique, so it needs a unique identifier: the page number. Pages are read and written, and an interruption or other exception during a read or write can leave the data incomplete, so a verification mechanism is needed and each page carries a checksum. For reads, efficiency matters most. Scanning record by record would be very slow, so a page directory (Page Directory) is built for the data in the page. The actual internal structure of a page therefore looks like the following.

As the diagram shows, the storage space of an InnoDB data page is roughly divided into 7 parts. Some parts occupy a fixed number of bytes and some occupy a variable number.
Of these 7 parts, the records we store ourselves go into the User Records part, laid out according to the row format we specify.
When a page is first created, however, it has no User Records part at all. Each time we insert a record, space the size of that record is taken from the Free Space part, that is, from the unused storage space, and assigned to User Records. Once Free Space has been entirely taken over by User Records, the page is used up, and a new page must be requested for any further inserts. The process is illustrated as follows.
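The way Free Space is gradually handed over to User Records can be sketched as a toy model. This is a simplification under stated assumptions, not InnoDB's real allocator: the fixed overhead is rounded to 1 KB, every record has the same size, and the `Page` class and `insert_all` function are hypothetical names introduced only for illustration.

```python
PAGE_SIZE = 16 * 1024    # 16 KB page, as described in the article
FIXED_OVERHEAD = 1024    # assumption: headers, trailer and directory take about 1 KB


class Page:
    """Toy model of a data page: records consume Free Space until none is left."""

    def __init__(self, page_no: int) -> None:
        self.page_no = page_no
        self.free_space = PAGE_SIZE - FIXED_OVERHEAD
        self.user_records: list[int] = []

    def try_insert(self, record_id: int, record_size: int) -> bool:
        """Move `record_size` bytes from Free Space to User Records if they fit."""
        if record_size > self.free_space:
            return False
        self.free_space -= record_size
        self.user_records.append(record_id)
        return True


def insert_all(record_ids, record_size: int) -> list[Page]:
    """Insert records in order, requesting a new page whenever the current one is full."""
    if record_size > PAGE_SIZE - FIXED_OVERHEAD:
        raise ValueError("record does not fit in an empty page in this toy model")
    pages = [Page(0)]
    for record_id in record_ids:
        if not pages[-1].try_insert(record_id, record_size):
            pages.append(Page(len(pages)))
            pages[-1].try_insert(record_id, record_size)
    return pages


if __name__ == "__main__":
    for page in insert_all(range(1, 41), record_size=1024):
        print(page.page_no, len(page.user_records), page.free_space)
```

With 1 KB records, 40 inserts fill two pages with 15 records each and leave a third page holding 10 records with 5 KB of Free Space remaining.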

That covers how data is added.
Now for the lookup process. Suppose we need to find one record. We could load every page of the tablespace into memory and check record by record whether it is the one we want. With little data that works, and memory can cope. Reality is rarely that kind, though. To solve this problem, MySQL has indexes. Everyone knows an index speeds up queries, but how exactly? Let's take a look.
6 The data structure of the index
In MySQL, the data structure of an index page is almost the same as that of the page just described, and its size is also 16 KB. What an index page records, however, is the minimum primary key id and the page number of each page it points to (a data page or another index page). Index pages also carry level information, counted upward from 0, so pages form a hierarchy.

Does this picture look familiar, a bit like a binary tree? That's right, it is a tree. We have drawn only three nodes in 2 levels here. With more data it can grow to 3 levels. This is the B+ tree we so often talk about. The bottom level, page level = 0, holds the leaf nodes, and all the others are non-leaf nodes.

Look at the figure and take a single node. Start with a non-leaf node (index page): its content area has two parts, the id and the page number (address). The id is the smallest id recorded in the corresponding page, and the page number is a pointer to that page. Data pages are almost the same, except that a data page stores the actual row data rather than page addresses, and its ids are also in order.
7 Recommended value for a single table
Let's use the 3-level tree above with a fan-out of 2 (in reality the fan-out is M) to walk through finding one row.
Say we need the row with id=6. Non-leaf nodes store page numbers together with the smallest id in each page, so we start from the top. In the directory of page 10 we find [id=1, page number=20] and [id=5, page number=30], meaning the smallest id under the left node is 1 and the smallest id under the right node is 5. Since 6>5, binary search tells us to continue towards the right node. Page 30 turns out to be a non-leaf node with children of its own, so we keep comparing in the same way: 6>5 && 6<7 leads to page 60. Page 60 is a leaf node (data node), so we load its data into memory, compare the rows one by one, and find the row with id=6.
In this process, looking up id=6 read three pages in total. If none of the three pages had been loaded into memory beforehand, that is up to three disk I/Os.
Note that the page numbers in the figure are only an example. In reality they are not consecutive, and pages are not necessarily stored in order on disk.
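The walk described above can be reproduced with a small in-memory model. Page numbers 10, 20, 30 and 60 and the ids 1, 5 and 7 come from the text. The remaining pages (40, 50, 70), the row payloads and the names `PAGES` and `find_row` are hypothetical, filled in only so the sketch is complete.

```python
from bisect import bisect_right

# page_no -> (level, entries)
# Non-leaf entries are (smallest id in child, child page number).
# Leaf entries (level 0) are (id, row data).
PAGES = {
    10: (2, [(1, 20), (5, 30)]),
    20: (1, [(1, 40), (3, 50)]),   # hypothetical children
    30: (1, [(5, 60), (7, 70)]),
    40: (0, [(1, "row-1"), (2, "row-2")]),
    50: (0, [(3, "row-3"), (4, "row-4")]),
    60: (0, [(5, "row-5"), (6, "row-6")]),
    70: (0, [(7, "row-7"), (8, "row-8")]),
}


def find_row(root_page_no: int, target_id: int):
    """Return (row or None, list of page numbers read) for a primary key lookup."""
    visited = []
    page_no = root_page_no
    while True:
        level, entries = PAGES[page_no]
        visited.append(page_no)  # each page read may cost one disk I/O
        if level == 0:
            # Leaf page: compare the rows one by one.
            for row_id, row in entries:
                if row_id == target_id:
                    return row, visited
            return None, visited
        # Non-leaf page: binary search for the last entry whose id <= target_id.
        keys = [key for key, _ in entries]
        idx = bisect_right(keys, target_id) - 1
        if idx < 0:
            return None, visited
        page_no = entries[idx][1]


if __name__ == "__main__":
    print(find_row(10, 6))
```

Looking up id=6 reads pages 10, 30 and 60, which matches the three page reads counted in the text. The number of pages read always equals the height of the tree, which is why the height matters so much for query cost.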

By now we have a rough idea of the table's data structure and of how a query finds its data, so we can estimate how much data such a structure can hold.
From the diagram above we know that in a B+ tree only the leaf nodes hold row data, while non-leaf nodes hold index data.
So for any 16 KB page, every entry in a non-leaf node points to a new page, and that new page is one of two kinds:
- If it is a leaf node, it contains rows of data
- If it is a non-leaf node, it points on to further pages
Assume
- The number of pointers to other pages in a non-leaf node is x
- The number of data rows a leaf node can hold is y
- The number of levels of the B+ tree is z
This gives:
Total = x^(z-1) * y, that is, the total equals x to the power of z-1, multiplied by y.

x = ?
The structure of a page was introduced earlier in the article, and index pages are no exception. They contain the File Header (38 bytes), Page Header (56 bytes), Infimum + Supremum (26 bytes) and File Trailer (8 bytes), plus the page directory, about 1 KB in total. Let's treat it as 1 KB. The whole page is 16 KB, which leaves 15 KB for data. An index page mainly records primary keys and page numbers. Assume the primary key is a bigint (8 bytes), and the page number is fixed at 4 bytes, so one entry in an index page takes 12 bytes. Therefore x = 15*1024/12 = 1280 entries.
Leaf nodes have the same structure as non-leaf nodes, so they also have 15 KB for data. But a leaf node stores real row data, and many more factors come into play here, such as the types and the number of columns. The more space each row takes, the fewer rows fit in a page. For now, assume one row takes 1 KB, so a page can hold 15 rows, y ≈ 15.
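The two estimates above are simple integer divisions. A minimal sketch, assuming the same 1 KB of page overhead as the text; the function and parameter names are illustrative:

```python
PAGE_SIZE = 16 * 1024
PAGE_OVERHEAD = 1024                     # assumption from the text: about 1 KB of fixed parts
USABLE_BYTES = PAGE_SIZE - PAGE_OVERHEAD  # 15 KB left for entries or rows


def index_entries_per_page(key_bytes: int = 8, pointer_bytes: int = 4) -> int:
    """x: how many (primary key, page number) entries fit in one index page."""
    return USABLE_BYTES // (key_bytes + pointer_bytes)


def rows_per_leaf_page(row_bytes: int) -> int:
    """y: how many rows of `row_bytes` bytes fit in one leaf page."""
    return USABLE_BYTES // row_bytes


if __name__ == "__main__":
    print("x with bigint key:", index_entries_per_page())            # 1280
    print("x with int key:   ", index_entries_per_page(key_bytes=4))  # 1920
    print("y with 1 KB rows: ", rows_per_leaf_page(1024))             # 15
    print("y with 5 KB rows: ", rows_per_leaf_page(5 * 1024))         # 3
```

A narrower key, such as a 4-byte int, raises the fan-out x, while wider rows lower y. Both shift the row count at which the tree needs another level.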
At this point, do you already have a rough number in mind?
According to the formula above, Total = x^(z-1) * y, with x = 1280 and y = 15:
If the B+ tree has two levels, z = 2, Total = (1280^1) * 15 = 19200
If the B+ tree has three levels, z = 3, Total = (1280^2) * 15 = 24576000 (about 24.57 million)
Well, well! Isn't that exactly the recommended maximum of 20 million rows from the beginning of the article? Right, a B+ tree generally has at most 3 levels. Think about it: with 4 levels, not only does the number of disk I/Os per query increase, but Total would be more than 30 billion, which is not very realistic. So 3 levels is a reasonable value.
But that is not the whole story.
The value of y above assumed rows of 1 KB. Suppose a row in my business table takes 5 KB rather than 1 KB. Then a single data page can hold at most 3 rows.
Still calculating with z = 3, Total = (1280^2) * 3 = 4915200 (close to 5 million)
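The capacity figures above all come from the same formula, Total = x^(z-1) * y. A short sketch that tabulates it for the values used in the text; `max_rows` is an illustrative name:

```python
def max_rows(x: int, y: int, z: int) -> int:
    """Rows held by a full B+ tree with fan-out x, y rows per leaf page and z levels."""
    return x ** (z - 1) * y


if __name__ == "__main__":
    x = 1280  # index entries per page (bigint key, 15 KB usable)
    for label, y in (("1 KB rows", 15), ("5 KB rows", 3)):
        for z in (2, 3, 4):
            print(f"{label}, {z} levels: {max_rows(x, y, z):,} rows")
```

With 1 KB rows the three-level capacity is 24,576,000 rows, and with 5 KB rows it drops to 4,915,200. The "20 million" figure therefore depends on the row size, not on MySQL itself.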
So at the same tree height (and thus similar query performance), different row sizes give different recommended maximums. Many other factors also affect query performance, such as the database version, the server configuration and how the SQL is written. To improve performance, MySQL loads the table's index into memory. When the InnoDB buffer pool is large enough, the index can be loaded in full and queries run without problems. Once a single table grows past a certain size, however, memory can no longer hold its index, subsequent SQL queries incur disk I/O, and performance drops. Upgrading the hardware (for example, adding enough memory that it effectively stands in for the disk) can therefore bring an immediate performance improvement.
8 Summary
- MySQL table data is stored in pages, and pages are not necessarily contiguous on disk.
- A page is 16 KB, and not all of it is used for data. Some of it holds fixed information such as the page header, page trailer, page number and checksum.
- In a B+ tree, leaf nodes and non-leaf nodes share the same data structure. The difference is that leaf nodes store the actual row data, while non-leaf nodes store primary keys and page numbers.
- The index structure does not limit the maximum number of rows in a single table. 20 million is only a recommended value. Exceeding it may make the B+ tree deeper, which affects query performance.