Mechanical hard disk random IO is slower than you think
2020-11-08 16:12:00 【Zhang Yanfei Allen】
We all know that random IO on hard disks is very slow, but how much slower is it than sequential IO? You may never have seen a direct numerical comparison. Today I will run real stress tests to compare a disk's sequential and random IO performance in different scenarios. With this experimental data, you will understand more deeply why database transactions are implemented with logs, and why indexes use B+ trees with larger nodes.
For any storage system, performance comes down to bandwidth, latency, and IOPS. The disks in my test machine are seven 300G, 10,000 RPM mechanical drives configured as a RAID5 array. The benchmarking tool is fio. During the tests, we keep several parameters fixed:
- For the IO engine we choose libaio
- To keep the operating system's PageCache from interfering with the results, we use the direct parameter to bypass it
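- Set unified_rw_reporting, which controls how read and write results are reported; note that fio's documentation describes enabling it as summing reads and writes into a single mixed figure rather than listing them separately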
- To keep the test reasonably accurate, we set runtime to 300s
- Because the server is sensitive, the test target is a file rather than a raw device, so there is a small amount of file system overhead
- The test file size is set to 100G. My RAID card cache is 1G, and the goal is to keep too many requests from hitting it
- For the scheduling policy we choose the commonly used noop
- Enable refill_buffers, so that the test data is regenerated on every I/O submission, ensuring randomness
- Following the RAID configuration recommendations, turn off the disk cache
Then we vary the remaining parameters and run a series of comparison tests:
- For the read/write mode, test sequential reads and random reads separately
- For the IO unit size, use whole multiples of the sector size: 512, 1K, 2K, ...
- For the RAID card read-ahead policy, test NORA (read-ahead off) and RA (read-ahead on) separately
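As a rough illustration of the test matrix described above, the following Python sketch builds one fio command line per combination of read mode and IO size. The file path and the upper bound on IO size are placeholders, not values from the original test. The noop scheduler, the RAID read-ahead policy, and the disk cache are configured outside fio and are not covered here.

```python
from itertools import product

# Fixed options taken from the parameter list above.
FIXED_ARGS = [
    "--ioengine=libaio",
    "--direct=1",
    "--unified_rw_reporting=1",
    "--runtime=300",
    "--size=100G",
    "--refill_buffers",
]


def block_sizes(smallest=512, largest=256 * 1024):
    """Yield IO sizes in bytes, doubling from `smallest` up to `largest`."""
    size = smallest
    while size <= largest:
        yield size
        size *= 2


def fio_commands(filename="/path/to/testfile", largest=256 * 1024):
    """Build one fio argument list per (rw mode, block size) pair.

    `filename` and `largest` are hypothetical defaults for illustration.
    """
    commands = []
    for rw, bs in product(["read", "randread"], block_sizes(largest=largest)):
        commands.append(
            [
                "fio",
                f"--name={rw}-{bs}",
                f"--filename={filename}",
                f"--rw={rw}",
                f"--bs={bs}",
            ]
            + FIXED_ARGS
        )
    return commands


if __name__ == "__main__":
    for cmd in fio_commands():
        print(" ".join(cmd))
```

The script only prints the commands; running them, and switching between NORA and RA on the RAID card between runs, is left to the operator.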
Sequential read test
Let's first look at the bandwidth of the disk array for sequential reads; see Figure 1:
As you can see, when the IO size is small, bandwidth is poor even for sequential, contiguous IO requests: less than 20MB/s. As the IO size increases, bandwidth rises, reaching a maximum of more than 1.2GB/s.
Now look at the NORA case. When the IO size goes from 128K to 256K, bandwidth suddenly jumps. Why? The secret is that the strip size of my RAID array is 128K. Only when the IO size reaches 256K does the disk array really work in parallel. When the IO size is small, the advantage of having multiple disks goes unused.
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL
......
Strip Size : 128 KB
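The strip-size effect can be sketched with a few lines of arithmetic. The function below counts how many 128K strips a single IO spans. It is a simplification that ignores how RAID5 rotates parity across the disks, so the result is only an upper bound on how many disks can serve the request at once.

```python
STRIP_SIZE = 128 * 1024  # bytes, from the MegaCli output above


def strips_touched(offset, size, strip_size=STRIP_SIZE):
    """Return how many strips one IO of `size` bytes at `offset` spans."""
    if size <= 0:
        return 0
    first = offset // strip_size
    last = (offset + size - 1) // strip_size
    return last - first + 1


if __name__ == "__main__":
    for size in (512, 64 * 1024, 128 * 1024, 256 * 1024):
        print(size, strips_touched(0, size))
```

An aligned IO of 128K or less stays inside one strip, while an aligned 256K IO spans two, which is consistent with the jump seen between 128K and 256K in the NORA results.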
In addition, with sequential IO, RA prefetching also helps: bandwidth reaches 1.2GB/s at an IO size of just 64K.
Now the latency; see Figure 2:
The unit in the graph is microseconds (us). In 《Let's talk about disk partitioning》, I estimated disk access time theoretically. Disk time is spent mainly in two places:
- Seek time: 3-15ms. This can be reduced by sensible partitioning
- Rotational delay: about 0-6ms
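The 0-6ms range for rotational delay follows directly from the 10,000 RPM spindle speed of the test disks. A minimal sketch of that arithmetic:

```python
def rotation_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000 / rpm


full_turn = rotation_ms(10_000)  # worst case: wait a whole rotation
average = full_turn / 2          # on average the sector is half a turn away

if __name__ == "__main__":
    print(f"worst case {full_turn} ms, average {average} ms")
```

So a request that reaches the platter waits between 0 and 6ms for the sector to come around, about 3ms on average, before any seek time is added.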
Why, then, is the latency in the Figure 2 results so low, averaging only about 30us at an IO size of 512? With sequential IO, the RAID card cache hit rate is high, so most read requests never actually reach the disk's mechanical parts.
Now IOPS; see Figure 3:
When the IO request size is exactly one sector, the disk array's IOPS is at its highest, at more than 30,000 per second. As the IO size increases, IOPS gradually falls, but the disk's throughput is actually rising.
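IOPS and bandwidth are tied together by the IO size, which is why one can fall while the other rises. A small sketch of the relationship, using the roughly 30,000 IOPS at one sector reported above:

```python
def bandwidth_mb_per_s(iops, io_size_bytes):
    """Bandwidth in MB/s (10^6 bytes) implied by an IOPS figure and IO size."""
    return iops * io_size_bytes / 1_000_000


if __name__ == "__main__":
    print(bandwidth_mb_per_s(30_000, 512))
```

At 30,000 IOPS and 512 bytes per request, the implied bandwidth is only about 15MB/s, which agrees with the "less than 20MB/s" seen at small IO sizes in the bandwidth results.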
Taken together, the disk array performs very well under sequential IO, for three reasons:
- With sequential IO, the RAID card cache hit rate is high, especially when RAID prefetching is enabled
- Sequential IO is also the most comfortable working state for a single disk, because it avoids seek delay
- When the IO size exceeds the RAID strip size, the IO is spread across multiple disks and processed in parallel
Random read test
As developers, we cannot guarantee that our disks always work in their most comfortable state; sometimes they have to be accessed randomly. So let's test my disk array under random access. With fio, you only need to set the rw parameter to randread. I only tested IO sizes up to 128K, because the larger the IO, the more it resembles sequential IO.
Let's start with bandwidth; see Figure 4:
Even with mechanical disks assembled into a RAID array with a cache, there seems to be little they can do about random IO. Under random IO, bandwidth is terrible: at small IO sizes it is only a few tenths of a megabyte per second.
Now the latency; see Figure 5:
Under random access, latency is basically around 5ms, which matches our earlier theoretical estimate. Random access causes more requests to actually reach the disk's mechanical parts.
Now IOPS. This metric is also very poor, at only about 200. This is consistent with the latency in Figure 5: if each request takes about 5ms, only about 200 requests can be handled per second. Hard disk vendors like to boast that their disks reach tens of thousands of IOPS, but they never mention that under random IO the figure is really only about 200.
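The link between the 5ms latency and the 200 IOPS figure is simple arithmetic. The sketch below assumes requests are served one at a time; the `in_flight` parameter is a hypothetical knob for illustration, not something measured in the tests.

```python
def iops_from_latency(latency_ms, in_flight=1):
    """IOPS achievable when each request takes `latency_ms` milliseconds."""
    return in_flight * 1000 / latency_ms


def bandwidth_kib_per_s(iops, io_size_bytes):
    """Bandwidth in KiB/s implied by an IOPS figure and IO size."""
    return iops * io_size_bytes / 1024


if __name__ == "__main__":
    iops = iops_from_latency(5)
    print(iops, bandwidth_kib_per_s(iops, 512))
```

With 5ms per request this gives 200 IOPS, and at 512 bytes per request only about 100KiB/s of bandwidth.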
So my RAID5 array of 10,000 RPM mechanical disks, under the best sequential conditions, delivers more than 1GB/s of bandwidth with very low average latency, as low as 20-odd us. But under random IO, the weakness of mechanical disks is fully exposed: a few tenths of a megabyte per second of bandwidth, nearly 5ms of latency, and only about 200 IOPS. The reasons are:
- Random access turns the RAID card cache into a mere decoration
- The disks cannot work in parallel, because the strip size of the RAID on my machine is 128 KB
- The mechanical arm has to jump back and forth between tracks
Compared with the tens of MB/s or even 1GB/s of bandwidth under sequential IO, random IO performance is really pitiful.
Conclusion
The test data above show the huge performance gap between sequential and random IO on mechanical disks. Under sequential IO, the disk is doing what it does best, and the RAID card cache hit rate is high. Bandwidth is then tens to hundreds of MB/s, and under the best conditions can even reach 1GB/s; IOPS can reach about 20,000-30,000. Under random IO, the mechanical arm is forced to keep seeking and the RAID card cache is ineffective. Bandwidth drops below 1MB/s, as low as 100KB/s, and IOPS is a pitiful 200 or so.
If you truly understand the experimental data above, many things in engineering practice start to make sense.
Copying folders: we all know that copying a folder containing a large pile of files is slow. The reason is that the mechanical disk is mostly doing random IO. How can the copy be sped up? Simply pack the files first. After packing, the folder becomes one large file, and copying it lets the disk perform the sequential IO it does best, so it is much faster.
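One way to pack a folder before copying is shown below, as a minimal sketch using Python's standard tarfile module without compression. The function name is made up for illustration.

```python
import tarfile
from pathlib import Path


def pack_folder(folder, archive_path):
    """Pack `folder` into one uncompressed tar file and return its path."""
    folder = Path(folder)
    with tarfile.open(str(archive_path), "w") as archive:
        # arcname keeps paths inside the archive relative to the folder name.
        archive.add(str(folder), arcname=folder.name)
    return Path(archive_path)
```

The resulting single archive can then be copied as one large file and unpacked at the destination. Note that reading the many files while packing still costs random IO on the source disk; the gain is on the copy itself.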
Database transactions: every database that implements transactions must make sure written data has been persisted before returning. But why do almost all of them return success once the data is in their own transaction log file, instead of writing directly to the table's data file? The reason is disk read/write performance. A transaction only needs to guarantee that the data has been persisted; where it is written does not matter. Writing to the data file would most likely be random IO, while writing to a log file is sequential IO, which gives the best possible performance.
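The idea of acknowledging a transaction once it is in the log can be sketched as an append-only file that is flushed to disk before `commit` returns. This is a toy illustration with a made-up class name and record format, not how any particular database implements its log.

```python
import os


class AppendOnlyLog:
    """Toy transaction log: every commit is appended and fsynced."""

    def __init__(self, path):
        # O_APPEND makes every write land at the end of the file,
        # so the disk sees sequential writes.
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def commit(self, record: bytes) -> None:
        """Append one record and return only after it is flushed to disk."""
        view = memoryview(record + b"\n")
        while view:
            written = os.write(self._fd, view)
            view = view[written:]
        os.fsync(self._fd)

    def close(self) -> None:
        os.close(self._fd)
```

The data files can then be updated later in the background, off the critical path of the transaction.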
MySQL B+ trees: the data above show that for both sequential and random IO, performance rises as the unit of each IO increases. Understanding this, you can see why MySQL uses B+ trees for its indexes instead of other trees (for example, a binary tree). Because B+ tree nodes are larger, each IO lets the disk work more comfortably.
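A rough way to see the benefit of larger nodes is to count how many levels a lookup must descend. The sketch below assumes one random IO per level with nothing cached, the roughly 5ms per random IO measured above, and a fan-out of 1000 for the large-node tree; the fan-out is an illustrative assumption, not a MySQL figure.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels whose capacity reaches `num_keys`."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


RANDOM_IO_MS = 5  # per-request latency from the random read test

if __name__ == "__main__":
    for fanout in (2, 1000):
        depth = levels_needed(1_000_000, fanout)
        print(fanout, depth, depth * RANDOM_IO_MS, "ms")
```

For one million keys, a binary tree needs 20 levels (about 100ms of random IO under these assumptions), while a tree with fan-out 1000 needs only 2 levels (about 10ms).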
Finally, I'd like to share a real performance optimization case from my engineering work five years ago. We took over a system that held millions of user imei values and had to query MySQL for another string of user ids (clientid). The previous developers' implementation was the traditional batch of MySQL queries. With this approach, leaving aside the many network RTTs, just consider the MySQL query itself: even with an index it needs a lot of random IO, because the users' imei values are randomly distributed. My optimization was simple: read the entire MySQL user table out in one pass using sequential IO and load it into memory, organize it in memory as a HashTable, and look records up by hash. In the end, the elapsed time was cut by more than 90%.
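The shape of that optimization can be sketched as follows. The rows are assumed to come from one full scan of the user table as (imei, clientid) pairs; the function names and sample values are hypothetical, and the database access itself is left out.

```python
def build_lookup(rows):
    """Build an in-memory hash table from an iterable of (imei, clientid)."""
    return {imei: clientid for imei, clientid in rows}


def lookup_many(table, imeis):
    """Resolve a batch of imei values; missing ones map to None."""
    return {imei: table.get(imei) for imei in imeis}


if __name__ == "__main__":
    # Stand-in for the rows returned by a single sequential table scan.
    scanned_rows = [("imei-001", "client-a"), ("imei-002", "client-b")]
    table = build_lookup(scanned_rows)
    print(lookup_many(table, ["imei-002", "imei-999"]))
```

This trades memory for speed and only works when the whole table fits in memory, which was feasible here with millions of rows.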
The hard disk series from 「Develop internal skill and practice」:
- 1. Opening the disk: taking the hard shell off a mechanical hard disk!
- 2. Disk partitioning also involves technical skill
- 3. How do we deal with mechanical hard disks being slow and failure-prone?
- 4. Taking apart the structure of an SSD
- 5. How much disk space does a newly created empty file take?
- 6. How much disk space does a file with only 1 byte actually take?
- 7. Why does the ls command hang when there are too many files?
- 8. Understanding how formatting works
- 9. How much disk IO actually happens when you read one byte of a file?
- 10. When is disk IO issued after you write one byte to a file?
- 11. Mechanical hard disk random IO is slower than you think
- 12. How much faster is a server SSD than a mechanical hard disk?
My official account is 「Develop internal skill and practice」. There I don't talk only about technical theory, and I don't talk only about practical experience. I combine the two, using practice to deepen the understanding of theory and theory to improve practical skill. You are welcome to follow my official account and share it with your friends.
Copyright notice
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thank you.