Random IO on mechanical hard disks is slower than you think
2020-11-08 16:12:00 【Zhang Yanfei Allen】
We all know that random IO on hard disks is very slow, but have you ever seen a direct numerical comparison of how much slower it is than sequential IO? Today I will run real load tests comparing sequential and random IO performance on disks under different scenarios. With this experimental data, you will gain a deeper understanding of why database transactions are implemented through logs, and why indexes use B+ trees with large nodes.
For any storage system, performance comes down to bandwidth, latency and IOPS. My test machine's disk configuration is a RAID5 array built from seven 300G 10,000 RPM mechanical disks, and the load testing tool is fio. During testing, we fixed several parameters:
- IO engine: libaio
- To keep the operating system's PageCache from interfering with the results, the direct parameter is used to bypass it
- Enable unified_rw_reporting, so that fio reports read and write statistics combined rather than separately per direction
- For reasonably accurate results, each run lasts 300s
- Because the server is sensitive, the tests target a file rather than the raw block device, which adds a little file system overhead
- The test file is 100G while my RAID card cache is 1G, so that most requests cannot hit the cache
- The I/O scheduler is set to the commonly used noop
- Enable refill_buffers, so that fio refills the I/O buffers on every submission and the data stays random
- Following RAID configuration recommendations, the disks' own cache is turned off
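To make the fixed settings concrete, here is a minimal Python sketch that assembles a matching fio command line. The flag spellings follow fio's documented options, but the job name, the file path, and the extra `--time_based` flag are my assumptions, not the author's actual job file. The noop scheduler, RAID read-ahead and disk cache are configured outside fio (through sysfs and the RAID management tool), so they do not appear here.

```python
# Fixed settings from the list above as fio command-line options.
# Assumption: --time_based is added so every run lasts the full 300 s even if
# the 100G file is read through sooner. The noop scheduler, RAID read-ahead
# and disk cache are set outside fio, so they do not appear here.
FIXED_FIO_OPTIONS = [
    "--ioengine=libaio",         # Linux native asynchronous IO
    "--direct=1",                # O_DIRECT: bypass the page cache
    "--unified_rw_reporting=1",  # report read and write statistics combined
    "--runtime=300",             # run for 300 seconds
    "--time_based",
    "--size=100G",               # much larger than the 1G RAID card cache
    "--refill_buffers",          # refill IO buffers on every submission
]


def fio_command(name: str, filename: str, rw: str, bs: str) -> list[str]:
    """Build the fio argv for one test case; name and filename are hypothetical."""
    return ["fio", f"--name={name}", f"--filename={filename}",
            f"--rw={rw}", f"--bs={bs}", *FIXED_FIO_OPTIONS]
```

For example, `fio_command("seq-4k", "/data/fio-test.file", "read", "4k")` produces an argument list that could be passed to `subprocess.run` on the test machine.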
Then we varied the remaining parameters and ran a series of comparison tests:
- Access mode: sequential reads and random reads are tested separately
- IO size: multiples of the sector size, 512, 1K, 2K ...
- RAID card read-ahead policy: NORA (read-ahead off) and RA (read-ahead on) are tested independently
Sequential read test
Let's look at the sequential read case first. The bandwidth of the disk array is shown in Figure 1:
As you can see, when the IO size is small, bandwidth is unimpressive even for sequential, contiguous requests: under 20MB/s. As the IO size grows, bandwidth rises, peaking at over 1.2GB/s.
Now look at the NORA case: going from 128K to 256K, bandwidth suddenly jumps. Why? The secret is that the stripe size of my RAID array is 128K. Only when the IO size reaches 256K do the disks in the array really work in parallel; with smaller IO sizes, the multiple disks are not exploited. The stripe size can be checked with MegaCli:
```
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL
......
Strip Size : 128 KB
```
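A simplified model shows why 256K is the turning point: a single request is spread over only as many data disks as the 128 KB strips it covers. This sketch assumes consecutive strips land on distinct data disks (6 of the 7 in RAID5, since one strip per stripe holds parity) and ignores the controller's exact parity rotation.

```python
STRIP_SIZE = 128 * 1024  # bytes, from the MegaCli output above
DATA_DISKS = 6           # 7-disk RAID5: one strip per stripe holds parity


def disks_touched(offset: int, size: int, strip: int = STRIP_SIZE,
                  data_disks: int = DATA_DISKS) -> int:
    """Distinct data disks one request spans (simplified layout model)."""
    first_strip = offset // strip
    last_strip = (offset + size - 1) // strip
    return min(last_strip - first_strip + 1, data_disks)
```

An aligned 128K request stays on one disk, while an aligned 256K request engages two. Note that a misaligned 128K request can also straddle two strips.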
The other factor is sequential IO itself: with RA enabled, read-ahead prefetching also helps, and bandwidth reaches 1.2GB/s at an IO size of just 64K.
Next, latency, shown in Figure 2:
The unit in the graph is microseconds (us). In "Let's Talk About Disk Partitioning", I estimated disk access time theoretically. It is spent mainly in two places:
- Seek time: 3-15ms, which can be reduced by sensible partitioning
- Rotational latency: roughly 0-6ms
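For a 10,000 RPM disk like those in this array, one revolution takes 60,000 ms / 10,000 = 6 ms, which is where the 0-6 ms range comes from; on average the head waits half a turn. A minimal helper:

```python
def rotational_latency_ms(rpm: int) -> tuple[float, float]:
    """Return (worst-case, average) rotational latency in milliseconds."""
    full_turn_ms = 60_000 / rpm  # time for one full revolution
    return full_turn_ms, full_turn_ms / 2
```

`rotational_latency_ms(10_000)` gives a 6 ms worst case and a 3 ms average.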
Why, then, is the latency in Figure 2 so low, averaging only about 30us at an IO size of 512? Under sequential IO, the RAID card cache hit rate is high, so most read requests never reach the disk's mechanical spindle.
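The cache effect can be expressed as a weighted average of hit and miss costs. In this sketch the hit rate and the 10 us cache latency are hypothetical values chosen for illustration, not measurements; the 5,000 us miss cost is the roughly 5 ms random-read latency measured later in this article.

```python
def effective_latency_us(hit_rate: float, cache_us: float, disk_us: float) -> float:
    """Average latency when a fraction hit_rate of reads is served from cache."""
    return hit_rate * cache_us + (1 - hit_rate) * disk_us
```

With a hypothetical 99.5% hit rate, `effective_latency_us(0.995, 10, 5000)` comes to about 35 us: even a small fraction of misses dominates the average.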
Now IOPS, shown in Figure 3:
When the request size is a single sector, the array's IOPS is at its highest, exceeding 30,000 per second. As the IO size grows, IOPS gradually declines, but the disk's throughput is actually increasing at the same time.
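IOPS and bandwidth are tied together by the request size: bandwidth = IOPS × IO size. A quick check with the numbers above, 30,000 IOPS at one 512-byte sector, gives about 15 MB/s, consistent with the under-20MB/s small-IO bandwidth in Figure 1.

```python
def bandwidth_mb_s(iops: float, io_size_bytes: int) -> float:
    """Throughput in MB/s (10**6 bytes per second) for a given IOPS and IO size."""
    return iops * io_size_bytes / 1_000_000
```

`bandwidth_mb_s(30_000, 512)` returns 15.36, which is why high IOPS at tiny sizes still means low bandwidth.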
In short, the disk array performs very well under sequential IO, for three reasons:
- Under sequential IO, the RAID card cache hit rate is high, especially with RAID read-ahead enabled
- Sequential IO is also the most comfortable working state for a single disk, because it avoids seek delays
- Once an IO exceeds the RAID stripe size, it is distributed across multiple disks and processed in parallel
Random read test
As developers, we cannot guarantee that disks always work in their most comfortable state; sometimes they have to be accessed randomly. So let's test my disk array under random access. With fio, you only need to set the rw parameter to randread. However, I only tested IO sizes up to 128K, because the larger the IO, the more it resembles sequential IO.
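To picture what changes between rw=read and rw=randread: both touch the same block-aligned offsets, but randread visits them in shuffled order. This sketch models the access pattern only, roughly mirroring fio's default behavior of covering each block once per pass; it does not issue any IO.

```python
import random


def offsets(file_size: int, bs: int, pattern: str, seed: int = 0) -> list[int]:
    """Block-aligned offsets for one pass over a file of file_size bytes."""
    blocks = list(range(0, file_size - file_size % bs, bs))
    if pattern == "randread":
        random.Random(seed).shuffle(blocks)  # same blocks, random order
    elif pattern != "read":
        raise ValueError(f"unsupported pattern: {pattern}")
    return blocks
```

For a mechanical disk, the shuffled order is what forces a seek and rotational wait on nearly every request.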
Let's start with bandwidth, shown in Figure 4:
Even when assembled into a RAID array with a cache, mechanical hard disks seem helpless against random IO. Under random IO, bandwidth is dismal: at small IO sizes it is only a few tenths of a MB/s.
Now latency again, shown in Figure 5:
Under random access, latency is basically around 5ms, which matches our earlier theoretical estimate. Random access causes far more requests to actually reach the mechanical spindle.
IOPS is also very poor, only about 200. This echoes the latency in Figure 5: if each request takes about 5ms, one second can only handle about 200 of them. Hard disk vendors like to boast that their disks reach tens of thousands of IOPS, but they never mention that under random IO the real figure is only about 200.
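The arithmetic behind "5 ms per request means about 200 per second" is a reciprocal, assuming requests are served one at a time (queue depth 1; the article does not state fio's iodepth). Multiplying by a one-sector request size also reproduces the roughly 100K bandwidth floor mentioned in the conclusion.

```python
def iops_from_latency_ms(latency_ms: float) -> float:
    """Requests per second if each takes latency_ms and they run one at a time."""
    return 1000 / latency_ms


def bandwidth_kib_s(iops: float, io_size_bytes: int) -> float:
    """Throughput in KiB/s for a given IOPS and request size."""
    return iops * io_size_bytes / 1024
```

`iops_from_latency_ms(5)` gives 200, and `bandwidth_kib_s(200, 512)` gives 100 KiB/s.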
Look at my RAID5 array of 10,000 RPM mechanical disks: under the best sequential conditions, bandwidth exceeds 1GB/s and average latency is very low, as little as 20-odd us. Under random IO, however, the weakness of mechanical disks is fully exposed: bandwidth of a few tenths of a MB/s, latency close to 5ms, and only about 200 IOPS. The reasons are:
- Random access turns the RAID card cache into mere decoration
- The disks cannot work in parallel: my RAID stripe size is 128 KB, so each IO of that size or smaller is served by a single disk
- The mechanical head also has to seek back and forth between tracks.
Compared with the tens of MB/s or even 1GB/s of bandwidth under sequential IO, random IO is truly pathetic.
Conclusion
The test data above shows the huge performance gap between sequential and random IO on mechanical hard disks. Under sequential IO, the disk works at its best and the RAID card cache hit rate is high. Bandwidth reaches tens to hundreds of MB/s, even 1GB/s under the best conditions, and IOPS reaches 20,000-30,000. Under random IO, the mechanical head is forced to seek back and forth, and the RAID card cache becomes ineffective. Bandwidth drops below 1MB/s, as low as 100K, and IOPS is a pathetic 200 or so.
If you truly understand the data from these experiments, many things in engineering practice will make sense.
Copying folders: We all know that copying a folder containing a great many files is very slow. The reason is that the mechanical hard disk ends up doing random IO. How can the copy be sped up? Simply pack the files first. After packing, the folder becomes one big file, and copying it lets the disk perform its favorite, sequential IO, so it is much faster.
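Packing can be done with tar or, as sketched here, with Python's standard `tarfile` module (the directory and archive names are hypothetical). Creating the archive still reads the small files once; the gain comes when the single large archive is then copied, since both reading and writing it are sequential.

```python
import os
import tarfile


def pack_directory(src_dir: str, archive_path: str) -> None:
    """Pack everything under src_dir into one uncompressed tar archive."""
    with tarfile.open(archive_path, "w") as tar:
        # Store entries under the folder's own name, e.g. "photos/0.txt".
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
```

The resulting archive can then be copied as one file, for example with `shutil.copyfile`.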
Database transactions: When implementing transactions, every database must ensure that written data is safely on disk before returning. But why do almost all of them return success once the data reaches their own transaction log file, instead of writing directly to the data table files? The reason is disk read/write performance. A transaction only needs to guarantee that the data is safely on disk; where it is written does not matter. Writing to the data files would most likely mean random IO, while appending to a log file is sequential IO, which gives the best performance.
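The log-first idea can be sketched as an append-only file that is fsync'ed before a commit returns. This is a simplified illustration, not how any particular database (such as MySQL's InnoDB redo log) is implemented: it has no checkpointing, and a torn record at the end of the file after a crash is not handled. The data files would be updated later, in the background.

```python
import os


class TransactionLog:
    """Minimal append-only log: a commit is durable once fsync returns."""

    def __init__(self, path: str) -> None:
        # "ab": every write goes to the end of the file, i.e. sequential IO.
        self._f = open(path, "ab")

    def commit(self, record: bytes) -> None:
        # Length-prefix each record so the log can be replayed after a crash.
        self._f.write(len(record).to_bytes(4, "big") + record)
        self._f.flush()             # move Python's buffer into the kernel
        os.fsync(self._f.fileno())  # force the data onto the disk

    def close(self) -> None:
        self._f.close()


def replay(path: str) -> list[bytes]:
    """Read back all committed records in order."""
    records = []
    with open(path, "rb") as f:
        while header := f.read(4):
            records.append(f.read(int.from_bytes(header, "big")))
    return records
```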
MySQL's B+ trees: The data above shows that for both sequential and random IO, increasing the size of each IO unit raises performance. With this in mind, you can really understand why MySQL uses B+ trees for its indexes rather than other trees (for example, binary trees): B+ tree nodes are larger, so each IO is bigger and the disk works more comfortably.
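The usual way to quantify this: a larger node holds more keys, so the tree is shallower, and on a cold cache each level costs roughly one random IO. The fanout of 1,000 below is an assumption for illustration; the real number depends on key size and on the page size (16 KB by default in InnoDB).

```python
def tree_height(num_keys: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= num_keys."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels
```

For 100 million keys, `tree_height(10**8, 2)` is 27 while `tree_height(10**8, 1000)` is 3; at about 5 ms per random IO, that is roughly 135 ms versus 15 ms per cold lookup.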
Finally, I'd like to share a performance optimization case from my own engineering work five years ago. We took over a system that, given the IMEIs of millions of users, queried MySQL for another set of user IDs (clientid). The previous implementation used traditional batched MySQL queries. Leaving aside the many network round trips (RTT), the MySQL queries alone required a great deal of random IO even with an index, because user IMEIs are randomly distributed. My optimization was simple: read the entire MySQL user table in one pass with sequential IO, load it into memory, organize it in a hash table, and serve lookups through the hash. In the end, the time taken dropped by more than 90%.
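The idea of replacing many random index lookups with one sequential scan plus an in-memory hash table can be sketched as follows. The row source is hypothetical: in practice it would be a full-table read such as `SELECT imei, clientid FROM users` streamed from MySQL (the table and column names are made up for this example).

```python
from typing import Iterable


def build_index(rows: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Load (imei, clientid) rows, read once in table order, into a hash table."""
    return {imei: clientid for imei, clientid in rows}


def lookup_all(index: dict[str, str], imeis: Iterable[str]) -> dict[str, str]:
    """Resolve requested IMEIs with O(1) hash lookups; unknown IMEIs are skipped."""
    return {imei: index[imei] for imei in imeis if imei in index}
```

The trade-off is memory: the whole mapping must fit in RAM, which is what makes a one-pass sequential load worthwhile here.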
Hard disk series from 「Develop Internal Skill and Practice」:
- 1. Opening the disk: taking off the hard shell of a mechanical hard disk!
- 2. Disk partitioning involves technical skill too
- 3. How to deal with mechanical hard disks being slow and failure-prone?
- 4. Disassembling the structure of an SSD
- 5. How much disk space does a new empty file take?
- 6. How much disk space does a 1-byte file actually take?
- 7. Why does the ls command hang when there are too many files?
- 8. Understanding how formatting works
- 9. How much disk IO does reading one byte of a file actually cause?
- 10. When does writing one byte to a file trigger disk IO?
- 11. Random IO on mechanical hard disks is slower than you think
- 12. How much faster is a server with an SSD than one with a mechanical hard disk?
My official account is 「Develop Internal Skill and Practice」. Here I don't talk only about technical theory, or only about practical experience; I combine the two, using practice to deepen the understanding of theory, and theory to improve practical skill. You are welcome to follow my official account, and please share it with your friends~~~
Copyright notice
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thanks.