当前位置:网站首页>How much disk IO does a byte of read file actually take place?
How much disk IO does a byte of read file actually take place?
2020-11-08 16:12:00 【Zhang Yanfei Allen】
No matter what language you use ,C/PHP/GO、 still Java, I believe everyone has the experience of reading files . Let's think about two questions , If we read a byte in a file :
- Whether a disk will occur IO?
- If it happens ,Linux How many bytes were actually read to disk ?
To make it easier to understand the problem , We put c The code for is listed :
int main()
{
char c;
int in;
in = open("in.txt", O_RDONLY);
read(in,&c,1);
return 0;
}
If not engaged in c/c++ Students in the development work , It's really not easy to understand this problem in depth . Because the mainstream language that is commonly used at present ,PHP/Java/Go The encapsulation level of what is relatively high , Many details of the kernel are completely shielded . If you want to make the above two questions clear , It needs to be cut open Linux To understand from the inside of Linux Of IO Stack .
Linux IO Introduction to the stack
I don't say much nonsense , Let's go straight to Linux IO A simplified version of the stack is drawn :( Official IO The stack refers to this Linux.IO.stack_v1.0.pdf)
We also shared several articles earlier on the hardware layer in the figure above , And file system module . But through this IO Stack we found , We are right. Linux Of documents IO The understanding of is still far from enough , There are several kernel components :IO engine 、VFS、PageCache、 General management block 、IO We don't know much about scheduling layer and other modules . take it easy , Let's come together :
IO engine
We develop students who want to read and write files , stay lib The library layer has many functions to choose from , such as read,write,mmap etc. . This is actually a choice Linux Provided IO engine . What we use most often read、write Functions belong to sync engine , except sync, also map、psync、vsync、libaio、posixaio etc. . sync,psync It's all synchronous ,libaio and posixaio It's asynchronous IO.
Yes, of course IO The engine also needs VFS、 General block layer and other lower level support can be realized . stay sync Engine read Function will enter VFS Provided read system call .
VFS Virtual file system
In the kernel layer , The first thing to see is VFS.VFS The idea was to abstract a generic file system model , Provide a set of common interfaces for our developers or users , Let's not care Specific file system implementation .VFS There are four core data structures provided , They are defined in the kernel source code include/linux/fs.h and include/linux/dcache.h in .
- superblock:Linux Used to mark information about a specific installed file system
- inode:Linux Every file in has a inode, You can take inode The ID card that is understood as a document
- file: File objects in memory , It is used to save the correspondence between process and disk file
- desty: Catalog items , It's part of the path , All the directory entry objects are concatenated into one tree Linux Under the directory tree .
Around these four core data structures ,VFS It also defines a series of operation methods . such as ,inode The definition of the operation method of inode_operations
(include/linux/fs.h), It defines what we are very familiar with mkdir
and rename
etc. .
struct inode_operations {
......
int (*link) (struct dentry *,struct inode *,struct dentry *);
int (*unlink) (struct inode *,struct dentry *);
int (*mkdir) (struct inode *,struct dentry *,umode_t);
int (*rmdir) (struct inode *,struct dentry *);
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
......
stay file Corresponding operation method file_operations
It defines what we often use read and write:
struct file_operations {
......
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
......
int (*mmap) (struct file *, struct vm_area_struct *);
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *, fl_owner_t id);
Page Cache
stay VFS Look down , We have noticed Page Cache. Its Chinese translation is called page cache , yes Linux The main disk cache used by the kernel , Is a pure memory working component , Its function is to speed up access to relatively slow disks . If you want to access the file block It happens to exist in Page Cache Inside , So there's no actual disk IO happen . If it doesn't exist , Then you will apply for a new page , Issue page break , And then read it on disk block Content to fill it in , Next time use it directly .Linux The kernel uses a search tree to efficiently manage a large number of pages .
If you have a special need you want to bypass Page Cache, Just set DIRECT_IO That's all right. . There are two situations that need to be bypassed :
- Test disk IO Real performance of
- Save the use of Page Cache When the system call falls into kernel state , And copy kernel memory to user process memory to overhead .
file system
In my previous article 《 How much disk space does a new empty file take ?》、《 Understand the principle of formatting 》 It's all about specific file systems . The two most important concepts in a file system are inode and block, We have seen both of them in previous articles . One block How big , This is decided by operation and maintenance when formatting , The general default is 4KB.
except inode and block, Each file system also defines its own actual operation function . For example, in ext4 As defined in ext4_file_operations and ext4_file_inode_operations as follows :
const struct file_operations ext4_file_operations = {
.read_iter = ext4_file_read_iter,
.write_iter = ext4_file_write_iter,
.mmap = ext4_file_mmap,
.open = ext4_file_open,
......
};
const struct inode_operations ext4_file_inode_operations = {
.setattr = ext4_setattr,
.getattr = ext4_file_getattr,
......
};
General block layer
The general block layer is all the block devices in a processing system IO The requested kernel module . It defines a name called bio To represent once IO Operation request (include/linux/bio.h).
So once bio Corresponding to IO The size unit is the page , Or sectors ? Are not , It's a paragraph ! Every bio It may contain multiple segments . A segment is a complete page , Or part of the page , Please refer to https://www.ilinuxkernel.com/files/Linux.Generic.Block.Layer.pdf.
Why come up with something so puzzling ? This is because of the data continuously stored on the disk , When it comes to memory Page Cache The memory may not be continuous . It's normal for this to happen , I can't say that continuous data in the disk, I have to use continuous space to cache in memory . Segment is to make memory available once IO DMA To many “ paragraph ” Address is not continuous in memory .
A common sector / paragraph / The page size comparison is shown in the figure below :
IO Scheduling layer
When the general block layer puts IO After the request was actually sent out , It doesn't have to be executed immediately . Because the scheduling layer will start from the overall situation , Try to make the whole disk IO Maximize performance . The general way to work is to make the head work like an elevator , Go in one direction first , Come back at the end of the day , In this way, the disk efficiency will be higher . The specific algorithms are noop,deadline and cfg etc. .
On your machine , adopt dmesg | grep -i scheduler
To check out your Linux Supported algorithms , And you can choose one of them when testing .
The process of reading files
We have Linux IO The various kernel components in the stack are introduced . Now let's go through the whole process of reading files from the beginning
- lib Inside read Function first enters the system call sys_read
- stay sys_read Enter again VFS Inside vfs_read、generic_file_read Such as function
- stay vfs Inside generic_file_read Will determine whether the cache hit , Hit returns
- If the kernel is not hit Page Cache Assign a new page box to , Issue page break ,
- The kernel initiates blocks to the general block layer I/O request , Block devices block disks 、U The difference between plates
- General block layer uses bio Representative I/O Ask to put in IO Request queue
- IO The scheduling layer uses the elevator algorithm to schedule the requests in the queue
- The driver sends a read command to the disk controller to control ,DMA The method is filled directly into Page Cache New page box in
- The controller sends out interrupt notification
- The kernel will be what the user needs 1 Bytes filled into user memory
- Then your process is awakened
You can see , If Page Cache If you hit it , There's no disk at all IO produce . therefore , Don't think that the performance will be slow if there are several read-write logic in the code . The operating system has been optimized a lot for you , Memory level access latency is about ns Grade , Than mechanical disks IO fast 2-3 An order of magnitude . If you have enough memory , Or your files are accessed frequently enough , In fact, at this time read Very few operations have real disks IO happen .
Let's look at the second situation , If Page Cache If you miss ,Linux How many bytes of disk are actually carried out IO. Whole IO Several kernel components are involved in the process . Each component uses different length blocks to manage disk data .
- Page Cache It's in pages ,Linux Page size is usually 4KB( Avoid being pricked by gods , Here under Linux Can set up large memory pages )
- File systems are managed in blocks . Use
dumpe2fs
You can see , Generally, a block defaults to 4KB - The general block layer deals with disks in segments IO Requested , A segment is a page or part of a page
- IO The scheduler passes through DMA Mode transmission N Sectors to memory , The sector is usually 512 byte
- Hard disk also uses “ A sector ” Management and transmission of data
You can see , Although we are really read-only from the user's point of view 1 Bytes ( In the opening code, we only give this disk IO Left a byte of cache ). But throughout the kernel workflow , The smallest unit of work is the sector of the disk , by 512 byte , Than 1 It's a lot bigger than a byte . in addition block、page cache Higher level components work in larger units , So the actual disk read is a lot of bytes together . If a segment is a memory page , One disk IO Namely 4KB(8 individual 512 Byte sector ) Read together .
Linux What we don't talk about in the kernel is that there is also a complex pre read strategy . therefore , In practice , Maybe it's better than 8 More sectors are transferred to memory together .
Last
The original intention of operating system is to make you simple and reliable , Let's try to think of it as a black box . You want a byte , It gives you a byte , But I did a lot of work in silence . Although most of our domestic development is not at the bottom , But if you're concerned about the performance of your application , You should understand when the operating system quietly improves your performance , How to improve . So that at some time in the future your online server can't bear to hang up , You can quickly find out where the problem lies .
Let's expand , If Page Cache missed , Then there must be disks that drive to the mechanical shaft IO Do you ?
Not necessarily , Why? , Because now the disk itself will carry a cache . In addition, today's servers will build disk arrays , The core hardware in a disk array Raid The card will also integrate RAM As caching . Only when all the caches miss , The mechanical shaft works only with a magnetic head .
Development of hard disk album of internal training :
- 1. Disk opening : Take off the hard coat of the mechanical hard disk !
- 2. Disk partitioning also implies technical skills
- 3. How can we solve the problem that mechanical hard disks are slow and easy to break down ?
- 4. Disassemble the SSD structure
- 5. How much disk space does a new empty file take ?
- 6. Only 1 How much disk space does a byte file actually take up
- 7. When there are too many documents ls Why is the command stuck ?
- 8. Understand the principle of formatting
- 9.read How much disk does a byte of file actually take place on IO?
- 10.write When to write to disk after one byte of file IO?
- 11. Mechanical hard disk random IO Slower than you think
- 12. How much faster is a server equipped with a SSD than a mechanical hard disk ?
My official account is 「 Develop internal skill and practice 」, I'm not just talking about technical theory here , It's not just about practical experience . It's about combining theory with practice , Deepen the understanding of theory with practice 、 Use theory to improve your technical practice ability . Welcome to my official account , Please also share with your friends ~~~
版权声明
本文为[Zhang Yanfei Allen]所创,转载请带上原文链接,感谢
边栏推荐
- read文件一个字节实际会发生多大的磁盘IO?
- Summary of template engine
- The first open source Chinese Bert pre training model in the financial field
- Blockchain weekly: the development of digital currency is written into the 14th five year plan; Biden invited senior adviser of MIT digital currency program to join the presidential transition team; V
- Returning to the third place in the world, what did Xiaomi do right?
- python开发qt程序读取图片的简单流程
- RestfulApi 学习笔记——父子资源(四)
- Golang system ping program to detect the surviving host (any permission)
- 技术总监7年总结,如何进行正确的沟通?
- How to make a correct summary for 7 years?
猜你喜欢
Comics: looking for the best time to buy and sell stocks
Builder pattern
c++ opencv4.3 sift匹配
Build simple business monitoring Kanban based on Alibaba cloud log service
Returning to the third place in the world, what did Xiaomi do right?
Learn to record and analyze
函数分类大pk!sigmoid和softmax,到底分别怎么用?
契约式设计(Dbc)以及其在C语言中的应用
DeepMind 最新论文解读:首次提出离散概率树中的因果推理算法
10 common software architecture patterns
随机推荐
Google's AI model, which can translate 101 languages, is only one more than Facebook
构建者模式(Builder pattern)
jsliang 求职系列 - 07 - Promise
打工人,打工魂,抽终身会员,成为人上人!
awk实现类sql的join操作
[open source]. Net uses ORM to access Huawei gaussdb database
python开发qt程序读取图片的简单流程
Improvement of rate limit for laravel8 update
漫画:寻找股票买入卖出的最佳时机(整合版)
C + + things: from rice cookers to rockets, C + + is everywhere
Mac环境安装Composer
关于adb连接手机offline的问题解决
VIM configuration tutorial + source code
区块链周报:数字货币发展写入十四五规划;拜登邀请MIT数字货币计划高级顾问加入总统过渡团队;委内瑞拉推出国营加密交易所
. net large data concurrency solution
I used Python to find out all the people who deleted my wechat and deleted them automatically
机械硬盘随机IO慢的超乎你的想象
On DSA of OpenGL
Application of four ergodic square of binary tree
Huawei has an absolute advantage in the 5g mobile phone market, and the market share of Xiaomi is divided by the market survey organization