当前位置：网站首页>How much disk IO does a byte of read file actually take place?

How much disk IO does a byte of read file actually take place?

2020-11-08 16:12:00 【Zhang Yanfei Allen】

Want to make APP Same thing as WeChat , Can run small programs smoothly ？ | Experience will send you to Xinjiang 、 Huawei 、 Cherry keyboard ！>>>

No matter what language you use ,C/PHP/GO、 still Java, I believe everyone has the experience of reading files . Let's think about two questions , If we read a byte in a file ：

Whether a disk will occur IO？
If it happens ,Linux How many bytes were actually read to disk ？

To make it easier to understand the problem , We put c The code for is listed ：

int main()  
{  	
	char    c;  
	int     in;

	in = open("in.txt", O_RDONLY); 
	read(in,&c,1);
	return 0;  
}

If not engaged in c/c++ Students in the development work , It's really not easy to understand this problem in depth . Because the mainstream language that is commonly used at present ,PHP/Java/Go The encapsulation level of what is relatively high , Many details of the kernel are completely shielded . If you want to make the above two questions clear , It needs to be cut open Linux To understand from the inside of Linux Of IO Stack .

Linux IO Introduction to the stack

I don't say much nonsense , Let's go straight to Linux IO A simplified version of the stack is drawn ：（ Official IO The stack refers to this Linux.IO.stack_v1.0.pdf）

file

We also shared several articles earlier on the hardware layer in the figure above , And file system module . But through this IO Stack we found , We are right. Linux Of documents IO The understanding of is still far from enough , There are several kernel components ：IO engine 、VFS、PageCache、 General management block 、IO We don't know much about scheduling layer and other modules . take it easy , Let's come together ：

IO engine

We develop students who want to read and write files , stay lib The library layer has many functions to choose from , such as read,write,mmap etc. . This is actually a choice Linux Provided IO engine . What we use most often read、write Functions belong to sync engine , except sync, also map、psync、vsync、libaio、posixaio etc. . sync,psync It's all synchronous ,libaio and posixaio It's asynchronous IO.

Yes, of course IO The engine also needs VFS、 General block layer and other lower level support can be realized . stay sync Engine read Function will enter VFS Provided read system call .

VFS Virtual file system

In the kernel layer , The first thing to see is VFS.VFS The idea was to abstract a generic file system model , Provide a set of common interfaces for our developers or users , Let's not care Specific file system implementation .VFS There are four core data structures provided , They are defined in the kernel source code include/linux/fs.h and include/linux/dcache.h in .

superblock：Linux Used to mark information about a specific installed file system
inode：Linux Every file in has a inode, You can take inode The ID card that is understood as a document
file： File objects in memory , It is used to save the correspondence between process and disk file
desty： Catalog items , It's part of the path , All the directory entry objects are concatenated into one tree Linux Under the directory tree .

Around these four core data structures ,VFS It also defines a series of operation methods . such as ,inode The definition of the operation method of inode_operations(include/linux/fs.h), It defines what we are very familiar with mkdir and rename etc. .

struct inode_operations {
				......
        int (*link) (struct dentry *,struct inode *,struct dentry *);
        int (*unlink) (struct inode *,struct dentry *);
        int (*mkdir) (struct inode *,struct dentry *,umode_t);
        int (*rmdir) (struct inode *,struct dentry *);
        int (*rename) (struct inode *, struct dentry *,
                        struct inode *, struct dentry *, unsigned int);
        ......

stay file Corresponding operation method file_operations It defines what we often use read and write：


struct file_operations {
				......
        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
				......
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*flush) (struct file *, fl_owner_t id);

Page Cache

stay VFS Look down , We have noticed Page Cache. Its Chinese translation is called page cache , yes Linux The main disk cache used by the kernel , Is a pure memory working component , Its function is to speed up access to relatively slow disks . If you want to access the file block It happens to exist in Page Cache Inside , So there's no actual disk IO happen . If it doesn't exist , Then you will apply for a new page , Issue page break , And then read it on disk block Content to fill it in , Next time use it directly .Linux The kernel uses a search tree to efficiently manage a large number of pages .

If you have a special need you want to bypass Page Cache, Just set DIRECT_IO That's all right. . There are two situations that need to be bypassed ：

Test disk IO Real performance of
Save the use of Page Cache When the system call falls into kernel state , And copy kernel memory to user process memory to overhead .

file system

In my previous article 《 How much disk space does a new empty file take ？》、《 Understand the principle of formatting 》 It's all about specific file systems . The two most important concepts in a file system are inode and block, We have seen both of them in previous articles . One block How big , This is decided by operation and maintenance when formatting , The general default is 4KB.

except inode and block, Each file system also defines its own actual operation function . For example, in ext4 As defined in ext4_file_operations and ext4_file_inode_operations as follows ：

const struct file_operations ext4_file_operations = {
        .read_iter      = ext4_file_read_iter,
        .write_iter     = ext4_file_write_iter,
        .mmap           = ext4_file_mmap,
        .open           = ext4_file_open,
        ......
};

const struct inode_operations ext4_file_inode_operations = {
        .setattr        = ext4_setattr,
        .getattr        = ext4_file_getattr,
        ......
};

General block layer

The general block layer is all the block devices in a processing system IO The requested kernel module . It defines a name called bio To represent once IO Operation request （include/linux/bio.h）.

So once bio Corresponding to IO The size unit is the page , Or sectors ？ Are not , It's a paragraph ！ Every bio It may contain multiple segments . A segment is a complete page , Or part of the page , Please refer to https://www.ilinuxkernel.com/files/Linux.Generic.Block.Layer.pdf.

Why come up with something so puzzling ？ This is because of the data continuously stored on the disk , When it comes to memory Page Cache The memory may not be continuous . It's normal for this to happen , I can't say that continuous data in the disk, I have to use continuous space to cache in memory . Segment is to make memory available once IO DMA To many “ paragraph ” Address is not continuous in memory .

A common sector / paragraph / The page size comparison is shown in the figure below ：

file

IO Scheduling layer

When the general block layer puts IO After the request was actually sent out , It doesn't have to be executed immediately . Because the scheduling layer will start from the overall situation , Try to make the whole disk IO Maximize performance . The general way to work is to make the head work like an elevator , Go in one direction first , Come back at the end of the day , In this way, the disk efficiency will be higher . The specific algorithms are noop,deadline and cfg etc. .

On your machine , adopt dmesg | grep -i scheduler To check out your Linux Supported algorithms , And you can choose one of them when testing .

The process of reading files

We have Linux IO The various kernel components in the stack are introduced . Now let's go through the whole process of reading files from the beginning

lib Inside read Function first enters the system call sys_read
stay sys_read Enter again VFS Inside vfs_read、generic_file_read Such as function
stay vfs Inside generic_file_read Will determine whether the cache hit , Hit returns
If the kernel is not hit Page Cache Assign a new page box to , Issue page break ,
The kernel initiates blocks to the general block layer I/O request , Block devices block disks 、U The difference between plates
General block layer uses bio Representative I/O Ask to put in IO Request queue
IO The scheduling layer uses the elevator algorithm to schedule the requests in the queue
The driver sends a read command to the disk controller to control ,DMA The method is filled directly into Page Cache New page box in
The controller sends out interrupt notification
The kernel will be what the user needs 1 Bytes filled into user memory
Then your process is awakened

You can see , If Page Cache If you hit it , There's no disk at all IO produce . therefore , Don't think that the performance will be slow if there are several read-write logic in the code . The operating system has been optimized a lot for you , Memory level access latency is about ns Grade , Than mechanical disks IO fast 2-3 An order of magnitude . If you have enough memory , Or your files are accessed frequently enough , In fact, at this time read Very few operations have real disks IO happen .

Let's look at the second situation , If Page Cache If you miss ,Linux How many bytes of disk are actually carried out IO. Whole IO Several kernel components are involved in the process . Each component uses different length blocks to manage disk data .

Page Cache It's in pages ,Linux Page size is usually 4KB（ Avoid being pricked by gods , Here under Linux Can set up large memory pages ）
File systems are managed in blocks . Use dumpe2fs You can see , Generally, a block defaults to 4KB
The general block layer deals with disks in segments IO Requested , A segment is a page or part of a page
IO The scheduler passes through DMA Mode transmission N Sectors to memory , The sector is usually 512 byte
Hard disk also uses “ A sector ” Management and transmission of data

You can see , Although we are really read-only from the user's point of view 1 Bytes （ In the opening code, we only give this disk IO Left a byte of cache ）. But throughout the kernel workflow , The smallest unit of work is the sector of the disk , by 512 byte , Than 1 It's a lot bigger than a byte . in addition block、page cache Higher level components work in larger units , So the actual disk read is a lot of bytes together . If a segment is a memory page , One disk IO Namely 4KB（8 individual 512 Byte sector ） Read together .

Linux What we don't talk about in the kernel is that there is also a complex pre read strategy . therefore , In practice , Maybe it's better than 8 More sectors are transferred to memory together .

Last

The original intention of operating system is to make you simple and reliable , Let's try to think of it as a black box . You want a byte , It gives you a byte , But I did a lot of work in silence . Although most of our domestic development is not at the bottom , But if you're concerned about the performance of your application , You should understand when the operating system quietly improves your performance , How to improve . So that at some time in the future your online server can't bear to hang up , You can quickly find out where the problem lies .

Let's expand , If Page Cache missed , Then there must be disks that drive to the mechanical shaft IO Do you ？

Not necessarily , Why? , Because now the disk itself will carry a cache . In addition, today's servers will build disk arrays , The core hardware in a disk array Raid The card will also integrate RAM As caching . Only when all the caches miss , The mechanical shaft works only with a magnetic head .

file

Development of hard disk album of internal training ：

My official account is 「 Develop internal skill and practice 」, I'm not just talking about technical theory here , It's not just about practical experience . It's about combining theory with practice , Deepen the understanding of theory with practice 、 Use theory to improve your technical practice ability . Welcome to my official account , Please also share with your friends ~~~

版权声明
本文为[Zhang Yanfei Allen]所创，转载请带上原文链接，感谢