How much disk IO actually happens when you read one byte from a file?

2020-11-08 16:12:00 Zhang Yanfei Allen

Whatever language you use, C, PHP, Go or Java, you have almost certainly read a file at some point. Consider two questions. If we read a single byte from a file:

  • Does disk IO happen?
  • If it does, how many bytes does Linux actually read from disk?

To make the question concrete, here is the C code:

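#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	char    c;
	int     in;

	in = open("in.txt", O_RDONLY);
	if (in < 0)
		return 1;	/* in.txt could not be opened */
	read(in, &c, 1);	/* ask for exactly one byte */
	close(in);
	return 0;
}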

If you don't do C/C++ development, it is hard to understand this problem in depth. The mainstream languages in common use today, such as PHP, Java and Go, sit at a fairly high level of abstraction and hide most kernel details. To answer the two questions properly, we need to open Linux up and look at its IO stack from the inside.

Introduction to the Linux IO stack

Without further ado, here is a simplified diagram of the Linux IO stack (for the full official version, see Linux.IO.stack_v1.0.pdf):

[Figure: simplified Linux IO stack]

Earlier articles covered the hardware layer in the figure above, as well as the file system module. But the IO stack shows that our understanding of Linux file IO is still far from complete: there are several kernel components we know little about, namely the IO engine, VFS, Page Cache, generic block layer and IO scheduling layer. Don't worry, let's go through them one by one:

IO engine

When we developers want to read or write files, the library layer offers many functions to choose from, such as read, write and mmap. Choosing among them really means choosing one of the IO engines Linux provides. The read and write functions we use most often belong to the sync engine. Besides sync there are also mmap, psync, vsync, libaio, posixaio and others. sync and psync are synchronous, while libaio and posixaio are asynchronous IO.

Of course, an IO engine can only work with support from the lower layers such as VFS and the generic block layer. With the sync engine, the read function enters the read system call provided by VFS.
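
To make the difference between two of these engines concrete, here is a minimal sketch. The names sync and psync match the ioengine names of the fio benchmark tool, where sync means read/write and psync means pread/pwrite. The function names read_byte_sync and read_byte_psync are made up for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* "sync" style: read() uses and advances the offset stored in the open file. */
ssize_t read_byte_sync(int fd, char *out)
{
    assert(out != NULL);
    return read(fd, out, 1);
}

/* "psync" style: pread() takes an explicit offset and leaves the
 * offset of the open file untouched. */
ssize_t read_byte_psync(int fd, off_t offset, char *out)
{
    assert(out != NULL);
    return pread(fd, out, 1, offset);
}
```

Both calls are synchronous and both end up in the same VFS read path; the only difference is who keeps track of the file position.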

VFS virtual file system

In the kernel, the first layer we meet is VFS. The idea behind VFS is to abstract a generic file system model and offer developers and users one common set of interfaces, so that we don't need to care about the implementation of any specific file system. VFS provides four core data structures, defined in include/linux/fs.h and include/linux/dcache.h in the kernel source.

  • superblock: describes a specific mounted file system
  • inode: every file in Linux has an inode; you can think of the inode as the file's ID card
  • file: the in-memory file object, which records the association between a process and a file on disk
  • dentry: a directory entry, one component of a path; all dentry objects are linked together into the Linux directory tree
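
The split between file and inode can be observed from user space. The sketch below assumes a POSIX system; the function name two_opens_share_inode is hypothetical. Each open() call gets its own file object, yet fstat reports the same device and inode number for both descriptors.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if two independent open() calls on path reach the same inode,
 * 0 if they do not, and -1 on error. */
int two_opens_share_inode(const char *path)
{
    assert(path != NULL);

    int a = open(path, O_RDONLY);
    int b = open(path, O_RDONLY);
    struct stat sa, sb;
    int result = -1;

    if (a >= 0 && b >= 0 && fstat(a, &sa) == 0 && fstat(b, &sb) == 0)
        result = (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);

    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return result;
}
```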

Around these four core data structures, VFS also defines a series of operations. For example, the operations on an inode are defined in inode_operations (include/linux/fs.h), which contains the familiar mkdir, rename and so on.

struct inode_operations {
        ......
        int (*link) (struct dentry *,struct inode *,struct dentry *);
        int (*unlink) (struct inode *,struct dentry *);
        int (*mkdir) (struct inode *,struct dentry *,umode_t);
        int (*rmdir) (struct inode *,struct dentry *);
        int (*rename) (struct inode *, struct dentry *,
                        struct inode *, struct dentry *, unsigned int);
        ......
};

The operations for file, file_operations, define the read and write we use all the time:


struct file_operations {
        ......
        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
        ......
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*flush) (struct file *, fl_owner_t id);
        ......
};
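
The operations-table idea is easier to see in a small user-space analogy. This is not kernel code: every name below (demo_file, demo_file_ops, memfs_read, demo_vfs_read) is hypothetical, and the "file system" just serves bytes from a string. The point is only that the generic entry point calls through a function pointer and knows nothing about the implementation behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct demo_file;

/* Hypothetical, much-reduced counterpart of struct file_operations. */
struct demo_file_ops {
    ssize_t (*read)(struct demo_file *f, char *buf, size_t len);
};

struct demo_file {
    const struct demo_file_ops *ops; /* filled in by the "file system" */
    const char *data;                /* backing bytes for this demo */
    size_t size;
    size_t pos;
};

/* One concrete "file system" implementation of read. */
static ssize_t memfs_read(struct demo_file *f, char *buf, size_t len)
{
    size_t left = f->size - f->pos;

    if (len > left)
        len = left;
    memcpy(buf, f->data + f->pos, len);
    f->pos += len;
    return (ssize_t)len;
}

const struct demo_file_ops memfs_ops = { .read = memfs_read };

/* Generic entry point: it only knows the table, not the implementation. */
ssize_t demo_vfs_read(struct demo_file *f, char *buf, size_t len)
{
    assert(f != NULL && buf != NULL);
    if (f->ops == NULL || f->ops->read == NULL)
        return -1;
    return f->ops->read(f, buf, len);
}
```

A second "file system" would only need to supply its own table; demo_vfs_read would stay unchanged.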

Page Cache

Below VFS we find the Page Cache. It is the main disk cache used by the Linux kernel and lives purely in memory; its job is to speed up access to relatively slow disks. If the file block you want to access happens to be in the Page Cache, no actual disk IO happens. If it is not, the kernel allocates a new page, reads the block's content from disk to fill it, and serves the next access directly from memory. The Linux kernel uses a search tree to manage the large number of pages efficiently.
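
Whether a file's first page currently sits in the Page Cache can be probed from user space with mmap and mincore. This is a minimal Linux-only sketch; the function name first_page_cached is hypothetical. One caveat, as far as I know: recent kernels may report pages as resident when the caller could not open the file for writing, so the answer is most trustworthy for files you own.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the first page of the file is resident in memory,
 * 0 if it is not, and -1 on error (including an empty file). */
int first_page_cached(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    /* Mapping does not read the file; it only lets mincore() inspect it. */
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (addr == MAP_FAILED)
        return -1;

    unsigned char vec = 0; /* one status byte per page; we ask about one page */
    int rc = mincore(addr, len, &vec);
    munmap(addr, len);
    return rc == 0 ? (vec & 1) : -1;
}
```

Calling it before and after the one-byte read from the opening example is a simple way to watch the cache fill.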

If you have a special need to bypass the Page Cache, open the file with the O_DIRECT flag (direct IO). There are two situations where bypassing it makes sense:

  • Testing the real performance of disk IO
  • Saving the overhead that going through the Page Cache adds, in particular copying the data from kernel memory to the user process's memory
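
A direct read looks roughly like the sketch below. It is Linux-specific and makes one assumption worth stating loudly: 4096 bytes is taken to satisfy the alignment rule of the underlying device, which is common but not guaranteed. The names read_direct and DIRECT_ALIGN are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* O_DIRECT is a Linux extension */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define DIRECT_ALIGN 4096 /* assumed to satisfy the device's alignment rule */

/* Reads the first DIRECT_ALIGN bytes of path while bypassing the Page Cache.
 * Returns the byte count from read(), or -1 on error, for example when the
 * file system does not support O_DIRECT. */
ssize_t read_direct(const char *path)
{
    assert(path != NULL);

    void *buf = NULL;
    ssize_t n;
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    /* Buffer address, length and file offset must all be suitably aligned. */
    if (posix_memalign(&buf, DIRECT_ALIGN, DIRECT_ALIGN) != 0) {
        close(fd);
        return -1;
    }

    n = read(fd, buf, DIRECT_ALIGN);
    free(buf);
    close(fd);
    return n;
}
```

Note that a one-byte read like the opening example is not possible this way: with direct IO the request itself has to be a multiple of the alignment unit.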

File system

My earlier articles 《How much disk space does a new empty file take?》 and 《Understand the principle of formatting》 were both about specific file systems. The two most important concepts in a file system are the inode and the block, and both appeared in those articles. The size of a block is chosen when the file system is formatted; the usual default is 4KB.

Besides inode and block, each file system also defines its own concrete operation functions. For example, ext4 defines ext4_file_operations and ext4_file_inode_operations as follows:

const struct file_operations ext4_file_operations = {
        .read_iter      = ext4_file_read_iter,
        .write_iter     = ext4_file_write_iter,
        .mmap           = ext4_file_mmap,
        .open           = ext4_file_open,
        ......
};

const struct inode_operations ext4_file_inode_operations = {
        .setattr        = ext4_setattr,
        .getattr        = ext4_file_getattr,
        ......
};

Generic block layer

The generic block layer is the kernel module that handles IO requests for all block devices in the system. It defines a data structure called bio to represent one IO operation request (include/linux/bio.h).

So is the unit of IO in a bio a page, or a sector? Neither: it is a segment! Each bio may contain multiple segments. A segment is a whole page or part of a page; for details see https://www.ilinuxkernel.com/files/Linux.Generic.Block.Layer.pdf.

Why introduce such a puzzling concept? Because data stored contiguously on disk may not be contiguous once it is in the Page Cache in memory. That is perfectly normal: we can't insist that data contiguous on disk must be cached in contiguous memory. Segments exist so that a single IO can be transferred by DMA into several “segments” of memory whose addresses are not contiguous.
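
User space has a close cousin of this idea: readv fills several separate buffers from one contiguous range of a file in a single call. The sketch below is only an analogy for bio segments, not the kernel mechanism itself, and scatter_read is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Reads a_len + b_len contiguous bytes from the start of path into two
 * unrelated buffers with one readv() call. Returns bytes read or -1. */
ssize_t scatter_read(const char *path, void *a, size_t a_len,
                     void *b, size_t b_len)
{
    assert(path != NULL && a != NULL && b != NULL);

    struct iovec segments[2] = {
        { .iov_base = a, .iov_len = a_len },
        { .iov_base = b, .iov_len = b_len },
    };
    ssize_t n;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    n = readv(fd, segments, 2);
    close(fd);
    return n;
}
```

Each iovec entry plays the role of one segment: an address plus a length, with no requirement that the entries sit next to each other in memory.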

A comparison of typical sector, segment and page sizes is shown in the figure below:

[Figure: size comparison of sector, segment and page]

IO scheduling layer

After the generic block layer has actually submitted an IO request, it is not necessarily executed immediately. The scheduling layer takes a global view and tries to maximize overall disk IO performance. The general approach is to make the disk head work like an elevator: travel in one direction first and turn back only at the end, which makes the disk more efficient. Concrete algorithms include noop, deadline and cfq.

On your machine, run dmesg | grep -i scheduler to see which algorithms your Linux supports, and pick one of them when testing.
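
The scheduler in use for one device can also be read from sysfs, where /sys/block/<dev>/queue/scheduler lists the available algorithms and marks the active one with square brackets. Here is a small sketch; the function names are hypothetical, and the device name you pass in (for example sda) is an assumption about your machine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies the bracketed (active) scheduler name from a line such as
 * "noop deadline [cfq]" into out. Returns 0 on success, -1 otherwise. */
int active_scheduler(const char *line, char *out, size_t out_len)
{
    assert(line != NULL && out != NULL);

    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    size_t len;

    if (rb == NULL || out_len == 0)
        return -1;
    len = (size_t)(rb - lb - 1);
    if (len >= out_len)
        return -1;
    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}

/* Reads /sys/block/<dev>/queue/scheduler for the given device name. */
int scheduler_of(const char *dev, char *out, size_t out_len)
{
    char path[128], line[256];
    FILE *fp;
    int rc = -1;

    snprintf(path, sizeof path, "/sys/block/%s/queue/scheduler", dev);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) != NULL)
        rc = active_scheduler(line, out, out_len);
    fclose(fp);
    return rc;
}
```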

The process of reading files

We have now introduced each kernel component of the Linux IO stack. Let's walk through the whole process of reading a file from the start:

  • The read function in the library first enters the system call sys_read
  • sys_read then calls into VFS functions such as vfs_read and generic_file_read
  • Inside VFS, generic_file_read checks whether the cache is hit; on a hit it returns
  • On a miss, the kernel allocates a new page frame in the Page Cache for the data
  • The kernel issues a block I/O request to the generic block layer; the block device abstraction hides the difference between disks, USB drives and so on
  • The generic block layer represents the I/O request as a bio and puts it in the IO request queue
  • The IO scheduling layer schedules the requests in the queue with an elevator algorithm
  • The driver sends a read command to the disk controller, and the data is transferred by DMA directly into the new page frame in the Page Cache
  • The controller raises an interrupt to signal completion
  • The kernel copies the 1 byte the user asked for into user memory
  • Your process is then woken up

As you can see, if the Page Cache is hit, no disk IO happens at all. So don't assume that a few reads and writes in your code will make it slow. The operating system already does a lot of optimization for you: memory access latency is on the order of nanoseconds, several orders of magnitude faster than mechanical disk IO, which takes milliseconds. If you have enough memory, or your files are accessed frequently enough, very few read operations cause real disk IO.
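
Linux lets a process check this for itself. Where the kernel has IO accounting enabled, /proc/self/io reports rchar (bytes the process asked for through read-style calls) and read_bytes (bytes actually fetched from the storage layer). The sketch below parses those counters; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Finds "key: value" in text formatted like /proc/self/io.
 * Returns the value, or -1 if the key is absent. */
long long io_counter(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);

    size_t key_len = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
            long long value;
            if (sscanf(line + key_len + 1, "%lld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n'); /* move to the start of the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Reads one counter for the calling process, or -1 if unavailable. */
long long self_io_counter(const char *key)
{
    char text[512];
    size_t n;
    FILE *fp = fopen("/proc/self/io", "r");

    if (fp == NULL)
        return -1;
    n = fread(text, 1, sizeof text - 1, fp);
    fclose(fp);
    text[n] = '\0';
    return io_counter(text, key);
}
```

Sampling self_io_counter("read_bytes") before and after the one-byte read should show no change on a cache hit, while rchar grows by one either way.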

Now the second case: if the Page Cache misses, how many bytes of disk IO does Linux actually perform? Several kernel components take part in the IO process, and each manages disk data in units of a different size.

  • The Page Cache works in pages; the Linux page size is usually 4KB (to pre-empt nitpicks: Linux can also be configured with huge pages)
  • File systems manage data in blocks. dumpe2fs shows that a block usually defaults to 4KB
  • The generic block layer handles disk IO requests in segments; a segment is a page or part of a page
  • The IO scheduler transfers N sectors to memory by DMA; a sector is usually 512 bytes
  • The hard disk also manages and transfers data in sectors

As you can see, from the user's point of view we really read only 1 byte (in the opening code we left this disk IO a buffer of just one byte). But across the whole kernel workflow, the smallest unit of work is the disk sector, which at 512 bytes is much larger than 1 byte. Higher-level components such as the block and the Page Cache work in even larger units, so the actual disk read fetches many bytes together. If a segment is one memory page, a single disk IO reads 4KB (eight 512-byte sectors) at once.
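
The arithmetic behind those numbers is plain round-up division. A tiny sketch, with hypothetical function names and the 512-byte sector and 4KB page sizes from the text passed in as parameters:

```c
#include <assert.h>
#include <stddef.h>

/* Rounds a request up to whole units (sectors, blocks or pages). */
size_t units_needed(size_t bytes, size_t unit_size)
{
    assert(unit_size > 0);
    return (bytes + unit_size - 1) / unit_size;
}

/* Bytes moved from disk when the request is served in whole pages. */
size_t bytes_transferred(size_t bytes, size_t page_size)
{
    return units_needed(bytes, page_size) * page_size;
}
```

For the one-byte read, bytes_transferred(1, 4096) gives 4096, and units_needed(4096, 512) gives the 8 sectors mentioned above.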

What we have not covered is that the Linux kernel also has a sophisticated read-ahead strategy. So in practice, more than 8 sectors may be transferred to memory together.

Finally

The operating system is designed to be simple and reliable to use, so we tend to treat it as a black box. You ask for one byte and it gives you one byte, but it silently does a lot of work behind the scenes. Most of us don't work on low-level code, but if you care about your application's performance, you should understand when and how the operating system quietly improves it. Then, when one day your production server can't cope and goes down, you can quickly find where the problem lies.

One more extension: if the Page Cache misses, is there necessarily disk IO that reaches the mechanical spindle?

Not necessarily. Disks today carry their own cache. In addition, today's servers usually use disk arrays, and the RAID card at the heart of a disk array also integrates RAM as a cache. Only when all of these caches miss do the spindle and the head actually have to work.



My official account is 「Develop internal skill and practice」. It is not only about technical theory, nor only about practical experience: it is about combining the two, deepening the understanding of theory through practice and using theory to improve practical skills. You are welcome to follow the account and share it with your friends.

Copyright notice
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thanks.