How much disk IO actually happens when you read one byte from a file?
2020-11-08 16:12:00 【Zhang Yanfei Allen】
Whatever language you use, whether C, PHP, Go, or Java, you have almost certainly read a file at some point. Consider two questions about reading a single byte from a file:
- Does any disk IO happen?
- If it does, how many bytes does Linux actually read from the disk?
To make the question concrete, here is the C code:
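#include <fcntl.h>
#include <unistd.h>

int main()
{
    char c;
    int in;
    in = open("in.txt", O_RDONLY);  /* open the file read-only */
    if (in < 0)
        return 1;
    read(in, &c, 1);                /* ask for exactly one byte */
    close(in);
    return 0;
}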
Unless you do C/C++ development, it is hard to reason about this question in depth. The mainstream languages in common use today, such as PHP, Java, and Go, wrap IO at a fairly high level and hide most kernel details. To answer the two questions above, we need to open up Linux and look at its IO stack from the inside.
Introduction to the Linux IO stack
Without further ado, here is a simplified drawing of the Linux IO stack (for the official IO stack diagram, see Linux.IO.stack_v1.0.pdf):
Earlier articles in this series covered the hardware layer and the file system module shown in the diagram. Looking at the full IO stack, though, our understanding of Linux file IO is still far from complete: several kernel components, including the IO engine, VFS, the Page Cache, the generic block layer, and the IO scheduling layer, remain unfamiliar. Don't worry, let's go through them one by one:
IO engine
When we application developers want to read or write a file, the library layer offers many functions to choose from, such as read, write, and mmap. Choosing among them is really choosing one of the IO engines that Linux provides. The read and write functions we use most often belong to the sync engine. Besides sync, there are also mmap, psync, vsync, libaio, posixaio, and others. sync and psync are synchronous, while libaio and posixaio are asynchronous IO.
Of course, an IO engine also depends on lower layers such as VFS and the generic block layer. With the sync engine, the read function enters the read system call provided by VFS.
VFS (virtual file system)
The first thing we meet in the kernel layer is VFS. The idea behind VFS is to abstract a generic file system model and provide a common set of interfaces to developers and users, so that we do not need to care about the specific file system implementation. VFS provides four core data structures, defined in include/linux/fs.h and include/linux/dcache.h in the kernel source.
- superblock: records information about a specific mounted file system
- inode: every file in Linux has an inode, which you can think of as the file's ID card
- file: the in-memory file object, which records the association between a process and a file on disk
- dentry: a directory entry, that is, one component of a path; all dentry objects linked together form the Linux directory tree
Around these four core data structures, VFS also defines a set of operations. For example, the operations on an inode are defined in inode_operations (include/linux/fs.h), which includes the familiar mkdir, rename, and so on.
struct inode_operations {
    ......
    int (*link) (struct dentry *,struct inode *,struct dentry *);
    int (*unlink) (struct inode *,struct dentry *);
    int (*mkdir) (struct inode *,struct dentry *,umode_t);
    int (*rmdir) (struct inode *,struct dentry *);
    int (*rename) (struct inode *, struct dentry *,
            struct inode *, struct dentry *, unsigned int);
    ......
};
The operations for file are defined in file_operations, which includes the read and write we use all the time:
struct file_operations {
    ......
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ......
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
    ......
};
Page Cache
Looking down from VFS, the next thing we notice is the Page Cache. It is the main disk cache used by the Linux kernel and works purely in memory; its job is to speed up access to relatively slow disks. If the file block you want to access is already in the Page Cache, no actual disk IO happens. If it is not, the kernel allocates a new page, raises a page fault, fills the page with the block contents read from disk, and uses the cached copy directly next time. The Linux kernel uses a search tree to manage the large number of pages efficiently.
If you have a special need to bypass the Page Cache, opening the file with direct IO (the O_DIRECT flag) is enough. There are two situations where bypassing it is useful:
- Measuring the real performance of disk IO
- Saving the overhead that comes with the Page Cache, namely entering kernel mode on the system call and copying data from kernel memory into the user process's memory
File system
My earlier articles "How much disk space does a new empty file take?" and "Understand the principle of formatting" both covered specific file systems. The two most important concepts in a file system are the inode and the block, both of which appeared in those articles. The block size is chosen by the administrator at format time and generally defaults to 4KB.
Besides inodes and blocks, each file system also defines its own concrete operation functions. For example, ext4 defines ext4_file_operations and ext4_file_inode_operations as follows:
const struct file_operations ext4_file_operations = {
    .read_iter = ext4_file_read_iter,
    .write_iter = ext4_file_write_iter,
    .mmap = ext4_file_mmap,
    .open = ext4_file_open,
    ......
};
const struct inode_operations ext4_file_inode_operations = {
    .setattr = ext4_setattr,
    .getattr = ext4_file_getattr,
    ......
};
Generic block layer
The generic block layer is the kernel module that handles IO requests for all block devices in the system. It defines a data structure named bio to represent one IO operation request (include/linux/bio.h).
So is the unit of IO in a bio a page, or a sector? Neither: it is a segment! Each bio may contain multiple segments. A segment is a whole page or part of a page; for details see https://www.ilinuxkernel.com/files/Linux.Generic.Block.Layer.pdf.
Why invent something so confusing? Because data that is stored contiguously on disk may not be contiguous in memory once it is in the Page Cache. This is perfectly normal: there is no rule that data contiguous on disk must be cached in contiguous memory. Segments let a single IO transfer data by DMA into multiple "segments" whose addresses are not contiguous in memory.
A comparison of typical sector, segment, and page sizes is shown in the figure below:
IO scheduling layer
After the generic block layer issues an IO request, it is not necessarily executed immediately. The scheduling layer takes a global view and tries to maximize overall disk IO performance. The usual approach is to move the disk head like an elevator: travel in one direction first, then turn back at the end, which makes the disk more efficient. Specific algorithms include noop, deadline, cfq, and so on.
On your machine, you can run dmesg | grep -i scheduler to see which algorithms your Linux supports, and pick one of them when testing.
The process of reading a file
We have now introduced the kernel components in the Linux IO stack. Let's walk through the whole process of reading a file from the beginning:
- The read function in the library first enters the sys_read system call
- sys_read then calls into VFS functions such as vfs_read and generic_file_read
- Inside VFS, generic_file_read checks whether the Page Cache is hit, and returns if it is
- On a miss, the kernel allocates a new page frame in the Page Cache and raises a page fault
- The kernel issues a block I/O request to the generic block layer; the block device abstraction hides the differences between disks, USB drives, and so on
- The generic block layer puts the I/O request, represented by a bio, into the IO request queue
- The IO scheduling layer schedules the requests in the queue using an elevator algorithm
- The driver sends a read command to the disk controller, and the data is filled by DMA directly into the new page frame in the Page Cache
- The controller raises an interrupt to signal completion
- The kernel copies the 1 byte the user asked for into user memory
- Then your process is woken up
As you can see, if the Page Cache is hit, no disk IO is generated at all. So don't assume that a few pieces of file read/write logic in your code will necessarily make it slow. The operating system has already optimized a lot for you: memory access latency is on the order of nanoseconds, several orders of magnitude faster than mechanical disk IO. If you have enough memory, or your files are accessed frequently enough, very few read operations actually cause real disk IO.
Now the second case: if the Page Cache misses, how many bytes of disk IO does Linux actually perform? Several kernel components take part in the IO process, and each manages disk data in units of a different size.
- The Page Cache works in pages; the Linux page size is usually 4KB (before the experts object: yes, Linux can also be configured with huge pages)
- The file system manages data in blocks. You can check with dumpe2fs; a block generally defaults to 4KB
- The generic block layer handles disk IO requests in segments; a segment is a page or part of a page
- The IO scheduler transfers N sectors to memory by DMA; a sector is usually 512 bytes
- The hard disk also manages and transfers data in sectors
As you can see, from the user's point of view we really read only 1 byte (in the opening code we left just one byte of buffer for this disk IO). But across the whole kernel workflow, the smallest unit of work is the disk sector, 512 bytes, far larger than 1 byte. Higher-level components such as the block and the Page Cache work in even larger units, so the actual disk read fetches many bytes at once. If a segment is one memory page, a single disk IO reads 4KB (eight 512-byte sectors) together.
What we have not covered is that the Linux kernel also has a sophisticated readahead strategy. So in practice, more than 8 sectors may be transferred into memory in one go.
Finally
The operating system is designed to be simple and reliable for you to use, so that you can treat it as a black box. You ask for one byte and it gives you one byte, but it quietly does a great deal of work behind the scenes. Most of us do not work on low-level development, but if you care about your application's performance, you should understand when and how the operating system quietly improves it. Then, when one of your production servers buckles under load someday, you can quickly find where the problem lies.
One more extension: if the Page Cache misses, must there be disk IO that actually reaches the mechanical spindle?
Not necessarily. Why? Because modern disks carry their own cache. In addition, today's servers usually build disk arrays, and the RAID card, the core hardware of a disk array, also integrates RAM as a cache. Only when all of these caches miss do the spindle and the head actually have to work.
The hard disk album of 「Develop internal skill and practice」:
- 1. Opening up a disk: taking the hard shell off a mechanical hard disk!
- 2. Disk partitioning also hides technical tricks
- 3. How do we deal with mechanical hard disks being slow and failure-prone?
- 4. Disassembling the structure of an SSD
- 5. How much disk space does a new empty file take?
- 6. How much disk space does a file with only 1 byte actually take up?
- 7. Why does the ls command hang when there are too many files?
- 8. Understand the principle of formatting
- 9. How much disk IO actually happens when you read one byte of a file?
- 10. When is the disk IO actually issued after you write one byte to a file?
- 11. Random IO on mechanical hard disks is slower than you think
- 12. How much faster is a server equipped with an SSD than one with a mechanical hard disk?
My official account is 「Develop internal skill and practice」. Here I don't only talk about technical theory, and I don't only talk about practical experience: I combine the two, using practice to deepen the understanding of theory and theory to improve practical skills. You are welcome to follow my official account, and please share it with your friends.
Copyright notice
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thank you.