当前位置:网站首页>Why is the LS command stuck when there are too many files?
Why is the LS command stuck when there are too many files?
2020-11-06 21:04:00 【Zhang Yanfei Allen】
I don't know if you have ever encountered a folder with many files , Execute below ls
Problems that have to wait a long time to show when ordering ? If there is , Have you ever thought about why , How can we solve ? To get a deeper understanding of the causes of this problem , We need to start with the disk space occupied by the folder .
inode Consumption verification
stay 《 How much disk space does a new empty file take ?》 I mentioned that each file consumes a little bit of space in its folder . Folder , In fact, it will also consume inode Of . Let's take a look at the current inode The occupancy of
# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
......
/dev/sdb1 2147361984 12785020 2134576964 1% /search
Create another empty folder
# mkdir temp
# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
......
/dev/sdb1 2147361984 12785021 2134576963 1% /search
adopt IUsed You can see , Like an empty file , Empty folders also consume one inode. But this one is very small , It's on my machine 256 It's just bytes , It should not have caused ls
Order the culprit of the card owner .
block Consumption verification
Where is the name of the folder ? Um. , and 《 How much disk space does a new empty file take ?》 The files in are similar to , Will consume one ext4_dir_entry_2
( Use today ext4 give an example , It's in linux Source code fs/ext4/ex4.h The document defines ), Put it in its parent directory block In the . According to this , I believe you can think of it soon , If it creates a bunch of files under its own node , It will take its own block. Let's start to verify :
# mkdir test
# cd test
# du -h
4.0K .
there 4KB It means that one is consumed block. Empty files don't consume block, Why is an empty directory consumed in the first place block What about it , That's because it has to default with two directory entries "." and "..". And this one 4K It doesn't have to be that big on your machine , It's actually a block size, It was decided when you formatted .
Let's create two more empty files , Check it again :
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# du -h
4.0K .
Looks like , Nothing has changed . This is because
- First of all 、 The new empty file does not occupy block, So what is shown here is still occupied by the directory block.
- second 、 Previously assigned when the folder was created 4KB There's still space in it , There's enough space for these two file items
So I'll try to create more , Using scripts to create 100 File names with a length of 32Byte Empty file .
#!/bin/bash
for((i=1;i<=100;i++));
do
file="tempDir/"$(echo $i|awk '{printf("%032d",$0)}')
echo $file
touch $file
done
# du -h
12K .
ha-ha , At this time, we found that the disk space occupied by the directory has increased , a 3 individual Block 了 . When we create 10000 When it comes to documents ,
# du -h
548K .
In every one of them ext4_dir_entry_2
Except for the file name , It also records inode Number and other information , The detailed definition is as follows :
struct ext4_dir_entry_2 {
__le32 inode; /* Inode number */
__le16 rec_len; /* Directory entry length */
__u8 name_len; /* Name length */
__u8 file_type;
char name[EXT4_NAME_LEN]; /* File name */
};
Let's calculate , Average space per file =548K/10000=54 byte . in other words , Than our file name 32 A little bit bigger , Basically right . Here we also get a fact , The longer the file name , The more space it consumes in its parent directory .
This paper concludes that
A folder, of course, also consumes disk space .
- The first thing to do is to consume one inode, On my machine, it's 256 byte
- Need to consume a directory entry under its parent directory
ext4_dir_entry_2
, Save yourself inode Number , Directory name . - Below it, if you create a folder or file , It needs to be in its own block in
ext4_dir_entry_2
Array
A file in a directory / The more subdirectories , The more you need to apply for a catalog block. in addition ext4_dir_entry_2
Size is not fixed , file name / The longer the subdirectory name is , The more space a single directory entry consumes .
For the opening question , I think you should understand by now why , The problem is in the folder block On the body . This is when there are so many files under your folder , Especially when the file name is long , It will consume a lot of block. When you traverse the folder , If Page Cache You didn't hit what you were going to visit block, It will penetrate into the disk and do the actual IO. From your point of view , It's when you're done ls
after , Get stuck .
So you're sure to ask , I really want to save a lot of documents , What am I gonna do? ? It's also very simple , Just create more folders , Don't save too much in one directory , There won't be such a problem . In engineering practice , The general way is to go through the first level or even the second level hash Hash files into multiple directories , Control the number of single directory files to 100000 or less .
ext Of bug
It seems that today's practice should be over , Now let's delete all the files we just created , Look again. .
# rm -f *
# du -h
72K .
wait , What circumstance ? The files in the folder have been deleted , Why does this folder still occupy 72K Of disk space ? This doubt has been with me for a long time , It was only later that I was able to solve the puzzle . The key is ext4_dir_entry_2
Medium rec_len
. This variable stores the current entire ext4_dir_entry_2
Length of object , So when the operating system traverses the folder , You can pass the current pointer , Add this length to find the next file in the folder dir_entry
了 . The advantage is that traversal is very convenient , It's kind of like a linked list , One by one . however , If you want to delete a file , It's a little bit of a hassle , The current file structure variable cannot be deleted directly , Otherwise, the list will be broken . Linux When you delete a file , In its catalog, it just put inode Set to 0 Just pull it down , It didn't recycle the whole thing ext4_dir_entry_2
object . In fact, we often use the false deletion when we do the project . current xfs The file system doesn't seem to have this little problem anymore , But how to solve it , There is no in-depth study for the time being , If you have an answer , Welcome to leave a message !
Development of hard disk album of internal training :
- 1. Disk opening : Take off the hard coat of the mechanical hard disk !
- 2. Disk partitioning also implies technical skills
- 3. How can we solve the problem that mechanical hard disks are slow and easy to break down ?
- 4. Disassemble the SSD structure
- 5. How much disk space does a new empty file take ?
- 6. Only 1 How much disk space does a byte file actually take up
- 7. When there are too many documents ls Why is the command stuck ?
- 8. Understand the principle of formatting
- 9.read How much disk does a byte of file actually take place on IO?
- 10.write When to write to disk after one byte of file IO?
- 11. Mechanical hard disk random IO Slower than you think
- 12. How much faster is a server equipped with a SSD than a mechanical hard disk ?
My official account is 「 Develop internal skill and practice 」, I'm not just talking about technical theory here , It's not just about practical experience . It's about combining theory with practice , Deepen the understanding of theory with practice 、 Use theory to improve your technical practice ability . Welcome to my official account , Please also share with your friends ~~~
版权声明
本文为[Zhang Yanfei Allen]所创,转载请带上原文链接,感谢
边栏推荐
- Even liver three all night, jvm77 high frequency interview questions detailed analysis, this?
- Summary of front-end performance optimization that every front-end engineer should understand:
- The legality of IPFs / filecoin: protecting personal privacy from disclosure
- Bitcoin once exceeded 14000 US dollars and is about to face the test of the US election
- 美团内部讲座|周烜:华东师范大学的数据库系统研究
- Python basic variable type -- list analysis
- image operating system windows cannot be used on this platform
- 【:: 是什么语法?】
- 開源一套極簡的前後端分離專案腳手架
- 2020年第四届中国 BIM (数字建造)经理高峰论坛即将在杭举办
猜你喜欢
What are manufacturing and new automation technologies?
How to play sortable JS vuedraggable to realize nested drag function of forms
An article will take you to understand CSS3 fillet knowledge
行为型模式之解释器模式
Those who have worked in China for six years and a million annual salary want to share these four points with you
The legality of IPFs / filecoin: protecting personal privacy from disclosure
What is the meaning of sector sealing of filecoin mining machine since the main network of filecoin was put online
An article takes you to understand CSS3 picture border
What are the criteria for selecting a cluster server?
JVM内存分配 -Xms128m -Xmx512m -XX:PermSize=128m -XX:MaxPermSize=512m
随机推荐
Using an example to understand the underlying processing mechanism of JS function
小游戏云开发入门
What course of artificial intelligence? Will it replace human work?
Markdown tricks
美团内部讲座|周烜:华东师范大学的数据库系统研究
Summary of front-end performance optimization that every front-end engineer should understand:
FastThreadLocal 是什么鬼?吊打 ThreadLocal 的存在!!
Will blockchain be the antidote to the global epidemic accelerating the transformation of Internet enterprises?
(2) ASP.NET Core3.1 Ocelot routing
The method of realizing high SLO on large scale kubernetes cluster
What knowledge do Python automated testing learn?
Asp.Net Core learning notes: Introduction
Use modelarts quickly, zero base white can also play AI!
嘉宾专访|2020 PostgreSQL亚洲大会阿里云数据库专场:王涛
開源一套極簡的前後端分離專案腳手架
jenkins安装部署过程简记
Behind the record breaking Q2 revenue of Alibaba cloud, the cloud opening mode is reshaping
Analysis of serilog source code -- how to use it
An article takes you to understand CSS pagination examples
Digital city responds to relevant national policies and vigorously develops the construction of digital twin platform