Why does the ls command hang when there are too many files?
2020-11-06 21:04:00 【Zhang Yanfei Allen】
Have you ever run ls in a folder containing a huge number of files and had to wait a long time before anything was displayed? If so, have you wondered why, and how to fix it? To understand the cause properly, we need to start with the disk space a folder itself occupies.
Verifying inode consumption
In 《How much disk space does a new empty file take?》 I mentioned that every file consumes a little space in the folder that contains it. A folder also consumes an inode. Let's first look at the current inode usage:
# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
......
/dev/sdb1 2147361984 12785020 2134576964 1% /search
Now create an empty folder:
# mkdir temp
# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
......
/dev/sdb1 2147361984 12785021 2134576963 1% /search
The IUsed column shows that, like an empty file, an empty folder consumes one inode. But an inode is small (256 bytes on my machine), so it is unlikely to be the culprit behind ls hanging.
Verifying block consumption
Where is the folder's name stored? As with the files in 《How much disk space does a new empty file take?》, it consumes one ext4_dir_entry_2 (using ext4 as today's example; the structure is defined in fs/ext4/ext4.h in the Linux source), which is stored in a block of its parent directory. From this you can probably guess that when a folder has many files created under it, their entries consume the folder's own blocks. Let's verify:
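The same before-and-after check can be scripted. The following is a minimal Python sketch, not part of the original experiment: it reads the inode counters that `df -i` reports via `os.statvfs`. The function name `inode_usage` is my own, and some filesystems report zero for these counters.

```python
import os

def inode_usage(path="."):
    """Return (total, used, free) inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes on the filesystem
    free = st.f_ffree    # number of free inodes
    return total, total - free, free

total, used, free = inode_usage(".")
print("Inodes:", total, "IUsed:", used, "IFree:", free)
```

Calling `inode_usage` before and after `os.mkdir("temp")` on an otherwise idle ext4 filesystem should show the used count rising by one, matching the `df -i` output above.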
# mkdir test
# cd test
# du -h
4.0K .
The 4KB here means one block has been consumed. An empty file consumes no block, so why does an empty directory consume one from the start? Because it must carry two default directory entries, "." and "..". The 4K is not necessarily the same on your machine: it is the block size, which was fixed when the filesystem was formatted.
Let's create two empty files and check again:
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# du -h
4.0K .
Nothing seems to have changed. There are two reasons:
- First, a new empty file occupies no block, so what is shown is still just the block occupied by the directory.
- Second, the 4KB allocated when the folder was created still has free space, enough to hold these two directory entries.
So let's create more: a script that creates 100 empty files whose names are each 32 bytes long.
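#!/bin/bash
# Create 100 empty files with zero-padded, 32-character names.
# Writes into tempDir/, which must already exist.
for ((i = 1; i <= 100; i++)); do
    file=$(printf 'tempDir/%032d' "$i")
    echo "$file"
    touch "$file"
done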
# du -h
12K .
Now the disk space occupied by the directory has grown to 3 blocks. And when we create 10000 files:
# du -h
548K .
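For readers who want to repeat the measurement without typing the commands by hand, here is a rough Python sketch. It assumes Linux, where `st_blocks` counts 512-byte units; the function names are illustrative, and the numbers depend on the filesystem (tmpfs, for instance, may report 0 for directories).

```python
import os
import tempfile

def dir_usage_bytes(path):
    """Bytes allocated to the directory itself (assumes 512-byte st_blocks units, as on Linux)."""
    return os.stat(path).st_blocks * 512

def measure(n_files, name_len=32, parent=None):
    """Create n_files empty files with zero-padded names and return
    the directory's own usage (before, after) in bytes."""
    with tempfile.TemporaryDirectory(dir=parent) as d:
        before = dir_usage_bytes(d)
        for i in range(1, n_files + 1):
            # Empty files take no data blocks; only the directory grows.
            open(os.path.join(d, str(i).zfill(name_len)), "w").close()
        after = dir_usage_bytes(d)
    return before, after
```

For example, `measure(10000, parent="/search")` would repeat the 10000-file experiment on the partition used in this article; the `/search` mount point comes from the `df -i` output earlier, so substitute a path on your own ext4 filesystem.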
Besides the file name, each ext4_dir_entry_2 also records the inode number and other information. The full definition is:
struct ext4_dir_entry_2 {
__le32 inode; /* Inode number */
__le16 rec_len; /* Directory entry length */
__u8 name_len; /* Name length */
__u8 file_type;
char name[EXT4_NAME_LEN]; /* File name */
};
Let's do the math: average space per file = 548K / 10000 = 548 × 1024 / 10000 ≈ 56 bytes. That is somewhat larger than the 32-byte file name, which is about right once you add the 8 bytes of fixed fields (inode, rec_len, name_len, file_type) in front of each name; the remainder is presumably overhead within the directory's blocks. This also gives us a fact: the longer the file name, the more space it consumes in its parent directory.
Conclusion
A folder does, of course, consume disk space:
- It consumes one inode, which is 256 bytes on my machine.
- It consumes one directory entry, an ext4_dir_entry_2, under its parent directory, storing its own inode number and directory name.
- If folders or files are created under it, it needs an array of ext4_dir_entry_2 entries in its own blocks.
The more files and subdirectories a directory contains, the more blocks the directory has to allocate. Also, ext4_dir_entry_2 is not a fixed size: the longer the file or subdirectory name, the more space a single directory entry consumes.
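To make the per-entry arithmetic concrete, here is a minimal Python sketch of how large one directory entry is. It assumes the layout of the struct above (8 bytes of fixed fields followed by the name) and that each entry is padded to a multiple of 4 bytes, which to my understanding is what the kernel's EXT4_DIR_REC_LEN macro does. The function names are my own.

```python
FIXED_FIELDS = 4 + 2 + 1 + 1  # inode + rec_len + name_len + file_type, in bytes

def dir_entry_size(name_len):
    """Approximate on-disk size of one ext4 directory entry (assumed 4-byte alignment)."""
    return (FIXED_FIELDS + name_len + 3) & ~3

def min_dir_bytes(n_files, name_len):
    """Lower bound on the space n_files entries need, ignoring all other overhead."""
    return n_files * dir_entry_size(name_len)

print(dir_entry_size(32))        # 40
print(min_dir_bytes(10000, 32))  # 400000
```

So 10000 entries with 32-byte names need at least about 391K. The 548K reported by du is larger, which suggests the directory carries some extra overhead beyond the entries themselves.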
As for the opening question, you should now see why: the problem lies in the folder's own blocks. When a folder contains a great many files, especially with long names, it consumes a lot of blocks. When you traverse the folder and the blocks you need are not in the Page Cache, the reads go through to the disk and perform real IO. From your point of view, ls simply hangs after you run it.
So you will surely ask: what if I really need to store a lot of files? The answer is simple: create more folders and don't keep too many files in any single directory. In engineering practice, the usual approach is to use one or even two levels of hashing to spread files across multiple directories, keeping the number of files per directory at 100,000 or fewer.
A bug in ext
It seems today's experiment should be over. Now let's delete all the files we just created and look again.
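The hashing approach can be sketched in a few lines of Python. This is only one possible scheme, not a prescribed one: it assumes MD5 of the file name as the hash and two hex characters per directory level (256 subdirectories per level). The function name and the `/data` root in the usage line are hypothetical.

```python
import hashlib
import os

def sharded_path(root, filename, levels=2):
    """Map filename to root/xx/yy/filename, where xx and yy are taken
    from the hex MD5 digest of the name (two hex characters per level)."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, filename)

def store_empty(root, filename):
    """Create the bucket directories if needed, then create the file."""
    path = sharded_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()
    return path

print(sharded_path("/data", "photo_000123.jpg"))
```

With two levels there are 256 × 256 = 65536 buckets, so even several hundred million files average only a few thousand entries per directory. Because the path is computed from the name alone, lookups never need to list a directory.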
# rm -f *
# du -h
72K .
Wait, what is going on? The files in the folder have all been deleted, so why does the folder still occupy 72K of disk space? This puzzled me for a long time before I finally worked it out. The key is rec_len in ext4_dir_entry_2. This field stores the length of the whole ext4_dir_entry_2 entry, so when the operating system traverses the folder it can take the current pointer and add this length to find the next dir_entry in the folder. The advantage is that traversal is very convenient, a bit like a linked list, one entry after another. However, deleting a file is more awkward: the entry cannot simply be cut out, or the chain would be broken. When Linux deletes a file, it does not reclaim the ext4_dir_entry_2 from the directory: it just sets the entry's inode to 0 or folds the entry's space into the preceding entry by enlarging that entry's rec_len, and the directory's blocks are not released. This is the same kind of soft delete we often use in our own projects. The xfs filesystem does not seem to have this little problem, but I have not yet studied how it solves it. If you know the answer, you are welcome to leave a comment!
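The rec_len chain and the soft delete can be illustrated with a small in-memory simulation in Python. This is a simplified sketch, not the kernel's code: it packs entries in the ext4_dir_entry_2 layout into one 4096-byte block, assumes 4-byte alignment, and ignores hash-tree indexes and checksums. All function names are my own.

```python
import struct

HEADER = struct.Struct("<IHBB")  # inode, rec_len, name_len, file_type

def build_block(entries, block_size=4096):
    """Pack (inode, name) pairs into one block; the last entry's
    rec_len is stretched so the chain ends exactly at the block end."""
    block = bytearray(block_size)
    offset = 0
    for idx, (inode, name) in enumerate(entries):
        last = idx == len(entries) - 1
        rec_len = block_size - offset if last else (HEADER.size + len(name) + 3) & ~3
        HEADER.pack_into(block, offset, inode, rec_len, len(name), 1)
        block[offset + HEADER.size:offset + HEADER.size + len(name)] = name
        offset += rec_len
    return block

def walk(block):
    """Yield (offset, inode, rec_len, name) by hopping rec_len bytes at a time."""
    offset = 0
    while offset < len(block):
        inode, rec_len, name_len, _ = HEADER.unpack_from(block, offset)
        if rec_len == 0:  # guard against an empty or corrupt block
            break
        start = offset + HEADER.size
        yield offset, inode, rec_len, bytes(block[start:start + name_len])
        offset += rec_len

def delete(block, target):
    """Soft delete: the block never shrinks, only the chain is adjusted."""
    prev = None
    for offset, inode, rec_len, name in walk(block):
        if inode != 0 and name == target:
            if prev is None:
                struct.pack_into("<I", block, offset, 0)  # first entry: inode = 0
            else:
                prev_off, prev_len = prev
                # Let the previous entry's rec_len swallow the deleted entry.
                struct.pack_into("<H", block, prev_off + 4, prev_len + rec_len)
            return True
        prev = (offset, rec_len)
    return False

def live_names(block):
    return [name for _, inode, _, name in walk(block) if inode != 0]

block = build_block([(11, b"a"), (12, b"bb"), (13, b"ccc")])
delete(block, b"bb")
print(live_names(block), len(block))  # [b'a', b'ccc'] 4096
```

After the delete, the entry for "bb" can no longer be reached by traversal, yet the block is still 4096 bytes long. That is the behavior seen with du above: the entries disappear, but the directory's blocks stay allocated.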

The hard disk album of 「Develop internal skill and practice」:
- 1. Opening up the disk: taking the hard shell off a mechanical hard disk!
- 2. Disk partitioning also involves technical know-how
- 3. How do we deal with mechanical hard disks being slow and failure-prone?
- 4. Taking apart the structure of an SSD
- 5. How much disk space does a new empty file take?
- 6. How much disk space does a file of just 1 byte actually take up?
- 7. Why does the ls command hang when there are too many files?
- 8. Understanding the principle of formatting
- 9. How much disk IO actually happens when you read one byte of a file?
- 10. When does the disk IO happen after you write one byte to a file?
- 11. Random IO on mechanical hard disks is slower than you think
- 12. How much faster is a server with an SSD than one with a mechanical hard disk?
My official account is 「Develop internal skill and practice」. Here I don't just talk about technical theory, nor just about practical experience: I combine the two, using practice to deepen the understanding of theory and theory to improve practical skills. You are welcome to follow the account, and please share it with your friends.
Copyright notice
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thanks.