Why does the ls command hang when there are too many files?
2020-11-06 21:04:00 【Zhang Yanfei Allen】
Have you ever run ls in a directory that holds a very large number of files and had to wait a long time before anything was displayed? If so, have you ever wondered why this happens and how to fix it? To understand the cause properly, we need to start with the disk space that a directory itself occupies.
Verifying inode consumption
In 《How much disk space does a new empty file take?》 I mentioned that every file consumes a little space in the directory that contains it. A directory, in fact, also consumes an inode. Let's look at the current inode usage:
# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
......
/dev/sdb1 2147361984 12785020 2134576964 1% /search
Now create an empty directory and check again:
# mkdir temp
# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
......
/dev/sdb1 2147361984 12785021 2134576963 1% /search
From IUsed you can see that, just like an empty file, an empty directory also consumes one inode. But an inode is very small, only 256 bytes on my machine, so it cannot be the culprit that makes the ls command hang.
Verifying block consumption
So where is the directory's name stored? As with the file in 《How much disk space does a new empty file take?》, it consumes one ext4_dir_entry_2 (today's example uses ext4; the struct is defined in fs/ext4/ext4.h in the Linux source), which is placed in a block of its parent directory. From this you can probably guess that if a directory has a bunch of files created under it, their entries will consume the directory's own blocks. Let's verify:
# mkdir test
# cd test
# du -h
4.0K .
The 4 KB here means that one block has been consumed. An empty file does not consume a block, so why does an empty directory consume one from the start? Because it always has to carry two directory entries, "." and "..". The 4 KB figure is not necessarily the same on your machine: it is the block size, which was decided when the file system was formatted.
Let's create two empty files and check again:
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# du -h
4.0K .
Nothing seems to have changed. This is because:
- First, a new empty file does not occupy a block, so what is shown here is still the block occupied by the directory.
- Second, the 4 KB allocated when the directory was created still has free space, enough to hold these two directory entries.
So let's create more: a script that creates 100 empty files whose names are 32 bytes long.
#!/bin/bash
# Create 100 empty files whose names are 32-character zero-padded numbers.
for ((i = 1; i <= 100; i++)); do
    file="tempDir/$(printf '%032d' "$i")"
    echo "$file"
    touch "$file"
done
# du -h
12K .
Now we can see that the disk space occupied by the directory has grown to 3 blocks. When we create 10,000 files:
# du -h
548K .
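The experiment above can also be scripted. The following is a minimal sketch, not the author's method: it creates empty files with 32-character names in a temporary directory and reports the size of the directory itself via os.stat. The helper name dir_size_after is hypothetical, and the numbers depend on the file system: on ext4 a directory's st_size is a multiple of the block size, while tmpfs, xfs, btrfs and others report it differently.

```python
import os
import tempfile


def dir_size_after(n_files: int, name_width: int = 32) -> tuple[int, int]:
    """Return (size_before, size_after) in bytes for the directory itself.

    Creates n_files empty files with zero-padded numeric names inside a
    temporary directory, which is removed again when the function returns.
    """
    with tempfile.TemporaryDirectory() as directory:
        size_before = os.stat(directory).st_size
        for i in range(1, n_files + 1):
            name = f"{i:0{name_width}d}"
            # An empty file consumes an inode and a directory entry, no data block.
            open(os.path.join(directory, name), "w").close()
        size_after = os.stat(directory).st_size
        return size_before, size_after


if __name__ == "__main__":
    for count in (2, 100, 10000):
        before, after = dir_size_after(count)
        print(f"{count:>6} files: directory size {before} -> {after} bytes")
```

Run it on the file system you care about (set the TMPDIR environment variable to a path on that file system), since the temporary directory location decides which file system is measured.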
Besides the file name, each ext4_dir_entry_2 also records the inode number and some other information. The full definition is as follows:
struct ext4_dir_entry_2 {
    __le32 inode;               /* Inode number */
    __le16 rec_len;             /* Directory entry length */
    __u8   name_len;            /* Name length */
    __u8   file_type;
    char   name[EXT4_NAME_LEN]; /* File name */
};
Let's calculate: average space per file = 548 KB / 10000 ≈ 56 bytes (548 × 1024 / 10000). In other words, it is somewhat larger than our 32-byte file names, which is basically right. This also tells us that the longer the file name, the more space it consumes in its parent directory.
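To see where the per-file cost comes from, here is a minimal sketch of the smallest possible entry size. It assumes the classic ext4 layout rule: an entry is the 8-byte fixed header (inode, rec_len, name_len, file_type) followed by the name, rounded up to a multiple of 4 bytes. The function names are illustrative and are not kernel API.

```python
HEADER_SIZE = 8      # inode (4) + rec_len (2) + name_len (1) + file_type (1)
DIR_ROUND = 3        # entries are padded to a multiple of 4 bytes


def min_rec_len(name_len: int) -> int:
    """Smallest rec_len that can hold a name of name_len bytes."""
    return (HEADER_SIZE + name_len + DIR_ROUND) & ~DIR_ROUND


def max_entries_per_block(block_size: int, name_len: int) -> int:
    """Upper bound on entries of this name length in one fully packed block."""
    return block_size // min_rec_len(name_len)


if __name__ == "__main__":
    for length in (1, 8, 32, 255):
        print(f"name of {length:>3} bytes -> at least {min_rec_len(length)} bytes per entry")
    print("32-byte names per 4096-byte block:", max_entries_per_block(4096, 32))
```

Under that assumption a 32-byte name needs at least 40 bytes. The average measured above is higher, presumably because directory blocks are not packed completely full and some space goes to index and bookkeeping data.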
Conclusion
A directory, of course, also consumes disk space.
- First, it consumes one inode, which is 256 bytes on my machine.
- It consumes one directory entry, ext4_dir_entry_2, in its parent directory, which stores its inode number and directory name.
- If files or subdirectories are created under it, it has to keep an array of ext4_dir_entry_2 entries in its own blocks.
The more files and subdirectories a directory contains, the more blocks the directory has to allocate. In addition, the size of ext4_dir_entry_2 is not fixed: the longer the file or subdirectory name, the more space a single directory entry consumes.
As for the opening question, you should now understand why: the problem lies in the directory's blocks. When a directory holds a great many files, especially files with long names, it consumes a lot of blocks. When you traverse the directory, any block you need that is not in the Page Cache has to be read from disk with real I/O. From your point of view, ls simply hangs after you run it.
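One way to observe the traversal cost is to time a plain walk over a directory. The sketch below is only an illustration under stated assumptions: it uses os.scandir, which streams entries without sorting them, on a small temporary directory. The helper name count_entries is hypothetical, and the timing on a freshly created directory will mostly reflect a warm Page Cache rather than disk I/O.

```python
import os
import tempfile
import time


def count_entries(path: str) -> tuple[int, float]:
    """Walk a directory once and return (entry_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for _ in entries:
            count += 1
    return count, time.perf_counter() - start


with tempfile.TemporaryDirectory() as directory:
    for i in range(1000):
        open(os.path.join(directory, f"{i:032d}"), "w").close()
    n, elapsed = count_entries(directory)
    print(f"{n} entries traversed in {elapsed:.4f}s")
```

To see the effect described above, point count_entries at a real directory with many files right after the caches have been dropped, then run it a second time and compare.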
You will surely ask: what if I really do need to store a lot of files? That is also simple: create more directories and do not keep too many files in any single one, and the problem goes away. In engineering practice, the usual approach is to hash files into multiple directories using one or even two levels of hashing, keeping the number of files in a single directory at 100,000 or fewer.
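The hashing approach can be sketched as follows. This is a minimal illustration, not a prescribed layout: the function name shard_path, the choice of SHA-256, and the two-hex-character directory names (256 directories per level) are all assumptions made for the example.

```python
import hashlib
from pathlib import PurePosixPath


def shard_path(root: str, filename: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map a file name to root/<h1>/<h2>/.../filename using a hash of the name.

    Each level takes `width` hex characters of the digest, so width=2 gives
    256 subdirectories per level and two levels give 65,536 leaf directories.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, filename)


if __name__ == "__main__":
    for name in ("photo_000001.jpg", "photo_000002.jpg", "report.pdf"):
        print(shard_path("data", name))
```

Because the directory is derived from the file name alone, a reader can recompute the path without any lookup table. The hash is used only to spread files evenly, not for security.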
A bug in ext
It seems today's experiment should be over. Now let's delete all the files we just created and look again.
# rm -f *
# du -h
72K .
Wait, what is going on? The files in the directory have all been deleted, so why does the directory still occupy 72 KB of disk space? This puzzled me for a long time before I finally worked it out. The key is the rec_len field of ext4_dir_entry_2. This field stores the length of the entire current ext4_dir_entry_2 object, so when the operating system traverses a directory it can take the current pointer and add this length to find the dir_entry of the next file in the directory. The advantage is that traversal is very convenient, a bit like a linked list, one entry after another. However, deleting a file is a little awkward: the entry cannot simply be cut out, or the chain would be broken. When Linux deletes a file, it only marks the entry as unused inside the directory block (setting inode to 0, or folding the space into the preceding entry's rec_len); it does not reclaim the ext4_dir_entry_2 space or release the block. This is really the same soft delete we often use in our own projects. The current xfs file system no longer seems to have this small problem, but I have not yet studied in depth how it solves it. If you know the answer, please leave a comment!
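The rec_len chain and the soft delete can be illustrated with a small in-memory model. This is a simplified sketch, not the kernel's code: it packs entries with the same four header fields as ext4_dir_entry_2, walks them by adding rec_len to the current offset, and deletes by setting inode to 0. All function names are hypothetical, and real ext4 details such as padding the last entry to the end of the block are left out.

```python
import struct

# Little-endian header: inode (u32), rec_len (u16), name_len (u8), file_type (u8)
HEADER = struct.Struct("<IHBB")


def pack_entry(inode: int, name: bytes, file_type: int = 1) -> bytes:
    """Build one directory entry, padded to a multiple of 4 bytes."""
    rec_len = (HEADER.size + len(name) + 3) & ~3
    raw = HEADER.pack(inode, rec_len, len(name), file_type) + name
    return raw.ljust(rec_len, b"\0")


def walk(block):
    """Yield (offset, inode, name) for every entry, hopping by rec_len."""
    offset = 0
    while offset < len(block):
        inode, rec_len, name_len, _ = HEADER.unpack_from(block, offset)
        start = offset + HEADER.size
        yield offset, inode, bytes(block[start:start + name_len])
        offset += rec_len  # the "next pointer" of the linked list


def live_names(block):
    """Names of entries that are still in use (inode != 0)."""
    return [name for _, inode, name in walk(block) if inode != 0]


def soft_delete(block: bytearray, target: bytes) -> None:
    """Mark an entry as unused by zeroing its inode; the bytes stay in place."""
    for offset, inode, name in walk(block):
        if inode != 0 and name == target:
            struct.pack_into("<I", block, offset, 0)
            return
    raise FileNotFoundError(target)


names = [b"a.txt", b"bb.txt", b"ccc.txt"]
block = bytearray(b"".join(pack_entry(100 + i, n) for i, n in enumerate(names)))
size_before = len(block)
soft_delete(block, b"bb.txt")
print(live_names(block), "block size unchanged:", len(block) == size_before)
```

The deleted name disappears from the listing, yet the block is exactly as large as before, which mirrors why the directory keeps its blocks after rm.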
The hard disk series of 「Develop internal skill and practice」:
- 1. Opening up a disk: taking off the hard shell of a mechanical hard disk!
- 2. Disk partitioning also hides technical tricks
- 3. How do we deal with mechanical hard disks being slow and failure-prone?
- 4. Disassembling the structure of an SSD
- 5. How much disk space does a new empty file take?
- 6. How much disk space does a 1-byte file actually take?
- 7. Why does the ls command hang when there are too many files?
- 8. Understanding the principle of formatting
- 9. How much disk I/O actually happens when you read one byte of a file?
- 10. When is disk I/O issued after you write one byte to a file?
- 11. Random I/O on mechanical hard disks is slower than you think
- 12. How much faster is a server with an SSD than one with a mechanical hard disk?
My official account is 「Develop internal skill and practice」. Here I do not only talk about technical theory, nor only about practical experience. I combine the two: deepening the understanding of theory through practice, and using theory to improve practical skills. You are welcome to follow my official account, and please share it with your friends too.
Copyright notice
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thank you.