Why does the ls command hang when there are too many files?
2020-11-06 21:04:00 【Zhang Yanfei Allen】
Have you ever run ls in a folder that holds a huge number of files and had to wait a long time before anything was displayed? If so, have you ever wondered why, and how to fix it? To understand the cause properly, we need to start with the disk space that a folder itself occupies.
Verifying inode consumption
In "How much disk space does a new empty file take?" I mentioned that each file consumes a little space in the folder that contains it. A folder, in fact, also consumes an inode. Let's look at the current inode usage:
# df -i
Filesystem         Inodes    IUsed      IFree IUse% Mounted on
......
/dev/sdb1      2147361984 12785020 2134576964    1% /search
Now create an empty folder:
# mkdir temp
# df -i
Filesystem         Inodes    IUsed      IFree IUse% Mounted on
......
/dev/sdb1      2147361984 12785021 2134576963    1% /search
IUsed shows that, just like an empty file, an empty folder consumes one inode. But an inode is tiny, only 256 bytes on my machine, so it cannot be the culprit that makes ls hang.
Verifying block consumption
Where is the folder's name stored? As with the files in "How much disk space does a new empty file take?", it consumes one ext4_dir_entry_2 (using ext4 as today's example; the structure is defined in fs/ext4/ext4.h in the Linux source), which is placed in a block of its parent directory. From this you can quickly guess that if a bunch of files are created under the folder, their entries will consume the folder's own blocks. Let's verify:
# mkdir test
# cd test
# du -h
4.0K .
The 4KB here means that one block has been consumed. An empty file does not consume a block, so why does an empty directory consume one from the start? Because it must carry two directory entries by default, "." and "..". The 4K is not necessarily the same on your machine: it is the block size, which is decided when the filesystem is formatted.
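To check the block size mentioned above on your own machine, something like the following may help. It is a minimal sketch that assumes GNU coreutils `stat`; the function name `fs_block_size` is made up for illustration.

```shell
# Sketch, assuming GNU coreutils stat: print the fundamental block size
# (in bytes) of the filesystem that holds the given path.
# fs_block_size is a hypothetical helper name, not a standard command.
fs_block_size() {
    stat -f -c %S "$1"
}

fs_block_size .
```

If this prints a value other than 4096, expect `du -h` on a fresh empty directory to report that size instead of 4.0K.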
Now let's create two empty files and check again:
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# du -h
4.0K .
It looks like nothing has changed. This is because:
- First, a new empty file does not occupy any block, so what is shown here is still the block occupied by the directory.
- Second, the 4KB allocated when the folder was created still has free space in it, enough for these two directory entries.
So let's create more. The following script creates 100 empty files whose names are 32 bytes long.
#!/bin/bash
# Create 100 empty files with zero-padded, 32-character names under tempDir/.
mkdir -p tempDir
for ((i = 1; i <= 100; i++)); do
    file="tempDir/$(printf '%032d' "$i")"
    echo "$file"
    touch "$file"
done
# du -h
12K .
Now we can see that the disk space occupied by the directory has grown to 3 blocks. When we create 10,000 files:
# du -h
548K .
Besides the file name, each ext4_dir_entry_2 also records the inode number and other information. The detailed definition is as follows:
struct ext4_dir_entry_2 {
    __le32  inode;                  /* Inode number */
    __le16  rec_len;                /* Directory entry length */
    __u8    name_len;               /* Name length */
    __u8    file_type;
    char    name[EXT4_NAME_LEN];    /* File name */
};
Let's do the math: average space per file = 548K / 10000 ≈ 56 bytes. That is somewhat more than the 32 bytes of our file name, which is about right, since every entry also carries the fixed fields shown in the structure. This also gives us a fact: the longer the file name, the more space it consumes in its parent directory.
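As a rough cross-check of that figure, the size of a single entry can be estimated from the structure itself. The sketch below assumes 8 bytes of fixed fields (inode, rec_len, name_len, file_type) followed by the name, rounded up to a multiple of 4 bytes; `dir_entry_size` is a hypothetical helper name.

```shell
# Hypothetical helper: minimum size in bytes of one ext4 directory entry,
# assuming 8 bytes of fixed fields plus the name, rounded up to a
# multiple of 4 bytes.
dir_entry_size() {
    name_len=$1
    echo $(( (8 + name_len + 3) & ~3 ))
}

dir_entry_size 32     # 32-byte name, as in the experiment: prints 40
dir_entry_size 255    # longest name ext4 allows: prints 264
```

Under these assumptions a 32-byte name needs at least 40 bytes, so the measured average is higher than the bare minimum; the remainder is presumably space that is not perfectly packed inside the directory's blocks.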
Conclusion
A folder, of course, also consumes disk space:
- First, it consumes one inode, which is 256 bytes on my machine.
- It consumes one directory entry, ext4_dir_entry_2, in its parent directory, which stores its inode number and its directory name.
- If folders or files are created under it, it has to keep an array of ext4_dir_entry_2 in its own blocks.
The more files and subdirectories a directory holds, the more blocks the directory has to allocate. In addition, the size of ext4_dir_entry_2 is not fixed: the longer the file or subdirectory name, the more space a single directory entry consumes.
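The relationship between file count, name length and directory blocks can be sketched as a lower-bound estimate. This is only an illustration under stated assumptions: 8 fixed bytes per entry, 4-byte alignment, entries that never span two blocks, and perfect packing. The helper name `min_dir_blocks` and its parameters are made up for this example.

```shell
# Hypothetical estimate: the fewest blocks a directory needs to hold
# `count` names of `name_len` bytes each, assuming 8 fixed bytes per entry,
# 4-byte alignment, and perfect packing into blocks of `block_size` bytes.
# The "." and ".." entries and any index overhead are ignored.
min_dir_blocks() {
    count=$1
    name_len=$2
    block_size=$3
    entry=$(( (8 + name_len + 3) & ~3 ))
    per_block=$(( block_size / entry ))
    echo $(( (count + per_block - 1) / per_block ))
}

min_dir_blocks 10000 32 4096    # prints 99
```

For the experiment above this gives 99 blocks, roughly 396K, against the 548K that du reported. A real directory is not packed perfectly, so the measured value being larger is expected.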
As for the opening question, you should now understand why: the problem lies in the folder's blocks. When a folder holds a great many files, especially files with long names, it consumes a lot of blocks. When you traverse the folder, if the Page Cache does not hold the blocks you need, the read goes all the way through to the disk and performs real IO. From your point of view, ls hangs after you run it.
So you will surely ask: what if I really do need to store a lot of files? That is simple too: create more folders and don't keep too many files in any single directory, and the problem goes away. In engineering practice, the usual approach is to hash files into multiple directories through one or even two levels of hashing, keeping the number of files in a single directory to 100,000 or fewer.
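A one-level version of that hashing scheme might look like the sketch below. The choice of 256 buckets, the use of the POSIX `cksum` CRC as the hash, and the helper names `bucket_for` and `store_file` are all assumptions made for illustration, not something the article prescribes.

```shell
# Hypothetical helper: map a file name to one of 256 bucket directories
# (00 .. ff), using the CRC printed by the POSIX cksum utility as the hash.
bucket_for() {
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    printf '%02x\n' $(( crc % 256 ))
}

# Hypothetical helper: create an empty file at <root>/<bucket>/<name>,
# creating the bucket directory on demand, and print the resulting path.
store_file() {
    root=$1
    name=$2
    dir="$root/$(bucket_for "$name")"
    mkdir -p "$dir"
    touch "$dir/$name"
    echo "$dir/$name"
}
```

Calling `store_file /data/files report.txt` would place the file in a bucket directory such as `/data/files/<bucket>/report.txt`, where the bucket depends only on the name. A second level can be added by hashing again with a different slice of the CRC, at the cost of one more directory lookup per access.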
A bug in ext
It seems today's experiment should be over. Now let's delete all the files we just created and look again.
# rm -f *
# du -h
72K .
Wait, what is going on? The files in the folder have all been deleted, so why does the folder still occupy 72K of disk space? This puzzled me for a long time before I finally worked it out. The key is rec_len in ext4_dir_entry_2. This field stores the length of the whole current ext4_dir_entry_2 object, so when the operating system traverses the folder, it can add this length to the current pointer to find the dir_entry of the next file in the folder. The advantage is that traversal is very convenient, a bit like a linked list, one entry after another. Deleting a file, however, is a little awkward: the entry cannot simply be removed, or the chain would break. When Linux deletes a file, it only sets the inode in the file's directory entry to 0; it does not reclaim the whole ext4_dir_entry_2 object. This is the same soft delete we often use in our own projects. The current xfs file system no longer seems to have this small problem, but I have not yet studied how it solves it. If you have the answer, please leave a comment!
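To reproduce this observation, a sketch like the following compares a directory's own size before and after its files are deleted. It assumes GNU `stat`; `dir_own_size` and `shrink_demo` are hypothetical helper names, and the directory path and file count are arbitrary.

```shell
# Sketch, assuming GNU stat: size in bytes of the directory itself
# (its entry blocks), not of the files inside it.
dir_own_size() {
    stat -c %s "$1"
}

# Hypothetical demo: fill a directory with `count` empty files that have
# 32-character names, report its size, delete the files, report it again.
shrink_demo() {
    demo_dir=$1
    count=$2
    mkdir -p "$demo_dir"
    i=1
    while [ "$i" -le "$count" ]; do
        touch "$demo_dir/$(printf '%032d' "$i")"
        i=$((i + 1))
    done
    echo "full:  $(dir_own_size "$demo_dir") bytes"
    rm -f "$demo_dir"/0*
    echo "empty: $(dir_own_size "$demo_dir") bytes"
}
```

Running `shrink_demo ./demo 10000` on an ext4 filesystem should show whether the second number drops back to a single block. If it stays larger, you are seeing the same effect as the 72K result above; other filesystems may behave differently.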
Articles in the hard disk series of "Developing Internal Skills":
- 1. Opening up a disk: taking the hard shell off a mechanical hard disk!
- 2. Disk partitioning also involves technical tricks
- 3. How do we solve the problem that mechanical hard disks are slow and prone to failure?
- 4. Taking apart the structure of an SSD
- 5. How much disk space does a new empty file take?
- 6. How much disk space does a file of only 1 byte actually take?
- 7. Why does the ls command hang when there are too many files?
- 8. Understanding how formatting works
- 9. How much disk IO actually happens when you read one byte of a file?
- 10. When is the disk IO issued after you write one byte to a file?
- 11. Random IO on mechanical hard disks is slower than you think
- 12. How much faster is a server with SSDs than one with mechanical hard disks?
Copyright notice
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thanks.