
Why does the ls command get stuck when there are too many files?

2020-11-06 21:04:00 Zhang Yanfei Allen

Have you ever run the ls command in a directory containing a huge number of files, and then had to wait a long, long time for any output? If so, have you ever wondered why, and how to solve it? To really understand the cause of this problem, we have to start with how a directory occupies disk space.

Verifying inode consumption

In 《How much disk space does a new empty file take?》 I mentioned that every file consumes a small amount of space in its parent directory. A directory itself, in fact, also consumes an inode. Let's first look at the current inode usage:

# df -i  
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
......
/dev/sdb1            2147361984 12785020 2134576964    1% /search

Now let's create an empty directory:

# mkdir temp
# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
......
/dev/sdb1            2147361984 12785021 2134576963    1% /search

From the IUsed column you can see that, just like an empty file, an empty directory also consumes one inode. But an inode is tiny, only 256 bytes on my machine, so it can hardly be the culprit that makes ls hang.
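
By the way, you can confirm the inode size on your own machine; it is recorded in the ext4 superblock. A quick check with tune2fs, using this article's example device /dev/sdb1 (substitute your own):

# tune2fs -l /dev/sdb1 | grep -i 'inode size'
Inode size:               256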

Verifying block consumption

So where is a directory's own name stored? Well, just like the files in 《How much disk space does a new empty file take?》, it consumes one ext4_dir_entry_2 (I'm using ext4 as today's example; the struct is defined in fs/ext4/ext4.h in the Linux kernel source), which sits in a block of its parent directory. Following the same logic, you can probably already guess: if a bunch of files are created inside the directory, they will consume the directory's own blocks. Let's verify:

# mkdir test
# cd test  
# du -h
4.0K    .

The 4KB here means one block has been consumed. An empty file takes up no block, so why does an empty directory take one right away? Because it must hold two default directory entries, "." and "..". And this 4K is not necessarily the same on your machine; it is exactly one block size, fixed when you formatted the filesystem.
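
You can verify the block size on your own filesystem the same way; it too is recorded in the superblock, again assuming the example device /dev/sdb1:

# tune2fs -l /dev/sdb1 | grep -i '^block size'
Block size:               4096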

Let's create two empty files and check again:

# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
# touch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# du -h
4.0K    .

It looks like nothing has changed. There are two reasons:

  • First, a newly created empty file occupies no blocks of its own, so what du shows here is still just the blocks held by the directory (a quick check follows this list).
  • Second, the 4KB block allocated when the directory was created still has free room, more than enough for these two new directory entries.
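
The first point is easy to confirm: an empty file has an inode but zero allocated data blocks. stat shows this directly (%s is the size in bytes, %b the number of allocated 512-byte blocks):

# stat -c 'size=%s blocks=%b' aaaa*
size=0 blocks=0
size=0 blocks=0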

So let's try creating many more. The script below creates 100 empty files whose names are 32 bytes long.

#!/bin/bash
# Create 100 empty files whose names are exactly 32 bytes long
# (zero-padded decimal numbers) in the current directory.
for ((i = 1; i <= 100; i++)); do
        file=$(printf "%032d" "$i")
        echo "$file"
        touch "$file"
done
# du -h
12K    .

Ha, this time we can see the directory's disk usage has grown: it now occupies 3 blocks. And when we create 10,000 files:

# du -h
548K     .

Each ext4_dir_entry_2 records not only the file name but also the inode number and some other information. Its full definition:

struct ext4_dir_entry_2 {
        __le32  inode;                  /* Inode number */
        __le16  rec_len;                /* Directory entry length */
        __u8    name_len;               /* Name length */
        __u8    file_type;              /* File type */
        char    name[EXT4_NAME_LEN];    /* File name */
};

Let's do the math: average space per file = 548K / 10000 ≈ 56 bytes. That is, slightly larger than our 32-byte file names, which is about right. We also arrive at a fact here: the longer the file name, the more space it consumes in its parent directory.
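
We can sanity-check that number against the struct above. Each ext4_dir_entry_2 occupies 8 bytes of fixed fields (inode + rec_len + name_len + file_type) plus the name itself, rounded up to a multiple of 4. A rough back-of-the-envelope check in the shell, ignoring per-block slack and the extra index blocks ext4 allocates for large directories (roughly where the remaining difference comes from):

# echo $(( (8 + 32 + 3) / 4 * 4 )) bytes per entry
40 bytes per entry
# echo $(( 40 * 10000 / 1024 ))K for 10000 entries
390K for 10000 entries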

Summary

So yes, a directory also consumes disk space:

  • First, it consumes an inode of its own, 256 bytes on my machine.
  • It consumes one directory entry, an ext4_dir_entry_2, in its parent directory, recording its own inode number and its name.
  • If files or subdirectories are created inside it, each of them takes up an ext4_dir_entry_2 slot in the directory's own blocks.

The more files and subdirectories a directory contains, the more blocks it has to allocate for itself. Moreover, ext4_dir_entry_2 is not a fixed size: the longer a file or subdirectory name, the more space its single directory entry consumes.

As for the opening question, I think you can see the answer now: the problem lies in the directory's blocks. When a directory contains a huge number of files, especially ones with long names, it consumes a great many blocks. When you traverse it and the blocks you need are not already in the Page Cache, the reads fall through to the disk as real IO. From your point of view, you typed ls, and then it just hung.
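
If you want to observe the Page Cache effect yourself, one rough experiment is to time ls on a cold cache versus a warm one. Run it twice: the first run has to read the directory blocks from disk, the second is served from the Page Cache (ls -f also skips sorting, which helps isolate the IO cost). This needs root, huge_dir here just stands for your big directory, and note that dropping caches slows the whole machine down for a while:

# sync
# echo 3 > /proc/sys/vm/drop_caches
# time ls -f huge_dir > /dev/null
# time ls -f huge_dir > /dev/null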

Now you will surely ask: I really do need to store a large number of files, so what should I do? That's simple too: create more directories and don't put too many files in any single one, and the problem won't occur. A common engineering practice is to hash files into multiple directories through one or even two levels of hashing, keeping each directory under 100,000 files.
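
Here is a minimal sketch of that idea in bash: hash the file name and use the first hex characters of the digest to pick a two-level subdirectory. Everything in it (the data/ root, the store_path helper, the sample file name) is hypothetical, just to illustrate the layout:

#!/bin/bash
# Map a name to data/<xx>/<yy>/<name> using the first 4 hex chars of
# its md5 digest, so that no single directory grows too large.
store_path() {
        local hash
        hash=$(printf '%s' "$1" | md5sum | cut -c1-4)
        echo "data/${hash:0:2}/${hash:2:2}/$1"
}

path=$(store_path "user_avatar_12345.jpg")
mkdir -p "$(dirname "$path")"
touch "$path"
echo "$path"    # prints something like data/ab/cd/user_avatar_12345.jpg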

A "bug" in ext

It looked like today's hands-on session was over, but wait. Let's delete all the files we just created and check once more:

# rm -f *
# du -h
72K     .

Wait, what? All the files in the directory have been deleted, so why does it still occupy 72K of disk space? This puzzled me for a long time; only later did I figure it out. The key is the rec_len field in ext4_dir_entry_2. It stores the length of the entire current ext4_dir_entry_2 object, so when the operating system traverses a directory, it can take the current pointer, add rec_len, and arrive at the directory's next dir_entry. Traversal is thus very convenient, somewhat like walking a linked list, hopping from one entry to the next.

However, this makes deleting a file a bit awkward: the entry's structure cannot simply be removed, or the "list" would be broken. So when Linux deletes a file, it merely sets the inode field of its directory entry to 0, and does not reclaim the ext4_dir_entry_2 object itself (much like the soft-delete trick we often use in our own projects). Today's xfs filesystem doesn't seem to have this little problem anymore, but I haven't dug into how it solves it. If you know the answer, please leave a comment!
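
If you want to reclaim that space on ext4, two approaches I know of are recreating the directory (moving any remaining files into a fresh one, which starts with a single block again):

# mkdir test.new && mv test/* test.new/ ; rmdir test && mv test.new test

or, with the filesystem unmounted, letting e2fsck compact and optimize directories via its -D option (the device and mount point below are this article's examples):

# umount /search
# e2fsck -fD /dev/sdb1
# mount /dev/sdb1 /search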


My official account is 「Develop internal skill and practice」. There I don't just talk about technical theory, nor only about hands-on experience; I combine the two, using practice to deepen the understanding of theory and using theory to improve practical skill. Welcome to follow it, and feel free to share it with your friends.

Copyright notice
This article was written by [Zhang Yanfei Allen]. Please include a link to the original when reposting. Thank you.