当前位置:网站首页>Shell implements basic file operations (cutting, sorting, and de duplication)
Shell implements basic file operations (cutting, sorting, and de duplication)
2022-07-03 00:33:00 【Dreamy channeling】
Use Shell Built-in tools , Realize the operation of large text files , Meet the needs of daily data processing .
One 、 command cut - cutting
cut It can process text by column , It is especially suitable for data processing of large files .
The basic grammar is cut [option] filename
Parameters
cut -f Column number , What column do you want to get ;
cut -c Divide by character ;
cut -d Divide in bytes , Automatically ignore multi byte character boundaries , Rujia -n , Then do not split multi byte characters ;
cut n- Designate the n Column start ;
cut n-m Specify from n List to m Column ;
demo demonstration
1) Byte cutting
The original file is shown below .
Get the first two columns . Enter the command cut info.text -d " " -f 1-2, Custom segmentation , Split by space .
2) cutting bash Of PID
Found... In the virtual machine bash Information about , As shown in the figure below .
Carry out orders ps -aux | grep bash | head -n 1 | cut -d " " -f 8, lookup bash process , Take the first line , Space division , Intercept by column , Take the first place 8 Column , The results are shown in the following figure .
Two 、 command sort - Sort
sort Sort the files , And output the sorting result standard or redirection to the specified file .
The basic grammar is **sort [option] **
Parameters
sort -n Sort by numerical value ;
sort -r Sort in reverse order ;
sort -t Separator Default space separator , Separator when sorting ;
sort -k Specify the columns to sort ;
sort -o Save the sorted results to the specified file ;
sort -u Result only , That is to remove the heavy ;
demo demonstration
1) Sort
The original file is shown below .
Carry out orders sort -t " " -k2n,2 infodata.txt, The second column is sorted in ascending numerical order , Note that the sorting should specify from which column to which column , The effect is shown below .
There are duplicate data in the above results , How to remove heavy ?
Add... To the command -uk1,2, Full command sort -t " " -k2n,2 -uk1,2 infodata.txt, The effect is as follows .
How to print out duplicate data ?
Use command sort infodata.txt | uniq -dc, The effect is shown below .
3、 ... and 、 command uniq - duplicate removal
uniq Behavior unit , Compare and remove the weight between lines , It can only be effective De duplication of ordered text , Therefore, sort Command in combination with .
The basic grammar is **uniq [option] **
Parameters
uniq -c Count the number of rows ;
uniq -d Show only duplicate lines and remove duplicates ;
uniq -u Show only unique rows ;
uniq -i Ignore case ;
uniq -f Ignore before N A field , Fields are separated by white space characters ;
demo demonstration
1) Sort and de duplicate
Show only the lines that appear once , Carry out orders sort infodata.txt | uniq -u, The effect is shown below .
For text files with line numbers , Use -f Parameter ignores the first line number field , Reprocess the following fields .
Tests found sort duplicate removal It doesn't seem to work for the last line ( The last line repeats without ), Verify again in practical application .
Reference blog
【1】https://blog.csdn.net/qq_43382735/article/details/121007185
边栏推荐
猜你喜欢
Why is the website slow to open?
Shell 实现文件基本操作(sed-编辑、awk-匹配)
[shutter] Introduction to the official example of shutter Gallery (project introduction | engineering construction)
布隆过滤器
An excellent orm in dotnet circle -- FreeSQL
Monitor container runtime tool Falco
多进程编程(二):管道
Solution to the problem of abnormal display of PDF exported Chinese documents of confluence
Automated defect analysis in electronic microscopic images
Linux Software: how to install redis service
随机推荐
CMake基本使用
论文的英文文献在哪找(除了知网)?
Slf4j + logback logging framework
Pageoffice - bug modification journey
MySQL 23道经典面试吊打面试官
FAQ | FAQ for building applications for large screen devices
Shell 实现文件基本操作(切割、排序、去重)
node_modules删不掉
Don't want teachers to see themselves with cameras in online classes? Virtual camera you deserve!
MySQL 23 classic interview hanging interviewer
[Chongqing Guangdong education] audio visual language reference materials of Xinyang Normal University
微信小程序获取某个元素的信息(高、宽等),并将px转换为rpx。
LeedCode1480. Dynamic sum of one-dimensional array
Markdown tutorial
Shell脚本基本使用
为什么网站打开速度慢?
Understanding and application of least square method
[IELTS reading] Wang Xiwei reading P1 (reading judgment question)
What is the standard format of a 2000-3000 word essay for college students' classroom homework?
An excellent orm in dotnet circle -- FreeSQL