Basic file operations in Shell (cutting, sorting, and deduplication)
2022-07-03 00:33:00 【Dreamy channeling】
Shell's built-in tools can process large text files and cover most day-to-day data-processing needs.
1. The cut command: cutting
cut processes text column by column and is especially well suited to large files.
The basic syntax is cut [option] filename
Parameters
cut -f: select fields (columns) by number;
cut -c: select by character position;
cut -d: set the field delimiter used with -f (the default is a tab);
cut -b: select by byte, ignoring multi-byte character boundaries; with -n, multi-byte characters are not split (GNU cut accepts -n but ignores it);
n-: a range from item n to the end of the line;
n-m: a range from item n to item m;
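The options above can be tried on a single line of text. This is a minimal sketch; the sample line and the variable names are made up for illustration and are not from the original post.

```shell
# Hypothetical sample line; the fields are separated by single spaces.
line="alice 30 london engineer"

# -d sets the delimiter, -f picks fields: here fields 1 through 2.
first_two=$(printf '%s\n' "$line" | cut -d " " -f 1-2)

# An open-ended range "n-" runs from field n to the end of the line.
from_third=$(printf '%s\n' "$line" | cut -d " " -f 3-)

# -c selects by character position instead of by field.
first_five_chars=$(printf '%s\n' "$line" | cut -c 1-5)

printf '%s\n' "$first_two" "$from_third" "$first_five_chars"
```

The three printed lines are `alice 30`, `london engineer`, and `alice`.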
Examples
1) Cutting by delimiter
The source file is info.text, with columns separated by spaces.
To get the first two columns, run cut -d " " -f 1-2 info.text, which sets a custom delimiter (a space) and selects fields 1 to 2.
2) Extracting the PID of bash
First look up the bash process information in the virtual machine.
Run ps -aux | grep bash | head -n 1 | cut -d " " -f 8. This finds the bash processes, takes the first line, splits it on spaces, and takes column 8, which holds the PID in this output. Because every single space counts as a delimiter, the column number depends on how much padding ps prints before the PID.
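The effect of padding on the column number can be reproduced without a live process list. This sketch builds two hypothetical ps-style lines with printf (the user names, PID, and column widths are assumptions) and compares a fixed field number with squeezing repeated spaces through tr -s first.

```shell
# Two hypothetical ps-style lines; the user column is padded to 8 characters.
short_user=$(printf '%-8s %5s %s\n' root 1234 bash)
long_user=$(printf '%-8s %5s %s\n' postgres 1234 bash)

# With a single-space delimiter, each extra space creates an empty field,
# so field 7 holds the PID only when the user name is 4 characters long.
pid_short=$(printf '%s\n' "$short_user" | cut -d " " -f 7)
pid_long=$(printf '%s\n' "$long_user" | cut -d " " -f 7)

# Squeezing runs of spaces first makes the PID field 2 in both cases.
pid_short_squeezed=$(printf '%s\n' "$short_user" | tr -s " " | cut -d " " -f 2)
pid_long_squeezed=$(printf '%s\n' "$long_user" | tr -s " " | cut -d " " -f 2)

printf 'fixed field: [%s] [%s]\n' "$pid_short" "$pid_long"
printf 'squeezed:    [%s] [%s]\n' "$pid_short_squeezed" "$pid_long_squeezed"
```

With the fixed field number, the second line yields an empty string, while the squeezed version returns 1234 for both.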
2. The sort command: sorting
sort sorts the lines of a file and writes the result to standard output, or to a specified file via redirection.
The basic syntax is sort [option] filename
Parameters
sort -n: sort by numerical value;
sort -r: sort in reverse order;
sort -t: set the field separator used when sorting; by default, fields are separated by the transition from non-blank to blank characters;
sort -k: specify the column(s) to sort by;
sort -o: save the sorted result to the specified file;
sort -u: output only unique lines, that is, deduplicate;
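A small sketch of how these options combine. The name/score data is hypothetical, and LC_ALL=C is set here only to make the text comparison predictable across locales.

```shell
# Hypothetical data: a name and a score separated by a single space.
data=$(printf '%s\n' "tom 40" "amy 9" "bob 100" "amy 9")

# Without -n the second field is compared as text, so "100" sorts before "40".
text_order=$(printf '%s\n' "$data" | LC_ALL=C sort -t " " -k2,2)

# -n compares numerically, -r reverses the order, -u drops duplicates.
numeric_desc=$(printf '%s\n' "$data" | LC_ALL=C sort -t " " -k2,2 -n -r -u)

printf 'text order:\n%s\n\nnumeric, descending, unique:\n%s\n' \
  "$text_order" "$numeric_desc"
```

The numeric variant prints `bob 100`, `tom 40`, `amy 9`, with the repeated `amy 9` removed.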
Examples
1) Sorting
The source file is infodata.txt, with columns separated by spaces.
Run sort -t " " -k2n,2 infodata.txt to sort by the second column in ascending numerical order. Note that a sort key should state both its start and end column; -k2 alone would extend the key from column 2 to the end of the line.
The result still contains duplicate rows. How can they be removed?
Add -uk1,2 to the command. The full command is sort -t " " -k2n,2 -uk1,2 infodata.txt, which adds a second key spanning columns 1 to 2 and outputs only one line for each set of rows that compare equal on all keys.
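When -u is combined with -k, lines count as duplicates if they compare equal on the sort keys, not only when the whole line matches. The sketch below uses made-up data in place of infodata.txt to show the difference between one key and two.

```shell
# Hypothetical stand-in for infodata.txt: name and score.
data=$(printf '%s\n' "amy 9" "tom 40" "amy 9" "bob 9")

# Keys: column 2 numerically, then columns 1-2 as text.
# Only lines equal on both keys ("amy 9" twice) collapse into one.
by_both_keys=$(printf '%s\n' "$data" | LC_ALL=C sort -t " " -k2n,2 -u -k1,2)

# With only the numeric key, "amy 9" and "bob 9" also compare equal,
# so a single line survives for score 9.
by_score_only=$(printf '%s\n' "$data" | LC_ALL=C sort -t " " -k2n,2 -u)

printf 'both keys:\n%s\n\nscore only:\n%s\n' "$by_both_keys" "$by_score_only"
```

This is why the second key matters: without it, distinct rows that share a score would be dropped along with the true duplicates.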
How can the duplicated rows be printed?
Run sort infodata.txt | uniq -dc, which prints each duplicated line once, prefixed with the number of times it occurs.
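A minimal sketch of the same pipeline on made-up data. The width of the count column printed by uniq -c varies between implementations, so the leading spaces are stripped with sed here to keep the output easy to compare.

```shell
# Hypothetical stand-in for infodata.txt.
data=$(printf '%s\n' "amy 9" "tom 40" "amy 9" "bob 9" "tom 40" "amy 9")

# -d keeps one copy of each duplicated line, -c prefixes the occurrence count.
duplicates=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -dc | sed 's/^ *//')

printf '%s\n' "$duplicates"
```

The output is `3 amy 9` and `2 tom 40`; `bob 9` occurs once and is not listed.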
3. The uniq command: deduplication
uniq works line by line, comparing adjacent lines and removing repeats. It only deduplicates reliably on sorted text, so it is normally combined with the sort command.
The basic syntax is uniq [option] filename
Parameters
uniq -c: prefix each line with the number of times it occurs;
uniq -d: show only duplicated lines, one copy of each;
uniq -u: show only lines that are not repeated;
uniq -i: ignore case when comparing;
uniq -f N: skip the first N fields when comparing; fields are separated by whitespace;
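The sketch below illustrates why uniq needs sorted input and what -u and -i change. The fruit list is hypothetical, and LC_ALL=C is used only so that the sort order is the same everywhere.

```shell
# Hypothetical input with non-adjacent repeats and a case variant.
data=$(printf '%s\n' "apple" "pear" "apple" "Pear")

# uniq compares only adjacent lines, so unsorted input keeps both "apple" lines.
unsorted=$(printf '%s\n' "$data" | uniq)

# After sorting, identical lines are adjacent and collapse into one.
sorted_unique=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq)

# -u keeps only the lines that occur exactly once.
once_only=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -u)

# sort -f groups lines regardless of case; uniq -i then treats them as equal.
ignore_case=$(printf '%s\n' "$data" | LC_ALL=C sort -f | uniq -i)

printf 'unsorted: %s lines\n' "$(printf '%s\n' "$unsorted" | grep -c .)"
printf 'sorted:\n%s\n\nonce only:\n%s\n\nignoring case:\n%s\n' \
  "$sorted_unique" "$once_only" "$ignore_case"
```

The unsorted run keeps all four lines, while the case-insensitive run reduces the input to two.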
Examples
1) Sorting and deduplicating
To show only the lines that appear exactly once, run sort infodata.txt | uniq -u.
For text files with line numbers, use the -f parameter to skip the leading line-number field and compare only the remaining fields.
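One way this might look in practice, assuming a hypothetical file whose first field is a line number: sort on the content after the number, then let uniq skip that first field.

```shell
# Hypothetical numbered lines: a line number, then the content.
data=$(printf '%s\n' "1 pear" "2 apple" "3 pear" "4 apple" "5 plum")

# Sort from field 2 onward so equal content becomes adjacent,
# then skip field 1 (the line number) when comparing.
deduplicated=$(printf '%s\n' "$data" | LC_ALL=C sort -k2 | uniq -f 1)

printf '%s\n' "$deduplicated"
```

Each content value appears once, and the line number of its first occurrence in the sorted order is kept: `2 apple`, `1 pear`, `5 plum`.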
In testing, sort-based deduplication did not seem to take effect on the last line (a repeated last line was not removed); this should be verified again in real use.
Reference blog
【1】https://blog.csdn.net/qq_43382735/article/details/121007185