Shell implements basic file operations (cutting, sorting, and deduplication)
2022-07-03 00:33:00 【Dreamy channeling】
Shell's built-in tools can process large text files and cover most day-to-day data-processing needs.
I. The cut command: cutting
cut processes text column by column and is especially well suited to large files.
The basic syntax is cut [option] filename
Parameters
cut -f N selects fields (columns) by number, that is, which columns to extract;
cut -c N selects by character position;
cut -b N selects by byte position; byte ranges ignore multi-byte character boundaries unless -n is added, in which case multi-byte characters are not split;
cut -d DELIM sets the field delimiter used with -f (the default is TAB);
N- selects from column N to the end of the line;
N-M selects from column N to column M;
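The options above can be tried on a small file. This is a minimal sketch: the sample rows are made up for illustration, since the contents of the article's info.text are not reproduced here.

```shell
# Hypothetical sample data (not the article's info.text).
tmp=$(mktemp)
printf 'alice 30 paris\nbob 25 rome\n' > "$tmp"

# -d sets the delimiter and -f picks fields: here fields 1 through 2.
first_two=$(cut -d ' ' -f 1-2 "$tmp")

# N- selects from field N to the end of the line.
from_second=$(cut -d ' ' -f 2- "$tmp")

# -c selects by character position: the first three characters of each line.
first_chars=$(cut -c 1-3 "$tmp")

printf '%s\n' "$first_two" "$from_second" "$first_chars"
rm -f "$tmp"
```

Putting the options before the file name, as done here, is the portable form of the command line.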
Demo
1) Cutting by field
The original file is shown below .
Get the first two columns. Run cut -d " " -f 1-2 info.text, which sets a custom delimiter and splits each line on spaces.
2) Extracting the PID of bash
Look up the bash processes on the virtual machine, as shown in the figure below.
Run ps -aux | grep bash | head -n 1 | cut -d " " -f 8 to find the bash processes, take the first line, split it on spaces, and extract column 8. Because cut treats every single space as a delimiter, the column number depends on how the ps output is padded. The result is shown in the figure below.
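The dependence on padding can be avoided by collapsing runs of spaces before cutting. The sketch below works on a made-up line laid out like ps aux output (the user, PID, and other values are assumptions), so that it behaves the same on any machine.

```shell
# Hypothetical line in the layout of `ps aux` output (USER PID %CPU ...).
line='root      1234  0.0  0.1  115440  2044 pts/0  Ss  10:00  0:00 -bash'

# Without squeezing, every space is a delimiter, so field 2 is empty here.
raw_field=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# tr -s ' ' collapses runs of spaces, so field 2 is always the PID column.
pid=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2)

printf 'raw=[%s] pid=[%s]\n' "$raw_field" "$pid"
```

With the spaces squeezed, the same field number works regardless of how long the user name or PID is.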
II. The sort command: sorting
sort sorts the lines of a file and writes the result to standard output, or to a specified file through redirection.
The basic syntax is sort [option] filename
Parameters
sort -n sorts by numeric value;
sort -r sorts in reverse order;
sort -t SEP uses SEP as the field separator; without -t, fields are separated by the transition from non-blank to blank characters;
sort -k selects the column(s) to sort on;
sort -o FILE writes the sorted result to FILE;
sort -u outputs only unique lines, that is, removes duplicates;
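A short sketch of these options on made-up data (the rows are assumptions, not the article's infodata.txt). LC_ALL=C is set only to make the textual comparisons predictable across machines.

```shell
export LC_ALL=C
tmp=$(mktemp)
out=$(mktemp)
# Hypothetical sample data: a name and a number per line.
printf 'pear 10\napple 9\nfig 100\napple 9\n' > "$tmp"

# Without -n the second column is compared as text, so "100" sorts before "9".
textual=$(sort -t ' ' -k2,2 "$tmp")

# -n compares the key as a number; r reverses the order.
numeric=$(sort -t ' ' -k2,2n "$tmp")
reverse=$(sort -t ' ' -k2,2nr "$tmp")

# -u keeps one copy of identical lines.
unique=$(sort -u "$tmp")

# -o writes the result to a file instead of standard output.
sort -t ' ' -k2,2n -o "$out" "$tmp"
saved=$(cat "$out")

printf '%s\n' "$numeric"
rm -f "$tmp" "$out"
```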
Demo
1) Sort
The original file is shown below .
Run sort -t " " -k2n,2 infodata.txt to sort by the second column in ascending numeric order. The key should state both the start and the end column (-k2,2); a bare -k2 would extend the key from column 2 to the end of the line. The result is shown below.
The result above contains duplicate rows. How can they be removed?
Add -uk1,2 to the command. The full command is sort -t " " -k2n,2 -uk1,2 infodata.txt; with -u, sort keeps only one line from each group of lines whose keys compare equal. The result is as follows.
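The effect of combining -u with keys can be seen on a three-column sample. This is an illustrative sketch with made-up rows; the third column is there only to show that lines differing outside the keys still count as duplicates.

```shell
tmp=$(mktemp)
# Hypothetical data: two rows share columns 1-2 but differ in column 3.
printf 'a 2 x\nb 1 y\na 2 z\n' > "$tmp"

# Sort numerically on column 2; -u then keeps one line per distinct key set.
deduped=$(sort -t ' ' -k2n,2 -uk1,2 "$tmp")

# Count the surviving lines and look only at the key columns.
line_count=$(printf '%s\n' "$deduped" | wc -l | tr -d ' ')
kept_keys=$(printf '%s\n' "$deduped" | cut -d ' ' -f 1-2)

printf '%s\n' "$deduped"
rm -f "$tmp"
```

Because uniqueness is judged on the keys rather than on the whole line, one of the two "a 2" rows is dropped even though their third columns differ.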
How can the duplicated rows be printed?
Run sort infodata.txt | uniq -dc, which prints each duplicated line once, prefixed with the number of times it occurs. The result is shown below.
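As a small sketch of this pipeline on made-up input (the letters below are assumptions), the count column that uniq -c emits is padded with leading spaces, so the example strips that padding to make the result easier to compare.

```shell
# Hypothetical input: "a" appears twice, "b" three times, "c" once.
dups=$(printf 'b\na\nb\nc\na\nb\n' | sort | uniq -dc | sed 's/^ *//')

# Each duplicated line appears once, preceded by its count; "c" is omitted.
printf '%s\n' "$dups"
```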
III. The uniq command: deduplication
uniq works line by line, comparing adjacent lines and removing repeats. It can only deduplicate sorted text effectively, so it is normally combined with the sort command.
The basic syntax is uniq [option] filename
Parameters
uniq -c prefixes each line with the number of times it occurs;
uniq -d shows only the duplicated lines, one per group;
uniq -u shows only the lines that are not repeated;
uniq -i ignores case when comparing;
uniq -f N skips the first N fields when comparing; fields are separated by whitespace;
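The reason uniq needs sorted input is that it only compares neighbouring lines. A minimal sketch on made-up input:

```shell
# Hypothetical input in which the two "a" lines are not adjacent.
input=$(printf 'a\nb\na\n')

# uniq alone leaves both "a" lines, because they are not next to each other.
unsorted=$(printf '%s\n' "$input" | uniq)

# Sorting first makes the duplicates adjacent, so uniq can remove them.
sorted=$(printf '%s\n' "$input" | sort | uniq)

# -u keeps only the lines that occur exactly once.
only_once=$(printf '%s\n' "$input" | sort | uniq -u)

# -i treats lines that differ only in case as duplicates.
nocase=$(printf 'Apple\napple\nAPPLE\n' | uniq -i)

printf '%s\n' "$sorted"
```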
Demo
1) Sort and deduplicate
To show only the lines that appear exactly once, run sort infodata.txt | uniq -u. The result is shown below.
For a text file with line numbers, use -f 1 to skip the leading line-number field and compare only the fields that follow.
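A sketch of the line-number case, using made-up numbered lines. Sorting on the second field first (sort -k2) is an addition for illustration, so that equal entries end up adjacent before uniq compares them.

```shell
export LC_ALL=C
# Hypothetical numbered lines; "apple" appears under line numbers 1 and 3.
numbered=$(printf '1 apple\n2 pear\n3 apple\n')

# Sort on everything from field 2 onward, then skip field 1 when comparing.
result=$(printf '%s\n' "$numbered" | sort -k2 | uniq -f 1)

# The first line of each group is kept, with its original line number.
printf '%s\n' "$result"
```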
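In testing, deduplication with sort did not seem to work on the last line (a repeated last line was not removed); this should be verified again in real use. One possible cause, not confirmed here, is an invisible difference on that line such as a trailing space or a carriage return.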
Reference blog
【1】https://blog.csdn.net/qq_43382735/article/details/121007185