
awk: One of the Three Swordsmen of Text Processing

2022-07-07 12:57:00 LC181119

1. How awk works and basic usage

awk is named after its authors Aho, Weinberger, and Kernighan. It is a report generator that produces formatted text output. The AWK shipped with GNU/Linux is currently developed and maintained by the Free Software Foundation (FSF) and is usually called GNU AWK.
There are several versions:
  • AWK: the original AWK from AT&T Labs
  • NAWK: New awk, the upgraded AWK from AT&T Labs
  • GAWK: GNU AWK. Most GNU/Linux distributions ship with GAWK, which is fully compatible with AWK and NAWK
GNU AWK user manual:
https://www.gnu.org/software/gawk/manual/gawk.html
gawk is a pattern scanning and processing language that can do the following:
  • Process text
  • Output formatted text reports
  • Perform arithmetic operations
  • Perform string operations
Format:
awk [options] 'program' var=value file…
awk [options] -f programfile var=value file…
Notes:
The program is usually enclosed in single quotes and can consist of three parts:
  • BEGIN block
  • General pattern-matching blocks
  • END block

Common options:
  • -F "separator": specifies the field separator used for input; the default separator is any run of whitespace characters
  • -v var=value: assigns a variable
Program format:
pattern{action statements;..}
pattern: determines when the action statements are triggered, for example BEGIN, END, or a regular expression
action statements: process the data and are written inside {}; the most common are print and printf
How awk processes input
Step 1: execute the statements in the BEGIN{action;… } block.
Step 2: read one line from the file or from standard input (stdin), then execute the pattern{ action;… } blocks. awk scans the input line by line and repeats this from the first line to the last, until all input has been read.
Step 3: on reaching the end of the input stream, execute the END{action;…} block.
The BEGIN block is executed before awk starts reading lines from the input stream. It is optional; statements such as variable initialization or printing a table header are usually written in the BEGIN block.
The END block is executed after awk has read all lines from the input stream. Work such as printing the summary of an analysis over all lines is done in the END block. It is also optional.
The general commands in the pattern block are the most important part, and they are also optional. If no action block is given, { print } is executed by default, which prints every line read. awk executes this block for each line it reads.
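The three phases can be seen in one small run. This is a minimal sketch on made-up input (three "name score" lines supplied by printf; they are assumptions for illustration, not data from this article):

```shell
# Hypothetical sample input: three lines of "name score".
out=$(printf 'alice 90\nbob 75\ncarol 82\n' | awk '
BEGIN { print "NAME SCORE" }          # runs once, before any input is read
      { print $1, $2; total += $2 }   # runs for every input line
END   { print "TOTAL", total }        # runs once, after the last line
')
echo "$out"
```

The header comes out before any input is read, the middle block runs once per line, and the total appears only after the last line has been processed.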

Separators, fields, and records
  • The fields (columns) produced by splitting on the separator are referred to as $1, $2 ... $n, which are called field identifiers; $0 is the whole record. Note: this $ does not mean the same thing as the $ of a shell variable
  • Each line of the file is called a record
  • If the action is omitted, print $0 is performed by default
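As a small sketch of field identifiers, the following splits one hypothetical record (laid out like a line of /etc/passwd, but supplied inline so the result does not depend on the system). The single quotes around the program keep the shell from expanding $1 itself:

```shell
# Hypothetical colon-separated record, same layout as /etc/passwd.
line='root:x:0:0:root:/root:/bin/bash'
out=$(printf '%s\n' "$line" | awk -F: '{
    print $1      # first field
    print $NF     # last field (NF is the number of fields)
    print NF      # field count
    print $0      # the whole record
}')
echo "$out"
```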
Common kinds of action
  • Output statements: print, printf
  • Expressions: arithmetic expressions, comparison expressions, etc.
  • Compound statements: grouped statements
  • Control statements: if, while, etc.
  • Input statements
awk control statements
  • { statements;… } compound statement
  • if(condition) {statements;…}
  • if(condition) {statements;…} else {statements;…}
  • while(condition) {statements;…}
  • do {statements;…} while(condition)
  • for(expr1;expr2;expr3) {statements;…}
  • break
  • continue
  • exit

2. The print action

Format:
print item1, item2, ...
Notes:
  • Items are separated by commas
  • An output item can be a string or a number, a field of the current record, a variable, or an awk expression
  • If item is omitted, it is equivalent to print $0
  • Literal text must be enclosed in " "; variables and numbers need no quotes
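The rules above can be sketched on a single made-up record (the sample line is an assumption for illustration):

```shell
out=$(printf 'root:x:0\n' | awk -F: '{
    print $1, $3          # comma: items are joined by OFS (a space by default)
    print $1 $3           # no comma: the items are concatenated
    print "user=" $1      # literal text needs double quotes
    print                 # no items: same as print $0
}')
echo "$out"
```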
Example: find the 3 IP addresses that access the website most often
# awk '{print $1}' nginx.access.log-20200428 | sort | uniq -c | sort -nr | head -3
   5498 122.51.38.20
   2161 117.157.173.214
    953 211.159.177.120
# awk '{print $1}' access_log | sort | uniq -c | sort -nr | head
   4870 172.20.116.228
   3429 172.20.116.208
   2834 172.20.0.222
   2613 172.20.112.14
   2267 172.20.0.227
   2262 172.20.116.179
   2259 172.20.65.65
   1565 172.20.0.76
   1482 172.20.0.200
   1110 172.20.28.145
Example: extract partition utilization
# df | awk -F"[[:space:]]+|%" '{print $5}'
Use
0
0
1
0
3
19
1
0

Example: extract the IP address from the ifconfig output

# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.85  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::20c:29ff:fe3d:d1e7  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:3d:d1:e7  txqueuelen 1000  (Ethernet)
        RX packets 24590  bytes 25224965 (24.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12793  bytes 4232673 (4.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
# ifconfig eth0 | sed -n "2p"
        inet 10.0.0.85  netmask 255.255.255.0  broadcast 10.0.0.255
# ifconfig eth0 | sed -n "2p" | awk '{print $2}'
10.0.0.85

# ifconfig eth0 | awk '/netmask/{print $2}'
10.0.0.85

# ifconfig eth0 | awk 'NR==2{print $2}'
10.0.0.85

3. awk variables

awk variables are divided into built-in variables and custom variables

3.1 Common built-in variables

  • FS: input field separator; the default is whitespace. It has the same effect as the -F option

Example:

# awk -v FS=":" '{print $1FS$3}' /etc/passwd | head -n3
root:0
bin:1
daemon:2
  • OFS: output field separator; the default is a single space

Example:

# awk -v FS=':' '{print $1,$3,$7}' /etc/passwd | head -n1
root 0 /bin/bash
# awk -v FS=':' -v OFS=':' '{print $1,$3,$7}' /etc/passwd | head -n1
root:0:/bin/bash
  • RS: input record separator; specifies what ends a record on input, in place of the default newline
Example:
awk -v RS=' ' '{print }' /etc/passwd
  • ORS: output record separator; the given string is written after each output record in place of the newline
Example:
awk -v RS=' ' -v ORS='###'  '{print $0}' /etc/passwd
  • NF: number of fields
Example:
Note: a built-in variable such as NF is referenced without a leading $ (NF is the field count, while $NF is the last field)
# awk -F: '{print NF}' /etc/fstab
# awk -F: '{print $(NF-1)}' /etc/passwd
# ls /misc/cd/BaseOS/Packages/*.rpm | awk -F"." '{print $(NF-1)}' | sort | uniq -c
    389 i686
    208 noarch
   1060 x86_64
  • NR: record number
Example:
# awk '{print NR,$0}' /etc/issue /etc/centos-release
1 \S
2 Kernel \r on an \m
3 
4 CentOS Linux release 8.1.1911 (Core)
  • FNR: record number, counted separately for each file
Example:
awk '{print FNR}' /etc/fstab /etc/inittab
# awk '{print NR,$0}' /etc/issue /etc/redhat-release
1 \S
2 Kernel \r on an \m
3 
4 CentOS Linux release 8.0.1905 (Core)
# awk '{print FNR,$0}' /etc/issue /etc/redhat-release
1 \S
2 Kernel \r on an \m
3 
1 CentOS Linux release 8.0.1905 (Core)
  • FILENAME: name of the current file
Example:
# awk '{print FILENAME}' /etc/fstab
# awk '{print FNR,FILENAME,$0}' /etc/issue /etc/redhat-release
1 /etc/issue \S
2 /etc/issue Kernel \r on an \m
3 /etc/issue 
1 /etc/redhat-release CentOS Linux release 8.0.1905 (Core)
  • ARGC: number of command-line arguments
Example:
# awk '{print ARGC}' /etc/issue /etc/redhat-release
3
3
3
3
# awk 'BEGIN{print ARGC}' /etc/issue /etc/redhat-release
3
  • ARGV: an array that holds the command-line arguments, one per element: ARGV[0], ......
Example:
# awk 'BEGIN{print ARGV[0]}' /etc/issue /etc/redhat-release
awk
# awk 'BEGIN{print ARGV[1]}' /etc/issue /etc/redhat-release
/etc/issue
# awk 'BEGIN{print ARGV[2]}' /etc/issue /etc/redhat-release
/etc/redhat-release
# awk 'BEGIN{print ARGV[3]}' /etc/issue /etc/redhat-release
(an empty line is printed, because ARGV[3] does not exist)
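The examples above read files whose contents differ between systems. As a self-contained sketch on made-up input (the sample strings are assumptions chosen for illustration), the following shows FS, OFS, NR, and NF together, and then RS with ORS:

```shell
# FS (set through -F) splits the input; OFS joins the printed items.
out=$(printf 'a:b:c\nd:e\n' | awk -F: -v OFS='-' '{ print NR, NF, $1, $NF }')
echo "$out"

# RS=" " makes every space end a record; ORS="#" is written after each record.
recs=$(printf 'x y z' | awk -v RS=' ' -v ORS='#' '{ print }')
echo "$recs"
```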

3.2 Custom variables

Custom variable names are case sensitive. Values can be assigned in the following ways:
  • -v var=value
  • Defined directly in the program
Example:
# awk -v test1=test2="hello,gawk" 'BEGIN{print test1,test2}'
test2=hello,gawk 
# awk -v test1=test2="hello1,gawk" 'BEGIN{test1=test2="hello2,gawk";print test1,test2}'
hello2,gawk hello2,gawk

4. The printf action

printf produces formatted output
Format:
printf "FORMAT", item1, item2, ...
Notes:
  • FORMAT must be specified
  • No newline is added automatically; give the newline control character \n explicitly
  • FORMAT must contain a format specifier for each item that follows
Format specifiers (one per item, in order):
%s: string
%d, %i: decimal integer
%f: floating-point number
%e, %E: scientific notation
%c: a single character (a numeric item is treated as a character code)
%g, %G: scientific or floating-point notation
%u: unsigned integer
%%: a literal %
Modifiers:
#[.#]: the first number sets the display width; the second sets the precision after the decimal point, e.g. %3.1f
-: left-align (the default is right-aligned), e.g. %-15s
+: show the sign of a number, e.g. %+d
Example:
awk -F:   '{printf "%s",$1}' /etc/passwd
awk -F:   '{printf "%s\n",$1}' /etc/passwd
awk -F:   '{printf "%20s\n",$1}' /etc/passwd
awk -F:   '{printf "%-20s\n",$1}' /etc/passwd
awk -F:   '{printf "%-20s %10d\n",$1,$3}' /etc/passwd
awk -F:   '{printf "Username: %s\n",$1}' /etc/passwd
awk -F:   '{printf "Username: %sUID:%d\n",$1,$3}' /etc/passwd
awk -F:   '{printf "Username: %25sUID:%d\n",$1,$3}' /etc/passwd
awk -F:   '{printf "Username: %-25sUID:%d\n",$1,$3}' /etc/passwd
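To make the effect of width, alignment, and precision visible without depending on the local /etc/passwd, here is a minimal sketch on two made-up records (the names, numbers, and the division by 3 are arbitrary assumptions used only to produce a fraction):

```shell
out=$(printf 'root:x:0\ndaemon:x:2\n' |
    awk -F: '{ printf "%-10s|%5d|%6.2f\n", $1, $3, $3 / 3 }')
echo "$out"
```

Each output line is 23 characters wide: a left-aligned 10-character name, a right-aligned 5-character integer, and a 6-character number with two decimals, separated by | characters.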

5. Operators

Arithmetic operators:

x+y, x-y, x*y, x/y, x^y, x%y
-x: negate the value
+x: convert a string to a number
String operator: there is no operator symbol; writing strings next to each other concatenates them
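A short sketch of the arithmetic and string operators; the values are arbitrary and chosen only for illustration:

```shell
out=$(awk 'BEGIN {
    print 7 % 3, 2 ^ 10, 7 / 2     # modulo, exponentiation, division
    s = "3 apples"
    print +s, -s                   # unary + and - use the leading number of the string
    print "foo" "bar", "n=" 42     # adjacent items are concatenated
}')
echo "$out"
```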
Assignment operators:
=, +=, -=, *=, /=, %=, ^=, ++, --
Example:
# awk 'BEGIN{i=0;print i++,i}'
0 1
# awk 'BEGIN{i=0;print ++i,i}'
1 1
Comparison operators:
==, !=, >, >=, <, <=
Example: print the even-numbered lines, then the odd-numbered lines
# seq 10 | awk 'NR%2==0'
2
4
6
8
10
# seq 10 | awk 'NR%2==1'
1
3
5
7
9
Pattern matching:
~: whether the left side matches the right side (a containment test)
!~: whether the left side does not match the right side
Example:
# awk -F: '$0 ~ /root/{print $1}' /etc/passwd
# awk -F: '$0 ~ "^root"{print $1}' /etc/passwd
# awk '$0 !~ /root/' /etc/passwd
# awk '/root/' /etc/passwd
# awk -F: '/r/' /etc/passwd
# awk -F: '$3==0' /etc/passwd
# df | awk -F"[[:space:]]+|%" '$0 ~ /^\/dev\/sd/{print $5}'
51
92
# ifconfig eth0 | awk 'NR==2{print $2}'
10.0.0.8
Logical operators:
AND: &&
OR: ||
NOT: !, negation
Example:
# awk 'BEGIN{print !i}'
1
# awk -v i=10 'BEGIN{print !i}'
0
# awk -v i=-3 'BEGIN{print !i}'
0
# awk -v i=0 'BEGIN{print !i}'
1
# awk -v i=abc 'BEGIN{print !i}'
0
Conditional expression (ternary operator)
selector?if-true-expression:if-false-expression
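A minimal sketch of the conditional expression, run on made-up passwd-style records. The threshold of 1000 for regular accounts is an assumption for illustration, not a rule stated in this article:

```shell
out=$(printf 'root:x:0\nalice:x:1000\nbin:x:1\n' | awk -F: '{
    type = ($3 >= 1000) ? "regular" : "system"   # selector ? if-true : if-false
    print $1, type
}')
echo "$out"
```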

6. Patterns (PATTERN)

PATTERN: filters the lines that match the pattern condition, then processes them
  • Not specified: the empty pattern, which matches every line
Example:
# awk -F: '{print $1,$3}' /etc/passwd
  • /regular expression/: only lines matched by the pattern are processed; the expression must be enclosed in / /
  • relational expression: the line is processed only if the result is true
         true: the result is a non-zero value or a non-empty string
         false: the result is an empty string or 0
  • line ranges: written as /pat1/,/pat2/. Line numbers cannot be written directly in this form, but the variable NR can be used to specify line numbers indirectly
  • BEGIN/END patterns
        BEGIN{}: executed only once, before processing of the input text starts
        END{}: executed only once, after text processing is complete
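The pattern kinds listed above can be compared on the same five made-up lines (the marker words START and END in the data are assumptions for this sketch; the END data line is unrelated to the END pattern):

```shell
input=$(printf 'alpha\nSTART\nbeta\nEND\ngamma\n')

# Range pattern: from the line matching the first regex to the line matching the second.
by_regex=$(printf '%s\n' "$input" | awk '/^START$/,/^END$/')

# Line numbers are selected indirectly through NR.
by_number=$(printf '%s\n' "$input" | awk 'NR>=2 && NR<=3')

# Relational expression: the default action prints lines where it is true.
by_relation=$(printf '%s\n' "$input" | awk 'length($0) > 4')

printf '%s\n--\n%s\n--\n%s\n' "$by_regex" "$by_number" "$by_relation"
```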

7. Conditional statement: if-else

Syntax:
if(condition){statement;…}[else statement]
if(condition1){statement1}else if(condition2){statement2}else if(condition3){statement3}...... else {statementN}
Use case: testing a condition on the whole line or on a single field obtained by awk
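As a sketch of if / else if / else applied to one field, the following classifies made-up "device usage" pairs. The 80 and 90 thresholds and the level names are assumptions chosen for illustration:

```shell
out=$(printf '/dev/sda1 51\n/dev/sda2 92\n/dev/sda3 80\n' | awk '{
    if ($2 >= 90) {
        level = "critical"
    } else if ($2 >= 80) {
        level = "warning"
    } else {
        level = "ok"
    }
    print $1, level
}')
echo "$out"
```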

8. Conditional statement: switch

Syntax:
switch(expression) {case VALUE1 or /REGEXP/: statement1; case VALUE2 or /REGEXP2/: statement2; ...; default: statementn}

9. Loop: while

Syntax:
while (condition) {statement;…}
If the condition is true, the loop is entered; if the condition is false, the loop exits
Use cases:
         Applying similar processing to each of several fields in a line
         Processing each element of an array in turn
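A minimal sketch of the first use case, walking over every field of one made-up line and printing its position, value, and length:

```shell
out=$(printf 'one three fifteen\n' | awk '{
    i = 1
    while (i <= NF) {              # NF is the number of fields in this line
        print i, $i, length($i)
        i++
    }
}')
echo "$out"
```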

10. Loop: do-while

Syntax:
do {statement;…}while(condition)
Meaning: whether the condition is true or false, the loop body is executed at least once
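The difference from while can be sketched with a condition that is false from the start (the values are arbitrary):

```shell
out=$(awk 'BEGIN {
    i = 10
    do { print "do-while body, i=" i; i++ } while (i < 5)   # body runs once anyway
    i = 10
    while (i < 5) { print "while body, i=" i; i++ }         # body never runs
    print "done"
}')
echo "$out"
```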

11. Loop: for

Syntax:
for(expr1;expr2;expr3) {statement;…}
Common form:
for(variable assignment;condition;iteration process) {for-body}

Special usage: iterating over the elements of an array (var takes each index in turn)

for(var in array) {for-body}
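Both forms in a short sketch. The array contents are made up, and because the order in which for(var in array) visits the indices is not defined, the output is piped through sort:

```shell
# Counting form: sum of the squares 1 + 4 + 9.
sum=$(awk 'BEGIN { for (i = 1; i <= 3; i++) s += i * i; print s }')
echo "$sum"

# Array form: k takes each index of the array in turn.
pairs=$(awk 'BEGIN {
    color["apple"] = "red"; color["lime"] = "green"
    for (k in color) print k, color[k]
}' | sort)
echo "$pairs"
```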

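12. continue and break

continue: ends the current iteration and moves on to the next one
break: exits the whole loop
Format (unlike the shell builtins of the same name, neither takes a numeric argument in awk):
continue
break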
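A small sketch showing both statements in one loop; the numbers are arbitrary:

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even numbers and start the next iteration
        if (i > 7) break           # leave the loop entirely
        line = line sep i; sep = " "
    }
    print line
}')
echo "$out"
```

Only the odd numbers up to 7 are collected: the even ones are skipped by continue, and the loop stops at 9 because of break.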

13. next

next ends the processing of the current line early and moves straight on to the next line (the next iteration of awk's implicit loop over the input)
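As a sketch on made-up input, next can be used to drop comment lines and blank lines before any later rule sees them:

```shell
out=$(printf '# comment\nroot:0\n\nbin:1\n' | awk -F: '
/^#/ || NF == 0 { next }     # skip this line; the rule below does not run for it
{ print $1 }
')
echo "$out"
```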

14. Arrays

awk arrays are associative arrays
Format:
array_name[index-expression]
index-expression:
  • Arrays provide key/value (k/v) functionality
  • Any string can be used as an index; string literals must be enclosed in double quotes
  • If a referenced array element does not exist yet, awk creates it automatically and initializes its value to the empty string
  • To test whether an element exists in an array, use the "index in array" form; for(var in array) traverses all indexes
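The points above can be sketched in two steps on made-up data: counting occurrences with the field value as the key, and then the difference between testing with in and referencing an element directly:

```shell
# Hypothetical log column: one client IP per line.
counts=$(printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n10.0.0.1\n' |
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
    sort -nr)
echo "$counts"

member=$(awk 'BEGIN {
    a["x"] = 1
    print (("y" in a) ? "y present" : "y absent")   # the in test creates nothing
    tmp = a["z"]                                     # a plain reference creates a["z"]
    print (("z" in a) ? "z present" : "z absent")
}')
echo "$member"
```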

15. awk functions

awk functions are divided into built-in functions and user-defined functions

Official documentation:
https://www.gnu.org/software/gawk/manual/gawk.html#Functions

15.1 Common built-in functions

  • Numeric functions:
rand(): returns a random number between 0 and 1
srand(): used together with rand(); sets the seed for random number generation
int(): returns the integer part
  • String functions:
length([s]): returns the length of the given string
sub(r,s,[t]): searches string t for the pattern r and replaces the first match with s
gsub(r,s,[t]): searches string t for the pattern r and replaces every match with s
split(s,array,[r]): splits string s using r as the separator and stores the pieces in array; the first index is 1, the second is 2, …
  • Calling shell commands from awk:
system("cmd")
A space is the string concatenation operator in awk. If the command passed to system() needs awk variables, separate them from the quoted text with spaces; in other words, everything except the awk variables must be enclosed in ""
  • Time functions
Official documentation: time functions
https://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions
systime(): the number of seconds from January 1, 1970 to the current time
strftime(): formats a time in the specified format
15.2 Custom functions

Format of a custom function:
function name ( parameter, parameter, ... ) {
   statements
   return expression
}
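Following this format, a user-defined function can be sketched as below. The function name max and its arguments are hypothetical, chosen only for illustration:

```shell
out=$(awk '
function max(a, b) {
    return (a > b) ? a : b
}
BEGIN { print max(3, 8), max(10, 2) }
')
echo "$out"
```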

16. awk scripts

Write the awk program as a script file, then call it or execute it directly
Passing parameters to an awk script
Format:
awkfile  var=value  var2=value2... Inputfile
Note: variables passed this way are not available in the BEGIN block; they become available only once awk starts processing the input. To make a variable's value available before BEGIN is executed, pass it with the -v option. Each variable given on the command line needs its own -v option
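The difference between the two ways of passing a variable can be sketched as follows. The script is written to a temporary file and run with awk -f rather than executed directly; the variable names a and b and the single input line are assumptions for illustration:

```shell
script=$(mktemp)          # temporary file holding the awk program
cat > "$script" <<'EOF'
BEGIN { print "BEGIN: a=[" a "] b=[" b "]" }
      { print "MAIN: a=[" a "] b=[" b "] line=" $0 }
EOF

# a is passed with -v, so it is already set in BEGIN.
# b is a command-line assignment, so it is set only when awk reaches
# that argument, after BEGIN has run. The final - means standard input.
out=$(printf 'x\n' | awk -v a=1 -f "$script" b=2 -)
rm -f "$script"
echo "$out"
```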

Original site

Copyright notice
This article was written by [LC181119]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/02/202202130616434052.html