[Repost] awk explained (1): awk basics, options, and PROGRAM parsing with examples
2022-06-13 05:53:00 【morpheusWB】
Contents
Preface: the shell's "three swordsmen" (grep, sed, awk)
1. awk basics
1.1 Introduction to awk
1.2 Basic syntax
2. awk OPTIONS explained
2.0 Option list
2.1 -f program-file
2.2 -F fs
2.3 -v var=val
2.4 -d[file]
2.5 -h
2.6 -P (to be added)
2.7 -S (to be added)
2.8 -V: display the version
3. awk PROGRAM explained
3.1 A simple view
3.2 PROGRAM syntax
3.3 The relationship between the two parts of PROGRAM
3.4 How awk applies PROGRAM to each line
3.5 The PATTERN part of PROGRAM in detail
Preface: the shell's "three swordsmen" (grep, sed, awk)
Each of the shell's three swordsmen has a problem it solves best:
sed: modifying file contents
grep: searching for text
awk: extracting fields
1. awk basics
1.1 Introduction to awk
1.1.1 A brief history of awk
a. The version in general use today dates from 1985.
b. Variants
nawk: the updated "new awk" of the 1980s
mawk, and gawk (the GNU Project's implementation)
1.1.2 awk in brief
a. A programming language designed specifically for text processing.
b. A tool for data extraction and reporting (generating formatted reports).
c. A data-driven scripting language.
It provides a set of operations on streams of text data and can process text files directly.
It can also read its data from a pipe.
d. Good at handling database-like and tabular files.
e. awk is an interpreted language.
f. Its syntax resembles that of C.
g. Very short programs are enough to modify, compare, extract from, and print documents.
h. This article uses gawk as its example awk; on CentOS, awk is gawk by default.
[root@localhost ~]# ll /usr/bin/awk
lrwxrwxrwx. 1 root root 4 Jan 2 17:02 /usr/bin/awk -> gawk
1.1.3 awk features and advantages
a. It has many built-in functions and variables.
b. It uses extended regular expressions by default.
By contrast, sed and grep use basic regular expressions by default and need sed -r or grep -E for extended ones.
c. Like sed, awk reads its input stream line by line, selects lines according to rules, and writes formatted output. This brings in the concept of the record separator, which defines what counts as one "line".
1.2 Basic syntax
1.2.1 What an awk program does automatically
Reads the input line by line
Splits each line into fields (using the field separator)
Manages storage
Initializes variables
User-defined variables need no type declaration; awk variables hold either strings or numbers.
By analogy with a database, each line read is called a record, and the pieces of a record separated by the delimiter are called fields.
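A minimal sketch of these automatic steps, using invented sample data: awk reads each record and splits it into fields on its own, and keeps the built-in counters NR (record number) and NF (number of fields) up to date without any code on our part.

```shell
# Three records with whitespace-separated fields (sample data made up for illustration).
out=$(printf 'Beth 4.00 0\nDan 3.75 0\nKathy 4.00 10\n' |
      awk '{ print NR ": " NF " fields, first=" $1 }')
echo "$out"
```

Each output line reports the record number, the field count, and the first field of that record.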
1.2.2 Grammar format
awk [OPTIONS] -f PROGRAM_FILE [--] filename_list
awk [OPTIONS] [--] PROGRAM filename_list
1.2.3 A first look at PROGRAM
Think of PROGRAM as the counterpart of the script in sed. A PROGRAM consists of one or more PATTERN and ACTION pairs: the PATTERN is a test (a regular expression, for example) that decides whether the current line is one to be processed, and the ACTION is what to do when the test succeeds.
# Common form of PROGRAM:
awk [OPTIONS] [--] 'pattern{action};pattern{action};' filename_list
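As a small illustration of that form, the sketch below puts two pattern{action} pairs, separated by a semicolon, into one PROGRAM. The input lines and the messages are invented for the example.

```shell
# Each record is tested against both patterns; only the matching action runs.
out=$(printf 'Kathy 4.00 10\nDan 3.75 0\n' |
      awk '$3 > 0 { print $1, "worked" }; $3 == 0 { print $1, "did not work" }')
echo "$out"
```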
1.2.4 A first look at the field separator
awk reads the input and splits each line (record) into fields according to the field separator FS. FS is one of awk's built-in variables; awk has other built-in string and numeric variables for similar settings (the record separator, for example).
# For example, /etc/passwd is usually processed like this
[root@localhost ~]# awk -F: '{print $1,$4}' /etc/passwd
root 0
bin 1
daemon 2
adm 4
lp 7
sync 0
shutdown 0
In addition, awk supports user-defined variables, which need no type declaration (awk's only value types are string and number).
# They are usually passed in from the command line in the form -v var=val
1.2.5 A first look at field variables
For example, $3 is a field variable that refers to the third field of the current record.
[tyson@localhost learnawk]$ awk '$3 > 0 { print $1,$2 * $3 }' firstprogram.txt
Kathy 40
Mark 100
Mary 121
Susie 76.5
[tyson@localhost learnawk]$ cat firstprogram.txt
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
Note:
Enclose the awk program (the PROGRAM part) in single quotes; otherwise the shell will interpret characters such as $ before awk sees them.
Everything inside the single quotes is passed to awk unchanged as one complete awk program.
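The difference can be seen in a short sketch. It assumes the shell has no positional parameters, which `set --` forces here, so inside double quotes the shell replaces $1 with nothing before awk ever runs.

```shell
set --   # clear the shell's own $1, $2, ... for a predictable demo

# Single quotes: awk receives { print $1 } and prints the first field.
single=$(echo 'alpha beta' | awk '{ print $1 }')

# Double quotes: the shell expands $1 to an empty string first,
# so awk receives { print  } and prints the whole line.
double=$(echo 'alpha beta' | awk "{ print $1 }")

echo "single quotes: $single"
echo "double quotes: $double"
```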
1.2.6 The meaning of filename_list
It can name several files, which are processed one after another.
[tyson@localhost learnawk]$ awk ' $3>0 {print$1,$2 * $3}' firstprogram.txt firstprogram2.txt
Kathy 40
Mark 100
Mary 121
Susie 76.5
Kathy 40
Mark 100
Mary 121
Susie 76.5
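To see where each record comes from when several files are given, a sketch along these lines can help. The file names one.txt and two.txt are made up; FILENAME, FNR (record number within the current file) and NR (record number overall) are awk built-in variables.

```shell
# Create two small throwaway files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n'    > "$dir/two.txt"

# FNR restarts at 1 for each file, NR keeps counting across files.
out=$(cd "$dir" && awk '{ print FILENAME, FNR, NR, $0 }' one.txt two.txt)
echo "$out"

rm -r "$dir"
```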
1.2.7 A first awk program
[[email protected] testawk]$ cat text.txt
SR 8649 275 Asia
Canada 3852 25 North_America
China 3705 1032 Asia
USA 3615 237 North_America
Brazil 3286 134 South_America
India 1267 746 Asia
Mexico 762 78 North_America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
[[email protected] testawk]$ awk '$2>3000{ print $2*$3 }' text.txt
2378475
96300
3823560
856755
440324
The program part of this awk command (the part in single quotes) is a complete program consisting of a single pattern{action} pair.
Here:
PATTERN is $2>3000; the ACTION runs for each line the pattern matches
ACTION is print $2*$3, which prints the product of the second and third fields
2. awk OPTIONS explained
2.0 Option list
Common options
-f program_file
Names a file containing the awk program. Such files are usually given the .awk suffix, and -f may be given more than once.
-F fs
Sets the field separator used to split each record into fields. The default is a single space, which makes awk split on runs of blanks and tabs.
-v var=val
Assigns a value to a variable before the program starts running, so the value is already available in the ACTION of a BEGIN block.
-d[file]
Writes a sorted list of the program's global variables and their final values to a file (awkvars.out in the current directory by default, or the file you name).
-h
--help
Prints help.
-P
--posix; details to be added
-S
--sandbox; details to be added
-V
Displays the version.
2.1 -f program-file
Reads the awk program from the named file instead of from the command line. The -f option may be given more than once to load several program files.
[[email protected] Lee testawk]$ awk -f fortext.awk text.txt
   Country   Area   POP CONTENT

        SR   8649   275 Asia
    Canada   3852    25 North_America
     China   3705  1032 Asia
       USA   3615   237 North_America
    Brazil   3286   134 South_America
     India   1267   746 Asia
    Mexico    762    78 North_America
    France    211    55 Europe
     Japan    144   120 Asia
   Germany     96    61 Europe
   England     94    56 Europe

we are done
[[email protected] Lee testawk]$ cat fortext.awk
BEGIN{
    FS=" "    # the default field separator, set explicitly
    # header row: every column is a string, so use %s throughout
    # (a %d here would print "POP" as 0)
    printf("%10s %6s %5s %s\n\n","Country","Area","POP","CONTENT")
}
{
    # one formatted row per input record
    printf("%10s %6d %5d %s\n",$1,$2,$3,$4)
}
END{
    printf("\nwe are done\n")
}
2.2 -F fs
The -F option sets the field separator; it has the same effect as assigning the built-in variable FS.
fs is a string or a regular expression.
For example, use awk to process the passwd file. Note that variables in an awk program (built-in or user-defined) are referenced without the "$" sign; only field variables are written with "$". This comes up again in the later section on variables.
[[email protected] Lee testawk]$ awk -F: '{print $1,$3*$4}' /etc/passwd |grep tyson
tyson 1002001
tyson1 1006009
[[email protected] Lee testawk]$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
…………………………………………………………………………………………………………
By default the value of FS is a single space; OFMT is the output format for numbers.
# OFMT is printed after FS to make it visible that FS is a space rather than empty
[[email protected] Lee testawk]$ awk 'BEGIN{print FS,OFMT}'
%.6g
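The two points of this section, that -F fs is another way of setting FS and that fs may be a regular expression, can be checked with a sketch like the following. The sample strings are invented for the example.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# -F: on the command line ...
a=$(echo "$line" | awk -F: '{ print $1, $7 }')
# ... gives the same fields as assigning FS in a BEGIN block.
b=$(echo "$line" | awk 'BEGIN { FS = ":" } { print $1, $7 }')

# fs as a regular expression: any run of digits separates fields.
c=$(echo 'a1b22c' | awk -F'[0-9]+' '{ print $1, $2, $3 }')

echo "$a"; echo "$b"; echo "$c"
```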
2.3 -v var=val
Assigns the value val to the variable var before the program starts, so a variable set with -v can already be used in the BEGIN block.
In short, it assigns a user-defined variable.
[[email protected] Lee testawk]$ awk -v testVar=$USER 'BEGIN{print testVar}'
tyson
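A slightly more defensive variant is sketched below: quoting the shell variable keeps a value that contains spaces in one piece. The names greeting and msg are chosen only for illustration.

```shell
greeting='hello world'

# "$greeting" is expanded by the shell; awk sees it as the variable msg,
# which is already set when the BEGIN block runs.
out=$(awk -v msg="$greeting" 'BEGIN { print msg; print length(msg) }')
echo "$out"
```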
2.4 -d[file]
--dump-variables[=file]
Writes a sorted list of the program's global variables and their final values to a file (awkvars.out in the current directory by default, or the file you name).
[[email protected] testawk]$ ll
total 8
-rw-rw-r--. 1 tyson tyson 103 Jan 10 20:36 big
-rw-rw-r--. 1 tyson tyson 304 Jan 10 19:51 text.txt
[[email protected] testawk]$ awk -d '$2>3000{ print $2*$3 }' text.txt
2378475
96300
3823560
856755
440324
[[email protected] testawk]$ ll
total 12
-rw-rw-r--. 1 tyson tyson 298 Jan 12 14:55 awkvars.out
-rw-rw-r--. 1 tyson tyson 103 Jan 10 20:36 big
-rw-rw-r--. 1 tyson tyson 304 Jan 10 19:51 text.txt
[[email protected] testawk]$ cat awkvars.out
ARGC: 2
ARGIND: 1
ARGV: array, 2 elements
BINMODE: 0
CONVFMT: "%.6g"
ERRNO: ""
FIELDWIDTHS: ""
FILENAME: "text.txt"
FNR: 11
FPAT: "[^[:space:]]+"
FS: " "
IGNORECASE: 0
LINT: 0
NF: 4
NR: 11
OFMT: "%.6g"
OFS: " "
ORS: "\n"
RLENGTH: 0
RS: "\n"
RSTART: 0
RT: "\n"
SUBSEP: "\034"
TEXTDOMAIN: "messages"
2.5 -h
--help
Prints a summary of awk's options.
POSIX options: GNU long options: (standard)
-f progfile --file=progfile
-F fs --field-separator=fs
-v var=val --assign=var=val
Short options: GNU long options: (extensions)
-b --characters-as-bytes
-c --traditional
-C --copyright
-d[file] --dump-variables[=file]
-e 'program-text' --source='program-text'
-E file --exec=file
-g --gen-pot
-h --help
-L [fatal] --lint[=fatal]
-n --non-decimal-data
-N --use-lc-numeric
-O --optimize
-p[file] --profile[=file]
-P --posix
-r --re-interval
-S --sandbox
-t --lint-old
-V --version
2.6 -P (--posix): to be added
2.7 -S (--sandbox): to be added
2.8 -V: display the version
[[email protected] Lee testawk]$ awk -V
GNU Awk 4.0.2
3. awk PROGRAM explained
3.1 A simple view
awk -d '$2>3000{ print $2*$3 }' text.txt
In this command, the part inside the quotes is the PROGRAM, that is, the program part of the awk command.
As described in 1.2.3, a PROGRAM is the counterpart of sed's script: it consists of PATTERN and ACTION pairs, where the PATTERN decides whether the current line should be processed and the ACTION is what to do when it matches.
3.2 PROGRAM syntax
Every PROGRAM consists of a sequence of one or more PATTERN-ACTION (pattern-action) pairs. How they are written depends on how awk is invoked.
#1. PROGRAM written on one line
awk 'pattern{action};pattern{action};pattern{action}…………' test.txt
#2. PROGRAM stored in a file
#/test/program.awk
pattern { action }
pattern { action }
………………
pattern { action }
pattern { action }
# If PROGRAM is stored in a file, name that file with -f when running awk
awk -f program.awk test.txt
3.3 The relationship between the two parts of PROGRAM
In a PROGRAM rule, either the pattern or the action may be omitted, but not both.
If the action is omitted, each matching line is printed.
If the pattern is omitted, every line matches and the action(s) run for each line.
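Both cases can be tried with a sketch like this one, which uses two invented input lines.

```shell
data=$(printf 'Dan 3.75 0\nKathy 4.00 10\n')

# Action omitted: lines that match the pattern are printed as they are.
only_pattern=$(printf '%s\n' "$data" | awk '$3 > 0')

# Pattern omitted: the action runs for every line.
only_action=$(printf '%s\n' "$data" | awk '{ print $1 }')

echo "$only_pattern"
echo "$only_action"
```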
3.4 How awk applies PROGRAM to each line
Read a line and apply the pattern-action rules (the PROGRAM) to it.
Test the line against the pattern.
If the pattern holds for the line, the line counts as a match.
If the line matches: run the action. (If no action is given, the line is printed.)
If the line does not match: neither the action nor the default print runs.
Read the next line.
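The steps above can be traced with a sketch that holds two rules. Every record is tested against each rule in turn before the next record is read, so the output is grouped by input line rather than by rule. The input and the labels are made up for the example.

```shell
out=$(printf 'a 1\nb 5\n' | awk '
    $2 > 0 { print "rule 1:", $1 }   # matches both records
    $2 > 3 { print "rule 2:", $1 }   # matches only the second record
')
echo "$out"
```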
3.5 The PATTERN part of PROGRAM in detail
3.5.1 What a pattern does
An awk pattern plays the same role as an address in sed: it decides whether the line just read is one that matches. The action is run on a line only when the pattern matches it; if the pattern is omitted, the action runs on every line.
3.5.2 Comparison with sed addressing
awk patterns can be regular expressions, string or numeric comparisons, and combinations of these.
By contrast, an address in a sed script can only be a line number or a regular expression.
3.5.3 Summary of pattern forms
Common
BEGIN
END
expression: an ordinary expression
/regular expression/: a regular expression
Advanced
relational expression
pattern && pattern: logical AND of patterns
pattern || pattern: logical OR of patterns
pattern1?pattern2:pattern3: conditional expression
(pattern): grouping
!pattern: negation
pattern1,pattern2: range
Rarely used (gawk only)
BEGINFILE
ENDFILE
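Two of the less familiar forms, the range pattern and negation, are sketched below on invented input. A range starts at a line matching the first pattern and runs through the next line matching the second.

```shell
lines=$(printf 'one\nSTART\ntwo\nEND\nthree\n')

# pattern1,pattern2: from the START line through the END line, inclusive.
range=$(printf '%s\n' "$lines" | awk '/START/,/END/')

# !pattern: lines that do NOT consist solely of capital letters.
negated=$(printf '%s\n' "$lines" | awk '!/^[A-Z]+$/')

echo "$range"
echo "$negated"
```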
3.5.3.4 The BEGIN and END patterns
BEGIN: its action runs before awk reads the first line of the first file (before any input is processed).
END: its action runs after awk has processed the last line of the last file (after all input has been processed).
3.5.3.4.1 Notes
a. They do not match any input line.
They control initialization and wrap-up, and cannot be combined with other pattern expressions.
Their action part cannot be omitted.
b. If there are several BEGIN or END patterns, their actions run one after another in the order they appear, not in parallel.
c. By convention BEGIN is placed at the start of the PROGRAM and END at the end.
3.5.3.4.2 Common uses
BEGIN is commonly used to change the field separator FS.
END is used to print summary information.
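Both common uses fit in one short sketch: BEGIN sets FS before any input is read, and END prints a summary once all records have been processed. The colon-separated sample data is invented.

```shell
out=$(printf 'Asia:275\nEurope:55\nAsia:120\n' | awk '
    BEGIN { FS = ":" }        # runs once, before the first record
    { total += $2 }           # runs for every record
    END { print "records:", NR, "total:", total }   # runs once, at the end
')
echo "$out"
```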
3.5.3.4.3 Examples
First, a few points to note:
The action part of awk uses C-like syntax, not shell syntax.
Variables do not need to be initialized.
Variables of the parent shell are not expanded inside the single-quoted awk program. Outside the quotes (for example in a -v assignment) the shell still expands them.
Inside PROGRAM, neither built-in nor user-defined variables are written with the "$" sign.
[[email protected] Lee testawk]$ echo $PATH
/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/tyson/.local/bin:/home/tyson/bin
[[email protected] Lee testawk]$ echo $USER
tyson
[[email protected] Lee testawk]$ awk 'BEGIN{print $USER}'
[[email protected] Lee testawk]$ awk -v USER=$USER 'BEGIN{print USER}'
tyson
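Besides -v, awk also exposes exported environment variables through its built-in ENVIRON array. The sketch below uses a made-up variable name, DEMO_NAME, so that it does not depend on what the login environment contains.

```shell
# DEMO_NAME is a hypothetical variable exported just for this demo.
export DEMO_NAME=tyson

# ENVIRON is indexed by the variable name as a string.
out=$(awk 'BEGIN { print ENVIRON["DEMO_NAME"] }')
echo "$out"
```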
A practical example
[[email protected] Lee testawk]$ cat fortext.awk
BEGIN{
    FS=" "    # the default field separator, set explicitly
    # header row: every column is a string, so use %s throughout
    # (a %d here would print "POP" as 0)
    printf("%10s %6s %5s %s\n\n","Country","Area","POP","CONTENT")
}
{
    # one formatted row per input record
    printf("%10s %6d %5d %s\n",$1,$2,$3,$4)
}
END{
    printf("\nwe are done\n")
}
[[email protected] Lee testawk]$ cat text.txt
SR 8649 275 Asia
Canada 3852 25 North_America
China 3705 1032 Asia
USA 3615 237 North_America
Brazil 3286 134 South_America
India 1267 746 Asia
Mexico 762 78 North_America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
[[email protected] Lee testawk]$ awk -f fortext.awk text.txt
   Country   Area   POP CONTENT

        SR   8649   275 Asia
    Canada   3852    25 North_America
     China   3705  1032 Asia
       USA   3615   237 North_America
    Brazil   3286   134 South_America
     India   1267   746 Asia
    Mexico    762    78 North_America
    France    211    55 Europe
     Japan    144   120 Asia
   Germany     96    61 Europe
   England     94    56 Europe

we are done
3.5.3.5 Ordinary expressions as patterns
The truth value of an ordinary expression can serve as a PATTERN that decides whether a line matches, for example $1>$2{print $3}. The operands of such an expression need not be numbers: they can also be string constants, empty strings, or substrings.
An ordinary expression in a PATTERN can also compare a string with a number (a conversion is performed first).
If the result is 0 or the empty string, the line does not match; otherwise it matches.
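A sketch of the truth-value rule with two expression patterns and invented input: `$2 % 2` is nonzero only for odd values, and the built-in NF is 0 on a blank line.

```shell
# Arithmetic result as a pattern: nonzero (odd) values select the record.
odd=$(printf 'a 1\nb 2\nc 3\n' | awk '$2 % 2')

# NF as a pattern: a blank line has 0 fields, so it is skipped.
nonblank=$(printf 'x\n\ny\n' | awk 'NF')

echo "$odd"
echo "$nonblank"
```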
3.5.3.5.1 Summary of comparison operators
<    less than
<=   less than or equal to
==   equal to
!=   not equal to
>=   greater than or equal to
>    greater than
~    matches
!~   does not match
Note:
In ordinary expressions, ~ and !~ mean "matches" and "does not match".
Inside a regular expression, the tilde has no such meaning.
3.5.3.5.2 Summary of expression operators
Each entry gives the operation, its operators, and where available an example with its meaning.
assignment: = += -= *= /= %= ^=
conditional expression: ?:   example: x?y:z (the value is y if x is true, otherwise z)
logical OR, logical AND: || &&
array membership: in   example: i in a (true if a[i] exists)
matching: ~ !~   example: $1~/x/ (true if the first field contains the character x)
relational: < <= == != >= >
concatenation: no operator, the operands are written side by side   example: "a" "bc" (gives "abc")
addition, subtraction, multiplication, division, modulo: + - * / %
unary plus, unary minus: + -   example: -x, +x
exponentiation: ^
increment, decrement: ++ --
field: $   example: $i+1 (1 plus the value of the i-th field)
grouping: ( )   example: ($i)++ (adds 1 to the value of the i-th field)
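A few of these operators can be tried together in a sketch like the one below. The variable names and the input line are invented; the point is mainly how $ binds more tightly than +, so $i + 1 and $(i - 1) mean different things.

```shell
out=$(echo '5 7' | awk '{
    i = 2
    a = $i + 1        # value of field 2, plus 1
    b = $(i - 1)      # field number i-1, that is field 1
    s = "a" "bc"      # concatenation: operands written side by side
    arr["k"] = 1
    # "in" tests whether an index exists and yields 1 or 0
    print a, b, s, ("k" in arr), ("z" in arr)
}')
echo "$out"
```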
Background you need (everyone should know this):
An expression yields a value that can be tested as true or false, which is why it can serve as a PATTERN that decides whether a line matches. Ordinary expressions can of course also be used in an ACTION, for example {if($1>$2) print $3}.
Operator precedence.
Operator associativity (left or right).
Numeric comparison compares magnitudes; string comparison compares character codes (ASCII) one character at a time.
Kinds of operands: constants (strings and numbers) and variables (user-defined, built-in, or fields).
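The difference between numeric and string comparison shows up with the values 10 and 9, as in this sketch. Numeric constants and numeric-looking fields compare by magnitude, while string constants compare character by character.

```shell
# Numeric constants: 10 < 9 is false, so the expression is 0.
num=$(awk 'BEGIN { print (10 < 9) }')

# String constants: "1" sorts before "9", so "10" < "9" is true (1).
str=$(awk 'BEGIN { print ("10" < "9") }')

# Fields that look like numbers are compared as numbers.
fld=$(echo '10 9' | awk '{ print ($1 < $2) }')

echo "$num $str $fld"
```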
3.5.3.6 Regular expressions as patterns
3.5.3.6.1 /regexpr/
Matches when the current line contains text matched by regexpr.
3.5.3.6.2 expression ~ /regexpr/
Matches when the string value of expression contains a substring matched by regexpr. (expression can be a variable, a constant, a larger expression, and so on.)
[[email protected] Lee testawk]$ awk '$1~/an/' text.txt
Canada 3852 25 North_America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
[[email protected] Lee testawk]$ cat text.txt
SR 8649 275 Asia
Canada 3852 25 North_America
China 3705 1032 Asia
USA 3615 237 North_America
Brazil 3286 134 South_America
India 1267 746 Asia
Mexico 762 78 North_America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
3.5.3.6.3 expression !~ /regexpr/
Matches when the string value of expression contains no substring matched by regexpr.
For example:
awk '$1 ~ /U/{print $0}' file
awk '$1 !~ /U/{print $0}' file
In both forms, the regular expression is matched not against the whole line but against the value of the expression on the left, here field 1.
[[email protected] Lee testawk]$ awk '$1!~/an/' text.txt
SR 8649 275 Asia
China 3705 1032 Asia
USA 3615 237 North_America
Brazil 3286 134 South_America
India 1267 746 Asia
Mexico 762 78 North_America
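The three forms can be compared side by side on two invented lines, where the word Asia appears in a different field each time.

```shell
data=$(printf 'Asia 1\nJapan Asia\n')

# /regexpr/ alone is tested against the whole line: both lines match.
whole=$(printf '%s\n' "$data" | awk '/Asia/')

# $1 ~ /regexpr/ looks only at field 1.
first=$(printf '%s\n' "$data" | awk '$1 ~ /Asia/')

# $1 !~ /regexpr/ selects lines whose field 1 does not match.
notfirst=$(printf '%s\n' "$data" | awk '$1 !~ /Asia/')

echo "$whole"; echo "$first"; echo "$notfirst"
```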
3.5.3.7 Compound patterns
A compound pattern combines several patterns with the logical operators &&, || and !.
pattern || pattern
pattern && pattern
# Select the records in which the first two fields both contain letters, or the last two fields both contain digits
[[email protected] Lee testawk]$ awk '$1~/[a-zA-Z]/&&$2~/[a-zA-Z]/||$3~/[0-9]/&&$4~/[0-9]/' testFuhe.txt
a 2 3 4
a b 1 2
a b c d
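Because && binds more tightly than ||, the command above groups as (A && B) || (C && D). A sketch with explicit parentheses and two invented input lines makes that grouping visible.

```shell
# Line 1: fields 3 and 4 are digits, so the second group matches.
# Line 2: neither group is fully satisfied, so it is not printed.
out=$(printf 'a 2 3 4\n1 2 c d\n' |
      awk '($1 ~ /[a-zA-Z]/ && $2 ~ /[a-zA-Z]/) || ($3 ~ /[0-9]/ && $4 ~ /[0-9]/)')
echo "$out"
```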