Implementing SQL-like join operations with awk
2020-11-08 16:12:00 【xindoo】
awk, grep and sed are known as the three swordsmen of Linux. I use grep and awk regularly in my daily work (sed less so). Some readers may not know awk well, so here is a brief introduction. Many people think of awk as just a text-processing tool, and that is indeed how they use it. But awk is actually a language: it has arithmetic operators and control-flow statements, and it ships with many built-in variables and functions for text processing, which is what makes it so powerful for that job. Where grep can only filter data, awk can also process and analyze data and even generate reports. After all, it is a complete programming language.
This article is not an awk primer. If you want to get started, I recommend Ruan Yifeng's introductory awk tutorial and Left Ear Mouse's concise AWK tutorial.
Back to today's topic: a scenario where I use awk a lot. As back-end developers, we often run into two problems when analyzing data: 1. Given hundreds of thousands of records, we need to pick out the hundreds or thousands that match specific keys. 2. Given millions of records, we need to fill in additional fields by looking them up on an id field.
Someone fluent in Excel may jump in and say, "That's it? So easy, just use VLOOKUP!" Excel can indeed solve the problem, but it is a bit heavyweight, and sometimes Excel is simply not available on the server. What else can we do? If you think about the two scenarios above, both can be solved by joining two tables in SQL. But you don't actually need to load the files into a database; a single awk command is enough.
Example
Let's work through an example. Suppose there are two files: score.txt holds student ids and scores, and name.txt holds student ids and names. Now you want to know what score each person got.
score.txt
id score
1 87
2 67
3 68
4 75
5 90
6 100
7 0
name.txt
id name
1 ZhangSan
2 LiSi
3 WangWu
4 ZhaoEr
5 Lennon
6 BigBear
What you want is a single file with student id, score and name, like this:
id score name
1 87 ZhangSan
2 67 LiSi
3 68 WangWu
4 75 ZhaoEr
5 90 Lennon
6 100 BigBear
7 0
How easy is it to generate this data with awk? It takes just one line. Save name.txt and score.txt, then run the following command to try it.
awk 'ARGV[1]==FILENAME {map[$1]=$2} ARGV[2]==FILENAME {print $0, map[$1]}' name.txt score.txt
This is simply a right join of name.txt and score.txt on id.
Let me explain the code above. ARGV and FILENAME are awk built-in variables. ARGV holds the list of arguments awk received; here ARGV[1] is "name.txt" and ARGV[2] is "score.txt". awk is line-oriented, so the program ARGV[1]==FILENAME {map[$1]=$2} ARGV[2]==FILENAME {print $0, map[$1]} is run against every input line. By default awk splits each line on whitespace, so $1 is the id, $2 is the second field, and $0 is the whole line. Each line belongs to one file, and FILENAME gives the name of the file the current line came from. The expression in front of each brace block {}, such as ARGV[1]==FILENAME, is a condition: think of it as an if statement with the if keyword left out.
So the code means: if the current line comes from name.txt, store the student id and name in map (awk variables do not need to be declared in advance). If the current line belongs to score.txt, look up the name in map and print the line together with it.
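The explanation above can be tried end to end with a small script. This is only a sketch under a few assumptions: it works in a throwaway directory created with mktemp, the variable names workdir and joined are my own, and the names are written as single tokens because awk splits fields on whitespace, so a name containing a space would be cut off at $2.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the example (names written as single tokens: an assumption).
cat > score.txt <<'EOF'
id score
1 87
2 67
3 68
4 75
5 90
6 100
7 0
EOF

cat > name.txt <<'EOF'
id name
1 ZhangSan
2 LiSi
3 WangWu
4 ZhaoEr
5 Lennon
6 BigBear
EOF

# Same logic as the one-liner, spread over several lines with comments.
joined=$(awk '
    # Lines from the first file (name.txt): remember id -> name.
    ARGV[1] == FILENAME { map[$1] = $2 }
    # Lines from the second file (score.txt): print the line plus the stored name.
    ARGV[2] == FILENAME { print $0, map[$1] }
' name.txt score.txt)

printf '%s\n' "$joined"
```

The header line joins like any other row, because "id" is stored as a key with the value "name". Student 7 has no entry in name.txt, so that row is printed with an empty name.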
Conclusion
awk 'ARGV[1]==FILENAME {map[$1]=$2} ARGV[2]==FILENAME {print $0, map[$1]}' name.txt score.txt
For different data, just change the numbers after $ to join on different columns as the key. It is not as sophisticated as a SQL join, but it is very convenient on a Linux server. Above I only implemented a right join. Guarding print $0, map[$1] with if (length(map[$1]) > 0) gives an inner join. For a left join, you just need to swap the two file names.
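The two variants described here might look like the following. It is a rough sketch, not the only way to write them: the temporary directory and the variable names inner and swapped are assumptions of mine, and the sample names are written as single tokens so that each one fits in $2.

```shell
#!/bin/sh
# Throwaway directory for the sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'id score' '1 87' '2 67' '3 68' '4 75' '5 90' '6 100' '7 0' > score.txt
printf '%s\n' 'id name' '1 ZhangSan' '2 LiSi' '3 WangWu' '4 ZhaoEr' '5 Lennon' '6 BigBear' > name.txt

# Inner join: print a score line only when a name was stored for its id.
inner=$(awk '
    ARGV[1] == FILENAME { map[$1] = $2 }
    ARGV[2] == FILENAME { if (length(map[$1]) > 0) print $0, map[$1] }
' name.txt score.txt)

# Same program with the file names swapped: every line of name.txt is kept
# and the score is appended to it.
swapped=$(awk '
    ARGV[1] == FILENAME { map[$1] = $2 }
    ARGV[2] == FILENAME { print $0, map[$1] }
' score.txt name.txt)

printf 'inner join:\n%s\n\n' "$inner"
printf 'swapped file order:\n%s\n' "$swapped"
```

With this sample data the inner join drops student 7, who has no name. In the swapped run every student in name.txt happens to have a score, so no row ends up with an empty score column.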
Similarly, set operations across multiple files, such as intersection and difference, can be done with awk too. Also, don't forget that awk is a programming language, so programs can get quite complex; you can put the code in a file and run it with the -f option. For example, my teacher used to use awk for simple statistics such as computing a mean or a sum. awk is a sharp tool that helps back-end engineers work more efficiently.
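As an illustration of these ideas, here is a small sketch covering intersection, difference, and a statistics program run with -f. The file name stats.awk, the temporary directory, and the variable names both, only_score and stats are all made up for this example, and the header line is skipped with FNR > 1 on the assumption that each file starts with one.

```shell
#!/bin/sh
# Throwaway directory for the sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'id score' '1 87' '2 67' '3 68' '4 75' '5 90' '6 100' '7 0' > score.txt
printf '%s\n' 'id name' '1 ZhangSan' '2 LiSi' '3 WangWu' '4 ZhaoEr' '5 Lennon' '6 BigBear' > name.txt

# Intersection: ids that appear in both files.
both=$(awk '
    ARGV[1] == FILENAME && FNR > 1 { seen[$1] = 1 }
    ARGV[2] == FILENAME && FNR > 1 && ($1 in seen) { print $1 }
' name.txt score.txt)

# Difference: ids in score.txt that have no entry in name.txt.
only_score=$(awk '
    ARGV[1] == FILENAME && FNR > 1 { seen[$1] = 1 }
    ARGV[2] == FILENAME && FNR > 1 && !($1 in seen) { print $1 }
' name.txt score.txt)

# A longer program kept in its own file (stats.awk is a hypothetical name)
# and run with -f.
cat > stats.awk <<'EOF'
# Sum and mean of the score column, skipping the header line.
FNR > 1 { sum += $2; n++ }
END     { if (n > 0) printf "sum=%d mean=%.2f\n", sum, sum / n }
EOF
stats=$(awk -f stats.awk score.txt)

printf 'in both files: %s\n' "$(printf '%s' "$both" | tr '\n' ' ')"
printf 'only in score.txt: %s\n' "$only_score"
printf '%s\n' "$stats"
```

For the sample data, the difference is the single id 7 and the statistics line reads sum=487 mean=69.57.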
This article is from https://blog.csdn.net/xindoo
Copyright notice
This article was written by [xindoo]. Please include a link to the original when reposting. Thank you.