Daily Update Series: Writing a Simple Shell Script That Turns Out to Have Some Technical Depth
2022-06-24 03:08:00 【mariolu】
1. Consistency comparison
Recently I have been refactoring the rerank module of our algorithm service, which meant essentially rewriting the code. The first step of the refactoring was to build a test tool. With it, every time a line of code changes, the whole test case set can be run again. A refactor must keep the business logic exactly as it was. Given the same input traffic (here, ad app traffic), the rerank service should produce identical output before and after the change. This traffic is mirrored (bypassed) from real online traffic, so the check can run for hours and confirm consistency across tens of thousands of requests.
2. Preparation
We dump the data we care about to disk. For each request only three elements matter: the ad app id, the CTR score, and the CVR score. Each request side goes into its own file, so one request produces 4 files: the request and the response of the module before the change, and the request and the response of the module after the change. To help troubleshooting, the complete request is also dumped as JSON. This JSON contains a lot of internal information (ad bidding mode, price, creative id, and so on). It is mainly used for data analysis, to find the cause of an inconsistency.
3. Starting the script journey
3.1 Enumerate inconsistent requests
Each request id therefore has 5 files in the dump directory, and the directory holds the files of tens of thousands of requests.
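Before comparing anything, it can help to sanity-check the dump. The small bash function below is one way to flag request ids that do not have exactly 5 files. It is a sketch under an assumption, because the article does not show the real file names. It supposes hypothetical names of the form <reqid>.<suffix> (for example 1001.old.resp), with no dot inside the request id.

```shell
# Sketch: report request ids that do not have exactly 5 dump files.
# ASSUMPTION: hypothetical file names <reqid>.<suffix>, e.g. 1001.old.resp,
# with no dot inside the request id.
check_file_count() {
  local dir="$1" f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue          # skip subdirectories / unmatched glob
    printf '%s\n' "${f##*/}"         # file name without the directory part
  done | cut -d. -f1 | sort | uniq -c |
    awk '$1 != 5 { print $2, $1 }'   # prints "<reqid> <file count>"
}
```

For example, check_file_count /tmp/consistent-check prints nothing when every request id is complete.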
The goal is to compare two different files for the same request id. The comparison tool used here is icdiff, an open-source project on GitHub. Its output is more human-friendly than that of the diff that ships with GNU. The tool itself is not the point of this article, so I will skip over it.
First, generate the icdiff commands from the file names: one command line per request id, each running icdiff on that id's files.
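Below is a minimal bash sketch of that generation step. It assumes hypothetical file names <reqid>.old.resp and <reqid>.new.resp for the two modules' responses; the real names are not shown in the article. The function prints one icdiff command for each request id that has both files. Paths are printed unquoted, so it also assumes they contain no spaces.

```shell
# Sketch: emit one icdiff command per request id.
# ASSUMPTION: hypothetical names <reqid>.old.resp / <reqid>.new.resp.
gen_icdiff_cmds() {
  local dir="$1" f id
  for f in "$dir"/*.old.resp; do
    [ -e "$f" ] || continue            # glob matched nothing
    id=$(basename "$f" .old.resp)      # strip directory and suffix
    # Only emit a command when the new module's file also exists.
    if [ -e "$dir/$id.new.resp" ]; then
      printf 'icdiff %s %s\n' "$dir/$id.old.resp" "$dir/$id.new.resp"
    fi
  done
}
```

Redirecting the output to a file gives a command list that can later be executed line by line.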
3.1.1 Generate md5 sums of the request files
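Here the usual find command enumerates all files in the folder. xargs then passes the file names as arguments to md5sum, whose hashes are used for the consistency comparison. With -print0 and -0, file names containing spaces are passed safely. xargs also hands many files to each md5sum call instead of starting one process per file, as -n 1 did.
find /tmp/consistent-check/ -type f -print0 | xargs -0 md5sum > /tmp/check_consistent.result
Each file now has its md5sum recorded in the result file.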
3.1.2 Filter the requests of the old and new modules
The usual grep and sort commands handle the filtering and sorting.
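md5sum writes one line per file: the hash, two spaces, then the path. The bash sketch below keeps one module's response lines from that result. It reduces each line to "reqid hash" and sorts by request id, so the old and new lists line up. It assumes the hypothetical <reqid>.old.resp / <reqid>.new.resp naming and paths without spaces, since awk splits fields on whitespace.

```shell
# Sketch: turn md5sum output into a sorted "reqid hash" list for one module.
# ASSUMPTION: hypothetical names <reqid>.old.resp / <reqid>.new.resp.
split_by_module() {
  local result="$1" side="$2"      # side: "old" or "new"
  grep "\.${side}\.resp\$" "$result" |
    # Take the last path component, cut it at the first dot -> request id.
    awk '{ n = split($2, parts, "/"); id = parts[n]; sub(/\..*/, "", id); print id, $1 }' |
    LC_ALL=C sort -k1,1            # byte-order sort on the request id
}
```

For example: split_by_module /tmp/check_consistent.result old > old.list, and likewise for new.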
3.1.3 Compare with icdiff
This makes it easy to see which request ids are inconsistent.
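The author compares the two lists visually with icdiff. When only the ids are needed for further scripting, a short awk join can print them directly; this is a swapped-in technique, not the article's. The sketch takes two files of "reqid hash" lines (a hypothetical format) and remembers the old hash for each id. It then prints the ids whose new hash differs. Ids present on only one side are ignored.

```shell
# Sketch: print request ids whose old and new hashes differ.
# Inputs: two files of "reqid hash" lines (hypothetical format).
diff_ids() {
  awk '
    FILENAME == ARGV[1] { old[$1] = $2; next }   # first file: old hashes
    ($1 in old) && old[$1] != $2 { print $1 }     # second file: compare
  ' "$1" "$2"
}
```

Using FILENAME == ARGV[1] instead of the common NR == FNR idiom keeps the function correct when the first file is empty.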
3.1.4 Preprocessing for the next step
Before moving on, some preprocessing is needed. The icdiff output contains color codes as well as --- symbols, and all of these should be removed.
A Google search turned up a reliable way to remove non-printable characters: https://stackoverflow.com/questions/3844311/how-do-i-replace-or-find-non-printable-characters-in-vim-regex
sed 's/[^[:print:]\r\t]//g'
Then check carefully whether any extra characters remain; if they do, keep cleaning them up with sed.
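The bash sketch below does the whole cleanup as one function. It assumes the colors are standard ANSI escape sequences (ESC [ ... letter) and strips those sequences first. Removing only the non-printable ESC byte would leave fragments such as [31m behind. It then uses tr, swapped in for the sed command above, to drop remaining non-printable characters. Finally it removes lines containing ---, as the article's loop does. tr works byte by byte and may also delete multi-byte (non-ASCII) characters, so ASCII output is assumed.

```shell
# Sketch: strip ANSI colour codes, non-printable bytes and "---" lines.
# ASSUMPTION: colours are ANSI sequences of the form ESC [ digits/; letter.
clean_diff_output() {
  local esc
  esc=$(printf '\033')                          # literal ESC character
  sed "s/${esc}\[[0-9;]*[A-Za-z]//g" |          # remove colour sequences
    tr -cd '[:print:]\t\n' |                    # keep printable, tab, newline
    grep -v -e '---'                            # drop separator lines
}
```

Usage: icdiff a b | clean_diff_output.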
3.1.5 Inconsistency statistics
Because there are many inconsistent request ids, we first need to analyze and categorize them.
The inconsistencies differ in size: for example, one request id may differ in only a few of its lines.
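One way to put a number on "size" is to count the lines that differ. The bash sketch below swaps plain diff in for icdiff, so there are no colors to strip. It counts the lines that appear on only one side. For a line that changed, both the old and the new version are counted.

```shell
# Sketch: size of an inconsistency = lines present on only one side.
# Uses diff(1) rather than icdiff; "<" and ">" mark one-sided lines.
count_diff_lines() {
  diff "$1" "$2" | grep -c '^[<>]'   # grep -c prints 0 when nothing differs
}
```

For example, count_diff_lines 1001.old.resp 1001.new.resp, with the hypothetical names used for illustration.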
But with so many files, we cannot print each comparison by hand.
So the natural choice is a for loop. I use fish shell here; bash has an equivalent construct.
set LINE (wc -l /tmp/check_consistent.result.compare.stage8 | awk '{print $1}')
for i in (seq $LINE); echo "handle single req file"; end
Next, read the previously sorted commands from the file one line at a time and execute them in order.
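For readers on bash, a rough equivalent of the fish snippet above follows. wc -l < file prints only the count, so the awk step that drops the file name is not needed. tr removes the padding some wc implementations add. The function name is hypothetical.

```shell
# Sketch: bash counterpart of the fish "set LINE ... / for i in (seq $LINE)".
loop_skeleton() {
  local file="$1" line_count i
  line_count=$(wc -l < "$file" | tr -d ' ')   # count only, no file name
  for i in $(seq "$line_count"); do
    echo "handle single req file $i"
  done
}
```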
How do you read one specific line of a file? The most common tools are sed and awk, and several methods are listed here:
https://linuxhandbook.com/display-specific-lines/
I first used sed, but sed could not parse the index value passed in by the for loop. The awk command below works.
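Both approaches can be written in bash as follows. The article does not say why its sed attempt failed. One plausible cause, an assumption not confirmed there, is writing the sed script as "$ip", which the shell reads as a variable named ip. Braces, as in "${i}p", keep the variable name separate. awk avoids the issue by receiving the number through -v. The function names are hypothetical.

```shell
# Sketch: print line number i of a file, two ways.
nth_line_sed() {
  local file="$1" i="$2"
  sed -n "${i}p" "$file"            # braces: not a variable named "ip"
}

nth_line_awk() {
  local file="$1" i="$2"
  awk -v n="$i" 'NR == n' "$file"   # line number passed as an awk variable
}
```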
for i in (seq $LINE)
    awk "NR==$i" /tmp/check_consistent.result.compare.stage8
    awk "NR==$i" /tmp/check_consistent.result.compare.stage8 | bash | grep -v -e '---' | wc -l
end
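Below is a bash sketch of the same loop. It reads the command file once instead of re-scanning it with awk on every iteration. For each command line it runs the command and counts the output lines that do not contain ---. It prints the result as "count<TAB>command", so the output can be sorted to group requests by how much they differ. Redirecting the command's stdin from /dev/null stops it from consuming the rest of the command file. The function name is hypothetical.

```shell
# Sketch: run each command in the file and report "count<TAB>command".
run_and_count() {
  local cmd_file="$1" cmd n
  while IFS= read -r cmd; do
    # </dev/null keeps the command from reading the rest of cmd_file.
    n=$(bash -c "$cmd" < /dev/null | grep -v -e '---' | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$n" "$cmd"
  done < "$cmd_file"
}
```

For example, run_and_count /tmp/check_consistent.result.compare.stage8 | sort -n -r lists the largest inconsistencies first.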