Daily update series: writing a simple shell script that turns out to have some technical content

2022-06-24 03:08:00 mariolu

1. Consistency comparison

I have recently been refactoring the rerank module of our algorithm stack and have rewritten most of its code. The first step of a refactor is to build test tooling, so that the entire test case set runs every time a line of code changes. For a refactor, the goal is to guarantee that the business logic stays the same as before: given the same input traffic (here, advertising app traffic), the output of the rerank service should be identical. The traffic is mirrored from real online traffic through a bypass, so we can run for hours and check consistency across tens of thousands of requests.

2. Preparation

We write the data we care about to disk. For each request we care about only three elements: the advertisement app id, the CTR score, and the CVR score. Each record goes into its own file, so one request produces 4 files: the request and response of the module before the change, and the request and response of the module after the change. To make it possible to dig into details, the complete request JSON is also saved. This JSON contains a lot of internal information: the ad bidding mode, the price, the creative id, and so on. It is kept mainly for data analysis, to find the cause of an inconsistency.

3. Starting the script journey

3.1 Enumerate inconsistent requests

On disk, each request id has 5 files, and the directory holds the files of tens of thousands of requests.

The goal is to compare two different files that belong to the same request id. The comparison tool used here is icdiff, an open source project on GitHub whose output is easier to read than that of GNU diff. It is not the point of this article, so we skip over it for now.

First we need to generate the icdiff commands from the file names: one command line per request id.

3.1.1 Generate the md5 of the request files

Here we use the ordinary find command to enumerate all files in the folder, and xargs to pass each file name as an argument to md5sum. The checksums are what we compare for consistency.

find /tmp/consistent-check/ -type f | xargs -n 1 md5sum > /tmp/check_consistent.result

The result file now contains one line per file, with its md5 checksum.
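A slightly more defensive variant of the same step, as a sketch only: it passes file names NUL-delimited so that spaces in names cannot split an argument, and sorts the result so it is stable between runs. The function name `checksum_tree` and the demo file names are made up for illustration and do not come from the original setup.

```shell
#!/usr/bin/env bash
# Sketch: checksum every file under a directory.
# checksum_tree and the demo file names below are illustrative assumptions.
checksum_tree() {
    local dir="$1" out="$2"
    # -print0 / -0 keep file names containing spaces intact;
    # sort -k 2 orders the lines by path so reruns are comparable.
    find "$dir" -type f -print0 | xargs -0 md5sum | sort -k 2 > "$out"
}

# Demo on a throwaway directory with two identical response files.
demo_dir="$(mktemp -d)"
printf 'app1 0.10 0.20\n' > "$demo_dir/req1.old.rsp"
printf 'app1 0.10 0.20\n' > "$demo_dir/req1.new.rsp"
checksum_tree "$demo_dir" "$demo_dir.result"
cat "$demo_dir.result"
```

Two files with the same content get the same checksum in the first column, which is what the later steps rely on.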

3.1.2 Then filter the requests from the old and new modules

The ordinary grep and sort commands are used here for filtering and sorting.
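The exact commands are not shown, so the following is only a sketch of what such a filter could look like. It assumes a hypothetical naming scheme, `<request_id>.old.rsp` and `<request_id>.new.rsp`, for the responses of the old and new module. The real file names may differ.

```shell
#!/usr/bin/env bash
# Sketch: keep only the response files and sort them by path.
# The .old.rsp / .new.rsp suffixes are a hypothetical naming scheme.
filter_responses() {
    # After sorting on the path (field 2), the new and old response
    # of one request id sit on adjacent lines.
    grep -E '\.(old|new)\.rsp$' "$1" | sort -k 2
}

# Demo with a hand-written md5sum-style result file.
result="$(mktemp)"
cat > "$result" <<'EOF'
aaa  /tmp/consistent-check/req2.old.rsp
ccc  /tmp/consistent-check/req1.new.rsp
ddd  /tmp/consistent-check/req1.full.json
bbb  /tmp/consistent-check/req2.new.rsp
ccc  /tmp/consistent-check/req1.old.rsp
EOF
filter_responses "$result"
```

In the demo the full request JSON is dropped and four lines remain, grouped by request id.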

3.1.3 Compare with icdiff

This makes it easy to see which request ids are inconsistent.
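One way to turn the sorted checksum list into one icdiff command per inconsistent request id is sketched below. It is an illustration under assumptions, not the original script: it reuses the hypothetical `.old.rsp` / `.new.rsp` suffixes, assumes paths contain no spaces, and only prints the commands rather than running icdiff.

```shell
#!/usr/bin/env bash
# Sketch: emit "icdiff OLD NEW" for every request id whose checksums differ.
# Assumes "md5  path" lines sorted so <id>.new.rsp directly precedes
# <id>.old.rsp, and paths without spaces (both are assumptions).
emit_icdiff_commands() {
    awk '
        /\.new\.rsp$/ { new_sum = $1; new_path = $2; next }
        /\.old\.rsp$/ {
            base = $2
            sub(/\.old\.rsp$/, "", base)
            # Only pair files of the same request id, and only when they differ.
            if (new_path == (base ".new.rsp") && new_sum != $1)
                print "icdiff " $2 " " new_path
        }
    ' "$1"
}

# Demo: req1 is consistent, req2 is not.
sorted="$(mktemp)"
cat > "$sorted" <<'EOF'
ccc  /tmp/consistent-check/req1.new.rsp
ccc  /tmp/consistent-check/req1.old.rsp
bbb  /tmp/consistent-check/req2.new.rsp
aaa  /tmp/consistent-check/req2.old.rsp
EOF
emit_icdiff_commands "$sorted"
```

Redirecting this output to a file gives a list of commands, one per line, that can be executed one at a time later.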

3.1.4 Preprocessing for the next step

Before moving on, some preprocessing is needed. The icdiff output contains color codes and also --- markers, and we should get rid of all of these.

A Google search turned up a reliable way to remove non-printable characters: https://stackoverflow.com/questions/3844311/how-do-i-replace-or-find-non-printable-characters-in-vim-regex

 sed 's/[^[:print:]\r\t]//g'

Look closely at the result to see whether any extra characters remain. If so, keep cleaning them up with sed.
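Removing only the non-printable bytes takes out the ESC byte of a color code but leaves the printable rest of the sequence, such as `[0;31m`, behind. That is probably the kind of leftover meant above. A sketch that removes whole color sequences first and then any remaining non-printable bytes (the function name `strip_ansi` is illustrative):

```shell
#!/usr/bin/env bash
# Sketch: strip ANSI color sequences, then other non-printable bytes.
# strip_ansi is an illustrative name, not part of the original script.
strip_ansi() {
    local esc tab
    esc="$(printf '\033')"   # the ESC byte that starts a color sequence
    tab="$(printf '\t')"     # tabs are kept
    # 1st expression: ESC [ <digits and semicolons> <letter>, e.g. ESC[0;31m
    # 2nd expression: anything still non-printable, except tabs
    sed -e "s/${esc}\[[0-9;]*[A-Za-z]//g" -e "s/[^[:print:]${tab}]//g"
}

# Demo: a colored line becomes plain text.
printf '\033[0;31mapp1 0.10\033[m\n' | strip_ansi
```

What counts as printable depends on the current locale, so output containing non-ASCII text should be checked after this step.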

3.1.5 Inconsistency statistics

There may be a lot of inconsistent request ids, so first we have to analyze them and sort the inconsistent requests into categories.

It's not the same size . For example, a request id There is an inconsistency in the lines of .

But because there are so many files, we can't print them by hand every time.

So naturally we use a for loop. I use fish shell here. bash should have something similar.

set LINE (wc -l /tmp/check_consistent.result.compare.stage8 | awk '{print $1}')
for i in (seq $LINE);echo "handle single req file";end;
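For readers on bash, a rough equivalent of the fish loop above could look like the sketch below. The function name `handle_all` and the demo file are illustrative. In practice the argument would be the stage file from the article.

```shell
#!/usr/bin/env bash
# Sketch: bash counterpart of the fish loop (handle_all is an illustrative name).
handle_all() {
    local file="$1" total i
    total=$(wc -l < "$file")          # number of lines, i.e. number of commands
    for ((i = 1; i <= total; i++)); do
        echo "handle single req file $i"
    done
}

# Demo with a three-line throwaway file.
demo="$(mktemp)"
printf 'echo one\necho two\necho three\n' > "$demo"
handle_all "$demo"
```

The arithmetic for loop avoids depending on seq and tolerates the leading spaces that some wc implementations print.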

Then we read the previously sorted commands from the file one by one and execute them in order.

How do you read one specific line of a file? The most common tools are sed and awk.

Here are some ways:

https://linuxhandbook.com/display-specific-lines/

I started with sed, but sed could not resolve the index value passed in from the for loop. I then switched to awk, which works.
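The failing sed command is not shown, so the cause can only be guessed. One common pitfall is quoting: in `'$ip'` the variable is not expanded at all, and in `"$ip"` the shell looks for a variable named `ip`. The bash sketch below shows forms of both tools that take the line number from a shell variable (the function names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch: print line N of a file with sed and with awk.
# nth_line_sed / nth_line_awk are illustrative names.
nth_line_sed() {
    # Braces separate the variable name from the "p" command of sed.
    sed -n "${1}p" "$2"
}

nth_line_awk() {
    # -v passes the shell value into awk without any quoting tricks.
    awk -v n="$1" 'NR == n' "$2"
}

# Demo on a throwaway file.
demo="$(mktemp)"
printf 'first\nsecond\nthird\n' > "$demo"
i=2
nth_line_sed "$i" "$demo"
nth_line_awk "$i" "$demo"
```

Both calls print the second line of the demo file.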

for i in (seq $LINE);awk "NR==$i" /tmp/check_consistent.result.compare.stage8;awk "NR==$i" /tmp/check_consistent.result.compare.stage8 | bash | grep -v "\-\-\-" | wc -l;end
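A bash sketch of the same idea, reading the command file once with a while loop instead of calling awk for every line. It prints each command followed by the number of output lines that do not contain `---`. The function name `count_diff_lines` and the demo commands are illustrative stand-ins for the real icdiff commands.

```shell
#!/usr/bin/env bash
# Sketch: run every command in a file and count its non-"---" output lines.
# count_diff_lines and the demo commands are illustrative assumptions.
count_diff_lines() {
    local cmd
    while IFS= read -r cmd; do
        printf '%s\n' "$cmd"
        # stdin comes from /dev/null so the command cannot eat the loop input;
        # -e is needed because the pattern starts with a dash.
        bash -c "$cmd" < /dev/null | grep -v -e '---' | wc -l
    done < "$1"
}

# Demo: stand-in commands instead of real icdiff invocations.
cmds="$(mktemp)"
cat > "$cmds" <<'EOF'
printf 'a\n---\nb\n'
echo '---'
EOF
count_diff_lines "$cmds"
```

For the demo the counts are 2 and 0. With real commands, the count gives a rough measure of how large each inconsistency is.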

Copyright notice
This article was written by [mariolu]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2021/10/20211017050104809n.html