
Baidu Online AI Competition, Image Processing Challenge: The 8th-Place Solution for Handwriting Erasure

2022-07-06 05:48:00 【Python's path to immortality】

The eighth place solution in the competition

Baidu online competition, round two: handwritten text erasure

Thanks to Baidu for organizing the competition, and thanks to the team members for their joint efforts.

Special thanks to Jordan2020 for open-sourcing the MTRNet++-based image text erasure solution (leaderboard score 0.55599), which helped us a great deal.

I. Algorithm overview

As the saying goes, choosing a good baseline is half the battle (said half in jest). When we got the dataset, we tested Pixel2Pixel, CycleGAN, EnsNet, and MTRNet++, and finally chose MTRNet++ as the baseline.
The reasons: ~~models with a plus sign are generally stronger, and this one has two plus signs~~

It is end-to-end and easy to integrate;
The model is small, performs reasonably well, and leaves room for heavy modification;
The network design is closer to the task of this competition.

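Network architecture: the original MTRNet++ structure was changed to our modified structure (the two architecture figures are not reproduced here; the discriminator, dis, is not drawn and is left to the reader's imagination).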

Modification ideas:

Remove the input mask. MTRNet++ needs a mask as the fourth input channel, ~~which is inconvenient, so we threw it away~~. If the mask is simply filled with 1 over the whole image, this channel is redundant; if a high-precision segmentation result is to be passed in instead, obtaining an accurate mask is a challenge in itself. We therefore deleted the fourth channel, replaced the part of the network that generates the mask with the transformer-based SegFormer, and removed the A1-A4 gate structures, so that the network is loosely coupled and easy to split and tune (see the practice and tuning part in Section III for details).
Focus on the repair region. In the original network, the intermediate image (the picture in the middle, generated after GCI) goes through a cmp operation: where the mask is white the generated content is used, where the mask is black the original image content is used, and the two are composited into a new image. We carry this mechanism into the next stage as well: the final generated image also goes through cmp before the loss against the GT is computed. With this setup almost all of the network's gradient comes from the masked region, and the non-masked region is essentially ignored. We believe this makes training faster and better.
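The cmp operation described above can be sketched in a few lines. This is a minimal NumPy illustration, not the team's code: the function names `cmp_composite` and `masked_l1_loss` are hypothetical, the mask is assumed to be 1 for white and 0 for black, and the L1 loss is an assumed stand-in because the post does not say which loss is used.

```python
import numpy as np

def cmp_composite(generated, original, mask):
    """Use generated pixels where mask is 1 (white) and original pixels where it is 0 (black)."""
    mask = mask.astype(np.float32)
    if mask.ndim == generated.ndim - 1:
        mask = mask[..., None]  # broadcast a single-channel mask over the color channels
    return mask * generated + (1.0 - mask) * original

def masked_l1_loss(generated, original, mask, target):
    """L1 loss computed after compositing (the choice of L1 is an assumption)."""
    composited = cmp_composite(generated, original, mask)
    return float(np.abs(composited - target).mean())
```

Because the composited image equals the original wherever the mask is 0, the loss there does not depend on the generator output, which is why the gradient comes almost entirely from the masked region.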

II. Data augmentation

Before the troops move, the provisions go first. Data is the provisions.

Data cleaning

Coming from a surveying background, our first reaction on getting the images is always: where is the error?

After experimenting, we concluded that the difference between the two images (original and ground truth) is caused jointly by system differences, human differences, and the differences due to the handwriting itself.

In the figure above, the panels are arranged in the following order:

Row 1: original image, ground truth, threshold 1, threshold 5

Row 2: threshold 10, threshold 15, threshold 25

Analysis conclusion ：

At threshold 1, a large number of pixels show up. This is partly due to storage compression, interpolation, and similar causes, and we classify it as system differences (we are not computer science majors, so the wording may be off; please bear with us and go easy);
At thresholds 5 to 25, the handwriting is basically visible, but the noise varies, because some annotators wiped out the noisy areas during processing and others did not (especially between lines, where it is hard to handle);
The differences caused by the handwriting itself are what we really need to pay attention to.

During data processing, we keep the second and third kinds of difference and remove the first kind using the cmp operation.
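The threshold analysis and the cmp-style cleaning above can be sketched as follows. This is only an illustration under assumptions: the post does not say how the per-pixel difference is measured, so the code takes the largest absolute difference over the color channels, and the function names and the choice to copy original pixels into the ground truth are hypothetical.

```python
import numpy as np

def difference_mask(original, ground_truth, threshold):
    """Boolean map of pixels whose absolute difference exceeds the threshold."""
    # Cast to a signed type so that uint8 subtraction cannot wrap around.
    diff = np.abs(original.astype(np.int16) - ground_truth.astype(np.int16))
    if diff.ndim == 3:
        diff = diff.max(axis=-1)  # assumption: use the largest per-channel difference
    return diff > threshold

def clean_ground_truth(original, ground_truth, threshold):
    """Copy original pixels into the ground truth wherever the difference is small."""
    mask = difference_mask(original, ground_truth, threshold)
    cleaned = ground_truth.copy()
    cleaned[~mask] = original[~mask]
    return cleaned
```

After this step the pair differs only inside the thresholded region, so small compression or interpolation differences no longer contribute to the training signal.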

Augmentation and expansion

Combine this dataset with public datasets to build a pool of handwritten-text material

Build material for regions that are easily misrecognized, such as parentheses and circles

MixUp

Simple, crude, and effective.
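For readers unfamiliar with MixUp, here is a minimal sketch of the standard formulation applied to an image-to-image task: two training pairs are blended with the same weight drawn from a Beta distribution. How the team applied MixUp is not described in the post, so the function name `mixup_pair`, the `alpha` value, and the decision to mix the targets with the same weight are all assumptions.

```python
import numpy as np

def mixup_pair(input_a, target_a, input_b, target_b, alpha=0.4, rng=None):
    """Blend two (input, target) pairs with one weight lam ~ Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))
    mixed_input = lam * input_a + (1.0 - lam) * input_b
    mixed_target = lam * target_a + (1.0 - lam) * target_b
    return mixed_input, mixed_target, lam
```

The inputs are expected to be float arrays of the same shape; passing a seeded `np.random.default_rng(seed)` makes the augmentation reproducible.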

III. Practice and tuning

Code implementation and tuning in practice:

After the work above, we started training the network and ran into two problems:

During multi-task training, the mask-generation task does not perform well;
At prediction time, applying augmented (test-time augmentation) prediction to the mask helps accuracy, but this is hard to do inside a single network.
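Augmented prediction for the mask can look like the sketch below once the segmentation network is separate from the rest. The post does not say which augmentations were used, so flips are an assumption here, and `predict_fn` is a hypothetical callable that maps an H x W x C image to an H x W probability map.

```python
import numpy as np

def predict_mask_with_flips(image, predict_fn):
    """Average the mask prediction over the original image and its two flips."""
    preds = [predict_fn(image)]
    # Horizontal flip: flip the input, predict, then flip the prediction back.
    preds.append(predict_fn(image[:, ::-1])[:, ::-1])
    # Vertical flip, handled the same way.
    preds.append(predict_fn(image[::-1, :])[::-1, :])
    return np.mean(preds, axis=0)
```

Each flipped prediction is mapped back to the original orientation before averaging, so the result stays aligned with the input image.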

Theory is theory and practice is practice; accuracy matters most. So we dismembered the model, elegant or not (this is also why loose coupling was emphasized earlier).

We split the network into two parts:

The first part is built directly with PaddleSeg, training segformerB2 on its own
The other part trains the rest of the network, including the generator and the discriminator of the adversarial network

We run sliding-window prediction with a 512x512 window and a 256x256 stride to generate the mask for the whole large image (sliding with overlapping regions works well); we then erase the content of the corresponding regions in the image and feed it, at 512x512, into the subsequent network for prediction.

IV. Code: reproducing the training

#1. Prepare the ingredients: unzip the competition datasets
!unzip data/data126591/dehw_testB_dataset.zip -d data/ >>/dev/null
!unzip data/data126591/dataset1.zip -d data/ >>/dev/null
!unzip data/data126591/dataset2.zip -d data/ >>/dev/null

#2. Prepare the tools: install the dependencies
!pip install scikit_image -q
!pip install paddleseg -q

#3. Start the first part of the training 
!python work/train/train_seg.py

#4. Start the second part of the training 
!python work/train/train.py

V. Code: prediction

This section contains the code that produces the prediction results. To run it:

1. Put the leaderboard B image data into the /dehw_testB_dataset folder (already in place, so you can run directly);

2. Run 1_predict_segformer.py; the semantic segmentation results for the handwriting will appear in the segoutput folder. Speed: 4s/step;

3. After step 2 finishes, run 2_predictgan.py. Speed: 4s/step;

4. The final submission results appear in the submit folder.

#1. Predict the mask for the full image
!python work/pre_and_submit/1_predict_segformer.py

The semantic segmentation results are generated under the segoutput path, as shown in the figure below:

#3. Repair the content covered by the handwriting
!python work/pre_and_submit/2_predictgan.py

The prediction results are written to the submit folder:

#4. Zip the files and submit them for scoring
%cd submit/
!zip result.zip *.png *.txt

VI. Acknowledgments

Thanks again:

Thanks to Baidu for organizing the competition and providing the opportunity and the stage.

Thanks to the team members for their joint efforts: to Yang Libo for putting up with the bad habit of holding discussions at 23:00, to Shen Chen for the 24-hour, restart-within-5-minutes service, to Zhai Xuekui for drawing fire, and to everyone for their data support and cooperation.

Special thanks ：

Jordan2020, for open-sourcing the MTRNet++-based image text erasure solution (leaderboard score 0.55599), which helped us a great deal.

Copyright notice
This article was written by [Python's path to immortality]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202132041411155.html