Baidu Online AI Competition - Image Processing Challenge: the 8th-place solution for handwriting erasure
2022-07-06 05:48:00 【Python's path to immortality】
The 8th-place solution in the competition
Baidu Online AI Competition, round 2: handwritten text erasure
Thanks to Baidu for organizing the competition, and thanks to the team members for their joint efforts.
Special thanks to Jordan2020 for open-sourcing the "MTRNet++-based image text erasure, leaderboard score 0.55599" solution, from which we benefited a lot.
1. Algorithm introduction
As the saying goes, choosing a good baseline is half the job (only half joking). Once we had the dataset, we tested Pixel2Pixel, CycleGAN, EnsNet, and MTRNet++, and finally chose MTRNet++ as the baseline.
The reasons: models with a plus sign in the name are generally stronger, and this one has two plus signs. Beyond that:
- It is end-to-end and well integrated;
- The model is small, performs reasonably well, and leaves room for modification;
- The network design is close to the task of this competition.
Network architecture:
The original MTRNet++ architecture

was changed to

(The discriminator is not drawn; please picture it yourself.)
Modification ideas:
- Remove the input mask. MTRNet++ requires a mask as the fourth input channel, which is inconvenient, so we dropped it. If the mask is simply filled with 1s over the whole image, the channel is redundant; but if a high-precision segmentation result is to be passed in, obtaining an accurate mask is a challenge in itself. We therefore deleted the fourth channel, replaced the mask-generating part of the network with SegFormer (a transformer-based segmentation model), and removed the A1-A4 gate structures. This makes the network loosely coupled and easy to split apart and tune (see the practice tuning section, Section 3, for details).
- Focus on the repaired region. In the original network, the intermediate image (the one generated after GCI) goes through a cmp operation: generated content is used where the mask is white, original image content is used where the mask is black, and the two are superimposed into a new image. We carry this mechanism over to the next stage, so the final generated image also goes through cmp before the loss against the GT is computed. The gradients then come almost entirely from the masked region, and the non-masked region is essentially ignored. We believe this makes training faster and better.
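The cmp operation described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function names `composite` and `composite_l1_loss` are hypothetical, the mask is assumed to be 1 inside the handwriting region and 0 elsewhere, and an L1 loss is used purely as an example (the write-up does not say which loss the team used).

```python
import numpy as np

def composite(generated, original, mask):
    """cmp: generated pixels where mask == 1, original pixels where mask == 0."""
    mask = mask.astype(generated.dtype)
    if mask.ndim == generated.ndim - 1:
        mask = mask[..., None]  # broadcast an HxW mask over the channel axis
    return mask * generated + (1.0 - mask) * original

def composite_l1_loss(generated, original, mask, gt):
    """Example loss (assumed L1) computed on the composited image, not the raw output."""
    return float(np.abs(composite(generated, original, mask) - gt).mean())

# Small demonstration: changing the generator output outside the mask
# does not change the loss, so that region contributes no gradient.
rng = np.random.default_rng(0)
original = rng.random((4, 4, 3))
gt = original.copy()
gt[1:3, 1:3] = 1.0                      # pretend clean paper under the strokes
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                    # handwriting region

gen_a = rng.random((4, 4, 3))
gen_b = gen_a.copy()
gen_b[0, 0] = 0.0                       # perturb a pixel outside the mask

loss_a = composite_l1_loss(gen_a, original, mask, gt)
loss_b = composite_l1_loss(gen_b, original, mask, gt)
print(loss_a == loss_b)
```

Because pixels outside the mask are copied from the original image, their contribution to the loss is constant with respect to the generator, which matches the observation that the gradient comes almost entirely from the masked part.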
2. Data augmentation
Before the troops move, the provisions go first. Data is the provisions.
- Data cleaning
Coming from a surveying background, our first reaction on getting an image is always: where is the error?
After experimenting, we believe the difference between the two images (original and ground truth) is the combined result of system differences, human differences, and the differences caused by the handwriting itself.
In the figure above, the images are arranged in the following order:
Original image | Ground truth | Threshold 1 | Threshold 5
Threshold 10 | Threshold 15 | Threshold 25
Conclusions of the analysis:
- At threshold 1 a large number of pixels show up. This is partly due to storage compression, interpolation, and similar causes, and is classed as system difference (we are not computer science majors, so the wording may be off; please bear with us and go easy).
- At thresholds 5-25 the handwriting is basically visible, but the noise varies, because some annotators wiped out the noisy areas during processing while others did not (especially between lines, where cleanup is difficult). This is the human difference.
- The difference caused by the handwriting itself, which is what we should really focus on.
After data processing, the second and third kinds of difference are kept, and the first kind is removed using cmp.
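The threshold analysis can be reproduced roughly as follows. This is a minimal sketch, not the team's code: the function name `difference_mask` is hypothetical, and both the use of the per-pixel maximum over channels and the `>=` comparison are assumptions, since the write-up only lists the threshold values.

```python
import numpy as np

def difference_mask(original, ground_truth, threshold):
    """Boolean HxW mask of pixels whose absolute difference reaches the threshold."""
    # Cast to a signed type first so uint8 subtraction cannot wrap around.
    diff = np.abs(original.astype(np.int16) - ground_truth.astype(np.int16))
    if diff.ndim == 3:
        diff = diff.max(axis=-1)  # assumed: strongest channel decides
    return diff >= threshold

# Toy data: one pixel differs by 1 (compression-like noise),
# another differs by 55 (a handwriting stroke).
original = np.full((2, 3, 3), 200, dtype=np.uint8)
ground_truth = original.copy()
ground_truth[0, 0] = 201
ground_truth[1, 2] = 255

for threshold in (1, 5, 10, 15, 25):
    mask = difference_mask(original, ground_truth, threshold)
    print(threshold, int(mask.sum()))
```

At threshold 1 both pixels are flagged, while from threshold 5 upward only the stroke pixel remains, which mirrors the system difference versus handwriting difference distinction drawn above.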
Augmentation and expansion
- Combine the competition dataset with public datasets to build a pool of handwritten-text material

- Build material for regions that are easily misrecognized, such as parentheses and circles

- MixUp

Simple, crude, and effective.
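For reference, MixUp on an image-to-image task can be sketched as below. The details are assumptions rather than the team's settings: the helper name `mixup_pair` is hypothetical, the input and its ground truth are blended with the same coefficient, and the coefficient is drawn from a Beta(alpha, alpha) distribution with an illustrative `alpha=0.4`.

```python
import numpy as np

def mixup_pair(img_a, gt_a, img_b, gt_b, alpha=0.4, rng=None):
    """Blend two (input, ground truth) pairs with one shared mixing coefficient."""
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))          # mixing coefficient in [0, 1]
    mixed_img = lam * img_a + (1.0 - lam) * img_b
    mixed_gt = lam * gt_a + (1.0 - lam) * gt_b   # same lam keeps the pair consistent
    return mixed_img, mixed_gt, lam

rng = np.random.default_rng(42)
img_a, gt_a = rng.random((8, 8, 3)), rng.random((8, 8, 3))
img_b, gt_b = rng.random((8, 8, 3)), rng.random((8, 8, 3))
mixed_img, mixed_gt, lam = mixup_pair(img_a, gt_a, img_b, gt_b, rng=rng)
print(mixed_img.shape, mixed_gt.shape)
```

Using one coefficient for both the input and the target keeps every mixed sample a valid erasure example: the mixed ground truth is exactly what the mixed input should look like with the handwriting removed.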
3. Practice tuning
Code implementation and practice tuning:
With the above work done, we started training the network and ran into two problems:
- In multi-task training, the mask produced by the mask branch was not good enough;
- At prediction time, applying test-time augmentation to the mask prediction helps accuracy, but this is hard to do inside a single network.
Theory is theory and practice is practice; accuracy matters most. So we cut the model apart, elegant or not (this is also why loose coupling was emphasized earlier).
We split the network into two parts:
- The first part directly uses a PaddleSeg segmentation network, training SegFormer-B2 on its own;
- The other part trains the rest of the network, including the GAN generator and discriminator.
At prediction time, a 512×512 window slides with a stride of 256×256 to produce the mask for the whole large image (sliding with overlapping regions works well). The content of the corresponding regions is then erased from the image, and the result is fed to the subsequent network at 512×512 for prediction.
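The sliding-window mask prediction can be sketched as follows. This is an illustration under assumptions, not the code in 1_predict_segformer.py: the names `window_origins` and `sliding_predict` are hypothetical, `predict_fn` stands in for the trained segmentation model, the last window is clamped to the image border, and overlapping predictions are simply averaged (the write-up does not say how overlaps are merged). The stride is assumed not to exceed the window size, so every pixel is covered.

```python
import numpy as np

def window_origins(length, window=512, stride=256):
    """Start offsets along one axis so that the windows cover [0, length)."""
    if length <= window:
        return [0]
    origins = list(range(0, length - window + 1, stride))
    if origins[-1] != length - window:
        origins.append(length - window)  # clamp the last window to the border
    return origins

def sliding_predict(image, predict_fn, window=512, stride=256):
    """Average per-tile predictions into one full-size HxW probability map."""
    h, w = image.shape[:2]
    acc = np.zeros((h, w), dtype=np.float32)
    hits = np.zeros((h, w), dtype=np.float32)
    for y in window_origins(h, window, stride):
        for x in window_origins(w, window, stride):
            tile = image[y:y + window, x:x + window]
            pred = predict_fn(tile)       # expected shape: tile.shape[:2]
            th, tw = pred.shape
            acc[y:y + th, x:x + tw] += pred
            hits[y:y + th, x:x + tw] += 1.0
    return acc / hits

# Stand-in model: marks every pixel as handwriting with probability 1.
dummy_model = lambda tile: np.ones(tile.shape[:2], dtype=np.float32)
full_mask = sliding_predict(np.zeros((700, 900, 3), dtype=np.float32), dummy_model)
print(full_mask.shape)
```

With a 256 stride on a 512 window, most pixels are predicted by up to four tiles, which is one plausible reason overlapping windows give smoother masks at tile borders.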
4. Code walkthrough - reproducing the training
#1. Prepare the ingredients: unzip the competition datasets
!unzip data/data126591/dehw_testB_dataset.zip -d data/ >>/dev/null
!unzip data/data126591/dataset1.zip -d data/ >>/dev/null
!unzip data/data126591/dataset2.zip -d data/ >>/dev/null
#2. Prepare the tools
!pip install scikit_image -q
!pip install paddleseg -q
#3. Start the first training stage (segmentation network)
!python work/train/train_seg.py
#4. Start the second training stage (generator and discriminator)
!python work/train/train.py
5. Code walkthrough - prediction code
This section contains the code that produces the prediction results. How to run it:
1. Put the leaderboard-B image data into the /dehw_testB_dataset folder (already in place; you can run directly);
2. Run 1_predict_segformer.py; the semantic segmentation results for the handwriting appear in the segoutput folder. Speed: 4 s/step;
3. After step 2 finishes, run 2_predictgan.py. Speed: 4 s/step;
4. The final submission results appear in the submit folder.
#1. Predict the mask for the full-size images
!python work/pre_and_submit/1_predict_segformer.py
The semantic segmentation results are generated under the segoutput path, as shown in the figure below:

#3. Repair the content covered by the handwriting
!python work/pre_and_submit/2_predictgan.py
The prediction results are written to the submit folder:

#4. Zip the files and submit for scoring
%cd submit/
!zip result.zip *.png *.txt
6. Acknowledgements
Thanks again:
Thanks to Baidu for organizing the competition and providing the opportunity and the stage.
Thanks to the team members for their joint efforts: to Yang Libo for putting up with the bad habit of holding discussions at 23:00, to Shen Chen for restarting the service within 5 minutes around the clock, and to Zhai Xuekui for drawing fire, and to everyone for the data support and cooperation.
Special thanks:
To Jordan2020 for open-sourcing the "MTRNet++-based image text erasure, leaderboard score 0.55599" solution, from which we benefited a lot.