Baidu online AI competition, image processing challenge: the 8th-place solution for handwriting erasure
2022-07-06 05:48:00 【Python's path to immortality】
The eighth-place solution in the competition
Baidu online competition, task 2: handwritten text erasure
Thanks to Baidu for organizing the competition, and to our team members for their joint effort.
Special thanks to Jordan2020 for open-sourcing "image and text erasure based on MTRNet++", a solution with a leaderboard score of 0.55599, from which we benefited a great deal.
I. Algorithm overview
As the saying goes, choosing a good baseline is half the battle (tongue in cheek). Once we had the dataset, we tested Pixel2Pixel, CycleGAN, EnsNet, and MTRNet++, and finally chose MTRNet++ as our baseline.
The reasons: models with a plus sign in their name are generally stronger, and this one has two plus signs.
- It is end-to-end and well integrated;
- The model is small, performs reasonably well, and leaves room for modification;
- The network design is close to the task of this competition.
Network architecture:
The original MTRNet++:

was changed to:

(The discriminator is not drawn; please picture it yourself.)
Modification ideas:
Remove the input mask. MTRNet++ requires a mask as a fourth input channel, which is inconvenient, so we dropped it. If the mask is simply filled with 1 over the whole image, the channel is redundant; but if a high-precision segmentation result is to be passed in, obtaining an accurate mask is a challenge in itself. We therefore deleted the fourth channel, replaced the mask-generation part of the network with the transformer-based segformer, and removed the A1-A4 gate structures. This makes the network loosely coupled and easy to split apart and tune (see Section III, Practical tuning, for details).
Focus on the repair region. In the original network, the intermediate image (the picture in the middle of the figure, produced after GCI) passes through a cmp operation: where the mask is white, the generated content is used; where the mask is black, the original image content is kept; the two are superimposed into a new image. We carry this mechanism over to the next stage: the final generated image also passes through cmp before the loss against the ground truth (GT) is computed. Trained this way, almost all of the gradient comes from the masked region, while the non-masked region is essentially ignored. We believe this makes training faster and better.
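The cmp operation described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function names `cmp_composite` and `l1_loss`, the float image range, and the choice of an L1 loss are ours for the example and are not taken from the competition code.

```python
import numpy as np

def cmp_composite(generated, original, mask):
    """Per-pixel blend: generated content where mask is 1 (white),
    original content where mask is 0 (black)."""
    generated = np.asarray(generated, dtype=np.float32)
    original = np.asarray(original, dtype=np.float32)
    mask = np.asarray(mask, dtype=np.float32)
    if mask.ndim == 2:
        mask = mask[..., None]  # (H, W) -> (H, W, 1) so it broadcasts over channels
    return mask * generated + (1.0 - mask) * original

def l1_loss(pred, target):
    """Mean absolute error (an assumed loss, chosen for illustration)."""
    pred = np.asarray(pred, dtype=np.float32)
    target = np.asarray(target, dtype=np.float32)
    return float(np.mean(np.abs(pred - target)))

# Toy data: the generator output is wrong everywhere, but only the masked
# 2x2 block survives compositing, so only that block contributes to the loss.
original = np.full((4, 4, 3), 0.2, dtype=np.float32)
ground_truth = np.full((4, 4, 3), 0.2, dtype=np.float32)
generated = np.full((4, 4, 3), 0.9, dtype=np.float32)
mask = np.zeros((4, 4), dtype=np.float32)
mask[1:3, 1:3] = 1.0

composited = cmp_composite(generated, original, mask)
loss = l1_loss(composited, ground_truth)
print(loss)
```

Because pixels outside the mask are copied from the original image, they do not depend on the generator output, which is why the gradient comes almost entirely from the masked region.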
II. Data augmentation
As the old saying goes, provisions go before the troops. Data is our provisions.
- Data cleaning
Coming from a surveying background, our first reaction on getting a map is always: where is the error?
After our experiments, we concluded that the difference between the two images (original and ground truth) is caused jointly by system differences, human differences, and the handwriting itself.
In the comparison figure, the panels are arranged in the following order:
Original image, ground truth, threshold 1, threshold 5
threshold 10, threshold 15, threshold 25
Analysis conclusions:
- At threshold 1 a large number of pixels show up. This is partly due to storage compression, interpolation, and similar causes, and we classify it as system difference. (We are not computer science majors, so the wording may be imprecise; please bear with us and go easy.)
- At thresholds 5 to 25 the handwriting is basically visible, but the noise varies, because some annotators wiped the noisy areas during processing and others did not (especially between lines and in other hard-to-handle places). This is the human difference.
- The difference caused by the handwriting itself is what we should really pay attention to.
In data processing we keep the second and third kinds of difference and remove the first kind using cmp.
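The threshold analysis above can be reproduced with a small difference-map helper. This is a sketch under assumptions: the name `difference_masks`, the use of the maximum per-channel absolute difference, and the `>=` comparison are illustrative choices, since the write-up does not state how the thresholds were applied.

```python
import numpy as np

def difference_masks(original, truth, thresholds=(1, 5, 10, 15, 25)):
    """Return {threshold: boolean mask} marking pixels whose absolute
    difference between the two uint8 images is at least the threshold."""
    # Cast to a signed type first so the subtraction cannot wrap around.
    diff = np.abs(original.astype(np.int16) - truth.astype(np.int16))
    if diff.ndim == 3:
        diff = diff.max(axis=2)  # assumption: take the largest channel difference
    return {t: diff >= t for t in thresholds}

# Toy data: one pixel of compression-like noise, one medium edit,
# and one strong difference standing in for a handwriting stroke.
original = np.full((4, 4), 200, dtype=np.uint8)
truth = original.copy()
truth[0, 0] = 199   # difference 1
truth[1, 1] = 188   # difference 12
truth[2, 2] = 100   # difference 100

masks = difference_masks(original, truth)
for t, m in masks.items():
    print(t, int(m.sum()))
```

In this toy case the pixel with a difference of 1 shows up only at threshold 1, which mirrors the observation that the lowest threshold mostly captures system differences.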
Augmentation and expansion:
- Combine this dataset with public datasets to build a library of handwritten-text material

- Build material from regions that are prone to misrecognition, such as parentheses and circles

- MixUp

Simple, crude, and effective.
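For readers unfamiliar with MixUp, a minimal sketch for paired image-to-image data follows. The write-up does not say which MixUp variant was used, so everything here is an assumption: the function name `mixup_pair`, the Beta(alpha, alpha) sampling with `alpha=0.4`, and the choice to blend the input and its ground truth with the same coefficient.

```python
import numpy as np

def mixup_pair(img_a, gt_a, img_b, gt_b, alpha=0.4, rng=None):
    """Blend two (input, ground-truth) pairs with one shared coefficient.

    lam is drawn from Beta(alpha, alpha); the same lam is applied to the
    inputs and to the targets so each mixed pair stays consistent.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))
    mixed_img = lam * img_a + (1.0 - lam) * img_b
    mixed_gt = lam * gt_a + (1.0 - lam) * gt_b
    return mixed_img, mixed_gt, lam

# Toy data: an all-black pair mixed with an all-white pair.
rng = np.random.default_rng(0)
img_a = np.zeros((8, 8, 3), dtype=np.float32)
gt_a = np.zeros((8, 8, 3), dtype=np.float32)
img_b = np.ones((8, 8, 3), dtype=np.float32)
gt_b = np.ones((8, 8, 3), dtype=np.float32)

mixed_img, mixed_gt, lam = mixup_pair(img_a, gt_a, img_b, gt_b, rng=rng)
print(lam, mixed_img.mean())
```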
III. Practical tuning
Code implementation and practical tuning:
After the work above, we started training the network and ran into two problems:
- In multi-task training, the mask-generation task performs poorly;
- At prediction time, applying augmented prediction to the mask helps accuracy, but this is hard to do inside a single network.
Theory is theory and practice is practice; accuracy matters most. So we dismembered the model, elegant or not (this is why we emphasized loose coupling earlier).
We split the network into two parts:
- The first part directly uses PaddleSeg as the mask-generation network, training segformerB2 on its own;
- The other part trains the rest of the network, including the generator and discriminator of the adversarial network.
We use a 512×512 window with a stride of 256×256 for sliding-window prediction to generate the mask of the whole large image (sliding with overlapping areas works well). We then erase the content of the corresponding area in the image and feed it into the subsequent network at 512×512 for prediction.
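The sliding-window step can be sketched as follows. This is not the competition code: `window_starts`, `sliding_window_predict`, and `predict_fn` are hypothetical names, averaging the overlapping predictions is one plausible way to merge them, and a simple brightness threshold stands in for the trained segformerB2 model.

```python
import numpy as np

def window_starts(length, window, stride):
    """Start offsets that cover [0, length); the last window sits flush with the end."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts

def sliding_window_predict(image, predict_fn, window=512, stride=256):
    """Run predict_fn on overlapping tiles of an (H, W, C) image and
    average the per-pixel outputs into one full-size mask."""
    h, w = image.shape[:2]
    # Pad small images up to one window so every tile has the full size.
    pad_h, pad_w = max(window - h, 0), max(window - w, 0)
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    ph, pw = padded.shape[:2]
    prob_sum = np.zeros((ph, pw), dtype=np.float32)
    count = np.zeros((ph, pw), dtype=np.float32)
    for y in window_starts(ph, window, stride):
        for x in window_starts(pw, window, stride):
            tile = padded[y:y + window, x:x + window]
            prob_sum[y:y + window, x:x + window] += predict_fn(tile)
            count[y:y + window, x:x + window] += 1.0
    return (prob_sum / count)[:h, :w]

def toy_predict(tile):
    """Stand-in for the segmentation model: marks bright pixels as handwriting."""
    return (tile[..., 0] > 128).astype(np.float32)

image = np.zeros((600, 800, 3), dtype=np.uint8)
image[100:200, 300:500] = 255  # a bright block playing the role of handwriting

full_mask = sliding_window_predict(image, toy_predict)
print(full_mask.shape, float(full_mask.sum()))
```

With a stride of half the window, most pixels are covered by several tiles, so tile-border artifacts from a real model are averaged out rather than copied into the final mask.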
IV. Code walkthrough: reproducing the training
#1. Prepare the ingredients: unzip the competition datasets
!unzip data/data126591/dehw_testB_dataset.zip -d data/ >>/dev/null
!unzip data/data126591/dataset1.zip -d data/ >>/dev/null
!unzip data/data126591/dataset2.zip -d data/ >>/dev/null
#2. Prepare the tools: install the dependencies
!pip install scikit_image -q
!pip install paddleseg -q
#3. Start the first part of the training (segmentation network)
!python work/train/train_seg.py
#4. Start the second part of the training (generator and discriminator)
!python work/train/train.py
V. Code walkthrough: prediction
This section contains the code that produces the prediction results. How to run it:
1. Put the leaderboard B image data under the /dehw_testB_dataset folder (it is already in place, so the code can be run directly);
2. Run 1_predict_segformer.py; the semantic segmentation results for the handwriting appear in the segoutput folder. Speed: 4 s/step;
3. Once step 2 is finished, run 2_predictgan.py. Speed: 4 s/step;
4. The final submission appears in the submit folder.
#2. Predict the mask of the full image
!python work/pre_and_submit/1_predict_segformer.py
The semantic segmentation results are generated under the segoutput path, as shown in the figure below:

#3. Repair the regions covered by handwriting
!python work/pre_and_submit/2_predictgan.py
The prediction results appear in the submit folder:

#4. Zip the results and submit them for scoring
%cd submit/
!zip result.zip *.png *.txt
VI. Acknowledgements
Thanks once again:
Thanks to Baidu for organizing the competition and providing the opportunity and the stage.
Thanks to our team members for their joint effort: to Yang Libo for putting up with the bad habit of holding discussions at 23:00, to Shen Chen for restarting the service within 5 minutes at any hour of the day, and to Zhai Xuekui for sharing the load. Thank you all for the data support and the cooperation.
Special thanks:
To Jordan2020 for open-sourcing "image and text erasure based on MTRNet++", a solution with a leaderboard score of 0.55599, from which we benefited a great deal.