Baidu online AI competition - image processing challenge: the 8th-place solution for handwriting erasure
2022-07-06 05:48:00 【Python's path to immortality】
The eighth-place solution in the competition
Baidu online competition II: handwritten text erasure
Thanks to Baidu for organizing the competition, and thanks to the team members for their joint efforts.
Special thanks to Jordan2020 for open-sourcing "Image and text erasure based on MTRNet++", a solution with a leaderboard score of 0.55599, from which we benefited a lot.
1. Algorithm introduction
As the saying goes, choosing a good baseline is half the job done (joking). After getting the dataset, we tested Pixel2Pixel, CycleGAN, EnsNet and MTRNet++, and finally chose MTRNet++ as the baseline.
The reasons: models with a plus sign in the name are generally stronger, and this one has two. In addition:
- It is end-to-end and well integrated;
- The model is small, performs reasonably well, and leaves room for modification;
- The network design is close to the task of this competition.
Network architecture:
The original MTRNet++:

was changed to:

Please consider dis
Modification ideas:
- Remove the input mask. MTRNet++ requires a mask as a fourth input channel, which is inconvenient, so we dropped it. If the mask is simply filled with 1 over the whole image, the channel is redundant; but if a high-precision segmentation result is to be passed in, obtaining an accurate mask is a challenge in itself. We therefore deleted the fourth channel, replaced the part of the network that generates the mask with the transformer-based segformer, and removed the A1-A4 gate structures, so that the network is loosely coupled and easy to split and tune (see the practice tuning section for details).
- Focus on the repair area. In the original network, the pre-generated image (the intermediate image, produced after GCI) goes through a cmp operation: the generated content is used in the white area of the mask, the original image content is used in the black area, and the two are composited into a new image. We carry this mechanism over to the next step: the final generated image also goes through cmp before the loss against GT is computed. With this setup almost all of the gradient comes from the mask region, and the non-mask region is largely ignored. We believe this makes training faster and better.
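The cmp operation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the team's training code: the function names `cmp_composite` and `l1_after_cmp` are hypothetical, images are assumed to be H x W x C float arrays, the mask is assumed to be an H x W array of 0/1 values, and a plain L1 error stands in for whatever loss was actually used.

```python
import numpy as np


def cmp_composite(generated, original, mask):
    """Composite two images with a binary mask (hypothetical helper).

    Where mask == 1 (white) the generated pixels are used; where
    mask == 0 (black) the original pixels are kept.
    """
    m = mask.astype(np.float32)[..., None]  # add a channel axis for broadcasting
    return m * generated + (1.0 - m) * original


def l1_after_cmp(generated, original, mask, gt):
    """Mean absolute error between the composited image and the ground truth."""
    composite = cmp_composite(generated, original, mask)
    return float(np.abs(composite - gt).mean())
```

Because pixels outside the mask are copied from the original image, the generator output there has no influence on the error, which is the sense in which the training signal concentrates on the mask region.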
2. Data augmentation
As the saying goes, provisions are sent ahead before the troops move. Data is our provisions.
- Data cleaning
Coming from a measurement background, our first reaction on receiving the images is always: where is the error?
After experimenting, we concluded that the difference between the two images (original and ground truth) is the combined result of system differences, human differences, and the differences caused by the handwriting itself.
In the figure above, the panels are arranged in the following order:
Original image | Ground truth | threshold 1 | threshold 5
threshold 10 | threshold 15 | threshold 25
Analysis conclusions:
- At threshold 1, a large number of differing pixels appear. Part of this comes from storage compression, interpolation and similar causes, and we classify it as system differences (we are not computer science majors, so the wording may be imprecise; please bear with us and go easy on the criticism).
- At thresholds 5-25, the handwriting is basically visible, but the noise varies from image to image, because some annotators erased the noisy areas during processing and others did not (especially between the lines, where cleanup is difficult).
- The differences caused by the handwriting are what we should really pay attention to.
After data processing, we keep the second and third kinds of differences and remove the first kind using the cmp operation.
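The threshold analysis above can be reproduced with a small NumPy sketch. Everything here is an assumption made for illustration: the function names are hypothetical, images are assumed to be H x W x 3 uint8 arrays of the same size, the per-pixel difference is taken as the maximum absolute difference over channels, and the cleaning step copies the original pixels into the ground truth wherever the difference does not exceed the threshold. The post does not state which threshold was finally used, so it is left as a parameter.

```python
import numpy as np


def difference_mask(original, ground_truth, threshold):
    """Return an H x W uint8 mask that is 1 where the two images differ
    by more than `threshold` in at least one channel."""
    # Widen to int16 first so the uint8 subtraction cannot wrap around.
    diff = np.abs(original.astype(np.int16) - ground_truth.astype(np.int16))
    return (diff.max(axis=-1) > threshold).astype(np.uint8)


def remove_system_differences(original, ground_truth, threshold):
    """cmp-style cleaning (assumed interpretation): keep the ground truth only
    where the difference is large, and copy the original pixels elsewhere."""
    keep_gt = difference_mask(original, ground_truth, threshold).astype(bool)
    return np.where(keep_gt[..., None], ground_truth, original)
```

Calling `difference_mask` with thresholds 1, 5, 10, 15 and 25 gives one mask per panel of the kind described in the figure.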
- Augmentation and expansion
- Combine this dataset with public datasets to build handwritten text material

- Build dedicated material for regions that are easily misrecognized, such as parentheses and circles

- MixUp

Simple and crude, but effective.
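For readers unfamiliar with MixUp, here is a minimal sketch of the standard formulation applied to an image-to-image task. The post does not say how MixUp was applied, so this is an assumption: the same mixing coefficient is applied to both the inputs and their ground-truth targets, the coefficient is drawn from a Beta(alpha, alpha) distribution, and the function name `mixup_pair` and the default `alpha=0.4` are hypothetical.

```python
import numpy as np


def mixup_pair(input_a, target_a, input_b, target_b, alpha=0.4, rng=None):
    """Blend two (input, target) training pairs with one shared coefficient.

    All four arrays must have the same shape. Returns the mixed input,
    the mixed target and the coefficient that was drawn.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))  # mixing coefficient in [0, 1]
    mixed_input = lam * input_a.astype(np.float32) + (1.0 - lam) * input_b.astype(np.float32)
    mixed_target = lam * target_a.astype(np.float32) + (1.0 - lam) * target_b.astype(np.float32)
    return mixed_input, mixed_target, lam
```

Using one coefficient for both the input and the target keeps each mixed pair consistent: the mixed target is exactly the erased version of the mixed input.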
3. Practice tuning
Code implementation and practice tuning:
After the work above, we started training the network and ran into two problems:
- With multi-task training, the mask generated by the mask task was not satisfactory;
- At prediction time, applying enhanced (augmented) prediction to the mask helps accuracy, but this is hard to do inside a single network.
Theory is theory and practice is practice; accuracy matters most. Splitting the model apart is not elegant (this is also why we emphasized loose coupling earlier).
We divide the network into two parts:
- The first part uses PaddleSeg directly as the mask generation network, training segformerB2 on its own;
- The other part trains the rest of the network, including the generator and discriminator of the adversarial network.
We use a 512x512 window with a 256x256 stride for sliding-window prediction to generate the mask of the whole image (sliding with overlapping regions works well). The content of the corresponding regions is then erased from the image, which is fed at 512x512 into the subsequent network for prediction.
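The sliding-window mask prediction described above can be sketched as follows. This is a hedged illustration, not the contents of `1_predict_segformer.py`: `predict_fn` is a hypothetical callable standing in for the segformer model (it takes an H x W x 3 tile and returns an H x W array of handwriting probabilities), overlapping predictions are assumed to be averaged, the last window in each direction is assumed to be shifted so it ends flush with the image border, and the 0.5 cut-off is an assumption.

```python
import numpy as np


def window_starts(length, window, stride):
    """Start offsets covering [0, length); the last window is flush with the edge."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts


def sliding_window_mask(image, predict_fn, window=512, stride=256):
    """Predict a full-size binary mask for an H x W x 3 image tile by tile."""
    h, w = image.shape[:2]
    # Pad images smaller than one window so every tile has the full size.
    pad_h, pad_w = max(window - h, 0), max(window - w, 0)
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    ph, pw = padded.shape[:2]
    prob_sum = np.zeros((ph, pw), dtype=np.float32)
    hits = np.zeros((ph, pw), dtype=np.float32)
    for top in window_starts(ph, window, stride):
        for left in window_starts(pw, window, stride):
            tile = padded[top:top + window, left:left + window]
            prob_sum[top:top + window, left:left + window] += predict_fn(tile)
            hits[top:top + window, left:left + window] += 1.0
    prob = prob_sum / hits  # average the overlapping predictions
    return (prob[:h, :w] >= 0.5).astype(np.uint8)
```

With a stride of half the window size, most pixels are predicted up to four times, so errors near the border of one tile can be outvoted by tiles in which the same pixel sits closer to the centre.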
4. Code introduction - reproducing the training
#1. Prepare the data: unzip the competition datasets
!unzip data/data126591/dehw_testB_dataset.zip -d data/ >>/dev/null
!unzip data/data126591/dataset1.zip -d data/ >>/dev/null
!unzip data/data126591/dataset2.zip -d data/ >>/dev/null
#2. Install the dependencies
!pip install scikit_image -q
!pip install paddleseg -q
#3. Train the first part (the segmentation network)
!python work/train/train_seg.py
#4. Train the second part (the rest of the network)
!python work/train/train.py
5. Code introduction - prediction code
This section contains the code that produces the prediction results. How to run it:
1. Put the list B test images under the /dehw_testB_dataset folder (they are already in place, so you can run directly);
2. Run 1_predict_segformer.py; the semantic segmentation results for the handwriting will appear in the segoutput folder; speed: 4s/step;
3. After the second step is complete, run 2_predictgan.py; speed: 4s/step;
4. The final submission results appear in the submit folder.
#1. Predict the mask of the full image
!python work/pre_and_submit/1_predict_segformer.py
The semantic segmentation results are generated under the segoutput path, as shown in the figure below:

#3. Repair the content covered by the handwriting
!python work/pre_and_submit/2_predictgan.py
The prediction results are written to the submit folder:

#4. Zip the files for submission and scoring
%cd submit/
!zip result.zip *.png *.txt
6. Acknowledgements
Thanks again:
Thanks to Baidu for organizing the competition and providing the opportunity and the stage.
Thanks to the team members for their joint efforts: thanks to Yang Libo for putting up with our bad habit of holding discussions at 23:00; thanks to Shen Chen for being on call 24 hours a day and restarting the service within 5 minutes; thanks to Zhai Xuekui for sharing the load; and thanks to everyone for the data support and cooperation.
Special thanks:
To Jordan2020 for open-sourcing "Image and text erasure based on MTRNet++", a solution with a leaderboard score of 0.55599, from which we benefited a lot.