Baidu Online AI Competition - Image Processing Challenge: 8th-Place Solution for Handwriting Erasure
2022-07-06 05:48:00 【Python's path to immortality】
The eighth-place solution in the competition
Baidu online competition, round II: handwritten text erasure
Thanks to Baidu for organizing the competition, and thanks to the team members for their joint efforts.
Special thanks to Jordan2020 for open-sourcing the MTRNet++-based image text erasure solution (leaderboard score 0.55599), from which we benefited a lot.
1. Algorithm overview
As the saying goes, choosing a good baseline is half the battle (said half in jest). After getting the dataset, we tested Pixel2Pixel, CycleGAN, EnsNet and MTRNet++, and finally chose MTRNet++ as the baseline.
The reasons: anything with a plus sign in its name is generally stronger, and this model has two. More seriously:
- It is end-to-end and well integrated;
- The model is small, performs reasonably well, and leaves room for modification;
- The network design is close to the task of this competition.
Network architecture: changed from the original MTRNet++ (first figure) to our modified network (second figure).
Please consider dis
Modification ideas:
- Remove the input mask. MTRNet++ needs a mask as the fourth input channel, which is inconvenient, so we dropped it. If the mask is simply filled with 1 over the whole image, the channel is redundant; if instead a high-precision segmentation result is to be passed in, obtaining an accurate mask is a challenge in itself. We therefore deleted the fourth channel, replaced the mask-generating part of the network with the transformer-based SegFormer, and removed the A1-A4 gate structures. This makes the network loosely coupled and easy to split and tune (see Section 3, Practice tuning).
- Focus on the repaired region. In the original network, the pre-generated image (the intermediate image, produced after GCI) goes through a cmp operation: generated content is used where the mask is white, original image content is used where the mask is black, and the two are composited into a new image. We carry this mechanism into the next stage: the final generated image also goes through cmp before the loss against the ground truth (GT) is computed. Almost all of the gradient then comes from the mask region, and the non-mask region is largely ignored. We believe this makes training faster and better.
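The cmp operation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the team's training code: the function names `cmp_composite` and `l1_after_cmp` are hypothetical, the mask is assumed to be 1 in the handwriting (white) region and 0 elsewhere, and a plain L1 loss stands in for whatever losses the real network uses.

```python
import numpy as np

def cmp_composite(generated, original, mask):
    """Use generated pixels where mask == 1 and original pixels where mask == 0.

    generated, original: float arrays of shape (H, W, C)
    mask: array of shape (H, W), 1 = region to repaint (assumed convention)
    """
    m = mask.astype(generated.dtype)[..., None]  # (H, W, 1), broadcasts over channels
    return m * generated + (1.0 - m) * original

def l1_after_cmp(generated, original, mask, gt):
    """Hypothetical L1 loss computed on the composited image rather than the raw output."""
    composited = cmp_composite(generated, original, mask)
    return float(np.abs(composited - gt).mean())

# Tiny demo: only pixel (0, 0) is inside the mask.
original = np.full((2, 2, 3), 0.5)
gt = original.copy()
gt[0, 0] = 1.0
mask = np.zeros((2, 2))
mask[0, 0] = 1

gen_a = np.zeros((2, 2, 3))
gen_b = gen_a.copy()
gen_b[1, 1] = 9.0  # differs from gen_a only outside the mask

loss_a = l1_after_cmp(gen_a, original, mask, gt)
loss_b = l1_after_cmp(gen_b, original, mask, gt)
```

Because `gen_a` and `gen_b` differ only outside the mask, the two losses are identical, which is the sense in which the non-mask region contributes no gradient.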
2. Data augmentation
Before the troops move, the provisions go first. Data is our provisions.
- Data cleaning
As measurement professionals, our first reaction on getting an image is always: where is the error?
After experimenting, we believe the difference between the two images (original and ground truth) is the combined result of system differences, human differences, and differences caused by the handwriting itself.
The figure above shows, in order: original image, ground truth, threshold 1, threshold 5 (first row); threshold 10, threshold 15, threshold 25 (second row).
Analysis conclusions:
- At threshold 1, a large number of differing pixels appear. Part of this comes from storage compression, interpolation and similar causes, which we classify as system differences (we are not computer science majors, so the wording may be off; please bear with us and go easy on us).
- At thresholds 5-25, the handwriting is basically visible, but the noise varies, because some annotators erased the noisy areas during processing and others did not (especially between lines, where it is hard to clean up). These are human differences.
- The differences caused by the handwriting itself are what we should really pay attention to.
After data processing, we keep the second and third kinds of differences; the first kind is removed with cmp.
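The threshold analysis and the cmp-based cleaning can be sketched as follows. This is a minimal sketch under assumptions: the helper names `diff_mask` and `clean_ground_truth` are hypothetical, images are taken to be H×W×3 `uint8` arrays, the difference is measured as the largest per-channel absolute difference, and the threshold passed to the cleaning step is a free parameter (the post does not say which value was used).

```python
import numpy as np

def diff_mask(image, gt, threshold):
    """Return 1 where the input and ground truth differ by more than `threshold`."""
    # Cast to a signed type first so the subtraction cannot wrap around in uint8.
    diff = np.abs(image.astype(np.int16) - gt.astype(np.int16)).max(axis=-1)
    return (diff > threshold).astype(np.uint8)

def clean_ground_truth(image, gt, threshold):
    """cmp-style cleaning: keep ground-truth pixels only where the difference is
    above the threshold, and copy the input back everywhere else, so that small
    'system' differences disappear from the training target."""
    keep = diff_mask(image, gt, threshold).astype(bool)[..., None]
    return np.where(keep, gt, image)

# Toy example: one pixel differs by 1 (compression-like noise), one by 155 (handwriting-like).
image = np.full((2, 3, 3), 100, dtype=np.uint8)
gt = image.copy()
gt[0, 0] = 101
gt[1, 2] = 255

masks = {t: diff_mask(image, gt, t) for t in (0, 1, 5, 25)}
cleaned = clean_ground_truth(image, gt, threshold=1)
```

In this toy case `masks[0]` flags both pixels while `masks[1]` and above flag only the large difference, mirroring the observation that the lowest threshold is dominated by system noise.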
Augmentation and expansion
- Combine this dataset with public datasets to build a library of handwritten text material
- Build material from regions that are easily misrecognized, such as parentheses and circles
- MixUp
Simple, crude, and effective.
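The post does not say how MixUp was applied, so the following is only one plausible reading for an image-to-image task: blend two input images and their two ground-truth images with the same weight. The names `sample_lambda` and `mixup_pair` and the Beta parameter `alpha=0.4` are assumptions for illustration.

```python
import numpy as np

def sample_lambda(rng, alpha=0.4):
    """Draw the mixing weight from Beta(alpha, alpha), as in the usual MixUp recipe."""
    return float(rng.beta(alpha, alpha))

def mixup_pair(img_a, gt_a, img_b, gt_b, lam):
    """Blend two (input, target) pairs with the same weight so they stay consistent."""
    mixed_img = lam * img_a + (1.0 - lam) * img_b
    mixed_gt = lam * gt_a + (1.0 - lam) * gt_b
    return mixed_img, mixed_gt

rng = np.random.default_rng(0)
lam = sample_lambda(rng)

# Two dummy 4x4 RGB samples in [0, 1]; real code would load image/GT pairs here.
img_a, gt_a = np.ones((4, 4, 3)), np.ones((4, 4, 3))
img_b, gt_b = np.zeros((4, 4, 3)), np.zeros((4, 4, 3))
mixed_img, mixed_gt = mixup_pair(img_a, gt_a, img_b, gt_b, lam)
```

Using one shared weight for input and target keeps every mixed pixel a valid "before/after" pair, which matters more here than in classification.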
3. Practice tuning
Code implementation and practical tuning:
After the work above, we started training the network and ran into two problems:
- In multi-task training, the mask produced by the mask task was not good enough;
- At prediction time, applying enhanced (augmented) prediction to the mask helps accuracy, but this is hard to do inside a single network.
Theory is theory and practice is practice; accuracy matters most. So we split the model apart, elegant or not (this is why loose coupling was stressed earlier).
We divide the network into two parts:
- The first part directly uses a PaddleSeg network and trains segformerB2 on its own
- The other part trains the rest of the network, including the generator and discriminator of the adversarial network
We use a 512×512 window with a 256×256 stride for sliding-window prediction to generate the mask for the whole large image (sliding with overlapping regions works well). We then erase the content of the corresponding regions in the image and feed it into the subsequent network at 512×512 for prediction.
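The sliding-window step can be sketched as below. This is an illustration under assumptions rather than the contents of the team's scripts: `window_starts` and `sliding_predict` are hypothetical names, `predict_fn` stands in for the trained segmentation model and is assumed to return a per-pixel probability map for a tile, and overlapping predictions are simply averaged.

```python
import numpy as np

def window_starts(length, window, stride):
    """Start offsets that cover [0, length); the last window sits flush with the edge."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts

def sliding_predict(image, predict_fn, window=512, stride=256):
    """Run predict_fn on overlapping tiles and average the overlapping outputs."""
    h, w = image.shape[:2]
    prob_sum = np.zeros((h, w), dtype=np.float32)
    count = np.zeros((h, w), dtype=np.float32)
    for top in window_starts(h, window, stride):
        for left in window_starts(w, window, stride):
            tile = image[top:top + window, left:left + window]
            th, tw = tile.shape[:2]  # smaller than the window only for small images
            prob_sum[top:top + th, left:left + tw] += predict_fn(tile)
            count[top:top + th, left:left + tw] += 1.0
    return prob_sum / count

# Dummy "model": treats the first channel as the handwriting probability.
def dummy_predict(tile):
    return tile[..., 0] / 255.0

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(19, 23, 3), dtype=np.uint8)
prob = sliding_predict(image, dummy_predict, window=8, stride=4)
mask = prob > 0.5  # 0.5 is an arbitrary cut-off for this sketch
```

With the values from the text, `window_starts(1000, 512, 256)` yields offsets 0, 256 and 488, so every pixel is covered and interior pixels are predicted more than once.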
4. Code walkthrough: reproducing the training
#1. Prepare the ingredients: decompress the competition datasets
!unzip data/data126591/dehw_testB_dataset.zip -d data/ >>/dev/null
!unzip data/data126591/dataset1.zip -d data/ >>/dev/null
!unzip data/data126591/dataset2.zip -d data/ >>/dev/null
#2. Prepare the tools: install the dependencies
!pip install scikit_image -q
!pip install paddleseg -q
#3. Start the first part of the training
!python work/train/train_seg.py
#4. Start the second part of the training
!python work/train/train.py
5. Code walkthrough: prediction
This section contains the code that produces the prediction results. To run it:
1. Put the leaderboard B image data in the /dehw_testB_dataset folder (it is already kept there, so you can run directly);
2. Run 1_predict_segformer.py; the semantic segmentation results for the handwriting will appear in the segoutput folder. Speed: 4s/step;
3. After step 2 finishes, run 2_predictgan.py. Speed: 4s/step;
4. The final submission results appear in the submit folder.
#1. Predict the mask for the full image
!python work/pre_and_submit/1_predict_segformer.py
The semantic segmentation results are generated under the segoutput path, as shown in the figure below:
#3. Repair the content covered by the handwriting
!python work/pre_and_submit/2_predictgan.py
The prediction results appear in the submit folder:
#4. Compress the files and submit them for scoring
%cd submit/
!zip result.zip *.png *.txt
6. Acknowledgements
Thanks again:
Thanks to Baidu for organizing the competition and providing the opportunity and the stage.
Thanks to the team members for their joint efforts: thanks to Yang Libo for putting up with the bad habit of holding meetings at 23:00, thanks to Shen Chen for restarting the service within 5 minutes around the clock, thanks to Zhai Xuekui for sharing the load, and thanks to everyone for the data support and cooperation.
Special thanks:
Jordan2020, for open-sourcing the MTRNet++-based image text erasure solution (leaderboard score 0.55599), from which we benefited a lot.