AAAI2020: Real-time Scene Text Detection with Differentiable Binarization
2022-07-01 19:38:00 【Highlight_ Jin】
Probability map: the original text mask, shrunk.
Threshold map: the region between the text boundary shrunk inward and the same boundary dilated outward, which describes the text border more precisely.
1 Introduction
In recent years, reading text in scene images has become an active research area because of its wide practical applications, such as image and video understanding, visual search, autonomous driving, and assistance for blind people. As a key component of scene text reading, scene text detection, which aims to locate the bounding box or region of each text instance, remains a challenging task because scene text varies in scale and shape and includes horizontal, multi-oriented, and curved text. Segmentation-based scene text detection has recently attracted much attention because its pixel-level predictions can describe text of various shapes. However, most segmentation-based methods require complex post-processing to group the pixel-level predictions into detected text instances, which makes inference considerably slower. Take two recent state-of-the-art scene text detection methods as examples: PSENet (Wang et al., 2019a) proposes progressive scale expansion as post-processing to improve detection accuracy, and Pixel Embedding (Tian et al., 2019) clusters pixels based on the segmentation results, which requires computing feature distances between pixels.
Most existing detection methods use a similar post-processing pipeline, shown by the blue arrows in Figure 2. First, they set a fixed threshold to convert the probability map produced by the segmentation network into a binary image; then they use heuristic techniques, such as pixel clustering, to group pixels into text instances. In contrast, our pipeline (the red arrows in Figure 2) inserts the binarization operation into the segmentation network for joint optimization. In this way, a threshold can be predicted adaptively at every location of the image, which can fully distinguish foreground pixels from background pixels. However, the standard binarization function is not differentiable, so we propose an approximate binarization function, called Differentiable Binarization (DB), that is fully differentiable when trained together with the segmentation network.
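The conventional pipeline described above (fixed threshold, then grouping pixels into instances) can be sketched roughly as follows. This is only an illustration: a 4-connected flood fill stands in for the heuristic grouping step, and the function names, the threshold of 0.5, and the toy probability map are all assumptions, not part of any cited method.

```python
from collections import deque


def binarize(prob_map, t=0.5):
    """Fixed-threshold binarization: 1 where the probability is >= t, else 0."""
    return [[1 if p >= t else 0 for p in row] for row in prob_map]


def group_pixels(binary):
    """Group foreground pixels into instances with a 4-connected flood fill.

    Returns a label map (0 = background, 1..n = instance id) and n.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if binary[i][j] != 1 or labels[i][j] != 0:
                continue
            # Unvisited foreground pixel: start a new instance here.
            count += 1
            labels[i][j] = count
            queue = deque([(i, j)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and binary[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy probability map (made up for illustration).
prob_map = [
    [0.9, 0.8, 0.1, 0.7],
    [0.2, 0.1, 0.1, 0.9],
]
labels, n_instances = group_pixels(binarize(prob_map, t=0.5))
```

The single global threshold `t` is the weak point of this pipeline: it is chosen by hand and applied identically everywhere in the image, which is what the adaptive threshold in DB is meant to replace.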
The main contribution of this paper is the proposed differentiable DB module, which makes binarization end-to-end trainable within a CNN. By combining a simple semantic segmentation network with the proposed DB module, we obtain a robust and fast scene text detector. From the performance evaluation of the DB module, we find that our detector has several prominent advantages over previous state-of-the-art segmentation-based methods:
- Our method achieves consistently better performance on five benchmark datasets of scene text, covering horizontal, multi-oriented, and curved text.
- Our method runs faster than previous leading methods, because DB provides a highly robust binary map that greatly simplifies post-processing.
- DB works well with a lightweight backbone, significantly improving detection performance with a ResNet-18 backbone.
- Because DB can be removed at inference time without sacrificing performance, it adds no extra memory or time cost at test time.
2 Related work
3 Methodology
The architecture of our proposed method is shown in Figure 3. First, the input image is fed into a feature-pyramid backbone. Second, the pyramid features are up-sampled to the same scale and concatenated to produce the feature F. Then F is used to predict both the probability map (P) and the threshold map (T). After that, the approximate binary map (B̂) is computed from P and T, as in Equation 2. During training, supervision is applied to the probability map, the threshold map, and the approximate binary map, where the probability map and the approximate binary map share the same supervision. At inference, bounding boxes can be obtained easily from either the approximate binary map or the probability map through a box formulation module.
3.1 Binarization
Standard binarization. Given a probability map $P \in \mathbb{R}^{H \times W}$ produced by a segmentation network, where H and W are the height and width of the map, it must be converted into a binary map $B \in \mathbb{R}^{H \times W}$ in which pixels with value 1 are considered valid text areas. Usually, this binarization process can be described as follows:
$$B_{i,j} = \begin{cases} 1 & \text{if } P_{i,j} \ge t, \\ 0 & \text{otherwise,} \end{cases} \qquad (1)$$
where t is the predefined threshold and (i, j) denotes a coordinate point in the map.
Differentiable binarization. The standard binarization described in Equation 1 is not differentiable, so it cannot be optimized together with the segmentation network during training. To solve this problem, we propose performing binarization with an approximate step function:
$$\hat{B}_{i,j} = \frac{1}{1 + e^{-k(P_{i,j} - T_{i,j})}} \qquad (2)$$
where $\hat{B}$ is the approximate binary map, T is the adaptive threshold map learned by the network, and k is the amplifying factor. This approximate binarization function behaves similarly to the standard binarization function (see Figure 4) but is differentiable, so it can be optimized together with the segmentation network during training. Differentiable binarization with adaptive thresholds not only helps distinguish text regions from the background but also separates text instances that are closely joined. Some examples are illustrated in Figure 7.
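The two binarization functions can be compared with a small sketch. It applies the hard threshold and the approximate step function of Equation 2 to single pixels and to a toy map; the function names and the toy values are hypothetical, and the default k = 50 is an assumption of this sketch, since the section above does not state a value.

```python
import math


def standard_binarize(p, t):
    """Hard threshold: 1 if p >= t, else 0. Not differentiable at p == t."""
    return 1.0 if p >= t else 0.0


def differentiable_binarize(p, t, k=50.0):
    """Approximate step function of Equation 2: a sigmoid of k * (p - t).

    k = 50 is an assumed amplifying factor for this sketch.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def db_map(prob_map, thresh_map, k=50.0):
    """Apply differentiable binarization pixel by pixel with a per-pixel threshold."""
    return [
        [differentiable_binarize(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy maps (made up): the threshold differs from pixel to pixel.
P = [[0.7, 0.2], [0.5, 0.9]]
T = [[0.3, 0.4], [0.5, 0.6]]
B_hat = db_map(P, T)
```

With a large k the sigmoid is close to 0 or 1 everywhere except in a narrow band around P = T, so it behaves like the hard threshold while still having a well-defined gradient with respect to both P and T.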
3.2 Adaptive threshold
3.3 Deformable convolution
3.4 Label generation
The label generation for the probability map is inspired by PSENet (Wang et al., 2019a). Given a text image, each polygon of its text regions is described by a set of segments:
$$G = \{S_k\}_{k=1}^{n} \qquad (5)$$
where n is the number of vertices, which may differ between datasets, e.g., 4 for the ICDAR 2015 dataset (Karatzas et al., 2015) and 16 for the CTW1500 dataset (Liu et al., 2019a). The positive area is then generated by shrinking the polygon G to $G_s$ using the Vatti clipping algorithm (Vatti 1992). The shrinking offset D is computed from the perimeter L and area A of the original polygon:
$$D = \frac{A(1 - r^2)}{L} \qquad (6)$$
where r is the shrink ratio, set to 0.4 empirically.
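As a worked example of Equation 6, the sketch below computes the shrinking offset D for a polygon given as a list of vertices. It covers only the offset, not the Vatti clipping step itself; the function names and the sample box are hypothetical, and the area is computed with the standard shoelace formula.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths, closing the polygon back to its first vertex."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def shrink_offset(points, r=0.4):
    """Offset D = A * (1 - r^2) / L from Equation 6, with shrink ratio r."""
    return polygon_area(points) * (1.0 - r ** 2) / polygon_perimeter(points)


# A 100 x 20 axis-aligned text box (made up): A = 2000, L = 240.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
d = shrink_offset(box)  # approximately 7.0
```

Because D scales with A / L, long thin text lines get a small offset relative to their length, so the shrunk polygon does not collapse.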
Labels for the threshold map are generated by a similar procedure. First, the text polygon G is dilated with the same offset D to $G_d$. We regard the gap between $G_s$ and $G_d$ as the border of the text region, where the label of the threshold map is generated by computing the distance to the closest segment in G.
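The distance computation behind the threshold-map label might look like the sketch below. The point-to-segment distance is standard geometry; the final normalization (1 on the boundary of G, falling linearly to 0 at distance D) is an assumption of this sketch and is not specified in the text above, and all function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a and b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polygon(p, polygon):
    """Distance from p to the closest segment of the polygon G."""
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )


def threshold_label(p, polygon, d):
    """Assumed normalization: 1 on the boundary of G, 0 at distance >= d."""
    return max(0.0, 1.0 - distance_to_polygon(p, polygon) / d)


# The same made-up 100 x 20 box, with offset D = 7.0.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
on_border = threshold_label((50, 0), box, d=7.0)     # on the boundary of G
halfway_in = threshold_label((50, 3.5), box, d=7.0)  # 3.5 px inside G
far_inside = threshold_label((50, 10), box, d=7.0)   # deeper than D inside G
```

Under this normalization, pixels farther than D from the boundary of G get label 0, so only the band between $G_s$ and $G_d$ carries a non-zero label.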
3.5 Optimization
4 Experiments
5 Conclusion