AAAI2020: Real-time Scene Text Detection with Differentiable Binarization
2022-07-01 19:38:00 【Highlight_ Jin】
Probability map: the original text mask after shrinking.
Threshold map: the region between the inward-shrunk and outward-dilated text boundary (their set difference), which describes the border of the text more precisely.
1 Introduction
In recent years, reading text in scene images has become an active research field, driven by its wide range of practical applications such as image/video understanding, visual search, autonomous driving, and assistance for blind people. As a key component of scene text reading, scene text detection, which aims to locate the bounding box or region of each text instance, remains a challenging task because scene text comes in varied scales and shapes, including horizontal, multi-oriented, and curved text. Segmentation-based scene text detection has recently attracted a lot of attention because its pixel-level predictions can describe text of various shapes. However, most segmentation-based methods require complex post-processing to group the pixel-level predictions into detected text instances, which makes inference considerably time-consuming. Take two recent state-of-the-art scene text detection methods as examples: PSENet (Wang et al., 2019a) proposes progressive scale expansion as post-processing to improve detection accuracy; Pixel Embedding (Tian et al., 2019) clusters pixels based on the segmentation results, which requires computing feature distances between pixels.
Most existing detection methods use a similar post-processing pipeline, as shown in Figure 2 (following the blue arrows). First, they set a fixed threshold to convert the probability map produced by the segmentation network into a binary image; then they use heuristic techniques such as pixel clustering to group pixels into text instances. In contrast, our pipeline (following the red arrows in Figure 2) inserts the binarization operation into the segmentation network for joint optimization. In this way, the threshold at every location of the image can be predicted adaptively, which fully distinguishes foreground pixels from background pixels. However, the standard binarization function is not differentiable, so we propose an approximate binarization function, called Differentiable Binarization (DB), which is fully differentiable when trained together with the segmentation network.
The main contribution of this paper is the proposed differentiable DB module, which makes the binarization process end-to-end trainable within a CNN. By combining a simple semantic segmentation network with the proposed DB module, we obtain a robust and fast scene text detector. From the performance evaluation of the DB module, we find that our detector has several prominent advantages over previous state-of-the-art segmentation-based methods.
- Our method achieves consistently better performance on five benchmark datasets of scene text, covering horizontal, multi-oriented, and curved text.
- Our method runs much faster than the previous leading methods, because DB provides a highly robust binary map that greatly simplifies post-processing.
- DB works quite well with a lightweight backbone, significantly improving detection performance with a ResNet-18 backbone.
- Because DB can be removed at inference time without sacrificing performance, it adds no extra memory or time cost at test time.
2 Related work
3 Methodology
The architecture of our proposed method is shown in Figure 3. First, the input image is fed into a feature-pyramid backbone. Second, the pyramid features are upsampled to the same scale and concatenated to produce the feature F. Then, F is used to predict both the probability map (P) and the threshold map (T). After that, the approximate binary map (B̂) is computed from P and T. During training, supervision is applied to the probability map, the threshold map, and the approximate binary map, where the probability map and the approximate binary map share the same supervision. At inference, bounding boxes can be obtained easily from either the approximate binary map or the probability map by a box formation module.
3.1 Binarization
Standard binarization. Given a probability map $P \in \mathbb{R}^{H \times W}$ produced by a segmentation network, where H and W are the height and width of the map, it must be converted into a binary map $B \in \mathbb{R}^{H \times W}$, in which pixels with value 1 are considered valid text areas. Usually, this binarization process can be described as follows:
$B_{i,j} = \begin{cases} 1 & \text{if } P_{i,j} \ge t, \\ 0 & \text{otherwise,} \end{cases}$ (1)
where t is the predefined threshold and (i, j) denotes a coordinate point in the map.
Differentiable binarization. The standard binarization described in Equation 1 is not differentiable, so it cannot be optimized together with the segmentation network during training. To solve this problem, we propose to perform binarization with an approximate step function:
$\hat{B}_{i,j} = \frac{1}{1 + e^{-k(P_{i,j} - T_{i,j})}}$ (2)
where $\hat{B}$ is the approximate binary map, T is the adaptive threshold map learned by the network, and k is the amplifying factor. This approximate binarization function behaves similarly to the standard binarization function (see Figure 4), but it is differentiable and can therefore be optimized together with the segmentation network during training. Differentiable binarization with adaptive thresholds not only helps distinguish text regions from the background, but can also separate text instances that are closely joined. Some examples are shown in Figure 7.
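To make the contrast between Equation 1 and Equation 2 concrete, here is a minimal pure-Python sketch. The function names are hypothetical, and the default k = 50 is an assumed value chosen for illustration, since this post does not state the amplifying factor that was used.

```python
import math


def standard_binarize(p, t=0.5):
    """Equation 1: hard threshold; the output jumps from 0 to 1 at p == t."""
    return 1.0 if p >= t else 0.0


def differentiable_binarize(p, t, k=50.0):
    """Equation 2: a steep sigmoid of k * (p - t).

    k = 50 is an assumption for this sketch, not a value taken from the post.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def db_map(prob_map, thresh_map, k=50.0):
    """Apply Equation 2 element-wise to two equally sized 2-D lists."""
    return [
        [differentiable_binarize(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


if __name__ == "__main__":
    prob = [[0.9, 0.1], [0.5, 0.4]]
    thresh = [[0.3, 0.3], [0.5, 0.6]]
    for row in db_map(prob, thresh):
        print([round(v, 4) for v in row])
```

Away from the threshold the output saturates near 0 or 1, much like the hard step, while at p = t it is exactly 0.5 and the slope is finite, which is what allows gradients to flow into both P and T.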
3.2 Adaptive threshold
3.3 Deformable convolution
3.4 Label generation
The label generation for the probability map is inspired by PSENet (Wang et al., 2019a). Given a text image, each polygon of its text regions is described by a set of segments:
$G = \{S_k\}_{k=1}^{n}$ (5)
where n is the number of vertices, which may differ across datasets, e.g., 4 for the ICDAR 2015 dataset (Karatzas et al., 2015) and 16 for the CTW1500 dataset (Liu et al., 2019a). The positive area is then generated by shrinking the polygon G to $G_s$ using the Vatti clipping algorithm (Vati 1992). The shrinking offset D is computed from the perimeter L and area A of the original polygon:
$D = \frac{A(1 - r^2)}{L}$ (6)
where r is the shrink ratio, set to 0.4 empirically.
Labels for the threshold map are generated with a similar procedure. First, the text polygon G is dilated to $G_d$ with the same offset D. We regard the gap between $G_s$ and $G_d$ as the border of the text region, where the label of the threshold map is generated by computing the distance to the closest segment in G.
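As a worked example of Equation 6, the sketch below computes the shrinking offset D for a polygon given as a list of (x, y) vertices. The helper names are hypothetical; the area uses the shoelace formula, and the actual polygon shrinking with Vatti clipping is left to a clipping library and is not shown.

```python
import math


def polygon_area(points):
    """Area A of a simple polygon via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(points):
    """Perimeter L: the sum of the edge lengths, closing the polygon."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def shrink_offset(points, r=0.4):
    """Equation 6: D = A * (1 - r^2) / L, with shrink ratio r."""
    return polygon_area(points) * (1.0 - r ** 2) / polygon_perimeter(points)


if __name__ == "__main__":
    # A 100 x 20 text box: A = 2000, L = 240, so D is about 7 pixels.
    box = [(0, 0), (100, 0), (100, 20), (0, 20)]
    print(round(shrink_offset(box), 3))
```

Because D scales with A / L, long thin text lines get a small offset, so the shrunk region does not collapse for short text heights.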
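The distance to the closest segment of G can be sketched as follows in pure Python. The function names are hypothetical, and the final normalization in threshold_label (1 on the polygon boundary, falling linearly to 0 at distance D) is an assumption made for illustration; the text above only says that the label is derived from this distance.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polygon(p, polygon):
    """Distance from p to the closest edge of the polygon G."""
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )


def threshold_label(p, polygon, offset):
    """Assumed normalization: 1 on the boundary of G, 0 at distance >= offset."""
    return max(0.0, 1.0 - distance_to_polygon(p, polygon) / offset)


if __name__ == "__main__":
    box = [(0, 0), (100, 0), (100, 20), (0, 20)]
    for point in [(50, 0), (50, 3.5), (50, 10)]:
        print(point, round(threshold_label(point, box, offset=7.0), 3))
```

With this normalization, pixels farther than D from the boundary of G get label 0, which roughly matches the band between the shrunk and dilated polygons.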
3.5 Optimization
4 Experiments
5 Conclusion