Text recognition: SVTR paper interpretation
2022-06-30 21:05:00 【‘Atlas’】
Paper: 《SVTR: Scene Text Recognition with a Single Visual Model》
github: https://github.com/PaddlePaddle/PaddleOCR
Problem addressed
A conventional text recognition model consists of two parts: a visual model for feature extraction and a sequence model for text transcription.
Problem:
This hybrid design is accurate, but it is complex and inefficient.
Solution:
The authors propose SVTR, which uses only a visual model and drops the sequence model entirely:
1. Decompose the text image into small patches;
2. Hierarchical stages repeatedly apply mixing, merging and/or combining; global and local mixing blocks perceive inter-character and intra-character patterns.
SVTR-L achieves high accuracy on both English and Chinese recognition while remaining fast.
Algorithm
The overall structure of SVTR is shown in Figure 2.
The process is as follows:
1. The input text image of size $H \times W \times 3$ passes through the patch embedding module and is converted into $\frac{H}{4} \times \frac{W}{4}$ patches of dimension $D_0$;
2. Three stages perform feature extraction; each stage consists of a series of mixing blocks followed by a merging or combining operation;
Local and global mixing blocks extract local stroke features and dependencies between character components;
With this backbone, character component features and their dependencies at different distances and scales are represented; the output feature has size $1 \times \frac{W}{4} \times D_3$ and is denoted $C$;
3. Finally, an FC layer produces the character sequence.
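The shape changes in the steps above can be traced with a few lines of Python. This is only a sketch: the function name `svtr_shape_trace` is made up for illustration, the channel widths (64, 128, 256, 192) are placeholder values rather than numbers stated in this article, and the rule that each merging step halves the height is taken from the Merging section.

```python
def svtr_shape_trace(height, width, dims=(64, 128, 256), out_dim=192):
    """Trace (H, W, C) feature-map shapes through an SVTR-like pipeline.

    dims and out_dim stand for D_0, D_1, D_2 and D_3; the defaults are
    illustrative placeholders, not values taken from the article.
    """
    d0, d1, d2 = dims
    shapes = [("input", (height, width, 3))]
    # Patch embedding: height and width are both reduced by a factor of 4.
    h, w = height // 4, width // 4
    shapes.append(("patch_embedding", (h, w, d0)))
    # Stage 1: mixing blocks keep the shape, merging halves the height only.
    h //= 2
    shapes.append(("stage1_merging", (h, w, d1)))
    # Stage 2: same pattern as stage 1.
    h //= 2
    shapes.append(("stage2_merging", (h, w, d2)))
    # Stage 3: mixing blocks, then combining pools the height down to 1.
    shapes.append(("stage3_combining", (1, w, out_dim)))
    return shapes


if __name__ == "__main__":
    for name, shape in svtr_shape_trace(32, 100):
        print(f"{name:18s} {shape}")
```

For a 32 × 100 input the width stays at W/4 = 25 from the patch embedding onwards, so the final feature $C$ has 25 positions, one per predicted character slot.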
Progressive overlapping patch embedding
Instead of the kernel=4, stride=4 convolution used in ViT, the authors use two consecutive kernel=3, stride=2 convolutions, as shown in Figure 3. This adds some computation but benefits feature fusion; see the ablation experiment in Section 3.3 of the paper.
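A quick calculation shows that both embeddings end at the same H/4 × W/4 resolution; the difference is that with stride 2 and kernel 3, neighbouring windows overlap. The sketch below uses the standard convolution output-size formula. The padding of 1 for the 3×3 convolutions is an assumption (the article does not state it), and the function names are illustrative.

```python
def conv_out(size, kernel, stride, padding=0):
    """Standard output-size formula for a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def vit_style_embedding(h, w):
    """Single non-overlapping convolution: kernel=4, stride=4, no padding."""
    return conv_out(h, 4, 4), conv_out(w, 4, 4)


def progressive_embedding(h, w, padding=1):
    """Two overlapping convolutions: kernel=3, stride=2 (padding=1 is assumed)."""
    for _ in range(2):
        h = conv_out(h, 3, 2, padding)
        w = conv_out(w, 3, 2, padding)
    return h, w


if __name__ == "__main__":
    print(vit_style_embedding(32, 100))
    print(progressive_embedding(32, 100))
```

Under these assumptions a 32 × 100 image gives an 8 × 25 grid either way, going through 16 × 50 as the intermediate size in the progressive case.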
Mixing block
The mixing blocks are shown in Figure 4.
Local features: encode the morphological features of a character and the correlations between its different parts;
Global features: capture the relationships between different characters, and between text and non-text patches;
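One way to picture the difference between the two kinds of mixing is as an attention mask over the patch grid: global mixing lets every patch attend to every other patch, while local mixing restricts each patch to a small neighbourhood. The sketch below is an illustration only; the helper `mixing_mask` and the 3 × 3 window are hypothetical and not taken from the article.

```python
def mixing_mask(h, w, window=None):
    """Build an (h*w) x (h*w) boolean attention mask for an h x w patch grid.

    mask[i][j] is True when patch i may attend to patch j.
    window=None       -> global mixing: every patch sees every patch.
    window=(wh, ww)   -> local mixing: only patches inside a wh x ww
                         neighbourhood centred on patch i (odd sizes assumed).
    """
    n = h * w
    if window is None:
        return [[True] * n for _ in range(n)]
    rh, rw = window[0] // 2, window[1] // 2
    mask = []
    for i in range(n):
        yi, xi = divmod(i, w)  # row and column of the querying patch
        row = []
        for j in range(n):
            yj, xj = divmod(j, w)  # row and column of the attended patch
            row.append(abs(yi - yj) <= rh and abs(xi - xj) <= rw)
        mask.append(row)
    return mask


if __name__ == "__main__":
    local = mixing_mask(2, 5, window=(3, 3))
    global_ = mixing_mask(2, 5)
    print(sum(local[0]), sum(global_[0]))
```

On a 2 × 5 grid with a 3 × 3 window, the corner patch sees 4 patches locally and all 10 globally, which matches the intent: local mixing looks at stroke-level detail, global mixing at relationships across the whole image.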
Merging
To reduce computation and remove redundant representations, a Merging operation is proposed: a kernel=3 convolution with stride (2, 1) downsamples the height by a factor of 2 while leaving the width unchanged, since most text is horizontal. The channel dimension is increased at the same time to compensate for the information loss.
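A rough calculation shows why halving the height saves computation. The sketch below applies the convolution output-size formula with stride (2, 1) and counts query-key pairs for global attention, which grows with the square of the number of patches. The padding of 1 and the function names are assumptions made for illustration.

```python
def merged_size(h, w, kernel=3, stride=(2, 1), padding=1):
    """Output grid after a Merging convolution (padding=1 is assumed)."""
    out_h = (h + 2 * padding - kernel) // stride[0] + 1
    out_w = (w + 2 * padding - kernel) // stride[1] + 1
    return out_h, out_w


def attention_pairs(h, w):
    """Number of query-key pairs for global attention over an h x w grid."""
    tokens = h * w
    return tokens * tokens


if __name__ == "__main__":
    h, w = 8, 25  # patch grid for a 32 x 100 input
    for stage in range(3):
        print(f"stage {stage + 1}: grid {h}x{w}, pairs {attention_pairs(h, w)}")
        h, w = merged_size(h, w)
```

Starting from an 8 × 25 grid, the pair count drops from 40000 to 10000 to 2500 across the three stages, while the width (and so the number of character slots) stays at 25.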
Combining & Prediction
Combining
The height dimension is first pooled to 1, followed by a fully connected layer, a nonlinear activation layer, and a dropout layer;
Prediction
A linear classifier with $N$ nodes generates a sequence of length $\frac{W}{4}$. Ideally, patches belonging to the same character are transcribed into repeated characters, and patches without text are transcribed into a blank symbol. $N$ is set to 37 for English and 6625 for Chinese;
The maximum prediction length is 25 for the English model and 40 for the Chinese model.
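Turning such a per-position prediction into text means collapsing repeated characters and dropping blanks, which is the usual greedy CTC-style decoding. The sketch below assumes, for illustration, that the 37 English classes are one blank plus 10 digits and 26 lowercase letters, with the blank at index 0; the article does not spell out this layout, and the helper names are hypothetical.

```python
import string

BLANK = 0  # assumed index of the blank class
CHARSET = string.digits + string.ascii_lowercase  # 36 symbols (assumed layout)
N_CLASSES = len(CHARSET) + 1  # 37, matching N for English


def one_hot(idx, n=N_CLASSES):
    """Helper for the demo: a score vector that peaks at class idx."""
    return [1.0 if k == idx else 0.0 for k in range(n)]


def greedy_decode(scores):
    """scores: list of W/4 score vectors, each of length N_CLASSES."""
    # Pick the best class at every position.
    ids = [max(range(len(step)), key=step.__getitem__) for step in scores]
    chars = []
    prev = BLANK
    for idx in ids:
        # Keep a character only if it is not blank and not a repeat.
        if idx != BLANK and idx != prev:
            chars.append(CHARSET[idx - 1])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    c, a, t = (CHARSET.index(ch) + 1 for ch in "cat")
    sequence = [c, c, BLANK, a, a, BLANK, BLANK, t]
    print(greedy_decode([one_hot(i) for i in sequence]))
```

Note that a genuinely doubled letter (as in "ll") survives only when a blank separates the two runs, which is why the blank class is needed in addition to the characters themselves.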
Structural variants
SVTR has several hyperparameters: the channel depth and number of heads in each stage, the number of mixing blocks, and how many of those blocks are local versus global. Varying them yields SVTR-T (Tiny), SVTR-S (Small), SVTR-B (Base) and SVTR-L (Large), as shown in Table 1.
Experiments
IC13: the ICDAR 2013 dataset, regular text.
IC15: the ICDAR 2015 dataset, irregular text.
Patch embedding ablation experiment
As shown in Table 2 (left), the progressive embedding scheme outperforms the default embedding by 0.75% and 2.8%, with the larger gain on irregular text;
Merging ablation experiment
As shown in Table 2 (right), the network with progressively decreasing resolution not only reduces computation compared with the fixed-resolution network, but also improves accuracy.
Mixing block permutation ablation experiment
As shown in Table 3:
1. Every strategy brings some improvement, thanks to more comprehensive perception of character features;
2. L6G6 performs best, improving IC13 by 1.9% and IC15 by 6.6%;
3. Swapping the order so that the global mixing blocks come first does not work as well, possibly because those blocks then repeatedly attend to local features;
Comparison with SOTA
Figure 5 shows the relationship between accuracy, parameter count and speed for each model;
Table 4 compares the performance of the various methods;
SVTR offers a good overall trade-off between inference time and accuracy.
Conclusion
This paper presents SVTR, a visual model for image text recognition. It extracts multi-grained character features that describe both local strokes and the dependencies between characters at multiple scales, which is why SVTR performs well.