Scene Text Recognition Recommendations
Everything about Scene Text Recognition
SOTA • Papers • Datasets • Code
Contents
1.Papers
All papers can be found here
up to (2021-12-8)
up to (2021-12-3)
up to (2021-11-25)
2.Datasets
2.1 Synthetic Datasets
| Dataset | Description | BaiduNetdisk link |
| --- | --- | --- |
| SynthText | About 6 million synthetic text instances, rendered into natural scene images. | Scene text datasets (extraction code: emco) |
| MJSynth | About 9 million synthetic word images generated from a lexicon of 90k common English words. Words are rendered onto natural images with random transformations. | Scene text datasets (extraction code: emco) |
2.2 Benchmarks
| Dataset | Description | BaiduNetdisk link |
| --- | --- | --- |
| IIIT5k-Words (IIIT5K) | 3000 test images, taken from street scenes and from originally digital images. | Scene text datasets (extraction code: emco) |
| Street View Text (SVT) | 647 test images. Some images are severely corrupted by noise, blur, and low resolution. | Scene text datasets (extraction code: emco) |
| StreetViewText-Perspective (SVT-P) | 639 test images. Designed specifically to evaluate recognition of perspective-distorted text. It is built from the original SVT dataset by selecting images at the same addresses on Google Street View but from different view angles, so most text instances are heavily distorted by the non-frontal view angle. | Scene text datasets (extraction code: emco) |
| ICDAR 2003 (IC03) | 867 test images. | Scene text datasets (extraction code: mfir) |
| ICDAR 2013 (IC13) | 1015 test images. | Scene text datasets (extraction code: emco) |
| ICDAR 2015 (IC15) | 2077 test images. Because the images were captured with Google Glass without ensuring image quality, most of the text is very small, blurred, and multi-oriented. | Scene text datasets (extraction code: emco) |
| CUTE80 (CUTE) | 288 test images. Focuses on curved text recognition. Most images in CUTE have a complex background, perspective distortion, and poor resolution. | Scene text datasets (extraction code: emco) |
3.Public Code
3.1. Frameworks
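PaddleOCR (Baidu)
PaddlePaddle/PaddleOCR
Features (excerpted from PaddleOCR):
Built with PaddlePaddle, Baidu's in-house deep learning framework
PP-OCR series of high-quality pretrained models with accurate recognition results
Ultra-lightweight PP-OCRv2 series: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
Ultra-lightweight PP-OCR mobile series: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
General PP-OCR server series: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
Supports recognition of mixed Chinese, English, and digits, as well as vertical text and long text
Supports multilingual recognition: Korean, Japanese, German, French
Rich, easy-to-use OCR tooling
Semi-automatic data annotation tool PPOCRLabel: supports fast, efficient data annotation
Data synthesis tool Style-Text: batch-synthesizes large numbers of images similar to the target scenario
Document analysis with PP-Structure: layout analysis and table recognition
Supports user-defined training and provides a rich set of inference and deployment options
Supports quick installation via pip
Runs on Linux, Windows, macOS, and other systems
Supported algorithms (recognition):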
CRNN
Rosetta
STAR-Net
RARE
SRN
NRTR
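MMOCR (SenseTime)
open-mmlab/mmocr
Features (excerpted from MMOCR):
MMOCR is an open-source toolbox based on PyTorch and mmdetection, focused on text detection, text recognition, and the corresponding downstream tasks such as key information extraction. It is part of the OpenMMLab project.
The toolbox supports not only text detection and text recognition but also their downstream tasks, such as key information extraction.
Supported algorithms (recognition):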
CRNN (TPAMI'2016)
NRTR (ICDAR'2019)
RobustScanner (ECCV'2020)
SAR (AAAI'2019)
SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)
SegOCR (Manuscript'2021)
Deep Text Recognition Benchmark (ClovaAI)
3.2. Algorithms
CRNN
ASTER
MORANv2
4.SOTA
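Regular datasets: IIIT, SVT, IC13(857), IC13(1015). Irregular datasets: IC15(1811), IC15(2077), SVTP, CUTE.

| Model | Year | IIIT | SVT | IC13(857) | IC13(1015) | IC15(1811) | IC15(2077) | SVTP | CUTE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CRNN | 2015 | 78.2 | 80.8 | - | 86.7 | - | - | - | - |
| ASTER(L2R) | 2015 | 92.67 | 91.16 | - | 90.74 | 76.1 | - | 78.76 | 76.39 |
| CombBest | 2019 | 87.9 | 87.5 | 93.6 | 92.3 | 77.6 | 71.8 | 79.2 | 74 |
| ESIR | 2019 | 93.3 | 90.2 | - | 91.3 | - | 76.9 | 79.6 | 83.3 |
| SE-ASTER | 2020 | 93.8 | 89.6 | - | 92.8 | 80 | | 81.4 | 83.6 |
| DAN | 2020 | 94.3 | 89.2 | - | 93.9 | - | 74.5 | 80 | 84.4 |
| RobustScanner | 2020 | 95.3 | 88.1 | - | 94.8 | - | 77.1 | 79.5 | 90.3 |
| AutoSTR | 2020 | 94.7 | 90.9 | - | 94.2 | 81.8 | - | 81.7 | - |
| Yang et al. | 2020 | 94.7 | 88.9 | - | 93.2 | 79.5 | 77.1 | 80.9 | 85.4 |
| SATRN | 2020 | 92.8 | 91.3 | - | 94.1 | - | 79 | 86.5 | 87.8 |
| SRN | 2020 | 94.8 | 91.5 | 95.5 | - | 82.7 | - | 85.1 | 87.8 |
| GA-SPIN | 2021 | 95.2 | 90.9 | - | 94.8 | 82.8 | 79.5 | 83.2 | 87.5 |
| PREN2D | 2021 | 95.6 | 94 | 96.4 | - | 83 | - | 87.6 | 91.7 |
| Bhunia et al. | 2021 | 95.2 | 92.2 | - | 95.5 | - | 84 | 85.7 | 89.7 |
| VisionLAN | 2021 | 95.8 | 91.7 | 95.7 | - | 83.7 | - | 86 | 88.5 |
| ABINet | 2021 | 96.2 | 93.5 | 97.4 | - | 86.0 | - | 89.3 | 89.2 |
| MATRN | 2021 | 96.7 | 94.9 | 97.9 | 95.8 | 86.6 | 82.9 | 90.5 | 94.1 |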
Baek's Reimplementation Version
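The SOTA table does not state its metric. Assuming the numbers are word-level accuracy in percent, as is common for these benchmarks, the sketch below shows one way such a score could be computed. The case-insensitive, alphanumeric-only normalization is an assumption borrowed from common evaluation practice, and individual papers may use a different protocol. The function names `normalize` and `word_accuracy` are hypothetical and do not come from any of the listed repositories.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and keep only the characters a-z and 0-9 (assumed protocol)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def word_accuracy(predictions, ground_truths) -> float:
    """Return the percentage of samples whose normalized prediction matches the label."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        return 0.0  # avoid division by zero on an empty evaluation set
    correct = sum(
        normalize(pred) == normalize(label)
        for pred, label in zip(predictions, ground_truths)
    )
    return 100.0 * correct / len(ground_truths)


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from any benchmark.
    preds = ["Hello", "w0rld", "STREET", "cafe"]
    labels = ["hello", "world", "Street", "cafe!"]
    print(f"word accuracy: {word_accuracy(preds, labels):.1f}%")
```

A prediction counts as correct only if the whole word matches after normalization, so a single wrong character (`w0rld` vs. `world`) makes the sample incorrect.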