Overview of OCR character recognition methods
2022-07-06 03:04:00 【GoAI】
Series on OCR character recognition technology:
1️⃣ Part 1 of the OCR character recognition technology series: Summary of OCR character recognition technology (1)
2️⃣ Part 2 of the OCR character recognition technology series: Summary of OCR character recognition technology (2)
3️⃣ Part 3 of the OCR character recognition technology series: Summary of OCR character recognition technology (3)
4️⃣ Part 4 of the OCR character recognition technology series: Summary of OCR character recognition technology (4)
5️⃣ Part 5 of the OCR character recognition technology series: Summary of OCR character recognition technology (5)
Summary of classic papers in the field of OCR character recognition:
1️⃣ Detailed walkthrough of classic OCR text recognition papers
Abstract: Character recognition converts massive amounts of unstructured data into structured data and thereby supports a wide range of AI applications. It is a branch of computer vision whose task is to recognize the text content in an image; the input is generally a text region cropped from the image using the text boxes produced by text detection. In recent years, deep-learning-based character recognition models have achieved good results. They require no hand-crafted feature engineering, can recognize text in complex scenes, and outperform traditional character recognition methods, so they have gradually become the mainstream approach in both research and application. This paper gives an overview of deep-learning-based character recognition technology, classifies and summarizes the classic mainstream algorithms, and discusses future development and research trends in the field.
Keywords: OCR, deep learning, scene recognition, CTC
1. Introduction
Writing is an indispensable carrier of human thought, knowledge, and cultural heritage, and an important medium through which people exchange information and perceive the world. In the Internet era, large volumes of bills, forms, and certificates are produced every day; character recognition technology is needed to extract and enter their contents, and digitizing this data matters greatly for enterprise productivity. Optical character recognition (OCR) uses optical and computer techniques to detect the text in an image and then recognize its content; it is a branch of computer vision [1]. The concept was first proposed in 1929 by Tauschek, who applied for a patent in Germany, and after nearly a century of development OCR has achieved good results in many fields. Text recognition has many application scenarios, including document recognition, road sign recognition, license plate recognition, and industrial serial number recognition, and it is widely used in industries such as healthcare and education. It integrates theory from digital image processing, computer graphics, and artificial intelligence, and has increasingly become a focus of attention in AI. Traditional OCR can reach high accuracy in the specific setting of printed text, but in complex scenes its accuracy drops because of illumination, deformation, blur, and similar problems.
In recent years, as deep learning has become the leading trend in machine learning and artificial intelligence, deep-learning-based character recognition models have achieved good results. They require no hand-crafted feature engineering, can recognize text in complex scenes, and outperform traditional character recognition methods, so they have gradually become the mainstream approach in both research and application.
2. Research status of character recognition based on deep learning
Traditional OCR treats the recognition of characters in a text line as a multi-label task. As shown in Figure 1, the pipeline consists of image preprocessing (grayscale conversion of color images, binarization, skew-angle detection, correction, and so on), layout analysis (straight-line detection, tilt detection), character localization and segmentation, character recognition, layout recovery, post-processing, and proofreading. Traditional recognition generally first locates the text region, rectifies skewed text, and segments it into single characters. It then recognizes each character with a classification model on hand-crafted features such as HOG or on CNN features, and finally applies semantic error correction based on a statistical language model (such as a hidden Markov model, HMM) or on rules, that is, language-rule post-processing. Traditional OCR algorithms rely mainly on image processing techniques (such as projection, dilation, and rotation) and statistical machine learning (AdaBoost, SVM) to extract the text content of an image [2]. They are mainly applied to simple document images with a uniform background color and high resolution.
Figure 1. Traditional character recognition pipeline
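The binarization and projection steps of the traditional pipeline can be illustrated with a small sketch. This is only one simple way to segment characters, assuming dark text on a light background and characters separated by at least one empty column; the function names and the threshold of 128 are illustrative choices, not taken from the article.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def vertical_projection(binary):
    """Count the ink pixels in every column of a binary image."""
    return [sum(col) for col in zip(*binary)]


def segment_columns(profile, min_ink=1):
    """Return (start, end) column spans, end exclusive, where ink is present."""
    spans = []
    start = None
    for x, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = x                      # a character begins
        elif count < min_ink and start is not None:
            spans.append((start, x))       # a character ends at an empty column
            start = None
    if start is not None:                  # character touching the right edge
        spans.append((start, len(profile)))
    return spans


# Toy 3x8 image: a two-column "character" and a one-column "character".
row = [255, 0, 0, 255, 255, 0, 255, 255]
image = [row[:], row[:], row[:]]
profile = vertical_projection(binarize(image))
char_spans = segment_columns(profile)
```

Each span in `char_spans` would then be cropped and passed to a character classifier. Touching or overlapping characters break this assumption, which is one reason such pipelines struggle in complex scenes.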
In complex scenes, the accuracy of traditional OCR struggles to meet practical needs, and deep-learning-based OCR performs better than traditional methods [3]. Deep-learning-based character recognition relies on the capacity of the model instead of traditional manual methods: it automatically detects the category and location of text, and then recognizes the text content from the corresponding regions. Most existing deep learning recognition algorithms comprise image rectification, feature extraction, and sequence prediction; the recognition process is shown in Figure 2.
Figure 2. Mainstream deep learning character recognition pipeline
Since Hinton proposed the concept of "deep learning" in 2006 [4], deep learning methods have been widely adopted across industries. With the continued development of artificial intelligence in recent years, deep-learning-based character recognition has gradually become the mainstream technology in applications and has achieved good results in the field [5]. The development of deep-learning-based character recognition is shown in Figure 3.
Figure 3. Development of character recognition technology
There are currently two mainstream families of deep learning character recognition algorithms: CTC-based [6] and attention-based. They differ mainly in the decoding stage. The former passes the sequence produced by the encoder to CTC for decoding; the latter feeds the sequence into a recurrent neural network module for step-by-step decoding. In addition, there are segmentation-based, Transformer-based, and end-to-end character recognition methods.
2.1 CTC-based algorithms
Connectionist temporal classification (CTC) is usually used in the prediction stage. By accumulating conditional probabilities, CTC converts the features output by a CNN or RNN into a character sequence. In text recognition it solves the alignment problem for sequential text, that is, it ensures that the predicted text sequence matches the ground-truth text sequence in content and length.
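To make the alignment idea concrete, here is a minimal sketch of CTC greedy (best-path) decoding: take the most likely class in each frame, merge consecutive repeats, and drop the blank symbol. The alphabet, the blank index 0, and the toy probabilities are assumptions made for illustration.

```python
def ctc_greedy_decode(probs, alphabet, blank=0):
    """Best-path CTC decoding.

    probs[t][k] is the probability of class k at frame t;
    alphabet[k] is the character for class k (the entry at `blank` is unused).
    """
    # 1. Pick the most likely class in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    # 2. Merge consecutive repeats, then 3. remove blanks.
    decoded = []
    prev = None
    for idx in best_path:
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


alphabet = ["-", "a", "b"]            # index 0 is the CTC blank
frames = [
    [0.1, 0.8, 0.1],   # a
    [0.1, 0.7, 0.2],   # a (repeat, merged)
    [0.9, 0.05, 0.05], # blank (separates the two a's)
    [0.2, 0.7, 0.1],   # a
    [0.1, 0.2, 0.7],   # b
    [0.1, 0.1, 0.8],   # b (repeat, merged)
]
text = ctc_greedy_decode(frames, alphabet)
```

Six frames collapse to the three-character string `aab`; the blank between the first two runs of `a` is what lets CTC output a doubled letter.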
As a classic character recognition algorithm, CRNN [7] was proposed by Bai Xiang's team in 2016. It combines a convolutional neural network, a recurrent neural network, and the CTC loss function to solve image-based sequence recognition, in particular scene text recognition. As shown in Figure 4, the CRNN model introduces a bidirectional LSTM (Long Short-Term Memory) [8] to strengthen context modeling and uses the CTC loss to achieve end-to-end recognition of variable-length sequences. Training needs only word-level labels and the input images, and CRNN has become one of the most popular frameworks in character recognition.
Figure 4. CRNN network structure
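The CTC loss used to train such a model is the negative log of the total probability of all frame-level paths that collapse to the label. The following is a sketch of the standard CTC forward recursion in plain Python, checked against brute-force path enumeration on a toy input; it is not the CRNN authors' code, and the probabilities and the blank index 0 are made up for illustration.

```python
import itertools
import math


def ctc_label_probability(probs, label, blank=0):
    """Sum the probability of every path that collapses to `label` (forward algorithm)."""
    ext = [blank]                      # label interleaved with blanks: b, l1, b, l2, b
    for c in label:
        ext += [c, blank]
    T, S = len(probs), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1][s]                    # stay on the same symbol
            if s >= 1:
                total += alpha[t - 1][s - 1]           # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[t - 1][s - 2]           # skip a blank between different chars
            alpha[t][s] = total * probs[t][ext[s]]
    result = alpha[T - 1][S - 1]
    if S > 1:
        result += alpha[T - 1][S - 2]
    return result


def collapse(path, blank=0):
    """Merge repeats and drop blanks, as CTC decoding does."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def brute_force_probability(probs, label, blank=0):
    """Reference value: enumerate every path (only feasible for tiny inputs)."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path, blank) == list(label):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


toy_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.5, 0.25, 0.25]]
p_forward = ctc_label_probability(toy_probs, [1, 2])
ctc_loss = -math.log(p_forward)
```

Because the sum runs over all alignments, training needs only the label string, which matches the observation above that word-level labels are sufficient.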
Given CRNN's good results in character recognition, later work improved on its basic structure. Facebook proposed Rosetta [9], an improved CTC-based algorithm built on CRNN; the model consists of a fully convolutional network and CTC, and it performs well on English datasets. In addition, Gao et al. [10] replaced the LSTM with CNN convolutions, which gives fewer parameters and improves performance while balancing accuracy. Both algorithms work well on regular text, but because of limitations in network design, such methods struggle with irregular text that is curved or rotated [11]. To address this, researchers have proposed a series of improved algorithms based on these two kinds of methods [12][13].
2.2 Attention-based methods
Irregular scene text recognition is a main research direction in text recognition. As a mainstream approach, attention-based methods can recognize irregular text, which is often not horizontal and may be curved, occluded, or blurred [14]. Attention-based recognition mainly uses an encoder-decoder network structure. The input image passes through a convolutional neural network, the resulting features are processed as a sequence by a recurrent neural network (RNN), and larger weights are assigned to the target data and related data, so that the decoder's "attention" concentrates on the target, captures its details, and yields a reasonable vector representation even for a long input sequence. Before attention-based methods became widespread, the RARE algorithm [15] proposed a rectification approach for irregular text. It is a robust text recognizer with automatic rectification, and its network has two main parts: a spatial transformer network (STN) and a sequence-to-sequence (Sequence2Sequence) recognition network. An irregular text image passes through the STN rectification module, where a thin-plate-spline (TPS) transformation maps it to a horizontal image. This transformation can correct curved and perspective-distorted text to some extent; the rectified image is then sent to the sequence recognition network for decoding.
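The weighting step of an attention decoder can be sketched in a few lines. The sketch below scores each encoder state against the current decoder state, normalizes the scores with a softmax, and forms a context vector as the weighted sum. Dot-product scoring is used here purely for brevity and is an assumption; the recognizers cited in this section may use other scoring functions, and all vectors are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query, encoder_states):
    """One decoding step: return (attention weights, context vector)."""
    # Similarity between the decoder state and every encoder position.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context = weighted sum of encoder states; high-weight positions dominate.
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # three image positions
decoder_state = [0.0, 3.0]                               # current query
weights, context = attend(decoder_state, encoder_states)
focus = max(range(len(weights)), key=weights.__getitem__)
```

Here the second position receives the largest weight, so the context vector is dominated by its features; this is the sense in which the decoder "attends" to the region of the character it is currently emitting.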
After rectification-based methods emerged, R2AM [16] was the first algorithm to introduce attention into text recognition. It first extracts features from the input image with recursive convolutional layers, then decodes the output characters with an RNN that uses implicitly learned character-level language statistics. Attention is introduced during decoding to perform soft feature selection, which makes better use of the image features and is closer to human intuition. Rectification-based methods transfer well: besides the attention-based RARE described above, STAR-Net [17] applies the rectification module to a CTC-based algorithm and improves noticeably over the original CRNN. Shi et al. [18] proposed an attention-based encoder-decoder framework for recognizing text. As shown in Figure 5, the algorithm extracts features with convolutional layers and feeds them into a bidirectional recurrent neural network, so it can learn from the training data the character-level language model implicit in the strings; it handles regular text recognition.
Combining a text rectification module with attention, Bai Xiang's team [18] proposed ASTER, a now-classic text recognition model. As shown in Figure 6, the algorithm adopts an encoder-decoder framework: it first introduces an STN rectification network to preprocess the text, and then uses attention to align features with label information. The rectification network and the recognition network are integrated into a single network trained end to end, which has been widely used for irregular scene text recognition. ASTER performs well on irregular scene text; however, rectification-based methods are often limited by the geometric characteristics of characters, and the model is more easily affected by background noise.
Figure 6. ASTER network structure
To overcome these problems, Luo et al. [19] proposed the multi-object rectified attention network (MORAN). As shown in Figure 7, it consists of a multi-object rectification network and an attention-based sequence recognition network. The rectification network works at the pixel level and is not subject to geometric constraints, so the transformation is more flexible and handles irregular text recognition effectively.
Figure 6. MORAN network structure
Many later algorithms explored and extended the attention approach. For example, SAR [20] extends 1D attention to 2D attention. RARE, mentioned above for its rectification module, is also attention-based, and experiments show that attention-based methods improve accuracy over CTC-based methods. Cheng et al. [21] proposed the focusing attention network (FAN). On low-resolution or complex images, attention-based methods perform poorly, mainly because the attention network cannot accurately align the attention center for each character with the center of the corresponding target region. The focusing network detects and corrects the attention center, effectively solving this attention drift problem.
In summary, although CRNN+CTC achieves good results on long text, it can only handle one-dimensional sequence recognition, and when text lines are strongly deformed the recognition quality of CTC suffers considerably. Seq2Seq+Attention can in principle handle two-dimensional sequence recognition, but it is constrained by the limits of RNNs on long sequences, and the serial decoding of Seq2Seq leads to poor accuracy and efficiency on long text [22]. To overcome these problems, in 2019 Jin Lianwen's team [23] proposed the aggregation cross-entropy (ACE) sequence recognition algorithm, which is based on a cross-entropy loss. As shown in Figure 7, ACE decodes differently from CTC and attention. Its supervision signal is effectively a form of weak supervision: it ignores the order of the characters in the label, carries no sequence information, and attends only to the number of occurrences of each character. At low complexity it achieves results comparable to mainstream recognition techniques. The ACE loss compares favorably with the CTC loss, can be used for multi-line text recognition, and addresses the problems of the two approaches above from a different angle.
Figure 7. ACE algorithm structure
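The count-based supervision can be sketched as follows. This follows the formulation commonly given for the aggregation cross-entropy loss (per-class probabilities summed over time, normalized by the number of frames, and compared with normalized character counts, with unused frames assigned to a blank class); treat the details, including the blank index 0 and the small clamp inside the logarithm, as assumptions of this sketch and not as the authors' implementation.

```python
import math
from collections import Counter


def ace_loss(probs, label, blank=0):
    """Aggregation cross-entropy between predicted and target character counts.

    probs[t][k]: probability of class k at frame t; label: list of class indices.
    """
    T = len(probs)
    num_classes = len(probs[0])
    counts = Counter(label)
    counts[blank] = T - len(label)          # frames not used by characters are blank
    loss = 0.0
    for k in range(num_classes):
        aggregated = sum(frame[k] for frame in probs) / T   # normalized predicted count
        target = counts.get(k, 0) / T                        # normalized true count
        if target > 0:
            loss -= target * math.log(max(aggregated, 1e-12))
    return loss


# Four frames whose per-class totals match the label counts exactly.
count_probs = [[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 0, 0]]
loss_ab = ace_loss(count_probs, [1, 2])
loss_ba = ace_loss(count_probs, [2, 1])
```

`loss_ab` and `loss_ba` are identical, which shows directly that the loss ignores character order and sees only how often each character occurs.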
In 2020, Hu et al. [24] proposed GTC, a new fused text recognition algorithm that combines attention and CTC. It uses attention to guide the alignment of CTC, which effectively addresses the CRNN network's inability to focus on local regions. The GTC model feeds the extracted features into both a CTC decoder and an attention guidance module. It also adds a graph convolutional network (GCN) to improve the model's expressive power, and its experimental results are better than those of the methods above.
Summary of attention-based methods
2.3 Segmentation-based methods
Segmentation-based methods treat each character of a text line as an independent unit; compared with recognizing an entire rectified text line, recognizing a single character is easier [25]. These methods try to locate each character in the input text image and apply a character classifier to obtain the result, reducing a complex global problem to local ones. They work well on irregular text, but they require character-level annotation, which makes data acquisition harder. Lyu et al. [26] proposed an instance segmentation model for word recognition whose recognition part uses an FCN-based method. Reference [27] considers text recognition from a two-dimensional perspective and designs a character attention FCN; when text is curved or severely distorted, this method gives better localization results for both regular and irregular text.
In 2022, Jin Lianwen et al. [28] proposed a new segmentation-free end-to-end text recognition algorithm. It adopts a fully neural network model and trains a weakly supervised learning module jointly with contextual information. The new weakly supervised learning method lets the network train with transcription annotations only, avoiding character segmentation annotations. On handwritten text datasets it outperforms the segmentation-free recognition algorithms discussed above; its structure is shown in Figure 8.
Figure 8. Structure of the segmentation-free recognition algorithm
2.4 Transformer-based methods
With the rapid development of the Transformer, work in classification and detection has validated its effectiveness on visual tasks. For example, in regular text recognition, CNNs are limited in modeling long-range dependencies; the Transformer structure addresses exactly this problem. It can attend to global information in the feature extractor and can replace an additional LSTM context modeling module.
Yu D. et al. proposed SRN [29], a new end-to-end trainable framework, in 2020. As shown in Figure 9, SRN consists of four parts: a backbone network, a parallel visual attention module (PVAM), a global semantic reasoning module (GSRM), and a visual-semantic fusion decoder (VSFD). It uses the reading order as the query, which makes the computation independent of the time step, and outputs the aligned visual features of all time steps in parallel. SRN uses the Transformer encoder as its semantic module and fuses the visual and semantic information of the image, giving good recognition results on occluded, blurred, and other irregular text. NRTR [30] proposes encoding and decoding the input image with a complete Transformer structure, using only simple convolutional layers for feature extraction, and verifies the effectiveness of the Transformer structure for text recognition.
Figure 9. SRN algorithm structure
2.5 End-to-end recognition methods
End-to-end recognition methods share information between text detection and recognition and optimize the two jointly, so overall inference is faster than with cascaded methods. A model trained end to end can learn richer image features; a single network takes one image as input and outputs detection and recognition results at the same time, which saves time. STN-OCR [31] integrates detection and recognition in one network and performs end-to-end text recognition. It is trained in a semi-supervised way, needs no text location annotation, and the whole system is trained end to end. FOTS [32] is an end-to-end method that localizes text quickly and uses the RoIRotate module to connect detection and recognition, making recognition fast and effective. Mask TextSpotter [33] uses a simple, smooth end-to-end learning procedure and obtains accurate detection and recognition through semantic segmentation; it also handles irregularly shaped text instances (for example, curved text) better than earlier methods. ABCNet [34] is an end-to-end scene text detection and recognition network that, for the first time, adaptively fits arbitrarily shaped text with parameterized Bezier curves at negligible computational cost. Its BezierAlign layer extracts convolutional features accurately, which significantly improves recognition accuracy. It is smoother and faster at detecting multi-oriented and multi-scale text and can recognize text in real time; its structure is shown in Figure 10.
Figure 10. ABCNet algorithm structure
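The parameterized Bezier curve behind this kind of text boundary fitting can be evaluated with the Bernstein polynomial form. The sketch below evaluates a curve of any degree from its control points; the four control points are invented for illustration (a cubic curve, one boundary of a curved text instance), and nothing here reproduces ABCNet's own code or its BezierAlign layer.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] using Bernstein polynomials."""
    n = len(control_points) - 1            # curve degree (3 for a cubic)
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        basis = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        x += basis * px
        y += basis * py
    return (x, y)


def sample_curve(control_points, num_samples):
    """Sample num_samples (>= 2) evenly spaced parameter values along the curve."""
    return [
        bezier_point(control_points, i / (num_samples - 1))
        for i in range(num_samples)
    ]


# Hypothetical upper boundary of a curved word: four control points.
upper_boundary = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
boundary_points = sample_curve(upper_boundary, 5)
midpoint = bezier_point(upper_boundary, 0.5)
```

A handful of control points is enough to describe a smooth curved boundary, which is why this parameterization adds little computation compared with predicting a dense polygon.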
Based on the deep learning methods for regular text, irregular text, and end-to-end recognition described above, Table 1 summarizes the mainstream character recognition methods and the representative papers in each category.
Table 1. Summary of mainstream character recognition methods
Algorithm category | Main idea | Representative papers
Traditional algorithms | Sliding window, character extraction, dynamic programming |
CTC | CTC-based methods; no sequence alignment needed, fast recognition | CRNN, Rosetta
Attention | Attention-based methods; suited to irregular text | RARE, DAN, PREN
CTC+Attention | Fuses the ideas of CTC and attention | GTC
Transformer | Transformer-based methods | SRN, NRTR, Master, ABINet
Rectification | A rectification module learns the text boundary and straightens it to horizontal | RARE, ASTER, SAR
Segmentation | Segmentation-based methods; locate character positions, then classify | TextScanner, Mask TextSpotter
3. Character recognition datasets and evaluation metrics
The task of text recognition is to recognize the text content in an image; the input is generally a text region cropped from the image using the text boxes produced by text detection [35]. Depending on the shape of the text to be recognized, text recognition datasets are generally divided into regular and irregular text, as shown in Figure 11.
Figure 11. Classification of regular and irregular text datasets
Recognition algorithms are generally compared on these two classes of public datasets, which is the split commonly used for English evaluation sets. Regular text recognition mainly covers printed fonts, scanned text, and similar settings in which the text is assumed to be roughly horizontal; representative datasets include IC13 [36], SVT [37], and IIIT5K [38]. Irregular text occurs in natural scenes, where curvature, orientation, and deformation vary greatly; the text is often not horizontal and may be curved, occluded, or blurred. Representative datasets include IC15 [39], COCO-Text [40], SVTP [41], and CUTE [42]. Choosing a recognition method that matches the dataset is very important: each dataset calls for different OCR processing, and each method has datasets it suits. By acquisition method, character datasets fall into three categories: character images collected in natural environments, handwritten character images, and character images synthesized by computer in different fonts. Chinese datasets include ICDAR2019-LSVT [43], ICDAR2019-ReCTS [44], CTW [45], and ShopSign [46]; synthetic text datasets include Synth90K [47], SynthText [48], and SynthAdd [49].
This paper collects the details of common character recognition datasets; the common Chinese and English text datasets are listed in Table 2.
4. Development and research trends of character recognition technology
Character recognition based on deep learning is now relatively mature and widely used in education and healthcare. However, the current shortage of open-source datasets limits further improvement of recognition algorithms [50]. At the same time, expectations for recognition quality in complex scenes keep rising. Future research trends in character recognition are mainly reflected in the following aspects:
(1) Complex scene character recognition
Deep learning has natural advantages in character recognition, and the problems it can solve are becoming more complex, but open problems remain. For example, detection performance on dense text and irregular text is still far below that on horizontal text. Handwritten scenes are especially important, such as handwritten mathematical formula recognition and minority language recognition [51][52]. In addition, how to detect and recognize text in natural scenes, complex scenes (such as deformed or overlapping characters), and multilingual scenes (one image containing several scripts at once), how to solve character localization and preprocessing, and how to improve recognition accuracy are all hot research directions for the future [53].
(2) Zero-shot and few-shot learning (Zero-shot [54] / Few-shot [55])
Combining zero-shot or few-shot learning with contextual semantic information during character recognition is one of the hot research trends for the technology. This is especially relevant to ancient book recognition: with few or no samples of the target characters in the training set, the model combines various kinds of auxiliary information and fuses the visual model with contextual semantics, so that training on samples of some categories (such as simplified characters) enables recognition of samples from new categories (such as traditional characters), letting the machine recognize unseen characters.
(3) Large scale dataset and character set annotation
Datasets are the key to improving character recognition algorithms and directly affect the final recognition quality. The field currently lacks open-source character datasets. On the one hand, enterprises treat their business data as private and cannot publish it; on the other hand, academic character recognition datasets are constrained by labor and technical conditions, so they remain small. In the future, more processed large-scale text datasets need to be open-sourced. Meanwhile, data can be expanded with data augmentation algorithms, and generative adversarial networks (GANs) [56] can generate images in multiple fonts, improving the performance and generalization ability of recognition models.
5. Conclusion
This paper gives an overview of deep-learning-based character recognition technology, classifies and summarizes the classic mainstream algorithms, and lists the ideas and contributions of the classic papers. First, it introduces CTC-based, attention-based, Transformer-based, and segmentation-based recognition methods and summarizes end-to-end algorithms. Finally, it discusses development and research trends in the field. Traditional OCR has by now solved most simple scenarios with good results, but in some complex scenes its accuracy struggles to meet practical needs. Deep-learning-based OCR performs better than traditional methods; with the continued development of artificial intelligence in recent years, it has gradually become the mainstream technology in applications and has achieved good results in the field. Its future development will gradually extend to more numerous and more complex scenarios and combine work from multiple disciplines, making the application of character recognition within artificial intelligence more mature. Data is the driving force of deep learning and plays a crucial role, so open-sourcing large-scale datasets is also a priority for improving recognition quality at this stage. In addition, practical applications of character recognition need more lightweight models that improve training speed while maintaining a given accuracy, so that systems can be deployed to servers quickly.
References
- Liu Weihong. The current situation and significance of digitalization of Chinese ancient books [J]. Books and Information, 2009(4): 134-137.
- Jiang Wei, Zhang Chongsheng, Yin Xucheng. Overview of scene text detection based on deep learning [J]. Journal of Electronics, 2019, 47(5): 1152.
- Liu Yanju, Yi Xinhai, Li Yange, Zhang Huiyu, Liu Yanzhong. An overview of the application of deep learning in scene character recognition technology [J]. Computer Engineering and Applications, 2022, 58(04): 52-63.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
- Jin Lianwen, Zhong Zhuoyao, et al. A review of the application of deep learning in handwritten Chinese character recognition [J]. Journal of Automation, 2016, 42(8): 1125-1141.
- Graves, Alex, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML, 2006.
- Shi B, Bai X, Yao C. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2016, 39(11):2298-2304.
- F. A. Gers, N. N. Schraudolph, and J. Schmidhuber. Learning precise timing with LSTM recurrent networks. JMLR,3:115–143, 2002.
- Fedor Borisyuk, Albert Gordo, and Viswanath Sivakumar. Rosetta: Large scale system for text detection and recognition in images. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 71–79. ACM, 2018.
- Gao, Yunze, et al. Reading scene text with attention convolutional sequence modeling. arXiv preprint arXiv:1709.04303, 2017.
- Wang Deqing, Wushouer, et al. A survey of scene character recognition technology [J]. Computer Engineering and Applications, 2020, 56(18): 1-15.
- Niu Xiaoming, Bi Kejun, Tang Jun. Overview of graphic recognition technology [J]. Chinese Journal of Stereology and Image Analysis, 2019, 25(3): 241-256.
- Ganji. Research on algorithms of handwritten character recognition and related problems [D]. University of Chinese Academy of Sciences (Computer science and technology, Chinese Academy of Sciences Technical College), 2021. DOI:10.44196/d.cnki.gjskx.2021.000003.
- Shi B, Wang X, Lyu P, et al. Robust scene text recognition with automatic rectification[C]// Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 4168-4176.
- Lee C Y , Osindero S . Recursive Recurrent Nets with Attention Modeling for OCR in the Wild[C]// IEEE Conference on Computer Vision & Pattern Recognition. IEEE, 2016.
- Star-Net Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. Spatial transformer networks. In Advances in neural information processing systems, pages 2017–2025, 2015.
- Baoguang Shi, Mingkun Yang, XingGang Wang, Pengyuan Lyu, Xiang Bai, and Cong Yao. ASTER: An attentional scene text recognizer with flexible rectification. IEEE transactions on pattern analysis and machine intelligence, 41(9):2035–2048, 2019.
- F. A. Gers, N. N. Schraudolph, and J. Schmidhuber. Learning precise timing with LSTM recurrent networks. JMLR,3:115–143, 2002.
- Luo C, Jin L, Sun Z. A multi-object rectified attention network for scene text recognition[J]. Pattern Recognition, 2019, 90: 109-118.
- Li H, Wang P, Shen C, et al. Show, attend and read: A simple and strong baseline for irregular text recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 8610-8617.
- Cheng Z, Bai F, Xu Y, et al. Focusing attention: Towards accurate text recognition in natural images[C]//Proceedings of the IEEE international conference on computer vision. 2017: 5076-5084.
- Lee C Y , Osindero S . Recursive Recurrent Nets with Attention Modeling for OCR in the Wild[C]// IEEE Conference on Computer Vision & Pattern Recognition. IEEE, 2016.
- Xie Z, Huang Y, Zhu Y, et al. Aggregation cross-entropy for sequence recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 6538-6547.
- Hu W, Cai X, Hou J, et al. Gtc: Guided training of ctc towards efficient and accurate scene text recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 34(07): 11005-11012.
- Raja S, Mondal A, Jawahar C V. Table structure recognition using top-down and bottom-up cues[C]//European Conference on Computer Vision. Springer, Cham, 2020: 70-86.
- P. Lyu, C. Yao, W. Wu, S. Yan, and X. Bai. Multi-oriented scene text detection via corner localization and region segmentation. In Proc. CVPR, pages 7553–7563, 2018.
- Liao M, Zhang J, Wan Z, et al. Scene text recognition from two-dimensional perspective[C] //Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 8714-8721.
- Peng D, Jin L, Ma W, et al. Recognition of Handwritten Chinese Text by Segmentation: A Segment-annotation-free Approach[J]. IEEE Transactions on Multimedia, 2022.
- Yu D, Li X, Zhang C, et al. Towards accurate scene text recognition with semantic reasoning networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 12113-12122.
- Sheng F, Chen Z, Xu B. NRTR: A no-recurrence sequence-to-sequence model for scene text recognition[C]//2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019: 781-786.
- Bartz C, Yang H, Meinel C. STN-OCR: A single neural network for text detection and text recognition[J]. arXiv preprint arXiv:1707.08831, 2017.
- Liu X, Liang D, Yan S, et al. Fots: Fast oriented text spotting with a unified network[C] //Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 5676-5685.
- Lyu P, Liao M, Yao C, et al. Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 67-83.
- Liu Y, Chen H, Shen C, et al. ABCNet: Real-time scene text spotting with adaptive bezier-curve network[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 9809-9818.
- Zhang Huaping, Huang Chen. Research on character recognition technology [J]. Internet of Things Technologies, 2018, 8(08): 17-19. DOI: 10.16667/j.issn.2095-1302.2018.08.002.
- Karatzas D, Shafait F, Uchida S, et al. ICDAR 2013 robust reading competition[C]//2013 12th International Conference on Document Analysis and Recognition. IEEE, 2013: 1484-1493.
- Wang K, Babenko B, Belongie S. End-to-end scene text recognition[C]//2011 International conference on computer vision. IEEE, 2011: 1457-1464.
- Yasmeen U, Shah J H, Khan M A, et al. Text detection and classification from low quality natural images[J]. 2020.
- Karatzas D, Gomez-Bigorda L, Nicolaou A, et al. ICDAR 2015 competition on robust reading[C]//2015 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2015: 1156-1160.
- Veit A, Matera T, Neumann L, et al. COCO-Text: Dataset and benchmark for text detection and recognition in natural images[J]. arXiv preprint arXiv:1601.07140, 2016.
- Phan T Q, Shivakumara P, Tian S, et al. Recognizing text with perspective distortion in natural scenes[C]//Proceedings of the IEEE International Conference on Computer Vision. 2013: 569-576.
- Risnumawan A, Shivakumara P, Chan C S, et al. A robust arbitrary text detection system for natural scene images[J]. Expert Systems with Applications, 2014, 41(18): 8027-8048.
- Sun Y, Ni Z, Chng C K, et al. ICDAR 2019 competition on large-scale street view text with partial labeling-RRC-LSVT[C]//2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019: 1557-1562.
- Zhang R, Zhou Y, Jiang Q, et al. ICDAR 2019 robust reading challenge on reading Chinese text on signboard[C]//2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019: 1577-1581.
- Yuan T L, Zhu Z, Xu K, et al. A large Chinese text dataset in the wild[J]. Journal of Computer Science and Technology, 2019, 34(3): 509-521.
- Zhang C, Ding W, Peng G, et al. Street view text recognition with deep learning for urban scene understanding in intelligent transportation systems[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(7): 4727-4743.
- Jaderberg M, Simonyan K, Vedaldi A, et al. Synthetic data and artificial neural networks for natural scene text recognition[J]. arXiv preprint arXiv:1406.2227, 2014.
- Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2315-2324.
- Li H, Wang P, Shen C, et al. Show, attend and read: A simple and strong baseline for irregular text recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 8610-8617.
- Bai Wenrong. Research on online handwritten Mongolian character recognition technology [D]. Inner Mongolia University, 2007.
- Lu Yan. Application of AI character recognition technology in the digitization of urban planning archives [J]. Scientific and Technological Innovation, 2022(14): 54-57.
- Chen X, Jin L, Zhu Y, et al. Text recognition in the wild: A survey[J]. ACM Computing Surveys (CSUR), 2021, 54(2): 1-35.
- Romera-Paredes B, Torr P. An embarrassingly simple approach to zero-shot learning[C] //International conference on machine learning. PMLR, 2015: 2152-2161.
- Sung F, Yang Y, Zhang L, et al. Learning to compare: Relation network for few-shot learning[C] //Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 1199-1208.
- Karras T, Laine S, Aittala M, et al. Analyzing and improving the image quality of stylegan[C] //Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 8110-8119.
Classification of scene text detection and recognition methods