当前位置:网站首页>Develop your own text recognition application with Tesseract
Develop your own text recognition application with Tesseract
2022-08-04 21:32:00 【ReyZhang】
前言
这次和⼤Chat at home⽂Word recognition related topics.⼤At home, there must be various types of scanning APP 不陌⽣.拿着⼿camera facing any⽂字,directly into the camera⽂Word content is converted into ⼿Onboard editable strings.
完整的 OCR 算法,并不简单.It involves image recognition algorithms,as well as from Fig⽚identified in⽂Words can be converted into our program⽤的⼆进制形式.This also includes different languages⾔how to recognize characters,How to more accurately identify and so on.
设想⼀下,如果作为开发者,让你⾃⼰Develop this from scratch⼀套算法,I'm afraid it will cost a lot⼤⼒⽓.不过好消息是,我们⽣Live in this age of open source,There are many seniors who have already made the way for us.That's what we're going to talk about today, Tesseract
就是这样⼀个开源库,给定任意⼀张图⽚,它可以识别出⾥⾯所有的⽂字内容,并且 API 接⼝使⽤⾮常简单.
Tesseract
Tesseract 是 Google 发布的⼀款 OCR 开源库,它⽀Holds multiple idioms⾔环境,以及运⾏环境,Including us⾥将要介绍的 iOS 环境.今天就⽤It leads⼤家开发⼀one belongs to you⾃⼰的 OCR 应⽤.
1.安装 Tesseract
最简单的⽅式是⽤过 Cocoapods,进⼊你的项⽬根⽬录,输⼊:
pod init
然后,编辑⽣成的 Podfile 配置⽂件:
target 'ocrSamples' do
use_frameworks!
pod 'TesseractOCRiOS'
end
将TesseractOCRiOS加⼊配置列表中,最后输⼊:
pod install
2,.配置⼯作
安装完成后,我们还需要进⾏⼀Simple configuration below,⾸先要在 Build Phases -> Link Binary With Libraries
中配置,项⽬需要的依赖库, CoreImage
, libstdc++
, 以及 TesseractOCRiOS
⾃⾝:
After configuring the dependency library,我们还需要将⽂Word recognition training data
添加进来,What is the training data,TesseractOCRiOS
识别图⽚的时候,It will be recognized according to the rules of this training data⽂字.⽐如中⽂,英⽂等,都有对应的训练数据,This can be understood as deep learning pre-trained models for us.⽤它来进⾏核⼼的识别算法.
Training data is required to follow不同语言
来区分的,Tesseract There is a dedicated page listing all available training data:https://github.com/tesseract-ocr/tessdata/tree/3.04.00
For example, we need to recognize Simplified Chinese here,就可以下载 chi-sim this training data:
Then drag and drop the training data into ⼯程中:
上图中的 chi_sim.traineddata
is the training data we downloaded,这⾥⾯有⼀The point to note is that,这个⽂item must be present testdata
这个⽂件夹中.并且这个⽂folder to start with "Create Folder Reference"
的⽅drag and drop in:
This citation⽤⽅formula and other⼀种 "Create Groups"
的⽅What's the difference?
主要区别在于:
- 使⽤
引⽤⽅式
Drag and drop in⽬录,在最后⽣成 APP 包的时候,Main Bunlde
中是以testdata/chi_sim.traineddata
This path form saves our training data resources - 使⽤
"Create Groups"
⽅式,It will be when it is finally stored忽略⽂Folder name
的,最后存储在Main Bundle
中的⽂件是以/chi_sim.traineddata
stored in this path.
注意: ⽽ TesseractOCRiOS
,By default it will be there testdata/chi_sim.traineddata
This path looks for training data,所以如果使⽤"Create Groups"
⽅drag⼊,will cause luck⾏No training data was found,⽽报错.This detail requires special attention.
最后, 为了让 TesseractOCRiOS
can be shipped correctly⾏,We also need to turn off BitCode
,否则会报编译错误,We need to turn it off in both places,⼀个是⾄⼯程,另外⼀个是 Pods 模块,如下图:
3. 开始编码
Complete the above preparations⼯作后,我们就可以开始编码了,这⾥只给⼤Home Show⽰最精简的代码.⾸First we need to be in the main world⾯上显⽰两个控件,⼀One is our pre-stored band⽂photo of words⽚,另外⼀个是⽤于展⽰识别结果的⽂本框:
override func viewDidLoad() {
super.viewDidLoad()
self.imageView = UIImageView(frame: CGRect(x: 0, y: 80, width: self.view.frame.size.width, height:
self.view.frame.size.width * 0.7))
self.imageView?.image = UIImage(named: "textimg")
self.view.addSubview( self.imageView!)
self.textView = UITextView(frame: CGRect(x: 0, y: labelResult.frame.origin.y + labelResult.frame.size.height + 20, width: self.view.frame.size.width, height: 200))
self.view.addSubview( self.textView!)
}
这⾥⾯We only write out the key code for initialization of the two controls,Other unimportant codes are omitted.Then you can tune⽤ TesseractOCRiOS 来进⾏⽂word recognized:
func recognizeText(image: UIImage) {
if let tesseract= G8Tesseract(language: "chi_sim") {
tesseract. engineMode= .tesseractOnly
tesseract. pageSegmentationMode= .auto
tesseract. image= image
tesseract.recognize()
self.textView?. text= tesseract.recognizedText
}
}
All identification related code is here⾥了.⾸先调⽤ G8Tesseract
进⾏初始化,我们传⼊The name of the training data,这⾥是 "chi_sim"
Represents Simplified Chinese⽂.
engineMode
There are three modes to choose from, tesseractOnly
,tesseractCubeCombined
和 cubeOnly
.我们使⽤第⼀种模式,采⽤训练数据的⽅式. cubeOnly
Consciousness is to make⽤更精准的 cube ⽅式, tesseractCubeCombined
It is a combination of the two modes⽤.
cube
Schema requires additional model data,⼤The example is this:
The picture above is in English⽂的 cube 识别模型.简体中⽂I haven't found the model yet,So we just in this instance⽤到了 tesseractOnly
模式.并且在没有 cube
in the case of model data,我们是不⽤使⽤ tesseractCubeCombined
和 cubeOnly
的,Otherwise it will be because the model data does not exist⽽报错.⼤Home if you can find it⽂的 cube 模型,You are also welcome to stay⾔中反馈,This will make this⽂Word recognition is more accurate.
other tones⽤No need to say more,will be identified image 对象设置给 Tesseract.然后调⽤它的 recognize ⽅法进⾏识别,Finally the result will be identified tesseract.recognizedText 设置给 TextView.
Final luck⾏效果如下:
上⾯的图⽚I took it SwiftCafe ⽹站上⼀篇⽂Chapter photo⽚,from the identification results,还算⽐较准确.
总结
⼤Home from above⾯The code should also feel it, Tesseract Although provided essentially counts⽐较复杂的⽂Word recognition algorithm,But it provides access to developers⼝可以说得上是⾮常简单. OCR ⽂Word recognition as a whole,也可以算得上是 AI 的⼀个应⽤分⽀,在这个 AI ⼤⾏its time,即便掌握⼀些应⽤技术,It can also be very good for us developers⼤to broaden our horizons.发挥你的创意,也许类似Tesseract These components can help you create AI A newcomer of the times⽤.
Tesseract These components can help you create AI A newcomer of the times⽤.
当然, Tesseract ⽬前⾃⾝还有⼀些缺陷,⽐as it isOnly prints are recognized
字体,⽽cannot be accurately identified⼿write font
.But that⼜怎么样呢,It would have been quite complicated OCR 算法,可以让我们⽤⾮Often a small price should be paid⽤起来,还是⼀piece to the developer⾮A very happy thing.
照例,The sample project code in this article has been placed Github 中,You can download it directly if you need it:https://github.com/swiftcafex/Samples/tree/master/ocrSamples.
边栏推荐
猜你喜欢
[2022 Nioke Duo School 5 A Question Don't Starve] DP
数电快速入门(一)(BCD码和三种基本逻辑运算的介绍)
27. Dimensionality reduction
【线性代数02】AX=b的2种解释和矩阵乘法的5种视角
如何根据“前序遍历,中序遍历”,“中序遍历,后序遍历”构建按二叉树
NFT宝典:你需要知道NFT的术语和定义
JdbcTemplate概述和测试
Red team kill-free development practice of simulated confrontation
【PCBA program design】Grip dynamometer program
Altium Designer 19.1.18 - 保护锁定的对象
随机推荐
使用堡塔应用管理器配置laravel队列方法
【2022杭电多校5 1003 Slipper】多个超级源点+最短路
Dotnet using WMI software acquisition system installation
ini怎么使用? C#教程
milvus配置相关
【2022杭电多校5 1012题 Buy Figurines】STL的运用
立方度量(Cubic Metric)
大势所趋之下的nft拍卖,未来艺术品的新赋能
In action: 10 ways to implement delayed tasks, with code!
数电快速入门(一)(BCD码和三种基本逻辑运算的介绍)
mdk5.14无法烧录
SPSS-System Clustering Hand Calculation Practice
dotnet 删除只读文件
dotnet 启动 JIT 多核心编译提升启动性能
【2022牛客多校5 A题 Don‘t Starve】DP
PowerCLi batch configuration of NTP
PMP证书在哪些行业有用?
LayaBox---TypeScript---structure
AI/ML无线通信
Re24:读论文 IOT-Match Explainable Legal Case Matching via Inverse Optimal Transport-based Rationale Ext