With this tool, automated captcha recognition is no longer a problem
2022-06-29 09:34:00 【二 黑】
01 Environment setup
1. On Windows, download the installer exe
http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe
Double-click the exe and click Next through the wizard to complete the Tesseract-OCR installation.



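2. Configure environment variables
Add D:\ProgramFiles\Tesseract-OCR to PATH.
Create a new environment variable TESSDATA_PREFIX with the value
D:\ProgramFiles\Tesseract-OCR\tessdata
This points TESSDATA_PREFIX at the folder that holds the language data files.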
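To confirm the variable took effect, a minimal Java sketch can check that TESSDATA_PREFIX is set and that the folder it names contains eng.traineddata. It assumes, as configured above, that TESSDATA_PREFIX points directly at the tessdata folder; the class and method names (TessdataCheck, hasLanguage) are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: verify that TESSDATA_PREFIX points at a folder containing <lang>.traineddata.
// Assumes TESSDATA_PREFIX is the tessdata folder itself, as set up above.
public class TessdataCheck {

    // True if tessdataDir/<lang>.traineddata exists as a regular file
    static boolean hasLanguage(Path tessdataDir, String lang) {
        return Files.isRegularFile(tessdataDir.resolve(lang + ".traineddata"));
    }

    public static void main(String[] args) {
        String prefix = System.getenv("TESSDATA_PREFIX");
        if (prefix == null || prefix.isEmpty()) {
            System.out.println("TESSDATA_PREFIX is not set");
            return;
        }
        Path dir = Paths.get(prefix);
        System.out.println("eng available: " + hasLanguage(dir, "eng"));
    }
}
```

Open a new CMD window after changing environment variables; already-running consoles and IDEs keep the old values.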
In a CMD window, run the following commands.
Check the version:
C:\Users\18611>tesseract -v
tesseract 4.00.00alpha
leptonica-1.74.1
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20: libtiff 4.0.6 : zlib 1.2.8 :
libwebp 0.4.3 : libopenjp2 2.1.0
List the installed language packs:
C:\Users\18611>tesseract --list-langs
List of available languages (2):
eng
osd
C:\Users\18611>
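The same language check can be scripted from Java. This sketch runs tesseract --list-langs (assuming tesseract is on PATH, as configured above) and parses output of the shape shown above, keeping only single-token lines such as eng and osd. It merges stderr into stdout in case the installed version prints the list there. ListLangs, parseLangs and runListLangs are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: run "tesseract --list-langs" and collect the language codes it reports.
public class ListLangs {

    // Keeps only non-empty lines without whitespace, which drops the
    // "List of available languages (N):" header and any multi-word warnings.
    static List<String> parseLangs(String output) {
        List<String> langs = new ArrayList<>();
        for (String line : output.split("\\R")) {
            String s = line.trim();
            if (s.matches("\\S+")) {
                langs.add(s);
            }
        }
        return langs;
    }

    // Assumes tesseract is on PATH; stderr is merged into stdout.
    static String runListLangs() throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("tesseract", "--list-langs");
        pb.redirectErrorStream(true);
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String> langs = parseLangs(runListLangs());
        System.out.println("Installed languages: " + langs);
        System.out.println("eng installed: " + langs.contains("eng"));
    }
}
```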
02 Recognizing an image from the command line
Recognize the following captcha image:

Use the tesseract command to recognize the text in the image:
C:\Users\18611>cd Desktop
C:\Users\18611\Desktop>tesseract test2.png output
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
C:\Users\18611\Desktop>
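[Syntax]: tesseract imagename outputbase [-l lang] [--psm pagesegmode] [configfile…]
(Tesseract 4 spells the page segmentation option --psm; Tesseract 3.x used -psm.)
imagename is the target image file name, including its extension;
outputbase is the base name of the result file (Tesseract appends .txt, so output produces output.txt);
lang is the language name (the tessdata folder under Tesseract-OCR contains language files such as eng.traineddata). If -l is omitted, eng is used by default.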
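As a sketch of how this syntax maps onto process arguments, the hypothetical helper below builds the argument list for a ProcessBuilder and derives the name of the text file Tesseract writes. It assumes Tesseract 4's --psm spelling; TesseractCommand, buildArgs and resultFile are illustrative names, not part of Tesseract.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: tesseract imagename outputbase [-l lang] [--psm pagesegmode]
public class TesseractCommand {

    // lang and psm are optional; null means "leave it to tesseract's default"
    static List<String> buildArgs(String imageName, String outputBase, String lang, Integer psm) {
        List<String> args = new ArrayList<>();
        args.add("tesseract");
        args.add(imageName);   // must include the extension, e.g. test2.png
        args.add(outputBase);  // tesseract appends ".txt" to this base name
        if (lang != null) {    // omitted -> tesseract defaults to eng
            args.add("-l");
            args.add(lang);
        }
        if (psm != null) {     // Tesseract 4 spelling; 3.x used -psm
            args.add("--psm");
            args.add(String.valueOf(psm));
        }
        return args;
    }

    // The text file tesseract writes for a given outputbase
    static String resultFile(String outputBase) {
        return outputBase + ".txt";
    }

    public static void main(String[] args) {
        // prints: tesseract test2.png output -l eng --psm 7
        System.out.println(String.join(" ", buildArgs("test2.png", "output", "eng", 7)));
        // prints: output.txt
        System.out.println(resultFile("output"));
    }
}
```

The returned list can be passed straight to new ProcessBuilder(list), which avoids quoting problems with paths that contain spaces.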
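03 Recognizing the image automatically from Java
Save the tesseract command as a .bat file with the following content:
// Image path: D:\Tesseract-OCR\test.png; path and base name of the generated txt file: result

The code is as follows: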
package com.mtx.util;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * @ClassName ReadCpacha
 * @Description TODO
 * @Author 彩虹 rainbow QQ3130978832
 * @Date-Time 2022/6/9 13:55
 * @ProjectName MtxPublic
 * @Copyright 北京码同学网络科技有限公司
 **/
public class ReadCpacha {
    // Assumed location of the file written by tesseract.bat; must match the outputbase in the .bat
    private static final String RESULT_FILE = "D:\\Tesseract-OCR\\result.txt";

    public static String readPic() {
        try {
            // Run the .bat via "cmd /c" (without "start") so waitFor() blocks until tesseract.exe
            // has finished, instead of sleeping a fixed 3 seconds and hoping it is done
            Process process = new ProcessBuilder("cmd", "/c", "D:\\Tesseract-OCR\\tesseract.bat")
                    .inheritIO()
                    .start();
            process.waitFor();
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
        // tesseract.exe saves the recognized captcha in result.txt; read that file to get the captcha
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(new File(RESULT_FILE)), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String text;
            while ((text = bufferedReader.readLine()) != null) {
                // Append each line read to the StringBuilder
                sb.append(text);
            }
            // trim() also drops the trailing form feed tesseract writes after the text
            return sb.toString().trim();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) {
        String str = readPic(); // call the helper method as a test
        System.out.println(str);
    }
}
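A possible variation, offered as a sketch rather than the author's method: call tesseract directly through ProcessBuilder and pass stdout as the outputbase, so the text is read from the process output instead of going through a .bat file and result.txt. Assumptions: tesseract is on PATH, the installed version accepts the stdout outputbase, --psm 7 suits the captcha, and the captcha contains only ASCII letters and digits (cleanCaptcha strips everything else). CaptchaOcr, recognize and cleanCaptcha are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch: run tesseract directly and read the recognized text from its stdout.
public class CaptchaOcr {

    // Keeps only ASCII letters and digits (assumes the captcha uses nothing else)
    static String cleanCaptcha(String raw) {
        return raw == null ? "" : raw.replaceAll("[^A-Za-z0-9]", "");
    }

    // Runs "tesseract <image> stdout --psm 7" and returns the cleaned text
    static String recognize(String imagePath) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("tesseract", imagePath, "stdout", "--psm", "7");
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // banner/warnings go to the console
        Process p = pb.start();
        String text;
        try (InputStream in = p.getInputStream()) {
            // Drain stdout before waitFor() so the child never blocks on a full pipe
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            text = new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        return cleanCaptcha(text);
    }

    public static void main(String[] args) throws Exception {
        String image = args.length > 0 ? args[0] : "D:\\Tesseract-OCR\\test.png";
        System.out.println(recognize(image));
    }
}
```

Skipping the intermediate file also removes the need to keep the .bat's output path and the Java RESULT_FILE constant in sync.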
The page segmentation modes accepted by --psm can be listed with tesseract --help-psm:
C:\Users\18611\IdeaProjects\MtxPublic>tesseract --help-psm
Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
C:\Users\18611\IdeaProjects\MtxPublic>
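For a one-line captcha like the one recognized earlier, mode 7 (single text line) or 8 (single word) is often a better fit than the default mode 3. A small hypothetical helper (PsmOption, psmArgs) that validates the mode against the 0-13 range listed above and produces the command-line tokens:

```java
// Sketch: validate a page segmentation mode and turn it into CLI tokens.
public class PsmOption {

    static final int SINGLE_LINE = 7; // "Treat the image as a single text line."
    static final int SINGLE_WORD = 8; // "Treat the image as a single word."

    // Returns {"--psm", "<mode>"}; rejects values outside the listed 0-13 range
    static String[] psmArgs(int mode) {
        if (mode < 0 || mode > 13) {
            throw new IllegalArgumentException("page segmentation mode must be 0-13, got " + mode);
        }
        return new String[] {"--psm", Integer.toString(mode)};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", psmArgs(SINGLE_LINE))); // prints: --psm 7
    }
}
```

Which mode reads a particular captcha best is an empirical question; trying 7 and 8 on a few samples is a cheap way to decide.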