OpenCV Study Notes 5: Document Scanning + OCR Text Recognition
2022-07-01 14:52:00 【Cloudy_to_sunny】
Document scanning
# Import the required packages
import numpy as np
import argparse
import cv2
import matplotlib.pyplot as plt  # Matplotlib expects RGB channel order
Define functions
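# Display helpers
def cv_show(name, img):
    # OpenCV loads images as BGR; swap to RGB before handing them to Matplotlib
    b, g, r = cv2.split(img)
    img_rgb = cv2.merge((r, g, b))
    plt.imshow(img_rgb)
    plt.show()

def cv_show1(name, img):
    # For single-channel images: show with Matplotlib, then in an OpenCV window
    plt.imshow(img)
    plt.show()
    cv2.imshow(name, img)
    cv2.waitKey()
    cv2.destroyAllWindows()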
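The channel swap in cv_show can also be written as a NumPy slice. The snippet below is a minimal sketch on a made-up 1x2 image (the pixel values are assumptions for illustration only); it shows that reversing the last axis gives the same BGR to RGB reordering as split/merge.

```python
import numpy as np

# A 1x2 BGR image: first pixel pure blue, second pixel pure red (made-up data)
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis turns (B, G, R) into (R, G, B)
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the blue pixel, now in RGB order
print(rgb[0, 1])  # the red pixel, now in RGB order
```

The slice returns a view rather than a copy, so it avoids allocating three intermediate channel arrays.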
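def order_points(pts):
    # Four corner points in total
    rect = np.zeros((4, 2), dtype="float32")
    # Output order (indices 0-3): top-left, top-right, bottom-right, bottom-left
    # x + y is smallest at the top-left corner and largest at the bottom-right
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # y - x is smallest at the top-right corner and largest at the bottom-left
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect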
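To see the ordering rule at work, the sketch below runs the same logic on the four contour points that this post prints later in the perspective-transform step. The function is repeated so the snippet runs on its own with NumPy only.

```python
import numpy as np

def order_points(pts):
    # Same logic as above: sort corners into TL, TR, BR, BL
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)              # x + y
    rect[0] = pts[np.argmin(s)]      # top-left
    rect[2] = pts[np.argmax(s)]      # bottom-right
    diff = np.diff(pts, axis=1)      # y - x
    rect[1] = pts[np.argmin(diff)]   # top-right
    rect[3] = pts[np.argmax(diff)]   # bottom-left
    return rect

# Contour points as printed later in this post (resized-image coordinates)
pts = np.array([[465, 110], [113, 137], [147, 375], [474, 323]])
rect = order_points(pts)
print(rect)
```

The sums are 575, 250, 522, 797, so (113, 137) is the top-left and (474, 323) the bottom-right. The differences y - x are -355, 24, 228, -151, so (465, 110) is the top-right and (147, 375) the bottom-left.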
def four_point_transform(image, pts):
    # Order the input points; pts holds the four corner coordinates on the original image
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Compute the output width and height from the longer of each pair of opposite edges
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))
    # Corner positions after the transform
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype="float32")
    # Compute the transform matrix that maps rect onto dst
    M = cv2.getPerspectiveTransform(rect, dst)
    # Apply the transform to get the flattened result
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    # Return the transformed image
    return warped
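What the transform matrix encodes can be sketched without OpenCV. The snippet below computes the output size for the ordered corners from this post and then solves the eight linear equations that define a 3x3 perspective matrix with its bottom-right entry fixed to 1. This is a NumPy illustration of the idea, not OpenCV's implementation; homography_from_4_points and apply_homography are hypothetical helper names.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve the 3x3 perspective matrix H (H[2, 2] = 1) that maps src to dst."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), and likewise for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=np.float64),
                        np.array(b, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Ordered corners (TL, TR, BR, BL) in resized-image coordinates from this post
src = np.array([[113, 137], [465, 110], [474, 323], [147, 375]], dtype=np.float64)
tl, tr, br, bl = src

# Output size: the longer of each pair of opposite edges, truncated to int
max_width = max(int(np.linalg.norm(br - bl)), int(np.linalg.norm(tr - tl)))
max_height = max(int(np.linalg.norm(tr - br)), int(np.linalg.norm(tl - bl)))

dst = np.array([[0, 0],
                [max_width - 1, 0],
                [max_width - 1, max_height - 1],
                [0, max_height - 1]], dtype=np.float64)

H = homography_from_4_points(src, dst)
mapped = apply_homography(H, src)
print(max_width, max_height)
print(np.round(mapped, 3))
```

Each source corner lands on its destination corner, which is the defining property of the matrix that four_point_transform passes to cv2.warpPerspective.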
def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    dim = None
    (h, w) = image.shape[:2]
    if width is None and height is None:
        return image
    if width is None:
        r = height / float(h)
        dim = (int(w * r), height)
    else:
        r = width / float(w)
        dim = (width, int(h * r))
    resized = cv2.resize(image, dim, interpolation=inter)
    return resized
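The size arithmetic inside resize can be checked in isolation. The sketch below is a pure-Python version of just the dimension calculation; scaled_size is a hypothetical helper, and the 3264-pixel width is an assumed value for illustration (the post only prints the height, 2448).

```python
def scaled_size(h, w, width=None, height=None):
    """Return (new_width, new_height) while preserving the aspect ratio."""
    if width is None and height is None:
        return (w, h)
    if width is None:
        r = height / float(h)       # scale factor derived from the target height
        return (int(w * r), height)
    r = width / float(w)            # scale factor derived from the target width
    return (width, int(h * r))

# Height 2448 comes from the post; width 3264 is an assumption
print(scaled_size(2448, 3264, height=500))
```

Note that the tuple is ordered (width, height), which is the order cv2.resize expects, the reverse of image.shape.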
# Read the input image
image = cv2.imread("./images/receipt.jpg")
# The image is resized below, so contour coordinates scale by the same ratio
cv_show("Image", image)
ratio = image.shape[0] / 500.0
print(image.shape[0])
orig = image.copy()

2448
image = resize(orig, height = 500)
cv_show("Image",image)

Edge detection
# Preprocessing
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
edged = cv2.Canny(gray, 75, 200)
# Show the preprocessing result
print("STEP 1: Edge detection")
cv_show("Image", image)
cv_show1("Edged", edged)
STEP 1: Edge detection
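The first preprocessing step above, the BGR to grayscale conversion, is a weighted sum of the three channels. The snippet below is a NumPy sketch using the standard luminance weights 0.299, 0.587 and 0.114; OpenCV's own rounding may differ by one gray level in some cases, and bgr_to_gray is a hypothetical helper name.

```python
import numpy as np

def bgr_to_gray(bgr):
    """Approximate grayscale conversion: Y = 0.299 R + 0.587 G + 0.114 B."""
    b = bgr[..., 0].astype(np.float64)
    g = bgr[..., 1].astype(np.float64)
    r = bgr[..., 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return np.rint(gray).astype(np.uint8)

# Made-up 1x2 BGR image: pure blue and pure white
bgr = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=np.uint8)
gray = bgr_to_gray(bgr)
print(gray)
```

Blue contributes the least to perceived brightness, which is why a pure blue pixel ends up dark in the grayscale image.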


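# Contour detection
# findContours returns 3 values in OpenCV 3.x and 2 values in OpenCV 4.x;
# index -2 is the contour list in both cases
cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5]  # keep the 5 largest contours by area
#cv2.drawContours(image, cnts, -1, (0, 255, 0), 2)
#cv_show("Outline", image)
# Iterate over the contours
for c in cnts:
    # Approximate the contour
    peri = cv2.arcLength(c, True)  # contour perimeter
    # c is the input point set
    # epsilon is the maximum distance from the original contour to its approximation; it controls accuracy
    # True means the contour is closed
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    # Take the first approximation that has exactly 4 points
    if len(approx) == 4:
        screenCnt = approx
        break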
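To get a feel for the size of epsilon, the sketch below computes the closed perimeter of the four-point contour printed later in this post and takes 2% of it, using NumPy only. It assumes the four corners stand in for the full contour, so the real cv2.arcLength value on the raw contour would be somewhat larger.

```python
import numpy as np

# Contour corners as printed later in this post, in traversal order
pts = np.array([[465, 110], [113, 137], [147, 375], [474, 323]], dtype=np.float64)

# Closed perimeter: distance from each point to the next, wrapping around to the first
edges = np.roll(pts, -1, axis=0) - pts
peri = np.linalg.norm(edges, axis=1).sum()

epsilon = 0.02 * peri  # same factor as the approxPolyDP call above
print(round(peri, 2), round(epsilon, 2))
```

With a perimeter of a little over 1100 pixels, epsilon comes out at roughly 23 pixels, so small bumps along the receipt edge are absorbed and the approximation collapses to four corners.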
Get the contour
# Show the result
print("STEP 2: Get the contour")
cv2.drawContours(image, [screenCnt], -1, (0, 255, 0), 2)
cv_show("Outline", image)
STEP 2: Get the contour

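# Perspective transform
# screenCnt was found on the resized image, so scale it back to the original by ratio
warped = four_point_transform(orig, screenCnt.reshape(4, 2) * ratio)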
print(screenCnt.reshape(4, 2))
print(screenCnt.reshape(4, 2).sum(axis = 1))
[[465 110]
[113 137]
[147 375]
[474 323]]
[575 250 522 797]
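The multiplication by ratio maps the contour from the 500-pixel-high working image back to the full-resolution original. The sketch below repeats that arithmetic with the values printed in this post (height 2448 and the four points above), using NumPy only.

```python
import numpy as np

orig_height = 2448             # printed earlier in this post
ratio = orig_height / 500.0    # same formula as the script

# Contour points on the resized image, as printed above
pts_small = np.array([[465, 110], [113, 137], [147, 375], [474, 323]])

# Scale both x and y by the same factor to get original-image coordinates
pts_orig = pts_small * ratio
print(ratio)
print(pts_orig)
```

Detecting the contour on the small image keeps edge detection fast and less noisy, while warping the original preserves the resolution needed for OCR.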
Transform
# Binarize
warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
ref = cv2.threshold(warped, 100, 255, cv2.THRESH_BINARY)[1]
cv2.imwrite('scan.jpg', ref)
# Show the result
print("STEP 3: Transform")
cv_show("Original", resize(orig, height = 650))
cv_show1("Scanned", resize(ref, height = 650))
STEP 3: Transform
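The fixed-threshold binarization above follows a simple rule: pixels strictly greater than the threshold become the maximum value, and everything else becomes 0. The snippet below is a NumPy sketch of that rule on made-up pixel values; threshold_binary is a hypothetical helper, not an OpenCV function.

```python
import numpy as np

def threshold_binary(gray, thresh, maxval=255):
    """Pixels strictly above thresh become maxval; the rest become 0."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

# Made-up grayscale values on both sides of the threshold used above (100)
gray = np.array([[50, 100], [101, 200]], dtype=np.uint8)
binary = threshold_binary(gray, 100)
print(binary)
```

A pixel equal to the threshold maps to 0, which matters when a receipt's background sits right at the cutoff.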


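OCR text recognition
Environment setup
Install tesseract-ocr-w64-setup-v5.0.1.20220118.exe
- https://digi.bib.uni-mannheim.de/tesseract/
- Add the install directory to the PATH environment variable, e.g. E:\Program Files (x86)\Tesseract-OCR
- Run tesseract -v to test the installation
- Run tesseract XXX.png to get the recognition result
- pip install pytesseract
- Open pytesseract.py under Anaconda's Lib\site-packages\pytesseract directory
- Change tesseract_cmd there to the absolute path of the Tesseract executable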
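As an alternative to editing pytesseract.py inside site-packages, tesseract_cmd can be set at runtime from your own script. The sketch below assumes the executable is named tesseract.exe inside the directory mentioned above; that file name and the find_tesseract helper are assumptions for illustration.

```python
import os
import shutil

def find_tesseract(candidates):
    """Return the first usable Tesseract executable path, or None."""
    on_path = shutil.which("tesseract")  # found when the install directory is on PATH
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

# Assumed install location, based on the directory mentioned above
candidates = [r"E:\Program Files (x86)\Tesseract-OCR\tesseract.exe"]
tesseract_path = find_tesseract(candidates)

try:
    import pytesseract
    if tesseract_path:
        # Point pytesseract at the executable without touching the library source
        pytesseract.pytesseract.tesseract_cmd = tesseract_path
except ImportError:
    print("pytesseract is not installed; run: pip install pytesseract")
```

Setting the path in your own code survives a reinstall or upgrade of pytesseract, whereas an edit inside site-packages is overwritten.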
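Code
from PIL import Image
import pytesseract
import cv2
import os
import matplotlib.pyplot as plt  # Matplotlib expects RGB channel order
# Display helpers
def cv_show(name, img):
    # OpenCV loads images as BGR; swap to RGB before handing them to Matplotlib
    b, g, r = cv2.split(img)
    img_rgb = cv2.merge((r, g, b))
    plt.imshow(img_rgb)
    plt.show()

def cv_show1(name, img):
    # For single-channel images: show with Matplotlib, then in an OpenCV window
    plt.imshow(img)
    plt.show()
    cv2.imshow(name, img)
    cv2.waitKey()
    cv2.destroyAllWindows()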
preprocess = 'thresh'  # 'thresh' or 'blur'
image = cv2.imread('scan.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
if preprocess == "thresh":
    # Otsu's method picks the threshold automatically from the histogram
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
if preprocess == "blur":
    gray = cv2.medianBlur(gray, 3)  # median blur
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)
True
text = pytesseract.image_to_string(Image.open(filename))
print(text)
os.remove(filename)
cv_show("Image", image)
cv_show1("Output", gray)
we KK Re KK KK OK OK KK
WHOLE FOODS MARKET - WESTPORT, CT 06880
399 POST RD WEST - (203) 227-6858
365
365
365
365
BROTH
BACON
BACON
BACON
BACUN
LS
LS
LS
LS
CHIC
FLOUR ALMUND
CHKN BRST BNLSS SK
HEAVY CREAM
BALSMC REDUCT
GRND 85/15
BEEF
JUICE
COF CASHEW
L
DOCS PINT ORGANIC
HNY ALMOND BUTTER
xeene TAX
.00
BAL
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
NP
4
4
4
99
.99
.99
mal
7 7 T
mana Ramm
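The thresh branch in the OCR code relies on Otsu's method, which chooses the threshold that maximizes the between-class variance of the two pixel groups. The snippet below is a NumPy sketch of that idea on a made-up two-level image; it illustrates the principle and is not OpenCV's exact implementation, and otsu_threshold is a hypothetical helper name.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance (classes: <= t and > t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                 # pixel count at or below each level
    w1 = hist.sum() - w0                 # pixel count above each level
    cum_sum = np.cumsum(hist * levels)   # running sum of intensities
    valid = (w0 > 0) & (w1 > 0)          # both classes must be non-empty
    mu0 = np.zeros(256)
    mu1 = np.zeros(256)
    mu0[valid] = cum_sum[valid] / w0[valid]
    mu1[valid] = (cum_sum[-1] - cum_sum[valid]) / w1[valid]
    between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
    return int(np.argmax(between))

# Made-up image with dark "ink" (10) and bright "paper" (200)
gray = np.array([[10, 10, 200, 200],
                 [10, 200, 10, 200]], dtype=np.uint8)
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
print(t)
print(binary)
```

Because the threshold adapts to each image's histogram, this branch needs no hand-tuned value like the fixed 100 used when saving scan.jpg.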

