当前位置：网站首页>Oblique document scanning and character recognition (opencv, coordinate transformation analysis)

Oblique document scanning and character recognition (opencv, coordinate transformation analysis)

2022-07-30 20:19:00 【csp_】

项目源码

可在github下载：

https://github.com/chenshunpeng/Doc-scan

图像预处理

首先导入工具包

import numpy as np
import argparse
import cv2

设置命令行参数

# Structural parameters and parse
# we instantiate the ArgumentParser object as ap（实例化）
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", default='./images/pic.jpg'
				, required = False, help = "Path to the image to be scanned")
args = vars(ap.parse_args())

读取输入,对图像进行缩放

图像：

在这里插入图片描述

# 读取输入
image = cv2.imread(args["image"])
# 图像缩放,坐标也会相同变化
ratio = image.shape[0] / 500.0
orig = image.copy()
image = resize(orig, height = 500)

这里的ratio是4.896

在这里插入图片描述

在此给出resize函数定义：

def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
	dim = None
	(h, w) = image.shape[:2]
	if width is None and height is None:
		return image
	if width is None:
		r = height / float(h)
		dim = (int(w * r), height)
	else:
		r = width / float(w)
		dim = (width, int(h * r))
	resized = cv2.resize(image, dim, interpolation=inter)
	return resized

对图像进行预处理,And show preprocessing results

# 转灰度图
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# 高斯滤波,去除噪音点
gray = cv2.GaussianBlur(gray, (5, 5), 0)
# 边缘检测
edged = cv2.Canny(gray, 75, 200)

# 展示预处理结果
print("STEP 1: 边缘检测")
cv2.imshow("Image", image)
cv2.imshow("Edged", edged)
cv2.waitKey(0)
cv2.destroyAllWindows()

结果：

在这里插入图片描述

对于一些陌生的知识（如Canny边缘检测）,可以在w3cschool看OpenCVDocument translation study（https://www.w3cschool.cn/opencv）

在这里插入图片描述

Recommend a certain English level directly to seeOpenCV官方文档（https://docs.opencv.org/3.4/index.html）：

在这里插入图片描述

To obtain the optimal profile

轮廓提取

# cv.findContours（）函数中有三个参数,第一个是源图像,第二个是轮廓检索模式,第三个是轮廓近似方法.
# 它输出轮廓和层次结构.Contours是图像中所有轮廓的Python列表.The outline of each individual are object boundary point （x,y） 坐标的 Numpy 数组
cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[0]
# To outline according to the area from big to small order,取前5个（先从小到大排序,之后取reverse翻转）
cnts = sorted(cnts, key = cv2.contourArea, reverse = True)[:5]

在 Opencv4中,cv2.findContour()仅返回 2 个值：contours, hierachy,所以在这里用[0]得到第一个值

Specific reading this web site：https://docs.opencv.org/4.x/d4/d73/tutorial_py_contours_begin.html

Before ordering493个轮廓：

在这里插入图片描述

排序后取面积最大的5个轮廓（The outside area of the framework of a certain maximum,So the circumference also larger,Or more points）：

在这里插入图片描述

筛选轮廓

To traverse the outline,To find the optimal profile

# 对screenCnt初始化,不然可能会有警告
screenCnt = [[0,0], [255,0], [255,255], [0,255]]

for c in cnts:
	# 计算轮廓近似
	peri = cv2.arcLength(c, True)

	# cv2.approxPolyDP()Main function is to put a continuous smooth curve line is changed,After the polygon approximation
	# c表示输入的点集
	# 其中第二个参数epsilon的作用：double epsilon：判断点到相对应的line segment的距离的阈值
	# （距离大于此阈值则舍弃,小于此阈值则保留,epsilon越小,折线的形状越“接近”曲线.）
	# True表示封闭的
	approx = cv2.approxPolyDP(c, 0.02 * peri, True)
	# Because is the text line,Returns the box should be at least a quadrilateral,As long as find the biggest quadrilateral,就可以退出了
	if len(approx) == 4:
		screenCnt = approx
		break

v2.approxPolyDPFunction to understand to see：

Google：Contour Features（OpenCV官网教程）

在这里插入图片描述

总之,This function class knowledge recommended Google 搜索,可以直接搜索到官方文档,权威一些

The optimal profile is as follows：

在这里插入图片描述

We found the optimal profile is the outline of the largest,But the most is not the most points（1009<1042）

在这里插入图片描述

screenCnt的值：

在这里插入图片描述

The original and transform coordinates calculation

通过screenCnt.reshape(4, 2)将其变为4x2的矩阵,之后* ratio得到的pts矩阵为：

在这里插入图片描述

即：
$\begin{cases}t_{1}\left( 2222.784,151.776\right) \\ t_{1}\left( 93.024, 876.384\right) \\ t_{1}\left(685.44,2291.328\right) \\ t_{1}(2624.256,1439.424) \end{cases}$

这里4个点用 $t_{1-4}$ 表示是因为这4A sequence of points which cannot represent the relative position relations,两者没有任何关联

之后通过order_pointsFunction to obtain input coordinates of point

def order_points(pts):
	# 一共4个坐标点
	rect = np.zeros((4, 2), dtype = "float32")

	# 按顺序找到对应坐标0123分别是 左上,右上,右下,左下
	# 计算左上,右下
	s = pts.sum(axis = 1)
	rect[0] = pts[np.argmin(s)]
	rect[2] = pts[np.argmax(s)]

	# 计算右上和左下
	diff = np.diff(pts, axis = 1)
	rect[1] = pts[np.argmin(diff)]
	rect[3] = pts[np.argmax(diff)]

	return rect

过程如下：

在这里插入图片描述

After the transformations,给出函数four_point_transform：

def four_point_transform(image, pts):
	# 获取输入坐标点
	rect = order_points(pts)
	(tl, tr, br, bl) = rect

	# 计算输入的w和h值
	widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
	widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
	maxWidth = max(int(widthA), int(widthB))

	heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
	heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
	maxHeight = max(int(heightA), int(heightB))

	# 变换后对应坐标位置
	dst = np.array([
		[0, 0],
		[maxWidth - 1, 0],
		[maxWidth - 1, maxHeight - 1],
		[0, maxHeight - 1]], dtype = "float32")

	# 计算变换矩阵
	M = cv2.getPerspectiveTransform(rect, dst)
	warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

	# 返回变换后结果
	return warped

Intermediate variable as follows：

在这里插入图片描述
给出(tl, tr, br, bl)（变换前）的表达：
$\begin{cases}A_{tl}\left( 93.024, 876.384\right) \\ B_{tr}\left( 2222.784,151.776\right) \\ C_{br}\left(2624.256,1439.424\right) \\ D_{bl}(685.44,2291.328) \end{cases}$

给出dst（变换后）的表达：
$\begin{cases}A\left(0.,0.\right) \\ B\left( 2248. , 0.\right) \\ C\left(2248. ,1532.\right) \\ D(0. ,1532.) \end{cases}$

Draw a figure something like this：
在这里插入图片描述

其中通过cv2.getPerspectiveTransform(rect, dst)Function to solve the transformation matrix

原理可看b站视频：The perspective transformation matrix solution is derived（通俗易懂）

显示识别结果

图像处理

After the gray level change in turn,二值化处理：

# 灰度,二值处理
warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
ref = cv2.threshold(warped, 100, 255, cv2.THRESH_BINARY)[1]

# 把ref写入scan.jpg
cv2.imwrite('scan.jpg', ref)

修改图片大小,并逆时针旋转90度

借鉴博客：

opencv-python 图片旋转90度

cv2.getRotationMatrix2D()

# 修改图片大小,At the same time image counterclockwise90度

# 获取图片,Change the size of the picture
img = cv2.imread("scan.jpg")
# Pay attention to the need for the return value asimg2,Can't not return a value
img2 = cv2.resize(img, (900, 600))
cv2.imshow("temp", img2)
cv2.waitKey(0)
# 对图片进行旋转
# 方法一
# img90 = np.rot90(img2)
# 方法二
# 绕任意点旋转
# 第一个参数旋转中心,第二个参数旋转角度,第三个参数：缩放比例
M = cv2.getRotationMatrix2D((450, 450), 90, 1)
# 仿射变化
# 第三个参数：输入图像的大小
img90 = cv2.warpAffine(img2, M, img2.shape[:2])
# (600, 900)与img2.shape[:2]等价
# img90 = cv2.warpAffine(img2, M, (600, 900))
cv2.imwrite('scan.jpg', img90)
# cv2.imshow("rotate", img90)
# cv2.waitKey(0)
# cv2.destroyAllWindows()

显示图像

print("STEP 3: 变换")
# cv2.imshow("Original", resize(orig, height = 650))
cv2.imshow("Scanned", resize(img90, height = 650))
cv2.waitKey(0)

旋转前：

在这里插入图片描述
旋转后：

在这里插入图片描述

OCR识别

在这里用github开源OCR软件tesseract,可看：

https://github.com/tesseract-ocr/tesseract

下载地址：

https://digi.bib.uni-mannheim.de/tesseract/

我的安装路径是：F:\soft_f\Tesseract-OCR,Installed remember with my environment variables（I only match the user variables）,之后输入tesseract -v有（Version can be high to low,I install is high version）：

在这里插入图片描述

在python中pytesseractAnd installation at the localtesseract-ocr.exeFile is used with,因此需要pip install pytesseract,我的python环境在F:\F_software\Anaconda,安装如下：

在这里插入图片描述

在执行text = pytesseract.image_to_string(Image.open(filename))时发生了报错：

在这里插入图片描述

Found that even if is to configure the environment variables,也找不到tesseract的路径（原因未知QAQ）,解决办法：打开F:\F_software\Anaconda\Lib\site-packages\pytesseract

在这里插入图片描述

修改一下tesseract_cmd = 'F:/soft_f/Tesseract-OCR/tesseract.exe'：

在这里插入图片描述

If it's an error to restart the computer（Sometimes the need to restart the computer to work）

这样就可以识别了,代码如下：

from PIL import Image
import pytesseract
import cv2
import os

preprocess = 'blur' #thresh

# 读入scan.pyThe output of the image：scan.jpg
image = cv2.imread('scan.jpg')
# 转灰度图
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# A grayscale image binarization
if preprocess == "thresh":
    gray = cv2.threshold(gray, 0, 255,cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# 实现中值滤波
if preprocess == "blur":
    gray = cv2.medianBlur(gray, 3)
    
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)
    
text = pytesseract.image_to_string(Image.open(filename))
print(text)
os.remove(filename)

cv2.imshow("Image", image)
cv2.imshow("Output", gray)
cv2.waitKey(0)

识别结果：

=

Ce ee ee ee ee

WHOLE FOODS MARKE)
399 POST RD WEST - (203) 227-6858

WHOLE
FOODS
(mM AR KE T)

WESTPORT.CT 06880

Seb BALUN LS NP 4
$65 BACUN LS NP 499

$65 BREON LS NP 4

305 BACUN iS fi 4
BROTH CHIL ae é 19
HLQUR ALMUND NP ol 99

CHKN BRST BNLSS SK 8 18 80
HEAVY CREAM $.39
BALSMC REODUCT 6.49

BEEF GRND 85/15 5.04

JUICE COF CASHEW 8
DOCS PINI ORGAile NP 14.49
HNY ALMOND bulithk NP 9

eee TAX 00 Rat 101.33