当前位置:网站首页>Image table solid line and dashed line detection
Image table solid line and dashed line detection
2022-06-13 02:15:00 【zjuPeco】
List of articles
1 Background
The structure of tables in images is a hot topic , The input is a picture , The output is all structured tables , It can also be said that the output is a excel. At present, no one in the market has done a perfect job , Because tables are always strange . But for some simple and regular wired tables or multiline tables , It can be well structured .
The general process of image table detection is
【 Form detection 】 To find the table position on the image , At the same time, separate some watches that are close to each other , We need to train a model of image detection , This is a lot of hard data train That's it ,Yolov5 And other general image detection models have good results . But for a table with only one row , The detection effect is not very good , This needs the help of traditional methods .
【 Horizontal and vertical line detection 】 To detect the split line in the table , It has great reference significance for table structure . Some single row tables can also be judged by the results of this step . Or the four sides are wired , You can use the results here to determine the position of the table . This article is about this step .
【OCR】 To get the text of each cell in the table .
【 Table structure 】 It is the result of combining the first three modules , Get structured tables , Here, according to the diversity of tables in the business scenario to be processed , There will be rules for different amounts of code . I have to write thousands of lines of rules for the scenarios I deal with .
This article only talks about how to detect the table lines in the image .
There are two main options :
(1) Two valued + Corrosion expansion + Contour detection , This is a camelot The method used in .
(2) edge detection + Hoff line detection , This is the method that we see more on the Internet .
Let's talk about these two methods , The pictures used are
This picture is taken because the table in the picture has solid lines , Another dotted line . It is convenient to compare the effects of different methods .
2 camelot The method in
2.1 Two valued
Binarization is used opencv In the middle of cv2.adaptiveThreshold, This binarization method can adjust the threshold according to the local color difference , It is more in line with the colorful scene of the table background .
def adaptive_threshold(imagename, process_background=False, blocksize=15, c=-2):
"""Thresholds an image using OpenCV's adaptiveThreshold. Parameters ---------- imagename : string Path to image file. process_background : bool, optional (default: False) Whether or not to process lines that are in background. blocksize : int, optional (default: 15) Size of a pixel neighborhood that is used to calculate a threshold value for the pixel: 3, 5, 7, and so on. For more information, refer `OpenCV's adaptiveThreshold <https://docs.opencv.org/2.4/modules/imgproc/doc/miscellaneous_transformations.html#adaptivethreshold>`_. c : int, optional (default: -2) Constant subtracted from the mean or weighted mean. Normally, it is positive but may be zero or negative as well. For more information, refer `OpenCV's adaptiveThreshold <https://docs.opencv.org/2.4/modules/imgproc/doc/miscellaneous_transformations.html#adaptivethreshold>`_. Returns ------- img : object numpy.ndarray representing the original image. threshold : object numpy.ndarray representing the thresholded image. """
img = cv2.imread(imagename)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
if process_background:
threshold = cv2.adaptiveThreshold(
gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, blocksize, c
)
else:
threshold = cv2.adaptiveThreshold(
np.invert(gray),
255,
cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY,
blocksize,
c,
)
return img, threshold
When you use it , Just use it directly
image, threshold = adaptive_threshold(
image_path,
process_background=False,
blocksize=11,
c=-2,
)
there threshold This is the binary image .
2.2 Corrosion expansion
The purpose of corrosion expansion is to find out the long lines along the horizontal and vertical directions . Corrosion is when kernel Within range 0 when , Just set it all 0, Discontinuous pixels are filtered out ; Inflation is the expansion of kernel The center is 255 The dot of expands to kernel Size , Restore the corroded points on the original line segment .
for instance , We first construct a vertical length of 5 Of kernel.
import cv2
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 5))
kernel by
array([[1],
[1],
[1],
[1],
[1]], dtype=uint8)
Situation 1 :
We first construct a numerical direction with insufficient length 5 The straight line of , And use kernel Corrode
import numpy as np
img = np.zeros((10, 5))
img[1:5, 0] = 255
erode_img = cv2.erode(img, kernel)
img by
array([[ 0., 0., 0., 0., 0.],
[255., 0., 0., 0., 0.],
[255., 0., 0., 0., 0.],
[255., 0., 0., 0., 0.],
[255., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
erode_img by
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
Situation two :
We construct a numerical direction with insufficient length 5 The straight line of , And use kernel Corrode
import numpy as np
img = np.zeros((10, 5))
img[1:6, 0] = 255
erode_img = cv2.erode(img, kernel)
img by
array([[ 0., 0., 0., 0., 0.],
[255., 0., 0., 0., 0.],
[255., 0., 0., 0., 0.],
[255., 0., 0., 0., 0.],
[255., 0., 0., 0., 0.],
[255., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
erode_img by
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[255., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
Inflated words , The point that has not been completely corroded , Restore the return line , There are no examples here .
2.3 Contour detection
The contour detection part is to find out the corrosion expansion line , This is in the same function as corrosion expansion
def find_lines(
threshold, regions=None, direction="horizontal", line_scale=15, iterations=0
):
"""Finds horizontal and vertical lines by applying morphological transformations on an image. Parameters ---------- threshold : object numpy.ndarray representing the thresholded image. regions : list, optional (default: None) List of page regions that may contain tables of the form x1,y1,x2,y2 where (x1, y1) -> left-top and (x2, y2) -> right-bottom in image coordinate space. direction : string, optional (default: 'horizontal') Specifies whether to find vertical or horizontal lines. line_scale : int, optional (default: 15) Factor by which the page dimensions will be divided to get smallest length of lines that should be detected. The larger this value, smaller the detected lines. Making it too large will lead to text being detected as lines. iterations : int, optional (default: 0) Number of times for erosion/dilation is applied. For more information, refer `OpenCV's dilate <https://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#dilate>`_. Returns ------- dmask : object numpy.ndarray representing pixels where vertical/horizontal lines lie. lines : list List of tuples representing vertical/horizontal lines with coordinates relative to a left-top origin in image coordinate space. """
lines = []
if direction == "vertical":
size = threshold.shape[0] // line_scale
if size < 2:
size = threshold.shape[0]
el = cv2.getStructuringElement(cv2.MORPH_RECT, (1, size))
elif direction == "horizontal":
size = threshold.shape[1] // line_scale
if size < 2:
size = threshold.shape[1]
el = cv2.getStructuringElement(cv2.MORPH_RECT, (size, 1))
elif direction is None:
raise ValueError("Specify direction as either 'vertical' or 'horizontal'")
if regions is not None:
region_mask = np.zeros(threshold.shape)
for region in regions:
x, y, w, h = region
region_mask[y : y + h, x : x + w] = 1
threshold = np.multiply(threshold, region_mask)
threshold = cv2.erode(threshold, el)
threshold = cv2.dilate(threshold, el)
dmask = cv2.dilate(threshold, el, iterations=iterations)
try:
contours, _ = cv2.findContours(
threshold.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
)
except ValueError:
# for opencv backward compatibility
_, contours, _ = cv2.findContours(
threshold.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
)
for c in contours:
x, y, w, h = cv2.boundingRect(c)
x1, x2 = x, x + w
y1, y2 = y, y + h
if direction == "vertical":
lines.append(((x1 + x2) // 2, y1, (x1 + x2) // 2, y2))
elif direction == "horizontal":
lines.append((x1, (y1 + y2) // 2, x2, (y1 + y2) // 2))
return dmask, lines
The detection code for horizontal and vertical lines is
# Vertical line detection
iterations = 0
vertical_line_scale = 60
regions = None
vertical_mask, vertical_segments = find_lines(
threshold,
regions=regions,
direction="vertical",
line_scale=vertical_line_scale,
iterations=iterations,
)
# Horizontal line detection
iterations = 0
horizontal_line_scale = 50
regions = None
horizontal_mask, horizontal_segments = find_lines(
threshold,
regions=regions,
direction="horizontal",
line_scale=horizontal_line_scale,
iterations=iterations,
)
By controlling vertical_line_scale and horizontal_line_scale You can control the minimum segment length .vertical_mask and horizontal_mask It's a binary image ,vertical_segments and horizontal_segments Is the position of the line segment .
2.4 Result display
The line segments detected by this method are shown in the following figure 2-1 Shown .
It's not hard to see. , In this way , All the required solid lines have been found , But strokes on some large characters are also considered line segments , More serious , The dotted line cannot be detected .
In the case where there is no dotted line in the table , This is actually a simple, fast and accurate solution .
3 Method based on Hough line detection
In order to solve the problem that the dotted line cannot be detected , I thought of Hoff line detection . Here is a brief explanation of how Hoff line detection is going on .
3.1 Hoff line detection principle
Hoff's line detection is simple , Junior high school knowledge can be solved .
To understand this question , First of all, you have to know xy A point in a coordinate system ( x 0 , y 0 ) (x_0, y_0) (x0,y0) How do all the lines of represent . We all know , A straight line can have a slope k k k And intercept b b b The only certainty is y = k x + b y=kx+b y=kx+b. Let's construct another kb Coordinate system , Horizontal axis k k k, The vertical axis is b b b. So any point in this coordinate system ( k i , b i ) (k_i, b_i) (ki,bi) Namely xy A straight line in a coordinate system . Say it again ,kb A point in a coordinate system , It stands for xy A straight line in a coordinate system .
So good , too ( x 0 , y 0 ) (x_0, y_0) (x0,y0) All the straight lines in kb The coordinate system is a straight line
y 0 = k x 0 + b → { k = − 1 x 0 b + y 0 x 0 i f x 0 ≠ 0 b = y 0 i f x 0 = 0 (3-1) y_0 = kx_0 + b \rightarrow \begin{cases} k = -\frac{1}{x_0}b + \frac{y_0}{x_0} &if\ x_0 \ne 0\\ b = y_0 &if\ x_0 =0 \end{cases} \tag{3-1} y0=kx0+b→{ k=−x01b+x0y0b=y0if x0=0if x0=0(3-1)
kb A straight line in a coordinate system , Just pass xy All lines at a point in the coordinate system .
Empathy , Suppose there is another point ( x 1 , y 1 ) (x_1, y_1) (x1,y1), All the straight lines through the store are at kb The line in the coordinate system is y 1 = k x 1 + b y_1=kx_1+b y1=kx1+b.
Equations ( 3 − 2 ) (3-2) (3−2) Solution ( k ∗ , b ∗ ) (k^*, b^*) (k∗,b∗), Just pass ( x 0 , y 0 ) (x_0, y_0) (x0,y0) and ( x 1 , y 1 ) (x_1, y_1) (x1,y1) The straight line determined by these two points .
{ y 0 = k x 0 + b y 1 = k x 1 + b (3-2) \begin{cases} y_0 = kx_0 + b \\ y_1=kx_1+b \end{cases} \tag{3-2} { y0=kx0+by1=kx1+b(3-2)
xy The representation of all lines of all points on the same line in the coordinate system , stay kb The coordinate system must all pass through the same point , Here's the picture 3-1 Shown . The picture is from Reference material [2] Borrow it , So the symbols are inconsistent , I'm too lazy to draw .
Since then , So we can kb How many straight lines pass through a certain point in space to judge whether it is in xy How many points in the coordinate system are on this line .
But map to kb There will be a problem with the coordinate system , When xy The lines in the coordinate system are nearly parallel y Axial time ,k It will also be close to infinity , Infinity cannot be calculated . To solve this problem , He put forward the idea of xy Space mapping to polar coordinates θ r \theta r θr Space .
The mapping method is shown in the following figure 3-2 Shown , This picture is borrowed Reference material [3] Of .xy Every line in the coordinate system is θ r \theta r θr A point in space ( θ , r ) (\theta, r) (θ,r), r r r by xy The distance between the original point and the straight line in the coordinate system , θ \theta θ Represents the perpendicular line from the origin to the line and x Angle between axes ;xy All lines passing through a point in the coordinate system correspond to θ r \theta r θr A curve in space r = x i c o s θ + y i s i n θ r = x_i cos\theta + y_i sin\theta r=xicosθ+yisinθ.
If you don't understand why r = x i c o s θ + y i s i n θ r = x_i cos\theta + y_i sin\theta r=xicosθ+yisinθ Can be said xy Too much space ( x i , y i ) (x_i, y_i) (xi,yi) All the straight lines . Think about it this way , A straight line crosses the graph 3-2 In the point ( x 2 , y 2 ) (x_2, y_2) (x2,y2), At first it was with y Axis parallel , namely θ = 0 \theta=0 θ=0, Then start winding ( x 2 , y 2 ) (x_2, y_2) (x2,y2) rotate , θ \theta θ Growing , Until you turn 2 π 2\pi 2π. This process of rotation traverses all the passes ( x 2 , y 2 ) (x_2, y_2) (x2,y2) The straight line of , And if you draw an auxiliary line and calculate it , Will find r r r Always satisfied r = x 2 c o s θ + y 2 s i n θ r = x_2 cos\theta + y_2 sin\theta r=x2cosθ+y2sinθ.
And kb Different coordinate systems , here θ \theta θ Namely [ 0 , 2 π ) [0, 2\pi) [0,2π) Value range of , r r r Just order it ( x i , y i ) (x_i, y_i) (xi,yi) Just be a finite distance from the origin , This can be met in the actual scene . Does not produce infinite values .
As for how to find a straight line , Also and kb Same coordinate system , stay θ r \theta r θr Space find the point where many curves intersect , Namely xy A straight line in space . The number of points sets a threshold , Just don't let too short lines in .
The advantage of Hoff's straight line is that you can find the dotted line .
3.2 Probabilistic Hough line detection
Hoff line detection usually needs to detect the edge of the image first ( such as canny), Then map all the edge points to θ r \theta r θr Space to find those points that are intersected by more than a certain number of curves . There are two drawbacks to this , First, the amount of calculation is too large , Second, I don't know the real length of the line segment . So there is the probabilistic Hough line detection .
Probability Hoff will take a subset of the edge points , To carry out θ r \theta r θr Statistics of spatial intersections , There is an accumulator (Hough accumulator) Count the number of times the candidate points are crossed by the curve . This greatly reduces the amount of computation , according to Reference material [6] Yes , as long as 2% Edge point of , It will have a better effect . Because we use subsets , Therefore, the threshold value of the number of points should be reduced accordingly .
After the candidate points are determined according to the threshold , Probability Hoff will go to the complete set of edge points to find out which points are also on this line , The points with too large concurrency interval are filtered out , So you can find all the points on a line segment .
In this way, the problems of large amount of calculation and not knowing the real length of the line segment are solved .
3.3 Hoff line application
Hoff line detection in opencv Corresponds to cv2.HoughLines This function , Only return θ \theta θ and r r r. Probability Hoff is opencv Corresponds to cv2.HoughLinesP This function , Returns the coordinates of the starting and ending points of the line segment . For parameter descriptions of these two functions, please refer to Reference material [7], Not here .
Go straight to the code , In general, it's just two steps , First Canny edge detection , Then probability Hoff .
import cv2
im = cv2.imread(image_path)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 150, 200, apertureSize=3)
img = im.copy()
lines = cv2.HoughLinesP(edges, 1, np.pi/180, 100, minLineLength = 100, maxLineGap = 10)
The result is shown in the figure below 3-3 Shown .
It's not hard to see. , The dotted line comes out , But there are three problems , First, the strokes of words are also considered as straight lines , Second, there are many lines close to coincidence , Third, a slash appears . These problems can be solved by post-processing .
ocr The result can provide the position of the text , Those lines on the text can be filtered out with this information ; Lines close to coincidence can be filtered out according to the distance of the lines ; The slash can be filtered out according to the slope , The detection box detected by the table can also filter out a large wave line , Because we only need the lines in the table .
As a whole , There are more ways than difficulties .
Reference material
[1] https://github.com/atlanhq/camelot
[2] https://blog.csdn.net/lkj345/article/details/50699981
[3] https://congleetea.github.io/blog/2018/09/28/hough-transform/
[4] https://stackoverflow.com/questions/59340367/how-does-the-probabilistic-hough-transform-compute-the-end-points-of-lines
[5] https://blog.csdn.net/zhaocj/article/details/40047397
[6] https://jayrambhia.com/blog/probabilistic-hough-transform
[7] https://www.cxyzjd.com/article/hihell/113670582
边栏推荐
- STM32 external interrupt Usage Summary
- Armv8-m (Cortex-M) TrustZone summary and introduction
- Swiper horizontal rotation grid
- [keras] data of 3D u-net source code analysis py
- [learning notes] xr872 GUI littlevgl 8.0 migration (file system)
- [work with notes] MFC solves the problem that pressing ESC and enter will automatically exit
- Laptop touch pad operation
- Huawei equipment is configured with CE dual attribution
- [analysis notes] source code analysis of siliconlabs efr32bg22 Bluetooth mesh sensorclient
- Why is "iFLYTEK Super Brain 2030 plan" more worthy of expectation than "pure" virtual human
猜你喜欢
Understanding and thinking about multi-core consistency
华为设备配置私网IP路由FRR
Introduction to armv8/armv9 - learning this article is enough
Day 1 of the 10 day smart lock project (understand the SCM stm32f401ret6 and C language foundation)
SQLserver2008 拒绝了对对象 '****' (数据库 '****',架构 'dbo')的 SELECT 权限
Use of Arduino series pressure sensors and detected data displayed by OLED (detailed tutorial)
Huawei equipment is configured with dual reflectors to optimize the backbone layer of the virtual private network
[the third day of actual combat of smart lock project based on stm32f401ret6 in 10 days] communication foundation and understanding serial port
Viewing the ambition of Xiaodu technology from intelligent giant screen TV v86
Ten thousand words make it clear that synchronized and reentrantlock implement locks in concurrency
随机推荐
C语言压缩字符串保存到二进制文件,从二进制文件读取压缩字符串后解压。
[keras learning]fit_ Generator analysis and complete examples
Huawei equipment is configured with dual reflectors to optimize the backbone layer of the virtual private network
Number of special palindromes in basic exercise of test questions
Stm32+ze-08 formaldehyde sensor tutorial
Introduction to easydl object detection port
JS get element
[the 4th day of the 10 day smart lock project based on stm32f401ret6] what is interrupt, interrupt service function, system tick timer
STM32F103 IIC OLED program migration complete engineering code
[printf function and scanf function] (learning note 5 -- standard i/o function)
Huawei equipment is configured with IP and virtual private network hybrid FRR
Ten thousand words make it clear that synchronized and reentrantlock implement locks in concurrency
Area of basic exercise circle ※
[arithmetic, relation, logic, bit, compound assignment, self increasing, self decreasing and other] operators (learning note 4 -- C language operators)
LabVIEW large project development tools to improve quality
Introduction to arm Cortex-M learning
Basic exercise of test questions Yanghui triangle (two-dimensional array and shallow copy)
[pytorch]fixmatch code explanation - data loading
【 unity】 Problems Encountered in Packaging webgl Project and their resolution Records
ROS learning-6 detailed explanation of publisher programming syntax