当前位置:网站首页>Image table solid line and dashed line detection

Image table solid line and dashed line detection

2022-06-13 02:15:00 zjuPeco

1 Background

The structure of tables in images is a hot topic , The input is a picture , The output is all structured tables , It can also be said that the output is a excel. At present, no one in the market has done a perfect job , Because tables are always strange . But for some simple and regular wired tables or multiline tables , It can be well structured .

The general process of image table detection is
 Image table detection process

chart 1-1 Flow chart of image table detection

【 Form detection 】 To find the table position on the image , At the same time, separate some watches that are close to each other , We need to train a model of image detection , This is a lot of hard data train That's it ,Yolov5 And other general image detection models have good results . But for a table with only one row , The detection effect is not very good , This needs the help of traditional methods .

【 Horizontal and vertical line detection 】 To detect the split line in the table , It has great reference significance for table structure . Some single row tables can also be judged by the results of this step . Or the four sides are wired , You can use the results here to determine the position of the table . This article is about this step .

【OCR】 To get the text of each cell in the table .

【 Table structure 】 It is the result of combining the first three modules , Get structured tables , Here, according to the diversity of tables in the business scenario to be processed , There will be rules for different amounts of code . I have to write thousands of lines of rules for the scenarios I deal with .

This article only talks about how to detect the table lines in the image .

There are two main options :
(1) Two valued + Corrosion expansion + Contour detection , This is a camelot The method used in .
(2) edge detection + Hoff line detection , This is the method that we see more on the Internet .

Let's talk about these two methods , The pictures used are
 Sample test picture

chart 1-2 Sample test picture

This picture is taken because the table in the picture has solid lines , Another dotted line . It is convenient to compare the effects of different methods .

2 camelot The method in

2.1 Two valued

Binarization is used opencv In the middle of cv2.adaptiveThreshold, This binarization method can adjust the threshold according to the local color difference , It is more in line with the colorful scene of the table background .

def adaptive_threshold(imagename, process_background=False, blocksize=15, c=-2):
    """Thresholds an image using OpenCV's adaptiveThreshold. Parameters ---------- imagename : string Path to image file. process_background : bool, optional (default: False) Whether or not to process lines that are in background. blocksize : int, optional (default: 15) Size of a pixel neighborhood that is used to calculate a threshold value for the pixel: 3, 5, 7, and so on. For more information, refer `OpenCV's adaptiveThreshold <https://docs.opencv.org/2.4/modules/imgproc/doc/miscellaneous_transformations.html#adaptivethreshold>`_. c : int, optional (default: -2) Constant subtracted from the mean or weighted mean. Normally, it is positive but may be zero or negative as well. For more information, refer `OpenCV's adaptiveThreshold <https://docs.opencv.org/2.4/modules/imgproc/doc/miscellaneous_transformations.html#adaptivethreshold>`_. Returns ------- img : object numpy.ndarray representing the original image. threshold : object numpy.ndarray representing the thresholded image. """
    img = cv2.imread(imagename)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    if process_background:
        threshold = cv2.adaptiveThreshold(
            gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, blocksize, c
        )
    else:
        threshold = cv2.adaptiveThreshold(
            np.invert(gray),
            255,
            cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY,
            blocksize,
            c,
        )
    return img, threshold

When you use it , Just use it directly

image, threshold = adaptive_threshold(
    image_path,
    process_background=False,
    blocksize=11,
    c=-2,
)

there threshold This is the binary image .

2.2 Corrosion expansion

The purpose of corrosion expansion is to find out the long lines along the horizontal and vertical directions . Corrosion is when kernel Within range 0 when , Just set it all 0, Discontinuous pixels are filtered out ; Inflation is the expansion of kernel The center is 255 The dot of expands to kernel Size , Restore the corroded points on the original line segment .

for instance , We first construct a vertical length of 5 Of kernel.

import cv2
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 5))

kernel by

array([[1],
       [1],
       [1],
       [1],
       [1]], dtype=uint8)

Situation 1 :
We first construct a numerical direction with insufficient length 5 The straight line of , And use kernel Corrode

import numpy as np
img = np.zeros((10, 5))
img[1:5, 0] = 255
erode_img = cv2.erode(img, kernel)

img by

array([[  0.,   0.,   0.,   0.,   0.],
       [255.,   0.,   0.,   0.,   0.],
       [255.,   0.,   0.,   0.,   0.],
       [255.,   0.,   0.,   0.,   0.],
       [255.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.]])

erode_img by

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Situation two :
We construct a numerical direction with insufficient length 5 The straight line of , And use kernel Corrode

import numpy as np
img = np.zeros((10, 5))
img[1:6, 0] = 255
erode_img = cv2.erode(img, kernel)

img by

array([[  0.,   0.,   0.,   0.,   0.],
       [255.,   0.,   0.,   0.,   0.],
       [255.,   0.,   0.,   0.,   0.],
       [255.,   0.,   0.,   0.,   0.],
       [255.,   0.,   0.,   0.,   0.],
       [255.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.]])

erode_img by

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [255., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Inflated words , The point that has not been completely corroded , Restore the return line , There are no examples here .

2.3 Contour detection

The contour detection part is to find out the corrosion expansion line , This is in the same function as corrosion expansion

def find_lines(
    threshold, regions=None, direction="horizontal", line_scale=15, iterations=0
):
    """Finds horizontal and vertical lines by applying morphological transformations on an image. Parameters ---------- threshold : object numpy.ndarray representing the thresholded image. regions : list, optional (default: None) List of page regions that may contain tables of the form x1,y1,x2,y2 where (x1, y1) -> left-top and (x2, y2) -> right-bottom in image coordinate space. direction : string, optional (default: 'horizontal') Specifies whether to find vertical or horizontal lines. line_scale : int, optional (default: 15) Factor by which the page dimensions will be divided to get smallest length of lines that should be detected. The larger this value, smaller the detected lines. Making it too large will lead to text being detected as lines. iterations : int, optional (default: 0) Number of times for erosion/dilation is applied. For more information, refer `OpenCV's dilate <https://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#dilate>`_. Returns ------- dmask : object numpy.ndarray representing pixels where vertical/horizontal lines lie. lines : list List of tuples representing vertical/horizontal lines with coordinates relative to a left-top origin in image coordinate space. """
    lines = []

    if direction == "vertical":
        size = threshold.shape[0] // line_scale
        if size < 2:
            size = threshold.shape[0]
        el = cv2.getStructuringElement(cv2.MORPH_RECT, (1, size))
    elif direction == "horizontal":
        size = threshold.shape[1] // line_scale
        if size < 2:
            size = threshold.shape[1]
        el = cv2.getStructuringElement(cv2.MORPH_RECT, (size, 1))
    elif direction is None:
        raise ValueError("Specify direction as either 'vertical' or 'horizontal'")

    if regions is not None:
        region_mask = np.zeros(threshold.shape)
        for region in regions:
            x, y, w, h = region
            region_mask[y : y + h, x : x + w] = 1
        threshold = np.multiply(threshold, region_mask)

    threshold = cv2.erode(threshold, el)
    threshold = cv2.dilate(threshold, el)
    dmask = cv2.dilate(threshold, el, iterations=iterations)

    try:
        contours, _ = cv2.findContours(
            threshold.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
        )
    except ValueError:
        # for opencv backward compatibility
        _, contours, _ = cv2.findContours(
            threshold.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
        )

    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        x1, x2 = x, x + w
        y1, y2 = y, y + h
        if direction == "vertical":
            lines.append(((x1 + x2) // 2, y1, (x1 + x2) // 2, y2))
        elif direction == "horizontal":
            lines.append((x1, (y1 + y2) // 2, x2, (y1 + y2) // 2))

    return dmask, lines

The detection code for horizontal and vertical lines is

#  Vertical line detection 
iterations = 0
vertical_line_scale = 60
regions = None
vertical_mask, vertical_segments = find_lines(
    threshold,
    regions=regions,
    direction="vertical",
    line_scale=vertical_line_scale,
    iterations=iterations,
)

#  Horizontal line detection 
iterations = 0
horizontal_line_scale = 50
regions = None
horizontal_mask, horizontal_segments = find_lines(
    threshold,
    regions=regions,
    direction="horizontal",
    line_scale=horizontal_line_scale,
    iterations=iterations,
)

By controlling vertical_line_scale and horizontal_line_scale You can control the minimum segment length .vertical_mask and horizontal_mask It's a binary image ,vertical_segments and horizontal_segments Is the position of the line segment .

2.4 Result display

The line segments detected by this method are shown in the following figure 2-1 Shown .
 The results show that

chart 2-1 The results show that

It's not hard to see. , In this way , All the required solid lines have been found , But strokes on some large characters are also considered line segments , More serious , The dotted line cannot be detected .

In the case where there is no dotted line in the table , This is actually a simple, fast and accurate solution .

3 Method based on Hough line detection

In order to solve the problem that the dotted line cannot be detected , I thought of Hoff line detection . Here is a brief explanation of how Hoff line detection is going on .

3.1 Hoff line detection principle

Hoff's line detection is simple , Junior high school knowledge can be solved .

To understand this question , First of all, you have to know xy A point in a coordinate system ( x 0 , y 0 ) (x_0, y_0) (x0,y0) How do all the lines of represent . We all know , A straight line can have a slope k k k And intercept b b b The only certainty is y = k x + b y=kx+b y=kx+b. Let's construct another kb Coordinate system , Horizontal axis k k k, The vertical axis is b b b. So any point in this coordinate system ( k i , b i ) (k_i, b_i) (ki,bi) Namely xy A straight line in a coordinate system . Say it again ,kb A point in a coordinate system , It stands for xy A straight line in a coordinate system .

So good , too ( x 0 , y 0 ) (x_0, y_0) (x0,y0) All the straight lines in kb The coordinate system is a straight line

y 0 = k x 0 + b → { k = − 1 x 0 b + y 0 x 0 i f   x 0 ≠ 0 b = y 0 i f   x 0 = 0 (3-1) y_0 = kx_0 + b \rightarrow \begin{cases} k = -\frac{1}{x_0}b + \frac{y_0}{x_0} &if\ x_0 \ne 0\\ b = y_0 &if\ x_0 =0 \end{cases} \tag{3-1} y0=kx0+b{ k=x01b+x0y0b=y0if x0=0if x0=0(3-1)

kb A straight line in a coordinate system , Just pass xy All lines at a point in the coordinate system .

Empathy , Suppose there is another point ( x 1 , y 1 ) (x_1, y_1) (x1,y1), All the straight lines through the store are at kb The line in the coordinate system is y 1 = k x 1 + b y_1=kx_1+b y1=kx1+b.

Equations ( 3 − 2 ) (3-2) (32) Solution ( k ∗ , b ∗ ) (k^*, b^*) (k,b), Just pass ( x 0 , y 0 ) (x_0, y_0) (x0,y0) and ( x 1 , y 1 ) (x_1, y_1) (x1,y1) The straight line determined by these two points .

{ y 0 = k x 0 + b y 1 = k x 1 + b (3-2) \begin{cases} y_0 = kx_0 + b \\ y_1=kx_1+b \end{cases} \tag{3-2} { y0=kx0+by1=kx1+b(3-2)

xy The representation of all lines of all points on the same line in the coordinate system , stay kb The coordinate system must all pass through the same point , Here's the picture 3-1 Shown . The picture is from Reference material [2] Borrow it , So the symbols are inconsistent , I'm too lazy to draw .
xy and kb Spatial map

chart 3-1 xy and kb Spatial map

Since then , So we can kb How many straight lines pass through a certain point in space to judge whether it is in xy How many points in the coordinate system are on this line .

But map to kb There will be a problem with the coordinate system , When xy The lines in the coordinate system are nearly parallel y Axial time ,k It will also be close to infinity , Infinity cannot be calculated . To solve this problem , He put forward the idea of xy Space mapping to polar coordinates θ r \theta r θr Space .

The mapping method is shown in the following figure 3-2 Shown , This picture is borrowed Reference material [3] Of .xy Every line in the coordinate system is θ r \theta r θr A point in space ( θ , r ) (\theta, r) (θ,r), r r r by xy The distance between the original point and the straight line in the coordinate system , θ \theta θ Represents the perpendicular line from the origin to the line and x Angle between axes ;xy All lines passing through a point in the coordinate system correspond to θ r \theta r θr A curve in space r = x i c o s θ + y i s i n θ r = x_i cos\theta + y_i sin\theta r=xicosθ+yisinθ.
xy And polar mapping

chart 3-2 xy And polar space mapping

If you don't understand why r = x i c o s θ + y i s i n θ r = x_i cos\theta + y_i sin\theta r=xicosθ+yisinθ Can be said xy Too much space ( x i , y i ) (x_i, y_i) (xi,yi) All the straight lines . Think about it this way , A straight line crosses the graph 3-2 In the point ( x 2 , y 2 ) (x_2, y_2) (x2,y2), At first it was with y Axis parallel , namely θ = 0 \theta=0 θ=0, Then start winding ( x 2 , y 2 ) (x_2, y_2) (x2,y2) rotate , θ \theta θ Growing , Until you turn 2 π 2\pi 2π. This process of rotation traverses all the passes ( x 2 , y 2 ) (x_2, y_2) (x2,y2) The straight line of , And if you draw an auxiliary line and calculate it , Will find r r r Always satisfied r = x 2 c o s θ + y 2 s i n θ r = x_2 cos\theta + y_2 sin\theta r=x2cosθ+y2sinθ.

And kb Different coordinate systems , here θ \theta θ Namely [ 0 , 2 π ) [0, 2\pi) [0,2π) Value range of , r r r Just order it ( x i , y i ) (x_i, y_i) (xi,yi) Just be a finite distance from the origin , This can be met in the actual scene . Does not produce infinite values .

As for how to find a straight line , Also and kb Same coordinate system , stay θ r \theta r θr Space find the point where many curves intersect , Namely xy A straight line in space . The number of points sets a threshold , Just don't let too short lines in .

The advantage of Hoff's straight line is that you can find the dotted line .

3.2 Probabilistic Hough line detection

Hoff line detection usually needs to detect the edge of the image first ( such as canny), Then map all the edge points to θ r \theta r θr Space to find those points that are intersected by more than a certain number of curves . There are two drawbacks to this , First, the amount of calculation is too large , Second, I don't know the real length of the line segment . So there is the probabilistic Hough line detection .

Probability Hoff will take a subset of the edge points , To carry out θ r \theta r θr Statistics of spatial intersections , There is an accumulator (Hough accumulator) Count the number of times the candidate points are crossed by the curve . This greatly reduces the amount of computation , according to Reference material [6] Yes , as long as 2% Edge point of , It will have a better effect . Because we use subsets , Therefore, the threshold value of the number of points should be reduced accordingly .

After the candidate points are determined according to the threshold , Probability Hoff will go to the complete set of edge points to find out which points are also on this line , The points with too large concurrency interval are filtered out , So you can find all the points on a line segment .

In this way, the problems of large amount of calculation and not knowing the real length of the line segment are solved .

3.3 Hoff line application

Hoff line detection in opencv Corresponds to cv2.HoughLines This function , Only return θ \theta θ and r r r. Probability Hoff is opencv Corresponds to cv2.HoughLinesP This function , Returns the coordinates of the starting and ending points of the line segment . For parameter descriptions of these two functions, please refer to Reference material [7], Not here .

Go straight to the code , In general, it's just two steps , First Canny edge detection , Then probability Hoff .

import cv2

im = cv2.imread(image_path)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 150, 200, apertureSize=3)
img = im.copy()
lines = cv2.HoughLinesP(edges, 1, np.pi/180, 100, minLineLength = 100, maxLineGap = 10)

The result is shown in the figure below 3-3 Shown .
 Probability Hoff result

chart 3-3 Probability Hoff result graph

It's not hard to see. , The dotted line comes out , But there are three problems , First, the strokes of words are also considered as straight lines , Second, there are many lines close to coincidence , Third, a slash appears . These problems can be solved by post-processing .

ocr The result can provide the position of the text , Those lines on the text can be filtered out with this information ; Lines close to coincidence can be filtered out according to the distance of the lines ; The slash can be filtered out according to the slope , The detection box detected by the table can also filter out a large wave line , Because we only need the lines in the table .

As a whole , There are more ways than difficulties .

Reference material

[1] https://github.com/atlanhq/camelot
[2] https://blog.csdn.net/lkj345/article/details/50699981
[3] https://congleetea.github.io/blog/2018/09/28/hough-transform/
[4] https://stackoverflow.com/questions/59340367/how-does-the-probabilistic-hough-transform-compute-the-end-points-of-lines
[5] https://blog.csdn.net/zhaocj/article/details/40047397
[6] https://jayrambhia.com/blog/probabilistic-hough-transform
[7] https://www.cxyzjd.com/article/hihell/113670582

原网站

版权声明
本文为[zjuPeco]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/02/202202280546060399.html