
Recognition engine ocropy → ocropy2 → ocropus3: a summary

2022-07-23 18:58:00 Full stack programmer webmaster

Hello everyone, nice to meet you again. I'm your friend Quan Jun.

Papers:

The OCRopus Open Source OCR System

Transfer Learning for OCRopus Model Training on Early Printed Books

GitHub:

https://github.com/tmbdev/ocropy

https://github.com/tmbdev/ocropy2

https://github.com/NVlabs/ocropus3

https://github.com/tmbdev/clstm

https://github.com/chreul/OCR_Testdata_EarlyPrintedBooks

Dependencies:

Python 2

PyTorch

System architecture:

The whole system consists of ocropus-nlbin (binarization preprocessing), ocropus-gpageseg (line detection and segmentation), ocropus-rpred (recognition based on over-segmentation + OpenFST + language-model correction), and ocropus-hocr (HTML result display).

ocropy and ocropy2 differ little; in ocropus3, most modules are replaced by neural networks, and the project moved to being maintained by NVlabs.

Recognition pipeline:

ocropus-nlbin,ocrobin:

This module is responsible for image preprocessing: normalization, noise removal, background removal, skew rotation, and contrast normalization.

Normalization rescales the pixel values to the range [0, 1].

Noise and background removal works by estimating the background with a rank filter: the pixels in a 20×2 window are sorted, and the value at the 80th percentile is taken as the background. Subtracting this background estimate from the original image leaves the foreground text. (The OpenCV rewrite below approximates the rank filter with a mean blur.)

Skew rotation rotates the image by each candidate angle, computes the mean of every row, collects those row means into a vector, and takes the variance of that vector. For example, sweeping from −5 to +5 degrees in 1-degree steps yields eleven variance values; the angle with the largest variance is the skew angle. A correctly oriented page alternates between dark text lines and blank gaps, so its row-mean variance is largest.

Contrast normalization sorts all pixels, takes the value at the 5th percentile as the black level and the value at the 90th percentile as the white level, then rescales the image between them.

The original program used the scipy module and was slow; after rewriting it with OpenCV, both good speed and good quality are achieved.

from __future__ import print_function
import numpy as np
import cv2
import time

class Pre_Process(object):
    def __init__(self):
        self.zoom=0.5
        self.perc=50
        self.range=20
        self.bignore=0.2
        self.maxskew=5
        self.skewsteps=1
        self.escale=1.0
        self.lo=0.05
        self.hi=0.9



    def normalize_raw_image(self,raw):
        ''' perform image normalization '''
        image = raw - np.amin(raw)
        if np.amax(image) == np.amin(image):
            return image
        image /= np.amax(image)
        return image


    def estimate_local_whitelevel(self,image,  bignore=0.2,zoom=0.5, perc=80, range=20):
        '''flatten it by estimating the local whitelevel
        zoom for page background estimation, smaller=faster, default: %(default)s
        percentage for filters, default: %(default)s
        range for filters, default: %(default)s
        '''
        d0, d1 = image.shape
        o0, o1 = int(bignore * d0), int(bignore * d1)
        est = image[o0:d0 - o0, o1:d1 - o1]

        image_black=np.sum(est < 0.05)
        image_white=np.sum(est > 0.95)
        extreme = (image_black+image_white) * 1.0 / np.prod(est.shape)

        if np.mean(est)<0.4:
            # the page is white-on-black; invert it
            image = 1 - image

        if extreme > 0.95:
            flat = image
        else:
            m=cv2.blur(image,(range,range))
            w, h = np.minimum(np.array(image.shape), np.array(m.shape))
            flat = np.clip(image[:w, :h] - m[:w, :h] + 1, 0, 1)


        return flat


    def estimate_skew_angle(self,image, angles):
        estimates = []
        for a in angles:
            matrix = cv2.getRotationMatrix2D((int(image.shape[1] / 2), int(image.shape[0] / 2)), a, 1)
            rotate_image = cv2.warpAffine(image, matrix, (image.shape[1], image.shape[0]))

            v = np.mean(rotate_image, axis=1)
            v = np.var(v)
            estimates.append((v, a))
        _, a = max(estimates)
        return a

    def estimate_skew(self,flat,maxskew=2, skewsteps=1):
        ''' estimate skew angle and rotate'''
        flat = np.amax(flat) - flat
        flat -= np.amin(flat)
        ma = maxskew
        ms = int(2 * maxskew * skewsteps)

        angle = self.estimate_skew_angle(flat, np.linspace(-ma, ma, ms + 1))

        matrix = cv2.getRotationMatrix2D((int(flat.shape[1] / 2), int(flat.shape[0] / 2)), angle, 1)
        flat= cv2.warpAffine(flat, matrix, (flat.shape[1], flat.shape[0]))


        flat = np.amax(flat) - flat
        return flat, angle

    def estimate_thresholds(self,flat, bignore=0.2, escale=1, lo=0.05, hi=0.9):
        '''# estimate low and high thresholds
        ignore this much of the border for threshold estimation, default: %(default)s
        scale for estimating a mask over the text region, default: %(default)s
        lo percentile for black estimation, default: %(default)s
        hi percentile for white estimation, default: %(default)s
        '''
        d0, d1 = flat.shape
        o0, o1 = int(bignore * d0), int(bignore * d1)
        est = flat[o0:d0 - o0, o1:d1 - o1]

        if escale > 0:
            # by default, we use only regions that contain
            # significant variance; this makes the percentile
            # based low and high estimates more reliable

            v = est -cv2.GaussianBlur(est, (3,3), escale * 20)
            v=cv2.GaussianBlur(v ** 2, (3,3), escale * 20)** 0.5


            v = (v > 0.3 * np.amax(v))
            v=np.asarray(v,np.uint8)
            v=cv2.cvtColor(v, cv2.COLOR_GRAY2RGB)

            kernel =  cv2.getStructuringElement(cv2.MORPH_RECT,(int(escale * 50),int(escale * 50)))
            v = cv2.dilate(v, kernel, 1)
            v=cv2.cvtColor(v, cv2.COLOR_RGB2GRAY)
            v = (v > 0.3 * np.amax(v))
            est = est[v]

        if len(est)!=0:
            est=np.sort(est)
            lo = est[int(lo*len(est))]
            hi = est[int(hi*len(est))]

        # rescale the image to get the gray scale image
        flat -= lo
        flat /= (hi - lo)
        flat = np.clip(flat, 0, 1)
        return flat

    def process(self,img):
        # perform image normalization(30ms)
        image = self.normalize_raw_image(img)

        # check whether the image is already effectively binarized(70ms)
        flat = self.estimate_local_whitelevel(image,self.bignore, self.zoom, self.perc, self.range)

        # estimate skew angle and rotate(100ms)
        flat, angle = self.estimate_skew(flat, self.maxskew, self.skewsteps)

        # estimate low and high thresholds(200ms)
        flat = self.estimate_thresholds(flat, self.bignore, self.escale, self.lo, self.hi)
        flat=np.asarray(flat*255,np.uint8)

        return flat



if __name__=="__main__":
    pp=Pre_Process()
    image=cv2.imread("0020_0022.png",0)
    image=image/255.0  # promote to float so the in-place operations below work

    start = time.time()
    flat=pp.process(image)
    print("time:",time.time()-start)

    cv2.imwrite("gray.jpg", flat)
    cv2.imwrite("binary.jpg", 255*(flat>128))

In ocropus3, the preprocessing above is replaced by a network built with PyTorch. It handles noise particularly well, runs faster than the traditional method, and the model is very small, only about 26 KB. The network is built around a 2-D LSTM.

MDLSTM structure:

Put simply, a horizontal one-dimensional LSTM is run across each row of the plane, followed by a vertical one-dimensional LSTM down each column.

The network structure is as follows:

Sequential(
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True)
  (2): ReLU()
  (3): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(8, 4, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(8, 4, bidirectional=1)
    )
  )
  (4): Conv2d(8, 1, kernel_size=(1, 1), stride=(1, 1))
  (5): Sigmoid()
)
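The LSTM2 block in this listing runs a row-wise ("hlstm") pass followed by a column-wise ("vlstm") pass. The scan order can be sketched in NumPy with a toy linear recurrence standing in for each LSTM (the recurrence and the `alpha` decay are illustrative only, not the real gate equations):

```python
import numpy as np

def scan_rows(x, alpha=0.5):
    """Toy 1-D recurrence along the width axis: h[j] = alpha*h[j-1] + x[j].
    Stands in for a horizontal (row-wise) LSTM."""
    h = np.zeros_like(x)
    for j in range(x.shape[-1]):
        h[..., j] = (alpha * h[..., j - 1] if j > 0 else 0) + x[..., j]
    return h

def lstm2_sketch(x):
    """Separable 2-D recurrence over an (H, W) feature map:
    first scan every row left-to-right, then scan every column
    top-to-bottom (a column scan is a row scan of the transpose)."""
    h = scan_rows(x)          # horizontal pass ("hlstm")
    v = scan_rows(h.T).T      # vertical pass ("vlstm")
    return v

x = np.ones((3, 4))
y = lstm2_sketch(x)
```

Because the two passes are separable, every output position sees context from its whole row and (through the second pass) from the rows above it, which is what lets this block propagate page-level context cheaply.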

The inference script:

import ocrobin
import cv2
import numpy as np
import time

# load a trained binarization checkpoint
bm = ocrobin.Binarizer("bin-000000046-005393.pt")
image = np.mean(cv2.imread("0020_0022.png")[:, :, :3], 2)
start = time.time()
binary = bm.binarize(image)
print("time:", time.time() - start)
print(np.max(binary), np.min(binary))
gray = (1 - binary) * 255
binary = (binary < 0.5) * 255
cv2.imwrite("gray.png", gray)
cv2.imwrite("bin.png", binary)

ocrorot:

This is ocropus3's rotation and skew correction module. Rotation correction handles the four page orientations: 0, 90, 180, and 270 degrees. Skew correction handles angles smaller than 90 degrees. Compared with ocropy, which can only correct small skew angles, ocrorot is considerably more practical. Both corrections are implemented with neural networks.

The rotation-correction network:

Sequential(
  (0): CheckSizes [(1, 128), (1, 512), (256, 256), (256, 256)]
  (1): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (2): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True)
  (3): ReLU()
  (4): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (5): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
  (7): ReLU()
  (8): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (9): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (11): ReLU()
  (12): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (13): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
  (15): ReLU()
  (16): Img2FlatSum
  (17): Linear(in_features=64, out_features=64, bias=True)
  (18): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True)
  (19): ReLU()
  (20): Linear(in_features=64, out_features=4, bias=True)
  (21): Sigmoid()
  (22): CheckSizes [(1, 128), (4, 4)]
)

The skew-correction network:

Sequential(
  (0): CheckSizes [(1, 128), (1, 512), (256, 256), (256, 256)]
  (1): Conv2d(1, 8, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (2): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True)
  (3): ReLU()
  (4): Spectrum
  (5): Conv2d(8, 4, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (6): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True)
  (7): ReLU()
  (8): Reshape((0, [1, 2, 3]))
  (9): Linear(in_features=262144, out_features=128, bias=True)
  (10): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True)
  (11): ReLU()
  (12): Linear(in_features=128, out_features=30, bias=True)
  (13): Sigmoid()
  (14): CheckSizes [(1, 128), (30, 30)]
)
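A sketch of how the two heads' outputs might be turned into an actual correction: the 4-way head indexes a multiple of 90 degrees, and the 30-bin skew head maps to a small angle. The class ordering and the bin-to-angle mapping are assumptions made for illustration, not ocrorot's documented conventions:

```python
import numpy as np

def correct_orientation(image, rot_probs):
    """Rotate the page back according to the 4-way orientation head.
    rot_probs: scores for [0, 90, 180, 270] degrees (assumed order)."""
    k = int(np.argmax(rot_probs))   # number of 90-degree steps detected
    return np.rot90(image, k=-k)    # undo the detected rotation

def decode_skew(skew_probs, max_skew=2.0):
    """Map the 30-bin skew head to an angle in [-max_skew, +max_skew].
    The bin layout is an assumption for illustration."""
    bins = np.linspace(-max_skew, max_skew, len(skew_probs))
    return bins[int(np.argmax(skew_probs))]

img = np.arange(12).reshape(3, 4)
rot = correct_orientation(img, [0.1, 0.7, 0.1, 0.1])  # 90 degrees detected
angle = decode_skew(np.eye(30)[15])
```

The decoded skew angle would then be passed to an ordinary affine rotation, as in the OpenCV preprocessing code earlier.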

ocrodeg:

This is another ocropus3 module; it handles augmentation of the training data. It covers page rotation, random geometric transformations, random distortions, ruled-surface distortions, blur, thresholding, noise, multiscale noise, random blobs, fibrous noise, foreground/background selection, and more.
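One of these degradations, random blobs, can be sketched by thresholding smoothed white noise. This is a hand-rolled illustration; the function and parameter names (`random_blobs`, `blob_scale`, `density`) are not ocrodeg's API:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def random_blobs(shape, blob_scale=4.0, density=0.1, rng=None):
    """Smooth white noise with a Gaussian and keep its top `density`
    fraction, producing blob-shaped speckle to mix into a binary page."""
    rng = np.random.default_rng(rng)
    noise = gaussian_filter(rng.standard_normal(shape), blob_scale)
    thresh = np.quantile(noise, 1.0 - density)
    return (noise > thresh).astype(np.uint8)

page = np.zeros((64, 64), np.uint8)
blobs = random_blobs(page.shape, rng=0)
degraded = page ^ blobs   # flip the page pixels under each blob
```

Varying `blob_scale` changes the spatial size of the speckle, which is the same knob the "multiscale noise" degradation turns across several scales at once.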

ocropus-gpageseg,ocroseg:

This module is responsible for line segmentation: estimating the text scale, splitting the page into lines, and computing reading order. To split lines, it first removes black underlines and other interfering rules in the image, then finds column separators, and finally finds lines using connected components.

In ocropus3, this module is implemented with a CNN; the highlight again is the 2-D LSTM. The network detects the center line of each line of text, and a text box is then formed by taking a fixed height above and below the detected center line.
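The center-line-to-box step can be sketched with connected components on a thresholded center-line map. This illustrates the idea, not ocroseg's actual code; `half_height` is a made-up parameter:

```python
import numpy as np
from scipy.ndimage import label, find_objects

def lines_to_boxes(centerline_map, half_height=10, threshold=0.5):
    """Turn a predicted center-line probability map into text boxes:
    binarize, take connected components, and grow each component a
    fixed height up and down. Returns (y0, y1, x0, x1) tuples."""
    mask = centerline_map > threshold
    labels, n = label(mask)
    boxes = []
    for sl in find_objects(labels):
        y, x = sl
        boxes.append((max(y.start - half_height, 0),
                      y.stop + half_height,
                      x.start, x.stop))
    return boxes

# two fake horizontal center lines on a 100x200 page
cmap = np.zeros((100, 200))
cmap[30, 20:180] = 1.0
cmap[70, 20:180] = 1.0
boxes = lines_to_boxes(cmap)   # one box per detected line
```

A fixed extraction height is crude but works because the subsequent dewarp/normalization stage rescales every line crop to a uniform height anyway.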

The network structure is as follows :

Sequential(
  (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
  (2): ReLU()
  (3): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (6): ReLU()
  (7): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (8): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
  (10): ReLU()
  (11): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
  )
  (12): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
  (13): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (14): ReLU()
  (15): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(32, 32, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
  )
  (16): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1))
  (17): Sigmoid()
)

A TensorFlow reimplementation by this author: https://github.com/watersink/ocrsegment

ocropus-dewarp:

This performs the dewarp operation on line images. First, Gaussian and uniform filtering are applied to the line image, and the row of the maximum value in each column is taken as the vertical center of the text in that column. The mean distance between the text pixels and this center line, multiplied by the range parameter, gives half of the extraction height. The line image is then padded top and bottom with background pixels; for each column, a strip of that half-height centered on the detected center is extracted, and finally the result is affine-transformed to the specified height.

from __future__ import print_function
import os
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import interpolation,filters

def scale_to_h(img,target_height,order=1,dtype=np.dtype('f'),cval=0):
    h,w = img.shape
    scale = target_height*1.0/h
    target_width = int(scale*w)
    output = interpolation.affine_transform(1.0*img,np.eye(2)/scale,order=order,
                                            output_shape=(target_height,target_width),
                                            mode='constant',cval=cval)
    output = np.array(output,dtype=dtype)
    return output

class CenterNormalizer:
    def __init__(self,target_height=48,params=(4,1.0,0.3)):
        self.debug = int(os.getenv("debug_center") or "0")
        self.target_height = target_height
        self.range,self.smoothness,self.extra = params
    def setHeight(self,target_height):
        self.target_height = target_height
    def measure(self,line):
        h,w = line.shape
        h=float(h)
        w=float(w)
        smoothed = filters.gaussian_filter(line,(h*0.5,h*self.smoothness),mode='constant')
        smoothed += 0.001*filters.uniform_filter(smoothed,(h*0.5,w),mode='constant')
        self.shape = (h,w)
        a = np.argmax(smoothed,axis=0)
        a = filters.gaussian_filter(a,h*self.extra)
        self.center = np.array(a,'i')
        deltas = np.abs(np.arange(h)[:,np.newaxis]-self.center[np.newaxis,:])
        self.mad = np.mean(deltas[line!=0])
        self.r = int(1+self.range*self.mad)
        if self.debug:
            plt.figure("center")
            plt.imshow(line,cmap=plt.cm.gray)
            plt.plot(self.center)
            plt.ginput(1,1000)
    def dewarp(self,img,cval=0,dtype=np.dtype('f')):
        assert img.shape==self.shape
        h,w = img.shape
        # The actual image img is embedded into a larger image by
        # adding vertical space on top and at the bottom (padding)
        hpadding = self.r # this is large enough
        padded = np.vstack([cval*np.ones((hpadding,w)),img,cval*np.ones((hpadding,w))])
        center = self.center + hpadding
        dewarped = [padded[center[i]-self.r:center[i]+self.r,i] for i in range(w)]
        dewarped = np.array(dewarped,dtype=dtype).T
        return dewarped
    def normalize(self,img,order=1,dtype=np.dtype('f'),cval=0):
        dewarped = self.dewarp(img,cval=cval,dtype=dtype)
        h,w = dewarped.shape
        scaled = scale_to_h(dewarped,self.target_height,order=order,dtype=dtype,cval=cval)
        return scaled

if __name__=="__main__":
    cn=CenterNormalizer()
    import cv2
    image=cv2.imread("20180727122251.png",0)
    image=(image>128)*255
    image=255-image

    image=np.float32(image)
    cn.measure(image)
    scaled=cn.normalize(image)
    print(np.max(scaled),np.min(scaled))
    cv2.imwrite("scaled.png",255-scaled)

This dewarp step still offers some benefit when the training data is not very diverse. With highly diverse training data the gain may be small, and operations such as a spatial transformer network (STN) may work even better.

ocropus-linegen:

This module generates training line images and their labels from a corpus (The Adventures of Tom Sawyer) and a font file (DejaVuSans.ttf). The script is well written and definitely useful in practice.

from __future__ import print_function

import random as pyrandom
import glob
import sys
import os
import re
import codecs
import traceback
import argparse

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from PIL import ImageFont,ImageDraw
from scipy.ndimage import filters,measurements,interpolation
from scipy.misc import imsave  # removed in modern SciPy; use imageio.imwrite there

replacements = [
    (u'[_~#]',u"~"), # OCR control characters
    (u'"',u"''"), # typewriter double quote
    (u"`",u"'"), # grave accent
    (u'[“”]',u"''"), # fancy quotes
    (u"´",u"'"), # acute accent
    (u"[‘’]",u"'"), # left single quotation mark
    (u"[“”]",u"''"), # right double quotation mark
    (u"“",u"''"), # German quotes
    (u"„",u",,"), # German quotes
    (u"…",u"..."), # ellipsis
    (u"′",u"'"), # prime
    (u"″",u"''"), # double prime
    (u"‴",u"'''"), # triple prime
    (u"〃",u"''"), # ditto mark
    (u"µ",u"μ"), # replace micro unit with greek character
    (u"[–—]",u"-"), # variant length hyphens
    (u"fl",u"fl"), # expand Unicode ligatures
    (u"fi",u"fi"),
    (u"ff",u"ff"),
    (u"ffi",u"ffi"),
    (u"ffl",u"ffl"),
]

import unicodedata
def normalize_text(s):
    """Apply standard Unicode normalizations for OCR.
    This eliminates common ambiguities and weird unicode
    characters."""
    #s = unicode(s)
    s = unicodedata.normalize('NFC',s)
    s = re.sub(r'\s+(?u)',' ',s)
    s = re.sub(r'\n(?u)','',s)
    s = re.sub(r'^\s+(?u)','',s)
    s = re.sub(r'\s+$(?u)','',s)
    for m,r in replacements:
        s = re.sub((m),(r),s)
        #s = re.sub(unicode(m),unicode(r),s)
    return s

parser = argparse.ArgumentParser(description = "Generate text line training data")
parser.add_argument('-o','--base',default='linegen',help='output directory, default: %(default)s')
parser.add_argument('-r','--distort',type=float,default=1.0)
parser.add_argument('-R','--dsigma',type=float,default=20.0)
parser.add_argument('-f','--fonts',default="tests/DejaVuSans.ttf")
parser.add_argument('-F','--fontlist',default=None)
parser.add_argument('-t','--texts',default="tests/tomsawyer.txt")
parser.add_argument('-T','--textlist',default=None)
parser.add_argument('-m','--maxlines',default=200,type=int,
    help='max # lines for each directory, default: %(default)s')
parser.add_argument('-e','--degradations',default="lo",
    help="lo, med, or hi; or give a file, default: %(default)s")
parser.add_argument('-j','--jitter',default=0.5)
parser.add_argument('-s','--sizes',default="40-70")
parser.add_argument('-d','--display',action="store_true")
parser.add_argument('--numdir',action="store_true")
parser.add_argument('-C','--cleanup',default='[_~#]')
parser.add_argument('-D','--debug_show',default=None,
    help="select a class for stepping through")
args = parser.parse_args()

if "-" in args.sizes:
    lo,hi = args.sizes.split("-")
    sizes = range(int(lo),int(hi)+1)
else:
    sizes = [int(x) for x in args.sizes.split(",")]


if args.degradations=="lo":
    # sigma +/-   threshold +/-
    deglist = """
    0.5 0.0   0.5 0.0
    """
elif args.degradations=="med":
    deglist = """
    0.5 0.0   0.5 0.05
    1.0 0.3   0.4 0.05
    1.0 0.3   0.5 0.05
    1.0 0.3   0.6 0.05
    """
elif args.degradations=="hi":
    deglist = """
    0.5 0.0   0.5 0.0
    1.0 0.3   0.4 0.1
    1.0 0.3   0.5 0.1
    1.0 0.3   0.6 0.1
    1.3 0.3   0.4 0.1
    1.3 0.3   0.5 0.1
    1.3 0.3   0.6 0.1
    """
elif args.degradations is not None:
    with open(args.degradations) as stream:
        deglist = stream.read()

degradations = []
for deg in deglist.split("\n"):
    deg = deg.strip()
    if deg=="": continue
    deg = [float(x) for x in deg.split()]
    degradations.append(deg)

if args.fonts is not None:
    fonts = []
    for pat in args.fonts.split(':'):
        if pat=="": continue
        fonts += sorted(glob.glob(pat))
elif args.fontlist is not None:
    with open(args.fontlist) as fh:
        lines = (line.strip() for line in fh)
        fonts = [line for line in lines if line]
else:
    print("use -f or -F arguments to specify fonts")
    sys.exit(1)
assert len(fonts)>0,"no fonts?"
print("fonts", fonts)

if args.texts is not None:
    texts = []
    for pat in args.texts.split(':'):
        print(pat)
        if pat=="": continue
        texts += sorted(glob.glob(pat))
elif args.textlist is not None:
    texts = re.split(r'\s*\n\s*',open(args.textlist).read())
else:
    print("use -t or -T arguments to specify texts")
    sys.exit(1)
assert len(texts)>0,"no texts?"

lines = []
for text in texts:
    print("# reading", text)
    with codecs.open(text,'r','utf-8') as stream:
        for line in stream.readlines():
            line = line.strip()
            line = re.sub(args.cleanup,'',line)
            if len(line)<1: continue
            lines.append(line)
print("got", len(lines), "lines")
assert len(lines)>0
lines = list(set(lines))
print("got", len(lines), "unique lines")


def rgeometry(image,eps=0.03,delta=0.3):
    m = np.array([[1+eps*np.random.randn(),0.0],[eps*np.random.randn(),1.0+eps*np.random.randn()]])
    w,h = image.shape
    c = np.array([w/2.0,h/2])
    d = c-np.dot(m,c)+np.array([np.random.randn()*delta,np.random.randn()*delta])
    return interpolation.affine_transform(image,m,offset=d,order=1,mode='constant',cval=image[0,0])


def rdistort(image,distort=3.0,dsigma=10.0,cval=0):
    h,w = image.shape
    hs = np.random.randn(h,w)
    ws = np.random.randn(h,w)
    hs = filters.gaussian_filter(hs,dsigma)
    ws = filters.gaussian_filter(ws,dsigma)
    hs *= distort/np.amax(hs)
    ws *= distort/np.amax(ws)
    def f(p):
        return (p[0]+hs[p[0],p[1]],p[1]+ws[p[0],p[1]])
    return interpolation.geometric_transform(image,f,output_shape=(h,w),
        order=1,mode='constant',cval=cval)

if args.debug_show:
    plt.ion()
    plt.gray()


base = args.base
print("base", base)
if os.path.exists(base)==False:
    os.mkdir(base)


def crop(image,pad=1):
    [[r,c]] = measurements.find_objects(np.array(image==0,'i'))
    r0 = r.start
    r1 = r.stop
    c0 = c.start
    c1 = c.stop
    image = image[r0-pad:r1+pad,c0-pad:c1+pad]
    return image

last_font = None
last_size = None
last_fontfile = None


def genline(text,fontfile=None,size=36,sigma=0.5,threshold=0.5):
    global image,draw,last_font,last_fontfile
    if last_fontfile!=fontfile or last_size!=size:
        last_font = ImageFont.truetype(fontfile,size)
        last_fontfile = fontfile
    font = last_font
    image = Image.new("L",(6000,200))
    draw = ImageDraw.Draw(image)
    draw.rectangle((0,0,6000,6000),fill="white")
    # print("\t", size, font)
    draw.text((250,20),text,fill="black",font=font)
    a = np.asarray(image,'f')
    a = a*1.0/np.amax(a)
    if sigma>0.0:
        a = filters.gaussian_filter(a,sigma)
    a += np.clip(np.random.randn(*a.shape)*0.2,-0.25,0.25)
    a = rgeometry(a)
    a = np.array(a>threshold,'f')
    a = crop(a,pad=3)
    # FIXME add grid warping here
    # clf(); ion(); gray(); imshow(a); ginput(1,0.1)
    del draw
    del image
    return a

lines_per_size = args.maxlines//len(sizes)
for pageno,font in enumerate(fonts):
    if args.numdir:
        pagedir = "%s/%04d"%(base,pageno+1)
    else:
        fbase = re.sub(r'^[./]*','',font)
        fbase = re.sub(r'[.][^/]*$','',fbase)
        fbase = re.sub(r'[/]','_',fbase)
        pagedir = "%s/%s"%(base,fbase)
    if os.path.exists(pagedir)==False:
        os.mkdir(pagedir)
    print("===", pagedir, font)
    lineno = 0
    while lineno<args.maxlines:
        (sigma,ssigma,threshold,sthreshold) = pyrandom.choice(degradations)
        sigma += (2*np.random.rand()-1)*ssigma
        threshold += (2*np.random.rand()-1)*sthreshold
        line = pyrandom.choice(lines)
        size = pyrandom.choice(sizes)
        with open(pagedir+".info","w") as stream:
            stream.write("%s\n"%font)
        try:
            image = genline(text=line,fontfile=font,
                size=size,sigma=sigma,threshold=threshold)
        except:
            traceback.print_exc()
            continue
        if np.amin(image.shape)<10: continue
        if np.amax(image)<0.5: continue
        if args.distort>0:
            image = rdistort(image,args.distort,args.dsigma,cval=np.amax(image))
        if args.display:
            plt.gray()
            plt.clf()
            plt.imshow(image)
            plt.ginput(1,0.1)
        fname = pagedir+"/01%04d"%lineno
        imsave(fname+".bin.png",image)
        gt = normalize_text(line)
        with codecs.open(fname+".gt.txt","w",'utf-8') as stream:
            stream.write(gt+"\n")
        print("%5.2f %5.2f %3d\t%s" % (sigma, threshold, size, line))
        lineno += 1

dlinputs:

The data I/O module. Its main advantages:

  1. Pure Python
  2. Works with any deep learning framework
  3. Supports very large datasets
  4. Supports streaming data
  5. Supports map-reduce and distributed data augmentation
  6. Supports tar, tfrecords, and other formats
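The streaming, composable style that dlinputs advocates can be illustrated with plain Python generators. This is a sketch of the idea only, not dlinputs' actual API:

```python
def read_samples(paths):
    """Source stage: yield one sample dict per input (fake data here)."""
    for p in paths:
        yield {"__key__": p, "text": "line from %s" % p}

def transform(stream, fn):
    """Map stage: apply fn to every sample as it flows through."""
    for sample in stream:
        yield fn(sample)

def batched(stream, n):
    """Grouping stage: collect samples into lists of size n."""
    batch = []
    for sample in stream:
        batch.append(sample)
        if len(batch) == n:
            yield batch
            batch = []
    if batch:
        yield batch

pipeline = batched(
    transform(read_samples(["a.png", "b.png", "c.png"]),
              lambda s: dict(s, text=s["text"].upper())),
    n=2)
batches = list(pipeline)
```

Because every stage is a generator, nothing is held in memory beyond the current sample, which is what makes this style work on datasets far larger than RAM.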

ocropus-rtrain,ocropus-ltrain,dltrainer:

This is the training module; training runs on the CPU. The model is a multi-layer perceptron (MLP) trained with CTC loss, and training is fast.

dltrainer is the corresponding training module for ocropus3.
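Since both generations train with CTC loss, decoding at inference time collapses repeated argmax labels and removes the blank symbol. A minimal greedy (best-path) CTC decoder looks like this:

```python
import numpy as np

def ctc_greedy_decode(logits, blank=0):
    """Best-path CTC decoding: take the argmax class at every timestep,
    collapse consecutive repeats, then drop the blank symbol."""
    path = np.argmax(logits, axis=1)
    out, prev = [], blank
    for p in path:
        if p != prev and p != blank:
            out.append(int(p))
        prev = p
    return out

# 6 timesteps, 4 classes (0 = blank); spells class sequence [2, 3, 3]
logits = np.array([
    [0.1, 0.0, 0.9, 0.0],   # 2
    [0.1, 0.0, 0.9, 0.0],   # 2 (repeat, collapsed)
    [0.9, 0.0, 0.0, 0.1],   # blank
    [0.0, 0.1, 0.0, 0.9],   # 3
    [0.9, 0.0, 0.0, 0.1],   # blank (separates the two 3s)
    [0.0, 0.1, 0.0, 0.9],   # 3
])
decoded = ctc_greedy_decode(logits)   # [2, 3, 3]
```

The blank between the two 3s is what lets CTC represent doubled characters; greedy decoding is the simplest decoder, while ocropy's OpenFST path performs a weighted search over alternatives instead.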

ocropus-rpred,ocropus-lpred,ocroline:

This module runs the trained model for prediction (testing).

In ocropus3, the recognition module is a convolutional network; the loss is still CTC loss.

Network structure :

Sequential(
  (0): Reorder BHWD->BDHW
  (1): CheckSizes [(0, 900), (1, 1), (48, 48), (0, 9000)]
  (2): Conv2d(1, 100, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True)
  (4): ReLU()
  (5): MaxPool2d(kernel_size=(2, 1), stride=(2, 1), dilation=(1, 1), ceil_mode=False)
  (6): Conv2d(100, 200, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): BatchNorm2d(200, eps=1e-05, momentum=0.1, affine=True)
  (8): ReLU()
  (9): Reshape((0, [1, 2], 3))
  (10): CheckSizes [(0, 900), (0, 5000), (0, 9000)]
  (11): LSTM1:LSTM(4800, 200, bidirectional=1)
  (12): Conv1d(400, 97, kernel_size=(1,), stride=(1,))
  (13): Reorder BDW->BWD
  (14): CheckSizes [(0, 900), (0, 9000), (97, 97)]
)

ocropus-hocr,ocropus-gtedit,ocropus-visualize-results:

Modules that render the recognition results as HTML.

ocropus-econf,ocropus-errs:

Modules that compute the error rate, miss rate, and accuracy.
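The usual metric these tools report is the character error rate: edit distance between prediction and ground truth, divided by the ground-truth length. A minimal sketch (not ocropus-errs' actual implementation):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences
    (insertions, deletions, substitutions), via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def char_error_rate(truth, pred):
    """CER = edit distance / length of the ground truth."""
    return edit_distance(truth, pred) / max(len(truth), 1)

cer = char_error_rate("ocropus", "ocr0pus")   # one substitution
```

ocropus-econf additionally breaks the errors down into a confusion table, but the distance computation underneath is the same idea.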

Summary:

  1. ocropy implements the network's forward and backward passes in pure Python, with no dependency on a third-party neural-network framework; it supports training your own models and requires Python 2.
  2. ocropy2 and ocropus3 depend on PyTorch.
  3. ocropus3 splits the system into separate modules with little coupling between them.

Publisher: Full Stack Programmer. Reprints should credit the source: https://javaforall.cn/126699.html
