Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Last update: Oct 15, 2021

Overview

Scene-Text-Detection-with-SPCNET

Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.08605] with tensorflow.

参考代码

网络实现主要借鉴Keras版本的Mask-RCNN,训练数据接口参考了argman/EAST.论文作者在知乎的文章介绍SPCNet.

训练

1、训练数据准备

训练数据放在data/下，训练数据准备在data/icdar.py：

data

icdar2017

Annotaions //image_1.txt
JPEGImages //image_1.jpg
train.txt //存储训练图片的名称，例如：image_1

2、参数修改

修改./train.py中的学习率、batch、模型存储路径等参数，如果需要调整网络参数，在nets/config.py中修改。

3、执行训练

python train.py

代码运行环境：Python2.7 tensorflow-gpu1.13 单张1080Ti

测试

修改demo.py中的模型文件夹路径、测试图片路径，然后执行python demo.py

测试结果：论文中还有一些地方我也不确定，因此目前没有在公开数据集测试。值得注意的是，按照原文中的训练说明，最好在多卡上训练，请加大你的batch size.

值得注意的地方

1、global text segmentation（gts）的训练

计算gts训练时损失函数时，我采用的方法是将feature pyramid的各个level产生的gts分别与全局mask gt计算softmax loss,然后取平均作为Loss_gts。因为没找到与原文关于这一块的描述，因此可能是其他的计算方法：每个level准备不同的mask_gt、将多个level的gts预测融合计算loss等等。感兴趣的可以去问问作者或者自己试试。

2、实现Rescore 时gts的选取

计算predict box对应的pyramid level,然后选取对应的gts计算。还有一种思路是：融合P2,P3,P4,P5的gts，然后计算box rescore.

3、Bounding Box的生成

MASK RCNN中是先对输出的box进行阈值过滤以及NMS，然后将剩余的回归之后的box对应的rois送入mask branch计算mask，目的是减少计算量同时获得更准确的mask。SPCNet为了减小FP与FN,对Inference流程做了修改：先对模型输出的box与mask进行Rescore,然后经过threshold filter，再对剩下的mask求Bounding Box,然后利用Poly NMS减少重叠，输出剩下的。

在目前代码（nets/models.py utils.py）里：是先对模型输出的box与mask进行Rescore,然后经过threshold filter与NMS，再对剩下的mask求Bounding Box,然后直接输出。

Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Related tags

Overview

Scene-Text-Detection-with-SPCNET

参考代码

训练

1、训练数据准备

2、参数修改

3、执行训练

测试

值得注意的地方

1、global text segmentation（gts）的训练

2、实现Rescore 时gts的选取

3、Bounding Box的生成

Owner

color detection using python

Read Japanese manga inside browser with selectable text.

Hiiii this is the Spanish for Linux and win 10 and in the near future the english version of PortScan my new tool on which you can see what ports are Open only with the IP adress.

Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"

A toolbox of scene text detection and recognition

Programa que viabiliza a OCR (Optical Character Reading - leitura óptica de caracteres) de um PDF.

governance proposal to make fei redeemable for eth

EAST for ICPR MTWI 2018 Challenge II (Text detection of network images)

Application that instantly translates sign-language to letters.

Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

Source Code for AAAI 2022 paper "Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching"

Simple app for visual editing of Page XML files

Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

原神风花节自动弹琴辅助

This project is basically to draw lines with your hand, using python, opencv, mediapipe.

A facial recognition program that plays a alarm (mp3 file) when a person i seen in the room. A basic theif using Python and OpenCV

It is a image ocr tool using the Tesseract-OCR engine with the pytesseract package and has a GUI.

An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

Demo for the paper "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation"