MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

Supported Python versions: 2.7 and 3.6

MORAN is a network with a rectification mechanism for general scene text recognition. The paper was accepted by Pattern Recognition (2019); a preprint is on arXiv, and the final version is now available.

Here is a brief introduction in Chinese.

Recent Update

  • 2019.03.21: Fixed a bug in Fractional Pickup.
  • Added Python 3 support.

Improvements in MORAN v2:

  • A more stable rectification network for one-stage training
  • The VGG backbone is replaced with ResNet
  • A bidirectional decoder is used (a trick borrowed from ASTER)

Version                          IIIT5K  SVT   IC03  IC13  SVT-P  CUTE80  IC15 (1811)  IC15 (2077)
MORAN v1 (curriculum training)*  91.2    88.3  95.0  92.4  76.1   77.4    74.7         68.8
MORAN v2 (one-stage training)    93.4    88.3  94.2  93.2  79.7   81.9    77.8         73.9

*The results of v1 were reported in our paper. If this project is helpful for your research, please cite our Pattern Recognition paper.
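
To make the v1 versus v2 comparison easier to read, the per-benchmark differences can be computed directly from the table above. This is only a convenience sketch: the numbers are copied from the table, and the names BENCHMARKS, V1, V2 and accuracy_deltas are hypothetical helpers, not part of the repository.

```python
# Accuracy values copied from the results table (percent).
BENCHMARKS = ["IIIT5K", "SVT", "IC03", "IC13", "SVT-P",
              "CUTE80", "IC15 (1811)", "IC15 (2077)"]
V1 = [91.2, 88.3, 95.0, 92.4, 76.1, 77.4, 74.7, 68.8]
V2 = [93.4, 88.3, 94.2, 93.2, 79.7, 81.9, 77.8, 73.9]


def accuracy_deltas(old, new):
    """Return new - old for each benchmark, rounded to one decimal place."""
    return [round(n - o, 1) for o, n in zip(old, new)]


DELTAS = dict(zip(BENCHMARKS, accuracy_deltas(V1, V2)))

for name, delta in DELTAS.items():
    # A leading sign makes gains and losses easy to tell apart.
    print("{:<12} {:+.1f}".format(name, delta))
```

In this comparison v2 is higher on six benchmarks, equal on SVT, and lower on IC03.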

Requirements

(Contributions to MORAN are welcome.)

We recommend using Anaconda to manage your libraries.

Alternatively, install the libraries with pip. (The torch version installed this way may differ from the Anaconda one. Check it carefully and fix any warnings raised during training if necessary.)

    pip install -r requirements.txt

Data Preparation

Convert your own dataset to LMDB format using the tool provided by @Baoguang Shi (run it under Python 2.7).
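
As a rough illustration of what such a conversion produces, the sketch below builds the key/value records of an LMDB text-recognition dataset. It assumes the key layout used by the CRNN dataset tool (image-%09d, label-%09d and num-samples, with 1-based indices); check this against the tool before relying on it. The helper names build_cache and write_lmdb are hypothetical, the code is written for Python 3, and it is not the tool itself. Writing to disk needs the third-party lmdb package.

```python
def build_cache(samples):
    """Map (image_bytes, label) pairs to LMDB key/value records.

    Assumed key layout: image-%09d / label-%09d (1-based) plus num-samples.
    """
    cache = {}
    for index, (image_bytes, label) in enumerate(samples, start=1):
        cache[("image-%09d" % index).encode("utf-8")] = image_bytes
        cache[("label-%09d" % index).encode("utf-8")] = label.encode("utf-8")
    cache[b"num-samples"] = str(len(samples)).encode("utf-8")
    return cache


def write_lmdb(output_path, cache, map_size=1 << 30):
    """Write the records to an LMDB environment at output_path."""
    import lmdb  # third-party dependency, imported only when writing

    env = lmdb.open(output_path, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in cache.items():
            txn.put(key, value)
    env.close()
```

A typical use would read each image file as raw bytes, pair it with its transcription, call build_cache on the list, and pass the result to write_lmdb together with an output directory.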

You can also download the training (NIPS 2014, CVPR 2016) and testing datasets prepared by us.

The raw pictures of testing datasets can be found here.

Training and Testing

Set the paths to your dataset folders in train_MORAN.sh:

	--train_nips path_to_dataset \
	--train_cvpr path_to_dataset \
	--valroot path_to_dataset \

Then start training (decrease the learning rate manually as needed for your task):

	sh train_MORAN.sh

  • Training should take less than 20 s per 100 iterations on a 1080Ti.
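
The timing figure above gives a simple upper bound on wall-clock training time. The sketch below does that arithmetic; the function name max_training_seconds and the iteration count in the example are hypothetical, and the bound only applies to hardware comparable to a 1080Ti.

```python
def max_training_seconds(iterations, seconds_per_100=20.0):
    """Upper bound on training time, given at most 20 s per 100 iterations."""
    return iterations / 100.0 * seconds_per_100


# Hypothetical run length, chosen only for illustration.
example_iterations = 100000
bound = max_training_seconds(example_iterations)
print("{} iterations: under {:.1f} hours".format(example_iterations, bound / 3600.0))
```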

Demo

Download the model parameter file demo.pth.

Put it in the repository root folder, then run demo.py to see more visualizations.

	python demo.py

Citation

@article{cluo2019moran,
  author    = {Canjie Luo and Lianwen Jin and Zenghui Sun},
  title     = {MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition},
  journal   = {Pattern Recognition}, 
  volume    = {90}, 
  pages     = {109--118},
  year      = {2019},
  publisher = {Elsevier}
}

Acknowledgment

This repo is based on @Jieru Mei's crnn.pytorch and @marvis' ocr_attention. Thanks for their contributions.

Attention

This project is free for academic research purposes only.
