Code for "Primitive Representation Learning for Scene Text Recognition" (CVPR 2021)

Primitive Representation Learning Network (PREN)

This repository contains the code for our paper accepted at CVPR 2021:

Primitive Representation Learning for Scene Text Recognition

Ruijie Yan, Liangrui Peng, Shanyu Xiao, Gang Yao

For now we only provide code for PREN.

Requirements

  • Python 3.7.9, PyTorch 1.4.0, and torchvision 0.5.0
  • Other libraries can be installed with
pip install -r requirements.txt

Recognition with pretrained model

We provide code for using our pretrained model to recognize text images.

  • The pretrained model can be downloaded from Baidu Netdisk: download_link (key: 2txt)

  • After downloading the pretrained model (pren.pth), put it in the "models" folder.

  • To recognize the three samples in the "samples" folder, run

python recog.py

The expected output is

[Info] Load model from ./models/pren.pth
samples/001.jpg: ronaldo
samples/002.png: leaves
samples/003.jpg: salmon

Training

Two simple steps to train your own model:

  • Modify training configurations in Configs/trainConf.py
  • Run python train.py

To run the training code, set image_dir and train_list to point to your own training data.

image_dir is the root directory of the training data.

train_list is the path of a text file that lists one image per line: the image path (relative to image_dir) followed by its label.

For example, image_dir could be './samples', and train_list could be a text file with the following content:

001.jpg RONALDO
002.png LEAVES
003.jpg SALMON
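
Before training, it can help to check that a train_list file is well formed. The following is a minimal sketch, not part of this repository: the helper names parse_train_list and missing_images are hypothetical, and it assumes that the first run of whitespace on each line separates the image path from the label, as in the example above.

```python
from pathlib import Path
from typing import List, Tuple


def parse_train_list(text: str) -> List[Tuple[str, str]]:
    """Split each non-empty line into (relative_image_path, label)."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the first run of whitespace separates path from label.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(
                f"line {line_no}: expected '<path> <label>', got {line!r}"
            )
        pairs.append((parts[0], parts[1]))
    return pairs


def missing_images(image_dir: str, pairs: List[Tuple[str, str]]) -> List[str]:
    """Return the relative paths that do not exist under image_dir."""
    root = Path(image_dir)
    return [rel for rel, _ in pairs if not (root / rel).is_file()]


example = "001.jpg RONALDO\n002.png LEAVES\n003.jpg SALMON\n"
pairs = parse_train_list(example)
print(pairs)
```

Calling missing_images('./samples', pairs) afterwards would list any entries whose image file cannot be found under image_dir.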

Evaluation

Evaluation works the same way as training: modify Configs/testConf.py and run python test.py to evaluate a model.
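
The metric computed by test.py is not described here. As an illustration only, a common measure in scene text recognition is word accuracy, the fraction of images whose predicted string matches the label exactly. The sketch below assumes a case-insensitive comparison, since the sample predictions above are lowercase while the example labels are uppercase; the function name word_accuracy is hypothetical and is not taken from this repository.

```python
from typing import Sequence


def word_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions equal to their label, ignoring case."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0  # convention chosen for this sketch: empty input scores 0
    correct = sum(p.lower() == g.lower() for p, g in zip(predictions, labels))
    return correct / len(labels)


preds = ["ronaldo", "leaves", "salmon"]
labels = ["RONALDO", "LEAVES", "SALMON"]
print(word_accuracy(preds, labels))
```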

Acknowledgement

The EfficientNet code is adapted from EfficientNet-PyTorch and modified to output multi-scale feature maps.

Citation

If you find this project helpful for your research, please cite our paper:

@inproceedings{yan2021primitive,
  author    = {Yan, Ruijie and
               Peng, Liangrui and
               Xiao, Shanyu and
               Yao, Gang},
  title     = {Primitive Representation Learning for Scene Text Recognition},
  booktitle = {CVPR},
  year      = {2021}
}