caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Overview

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Abstract

This is a caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.

This project is modified from py-R-FCN, and inclined nms and generate rotated box component is imported from EAST project. Thanks for the author's(@zxytim @argman) help. Please cite this paper if you find this useful.

Contents

  1. Abstract
  2. Structor
  3. Installation
  4. Demo
  5. Test
  6. Train
  7. Experiments
  8. Furthermore

Structor

Code structor

.
├── docker-compose.yml
├── docker // docker deps file
├── Dockerfile // docker build file
├── model // model directory
│   ├── caffemodel // trained caffe model
│   ├── icdar15_gt // ICDAR2015 groundtruth
│   ├── prototxt // caffe prototxt file
│   └── imagenet_models // pretrained on imagenet
├── nvidia-docker-compose.yml
├── logs
│   ├── submit // original submit file
│   ├── submit_zip // zip submit file
│   ├── snapshots
│   └── train
│       ├── VGG16.txt.*
│       └── snapshots
├── README.md
├── requirements.txt // python package
├── src
│   ├── cfgs // train config yml
│   ├── data // cache file
│   ├── lib
│   ├── _init_path.py
│   ├── demo.py
│   ├── eval_icdar15.py // eval 2015 icdar dataset F-meaure
│   ├── test_net.py
│   └── train_net.py
├── demo.sh
├── train.sh
├── images // test images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
└── test.sh // test script

Data structor

It should have this basic structure

ICDARdevkit_Root
.
├── ICDAR2013
├── merge_train.txt  // images list contains ICDAR2013+ICDAR2015 train dataset, then raw data augmentation the same as the paper
├── ICDAR2015
│   ├── augmentation // contains all augmented images
│   └── ImageSets/Main/test.txt // ICDAR2015 test images list

Installation

Install caffe

It is highly recommended to use docker to build environment. More about how to configure docker, see Running with Docker If you are familiar with docker, please run

    1. nvidia-docker-compose run --rm --service-ports rrcnn bash
    2. bash ./demo.sh

If you don't familiar with docker, please follow py-R-FCN to install caffe.

Build

    cd src/lib && make
    

Download Model

  1. please download VGG16 pre-trained model on Imagenet, place it to model/imagenet_models/VGG16.v2.caffemodel.
  2. please download VGG16 trained model by this project, place it model/caffemodel/TextBoxes-v2_iter_12w.caffemodel.

Demo

It is recommended to use UNIX socket to support GUI for docker, plesase open another terminal and type:

    xhost + # may be you need it when open a new terminal
    # docker-compose.yml: mount host  volume : /tmp/.X11-unix to docker volume: /tmp/.X11-unix  
    # pass DISPLAY variable to docker container so host X server can display image in docker
    docker exec -it -e DISPLAY=$DISPLAY ${CURRENT_CONTAINER_ID} bash
    bash ./demo.sh

Test

Single Test

    bash ./test.sh

Multi-scale Test

    # please uncomment two lines in src/cfgs/faster_rcnn_end2end.yml
    SCALES: [720, 1200]
    MULTI_SCALES_NOC: True
    # modify src/lib/datasets/icdar.py to find ICDAR2015 test data, please refer to commit @bbac1cf
    # then run
    bash ./test.sh

Train

Train data

  • Mine: ICDAR2013+ICDAR2015 train dataset, and raw data augmentation, at last got 15977 images.
  • Paper: ICDAR2015 + 2000 focused scene text images they collected.

Train commands

  1. Go to ./src/lib/datasets/icdar.py, modify images path to let train.py find merge_train.txt images list.
  2. Remove cache in src/data/*.pkl or you can load cached roidb data of this project, and place it to src/data/
    # Train for RRCNN4-TextBoxes-v2-OHEM
    bash ./train.sh

note: If you use USE_FLIPPED=True&USE_FLIPPED_QUAD=True, you will get almost 31200 roidb.

Experiments

Mine VS Paper

Approaches Anchor Scales Pooled sizes Inclined NMS Test scales(short side) F-measure(Mine VS paper)
R2CNN-2 (4, 8, 16) (7, 7) Y (720) 71.12% VS 68.49%
R2CNN-3 (4, 8, 16) (7, 7) Y (720) 73.10% VS 74.29%
R2CNN-4 (4, 8, 16, 32) (7, 7) Y (720) 74.14% VS 74.36%
R2CNN-4 (4, 8, 16, 32) (7, 7) Y (720, 1200) 79.05% VS 81.80%
R2CNN-5 (4, 8, 16, 32) (7, 7) (11, 3) (3, 11) Y (720) 74.34% VS 75.34%
R2CNN-5 (4, 8, 16, 32) (7, 7) (11, 3) (3, 11) Y (720, 1200) 78.70% VS 82.54%

Appendixes

Approaches Anchor Scales aspect ration Pooled sizes Inclined NMS Test scales(short side) F-measure
R2CNN-4 (4, 8, 16, 32) (0.5, 1, 2) (7, 7) Y (720) 74.36%
R2CNN-4 (4, 8, 16, 32) (0.5, 1, 2) (7, 7) Y (720, 1200) VS 81.80%
R2CNN-4-TextBoxes-OHEM (4, 8, 16, 32) (0.5, 1, 2, 3, 5, 7, 10) (7, 7) Y (720) 76.53%

Furthermore

You can try Resnet-50, Resnet-101 and so on.

Owner
candler
a computer vision worker
candler
A tool combining EasyOCR and LaMa to automatically detect text and replace it with an inpainted background.

EasyLaMa (WIP) This is a tool combining EasyOCR and LaMa to automatically detect text and replace it with an inpainted background. Installation For GP

3 Sep 17, 2022
https://arxiv.org/abs/1904.01941

Character-Region-Awareness-for-Text-Detection- https://arxiv.org/abs/1904.01941 Train You can train SynthText data use python source/train_SynthText.p

DayDayUp 120 Dec 28, 2022
Apply different text recognition services to images of handwritten documents.

Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume

Caltech Library 117 Jan 02, 2023
Rest API Written In Python To Classify NSFW Images.

✨ NSFW Classifier API ✨ Rest API Written In Python To Classify NSFW Images. Fastest Solution If you don't want to selfhost it, there's already an inst

Akshay Rajput 23 Dec 30, 2022
The virtual calculator will be above the live streaming from your camera

The virtual calculator is above the live streaming from my camera usb , the program first detect my hand and in each frame calculate the distance between two finger ,if the distance is lower than the

gasbaoui mohammed al amine 5 Jul 01, 2022
Write-ups for the SwissHackingChallenge2021 CTF.

SwissHackingChallenge 2021 : Write-ups This repository contains a collection of my write-ups for challenges solved during the SwissHackingChallenge (S

Julien Béguin 3 Jun 07, 2021
Convert Text-to Handwriting Using Python

Convert Text-to Handwriting Using Python Description In this project we'll use python library that's "pywhatkit" for converting text to handwriting. t

8 Nov 19, 2022
零样本学习测评基准,中文版

ZeroCLUE 零样本学习测评基准,中文版 零样本学习是AI识别方法之一。 简单来说就是识别从未见过的数据类别,即训练的分类器不仅仅能够识别出训练集中已有的数据类别, 还可以对于来自未见过的类别的数据进行区分。 这是一个很有用的功能,使得计算机能够具有知识迁移的能力,并无需任何训练数据, 很符合现

CLUE benchmark 27 Dec 10, 2022
Um simples projeto para fazer o reconhecimento do captcha usado pelo jogo bombcrypto

CaptchaSolver - LEIA ISSO 😓 Para iniciar o codigo: pip install -r requirements.txt python captcha_solver.py Se você deseja pegar ver o resultado das

Kawanderson 50 Mar 21, 2022
Open Source Computer Vision Library

OpenCV: Open Source Computer Vision Library Resources Homepage: https://opencv.org Courses: https://opencv.org/courses Docs: https://docs.opencv.org/m

OpenCV 65.7k Jan 03, 2023
InverseRenderNet: Learning single image inverse rendering, CVPR 2019.

InverseRenderNet: Learning single image inverse rendering !! Check out our new work InverseRenderNet++ paper and code, which improves the inverse rend

Ye Yu 141 Dec 20, 2022
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

English | 简体中文 Introduction PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and a

27.5k Jan 08, 2023
Um RPG de texto orientado a objetos.

RPG de texto Um RPG de texto orientado a objetos, sem história. Um RPG (Role-playing game) baseado em texto em que você pode viajar para alguns locais

Vinicius 3 Oct 05, 2022
RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

RepMLP RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition Released the code of RepMLP together with an example o

260 Jan 03, 2023
a deep learning model for page layout analysis / segmentation.

OCR Segmentation a deep learning model for page layout analysis / segmentation. dependencies tensorflow1.8 python3 dataset: uw3-framed-lines-degraded-

99 Dec 12, 2022
A selectional auto-encoder approach for document image binarization

The code of this repository was used for the following publication. If you find this code useful please cite our paper: @article{Gallego2019, title =

Javier Gallego 89 Nov 18, 2022
TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://github.com/MhLiao/TextBoxes 基于SSD改进的文本检测算法,textBoxes_note记录了之前整理的笔记。

TextBoxes: A Fast Text Detector with a Single Deep Neural Network Introduction This paper presents an end-to-end trainable fast scene text detector, n

zhangjing1 24 Apr 28, 2022
Play the Namibian game of Owela against a terrible AI. Built using Django and htmx.

Owela Club A Django project for playing the Namibian game of Owela against a dumb AI. Built following the rules described on the Mancala World wiki pa

Adam Johnson 18 Jun 01, 2022
SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

NVIDIA Research Projects 31 Nov 22, 2022
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

Ed Medvedev 933 Dec 29, 2022