基于openpose和图像分类的手语识别项目

Overview

手语识别


0、使用到的模型

(1). openpose,作者:CMU-Perceptual-Computing-Lab

https://github.com/CMU-Perceptual-Computing-Lab/openpose

(2). 图像分类classification,作者:Bubbliiiing

https://github.com/bubbliiiing/classification-pytorch

B站对应视频:https://www.bilibili.com/video/BV143411B7wg

(3). 手语教学视频,作者:二碳碳

https://www.bilibili.com/video/BV1XE41137LV

(感谢大佬们的开源项目和教程,都已star加三连)




1、大致思路

方法一: 将视频输入到openpose中,检测出关节点的变化轨迹,将轨迹绘制在一张图片上,把这张图片传到图像分类网络中检测属于哪个动作

视频  ----->  |  openpose  |-----> 关节点运动轨迹图-------> |  图像分类模型  | ----------> 单词分类  

方法二: 将视频输入到openpose中,检测出每一帧中关节点的位置,将多帧进行堆叠,形成一个三维张量,其中两个维度是图片的宽和高,一个维度是时间,然后对这个三维张量使用三维卷积进行训练和预测

视频  ----->  |  openpose  |-----> 多张关节点位置图 ---------> |  堆叠  | --------> 三维张量 -------> |  三维卷积网络  | ----------> 单词分类  



2、环境配置

python:3.7(其他版本会导致openpose无法运行,建议使用anaconda的python环境)
cuda:10
cudnn:7或8应该都行
(配置cuda和cudnn会比较麻烦,如果实在不想配,你可以去openpose的github网站下载使用cpu的版本,这里这个版本应该不支持cpu)


具体的配置环境方式:

(0).python和cuda和cudnn自己装


(1).下载文件

下载代码文件后,再从网盘下载模型和数据文件(没有这些跑不起来),网盘链接:

链接:https://pan.baidu.com/s/1Q2aVVhMhSfWL4qKS9QslkQ 
提取码:abcd 

将从github下载的文件夹和网盘下载的文件夹合并,然后就可以下一步了。 (当然你大可直接找我要u盘拿完整的文件)


(2).安装requirements.txt中的库

cmd进入环境后,cd到项目文件夹下,执行指令:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

(3).安装torch和torchvision

先下载好torch(1.2.0)和torchvision(0.4.0)的whl文件,下载地址:

链接:https://pan.baidu.com/s/1QIuJfEE5qQFpXY8ZlHeLNQ 
提取码:abcd 

(当然你依旧可直接找我拿u盘)

下载好torch和torchvision的whl文件后,cmd进入环境,cd到下载文件夹下,执行指令:

pip install [torch或torchvision的whl文件的文件名]

(先装torch再装torchvision,不然有可能会报错)



3、测试运行openpose

项目文件夹下有三个文件:

test.py
test_video.py
test_video_track_point.py

分别对应openpose的功能:检测图片、检测视频、检测视频并绘制关节点轨迹
具体的使用方法可以看文件中的注释部分


在test_video_track_point.py中,取消掉最后几行的注释,就可以将绘制的轨迹图送到classification中去做分类检测
(不过现阶段分类器尚未做好)




4、classification的训练和使用

可以看下classification文件夹中的README.md文件,大佬已经在里边讲得很详细了

CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using tensorflow and keras

简介 基于Tensorflow和Keras实现端到端的不定长中文字符检测和识别 文本检测:CTPN 文本识别:DenseNet + CTC 环境部署 sh setup.sh 注:CPU环境执行前需注释掉for gpu部分,并解开for cpu部分的注释 Demo 将测试图片放入test_images

Yang Chenguang 2.6k Dec 29, 2022
This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.

Handwritten Text Recognition (OCR) with MXNet Gluon These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientis

Amazon Web Services - Labs 422 Jan 03, 2023
Text Detection from images using OpenCV

EAST Detector for Text Detection OpenCV’s EAST(Efficient and Accurate Scene Text Detection ) text detector is a deep learning model, based on a novel

Abhishek Singh 88 Oct 20, 2022
Recognizing cropped text in natural images.

ASTER: Attentional Scene Text Recognizer with Flexible Rectification ASTER is an accurate scene text recognizer with flexible rectification mechanism.

Baoguang Shi 681 Jan 02, 2023
QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021)

QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021) Yuanming Hu, Jiafeng Liu, Xuanda Yang, Mingkuan Xu, Ye Kuang, Weiwei Xu, Qiang Dai, W

Taichi Developers 119 Dec 02, 2022
Geometric Augmentation for Text Image

Text Image Augmentation A general geometric augmentation tool for text images in the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Ne

Canjie Luo 440 Jan 05, 2023
This is a tensorflow re-implementation of PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network.My blog:

PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network Introduction This is a tensorflow re-implementation of PSENet: Shape Robu

Michael liu 498 Dec 30, 2022
Generating .npy dataset and labels out of given image, containing numbers from 0 to 9, using opencv

basic-dataset-generator-from-image-of-numbers generating .npy dataset and labels out of given image, containing numbers from 0 to 9, using opencv inpu

1 Jan 01, 2022
Volume Control using OpenCV

Gesture-Volume-Control Volume Control using OpenCV Here i made volume control using Python and OpenCV in which we can control the volume of our laptop

Mudit Sinha 3 Oct 10, 2021
零样本学习测评基准,中文版

ZeroCLUE 零样本学习测评基准,中文版 零样本学习是AI识别方法之一。 简单来说就是识别从未见过的数据类别,即训练的分类器不仅仅能够识别出训练集中已有的数据类别, 还可以对于来自未见过的类别的数据进行区分。 这是一个很有用的功能,使得计算机能够具有知识迁移的能力,并无需任何训练数据, 很符合现

CLUE benchmark 27 Dec 10, 2022
Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"

CSCBLI Code for our ACL Findings 2021 paper, "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction". Require

Jinpeng Zhang 12 Oct 08, 2022
Python bindings for JIGSAW: a Delaunay-based unstructured mesh generator.

JIGSAW: An unstructured mesh generator JIGSAW is an unstructured mesh generator and tessellation library; designed to generate high-quality triangulat

Darren Engwirda 26 Dec 13, 2022
Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"

TableNet Unofficial implementation of ICDAR 2019 paper : TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from

Jainam Shah 243 Dec 30, 2022
computer vision, image processing and machine learning on the web browser or node.

Image processing and Machine learning labs   computer vision, image processing and machine learning on the web browser or node note Fast Fourier Trans

ryohei tanaka 487 Nov 11, 2022
This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vectors.

Vectorizing color range This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vect

Development Seed 9 Jul 27, 2022
An Implementation of the seglink alogrithm in paper Detecting Oriented Text in Natural Images by Linking Segments

Tips: A more recent scene text detection algorithm: PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link Contents: Introduc

dengdan 484 Dec 07, 2022
scene-linear test images

Scene-Referred Image Collection A collection of OpenEXR Scene-Referred images, encoded as max 2048px width, DWAA 80 compression. All exrs are encoded

Gralk Klorggson 7 Aug 25, 2022
Natural language detection

Detect the language of text. What’s so cool about franc? franc can support more languages(†) than any other library franc is packaged with support for

Titus 3.8k Jan 02, 2023
Train custom VR face tracking parameters

Pal Buddy Guy: The anipal's best friend This is a small script to improve upon the tracking capabilities of the Vive Pro Eye and facial tracker. You c

7 Dec 12, 2021