当前位置:网站首页>[PyTorchVideo Tutorial 01] Quickly implement video action recognition
[PyTorchVideo Tutorial 01] Quickly implement video action recognition
2022-07-30 19:21:00 【CV-Yang Fan】
1 PyTorchVideo介绍
PyTorchVideo是Facebook2021年4月份发布,Mainly for video deep learning applications.
b站:https://www.bilibili.com/video/BV1QT411j7M3
1.1 参考资料:
pytorchvideo官网:https://pytorchvideo.org/
pytorchvideo Github:https://github.com/facebookresearch/pytorchvideo
Tutorials:https://pytorchvideo.org/docs/tutorial_torchhub_inference
深入浅出PyTorch:8.3 PyTorchVideo简介
PyTorchVideo: Deep Learning for Video,你想要的它都有:https://zhuanlan.zhihu.com/p/390909705
PyTorchVideo: A Deep Learning Library for Video Understanding:https://arxiv.org/pdf/2111.09887.pdf
1.2 介绍
近几年来,随着传播媒介和视频平台的发展,视频正在取代图片成为下一代的主流媒体,这也使得有关视频的深度学习模型正在获得越来越多的关注.
然而,有关视频的深度学习模型仍然有着许多缺点:
- 计算资源耗费更多,并且没有高质量的 model zoo,不能像图片一样进行迁移学习和论文复现.
- 数据集处理较麻烦,但没有一个很好的视频处理工具.
- 随着多模态越来越流行,亟需一个工具来处理其他模态.
除此之外,There are also issues such as deployment optimization,为了解决这些问题,Meta推出了PyTorchVideo深度学习库(Contains components such asFigure 1所示).PyTorchVideo 是一个专注于视频理解工作的深度学习库.PytorchVideo 提供了加速视频理解研究所需的可重用、模块化和高效的组件.PyTorchVideo 是使用PyTorch开发的,支持不同的深度学习视频组件,如视频模型、视频数据集和视频特定转换.
Put one before the text beginsdemo,PyTorchVideoDeploy optimization mods through the model(accelerator)It is the first to realize real-time video action recognition on the mobile terminal(基于X3D模型),It is no longer a dream to run video models on the mobile terminal in the future.
PyTorchVideo Real-time video action recognition for mobile
PyTorchVideo A deep learning library for video understanding
3 GPU平台
极链AI:https://cloud.videojj.com/auth/register?inviter=18452&activityChannel=student_invite
镜像快速搭建
4 安装pytorchvideo
cd /home
pip install pytorchvideo
wget https://dl.fbaipublicfiles.com/pyslowfast/dataset/class_names/kinetics_classnames.json
wget https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4
如果archery.mp4无法下载,可以先下载好,然后上传,I have uploaded the video resources to Alibaba Cloud Disk:
https://www.aliyundrive.com/s/xjzfmH3uoFB
我在csdnVideo resources are also uploaded:archery.mp4 行为识别 pytorchvideo demo演示视频(行为识别)
5 demo演示
A video needs to be prepared in advance
开始搭建(使用Notebook,Mainly look at the intermediate steps)
import torch
import json
from torchvision.transforms import Compose, Lambda
from torchvision.transforms._transforms_video import (
CenterCropVideo,
NormalizeVideo,
)
from pytorchvideo.data.encoded_video import EncodedVideo
from pytorchvideo.transforms import (
ApplyTransformToKey,
ShortSideScale,
UniformTemporalSubsample,
UniformCropVideo
)
from typing import Dict
# Device on which to run the model
# Set to cuda to load on GPU
device = "cpu"
# Pick a pretrained model and load the pretrained weights
model_name = "slowfast_r50"
model = torch.hub.load("facebookresearch/pytorchvideo", model=model_name, pretrained=True)
# Set to eval mode and move to desired device
model = model.to(device)
model = model.eval()
with open("kinetics_classnames.json", "r") as f:
kinetics_classnames = json.load(f)
# Create an id to label name mapping
kinetics_id_to_classname = {
}
for k, v in kinetics_classnames.items():
kinetics_id_to_classname[v] = str(k).replace('"', "")
####################
# SlowFast transform
####################
side_size = 256
mean = [0.45, 0.45, 0.45]
std = [0.225, 0.225, 0.225]
crop_size = 256
num_frames = 32
sampling_rate = 2
frames_per_second = 30
alpha = 4
class PackPathway(torch.nn.Module):
""" Transform for converting video frames as a list of tensors. """
def __init__(self):
super().__init__()
def forward(self, frames: torch.Tensor):
fast_pathway = frames
# Perform temporal sampling from the fast pathway.
slow_pathway = torch.index_select(
frames,
1,
torch.linspace(
0, frames.shape[1] - 1, frames.shape[1] // alpha
).long(),
)
frame_list = [slow_pathway, fast_pathway]
return frame_list
transform = ApplyTransformToKey(
key="video",
transform=Compose(
[
UniformTemporalSubsample(num_frames),
Lambda(lambda x: x/255.0),
NormalizeVideo(mean, std),
ShortSideScale(
size=side_size
),
CenterCropVideo(crop_size),
PackPathway()
]
),
)
# The duration of the input clip is also specific to the model.
clip_duration = (num_frames * sampling_rate)/frames_per_second
# Load the example video
video_path = "archery.mp4"
# Select the duration of the clip to load by specifying the start and end duration
# The start_sec should correspond to where the action occurs in the video
start_sec = 0
end_sec = start_sec + clip_duration
# Initialize an EncodedVideo helper class
video = EncodedVideo.from_path(video_path)
# Load the desired clip
video_data = video.get_clip(start_sec=start_sec, end_sec=end_sec)
# Apply a transform to normalize the video input
video_data = transform(video_data)
# Move the inputs to the desired device
inputs = video_data["video"]
inputs = [i.to(device)[None, ...] for i in inputs]
# Pass the input clip through the model
preds = model(inputs)
# Get the predicted classes
post_act = torch.nn.Softmax(dim=1)
preds = post_act(preds)
pred_classes = preds.topk(k=5).indices
# Map the predicted classes to the label names
pred_class_names = [kinetics_id_to_classname[int(i)] for i in pred_classes[0]]
print("Predicted labels: %s" % ", ".join(pred_class_names))
处理结果:
Predicted labels: archery, throwing axe, playing paintball, disc golfing, riding or walking with horse
边栏推荐
- MindSpore:自定义dataset的tensor问题
- Range.CopyFromRecordset method (Excel)
- What is the difference between a cloud database and an on-premises database?
- Mysql execution principle analysis
- 【刷题篇】计算质数
- What is a RESTful API?
- The problem of writing go run in crontab does not execute
- VS Code connects to SQL Server
- LocalDate时间生成
- Tensorflow2.0 confusion matrix does not match printing accuracy
猜你喜欢
6 yuan per catty, why do Japanese companies come to China to collect cigarette butts?
Mysql execution principle analysis
牛客刷题系列之进阶版(搜索旋转排序数组,链表内指定区间反转)
SimpleOSS third-party library libcurl and engine libcurl error solution
[Prometheus] An optimization record of the Prometheus federation [continued]
C# wpf 无边框窗口添加阴影效果
VS Code connects to SQL Server
C# wpf borderless window add shadow effect
redis
云数据库和本地数据库有什么区别?
随机推荐
NC | Tao Liang Group of West Lake University - TMPRSS2 "assists" virus infection and mediates the host invasion of Clostridium sothrix hemorrhagic toxin...
VBA batch import Excel data into Access database
【Swords Offer】Swords Offer 17. Print n digits from 1 to the largest
开心的聚餐
不同的路径依赖
深入浅出边缘云 | 3. 资源配置
Recommendation | People who are kind to you, don't repay them by inviting them to eat
kotlin by lazy
OneFlow source code analysis: Op, Kernel and interpreter
MindSpore:Cifar10Dataset‘s num_workers=8, this value is not within the required range of [1, cpu_thr
还有三天忙完
6 yuan per catty, why do Japanese companies come to China to collect cigarette butts?
【flink】报错整理 Could not instantiate the executor. Make sure a planner module is on the classpath
VBA runtime error '-2147217900 (80040e14): Automation error
【MindSpore1.2.0-rc1产品】num_workers问题
JS提升:Promise中reject与then之间的关系
MindSpore:【模型训练】【mindinsight】timeline的时间和实际用时相差很远
LeetCode每日一题(1717. Maximum Score From Removing Substrings)
经济新闻:错误# 15:初始化libiomp5md。dll,但发现libiomp5md。已经初始化dll。解决方法
SwiftUI iOS Boutique Open Source Project Complete Baked Food Recipe App based on SQLite (tutorial including source code)