当前位置:网站首页>[PyTorchVideo Tutorial 01] Quickly implement video action recognition
[PyTorchVideo Tutorial 01] Quickly implement video action recognition
2022-07-30 19:21:00 【CV-Yang Fan】
1 PyTorchVideo介绍
PyTorchVideo是Facebook2021年4月份发布,Mainly for video deep learning applications.
b站:https://www.bilibili.com/video/BV1QT411j7M3
1.1 参考资料:
pytorchvideo官网:https://pytorchvideo.org/
pytorchvideo Github:https://github.com/facebookresearch/pytorchvideo
Tutorials:https://pytorchvideo.org/docs/tutorial_torchhub_inference
深入浅出PyTorch:8.3 PyTorchVideo简介
PyTorchVideo: Deep Learning for Video,你想要的它都有:https://zhuanlan.zhihu.com/p/390909705
PyTorchVideo: A Deep Learning Library for Video Understanding:https://arxiv.org/pdf/2111.09887.pdf
1.2 介绍
近几年来,随着传播媒介和视频平台的发展,视频正在取代图片成为下一代的主流媒体,这也使得有关视频的深度学习模型正在获得越来越多的关注.
然而,有关视频的深度学习模型仍然有着许多缺点:
- 计算资源耗费更多,并且没有高质量的 model zoo,不能像图片一样进行迁移学习和论文复现.
- 数据集处理较麻烦,但没有一个很好的视频处理工具.
- 随着多模态越来越流行,亟需一个工具来处理其他模态.
除此之外,There are also issues such as deployment optimization,为了解决这些问题,Meta推出了PyTorchVideo深度学习库(Contains components such asFigure 1所示).PyTorchVideo 是一个专注于视频理解工作的深度学习库.PytorchVideo 提供了加速视频理解研究所需的可重用、模块化和高效的组件.PyTorchVideo 是使用PyTorch开发的,支持不同的深度学习视频组件,如视频模型、视频数据集和视频特定转换.
Put one before the text beginsdemo,PyTorchVideoDeploy optimization mods through the model(accelerator)It is the first to realize real-time video action recognition on the mobile terminal(基于X3D模型),It is no longer a dream to run video models on the mobile terminal in the future.
PyTorchVideo Real-time video action recognition for mobile
PyTorchVideo A deep learning library for video understanding
3 GPU平台
极链AI:https://cloud.videojj.com/auth/register?inviter=18452&activityChannel=student_invite
镜像快速搭建
4 安装pytorchvideo
cd /home
pip install pytorchvideo
wget https://dl.fbaipublicfiles.com/pyslowfast/dataset/class_names/kinetics_classnames.json
wget https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4
如果archery.mp4无法下载,可以先下载好,然后上传,I have uploaded the video resources to Alibaba Cloud Disk:
https://www.aliyundrive.com/s/xjzfmH3uoFB
我在csdnVideo resources are also uploaded:archery.mp4 行为识别 pytorchvideo demo演示视频(行为识别)
5 demo演示
A video needs to be prepared in advance
开始搭建(使用Notebook,Mainly look at the intermediate steps)
import torch
import json
from torchvision.transforms import Compose, Lambda
from torchvision.transforms._transforms_video import (
CenterCropVideo,
NormalizeVideo,
)
from pytorchvideo.data.encoded_video import EncodedVideo
from pytorchvideo.transforms import (
ApplyTransformToKey,
ShortSideScale,
UniformTemporalSubsample,
UniformCropVideo
)
from typing import Dict
# Device on which to run the model
# Set to cuda to load on GPU
device = "cpu"
# Pick a pretrained model and load the pretrained weights
model_name = "slowfast_r50"
model = torch.hub.load("facebookresearch/pytorchvideo", model=model_name, pretrained=True)
# Set to eval mode and move to desired device
model = model.to(device)
model = model.eval()
with open("kinetics_classnames.json", "r") as f:
kinetics_classnames = json.load(f)
# Create an id to label name mapping
kinetics_id_to_classname = {
}
for k, v in kinetics_classnames.items():
kinetics_id_to_classname[v] = str(k).replace('"', "")
####################
# SlowFast transform
####################
side_size = 256
mean = [0.45, 0.45, 0.45]
std = [0.225, 0.225, 0.225]
crop_size = 256
num_frames = 32
sampling_rate = 2
frames_per_second = 30
alpha = 4
class PackPathway(torch.nn.Module):
""" Transform for converting video frames as a list of tensors. """
def __init__(self):
super().__init__()
def forward(self, frames: torch.Tensor):
fast_pathway = frames
# Perform temporal sampling from the fast pathway.
slow_pathway = torch.index_select(
frames,
1,
torch.linspace(
0, frames.shape[1] - 1, frames.shape[1] // alpha
).long(),
)
frame_list = [slow_pathway, fast_pathway]
return frame_list
transform = ApplyTransformToKey(
key="video",
transform=Compose(
[
UniformTemporalSubsample(num_frames),
Lambda(lambda x: x/255.0),
NormalizeVideo(mean, std),
ShortSideScale(
size=side_size
),
CenterCropVideo(crop_size),
PackPathway()
]
),
)
# The duration of the input clip is also specific to the model.
clip_duration = (num_frames * sampling_rate)/frames_per_second
# Load the example video
video_path = "archery.mp4"
# Select the duration of the clip to load by specifying the start and end duration
# The start_sec should correspond to where the action occurs in the video
start_sec = 0
end_sec = start_sec + clip_duration
# Initialize an EncodedVideo helper class
video = EncodedVideo.from_path(video_path)
# Load the desired clip
video_data = video.get_clip(start_sec=start_sec, end_sec=end_sec)
# Apply a transform to normalize the video input
video_data = transform(video_data)
# Move the inputs to the desired device
inputs = video_data["video"]
inputs = [i.to(device)[None, ...] for i in inputs]
# Pass the input clip through the model
preds = model(inputs)
# Get the predicted classes
post_act = torch.nn.Softmax(dim=1)
preds = post_act(preds)
pred_classes = preds.topk(k=5).indices
# Map the predicted classes to the label names
pred_class_names = [kinetics_id_to_classname[int(i)] for i in pred_classes[0]]
print("Predicted labels: %s" % ", ".join(pred_class_names))
处理结果:
Predicted labels: archery, throwing axe, playing paintball, disc golfing, riding or walking with horse
边栏推荐
- The problem of writing go run in crontab does not execute
- Win11如何更改默认下载路径?Win11更改默认下载路径的方法
- The use of @ symbol in MySql
- 来了!东方甄选为龙江农产品直播带货
- NXP IMX8QXP replacement DDR model operation process
- 试写C语言三子棋
- 牛客刷题系列之进阶版(组队竞赛,排序子序列,倒置字符串, 删除公共字符,修理牧场)
- MindSpore:【Resolve node failed】解析节点失败的问题
- kotlin的by lazy
- 技术很牛逼,还需要“向上管理”吗?
猜你喜欢
阿里云武林头条活动分享
【Pointing to Offer】Pointing to Offer 22. The kth node from the bottom in the linked list
SwiftUI iOS Boutique Open Source Project Complete Baked Food Recipe App based on SQLite (tutorial including source code)
OneFlow source code analysis: Op, Kernel and interpreter
After 23 years of operation, the former "China's largest e-commerce website" has turned yellow...
OneFlow源码解析:Op、Kernel与解释器
MindSpore:【语音识别】DFCNN网络训练loss不收敛
kotlin的by lazy
经济新闻:错误# 15:初始化libiomp5md。dll,但发现libiomp5md。已经初始化dll。解决方法
牛客刷题系列之进阶版(组队竞赛,排序子序列,倒置字符串, 删除公共字符,修理牧场)
随机推荐
What is the value of biomedical papers? How to translate the papers into Chinese and English?
【刷题篇】计算质数
Talking about Contrastive Learning (Contrastive Learning) the first bullet
NXP IMX8QXP更换DDR型号操作流程
Anaconda Navigator stuck on loading applications
WEBSOCKETPP使用简介+demo
常见链表题及其 Go 实现
NC | Tao Liang Group of West Lake University - TMPRSS2 "assists" virus infection and mediates the host invasion of Clostridium sothrix hemorrhagic toxin...
Correct pose of Vulkan open feature
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.解决方法
跨域问题的解决方法
Scala学习:类和对象
MySQl数据库————DQL数据查询语言
MindSpore:自定义dataset的tensor问题
【Pointing to Offer】Pointing to Offer 18. Delete the node of the linked list
【flink】报错整理 Could not instantiate the executor. Make sure a planner module is on the classpath
深入浅出边缘云 | 3. 资源配置
牛客刷题系列之进阶版(搜索旋转排序数组,链表内指定区间反转)
架构师如何成长
谷歌AlphaFold近日宣称预测出地球上几乎所有蛋白质结构