Baidu PaddlePaddle BMN temporal action localization framework | Data preparation and training guide (Part 1)
2022-07-07 01:39:00 【Xinxu】
1. Introduction
BMN is a model developed by Baidu and was the winning solution of ActivityNet 2019. It provides an efficient way to generate proposals for video action localization.
In short, temporal action localization takes a video and determines which action occurs from second xxx to second xxx. Compared with action recognition, it must also infer the start time and end time of each action. Two metrics are mainly involved: (1) classification accuracy and (2) IoU with the ground truth (GT).
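The IoU metric mentioned above can be illustrated with a minimal sketch. It assumes each segment is a (start, end) pair in seconds; the function name temporal_iou and the example numbers are hypothetical and are not taken from the project.

```python
def temporal_iou(pred, gt):
    """Return the IoU of two time segments given as (start, end) in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # The overlap is zero when the two segments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: prediction covers 10-20 s, ground truth covers 15-25 s.
# The overlap is 5 s and the union is 15 s.
print(temporal_iou((10.0, 20.0), (15.0, 25.0)))
```

A predicted segment is usually counted as correct only when its IoU with a ground-truth segment exceeds a chosen threshold.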
Project address:
The algorithm has three main stages:
(1) Video understanding
Image features: PP-TSM; audio features: VGGish
(2) Temporal proposal generation
BMN
(3) Action classification and localization
AttentionLSTM
Each stage covers data preparation, training, validation, and export of the inference model.
To set up the environment, install the dependencies listed in requirements.txt; this generally works without problems. It is best to install the latest version of paddlepaddle-gpu.
2. PP-TSM
The dataset is FootballAction, PaddlePaddle's open-source football action dataset.
It is built from match videos of four tournaments: EuroCup2012, EuroCup2016, WorldCup2014, and WorldCup2018. It contains 272 training videos and 25 test videos and supports localization and recognition of 15 football highlight actions. The action categories are: shot, goal, goal with cheering, corner kick, free kick, yellow card, red card, penalty kick, substitution, throw-in, goal kick, kick-off, offside flag, aerial duel replay, and goal replay.
Not all of the PaddlePaddle data is open-sourced in the project; 49 videos in total are publicly available.
(1) Download the dataset
The dataset is downloaded with a bash script located at PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/download_dataset.sh. After granting the file execute permission, you can run it directly. When the download completes, the folder PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/mp4 contains 49 MP4 videos, 78.1 GB in total. The annotation data is provided directly in the project files:
datasets/EuroCup2016/label.json: list of class labels
datasets/EuroCup2016/label_cls8_train.json: training data labels
datasets/EuroCup2016/label_cls8_val.json: validation data labels
datasets/EuroCup2016/url.list: file list for the training data
datasets/EuroCup2016/url_val.list: file list for the validation data
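As a quick sanity check after the download, a sketch like the following counts the videos and sums their size. The helper name summarize_videos is hypothetical, and the size is reported in decimal gigabytes, which may differ slightly from the figure a file manager shows.

```python
from pathlib import Path


def summarize_videos(mp4_dir):
    """Return (number of .mp4 files, total size in GB) for a directory."""
    files = sorted(Path(mp4_dir).glob("*.mp4"))
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes / 1e9


if __name__ == "__main__":
    # Path from the article; adjust it to where the repository was cloned.
    mp4_dir = "PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/mp4"
    if Path(mp4_dir).is_dir():
        count, size_gb = summarize_videos(mp4_dir)
        print(f"{count} videos, {size_gb:.1f} GB")
```

If the count is below 49, some downloads were probably interrupted and the script can be rerun.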
(2) Prepare the data
The first stage requires preparing the PP-TSM training data. ffmpeg must be installed beforehand:
sudo apt install ffmpeg
Then run the following commands:
cd PaddleVideo-develop/applications/FootballAction/datasets/script
python get_frames_pcm.py
This step samples the original video files: images are sampled at 5 frames per second and audio at 16000 Hz. Processing takes a long time. Once it finishes, two new folders are generated:
|-- datasets                # Training datasets and processing scripts
    |-- EuroCup2016         # Dataset
        |-- mp4             # Original .mp4 videos
        |-- frames          # Image frames (new)
        |-- pcm             # Audio PCM (new)
        |-- url.list        # Video list
        |-- label.json      # Ground-truth labels for the original videos
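To make the sampling step more concrete, here is a minimal sketch of ffmpeg invocations that sample frames at 5 fps and audio at 16000 Hz. It is not the content of get_frames_pcm.py: the function name, the frame naming pattern, and the mono setting are assumptions, and the commands are only built, not executed.

```python
def build_ffmpeg_commands(video_path, frames_dir, pcm_path, fps=5, sample_rate=16000):
    """Build ffmpeg argument lists for frame and audio sampling (not executed here)."""
    frame_cmd = [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",        # sample `fps` frames per second
        f"{frames_dir}/%08d.jpg",   # hypothetical frame naming pattern
    ]
    audio_cmd = [
        "ffmpeg", "-y", "-i", video_path,
        "-vn",                      # drop the video stream
        "-acodec", "pcm_s16le",     # 16-bit little-endian PCM samples
        "-f", "s16le",              # raw PCM output, no container header
        "-ac", "1",                 # mono audio (an assumption)
        "-ar", str(sample_rate),    # audio sampling rate in Hz
        pcm_path,
    ]
    return frame_cmd, audio_cmd


frame_cmd, audio_cmd = build_ffmpeg_commands("mp4/example.mp4", "frames/example", "pcm/example.pcm")
print(" ".join(frame_cmd))
print(" ".join(audio_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the output directories exist.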
(3) Process the samples
Convert the sampled data above into the PP-TSM training dataset:
cd PaddleVideo-develop/applications/FootballAction/datasets/script
python get_instance_for_pptsm.py
This step uses the annotations to treat each action interval as a positive sample, writing all frames in the interval to one pkl file. Non-action intervals are treated as negative samples: N intervals are sampled at random, producing N pkl files.
After this step:
|-- datasets                    # Training datasets and processing scripts
    |-- EuroCup2016             # Dataset
        |-- input_for_pptsm     # PP-TSM training data (new)
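The negative sampling idea described above can be sketched as follows. This is an illustration under assumptions, not the logic of get_instance_for_pptsm.py: the function name, the fixed interval length, and the retry limit are all hypothetical.

```python
import random


def sample_negative_intervals(actions, duration, n, length, seed=0, max_tries=1000):
    """Randomly pick up to n intervals of `length` seconds that overlap no action.

    actions: list of (start, end) annotated action intervals, in seconds.
    duration: total video length in seconds.
    """
    if duration < length:
        return []
    rng = random.Random(seed)
    negatives = []
    tries = 0
    while len(negatives) < n and tries < max_tries:
        tries += 1
        start = rng.uniform(0.0, duration - length)
        end = start + length
        # Keep the candidate only if it is disjoint from every action interval.
        if all(end <= a_start or start >= a_end for a_start, a_end in actions):
            negatives.append((start, end))
    return negatives


# Hypothetical annotations: two actions in a 100-second video.
actions = [(10.0, 20.0), (40.0, 50.0)]
for start, end in sample_negative_intervals(actions, duration=100.0, n=5, length=5.0):
    print(f"negative sample: {start:.1f}s to {end:.1f}s")
```

In this sketch the negative intervals may overlap one another; only overlap with annotated actions is ruled out.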
(4) Train PP-TSM
First, download the pretrained weights:
cd PaddleVideo-develop/applications/FootballAction
wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_vd_ssld_v2_pretrained.pdparams
mkdir pretrain
mv ResNet50_vd_ssld_v2_pretrained.pdparams pretrain/ResNet50_vd_ssld_v2_pretrained.pdparams
Open the training configuration file:
PaddleVideo-develop/applications/FootballAction/train_proposal/configs/pptsm_football_v2.0.yaml
Line 5: set the location of the pretrained model just downloaded; use an absolute path.
Lines 17 and 18: the batch size. My GPU is a 2080Ti-8G, so I can only use 4/4.
Line 19: change it to 1.
Line 23: locate the file PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/input_for_pptsm/train.list and write its absolute path.
Line 28: locate PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/input_for_pptsm/val.list and write its absolute path. This is the index file generated in step (3).
Line 33: same as line 28; write the same path.
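Since several of these settings require absolute paths, a small helper can print them ready for pasting into the YAML file. This is a convenience sketch only; the helper name to_absolute and the repo_root value are assumptions about where the repository was cloned.

```python
import os


def to_absolute(repo_root, relative_path):
    """Join a repo-relative path to the repo root and return an absolute path."""
    return os.path.abspath(os.path.join(repo_root, relative_path))


repo_root = "PaddleVideo-develop"  # adjust to where the repository was cloned
for rel in [
    "applications/FootballAction/pretrain/ResNet50_vd_ssld_v2_pretrained.pdparams",
    "applications/FootballAction/datasets/EuroCup2016/input_for_pptsm/train.list",
    "applications/FootballAction/datasets/EuroCup2016/input_for_pptsm/val.list",
]:
    print(to_absolute(repo_root, rel))
```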
For a single GPU, start training with the following command:
python -B -m paddle.distributed.launch --gpus="0" --log_dir=./football/logs_pptsm main.py --validate -c applications/FootballAction/train_proposal/configs/pptsm_football_v2.0.yaml -o output_dir=./football/pptsm
Training takes roughly 3 days and 3 nights. Next, switch the code to inference mode.
Before switching to inference mode, modify the file
PaddleVideo/paddlevideo/modeling/framework/recognizers/recognizer2d.py
and update the __init__ and infer_step functions to the following code:
def __init__(self, backbone=None, head=None):
    super().__init__(backbone=backbone, head=head)
    self.avgpool2d = paddle.nn.AdaptiveAvgPool2D((1, 1), data_format='NCHW')

def infer_step(self, data_batch):
    """Define how the model is going to test, from input to output."""
    imgs = data_batch[0]
    imgs = paddle.reshape_(imgs, [-1] + list(imgs.shape[2:]))
    feature = self.backbone(imgs)
    feat = self.avgpool2d(feature)
    return feat
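To see what the modified infer_step produces, the following sketch traces tensor shapes only, without loading Paddle or a model. The function name is hypothetical, and the 2048 output channels and stride of 32 are assumptions that match a standard ResNet50 backbone.

```python
def trace_infer_shapes(input_shape, feature_channels=2048, stride=32):
    """Trace tensor shapes through the modified infer_step, without running a model.

    input_shape: (N, T, C, H, W) = batch, frames per clip, channels, height, width.
    feature_channels and stride are assumptions matching a standard ResNet50.
    """
    n, t, c, h, w = input_shape
    # reshape_(imgs, [-1] + list(imgs.shape[2:])) merges the batch and frame axes.
    reshaped = (n * t, c, h, w)
    # The backbone downsamples spatially and outputs one feature map per frame.
    feature = (n * t, feature_channels, h // stride, w // stride)
    # AdaptiveAvgPool2D((1, 1)) averages each channel down to a single value.
    pooled = (n * t, feature_channels, 1, 1)
    return reshaped, feature, pooled


# Hypothetical input: one clip of 8 RGB frames at 224x224.
for name, shape in zip(("reshaped", "feature", "pooled"), trace_infer_shapes((1, 8, 3, 224, 224))):
    print(name, shape)
```

The point of the change is that the exported model returns a pooled feature vector per frame instead of classification scores, which is what the later stages consume.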
Run the following from the PaddleVideo root directory:
python tools/export_model.py -c applications/FootballAction/train_proposal/configs/pptsm_football_v2.0.yaml \
-p ./football/pptsm/ppTSM_best.pdparams \
-o ./football/inference_model
This exports the inference model.
(5) Configure PP-TSM
In the file
PaddleVideo/applications/FootballAction/predict/action_detect/models/pptsm_infer.py
replace line 41
self.output_tensor = self.predictor.get_output_handle(output_names[1])
with
self.output_tensor = self.predictor.get_output_handle(output_names[0])
Next, extract the image and audio features. Because the weights just trained are used for feature extraction, the configuration file must be modified:
In the file PaddleVideo-develop/applications/FootballAction/extractor/configs/configs.yaml:
Line 4: set the path of index_label_football_8.json to the absolute path of PaddleVideo-develop/applications/FootballAction/extractor/configs/index_label_football_8.json
Line 13: change the default model weight path to the absolute path of PaddleVideo-develop/football/inference_model/ppTSM.pdmodel
Line 14: change the default parameter file to the absolute path of PaddleVideo-develop/football/inference_model/ppTSM.pdiparams
Line 29: change the audio model weight path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/AUDIO/__model__
Line 30: change the audio model parameter file path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/AUDIO/__param__
Line 38: change the BMN model weight path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/BMN/__model__
Line 39: change the BMN model parameter file path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/BMN/__param__
Line 51: change the LSTM model weight path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/LSTM/__model__
Line 52: change the LSTM model parameter file path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/LSTM/__param__
Then open PaddleVideo-develop/applications/FootballAction/extractor/extract_feat.py
Line 83: change the path to that of the EuroCup2016 folder: PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016
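With this many paths edited by hand, it can help to confirm that each file exists before starting feature extraction. The sketch below is one way to do that; the helper name find_missing and the root variable are assumptions, and the list covers only the PP-TSM and audio entries from the configuration steps above.

```python
import os


def find_missing(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


root = "PaddleVideo-develop"  # adjust to where the repository was cloned
required = [
    "applications/FootballAction/extractor/configs/index_label_football_8.json",
    "football/inference_model/ppTSM.pdmodel",
    "football/inference_model/ppTSM.pdiparams",
    "applications/FootballAction/checkpoints/AUDIO/__model__",
    "applications/FootballAction/checkpoints/AUDIO/__param__",
]
for path in find_missing(os.path.abspath(os.path.join(root, r)) for r in required):
    print("missing:", path)
```

No output means every listed file was found.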
After the above configuration, go to the PaddleVideo-develop/applications/FootballAction/extractor directory and run:
python extract_feat.py
After this step, the data is stored as follows:
|-- datasets                # Training datasets and processing scripts
    |-- EuroCup2016         # Dataset
        |-- features        # Video image + audio features
Next, use the extracted features to train BMN.