Baidu PaddlePaddle BMN temporal action localization framework | Data preparation and training guide (Part 1)

2022-07-07 01:39:00 【Xinxu】

I. Introduction

BMN is a model developed by Baidu and was the winning solution at ActivityNet 2019. It provides an efficient way to generate proposals for video action localization.

In short, temporal action localization takes a video and determines which action happens from second xxx to second xxx. Compared with action recognition, it also has to infer the start time and end time of each action. Two metrics are mainly involved: (1) classification accuracy and (2) IoU with the ground truth (GT).
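As a rough illustration of the second metric, the IoU between a predicted interval and a ground-truth interval can be computed as below. This is a minimal sketch: the function name temporal_iou and the (start, end) tuple layout are assumptions made for the example, not code from PaddleVideo.

```python
def temporal_iou(pred, gt):
    """Return the IoU of two time intervals given as (start, end) in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # The overlap is zero when the two intervals do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# A prediction from 10 s to 20 s compared with a ground truth from 15 s to 25 s:
# 5 s of overlap out of a 15 s union.
print(temporal_iou((10.0, 20.0), (15.0, 25.0)))
```

A prediction usually counts as correct only when this value exceeds a chosen threshold and the predicted class matches.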

Project address: https://github.com/PaddlePaddle/PaddleVideo

This project requires a lot of storage space, roughly 200 GB, so put it on a disk with enough free space.

The algorithm is divided into three stages:

(1) Video understanding: PP-TSM, with VGGish for audio features

(2) Temporal proposal generation: BMN

(3) Action classification and localization: AttentionLSTM

Each stage includes data preparation, training, validation, and export of the inference model.

The environment is set up mainly by installing the packages listed in requirements.txt, which generally works without problems. It is best to install the latest version of paddlepaddle-gpu.

II. PP-TSM

The dataset used is FootballAction, PaddlePaddle's open-source football action dataset.

The dataset is built from match videos of four tournaments (EuroCup2012, EuroCup2016, WorldCup2014, and WorldCup2018), with 272 training videos and 25 test videos in total. It supports localization and recognition of 15 football highlight actions. The action categories are: shot, goal, goal with celebration, corner kick, free kick, yellow card, red card, penalty kick, substitution, throw-in, goal kick, kick-off, offside flag, replay of an aerial duel, and replay of a goal.

PaddlePaddle has not open-sourced all of the data in the project; 49 videos in total are released.

(1) Download the dataset

The data is downloaded with a bash script located at PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/download_dataset.sh. After making the file executable, you can run it directly. When the download is complete, the folder PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/mp4 contains 49 MP4 videos, 78.1 GB in total. The annotation data is provided directly in the project files:

datasets/EuroCup2016/label.json: list of class labels

datasets/EuroCup2016/label_cls8_train.json: labels for the training data

datasets/EuroCup2016/label_cls8_val.json: labels for the validation data

datasets/EuroCup2016/url.list: file list for the training data

datasets/EuroCup2016/url_val.list: file list for the validation data
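Once the download has finished, a quick check of the mp4 folder can confirm that nothing is missing. The sketch below is only a suggestion: the helper name summarize_videos is made up for this example, and the path is the one quoted above, so adjust it to your checkout. The article does not say whether 78.1 GB is decimal GB or GiB, so compare the size loosely.

```python
from pathlib import Path


def summarize_videos(mp4_dir):
    """Return (number of .mp4 files, total size in decimal GB) for a folder."""
    files = sorted(Path(mp4_dir).glob("*.mp4"))
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes / 1e9


if __name__ == "__main__":
    # Path taken from the article; change it if your checkout lives elsewhere.
    mp4_dir = "PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/mp4"
    count, size_gb = summarize_videos(mp4_dir)
    print(f"{count} videos, {size_gb:.1f} GB (the article reports 49 videos, 78.1 GB)")
```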

(2) Prepare the data

In the first stage, you need to prepare the training data for PP-TSM. This requires ffmpeg, which can be installed with sudo apt install ffmpeg. Then run the following commands:

cd PaddleVideo-develop/applications/FootballAction/datasets/script
python get_frames_pcm.py

This step samples the original video files: images are extracted at 5 frames per second, and audio is sampled at 16000 Hz. Processing takes a long time. When it finishes, two new folders are generated:

|-- datasets                 # Training datasets and processing scripts
    |-- EuroCup2016          # Dataset
        |-- mp4              # Original videos (.mp4)
        |-- frames           # Image frames (new)
        |-- pcm              # Audio pcm (new)
        |-- url.list         # Video list
        |-- label.json       # Ground truth (GT) for the original videos
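To make the sampling step more concrete, the sketch below builds ffmpeg commands that extract frames at 5 fps and raw PCM audio at 16000 Hz. It is not the content of get_frames_pcm.py: the helper names, the %08d.jpg frame naming, and the mono 16-bit output format are assumptions chosen for illustration. Only the 5 fps and 16000 Hz values come from the text above.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_commands(video, frames_dir, pcm_path, fps=5, sample_rate=16000):
    """Build ffmpeg commands that extract image frames and raw PCM audio."""
    frame_cmd = [
        "ffmpeg", "-y", "-i", str(video),
        "-r", str(fps),                          # output frame rate (5 frames per second)
        str(Path(frames_dir) / "%08d.jpg"),      # numbered JPEG frames (naming is an assumption)
    ]
    audio_cmd = [
        "ffmpeg", "-y", "-i", str(video),
        "-vn",                                   # ignore the video stream
        "-acodec", "pcm_s16le", "-f", "s16le",   # raw 16-bit little-endian PCM (an assumption)
        "-ac", "1",                              # mono (an assumption)
        "-ar", str(sample_rate),                 # audio sampling rate (16000 Hz)
        str(pcm_path),
    ]
    return frame_cmd, audio_cmd


def extract(video, frames_dir, pcm_path):
    """Run both commands; requires ffmpeg to be available on the PATH."""
    Path(frames_dir).mkdir(parents=True, exist_ok=True)
    Path(pcm_path).parent.mkdir(parents=True, exist_ok=True)
    for cmd in build_ffmpeg_commands(video, frames_dir, pcm_path):
        subprocess.run(cmd, check=True)
```

Calling extract("mp4/match.mp4", "frames/match", "pcm/match.pcm") with a hypothetical video name would fill one frames subfolder and write one pcm file for that video.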

(3) Process the sampled data

Convert the sampled data above into the training dataset for PP-TSM:

cd PaddleVideo-develop/applications/FootballAction/datasets/script
python get_instance_for_pptsm.py

Based on the annotations, this step takes each action interval as a positive sample and writes all frames in the interval to one pkl file. Non-action intervals are taken as negative samples: N intervals are randomly sampled, producing N pkl files.

After this step:
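The positive/negative split described here can be sketched as follows. This is an illustration of the idea only, not the logic of get_instance_for_pptsm.py: the function name, the (start, end) tuples in seconds, and the uniform random choice of negative intervals are all assumptions, and the sketch picks intervals without writing any pkl files.

```python
import random


def split_samples(actions, duration, num_negatives, seed=0):
    """Return (positives, negatives) as lists of (start, end) in seconds.

    actions: annotated action intervals.
    duration: length of the video in seconds.
    """
    positives = sorted(actions)

    # Collect the gaps between annotated actions; no action happens there.
    gaps, cursor = [], 0
    for start, end in positives:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        gaps.append((cursor, duration))
    if not gaps:
        return positives, []

    # Randomly draw N negative intervals, each lying inside a single gap.
    rng = random.Random(seed)
    negatives = []
    for _ in range(num_negatives):
        gap_start, gap_end = rng.choice(gaps)
        start = rng.uniform(gap_start, gap_end)
        end = rng.uniform(start, gap_end)
        negatives.append((start, end))
    return positives, negatives


positives, negatives = split_samples([(10, 20), (40, 50)], duration=60, num_negatives=3)
print(positives)
print(len(negatives))
```

Fixing the seed keeps the negative samples reproducible between runs, which makes it easier to compare training results.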

|-- datasets                     # Training datasets and processing scripts
    |-- EuroCup2016              # Dataset
        |-- input_for_pptsm      # PP-TSM training data (new)

(4) Train PP-TSM

First, download the pretrained weights:

cd PaddleVideo-develop/applications/FootballAction
wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_vd_ssld_v2_pretrained.pdparams
mkdir pretrain
mv ResNet50_vd_ssld_v2_pretrained.pdparams pretrain/ResNet50_vd_ssld_v2_pretrained.pdparams

Open the training configuration file:

PaddleVideo-develop/applications/FootballAction/train_proposal/configs/pptsm_football_v2.0.yaml

Line 5: set the location of the pretrained model just downloaded. Use an absolute path.

Lines 17 and 18: batch size. I have a 2080Ti-8G, which can only handle 4/4.

Line 19: change it to 1.

Line 23: locate PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/input_for_pptsm/train.list and write its absolute path.

Line 28: locate PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/input_for_pptsm/val.list and write its absolute path. This is the index file generated in step (3).

Line 33: same as line 28; write the same path.
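Since several of these lines need absolute paths, a small helper can print them and flag missing files before training starts. This is a convenience sketch, not part of PaddleVideo: the function name training_paths is made up, and the folder layout is the one used in this article.

```python
from pathlib import Path


def training_paths(repo_root):
    """Return the absolute paths that the config lines above refer to."""
    app = Path(repo_root).resolve() / "applications" / "FootballAction"
    data = app / "datasets" / "EuroCup2016" / "input_for_pptsm"
    return {
        "pretrained weights (line 5)": app / "pretrain" / "ResNet50_vd_ssld_v2_pretrained.pdparams",
        "train.list (line 23)": data / "train.list",
        "val.list (lines 28 and 33)": data / "val.list",
    }


if __name__ == "__main__":
    # "PaddleVideo-develop" is the checkout folder name used in the article.
    for name, path in training_paths("PaddleVideo-develop").items():
        status = "ok" if path.is_file() else "MISSING"
        print(f"{name}: {path} [{status}]")
```

The printed paths can be copied directly into the YAML file.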

On a single GPU, start training with the following command:

python -B -m paddle.distributed.launch \
    --gpus="0" \
    --log_dir=./football/logs_pptsm \
    main.py \
    --validate \
    -c applications/FootballAction/train_proposal/configs/pptsm_football_v2.0.yaml \
    -o output_dir=./football/pptsm

Training takes roughly 3 days and 3 nights. Next, change the code to inference mode.

Before switching to prediction mode, modify the file PaddleVideo/paddlevideo/modeling/framework/recognizers/recognizer2d.py and update the __init__ and infer_step functions to the following code:

    def __init__(self, backbone=None, head=None):
        super().__init__(backbone=backbone, head=head)
        self.avgpool2d = paddle.nn.AdaptiveAvgPool2D((1, 1), data_format='NCHW')

    def infer_step(self, data_batch):
        """Define how the model is going to test, from input to output."""
        imgs = data_batch[0]
        imgs = paddle.reshape_(imgs, [-1] + list(imgs.shape[2:]))
        feature = self.backbone(imgs)
        feat = self.avgpool2d(feature)
        return feat

Then run the following from the PaddleVideo root directory:

python tools/export_model.py -c applications/FootballAction/train_proposal/configs/pptsm_football_v2.0.yaml \
                             -p ./football/pptsm/ppTSM_best.pdparams \
                             -o ./football/inference_model

This exports the inference model.

(5) Configure PP-TSM

In the file PaddleVideo/applications/FootballAction/predict/action_detect/models/pptsm_infer.py, find line 41:

self.output_tensor = self.predictor.get_output_handle(output_names[1])

Replace it with:

self.output_tensor = self.predictor.get_output_handle(output_names[0])

Next comes feature extraction for images and audio. Because the features are extracted with the weights we just trained, the configuration file must be modified.

In the file PaddleVideo-develop/applications/FootballAction/extractor/configs/configs.yaml:

Line 4: set the path of index_label_football_8.json to the absolute path of PaddleVideo-develop/applications/FootballAction/extractor/configs/index_label_football_8.json

Line 13: change the default model file path to the absolute path of PaddleVideo-develop/football/inference_model/ppTSM.pdmodel

Line 14: change the default parameter file path to the absolute path of PaddleVideo-develop/football/inference_model/ppTSM.pdiparams

Line 29: change the audio model path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/AUDIO/__model__

Line 30: change the audio model parameter file path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/AUDIO/__param__

Line 38: change the BMN model path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/BMN/__model__

Line 39: change the BMN model parameter file path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/BMN/__param__

Line 51: change the LSTM model path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/LSTM/__model__

Line 52: change the LSTM model parameter file path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/LSTM/__param__

Then open PaddleVideo-develop/applications/FootballAction/extractor/extract_feat.py

Line 83: change the path to the path of the EuroCup2016 folder: PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016

After the configuration above, go to the PaddleVideo-develop/applications/FootballAction directory and run:

python extract_feat.py

After this step, the data is stored as follows:

|-- datasets                 # Training datasets and processing scripts
    |-- EuroCup2016          # Dataset
        |-- features         # Video image + audio features

Next, the extracted features are used to train BMN.

Copyright notice
This article was written by [Xinxu]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207061805504773.html