Baidu PaddlePaddle BMN Temporal Action Localization Framework | Data Preparation and Training Guide (Part 1)

2022-07-07 01:39:00 Xinxu

1. Introduction

BMN is a model developed by Baidu and the winning solution at ActivityNet 2019. It provides an efficient way to generate proposals for the video action localization problem.

In short, temporal action localization takes a video and determines which action occurs from second xxx to second xxx. Compared with action recognition, it must also infer the start time and end time of each action. There are two main metrics: (1) classification accuracy and (2) IoU with the ground truth (GT).
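
The IoU metric mentioned above can be illustrated for one predicted segment and one ground-truth segment. This is a minimal sketch of the standard temporal IoU formula; the function name and the (start, end) tuple format are assumptions for illustration, not code from PaddleVideo.

```python
def temporal_iou(pred, gt):
    """Return the IoU of two (start, end) intervals given in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the two intervals do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction of 10-20 s against a ground truth of 15-25 s.
    print(temporal_iou((10.0, 20.0), (15.0, 25.0)))
```

A prediction that overlaps the ground truth for 5 seconds out of a combined span of 15 seconds scores one third, so a segment can have the right class and still be penalized for loose boundaries.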

Project address:

GitHub - PaddlePaddle/PaddleVideo: based on a modular design, it provides a rich set of video algorithm implementations and industrial-grade video algorithm optimizations and applications, including action localization and recognition, behavior analysis, smart covers, and video tagging for the security, sports, Internet, and media industries. It covers technologies such as action recognition and video classification, action localization, action detection, and multimodal text-video retrieval. https://github.com/PaddlePaddle/PaddleVideo

This project requires a lot of storage space, roughly 200 GB, so put it on a disk with enough room.

The algorithm is divided into three main stages:

(1) Video understanding

        Image features: PP-TSM; audio features: VGGish

(2) Temporal proposal generation

        BMN

(3) Action classification and localization

        AttentionLSTM

Each stage includes data preparation, training, validation, and export of the inference model.

To set up the environment, install the packages listed in requirements.txt; this generally works without problems. For paddlepaddle-gpu, it is best to install the latest version.

2. PP-TSM

The dataset is FootballAction, PaddlePaddle's open-source football action dataset.

It is built from match videos of four tournaments (EuroCup2012, EuroCup2016, WorldCup2014, and WorldCup2018), with 272 videos for training and 25 for testing. It supports localization and recognition of 15 football highlight actions: shot, goal, goal with cheering, corner kick, free kick, yellow card, red card, penalty kick, substitution, throw-in, goal kick, kick-off, offside flag, replay of an aerial duel, and replay of a goal.

Not all of the PaddlePaddle data is open-sourced in the project; only 49 videos are released.

(1) Download the dataset

Download the data with the bash script located at PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/download_dataset.sh. After granting the file execute permission, you can run it directly. When the download completes, the folder PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/mp4 contains 49 MP4 videos, 78.1 GB in total. The annotation data is provided directly in the project files:

datasets/EuroCup2016/label.json: list of classification labels

datasets/EuroCup2016/label_cls8_train.json: labels for the training data

datasets/EuroCup2016/label_cls8_val.json: labels for the validation data

datasets/EuroCup2016/url.list: file list for the training data

datasets/EuroCup2016/url_val.list: file list for the validation data
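
Because the download is large, it can be worth confirming that all 49 videos arrived before moving on. The following is a minimal sketch using only the standard library; the helper name summarize_videos is hypothetical, and the directory is the mp4 folder named above.

```python
from pathlib import Path


def summarize_videos(mp4_dir):
    """Return (number of .mp4 files, total size in GB) for a directory."""
    files = sorted(Path(mp4_dir).glob("*.mp4"))
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes / 1024 ** 3


if __name__ == "__main__":
    mp4_dir = Path(
        "PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/mp4"
    )
    if mp4_dir.is_dir():
        count, size_gb = summarize_videos(mp4_dir)
        print(f"{count} videos, {size_gb:.1f} GB")
```

A count below 49 or a total well under 78.1 GB suggests an interrupted download.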

(2) Prepare the data

This step requires ffmpeg, which can be installed with sudo apt install ffmpeg.

In the first stage you need to prepare the PP-TSM training data with the following commands:

cd PaddleVideo-develop/applications/FootballAction/datasets/script
python get_frames_pcm.py

This step samples the original video files: images are extracted at 5 frames per second and audio is sampled at 16000 Hz. Processing takes a long time. When it finishes, two new folders are generated:

|--  datasets                   # Training datasets and processing scripts
        |--  EuroCup2016   # Dataset
            |--  mp4               # Original videos (.mp4)
            |--  frames           # Image frames (new)
            |--  pcm               # Audio PCM (new)
            |--  url.list            # Video list
            |--  label.json       # Ground-truth labels of the original videos
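
To make the sampling rates concrete, the sketch below builds (but does not run) ffmpeg command lines that extract frames at 5 fps and raw audio at 16000 Hz. It is not the content of get_frames_pcm.py; the helper name, the output file naming, and the choice of mono 16-bit little-endian PCM are assumptions for illustration.

```python
from pathlib import Path


def build_ffmpeg_commands(video_path, frames_dir, pcm_dir, fps=5, sample_rate=16000):
    """Build ffmpeg argument lists for frame and audio extraction."""
    video_path = Path(video_path)
    name = video_path.stem
    # Assumed naming: one sub-folder of numbered JPEGs per video.
    frame_pattern = str(Path(frames_dir) / name / "%08d.jpg")
    pcm_path = str(Path(pcm_dir) / f"{name}.pcm")
    frames_cmd = [
        "ffmpeg", "-i", str(video_path),
        "-r", str(fps),          # output frame rate
        "-q:v", "2",             # JPEG quality
        frame_pattern,
    ]
    audio_cmd = [
        "ffmpeg", "-y", "-i", str(video_path),
        "-vn",                   # drop the video stream
        "-ac", "1",              # mono (assumption)
        "-ar", str(sample_rate), # audio sample rate in Hz
        "-acodec", "pcm_s16le", "-f", "s16le",
        pcm_path,
    ]
    return frames_cmd, audio_cmd


if __name__ == "__main__":
    for cmd in build_ffmpeg_commands("mp4/example.mp4", "frames", "pcm"):
        print(" ".join(cmd))
```

The lists can be passed to subprocess.run once the output folders exist; ffmpeg does not create missing directories.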

(3) Process the sampled data

Convert the sampled data above into the training dataset for PP-TSM:

cd PaddleVideo-develop/applications/FootballAction/datasets/script
python get_instance_for_pptsm.py

Based on the annotations, this step treats each action interval as a positive sample and writes all frames in the interval to one pkl file. Non-action intervals are treated as negative samples: N intervals are randomly sampled to generate N pkl files.

After this step:

|--  datasets                   # Training datasets and processing scripts
        |--  EuroCup2016            # Dataset
            |--  input_for_pptsm   # PP-TSM training data (new)
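
The positive/negative split described in this step can be sketched as follows. This is an illustration of the idea only, not the logic of get_instance_for_pptsm.py: the function name is hypothetical, and sampling whole gaps between annotated actions as negatives is an assumption.

```python
import random


def split_intervals(actions, video_length, num_negatives, seed=0):
    """Return (positives, negatives) as lists of (start, end) in seconds.

    actions: annotated action intervals, used as positive samples.
    Negatives are drawn at random from the gaps between actions.
    """
    positives = sorted(actions)
    gaps, cursor = [], 0
    for start, end in positives:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < video_length:
        gaps.append((cursor, video_length))
    rng = random.Random(seed)
    negatives = rng.sample(gaps, min(num_negatives, len(gaps)))
    return positives, sorted(negatives)


if __name__ == "__main__":
    pos, neg = split_intervals([(10, 20), (30, 40)], video_length=60, num_negatives=2)
    print("positive:", pos)
    print("negative:", neg)
```

Each returned interval would then correspond to one pkl file holding the frames in that range.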

(4) Train PP-TSM

First, download the pretrained weights:

cd PaddleVideo-develop/applications/FootballAction
wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_vd_ssld_v2_pretrained.pdparams
mkdir pretrain
mv ResNet50_vd_ssld_v2_pretrained.pdparams pretrain/ResNet50_vd_ssld_v2_pretrained.pdparams

Open the training configuration file:

PaddleVideo-develop/applications/FootballAction/train_proposal/configs/pptsm_football_v2.0.yaml

Line 5: write the location of the pretrained model you just downloaded; use an absolute path.

Lines 17 and 18: batch size. I have a 2080Ti-8G, so I could only set 4/4.

Line 19: change it to 1.

Line 23: find the file PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/input_for_pptsm/train.list and write its absolute path.

Line 28: find PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016/input_for_pptsm/val.list and write its absolute path. These are the index files generated in step (3).

Line 33: same as line 28; write the same path.
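
Since several of these lines need absolute paths, a small helper can print them ready for copying into the YAML file. This is a minimal sketch, assuming the repository was unpacked as PaddleVideo-develop; the function name and dictionary keys are hypothetical labels, not keys from the config file.

```python
from pathlib import Path


def pptsm_config_paths(repo_root):
    """Return the absolute paths referenced on lines 5, 23, 28 and 33."""
    root = Path(repo_root).resolve()
    app = root / "applications" / "FootballAction"
    data = app / "datasets" / "EuroCup2016" / "input_for_pptsm"
    return {
        "pretrained": str(app / "pretrain" / "ResNet50_vd_ssld_v2_pretrained.pdparams"),
        "train_list": str(data / "train.list"),
        "val_list": str(data / "val.list"),
    }


if __name__ == "__main__":
    for label, path in pptsm_config_paths("PaddleVideo-develop").items():
        print(f"{label}: {path}")
```

Run it from the directory that contains PaddleVideo-develop so the resolved paths match your machine.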

On a single GPU, start training with the following command:

python -B -m paddle.distributed.launch --gpus="0" --log_dir=./football/logs_pptsm main.py --validate -c applications/FootballAction/train_proposal/configs/pptsm_football_v2.0.yaml -o output_dir=./football/pptsm

Training takes roughly 3 days and 3 nights. Next, switch the code to inference mode.

Before switching to inference mode, modify the file PaddleVideo/paddlevideo/modeling/framework/recognizers/recognizer2d.py and update the __init__ and infer_step functions to the following code:

    def __init__(self, backbone=None, head=None):
        super().__init__(backbone=backbone, head=head)
        self.avgpool2d = paddle.nn.AdaptiveAvgPool2D((1, 1), data_format='NCHW')

    def infer_step(self, data_batch):
        """Define how the model is going to test, from input to output."""
        imgs = data_batch[0]
        imgs = paddle.reshape_(imgs, [-1] + list(imgs.shape[2:]))
        feature = self.backbone(imgs)
        feat = self.avgpool2d(feature)
        return feat
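
The effect of infer_step on tensor shapes can be sketched in plain Python. The reshape merges the first two dimensions of the input, and the (1, 1) adaptive average pool reduces each feature map to its mean. Reading the first two dimensions as batch and frames (N, T) is an assumption, and both helper names are hypothetical.

```python
def merged_shape(shape):
    """[N, T, C, H, W] -> [N*T, C, H, W], as reshape_([-1] + shape[2:]) does."""
    n, t, *rest = shape
    return [n * t] + rest


def global_avg_pool(feature_map):
    """Average one H x W feature map (nested lists) down to a single value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(merged_shape([2, 8, 3, 224, 224]))
    print(global_avg_pool([[1.0, 2.0], [3.0, 4.0]]))
```

With this change the exported model returns pooled backbone features rather than class scores, which is what the later feature extraction step consumes.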

Then run the following from the PaddleVideo root directory:

python tools/export_model.py -c applications/FootballAction/train_proposal/configs/pptsm_football_v2.0.yaml \
                             -p ./football/pptsm/ppTSM_best.pdparams \
                             -o ./football/inference_model

This exports the inference model.

(5) Configure PP-TSM

In the file PaddleVideo/applications/FootballAction/predict/action_detect/models/pptsm_infer.py, find line 41:

self.output_tensor = self.predictor.get_output_handle(output_names[1])

Replace with

self.output_tensor = self.predictor.get_output_handle(output_names[0])

Next comes feature extraction for images and audio. Because the features are extracted with the weights we just trained, the configuration file must be modified.

In the file PaddleVideo-develop/applications/FootballAction/extractor/configs/configs.yaml:

Line 4: set the path of index_label_football_8.json to the absolute path of PaddleVideo-develop/applications/FootballAction/extractor/configs/index_label_football_8.json

Line 13: change the default model path to the absolute path of PaddleVideo-develop/football/inference_model/ppTSM.pdmodel

Line 14: change the default parameter file to the absolute path of PaddleVideo-develop/football/inference_model/ppTSM.pdiparams

Line 29: change the audio model path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/AUDIO/__model__

Line 30: change the audio model parameter file path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/AUDIO/__param__

Line 38: change the BMN model path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/BMN/__model__

Line 39: change the BMN model parameter file path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/BMN/__param__

Line 51: change the LSTM model path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/LSTM/__model__

Line 52: change the LSTM model parameter file path to the absolute path of PaddleVideo-develop/applications/FootballAction/checkpoints/LSTM/__param__
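
A wrong or missing path in this file typically only surfaces once extraction starts, so a quick existence check can save time. The sketch below looks for the __model__ and __param__ files of the three checkpoint folders named above; the helper name is hypothetical and the check is not part of PaddleVideo.

```python
from pathlib import Path


def missing_checkpoint_files(checkpoints_dir, models=("AUDIO", "BMN", "LSTM")):
    """Return the expected checkpoint files that are not present on disk."""
    missing = []
    for model in models:
        for filename in ("__model__", "__param__"):
            path = Path(checkpoints_dir) / model / filename
            if not path.is_file():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    checkpoints = "PaddleVideo-develop/applications/FootballAction/checkpoints"
    for path in missing_checkpoint_files(checkpoints):
        print("missing:", path)
```

An empty result means all six files are in place; the two ppTSM files under football/inference_model can be checked the same way.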

Then open PaddleVideo-develop/applications/FootballAction/extractor/extract_feat.py.

Line 83: change the path to the EuroCup2016 folder: PaddleVideo-develop/applications/FootballAction/datasets/EuroCup2016

After the configuration above, go to the PaddleVideo-develop/applications/FootballAction/extractor directory, where extract_feat.py is located, and run:

python extract_feat.py

After this step, the data is stored as follows:

   |--  datasets                   # Training data sets and processing scripts
        |--  EuroCup2016            # Data sets
            |--  features          # Video images + Audio features

Next, use the extracted features to train BMN.

Copyright notice
This article was written by [Xinxu]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207061805504773.html