The classification effect of converting a video classification dataset into an image classification dataset on VGG16
2022-06-11 06:13:00 【Glutinous rice balls】
Dataset processing
Recently I have been running experiments that extract frames from a video classification dataset to build an image classification dataset, then feed the result into an existing model for training and evaluation.
The referenced post walks through building a video classification model in detail, but it mainly covers dataset handling. My experiment borrows the dataset-processing approach from that post, with modifications to some individual steps and a few added notes of my own.
To keep things simple, only the first 10 categories of UCF101 are used; the goal here is just to test how well frames extracted from videos work as an image classification dataset, so other factors are ignored for now.
Point one: tag every video
The original post splits the labels by video class name, but they can also be split by the number in the video name; that way there is no need to convert text labels to numeric labels later.
Splitting by class name (the original post's approach):
# Create tags for the training videos
train_video_tag = []                    # tag list
for i in range(train.shape[0]):         # loop over every row of the data frame
    # Split each 'video_name' entry on '/' and keep the class-name part
    train_video_tag.append(train['video_name'][i].split('/')[0])
train['tag'] = train_video_tag          # add the tags as a new column
train.head()                            # show the first 5 rows

Result:

Splitting by number:
# Create tags for the training videos
train_video_tag = []                    # tag list
for i in range(train.shape[0]):
    # Each entry looks like 'ApplyEyeMakeup/v_..._c01.avi 1', so everything
    # after 'avi' is the numeric class label
    train_video_tag.append(train['video_name'][i].split('avi')[1])
train['tag'] = train_video_tag
train.head()

Result:

Which one to use depends on what you actually need; I took a detour by following the original post's approach first.
Point two: extract frames from the training videos
This is the core code, but the original post's indentation is broken and would not run as posted, so I rewrote all of it (reference), and it runs successfully.
# Save the frames extracted from the training videos
import cv2
from tqdm import tqdm

# Global settings
EXTRACT_FOLDER = 'Desktop/UCF10/train_1/'   # where the extracted frames are stored
EXTRACT_FREQUENCY = 50                      # keep one frame out of every 50 (UCF101 video is 25 frames/s, so only a few frames are drawn per video)

def extract_frames(video_path, dst_folder, video_name):
    # Main operation: walk through the video and save every EXTRACT_FREQUENCY-th frame
    video = cv2.VideoCapture()              # open the video (or a camera, etc.)
    if not video.open(video_path):
        print("can not open the video")
        exit(1)
    count = 1
    saved = 0
    while True:
        _, frame = video.read()             # read the video frame by frame
        if frame is None:                   # stop once the video is exhausted
            break
        if count % EXTRACT_FREQUENCY == 0:
            # Frame file name: change as needed, but it is best to keep it
            # consistent with the video name so the label can be recovered later
            save_path = dst_folder + video_name + "_frame%d.jpg" % count
            cv2.imwrite(save_path, frame)   # save the frame
            saved += 1
        count += 1
    video.release()
    # Print the total number of frames extracted from this video
    print("Totally save {:d} pics".format(saved))

def main():
    '''
    Generate the frame pictures in batches.
    '''
    # # Optionally delete the old frame folder recursively and create a fresh one
    # import shutil, os
    # try:
    #     shutil.rmtree(EXTRACT_FOLDER)
    # except OSError:
    #     pass
    # os.mkdir(EXTRACT_FOLDER)
    for i in tqdm(range(train.shape[0])):   # tqdm shows the extraction progress
        videoFile = train['video_name'][i]  # video name from the data frame, e.g. 'ApplyEyeMakeup/v_..._c01.avi 1'
        video_path = 'Desktop/UCF10/videos_10/' + videoFile.split(' ')[0]   # drop the trailing label after the space
        video_name = videoFile.split('/')[1].split(' ')[0]                  # bare file name used as the frame prefix
        # Extract the frames and save them to the specified path
        extract_frames(video_path, EXTRACT_FOLDER, video_name)

if __name__ == '__main__':
    main()

Result:

Point three: save the frame names and their corresponding labels in a .csv file. This file makes it easy to read the frames we need in the next stage.
from tqdm import tqdm
from glob import glob
import pandas as pd

# Get the names of all frame images
images = glob("Desktop/UCF10/train_1_50/*.jpg")
train_image = []
train_class = []
for i in tqdm(range(len(images))):
    # Image name: the part after the folder prefix (glob returns Windows paths like 'train_1_50\\v_...jpg')
    train_image.append(images[i].split('50\\')[1])
    # Image label: the class name between 'v_' and the next '_' in the file name
    train_class.append(images[i].split('v_')[1].split('_')[0])

# Save the images and their labels in a data frame
train_data = pd.DataFrame()
train_data['image'] = train_image
train_data['class'] = train_class

# Write the data frame to a csv file
train_data.to_csv('Desktop/UCF10/train_new.csv', header=True, index=False)

The string splits that separate the image name and the label in this step can be adjusted as needed; the version in the original post did not work for me as written.
Read the csv file:
import pandas as pd
train = pd.read_csv('Desktop/UCF10/train_new.csv')
train.head()

Result:

Point four: use this .csv file to read the frames and save them as a NumPy array. One improvement here: many pictures are stored in a single .npy file instead of as individual images.
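The post does not show the code for this step, so here is a minimal sketch using OpenCV. The frame folder path and the 224x224 target size are assumptions (224x224 is VGG16's standard input size; the original may have used a different loader or size):

import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm

train = pd.read_csv('Desktop/UCF10/train_new.csv')

train_image = []
for i in tqdm(range(train.shape[0])):
    # Read each frame and resize it to VGG16's expected 224x224 input
    img = cv2.imread('Desktop/UCF10/train_1_50/' + train['image'][i])
    img = cv2.resize(img, (224, 224))
    train_image.append(img)

# Stack every picture into one array so it can be stored as a single .npy file
X = np.array(train_image)
print(X.shape)   # (number_of_frames, 224, 224, 3)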
Point five: split into a training set, a validation set, and the corresponding labels
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the .csv file that holds the picture names and the corresponding class labels
train = pd.read_csv('train_new.csv')
# Separate out the target
y = train['class']
# Create the training and validation sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2, stratify=y)

From here the original post goes on to build a VGG16 classification model and creates one-hot target columns for the training and validation sets. I only need the datasets themselves, so I simply save the results above.
Save the training and validation sets:
import numpy as np
np.save("Desktop/UCF10/train1/Xtr01", X_train)
np.save("Desktop/UCF10/train1/Ytr01", y_train)
np.save("Desktop/UCF10/test1/Xte01", X_test)
np.save("Desktop/UCF10/test1/Yte01", y_test)
Finally, convert the text labels to numeric labels. If you split the labels by number earlier, this step is unnecessary.
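The post does not show this conversion either; a minimal sketch using scikit-learn's LabelEncoder (an assumption on my part -- any consistent name-to-integer mapping works):

from sklearn.preprocessing import LabelEncoder

# Map each class name (e.g. 'ApplyEyeMakeup') to an integer in 0..9
le = LabelEncoder()
y_train_num = le.fit_transform(y_train)
y_test_num = le.transform(y_test)   # reuse the same mapping for the validation set
print(le.classes_)                  # which class name each integer stands for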
Putting the dataset into the model for testing
The existing VGG16 model's hyperparameters were the ones used to run CIFAR-10: learning_rate = 0.02, batch_size = 100, epoch = 1, momentum = 0.5.
But with this processed dataset the loss came out as NaN, the accuracy after one epoch was only about 11%, and every test run gave the same accuracy.
So the hyperparameters were adjusted: learning_rate = 0.002, batch_size = 20, epoch = 1, momentum = 0.5.
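For reference, here is what those adjusted settings would look like as a PyTorch SGD optimizer. This is purely an assumption for illustration -- the post never says which framework its VGG16 runs in -- but a too-high learning rate is a common cause of a NaN loss, which matches the fix above:

import torch
import torchvision

# Hypothetical setup: torchvision's VGG16 with a 10-class head
model = torchvision.models.vgg16(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.002,      # lowered from 0.02, which made the loss go NaN
                            momentum=0.5)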
The test loss and accuracy:
Then the real labels and the predicted labels can be compared:


Looking at the raw labels alone is not very meaningful; the teacher said to look at the labels, but what really matters is the test accuracy (see the sketch below).
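A minimal sketch of that accuracy check. Here preds is hypothetical -- it stands for the model's per-class scores on X_test and is faked with random numbers purely so the snippet runs on its own; y_test_num is the numeric labels from the earlier conversion:

import numpy as np

# Fake per-class scores standing in for the model's real output on X_test
rng = np.random.default_rng(42)
preds = rng.random((len(y_test_num), 10))

pred_labels = np.argmax(preds, axis=1)          # predicted class per frame
accuracy = np.mean(pred_labels == y_test_num)   # fraction of correct predictions
print("test accuracy: {:.2%}".format(accuracy))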