(2) Building a two-stream project with integrated CBAM: data preparation
2022-07-27 15:09:00 【秃头嘤嘤魔】
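Data preparation
1. Optical flow extraction
For installing dense_flow, see the post "Installing dense_flow".
For how dense_flow performs the extraction, see the post "Understanding the dense_flow code".
The process is simple. The script first reads its arguments: the paths of the dataset, the output directory and the dense_flow tool; the width and height that frames are resized to; the number of worker processes; the number of GPUs; the output format (dir or zip); and the video file extension (avi or mp4). It then collects every video file in the dataset into vid_list and prints how many there are. Each video path is paired with its index in vid_list as a tuple and passed to run_optical_flow, which calls the dense_flow tool to extract the RGB frames and the optical flow. See the code comments for details.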
from __future__ import print_function
import os
import sys
import glob
import argparse
from shlex import quote  # pipes.quote was removed in Python 3.13
from multiprocessing import Pool, current_process

def run_optical_flow(vid_item):
    # path of the video file
    vid_path = vid_item[0]
    # index of the video in vid_list (only used for logging)
    vid_id = vid_item[1]
    # video name without directory or extension
    vid_name = vid_path.split('/')[-1].split('.')[0]
    # create one output folder per video
    out_full_path = os.path.join(out_path, vid_name)
    try:
        os.mkdir(out_full_path)
    except OSError:
        pass
    current = current_process()
    # assign a GPU id to this worker process in round-robin order
    dev_id = (int(current._identity[0]) - 1) % NUM_GPU
    # output path prefixes for the RGB frames and the x/y flow images
    image_path = '{}/img'.format(out_full_path)
    flow_x_path = '{}/flow_x'.format(out_full_path)
    flow_y_path = '{}/flow_y'.format(out_full_path)
    # quote() escapes a string so that it is safe to pass as a shell argument
    # -f video path, -x x flow, -y y flow, -i RGB frames, -d GPU id, -o output format, -w new width, -h new height
    cmd = os.path.join(df_path, 'build/extract_gpu') + ' -f {} -x {} -y {} -i {} -b 20 -t 1 -d {} -s 1 -o {} -w {} -h {}'.format(
        quote(vid_path), quote(flow_x_path), quote(flow_y_path), quote(image_path), dev_id, out_format, new_size[0], new_size[1])
    print(cmd)
    os.system(cmd)
    print('{} {} done'.format(vid_id, vid_name))
    sys.stdout.flush()
    return True
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="extract optical flows")
    # src_dir: where the dataset is stored
    parser.add_argument("--src_dir", type=str, default='./UCF-101',
                        help='path to the video data')
    # out_dir: where the RGB frames and optical flow are written
    parser.add_argument("--out_dir", type=str, default='./ucf101_frames',
                        help='path to store frames and optical flow')
    # df_path: path of the dense_flow tool
    parser.add_argument("--df_path", type=str, default='./dense_flow/',
                        help='path to the dense_flow toolbox')
    # new_width: width that frames are resized to
    parser.add_argument("--new_width", type=int, default=0, help='resize image width')
    # new_height: height that frames are resized to
    parser.add_argument("--new_height", type=int, default=0, help='resize image height')
    # num_worker: number of worker processes
    parser.add_argument("--num_worker", type=int, default=8)
    # num_gpu: number of GPUs
    parser.add_argument("--num_gpu", type=int, default=2, help='number of GPU')
    # out_format: output format of the extracted frames and flow
    parser.add_argument("--out_format", type=str, default='dir', choices=['dir','zip'],
                        help='output format of the extracted frames and flow')
    # ext: video file extension
    parser.add_argument("--ext", type=str, default='avi', choices=['avi','mp4'],
                        help='video file extensions')
    args = parser.parse_args()
    out_path = args.out_dir
    src_path = args.src_dir
    num_worker = args.num_worker
    df_path = args.df_path
    out_format = args.out_format
    ext = args.ext
    new_size = (args.new_width, args.new_height)
    NUM_GPU = args.num_gpu
    if not os.path.isdir(out_path):
        print("creating folder: "+out_path)
        os.makedirs(out_path)
    # collect every video file in the dataset (one sub-folder per class)
    vid_list = glob.glob(src_path+'/*/*.'+ext)
    # print the number of videos
    print(len(vid_list))
    pool = Pool(num_worker)
    # zip pairs each video path with its index in vid_list; the index is a running number, not the class label
    pool.map(run_optical_flow, zip(vid_list, range(len(vid_list))))
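The command construction in run_optical_flow can be looked at in isolation. The sketch below is only an illustration: build_extract_cmd is a hypothetical helper that is not part of the script, and worker_idx stands in for the value the script reads from current_process()._identity[0]. It uses the same flags as the listing above and shows how worker numbers map to GPU ids in round-robin order and how quote() protects paths that contain spaces.

```python
import os
from shlex import quote


def build_extract_cmd(df_path, vid_path, out_dir, worker_idx, num_gpu,
                      out_format="dir", new_size=(0, 0)):
    """Build the extract_gpu command line for one video (hypothetical helper)."""
    # Worker numbers start at 1, so worker 1 -> GPU 0, worker 2 -> GPU 1, ...
    dev_id = (worker_idx - 1) % num_gpu
    exe = os.path.join(df_path, "build/extract_gpu")
    return "{} -f {} -x {} -y {} -i {} -b 20 -t 1 -d {} -s 1 -o {} -w {} -h {}".format(
        exe,
        quote(vid_path),               # quoted so spaces survive the shell
        quote(out_dir + "/flow_x"),
        quote(out_dir + "/flow_y"),
        quote(out_dir + "/img"),
        dev_id, out_format, new_size[0], new_size[1])


# Example with a made-up path that contains a space
cmd = build_extract_cmd("./dense_flow/", "UCF-101/Archery/v a.avi",
                        "out/v a", worker_idx=3, num_gpu=2)
print(cmd)
```

With two GPUs, workers 1 and 3 share GPU 0 and workers 2 and 4 share GPU 1, so choosing num_worker as a multiple of num_gpu keeps the load even.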
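2. Generating the rgb and flow training/test lists
The UCF-101 dataset ships with official split files that divide it into training and test sets in three different ways, giving six list files: testlist01.txt, testlist02.txt, testlist03.txt, trainlist01.txt, trainlist02.txt and trainlist03.txt. Each line has the form class/video name. There is also a ClassInd file that maps each class to its label, starting from 1.
Because the network takes RGB frames and optical flow as input, these six video lists have to be converted into rgb and flow lists, which is what build_file_list.py does. For the training and test list of each split, it counts the RGB frames and flow images of every video and writes the counts to a file together with the video's class. In other words, the six files are the training and test sets of three splits; each one is divided into an rgb list and a flow list, giving 12 files.
Implementation:
1. parse_ucf101_splits()
Reads the six bundled files (class/video name) and looks up each video's class in ClassInd.
It first reads the training and test sets of the first split, builds the (video name, label) tuples of the training set and of the test set, appends them to splits, and then reads the other two splits in the same order. In the returned list splits, the first dimension selects the split, the second selects the subset (index 0 is the training set, index 1 is the test set), and the third holds the (video name, label) tuples.
2. parse_directory(path, rgb_prefix='img_', flow_x_prefix='flow_x_', flow_y_prefix='flow_y_')
Counts the RGB frames and flow images in each video folder under the given path and returns two dicts, keyed by video name, holding the rgb counts and the flow counts.
3. build_split_list(split_tuple, frame_info, split_idx, shuffle=False)
split_tuple is the list produced in step 1. The function first selects the split with index split_idx, split = split_tuple[split_idx], then looks up the rgb and flow counts of every video in the training set split[0] and the test set split[1], and stores each entry as a string of the form "video name, rgb/flow count, label", separated by spaces.
4. The lists from step 3 are written line by line to the corresponding files, producing 12 frame list files in total.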
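The four steps can be tried on in-memory data before running them on the real files. The following is a minimal sketch under stated assumptions: the two video names are only illustrative, the frame counts are made up, and build_list_lines is a hypothetical helper that condenses what build_split_list does for a single subset.

```python
def line2rec(line, class_mapping):
    """Turn 'Class/video.avi [label]' into (video_name, zero_based_label)."""
    items = line.strip().split('/')
    label = class_mapping[items[0]]
    # split('.') drops the extension and anything that follows it
    vid = items[1].split('.')[0]
    return vid, label


def build_list_lines(records, counts):
    """Format every (video, label) record as 'video count label' plus newline."""
    return ['{} {} {}\n'.format(vid, counts[vid], label) for vid, label in records]


# ClassInd-style lines: the index starts at 1, the label used for training at 0
class_ind = ["1 ApplyEyeMakeup", "2 Archery"]
class_mapping = {name: int(idx) - 1 for idx, name in (x.split() for x in class_ind)}

# Illustrative list lines in the class/video-name form
train_lines = ["ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1",
               "Archery/v_Archery_g08_c01.avi 2"]
records = [line2rec(x, class_mapping) for x in train_lines]

# Made-up counts, standing in for the two dicts returned by parse_directory
rgb_counts = {"v_ApplyEyeMakeup_g08_c01": 120, "v_Archery_g08_c01": 95}
flow_counts = {"v_ApplyEyeMakeup_g08_c01": 119, "v_Archery_g08_c01": 94}

rgb_list = build_list_lines(records, rgb_counts)
flow_list = build_list_lines(records, flow_counts)
print(''.join(rgb_list))
```

Each resulting line carries exactly the three fields that the dataset class later reads back: video name, number of images, and label.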
####
# Prerequisite: the videos have already been split into frames and their optical flow computed, stored in one folder per video
# Builds descriptive list files giving, for each video, its number of frames, number of flow images and class
####
import argparse
import os
import glob
import random
import fnmatch

# Counts the RGB frames and flow images under path, where each sub-folder is one video
def parse_directory(path, rgb_prefix='img_', flow_x_prefix='flow_x_', flow_y_prefix='flow_y_'):
    # print the path being parsed
    print('parse frames under folder {}'.format(path))
    # collect the path of every entry under path into frame_folders
    frame_folders = glob.glob(os.path.join(path, '*'))
    # count the rgb, flow_x and flow_y files of one video
    def count_files(directory, prefix_list):
        # names of all files and folders in directory
        lst = os.listdir(directory)
        # number of names that start with each prefix (img_, flow_x, flow_y)
        cnt_list = [len(fnmatch.filter(lst, x+'*')) for x in prefix_list]
        return cnt_list
    rgb_counts = {}
    flow_counts = {}
    # i is the index of the folder under path, f is its path
    for i,f in enumerate(frame_folders):
        # number of RGB frames, x flow images and y flow images in folder f
        all_cnt = count_files(f, (rgb_prefix, flow_x_prefix, flow_y_prefix))
        # last path component, i.e. the name of the video folder
        k = f.split('/')[-1]
        rgb_counts[k] = all_cnt[0]
        # x and y are the two channels of the flow (horizontal and vertical)
        x_cnt = all_cnt[1]
        y_cnt = all_cnt[2]
        if x_cnt != y_cnt:
            raise ValueError('x and y direction have different number of flow images. video: '+f)
        flow_counts[k] = x_cnt
        if i % 200 == 0:
            print('{} videos parsed'.format(i))
    print('frame folder analysis done')
    return rgb_counts, flow_counts
# split_tuple holds one (train, test) pair per split, so its length is 3 for UCF-101
# frame_info is the (rgb_counts, flow_counts) pair returned by parse_directory
# Returns, for one split, the rgb and flow list lines (video name, count, label) of the training and test sets
def build_split_list(split_tuple, frame_info, split_idx, shuffle=False):
    # select the training and test lists of split number split_idx
    split = split_tuple[split_idx]
    # set_list holds (video name, label) records
    def build_set_list(set_list):
        rgb_list, flow_list = list(), list()
        for item in set_list:
            # item[0] is the video name, item[1] is its label
            rgb_cnt = frame_info[0][item[0]]
            flow_cnt = frame_info[1][item[0]]
            rgb_list.append('{} {} {}\n'.format(item[0], rgb_cnt, item[1]))
            flow_list.append('{} {} {}\n'.format(item[0], flow_cnt, item[1]))
        if shuffle:
            random.shuffle(rgb_list)
            random.shuffle(flow_list)
        return rgb_list, flow_list
    # split[0] holds the (name, label) records of the training set, split[1] those of the test set
    # build the rgb and flow lists of the training set and of the test set
    train_rgb_list, train_flow_list = build_set_list(split[0])
    test_rgb_list, test_flow_list = build_set_list(split[1])
    return (train_rgb_list, test_rgb_list), (train_flow_list, test_flow_list)
# Parse the UCF-101 split files and return the (video name, label) records of every training and test set
def parse_ucf101_splits():
    # each entry of class_ind is [index, class name]
    class_ind = [x.strip().split() for x in open('ucf101_splits/classInd.txt')]
    # dict mapping each class name to an integer label (zero-based)
    class_mapping = {x[1]:int(x[0])-1 for x in class_ind}
    # parse one line of a train/test list into (video name, label)
    def line2rec(line):
        # strip surrounding whitespace and split on '/': class name + file name
        items = line.strip().split('/')
        # label of the video
        label = class_mapping[items[0]]
        # video name; split('.') drops the file extension
        vid = items[1].split('.')[0]
        return vid, label
    splits = []
    for i in range(1, 4):
        train_list = [line2rec(x) for x in open('ucf101_splits/trainlist{:02d}.txt'.format(i))]
        test_list = [line2rec(x) for x in open('ucf101_splits/testlist{:02d}.txt'.format(i))]
        splits.append((train_list, test_list))
    return splits
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # dataset name, ucf101 by default
    parser.add_argument('--dataset', type=str, default='ucf101', choices=['ucf101', 'hmdb51'])
    # root of the rgb and flow images, ./ucf101_frames by default
    parser.add_argument('--frame_path', type=str, default='./ucf101_frames',
                        help="root directory holding the frames")
    # directory that the list files are written to
    parser.add_argument('--out_list_path', type=str, default='./settings')
    # prefix of the rgb files
    parser.add_argument('--rgb_prefix', type=str, default='img_',
                        help="prefix of RGB frames")
    # prefix of the x direction flow files
    parser.add_argument('--flow_x_prefix', type=str, default='flow_x',
                        help="prefix of x direction flow images")
    # prefix of the y direction flow files
    parser.add_argument('--flow_y_prefix', type=str, default='flow_y',
                        help="prefix of y direction flow images", )
    # number of dataset splits
    parser.add_argument('--num_split', type=int, default=3,
                        help="number of split building file list")
    # whether to shuffle the lists
    parser.add_argument('--shuffle', action='store_true', default=False)
    args = parser.parse_args()
    dataset = args.dataset
    frame_path = args.frame_path
    rgb_p = args.rgb_prefix
    flow_x_p = args.flow_x_prefix
    flow_y_p = args.flow_y_prefix
    num_split = args.num_split
    out_path = args.out_list_path
    shuffle = args.shuffle
    # output directory for the lists of this dataset
    out_path = os.path.join(out_path,dataset)
    # create the output directory if it does not exist
    if not os.path.isdir(out_path):
        print("creating folder: "+out_path)
        os.makedirs(out_path)
    # operation
    print('processing dataset {}'.format(dataset))
    # parse the dataset splits into (video name, label) records
    if dataset=='ucf101':
        split_tp = parse_ucf101_splits()
    else:
        # parse_hmdb51_splits is not defined in this listing
        split_tp = parse_hmdb51_splits()
    # number of RGB frames and flow images of every video under frame_path
    f_info = parse_directory(frame_path, rgb_p, flow_x_p, flow_y_p)
    print('writing list files for training/testing')
    # range(m) yields 0, 1, ..., m-1; min() keeps i within the splits that were parsed
    for i in range(min(num_split, len(split_tp))):
        lists = build_split_list(split_tp, f_info, i, shuffle)
        open(os.path.join(out_path, 'train_rgb_split{}.txt'.format(i + 1)), 'w').writelines(lists[0][0])
        open(os.path.join(out_path, 'val_rgb_split{}.txt'.format(i + 1)), 'w').writelines(lists[0][1])
        open(os.path.join(out_path, 'train_flow_split{}.txt'.format(i + 1)), 'w').writelines(lists[1][0])
        open(os.path.join(out_path, 'val_flow_split{}.txt'.format(i + 1)), 'w').writelines(lists[1][1])
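3. The ucf101 dataset
Implementation:
1. The dataset root is first parsed into a list of class names and a dict mapping each class name to its index. Then the given frame list file is read into a list of (video path, frame count, video class) tuples.
2. The __getitem__() function: num_segments is the number of segments sampled from the rgb or flow images of one video. The function first fetches the video's (path, frame count duration, class), then divides the video into num_segments parts of duration/num_segments frames each and takes one rgb image or flow sample from each part: at a random position for the training set, in the middle for the validation set. The sampled images are then preprocessed by the transforms, and the function returns the sampled segments together with the corresponding label.
Note: the training set is sampled randomly so that the images differ every time, which can make the trained model more accurate. The test set must yield the same frames every time, because it is used for evaluation and different models are only comparable when they are tested on the same images.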
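The sampling rule can be written down separately from the image loading. The sketch below is a simplified stand-in, not the dataset class itself: sample_offsets and its rng parameter are hypothetical names, and new_length is the number of consecutive frames read per segment.

```python
import random


def sample_offsets(duration, num_segments, new_length, phase, rng=random):
    """Return one start offset per segment (hypothetical helper)."""
    if phase not in ("train", "val"):
        raise ValueError("Only phase train and val are supported.")
    average_duration = duration // num_segments
    offsets = []
    for seg_id in range(num_segments):
        if average_duration < new_length:
            # The video is too short for this segment length: fall back to 0
            offsets.append(0)
        elif phase == "train":
            # Random start inside the segment; randint includes both ends
            start = rng.randint(0, average_duration - new_length)
            offsets.append(start + seg_id * average_duration)
        else:
            # Fixed start so that the clip is centred in the segment
            centre = (average_duration - new_length + 1) // 2
            offsets.append(centre + seg_id * average_duration)
    return offsets


print(sample_offsets(90, 3, 1, "val"))    # [15, 45, 75]
print(sample_offsets(90, 3, 1, "train"))  # differs from run to run
```

Frame files are then read as offset + 1, offset + 2, and so on up to offset + new_length, which is why the offsets themselves start at 0.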
The dataset file listed below cannot be run on its own; the main script imports it to load data when training the model.
import torch.utils.data as data
import os
import sys
import random
import numpy as np
import cv2

# Given the dataset root, return the list of class names and a dict mapping class name to index
def find_classes(dir):
    classes = [d for d in os.listdir(dir) if os.path.isdir(os.path.join(dir, d))]
    classes.sort()
    class_to_idx = {classes[i]: i for i in range(len(classes))}
    return classes, class_to_idx

def make_dataset(root, source):
    if not os.path.exists(source):
        print("Setting file %s for ucf101 dataset doesn't exist." % (source))
        sys.exit()
    else:
        clips = []
        with open(source) as split_f:
            lines = split_f.readlines()
            for line in lines:
                line_info = line.split()
                # path of the video folder
                clip_path = os.path.join(root, line_info[0])
                # number of frames in the video
                duration = int(line_info[1])
                # class label of the video
                target = int(line_info[2])
                item = (clip_path, duration, target)
                clips.append(item)
    return clips
# path: folder holding the frames of one video
def ReadSegmentRGB(path, offsets, new_height, new_width, new_length, is_color, name_pattern):
    if is_color:
        cv_read_flag = cv2.IMREAD_COLOR         # > 0
    else:
        cv_read_flag = cv2.IMREAD_GRAYSCALE     # = 0
    interpolation = cv2.INTER_LINEAR
    sampled_list = []
    for offset_id in range(len(offsets)):
        offset = offsets[offset_id]
        for length_id in range(1, new_length+1):
            frame_name = name_pattern % (length_id + offset)
            frame_path = path + "/" + frame_name
            cv_img_origin = cv2.imread(frame_path, cv_read_flag)
            if cv_img_origin is None:
                print("Could not load file %s" % (frame_path))
                sys.exit()
                # TODO: error handling here
            if new_width > 0 and new_height > 0:
                # use OpenCV3, use OpenCV2.4.13 may have error
                cv_img = cv2.resize(cv_img_origin, (new_width, new_height), interpolation)
            else:
                cv_img = cv_img_origin
            cv_img = cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB)
            sampled_list.append(cv_img)
    # stack all sampled frames along the channel axis
    clip_input = np.concatenate(sampled_list, axis=2)
    return clip_input
def ReadSegmentFlow(path, offsets, new_height, new_width, new_length, is_color, name_pattern):
    if is_color:
        cv_read_flag = cv2.IMREAD_COLOR         # > 0
    else:
        cv_read_flag = cv2.IMREAD_GRAYSCALE     # = 0
    interpolation = cv2.INTER_LINEAR
    sampled_list = []
    for offset_id in range(len(offsets)):
        offset = offsets[offset_id]
        for length_id in range(1, new_length+1):
            frame_name_x = name_pattern % ("x", length_id + offset)
            frame_path_x = path + "/" + frame_name_x
            cv_img_origin_x = cv2.imread(frame_path_x, cv_read_flag)
            frame_name_y = name_pattern % ("y", length_id + offset)
            frame_path_y = path + "/" + frame_name_y
            cv_img_origin_y = cv2.imread(frame_path_y, cv_read_flag)
            if cv_img_origin_x is None or cv_img_origin_y is None:
                print("Could not load file %s or %s" % (frame_path_x, frame_path_y))
                sys.exit()
                # TODO: error handling here
            if new_width > 0 and new_height > 0:
                cv_img_x = cv2.resize(cv_img_origin_x, (new_width, new_height), interpolation)
                cv_img_y = cv2.resize(cv_img_origin_y, (new_width, new_height), interpolation)
            else:
                cv_img_x = cv_img_origin_x
                cv_img_y = cv_img_origin_y
            # add a channel axis so that x and y can be stacked as alternating channels
            sampled_list.append(np.expand_dims(cv_img_x, 2))
            sampled_list.append(np.expand_dims(cv_img_y, 2))
    clip_input = np.concatenate(sampled_list, axis=2)
    return clip_input
class ucf101(data.Dataset):
    def __init__(self,
                 root,               # root: path of the dataset
                 source,             # source: list (settings) file of the dataset
                 phase,              # phase: which subset is loaded (train or val)
                 modality,           # modality: kind of data (rgb or flow)
                 name_pattern=None,  # name_pattern: file name pattern of the frames
                 is_color=True,
                 num_segments=1,     # num_segments: number of segments sampled per video
                 new_length=1,       # new_length: number of consecutive frames per segment
                 new_width=0,
                 new_height=0,
                 transform=None,
                 target_transform=None,
                 video_transform=None):
        # classes: list of class names; class_to_idx: dict of class name -> index
        classes, class_to_idx = find_classes(root)
        # every video in the list file -----> list of (video path, frame count, video class)
        clips = make_dataset(root, source)
        if len(clips) == 0:
            raise(RuntimeError("Found 0 video clips in subfolders of: " + root + "\n"
                               "Check your data directory."))
        self.root = root
        self.source = source
        self.phase = phase
        self.modality = modality
        self.classes = classes
        self.class_to_idx = class_to_idx
        self.clips = clips
        if name_pattern:
            self.name_pattern = name_pattern
        else:
            if self.modality == "rgb":
                self.name_pattern = "img_%05d.jpg"
            elif self.modality == "flow":
                self.name_pattern = "flow_%s_%05d.jpg"
        self.is_color = is_color
        self.num_segments = num_segments
        self.new_length = new_length
        self.new_width = new_width
        self.new_height = new_height
        self.transform = transform
        self.target_transform = target_transform
        self.video_transform = video_transform
    # return the frames sampled from one video together with its label
    def __getitem__(self, index):
        # everything known about one video: path, frame count, class
        path, duration, target = self.clips[index]
        average_duration = int(duration / self.num_segments)
        offsets = []
        # the frames are divided into num_segments parts
        # train: a random start inside each part; val: the middle of each part; offsets records the start indices
        for seg_id in range(self.num_segments):
            if self.phase == "train":
                if average_duration >= self.new_length:
                    offset = random.randint(0, average_duration - self.new_length)
                    # No +1 because randint(a,b) return a random integer N such that a <= N <= b.
                    offsets.append(offset + seg_id * average_duration)
                else:
                    offsets.append(0)
            elif self.phase == "val":
                if average_duration >= self.new_length:
                    offsets.append(int((average_duration - self.new_length + 1)/2 + seg_id * average_duration))
                else:
                    offsets.append(0)
            else:
                raise ValueError("Only phase train and val are supported.")
        if self.modality == "rgb":
            clip_input = ReadSegmentRGB(path,
                                        offsets,
                                        self.new_height,
                                        self.new_width,
                                        self.new_length,
                                        self.is_color,
                                        self.name_pattern
                                        )
        elif self.modality == "flow":
            clip_input = ReadSegmentFlow(path,
                                         offsets,
                                         self.new_height,
                                         self.new_width,
                                         self.new_length,
                                         self.is_color,
                                         self.name_pattern
                                         )
        else:
            raise ValueError("No such modality %s" % (self.modality))
        # clip_input holds the sampled, optionally resized frames of the video
        # apply the image preprocessing
        if self.transform is not None:
            clip_input = self.transform(clip_input)
        # target is a single number here and may need to be converted, for example into a vector
        if self.target_transform is not None:
            target = self.target_transform(target)
        if self.video_transform is not None:
            clip_input = self.video_transform(clip_input)
        # return the network input and the class label
        return clip_input, target

    def __len__(self):
        return len(self.clips)