Mmdetection dataloader construction
2022-06-10 22:41:00 【Wu lele~】
Preface
This article describes how mmdetection builds its dataloader. The dataloader controls iterative reading of the dataset, so a dataset class has to be implemented first. For the dataset class, see the companion article on dataset class construction in mmdetection.
1、 Overall process
In PyTorch, constructing a DataLoader instance takes the following important arguments (excerpted from the DataLoader source code).
Arguments:
    dataset (Dataset): dataset from which to load the data.
    batch_size (int, optional): how many samples per batch to load
        (default: ``1``).
    shuffle (bool, optional): set to ``True`` to have the data reshuffled
        at every epoch (default: ``False``).
    sampler (Sampler or Iterable, optional): defines the strategy to draw
        samples from the dataset. Can be any ``Iterable`` with ``__len__``
        implemented. If specified, :attr:`shuffle` must not be specified.
    batch_sampler (Sampler or Iterable, optional): like :attr:`sampler`, but
        returns a batch of indices at a time. Mutually exclusive with
        :attr:`batch_size`, :attr:`shuffle`, :attr:`sampler`,
        and :attr:`drop_last`.
    num_workers (int, optional): how many subprocesses to use for data
        loading. ``0`` means that the data will be loaded in the main process.
        (default: ``0``)
    collate_fn (callable, optional): merges a list of samples to form a
        mini-batch of Tensor(s). Used when using batched loading from a
        map-style dataset.
A brief explanation of each parameter:
dataset: an instance of a class that inherits from Dataset.
batch_size: the batch size.
shuffle: if True, the data is reshuffled at the start of every epoch.
sampler: an iterable that holds the indices of the dataset, either shuffled or in order.
batch_sampler: iterates over the indices produced by sampler and groups them batch_size at a time; the samples at those indices are then fetched from dataset.
collate_fn: merges the list of samples in a batch into one mini-batch, adjusting width and height so that they match.
If these definitions still seem a little vague, no problem. Just remember that dataset, sampler, batch_sampler and dataloader are all iterables, which means each of them can simply be used as for ... in dataset:.
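To make the "everything can be iterated" point concrete, here is a minimal pure-Python sketch. ToyDataset and ToySampler are hypothetical stand-ins, not PyTorch or mmdetection classes; they only mimic the protocol (__getitem__ and __len__ for the dataset, __iter__ for the sampler).

```python
import random


class ToyDataset:
    """Hypothetical map-style dataset: index in, sample out."""

    def __init__(self, n):
        self.items = [f"img_{i}" for i in range(n)]

    def __getitem__(self, idx):
        return self.items[idx]

    def __len__(self):
        return len(self.items)


class ToySampler:
    """Hypothetical sampler: yields dataset indices, optionally shuffled."""

    def __init__(self, dataset, shuffle=False, seed=0):
        self.dataset = dataset
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            self.rng.shuffle(indices)
        return iter(indices)

    def __len__(self):
        return len(self.dataset)


dataset = ToyDataset(4)
sampler = ToySampler(dataset, shuffle=True)

# The sampler only produces indices; the dataset turns an index into a sample.
order = [idx for idx in sampler]
samples = [dataset[idx] for idx in order]
```

The sampler never touches the data itself, which is why the sampling strategy can be swapped without changing the dataset class.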
Now that the main DataLoader parameters are clear, let us look at how build_dataloader is implemented in mmdetection. I will explain it in two parts:
(1) How a dataloader object is instantiated. As shown in the figure below, mmdetection mainly implements the following four parameters. GroupSampler inherits from torch's Sampler class. shuffle is True in most cases. For the batch_sampler parameter, mmdetection uses the BatchSampler class implemented in PyTorch.
(2) The flow of reading one batch of data.
2、 Instantiating the dataloader
2.1. GroupSampler class implementation
For dataset, see the article on dataset class construction. Here is the GroupSampler source code:
import numpy as np
from torch.utils.data import Sampler


class GroupSampler(Sampler):

    def __init__(self, dataset, samples_per_gpu=1):
        assert hasattr(dataset, 'flag')
        self.dataset = dataset
        self.samples_per_gpu = samples_per_gpu
        # Aspect-ratio group flag (0 or 1) of every image, in dataset order.
        self.flag = dataset.flag.astype(np.int64)
        # np.bincount counts how many times each flag value (0 and 1) occurs.
        self.group_sizes = np.bincount(self.flag)
        # Total length after each group is rounded up to a multiple of
        # samples_per_gpu.
        self.num_samples = 0
        for i, size in enumerate(self.group_sizes):
            self.num_samples += int(np.ceil(
                size / self.samples_per_gpu)) * self.samples_per_gpu

    def __iter__(self):
        indices = []
        # e.g. self.group_sizes = [942, 4069], where 942 is the number of
        # images with aspect ratio < 1.
        for i, size in enumerate(self.group_sizes):
            if size == 0:
                continue
            # Indices of all images whose flag equals the current group i.
            indice = np.where(self.flag == i)[0]
            assert len(indice) == size
            np.random.shuffle(indice)  # shuffle the indices within the group
            # Pad the group with randomly repeated indices so that its length
            # is a multiple of samples_per_gpu.
            num_extra = int(np.ceil(size / self.samples_per_gpu)
                            ) * self.samples_per_gpu - len(indice)
            indice = np.concatenate(
                [indice, np.random.choice(indice, num_extra)])
            indices.append(indice)
        indices = np.concatenate(indices)  # merge into one array, here of length 5011
        # Split into chunks of samples_per_gpu and shuffle the chunk order;
        # with a batch size of 1 this gives 5011 arrays.
        indices = [
            indices[i * self.samples_per_gpu:(i + 1) * self.samples_per_gpu]
            for i in np.random.permutation(
                range(len(indices) // self.samples_per_gpu))
        ]
        indices = np.concatenate(indices)
        indices = indices.astype(np.int64).tolist()
        assert len(indices) == self.num_samples
        return iter(indices)

    def __len__(self):
        return self.num_samples
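In essence, the class implements the __iter__ method so that the sampler can be iterated. The general idea: suppose the dataset has 5000 images, so the indices run from 0 to 4999. The indices are split by flag into aspect-ratio groups and shuffled within each group by np.random.shuffle, and each group is padded with randomly repeated indices up to a multiple of samples_per_gpu. With a batch size of 2 this gives about 2500 pairs (one more if the groups have odd sizes), and each pair comes from a single group. The order of the pairs is shuffled, the pairs are flattened back into the list indices, and the result is returned through iter(indices).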
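The grouping logic of GroupSampler.__iter__ can be reproduced on a toy example. The following is a sketch in pure Python rather than the mmdetection code: group_indices and the flags list are hypothetical, and random replaces numpy, but the steps are the same (shuffle within a group, pad to a multiple of the batch size, cut into chunks, shuffle the chunks).

```python
import math
import random


def group_indices(flags, samples_per_gpu, seed=0):
    """Return a flat index list in which every consecutive chunk of
    `samples_per_gpu` indices comes from a single flag group."""
    rng = random.Random(seed)
    chunks = []
    for group in sorted(set(flags)):
        indice = [i for i, f in enumerate(flags) if f == group]
        rng.shuffle(indice)  # shuffle inside the group
        # Pad the group with repeated indices up to a multiple of the batch size.
        target = math.ceil(len(indice) / samples_per_gpu) * samples_per_gpu
        indice += [rng.choice(indice) for _ in range(target - len(indice))]
        # Cut the padded group into batch-sized chunks.
        chunks += [indice[i:i + samples_per_gpu]
                   for i in range(0, len(indice), samples_per_gpu)]
    rng.shuffle(chunks)  # shuffle the order of the chunks, not their contents
    return [idx for chunk in chunks for idx in chunk]


# Hypothetical flags: 3 images in group 0 and 4 images in group 1.
flags = [0, 1, 1, 0, 1, 0, 1]
indices = group_indices(flags, samples_per_gpu=2)
batches = [indices[i:i + 2] for i in range(0, len(indices), 2)]
```

Group 0 has three images and is padded to four, so the result holds eight indices, and every batch of two contains images with the same flag. That is the point of the sampler: images in one batch have a similar aspect ratio.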
2.2. BatchSampler class
For this part mmdetection uses the PyTorch implementation. Here is the source code:
from typing import List

from torch.utils.data import Sampler


class BatchSampler(Sampler[List[int]]):

    def __init__(self, sampler: Sampler[int], batch_size: int, drop_last: bool) -> None:
        self.sampler = sampler
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self):
        batch = []
        for idx in self.sampler:
            batch.append(idx)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if len(batch) > 0 and not self.drop_last:
            yield batch

    def __len__(self):
        # Can only be called if self.sampler has __len__ implemented
        # We cannot enforce this condition, so we turn off typechecking for the
        # implementation below.
        # Somewhat related: see NOTE [ Lack of Default `__len__` in Python Abstract Base Classes ]
        if self.drop_last:
            return len(self.sampler) // self.batch_size  # type: ignore
        else:
            return (len(self.sampler) + self.batch_size - 1) // self.batch_size  # type: ignore
As the source code shows, BatchSampler is initialized with a sampler. Its __iter__ method collects indices from the sampler until it has batch_size of them and then, being a generator, yields the batch; that is, each iteration returns the indices of one batch.
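A small worked example may help with the drop_last behaviour. The function below is a plain-Python sketch that mirrors the loop in BatchSampler.__iter__; iter_batches is a hypothetical name, and range(5) stands in for a sampler.

```python
def iter_batches(sampler, batch_size, drop_last):
    """Group indices from `sampler` into lists of `batch_size`."""
    batch = []
    for idx in sampler:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch and not drop_last:
        yield batch  # last, shorter batch


kept = list(iter_batches(range(5), batch_size=2, drop_last=False))
dropped = list(iter_batches(range(5), batch_size=2, drop_last=True))
print(kept)     # [[0, 1], [2, 3], [4]]
print(dropped)  # [[0, 1], [2, 3]]

# The two __len__ formulas for 5011 indices and batch_size=2:
n, batch_size = 5011, 2
len_drop = n // batch_size                     # 2505
len_keep = (n + batch_size - 1) // batch_size  # 2506
```

The second formula is integer ceiling division, so the incomplete last batch is counted when drop_last is False.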
3、 Flow of reading one batch of data
This flow is easier to show in a picture than to describe in words:
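In code form, the flow can be sketched roughly as follows. This is a simplified pure-Python illustration of the order of operations (sampler, batch of indices, dataset lookup, collate_fn), not the actual DataLoader implementation; iter_loader, collate and the toy data are hypothetical, and worker processes are ignored.

```python
def collate(samples):
    """Hypothetical collate_fn: right-pad every sample with zeros to the
    longest length in the batch (a 1-D stand-in for padding images)."""
    width = max(len(s) for s in samples)
    return [s + [0] * (width - len(s)) for s in samples]


def iter_loader(dataset, sampler, batch_size, collate_fn):
    """Yield collated batches: sampler -> batch of indices -> samples -> collate."""
    batch_indices = []
    for idx in sampler:                        # 1. the sampler yields one index
        batch_indices.append(idx)
        if len(batch_indices) == batch_size:   # 2. a full batch of indices
            samples = [dataset[i] for i in batch_indices]  # 3. fetch the samples
            yield collate_fn(samples)          # 4. merge into one mini-batch
            batch_indices = []
    if batch_indices:  # leftover indices form a final, shorter batch
        yield collate_fn([dataset[i] for i in batch_indices])


# Hypothetical dataset: variable-length "images" stored as lists.
dataset = [[1], [2, 2], [3, 3, 3], [4]]
sampler = [2, 0, 3, 1]  # an already shuffled index order
batches = list(iter_loader(dataset, sampler, batch_size=2, collate_fn=collate))
print(batches)  # [[[3, 3, 3], [1, 0, 0]], [[4, 0], [2, 2]]]
```

Each batch is padded only to the size of its own largest sample, which is why grouping similar samples into the same batch keeps the padding small.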
Summary
This article introduced how mmdetection uses dataset and sampler to construct a DataLoader, and showed how the dataloader iterates over each batch of data internally. If you have any questions, feel free to add me on WeChat (vx: wulele2541612007) and I will add you to the discussion group.