Using the DataLoader collate_fn parameter
2022-06-28 02:07:00 【xx_xjm】
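In short, collate_fn determines what each batch returned by the DataLoader looks like. The DataLoader fetches a batch's worth of samples from the dataset, but those samples arrive as separate items. Take images as an example: with a batch size of 8 we get 8 tensors of shape [3, 256, 256] (256 is an arbitrary image size chosen for illustration), and collate_fn combines them into a single tensor of shape [8, 3, 256, 256], which becomes the DataLoader's output.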
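The stacking described above can be checked with a minimal sketch. It assumes PyTorch is installed and uses a plain Python list of zero tensors as a stand-in dataset (the list and the zero-filled images are illustrative, not from the original post); no collate_fn is passed, so the default one is used.

```python
import torch
from torch.utils.data import DataLoader

# Stand-in dataset (assumption): 8 fake "images", each of shape [3, 256, 256].
images = [torch.zeros(3, 256, 256) for _ in range(8)]

# No collate_fn is given, so the DataLoader falls back to the default one,
# which stacks the 8 separate tensors along a new leading batch dimension.
loader = DataLoader(images, batch_size=8)
batch = next(iter(loader))

print(batch.shape)  # torch.Size([8, 3, 256, 256])
```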
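In most cases you do not need to set this parameter. The exception is when samples differ in size. The most obvious example is sentence length in NLP: the sentences in a batch have different lengths, so the default collate_fn may raise an error, and you need to supply a custom collate_fn instead.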
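The following sketch illustrates the variable-length case. The token ids and the helper names default_collate_fails and pad_collate_fn are hypothetical. Padding with pad_sequence is one common way to handle unequal lengths; the original post does not prescribe it.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Toy "sentences" as token-id tensors of different lengths (made-up values).
sentences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6, 7, 8, 9])]

def default_collate_fails(dataset, batch_size):
    """Return True if the default collate_fn cannot build a batch."""
    try:
        next(iter(DataLoader(dataset, batch_size=batch_size)))
    except RuntimeError:
        # Stacking tensors of unequal size raises RuntimeError.
        return True
    return False

def pad_collate_fn(batch):
    """Pad every sequence to the longest one in the batch (one possible strategy)."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0)
    return padded, lengths

failed = default_collate_fails(sentences, batch_size=3)
padded, lengths = next(iter(DataLoader(sentences, batch_size=3, collate_fn=pad_collate_fn)))
```

Here padded has one row per sentence, and lengths keeps the original sentence lengths so that the padding can be masked out later.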
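Overriding it is simple: define a function, here called my_collate_fn(batch). Note that it takes exactly one argument, batch, which is a list holding whatever the dataset's __getitem__ returned for each sample in the batch.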
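As a rough sketch of that signature, the snippet below defines a my_collate_fn and passes it to the DataLoader through collate_fn. ToyDataset and its contents are hypothetical and exist only to make the example runnable.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical dataset: each sample is a (data, label) pair."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        return torch.full((2,), float(idx)), idx

def my_collate_fn(batch):
    # `batch` is a list of length batch_size; each entry is exactly
    # what __getitem__ returned for one sample.
    data = torch.stack([sample[0] for sample in batch], dim=0)
    labels = torch.tensor([sample[1] for sample in batch])
    return data, labels

loader = DataLoader(ToyDataset(), batch_size=2, collate_fn=my_collate_fn)
data, labels = next(iter(loader))
```

Whatever my_collate_fn returns is what the loop over the DataLoader receives, so the return value can take any structure the training code expects.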
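Take a VQA task as an example. Suppose the dataset's __getitem__ returns, for one image, its data and label together with the corresponding question and answer. If the DataLoader is set to a batch size of 8, the argument passed to my_collate_fn is a list containing 8 entries of the form [data, label, question, answer]. The default collate_fn requires that each field has the same shape across those 8 entries: the 8 data items must share one shape, the 8 labels one shape, the 8 questions one shape, and the 8 answers one shape. Only then does the default collate_fn work; otherwise it raises an error, and a custom collate_fn is needed.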
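For example, suppose the input to a custom vqa_collate_fn is a list of 8 samples. Looking only at the third element of each sample, the third elements of sample 0 and sample 1 have different lengths, so the default collate_fn would raise an error. The custom function simply puts the third element of every sample into a list and returns that list as is.
The example comes from ALBEF.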
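The idea can be sketched as follows. This is not ALBEF's actual code: the name vqa_collate_fn comes from the text above, but the body, the [data, label, question, answer] layout, and the toy samples are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader

def vqa_collate_fn(batch):
    """Illustrative collate function: stack fixed-shape fields, keep the rest as lists."""
    data_list, label_list, question_list, answer_list = [], [], [], []
    for data, label, question, answer in batch:
        data_list.append(data)
        label_list.append(label)
        question_list.append(question)  # variable length, so left unstacked
        answer_list.append(answer)
    return torch.stack(data_list, dim=0), torch.tensor(label_list), question_list, answer_list

# Toy samples (assumption): the third element differs in length between samples.
samples = [
    (torch.zeros(3, 8, 8), 0, torch.tensor([5, 2, 9]), "yes"),
    (torch.zeros(3, 8, 8), 1, torch.tensor([7, 1, 4, 3, 6]), "no"),
]

loader = DataLoader(samples, batch_size=2, collate_fn=vqa_collate_fn)
images, labels, questions, answers = next(iter(loader))
```

The images are still stacked into one tensor, while the questions come back as a plain Python list whose entries keep their own lengths.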
