Transformers DataCollatorWithPadding class
2022-06-26 14:32:00 【Live up to your youth】
Constructor
DataCollatorWithPadding(tokenizer: PreTrainedTokenizerBase,
    padding: typing.Union[bool, str, transformers.utils.generic.PaddingStrategy] = True,
    max_length: typing.Optional[int] = None,
    pad_to_multiple_of: typing.Optional[int] = None,
    return_tensors: str = 'pt')
In transformers, a DataCollator is an object that assembles individual elements of a dataset into a batch of data. DataCollatorWithPadding is one such data collator: it dynamically pads the input data as it builds each batch.
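To make "dynamic padding" concrete, here is a minimal pure-Python sketch of the idea. It does not call transformers; the function name collate_with_dynamic_padding, the pad id of 0, and the token ids are assumptions chosen for illustration only.

```python
from typing import Dict, List


def collate_with_dynamic_padding(
    features: List[Dict[str, List[int]]],
    pad_token_id: int = 0,  # assumed pad id, for illustration only
) -> Dict[str, List[List[int]]]:
    """Pad every `input_ids` list to the longest one in this batch."""
    longest = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": []}
    for f in features:
        ids = f["input_ids"]
        n_pad = longest - len(ids)
        # Real tokens get mask 1, padding positions get mask 0.
        batch["input_ids"].append(ids + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
    return batch


# Two made-up examples of different lengths.
features = [
    {"input_ids": [101, 7592, 102]},
    {"input_ids": [101, 7592, 2088, 999, 102]},
]
batch = collate_with_dynamic_padding(features)
print(batch["input_ids"])
print(batch["attention_mask"])
```

Because the target length is recomputed for every batch, a batch of short sentences stays short instead of being padded to the longest sentence in the whole dataset.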
The tokenizer parameter is the tokenizer used to encode the data. The padding parameter may be a bool, where True means pad and False means do not pad, or a string naming a padding strategy: "longest" pads to the longest sequence in the input data, "max_length" pads to the length set by the max_length parameter, and "do_not_pad" applies no padding. The pad_to_multiple_of parameter, if set, pads the sequences to a multiple of the given value. The return_tensors parameter selects the type of tensor returned: "pt" for PyTorch, "tf" for TensorFlow, or "np" for NumPy.
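The way these parameters combine can be sketched as a small function that computes the length a batch would be padded to. This is an illustrative sketch of the behaviour described above, not the library's implementation; the name target_length is hypothetical, and the sketch assumes max_length is always supplied with the "max_length" strategy.

```python
from typing import List, Optional, Union


def target_length(
    lengths: List[int],
    padding: Union[bool, str] = True,
    max_length: Optional[int] = None,
    pad_to_multiple_of: Optional[int] = None,
) -> Optional[int]:
    """Return the padded length for a batch, or None when no padding applies."""
    if padding is False or padding == "do_not_pad":
        return None
    if padding is True or padding == "longest":
        target = max(lengths)  # longest sequence in this batch
    elif padding == "max_length":
        if max_length is None:
            raise ValueError("this sketch requires max_length with padding='max_length'")
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {padding!r}")
    # Round up to the next multiple when one is requested.
    if pad_to_multiple_of is not None and target % pad_to_multiple_of != 0:
        target = (target // pad_to_multiple_of + 1) * pad_to_multiple_of
    return target


lengths = [3, 5]
print(target_length(lengths))                                    # 5
print(target_length(lengths, "max_length", max_length=12))       # 12
print(target_length(lengths, "longest", pad_to_multiple_of=8))   # 8
print(target_length(lengths, False))                             # None
```

Rounding up to a multiple is typically useful on hardware that runs more efficiently when sequence lengths are multiples of 8.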
Usage example
>>> import transformers
>>> import datasets
>>> # Load the tokenizer first so it is defined before dataset.map uses it.
>>> tokenizer = transformers.BertTokenizer.from_pretrained("bert-base-uncased")
>>> dataset = datasets.load_dataset("glue", "cola", split="train")
>>> dataset = dataset.map(lambda data: tokenizer(data["sentence"], padding=True), batched=True)
>>> dataset
Dataset({
    features: ['sentence', 'label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 8551
})
>>> # Pad sequences up to max_length=12 and return TensorFlow tensors.
>>> data_collator = transformers.DataCollatorWithPadding(tokenizer,
...     padding="max_length",
...     max_length=12,
...     return_tensors="tf")
>>> dataset = dataset.to_tf_dataset(columns=["label", "input_ids"], batch_size=16, shuffle=False, collate_fn=data_collator)
>>> dataset
<PrefetchDataset element_spec={'input_ids': TensorSpec(shape=(None, None), dtype=tf.int64, name=None), 'attention_mask': TensorSpec(shape=(None, None), dtype=tf.int64, name=None), 'labels': TensorSpec(shape=(None,), dtype=tf.int64, name=None)}>