Hands-on Vision Transformer series: finally meeting the timm library!
2022-07-25 12:19:00 【MengYa_ Dream】
Preface: using the wrong tools makes everything a struggle. What I wanted to do was actually a very simple idea, yet I kept going around in circles. Today, let's finally meet the timm library!
Contents
1. Baidu PaddlePaddle: Learn Vision Transformer from scratch
2. Resources: excellent open-source Vision Transformer work
3. How I found timm: debugging
4. The timm library
4.1 Concept
4.2 Flexible use of the timm library's Vision Transformer
4.2.1 vision_transformer.py parameter explanation
4.2.2 Calling and building a Vision Transformer with timm
These days, PyTorch needs no introduction, and the popularity of the Vision Transformer is just as well known. Building my own vision transformer was actually my original goal, but I simply had no idea where to start; the real difficulty is learning to use it flexibly. Today I suddenly realized it is not as hard as I thought, and looking back at my earlier experiments, the path is actually traceable.
First, here are some hands-on resources I came across while first studying the Transformer framework, covering both the ideas and the practice:
1. Baidu PaddlePaddle: Learn Vision Transformer from scratch
The theory is covered in detail. The code practice is built on Baidu's PaddlePaddle library, which differs somewhat from PyTorch, but the underlying ideas are the same.
Course: Learn Vision Transformer from scratch
Paddle-PyTorch API correspondence table: paddle vs. pytorch comparison
2. Resources: excellent open-source Vision Transformer work
Excellent open-source Vision Transformer work: reading the timm library's vision transformer code
python timm library - CSDN Blog
3. How I found timm: debugging
I had been trying to build my own network model, but it never achieved the results I wanted, which was honestly quite painful. The root cause is that when reproducing previous work, there are plenty of re-implementations around, but rarely any in-depth interpretation. If you only skim the surface, how can you possibly understand the author's real intent, or see how they built on the experience of their predecessors? So I started debugging again and again, breaking through one learning point at a time. For someone like me who was parachuted into Python, a lot of knowledge has started to accumulate. Sure enough, as long as your resolve holds, there are always more solutions than difficulties!
So this post is an introduction to the timm library! While reproducing and debugging an author's Vision Transformer module, I finally saw what was actually going on: it was not built entirely from scratch at all! There is a ready-made tool out there; just use it properly! Suddenly everything looks brighter again, hahaha!
4. The timm library
4.1 Concept
timm (PyTorch Image Models) is, simply put, a PyTorch library. It is an extension of torchvision.models, providing computer-vision models with a focus on classification. In addition, all of its models share a consistent default API and configuration.
Model overview: Model Summaries - Pytorch Image Models (rwightman.github.io)
Model results: Results - Pytorch Image Models (rwightman.github.io)
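As a quick way to see what is available (a minimal sketch; the exact set of registered model names depends on the timm version you have installed), you can list and filter the models timm knows about:

```python
import timm

# List every registered Vision Transformer variant (names vary by timm version).
vit_names = timm.list_models("vit_*")
print(len(vit_names), vit_names[:5])

# Restrict the listing to variants that ship with pretrained weights.
pretrained_vit_names = timm.list_models("vit_*", pretrained=True)
print(pretrained_vit_names[:5])
```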
4.2 Flexible use of the timm library's Vision Transformer
4.2.1 vision_transformer.py parameter explanation
- img_size: input image size; default 224; a tuple of ints.
- patch_size: patch size; default 16; a tuple of ints.
- in_chans: number of input channels; default 3; int.
- num_classes: number of classes for the classification head; default 1000; int.
- embed_dim: the Transformer embedding dimension; default 768; int.
- depth: number of Transformer blocks; default 12; int.
- num_heads: number of attention heads; default 12; int.
- mlp_ratio: ratio of the MLP hidden dim to the embedding dim; default 4.
- qkv_bias: whether the attention module adds a bias when computing q, k, and v; default True; bool.
- qk_scale: usually None, in which case the default attention scaling is used.
- drop_rate: dropout rate; default 0; float.
- attn_drop_rate: dropout rate inside the attention module; default 0; float.
- drop_path_rate: stochastic depth (drop path) rate; default 0; float.
- hybrid_backbone: whether to pass the image through a backbone before converting it into patches; default None. If None, the image is converted directly into patches; otherwise it first goes through this backbone and is then split into patches; nn.Module.
- norm_layer: type of normalization layer; default None; nn.Module.
- Note: a tuple is another Python data type; like a list, it is an ordered collection of objects. A minimal construction sketch using the parameters above follows below.
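Putting the parameters above together, here is a minimal sketch of building a ViT-Base/16 by hand from timm's VisionTransformer class. The argument names follow the list above, but the exact signature (e.g. whether qk_scale or hybrid_backbone is still accepted) varies between timm versions, so treat this as illustrative rather than definitive:

```python
import torch
from timm.models.vision_transformer import VisionTransformer

# ViT-Base/16 built directly from the class; values mirror the parameter list above.
model = VisionTransformer(
    img_size=224,
    patch_size=16,
    in_chans=3,
    num_classes=1000,
    embed_dim=768,
    depth=12,
    num_heads=12,
    mlp_ratio=4.0,
    qkv_bias=True,
    drop_rate=0.0,
    attn_drop_rate=0.0,
    drop_path_rate=0.1,  # stochastic depth; the default is 0.0
)

x = torch.randn(1, 3, 224, 224)  # dummy batch: (batch, channels, height, width)
logits = model(x)
print(logits.shape)              # expected: torch.Size([1, 1000])
```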
4.2.2 Calling and building a Vision Transformer with timm
(1) Import the necessary libraries and models:
import timm

(2) Call a model from the timm library:
model = timm.create_model("vit_deit_base_patch16_384", pretrained=pretrained)

(3) Adjust the model as needed; anything further builds on this, for example as in the sketch below.
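For step (3), a common adjustment is swapping the classification head for your own number of classes. The sketch below assumes the model name used in this post is still registered in your timm version (newer releases register this DeiT variant as deit_base_patch16_384), and the 10-class head is just a hypothetical downstream task; pretrained=True would download the ImageNet weights:

```python
import torch
import timm

# Create the model with a replacement 10-class head (hypothetical downstream task).
model = timm.create_model(
    "vit_deit_base_patch16_384",  # may be "deit_base_patch16_384" in newer timm
    pretrained=False,             # set True to download pretrained weights
    num_classes=10,
)
model.eval()

x = torch.randn(1, 3, 384, 384)  # this variant expects 384x384 inputs
with torch.no_grad():
    logits = model(x)
print(logits.shape)              # expected: torch.Size([1, 10])
```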