Keras attention models, including BotNet, CoaT, CoAtNet, CMT, CoTNet, HaloNet, ResNeSt, ResNeXt, ResNetD, VOLO, MLP-Mixer, ResMLP, GMLP and LeViT

Overview

Keras_cv_attention_models


Usage

Basic Usage

  • Currently a work in progress: CMT and CoAtNet training.
  • Install as pip package:
    pip install -U keras-cv-attention-models
    # Or
    pip install -U git+https://github.com/leondgarse/keras_cv_attention_models
    Refer to each sub-directory for detailed usage.
  • Basic model prediction
    from keras_cv_attention_models import volo
    mm = volo.VOLO_d1(pretrained="imagenet")
    
    """ Run predict """
    import tensorflow as tf
    from tensorflow import keras
    from skimage.data import chelsea
    img = chelsea() # Chelsea the cat
    imm = keras.applications.imagenet_utils.preprocess_input(img, mode='torch')
    pred = mm(tf.expand_dims(tf.image.resize(imm, mm.input_shape[1:3]), 0)).numpy()
    pred = tf.nn.softmax(pred).numpy()  # If classifier activation is not softmax
    print(keras.applications.imagenet_utils.decode_predictions(pred)[0])
    # [('n02124075', 'Egyptian_cat', 0.9692954),
    #  ('n02123045', 'tabby', 0.020203391),
    #  ('n02123159', 'tiger_cat', 0.006867502),
    #  ('n02127052', 'lynx', 0.00017674894),
    #  ('n02123597', 'Siamese_cat', 4.9493494e-05)]
  • Exclude model top layers by setting num_classes=0
    from keras_cv_attention_models import resnest
    mm = resnest.ResNest50(num_classes=0)
    print(mm.output_shape)
    # (None, 7, 7, 2048)
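  • Attach a custom classification head to a num_classes=0 backbone. A minimal sketch assuming a hypothetical 10-class task (the head below is illustrative, not part of the package):
    from tensorflow import keras
    from keras_cv_attention_models import resnest
    backbone = resnest.ResNest50(num_classes=0)  # headless backbone, output shape (None, 7, 7, 2048)
    nn = keras.layers.GlobalAveragePooling2D()(backbone.output)  # pool spatial dims -> (None, 2048)
    nn = keras.layers.Dense(10, activation="softmax", name="custom_head")(nn)  # hypothetical 10-class head
    mm = keras.Model(backbone.input, nn)
    mm.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["acc"])
    print(mm.output_shape)
    # (None, 10)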

Layers

  • attention_layers is an __init__.py only; it imports core layers defined in the model architectures, such as RelativePositionalEmbedding from botnet and outlook_attention from volo.
import tensorflow as tf
from keras_cv_attention_models import attention_layers
aa = attention_layers.RelativePositionalEmbedding()
print(f"{aa(tf.ones([1, 4, 14, 16, 256])).shape = }")
# aa(tf.ones([1, 4, 14, 16, 256])).shape = TensorShape([1, 4, 14, 16, 14, 16])
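Since these are regular Keras layers, they can also be placed in a functional model. A minimal sketch reusing the input shape from the call above (the expected output shape is inferred from that call, not separately verified):
from tensorflow import keras
inputs = keras.Input([4, 14, 16, 256])  # [num_heads, height, width, channels], batch dim excluded
outputs = attention_layers.RelativePositionalEmbedding()(inputs)
mm = keras.Model(inputs, outputs)
print(mm.output_shape)
# (None, 4, 14, 16, 14, 16)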

Model surgery

  • model_surgery includes functions used to change model parameters after the model is built.
from tensorflow import keras
from keras_cv_attention_models import model_surgery
# Replace all ReLU with PReLU
mm = model_surgery.replace_ReLU(keras.applications.ResNet50(), target_activation='PReLU')
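# A quick sanity check, assuming replace_ReLU swaps in standard keras.layers.PReLU layers:
print(sum(isinstance(layer, keras.layers.PReLU) for layer in mm.layers))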

AotNet

  • Keras AotNet is just a ResNet / ResNetV2-like framework that exposes parameters such as attn_types and se_ratio, which are used to apply different types of attention layers.
    # Mixing se and outlook and halo and mhsa and cot_attention, 21M parameters
    # 50 is just a number picked to be larger than the corresponding `num_block`
    from keras_cv_attention_models import aotnet
    attn_types = [None, "outlook", ["mhsa", "halo"] * 50, "cot"]
    se_ratio = [0.25, 0, 0, 0]
    mm = aotnet.AotNet50V2(attn_types=attn_types, se_ratio=se_ratio, deep_stem=True, strides=1)
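    # The result is a plain Keras model, so the usual inspection utilities apply.
    # A sketch checking the parameter count mentioned above (exact numbers depend on attn_types / se_ratio):
    print(f"{mm.count_params() = }")  # roughly 21M for this mix
    mm.summary()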

ResNetD

Model Params Image resolution Top1 Acc Download
ResNet50D 25.58M 224 80.530 resnet50d.h5
ResNet101D 44.57M 224 83.022 resnet101d.h5
ResNet152D 60.21M 224 83.680 resnet152d.h5
ResNet200D 64.69M 224 83.962 resnet200d.h5

ResNeXt

Model Params Image resolution Top1 Acc Download
ResNeXt50 (32x4d) 25M 224 79.768 resnext50_imagenet.h5
- SWSL 25M 224 82.182 resnext50_swsl.h5
ResNeXt50D (32x4d + deep) 25M 224 79.676 resnext50d_imagenet.h5
ResNeXt101 (32x4d) 42M 224 80.334 resnext101_imagenet.h5
- SWSL 42M 224 83.230 resnext101_swsl.h5
ResNeXt101W (32x8d) 89M 224 79.308 resnext101_imagenet.h5
- SWSL 89M 224 84.284 resnext101w_swsl.h5

ResNetQ

Model Params Image resolution Top1 Acc Download
ResNet51Q 35.7M 224 82.36 resnet51q.h5

BotNet

Model Params Image resolution Top1 Acc Download
botnet50 21M 224 77.604 botnet50_imagenet.h5
botnet101 41M 224
botnet152 56M 224

VOLO

Model Params Image resolution Top1 Acc Download
volo_d1 27M 224 84.2 volo_d1_224.h5
volo_d1 ↑384 27M 384 85.2 volo_d1_384.h5
volo_d2 59M 224 85.2 volo_d2_224.h5
volo_d2 ↑384 59M 384 86.0 volo_d2_384.h5
volo_d3 86M 224 85.4 volo_d3_224.h5
volo_d3 ↑448 86M 448 86.3 volo_d3_448.h5
volo_d4 193M 224 85.7 volo_d4_224.h5
volo_d4 ↑448 193M 448 86.8 volo_d4_448.h5
volo_d5 296M 224 86.1 volo_d5_224.h5
volo_d5 ↑448 296M 448 87.0 volo_d5_448.h5
volo_d5 ↑512 296M 512 87.1 volo_d5_512.h5

ResNeSt

Model Params Image resolution Top1 Acc Download
resnest50 28M 224 81.03 resnest50.h5
resnest101 49M 256 82.83 resnest101.h5
resnest200 71M 320 83.84 resnest200.h5
resnest269 111M 416 84.54 resnest269.h5

HaloNet

Model Params Image resolution Top1 Acc
HaloNetH0 6.6M 256 77.9
HaloNetH1 9.1M 256 79.9
HaloNetH2 10.3M 256 80.4
HaloNetH3 12.5M 320 81.9
HaloNetH4 19.5M 384 83.3
- 21k 19.5M 384 85.5
HaloNetH5 31.6M 448 84.0
HaloNetH6 44.3M 512 84.4
HaloNetH7 67.9M 600 84.9

CoTNet

Model Params Image resolution FLOPs (G) Top1 Acc Download
CoTNet-50 22.2M 224 3.3 81.3 cotnet50_224.h5
CoTNeXt-50 30.1M 224 4.3 82.1
SE-CoTNetD-50 23.1M 224 4.1 81.6 se_cotnetd50_224.h5
CoTNet-101 38.3M 224 6.1 82.8 cotnet101_224.h5
CoTNeXt-101 53.4M 224 8.2 83.2
SE-CoTNetD-101 40.9M 224 8.5 83.2 se_cotnetd101_224.h5
SE-CoTNetD-152 55.8M 224 17.0 84.0 se_cotnetd152_224.h5
SE-CoTNetD-152 55.8M 320 26.5 84.6 se_cotnetd152_320.h5

CoAtNet

Model Params Image resolution Top1 Acc
CoAtNet-0 25M 224 81.6
CoAtNet-1 42M 224 83.3
CoAtNet-2 75M 224 84.1
CoAtNet-2, ImageNet-21k pretrain 75M 224 87.1
CoAtNet-3 168M 224 84.5
CoAtNet-3, ImageNet-21k pretrain 168M 224 87.6
CoAtNet-3, ImageNet-21k pretrain 168M 512 87.9
CoAtNet-4, ImageNet-21k pretrain 275M 512 88.1
CoAtNet-4, ImageNet-21K + PT-RA-E150 275M 512 88.56

CMT

Model Params Image resolution Top1 Acc
CMTTiny 9.5M 160 79.2
CMTXS 15.2M 192 81.8
CMTSmall 25.1M 224 83.5
CMTBig 45.7M 256 84.5

CoaT

Model Params Image resolution Top1 Acc Download
CoaTLiteTiny 5.7M 224 77.5 coat_lite_tiny_imagenet.h5
CoaTLiteMini 11M 224 79.1 coat_lite_mini_imagenet.h5
CoaTLiteSmall 20M 224 81.9 coat_lite_small_imagenet.h5
CoaTTiny 5.5M 224 78.3 coat_tiny_imagenet.h5
CoaTMini 10M 224 81.0 coat_mini_imagenet.h5

MLP mixer

Model Params Top1 Acc ImageNet Imagenet21k ImageNet SAM
MLPMixerS32 19.1M 68.70
MLPMixerS16 18.5M 73.83
MLPMixerB32 60.3M 75.53 b32_imagenet_sam.h5
MLPMixerB16 59.9M 80.00 b16_imagenet.h5 b16_imagenet21k.h5 b16_imagenet_sam.h5
MLPMixerL32 206.9M 80.67
MLPMixerL16 208.2M 84.82 l16_imagenet.h5 l16_imagenet21k.h5
- input 448 208.2M 86.78
MLPMixerH14 432.3M 86.32
- input 448 432.3M 87.94

ResMLP

Model Params Image resolution Top1 Acc ImageNet
ResMLP12 15M 224 77.8 resmlp12_imagenet.h5
ResMLP24 30M 224 80.8 resmlp24_imagenet.h5
ResMLP36 116M 224 81.1 resmlp36_imagenet.h5
ResMLP_B24 129M 224 83.6 resmlp_b24_imagenet.h5
- imagenet22k 129M 224 84.4 resmlp_b24_imagenet22k.h5

GMLP

Model Params Image resolution Top1 Acc ImageNet
GMLPTiny16 6M 224 72.3
GMLPS16 20M 224 79.6 gmlp_s16_imagenet.h5
GMLPB16 73M 224 81.6

LeViT

Model Params Image resolution Top1 Acc ImageNet
LeViT128S 7.8M 224 76.6 levit128s_imagenet.h5
LeViT128 9.2M 224 78.6 levit128_imagenet.h5
LeViT192 11M 224 80.0 levit192_imagenet.h5
LeViT256 19M 224 81.6 levit256_imagenet.h5
LeViT384 39M 224 82.6 levit384_imagenet.h5

Other implemented keras models


Comments
  • TPU support for VOLO

    TPU support for VOLO

    While trying VOLO with TPU I'm getting this error; any idea how to resolve this?

    InvalidArgumentError: 9 root error(s) found.
      (0) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5]]
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5/_127]]
      (1) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5]]
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5/_103]]
      (2) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929 ... [truncated]
    
    enhancement 
    opened by awsaf49 14
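    As the error text above suggests, one workaround is to let unsupported ops fall back to the CPU by enabling soft device placement before building the model under the TPU strategy. A minimal sketch (the resolver setup below is generic TPUStrategy boilerplate assumed for illustration, not taken from this issue):

    import tensorflow as tf
    tf.config.set_soft_device_placement(True)  # outside-compile unsupported ops such as ExtractImagePatches on CPU

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
    # Build and compile the VOLO model inside strategy.scope() afterwards; note the potential performance penalty.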
  • Use YoloR with swin transformer as backbone.

    Use YoloR with swin transformer as backbone.

    @leondgarse I am trying to run inference using yolor with a swin backbone, but am getting the following results. What can be the issue?

    from keras_cv_attention_models import efficientnet, yolor
    from keras_cv_attention_models import swin_transformer_v2
    
    bb = swin_transformer_v2.SwinTransformerV2Small_window16(input_shape=(256, 256, 3), num_classes=1000)
    model = yolor.YOLOR(backbone=bb) 
    
    from keras_cv_attention_models import test_images
    imm = test_images.dog_cat()
    preds = model(model.preprocess_input(imm))
    bboxs, labels, confidences = model.decode_predictions(preds)[0]
    
    from keras_cv_attention_models.coco import data
    data.show_image_with_bboxes(imm, bboxs, labels, confidences)
    

    (resulting output image attached)

    opened by farazBhatti 10
  • MobileViT

    MobileViT

    Tried to run the MobileViT_S model with input shape (256, 256, 3) and got the following error:

    UnimplementedError Traceback (most recent call last) in () 2 3 history = model.fit(get_training_dataset_with_oversample(repeat_dataset=True, oversample=True), steps_per_epoch=STEPS_PER_EPOCH, epochs=EPOCHS, ----> 4 validation_data=get_validation_dataset(), validation_steps=VALIDATION_STEPS) 5

    1 frames /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _numpy(self) 1189 return self._numpy_internal() 1190 except core._NotOkStatusException as e: # pylint: disable=protected-access -> 1191 raise core._status_to_exception(e) from None # pylint: disable=protected-access 1192 1193 @property

    UnimplementedError: 9 root error(s) found. (0) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] (1) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/strided_slice_35/_445]] (2) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/strided_slice_23/_381]] (3) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/Pad_8/_407]] (4) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/Maximum_2/y/_341]] (5) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f3 ... [truncated]

    bug good first issue 
    opened by KyloRen1 10
  • [General Questions] Rough estimates for training time for pre-training CoAtNet?

    [General Questions] Rough estimates for training time for pre-training CoAtNet?

    Hi, 👋 Thanks for such an amazing library and taking out the time to implement so many parts of the CoatNet paper!

    In your CoAtNet README, you mentioned you use TPU accelerators. Could you provide a ballpark for the amount of time it took for you to train the biggest models and the corresponding accelerators? I have a task for which I wish to use scaled-up models, but I'd have to pre-train on Imagenet first because of low data amount (<5-10M) and squeeze out maximum accuracy from fine-tuning.

    I assume there might've been a few bottlenecks also, perhaps data? 🤔 If you could describe your setup, it would be very helpful to my experiments!

    Sorry for bothering you with minor questions, and again thank you for all your work!

    opened by neel04 9
  • Visualize saliency map with the attention models

    Visualize saliency map with the attention models

    It would be great if some functional code could be included for plotting attention maps using the attention models. Such functionality has been provided for the vision transformer models at https://github.com/faustomorales/vit-keras. Thanks, and looking forward to it.

    enhancement good first issue 
    opened by sivaramakrishnan-rajaraman 9
  • How to save models ?

    How to save models ?

    @leondgarse I want to save the models in saved_model format. How can I do that? When I attempt it, it shows me the error:

    WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
    

    What can be the solution for this?

    Code:

    import os
    from keras_cv_attention_models import mobilevit
    pretrained = '/content/mobilevit_xxs_imagenet.h5'
    model = mobilevit.MobileViT_XXS(pretrained=pretrained)
    model.save('mobilevit_xxs_imagenet1k')
    
    opened by sayannath 7
  • The order of height and width seems wrong in `tf.meshgrid(range(height), range(width))`

    The order of height and width seems wrong in `tf.meshgrid(range(height), range(width))`

    In Line 44 of beit.py, you use tf.meshgrid(range(height), range(width)), while it should be tf.meshgrid(range(width), range(height)), shouldn't it?

    When I ran the code from Line 44 to Line 52 with height=3 and width=4, it gave the following output:

    [[17 16 15 10  9  8  3  2  1 -4 -5 -6]
     [18 17 16 11 10  9  4  3  2 -3 -4 -5]
     [19 18 17 12 11 10  5  4  3 -2 -3 -4]
     [24 23 22 17 16 15 10  9  8  3  2  1]
     [25 24 23 18 17 16 11 10  9  4  3  2]
     [26 25 24 19 18 17 12 11 10  5  4  3]
     [31 30 29 24 23 22 17 16 15 10  9  8]
     [32 31 30 25 24 23 18 17 16 11 10  9]
     [33 32 31 26 25 24 19 18 17 12 11 10]
     [38 37 36 31 30 29 24 23 22 17 16 15]
     [39 38 37 32 31 30 25 24 23 18 17 16]
     [40 39 38 33 32 31 26 25 24 19 18 17]], shape=(12, 12), dtype=int32)
    

    which seems incorrect.

    Of course, this is not a problem if you assume height==width, but I think tf.meshgrid(range(width), range(height)) is more readable and can potentially prevent bugs if height != width is supported in the future.

    bug enhancement 
    opened by xskxzr 6
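    The indexing behavior in question can be checked directly; a small sketch of the shapes only (not tied to beit.py itself):

    import tensorflow as tf
    height, width = 3, 4
    xx, yy = tf.meshgrid(range(height), range(width))  # default indexing="xy"
    print(xx.shape)  # (4, 3) -- grid laid out as (width, height)
    xx, yy = tf.meshgrid(range(width), range(height))
    print(xx.shape)  # (3, 4) -- grid laid out as (height, width)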
  • Training of YoloXS Model on Coco dataset

    Training of YoloXS Model on Coco dataset

    Hi, I am currently reproducing the COCO training of the YoloXS model with the line below:

    python leondgarse/coco_train_script.py --det_header yolox.YOLOXS --data_name coco/2014 --batch_size 16

    After training for 30 epochs, I am getting poor results, as shown below:

    # Show result
    from keras_cv_attention_models.coco import data
    data.show_image_with_bboxes(imm, bboxs, labels, confidences, num_classes=80)
    

    (attached result image)

    Have I configured anything wrongly? Or is there anything you would suggest I change? Thanks!

    opened by ThePaperFish 5
  • Update for EdgeNeXt

    Update for EdgeNeXt

    I reproduced EdgeNeXt based on torch and your project. Is there any mistake in this code? Why can't it show all layer details? It looks like some layers are missing in `summary()`.

    import tensorflow as tf
    from tensorflow import keras
    from keras_cv_attention_models.common_layers import (
        layer_norm, activation_by_name
    )
    from tensorflow.keras import initializers
    from keras_cv_attention_models.attention_layers import (
        conv2d_no_bias,
        drop_block,
    )
    import math
    
    BATCH_NORM_DECAY = 0.9
    BATCH_NORM_EPSILON = 1e-5
    TF_BATCH_NORM_EPSILON = 0.001
    LAYER_NORM_EPSILON = 1e-5
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class PositionalEncodingFourier(keras.layers.Layer):
        def __init__(self, hidden_dim=32, dim=768, temperature=10000):
            super(PositionalEncodingFourier, self).__init__()
            self.token_projection = tf.keras.layers.Conv2D(dim, kernel_size=1)
            self.scale = 2 * math.pi
            self.temperature = temperature
            self.hidden_dim = hidden_dim
            self.dim = dim
            self.eps = 1e-6
    
        def __call__(self, B, H, W, *args, **kwargs):
            mask_tf = tf.zeros([B, H, W])
            not_mask_tf = 1 - mask_tf
            y_embed_tf = tf.cumsum(not_mask_tf, axis=1)
            x_embed_tf = tf.cumsum(not_mask_tf, axis=2)
            y_embed_tf = y_embed_tf / (y_embed_tf[:, -1:, :] + self.eps) * self.scale  # 2 * math.pi
            x_embed_tf = x_embed_tf / (x_embed_tf[:, :, -1:] + self.eps) * self.scale  # 2 * math.pi
            dim_t_tf = tf.range(self.hidden_dim, dtype=tf.float32)
            dim_t_tf = self.temperature ** (2 * (dim_t_tf // 2) / self.hidden_dim)
            pos_x_tf = x_embed_tf[:, :, :, None] / dim_t_tf
            pos_y_tf = y_embed_tf[:, :, :, None] / dim_t_tf
            pos_x_tf = tf.reshape(tf.stack([tf.math.sin(pos_x_tf[:, :, :, 0::2]),
                                            tf.math.cos(pos_x_tf[:, :, :, 1::2])], axis=4),
                                  shape=[B, H, W, self.hidden_dim])
            pos_y_tf = tf.reshape(tf.stack([tf.math.sin(pos_y_tf[:, :, :, 0::2]),
                                            tf.math.cos(pos_y_tf[:, :, :, 1::2])], axis=4),
                                  shape=[B, H, W, self.hidden_dim])
            pos_tf = tf.concat([pos_y_tf, pos_x_tf], axis=-1)
            pos_tf = self.token_projection(pos_tf)
    
            return pos_tf
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"token_projection": self.token_projection, "scale": self.scale,
                                "temperature": self.temperature, "hidden_dim": self.hidden_dim,
                                "dim": self.dim, "eps": self.eps})
            return base_config
    
    
    def EdgeNeXt(input_shape=(256, 256, 3), depths=[3, 3, 9, 3], dims=[24, 48, 88, 168],
                 global_block=[0, 0, 0, 3], global_block_type=['None', 'None', 'None', 'SDTA'],
                 drop_path_rate=1, layer_scale_init_value=1e-6, head_init_scale=1., expan_ratio=4,
                 kernel_sizes=[7, 7, 7, 7], heads=[8, 8, 8, 8], use_pos_embd_xca=[False, False, False, False],
                 use_pos_embd_global=False, d2_scales=[2, 3, 4, 5], epsilon=1e-6, model_name='EdgeNeXt'):
        inputs = keras.layers.Input(input_shape, batch_size=2)
    
        nn = conv2d_no_bias(inputs, dims[0], kernel_size=4, strides=4, padding="valid", name="stem_")
        nn = layer_norm(nn, epsilon=epsilon, name='stem_')
    
        drop_connect_rates = tf.linspace(0, stop=drop_path_rate, num=int(
            sum(depths)))  # drop_connect_rates_split(num_blocks, start=0.0, end=drop_connect_rate)
        cur = 0
        for i in range(4):
            for j in range(depths[i]):
                if j > depths[i] - global_block[i] - 1:
                    if global_block_type[i] == 'SDTA':
                        SDTA_encoder(dim=dims[i], drop_path=drop_connect_rates[cur + j],
                                     expan_ratio=expan_ratio, scales=d2_scales[i],
                                     use_pos_emb=use_pos_embd_xca[i], num_heads=heads[i], name='stage_'+str(i)+'_SDTA_encoder_'+str(j))(nn)
                    else:
                        raise NotImplementedError
                else:
                    if i != 0 and j == 0:
                        nn = layer_norm(nn, epsilon=epsilon, name='stage_' + str(i) + '_')
                        nn = conv2d_no_bias(nn, dims[i], kernel_size=2, strides=2, padding="valid",
                                            name='stage_' + str(i) + '_')
    
                    Conv_Encoder(dim=dims[i], drop_path=drop_connect_rates[cur + j],
                                 layer_scale_init_value=layer_scale_init_value,
                                 expan_ratio=expan_ratio, kernel_size=kernel_sizes[i], name='stage_'+str(i)+'_Conv_Encoder_'+str(j) + '_')(nn)  # drop_connect_rates[cur + j]
    
        model = keras.models.Model(inputs, nn, name=model_name)
        return model
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class Conv_Encoder(keras.layers.Layer):
        def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6, expan_ratio=4, kernel_size=7, epsilon=1e-6,
                     name=''):
    
            super(Conv_Encoder, self).__init__()
            self.encoder_name = name
            self.gamma = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                     name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.drop_path = drop_path
            self.dim = dim
            self.expan_ratio = expan_ratio
            self.kernel_size = kernel_size
            self.epsilon = epsilon
    
        def __call__(self, x, *args, **kwargs):
            inputs = x
            x = keras.layers.Conv2D(self.dim, kernel_size=self.kernel_size, padding="SAME", name=self.encoder_name +'Conv2D')(x)
            x = layer_norm(x, epsilon=self.epsilon, name=self.encoder_name)
            x = keras.layers.Dense(self.expan_ratio * self.dim)(x)
            x = activation_by_name(x, activation="gelu")
            x = keras.layers.Dense(self.dim)(x)
            if self.gamma is not None:
                x = self.gamma * x
    
            x = inputs + drop_block(x, drop_rate=0.)
    
            return x
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"gamma": self.gamma, "drop_path": self.drop_path,
                                "dim": self.dim, "expan_ratio": self.expan_ratio,
                                "kernel_size": self.kernel_size})
            return base_config
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class SDTA_encoder(keras.layers.Layer):
        def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6, expan_ratio=4,
                     use_pos_emb=True, num_heads=8, qkv_bias=True, attn_drop=0., drop=0., scales=1, zero_gamma=False,
                     activation='gelu', use_bias=False, name='sdf'):
            super(SDTA_encoder, self).__init__()
            self.expan_ratio = expan_ratio
            self.width = max(int(math.ceil(dim / scales)), int(math.floor(dim // scales)))
            self.width_list = [self.width] * (scales - 1)
            self.width_list.append(dim - self.width * (scales - 1))
            self.dim = dim
            self.scales = scales
            if scales == 1:
                self.nums = 1
            else:
                self.nums = scales - 1
            self.pos_embd = None
            if use_pos_emb:
                self.pos_embd = PositionalEncodingFourier(dim=dim)
            self.xca = XCA(dim, num_heads=num_heads, qkv_bias=qkv_bias, attn_drop=attn_drop, proj_drop=drop)
            self.gamma_xca = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                         name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.gamma = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                     name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.drop_rate = drop_path
            self.drop_path = keras.layers.Dropout(drop_path)
            gamma_initializer = tf.zeros_initializer() if zero_gamma else tf.ones_initializer()
            self.norm = keras.layers.LayerNormalization(epsilon=LAYER_NORM_EPSILON, gamma_initializer=gamma_initializer,
                                                        name=name and name + "ln")
            self.norm_xca = keras.layers.LayerNormalization(epsilon=LAYER_NORM_EPSILON, gamma_initializer=gamma_initializer,
                                                            name=name and name + "norm_xca")
            self.activation = activation
            self.use_bias = use_bias
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"width": self.width, "dim": self.dim,
                                "nums": self.nums, "pos_embd": self.pos_embd,
                                "xca": self.xca, "gamma_xca": self.gamma_xca,
                                "gamma": self.gamma, "norm": self.norm,
                                "activation": self.activation, "use_bias": self.use_bias,
                                })
            return base_config
    
        def __call__(self, inputs, *args, **kwargs):
            x = inputs
            spx = tf.split(inputs, self.width_list, axis=-1)
            for i in range(self.nums):
                if i == 0:
                    sp = spx[i]
                else:
                    sp = sp + spx[i]
                sp = keras.layers.Conv2D(self.width, kernel_size=3, padding='SAME')(sp)  # , groups=self.width
                if i == 0:
                    out = sp
                else:
                    out = tf.concat([out, sp], -1)
            inputs = tf.concat([out, spx[self.nums]], -1)
    
            # XCA
            B, H, W, C = inputs.shape
            inputs = tf.reshape(inputs, (-1, H * W, C))  # tf.transpose(), perm=[0, 2, 1])
    
            if self.pos_embd:
                pos_encoding = tf.reshape(self.pos_embd(B, H, W), (-1, H * W, C))
                inputs += pos_encoding
    
            if self.gamma_xca is not None:
                inputs = self.gamma_xca * inputs
            input_xca = self.gamma_xca * self.xca(self.norm_xca(inputs))
            inputs = inputs + drop_block(input_xca, drop_rate=self.drop_rate, name="SDTA_encoder_")
            inputs = tf.reshape(inputs, (-1, H, W, C))
    
            # Inverted Bottleneck
            inputs = self.norm(inputs)
            inputs = keras.layers.Conv2D(self.expan_ratio * self.dim, kernel_size=1, use_bias=self.use_bias)(inputs)
            inputs = activation_by_name(inputs, activation=self.activation)
            inputs = keras.layers.Conv2D(self.dim, kernel_size=1, use_bias=self.use_bias)(inputs)
            if self.gamma is not None:
                inputs = self.gamma * inputs
    
            x = x + self.drop_path(inputs)
            return x
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class XCA(keras.layers.Layer):
        def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., proj_drop=0., name=""):
            super(XCA, self).__init__()
            self.num_heads = num_heads
            self.temperature = tf.Variable(tf.ones(num_heads, 1, 1), trainable=True, name=name + 'gamma')
    
            self.qkv = keras.layers.Dense(dim * 3, use_bias=qkv_bias)
            self.attn_drop = keras.layers.Dropout(attn_drop)
            self.k_ini = initializers.GlorotUniform()
            self.b_ini = initializers.Zeros()
            self.proj = keras.layers.Dense(dim, name="out",
                                           kernel_initializer=self.k_ini, bias_initializer=self.b_ini)
            self.proj_drop = keras.layers.Dropout(proj_drop)
    
        def __call__(self, inputs, training=None, *args, **kwargs):
            input_shape = inputs.shape
            qkv = self.qkv(inputs)
            qkv = tf.reshape(qkv, (input_shape[0], input_shape[1], 3,
                                   self.num_heads,
                                   input_shape[2] // self.num_heads))  # [batch, hh * ww, 3, num_heads, dims_per_head]
            qkv = tf.transpose(qkv, perm=[2, 0, 3, 4, 1])  # [3, batch, num_heads, dims_per_head, hh * ww]
            query, key, value = tf.split(qkv, 3, axis=0)  # [batch, num_heads, dims_per_head, hh * ww]
    
            norm_query, norm_key = tf.nn.l2_normalize(tf.squeeze(query), axis=-1, epsilon=1e-6), \
                                   tf.nn.l2_normalize(tf.squeeze(key), axis=-1, epsilon=1e-6)
            attn = tf.matmul(norm_query, norm_key, transpose_b=True)
            attn = tf.transpose(tf.transpose(attn, perm=[0, 2, 3, 1]) * self.temperature, perm=[0, 3, 2, 1])
    
            attn = tf.nn.softmax(attn, axis=-1)
            attn = self.attn_drop(attn, training=training)  # [batch, num_heads, hh * ww, hh * ww]
    
            x = tf.matmul(attn, value)  # [batch, num_heads, hh * ww, dims_per_head]
            x = tf.reshape(x, [input_shape[0], input_shape[1], input_shape[2]])
    
            x = self.proj(x)
            x = self.proj_drop(x)
    
            return x
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"num_heads": self.num_heads, "temperature": self.temperature,
                                "qkv": self.qkv, "attn_drop": self.attn_drop,
                                "proj": self.proj, "proj_drop": self.proj_drop})
            return base_config
    
    
    def edgenext_xx_small(pretrained=False, **kwargs):
        # 1.33M & 260.58M @ 256 resolution
        # 71.23% Top-1 accuracy
        # No AA, Color Jitter=0.4, No Mixup & Cutmix, DropPath=0.0, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=51.66 versus 47.67 for MobileViT_XXS
        # For A100: FPS @ BS=1: 212.13 & @ BS=256: 7042.06 versus FPS @ BS=1: 96.68 & @ BS=256: 4624.71 for MobileViT_XXS
        model = EdgeNeXt(depths=[2, 2, 6, 2], dims=[24, 48, 88, 168], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         heads=[4, 4, 4, 4],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    def edgenext_x_small(pretrained=False, **kwargs):
        # 2.34M & 538.0M @ 256 resolution
        # 75.00% Top-1 accuracy
        # No AA, No Mixup & Cutmix, DropPath=0.0, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=31.61 versus 28.49 for MobileViT_XS
        # For A100: FPS @ BS=1: 179.55 & @ BS=256: 4404.95 versus FPS @ BS=1: 94.55 & @ BS=256: 2361.53 for MobileViT_XS
        model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[32, 64, 100, 192], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         heads=[4, 4, 4, 4],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    def edgenext_small(pretrained=False, **kwargs):
        # 5.59M & 1260.59M @ 256 resolution
        # 79.43% Top-1 accuracy
        # AA=True, No Mixup & Cutmix, DropPath=0.1, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=20.47 versus 18.86 for MobileViT_S
        # For A100: FPS @ BS=1: 172.33 & @ BS=256: 3010.25 versus FPS @ BS=1: 93.84 & @ BS=256: 1785.92 for MobileViT_S
        model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[48, 96, 160, 304], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    if __name__ == '__main__':
        model = edgenext_small()
        model.summary()
        # from download_and_load import keras_reload_from_torch_model
        # keras_reload_from_torch_model(
        #     'D:\GitHub\EdgeNeXt\edgenext_small.pth',
        #     keras_model=model,
        #     # tail_align_dict=tail_align_dict,
        #     # full_name_align_dict=full_name_align_dict,
        #     # additional_transfer=additional_transfer,
        #     input_shape=(256, 256),
        #     do_convert=True,
        #     save_name="adaface_ir101_webface4m.h5",
        # )
    
    
    
    opened by whalefa1I 5
  • custom layer issue at tflite conversion

    custom layer issue at tflite conversion

    Hi, thanks for the good references.

    I have implemented MobileViT with your package, and tried to convert the trained model into tflite format. There, I met an error saying:

    Unknown layer: Addons>GroupNormalization. Please ensure this object is passed to the `custom_objects` argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details
    

    I tried to add the custom layer name as a parameter of model load, but am still facing the issue.

    model = tf.keras.models.load_model('./checkpoints/model_best.h5', custom_objects={'AttentionLayer': AttentionLayer})
    

    Is there any way to solve this?

    Thanks,

    bug good first issue 
    opened by mhyeonsoo 4
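    For the Addons>GroupNormalization error specifically, the layer usually has to be mapped under its registered name when loading; a sketch assuming the layer comes from tensorflow_addons (the checkpoint path is illustrative):

    import tensorflow as tf
    import tensorflow_addons as tfa
    model = tf.keras.models.load_model(
        './checkpoints/model_best.h5',
        custom_objects={'Addons>GroupNormalization': tfa.layers.GroupNormalization},
    )
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()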
  • coat.CoaTMini(input_shape=(200, 240, 1) ...) error: Dimensions must be equal, but are 730 and 677 for ...

    coat.CoaTMini(input_shape=(200, 240, 1) ...) error: Dimensions must be equal, but are 730 and 677 for ...

    Hi,

    I am trying to train a model with:

        model = coat.CoaTMini(input_shape=(200, 240, 1), num_classes=240, pretrained=None)
    

    but the model cannot be built; it errors out with:

    ValueError: Exception encountered when calling layer "tf.__operators__.add" (type TFOpLambda).
    
    Dimensions must be equal, but are 730 and 677 for '{{node tf.__operators__.add/AddV2}} = AddV2[T=DT_FLOAT](Placeholder, Placeholder_1)' with input shapes: [?,730,216], [?,677,216].
    
    Call arguments received:
      • x=tf.Tensor(shape=(None, 730, 216), dtype=float32)
      • y=tf.Tensor(shape=(None, 677, 216), dtype=float32)
      • name=None
    

    I just wonder, is something wrong with coat?

    Thanks.

    bug enhancement 
    opened by mw66 4
  • Can you provide the code for converting pytorch weights to tf?

    Can you provide the code for converting pytorch weights to tf?

    Hi. Can you provide the code for converting pytorch weights to tf, such as for beit? I wanted to try the effect of beitv2's pre-training weights. Thanks!

    opened by 131404060321 1
  • tflite conversion - GPU/XNNPACK fails

    tflite conversion - GPU/XNNPACK fails

    Hi! Thanks for the great repo! I have converted the EfficientFormer model to tflite. However, applying both the XNNPACK and GPU delegates fails.

    GPU delegate created. INFO: Initialized TensorFlow Lite runtime. INFO: Created TensorFlow Lite delegate for GPU. Failed to apply GPU delegate. Benchmarking failed.

    XNNPACK delegate created. INFO: Initialized TensorFlow Lite runtime. INFO: Created TensorFlow Lite XNNPACK delegate for CPU. Failed to apply XNNPACK delegate. Benchmarking failed.

    Do you know what could be the issue? I'm using the latest TensorFlow version for conversion.

    opened by macsmy 3