Gluon CV Toolkit

Overview


| Installation | Documentation | Tutorials |

GluonCV provides implementations of the state-of-the-art (SOTA) deep learning models in computer vision.

It is designed for engineers, researchers, and students to rapidly prototype products and research ideas based on these models. This toolkit offers five main features (a quick model zoo example follows the list):

  1. Training scripts to reproduce SOTA results reported in research papers
  2. Support for both PyTorch and MXNet
  3. A large number of pre-trained models
  4. Carefully designed APIs that greatly reduce the implementation complexity
  5. Community support
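
For instance, fetching a pre-trained model from the model zoo takes a single call. A minimal sketch using the MXNet backend (the model name and dummy input shape are only illustrative):

import mxnet as mx
from gluoncv import model_zoo

# download a pre-trained ImageNet classifier and run a dummy forward pass
net = model_zoo.get_model('resnet50_v1b', pretrained=True)
scores = net(mx.nd.zeros((1, 3, 224, 224)))
print(scores.shape)  # (1, 1000) class scores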

Demo


Check out the HD video on YouTube or Bilibili.

Supported Applications

| Application | Description | Available Models |
| -- | -- | -- |
| Image Classification | Recognize an object in an image. | 50+ models, including ResNet, MobileNet, DenseNet, VGG, ... |
| Object Detection | Detect multiple objects with their bounding boxes in an image. | Faster RCNN, SSD, Yolo-v3 |
| Semantic Segmentation | Associate each pixel of an image with a categorical label. | FCN, PSP, ICNet, DeepLab-v3, DeepLab-v3+, DANet, FastSCNN |
| Instance Segmentation | Detect objects and associate each pixel inside the object area with an instance label. | Mask RCNN |
| Pose Estimation | Detect human pose from images. | Simple Pose |
| Video Action Recognition | Recognize human actions in a video. | MXNet: TSN, C3D, I3D, I3D_slow, P3D, R3D, R2+1D, Non-local, SlowFast; PyTorch: TSN, I3D, I3D_slow, R2+1D, Non-local, CSN, SlowFast, TPN |
| Depth Prediction | Predict a depth map from images. | Monodepth2 |
| GAN | Generate visually deceptive images. | WGAN, CycleGAN, StyleGAN |
| Person Re-ID | Re-identify pedestrians across scenes. | Market1501 baseline |

Installation

GluonCV is built on top of MXNet and PyTorch. Depending on the individual model implementation (check the model zoo for the complete list), you will need to install one of these deep learning frameworks. Of course, you can always install both for the best coverage.

Please also check the installation guide for comprehensive instructions to help you choose the right installation command for your environment.

Installation (MXNet)

GluonCV supports Python 3.6 or later. The easiest way to install is via pip.

Stable Release

The following commands install the stable version of GluonCV and MXNet:

pip install gluoncv --upgrade
# native
pip install -U --pre mxnet -f https://dist.mxnet.io/python/mkl
# cuda 10.2
pip install -U --pre mxnet -f https://dist.mxnet.io/python/cu102mkl

The latest stable version of GluonCV is 0.8, and we recommend MXNet 1.6.0/1.7.0.

Nightly Release

You can access the latest features and bug fixes with the following commands, which install the nightly builds of GluonCV and MXNet:

pip install gluoncv --pre --upgrade
# native
pip install -U --pre mxnet -f https://dist.mxnet.io/python/mkl
# cuda 10.2
pip install -U --pre mxnet -f https://dist.mxnet.io/python/cu102mkl

There are multiple pre-built MXNet packages available. Please refer to the MXNet packages page if you need more details about MXNet versions.
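
After installation, a quick sanity check that both packages import and report their versions:

python -c "import mxnet as mx, gluoncv as gcv; print(mx.__version__, gcv.__version__)"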

Installation (PyTorch)

GluonCV supports Python 3.6 or later. The easiest way to install is via pip.

Stable Release

The following commands install the stable version of GluonCV and PyTorch:

pip install gluoncv --upgrade
# native
pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
# cuda 10.2
pip install torch==1.6.0 torchvision==0.7.0

There are multiple pre-built PyTorch packages available. Please refer to the PyTorch website if you need other versions.

The latest stable version of GluonCV is 0.8, and we recommend PyTorch 1.6.0.

Nightly Release

You can access the latest features and bug fixes with the following commands, which install the nightly build of GluonCV:

pip install gluoncv --pre --upgrade
# native
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
# cuda 10.2
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html
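
Similarly, a quick sanity check for the PyTorch setup:

python -c "import torch, gluoncv as gcv; print(torch.__version__, gcv.__version__)"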

Docs

GluonCV documentation is available at our website.

Examples

All tutorials are available at our website!

Resources

Check out how to use GluonCV for your own research or projects.

Citation

If you feel our code or models help in your research, kindly cite our papers:

@article{gluoncvnlp2020,
  author  = {Jian Guo and He He and Tong He and Leonard Lausen and Mu Li and Haibin Lin and Xingjian Shi and Chenguang Wang and Junyuan Xie and Sheng Zha and Aston Zhang and Hang Zhang and Zhi Zhang and Zhongyue Zhang and Shuai Zheng and Yi Zhu},
  title   = {GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {23},
  pages   = {1-7},
  url     = {http://jmlr.org/papers/v21/19-429.html}
}

@article{he2018bag,
  title={Bag of Tricks for Image Classification with Convolutional Neural Networks},
  author={He, Tong and Zhang, Zhi and Zhang, Hang and Zhang, Zhongyue and Xie, Junyuan and Li, Mu},
  journal={arXiv preprint arXiv:1812.01187},
  year={2018}
}

@article{zhang2019bag,
  title={Bag of Freebies for Training Object Detection Neural Networks},
  author={Zhang, Zhi and He, Tong and Zhang, Hang and Zhang, Zhongyue and Xie, Junyuan and Li, Mu},
  journal={arXiv preprint arXiv:1902.04103},
  year={2019}
}

@article{zhang2020resnest,
  title={ResNeSt: Split-Attention Networks},
  author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Zhang, Zhi and Lin, Haibin and Sun, Yue and He, Tong and Muller, Jonas and Manmatha, R. and Li, Mu and Smola, Alexander},
  journal={arXiv preprint arXiv:2004.08955},
  year={2020}
}
Comments
  • The training loss is nan in SSD

    The training loss is nan in SSD

    I use the default training script (train_ssd.py) to train SSD300. However, the training loss seems to be large and does not converge. The log file is shown below; what could be the problem? Thanks!

    Namespace(batch_size=32, data_shape=300, dataset='voc', epochs=240, gpus='0', log_interval=100, lr=0.001, lr_decay=0.1, lr_decay_epoch='160,200', momentum=0.9, network='vgg16_atrous', num_workers=4, resume='', save_interval=10, save_prefix='ssd_300_vgg16_atrous_voc', seed=233, start_epoch=0, wd=0.0005)
    Start training from [Epoch 0]
    [Epoch 0][Batch 99], Speed: 31.855627 samples/sec, CrossEntropy=12.250115, SmoothL1=nan
    [Epoch 0][Batch 199], Speed: 31.908488 samples/sec, CrossEntropy=12.212773, SmoothL1=nan
    [Epoch 0][Batch 299], Speed: 31.279352 samples/sec, CrossEntropy=12.199556, SmoothL1=nan
    [Epoch 0][Batch 399], Speed: 30.720933 samples/sec, CrossEntropy=12.192253, SmoothL1=nan
    [Epoch 0][Batch 499], Speed: 31.459032 samples/sec, CrossEntropy=12.187363, SmoothL1=nan
    [Epoch 0] Training cost: 548.135802, CrossEntropy=12.186622, SmoothL1=nan
    [Epoch 0] Validation: 
    aeroplane=0.000389
    bicycle=0.000102
    bird=0.004152
    boat=0.000011
    bottle=0.000183
    bus=0.000084
    car=0.000175
    cat=0.004419
    chair=0.001955
    cow=0.000036
    diningtable=0.000058
    dog=0.001009
    horse=0.000108
    motorbike=0.005600
    person=0.000281
    pottedplant=0.000103
    sheep=0.000045
    sofa=0.000598
    train=0.002716
    tvmonitor=0.000453
    mAP=0.001124
    [Epoch 1][Batch 99], Speed: 31.706873 samples/sec, CrossEntropy=12.165724, SmoothL1=nan
    
    opened by 1292765944 31
  • Quantization : Converting Yolo3 from fp32 to int8 : output always -1

    Quantization : Converting Yolo3 from fp32 to int8 : output always -1

    import mxnet as mx
    import gluoncv as gcv
    from mxnet.contrib.quantization import quantize_model  # ,quantize_net
    import logging

    net = gcv.model_zoo.yolo3_darknet53_voc(pretrained=True, ctx=mx.gpu())
    ctx = [mx.gpu(i) for i in range(mx.context.num_gpus())]
    net.hybridize()
    _ = net(mx.nd.random.randn(1, 3, 608, 608, ctx=mx.gpu()))
    net.export('yolo')

    def save_symbol(fname, sym, logger=None):
        if logger is not None:
            logger.info('Saving symbol into file at %s' % fname)
        sym.save(fname)

    def save_params(fname, arg_params, aux_params, logger=None):
        if logger is not None:
            logger.info('Saving params into file at %s' % fname)
        save_dict = {('arg:%s' % k): v.as_in_context(mx.gpu()) for k, v in arg_params.items()}
        save_dict.update({('aux:%s' % k): v.as_in_context(mx.gpu()) for k, v in aux_params.items()})
        mx.nd.save(fname, save_dict)

    sym, arg_params, aux_params = mx.model.load_checkpoint('yolo', 0)

    qsym, qarg_params, qaux_params = quantize_model(
        sym=sym, arg_params=arg_params, aux_params=aux_params, ctx=mx.gpu(),
        excluded_sym_names=['yolov30_yolooutputv32_conv0_fwd',
                            'yolov30_yolooutputv31_conv0_fwd',
                            'darknetv30_conv0_fwd',
                            'yolov30_yolooutputv30_conv0_fwd'],
        calib_mode=None, quantized_dtype='uint8', logger=logging)
    save_symbol('yolo-int8-symbol.json', qsym)
    save_params('yolo-int8-0000.params', qarg_params, aux_params)

    #####################################################
    # The results are completely different for net(x) and mod.forward(x).
    # Check the output for:
    data1 = mx.nd.random.randn(1, 3, 608, 608, ctx=ctx[0])
    mod = mx.mod.Module.load("yolo-int8", 0)
    mod.bind(for_training=False, data_shapes=[('data', (1, 3, 608, 608))], label_shapes=mod._label_shapes)
    mod.load_params('yolo-int8-0000.params')
    from collections import namedtuple
    Batch = namedtuple('Batch', ['data'])
    mod.forward(Batch([data1]))
    mod.get_outputs()
    data1 = mx.nd.random.randn(1, 3, 608, 608, ctx=ctx[0])
    mod.forward(Batch([data1]))
    mody.forward(Batch([data1]))

    # mody is the same as mod but for the fp32 yolo, which gives the same output as net(x), as expected
    net(data1)[0][0][:5], mod.get_outputs()[0][0][:5], mody.get_outputs()[0][0][:5]

    OUT:
    ( [[19.] [-1.] [-1.] [-1.] [-1.]] <NDArray 5x1 @gpu(0)>,
      [[-1.] [-1.] [-1.] [-1.] [-1.]] <NDArray 5x1 @cpu(0)>,
      [[19.] [-1.] [-1.] [-1.] [-1.]] <NDArray 5x1 @cpu(0)>)


    The quantized model always gives -1s for the classes. Also, is there any source where I can learn how to do this conversion manually? (Tired of waiting for the small number of devs working on this; also, if there is a Slack channel please add me. I work at Amazon and have many issues with mxnet/onnx/quantization/tensorrt/...)

    opened by djaym7 28
  • Where can I find the generation scripts of the quantized SSD models?

    Where can I find the generation scripts of the quantized SSD models?

    I found that gluoncv/model_zoo/quantized/quantized.py only downloads the already-quantized SSD models, like ssd_512_mobilenet1.0_voc_int8-symbol.json or ssd_300_vgg16_atrous_voc_int8-symbol.json. Now I want to quantize my own SSD model implemented with GluonCV, and I am not sure what to put in excluded_sym_names. Any advice?

    opened by TriLoo 28
  • Please make sure source and target networks have the same prefix.

    Please make sure source and target networks have the same prefix.

    I trained an object detection model with a resnet50 base network on AWS, and I'm trying to load it in GluonCV for real-time object detection, using the code from this tutorial: https://gluon-cv.mxnet.io/build/examples_detection/demo_webcam.html

    I'm getting an issue where the prefixes aren't the same. Can someone point me in the right direction so I can run this model?

    Here is my code where I load the model:

    net = gcv.model_zoo.get_model('ssd_512_resnet50_v1_custom', classes=my_classes, pretrained=True)
    net.load_parameters('model_algo_1-0000.params')
    

    Here is the output after running my program:

    AssertionError: Parameter 'resnetv10_conv0_weight' is missing in file 'model_algo_1-0000.params', which contains parameters: 'ssd0_stage1_unit3_bn2_moving_var', 'ssd0_stage1_unit1_bn2_gamma', 'ssd0_stage4_unit3_bn3_gamma', ..., 'ssd0_stage2_unit4_conv3_weight', 'ssd0_conv0_weight', 'ssd0_stage1_unit3_bn3_moving_mean', 'ssd0_stage2_unit3_bn2_gamma'.

    Please make sure source and target networks have the same prefix.

    It looks like I have the right parameter but it just has ssd0 as the prefix instead of resnetv10.

    resnetv10_conv0_weight ssd0_conv0_weight

    Any idea how I can fix this?

    Stale 
    opened by kunal732 26
  • Feature extraction using I3D

    Feature extraction using I3D

    I followed the same steps as the feature extraction tutorial using I3D; however, when I print the shape of the npy array I get, the shape is [1, 2048]. My guess is that this is what we get after flattening. The documentation (gluoncv.model_zoo.action_recognition.i3d_resnet) says that feat_ext : bool specifies whether to extract features before the classification layer or do a complete forward pass. I tried setting feat_ext to both True and False, and I still get the same [1, 2048] shape. I want to generate the features without the flattening. Any insights on how to achieve that? Thank you.

    opened by ysminabk 24
  • How to prepare custom datasets for object detection on GluonCV and MXNet?

    How to prepare custom datasets for object detection on GluonCV and MXNet?

    I would like to know how to prepare custom datasets for object detection, for fine-tuning a pretrained detection model, with reference to this page: "https://gluon-cv.mxnet.io/build/examples_datasets/detection_custom.html#derive-from-pascal-voc-format". But it seems difficult, so I need some help.

    Stale 
    opened by hakS07 23
  • Add quantized Fully Convolutional Network model

    Add quantized Fully Convolutional Network model

    @xinyu-intel @pengzhao-intel

    This PR adds quantized FCN models to the GluonCV model zoo. With this PR, the INT8 FCN model achieves more than ~4x speedup on AWS C5.12xlarge.

    opened by wuxun-zhang 22
  • CUDA 8.0 can't train YOLOv3, loss: nan

    CUDA 8.0 can't train YOLOv3, loss: nan

    When I use CUDA 8.0 and run the yolov3 script with 2 GPUs, changing only the batch-size to 32, I get nan loss:

    INFO:root:[Epoch 0][Batch 99], LR: 5.99E-05, Speed: 31.597 samples/sec, ObjLoss=nan, BoxCenterLoss=nan, BoxScaleLoss=nan, ClassLoss=nan
    INFO:root:[Epoch 0][Batch 199], LR: 1.20E-04, Speed: 32.253 samples/sec, ObjLoss=nan, BoxCenterLoss=nan, BoxScaleLoss=nan, ClassLoss=nan
    INFO:root:[Epoch 0][Batch 299], LR: 1.81E-04, Speed: 31.947 samples/sec, ObjLoss=nan, BoxCenterLoss=nan, BoxScaleLoss=nan, ClassLoss=nan
    

    When I comment out net.hybridize() in train() and validate() as mentioned here, I can run it with a proper loss, but at the cost of training speed.

    Besides, if I use batch-size=4, the loss won't become nan with net.hybridize(), so I guess it is not the smaller batch size causing the nan.

    CUDA 9.0 with batch-size=32 is also OK.

    opened by ShoufaChen 20
  • Mask RCNN C++ Deployment Not Working in GPU Mode

    Mask RCNN C++ Deployment Not Working in GPU Mode

    Hi,

    Thanks for the great work! I am very interested in C++ deployment, so I tried it for object detection and then used Mask RCNN to see if it works.

    Setting: MXNet 1.3, GPU: 1080Ti, system: Ubuntu 16.04, GluonCV: master branch, image: dog.jpg from the object detection tutorial, code: cpp-inference, built binary gluoncv-detect.

    The faster-rcnn, ssd and yolov3 models work well in both CPU and GPU mode, with nice standard speed. The first frame is slow because of GPU warm-up, so the speed printed on screen doesn't matter.

    ./gluoncv-detect ../../export/faster_rcnn_resnet50_v1b_voc dog.jpg --gpu 0
    [11:03:33] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:98: Using Pascal VOC names...
    [11:03:33] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:120: Using GPU(0)...
    [11:03:35] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:134: Image shape: 640 x 480
    [11:03:41] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:155: Elapsed time {Forward->Result}: 5138.46 ms
    [11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:177: Start Ploting with visualize score threshold: 0.3
    [11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.951641
    [11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: car, scores: 0.999761
    [11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: dog, scores: 0.999168
    [11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: motorbike, scores: 0.301618
    [11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: pottedplant, scores: 0.373259
    [11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: pottedplant, scores: 0.370312
    
    ./gluoncv-detect ../../export/ssd_512_resnet50_v1_voc dog.jpg --gpu 0
    [10:26:46] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:98: Using Pascal VOC names...
    [10:26:46] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:120: Using GPU(0)...
    [10:26:49] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:134: Image shape: 640 x 480
    [10:26:54] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:109: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
    [10:27:07] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:154: Elapsed time {Forward->Result}: 17022.7 ms
    [10:27:07] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:177: Start Ploting with visualize score threshold: 0.3
    [10:27:07] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: car, scores: 0.998562
    [10:27:07] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: dog, scores: 0.990034
    [10:27:07] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.984405
    
    ./gluoncv-detect ../../export/yolo3_darknet53_voc dog.jpg 
    [10:21:31] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:98: Using Pascal VOC names...
    [10:21:31] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:134: Image shape: 640 x 480
    [10:21:36] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:154: Elapsed time {Forward->Result}: 4435.19 ms
    [10:21:36] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:177: Start Ploting with visualize score threshold: 0.3
    [10:21:36] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: car, scores: 0.996979
    [10:21:36] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: dog, scores: 0.995044
    [10:21:36] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.991447
    

    When I try maskrcnn, in CPU mode it works well, though slow.

    ./gluoncv-detect ../../export/mask_rcnn_resnet50_v1b_coco dog.jpg 
    [10:22:26] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:103: Using COCO names...
    [10:22:27] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:134: Image shape: 640 x 480
    
    [10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:154: Elapsed time {Forward->Result}: 86116.8 ms
    [10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:177: Start Ploting with visualize score threshold: 0.3
    [10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.9996
    [10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: dog, scores: 0.997231
    [10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: cat, scores: 0.780812
    [10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: car, scores: 0.739689
    [10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.678044
    [10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.535232
    [10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: truck, scores: 0.41138
    [10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: potted plant, scores: 0.360993
    

    While in GPU mode, it failed somehow:

    ./gluoncv-detect ../../export/mask_rcnn_resnet50_v1b_coco dog.jpg --gpu 0
    [10:24:58] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:103: Using COCO names...
    [10:24:58] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:120: Using GPU(0)...
    [10:25:02] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:134: Image shape: 640 x 480
    [10:25:06] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:109: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
    [10:25:20] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:154: Elapsed time {Forward->Result}: 18267.5 ms
    terminate called after throwing an instance of 'dmlc::Error'
      what():  [10:25:20] /home/chongzhao/mxnet-1.3/cpp-package/include/mxnet-cpp/ndarray.hpp:236: Check failed: MXNDArrayWaitToRead(blob_ptr_->handle_) == 0 (-1 vs. 0) 
    
    Stack trace returned 8 entries:
    [bt] (0) ./gluoncv-detect(dmlc::StackTrace[abi:cxx11]()+0x54) [0x4af2cc]
    [bt] (1) ./gluoncv-detect(dmlc::LogMessageFatal::~LogMessageFatal()+0x2a) [0x4af598]
    [bt] (2) ./gluoncv-detect(mxnet::cpp::NDArray::WaitToRead() const+0xca) [0x4b3c96]
    [bt] (3) ./gluoncv-detect(viz::PlotBbox(cv::Mat, mxnet::cpp::NDArray, mxnet::cpp::NDArray, mxnet::cpp::NDArray, float, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::map<int, cv::Scalar_<double>, std::less<int>, std::allocator<std::pair<int const, cv::Scalar_<double> > > >, bool)+0x124) [0x4bbb38]
    [bt] (4) ./gluoncv-detect(RunDemo()+0x693) [0x4ab38b]
    [bt] (5) ./gluoncv-detect(main+0x25) [0x4ab82d]
    [bt] (6) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbeeddf6830]
    [bt] (7) ./gluoncv-detect(_start+0x29) [0x4a8819]
    
    
    Aborted (core dumped)
    

    Is there anything I can work on to solve the problem? Or any suggestions? Thanks a lot!

    Stale 
    opened by kuonangzhe 20
  • RCNN w/ FPN, Sync BN & Disable static alloc for RCNN

    RCNN w/ FPN, Sync BN & Disable static alloc for RCNN

    • Faster RCNN w/ FPN (refactored from @Angzz); model zoo not uploaded yet
      • e2e_fasterrcnn_fpn_resnet50_v1b_2x_coco17 mAP: 0.384 | caffe baseline: 0.379
      • e2e_fasterrcnn_fpn_bn_resnet50_v1b_2x_coco17 mAP: 0.393 | caffe baseline: N/A
      • e2e_fasterrcnn_fpn_resnet101_v1d_2x_coco17 mAP: 0.412 caffe baseline: 0.398
    • Mask RCNN w/ FPN; model zoo not uploaded yet
      • e2e_maskrcnn_fpn_resnet50_v1b_2x_coco17 boxAP: 0.392, segmAP: 0.353 | caffe boxAP baseline: 0.386 caffe segmAP baseline: 0.345
      • e2e_maskrcnn_fpn_resnet101_v1d_2x_coco17 boxAP: 0.423, segmAP: 0.377 | caffe boxAP baseline: 0.409 caffe segmAP baseline: 0.364
    • disabled static alloc for RCNN to save memory
    • added mask rcnn resnet 101 v1b
    opened by Jerryzcn 19
  • [WIP]A decomposition of lr scheduler

    [WIP]A decomposition of lr scheduler

    Following the design of the data transforms in the data loader, I'm decomposing the learning rate scheduler into basic blocks (constant, linear, cosine, poly, and possibly more).

    To construct a learning rate scheduler with multiple stages, one can Compose the blocks together. This offers flexibility in the learning rate schedule; for instance, we can now design a schedule like the following (see the sketch after the list):

    1. warmup for 5 epochs
    2. cosine schedule for 100 epochs
    3. constant schedule for 5 epochs
    4. smaller constant schedule for 5 epochs
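
    A minimal sketch of such a composed schedule, assuming the LRScheduler/LRSequential interface that later shipped in gluoncv.utils (the base learning rates and iters_per_epoch below are illustrative):

    from gluoncv.utils import LRScheduler, LRSequential

    # warmup -> cosine -> constant -> smaller constant, matching the schedule above
    schedule = LRSequential([
        LRScheduler('linear', base_lr=0.0, target_lr=0.1, nepochs=5, iters_per_epoch=500),
        LRScheduler('cosine', base_lr=0.1, target_lr=0.001, nepochs=100, iters_per_epoch=500),
        LRScheduler('constant', base_lr=0.001, target_lr=0.001, nepochs=5, iters_per_epoch=500),
        LRScheduler('constant', base_lr=0.0001, target_lr=0.0001, nepochs=5, iters_per_epoch=500),
    ])
    # the composed schedule is then passed to gluon.Trainer via the lr_scheduler option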

    TODO:

    • [x] Test with image classification training
    • [x] Modify affected detection/segmentation scripts
    • [x] Unit tests.
    opened by hetong007 19
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.
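
    For reference, the check described above amounts to something like the following (a minimal sketch of the approach, not the exact patch submitted in the pull request):

    import os
    import tarfile

    def is_within_directory(directory, target):
        # True only if `target` resolves to a path inside `directory`
        abs_directory = os.path.abspath(directory)
        abs_target = os.path.abspath(target)
        return os.path.commonprefix([abs_directory, abs_target]) == abs_directory

    def safe_extractall(tar: tarfile.TarFile, path: str = ".") -> None:
        # refuse to extract archives whose members escape the target directory
        for member in tar.getmembers():
            member_path = os.path.join(path, member.name)
            if not is_within_directory(path, member_path):
                raise RuntimeError("Attempted path traversal in tar file")
        tar.extractall(path)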

    opened by TrellixVulnTeam 1
  • IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

    IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

    Hello! I used train_ssd.py before with one class and everything worked fine. Now I have a two-class dataset and modified train_ssd.py this way:

    1. Added class
    class VOCLike(VOCDetection):
        CLASSES = ['ambr_box', 'gorch_box']
        def __init__(self, root, splits, transform=None, index_map=None, preload_label=True):
            super(VOCLike, self).__init__(root, splits, transform, index_map, preload_label)
    
    2. Modified get_dataset:
    if dataset.lower() == 'voc':
            train_dataset = VOCLike(root='/data_voc', splits=((2018, 'trainval'),))
            val_dataset = VOCLike(root='/data_voc', splits=((2018, 'test'),))
            val_metric = VOC07MApMetric(iou_thresh=0.5, class_names=['ambr_box', 'gorch_box'])
    
    3. Modified __name__ == '__main__':
        net_name = '_'.join(('ssd', str(args.data_shape), args.network, 'custom'))
        args.save_prefix += net_name
        if args.syncbn and len(ctx) > 1:
            net = get_model(net_name, pretrained_base=True, norm_layer=gluon.contrib.nn.SyncBatchNorm,
                            norm_kwargs={'num_devices': len(ctx)}, classes = ['ambr_box', 'gorch_box'])
            async_net = get_model(net_name, pretrained_base=False)  # used by cpu worker
        else:
            net = get_model(net_name, pretrained_base=True, norm_layer=gluon.nn.BatchNorm, classes = ['ambr_box', 'gorch_box'])
            async_net = net
    

    but I still got the error below. Please help.

    Traceback (most recent call last):
      File "train_ssd.py", line 425, in <module>
        train(net, train_data, val_data, eval_metric, ctx, args)
      File "train_ssd.py", line 310, in train
        for i, batch in enumerate(train_data):
      File "/usr/local/lib/python3.8/dist-packages/mxnet/gluon/data/dataloader.py", line 689, in __iter__
        for item in t:
      File "/usr/local/lib/python3.8/dist-packages/mxnet/gluon/data/dataloader.py", line 699, in same_process_iter
        ret = self._batchify_fn([self._dataset[idx] for idx in batch])
      File "/usr/local/lib/python3.8/dist-packages/mxnet/gluon/data/dataloader.py", line 699, in <listcomp>
        ret = self._batchify_fn([self._dataset[idx] for idx in batch])
      File "/usr/local/lib/python3.8/dist-packages/mxnet/gluon/data/dataset.py", line 219, in __getitem__
        return self._fn(*item)
      File "/usr/local/lib/python3.8/dist-packages/gluoncv/data/transforms/presets/ssd.py", line 167, in __call__
        bbox = tbbox.translate(label, x_offset=expand[0], y_offset=expand[1])
      File "/usr/local/lib/python3.8/dist-packages/gluoncv/data/transforms/bbox.py", line 160, in translate
        bbox[:, :2] += (x_offset, y_offset)
    IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

    opened by Mashaakim 0
  • CVE-2007-4559 Tar Vulnerability

    CVE-2007-4559 Tar Vulnerability

    Ref: https://github.com/advisories/GHSA-gw9q-c7gh-j9vm Scope: the tar module used in GluonCV and its extraction method contain a vulnerability that enables remote code execution. https://github.com/dmlc/gluon-cv/blob/40a216547ec5322c851b1f8ce2a4dd7d4b7a6004/gluoncv/auto/data/data_zoo.py#L71

    The scope and patch of this bug were discussed in an open-source project that uses GluonCV by Trellix Vulnerability Team. Link

    The patch adds a sanity check before extracting the Tar file to ensure that all files belong to the tar itself.

    Quote:

    Patching https://github.com/advisories/GHSA-gw9q-c7gh-j9vm Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named https://github.com/advisories/GHSA-gw9q-c7gh-j9vm. https://github.com/advisories/GHSA-gw9q-c7gh-j9vm is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against https://github.com/advisories/GHSA-gw9q-c7gh-j9vm. Further technical information about the vulnerability can be found in this blog. If you have further questions you may contact us through this project's lead researcher Kasimir Schulz.

    Patch code: https://github.com/BMW-InnovationLab/BMW-Semantic-Segmentation-Training-GUI/pull/1/commits/256bb8875e40f661ef8dcce82b86cf55e555b69f

    opened by hadikoub 0
  • MXnet feature extraction error

    MXnet feature extraction error

    Hi, I run this script: python feat_extract.py --data-list video.txt --model i3d_resnet50_v1_kinetics400 --save-dir ./features

    but unfortunately, the error appears:

    INFO:logger:Namespace(data_aug='v1', data_dir='', data_list='video.txt', dtype='float32', fast_temporal_stride=2, gpu_id=0, hashtag='', input_size=224, log_interval=10, mode=None, model='i3d_resnet50_v1_kinetics400', need_root=False, new_height=256, new_length=32, new_step=1, new_width=340, num_classes=400, num_crop=1, num_segments=1, resume_params='', save_dir='./features', slow_temporal_stride=16, slowfast=False, ten_crop=False, three_crop=False, use_decord=True, use_pretrained=True, video_loader=True)
    Traceback (most recent call last):
      File "feat_extract.py", line 215, in <module>
        main(logger)
      File "feat_extract.py", line 156, in main
        feat_ext=True, num_segments=opt.num_segments, num_crop=opt.num_crop)
      File "/home/tekom/anaconda3/envs/i3d_extractor/lib/python3.7/site-packages/gluoncv/model_zoo/model_zoo.py", line 409, in get_model
        net = _models[name](**kwargs)
      File "/home/tekom/anaconda3/envs/i3d_extractor/lib/python3.7/site-packages/gluoncv/model_zoo/action_recognition/i3d_resnet.py", line 673, in i3d_resnet50_v1_kinetics400
        **kwargs)
      File "/home/tekom/anaconda3/envs/i3d_extractor/lib/python3.7/site-packages/gluoncv/model_zoo/action_recognition/i3d_resnet.py", line 537, in __init__
        self.init_weights(ctx)
      File "/home/tekom/anaconda3/envs/i3d_extractor/lib/python3.7/site-packages/gluoncv/model_zoo/action_recognition/i3d_resnet.py", line 542, in init_weights
        self.first_stage.initialize(ctx=ctx)
      File "/home/tekom/anaconda3/envs/i3d_extractor/lib/python3.7/site-packages/mxnet/gluon/block.py", line 656, in initialize
        self.collect_params().initialize(init, ctx, verbose, force_reinit)
      File "/home/tekom/anaconda3/envs/i3d_extractor/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 896, in initialize
        v.initialize(None, ctx, init, force_reinit=force_reinit)
      File "/home/tekom/anaconda3/envs/i3d_extractor/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 472, in initialize
        self._finish_deferred_init()
      File "/home/tekom/anaconda3/envs/i3d_extractor/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 349, in _finish_deferred_init
        data = zeros_fn(**kwargs)
      File "/home/tekom/anaconda3/envs/i3d_extractor/lib/python3.7/site-packages/mxnet/ndarray/utils.py", line 67, in zeros
        return _zeros_ndarray(shape, ctx, dtype, **kwargs)
      File "/home/tekom/anaconda3/envs/i3d_extractor/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 4752, in zeros
        return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
      File "<string>", line 39, in _zeros
      File "mxnet/cython/ndarray.pyx", line 162, in mxnet._cy3.ndarray._imperative_invoke
    TypeError: _imperative_invoke() takes exactly 5 positional arguments (7 given)
    

    Any help would be appreciated. Thanks.

    opened by kevinjcai 0
  • Minor docstring and variable rename for mask.fill()

    Minor docstring and variable rename for mask.fill()

    After change 5bc2c075733e672a9a7c5e14bc474788f88e90de, mask.fill() now takes a list of masks, but still uses a single bounding box. This fixes inconsistencies in the docstring and variable naming.

    opened by justinokamoto 0
Releases (v0.10.0)
  • v0.10.0(Mar 9, 2021)

    Highlights

    The GluonCV 0.10.0 release features a new Auto Module designed to bootstrap training tasks with less code and effort:

    • a simpler and better custom dataset loading experience, with pandas DataFrame visualization. Compared with the old code-based dataset composition, it lets you load arbitrary datasets faster and more reliably.

    • a one-liner fit function with configuration file support (yaml)

    • built-in HPO support, for effortless tuning of hyper-parameters

    gluoncv.auto

    This release includes a new module called gluoncv.auto. With gluoncv.auto you can access many high-level APIs, such as data, estimators and tasks.

    gluoncv.auto.data

    The auto.data module is designed to load arbitrary web datasets you find on the internet, such as Kaggle competition datasets. You may refer to this tutorial or check out the fully compatible d8 dataset for loading custom datasets.

    Loading data: the dataset has internal DataFrame storage for easier access and analysis; visualization works the same way for image classification and for object detection datasets (a loading sketch follows below).
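
    A minimal sketch of loading a web dataset into the DataFrame-backed format, assuming a from_folders-style loader as described in the linked tutorial (the URL and the exact loader name are illustrative assumptions, not verified API):

    from gluoncv.auto.tasks import ImageClassification

    # hypothetical archive URL; the loader splits it into train/val/test DataFrames
    train, val, test = ImageClassification.Dataset.from_folders(
        'https://example.com/my-classification-dataset.zip')
    print(train.head())  # image paths and labels stored as DataFrame columns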

    gluoncv.auto.estimators

    In this release, we packed the following high-level estimators for training and prediction in image classification and object detection.

    • gluoncv.auto.estimators.ImageClassificationEstimator
    • gluoncv.auto.estimators.SSDEstimator
    • gluoncv.auto.estimators.CenterNetEstimator
    • gluoncv.auto.estimators.FasterRCNNEstimator
    • gluoncv.auto.estimators.YOLOv3Estimator

    Highlighted usages

    • fit function (one-liner training; see the sketch below)
    • predict, predict_proba (for image classification), predict_feature (for image classification)
    • save and load
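
    A minimal sketch of the intended workflow, assuming SSDEstimator and a dataset already loaded through gluoncv.auto.data (the config dict keys and file names are illustrative rather than the exact API surface):

    from gluoncv.auto.estimators import SSDEstimator

    est = SSDEstimator({'train': {'epochs': 5}})   # hypothetical config; yaml files are also supported
    est.fit(train_data)                            # one-liner training on a DataFrame-backed dataset
    pred = est.predict('street.jpg')               # predicted classes, scores and bounding boxes
    est.save('ssd.pkl')                            # persist and later restore the estimator
    est2 = SSDEstimator.load('ssd.pkl')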

    You may visit the tutorial website for more detailed examples.

    gluoncv.auto.tasks

    In this release, the following auto tasks are supported and have been massively tested on many datasets to ensure HPO performance:

    • gluoncv.auto.tasks.ImageClassification
    • gluoncv.auto.tasks.ObjectDetection

    Compared with the pure algorithm-based estimators, the auto tasks provide identical APIs and functionality, but allow you to fit with hyper-parameter optimization (HPO) under a specified num_trials and time_limit. For object detection, they allow multiple algorithms (e.g., SSDEstimator and FasterRCNNEstimator) to be tuned as a categorical search space.

    The tutorial is available here

    Bug fixes and improvements

    • Improved training speed for mask-rcnn script (#1595, #1609)
    • Fix an issue in classification dataset (#1599)
    • Fix a batch-size issue for mask-rcnn validation during training (#1594)
    • Fix an os directory issue for model zoo folder (#1591)
    • Improved CI stability (#1581)
    Source code(tar.gz)
    Source code(zip)
  • v0.9.0(Dec 2, 2020)

    Highlights

    GluonCV v0.9.0 starts to support PyTorch!

    PyTorch Support

    We want to make our toolkit agnostic to deep learning frameworks so that it is available for everyone. From this release, we start to support PyTorch. All PyTorch code and models are under the torch folder inside gluoncv, arranged in the same hierarchy as before: model, data, nn and utils. The model folder contains our model zoo with model definitions, the data folder contains dataset definitions and dataloaders, nn defines new operators, and utils provides utility functions to help with model training, evaluation and visualization.

    To get started, you can find installation instructions, the model zoo and tutorials on our website. In order to make our toolkit easier to use and customize, we provide model definitions separately for each method, without extreme abstraction and modularization. In this manner, you can play with each model without jumping across multiple files, and you can modify an individual model implementation without affecting other models. At the same time, we adopt yaml for easier configuration (a short loading sketch follows below). We strive to make our toolkit more user friendly for students and researchers.
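
    A minimal sketch of the config-driven entry points, following the pattern used in the PyTorch tutorials (the yaml file path is an assumed local copy of a model-zoo config; exact fields depend on the chosen model):

    import torch
    from gluoncv.torch.engine.config import get_cfg_defaults
    from gluoncv.torch.model_zoo import get_model

    cfg = get_cfg_defaults()                                  # default configuration tree
    cfg.merge_from_file('i3d_resnet50_v1_kinetics400.yaml')   # assumed local yaml for the chosen model
    model = get_model(cfg)
    model.eval()
    clip = torch.zeros(1, 3, 32, 224, 224)                    # dummy clip: batch x channel x time x H x W
    with torch.no_grad():
        logits = model(clip)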

    Video Action Recognition PyTorch Model Zoo

    We have 46 PyTorch models for video action recognition, with better I3D models, the more recent TPN family, faster training (DDP support and multi-grid) and K700 pretrained weights. Finetuning and feature extraction have never been easier.

    Details of our model zoo can be seen here. In terms of models, we cover TSN, I3D, I3D_slow, R2+1D, Non-local, CSN, SlowFast and TPN. In terms of datasets, we cover Kinetics400, Kinetics700 and Something-something-v2. All of our models have similar or better performance compared to the numbers reported in the original papers.

    We provide several tutorials to get you started, including how to make predictions using a pretrained model, how to extract video features from a pretrained model, how to finetune a model on your dataset, how to measure a model's flops/speed, and how to use our DDP framework.

    Since video models are slow to train (due to slow IO and large model size), we also support distributed data parallel (DDP) training and multi-grid training. DDP can provide a 2x speedup and multi-grid training can provide a 3-4x speedup. Combining these two techniques can significantly shorten the training process. In addition, both techniques are provided as helper functions. You can easily add your model definitions to GluonCV (a single python file like this) and enjoy the speedup brought by our framework. More details can be found in this tutorial.

    Bug fixes and Improvements

    • Refactored table in csv form. (#1465 )
    • Added DeepLab ResNeSt200 pretrained weights (#1456 )
    • StyleGAN training instructions (#1446 )
    • More settings for Monodepth2 and bug fix (#1459 #1472 )
    • Fix RCNN target generator (#1508)
    • Revise DANet (#1507 )
    • A new Docker image, ready for GluonCV applications and development, has been added (#1474)

    Acknowledgement

    Special thanks to @Arthurlxy @ECHO960 @zhreshold @yinweisu for their support in this release. Thanks to @coocoo90 for contributing the CSN and R2+1D models. And thanks to other contributors for the bug fixes and improvements.

    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Aug 10, 2020)

    GluonCV 0.8.0 Release Note

    Highlights

    GluonCV v0.8.0 features the popular depth estimation model Monodepth2, semantic segmentation models (DANet and FastSCNN), StyleGAN, and multiple usability improvements.

    Monodepth2 (thanks @KuangHaofei )

    We provide a GluonCV implementation of Monodepth2, and the results are fully reproducible. To try it out on your own images, please see our demo tutorial. To train a Monodepth2 model on your own dataset, please see our dive deep tutorial.

    The following table shows its performance on the KITTI dataset.

    | Name | Modality | Resolution | Abs. Rel. Error | delta < 1.25 | Hashtag |
    | -- | -- | -- | -- | -- | -- |
    | monodepth2_resnet18_kitti_stereo_640x192 | Stereo | 640x192 | 0.114 | 0.856 | 92871317 |

    More Semantic Segmentation Models (thanks @xdeng7 and @ytian8 )

    We include two new semantic segmentation models in this release: DANet and FastSCNN.

    The following table shows their performance on the Cityscapes validation set.

    | Model | Pre-Trained Dataset | Dataset | pixAcc | mIoU |
    | -- | -- | -- | -- | -- |
    | danet_resnet50_citys | ImageNet | Cityscapes | 96.3 | 78.5 |
    | danet_resnet101_citys | ImageNet | Cityscapes | 96.5 | 80.1 |
    | fastscnn_citys | - | Cityscapes | 95.1 | 72.3 |

    Our FastSCNN is an improved version based on a recent paper using semi-supervised learning. To the best of our knowledge, 72.3 mIoU is the highest reported score for a FastSCNN implementation, and it is one of the best real-time semantic segmentation models. A short inference sketch follows below.
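
    A minimal inference sketch for one of these pretrained models, following the usual segmentation demo pattern from the model zoo (the input file name is illustrative, and output handling may differ slightly per model):

    import mxnet as mx
    from gluoncv import model_zoo
    from gluoncv.data.transforms.presets.segmentation import test_transform

    net = model_zoo.get_model('danet_resnet50_citys', pretrained=True)
    img = test_transform(mx.image.imread('street.png'), ctx=mx.cpu())
    output = net.predict(img)                  # per-class score maps
    mask = mx.nd.argmax(output, 1).asnumpy()   # per-pixel class labels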

    StyleGAN (thanks @xdeng7 )

    A GluonCV implementation of StyleGAN "A Style-Based Generator Architecture for Generative Adversarial Networks": https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/stylegan

    Bug fixes and Improvements

    • We have now officially deprecated Python 2 support; the minimum required Python 3 version is 3.6 (#1399)
    • Fixed Faster-RCNN training script (#1249)
    • Allow SRGAN to be hybridized (#1281)
    • Fix market1501 dataset (#1227)
    • Added Visdrone dataset (#1267)
    • Improved video action recognition task's train.py (#1339)
    • Added jetson object detection tutorial (#1346)
    • Improved guide for contributing new algorithms to GluonCV (#1354)
    • Fixed the amp parameter required by class ForwardBackwardTask (#1404)
    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Apr 22, 2020)

    Highlights

    GluonCV 0.7 added our latest backbone network: ResNeSt, and the derived models for semantic segmentation and object detection. We achieve significant performance improvement on all three tasks.

    Image Classification

    GluonCV now provides state-of-the-art image classification backbones that can be used by various downstream tasks. Our ResNeSt outperforms EfficientNet in the accuracy-speed trade-off, as shown in the following figures. You can now swap in our new ResNeSt in your research or product to get an immediate performance improvement. Check out the details in our paper: ResNeSt: Split-Attention Networks

    Here is a comparison between ResNeSt and EfficientNet. The average latency is computed using a single V100 on a p3dn.24xlarge machine with a batch size of 16.

    (figure: accuracy vs. average latency for ResNeSt and EfficientNet models)

    | Model | input size | top-1 acc (%) | avg latency (ms) | version |
    | -- | -- | -- | -- | -- |
    | SENet_154 | 224x224 | 81.26 | 5.07 | previous |
    | ResNeSt50 | 224x224 | 81.13 | 1.78 | v0.7 |
    | ResNeSt101 | 256x256 | 82.83 | 3.43 | v0.7 |
    | ResNeSt200 | 320x320 | 83.90 | 9.49 | v0.7 |
    | ResNeSt269 | 416x416 | 84.54 | 19.50 | v0.7 |
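
    Swapping in the new backbone is a one-line change against the model zoo (a minimal sketch; 'resnest50' is the classification model name added in this release):

    import mxnet as mx
    from gluoncv import model_zoo

    # drop-in replacement for an existing classification backbone
    net = model_zoo.get_model('resnest50', pretrained=True)
    scores = net(mx.nd.zeros((1, 3, 224, 224)))   # ImageNet class scores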

    Object Detection

    We add two new ResNeSt based Faster R-CNN models. Note that these models are trained using a 2x learning rate schedule instead of the 1x schedule used in our paper. Our two new models are 2-4% higher in COCO mAP than our previous best model, faster_rcnn_fpn_resnet101_v1d_coco. Notably, even our ResNeSt-50 based model outperforms our previous ResNet-101 based model.

    | Model | Backbone | mAP | version |
    | -- | -- | -- | -- |
    | Faster R-CNN | ResNet-101 | 40.8 | previous |
    | Faster R-CNN | ResNeSt-50 | 42.7 | v0.7 |
    | Faster R-CNN | ResNeSt-101 | 44.9 | v0.7 |

    Semantic Segmentation

    We add ResNeSt-50 and ResNeSt-101 based DeepLabV3 models for the semantic segmentation task on the ADE20K dataset. Our new models are 1-2.8% higher than our previous best. Similar to our detection results, the ResNeSt-50 based model performs better than the ResNet-101 based model. DeepLabV3 with a ResNeSt-101 backbone achieves a new state of the art of 46.9 mIoU on the ADE20K validation set, outperforming the previous best by more than 1%.

    | Model | Backbone | pixel Accuracy | mIoU | version |
    | -- | -- | -- | -- | -- |
    | DeepLabV3 | ResNet-101 | 81.1 | 44.1 | previous |
    | DeepLabV3 | ResNeSt-50 | 81.2 | 45.1 | v0.7 |
    | DeepLabV3 | ResNeSt-101 | 82.1 | 46.9 | v0.7 |

    Bug fixes and Improvements

    • Instructions for achieving 25.7 min Mask R-CNN training.
    • Fix R-CNNs export
    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Jan 13, 2020)

    GluonCV 0.6.0 Release

    Highlights

    GluonCV v0.6.0 added more video classification models, added pose estimation models that are suitable for mobile inference, added quantized models for video classification and pose estimation, and we also included multiple usability and code improvements.

    More video action recognition models

    https://gluon-cv.mxnet.io/model_zoo/action_recognition.html

    We now provide state-of-the-art video classification networks, such as I3D, I3D-Nonlocal and SlowFast. We have a complete model zoo over several widely adopted video datasets. We provide a general video dataloader (which can handle both frame format and raw video format). Users can do training, fine-tuning, prediction and feature extraction without writing complicated code; preparing a text file containing the video information is enough (see the example below).
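
    For reference, each line of such a setting file typically contains the video path, the number of frames, and the label id (the entries below are purely illustrative):

    abseiling/video_001.mp4 300 0
    archery/video_042.mp4 250 1
    bowling/video_013.mp4 280 2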

    Below is the table of new models included in this release.

    | Name | Pretrained | Segments | Clip Length | Top-1 | Hashtag |
    | -- | -- | -- | -- | -- | -- |
    | inceptionv1_kinetics400 | ImageNet | 7 | 1 | 69.1 | 6dcdafb1 |
    | inceptionv3_kinetics400 | ImageNet | 7 | 1 | 72.5 | 8a4a6946 |
    | resnet18_v1b_kinetics400 | ImageNet | 7 | 1 | 65.5 | 46d5a985 |
    | resnet34_v1b_kinetics400 | ImageNet | 7 | 1 | 69.1 | 8a8d0d8d |
    | resnet50_v1b_kinetics400 | ImageNet | 7 | 1 | 69.9 | cc757e5c |
    | resnet101_v1b_kinetics400 | ImageNet | 7 | 1 | 71.3 | 5bb6098e |
    | resnet152_v1b_kinetics400 | ImageNet | 7 | 1 | 71.5 | 9bc70c66 |
    | i3d_inceptionv1_kinetics400 | ImageNet | 1 | 32 (64/2) | 71.8 | 81e0be10 |
    | i3d_inceptionv3_kinetics400 | ImageNet | 1 | 32 (64/2) | 73.6 | f14f8a99 |
    | i3d_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 74.0 | 568a722e |
    | i3d_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.1 | 6b69f655 |
    | i3d_nl5_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.2 | 3c0e47ea |
    | i3d_nl10_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.3 | bfb58c41 |
    | i3d_nl5_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 76.0 | fbfc1d30 |
    | i3d_nl10_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 76.1 | 59186c31 |
    | slowfast_4x16_resnet50_kinetics400 | ImageNet | 1 | 36 (64/1) | 75.3 | 9d650f51 |
    | slowfast_8x8_resnet50_kinetics400 | ImageNet | 1 | 40 (64/1) | 76.6 | d6b25339 |
    | slowfast_8x8_resnet101_kinetics400 | ImageNet | 1 | 40 (64/1) | 77.2 | fbde1a7c |
    | resnet50_v1b_ucf101 | ImageNet | 3 | 1 | 83.7 | d728ecc7 |
    | i3d_resnet50_v1_ucf101 | ImageNet | 1 | 32 (64/2) | 83.9 | 7afc7286 |
    | i3d_resnet50_v1_ucf101 | Kinetics400 | 1 | 32 (64/2) | 95.4 | 760d0981 |
    | resnet50_v1b_hmdb51 | ImageNet | 3 | 1 | 55.2 | 682591e2 |
    | i3d_resnet50_v1_hmdb51 | ImageNet | 1 | 32 (64/2) | 48.5 | 0d0ad559 |
    | i3d_resnet50_v1_hmdb51 | Kinetics400 | 1 | 32 (64/2) | 70.9 | 2ec6bf01 |
    | resnet50_v1b_sthsthv2 | ImageNet | 8 | 1 | 35.5 | 80ee0c6b |
    | i3d_resnet50_v1_sthsthv2 | ImageNet | 1 | 16 (32/2) | 50.6 | 01961e4c |

    We include tutorials for how to fine-tune a pre-trained model on users' own dataset. https://gluon-cv.mxnet.io/build/examples_action_recognition/finetune_custom.html

    We include tutorials for introducing a new efficient video reader, Decord. https://gluon-cv.mxnet.io/build/examples_action_recognition/decord_loader.html

    We include tutorials for how to extract features from a pre-trained model. https://gluon-cv.mxnet.io/build/examples_action_recognition/feat_custom.html

    We include tutorials for how to make predictions from a pre-trained model. https://gluon-cv.mxnet.io/build/examples_action_recognition/demo_custom.html

    We include tutorials for how to perform distributed training on deep video models. https://gluon-cv.mxnet.io/build/examples_distributed/distributed_slowfast.html

    We include tutorials for how to prepare HMDB51 and Something-something-v2 dataset. https://gluon-cv.mxnet.io/build/examples_datasets/hmdb51.html https://gluon-cv.mxnet.io/build/examples_datasets/somethingsomethingv2.html

    We will provide Kinetics600 and Kinetics700 pre-trained models in the next release, please stay tuned.

    Mobile pose estimation models

    https://gluon-cv.mxnet.io/model_zoo/pose.html#mobile-pose-models

    | Model | OKS AP | OKS AP (with flip) | Hashtag |
    | -- | -- | -- | -- |
    | mobile_pose_resnet18_v1b | 66.2/89.2/74.3 | 67.9/90.3/75.7 | dd6644eb |
    | mobile_pose_resnet50_v1b | 71.1/91.3/78.7 | 72.4/92.3/79.8 | ec8809df |
    | mobile_pose_mobilenet1.0 | 64.1/88.1/71.2 | 65.7/89.2/73.4 | b399bac7 |
    | mobile_pose_mobilenetv2_1.0 | 63.7/88.1/71.0 | 65.0/89.2/72.3 | 4acdc130 |
    | mobile_pose_mobilenetv3_large | 63.7/88.9/70.8 | 64.5/89.0/72.0 | 1ca004dc |
    | mobile_pose_mobilenetv3_small | 54.3/83.7/59.4 | 55.6/84.7/61.7 | b1b148a9 |

    By replacing the backbone network and using a pixel-shuffle layer instead of deconvolution, we obtain models that are very fast. These models are suitable for edge-device applications; tutorials on deployment will come soon.

    More Int8 quantized models

    https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html

    The CPU performance below is benchmarked on an AWS EC2 C5.12xlarge instance with 24 physical cores. Note that you will need a nightly build of MXNet to properly use these new features.

    | Model | Dataset | Batch Size | Speedup (INT8/FP32) | FP32 Accuracy | INT8 Accuracy |
    | -- | -- | -- | -- | -- | -- |
    | simple_pose_resnet18_v1b | COCO Keypoint | 128 | 2.55 | 66.3 | 65.9 |
    | simple_pose_resnet50_v1b | COCO Keypoint | 128 | 3.50 | 71.0 | 70.6 |
    | simple_pose_resnet50_v1d | COCO Keypoint | 128 | 5.89 | 71.6 | 71.4 |
    | simple_pose_resnet101_v1b | COCO Keypoint | 128 | 4.07 | 72.4 | 72.2 |
    | simple_pose_resnet101_v1d | COCO Keypoint | 128 | 5.97 | 73.0 | 72.7 |
    | vgg16_ucf101 | UCF101 | 64 | 4.46 | 81.86 | 81.41 |
    | inceptionv3_ucf101 | UCF101 | 64 | 5.16 | 86.92 | 86.55 |
    | resnet18_v1b_kinetics400 | Kinetics400 | 64 | 5.24 | 63.29 | 63.14 |
    | resnet50_v1b_kinetics400 | Kinetics400 | 64 | 6.78 | 68.08 | 68.15 |
    | inceptionv3_kinetics400 | Kinetics400 | 64 | 5.29 | 67.93 | 67.92 |

    For pose-estimation models, the accuracy metric is OKS AP w/o flip. Quantized 2D video action recognition models are calibrated with num-segments=3 (7 is for ResNet-based models).

    Bug fixes and Improvements

    • The performance of PSPNet with a ResNet101 backbone on Cityscapes (semantic segmentation) is improved from 77.1% to 79.9% mIoU, higher than the number reported in the original paper.
    • We will deprecate Python2 support in the next release.
    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Sep 10, 2019)

    GluonCV 0.5.0 Release

    Highlights

    GluonCV v0.5.0 added Video Action Recognition models, added AlphaPose, added MobileNetV3, added VPLR semantic segmentation models for driving scenes, added more Int8 quantized models for deployment, and we also included multiple usability improvements.

    New Models released in 0.5

    | Model | Metric | 0.5 |
    | -- | -- | -- |
    | vgg16_ucf101 | UCF101 Top-1 | 83.4 |
    | inceptionv3_ucf101 | UCF101 Top-1 | 88.1 |
    | inceptionv3_kinetics400 | Kinetics400 Top-1 | 72.5 |
    | alpha_pose_resnet101_v1b_coco | OKS AP (with flip) | 76.7/92.6/82.9 |
    | MobileNetV3_Large | ImageNet Top-1 | 75.32 |
    | MobileNetV3_Small | ImageNet Top-1 | 67.72 |
    | deeplab_v3b_plus_wideresnet_citys | Cityscapes mIoU | 83.5 |

    New application: Video Action Recognition

    https://gluon-cv.mxnet.io/model_zoo/action_recognition.html

    Video Action Recognition in GluonCV is a complete application set, including model definition, training scripts, useful loss and metric functions. We also included some pre-trained models and usage tutorials.

    | Model | Pre-Trained Dataset | Clip Length | Num of Segments | Metric | Dataset | Accuracy |
    | -- | -- | -- | -- | -- | -- | -- |
    | vgg16_ucf101 | ImageNet | 1 | 1 | Top-1 | UCF101 | 81.5 |
    | vgg16_ucf101 | ImageNet | 1 | 3 | Top-1 | UCF101 | 83.4 |
    | inceptionv3_ucf101 | ImageNet | 1 | 1 | Top-1 | UCF101 | 85.6 |
    | inceptionv3_ucf101 | ImageNet | 1 | 3 | Top-1 | UCF101 | 88.1 |
    | inceptionv3_kinetics400 | ImageNet | 1 | 3 | Top-1 | Kinetics400 | 72.5 |

    The tutorial for how to prepare UCF101 and Kinetics400 dataset: https://gluon-cv.mxnet.io/build/examples_datasets/ucf101.html and https://gluon-cv.mxnet.io/build/examples_datasets/kinetics400.html .

    The demo for using the pre-trained model to predict human actions: https://gluon-cv.mxnet.io/build/examples_action_recognition/demo_ucf101.html.

    The tutorial for how to train your own action recognition model: https://gluon-cv.mxnet.io/build/examples_action_recognition/dive_deep_ucf101.html.

    More state-of-the-art models (I3D, SlowFast, etc.) are coming in the next release. Stay tuned.

    New model: AlphaPose

    https://gluon-cv.mxnet.io/model_zoo/pose.html#alphapose

    | Model | Dataset | OKS AP | OKS AP (with flip) |
    | -- | -- | -- | -- |
    | alpha_pose_resnet101_v1b_coco | COCO Keypoint | 74.2/91.6/80.7 | 76.7/92.6/82.9 |

    The demo for using the pre-trained AlphaPose model: https://gluon-cv.mxnet.io/build/examples_pose/demo_alpha_pose.html.

    New model: MobileNetV3

    https://gluon-cv.mxnet.io/model_zoo/classification.html#mobilenet

    | Model | Dataset | Top-1 | Top-5 | Top-1 (original paper) |
    | -- | -- | -- | -- | -- |
    | MobileNetV3_Large | ImageNet | 75.3 | 92.3 | 75.2 |
    | MobileNetV3_Small | ImageNet | 67.7 | 87.5 | 67.4 |

    New model: Semantic Segmentation VPLR

    https://gluon-cv.mxnet.io/model_zoo/segmentation.html#cityscapes-dataset

    | Model | Pre-Trained Dataset | Dataset | mIoU | iIoU |
    | -- | -- | -- | -- | -- |
    | deeplab_v3b_plus_wideresnet_citys | ImageNet, Mapillary Vista | Cityscapes | 83.5 | 64.4 |

    Improving Semantic Segmentation via Video Propagation and Label Relaxation has been ported to GluonCV. It is a state-of-the-art method on several driving semantic segmentation benchmarks (Cityscapes, CamVid and KITTI), and it generalizes well to other scenes.

    New model: More Int8 quantized models

    https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html

    The CPU performance below is benchmarked on an AWS EC2 C5.12xlarge instance with 24 physical cores. Note that you will need a nightly build of MXNet to properly use these new features.

    | Model | Dataset | Batch Size | C5.12xlarge FP32 | C5.12xlarge INT8 | Speedup | FP32 Acc | INT8 Acc |
    | -- | -- | -- | -- | -- | -- | -- | -- |
    | FCN_resnet101 | VOC | 1 | 5.46 | 26.33 | 4.82 | 97.97% | 98.00% |
    | PSP_resnet101 | VOC | 1 | 3.96 | 10.63 | 2.68 | 98.46% | 98.45% |
    | Deeplab_resnet101 | VOC | 1 | 4.17 | 13.35 | 3.20 | 98.36% | 98.34% |
    | FCN_resnet101 | COCO | 1 | 5.19 | 26.22 | 5.05 | 91.28% | 90.96% |
    | PSP_resnet101 | COCO | 1 | 3.94 | 10.60 | 2.69 | 91.82% | 91.88% |
    | Deeplab_resnet101 | COCO | 1 | 4.15 | 13.56 | 3.27 | 91.86% | 91.98% |

    For segmentation models, the accuracy metric is pixAcc. Usage of an int8 quantized model is identical to the standard GluonCV models; simply use the _int8 suffix.

    Bug fixes and Improvements

    • RCNN added automatic mixed precision and Horovod integration, with close to 4x improvement in training throughput on 8 V100 GPUs.
    • RCNN added support for multiple images per device.
    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Mar 26, 2019)

    0.4.0 Release Note

    Highlights

    GluonCV v0.4 added pose estimation models, Int8 quantization for Intel CPUs, FPN Faster/Mask-RCNN, wide SE/ResNeXt models, and multiple usability improvements.

    We highly suggest using GluonCV 0.4.0 with MXNet>=1.4.0 to avoid some dependency issues. For some specific tasks you may need an MXNet nightly build. See https://gluon-cv.mxnet.io/index.html

    New Models released in 0.4

    | Model | Metric | 0.4 |
    | -- | -- | -- |
    | simple_pose_resnet152_v1b | OKS AP* | 74.2 |
    | simple_pose_resnet50_v1b | OKS AP* | 72.2 |
    | ResNext50_32x4d | ImageNet Top-1 | 79.32 |
    | ResNext101_64x4d | ImageNet Top-1 | 80.69 |
    | SE_ResNext101_32x4d | ImageNet Top-1 | 79.95 |
    | SE_ResNext101_64x4d | ImageNet Top-1 | 81.01 |
    | yolo3_mobilenet1.0_coco | COCO mAP | 28.6 |

    * Using Ground-Truth person detection results

    Int8 Quantization with Intel Deep Learning Boost

    GluonCV is now integrated with Intel's Vector Neural Network Instructions (VNNI) to accelerate model inference. Note that you will need a capable Intel Skylake CPU to see a proper speedup.

    Model | Dataset | Batch Size | C5.18x FP32 | C5.18x INT8 | Speedup | FP32 Acc | INT8 Acc
    -- | -- | -- | -- | -- | -- | -- | --
    resnet50_v1 | ImageNet | 128 | 122.02 | 276.72 | 2.27 | 77.21%/93.55% | 76.86%/93.46%
    mobilenet1.0 | ImageNet | 128 | 375.33 | 1016.39 | 2.71 | 73.28%/91.22% | 72.85%/90.99%
    ssd_300_vgg16_atrous_voc* | VOC | 224 | 21.55 | 31.47 | 1.46 | 77.4 | 77.46
    ssd_512_vgg16_atrous_voc* | VOC | 224 | 7.63 | 11.69 | 1.53 | 78.41 | 78.39
    ssd_512_resnet50_v1_voc* | VOC | 224 | 17.81 | 34.55 | 1.94 | 80.21 | 80.16
    ssd_512_mobilenet1.0_voc* | VOC | 224 | 31.13 | 48.72 | 1.57 | 75.42 | 75.04

    *nms_thresh=0.45, nms_topk=200

    Using an int8 quantized model is identical to using a standard GluonCV model; simply add the _int8 suffix. For example, use resnet50_v1_int8 as the int8 quantized version of resnet50_v1.
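
    A minimal sketch of running the quantized classifier, assuming a VNNI-capable CPU, an MKL-DNN build of MXNet, and a placeholder image path:

    ```python
    import mxnet as mx
    from gluoncv import model_zoo
    from gluoncv.data.transforms.presets.imagenet import transform_eval

    # resnet50_v1_int8 is used exactly like resnet50_v1
    net = model_zoo.get_model('resnet50_v1_int8', pretrained=True)

    img = transform_eval(mx.image.imread('cat.jpg'))    # 'cat.jpg' is a placeholder path
    pred = net(img)
    print('top-1 class index:', int(mx.nd.argmax(pred, axis=1).asscalar()))
    ```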

    Pruned ResNet

    https://gluon-cv.mxnet.io/model_zoo/classification.html#pruned-resnet

    Pruning the channels of convolution layers is a very effective way to reduce model redundancy and speed up inference without sacrificing significant accuracy. GluonCV 0.4 includes several ResNets pruned from the original GluonCV SOTA ImageNet ResNets.

    | Model | Top-1 | Top-5 | Hashtag | Speedup (to original ResNet) |
    |---|---|---|---|---|
    | resnet18_v1b_0.89 | 67.2 | 87.45 | 54f7742b | 2x |
    | resnet50_v1d_0.86 | 78.02 | 93.82 | a230c33f | 1.68x |
    | resnet50_v1d_0.48 | 74.66 | 92.34 | 0d3e69bb | 3.3x |
    | resnet50_v1d_0.37 | 70.71 | 89.74 | 9982ae49 | 5.01x |
    | resnet50_v1d_0.11 | 63.22 | 84.79 | 6a25eece | 8.78x |
    | resnet101_v1d_0.76 | 79.46 | 94.69 | a872796b | 1.8x |
    | resnet101_v1d_0.73 | 78.89 | 94.48 | 712fccb1 | 2.02x |

    Scripts for pruning ResNets will be released in the future.

    More GANs (thanks @husonchen)

    SRGAN

    A GluonCV implementation of SRGAN from "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network": https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/srgan

    CycleGAN

    (CycleGAN teaser image)

    Image-to-Image translation reproduced in GluonCV: https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/cycle_gan

    Residual Attention Network (thanks @PistonY)

    GluonCV implementation of https://arxiv.org/abs/1704.06904

    (Figure 2 from the paper)

    New application: Human Pose Estimation

    https://gluon-cv.mxnet.io/model_zoo/pose.html

    (Simple Pose demo output image)

    Human Pose Estimation in GluonCV is a complete application set, including model definitions, training scripts, and useful loss and metric functions. We also include pre-trained models and usage tutorials; a minimal usage sketch follows the table below.

    | Model | OKS AP | OKS AP (with flip) |
    |---|---|---|
    | simple_pose_resnet18_v1b | 66.3/89.2/73.4 | 68.4/90.3/75.7 |
    | simple_pose_resnet18_v1b | 52.8/83.6/57.9 | 54.5/84.8/60.3 |
    | simple_pose_resnet50_v1b | 71.0/91.2/78.6 | 72.2/92.2/79.9 |
    | simple_pose_resnet50_v1d | 71.6/91.3/78.7 | 73.3/92.4/80.8 |
    | simple_pose_resnet101_v1b | 72.4/92.2/79.8 | 73.7/92.3/81.1 |
    | simple_pose_resnet101_v1d | 73.0/92.2/80.8 | 74.2/92.4/82.0 |
    | simple_pose_resnet152_v1b | 72.4/92.1/79.6 | 74.2/92.3/82.1 |
    | simple_pose_resnet152_v1d | 73.4/92.3/80.7 | 74.6/93.4/82.1 |
    | simple_pose_resnet152_v1d | 74.8/92.3/82.0 | 76.1/92.4/83.2 |
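
    A minimal sketch following the pose demo tutorial (the image path is a placeholder):

    ```python
    from gluoncv import model_zoo, data
    from gluoncv.data.transforms.pose import detector_to_simple_pose, heatmap_to_coord

    # person detector + Simple Pose network from the table above
    detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
    pose_net = model_zoo.get_model('simple_pose_resnet18_v1b', pretrained=True)
    detector.reset_class(['person'], reuse_weights=['person'])  # keep only the person class

    x, img = data.transforms.presets.yolo.load_test('soccer.png', short=512)
    class_ids, scores, bboxes = detector(x)

    pose_input, upscale_bbox = detector_to_simple_pose(img, class_ids, scores, bboxes)
    predicted_heatmap = pose_net(pose_input)
    pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)
    ```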

    Feature Pyramid Network for Faster/Mask-RCNN

    | Model | bbox/seg mAP | Caffe bbox/seg |
    |---|---|---|
    | faster_rcnn_fpn_resnet50_v1b_coco | 0.384/- | 0.379 |
    | faster_rcnn_fpn_bn_resnet50_v1b_coco | 0.393/- | - |
    | faster_rcnn_fpn_resnet101_v1d_coco | 0.412/- | 0.398/- |
    | maskrcnn_fpn_resnet50_v1b_coco | 0.392/0.353 | 0.386/0.345 |
    | maskrcnn_fpn_resnet101_v1d_coco | 0.423/0.377 | 0.409/0.364 |

    Bug fixes and Improvements

    • All ResNet definitions in GluonCV now support Synchronized BatchNorm
    • Pretrained object detection models now support reset_class to reuse partial category knowledge, so some tasks no longer require fine-tuning (see the sketch after this list): https://gluon-cv.mxnet.io/build/examples_detection/skip_fintune.html#sphx-glr-build-examples-detection-skip-fintune-py
    • Fixed some dataloader issues (requires mxnet >= 1.4.0)
    • Fixed some segmentation models that would not hybridize
    • Fixed random NaN problems in some detection models (requires the latest MXNet nightly build, >= 20190315)
    • Various other minor bug fixes
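
    A minimal reset_class sketch, assuming we only care about a subset of the VOC categories:

    ```python
    from gluoncv import model_zoo

    # pre-trained VOC detector; keep only the categories of interest and reuse their weights,
    # so no fine-tuning is needed for the narrowed task
    net = model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)
    net.reset_class(classes=['bicycle', 'person'], reuse_weights=['bicycle', 'person'])
    ```
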
    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Oct 16, 2018)

    0.3 Release Note

    Highlights

    Added 5 new algorithms and updated 38 pre-trained models with improved accuracy

    Comparison of 7 selected models:

    | Model | Metric | 0.2 | 0.3 | Reference |
    |---|---|---|---|---|
    | ResNet-50 | top-1 acc on ImageNet | 77.07% | 79.15% | 75.3% (Caffe impl) |
    | ResNet-101 | top-1 acc on ImageNet | 78.81% | 80.51% | 76.4% (Caffe impl) |
    | MobileNet 1.0 | top-1 acc on ImageNet | N/A | 73.28% | 70.9% (tensorflow impl) |
    | Faster-RCNN | mAP on COCO | N/A | 40.1% | 39.6% (Detectron) |
    | Yolo-v3 | mAP on COCO | N/A | 37.0% | 33.0% (paper) |
    | DeepLab-v3 | mIoU on VOC | N/A | 86.7% | 85.7% (paper) |
    | Mask-RCNN | mask AP on COCO | N/A | 33.1% | 32.8% (Detectron) |

    Interactive visualizations for pre-trained models

    For image classification:

    and for object detection:

    Deploy without Python

    All models are hybridizable, so they can be deployed without Python. See the tutorials on deploying these models in C++.
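
    A minimal export sketch (any pre-trained model from the zoo works the same way):

    ```python
    import mxnet as mx
    from gluoncv import model_zoo

    net = model_zoo.get_model('resnet50_v1b', pretrained=True)
    net.hybridize()
    net(mx.nd.zeros((1, 3, 224, 224)))  # one forward pass builds the cached symbolic graph
    net.export('resnet50_v1b')          # writes resnet50_v1b-symbol.json and resnet50_v1b-0000.params
    ```

    The exported symbol and parameter files can then be loaded from the C++ (or any other non-Python) MXNet frontend.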

    New Models with Training Scripts

    DenseNet, DarkNet, SqueezeNet for image classification

    We now provide a broader range of model families that are suitable for out-of-the-box usage and various research purposes.

    YoloV3 for object detection

    Significantly more accurate than the original paper. For example, we get 37.0% mAP on COCO versus the original paper's 33.0%. The techniques we used will be described in a paper to be released later.

    Mask-RCNN for instance segmentation

    Accuracy now matches Caffe2 Detectron without FPN, e.g. 38.3% box AP and 33.1% mask AP on COCO with ResNet50.

    FPN support will come in future versions.

    DeepLabV3 for semantic segmentation

    Slightly more accurate than the original paper. For example, we get 86.7% mIoU on VOC versus the original paper's 85.7%.

    WGAN

    Reproduced WGAN with ResNet

    Person Re-identification

    We provide a baseline model that achieves a best rank-1 score of 93.1 on the Market1501 dataset.

    Enhanced Models with Better Accuracy

    Faster R-CNN

    • Improved Pascal VOC model accuracy: mAP improves to 78.3% from the previous version's 77.9%. VOC models with 80%+ mAP will be released with the tech paper.
    • Added models trained on the COCO dataset.
      • The ResNet50 model now achieves 37.0 mAP, outperforming Caffe2 Detectron without FPN (36.5 mAP).
      • The ResNet101 model achieves 40.1 mAP, outperforming Caffe2 Detectron with FPN (39.8 mAP).
    • FPN support will come in future versions.

    ResNet, MobileNet, DarkNet, Inception for image classification

    • Significantly improved accuracy for some models. For example, ResNet50_v1b reaches 78.3% versus the previous version's 77.07%.
    • Added models trained with mixup and distillation. For example, ResNet50_v1d has 3 versions: ResNet50_v1d_distill (78.67%), ResNet50_v1d_mixup (79.16%), ResNet50_v1d_mixup_distill (79.29%).

    Semantic Segmentation

    • Synchronized Batch Normalization training.
    • Added Cityscapes dataset and pretrained models.
    • Added training details for reproducing state-of-the-art results on Pascal VOC and provided COCO pre-trained models for VOC.

    Dependency

    GluonCV 0.3.0 now depends on incubator-mxnet >= 1.3.0. Please update MXNet according to the installation guide to avoid compatibility issues.

    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Jun 26, 2018)

    Gluon CV Toolkit v0.2 Release Notes

    Note: This release relies on some features of MXNet 1.3.0. You can get early access to these features by installing a nightly build of MXNet.

    You can update mxnet with pip:

    pip install mxnet --upgrade --pre
    # or 
    pip install mxnet-cu90 --upgrade --pre
    

    New Features in 0.2

    Image Classification

    Highlight: Much more accurate pre-trained ResNet models on ImageNet classification

    These high-accuracy models have been updated in the Gluon Model Zoo.

    • ResNet50 v1b achieves over 77% accuracy, ResNet101 v1b at 78.8%, and ResNet152 v1b over 79%.
    • Training with large batch size and the float16 data type
    • Faster training with the ImageRecordIter interface
    • ResNeXt for ImageNet and CIFAR10 classification
    • SE-ResNet(v1b) for ImageNet

    Object Detection

    Highlight: Faster-RCNN model with training/testing scripts

    • Faster-RCNN

      • RPN (region proposal network)
      • Region Proposal
      • ROI Align operator
    • Train SSD on COCO dataset

    Semantic Segmentation

    Highlight: PSPNet for Semantic Segmentation

    Datasets

    Added the following datasets and usage tutorials

    • MS COCO
    • ADE20k

    New Pre-trained Models in GluonCV

    • cifar_resnext29_16x64d
    • resnet{18|34|50|101}_v1b
    • ssd_512_mobilenet1.0_voc
    • faster_rcnn_resnet50_v2a_voc
    • ssd_300_vgg16_atrous_coco
    • ssd_512_vgg16_atrous_coco
    • ssd_512_resnet50_v1_coco
    • psp_resnet50_ade

    Breaking changes

    • Renamed DilatedResnetV0 to ResNetV1b
    Source code(tar.gz)
    Source code(zip)
  • v0.1(May 1, 2018)

    Gluon CV Toolkit v0.1 Release Notes

    GluonCV provides implementations of state-of-the-art (SOTA) deep learning algorithms in computer vision. It is designed to help engineers, researchers, and students quickly prototype products, validate new ideas, and learn computer vision.

    Table of Contents

    • New Features
      • Tutorials

        • Image Classification (CIFAR + ImageNet demo + divedeep)
        • Object Detection (SSD demo + train + divedeep)
        • Semantic Segmentation (FCN demo + train)
      • Model Zoo

        • ResNet on ImageNet and CIFAR-10
        • SSD on VOC
        • FCN on VOC
        • Dilated ResNet
      • Training Scripts

        • Image Classification: Train ResNet on ImageNet and CIFAR-10, including Mix-Up training
        • Object Detection: Train SSD on PASCAL VOC
        • Semantic Segmentation: Train FCN on PASCAL VOC
      • Util functions (a usage sketch follows this list)

        • Image Visualization:
          • plot_image
          • get_color_pallete for segmentation
        • Bounding Box Visualization
          • plot_bbox
        • Training Helpers
          • PolyLRScheduler
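
    A minimal sketch of the visualization helpers, using a pre-trained detector from the model zoo (the image path is a placeholder):

    ```python
    import matplotlib.pyplot as plt
    from gluoncv import model_zoo, data, utils

    # run a pre-trained SSD detector on a local image
    net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
    x, img = data.transforms.presets.ssd.load_test('street.jpg', short=512)
    class_ids, scores, bboxes = net(x)

    # plot_bbox draws boxes, scores and class names onto the image
    ax = utils.viz.plot_bbox(img, bboxes[0], scores[0], class_ids[0], class_names=net.classes)
    plt.show()
    ```
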
    Source code(tar.gz)
    Source code(zip)