In this tutorial, you will perform inference with 10 well-known pre-trained object detectors, fine-tune an object detector on a custom dataset, and design and train your own object detector.

Overview

Object Detection

Object detection is a computer vision task for locating instances of predefined objects in images or videos.

In this tutorial, you will:

  • Perform inference with 10 well-known pre-trained object detectors Open In Colab
  • Fine tune object detectors on a custom dataset Open In Colab
  • Design and train your own object detector Open In Colab

This work is based on MMDetection, the OpenMMLab detection toolbox and benchmark.
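To follow along outside Colab, MMDetection and its dependencies can be installed roughly as follows (a minimal notebook-style sketch; the Colab notebooks pin the exact versions they were tested with):

# approximate installation; exact versions are pinned in the notebooks
!pip install -U openmim
!mim install mmcv-full
!pip install mmdet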

10 Object detectors

| Detector | Paper |
| --- | --- |
| YOLOF | You Only Look One-level Feature (2021) |
| YOLOX | Exceeding YOLO Series in 2021 (2021) |
| DETR | End-to-End Object Detection with Transformers (2020) |
| Deformable DETR | Deformable Transformers for End-to-End Object Detection (2021) |
| SparseR-CNN | End-to-End Object Detection with Learnable Proposals (2020) |
| VarifocalNet | An IoU-aware Dense Object Detector (2020) |
| PAA | Probabilistic Anchor Assignment with IoU Prediction for Object Detection (2020) |
| SABL | Side-Aware Boundary Localization for More Precise Object Detection (2020) |
| ATSS | Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection (2019) |
| Double Heads | Rethinking Classification and Localization for Object Detection (2019) |

Perform inference

Open In Colab

Here is how to load a pretrained model, perform inference, and visualize the results.

from mmdet.apis import init_detector, inference_detector, show_result_pyplot

model = init_detector(config, checkpoint, device='cuda:0')
result = inference_detector(model, img)
show_result_pyplot(model, img, result, title=m_name, score_thr=0.6)
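Here config, checkpoint, img and m_name are assumed to be defined beforehand. As an illustration (hypothetical values; the actual config and checkpoint names come from the MMDetection model zoo), they could look like:

# hypothetical example values; real config/checkpoint names come from the MMDetection model zoo
config = 'configs/yolof/yolof_r50_c5_8x8_1x_coco.py'
checkpoint = 'checkpoints/yolof_r50_coco.pth'
img = 'demo/demo.jpg'
m_name = 'YOLOF'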

res_yolof

And more examples ...

od_res_collage


Fine tune object detectors on a custom dataset

Open In Colab

We can select a pre-trained model and fine-tune it on a custom dataset. Here, kitti_tiny, a small subset of the well-known KITTI dataset, is used as the custom dataset.

kitti_sample
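Pointing the training config at the custom dataset might look roughly like this; a sketch assuming an MMDetection 2.x-style cfg, the kitti_tiny folder layout used in the notebook, and a KittiTinyDataset class registered there (the paths and the class name are assumptions):

# assumed kitti_tiny layout and a custom dataset class registered in the notebook
cfg.dataset_type = 'KittiTinyDataset'
cfg.data_root = 'kitti_tiny/'
cfg.data.train.type = 'KittiTinyDataset'
cfg.data.train.data_root = 'kitti_tiny/'
cfg.data.train.ann_file = 'train.txt'
cfg.data.train.img_prefix = 'training/image_2'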

While fine-tuning a pretrained model, we should select a small learning rate, monitor the loss and make sure it is decreasing, and so on. For object detection, the loss is composed of 2 main components:

  • Classification loss
  • Bounding Box loss
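For example, a fine-tuning config typically lowers the learning rate and logs the losses frequently. The snippet below is a minimal sketch assuming an MMDetection 2.x-style cfg; the exact keys and values depend on the detector and MMDetection version.

# fine-tuning tweaks (approximate values; tune them for the chosen detector)
cfg.optimizer.lr = 0.02 / 8        # much smaller learning rate than training from scratch
cfg.lr_config.warmup = None        # a short fine-tuning run may skip warmup
cfg.log_config.interval = 10       # report the losses every 10 iterations
cfg.runner.max_epochs = 12         # keep the schedule short for a tiny dataset

The training log below reports both loss components at each logging interval: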
- Epoch [1][10/50]	loss_cls: 0.4056, loss_bbox: 0.3570, loss: 0.7625
- Epoch [1][20/50]	loss_cls: 0.3316, loss_bbox: 0.3062, loss: 0.6378
- Epoch [1][30/50]	loss_cls: 0.5663, loss_bbox: 0.3815, loss: 0.9478
- Epoch [1][40/50]	loss_cls: 0.4196, loss_bbox: 0.3604, loss: 0.7801
- Epoch [1][50/50]	loss_cls: 0.4421, loss_bbox: 0.2568, loss: 0.6989
- Epoch [2][10/50]	loss_cls: 0.2546, loss_bbox: 0.3497, loss: 0.6043
- Epoch [2][20/50]	loss_cls: 0.2809, loss_bbox: 0.4048, loss: 0.6857
- Epoch [2][30/50]	loss_cls: 0.2418, loss_bbox: 0.3069, loss: 0.5487
- Epoch [2][40/50]	loss_cls: 0.2912, loss_bbox: 0.4422, loss: 0.7335
- Epoch [2][50/50]	loss_cls: 0.2818, loss_bbox: 0.2620, loss: 0.5438
- Epoch [3][10/50]	loss_cls: 0.2182, loss_bbox: 0.3185, loss: 0.5367
- Epoch [3][20/50]	loss_cls: 0.2711, loss_bbox: 0.4153, loss: 0.6864
- Epoch [3][30/50]	loss_cls: 0.2528, loss_bbox: 0.2959, loss: 0.5488
- Epoch [3][40/50]	loss_cls: 0.2249, loss_bbox: 0.3188, loss: 0.5437
- Epoch [3][50/50]	loss_cls: 0.2020, loss_bbox: 0.3191, loss: 0.5210
- Epoch [4][10/50]	loss_cls: 0.2135, loss_bbox: 0.3957, loss: 0.6091
- Epoch [4][20/50]	loss_cls: 0.1984, loss_bbox: 0.2902, loss: 0.4886
- Epoch [4][30/50]	loss_cls: 0.1694, loss_bbox: 0.2576, loss: 0.4270
- Epoch [4][40/50]	loss_cls: 0.2733, loss_bbox: 0.3334, loss: 0.6067
- Epoch [4][50/50]	loss_cls: 0.2013, loss_bbox: 0.3498, loss: 0.5512
- Epoch [5][10/50]	loss_cls: 0.2169, loss_bbox: 0.3491, loss: 0.5660
- Epoch [5][20/50]	loss_cls: 0.1879, loss_bbox: 0.2640, loss: 0.4518
- Epoch [5][30/50]	loss_cls: 0.1366, loss_bbox: 0.3346, loss: 0.4712
- Epoch [5][40/50]	loss_cls: 0.1981, loss_bbox: 0.2485, loss: 0.4467
- Epoch [5][50/50]	loss_cls: 0.2038, loss_bbox: 0.2929, loss: 0.4968
- Epoch [6][10/50]	loss_cls: 0.2363, loss_bbox: 0.3392, loss: 0.5755
- Epoch [6][20/50]	loss_cls: 0.1515, loss_bbox: 0.3114, loss: 0.4629
- Epoch [6][30/50]	loss_cls: 0.1903, loss_bbox: 0.3343, loss: 0.5246
- Epoch [6][40/50]	loss_cls: 0.1311, loss_bbox: 0.2742, loss: 0.4053
- Epoch [6][50/50]	loss_cls: 0.1533, loss_bbox: 0.3611, loss: 0.5145
- Epoch [7][10/50]	loss_cls: 0.1615, loss_bbox: 0.2453, loss: 0.4067
- Epoch [7][20/50]	loss_cls: 0.1341, loss_bbox: 0.2708, loss: 0.4049
- Epoch [7][30/50]	loss_cls: 0.1876, loss_bbox: 0.4118, loss: 0.5994
- Epoch [7][40/50]	loss_cls: 0.1071, loss_bbox: 0.1954, loss: 0.3026
- Epoch [7][50/50]	loss_cls: 0.2015, loss_bbox: 0.2969, loss: 0.4985
- Epoch [8][10/50]	loss_cls: 0.1945, loss_bbox: 0.2751, loss: 0.4695
- Epoch [8][20/50]	loss_cls: 0.1245, loss_bbox: 0.1816, loss: 0.3061
- Epoch [8][30/50]	loss_cls: 0.1388, loss_bbox: 0.3588, loss: 0.4976
- Epoch [8][40/50]	loss_cls: 0.1417, loss_bbox: 0.2950, loss: 0.4367
- Epoch [8][50/50]	loss_cls: 0.1369, loss_bbox: 0.2684, loss: 0.4053
- Epoch [9][10/50]	loss_cls: 0.1295, loss_bbox: 0.2353, loss: 0.3648
- Epoch [9][20/50]	loss_cls: 0.1349, loss_bbox: 0.2023, loss: 0.3372

Results of the fine-tuned model od_ft_res02 od_ft_res03 od_ft_res01


Design and train your own object detector

Open In Colab

Here, we design, inspect, and train a custom object detection model. We start from the YOLOF detector and then make the following updates:

  1. Backbone: Replace ResNet50 -> Pyramid Vision Transformer (PVT)
# clear the default backbone
cfg.model.backbone.clear() 
# add the new backbone
cfg.model.backbone.type='PyramidVisionTransformer'
cfg.model.backbone.num_layers=[2, 2, 2, 2]
cfg.model.backbone.init_cfg=dict(checkpoint='https://github.com/whai362/PVT/releases/download/v2/pvt_tiny.pth')
cfg.model.backbone.out_indices=(3, )
  2. Neck: Replace DilatedEncoder -> Feature Pyramid Network (FPN)
# clear the default neck
cfg.model.neck.clear() 

# add the new neck
cfg.model.neck.type='FPN'
cfg.model.neck.in_channels=[512]
cfg.model.neck.out_channels=128
cfg.model.neck.num_outs=1
  3. Head: Update YOLOF head -> num_classes = 3, and input channels
cfg.model.bbox_head.num_classes=3
cfg.model.bbox_head.in_channels=128
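With the backbone, neck and head updated, the customized detector can be built and inspected. Below is a sketch assuming MMDetection 2.x (the build_detector call differs slightly across versions):

from mmdet.models import build_detector

# build the customized YOLOF-with-PVT-and-FPN detector from the modified config
model = build_detector(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
model.init_weights()   # initialize weights, loading the PVT checkpoint given in init_cfg
print(model)           # inspect the resulting backbone/neck/head structure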

Train on KITTI tiny
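Training on kitti_tiny can then be launched along these lines; a sketch assuming MMDetection 2.x helpers, with the dataset registration and cfg.work_dir assumed to be set up as in the fine-tuning notebook:

import os.path as osp

import mmcv
from mmdet.datasets import build_dataset
from mmdet.apis import train_detector

datasets = [build_dataset(cfg.data.train)]        # build the kitti_tiny training set from the config
model.CLASSES = datasets[0].CLASSES               # attach class names for later visualization
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))    # make sure the work directory exists
train_detector(model, datasets, cfg, distributed=False, validate=True)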

- Epoch [1][10/50]	loss_cls: 11.7010, loss_bbox: 1.2835, loss: 12.9846
- Epoch [1][20/50]	loss_cls: 1.5840, loss_bbox: 1.2059, loss: 2.7900
- Epoch [1][30/50]	loss_cls: 1.9305, loss_bbox: 1.1042, loss: 3.0348
- Epoch [1][40/50]	loss_cls: 1.9289, loss_bbox: 1.1813, loss: 3.1102
- Epoch [1][50/50]	loss_cls: 1.5479, loss_bbox: 0.8640, loss: 2.4120
- Epoch [2][10/50]	loss_cls: 1.7256, loss_bbox: 1.1628, loss: 2.8884
- Epoch [2][20/50]	loss_cls: 1.4963, loss_bbox: 1.1485, loss: 2.6448
- Epoch [2][30/50]	loss_cls: 1.5202, loss_bbox: 0.9300, loss: 2.4502
- Epoch [2][40/50]	loss_cls: 1.3903, loss_bbox: 1.2719, loss: 2.6623
- Epoch [2][50/50]	loss_cls: 1.4891, loss_bbox: 1.0604, loss: 2.5496
- Epoch [3][10/50]	loss_cls: 1.2993, loss_bbox: 1.0748, loss: 2.3741
- Epoch [3][20/50]	loss_cls: 1.2108, loss_bbox: 1.0574, loss: 2.2682
- Epoch [3][30/50]	loss_cls: 1.1432, loss_bbox: 1.1037, loss: 2.2469
- Epoch [3][40/50]	loss_cls: 1.0749, loss_bbox: 0.9834, loss: 2.0583
- Epoch [3][50/50]	loss_cls: 1.0111, loss_bbox: 1.0372, loss: 2.0483
- Epoch [4][10/50]	loss_cls: 0.8880, loss_bbox: 1.3213, loss: 2.2093
- Epoch [4][20/50]	loss_cls: 0.9726, loss_bbox: 1.0892, loss: 2.0618
- Epoch [4][30/50]	loss_cls: 0.9695, loss_bbox: 0.7829, loss: 1.7524
- Epoch [4][40/50]	loss_cls: 0.8184, loss_bbox: 1.0882, loss: 1.9067
- Epoch [4][50/50]	loss_cls: 0.8585, loss_bbox: 1.1519, loss: 2.0104
- Epoch [5][10/50]	loss_cls: 0.7565, loss_bbox: 1.0927, loss: 1.8492
- Epoch [5][20/50]	loss_cls: 0.8421, loss_bbox: 0.9804, loss: 1.8226
- Epoch [5][30/50]	loss_cls: 0.7459, loss_bbox: 0.9499, loss: 1.6958
- Epoch [5][40/50]	loss_cls: 0.6630, loss_bbox: 0.9623, loss: 1.6253
- Epoch [5][50/50]	loss_cls: 0.8257, loss_bbox: 1.0594, loss: 1.8851
- Epoch [6][10/50]	loss_cls: 0.7017, loss_bbox: 1.1119, loss: 1.8136
- Epoch [6][20/50]	loss_cls: 0.7116, loss_bbox: 1.0703, loss: 1.7819
- Epoch [6][30/50]	loss_cls: 0.7036, loss_bbox: 1.0633, loss: 1.7669
- Epoch [6][40/50]	loss_cls: 0.6957, loss_bbox: 1.1699, loss: 1.8656
- Epoch [6][50/50]	loss_cls: 0.6945, loss_bbox: 1.0165, loss: 1.7111
- Epoch [7][10/50]	loss_cls: 0.6606, loss_bbox: 0.9504, loss: 1.6110
- Epoch [7][20/50]	loss_cls: 0.6879, loss_bbox: 1.0412, loss: 1.7292
- Epoch [7][30/50]	loss_cls: 0.6921, loss_bbox: 1.2121, loss: 1.9042
- Epoch [7][40/50]	loss_cls: 0.6256, loss_bbox: 0.9307, loss: 1.5562
- Epoch [7][50/50]	loss_cls: 0.6127, loss_bbox: 1.1764, loss: 1.7891
- Epoch [8][10/50]	loss_cls: 0.5272, loss_bbox: 1.1579, loss: 1.6851
- Epoch [8][20/50]	loss_cls: 0.6060, loss_bbox: 0.9147, loss: 1.5207
- Epoch [8][30/50]	loss_cls: 0.6159, loss_bbox: 1.0704, loss: 1.6863
- Epoch [8][40/50]	loss_cls: 0.7068, loss_bbox: 1.0493, loss: 1.7561
- Epoch [8][50/50]	loss_cls: 0.6400, loss_bbox: 0.9694, loss: 1.6094
- Epoch [9][10/50]	loss_cls: 0.6859, loss_bbox: 1.0887, loss: 1.7745
- Epoch [9][20/50]	loss_cls: 0.5328, loss_bbox: 0.7794, loss: 1.3123

See how the loss is decreasing.


Quick review

YOLOF: You Only Look One-level Feature (2021)

YOLOF This paper revisits feature pyramids networks (FPN) for one-stage detectors and points out that the success of FPN is due to its divide-and-conquer solution to the optimization problem in object detection rather than multi-scale feature fusion. From the perspective of optimization, we introduce an alternative way to address the problem instead of adopting the complex feature pyramids - {utilizing only one-level feature for detection}. Based on the simple and efficient solution, we present You Only Look One-level Feature (YOLOF). In our method, two key components, Dilated Encoder and Uniform Matching, are proposed and bring considerable improvements. Extensive experiments on the COCO benchmark prove the effectiveness of the proposed model. Our YOLOF achieves comparable results with its feature pyramids counterpart RetinaNet while being 2.5× faster. Without transformer layers, YOLOF can match the performance of DETR in a single-level feature manner with 7× less training epochs. With an image size of 608×608, YOLOF achieves 44.3 mAP running at 60 fps on 2080Ti, which is 13% faster than YOLOv4.

YOLOX: Exceeding YOLO Series in 2021 (2021)

YOLOX In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector -- YOLOX. We switch the YOLO detector to an anchor-free manner and conduct other advanced detection techniques, i.e., a decoupled head and the leading label assignment strategy SimOTA to achieve state-of-the-art results across a large scale range of models: For YOLO-Nano with only 0.91M parameters and 1.08G FLOPs, we get 25.3% AP on COCO, surpassing NanoDet by 1.8% AP; for YOLOv3, one of the most widely used detectors in industry, we boost it to 47.3% AP on COCO, outperforming the current best practice by 3.0% AP; for YOLOX-L with roughly the same amount of parameters as YOLOv4-CSP, YOLOv5-L, we achieve 50.0% AP on COCO at a speed of 68.9 FPS on Tesla V100, exceeding YOLOv5-L by 1.8% AP. Further, we won the 1st Place on Streaming Perception Challenge (Workshop on Autonomous Driving at CVPR 2021) using a single YOLOX-L model. We hope this report can provide useful experience for developers and researchers in practical scenes, and we also provide deploy versions with ONNX, TensorRT, NCNN, and Openvino supported.

DETR: End-to-End Object Detection with Transformers (2020)

DETR We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines.

Deformable DETR: Deformable Transformers for End-to-End Object Detection (2021)

DDETR DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitation of Transformer attention modules in processing image feature maps. To mitigate these issues, we proposed Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10 times less training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach.

SparseR-CNN: End-to-End Object Detection with Learnable Proposals (2020)

SparseRCNN We present Sparse R-CNN, a purely sparse method for object detection in images. Existing works on object detection heavily rely on dense object candidates, such as k anchor boxes pre-defined on all grids of image feature map of size H×W. In our method, however, a fixed sparse set of learned object proposals, total length of N, are provided to object recognition head to perform classification and location. By eliminating HWk (up to hundreds of thousands) hand-designed object candidates to N (e.g. 100) learnable proposals, Sparse R-CNN completely avoids all efforts related to object candidates design and many-to-one label assignment. More importantly, final predictions are directly output without non-maximum suppression post-procedure. Sparse R-CNN demonstrates accuracy, run-time and training convergence performance on par with the well-established detector baselines on the challenging COCO dataset, e.g., achieving 45.0 AP in standard 3× training schedule and running at 22 fps using ResNet-50 FPN model. We hope our work could inspire re-thinking the convention of dense prior in object detectors.

VarifocalNet: An IoU-aware Dense Object Detector (2020)

VarifocalNet Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. Prior work uses the classification score or a combination of classification and predicted localization scores to rank candidates. However, neither option results in a reliable ranking, thus degrading detection performance. In this paper, we propose to learn an Iou-aware Classification Score (IACS) as a joint representation of object presence confidence and localization accuracy. We show that dense object detectors can achieve a more accurate ranking of candidate detections based on the IACS. We design a new loss function, named Varifocal Loss, to train a dense object detector to predict the IACS, and propose a new star-shaped bounding box feature representation for IACS prediction and bounding box refinement. Combining these two new components and a bounding box refinement branch, we build an IoU-aware dense object detector based on the FCOS+ATSS architecture, that we call VarifocalNet or VFNet for short. Extensive experiments on MS COCO show that our VFNet consistently surpasses the strong baseline by ∼2.0 AP with different backbones. Our best model VFNet-X-1200 with Res2Net-101-DCN achieves a single-model single-scale AP of 55.1 on COCO test-dev, which is state-of-the-art among various object detectors.

PAA: Probabilistic Anchor Assignment with IoU Prediction for Object Detection (2020)

paa In object detection, determining which anchors to assign as positive or negative samples, known as anchor assignment, has been revealed as a core procedure that can significantly affect a model's performance. In this paper we propose a novel anchor assignment strategy that adaptively separates anchors into positive and negative samples for a ground truth bounding box according to the model's learning status such that it is able to reason about the separation in a probabilistic manner. To do so we first calculate the scores of anchors conditioned on the model and fit a probability distribution to these scores. The model is then trained with anchors separated into positive and negative samples according to their probabilities. Moreover, we investigate the gap between the training and testing objectives and propose to predict the Intersection-over-Unions of detected boxes as a measure of localization quality to reduce the discrepancy. The combined score of classification and localization qualities serving as a box selection metric in non-maximum suppression well aligns with the proposed anchor assignment strategy and leads significant performance improvements. The proposed methods only add a single convolutional layer to RetinaNet baseline and does not require multiple anchors per location, so are efficient. Experimental results verify the effectiveness of the proposed methods. Especially, our models set new records for single-stage detectors on MS COCO test-dev dataset with various backbones.

SABL: Side-Aware Boundary Localization for More Precise Object Detection (2020)

SABL Current object detection frameworks mainly rely on bounding box regression to localize objects. Despite the remarkable progress in recent years, the precision of bounding box regression remains unsatisfactory, hence limiting performance in object detection. We observe that precise localization requires careful placement of each side of the bounding box. However, the mainstream approach, which focuses on predicting centers and sizes, is not the most effective way to accomplish this task, especially when there exists displacements with large variance between the anchors and the targets. In this paper, we propose an alternative approach, named as Side-Aware Boundary Localization (SABL), where each side of the bounding box is respectively localized with a dedicated network branch. To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket. We test the proposed method on both two-stage and single-stage detection frameworks. Replacing the standard bounding box regression branch with the proposed design leads to significant improvements on Faster R-CNN, RetinaNet, and Cascade R-CNN, by 3.0%, 1.7%, and 0.9%, respectively.

ATSS: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection (2019)

ATSS Object detection has been dominated by anchor-based detectors for several years. Recently, anchor-free detectors have become popular due to the proposal of FPN and Focal Loss. In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how to define positive and negative training samples, which leads to the performance gap between them. If they adopt the same definition of positive and negative samples during training, there is no obvious difference in the final performance, no matter regressing from a box or a point. This shows that how to select positive and negative training samples is important for current object detectors. Then, we propose an Adaptive Training Sample Selection (ATSS) to automatically select positive and negative samples according to statistical characteristics of object. It significantly improves the performance of anchor-based and anchor-free detectors and bridges the gap between them. Finally, we discuss the necessity of tiling multiple anchors per location on the image to detect objects. Extensive experiments conducted on MS COCO support our aforementioned analysis and conclusions. With the newly introduced ATSS, we improve state-of-the-art detectors by a large margin to 50.7% AP without introducing any overhead.

Double Heads: Rethinking Classification and Localization for Object Detection (2019)

![dhs](images/Double Heads.png) Two head structures (i.e. fully connected head and convolution head) have been widely used in R-CNN based detectors for classification and localization tasks. However, there is a lack of understanding of how does these two head structures work for these two tasks. To address this issue, we perform a thorough analysis and find an interesting fact that the two head structures have opposite preferences towards the two tasks. Specifically, the fully connected head (fc-head) is more suitable for the classification task, while the convolution head (conv-head) is more suitable for the localization task. Furthermore, we examine the output feature maps of both heads and find that fc-head has more spatial sensitivity than conv-head. Thus, fc-head has more capability to distinguish a complete object from part of an object, but is not robust to regress the whole object. Based upon these findings, we propose a Double-Head method, which has a fully connected head focusing on classification and a convolution head for bounding box regression. Without bells and whistles, our method gains +3.5 and +2.8 AP on MS COCO dataset from Feature Pyramid Network (FPN) baselines with ResNet-50 and ResNet-101 backbones, respectively.
