PyTorch implementation of One-Shot Affordance Detection

One-shot Affordance Detection

PyTorch implementation of our one-shot affordance detection models. This repository contains PyTorch evaluation code, training code and pretrained models.

📋 Table of contents

  1. 📎 Paper Link
  2. 💡 Abstract
  3. 📖 Method
    1. IJCAI Version
    2. Extended Version
  4. 📂 Dataset
    1. PAD
    2. PADv2
  5. 📃 Requirements
  6. ✏️ Usage
    1. Train
    2. Test
    3. Evaluation
  7. 📊 Experimental Results
    1. Performance on PADv2
    2. Performance on PAD
  8. 🍎 Potential Applications
  9. ✉️ Statement
  10. 🔍 Citation

📎 Paper Link

  • One-Shot Affordance Detection (IJCAI2021) (link)

Authors: Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

  • One-Shot Affordance Detection (Extended Version) (link)

Authors: Wei Zhai*, Hongchen Luo*, Jing Zhang, Yang Cao, Dacheng Tao

💡 Abstract

Affordance detection refers to identifying the potential action possibilities of objects in an image, which is a crucial ability for robot perception and manipulation. To empower robots with this ability in unseen scenarios, we first consider the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected. To this end, we devise a One-Shot Affordance Detection Network (OSAD-Net) that firstly estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images. Through collaboration learning, OSAD-Net can capture the common characteristics between objects having the same underlying affordance and learn a good adaptation capability for perceiving unseen affordances. Besides, we build a Purpose-driven Affordance Dataset v2 (PADv2) by collecting and labeling 30k images from 39 affordance and 94 object categories. With complex scenes and rich annotations, our PADv2 can comprehensively understand the affordance of objects and can even be used in other vision tasks, such as scene understanding, action recognition, robot manipulation, etc. We present a standard one-shot affordance detection benchmark comparing 11 advanced models in several different fields. Experimental results demonstrate the superiority of our model over previous representative ones in terms of both objective metrics and visual quality.


Illustration of perceiving affordance. Given a support image that depicts the action purpose, all objects in a scene with the common affordance can be detected.

📖 Method

OSAD-Net (IJCAI2021)


Our One-Shot Affordance Detection (OS-AD) network. OSAD-Net_ijcai consists of three key modules: Purpose Learning Module (PLM), Purpose Transfer Module (PTM), and Collaboration Enhancement Module (CEM). (a) PLM aims to estimate action purpose from the human-object interaction in the support image. (b) PTM transfers the action purpose to the query images via an attention mechanism to enhance the relevant features. (c) CEM captures the intrinsic characteristics between objects having the common affordance to learn a better affordance perceiving ability.
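
The purpose-transfer idea in (b) can be illustrated with a generic scaled dot-product attention that reweights query locations by their similarity to a purpose vector. This is only a sketch in plain Python, not the authors' PTM: the function name `purpose_attention`, the flat list layout of the features, and the residual `1 + weight` enhancement are all assumptions made for illustration.

```python
import math


def purpose_attention(purpose, query_features):
    """Reweight query locations by their similarity to a purpose vector.

    purpose: list of C floats (hypothetical action-purpose embedding).
    query_features: list of N feature vectors with C floats each,
        one per spatial location of a query image.
    Returns (weights, enhanced); weights sum to 1 over the N locations.
    """
    scale = math.sqrt(len(purpose))
    # Scaled dot product between the purpose vector and every location.
    scores = [sum(p * q for p, q in zip(purpose, feat)) / scale
              for feat in query_features]
    # Numerically stable softmax over locations.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Residual enhancement (an assumption): relevant locations are
    # amplified, and no location is zeroed out.
    enhanced = [[q * (1.0 + w) for q in feat]
                for feat, w in zip(query_features, weights)]
    return weights, enhanced


if __name__ == "__main__":
    purpose = [1.0, 0.0]
    query = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
    weights, enhanced = purpose_attention(purpose, query)
    print(weights)
```

In this toy input the first location points in the same direction as the purpose vector, so it receives the largest weight and the strongest enhancement.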

OSAD-Net (Extended Version)


The framework of our OSAD-Net. The network first uses a ResNet50 to extract features from the support image and the query images. The support feature, the bounding boxes of the person and object, and the pose of the person are then fed together into the action purpose learning (APL) module to obtain the human action purpose features. Next, the human action purpose features and the query images are sent to the mixture purpose transfer (MPT) module, which transfers the human action purpose to the query images and activates the object regions belonging to the affordance. The output of the MPT is then fed into a densely collaborative enhancement (DCE) module, which learns the commonality among objects of the same affordance and suppresses irrelevant background regions using a cooperative strategy. Finally, the result is fed into the decoder to obtain the final detection results.

📂 Dataset


Sample images from PADv2. PADv2 has rich annotations, including affordance masks and depth information, and thus provides a solid foundation for the affordance detection task.


The properties of PADv2. (a) The classification structure of PADv2, which consists of 39 affordance categories and 94 object categories. (b) The word cloud distribution of PADv2. (c) Overlapping mask visualization of PADv2, mixing specific affordance classes and overall category masks. (d) Confusion matrix of PADv2 affordance and object categories, where the horizontal axis corresponds to the object category and the vertical axis to the affordance category. (e) Distribution of co-occurring attributes of PADv2; the number in each grid cell is the total number of images.

Download PAD

# Adjust the paths to where you saved the archive and cloned the repository.
cd ~/Downloads/
unzip PAD.zip
cd /path/to/OSAD_Net
mkdir -p datasets/PAD
mv ~/Downloads/PAD/divide_1 datasets/PAD/
mv ~/Downloads/PAD/divide_2 datasets/PAD/
mv ~/Downloads/PAD/divide_3 datasets/PAD/

Download PADv2

  • You can download PADv2 from [ Baidu Pan (1ttj) ].

# Adjust the paths to where you saved the archive and cloned the repository.
cd ~/Downloads/
unzip PADv2_part1.zip
cd /path/to/OSAD_Net
mkdir -p datasets/PADv2_part1
mv ~/Downloads/PADv2_part1/divide_1 datasets/PADv2_part1/
mv ~/Downloads/PADv2_part1/divide_2 datasets/PADv2_part1/
mv ~/Downloads/PADv2_part1/divide_3 datasets/PADv2_part1/
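
After moving the splits, the resulting layout can be checked with a short script run from the repository root. This is a minimal sketch, assuming the three `divide_*` folders sit directly under `datasets/PAD` and `datasets/PADv2_part1` as in the commands above; `missing_splits` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path

# Split folder names taken from the download commands.
SPLITS = ("divide_1", "divide_2", "divide_3")


def missing_splits(dataset_root):
    """Return the split folders that are absent under dataset_root."""
    root = Path(dataset_root)
    return [name for name in SPLITS if not (root / name).is_dir()]


if __name__ == "__main__":
    for dataset in ("datasets/PAD", "datasets/PADv2_part1"):
        missing = missing_splits(dataset)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{dataset}: {status}")
```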

📃 Requirements

  • Python 3.7
  • PyTorch 1.1.0
  • OpenCV
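
Before training, it can help to confirm that the interpreter and packages are in place. The sketch below assumes the usual import names `torch` (PyTorch) and `cv2` (OpenCV); `check_requirements` is a hypothetical helper, and the script only checks that the modules can be found, not their versions.

```python
import importlib.util
import sys

# Assumed import names for the listed packages (PyTorch -> torch, OpenCV -> cv2).
REQUIRED_MODULES = ("torch", "cv2")


def check_requirements(modules=REQUIRED_MODULES):
    """Return the names of modules that cannot be found in this environment."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    missing = check_requirements()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```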

✏️ Usage

git clone https://github.com/lhc1224/OSAD_Net.git
cd OSAD_Net

Train

You can download the pretrained model from [ Google Drive | Baidu Pan (xjk5) ], then move it to the models folder.

To train the OSAD-Net_ijcai model, run run_os_ad.py with the desired model architecture:

python run_os_ad.py   

To train the OSAD-Net model, run run_os_adv2.py with the desired model architecture:

python run_os_adv2.py   

Test

To test the OSAD-Net_ijcai model, run run_os_ad.py:

python run_os_ad.py  --mode test 

To test the OSAD-Net model, run run_os_adv2.py. You can download the trained models from [ Google Drive | Baidu Pan (611r) ].

python run_os_adv2.py  --mode test 
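
The four commands in the Train and Test sections differ only in the script name and the `--mode test` flag, so they can be generated from one small helper when scripting experiments. This is a sketch: `build_command` and the `"ijcai"`/`"extended"` keys are hypothetical names, and only the script names and the `--mode test` flag come from the commands shown here.

```python
import sys

# Script names as given in the Train and Test sections.
SCRIPTS = {"ijcai": "run_os_ad.py", "extended": "run_os_adv2.py"}


def build_command(model, mode="train"):
    """Build the command line for training or testing one of the two models."""
    if model not in SCRIPTS:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(SCRIPTS)}")
    if mode not in ("train", "test"):
        raise ValueError(f"unknown mode {mode!r}; expected 'train' or 'test'")
    command = [sys.executable, SCRIPTS[model]]
    # Training is shown without flags; only testing adds --mode test.
    if mode == "test":
        command += ["--mode", "test"]
    return command


if __name__ == "__main__":
    print(" ".join(build_command("extended", mode="test")))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root.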

Evaluation

To evaluate the predicted results, obtain the evaluation code from Evaluation Tools.
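
The evaluation tools themselves are not reproduced here. As a rough illustration of how a predicted affordance map can be scored against a ground-truth mask, the sketch below computes intersection-over-union (IoU) and mean absolute error (MAE) in plain Python. Which metrics the official tools report, the 0.5 threshold, and the function names `iou` and `mae` are assumptions.

```python
def iou(pred, target, threshold=0.5):
    """IoU between a thresholded prediction and a ground-truth mask.

    pred and target are 2D lists of floats in [0, 1] with the same shape.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on = p >= threshold
            t_on = t >= threshold
            intersection += p_on and t_on
            union += p_on or t_on
    # Two empty masks agree perfectly by convention (an assumption).
    return intersection / union if union else 1.0


def mae(pred, target):
    """Mean absolute error between two non-empty maps of the same shape."""
    diffs = [abs(p - t)
             for pred_row, target_row in zip(pred, target)
             for p, t in zip(pred_row, target_row)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    pred = [[0.9, 0.2], [0.8, 0.1]]
    target = [[1.0, 0.0], [0.0, 0.0]]
    print("IoU:", iou(pred, target))
    print("MAE:", mae(pred, target))
```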

📊 Experimental Results

Performance on PADv2

You can download the affordance maps from [ Google Drive | Baidu Pan (hwtf) ]


Performance on PAD

You can download the affordance maps from [ Google Drive | Baidu Pan (hrlj) ]


🍎 Potential Applications


Potential applications of the one-shot affordance system. (a) Application I: Content Image Retrieval. A content image retrieval model combined with affordance detection has promising applications in search engines and online shopping platforms. (b) Application II: Learning from Demonstration. The one-shot affordance detection model can help an agent naturally select the correct object based on the expert's actions. (c) Application III: Self-exploration of Agents. The one-shot affordance detection model helps an agent autonomously perceive all instances or areas of a scene with a similar affordance property in unknown human spaces, based on historical data (e.g., images of human interactions).

✉️ Statement

This project is for research purposes only. Please contact us for a licence for commercial use. For any other questions, please contact [email protected] or [email protected].

🔍 Citation

@inproceedings{Oneluo,
  title={One-Shot Affordance Detection},
  author={Hongchen Luo and Wei Zhai and Jing Zhang and Yang Cao and Dacheng Tao},
  booktitle={IJCAI},
  year={2021}
}
@article{luo2021one,
  title={One-Shot Affordance Detection in the Wild},
  author={Zhai, Wei and Luo, Hongchen and Zhang, Jing and Cao, Yang and Tao, Dacheng},
  journal={arXiv preprint arXiv:2106.14747xx},
  year={2021}
}