Refer-it-in-RGBD

This is the repository for our CVPR 2021 paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images'.

Paper - ArXiv - pdf (abs)
Project page: https://unclemedm.github.io/Refer-it-in-RGBD/

Introduction

We present a novel task of 3D visual grounding in single-view RGB-D images, where the referred objects are often only partially scanned. In contrast to previous works that directly generate object proposals for grounding in 3D scenes, we propose a bottom-up approach that gradually aggregates information, effectively addressing the challenge posed by partial scans. Our approach first fuses the language and visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGB-D image. It then performs an adaptive search based on the heatmap and carries out object-level matching with a second visio-linguistic fusion to ground the referred object. We evaluate the proposed method against state-of-the-art methods on both the RGB-D images extracted from the ScanRefer dataset and our newly collected SUN-Refer dataset. Experiments show that our method outperforms the previous methods by a large margin (by 11.1% and 11.2% Acc@0.5) on both datasets.
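The accuracy figures quoted above are reported at a 3D IoU threshold. As a minimal sketch of how such a metric can be computed, the snippet below assumes that a prediction counts as correct when its 3D IoU with the ground-truth box is at least 0.5, and it simplifies boxes to axis-aligned `(xmin, ymin, zmin, xmax, ymax, zmax)` tuples. The function names `iou_3d` and `accuracy_at_iou` are hypothetical and are not taken from this repository, whose evaluation code may handle oriented boxes differently.

```python
from typing import Sequence

# Assumption: boxes are axis-aligned, given as (xmin, ymin, zmin, xmax, ymax, zmax).
Box = Sequence[float]


def box_volume(box: Box) -> float:
    """Volume of an axis-aligned box; degenerate boxes have volume 0."""
    volume = 1.0
    for axis in range(3):
        volume *= max(0.0, box[axis + 3] - box[axis])
    return volume


def iou_3d(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        intersection *= max(0.0, hi - lo)
    union = box_volume(a) + box_volume(b) - intersection
    return intersection / union if union > 0 else 0.0


def accuracy_at_iou(preds: Sequence[Box], gts: Sequence[Box],
                    threshold: float = 0.5) -> float:
    """Fraction of predictions whose IoU with the ground truth is >= threshold."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou_3d(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


if __name__ == "__main__":
    gt = [(0, 0, 0, 1, 1, 1), (0, 0, 0, 1, 1, 1)]
    pred = [(0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1)]
    print(accuracy_at_iou(pred, gt))
```

In the usage example, the first prediction matches its ground truth exactly and the second overlaps it by only one third, so one of the two predictions passes the 0.5 threshold.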

Dataset

Download SUNREFER_v2 dataset
The SUNREFER dataset contains 38,495 referring expressions corresponding to 7,699 objects from the SUNRGBD dataset. Here is one example from the SUNREFER dataset:

Owner

Haolin Liu