HAIS_2GNN: 3D Visual Grounding with Graph and Attention

Overview

HAIS_2GNN: 3D Visual Grounding with Graph and Attention

This repository is for the HAIS_2GNN research project.

Tao Gu, Yue Chen

Introduction

The motivation of this project is to improve the accuracy of 3D visual grounding. In this report, we propose a new model, named HAIS_2GNN based on the InstanceRefer model, to tackle the problem of insufficient connections between instance proposals. Our model incorporates a powerful instance segmentation model HAIS and strengthens the instance features by the structure of graph and attention, so that the text and point cloud can be better matched together. Experiments confirm that our method outperforms the InstanceRefer on ScanRefer validation datasets. Link to the technical report

Setup

The code is tested on Ubuntu 20.04.3 LTS with Python 3.9.7 PyTorch 1.10.1 CUDA 11.3.1 installed.

conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch

Install the necessary packages listed out in requirements.txt:

pip install -r requirements.txt

After all packages are properly installed, please run the following commands to compile the torchsaprse v1.4.0:

sudo apt-get install libsparsehash-dev
pip install --upgrade git+https://github.com/mit-han-lab/[email protected]

Before moving on to the next step, please don't forget to set the project root path to the CONF.PATH.BASE in lib/config.py.

Data preparation

  1. Download the ScanRefer dataset and unzip it under data/.
  2. Downloadand the preprocessed GLoVE embeddings (~990MB) and put them under data/.
  3. Download the ScanNetV2 dataset and put (or link) scans/ under (or to) data/scannet/scans/ (Please follow the ScanNet Instructions for downloading the ScanNet dataset). After this step, there should be folders containing the ScanNet scene data under the data/scannet/scans/ with names like scene0000_00
  4. Used official and pre-trained HAIS generate panoptic segmentation in PointGroupInst/. We will provide the pre-trained data soon.
  5. Pre-processed instance labels, and new data should be generated in data/scannet/pointgroup_data/
cd data/scannet/
python prepare_data.py --split train --pointgroupinst_path [YOUR_PATH]
python prepare_data.py --split val   --pointgroupinst_path [YOUR_PATH]
python prepare_data.py --split test  --pointgroupinst_path [YOUR_PATH]

Finally, the dataset folder should be organized as follows.

InstanceRefer
├── data
│   ├── glove.p
│   ├── ScanRefer_filtered.json
│   ├── ...
│   ├── scannet
│   │  ├── meta_data
│   │  ├── pointgroup_data
│   │  │  ├── scene0000_00_aligned_bbox.npy
│   │  │  ├── scene0000_00_aligned_vert.npy
│   │  ├──├──  ... ...

Training

Train the InstanceRefer model. You can change hyper-parameters in config/InstanceRefer.yaml:

python scripts/train.py --log_dir HAIS_2GNN

Evaluation

You need specific the use_checkpoint with the folder that contains model.pth in config/InstanceRefer.yaml and run with:

python scripts/eval.py

Pre-trained Models

Input [email protected] Unique [email protected] Checkpoints
xyz+rgb 39.24 33.66 will be released soon

TODO

  • Add pre-trained HAIS dataset.
  • Release pre-trained model.
  • Merge HAIS in an end-to-end manner.
  • Upload to ScanRefer benchmark

Changelog

02/09/2022: Released HAIS_2GNN

Acknowledgement

This work is a research project conducted by Tao Gu and Yue Chen for ADL4CV:Visual Computing course at the Technical University of Munich.

We acknowledge that our work is based on ScanRefer, InstanceRefer, HAIS, torchsaprse, and pytorch_geometric.

License

This repository is released under MIT License (see LICENSE file for details).

Owner
Yue Chen
Yue Chen
Mapping a variable-length sentence to a fixed-length vector using BERT model

Are you looking for X-as-service? Try the Cloud-Native Neural Search Framework for Any Kind of Data bert-as-service Using BERT model as a sentence enc

Han Xiao 11.1k Jan 01, 2023
List of GSoC organisations with number of times they have been selected.

Welcome to GSoC Organisation Frequency And Details 👋 List of GSoC organisations with number of times they have been selected, techonologies, topics,

Shivam Kumar Jha 41 Oct 01, 2022
NLP-Project - Used an API to scrape 2000 reddit posts, then used NLP analysis and created a classification model to mixed succcess

Project 3: Web APIs & NLP Problem Statement How do r/Libertarian and r/Neoliberal differ on Biden post-inaguration? The goal of the project is to see

Adam Muhammad Klesc 2 Mar 29, 2022
lightweight, fast and robust columnar dataframe for data analytics with online update

streamdf Streamdf is a lightweight data frame library built on top of the dictionary of numpy array, developed for Kaggle's time-series code competiti

23 May 19, 2022
SpikeX - SpaCy Pipes for Knowledge Extraction

SpikeX is a collection of pipes ready to be plugged in a spaCy pipeline. It aims to help in building knowledge extraction tools with almost-zero effort.

Erre Quadro Srl 384 Dec 12, 2022
English loanwords in the world's languages

Wiktionary as CLDF Content cldf1 and cldf2 contain cldf-conform data sets with a total of 2 377 756 entries about the vocabulary of all 1403 languages

Viktor Martinović 3 Jan 14, 2022
CPC-big and k-means clustering for zero-resource speech processing

The CPC-big model and k-means checkpoints used in Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing.

Benjamin van Niekerk 5 Nov 23, 2022
Few-shot Natural Language Generation for Task-Oriented Dialog

Few-shot Natural Language Generation for Task-Oriented Dialog This repository contains the dataset, source code and trained model for the following pa

172 Dec 13, 2022
Sapiens is a human antibody language model based on BERT.

Sapiens: Human antibody language model ____ _ / ___| __ _ _ __ (_) ___ _ __ ___ \___ \ / _` | '_ \| |/ _ \ '

Merck Sharp & Dohme Corp. a subsidiary of Merck & Co., Inc. 13 Nov 20, 2022
Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

The KLEJ Benchmark Baselines The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for the Polish language und

Allegro Tech 17 Oct 18, 2022
Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models

PEGASUS library Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models, or PEGASUS, uses self-supervised

Google Research 1.4k Dec 22, 2022
HiFi DeepVariant + WhatsHap workflowHiFi DeepVariant + WhatsHap workflow

HiFi DeepVariant + WhatsHap workflow Workflow steps align HiFi reads to reference with pbmm2 call small variants with DeepVariant, using two-pass meth

William Rowell 2 May 14, 2022
Autoregressive Entity Retrieval

The GENRE (Generative ENtity REtrieval) system as presented in Autoregressive Entity Retrieval implemented in pytorch. @inproceedings{decao2020autoreg

Meta Research 611 Dec 16, 2022
PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop

molten A minimal, extensible, fast and productive API framework for Python 3. Changelog: https://moltenframework.com/changelog.html Community: https:/

3.2k Dec 28, 2022
This is a general repo that helps you develop fast/effective NLP classifiers using Huggingface

NLP Classifier Introduction This project trains a bert model on any NLP classifcation model. And uses the model in make predictions on new data using

Abdullah Tarek 3 Mar 11, 2022
Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre

THUNLP 2.3k Jan 08, 2023
Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Chenhe Dong 28 Nov 10, 2022
An extensive UI tool built using new data scraped from BBC News

BBC-News-Analyzer An extensive UI tool built using new data scraped from BBC New

Antoreep Jana 1 Dec 31, 2021
Machine Learning Course Project, IMDB movie review sentiment analysis by lstm, cnn, and transformer

IMDB Sentiment Analysis This is the final project of Machine Learning Courses in Huazhong University of Science and Technology, School of Artificial I

Daniel 0 Dec 27, 2021
构建一个多源(公众号、RSS)、干净、个性化的阅读环境

2C 构建一个多源(公众号、RSS)、干净、个性化的阅读环境 作为一名微信公众号的重度用户,公众号一直被我设为汲取知识的地方。随着使用程度的增加,相信大家或多或少会有一个比较头疼的问题——广告问题。 假设你关注的公众号有十来个,若一个公众号两周接一次广告,理论上你会面临二十多次广告,实际上会更多,运

howie.hu 678 Dec 28, 2022