Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

Related tags

Deep LearningE.T.
Overview

Episodic Transformers (E.T.)

Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich, Cordelia Schmid, Chen Sun

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions. This code reproduces the results obtained with E.T. on ALFRED benchmark. To learn more about the benchmark and the original code, please refer to ALFRED repository.

Quickstart

Clone repo:

$ git clone https://github.com/alexpashevich/E.T..git ET
$ export ET_ROOT=$(pwd)/ET
$ export ET_LOGS=$ET_ROOT/logs
$ export ET_DATA=$ET_ROOT/data
$ export PYTHONPATH=$PYTHONPATH:$ET_ROOT

Install requirements:

$ virtualenv -p $(which python3.7) et_env
$ source et_env/bin/activate

$ cd $ET_ROOT
$ pip install --upgrade pip
$ pip install -r requirements.txt

Downloading data and checkpoints

Download ALFRED dataset:

$ cd $ET_DATA
$ sh download_data.sh json_feat

Copy pretrained checkpoints:

$ wget http://pascal.inrialpes.fr/data2/apashevi/et_checkpoints.zip
$ unzip et_checkpoints.zip
$ mv pretrained $ET_LOGS/

Render PNG images and create an LMDB dataset with natural language annotations:

$ python -m alfred.gen.render_trajs
$ python -m alfred.data.create_lmdb with args.visual_checkpoint=$ET_LOGS/pretrained/fasterrcnn_model.pth args.data_output=lmdb_human args.vocab_path=$ET_ROOT/files/human.vocab

Note #1: For rendering, you may need to configure args.x_display to correspond to an X server number running on your machine.
Note #2: We do not use JPG images from the full dataset as they would differ from the images rendered during evaluation due to the JPG compression.

Pretrained models evaluation

Evaluate an E.T. agent trained on human data only:

$ python -m alfred.eval.eval_agent with eval.exp=pretrained eval.checkpoint=et_human_pretrained.pth eval.object_predictor=$ET_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5 eval.eval_range=None exp.data.valid=lmdb_human

Note: make sure that your LMDB database is called exactly lmdb_human as the word embedding won't be loaded otherwise.

Evaluate an E.T. agent trained on human and synthetic data:

$ python -m alfred.eval.eval_agent with eval.exp=pretrained eval.checkpoint=et_human_synth_pretrained.pth eval.object_predictor=$ET_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5 eval.eval_range=None exp.data.valid=lmdb_human

Note: For evaluation, you may need to configure eval.x_display to correspond to an X server number running on your machine.

E.T. with human data only

Train an E.T. agent:

$ python -m alfred.model.train with exp.model=transformer exp.name=et_s1 exp.data.train=lmdb_human train.seed=1

Evaluate the trained E.T. agent:

$ python -m alfred.eval.eval_agent with eval.exp=et_s1 eval.object_predictor=$ET_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5

Note: you may need to train up to 5 agents using different random seeds to reproduce the results of the paper.

E.T. with language pretraining

Language encoder pretraining with the translation objective:

$ python -m alfred.model.train with exp.model=speaker exp.name=translator exp.data.train=lmdb_human

Train an E.T. agent with the language pretraining:

$ python -m alfred.model.train with exp.model=transformer exp.name=et_synth_s1 exp.data.train=lmdb_human train.seed=1 exp.pretrained_path=translator

Evaluate the trained E.T. agent:

$ python -m alfred.eval.eval_agent with eval.exp=et_synth_s1 eval.object_predictor=$ET_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5

Note: you may need to train up to 5 agents using different random seeds to reproduce the results of the paper.

E.T. with joint training

You can also generate more synthetic trajectories using generate_trajs.py, create an LMDB and jointly train a model on it. Please refer to the original ALFRED code to know more the data generation. The steps to reproduce the results are the following:

  1. Generate 45K trajectories with alfred.gen.generate_trajs.
  2. Create a synthetic LMDB dataset called lmdb_synth_45K using args.visual_checkpoint=$ET_LOGS/pretrained/fasterrcnn_model.pth and args.vocab_path=$ET_ROOT/files/synth.vocab.
  3. Train an E.T. agent using exp.data.train=lmdb_human,lmdb_synth_45K.

Citation

If you find this repository useful, please cite our work:

@misc{pashevich2021episodic,
  title ={{Episodic Transformer for Vision-and-Language Navigation}},
  author={Alexander Pashevich and Cordelia Schmid and Chen Sun},
  year={2021},
  eprint={2105.06453},
  archivePrefix={arXiv},
}
Owner
Alex Pashevich
PhD student at Thoth (Inria Alpes, France)
Alex Pashevich
Trying to understand alias-free-gan.

alias-free-gan-explanation Trying to understand alias-free-gan in my own way. [Chinese Version 中文版本] CC-BY-4.0 License. Tzu-Heng Lin motivation of thi

Tzu-Heng Lin 12 Mar 17, 2022
MegEngine implementation of YOLOX

Introduction YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and ind

旷视天元 MegEngine 77 Nov 22, 2022
An implementation of Deep Forest 2021.2.1.

Deep Forest (DF) 21 DF21 is an implementation of Deep Forest 2021.2.1. It is designed to have the following advantages: Powerful: Better accuracy than

LAMDA Group, Nanjing University 795 Jan 03, 2023
Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly

Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly Code for this paper Ultra-Data-Efficient GAN Tra

VITA 77 Oct 05, 2022
Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Keyword Spotting Transformer This is the unofficial TensorFlow implementation of the Keyword Spotting Transformer model. This model is used to train o

Intelligent Machines Limited 8 May 11, 2022
Official code for "Mean Shift for Self-Supervised Learning"

MSF Official code for "Mean Shift for Self-Supervised Learning" Requirements Python = 3.7.6 PyTorch = 1.4 torchvision = 0.5.0 faiss-gpu = 1.6.1 In

UMBC Vision 44 Nov 21, 2022
Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data

Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data This is the official PyTorch implementation of the SeCo paper: @articl

ElementAI 101 Dec 12, 2022
Demonstrates how to divide a DL model into multiple IR model files (division) and introduce a simplest way to implement a custom layer works with OpenVINO IR models.

Demonstration of OpenVINO techniques - Model-division and a simplest-way to support custom layers Description: Model Optimizer in Intel(r) OpenVINO(tm

Yasunori Shimura 12 Nov 09, 2022
This is the repository of our article published on MDPI Entropy "Feature Selection for Recommender Systems with Quantum Computing".

Collaborative-driven Quantum Feature Selection This repository was developed by Riccardo Nembrini, PhD student at Politecnico di Milano. See the websi

Quantum Computing Lab @ Politecnico di Milano 10 Apr 21, 2022
Source code for Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning Official implementation of ACC, described in the paper "Adaptively Calibrated C

3 Sep 16, 2022
PyTorch implementation of MuseMorphose, a Transformer-based model for music style transfer.

MuseMorphose This repository contains the official implementation of the following paper: Shih-Lun Wu, Yi-Hsuan Yang MuseMorphose: Full-Song and Fine-

Yating Music, Taiwan AI Labs 142 Jan 08, 2023
Framework for joint representation learning, evaluation through multimodal registration and comparison with image translation based approaches

CoMIR: Contrastive Multimodal Image Representation for Registration Framework 🖼 Registration of images in different modalities with Deep Learning 🤖

Methods for Image Data Analysis - MIDA 55 Dec 09, 2022
Official implementation of the paper "Steganographer Detection via a Similarity Accumulation Graph Convolutional Network"

SAGCN - Official PyTorch Implementation | Paper | Project Page This is the official implementation of the paper "Steganographer detection via a simila

ZHANG Zhi 1 Nov 26, 2021
Code for IntraQ, PyTorch implementation of our paper under review

IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization paper Requirements Python = 3.7.10 Pytorch == 1.7

1 Nov 19, 2021
The Official Repository for "Generalized OOD Detection: A Survey"

Generalized Out-of-Distribution Detection: A Survey 1. Overview This repository is with our survey paper: Title: Generalized Out-of-Distribution Detec

Jingkang Yang 338 Jan 03, 2023
Data, notebooks, and articles associated with the RSNA AI Deep Learning Lab at RSNA 2021

RSNA AI Deep Learning Lab 2021 Intro Welcome Deep Learners! This document provides all the information you need to participate in the RSNA AI Deep Lea

RSNA 65 Dec 16, 2022
Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)

Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)

7 Jun 22, 2022
[SIGMETRICS 2022] One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search paper | website One Proxy Device Is Enough for Hardware-Aware Neural Architec

10 Dec 16, 2022
Image Classification - A research on image classification and auto insurance claim prediction, a systematic experiments on modeling techniques and approaches

A research on image classification and auto insurance claim prediction, a systematic experiments on modeling techniques and approaches

0 Jan 23, 2022