Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Overview

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

This is the PyTorch companion code for the paper:

Amaia Salvador, Erhan Gundogdu, Loris Bazzani, and Michael Donoser. Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning. CVPR 2021

If you find this code useful in your research, please consider citing using the following BibTeX entry:

@inproceedings{salvador2021revamping,
    title={Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning},
    author={Salvador, Amaia and Gundogdu, Erhan and Bazzani, Loris and Donoser, Michael},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2021}
}

Cloning

This repository uses git-lfs to store model checkpoint files. Make sure to install it before cloning by following the instructions here:

Once installed, model checkpoint files will be automatically downloaded when cloning the repository with:

git clone [email protected]:amzn/image-to-recipe-transformers.git

These files can optionally be ignored by using git lfs install --skip-smudge before cloning the repository, and can be downloaded at any time using git lfs pull.

Installation

  • Create conda environment: conda env create -f environment.yml
  • Activate it with conda activate im2recipetransformers

Data preparation

  • Download & uncompress Recipe1M dataset. The contents of the directory DATASET_PATH should be the following:
layer1.json
layer2.json
train/
val/
test/

The directories train/, val/, and test/ must contain the image files for each split after uncompressing.

  • Make splits and create vocabulary by running:
python preprocessing.py --root DATASET_PATH

This process will create auxiliary files under DATASET_PATH/traindata, which will be used for training.

Training

  • Launch training with:
python train.py --model_name model --root DATASET_PATH --save_dir /path/to/saved/model/checkpoints

Tensorboard logging can be enabled with --tensorboard. Then, from the checkpoints directory run:

tensorboard --logdir "./" --port PORT

Run python train.py --help for the full list of available arguments.

Evaluation

  • Extract features from the trained model for the test set samples of Recipe1M:
python test.py --model_name model --eval_split test --root DATASET_PATH --save_dir /path/to/saved/model/checkpoints
  • Compute MedR and recall metrics for the extracted feature set:
python eval.py --embeddings_file /path/to/saved/model/checkpoints/model/feats_test.pkl --medr_N 10000

Pretrained models

  • We provide pretrained model weights under the checkpoints directory. Make sure you run git lfs pull to download the model files.
  • Extract the zip files. For each model, a folder named MODEL_NAME with two files, args.pkl, and model-best.ckpt is provided.
  • Extract features for the test set samples of Recipe1M using one of the pretrained models by running:
python test.py --model_name MODEL_NAME --eval_split test --root DATASET_PATH --save_dir ../checkpoints
  • A file with extracted features will be saved under ../checkpoints/MODEL_NAME.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Owner
Amazon
Amazon
The guide to tackle with the Text Summarization

The guide to tackle with the Text Summarization

Takahiro Kubo 1.2k Dec 30, 2022
A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Chimera: Learning Shared Semantic Space for Speech-to-Text Translation This is a Pytorch implementation for the "Chimera" paper Learning Shared Semant

Chi Han 43 Dec 28, 2022
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 464 Jan 04, 2023
Azure Text-to-speech service for Home Assistant

Azure Text-to-speech service for Home Assistant The Azure text-to-speech platform uses online Azure Text-to-Speech cognitive service to read a text wi

Yassine Selmi 2 Aug 06, 2022
This repo is to provide a list of literature regarding Deep Learning on Graphs for NLP

This repo is to provide a list of literature regarding Deep Learning on Graphs for NLP

Graph4AI 230 Nov 22, 2022
TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Alexa 98 Dec 09, 2022
Kinky furry assitant based on GPT2

KinkyFurs-V0 Kinky furry assistant based on GPT2 How to run python3 V0.py then, open web browser and go to localhost:8080 Requirements: Flask trans

Sparki 1 Jun 11, 2022
Test finetuning of XLSR (multilingual wav2vec 2.0) for other speech classification tasks

wav2vec_finetune Test finetuning of XLSR (multilingual wav2vec 2.0) for other speech classification tasks Initial test: gender recognition on this dat

8 Aug 11, 2022
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Texar-PyTorch is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar

ASYML 726 Dec 30, 2022
Example code for "Real-World Natural Language Processing"

Real-World Natural Language Processing This repository contains example code for the book "Real-World Natural Language Processing." AllenNLP (2.5.0 or

Masato Hagiwara 303 Dec 17, 2022
Utilizing RBERT model for KLUE Relation Extraction task

RBERT for Relation Extraction task for KLUE Project Description Relation Extraction task is one of the task of Korean Language Understanding Evaluatio

snoop2head 14 Nov 15, 2022
NLP Text Classification

多标签文本分类任务 近年来随着深度学习的发展,模型参数的数量飞速增长。为了训练这些参数,需要更大的数据集来避免过拟合。然而,对于大部分NLP任务来说,构建大规模的标注数据集非常困难(成本过高),特别是对于句法和语义相关的任务。相比之下,大规模的未标注语料库的构建则相对容易。为了利用这些数据,我们可以

Jason 1 Nov 11, 2021
Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"

GAN stability This repository contains the experiments in the supplementary material for the paper Which Training Methods for GANs do actually Converg

Lars Mescheder 884 Nov 11, 2022
Telegram AI chat bot written in Python using Pyrogram

Aurora_Al Just another Telegram AI chat bot written in Python using Pyrogram. A public running instance can be found on telegram as @AuroraAl. Require

♗CσNϙUҽRσR_MҽSƙEƚҽҽR 1 Oct 31, 2021
test

Lidar-data-decode In this project, you can decode your lidar data frame(pcap file) and make your own datasets(test dataset) in Windows without any hug

46 Dec 05, 2022
Turkish Stop Words Türkçe Dolgu Sözcükleri

trstop Turkish Stop Words Türkçe Dolgu Sözcükleri In this repository I put Turkish stop words that is contained in the first 10 thousand words with th

Ahmet Aksoy 103 Nov 12, 2022
189 Jan 02, 2023
Learning to Rewrite for Non-Autoregressive Neural Machine Translation

RewriteNAT This repo provides the code for reproducing our proposed RewriteNAT in EMNLP 2021 paper entitled "Learning to Rewrite for Non-Autoregressiv

Xinwei Geng 20 Dec 25, 2022
Subtitle Workshop (subshop): tools to download and synchronize subtitles

SUBSHOP Tools to download, remove ads, and synchronize subtitles. SUBSHOP Purpose Limitations Required Web Credentials Installation, Configuration, an

Joe D 4 Feb 13, 2022
Unsupervised Abstract Reasoning for Raven’s Problem Matrices

Unsupervised Abstract Reasoning for Raven’s Problem Matrices This code is the implementation of our TIP paper. This is the first unsupervised abstract

Tao Zhuo 9 Dec 17, 2022