Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Last update: Dec 27, 2022

Overview

Optimizing Dense Retrieval Model Training with Hard Negatives

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma

This repo provides code, retrieval results, and trained models for our SIGIR Full paper Optimizing Dense Retrieval Model Training with Hard Negatives. The previous version is Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently.

We achieve very impressive retrieval results on both passage and document retrieval bechmarks. The proposed two algorithms (STAR and ADORE) are very efficient. IMHO, they are well worth trying and most likely improve your retriever's performance by a large margin.

The following figure shows the pros and cons of different training methods. You can train an effective Dense Retrieval model in three steps. Firstly, warmup your model using random negatives or BM25 top negatives. Secondly, use our proposed STAR to train the query encoder and document encoder. Thirdly, use our proposed ADORE to train the query encoder.

Retrieval Results and Trained Models

Passage Retrieval	Dev [email protected]	Dev [email protected]	Test [email protected]	Files
Inbatch-Neg	0.264	0.837	0.583	Model
Rand-Neg	0.301	0.853	0.612	Model
STAR	0.340	0.867	0.642	Model Train Dev TRECTest
ADORE (Inbatch-Neg)	0.316	0.860	0.658	Model
ADORE (Rand-Neg)	0.326	0.865	0.661	Model
ADORE (STAR)	0.347	0.876	0.683	Model Train Dev TRECTest Leaderboard

Doc Retrieval	Dev [email protected]	Dev [email protected]	Test [email protected]	Files
Inbatch-Neg	0.320	0.864	0.544	Model
Rand-Neg	0.330	0.859	0.572	Model
STAR	0.390	0.867	0.605	Model Train Dev TRECTest
ADORE (Inbatch-Neg)	0.362	0.884	0.580	Model
ADORE (Rand-Neg)	0.361	0.885	0.585	Model
ADORE (STAR)	0.405	0.919	0.628	Model Train Dev TRECTest Leaderboard

If you want to use our first-stage leaderboard runs, contact me and I will send you the file.

If any links fail or the files go wrong, please contact me or open a issue.

Requirements

To install requirements, run the following commands:

git clone [email protected]:jingtaozhan/DRhard.git
cd DRhard
python setup.py install

However, you need to set up a new python enverionment for data preprocessing (see below).

Data Download

To download all the needed data, run:

bash download_data.sh

Data Preprocess

You need to set up a new environment with transformers==2.8.0 to tokenize the text. This is because we find the tokenizer behaves differently among versions 2, 3 and 4. To replicate the results in our paper with our provided trained models, it is necessary to use version 2.8.0 for preprocessing. Otherwise, you may need to re-train the DR models.

Run the following codes.

python preprocess.py --data_type 0; python preprocess.py --data_type 1

Inference

With our provided trained models, you can easily replicate our reported experimental results. Note that minor variance may be observed due to environmental difference.

STAR

The following codes use the provided STAR model to compute query/passage embeddings and perform similarity search on the dev set. (You can use --faiss_gpus option to use gpus for much faster similarity search.)

python ./star/inference.py --data_type passage --max_doc_length 256 --mode dev   
python ./star/inference.py --data_type doc --max_doc_length 512 --mode dev

Run the following code to evaluate on MSMARCO Passage dataset.

python ./msmarco_eval.py ./data/passage/preprocess/dev-qrel.tsv ./data/passage/evaluate/star/dev.rank.tsv

Eval Started
#####################
MRR @10: 0.3404237731386721
QueriesRanked: 6980
#####################

Run the following code to evaluate on MSMARCO Document dataset.

python ./msmarco_eval.py ./data/doc/preprocess/dev-qrel.tsv ./data/doc/evaluate/star/dev.rank.tsv 100

Eval Started
#####################
MRR @100: 0.3903422772218344
QueriesRanked: 5193
#####################

ADORE

ADORE computes the query embeddings. The document embeddings are pre-computed by other DR models, like STAR. The following codes use the provided ADORE(STAR) model to compute query embeddings and perform similarity search on the dev set. (You can use --faiss_gpus option to use gpus for much faster similarity search.)

python ./adore/inference.py --model_dir ./data/passage/trained_models/adore-star --output_dir ./data/passage/evaluate/adore-star --preprocess_dir ./data/passage/preprocess --mode dev --dmemmap_path ./data/passage/evaluate/star/passages.memmap
python ./adore/inference.py --model_dir ./data/doc/trained_models/adore-star --output_dir ./data/doc/evaluate/adore-star --preprocess_dir ./data/doc/preprocess --mode dev --dmemmap_path ./data/doc/evaluate/star/passages.memmap

Evaluate ADORE(STAR) model on dev passage dataset:

python ./msmarco_eval.py ./data/passage/preprocess/dev-qrel.tsv ./data/passage/evaluate/adore-star/dev.rank.tsv

You will get

Eval Started
#####################
MRR @10: 0.34660697230181425
QueriesRanked: 6980
#####################

Evaluate ADORE(STAR) model on dev document dataset:

python ./msmarco_eval.py ./data/doc/preprocess/dev-qrel.tsv ./data/doc/evaluate/adore-star/dev.rank.tsv 100

You will get

Eval Started
#####################
MRR @100: 0.4049777020859768
QueriesRanked: 5193
#####################

Convert QID/PID Back

Our data preprocessing reassigns new ids for each query and document. Therefore, you may want to convert the ids back. We provide a script for this.

The following code shows an example to convert ADORE-STAR's ranking results on the dev passage dataset.

python ./cvt_back.py --input_dir ./data/passage/evaluate/adore-star/ --preprocess_dir ./data/passage/preprocess --output_dir ./data/passage/official_runs/adore-star --mode dev --dataset passage
python ./msmarco_eval.py ./data/passage/dataset/qrels.dev.small.tsv ./data/passage/official_runs/adore-star/dev.rank.tsv

You will get

Eval Started
#####################
MRR @10: 0.34660697230181425
QueriesRanked: 6980
#####################

Train

Instructions will be ready this weekend (7.18).

Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Related tags

Overview

Optimizing Dense Retrieval Model Training with Hard Negatives

Retrieval Results and Trained Models

Requirements

Data Download

Data Preprocess

Inference

STAR

ADORE

Convert QID/PID Back

Train

Owner

Jingtao Zhan

PyTorch implementation of ENet

Use your Philips Hue lights as Racing Flags. Works with Assetto Corsa, Assetto Corsa Competizione and iRacing.

Project ArXiv Citation Network

Automatically measure the facial Width-To-Height ratio and get facial analysis results provided by Microsoft Azure

ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

Securetar - A streaming wrapper around python tarfile and allow secure handling files and support encryption

Graph Posterior Network: Bayesian Predictive Uncertainty for Node Classification (NeurIPS 2021)

Clockwork Variational Autoencoder

PyTorch implementation for COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction (CVPR 2021)

naked is a Python tool which allows you to strip a model and only keep what matters for making predictions.

Official PyTorch implemention of our paper "Learning to Rectify for Robust Learning with Noisy Labels".

Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)

FirmWire is a full-system baseband firmware emulation platform for fuzzing, debugging, and root-cause analysis of smartphone baseband firmwares

Tutorial materials for Part of NSU Intro to Deep Learning with PyTorch.

Official Keras Implementation for UNet++ in IEEE Transactions on Medical Imaging and DLMIA 2018

Implementation of the method proposed in the paper "Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation"

Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle

MT3: Multi-Task Multitrack Music Transcription

A brand new hub for Scene Graph Generation methods based on MMdetection (2021). The pipeline of from detection, scene graph generation to downstream tasks (e.g., image cpationing) is supported. Pytorch version implementation of HetH (ECCV 2020) and TopicSG (ICCV 2021) is included.

PyTorch implementation of Rethinking Positional Encoding in Language Pre-training