This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation)

Usage example

python dynamic_inverted_softmax.py --sims_train_test_path msrvtt/tt-ce-train-captions-test-videos-seed0.pkl --sims_test_path msrvtt/tt-ce-test-captions-test-videos-seed0.pkl --test_query_masks_path msrvtt/tt-ce-test-query_masks.pkl
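The script name refers to the dynamic inverted softmax (DIS) described in the paper. The block below is a minimal NumPy sketch of that idea, written from the paper's description and not taken from dynamic_inverted_softmax.py. Training captions act as the querybank. Gallery items that appear in the top-k results of at least one querybank query form the "activation set". A test query's scores are renormalised with an inverted softmax only when its raw top-1 item lies in that set. The function name and the beta/k values are illustrative assumptions, not the script's actual defaults.

```python
import numpy as np


def dynamic_inverted_softmax(sims_bank, sims_test, beta=20.0, k=1):
    """Sketch of querybank normalisation with a dynamic inverted softmax.

    sims_bank: (num_bank_queries, num_gallery) similarities between the
        querybank (e.g. training captions) and the test gallery (e.g. test videos).
    sims_test: (num_test_queries, num_gallery) raw test similarities.
    beta, k: hypothetical values chosen for illustration only.
    """
    sims_bank = np.asarray(sims_bank, dtype=np.float64)
    sims_test = np.asarray(sims_test, dtype=np.float64)

    # Activation set: gallery items in the top-k of at least one querybank query.
    topk = np.argsort(-sims_bank, axis=1)[:, :k]
    activation_set = np.unique(topk)

    # Inverted softmax: divide by each gallery item's summed response over the
    # querybank. Subtracting the per-column max cancels in the ratio and
    # keeps exp() in a safe range.
    col_max = sims_bank.max(axis=0)
    bank_sum = np.exp(beta * (sims_bank - col_max)).sum(axis=0)
    inverted = np.exp(beta * (sims_test - col_max)) / bank_sum

    # Dynamic part: only renormalise queries whose raw top-1 item is in the
    # activation set; all other rows keep their original scores.
    top1 = sims_test.argmax(axis=1)
    mask = np.isin(top1, activation_set)
    out = sims_test.copy()
    out[mask] = inverted[mask]
    return out


# Toy data (made-up numbers): gallery item 0 is a "hub" that every
# querybank query ranks first.
toy_bank = np.array([[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]])
toy_test = np.array([[0.7, 0.65, 0.1], [0.1, 0.2, 0.9]])
normalised = dynamic_inverted_softmax(toy_bank, toy_test)
print(normalised.argmax(axis=1))  # → [1 2]
```

In the toy data, query 0's raw top-1 is the hub item, so its row is renormalised and item 1 moves to the top; query 1's top-1 is outside the activation set and its row is left unchanged. Since each row is ranked independently, mixing normalised and raw rows does not compare incompatible scores within a single ranking.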

To test QB-Norm on your own data, you need to:

  1. Extract the similarity matrix between the captions from the training split and the videos from the testing split (path/to/sims/train/test).
  2. Extract the testing-split similarity matrix, i.e. the similarities between the testing captions and the testing videos (path/to/sims/test).
  3. Run QB-Norm:
python dynamic_inverted_softmax.py --sims_train_test_path path/to/sims/train/test --sims_test_path path/to/sims/test
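Steps 1 and 2 can be sketched as follows, assuming you already have caption and video embeddings from your own model and use cosine similarity. The storage format is an assumption: the sketch pickles a plain 2-D NumPy array (queries × gallery), which has not been checked against the loading code in dynamic_inverted_softmax.py, so verify that before relying on it. The random embeddings and file names are hypothetical placeholders.

```python
import os
import pickle
import tempfile

import numpy as np


def cosine_sims(query_emb, gallery_emb):
    """Cosine similarity between every query (rows) and every gallery item (columns)."""
    query_emb = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    gallery_emb = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    return query_emb @ gallery_emb.T


def save_sims(path, sims):
    # Assumption: the script accepts a pickled 2-D array (queries x gallery).
    with open(path, "wb") as f:
        pickle.dump(sims, f)


# Hypothetical stand-ins for real embeddings from your own model.
rng = np.random.default_rng(0)
train_captions = rng.normal(size=(100, 64))
test_captions = rng.normal(size=(20, 64))
test_videos = rng.normal(size=(20, 64))

out_dir = tempfile.mkdtemp()
train_test_path = os.path.join(out_dir, "sims_train_test.pkl")
test_path = os.path.join(out_dir, "sims_test.pkl")

# Step 1: training captions vs. testing videos (the querybank).
save_sims(train_test_path, cosine_sims(train_captions, test_videos))
# Step 2: testing captions vs. testing videos.
save_sims(test_path, cosine_sims(test_captions, test_videos))
```

Both matrices must share the same gallery, with the testing videos in the same column order, because the querybank statistics are computed per gallery column.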

Data

The similarity matrices for each method were extracted using the official repositories of the following methods: CE+, TT-CE+, CLIP2Video, CLIP4Clip, CLIP, MMT, Audio-Retrieval. For CLIP4Clip, which does not provide pre-trained weights, we used the official repo to train new models from scratch.

You can download the extracted similarity matrices for training and testing here: MSRVTT, MSVD, DiDeMo, LSMDC.

Text-Video retrieval results

QB-Norm Results on MSRVTT Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| CE+ | Full | t2v | 14.4(0.1) | 37.4(0.1) | 50.2(0.1) | 10.0(0.0) | 30.0(0.1) |
| CE+ (+QB-Norm) | Full | t2v | 16.4(0.0) | 40.3(0.1) | 52.9(0.1) | 9.0(0.0) | 32.7(0.1) |
| TT-CE+ | Full | t2v | 14.9(0.1) | 38.3(0.1) | 51.5(0.1) | 10.0(0.0) | 30.9(0.1) |
| TT-CE+ (+QB-Norm) | Full | t2v | 17.3(0.0) | 42.1(0.2) | 54.9(0.1) | 8.0(0.0) | 34.2(0.1) |
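The columns report recall at ranks 1, 5 and 10 (R@1, R@5, R@10, in %), the median rank of the correct item (MdR, lower is better) and Geom, which is consistent with the geometric mean of R@1, R@5 and R@10: for the CE+ row, (14.4 * 37.4 * 50.2)^(1/3) ≈ 30.0. The sketch below computes these metrics from a similarity matrix under a simplifying assumption: exactly one correct gallery item per query, located on the diagonal. Benchmarks with several captions per video need a different ground-truth mapping. The function names are hypothetical.

```python
import numpy as np


def geometric_mean(r1, r5, r10):
    """Geometric mean of three recall values (assumed to match the Geom column)."""
    return float((r1 * r5 * r10) ** (1.0 / 3.0))


def retrieval_metrics(sims):
    """R@1/R@5/R@10 (in %), median rank (MdR) and Geom.

    Assumption: `sims` is square and the correct gallery item for query i
    is gallery item i.
    """
    sims = np.asarray(sims, dtype=np.float64)
    order = np.argsort(-sims, axis=1)  # gallery indices, best match first
    # Column where the correct item appears in each row, as a 1-based rank.
    ranks = np.argmax(order == np.arange(len(sims))[:, None], axis=1) + 1
    r1, r5, r10 = (100.0 * float(np.mean(ranks <= k)) for k in (1, 5, 10))
    return {
        "R@1": r1,
        "R@5": r5,
        "R@10": r10,
        "MdR": float(np.median(ranks)),
        "Geom": geometric_mean(r1, r5, r10),
    }


# Arithmetic check against the CE+ row above.
print(round(geometric_mean(14.4, 37.4, 50.2), 1))  # → 30.0
```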

QB-Norm Results on MSVD Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 25.4(0.3) | 56.9(0.4) | 71.3(0.2) | 4.0(0.0) | 46.9(0.3) |
| TT-CE+ (+QB-Norm) | Full | t2v | 26.6(1.0) | 58.6(1.3) | 71.8(1.1) | 4.0(0.0) | 48.2(1.2) |
| CLIP2Video | Full | t2v | 47.0 | 76.8 | 85.9 | 2.0 | 67.7 |
| CLIP2Video (+QB-Norm) | Full | t2v | 48.0 | 77.9 | 86.2 | 2.0 | 68.5 |

QB-Norm Results on DiDeMo Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 21.6(0.7) | 48.6(0.4) | 62.9(0.6) | 6.0(0.0) | 40.4(0.4) |
| TT-CE+ (+QB-Norm) | Full | t2v | 24.2(0.7) | 50.8(0.7) | 64.4(0.1) | 5.3(0.5) | 43.0(0.2) |
| CLIP4Clip | Full | t2v | 43.0 | 70.5 | 80.0 | 2.0 | 62.4 |
| CLIP4Clip (+QB-Norm) | Full | t2v | 43.5 | 71.4 | 80.9 | 2.0 | 63.1 |

QB-Norm Results on LSMDC Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 17.2(0.4) | 36.5(0.6) | 46.3(0.3) | 13.7(0.5) | 30.7(0.3) |
| TT-CE+ (+QB-Norm) | Full | t2v | 17.8(0.4) | 37.7(0.5) | 47.6(0.6) | 12.7(0.5) | 31.7(0.3) |
| CLIP4Clip | Full | t2v | 21.3 | 40.0 | 49.5 | 11.0 | 34.8 |
| CLIP4Clip (+QB-Norm) | Full | t2v | 22.4 | 40.1 | 49.5 | 11.0 | 35.4 |

QB-Norm Results on VaTeX Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 53.2(0.2) | 87.4(0.1) | 93.3(0.0) | 1.0(0.0) | 75.7(0.1) |
| TT-CE+ (+QB-Norm) | Full | t2v | 54.8(0.1) | 88.2(0.1) | 93.8(0.1) | 1.0(0.0) | 76.8(0.0) |
| CLIP2Video | Full | t2v | 57.4 | 87.9 | 93.6 | 1.0 | 77.9 |
| CLIP2Video (+QB-Norm) | Full | t2v | 58.8 | 88.3 | 93.8 | 1.0 | 78.7 |

QB-Norm Results on QuerYD Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| CE+ | Full | t2v | 13.2(2.0) | 37.1(2.9) | 50.5(1.9) | 10.3(1.2) | 29.1(2.2) |
| CE+ (+QB-Norm) | Full | t2v | 14.1(1.8) | 38.6(1.3) | 51.1(1.6) | 10.0(0.8) | 30.2(1.7) |
| TT-CE+ | Full | t2v | 14.4(0.5) | 37.7(1.7) | 50.9(1.6) | 9.8(1.0) | 30.3(0.9) |
| TT-CE+ (+QB-Norm) | Full | t2v | 15.1(1.6) | 38.3(2.4) | 51.2(2.8) | 10.3(1.7) | 30.9(2.3) |

Text-Image retrieval results

QB-Norm Results on MSCoCo Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| CLIP | 5k | t2i | 30.3 | 56.1 | 67.1 | 4.0 | 48.5 |
| CLIP (+QB-Norm) | 5k | t2i | 34.8 | 59.9 | 70.4 | 3.0 | 52.8 |
| MMT-Oscar | 5k | t2i | 52.2 | 80.2 | 88.0 | 1.0 | 71.7 |
| MMT-Oscar (+QB-Norm) | 5k | t2i | 53.9 | 80.5 | 88.1 | 1.0 | 72.6 |

Text-Audio retrieval results

QB-Norm Results on AudioCaps Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| AR-CE | Full | t2a | 23.1(0.6) | 55.1(0.7) | 70.7(0.6) | 4.7(0.5) | 44.8(0.7) |
| AR-CE (+QB-Norm) | Full | t2a | 23.9(0.2) | 57.1(0.3) | 71.6(0.4) | 4.0(0.0) | 46.0(0.3) |

References

If you find this code useful or use the extracted similarity matrices, please consider citing:

@misc{bogolin2021cross,
      title={Cross Modal Retrieval with Querybank Normalisation}, 
      author={Simion-Vlad Bogolin and Ioana Croitoru and Hailin Jin and Yang Liu and Samuel Albanie},
      year={2021},
      eprint={2112.12777},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Coarse implementation of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling". On the DNS-2020 dataset, the DNSMOS of the first stage is 3.42 and that of the second stage is 3.47.

SDDNet Coarse implement of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling", On DNS-2020 dataset, the DNSMOS

Cyril Lv 43 Nov 21, 2022
Code for our paper Aspect Sentiment Quad Prediction as Paraphrase Generation in EMNLP 2021.

Aspect Sentiment Quad Prediction (ASQP) This repo contains the annotated data and code for our paper Aspect Sentiment Quad Prediction as Paraphrase Ge

Isaac 39 Dec 11, 2022
Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels

The official code for the NeurIPS 2021 paper Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels

13 Dec 22, 2022
A visualization tool to show a TensorFlow graph, like TensorBoard

tfgraphviz tfgraphviz is a module to visualize a TensorFlow's data flow graph like TensorBoard using Graphviz. tfgraphviz enables to provide a visuali

44 Nov 09, 2022
docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Mindee 1.5k Jan 01, 2023
Latte: Cross-framework Python Package for Evaluation of Latent-based Generative Models

Cross-framework Python Package for Evaluation of Latent-based Generative Models Latte Latte (for LATent Tensor Evaluation) is a cross-framework Python

Karn Watcharasupat 30 Sep 08, 2022
Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis

Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis [Paper] [Online Demo] The following results are obtained by our SCUNet with purely syn

Kai Zhang 312 Jan 07, 2023
Gated-Shape CNN for Semantic Segmentation (ICCV 2019)

GSCNN This is the official code for: Gated-SCNN: Gated Shape CNNs for Semantic Segmentation Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler

859 Dec 26, 2022
PlenOctree Extraction algorithm

PlenOctrees_NeRF-SH This is an implementation of the Paper PlenOctrees for Real-time Rendering of Neural Radiance Fields. Not only the code provides t

49 Nov 05, 2022
ESTDepth: Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks (CVPR 2021)

ESTDepth: Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks (CVPR 2021) Project Page | Video | Paper | Data We present a novel metho

65 Nov 28, 2022
Official PyTorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Relational Self-Attention: What's Missing in Attention for Video Understanding This repository is the official implementation of "Relational Self-Atte

mandos 43 Dec 07, 2022
Official implementation of the NeurIPS 2021 paper Searching Parameterized AP Loss for Object Detection.

Parameterized AP Loss By Chenxin Tao, Zizhang Li, Xizhou Zhu, Gao Huang, Yong Liu, Jifeng Dai This is the official implementation of the Neurips 2021

46 Jul 06, 2022
PyTorch code for DriveGAN: Towards a Controllable High-Quality Neural Simulation

76 Dec 24, 2022
A PyTorch loader for the MVTecAD dataset.

MVTecAD A Pytorch loader for MVTecAD dataset. It strictly follows the code style of common Pytorch datasets, such as torchvision.datasets.CIFAR10. The

Jiyuan 1 Dec 27, 2021
Rethinking of Pedestrian Attribute Recognition: A Reliable Evaluation under Zero-Shot Pedestrian Identity Setting

Pytorch Pedestrian Attribute Recognition: A strong PyTorch baseline of pedestrian attribute recognition and multi-label classification.

Jian 79 Dec 18, 2022
This is the official code for our paper "Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay" (PRICAI 2021)

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay This is the official implementation of our paper "Diversity-based Traje

Tianhong Dai 6 Jul 18, 2022
The source code and dataset for the RecGURU paper (WSDM 2022)

RecGURU About The Project Source code and baselines for the RecGURU paper "RecGURU: Adversarial Learning of Generalized User Representations for Cross

Chenglin Li 17 Jan 07, 2023
This is the codebase for Diffusion Models Beat GANs on Image Synthesis.

OpenAI 3k Dec 26, 2022
Learned Token Pruning for Transformers

LTP: Learned Token Pruning for Transformers Check our paper for more details. Installation We follow the same installation procedure as the original H

Sehoon Kim 52 Dec 29, 2022
PyTorch implementation of Wide Residual Networks with 1-bit weights by McDonnell (ICLR 2018)

1-bit Wide ResNet PyTorch implementation of training 1-bit Wide ResNets from this paper: Training wide residual networks for deployment using a single

Sergey Zagoruyko 122 Dec 07, 2022