Joint Versus Independent Multiview Hashing for Cross-View Retrieval[J] (IEEE TCYB 2021, PyTorch Code)

Related tags

Deep LearningDCHN
Overview

2021-IEEE TCYB-DCHN

Peng Hu, Xi Peng, Hongyuan Zhu, Jie Lin, Liangli Zhen, Dezhong Peng, Joint Versus Independent Multiview Hashing for Cross-View Retrieval[J]. IEEE Transactions on Cybernetics, vol. 51, no. 10, pp. 4982-4993, Oct. 2021. (PyTorch Code)

Abstract

Thanks to the low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, thus making it difficult to handle the data with increasing views or a large number of views. To overcome these difficulties, we propose a decoupled CVH network (DCHN) approach which consists of a semantic hashing autoencoder module (SHAM) and multiple multiview hashing networks (MHNs). To be specific, SHAM adopts a hashing encoder and decoder to learn a discriminative Hamming space using either a few labels or the number of classes, that is, the so-called flexible inputs. After that, MHN independently projects all samples into the discriminative Hamming space that is treated as an alternative ground truth. In brief, the Hamming space is learned from the semantic space induced from the flexible inputs, which is further used to guide view-specific hashing in an independent fashion. Thanks to such an independent/decoupled paradigm, our method could enjoy high computational efficiency and the capacity of handling the increasing number of views by only using a few labels or the number of classes. For a newly coming view, we only need to add a view-specific network into our model and avoid retraining the entire model using the new and previous views. Extensive experiments are carried out on five widely used multiview databases compared with 15 state-of-the-art approaches. The results show that the proposed independent hashing paradigm is superior to the common joint ones while enjoying high efficiency and the capacity of handling newly coming views.

Framework

DCHN

Figure 1. Framework of the proposed DCHN method. g is the output of the corresponding view (i.e., image, text, video, etc.). o is the semantic hash code that is computed by the corresponding label y and semantic hashing transformation W. W is computed by the proposed semantic hashing autoencoder module (SHAM). sgn is an elementwise sign function. ℒR and ℒH are hash reconstruction and semantic hashing functions, respectively. In the training stage, first, W is used to recast the label y as a ground-truth hash code o. Then, the obtained hash code is used to guide view-specific networks with a semantic hashing reconstruction regularizer. Such a learning scheme makes the v view-specific neural networks (one network for each view) can be trained separately since they are decoupled and do not share any trainable parameters. Therefore, our DCHN can be easy to scale to a large number of views. In the inference stage, each trained view-specific network fk(xk, Θk) is used to compute the hash code of the sample xk.

SHAM

Figure 1. Proposed SHAM utilizes the semantic information (e.g., labels or classes) to learn an encoder W and a decoder WT by mutually converting the semantic and Hamming spaces. SHAM is one key component of our independent hashing paradigm.

Usage

First, to train SHAM wtih 64 bits on MIRFLICKR-25K, just run trainSHAM.py as follows:

python trainSHAM.py --datasets mirflickr25k --output_shape 64 --gama 1 --available_num 100

Then, to train a model for image modality wtih 64 bits on MIRFLICKR-25K, just run main_DCHN.py as follows:

python main_DCHN.py --mode train --epochs 100 --view 0 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --gpu_id 0

For text modality:

python main_DCHN.py --mode train --epochs 100 --view 1 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --gpu_id 1

To evaluate the trained models, you could run main_DCHN.py as follows:

python main_DCHN.py --mode eval --view -1 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --num_workers 0

Comparison with the State-of-the-Art

Table 1: Performance comparison in terms of MAP scores on the MIRFLICKR-25K and IAPR TC-12 datasets. The highest MAP score is shown in bold.

   Method    MIRFLICKR-25K IAPR TC-12
Image → Text Text → Image Image → Text Text → Image
16 32 64 128 16 32 64 128 16 32 64 128 16 32 64 128
Baseline 0.581 0.520 0.553 0.573 0.578 0.544 0.556 0.579 0.329 0.292 0.309 0.298 0.332 0.295 0.311 0.304
SePH [21] 0.729 0.738 0.744 0.750 0.753 0.762 0.764 0.769 0.467 0.476 0.486 0.493 0.463 0.475 0.485 0.492
SePHlr [12] 0.729 0.746 0.754 0.763 0.760 0.780 0.785 0.793 0.410 0.434 0.448 0.463 0.461 0.495 0.515 0.525
RoPH [34] 0.733 0.744 0.749 0.756 0.757 0.759 0.768 0.771 0.457 0.481 0.493 0.500 0.451 0.478 0.488 0.495
LSRH [22] 0.756 0.780 0.788 0.800 0.772 0.786 0.791 0.802 0.474 0.490 0.512 0.522 0.474 0.492 0.511 0.526
KDLFH [23] 0.734 0.755 0.770 0.771 0.764 0.780 0.794 0.797 0.306 0.314 0.351 0.357 0.307 0.315 0.350 0.356
DLFH [23] 0.721 0.743 0.760 0.767 0.761 0.788 0.805 0.810 0.306 0.314 0.326 0.340 0.305 0.315 0.333 0.353
MTFH [13] 0.581 0.571 0.645 0.543 0.584 0.556 0.633 0.531 0.303 0.303 0.307 0.300 0.303 0.303 0.308 0.302
DJSRH [14] 0.620 0.630 0.645 0.660 0.620 0.626 0.645 0.649 0.368 0.396 0.419 0.439 0.370 0.400 0.423 0.437
DCMH [9] 0.737 0.754 0.763 0.771 0.753 0.760 0.763 0.770 0.423 0.439 0.456 0.463 0.449 0.464 0.476 0.481
SSAH [20] 0.797 0.809 0.810 0.802 0.782 0.797 0.799 0.790 0.501 0.503 0.496 0.479 0.504 0.530 0.554 0.565
DCHN0 0.806 0.823 0.836 0.842 0.797 0.808 0.823 0.827 0.487 0.492 0.550 0.573 0.481 0.488 0.543 0.567
DCHN100 0.813 0.816 0.823 0.840 0.808 0.803 0.814 0.830 0.533 0.558 0.582 0.596 0.527 0.557 0.582 0.595

Table 2: Performance comparison in terms of MAP scores on the NUS-WIDE and MS-COCO datasets. The highest MAP score is shown in bold.

   Method    NUS-WIDE MS-COCO
Image → Text Text → Image Image → Text Text → Image
16 32 64 128 16 32 64 128 16 32 64 128 16 32 64 128
Baseline 0.281 0.337 0.263 0.341 0.299 0.339 0.276 0.346 0.362 0.336 0.332 0.373 0.348 0.341 0.347 0.359
SePH [21] 0.644 0.652 0.661 0.664 0.654 0.662 0.670 0.673 0.586 0.598 0.620 0.628 0.587 0.594 0.618 0.625
SePHlr [12] 0.607 0.624 0.644 0.651 0.630 0.649 0.665 0.672 0.527 0.571 0.592 0.600 0.555 0.596 0.618 0.621
RoPH [34] 0.638 0.656 0.662 0.669 0.645 0.665 0.671 0.677 0.592 0.634 0.649 0.657 0.587 0.628 0.643 0.652
LSRH [22] 0.622 0.650 0.659 0.690 0.600 0.662 0.685 0.692 0.580 0.563 0.561 0.567 0.580 0.611 0.615 0.632
KDLFH [23] 0.323 0.367 0.364 0.403 0.325 0.365 0.368 0.408 0.373 0.403 0.451 0.542 0.370 0.400 0.449 0.542
DLFH [23] 0.316 0.367 0.381 0.404 0.319 0.379 0.386 0.415 0.352 0.398 0.455 0.443 0.359 0.393 0.456 0.442
MTFH [13] 0.265 0.473 0.434 0.445 0.243 0.418 0.414 0.485 0.288 0.264 0.311 0.413 0.301 0.284 0.310 0.406
DJSRH [14] 0.433 0.453 0.467 0.442 0.457 0.468 0.468 0.501 0.478 0.520 0.544 0.566 0.462 0.525 0.550 0.567
DCMH [9] 0.569 0.595 0.612 0.621 0.548 0.573 0.585 0.592 0.548 0.575 0.607 0.625 0.568 0.595 0.643 0.664
SSAH [20] 0.636 0.636 0.637 0.510 0.653 0.676 0.683 0.682 0.550 0.577 0.576 0.581 0.552 0.578 0.578 0.669
DCHN0 0.648 0.660 0.669 0.683 0.662 0.677 0.685 0.697 0.602 0.658 0.682 0.706 0.591 0.652 0.669 0.696
DCHN100 0.654 0.671 0.681 0.691 0.668 0.683 0.697 0.707 0.662 0.701 0.703 0.720 0.650 0.689 0.693 0.714

Citation

If you find DCHN useful in your research, please consider citing:

@article{hu2021joint,
  author={Hu, Peng and Peng, Xi and Zhu, Hongyuan and Lin, Jie and Zhen, Liangli and Peng, Dezhong},
  journal={IEEE Transactions on Cybernetics}, 
  title={Joint Versus Independent Multiview Hashing for Cross-View Retrieval}, 
  year={2021},
  volume={51},
  number={10},
  pages={4982-4993},
  doi={10.1109/TCYB.2020.3027614}}
}
Owner
https://penghu-cs.github.io/
Code for the upcoming CVPR 2021 paper

The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel J. Brostow and Michael

Niantic Labs 496 Dec 30, 2022
Unified API to facilitate usage of pre-trained "perceptor" models, a la CLIP

mmc installation git clone https://github.com/dmarx/Multi-Modal-Comparators cd 'Multi-Modal-Comparators' pip install poetry poetry build pip install d

David Marx 37 Nov 25, 2022
UniFormer - official implementation of UniFormer

UniFormer This repo is the official implementation of "Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning". It curren

SenseTime X-Lab 573 Jan 04, 2023
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP

CLIP2Video: Mastering Video-Text Retrieval via Image CLIP The implementation of paper CLIP2Video: Mastering Video-Text Retrieval via Image CLIP. CLIP2

168 Dec 29, 2022
Pytorch implementation of CoCon: A Self-Supervised Approach for Controlled Text Generation

COCON_ICLR2021 This is our Pytorch implementation of COCON. CoCon: A Self-Supervised Approach for Controlled Text Generation (ICLR 2021) Alvin Chan, Y

alvinchangw 79 Dec 18, 2022
Hummingbird compiles trained ML models into tensor computation for faster inference.

Hummingbird Introduction Hummingbird is a library for compiling trained traditional ML models into tensor computations. Hummingbird allows users to se

Microsoft 3.1k Dec 30, 2022
[CVPR'21] Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

IVOS-W Paper Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild Zhaoyun Yin, Jia Zheng, Weixin Luo, Shenhan Qian, Hanli

SVIP Lab 38 Dec 12, 2022
FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware.

FIRM-AFL FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware. FIRM-AFL addresses two fundamental problems in IoT fuzzing. First, it

356 Dec 23, 2022
On Generating Extended Summaries of Long Documents

ExtendedSumm This repository contains the implementation details and datasets used in On Generating Extended Summaries of Long Documents paper at the

Georgetown Information Retrieval Lab 76 Sep 05, 2022
A PyTorch implementation of "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" (KDD 2019).

ClusterGCN ⠀⠀ A PyTorch implementation of "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" (KDD 2019). A

Benedek Rozemberczki 697 Dec 27, 2022
PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning"

deepGCFX PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning" Pr

Thilini Cooray 4 Aug 11, 2022
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

STCN Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang [a

Rex Cheng 456 Dec 12, 2022
Azua - build AI algorithms to aid efficient decision-making with minimum data requirements.

Project Azua 0. Overview Many modern AI algorithms are known to be data-hungry, whereas human decision-making is much more efficient. The human can re

Microsoft 197 Jan 06, 2023
Multi-objective constrained optimization for energy applications via tree ensembles

Multi-objective constrained optimization for energy applications via tree ensembles

C⚙G - Imperial College London 1 Nov 19, 2021
[NeurIPS 2021] A weak-shot object detection approach by transferring semantic similarity and mask prior.

TransMaS This repository is the official pytorch implementation of the following paper: NIPS2021 Mixed Supervised Object Detection by TransferringMask

BCMI 49 Jul 27, 2022
Python Implementation of the CoronaWarnApp (CWA) Event Registration

Python implementation of the Corona-Warn-App (CWA) Event Registration This is an implementation of the Protocol used to generate event and location QR

MaZderMind 17 Oct 05, 2022
Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes (CVPR2021)

RSCD (BS-RSCD & JCD) Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes (CVPR2021) by Zhihang Zhong, Yinqiang Zheng, Imari Sato We co

81 Dec 15, 2022
O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis

O-CNN This repository contains the implementation of our papers related with O-CNN. The code is released under the MIT license. O-CNN: Octree-based Co

Microsoft 607 Dec 28, 2022
Rotary Transformer

[中文|English] Rotary Transformer Rotary Transformer is an MLM pre-trained language model with rotary position embedding (RoPE). The RoPE is a relative

325 Jan 03, 2023
EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling This is the official implementation for "Frustratingly Simple Pretraining Al

Atsuki Yamaguchi 31 Nov 18, 2022