Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

Related tags

Deep Learningqpic
Overview

QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

by Masato Tamura, Hiroki Ohashi, and Tomoaki Yoshinaga.

This repository contains the official implementation of the paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information", which is accepted to CVPR2021.

QPIC is implemented by extending the recently proposed object detector, DETR. QPIC leverages the query-based detection and attention mechanism in the transformer, and as a result, achieves high HOI detection performance with simple detection heads.

Example attention maps.

Preparation

Dependencies

Our implementation uses external libraries such as NumPy and PyTorch. You can resolve the dependencies with the following command.

pip install numpy
pip install -r requirements.txt

Note that this command may dump errors during installing pycocotools, but the errors can be ignored.

Dataset

HICO-DET

HICO-DET dataset can be downloaded here. After finishing downloading, unpack the tarball (hico_20160224_det.tar.gz) to the data directory.

Instead of using the original annotations files, we use the annotation files provided by the PPDM authors. The annotation files can be downloaded from here. The downloaded annotation files have to be placed as follows.

qpic
 |─ data
 │   └─ hico_20160224_det
 |       |─ annotations
 |       |   |─ trainval_hico.json
 |       |   |─ test_hico.json
 |       |   └─ corre_hico.npy
 :       :

V-COCO

First clone the repository of V-COCO from here, and then follow the instruction to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and make directories as follows.

qpic
 |─ data
 │   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

For our implementation, the annotation file have to be converted to the HOIA format. The conversion can be conducted as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python2 can be used for this conversion because vsrl_utils.py in the v-coco repository shows a error with Python3.

V-COCO annotations with the HOIA format, corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json will be generated to annotations directory.

Pre-trained parameters

Our QPIC have to be pre-trained with the COCO object detection dataset. For the HICO-DET training, this pre-training can be omitted by using the parameters of DETR. The parameters can be downloaded from here for ResNet50, and for ResNet101. For the V-COCO training, this pre-training has to be carried out because some images of the V-COCO evaluation set are contained in the training set of DETR. We excluded the images and pre-trained QPIC for the V-COCO evaluation.

After downloading or pre-training, move the pre-trained parameters to the params directory and convert the parameters with the following command (e.g. downloaded ResNet50 parameters).

python convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre.pth

Training

After the preparation, you can start the training with the following command.

For the HICO-DET training.

python main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1

For the V-COCO training.

python main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file vcoco \
        --hoi_path data/v-coco \
        --num_obj_classes 80 \
        --num_verb_classes 29 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1

If you have multiple GPUs on your machine, you can utilize them to speed up the training. The number of GPUs is specified with the --nproc_per_node option. The following command starts the training with 8 GPUs for the HICO-DET training.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1

Evaluation

The evaluation is conducted at the end of each epoch during the training. The results are written in logs/log.txt like below:

"test_mAP": 0.29061250833779456, "test_mAP rare": 0.21910348492395765, "test_mAP non-rare": 0.31197234650036926

test_mAP, test_mAP rare, and test_mAP non-rare are the results of the default full, rare, and non-rare setting, respectively.

For the official evaluation of V-COCO, a pickle file of detection results have to be generated. You can generate the file as follows.

python generate_vcoco_official.py \
        --param_path logs/checkpoint.pth
        --save_path vcoco.pickle
        --hoi_path data/v-coco

Results

HICO-DET.

Full (D) Rare (D) Non-rare (D) Full(KO) Rare (KO) Non-rare (KO)
QPIC (ResNet50) 29.07 21.85 31.23 31.68 24.14 33.93
QPIC (ResNet101) 29.90 23.92 31.69 32.38 26.06 34.27

D: Default, KO: Known object

V-COCO.

Scenario 1 Scenario 2
QPIC (ResNet50) 58.8 61.0
QPIC (ResNet101) 58.3 60.7

Citation

Please consider citing our paper if it helps your research.

@inproceedings{tamura_cvpr2021,
author = {Tamura, Masato and Ohashi, Hiroki and Yoshinaga, Tomoaki},
title = {{QPIC}: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information},
booktitle={CVPR},
year = {2021},
}
Comments
  • Reproduction the result on VCOCO dataset

    Reproduction the result on VCOCO dataset

    Hi, I can reproduce the result on HICO-DET dataset, but just get 51 on V-COCO dataset, could you please provide the log.txt? I am not sure where I am wrong. Thanks.

    opened by SherlockHolmes221 7
  • How to get the file logs/checkpoint.pth?

    How to get the file logs/checkpoint.pth?

    Thanks for your interesting work! Now I have a question about how to get the file logs/checkpoint.pth,I have not see the file from your google drive. Desire to your reply.

    opened by Lunarli 4
  • number of decoder layers impact

    number of decoder layers impact

    Hi, thanks for interesting work. Does the number of decoder layers have a significant impact on performance like in detr? And I cant find how much v100 you've used. maybe 8?

    opened by jihwanp 3
  • Why the pretrained VCOCO model has 81 object classes?

    Why the pretrained VCOCO model has 81 object classes?

    When I tried to evaluate the VCOCO model you provide, I have to set the parameter --num_obj_classes to 81. What is the reason for this setting? And should I set --num_obj_classes 81 during training?

    Thanks for your reply!

    opened by weiyunfei 3
  • How to pretrain detector for VCOCO?

    How to pretrain detector for VCOCO?

    Hi, thanks for your excellent work. I am confused about the pre-training for the V-COCO training. In your README.md, you stated the pretraining has to be carried out for the V-COCO training since some images of the V-COCO evaluation set are contained in the training set of DETR, while the given training command just used the pretrained DETR model without any more pretraining. Should I apply a pretraining on the dataset excluding the V-COCO evaluation images or just follow your training command?

    opened by weiyunfei 3
  • Different AP Results for pre-trained VCOCO models

    Different AP Results for pre-trained VCOCO models

    Hi folks,

    First of all, thank you for sharing this repository. I would like to ask a specific question about the evaluation results of provided pre-trained V-COCO models. I followed the instructions you provided (for constructing annotation files for V-COCO and obtaining pickle files) to get the results. However, comparing with the V-COCO results in the table, I got different average role ap results. For instance, I am providing the output of R-50 QPIC, scenario 1 Role AP result in the attached screenshot. I am wondering possible reasons for the issue, could you please provide assistance about this?

    qpic_resnet50_sc1_roleap

    opened by cancam 2
  • How to convert the pretrained DETR model for V-COCO style?

    How to convert the pretrained DETR model for V-COCO style?

    Hi! Sorry for bothering again. When I tried to train a V-COCO model, I found that the object classifier of the pre-trained DETR model is only 81-way (including background), while in #6 you claimed that the V-COCO model has an 82-way object classifier (including background and a missing category id). So what should I do to convert the classifier parameters from 81 classes into 82 classes?

    opened by weiyunfei 2
  • For V-COCO result reproduction, which evaluation code should I use?

    For V-COCO result reproduction, which evaluation code should I use?

    I tried to reproduce your results on V-COCO, and I encountered some problems. I found that the official evaluation code can only achieve 56.5 for Scenario 1 and 58.6 for Scenario 2 with your pre-trained model, while with the vcoco_eval.py in your project which is in PPDM style, the result is 58.35. However, the evaluation process in vcoco_eval.py seems like Scenario 2 in the official evaluation. Which code should I use to reproduce your experiments on V-COCO? Moreover, if I need to use the vcoco_eval.py, what should I do to get the evaluation results for both scenarios?

    opened by weiyunfei 2
  • V-COCO Evaluation Error

    V-COCO Evaluation Error

    Hello,

    Many thanks for your great work.

    I am trying to evaluate your pre-trained models on V-COCO.

    1. So, I first generate the official detection pickle via:
    
    python generate_vcoco_official.py \
            --param_path ./params/qpic_resnet50_vcoco.pth \
            --save_path ./logs_vcoco/vcoco.pickle \
            --hoi_path ./data/v-coco
    
    1. Later, I use the official evaluation code via the following:
    from vsrl_eval import VCOCOeval
    
    vsrl_annot_file_s='./data/v-coco/data/vcoco/vcoco_val.json'
    split_file_s='./data/v-coco/data/splits/vcoco_val.ids'
    
    coco_file_s='./data/v-coco/data/instances_vcoco_all_2014.json'
    vcocoeval = VCOCOeval(vsrl_annot_file_s, coco_file_s, split_file_s)
    
    file_name= './logs_vcoco/vcoco.pickle'
    vcocoeval._do_eval(file_name, ovr_thresh=0.5)
    

    Please note that I adapted the latter script from VSG-Net repo. I face the following error during evaluation:

    zero-size array to reduction operation minimum which has no identity

    on assert(np.amax(rec) <= 1) within _do_agent_eval() function execution.

    I wonder if this is a common error, and how it can be mitigated?

    Many thanks.

    opened by kilickaya 1
  • Distance AP

    Distance AP

    Hi I have a question about distance-wise ap. image

    Does the hoi instance, which participated in calculating ap, correspond to distance=0.1 when the distance between human & center box is between 0.1 and 0.2?

    Or I would fully appreciate it if you could provide code for calculate distance wise ap. Thanks

    opened by jihwanp 1
  • Some useless steps in focal loss

    Some useless steps in focal loss

    image Thx for your interesting work! The neg_weights is meaningful only in heatmap based problem, e.g., Gaussian based detection. Removing those steps may avoid some misunderstanding.

    opened by YueLiao 1
  • a bug of hico dataset file?

    a bug of hico dataset file?

    Firstly,thank for your nice work?but i find small bug here https://github.com/hitachi-rd-cv/qpic/blob/main/datasets/hico.py#L62 The number of HOI turples(len(img_anno['hoi_annotation'])) should not exceed the number of query(self.num_queries) but the number of boxes(len(img_anno['annotations'])) should not exceed the number of query(self.num_queries)? Is that right? Waiting for your thought,thank you very much!!

    opened by jojolee123 0
  • How to convert .onnx model from qpic.pth

    How to convert .onnx model from qpic.pth

    I am facing some issue; onnx conversion of this model. I tried

    torch.onnx.export(pth_model, dummy_input, "onnx_model.onnx", opset_version 11) dummy_input shape is [1, 3, 720, 1280] # same with this model.

    However, the result is [array([[[nan, nan, nan], [nan, nan, nan]]], dtype=float32), array([[[nan, nan], [nan, nan]]], dtype=float32), array([[[nan, nan, nan, nan], [nan, nan, nan, nan]]], dtype=float32), array([[[nan, nan, nan, nan], [nan, nan, nan, nan]]], dtype=float32)]

    It's nan party!

    please commet how to fix it.

    opened by bj-noh 0
  • Is your architecture end-to-end HOI?

    Is your architecture end-to-end HOI?

    Hello Thanks for your implementation. Is your architecture end-to-end HOI? Does it mean that it does not require any features and feature extraction? For example, I can feed my features to your network?

    opened by mansooreh1 0
  • The influence from aux_loss

    The influence from aux_loss

    Hi authors,

    thanks for your implementation which helps my research a lot. In the supplementary materials of your paper you stated that the auxiliary loss is used following DETR. What will happen if it isn't used for HOI training? I wonder if there's any corresponding experiment on QPIC.

    opened by hwfan 0
  • custom dataset implementation

    custom dataset implementation

    Hi, Thank you so much for this awesome repo. I am currently trying a custom dataset implementation using qpic. The dataset's annotation file is in coco format. The convert_vcoco_annotations.py script converts vcoco to HOIA format. But I am trying to convert a coco format annotation to HOIA by manually adding the interactions. In this process, I am unable to understand a few things, in HOIA format : what does {subject_id , category_id, object_id} mean in "hoi_annotations" dict key, what does {category_id} mean in "annotations" dict key"

    opened by dharneeshkar004 0
Configure SRX interfaces with Scrapli

Configure SRX interfaces with Scrapli Overview This example will show how to configure interfaces on Juniper's SRX firewalls. In addition to the Pytho

Calvin Remsburg 1 Jan 07, 2022
Tensors and Dynamic neural networks in Python with strong GPU acceleration

PyTorch is a Python package that provides two high-level features: Tensor computation (like NumPy) with strong GPU acceleration Deep neural networks b

61.4k Jan 04, 2023
Official repo of the paper "Surface Form Competition: Why the Highest Probability Answer Isn't Always Right"

Surface Form Competition This is the official repo of the paper "Surface Form Competition: Why the Highest Probability Answer Isn't Always Right" We p

Peter West 46 Dec 23, 2022
This solves the autonomous driving issue which is supported by deep learning technology. Given a video, it splits into images and predicts the angle of turning for each frame.

Self Driving Car An autonomous car (also known as a driverless car, self-driving car, and robotic car) is a vehicle that is capable of sensing its env

Sagor Saha 4 Sep 04, 2021
Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving

Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving This is the source code for our paper Frequency Domain Image Tran

Mu Cai 52 Dec 23, 2022
Augmentation for Single-Image-Super-Resolution

SRAugmentation Augmentation for Single-Image-Super-Resolution Implimentation CutBlur Cutout CutMix Cutup CutMixup Blend RGBPermutation Identity OneOf

Yubo 6 Jun 27, 2022
Copy Paste positive polyp using poisson image blending for medical image segmentation

Copy Paste positive polyp using poisson image blending for medical image segmentation According poisson image blending I've completely used it for bio

Phạm Vũ Hùng 2 Oct 19, 2021
Multimodal commodity image retrieval 多模态商品图像检索

Multimodal commodity image retrieval 多模态商品图像检索 Not finished yet... introduce explain:The specific description of the project and the product image dat

hongjie 8 Nov 25, 2022
CNN designed for pansharpening

PROGRESSIVE BAND-SEPARATED CONVOLUTIONAL NEURAL NETWORK FOR MULTISPECTRAL PANSHARPENING This repository contains main code for the paper PROGRESSIVE B

SerendipitysX 3 Dec 29, 2021
DWIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data.

DWIPrep: A Robust Preprocessing Pipeline for dMRI Data DWIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data. The transp

Gal Ben-Zvi 1 Jan 09, 2023
Mememoji - A facial expression classification system that recognizes 6 basic emotions: happy, sad, surprise, fear, anger and neutral.

a project built with deep convolutional neural network and ❤️ Table of Contents Motivation The Database The Model 3.1 Input Layer 3.2 Convolutional La

Jostine Ho 761 Dec 05, 2022
Learned model to estimate number of distinct values (NDV) of a population using a small sample.

Learned NDV estimator Learned model to estimate number of distinct values (NDV) of a population using a small sample. The model approximates the maxim

2 Nov 21, 2022
NFT-Price-Prediction-CNN - Using visual feature extraction, prices of NFTs are predicted via CNN (Alexnet and Resnet) architectures.

NFT-Price-Prediction-CNN - Using visual feature extraction, prices of NFTs are predicted via CNN (Alexnet and Resnet) architectures.

5 Nov 03, 2022
The toolkit to generate auto labeled datasets

Ozeu Ozeu is the toolkit to autolabal dataset for instance segmentation. You can generate datasets labaled with segmentation mask and bounding box fro

Xiong Jie 28 Mar 28, 2022
Search Youtube Video and Get Video info

PyYouTube Get Video Data from YouTube link Installation pip install PyYouTube How to use it ? Get Videos Data from pyyoutube import Data yt = Data("ht

lokaman chendekar 35 Nov 25, 2022
Code for CPM-2 Pre-Train

CPM-2 Pre-Train Pre-train CPM-2 此分支为110亿非 MoE 模型的预训练代码,MoE 模型的预训练代码请切换到 moe 分支 CPM-2技术报告请参考link。 0 模型下载 请在智源资源下载页面进行申请,文件介绍如下: 文件名 描述 参数大小 100000.tar

Tsinghua AI 136 Dec 28, 2022
ECCV18 Workshops - Enhanced SRGAN. Champion PIRM Challenge on Perceptual Super-Resolution. The training codes are in BasicSR.

ESRGAN (Enhanced SRGAN) [ 🚀 BasicSR] [Real-ESRGAN] ✨ New Updates. We have extended ESRGAN to Real-ESRGAN, which is a more practical algorithm for rea

Xintao 4.7k Jan 02, 2023
A PyTorch-based library for fast prototyping and sharing of deep neural network models.

A PyTorch-based library for fast prototyping and sharing of deep neural network models.

78 Jan 03, 2023
Code for "Multi-Time Attention Networks for Irregularly Sampled Time Series", ICLR 2021.

Multi-Time Attention Networks (mTANs) This repository contains the PyTorch implementation for the paper Multi-Time Attention Networks for Irregularly

The Laboratory for Robust and Efficient Machine Learning 68 Dec 17, 2022
Code for "Primitive Representation Learning for Scene Text Recognition" (CVPR 2021)

Primitive Representation Learning Network (PREN) This repository contains the code for our paper accepted by CVPR 2021 Primitive Representation Learni

Ruijie Yan 76 Jan 02, 2023