NAACL 2022 (code practice): Good visual guidance makes a better extractor for multimodal named entity recognition (with source code download)

2022-06-26 14:37:00 【Computer Vision Research Institute】

Institute of computer vision

official account ID｜ComputerVisionGzq

Paper: https://arxiv.org/pdf/2205.03521.pdf

Code: https://github.com/zjunlp/HVPNeT

Computer Vision Institute column

Author: Edison_G

Overview

Multimodal named entity recognition (MNER) and multimodal relation extraction (MRE) are fundamental and important branches of information extraction. However, existing MNER and MRE methods are often error-sensitive when the text is paired with images that contain irrelevant objects.

To address this problem, the authors propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction, aiming for more effective and robust performance.

Specifically, visual representations are treated as pluggable visual prefixes that guide the textual representation toward error-insensitive predictions. In addition, a dynamic gated aggregation strategy is proposed to obtain hierarchical multi-scale visual features that serve as the visual prefix for fusion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of the method, which achieves state-of-the-art performance.

New framework

Collection of Pyramidal Visual Feature

On the one hand, the image associated with a sentence contains multiple visual objects related to the entities in that sentence, which provides additional semantic knowledge to assist information extraction. On the other hand, global image features may express abstract concepts and act as a weak learning signal. Therefore, multiple visual cues are collected for multimodal entity and relation extraction: the regional (object) images serve as the important information, supplemented by the global image.
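The cue collection described above can be sketched as follows. This is only an illustration under stated assumptions, not the repository's code: the file names are invented, `max_objects=3` is an assumed cap on object images per sample, and missing objects are padded with `None` so that every sample has the same number of visual slots.

```python
from typing import Dict, List, Optional


def collect_visual_cues(
    full_image: str,
    aux_index: Dict[str, List[str]],
    max_objects: int = 3,  # assumed cap, chosen for illustration
) -> List[Optional[str]]:
    """Return [global image, object_1, ..., object_m], padded with None."""
    # Object crops detected for this image (empty list if none are known).
    objects = aux_index.get(full_image, [])[:max_objects]
    padding: List[Optional[str]] = [None] * (max_objects - len(objects))
    # The global image comes first; object images follow as regional cues.
    return [full_image] + objects + padding


# Hypothetical index in the spirit of a {full-image: [object-image]} mapping.
aux_index = {"tweet_001.jpg": ["crop_001_person.jpg", "crop_001_dog.jpg"]}
cues = collect_visual_cues("tweet_001.jpg", aux_index)
print(cues)
```

Keeping a fixed number of slots per sample makes batching straightforward; how padded slots are encoded (for example as blank images) is a separate design choice.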

Dynamic Gated Aggregation

Although objects of different sizes can have appropriate feature representations at the corresponding scales, it is not trivial to decide which block of the visual backbone should be assigned as the visual prefix for each Transformer layer. To address this challenge, the researchers propose building a densely connected routing space in which the hierarchical multi-scale visual features are connected to every Transformer layer.

Dynamic Gate Module

Routing is carried out by the dynamic gate module and can be viewed as a path decision process. The purpose of the dynamic gate is to predict a normalized vector that indicates how strongly the visual features of each block are used.
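A gate of this kind might be sketched as below. This is a minimal pure-Python illustration, not the repository's implementation: it assumes every visual block has already been projected to the same dimension, mean-pools the blocks into one summary vector, and applies a single hypothetical linear layer (`weight`, `bias`) followed by a softmax to obtain one weight per block.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors element-wise."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dynamic_gate(block_features: List[List[Vector]],
                 weight: List[Vector], bias: Vector) -> List[float]:
    """Predict one normalized weight per visual block."""
    pooled = [mean_pool(block) for block in block_features]  # one vector per block
    summary = mean_pool(pooled)                               # average over blocks
    logits = [sum(w * x for w, x in zip(row, summary)) + b
              for row, b in zip(weight, bias)]
    return softmax(logits)


# Two blocks with feature dimension 2; the weights are arbitrary toy values.
blocks = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]]]
gate = dynamic_gate(blocks, weight=[[2.0, 0.0], [0.0, 0.0]], bias=[0.0, 0.0])
print(gate)
```

In a per-layer setup, each Transformer layer would own its own `weight` and `bias`, so different layers can prefer different visual scales.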

Aggregated Hierarchical Feature

Based on the dynamic gate g(l) above, the final aggregated hierarchical visual features Vgated that match the l-th Transformer layer can be derived.
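One plausible reading of this aggregation is a gate-weighted sum over blocks, Vgated(l) = sum_i g_i(l) * V_i. The sketch below illustrates that reading only; it assumes all block features have already been brought to the same shape, and the exact formula in the paper may differ.

```python
from typing import List

Vector = List[float]


def aggregate(block_features: List[List[Vector]], gate: List[float]) -> List[Vector]:
    """Gate-weighted sum of block features: out[p] = sum_i gate[i] * block_i[p]."""
    n_positions = len(block_features[0])
    dim = len(block_features[0][0])
    out = [[0.0] * dim for _ in range(n_positions)]
    for g, block in zip(gate, block_features):
        for p, vec in enumerate(block):
            for k, x in enumerate(vec):
                out[p][k] += g * x
    return out


# Two blocks, one position each, feature dimension 2 (toy values).
blocks = [[[1.0, 0.0]], [[0.0, 1.0]]]
v_gated = aggregate(blocks, gate=[0.75, 0.25])
print(v_gated)  # [[0.75, 0.25]]
```

Because the gate is normalized, the result is a convex combination of the block features, so its scale stays comparable to that of a single block.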

Visual Prefix-guided Fusion

Hierarchical multi-scale image features are used as visual prefixes, and the visual prefix sequence is prepended to the text sequence in each self-attention layer of BERT.

The hierarchical multi-scale visual features serve as the visual prefix of each fusion layer, and multimodal attention is applied layer by layer to update all textual states. In this way, the final textual states encode both contextual and cross-modal semantic information, which helps reduce sensitivity to errors caused by irrelevant object elements.
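The prefix-guided attention above can be illustrated with a single-head sketch in pure Python. It is a simplification under stated assumptions rather than the model's actual code: queries come from text tokens only, while the visual prefix contributes extra keys and values that are placed in front of the textual ones; projections, multiple heads, and masking are omitted.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prefix_attention(text_q: List[Vector], text_k: List[Vector], text_v: List[Vector],
                     prefix_k: List[Vector], prefix_v: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention with a visual prefix on keys and values."""
    keys = prefix_k + text_k      # prefix entries come first
    values = prefix_v + text_v
    scale = math.sqrt(len(text_q[0]))
    outputs = []
    for q in text_q:              # only text tokens issue queries
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# One text token with a zero query attends uniformly to prefix and text.
states = prefix_attention(
    text_q=[[0.0, 0.0]], text_k=[[1.0, 0.0]], text_v=[[3.0, 3.0]],
    prefix_k=[[0.0, 1.0]], prefix_v=[[1.0, 1.0]],
)
print(states)  # [[2.0, 2.0]]
```

The output has one state per text token, so the sequence length of the text is unchanged while each state can still absorb visual information.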

Experiments

Code practice

To run the code, first install the requirements:

pip install -r requirements.txt

Data Collection:

The datasets that we used in our experiments are as follows:

Twitter2015 & Twitter2017
The text data follows the CoNLL format. The Twitter2015 and Twitter2017 data can each be downloaded via the links given in the repository README. Please place them in data/NER_data.
You can also put them anywhere and modify the path configuration in run.py.
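For readers unfamiliar with the CoNLL format, a minimal reader might look like the sketch below. It assumes one token and its tag per line, separated by whitespace, with a blank line between sentences; the sample text is invented for illustration, and the actual dataset files may carry extra lines that need dedicated handling.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group (token, tag) pairs into sentences separated by blank lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:                 # blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        if len(fields) < 2:          # skip lines without a token/tag pair
            continue
        current.append((fields[0], fields[-1]))
    if current:                      # file may not end with a blank line
        sentences.append(current)
    return sentences


# Invented sample in two-column CoNLL style.
sample = "Kevin\tB-PER\nDurant\tI-PER\njoins\tO\nWarriors\tB-ORG\n\nHello\tO\nParis\tB-LOC\n"
sentences = read_conll(sample.splitlines())
print(len(sentences))  # 2
print(sentences[1])    # [('Hello', 'O'), ('Paris', 'B-LOC')]
```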
MNRE
The MRE dataset comes from MEGA. You can download the MRE dataset with detected visual objects using the following command:

cd data
wget 120.27.214.45/Data/re/multimodal/data.tar.gz
tar -xzvf data.tar.gz
mv data RE_data

Data Preprocessing:

HVPNeT
 |-- data  # datasets: NER_data and RE_data
 |    |-- NER_data
 |    |    |-- twitter2015  # text data
 |    |    |    |-- train.txt
 |    |    |    |-- valid.txt
 |    |    |    |-- test.txt
 |    |    |    |-- twitter2015_train_dict.pth  # {full-image-[object-image]}
 |    |    |    |-- ...
 |    |    |-- twitter2015_images       # full image data
 |    |    |-- twitter2015_aux_images   # object image data
 |    |    |-- twitter2017
 |    |    |-- twitter2017_images
 |    |-- RE_data
 |    |    |-- ...
 |-- models  # models
 |    |-- bert_model.py
 |    |-- modeling_bert.py
 |-- modules
 |    |-- metrics.py    # metric
 |    |-- train.py  # trainer
 |-- processor
 |    |-- dataset.py    # processor, dataset
 |-- logs     # code logs
 |-- run.py   # main 
 |-- run_ner_task.sh
 |-- run_re_task.sh
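Before training, it can help to confirm that the layout above is in place. The following is a small hypothetical helper, not part of the repository; the path list covers only a few Twitter2015 entries from the tree and can be extended as needed.

```python
from pathlib import Path
from typing import List

# A subset of the paths shown in the tree above, relative to the project root.
EXPECTED = [
    "data/NER_data/twitter2015/train.txt",
    "data/NER_data/twitter2015/valid.txt",
    "data/NER_data/twitter2015/test.txt",
    "data/NER_data/twitter2015_images",
    "data/NER_data/twitter2015_aux_images",
    "run.py",
]


def missing_paths(root: str, expected: List[str] = EXPECTED) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    # Run from the project root; prints nothing when everything is present.
    for rel in missing_paths("."):
        print("missing:", rel)
```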

Train:

NER Task

The data paths and GPU-related configuration are in run.py. To train the NER model, run the script for the corresponding dataset:

bash run_twitter15.sh
bash run_twitter17.sh

Checkpoints can be downloaded via Twitter15_ckpt and Twitter17_ckpt.

RE Task

To train the RE model, run this script:

bash run_re_task.sh

Checkpoints can be downloaded via re_ckpt.

Test:

NER Task

To test the NER model, either download the provided model checkpoints via Twitter15_ckpt and Twitter17_ckpt or use your own trained model. Set load_path to the model path, set dataset_name to either twitter15 or twitter17, and then run the following script:

python -u run.py \
      --dataset_name="twitter15/twitter17" \
      --bert_name="bert-base-uncased" \
      --seed=1234 \
      --only_test \
      --max_seq=80 \
      --use_prompt \
      --prompt_len=4 \
      --sample_ratio=1.0 \
      --load_path='your_ner_ckpt_path'

RE Task

To test the RE model, either download the provided model checkpoint via re_ckpt or use your own trained model. Set load_path to the model path, and then run the following script:

python -u run.py \
      --dataset_name="MRE" \
      --bert_name="bert-base-uncased" \
      --seed=1234 \
      --only_test \
      --max_seq=80 \
      --use_prompt \
      --prompt_len=4 \
      --sample_ratio=1.0 \
      --load_path='your_re_ckpt_path'

Please contact the official account for authorization.
