Two-tower model: ERNIE-Gram pre-training and fine-tuning for matching

2022-06-09 23:16:00 【Artificial intelligence Zeng Xiaojian】

Background

This project trains a pair-wise model based on ERNIE-Gram. A pair-wise matching model suits ranking scenarios in which the similarity of a text pair is passed to the upstream ranking module as one of its input features.

ERNIE-Gram

1. Technical solution and evaluation metrics

Technical solution

A two-tower model built on the pretrained ERNIE-Gram model and trained with margin_ranking_loss.
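To make the training objective concrete, here is a minimal pure-Python sketch of a margin ranking loss over (positive, negative) similarity scores. It is not the project's training code: the function name mirrors the loss named above, but the margin value of 0.1 and the example scores are assumptions chosen for illustration. In a real run the scores would come from the model, and the framework's own loss (for example paddle.nn.functional.margin_ranking_loss) would be used instead.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=0.1):
    """Mean hinge loss that pushes each positive score above its paired
    negative score by at least `margin` (0.1 is an assumed value)."""
    if len(pos_scores) != len(neg_scores):
        raise ValueError("pos_scores and neg_scores must have the same length")
    if not pos_scores:
        raise ValueError("at least one score pair is required")
    # Per-pair loss: zero once the positive leads by the margin or more.
    losses = [max(0.0, margin - (p - n)) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical similarity scores for two (query, pos, neg) triplets.
    print(margin_ranking_loss([0.2, 0.9], [0.4, 0.1]))
```

The loss only penalizes pairs whose ordering is wrong or too close, which matches the pair-wise goal of ranking the relevant text above the irrelevant one rather than predicting an absolute label.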

Evaluation metrics

(1) AUC is used to evaluate the ranking quality of the ranking model.
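As a reminder of what AUC measures, the sketch below computes it directly from its pairwise definition: the fraction of (positive, negative) example pairs in which the positive example receives the higher score, with ties counted as half. This is an illustrative implementation, not the project's evaluate.py, and the labels and scores are made up.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 ground-truth relevance labels
    scores: iterable of model similarity scores, same length as labels
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and scores for four text pairs.
    print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The double loop is quadratic and only suitable for small samples; library implementations compute the same quantity from sorted ranks.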

Evaluation results

Model	AUC
ERNIE-Gram	0.801

2. Environment dependencies and installation

Dependencies

python >= 3.x
paddlepaddle >= 2.1.3
paddlenlp >= 2.2
pandas >= 0.25.1
scipy >= 1.3.1
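One way to verify these requirements before training is a small script that compares installed package versions against the minimums listed above. This is only a sketch: the helper names are hypothetical, the version parsing is deliberately simple (leading digits of each dot-separated part), and a package installed under a different distribution name (such as a GPU build of PaddlePaddle) would be reported as missing.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {
    "paddlepaddle": "2.1.3",
    "paddlenlp": "2.2",
    "pandas": "0.25.1",
    "scipy": "1.3.1",
}


def version_tuple(version):
    """Turn '2.1.3' or '2.2.0rc0' into a tuple of ints, e.g. (2, 1, 3)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, padding with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUM_VERSIONS):
    """Return {package: 'ok' | 'too old' | 'missing'} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(package, status)
```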

3. Code structure

The main code structure of the project is described below:

ernie_matching/
├── deploy # Deployment
│   └── python
│       ├── deploy.sh # bash script for prediction deployment
│       └── predict.py # Python prediction deployment example
├── scripts
│   ├── export_model.sh # bash script that exports dynamic graph parameters to static graph parameters
│   ├── train_pairwise.sh # bash script for training the pair-wise single-tower matching model
│   ├── evaluate.sh # bash script for evaluation on the validation set
│   └── predict_pairwise.sh # bash script for prediction with the pair-wise single-tower matching model
├── export_model.py # Script that exports dynamic graph parameters to static graph parameters
├── model.py # Pair-wise matching model
├── data.py # Conversion logic for pair-wise training samples and logic for generating random negative examples
├── train_pairwise.py # Training script for the pair-wise single-tower matching model
├── evaluate.py # Evaluation on the validation set
└── predict_pairwise.py # Prediction script for the pair-wise single-tower matching model; outputs the similarity of each text pair
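The listing mentions that data.py generates random negative examples. The project's actual logic is not shown here, so the following is only a generic sketch of the idea: for each (query, positive title) pair, draw a different title at random from the pool of all titles to serve as the negative. The function name, the fixed seed, and the toy data are all assumptions.

```python
import random


def build_pairwise_examples(pairs, seed=0):
    """Turn (query, pos_title) pairs into (query, pos_title, neg_title) triplets.

    The negative is sampled uniformly from all other titles in the pool.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    titles = sorted({title for _, title in pairs})
    if len(titles) < 2:
        raise ValueError("need at least two distinct titles to sample a negative")
    examples = []
    for query, pos_title in pairs:
        neg_title = rng.choice(titles)
        while neg_title == pos_title:  # resample until it differs from the positive
            neg_title = rng.choice(titles)
        examples.append((query, pos_title, neg_title))
    return examples


if __name__ == "__main__":
    # Hypothetical toy data.
    toy_pairs = [
        ("income tax planning", "tax planning analysis"),
        ("hydraulic support stress", "finite element analysis of hydraulic support"),
        ("silicon oxide", "silicon oxide anode material"),
    ]
    for triplet in build_pairwise_examples(toy_pairs):
        print(triplet)
```

Note that a randomly drawn title can occasionally be relevant to the query as well; this sketch does not filter such cases.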

4. Data preparation

Dataset description

Sample data:

Personal income tax planning	Based on the new personal income tax perspective of tax planning analysis of the new personal income tax; Individual income tax; Tax planning	Personal income tax salary tax planning research on personal income tax, Wages and salaries, Tax preparation
Stress analysis of hydraulic support base	ZY4000/09/19D Finite element analysis of hydraulic support, Finite element analysis, Load both ends, Partial load, Reverse	Based on ANSYS Multi working condition stress analysis of hydraulic support, Four working conditions, Simulation analysis, ANSYS, Stress concentration, Optimize
Delayed vasospasm	Effect of cilostazol on cerebral vasospasm after aneurysmal subarachnoid hemorrhage Meta Analysis of cilostazol, Subarachnoid hemorrhage, Cerebral vasospasm, Meta analysis	Effect of cilostazol on cerebral vasospasm after aneurysmal subarachnoid hemorrhage Meta Analysis of cilostazol, Subarachnoid hemorrhage, Cerebral vasospasm, Meta analysis
Silicon oxide	Composite sol - Preparation of silicon oxide for lithium ion batteries by gel one pot method / Carbon composite anode material silicon oxide, Sol - Gel method, Nanoparticles, Negative pole, Lithium ion battery	Supported polyimide - silicon dioxide - Preparation and characterization of silver hybrid film polyimide, silicon dioxide, silver, Hybrid membrane, Promote transmission
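Each sample line appears to hold three fields: a query followed by two candidate texts, each a title concatenated with its keywords. The sketch below shows one way to read such a line, assuming the fields are tab-separated in the underlying file. The function name and the dictionary keys ("query", "title", "neg_title") are hypothetical labels, and treating the third field as the negative example is an assumption based on the pair-wise setup, not something the sample itself states.

```python
def parse_pairwise_line(line, sep="\t"):
    """Split one sample line into its three fields.

    Assumes tab-separated fields: query, first candidate, second candidate.
    """
    fields = [field.strip() for field in line.rstrip("\n").split(sep)]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    query, first, second = fields
    # Key names are illustrative; the second candidate is assumed to be the negative.
    return {"query": query, "title": first, "neg_title": second}


if __name__ == "__main__":
    sample = "Silicon oxide\tsilicon oxide anode material\tpolyimide hybrid film\n"
    print(parse_pairwise_line(sample))
```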

Dataset download

literature_search_data

├── milvus # Dataset for building the Milvus database
│   └── milvus_data.csv # Data for building the recall library
├── recall # Recall (semantic index) datasets
│   ├── corpus.csv # Recall library used for testing
│   ├── dev.csv # Recall validation set
│   ├── test.csv # Recall test set
│   ├── train.csv # Recall training set
│   └── train_unsupervised.csv # Unsupervised training set
└── sort # Ranking datasets
    ├── test_pairwise.csv # Ranking test set
    ├── dev_pairwise.csv # Ranking validation set
    └── train_pairwise.csv # Ranking training set
