Label Semantic Aware Pre-training for Few-shot Text Classification
2022-07-03 10:11:00 【InfoQ】

Brief introduction
Figures
Overall architecture


Pipeline
- Gold data: composed of an undisclosed benchmark dataset and a public dataset. Gold data is manually annotated, so its annotation quality is the highest.
- Silver data: also a public dataset, but annotated heuristically.
- Bronze data: labeled data is expensive and scarce, so the paper also derives pre-training data from a large amount of unlabeled data. Figure 2 shows the framework used.
- Dialog intent filter: a RoBERTa-based classifier that splits utterances into positive and negative examples, because not all utterances carry an intent. For example, "It's a beautiful day." has no intent, whereas "It's a beautiful day, so I want to go to the park." expresses a clear one. Attaching an intent label to intent-free utterances would hurt both pre-training and the downstream tasks, so such utterances are removed (see the filtering sketch after this list).
- Intent generator: since unlabeled data carries no intent labels, a T5-based intent generator produces an intent for each utterance.
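
A minimal sketch of how the dialog intent filter could be applied at inference time, assuming a RoBERTa checkpoint already fine-tuned for the binary intentful/non-intentful task; the checkpoint name and label mapping below are hypothetical, not the paper's:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical checkpoint: a RoBERTa classifier fine-tuned to tag utterances
# as intentful (1) vs. non-intentful (0).
MODEL = "my-org/roberta-intent-filter"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

utterances = [
    "It's a beautiful day.",                               # no intent -> drop
    "It's a beautiful day, so I want to go to the park.",  # clear intent -> keep
]

inputs = tokenizer(utterances, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    preds = model(**inputs).logits.argmax(dim=-1)

# Keep only the utterances the filter judges intentful (assumed label id 1).
kept = [u for u, p in zip(utterances, preds.tolist()) if p == 1]
print(kept)
```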
Pre-training objectives


- The first is random masking: tokens in the sentence are randomly masked, and T5 then generates the masked content.
- The second resembles the downstream task, intent classification: the model takes a sentence as input and outputs its intent as a natural-language label.
- The last is denoising: the input sequence consists of an utterance and its corresponding label, but the label is masked out, and the model must recover the masked content (a sketch of all three formats follows this list).
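
A minimal sketch of what the three kinds of training examples might look like for a T5-style model; the prompts and sentinel usage are illustrative assumptions, not the paper's exact templates:

```python
utterance = "i want to book a flight to boston"
intent = "book flight"

# 1) Random masking (T5-style span corruption): mask a span, generate it back.
mask_input  = "i want to <extra_id_0> to boston"
mask_target = "<extra_id_0> book a flight <extra_id_1>"

# 2) Intent classification as generation: sentence in, natural-language label out.
cls_input  = utterance
cls_target = intent

# 3) Label denoising: utterance plus its label, with the label masked out.
denoise_input  = f"{utterance} <extra_id_0>"
denoise_target = f"<extra_id_0> {intent}"

for src, tgt in [(mask_input, mask_target),
                 (cls_input, cls_target),
                 (denoise_input, denoise_target)]:
    print(f"input : {src}")
    print(f"target: {tgt}\n")
```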
Experimental design

Fine-tuning
baselines:
- XLNet
- LM-BFF
- seq2seq-PTR
- T5
- T5(adapt)
Supplement

Summary
Introduction
Motivation
- Pre-trained models are typically used to encode the input effectively, but little prior work lets the model access a semantic representation of the labels.
- Other work uses label semantics only in the fine-tuning and prediction stages.
- "Gold" and "silver" data are scarce.
Contributions
- Incorporate label semantics into a generative model during pre-training.
- Create "utterance-intent" pairs from noisy unlabeled data for label-semantic-aware pre-training (used to handle the "bronze" data: utterance-intent pairs are created for unlabeled text).
- SOTA on intent and topic classification datasets.
Approach
Data:
- Gold data: an unpublished dataset + PolyAI Banking; this is labeled data.
- Silver data: WikiHow, a dataset labeled heuristically.
- Bronze data: pseudo-labeled data; utterance-intent pairs created from unlabeled data.
Processing of the unlabeled data:
- Dialog filter:
  - Not all utterances carry an intent (goal/intent).
  - To keep intent-free utterances from being assigned intent labels, which would create noisy data that harms downstream tasks, utterances are first classified into two classes ("non-intentful/negative" and "intentful/positive").
  - A RoBERTa-based dialog classifier is fine-tuned on Multi-Domain Goal-Oriented Dialogue (MultiDoGO) and Schema-guided Dialogue (SGD).
- Intent generator:
  - T5 is fine-tuned on the gold and silver data, and the filtered utterances are then fed through it to generate intent labels; 37% of the generated labels do not appear in the training set (see the generation sketch after this list).
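
A minimal generation sketch for the intent generator, assuming a T5 checkpoint already fine-tuned on the gold/silver (utterance, intent) pairs; the checkpoint name is hypothetical:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "my-org/t5-intent-generator"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

utterance = "It's a beautiful day, so I want to go to the park."
inputs = tokenizer(utterance, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8)

# The generated label is free-form text, which is how a sizable share of
# the generated intents (37% here) can fall outside the training label set.
print(tokenizer.decode(out[0], skip_special_tokens=True))
```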
Pre-training: label denoising
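
A minimal sketch of one label-denoising training step with an off-the-shelf T5; the input/target formatting is an assumption, so swap in the paper's actual templates and data:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Utterance with its label masked out; the target restores the label.
src = tokenizer("i want to book a flight to boston <extra_id_0>",
                return_tensors="pt")
tgt = tokenizer("<extra_id_0> book flight", return_tensors="pt")

loss = model(input_ids=src.input_ids,
             attention_mask=src.attention_mask,
             labels=tgt.input_ids).loss
loss.backward()  # gradients for one denoising pre-training step
print(float(loss))
```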
Experimental setup
Fine-tuning:
baselines:
- XLNet
- LM-BFF
- seq2seq-PTR
- T5
- T5(adapt)