Label Semantic Aware Pre-training for Few-shot Text Classification
2022-07-03 10:11:00 【InfoQ】

Brief introduction
Overview with figures
Overall structure


pipeline
- Gold data: consists of an undisclosed benchmark dataset and a public dataset. Gold data is manually annotated, so its annotation quality is the highest.
- Silver data: also public datasets. Silver data is annotated heuristically.
- Bronze data: because labeled data is expensive and scarce, the paper also derives pre-training data from large amounts of unlabeled data. Figure 2 shows the framework used for this.
- Dialog intent filter: a RoBERTa-based classifier that splits utterances into positive and negative examples, since not all utterances carry an intent. For example, "It's a beautiful day." has no intent, while "It's a beautiful day, so I want to go to the park." carries a clear one. Attaching an intent label to intent-free data would hurt pre-training and the downstream tasks, so unintentful utterances are removed first (a minimal sketch follows this list).
- Intent generator: since unlabeled data has no intent labels, a T5-based intent generator produces an intent for each dialogue utterance.
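Below is a minimal, hedged sketch of what such a RoBERTa-based intent filter could look like with Hugging Face `transformers`. The checkpoint name, label mapping, and `is_intentful` helper are illustrative assumptions, not the paper's released code; in practice the classifier is first fine-tuned on dialog data (MultiDoGO, SGD), as described later.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# Binary head: 0 = non-intentful/negative, 1 = intentful/positive.
# An untrained head gives arbitrary outputs; fine-tune on dialog data first.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2
)

def is_intentful(utterance: str) -> bool:
    """Keep an utterance only if the filter predicts the 'intentful' class."""
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item() == 1

# Expected behavior after fine-tuning:
# is_intentful("It's a beautiful day.")                              -> False
# is_intentful("It's a beautiful day, so I want to go to the park.") -> True
```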
Pre-training format


- The first is random masking: spans of the sentence are masked, and T5 generates the masked content.
- The second resembles the downstream task, intent classification: the model takes a sentence as input and outputs its intent as a natural-language label.
- The last is denoising: the input sequence consists of a sentence and its corresponding label, but the label is masked out, and the model must recover the masked content (all three formats are sketched after this list).
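To make the three formats concrete, here is a small sketch of how the (input, target) pairs might be assembled for a T5-style model. The templates, the `intent:` prefix, and masking individual tokens instead of true spans are simplifications of my own, not the paper's exact scheme.

```python
import random

def span_mask(utterance: str, mask_prob: float = 0.15):
    """1) Random masking: hide tokens and ask the model to generate them."""
    tokens = utterance.split()
    masked, target, sid = [], [], 0
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(f"<extra_id_{sid}>")  # T5-style sentinel token
            target.append(f"<extra_id_{sid}> {tok}")
            sid += 1
        else:
            masked.append(tok)
    return " ".join(masked), " ".join(target)

def intent_classification(utterance: str, label: str):
    """2) Intent classification: utterance in, natural-language label out."""
    return utterance, label

def label_denoising(utterance: str, label: str):
    """3) Label denoising: the label slot is masked; the model recovers it."""
    return f"{utterance} intent: <extra_id_0>", f"<extra_id_0> {label}"

print(span_mask("i want to book a table for two tonight"))
print(label_denoising("i want to book a table", "book restaurant"))
```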
Experimental design

Fine-tuning part (a usage sketch follows the baseline list):
baselines:
- XLNet
- LM-BFF
- seq2seq-PTR
- T5
- T5(adapt)
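As a usage illustration, this hedged sketch shows how a fine-tuned generative classifier could be queried at inference time: generate the label as text, then map it onto the candidate label set. The checkpoint name and the word-overlap fallback are assumptions for the example, not the paper's exact decoding procedure.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
# Stand-in for a checkpoint pre-trained and fine-tuned as in the paper.
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def predict_intent(utterance: str, candidate_labels: list[str]) -> str:
    """Generate the label as text, then map it to the candidate label set."""
    ids = tokenizer(utterance, return_tensors="pt", truncation=True)
    out = model.generate(**ids, max_new_tokens=8)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    if text in candidate_labels:  # exact match on the generated label
        return text
    # Fallback (my assumption, not the paper's): largest word overlap.
    return max(candidate_labels,
               key=lambda c: len(set(text.split()) & set(c.split())))

print(predict_intent("I lost my card, please block it.",
                     ["report lost card", "check balance", "transfer money"]))
```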
Supplement

summary
Introduction
motivation
- Pre-trained models are commonly used to encode the input effectively, but little work lets the model access representations of the label information.
- Other work only uses the labels at the fine-tuning and prediction stages.
- "Gold" and "silver" data are scarce.
contribution
- Incorporates label semantics into the generative model during pre-training.
- Creates "utterance-intent" pairs from noisy unlabeled data for label-semantic-aware pre-training (this is how the "bronze" data is built: utterance-intent pairs created from unlabeled text).
- Achieves SOTA on intent and topic classification datasets.
Approach
data:
- Gold data: unpublished datasets + PolyAI Banking; data with manual labels.
- Silver data: the heuristically labeled WikiHow dataset; data with heuristic labels.
- Bronze data: pseudo-labeled data; utterance-intent pairs created from unlabeled data.
Processing the unlabeled data:
- Dialog filter:
- Not all utterances carry an intent (a goal).
- To prevent unintentful utterances from being given intent labels, which would create toxic data that harms downstream tasks, utterances are first classified into two classes ("non-intentful/negative" and "intentful/positive").
- Multi-Domain Goal-Oriented Dialogue (MultiDoGO) and Schema-Guided Dialogue (SGD) are used to fine-tune the RoBERTa-based dialog classifier.
- Intent generator:
- Gold and silver data are used to fine-tune T5, and the filtered utterances are then fed to it to generate intent labels (sketched below). 37% of the generated labels do not appear in the training set.
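A minimal sketch of this bronze-data creation step, composing the dialog filter with the T5 intent generator. `is_intentful` refers to the filter sketched earlier, and the `t5-base` checkpoint stands in for the generator fine-tuned on gold + silver data; both are assumptions for illustration.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

gen_tokenizer = AutoTokenizer.from_pretrained("t5-base")
# Placeholder for the intent generator fine-tuned on gold + silver data.
generator = T5ForConditionalGeneration.from_pretrained("t5-base")

def build_bronze_pairs(utterances):
    """Filter unintentful utterances, then pseudo-label the rest."""
    pairs = []
    for utt in utterances:
        if not is_intentful(utt):  # the RoBERTa filter sketched earlier
            continue
        ids = gen_tokenizer(utt, return_tensors="pt", truncation=True)
        out = generator.generate(**ids, num_beams=4, max_new_tokens=8)
        pairs.append((utt, gen_tokenizer.decode(out[0],
                                                skip_special_tokens=True)))
    return pairs
```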
Pre-training: label denoising
Experimental setup
fine-tuning:
baselines:
- XLNet
- LM-BFF
- seq2seq-PTR
- T5
- T5(adapt)