Label Semantic Aware Pre-training for Few-shot Text Classification
2022-07-03 10:11:00 【InfoQ】

Brief introduction
Overview with figures
Overall structure


pipeline
- Gold data: consists of an undisclosed benchmark dataset and a public dataset. Gold data is manually annotated, so its annotation quality is the highest.
- Silver data: also public datasets. Silver data is annotated heuristically.
- Bronze data: because labeled data is expensive and scarce, the paper also derives pre-training data from large amounts of unlabeled data. Figure 2 shows the framework used for this.
- Dialog intent filter: a RoBERTa-based classifier that splits utterances into positive and negative examples, since not all utterances carry an intent. For example, "It's a beautiful day." has no intent, while "It's a beautiful day, so I want to go to the park." carries a clear one. Attaching an intent label to intent-free data would hurt pre-training and the downstream tasks, so unintentful utterances are removed first (a minimal sketch follows this list).
- Intent generator: since unlabeled data has no intent labels, a T5-based intent generator produces an intent for each dialogue utterance.
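Below is a minimal, hedged sketch of what such a RoBERTa-based intent filter could look like with Hugging Face `transformers`. The checkpoint name, label mapping, and `is_intentful` helper are illustrative assumptions, not the paper's released code; in practice the classifier is first fine-tuned on dialog data (MultiDoGO, SGD), as described later.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# Binary head: 0 = non-intentful/negative, 1 = intentful/positive.
# An untrained head gives arbitrary outputs; fine-tune on dialog data first.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2
)

def is_intentful(utterance: str) -> bool:
    """Keep an utterance only if the filter predicts the 'intentful' class."""
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item() == 1

# Expected behavior after fine-tuning:
# is_intentful("It's a beautiful day.")                              -> False
# is_intentful("It's a beautiful day, so I want to go to the park.") -> True
```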
Pre-training format


- The first is random masking: spans of the sentence are masked, and T5 generates the masked content.
- The second resembles the downstream task, intent classification: the model takes a sentence as input and outputs its intent as a natural-language label.
- The last is denoising: the input sequence consists of a sentence and its corresponding label, but the label is masked out, and the model must recover the masked content (all three formats are sketched after this list).
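To make the three formats concrete, here is a small sketch of how the (input, target) pairs might be assembled for a T5-style model. The templates, the `intent:` prefix, and masking individual tokens instead of true spans are simplifications of my own, not the paper's exact scheme.

```python
import random

def span_mask(utterance: str, mask_prob: float = 0.15):
    """1) Random masking: hide tokens and ask the model to generate them."""
    tokens = utterance.split()
    masked, target, sid = [], [], 0
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(f"<extra_id_{sid}>")  # T5-style sentinel token
            target.append(f"<extra_id_{sid}> {tok}")
            sid += 1
        else:
            masked.append(tok)
    return " ".join(masked), " ".join(target)

def intent_classification(utterance: str, label: str):
    """2) Intent classification: utterance in, natural-language label out."""
    return utterance, label

def label_denoising(utterance: str, label: str):
    """3) Label denoising: the label slot is masked; the model recovers it."""
    return f"{utterance} intent: <extra_id_0>", f"<extra_id_0> {label}"

print(span_mask("i want to book a table for two tonight"))
print(label_denoising("i want to book a table", "book restaurant"))
```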
Experimental design

Fine-tuning part (a usage sketch follows the baseline list):
baselines:
- XLNet
- LM-BFF
- seq2seq-PTR
- T5
- T5(adapt)
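As a usage illustration, this hedged sketch shows how a fine-tuned generative classifier could be queried at inference time: generate the label as text, then map it onto the candidate label set. The checkpoint name and the word-overlap fallback are assumptions for the example, not the paper's exact decoding procedure.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
# Stand-in for a checkpoint pre-trained and fine-tuned as in the paper.
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def predict_intent(utterance: str, candidate_labels: list[str]) -> str:
    """Generate the label as text, then map it to the candidate label set."""
    ids = tokenizer(utterance, return_tensors="pt", truncation=True)
    out = model.generate(**ids, max_new_tokens=8)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    if text in candidate_labels:  # exact match on the generated label
        return text
    # Fallback (my assumption, not the paper's): largest word overlap.
    return max(candidate_labels,
               key=lambda c: len(set(text.split()) & set(c.split())))

print(predict_intent("I lost my card, please block it.",
                     ["report lost card", "check balance", "transfer money"]))
```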
Supplement

summary
Introduction
motivation
- Pre-trained models are commonly used to encode the input effectively, but little work lets the model access representations of the label information.
- Other work only uses the labels at the fine-tuning and prediction stages.
- "Gold" and "silver" data are scarce.
contribution
- Incorporates label semantics into the generative model during pre-training.
- Creates "utterance-intent" pairs from noisy unlabeled data for label-semantic-aware pre-training (this is how the "bronze" data is built: utterance-intent pairs created from unlabeled text).
- Achieves SOTA on intent and topic classification datasets.
Approach
data:
- Gold data: unpublished datasets + PolyAI Banking; data with manual labels.
- Silver data: the heuristically labeled WikiHow dataset; data with heuristic labels.
- Bronze data: pseudo-labeled data; utterance-intent pairs created from unlabeled data.
Processing the unlabeled data:
- Dialog filter:
- Not all utterances carry an intent (a goal).
- To prevent unintentful utterances from being given intent labels, which would create toxic data that harms downstream tasks, utterances are first classified into two classes ("non-intentful/negative" and "intentful/positive").
- Multi-Domain Goal-Oriented Dialogue (MultiDoGO) and Schema-Guided Dialogue (SGD) are used to fine-tune the RoBERTa-based dialog classifier.
- Intent generator:
- Gold and silver data are used to fine-tune T5, and the filtered utterances are then fed to it to generate intent labels (sketched below). 37% of the generated labels do not appear in the training set.
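A minimal sketch of this bronze-data creation step, composing the dialog filter with the T5 intent generator. `is_intentful` refers to the filter sketched earlier, and the `t5-base` checkpoint stands in for the generator fine-tuned on gold + silver data; both are assumptions for illustration.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

gen_tokenizer = AutoTokenizer.from_pretrained("t5-base")
# Placeholder for the intent generator fine-tuned on gold + silver data.
generator = T5ForConditionalGeneration.from_pretrained("t5-base")

def build_bronze_pairs(utterances):
    """Filter unintentful utterances, then pseudo-label the rest."""
    pairs = []
    for utt in utterances:
        if not is_intentful(utt):  # the RoBERTa filter sketched earlier
            continue
        ids = gen_tokenizer(utt, return_tensors="pt", truncation=True)
        out = generator.generate(**ids, num_beams=4, max_new_tokens=8)
        pairs.append((utt, gen_tokenizer.decode(out[0],
                                                skip_special_tokens=True)))
    return pairs
```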
Pre-training: label denoising
Experimental setup
fine-tuning:
baselines:
- XLNet
- LM-BFF
- seq2seq-PTR
- T5
- T5(adapt)