Paper appreciation [AAAI18]: Meta Multi-Task Learning for Sequence Modeling
Source: WeiYang Blog (godweiyang.com)
![Paper: Meta Multi-Task Learning for Sequence Modeling](/img/5f/4c6dbcf71202e8dbe7b191ea4bd95e.png)
This is a paper I read for a paper-analysis session; here is a brief introduction. The paper is from Xipeng Qiu's group at Fudan University.
Paper: Meta Multi-Task Learning for Sequence Modeling
Introduction
The typical multi-task learning model shares a feature representation layer: the bottom representation layer is shared, while the upper networks differ per task. But there is a problem: when an LSTM models a sentence, the same composition function is applied to every phrase type, e.g., verb + noun and adjective + noun. These composition functions ought to differ, so this paper proposes dynamically generating a different parameter matrix for each task at each time step.
This paper makes three main contributions:
- Unlike previous work that shares the feature layer, this paper proposes sharing at the function level, i.e., dynamically generating different composition functions for different tasks.
- Beyond multi-task learning, Meta-LSTM also improves single-task performance: because its parameters are generated dynamically, they differ at every time step and can better express different phrase meanings.
- The model can also be used for transfer learning: a trained Meta-LSTM can be applied directly to a new task as prior knowledge, with the task-specific LSTM serving as posterior knowledge.
Model
Tasks
The paper experiments on two tasks, sequence labeling and text classification, both in a multi-task setting. Sequence labeling covers NER and POS tagging; text classification covers datasets from many different domains.
Traditional models
The traditional multi-task model has one shared LSTM feature representation layer. This shared LSTM learns sentence representations, which are concatenated with the word embeddings and fed into each task-specific private LSTM. The structure is shown in the following figure:
![Traditional shared-layer multi-task model](/img/08/3f62a87146cc2cd0f1a3ea8bd5fa6e.png)
The output layer is task-specific rather than shared and is the same as in standard models, so it is not described here. The final loss function is the weighted sum of the per-task loss functions.
The training strategy for the multi-task model is as follows (a code sketch follows below):
- Randomly choose a task.
- Randomly sample a mini-batch from that task's dataset.
- Use that mini-batch to train and update the parameters.
- Repeat the three steps above.
In this way, one model is trained jointly for all tasks.
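As a concrete illustration, here is a minimal PyTorch-style sketch of that sampling loop. The names `model`, `task_loaders`, and `task_weights`, the `model(inputs, task=task)` interface, and the use of cross-entropy are my own hypothetical placeholders, not taken from the paper's code.

```python
import random

import torch
import torch.nn.functional as F

def train_multitask(model, task_loaders, task_weights, optimizer, num_steps):
    """Hypothetical loop following the paper's training strategy:
    (1) pick a task at random, (2) sample a mini-batch from that task,
    (3) update the parameters on that batch; repeat.

    task_loaders: dict mapping task name -> iterator of (inputs, labels)
    task_weights: dict mapping task name -> loss weight, so the overall
                  objective is the weighted sum of per-task losses."""
    tasks = list(task_loaders)
    for _ in range(num_steps):
        task = random.choice(tasks)                # step 1: random task
        inputs, labels = next(task_loaders[task])  # step 2: random mini-batch
        logits = model(inputs, task=task)          # task-specific output layer
        loss = task_weights[task] * F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()                            # step 3: update parameters
        optimizer.step()
```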
Meta multi-task learning
The traditional model only shares the feature representation layer, i.e., the shared LSTM above. The innovation of this paper is to use a Meta-LSTM to dynamically generate parameters that differ per task and per time step, and then encode with each task's own Basic-LSTM. The structure is shown in the following figure:
![Meta-LSTM multi-task model structure](/img/02/7494501e653a10531b2181274cb5bb.png)
The Basic-LSTM is essentially the same in structure as an ordinary LSTM; the only difference is that its parameters W and b at each time step are dynamically generated by the Meta-LSTM. The formal definition is as follows:
![Basic-LSTM with dynamically generated parameters](/img/0d/75d29b239a4d8b9841408d7b9d50de.png)
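In case the image above does not render: a Basic-LSTM with per-step parameters is just an ordinary LSTM cell whose weight matrix and bias are indexed by time. This is my reconstruction of the idea, not a verbatim copy of the paper's equations:

```latex
\begin{aligned}
\begin{bmatrix} g_t \\ i_t \\ f_t \\ o_t \end{bmatrix} &=
\begin{bmatrix} \tanh \\ \sigma \\ \sigma \\ \sigma \end{bmatrix}
\left( W_t \begin{bmatrix} x_t \\ h_{t-1} \end{bmatrix} + b_t \right) \\
c_t &= i_t \odot g_t + f_t \odot c_{t-1} \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here W_t and b_t are produced by the Meta-LSTM at step t rather than being fixed.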
Because W is a large matrix, generating it directly is computationally expensive and prone to overfitting, so an SVD-style low-rank decomposition is used here:
![Low-rank decomposition of the dynamic parameter matrix](/img/2b/345b5a287fcd9c9b1a86ae683f124b.png)
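Roughly, the factorization has the following low-rank form (my reconstruction; the paper's exact notation may differ), where d is the Basic-LSTM hidden size, m the input size, and r the rank:

```latex
W_t \approx P \,\mathrm{diag}(z_t)\, Q,
\qquad P \in \mathbb{R}^{4d \times r},\;
Q \in \mathbb{R}^{r \times (d+m)},\;
z_t \in \mathbb{R}^{r}
```

Only the vector z_t changes over time, so the Meta-LSTM outputs r numbers per step instead of a full 4d × (d+m) matrix; this is also why Meta-LSTM can end up with fewer parameters than an ordinary LSTM.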
The vector z_t here is dynamically generated by the Meta-LSTM, with the following formal definition:
![Meta-LSTM generation of z_t](/img/32/32f8af48564c3b749a9f2bfbea2f8d.png)
Expressing the relationship between the two LSTMs concisely, we can write it in the following form:
![Concise form of the Meta-LSTM and Basic-LSTM recurrence](/img/e0/2d38240c2676aee81b3728560d8d33.png)
To sum up: the Basic-LSTM's output from the previous step h_{t-1}, the Meta-LSTM's output from the previous step h^m_{t-1}, and the current word x_t together form the Meta-LSTM's input at the current step; its output z_t is then used to generate the Basic-LSTM's parameter matrix for the current step.
The Meta-LSTM has two main advantages (a code sketch follows the list):
- Parameters are generated dynamically at each time step.
- It has fewer parameters than an ordinary LSTM, thanks to the SVD-style decomposition.
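To make the mechanism concrete, below is a minimal PyTorch sketch of one cell: a small Meta-LSTM reads [x_t; h_{t-1}] and emits z_t, which modulates the low-rank factors. All class and variable names, the initialization, and the exact dimensions are my own illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MetaBasicLSTMCell(nn.Module):
    """Illustrative sketch (not the paper's code): a Basic-LSTM whose
    per-step weights W_t = P @ diag(z_t) @ Q are modulated by a vector
    z_t coming from a smaller Meta-LSTM; only z_t changes over time."""

    def __init__(self, input_size, hidden_size, meta_size, rank):
        super().__init__()
        # The Meta-LSTM reads the same [x_t; h_{t-1}] as the Basic-LSTM.
        self.meta = nn.LSTMCell(input_size + hidden_size, meta_size)
        self.to_z = nn.Linear(meta_size, rank)              # z_t from the meta state
        self.to_b = nn.Linear(meta_size, 4 * hidden_size)   # dynamic bias b_t
        # Static low-rank factors, shared across all time steps.
        self.P = nn.Parameter(torch.randn(4 * hidden_size, rank) * 0.01)
        self.Q = nn.Parameter(torch.randn(rank, input_size + hidden_size) * 0.01)

    def forward(self, x_t, state):
        h, c, mh, mc = state
        inp = torch.cat([x_t, h], dim=-1)                   # [x_t; h_{t-1}]
        mh, mc = self.meta(inp, (mh, mc))                   # Meta-LSTM step
        z = self.to_z(mh)                                   # dynamic vector z_t
        # W_t @ inp computed without materializing W_t: P (z_t * (Q inp)).
        pre = (z * (inp @ self.Q.T)) @ self.P.T + self.to_b(mh)
        g, i, f, o = pre.chunk(4, dim=-1)                   # gate pre-activations
        c = torch.sigmoid(i) * torch.tanh(g) + torch.sigmoid(f) * c
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c, mh, mc)
```

In the multi-task setting described above, the Meta-LSTM would be the shared module, with the Basic-LSTM side kept task-specific, matching the function-level sharing the paper proposes.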
Experiments
Text classification
The text classification experiments are done on review datasets from 16 shopping domains; the dataset sizes are as follows:
![Text classification dataset statistics](/img/2c/911e75b49ea52d77f6b30e73a9db39.png)
On most of the datasets, Meta-LSTM achieves the best results, shown below:
![Text classification results](/img/16/d9233a12e70734aba01dcc611f078e.png)
Sequence labeling
The sequence labeling experiments use three datasets: two NER datasets and one POS tagging dataset. The results are as follows:
![Sequence labeling results](/img/c7/e593fa9bedfbb6a31ffeaca68d1572.png)
Honestly, one can only say it is a tiny bit better than the most basic LSTM+CRF model.
Summary
This paper proposes a function-level multi-task sharing mechanism: a Meta-LSTM dynamically generates the Basic-LSTM's parameter matrix at every time step.
After reading it, I kept wondering whether this dynamic parameter generation mechanism could be used for constituency parsing. For example, in a top-down chart-based model, a top-down Tree-LSTM could dynamically generate a parameter matrix for each tree node, which would then be used to predict the label and the split point.