Paper appreciation [AAAI18]: Meta Multi-Task Learning for Sequence Modeling
Source: WeiYang Blog (godweiyang.com)
![Paper: Meta Multi-Task Learning for Sequence Modeling](/img/5f/4c6dbcf71202e8dbe7b191ea4bd95e.png)
This is a paper I read for a paper-analysis session; here is a brief introduction. The paper is from Xipeng Qiu's group at Fudan University.
Paper: Meta Multi-Task Learning for Sequence Modeling
Introduction
The typical multi-task learning model shares a feature representation layer: the bottom representation layer is shared, while the upper networks differ per task. But there is a problem: when an LSTM models a sentence, the same composition function is applied to every phrase type, e.g., verb + noun and adjective + noun. These composition functions ought to differ, so this paper proposes dynamically generating a different parameter matrix for each task at each time step.
This paper makes three main contributions:
- Unlike previous work that shares the feature layer, this paper proposes sharing at the function level, i.e., dynamically generating different composition functions for different tasks.
- Beyond multi-task learning, Meta-LSTM also improves single-task performance: because its parameters are generated dynamically, they differ at every time step and can better express different phrase meanings.
- The model can also be used for transfer learning: a trained Meta-LSTM can be applied directly to a new task as prior knowledge, with the task-specific LSTM serving as posterior knowledge.
Model
Tasks
The paper experiments on two tasks, sequence labeling and text classification, both in a multi-task setting. Sequence labeling covers NER and POS tagging; text classification covers datasets from many different domains.
Traditional models
The traditional multi-task model has one shared LSTM feature representation layer. This shared LSTM learns sentence representations, which are concatenated with the word embeddings and fed into each task-specific private LSTM. The structure is shown in the following figure:
![Traditional shared-layer multi-task model](/img/08/3f62a87146cc2cd0f1a3ea8bd5fa6e.png)
The output layer is task-specific rather than shared and is the same as in standard models, so it is not described here. The final loss function is the weighted sum of the per-task loss functions.
The training strategy for the multi-task model is as follows (a code sketch follows below):
- Randomly choose a task.
- Randomly sample a mini-batch from that task's dataset.
- Use that mini-batch to train and update the parameters.
- Repeat the three steps above.
In this way, one model is trained jointly for all tasks.
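As a concrete illustration, here is a minimal PyTorch-style sketch of that sampling loop. The names `model`, `task_loaders`, and `task_weights`, the `model(inputs, task=task)` interface, and the use of cross-entropy are my own hypothetical placeholders, not taken from the paper's code.

```python
import random

import torch
import torch.nn.functional as F

def train_multitask(model, task_loaders, task_weights, optimizer, num_steps):
    """Hypothetical loop following the paper's training strategy:
    (1) pick a task at random, (2) sample a mini-batch from that task,
    (3) update the parameters on that batch; repeat.

    task_loaders: dict mapping task name -> iterator of (inputs, labels)
    task_weights: dict mapping task name -> loss weight, so the overall
                  objective is the weighted sum of per-task losses."""
    tasks = list(task_loaders)
    for _ in range(num_steps):
        task = random.choice(tasks)                # step 1: random task
        inputs, labels = next(task_loaders[task])  # step 2: random mini-batch
        logits = model(inputs, task=task)          # task-specific output layer
        loss = task_weights[task] * F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()                            # step 3: update parameters
        optimizer.step()
```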
Meta multi-task learning
The traditional model only shares the feature representation layer, i.e., the shared LSTM above. The innovation of this paper is to use a Meta-LSTM to dynamically generate parameters that differ per task and per time step, and then encode with each task's own Basic-LSTM. The structure is shown in the following figure:
![Meta-LSTM multi-task model structure](/img/02/7494501e653a10531b2181274cb5bb.png)
The Basic-LSTM is essentially the same in structure as an ordinary LSTM; the only difference is that its parameters W and b at each time step are dynamically generated by the Meta-LSTM. The formal definition is as follows:
![Basic-LSTM with dynamically generated parameters](/img/0d/75d29b239a4d8b9841408d7b9d50de.png)
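In case the image above does not render: a Basic-LSTM with per-step parameters is just an ordinary LSTM cell whose weight matrix and bias are indexed by time. This is my reconstruction of the idea, not a verbatim copy of the paper's equations:

```latex
\begin{aligned}
\begin{bmatrix} g_t \\ i_t \\ f_t \\ o_t \end{bmatrix} &=
\begin{bmatrix} \tanh \\ \sigma \\ \sigma \\ \sigma \end{bmatrix}
\left( W_t \begin{bmatrix} x_t \\ h_{t-1} \end{bmatrix} + b_t \right) \\
c_t &= i_t \odot g_t + f_t \odot c_{t-1} \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here W_t and b_t are produced by the Meta-LSTM at step t rather than being fixed.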
Because W is a large matrix, generating it directly is computationally expensive and prone to overfitting, so an SVD-style low-rank decomposition is used here:
![Low-rank decomposition of the dynamic parameter matrix](/img/2b/345b5a287fcd9c9b1a86ae683f124b.png)
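Roughly, the factorization has the following low-rank form (my reconstruction; the paper's exact notation may differ), where d is the Basic-LSTM hidden size, m the input size, and r the rank:

```latex
W_t \approx P \,\mathrm{diag}(z_t)\, Q,
\qquad P \in \mathbb{R}^{4d \times r},\;
Q \in \mathbb{R}^{r \times (d+m)},\;
z_t \in \mathbb{R}^{r}
```

Only the vector z_t changes over time, so the Meta-LSTM outputs r numbers per step instead of a full 4d × (d+m) matrix; this is also why Meta-LSTM can end up with fewer parameters than an ordinary LSTM.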
The vector z_t here is dynamically generated by the Meta-LSTM, with the following formal definition:
![Meta-LSTM generation of z_t](/img/32/32f8af48564c3b749a9f2bfbea2f8d.png)
Expressing the relationship between the two LSTMs concisely, we can write it in the following form:
![Concise form of the Meta-LSTM and Basic-LSTM recurrence](/img/e0/2d38240c2676aee81b3728560d8d33.png)
To sum up: the Basic-LSTM's output from the previous step h_{t-1}, the Meta-LSTM's output from the previous step h^m_{t-1}, and the current word x_t together form the Meta-LSTM's input at the current step; its output z_t is then used to generate the Basic-LSTM's parameter matrix for the current step.
The Meta-LSTM has two main advantages (a code sketch follows the list):
- Parameters are generated dynamically at each time step.
- It has fewer parameters than an ordinary LSTM, thanks to the SVD-style decomposition.
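To make the mechanism concrete, below is a minimal PyTorch sketch of one cell: a small Meta-LSTM reads [x_t; h_{t-1}] and emits z_t, which modulates the low-rank factors. All class and variable names, the initialization, and the exact dimensions are my own illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MetaBasicLSTMCell(nn.Module):
    """Illustrative sketch (not the paper's code): a Basic-LSTM whose
    per-step weights W_t = P @ diag(z_t) @ Q are modulated by a vector
    z_t coming from a smaller Meta-LSTM; only z_t changes over time."""

    def __init__(self, input_size, hidden_size, meta_size, rank):
        super().__init__()
        # The Meta-LSTM reads the same [x_t; h_{t-1}] as the Basic-LSTM.
        self.meta = nn.LSTMCell(input_size + hidden_size, meta_size)
        self.to_z = nn.Linear(meta_size, rank)              # z_t from the meta state
        self.to_b = nn.Linear(meta_size, 4 * hidden_size)   # dynamic bias b_t
        # Static low-rank factors, shared across all time steps.
        self.P = nn.Parameter(torch.randn(4 * hidden_size, rank) * 0.01)
        self.Q = nn.Parameter(torch.randn(rank, input_size + hidden_size) * 0.01)

    def forward(self, x_t, state):
        h, c, mh, mc = state
        inp = torch.cat([x_t, h], dim=-1)                   # [x_t; h_{t-1}]
        mh, mc = self.meta(inp, (mh, mc))                   # Meta-LSTM step
        z = self.to_z(mh)                                   # dynamic vector z_t
        # W_t @ inp computed without materializing W_t: P (z_t * (Q inp)).
        pre = (z * (inp @ self.Q.T)) @ self.P.T + self.to_b(mh)
        g, i, f, o = pre.chunk(4, dim=-1)                   # gate pre-activations
        c = torch.sigmoid(i) * torch.tanh(g) + torch.sigmoid(f) * c
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c, mh, mc)
```

In the multi-task setting described above, the Meta-LSTM would be the shared module, with the Basic-LSTM side kept task-specific, matching the function-level sharing the paper proposes.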
Experiments
Text classification
The text classification experiments are done on review datasets from 16 shopping domains; the dataset sizes are as follows:
![Text classification dataset statistics](/img/2c/911e75b49ea52d77f6b30e73a9db39.png)
On most of the datasets, Meta-LSTM achieves the best results, shown below:
![Text classification results](/img/16/d9233a12e70734aba01dcc611f078e.png)
Sequence labeling
The sequence labeling experiments use three datasets: two NER datasets and one POS tagging dataset. The results are as follows:
![Sequence labeling results](/img/c7/e593fa9bedfbb6a31ffeaca68d1572.png)
Honestly, one can only say it is a tiny bit better than the most basic LSTM+CRF model.
Summary
This paper proposes a function-level multi-task sharing mechanism: a Meta-LSTM dynamically generates the Basic-LSTM's parameter matrix at every time step.
After reading it, I kept wondering whether this dynamic parameter generation mechanism could be used for constituency parsing. For example, in a top-down chart-based model, a top-down Tree-LSTM could dynamically generate a parameter matrix for each tree node, which would then be used to predict the label and the split point.