[5-Minute Paper] Pointer Network
2022-07-26 15:03:00 [小小何先生]
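What problem does it solve?
The paper proposes a network architecture that learns positional relationships within the input sequence.
Background
Learning positional relationships within an input sequence can be viewed as a seq2seq problem in which the output sequence has the same length as the input sequence, and that length varies from instance to instance. Such a model can be applied to sorting variables or to combinatorial optimization problems.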
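To make "learning positions" concrete, here is a minimal sketch of how sorting can be framed as a pointing task: the target at each output step is an index into the input rather than a token from a fixed vocabulary. The helper name `sorting_targets` is hypothetical and not taken from the paper.

```python
import numpy as np

def sorting_targets(values):
    """Return, for each output step, the index of the input element to point at."""
    # Assumption: ascending order; a stable sort keeps ties in input order.
    return np.argsort(values, kind="stable")

values = np.array([0.7, 0.1, 0.4])
targets = sorting_targets(values)
print(targets)          # [1 2 0]
print(values[targets])  # [0.1 0.4 0.7]
```

The number of possible targets equals the input length, so it changes with every instance.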
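The paper was first published in 2015. Before it, recurrent-network approaches mostly fixed the output dictionary a priori, so the number of output classes had to be known in advance and could not depend on the length of the input.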
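What method is used?
Earlier work used attention to encode the encoder's hidden states of the input. The Ptr-Net proposed by the authors instead uses attention to make the selection itself, that is, to pick a position in the input.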

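The core of the architecture in the paper is the decoder. There are two main points:
- Unlike earlier seq2seq models, the decoder input at each time step is the input of the node selected at the previous time step, i.e. the same embedding of the raw coordinates that was fed into the encoder.
- When computing the decoder output, the attention is computed as follows: the inputs are a context of shape [bs, hidden_dim, seq_len] and an input of shape [bs, hidden_dim]. Here, input is the output the LSTM produces from the decoder input and the hidden state.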
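The first point can be sketched as a simple gather. This is an illustration only: the function name `next_decoder_input` and the [bs, seq_len, embed_dim] layout of the embedded inputs are assumptions, not details from the paper.

```python
import numpy as np

def next_decoder_input(embedded_inputs, selected_idx):
    """Pick, per batch element, the embedded input of the node chosen last step.

    embedded_inputs: [bs, seq_len, embed_dim] (assumed layout)
    selected_idx:    [bs] integer positions chosen at the previous step
    returns:         [bs, embed_dim]
    """
    batch = np.arange(embedded_inputs.shape[0])
    return embedded_inputs[batch, selected_idx]

rng = np.random.default_rng(0)
embedded_inputs = rng.normal(size=(2, 5, 8))
selected_idx = np.array([3, 0])
dec_in = next_decoder_input(embedded_inputs, selected_idx)
print(dec_in.shape)  # (2, 8)
```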
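context goes through a 1D convolution, which (for kernel size 1) amounts to multiplying by a weight matrix $W_{1}$, and yields a matrix of unchanged shape [bs, hidden_dim, seq_len]. input goes through a weight matrix $W_{2}$ and is then expanded to [bs, hidden_dim, seq_len]. The two matrices are added, passed through tanh, and multiplied by a learnable variable $V$ of shape [bs, 1, hidden_dim] (written $v$ in the formula), which gives an attention matrix of shape [bs, seq_len]. In terms of the formula in the paper: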
$$u_{j}^{i} = v^{T} \tanh(W_{1} e_{j} + W_{2} d_{i}), \quad j \in (1, \cdots, n)$$
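A minimal NumPy sketch of this score computation, using the tensor shapes described above. The function name `pointer_scores` is hypothetical, and `v` is assumed to be a single [hidden_dim] vector shared across the batch rather than a [bs, 1, hidden_dim] tensor.

```python
import numpy as np

def pointer_scores(context, query, W1, W2, v):
    """Compute u[b, j] = v^T tanh(W1 e_j + W2 d_i) for every input position j.

    context: [bs, hidden_dim, seq_len]  (columns are the e_j)
    query:   [bs, hidden_dim]           (d_i, the decoder LSTM output)
    W1, W2:  [hidden_dim, hidden_dim]
    v:       [hidden_dim]
    returns: [bs, seq_len]
    """
    ctx = np.einsum("hk,bks->bhs", W1, context)  # W1 e_j for all j
    q = (query @ W2.T)[:, :, None]               # W2 d_i, broadcast over seq_len
    return np.einsum("h,bhs->bs", v, np.tanh(ctx + q))

rng = np.random.default_rng(0)
bs, hidden_dim, seq_len = 2, 4, 5
context = rng.normal(size=(bs, hidden_dim, seq_len))
query = rng.normal(size=(bs, hidden_dim))
W1 = rng.normal(size=(hidden_dim, hidden_dim))
W2 = rng.normal(size=(hidden_dim, hidden_dim))
v = rng.normal(size=hidden_dim)
u = pointer_scores(context, query, W1, W2, v)
print(u.shape)  # (2, 5)
```

Applying `W1` at every position independently is what a 1D convolution with kernel size 1 does.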
A softmax layer then gives the attention matrix:
$$a_{j}^{i} = \mathrm{softmax}(u_{j}^{i}), \quad j \in (1, \cdots, n)$$
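As a small sketch, the softmax runs over the sequence axis so that each row is a distribution over input positions. Taking the argmax as the selected position is an assumption here (greedy decoding); the function name is hypothetical.

```python
import numpy as np

def softmax_over_positions(u):
    """u: [bs, seq_len] scores -> a: [bs, seq_len], each row summing to 1."""
    shifted = u - u.max(axis=1, keepdims=True)  # subtract row max for stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

u = np.array([[2.0, 0.5, -1.0],
              [0.0, 0.0, 0.0]])
a = softmax_over_positions(u)
pointer = a.argmax(axis=1)  # assumed greedy choice of the pointed-to position
print(a.sum(axis=1))  # each row sums to 1
print(pointer)        # [0 0]
```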
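This [bs, seq_len] attention matrix is then multiplied with the context matrix of shape [bs, hidden_dim, seq_len] to give a hidden-layer output of shape [bs, hidden_dim], which is used as the LSTM hidden state at the next time step.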
$$d_{i}^{\prime} = \sum_{j=1}^{n} a_{j}^{i} e_{j}$$
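The weighted sum can be sketched as a single contraction over the sequence axis. The function name `attend` and the toy values are illustrative assumptions.

```python
import numpy as np

def attend(context, a):
    """d'[b] = sum_j a[b, j] * e_j.

    context: [bs, hidden_dim, seq_len]
    a:       [bs, seq_len] attention weights
    returns: [bs, hidden_dim]
    """
    return np.einsum("bhs,bs->bh", context, a)

context = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
a = np.array([[0.0, 0.0, 1.0, 0.0],       # one-hot: copies e_2
              [0.25, 0.25, 0.25, 0.25]])  # uniform: averages all e_j
d_prime = attend(context, a)
print(d_prime.shape)  # (2, 3)
```

With a one-hot row the result is exactly the pointed-to column of context; with uniform weights it is the mean over positions.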
What results were achieved?
Publication information? Author information?