LSTM neural network

Long short-term memory (LSTM) is a variant of the recurrent neural network (RNN) that can effectively mitigate the gradient exploding and vanishing problems of simple recurrent networks.
The three gates of LSTM
The LSTM network introduces a gating mechanism to control the paths along which information is passed. The three gates are the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$. Their functions are:
(1) The input gate $i_t$ controls how much information of the current candidate state $\tilde{c}_t$ needs to be saved.
(2) The forget gate $f_t$ controls how much information of the previous internal state $c_{t-1}$ needs to be forgotten.
(3) The output gate $o_t$ controls how much information of the current internal state $c_t$ needs to be output to the external state $h_t$.
When $f_t = 0$ and $i_t = 1$, the memory cell clears the history and writes in the candidate state vector $\tilde{c}_t$; even then, the memory cell $c_t$ remains related to the history of the previous time step, because $\tilde{c}_t$ is computed from $h_{t-1}$. When $f_t = 1$ and $i_t = 0$, the memory cell copies the content of the previous time step and writes in no new information.
The "gates" of the LSTM network are "soft" gates: their values lie in (0, 1), meaning that information is let through in a certain proportion. The three gates are computed as:
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
where $\sigma$ is the logistic function, whose output range is (0, 1), $x_t$ is the input at the current time step, and $h_{t-1}$ is the external state at the previous time step.
The LSTM computation process
The figure below shows the structure of the LSTM recurrent unit. [Figure: LSTM recurrent unit structure]
The computation proceeds in three steps:
1) First, use the external state $h_{t-1}$ of the previous time step and the input $x_t$ of the current time step to compute the three gates, as well as the candidate state $\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$;
2) Combine the forget gate $f_t$ and the input gate $i_t$ to update the memory cell: $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$;
3) Combine the output gate $o_t$ to pass information from the internal state to the external state: $h_t = o_t \odot \tanh(c_t)$.
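Putting these three steps together, below is a minimal sketch of one LSTM time step written directly from the equations above. NumPy is used only for illustration; the function and parameter names (`lstm_step`, `W`, `U`, `b`) are my own, not from the original text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b are dicts of weight matrices /
    bias vectors keyed by gate name: 'i', 'f', 'o', 'c'."""
    # Step 1: three gates and the candidate state, all from x_t and h_{t-1}
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])
    # Step 2: update the memory cell (internal state)
    c_t = f_t * c_prev + i_t * c_tilde
    # Step 3: pass the internal state to the external state
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```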
Interpretation of the LSTM parameters in PyTorch
torch.nn.LSTM takes 7 main parameters:
1: input_size – the number of features in the input.
2: hidden_size – the size of the hidden layer (i.e., the number of hidden units); the dimension of the output vector equals the number of hidden units.
3: num_layers – the number of stacked LSTM layers; the default is 1. If set to 2, the second LSTM receives the results computed by the first: the first layer takes the inputs [x0, x1, x2, ..., xt] and computes [h0, h1, h2, ..., ht]; the second layer then takes [h0, h1, h2, ..., ht] as its inputs, recomputes, and outputs the final [h0, h1, h2, ..., ht].
4: bias – whether the layers use bias (offset) terms; the default is True.
5: batch_first – whether the first dimension of the input and output is batch_size; the default is False, i.e., the layout is (seq_len, batch, feature).
6: dropout – the default is 0. If nonzero, adds a dropout layer on the outputs of each LSTM layer except the last. The value is a probability between 0 and 1; 0 means no dropout.
7: bidirectional – whether the LSTM is bidirectional; the default is False. If True, num_directions = 2; otherwise num_directions = 1.
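A minimal usage sketch follows; the sizes (10 input features, 20 hidden units, sequence length 5, batch size 3) are arbitrary illustration values:

```python
import torch
import torch.nn as nn

# Two stacked layers: the second LSTM consumes the first layer's h-sequence
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# With batch_first=False (the default), the layout is (seq_len, batch, feature)
x = torch.randn(5, 3, 10)

output, (h_n, c_n) = lstm(x)
print(output.shape)            # torch.Size([5, 3, 20]): h_t at every time step
print(h_n.shape, c_n.shape)    # torch.Size([2, 3, 20]) each: final h and c per layer
```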
Why is it called long short-term memory? ("Long short-term memory" means a long "short-term memory".)
The hidden state $h$ of a recurrent neural network stores historical information and can be seen as a kind of memory. In a simple recurrent network, the hidden state is rewritten at every time step, so it can be regarded as a short-term memory. Long-term memory, in a neural network, can be regarded as the network parameters: they implicitly encode the experience learned from the training data, and their update cycle is much slower than that of short-term memory. In an LSTM network, the memory cell $c$ can capture a key piece of information at some time step and store it over a certain interval. The lifespan of information stored in the memory cell $c$ is longer than that of the short-term memory $h$, but much shorter than that of long-term memory; hence the name long short-term memory.
On vanishing gradients
In deep network parameter learning, parameters are usually initialized to small values. When training an LSTM network, however, values that are too small make the forget gate small as well, which means most of the information from the previous time step is lost; the network then has difficulty capturing long-range dependencies, and the gradients across adjacent time steps become very small, which leads to the vanishing gradient problem. The forget gate's parameters are therefore usually initialized to relatively large values: its bias vector $b_f$ is set to 1 or 2. A sketch of how this can be done follows below.
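Below is one way to apply this initialization to PyTorch's nn.LSTM; the helper name init_forget_gate_bias is my own. PyTorch concatenates the gate biases in the order (input, forget, cell, output), so the forget-gate entries are the second quarter of each bias tensor. Note also that PyTorch adds bias_ih and bias_hh, so filling both with 1.0 yields an effective forget-gate bias of 2.0:

```python
import torch
import torch.nn as nn

def init_forget_gate_bias(lstm: nn.LSTM, value: float = 1.0) -> None:
    """Fill the forget-gate slice of every bias tensor with `value`.

    nn.LSTM packs each bias as (4 * hidden_size,) in gate order
    (input, forget, cell, output), so the forget gate occupies
    the slice [hidden_size : 2 * hidden_size].
    """
    h = lstm.hidden_size
    with torch.no_grad():
        for name, param in lstm.named_parameters():
            if name.startswith("bias"):      # bias_ih_l* and bias_hh_l*
                param[h:2 * h].fill_(value)

lstm = nn.LSTM(input_size=10, hidden_size=20)
init_forget_gate_bias(lstm, 1.0)
```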