LSTM of RNN
2022-07-01 07:43:00 【Programming bear】
Basic recurrent neural networks are not only prone to vanishing or exploding gradients, they also struggle with long sequences; in other words, they only have short-term memory.
To overcome these shortcomings, the Long Short-Term Memory network (LSTM) was proposed. Compared with the basic RNN, LSTM has longer-lasting memory and is better at processing long sequences of signal data.
I. LSTM Principle
The structure of the basic RNN is shown in the figure: the state vector of the previous timestamp h𝑡-1 and the input of the current timestamp 𝒙𝑡 undergo a linear transformation and then pass through the tanh activation function to produce the new state vector h𝑡. Whereas the basic RNN has only the single state vector h𝑡, LSTM adds a new state vector 𝒄𝑡 and introduces a gating (Gate) mechanism, in which gate units control how information is forgotten and refreshed, as shown in the figure.


In LSTM there are two state vectors, 𝒄 and h: 𝒄 is the internal state vector of the LSTM, which can be understood as its memory (Memory) state vector, while h is the LSTM's output vector. That is, unlike the basic RNN, LSTM separates the internal Memory and the output into two variables, and it uses three gates, the input gate (Input Gate), the forget gate (Forget Gate), and the output gate (Output Gate), to control the internal flow of information.
The gating mechanism can be understood as a means of controlling how much data flows through, like a valve; the degree of opening is represented by a gate vector 𝒈. The 𝜎(𝒈) activation function compresses the gate value into the interval [0,1]: when 𝜎(𝒈) = 0 the gate is fully closed and the output is 𝒐 = 0; when 𝜎(𝒈) = 1 the gate is fully open and the output is 𝒐 = 𝒙. The gating mechanism therefore gives fine-grained control over how much data passes through.
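To make this concrete, here is a minimal sketch of sigmoid gating (the tensor values are made up for illustration): a very negative raw gate value closes the gate, a very positive one opens it, and the sigmoid output scales 𝒙 element-wise.
import tensorflow as tf

g = tf.constant([-10.0, 0.0, 10.0])  # raw gate values (hypothetical): closed, half-open, open
x = tf.constant([1.0, -2.0, 3.0])    # data to be gated
o = tf.sigmoid(g) * x                # sigmoid squashes g into [0, 1], then scales x element-wise
print(o.numpy())                     # approximately [0.0, -1.0, 3.0]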
1. Forget Gate
The forget gate acts on the LSTM state vector 𝒄 and controls how much the memory of the previous timestamp, 𝒄𝑡-1, influences the current timestamp.

The control variable of the forget gate, 𝒈𝑓, is produced from the current input 𝒙𝑡 and the previous output h𝑡-1:

𝒈𝑓 = 𝜎(𝑾𝑓[h𝑡-1, 𝒙𝑡] + 𝒃𝑓)

where 𝑾𝑓 and 𝒃𝑓 are the forget gate's parameters and 𝜎 is the Sigmoid function. When the gate 𝒈𝑓 = 1, the forget gate is fully open and the LSTM accepts all the information of the previous state 𝒄𝑡-1; when 𝒈𝑓 = 0, the forget gate is closed and the LSTM ignores 𝒄𝑡-1 entirely, outputting a zero vector. After the forget gate, the LSTM state vector becomes 𝒈𝑓⨀𝒄𝑡-1 (⨀ denotes the element-wise product).
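A minimal sketch of the forget gate computation (the weight names Wf, Uf, bf are hypothetical; splitting 𝑾𝑓 into an input part and a recurrent part is equivalent to the concatenated form above):
import tensorflow as tf

b, n_in, n_h = 2, 100, 64
xt   = tf.random.normal([b, n_in])          # current input x_t
ht_1 = tf.random.normal([b, n_h])           # previous output h_{t-1}
ct_1 = tf.random.normal([b, n_h])           # previous memory c_{t-1}
Wf = tf.random.normal([n_in, n_h]) * 0.1    # hypothetical forget-gate parameters
Uf = tf.random.normal([n_h, n_h]) * 0.1
bf = tf.zeros([n_h])
gf = tf.sigmoid(xt @ Wf + ht_1 @ Uf + bf)   # forget gate, element-wise in [0, 1]
c_forgotten = gf * ct_1                     # old memory scaled down by the gate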
2. Input Gate
The input gate controls how much of the new input the LSTM accepts.

First, the current input 𝒙𝑡 and the previous output h𝑡-1 undergo a nonlinear transformation to produce the new input vector 𝒄̃𝑡:

𝒄̃𝑡 = tanh(𝑾𝑐[h𝑡-1, 𝒙𝑡] + 𝒃𝑐)

where tanh is the activation function, which normalizes the input to the interval [-1,1]. 𝒄̃𝑡 is not written into the Memory in full; instead, the input gate controls how much of it is accepted. The control variable of the input gate, 𝒈𝑖, also comes from the input 𝒙𝑡 and the output h𝑡-1:

𝒈𝑖 = 𝜎(𝑾𝑖[h𝑡-1, 𝒙𝑡] + 𝒃𝑖)

The input gate variable 𝒈𝑖 determines the LSTM's acceptance of the new input 𝒄̃𝑡 at the current timestamp: when 𝒈𝑖 = 0, the LSTM accepts none of the new input; when 𝒈𝑖 = 1, it accepts all of it. After the input gate, the vector to be written into the Memory is 𝒈𝑖⨀𝒄̃𝑡.
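A matching sketch of the candidate input and the input gate (again with hypothetical weight names Wc/Uc/bc and Wi/Ui/bi):
import tensorflow as tf

b, n_in, n_h = 2, 100, 64
xt   = tf.random.normal([b, n_in])               # current input x_t
ht_1 = tf.random.normal([b, n_h])                # previous output h_{t-1}
Wc, Uc, bc = (tf.random.normal([n_in, n_h]) * 0.1,
              tf.random.normal([n_h, n_h]) * 0.1, tf.zeros([n_h]))
Wi, Ui, bi = (tf.random.normal([n_in, n_h]) * 0.1,
              tf.random.normal([n_h, n_h]) * 0.1, tf.zeros([n_h]))
c_tilde = tf.tanh(xt @ Wc + ht_1 @ Uc + bc)      # candidate input, in [-1, 1]
gi = tf.sigmoid(xt @ Wi + ht_1 @ Ui + bi)        # input gate, in [0, 1]
to_write = gi * c_tilde                          # portion actually written into Memory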
Under the control of the forget gate and the input gate, the LSTM selectively reads the memory of the previous timestamp, 𝒄𝑡-1, and the new input of the current timestamp, 𝒄̃𝑡. The state vector is refreshed as:

𝒄𝑡 = 𝒈𝑖⨀𝒄̃𝑡 + 𝒈𝑓⨀𝒄𝑡-1

The new state vector 𝒄𝑡 obtained in this way is the state vector of the current timestamp.
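In code, the refresh is a single line once the gates and the candidate input are available (a sketch with stand-in values so it runs on its own):
import tensorflow as tf

b, n_h = 2, 64
gf, gi  = tf.random.uniform([b, n_h]), tf.random.uniform([b, n_h])  # stand-in gate values in [0, 1]
ct_1    = tf.random.normal([b, n_h])                                # previous memory c_{t-1}
c_tilde = tf.tanh(tf.random.normal([b, n_h]))                       # stand-in candidate input
ct = gi * c_tilde + gf * ct_1    # refreshed memory of the current timestamp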
3. Output Gate
Unlike in the basic RNN, the LSTM's internal state vector 𝒄𝑡 is not used directly as the output. In the basic RNN, the state vector serves both as memory and as output, so the basic RNN can be understood as having its state vector 𝒄 and its output vector be the same object. Inside the LSTM, the state vector is not output in full; rather, it is output selectively under the control of the output gate.

The gating variable of the output gate, 𝒈𝑜, is given by:

𝒈𝑜 = 𝜎(𝑾𝑜[h𝑡-1, 𝒙𝑡] + 𝒃𝑜)

When the output gate 𝒈𝑜 = 0, the output is closed: the LSTM's internal memory is completely blocked from the output, and the output h𝑡 is a zero vector. When 𝒈𝑜 = 1, the output is fully open, and the LSTM's state vector 𝒄𝑡 is used entirely for the output. The LSTM's output is produced by:

h𝑡 = 𝒈𝑜⨀tanh(𝒄𝑡)

that is, the memory vector 𝒄𝑡 passes through the tanh activation function and is then multiplied element-wise by the output gate. Since 𝒈𝑜 ∈ [0,1] and tanh(𝒄𝑡) ∈ [-1,1], the LSTM output h𝑡 ∈ [-1,1].
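Putting the three gates together, a full LSTM timestamp can be sketched end to end. The parameter names below (one W/U/b triple per gate plus the candidate) are made up for illustration; a framework implementation such as Keras typically fuses them into larger matrices, but the arithmetic is the same:
import tensorflow as tf

def lstm_step(xt, ht_1, ct_1, p):
    # p is a dict of hypothetical parameters: W* act on x_t, U* act on h_{t-1}, b* are biases
    gf = tf.sigmoid(xt @ p['Wf'] + ht_1 @ p['Uf'] + p['bf'])     # forget gate
    gi = tf.sigmoid(xt @ p['Wi'] + ht_1 @ p['Ui'] + p['bi'])     # input gate
    go = tf.sigmoid(xt @ p['Wo'] + ht_1 @ p['Uo'] + p['bo'])     # output gate
    c_tilde = tf.tanh(xt @ p['Wc'] + ht_1 @ p['Uc'] + p['bc'])   # candidate input
    ct = gf * ct_1 + gi * c_tilde    # refresh the memory
    ht = go * tf.tanh(ct)            # selectively output, in [-1, 1]
    return ht, ct

b, n_in, n_h = 2, 100, 64
p = {}
for k in ('f', 'i', 'o', 'c'):
    p['W' + k] = tf.random.normal([n_in, n_h]) * 0.1
    p['U' + k] = tf.random.normal([n_h, n_h]) * 0.1
    p['b' + k] = tf.zeros([n_h])
ht, ct = lstm_step(tf.random.normal([b, n_in]), tf.zeros([b, n_h]), tf.zeros([b, n_h]), p)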
II. LSTM Implementation
In TensorFlow, there are likewise two ways to build an LSTM network: you can use LSTMCell and run the loop over timestamps manually, or use the LSTM layer to complete the forward pass in one step.
1. LSTMCell
LSTMCell is used in the same way as SimpleRNNCell; the difference is that the LSTM state variable is a List with two elements, namely [h𝑡, 𝒄𝑡], which must be initialized separately, where the first element of the List is h𝑡 and the second is 𝒄𝑡. When the cell is called to perform the forward computation, it returns two elements: the first is the cell's output h𝑡, and the second is the cell's updated state List [h𝑡, 𝒄𝑡].
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([2, 80, 100])  # 2 sequences, 80 timestamps, 100 features each
cell = layers.LSTMCell(64)  # create an LSTM Cell with 64 units
# initialize the state List [h0, c0]
state = [tf.zeros([2, 64]), tf.zeros([2, 64])]
# forward computation, one timestamp at a time
for xt in tf.unstack(x, axis=1):
    out, state = cell(xt, state)
The returned output out and the first element h𝑡 of the state List are the same.
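A quick self-contained check of this (a sketch with a single timestamp):
import tensorflow as tf
from tensorflow.keras import layers

cell = layers.LSTMCell(64)
state = [tf.zeros([2, 64]), tf.zeros([2, 64])]
out, state = cell(tf.random.normal([2, 100]), state)
print(tf.reduce_all(tf.equal(out, state[0])).numpy())  # True: out and state[0] are both h_t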
2. LSTM Layer
After forward propagation through an LSTM layer, only the output of the last timestamp is returned by default; if the output at every timestamp is needed, set the return_sequences=True flag. For a multi-layer network, multiple LSTM layers can be wrapped in a Sequential container, with return_sequences=True set on every non-final layer, because each subsequent LSTM layer needs the outputs at all timestamps from the previous layer as its input, as in the example below.
import tensorflow as tf
from tensorflow.keras import layers, Sequential

x = tf.random.normal([2, 80, 100])
net = Sequential([
    layers.LSTM(64, return_sequences=True),  # non-final layer returns the output at every timestamp
    layers.LSTM(64)
])
# one pass through the network yields the last layer's output at the last timestamp
out = net(x)
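The effect of return_sequences on the output shape can be seen directly (a sketch):
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([2, 80, 100])
print(layers.LSTM(64)(x).shape)                          # (2, 64): last timestamp only
print(layers.LSTM(64, return_sequences=True)(x).shape)   # (2, 80, 64): every timestamp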