LSTM of RNN
2022-07-01 07:43:00 【Programming bear】
Recurrent neural networks are not only prone to vanishing or exploding gradients, they also struggle with long sentences; in other words, they only have short-term memory (Short-term memory).
To overcome these shortcomings, the Long Short-Term Memory network (LSTM) was proposed. Compared with the basic RNN, the LSTM network has longer-lasting memory and is better at processing long sequence signal data.
I. LSTM Principle
The structure of the basic RNN network is shown in the figure: the state vector h𝑡-1 of the previous timestamp and the input 𝒙𝑡 of the current timestamp pass through a linear transformation followed by the 𝑡𝑎𝑛ℎ activation function to produce the new state vector h𝑡. Whereas the basic RNN network has only this one state vector h𝑡, LSTM adds a new state vector 𝒄𝑡 and introduces a gate (Gate) mechanism, so that the forgetting and refreshing of information are controlled by gating units, as pictured.


In LSTM there are two state vectors, 𝒄 and h, where 𝒄 is the internal state vector of the LSTM, which can be understood as its memory (Memory) state vector, and h is the output vector of the LSTM. Compared with the basic RNN, LSTM separates the internal memory and the output into two variables, and uses three gates, the input gate (Input Gate), the forget gate (Forget Gate) and the output gate (Output Gate), to control the flow of internal information.
The gate mechanism can be understood as a means of controlling how much data flows through, like a valve: in LSTM, the opening degree of the valve is represented by a gate control vector 𝒈.
The activation function 𝜎(𝒈) compresses the gate value into the interval [0,1]: when 𝜎(𝒈) = 0 the gate is fully closed and the output is 𝒐 = 0; when 𝜎(𝒈) = 1 the gate is fully open and the output is 𝒐 = 𝒙. In other words, the gated output is 𝒐 = 𝜎(𝒈)⊙𝒙, which gives fine-grained control over how much data flows through.
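To make the valve analogy concrete, here is a minimal runnable sketch (the tensors x and g below are made-up example values, not from the original):

import tensorflow as tf

x = tf.constant([1.0, -2.0, 3.0])    # the data flowing through the valve
g = tf.constant([-10.0, 0.0, 10.0])  # gate control vector (pre-activation)
o = tf.sigmoid(g) * x                # σ(g) ≈ 0 blocks, σ(g) ≈ 1 passes through
print(o.numpy())                     # ≈ [ 0.  -1.   3.]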
1. Forget gate
The forget gate acts on the LSTM state vector 𝒄 and controls how much of the memory 𝒄𝑡-1 of the previous timestamp influences the current timestamp.

The gate control variable 𝒈𝑓 of the forget gate is produced from the output h𝑡-1 of the previous timestamp and the input 𝒙𝑡 of the current timestamp:

𝒈𝑓 = 𝜎(𝑾𝑓[h𝑡-1, 𝒙𝑡] + 𝒃𝑓)

where 𝑾𝑓 and 𝒃𝑓 are the parameters of the forget gate. When the gate 𝒈𝑓 = 1, the forget gate is fully open and the LSTM accepts all information from the previous state 𝒄𝑡-1; when the gate 𝒈𝑓 = 0, the forget gate is closed and the LSTM ignores 𝒄𝑡-1 entirely, outputting a zero vector. After passing through the forget gate, the LSTM state vector becomes 𝒈𝑓⊙𝒄𝑡-1.
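A minimal runnable sketch of the forget-gate computation; the shapes and the weight names W_f, b_f here are illustrative assumptions, not from the original:

import tensorflow as tf

h_prev = tf.random.normal([2, 64])   # h_{t-1}: batch of 2, state size 64
x_t = tf.random.normal([2, 100])     # x_t: current input, 100 features
c_prev = tf.random.normal([2, 64])   # c_{t-1}: previous memory
W_f = tf.random.normal([164, 64])    # assumed weights acting on [h_{t-1}, x_t]
b_f = tf.zeros([64])
g_f = tf.sigmoid(tf.concat([h_prev, x_t], axis=1) @ W_f + b_f)  # gate in (0,1)
c_kept = g_f * c_prev                # memory that survives the forget gate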
2. Input gate
The input gate controls how much of the new input the LSTM accepts.

First, the input 𝒙𝑡 of the current timestamp and the output h𝑡-1 of the previous timestamp undergo a nonlinear transformation to obtain the new input vector c̃𝑡:

c̃𝑡 = tanh(𝑾𝑐[h𝑡-1, 𝒙𝑡] + 𝒃𝑐)

where tanh is the activation function, which normalizes the input into the interval [-1,1].
Not all of c̃𝑡 is refreshed into the LSTM Memory; the input gate controls how much of it is accepted. The gate control variable 𝒈𝑖 of the input gate likewise comes from the input 𝒙𝑡 and the output h𝑡-1:

𝒈𝑖 = 𝜎(𝑾𝑖[h𝑡-1, 𝒙𝑡] + 𝒃𝑖)

The input gate variable 𝒈𝑖 determines how much of the new input c̃𝑡 the LSTM accepts at the current timestamp: when 𝒈𝑖 = 0, the LSTM accepts no new input; when 𝒈𝑖 = 1, the LSTM accepts all of it. After passing through the input gate, the vector to be written into Memory is 𝒈𝑖⊙c̃𝑡.
Under the control of the forget gate and the input gate, the LSTM selectively reads the memory 𝒄𝑡-1 of the previous timestamp and the input c̃𝑡 of the current timestamp. The state vector is refreshed by

𝒄𝑡 = 𝒈𝑓⊙𝒄𝑡-1 + 𝒈𝑖⊙c̃𝑡

and the new state vector 𝒄𝑡 obtained this way is the state vector of the current timestamp.
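Putting the forget gate and the input gate together, a runnable sketch of the state refresh (again with hypothetical, randomly initialized weights):

import tensorflow as tf

h_prev = tf.random.normal([2, 64])    # h_{t-1}
x_t = tf.random.normal([2, 100])      # x_t
c_prev = tf.random.normal([2, 64])    # c_{t-1}
hx = tf.concat([h_prev, x_t], axis=1) # [h_{t-1}, x_t]

def branch(activation):               # one randomly initialized affine map per gate
    return activation(hx @ tf.random.normal([164, 64]) + tf.zeros([64]))

g_f = branch(tf.sigmoid)              # forget gate
g_i = branch(tf.sigmoid)              # input gate
c_tilde = branch(tf.tanh)             # candidate input c̃_t
c_t = g_f * c_prev + g_i * c_tilde    # refreshed state vector c_t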
3. Output gate
Unlike in the basic RNN, the internal state vector 𝒄𝑡 of the LSTM is not used directly as the output. In the basic RNN the state vector serves both as memory and as output, so the basic RNN can be understood as having its state vector 𝒄 and its output vector be the same object. Inside the LSTM, the state vector is not output in full; instead it is output selectively under the action of the output gate.

The gate control variable 𝒈𝑜 of the output gate is:

𝒈𝑜 = 𝜎(𝑾𝑜[h𝑡-1, 𝒙𝑡] + 𝒃𝑜)

When the output gate 𝒈𝑜 = 0, the output is closed: the LSTM internal memory is completely cut off and cannot be used as output, so the output is a zero vector. When 𝒈𝑜 = 1, the output is fully open and the LSTM state vector 𝒄𝑡 is used entirely for the output. The LSTM output h𝑡 is produced by

h𝑡 = 𝒈𝑜⊙tanh(𝒄𝑡)

that is, the memory vector 𝒄𝑡 passes through the tanh activation function and is then scaled by the output gate. Since 𝒈𝑜 ∈ [0,1] and tanh(𝒄𝑡) ∈ [-1,1], the LSTM output h𝑡 ∈ [-1,1].
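A quick numerical check of the output equation and the claimed range, with randomly initialized placeholder weights W_o, b_o:

import tensorflow as tf

h_prev = tf.random.normal([2, 64])    # h_{t-1}
x_t = tf.random.normal([2, 100])      # x_t
c_t = tf.random.normal([2, 64])       # stand-in for the updated memory c_t
hx = tf.concat([h_prev, x_t], axis=1)
W_o = tf.random.normal([164, 64])     # assumed output-gate weights
b_o = tf.zeros([64])
g_o = tf.sigmoid(hx @ W_o + b_o)      # output gate in (0,1)
h_t = g_o * tf.tanh(c_t)              # LSTM output
# every element of h_t lies in [-1, 1]:
print(tf.reduce_min(h_t).numpy() >= -1.0, tf.reduce_max(h_t).numpy() <= 1.0)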
II. LSTM Implementation
In TensorFlow there are likewise two ways to implement an LSTM network. You can use LSTMCell and run the recurrence over the timestamps manually, or you can complete the forward computation in one step through the LSTM layer.
1. LSTMCell
LSTMCell is used in the same way as SimpleRNNCell; the difference is that the LSTM state variable List has two elements, namely [h𝑡, 𝒄𝑡], which need to be initialized separately: the first element of the List is h𝑡 and the second is 𝒄𝑡. When the cell is called to complete one forward step, it returns two elements: the first is the cell output h𝑡, and the second is the cell's updated state List: [h𝑡, 𝒄𝑡].
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([2, 80, 100])  # 2 sequences, 80 timestamps, 100 features each
cell = layers.LSTMCell(64)  # create an LSTM cell with 64 units
# initialize the state [h, c]
state = [tf.zeros([2, 64]), tf.zeros([2, 64])]
# forward computation over the timestamps
for xt in tf.unstack(x, axis=1):
    out, state = cell(xt, state)
The returned output out and the first element h𝑡 of the state List are the same.
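This can be verified directly by continuing the code above:

# out and state[0] are the same tensor h_t
print(bool(tf.reduce_all(tf.equal(out, state[0]))))  # True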
2. LSTM layer
After forward propagation through an LSTM layer, by default only the output of the last timestamp is returned. If you need the output at every timestamp, set the return_sequences=True flag. For a multi-layer network, you can wrap multiple LSTM layers in a Sequential container and set return_sequences=True on every non-final layer, because each non-final LSTM layer must pass the outputs of all timestamps to the next layer as its input:
import tensorflow as tf
from tensorflow.keras import layers, Sequential

x = tf.random.normal([2, 80, 100])
net = Sequential([
    layers.LSTM(64, return_sequences=True),  # non-final layer returns all timestamp outputs
    layers.LSTM(64)
])
# one pass through the network yields the last layer's output at the last timestamp
out = net(x)
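As a sanity check on the shapes, continuing the code above:

print(out.shape)  # (2, 64): the final layer returns only the last timestamp
# if the final layer also had return_sequences=True, out would instead
# have shape (2, 80, 64): one 64-dimensional output per timestamp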