Deep learning - LSTM Foundation
2022-07-05 03:44:00 【Guan Longxin】
1. RNN
Core idea: remember all past information, with no selectivity.
(1) Definition and characteristics
RNNs perform well on time-series data because, at time step t, the hidden state of time step t-1 is fed in as part of the input of the current time step.
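To make the recurrence concrete, here is a minimal sketch of one vanilla-RNN step in NumPy (the function and variable names are my own, not from any library):

```python
import numpy as np

# One vanilla-RNN step: the hidden state of step t-1 is fed back
# in alongside the input of step t. Names here are illustrative.
def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
```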
(2) Problems
- Long-term dependence: as the number of time steps grows, the RNN loses the ability to connect information that is far apart in the sequence.
- Vanishing gradients: vanishing and exploding gradients are caused by the repeated multiplication of the RNN's recurrent weight matrix across time steps.

LSTM solves the RNN long-term dependence problem because it introduces a gate mechanism to control which features flow through and which are discarded.
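To see why repeated multiplication by the recurrent matrix makes gradients vanish or explode, here is a toy numerical sketch (my own construction, with the recurrent matrix simplified to a scaled identity):

```python
import numpy as np

# Backpropagating through T time steps multiplies the gradient by
# (roughly) the recurrent matrix T times, so its norm shrinks toward 0
# or blows up depending on the matrix's largest singular value.
rng = np.random.default_rng(0)
for scale in (0.9, 1.1):          # spectral radius below vs. above 1
    W_hh = scale * np.eye(4)      # toy recurrent weight matrix
    grad = rng.normal(size=4)
    for _ in range(100):          # 100 time steps
        grad = W_hh.T @ grad
    print(scale, np.linalg.norm(grad))  # ~1e-5 (vanishing) vs. ~1e4 (exploding)
```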
2. LSTM
(1) Definition and characteristics
Core idea: add a memory cell and memorize selectively.
- Three gates: the forget gate, the input gate, and the output gate
- Two states: the cell state $c_t$ and the hidden state $h_t$
(2) Forward propagation
Selectively retain historical memory while absorbing new knowledge.
- Forget gate $f_t$:
① $f_t=\sigma(W_{xf}x_t+W_{hf}h_{t-1}+b_f)$
② Interpretation: $f_t$ uses the sigmoid function to select how much of the historical information $c_{t-1}$ to remember (or forget).
Intuitively, the brain's capacity is limited: when new information arrives, some weak historical memories must be selectively forgotten.
- Input gate $i_t$:
① $i_t=\sigma(W_{xi}x_t+W_{hi}h_{t-1}+b_i)$
Interpretation: $i_t$ uses the sigmoid function to select how much of the new information $g_t$ to learn.
② $g_t=\tanh(W_{xg}x_t+W_{hg}h_{t-1}+b_g)$
Not all new input is useful; we only need to remember the relevant parts.
- Cell state (historical memory) $c_t$:
① $c_t=f_t \odot c_{t-1}+i_t \odot g_t$
Interpretation: the new memory combines the previous memory with the newly learned information, where $f_t$ and $i_t$ act as filters on the historical memory and the new information respectively.
Selectively combining historical memory with new information forms a new memory.
- Output gate $o_t$:
① $o_t=\sigma(W_{xo}x_t+W_{ho}h_{t-1}+b_o)$
Interpretation: $o_t$ uses the sigmoid function to select how much of the memory $\tanh(c_t)$ to expose.
② $m_t=\tanh(c_t)$
Interpretation: $c_t$ is passed through $\tanh$ before the memory is used.
③ $h_t=o_t \odot m_t$; the resulting $h_t$ is both the output of this step and an input to the next time step $t+1$.
- Output $y_t$:
① $y_t = W_{yh}h_t+b_y$
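Putting the equations together, here is a sketch of one full LSTM forward step in NumPy, transcribing the formulas above (the parameter dictionary and function name are my own packaging, not a standard API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One LSTM forward step, following the equations above. `p` holds the
# weight matrices W_x*, W_h*, W_yh and biases b_* named as in the text.
def lstm_step(x_t, h_prev, c_prev, p):
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])   # forget gate
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])   # input gate
    g_t = np.tanh(p["W_xg"] @ x_t + p["W_hg"] @ h_prev + p["b_g"])   # candidate info
    c_t = f_t * c_prev + i_t * g_t               # long-term memory: filtered old + new
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])   # output gate
    m_t = np.tanh(c_t)
    h_t = o_t * m_t                              # short-term memory / hidden state
    y_t = p["W_yh"] @ h_t + p["b_y"]             # output
    return h_t, c_t, y_t
```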
(3) Interpretation
① The sigmoid gates $f_t$ and $i_t$ select what to remember from the historical information $c_{t-1}$ and what to learn from the new knowledge $g_t$:
$c_t=f_t \odot c_{t-1}+i_t \odot g_t$
② The sigmoid gate $o_t$ filters the historical memory $c_t$ into the short-term memory $h_t$:
$h_t=o_t \odot m_t$

The forward-propagation process as a whole: LSTM realizes long- and short-term memory through its three gates and two states. First the forget gate $f_t$ chooses what to keep from the historical information $c_{t-1}$; then the input gate $i_t$ selects what to learn from the new information $g_t$. Adding the two filtered parts yields the new historical memory $c_t$. Finally the output gate $o_t$ selectively exposes the historical memory to obtain the short-term memory $h_t$, and feeding $h_t$ into the output layer yields the output value $y_t$.
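In practice one would use a framework implementation of these same gate equations; for example, PyTorch's `nn.LSTMCell` performs the $f_t, i_t, g_t, o_t$ update per step, while the output layer $y_t=W_{yh}h_t+b_y$ is typically a separate linear layer. The sizes below are arbitrary example values:

```python
import torch
import torch.nn as nn

# nn.LSTMCell implements the gate equations described above.
cell = nn.LSTMCell(input_size=8, hidden_size=16)
out = nn.Linear(16, 3)             # separate output layer: y_t = W_yh h_t + b_y
x = torch.randn(10, 4, 8)          # 10 time steps, batch of 4
h_t = torch.zeros(4, 16)           # initial short-term memory h_0
c_t = torch.zeros(4, 16)           # initial long-term memory c_0
for t in range(10):
    h_t, c_t = cell(x[t], (h_t, c_t))
y_t = out(h_t)                     # output at the final step
print(y_t.shape)                   # torch.Size([4, 3])
```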