Deep Learning - LSTM Fundamentals
2022-07-05 03:44:00 【Guan Longxin】
1. RNN
An RNN remembers all of the information it has seen.
(1) Definition and characteristics
RNNs perform well on time-series data because, at time step t, an RNN takes the hidden state from time step t-1 as part of the input to the current time step.
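The recurrence described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the article does not give the RNN formula, so the update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) is the commonly used form and is assumed here, and the names `rnn_step`, `W_xh`, `W_hh`, `b_h` and all numeric values are hypothetical.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step (assumed form): h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        pre += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t

# Toy sizes: 2 input features, 3 hidden units (arbitrary illustrative values).
W_xh = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
W_hh = [[0.1, 0.0, 0.2], [0.0, 0.1, -0.1], [0.3, -0.2, 0.1]]
b_h = [0.0, 0.0, 0.0]

h = [0.0, 0.0, 0.0]  # initial hidden state
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = []
for x_t in sequence:
    # The hidden state from step t-1 is fed back in at step t.
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    states.append(h)
```

Each entry of `states` depends on every earlier input, because the previous hidden state is always part of the next step's input.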
(2) Problems
- Long-term dependencies: as the number of time steps grows, an RNN loses the ability to connect the current step with information from far back in the sequence.
- Vanishing gradients: both vanishing and exploding gradients are caused by the repeated multiplication by the RNN's recurrent weight matrix.
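The effect of repeated multiplication can be illustrated with a scalar toy case. The sketch below assumes a simplified linear recurrence h_t = w * h_{t-1} (no activation, a single weight), which is not the full RNN but shows the mechanism: the gradient of h_T with respect to h_0 is w multiplied by itself T times. The function name `gradient_through_time` and the chosen values are hypothetical.

```python
def gradient_through_time(w, steps):
    """Gradient d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # one factor of the recurrent weight per time step
    return grad

# |w| < 1: the gradient shrinks toward zero (vanishing gradient).
shrinking = gradient_through_time(0.5, 50)
# |w| > 1: the gradient grows without bound (exploding gradient).
growing = gradient_through_time(1.5, 50)
```

In a real RNN the scalar `w` is replaced by the recurrent weight matrix together with the activation's derivative, but the same repeated product appears.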
LSTM can address the RNN's long-term dependency problem because it introduces a gate mechanism that controls which features are passed on and which are discarded.
2. LSTM
(1) Definition and characteristics
An LSTM adds a memory cell so that it can remember selectively.
- Three gates: forget gate, input gate, output gate
- Two states: cell state $c_t$ and hidden state $h_t$
(2) Forward propagation
The forward pass selectively retains historical memory and absorbs new information.
- Forget gate $f_t$:
① $f_t=\sigma(W_{xf}x_t+W_{hf}h_{t-1}+b_f)$
② Interpretation: $f_t$ uses a sigmoid function to choose how much of the historical information $c_{t-1}$ to keep and how much to forget.
By analogy, the brain's capacity is limited, so when new information arrives we need to selectively forget some weak historical memories.
- Input gate $i_t$:
① $i_t=\sigma(W_{xi}x_t+W_{hi}h_{t-1}+b_i)$
Interpretation: $i_t$ uses a sigmoid function to choose how much of the new information $g_t$ to learn.
② $g_t=\tanh(W_{xg}x_t+W_{hg}h_{t-1}+b_g)$
Not all new input is useful; only the relevant part needs to be remembered.
- Historical memory (cell state) $c_t$:
① $c_t=f_t \odot c_{t-1}+g_t \odot i_t$
Interpretation: the new memory is made up of the previous memory and the newly learned information, where $f_t$ and $i_t$ filter the historical memory and the new information respectively.
Selectively combining historical memory with new information forms the new memory.
- Output gate $o_t$:
① $o_t=\sigma(W_{xo}x_t+W_{ho}h_{t-1}+b_o)$
Interpretation: $o_t$ uses a sigmoid function to choose how much of the memory $\tanh(c_t)$ to use.
② $m_t=\tanh(c_t)$
Interpretation: the historical memory $c_t$ is passed through $\tanh$ before it is used.
③ $h_t=o_t \odot m_t$
The resulting $h_t$ is emitted as output and is also passed on to the next time step $t+1$.
- Output $y_t$:
① $y_t = W_{yh}h_t+b_y$
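The forward-propagation equations above can be put together as one step function. The following is a minimal sketch in plain Python rather than a reference implementation: the helper names `sigmoid`, `affine`, `const_matrix` and `lstm_step`, the dictionary layout of the parameters, the layer sizes, and the constant weight values are all assumptions made for illustration. Only the equations themselves come from the text.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W_x, x, W_h, h, b):
    """Compute W_x x + W_h h + b for list-based matrices and vectors."""
    return [
        b[i]
        + sum(W_x[i][j] * x[j] for j in range(len(x)))
        + sum(W_h[i][k] * h[k] for k in range(len(h)))
        for i in range(len(b))
    ]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM forward step following the gate equations in the text."""
    f_t = [sigmoid(z) for z in affine(p["W_xf"], x_t, p["W_hf"], h_prev, p["b_f"])]
    i_t = [sigmoid(z) for z in affine(p["W_xi"], x_t, p["W_hi"], h_prev, p["b_i"])]
    g_t = [math.tanh(z) for z in affine(p["W_xg"], x_t, p["W_hg"], h_prev, p["b_g"])]
    o_t = [sigmoid(z) for z in affine(p["W_xo"], x_t, p["W_ho"], h_prev, p["b_o"])]
    # c_t = f_t * c_{t-1} + g_t * i_t (element-wise)
    c_t = [f * c + g * i for f, c, g, i in zip(f_t, c_prev, g_t, i_t)]
    # m_t = tanh(c_t), h_t = o_t * m_t (element-wise)
    m_t = [math.tanh(c) for c in c_t]
    h_t = [o * m for o, m in zip(o_t, m_t)]
    # y_t = W_yh h_t + b_y
    y_t = [
        p["b_y"][r] + sum(p["W_yh"][r][k] * h_t[k] for k in range(len(h_t)))
        for r in range(len(p["b_y"]))
    ]
    return h_t, c_t, y_t

def const_matrix(rows, cols, value):
    return [[value] * cols for _ in range(rows)]

# Hypothetical toy sizes and constant weights, chosen only for illustration.
n_in, n_hid, n_out = 2, 3, 1
params = {}
for gate in "figo":
    params["W_x" + gate] = const_matrix(n_hid, n_in, 0.1)
    params["W_h" + gate] = const_matrix(n_hid, n_hid, 0.1)
    params["b_" + gate] = [0.0] * n_hid
params["W_yh"] = const_matrix(n_out, n_hid, 0.5)
params["b_y"] = [0.0] * n_out

h, c = [0.0] * n_hid, [0.0] * n_hid
for x_t in [[1.0, 0.5], [0.2, -0.3]]:
    h, c, y = lstm_step(x_t, h, c, params)  # h and c carry over to the next step
```

Plain lists keep the sketch dependency-free; in practice the same equations would normally be written with an array library so that the matrix products are vectorized.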
(3) Understanding
① The sigmoid gates $f_t$ and $i_t$ selectively retain the historical information $c_{t-1}$ and selectively learn the new information $g_t$:
$c_t=f_t \odot c_{t-1}+g_t \odot i_t$
② The sigmoid gate $o_t$ filters the historical memory $c_t$ to produce the short-term memory $h_t$:
$h_t=o_t \odot m_t$
The forward propagation process:
An LSTM realizes long-term and short-term memory through three gates and two states. First, the forget gate $f_t$ selects which historical information $c_{t-1}$ to remember, and the input gate $i_t$ selects which new information $g_t$ to learn. The old and new memories that pass this screening are added to give the new historical memory $c_t$. Finally, the output gate $o_t$ selectively reads the historical memory to give the short-term memory $h_t$, which is fed to the output layer to produce the output value $y_t$.
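To see why the additive update $c_t=f_t \odot c_{t-1}+g_t \odot i_t$ supports long-term memory, it can help to fix the gate values by hand. The sketch below is an idealized scalar illustration: in a real LSTM the gates are sigmoid outputs and never reach exactly 0 or 1, and the function name `run_cell` and the numbers used are hypothetical.

```python
def run_cell(c0, gates):
    """Apply c_t = f_t * c_{t-1} + g_t * i_t for scalar (f_t, i_t, g_t) triples."""
    c = c0
    for f_t, i_t, g_t in gates:
        c = f_t * c + g_t * i_t
    return c

# Forget gate fully open (f=1), input gate closed (i=0):
# the old memory is carried through 100 steps unchanged.
kept = run_cell(0.8, [(1.0, 0.0, 0.5)] * 100)

# Forget gate closed (f=0), input gate fully open (i=1):
# the old memory is dropped and replaced by the new information g_t.
replaced = run_cell(0.8, [(0.0, 1.0, 0.5)] * 100)
```

Here `kept` stays at the initial value and `replaced` takes the value of `g_t`, which mirrors the "remember the old, learn the new" description above.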