
Deep learning - LSTM Foundation

2022-07-05 03:44:00 Guan Longxin


1. RNN

An RNN tries to remember all past information.
(1) Definition and characteristics
RNN performs well on time-series data because, at time step t, it takes the hidden state of time step t-1 as part of the input to the current time step.
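To make the recurrence concrete, here is a minimal NumPy sketch of one RNN step and a short unrolled loop; the weight names (W_xh, W_hh, b_h) and the toy dimensions are illustrative assumptions, not from the original article.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One RNN time step: the hidden state of step t-1 is fed back as part of the input at step t."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy dimensions: input size 4, hidden size 3 (assumed for illustration).
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)
for x_t in rng.normal(size=(5, 4)):  # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```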
(2) Problems

  1. Long-term dependencies: as the number of time steps grows, the RNN loses the ability to connect information across distant time steps.
  2. Vanishing gradients: vanishing and exploding gradients are caused by the repeated multiplication of the RNN's recurrent weight matrix (a numerical sketch follows this list).
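A rough numerical illustration of this point: backpropagation through time multiplies the gradient by the recurrent weight matrix once per step, so the gradient shrinks or blows up exponentially. The matrices below are arbitrary examples, and activation derivatives are ignored for simplicity.

```python
import numpy as np

W_small = 0.5 * np.eye(3)  # recurrent matrix with spectral radius < 1
W_large = 1.5 * np.eye(3)  # recurrent matrix with spectral radius > 1

for name, W in [("vanishing", W_small), ("exploding", W_large)]:
    g = np.ones(3)               # gradient arriving at the last time step
    for _ in range(50):          # backpropagate through 50 time steps
        g = W.T @ g              # each step multiplies by the recurrent weight matrix
    print(name, np.linalg.norm(g))
# The "vanishing" norm collapses toward 0, the "exploding" norm grows huge.
```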

The reason LSTM can alleviate the long-term dependency problem of RNN is that LSTM introduces a gate mechanism to control how features are passed on and forgotten.

2. LSTM

(1) Definition and characteristics
An LSTM adds a memory cell and memorizes selectively.

  • Three gates: forget gate, input gate, output gate
  • Two states: the cell state $c_t$ and the hidden state $h_t$

(2) Forward propagation
Selectively retain historical memory and absorb new knowledge (a NumPy sketch of the full step follows the output equation below).

  1. Forget gate $f_t$
    $f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$
    Interpretation: $f_t$ uses a sigmoid to decide how much of the historical information $c_{t-1}$ to remember (i.e. how much to forget).

Intuitively, brain capacity is limited: when new information comes in, we need to selectively forget some of the weaker historical memories.

  2. Input gate $i_t$
    $i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$
    Interpretation: $i_t$ uses a sigmoid to decide how much of the new information $g_t$ to learn.
    $g_t = \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g)$

Not all of the newly input information is useful; we only need to remember the relevant parts.

  3. Cell state (long-term memory) $c_t$
    $c_t = f_t \odot c_{t-1} + i_t \odot g_t$
    Interpretation: the new memory is composed of the previous memory and the newly learned information, where $f_t$ and $i_t$ act as filters on the historical memory and on the new information, respectively.

Selectively combining the historical memory with the new information forms the new memory.

  4. Output gate $o_t$
    $o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$
    Interpretation: $o_t$ uses a sigmoid to decide how much of the memory $\tanh(c_t)$ to expose.
    $m_t = \tanh(c_t)$
    Interpretation: the memory $c_t$ passes through tanh before being used.
    $h_t = o_t \odot m_t$; the resulting $h_t$ is the output of this step and is also fed into the next time step $t+1$.

  5. Output $y_t$
    $y_t = W_{yh} h_t + b_y$
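Putting the five equations above together, here is a minimal NumPy sketch of one LSTM forward step. The helper names (sigmoid, lstm_step), the parameter dictionary, and the toy dimensions are our own assumptions; the code simply mirrors the formulas and is not a particular library's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following the equations above; p holds the weights W_* and biases b_*."""
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])  # forget gate
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])  # input gate
    g_t = np.tanh(p["W_xg"] @ x_t + p["W_hg"] @ h_prev + p["b_g"])  # candidate new information
    c_t = f_t * c_prev + i_t * g_t                                  # new cell state (long-term memory)
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])  # output gate
    h_t = o_t * np.tanh(c_t)                                        # new hidden state (short-term memory)
    y_t = p["W_yh"] @ h_t + p["b_y"]                                # output
    return h_t, c_t, y_t

# Toy dimensions: input size 4, hidden size 3, output size 2 (assumed for illustration).
rng = np.random.default_rng(0)
p = {f"W_x{gate}": rng.normal(size=(3, 4)) for gate in "figo"}
p.update({f"W_h{gate}": rng.normal(size=(3, 3)) for gate in "figo"})
p.update({f"b_{gate}": np.zeros(3) for gate in "figo"})
p["W_yh"], p["b_y"] = rng.normal(size=(2, 3)), np.zeros(2)

h, c = np.zeros(3), np.zeros(3)
for x_t in rng.normal(size=(5, 4)):  # a sequence of 5 input vectors
    h, c, y = lstm_step(x_t, h, c, p)
```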

(3) Interpretation

  • ① The sigmoid-gated values $f_t$ and $i_t$ selectively retain the historical information $c_{t-1}$ and learn the new knowledge $g_t$:
    $c_t = f_t \odot c_{t-1} + i_t \odot g_t$

  • ② The sigmoid-gated value $o_t$ filters the historical memory $c_t$ to produce the short-term memory $h_t$:
    $h_t = o_t \odot m_t = o_t \odot \tanh(c_t)$

  • The forward-propagation process as a whole
    LSTM realizes long- and short-term memory through three gates and two states. First, the forget gate $f_t$ decides how much of the historical information $c_{t-1}$ to keep; then the input gate $i_t$ selectively learns the new information $g_t$. Adding the filtered old and new memories gives the new long-term memory $c_t$. Finally, the output gate $o_t$ selectively exposes that memory to obtain the short-term memory $h_t$, which is fed into the output layer to produce the output value $y_t$ (see the framework-based sketch below).
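In practice one usually relies on a framework instead of hand-rolling the step. Below is a minimal usage sketch, assuming PyTorch is available; torch.nn.LSTM implements the gate equations internally, and the extra Linear layer plays the role of $y_t = W_{yh} h_t + b_y$.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True)  # gates f, i, g, o are built in
out_layer = nn.Linear(3, 2)                                    # y_t = W_yh h_t + b_y

x = torch.randn(1, 5, 4)       # 1 sequence, 5 time steps, input size 4
h0 = torch.zeros(1, 1, 3)      # (num_layers, batch, hidden_size)
c0 = torch.zeros(1, 1, 3)

h_all, (h_n, c_n) = lstm(x, (h0, c0))  # h_all: hidden state h_t at every time step
y = out_layer(h_all)                   # output y_t at every time step
print(y.shape)                         # torch.Size([1, 5, 2])
```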


Copyright notice: this article was written by Guan Longxin. When reprinting, please include a link to the original: https://yzsam.com/2022/186/202207050303306439.html