Deep learning - LSTM Foundation
2022-07-05 03:44:00 【Guan Longxin】
1. RNN
Core idea: remember all past information, with no selectivity.
(1) Definition and characteristics
RNNs perform well on time-series data because, at time step t, the hidden state of time step t-1 is fed in as part of the input of the current time step.
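To make the recurrence concrete, here is a minimal sketch of one vanilla-RNN step in NumPy (the function and variable names are my own, not from any library):

```python
import numpy as np

# One vanilla-RNN step: the hidden state of step t-1 is fed back
# in alongside the input of step t. Names here are illustrative.
def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
```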
(2) Problems
- Long-term dependence: as the number of time steps grows, the RNN loses the ability to connect information that is far apart in the sequence.
- Vanishing gradients: vanishing and exploding gradients are caused by the repeated multiplication of the RNN's recurrent weight matrix across time steps.

LSTM solves the RNN long-term dependence problem because it introduces a gate mechanism to control which features flow through and which are discarded.
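To see why repeated multiplication by the recurrent matrix makes gradients vanish or explode, here is a toy numerical sketch (my own construction, with the recurrent matrix simplified to a scaled identity):

```python
import numpy as np

# Backpropagating through T time steps multiplies the gradient by
# (roughly) the recurrent matrix T times, so its norm shrinks toward 0
# or blows up depending on the matrix's largest singular value.
rng = np.random.default_rng(0)
for scale in (0.9, 1.1):          # spectral radius below vs. above 1
    W_hh = scale * np.eye(4)      # toy recurrent weight matrix
    grad = rng.normal(size=4)
    for _ in range(100):          # 100 time steps
        grad = W_hh.T @ grad
    print(scale, np.linalg.norm(grad))  # ~1e-5 (vanishing) vs. ~1e4 (exploding)
```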
2. LSTM
(1) Definition and characteristics
Core idea: add a memory cell and memorize selectively.
- Three gates: the forget gate, the input gate, and the output gate
- Two states: the cell state $c_t$ and the hidden state $h_t$
(2) Forward propagation
Selectively retain historical memory while absorbing new knowledge.
- Forget gate $f_t$:
① $f_t=\sigma(W_{xf}x_t+W_{hf}h_{t-1}+b_f)$
② Interpretation: $f_t$ uses the sigmoid function to select how much of the historical information $c_{t-1}$ to remember (or forget).
Intuitively, the brain's capacity is limited: when new information arrives, some weak historical memories must be selectively forgotten.
- Input gate $i_t$:
① $i_t=\sigma(W_{xi}x_t+W_{hi}h_{t-1}+b_i)$
Interpretation: $i_t$ uses the sigmoid function to select how much of the new information $g_t$ to learn.
② $g_t=\tanh(W_{xg}x_t+W_{hg}h_{t-1}+b_g)$
Not all new input is useful; we only need to remember the relevant parts.
- Cell state (historical memory) $c_t$:
① $c_t=f_t \odot c_{t-1}+i_t \odot g_t$
Interpretation: the new memory combines the previous memory with the newly learned information, where $f_t$ and $i_t$ act as filters on the historical memory and the new information respectively.
Selectively combining historical memory with new information forms a new memory.
- Output gate $o_t$:
① $o_t=\sigma(W_{xo}x_t+W_{ho}h_{t-1}+b_o)$
Interpretation: $o_t$ uses the sigmoid function to select how much of the memory $\tanh(c_t)$ to expose.
② $m_t=\tanh(c_t)$
Interpretation: $c_t$ is passed through $\tanh$ before the memory is used.
③ $h_t=o_t \odot m_t$; the resulting $h_t$ is both the output of this step and an input to the next time step $t+1$.
- Output $y_t$:
① $y_t = W_{yh}h_t+b_y$
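Putting the equations together, here is a sketch of one full LSTM forward step in NumPy, transcribing the formulas above (the parameter dictionary and function name are my own packaging, not a standard API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One LSTM forward step, following the equations above. `p` holds the
# weight matrices W_x*, W_h*, W_yh and biases b_* named as in the text.
def lstm_step(x_t, h_prev, c_prev, p):
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])   # forget gate
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])   # input gate
    g_t = np.tanh(p["W_xg"] @ x_t + p["W_hg"] @ h_prev + p["b_g"])   # candidate info
    c_t = f_t * c_prev + i_t * g_t               # long-term memory: filtered old + new
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])   # output gate
    m_t = np.tanh(c_t)
    h_t = o_t * m_t                              # short-term memory / hidden state
    y_t = p["W_yh"] @ h_t + p["b_y"]             # output
    return h_t, c_t, y_t
```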
(3) Interpretation
① The sigmoid gates $f_t$ and $i_t$ select what to remember from the historical information $c_{t-1}$ and what to learn from the new knowledge $g_t$:
$c_t=f_t \odot c_{t-1}+i_t \odot g_t$
② The sigmoid gate $o_t$ filters the historical memory $c_t$ into the short-term memory $h_t$:
$h_t=o_t \odot m_t$

The forward-propagation process as a whole: LSTM realizes long- and short-term memory through its three gates and two states. First the forget gate $f_t$ chooses what to keep from the historical information $c_{t-1}$; then the input gate $i_t$ selects what to learn from the new information $g_t$. Adding the two filtered parts yields the new historical memory $c_t$. Finally the output gate $o_t$ selectively exposes the historical memory to obtain the short-term memory $h_t$, and feeding $h_t$ into the output layer yields the output value $y_t$.
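In practice one would use a framework implementation of these same gate equations; for example, PyTorch's `nn.LSTMCell` performs the $f_t, i_t, g_t, o_t$ update per step, while the output layer $y_t=W_{yh}h_t+b_y$ is typically a separate linear layer. The sizes below are arbitrary example values:

```python
import torch
import torch.nn as nn

# nn.LSTMCell implements the gate equations described above.
cell = nn.LSTMCell(input_size=8, hidden_size=16)
out = nn.Linear(16, 3)             # separate output layer: y_t = W_yh h_t + b_y
x = torch.randn(10, 4, 8)          # 10 time steps, batch of 4
h_t = torch.zeros(4, 16)           # initial short-term memory h_0
c_t = torch.zeros(4, 16)           # initial long-term memory c_0
for t in range(10):
    h_t, c_t = cell(x[t], (h_t, c_t))
y_t = out(h_t)                     # output at the final step
print(y_t.shape)                   # torch.Size([4, 3])
```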