LSTM of RNN
2022-07-01 07:43:00 【Programming bear】
Basic recurrent neural networks are not only prone to vanishing or exploding gradients, they also struggle with long sequences; in other words, they only have short-term memory.
To overcome these shortcomings, the Long Short-Term Memory network (LSTM) was proposed. Compared with the basic RNN, LSTM has longer-lasting memory and is better at processing long sequences of signal data.
I. LSTM Principle
The structure of the basic RNN is shown in the figure: the state vector of the previous timestamp h𝑡-1 and the input of the current timestamp 𝒙𝑡 undergo a linear transformation and then pass through the tanh activation function to produce the new state vector h𝑡. Whereas the basic RNN has only the single state vector h𝑡, LSTM adds a new state vector 𝒄𝑡 and introduces a gating (Gate) mechanism, in which gate units control how information is forgotten and refreshed, as shown in the figure.


In LSTM there are two state vectors, 𝒄 and h: 𝒄 is the internal state vector of the LSTM, which can be understood as its memory (Memory) state vector, while h is the LSTM's output vector. That is, unlike the basic RNN, LSTM separates the internal Memory and the output into two variables, and it uses three gates, the input gate (Input Gate), the forget gate (Forget Gate), and the output gate (Output Gate), to control the internal flow of information.
The gating mechanism can be understood as a means of controlling how much data flows through, like a valve; the degree of opening is represented by a gate vector 𝒈. The 𝜎(𝒈) activation function compresses the gate value into the interval [0,1]: when 𝜎(𝒈) = 0 the gate is fully closed and the output is 𝒐 = 0; when 𝜎(𝒈) = 1 the gate is fully open and the output is 𝒐 = 𝒙. The gating mechanism therefore gives fine-grained control over how much data passes through.
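To make this concrete, here is a minimal sketch of sigmoid gating (the tensor values are made up for illustration): a very negative raw gate value closes the gate, a very positive one opens it, and the sigmoid output scales 𝒙 element-wise.
import tensorflow as tf

g = tf.constant([-10.0, 0.0, 10.0])  # raw gate values (hypothetical): closed, half-open, open
x = tf.constant([1.0, -2.0, 3.0])    # data to be gated
o = tf.sigmoid(g) * x                # sigmoid squashes g into [0, 1], then scales x element-wise
print(o.numpy())                     # approximately [0.0, -1.0, 3.0]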
1. Forget Gate
The forget gate acts on the LSTM state vector 𝒄 and controls how much the memory of the previous timestamp, 𝒄𝑡-1, influences the current timestamp.

The control variable of the forget gate, 𝒈𝑓, is produced from the current input 𝒙𝑡 and the previous output h𝑡-1:

𝒈𝑓 = 𝜎(𝑾𝑓[h𝑡-1, 𝒙𝑡] + 𝒃𝑓)

where 𝑾𝑓 and 𝒃𝑓 are the forget gate's parameters and 𝜎 is the Sigmoid function. When the gate 𝒈𝑓 = 1, the forget gate is fully open and the LSTM accepts all the information of the previous state 𝒄𝑡-1; when 𝒈𝑓 = 0, the forget gate is closed and the LSTM ignores 𝒄𝑡-1 entirely, outputting a zero vector. After the forget gate, the LSTM state vector becomes 𝒈𝑓⨀𝒄𝑡-1 (⨀ denotes the element-wise product).
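A minimal sketch of the forget gate computation (the weight names Wf, Uf, bf are hypothetical; splitting 𝑾𝑓 into an input part and a recurrent part is equivalent to the concatenated form above):
import tensorflow as tf

b, n_in, n_h = 2, 100, 64
xt   = tf.random.normal([b, n_in])          # current input x_t
ht_1 = tf.random.normal([b, n_h])           # previous output h_{t-1}
ct_1 = tf.random.normal([b, n_h])           # previous memory c_{t-1}
Wf = tf.random.normal([n_in, n_h]) * 0.1    # hypothetical forget-gate parameters
Uf = tf.random.normal([n_h, n_h]) * 0.1
bf = tf.zeros([n_h])
gf = tf.sigmoid(xt @ Wf + ht_1 @ Uf + bf)   # forget gate, element-wise in [0, 1]
c_forgotten = gf * ct_1                     # old memory scaled down by the gate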
2. Input Gate
The input gate controls how much of the new input the LSTM accepts.

First, the current input 𝒙𝑡 and the previous output h𝑡-1 undergo a nonlinear transformation to produce the new input vector 𝒄̃𝑡:

𝒄̃𝑡 = tanh(𝑾𝑐[h𝑡-1, 𝒙𝑡] + 𝒃𝑐)

where tanh is the activation function, which normalizes the input to the interval [-1,1]. 𝒄̃𝑡 is not written into the Memory in full; instead, the input gate controls how much of it is accepted. The control variable of the input gate, 𝒈𝑖, also comes from the input 𝒙𝑡 and the output h𝑡-1:

𝒈𝑖 = 𝜎(𝑾𝑖[h𝑡-1, 𝒙𝑡] + 𝒃𝑖)

The input gate variable 𝒈𝑖 determines the LSTM's acceptance of the new input 𝒄̃𝑡 at the current timestamp: when 𝒈𝑖 = 0, the LSTM accepts none of the new input; when 𝒈𝑖 = 1, it accepts all of it. After the input gate, the vector to be written into the Memory is 𝒈𝑖⨀𝒄̃𝑡.
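A matching sketch of the candidate input and the input gate (again with hypothetical weight names Wc/Uc/bc and Wi/Ui/bi):
import tensorflow as tf

b, n_in, n_h = 2, 100, 64
xt   = tf.random.normal([b, n_in])               # current input x_t
ht_1 = tf.random.normal([b, n_h])                # previous output h_{t-1}
Wc, Uc, bc = (tf.random.normal([n_in, n_h]) * 0.1,
              tf.random.normal([n_h, n_h]) * 0.1, tf.zeros([n_h]))
Wi, Ui, bi = (tf.random.normal([n_in, n_h]) * 0.1,
              tf.random.normal([n_h, n_h]) * 0.1, tf.zeros([n_h]))
c_tilde = tf.tanh(xt @ Wc + ht_1 @ Uc + bc)      # candidate input, in [-1, 1]
gi = tf.sigmoid(xt @ Wi + ht_1 @ Ui + bi)        # input gate, in [0, 1]
to_write = gi * c_tilde                          # portion actually written into Memory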
Under the control of the forget gate and the input gate, the LSTM selectively reads the memory of the previous timestamp, 𝒄𝑡-1, and the new input of the current timestamp, 𝒄̃𝑡. The state vector is refreshed as:

𝒄𝑡 = 𝒈𝑖⨀𝒄̃𝑡 + 𝒈𝑓⨀𝒄𝑡-1

The new state vector 𝒄𝑡 obtained in this way is the state vector of the current timestamp.
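In code, the refresh is a single line once the gates and the candidate input are available (a sketch with stand-in values so it runs on its own):
import tensorflow as tf

b, n_h = 2, 64
gf, gi  = tf.random.uniform([b, n_h]), tf.random.uniform([b, n_h])  # stand-in gate values in [0, 1]
ct_1    = tf.random.normal([b, n_h])                                # previous memory c_{t-1}
c_tilde = tf.tanh(tf.random.normal([b, n_h]))                       # stand-in candidate input
ct = gi * c_tilde + gf * ct_1    # refreshed memory of the current timestamp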
3. Output Gate
Unlike in the basic RNN, the LSTM's internal state vector 𝒄𝑡 is not used directly as the output. In the basic RNN, the state vector serves both as memory and as output, so the basic RNN can be understood as having its state vector 𝒄 and its output vector be the same object. Inside the LSTM, the state vector is not output in full; rather, it is output selectively under the control of the output gate.

The gating variable of the output gate, 𝒈𝑜, is given by:

𝒈𝑜 = 𝜎(𝑾𝑜[h𝑡-1, 𝒙𝑡] + 𝒃𝑜)

When the output gate 𝒈𝑜 = 0, the output is closed: the LSTM's internal memory is completely blocked from the output, and the output h𝑡 is a zero vector. When 𝒈𝑜 = 1, the output is fully open, and the LSTM's state vector 𝒄𝑡 is used entirely for the output. The LSTM's output is produced by:

h𝑡 = 𝒈𝑜⨀tanh(𝒄𝑡)

that is, the memory vector 𝒄𝑡 passes through the tanh activation function and is then multiplied element-wise by the output gate. Since 𝒈𝑜 ∈ [0,1] and tanh(𝒄𝑡) ∈ [-1,1], the LSTM output h𝑡 ∈ [-1,1].
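Putting the three gates together, a full LSTM timestamp can be sketched end to end. The parameter names below (one W/U/b triple per gate plus the candidate) are made up for illustration; a framework implementation such as Keras typically fuses them into larger matrices, but the arithmetic is the same:
import tensorflow as tf

def lstm_step(xt, ht_1, ct_1, p):
    # p is a dict of hypothetical parameters: W* act on x_t, U* act on h_{t-1}, b* are biases
    gf = tf.sigmoid(xt @ p['Wf'] + ht_1 @ p['Uf'] + p['bf'])     # forget gate
    gi = tf.sigmoid(xt @ p['Wi'] + ht_1 @ p['Ui'] + p['bi'])     # input gate
    go = tf.sigmoid(xt @ p['Wo'] + ht_1 @ p['Uo'] + p['bo'])     # output gate
    c_tilde = tf.tanh(xt @ p['Wc'] + ht_1 @ p['Uc'] + p['bc'])   # candidate input
    ct = gf * ct_1 + gi * c_tilde    # refresh the memory
    ht = go * tf.tanh(ct)            # selectively output, in [-1, 1]
    return ht, ct

b, n_in, n_h = 2, 100, 64
p = {}
for k in ('f', 'i', 'o', 'c'):
    p['W' + k] = tf.random.normal([n_in, n_h]) * 0.1
    p['U' + k] = tf.random.normal([n_h, n_h]) * 0.1
    p['b' + k] = tf.zeros([n_h])
ht, ct = lstm_step(tf.random.normal([b, n_in]), tf.zeros([b, n_h]), tf.zeros([b, n_h]), p)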
II. LSTM Implementation
In TensorFlow, there are likewise two ways to build an LSTM network: you can use LSTMCell and run the loop over timestamps manually, or use the LSTM layer to complete the forward pass in one step.
1. LSTMCell
LSTMCell is used in the same way as SimpleRNNCell; the difference is that the LSTM state variable is a List with two elements, namely [h𝑡, 𝒄𝑡], which must be initialized separately, where the first element of the List is h𝑡 and the second is 𝒄𝑡. When the cell is called to perform the forward computation, it returns two elements: the first is the cell's output h𝑡, and the second is the cell's updated state List [h𝑡, 𝒄𝑡].
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([2, 80, 100])  # 2 sequences, 80 timestamps, 100 features each
cell = layers.LSTMCell(64)  # create an LSTM Cell with 64 units
# initialize the state List [h0, c0]
state = [tf.zeros([2, 64]), tf.zeros([2, 64])]
# forward computation, one timestamp at a time
for xt in tf.unstack(x, axis=1):
    out, state = cell(xt, state)
The returned output out and the first element h𝑡 of the state List are the same.
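A quick self-contained check of this (a sketch with a single timestamp):
import tensorflow as tf
from tensorflow.keras import layers

cell = layers.LSTMCell(64)
state = [tf.zeros([2, 64]), tf.zeros([2, 64])]
out, state = cell(tf.random.normal([2, 100]), state)
print(tf.reduce_all(tf.equal(out, state[0])).numpy())  # True: out and state[0] are both h_t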
2. LSTM Layer
After forward propagation through an LSTM layer, only the output of the last timestamp is returned by default; if the output at every timestamp is needed, set the return_sequences=True flag. For a multi-layer network, multiple LSTM layers can be wrapped in a Sequential container, with return_sequences=True set on every non-final layer, because each subsequent LSTM layer needs the outputs at all timestamps from the previous layer as its input, as in the example below.
import tensorflow as tf
from tensorflow.keras import layers, Sequential

x = tf.random.normal([2, 80, 100])
net = Sequential([
    layers.LSTM(64, return_sequences=True),  # non-final layer returns the output at every timestamp
    layers.LSTM(64)
])
# one pass through the network yields the last layer's output at the last timestamp
out = net(x)
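The effect of return_sequences on the output shape can be seen directly (a sketch):
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([2, 80, 100])
print(layers.LSTM(64)(x).shape)                          # (2, 64): last timestamp only
print(layers.LSTM(64, return_sequences=True)(x).shape)   # (2, 80, 64): every timestamp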