Deep Learning - BRNN and DRNN
2022-06-30 07:44:00 【Hair will grow again without it】
Bidirectional Recurrent Neural Networks (Bidirectional RNN)
A bidirectional RNN lets the model use not only the information that precedes a point in the sequence, but also information from the future.
Why do we need a BRNN?
A unidirectional network has a problem. To judge whether the third word, Teddy (marked 1 in the figure), is part of a person's name, looking only at the earlier part of the sentence is not enough. To decide whether ŷ<3> (marked 2 in the figure) is 0 or 1, you need more than the first three words: by themselves they cannot tell you whether the sentence is about a Teddy bear or about former U.S. president Teddy Roosevelt. This is the limitation of a unidirectional, forward-only RNN. It holds no matter whether the units (marked 3 in the figure) are standard RNN blocks, GRU units, or LSTM units, as long as they are forward-only.
How does a BRNN solve this problem?
Suppose there are only four inputs, x<1> to x<4>. The network has a forward recurrent unit at each step, written a⃗<1>, a⃗<2>, a⃗<3>, and a⃗<4>; the right arrow marks the forward direction. Each of the four units receives the current input x<t> and contributes to the predictions ŷ<1>, ŷ<2>, ŷ<3>, and ŷ<4>.
In addition there is a backward recurrent unit at each step: a⃖<1>, a⃖<2>, a⃖<3>, and a⃖<4>, connected right to left. The left arrow marks the backward direction.
Given the input sequence x<1> to x<4>, the network first computes the forward activations in order: a⃗<1>, then a⃗<2>, a⃗<3>, and a⃗<4>. The backward sequence starts from a⃖<4> and proceeds right to left: after computing a⃖<4>, those activations are used to compute a⃖<3>, then a⃖<2>, then a⃖<1>. Note that all of these are activation values computed during forward propagation, not backpropagation; part of forward propagation simply runs left to right and part runs right to left. Once all the activations have been computed, the predictions can be made.
For example, to make a prediction the network computes ŷ<t> = g(W_y[a⃗<t>, a⃖<t>] + b_y). To produce the prediction at time step 3, information from x<1> flows through the forward units a⃗<1> and a⃗<2> into a⃗<3> and then into ŷ<3>, so everything from x<1>, x<2>, and x<3> is taken into account. At the same time, information from x<4> flows backward through a⃖<4> into a⃖<3> and then into ŷ<3>. The prediction at time step 3 therefore uses past, present, and future information.
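The two passes and the combined prediction can be sketched in NumPy. This is a minimal illustration assuming simple tanh recurrent cells (not GRU/LSTM); all parameter names (Wf, Wb, Wy, bf, bb, by) are illustrative, not from any library.

```python
import numpy as np

def brnn_forward(x_seq, Wf, Wb, Wy, bf, bb, by):
    """Forward computation of a bidirectional RNN with tanh cells.

    x_seq:  list of input vectors x<1>..x<T>
    Wf, bf: parameters of the forward recurrent unit
    Wb, bb: parameters of the backward recurrent unit
    Wy, by: output parameters applied to [a_fwd<t>; a_bwd<t>]
    """
    T = len(x_seq)
    n_a = bf.shape[0]

    # Forward pass: compute a_fwd<1> .. a_fwd<T>, left to right.
    a_fwd = [None] * T
    prev = np.zeros(n_a)
    for t in range(T):
        prev = np.tanh(Wf @ np.concatenate([prev, x_seq[t]]) + bf)
        a_fwd[t] = prev

    # Backward pass: compute a_bwd<T> .. a_bwd<1>, right to left.
    a_bwd = [None] * T
    nxt = np.zeros(n_a)
    for t in reversed(range(T)):
        nxt = np.tanh(Wb @ np.concatenate([nxt, x_seq[t]]) + bb)
        a_bwd[t] = nxt

    # Prediction at each step combines both directions:
    # y_hat<t> = Wy [a_fwd<t>; a_bwd<t>] + by  (output nonlinearity g omitted)
    y_hat = [Wy @ np.concatenate([a_fwd[t], a_bwd[t]]) + by for t in range(T)]
    return a_fwd, a_bwd, y_hat
```

Because the backward units carry information right to left, changing a later input such as x<4> changes the prediction at time step 1, which is exactly the property the Teddy example needs.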
This is the bidirectional recurrent neural network. The basic units need not be standard RNN units; they can also be GRU or LSTM units. In fact, for a great many NLP problems, a bidirectional RNN with LSTM units is the most widely used model. So if you have an NLP problem in which the complete sentence is available at once, a bidirectional RNN with LSTM units, running both a forward and a backward process, is a good first choice.
Deep Recurrent Neural Networks (Deep RNNs)
We use the superscript [1] to mark the first layer, so a[l]<t> denotes the activation of layer l at time step t. Thus a[1]<1> is the activation of the first layer at the first time step, a[1]<2> is the activation of the first layer at the second time step, and likewise for a[1]<3> and a[1]<4>. Stacking such layers on top of one another gives a new network with three hidden layers.
Consider how the value a[2]<3> is computed.
The activation a[2]<3> has two inputs: one coming from below (the first layer at the same time step) and one coming from the left (the same layer at the previous time step): a[2]<3> = g(W_a[2][a[2]<2>, a[1]<3>] + b_a[2]). The parameters W_a[2] and b_a[2] are shared across all time steps of the second layer, and correspondingly the first layer has its own parameters W_a[1] and b_a[1].
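The stacking described above can be sketched as follows: each layer runs a full recurrence over time, reusing its own W and b at every step, and feeds its activation sequence upward as the next layer's input. A minimal NumPy sketch assuming tanh cells; the names Ws, bs, and deep_rnn_forward are illustrative.

```python
import numpy as np

def deep_rnn_forward(x_seq, Ws, bs):
    """Forward computation of a stacked (deep) RNN with tanh cells.

    x_seq:      list of input vectors x<1>..x<T>
    Ws[l], bs[l]: parameters of layer l+1, shared across all of that
                  layer's time steps (as the text notes).
    Layer 1 reads the inputs x<t>; each higher layer reads the
    activations of the layer below at the same time step.
    """
    seq = x_seq
    activations = []
    for W, b in zip(Ws, bs):
        n_a = b.shape[0]
        prev = np.zeros(n_a)          # a[l]<0>, the initial activation
        layer_out = []
        for inp in seq:
            # a[l]<t> = g(W_a[l] [a[l]<t-1>, a[l-1]<t>] + b_a[l])
            prev = np.tanh(W @ np.concatenate([prev, inp]) + b)
            layer_out.append(prev)
        activations.append(layer_out)
        seq = layer_out               # feed this layer's outputs upward
    return activations
```

In this sketch, activations[1][2] corresponds to a[2]<3>: it is computed from a[2]<2> (the `prev` carried from the previous step) and a[1]<3> (the input `inp` from the layer below).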