Deep Learning - BRNN and DRNN
2022-06-30 07:44:00 【Hair will grow again without it】
Bidirectional Recurrent Neural Networks (Bidirectional RNN)
A bidirectional RNN lets the model use not only the information that precedes a point in the sequence, but also information from the future.
Why do we need a BRNN?
A unidirectional network has a problem. To judge whether the third word, Teddy (marked 1 in the figure), is part of a person's name, looking only at the earlier part of the sentence is not enough. To decide whether ŷ<3> (marked 2 in the figure) is 0 or 1, you need more than the first three words: by themselves they cannot tell you whether the sentence is about a Teddy bear or about former U.S. president Teddy Roosevelt. This is the limitation of a unidirectional, forward-only RNN. It holds no matter whether the units (marked 3 in the figure) are standard RNN blocks, GRU units, or LSTM units, as long as they are forward-only.
How does a BRNN solve this problem?
Suppose there are only four inputs, x<1> to x<4>. The network has a forward recurrent unit at each step, written a⃗<1>, a⃗<2>, a⃗<3>, and a⃗<4>; the right arrow marks the forward direction. Each of the four units receives the current input x<t> and contributes to the predictions ŷ<1>, ŷ<2>, ŷ<3>, and ŷ<4>.
In addition there is a backward recurrent unit at each step: a⃖<1>, a⃖<2>, a⃖<3>, and a⃖<4>, connected right to left. The left arrow marks the backward direction.
Given the input sequence x<1> to x<4>, the network first computes the forward activations in order: a⃗<1>, then a⃗<2>, a⃗<3>, and a⃗<4>. The backward sequence starts from a⃖<4> and proceeds right to left: after computing a⃖<4>, those activations are used to compute a⃖<3>, then a⃖<2>, then a⃖<1>. Note that all of these are activation values computed during forward propagation, not backpropagation; part of forward propagation simply runs left to right and part runs right to left. Once all the activations have been computed, the predictions can be made.
For example, to make a prediction the network computes ŷ<t> = g(W_y[a⃗<t>, a⃖<t>] + b_y). To produce the prediction at time step 3, information from x<1> flows through the forward units a⃗<1> and a⃗<2> into a⃗<3> and then into ŷ<3>, so everything from x<1>, x<2>, and x<3> is taken into account. At the same time, information from x<4> flows backward through a⃖<4> into a⃖<3> and then into ŷ<3>. The prediction at time step 3 therefore uses past, present, and future information.
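The two passes and the combined prediction can be sketched in NumPy. This is a minimal illustration assuming simple tanh recurrent cells (not GRU/LSTM); all parameter names (Wf, Wb, Wy, bf, bb, by) are illustrative, not from any library.

```python
import numpy as np

def brnn_forward(x_seq, Wf, Wb, Wy, bf, bb, by):
    """Forward computation of a bidirectional RNN with tanh cells.

    x_seq:  list of input vectors x<1>..x<T>
    Wf, bf: parameters of the forward recurrent unit
    Wb, bb: parameters of the backward recurrent unit
    Wy, by: output parameters applied to [a_fwd<t>; a_bwd<t>]
    """
    T = len(x_seq)
    n_a = bf.shape[0]

    # Forward pass: compute a_fwd<1> .. a_fwd<T>, left to right.
    a_fwd = [None] * T
    prev = np.zeros(n_a)
    for t in range(T):
        prev = np.tanh(Wf @ np.concatenate([prev, x_seq[t]]) + bf)
        a_fwd[t] = prev

    # Backward pass: compute a_bwd<T> .. a_bwd<1>, right to left.
    a_bwd = [None] * T
    nxt = np.zeros(n_a)
    for t in reversed(range(T)):
        nxt = np.tanh(Wb @ np.concatenate([nxt, x_seq[t]]) + bb)
        a_bwd[t] = nxt

    # Prediction at each step combines both directions:
    # y_hat<t> = Wy [a_fwd<t>; a_bwd<t>] + by  (output nonlinearity g omitted)
    y_hat = [Wy @ np.concatenate([a_fwd[t], a_bwd[t]]) + by for t in range(T)]
    return a_fwd, a_bwd, y_hat
```

Because the backward units carry information right to left, changing a later input such as x<4> changes the prediction at time step 1, which is exactly the property the Teddy example needs.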
This is the bidirectional recurrent neural network. The basic units need not be standard RNN units; they can also be GRU or LSTM units. In fact, for a great many NLP problems, a bidirectional RNN with LSTM units is the most widely used model. So if you have an NLP problem in which the complete sentence is available at once, a bidirectional RNN with LSTM units, running both a forward and a backward process, is a good first choice.
Deep Recurrent Neural Networks (Deep RNNs)
We use the superscript [1] to mark the first layer, so a[l]<t> denotes the activation of layer l at time step t. Thus a[1]<1> is the activation of the first layer at the first time step, a[1]<2> is the activation of the first layer at the second time step, and likewise for a[1]<3> and a[1]<4>. Stacking such layers on top of one another gives a new network with three hidden layers.
Consider how the value a[2]<3> is computed.
The activation a[2]<3> has two inputs: one coming from below (the first layer at the same time step) and one coming from the left (the same layer at the previous time step): a[2]<3> = g(W_a[2][a[2]<2>, a[1]<3>] + b_a[2]). The parameters W_a[2] and b_a[2] are shared across all time steps of the second layer, and correspondingly the first layer has its own parameters W_a[1] and b_a[1].
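The stacking described above can be sketched as follows: each layer runs a full recurrence over time, reusing its own W and b at every step, and feeds its activation sequence upward as the next layer's input. A minimal NumPy sketch assuming tanh cells; the names Ws, bs, and deep_rnn_forward are illustrative.

```python
import numpy as np

def deep_rnn_forward(x_seq, Ws, bs):
    """Forward computation of a stacked (deep) RNN with tanh cells.

    x_seq:      list of input vectors x<1>..x<T>
    Ws[l], bs[l]: parameters of layer l+1, shared across all of that
                  layer's time steps (as the text notes).
    Layer 1 reads the inputs x<t>; each higher layer reads the
    activations of the layer below at the same time step.
    """
    seq = x_seq
    activations = []
    for W, b in zip(Ws, bs):
        n_a = b.shape[0]
        prev = np.zeros(n_a)          # a[l]<0>, the initial activation
        layer_out = []
        for inp in seq:
            # a[l]<t> = g(W_a[l] [a[l]<t-1>, a[l-1]<t>] + b_a[l])
            prev = np.tanh(W @ np.concatenate([prev, inp]) + b)
            layer_out.append(prev)
        activations.append(layer_out)
        seq = layer_out               # feed this layer's outputs upward
    return activations
```

In this sketch, activations[1][2] corresponds to a[2]<3>: it is computed from a[2]<2> (the `prev` carried from the previous step) and a[1]<3> (the input `inp` from the layer below).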