2022-07-05 03:33:00 【冠long馨】
1. RNN
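- Long-term dependency problem: as the number of time steps grows, an RNN loses the ability to connect information that lies far back in the sequence.
- Vanishing and exploding gradients: both arise because the RNN's recurrent weight matrix is multiplied by itself repeatedly as gradients are propagated back through time.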
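The effect of the repeated multiplication can be sketched numerically. The snippet below is a simplified illustration, not a full backpropagation: it ignores the activation derivatives and only tracks the norm of the k-th power of a recurrent weight matrix. The two matrices `W_small` and `W_large` are hypothetical values chosen for the demonstration.

```python
import numpy as np

def repeated_product_norms(W, steps):
    """Return the spectral norm of W^k for k = 1..steps."""
    norms = []
    P = np.eye(W.shape[0])
    for _ in range(steps):
        P = W @ P
        norms.append(np.linalg.norm(P, 2))
    return norms

# Hypothetical 2x2 recurrent weight matrices, chosen only for illustration.
W_small = 0.5 * np.eye(2)   # all singular values 0.5: the product shrinks
W_large = 1.5 * np.eye(2)   # all singular values 1.5: the product grows

shrinking = repeated_product_norms(W_small, 20)
growing = repeated_product_norms(W_large, 20)
print(shrinking[-1])  # 0.5**20, about 9.5e-07
print(growing[-1])    # 1.5**20, about 3.3e+03
```

After only 20 steps the factor is already about a millionth in one case and several thousand in the other, which is the intuition behind vanishing and exploding gradients.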
2. LSTM
- Three gates: forget gate, input gate, output gate
- Two states: the cell state $c_t$ and the hidden state $h_t$
- Forget gate $f_t$:
① $f_t=\sigma(W_{xf}x_t+W_{hf}h_{t-1}+b_f)$
② Interpretation: $f_t$ uses the sigmoid function to choose how much of the historical information $c_{t-1}$ to keep (or forget).
- Input gate $i_t$:
① $i_t=\sigma(W_{xi}x_t+W_{hi}h_{t-1}+b_i)$
Interpretation: $i_t$ uses the sigmoid function to selectively learn the new information $g_t$.
② $g_t=\tanh(W_{xg}x_t+W_{hg}h_{t-1}+b_g)$
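- Cell state (historical information) $c_t$:
① $c_t=f_t \odot c_{t-1}+g_t \odot i_t$
Interpretation: the new memory combines the previous memory with the newly acquired information, where $f_t$ and $i_t$ filter the historical memory and the new information, respectively.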
- Output gate $o_t$:
① $o_t=\sigma(W_{xo}x_t+W_{ho}h_{t-1}+b_o)$
Interpretation: $o_t$ uses the sigmoid function to select how much of the memory $\tanh(c_t)$ to expose.
② $m_t=\tanh(c_t)$
Interpretation: $\tanh$ squashes the cell state $c_t$ into $(-1, 1)$ before the memory is used.
③ $h_t=o_t \odot m_t$
The resulting $h_t$ is emitted as output and is also fed into the next time step $t+1$.
- Output $y_t$:
① $y_t=W_{yh}h_t+b_y$
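The gate equations above can be sketched as a single time step in NumPy. This is a minimal illustration under stated assumptions: the dictionary keys mirror the symbols in the notes, while the helper `init_params`, the layer sizes, and the random initialization are hypothetical choices made only so the example runs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following the gate equations above.

    p is a dict of weights; the key names (W_xf, W_hf, b_f, ...) mirror the
    symbols in the notes and are otherwise an illustrative choice.
    """
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])  # forget gate
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])  # input gate
    g_t = np.tanh(p["W_xg"] @ x_t + p["W_hg"] @ h_prev + p["b_g"])  # candidate
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])  # output gate
    c_t = f_t * c_prev + g_t * i_t   # element-wise products
    m_t = np.tanh(c_t)
    h_t = o_t * m_t
    y_t = p["W_yh"] @ h_t + p["b_y"]
    return h_t, c_t, y_t

def init_params(n_in, n_hid, n_out, seed=0):
    """Hypothetical helper: small random weights, zero biases."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "figo":
        p[f"W_x{g}"] = rng.normal(scale=0.1, size=(n_hid, n_in))
        p[f"W_h{g}"] = rng.normal(scale=0.1, size=(n_hid, n_hid))
        p[f"b_{g}"] = np.zeros(n_hid)
    p["W_yh"] = rng.normal(scale=0.1, size=(n_out, n_hid))
    p["b_y"] = np.zeros(n_out)
    return p

params = init_params(n_in=3, n_hid=4, n_out=2)
h, c, y = lstm_step(np.ones(3), np.zeros(4), np.zeros(4), params)
print(h.shape, c.shape, y.shape)  # (4,) (4,) (2,)
```

Because $h_t$ is the product of a sigmoid output and a $\tanh$ output, every entry of `h` lies strictly between -1 and 1.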
① The sigmoid gates $f_t$ and $i_t$ selectively retain the historical information $c_{t-1}$ and learn the new information $g_t$:
$c_t=f_t \odot c_{t-1}+g_t \odot i_t$
② The sigmoid gate $o_t$ filters the long-term memory $c_t$ to produce the short-term memory $h_t$:
$h_t=o_t \odot m_t$
Forward propagation process:
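An LSTM realizes long short-term memory with three gates and two states. First, the forget gate $f_t$ selects which parts of the historical information $c_{t-1}$ to keep; then the input gate $i_t$ selectively learns the new information $g_t$. Adding the filtered old and new memories gives the new cell state $c_t$. Finally, the output gate $o_t$ selectively reads from the cell state to produce the short-term memory $h_t$, which is passed to the output layer to obtain the output value $y_t$.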
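The forward pass over a whole sequence can be sketched as a loop over time steps. The version below is an illustration under assumptions, not a reference implementation: it packs the four weight blocks into one matrix `W` for brevity (a layout not taken from the notes), and the biases are hand-picked so that the forget gate is close to 1 and the input gate close to 0. With those hypothetical settings the cell state should barely change over 50 steps, which is the long-term memory behavior described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(xs, W, b, W_yh, b_y, h0, c0):
    """Run an LSTM over a sequence xs of shape (T, n_in).

    W has shape (4 * n_hid, n_in + n_hid) and stacks the f, i, g, o weights
    (a packing chosen here for brevity, not taken from the notes).
    """
    n_hid = h0.shape[0]
    h, c = h0, c0
    ys, cs = [], []
    for x_t in xs:
        z = W @ np.concatenate([x_t, h]) + b
        f_t = sigmoid(z[0 * n_hid:1 * n_hid])   # forget gate
        i_t = sigmoid(z[1 * n_hid:2 * n_hid])   # input gate
        g_t = np.tanh(z[2 * n_hid:3 * n_hid])   # candidate information
        o_t = sigmoid(z[3 * n_hid:4 * n_hid])   # output gate
        c = f_t * c + g_t * i_t                 # new cell state
        h = o_t * np.tanh(c)                    # new hidden state
        ys.append(W_yh @ h + b_y)
        cs.append(c)
    return np.array(ys), np.array(cs), h, c

# Hypothetical sizes and hand-picked parameters for the demonstration.
n_in, n_hid, n_out, T = 2, 3, 1, 50
W = np.zeros((4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
b[0 * n_hid:1 * n_hid] = 10.0    # forget gate close to 1: keep the old cell state
b[1 * n_hid:2 * n_hid] = -10.0   # input gate close to 0: ignore new candidates
xs = np.random.default_rng(0).normal(size=(T, n_in))
c0 = np.array([0.5, -0.3, 0.8])
ys, cs, h_T, c_T = lstm_forward(
    xs, W, b, np.ones((n_out, n_hid)), np.zeros(n_out), np.zeros(n_hid), c0
)
print(np.max(np.abs(c_T - c0)))  # small: the cell state survives 50 steps
```

In a trained network the gates depend on the inputs, so the cell state is kept or overwritten selectively rather than frozen as in this toy setting.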