Implementing a Deep Learning Framework from Scratch: LSTM from Theory to Practice [Theory]
2022-06-29 22:23:00 【Angry coke】
Introduction
In the spirit of "What I cannot create, I do not understand," this series of articles builds a deep learning framework from scratch using only pure Python and NumPy. Like PyTorch, the framework supports automatic differentiation.
To understand deep learning deeply, the experience of creating it from scratch is invaluable. As far as possible we avoid complete external frameworks and implement the models we want ourselves. The goal of this series is that, through this process, we come to grasp the low-level implementation of deep learning, instead of merely being users of black-box tools.
The simple RNN introduced earlier has problems: it struggles to retain information that is far from the current position, and its gradients vanish.
LSTM
The LSTM is designed to solve these problems. It manages contextual information explicitly by making the network learn to forget information it no longer needs and to remember the information it will need for future decisions.
The LSTM divides context management into two subproblems: removing information that is no longer needed from the context, and adding information that is likely to be needed for future decisions.
Architecture
The key to solving these two problems is to learn how to manage this context rather than hard-coding a policy into the architecture. The LSTM first adds an explicit context layer to the network (in addition to the usual RNN hidden layer), and then uses specialized neurons, called gates, to control the flow of information into and out of the units that make up the layer. These gates apply additional weighted operations to the current input, the previous hidden state, and the previous context state.
The gates in an LSTM share a common design pattern: each consists of a feedforward layer, followed by a sigmoid activation function, followed by an element-wise multiplication with the layer being gated.
The sigmoid is chosen as the activation function because its output lies between 0 and 1. Combined with element-wise multiplication, the effect is like that of a binary mask: values in the gated layer that correspond to mask values near 1 pass through almost unchanged, while values that correspond to lower mask values are essentially erased.
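To make the mask intuition concrete, here is a minimal NumPy sketch of a sigmoid gate applied element-wise to a vector. The `gate` helper and its test values are made up for illustration only:

```python
import numpy as np

def gate(values, gate_logits):
    """Apply a sigmoid gate to a vector, element-wise."""
    mask = 1.0 / (1.0 + np.exp(-gate_logits))  # sigmoid squashes logits into (0, 1)
    return values * mask                       # element-wise (Hadamard) product

values = np.array([1.0, -2.0, 3.0])
logits = np.array([10.0, 0.0, -10.0])  # sigmoid -> ~1.0, 0.5, ~0.0
print(gate(values, logits))            # ~[1.0, -1.0, 0.0]: kept, halved, erased
```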
Forget gate
The first gate we introduce is the forget gate. Its purpose is to delete information that the context no longer needs. The forget gate computes a weighted sum of the previous hidden state and the current input and passes it through a sigmoid to obtain a mask; this mask is then multiplied element-wise with the context vector to remove the context information that is no longer needed.
$$f_t = \sigma(U_f h_{t-1} + W_f x_t) \tag{1}$$
where $U_f$ and $W_f$ are two weight matrices, $h_{t-1}$ is the previous hidden state, $x_t$ is the current input, and $\sigma$ is the sigmoid function; bias terms are omitted here.
$$k_t = c_{t-1} \odot f_t \tag{2}$$
where $c_{t-1}$ is the previous context-layer vector. Element-wise multiplication of two vectors (written with the operator $\odot$ and sometimes called the Hadamard product) multiplies the corresponding elements of the two vectors.
The forget gate thus controls which parts of the remembered context vector $c_{t-1}$ are forgotten.
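As a minimal sketch of equations (1) and (2) in NumPy, assuming the weight matrices are already given and omitting bias terms as the text does (the function name and test shapes are illustrative, not part of the framework):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget(U_f, W_f, h_prev, x_t, c_prev):
    f_t = sigmoid(U_f @ h_prev + W_f @ x_t)  # (1): mask with entries in (0, 1)
    k_t = c_prev * f_t                       # (2): erase stale context features
    return k_t

# Smoke test: hidden size 4, input size 3.
rng = np.random.default_rng(0)
U_f, W_f = rng.standard_normal((4, 4)), rng.standard_normal((4, 3))
h_prev, x_t, c_prev = rng.standard_normal(4), rng.standard_normal(3), rng.standard_normal(4)
print(forget(U_f, W_f, h_prev, x_t, c_prev))  # gated context, same shape as c_prev
```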
Input gate
Similarly, the input gate is computed from the previous hidden state $h_{t-1}$ and the current input $x_t$:
$$i_t = \sigma(U_i h_{t-1} + W_i x_t) \tag{3}$$
Next we extract the actual candidate information from the previous hidden state $h_{t-1}$ and the current input $x_t$, using the basic operation common to all RNNs:
$$g_t = \tanh(U_g h_{t-1} + W_g x_t) \tag{4}$$
In a simple RNN, the result of this computation would be the hidden state at the current time step, but not in an LSTM: there, the hidden state is computed from the context state $c_t$.
$$j_t = g_t \odot i_t \tag{5}$$
Multiplying by the input gate controls how much of $g_t$ (also called the candidate values) is stored in the current context state $c_t$.
Adding the $j_t$ and $k_t$ obtained above yields the current context state $c_t$:
$$c_t = j_t + k_t \tag{6}$$
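Continuing the same sketch style, equations (3) through (6) in NumPy (names and shapes are again illustrative; `k_t` is the output of the forget-gate step above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def context_update(U_i, W_i, U_g, W_g, h_prev, x_t, k_t):
    i_t = sigmoid(U_i @ h_prev + W_i @ x_t)  # (3): input gate
    g_t = np.tanh(U_g @ h_prev + W_g @ x_t)  # (4): candidate values
    j_t = g_t * i_t                          # (5): gated new information
    return j_t + k_t                         # (6): updated context state c_t
```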
Output gate
The last gate is the output gate, which is used to decide what information is required for the current hidden state.
$$o_t = \sigma(U_o h_{t-1} + W_o x_t) \tag{7}$$
$$h_t = o_t \odot \tanh(c_t) \tag{8}$$
Given the weights of the different gates, the LSTM takes as input the previous time step's context layer and hidden layer along with the current input vector, and produces updated context and hidden vectors as output. The hidden layer $h_t$ can serve as input to subsequent layers in a stacked RNN, or be used to generate the output at the final layer of the network.
For example, from the current hidden state we can compute the output at the current time step, $\hat y_t$:
$$\hat y_t = \text{softmax}(W_y h_t + b_y) \tag{9}$$
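A sketch of equations (7) through (9) under the same conventions. Note that subtracting the maximum before exponentiating in the softmax is a standard numerical-stability trick added here, not part of equation (9) itself:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_step(U_o, W_o, W_y, b_y, h_prev, x_t, c_t):
    o_t = sigmoid(U_o @ h_prev + W_o @ x_t)  # (7): output gate
    h_t = o_t * np.tanh(c_t)                 # (8): new hidden state
    z = W_y @ h_t + b_y                      # output projection
    y_hat = np.exp(z - z.max()) / np.exp(z - z.max()).sum()  # (9): softmax
    return h_t, y_hat
```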
In summary, the LSTM computes $c_t$ and $h_t$: $c_t$ is long-term memory, $h_t$ is short-term memory. The input $x_t$ and the previous hidden state $h_{t-1}$ are used to update the long-term memory; during the update, some features of $c_t$ are erased by the forget gate $f_t$, while other features are added to $c_t$ through the input gate.
The new short-term memory is the long-term memory passed through $\tanh$ and multiplied by the output gate. Note that when updating, the LSTM does not look at the long-term memory $c_t$ (the gates take only $h_{t-1}$ and $x_t$ as input); it only modifies it. Moreover, $c_t$ never passes through a full linear transformation between time steps. This is why the LSTM mitigates vanishing and exploding gradients.
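Putting it all together, here is one complete LSTM time step as a pure-NumPy sketch under the same assumptions (illustrative parameter names, no bias terms, no batching):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(params, x_t, h_prev, c_prev):
    U_f, W_f, U_i, W_i, U_g, W_g, U_o, W_o = params
    f_t = sigmoid(U_f @ h_prev + W_f @ x_t)  # forget gate, eq. (1)
    i_t = sigmoid(U_i @ h_prev + W_i @ x_t)  # input gate, eq. (3)
    o_t = sigmoid(U_o @ h_prev + W_o @ x_t)  # output gate, eq. (7)
    g_t = np.tanh(U_g @ h_prev + W_g @ x_t)  # candidate values, eq. (4)
    c_t = c_prev * f_t + g_t * i_t           # context update, eqs. (2)+(5)+(6)
    h_t = o_t * np.tanh(c_t)                 # hidden state, eq. (8)
    return h_t, c_t

# Smoke test with random weights: hidden size 4, input size 3.
rng = np.random.default_rng(0)
params = [rng.standard_normal((4, 4)) if i % 2 == 0 else rng.standard_normal((4, 3))
          for i in range(8)]
h, c = np.zeros(4), np.zeros(4)
h, c = lstm_step(params, rng.standard_normal(3), h, c)
print(h.shape, c.shape)  # (4,) (4,)
```

Note how `c_t` is updated only by element-wise operations: the path from $c_{t-1}$ to $c_t$ involves no matrix multiplication, which is the mechanism behind the gradient point above.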
References
- Daniel Jurafsky and James H. Martin, Speech and Language Processing