Implementing a Deep Learning Framework from Scratch -- LSTM from Theory to Practice [Practice]
2022-07-05 19:17:00 【Angry coke】
Introduction
In keeping with the idea that “What I cannot create, I do not understand”, this series builds a deep learning framework from scratch using pure Python and NumPy. Like PyTorch, the framework supports automatic differentiation.
To understand deep learning deeply, the experience of building things from scratch is invaluable. As far as possible we avoid relying on a complete external framework and implement the models we want ourselves. The goal of this series is that, through this process, we master the low-level implementation of deep learning instead of merely being users who flip switches.
In the previous article we covered the theory of the LSTM, and in the RNN practice article we saw how to implement multi-layer and bidirectional RNNs. The LSTM implemented here likewise supports multiple layers and bidirectionality.
LSTMCell
class LSTMCell(Module):
    def __init__(self, input_size: int, hidden_size: int, bias: bool = True) -> None:
        super(LSTMCell, self).__init__()
        # Combined linear transformation of x -> input gate, x -> forget gate, x -> g, x -> output gate
        self.input_trans = Linear(input_size, 4 * hidden_size, bias=bias)
        # Combined linear transformation of h -> input gate, h -> forget gate, h -> g, h -> output gate
        self.hidden_trans = Linear(hidden_size, 4 * hidden_size, bias=bias)

    def forward(self, x: Tensor, h: Tensor, c: Tensor) -> Tuple[Tensor, Tensor]:
        # Compute the three gates and g_t in a single pass
        ifgo = self.input_trans(x) + self.hidden_trans(h)
        ifgo = F.chunk(ifgo, 4, -1)
        # i: input gate
        # f: forget gate
        # g: g_t (candidate cell state)
        # o: output gate
        i, f, g, o = ifgo
        c_next = F.sigmoid(f) * c + F.sigmoid(i) * F.tanh(g)
        h_next = F.sigmoid(o) * F.tanh(c_next)
        return h_next, c_next
In this implementation, following the article cited in the references, the linear transformations related to x and to h are written separately:
$$
\begin{aligned}
i_t &= \text{Linear}^i_x(x_t) + \text{Linear}^i_h(h_{t-1}) \\
f_t &= \text{Linear}^f_x(x_t) + \text{Linear}^f_h(h_{t-1}) \\
g_t &= \text{Linear}^g_x(x_t) + \text{Linear}^g_h(h_{t-1}) \\
o_t &= \text{Linear}^o_x(x_t) + \text{Linear}^o_h(h_{t-1})
\end{aligned} \tag{1}
$$
For example, in formula (3), $i_t = \sigma(U_i h_{t-1} + W_i x_t)$, the argument of $\sigma$ can be rewritten as:
$$U_i h_{t-1} + W_i x_t \;\Rightarrow\; \text{Linear}^i_x(x_t) + \text{Linear}^i_h(h_{t-1})$$
Then we can merge similar linear transformations. For example, the $x_t$-related ones $\text{Linear}^i_x(x_t), \text{Linear}^f_x(x_t), \text{Linear}^g_x(x_t), \text{Linear}^o_x(x_t)$ are merged into:
self.input_trans = Linear(input_size, 4 * hidden_size, bias=bias)
Likewise, hidden_trans merges the $h_{t-1}$-related transformations. The reason for doing this is speed: a single linear operation yields all four results at once.
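To see why this works, here is a small NumPy sketch (illustrative only, not the framework's code) showing that a single matrix multiplication with a stacked weight matrix gives the same result as four separate multiplications:

import numpy as np

input_size, hidden_size, batch_size = 3, 4, 2
x = np.random.randn(batch_size, input_size)

# Four separate weight matrices, one per gate (and one for g)
W_i, W_f, W_g, W_o = (np.random.randn(input_size, hidden_size) for _ in range(4))
separate = [x @ W for W in (W_i, W_f, W_g, W_o)]

# One combined weight matrix of shape (input_size, 4 * hidden_size)
W_all = np.concatenate([W_i, W_f, W_g, W_o], axis=1)
combined = x @ W_all                      # a single matmul
chunks = np.split(combined, 4, axis=-1)   # split back into four pieces, like F.chunk(ifgo, 4, -1)

for a, b in zip(separate, chunks):
    assert np.allclose(a, b)

One large matmul is generally faster than four small ones because it makes better use of vectorized BLAS kernels.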
Through
ifgo = self.input_trans(x) + self.hidden_trans(h)
we obtain all four of these important values, but they are concatenated into a single tensor. We then use
ifgo = F.chunk(ifgo, 4, -1)
to split it into a tuple of four tensors:
i, f, g, o = ifgo
This gives the pre-activation value for each gate; applying the corresponding sigmoid or tanh yields the values we want.
Next, calculate $c_t$:
$$c_t = \sigma(f_t) \odot c_{t-1} + \sigma(i_t) \odot \tanh(g_t) \tag{3}$$
The corresponding code:
c_next = F.sigmoid(f) * c + F.sigmoid(i) * F.tanh(g)
Then, based on $o_t$ and $c_t$, calculate the hidden state $h_t$:
$$h_t = \sigma(o_t) \odot \tanh(c_t) \tag{4}$$
The corresponding code:
h_next = F.sigmoid(o) * F.tanh(c_next)
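Putting formulas (1), (3) and (4) together, a single LSTM step can be sketched in a few lines of plain NumPy (this is just the math, independent of the framework; the names W_x, W_h and b are illustrative):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h, c, W_x, W_h, b):
    """One LSTM time step; W_x: (input_size, 4*hidden), W_h: (hidden, 4*hidden)."""
    ifgo = x @ W_x + h @ W_h + b                          # formula (1): all four pre-activations at once
    i, f, g, o = np.split(ifgo, 4, axis=-1)
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)     # formula (3)
    h_next = sigmoid(o) * np.tanh(c_next)                 # formula (4)
    return h_next, c_next

input_size, hidden_size, batch_size = 3, 4, 2
x = np.random.randn(batch_size, input_size)
h = np.zeros((batch_size, hidden_size))
c = np.zeros((batch_size, hidden_size))
W_x = np.random.randn(input_size, 4 * hidden_size)
W_h = np.random.randn(hidden_size, 4 * hidden_size)
b = np.zeros(4 * hidden_size)

h, c = lstm_step(x, h, c, W_x, W_h, b)
print(h.shape, c.shape)  # (2, 4) (2, 4)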
LSTM
With LSTMCell in place, we can implement the complete LSTM.
class LSTM(Module):
    def __init__(self, input_size: int, hidden_size: int, batch_first: bool = False, num_layers: int = 1,
                 bidirectional: bool = False, dropout: float = 0):
        super(LSTM, self).__init__()

        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.batch_first = batch_first
        self.bidirectional = bidirectional

        # Multi-layer support
        self.cells = ModuleList([LSTMCell(input_size, hidden_size)] +
                                [LSTMCell(hidden_size, hidden_size) for _ in range(num_layers - 1)])
        if self.bidirectional:
            # Bidirectional support
            self.back_cells = copy.deepcopy(self.cells)

        self.dropout = dropout
        if dropout:
            # Dropout layer
            self.dropout_layer = Dropout(dropout)

    def _one_directional_op(self, input, cells, n_steps, hs, cs, reverse=False):
        '''
        Args:
            input: input of shape [n_steps, batch_size, input_size]
            cells: ModuleList of forward or backward LSTMCells
            hs: list of per-layer hidden states
            cs: list of per-layer cell states
            n_steps: number of time steps
            reverse: True if the input sequence was reversed
        Returns:
            output and the final state (h_n, c_n)
        '''
        output = []
        for t in range(n_steps):
            inp = input[t]
            for layer in range(self.num_layers):
                hs[layer], cs[layer] = cells[layer](inp, hs[layer], cs[layer])
                inp = hs[layer]
                if self.dropout and layer != self.num_layers - 1:
                    inp = self.dropout_layer(inp)
            # Collect the output of the last layer
            output.append(hs[-1])

        output = F.stack(output)  # (n_steps, batch_size, hidden_size)
        if reverse:
            # Flip the time dimension back, so that at time step t=0 we have the state that has seen the whole sequence
            output = F.flip(output, 0)
        if self.batch_first:
            output = output.transpose((1, 0, 2))

        h_n = F.stack(hs)
        c_n = F.stack(cs)
        return output, (h_n, c_n)

    def forward(self, input: Tensor, state: Optional[Tuple[Tensor, Tensor]] = None):
        '''
        Args:
            input: shape [n_steps, batch_size, input_size] if batch_first=False;
                   otherwise shape [batch_size, n_steps, input_size]
            state: tuple (h, c), with num_directions = 2 if self.bidirectional else 1
                h: [num_directions * num_layers, batch_size, hidden_size]
                c: [num_directions * num_layers, batch_size, hidden_size]
        Returns:
            output: (n_steps, batch_size, num_directions * hidden_size) if batch_first=False,
                    or (batch_size, n_steps, num_directions * hidden_size) if batch_first=True.
                    Contains, for every time step, the output h_t of the last layer (of a multi-layer LSTM).
            h_n: (num_directions * num_layers, batch_size, hidden_size) -- the final hidden state
            c_n: (num_directions * num_layers, batch_size, hidden_size) -- the final cell state
        '''
        h_0, c_0 = None, None
        if state is not None:
            h_0, c_0 = state

        is_batched = input.ndim == 3
        batch_dim = 0 if self.batch_first else 1
        if not is_batched:
            # Convert to an input with batch size 1
            input = input.unsqueeze(batch_dim)
            if state is not None:
                h_0 = h_0.unsqueeze(1)
                c_0 = c_0.unsqueeze(1)

        if self.batch_first:
            batch_size, n_steps, _ = input.shape
            input = input.transpose((1, 0, 2))  # Move the batch to the middle dimension
        else:
            n_steps, batch_size, _ = input.shape

        if state is None:
            num_directions = 2 if self.bidirectional else 1
            h_0 = Tensor.zeros((self.num_layers * num_directions, batch_size, self.hidden_size), dtype=input.dtype,
                               device=input.device)
            c_0 = Tensor.zeros((self.num_layers * num_directions, batch_size, self.hidden_size), dtype=input.dtype,
                               device=input.device)

        # Split the initial state into per-layer states
        hs, cs = list(F.split(h_0)), list(F.split(c_0))

        if not self.bidirectional:
            # Unidirectional case
            output, (h_n, c_n) = self._one_directional_op(input, self.cells, n_steps, hs, cs)
        else:
            output_f, (h_n_f, c_n_f) = self._one_directional_op(input, self.cells, n_steps, hs[:self.num_layers],
                                                                cs[:self.num_layers])
            output_b, (h_n_b, c_n_b) = self._one_directional_op(F.flip(input, 0), self.back_cells, n_steps,
                                                                hs[self.num_layers:], cs[self.num_layers:],
                                                                reverse=True)
            output = F.cat([output_f, output_b], 2)
            h_n = F.cat([h_n_f, h_n_b], 0)
            c_n = F.cat([c_n_f, c_n_b], 0)

        return output, (h_n, c_n)
The multi-layer and bidirectional handling is essentially the same as in the RNN implementation; only the inputs and outputs differ slightly, which follows from the LSTM architecture (the extra cell state).
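To make the bidirectional bookkeeping concrete, here is a small NumPy sketch (shapes only, not the framework's tensors) of how the forward and backward results are combined, mirroring the F.cat calls above:

import numpy as np

n_steps, batch_size, hidden_size, num_layers = 5, 2, 4, 3

# Per-direction results, as produced by the one-directional pass
output_f = np.random.randn(n_steps, batch_size, hidden_size)
output_b = np.random.randn(n_steps, batch_size, hidden_size)
h_n_f = np.random.randn(num_layers, batch_size, hidden_size)
h_n_b = np.random.randn(num_layers, batch_size, hidden_size)

# Outputs are concatenated along the feature dimension,
# final states along the layer/direction dimension
output = np.concatenate([output_f, output_b], axis=2)
h_n = np.concatenate([h_n_f, h_n_b], axis=0)

print(output.shape)  # (5, 2, 8) -> (n_steps, batch_size, 2 * hidden_size)
print(h_n.shape)     # (6, 2, 4) -> (2 * num_layers, batch_size, hidden_size)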
Part-of-speech tagging practice
Based on the LSTM implemented above, we build a part-of-speech tagging model, which is also named LSTM:
class LSTM(nn.Module):
    def __init__(self, vocab_size: int, embedding_dim: int, hidden_dim: int, output_dim: int, n_layers: int,
                 dropout: float, bidirectional: bool = False):
        super(LSTM, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, num_layers=n_layers, dropout=dropout,
                           bidirectional=bidirectional)
        num_directions = 2 if bidirectional else 1
        self.output = nn.Linear(num_directions * hidden_dim, output_dim)

    def forward(self, input: Tensor, hidden: Tensor = None) -> Tensor:
        embedded = self.embedding(input)
        # The POS tagging task uses the output of every time step
        output, _ = self.rnn(embedded, hidden)
        outputs = self.output(output)
        log_probs = F.log_softmax(outputs, axis=-1)
        return log_probs
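Since the tagging task needs a prediction at every position, the linear layer and log-softmax are applied to every time step. Here is a minimal NumPy sketch of this last step, assuming the LSTM has already produced per-step features (all names are illustrative, not the framework's API):

import numpy as np

batch_size, n_steps, feat_dim, num_tags = 2, 6, 8, 5

# Per-step features, as produced by a (bidirectional) LSTM with batch_first=True
features = np.random.randn(batch_size, n_steps, feat_dim)
W = np.random.randn(feat_dim, num_tags)
b = np.random.randn(num_tags)

logits = features @ W + b                      # (batch_size, n_steps, num_tags)
shifted = logits - logits.max(-1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(-1, keepdims=True))  # log-softmax over tags
tags = log_probs.argmax(-1)                    # one predicted tag per time step
print(tags.shape)  # (2, 6)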
The process is similar to the RNN case; the training code is as follows:
embedding_dim = 128
hidden_dim = 128
batch_size = 32
num_epoch = 10
n_layers = 2
dropout = 0.2

# Load the data
train_data, test_data, vocab, pos_vocab = load_treebank()
train_dataset = RNNDataset(train_data)
test_dataset = RNNDataset(test_data)
train_data_loader = DataLoader(train_dataset, batch_size=batch_size, collate_fn=train_dataset.collate_fn, shuffle=True)
test_data_loader = DataLoader(test_dataset, batch_size=batch_size, collate_fn=test_dataset.collate_fn, shuffle=False)

num_class = len(pos_vocab)

# Load the model
device = cuda.get_device("cuda:0" if cuda.is_available() else "cpu")
model = LSTM(len(vocab), embedding_dim, hidden_dim, num_class, n_layers, dropout, bidirectional=True)
model.to(device)

# Training
nll_loss = NLLLoss()
optimizer = SGD(model.parameters(), lr=0.1)

model.train()  # Make sure dropout is applied
for epoch in range(num_epoch):
    total_loss = 0
    for batch in tqdm(train_data_loader, desc=f"Training Epoch {epoch}"):
        inputs, targets, mask = [x.to(device) for x in batch]
        log_probs = model(inputs)
        loss = nll_loss(log_probs[mask], targets[mask])  # Boolean indexing: masked-out positions are excluded from the loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Loss: {total_loss:.2f}")

# Testing
acc = 0
total = 0
model.eval()  # Dropout is not needed at test time
for batch in tqdm(test_data_loader, desc=f"Testing"):
    inputs, targets, mask = [x.to(device) for x in batch]
    with no_grad():
        output = model(inputs)
        acc += (output.argmax(axis=-1).data == targets.data)[mask.data].sum().item()
        total += mask.sum().item()

# Print the accuracy on the test set
print(f"Acc: {acc / total:.2f}")
Output:
Loss: 102.51
Acc: 0.70
With the same configuration, the accuracy on the test set is also 70%; more training epochs might be needed. This is only a demonstration, so the model was trained for just 10 epochs.
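The boolean mask used above excludes padding positions from both the loss and the accuracy. A minimal NumPy sketch of that idea (toy numbers, not the framework's API):

import numpy as np

# Toy batch: 2 sentences, padded to 4 tokens, 3 possible tags
log_probs = np.log(np.full((2, 4, 3), 1 / 3))    # uniform predictions for simplicity
targets = np.array([[0, 2, 1, 0], [1, 1, 0, 0]])
mask = np.array([[True, True, True, False],       # last token of sentence 1 is padding
                 [True, True, False, False]])     # last two tokens of sentence 2 are padding

# Negative log-likelihood, averaged only over real tokens
picked = np.take_along_axis(log_probs, targets[..., None], axis=-1).squeeze(-1)
nll = -(picked[mask]).mean()

# Accuracy, also computed only over real tokens
pred = log_probs.argmax(-1)
acc = (pred == targets)[mask].mean()

print(nll, acc)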
Reference resources