Code implementation of NNLM
2022-07-02 13:46:00 【InfoQ】
Let's review the model structure first

- y is the output.
- x is the input; it is mapped through the embedding matrix C, but the original formula still writes the input as x.
- d is the hidden-layer bias.
- H is the weight matrix from the input layer to the hidden layer.
- U is the weight matrix from the hidden layer to the output layer.
- W is the weight matrix connecting x directly to the output layer.
- b is the output-layer bias.
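Putting these symbols together, the forward pass that the code below implements is the classic NNLM formula, with x standing for the concatenated word-embedding vectors of the input words:

$$
y = b + Wx + U\tanh(d + Hx)
$$

In the code the batch dimension comes first, so the same products show up as X @ H, X @ W and tanh @ U.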
Let's explain how this network comes about.




Code
The model code
class NNLM(nn.Module):
    def __init__(self):
        super(NNLM, self).__init__()
        self.C = nn.Embedding(n_class, m)                                       # word embedding table C
        self.H = nn.Parameter(torch.randn(len_sen * m, n_hidden, requires_grad=True))  # input -> hidden weights
        self.d = nn.Parameter(torch.randn(n_hidden))                            # hidden-layer bias
        self.U = nn.Parameter(torch.randn(n_hidden, n_class, requires_grad=True))      # hidden -> output weights
        self.W = nn.Parameter(torch.zeros(len_sen * m, n_class, requires_grad=True))   # input -> output weights, initialized to zero
        self.b = nn.Parameter(torch.randn(n_class))                             # output-layer bias

    def forward(self, X):                             # X : [batch_size, len_sen]
        X = self.C(X)                                 # X : [batch_size, len_sen, m]
        X = X.view(-1, len_sen * m)                   # [batch_size, len_sen * m]
        tanh = torch.tanh(self.d + X @ self.H)        # [batch_size, n_hidden]
        output = self.b + X @ self.W + tanh @ self.U  # [batch_size, n_class]
        return output
__init__(self) defines the parameters listed above. self.C is the embedding operation. The remaining attributes are the weights and biases of the network. As mentioned earlier, W is initialized to a zero matrix, so it uses torch.zeros, while the others are randomly initialized with torch.randn.
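As a quick sanity check, the registered parameters and their shapes can be listed like this (a minimal sketch, assuming len_sen, m, n_hidden and n_class have already been set as in the full script further down, i.e. 6, 3, 14 and 11):

model = NNLM()
for name, p in model.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)
# The six parameters are H (18, 14), d (14,), U (14, 11), W (18, 11), b (11,) and C.weight (11, 3).
# nn.Parameter makes all of them trainable, so requires_grad is True even for the zero-initialized W.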
forward(self, X) defines the forward propagation. X = self.C(X) first runs X through the embedding and assigns the result back to X. This corresponds to what was said earlier: although the input goes through an embedding step, the original formula still writes the input as x. Tensor.view changes the shape of a tensor (see torch.Tensor.view — PyTorch 1.11.0 documentation). After the reshape, the word-embedding vectors of the words in each sentence are joined together into one long vector.
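A tiny standalone illustration of what the embedding plus view do to the shapes (the dimensions match the full script below):

import torch
import torch.nn as nn

batch_size, len_sen, m, n_class = 5, 6, 3, 11
C = nn.Embedding(n_class, m)                          # the embedding table
X = torch.randint(0, n_class, (batch_size, len_sen))  # a batch of word indices
E = C(X)                                              # [5, 6, 3]: one m-dimensional vector per word
flat = E.view(-1, len_sen * m)                        # [5, 18]: the 6 embeddings of each sentence concatenated
print(X.shape, E.shape, flat.shape)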
self.d + X @ self.H is the computation of the hidden layer from the input layer. tanh = torch.tanh(self.d + X @ self.H) then passes that result through the tanh activation function; the activated result is simply assigned to a variable named tanh. output = self.b + X @ self.W + tanh @ self.U is the output-layer computation. Note that the output consists of two parts: one comes from the hidden layer and one comes directly from the input layer; only by adding the two (plus the output bias) do we obtain the final output.
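To make the two paths explicit, forward can be written equivalently like this (a drop-in rewrite of the method above, with no change in behaviour):

def forward(self, X):                             # X : [batch_size, len_sen]
    X = self.C(X).view(-1, len_sen * m)           # concatenated word embeddings
    hidden = torch.tanh(self.d + X @ self.H)      # hidden layer, [batch_size, n_hidden]
    direct_path = X @ self.W                      # input layer -> output layer directly
    hidden_path = hidden @ self.U                 # hidden layer -> output layer
    return self.b + direct_path + hidden_path     # sum of both paths plus the output bias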
Now let's trace the shapes through the network in terms of batch_size, len_sen, m, n_hidden, and n_class:
- At the start, the input is a batch of sentences, so X has shape [batch_size, len_sen]; each element of this matrix is a word (an index into the vocabulary).
- After the embedding step, each word is represented by a feature vector, so X becomes [batch_size, len_sen, m]. What used to be a single element is now an m-dimensional vector, which adds a dimension and makes X a three-dimensional tensor.
- Then Tensor.view changes the shape: X.view(-1, len_sen * m) turns it back into a two-dimensional matrix whose second dimension is len_sen * m and whose first dimension is inferred automatically (-1 means "infer this dimension"). In effect, the embedding vectors of the different words in a sentence are concatenated.
tanh: at this point we have reached the hidden layer, so the length of the vector becomes the hidden-layer size n_hidden, which you have to choose yourself. The hidden-layer size influences how well the network performs; of course, the data set here is tiny, so its exact value hardly matters. Rules of thumb for choosing the hidden size are usually stated in terms of:

- the input layer size,
- the number of output classes,
- the number of training samples,
- and a constant.

For the common formulas, see: How to determine the number and size of hidden layers in neural network _LolitaAnn Technology blog _51CTO Blog.

Here we take h = √(len_sen · m · n_class), i.e. the geometric mean of the input length (len_sen * m in our code) and the number of classes (the vocabulary size n_class). That works out to h = 14; a quick check follows below.

- At this point tanh has shape [batch_size, n_hidden].
- The output layer has shape [batch_size, n_class]: for each sample the output layer produces a vector whose length equals the vocabulary size, and the position of the largest entry marks the predicted word in the vocabulary.
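A quick check of these numbers (a standalone sketch using the values from the full script below: 6 input words, 3-dimensional embeddings, an 11-word vocabulary):

len_sen, m, n_class = 6, 3, 11                  # values used in the full script below
n_hidden = int((len_sen * m * n_class) ** 0.5)  # int(sqrt(18 * 11)) = int(14.07...)
print(n_hidden)                                 # 14

# Shapes flowing through forward() for the 5 training sentences:
#   input        [5, 6]
#   after C      [5, 6, 3]
#   after view   [5, 18]
#   tanh         [5, 14]
#   output       [5, 11]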
Data preprocessing part of the code
sentences = ["The cat is walking in the bedroom",
             "A dog was running in a room",
             "The cat is running in a room",
             "A dog is walking in a bedroom",
             "The dog was walking in the room"]

word_list = " ".join(sentences).lower().split()
word_list = list(set(word_list))
word_dict = {w: i for i, w in enumerate(word_list)}
number_dict = {i: w for i, w in enumerate(word_list)}
- word_list = " ".join(sentences).lower().split() joins all the sentences in the data set with spaces, converts the result to lowercase, and splits it on spaces into individual words. The resulting word list still contains many duplicates.
- word_list = list(set(word_list)) turns that list into a set to remove the duplicates and then back into a list.
- The last two lines use enumerate to build the word-to-index (word_dict) and index-to-word (number_dict) dictionaries for the vocabulary.
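For this data set the vocabulary works out to 11 distinct words, so n_class will be 11. The indices themselves depend on the iteration order of the set, so they can differ from run to run; a small sketch:

sentences = ["The cat is walking in the bedroom",
             "A dog was running in a room",
             "The cat is running in a room",
             "A dog is walking in a bedroom",
             "The dog was walking in the room"]

word_list = list(set(" ".join(sentences).lower().split()))
word_dict = {w: i for i, w in enumerate(word_list)}

print(len(word_dict))  # 11: the, cat, is, walking, in, bedroom, a, dog, was, running, room
print(word_dict)       # e.g. {'room': 0, 'dog': 1, ...} -- exact indices vary between runs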
def dataset():
    input = []
    target = []
    for sen in sentences:
        word = sen.lower().split()             # space tokenizer
        i = [word_dict[n] for n in word[:-1]]  # words 1..n-1 as input
        t = word_dict[word[-1]]                # word n as target; this is usually called a 'causal' language model
        input.append(i)
        target.append(t)
    return input, target
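Assuming sentences and word_dict are defined as above, the first sentence yields the indices of its first six words as input and the index of its last word as target:

inp, tgt = dataset()
print(inp[0])  # indices of ['the', 'cat', 'is', 'walking', 'in', 'the']
print(tgt[0])  # index of 'bedroom', the word the model has to predict
# torch.LongTensor(inp) has shape [5, 6]; torch.LongTensor(tgt) has shape [5]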

Complete code
import torch
import torch.nn as nn
import torch.optim as optim


def dataset():
    input = []
    target = []
    for sen in sentences:
        word = sen.lower().split()             # space tokenizer
        i = [word_dict[n] for n in word[:-1]]  # words 1..n-1 as input
        t = word_dict[word[-1]]                # word n as target; this is usually called a 'causal' language model
        input.append(i)
        target.append(t)
    return input, target


# Model
class NNLM(nn.Module):
    def __init__(self):
        super(NNLM, self).__init__()
        self.C = nn.Embedding(n_class, m)
        self.H = nn.Parameter(torch.randn(len_sen * m, n_hidden, requires_grad=True))
        self.d = nn.Parameter(torch.randn(n_hidden))
        self.U = nn.Parameter(torch.randn(n_hidden, n_class, requires_grad=True))
        self.W = nn.Parameter(torch.zeros(len_sen * m, n_class, requires_grad=True))
        self.b = nn.Parameter(torch.randn(n_class))

    def forward(self, X):                             # X : [batch_size, len_sen]
        X = self.C(X)                                 # X : [batch_size, len_sen, m]
        X = X.view(-1, len_sen * m)                   # [batch_size, len_sen * m]
        tanh = torch.tanh(self.d + X @ self.H)        # [batch_size, n_hidden]
        output = self.b + X @ self.W + tanh @ self.U  # [batch_size, n_class]
        return output


if __name__ == '__main__':
    sentences = ["The cat is walking in the bedroom",
                 "A dog was running in a room",
                 "The cat is running in a room",
                 "A dog is walking in a bedroom",
                 "The dog was walking in the room"]

    word_list = " ".join(sentences).lower().split()
    word_list = list(set(word_list))
    word_dict = {w: i for i, w in enumerate(word_list)}
    number_dict = {i: w for i, w in enumerate(word_list)}

    n_class = len(word_dict)                        # vocabulary size
    len_sen = 6                                     # number of input words, n-1 in the paper
    m = 3                                           # embedding size, m in the paper
    n_hidden = int((len_sen * m * n_class) ** 0.5)  # hidden size, h in the paper

    model = NNLM()
    loss = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.003)

    input, target = dataset()
    input = torch.LongTensor(input)
    target = torch.LongTensor(target)

    # Look at the predictions before training.
    predict = model(input).data.max(1, keepdim=True)[1]
    print([sen.split()[:6] for sen in sentences], '->', [number_dict[n.item()] for n in predict.squeeze()])

    # Training
    for epoch in range(5000):
        optimizer.zero_grad()
        output = model(input)
        # output : [batch_size, n_class], target : [batch_size]
        Loss = loss(output, target)
        if (epoch + 1) % 1000 == 0:
            print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(Loss))
        Loss.backward()
        optimizer.step()

    # Predict & test
    predict = model(input).data.max(1, keepdim=True)[1]
    print([sen.split()[:6] for sen in sentences], '->', [number_dict[n.item()] for n in predict.squeeze()])
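Once the script has run, the trained model can also be queried with a new six-word prefix, as long as every word is in the vocabulary. A minimal sketch (the prefix here is a hypothetical example):

prefix = "the dog is running in the".split()            # 6 = len_sen words, all in the vocabulary
x = torch.LongTensor([[word_dict[w] for w in prefix]])  # shape [1, len_sen]
with torch.no_grad():
    pred = model(x).argmax(dim=1)                       # index of the most probable next word
print(" ".join(prefix), "->", number_dict[pred.item()])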