Code Implementation of NNLM
2022-07-02 13:46:00 【InfoQ】
Let's first review the model structure:

- y is the output.
- x is the input; it is transformed into C(x) (the embedding), but the original formula still writes it as x.
- d is the hidden-layer bias.
- H is the weight matrix from the input layer to the hidden layer.
- U is the weight matrix from the hidden layer to the output layer.
- W is the weight matrix directly from the (embedded) input to the output layer.
- b is the output-layer bias.
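Putting these together, the model computes the NNLM scoring function from Bengio et al., which is exactly the forward pass in the code below:

$$ y = b + Wx + U\tanh(d + Hx) $$

where x is the concatenation of the embedding vectors of the n-1 context words.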
Next, let's explain how this network is put together.




Code
The model code
class NNLM(nn.Module):
    def __init__(self):
        super(NNLM, self).__init__()
        self.C = nn.Embedding(n_class, m)                          # embedding table C
        self.H = nn.Parameter(torch.randn(len_sen * m, n_hidden))  # input -> hidden weights
        self.d = nn.Parameter(torch.randn(n_hidden))               # hidden-layer bias
        self.U = nn.Parameter(torch.randn(n_hidden, n_class))      # hidden -> output weights
        self.W = nn.Parameter(torch.zeros(len_sen * m, n_class))   # input -> output weights, zero-initialized
        self.b = nn.Parameter(torch.randn(n_class))                # output-layer bias

    def forward(self, X):                        # X : [batch_size, len_sen]
        X = self.C(X)                            # X : [batch_size, len_sen, m]
        X = X.view(-1, len_sen * m)              # [batch_size, len_sen * m]
        tanh = torch.tanh(self.d + X @ self.H)   # [batch_size, n_hidden]
        output = self.b + X @ self.W + tanh @ self.U  # [batch_size, n_class]
        return output
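As a quick sanity check, here is a minimal smoke test. It is a sketch that assumes the NNLM class above is already defined and uses illustrative values for the global hyperparameters (the full script later derives these from the data):

```python
import torch

# Illustrative values; the full script below computes these from the dataset.
n_class, m, len_sen, n_hidden = 11, 3, 6, 14

model = NNLM()
fake_batch = torch.randint(0, n_class, (4, len_sen))  # 4 "sentences" of word indices
print(model(fake_batch).shape)  # torch.Size([4, 11]) -> [batch_size, n_class]
```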
__init__(self)
This part defines the parameters listed above. self.C is the embedding lookup (nn.Embedding).
- The rest are the weights and biases of the network. As mentioned earlier, W is initialized to a zero matrix, so it uses torch.zeros, while the remaining parameters are randomly initialized with torch.randn.
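A tiny illustration of these two initializations (our own example, not from the original post; the shapes are made up):

```python
import torch
import torch.nn as nn

W = nn.Parameter(torch.zeros(3, 5))  # the direct input -> output path starts at zero
H = nn.Parameter(torch.randn(3, 5))  # everything else starts from a standard Gaussian
print(W.requires_grad, H.requires_grad)  # True True -- nn.Parameter tracks gradients by default
```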
forward(self, X)
This defines the forward pass. X = self.C(X) first runs X through the embedding and assigns the result back to X. This corresponds to what was said earlier: although the input goes through an embedding, the original formula still writes the input as x.
Tensor.view changes the shape of a tensor (torch.Tensor.view — PyTorch 1.11.0 documentation). After the reshape, the word-embedding vectors of all the words in a sentence are concatenated together.
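A small illustration of this reshape, with hypothetical sizes:

```python
import torch

X = torch.arange(24.0).view(2, 3, 4)  # [batch_size=2, len_sen=3, m=4]
flat = X.view(-1, 3 * 4)              # -1 lets PyTorch infer the batch dimension
print(flat.shape)                     # torch.Size([2, 12])
```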
self.d + X @ self.H computes the hidden layer from the input layer.
tanh = torch.tanh(self.d + X @ self.H) passes that result through the tanh activation function; here the activated result is simply assigned to a variable named tanh.
output = self.b + X @ self.W + tanh @ self.U is the output-layer computation. Note that the output is the sum of two parts: one part comes from the hidden layer (tanh @ self.U), and one part comes directly from the input layer (X @ self.W); the two are added together along with the bias b.
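The two paths can be written out explicitly. This is a sketch with made-up tensors whose shapes match the script below:

```python
import torch

batch_size, n_in, n_hidden, n_class = 5, 18, 14, 11  # n_in = len_sen * m
X = torch.randn(batch_size, n_in)                    # embedded + flattened input
H, d = torch.randn(n_in, n_hidden), torch.randn(n_hidden)
U, W, b = torch.randn(n_hidden, n_class), torch.zeros(n_in, n_class), torch.randn(n_class)

through_hidden = torch.tanh(d + X @ H) @ U  # input -> hidden -> output
direct = X @ W                              # shortcut: input -> output
output = b + direct + through_hidden
print(output.shape)                         # torch.Size([5, 11])
```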
Now let's trace the tensor shapes through the network, in terms of batch_size, len_sen, m, n_hidden, and n_class:
- At first the input is a batch of sentences, so the input X has shape [batch_size, len_sen]. At this point each element of the matrix is a single word index.
- After the embedding step, each word is represented by a feature vector, so X becomes [batch_size, len_sen, m]. What used to be one element per word is now an m-dimensional feature vector per word, so a dimension is added and the tensor becomes three-dimensional.
- Then Tensor.view changes the shape. X.view(-1, len_sen * m) turns it into a two-dimensional matrix whose second dimension is len_sen * m and whose first dimension is inferred (-1 means inferred). In other words, the representations of the different words in a sentence are concatenated into one long vector.
- tanh: here we reach the hidden layer, so the vector length becomes the hidden-layer size n_hidden, which you must set yourself. The hidden size affects the quality of the network, although with a dataset this small the choice barely matters. Common rules of thumb for the hidden size follow.
- Suppose the input layer size is n_in, the output layer has n_out classes, the sample count is s, and c is some constant.
- A common rule of thumb for the number of hidden units is h = √(n_in · n_out); other variants involve s and c as well.
- See: How to determine the number and size of hidden layers in neural network _LolitaAnn Technology blog _51CTO Blog
- Here we use h = √(n_in · n_out). In our code the input length is len_sen * m and the output size is the vocabulary length n_class, so h = √(len_sen · m · n_class), which works out to 14 (see the check after this list).
- At this point tanh has shape [batch_size, n_hidden].
- The output has shape [batch_size, n_class]. For each sample the output layer produces a vector whose length equals the vocabulary size; the index of the largest entry marks the predicted word's position in the vocabulary.
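A quick check of the sizes above (the five training sentences below contain 11 distinct words):

```python
len_sen, m, n_class = 6, 3, 11                  # context length, embedding size, vocab size
n_hidden = int((len_sen * m * n_class) ** 0.5)  # the sqrt(n_in * n_out) heuristic
print(n_hidden)                                 # 14
```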
The data preprocessing code
sentences = ["The cat is walking in the bedroom",
             "A dog was running in a room",
             "The cat is running in a room",
             "A dog is walking in a bedroom",
             "The dog was walking in the room"]

word_list = " ".join(sentences).lower().split()
word_list = list(set(word_list))
word_dict = {w: i for i, w in enumerate(word_list)}
number_dict = {i: w for i, w in enumerate(word_list)}
- The first word_list assignment joins all sentences in the dataset with spaces, converts the result to lowercase, and then splits on spaces into individual words. This produces a list of words, but with many duplicates.
- The second word_list assignment converts that list to a set to remove the duplicate words, then converts it back to a list.
- The last two lines use enumerate to build the word-to-index and index-to-word dictionaries.
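For example (the exact indices differ between runs, because set() ordering is arbitrary):

```python
print(len(word_list))                 # 11 distinct words
print(word_dict["cat"])               # some index in 0..10
print(number_dict[word_dict["cat"]])  # 'cat'
```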
def dataset():
    input = []
    target = []
    for sen in sentences:
        word = sen.lower().split()  # space tokenizer
        i = [word_dict[n] for n in word[:-1]]  # words 1..n-1 as input
        t = word_dict[word[-1]]  # word n as target; this is usually called a 'causal language model'
        input.append(i)
        target.append(t)
    return input, target
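Calling it on the sentences above gives, for instance:

```python
inp, tgt = dataset()
print(len(inp), len(inp[0]))  # 5 sentences, 6 context words each
print(number_dict[tgt[0]])    # 'bedroom' -- the last word of the first sentence
```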

Complete code
import torch
import torch.nn as nn
import torch.optim as optim


def dataset():
    input = []
    target = []
    for sen in sentences:
        word = sen.lower().split()  # space tokenizer
        i = [word_dict[n] for n in word[:-1]]  # words 1..n-1 as input
        t = word_dict[word[-1]]  # word n as target; this is usually called a 'causal language model'
        input.append(i)
        target.append(t)
    return input, target


# Model
class NNLM(nn.Module):
    def __init__(self):
        super(NNLM, self).__init__()
        self.C = nn.Embedding(n_class, m)
        self.H = nn.Parameter(torch.randn(len_sen * m, n_hidden))
        self.d = nn.Parameter(torch.randn(n_hidden))
        self.U = nn.Parameter(torch.randn(n_hidden, n_class))
        self.W = nn.Parameter(torch.zeros(len_sen * m, n_class))
        self.b = nn.Parameter(torch.randn(n_class))

    def forward(self, X):                        # X : [batch_size, len_sen]
        X = self.C(X)                            # X : [batch_size, len_sen, m]
        X = X.view(-1, len_sen * m)              # [batch_size, len_sen * m]
        tanh = torch.tanh(self.d + X @ self.H)   # [batch_size, n_hidden]
        output = self.b + X @ self.W + tanh @ self.U  # [batch_size, n_class]
        return output


if __name__ == '__main__':
    sentences = ["The cat is walking in the bedroom",
                 "A dog was running in a room",
                 "The cat is running in a room",
                 "A dog is walking in a bedroom",
                 "The dog was walking in the room"]

    word_list = " ".join(sentences).lower().split()
    word_list = list(set(word_list))
    word_dict = {w: i for i, w in enumerate(word_list)}
    number_dict = {i: w for i, w in enumerate(word_list)}

    n_class = len(word_dict)  # size of the vocabulary
    len_sen = 6  # number of steps, n-1 in the paper
    m = 3  # embedding size, m in the paper
    n_hidden = int((len_sen * m * n_class) ** 0.5)  # hidden size, h in the paper

    model = NNLM()
    loss = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.003)

    input, target = dataset()
    input = torch.LongTensor(input)
    target = torch.LongTensor(target)

    # Predictions before training
    predict = model(input).data.max(1, keepdim=True)[1]
    print([sen.split()[:6] for sen in sentences], '->', [number_dict[n.item()] for n in predict.squeeze()])

    # Training
    for epoch in range(5000):
        optimizer.zero_grad()
        output = model(input)
        # output : [batch_size, n_class], target : [batch_size]
        Loss = loss(output, target)
        if (epoch + 1) % 1000 == 0:
            print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(Loss.item()))
        Loss.backward()
        optimizer.step()

    # Predict & test
    predict = model(input).data.max(1, keepdim=True)[1]
    print([sen.split()[:6] for sen in sentences], '->', [number_dict[n.item()] for n in predict.squeeze()])
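After training, the model can also be queried with a new context. Here is a hypothetical helper (predict_next is our own name, not part of the original post):

```python
def predict_next(context_words):
    """Predict the next word for a list of len_sen context words (a sketch)."""
    idx = torch.LongTensor([[word_dict[w] for w in context_words]])
    with torch.no_grad():
        logits = model(idx)
    return number_dict[logits.argmax(dim=1).item()]

print(predict_next("the cat is walking in the".split()))  # 'bedroom', once the model has fit the data
```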