当前位置:网站首页>Neural Network Study Notes 3 - LSTM Long Short-Term Memory Network
Neural Network Study Notes 3 - LSTM Long Short-Term Memory Network
2022-07-30 10:40:00 【Oreos are delicious】
目录
1.1The general structure of the recurrent neural network
1.2延时神经网络(Time Delay Neural Network,TDNN)
entire blog post,The principle and code implementation are recommended to be eaten with the video,口感更佳
LSTMLong short-term memory network from principle to programming implementation
LSTMLong short-term memory network from principle to programming implementation_哔哩哔哩_bilibili
代码链接
链接:https://pan.baidu.com/s/1lrtbjpgpegiAGLOLLZcYoA
提取码:5lt3
--来自百度网盘超级会员V5的分享
要讲LSTMFirst of all, we need to understand the general network of circulating gods to which he belongs
1.循环神经网络
1.1The general structure of the recurrent neural network
循环神经网络(Recurrent Neural Network ,RNN)是一类具有短期记忆能力的神经网络.在循环神经网络中,神经元不但可以接受其他神经元的信息,也可以接受自身的信息,形成具有环路的网络结构.

1.2延时神经网络(Time Delay Neural Network,TDNN)
How recurrent neural networks form short-term memory?
答:Build an additional delay unit,用来存储网络的历史信息(可以包括输入、输出、隐状态等)

1.3按时间展开


The impact can be seen intuitively from the figure
1.4反向传播
![]()



δt,k为第t时刻的损失对第kNet input to hidden neuronszk的导数
1.5 梯度消失,梯度爆炸
梯度 

梯度爆炸问题: 梯度消失问题:
权重衰减 调整模型
梯度截断
2.lstmgating principle


![]()

3Matlab实现
(1)加载序列数据
Load speech training data. XTrain是一个包含270个12A cell array of dimensional variable-length sequences. Y是标签“1”,“2”,… ,“9”,对应9个说话者. XTrainThe entry in is there12行(每个特性一行)and a different number of columns(One column per time step)的矩阵.
[XTrain,YTrain] = japaneseVowelsTrainData;
XTrain(1:5)(2)Visualize the first time series in a plot.每行对应一个特征.
figure
plot(XTrain{1}')
xlabel("Time Step")
title("Training Observation 1")
numFeatures = size(XTrain{1},1);
legend("Feature " + string(1:numFeatures),'Location','northeastoutside')(3)准备要填充的数据
准备填充数据
在训练过程中,默认情况下,The software divides the training data into mini-batches,and fill the sequence,使它们具有相同的长度. Excessive padding can negatively impact network performance.
To prevent adding too much padding during training,The training data can be sorted by sequence length,and choose a mini batch size,so that the sequences in the mini-batches are of similar length. The figure below shows the effect of padding the sequence before and after sorting.

Get the sequence length for each observation.
numObservations = numel(XTrain); % Take the number of samples
for i=1:numObservations
sequence = XTrain{i};
sequenceLengths(i) = size(sequence,2); %Take the number of sequence lengths
end(4)按序列长度对数据进行排序.Sorting uses mini-batches in order to keep similar ones together.
[sequenceLengths,idx] = sort(sequenceLengths);
XTrain = XTrain(idx);
YTrain = YTrain(idx);(5)在条形图中查看排序的序列长度.
figure
bar(sequenceLengths)
ylim([0 30])
xlabel("Sequence")
ylabel("Length")
title("Sorted Data")(6)Choose a mini-batch size 27 to evenly divide the training data,And reduce the amount of padding in mini-batches.The image below illustrates the padding added to the sequence.Because the total amount of data is270So use mini-batches per group of ten
miniBatchSize = 27;(7)定义 LSTM 网络架构
定义 LSTM 网络架构.将输入大小指定为序列大小 12(输入数据的维度).指定具有 100 个隐含单元的双向 LSTM 层,并输出序列的最后一个元素.最后,通过包含大小为 9 的全连接层,后跟 softmax 层和分类层,to specify nine classes.
If you have access to the full sequence at forecast time,Then you can use bidirectional in the network LSTM 层.双向 LSTM Layers learn from the complete sequence at each time step.If you don't have access to the full sequence at forecast time,例如,When you are forecasting values or forecasting one time step at a time,则改用 LSTM 层.
inputSize = 12;
numHiddenUnits = 100;
numClasses = 9;
layers = [ ...
sequenceInputLayer(inputSize) %输入层
bilstmLayer(numHiddenUnits,'OutputMode','last') %lstmStructure and hidden layers lastRepresents sequence-to-label classification
fullyConnectedLayer(numClasses) %Fully connected layer settings category
softmaxLayer %softmaxFind the probability of each class for multiple classes
classificationLayer] 现在,指定训练选项.指定求解器为 'adam',梯度阈值为 1,最大轮数为 100.To reduce padding in minibatches,请选择 27 作为小批量大小.The data is to be padded so that the length is the same as the longest sequence,Please specify the sequence length as 'longest'.要确保数据保持按序列长度排序的状态,请指定从不打乱数据.
Since the mini-batch data store is smaller and the sequence is shorter,So it is more suitable for CPU 上训练.将 'ExecutionEnvironment' 指定为 'cpu'.要在 GPU(如果可用)上进行训练,请将 'ExecutionEnvironment' 设置为 'auto'(这是默认值).
maxEpochs = 100; %设置迭代次数
miniBatchSize = 27; %Set minibatch parameters
options = trainingOptions('adam', ... %Train the network solver The training option specifies the decay rate of the gradient and the moving average of the squared gradient.
'ExecutionEnvironment','cpu', ... %选择cpu,或者gpu
'GradientThreshold',1, ... % Adam The gradient moving average decay rate of the solver,Specify less than The non-negative scalar of 1.Gradient decay rate in Adamsection indicated.必须是 'adam'.Default values are suitable for most tasks
'MaxEpochs',maxEpochs, ... % 迭代步数
'MiniBatchSize',miniBatchSize, ... %The size of the mini-batch used for each training iteration,指定为正整数.A mini-batch is a subset of the training set,Used to evaluate the gradient of the loss function and update the weights.Improve performance and increase generalization ability
'SequenceLength','longest', ... %序列长度
'Shuffle','never', ...
'Verbose',0, ... % An indicator showing training progress information in the command window,指定为 1(true) 或0(false).
'Plots','training-progress'); % 绘制训练过程Shuffle用法
Option for data shuffling,指定为下列之一:
'once'— Shuffle the training and validation data once before training.
'never'— 不要打乱数据.
'every-epoch'— 在每个训练 epoch 之前打乱训练数据,And scramble the validation data before each network validation.如果 mini-batch The size does not evenly divide the number of training samples,则trainNetworkDiscard does not fit every epoch the final completeness mini-batch 的训练数据.to avoid in each epoch Discard the same data,请将ShuffleThe training options are set to 'every-epoch'.
(8)训练 LSTM 网络
使用 trainNetwork 以指定的训练选项训练 LSTM 网络.
net = trainNetwork(XTrain,YTrain,layers,options); (9)测试 LSTM 网络
Load the test set and classify sequences to different speakers.
Load voice test data.XTest 是包含 370 个不同长度的 12 维序列的元胞数组.YTest is composed of labels corresponding to nine speakers "1"、"2"、...、"9" 组成的分类向量.
[XTest,YTest] = japaneseVowelsTestData;
XTest(1:3)LSTM 网络 net 已使用相似长度的小批量序列进行训练.Make sure to organize your test data in the same way.按序列长度对测试数据进行排序.
numObservationsTest = numel(XTest);
for i=1:numObservationsTest
sequence = XTest{i};
sequenceLengthsTest(i) = size(sequence,2);
end
[sequenceLengthsTest,idx] = sort(sequenceLengthsTest);
XTest = XTest(idx);
YTest = YTest(idx);对测试数据进行分类.To reduce the amount of padding introduced during classification,Please set the minibatch size to 27.To apply the same padding as the training data,Please specify the sequence length as 'longest'.
miniBatchSize = 27;
YPred = classify(net,XTest, ...
'MiniBatchSize',miniBatchSize, ...
'SequenceLength','longest');(10)计算预测值的分类准确度.
acc = sum(YPred == YTest)./numel(YTest)边栏推荐
- 自适应控制——仿真实验一 用李雅普诺夫稳定性理论设计自适应规律
- CVTE school recruitment written test questions + summary of knowledge points
- The method of parameter passing
- wsl操作
- 软考 系统架构设计师 简明教程 | 系统运行与软件维护
- STM32CubeMX configuration to generate FreeRTOS project
- Detailed explanation of JVM memory layout, class loading mechanism and garbage collection mechanism
- [AGC] Growth Service 2 - In-App Message Example
- Paper reading: SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
- (C语言)文件操作
猜你喜欢

Paper reading: SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

你真的懂Redis的5种基本数据结构吗?

【HMS core】【FAQ】HMS Toolkit典型问题合集1

Selected System Design | Design of CAN Bus Controller Based on FPGA (with Code)

Materialist Dialectics - Conditionalism

自适应控制——仿真实验一 用李雅普诺夫稳定性理论设计自适应规律

JVM内存布局、类加载机制及垃圾回收机制详解

Nacos configuration in the project of battle

Multithreading--the usage of threads and thread pools

If someone asks you about distributed transactions again, throw this to him
随机推荐
hcip06 ospf special area comprehensive experiment
jmeter接口压力测试-(二)
flowable工作流所有业务概念
BERT预训练模型系列总结
Some commands of kubernetes
ospf2 two-point two-way republish (question 2)
Neural Ordinary Differential Equations
Re15: Read the paper LEVEN: A Large-Scale Chinese Legal Event Detection Dataset
OC-ARC (Automatic Reference Counting) automatic reference counting
JCL 学习
If someone asks you about distributed transactions again, throw this to him
Day113. Shangyitong: WeChat login QR code, login callback interface
PyQt5 - draw text on window
105. Construct binary tree from preorder and inorder traversal sequence (video explanation!!)
jmeter接口压力测试(一)
SST-Calib: A lidar-visual extrinsic parameter calibration method combining semantics and VO for spatiotemporal synchronization calibration (ITSC 2022)
Always remember: one day you will emerge from the chrysalis
debian10 install djando
[AGC] Growth Service 2 - In-App Message Example
Detailed explanation of JVM memory layout, class loading mechanism and garbage collection mechanism