[Deep Learning] (Problem Record) - Linear Regression - Mini-Batch Stochastic Gradient Descent
2022-07-30 10:14:00 【aaaafeng】
Foreword
Author's homepage: 阿阿阿阿锋的主页_CSDN
When I'm reading a book and the text seems to contradict itself, I often can't help suspecting that something is wrong with the writing (sometimes I even scold the author a little in my head: how could they be so careless?). Yet experience, and what sanity I have left, remind me that the problem is most likely my own.
My environment: Windows 10, Python 3.6.
I'm still fairly new to this; if there are problems with the article, feel free to point them out.
1. The Problem and the Code
In the code, I was confused for a long time by the line `param[:] = param - lr * param.grad / batch_size` in the `sgd` function.
For example, the code sets a mini-batch to 10 samples. So I assumed that when computing gradients for the parameter set `params`, the gradient obtained for each parameter should be vector-typed data (think of it as an array): for each parameter, the 10 samples should each yield one corresponding gradient value, 10 values in total.
Hence my doubt: dividing by `batch_size` is meant to take the mean of the gradient, but the dividend to its left is not a scalar (an ordinary single value), so how can this line of code produce the average we want?
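To make the puzzle concrete, here is a tiny hypothetical sketch of my own (not from the book): if `param.grad` really were one value per sample, dividing by `batch_size` would only scale each entry elementwise; it would not collapse the ten values into a single mean.

```python
from mxnet import nd

# Hypothetical: pretend the gradient were one value per sample (it is not).
fake_per_sample_grads = nd.arange(10)
print(fake_per_sample_grads / 10)  # still 10 values, each merely divided by 10
```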
Note: the code is adapted from *Dive into Deep Learning* (《动手学深度学习》).
Code:
```python
# Goal: train a linear regression model with mini-batch stochastic gradient descent
%matplotlib inline
from IPython import display
from matplotlib import pyplot as plt
from mxnet import autograd, nd
import random

# Generate the training set
num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2
features = nd.random.normal(scale=1, shape=(num_examples, num_inputs))
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += nd.random.normal(scale=0.01, shape=labels.shape)
features[0], labels[0]  # peek at the first sample

def use_svg_display():
    # display plots as vector graphics
    display.set_matplotlib_formats('svg')

def set_figsize(figsize=(3.5, 2.5)):
    use_svg_display()
    # set the figure size
    plt.rcParams['figure.figsize'] = figsize

set_figsize()
# the trailing semicolon suppresses the text output, so only the plot is shown
plt.scatter(features[:, 1].asnumpy(), labels.asnumpy(), 1);

# this function is also saved in the d2lzh package for later use
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # samples are read in random order
    for i in range(0, num_examples, batch_size):
        j = nd.array(indices[i: min(i + batch_size, num_examples)])
        yield features.take(j), labels.take(j)  # take returns the elements at the given indices

batch_size = 10  # the size of one "mini-batch"

# create the model parameters we want to train
w = nd.random.normal(scale=0.01, shape=(num_inputs, 1))
b = nd.zeros(shape=(1,))
w.attach_grad()
b.attach_grad()

def linreg(X, w, b):  # our model function
    return nd.dot(X, w) + b

def squared_loss(y_hat, y):  # the loss function
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

def sgd(params, lr, batch_size):  # update (iterate) the parameters
    for param in params:
        param[:] = param - lr * param.grad / batch_size

lr = 0.03        # learning rate
num_epochs = 3   # number of training epochs
net = linreg     # an alias
loss = squared_loss

for epoch in range(num_epochs):  # training takes num_epochs epochs in total
    # in each epoch, every sample in the training set is used once
    # (assuming the number of samples is divisible by the batch size);
    # X and y are the features and labels of one mini-batch
    for X, y in data_iter(batch_size, features, labels):
        with autograd.record():
            l = loss(net(X, w, b), y)  # l is the loss on the mini-batch X and y
        l.backward()  # gradient of the mini-batch loss w.r.t. the model parameters
        sgd([w, b], lr, batch_size)  # update the parameters with mini-batch SGD
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().asnumpy()))

print('\nweights:')
print(true_w, w)
print('\nbias:')
print(true_b, b)
```
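One detail in `sgd` worth a side note (my own aside, assuming MXNet 1.x NDArray semantics): the update uses the in-place slice assignment `param[:] = ...` rather than rebinding `param = ...`. In-place assignment overwrites the contents of the existing NDArray, so `w` and `b` outside the function really change and the gradient buffer attached by `attach_grad()` stays attached; rebinding would only point the loop variable at a fresh NDArray and leave the real parameters untouched.

```python
from mxnet import nd

p = nd.array([1.0, 2.0])
p.attach_grad()
q = p              # another reference to the same NDArray
p[:] = p - 0.1     # in-place: q sees the update, and p.grad is still attached
print(q, p.grad)
```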
2. Analyzing the Problem
After thinking about it for a while, it suddenly occurred to me: why not just print out the gradients of `param` and take a look? Then the situation would be clear! See the code:
```python
def sgd(params, lr, batch_size):  # update (iterate) the parameters
    for param in params:
        param[:] = param - lr * param.grad / batch_size
        print('\nparam.grad:')
        print(param.grad)
```
I only added two lines of printing at the end of the `sgd` function, then ran it again to see the effect. (This function is called once for every mini-batch of samples, but one look is enough, because all I want to know is the data type of a parameter's gradient.)
Looking at the output: the first `param.grad` printed holds the gradients of the two weight parameters, and the one after it holds the gradient of the bias parameter. In other words, the gradient obtained for each individual parameter has only a single value.
Why does a batch of 10 samples produce only a single value?
What is this value? When `l.backward()` is executed, it is equivalent to executing `l.sum().backward()`. That is, each sample in the batch does yield its own gradient value, and those 10 gradient values are added up to give the parameter's gradient. Dividing by `batch_size` afterwards to take the mean is then perfectly natural.
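This is easy to verify with a toy example (a minimal sketch of my own, using the same `autograd` API as above): a single scalar parameter shared by a "batch" of three samples receives, after `backward()`, the sum of the three per-sample gradients.

```python
from mxnet import autograd, nd

w = nd.array([1.0])              # one scalar parameter shared by all samples
w.attach_grad()
x = nd.array([1.0, 2.0, 3.0])    # a "mini-batch" of 3 samples
with autograd.record():
    l = x * w                    # one loss value per sample; dl_i/dw = x_i
l.backward()                     # equivalent to l.sum().backward()
print(w.grad)                    # [6.] = 1 + 2 + 3: per-sample gradients summed
```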
In fact, I had come across this explanation not long before, but because I was already stuck in my own mental loop, reading it at that point only made me more confused.
🧭 Summary
As you sow, so shall you reap.
Whatever shape a variable has, the gradient computed for that variable has the same shape (a quick check follows the table below).
The reason I subconsciously expected a set of values rather than a single value is that I had earlier seen an example of taking the gradient of a matrix, where the result is a set of values (a matrix). That tripped me up here: in this case, each parameter object we take the gradient of is a single value; there just happen to be multiple data samples.
| Take the gradient of a matrix (vector) | The gradient obtained is a matrix (vector) |
|---|---|
| Take the gradient of a scalar | The gradient obtained is a scalar |
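As a quick sanity check of the table (again just a sketch with the same API), the gradient buffer always has the shape of the variable it is attached to:

```python
from mxnet import autograd, nd

X = nd.ones((2, 3))       # a matrix variable
X.attach_grad()
with autograd.record():
    y = (X * X).sum()     # a scalar function of X
y.backward()
print(X.grad.shape)       # (2, 3): same shape as X

s = nd.array([3.0])       # effectively a scalar variable
s.attach_grad()
with autograd.record():
    z = s * s
z.backward()
print(s.grad.shape)       # (1,)
```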
One more feeling: as a longtime C/C++ programmer, this flexibility of Python's variable data types really blows my mind. I'm still very uncomfortable with it and have been tripped up by it many times. Sigh.