Study Notes: Regression in Machine Learning
2022-08-02 00:07:00 【Menglong is like us】
Event: CSDN 21-Day Learning Challenge
1 Introduction
Many simple regression problems arise in everyday life. For example, when two variables show a clear linear relationship, we can use regression to analyze the relationship quantitatively and predict one variable from the other.
In this example, we explore the relationship between vehicle speed and braking distance.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
cars=pd.read_csv("../data/cars.csv",usecols=["speed","dist"])
cars[:5]  # speed: vehicle speed; dist: braking distance (first 5 rows shown below)
|   | speed | dist |
| 0 | 4     | 2    |
| 1 | 4     | 10   |
| 2 | 7     | 4    |
| 3 | 7     | 22   |
| 4 | 8     | 16   |
Inspect the data:
x=cars["speed"]
y=cars["dist"]
plt.title("fig1.Scatter plot of vehicle speed and braking distance")
plt.scatter(x,y,marker="*",c='red')
From the plot we can see a roughly linear relationship between the two variables. Next we quantify it by building a regression model.
2 Regression model
Our goal is to find the parameters of the following function:
$$y = ax + b$$
where x is the vehicle speed and y is the braking distance.
We also need a function that measures the gap between the predicted and true values, so that we can judge how good the parameters we finally obtain are.
The loss function is:
$$L = \frac{1}{2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where n is the number of samples, $y_i$ is the true braking distance of the i-th observation, and $\hat{y}_i = a x_i + b$ is the predicted value.
Naturally, we want L to reach its minimum; the regression model obtained this way is the most accurate. There are two ways to achieve this, described below.
3 Two solution methods
3.1 Direct method
The direct method finds the optimal solution in closed form, but it requires the loss function to be convex; the result is an analytical solution.
【Convexity of a function】
Let f(x) be a function, with two points $A(x_1, f(x_1))$ and $B(x_2, f(x_2))$ on its graph, and let P be a point on the chord AB.
The chord AB can be written in parametric form as:
$$x = \lambda x_1 + (1-\lambda)x_2, \quad y = \lambda f(x_1) + (1-\lambda)f(x_2), \quad \lambda \in [0, 1].$$
f(x) is convex if, for any two points A and B on its graph, the value at any point P on the chord AB is no smaller than the function value of f(x) at the same x, i.e.:
$$f\left(\lambda x_1 + (1-\lambda)x_2\right) \le \lambda f(x_1) + (1-\lambda)f(x_2).$$
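To make the definition concrete, here is a small numeric check (an illustrative sketch I added, with arbitrary test points) that $f(x)=x^2$ satisfies the chord inequality:
# Check f(λ·x1 + (1-λ)·x2) <= λ·f(x1) + (1-λ)·f(x2) for f(x) = x^2
f = lambda v: v**2
x1, x2 = -1.0, 3.0  # arbitrary endpoints of the chord
for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    p = lam*x1 + (1-lam)*x2
    assert f(p) <= lam*f(x1) + (1-lam)*f(x2) + 1e-12
print("f(x)=x^2 satisfies the convexity inequality at the sampled points")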
For the full derivation of the parameters, please refer to the blog: least squares method (最小二乘法).
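For reference, setting both partial derivatives of L to zero and solving yields the familiar closed-form least-squares solution (a standard result stated here for completeness; the original post defers the derivation to the linked blog):
$$a = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad b = \bar{y} - a\,\bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the sample means of speed and braking distance.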
Experiment:
import sympy
# Assume the model y = a*x + b
# Regression coefficients: slope a and bias b
a,b=sympy.symbols("a b")
L=0.5*np.sum((y-a*x-b)**2)
# Take the partial derivatives of the loss with respect to b and a
f1=sympy.diff(L,b)
f2=sympy.diff(L,a)
print(sympy.diff(L,a))
print(sympy.diff(L,b))
# Solve the system ∂L/∂b = 0, ∂L/∂a = 0
ans=sympy.solve([f1,f2],[b,a])
# Result: {b: -17.5790948905109, a: 3.93240875912409}
alpha=ans[a]
beta=ans[b]
pre=alpha*x+beta
plt.title("Fig2 Fitting results")
plt.scatter(x,y,c="red")
plt.plot(x,pre,c="green")
plt.show()
The resulting fitted line is drawn in Fig2.
The fitted relationship is therefore:
$$y = 3.932x - 17.579$$
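As a quick sanity check (a minimal sketch I added, not part of the original post), we can use the fitted line to predict the braking distance at a few hypothetical speeds:
# Predict braking distance at a few speeds using the fitted coefficients
for speed in [10, 15, 20]:
    dist = float(alpha*speed + beta)  # alpha is the slope, beta the intercept here
    print(f"speed={speed} -> predicted dist={dist:.2f}")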
3.2 Iterative method
Optimizing the loss function directly has limitations: if the loss function is non-convex, the optimal solution is hard to find in closed form. The iterative method was proposed for this situation. Similar to the back-propagation algorithm of neural networks covered earlier, it minimizes the loss by continuously updating the parameters in small increments. The iterative method thus differs from the direct method in how the parameters are obtained: the direct method computes them in one step, while the iterative method updates them via mini-batch gradient descent.
The loss function over a mini-batch is now:
$$L = \frac{1}{2m}\sum_{i=1}^{m}\left(y_i - (\beta x_i + \alpha)\right)^2$$
where m is the batch size. The objective is to minimize L, and the parameters are updated as:
$$\alpha \leftarrow \alpha - \eta\,\frac{\partial L}{\partial \alpha}, \qquad \beta \leftarrow \beta - \eta\,\frac{\partial L}{\partial \beta}$$
where $\eta$ is the learning rate.
Experiment:
import random
# Update the parameters once using the gradient of the mini-batch loss
def update_var(pre_alpha,pre_beta,y,x,lr):
    n=len(x)
    # Partial derivatives of L = (1/2n) * sum((y - beta*x - alpha)^2)
    diff_alpha=np.sum(-(y-pre_beta*x-pre_alpha))/n
    diff_beta =np.sum(-x*(y-pre_beta*x-pre_alpha))/n
    # Gradient-descent step
    new_alpha=pre_alpha-lr*diff_alpha
    new_beta=pre_beta-lr*diff_beta
    return (new_alpha,new_beta)
# Run the iterative process: sample a mini-batch each step and update
def iterative_func(y,x,raw_alpha,raw_beta,lr,num,sample_num):
    alpha_list = []
    beta_list = []
    alpha = raw_alpha
    beta = raw_beta
    # valid positions are 0..n-1 (the original range(1, len(y)+1) overruns the index)
    num_list = list(range(len(y)))
    for i in range(num):
        alpha_list.append(alpha)
        beta_list.append(beta)
        random.shuffle(num_list)
        index = num_list[:sample_num]  # pick a random mini-batch of sample_num points
        alpha, beta = update_var(alpha, beta, y[index], x[index], lr)
        print("【{}】Now alpha:{},beta:{}".format(i,alpha,beta))
    return (alpha_list, beta_list)
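To double-check the hand-derived gradients inside update_var, here is a small finite-difference comparison (an illustrative sketch I added; eps and the test point are arbitrary choices):
# Compare the analytic gradient with a numeric finite-difference estimate
def batch_loss(alpha, beta, y, x):
    return np.sum((y - beta*x - alpha)**2) / (2*len(x))

eps = 1e-6
a0, b0 = 1.0, 2.0  # arbitrary test point
numeric_dalpha = (batch_loss(a0+eps, b0, y, x) - batch_loss(a0-eps, b0, y, x)) / (2*eps)
analytic_dalpha = np.sum(-(y - b0*x - a0)) / len(x)
print(numeric_dalpha, analytic_dalpha)  # the two values should agree closely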
# Randomly initialize alpha and beta in [0, 10)
raw_alpha=np.random.random()*10
raw_beta =np.random.random()*10
# Hyperparameters: learning rate lr=0.005, iterations num=10000, mini-batch size sample_num=16
lr = 0.005
num = 10000
sample_num = 16
alpha_list, beta_list = iterative_func(y, x, raw_alpha, raw_beta,
lr, num,sample_num)
print("after {} times iteration:alpha: {}, beta:{}".format(num,alpha_list[-1], beta_list[-1]))after 10000 times iteration:alpha: -17.791370073935074, beta:4.000650060840287
# Persist the parameter values generated during the iteration
import csv
var_data=zip(alpha_list,beta_list)
with open("../data/20220801_vardata.csv",'w',newline='') as f:
    csv_writer=csv.writer(f)
    csv_writer.writerow(['alpha','beta'])  # header row (writerow, not writerows)
    csv_writer.writerows(var_data)
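To verify the file was written correctly, we can read it back with pandas (a quick check I added, assuming the same path as above):
# Read the persisted parameter history back and inspect it
var_df = pd.read_csv("../data/20220801_vardata.csv")
print(var_df.head())
print(var_df.tail(1))  # the last row should match the final alpha and beta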
plt.subplot(121)
plt.plot(alpha_list)
plt.title("alpha change process")
plt.subplot(122)
plt.plot(beta_list)
plt.title("beta change process")
plt.show()
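Finally, as a visual check (my addition, not in the original post, with a figure title I made up), we can overlay the line learned by gradient descent on the scatter plot:
# Plot the data together with the line learned by mini-batch gradient descent
final_alpha, final_beta = alpha_list[-1], beta_list[-1]
plt.scatter(x, y, c="red")
plt.plot(x, final_beta*x + final_alpha, c="green")  # alpha is the intercept, beta the slope
plt.title("Fig3 Line learned by mini-batch gradient descent")
plt.show()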
References
Machine learning theory and case studies (part 2) -- Regression, GoatGui's blog, CSDN