【Machine Learning】1 Univariate Linear Regression
2022-08-05 06:08:00 【cabbage itself】
The ex1data1 dataset is as follows:
The first column is the population and the second column is the profit, so population is the only variable. A linear fit is performed on this dataset and the result is displayed graphically.
The Python code is as follows:
1, Import the libraries and plot the raw data
This uses the numpy library, the pandas library, and the pyplot module of matplotlib (imported at the top of the program).
path holds the file name, written as a string. (A bare file name like this is resolved relative to the working directory, so the .py file and the dataset .txt file should be in the same directory.)
Reading a CSV file with pandas: pd.read_csv(path,header=None,names=['population','profit'])
path is described above. header=None tells pandas that the file has no header row, so the first line is read as data. names=['population','profit'] supplies the column labels: the first column is named population and the second profit. The data is read in and assigned to the variable data, which is a DataFrame.
data.head() returns the first five rows by default; a different number can be given in the parentheses (wrap it in print to display it).
data.plot(kind='scatter',x='population',y='profit',figsize=(8,5)) # plot the DataFrame: kind='scatter' draws a scatter plot, figsize sets the figure size
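To make the effect of header=None and names concrete, here is a minimal sketch. It reads from an in-memory string instead of ex1data1.txt, and the three rows are made-up values in the same two-column layout, not the real data.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same layout as ex1data1.txt
# (values are invented for illustration only).
sample = io.StringIO("6.1,17.6\n5.5,9.1\n8.5,13.7\n")

# header=None: the file has no header row, so the first line is data.
# names=[...]: the column labels to use instead.
data = pd.read_csv(sample, header=None, names=['population', 'profit'])

print(data.head(2))          # first two rows only
print(list(data.columns))    # ['population', 'profit']
print(data.shape)            # (3, 2): no row was consumed as a header
```

Without header=None, pandas would treat the first line of numbers as column names and the frame would have one row fewer.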
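2, Define the cost function
def computecost(X,Y,theta):
    inner=np.power(((X*theta.T)-Y),2)
    return np.sum(inner)/(2*len(X))
np.power() raises each element of the array to the given power, np.sum() sums all elements of the array, and len(X) is the number of rows of X, i.e. the number of samples.
Note the order in which X and theta are multiplied: X has one row per sample and theta is a row vector, so the product must be X*theta.T.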
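The cost function can be checked by hand on a tiny example. The sketch below is an assumption-laden variant, not the code from this post: it uses plain numpy arrays with the @ operator instead of np.matrix, the name compute_cost is illustrative, and the three data points are made up.

```python
import numpy as np

def compute_cost(X, y, theta):
    # X: (m, 2), theta: (1, 2), y: (m, 1)
    residual = X @ theta.T - y          # (m, 2) @ (2, 1) -> (m, 1)
    return np.sum(residual ** 2) / (2 * len(X))

# Made-up points lying exactly on y = 2x; the first column of X is the constant 1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[2.0], [4.0], [6.0]])

cost_zero = compute_cost(X, y, np.array([[0.0, 0.0]]))  # (4 + 16 + 36) / (2 * 3)
cost_fit = compute_cost(X, y, np.array([[0.0, 2.0]]))   # perfect fit, so no error
print(cost_zero, cost_fit)
```

Swapping the order to theta.T @ X would fail, because the shapes (2, 1) and (m, 2) do not line up.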
3, Initialize the data
data.insert(0,'ones',1) # insert a column of constant 1s named ones
colums=data.shape[1] # number of columns
X=data.iloc[:,0:colums-1] # X is every column except the last, y is the last column
y=data.iloc[:,colums-1:colums]
X=np.matrix(X.values) # convert X and y to numeric matrices for the matrix multiplication that follows
y=np.matrix(y.values)
theta=np.matrix(np.array([0,0])) # parameters: one constant term and one coefficient for the variable
Inserting into a DataFrame: df.insert(0,'ones',1) inserts a new column at position 0, named ones, with every value set to 1.
Indexing a DataFrame: in df.iloc[,], rows are given before the comma and columns after it.
Note how y is assigned: data.iloc[:,colums-1:colums]. With a plain integer index (data.iloc[:,colums-1]) the result is a one-dimensional Series, and after np.matrix the shape of y becomes (1,97) instead of (97,1), which makes the later calculations awkward.
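The shape difference between slicing and integer indexing can be seen on a small frame. This is a sketch with four made-up rows rather than the 97 rows of ex1data1; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up frame with the same column layout as data after data.insert(0, 'ones', 1).
df = pd.DataFrame({'ones': [1, 1, 1, 1],
                   'population': [6.1, 5.5, 8.5, 7.0],
                   'profit': [17.6, 9.1, 13.7, 11.9]})
cols = df.shape[1]

# A slice keeps the result two-dimensional (a one-column DataFrame).
as_slice = np.matrix(df.iloc[:, cols-1:cols].values)
# A single integer returns a one-dimensional Series instead.
as_index = np.matrix(df.iloc[:, cols-1].values)

print(as_slice.shape)  # (4, 1): column vector, matches X*theta.T
print(as_index.shape)  # (1, 4): row vector, wrong orientation
```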
4, Define the gradient descent function
def gradientdescent(X,Y,theta,alpha,iters):
    temp=np.matrix(np.zeros(theta.shape))
    parameters=int(theta.ravel().shape[1]) # number of parameters
    cost=np.zeros(iters)
    for i in range(iters):
        error=(X*theta.T)-Y
        for j in range(parameters):
            term=np.multiply(error, X[:,j]) # element-wise multiplication
            temp[0,j] = theta[0,j] - ((alpha / len(X)) * np.sum(term))
        theta=temp
        cost[i]=computecost(X, Y, theta)
    return theta,cost
iters is the number of gradient descent iterations, alpha is the learning rate, parameters is the number of parameters, and the cost array records the cost function value after each iteration so you can check whether the algorithm is working properly.
Note the order in which X and theta are multiplied.
np.zeros() creates an array of the given shape filled with 0.
np.ravel() flattens an array to one dimension. (For an np.matrix the result is still two-dimensional, with shape (1,n), which is why the code reads shape[1].)
np.multiply(a,b) multiplies the arrays a and b element by element.
temp[0,j] is the entry in row 0, column j of the temp matrix.
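The inner loop over j can also be written as a single matrix product. The following is a sketch of that vectorized variant, not the code from this post: it assumes plain numpy arrays instead of np.matrix, the function name gradient_descent is illustrative, and the data is made up to lie on the line y = 1 + 2x so the expected parameters are known.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, iters):
    m = len(X)
    cost = np.zeros(iters)
    for i in range(iters):
        error = X @ theta.T - y                       # (m, 1) residuals
        # error.T @ X sums error * X[:, j] over the samples for every j at once,
        # so all parameters are updated simultaneously.
        theta = theta - (alpha / m) * (error.T @ X)   # (1, m) @ (m, n) -> (1, n)
        cost[i] = np.sum((X @ theta.T - y) ** 2) / (2 * m)
    return theta, cost

# Made-up data on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])   # constant column, then the variable
y = (1 + 2 * x).reshape(-1, 1)

theta, cost = gradient_descent(X, y, np.zeros((1, 2)), alpha=0.1, iters=2000)
print(theta)               # close to [[1, 2]]
print(cost[0], cost[-1])   # the cost should fall as the iterations proceed
```

Plotting the cost array against the iteration number is a quick way to confirm that the chosen alpha is not too large.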
5, Call the function and make predictions
alpha=0.01 # learning rate
iters=1500 # number of iterations
g,cost=gradientdescent(X, y, theta, alpha, iters)
# Predict the profit for populations of 35000 and 70000 (entered as 3.5 and 7, so the population column is in units of 10,000)
predict1 = [1,3.5]*g.T
print("predict1:",predict1)
predict2 = [1,7]*g.T
print("predict2:",predict2)
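Both predictions can be computed in one product by stacking the inputs as rows. This is a sketch only: the parameter values in g are hypothetical placeholders, not the result of running the code above, and plain numpy arrays are assumed.

```python
import numpy as np

# Hypothetical parameters (constant term, slope), chosen only for illustration.
g = np.array([[-3.6, 1.2]])

# One row per query; the leading 1 multiplies the constant term.
inputs = np.array([[1.0, 3.5],
                   [1.0, 7.0]])

predictions = inputs @ g.T   # (2, 2) @ (2, 1) -> (2, 1)
print(predictions)
```

Each row of predictions is g[0,0] + g[0,1] * population for the matching row of inputs.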
6, Draw the fitted line
# First way to draw the fitted line
m=np.linspace(data.population.min(),data.population.max(),100) # x values spanning the data
n=g[0,0]+g[0,1]*m # numpy broadcasting multiplies every element by the same number and adds the same number
# the older plotting approach shown in the video also works
# draw the prediction plot
fig, ax = plt.subplots(figsize=(12,8)) # create the figure and axes so everything is drawn on the same plot
ax.plot(m,n , 'r', label='Prediction')
ax.scatter(data.population, data.profit, label='Training Data')
ax.legend(loc=2) # add the legend
ax.set_xlabel('Population') # x-axis label
ax.set_ylabel('Profit') # y-axis label
ax.set_title('Predicted Profit vs. Population Size') # plot title
plt.show()
# Second way to draw the figure: set the figure size first, then draw step by step
fig=plt.figure(figsize=(12,8),dpi=80)
plt.plot(m,n,"r",label='prediction')
plt.scatter(data.population,data.profit,label='training data')
plt.legend() # add the legend
plt.xlabel("population")
plt.ylabel('profit')
plt.show()
g[0,0] and g[0,1] are the constant term parameter and the coefficient of the variable, respectively.
n=g[0,0] +g[0,1]*m relies on numpy broadcasting: the multiplication and the addition are applied to every element of m.
np.linspace(start,stop,num) returns an array of num evenly spaced numbers from start to stop.
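A small example of np.linspace combined with broadcasting, assuming hypothetical parameter values (theta0 and theta1 below are placeholders, not fitted results):

```python
import numpy as np

m = np.linspace(5.0, 25.0, 5)    # 5 evenly spaced values: 5, 10, 15, 20, 25 (both ends included)
theta0, theta1 = -3.6, 1.2       # hypothetical constant term and slope

# The scalars are broadcast across every element of m, so n has the same shape as m.
n = theta0 + theta1 * m

print(m)
print(n)
```

Because m and n have the same length, they can be passed straight to ax.plot(m, n) to draw the line.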
Summary:
1, My grasp of numpy arrays and the DataFrame type is still weak, so I cannot use them flexibly.
2, I am not familiar with the methods these libraries provide, which makes programming difficult.
3, Writing the definitions of the cost and gradient functions.
4, Pay attention to the order in which X and theta are multiplied in the formula.