Machine Learning - Notes and Implementation of Linear Regression, Logistic Regression Problems
2022-07-31 07:45:00 【Miracle Fan】
Regression Problems
Overview:
A regression problem is about predicting a continuous value, for example ….
Building on such a regression model, the Sigmoid function (logistic regression) can turn the predicted value into a probability of whether something belongs to a class: the continuous value produced by the regression is mapped into a probability in (0, 1), which can then be used to handle binary classification problems.
Simple Linear Regression
The linear regression equation is:
$$\hat{y}=ax+b$$
For example, given the following set of data, a scatter plot can be drawn.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 4, 6, 8])
y = np.array([2, 5, 7, 8, 9])
For linear regression, we are fitting a straight line that connects the samples in the figure above as well as possible. In general a perfect fit cannot be achieved; we only hope for something like the figure below, where the green segments represent the errors between the predicted points and the true points. We want these errors to be as small as possible, which corresponds to a better fit.
y_pred = lambda x: a * x + b                  # a and b are the fitted slope and intercept
plt.scatter(x, y, color='b')                  # true samples
plt.plot(x, y_pred(x), color='r')             # fitted line
plt.plot([x, x], [y, y_pred(x)], color='g')   # green segments: errors between true and predicted points
plt.show()
That is, a loss function can be defined:
$$L=\frac{1}{n}\sum_{i=1}^n\left(y^{(i)}-y_{pred}^{(i)}\right)$$
But with this loss, when computing the error, the prediction is larger than the true value in some cases and smaller in others, so $y-y_{pred}$ takes both positive and negative values that cancel each other when summed. Therefore the mean squared error loss is used instead:
$$L=\frac{1}{n}\sum_{i=1}^n\left(y^{(i)}-y_{pred}^{(i)}\right)^2$$
Substitute into the fitting equation:
$$L=\frac{1}{n}\sum_{i=1}^n\left(y^{(i)}-ax^{(i)}-b\right)^2$$
Using the least squares method, the closed-form solution is:
$$a=\frac{\sum_{i=1}^n(x^{(i)}-\bar{x})(y^{(i)}-\bar{y})}{\sum_{i=1}^n(x^{(i)}-\bar{x})^2},\qquad b=\bar{y}-a\bar{x}$$
def Linear_Regression(x, y):
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    # Equivalent with explicit tiling:
    # num = np.sum((x - np.tile(x_mean, x.shape)) * (y - np.tile(y_mean, y.shape)))
    num = np.sum((x - x_mean) * (y - y_mean))   # numerator of a
    den = np.sum((x - x_mean) ** 2)             # denominator of a
    a = num / den
    b = y_mean - a * x_mean
    return a, b
Thanks to NumPy's broadcasting mechanism, there is no need to adjust the dimensions of x_mean here.
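As a quick sanity check, here is a minimal usage sketch, assuming the x, y arrays and imports defined above:

a, b = Linear_Regression(x, y)      # fit slope and intercept on the small data set above
y_hat = a * x + b                   # predictions at the training points
mse = np.mean((y - y_hat) ** 2)     # mean squared error of the fit
print(a, b, mse)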
Multiple Linear Regression
For multiple linear regression, the general expression is:
$$y=\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n$$
In matrix form this can be written as:
$$Y=X\theta$$
$$X=\begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}$$
$$\theta=\begin{pmatrix} \theta_0\\ \theta_1\\ \theta_2\\ \vdots \\ \theta_n \end{pmatrix}$$
Solving for $\theta$ with the same least squares approach as above gives:
$$\theta=(X^TX)^{-1}X^Ty$$
import numpy as np
from numpy import linalg

# Add a column of ones so the intercept term can be handled by the matrix product.
ones = np.ones((X_train.shape[0], 1))
# Stack in the horizontal direction: X_b has a first column of 1s, the rest of X unchanged.
X_b = np.hstack((ones, X_train))
# Normal equation: theta = (X_b^T X_b)^(-1) X_b^T y
theta = linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)
interception = theta[0]   # intercept term theta_0
coef = theta[1:]          # coefficients theta_1..theta_n
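With the fitted parameters, predictions for new samples can be made either from the intercept and coefficients directly, or by prepending the column of ones again. A small sketch; X_test is a hypothetical test matrix with the same feature columns as X_train:

# Predict with the fitted parameters.
y_pred = X_test.dot(coef) + interception

# Equivalent: prepend the column of ones and multiply by the full theta vector.
X_test_b = np.hstack((np.ones((X_test.shape[0], 1)), X_test))
y_pred_alt = X_test_b.dot(theta)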
Logistic Regression
Simple logistic regression adds the Sigmoid function on top of linear regression. It compresses the prediction into (0, 1), which can be interpreted as a probability; 0.5 is usually taken as the decision boundary, so a probability greater than 0.5 means class 1, otherwise class 0.
In other words, the output of linear regression is passed through the sigmoid function to turn a regression problem into a classification problem.
$$p=\frac{1}{1+e^{-z}}$$
- z: usually the output of the linear regression equation
- p: the predicted probability; the decision boundary (0.5) determines which class the sample belongs to (see the sketch below)
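A minimal numerical sketch of how the sigmoid squashes linear scores z into probabilities and how the 0.5 boundary turns them into class labels (the scores here are made-up values, and np is the NumPy import from above):

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])   # hypothetical linear-regression scores
p = sigmoid(z)                               # probabilities in (0, 1)
labels = (p > 0.5).astype(int)               # class 1 if p > 0.5, otherwise class 0
print(p)        # approximately [0.047 0.378 0.5 0.622 0.953]
print(labels)   # [0 0 0 1 1]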
Gradient Descent
Gradient descent is mainly used to optimize the loss function, i.e., to find the parameter value that minimizes the loss.
For example, suppose a loss function is
$$L=(\theta-2.5)^2-1$$
Then define this loss function and its derivative in code:
def J(theta):
    try:
        return (theta - 2.5) ** 2 - 1
    except:
        # Return infinity if the value overflows (theta has diverged).
        return float('inf')

def dJ(theta):
    return 2 * (theta - 2.5)   # derivative of the loss with respect to theta
At each iteration:
$$\theta=\theta-\eta\frac{dJ}{d\theta}$$
def CalGradient(eta):
    theta = 0.0
    theta_history = [theta]
    epsilon = 1e-8   # threshold used to terminate the gradient descent iterations
    while True:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient
        theta_history.append(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
    plt.title('lr:' + str(eta))
    plt.plot(x, J(x), color='r')   # x is a dense range of theta values used only to draw the loss curve
    plt.plot(np.array(theta_history), J(np.array(theta_history)), color='b', marker='x')
    plt.show()
    print(len(theta_history))
Different learning rates give different descent behavior; the descent plots are shown below. The learning rate is generally chosen between 0 and 1. As the figures show, when the learning rate is 1, convergence is not reached, and when the learning rate is greater than 1, the iterations diverge.
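A small driver sketch for reproducing this comparison; the grid x below is an assumed plotting range of θ values, since the loss curve drawn inside CalGradient needs one:

# Dense grid of theta values used only for drawing the loss curve in CalGradient.
x = np.linspace(-1.0, 6.0, 141)

# Rates well below 1 converge; a rate of 1 keeps oscillating and rates above 1
# diverge for this loss, so those are not run here.
for eta in (0.01, 0.1, 0.8):
    CalGradient(eta)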
Loss Function of Logistic Regression
After incorporating linear regression, the expression of logistic regression is:
$$p=\frac{1}{1+e^{-\theta X}}$$
For logistic regression, the logarithmic (cross-entropy) loss function is generally used to compute the parameters:
$$cost=\begin{cases} -\log(p_{pred}) & \text{if } y=1\\ -\log(1-p_{pred}) & \text{if } y=0 \end{cases}$$
With a little tidying up, the two cases can be combined into a single loss function:
$$cost=-y\log(p_{pred})-(1-y)\log(1-p_{pred})$$
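A quick numeric check of this loss, as a small sketch (the log_loss helper is only for illustration): a confident, correct prediction costs almost nothing, while a confident, wrong prediction is heavily penalized.

import numpy as np

def log_loss(y, p_pred):
    return -y * np.log(p_pred) - (1 - y) * np.log(1 - p_pred)

print(log_loss(1, 0.99))   # ~0.01: confident and correct, almost no cost
print(log_loss(1, 0.01))   # ~4.61: confident but wrong, heavily penalized
print(log_loss(0, 0.01))   # ~0.01: confident and correct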
import numpy as np

class LogisticRegression:
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None
        self._theta = None

    def _sigmoid(self, x):
        return 1.0 / (1.0 + np.exp(-x))

    def fit(self, x_train, y_train, eta=0.01, n_iters=1e4):
        assert x_train.shape[0] == y_train.shape[0], \
            'The number of training samples and labels must be the same'

        def J(theta, x, y):
            # Cross-entropy loss averaged over the samples
            p_pred = self._sigmoid(x.dot(theta))
            try:
                return -np.sum(y * np.log(p_pred) + (1 - y) * np.log(1 - p_pred)) / len(y)
            except:
                return float('inf')

        def dJ(theta, x, y):
            # Gradient of the loss: X^T (sigmoid(X theta) - y) / m
            p_pred = self._sigmoid(x.dot(theta))
            return x.T.dot(p_pred - y) / len(y)

        # Batch gradient descent
        def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
            theta = initial_theta
            i_iter = 0
            while i_iter < n_iters:
                gradient = dJ(theta, X_b, y)
                last_theta = theta
                theta = theta - eta * gradient
                i_iter += 1
                if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
                    break
            return theta

        X_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        initial_theta = np.zeros(X_b.shape[1])   # column vector of zeros
        self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)
        self.intercept_ = self._theta[0]   # intercept
        self.coef_ = self._theta[1:]       # feature coefficients
        return self

    def predict_proba(self, X_predict):
        X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])
        return self._sigmoid(X_b.dot(self._theta))

    def predict(self, X_predict):
        proba = self.predict_proba(X_predict)
        return np.array(proba > 0.5, dtype='int')
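A minimal end-to-end sketch of using this class on synthetic, linearly separable data; the toy data generation here is purely illustrative:

import numpy as np

# Two Gaussian blobs as a toy binary-classification data set.
rng = np.random.default_rng(0)
x0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
x1 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(50, 2))
x_train = np.vstack([x0, x1])
y_train = np.hstack([np.zeros(50), np.ones(50)])

clf = LogisticRegression()
clf.fit(x_train, y_train, eta=0.1, n_iters=1e4)

print(clf.intercept_, clf.coef_)
print(clf.predict(np.array([[0.0, 0.0], [4.0, 4.0]])))   # expected: [0 1]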