Linear regression of machine learning (least square handwriting +sklearn Implementation)

2022-07-29 09:43:00 【Cyril-KI】

Preface

This material already appears on my CSDN blog, and it covers the basics. Now that I am moving it to my official account, I do not want it to be as terse as the CSDN version, so I will try to make it easy to understand.

One 、 Basic form

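Suppose an example $x$ is described by $d$ attributes, $x = (x_1; x_2; \dots; x_d)$, where $x_i$ is the value of $x$ on the $i$-th attribute. A linear model tries to describe some kind of overall score of the example using a linear combination of its attributes:

$$f(x) = w_1 x_1 + w_2 x_2 + \dots + w_d x_d + b$$

We usually write it in vector form:

$$f(x) = w^T x + b$$

where $w = (w_1; w_2; \dots; w_d)$.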

For example, suppose we use five attributes to describe whether a person is a good spouse: (gender, personality, age, appearance, wealth).

Then each person can be represented by a vector:

$$x = (x_{\text{gender}}; x_{\text{personality}}; x_{\text{age}}; x_{\text{appearance}}; x_{\text{wealth}})$$

To judge whether a person is a good spouse, we can define the following linear model:

$$f(x) = w_1 x_{\text{gender}} + w_2 x_{\text{personality}} + w_3 x_{\text{age}} + w_4 x_{\text{appearance}} + w_5 x_{\text{wealth}} + b$$

In the end, the higher the score $f(x)$, the more likely this person is to be a good spouse.
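As a minimal sketch of the scoring idea above: the attribute values, weights, and bias below are made-up numbers chosen only for illustration (in practice $w$ and $b$ are learned from data), and the attributes are assumed to be already encoded as numbers.

```python
import numpy as np

# Hypothetical attribute values, assumed to be encoded as numbers in [0, 1].
# Order: gender, personality, age, appearance, wealth.
x = np.array([1.0, 0.8, 0.5, 0.6, 0.3])

# Hypothetical weights and bias; real values would come from training.
w = np.array([0.1, 0.4, 0.1, 0.2, 0.2])
b = 0.5

def score(x, w, b):
    """Linear model f(x) = w^T x + b."""
    return float(np.dot(w, x) + b)

print(score(x, w, b))
```

A larger weight means the corresponding attribute contributes more to the final score.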

Two 、 Linear regression

Linear regression tries to learn a linear model that predicts real-valued output labels as accurately as possible. We define a cost function (loss function) on the data set and determine the model parameters by minimizing it, so that the model can then be used for prediction.

Imagine a scenario: we have a data set containing the values of the five attributes above for 10,000 people, together with their scores $y$. We need to find $w$ and $b$, and then predict the score $f(x)$ of other people, for whom only the attribute values are known.

So our ultimate goal is to find parameters $w$ and $b$ such that the mean squared error between the predicted values $f(x_i)$ for these 10,000 people and the true regression targets $y_i$ (which are given) is as small as possible. So we define:

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left( f(x_i) - y_i \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \left( w^T x_i + b - y_i \right)^2$$

where $x_i$ and $y_i$ denote the attribute values and the label (score) of the $i$-th sample, respectively, and $m$ is the number of samples (here $m = 10000$).

The goal is to find the $w$ and $b$ that minimize $J(w, b)$. We can then consider this $w$ and $b$ to be the best fit to this batch of data, and if the samples are representative, we can use $w$ and $b$ to predict other people's scores.
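To make the objective concrete, here is a small sketch that evaluates the mean squared error of a linear model for given parameters. The function name `mse_loss` and the tiny data set are assumptions made for illustration; they are not part of the original post.

```python
import numpy as np

def mse_loss(X, y, w, b):
    """Mean squared error of f(x) = w^T x + b over all samples.

    X has shape (m, d), y has shape (m,), w has shape (d,), b is a scalar.
    """
    residuals = X @ w + b - y
    return float(np.mean(residuals ** 2))

# Made-up data: 3 samples, 2 attributes, labels generated by y = x1 + 2*x2 + 1.
X = np.array([[1.0, 2.0], [2.0, 0.0], [0.0, 1.0]])
y = np.array([6.0, 3.0, 3.0])

print(mse_loss(X, y, np.array([1.0, 2.0]), 1.0))  # parameters that generated the data
print(mse_loss(X, y, np.array([0.0, 0.0]), 0.0))  # all-zero parameters
```

The parameters that generated the labels give a loss of zero, while any other choice gives a larger value; linear regression searches for the minimizer.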

Three 、 Solving

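To make the derivation easier, we simplify $J$ further. We let:

$$\hat{w} = (w; b), \qquad X = \begin{pmatrix} x_1^T & 1 \\ x_2^T & 1 \\ \vdots & \vdots \\ x_m^T & 1 \end{pmatrix}$$

and let $y = (y_1; y_2; \dots; y_m)$, so that finally:

$$J(\hat{w}) = \frac{1}{m} (y - X\hat{w})^T (y - X\hat{w})$$

Then we take the derivative of $J$ with respect to $\hat{w}$ and set it equal to 0:

$$\frac{\partial J}{\partial \hat{w}} = \frac{2}{m} X^T (X\hat{w} - y) = 0$$

And then, provided $X^T X$ is invertible, we get:

$$\hat{w}^* = (X^T X)^{-1} X^T y$$

So in the end, writing $\hat{x} = (x; 1)$:

$$f(\hat{x}) = \hat{x}^T (X^T X)^{-1} X^T y$$

Given a person's attribute values, we can use the formula above to calculate that person's score.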
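The closed-form least-squares solution can be sketched in NumPy as follows. This is an illustrative sketch, not the author's code: the helper name `fit_least_squares` and the data are assumptions, and it solves the standard normal equations $X^T X \hat{w} = X^T y$ (with a column of ones appended to $X$ for the intercept) instead of forming the inverse explicitly.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return (w, b) minimizing the squared error of f(x) = w^T x + b."""
    # Append a column of ones so the intercept b is learned as the last weight.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    # Solve the normal equations (X^T X) w_hat = X^T y.
    w_hat = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
    return w_hat[:-1], w_hat[-1]

# Made-up, noiseless data generated from w = (1, 2), b = 1.
X = np.array([[1.0, 2.0], [2.0, 0.0], [0.0, 1.0], [3.0, 1.0]])
y = X @ np.array([1.0, 2.0]) + 1.0

w, b = fit_least_squares(X, y)
print(w, b)
```

Because the labels contain no noise, the recovered parameters match the generating ones up to floating-point error. `np.linalg.solve` raises an error when $X^T X$ is singular; `np.linalg.lstsq` is a common alternative in that case.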

Four 、 Code implementation

1. Import package

import pandas as pd
from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error

sklearn provides a dedicated linear-model package, linear_model. pandas is used to read the CSV file and matplotlib is used for plotting. We also import three evaluation metrics: MSE, R-squared, and MAE.

2. Construct data set

The data can be generated automatically, or you can use existing data. The data below comes from the assignment file Salary_Data.csv; each sample has only one feature.

def load_data():
    data=pd.read_csv('Salary_Data.csv',encoding='gbk')
    data=data.values.tolist()
    train_x=[];train_y=[]
    test_x=[];test_y=[]
    # The first half is the training set; the second half is the test set
    for i in range(len(data)):
        if i<len(data)/2:
            train_x.append(data[i][0])
            train_y.append(data[i][1])
        else:
            test_x.append(data[i][0])
            test_y.append(data[i][1])
    return train_x,train_y,test_x,test_y
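If Salary_Data.csv is not available, the data can also be generated automatically, as mentioned above. The sketch below is one possible way to do that; the function name `make_data` and its default slope, intercept, and noise level are assumptions made for illustration. It returns the same four lists as load_data(), using the same first-half/second-half split.

```python
import numpy as np

def make_data(n=30, w=2.0, b=1.0, noise=0.5, seed=0):
    """Generate n one-feature samples with y = w*x + b + Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=n)
    y = w * x + b + rng.normal(0.0, noise, size=n)
    # Mirror load_data(): indices i < n/2 go to training, the rest to testing.
    half = (n + 1) // 2
    return (x[:half].tolist(), y[:half].tolist(),
            x[half:].tolist(), y[half:].tolist())

train_x, train_y, test_x, test_y = make_data()
print(len(train_x), len(test_x))
```

Fixing the seed makes the generated data reproducible across runs.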

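3. Model training + prediction

train_x, train_y, test_x, test_y = load_data()
# sklearn expects 2-D feature arrays of shape (n_samples, 1)
train_x = np.array(train_x).reshape(-1, 1)
test_x = np.array(test_x).reshape(-1, 1)
lr = linear_model.LinearRegression().fit(train_x, train_y)
y_pred = lr.predict(test_x)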

4. Output parameters and evaluation indicators

# Print the coefficient and the intercept
print('w:', lr.coef_, 'b:', lr.intercept_)
# Print the evaluation metrics
print('MSE:', mean_squared_error(test_y, y_pred))
print('MAE:', mean_absolute_error(test_y, y_pred))
print('R2_Squared:', r2_score(test_y, y_pred))
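For reference, the three metrics can also be computed directly from their definitions. This is a small sketch with made-up numbers; the helper name `regression_metrics` is an assumption and is not part of sklearn.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_pred - y_true
    mse = np.mean(err ** 2)          # mean squared error
    mae = np.mean(np.abs(err))       # mean absolute error
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return float(mse), float(mae), float(r2)

# Made-up labels and predictions for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(regression_metrics(y_true, y_pred))
```

MSE and MAE are in the units of the label (squared for MSE), while R^2 is unitless: it is at most 1, and it can be negative when the model does worse than always predicting the mean.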

5. Plotting

# Plot the test points and the fitted line
plt.scatter(test_x, test_y, color='black')
plt.plot(test_x, y_pred, color='blue', linewidth=3)
plt.show()

Complete code ：

import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
import numpy as np
from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error

def load_data():
    data=pd.read_csv('Salary_Data.csv',encoding='gbk')
    data=data.values.tolist()
    train_x=[];train_y=[]
    test_x=[];test_y=[]
    # The first half is the training set; the second half is the test set
    for i in range(len(data)):
        if i<len(data)/2:
            train_x.append(data[i][0])
            train_y.append(data[i][1])
        else:
            test_x.append(data[i][0])
            test_y.append(data[i][1])
    return train_x,train_y,test_x,test_y

def model():
    print('Hand-written least squares:')
    train_x,train_y,test_x,test_y=load_data()
    # Closed-form least-squares solution for a single feature
    n=len(train_x)
    sum_x=0.0;sum_x_sq=0.0;sum_xy=0.0;sum_b=0.0
    for i in range(n):
        sum_x=sum_x+train_x[i]
        sum_x_sq=sum_x_sq+train_x[i]**2
    ave_x=sum_x/n
    for i in range(n):
        sum_xy=sum_xy+(train_y[i]*(train_x[i]-ave_x))
    # w = sum(y_i*(x_i-mean_x)) / (sum(x_i^2) - (sum(x_i))^2/n)
    w=sum_xy/(sum_x_sq-sum_x**2/n)
    for i in range(n):
        sum_b=sum_b+(train_y[i]-w*train_x[i])
    # b = mean(y_i - w*x_i)
    b=sum_b/n
    print('w=',w,'b=',b)
    # Predict on the test set
    pred_y=[]
    for i in range(len(test_x)):
        pred_y.append(w*test_x[i]+b)
    # Compute MSE, MAE and R^2
    sum_mse=0.0;sum_mae=0.0
    sum1=0.0;sum2=0.0
    for i in range(len(pred_y)):
        sum_mae=sum_mae+np.abs(pred_y[i]-test_y[i])
        sum_mse=sum_mse+(pred_y[i]-test_y[i])**2
    sum_y=0.0
    for i in range(len(test_y)):
        sum_y=sum_y+test_y[i]
    ave_y=sum_y/len(test_y)
    # sum1: residual sum of squares; sum2: total sum of squares
    for i in range(len(pred_y)):
        sum1=sum1+(pred_y[i]-test_y[i])**2
        sum2=sum2+(ave_y-test_y[i])**2
    print('MSE:',sum_mse/len(pred_y))
    print('MAE:',sum_mae/len(pred_y))
    print('R2_Squared:',1-sum1/sum2)
    # Plot the test points and the fitted line
    plt.scatter(test_x, test_y, color='black')
    plt.plot(test_x, pred_y, color='blue', linewidth=3)
    plt.show()
    print('\n')

# Using scikit-learn
def sklearn_linearmodel():
    print('sklearn:')
    train_x, train_y, test_x, test_y = load_data()
    # sklearn expects 2-D arrays of shape (n_samples, 1)
    train_x=np.array(train_x).reshape(-1,1)
    train_y = np.array(train_y).reshape(-1, 1)
    test_x = np.array(test_x).reshape(-1, 1)
    test_y = np.array(test_y).reshape(-1, 1)
    # Train + predict
    lr = linear_model.LinearRegression()
    lr.fit(train_x, train_y)
    y_pred = lr.predict(test_x)

    # Print the coefficient and the intercept
    print('w:', lr.coef_, 'b:', lr.intercept_)
    # Print the evaluation metrics
    print('MSE:', mean_squared_error(test_y, y_pred))
    print('MAE:', mean_absolute_error(test_y, y_pred))
    print('R2_Squared:', r2_score(test_y, y_pred))

    # Plot the test points and the fitted line
    plt.scatter(test_x, test_y, color='black')
    plt.plot(test_x, y_pred, color='blue', linewidth=3)
    plt.show()


if __name__=='__main__':
    model()
    sklearn_linearmodel()
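The loops in model() can also be written in vectorized NumPy. The sketch below uses the same single-feature formulas for w and b; the function name `fit_line` and the sample data are assumptions made for illustration, and `np.polyfit` is used only as an independent cross-check.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form simple linear regression, same formulas as the loops in model()."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    # w = sum(y_i*(x_i-mean_x)) / (sum(x_i^2) - (sum(x_i))^2/m)
    w = np.sum(y * (x - x.mean())) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
    # b = mean(y_i - w*x_i)
    b = np.mean(y - w * x)
    return float(w), float(b)

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]

w, b = fit_line(x, y)
w_ref, b_ref = np.polyfit(x, y, deg=1)  # NumPy's degree-1 least-squares fit
print(w, b, w_ref, b_ref)
```

Both approaches should agree up to floating-point error, which is a quick sanity check for a hand-written implementation.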

Original site

Copyright notice
This article was written by [Cyril-KI]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/210/202207290935020707.html