Machine learning note 5 - logistic regression
2022-07-28 06:27:00 【I have two candies】
1. Logistic Regression
1.1 Logistic Regression & Perceptron
Both models score a sample with the linear function w·x + b. The perceptron thresholds that score with the sign function to output a hard class label, whereas logistic regression passes it through the sigmoid function to output a probability.

1.2 Definition of the Logistic Regression Model
The binomial logistic regression model defines the conditional probability of the positive class as P(Y=1|x) = sigmoid(w·x + b) = 1 / (1 + e^-(w·x+b)), with P(Y=0|x) = 1 − P(Y=1|x); a sample is assigned to whichever class has the larger probability.

1.3 Estimating Model Parameters by Maximum Likelihood
The parameters are estimated by maximizing the likelihood of the training labels. Writing π(x) = P(Y=1|x), the log-likelihood and its gradient are

L(w) = Σᵢ [ yᵢ·log π(xᵢ) + (1 − yᵢ)·log(1 − π(xᵢ)) ]
∂L/∂w = Σᵢ (yᵢ − π(xᵢ))·xᵢ

so gradient ascent (or its stochastic variant) on L(w) yields the parameter estimates.

Summary
Logistic regression maps a linear score to a class probability through the sigmoid function, and its parameters are learned by maximum likelihood estimation, typically via gradient methods.

2. Python Implementation of Logistic Regression
2.1 Dataset
The dataset is the Iris dataset, restricted to the first two classes of flowers; each sample keeps two features (sepal length and sepal width) and a class label. The ratio of test set to training set is 1:4:
from math import exp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def create_data():
    iris = load_iris()
    df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sl', 'sw', 'pl', 'pw', 'label']
    # first 100 rows are the two classes; keep sepal length, sepal width, label
    data = np.array(df.iloc[:100, [0, 1, -1]])
    return data[:, [0, 1]], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.show()
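One note, an addition to the original: train_test_split shuffles the data randomly, so the exact test score reported later can vary between runs. Passing the standard random_state parameter to the call above fixes the split:

# optional: make the 1:4 split reproducible across runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)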
The resulting scatter plot is as follows:

2.2 Building the Model
Next we build the LogisticRegressionClassifier model:
class LogisticRegressionClassifier:
    def __init__(self, max_iter=200, learning_rate=0.01):
        # maximum number of training passes and the learning rate (step size)
        self.max_iter = max_iter
        self.learning_rate = learning_rate

    def sigmoid(self, x):
        return 1 / (1 + exp(-x))

    def expand(self, X):
        # append a constant 1 to every sample so the bias b is folded into the weights
        matrix = []
        for item in X:
            matrix.append([*item, 1.0])
        return matrix

    def fit(self, X, y):
        X = self.expand(X)
        self.weights = np.zeros((len(X[0]), 1), dtype=np.float32)  # column vector
        for iter_ in range(self.max_iter):
            for item_x, item_y in zip(X, y):
                # stochastic gradient ascent on the log-likelihood
                res = self.sigmoid(np.dot(item_x, self.weights).item())
                self.weights += self.learning_rate * (item_y - res) * np.transpose([item_x])
        print(f'LogisticRegression Model(learning_rate={self.learning_rate}, max_iter={self.max_iter})')

    def score(self, X_test, y_test):
        success = 0
        for item_x, item_y in zip(self.expand(X_test), y_test):
            # predict class 1 when sigmoid(w·x) > 0.5, i.e. when w·x > 0
            predict_res = self.sigmoid(np.dot(item_x, self.weights).item()) > 0.5
            if predict_res == item_y:
                success += 1
        return success / len(X_test)
Explanation
(1) Note that sigmoid is written as 1 / (1 + exp(-x)) rather than exp(x) / (1 + exp(x)), because exp(x) can overflow for large positive x, while exp(-x) simply underflows to 0 there (a variant that is also safe for large negative x is sketched after this explanation).
(2) The expand function appends a constant 1 to every sample of X, so the bias term is learned as the last component of the weight vector.
(3) The fit() function is similar to the perceptron's. Its principle is maximum likelihood estimation of the parameters (section 1.3), with the maximum found by stochastic gradient ascent: each sample triggers the update w ← w + η·(y − π(x))·x, where π(x) = sigmoid(w·x) and η is the learning rate.
The corresponding code is shown again below; two notes (a small shape check of these operations also appears after this explanation):
for item_x, item_y in zip(X, y) pairs each sample with its label;
np.transpose turns [item_x] into a column vector, and np.dot() performs matrix multiplication.
    def fit(self, X, y):
        X = self.expand(X)
        self.weights = np.zeros((len(X[0]), 1), dtype=np.float32)  # column vector
        for iter_ in range(self.max_iter):
            for item_x, item_y in zip(X, y):
                res = self.sigmoid(np.dot(item_x, self.weights).item())
                self.weights += self.learning_rate * (item_y - res) * np.transpose([item_x])
(4) At prediction time, the class is decided by whether sigmoid(w·x) exceeds 0.5: below 0.5 the sample is classified as class 0, otherwise as class 1 (equivalently, by the sign of w·x).
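Expanding on point (1): 1 / (1 + exp(-x)) avoids overflow for large positive x, but math.exp(-x) can still overflow when x is very negative (math.exp raises OverflowError once its argument exceeds roughly 709). A minimal sketch of the branching variant commonly used to handle both signs (an addition, not part of the original classifier):

def stable_sigmoid(x):
    # x >= 0: exp(-x) <= 1, so no overflow is possible
    if x >= 0:
        return 1 / (1 + exp(-x))
    # x < 0: rewrite in terms of exp(x), which is <= 1 here
    z = exp(x)
    return z / (1 + z)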
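And to make the shapes in fit() concrete, a tiny illustrative check (the sample values are made up):

item_x = [5.1, 3.5, 1.0]                 # one expanded sample
print(np.transpose([item_x]).shape)      # (3, 1): a column vector
print(np.dot(item_x, np.zeros((3, 1))))  # [0.]: product of shapes (3,) and (3, 1)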
2.3 Test Results
Now let's test the model's score:
clf = LogisticRegressionClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

# decision boundary: w0*x + w1*y + w2 = 0, i.e. y = -(w0*x + w2) / w1
x_points = np.arange(4, 8)
y_ = -(clf.weights[0] * x_points + clf.weights[2]) / clf.weights[1]
plt.plot(x_points, y_)
plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.legend()
plt.show()
The prediction score is 1.0 (the exact value can vary with the random train/test split), and the plotted decision boundary separates the two classes:

3. scikit-learn Example
3.1 LogisticRegression
scikit-learn's linear model module contains the LogisticRegression model; for usage, refer to the documentation for sklearn.linear_model.LogisticRegression:

The solver parameter
The solver parameter determines the optimization method used for the logistic regression loss function. There are four algorithms to choose from (a usage sketch follows this list):
- a) liblinear: implemented with the open-source liblinear library, using coordinate descent to iteratively optimize the loss function.
- b) lbfgs: a quasi-Newton method that uses the second-derivative (Hessian) matrix of the loss function for optimization.
- c) newton-cg: also a member of the Newton family, likewise using the second-derivative matrix of the loss function.
- d) sag: stochastic average gradient descent, a variant of gradient descent; unlike ordinary gradient descent, each iteration computes the gradient from only part of the samples, which makes it suitable when there are many samples.
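A small usage sketch comparing the four values, reusing the train/test split from section 2.1 (an illustrative addition; scores depend on the split, and sag may warn about convergence on unscaled data):

from sklearn.linear_model import LogisticRegression

for solver in ['liblinear', 'lbfgs', 'newton-cg', 'sag']:
    clf = LogisticRegression(solver=solver, max_iter=200)
    clf.fit(X_train, y_train)
    print(solver, clf.score(X_test, y_test))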
3.2 Example
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

def create_data():
    iris = load_iris()
    df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sl', 'sw', 'pl', 'pw', 'label']
    data = np.array(df.iloc[:100, [0, 1, -1]])
    return data[:, [0, 1]], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
print(clf.coef_, clf.intercept_)
The result is as follows:
1.0
[[ 2.86401035 -2.76369768]] [-6.92179114]
In the last line, the bracketed pair is the weight vector w, and the final value is the separate bias b.
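To connect coef_ and intercept_ back to the model in section 2, here is a short sanity check (an illustrative addition; predict_proba, coef_ and intercept_ are standard LogisticRegression attributes): the predicted probability of class 1 equals the sigmoid of w·x + b.

z = X_test @ clf.coef_.T + clf.intercept_  # raw scores w·x + b
p = 1 / (1 + np.exp(-z))                   # sigmoid
print(np.allclose(p.ravel(), clf.predict_proba(X_test)[:, 1]))  # True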