Machine learning note 5 - logistic regression
2022-07-28 06:27:00 【I have two candies】
1. Logistic Regression
1.1 Logistic regression & the perceptron

Logistic regression is closely related to the perceptron: where the perceptron classifies by the sign of w·x, logistic regression passes w·x through the smooth sigmoid function, which yields a class probability and a differentiable objective to optimize.
1.2 Definition of the logistic regression model
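A standard statement of the model (following Li Hang's Statistical Learning Methods, on which this note is based): the binomial logistic regression model is the pair of conditional probabilities

$$P(Y=1 \mid x) = \frac{\exp(w \cdot x + b)}{1 + \exp(w \cdot x + b)}, \qquad P(Y=0 \mid x) = \frac{1}{1 + \exp(w \cdot x + b)}$$

It is convenient to append a constant 1 to each x so that b is absorbed into w and w·x alone describes the model; the expand() function in section 2.2 does exactly this.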

1.3 Estimating model parameters by maximum likelihood
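A sketch of the standard estimation: write π(x) = P(Y=1 | x). For training data (x_i, y_i) with y_i ∈ {0, 1}, the likelihood function is

$$\prod_i \pi(x_i)^{y_i} \, (1 - \pi(x_i))^{1 - y_i}$$

and taking logarithms gives the log-likelihood

$$L(w) = \sum_i \left[ y_i (w \cdot x_i) - \log\left(1 + \exp(w \cdot x_i)\right) \right]$$

Maximizing L(w), for example by gradient ascent, yields the parameter estimate ŵ.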

Summary

2. Python implementation of logistic regression
2.1 Data set
The data set is the Iris data set. We keep the two classes of flowers in the first 100 samples; each sample keeps two features and a label, and the ratio of test set to training set is 1:4:
from math import exp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def create_data():
    iris = load_iris()
    df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    # sl: sepal length, sw: sepal width, pl: petal length, pw: petal width
    df.columns = ['sl', 'sw', 'pl', 'pw', 'label']
    # the first 100 rows are classes 0 and 1; keep sepal length/width as features
    data = np.array(df.iloc[:100, [0, 1, -1]])
    return data[:, [0, 1]], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.show()
The result is a scatter plot of the two classes (sepal length vs. sepal width).
2.2 Build the model
Next we build the LogisticRegressionClassifier model:
class LogisticRegressionClassifier:
    def __init__(self, max_iter=200, learning_rate=0.01):
        # maximum number of iterations and learning rate (step size)
        self.max_iter = max_iter
        self.learning_rate = learning_rate

    def sigmoid(self, x):
        return 1 / (1 + exp(-x))

    def expand(self, X):
        # append a constant 1 to each sample so the bias is absorbed into the weights
        matrix = []
        for item in X:
            matrix.append([*item, 1.0])
        return matrix

    def fit(self, X, y):
        X = self.expand(X)
        self.weights = np.zeros((len(X[0]), 1), dtype=np.float32)  # column vector
        for iter_ in range(self.max_iter):
            for item_x, item_y in zip(X, y):
                res = self.sigmoid(np.dot(item_x, self.weights))
                # stochastic gradient ascent step on the log-likelihood
                self.weights += self.learning_rate * (item_y - res) * np.transpose([item_x])
        print(f'LogisticRegression Model(learning_rate={self.learning_rate}, max_iter={self.max_iter})')

    def score(self, X_test, y_test):
        success = 0
        expanded_X = self.expand(X_test)
        for X, y in zip(expanded_X, y_test):
            # sigmoid(w.x) > 0.5 is equivalent to w.x > 0
            predict_res = np.dot(X, self.weights) > 0
            if predict_res == y:
                success += 1
        return success / len(X_test)
Explanation
(1) Note that the sigmoid function is written with exp(-x). The algebraically equivalent form exp(x) / (1 + exp(x)) overflows for large positive x; exp(-x) avoids that, though it can itself overflow for very negative x (see the stable variant sketched after this list).
(2) The expand function appends a constant 1 to every sample of X, so the bias term is absorbed into the weight vector.
(3) The fit() function works much like the perceptron's: the parameters are found by maximum likelihood estimation, and stochastic gradient ascent can be used to maximize the likelihood. The principle is as follows:
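A sketch of that principle: with σ(z) = 1/(1+e^{-z}), the per-sample log-likelihood from section 1.3 is ℓ(w) = y(w·x) − log(1 + exp(w·x)), and its gradient is

$$\frac{\partial \ell}{\partial w} = \left( y - \sigma(w \cdot x) \right) x$$

so stochastic gradient ascent updates the weights one sample at a time:

$$w \leftarrow w + \eta \left( y - \sigma(w \cdot x) \right) x$$

where η is the learning rate; this is exactly the self.weights update line in the code.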

In the code below:
for item_x, item_y in zip(X, y) iterates over samples and their labels in one-to-one correspondence;
np.transpose can transpose a matrix (np.transpose([item_x]) turns a sample into a column vector), and np.dot() performs matrix multiplication.
The corresponding code is:
def fit(self, X, y):
    X = self.expand(X)
    self.weights = np.zeros((len(X[0]), 1), dtype=np.float32)  # column vector
    for iter_ in range(self.max_iter):
        for item_x, item_y in zip(X, y):
            res = self.sigmoid(np.dot(item_x, self.weights))
            self.weights += self.learning_rate * (item_y - res) * np.transpose([item_x])
(4) When predicting, we judge whether sigmoid(w·x) exceeds 0.5: below 0.5 the sample is classified as class 0, otherwise as class 1. Since sigmoid(w·x) > 0.5 exactly when w·x > 0, the code compares the raw score w·x against 0.
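As noted in (1), the exp(-x) form can still overflow when x is very negative. A minimal sketch of a numerically stable sigmoid (a common alternative, not part of the original model; exp as imported from math above):

def stable_sigmoid(x):
    # keep the argument of exp non-positive so exp can never overflow
    if x >= 0:
        return 1 / (1 + exp(-x))  # exp(-x) <= 1 here
    z = exp(x)                    # x < 0, so exp(x) < 1 here
    return z / (1 + z)            # algebraically equal to 1 / (1 + exp(-x))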
2.3 Test results
Let's test the model's score:
clf = LogisticRegressionClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

# decision boundary: w0*x + w1*y + w2 = 0  =>  y = -(w0*x + w2) / w1
x_points = np.arange(4, 8)
y_ = -(clf.weights[0] * x_points + clf.weights[2]) / clf.weights[1]
plt.plot(x_points, y_)
plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.legend()
plt.show()
The predicted score is 1.0, and the plotted decision boundary separates the two classes.
3. scikit-learn example
3.1 LogisticRegression
The linear model module (Linear Model) of scikit-learn contains the LogisticRegression model; for usage, refer to the documentation of sklearn.linear_model.LogisticRegression.
The solver parameter
The solver parameter determines the optimization method used for the logistic regression loss function. There are four algorithms to choose from, tried side by side in the sketch after this list:
- a) liblinear: implemented with the open-source liblinear library, which iteratively optimizes the loss function by coordinate descent.
- b) lbfgs: a quasi-Newton method that optimizes the loss function using an approximation of its second-derivative (Hessian) matrix.
- c) newton-cg: also a member of the Newton-method family; it likewise uses the second-derivative matrix of the loss function for optimization.
- d) sag: stochastic average gradient descent, a variant of gradient descent; unlike ordinary gradient descent, each iteration computes a new gradient from only one randomly chosen sample (while averaging past gradients), which makes it suitable when there are many samples.
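A minimal sketch of trying the four solvers on the Iris data (the solver names are the ones listed above; sag may need more iterations or standardized features to converge):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
for solver in ['liblinear', 'lbfgs', 'newton-cg', 'sag']:
    clf = LogisticRegression(solver=solver, max_iter=1000)
    clf.fit(X_train, y_train)
    print(solver, clf.score(X_test, y_test))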
3.2 Example
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

def create_data():
    iris = load_iris()
    df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sl', 'sw', 'pl', 'pw', 'label']
    data = np.array(df.iloc[:100, [0, 1, -1]])
    return data[:, [0, 1]], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
print(clf.coef_, clf.intercept_)
The result is as follows:
1.0
[[ 2.86401035 -2.76369768]] [-6.92179114]
In the last line, the first array is the weight vector w and the second value is the separate intercept b.
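As in section 2.3, the learned boundary w·x + b = 0 can be plotted from coef_ and intercept_ (a sketch that reuses X and clf from the code above and assumes matplotlib is available):

import matplotlib.pyplot as plt

w, b = clf.coef_[0], clf.intercept_[0]
x_points = np.arange(4, 8)
# w[0]*x + w[1]*y + b = 0  =>  y = -(w[0]*x + b) / w[1]
y_ = -(w[0] * x_points + b) / w[1]
plt.plot(x_points, y_)
plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.legend()
plt.show()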
REFERENCE
- Li Hang, Statistical Learning Methods
- Machine Learning
- scikit-learn