Machine learning note 5 - logistic regression
2022-07-28 06:27:00 【I have two candies】
1. Logistic Regression
1.1 Logistic Regression & Perceptron
Both models score a sample with the linear function w·x + b. The perceptron thresholds that score with the sign function to output a hard class label, whereas logistic regression passes it through the sigmoid function to output a probability.

1.2 Definition of the Logistic Regression Model
The binomial logistic regression model defines the conditional probability of the positive class as P(Y=1|x) = sigmoid(w·x + b) = 1 / (1 + e^-(w·x+b)), with P(Y=0|x) = 1 − P(Y=1|x); a sample is assigned to whichever class has the larger probability.

1.3 Estimating Model Parameters by Maximum Likelihood
The parameters are estimated by maximizing the likelihood of the training labels. Writing π(x) = P(Y=1|x), the log-likelihood and its gradient are

L(w) = Σᵢ [ yᵢ·log π(xᵢ) + (1 − yᵢ)·log(1 − π(xᵢ)) ]
∂L/∂w = Σᵢ (yᵢ − π(xᵢ))·xᵢ

so gradient ascent (or its stochastic variant) on L(w) yields the parameter estimates.

Summary
Logistic regression maps a linear score to a class probability through the sigmoid function, and its parameters are learned by maximum likelihood estimation, typically via gradient methods.

2. Python Implementation of Logistic Regression
2.1 Dataset
The dataset is the Iris dataset, restricted to the first two classes of flowers; each sample keeps two features (sepal length and sepal width) and a class label. The ratio of test set to training set is 1:4:
from math import exp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def create_data():
    iris = load_iris()
    df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sl', 'sw', 'pl', 'pw', 'label']
    # first 100 rows are the two classes; keep sepal length, sepal width, label
    data = np.array(df.iloc[:100, [0, 1, -1]])
    return data[:, [0, 1]], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.show()
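One note, an addition to the original: train_test_split shuffles the data randomly, so the exact test score reported later can vary between runs. Passing the standard random_state parameter to the call above fixes the split:

# optional: make the 1:4 split reproducible across runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)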
The resulting scatter plot is as follows:

2.2 Building the Model
Next we build the LogisticRegressionClassifier model:
class LogisticRegressionClassifier:
    def __init__(self, max_iter=200, learning_rate=0.01):
        # maximum number of training passes and the learning rate (step size)
        self.max_iter = max_iter
        self.learning_rate = learning_rate

    def sigmoid(self, x):
        return 1 / (1 + exp(-x))

    def expand(self, X):
        # append a constant 1 to every sample so the bias b is folded into the weights
        matrix = []
        for item in X:
            matrix.append([*item, 1.0])
        return matrix

    def fit(self, X, y):
        X = self.expand(X)
        self.weights = np.zeros((len(X[0]), 1), dtype=np.float32)  # column vector
        for iter_ in range(self.max_iter):
            for item_x, item_y in zip(X, y):
                # stochastic gradient ascent on the log-likelihood
                res = self.sigmoid(np.dot(item_x, self.weights).item())
                self.weights += self.learning_rate * (item_y - res) * np.transpose([item_x])
        print(f'LogisticRegression Model(learning_rate={self.learning_rate}, max_iter={self.max_iter})')

    def score(self, X_test, y_test):
        success = 0
        for item_x, item_y in zip(self.expand(X_test), y_test):
            # predict class 1 when sigmoid(w·x) > 0.5, i.e. when w·x > 0
            predict_res = self.sigmoid(np.dot(item_x, self.weights).item()) > 0.5
            if predict_res == item_y:
                success += 1
        return success / len(X_test)
Explanation
(1) Note that sigmoid is written as 1 / (1 + exp(-x)) rather than exp(x) / (1 + exp(x)), because exp(x) can overflow for large positive x, while exp(-x) simply underflows to 0 there (a variant that is also safe for large negative x is sketched after this explanation).
(2) The expand function appends a constant 1 to every sample of X, so the bias term is learned as the last component of the weight vector.
(3) The fit() function is similar to the perceptron's. Its principle is maximum likelihood estimation of the parameters (section 1.3), with the maximum found by stochastic gradient ascent: each sample triggers the update w ← w + η·(y − π(x))·x, where π(x) = sigmoid(w·x) and η is the learning rate.
The corresponding code is shown again below; two notes (a small shape check of these operations also appears after this explanation):
for item_x, item_y in zip(X, y) pairs each sample with its label;
np.transpose turns [item_x] into a column vector, and np.dot() performs matrix multiplication.
    def fit(self, X, y):
        X = self.expand(X)
        self.weights = np.zeros((len(X[0]), 1), dtype=np.float32)  # column vector
        for iter_ in range(self.max_iter):
            for item_x, item_y in zip(X, y):
                res = self.sigmoid(np.dot(item_x, self.weights).item())
                self.weights += self.learning_rate * (item_y - res) * np.transpose([item_x])
(4) At prediction time, the class is decided by whether sigmoid(w·x) exceeds 0.5: below 0.5 the sample is classified as class 0, otherwise as class 1 (equivalently, by the sign of w·x).
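Expanding on point (1): 1 / (1 + exp(-x)) avoids overflow for large positive x, but math.exp(-x) can still overflow when x is very negative (math.exp raises OverflowError once its argument exceeds roughly 709). A minimal sketch of the branching variant commonly used to handle both signs (an addition, not part of the original classifier):

def stable_sigmoid(x):
    # x >= 0: exp(-x) <= 1, so no overflow is possible
    if x >= 0:
        return 1 / (1 + exp(-x))
    # x < 0: rewrite in terms of exp(x), which is <= 1 here
    z = exp(x)
    return z / (1 + z)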
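And to make the shapes in fit() concrete, a tiny illustrative check (the sample values are made up):

item_x = [5.1, 3.5, 1.0]                 # one expanded sample
print(np.transpose([item_x]).shape)      # (3, 1): a column vector
print(np.dot(item_x, np.zeros((3, 1))))  # [0.]: product of shapes (3,) and (3, 1)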
2.3 Test Results
Now let's test the model's score:
clf = LogisticRegressionClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

# decision boundary: w0*x + w1*y + w2 = 0, i.e. y = -(w0*x + w2) / w1
x_points = np.arange(4, 8)
y_ = -(clf.weights[0] * x_points + clf.weights[2]) / clf.weights[1]
plt.plot(x_points, y_)
plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.legend()
plt.show()
The prediction score is 1.0 (the exact value can vary with the random train/test split), and the plotted decision boundary separates the two classes:

3. scikit-learn Example
3.1 LogisticRegression
scikit-learn's linear model module contains the LogisticRegression model; for usage, refer to the documentation for sklearn.linear_model.LogisticRegression:

The solver parameter
The solver parameter determines the optimization method used for the logistic regression loss function. There are four algorithms to choose from (a usage sketch follows this list):
- a) liblinear: implemented with the open-source liblinear library, using coordinate descent to iteratively optimize the loss function.
- b) lbfgs: a quasi-Newton method that uses the second-derivative (Hessian) matrix of the loss function for optimization.
- c) newton-cg: also a member of the Newton family, likewise using the second-derivative matrix of the loss function.
- d) sag: stochastic average gradient descent, a variant of gradient descent; unlike ordinary gradient descent, each iteration computes the gradient from only part of the samples, which makes it suitable when there are many samples.
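A small usage sketch comparing the four values, reusing the train/test split from section 2.1 (an illustrative addition; scores depend on the split, and sag may warn about convergence on unscaled data):

from sklearn.linear_model import LogisticRegression

for solver in ['liblinear', 'lbfgs', 'newton-cg', 'sag']:
    clf = LogisticRegression(solver=solver, max_iter=200)
    clf.fit(X_train, y_train)
    print(solver, clf.score(X_test, y_test))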
3.2 Example
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

def create_data():
    iris = load_iris()
    df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sl', 'sw', 'pl', 'pw', 'label']
    data = np.array(df.iloc[:100, [0, 1, -1]])
    return data[:, [0, 1]], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
print(clf.coef_, clf.intercept_)
The result is as follows:
1.0
[[ 2.86401035 -2.76369768]] [-6.92179114]
In the last line, the bracketed pair is the weight vector w, and the final value is the separate bias b.
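To connect coef_ and intercept_ back to the model in section 2, here is a short sanity check (an illustrative addition; predict_proba, coef_ and intercept_ are standard LogisticRegression attributes): the predicted probability of class 1 equals the sigmoid of w·x + b.

z = X_test @ clf.coef_.T + clf.intercept_  # raw scores w·x + b
p = 1 / (1 + np.exp(-z))                   # sigmoid
print(np.allclose(p.ravel(), clf.predict_proba(X_test)[:, 1]))  # True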