5. Logistic regression
2022-07-05 23:38:00 【CGOMG】
What is logistic regression

Application scenarios

The principle of logistic regression
To master logistic regression, you need to understand the following two points:
- What the input to logistic regression is
- How the output of logistic regression is interpreted
Input

Activation function
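As a rough illustration, assuming the activation function meant here is the standard sigmoid (the usual choice for logistic regression), the sketch below maps a linear-regression output to a value in (0, 1) and applies a threshold. The names `sigmoid` and `predict_label` and the default threshold of 0.5 are illustrative and not taken from the original.

```python
import math

def sigmoid(z):
    """Map any real number z to the open interval (0, 1)."""
    # Branch on the sign of z so that exp() is never called on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_label(z, threshold=0.5):
    """Return 1 if the sigmoid output reaches the threshold, otherwise 0."""
    return 1 if sigmoid(z) >= threshold else 0

# z stands for the linear combination w.x + b computed from the input features.
for z in (-3.0, 0.0, 3.0):
    print(z, round(sigmoid(z), 4), predict_label(z))
```

The sigmoid output can be read as the estimated probability of the positive class, which is why a threshold is needed to turn it into a class label.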

Measure losses

Loss
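A minimal sketch of the loss, assuming it is the log-likelihood loss (binary cross-entropy) that logistic regression conventionally uses. The function name `log_loss` and the clipping constant `eps` are choices made for this example only.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average log-likelihood loss for binary labels (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so that log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]  # probabilities close to the true labels
confident_wrong = [0.1, 0.9, 0.2, 0.8]  # probabilities far from the true labels
print(log_loss(y_true, confident_right))
print(log_loss(y_true, confident_wrong))
```

Predictions that are confidently wrong are penalised much more heavily than predictions that are confidently right, which is the behaviour the loss is meant to capture.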


Optimize
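One common way to minimise the loss is gradient descent. The sketch below assumes that is the optimiser intended here and fits a one-feature model on made-up data; `fit_logistic`, the learning rate, and the epoch count are all hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit y ~ sigmoid(w * x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data: small x values belong to class 0, large x values to class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(w, b)
```

In practice the scikit-learn estimator used later in this article handles the optimisation internally, so this loop is only meant to show what "optimize" refers to.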

API

Tumor prediction case
Data introduction

Code implementation
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
# Load the data
names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape','Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin','Normal Nucleoli', 'Mitoses', 'Class']
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",names=names)
data.head()

# Basic data processing
# Handle missing values: the dataset marks them with "?"
data = data.replace(to_replace="?", value=np.nan)
data = data.dropna()
# Select the feature values and the target (drop the sample ID column)
x = data.iloc[:, 1:-1]
y = data["Class"]
# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22, test_size=0.2)
# Feature engineering: standardization (fit on the training set only)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# Machine learning: train the logistic regression model
estmator = LogisticRegression()
estmator.fit(x_train, y_train)
# Model evaluation
print("Accuracy:\n", estmator.score(x_test, y_test))
print("Predictions:\n", estmator.predict(x_test))

Evaluation method
Precision and recall
Confusion matrix

The accuracy formula we used before is: (TP+TN)/(TP+FN+FP+TN)
Precision and recall

Precision: TP/(TP+FP)
Recall: TP/(TP+FN)
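The formulas above can be checked on a small example. The labels below are made up and use the dataset's coding (2 = benign, 4 = malignant), with malignant treated as the positive class; `confusion_counts` is a helper written for this sketch, not a library function.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn

# Made-up labels: 2 = benign, 4 = malignant.
y_true = [4, 4, 4, 2, 2, 2, 2, 4]
y_pred = [4, 4, 2, 2, 2, 4, 2, 4]
tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive=4)

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, tn, accuracy, precision, recall)
```

For tumor screening, recall on the malignant class is usually the number to watch, because a false negative means a malignant case was missed.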
F1-score
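F1-score is the harmonic mean of precision and recall, F1 = 2 * precision * recall / (precision + recall). A small sketch with made-up counts follows; the function name and the zero-division handling are choices made for this example.

```python
def f1_from_counts(tp, fp, fn):
    """Compute F1 from confusion-matrix counts; return 0.0 when it is undefined."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up counts: precision is 0.8 but recall is only 0.5.
score = f1_from_counts(tp=8, fp=2, fn=8)
print(score)
```

Because it is a harmonic mean, F1 is pulled toward the weaker of the two values, so a model cannot score well by maximising only precision or only recall.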

Classification report API

from sklearn.metrics import classification_report
y_pre = estmator.predict(x_test)
ret = classification_report(y_test, y_pre, labels=(2, 4), target_names=("Benign", "Malignant"))
print(ret)

ROC curve and AUC metric
TPR and FPR

ROC curve
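A sketch of how the points of a ROC curve can be obtained, using the standard definitions TPR = TP/(TP+FN) and FPR = FP/(FP+TN) and sweeping the decision threshold over the predicted scores. The data is made up and `roc_points` is a helper written for this example.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Use each distinct score, from highest to lowest, as the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

y_true = [0, 0, 1, 1]            # 1 marks the positive class
scores = [0.1, 0.4, 0.35, 0.8]   # predicted probability of the positive class
points = roc_points(y_true, scores)
print(points)
```

Plotting FPR on the horizontal axis against TPR on the vertical axis gives the ROC curve; a curve that hugs the top-left corner indicates a better classifier.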

AUC metric
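AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample. The sketch below computes it directly from that interpretation on made-up data; `auc_by_pairs` is written for this example and is quadratic in the number of samples, so it is only suitable for small inputs.

```python
def auc_by_pairs(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_by_pairs(y_true, scores)
print(auc)
```

A value of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which makes AUC useful when the classes are imbalanced and accuracy is misleading.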

AUC calculation API

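from sklearn.metrics import roc_auc_score
# roc_auc_score expects binary true labels: map malignant (4) to 1 and benign (2) to 0
y_test = np.where(y_test > 3, 1, 0)
print("AUC:", roc_auc_score(y_test, y_pre))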

Solving the class imbalance problem
pip3 install imbalanced-learn
Prepare class-imbalanced data
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
from collections import Counter
X, Y = make_classification(n_samples=5000,
                           n_features=2,  # total features = n_informative + n_redundant + n_repeated
                           n_informative=2,  # number of informative features
                           n_redundant=0,  # redundant features: random linear combinations of the informative features
                           n_repeated=0,  # duplicated features, drawn at random from the informative and redundant features
                           n_classes=3,  # number of classes
                           n_clusters_per_class=1,  # number of clusters that make up each class
                           weights=[0.01, 0.05, 0.94],  # proportion of samples assigned to each class
                           random_state=0)
X, Y, X.shape

# Count the samples in each class
Counter(Y)

# Data visualization
plt.scatter(X[:,0],X[:,1],c=Y)
plt.show()

Solutions

Oversampling method

Random oversampling method
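A minimal sketch of the idea behind random oversampling: samples from the smaller classes are duplicated at random until every class is as large as the biggest one. This is an illustration on made-up data, not the imbalanced-learn implementation; `random_oversample` is a hypothetical helper.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until all classes have equal size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_x, out_y = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        # Draw with replacement from the class until it reaches the target size.
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [2.0]]
labels = [0, 0, 0, 0, 0, 1]
new_samples, new_labels = random_oversample(samples, labels)
print(Counter(new_labels))
```

Since the added points are exact copies, this approach adds no new information and can encourage overfitting to the duplicated samples.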

from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=0)
X_resampled,Y_resampled = ros.fit_resample(X,Y)
Counter(Y_resampled)

# Data visualization
plt.scatter(X_resampled[:,0],X_resampled[:,1],c=Y_resampled)
plt.show()


Oversampling representative algorithm -SMOTE
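The core step of SMOTE is to create a synthetic point on the line segment between a minority sample and one of its nearest minority neighbours: x_new = x + rand(0, 1) * (x_neighbour - x). The sketch below shows only that interpolation step on made-up points; `smote_like_samples` and its parameters are hypothetical and this is not the imbalanced-learn implementation.

```python
import random

def smote_like_samples(minority, k=2, n_new=4, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Find the k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_samples(minority)
print(new_points)
```

Unlike random oversampling, the generated points are new rather than duplicates, but they always lie between existing minority samples.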



from imblearn.over_sampling import SMOTE
X_resampled,Y_resampled = SMOTE().fit_resample(X,Y)
Counter(Y_resampled)

# Data visualization
plt.scatter(X_resampled[:,0],X_resampled[:,1],c=Y_resampled)
plt.show()

Undersampling method

Random undersampling method
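A minimal sketch of the idea behind random undersampling: samples are dropped at random from the larger classes until every class is as small as the smallest one. The sketch assumes sampling without replacement and uses made-up data; `random_undersample` is a hypothetical helper, not the imbalanced-learn implementation.

```python
import random
from collections import Counter

def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())  # size of the smallest class
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        # Sample without replacement so no point is kept twice.
        for x in rng.sample(pool, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y

samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [2.0], [2.1]]
labels = [0, 0, 0, 0, 0, 1, 1]
kept_samples, kept_labels = random_undersample(samples, labels)
print(Counter(kept_labels))
```

The trade-off is that information from the majority class is thrown away, which matters most when the dataset is already small.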

from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler(random_state=0)
X_resampled,Y_resampled = rus.fit_resample(X,Y)
Counter(Y_resampled)

# Data visualization
plt.scatter(X_resampled[:,0],X_resampled[:,1],c=Y_resampled)
plt.show()

