5. Logistic regression
2022-07-05 23:38:00 【CGOMG】
What is logistic regression
Application scenarios
The principle of logistic regression
To master logistic regression, you must understand the following two points:
- In logistic regression, what is the input value?
- How is the output of logistic regression interpreted?
Input
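In logistic regression the input is the output of a linear regression, z = w1·x1 + w2·x2 + … + wn·xn + b. A minimal NumPy sketch of that linear part; the weights, bias, and samples below are made-up values for illustration only.

```python
import numpy as np

def linear_input(X, w, b):
    """Linear part of logistic regression: z = X @ w + b for every row of X."""
    return X @ w + b

# Hypothetical weights, bias, and two samples with three features each
w = np.array([0.5, -1.0, 2.0])
b = 0.1
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.5]])
z = linear_input(X, w, b)
print(z)  # [4.6 0.1]
```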
Activation function
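The activation function is the sigmoid, g(z) = 1 / (1 + e^(-z)), which squashes any z into (0, 1); the result is read as the probability of the positive class and compared with a threshold. The sketch below assumes the common default threshold of 0.5.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
p = sigmoid(z)                     # probability of the positive class
labels = (p >= 0.5).astype(int)    # assumed decision threshold of 0.5
print(p, labels)
```

Note that sigmoid(0) is exactly 0.5, so z = 0 is the decision boundary under this threshold.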
Measuring the loss
Loss
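Logistic regression is usually trained with the log-likelihood (cross-entropy) loss: for a label y in {0, 1} and predicted probability p, the cost is -[y·log(p) + (1 - y)·log(1 - p)], averaged over all samples. A small NumPy sketch with made-up predictions, showing that confident wrong predictions are penalized much more than confident correct ones:

```python
import numpy as np

def log_loss(y_true, p, eps=1e-15):
    """Average cross-entropy (log-likelihood) loss for binary labels in {0, 1}."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 0])
good = np.array([0.9, 0.1, 0.8, 0.2])  # confident and correct
bad = np.array([0.1, 0.9, 0.2, 0.8])   # confident and wrong
print(log_loss(y, good), log_loss(y, bad))
```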
Optimize
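The loss is minimized with an optimizer such as gradient descent. For the average log loss, the gradient with respect to w is X^T(p - y) / n and with respect to b is mean(p - y). The sketch below is plain batch gradient descent without regularization, on made-up 1-D data; the learning rate and iteration count are arbitrary assumptions, and scikit-learn's solvers work differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iter=1000):
    """Batch gradient descent on the average log loss (no regularization)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)  # current predicted probabilities
        error = p - y           # derivative of the log loss w.r.t. z
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

# Made-up toy data: small x -> class 0, large x -> class 1
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_logistic_gd(X, y)
pred = (sigmoid(X @ w + b) >= 0.5).astype(int)
print(w, b, pred)
```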
API
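In scikit-learn the model is sklearn.linear_model.LogisticRegression. A minimal usage sketch on synthetic data from make_classification (a stand-in, not the article's dataset); the parameter values shown are illustrative assumptions. C is the inverse of the regularization strength, so a smaller C means stronger regularization, and the default penalty is L2.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data, only for demonstrating the API
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# solver: optimization algorithm; C: inverse regularization strength
clf = LogisticRegression(solver="lbfgs", C=1.0, max_iter=1000)
clf.fit(X, y)

proba = clf.predict_proba(X[:5])  # columns follow clf.classes_, here [0, 1]
print(clf.coef_, clf.intercept_)
print(proba)
print(clf.predict(X[:5]))
```

predict returns the class whose probability exceeds 0.5, i.e. it applies the sigmoid threshold internally.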
Case study: breast cancer tumor prediction
Dataset introduction
Code implementation
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
# Load the data
names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape','Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin','Normal Nucleoli', 'Mitoses', 'Class']
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",names=names)
data.head()
# Basic data processing
# Handle missing values: the raw file marks them with "?"
data = data.replace(to_replace="?",value=np.nan)
data = data.dropna()
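# Select the features (all columns except the sample ID and the label) and the target
x = data.iloc[:, 1:-1].astype(float)  # "Bare Nuclei" is read as strings because of the "?" marks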
y = data["Class"]
# Split data
x_train,x_test,y_train,y_test = train_test_split(x,y,random_state=22,test_size=0.2)
# Feature engineering: standardization (fit on the training set only)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# Machine learning: train a logistic regression model
estmator = LogisticRegression()
estmator.fit(x_train,y_train)
# Model evaluation
print("Accuracy:\n", estmator.score(x_test, y_test))
print("Predicted values:\n", estmator.predict(x_test))
Evaluation methods
Precision and recall
Confusion matrix
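In binary classification every prediction falls into one of four cells: true positive (TP), false negative (FN), false positive (FP), or true negative (TN). A small sketch with made-up labels using sklearn.metrics.confusion_matrix, which puts true classes on rows and predicted classes on columns, in the order given by labels:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels; 1 = positive (e.g. malignant), 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Rows = true class, columns = predicted class, ordered as [1, 0]
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fn = cm[0]
fp, tn = cm[1]
print(cm)
print("TP", tp, "FN", fn, "FP", fp, "TN", tn)
```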
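The accuracy formula we used before is: (TP + TN) / (TP + FN + FP + TN)
Precision and recall
Precision: TP / (TP + FP), the proportion of samples predicted positive that are actually positive
Recall: TP / (TP + FN), the proportion of actually positive samples that are predicted positive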
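The formulas can be checked by hand against scikit-learn. A sketch with made-up labels (TP = 3, FN = 1, FP = 2, TN = 4), chosen so that accuracy, precision, and recall all differ:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels; 1 = positive class
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = pairs.count((1, 1))
fn = pairs.count((1, 0))
fp = pairs.count((0, 1))
tn = pairs.count((0, 0))

accuracy = (tp + tn) / (tp + fn + fp + tn)  # 7 / 10
precision = tp / (tp + fp)                  # 3 / 5
recall = tp / (tp + fn)                     # 3 / 4

# Hand-computed values agree with scikit-learn (pos_label defaults to 1)
print(accuracy, accuracy_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
```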
F1-score
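The F1-score is the harmonic mean of precision and recall, F1 = 2 · Precision · Recall / (Precision + Recall), which can also be written 2TP / (2TP + FN + FP). It is high only when both precision and recall are high. A sketch with made-up labels where TP = 3, FP = 2, FN = 1:

```python
from sklearn.metrics import f1_score

def f1_from_pr(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Made-up labels with TP=3, FP=2, FN=1, so precision = 3/5 and recall = 3/4
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
f1 = f1_from_pr(3 / 5, 3 / 4)
print(f1, f1_score(y_true, y_pred))  # both equal 2/3
```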
Classification report API
from sklearn.metrics import classification_report
y_pre = estmator.predict(x_test)
ret = classification_report(y_test, y_pre, labels=(2, 4), target_names=("Benign", "Malignant"))
print(ret)
ROC curve and the AUC metric
TPR and FPR
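The true positive rate is TPR = TP / (TP + FN), which is the same as recall; the false positive rate is FPR = FP / (FP + TN), the share of actual negatives wrongly flagged as positive. A plain-Python sketch with made-up labels (TP = 3, FN = 1, FP = 2, TN = 4); the helper name tpr_fpr is hypothetical.

```python
def tpr_fpr(y_true, y_pred, positive=1):
    """True positive rate (recall) and false positive rate for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)  # 0.75 and 1/3
```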
ROC curve
AUC metric
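The ROC curve plots TPR (y-axis) against FPR (x-axis) as the threshold on the predicted score is varied; AUC is the area under that curve. It also equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties counted as half). A sketch with made-up scores, comparing sklearn.metrics.roc_curve plus auc, roc_auc_score, and the pairwise-ranking view:

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up labels and predicted probabilities of the positive class
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Each threshold on y_score gives one (FPR, TPR) point on the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr, thresholds)

# AUC = area under that curve
print(auc(fpr, tpr), roc_auc_score(y_true, y_score))

# Equivalent view: fraction of (positive, negative) pairs ranked correctly
pos, neg = y_score[y_true == 1], y_score[y_true == 0]
pairwise = np.mean([p > n for p in pos for n in neg])
print(pairwise)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.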
AUC calculation API
from sklearn.metrics import roc_auc_score
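# roc_auc_score needs binary labels: malignant (4) -> 1, benign (2) -> 0
y_test = np.where(y_test > 3, 1, 0)
print("AUC (hard predictions):", roc_auc_score(y_test, y_pre))
# Probabilities of the malignant class (estmator.classes_ is [2, 4]) give a finer-grained AUC
print("AUC (probabilities):", roc_auc_score(y_test, estmator.predict_proba(x_test)[:, 1]))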
Handling class imbalance
pip3 install imbalanced-learn
Preparing imbalanced data
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
from collections import Counter
X, Y = make_classification(n_samples=5000,
                           n_features=2,  # total features: informative + redundant + repeated + remaining random (useless) features
                           n_informative=2,  # number of informative features
                           n_redundant=0,  # redundant features: random linear combinations of the informative features
                           n_repeated=0,  # repeated features: duplicates drawn randomly from the informative and redundant features
                           n_classes=3,  # number of classes
                           n_clusters_per_class=1,  # number of clusters each class is made of
                           weights=[0.01, 0.05, 0.94],  # proportion of samples assigned to each class
                           random_state=0)
X, Y, X.shape
Counter(Y)
# Data visualization
plt.scatter(X[:,0],X[:,1],c=Y)
plt.show()
Solutions
Oversampling methods
Random oversampling
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=0)
X_resampled,Y_resampled = ros.fit_resample(X,Y)
Counter(Y_resampled)
# Data visualization
plt.scatter(X_resampled[:,0],X_resampled[:,1],c=Y_resampled)
plt.show()
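Random oversampling simply duplicates randomly chosen samples of the smaller classes until every class is as large as the biggest one. A hand-rolled NumPy sketch of that idea, not imblearn's actual implementation; the function name random_oversample and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate random samples of each smaller class until all classes match the largest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, cnt in zip(classes, counts):
        if cnt < n_max:
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=n_max - cnt, replace=True)  # sample with replacement
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.vstack(X_parts), np.concatenate(y_parts)

# Made-up imbalanced data: 6 samples of class 0, 2 of class 1
X = np.arange(16, dtype=float).reshape(8, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
X_res, y_res = random_oversample(X, y)
print(Counter(y_res.tolist()))  # Counter({0: 6, 1: 6})
```

Because the added rows are exact copies, this approach can encourage overfitting to the minority samples, which is the motivation for SMOTE.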
A representative oversampling algorithm: SMOTE
from imblearn.over_sampling import SMOTE
X_resampled,Y_resampled = SMOTE().fit_resample(X,Y)
Counter(Y_resampled)
# Data visualization
plt.scatter(X_resampled[:,0],X_resampled[:,1],c=Y_resampled)
plt.show()
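Instead of copying samples, SMOTE creates synthetic ones: for a minority sample x it picks one of its k nearest minority-class neighbours x_nn and generates x_new = x + λ·(x_nn - x) with λ drawn from [0, 1). The sketch below is a simplified version of that interpolation step, not imblearn's implementation; smote_like, k=3, and the toy points are hypothetical.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Simplified SMOTE: interpolate between minority samples and their k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X_min))           # random minority sample
        nn = X_min[rng.choice(neighbours[j])]  # one of its k nearest neighbours
        lam = rng.random()                     # interpolation factor in [0, 1)
        synthetic[i] = X_min[j] + lam * (nn - X_min[j])
    return synthetic

# Made-up minority-class points at the corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_like(X_min, n_new=5)
print(new_points)
```

Every synthetic point lies on a segment between two existing minority samples, so here all of them fall inside the unit square.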
Undersampling methods
Random undersampling
from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler(random_state=0)
X_resampled,Y_resampled = rus.fit_resample(X,Y)
Counter(Y_resampled)
# Data visualization
plt.scatter(X_resampled[:,0],X_resampled[:,1],c=Y_resampled)
plt.show()
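Random undersampling goes the other way: it keeps a random subset of each larger class, without replacement, so that every class shrinks to the size of the smallest one. The discarded samples are lost, which can throw away useful information. A hand-rolled NumPy sketch, not imblearn's implementation; random_undersample and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class (no replacement)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == cls), size=n_min, replace=False)
        for cls in classes
    ])
    return X[keep], y[keep]

# Made-up imbalanced data: 6 samples of class 0, 2 of class 1
X = np.arange(16, dtype=float).reshape(8, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
X_res, y_res = random_undersample(X, y)
print(Counter(y_res.tolist()))  # Counter({0: 2, 1: 2})
```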