5. Logistic regression
2022-07-05 23:38:00 【CGOMG】
What is logistic regression
Application scenarios
The principle of logistic regression
To master logistic regression, you must understand the following two points:
- What the input value is in logistic regression
- How the output of logistic regression is interpreted
Input
Activation function
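In the conventional formulation, the input to logistic regression is the output of a linear model, and the activation function is the sigmoid, which maps any real number into the interval (0, 1). The result is read as the probability of the positive class and compared with a threshold (0.5 by default). A minimal NumPy sketch of that idea; the weights `w`, bias `b`, and sample `x` are made-up values for illustration only:

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and one two-feature sample (illustrative values)
w = np.array([0.5, -0.25])
b = 0.1
x = np.array([2.0, 4.0])

z = np.dot(w, x) + b      # linear-model output: 1.0 - 1.0 + 0.1 = 0.1
p = sigmoid(z)            # estimated probability of the positive class
label = int(p >= 0.5)     # apply the 0.5 threshold

print(z, p, label)
```

Scores above 0 give probabilities above 0.5, so the threshold on the probability is equivalent to checking the sign of the linear output.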
Measuring the loss
Loss
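The loss usually paired with logistic regression is the log-likelihood loss (binary cross-entropy): for a true label y in {0, 1} and a predicted probability p, the per-sample loss is -[y·log(p) + (1 - y)·log(1 - p)]. The sketch below assumes that standard definition; the labels and probabilities are made-up values chosen to show that confident wrong predictions are penalized much more heavily than confident correct ones:

```python
import numpy as np

def log_loss_per_sample(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy for each sample; probabilities are clipped to avoid log(0)."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Illustrative labels and predicted probabilities of class 1
y_true = np.array([1, 0, 1, 0])
p_pred = np.array([0.9, 0.1, 0.2, 0.8])

losses = log_loss_per_sample(y_true, p_pred)
mean_loss = losses.mean()

print(losses)      # first two samples are predicted well, last two badly
print(mean_loss)
```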
Optimization
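One common way to reduce the loss is gradient descent: repeatedly nudge the weights against the gradient of the mean log loss. The sketch below is a hand-rolled illustration under that assumption, not what scikit-learn does internally (its solvers are more sophisticated). The toy data, learning rate, and iteration count are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny, already-centered toy data set (made up for illustration)
X = np.array([[-1.5], [-0.5], [0.5], [1.5]])
y = np.array([0, 0, 1, 1])

w = np.zeros(X.shape[1])
b = 0.0
learning_rate = 0.1

def mean_log_loss(w, b):
    p = np.clip(sigmoid(X @ w + b), 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

loss_before = mean_log_loss(w, b)
for _ in range(1000):
    p = sigmoid(X @ w + b)
    grad_w = X.T @ (p - y) / len(y)   # gradient of the mean log loss w.r.t. w
    grad_b = np.mean(p - y)           # gradient w.r.t. the bias
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
loss_after = mean_log_loss(w, b)

predictions = (sigmoid(X @ w + b) >= 0.5).astype(int)
print(loss_before, loss_after, predictions)
```

The loss starts at log(2) (every probability is 0.5) and falls as the weight grows in the direction that separates the two classes.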
API
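In scikit-learn the model is `sklearn.linear_model.LogisticRegression`. The sketch below shows a few of its constructor arguments (`solver`, `C`, `max_iter`) on a made-up four-sample data set; the values passed here are the library defaults as far as I know, written out only to make them visible:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up one-feature data set for illustration
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])

# solver:   optimization algorithm
# C:        inverse of regularization strength (smaller C = stronger regularization)
# max_iter: maximum number of solver iterations
estimator = LogisticRegression(solver="lbfgs", C=1.0, max_iter=100)
estimator.fit(X, y)

pred = estimator.predict(X)          # hard class labels
proba = estimator.predict_proba(X)   # one probability column per class

print(estimator.classes_)
print(pred)
print(proba)
```

`predict_proba` returns one column per entry in `classes_`, so the second column is the probability of class 1 here.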
Case study: tumor prediction
About the data
Code implementation
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
# Get the data
names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape','Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin','Normal Nucleoli', 'Mitoses', 'Class']
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",names=names)
data.head()
# Basic data processing
# Handle missing values: the data set marks them with "?"
data = data.replace(to_replace="?",value=np.nan)
data = data.dropna()
# Select the feature values and the target
x = data.iloc[:,1:-1]
y = data["Class"]
# Split the data into training and test sets
x_train,x_test,y_train,y_test = train_test_split(x,y,random_state=22,test_size=0.2)
# Feature engineering: standardization
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# Train the model
estmator = LogisticRegression()
estmator.fit(x_train,y_train)
# Evaluate the model
print("Accuracy:\n",estmator.score(x_test,y_test))
print("Predicted values:\n",estmator.predict(x_test))
Evaluation methods
Precision and recall
Confusion matrix
The accuracy formula we used before is: (TP+TN)/(TP+FN+FP+TN)
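The four counts in the accuracy formula can be read directly off a confusion matrix. The sketch below uses `sklearn.metrics.confusion_matrix` on made-up labels that follow the tumor data's coding (2 = benign, 4 = malignant) and treats malignant as the positive class, which is an assumption of this example:

```python
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels (2 = benign, 4 = malignant)
y_true = [4, 4, 4, 2, 2, 2, 2, 2]
y_pred = [4, 4, 2, 2, 2, 2, 4, 2]

# labels=[2, 4] puts the negative class first, so ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[2, 4]).ravel()

accuracy = (tp + tn) / (tp + fn + fp + tn)
print(tn, fp, fn, tp, accuracy)
```

Rows of the matrix are true classes and columns are predicted classes, so the order of `labels` decides which cell is which.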
Precision and recall
Precision: TP/(TP+FP)
Recall: TP/(TP+FN)
F1-score
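The F1-score is the harmonic mean of precision and recall, F1 = 2·precision·recall/(precision + recall), which is equivalent to 2TP/(2TP + FP + FN). A small sketch with made-up counts; the helper name `precision_recall_f1` is hypothetical and it assumes the denominators are non-zero:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: 8 true positives, 2 false positives, 4 false negatives
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(precision, recall, f1)
```

Because it is a harmonic mean, F1 stays low whenever either precision or recall is low, which makes it more informative than accuracy on imbalanced data.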
Classification report API
from sklearn.metrics import classification_report
y_pre = estmator.predict(x_test)
ret = classification_report(y_test,y_pre,labels=(2,4),target_names=("Benign","Malignant"))
print(ret)
ROC curve and AUC metric
TPR and FPR
ROC curve
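TPR is TP/(TP+FN) and FPR is FP/(FP+TN). An ROC curve plots TPR against FPR as the decision threshold is swept from high to low. The sketch below computes a few such points by hand; the labels, scores, thresholds, and the helper name `tpr_fpr` are all made up for illustration:

```python
import numpy as np

# Made-up true labels and predicted scores for the positive class
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when predicting positive for scores >= threshold."""
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fn = np.sum(~pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    tn = np.sum(~pred & (y_true == 0))
    return float(tp / (tp + fn)), float(fp / (fp + tn))

# Lowering the threshold moves the point from (0, 0) towards (1, 1)
points = [tpr_fpr(y_true, scores, t) for t in (0.9, 0.5, 0.3, 0.0)]
for tpr, fpr in points:
    print(f"TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Plotting FPR on the x-axis and TPR on the y-axis for all thresholds gives the ROC curve; the area under it is the AUC.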
AUC metric
AUC calculation API
from sklearn.metrics import roc_auc_score
# Map the labels to 0/1: 4 (malignant) becomes 1, 2 (benign) becomes 0
y_test = np.where(y_test>3,1,0)
roc_auc_score(y_test,y_pre)
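The call above passes hard class predictions (`y_pre`) to `roc_auc_score`, which amounts to an ROC curve with a single operating point. `roc_auc_score` also accepts continuous scores, so a common variant is to pass the positive-class probability from `predict_proba` instead. The sketch below shows both side by side on synthetic data rather than the tumor data; the data set parameters and variable names are assumptions of this example:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary data, used only so the example is self-contained
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=22
)

model = LogisticRegression().fit(X_train, y_train)

# AUC from hard 0/1 predictions (one threshold only)
auc_from_labels = roc_auc_score(y_test, model.predict(X_test))
# AUC from predicted probabilities of class 1 (all thresholds)
auc_from_scores = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(auc_from_labels, auc_from_scores)
```

The two numbers generally differ, because the probability-based version ranks every test sample instead of collapsing them into two groups.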
Handling class imbalance
pip3 install imbalanced-learn
Preparing class-imbalanced data
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
from collections import Counter
X,Y = make_classification(n_samples=5000,
                          n_features=2, # Total number of features: n_informative + n_redundant + n_repeated (plus any remaining noise features)
                          n_informative=2, # Number of informative features
                          n_redundant=0, # Redundant features: random linear combinations of the informative features
                          n_repeated=0, # Repeated features: drawn at random from the informative and redundant features
                          n_classes=3, # Number of classes
                          n_clusters_per_class=1, # Number of clusters per class
                          weights=[0.01,0.05,0.94], # Proportion of samples assigned to each class
                          random_state=0)
X,Y,X.shape
Counter(Y)
# Data visualization
plt.scatter(X[:,0],X[:,1],c=Y)
plt.show()
Solutions
Oversampling methods
Random oversampling
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=0)
X_resampled,Y_resampled = ros.fit_resample(X,Y)
Counter(Y_resampled)
# Data visualization
plt.scatter(X_resampled[:,0],X_resampled[:,1],c=Y_resampled)
plt.show()
A representative oversampling algorithm: SMOTE
from imblearn.over_sampling import SMOTE
X_resampled,Y_resampled = SMOTE().fit_resample(X,Y)
Counter(Y_resampled)
# Data visualization
plt.scatter(X_resampled[:,0],X_resampled[:,1],c=Y_resampled)
plt.show()
Undersampling methods
Random undersampling
from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler(random_state=0)
X_resampled,Y_resampled = rus.fit_resample(X,Y)
Counter(Y_resampled)
# Data visualization
plt.scatter(X_resampled[:,0],X_resampled[:,1],c=Y_resampled)
plt.show()