Benign/Malignant Breast Tumor Prediction (Logistic Regression Classifier)
2022-06-27 20:49:00 【别团等shy哥发育】
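Breast Tumor Prediction
Case study: benign/malignant breast tumor prediction
1.1 Introduction
This case study uses a logistic regression classifier to predict whether a breast tumor is benign or malignant, then measures and evaluates the model's performance.
The data is the Wisconsin breast cancer dataset. The original file can be downloaded from: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data
Each record consists of a sample ID number, nine cytological attributes, and a class label (2 = benign, 4 = malignant).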

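1.2 Code
Each attribute is quantized to an integer from 1 to 10. First, import the data and display the first 5 rows.
1.2.1 Import the dataset
import pandas as pd
import numpy as np
from IPython.display import display  # display() is otherwise only available inside a notebook
# The raw file has no header row, so the column names are supplied explicitly
column_names=['number','Cl_Thickness','Unif_cell_size','Unif_cell_shape','Marg_Adhesion','Sing_epith_cell_size','Bare_nuclei','Bland_chromation','Norm_nuclei','Mitoses','Class']
data=pd.read_csv('breast-cancer-wisconsin.data',names=column_names)
display(data.head())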

1.2.2 Inspect basic information about the data
data.info()

1.2.3 View basic summary statistics
data.describe()

1.2.4 Count missing values per attribute
data.isnull().sum()

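If any data is missing, it must be dropped or filled. isnull() reports no missing values above because this dataset marks them with '?' rather than NaN. Here we replace '?' with NaN and then drop the affected rows.
# Turn the '?' markers into NaN so that pandas treats them as missing
data=data.replace(to_replace='?',value=np.nan)
# Drop every row that has at least one missing value
data=data.dropna(how='any')
print(data.shape)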
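To see why the '?' marker goes unnoticed by isnull(), here is a minimal sketch on a made-up three-row frame (the values are hypothetical, not taken from the dataset). It also shows that the column still holds strings after the rows are dropped, so converting it to a numeric dtype is a reasonable extra step; the original code does not do this.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: '?' marks a missing value, as in the raw data file.
toy = pd.DataFrame({'Bare_nuclei': ['1', '?', '10'], 'Class': [2, 4, 4]})

# '?' is an ordinary string, so pandas does not count it as missing.
missing_before = int(toy.isnull().sum().sum())

# Convert the marker to NaN, then drop incomplete rows.
cleaned = toy.replace(to_replace='?', value=np.nan).dropna(how='any')
rows_dropped = len(toy) - len(cleaned)

# The surviving values are still strings; convert them to numbers.
cleaned = cleaned.assign(Bare_nuclei=pd.to_numeric(cleaned['Bare_nuclei']))

print(missing_before, rows_dropped, cleaned['Bare_nuclei'].tolist())
```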

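1.2.5 Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
# Hold out 25% of the samples for testing; columns 1-9 are the features, column 10 is the class label
X_train,X_test,y_train,y_test=train_test_split(data[column_names[1:10]],data[column_names[10]],test_size=0.25,random_state=33)
print('Number of training samples and class distribution:\n',y_train.value_counts())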
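As a quick sanity check on the split, the expected set sizes can be worked out by hand. This is a sketch under two assumptions: the row counts come from the UCI dataset description (699 records, 16 of which contain a '?'), and the test set size is rounded up, which is how scikit-learn is understood to handle a fractional test_size. Compare the result with X_train.shape and X_test.shape rather than relying on it.

```python
import math

# Figures from the UCI dataset description: 699 rows, 16 of them with a '?'.
n_total, n_missing = 699, 16
n_samples = n_total - n_missing            # rows left after dropna

test_size = 0.25
n_test = math.ceil(test_size * n_samples)  # assumption: the test count rounds up
n_train = n_samples - n_test

print(n_samples, n_train, n_test)
```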

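1.2.6 Standardize the data
Each feature is rescaled to zero mean and unit variance, so that the prediction is not dominated by features with larger values.
from sklearn.preprocessing import StandardScaler
ss=StandardScaler()
# Learn the mean and standard deviation from the training set only
X_train=ss.fit_transform(X_train)
# Apply the same training statistics to the test set
X_test=ss.transform(X_test)
print(X_train.mean())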
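The transformation itself is simple enough to write out by hand. The following NumPy sketch uses made-up values (not rows from the dataset) to illustrate the idea: the mean and standard deviation are computed on the training data and then reused for the test data.

```python
import numpy as np

# Hypothetical toy features on the 1-10 scale used by the dataset.
train = np.array([[1.0, 10.0], [3.0, 4.0], [5.0, 1.0]])
test = np.array([[3.0, 5.0]])

# Statistics come from the training split only.
mu = train.mean(axis=0)
sigma = train.std(axis=0)   # population standard deviation (ddof=0)

train_std = (train - mu) / sigma
test_std = (test - mu) / sigma   # reuse the training mean and std

print(train_std.mean(axis=0), train_std.std(axis=0))
print(test_std)
```

Reusing the training statistics keeps information from the test set out of the preprocessing step.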

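1.2.7 Build classifiers with LogisticRegression and SGDClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import SGDClassifier
lr=LogisticRegression()
# SGDClassifier fits a linear model by stochastic gradient descent (its default loss is hinge)
sgdc=SGDClassifier()
# Fit each model on the training set, then predict labels for the test set
lr.fit(X_train,y_train)
lr_y_predict=lr.predict(X_test)
sgdc.fit(X_train,y_train)
sgdc_y_predict=sgdc.predict(X_test)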
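For intuition about the model behind LogisticRegression, here is a minimal NumPy sketch: a sigmoid applied to a linear score, trained by plain batch gradient descent on the log loss. This is not the solver scikit-learn uses and it omits regularization; the data, the step size, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical standardized feature (one column); labels: 2 = benign, 4 = malignant.
X = np.array([[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]])
y = np.array([2, 2, 2, 4, 4, 4])
t = (y == 4).astype(float)       # 1.0 for malignant, 0.0 for benign

w = np.zeros(X.shape[1])
b = 0.0
step = 0.5                       # assumed learning rate
for _ in range(200):
    p = sigmoid(X @ w + b)               # predicted P(malignant)
    grad_w = X.T @ (p - t) / len(t)      # gradient of the mean log loss w.r.t. w
    grad_b = np.mean(p - t)              # gradient w.r.t. the intercept
    w -= step * grad_w
    b -= step * grad_b

# Threshold the probability at 0.5 to get a class label.
pred = np.where(sigmoid(X @ w + b) >= 0.5, 4, 2)
print(w, b, pred)
```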
1.2.8 Analyze LR classifier performance
from sklearn.metrics import classification_report
print('Accuracy of LR Classifier:',lr.score(X_test,y_test))
print(classification_report(y_test,lr_y_predict,target_names=['Benign','Malignant']))

1.2.9 Analyze SGD classifier performance
print('Accuracy of SGD Classifier:',sgdc.score(X_test,y_test))
print(classification_report(y_test,sgdc_y_predict,target_names=['Benign','Malignant']))
# print(classification_report(y_test,sgdc_y_predict))

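precision: the fraction of samples predicted as a class that truly belong to it
recall: the fraction of samples of a class that the model correctly identifies
f1-score: the harmonic mean of precision and recall
macro avg: the unweighted mean of a metric across classes
weighted avg: the mean of a metric across classes, weighted by each class's support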
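The metrics listed above can be reproduced by hand, which helps when reading the report. The sketch below uses ten made-up labels (not results from this model) and plain Python to compute per-class precision, recall, and F1, followed by the macro and weighted averages of F1.

```python
# Hypothetical true and predicted labels (2 = benign, 4 = malignant).
y_true = [2, 2, 2, 2, 4, 4, 4, 4, 2, 2]
y_pred = [2, 2, 2, 4, 4, 4, 4, 2, 2, 2]

def prf(label):
    """Return (precision, recall, f1, support) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, tp + fn

rows = {label: prf(label) for label in (2, 4)}
total = sum(r[3] for r in rows.values())

# macro avg treats both classes equally; weighted avg weights by support.
macro_f1 = sum(r[2] for r in rows.values()) / len(rows)
weighted_f1 = sum(r[2] * r[3] for r in rows.values()) / total

print(rows)
print(macro_f1, weighted_f1)
```

Because the benign class has more samples here, the weighted average sits closer to the benign F1 than the macro average does.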