Multiple Linear Regression (sklearn approach)
2022-07-05 08:42:00 【python-码博士】
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
# Regression: SVR, LinearSVR
# Classification: SVC, LinearSVC
# Workflow
# 1. Load the data
data = pd.read_csv('./data.csv')
# 2. Data exploration
# print(data.columns)
# print(data.describe())
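# 3. Data cleaning
# The 30 features fall into 3 groups of 10: mean, standard error (se) and worst
features_mean = list(data.columns[2:12])    # mean values
features_se = list(data.columns[12:22])     # standard errors
features_worst = list(data.columns[22:32])  # worst (largest) values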
# Drop the id column
data.drop('id', axis=1, inplace=True)
# Encode the label: benign (B) -> 0, malignant (M) -> 1
data['diagnosis'] = data['diagnosis'].map({'M': 1, 'B': 0})
print(data.head(5))
# 4. Feature selection
# Goal: dimensionality reduction
# Class distribution of the label
sns.countplot(x='diagnosis', data=data)
plt.show()
# Heatmap of pairwise correlations among the features_mean columns
corr = data[features_mean].corr()
plt.figure(figsize=(14,14))
sns.heatmap(corr,annot=True)
plt.show()
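# Feature selection within the mean group: 10 -> 6
# Columns that are strongly correlated in the heatmap (e.g. radius, perimeter and area)
# carry largely redundant information, so one representative per group is kept
features_remain = ['radius_mean', 'texture_mean', 'smoothness_mean',
                   'compactness_mean', 'symmetry_mean', 'fractal_dimension_mean']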
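# 5. Model training
# Hold out 30% of the data as the test set
train, test = train_test_split(data, test_size=0.3)
# Use the 6 selected features (features_remain), not all 10 mean features
train_x = train[features_remain]
train_y = train['diagnosis']
test_x = test[features_remain]
test_y = test['diagnosis']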
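# Standardize the features (zero mean, unit variance): fit the scaler on the
# training set only, then apply the same transform to the test set
ss = StandardScaler()
train_X = ss.fit_transform(train_x)
test_X = ss.transform(test_x)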
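# Create the SVM classifier
model = svm.SVC()
# Main parameters
# kernel: choice of kernel function
# 1. 'linear': linear kernel, for linearly separable data
# 2. 'poly': polynomial kernel, maps the data from a low- to a high-dimensional space,
#    but has more parameters and is more expensive to compute
# 3. 'rbf': Gaussian (RBF) kernel, maps samples to a high-dimensional space;
#    few parameters, good performance; the default
# 4. 'sigmoid': sigmoid kernel, borrowed from neural-network mappings;
#    with it an SVM behaves like a multilayer neural network
# C: penalty coefficient of the objective function (default 1.0);
#    a larger C penalizes training errors more heavily
# gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'. The default 'scale'
#        uses 1 / (n_features * X.var()); 'auto' (the older default) uses 1 / n_features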
# Train on the standardized training data
model.fit(train_X, train_y)
# 6. Model evaluation (predict on the standardized test data)
pred = model.predict(test_X)
print('Accuracy:', accuracy_score(test_y, pred))
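The script depends on a local `./data.csv`. A minimal self-contained sketch of the same workflow might look like the following, assuming that file is the Wisconsin Diagnostic Breast Cancer dataset that scikit-learn ships as `load_breast_cancer` (the column names used in the script suggest it is). scikit-learn encodes malignant as 0 and benign as 1, so the labels are flipped to match the M=1, B=0 encoding above. Putting `StandardScaler` and `SVC` into one pipeline means the scaler is fitted on the training split only and the test split is always scaled before prediction. `random_state=0` and `stratify=y` are arbitrary choices added for reproducibility.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled copy of the Wisconsin diagnostic breast cancer data (assumed to match data.csv)
data = load_breast_cancer(as_frame=True)
# sklearn encodes malignant as 0 and benign as 1; flip to M=1, B=0
y = 1 - data.target

# The six "mean" features kept in the script, under sklearn's column names
features_remain = ['mean radius', 'mean texture', 'mean smoothness',
                   'mean compactness', 'mean symmetry', 'mean fractal dimension']
X = data.data[features_remain]

train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# fit() scales with training-set statistics and then trains the SVC;
# predict() applies the same scaling before predicting
model = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
model.fit(train_x, train_y)
acc = accuracy_score(test_y, model.predict(test_x))
print('Accuracy:', acc)
```

Because the whole pipeline is a single estimator, it can be handed to `cross_val_score` or `GridSearchCV` without leaking test-fold statistics into the scaler.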
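The kernel list in the script's comments can also be checked empirically. This sketch, again assuming the bundled `load_breast_cancer` data and using all 30 features, compares the four kernels with 5-fold cross-validation while leaving `C` and `gamma` at their defaults; the scores are data-dependent and only indicate how the kernels behave on this dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # flip to malignant=1, benign=0

scores = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    # The scaler inside the pipeline is re-fitted on each CV training fold
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores[kernel] = cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()
    print(f'{kernel:>8}: {scores[kernel]:.3f}')
```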
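On the `gamma` default: since scikit-learn 0.22 the default is `gamma='scale'`, computed as `1 / (n_features * X.var())`, where `X.var()` is the variance over all entries of the training matrix. The "reciprocal of the number of features" rule is `gamma='auto'`, the earlier default. On standardized data `X.var()` is 1, so the two values coincide. The sketch below, assuming the bundled dataset and (arbitrarily) its first six columns, computes both by hand and checks that passing the hand-computed value reproduces `gamma='scale'`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # label encoding is irrelevant here
X = X[:, :6]  # first six columns, chosen arbitrarily
X_scaled = StandardScaler().fit_transform(X)

n_features = X_scaled.shape[1]
# gamma='scale': variance taken over every entry of the matrix
gamma_scale = 1.0 / (n_features * X_scaled.var())
# gamma='auto': reciprocal of the number of features
gamma_auto = 1.0 / n_features
print(gamma_scale, gamma_auto)

# Passing the hand-computed value yields the same model as gamma='scale'
pred_default = SVC(gamma='scale').fit(X_scaled, y).predict(X_scaled)
pred_manual = SVC(gamma=gamma_scale).fit(X_scaled, y).predict(X_scaled)
print(np.array_equal(pred_default, pred_manual))
```

On unscaled data the two rules can differ by orders of magnitude, which is one more reason the script standardizes the features before fitting.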
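`C` and `gamma` are usually tuned together rather than left at their defaults. A sketch using `GridSearchCV` over a scaler + SVC pipeline on the bundled dataset; the grid values below are illustrative assumptions, not values taken from the post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # malignant=1, benign=0
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
# make_pipeline names each step after its lower-cased class name,
# so the SVC parameters are addressed as 'svc__<param>'
param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': ['scale', 0.001, 0.01, 0.1],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(train_x, train_y)  # cross-validates on the training split only
print('Best params:', search.best_params_)
print('CV accuracy:', search.best_score_)
print('Test accuracy:', search.score(test_x, test_y))
```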
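The 10 -> 6 reduction in step 4 was done by eye from the heatmap. One common heuristic for automating it is to drop any column whose absolute correlation with an earlier column exceeds a threshold. This is a swapped-in technique, not necessarily the author's method; the 0.9 threshold is an assumption, and the sketch uses the bundled dataset's column names ('mean radius' rather than 'radius_mean').

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
# The ten "mean" columns, analogous to features_mean in the script
features_mean = [c for c in data.data.columns if c.startswith('mean ')]
corr = data.data[features_mean].corr().abs()

# Keep only the upper triangle (above the diagonal) so each pair is inspected once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
threshold = 0.9  # assumed cut-off, not taken from the original post
# Drop a column if it is highly correlated with any column listed before it
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
features_kept = [c for c in features_mean if c not in to_drop]
print('Dropped:', to_drop)
print('Kept:', features_kept)
```

The result depends on the threshold and on column order, so it need not reproduce the hand-picked six exactly.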
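The comments at the top of the script list `LinearSVC` as the linear-only counterpart of `SVC`. It is built on liblinear and usually scales better to many samples, at the cost of supporting only a linear decision boundary. A sketch on the bundled dataset; `dual=False` (solving the primal problem) is a choice suited to data with more samples than features, not something taken from the post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # malignant=1, benign=0
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Linear SVM only; dual=False solves the primal problem,
# which suits data with more samples than features
model = make_pipeline(StandardScaler(), LinearSVC(C=1.0, dual=False))
model.fit(train_x, train_y)
acc = accuracy_score(test_y, model.predict(test_x))
print('LinearSVC accuracy:', acc)
```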