Multiple linear regression (sklearn method)
2022-07-05 08:50:00 【Python code doctor】
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
# SVR, LinearSVR  -> regression
# SVC, LinearSVC  -> classification
# Workflow
# 1. Load the data
data = pd.read_csv('./data.csv')
# 2. Data exploration
# print(data.columns)
# print(data.describe())
# 3. Data cleaning
# The features fall into 3 groups of 10
features_mean = list(data.columns[2:12])   # mean values
features_se = list(data.columns[12:22])    # standard errors
# Drop the ID column
data.drop('id', axis=1, inplace=True)
# Map the diagnosis column: B (benign) -> 0, M (malignant) -> 1
data['diagnosis'] = data['diagnosis'].map({'M': 1, 'B': 0})
print(data.head(5))
# 4. Feature selection
# Goal: dimensionality reduction
sns.countplot(x='diagnosis', data=data)
plt.show()
# Heat map of the correlations between the features_mean fields
corr = data[features_mean].corr()
plt.figure(figsize=(14, 14))
sns.heatmap(corr, annot=True)
plt.show()
# Feature selection: keep 6 of the 10 mean features, dropping one of each highly correlated pair
features_remain = ['radius_mean', 'texture_mean', 'smoothness_mean', 'compactness_mean', 'symmetry_mean', 'fractal_dimension_mean']
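The six `features_remain` columns are picked by hand from the heat map: of each highly correlated group (e.g. radius/perimeter/area), only one representative is kept. The same pruning can be automated from the correlation matrix. A minimal sketch on synthetic data (the column names and the 0.9 threshold here are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
# 'b' is almost a copy of 'a'; 'c' is independent noise
df = pd.DataFrame({
    'a': a,
    'b': a + rng.normal(scale=0.01, size=200),
    'c': rng.normal(size=200),
})

corr = df.corr().abs()
# keep only the upper triangle so each pair is tested once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
features_kept = [col for col in df.columns if col not in to_drop]
print(features_kept)  # -> ['a', 'c']
```

On the real dataset the result should match a manual reading of the heat map; the threshold is a judgment call.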
# 5. Model training
# Hold out 30% of the data as the test set
train, test = train_test_split(data, test_size=0.3)
train_x = train[features_remain]
train_y = train['diagnosis']
test_x = test[features_remain]
test_y = test['diagnosis']
# Normalize the data (fit the scaler on the training set only)
ss = StandardScaler()
train_X = ss.fit_transform(train_x)
test_X = ss.transform(test_x)
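Note that `fit_transform` is called on the training set only, while the test set is scaled with `transform`, reusing the statistics already learned; refitting on the test set would leak its distribution into the model. A minimal sketch with toy numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [3.0], [5.0]])  # train mean is 3.0
test = np.array([[3.0], [6.0]])

ss = StandardScaler()
train_s = ss.fit_transform(train)  # mean/std estimated from train only
test_s = ss.transform(test)        # same mean/std reused, never refit

print(ss.mean_)   # [3.]
print(test_s[0])  # [0.] -- a test point equal to the train mean maps to 0
```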
# Build the SVM classifier
model = svm.SVC()
# Parameters
# kernel: kernel function
#   1. linear    linear kernel, for linearly separable data
#   2. poly      polynomial kernel, maps data from a low- to a high-dimensional space,
#                but has many parameters and a high computational cost
#   3. rbf       Gaussian kernel (the default), maps samples to a high-dimensional space;
#                few parameters, usually performs well
#   4. sigmoid   sigmoid kernel, lets the SVM act like a multilayer neural network mapping
# C: penalty coefficient of the objective function
# gamma: kernel coefficient (older sklearn defaulted to 1/n_features, i.e. 'auto'; newer versions default to 'scale')
# Train on the normalized data
model.fit(train_X, train_y)
# 6. Model evaluation
pred = model.predict(test_X)
print('Accuracy:', accuracy_score(test_y, pred))
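The kernel, C and gamma parameters described above need not be left at their defaults; they can be tuned with a grid search. A hedged sketch using sklearn's bundled Wisconsin breast-cancer data (assumed here to stand in for the post's `data.csv`, which is not included), with scaling inside a pipeline so it is refit on the training part of every CV fold:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# StandardScaler inside the pipeline -> no scaling leakage across CV folds
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {
    'svc__kernel': ['linear', 'rbf'],
    'svc__C': [0.1, 1, 10],
    'svc__gamma': ['scale', 0.01, 0.1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
print('test accuracy:', search.score(X_test, y_test))
```

The parameter values in the grid are illustrative; widen or refine them around whatever `best_params_` reports.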