Multiple linear regression (sklearn method)
2022-07-05 08:50:00 【Python code doctor】
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
# SVR, LinearSVR  - regression
# SVC, LinearSVC  - classification
# Workflow
# 1. Load the data
data = pd.read_csv('./data.csv')
# 2. Data exploration
# print(data.columns)
# print(data.describe())
# 3. Data cleaning
# The features fall into 3 groups: mean values, standard errors, and "worst" values
features_mean = list(data.columns[2:12])   # mean values
features_se = list(data.columns[12:22])    # standard errors
features_worst = list(data.columns[22:32]) # "worst" (largest) values
# Drop the id column
data.drop('id', axis=1, inplace=True)
# Encode diagnosis: B (benign) -> 0, M (malignant) -> 1
data['diagnosis'] = data['diagnosis'].map({'M': 1, 'B': 0})
print(data.head(5))
# 4. Feature selection
# Goal: dimensionality reduction
sns.countplot(x='diagnosis', data=data, label='Count')
plt.show()
# Heat map of the correlations between the features_mean fields
corr = data[features_mean].corr()
plt.figure(figsize=(14,14))
sns.heatmap(corr,annot=True)
plt.show()
# Feature selection: reduce the mean group from 10 features to 6
features_remain = ['radius_mean', 'texture_mean', 'smoothness_mean', 'compactness_mean', 'symmetry_mean','fractal_dimension_mean']
# 5. Model training
# Hold out 30% of the data as the test set
train, test = train_test_split(data, test_size=0.3)
# Use the 6 selected features rather than all 10 mean features
train_x = train[features_remain]
train_y = train['diagnosis']
test_x = test[features_remain]
test_y = test['diagnosis']
# Standardize the data (fit the scaler on the training set only)
ss = StandardScaler()
train_X = ss.fit_transform(train_x)
test_X = ss.transform(test_x)
# Build the SVM classifier
model = svm.SVC()
# Parameters:
# kernel - kernel function:
#   1. linear  - linear kernel, for linearly separable data
#   2. poly    - polynomial kernel; maps data from a low- to a high-dimensional
#                space, but has many parameters and a high computational cost
#   3. rbf     - Gaussian kernel (the default); maps samples to a
#                high-dimensional space with few parameters and good performance
#   4. sigmoid - sigmoid kernel; with it the SVM implements a mapping similar
#                to a multilayer neural network
# C     - penalty coefficient of the objective function
# gamma - kernel coefficient; 'auto' means 1 / n_features
# Train on the standardized data
model.fit(train_X, train_y)
# 6. Model evaluation
pred = model.predict(test_X)
print('Accuracy:', accuracy_score(test_y, pred))