Machine learning practice - logistic regression-19
2022-07-28 12:49:00 【gemoumou】
Machine learning practice - Logistic regression - User churn prediction



import numpy as np
# Read every field as a string; np.str was removed from NumPy, so use the built-in str
train_data = np.genfromtxt('Churn-Modelling.csv',delimiter=',',dtype=str)
test_data = np.genfromtxt('Churn-Modelling-Test-Data.csv',delimiter=',',dtype=str)
x_train = train_data[1:,:-1]
y_train = train_data[1:,-1].astype(int)
x_test = test_data[1:,:-1]
y_test = test_data[1:,-1].astype(int)
x_train = np.delete(x_train,[0,1,2],axis=1)
x_test = np.delete(x_test,[0,1,2],axis=1)
x_train[:5]

y_train[:5]

# x_train[x_train=='Female'] = 0
# x_train[x_train=='Male'] = 1
from sklearn.preprocessing import LabelEncoder
labelencoder1 = LabelEncoder()
x_train[:,1] = labelencoder1.fit_transform(x_train[:,1])
x_test[:,1] = labelencoder1.transform(x_test[:,1])
labelencoder2 = LabelEncoder()
x_train[:,2] = labelencoder2.fit_transform(x_train[:,2])
x_test[:,2] = labelencoder2.transform(x_test[:,2])
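
The encoding step above fits each LabelEncoder on the training column and reuses the same mapping on the test column. A minimal sketch of that behavior on a toy column follows; the category values are hypothetical and are not taken from the CSV files.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column; the real CSV values may differ
train_col = np.array(['France', 'Spain', 'Germany', 'France'])
test_col = np.array(['Germany', 'France'])

encoder = LabelEncoder()
# fit_transform learns the sorted unique classes and maps each to an integer
train_codes = encoder.fit_transform(train_col)
# transform reuses the mapping learned from the training column
test_codes = encoder.transform(test_col)

print(encoder.classes_)
print(train_codes, test_codes)
```

Calling transform on the test data, rather than fit_transform, keeps the integer codes consistent between the two sets.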

x_train = x_train.astype(np.float32)
x_test = x_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
# Churn is a binary label, so fit a logistic regression classifier
LR = LogisticRegression()
LR.fit(x_train,y_train)
predictions = LR.predict(x_test)
print(classification_report(y_test, predictions))

Machine learning practice - Logistic regression - Diabetes prediction model


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
diabetes_data = pd.read_csv('diabetes.csv')
diabetes_data.head()

# Column types and non-null counts
diabetes_data.info(verbose=True)

# Summary statistics
diabetes_data.describe()

# Data shape
diabetes_data.shape

# View label distribution
print(diabetes_data.Outcome.value_counts())
# Plot the number of samples per label as a bar chart
p=diabetes_data.Outcome.value_counts().plot(kind="bar")
plt.show()

# Visualizing data distribution
p=sns.pairplot(diabetes_data, hue = 'Outcome')
plt.show()

Two kinds of plots are drawn here: histograms and scatter plots. A histogram shows the distribution of a single feature, while a scatter plot compares two different features and shows the relationship between them. Observing the data distribution reveals some outliers: Glucose, BloodPressure, SkinThickness, Insulin, and BMI (body mass index) should never be 0, so a 0 in these columns is treated as a missing value.
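
The impossible zeros can also be counted directly instead of being read off the plots. The sketch below does this on a small hypothetical DataFrame that stands in for diabetes.csv; the rows and the name `suspect` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy rows standing in for diabetes.csv
toy = pd.DataFrame({
    'Glucose': [148, 0, 183],
    'BloodPressure': [72, 66, 0],
    'BMI': [33.6, 0.0, 23.3],
    'Outcome': [1, 0, 1],
})

# Columns where a value of 0 is physiologically implausible
suspect = ['Glucose', 'BloodPressure', 'BMI']

# Comparing with 0 gives a boolean frame; summing counts the zeros per column
zero_counts = (toy[suspect] == 0).sum()
print(zero_counts)
```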
# Replace 0 with NaN in Glucose, BloodPressure, SkinThickness, Insulin, and BMI
colume = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
diabetes_data[colume] = diabetes_data[colume].replace(0,np.nan)
# pip install missingno
import missingno as msno
p=msno.bar(diabetes_data)
plt.show()

# Set the threshold: a column is kept only if at least 80% of its values are present
thresh_count = diabetes_data.shape[0]*0.8
# Drop any column in which more than 20% of the values are missing
diabetes_data = diabetes_data.dropna(thresh=thresh_count, axis=1)
p=msno.bar(diabetes_data)
plt.show()
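
The `thresh` argument of `dropna` is the minimum number of non-missing values a column needs in order to be kept. A minimal sketch on a hypothetical five-row frame follows; the column names A, B, and C are placeholders, and the threshold is cast to int here only to keep the toy example unambiguous.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with different amounts of missing data per column
toy = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0, 5.0],           # 5 non-missing values
    'B': [1.0, np.nan, 3.0, 4.0, 5.0],        # 4 non-missing values
    'C': [np.nan, np.nan, 3.0, np.nan, 5.0],  # 2 non-missing values
})

# Require at least 80% of the rows to be present: 5 * 0.8 = 4
thresh_count = int(toy.shape[0] * 0.8)

# axis=1 drops columns; a column survives only with >= thresh_count non-missing values
kept = toy.dropna(thresh=thresh_count, axis=1)
print(list(kept.columns))
```

Column B has exactly 20% missing and is kept, while column C falls below the threshold and is dropped.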

# Import the imputer
from sklearn.impute import SimpleImputer
# For missing values in numeric variables, fill them in with the column mean
imr = SimpleImputer(missing_values=np.nan, strategy='mean')
colume = ['Glucose', 'BloodPressure', 'BMI']
# Impute the missing values
diabetes_data[colume] = imr.fit_transform(diabetes_data[colume])
p=msno.bar(diabetes_data)
plt.show()
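
Mean imputation replaces each missing entry with the mean of the observed values in the same column. The sketch below shows this on a hypothetical 3x2 array, assuming a scikit-learn version that provides `sklearn.impute.SimpleImputer`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy matrix with one missing value in each column
toy = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# strategy='mean' fills each NaN with the mean of its own column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
filled = imputer.fit_transform(toy)

# Column 0 mean is (1 + 3) / 2 = 2, column 1 mean is (10 + 20) / 2 = 15
print(filled)
```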

plt.figure(figsize=(12,10))
# Draw a heat map; each cell is the correlation coefficient between two variables
p=sns.heatmap(diabetes_data.corr(), annot=True)
plt.show()

# Split the data into features x and labels y
x = diabetes_data.drop("Outcome",axis = 1)
y = diabetes_data.Outcome
from sklearn.model_selection import train_test_split
# Split the data set; stratify=y keeps the class proportions in the training and test sets the same as in y before the split
# For example, if 0 and 1 appear in y at a ratio of 1:2 before the split, y_train and y_test also contain 0 and 1 at 1:2
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3, stratify=y)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
LR = LogisticRegression()
LR.fit(x_train,y_train)
predictions = LR.predict(x_test)
print(classification_report(y_test, predictions))
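
The effect of `stratify=y` can be checked on synthetic labels. The following is a minimal sketch with a hypothetical 1:2 label ratio; the `toy_` names and `random_state=0` are assumptions added for illustration and reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 30 samples of class 0 and 60 of class 1 (ratio 1:2)
toy_x = np.arange(90).reshape(-1, 1)
toy_y = np.array([0] * 30 + [1] * 60)

# stratify=toy_y asks for the same class proportions in both splits
tx_train, tx_test, ty_train, ty_test = train_test_split(
    toy_x, toy_y, test_size=0.3, stratify=toy_y, random_state=0
)

# Count the samples of each class in the two splits
train_counts = np.bincount(ty_train)
test_counts = np.bincount(ty_test)
print(train_counts, test_counts)
```

Both splits keep the 1:2 ratio of the full label vector, which matters for the diabetes data because the two outcomes are not equally frequent.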
