Machine Learning in Practice - Ensemble Learning - 23
2022-07-28 12:49:00 【gemoumou】
Ensemble Learning - Predicting the Survival of Titanic Passengers

import pandas
# Load the Titanic training data
titanic = pandas.read_csv("titanic_train.csv")
titanic

# Fill missing Age values with the median of the Age column
titanic["Age"] = titanic["Age"].fillna(titanic["Age"].median())
print(titanic.describe())


print(titanic["Sex"].unique())
# Map male to 0 and female to 1
titanic.loc[titanic["Sex"] == "male", "Sex"] = 0
titanic.loc[titanic["Sex"] == "female", "Sex"] = 1


print(titanic["Embarked"].unique())
# Fill missing Embarked values with 'S'
titanic["Embarked"] = titanic["Embarked"].fillna('S')
# Encode the embarkation ports as numbers
titanic.loc[titanic["Embarked"] == "S", "Embarked"] = 0
titanic.loc[titanic["Embarked"] == "C", "Embarked"] = 1
titanic.loc[titanic["Embarked"] == "Q", "Embarked"] = 2


from sklearn.preprocessing import StandardScaler
# Selected features
predictors = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
x_data = titanic[predictors]
y_data = titanic["Survived"]
# Data standardization
scaler = StandardScaler()
x_data = scaler.fit_transform(x_data)
Logistic regression
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
# Logistic regression model
LR = LogisticRegression()
# Compute the cross-validation accuracy scores
scores = model_selection.cross_val_score(LR, x_data, y_data, cv=3)
# Average the fold scores
print(scores.mean())
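Note that the scaler above was fit on the full dataset before cross-validation, so information from the validation folds leaks into the scaling statistics. A leak-free variant (a sketch, reusing the encoded titanic frame from above) wraps the scaler and the model in a Pipeline so the scaler is refit on each training fold:

from sklearn.pipeline import make_pipeline
# The scaler is refit on the training folds only within each CV split
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = model_selection.cross_val_score(pipe, titanic[predictors], y_data, cv=3)
print(scores.mean())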

Neural network
from sklearn.neural_network import MLPClassifier
# Build an MLP with two hidden layers (20 and 10 units)
mlp = MLPClassifier(hidden_layer_sizes=(20,10),max_iter=1000)
# Compute the cross-validation accuracy scores
scores = model_selection.cross_val_score(mlp, x_data, y_data, cv=3)
# Average the fold scores
print(scores.mean())

KNN
from sklearn import neighbors
# KNN with n_neighbors=21
knn = neighbors.KNeighborsClassifier(21)
# Compute the cross-validation accuracy scores
scores = model_selection.cross_val_score(knn, x_data, y_data, cv=3)
# Average the fold scores
print(scores.mean())

Decision tree
from sklearn import tree
# Decision tree model
dtree = tree.DecisionTreeClassifier(max_depth=5, min_samples_split=4)
# Compute the cross-validation accuracy scores
scores = model_selection.cross_val_score(dtree, x_data, y_data, cv=3)
# Average the fold scores
print(scores.mean())

Random forests
# Random forests
from sklearn.ensemble import RandomForestClassifier
RF1 = RandomForestClassifier(random_state=1, n_estimators=10, min_samples_split=2)
# Compute the cross-validation accuracy scores
scores = model_selection.cross_val_score(RF1, x_data, y_data, cv=3)
# Average the fold scores
print(scores.mean())

RF2 = RandomForestClassifier(n_estimators=100, min_samples_split=4)
# Compute the cross-validation accuracy scores
scores = model_selection.cross_val_score(RF2, x_data, y_data, cv=3)
# Average the fold scores
print(scores.mean())

Bagging
from sklearn.ensemble import BaggingClassifier
# Bagging with the random forest RF2 as the base estimator
bagging_clf = BaggingClassifier(RF2, n_estimators=20)
# Compute the cross-validation accuracy scores
scores = model_selection.cross_val_score(bagging_clf, x_data, y_data, cv=3)
# Average the fold scores
print(scores.mean())

AdaBoost
from sklearn.ensemble import AdaBoostClassifier
# AdaBoost with the bagging classifier as the base estimator
adaboost = AdaBoostClassifier(bagging_clf, n_estimators=10)
# Compute the cross-validation accuracy scores
scores = model_selection.cross_val_score(adaboost, x_data, y_data, cv=3)
# Average the fold scores
print(scores.mean())
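Boosting a 20-model bagging ensemble is computationally heavy; AdaBoost is more commonly run over shallow decision trees. A lighter-weight variant for comparison (a sketch, not part of the original comparison):

# AdaBoost with its default shallow decision tree base estimator
adaboost_tree = AdaBoostClassifier(n_estimators=100)
scores = model_selection.cross_val_score(adaboost_tree, x_data, y_data, cv=3)
print(scores.mean())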

Stacking
from sklearn.ensemble import VotingClassifier
from mlxtend.classifier import StackingClassifier
sclf = StackingClassifier(classifiers=[bagging_clf, mlp, LR],
                          meta_classifier=LogisticRegression())
# A hard-voting ensemble over the individual models, for comparison
sclf2 = VotingClassifier([('adaboost', adaboost), ('mlp', mlp), ('LR', LR), ('knn', knn), ('dtree', dtree)])
# Compute the cross-validation accuracy scores
scores = model_selection.cross_val_score(sclf2, x_data, y_data, cv=3)
# Average the fold scores
print(scores.mean())
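The stacking model sclf is built above but only the voting ensemble is scored; it can be cross-validated the same way (a minimal sketch using the same data):

# Cross-validate the mlxtend stacking model as well
scores = model_selection.cross_val_score(sclf, x_data, y_data, cv=3)
print(scores.mean())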

Ensemble Learning - Breast Cancer Prediction

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
df = pd.read_csv("data.csv")
df.head()


df = df.drop('id', axis=1)
df.diagnosis.unique()

# Map malignant (M) to 1 and benign (B) to 0
df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})
df.head()

df.describe()

# Draw a heatmap; each cell is the correlation coefficient between two variables
plt.figure(figsize=(20,20))
p = sns.heatmap(df.corr(), annot=True, square=True)
plt.show()

# View label distribution
print(df.diagnosis.value_counts())
# Plot the label counts as a bar chart
p = df.diagnosis.value_counts().plot(kind="bar")
plt.show()

# Get the training data and labels
x_data = df.drop(['diagnosis'], axis=1)
y_data = df['diagnosis']
from sklearn.model_selection import train_test_split
# Split the dataset. stratify=y_data keeps the class ratio in the train and test
# sets the same as in the full dataset: for example, if 0s and 1s appear in a 1:2
# ratio before the split, y_train and y_test will also contain them in a 1:2 ratio.
x_train,x_test,y_train,y_test = train_test_split(x_data, y_data, test_size=0.3, stratify=y_data)
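As a quick sanity check of the stratified split (a sketch), the class proportions should be nearly identical in the full set and in both splits:

# Class proportions before and after the stratified split
print(y_data.value_counts(normalize=True))
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))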
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, BaggingClassifier
classifiers = [
    KNeighborsClassifier(3),
    LogisticRegression(),
    MLPClassifier(hidden_layer_sizes=(20,50), max_iter=10000),
    DecisionTreeClassifier(),
    RandomForestClassifier(max_depth=9, min_samples_split=3),
    AdaBoostClassifier(),
    BaggingClassifier(),
]
log = []
for clf in classifiers:
    clf.fit(x_train, y_train)
    name = clf.__class__.__name__
    print("="*30)
    print(name)
    print('****Results****')
    test_predictions = clf.predict(x_test)
    acc = accuracy_score(y_test, test_predictions)
    print("Accuracy: {:.4%}".format(acc))
    log.append([name, acc*100])
print("="*30)

log = pd.DataFrame(log)
log

log.rename(columns={0: 'Classifier', 1: 'Accuracy'}, inplace=True)
sns.barplot(x='Accuracy', y='Classifier', data=log, color="b")
plt.xlabel('Accuracy %')
plt.title('Classifier Accuracy')
plt.show()
