Machine Learning in Action: Ensemble Learning (23)
2022-07-28 12:49:00 [gemoumou]
Ensemble learning: predicting Titanic passenger survival

import pandas
# Load the Titanic training set
titanic = pandas.read_csv("titanic_train.csv")
titanic

# Fill the missing Age values with the median of the Age column
titanic["Age"] = titanic["Age"].fillna(titanic["Age"].median())
print(titanic.describe())
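Only Age is filled here; a quick check (not in the original post) shows what is still missing after the fill. In the standard Kaggle training file, Cabin and Embarked still contain gaps:

# Count the remaining missing values per column
print(titanic.isnull().sum())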


print(titanic["Sex"].unique())
# Map male to 0 and female to 1
titanic.loc[titanic["Sex"] == "male", "Sex"] = 0
titanic.loc[titanic["Sex"] == "female", "Sex"] = 1


print(titanic["Embarked"].unique())
# Fill the missing Embarked values with 'S' (the most frequent port)
titanic["Embarked"] = titanic["Embarked"].fillna('S')
# Convert the categories into numbers
titanic.loc[titanic["Embarked"] == "S", "Embarked"] = 0
titanic.loc[titanic["Embarked"] == "C", "Embarked"] = 1
titanic.loc[titanic["Embarked"] == "Q", "Embarked"] = 2


from sklearn.preprocessing import StandardScaler
# Select the feature columns
predictors = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
x_data = titanic[predictors]
y_data = titanic["Survived"]
# Standardize the features to zero mean and unit variance
scaler = StandardScaler()
x_data = scaler.fit_transform(x_data)
Logistic regression
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
# Logistic regression model
LR = LogisticRegression()
# Compute 3-fold cross-validation accuracy scores
scores = model_selection.cross_val_score(LR, x_data, y_data, cv=3)
# Print the mean accuracy
print(scores.mean())

Neural network
from sklearn.neural_network import MLPClassifier
# Build an MLP with two hidden layers (20 and 10 units)
mlp = MLPClassifier(hidden_layer_sizes=(20,10), max_iter=1000)
# Compute 3-fold cross-validation accuracy scores
scores = model_selection.cross_val_score(mlp, x_data, y_data, cv=3)
# Print the mean accuracy
print(scores.mean())

KNN
from sklearn import neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors=21)
# Compute 3-fold cross-validation accuracy scores
scores = model_selection.cross_val_score(knn, x_data, y_data, cv=3)
# Print the mean accuracy
print(scores.mean())
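The value k=21 is taken as given above. A small sketch of how such a value could be chosen, scanning odd k with the same 3-fold cross-validation (the scan range 1 to 39 is an assumption):

# Pick k by mean 3-fold cross-validation accuracy over odd values
best_k, best_score = 0, 0
for k in range(1, 40, 2):
    s = model_selection.cross_val_score(
        neighbors.KNeighborsClassifier(n_neighbors=k), x_data, y_data, cv=3).mean()
    if s > best_score:
        best_k, best_score = k, s
print(best_k, best_score)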

Decision tree
from sklearn import tree
# Decision tree model
dtree = tree.DecisionTreeClassifier(max_depth=5, min_samples_split=4)
# Compute 3-fold cross-validation accuracy scores
scores = model_selection.cross_val_score(dtree, x_data, y_data, cv=3)
# Print the mean accuracy
print(scores.mean())

Random forest
from sklearn.ensemble import RandomForestClassifier
# A small forest: 10 trees, fixed random seed for reproducibility
RF1 = RandomForestClassifier(random_state=1, n_estimators=10, min_samples_split=2)
# Compute 3-fold cross-validation accuracy scores
scores = model_selection.cross_val_score(RF1, x_data, y_data, cv=3)
# Print the mean accuracy
print(scores.mean())

# A larger forest: 100 trees
RF2 = RandomForestClassifier(n_estimators=100, min_samples_split=4)
# Compute 3-fold cross-validation accuracy scores
scores = model_selection.cross_val_score(RF2, x_data, y_data, cv=3)
# Print the mean accuracy
print(scores.mean())
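Fitted random forests also expose feature_importances_, which shows which predictors drive the model; a short inspection sketch (fitting RF2 on the full standardized data just for this purpose):

# Pair each predictor with its importance, highest first
RF2.fit(x_data, y_data)
for name, imp in sorted(zip(predictors, RF2.feature_importances_), key=lambda t: -t[1]):
    print("{:<10s} {:.3f}".format(name, imp))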

Bagging
from sklearn.ensemble import BaggingClassifier
# Bagging: train 20 bootstrap copies of the RF2 forest and aggregate their votes
bagging_clf = BaggingClassifier(RF2, n_estimators=20)
# Compute 3-fold cross-validation accuracy scores
scores = model_selection.cross_val_score(bagging_clf, x_data, y_data, cv=3)
# Print the mean accuracy
print(scores.mean())

AdaBoost
from sklearn.ensemble import AdaBoostClassifier
# AdaBoost with the bagging classifier above as the base estimator
adaboost = AdaBoostClassifier(bagging_clf, n_estimators=10)
# Compute 3-fold cross-validation accuracy scores
scores = model_selection.cross_val_score(adaboost, x_data, y_data, cv=3)
# Print the mean accuracy
print(scores.mean())

Stacking
from sklearn.ensemble import VotingClassifier
from mlxtend.classifier import StackingClassifier
# Stacking: base classifiers feed their predictions into a logistic-regression meta-classifier
sclf = StackingClassifier(classifiers=[bagging_clf, mlp, LR],
                          meta_classifier=LogisticRegression())
# Voting: majority vote over five heterogeneous classifiers
sclf2 = VotingClassifier([('adaboost', adaboost), ('mlp', mlp), ('LR', LR), ('knn', knn), ('dtree', dtree)])
# Compute 3-fold cross-validation accuracy scores for the voting ensemble
scores = model_selection.cross_val_score(sclf2, x_data, y_data, cv=3)
# Print the mean accuracy
print(scores.mean())
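The stacking classifier sclf defined above is never actually scored in the original post. Since mlxtend's StackingClassifier follows the scikit-learn estimator API, it can be passed to cross_val_score directly; a minimal sketch:

# Score the stacking ensemble with the same 3-fold cross-validation
scores = model_selection.cross_val_score(sclf, x_data, y_data, cv=3)
print(scores.mean())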

Ensemble learning: breast cancer prediction

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
# Load the breast cancer data set
df = pd.read_csv("data.csv")
df.head()


# Drop the id column, which carries no predictive information
df = df.drop('id', axis=1)
df.diagnosis.unique()

# Map the diagnosis labels to numbers: malignant (M) to 1, benign (B) to 0
df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})
df.head()

df.describe()

# Draw a heat map; each cell is the correlation coefficient between two variables
plt.figure(figsize=(20,20))
p = sns.heatmap(df.corr(), annot=True, square=True)
plt.show()
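Many of the 30 features are strongly correlated (the mean/worst variants of the same measurement, for example), which the heat map makes visible. A minimal sketch that lists column pairs whose absolute correlation exceeds 0.9; the threshold is an assumption, not from the original post:

# List column pairs with |correlation| > 0.9
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
pairs = upper.stack()
print(pairs[pairs > 0.9].sort_values(ascending=False))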

# View label distribution
print(df.diagnosis.value_counts())
# Plot the label counts as a bar chart
p = df.diagnosis.value_counts().plot(kind="bar")
plt.show()

# Get the feature data and labels
x_data = df.drop(['diagnosis'], axis=1)
y_data = df['diagnosis']
from sklearn.model_selection import train_test_split
# Split the data set. stratify=y_data keeps the class proportions in the train and test
# sets the same as in y_data before the split: if y_data contains 0s and 1s in a 1:2
# ratio, y_train and y_test will also contain 0s and 1s in a 1:2 ratio.
x_train,x_test,y_train,y_test = train_test_split(x_data, y_data, test_size=0.3, stratify=y_data)
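A quick check (not in the original walkthrough) confirms the stratified split preserved the class balance:

# These three proportions should be nearly identical
print(y_data.value_counts(normalize=True))
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))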
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, BaggingClassifier
classifiers = [
    KNeighborsClassifier(3),
    LogisticRegression(),
    MLPClassifier(hidden_layer_sizes=(20,50), max_iter=10000),
    DecisionTreeClassifier(),
    RandomForestClassifier(max_depth=9, min_samples_split=3),
    AdaBoostClassifier(),
    BaggingClassifier(),
]
log = []
for clf in classifiers:
    clf.fit(x_train, y_train)
    name = clf.__class__.__name__
    print("="*30)
    print(name)
    print('****Results****')
    test_predictions = clf.predict(x_test)
    acc = accuracy_score(y_test, test_predictions)
    print("Accuracy: {:.4%}".format(acc))
    log.append([name, acc*100])
print("="*30)

log = pd.DataFrame(log)
log

log.rename(columns={0: 'Classifier', 1: 'Accuracy'}, inplace=True)
sns.barplot(x='Accuracy', y='Classifier', data=log, color="b")
plt.xlabel('Accuracy %')
plt.title('Classifier Accuracy')
plt.show()
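The ranking above depends on one particular 70/30 split. A hedged sketch that ranks the same classifiers by mean 5-fold cross-validation accuracy instead, which is less sensitive to any single split (cv=5 is an arbitrary choice):

from sklearn.model_selection import cross_val_score
# Rank classifiers by mean 5-fold cross-validation accuracy on the full data
for clf in classifiers:
    scores = cross_val_score(clf, x_data, y_data, cv=5)
    print("{:<25s} {:.4f}".format(clf.__class__.__name__, scores.mean()))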
