PyTorch framework exercise (based on the Kaggle Titanic competition)
2022-07-25 15:41:00 【whut_ L】
Contents
1 Data preprocessing
2 Building an FCNN with the PyTorch framework
3 Model testing and generating the submission data
1 Data preprocessing
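The preprocessing in this section fills missing values (mean for numeric columns, mode for `Embarked`) and maps the categorical columns to integers. The following is a minimal sketch of those two steps on a toy frame. The values are made up for illustration, and `Series.map` is used here as an assumed alternative to the `.loc` assignments in the listing; it is not the author's code.

```python
import pandas as pd

# Toy frame with hypothetical values; only the column names mirror the Titanic CSV.
toy = pd.DataFrame({
    "Age": [22.0, None, 30.0, 38.0],
    "Sex": ["male", "female", "male", "female"],
    "Embarked": ["S", None, "C", "S"],
})

# Mean imputation for a numeric column.
toy["Age"] = toy["Age"].fillna(toy["Age"].mean())

# Mode imputation for a categorical column, then integer encoding.
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])
toy["Embarked"] = toy["Embarked"].map({"S": 0, "C": 1, "Q": 2})
toy["Sex"] = toy["Sex"].map({"male": 0, "female": 1})

print(toy)
```

With `map`, any category missing from the dictionary becomes NaN, so it is worth checking the result for missing values before saving.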
## Training set and test set data preprocessing
# Import third-party library
import numpy as np
import pandas as pd
import math
# Data preprocessing function
def Data_Processing(Input_Dataset, Dataset_Type):
    Dataset = pd.read_csv(Input_Dataset)  # Load the dataset
    print(Dataset.describe())  # Summary statistics
    print('The Feature of the Dataset is:', Dataset.head(n = 0))  # Column names only (no rows)
    print('The shape of Dataset is:', Dataset.shape)  # Dataset size
    # Age: fill missing values
    Dataset['Age'] = Dataset['Age'].fillna(Dataset['Age'].mean())  # Mean imputation
    # Fare: fill missing values
    Dataset['Fare'] = Dataset['Fare'].fillna(Dataset['Fare'].mean())  # Mean imputation
    # Embarked: fill missing values and map to integers
    #print(Dataset['Embarked'].unique())  # Possible values of 'Embarked': ['S' 'C' 'Q' nan]
    #print(Dataset['Embarked'].mode())  # Mode of the 'Embarked' column: 'S'
    Mode = Dataset['Embarked'].mode()  # Most frequent value
    Dataset['Embarked'] = Dataset['Embarked'].fillna(Mode[0])  # Mode imputation
    Dataset.loc[Dataset['Embarked'] == 'S', 'Embarked'] = 0  # Mapping: 'S' -> 0
    Dataset.loc[Dataset['Embarked'] == 'C', 'Embarked'] = 1  # Mapping: 'C' -> 1
    Dataset.loc[Dataset['Embarked'] == 'Q', 'Embarked'] = 2  # Mapping: 'Q' -> 2
    # Sex mapping: 'male' -> 0, 'female' -> 1
    Dataset.loc[Dataset['Sex'] == 'male', 'Sex'] = 0  # loc selects by label: the first argument is the row selector, the second is the column
    Dataset.loc[Dataset['Sex'] == 'female', 'Sex'] = 1
    if Dataset_Type == 0:  # The dataset is the training set
        Features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked', 'Survived']  # Features selected for the training set
        train_Dataset = Dataset[Features]
        train_Dataset.to_csv('Train_Dataset1.csv', index = False)  # Save without the index
    elif Dataset_Type == 1:  # The dataset is the test set
        Features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']  # Features selected for the test set
        Test_Dataset = Dataset[Features]
        Test_Dataset.to_csv('Test_Dataset1.csv', index = False)  # Save without the index
Input_Dataset = 'train.csv'  # Process the training set
Data_Processing(Input_Dataset, 0)  # 0 denotes the training set
Input_Dataset = 'test.csv'  # Process the test set
Data_Processing(Input_Dataset, 1)  # 1 denotes the test set
2 Building an FCNN with the PyTorch framework
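The network in this section maps the 7 input features to a single survival probability through linear layers of sizes 7 -> 5 -> 3 -> 1, each followed by a sigmoid. As a rough shape check, the same stack can be sketched with `torch.nn.Sequential`. The batch and labels are random stand-ins, and `net` is a hypothetical name, not the `FCNN_Model` class from the listing.

```python
import torch

# Same layer sizes as FCNN_Model, written as a Sequential stack for brevity.
net = torch.nn.Sequential(
    torch.nn.Linear(7, 5), torch.nn.Sigmoid(),
    torch.nn.Linear(5, 3), torch.nn.Sigmoid(),
    torch.nn.Linear(3, 1), torch.nn.Sigmoid(),
)

batch = torch.rand(4, 7)  # Hypothetical batch: 4 passengers, 7 features
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])  # Hypothetical labels

probs = net(batch)  # Shape (4, 1), each value in (0, 1)
loss = torch.nn.BCELoss(reduction="mean")(probs, labels)

print(probs.shape, loss.item())
```

Because the last layer is a sigmoid, the outputs can be passed directly to `BCELoss`, which expects probabilities and float labels of the same shape.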
## Training the model with the PyTorch framework
# Import third-party library
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
# Training set wrapper
class Titanic_Train_Dataset(Dataset):
    def __init__(self, filepath):
        xy = np.loadtxt(filepath, delimiter = ',', dtype = np.float32, skiprows = 1)  # File name, delimiter, data type; skip the header row
        self.len = xy.shape[0]  # Number of samples
        self.x_data = torch.from_numpy(xy[:, :-1])  # The seven feature columns
        self.y_data = torch.from_numpy(xy[:, [-1]])  # The last column holds the labels

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]  # Return the sample at the given index

    def __len__(self):
        return self.len  # Return the number of samples
# Fully connected neural network
class FCNN_Model(torch.nn.Module):
    def __init__(self):
        super(FCNN_Model, self).__init__()
        self.linear1 = torch.nn.Linear(7, 5)
        self.linear2 = torch.nn.Linear(5, 3)
        self.linear3 = torch.nn.Linear(3, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))  # Sigmoid activation after each linear layer
        x = self.sigmoid(self.linear2(x))
        x = self.sigmoid(self.linear3(x))
        return x
def Mode_Train(Epoch, Epoch_loss):  # Model training
    for epoch in range(Epoch):
        for i, data in enumerate(train_loader, 0):
            # 1. Prepare data
            inputs, labels = data  # Feature columns, label column
            # 2. Forward pass
            y_pred = model(inputs)
            #print(y_pred)
            loss = criterion(y_pred, labels)  # Compute the loss
            Epoch_loss[epoch] += loss.item()  # Accumulate the loss within the same epoch
            # 3. Backward pass
            optimizer.zero_grad()  # Zero the gradients
            loss.backward()  # Compute the gradients
            # 4. Update
            optimizer.step()  # Update the parameters
        #print(epoch, Epoch_loss[epoch])  # Accumulated loss of each epoch
    return Epoch_loss
dataset = Titanic_Train_Dataset('Train_Dataset1.csv')  # Load the dataset
train_loader = DataLoader(dataset = dataset, batch_size = 64, shuffle = True, num_workers = 0)  # Dataset, batch size, whether to shuffle, number of worker processes
model = FCNN_Model()  # Define the model
criterion = torch.nn.BCELoss(reduction = 'mean')  # Loss function (size_average is deprecated; reduction = 'mean' is the equivalent)
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01, momentum = 0.5)  # Build the optimizer: lr is the learning rate, momentum is the momentum factor
Epoch = 2000  # Number of epochs
Epoch_loss = np.zeros([Epoch, 1])  # Initialize the loss of each epoch
Epoch_loss = Mode_Train(Epoch, Epoch_loss)
plt.plot(Epoch_loss)
plt.show()
3 Model testing and generating the submission data
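The testing step in this section turns each predicted probability into a survival label by thresholding at 0.5. As a compact illustration, the same rule can be written as one vectorized NumPy expression. The probabilities here are made-up values, and this is a sketch of the rule rather than the author's `State` function.

```python
import numpy as np

# Hypothetical model outputs, shaped (n_samples, 1) like the network's predictions.
probs = np.array([[0.91], [0.12], [0.50], [0.73]], dtype=np.float32)

# Strictly greater than 0.5 means survived (1); everything else means died (0).
labels = (probs > 0.5).astype(int)

print(labels.ravel())  # [1 0 0 1]
```

A probability of exactly 0.5 maps to 0 here, which matches the `else` branch of the loop-based version.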
## Apply the trained model to the test set
# Define the test function
def Mode_Test(test_loader):
    with torch.no_grad():  # Do not track gradients during inference
        for data in test_loader:
            outputs = model(data)
    return outputs  # Outputs of the last batch; the loader below uses a single batch

# State function: convert each passenger's probability to 'survived' or 'died'
def State(Labels):
    for i, label in enumerate(Labels, 0):
        if label > 0.5:  # Probability greater than 0.5 means survived
            Labels[i] = 1
        else:  # Probability of 0.5 or less means died
            Labels[i] = 0
    return Labels  # Return the converted result
test_dataset = np.loadtxt('Test_Dataset1.csv', delimiter = ',', dtype = np.float32, skiprows = 1)  # Load the test set
print(test_dataset)
print('The Shape of test_dataset is:', np.shape(test_dataset))
test_loader = DataLoader(test_dataset, shuffle = False, batch_size = 418)  # One batch holding all 418 test rows
#print(np.size(test_dataset, 0))
Labels = Mode_Test(test_loader)  # Predicted probabilities
Labels = Labels.numpy()  # Convert to a NumPy array
Labels = State(Labels)  # Threshold into 0/1 states
Labels = Labels.astype(int)  # Convert to integers
print(Labels)  # Print the final result
## Fill in the submission template with the final predictions
import pandas as pd  # Needed if this part runs separately from the preprocessing script
Submission = pd.read_csv('gender_submission.csv')  # Load the submission template
Submission['Survived'] = Labels  # Write the predictions into the template
Submission.to_csv('Submission1.csv', index = False)  # Generate the submission file
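`Mode_Test` in this section returns the outputs of the last batch only, which covers the whole test set because `batch_size = 418` matches the number of test rows. If a smaller batch size were used, the per-batch outputs would need to be collected. The following sketch shows one way to do that, assuming a hypothetical helper `predict_all`, a stand-in model, and random features in place of the trained network and the real test CSV.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

def predict_all(model, loader):
    """Run the model over every batch and concatenate the outputs in order."""
    model.eval()
    chunks = []
    with torch.no_grad():  # Inference only, no gradient tracking
        for batch in loader:
            chunks.append(model(batch))
    return torch.cat(chunks, dim=0)

# Stand-in model and random features (hypothetical, for illustration only).
stand_in = torch.nn.Sequential(torch.nn.Linear(7, 1), torch.nn.Sigmoid())
features = np.random.rand(10, 7).astype(np.float32)

loader = DataLoader(features, batch_size=4, shuffle=False)  # Batches of 4, 4 and 2 rows
outputs = predict_all(stand_in, loader)

print(outputs.shape)  # torch.Size([10, 1])
```

Keeping `shuffle=False` preserves the row order, so the predictions still line up with the `PassengerId` column of the submission template.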