Dive into Deep Learning (PyTorch edition) exercise answers: 2.2 Preliminaries / Data Preprocessing
2022-07-03 10:20:00 【Innocent^_^】
I completed these exercises in Jupyter Notebook. They took a little longer than expected because the material was new to me: checking whether a value is a number, checking whether a value is missing, and deleting specified columns. I hope this serves as a reference for others who are new to the book. The overall output is shown first, followed by the code.

Here is the code:
import os
# Create the practice directory and write a small CSV file with some missing (NA) entries
os.makedirs(os.path.join('..', 'practice'), exist_ok=True)
practice_file = os.path.join('..', 'practice', 'student_scores.csv')
with open(practice_file, 'w') as f:
    f.write('stu_num,stu_name,stu_course,stu_score\n')
    f.write('1,Lily,English,100\n')
    f.write('1,Lily,Physics,80\n')
    f.write('2,NA,Computer Science,90\n')
    f.write('2,NA,Database,88\n')
    f.write('NA,John,Math,99\n')
    f.write('4,Lisa,NA,100\n')
    f.write('5,NA,French,50\n')
    f.write('6,GOGO,NA,10\n')
# Delete the column with the most missing values
import pandas as pd
practice_data = pd.read_csv(practice_file)
print(practice_data)
# Get the number of rows and the number of columns
row_len, column_len = practice_data.shape
# Count the NaN values in each column
nan_sum = [0 for i in range(column_len)]
print("Number of columns: {}, number of rows: {}, NaN counts per column: {}".format(column_len, row_len, nan_sum))
import math
import numbers
for i in range(column_len):
    for j in range(row_len):
        # A missing value can appear as a float NaN or as the string 'NaN', so check each case separately
        # Note: iloc[j, i] is row j+1, column i+1
        if isinstance(practice_data.iloc[j, i], numbers.Number):
            if math.isnan(practice_data.iloc[j, i]):
                nan_sum[i] += 1
        else:
            if practice_data.iloc[j, i] == 'NaN':
                nan_sum[i] += 1
max_index = []  # A list, because several columns may tie for the most NaN values
most_nan = max(nan_sum)
for i, j in enumerate(nan_sum):
    if j == most_nan:
        max_index.append(practice_data.columns.values[i])
print("Column(s) with the most missing values: {}".format(max_index))
print("Before deleting the column with the most missing values:")
print(practice_data)
# Delete the column(s) with the most missing values
# drop() takes the column names and returns a new DataFrame, so remember to reassign the result
practice_data = practice_data.drop(max_index, axis=1)
print("After deleting the column with the most missing values:")
print(practice_data)
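# Convert the preprocessed dataset to a tensor
practice_inputs = practice_data.iloc[:, :]
practice_inputs = pd.get_dummies(practice_inputs, dummy_na=True)
print(practice_inputs)
import torch
# Cast to float first: pandas 2.0 and later return boolean dummy columns,
# and mixing them with numeric columns would otherwise give an object array
Z = torch.tensor(practice_inputs.to_numpy(dtype=float))
print(Z)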
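For comparison, the per-cell loop above can probably be replaced with pandas' built-in missing-value helpers. The following is only a minimal sketch, not the book's reference answer: it assumes the same CSV content as above, reads it from an in-memory string instead of the file, and the names `csv_text`, `data`, `nan_counts`, `most_missing`, and `reduced` are made up for illustration.

```python
import io

import pandas as pd

# Same rows as student_scores.csv above, kept in memory so the sketch is self-contained
csv_text = """stu_num,stu_name,stu_course,stu_score
1,Lily,English,100
1,Lily,Physics,80
2,NA,Computer Science,90
2,NA,Database,88
NA,John,Math,99
4,Lisa,NA,100
5,NA,French,50
6,GOGO,NA,10
"""
data = pd.read_csv(io.StringIO(csv_text))

# isna() marks each missing cell; sum() then counts them per column
nan_counts = data.isna().sum()

# Keep every column that ties for the maximum, as the loop version does
most_missing = nan_counts[nan_counts == nan_counts.max()].index.tolist()

# drop() returns a new DataFrame, so the result is assigned to a new name
reduced = data.drop(columns=most_missing)

print(nan_counts.to_dict())
print(most_missing)
print(reduced.columns.tolist())
```

Because `read_csv` already turns the `NA` entries into NaN, `isna()` handles both numeric and text columns without a separate string check.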