Data processing of deep learning

2022-07-07 00:42:00 Peng Xiang

Data manipulation

  • Data type: the most commonly used data structure is the array

Creating an array requires:

  • Shape: the number of rows and columns
  • Element type: int or float
  • Element values
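The three ingredients listed above can be sketched with PyTorch, the library this article imports later. This is only a minimal illustration: the variable names and the concrete shapes are assumptions chosen for the example, not taken from the original figures.

```python
import torch

# Shape: 3 rows and 4 columns; element type: 32-bit float; element value: all zeros.
zeros = torch.zeros((3, 4), dtype=torch.float32)

# Element values given explicitly; the shape (2, 3) is inferred from the nested list.
values = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int64)

print(zeros.shape, zeros.dtype)
print(values.shape, values.dtype)
```

Passing the dtype explicitly makes the int-versus-float choice visible instead of leaving it to inference from the values.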

Array access methods:
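As a rough sketch of common access patterns (single element, whole row, whole column, sub-array), assuming a small 3x4 example tensor named x that is not part of the original article:

```python
import torch

x = torch.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]

element = x[1, 2]    # single element at row 1, column 2 (zero-based indices)
row = x[1, :]        # all of row 1
column = x[:, 1]     # all of column 1
block = x[1:3, 1:]   # sub-array: rows 1 and 2, columns 1 through 3

print(element, row, column, block, sep='\n')
```

Indices are zero-based and the end of a slice is exclusive, so `1:3` selects rows 1 and 2.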

The broadcasting mechanism works as follows. First, one or both arrays are expanded by copying elements as needed so that, after the conversion, the two tensors have the same shape. Second, an elementwise operation is performed on the resulting arrays.

Because a and b are matrices of different shapes, their shapes do not match if we add them directly. We broadcast both matrices into a larger matrix: the columns of matrix a are copied, the rows of matrix b are copied, and the two results are then added elementwise.
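The two steps can be made explicit in a short sketch. The shapes are an assumption for illustration (a is taken to be 3x1 and b to be 1x2, since the original figure is not available), and `expand` is used here only to make the conceptual copying visible.

```python
import torch

# Assumed shapes for illustration: a is 3x1, b is 1x2.
a = torch.arange(3).reshape(3, 1)   # [[0], [1], [2]]
b = torch.arange(2).reshape(1, 2)   # [[0, 1]]

# Step 1: repeat a's column and b's row so that both behave as 3x2 matrices.
# expand returns a view that reads as if the elements had been copied.
a_expanded = a.expand(3, 2)   # [[0, 0], [1, 1], [2, 2]]
b_expanded = b.expand(3, 2)   # [[0, 1], [0, 1], [0, 1]]

# Step 2: add element by element.
manual = a_expanded + b_expanded

# Broadcasting performs both steps implicitly.
broadcast = a + b

print(broadcast)
```

Both routes give the same 3x2 result; in practice `a + b` is enough, and the explicit expansion is shown only to mirror the description.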

Data preprocessing

Create a file and write data

import os

os.makedirs(os.path.join('.', 'data'), exist_ok=True)  # Create a data folder in the current directory
data_file = os.path.join('.', 'data', 'house_tiny.csv')  # Path of house_tiny.csv inside the data folder
print(data_file)
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')  # Column names
    f.write('NA,Pave,127500\n')  # Each row represents a data sample
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000\n')

Read the file. For CSV files, the pandas library is the usual choice.

import pandas as pd

data = pd.read_csv(data_file)
print(data)


Handling missing values and converting data
For missing values there are two typical approaches: imputation and deletion. Imputation replaces a missing value with a value we supply, while deletion simply discards the affected entries. Here we use imputation and replace each missing value with the mean of its column.

# The first two columns are the inputs; the third column is the output
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
# Fill missing numeric values with the column mean; numeric_only=True skips the string column Alley
inputs = inputs.fillna(inputs.mean(numeric_only=True))
# Alley only takes the values Pave and NaN, so it becomes two indicator columns: Alley_Pave and Alley_nan
inputs = pd.get_dummies(inputs, dummy_na=True)
print(inputs)
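For comparison, the deletion approach mentioned earlier could look like the sketch below. It is self-contained and reads the same four samples from memory rather than from disk. The names csv_text, kept_rows and imputed are introduced only for this illustration.

```python
import io

import pandas as pd

# Same four samples as house_tiny.csv, read from memory so the sketch is self-contained.
csv_text = (
    "NumRooms,Alley,Price\n"
    "NA,Pave,127500\n"
    "2,NA,106000\n"
    "4,NA,178100\n"
    "NA,NA,140000\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# Deletion: drop every row whose NumRooms value is missing.
kept_rows = data.dropna(subset=["NumRooms"])

# Imputation: replace missing NumRooms values with the column mean, (2 + 4) / 2 = 3.
imputed = data.fillna({"NumRooms": data["NumRooms"].mean()})

print(kept_rows)
print(imputed)
```

On this tiny dataset deletion discards half of the samples, which is one reason imputation is used in the article.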


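Convert the data into tensors

import torch

# Cast the inputs to float so that boolean indicator columns (produced by newer pandas versions) convert cleanly
x, y = torch.tensor(inputs.to_numpy(dtype=float)), torch.tensor(outputs.values)
print(x, y)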

At this point the data has been converted into tensors, a form that the computer can process directly.
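One detail worth checking after the conversion is the element type. The sketch below assumes a small hand-written DataFrame shaped like the preprocessed inputs (the values are illustrative). It shows that float data coming from pandas arrives as 64-bit floats, which can be cast to 32-bit if a model requires it.

```python
import pandas as pd
import torch

# Hypothetical preprocessed inputs: one numeric column and two indicator columns.
inputs = pd.DataFrame({
    "NumRooms": [3.0, 2.0, 4.0, 3.0],
    "Alley_Pave": [1.0, 0.0, 0.0, 0.0],
    "Alley_nan": [0.0, 1.0, 1.0, 1.0],
})

x64 = torch.tensor(inputs.to_numpy(dtype=float))  # NumPy float64 becomes torch.float64
x32 = x64.to(torch.float32)                       # cast to PyTorch's default float type

print(x64.dtype, x32.dtype, x64.shape)
```

Whether the cast is needed depends on the model. It is shown here only as an option.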

Complete code:

import os

import pandas as pd
import torch

# Create the data folder and write the sample CSV file
os.makedirs(os.path.join('.', 'data'), exist_ok=True)
data_file = os.path.join('.', 'data', 'house_tiny.csv')
print(data_file)
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')  # Column names
    f.write('NA,Pave,127500\n')  # Each row represents a data sample
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000\n')

# Read the CSV file
data = pd.read_csv(data_file)
print(data)

# The first two columns are the inputs; the third column is the output
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
# Fill missing numeric values with the column mean; numeric_only=True skips the string column Alley
inputs = inputs.fillna(inputs.mean(numeric_only=True))
# Turn Alley into the indicator columns Alley_Pave and Alley_nan
inputs = pd.get_dummies(inputs, dummy_na=True)
print(inputs)

# Convert to tensors; cast the inputs to float so that boolean indicator columns convert cleanly
x, y = torch.tensor(inputs.to_numpy(dtype=float)), torch.tensor(outputs.values)
print(x, y)
Original site

Copyright notice
This article was written by [Peng Xiang]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207061654112351.html