How to Load Data from CSV (Data Preparation Part)
2022-06-11 21:03:00 【Dreamer DBA】
- How to load a CSV file
- How to convert strings from a file to floating point numbers.
- How to convert class values from a file to integers.
1.2 Tutorial
- Load a file
- Load a file and convert Strings to Floats
- Load a file and convert Strings to Integers.
# Function for loading a CSV
# Load a CSV file
from csv import reader

def load_csv(filename):
    file = open(filename, "r")
    lines = reader(file)
    dataset = list(lines)
    return dataset

load_csv('pima-indians-diabetes.data.csv')

# Example of Loading the Pima Indians Diabetes Dataset CSV File
# Example of loading Pima Indians CSV dataset
from csv import reader

# Load a CSV file
def load_csv(filename):
    file = open(filename, "r")
    lines = reader(file)
    dataset = list(lines)
    return dataset

# Load dataset
filename = 'pima-indians-diabetes.data.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))

Sample output from loading the Pima Indians Diabetes dataset CSV file.

A limitation of this function is that it will load empty lines from data files and add them to our list of rows as empty lists. Below is the updated example with an improved version of the load_csv() function, which skips empty rows and also closes the file once it has been read.
# Improved Example of Loading the Pima Indians Diabetes Dataset CSV File
# Example of loading Pima Indians CSV dataset
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            # Skip empty rows
            if not row:
                continue
            dataset.append(row)
    return dataset

# Load dataset
filename = 'pima-indians-diabetes.data.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))

Sample output from loading the Pima Indians Diabetes dataset CSV file.
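To see the effect of skipping empty rows without needing the Pima Indians file, the sketch below applies the same filtering logic to a small in-memory CSV. The sample text and the `load_csv_from_text` helper (including its `skip_empty` flag) are assumptions made for illustration; they are not part of the original tutorial.

```python
from csv import reader
from io import StringIO

# Hypothetical sample data containing two blank lines (not the real dataset file)
SAMPLE = "6,148,72\n\n1,85,66\n\n8,183,64\n"

def load_csv_from_text(text, skip_empty=True):
    """Parse CSV text into a list of rows, optionally skipping empty rows."""
    dataset = list()
    for row in reader(StringIO(text)):
        # csv.reader yields an empty list for a blank line
        if skip_empty and not row:
            continue
        dataset.append(row)
    return dataset

raw = load_csv_from_text(SAMPLE, skip_empty=False)
clean = load_csv_from_text(SAMPLE, skip_empty=True)
print(len(raw), len(clean))  # prints: 5 3
```

With the blank lines kept, a later call such as len(dataset[1]) would report zero columns, which is why the improved loader filters them out.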

1.2 Convert String to Floats
Most, if not all, machine learning algorithms prefer to work with numbers; specifically, floating point numbers are preferred. Our code for loading a CSV file returns a dataset as a list of lists, but each value is a string. We can see this if we print out one record from the dataset:
print(dataset[0])
We can write a small function to convert specific columns of our loaded dataset to floating point values. Below is this function, called str_column_to_float(). It converts a given column in the dataset to floating point values, taking care to strip any whitespace from each value before making the conversion.
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

We can test this function by combining it with our load CSV function above and converting all of the numeric data in the Pima Indians dataset to floating point values. The complete example is below.
# Example of converting string variables to float
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Load pima-indians-diabetes dataset
filename = 'pima-indians-diabetes.data.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))
print(dataset[0])
# Convert string columns to float
for i in range(len(dataset[0])):
    str_column_to_float(dataset, i)
print(dataset[0])

Running this example, we see the first row of the dataset printed both before and after the conversion. We can see that the values in each column have been converted from strings to numbers.
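As a quick check that does not depend on the data file, the following sketch runs str_column_to_float() on two hand-written rows with stray whitespace. The rows are hypothetical stand-ins for a loaded CSV. The last lines illustrate one thing to watch for: float() raises a ValueError on non-numeric text, so a file with a header row would need that row removed before conversion.

```python
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Hypothetical rows with stray whitespace, standing in for a loaded CSV
rows = [[' 6', '148 ', ' 0.627 '], ['1', '85', '0.351']]
for i in range(len(rows[0])):
    str_column_to_float(rows, i)
print(rows[0])  # prints: [6.0, 148.0, 0.627]

# float() rejects non-numeric text, such as a column name in a header row
try:
    float('age')
except ValueError as err:
    print('Cannot convert:', err)
```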

Some machine learning algorithms prefer all values to be numeric, including the outcome or predicted value. We can convert the class value in the iris flowers dataset to an integer by creating a map.
- First, we locate all of the unique class values, which happen to be: Iris-setosa, Iris-versicolor and Iris-virginica.
- Next, we assign an integer value to each, such as: 0, 1 and 2.
- Finally, we replace all occurrences of class string values with their corresponding integer values.
Below is a function to do just that called str_column_to_int(). Like the previously introduced str_column_to_float() it operates on a single column in the dataset.
# Example of integer encoding string class values
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    # Map each unique class value to an integer
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    # Replace each class string with its integer
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Load iris dataset
filename = 'iris.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))
print(dataset[0])
# Convert the four numeric columns to float
for i in range(4):
    str_column_to_float(dataset, i)
# Convert class column to int
lookup = str_column_to_int(dataset, 4)
print(dataset[0])
print(lookup)
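One detail of str_column_to_int() worth knowing: it enumerates a set of strings, and Python does not guarantee the iteration order of such a set across runs, so the integer assigned to each class may differ from one run to the next. The sketch below is a variant that sorts the unique values first to make the mapping reproducible, and builds an inverse dictionary to turn integers back into class names. The function name `str_column_to_int_sorted` and the four sample rows are assumptions for illustration, not part of the original tutorial.

```python
def str_column_to_int_sorted(dataset, column):
    """Integer-encode a column, assigning codes in sorted order of the class names."""
    unique = sorted(set(row[column] for row in dataset))
    lookup = {value: i for i, value in enumerate(unique)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Hypothetical sample rows in the layout of iris.csv (four measurements, then the class)
rows = [
    ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa'],
    ['7.0', '3.2', '4.7', '1.4', 'Iris-versicolor'],
    ['6.3', '3.3', '6.0', '2.5', 'Iris-virginica'],
    ['4.9', '3.0', '1.4', '0.2', 'Iris-setosa'],
]
lookup = str_column_to_int_sorted(rows, 4)
# Inverse mapping, useful for turning predicted integers back into class names
inverse = {i: name for name, i in lookup.items()}
print(lookup)
print([row[4] for row in rows])  # prints: [0, 1, 2, 0]
print(inverse[2])                # prints: Iris-virginica
```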