How to Load Data from CSV (Data Preparation Part)
2022-06-11 21:03:00 【Dreamer DBA】
- How to load a CSV file.
- How to convert strings from a file to floating point numbers.
- How to convert class values from a file to integers.
1.2 Tutorial
- Load a file.
- Load a file and convert strings to floats.
- Load a file and convert strings to integers.
# Function for loading a CSV
# load a CSV file
from csv import reader

def load_csv(filename):
    file = open(filename, "r")
    lines = reader(file)
    dataset = list(lines)
    return dataset

load_csv('pima-indians-diabetes.data.csv')

# Example of Loading the Pima Indians Diabetes Dataset CSV File
# Example of loading Pima Indians CSV dataset
from csv import reader

# Load a csv file
def load_csv(filename):
    file = open(filename, "r")
    lines = reader(file)
    dataset = list(lines)
    return dataset

# Load dataset
filename = 'pima-indians-diabetes.data.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))

Sample output from loading the Pima Indians Diabetes dataset CSV file.

A limitation of this function is that it will load empty lines from data files and add them to our list of rows. Below is an updated example with an improved version of the load_csv() function that skips empty rows.
# Improved Example of Loading the Pima Indians Diabetes Dataset CSV File
# Example of loading Pima Indian CSV dataset
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            # skip empty rows
            if not row:
                continue
            dataset.append(row)
    return dataset

# Load dataset
filename = 'pima-indians-diabetes.data.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))

Sample output from loading the Pima Indians Diabetes dataset CSV file.
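To see why the empty-row check matters, here is a minimal sketch that reads from an in-memory string instead of a file. The sample values are made up for illustration and are not taken from the Pima Indians file; io.StringIO simply stands in for an open file handle.

```python
from csv import reader
from io import StringIO

# Hypothetical sample data: two records separated by a blank line
sample = "6,148,72\n\n1,85,66\n"

# Reading every line keeps the blank line as an empty list
all_rows = list(reader(StringIO(sample)))

# Skipping empty rows mirrors the `if not row` check in load_csv()
kept_rows = [row for row in reader(StringIO(sample)) if row]

print(all_rows)   # [['6', '148', '72'], [], ['1', '85', '66']]
print(kept_rows)  # [['6', '148', '72'], ['1', '85', '66']]
```

The empty list in the first result is the kind of row that would later break code indexing into each record, such as a column conversion.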

1.2 Convert String to Floats
Most, if not all, machine learning algorithms prefer to work with numbers. Specifically, floating point numbers are preferred. Our code for loading a CSV file returns a dataset as a list of lists, but each value is a string. We can see this if we print out one record from the dataset:
print(dataset[0])
We can write a small function to convert specific columns of our loaded dataset to floating point values. Below is this function, called str_column_to_float(). It converts a given column in the dataset to floating point values, taking care to strip any whitespace from each value before making the conversion.
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

We can test this function by combining it with our load CSV function above, and convert all of the numeric data in the Pima Indians dataset to floating point values. The complete example is below.
# Example of converting string variables to float
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Load pima-indians-diabetes dataset
filename = 'pima-indians-diabetes.data.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))
print(dataset[0])

# convert string columns to float
for i in range(len(dataset[0])):
    str_column_to_float(dataset, i)
print(dataset[0])

Running this example, we see the first row of the dataset printed both before and after the conversion. We can see that the values in each column have been converted from strings to numbers.
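The same conversion can be checked without the data file. The sketch below applies str_column_to_float() to a couple of hand-written rows; the values and the stray whitespace are assumptions made up for illustration.

```python
# Convert one column of a list-of-lists dataset from strings to floats
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Hypothetical rows with stray whitespace around some values
rows = [[" 6", "148 ", "0.627"], ["1", " 85", " 0.351 "]]

# Convert every column in place
for i in range(len(rows[0])):
    str_column_to_float(rows, i)

print(rows)  # [[6.0, 148.0, 0.627], [1.0, 85.0, 0.351]]
```

Note that the function modifies the rows in place and returns nothing, so the original string values are no longer available afterwards.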

Some machine learning algorithms prefer all values to be numeric, including the outcome or predicted value. We can convert the class value in the iris flowers dataset to an integer by creating a map.
- First, we locate all of the unique class values, which happen to be: Iris-setosa, Iris-versicolor and Iris-virginica.
- Next, we assign an integer value to each, such as: 0, 1 and 2.
- Finally, we replace all occurrences of class string values with their corresponding integer values.
Below is a function to do just that called str_column_to_int(). Like the previously introduced str_column_to_float() it operates on a single column in the dataset.
# Example of integer encoding string class values
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Convert string column to integer
def str_column_to_int(dataset, column):
    # collect the unique class values found in the column
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    # assign an integer to each unique class value
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    # replace each class string with its integer
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Load iris dataset
filename = 'iris.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))
print(dataset[0])

# convert the four numeric columns to float
for i in range(4):
    str_column_to_float(dataset, i)

# convert class column to int
lookup = str_column_to_int(dataset, 4)
print(dataset[0])
print(lookup)
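One thing to be aware of is that str_column_to_int() enumerates a set, and the iteration order of a set of strings is not guaranteed to be the same from one run to the next, so the integer assigned to each class may vary. If a reproducible mapping is needed, one possible variation is to sort the unique values first. The function name str_column_to_int_sorted() and the sample rows below are hypothetical and are not part of the original tutorial.

```python
# Variation on str_column_to_int(): sort the unique values first so the
# mapping is identical on every run (an assumption of this sketch)
def str_column_to_int_sorted(dataset, column):
    unique = sorted(set(row[column] for row in dataset))
    lookup = {value: i for i, value in enumerate(unique)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Hypothetical rows: one numeric value and one class label each
rows = [
    [5.1, "Iris-setosa"],
    [7.0, "Iris-versicolor"],
    [6.3, "Iris-virginica"],
    [4.9, "Iris-setosa"],
]

lookup = str_column_to_int_sorted(rows, 1)

# Inverse map, useful for turning integer predictions back into class names
inverse = {i: name for name, i in lookup.items()}

print(lookup)   # {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
print(rows[0])  # [5.1, 0]
```

Keeping the returned lookup (or its inverse) around is what allows integer class values to be translated back to their original labels later.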