Machine learning -- handwritten English alphabet 2 -- importing and processing data
2022-07-28 10:35:00 【Cute me】
1. Create a datastore
Handwriting sample files
Samples of each letter were collected from many different volunteers. Some volunteers provided more than one sample of each letter. Each sample is saved in a separate file, and all files are stored in one folder. File names have the following format:
user003_B_2.txt
This file contains the second sample of the letter B written by the volunteer assigned the ID "user003".
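Because the file name encodes the volunteer, the letter, and the sample number, those fields can be recovered by splitting the name. This is a minimal sketch, assuming every name follows the pattern shown above; the variable names are illustrative and not part of the exercise.

```matlab
fname = "user003_B_2.txt";
[~,base] = fileparts(fname);       % drop the folder and extension: "user003_B_2"
parts = split(base,"_");           % 3-by-1 string array
user = parts(1);                   % "user003"
letter = parts(2);                 % "B"
sampleNum = str2double(parts(3));  % 2
```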
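% Create a datastore that references every file containing the letter M
letterds = datastore("*_M_*.txt")
% Read the first file in the datastore and plot the pen trace
data = read(letterds)
plot(data.X,data.Y)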
Call read again to import the second file in the datastore.
data = read(letterds)
plot(data.X,data.Y)

Use the readall function to import the data from all files into a single table named data. Visualize the data by plotting Y against X.
data = readall(letterds)
plot(data.X,data.Y)
2. Add a preprocessing function
Usually, you need to apply a series of preprocessing operations to each sample of raw data. The first step in automating this process is to create a custom function that applies the specific preprocessing operations.
letterds = datastore("*_M_*.txt");
data = read(letterds);
data = scale(data);
plot(data.X,data.Y)
axis equal
plot(data.Time,data.Y)
ylabel("Vertical position")
xlabel("Time")
At the end of the script, create a function named scale that performs the following operations:
data.Time=(data.Time-data.Time(1))/1000;
data.X=1.5*data.X;
Because these commands modify the variable data directly, the function should use data as both its input and its output variable.
Note that the third line of the script calls the scale function. The script will not run until this function has been created.
Also note that local functions must appear at the end of the script. This means that in this exercise you will not necessarily edit the script sections from top to bottom. The section titles show which section of the script to edit in each task.
function dataout = scale(data)
data.Time = (data.Time - data.Time(1))/1000;
data.X = 1.5*data.X;
dataout = data;
end


At present, you still need to call the function manually. To automate data import and preprocessing, you want the datastore to apply this function whenever it reads data. You can do this with a transformed datastore. The transform function takes a datastore and a function as inputs and returns a new datastore as output. The transformed datastore applies the given function whenever it imports data.
Background
To use a function as an input to another function, add the @ symbol in front of the function name to create a function handle.
transform(ds,@myfun)
A function handle is a reference to a function. Without the @ symbol, MATLAB interprets the function name as a call to that function.
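The difference between holding a handle and calling a function can be seen in a small sketch. The helper applyTwice is hypothetical and not part of the exercise; it only shows a function receiving another function as input.

```matlab
h = @sqrt;                  % a handle: a reference to sqrt, nothing is computed yet
y = h(16);                  % calling through the handle gives 4
z = applyTwice(@sqrt,16);   % passing the handle to another function gives 2

function out = applyTwice(f,x)
% Apply the function referenced by the handle f two times in a row.
out = f(f(x));
end
```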
Use the transform function to create a transformed datastore named preprocds. This datastore should apply the scale function to the data referenced by letterds.
preprocds = transform(letterds,@scale)
Now, whenever data is read from the preprocds datastore, the scale function is applied automatically.
Use the readall function to import all of the data. Check that the preprocessing function was applied to each file by plotting the Y variable as a function of time.
data = readall(preprocds)
plot(data.Time,data.Y)
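To watch a transformed datastore apply its function one read at a time, the sketch below writes two tiny made-up sample files to a temporary folder and loops with hasdata. The folder, the file contents, and the helper zeroTime are assumptions for illustration only; zeroTime stands in for the scale function.

```matlab
% Write two small sample files with made-up values to a temporary folder.
folder = tempname;
mkdir(folder);
T1 = table([1000;1010;1020],[0.1;0.2;0.3],[0.5;0.6;0.7], ...
    'VariableNames',{'Time','X','Y'});
T2 = table([2000;2010],[0.4;0.5],[0.1;0.2], ...
    'VariableNames',{'Time','X','Y'});
writetable(T1,fullfile(folder,"user001_M_1.txt"));
writetable(T2,fullfile(folder,"user002_M_1.txt"));

% Wrap the datastore so that zeroTime runs on every block that is read.
letterds = datastore(fullfile(folder,"*_M_*.txt"));
preprocds = transform(letterds,@zeroTime);

while hasdata(preprocds)
    data = read(preprocds);
    disp(data.Time(1))   % 0 for every block, because zeroTime was applied
end

function t = zeroTime(t)
% Shift the time stamps so that each block starts at zero.
t.Time = t.Time - t.Time(1);
end
```

Calling reset(preprocds) afterwards returns the datastore to its first file, so that read or readall can start over.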



The position of a letter is not important for classification. What matters is its shape. For many machine learning problems, a common preprocessing step is to normalize the data.
Typical normalizations include shifting by the mean (so that the shifted data has a mean of 0) or shifting and scaling the data into a fixed range (for example, [-1, 1]). For the handwritten letters, shifting both the x and y data to have a mean of 0 ensures that all letters are centered on the same point.
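The two kinds of normalization mentioned above can be compared on a small vector. This is a sketch with made-up values, not data from the handwriting files.

```matlab
x = [2 4 6 8];                                   % made-up sample values
centered = x - mean(x);                          % shift by the mean: [-3 -1 1 3]
scaled = 2*(x - min(x))/(max(x) - min(x)) - 1;   % shift and scale into [-1, 1]
```

Centering keeps the original size of the shape and only moves it, which is why it suits the handwriting data, where the position is irrelevant but the shape matters.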
function data = scale(data)
data.Time = (data.Time - data.Time(1))/1000;
data.X = 1.5*data.X;
data.X = data.X - mean(data.X);
data.Y = data.Y - mean(data.Y);
end
Any calculation involving NaNs (including the default behavior of functions such as mean) results in NaN. This matters in machine learning, where data often contains missing values. In the handwriting data, a NaN appears wherever the writer lifted the pen from the tablet.
You can use the "omitnan" option to make statistical functions such as mean ignore missing values.
mean(x,"omitnan")
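A short sketch with made-up values shows the effect of the option:

```matlab
x = [1 NaN 3];             % one missing value
m1 = mean(x);              % NaN, because the missing value propagates
m2 = mean(x,"omitnan");    % 2, the mean of the non-missing values 1 and 3
```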
function data = scale(data)
data.Time = (data.Time - data.Time(1))/1000;
data.X = 1.5*data.X;
data.X = data.X - mean(data.X,"omitnan");
data.Y = data.Y - mean(data.Y,"omitnan");
end
Code in this section
letterds = datastore("*_M_*.txt");
data = read(letterds);
data = scale(data);
plot(data.X,data.Y)
axis equal
plot(data.Time,data.Y)
ylabel("Vertical position")
xlabel("Time")
preprocds = transform(letterds,@scale)
data = readall(preprocds)
plot(data.Time,data.Y)
function data = scale(data)
data.Time = (data.Time - data.Time(1))/1000;
data.X = 1.5*data.X;
data.X = data.X - mean(data.X,"omitnan");
data.Y = data.Y - mean(data.Y,"omitnan");
end