Machine learning -- handwritten English alphabet 2 -- importing and processing data
2022-07-28 10:35:00 【Cute me】
1. Create a datastore
Handwriting sample files
Samples of each letter were collected from many different volunteers, and some volunteers provided more than one sample of each letter. Each sample is saved in a separate file, and all files are stored in one folder. File names have the following format:
user003_B_2.txt
This file contains the second sample of the letter B written by the volunteer assigned the ID "user003".
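The naming scheme above can be taken apart in code. The following is a minimal sketch, assuming every file name follows the user_letter_sample pattern exactly; the variable names are illustrative and are not part of the original exercise.

```matlab
% Split a sample file name into its three parts.
% Assumption: names always look like <user>_<letter>_<sample>.txt
fname = "user003_B_2.txt";

parts = split(erase(fname, ".txt"), "_");   % drop the extension, split on "_"

user      = parts(1);               % "user003"
letter    = parts(2);               % "B"
sampleNum = str2double(parts(3));   % 2
```

The same wildcard idea drives the datastore call used in this section: "*_M_*.txt" matches every user and every sample number, but only the letter M.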
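Create a datastore for the files that contain the letter M, read the first file, and plot Y against X.
letterds = datastore("*_M_*.txt")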
data = read(letterds)
plot(data.X,data.Y)
Call read again to import the second file in the datastore.
data = read(letterds)
plot(data.X,data.Y)

Use the readall function to import the data from all the files into a single table named data. Visualize the data by plotting Y against X.
data = readall(letterds)
plot(data.X,data.Y)
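The difference between read and readall is easier to see on data you control. The following is a rough sketch that writes two tiny files to a temporary folder; the folder, the file contents, and the explicit ReadSize setting are assumptions made for this illustration and are not part of the original exercise.

```matlab
% Create two small sample files in a temporary folder (hypothetical data).
folder = tempname;
mkdir(folder);
writetable(table([1;2;3], [4;5;6], 'VariableNames', {'X','Y'}), ...
    fullfile(folder, 'user001_M_1.txt'));
writetable(table([7;8], [9;10], 'VariableNames', {'X','Y'}), ...
    fullfile(folder, 'user002_M_1.txt'));

ds = datastore(fullfile(folder, '*_M_*.txt'));
ds.ReadSize = 'file';        % assumption: make each read return one whole file

first  = read(ds);           % data from one file
second = read(ds);           % data from the next file
moreLeft = hasdata(ds);      % false: both files have been read

reset(ds);                   % rewind the datastore to the first file
everything = readall(ds);    % all files stacked into one table
```

Each call to read advances through the datastore, so hasdata is the usual way to guard a read loop, and reset starts over from the beginning.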
2. Add a preprocessing function
You usually need to apply a series of preprocessing operations to each sample of the raw data. The first step in automating this process is to write a custom function that applies your specific preprocessing operations.
letterds = datastore("*_M_*.txt");
data = read(letterds);
data = scale(data);
plot(data.X,data.Y)
axis equal
plot(data.Time,data.Y)
ylabel("Vertical position")
xlabel("Time")
At the end of the script, create a function named scale that performs the following operations:
data.Time=(data.Time-data.Time(1))/1000;
data.X=1.5*data.X;
Because these commands modify the variable data directly, the function should use data as both its input and its output variable.
Note that the third line of the script calls the scale function. Your script will not run until you create this function.
Also note that local functions must be placed at the end of the script. This means you will edit the script sections out of order in this exercise. The section title shows which section of the script to edit in each task.
function dataout = scale(data)
data.Time = (data.Time - data.Time(1))/1000;
data.X = 1.5*data.X;
dataout = data;
end


Currently, you still need to call the function manually. To automate data import and preprocessing, you want the datastore to apply this function whenever it reads data. You can do this with a transformed datastore. The transform function takes a datastore and a function as inputs and returns a new datastore as output. This transformed datastore applies the given function whenever it imports data.
Background
To use a function as an input to another function, create a function handle by prefixing the function name with the @ symbol.
transform(ds,@myfun)
A function handle is a reference to a function. Without the @ symbol, MATLAB interprets the function name as a call to that function.
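A short illustration of the difference between referring to a function and calling it. The built-in functions max and numel are used here only as stand-ins; they are not part of the original exercise.

```matlab
% A handle stores a reference; nothing is executed when it is created.
h = @max;
isHandle = isa(h, 'function_handle');   % true

% The function runs only when the handle is called with arguments.
largest = h([3 7 5]);                   % 7

% Handles are how a function is passed as an input to another function.
lengths = cellfun(@numel, {'ab', 'cde'});   % [2 3]
```

Writing cellfun(numel, ...) without the @ would try to call numel immediately with no inputs, which is an error.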
Use the transform function to create a transformed datastore named preprocds. This datastore should apply the scale function to the data referenced by letterds.
preprocds = transform(letterds,@scale)
Now, whenever data is read from the preprocds datastore, the scale function is applied automatically.
Use the readall function to import all the data. Check that the preprocessing function was applied to each file by plotting the Y variable as a function of time.
data = readall(preprocds)
plot(data.Time,data.Y)
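To check that a transformed datastore really applies its function to each file separately, a small synthetic test can help. This is a sketch under assumptions: the temporary files, the anonymous function centerY, and the explicit ReadSize setting are invented for the illustration and stand in for the real samples and the scale function.

```matlab
% Two hypothetical files whose Y values sit at different offsets.
folder = tempname;
mkdir(folder);
writetable(table([0;1000], [5;7], 'VariableNames', {'Time','Y'}), ...
    fullfile(folder, 'user001_M_1.txt'));
writetable(table([0;1000], [10;20], 'VariableNames', {'Time','Y'}), ...
    fullfile(folder, 'user002_M_1.txt'));

ds = datastore(fullfile(folder, '*_M_*.txt'));
ds.ReadSize = 'file';    % assumption: the function sees one whole file at a time

% Stand-in preprocessing step: subtract each file's own mean from Y.
centerY = @(t) table(t.Time, t.Y - mean(t.Y, "omitnan"), ...
    'VariableNames', {'Time','Y'});

tds = transform(ds, centerY);   % nothing is read yet
out = readall(tds);             % centerY runs once per file during the import
```

Because the function is applied per file, each file's Y values are centered on zero independently, even though the two files started at different offsets.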



The position of a letter is not important for classification. What matters is its shape. A common preprocessing step for many machine learning problems is to normalize the data.
Typical normalizations include shifting by the mean (so that the mean of the shifted data is 0) or shifting and scaling the data into a fixed range (for example, [-1,1]). For the handwritten letters, shifting the x and y data so that each has a mean of 0 ensures that all the letters are centered around the same point.
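The two kinds of normalization can be compared on a small vector. This is an illustrative sketch only; the sample values are made up, and rescale (available in recent MATLAB releases) is used here as one convenient way to map data into a fixed range.

```matlab
y = [2 4 6 8];                  % hypothetical raw values

% Shift by the mean: the result averages to 0.
centered = y - mean(y);         % [-3 -1 1 3]

% Shift and scale into the fixed range [-1, 1].
scaled = rescale(y, -1, 1);     % [-1 -1/3 1/3 1]
```

Mean-centering preserves the spread of the data, which keeps letter shapes intact, while range scaling also changes the size.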
function data = scale(data)
data.Time = (data.Time - data.Time(1))/1000;
data.X = 1.5*data.X;
data.X = data.X - mean(data.X);
data.Y = data.Y - mean(data.Y);
end
Any calculation that involves NaNs (including the default behavior of functions such as mean) results in NaN. This matters in machine learning, where data often has missing values. In the handwriting data, a NaN appears wherever the writer lifted the pen from the tablet.
You can use the "omitnan" option to make statistical functions such as mean ignore missing values.
mean(x,"omitnan")
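The effect of the option is easy to confirm on a vector with one missing value. The sample values below are invented for the illustration.

```matlab
x = [1 NaN 3];                  % NaN marks a missing value (pen lifted)

plainMean = mean(x);            % NaN: the missing value propagates
safeMean  = mean(x, "omitnan"); % 2: computed from the two valid values

% Centering keeps the NaN in place, so pen lifts remain visible in the data.
centered = x - safeMean;        % [-1 NaN 1]
```

Without the option, subtracting the mean would turn every value in the sample into NaN.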
function data = scale(data)
data.Time = (data.Time - data.Time(1))/1000;
data.X = 1.5*data.X;
data.X = data.X - mean(data.X,"omitnan");
data.Y = data.Y - mean(data.Y,"omitnan");
end
Code in this section
letterds = datastore("*_M_*.txt");
data = read(letterds);
data = scale(data);
plot(data.X,data.Y)
axis equal
plot(data.Time,data.Y)
ylabel("Vertical position")
xlabel("Time")
preprocds = transform(letterds,@scale)
data = readall(preprocds)
plot(data.Time,data.Y)
function data = scale(data)
data.Time = (data.Time - data.Time(1))/1000;
data.X = 1.5*data.X;
data.X = data.X - mean(data.X,"omitnan");
data.Y = data.Y - mean(data.Y,"omitnan");
end