Machine learning -- handwritten English alphabet 1 -- classification process
2022-07-28 10:35:00 【Cute me】
1. Import data
Handwritten letters are stored as individual text files. Each file is comma-separated and contains four columns: a timestamp, the horizontal position of the pen, the vertical position of the pen, and the pen pressure. The timestamp is the number of milliseconds elapsed since the start of data collection. The other variables are expressed in normalized units (0 to 1). For pen position, 0 represents the lower and left edges of the writing surface, and 1 represents the top and right edges.
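As an illustration, the first few rows of a file such as J.txt might look like the following. The Time, X, and Y column names match the variables used in the code below; the name of the pressure column and all numeric values here are made up for illustration only.

```
Time,X,Y,Pressure
0,0.51,0.86,0.30
8,0.51,0.85,0.41
17,0.50,0.83,0.52
```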
letter = readtable("J.txt");
plot(letter.X,letter.Y)
axis equal
letter = readtable("M.txt");
plot(letter.X,letter.Y)
axis equal

2. Data processing
The pen positions in the handwriting data are measured in normalized units (0 to 1). However, the tablet used to record the data is not square: a vertical distance of 1 corresponds to 10 inches, while the same horizontal distance corresponds to 15 inches. To correct for this, the horizontal unit should be rescaled to the range [0 1.5] instead of [0 1].
letter = readtable("M.txt")
letter.X = 1.5*letter.X;
plot(letter.X,letter.Y)
axis equal

The raw time values have no physical meaning: they represent the number of milliseconds elapsed since the beginning of the data-collection session, which makes it difficult to interpret handwriting patterns over time. A more useful time variable is the elapsed time from the start of each letter, in seconds.
letter.Time = letter.Time - letter.Time(1)
letter.Time = letter.Time/1000
plot(letter.Time,letter.X)
plot(letter.Time,letter.Y)

3. Feature calculation
What aspects of these letters could be used to distinguish a J from an M or a V? The goal is not to use the raw signals themselves, but to compute values that distill each signal into simple, useful units of information, called features.
For the letters J and M, a simple feature might be the aspect ratio (the height of a letter relative to its width). A J is likely to be tall and narrow, whereas an M may be closer to square.
Compared with J and M, a V is written quickly, so the duration of the signal may also be a distinguishing feature.

letter = readtable("M.txt");
letter.X = letter.X*1.5;
letter.Time = (letter.Time - letter.Time(1))/1000
plot(letter.X,letter.Y)
axis equal
% Repeat the preprocessing from the previous sections
dur = letter.Time(end)
aratio = range(letter.Y)/range(letter.X)
4. Feature extraction
The MAT-file featuredata.mat contains a table of features extracted from 470 letters, written by many different people. The table features has three variables: AspectRatio and Duration (the two features computed in the previous section) and Character (the known letter).
load featuredata.mat
features
scatter(features.AspectRatio,features.Duration)
It is not obvious whether these features are sufficient to distinguish the three letters in the data set (J, M, and V). The gscatter function produces a grouped scatter plot, that is, a scatter plot in which the points are colored according to a grouping variable.
gscatter(features.AspectRatio,features.Duration,features.Character)
5. Build a model and make predictions

load featuredata.mat
features
testdata
knnmodel = fitcknn(features,"Character")
Once the model has been built from the data, it can be used to classify new observations. This requires only computing the features of the new observations and determining where they lie in the prediction space.
predictions = predict(knnmodel,testdata)
By default, fitcknn fits a kNN model with k = 1. That is, the model uses only the single closest known example to classify a given observation. This makes the model sensitive to any outliers in the training data, such as the outliers highlighted in the figure above. New observations that fall near an outlier are likely to be misclassified. A simple way to address this problem is to increase k, that is, to classify using the most common class among several neighbors.
knnmodel = fitcknn(features,"Character","NumNeighbors",5)
predictions = predict(knnmodel,testdata)
6. Evaluate a model
How good is the kNN model? The testdata table contains the known classes of the test observations. By comparing the known classes with the kNN model's predictions, you can see how well the model performs on new data.
load featuredata.mat
testdata
knnmodel = fitcknn(features,"Character","NumNeighbors",5);
predictions = predict(knnmodel,testdata)
iscorrect = predictions == testdata.Character
Calculate the proportion of correct predictions by dividing the number of correct predictions by the total number of predictions. Store the result in a variable named accuracy. You can use the sum function to count the correct predictions and the numel function to determine the total number of predictions.
accuracy = sum(iscorrect)/numel(predictions)
The misclassification rate is calculated in the same way:
iswrong = predictions ~= testdata.Character
misclassrate = sum(iswrong)/numel(predictions)
Accuracy and misclassification rate each summarize the model's overall performance in a single value, but a more detailed breakdown of which classes the model confuses can be more informative. A confusion matrix shows the number of observations for each combination of true class and predicted class.

Confusion matrices are usually visualized by coloring the elements according to their values. Typically, the diagonal elements (correct classifications) are shown in one color and the off-diagonal elements (misclassifications) in another. You can use the confusionchart function to visualize a confusion matrix.
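Continuing with the variables computed in the previous section (testdata and predictions), the confusion matrix for the three-letter test set can be displayed with:

```matlab
% Visualize true vs. predicted classes for the test set
confusionchart(testdata.Character,predictions);
```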

7. Review
You now have a simple two-feature model that handles three specific letters (J, M, and V) quite well. Does this approach also work for the entire alphabet? In this exercise, you will build and test the same kNN model, but with 13 letters (half of the English alphabet).
load featuredata13letters.mat
features
testdata
gscatter(features.AspectRatio,features.Duration,features.Character)
xlim([0 10])
knnmodel = fitcknn(features,"Character","NumNeighbors",5);
predictions = predict(knnmodel,testdata);
misclass = sum(predictions ~= testdata.Character)/numel(predictions)
confusionchart(testdata.Character,predictions);
