Mathematical modeling clustering
2022-07-29 08:42:00 【Herding cattle】
Q-type clustering
Classifying samples is called Q-type cluster analysis. The similarity between sample points is measured by distance, and the distance between two sample points is usually the Euclidean distance. If the indicators have different dimensions (units), the data should be standardized first. The Mahalanobis distance is unaffected by the dimensions, so it needs no such standardization. A distance between two classes of samples must also be defined.
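The effect of dimensions on the distance can be checked numerically. The sketch below is only an illustration: the five sample points are those of the example in this post, and the factor 1000 is a hypothetical change of unit for the second indicator. It assumes the Statistics and Machine Learning Toolbox (pdist, zscore).

```matlab
a  = [1,0; 1,1; 3,2; 4,3; 2,5];      % five samples, two indicators
a2 = a;
a2(:,2) = 1000 * a2(:,2);            % same data, second indicator in another unit (hypothetical)

dE  = pdist(a);                      % Euclidean distance on the raw data
dE2 = pdist(a2);                     % changes when the unit changes
dZ  = pdist(zscore(a));              % Euclidean distance after standardization
dZ2 = pdist(zscore(a2));             % unaffected by the change of unit
dM  = pdist(a,  'mahalanobis');      % Mahalanobis distance on the raw data
dM2 = pdist(a2, 'mahalanobis');      % also unaffected by the change of unit

disp(max(abs(dE - dE2)))             % large
disp(max(abs(dZ - dZ2)))             % close to 0
disp(max(abs(dM - dM2)))             % close to 0
```

Only the raw Euclidean distance depends on the unit, which is why standardization is recommended before using it.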
mandist(matrix) computes the absolute-value distance between each pair of column vectors of the matrix.
Y = pdist(X) returns the Euclidean distance between each pair of rows (observations) of X. The return value Y is a row vector, and squareform(Y) converts it into a square matrix. Y holds the entries of the lower triangle of that distance matrix.
tril(matrix) extracts the lower triangular part of a matrix. nonzeros(matrix) removes the zero elements and returns the nonzero elements in column order. unique(matrix) removes repeated elements and sorts the result. linkage() builds the cluster tree; its argument is the row vector of distances (it also accepts the data matrix directly). dendrogram(z,num) draws the cluster diagram, where num is the maximum number of leaf nodes shown, 30 by default.
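How these helper functions fit together can be seen in a short sketch. It is an illustration under stated assumptions rather than part of the original listing: it uses the five sample points from the example in this post, the absolute-value ('cityblock') metric, and the variable names D, b and u are introduced here.

```matlab
a = [1,0; 1,1; 3,2; 4,3; 2,5];   % one sample per row
d = pdist(a, 'cityblock');       % row vector of the 10 pairwise distances: [1 4 6 6 3 5 5 2 4 4]
D = squareform(d);               % 5-by-5 symmetric distance matrix with a zero diagonal
b = nonzeros(tril(D))';          % lower triangle read column by column, zeros dropped
u = unique(b);                   % distinct distance values in ascending order

disp(isequal(b, d))              % 1: the lower triangle reproduces the pdist vector
```

Recovering the vector through nonzeros only works when no two distinct samples are at distance 0; otherwise that entry would be dropped as well.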
Example:
| w1 | 1 | 0 |
| w2 | 1 | 1 |
| w3 | 3 | 2 |
| w4 | 4 | 3 |
| w5 | 2 | 5 |
Classify the five samples w1 through w5.
The absolute-value distance is used as the basis for classification, and the pairwise distances can be worked out by hand.
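For reference, the hand calculation can be written out as follows. This is a sketch that takes the coordinates from the matrix a in the code listing (so w5 = (2, 5)); the symbols d and D are notation introduced here, with rows and columns ordered w1 to w5.

```latex
d(w_i, w_j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}|,
\qquad
D = \begin{pmatrix}
0 & 1 & 4 & 6 & 6 \\
1 & 0 & 3 & 5 & 5 \\
4 & 3 & 0 & 2 & 4 \\
6 & 5 & 2 & 0 & 4 \\
6 & 5 & 4 & 4 & 0
\end{pmatrix}
```

The smallest entry is d(w1, w2) = 1, followed by d(w3, w4) = 2. Under the shortest distance method the two resulting classes are 3 apart (d(w2, w3) = 3), and w5 is 4 away from the merged class (d(w3, w5) = d(w4, w5) = 4).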

Draw the cluster diagram, which is a binary tree:

clc, clear
a = [1,0; 1,1; 3,2; 4,3; 2,5];   % one sample per row
d = pdist(a, 'cityblock');       % row vector of pairwise absolute-value distances between the samples
z = linkage(d);                  % build the cluster tree (default method: shortest distance, 'single')
dendrogram(z);                   % draw the cluster diagram
T = cluster(z, 'maxclust', 3)    % divide the samples into three classes
% Equivalently, pass the data matrix to linkage directly:
z = linkage(a, 'single', 'cityblock'); % 'single' is the distance between classes, 'cityblock' the distance between sample points
Calculation results:
z =
1 2 1
3 4 2
6 7 3
5 8 4
Meaning: sample points 1 and 2 are merged into one class, h6; sample points 3 and 4 are merged into one class, h7; h6 and h7 are merged into h8; finally h8 and sample point 5 are merged into one class. The third column of z holds the linkage distance between the two objects in each row.
T =
1
1
2
2
3
T assigns the samples to three classes: sample points 1 and 2 fall in class 1, and so on.
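Because the third column of z stores the linkage distances (1, 2, 3, 4 here), the tree can also be cut at a distance threshold instead of a fixed number of classes. The sketch below is an illustration only: the threshold 2.5 is a hypothetical choice that lies between the second and third merge, and the names T1, T2 and samePartition are introduced here.

```matlab
a  = [1,0; 1,1; 3,2; 4,3; 2,5];
z  = linkage(a, 'single', 'cityblock');
T1 = cluster(z, 'maxclust', 3);                           % ask for three classes
T2 = cluster(z, 'cutoff', 2.5, 'criterion', 'distance');  % cut the tree at distance 2.5

% Class labels may be numbered differently, so compare which samples share a class.
samePartition = isequal(T1 == T1', T2 == T2');
disp(samePartition)
```

Both calls group samples 1 and 2 together, samples 3 and 4 together, and leave sample 5 alone.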


R-type clustering
Clustering the variables is called R-type clustering. It helps identify the main factors that affect the system. Two measures of similarity between variables are commonly used: (1) the correlation coefficient and (2) the cosine of the included angle. The correlation coefficient is used most often.
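The two measures can be written in their standard textbook form. The notation is an assumption introduced here: x_{ij} is the i-th of n observations of variable x_j, and \bar{x}_j is the mean of that variable.

```latex
% (1) correlation coefficient between variables x_j and x_k
r_{jk} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}
              {\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \, \sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2}}

% (2) cosine of the included angle between x_j and x_k
\cos\theta_{jk} = \frac{\sum_{i=1}^{n} x_{ij} x_{ik}}
                       {\sqrt{\sum_{i=1}^{n} x_{ij}^2 \, \sum_{i=1}^{n} x_{ik}^2}}
```

Both values lie in [-1, 1], and an absolute value close to 1 means the two variables are very similar.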

The methods commonly used to cluster variables are the longest distance method and the shortest distance method.
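As a sketch of what the two methods compute, assume the distance between variables x_j and x_k is taken as d_{jk} = 1 - |r_{jk}|, the transformation used in the code of this post. G_1 and G_2 denote two classes of variables, and D is notation introduced here for the distance between classes.

```latex
% shortest distance method ('single' in linkage)
D(G_1, G_2) = \min_{x_j \in G_1,\; x_k \in G_2} d_{jk}

% longest distance method ('complete' in linkage)
D(G_1, G_2) = \max_{x_j \in G_1,\; x_k \in G_2} d_{jk}
```

At each step the two classes with the smallest D are merged, so the longest distance method merges two classes only when even their least similar pair of variables is close.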
Example:
Given the correlation coefficients between 14 variables, classify these 14 variables.

clc, clear
a = readmatrix('data.txt');    % correlation coefficients of the 14 variables
a(isnan(a)) = 0;               % replace NaN entries with 0
d = 1 - abs(a);                % convert the correlation coefficients into distances
d = tril(d);                   % keep the lower triangular part
b = nonzeros(d);               % drop the zeros (assumes no off-diagonal distance is exactly 0)
b = b';                        % convert to a row vector, as linkage expects
z = linkage(b, 'complete');    % cluster with the longest distance method
y = cluster(z, 'maxclust', 2); % divide the variables into 2 classes
ind1 = find(y == 1)            % indices of the variables in the first class
ind2 = find(y == 2)            % indices of the variables in the second class
h = dendrogram(z);             % draw the cluster diagram
Other
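A = zscore(x) standardizes the data matrix x column by column: the column mean is subtracted from each entry and the result is divided by the column standard deviation.
B = corrcoef(A) returns the matrix of correlation coefficients between the columns of A.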
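These two functions allow R-type clustering to start from raw observations instead of a ready-made correlation matrix. The sketch below is an illustration on hypothetical random data (50 observations of 6 variables, with variable 2 built to follow variable 1); the names x, mask and Bx are introduced here, and the distance 1 - |r| follows the code above.

```matlab
rng(0);                                   % reproducible hypothetical data
x = randn(50, 6);                         % 50 observations of 6 variables
x(:,2) = x(:,1) + 0.1 * randn(50, 1);     % make variable 2 strongly correlated with variable 1

A  = zscore(x);                           % standardize each column
B  = corrcoef(A);                         % 6-by-6 correlation matrix of the variables
Bx = corrcoef(x);                         % same values up to rounding: correlation ignores scale

d    = 1 - abs(B);                        % convert correlation into distance
mask = tril(true(size(d)), -1);           % strictly lower triangle, diagonal excluded
b    = d(mask)';                          % row vector in the order linkage expects
z    = linkage(b, 'complete');            % longest distance method
y    = cluster(z, 'maxclust', 2);         % divide the variables into 2 classes

disp(y')                                  % variables 1 and 2 receive the same label
```

Selecting the lower triangle with a logical mask instead of nonzeros keeps every pair, even if some distance happens to be exactly 0.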
