当前位置:网站首页>Mathematical modeling clustering
Mathematical modeling clustering
2022-07-29 08:42:00 【Herding cattle】
Catalog
Q Type clustering
Classifying samples is called Q Type cluster analysis , Use distance to measure the similarity between sample points , Two groups Sample points The distance between is usually measured by Euclidean distance , Notice if the dimensions are different , Then it should be standardized . Mahalanobis distance does not need to consider dimension . The distance between two sample classes is also defined
mandist( matrix ) This function is used to find the matrix Column The absolute distance between two vectors
Y=pdist(X) return X Euclidean distance between row and row observations . Return value Y It's a row vector , use squareform(Y) Convert to square matrix , Row vector Y In fact, it is the value of the lower triangular matrix in matrix form .
tril( matrix ) You can intercept the triangular matrix ,nonzeros( matrix ) Remove zero elements from the matrix , Non zero elements are arranged in columns ,unique( matrix ) You can remove repeated non-zero elements ,linkage() Function can find the clustering tree , The parameter should be the row vector of distance ,dendrogram(z,num); Draw a cluster diagram ,num It's the number of nodes , Default maximum 30
Example :
| w1 | 1 | 0 |
| w2 | 1 | 1 |
| w3 | 3 | 2 |
| w4 | 4 | 3 |
| w5 | 2 | 4 |
Yes w To classify
The absolute value distance is used as the classification basis , Theoretical analysis shows the distance :

Draw a cluster diagram and a binary tree diagram :

clc,clear
a = [1,0;1,1;3,2;4,3;2,5];
d = pdist(a,'cityblock');% Calculate the row vector of the direct distance between the sample points
z = linkage(d);
dendrogram(z);% Draw a cluster diagram
T = cluster(z,'maxclust',3)% Classify into three categories
%% It's fine too :
z = linkage(a,'single','cityblock');%single Refers to the distance between classes , The latter is the sample point distance Calculation results :
z =
1 2 1
3 4 2
6 7 3
5 8 4intend :1,2 The sample points are divided into one class , yes h6,3,4 Divide into one class h7,h6、h7 It is divided into h8,h8 And sample points 5 Divide into one class .z The third column contains the connection distance between two rows of objects
T =
1
1
2
2
3T Divided into three categories , Sample points 1,2 It is divided into 1 class ...


R Type clustering
Cluster variables , Then we can find out the main factors that affect the system , There are two commonly used measures of variable similarity ① The correlation coefficient ② Angle cosine , The correlation coefficient is the most used

Variable clustering methods commonly used are the longest distance method and the shortest distance method
Example :
give 14 Correlation coefficient between variables , For this 14 Categorize variables

clc,clear
a = readmatrix('data.txt');
a(isnan(a)) = 0;
d = 1-abs(a); % Perform data transformation , Convert the correlation coefficient into distance
d = tril(d); % Propose the lower triangular matrix
b = nonzeros(d); % remove 0
b = b'; % Into a row vector
z = linkage(b,'complete'); % Cluster according to the longest distance method
y = cluster(z,'maxclust',2);% Divide the variables into 2 class
ind1 = find(y==1) % Display the corresponding variable label of the first type
ind2 = find(y==2)
h = dendrogram(z); % Draw a cluster diagram other
A = zsore(x) Standardize the data matrix , The way to deal with it is :
B = corrcoef(A) return A Matrix of correlation coefficients
边栏推荐
- Opencv cvcircle function
- 2022电工(初级)考题模拟考试平台操作
- Day5: PHP simple syntax and usage
- AI application lesson 1: C language Alipay face brushing login
- Basic shell operations (Part 1)
- 不同数据库相同字段查不重复数据
- Gan: generate adversarial networks
- WQS binary learning notes
- MySQL statement mind map
- Design of distributed (cluster) file system
猜你喜欢

2022 Shandong Province safety officer C certificate work certificate question bank and answers

C language sorts n integers with pointers pointing to pointers

The first week of postgraduate freshman training: deep learning and pytorch Foundation

Common query optimization technology of data Lake - "deepnova developer community"
![[from_bilibili_dr_can][[advanced control theory] 9_ State observer design] [learning record]](/img/9d/d9a4a3091c00ec65a9492ca49267db.png)
[from_bilibili_dr_can][[advanced control theory] 9_ State observer design] [learning record]

2022 Teddy cup data mining challenge C project and post game summary

Crawling JS encrypted data of playwright actual combat case

Tensorboard use

QT learning: use non TS files such as json/xml to realize multilingual internationalization

Centos7/8 command line installation Oracle11g
随机推荐
Sword finger offer 50. the first character that appears only once
Information system project manager must recite the quality grade of the core examination site (53)
Eggjs create application knowledge points
2022 R2 mobile pressure vessel filling test question simulation test platform operation
The computer video pauses and resumes, and the sound suddenly becomes louder
Flask reports an error runtimeerror: the session is unavailable because no secret key was set
Selenium actual combat case crawling JS encrypted data
SAP ooalv-sd module actual development case (add, delete, modify and check)
Memory leaks and common solutions
Requests library simple method usage notes
Week 1 task deep learning and pytorch Foundation
Day5: PHP simple syntax and usage
ML.NET相关资源整理
Sudoku (DFS)
Google browser cross domain configuration free
2022 spsspro certification cup mathematical modeling problem B phase II scheme and post game summary
Day13: file upload vulnerability
7.3-function-templates
access数据库可以被远程访问吗
GBase 8s数据库有哪些备份恢复方式
