Mathematical modeling clustering
2022-07-29 08:42:00 【Herding cattle】
Q-type clustering
Classifying samples is called Q-type cluster analysis. The similarity between sample points is measured by distance, and the distance between two sample points is usually the Euclidean distance. If the variables have different units or scales, the data should be standardized first. The Mahalanobis distance is unaffected by units, so it needs no such step. A distance between two classes of samples must also be defined.
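As a hedged illustration of the standardization remark, the sketch below uses a small made-up matrix X (not from the original post) whose two columns are on very different scales. It assumes the Statistics and Machine Learning Toolbox functions zscore and pdist are available.

```matlab
% Hypothetical data: 5 samples, 2 variables on very different scales
X = [1.0 200; 1.2 180; 0.9 450; 1.5 300; 1.1 260];

dRaw = pdist(X);                 % Euclidean distance, dominated by column 2
dStd = pdist(zscore(X));         % Euclidean distance after standardization
dMah = pdist(X,'mahalanobis');   % Mahalanobis distance, no standardization needed

% Rescale column 2 (for example, a change of units) and recompute
X2 = X .* [1 0.001];
dStd2 = pdist(zscore(X2));
dMah2 = pdist(X2,'mahalanobis');

% The standardized and Mahalanobis distances ignore the change of units
disp(max(abs(dStd - dStd2)))
disp(max(abs(dMah - dMah2)))
```

The raw Euclidean distances change with the units of column 2, while the other two sets of distances agree up to rounding.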
mandist(matrix) computes the absolute-value (Manhattan) distance between each pair of column vectors of the matrix.
Y = pdist(X) returns the Euclidean distances between pairs of rows (observations) of X. The return value Y is a row vector; squareform(Y) converts it to a square matrix. Y holds the entries of the strictly lower triangle of that distance matrix, read column by column.
tril(matrix) extracts the lower triangular part of a matrix.
nonzeros(matrix) drops the zero elements and returns the nonzero ones as a column vector, in column order.
unique(matrix) removes repeated elements.
linkage() builds the hierarchical cluster tree; its argument should be the row vector of distances, as returned by pdist.
dendrogram(z,num) draws the cluster diagram (dendrogram); num is the maximum number of leaf nodes shown, 30 by default.
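The relationship between pdist, squareform, tril and nonzeros can be checked directly. The sketch below reuses the five sample points from the matrix a in the example's code; the variable names D, L, b and u are only illustrative.

```matlab
a = [1,0; 1,1; 3,2; 4,3; 2,5];   % sample points from the example's code
Y = pdist(a,'cityblock');        % 1-by-10 row vector of pairwise distances
D = squareform(Y);               % 5-by-5 symmetric matrix, zeros on the diagonal
L = tril(D);                     % lower triangular part of D
b = nonzeros(L)';                % nonzero entries in column order, as a row vector
disp(isequal(b, Y))              % true here because no two points coincide
u = unique(Y);                   % distinct distance values, sorted ascending
disp(u)
```

Note that the tril/nonzeros route only reproduces Y when every pairwise distance is nonzero; two identical points would have their zero distance dropped.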
Example :
| w1 | 1 | 0 |
| w2 | 1 | 1 |
| w3 | 3 | 2 |
| w4 | 4 | 3 |
| w5 | 2 | 5 |
Classify the five samples w1 to w5.
The absolute-value (city-block) distance is used as the basis for classification. The pairwise distances can be worked out by hand:
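For reference, the hand calculation can be written out as follows. This is a reconstruction, not a copy of the original figure: it assumes the coordinates in the matrix a of the code listing (so w5 = (2,5)) and the shortest distance rule between classes.

```latex
d(w_i, w_j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}|, \qquad
D = \begin{pmatrix}
0 & 1 & 4 & 6 & 6 \\
1 & 0 & 3 & 5 & 5 \\
4 & 3 & 0 & 2 & 4 \\
6 & 5 & 2 & 0 & 4 \\
6 & 5 & 4 & 4 & 0
\end{pmatrix}
```

The smallest off-diagonal entry is d(w1,w2) = 1, so w1 and w2 merge first. Next comes d(w3,w4) = 2. The shortest distance between {w1,w2} and {w3,w4} is min(4, 6, 3, 5) = 3, and w5 joins last at min(6, 5, 4, 4) = 4.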

Draw the cluster diagram (dendrogram), which is a binary tree:

clc, clear
a = [1,0; 1,1; 3,2; 4,3; 2,5];
d = pdist(a,'cityblock'); % row vector of pairwise distances between sample points
z = linkage(d);           % default method is 'single' (shortest distance)
dendrogram(z);            % draw the cluster diagram
T = cluster(z,'maxclust',3) % cut the tree into three classes
%% Alternatively:
z = linkage(a,'single','cityblock'); % 'single' is the distance between classes; 'cityblock' is the distance between sample points
Calculation results:
z =
1 2 1
3 4 2
6 7 3
5 8 4
Meaning: sample points 1 and 2 are merged into one class, h6; sample points 3 and 4 are merged into class h7; h6 and h7 are merged into h8; and h8 is merged with sample point 5. The third column of z holds the linkage distance between the two objects merged in each row.
T =
1
1
2
2
3
T assigns the samples to three classes: sample points 1 and 2 are in class 1, ...
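As a quick sanity check on this reading of z and T, the sketch below rebuilds the tree and compares groupings rather than label values, since the numbering of classes returned by cluster is arbitrary. The 'cutoff' call and its threshold of 2.5 are illustrative choices, not part of the original example.

```matlab
a = [1,0; 1,1; 3,2; 4,3; 2,5];
z = linkage(a,'single','cityblock');
heights = z(:,3)';                      % merge distances, one per row of z
disp(heights)

T = cluster(z,'maxclust',3);            % ask for three classes
% Cut the tree by height instead: only merges below distance 2.5 are kept
Tc = cluster(z,'cutoff',2.5,'criterion','distance');

% Compare groupings, not label values
sameGroup = @(t) [t(1)==t(2), t(3)==t(4), t(5)~=t(1), t(5)~=t(3), t(1)~=t(3)];
disp(sameGroup(T))
disp(sameGroup(Tc))
```

Both calls should put samples 1 and 2 together, samples 3 and 4 together, and sample 5 alone.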


R-type clustering
Clustering the variables helps identify the main factors that affect the system. Two measures of similarity between variables are in common use: ① the correlation coefficient and ② the angle cosine. The correlation coefficient is used most often.
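The two similarity measures are closely related, which the following sketch tries to show on a small made-up data matrix X (hypothetical, not from the original post). It assumes pdist with the 'cosine' option, which returns one minus the angle cosine.

```matlab
% Hypothetical data: 5 observations (rows) of 3 variables (columns)
X = [1 2 9; 2 4 7; 3 5 4; 4 9 1; 5 10 2];

R = corrcoef(X);                          % correlation coefficients between columns
C = 1 - squareform(pdist(X','cosine'));   % angle cosines between columns

% After centering each column, the angle cosine equals the correlation coefficient
Xc = X - mean(X);
Cc = 1 - squareform(pdist(Xc','cosine'));
disp(max(abs(R(:) - Cc(:))))              % close to zero, up to rounding
```

In other words, the correlation coefficient is the angle cosine of the mean-centered variables, so the two measures differ only in whether the column means are removed.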

The variable clustering methods commonly used are the longest distance method (complete linkage) and the shortest distance method (single linkage).
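To contrast the two rules, the sketch below applies both to the same distance vector. For convenience it reuses the five sample points from the Q-type example rather than a set of variables, which is an assumption made only for illustration: the shortest distance method takes the minimum pairwise distance between two classes, the longest distance method takes the maximum.

```matlab
a = [1,0; 1,1; 3,2; 4,3; 2,5];     % sample points reused for illustration
d = pdist(a,'cityblock');

zs = linkage(d,'single');          % shortest distance between classes
zc = linkage(d,'complete');        % longest distance between classes

disp(zs(:,3)')                     % merge heights under the shortest distance method
disp(zc(:,3)')                     % merge heights under the longest distance method
```

With these data the first two merges are the same under both rules, but the later merges happen at larger heights under the longest distance method, and sample 5 joins {3,4} before that group joins {1,2}.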
Example :
Given the correlation coefficients among 14 variables, cluster these 14 variables.

clc, clear
a = readmatrix('data.txt');
a(isnan(a)) = 0;   % replace NaN entries with 0
d = 1 - abs(a);    % convert the correlation coefficients into distances
d = tril(d);       % keep the lower triangular part
b = nonzeros(d);   % drop the zeros (this would also drop any pair with |r| = 1)
b = b';            % convert to a row vector
z = linkage(b,'complete');   % cluster with the longest distance method
y = cluster(z,'maxclust',2); % divide the variables into 2 classes
ind1 = find(y==1)  % labels of the variables in the first class
ind2 = find(y==2)  % labels of the variables in the second class
h = dendrogram(z); % draw the cluster diagram
Other
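A = zscore(x) standardizes the data matrix x column by column: each column has its mean subtracted and is then divided by its standard deviation.
B = corrcoef(A) returns the matrix of correlation coefficients between the columns of A.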
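Putting zscore and corrcoef together with the R-type steps, a minimal sketch starting from raw observations might look like the following. The data x are randomly generated for illustration only, and squareform is used here as an alternative to the tril/nonzeros route for building the distance vector; both choices are assumptions, not part of the original example.

```matlab
rng(0);                              % hypothetical data, for illustration only
x = randn(50, 4);                    % 50 observations of 4 variables
x(:,2) = x(:,1) + 0.1*randn(50,1);   % make variables 1 and 2 strongly correlated

A = zscore(x);                       % standardize each column
B = corrcoef(A);                     % correlation matrix of the variables
d = 1 - abs(B);                      % convert correlations into distances
d = (d + d')/2;                      % enforce exact symmetry
d(1:size(d,1)+1:end) = 0;            % force exact zeros on the diagonal
b = squareform(d);                   % row vector of distances in pdist order

z = linkage(b,'complete');           % longest distance method
y = cluster(z,'maxclust',2);         % divide the variables into 2 classes
disp(y')
```

Because squareform keeps zero distances, a pair of variables with |r| = 1 would not be lost here. Standardizing first does not change the correlation matrix, so corrcoef(x) and corrcoef(A) agree up to rounding.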
