System (hierarchical) clustering method and SPSS implementation
2022-07-02 21:33:00 【caiggle】
One. Definition
Systematic clustering (hierarchical cluster method), also translated as "hierarchical clustering", is a method of cluster analysis. It starts by treating each sample as its own class. The two closest samples (the pair with the smallest distance) are merged into a small class first, then the classes are merged again according to the distance between classes, and this continues until all classes have been merged into one large class.
Note: distance between classes and distance between samples
Here is my own understanding: a class can consist of one or more samples. If both classes consist of a single sample, the distance between the classes equals the distance between the samples. If a class consists of several samples, the distance between classes has to be defined separately. The main definitions are as follows:
1. Shortest distance method (nearest neighbor, single linkage)
2. Longest distance method (furthest neighbor, complete linkage)
3. Between-groups average linkage
4. Within-groups average linkage
5. Centroid method
6. Variable group-average method
7. Sum of squared deviations method (Ward's method)
Here (in systematic clustering) we use the shortest distance method to compute the distance between classes:
As shown in the figure, the class on the left contains five points and the class on the right contains three. The distance between point 1 and point 2 is the smallest among all pairs formed by taking one point from each class, so it is the shortest distance between the two classes.
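The shortest distance rule can be sketched in a few lines of Python. This is only an illustration: the coordinates are made up (the points of the original figure are not available), `shortest_distance` is a hypothetical helper name, and `math.dist` assumes Python 3.8 or later.

```python
import math

def shortest_distance(class_a, class_b):
    """Shortest-distance (single-linkage) distance between two classes:
    the minimum Euclidean distance over all pairs with one point
    taken from each class."""
    return min(math.dist(p, q) for p in class_a for q in class_b)

# Hypothetical coordinates: five points on the left, three on the right.
left = [(0, 0), (1, 2), (0.5, 3), (2, 1), (3, 2)]
right = [(5, 2), (7, 1), (8, 3)]

# The closest cross-class pair is (3, 2) and (5, 2).
print(shortest_distance(left, right))  # 2.0
```

Only pairs that cross the two classes are compared; distances between points inside the same class play no role in this rule.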
This naturally raises the question: how do we calculate the distance between point 1 and point 2?
This is the distance between samples mentioned above. The most commonly used definition is the Euclidean distance, which comes from the formula for the distance between two points in Euclidean space.
Euclidean distance formula in two-dimensional space, for points $(x_1, y_1)$ and $(x_2, y_2)$:
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
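For example: define a=[1,5] and b=[2,1], and compute the Euclidean distance between the vectors a and b (implemented in MATLAB)
a = [1, 5]; b = [2, 1];
c = [a; b];             % stack the vectors: each row is one sample
% pdist is part of the Statistics and Machine Learning Toolbox
pdist(c, 'euclidean')   % sqrt((1-2)^2 + (5-1)^2) = sqrt(17), about 4.1231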
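Euclidean distance formula in three-dimensional space, for points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$:
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$
Euclidean distance formula in n-dimensional space
An n-dimensional Euclidean space is a set of points in which every point X can be written as $(x_1, x_2, \ldots, x_n)$. The distance between the points $X_1 = (x_1, x_2, \ldots, x_n)$ and $X_2 = (y_1, y_2, \ldots, y_n)$ can then be expressed as
$$d(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$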
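As a cross-check of the n-dimensional formula, here is a small Python sketch. The function name `euclidean_distance` is an illustrative choice, not something from the original post. It is applied to the same vectors a and b as the MATLAB example.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("both points must have the same number of coordinates")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

a = [1, 5]
b = [2, 1]
print(euclidean_distance(a, b))  # sqrt(17), about 4.1231
```

The same function covers the two- and three-dimensional cases, because the sum simply runs over however many coordinates the points have.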
Two. Idea
The idea of systematic clustering can be shown as a flowchart:
Start with n samples, each sample forming its own class.
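The procedure can also be written as a short loop. The following Python sketch is one possible reading of it, assuming Euclidean distance between samples and the shortest distance between classes. The function names and the sample points are hypothetical, and the code favors readability over speed (it recomputes class distances from the raw points in every round).

```python
import math
from itertools import combinations

def single_linkage(class_a, class_b):
    """Shortest distance between two classes of points."""
    return min(math.dist(p, q) for p in class_a for q in class_b)

def hierarchical_clustering(points, class_distance=single_linkage):
    """Merge classes step by step; return (class_a, class_b, distance) per merge."""
    classes = [[p] for p in points]  # every sample starts as its own class
    history = []
    while len(classes) > 1:
        # Find the pair of classes with the smallest between-class distance.
        i, j = min(
            combinations(range(len(classes)), 2),
            key=lambda ij: class_distance(classes[ij[0]], classes[ij[1]]),
        )
        d = class_distance(classes[i], classes[j])
        history.append((classes[i], classes[j], d))
        # Replace the two classes with their union and repeat.
        merged = classes[i] + classes[j]
        classes = [c for k, c in enumerate(classes) if k not in (i, j)] + [merged]
    return history

# Hypothetical two-dimensional samples.
samples = [(0, 0), (0, 1), (4, 0), (4, 3)]
for a, b, d in hierarchical_clustering(samples):
    print(a, "+", b, "at distance", d)
```

With n samples the loop always performs n - 1 merges. Passing a different `class_distance` function would give one of the other between-class definitions listed earlier.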
Three. Example
Suppose the quality of the milk produced in a certain place can be measured by an indicator X. There are five bottles of fresh milk, with X values of 1.5, 2.5, 5, 6.5 and 8.5. Use systematic clustering to group the five bottles of milk.
Solution: use the Euclidean distance between samples and the shortest distance between classes, and denote the samples X1, X2, X3, X4, X5.
1) The calculated distances are shown in the table
class | X1 | X2 | X3 | X4 | X5 |
X1 | 0 | 1 | 3.5 | 5 | 7 |
X2 |  | 0 | 2.5 | 4 | 6 |
X3 |  |  | 0 | 1.5 | 3.5 |
X4 |  |  |  | 0 | 2 |
X5 |  |  |  |  | 0 |
2) Merge X1 and X2, the pair with the smallest distance (1), into one class, denoted N1
class | N1 | X3 | X4 | X5 |
N1 | 0 | 2.5 | 4 | 6 |
X3 |  | 0 | 1.5 | 3.5 |
X4 |  |  | 0 | 2 |
X5 |  |  |  | 0 |
3) Merge X3 and X4 (distance 1.5) into one class, denoted N2
class | N1 | N2 | X5 |
N1 | 0 | 2.5 | 6 |
N2 |  | 0 | 2 |
X5 |  |  | 0 |
4) Merge N2 and X5 (distance 2) into one class, denoted N3
class | N1 | N3 |
N1 | 0 | 2.5 |
N3 |  | 0 |
5) Merge N1 and N3 (distance 2.5); the clustering ends
6) Draw the dendrogram (processing the data and drawing the picture by hand is tedious, so we skip it here; the next part uses SPSS to do the computation and drawing)
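The table updates in steps 1) to 5) can be replayed with a short Python sketch. It keeps a distance table and, after each merge, sets the distance from the new class to every remaining class to the smaller of the two old distances, which is the shortest distance rule. The function name `cluster_from_table` is hypothetical, and the name N4 for the final class is added here only for bookkeeping; the worked example stops at N3.

```python
from itertools import combinations

def cluster_from_table(values):
    """Single-linkage clustering of 1-D values by updating a distance table.

    Returns one (class_a, class_b, new_name, distance) tuple per merge.
    """
    names = [f"X{i + 1}" for i in range(len(values))]
    # Initial table: in one dimension the Euclidean distance is |x - y|.
    table = {frozenset((a, b)): abs(x - y)
             for (a, x), (b, y) in combinations(zip(names, values), 2)}
    active = list(names)
    merges = []
    while len(active) > 1:
        # Smallest entry among the classes still in the table.
        a, b = min(combinations(active, 2), key=lambda p: table[frozenset(p)])
        d = table[frozenset((a, b))]
        new = f"N{len(merges) + 1}"
        # Shortest-distance update: d(new, k) = min(d(a, k), d(b, k)).
        for k in active:
            if k not in (a, b):
                table[frozenset((new, k))] = min(table[frozenset((a, k))],
                                                  table[frozenset((b, k))])
        # The new class takes the place of a; b is dropped from the table.
        active = [new if k == a else k for k in active if k != b]
        merges.append((a, b, new, d))
    return merges

for a, b, new, d in cluster_from_table([1.5, 2.5, 5, 6.5, 8.5]):
    print(f"{a} + {b} -> {new} at distance {d}")
```

The merge distances (1.0, 1.5, 2.0, 2.5) are the heights at which the branches of a dendrogram for this example would join.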
Four. SPSS implementation of systematic clustering
1) Import the data (this data has no practical meaning and is for demonstration only)
This is our data in an Excel table; the file is named "milk.xlsx".
Then we open SPSS 24 (some functions may not be available in older versions).
Now that the data has been imported successfully, we can move on.
2) Start clustering
A small correction in passing: SPSS lists its clustering procedures under the "Classify" menu, but clustering and classification are in fact completely different concepts.
Select V2 as the variable and use V1 to label the cases.
Under "Plots", check "Dendrogram".
3) Generate the analysis results
Case Processing Summary (a, b)
Cases |  |  |  |  |  |
Valid |  | Missing |  | Total |  |
N | Percent | N | Percent | N | Percent |
5 | 100.0 | 0 | .0 | 5 | 100.0 |
a. Squared Euclidean distance used
b. Average linkage (between groups)
The vertical icicle plot is also generated by default, but in general it has little reference value.
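Note that the footnotes of the case summary report squared Euclidean distance and between-groups average linkage, whereas the hand calculation in the example used plain Euclidean distance and the shortest distance. The Python sketch below applies the usual textbook definition of between-groups average linkage (the mean of all pairwise distances between members of two classes) to the milk data. It is an illustration under that assumption, not a reproduction of SPSS's internal computation or output, and the function name is hypothetical.

```python
from itertools import combinations

def average_linkage_squared(values):
    """Between-groups average linkage on squared Euclidean distances (1-D data).

    Classes are lists of sample indices; returns (class_a, class_b, distance)
    for each merge.
    """
    def between(a, b):
        # Mean squared distance over all pairs with one sample from each class.
        total = sum((values[i] - values[j]) ** 2 for i in a for j in b)
        return total / (len(a) * len(b))

    classes = [[i] for i in range(len(values))]
    merges = []
    while len(classes) > 1:
        a, b = min(combinations(classes, 2), key=lambda p: between(*p))
        merges.append((a, b, between(a, b)))
        classes = [c for c in classes if c is not a and c is not b] + [a + b]
    return merges

milk = [1.5, 2.5, 5, 6.5, 8.5]
for a, b, d in average_linkage_squared(milk):
    label = lambda c: "{" + ", ".join(f"X{i + 1}" for i in sorted(c)) + "}"
    print(label(a), "+", label(b), "at", round(d, 3))
```

On this data the merge order is the same as in the hand calculation (X1 with X2, X3 with X4, then X5, then everything), but the merge distances differ because both the sample distance and the between-class rule are different.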
Five. Conclusion
This concludes the sharing on systematic clustering.
Wishing you happiness always.