Hierarchical (System) Clustering and Its SPSS Implementation
2022-07-02 21:33:00 【caiggle】
One. Definition
Hierarchical clustering (hierarchical cluster method), also translated as "system clustering", is a method of cluster analysis. It starts by treating each sample as a class of its own. The two closest samples (the pair with the smallest distance) are merged into a small class first, the resulting classes are then merged again according to the distance between classes, and the process continues until all classes have been merged into one large class.
Note: distance between classes versus distance between samples
Here is my own understanding: a class can consist of one or more samples. If each class contains only one sample, the distance between classes equals the distance between the samples. If a class contains several samples, the distance between classes has to be defined separately. The main definitions are:
1. Shortest distance method (single linkage)
2. Longest distance method (complete linkage)
3. Between-groups average linkage
4. Within-groups average linkage
5. Centroid method
6. Variable average method
7. Sum of squared deviations method (Ward's method)
In this article, the distance between classes is computed with the shortest distance method:
As shown in the figure, the class on the left contains five points and the class on the right contains three. The distance between point 1 and point 2 is the smallest of all the distances between a point in one class and a point in the other, so it is the shortest distance between the two classes.
This raises a natural question: how is the distance between point 1 and point 2 calculated?
This is the distance between samples mentioned above. The most commonly used definition is the Euclidean distance, which comes from the formula for the distance between two points in Euclidean space.
Euclidean distance formula in two-dimensional space, for the points (x_1, y_1) and (x_2, y_2):
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
For example, let a=[1,5] and b=[2,1], and compute the Euclidean distance between the vectors a and b (implemented in MATLAB):
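a = [1, 5]; b = [2, 1];   % the two sample vectors
c = [a; b];               % stack them as the rows of a 2-by-2 matrix
pdist(c, 'euclidean')     % Euclidean distance between the rows: sqrt(17), about 4.1231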
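Euclidean distance formula in three-dimensional space, for the points (x_1, y_1, z_1) and (x_2, y_2, z_2):
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$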
Euclidean distance formula in n-dimensional space
An n-dimensional Euclidean space is a set of points in which every point X can be written as (x[1], x[2], …, x[n]). The distance between the points X1 = (x[1], x[2], …, x[n]) and X2 = (y[1], y[2], …, y[n]) can then be expressed as
$$d(X1, X2) = \sqrt{\sum_{i=1}^{n} (x[i] - y[i])^2}$$
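The n-dimensional Euclidean distance described above can also be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `euclidean_distance` is made up for this example, and the vectors are the same a and b used in the MATLAB snippet.

```python
import math

def euclidean_distance(p, q):
    """Return the Euclidean distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Same vectors as the MATLAB example: a = [1, 5], b = [2, 1]
a = [1, 5]
b = [2, 1]
d_ab = euclidean_distance(a, b)  # sqrt(1 + 16) = sqrt(17)
print(d_ab)
```

The same function works for any dimension, because it only iterates over paired coordinates.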
Two. Idea
I use a flow chart to show the idea of hierarchical clustering:
Suppose there are initially n samples, each forming a class of its own.
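The procedure (start with n single-sample classes, repeatedly merge the two closest classes, stop when one class remains) can be sketched as follows. This is a minimal sketch, not an optimized implementation: it assumes one-dimensional samples and the shortest distance between classes, and the name `single_linkage` and the toy input are hypothetical.

```python
from itertools import combinations

def single_linkage(values):
    """Cluster 1-D values bottom-up with the shortest-distance rule.

    Returns the merge history as (class_a, class_b, distance) tuples,
    where each class is a tuple of 0-based sample indices.
    """
    def class_distance(a, b):
        # Shortest distance: smallest sample distance across the two classes.
        return min(abs(values[i] - values[j]) for i in a for j in b)

    clusters = [(i,) for i in range(len(values))]  # every sample starts alone
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: class_distance(*p))
        history.append((a, b, class_distance(a, b)))
        # Replace the two merged classes with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return history

toy_history = single_linkage([0.0, 1.0, 5.0])  # hypothetical toy values
print(toy_history)
```

Recomputing every class distance from the raw samples in each round keeps the sketch short; for larger data one would normally update a distance table instead.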
Three. Example
Suppose the quality of the milk produced in a certain place can be measured by an index X. There are five bottles of fresh milk, with X values of 1.5, 2.5, 5, 6.5, and 8.5. Use hierarchical clustering to group the five bottles of milk.
Solution: use the Euclidean distance between samples and the shortest distance between classes, and denote the samples X1, X2, X3, X4, X5.
1) The calculated distances are shown in the table
class | X1 | X2 | X3 | X4 | X5 |
X1 | 0 | 1 | 3.5 | 5 | 7 |
X2 | | 0 | 2.5 | 4 | 6 |
X3 | | | 0 | 1.5 | 3.5 |
X4 | | | | 0 | 2 |
X5 | | | | | 0 |
2) Merge X1 and X2 (the smallest distance, 1) into one class, denoted N1
class | N1 | X3 | X4 | X5 |
N1 | 0 | 2.5 | 4 | 6 |
X3 | | 0 | 1.5 | 3.5 |
X4 | | | 0 | 2 |
X5 | | | | 0 |
3) Merge X3 and X4 (the smallest distance, 1.5) into one class, denoted N2
class | N1 | N2 | X5 |
N1 | 0 | 2.5 | 6 |
N2 | | 0 | 2 |
X5 | | | 0 |
4) Merge N2 and X5 (the smallest distance, 2) into one class, denoted N3
class | N1 | N3 |
N1 | 0 | 2.5 |
N3 | | 0 |
5) Merge N1 and N3 (distance 2.5) into one class; the clustering ends
6) Draw the dendrogram (processing the data and drawing the figure by hand is tedious, so we skip it here; the next part uses SPSS to draw it)
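Steps 1) to 5) can be checked with a short Python sketch that keeps a distance table and, after each merge, updates it with the shortest-distance rule d(new, k) = min(d(a, k), d(b, k)). This is only an illustration under the example's assumptions (Euclidean distance on the five X values, shortest distance between classes). The final class is named N4 here purely for bookkeeping; the text above does not name it.

```python
x = {"X1": 1.5, "X2": 2.5, "X3": 5.0, "X4": 6.5, "X5": 8.5}

# Initial table: Euclidean distance |x_a - x_b| for every unordered pair.
dist = {frozenset((a, b)): abs(x[a] - x[b]) for a in x for b in x if a < b}
classes = list(x)
merges = []  # (new class name, merged pair, merge distance)

while len(classes) > 1:
    # Find the closest pair of classes in the current table.
    candidates = [frozenset((a, b)) for i, a in enumerate(classes) for b in classes[i + 1:]]
    pair = min(candidates, key=dist.get)
    a, b = sorted(pair)
    new = f"N{len(merges) + 1}"
    # Shortest-distance update: d(new, k) = min(d(a, k), d(b, k)).
    for k in classes:
        if k not in pair:
            dist[frozenset((new, k))] = min(dist[frozenset((a, k))], dist[frozenset((b, k))])
    classes = [k for k in classes if k not in pair] + [new]
    merges.append((new, (a, b), dist[pair]))

for name, members, d in merges:
    print(f"{name} = {members[0]} + {members[1]} at distance {d:g}")
```

The merges occur at distances 1, 1.5, 2, and 2.5, in the same order as the hand calculation; these merge distances are the heights at which the branches of the dendrogram join.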
Four. SPSS Implementation of Hierarchical Clustering
1) Import the data (the data has no practical meaning and is for demonstration only)
This is our data in the Excel sheet; the file is named "milk.xlsx"
Then we open SPSS 24 (if the version is too old, some functions may not be available)
Now that the data has been imported successfully, we can move on.
2) Start clustering
One small point is worth correcting here: SPSS places its clustering procedures under the "Classify" menu, but clustering and classification are in fact entirely different concepts.
Select V2 as the variable and V1 as the variable to label cases by.
Under "Plots", check "Dendrogram".
3) Generate the analysis results
Case Processing Summary (a, b)
Cases |
Valid | | Missing | | Total | |
N | Percent | N | Percent | N | Percent |
5 | 100.0 | 0 | .0 | 5 | 100.0 |
a. Squared Euclidean distance used
b. Average linkage (between groups)
A vertical icicle plot is also generated by default, but in general it is of little reference value.
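The footnotes of the case summary indicate that this run used squared Euclidean distance with between-groups average linkage, whereas the hand calculation in the example used Euclidean distance with the shortest distance. The sketch below applies those two settings in plain Python to the five X values from the example. It is not SPSS's own routine, the name `average_linkage` is hypothetical, and whether milk.xlsx contains the same five values is not stated, so treat it only as an illustration of how the class distance changes.

```python
from itertools import combinations

x = [1.5, 2.5, 5.0, 6.5, 8.5]  # X values from the worked example

def average_linkage(values):
    """Between-groups average linkage on squared Euclidean distances."""
    def class_distance(a, b):
        # Mean squared distance over all cross-class sample pairs.
        total = sum((values[i] - values[j]) ** 2 for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(len(values))]  # every sample starts alone
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: class_distance(*p))
        history.append((a, b, class_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return history

avg_history = average_linkage(x)
for a, b, d in avg_history:
    print(a, b, round(d, 4))
```

For these five values the classes are merged in the same order as with the shortest distance method (X1 with X2, X3 with X4, then X5, then everything); only the merge heights differ.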
Five. Conclusion
This concludes the introduction to hierarchical clustering.
Wishing you happiness always.