[Data mining] A brief introduction to cluster analysis

2022-07-24 05:42:00 【hongdi】

Contents

I. What is cluster analysis?

II. The importance of cluster analysis

III. Types of clustering algorithms

(1) Partition-based clustering algorithms

(2) Hierarchical clustering algorithms

(3) Density-based clustering algorithms

(4) Grid-based clustering algorithms

(5) Neural-network-based clustering algorithms

(6) Statistics-based clustering algorithms

IV. Applications of cluster analysis

I. What is cluster analysis?

Cluster analysis is the process of grouping a set of physical or abstract objects into classes made up of similar objects. Its goal is to organize data into groups on the basis of similarity.

Clustering resembles classification but has a different goal: it divides a set of data into groups according to the similarities and differences within the data itself. Data in the same group are highly similar, while data in different groups are only weakly similar and weakly related. The key difference from classification is that in clustering the classes are not known in advance.

II. The importance of cluster analysis

“Birds of a feather flock together.” Grouping like with like has been a basic way for humans to understand the world and society for thousands of years. It is a universal, fundamental problem that must be faced in order to find value in big data, and it is the first problem that cognitive science, as the “discipline of disciplines”, has to solve.

Whether in politics, economics, literature, history, society, and culture, or in mathematics, chemistry, medicine, agriculture, transportation, and geography, discovering value in big data from any field, at macro or micro scale, relies on big data cluster analysis. Clustering is therefore the primary problem of data analysis and mining, and it cuts across disciplines, domains, and media. Big data clustering is a fundamental and universal problem of data-intensive science.

It is no exaggeration to say that if you are unclear about clustering algorithms, or have no practical, “down-to-earth” examples to show, then claiming to work in data mining is just empty talk.

If human cognitive science is to make a breakthrough, it first needs a breakthrough in big data clustering. Clustering is the first step in mining the value of big data assets.

III. Types of clustering algorithms

Cluster analysis is a very active research area in data mining, and many algorithms have been proposed for it.

(1) Partition-based clustering algorithms

1. k-means: a typical partition-based clustering algorithm. It represents each cluster by a cluster center, so the center chosen during iteration is not necessarily a point in the cluster. The algorithm can only handle numerical data.

2. k-modes: an extension of k-means that uses simple matching to measure the similarity of categorical data.

3. k-prototypes: combines k-means and k-modes, and can handle mixed data.

4. k-medoids: during iteration, a point in the cluster is chosen as the cluster's representative point. PAM is a typical k-medoids algorithm.

5. CLARA: builds on PAM by adding sampling, which lets it handle large-scale data.

6. CLARANS: combines the advantages of PAM and CLARA, and is the first clustering algorithm designed for spatial databases.

7. Focused CLARANS: uses spatial index techniques to improve the efficiency of CLARANS.

8. PCM: a fuzzy clustering algorithm that brings fuzzy set theory into cluster analysis.
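The contrast between k-means and k-medoids in the list above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: the function names `kmeans` and `medoid`, the choice of seeding the centers with the first k points, and the toy data are all hypothetical.

```python
from math import dist

def kmeans(points, k, iters=100):
    """Lloyd-style k-means on tuples of floats; returns (centers, labels)."""
    # Assumption for this sketch: the first k points seed the centers.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its center
        if new_centers == centers:  # converged: nothing moved
            break
        centers = new_centers
    return centers, labels

def medoid(members):
    """k-medoids style representative: the member closest to all the others."""
    return min(members, key=lambda m: sum(dist(m, q) for q in members))

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centers, labels = kmeans(points, k=2)
first_cluster = [p for p, lab in zip(points, labels) if lab == 0]
representative = medoid(first_cluster)
```

On this toy data the k-means center of the first cluster is the mean of its members and is not itself a data point, while the medoid of the same members is one of them. That is the distinction drawn between items 1 and 4.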

(2) Hierarchical clustering algorithms

1. CURE: first draws a random sample from the data set D, then partitions the sample, clusters each partition locally, and finally clusters the local clusters globally.

2. ROCK: also uses random sampling. When computing the similarity between two objects, it also takes the influence of the surrounding objects into account.

3. CHAMELEON (Chameleon algorithm): first builds a k-nearest-neighbor graph Gk from the data set, then uses a graph partitioning algorithm to divide Gk into a large number of subgraphs, each representing an initial sub-cluster. Finally, an agglomerative hierarchical clustering algorithm repeatedly merges the sub-clusters to find the real resulting clusters.

4. SBAC: when computing the similarity between objects, it considers how important each attribute is in reflecting the essence of an object, and gives higher weight to the attributes that reflect it better.

5. BIRCH: processes the data set with a tree structure. Each leaf node stores a cluster, represented by its center and radius. Objects are processed one at a time, and each is assigned to the nearest node. The algorithm can also serve as a preprocessing step for other clustering algorithms.

6. BUBBLE: extends BIRCH's concepts of center and radius to general distance spaces.

7. BUBBLE-FM: reduces the number of distance computations, improving the efficiency of BUBBLE.
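The center-and-radius summary mentioned for BIRCH can be maintained incrementally, one object at a time. The sketch below assumes the summary usually described for BIRCH, a clustering feature holding the point count, the per-dimension linear sum, and the sum of squared norms. The class name `ClusterFeature`, the helper `absorb`, and the distance threshold are hypothetical, and a flat list of leaves stands in for the tree, so this is a simplification rather than BIRCH itself.

```python
from math import dist, sqrt

class ClusterFeature:
    """Summary of a cluster: count, linear sum, and sum of squared norms."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim  # per-dimension sum of the points
        self.ss = 0.0          # sum of squared norms of the points

    def add(self, point):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def center(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square distance of the members from the center.
        c = self.center()
        variance = self.ss / self.n - sum(x * x for x in c)
        return sqrt(max(variance, 0.0))  # guard against rounding below zero

def absorb(leaves, point, threshold):
    """Assign point to the nearest leaf, or open a new leaf if none is close enough."""
    nearest = min(leaves, key=lambda cf: dist(cf.center(), point), default=None)
    if nearest is None or dist(nearest.center(), point) > threshold:
        nearest = ClusterFeature(len(point))
        leaves.append(nearest)
    nearest.add(point)
    return nearest

leaves = []
for p in [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]:
    absorb(leaves, p, threshold=3.0)
```

Because only the three summary values are stored, the raw points never have to be kept in memory, which is one reason such a summary suits a preprocessing pass before another clustering algorithm.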

(3) Density-based clustering algorithms

1. DBSCAN: a typical density-based clustering algorithm. It uses spatial index techniques to search the neighborhood of an object and introduces concepts such as “core object” and “density-reachable”. Starting from a core object, all objects that are density-reachable from it are grouped into one cluster.

2. GDBSCAN: generalizes the concept of neighborhood in DBSCAN to suit the characteristics of spatial objects.

3. OPTICS: combines automatic and interactive clustering. Based on a cluster ordering, different parameters can be set for different clusters to obtain results that satisfy the user.

4. FDC: builds a k-d tree to divide the whole data space into rectangular regions. When the dimensionality of the space is low, it can greatly improve on the efficiency of DBSCAN.
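The notions of core object and density-reachability can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and a brute-force neighborhood scan in place of the spatial index mentioned above. The function name `dbscan`, the parameter names `eps` and `min_pts`, and the noise label -1 are conventions chosen here.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it is not density-reachable."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force eps-neighborhood; includes the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core object; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is itself a core object: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The linear scan in `neighbors` makes this quadratic in the number of points. Replacing it with a spatial index is the kind of speed-up the list items above refer to.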

(4) Grid-based clustering algorithms

1. STING: stores statistical information about the data in grid cells, which makes multi-resolution clustering possible.

2. WaveCluster: introduces the wavelet transform into cluster analysis. It is mainly used in signal processing. (Note: wavelet algorithms have important applications in signal processing, graphics and images, encryption and decryption, and similar fields. They are deep and impressive.)

3. CLIQUE: a clustering algorithm that combines the grid-based and density-based approaches.
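The idea of keeping statistics per grid cell, and then treating well-populated cells as dense, can be sketched in a few lines. This is not STING or CLIQUE themselves, only an illustration of the shared idea. The function names `grid_statistics` and `dense_cells`, the uniform `cell_size`, and the `min_count` threshold are assumptions made for this example.

```python
from collections import defaultdict
from math import floor

def grid_statistics(points, cell_size):
    """Map each grid cell index to (count, per-dimension mean) of the points in it."""
    counts = defaultdict(int)
    sums = {}
    for p in points:
        cell = tuple(floor(x / cell_size) for x in p)  # integer index of the cell
        counts[cell] += 1
        previous = sums.get(cell, (0.0,) * len(p))
        sums[cell] = tuple(a + b for a, b in zip(previous, p))
    return {cell: (n, tuple(s / n for s in sums[cell])) for cell, n in counts.items()}

def dense_cells(stats, min_count):
    """Cells holding at least min_count points (a simple density criterion)."""
    return {cell for cell, (n, _) in stats.items() if n >= min_count}

points = [(0.2, 0.3), (0.8, 0.9), (0.5, 0.5), (5.1, 5.2)]
stats = grid_statistics(points, cell_size=1.0)
dense = dense_cells(stats, min_count=2)
```

Once the statistics are computed, later steps work on the cells rather than on the individual points, so their cost depends on the number of occupied cells instead of the size of the data set.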

(5) Neural-network-based clustering algorithms

1. Self-organizing map (SOM): the basic idea is that different samples are fed from outside into an artificial self-organizing map network. At first, the position of the output unit excited by an input sample varies, but after self-organization groups of units form that represent the input samples and reflect their characteristics.
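The self-organization described above can be sketched with a tiny one-dimensional map. This is a minimal illustration, assuming a chain of three units, a linearly decaying learning rate, and a neighborhood radius that stays fixed for brevity. The function names `best_matching_unit` and `train_som`, the initial weights, and the sample data are hypothetical.

```python
def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to sample x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))

def train_som(samples, weights, epochs=20, lr=0.5, radius=1):
    """Train a 1-D chain of units and return the updated weight vectors."""
    weights = [list(w) for w in weights]  # work on a copy
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate decays linearly
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for u in range(len(weights)):
                if abs(u - bmu) <= radius:  # the winner and its chain neighbors move
                    weights[u] = [w + rate * (xi - w) for w, xi in zip(weights[u], x)]
    return weights

samples = [[0.0], [0.1], [1.0], [0.9]]
trained = train_som(samples, weights=[[0.4], [0.5], [0.6]])
low_unit = best_matching_unit(trained, [0.0])
high_unit = best_matching_unit(trained, [1.0])
```

After training, similar samples excite the same unit and dissimilar samples excite different units, so the index of the winning unit can be read as a cluster label.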

(6) Statistics-based clustering algorithms

1. COBWEB: a general conceptual clustering method that represents a hierarchical clustering in the form of a classification tree.

2. AutoClass: based on a probabilistic mixture model, it describes clusters by the probability distributions of the attributes. It can handle mixed data, but requires the attributes to be independent of one another.
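The role of the independence requirement in a mixture model can be sketched as follows. This is not AutoClass itself: it only computes, for one object, the probability of belonging to each component of a given mixture, assuming numeric attributes with Gaussian distributions. The function names `gaussian` and `responsibilities` and the component parameters are hypothetical.

```python
from math import exp, pi, sqrt

def gaussian(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

def responsibilities(x, components):
    """Posterior probability of each component for the object x.

    components is a list of (weight, [(mean, std) for each attribute]).
    """
    joint = []
    for weight, params in components:
        likelihood = weight
        # Independent attributes: the joint density is a product of per-attribute densities.
        for value, (mean, std) in zip(x, params):
            likelihood *= gaussian(value, mean, std)
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]

components = [
    (0.5, [(0.0, 1.0), (0.0, 1.0)]),  # cluster centered near (0, 0)
    (0.5, [(5.0, 1.0), (5.0, 1.0)]),  # cluster centered near (5, 5)
]
posterior = responsibilities((0.2, -0.1), components)
```

Each object receives a probability for every cluster rather than a single hard label, and the product over attributes is exactly where the independence assumption enters.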

Cluster analysis is exploratory. No classification scheme has to be given in advance: cluster analysis starts from the sample data and groups it automatically. Different clustering methods often lead to different conclusions, and different researchers clustering the same data set may not obtain the same number of clusters.

IV. Applications of cluster analysis

1. Business

Cluster analysis is used to discover distinct customer groups and to characterize them through their purchasing patterns. It is an effective tool for market segmentation, and can also be used to study consumer behavior, find new potential markets, select test markets, and serve as a preprocessing step for multivariate analysis.

2. E-commerce

Cluster analysis also plays an important role in data mining for e-commerce websites. By grouping customers with similar browsing behavior and analyzing what they have in common, it helps e-commerce businesses understand their customers better and offer them more suitable services.

The material in this article comes from the Internet.

Copyright notice
This article was written by [hongdi]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/205/202207240516306655.html