[Data mining] A brief introduction to cluster analysis
2022-07-24 05:42:00 【hongdi】
Contents
One 、 What is cluster analysis ?
Two 、 Importance of cluster analysis
Three 、 Types of clustering algorithms
( One ) Partition-based clustering algorithms
( Two ) Hierarchical clustering algorithms
( Three ) Density-based clustering algorithms
( Four ) Grid-based clustering algorithms
( Five ) Neural-network-based clustering algorithms
( Six ) Statistics-based clustering algorithms
Four 、 Application of cluster analysis
One 、 What is cluster analysis ?
Cluster analysis is the process of grouping a collection of physical or abstract objects into classes made up of similar objects. Its purpose is to organize data into groups on the basis of similarity.

Clustering resembles classification but has a different purpose: it divides a set of data into several categories according to the similarities and differences found in the data itself. Data in the same category are highly similar, while data in different categories have low similarity and little association with each other. The key difference from classification is that in clustering the classes are not known in advance.
Two 、 Importance of cluster analysis
“Things of a kind come together, and people fall into groups.” This has been a basic way for humans to understand the world and society for thousands of years. It is a universal, fundamental problem that we must face in order to find value in big data, and it is the first problem that cognitive science, as the “discipline of disciplines”, has to solve.
Whether in politics, economics, literature, history, society, and culture, or in mathematics, chemistry, medicine, agriculture, transportation, and geography, discovering value in big data from any industry, at the macro or micro level, relies on big-data cluster analysis. The primary problem of data analysis and mining is therefore clustering, and this clustering is interdisciplinary, cross-domain, and cross-media. Big-data clustering is a fundamental and universal problem of data-intensive science.
It is no exaggeration to say that if you are confused about clustering algorithms, or have no “down-to-earth” “examples” to show, then claiming to work in data mining is just bluster.
If human cognitive science is to make a breakthrough, it first needs a breakthrough in big-data clustering. Clustering is the first step in mining the value of big-data assets.
Three 、 Types of clustering algorithms
Cluster analysis is a very active research area in data mining, and many algorithms have been proposed for it.
( One ) Partition-based clustering algorithms
1、k-means: A typical partition-based clustering algorithm. It represents each cluster by a cluster center, and the center chosen during iteration is not necessarily a point in the cluster. The algorithm can only handle numerical data
2、k-modes: An extension of the k-means algorithm that uses a simple matching measure to compare categorical data
3、k-prototypes: Combines the k-means and k-modes algorithms and can handle mixed data
4、k-medoids: During iteration, a point in the cluster is chosen as the cluster center. PAM is a typical k-medoids algorithm
5、CLARA: Builds on PAM by adding sampling, so it can handle large-scale data
6、CLARANS: Combines the advantages of PAM and CLARA, and is the first clustering algorithm for spatial databases
7、Focused CLARANS: Uses spatial index techniques to improve the efficiency of CLARANS
8、PCM: A fuzzy clustering algorithm, proposed by introducing fuzzy set theory into cluster analysis
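The difference between a k-means center and a k-medoids center can be seen on a tiny data set. The following is a minimal sketch in plain Python, not a reference implementation: the function names (`kmeans`, `medoid`, `assign`), the sample points, and the initial centers are all made up for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centers):
    """Assignment step: attach each point to its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: distance(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def mean(members):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(c) / len(members) for c in zip(*members))

def kmeans(points, centers, iterations=10):
    """Alternate assignment and mean updates until the centers stop moving."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = assign(points, centers)
        # An empty cluster keeps its previous center.
        new_centers = [mean(m) if m else c for m, c in zip(clusters, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def medoid(members):
    """k-medoids style representative: the member with the smallest total distance to the others."""
    return min(members, key=lambda c: sum(distance(c, o) for o in members))

# Hypothetical sample data and starting centers.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
initial_centers = [(1.0, 1.0), (8.0, 8.0)]
final_centers, clusters = kmeans(points, initial_centers)
print(final_centers)                     # means of the two groups, not data points
print([medoid(c) for c in clusters])     # actual data points
```

Here the k-means center of the first group is the mean of its three points and does not coincide with any of them, while the medoid is always one of the original points.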
( Two ) Hierarchical clustering algorithms
1、CURE: First draws a random sample from the data set D, then partitions the sample, clusters each partition locally, and finally clusters the local clusters globally
2、ROCK: Also uses random sampling. When computing the similarity between two objects, it also takes the influence of surrounding objects into account
3、CHAMELEON (the chameleon algorithm): First builds a k-nearest-neighbor graph Gk from the data set, then uses a graph-partitioning algorithm to split Gk into a large number of subgraphs, each representing an initial sub-cluster. Finally, an agglomerative hierarchical clustering algorithm repeatedly merges sub-clusters to find the real result clusters
4、SBAC: When computing the similarity between objects, takes into account how important each attribute is for reflecting the nature of the object, and gives higher weight to attributes that reflect it better
5、BIRCH: Uses a tree structure to process the data set. Each leaf node stores a cluster, represented by its center and radius. Objects are processed one at a time and assigned to the nearest node. The algorithm can also serve as a preprocessing step for other clustering algorithms
6、BUBBLE: Extends the BIRCH concepts of center and radius to general distance spaces
7、BUBBLE-FM: Reduces the number of distance computations, improving the efficiency of BUBBLE
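The "center and radius" summary that BIRCH keeps for a cluster is commonly described as a clustering feature: the number of points, their linear sum, and the sum of their squared norms. The sketch below shows only that summary, under the assumption of Euclidean data. It does not build the tree, and the class and method names are invented for illustration.

```python
import math

class ClusteringFeature:
    """Summary (N, LS, SS) of a sub-cluster; the points themselves are not stored."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # linear sum of the points, per dimension
        self.ss = 0.0           # sum of squared norms of the points

    def add(self, point):
        """Absorb one point into the summary."""
        self.n += 1
        for d, x in enumerate(point):
            self.ls[d] += x
        self.ss += sum(x * x for x in point)

    def merge(self, other):
        """Summaries are additive, so two sub-clusters merge without revisiting points."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def center(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root-mean-square distance of the summarized points from the center."""
        c = self.center()
        variance = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))   # guard against tiny negative rounding

# Hypothetical usage: two small sub-clusters merged into one.
left = ClusteringFeature(2)
for p in [(0.0, 0.0), (2.0, 0.0)]:
    left.add(p)
right = ClusteringFeature(2)
for p in [(0.0, 2.0), (2.0, 2.0)]:
    right.add(p)
left.merge(right)
print(left.n, left.center(), left.radius())
```

Because the three quantities add up under merging, each object only needs to be read once, which fits the description of processing objects in sequence.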
( Three ) Density-based clustering algorithms
1、DBSCAN: A typical density-based clustering algorithm. It uses spatial index techniques to search an object's neighborhood and introduces concepts such as “core object” and “density-reachable”. Starting from a core object, all density-reachable objects are grouped into one cluster
2、GDBSCAN: Generalizes the notion of neighborhood in DBSCAN to adapt to the characteristics of spatial objects
3、OPTICS: Combines automatic and interactive clustering. Working from a cluster ordering, different parameters can be set for different clusters to obtain results that satisfy the user
4、FDC: Builds a k-d tree to divide the whole data space into rectangular regions. When the dimensionality is low, this greatly improves on the efficiency of DBSCAN
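The ideas of “core object” and “density-reachable” can be sketched in a few lines. This is a simplified illustration rather than a production implementation: it scans all points linearly instead of using a spatial index, and the names `region_query`, `eps`, and `min_pts` as well as the sample data are assumptions made for this example.

```python
import math

NOISE = -1

def distance(a, b):
    """Euclidean distance between two numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def region_query(points, i, eps):
    """Indices of all points within eps of point i (brute force, includes i itself)."""
    return [j for j, q in enumerate(points) if distance(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # not a core object; may become a border point later
            continue
        labels[i] = cluster_id           # i is a core object: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id   # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is also a core object, keep expanding
        cluster_id += 1
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)   # [0, 0, 0, 0, 1, 1, 1, -1]
```

The isolated point at (5, 5) has too few neighbors to be a core object and is not reachable from one, so it is labeled as noise.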
( Four ) Grid-based clustering algorithms
1、STING: Uses grid cells to store statistical information about the data, enabling multi-resolution clustering
2、WaveCluster: Introduces the principle of the wavelet transform into cluster analysis. It is mainly applied in signal processing. (Note: wavelet algorithms have important applications in signal processing, graphics and images, encryption and decryption, and other fields. They are a deep and impressive subject.)
3、CLIQUE: A clustering algorithm that combines the grid-based and density-based approaches
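The grid idea can be illustrated by storing a few statistics per cell and then combining cells into a coarser level without touching the raw points again. The sketch below is a simplified stand-in, not STING or CLIQUE themselves: it keeps only a count and coordinate sums for 2-D points, and all function names, the cell size, and the density threshold are hypothetical.

```python
from collections import defaultdict

def _empty():
    return {"count": 0, "sum_x": 0.0, "sum_y": 0.0}

def grid_statistics(points, cell_size):
    """Map each 2-D point to a grid cell and accumulate per-cell statistics."""
    cells = defaultdict(_empty)
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        cell = cells[key]
        cell["count"] += 1
        cell["sum_x"] += x
        cell["sum_y"] += y
    return dict(cells)

def coarsen(cells, factor=2):
    """Build the next, coarser resolution from the stored statistics only."""
    parents = defaultdict(_empty)
    for (cx, cy), stats in cells.items():
        parent = parents[(cx // factor, cy // factor)]
        for name in stats:
            parent[name] += stats[name]
    return dict(parents)

def dense_cells(cells, min_count):
    """Cells holding at least min_count points (a density-style filter)."""
    return sorted(key for key, stats in cells.items() if stats["count"] >= min_count)

# Hypothetical data.
points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (3.2, 3.9)]
fine = grid_statistics(points, cell_size=1.0)
coarse = coarsen(fine)
print(dense_cells(fine, 2))     # [(0, 0)]
print(dense_cells(coarse, 2))   # [(0, 0)]
```

Filtering cells by count is what ties the grid view to the density view: clustering then works on a small number of cells rather than on every point.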
( Five ) Neural-network-based clustering algorithms
1、Self-organizing map (SOM) neural network: The basic idea is that different samples are fed from outside into an artificial self-organizing map network. At first, the position of the output unit excited by an input sample varies, but after self-organization, groups of units form that represent the input samples and reflect their characteristics
( Six ) Statistics-based clustering algorithms
1、COBWEB: A general conceptual clustering method that represents a hierarchical clustering in the form of a classification tree
2、AutoClass: Based on a probabilistic mixture model. It describes clusters using the probability distributions of attributes. The method can handle mixed data, but requires the attributes to be independent of each other
Cluster analysis is exploratory. No classification scheme has to be given in advance: cluster analysis starts from the sample data and groups it automatically. Different clustering methods often lead to different conclusions, and different researchers clustering the same data set may not arrive at the same number of clusters.
Four 、 Application of cluster analysis
1、 Business
Cluster analysis is used to discover different customer groups and to characterize them by their purchasing patterns. It is an effective tool for market segmentation, and it can also be used to study consumer behavior, look for new potential markets, select test markets, and serve as a preprocessing step for multivariate analysis.
2、 E-commerce
Cluster analysis is also an important part of data mining for e-commerce websites. By grouping customers with similar browsing behavior and analyzing their common characteristics, it helps e-commerce operators understand their customers better and provide them with more suitable services.
The material in this article was collected from the Internet.