当前位置:网站首页>Market segmentation of supermarket customers based on purchase behavior data (RFM model)
Market segmentation of supermarket customers based on purchase behavior data (RFM model)
2022-07-06 06:39:00 【Nothing (sybh)】
Catalog
subject :
Market segmentation of supermarket customers based on purchase behavior data : Existing supermarket customers' purchasing behavior RFM Data sets ( Data files :RFM data .txt), Please use various clustering algorithms to segment customer groups . Complete the following questions :
1) Analyze customers' purchasing behavior RFM Data set ,R\F\M What are the distribution characteristics of these three variables ?(10 branch )
2) Try to divide the purchase into 4 class , And analyze the buying behavior characteristics of each customer .(10 branch )
3) Evaluation model , And analyze and gather 4 Whether the class is appropriate .(30 branch )
One 、 What do you mean RFM?
RFM It is a model for clustering user quality , Corresponding to three indicators
R(Recency): The time interval of the user's last consumption , Measure whether users are likely to lose
F(Frequency)
: The cumulative consumption frequency of users in the recent period , Measure user stickiness
M(Money): The user's accumulated consumption amount in the recent period , Measure users' spending power and loyalty
Two 、 clustering
# Item 1 : E-commerce user quality RFM Clustering analysis
from sklearn.cluster import KMeans
from sklearn import metrics
import matplotlib.pyplot as plt
from sklearn import preprocessing
# Import and clean data
data = pd.read_table('RFM data .txt',sep=" ")
# data=pd.read_table("RFM data .txt",encoding="gbk",sep=" ")
# data.user_id = data.user_id.astype('str')
print(data.info())
print(data.describe())
X = data.values[:,1:]
# Data standardization (z_score)
Model = preprocessing.StandardScaler()
X = Model.fit_transform(X)
# iteration , Choose the right one K
ch_score = []
ss_score = []
inertia = []
for k in range(2,10):
clf = KMeans(n_clusters=k,max_iter=1000)
pred = clf.fit_predict(X)
ch = metrics.calinski_harabasz_score(X,pred)
ss = metrics.silhouette_score(X,pred)
ch_score.append(ch)
ss_score.append(ss)
inertia.append(clf.inertia_)
# Make a comparison
fig = plt.figure()
ax1 = fig.add_subplot(131)
plt.plot(list(range(2,10)),ch_score,label='ch',c='y')
plt.title('CH(calinski_harabaz_score)')
plt.legend()
ax2 = fig.add_subplot(132)
plt.plot(list(range(2,10)),ss_score,label='ss',c='b')
plt.title(' Profile factor ')
plt.legend()
ax3 = fig.add_subplot(133)
plt.plot(list(range(2,10)),inertia,label='inertia',c='g')
plt.title('inertia')
plt.legend()
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['font.serif'] = ['SimHei'] # Set the normal display of Chinese
plt.show()
This time, we use 3 The clustering quality is comprehensively determined by three indicators ,CH, Contour coefficient and inertia fraction , The first 1 And the 3 The bigger the better , The contour coefficient is closer to 1 The better , Taken together , Gather into 4 Class effect is better
CH The bigger the indicator , The better the clustering effect is
clf.inertia_ It is a clustering evaluation index , I often use this . Talk about his shortcomings : This evaluation parameter represents the sum of the distances from a certain point in the cluster to the cluster , Although this method shows the fineness of clustering when the evaluation parameters are the smallest , But in this case, the division will be too fine , And it does not consider maximizing the distance from the point outside the cluster , therefore , I recommend method 2 :
Use the contour coefficient method K Choice of value , Here it is , I need to explain the contour coefficient , And why the contour coefficient is selected as the standard of internal evaluation , The formula of contour coefficient is :S=(b-a)/max(a,b), among a Is the average distance between a single sample and all samples in the same cluster ,b Is the average of all samples from a single sample to different clusters .
# According to the best K value , Clustering results
model = KMeans(n_clusters=4,max_iter=1000)
model.fit_predict(X)
labels = pd.Series(model.labels_)
centers = pd.DataFrame(model.cluster_centers_)
result1 = pd.concat([centers,labels.value_counts().sort_index(ascending=True)],axis=1) # Put the cluster center and the number of clusters together
result1.columns = list(data.columns[1:]) + ['counts']
print(result1)
result = pd.concat([data,labels],axis=1) # Put the original data and clustering results together
result.columns = list(data.columns)+['label'] # Change column names
pd.options.display.max_columns = None # Set to show all columns
print(result.groupby(['label']).agg('mean')) # Calculate the mean value of each index in groups
# Map the clustering results
fig = plt.figure()
ax1= fig.add_subplot(131)
ax1.plot(list(range(1,5)),result1.R,c='y',label='R')
plt.title('R indicators ')
plt.legend()
ax2= fig.add_subplot(132)
ax2.plot(list(range(1,5)),result1.F,c='b',label='F')
plt.title('F indicators ')
plt.legend()
ax3= fig.add_subplot(133)
ax3.plot(list(range(1,5)),result1.M,c='g',label='M')
plt.title('M indicators ')
plt.legend()
plt.show()
3、 ... and 、 Result analysis
1. Losing customers : Long consumption cycle 、 Less consumption 、 Consumption capacity is almost zero
2. Active customers : The consumption cycle is short 、 More times of consumption 、 Strong consumption ability
3.VIP customer : The consumption cycle is general 、 Consumption times are average 、 Consumption capacity is very strong
4. Wandering customers : The consumption cycle is general 、 Less consumption 、 Average spending power
1. Losing customers : Long consumption cycle 、 Less consumption 、 Consumption capacity is almost zero
2.vip customer : The consumption cycle is short 、 More times of consumption 、 Consumption capacity is very strong
3. Ordinary customers : The consumption cycle is general 、 Consumption times are average 、 Average spending power
边栏推荐
- Phishing & filename inversion & Office remote template
- Avtiviti创建表时报错:Error getting a new connection. Cause: org.apache.commons.dbcp.SQLNestedException
- Day 246/300 ssh连接提示“REMOTE HOST IDENTIFICATION HAS CHANGED! ”
- 金融德语翻译,北京专业的翻译公司
- Biomedical localization translation services
- Summary of the post of "Web Test Engineer"
- Luogu p2089 roast chicken
- LeetCode - 152 乘积最大子数组
- PHP uses redis to implement distributed locks
- Past and present lives of QR code and sorting out six test points
猜你喜欢
How do programmers remember code and programming language?
It is necessary to understand these characteristics in translating subtitles of film and television dramas
Phishing & filename inversion & Office remote template
LeetCode 732. My schedule III
论文翻译英译中,怎样做翻译效果好?
How effective is the Chinese-English translation of international economic and trade contracts
Redis core technology and basic architecture of actual combat: what does a key value database contain?
机器学习植物叶片识别
Drug disease association prediction based on multi-scale heterogeneous network topology information and multiple attributes
E-book CHM online CS
随机推荐
The whole process realizes the single sign on function and the solution of "canceltoken" of undefined when the request is canceled
基于JEECG-BOOT的list页面的地址栏参数传递
[English] Verb Classification of grammatical reconstruction -- English rabbit learning notes (2)
The ECU of 21 Audi q5l 45tfsi brushes is upgraded to master special adjustment, and the horsepower is safely and stably increased to 305 horsepower
[mqtt from getting started to improving series | 01] quickly build an mqtt test environment from 0 to 1
自动化测试环境配置
Cannot create poolableconnectionfactory (could not create connection to database server. error
Wish Dragon Boat Festival is happy
mysql的基础命令
LeetCode每日一题(1997. First Day Where You Have Been in All the Rooms)
LeetCode每日一题(1870. Minimum Speed to Arrive on Time)
生物医学本地化翻译服务
LeetCode每日一题(971. Flip Binary Tree To Match Preorder Traversal)
如何将flv文件转为mp4文件?一个简单的解决办法
How to translate professional papers and write English abstracts better
女生学软件测试难不难 入门门槛低,学起来还是比较简单的
How effective is the Chinese-English translation of international economic and trade contracts
生物医学英文合同翻译,关于词汇翻译的特点
字幕翻译中翻英一分钟多少钱?
Delete external table source data