当前位置：网站首页>Market segmentation of supermarket customers based on purchase behavior data (RFM model)

Market segmentation of supermarket customers based on purchase behavior data (RFM model)

2022-07-06 06:39:00 【Nothing (sybh)】

Catalog

subject ：

Market segmentation of supermarket customers based on purchase behavior data ： Existing supermarket customers' purchasing behavior RFM Data sets （ Data files ：RFM data .txt）, Please use various clustering algorithms to segment customer groups . Complete the following questions ：

1） Analyze customers' purchasing behavior RFM Data set ,R\F\M What are the distribution characteristics of these three variables ？（10 branch ）

2） Try to divide the purchase into 4 class , And analyze the buying behavior characteristics of each customer .（10 branch ）

3） Evaluation model , And analyze and gather 4 Whether the class is appropriate .（30 branch ）

One 、 What do you mean RFM?

RFM It is a model for clustering user quality , Corresponding to three indicators

R(Recency)： The time interval of the user's last consumption , Measure whether users are likely to lose
F(Frequency)
： The cumulative consumption frequency of users in the recent period , Measure user stickiness
M（Money)： The user's accumulated consumption amount in the recent period , Measure users' spending power and loyalty

Two 、 clustering

#  Item 1 ： E-commerce user quality RFM Clustering analysis 

from sklearn.cluster import KMeans
from sklearn import metrics
import matplotlib.pyplot as plt
from sklearn import preprocessing

#  Import and clean data 
data = pd.read_table('RFM data .txt',sep=" ")
# data=pd.read_table("RFM data .txt",encoding="gbk",sep=" ")
# data.user_id = data.user_id.astype('str')
print(data.info())
print(data.describe())
X = data.values[:,1:]

#  Data standardization (z_score)
Model = preprocessing.StandardScaler()
X = Model.fit_transform(X)

#  iteration , Choose the right one K
ch_score = []
ss_score = []
inertia = []
for k in range(2,10):
    clf = KMeans(n_clusters=k,max_iter=1000)
    pred = clf.fit_predict(X)
    ch = metrics.calinski_harabasz_score(X,pred)
    ss = metrics.silhouette_score(X,pred)
    ch_score.append(ch)
    ss_score.append(ss)
    inertia.append(clf.inertia_)

#  Make a comparison 
fig = plt.figure()
ax1 = fig.add_subplot(131)
plt.plot(list(range(2,10)),ch_score,label='ch',c='y')
plt.title('CH(calinski_harabaz_score)')
plt.legend()

ax2 = fig.add_subplot(132)
plt.plot(list(range(2,10)),ss_score,label='ss',c='b')
plt.title(' Profile factor ')
plt.legend()

ax3 = fig.add_subplot(133)
plt.plot(list(range(2,10)),inertia,label='inertia',c='g')
plt.title('inertia')
plt.legend()
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['font.serif'] = ['SimHei']  #  Set the normal display of Chinese 
plt.show()

This time, we use 3 The clustering quality is comprehensively determined by three indicators ,CH, Contour coefficient and inertia fraction , The first 1 And the 3 The bigger the better , The contour coefficient is closer to 1 The better , Taken together , Gather into 4 Class effect is better

CH The bigger the indicator , The better the clustering effect is

clf.inertia_ It is a clustering evaluation index , I often use this . Talk about his shortcomings ： This evaluation parameter represents the sum of the distances from a certain point in the cluster to the cluster , Although this method shows the fineness of clustering when the evaluation parameters are the smallest , But in this case, the division will be too fine , And it does not consider maximizing the distance from the point outside the cluster , therefore , I recommend method 2 ：

Use the contour coefficient method K Choice of value , Here it is , I need to explain the contour coefficient , And why the contour coefficient is selected as the standard of internal evaluation , The formula of contour coefficient is ：S=(b-a)/max(a,b), among a Is the average distance between a single sample and all samples in the same cluster ,b Is the average of all samples from a single sample to different clusters .

#  According to the best K value , Clustering results 
model = KMeans(n_clusters=4,max_iter=1000)
model.fit_predict(X)
labels = pd.Series(model.labels_)
centers = pd.DataFrame(model.cluster_centers_)
result1 = pd.concat([centers,labels.value_counts().sort_index(ascending=True)],axis=1) #  Put the cluster center and the number of clusters together 
result1.columns = list(data.columns[1:]) + ['counts']
print(result1)
result = pd.concat([data,labels],axis=1)   #  Put the original data and clustering results together 
result.columns = list(data.columns)+['label']  #  Change column names 
pd.options.display.max_columns = None  #  Set to show all columns 
print(result.groupby(['label']).agg('mean')) #  Calculate the mean value of each index in groups

#  Map the clustering results 

fig = plt.figure()
ax1= fig.add_subplot(131)
ax1.plot(list(range(1,5)),result1.R,c='y',label='R')
plt.title('R indicators ')
plt.legend()
ax2= fig.add_subplot(132)
ax2.plot(list(range(1,5)),result1.F,c='b',label='F')
plt.title('F indicators ')
plt.legend()
ax3= fig.add_subplot(133)
ax3.plot(list(range(1,5)),result1.M,c='g',label='M')
plt.title('M indicators ')
plt.legend()
plt.show()

3、 ... and 、 Result analysis

1. Losing customers ： Long consumption cycle 、 Less consumption 、 Consumption capacity is almost zero

2. Active customers ： The consumption cycle is short 、 More times of consumption 、 Strong consumption ability

3.VIP customer ： The consumption cycle is general 、 Consumption times are average 、 Consumption capacity is very strong

4. Wandering customers ： The consumption cycle is general 、 Less consumption 、 Average spending power

1. Losing customers ： Long consumption cycle 、 Less consumption 、 Consumption capacity is almost zero

2.vip customer ： The consumption cycle is short 、 More times of consumption 、 Consumption capacity is very strong

3. Ordinary customers ： The consumption cycle is general 、 Consumption times are average 、 Average spending power

原网站

版权声明
本文为[Nothing (sybh)]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/187/202207060629132881.html

当前位置：网站首页>Market segmentation of supermarket customers based on purchase behavior data (RFM model)

Market segmentation of supermarket customers based on purchase behavior data (RFM model)

subject ：

One 、 What do you mean RFM?

Two 、 clustering

3、 ... and 、 Result analysis

边栏推荐

猜你喜欢

随机推荐