What Is Cohort Analysis? How to Do It with SQL
2022-06-29 00:41:00 【俊红的数据分析之路】
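Contents
I. Definition of cohort analysis
II. SQL steps
1. Inspect the data
2. Aggregate users by uid and year-month
3. Compute the year-month difference (in days)
4. Compute the year-month difference (in months)
5. Pivot (count users per month offset, grouped by first-payment year-month)
6. Compute the retention rate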
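I. Definition of cohort analysis
"Cohort analysis" is a segmentation method that groups users and then analyzes the groups along two axes:
"Horizontally": how a single cohort changes as time periods pass
"Vertically": how different cohorts differ at the same stage of their lifecycle
A "cohort" is a group of users from the same period, for example users who registered on the same day or users who made their first payment on the same day.
"Metric change over the period" refers to metrics such as retention rate and payment rate, measured for those users within a given period.
Cohort analysis has three core elements:
"Time of the customer's first action": the baseline used to define each cohort
"Time-period dimension": the N in N-day retention or N-day conversion, usually expressed as +N days or +N months
"The metric that changes": for example registration conversion rate, payment conversion rate, or retention rate
Cohort analysis yields finer-grained metrics than overall averages. It makes it possible to monitor actual user behavior in real time, measure user value, and support the optimization of marketing plans, while avoiding vanity numbers that have been "averaged out".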
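To make the three elements concrete, here is a minimal Python sketch. It is an illustration only, not part of the SQL walkthrough in this article: the function name `cohort_counts` and the sample `orders` records are hypothetical. It groups users by the month of their first order and counts distinct active users per month offset.

```python
from collections import defaultdict
from datetime import date

def month_index(d):
    """Map a date to a running month number so offsets are plain subtraction."""
    return d.year * 12 + (d.month - 1)

def cohort_counts(orders):
    """orders: iterable of (uid, order_date).
    Returns {cohort 'YYYY-MM': {month offset: number of distinct users}}."""
    orders = list(orders)
    # Element 1: each user's first action time defines the cohort.
    first = {}
    for uid, d in orders:
        m = month_index(d)
        if uid not in first or m < first[uid]:
            first[uid] = m
    # Element 2: the period dimension is the month offset from the first month.
    users = defaultdict(lambda: defaultdict(set))
    for uid, d in orders:
        offset = month_index(d) - first[uid]
        users[first[uid]][offset].add(uid)  # a set de-duplicates repeat orders
    # Element 3: the metric here is the count of distinct paying users.
    result = {}
    for cohort, by_offset in users.items():
        label = f"{cohort // 12:04d}-{cohort % 12 + 1:02d}"
        result[label] = {off: len(s) for off, s in sorted(by_offset.items())}
    return result

# Hypothetical sample data: (uid, order date)
orders = [
    (1, date(2022, 1, 5)), (1, date(2022, 1, 20)), (1, date(2022, 2, 3)),
    (2, date(2022, 1, 9)), (2, date(2022, 3, 1)),
    (3, date(2022, 2, 14)), (3, date(2022, 3, 2)),
]
counts = cohort_counts(orders)
```

With this sample, the January cohort has two users in its first month, one of whom orders again at +1 month and the other at +2 months. Reading one cohort's row across offsets is the horizontal view; comparing the same offset across cohorts is the vertical view.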
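II. SQL steps
Below I use PostgreSQL to build, step by step, a cohort report of user retention based on the first-order date. "Each step builds on the result of the previous step", which shows up in the code as nested subqueries. Once the logic is clear, it turns out to be quite simple.
The key points are:
Find each user's "first-order time"
Compute the "date difference" between the first-order time and each actual order time
Count paying users with "de-duplication" (distinct)
Pay attention to "type conversion" between fields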
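The date-difference point deserves care. The number of days between two month starts ranges from 28 to 31, so converting days to months by integer division with 30 can put an order in the wrong month bucket. The sketch below is in Python for illustration only, and the helper names are hypothetical. It contrasts that approximation with an exact calendar-month difference.

```python
from datetime import date

def months_between(first, later):
    """Exact calendar-month difference, ignoring the day of the month."""
    return (later.year - first.year) * 12 + (later.month - first.month)

def months_by_days(first, later):
    """Approximation: days between the two month starts, integer-divided by 30."""
    a = first.replace(day=1)
    b = later.replace(day=1)
    return (b - a).days // 30

feb, mar = date(2022, 2, 10), date(2022, 3, 5)
exact = months_between(feb, mar)    # 1: March is one month after February
approx = months_by_days(feb, mar)   # 0: 28 days // 30 truncates to zero
```

A user whose first order is in February and who returns in March should count toward the +1 month bucket. The truncating version would fold that user back into the first month and understate retention.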
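1. Inspect the data
-- 1. Inspect the data
-- "日志" is the order log table; the queries below use its uid and "日期" (order date) columns.
SELECT * FROM "日志" LIMIT 10;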
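2. Aggregate users by uid and year-month
-- 2. Aggregate users by uid and year-month
-- Assumption: "日期" is stored as text such as '2022-01-15'. to_date(..., 'YYYY-MM') reads
-- only the year and month, so every order maps to the first day of its month.
SELECT
    "日志".uid,
    to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM') AS 年月,  -- order year-month
    -- earliest year-month per user = first-payment year-month, which defines the cohort
    min(to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM')) OVER (PARTITION BY "日志".uid) AS 首次付费年月
FROM
    "日志"
GROUP BY
    "日志".uid,
    to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM')
ORDER BY "日志".uid;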
3. Compute the year-month difference (in days)
-- 3. Compute the year-month difference (in days)
-- Subtracting two dates in PostgreSQL yields an integer number of days.
SELECT m.*,
    to_date(m.年月, 'YYYY-MM') - to_date(m.首次付费年月, 'YYYY-MM') AS 天数差额  -- day difference
FROM (
    SELECT
        "日志".uid,
        to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM') AS 年月,
        min(to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM')) OVER (PARTITION BY "日志".uid) AS 首次付费年月
    FROM
        "日志"
    GROUP BY
        "日志".uid,
        to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM')
) AS m
ORDER BY m.uid;
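4. Compute the year-month difference (in months)
-- 4. Compute the year-month difference (in months)
-- Both dates are month starts, so the gap is 0 days for the first month, 28-31 days for
-- +1 month, 59-62 for +2, 89-92 for +3 and 120-123 for +4. Each threshold is the upper
-- bound of its range; cutting at multiples of 30 would misplace months of 28 to 30 days.
-- Labels: '首月' = first month, '+N月' = +N months.
SELECT d.*,
    (CASE WHEN d."天数差额" = 0 THEN '首月'
          WHEN d."天数差额" <= 31 THEN '+1月'
          WHEN d."天数差额" <= 62 THEN '+2月'
          WHEN d."天数差额" <= 92 THEN '+3月'
          WHEN d."天数差额" <= 123 THEN '+4月'
          ELSE NULL
     END) AS 月差额  -- month difference label
FROM (
    SELECT m.*,
        to_date(m.年月, 'YYYY-MM') - to_date(m.首次付费年月, 'YYYY-MM') AS 天数差额
    FROM (
        SELECT
            "日志".uid,
            to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM') AS 年月,
            min(to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM')) OVER (PARTITION BY "日志".uid) AS 首次付费年月
        FROM
            "日志"
        GROUP BY
            "日志".uid,
            to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM')
    ) AS m
) AS d
ORDER BY d.uid;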
5. Pivot (count users per month offset, grouped by first-payment year-month)
-- 5. Pivot (count users per month offset, grouped by first-payment year-month)
-- count(DISTINCT ...) de-duplicates users; the CASE yields NULL for other offsets, which count ignores.
SELECT d.首次付费年月,
    count(DISTINCT CASE WHEN d.年月差额 = 0 THEN d.uid ELSE NULL END) AS 首月,
    count(DISTINCT CASE WHEN d.年月差额 = 1 THEN d.uid ELSE NULL END) AS "+1月",
    count(DISTINCT CASE WHEN d.年月差额 = 2 THEN d.uid ELSE NULL END) AS "+2月",
    count(DISTINCT CASE WHEN d.年月差额 = 3 THEN d.uid ELSE NULL END) AS "+3月",
    count(DISTINCT CASE WHEN d.年月差额 = 4 THEN d.uid ELSE NULL END) AS "+4月"
FROM (
    -- Divide by 30.0, not 30: integer division would truncate 28 / 30 to 0 and count a
    -- February-to-March return as the first month. Rounding days / 30.0 is an
    -- approximation that holds for the short windows used here.
    SELECT m.*,
        round((to_date(m.年月, 'YYYY-MM') - to_date(m.首次付费年月, 'YYYY-MM')) / 30.0, 0) AS 年月差额
    FROM (
        SELECT
            "日志".uid::text AS uid,
            to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM') AS 年月,
            min(to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM')) OVER (PARTITION BY "日志".uid) AS 首次付费年月
        FROM
            "日志"
        GROUP BY
            "日志".uid,
            to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM')
    ) AS m
) AS d
GROUP BY d.首次付费年月
ORDER BY d.首次付费年月;
6. Compute the retention rate
-- 6. Compute the retention rate
-- Retention = users active N months later / users in the first month, shown as a percentage.
-- Column aliases: "N月后" = N months later. The casts to numeric avoid integer division.
SELECT p.首次付费年月, p.首月,
    round((p."+1月"::numeric / p.首月::numeric) * 100, 2)::text || '%' AS "1月后",
    round((p."+2月"::numeric / p.首月::numeric) * 100, 2)::text || '%' AS "2月后",
    round((p."+3月"::numeric / p.首月::numeric) * 100, 2)::text || '%' AS "3月后",
    round((p."+4月"::numeric / p.首月::numeric) * 100, 2)::text || '%' AS "4月后"
FROM (
    SELECT d.首次付费年月,
        count(DISTINCT CASE WHEN d.年月差额 = 0 THEN d.uid ELSE NULL END) AS 首月,
        count(DISTINCT CASE WHEN d.年月差额 = 1 THEN d.uid ELSE NULL END) AS "+1月",
        count(DISTINCT CASE WHEN d.年月差额 = 2 THEN d.uid ELSE NULL END) AS "+2月",
        count(DISTINCT CASE WHEN d.年月差额 = 3 THEN d.uid ELSE NULL END) AS "+3月",
        count(DISTINCT CASE WHEN d.年月差额 = 4 THEN d.uid ELSE NULL END) AS "+4月"
    FROM (
        -- 30.0 rather than 30, so that a 28-day gap rounds to 1 month instead of truncating to 0
        SELECT m.*,
            round((to_date(m.年月, 'YYYY-MM') - to_date(m.首次付费年月, 'YYYY-MM')) / 30.0, 0) AS 年月差额
        FROM (
            SELECT
                "日志".uid::text AS uid,
                to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM') AS 年月,
                min(to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM')) OVER (PARTITION BY "日志".uid) AS 首次付费年月
            FROM
                "日志"
            GROUP BY
                "日志".uid,
                to_char(to_date("日志"."日期", 'YYYY-MM'), 'YYYY-MM')
        ) AS m
    ) AS d
    GROUP BY d.首次付费年月
) AS p
ORDER BY p.首次付费年月;
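As a cross-check of the final step, the arithmetic can be sketched in Python. Each later-month count is divided by the first-month count and formatted as a percentage with two decimals. The function name and the sample counts are hypothetical, and the zero-denominator guard is an added assumption for illustration.

```python
def retention_row(first_month, later_counts):
    """Format later-month user counts as percentages of the first-month count."""
    if first_month == 0:
        # Assumption: report no rate rather than divide by zero.
        return ["n/a"] * len(later_counts)
    return [f"{n / first_month * 100:.2f}%" for n in later_counts]

# Hypothetical cohort: 200 first-month users, then 90, 61, 40 and 25 returning users
row = retention_row(200, [90, 61, 40, 25])
```

For this sample the row reads 45.00%, 30.50%, 20.00% and 12.50% for +1 to +4 months, which is the shape of one line in the retention report.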