当前位置:网站首页>电商项目常见连续登录,消费,日期等问题
电商项目常见连续登录,消费,日期等问题
2022-08-02 14:05:00 【大学生爱编程】
一.数据,建表语句
id datestr amount
1,2019-02-08,6214.23
1,2019-02-08,6247.32
1,2019-02-09,85.63
1,2019-02-09,967.36
1,2019-02-10,85.69
1,2019-02-12,769.85
1,2019-02-13,943.86
1,2019-02-14,538.42
1,2019-02-15,369.76
1,2019-02-16,369.76
1,2019-02-18,795.15
1,2019-02-19,715.65
1,2019-02-21,537.71
2,2019-02-08,6214.23
2,2019-02-08,6247.32
2,2019-02-09,85.63
2,2019-02-09,967.36
2,2019-02-10,85.69
2,2019-02-12,769.85
2,2019-02-13,943.86
2,2019-02-14,943.18
2,2019-02-15,369.76
2,2019-02-18,795.15
2,2019-02-19,715.65
2,2019-02-21,537.71
3,2019-02-08,6214.23
3,2019-02-08,6247.32
3,2019-02-09,85.63
3,2019-02-09,967.36
3,2019-02-10,85.69
3,2019-02-12,769.85
3,2019-02-13,943.86
3,2019-02-14,276.81
3,2019-02-15,369.76
3,2019-02-16,369.76
3,2019-02-18,795.15
3,2019-02-19,715.65
3,2019-02-21,537.71
create table deal_tb(
id string
,datestr string
,amount string
)row format delimited fields terminated by ',';
二.计算逻辑
1.按照用户和日期进行分组,求每用户每天的消费
select id,datestr,sum(amount) as sum_amount from deal_tb group by id,datestr;
2.根据用户id进行开窗按照日期进行排序,如果日期减去序号数是相同的说明这几行日期是连续的
对日期进行开窗排序
select t1.id as id,t1.datestr as datestr,t1.sum_amount as sum_amount,row_number() over(partition by id order by datestr) as rn from (select id,datestr,sum(amount) as sum_amount from deal_tb group by id,datestr) t1;
上表起名为tt1,注意用到日期函数date_sub(tt1.datestr,tt1.rn)
日期减去序号再得出一列
select tt1.id,tt1.datestr,tt1.sum_amount,date_sub(tt1.datestr,tt1.rn) as grep from (select t1.id as id,t1.datestr as datestr,t1.sum_amount as sum_amount,row_number() over(partition by id order by datestr) as rn from (select id,datestr,sum(amount) as sum_amount from deal_tb group by id,datestr) t1)tt1;
临时结果
1 2019-02-08 12461.55 2019-02-07
1 2019-02-09 1052.99 2019-02-07
1 2019-02-10 85.69 2019-02-07
1 2019-02-12 769.85 2019-02-08
1 2019-02-13 943.86 2019-02-08
1 2019-02-14 538.42 2019-02-08
1 2019-02-15 369.76 2019-02-08
1 2019-02-16 369.76 2019-02-08
1 2019-02-18 795.15 2019-02-09
1 2019-02-19 715.65 2019-02-09
1 2019-02-21 537.71 2019-02-10
2 2019-02-08 12461.55 2019-02-07
2 2019-02-09 1052.99 2019-02-07
2 2019-02-10 85.69 2019-02-07
2 2019-02-12 769.85 2019-02-08
2 2019-02-13 943.86 2019-02-08
2 2019-02-14 943.18 2019-02-08
2 2019-02-15 369.76 2019-02-08
2 2019-02-18 795.15 2019-02-10
2 2019-02-19 715.65 2019-02-10
2 2019-02-21 537.71 2019-02-11
3 2019-02-08 12461.55 2019-02-07
3 2019-02-09 1052.99 2019-02-07
3 2019-02-10 85.69 2019-02-07
3 2019-02-12 769.85 2019-02-08
3 2019-02-13 943.86 2019-02-08
3 2019-02-14 276.81 2019-02-08
3 2019-02-15 369.76 2019-02-08
3 2019-02-16 369.76 2019-02-08
3 2019-02-18 795.15 2019-02-09
3 2019-02-19 715.65 2019-02-09
3 2019-02-21 537.71 2019-02-10
3.统计用户连续交易总额,连续登录天数连续登录开始时间个结束时间以及间隔天数
连续交易总额:
以id和grep进行分组,gerp相同的表示是连续交易的
select ttt1.id,ttt1.grep,sum(ttt1.sum_amount) from
(select tt1.id as id,tt1.datestr as datestr,tt1.sum_amount as sum_amount,date_sub(tt1.datestr,tt1.rn) as grep from (select t1.id as id,t1.datestr as datestr,t1.sum_amount as sum_amount,row_number() over(partition by id order by datestr) as rn from (select id,datestr,sum(amount) as sum_amount from deal_tb group by id,datestr) t1)tt1)ttt1 group by id,grep;
临时结果:(小数列可以用round函数处理一下:round(sum(ttt1.sum_amount)))
1 2019-02-07 13600.23
1 2019-02-08 2991.6500000000005
1 2019-02-09 1510.8
1 2019-02-10 537.71
2 2019-02-07 13600.23
2 2019-02-08 3026.6499999999996
2 2019-02-10 1510.8
2 2019-02-11 537.71
3 2019-02-07 13600.23
3 2019-02-08 2730.04
3 2019-02-09 1510.8
3 2019-02-10 537.71
根据id,grep分组后查询列增加count(1)即可得到连续登录天数(代码省略)
根据id,grep分组后的datestr列,最小的即为连续登录的开始时间,最大的即为连续登录的结束时间
多出连续登录的开始时间和结束时间两列:
select ttt1.id,ttt1.grep,sum(ttt1.sum_amount),min(datestr),max(datestr),count(1) from (select tt1.id as id,tt1.datestr as datestr,tt1.sum_amount as sum_amount,date_sub(tt1.datestr,tt1.rn) as grep from (select t1.id as id,t1.datestr as datestr,t1.sum_amount as sum_amount,row_number() over(partition by id order by datestr) as rn from (select id,datestr,sum(amount) as sum_amount from deal_tb group by id,datestr) t1)tt1)ttt1 group by id,grep;
4.间隔时间:指的是每一次连续登录时间的间隔
根据id进行聚合开窗,grep相减得到的值就是所求
举例:排序开窗后的结果
datestr rn grep
2.8 1 2.7
2.9 2 2.7 2.9-2.7=2 间隔两天
2.10 3 2.7
2.13 4 2.9 2.10与2.13之间也是间隔两天
2.14 5 2.9
2.15 6 2.9
grep相同表示是连续登录的,根据grep分组后,相减得到的就是连续登录的间隔时间
聚合开窗函数:lag(字段,n):获取当前行的前几行的那一行数据
select ttt1.id,ttt1.grep,sum(ttt1.sum_amount),min(datestr),max(datestr),count(1),datediff(grep,lag(grep,1) over(partition by ttt1.id order by grep)) from (select tt1.id as id,tt1.datestr as datestr,tt1.sum_amount as sum_amount,date_sub(tt1.datestr,tt1.rn) as grep from (select t1.id as id,t1.datestr as datestr,t1.sum_amount as sum_amount,row_number() over(partition by id order by datestr) as rn from (select id,datestr,sum(amount) as sum_amount from deal_tb group by id,datestr) t1)tt1)ttt1 group by id,grep;
边栏推荐
猜你喜欢
随机推荐
MySQL知识总结 (五) 锁
drf view component
drf routing component Routers
VS Code无法安装插件之Unable to install because, the extension '' compatible with current version
重新学习编程day1 【初始c语言】【c语言编写出计算两个数之和的代码】
The specific operation process of cloud GPU (Hengyuan cloud) training
MySQL知识总结 (三) 索引
执行栈和执行上下文
C语言日记 2 标识符
华为防火墙
Flask request application context source code analysis
Unit 12 associated serialization
C语言一维数组练习——将m个元素移动到数组尾部
YOLOv7 uses cloud GPU to train its own dataset
利用c语言实现对键盘输入的一串字符的各类字符的计数
科创知识年度盛会,中国科创者大会8月6日首场开幕!
我的第一篇博客
C语言日记 1“Hello world“
C语言日记 7 输入/输出格式控制
8576 Basic operations of sequential linear tables







![[VCU] Detailed S19 file (S-record)](/img/00/ccff024d919398230f9b876c60acf6.png)

