SQL Server - window function - solve the problem of filtering consecutive n records
2022-06-27 19:53:00 【Lazy Ethan】
Summary
When developing an application that processes various reports, we sometimes meet requirements such as finding periods when the temperature stayed below 0 degrees for several consecutive days, or users who logged in to the system several times in a row on the same day. This article shows how to use window functions in SQL Server to solve these complex query problems.
Two examples are used to illustrate the approach.
Code and Implementation
Users who logged in 3 consecutive times on the same day
The table creation statement is as follows:
if object_id('login_details') is not null
drop table login_details;
create table login_details(
login_id int primary key,
user_name varchar(50) not null,
login_date date
);
The login log contains the login ID, the user name and the login date; login_id is the primary key.
See the appendix for the data initialization code.
We need to find the user names and dates of users who logged in three or more consecutive times on the same day. Logins count as consecutive when their login IDs are adjacent within that day, that is, no other user logged in between them.
The code is as follows:
;WITH DUPICATE_3 AS (
    SELECT
        ld.[login_id], ld.[user_name], ld.[login_date],
        -- rows are partitioned by date and ordered by login_id;
        -- TAG = 1 marks a row that belongs to a run of 3+ logins by the same user
        CASE
            -- the current row and the next two rows have the same user
            WHEN ld.[user_name] = LEAD(ld.[user_name]) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
             AND ld.[user_name] = LEAD(ld.[user_name], 2) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
            THEN 1
            -- the previous row, the current row and the next row have the same user
            WHEN ld.[user_name] = LAG(ld.[user_name]) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
             AND ld.[user_name] = LEAD(ld.[user_name]) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
            THEN 1
            -- the previous two rows and the current row have the same user
            WHEN ld.[user_name] = LAG(ld.[user_name]) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
             AND ld.[user_name] = LAG(ld.[user_name], 2) OVER(PARTITION BY ld.[login_date] ORDER BY ld.[login_id])
            THEN 1
        END AS TAG  -- NULL when no branch matches
    FROM [login_details] ld
)
,DUPICATE_FILTER AS (
    -- one row per (user, date) that contains at least one tagged login
    SELECT DISTINCT d.[user_name], d.[login_date]
    FROM DUPICATE_3 d WHERE d.TAG = 1
)
-- total number of logins for each qualifying user on that date
SELECT
    d4.[user_name], d4.login_date,
    COUNT(d4.[user_name]) AS login_time
FROM DUPICATE_FILTER d4
LEFT JOIN DUPICATE_3 d3
    ON d4.login_date = d3.login_date AND d4.[user_name] = d3.[user_name]
GROUP BY d4.[user_name], d4.login_date
- To avoid nested queries, define a CTE named DUPICATE_3. Its main job is to generate a tag column TAG: the column is 1 if the row belongs to a run of three or more consecutive logins by the same user on the same day; in all other cases it is NULL, because the CASE expression has no ELSE branch.
- Analyze the data with the window functions LEAD/LAG: partition the rows by date and, within each partition, sort them by login ID.
a. If the user name of the current record equals the user names of the next two records, mark the record as 1;
b. If the current user name equals the user names of both the previous record and the next record, mark the record as 1;
c. If the current user name equals the user names of the previous two records, mark the record as 1.
- Count the total number of logins for each user and date. The query results are as follows:

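The same users can also be found with the "gaps and islands" technique, sketched here as a swapped-in alternative rather than the author's method. It does not hard-code three LEAD/LAG comparisons, so it works for any run length. The sketch assumes a hypothetical temp table #login_details filled with the appendix rows but with fixed dates instead of GETDATE(), and a hypothetical parameter @N. Within each day, the position of a login among all logins minus its position among that user's logins stays constant while the same user logs in back to back, so grouping by that difference yields each run.

```sql
-- Sketch: gaps-and-islands alternative to the LEAD/LAG tagging (not the author's method).
-- Assumptions: hypothetical temp table #login_details with fixed dates
-- instead of GETDATE(), and a hypothetical parameter @N.
IF OBJECT_ID('tempdb..#login_details') IS NOT NULL DROP TABLE #login_details;
IF OBJECT_ID('tempdb..#login_runs') IS NOT NULL DROP TABLE #login_runs;

CREATE TABLE #login_details (
    login_id   int PRIMARY KEY,
    user_name  varchar(50) NOT NULL,
    login_date date
);

INSERT INTO #login_details VALUES
    (101, 'Michael', '2022-06-27'),
    (102, 'James',   '2022-06-27'),
    (103, 'Stewart', '2022-06-28'),
    (104, 'Stewart', '2022-06-28'),
    (105, 'Stewart', '2022-06-28'),
    (106, 'Michael', '2022-06-29'),
    (107, 'Michael', '2022-06-29'),
    (108, 'Stewart', '2022-06-30'),
    (109, 'Stewart', '2022-06-30'),
    (110, 'James',   '2022-07-01'),
    (111, 'James',   '2022-07-01'),
    (112, 'James',   '2022-07-01'),
    (113, 'James',   '2022-07-01'),
    (114, 'James',   '2022-07-02'),
    (115, 'Charles', '2022-06-28'),
    (116, 'Charles', '2022-06-28'),
    (117, 'Charles', '2022-06-28');

DECLARE @N int = 3;  -- minimum number of back-to-back logins

WITH numbered AS (
    SELECT user_name, login_date,
           -- position of the login among all logins of that day
           ROW_NUMBER() OVER (PARTITION BY login_date ORDER BY login_id)
           -- minus its position among the same user's logins of that day;
           -- the difference is constant while the same user logs in back to back
         - ROW_NUMBER() OVER (PARTITION BY login_date, user_name ORDER BY login_id) AS grp
    FROM #login_details
)
SELECT user_name, login_date, COUNT(*) AS run_length
INTO #login_runs
FROM numbered
GROUP BY user_name, login_date, grp
HAVING COUNT(*) >= @N;

SELECT * FROM #login_runs ORDER BY login_date, user_name;
```

Unlike the query above, which reports the total number of logins per user and day, this sketch returns one row per qualifying run together with its length, so a user with two separate runs on the same day would appear twice.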
Records with temperatures below 0 degrees for three consecutive days
The table creation statement is as follows:
if object_id('weather','U') is not null
drop table weather
create table weather
(
id int primary key,
city varchar(50) not null,
temperature int not null,
day date not null
);
The weather table contains an id, the city name, the temperature and the record date. id is the primary key; in the sample data it holds increasing integers (the column is not declared as IDENTITY).
See the appendix for the data initialization code.
We need to find the records where the temperature stayed below 0 degrees for three consecutive days.
The solution is similar to the login problem above, so here it is directly:
;WITH WEATHER_ADD_TAG AS (
    SELECT *,
        CASE
            -- the current day and the next two days are below 0
            WHEN ld.temperature < 0
             AND LEAD(ld.temperature) OVER(PARTITION BY 1 ORDER BY ld.[day]) < 0
             AND LEAD(ld.temperature, 2) OVER(PARTITION BY 1 ORDER BY ld.[day]) < 0
            THEN 1
            -- the previous day, the current day and the next day are below 0
            WHEN LAG(ld.temperature) OVER(PARTITION BY 1 ORDER BY ld.[day]) < 0
             AND ld.temperature < 0
             AND LEAD(ld.temperature) OVER(PARTITION BY 1 ORDER BY ld.[day]) < 0
            THEN 1
            -- the previous two days and the current day are below 0
            WHEN LAG(ld.temperature) OVER(PARTITION BY 1 ORDER BY ld.[day]) < 0
             AND LAG(ld.temperature, 2) OVER(PARTITION BY 1 ORDER BY ld.[day]) < 0
             AND ld.temperature < 0
            THEN 1
        END AS TAG  -- NULL when no branch matches
    FROM weather ld
)
SELECT * FROM WEATHER_ADD_TAG WHERE TAG = 1
The results are as follows:

Optimizing the solution
The CASE WHEN part of the previous solution is cumbersome. Unlike the login example, which had to compare user names for equality, here we only need to know whether each temperature is below 0 degrees.
A temperature below 0 degrees for three or more consecutive days is equivalent to the maximum temperature over 3 consecutive days being below 0 degrees.
The optimized code is as follows:
SELECT Id, City, Temperature, Day
FROM
(
SELECT *,
CASE
WHEN max(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN CURRENT ROW and 2 FOLLOWING ) < 0
THEN 1
WHEN max(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN 1 PRECEDING and 1 FOLLOWING ) < 0
THEN 1
WHEN max(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN 2 PRECEDING and CURRENT ROW ) < 0
THEN 1
END AS TAG
FROM weather w
) x WHERE x.TAG = 1;
The results are as follows:

There is a problem with this result: it contains two extra days.
With LAG/LEAD, a record that does not exist (for example, the record before the first record or the record after the last record) is returned as NULL. A comparison such as NULL < 0 evaluates to UNKNOWN rather than true, so those rows are filtered out.
This example uses a ROWS BETWEEN ... PRECEDING/FOLLOWING frame to reach the previous and following records, and records that do not exist are not treated as NULL. The frame is simply cut off at the first and last records, so missing records take no part in the calculation. For January 1, the frame ROWS BETWEEN 2 PRECEDING AND CURRENT ROW contains only January 1, and for January 2 it contains only January 1 and 2. Both temperatures are below 0 degrees, so these two days are wrongly added to the final result.
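The difference between the two behaviours can be made visible with a small sketch. It assumes a hypothetical temp table #weather holding the first five days of the appendix data and shows, for each day, LAG(temperature, 2) next to the row count and the maximum of the frame ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.

```sql
-- Sketch: LAG returns NULL at the edge, while a ROWS frame just shrinks.
-- Assumption: hypothetical temp table #weather with the first five appendix rows.
IF OBJECT_ID('tempdb..#weather') IS NOT NULL DROP TABLE #weather;
IF OBJECT_ID('tempdb..#frames') IS NOT NULL DROP TABLE #frames;

CREATE TABLE #weather (
    id          int PRIMARY KEY,
    city        varchar(50) NOT NULL,
    temperature int NOT NULL,
    day         date NOT NULL
);

INSERT INTO #weather VALUES
    (1, 'London', -1, '2021-01-01'),
    (2, 'London', -2, '2021-01-02'),
    (3, 'London',  4, '2021-01-03'),
    (4, 'London',  1, '2021-01-04'),
    (5, 'London', -2, '2021-01-05');

SELECT day, temperature,
       -- nothing exists two rows before the first two days, so LAG yields NULL
       LAG(temperature, 2) OVER (ORDER BY day) AS lag_2,
       -- the frame holds only the rows that exist (1, 2, then 3)
       COUNT(*) OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rows_in_frame,
       MAX(temperature) OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS max_in_frame
INTO #frames
FROM #weather;

SELECT * FROM #frames ORDER BY day;
```

For 2021-01-01 and 2021-01-02, lag_2 is NULL, rows_in_frame is 1 and 2, and max_in_frame is -1, which is why these two days pass the < 0 test.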
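Solution:
Add two records that do not meet the condition (temperature 0) as the first and last records. Every frame that reaches past the real data then contains a temperature of 0 and fails the < 0 test.
The code is as follows: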
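;WITH APPEND_MIN_MAX_DATE_CTE as (
-- sentinel rows with temperature 0 (not below 0) before the first day and after the last day
SELECT * FROM weather w
UNION ALL
SELECT 1, 'London', 0, '2020-01-01'
UNION ALL
SELECT 1, 'London', 0, '2050-01-01'
)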
SELECT Id, City, Temperature, Day
FROM
(
SELECT *,
CASE
WHEN max(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN CURRENT ROW and 2 FOLLOWING ) < 0
THEN 1
WHEN max(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN 1 PRECEDING and 1 FOLLOWING ) < 0
THEN 1
WHEN max(w.temperature) OVER(PARTITION BY 1 ORDER BY day ROWS BETWEEN 2 PRECEDING and CURRENT ROW ) < 0
THEN 1
END AS TAG
FROM APPEND_MIN_MAX_DATE_CTE w
) x WHERE x.TAG = 1;
The results are as follows:

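Another way to reject the truncated frames, sketched here as an alternative to the sentinel rows rather than the author's solution, is to compute COUNT(*) over the same frame and accept a frame only when it holds exactly three rows. The sketch assumes a hypothetical temp table #weather loaded with the appendix data, and it drops the PARTITION BY 1 clause, since omitting PARTITION BY already treats all rows as one partition.

```sql
-- Sketch: accept a 3-day frame only if it really contains 3 rows (not the author's method).
-- Assumption: hypothetical temp table #weather loaded with the appendix data.
IF OBJECT_ID('tempdb..#weather') IS NOT NULL DROP TABLE #weather;
IF OBJECT_ID('tempdb..#cold_days') IS NOT NULL DROP TABLE #cold_days;

CREATE TABLE #weather (
    id          int PRIMARY KEY,
    city        varchar(50) NOT NULL,
    temperature int NOT NULL,
    day         date NOT NULL
);

INSERT INTO #weather VALUES
    (1, 'London', -1, '2021-01-01'),   (2, 'London', -2, '2021-01-02'),
    (3, 'London', 4, '2021-01-03'),    (4, 'London', 1, '2021-01-04'),
    (5, 'London', -2, '2021-01-05'),   (6, 'London', -5, '2021-01-06'),
    (7, 'London', -7, '2021-01-07'),   (8, 'London', 5, '2021-01-08'),
    (9, 'London', -20, '2021-01-09'),  (10, 'London', 20, '2021-01-10'),
    (11, 'London', 22, '2021-01-11'),  (12, 'London', -1, '2021-01-12'),
    (13, 'London', -2, '2021-01-13'),  (14, 'London', -2, '2021-01-14'),
    (15, 'London', -4, '2021-01-15'),  (16, 'London', -9, '2021-01-16'),
    (17, 'London', 0, '2021-01-17'),   (18, 'London', -10, '2021-01-18'),
    (19, 'London', -11, '2021-01-19'), (20, 'London', -12, '2021-01-20'),
    (21, 'London', -11, '2021-01-21');

WITH framed AS (
    SELECT id, city, temperature, day,
           -- frame starting at the current day
           MAX(temperature) OVER (ORDER BY day ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS max_next,
           COUNT(*)         OVER (ORDER BY day ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS cnt_next,
           -- frame centred on the current day
           MAX(temperature) OVER (ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS max_mid,
           COUNT(*)         OVER (ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS cnt_mid,
           -- frame ending at the current day
           MAX(temperature) OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS max_prev,
           COUNT(*)         OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS cnt_prev
    FROM #weather
)
SELECT id, city, temperature, day
INTO #cold_days
FROM framed
-- a frame cut off at the first or last row has fewer than 3 rows and is rejected
WHERE (max_next < 0 AND cnt_next = 3)
   OR (max_mid  < 0 AND cnt_mid  = 3)
   OR (max_prev < 0 AND cnt_prev = 3);

SELECT * FROM #cold_days ORDER BY day;
```

With the appendix data this should keep ids 5 to 7, 12 to 16 and 18 to 21, the same days as the LAG/LEAD solution, without any sentinel dates. If the table held several cities, adding PARTITION BY city to every OVER clause would keep runs from crossing cities, a case that a single pair of London sentinel rows does not cover.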
Records with temperatures below 0 degrees for N consecutive days
The requirement is now upgraded: the number of days is no longer fixed; the user supplies it.
Obviously, the existing solutions are built around a known number of days and cannot meet the new requirement. We need to redefine how T-SQL determines that N days are consecutive.
The implementation code is as follows:
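DECLARE @N int = 4;  -- number of consecutive days, supplied by the user
;WITH ADD_ROW_NUMBER_CTE AS (
    -- RN: position of each day among all days
    SELECT *,
        ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY [day]) AS RN
    FROM weather w
),
ADD_ROW_NUMBER_LT_0_CTE AS (
    -- RN_LT_0: position of each day among the days below 0
    SELECT *,
        ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY [day]) AS RN_LT_0
    FROM ADD_ROW_NUMBER_CTE WHERE temperature < 0
),
ADD_DIFF_CTE AS (
    -- DIFF stays constant within one unbroken run of days below 0
    SELECT *,
        (c.RN - c.RN_LT_0) AS DIFF
    FROM ADD_ROW_NUMBER_LT_0_CTE c
),
ADD_COUNT_CTE AS (
    -- CNT: length of the run the row belongs to
    SELECT *,
        COUNT(*) OVER (PARTITION BY DIFF) AS CNT
    FROM ADD_DIFF_CTE
)
-- runs of at least @N days; use CNT = @N to keep runs of exactly @N days
SELECT * FROM ADD_COUNT_CTE WHERE CNT >= @N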
- Define the CTE ADD_ROW_NUMBER_CTE, which adds a sequence-number column RN ordered by date.
- Define the CTE ADD_ROW_NUMBER_LT_0_CTE, which keeps only the records whose temperature is below 0 (filtering out those at or above 0) and adds a second sequence-number column RN_LT_0, also ordered by date.
- Compute the difference between RN and RN_LT_0 (ADD_DIFF_CTE). Rows with the same difference are consecutive: inside an unbroken run of cold days both numbers grow by 1 per row, while every day at or above 0 in between increases only RN.
- Define the CTE ADD_COUNT_CTE, which counts the rows sharing each difference. This count is the number of consecutive days below 0, so it can be compared with any number N the user supplies.
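The four steps can also be condensed into one query that returns each run once, with its first day, last day and length. This is a sketch under assumptions: #weather is a hypothetical temp table loaded with the appendix data, @N is a hypothetical parameter, and HAVING COUNT(*) >= @N keeps runs of at least N days. It relies on window functions being evaluated after WHERE, so the second ROW_NUMBER only numbers the cold days.

```sql
-- Sketch: RN - RN_LT_0 condensed into one query, one row per cold run.
-- Assumptions: hypothetical temp table #weather and hypothetical parameter @N.
IF OBJECT_ID('tempdb..#weather') IS NOT NULL DROP TABLE #weather;
IF OBJECT_ID('tempdb..#cold_runs') IS NOT NULL DROP TABLE #cold_runs;

CREATE TABLE #weather (
    id          int PRIMARY KEY,
    city        varchar(50) NOT NULL,
    temperature int NOT NULL,
    day         date NOT NULL
);

INSERT INTO #weather VALUES
    (1, 'London', -1, '2021-01-01'),   (2, 'London', -2, '2021-01-02'),
    (3, 'London', 4, '2021-01-03'),    (4, 'London', 1, '2021-01-04'),
    (5, 'London', -2, '2021-01-05'),   (6, 'London', -5, '2021-01-06'),
    (7, 'London', -7, '2021-01-07'),   (8, 'London', 5, '2021-01-08'),
    (9, 'London', -20, '2021-01-09'),  (10, 'London', 20, '2021-01-10'),
    (11, 'London', 22, '2021-01-11'),  (12, 'London', -1, '2021-01-12'),
    (13, 'London', -2, '2021-01-13'),  (14, 'London', -2, '2021-01-14'),
    (15, 'London', -4, '2021-01-15'),  (16, 'London', -9, '2021-01-16'),
    (17, 'London', 0, '2021-01-17'),   (18, 'London', -10, '2021-01-18'),
    (19, 'London', -11, '2021-01-19'), (20, 'London', -12, '2021-01-20'),
    (21, 'London', -11, '2021-01-21');

DECLARE @N int = 3;  -- minimum number of consecutive days below 0

WITH numbered AS (
    SELECT day, temperature,
           ROW_NUMBER() OVER (ORDER BY day) AS rn  -- position among all days
    FROM #weather
),
islands AS (
    SELECT day,
           -- position among all days minus position among cold days:
           -- constant within one unbroken run of cold days
           rn - ROW_NUMBER() OVER (ORDER BY day) AS diff
    FROM numbered
    WHERE temperature < 0
)
SELECT MIN(day) AS first_day, MAX(day) AS last_day, COUNT(*) AS run_length
INTO #cold_runs
FROM islands
GROUP BY diff
HAVING COUNT(*) >= @N;  -- use = @N to keep only runs of exactly N days

SELECT * FROM #cold_runs ORDER BY first_day;
```

With @N = 3 and the appendix data this should return three runs: 2021-01-05 to 2021-01-07 (3 days), 2021-01-12 to 2021-01-16 (5 days) and 2021-01-18 to 2021-01-21 (4 days). Like the CTE version, it treats adjacent rows as consecutive days, so a gap in the day column would be bridged.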
Appendix
Login log table
if object_id('login_details') is not null
drop table login_details;
create table login_details(
login_id int primary key,
user_name varchar(50) not null,
login_date date);
truncate table login_details;
insert into login_details values
(101, 'Michael', GETDATE()),
(102, 'James', GETDATE()),
(103, 'Stewart', DATEADD(DD,1,GETDATE())),
(104, 'Stewart', DATEADD(DD,1,GETDATE())),
(105, 'Stewart', DATEADD(DD,1,GETDATE())),
(106, 'Michael', DATEADD(DD,2,GETDATE())),
(107, 'Michael', DATEADD(DD,2,GETDATE())),
(108, 'Stewart', DATEADD(DD,3,GETDATE())),
(109, 'Stewart', DATEADD(DD,3,GETDATE())),
(110, 'James', DATEADD(DD,4,GETDATE())),
(111, 'James', DATEADD(DD,4,GETDATE())),
(112, 'James', DATEADD(DD,4,GETDATE())),
(113, 'James', DATEADD(DD,4,GETDATE())),
(114, 'James', DATEADD(DD,5,GETDATE())),
(115, 'Charles', DATEADD(DD,1,GETDATE())),
(116, 'Charles', DATEADD(DD,1,GETDATE())),
(117, 'Charles', DATEADD(DD,1,GETDATE()));
Weather table
if object_id('weather','U') is not null
drop table weather
create table weather
(
id int primary key,
city varchar(50) not null,
temperature int not null,
day date not null
);
delete from weather;
insert into weather values
(1, 'London', -1, '2021-01-01'),
(2, 'London', -2, '2021-01-02'),
(3, 'London', 4, '2021-01-03'),
(4, 'London', 1, '2021-01-04'),
(5, 'London', -2, '2021-01-05'),
(6, 'London', -5, '2021-01-06'),
(7, 'London', -7, '2021-01-07'),
(8, 'London', 5, '2021-01-08'),
(9, 'London', -20,'2021-01-09'),
(10, 'London', 20, '2021-01-10'),
(11, 'London', 22,'2021-01-11'),
(12, 'London', -1, '2021-01-12'),
(13, 'London', -2, '2021-01-13'),
(14, 'London', -2, '2021-01-14'),
(15, 'London', -4, '2021-01-15'),
(16, 'London', -9, '2021-01-16'),
(17, 'London', 0, '2021-01-17'),
(18, 'London', -10, '2021-01-18'),
(19, 'London', -11, '2021-01-19'),
(20, 'London', -12, '2021-01-20'),
(21, 'London', -11, '2021-01-21');