HQL Is Not Afraid of Difficult SQL Requirements
2022-07-03 01:47:00 【Software testing Jun】

Problem:
(1). Precondition:
A data set I is given; its rows, as returned by a table query, are listed under the approach below. Treat it as a table named test_user_scan.

(2). Requirement:
Use Hive SQL to count, for each user, the number of times the gap between two adjacent browsing times is less than three minutes.
Expected results:
user_id cnt
1 2
2 1
3 0
4 1
Approach:
(1). Subquery G is the main (left) table of the left join; its job is to supply every user_id.
The query results are as follows:
user_id scan_time
1 2022-01-07 21:13:07
1 2022-01-07 21:15:25
1 2022-01-07 21:17:44
2 2022-01-13 21:14:09
2 2022-01-13 21:18:19
2 2022-01-13 21:20:36
3 2022-01-21 21:16:51
4 2022-01-02 21:17:22
4 2022-01-16 22:22:09
4 2022-01-30 15:15:44
4 2022-01-30 15:17:57
(2). Subquery H is the secondary (right) table of the left join; it counts, for each user, how many adjacent pairs of browsing times are less than three minutes apart.
The query results are as follows:
user_id cnt
1 2
2 1
4 1
Subquery H = subquery C join subquery D
(C and D are identical. Joining C to D is a self-join that implements the two parts of the requirement: "two adjacent times" and "a browsing-time difference of less than three minutes".)
Subquery C returns the following (subquery D returns the same rows):
user_id scan_time rn
1 2022-01-07 21:13:07 1
1 2022-01-07 21:15:25 2
1 2022-01-07 21:17:44 3
2 2022-01-13 21:14:09 1
2 2022-01-13 21:18:19 2
2 2022-01-13 21:20:36 3
3 2022-01-21 21:16:51 1
4 2022-01-02 21:17:22 1
4 2022-01-16 22:22:09 2
4 2022-01-30 15:15:44 3
4 2022-01-30 15:17:57 4
Subquery D returns the following:
user_id scan_time rn
1 2022-01-07 21:13:07 1
1 2022-01-07 21:15:25 2
1 2022-01-07 21:17:44 3
2 2022-01-13 21:14:09 1
2 2022-01-13 21:18:19 2
2 2022-01-13 21:20:36 3
3 2022-01-21 21:16:51 1
4 2022-01-02 21:17:22 1
4 2022-01-16 22:22:09 2
4 2022-01-30 15:15:44 3
4 2022-01-30 15:17:57 4
(3). Finally, left join the result of subquery G to the result of subquery H; the query result matches the expected results.
The join condition is user_id. nvl converts a null cnt to 0, and a final group by on user_id and cnt removes the duplicate rows (G has one row per browsing record, not one per user).
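To see what the self-join in step (2) actually pairs up, it can help to list the adjacent pairs and their gaps before counting them. The query below is only a debugging sketch, not part of the original solution; it assumes the table test_user_scan already exists with scan_time stored as a yyyy-MM-dd HH:mm:ss string, and the aliases prev_scan_time, cur_scan_time and gap_seconds are made up for illustration.

```sql
-- Sketch: one output row per pair of adjacent browsing records of the same user.
-- Assumes test_user_scan(user_id, scan_time), scan_time as 'yyyy-MM-dd HH:mm:ss'.
select C.user_id,
       D.scan_time as prev_scan_time,
       C.scan_time as cur_scan_time,
       unix_timestamp(C.scan_time) - unix_timestamp(D.scan_time) as gap_seconds
from (
    select user_id, scan_time,
           row_number() over(partition by user_id order by scan_time) rn
    from test_user_scan
) C
join (
    select user_id, scan_time,
           row_number() over(partition by user_id order by scan_time) rn
    from test_user_scan
) D
  on C.user_id = D.user_id
where C.rn = D.rn + 1
order by C.user_id, C.scan_time;
```

With the sample data, user 2's first pair (21:14:09 to 21:18:19) is 250 seconds apart and is not counted, while the second pair (21:18:19 to 21:20:36) is 137 seconds apart and is counted, which is why cnt is 1 for user 2.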
Solution 1:
Use this when no physical table has been created.
The inline data sets I, A, and E stand in for the physical table test_user_scan. The Hive SQL statement below can be copied as-is and run directly in an Apache Hive environment to produce the expected results above.
select G.user_id,
       CASE WHEN nvl(H.cnt, 0) = 0 THEN 0
            ELSE H.cnt
       END cnt
from (
    select *
    from (
        select 1 user_id,date_format(regexp_replace('2022/1/7 21:13:07', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
        union all
        select 1 user_id,date_format(regexp_replace('2022/1/7 21:15:25', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
        union all
        select 1 user_id,date_format(regexp_replace('2022/1/7 21:17:44', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
        union all
        select 2 user_id,date_format(regexp_replace('2022/1/13 21:14:09', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
        union all
        select 2 user_id,date_format(regexp_replace('2022/1/13 21:18:19', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
        union all
        select 2 user_id,date_format(regexp_replace('2022/1/13 21:20:36', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
        union all
        select 3 user_id,date_format(regexp_replace('2022/1/21 21:16:51', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
        union all
        select 4 user_id,date_format(regexp_replace('2022/1/16 22:22:09', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
        union all
        select 4 user_id,date_format(regexp_replace('2022/1/2 21:17:22', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
        union all
        select 4 user_id,date_format(regexp_replace('2022/1/30 15:15:44', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
        union all
        select 4 user_id,date_format(regexp_replace('2022/1/30 15:17:57', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    )I order by user_id,scan_time
)G left join (
    select C.user_id,
           count(1) as cnt
    from (
        select B.*,
               row_number() over(partition by user_id order by scan_time) rn
        from (
            select *
            from (
                select 1 user_id,date_format(regexp_replace('2022/1/7 21:13:07', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 1 user_id,date_format(regexp_replace('2022/1/7 21:15:25', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 1 user_id,date_format(regexp_replace('2022/1/7 21:17:44', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 2 user_id,date_format(regexp_replace('2022/1/13 21:14:09', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 2 user_id,date_format(regexp_replace('2022/1/13 21:18:19', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 2 user_id,date_format(regexp_replace('2022/1/13 21:20:36', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 3 user_id,date_format(regexp_replace('2022/1/21 21:16:51', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 4 user_id,date_format(regexp_replace('2022/1/16 22:22:09', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 4 user_id,date_format(regexp_replace('2022/1/2 21:17:22', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 4 user_id,date_format(regexp_replace('2022/1/30 15:15:44', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 4 user_id,date_format(regexp_replace('2022/1/30 15:17:57', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
            )A order by user_id,scan_time
        )B
    )C join (
        select F.*,
               row_number() over(partition by user_id order by scan_time) rn
        from (
            select *
            from (
                select 1 user_id,date_format(regexp_replace('2022/1/7 21:13:07', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 1 user_id,date_format(regexp_replace('2022/1/7 21:15:25', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 1 user_id,date_format(regexp_replace('2022/1/7 21:17:44', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 2 user_id,date_format(regexp_replace('2022/1/13 21:14:09', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 2 user_id,date_format(regexp_replace('2022/1/13 21:18:19', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 2 user_id,date_format(regexp_replace('2022/1/13 21:20:36', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 3 user_id,date_format(regexp_replace('2022/1/21 21:16:51', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 4 user_id,date_format(regexp_replace('2022/1/16 22:22:09', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 4 user_id,date_format(regexp_replace('2022/1/2 21:17:22', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 4 user_id,date_format(regexp_replace('2022/1/30 15:15:44', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
                union all
                select 4 user_id,date_format(regexp_replace('2022/1/30 15:17:57', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
            )E order by user_id,scan_time
        )F
    )D
    ON C.user_id=D.user_id
    where C.rn = D.rn + 1
      and abs((unix_timestamp(C.scan_time) - unix_timestamp(D.scan_time))/60) < 3
    group by C.user_id
) H
on G.user_id = H.user_id
group by G.user_id,H.cnt;
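Solution 1 repeats the same inline data three times (as I, A, and E). One possible way to write it only once is a common table expression, sketched below under the assumption that the Hive version in use supports the WITH clause. The CTE names I and R are illustrative; the join, filter, and grouping logic is kept the same as in the statement above.

```sql
-- Sketch: same logic as Solution 1, with the inline data defined once.
-- Assumes a Hive version that supports common table expressions (WITH).
with I as (
    select 1 user_id, date_format(regexp_replace('2022/1/7 21:13:07', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    union all
    select 1 user_id, date_format(regexp_replace('2022/1/7 21:15:25', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    union all
    select 1 user_id, date_format(regexp_replace('2022/1/7 21:17:44', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    union all
    select 2 user_id, date_format(regexp_replace('2022/1/13 21:14:09', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    union all
    select 2 user_id, date_format(regexp_replace('2022/1/13 21:18:19', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    union all
    select 2 user_id, date_format(regexp_replace('2022/1/13 21:20:36', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    union all
    select 3 user_id, date_format(regexp_replace('2022/1/21 21:16:51', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    union all
    select 4 user_id, date_format(regexp_replace('2022/1/16 22:22:09', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    union all
    select 4 user_id, date_format(regexp_replace('2022/1/2 21:17:22', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    union all
    select 4 user_id, date_format(regexp_replace('2022/1/30 15:15:44', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
    union all
    select 4 user_id, date_format(regexp_replace('2022/1/30 15:17:57', '/', '-'), 'yyyy-MM-dd HH:mm:ss') scan_time
),
-- R: records numbered by scan_time within each user (plays the role of both C and D)
R as (
    select user_id, scan_time,
           row_number() over(partition by user_id order by scan_time) rn
    from I
)
select G.user_id,
       CASE WHEN nvl(H.cnt, 0) = 0 THEN 0
            ELSE H.cnt
       END cnt
from I G
left join (
    select C.user_id,
           count(1) as cnt
    from R C
    join R D
      on C.user_id = D.user_id
    where C.rn = D.rn + 1
      and abs((unix_timestamp(C.scan_time) - unix_timestamp(D.scan_time))/60) < 3
    group by C.user_id
) H
on G.user_id = H.user_id
group by G.user_id, H.cnt;
```

The order by clauses inside the original subqueries are left out here because row_number() already sorts within each partition; that omission is a simplification on top of the article's statement, not something the article itself does.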
Solution 2:
Use this when the physical table test_user_scan is created first.
Insert the test data into the test_user_scan table.
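The article does not show how the table is created or loaded. A minimal sketch could look like the following; the column types (int and string) are assumptions, and the INSERT ... VALUES form assumes a Hive version that supports it. The rows are the eleven records listed under subquery G above.

```sql
-- Hypothetical DDL: the column types are assumed, not taken from the article.
-- scan_time is kept as a 'yyyy-MM-dd HH:mm:ss' string.
create table if not exists test_user_scan (
    user_id   int,
    scan_time string
);

-- Load the sample rows (assumes INSERT ... VALUES is available).
insert into table test_user_scan values
    (1, '2022-01-07 21:13:07'),
    (1, '2022-01-07 21:15:25'),
    (1, '2022-01-07 21:17:44'),
    (2, '2022-01-13 21:14:09'),
    (2, '2022-01-13 21:18:19'),
    (2, '2022-01-13 21:20:36'),
    (3, '2022-01-21 21:16:51'),
    (4, '2022-01-02 21:17:22'),
    (4, '2022-01-16 22:22:09'),
    (4, '2022-01-30 15:15:44'),
    (4, '2022-01-30 15:17:57');
```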
The table then holds the same eleven rows listed under subquery G above.

Simply replace data sets I, A, and E in Solution 1 with the table test_user_scan.
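select G.user_id,
       CASE WHEN nvl(H.cnt, 0) = 0 THEN 0
            ELSE H.cnt
       END cnt
from (
    -- G: every browsing record, so that every user_id is present
    select *
    from test_user_scan order by user_id,scan_time
)G left join (
    -- H: per-user count of adjacent pairs less than three minutes apart
    select C.user_id,
           count(1) as cnt
    from (
        -- C: records numbered by scan_time within each user
        select B.*,
               row_number() over(partition by user_id order by scan_time) rn
        from (
            select *
            from test_user_scan order by user_id,scan_time
        )B
    )C join (
        -- D: the same numbered records, joined back to C
        select F.*,
               row_number() over(partition by user_id order by scan_time) rn
        from (
            select *
            from test_user_scan order by user_id,scan_time
        )F
    )D
    ON C.user_id=D.user_id
    -- pair each record (C) with the one just before it (D)
    where C.rn = D.rn + 1
      and abs((unix_timestamp(C.scan_time) - unix_timestamp(D.scan_time))/60) < 3
    group by C.user_id
) H
on G.user_id = H.user_id
-- collapse the duplicates caused by G having one row per browsing record
group by G.user_id,H.cnt;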
Summary of knowledge points:
Solving this problem in Hive SQL uses the following functions and constructs:
(1). regexp_replace
A regular-expression replace function; here it replaces "/" with "-" in the date string.
(2). date_format
A date formatting function; it converts the date string produced by regexp_replace into the yyyy-MM-dd HH:mm:ss format, which makes the later sorting by time straightforward.
(3). row_number() over(partition by user_id order by scan_time) rn
row_number() partitions the result set by the given partition column, sorts each partition by the given sort column, and assigns each row a sequence number. That sequence number is what allows each user's adjacent records to be compared; in this article it is used in the condition where C.rn = D.rn + 1.
(4). abs((unix_timestamp(C.scan_time) - unix_timestamp(D.scan_time))/60)
unix_timestamp converts a date-time into a Unix timestamp in seconds; dividing by 60 converts the difference to minutes, because the requirement is "less than 3 minutes".
abs returns the absolute value; it is added so that the sign of the difference cannot affect the comparison.
(5). CASE WHEN conditional expression
CASE WHEN nvl(H.cnt, 0) = 0 THEN 0
ELSE H.cnt
END cnt
User 3 has only one test record, so it has no adjacent pair and does not appear in subquery H; the expected results nevertheless require such a user to be counted as 0.
With subquery G as the main table, the cnt for user_id 3 is therefore null after the left join, which is why nvl is used inside the CASE WHEN to handle the null value.
nvl(H.cnt, 0) means: if H.cnt is null, return 0 instead.
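As an aside, the same counts can probably be obtained without the self-join by using the lag window function, which is available in Hive's windowing support. The sketch below is an alternative to the article's approach, not the author's method; it assumes the table test_user_scan exists, and the alias prev_scan_time is illustrative.

```sql
-- Sketch: count short gaps per user with lag() instead of a self-join.
-- Assumes test_user_scan(user_id, scan_time), scan_time as 'yyyy-MM-dd HH:mm:ss'.
select user_id,
       sum(case when prev_scan_time is not null
                 and unix_timestamp(scan_time) - unix_timestamp(prev_scan_time) < 180
                then 1
                else 0
           end) as cnt
from (
    -- prev_scan_time is null for each user's first record
    select user_id,
           scan_time,
           lag(scan_time) over(partition by user_id order by scan_time) as prev_scan_time
    from test_user_scan
) t
group by user_id;
```

Because every user contributes at least one row to the aggregation, a user with a single record (user 3 here) gets cnt 0 directly, so neither the left join nor nvl is needed in this variant. In the article's own statements, the CASE WHEN wrapper could likewise be shortened to nvl(H.cnt, 0), since both expressions return 0 for a null or zero cnt and cnt itself otherwise.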
