10 advanced concepts that must be understood in learning SQL
2022-07-06 17:42:00 【Junhong's road of data analysis】
As the amount of data continues to grow, so will the demand for qualified data professionals. In particular, demand is growing for professionals who are fluent in SQL, and not just at the entry level.
So Nathan Rosidi, the founder of StrataScratch, and I put together what we consider the 10 most important and relevant intermediate-to-advanced SQL concepts.
1. Common table expressions (CTEs)
If you have ever wanted to query a subquery, that is where CTEs come into play: a CTE essentially creates a temporary, named result set.
Using common table expressions (CTEs) is a good way to modularize and break down code, the same way you break an article down into paragraphs.
Consider the following query, which has subqueries in its WHERE clause.
SELECT
name,
salary
FROM
People
WHERE
NAME IN ( SELECT DISTINCT NAME FROM population WHERE country = "Canada" AND city = "Toronto" )
AND salary >= (
SELECT
AVG( salary )
FROM
salaries
WHERE
gender = "Female")
This one may not be too hard to follow, but what if the query contained many subqueries? This is where CTEs come into play.
with toronto_ppl as (
SELECT DISTINCT name
FROM population
WHERE country = "Canada"
AND city = "Toronto"
)
, avg_female_salary as (
SELECT AVG(salary) as avgSalary
FROM salaries
WHERE gender = "Female"
)
SELECT name
, salary
FROM People
WHERE name in (SELECT name FROM toronto_ppl)
AND salary >= (SELECT avgSalary FROM avg_female_salary)
Now it is clear that the WHERE clause filters for names in Toronto. As you may have noticed, CTEs are useful because they let you break the code into smaller blocks, and also because they let you give each CTE a name (here, toronto_ppl and avg_female_salary).
CTEs also make more advanced techniques possible, such as building recursive tables.
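To try the CTE version end to end, here is a minimal sketch that runs it against an in-memory SQLite database through Python's built-in sqlite3 module. The sample rows are made up for illustration, and string literals use single quotes, which is the portable SQL form; the article itself does not specify a database engine.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the queries above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (name TEXT, salary INTEGER);
    CREATE TABLE population (name TEXT, country TEXT, city TEXT);
    CREATE TABLE salaries (gender TEXT, salary INTEGER);
    INSERT INTO People VALUES ('Ann', 90000), ('Ben', 50000), ('Cid', 95000);
    INSERT INTO population VALUES
        ('Ann', 'Canada', 'Toronto'),
        ('Ben', 'Canada', 'Toronto'),
        ('Cid', 'Canada', 'Ottawa');
    INSERT INTO salaries VALUES
        ('Female', 60000), ('Female', 80000), ('Male', 70000);
""")

CTE_QUERY = """
WITH toronto_ppl AS (
    -- Everyone who lives in Toronto
    SELECT DISTINCT name
    FROM population
    WHERE country = 'Canada' AND city = 'Toronto'
),
avg_female_salary AS (
    -- A single-row, single-column result: the average female salary
    SELECT AVG(salary) AS avgSalary
    FROM salaries
    WHERE gender = 'Female'
)
SELECT name, salary
FROM People
WHERE name IN (SELECT name FROM toronto_ppl)
  AND salary >= (SELECT avgSalary FROM avg_female_salary)
"""

cte_rows = conn.execute(CTE_QUERY).fetchall()
print(cte_rows)
```

With this sample data the average female salary is 70000, so only Ann (in Toronto and earning 90000) passes both filters.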
2. Recursive CTEs
A recursive CTE is a CTE that references itself, just like a recursive function in Python. Recursive CTEs are especially useful for querying hierarchical data such as organization charts, file systems, and graphs of links between web pages.
A recursive CTE has 3 parts:
Anchor member: the initial query that returns the base result of the CTE
Recursive member: a recursive query that references the CTE. It is combined with the anchor member using UNION ALL
Termination condition: the condition that stops the recursive member
Here is an example of a recursive CTE that gets the manager ID for each staff ID:
with org_structure as (
SELECT id
, manager_id
FROM staff_members
WHERE manager_id IS NULL
UNION ALL
SELECT sm.id
, sm.manager_id
FROM staff_members sm
INNER JOIN org_structure os
ON os.id = sm.manager_id
)
SELECT *
FROM org_structure
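The three parts can be seen at work in a small runnable sketch using Python's built-in sqlite3 module. The four staff rows are hypothetical, and the depth column is an addition for illustration that is not part of the query above. The sketch writes WITH RECURSIVE, which some engines (PostgreSQL and MySQL, for example) require and SQLite accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_members (id INTEGER PRIMARY KEY, manager_id INTEGER);
    -- Hypothetical hierarchy: 1 is the top manager, 2 and 3 report to 1,
    -- and 4 reports to 2.
    INSERT INTO staff_members VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

ORG_QUERY = """
WITH RECURSIVE org_structure AS (
    -- Anchor member: staff with no manager (the top of the hierarchy)
    SELECT id, manager_id, 0 AS depth
    FROM staff_members
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: staff whose manager is already in org_structure
    SELECT sm.id, sm.manager_id, os.depth + 1
    FROM staff_members sm
    INNER JOIN org_structure os ON os.id = sm.manager_id
)
SELECT id, manager_id, depth
FROM org_structure
ORDER BY id
"""

org_rows = conn.execute(ORG_QUERY).fetchall()
for row in org_rows:
    print(row)
```

The recursion stops on its own once the recursive member returns no new rows, which here happens after staff member 4 has been added.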
3. Temporary functions
Knowing how to write temporary functions is important for several reasons:
It lets you break blocks of code down into smaller blocks
It helps you write clean code
It prevents repetition and lets you reuse code, much like functions in Python.
Consider the following example :
SELECT name
, CASE WHEN tenure < 1 THEN "analyst"
WHEN tenure BETWEEN 1 and 3 THEN "associate"
WHEN tenure BETWEEN 3 and 5 THEN "senior"
WHEN tenure > 5 THEN "vp"
ELSE "n/a"
END AS seniority
FROM employees
Instead, you can use a temporary function to capture the CASE expression.
CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
CASE WHEN tenure < 1 THEN "analyst"
WHEN tenure BETWEEN 1 and 3 THEN "associate"
WHEN tenure BETWEEN 3 and 5 THEN "senior"
WHEN tenure > 5 THEN "vp"
ELSE "n/a"
END
);
SELECT name
, get_seniority(tenure) as seniority
FROM employees
With the temporary function, the query itself is simpler and more readable, and you can reuse the seniority function!
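CREATE TEMPORARY FUNCTION with INT64 is BigQuery-style syntax and is not available in every engine. As a rough analogue rather than the same feature, the sketch below registers a connection-scoped function with Python's built-in sqlite3 module via create_function; the function disappears when the connection closes. The employee rows are made up for illustration.

```python
import sqlite3


def get_seniority(tenure):
    """Mirror the CASE expression above; the first matching branch wins."""
    if tenure is None:
        return "n/a"
    if tenure < 1:
        return "analyst"
    if 1 <= tenure <= 3:
        return "associate"
    if 3 <= tenure <= 5:
        return "senior"
    if tenure > 5:
        return "vp"
    return "n/a"


conn = sqlite3.connect(":memory:")
# Register the Python function under the SQL name get_seniority (1 argument).
conn.create_function("get_seniority", 1, get_seniority)

conn.executescript("""
    CREATE TABLE employees (name TEXT, tenure INTEGER);
    -- Hypothetical sample rows
    INSERT INTO employees VALUES ('Ann', 0), ('Ben', 2), ('Cid', 4), ('Dee', 7);
""")

seniority_rows = conn.execute(
    "SELECT name, get_seniority(tenure) AS seniority "
    "FROM employees ORDER BY name"
).fetchall()
print(seniority_rows)
```

As in the CASE expression, a tenure of exactly 3 maps to "associate", because that branch is checked before the "senior" branch.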
4. Pivoting data with CASE WHEN
You will most likely see many questions that require CASE WHEN statements, simply because it is such a versatile concept. It lets you write complex conditional statements when you want to assign a value or class based on other variables.
It also lets you pivot data. For example, if you have a month column and want a separate column for each month, you can use CASE WHEN statements to pivot the data.
Example question: Write a SQL query to reformat the table so that there is a revenue column for each month.
Initial table:
+------+---------+-------+
| id   | revenue | month |
+------+---------+-------+
| 1    | 8000    | Jan   |
| 2    | 9000    | Jan   |
| 3    | 10000   | Feb   |
| 1    | 7000    | Feb   |
| 1    | 6000    | Mar   |
+------+---------+-------+
Result table:
+------+-------------+-------------+-------------+-----+-------------+
| id   | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-------------+
| 1    | 8000        | 7000        | 6000        | ... | null        |
| 2    | 9000        | null        | null        | ... | null        |
| 3    | null        | 10000       | null        | ... | null        |
+------+-------------+-------------+-------------+-----+-------------+
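One way to produce the result table is to wrap one CASE WHEN per month in an aggregate and group by id. The sketch below runs this with Python's built-in sqlite3 module; the table name department is an assumption (the article does not name the table), and only the first three months are written out, since the remaining nine follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'department' is a hypothetical table name; the rows match the
    -- initial table shown above.
    CREATE TABLE department (id INTEGER, revenue INTEGER, month TEXT);
    INSERT INTO department VALUES
        (1, 8000, 'Jan'), (2, 9000, 'Jan'), (3, 10000, 'Feb'),
        (1, 7000, 'Feb'), (1, 6000, 'Mar');
""")

PIVOT_QUERY = """
SELECT id,
       -- A CASE without ELSE yields NULL for other months, and SUM over
       -- only NULLs is NULL, which gives the nulls in the result table.
       SUM(CASE WHEN month = 'Jan' THEN revenue END) AS Jan_Revenue,
       SUM(CASE WHEN month = 'Feb' THEN revenue END) AS Feb_Revenue,
       SUM(CASE WHEN month = 'Mar' THEN revenue END) AS Mar_Revenue
FROM department
GROUP BY id
ORDER BY id
"""

pivot_rows = conn.execute(PIVOT_QUERY).fetchall()
for row in pivot_rows:
    print(row)
```

Python shows SQL NULL as None, so the row for id 2 comes back as (2, 9000, None, None).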
5. EXCEPT vs NOT IN
EXCEPT and NOT IN operate almost identically. Both are used to compare the rows of two queries or tables. That said, there are subtle differences between the two.
First, EXCEPT removes duplicates and returns distinct rows, whereas NOT IN does not.
Second, EXCEPT requires the same number of columns in both queries or tables, whereas NOT IN compares a single column from each query or table.
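The duplicate-handling difference is easy to see on a tiny example. This sketch uses Python's built-in sqlite3 module with two hypothetical single-column tables, a and b, that are not from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; note the duplicated value 1 in table a.
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (1), (2), (3);
    INSERT INTO b VALUES (3);
""")

# EXCEPT is a set operation: the duplicate 1 collapses into one row.
except_rows = conn.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x"
).fetchall()

# NOT IN is a row filter: every surviving row of a is kept, duplicates too.
not_in_rows = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b) ORDER BY x"
).fetchall()

print(except_rows)
print(not_in_rows)
```

One further difference worth knowing: if the NOT IN subquery returns a NULL, the comparison is never true and the filter returns no rows, while EXCEPT treats NULLs like any other value.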
6. Self joins
A self join is a SQL table joined to itself. You might think that serves no purpose, but you would be surprised how common it is. In many real-world settings, data is stored in one large table rather than in many smaller ones. In such cases, a self join may be needed to solve certain problems.
Let's take a look at an example.
Example question: Given the Employee table below, write a SQL query that finds the employees who earn more than their managers. For the table below, Joe is the only employee who earns more than his manager.
+----+-------+--------+-----------+
| Id | Name  | Salary | ManagerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | NULL      |
| 4  | Max   | 90000  | NULL      |
+----+-------+--------+-----------+
Answer:
SELECT
a.Name as Employee
FROM
Employee as a
JOIN Employee as b on a.ManagerID = b.Id
WHERE a.Salary > b.Salary
7. RANK vs DENSE_RANK vs ROW_NUMBER
Ranking rows and values is a very common application. Here are a few examples of where companies often use ranking:
Ranking the highest-value customers by number of purchases, profit, and so on
Ranking the top products by number of units sold
Ranking the top countries by sales
Ranking the top videos by number of minutes watched, number of distinct viewers, and so on.
In SQL, there are several ways to assign a "rank" to a row, which we will explore with an example. Consider the following query:
SELECT Name
, GPA
, ROW_NUMBER() OVER (ORDER BY GPA desc)
, RANK() OVER (ORDER BY GPA desc)
, DENSE_RANK() OVER (ORDER BY GPA desc)
FROM student_grades
ROW_NUMBER() returns a unique number for each row, starting at 1. When there are ties (for example, Bob and Carrie with the same GPA), ROW_NUMBER() assigns their numbers arbitrarily unless a second ordering criterion is defined.
RANK() returns a number for each row starting at 1, except that tied rows receive the same number. A gap then follows the duplicated rank.
DENSE_RANK() is similar to RANK(), except that no gap follows a duplicated rank. Note that with DENSE_RANK(), Daniel, who comes right after the tie, is ranked 3rd rather than 4th as with RANK().
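The three functions can be compared side by side in a small runnable sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The GPA values are invented for illustration; they are chosen so that Bob and Carrie tie and Daniel comes right after them. Name is added as a tie-breaker for ROW_NUMBER() only, so that its output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_grades (Name TEXT, GPA REAL);
    -- Hypothetical grades: Bob and Carrie are tied.
    INSERT INTO student_grades VALUES
        ('Alice', 3.9), ('Bob', 3.7), ('Carrie', 3.7), ('Daniel', 3.5);
""")

RANK_QUERY = """
SELECT Name,
       GPA,
       ROW_NUMBER() OVER (ORDER BY GPA DESC, Name) AS row_num,
       RANK()       OVER (ORDER BY GPA DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY GPA DESC)       AS dense_rnk
FROM student_grades
ORDER BY row_num
"""

rank_rows = conn.execute(RANK_QUERY).fetchall()
for row in rank_rows:
    print(row)
```

In the output, Bob and Carrie share rank 2 under both RANK() and DENSE_RANK(), and Daniel is 4th under RANK() but 3rd under DENSE_RANK().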
8. Calculating delta values
Another common application is comparing values from different periods. For example, what is the delta between this month's and last month's sales? Or between this month and the same month last year?
When you compare values from different periods to calculate deltas, this is where LEAD() and LAG() come into play.
Here are some examples:
-- Comparing each month's sales to last month
SELECT month
, sales
, sales - LAG(sales, 1) OVER (ORDER BY month)
FROM monthly_sales
-- Comparing each month's sales to the same month last year
SELECT month
, sales
, sales - LAG(sales, 12) OVER (ORDER BY month)
FROM monthly_sales
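A quick way to see what LAG() returns is to run the month-over-month query on a few rows. This sketch uses Python's built-in sqlite3 module (SQLite 3.25 or newer for window functions); the sales figures and the 'YYYY-MM' month format are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (month TEXT, sales INTEGER);
    -- Hypothetical figures; 'YYYY-MM' strings sort chronologically.
    INSERT INTO monthly_sales VALUES
        ('2022-01', 100), ('2022-02', 130), ('2022-03', 120);
""")

LAG_QUERY = """
SELECT month,
       sales,
       sales - LAG(sales, 1) OVER (ORDER BY month) AS delta
FROM monthly_sales
ORDER BY month
"""

lag_rows = conn.execute(LAG_QUERY).fetchall()
for row in lag_rows:
    print(row)
```

The first month has no previous row, so LAG() yields NULL and the delta is NULL (None in Python). Note also that LAG(sales, 12) counts rows, not calendar months, so the year-over-year version is only correct when the table has exactly one row per month with no gaps.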
9. Calculating running totals
If you know about ROW_NUMBER() and LAG()/LEAD(), this one will probably not surprise you. If you don't, it is probably one of the most useful window functions, especially when you want to visualize growth!
Using SUM() as a window function, we can calculate a running total. See the example below:
SELECT Month
, Revenue
, SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue
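Here is the same query run on a few made-up rows with Python's built-in sqlite3 module (SQLite 3.25 or newer for window functions); the revenue figures and the 'YYYY-MM' month format are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_revenue (Month TEXT, Revenue INTEGER);
    -- Hypothetical figures
    INSERT INTO monthly_revenue VALUES
        ('2022-01', 100), ('2022-02', 150), ('2022-03', 50);
""")

RUNNING_TOTAL_QUERY = """
SELECT Month,
       Revenue,
       -- With ORDER BY in the window, SUM covers all rows up to the current one
       SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue
ORDER BY Month
"""

running_rows = conn.execute(RUNNING_TOTAL_QUERY).fetchall()
for row in running_rows:
    print(row)
```

Each Cumulative value is the sum of that month's revenue and all earlier months: 100, then 250, then 300.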
10. Date-time manipulation
You should definitely expect SQL questions that involve date-time data. For example, you may need to group data by month, or convert a value from DD-MM-YYYY format to just the month.
Some functions you should know are:
EXTRACT
DATEDIFF
DATE_ADD, DATE_SUB
DATE_TRUNC
Example question: Given a Weather table, write a SQL query to find the Ids of all dates whose temperature is higher than that of the previous date (yesterday).
+---------+------------------+------------------+
| Id(INT) | RecordDate(DATE) | Temperature(INT) |
+---------+------------------+------------------+
|       1 |       2015-01-01 |               10 |
|       2 |       2015-01-02 |               25 |
|       3 |       2015-01-03 |               20 |
|       4 |       2015-01-04 |               30 |
+---------+------------------+------------------+
Answer:
SELECT
a.Id
FROM
Weather a,
Weather b
WHERE
a.Temperature > b.Temperature
AND DATEDIFF(a.RecordDate, b.RecordDate) = 1
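The two-argument DATEDIFF used above is MySQL syntax, and date functions differ considerably between engines. As a sketch of the same idea in another dialect, the version below runs on SQLite through Python's built-in sqlite3 module and swaps DATEDIFF for SQLite's date() function with a '-1 day' modifier; dates are stored as 'YYYY-MM-DD' text, which is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Rows match the Weather table shown above.
    CREATE TABLE Weather (Id INTEGER, RecordDate TEXT, Temperature INTEGER);
    INSERT INTO Weather VALUES
        (1, '2015-01-01', 10), (2, '2015-01-02', 25),
        (3, '2015-01-03', 20), (4, '2015-01-04', 30);
""")

WEATHER_QUERY = """
SELECT a.Id
FROM Weather a
JOIN Weather b
  -- b is the row for the day before a
  ON b.RecordDate = date(a.RecordDate, '-1 day')
WHERE a.Temperature > b.Temperature
ORDER BY a.Id
"""

weather_rows = conn.execute(WEATHER_QUERY).fetchall()
print(weather_rows)
```

Ids 2 and 4 are returned: 25 is higher than the 10 of the day before, and 30 is higher than 20, while day 3 (20) is cooler than day 2 (25).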
That's it! I hope this helps with your interview preparation. I believe that if you know these 10 concepts inside out, you will do well on most SQL questions out there.
As always, I wish you the best in your studies!
This article is translated from Dimitris Poulopoulos The article
《Ten Advanced SQL Concepts You Should Know for Data Science Interviews》
Link to the original text :
https://towardsdatascience.com/ten-advanced-sql-concepts-you-should-know-for-data-science-interviews-4d7015ec74b0