10 advanced concepts that must be understood in learning SQL
2022-07-06 17:42:00 【Junhong's road of data analysis】
As the amount of data continues to grow, so does the demand for qualified data professionals. In particular, there is a growing need for professionals who are fluent in SQL, and not just at the beginner level.
So Nathan Rosidi, the founder of StrataScratch, and I put together what we think are the 10 most important and relevant intermediate-to-advanced SQL concepts.
1. Common table expressions (CTEs)
If you have ever wanted to query the result of a query (a subquery), that is where CTEs come into play: a CTE essentially creates a temporary table.
Using common table expressions (CTEs) is a great way to modularize and break down your code, the same way you would break an essay down into several paragraphs.
Consider the following query, which has subqueries in its WHERE clause.
SELECT
    name,
    salary
FROM
    People
WHERE
    name IN ( SELECT DISTINCT name FROM population WHERE country = 'Canada' AND city = 'Toronto' )
    AND salary >= (
        SELECT
            AVG( salary )
        FROM
            salaries
        WHERE
            gender = 'Female' )
This may not seem too difficult to understand, but what if there were many subqueries, or subqueries nested within subqueries? This is where CTEs come into play.
with toronto_ppl as (
    SELECT DISTINCT name
    FROM population
    WHERE country = 'Canada'
      AND city = 'Toronto'
)
, avg_female_salary as (
    SELECT AVG(salary) as avgSalary
    FROM salaries
    WHERE gender = 'Female'
)
SELECT name
     , salary
FROM People
WHERE name in (SELECT name FROM toronto_ppl)
  AND salary >= (SELECT avgSalary FROM avg_female_salary)
Now it is clear that the WHERE clause filters for names in Toronto. CTEs are useful because they let you break your code into smaller chunks, and also because they let you give each CTE a name (here, toronto_ppl and avg_female_salary).
CTEs also make more advanced techniques possible, such as creating recursive tables.
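To try the CTE version end to end, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. The sample rows are hypothetical and exist only to make the query return something; the table and column names follow the query in this section.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the article).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People     (name TEXT, salary INTEGER);
    CREATE TABLE population (name TEXT, country TEXT, city TEXT);
    CREATE TABLE salaries   (gender TEXT, salary INTEGER);

    INSERT INTO People VALUES ('Ana', 90000), ('Ben', 50000), ('Cleo', 95000);
    INSERT INTO population VALUES
        ('Ana',  'Canada', 'Toronto'),
        ('Ben',  'Canada', 'Toronto'),
        ('Cleo', 'Canada', 'Vancouver');
    INSERT INTO salaries VALUES
        ('Female', 60000), ('Female', 80000), ('Male', 70000);
""")

cte_query = """
    WITH toronto_ppl AS (
        SELECT DISTINCT name
        FROM population
        WHERE country = 'Canada' AND city = 'Toronto'
    ),
    avg_female_salary AS (
        SELECT AVG(salary) AS avgSalary
        FROM salaries
        WHERE gender = 'Female'
    )
    SELECT name, salary
    FROM People
    WHERE name IN (SELECT name FROM toronto_ppl)
      AND salary >= (SELECT avgSalary FROM avg_female_salary)
"""

# Only people who live in Toronto AND earn at least the average female salary.
cte_rows = conn.execute(cte_query).fetchall()
print(cte_rows)
```

With these rows the average female salary is 70000, so only Ana (Toronto, 90000) passes both filters.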
2. Recursive CTEs
A recursive CTE is a CTE that references itself, just like a recursive function in Python. Recursive CTEs are especially useful for querying hierarchical data such as organization charts, file systems, and graphs of links between web pages.
A recursive CTE has 3 parts:
Anchor member: an initial query that returns the base result of the CTE
Recursive member: a recursive query that references the CTE and is combined with the anchor member using UNION ALL
Termination condition: the condition that stops the recursive member
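The three parts can be seen in a small runnable sketch. It uses Python's sqlite3 module with a hypothetical four-person org chart; the depth column is an addition for illustration and is not part of the article's query. The recursion stops on its own once the recursive member finds no more rows to join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_members (id INTEGER, manager_id INTEGER);
    -- Hypothetical org chart: 1 is the top manager,
    -- 2 and 3 report to 1, and 4 reports to 2.
    INSERT INTO staff_members VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

recursive_query = """
    WITH RECURSIVE org_structure AS (
        -- Anchor member: staff with no manager (the top of the hierarchy)
        SELECT id, manager_id, 0 AS depth
        FROM staff_members
        WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: staff whose manager is already in the CTE
        SELECT sm.id, sm.manager_id, os.depth + 1
        FROM staff_members sm
        INNER JOIN org_structure os ON os.id = sm.manager_id
        -- Termination: an iteration that adds no rows ends the recursion
    )
    SELECT id, manager_id, depth
    FROM org_structure
    ORDER BY id
"""

org_rows = conn.execute(recursive_query).fetchall()
for row in org_rows:
    print(row)
```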
Here is an example of a recursive CTE that gets the manager ID for each staff ID:
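-- Some engines (e.g. PostgreSQL, MySQL 8+) require WITH RECURSIVE here
with org_structure as (
    SELECT id
         , manager_id
    FROM staff_members
    WHERE manager_id IS NULL
    UNION ALL
    SELECT sm.id
         , sm.manager_id
    FROM staff_members sm
    INNER JOIN org_structure os
        ON os.id = sm.manager_id
)
SELECT id
     , manager_id
FROM org_structure
3. Temporary functions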
Knowing how to write temporary functions is important for several reasons:
It allows you to break chunks of code into smaller chunks
It is useful for writing cleaner code
It prevents repetition and lets you reuse your code, similar to using functions in Python.
Consider the following example :
SELECT name
     , CASE WHEN tenure < 1 THEN 'analyst'
            WHEN tenure BETWEEN 1 and 3 THEN 'associate'
            WHEN tenure BETWEEN 3 and 5 THEN 'senior'
            WHEN tenure > 5 THEN 'vp'
            ELSE 'n/a'
       END AS seniority
FROM employees
Instead, you can use a temporary function to capture the CASE clause.
-- CREATE TEMPORARY FUNCTION with INT64 is BigQuery syntax; other engines differ
CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
    CASE WHEN tenure < 1 THEN 'analyst'
         WHEN tenure BETWEEN 1 and 3 THEN 'associate'
         WHEN tenure BETWEEN 3 and 5 THEN 'senior'
         WHEN tenure > 5 THEN 'vp'
         ELSE 'n/a'
    END
);
SELECT name
     , get_seniority(tenure) as seniority
FROM employees
With a temporary function, the query itself is much simpler and more readable, and you can reuse the seniority function!
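SQLite has no CREATE TEMPORARY FUNCTION, so the sketch below swaps in a different mechanism to show the same idea: Python's sqlite3 module can register a function that lives only as long as the connection, which is then callable from SQL. The employee rows are hypothetical, and the Python function is an assumed re-implementation of the CASE expression above, not the article's code.

```python
import sqlite3


def get_seniority(tenure):
    """Mirror the CASE expression: the first matching branch wins."""
    if tenure is None:
        return "n/a"           # CASE falls through to ELSE for NULL
    if tenure < 1:
        return "analyst"
    if 1 <= tenure <= 3:       # BETWEEN is inclusive on both ends
        return "associate"
    if 3 < tenure <= 5:        # tenure = 3 was already caught above
        return "senior"
    return "vp"                # anything greater than 5


conn = sqlite3.connect(":memory:")
# Register the function for this connection only (name, argument count, callable).
conn.create_function("get_seniority", 1, get_seniority)

conn.executescript("""
    CREATE TABLE employees (name TEXT, tenure INTEGER);
    -- Hypothetical rows covering each branch
    INSERT INTO employees VALUES
        ('Ana', 0), ('Ben', 3), ('Cleo', 4), ('Dev', 7), ('Eli', NULL);
""")

seniority_rows = conn.execute("""
    SELECT name, get_seniority(tenure) AS seniority
    FROM employees
    ORDER BY name
""").fetchall()
print(seniority_rows)
```

Note that a tenure of exactly 3 maps to "associate", because the first BETWEEN branch matches before the second one is tried.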
4. Pivoting data with CASE WHEN
You will most likely see many questions that require a CASE WHEN statement, simply because it is such a versatile concept. It allows you to write complex conditional statements when you want to assign a value or class based on other variables.
It also allows you to pivot data. For example, if you have a month column and you want a separate column for each month, you can use CASE WHEN statements to pivot the data.
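As a sketch of the pivoting idea, the snippet below wraps one CASE WHEN per month in an aggregate and groups by id. It runs in SQLite through Python's sqlite3 module; the table name revenue_by_month is hypothetical, only three months are pivoted to keep it short, and the sample rows mirror the initial table of the example question in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revenue_by_month (id INTEGER, revenue INTEGER, month TEXT);
    INSERT INTO revenue_by_month VALUES
        (1, 8000, 'Jan'),
        (2, 9000, 'Jan'),
        (3, 10000, 'Feb'),
        (1, 7000, 'Feb'),
        (1, 6000, 'Mar');
""")

pivot_query = """
    SELECT id,
           -- Each CASE keeps revenue for one month and yields NULL otherwise;
           -- SUM ignores NULLs and returns NULL when a month has no rows.
           SUM(CASE WHEN month = 'Jan' THEN revenue END) AS Jan_Revenue,
           SUM(CASE WHEN month = 'Feb' THEN revenue END) AS Feb_Revenue,
           SUM(CASE WHEN month = 'Mar' THEN revenue END) AS Mar_Revenue
    FROM revenue_by_month
    GROUP BY id
    ORDER BY id
"""

pivot_rows = conn.execute(pivot_query).fetchall()
for row in pivot_rows:
    print(row)
```

Extending the pivot to all twelve months means adding one SUM(CASE WHEN ...) line per month.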
Example question: Write a SQL query to reformat the table so that there is a revenue column for each month.
Initial table:
+------+---------+-------+
| id | revenue | month |
+------+---------+-------+
| 1 | 8000 | Jan |
| 2 | 9000 | Jan |
| 3 | 10000 | Feb |
| 1 | 7000 | Feb |
| 1 | 6000 | Mar |
+------+---------+-------+
Result table:
+------+-------------+-------------+-------------+-----+-------------+
| id   | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-------------+
| 1    | 8000        | 7000        | 6000        | ... | null        |
| 2    | 9000        | null        | null        | ... | null        |
| 3    | null        | 10000       | null        | ... | null        |
+------+-------------+-------------+-------------+-----+-------------+
5. EXCEPT vs NOT IN
EXCEPT and NOT IN operate almost identically. Both are used to compare the rows between two queries or tables. That said, there are subtle differences between the two.
First, EXCEPT filters out duplicates and returns distinct rows, unlike NOT IN.
Second, EXCEPT expects the same number of columns in both queries or tables, whereas NOT IN compares a single column from each query or table.
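Both differences show up in a small sketch, run here in SQLite through Python's sqlite3 module. The tables table_a and table_b and their rows are hypothetical; table_a deliberately contains a duplicate row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (name TEXT, city TEXT);
    CREATE TABLE table_b (name TEXT, city TEXT);
    -- Hypothetical rows; ('Ana', 'Toronto') appears twice on purpose.
    INSERT INTO table_a VALUES
        ('Ana', 'Toronto'), ('Ana', 'Toronto'), ('Ben', 'Ottawa');
    INSERT INTO table_b VALUES ('Ben', 'Ottawa');
""")

# EXCEPT compares whole rows (same number of columns on both sides)
# and returns distinct rows only.
except_rows = conn.execute("""
    SELECT name, city FROM table_a
    EXCEPT
    SELECT name, city FROM table_b
""").fetchall()

# NOT IN compares a single column and keeps duplicates from the outer query.
not_in_rows = conn.execute("""
    SELECT name, city FROM table_a
    WHERE name NOT IN (SELECT name FROM table_b)
""").fetchall()

print(except_rows)   # one row for Ana
print(not_in_rows)   # two rows for Ana
```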
6. Self joins
A self join is when a SQL table is joined with itself. You might think that serves no purpose, but you would be surprised how common it is. In many real-life settings, data is stored in one large table rather than many smaller ones. In such cases, a self join may be needed to solve certain problems.
Let's take a look at an example.
Example question: Given the Employee table below, write a SQL query that finds the employees who earn more than their managers. In this table, Joe is the only employee who earns more than his manager.
+----+-------+--------+-----------+
| Id | Name | Salary | ManagerId |
+----+-------+--------+-----------+
| 1 | Joe | 70000 | 3 |
| 2 | Henry | 80000 | 4 |
| 3 | Sam | 60000 | NULL |
| 4 | Max | 90000 | NULL |
+----+-------+--------+-----------+
Answer:
SELECT
    a.Name as Employee
FROM
    Employee as a
    JOIN Employee as b on a.ManagerId = b.Id
WHERE a.Salary > b.Salary
7. Rank vs Dense Rank vs Row Number
Ranking rows and values is a very common application. Here are a few examples of how companies frequently use ranking:
Ranking the highest-value customers by number of purchases, profit, etc.
Ranking the top products by number of units sold
Ranking the top countries with the most sales
Ranking the top videos by number of minutes watched, number of distinct viewers, etc.
In SQL, there are several ways to assign a "rank" to a row, which we will explore with an example. Consider the following query and its results:
SELECT Name
, GPA
, ROW_NUMBER() OVER (ORDER BY GPA desc)
, RANK() OVER (ORDER BY GPA desc)
, DENSE_RANK() OVER (ORDER BY GPA desc)
FROM student_grades
ROW_NUMBER() returns a unique number for each row, starting at 1. When there are ties (for example, Bob vs Carrie), ROW_NUMBER() assigns the numbers arbitrarily if no second ordering criterion is defined.
RANK() returns a number for each row starting at 1 that is unique except when there are ties, in which case RANK() assigns the same number to the tied rows. In addition, a gap follows a duplicated rank.
DENSE_RANK() is similar to RANK(), except that there is no gap after a duplicated rank. Notice that with DENSE_RANK(), Daniel is ranked 3rd rather than 4th as with RANK().
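The behaviour described above can be reproduced with a small sketch in SQLite (window functions need SQLite 3.25 or later) through Python's sqlite3 module. The GPA values and the student Alice are hypothetical; Bob and Carrie are given the same GPA so that they tie. Name is added as a second ordering criterion for ROW_NUMBER() so that its output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_grades (Name TEXT, GPA REAL);
    -- Hypothetical grades: Bob and Carrie tie for second place.
    INSERT INTO student_grades VALUES
        ('Alice', 3.9), ('Bob', 3.7), ('Carrie', 3.7), ('Daniel', 3.5);
""")

rank_query = """
    SELECT Name,
           GPA,
           ROW_NUMBER() OVER (ORDER BY GPA DESC, Name) AS row_num,
           RANK()       OVER (ORDER BY GPA DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY GPA DESC)       AS dense_rnk
    FROM student_grades
    ORDER BY row_num
"""

rank_rows = conn.execute(rank_query).fetchall()
for name, gpa, row_num, rnk, dense_rnk in rank_rows:
    print(name, gpa, row_num, rnk, dense_rnk)
```

Daniel comes out as row 4, rank 4, and dense rank 3, while Bob and Carrie share rank 2 under both RANK() and DENSE_RANK().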
8. Calculating delta values
Another common application is comparing values from different periods. For example, what is the delta between this month's and last month's sales? Or between this month and the same month last year?
When comparing values from different periods to calculate deltas, this is where LEAD() and LAG() come into play.
Here are some examples:
-- Comparing each month's sales to last month
SELECT month
     , sales
     , sales - LAG(sales, 1) OVER (ORDER BY month)
FROM monthly_sales
-- Comparing each month's sales to the same month last year
-- (assumes one row per month with no missing months)
SELECT month
     , sales
     , sales - LAG(sales, 12) OVER (ORDER BY month)
FROM monthly_sales
9. Calculating running totals
If you know about ROW_NUMBER() and LAG()/LEAD(), this one probably won't surprise you. But if you don't, it is probably one of the most useful window functions, especially when you want to visualize growth!
Using a window function with SUM(), we can calculate a running total. See the example below:
SELECT Month
, Revenue
, SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue
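The running total, together with the LAG()-based month-over-month delta from the previous section, can be tried with the sketch below. It runs in SQLite (3.25 or later) through Python's sqlite3 module. The revenue figures are hypothetical, and Month is stored as 'YYYY-MM' text, an assumption made so that ORDER BY sorts the months chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_revenue (Month TEXT, Revenue INTEGER);
    -- Hypothetical figures; 'YYYY-MM' text sorts in calendar order.
    INSERT INTO monthly_revenue VALUES
        ('2022-01', 100), ('2022-02', 150), ('2022-03', 120);
""")

running_query = """
    SELECT Month,
           Revenue,
           -- Delta versus the previous month (NULL for the first row)
           Revenue - LAG(Revenue, 1) OVER (ORDER BY Month) AS Delta,
           -- Running total of everything up to and including this month
           SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
    FROM monthly_revenue
    ORDER BY Month
"""

running_rows = conn.execute(running_query).fetchall()
for row in running_rows:
    print(row)
```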
10. Date-time manipulation
You should definitely expect some SQL questions that involve date-time data. For example, you may need to group data by month or convert a variable's format from DD-MM-YYYY to just the month.
Some functions you should know are:
EXTRACT
DATEDIFF
DATE_ADD, DATE_SUB
DATE_TRUNC
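The exact function names vary by engine. As a hedged illustration of what each one does, the sketch below uses SQLite's own date functions as stand-ins (SQLite does not provide EXTRACT, DATEDIFF, DATE_ADD, DATE_SUB, or DATE_TRUNC under those names), run through Python's sqlite3 module. The sample date is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

date_query = """
    SELECT
        -- like EXTRACT(MONTH FROM d): the month as a number
        CAST(strftime('%m', d) AS INTEGER)                     AS month_num,
        -- like DATEDIFF: whole days between two dates
        CAST(julianday(d) - julianday('2015-01-01') AS INTEGER) AS days_since,
        -- like DATE_ADD: one week later
        date(d, '+7 days')                                     AS plus_week,
        -- like DATE_SUB: one month earlier
        date(d, '-1 month')                                    AS minus_month,
        -- like DATE_TRUNC to month: first day of the month
        date(d, 'start of month')                              AS month_start
    FROM (SELECT '2015-01-15' AS d)
"""

date_row = conn.execute(date_query).fetchone()
print(date_row)
```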
Example question: Given the Weather table below, write a SQL query to find the Ids of all dates whose temperature is higher than that of the previous day (yesterday).
+---------+------------------+------------------+
| Id(INT) | RecordDate(DATE) | Temperature(INT) |
+---------+------------------+------------------+
| 1 | 2015-01-01 | 10 |
| 2 | 2015-01-02 | 25 |
| 3 | 2015-01-03 | 20 |
| 4 | 2015-01-04 | 30 |
+---------+------------------+------------------+
Answer:
SELECT
    a.Id
FROM
    Weather a,
    Weather b
WHERE
    a.Temperature > b.Temperature
    AND DATEDIFF(a.RecordDate, b.RecordDate) = 1
That's it! I hope this helps you in your interview preparation. I am sure that if you know these 10 concepts inside out, you will do well on most SQL questions out there.
As always, I wish you the best in your learning endeavors!
This article is translated from an article by Dimitris Poulopoulos:
《Ten Advanced SQL Concepts You Should Know for Data Science Interviews》
Link to the original text :
https://towardsdatascience.com/ten-advanced-sql-concepts-you-should-know-for-data-science-interviews-4d7015ec74b0