10 advanced concepts that must be understood in learning SQL

2022-06-11 07:18:00 Java notes shrimp

As the volume of data keeps growing, so does the demand for qualified data professionals. In particular, demand is growing for professionals who are fluent in SQL, and not just at the entry level.

So Nathan Rosidi, the founder of Stratascratch, and I put together what we consider the 10 most important and relevant intermediate-to-advanced SQL concepts.

1. Common table expressions (CTEs)

If you have ever wanted to query a subquery, that is where CTEs come into play. A CTE essentially creates a temporary table.

Using common table expressions (CTEs) is a good way to modularize and break down your code, the same way you would break an essay down into paragraphs.

Consider the following query, which uses subqueries in the WHERE clause.

SELECT
  name,
  salary
FROM
  People
WHERE
  name IN ( SELECT DISTINCT name FROM population WHERE country = "Canada" AND city = "Toronto" )
  AND salary >= (
    SELECT
      AVG( salary )
    FROM
      salaries
    WHERE
      gender = "Female" )

That may not look too hard to follow, but what if the query contained many subqueries, or subqueries inside subqueries? This is where CTEs come into play.

with toronto_ppl as (  
   SELECT DISTINCT name  
   FROM population  
   WHERE country = "Canada"  
         AND city = "Toronto"  
)  
, avg_female_salary as (  
   SELECT AVG(salary) as avgSalary  
   FROM salaries  
   WHERE gender = "Female"  
)  
SELECT name  
       , salary  
FROM People  
WHERE name in (SELECT name FROM toronto_ppl)
      AND salary >= (SELECT avgSalary FROM avg_female_salary)

Now it is clear that the WHERE clause filters for names in Toronto. As you may notice, CTEs are useful because they let you break code into smaller blocks, and also because they let you give each CTE a name (here, toronto_ppl and avg_female_salary).

CTEs also enable more advanced techniques, such as creating recursive tables.
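If you want to try the CTE version yourself, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The rows and the exact table columns are made up for illustration (the article does not show the real tables), and the string literals use single quotes because that is the portable SQL form.

```python
import sqlite3

# Hypothetical sample data; only the table and column names used by the
# query come from the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (name TEXT, salary REAL);
    CREATE TABLE population (name TEXT, country TEXT, city TEXT);
    CREATE TABLE salaries (name TEXT, gender TEXT, salary REAL);
    INSERT INTO People VALUES ('Ann', 90000), ('Bo', 50000), ('Cy', 95000);
    INSERT INTO population VALUES
        ('Ann', 'Canada', 'Toronto'),
        ('Bo', 'Canada', 'Toronto'),
        ('Cy', 'Canada', 'Ottawa');
    INSERT INTO salaries VALUES ('Ann', 'Female', 90000), ('Dee', 'Female', 70000);
""")

cte_query = """
    WITH toronto_ppl AS (
        SELECT DISTINCT name
        FROM population
        WHERE country = 'Canada' AND city = 'Toronto'
    ),
    avg_female_salary AS (
        SELECT AVG(salary) AS avgSalary
        FROM salaries
        WHERE gender = 'Female'
    )
    SELECT name, salary
    FROM People
    WHERE name IN (SELECT name FROM toronto_ppl)
      AND salary >= (SELECT avgSalary FROM avg_female_salary)
"""
cte_rows = conn.execute(cte_query).fetchall()
print(cte_rows)  # only Ann lives in Toronto and earns at least the 80000 average
```

Each CTE can also be run on its own while debugging, which is a large part of why the named blocks are easier to work with than nested subqueries.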

2. Recursive CTEs

A recursive CTE is a CTE that references itself, just like a recursive function in Python. Recursive CTEs are especially useful for querying hierarchical data such as organization charts, file systems, and graphs of links between web pages.

A recursive CTE has 3 parts:

  • Anchor member: the initial query that returns the base result of the CTE

  • Recursive member: a recursive query that references the CTE. It is combined with the anchor member using UNION ALL

  • A termination condition that stops the recursive member

Here is an example of a recursive CTE that gets the manager ID for each staff ID:

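-- Note: PostgreSQL and MySQL require WITH RECURSIVE here
with org_structure as (
   -- Anchor member: staff with no manager (top of the hierarchy)
   SELECT id
          , manager_id
   FROM staff_members
   WHERE manager_id IS NULL
   UNION ALL
   -- Recursive member: staff whose manager is already in the result
   SELECT sm.id
          , sm.manager_id
   FROM staff_members sm
   INNER JOIN org_structure os
      ON os.id = sm.manager_id
)
SELECT *
FROM org_structure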
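To see the three parts at work, here is a small runnable sketch using Python's sqlite3 module. The four staff rows are invented, and the depth column is an addition of mine (not part of the query above) that makes the recursion visible. The recursion stops on its own once the join finds no further staff members.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_members (id INTEGER PRIMARY KEY, manager_id INTEGER);
    -- Hypothetical hierarchy: 1 is the top manager, 2 and 3 report to 1,
    -- and 4 reports to 2.
    INSERT INTO staff_members VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

org_query = """
    WITH RECURSIVE org_structure AS (
        -- Anchor member: the top of the hierarchy
        SELECT id, manager_id, 0 AS depth
        FROM staff_members
        WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: anyone whose manager is already in the result
        SELECT sm.id, sm.manager_id, os.depth + 1
        FROM staff_members sm
        INNER JOIN org_structure os ON os.id = sm.manager_id
    )
    SELECT id, manager_id, depth
    FROM org_structure
    ORDER BY id
"""
org_rows = conn.execute(org_query).fetchall()
for row in org_rows:
    print(row)
```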

3. Temporary functions

Knowing how to write temporary functions is important for several reasons:

  • They let you break code into smaller chunks

  • They help you write clean code

  • They prevent repetition and let you reuse code, much like functions in Python.

Consider the following example :

SELECT name  
       , CASE WHEN tenure < 1 THEN "analyst"  
              WHEN tenure BETWEEN 1 and 3 THEN "associate"  
              WHEN tenure BETWEEN 3 and 5 THEN "senior"  
              WHEN tenure > 5 THEN "vp"  
              ELSE "n/a"  
         END AS seniority   
FROM employees

Instead, you can use a temporary function to capture the CASE statement.

CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (  
   CASE WHEN tenure < 1 THEN "analyst"  
        WHEN tenure BETWEEN 1 and 3 THEN "associate"  
        WHEN tenure BETWEEN 3 and 5 THEN "senior"  
        WHEN tenure > 5 THEN "vp"  
        ELSE "n/a"  
   END  
);  
SELECT name  
       , get_seniority(tenure) as seniority  
FROM employees

With a temporary function, the query itself is much simpler and more readable, and you can reuse the seniority function!
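The CREATE TEMPORARY FUNCTION syntax above is not available in every database. As a rough analogue, assuming you are working with SQLite from Python, you can register a function that lives only as long as the connection. The sketch below mirrors the CASE logic; the employee rows are made up.

```python
import sqlite3

def get_seniority(tenure):
    """Mirror of the CASE expression: the first matching branch wins."""
    if tenure is None:
        return "n/a"        # ELSE branch: a NULL tenure matches no WHEN
    if tenure < 1:
        return "analyst"
    if tenure <= 3:
        return "associate"  # BETWEEN 1 AND 3
    if tenure <= 5:
        return "senior"     # BETWEEN 3 AND 5 (3 was already claimed above)
    return "vp"

conn = sqlite3.connect(":memory:")
# The function exists only for this connection, much like a temporary function.
conn.create_function("get_seniority", 1, get_seniority)

conn.executescript("""
    CREATE TABLE employees (name TEXT, tenure INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 0), ('Bo', 3), ('Cy', 5), ('Dee', 8), ('Eve', NULL);
""")
seniority_rows = conn.execute(
    "SELECT name, get_seniority(tenure) AS seniority FROM employees ORDER BY name"
).fetchall()
print(seniority_rows)
```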

4. Using CASE WHEN to pivot data

You will most likely see many questions that require CASE WHEN statements, simply because it is such a versatile concept. It lets you write complex conditional statements when you want to assign a value or class based on other variables.

Less well known is that it also lets you pivot data. For example, if you have a month column and you want to create an individual column for each month, you can use CASE WHEN statements to pivot the data.

Example question: Write a SQL query to reformat the table so that there is a revenue column for each month.

Initial table:    
+------+---------+-------+    
| id   | revenue | month |    
+------+---------+-------+    
| 1    | 8000    | Jan   |    
| 2    | 9000    | Jan   |    
| 3    | 10000   | Feb   |    
| 1    | 7000    | Feb   |    
| 1    | 6000    | Mar   |    
+------+---------+-------+    
  
Result table:
+------+-------------+-------------+-------------+-----+-------------+
| id   | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-------------+
| 1    | 8000        | 7000        | 6000        | ... | null        |
| 2    | 9000        | null        | null        | ... | null        |
| 3    | null        | 10000       | null        | ... | null        |
+------+-------------+-------------+-------------+-----+-------------+
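One way to produce the result table is to wrap a CASE WHEN for each month in SUM() and group by id. The sketch below is my own take rather than an official answer. It runs on SQLite from Python, uses a hypothetical table name (revenue_table), and covers only the three months that appear in the sample data; the remaining months follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revenue_table (id INTEGER, revenue INTEGER, month TEXT);
    INSERT INTO revenue_table VALUES
        (1, 8000, 'Jan'), (2, 9000, 'Jan'), (3, 10000, 'Feb'),
        (1, 7000, 'Feb'), (1, 6000, 'Mar');
""")

pivot_query = """
    SELECT id,
           -- CASE yields NULL for other months, and SUM ignores NULLs
           SUM(CASE WHEN month = 'Jan' THEN revenue END) AS Jan_Revenue,
           SUM(CASE WHEN month = 'Feb' THEN revenue END) AS Feb_Revenue,
           SUM(CASE WHEN month = 'Mar' THEN revenue END) AS Mar_Revenue
    FROM revenue_table
    GROUP BY id
    ORDER BY id
"""
pivot_rows = conn.execute(pivot_query).fetchall()
for row in pivot_rows:
    print(row)
```

When an id has no rows for a month, every CASE result in that group is NULL, so SUM returns NULL, which matches the null cells in the result table.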

5. EXCEPT vs NOT IN

EXCEPT and NOT IN operate almost identically. Both are used to compare rows between two queries or tables. That said, there are subtle differences between the two.

First, EXCEPT filters out duplicates and returns distinct rows, unlike NOT IN.

Second, EXCEPT requires the same number of columns in both queries or tables, whereas NOT IN compares a single column from each query or table.
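The duplicate handling is easiest to see on a tiny example. This sketch uses two made-up single-column tables (a and b) in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (1), (2), (3);  -- note the duplicate 1
    INSERT INTO b VALUES (3);
""")

# EXCEPT is a set operation: the duplicate 1 is collapsed into one row.
except_rows = conn.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x"
).fetchall()

# NOT IN is a row filter: every row of a that passes is kept, duplicates included.
not_in_rows = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b) ORDER BY x"
).fetchall()

print(except_rows)
print(not_in_rows)
```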

6. Self joins

A self join is a SQL table joined with itself. You might think this serves no purpose, but you would be surprised how common it is. In many real-life settings, data is stored in one large table rather than many smaller tables. In such cases, you may need a self join to solve certain problems.

Let's take a look at an example .

Example question: Given the Employee table below, write a SQL query that finds the employees who earn more than their managers. For the table below, Joe is the only employee who earns more than his manager.

+----+-------+--------+-----------+
| Id | Name  | Salary | ManagerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | NULL      |
| 4  | Max   | 90000  | NULL      |
+----+-------+--------+-----------+

Answer:
SELECT
    a.Name as Employee
FROM
    Employee as a
    JOIN Employee as b on a.ManagerId = b.Id
WHERE a.Salary > b.Salary

7. RANK vs DENSE_RANK vs ROW_NUMBER

Ranking rows and values is a very common application. Here are some examples of how companies frequently use ranking:

  • Ranking the highest-value customers by number of purchases, profit, and so on

  • Ranking the top products by number of units sold

  • Ranking the top countries with the greatest sales

  • Ranking the top videos by number of minutes watched, number of distinct viewers, and so on.

In SQL, there are several ways to assign a "rank" to a row, which we will explore with an example. Consider the following query and results:

SELECT Name    
 , GPA    
 , ROW_NUMBER() OVER (ORDER BY GPA desc)    
 , RANK() OVER (ORDER BY GPA desc)    
 , DENSE_RANK() OVER (ORDER BY GPA desc)    
FROM student_grades
(Figure: results of the query above, one row per student with the three ranking columns.)

ROW_NUMBER() returns a unique number for each row, starting at 1. When there are ties (for example, Bob vs Carrie), ROW_NUMBER() assigns the numbers arbitrarily if no second ordering criterion is defined.

RANK() also returns a number for each row, starting at 1, except that tied rows receive the same number. A gap then follows the duplicated rank.

DENSE_RANK() is similar to RANK(), except that there are no gaps after duplicated ranks. Note that with DENSE_RANK(), Daniel is ranked 3rd rather than 4th, as he is with RANK().
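Here is a small runnable comparison of the three functions, using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The four students and their GPAs are invented and are not the data behind the original results. I also added name as a tie-breaker for ROW_NUMBER() so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_grades (Name TEXT, GPA REAL);
    -- Hypothetical rows: Ben and Cal are tied on GPA.
    INSERT INTO student_grades VALUES
        ('Ava', 3.9), ('Ben', 3.7), ('Cal', 3.7), ('Dan', 3.5);
""")

rank_query = """
    SELECT Name
         , ROW_NUMBER() OVER (ORDER BY GPA DESC, Name) AS row_num
         , RANK()       OVER (ORDER BY GPA DESC)       AS rnk
         , DENSE_RANK() OVER (ORDER BY GPA DESC)       AS dense_rnk
    FROM student_grades
    ORDER BY GPA DESC, Name
"""
rank_rows = conn.execute(rank_query).fetchall()
for row in rank_rows:
    print(row)
```

Dan, who follows the tie, gets row number 4 and rank 4 but dense rank 3, which is the same pattern described for Daniel above.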

8. Calculating delta values

Another common application is comparing values from different periods. For example, what was the delta between this month's and last month's sales? Or between this month and the same month last year?

When you compare values from different periods to calculate deltas, this is where LEAD() and LAG() come into play.

Here are some examples :

-- Comparing each month's sales to last month
SELECT month
       , sales
       , sales - LAG(sales, 1) OVER (ORDER BY month)
FROM monthly_sales;

-- Comparing each month's sales to the same month last year
SELECT month
       , sales
       , sales - LAG(sales, 12) OVER (ORDER BY month)
FROM monthly_sales;
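To check the month-over-month version on real rows, here is a minimal sketch on SQLite (3.25 or newer) from Python. The three months of sales are made up, and months are stored as 'YYYY-MM' text so that ORDER BY month sorts them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (month TEXT, sales INTEGER);
    INSERT INTO monthly_sales VALUES
        ('2022-01', 100), ('2022-02', 130), ('2022-03', 120);
""")

delta_query = """
    SELECT month
         , sales
         , sales - LAG(sales, 1) OVER (ORDER BY month) AS delta
    FROM monthly_sales
    ORDER BY month
"""
delta_rows = conn.execute(delta_query).fetchall()
for row in delta_rows:
    print(row)
```

The first month has no previous row, so LAG() returns NULL and the delta is NULL as well.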

9. Calculating running totals

If you know about ROW_NUMBER() and LAG()/LEAD(), this one probably won't surprise you. But if you don't, it is probably one of the most useful window functions, especially when you want to visualize growth!

Using a window function with SUM(), we can calculate a running total. See the example below:

SELECT Month    
       , Revenue    
       , SUM(Revenue) OVER (ORDER BY Month) AS Cumulative    
FROM monthly_revenue
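The same query can be tried on a few invented rows, again with SQLite (3.25 or newer) from Python. With an ORDER BY in the window and no explicit frame, SUM() adds up everything from the first row through the current one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_revenue (Month TEXT, Revenue INTEGER);
    -- Hypothetical figures for illustration only.
    INSERT INTO monthly_revenue VALUES
        ('2022-01', 10), ('2022-02', 20), ('2022-03', 5);
""")

running_query = """
    SELECT Month
         , Revenue
         , SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
    FROM monthly_revenue
    ORDER BY Month
"""
running_rows = conn.execute(running_query).fetchall()
for row in running_rows:
    print(row)
```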

10. Date-time manipulation

You should definitely expect some SQL questions that involve date-time data. For example, you may need to group data by month or convert a value from DD-MM-YYYY format to just the month.

Some functions you should know are:

  • EXTRACT

  • DATEDIFF

  • DATE_ADD, DATE_SUB

  • DATE_TRUNC

Example question: Given the Weather table below, write a SQL query to find the Ids of all dates with a higher temperature than the previous date (yesterday).

+---------+------------------+------------------+
| Id(INT) | RecordDate(DATE) | Temperature(INT) |
+---------+------------------+------------------+
|       1 |       2015-01-01 |               10 |
|       2 |       2015-01-02 |               25 |
|       3 |       2015-01-03 |               20 |
|       4 |       2015-01-04 |               30 |
+---------+------------------+------------------+

Answer:
SELECT
    a.Id
FROM
    Weather a,
    Weather b
WHERE
    a.Temperature > b.Temperature
    AND DATEDIFF(a.RecordDate, b.RecordDate) = 1
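The two-argument DATEDIFF used in the answer is MySQL syntax. If you want to run the example locally, here is a sketch on SQLite from Python. SQLite has no DATEDIFF, so it swaps in julianday(), which converts a date to a day count, to express "exactly one day apart". That substitution is mine, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Weather (Id INTEGER, RecordDate TEXT, Temperature INTEGER);
    INSERT INTO Weather VALUES
        (1, '2015-01-01', 10),
        (2, '2015-01-02', 25),
        (3, '2015-01-03', 20),
        (4, '2015-01-04', 30);
""")

weather_query = """
    SELECT a.Id
    FROM Weather a
    JOIN Weather b
      -- a is exactly one day after b
      ON julianday(a.RecordDate) - julianday(b.RecordDate) = 1
    WHERE a.Temperature > b.Temperature
    ORDER BY a.Id
"""
warmer_ids = [row[0] for row in conn.execute(weather_query)]
print(warmer_ids)  # Ids 2 and 4 are warmer than the day before
```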

That's it! I hope this helps you in your interview preparation. I am sure that if you know these 10 concepts inside out, you will do well on most SQL questions out there.

As always, I wish you the best in your studies!

(This article is translated from Dimitris Poulopoulos's article "Ten Advanced SQL Concepts You Should Know for Data Science Interviews". Please credit the source when reprinting. Link to the original text:
https://towardsdatascience.com/ten-advanced-sql-concepts-you-should-know-for-data-science-interviews-4d7015ec74b0)

Original site

Copyright notice
This article was created by [Java notes shrimp]. Please include the original link when reprinting. Thank you.
https://yzsam.com/2022/162/202206110718146534.html