SQL (2): joins, window functions, and views
2022-08-05 05:37:00 【share16】
SQL basics module
You can follow share16 on Zhihu or on the WeChat official account, where this article is also kept up to date.
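1. Views and subqueries
1.1 What is a view
A view is a virtual table. Unlike working directly with a data table, a view is defined by a SELECT statement, so querying a view generates a virtual table from that SELECT statement, and the SQL operation then runs against this virtual table.
Why views exist:
- Frequently used SELECT statements can be saved as views, which makes them more efficient to reuse;
- Views can make the data that users see clearer;
- Views avoid exposing every field of the underlying table, which improves data confidentiality;
- Views can reduce data redundancy.
The difference between a view and a table is whether actual data is stored: a table stores data, while a view stores only the SELECT statement that defines it.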
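1.2 View operations
Create a view
CREATE VIEW <view_name> AS
<SELECT statement>
Modify a view
ALTER VIEW <view_name> AS
<SELECT statement>
Drop a view
DROP VIEW <view_name>
A view is ultimately derived from tables, so when the underlying table is updated, the data shown by the view changes as well. Conversely, if the view is changed without a corresponding update to the underlying table, data consistency cannot be guaranteed.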
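The three view statements can be tried end to end with a small example. This is a minimal sketch in MySQL syntax; the table `product`, its columns, and the view `product_summary` are hypothetical names made up for illustration and are not part of the exercise datasets.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE product (
    product_id   INT PRIMARY KEY,
    product_type VARCHAR(32),
    sale_price   INT
);

INSERT INTO product VALUES
    (1, 'clothes', 1000),
    (2, 'office',   500),
    (3, 'clothes', 4000);

-- Save a frequently used aggregation as a view.
CREATE VIEW product_summary AS
SELECT product_type, COUNT(*) AS cnt_product
FROM product
GROUP BY product_type;

-- Query the view like a table; the SELECT above runs behind the scenes.
SELECT * FROM product_summary;

-- A change to the underlying table shows up the next time the view is queried.
INSERT INTO product VALUES (4, 'office', 100);
SELECT * FROM product_summary;

-- Redefine the view, then drop it; the table itself is left untouched.
ALTER VIEW product_summary AS
SELECT product_type, SUM(sale_price) AS total_price
FROM product
GROUP BY product_type;

DROP VIEW product_summary;
```

Because the view stores only its defining SELECT, dropping it removes no data from `product`.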
1.3 What is a subquery
A subquery, also called a nested query, is a query nested inside another query. When a query contains a subquery, the subquery is conceptually evaluated first, and the main query then runs on the result the subquery returns.
- Nested subquery: a subquery can itself contain another nested SELECT statement;
- Scalar subquery: also called a single-value subquery; it returns one specific value, that is, one column of one row;
- Correlated subquery: a subquery that depends on information from the main query, which means the subquery references columns from the tables of the main query (a common case is that the main query and the subquery use the same table);
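The difference between a scalar and a correlated subquery is easier to see side by side. The following is a sketch in MySQL syntax; the table `shop_item` and its rows are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE shop_item (
    item_id    INT PRIMARY KEY,
    item_type  VARCHAR(32),
    sale_price INT
);

INSERT INTO shop_item VALUES
    (1, 'clothes', 1000),
    (2, 'clothes', 4000),
    (3, 'office',   500),
    (4, 'office',   100),
    (5, 'kitchen', 3000);

-- Scalar subquery: the inner query returns a single value,
-- the overall average price (1720), which the outer query compares against.
-- Returns items 2 and 5.
SELECT item_id, sale_price
FROM shop_item
WHERE sale_price > (SELECT AVG(sale_price) FROM shop_item);

-- Correlated subquery: the inner query references i1.item_type from the
-- outer query, so the average is computed per item type.
-- Returns items 2 (clothes avg 2500) and 3 (office avg 300).
SELECT i1.item_id, i1.item_type, i1.sale_price
FROM shop_item AS i1
WHERE i1.sale_price > (SELECT AVG(i2.sale_price)
                       FROM shop_item AS i2
                       WHERE i2.item_type = i1.item_type);
```

Item 5 drops out of the second result because it is the only kitchen item, so its price equals its group average rather than exceeding it.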
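2. Functions
A function is like a black box: you give it an input value, and it returns a value according to its predefined logic. The input values are called parameters.
- Arithmetic functions: besides + - * /, there are absolute value ABS(), remainder MOD(), rounding ROUND(), and so on;
- String functions: concatenation CONCAT(s1,s2,…), length LENGTH(s), replacement REPLACE(s,old,new), substring extraction SUBSTRING(str FROM pos FOR len) (pos is the index, which starts at 1, whereas Python starts at 0), delimiter-based extraction SUBSTRING_INDEX(str,delim,count) (returns the part of str before the count-th occurrence of delim, counting from 1; a negative count counts from the right), and so on;
- Date functions: now(), curdate(), current_date(), current_time(), EXTRACT(unit FROM date) to get the year, hour, etc. of a date, and so on;
- Conversion functions: string to number CAST('123' AS SIGNED), number to string CAST(123 AS CHAR), COALESCE(value1,value2,value3,…), which returns the first non-NULL value from the left, and so on;
- Predicates: LIKE (% matches zero or more characters, _ matches exactly one character), IN, BETWEEN, IS NULL, IS NOT NULL, and so on;
- CASE WHEN expressions: sum(case when…), count(case when…), and so on;
CASE WHEN <condition> THEN <expression>
WHEN <condition> THEN <expression> ...
ELSE <expression> END [AS <column_name>]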
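A few of the functions listed above can be checked directly with literal inputs. The sketch below assumes MySQL; the column aliases and the table `order_line` are hypothetical names chosen for illustration.

```sql
-- Built-in functions applied to literals; expected results are in the comments.
SELECT
    ABS(-5)                              AS abs_value,       -- 5
    MOD(7, 3)                            AS remainder,       -- 1
    ROUND(3.14159, 2)                    AS rounded,         -- 3.14
    CONCAT('SQL', '-', 'join')           AS joined,          -- 'SQL-join'
    REPLACE('2022/08/05', '/', '-')      AS replaced,        -- '2022-08-05'
    SUBSTRING('window' FROM 1 FOR 3)     AS sub,             -- 'win'
    SUBSTRING_INDEX('150:20', ':', 1)    AS before_colon,    -- '150'
    SUBSTRING_INDEX('150:20', ':', -1)   AS after_colon,     -- '20'
    EXTRACT(YEAR FROM DATE '2022-08-05') AS yr,              -- 2022
    CAST('123' AS SIGNED) + 1            AS as_number,       -- 124
    COALESCE(NULL, NULL, 'fallback')     AS first_non_null;  -- 'fallback'

-- Hypothetical table for the CASE WHEN pattern.
CREATE TABLE order_line (
    order_id INT,
    channel  VARCHAR(16),
    amount   INT
);

INSERT INTO order_line VALUES
    (1, 'online', 100),
    (2, 'store',   50),
    (3, 'online',  30);

-- Conditional aggregation: CASE WHEN decides which rows each aggregate sees.
SELECT
    SUM(CASE WHEN channel = 'online' THEN amount ELSE 0 END) AS online_amount, -- 130
    COUNT(CASE WHEN channel = 'store' THEN 1 END)            AS store_orders   -- 1
FROM order_line;
```

COUNT ignores NULL, so a CASE without an ELSE branch counts only the rows that satisfy the condition.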
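3. Combining data (union/join)
Combining data falls roughly into two categories: stacking rows vertically (set operations) and joining columns side by side (joins).
Set operations
- union: takes the union and removes duplicates, whereas union all takes the union without removing duplicates (both queries must return the same number of columns with compatible data types; to sort the result, write a single ORDER BY on the last line);
- intersect: takes the intersection; MySQL did not support it when this article was written (support was added in MySQL 8.0.31);
- except: takes the difference (the first table minus the second, so swapping the order changes the result); MySQL likewise did not support it when this article was written (support was added in MySQL 8.0.31);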
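The set operations can be compared on two tiny tables. This is a sketch in MySQL syntax with hypothetical tables `list_a` and `list_b`; the intersection and difference are written as join-based workarounds so that they also run on MySQL versions without INTERSECT and EXCEPT, and they assume the `id` column contains no NULL values.

```sql
-- Hypothetical tables, used only for illustration.
CREATE TABLE list_a (id INT);
CREATE TABLE list_b (id INT);

INSERT INTO list_a VALUES (1), (2), (3);
INSERT INTO list_b VALUES (2), (3), (4);

-- UNION removes duplicates: 1, 2, 3, 4
SELECT id FROM list_a
UNION
SELECT id FROM list_b
ORDER BY id;

-- UNION ALL keeps duplicates: 1, 2, 2, 3, 3, 4
SELECT id FROM list_a
UNION ALL
SELECT id FROM list_b
ORDER BY id;

-- Intersection emulated with an inner join: 2, 3
SELECT DISTINCT a.id
FROM list_a AS a
INNER JOIN list_b AS b ON a.id = b.id;

-- Difference (list_a minus list_b) emulated with a left join: 1
SELECT DISTINCT a.id
FROM list_a AS a
LEFT JOIN list_b AS b ON a.id = b.id
WHERE b.id IS NULL;
```

Swapping `list_a` and `list_b` in the last query returns 4 instead of 1, which is the order dependence mentioned for except.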
Column joins
- Inner join (join or inner join): only rows that match on both sides are kept, similar to an intersection;
- Left join (left join): all rows of the left table are kept; where the right table has no matching row, null is returned;
- Right join (right join): all rows of the right table are kept; where the left table has no matching row, null is returned;
- Full join (full join): all rows are kept whether or not the two sides match; where there is no match, null is returned;
- Cartesian product (cross join): also called a cross join; for example, if the left table has 3 rows and the right table has 4 rows, the result has 12 rows;
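The row counts produced by each join type can be verified on a small example. The sketch below uses MySQL syntax with hypothetical tables `staff` and `department`. MySQL has no FULL JOIN keyword, so the full join is emulated here as the UNION of a left join and a right join.

```sql
-- Hypothetical tables, used only for illustration.
CREATE TABLE staff (
    staff_id   INT,
    staff_name VARCHAR(16),
    dept_id    INT
);
CREATE TABLE department (
    dept_id   INT,
    dept_name VARCHAR(16)
);

INSERT INTO staff VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cid', 10);
INSERT INTO department VALUES (10, 'sales'), (30, 'hr');

-- Inner join: 2 rows (Ann and Cid, both in sales).
SELECT s.staff_name, d.dept_name
FROM staff AS s
INNER JOIN department AS d ON s.dept_id = d.dept_id;

-- Left join: 3 rows; Bob has no matching department, so dept_name is NULL.
SELECT s.staff_name, d.dept_name
FROM staff AS s
LEFT JOIN department AS d ON s.dept_id = d.dept_id;

-- Right join: 3 rows; hr has no staff, so staff_name is NULL.
SELECT s.staff_name, d.dept_name
FROM staff AS s
RIGHT JOIN department AS d ON s.dept_id = d.dept_id;

-- Full join emulated with UNION: 4 rows (Ann, Bob, Cid, plus the hr row).
SELECT s.staff_name, d.dept_name
FROM staff AS s
LEFT JOIN department AS d ON s.dept_id = d.dept_id
UNION
SELECT s.staff_name, d.dept_name
FROM staff AS s
RIGHT JOIN department AS d ON s.dept_id = d.dept_id;

-- Cross join: 3 x 2 = 6 rows, every staff row paired with every department.
SELECT s.staff_name, d.dept_name
FROM staff AS s
CROSS JOIN department AS d;
```

The UNION in the emulation removes rows that appear in both halves; this also collapses any genuinely duplicated rows, which is acceptable for this data but worth keeping in mind.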
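4. Window functions
Window functions are also called OLAP functions, where OLAP means analyzing and processing database data in real time. A regular SELECT statement queries the whole table, whereas a window function lets us pick a certain part of the data (a window) to summarize, calculate, or rank.
<window function> OVER ([PARTITION BY <column>] ORDER BY <column>)
partition by: groups the rows, similar to a group by clause, except that the rows are not collapsed; order by: sorts the rows, that is, it decides which rule (column) the rows are ordered by;
- Dedicated window functions: rank, dense_rank, row_number, and so on;
- Aggregate functions used as window functions: sum, avg, max, and so on; with order by, the result is a running (cumulative) value;
- Subtotals and grand totals: add WITH ROLLUP after the group by clause;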
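The three ranking functions, a running total, and WITH ROLLUP can be compared on one small table. This is a sketch for MySQL 8.0 or later (window functions are not available in earlier versions); the table `score` and its rows are hypothetical.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE score (
    student VARCHAR(16),
    class   VARCHAR(8),
    points  INT
);

INSERT INTO score VALUES
    ('Ann', 'A', 90),
    ('Bob', 'A', 90),
    ('Cid', 'A', 80),
    ('Dee', 'B', 70),
    ('Eve', 'B', 60);

-- Within class A (points 90, 90, 80):
--   RANK        gives 1, 1, 3  (a gap after the tie)
--   DENSE_RANK  gives 1, 1, 2  (no gap)
--   ROW_NUMBER  gives 1, 2, 3  (ties are numbered in an arbitrary order)
--   running_total gives 180, 180, 260, because with ORDER BY the default
--   window frame includes all rows tied with the current row.
SELECT
    student,
    class,
    points,
    RANK()       OVER (PARTITION BY class ORDER BY points DESC) AS rnk,
    DENSE_RANK() OVER (PARTITION BY class ORDER BY points DESC) AS dense_rnk,
    ROW_NUMBER() OVER (PARTITION BY class ORDER BY points DESC) AS row_num,
    SUM(points)  OVER (PARTITION BY class ORDER BY points DESC) AS running_total
FROM score;

-- WITH ROLLUP appends a grand-total row whose class is NULL:
-- A 260, B 130, NULL 390
SELECT class, SUM(points) AS total_points
FROM score
GROUP BY class WITH ROLLUP;
```

Unlike the GROUP BY query, the window query still returns all five rows, one per student.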
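5. Exercises
Click here to download the dataset used for the exercises.
01. Joins (join)
Use the datasets income_statement, company_operating, and market_data from the A-share listed company quarterly revenue forecast data. With market_data as the main table, combine the information for TICKER_SYMBOL 000545 and 200550 from the three tables (only the following fields need to be displayed).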
select distinct a.TICKER_SYMBOL,a.END_DATE,a.CLOSE_PRICE,
b.INDIC_NAME_EN,b.VALUE,c.T_REVENUE,c.T_COGS,c.N_INCOME
from market_data a
left join company_operating b
on a.TICKER_SYMBOL = b.TICKER_SYMBOL and a.END_DATE = b.END_DATE
left join income_statement c
on a.TICKER_SYMBOL = c.TICKER_SYMBOL and a.END_DATE = c.END_DATE
where a.TICKER_SYMBOL in (000545,200550)
order by a.TICKER_SYMBOL,a.END_DATE
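Using the dataset Macro_Industry from the A-share listed company quarterly revenue forecast data, determine in which month of 2015 the electricity usage peak for 'Depository Securities: Circulation Market Value of Listed Stocks' occurred, and by what percentage it increased or decreased compared with the same period of the previous year.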
select m.*,n.value_2014,round((m.value_2015-n.value_2014)/n.value_2014,2) 'rate'
from (select month(PERIOD_DATE)'month',sum(DATA_VALUE)'value_2015'
from macro_industry
where year(PERIOD_DATE) = 2015
and name_cn = 'Depository Securities: Circulation Market Value of Listed Stocks'
group by month ) m
left join (select month(PERIOD_DATE)'month',sum(DATA_VALUE)'value_2014'
from macro_industry
where year(PERIOD_DATE) = 2014
and name_cn = 'Depository Securities: Circulation Market Value of Listed Stocks'
group by month ) n on m.month = n.month
order by m.value_2015 desc
02. Ranking (rank/dense_rank/row_number)
Using the dataset winequality-red, find all red wines with pH = 3.03, then rank them by citric acid using Chinese-style ranking (the rank after a tie is the next consecutive integer; in other words, there are no gaps between ranks).
select pH,`citric acid`,
dense_rank() over(order by `citric acid`) '排名'
from `winequality-red`
where pH = 3.03
Using the dataset winequality-white, find all white wines with pH = 3.63, then rank them by residual sugar using British-style ranking (non-consecutive ranks).
select pH, `residual sugar`,
rank() over(order by `residual sugar`) '排名'
from `winequality-white`
where pH = 3.63
03. Splitting strings (substring_index)
Using the dataset ccf_offline_stage1_test_revised, find, for July 2016, the merchant that issued the largest total coupon amount and the merchant that issued the most coupons (consider only spend-and-save coupons, not discount coupons; Discount_rate: x in [0,1] is a discount rate, and x:y means y off when spending x).
select Merchant_id,sum(amount),count(Coupon_id)
from (select Merchant_id,Coupon_id,Discount_rate,
SUBSTRING_INDEX(Discount_rate,':',-1) amount,
case when (Discount_rate between 0 and 1) then '折扣' else '满减' end as type
from ccf_offline_stage1_test_revised
where Date_received >= '2016-07-01' and Date_received <= '2016-07-31') k
where k.type = '满减'
group by Merchant_id
order by sum(amount) desc -- The merchant with the most total coupon amount
#order by count(Coupon_id) desc -- The merchant with the most coupons
limit 1
Thank you all.