当前位置:网站首页>Azure Synapse Analytics 性能优化指南(3)——使用具体化视图优化性能(下)
Azure Synapse Analytics 性能优化指南(3)——使用具体化视图优化性能(下)
2022-07-26 22:35:00 【zyypjc】
目录
(一)前言
承接Azure Synapse Analytics 性能优化指南(2)一文,本文我们将继续探讨Azure Synapse Analytics中使用具体化视图优化性能。
(二)设计指南
下面是关于使用具体化视图提高查询性能的一般指南:
1. 针对工作负载设计
在开始创建具体化视图之前,必须从查询模式、重要性、频率和生成的数据大小等方面深入了解工作负载。
用户可以针对查询优化器建议的具体化视图运行 EXPLAIN WITH_RECOMMENDATIONS <SQL_statement>。 由于这些建议特定于查询,因此对一个查询有利的具体化视图可能并不适合同一工作负载中的其他查询。
评估这些建议时应考虑工作负载需要。 理想的具体化视图是那些有利于工作负载性能的视图。
2. 注意快速查询和成本之间的权衡
每个具体化视图都有相应的数据存储成本和视图维护成本。 当基表中的数据更改时,具体化视图的大小会增加,其物理结构也会改变。 为了避免查询性能下降,每个具体化视图会由 SQL 引擎单独维护。
当具体化视图和基表更改的数量增加时,维护工作负载会增加。 用户应该检查查询性能的提高是否可以弥补所有具体化视图所产生的成本。
可运行以下查询来生成专用 SQL 池中的具体化视图列表:
SELECT V.name as materialized_view, V.object_id
FROM sys.views V
JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2;减少具体化视图数量的选项:
确定工作负载中的复杂查询经常使用的常见数据集。 创建具体化视图来存储这些数据集,以便优化器可以在创建执行计划时将它们用作构建基块。
删除使用率低或不再需要的具体化视图。 禁用的具体化视图不会得到维护,但仍会产生存储成本。
合并在相同或相似基表上创建的具体化视图,即使它们的数据没有重叠。 合并具体化视图可能导致视图的大小大于单独视图的总和,但是视图维护成本应该会降低。 例如:
-- Query 1 would benefit from having a materialized view created with this SELECT statement
SELECT A, SUM(B)
FROM T
GROUP BY A
-- Query 2 would benefit from having a materialized view created with this SELECT statement
SELECT C, SUM(D)
FROM T
GROUP BY C
-- You could create a single mateiralized view of this form
SELECT A, C, SUM(B), SUM(D)
FROM T
GROUP BY A, C3. 并非所有性能优化都需要更改查询
SQL 查询优化器可以自动使用部署的具体化视图来改善查询性能。 此支持透明地应用于不引用视图的查询和使用具体化视图创建中不支持的聚合的查询。 无需更改任何查询。 可以检查查询的预估执行计划,确认是否使用了具体化视图。
4. 监视具体化视图
与使用聚集列存储索引 (CCI) 的表一样,具体化视图存储在专用 SQL 池中。 从具体化视图读取数据包括扫描 CCI 索引段和应用基表中的任何增量更改。 当增量更改的数量太多时,从具体化视图解析查询可能比直接查询基表花费更长的时间。
为了避免查询性能下降,最佳做法是运行DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD 来监视视图的 overhead_ratio (total_rows / max(1, base_view_row))。 如果具体化视图的 overhead_ratio 过高,则用户应重新生成该视图。
5. 具体化视图和结果集缓存
专用 SQL 池中的这两项功能用于优化查询性能。 结果集缓存用于获得高并发性,使针对静态数据的重复查询快速返回响应。
为使用缓存结果,请求缓存的查询的形式必须与生成缓存的查询匹配。 此外,缓存的结果必须应用于整个查询。
具体化视图允许基表中的数据更改。 具体化视图中的数据可以应用于查询的一部分。 借助这一支持,使用相同的某些计算的不同查询可使用相同的具体化视图,以获得更快的性能。
(三)示例
本例使用类似 TPCDS 的查询,查找通过目录支出比在商店花费更多资金的客户,确定首选客户及其原产国家/地区。 查询包括从涉及 SUM() 和 GROUP BY 的三个子 SELECT 语句的并集中选择前 100 条记录。
WITH year_total AS (
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2) year_total
,'s' sale_type
FROM customer
,store_sales
,date_dim
WHERE c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
UNION ALL
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2) year_total
,'c' sale_type
FROM customer
,catalog_sales
,date_dim
WHERE c_customer_sk = cs_bill_customer_sk
AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
UNION ALL
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2) year_total
,'w' sale_type
FROM customer
,web_sales
,date_dim
WHERE c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
)
SELECT TOP 100
t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_birth_country
FROM year_total t_s_firstyear
,year_total t_s_secyear
,year_total t_c_firstyear
,year_total t_c_secyear
,year_total t_w_firstyear
,year_total t_w_secyear
WHERE t_s_secyear.customer_id = t_s_firstyear.customer_id
AND t_s_firstyear.customer_id = t_c_secyear.customer_id
AND t_s_firstyear.customer_id = t_c_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_secyear.customer_id
AND t_s_firstyear.sale_type = 's'
AND t_c_firstyear.sale_type = 'c'
AND t_w_firstyear.sale_type = 'w'
AND t_s_secyear.sale_type = 's'
AND t_c_secyear.sale_type = 'c'
AND t_w_secyear.sale_type = 'w'
AND t_s_firstyear.dyear+0 = 1999
AND t_s_secyear.dyear+0 = 1999+1
AND t_c_firstyear.dyear+0 = 1999
AND t_c_secyear.dyear+0 = 1999+1
AND t_w_firstyear.dyear+0 = 1999
AND t_w_secyear.dyear+0 = 1999+1
AND t_s_firstyear.year_total > 0
AND t_c_firstyear.year_total > 0
AND t_w_firstyear.year_total > 0
AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL END
> CASE WHEN t_s_firstyear.year_total > 0 THEN t_s_secyear.year_total / t_s_firstyear.year_total ELSE NULL END
AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL END
> CASE WHEN t_w_firstyear.year_total > 0 THEN t_w_secyear.year_total / t_w_firstyear.year_total ELSE NULL END
ORDER BY t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_birth_country
OPTION ( LABEL = 'Query04-af359846-253-3');
检查查询的预估执行计划。 共有 18 个随机排布和 17 个联接操作,执行这些操作需要更多的时间。 现在让我们为三个子 SELECT 语句分别创建一个具体化视图。
CREATE materialized view nbViewSS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2) year_total
, count_big(*) AS cb
FROM dbo.customer
,dbo.store_sales
,dbo.date_dim
WHERE c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
GO
CREATE materialized view nbViewCS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2) year_total
, count_big(*) as cb
FROM dbo.customer
,dbo.catalog_sales
,dbo.date_dim
WHERE c_customer_sk = cs_bill_customer_sk
AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
GO
CREATE materialized view nbViewWS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2) year_total
, count_big(*) AS cb
FROM dbo.customer
,dbo.web_sales
,dbo.date_dim
WHERE c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year再次检查原始查询的执行计划。 现在,联接数从 17 个更改为 5 个,并且没有随机排布。 选择计划中的“筛选操作”图标,其“输出列表”将显示已从具体化视图而不是基表读取了数据。

使用具体化视图时,同一查询的运行速度更快,且无需进行任何代码更改。
边栏推荐
- P5469 [NOI2019] 机器人(拉格朗日插值、区间dp)
- 大疆智图、CC生产了多份数据,如何合并为一份在图新地球进行加载
- Vit:vision transformer super detailed with code
- 简单的SQL优化
- [MySQL] CentOS 7.9 installation and use mysql-5.7.39 binary version
- Problems and solutions encountered in using nextline(), nextint() and next() in scanner
- Basic select statement
- Part II - C language improvement_ 9. Linked list
- MySQL random paging to get non duplicate data
- Application of workflow engine in vivo marketing automation | engine 03
猜你喜欢

HCIA-R&S自用笔记(19)VLAN配置及实验、VLAN间路由

org.yaml.snakeyaml.scanner. ScannerException: mapping values are not allowed here in ‘reader‘, line

HCIA-R&S自用笔记(23)DHCP

Hcia-r & s self use notes (21) STP technical background, STP foundation and data package structure, STP election rules and cases

研究阿尔茨海默病最经典的Nature论文涉嫌造假

第二部分—C语言提高篇_7. 结构体

Dynamic memory management and related topics

Kalibr calibration realsensed435i -- multi camera calibration

Vit:vision transformer super detailed with code

SQL Basics
随机推荐
Product principles of non-financial decentralized application
JSON formatting gadget -- pyqt5 instance
What are the use cases in the Internet of things industry in 2022?
[shader realizes swaying effect _shader effect Chapter 4]
Kt6368a Bluetooth chip development precautions and problem collection - long term update
2022年物联网行业有哪些用例?
Too busy with scientific research to take care of your family? Chen Ting: life cannot have only one fulcrum
第二部分—C语言提高篇_13. 递归函数
Dajiang Zhitu and CC have produced multiple copies of data. How to combine them into one and load them in the new earth map
Pytorch学习记录(二):张量
第二部分—C语言提高篇_12. 动/精态库的封装和使用
会议OA项目排座功能以及送审功能
[H5 bottom scrolling paging loading]
Lesson 2 of Silicon Valley classroom - building project environment and developing lecturer management interface
会议OA之我的会议
Ribbon load balancing
Typescript stage learning
P5469 [noi2019] robot (Lagrange interpolation, interval DP)
Hcia-r & s self use notes (19) VLAN configuration and experiment, routing between VLANs
Security team: Recently, there is an rce vulnerability in the COREMAIL email client of windows, which may lead to the disclosure of the private key of the wallet