当前位置:网站首页>Azure Synapse Analytics 性能优化指南(3)——使用具体化视图优化性能(下)
Azure Synapse Analytics 性能优化指南(3)——使用具体化视图优化性能(下)
2022-07-26 22:35:00 【zyypjc】
目录
(一)前言
承接Azure Synapse Analytics 性能优化指南(2)一文,本文我们将继续探讨Azure Synapse Analytics中使用具体化视图优化性能。
(二)设计指南
下面是关于使用具体化视图提高查询性能的一般指南:
1. 针对工作负载设计
在开始创建具体化视图之前,必须从查询模式、重要性、频率和生成的数据大小等方面深入了解工作负载。
用户可以针对查询优化器建议的具体化视图运行 EXPLAIN WITH_RECOMMENDATIONS <SQL_statement>。 由于这些建议特定于查询,因此对一个查询有利的具体化视图可能并不适合同一工作负载中的其他查询。
评估这些建议时应考虑工作负载需要。 理想的具体化视图是那些有利于工作负载性能的视图。
2. 注意快速查询和成本之间的权衡
每个具体化视图都有相应的数据存储成本和视图维护成本。 当基表中的数据更改时,具体化视图的大小会增加,其物理结构也会改变。 为了避免查询性能下降,每个具体化视图会由 SQL 引擎单独维护。
当具体化视图和基表更改的数量增加时,维护工作负载会增加。 用户应该检查查询性能的提高是否可以弥补所有具体化视图所产生的成本。
可运行以下查询来生成专用 SQL 池中的具体化视图列表:
SELECT V.name as materialized_view, V.object_id
FROM sys.views V
JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2;减少具体化视图数量的选项:
确定工作负载中的复杂查询经常使用的常见数据集。 创建具体化视图来存储这些数据集,以便优化器可以在创建执行计划时将它们用作构建基块。
删除使用率低或不再需要的具体化视图。 禁用的具体化视图不会得到维护,但仍会产生存储成本。
合并在相同或相似基表上创建的具体化视图,即使它们的数据没有重叠。 合并具体化视图可能导致视图的大小大于单独视图的总和,但是视图维护成本应该会降低。 例如:
-- Query 1 would benefit from having a materialized view created with this SELECT statement
SELECT A, SUM(B)
FROM T
GROUP BY A
-- Query 2 would benefit from having a materialized view created with this SELECT statement
SELECT C, SUM(D)
FROM T
GROUP BY C
-- You could create a single mateiralized view of this form
SELECT A, C, SUM(B), SUM(D)
FROM T
GROUP BY A, C3. 并非所有性能优化都需要更改查询
SQL 查询优化器可以自动使用部署的具体化视图来改善查询性能。 此支持透明地应用于不引用视图的查询和使用具体化视图创建中不支持的聚合的查询。 无需更改任何查询。 可以检查查询的预估执行计划,确认是否使用了具体化视图。
4. 监视具体化视图
与使用聚集列存储索引 (CCI) 的表一样,具体化视图存储在专用 SQL 池中。 从具体化视图读取数据包括扫描 CCI 索引段和应用基表中的任何增量更改。 当增量更改的数量太多时,从具体化视图解析查询可能比直接查询基表花费更长的时间。
为了避免查询性能下降,最佳做法是运行DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD 来监视视图的 overhead_ratio (total_rows / max(1, base_view_row))。 如果具体化视图的 overhead_ratio 过高,则用户应重新生成该视图。
5. 具体化视图和结果集缓存
专用 SQL 池中的这两项功能用于优化查询性能。 结果集缓存用于获得高并发性,使针对静态数据的重复查询快速返回响应。
为使用缓存结果,请求缓存的查询的形式必须与生成缓存的查询匹配。 此外,缓存的结果必须应用于整个查询。
具体化视图允许基表中的数据更改。 具体化视图中的数据可以应用于查询的一部分。 借助这一支持,使用相同的某些计算的不同查询可使用相同的具体化视图,以获得更快的性能。
(三)示例
本例使用类似 TPCDS 的查询,查找通过目录支出比在商店花费更多资金的客户,确定首选客户及其原产国家/地区。 查询包括从涉及 SUM() 和 GROUP BY 的三个子 SELECT 语句的并集中选择前 100 条记录。
WITH year_total AS (
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2) year_total
,'s' sale_type
FROM customer
,store_sales
,date_dim
WHERE c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
UNION ALL
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2) year_total
,'c' sale_type
FROM customer
,catalog_sales
,date_dim
WHERE c_customer_sk = cs_bill_customer_sk
AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
UNION ALL
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2) year_total
,'w' sale_type
FROM customer
,web_sales
,date_dim
WHERE c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
)
SELECT TOP 100
t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_birth_country
FROM year_total t_s_firstyear
,year_total t_s_secyear
,year_total t_c_firstyear
,year_total t_c_secyear
,year_total t_w_firstyear
,year_total t_w_secyear
WHERE t_s_secyear.customer_id = t_s_firstyear.customer_id
AND t_s_firstyear.customer_id = t_c_secyear.customer_id
AND t_s_firstyear.customer_id = t_c_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_secyear.customer_id
AND t_s_firstyear.sale_type = 's'
AND t_c_firstyear.sale_type = 'c'
AND t_w_firstyear.sale_type = 'w'
AND t_s_secyear.sale_type = 's'
AND t_c_secyear.sale_type = 'c'
AND t_w_secyear.sale_type = 'w'
AND t_s_firstyear.dyear+0 = 1999
AND t_s_secyear.dyear+0 = 1999+1
AND t_c_firstyear.dyear+0 = 1999
AND t_c_secyear.dyear+0 = 1999+1
AND t_w_firstyear.dyear+0 = 1999
AND t_w_secyear.dyear+0 = 1999+1
AND t_s_firstyear.year_total > 0
AND t_c_firstyear.year_total > 0
AND t_w_firstyear.year_total > 0
AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL END
> CASE WHEN t_s_firstyear.year_total > 0 THEN t_s_secyear.year_total / t_s_firstyear.year_total ELSE NULL END
AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL END
> CASE WHEN t_w_firstyear.year_total > 0 THEN t_w_secyear.year_total / t_w_firstyear.year_total ELSE NULL END
ORDER BY t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_birth_country
OPTION ( LABEL = 'Query04-af359846-253-3');
检查查询的预估执行计划。 共有 18 个随机排布和 17 个联接操作,执行这些操作需要更多的时间。 现在让我们为三个子 SELECT 语句分别创建一个具体化视图。
CREATE materialized view nbViewSS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2) year_total
, count_big(*) AS cb
FROM dbo.customer
,dbo.store_sales
,dbo.date_dim
WHERE c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
GO
CREATE materialized view nbViewCS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2) year_total
, count_big(*) as cb
FROM dbo.customer
,dbo.catalog_sales
,dbo.date_dim
WHERE c_customer_sk = cs_bill_customer_sk
AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
GO
CREATE materialized view nbViewWS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2) year_total
, count_big(*) AS cb
FROM dbo.customer
,dbo.web_sales
,dbo.date_dim
WHERE c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year再次检查原始查询的执行计划。 现在,联接数从 17 个更改为 5 个,并且没有随机排布。 选择计划中的“筛选操作”图标,其“输出列表”将显示已从具体化视图而不是基表读取了数据。

使用具体化视图时,同一查询的运行速度更快,且无需进行任何代码更改。
边栏推荐
- [H5 bottom scrolling paging loading]
- Pytorch学习记录(二):张量
- Part II - C language improvement_ 8. File operation
- Silicon Valley class lesson 5 - Tencent cloud object storage and course classification management
- HCIA-R&S自用笔记(21)STP技术背景、STP基础和数据包结构、STP选举规则及案例
- The interviewer asked: this point of JS
- [MySQL] CentOS 7.9 installation and use mysql-5.7.39 binary version
- Dynamic memory management and related topics
- 力扣141题:环形链表
- np. transpose & np.expand_ dims
猜你喜欢

第二部分—C语言提高篇_13. 递归函数

Thousands of tiles' tilt model browsing speeds up, saying goodbye to the embarrassment of jumping out one by one

Three person management of system design

Three effective strategies for the transformation of data supply chain to be coordinated and successful

数据供应链的转型 协调一致走向成功的三大有效策略

An online accident, I suddenly realized the essence of asynchrony
![[MySQL] CentOS 7.9 installation and use mysql-5.7.39 binary version](/img/70/5638080a2d2eabf6ae1f2a8db3abe6.png)
[MySQL] CentOS 7.9 installation and use mysql-5.7.39 binary version

Basic select statement

告别宽表,用 DQL 成就新一代 BI

Hcia-r & s self use notes (18) campus network architecture foundation, switch working principle, VLAN principle
随机推荐
Several inventory terms often used in communication
Three person management of system design
[shader realizes shine effect _shader effect Chapter 3]
What is the reason for oom during redis shake synchronization in shake database?
Signal debugging document developed by car
[interview: concurrent Article 27: multithreading: hesitation mode]
np.transpose & np.expand_dims
JSON formatting gadget -- pyqt5 instance
Part II - C language improvement_ 9. Linked list
Lesson 2 of Silicon Valley classroom - building project environment and developing lecturer management interface
简单的SQL优化
30、 Modern storage system (management database and distributed storage system)
Product principles of non-financial decentralized application
Easily implement seckill system with redis! (including code)
【不积跬步无以至千里】统计日志指定时间段内的关键词
Hcia-r & s self use notes (21) STP technical background, STP foundation and data package structure, STP election rules and cases
【flask高级】结合源码分析flask中的线程隔离机制
第6节:cmake语法介绍
SQL Basics
第二部分—C语言提高篇_11. 预处理