当前位置:网站首页>Azure Synapse Analytics 性能优化指南(3)——使用具体化视图优化性能(下)
Azure Synapse Analytics 性能优化指南(3)——使用具体化视图优化性能(下)
2022-07-26 22:35:00 【zyypjc】
目录
(一)前言
承接Azure Synapse Analytics 性能优化指南(2)一文,本文我们将继续探讨Azure Synapse Analytics中使用具体化视图优化性能。
(二)设计指南
下面是关于使用具体化视图提高查询性能的一般指南:
1. 针对工作负载设计
在开始创建具体化视图之前,必须从查询模式、重要性、频率和生成的数据大小等方面深入了解工作负载。
用户可以针对查询优化器建议的具体化视图运行 EXPLAIN WITH_RECOMMENDATIONS <SQL_statement>。 由于这些建议特定于查询,因此对一个查询有利的具体化视图可能并不适合同一工作负载中的其他查询。
评估这些建议时应考虑工作负载需要。 理想的具体化视图是那些有利于工作负载性能的视图。
2. 注意快速查询和成本之间的权衡
每个具体化视图都有相应的数据存储成本和视图维护成本。 当基表中的数据更改时,具体化视图的大小会增加,其物理结构也会改变。 为了避免查询性能下降,每个具体化视图会由 SQL 引擎单独维护。
当具体化视图和基表更改的数量增加时,维护工作负载会增加。 用户应该检查查询性能的提高是否可以弥补所有具体化视图所产生的成本。
可运行以下查询来生成专用 SQL 池中的具体化视图列表:
SELECT V.name as materialized_view, V.object_id
FROM sys.views V
JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2;减少具体化视图数量的选项:
确定工作负载中的复杂查询经常使用的常见数据集。 创建具体化视图来存储这些数据集,以便优化器可以在创建执行计划时将它们用作构建基块。
删除使用率低或不再需要的具体化视图。 禁用的具体化视图不会得到维护,但仍会产生存储成本。
合并在相同或相似基表上创建的具体化视图,即使它们的数据没有重叠。 合并具体化视图可能导致视图的大小大于单独视图的总和,但是视图维护成本应该会降低。 例如:
-- Query 1 would benefit from having a materialized view created with this SELECT statement
SELECT A, SUM(B)
FROM T
GROUP BY A
-- Query 2 would benefit from having a materialized view created with this SELECT statement
SELECT C, SUM(D)
FROM T
GROUP BY C
-- You could create a single mateiralized view of this form
SELECT A, C, SUM(B), SUM(D)
FROM T
GROUP BY A, C3. 并非所有性能优化都需要更改查询
SQL 查询优化器可以自动使用部署的具体化视图来改善查询性能。 此支持透明地应用于不引用视图的查询和使用具体化视图创建中不支持的聚合的查询。 无需更改任何查询。 可以检查查询的预估执行计划,确认是否使用了具体化视图。
4. 监视具体化视图
与使用聚集列存储索引 (CCI) 的表一样,具体化视图存储在专用 SQL 池中。 从具体化视图读取数据包括扫描 CCI 索引段和应用基表中的任何增量更改。 当增量更改的数量太多时,从具体化视图解析查询可能比直接查询基表花费更长的时间。
为了避免查询性能下降,最佳做法是运行DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD 来监视视图的 overhead_ratio (total_rows / max(1, base_view_row))。 如果具体化视图的 overhead_ratio 过高,则用户应重新生成该视图。
5. 具体化视图和结果集缓存
专用 SQL 池中的这两项功能用于优化查询性能。 结果集缓存用于获得高并发性,使针对静态数据的重复查询快速返回响应。
为使用缓存结果,请求缓存的查询的形式必须与生成缓存的查询匹配。 此外,缓存的结果必须应用于整个查询。
具体化视图允许基表中的数据更改。 具体化视图中的数据可以应用于查询的一部分。 借助这一支持,使用相同的某些计算的不同查询可使用相同的具体化视图,以获得更快的性能。
(三)示例
本例使用类似 TPCDS 的查询,查找通过目录支出比在商店花费更多资金的客户,确定首选客户及其原产国家/地区。 查询包括从涉及 SUM() 和 GROUP BY 的三个子 SELECT 语句的并集中选择前 100 条记录。
WITH year_total AS (
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2) year_total
,'s' sale_type
FROM customer
,store_sales
,date_dim
WHERE c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
UNION ALL
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2) year_total
,'c' sale_type
FROM customer
,catalog_sales
,date_dim
WHERE c_customer_sk = cs_bill_customer_sk
AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
UNION ALL
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2) year_total
,'w' sale_type
FROM customer
,web_sales
,date_dim
WHERE c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
)
SELECT TOP 100
t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_birth_country
FROM year_total t_s_firstyear
,year_total t_s_secyear
,year_total t_c_firstyear
,year_total t_c_secyear
,year_total t_w_firstyear
,year_total t_w_secyear
WHERE t_s_secyear.customer_id = t_s_firstyear.customer_id
AND t_s_firstyear.customer_id = t_c_secyear.customer_id
AND t_s_firstyear.customer_id = t_c_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_secyear.customer_id
AND t_s_firstyear.sale_type = 's'
AND t_c_firstyear.sale_type = 'c'
AND t_w_firstyear.sale_type = 'w'
AND t_s_secyear.sale_type = 's'
AND t_c_secyear.sale_type = 'c'
AND t_w_secyear.sale_type = 'w'
AND t_s_firstyear.dyear+0 = 1999
AND t_s_secyear.dyear+0 = 1999+1
AND t_c_firstyear.dyear+0 = 1999
AND t_c_secyear.dyear+0 = 1999+1
AND t_w_firstyear.dyear+0 = 1999
AND t_w_secyear.dyear+0 = 1999+1
AND t_s_firstyear.year_total > 0
AND t_c_firstyear.year_total > 0
AND t_w_firstyear.year_total > 0
AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL END
> CASE WHEN t_s_firstyear.year_total > 0 THEN t_s_secyear.year_total / t_s_firstyear.year_total ELSE NULL END
AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL END
> CASE WHEN t_w_firstyear.year_total > 0 THEN t_w_secyear.year_total / t_w_firstyear.year_total ELSE NULL END
ORDER BY t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_birth_country
OPTION ( LABEL = 'Query04-af359846-253-3');
检查查询的预估执行计划。 共有 18 个随机排布和 17 个联接操作,执行这些操作需要更多的时间。 现在让我们为三个子 SELECT 语句分别创建一个具体化视图。
CREATE materialized view nbViewSS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2) year_total
, count_big(*) AS cb
FROM dbo.customer
,dbo.store_sales
,dbo.date_dim
WHERE c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
GO
CREATE materialized view nbViewCS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2) year_total
, count_big(*) as cb
FROM dbo.customer
,dbo.catalog_sales
,dbo.date_dim
WHERE c_customer_sk = cs_bill_customer_sk
AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
GO
CREATE materialized view nbViewWS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2) year_total
, count_big(*) AS cb
FROM dbo.customer
,dbo.web_sales
,dbo.date_dim
WHERE c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year再次检查原始查询的执行计划。 现在,联接数从 17 个更改为 5 个,并且没有随机排布。 选择计划中的“筛选操作”图标,其“输出列表”将显示已从具体化视图而不是基表读取了数据。

使用具体化视图时,同一查询的运行速度更快,且无需进行任何代码更改。
边栏推荐
- Silicon Valley class lesson 5 - Tencent cloud object storage and course classification management
- 第二部分—C语言提高篇_11. 预处理
- Basic select statement
- Part II - C language improvement_ 9. Linked list
- The interviewer asked: this point of JS
- New employees of black maredge takeout
- 30、 Modern storage system (management database and distributed storage system)
- Public cloud security and compliance considerations
- 8 other programming languages -- Recording
- Re understand the life world and ourselves
猜你喜欢

np. transpose & np.expand_ dims

Pyqt5 how to set pushbutton click event to obtain file address
![[MySQL] CentOS 7.9 installation and use mysql-5.7.39 binary version](/img/70/5638080a2d2eabf6ae1f2a8db3abe6.png)
[MySQL] CentOS 7.9 installation and use mysql-5.7.39 binary version

SQL Basics

np.transpose & np.expand_dims

Science | 华盛顿大学利用AI和结构预测设计全新蛋白质

数据供应链的转型 协调一致走向成功的三大有效策略

Esmfold: a new breakthrough in protein structure prediction after alphafold2

04-传统的Synchronized锁

第二部分—C语言提高篇_6. 多维数组
随机推荐
Several inventory terms often used in communication
2. Realize the map of navigation bar and battle page
Three effective strategies for the transformation of data supply chain to be coordinated and successful
The interviewer asked: this point of JS
The nature and proof of the center of gravity of [mathematics] tree
Part II - C language improvement_ 5. Bit operation
Basic select statement
HCIA-R&S自用笔记(20)VLAN综合实验、GVRP
证券公司哪家佣金最低?网上开户安全吗
Esmfold: a new breakthrough in protein structure prediction after alphafold2
[interview: concurrent Article 27: multithreading: hesitation mode]
Silicon Valley class lesson 7 - Tencent cloud on demand management module (2)
Download win10 system image and create virtual machine on VMware virtual machine
第二部分—C语言提高篇_5. 位运算
告别宽表,用 DQL 成就新一代 BI
第二部分—C语言提高篇_12. 动/精态库的封装和使用
HCIA-R&S自用笔记(18)园区网架构基础、交换机工作原理、VLAN原理
Kingbasees SQL language reference manual of Jincang database (3.1.1.3. currency type)
会议OA之我的会议
Basic operations of objects