Azure Synapse Analytics Performance Optimization Guide (1): Optimizing performance with ordered clustered columnstore indexes
2022-07-26 00:39:00 【zyypjc】
Contents
(1) Preface
(2) Ordered vs. unordered clustered columnstore indexes
(3) Query performance
(4) Data loading performance
(5) Reduce segment overlap
(6) Create an ordered CCI on large tables
(7) Practical cases
A. Check the ordered columns and their order ordinal
(1) Preface
When users query a columnstore table in a dedicated SQL pool, the optimizer checks the minimum and maximum values stored in each segment. Segments that fall outside the bounds of the query predicate are not read from disk into memory. A query can run faster when the number of segments to read and their total size are small.
(2) Ordered vs. unordered clustered columnstore indexes
By default, for every table created without an index option, an internal component (the index builder) creates an unordered clustered columnstore index (CCI) on it. The data in each column is compressed into a separate CCI rowgroup segment. Each segment carries metadata about its value range, so segments that fall outside the bounds of the query predicate are not read from disk during query execution. A CCI offers the highest level of data compression and reduces the size of the segments to read, so queries can run faster. However, because the index builder does not sort the data before compressing it into segments, segments with overlapping value ranges can occur. This causes queries to read more segments from disk and take longer to finish.
When an ordered CCI is created, the dedicated SQL pool engine first sorts the existing data in memory by the order key(s), and then the index builder compresses it into index segments. With sorted data, segment overlap is reduced and queries eliminate segments more effectively, so performance improves because fewer segments are read from disk. If all of the data can be sorted in memory at once, segment overlap can be avoided entirely. Because tables in a data warehouse are large, this does not happen often.
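To make segment elimination concrete, here is a minimal Python sketch of the idea rather than anything the engine actually runs. The helpers `build_segments` and `segments_to_read` are hypothetical names, and the toy segments hold 100 values each, far fewer than a real rowgroup. It compares how many segments a range predicate has to touch when the same data is compressed in arrival order versus sorted first.

```python
import random

def build_segments(values, segment_size):
    """Split values, in the order given, into segments and keep (min, max) per segment."""
    chunks = (values[i:i + segment_size] for i in range(0, len(values), segment_size))
    return [(min(chunk), max(chunk)) for chunk in chunks]

def segments_to_read(segments, low, high):
    """Count segments whose [min, max] range intersects the predicate range [low, high]."""
    return sum(1 for seg_min, seg_max in segments if seg_max >= low and seg_min <= high)

random.seed(0)
arrival_order = list(range(1000))
random.shuffle(arrival_order)  # rows arrive in no particular order

unordered = build_segments(arrival_order, 100)        # models an unordered CCI
ordered = build_segments(sorted(arrival_order), 100)  # models an ordered CCI

# Predicate: WHERE key BETWEEN 250 AND 260
unordered_reads = segments_to_read(unordered, 250, 260)
ordered_reads = segments_to_read(ordered, 250, 260)
print("unordered segments read:", unordered_reads)
print("ordered segments read:", ordered_reads)
```

In the sorted case the value ranges do not overlap, so the predicate falls into a single segment. In the shuffled case most segments span nearly the whole key range and cannot be skipped.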
To check the segment ranges for a column, run the following command with your table name and column name:
SELECT o.name, pnp.index_id,
cls.row_count, pnp.data_compression_desc,
pnp.pdw_node_id, pnp.distribution_id, cls.segment_id,
cls.column_id,
cls.min_data_id, cls.max_data_id,
cls.max_data_id-cls.min_data_id as difference
FROM sys.pdw_nodes_partitions AS pnp
JOIN sys.pdw_nodes_tables AS Ntables ON pnp.object_id = NTables.object_id AND pnp.pdw_node_id = NTables.pdw_node_id
JOIN sys.pdw_table_mappings AS Tmap ON NTables.name = TMap.physical_name AND substring(TMap.physical_name,40, 10) = pnp.distribution_id
JOIN sys.objects AS o ON TMap.object_id = o.object_id
JOIN sys.pdw_nodes_column_store_segments AS cls ON pnp.partition_id = cls.partition_id AND pnp.distribution_id = cls.distribution_id
JOIN sys.columns as cols ON o.object_id = cols.object_id AND cls.column_id = cols.column_id
WHERE o.name = '<Table Name>' and cols.name = '<Column Name>' and TMap.physical_name not like '%HdTable%'
ORDER BY o.name, pnp.distribution_id, cls.min_data_id;
Note
In an ordered CCI table, new data produced by the same batch of DML or data loading operations is sorted within that batch only; there is no global sort across all of the data in the table. Users can REBUILD the ordered CCI to sort all of the data in the table. In a dedicated SQL pool, a columnstore index REBUILD is an offline operation. For a partitioned table, the REBUILD is done one partition at a time. Data in the partition being rebuilt is offline and unavailable until the REBUILD of that partition completes.
(3) Query performance
How much an ordered CCI improves a query depends on the query pattern, the size of the data, how well the data is sorted, the physical structure of the segments, and the DWU and resource class chosen for query execution. Users should review all of these factors before choosing the ordering columns when designing an ordered CCI table.
Queries that match all of the following patterns typically run faster with an ordered CCI:
- The query has equality, inequality, or range predicates.
- The predicate columns and the ordered CCI columns are the same.
In this example, table T1 has a clustered columnstore index ordered in the sequence Col_C, Col_B, Col_A.
CREATE CLUSTERED COLUMNSTORE INDEX MyOrderedCCI ON T1
ORDER (Col_C, Col_B, Col_A);
Because Query #1 and Query #2 reference all of the ordered CCI columns, these two queries benefit more from the ordered CCI than the other queries.
-- Query #1:
SELECT * FROM T1 WHERE Col_C = 'c' AND Col_B = 'b' AND Col_A = 'a';
-- Query #2
SELECT * FROM T1 WHERE Col_B = 'b' AND Col_C = 'c' AND Col_A = 'a';
-- Query #3
SELECT * FROM T1 WHERE Col_B = 'b' AND Col_A = 'a';
-- Query #4
SELECT * FROM T1 WHERE Col_A = 'a' AND Col_C = 'c';
(4) Data loading performance
The performance of loading data into an ordered CCI table is similar to that of loading into a partitioned table. Because of the data sorting operation, loading data into an ordered CCI table can take longer than loading into an unordered CCI table, but queries can run faster afterwards with the ordered CCI.
The following example compares the performance of loading data into tables with different schemas.

The following example compares query performance between a CCI and an ordered CCI.

(5) Reduce segment overlap
The number of overlapping segments depends on the size of the data to sort, the available memory, and the maximum degree of parallelism (MAXDOP) setting used while creating the ordered CCI. The following options reduce segment overlap when creating an ordered CCI.
- Use the xlargerc resource class on a higher DWU so that more memory is available for sorting the data before the index builder compresses it into segments. Once data is in an index segment, its physical location cannot be changed. There is no data sorting within or across segments.
- Create the ordered CCI with MAXDOP = 1. Each thread used to create the ordered CCI works on a subset of the data and sorts it locally. There is no global sort across data sorted by different threads. Using parallel threads can reduce the time needed to create an ordered CCI, but it generates more overlapping segments than a single thread does. Currently, the MAXDOP option is supported only when creating an ordered CCI table with the CREATE TABLE AS SELECT command. Creating an ordered CCI with the CREATE INDEX or CREATE TABLE command does not support the MAXDOP option. For example:
CREATE TABLE Table1 WITH (DISTRIBUTION = HASH(c1), CLUSTERED COLUMNSTORE INDEX ORDER(c1) )
AS SELECT * FROM ExampleTable
OPTION (MAXDOP 1);
- Pre-sort the data by the sort key(s) before loading it into the table.
The following is an example of an ordered CCI table distribution with no segment overlap after applying the recommendations above. The ordered CCI table was created in a DWU1000c database via CTAS from a 20-GB heap table, using MAXDOP 1 and xlargerc. The CCI is ordered on a BIGINT column.

(6) Create an ordered CCI on large tables
Creating an ordered CCI is an offline operation. For tables without partitions, the data is not accessible to users until the ordered CCI creation completes. For partitioned tables, the engine creates the ordered CCI partition by partition, so users can still access the data in partitions where ordered CCI creation is not in progress. You can use the following approach to minimize downtime while creating an ordered CCI on a large table:
1. Create partitions on the target large table (called Table_A).
2. Create an empty ordered CCI table (called Table_B) with the same table schema and partition schema as Table_A.
3. Switch one partition from Table_A to Table_B.
4. Run ALTER INDEX <Ordered_CCI_Index> ON <Table_B> REBUILD PARTITION = <Partition_ID> on Table_B to rebuild the switched-in partition.
5. Repeat steps 3 and 4 for each partition in Table_A.
6. Once all partitions have been switched from Table_A to Table_B and rebuilt, drop Table_A and rename Table_B to Table_A.
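The switch-and-rebuild loop above is repetitive, so it can help to generate the statements and review them before running anything. The following is a minimal Python sketch under stated assumptions: the function name `switch_and_rebuild_statements` is hypothetical, partitions are assumed to be numbered 1 to N and to match one-to-one between the two tables, the index name is a placeholder, and RENAME OBJECT is used as the dedicated SQL pool rename statement. It only builds strings and does not connect to a database.

```python
def switch_and_rebuild_statements(source, target, index_name, partition_count):
    """Build the T-SQL for steps 3-6: switch and rebuild each partition, then swap names."""
    statements = []
    for partition in range(1, partition_count + 1):
        # Step 3: move one partition from the source table into the ordered CCI table.
        statements.append(
            f"ALTER TABLE {source} SWITCH PARTITION {partition} "
            f"TO {target} PARTITION {partition};"
        )
        # Step 4: rebuild only the partition that was just switched in.
        statements.append(
            f"ALTER INDEX {index_name} ON {target} REBUILD PARTITION = {partition};"
        )
    # Step 6: once every partition is moved and rebuilt, replace the original table.
    statements.append(f"DROP TABLE {source};")
    statements.append(f"RENAME OBJECT {target} TO {source};")
    return statements

plan = switch_and_rebuild_statements("Table_A", "Table_B", "Ordered_CCI_Index", 3)
for statement in plan:
    print(statement)
```

Generating the plan first keeps each partition's downtime limited to its own switch and rebuild, and makes it easy to run the statements one pair at a time while monitoring tempdb.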
Tip
For a dedicated SQL pool table with an ordered CCI, ALTER INDEX REBUILD re-sorts the data using tempdb. Monitor tempdb during rebuild operations. If more tempdb space is needed, scale up the pool, and scale back down once the index rebuild is complete.
For a dedicated SQL pool table with an ordered CCI, ALTER INDEX REORGANIZE does not re-sort the data. To re-sort the data, use ALTER INDEX REBUILD.
For details on ordered CCI maintenance, see the next article, Azure Synapse Analytics Performance Optimization Guide (2).
(7) Practical cases
A. Check the ordered columns and their order ordinal:
SELECT object_name(c.object_id) table_name, c.name column_name, i.column_store_order_ordinal
FROM sys.index_columns i
JOIN sys.columns c ON i.object_id = c.object_id AND c.column_id = i.column_id
WHERE column_store_order_ordinal <>0;
B. To change the column ordinal, add or remove columns from the order list, or change from a CCI to an ordered CCI:
(The table and column names in the SQL below come from my actual case; replace them with your own.)
CREATE CLUSTERED COLUMNSTORE INDEX InternetBonus ON dim_zyy_test
ORDER ([CustomerID],[Monthkey] )
WITH (DROP_EXISTING = ON);