Azure Synapse Analytics Performance Optimization Guide (1): Optimize Performance with Ordered Clustered Columnstore Indexes

2022-07-26 00:39:00 【zyypjc】

Contents

(1) Preface

(2) Ordered vs. non-ordered clustered columnstore indexes

(3) Query performance

(4) Data loading performance

(5) Reduce segment overlap

(6) Create an ordered CCI on large tables

(7) Practical examples

A. Check the ordered columns and their order ordinals:

B. To change the column ordinal, add or remove columns from the order list, or change from a CCI to an ordered CCI:

(1) Preface

When users query a columnstore table in a dedicated SQL pool, the optimizer checks the minimum and maximum values stored in each segment. Segments that fall outside the bounds of the query predicate are not read from disk into memory. A query can finish faster when the number of segments to read and their total size are small.

(2) Ordered vs. non-ordered clustered columnstore indexes

By default, for each table created without an index option, an internal component (the index builder) creates a non-ordered clustered columnstore index (CCI) on it. The data in each column is compressed into a separate CCI rowgroup segment. Metadata records the value range of each segment, so segments outside the bounds of the query predicate are not read from disk during query execution. A CCI offers the highest level of data compression and reduces the size of the segments to read, so queries can run faster. However, because the index builder does not sort the data before compressing it into segments, segments with overlapping value ranges can occur. Queries then read more segments from disk and take longer to finish.

When an ordered CCI is created, the dedicated SQL pool engine first sorts the existing data in memory by the order key(s), and the index builder then compresses the data into index segments. Sorted data reduces segment overlap, so queries eliminate segments more efficiently and run faster because fewer segments are read from disk. If all the data can be sorted in memory at once, segment overlap can be avoided entirely. Because tables in a data warehouse are large, this does not happen often.

To check the segment ranges of a column, run the following command with the table name and column name:
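As a minimal sketch of what defining an ordered CCI can look like, the statement below creates a table whose columnstore index is ordered on a single column. The table name dbo.FactSales_Ordered and its columns are hypothetical and do not come from the original case; the ORDER column(s) should match the predicates your queries actually use.

```sql
-- Hypothetical table for illustration only; names and types are assumptions.
CREATE TABLE dbo.FactSales_Ordered
(
    SaleDateKey INT            NOT NULL,
    CustomerID  INT            NOT NULL,
    Amount      DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerID),
    -- Each batch of loaded rows is sorted by SaleDateKey before it is
    -- compressed into columnstore segments.
    CLUSTERED COLUMNSTORE INDEX ORDER (SaleDateKey)
);
```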

SELECT o.name, pnp.index_id,
    cls.row_count, pnp.data_compression_desc,
    pnp.pdw_node_id, pnp.distribution_id, cls.segment_id,
    cls.column_id,
    cls.min_data_id, cls.max_data_id,
    cls.max_data_id - cls.min_data_id AS difference
FROM sys.pdw_nodes_partitions AS pnp
    JOIN sys.pdw_nodes_tables AS NTables ON pnp.object_id = NTables.object_id AND pnp.pdw_node_id = NTables.pdw_node_id
    JOIN sys.pdw_table_mappings AS TMap ON NTables.name = TMap.physical_name AND substring(TMap.physical_name, 40, 10) = pnp.distribution_id
    JOIN sys.objects AS o ON TMap.object_id = o.object_id
    JOIN sys.pdw_nodes_column_store_segments AS cls ON pnp.partition_id = cls.partition_id AND pnp.distribution_id = cls.distribution_id
    JOIN sys.columns AS cols ON o.object_id = cols.object_id AND cls.column_id = cols.column_id
WHERE o.name = '<Table Name>' AND cols.name = '<Column Name>' AND TMap.physical_name NOT LIKE '%HdTable%'
ORDER BY o.name, pnp.distribution_id, cls.min_data_id;

Note
In an ordered CCI table, new data produced by the same batch of DML or data loading operations is sorted within that batch; the data in the table as a whole is not globally sorted. Users can REBUILD the ordered CCI to sort all the data in the table. In a dedicated SQL pool, a columnstore index REBUILD is an offline operation. For a partitioned table, the REBUILD is done one partition at a time. Data in the partition being rebuilt is "offline" and unavailable until the REBUILD of that partition completes.
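A rebuild of this kind might look like the sketch below. The table name dbo.MyOrderedTable and the partition number are placeholders chosen for illustration, not values from the original case.

```sql
-- Rebuild the ordered CCI on a (hypothetical) table so that all existing rows
-- are re-sorted. This is an offline operation in a dedicated SQL pool.
ALTER INDEX ALL ON dbo.MyOrderedTable REBUILD;

-- For a partitioned table, a single partition can be rebuilt on its own.
-- Partition number 1 is an assumption.
ALTER INDEX ALL ON dbo.MyOrderedTable REBUILD PARTITION = 1;
```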

(3) Query performance

How much a query gains from an ordered CCI depends on the query patterns, the size of the data, how well the data is sorted, the physical structure of the segments, and the DWU and resource class chosen for query execution. Users should review all of these factors before choosing the ordering columns when designing an ordered CCI table.

Queries that match all of these patterns typically run faster with an ordered CCI:

The queries have equality, inequality, or range predicates.
The predicate columns and the ordered CCI columns are the same.

In this example, table T1 has a clustered columnstore index ordered in the sequence Col_C, Col_B, Col_A.

CREATE CLUSTERED COLUMNSTORE INDEX MyOrderedCCI ON  T1
ORDER (Col_C, Col_B, Col_A);

Queries 1 and 2 reference all of the ordered CCI columns, so they can benefit more from the ordered CCI than the other queries.

-- Query #1: 

SELECT * FROM T1 WHERE Col_C = 'c' AND Col_B = 'b' AND Col_A = 'a';

-- Query #2

SELECT * FROM T1 WHERE Col_B = 'b' AND Col_C = 'c' AND Col_A = 'a';

-- Query #3
SELECT * FROM T1 WHERE Col_B = 'b' AND Col_A = 'a';

-- Query #4
SELECT * FROM T1 WHERE Col_A = 'a' AND Col_C = 'c';
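The four queries above all use equality predicates. As a small sketch of the range-predicate pattern mentioned earlier, a query on the same table T1 might filter an ordered column by range; the literal values are placeholders.

```sql
-- Range predicate on an ordered CCI column of T1 (literal values are placeholders).
-- Segments whose min/max range lies entirely outside ['c', 'd') can be skipped.
SELECT * FROM T1 WHERE Col_C >= 'c' AND Col_C < 'd';
```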

(4) Data loading performance

The performance of loading data into an ordered CCI table is similar to that of loading into a partitioned table. Because the data has to be sorted, loading into an ordered CCI table can take longer than loading into a non-ordered CCI table, but queries can run faster afterwards with the ordered CCI.

[Figure: performance comparison of loading data into tables with different schemas]

[Figure: query performance comparison between CCI and ordered CCI]

(5) Reduce segment overlap

The number of overlapping segments depends on the size of the data to sort, the available memory, and the maximum degree of parallelism (MAXDOP) setting during ordered CCI creation. The following options reduce segment overlap when creating an ordered CCI.

Use the xlargerc resource class on a higher DWU, so that more memory is available for sorting the data before the index builder compresses it into segments. Once data is in an index segment, its physical location cannot change. There is no data sorting within a segment or across segments.
Create the ordered CCI with MAXDOP = 1. Each thread used for ordered CCI creation works on a subset of the data and sorts it locally. There is no global sort across data sorted by different threads. Using parallel threads can shorten the time needed to create an ordered CCI, but it produces more overlapping segments than a single thread does. Currently, the MAXDOP option is supported only when creating an ordered CCI table with the CREATE TABLE AS SELECT command. Creating an ordered CCI with the CREATE INDEX or CREATE TABLE command does not support the MAXDOP option. For example:

CREATE TABLE Table1 WITH (DISTRIBUTION = HASH(c1), CLUSTERED COLUMNSTORE INDEX ORDER(c1) )
AS SELECT * FROM ExampleTable
OPTION (MAXDOP 1);

Pre-sort the data by the sort key(s) before loading it into the table.
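For the xlargerc recommendation above, one way to apply it is to add the user that runs the CTAS to the xlargerc resource class role before the load. This is a sketch only: LoadUser is a hypothetical database user, and whether to move the user back afterwards depends on your workload management setup.

```sql
-- 'LoadUser' is a hypothetical database user that runs the ordered CCI CTAS.
-- Resource classes in a dedicated SQL pool are assigned through role membership.
EXEC sp_addrolemember 'xlargerc', 'LoadUser';

-- After the ordered CCI has been created, the user can be removed from the
-- role so that it falls back to its previous resource class.
EXEC sp_droprolemember 'xlargerc', 'LoadUser';
```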

Here is an example of an ordered CCI table distribution that has zero segment overlap after following the recommendations above. The ordered CCI table was created in a DWU1000c database via CTAS from a 20-GB heap table, using MAXDOP 1 and xlargerc. The CCI is ordered on a BIGINT column.

(6) Create an ordered CCI on large tables

Creating an ordered CCI is an offline operation. For tables with no partitions, the data is not accessible to users until the ordered CCI creation process completes. For partitioned tables, the engine creates the ordered CCI partition by partition, so users can still access the data in partitions where ordered CCI creation is not in progress. You can use this option to minimize downtime while creating an ordered CCI on a large table:

1. Create partitions on the target large table (called Table_A).
2. Create an empty ordered CCI table (called Table_B) with the same table schema and partition schema as Table_A.
3. Switch one partition from Table_A to Table_B.
4. Run ALTER INDEX <Ordered_CCI_Index> ON <Table_B> REBUILD PARTITION = <Partition_ID> on Table_B to rebuild the switched-in partition.
5. Repeat steps 3 and 4 for each partition in Table_A.
6. Once all partitions have been switched from Table_A to Table_B and rebuilt, drop Table_A and rename Table_B to Table_A.
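The steps above can be sketched as follows. Table_A and Table_B are the names used in the steps; the columns, distribution key, partition boundaries, and partition number are assumptions made for illustration, and Table_B must match whatever Table_A actually uses.

```sql
-- Step 2: empty ordered CCI table with the same columns, distribution, and
-- partition boundaries as Table_A (all of these are assumed here).
CREATE TABLE dbo.Table_B
(
    OrderDateKey INT            NOT NULL,
    CustomerID   INT            NOT NULL,
    Amount       DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerID),
    CLUSTERED COLUMNSTORE INDEX ORDER (OrderDateKey),
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20220101, 20220201, 20220301))
);

-- Step 3: switch one partition from Table_A to Table_B.
ALTER TABLE dbo.Table_A SWITCH PARTITION 2 TO dbo.Table_B PARTITION 2;

-- Step 4: rebuild the switched-in partition so its rows are sorted by the order key.
ALTER INDEX ALL ON dbo.Table_B REBUILD PARTITION = 2;

-- Step 5: repeat steps 3 and 4 for each remaining partition.

-- Step 6: once every partition has been switched and rebuilt, swap the tables.
DROP TABLE dbo.Table_A;
RENAME OBJECT dbo.Table_B TO Table_A;
```

Using ALTER INDEX ALL avoids having to look up the name of the index that CREATE TABLE generated; if you know the index name, it can be used instead.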

Tip
For a dedicated SQL pool table with an ordered CCI, ALTER INDEX REBUILD re-sorts the data using tempdb. Monitor tempdb during rebuild operations. If you need more tempdb space, scale up the pool. Scale back down once the index rebuild is complete.
For a dedicated SQL pool table with an ordered CCI, ALTER INDEX REORGANIZE does not re-sort the data. To re-sort the data, use ALTER INDEX REBUILD.
For details on ordered CCI maintenance, see the next article, Azure Synapse Analytics Performance Optimization Guide (2).

(7) Practical examples

A. Check the ordered columns and their order ordinals:

SELECT object_name(c.object_id) table_name, c.name column_name, i.column_store_order_ordinal 
FROM sys.index_columns i 
JOIN sys.columns c ON i.object_id = c.object_id AND c.column_id = i.column_id
WHERE column_store_order_ordinal <>0;

B. To change the column ordinal, add or remove columns from the order list, or change from a CCI to an ordered CCI:

(The table and column names in the SQL below come from my own case; replace them with your own.)

CREATE CLUSTERED COLUMNSTORE INDEX InternetBonus ON dim_zyy_test 
ORDER ([CustomerID],[Monthkey] )
WITH (DROP_EXISTING = ON);

Original site

Copyright notice
This article was written by [zyypjc]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/207/202207260024110870.html
