ClickHouse materialized views

2022-07-05 03:45:00 【Younger Cheng】

I. Concept

A ClickHouse materialized view persists the result of a query, which improves query performance. For the user, querying a materialized view is no different from querying a table: in essence it is a table whose contents are always precomputed. It is created with a dedicated statement:

CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db.]table_name [TO [db.]name] [ENGINE = engine] [POPULATE] AS SELECT ...

Note: this statement creates a target table that stores the view's data. With TO [db.]name, the data is saved to the explicitly named table. Without TO, a hidden table is created, named .inner.<materialized view name> by default (in databases that use the Atomic engine, the hidden table is named .inner_id.<UUID> instead).
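The TO form can be sketched as follows. This is a minimal illustration, not taken from the article: the table and view names (events_src, events_daily, events_daily_mv) and the column total_income are hypothetical.

```sql
-- Hypothetical source table (names are illustrative only).
CREATE TABLE events_src
(
    id UInt32,
    createDate Date,
    income UInt64
)
ENGINE = MergeTree()
ORDER BY (id, createDate);

-- Explicit target table that will hold the view's data.
CREATE TABLE events_daily
(
    id UInt32,
    createDate Date,
    total_income UInt64
)
ENGINE = SummingMergeTree()
ORDER BY (id, createDate);

-- With TO, the view has no ENGINE clause of its own. Rows produced by the
-- SELECT are written to events_daily on every insert into events_src.
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily
AS
SELECT id, createDate, sum(income) AS total_income
FROM events_src
GROUP BY id, createDate;
```

Because the target table is an ordinary table, it can be queried, altered, or backfilled directly, independently of the view definition.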

The "query result set" is a broad notion: it can be a simple copy of some data from the base table, the result (or a subset) of a multi-table join, or aggregated metrics computed from the raw data. In ClickHouse the view is updated only when rows are inserted into the source table (for a join, the leftmost table in the SELECT). Other changes to the underlying table, such as updates and deletes, are not propagated to the view, which is why it is sometimes also called a snapshot.

Restrictions on creating materialized views:

1. An ENGINE must be specified for the materialized view so that its data can be stored (unless the TO clause is used, in which case the target table's engine applies).

2. POPULATE cannot be used together with TO [db.]name. POPULATE loads and transforms all historical data, so with a large amount of data the view is unavailable for some time. The official documentation does not recommend POPULATE, because data written to the source table while the view is being created is not inserted into the view.

3. The SELECT query may contain clauses such as DISTINCT, GROUP BY and LIMIT. Note that they are applied to each block of inserted data separately, not to the table as a whole.

4. If the view is defined with TO [db.]name, you can DETACH the view, run ALTER on the target table, and then ATTACH the view again.
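One common way to work within restriction 2 is to create the view with TO and backfill the historical rows by hand. The sketch below is an assumption-laden illustration: the names orders_src, orders_daily and orders_daily_mv are hypothetical, and the cutoff date is an arbitrary example value that assumes createDate roughly tracks insertion time.

```sql
-- Hypothetical source and target tables.
CREATE TABLE orders_src
(
    id UInt32,
    createDate Date,
    money UInt64
)
ENGINE = MergeTree()
ORDER BY (id, createDate);

CREATE TABLE orders_daily
(
    id UInt32,
    createDate Date,
    total UInt64
)
ENGINE = SummingMergeTree()
ORDER BY (id, createDate);

-- 1. Create the view first; it only accepts rows on or after the cutoff.
CREATE MATERIALIZED VIEW orders_daily_mv TO orders_daily
AS
SELECT id, createDate, sum(money) AS total
FROM orders_src
WHERE createDate >= toDate('2022-07-01')
GROUP BY id, createDate;

-- 2. Backfill the older rows manually, using the complementary filter.
INSERT INTO orders_daily
SELECT id, createDate, sum(money) AS total
FROM orders_src
WHERE createDate < toDate('2022-07-01')
GROUP BY id, createDate;
```

The two filters are complementary, so each source row is counted once: new inserts flow through the view, and older rows arrive through the manual INSERT ... SELECT.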

The difference between a materialized view and a normal view:

A normal view does not store data, only the query. When you query it, the data is still read from the original table, so you can think of a normal view as a saved subquery.

A materialized view stores the query result on disk or in memory, depending on its engine. The data is reorganized and a new table is generated.
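The contrast can be sketched with one source table and both kinds of view. The names page_hits, page_hits_v and page_hits_mv are hypothetical and used only for illustration.

```sql
-- Hypothetical source table.
CREATE TABLE page_hits
(
    url String,
    hits UInt64
)
ENGINE = MergeTree()
ORDER BY url;

-- Normal view: stores only the query; every SELECT re-reads page_hits.
CREATE VIEW page_hits_v AS
SELECT url, sum(hits) AS total_hits
FROM page_hits
GROUP BY url;

-- Materialized view: stores rows in its own SummingMergeTree table,
-- filled block by block as data is inserted into page_hits.
CREATE MATERIALIZED VIEW page_hits_mv
ENGINE = SummingMergeTree()
ORDER BY url
AS
SELECT url, sum(hits) AS total_hits
FROM page_hits
GROUP BY url;

INSERT INTO page_hits VALUES ('/a', 1), ('/a', 2), ('/b', 5);

-- Computed from the source table at query time.
SELECT * FROM page_hits_v ORDER BY url;

-- Read from the stored result. Partial sums from different inserts may not
-- be merged yet, so aggregate again on read.
SELECT url, sum(total_hits) AS total_hits
FROM page_hits_mv
GROUP BY url
ORDER BY url;
```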

Advantages and disadvantages:

Advantages: queries are fast. Because the view's rules are written in advance and the view is kept in sync as data is written, querying it is much faster than querying the raw data directly.

Disadvantages: materialized views are essentially designed for streaming data and work by accumulation, so they are a poor fit for analyses that need the full historical data, such as deduplication. Their use cases are limited. In addition, if many materialized views are attached to one table, every write to that table consumes a lot of resources: bandwidth can become saturated and storage can grow sharply.

-- Create the source table
CREATE TABLE ts_area_info (
  id UInt32,
  createDate Date,
  userId UInt32,
  url String,
  income UInt8
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(createDate)
ORDER BY (id, createDate, intHash32(userId))
SAMPLE BY intHash32(userId)
SETTINGS index_granularity = 8192;

-- Create the materialized view (no TO clause, so an ENGINE is required)
CREATE MATERIALIZED VIEW area_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(createDate)
ORDER BY (id, createDate, intHash32(userId))
AS
SELECT * FROM ts_area_info;

After this statement runs, the newly created materialized view appears in the database's table list.

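Keywords:

POPULATE: load the data that already exists in the source table into the view when the view is created.

FINAL: a SELECT modifier that merges rows at query time, so the query returns the latest, fully merged data.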
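The effect of FINAL can be sketched on a small SummingMergeTree table. The table name income_sum is hypothetical; the two inserts are kept separate on purpose so that they create two data parts.

```sql
-- Hypothetical table used only to illustrate FINAL.
CREATE TABLE income_sum
(
    id UInt32,
    income UInt64
)
ENGINE = SummingMergeTree()
ORDER BY id;

-- Two separate inserts create two data parts with the same sorting key.
INSERT INTO income_sum VALUES (1, 100);
INSERT INTO income_sum VALUES (1, 200);

-- Without FINAL this may return two rows until a background merge runs.
SELECT * FROM income_sum;

-- FINAL merges at query time and returns a single row: (1, 300).
SELECT * FROM income_sum FINAL;

-- Alternative that does not rely on FINAL: aggregate explicitly.
SELECT id, sum(income) AS income FROM income_sum GROUP BY id;
```

FINAL makes the query do merge work at read time, so on large tables the explicit GROUP BY form is often the cheaper choice.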

II. Materialized views as aggregate tables

The view aggregates the data as it is synchronized from the source table.

-- Use a materialized view to keep an aggregate table in sync

-- 1. Create the detail (source) table
DROP TABLE IF EXISTS tb_order;
CREATE TABLE tb_order (
  id UInt8,
  createDate Date,
  money UInt64
)
ENGINE = MergeTree()
ORDER BY id;

-- 2. Insert data
INSERT INTO tb_order VALUES
  (1, toDate(now()), 100),
  (2, toDate(now()), 100),
  (3, toDate(now()), 100),
  (1, toDate(now()), 100),
  (2, toDate(now()), 200),
  (3, toDate(now()), 300);

-- 3. Create the materialized view; POPULATE loads the rows inserted above,
--    and sumState stores an intermediate aggregation state rather than a final sum
CREATE MATERIALIZED VIEW order_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(createDate)
ORDER BY (id, createDate)
POPULATE AS
SELECT id, createDate, sumState(money) AS ms
FROM tb_order
GROUP BY id, createDate;

-- 4. Query the materialized view; sumMerge combines the stored states
SELECT id, createDate, sumMerge(ms) FROM order_mv GROUP BY id, createDate;

-- 5. Insert more rows and check that the view is synchronized
INSERT INTO tb_order VALUES (1, toDate(now()), 100), (2, toDate(now()), 100);

INSERT INTO tb_order VALUES (1, toDate('2022-06-29'), 100), (2, toDate('2022-06-29'), 100);

-- 6. Query the order table
SELECT * FROM tb_order;
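To check that the view stayed in sync after step 5, the aggregate computed directly from the source table can be compared with the one read from the view. This is a minimal sketch that reuses tb_order and order_mv from the steps above; the alias total and the ORDER BY clauses are additions made only so the two results are easy to compare.

```sql
-- Aggregate computed from the raw detail rows.
SELECT id, createDate, sum(money) AS total
FROM tb_order
GROUP BY id, createDate
ORDER BY id, createDate;

-- The same aggregate read from the materialized view's stored states.
SELECT id, createDate, sumMerge(ms) AS total
FROM order_mv
GROUP BY id, createDate
ORDER BY id, createDate;
```

As long as rows have only been inserted into tb_order (no updates or deletes), the two queries should return the same rows.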

Note:

If no primary key is specified when the table is created, the columns in ORDER BY are used as the primary key by default.

Copyright notice
This article was written by [Younger Cheng]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/186/202207050309085225.html