ClickHouse Learning (VII): Table Query Optimization
2022-07-29 05:34:00 【Crying dogs in the sun】
Single table
prewhere
prewhere serves the same purpose as where: it filters data. The difference is that prewhere first reads only the columns named in the filter condition and evaluates the filter, then reads the remaining columns listed in select only for the data that passed the filter. This reduces I/O.
explain syntax select WatchID,
JavaEnable,
Title,
GoodEvent,
EventTime,
EventDate,
CounterID,
ClientIP,
ClientIP6,
RegionID,
UserID,
CounterClass,
OS,
UserAgent,
URL,
Referer,
URLDomain,
RefererDomain,
Refresh,
IsRobot,
RefererCategories,
URLCategories,
URLRegions,
RefererRegions,
ResolutionWidth,
ResolutionHeight,
ResolutionDepth,
FlashMajor,
FlashMinor,
FlashMinor2
from datasets.hits_v1 where UserID='3198390223272470366';
The automatic rewrite of where into prewhere is enabled by default. To disable it: set optimize_move_to_prewhere=0;
In some scenarios, however, the rewrite does not happen automatically and prewhere has to be specified by hand, so writing prewhere directly in place of where is the simple and explicit choice.
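As a minimal sketch of the manual form, the query below writes the filter as prewhere directly. It reuses the table and UserID value from the example above; the shortened column list is chosen only for illustration.

```sql
-- Read the UserID column first and filter on it, then read WatchID, Title
-- and EventTime only for the data that passed the filter.
SELECT WatchID, Title, EventTime
FROM datasets.hits_v1
PREWHERE UserID = '3198390223272470366';
```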
Data sampling
SELECT Title,count(*) AS PageViews
FROM hits_v1
SAMPLE 0.1
WHERE CounterID =57
GROUP BY Title
ORDER BY PageViews DESC LIMIT 100;
SAMPLE 0.1 reads a 10% sample of the data, so the result is approximate rather than exact. Sampling only works on tables that declare a SAMPLE BY expression when they are created.
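Because a sampled query reads only part of the data, aggregate counts come out proportionally smaller. One common way to compensate, sketched here under the assumption of the same hits_v1 table and a fixed 10% sample, is to multiply the count by the inverse of the sampling ratio. The result is still an estimate, not the true count.

```sql
-- 10% sample, so scale the count by 10 to estimate the full-table value.
SELECT Title, count() * 10 AS PageViews
FROM hits_v1
SAMPLE 0.1
WHERE CounterID = 57
GROUP BY Title
ORDER BY PageViews DESC
LIMIT 100;
```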
Column pruning and partition pruning
Column pruning means selecting only the needed fields instead of *; partition pruning means reading only the needed partitions.
Avoid select * when the data volume is large. The fewer fields a query reads, the less I/O it consumes and the better it performs.
select WatchID,
JavaEnable,
Title,
GoodEvent,
EventTime,
EventDate,
CounterID,
ClientIP,
ClientIP6,
RegionID,
UserID
from datasets.hits_v1;
Partition pruning reads only the needed partitions. To enable it, put a condition on the partition key in the filter.
select WatchID,
JavaEnable,
Title,
GoodEvent,
ClientIP6,
RegionID,
UserID
from datasets.hits_v1
where EventDate='2014-03-23';
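To see which partitions exist before writing such a filter, one option is to query the system.parts table. This is a sketch that assumes the table lives in the datasets database, as in the query above; the aliases part_count and total_rows are illustrative names.

```sql
-- List the active partitions of datasets.hits_v1 with their part and row counts.
SELECT
    partition,
    count()   AS part_count,
    sum(rows) AS total_rows
FROM system.parts
WHERE database = 'datasets' AND table = 'hits_v1' AND active
GROUP BY partition
ORDER BY partition;
```

A filter on EventDate that falls inside a single listed partition lets the query skip all the others.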

Use order by together with where and limit
When running an order by query over a data set of more than ten million rows, combine it with a where condition and a limit clause.
SELECT UserID,Age
FROM hits_v1
PREWHERE CounterID=57
ORDER BY Age DESC LIMIT 1000

Avoid building virtual columns
Where possible, avoid using as to create a new column in the result set.
For example: select a/b as t from test
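Two possible ways around the computed column are sketched below against the test table from the example, assuming a and b are numeric. The materialized column in the second option is an assumption of this sketch, not something the original text prescribes.

```sql
-- Option 1: return the raw columns and compute a/b in the application.
SELECT a, b FROM test;

-- Option 2 (assumes test is a MergeTree-family table): store the quotient as a
-- materialized column, so newly inserted rows have it computed at insert time
-- instead of on every query.
ALTER TABLE test ADD COLUMN t Float64 MATERIALIZED a / b;
SELECT t FROM test;
```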
Use uniqCombined instead of distinct
uniqCombined performs approximate deduplication.
count(distinct ...) uses uniqExact, which performs exact deduplication.
select count(distinct rand()) from hits_v1;

SELECT uniqCombined(rand()) from datasets.hits_v1;
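To judge whether the approximation is acceptable for a given column, the two functions can be run side by side. A small sketch, assuming the same datasets.hits_v1 table and using UserID as the column to deduplicate:

```sql
-- Compare the exact and the approximate distinct counts of UserID.
SELECT
    uniqExact(UserID)    AS exact_users,
    uniqCombined(UserID) AS approx_users
FROM datasets.hits_v1;
```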

Multiple tables
Preparation
Create a small table to avoid running out of memory:
CREATE TABLE visits_v2
ENGINE = CollapsingMergeTree(Sign)
PARTITION BY toYYYYMM(StartDate)
ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192
as select * from visits_v1 limit 10000;
Create a table to store the join results, so that a huge result set is not printed to the console:
CREATE TABLE hits_v2
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192
as select * from hits_v1 where 1=0;
Prefer in over join where possible
insert into hits_v2
select a.* from hits_v1 a where a.CounterID in (select CounterID from visits_v1);

When join must be used
The small table must go on the right. ClickHouse loads the right-hand table into memory and compares it against the left-hand table; whichever join type is used, only the right table is loaded into memory.
With the large table on the right, the query can fail with an error because there is not enough memory.
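A sketch of a join that follows this rule, assuming the tables created in the preparation step live in the current database: the large hits_v1 is on the left and the small visits_v2 is on the right, so only visits_v2 has to fit in memory.

```sql
-- Large table on the left, small table on the right.
INSERT INTO hits_v2
SELECT a.*
FROM hits_v1 AS a
LEFT JOIN visits_v2 AS b ON a.CounterID = b.CounterID;
```

Note that each left-hand row is repeated once per matching right-hand row, so hits_v2 can end up with more rows than hits_v1.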
