ClickHouse learning (VII): table query optimization
2022-07-29 05:34:00 【Crying dogs in the sun】
Single table
prewhere
PREWHERE serves the same purpose as WHERE: it filters data. The difference is that ClickHouse first reads only the columns named in the PREWHERE condition and evaluates the filter on them. Only after filtering does it read the remaining columns declared in SELECT, for the rows that passed, which reduces I/O.
explain syntax select WatchID,
JavaEnable,
Title,
GoodEvent,
EventTime,
EventDate,
CounterID,
ClientIP,
ClientIP6,
RegionID,
UserID,
CounterClass,
OS,
UserAgent,
URL,
Referer,
URLDomain,
RefererDomain,
Refresh,
IsRobot,
RefererCategories,
URLCategories,
URLRegions,
RefererRegions,
ResolutionWidth,
ResolutionHeight,
ResolutionDepth,
FlashMajor,
FlashMinor,
FlashMinor2
from datasets.hits_v1 where UserID='3198390223272470366';
The automatic move of WHERE conditions to PREWHERE is enabled by default unless you turn it off. The command to turn it off: set optimize_move_to_prewhere=0;
In some scenarios, however, the optimization is not applied automatically and you have to specify PREWHERE by hand, so writing PREWHERE directly in place of WHERE is the simple and explicit choice.
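As a minimal sketch of the manual form, the query below writes the filter from the example above as PREWHERE. The shortened column list is chosen only for illustration; the table and the UserID value are the ones used earlier.

```sql
-- Filter on UserID first; WatchID, Title and URL are then read
-- only for the rows that passed the PREWHERE condition.
select WatchID,
       Title,
       URL
from datasets.hits_v1
prewhere UserID='3198390223272470366';
```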
Data sampling
SELECT Title,count(*) AS PageViews
FROM hits_v1
SAMPLE 0.1
WHERE CounterID =57
GROUP BY Title
ORDER BY PageViews DESC LIMIT 100;
SAMPLE 0.1 runs the query on a 10% sample of the data, so the result is approximate, not exact.
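Aggregate values computed over a sample are not scaled automatically, so an approximate total has to be scaled by hand. The sketch below assumes the same hits_v1 table, which must be declared with a SAMPLE BY clause for SAMPLE to work; the factor 10 matches the 0.1 sampling ratio.

```sql
-- count() only counts the sampled rows, so multiply by 1 / 0.1 = 10
-- to estimate the page views over the full data set.
SELECT Title, count() * 10 AS PageViews
FROM hits_v1
SAMPLE 0.1
WHERE CounterID = 57
GROUP BY Title
ORDER BY PageViews DESC LIMIT 100;
```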
Column pruning and partition pruning
In general, column pruning means selecting specific fields instead of *, and partition pruning means reading only the partitions a query needs.
Avoid select * when the data volume is large: the fewer fields you read, the less I/O is consumed and the better the performance.
select WatchID,
JavaEnable,
Title,
GoodEvent,
EventTime,
EventDate,
CounterID,
ClientIP,
ClientIP6,
RegionID,
UserID
from datasets.hits_v1;
Partition pruning means reading only the partitions you need, by specifying the partition column in the filter condition.
select WatchID,
JavaEnable,
Title,
GoodEvent,
ClientIP6,
RegionID,
UserID
from datasets.hits_v1
where EventDate='2014-03-23';
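To judge how much a partition filter can skip, it helps to list the partitions the table actually has. The sketch below queries the system.parts system table; it assumes the table lives in the datasets database and is partitioned by month of EventDate, as in the hits_v2 definition later in this article.

```sql
-- List the active partitions of datasets.hits_v1 with their part and row counts.
SELECT partition,
       count() AS parts,
       sum(rows) AS total_rows
FROM system.parts
WHERE database = 'datasets'
  AND table = 'hits_v1'
  AND active
GROUP BY partition
ORDER BY partition;
```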

Use ORDER BY together with WHERE and LIMIT
When an ORDER BY query runs over a data set of more than ten million rows, pair it with a WHERE condition and a LIMIT clause.
SELECT UserID,Age
FROM hits_v1
PREWHERE CounterID=57
ORDER BY Age DESC LIMIT 1000

Avoid building virtual columns
Unless it is necessary, avoid using AS to create a new computed column in the result set.
For example, this query builds a virtual column t: select a/b as t from test
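One way to follow this advice, sketched here under the assumption that the client application can do the division itself, is to return the raw columns and compute the ratio outside ClickHouse. The table test and the columns a and b are the hypothetical names from the example above.

```sql
-- Return the raw columns; compute a/b in the client instead of as a virtual column.
select a, b from test;
```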
Use uniqCombined instead of distinct
uniqCombined performs approximate deduplication.
count(distinct ...) uses uniqExact, which deduplicates exactly.
select count(distinct rand()) from hits_v1;

SELECT uniqCombined(rand()) from datasets.hits_v1;
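To compare the two results on real data rather than on rand(), a sketch like the following runs both functions on the same column. The choice of UserID is an assumption made for illustration; the approximate count may differ slightly from the exact one.

```sql
-- Exact and approximate distinct counts side by side.
SELECT uniqExact(UserID)    AS exact_users,
       uniqCombined(UserID) AS approx_users
FROM datasets.hits_v1;
```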

Multiple tables
Preparation
Create a small table, to avoid running out of memory:
CREATE TABLE visits_v2
ENGINE = CollapsingMergeTree(Sign)
PARTITION BY toYYYYMM(StartDate)
ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192
as select * from visits_v1 limit 10000;
Create a result table to store the output, so that large results do not flood the console:
CREATE TABLE hits_v2
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192
as select * from hits_v1 where 1=0;
Prefer IN over JOIN where possible
insert into hits_v2
select a.* from hits_v1 a where a.CounterID in (select CounterID from visits_v1);

When you must use JOIN
Important: the small table must go on the right. ClickHouse loads the right-hand table into memory and compares it against the left-hand table. Whichever join type is used, only the right-hand table is loaded into memory.
When the large table is on the right, the query fails outright with an insufficient-memory error.
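A sketch of the small-table-on-the-right rule, assuming the tables from the preparation step: hits_v1 is the large table, visits_v2 is the 10000-row copy of visits_v1, and hits_v2 is the empty result table with the same structure as hits_v1. The join key CounterID is chosen to match the IN example above.

```sql
-- Large table on the left, small table on the right:
-- only visits_v2 has to be loaded into memory.
insert into hits_v2
select a.* from hits_v1 a left join visits_v2 b on a.CounterID = b.CounterID;
```

Note that a join like this can multiply rows when visits_v2 holds several rows for the same CounterID.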