当前位置:网站首页>Clickhouse source code note 6: exploring the sorting of columnar storage systems
Clickhouse source code note 6: exploring the sorting of columnar storage systems
2022-06-24 07:06:00 【HappenLee】
The analysis completes the aggregation and vectorization filtering , After vectorized function calculation . This article , The author will analyze an important operator of database : Sort . Let's analyze it from the perspective of source code ClickHouse As a column storage system, how to achieve sorting .
The source code analysis of this series of articles is based on ClickHouse v19.16.2.2 Version of .
1. Implementation plan
Old rules , Let's start with a simple query , By implementing the plan step by step ClickHouse Execution logic .
select * from test order by k1;
Let's try to open it first ClickHouse Of Debug Log to see the specific implementation of pipeline.
image.png
It's divided into 5 A flow , And the flow that we need to focus on is coming out MergeSorting And PartialSorting,ClickHouse Read data from the storage engine first , And perform function operations , And sort the data first , And then for the data that is already in order MergeSort, Get the final, orderly data .
2. Sorting out the implementation process
Then the code we are going to comb next is also very clear , Namely PartialSortingBlockInputStream And MergingSortedBlockInputStream.
- PartialSortingBlockInputStream The implementation of the PartialSortingBlockInputStream Is very simple to implement , Let's just look at the code :
Block PartialSortingBlockInputStream::readImpl()
{
Block res = children.back()->read();
sortBlock(res, description, limit);
return res;
} It reads data from the underlying stream Block,Block It can be understood as Doris Among them Batch, A lot of rows of data , And then according to its own member variables SortDescription To the individual Block Sort , And according to limit Do length truncation .
SortDescription It's a vector, Each member describes the collation of a single permutation . such as : null Collation of values , Whether to sort in reverse order, etc .
/// Description of the sorting rule for several columns. using SortDescription = std::vector<SortColumnDescription>;
- sortBlock Function implementation of
Next , Let's see sortBlock Implementation of function , Take a look at how the column execution system uses the above information to sort data .
void sortBlock(Block & block, const SortDescription & description, UInt64 limit)
{
/// If only one column to sort by
if (description.size() == 1)
{
bool reverse = description[0].direction == -1;
const IColumn * column = !description[0].column_name.empty()
? block.getByName(description[0].column_name).column.get()
: block.safeGetByPosition(description[0].column_number).column.get();
IColumn::Permutation perm;
if (needCollation(column, description[0]))
{
const ColumnString & column_string = typeid_cast<const ColumnString &>(*column);
column_string.getPermutationWithCollation(*description[0].collator, reverse, limit, perm);
}
else
column->getPermutation(reverse, limit, description[0].nulls_direction, perm);
size_t columns = block.columns();
for (size_t i = 0; i < columns; ++i)
block.getByPosition(i).column = block.getByPosition(i).column->permute(perm, limit);
}There are two situations that need to be discussed here :1. Single column sort .2. Multi column sorting . Multi column sorting is similar to single column sorting , So let's start with the single column sorting code . Its core code is the following four lines :
column->getPermutation(reverse, limit, description[0].nulls_direction, perm);
size_t columns = block.columns();
for (size_t i = 0; i < columns; ++i)
block.getByPosition(i).column = block.getByPosition(i).column->permute(perm, limit); First, sort by a single column , Get each column after the sort IColumn::Permutation perm;. then Block Each of these columns uses this perm, Generate a new sort column , After replacing the old Columns , It's done Block It's sort of .
Generate Perm
As shown in the figure above ,Permutation Is a length of limit Of PodArray, It identifies the sort position after sorting according to the sort sequence . Follow this up perm Rules use functions permute Generate a new column , Sort the columns that have been completed .
ColumnPtr ColumnVector<T>::permute(const IColumn::Permutation & perm, size_t limit) const
{
typename Self::Container & res_data = res->getData();
for (size_t i = 0; i < limit; ++i)
res_data[i] = data[perm[i]];
return res;
} Careful friends here will find that ,String Listed in sortBlock There's some extra judgment in the function
if (needCollation(column, description[0])) {
const ColumnString & column_string = typeid_cast<const ColumnString &>(*column);
column_string.getPermutationWithCollation(*description[0].collator, reverse, limit, perm);
} This part is a special string generation perm The logic of ,ClickHouse It supports sorting string columns with different codes . Such as through GBK If you sort by coding , Then the order of Chinese will be based on the order of Pinyin .
- getPermutation The implementation of the therefore , stay ClickHouse In the process of sorting .
getPermutationIt's the most important of all the sorting operators , It isColumnA virtual function of class , In other words, each column of different data types can implement its own sorting logic . We go throughColumnVectorThe implementation of the , To manage Zhonggui leopard .
template <typename T>
void ColumnVector<T>::getPermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res) const
{
if (reverse)
std::partial_sort(res.begin(), res.begin() + limit, res.end(), greater(*this, nan_direction_hint));
else
std::partial_sort(res.begin(), res.begin() + limit, res.end(), less(*this, nan_direction_hint));
}
else
{
/// A case for radix sort
if constexpr (std::is_arithmetic_v<T> && !std::is_same_v<T, UInt128>)
{
return;
}
}
/// Default sorting algorithm.
for (size_t i = 0; i < s; ++i)
res[i] = i;
pdqsort(res.begin(), res.end(), less(*this, nan_direction_hint));
}
}This part of the code is more , The author simplifies the logic of this part .
- If there is
limitConditions , And the length of the column is greater thanlimit, usestd::partial_sortConductpermSort . - If it's a number type , And not for
UInt128Type , ThenRadix SortCount and sortpermSort . - If the first two conditions are not met , Quicksort is used as the final default implementation .
well , See here . It's completely sorted out PartialSortingBlockInputStream, We get the... Of each output Block It's sorted according to our sorting rules . Next, please come out MergeSortingBlockInputStream To do the final sorting .
- MergeSortingBlockInputStream The implementation of the It can be seen from the name , It needs to be done once Merge sort , To get the final ordered sorting result . As for the sort objects , Naturally, it goes through PartialSortingBlockInputStream Output
Block了 .
Go straight to readImpl() The implementation of the ,ClickHouse Here comes Spill to disk The external sort logic of , To simplify , I'll take away this part of external sorting logic for the time being .
Block MergeSortingBlockInputStream::readImpl()
{
/** Algorithm:
* - read to memory blocks from source stream;
*/
/// If has not read source blocks.
if (!impl)
{
while (Block block = children.back()->read())
{
blocks.push_back(block);
sum_rows_in_blocks += block.rows();
sum_bytes_in_blocks += block.allocatedBytes();
/** If significant amount of data was accumulated, perform preliminary merging step.
*/
if (blocks.size() > 1
&& limit
&& limit * 2 < sum_rows_in_blocks /// 2 is just a guess.
&& remerge_is_useful
&& max_bytes_before_remerge
&& sum_bytes_in_blocks > max_bytes_before_remerge)
{
remerge();
}
if ((blocks.empty() && temporary_files.empty()) || isCancelledOrThrowIfKilled())
return Block();
if (temporary_files.empty())
{
impl = std::make_unique<MergeSortingBlocksBlockInputStream>(blocks, description, max_merged_block_size, limit);
}
Block res = impl->read();
return res;
}� As you can see from the code above ,MergeSortingBlockInputStream This part is constantly from the bottom PartialSortingBlockInputStream Read out , And store it all . After the final read is complete , utilize MergeSortingBlocksBlockInputStream class , To complete all Blocks The merging and sorting work of . and MergeSortingBlocksBlockInputStream Class is simply to complete the process code of multi-channel merge and sort using heap , I will not start here , Interested students can refer to MergeSortingBlockInputStream.cpp Partial implementation .
3. The point to comb
The second section is finished ClickHouse The implementation process of sorting operator of , Here's a brief summary of the main points :
- ClickHouse The implementation of sorting needs to take advantage of Sort columns Generate corresponding
perm, In the end usepermComplete every one of them Block Sort . - So every column of a different data type , Both need to be implemented
getPermutationAndpermuteTo achieve sorting . And according to the data type , Choose a different sort implementation . such asradix sortThe time complexity of is O(n), Compared with the time complexity of quick sort, there are obvious advantages . - There is a lot of data dependence in sorting algorithm , So it's hard to play
SIMDAdvantage of . Only inradix sortThere are only a few parts that can be vectorized , So relative to the implementation of non vectorization , There are not many performance advantages .
4. Summary
OK, Only this and nothing more , We can start from Clickhouse In the implementation of the source code, sort out how to complete the sorting of the column storage system . Of course , This section skips over some important implementations :Spill to disk. This is to ensure that under a certain memory limit , When sorting massive data , Disk can be used to cache the intermediate results of sorting . The implementation of this part is also very interesting , Interested friends , We can further expand to see the implementation of this part . The author is a ClickHouse Beginners , Yes ClickHouse Interested students , Welcome to give me more advice , communication .
5. Reference material
边栏推荐
- Spark accumulators and broadcast variables
- C language student management system - can check the legitimacy of user input, two-way leading circular linked list
- What is JSP technology? Advantages of JSP technology
- FreeRTOS MPU makes the system more robust!
- Localized operation on cloud, the sea going experience of kilimall, the largest e-commerce platform in East Africa
- FreeRTOS MPU使系统更健壮!
- mysql中的 ON UPDATE CURRENT_TIMESTAMP
- JSON formatting method advantages of JSON over XML
- On BOM and DOM (4): dom0/dom2 event handling analysis
- Command ‘[‘where‘, ‘cl‘]‘ returned non-zero exit status 1.
猜你喜欢

Database stored procedure begin end

数据同步工具 DataX 已经正式支持读写 TDengine

leetcode:剑指 Offer 26:判断t1中是否含有t2的全部拓扑结构

In the middle of the year, I have prepared a small number of automated interview questions. Welcome to the self-test

Spark项目打包优化实践

Rockscache schematic diagram of cache operation
![Command ‘[‘where‘, ‘cl‘]‘ returned non-zero exit status 1.](/img/2c/d04f5dfbacb62de9cf673359791aa9.png)
Command ‘[‘where‘, ‘cl‘]‘ returned non-zero exit status 1.

Counter attack of flour dregs: MySQL 66 questions, 20000 words + 50 pictures

Challenges brought by maker education to teacher development

Record -- about the problem of garbled code when JSP foreground passes parameters to the background
随机推荐
RealNetworks vs. Microsoft: the battle in the early streaming media industry
Computing power and intelligence of robot fog
leetcode:1856. 子数组最小乘积的最大值
JVM调试工具-jstack
Laravel文档阅读笔记-Laravel Str slug() Function Example
What are the audio formats? Can the audio format be converted
记录--关于virtual studio2017添加报表控件的方法--Reportview控件
How to operate the little red book account: making good use of the theory of long tail words
How do I reinstall the system? How to install win10 system with USB flash disk?
Spark accumulators and broadcast variables
机器人迷雾之算力与智能
[Yugong series] June 2022 asp Basic introduction and use of cellreport reporting tool under net core
Bay area enterprises quick look! The focus of the data regulations of Shenzhen Special Economic Zone just released is coming!
Spark参数调优实践
sql join的使用
On BOM and DOM (1): overview of BOM and DOM
【云驻共创】华为云HCIA-IoT V2.5培训系列内容之物联网概览
网吧管理系统与数据库
潞晨科技获邀加入NVIDIA初创加速计划
Do you know about Statistics?