当前位置：网站首页>Clickhouse source code note 6: exploring the sorting of columnar storage systems

Clickhouse source code note 6: exploring the sorting of columnar storage systems

2022-06-24 07:06:00 【HappenLee】

The analysis completes the aggregation and vectorization filtering , After vectorized function calculation . This article , The author will analyze an important operator of database ： Sort . Let's analyze it from the perspective of source code ClickHouse As a column storage system, how to achieve sorting .

The source code analysis of this series of articles is based on ClickHouse v19.16.2.2 Version of .

1. Implementation plan

Old rules , Let's start with a simple query , By implementing the plan step by step ClickHouse Execution logic .

select * from test order by k1;

Let's try to open it first ClickHouse Of Debug Log to see the specific implementation of pipeline.

image.png

It's divided into 5 A flow , And the flow that we need to focus on is coming out MergeSorting And PartialSorting,ClickHouse Read data from the storage engine first , And perform function operations , And sort the data first , And then for the data that is already in order MergeSort, Get the final, orderly data .

2. Sorting out the implementation process

Then the code we are going to comb next is also very clear , Namely PartialSortingBlockInputStream And MergingSortedBlockInputStream.

PartialSortingBlockInputStream The implementation of the PartialSortingBlockInputStream Is very simple to implement , Let's just look at the code ：

Block PartialSortingBlockInputStream::readImpl()
{
    Block res = children.back()->read();
    sortBlock(res, description, limit);
    return res;
}

It reads data from the underlying stream Block,Block It can be understood as Doris Among them Batch, A lot of rows of data , And then according to its own member variables SortDescription To the individual Block Sort , And according to limit Do length truncation .

SortDescription It's a vector, Each member describes the collation of a single permutation . such as ： null Collation of values , Whether to sort in reverse order, etc .

/// Description of the sorting rule for several columns.
using SortDescription = std::vector<SortColumnDescription>;

sortBlock Function implementation of

Next , Let's see sortBlock Implementation of function , Take a look at how the column execution system uses the above information to sort data .

void sortBlock(Block & block, const SortDescription & description, UInt64 limit)
{
    /// If only one column to sort by
    if (description.size() == 1)
    {
        bool reverse = description[0].direction == -1;

        const IColumn * column = !description[0].column_name.empty()
            ? block.getByName(description[0].column_name).column.get()
            : block.safeGetByPosition(description[0].column_number).column.get();

        IColumn::Permutation perm;
        if (needCollation(column, description[0]))
        {
            const ColumnString & column_string = typeid_cast<const ColumnString &>(*column);
            column_string.getPermutationWithCollation(*description[0].collator, reverse, limit, perm);
        }
        else
            column->getPermutation(reverse, limit, description[0].nulls_direction, perm);

        size_t columns = block.columns();
        for (size_t i = 0; i < columns; ++i)
            block.getByPosition(i).column = block.getByPosition(i).column->permute(perm, limit);
    }

There are two situations that need to be discussed here ：1. Single column sort .2. Multi column sorting . Multi column sorting is similar to single column sorting , So let's start with the single column sorting code . Its core code is the following four lines ：

    column->getPermutation(reverse, limit, description[0].nulls_direction, perm);
    size_t columns = block.columns();
    for (size_t i = 0; i < columns; ++i)
           block.getByPosition(i).column = block.getByPosition(i).column->permute(perm, limit);

First, sort by a single column , Get each column after the sort IColumn::Permutation perm;. then Block Each of these columns uses this perm, Generate a new sort column , After replacing the old Columns , It's done Block It's sort of .

Generate Perm

As shown in the figure above ,Permutation Is a length of limit Of PodArray, It identifies the sort position after sorting according to the sort sequence . Follow this up perm Rules use functions permute Generate a new column , Sort the columns that have been completed .

ColumnPtr ColumnVector<T>::permute(const IColumn::Permutation & perm, size_t limit) const
{
    typename Self::Container & res_data = res->getData();
    for (size_t i = 0; i < limit; ++i)
        res_data[i] = data[perm[i]];

    return res;
}

Careful friends here will find that ,String Listed in sortBlock There's some extra judgment in the function

  if (needCollation(column, description[0])) {
            const ColumnString & column_string = typeid_cast<const ColumnString &>(*column);
            column_string.getPermutationWithCollation(*description[0].collator, reverse, limit, perm);
 }

This part is a special string generation perm The logic of ,ClickHouse It supports sorting string columns with different codes . Such as through GBK If you sort by coding , Then the order of Chinese will be based on the order of Pinyin .

getPermutation The implementation of the therefore , stay ClickHouse In the process of sorting .getPermutation It's the most important of all the sorting operators , It is Column A virtual function of class , In other words, each column of different data types can implement its own sorting logic . We go through ColumnVector The implementation of the , To manage Zhonggui leopard .

template <typename T>
void ColumnVector<T>::getPermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res) const
{
        if (reverse)
            std::partial_sort(res.begin(), res.begin() + limit, res.end(), greater(*this, nan_direction_hint));
        else
            std::partial_sort(res.begin(), res.begin() + limit, res.end(), less(*this, nan_direction_hint));
    }
    else
    {
        /// A case for radix sort
        if constexpr (std::is_arithmetic_v<T> && !std::is_same_v<T, UInt128>)
        {
                return;
            }
        }

        /// Default sorting algorithm.
        for (size_t i = 0; i < s; ++i)
            res[i] = i;

       pdqsort(res.begin(), res.end(), less(*this, nan_direction_hint));
    }
}

This part of the code is more , The author simplifies the logic of this part .

If there is limit Conditions , And the length of the column is greater than limit, use std::partial_sort Conduct perm Sort .
If it's a number type , And not for UInt128 Type , Then Radix Sort Count and sort perm Sort .
If the first two conditions are not met , Quicksort is used as the final default implementation .

well , See here . It's completely sorted out PartialSortingBlockInputStream, We get the... Of each output Block It's sorted according to our sorting rules . Next, please come out MergeSortingBlockInputStream To do the final sorting .

MergeSortingBlockInputStream The implementation of the It can be seen from the name , It needs to be done once Merge sort , To get the final ordered sorting result . As for the sort objects , Naturally, it goes through PartialSortingBlockInputStream Output Block 了 .

Go straight to readImpl() The implementation of the ,ClickHouse Here comes Spill to disk The external sort logic of , To simplify , I'll take away this part of external sorting logic for the time being .

Block MergeSortingBlockInputStream::readImpl()
{
    /** Algorithm:
      * - read to memory blocks from source stream;
      */

    /// If has not read source blocks.
    if (!impl)
    {
        while (Block block = children.back()->read())
        {
            blocks.push_back(block);
            sum_rows_in_blocks += block.rows();
            sum_bytes_in_blocks += block.allocatedBytes();

            /** If significant amount of data was accumulated, perform preliminary merging step.
              */
            if (blocks.size() > 1
                && limit
                && limit * 2 < sum_rows_in_blocks   /// 2 is just a guess.
                && remerge_is_useful
                && max_bytes_before_remerge
                && sum_bytes_in_blocks > max_bytes_before_remerge)
            {
                remerge();
            }

        if ((blocks.empty() && temporary_files.empty()) || isCancelledOrThrowIfKilled())
            return Block();

        if (temporary_files.empty())
        {
            impl = std::make_unique<MergeSortingBlocksBlockInputStream>(blocks, description, max_merged_block_size, limit);
        }
       
    Block res = impl->read();
    return res;
}

� As you can see from the code above ,MergeSortingBlockInputStream This part is constantly from the bottom PartialSortingBlockInputStream Read out , And store it all . After the final read is complete , utilize MergeSortingBlocksBlockInputStream class , To complete all Blocks The merging and sorting work of . and MergeSortingBlocksBlockInputStream Class is simply to complete the process code of multi-channel merge and sort using heap , I will not start here , Interested students can refer to MergeSortingBlockInputStream.cpp Partial implementation .

3. The point to comb

The second section is finished ClickHouse The implementation process of sorting operator of , Here's a brief summary of the main points ：

ClickHouse The implementation of sorting needs to take advantage of Sort columns Generate corresponding perm, In the end use perm Complete every one of them Block Sort .
So every column of a different data type , Both need to be implemented getPermutation And permute To achieve sorting . And according to the data type , Choose a different sort implementation . such as radix sort The time complexity of is O(n), Compared with the time complexity of quick sort, there are obvious advantages .
There is a lot of data dependence in sorting algorithm , So it's hard to play SIMD Advantage of . Only in radix sort There are only a few parts that can be vectorized , So relative to the implementation of non vectorization , There are not many performance advantages .

4. Summary

OK, Only this and nothing more , We can start from Clickhouse In the implementation of the source code, sort out how to complete the sorting of the column storage system . Of course , This section skips over some important implementations ：Spill to disk. This is to ensure that under a certain memory limit , When sorting massive data , Disk can be used to cache the intermediate results of sorting . The implementation of this part is also very interesting , Interested friends , We can further expand to see the implementation of this part . The author is a ClickHouse Beginners , Yes ClickHouse Interested students , Welcome to give me more advice , communication .