当前位置：网站首页>Morsel driven parallelism: a NUMA aware parallel query execution framework

Morsel driven parallelism: a NUMA aware parallel query execution framework

2022-06-09 05:33:00 【Lizhaolong's blog】

Insert picture description here This work adopts Creative Commons signature - Noncommercial use - Share in the same way 4.0 International licensing agreement Licensing .

This work ( Zhao Long Li post , from Zhao Long Li A literary creation ), from Zhao Long Li confirm , Reprint please indicate copyright .

introduction

I saw... Some time ago OSAPP in polarDB There is a core data structure in the community to support NUMA Aware The subject of , Although I have already reported ChaosBlade Community do OS Level of fault injection , But I am also very interested in the implementation of the former , If nothing unexpected happens, I will always pay attention to polarDB This PR Implementation process .

I went to Chongqing with Qibao a few days ago , There is no travel plan in the past week , The reply was delayed for several days , So I had a few days' free time to study the knowledge I was interested in , Then I thought of that NUMA Aware The subject of . It took me about a day to review my past NUMA Related issues , Including several articles NUMA The troubleshooting process leading to burrs ,NUMA API Use （ Include migrate_pages and mbind These two system calls , and numactl,numademo,numastat These terminal tools ）, several memory policy The practical significance and possibility of NUMA dependent PMU Access method .

In fact, according to the scheduled plan , In the last month of College , I didn't do technical research anymore , But should feel at ease when a lazy dog , Especially today is the Dragon Boat Festival , It is a sure thing to give up and accept teats . however . All of a sudden timeline There is an article in NUMA Relevant papers have not been read , Now that I have learned , Just learn the whole set , This is the origin of this article .

Morsel-Driven Parallelism

Morsel-Driven Parallelism The biggest contribution is to create a new fine-grained parallelism Query Execution Frame and will NUMA The idea is applied to database design .

In my humble opinion , With the continuous enhancement of new hardware , modern NUMA System span node The cost of access is actually getting lower and lower [4][5], And only in the higher order cache Remote access to memory is only possible on misses , It also means that unless false shareding[3] appear , otherwise NUMA It is very difficult for remote access to cause bottlenecks , So the core of this article is to propose a more fine-grained Query Execution frame , In order to adapt the framework is also important relational operators A set of parallel algorithms is designed .

Parallel is mentioned at the beginning of the article Query Execution There are two problems with the development of computer systems （ The increase of computer speed is gradually reflected in the speed of multi-core rather than single core ） There are conflicts ：

To take advantage of multiple cores , All query work must be evenly distributed among hundreds of threads .
The complexity of modern kernels , As a result, even with accurate data statistics, it is difficult to evenly distribute queries .

The above problems lead to load imbalance and context switching bottleneck among multiple cores , So it is no longer extensible , So this article is actually to solve the above problems , But the solution can be done easily NUMA friendly , In order to solve the imbalance of parallelism in operators .

From the perspective of execution granularity Morsel-Driven Parallelism With the traditional Volcano The model is no different , about plan-driven For example, the optimizer statically determines how many threads should run at query compilation time , Instantiate a query operator plan for each thread , And connect these with the exchange operator . From my immature database experience Volcano In fact, the minimum execution unit is the operator （ Of course, the operator can also be parallel , Suppose the data is distributed at different nodes , This is a typical intra operator parallelism , Of course, this is from the computing node ）, It also means in a Query Plan Only in EXCHANGE operator Operator time [1][6] Can be parallel between operators .

In fact, a single operator can be executed in parallel , However, there is no way to achieve uniform distribution among multiple cores ,Morsel-Driven Parallelism Use work stealing to dynamically allocate work between threads （ Thought of the previous thread pool implementation ）, This prevents unused due to load imbalance CPU resources , And allow flexibility , That is, you can reallocate between different queries at any time CPU resources .

But from the perspective of pipeline execution Morsel-Driven Parallelism With the traditional Volcano Models are fundamentally different , namely push and pull The difference between , The basic differences are shown in the figure below [13]：
Insert picture description here
The biggest difference between the two, I think, is that the tradition Volcano Represented by model push In order to Operator Centred , and Morsel-Driven Parallelism Such a model is data centric , The latter is easier to achieve parallel friendliness , Because the executing thread is actually bound to the core , And the data source is obtained by the worker thread itself , This is parallel friendly , This is actually a simple and efficient scheduler .

With ClickHouse For example , Through AST Generate Query Plan after , After some RBO Optimize , take QueryPlan It is transformed into... In the way of post order traversal Pipeline, Generated in this way Pipeline, Actually sum push It's the same , Next Pipeline How to execute is the key to parallel friendliness , This essentially requires a well-designed scheduler ,Morsel-Driven Parallelism The implementation of is excellent .

The basic implementation process is as follows ： One Query Execution Will generate a QEPobject, Manage all pipeline-job Dependencies between , This is similar to Volcano Medium query plan, In a pipeline job Execution complete , Will trigger QEPobject Passive state machine , Select the next executable pipeline job, Pass to dispatcher, every last worker Will be bound to a CPU nucleus , And take the initiative from dispatcher Middle pull morsel, Here you can do NUMA Affinity , Priority scheduling can be added , The output results are materialized in NUMA Local storage area in , follow-up pipeline It can be processed based on this .

You can see Morsel-Driven Parallelism In fact, it is to standardize the implementation of operators , That is, all operators use fine-grained elements （morsel） To measure , And the parallelism in the operator is integrated into Query Execution Come on , This can not only achieve uniform load among multiple cores , It can also be done NUMA perception , Even priority scheduling .

Cool！

From a higher point of view ,Morsel Can we extend our thinking to a larger field rather than just an in memory database ？ Because we want to introduce IO, So the execution flow needs to use coroutinue To express , however coroutinue The thread is bound to the core , Each core runs n individual coroutine,channel Notify another pipeline, This architecture also looks great （matrixone This is how we play now ）. Another way of thinking , In the paper, the overall architecture looks like a three-level control ,worker Class A ,dispatcher Class A ,QEPobject Class A , gradually push, Moving out of the core idea seems to require only modifying the underlying worker That floor , The other two layers are reusable , It looks good, too , But I need pipeline Detailed split of , The implementation is relatively complex .

Of course, the above discussion is actually about the implementation of the scheduler , How to base on Query Execution Fine grained splitting pepeline It is also a problem to be considered , After all, the foundation of parallelism comes from pipeline The combination of （ After the sequence traversal ？）, But I am a computing engine Xiaobai , Let's stop this discussion .

Specific solutions can be found in the paper [1][2], Not mentioned here .

summary

The database is extensive and profound , Business ,Query Execution, If you want to make two points clear, you can at least keep them in mind , Not to mention consistency , Multiple copies / How to live , Storage engine , Indexes , performance tuning ,bug screening , Take your time , It is very satisfying for me to figure out a plate .

This article does not represent the general direction of my study , It's just a pastime after dinner . The most important thing at this stage is to practice and use tools , Whether it is the foreseeable specific work needs or , It is also expected to introduce innovative ideas and understanding of chaotic engineering , It is the embodiment of those two points .

By the way, English is also very important , It doesn't mean that it will run out in two years 2333

Reference resources ：