Overview of the Yunxi Database Executor
2022-07-03 22:23:00 【Inspur Yunxi database】
As is well known, the executor cannot run without a plan. Plans come in two kinds: logical plans and physical plans. The relationship is like planning a trip. Choosing the means of transportation corresponds to the logical plan (say, deciding to fly), and choosing the airline corresponds to the physical plan. Actually setting off on the trip corresponds to execution. The figure below shows the basic architecture of SQL statement execution, from which the roles of the optimizer and the executor in the overall process can be seen clearly.
Optimizer
Logical and physical planning are responsible for generating the execution plan and selecting indexes; together they can be thought of as the optimizer. For example, consider a statement that joins two tables:
Select * from t1 join t2 using(ID) where t1.c = 10 and t2.d = 20;
One option is to first fetch the IDs of the records in t1 where c = 10, use those IDs to look up the matching rows in t2, and then check whether d in t2 equals 20. The other is to first fetch the IDs of the records in t2 where d = 20, use those IDs to look up the matching rows in t1, and then check whether c in t1 equals 10. The two approaches are logically equivalent but differ in execution efficiency; the optimizer estimates the cost of each and decides which one to use. In a distributed database, the physical plan can additionally decide which node each operator runs on, based on the nodes that hold the spans of data the operator needs, which is what makes distributed execution possible. The specific role of the physical plan in distributed execution is outlined in the next section.
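To make the cost comparison concrete, here is a minimal Go sketch of choosing which table to start from. It is only an illustration, not Yunxi's actual cost model: the tableStats type, the selectivity figures, and the lookupJoinCost formula are all assumptions invented for this example.

```go
package main

import "fmt"

// tableStats is a hypothetical summary of what an optimizer might know
// about a table. The fields are assumptions made for this sketch.
type tableStats struct {
	name        string
	rows        float64 // total number of rows in the table
	selectivity float64 // estimated fraction of rows that pass the table's filter
}

// filteredRows estimates how many rows survive the table's own filter.
func filteredRows(t tableStats) float64 {
	return t.rows * t.selectivity
}

// lookupJoinCost is a toy cost: read the filtered rows of the starting
// table, then perform one lookup by ID in the other table per row.
func lookupJoinCost(outer tableStats) float64 {
	n := filteredRows(outer)
	return n + n
}

// chooseDrivingTable returns the name of the table that is cheaper to start from.
func chooseDrivingTable(t1, t2 tableStats) string {
	if lookupJoinCost(t1) <= lookupJoinCost(t2) {
		return t1.name
	}
	return t2.name
}

func main() {
	t1 := tableStats{name: "t1", rows: 1000000, selectivity: 0.001} // c = 10 is rare
	t2 := tableStats{name: "t2", rows: 10000, selectivity: 0.5}     // d = 20 is common
	fmt.Println("start from:", chooseDrivingTable(t1, t2))
}
```

With these made-up statistics the filter on t1 leaves far fewer rows than the filter on t2, so starting from t1 is estimated to be cheaper even though t1 is the larger table.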
Distributed execution
The key question in distributed execution is how to turn a logical execution plan into a physical execution plan. Two aspects are involved: distributing the computation and distributing the data.
Once the physical plan has been generated, the system needs to split it up and distribute it to the nodes for execution. Each node is responsible for scheduling its local processors and inputs. Nodes also need to communicate with each other so that output routers can be connected to inputs; in particular, a streaming interface is needed to connect these components. To avoid extra synchronization costs, the execution environment must be flexible enough to support all of the above, so that apart from the initial scheduling of the execution plan, each node can start its share of the data processing relatively independently, without being affected by further orchestration from the gateway node.
In a database cluster, the gateway node creates a scheduler, which accepts a set of flows, sets up their input and output information, creates the local processors, and starts execution. While a node is processing input and output data, some control over the flows is needed; through this control, some of the incoming requests can be rejected.
Each flow represents a complete fragment of the physical plan that executes across nodes. It consists of processors and streams, and it handles pulling the fragment's data, processing it, and emitting the final output, as shown in the figure below:
For cross-node execution, the gateway node first serializes the corresponding FlowSpec into a SetupFlowRequest and sends it to the remote node over gRPC. On receipt, the remote node first restores the flow and creates the processors it contains and the streams (TCP channels) used for interaction, which completes the setup of the execution framework. Multi-node computation, driven by the gateway node, then begins. Flows are scheduled asynchronously with respect to one another through a box cache pool, so that the whole distributed framework executes in parallel.
Local execution is parallel as well: each processor, synchronizer, and router can run as a goroutine, and they are interconnected by channels. These channels can be buffered to synchronize producers and consumers.
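The goroutine-and-channel arrangement can be sketched in a few lines of Go. This is a simplified illustration rather than the database's own code: the row type and the scan, filter, and runPipeline functions are hypothetical names chosen for the example, and the buffer size of 4 is arbitrary.

```go
package main

import "fmt"

// row is a hypothetical record with an ID and a column c.
type row struct{ id, c int }

// scan plays the role of a source processor: it runs as a goroutine and
// emits rows into a buffered channel, closing it when the input is exhausted.
func scan(rows []row, buf int) <-chan row {
	out := make(chan row, buf)
	go func() {
		defer close(out)
		for _, r := range rows {
			out <- r // blocks only when the buffer is full
		}
	}()
	return out
}

// filter is a second processor goroutine that consumes one channel and
// produces another, forwarding only the rows that satisfy keep.
func filter(in <-chan row, keep func(row) bool, buf int) <-chan row {
	out := make(chan row, buf)
	go func() {
		defer close(out)
		for r := range in {
			if keep(r) {
				out <- r
			}
		}
	}()
	return out
}

// runPipeline wires scan -> filter and collects the IDs of rows with c == 10.
func runPipeline(rows []row) []int {
	filtered := filter(scan(rows, 4), func(r row) bool { return r.c == 10 }, 4)
	var ids []int
	for r := range filtered {
		ids = append(ids, r.id)
	}
	return ids
}

func main() {
	fmt.Println(runPipeline([]row{{1, 10}, {2, 5}, {3, 10}}))
}
```

The buffer lets the producer run ahead of the consumer by a few rows; when the buffer fills up, the producer blocks, which is how the channel keeps the two sides in step.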
To achieve distributed concurrent execution, the database introduces the concept of a router. For JOIN and AGGREGATOR operators, three kinds of data redistribution are implemented according to the characteristics of the data distribution: mirror_router, hash_router, and range_router. With data redistribution, a processor operator is split internally into two stages. In the first stage, each node processes the portion of the data that resides on it, and the results are then redistributed according to the operator type. The second stage aggregates the redistributed results. In this way, a single operator is executed cooperatively by multiple nodes.
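The two-stage pattern can be illustrated with a toy count-per-key aggregation in Go. This sketch only mimics the idea of a hash router; it is not the database's hash_router implementation. The function names hashRoute, localCount, and twoPhaseCount are assumptions, and the nodes are simulated as in-memory slices instead of real machines.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashRoute picks the destination node for a key, so that every partial
// result for the same key ends up on the same node. n must be positive.
func hashRoute(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// localCount is stage one: a node counts the keys in its own local data.
func localCount(keys []string) map[string]int {
	partial := make(map[string]int)
	for _, k := range keys {
		partial[k]++
	}
	return partial
}

// twoPhaseCount simulates both stages over a set of nodes, where
// nodes[i] holds the keys stored on node i.
func twoPhaseCount(nodes [][]string) map[string]int {
	n := len(nodes)
	// final[i] is the stage-two aggregator running on node i.
	final := make([]map[string]int, n)
	for i := range final {
		final[i] = make(map[string]int)
	}
	for _, data := range nodes {
		// Stage one, then redistribution of the partial counts by hash.
		for k, cnt := range localCount(data) {
			final[hashRoute(k, n)][k] += cnt
		}
	}
	// Each key lives on exactly one stage-two node, so merging is a plain copy.
	result := make(map[string]int)
	for _, m := range final {
		for k, v := range m {
			result[k] = v
		}
	}
	return result
}

func main() {
	nodes := [][]string{{"a", "b", "a"}, {"b", "c"}, {"a"}}
	fmt.Println(twoPhaseCount(nodes))
}
```

Because stage one collapses each node's local rows into one partial count per key, only those partial counts cross the network, not the raw rows.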