How is a ClickHouse query completed?
2022-06-24 05:58:00 【felixxdu】
Introduction to ClickHouse SQL functions
ClickHouse functions can be roughly divided into three categories:
- Ordinary functions, also called single-row or detail functions, are defined by the IFunction interface. They return one result value for each row of the queried table or view. Common examples are numeric functions, type conversion functions, conditional functions, and comparison functions. ClickHouse supports more than 600 ordinary functions, and the number keeps growing with each release. At the time of writing, the only way to add a new function was to hard-code it in the source; there was no `create udf return type as ...`-style UDF facility. The supported ordinary functions can be listed with:
    select * from system.functions where "is_aggregate"=0
- Aggregate functions are defined by the IAggregateFunction interface. They compute over a group (a set of rows) and return exactly one value per group. Common examples are sum and avg. The state of an aggregate function can be serialized and deserialized, so it can be transmitted between distributed nodes, which enables incremental computation. The supported aggregate functions can be listed with:
    select * from system.functions where "is_aggregate"=1
- Table functions act as a data source (storage) and appear after the FROM clause. Common ones are mysql, url, numbers, and remote. Typical usage:
    select * from mysql('host:port','database', 'table', 'user','password') -- read data from a MySQL database
    select * from numbers() limit 10,1000000; -- single-threaded: skip the first 10 numbers, then return the next 1,000,000
    select * from numbers_mt() limit 10,1000000; -- multithreaded: skip the first 10 numbers, then return the next 1,000,000
For an introduction to all functions, see the official documentation.
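The point that aggregate state can be serialized and moved between nodes can be illustrated with a small sketch. It is not ClickHouse's IAggregateFunction: `AvgState`, `aggregateOnNode`, and `mergeStates` are hypothetical names, and the text-based wire format is an assumption chosen for readability.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical partial state for avg(); NOT ClickHouse's IAggregateFunction.
struct AvgState {
    double sum = 0.0;
    long long count = 0;

    // Fold one row into the state.
    void add(double value) { sum += value; ++count; }

    // Combine a state that was computed elsewhere (for example on another node).
    void merge(const AvgState & other) { sum += other.sum; count += other.count; }

    // Assumed wire format: "<sum> <count>" as text.
    std::string serialize() const {
        std::ostringstream out;
        out.precision(17);  // enough digits to round-trip a double
        out << sum << ' ' << count;
        return out.str();
    }

    static AvgState deserialize(const std::string & bytes) {
        std::istringstream in(bytes);
        AvgState state;
        in >> state.sum >> state.count;
        return state;
    }

    // Final value; only defined for a non-empty group.
    double result() const {
        assert(count > 0);
        return sum / static_cast<double>(count);
    }
};

// Each simulated node aggregates its own rows and ships only the serialized state.
std::string aggregateOnNode(const std::vector<double> & rows) {
    AvgState state;
    for (double value : rows) state.add(value);
    return state.serialize();
}

// The initiating node merges the partial states and finalizes the result.
double mergeStates(const std::vector<std::string> & serialized) {
    AvgState total;
    for (const auto & bytes : serialized) total.merge(AvgState::deserialize(bytes));
    return total.result();
}
```

Because only `(sum, count)` travels between nodes, new rows can be folded into an existing state later without rescanning the old ones, which is the incremental behaviour the text refers to.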
The structure of the AST
Parser and Interpreter are two very important sets of interfaces. The Parser creates AST objects; the Interpreter interprets the AST and goes on to build the query's execution pipeline. Together with IStorage, they tie the whole data query process together.
A Parser recursively parses an SQL statement into an AST (abstract syntax tree). Different SQL statements are parsed by different Parser implementation classes. On the community master branch at the time of writing, Parser has more than 170 subclasses. The main ones live under src/Parsers and handle ClickHouse's own SQL dialect; the parsers under the mysql directory mainly handle syntax parsing for the case where ClickHouse acts as a MySQL client.
Each parser implements the two main interface methods, getName() and parseImpl(), according to its own responsibility. ParserRenameQuery, ParserDropQuery, and ParserAlterQuery parse DDL statements, ParserInsertQuery parses INSERT statements, ParserSelectWithUnionQuery parses SELECT statements, and so on.
The parsers are organized hierarchically. When an SQL statement arrives, a root parser, ParserQuery, is constructed first. The root parser determines which top-level category the statement belongs to; the parseImpl of that category's parser then calls the parsers of several second-level categories, and so on down the hierarchy.
The root (first-level) parser, ParserQuery, contains the following second-level parsers, each annotated with what it handles (ClickHouse/src/Parsers/ParserQuery.cpp):
ParserQueryWithOutput query_with_output_p;  // Most common SQL statements match this parser
ParserInsertQuery insert_p(end);  // INSERT statements
ParserUseQuery use_p;  // USE db statements
ParserSetQuery set_p;  // SET key1 = value1 statements
ParserSystemQuery system_p;  // Statements starting with SYSTEM, see https://clickhouse.tech/docs/en/sql-reference/statements/grant/#grant-system
ParserCreateUserQuery create_user_p;  // CREATE USER or ALTER USER
ParserCreateRoleQuery create_role_p;  // CREATE ROLE or ALTER ROLE
ParserCreateQuotaQuery create_quota_p;  // CREATE QUOTA or ALTER QUOTA
ParserCreateRowPolicyQuery create_row_policy_p;  // Row-level permission control
ParserCreateSettingsProfileQuery create_settings_profile_p;  // CREATE SETTINGS PROFILE or ALTER SETTINGS PROFILE
ParserDropAccessEntityQuery drop_access_entity_p;  // DROP USER | ROLE | QUOTA
ParserGrantQuery grant_p;  // GRANT or REVOKE; table- and column-level permission control
ParserSetRoleQuery set_role_p;  // SET ROLE
ParserExternalDDLQuery external_ddl_p;  // EXTERNAL DDL FROM external_source(...) DROP|CREATE|RENAME
The most important second-level parser, ParserQueryWithOutput, in turn contains the following parsers:
ParserShowTablesQuery show_tables_p;  // Parses SHOW [TABLES | DATABASES | ...]
ParserSelectWithUnionQuery select_p;  // Entry point for parsing SELECT queries; contains further parsers
ParserTablePropertiesQuery table_p;  // (EXISTS | SHOW CREATE) [TABLE|DICTIONARY] [db.]name [FORMAT format]
ParserDescribeTableQuery describe_table_p;  // (DESCRIBE | DESC) ([TABLE] [db.]name | tableFunction) [FORMAT format]
ParserShowProcesslistQuery show_processlist_p;  // SHOW PROCESSLIST
ParserCreateQuery create_p;  // CREATE|ATTACH TABLE ...
ParserAlterQuery alter_p;  // ALTER TABLE [db.]name
ParserRenameQuery rename_p;  // RENAME TABLE [db.]name TO [db.]name, [db.]name TO [db.]name
ParserDropQuery drop_p;  // DROP|DETACH|TRUNCATE TABLE [IF EXISTS] [db.]name
ParserCheckQuery check_p;  // CHECK [TABLE] [database.]table
ParserOptimizeQuery optimize_p;  // OPTIMIZE TABLE [db.]name [PARTITION partition] [FINAL] [DEDUPLICATE]
ParserKillQueryQuery kill_query_p;  // KILL QUERY WHERE ... [SYNC|ASYNC|TEST]
ParserWatchQuery watch_p;  // WATCH [db.]table EVENTS, see https://clickhouse.tech/docs/en/sql-reference/statements/watch/
ParserShowAccessQuery show_access_p;  // SHOW ACCESS
ParserShowAccessEntitiesQuery show_access_entities_p;  // SHOW USERS; SHOW [CURRENT|ENABLED] ROLES; SHOW [SETTINGS] PROFILES etc.
ParserShowCreateAccessEntityQuery show_create_access_entity_p;  // SHOW CREATE USER [name | CURRENT_USER]
ParserShowGrantsQuery show_grants_p;  // SHOW GRANTS [FOR user_name]
ParserShowPrivilegesQuery show_privileges_p;  // SHOW PRIVILEGES
ParserExplainQuery explain_p;  // EXPLAIN AST|PLAN|SYNTAX|PIPELINE SELECT...
And so on. Each parser ultimately produces an AST syntax tree. The AST node classes share a common interface, IAST, and their inheritance hierarchy closely mirrors that of the parsers.
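The hierarchical dispatch described above, where a root parser tries its child parsers and each child may delegate further, can be sketched in a few lines. This is a heavily simplified illustration, not ClickHouse code: `Node`, `ParserBase`, `KeywordParser`, `AnyOfParser`, and `makeRootParser` are hypothetical names, and matching on the first keyword stands in for real grammar rules.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins for IAST and IParser.
struct Node {
    std::string kind;  // e.g. "Select", "Insert"
    std::string text;  // the statement the node was built from
};

class ParserBase {
public:
    virtual ~ParserBase() = default;
    virtual const char * getName() const = 0;
    // Returns nullptr when this parser does not recognize the statement.
    virtual std::unique_ptr<Node> parseImpl(const std::string & sql) const = 0;
};

// Leaf parser: accepts statements that start with a given upper-case keyword.
class KeywordParser : public ParserBase {
public:
    KeywordParser(std::string keyword_, std::string kind_)
        : keyword(std::move(keyword_)), kind(std::move(kind_)) { assert(!keyword.empty()); }

    const char * getName() const override { return kind.c_str(); }

    std::unique_ptr<Node> parseImpl(const std::string & sql) const override {
        if (sql.size() < keyword.size()) return nullptr;
        for (std::size_t i = 0; i < keyword.size(); ++i)
            if (std::toupper(static_cast<unsigned char>(sql[i])) != keyword[i]) return nullptr;
        return std::make_unique<Node>(Node{kind, sql});
    }

private:
    std::string keyword;
    std::string kind;
};

// Composite parser: tries its children in order and returns the first match,
// similar in spirit to ParserQuery trying query_with_output_p, insert_p, ...
class AnyOfParser : public ParserBase {
public:
    explicit AnyOfParser(std::string name_) : name(std::move(name_)) {}
    void add(std::unique_ptr<ParserBase> child) { children.push_back(std::move(child)); }
    const char * getName() const override { return name.c_str(); }

    std::unique_ptr<Node> parseImpl(const std::string & sql) const override {
        for (const auto & child : children)
            if (auto node = child->parseImpl(sql)) return node;
        return nullptr;
    }

private:
    std::string name;
    std::vector<std::unique_ptr<ParserBase>> children;
};

// Two-level hierarchy: root -> {QueryWithOutput -> {SHOW, SELECT}, INSERT}.
std::unique_ptr<ParserBase> makeRootParser() {
    auto with_output = std::make_unique<AnyOfParser>("QueryWithOutput");
    with_output->add(std::make_unique<KeywordParser>("SHOW", "ShowTables"));
    with_output->add(std::make_unique<KeywordParser>("SELECT", "Select"));

    auto root = std::make_unique<AnyOfParser>("Query");
    root->add(std::move(with_output));
    root->add(std::make_unique<KeywordParser>("INSERT", "Insert"));
    return root;
}
```

Calling `makeRootParser()->parseImpl("select 1")` walks root, then QueryWithOutput, then the SELECT leaf; a statement no child recognizes yields `nullptr`, which is where a real parser would report a syntax error.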
Lexical and syntactic analysis
Two concepts are introduced here:
Token: a meaningful "word" made up of several characters. There are many token types; see the macro definition in src/Parsers/Lexer.h.
Lexer: the lexical analyzer. It takes an SQL statement as input and emits tokens one by one. The parsers then attach meaning to these tokens and organize them, according to the grammar rules, into an AST.
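To make the Token and Lexer roles concrete, here is a minimal sketch of a tokenizer. It is not the lexer from src/Parsers/Lexer.h: the three token kinds and the `tokenize` function are assumptions for illustration, and string literals, comments, and multi-character operators are deliberately ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds; the real list is in src/Parsers/Lexer.h.
enum class TokenType { BareWord, Number, Symbol };

struct Token {
    TokenType type;
    std::string text;
};

// Split an SQL string into words, integer literals, and single-character symbols.
std::vector<Token> tokenize(const std::string & sql) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < sql.size()) {
        const unsigned char c = static_cast<unsigned char>(sql[pos]);
        if (std::isspace(c)) { ++pos; continue; }

        const std::size_t begin = pos;
        TokenType type;
        if (std::isalpha(c) || c == '_') {
            while (pos < sql.size()
                   && (std::isalnum(static_cast<unsigned char>(sql[pos])) || sql[pos] == '_'))
                ++pos;
            type = TokenType::BareWord;
        } else if (std::isdigit(c)) {
            while (pos < sql.size() && std::isdigit(static_cast<unsigned char>(sql[pos])))
                ++pos;
            type = TokenType::Number;
        } else {
            ++pos;  // any other character becomes a one-character symbol token
            type = TokenType::Symbol;
        }

        assert(pos > begin);  // every iteration must consume input
        tokens.push_back(Token{type, sql.substr(begin, pos - begin)});
    }
    return tokens;
}
```

For the input `SELECT age + 1 FROM t1` this produces six tokens: four bare words, the symbol `+`, and the number `1`. Deciding that `SELECT` is a keyword and `age` a column is left to the parsers that consume the token stream.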
How a function is parsed into the AST
The parser entry point most relevant to functions is ParserExpressionList; the final parse is implemented in parseImpl of ParserLambdaExpression. The parser stage cannot check whether a function exists. It first builds an ASTIdentifier and then, together with the arguments, builds an ASTFunction; whether the function actually exists is verified only later, when the pipeline is executed.
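The split between a purely syntactic parse and a later existence check can be sketched as follows. `FunctionNode`, `parseFunctionCall`, and `functionExists` are hypothetical names, the registry is just a set of strings, and nested calls are not supported; none of this is ClickHouse's ASTFunction code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for ASTFunction: a name plus raw argument texts.
struct FunctionNode {
    std::string name;
    std::vector<std::string> arguments;
};

// Strip leading and trailing spaces.
std::string trim(const std::string & s) {
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Parse stage: purely syntactic. Accepts "name(arg, arg, ...)" without nested
// calls and never asks whether `name` is a known function.
bool parseFunctionCall(const std::string & text, FunctionNode & out) {
    const std::size_t open = text.find('(');
    if (open == std::string::npos || open == 0 || text.back() != ')') return false;
    assert(open < text.size() - 1);  // '(' comes before the closing ')'

    out.name = trim(text.substr(0, open));
    out.arguments.clear();

    const std::string inner = text.substr(open + 1, text.size() - open - 2);
    std::size_t start = 0;
    while (start <= inner.size()) {
        const std::size_t comma = inner.find(',', start);
        const std::size_t end = (comma == std::string::npos) ? inner.size() : comma;
        const std::string arg = trim(inner.substr(start, end - start));
        if (!arg.empty()) out.arguments.push_back(arg);
        start = end + 1;
    }
    return !out.name.empty();
}

// Later stage: only here is the name looked up in a function registry.
bool functionExists(const FunctionNode & node, const std::set<std::string> & registry) {
    return registry.count(node.name) > 0;
}
```

With this split, `no_such_function(age, 1)` parses successfully into a node with two arguments, and the error surfaces only when `functionExists` is consulted, mirroring the ordering described in the text.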
From Interpreter to pipeline execution
The Interpreter works like a service layer: it gathers the resources each operator needs and strings the whole query process together. It first parses the AST object, then executes the "business logic" (branching, setting parameters, calling interfaces, and so on), and finally returns an IBlock object, setting up the query's execution pipeline in the form of threads.
The processing flow of a query is generally as follows.
In ClickHouse, a transformer is the concept of an operator. All transformers are arranged into a pipeline, which is then handed to the pipelineExecutor for streaming execution. Each time a transformer runs, it processes one batch of data and outputs it, all the way down to the downstream sinker.
ClickHouse implements a series of basic transformer modules; see src/Processors/Transforms. For example:
- FilterTransform - WHERE filtering
- SortingTransform - ORDER BY sorting
- LimitByTransform - LIMIT truncation
- ExpressionTransform - expression evaluation
When we execute:
SELECT age + 1 FROM t1 WHERE id=1 ORDER BY time DESC LIMIT 10
ClickHouse's QueryPipeline arranges and assembles the transformers as follows:
QueryPipeline::addSimpleTransform(Source)
QueryPipeline::addSimpleTransform(FilterTransform)
QueryPipeline::addSimpleTransform(SortingTransform)
QueryPipeline::addSimpleTransform(LimitByTransform)
QueryPipeline::addSimpleTransform(ExpressionTransform)
QueryPipeline::addSimpleTransform(Sinker)
While QueryPipeline arranges the transformers, a lower-level DAG of connections also has to be built:
connect(Source.OutPort, FilterTransform.InPort)
connect(FilterTransform.OutPort, SortingTransform.InPort)
connect(SortingTransform.OutPort, LimitByTransform.InPort)
connect(LimitByTransform.OutPort, ExpressionTransform.InPort)
connect(ExpressionTransform.OutPort, Sinker.InPort)
This establishes the data flow: one transformer's OutPort is connected to the next transformer's InPort. In addition, if a transformer's operator can run in parallel (filter and expression, for example), it is split into multiple transformer instances to speed up execution through parallelism.
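The chain of transformers for the example query can be modelled with a short sketch. It is an illustration under simplifying assumptions, not ClickHouse's processors framework: `Row`, `Chunk`, `Transform`, and the `make*` helpers are hypothetical, a single chunk is pushed through the stages one after another, and "connecting ports" is reduced to feeding each stage's output into the next stage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical row layout for table t1; a Chunk is one batch of rows.
struct Row { int id; int age; int time; };
using Chunk = std::vector<Row>;

// A transform takes one chunk on its input and produces one chunk on its output.
using Transform = std::function<Chunk(Chunk)>;

// WHERE: keep only rows that satisfy the predicate.
Transform makeFilter(std::function<bool(const Row &)> predicate) {
    return [predicate](Chunk in) {
        Chunk out;
        for (const Row & row : in)
            if (predicate(row)) out.push_back(row);
        return out;
    };
}

// ORDER BY time DESC.
Transform makeSortByTimeDesc() {
    return [](Chunk in) {
        std::stable_sort(in.begin(), in.end(),
                         [](const Row & a, const Row & b) { return a.time > b.time; });
        return in;
    };
}

// LIMIT n: drop everything after the first n rows.
Transform makeLimit(std::size_t limit) {
    return [limit](Chunk in) {
        if (in.size() > limit) in.resize(limit);
        return in;
    };
}

// Expression evaluation, applied row by row in place.
Transform makeExpression(std::function<void(Row &)> apply) {
    return [apply](Chunk in) {
        for (Row & row : in) apply(row);
        return in;
    };
}

// Stage i's output becomes stage i+1's input, in the order the stages were added.
Chunk runPipeline(const std::vector<Transform> & stages, Chunk source) {
    assert(!stages.empty());
    Chunk current = std::move(source);
    for (const Transform & stage : stages) current = stage(std::move(current));
    return current;
}

// SELECT age + 1 FROM t1 WHERE id=1 ORDER BY time DESC LIMIT 10
std::vector<int> runExampleQuery(const Chunk & t1) {
    const std::vector<Transform> stages = {
        makeFilter([](const Row & r) { return r.id == 1; }),
        makeSortByTimeDesc(),
        makeLimit(10),
        makeExpression([](Row & r) { r.age = r.age + 1; }),
    };
    std::vector<int> sink;  // plays the role of the Sinker
    for (const Row & row : runPipeline(stages, t1)) sink.push_back(row.age);
    return sink;
}
```

The stage order matches the assembly order in the text: filter, sort, limit, then expression. A real executor would stream many chunks and could run several filter or expression instances side by side; this sketch keeps everything sequential so the data flow is easy to follow.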


