
Some small requirements for the SQL engines of domestic database vendors

2022-06-24 19:27:00 PostgreSQLChina

Domestic databases are enjoying a period of rapid growth, and a large number of enterprise users are migrating their database systems to open-source and domestic database platforms. While domestic database vendors are gaining many users and "counting money", those users are also hoping that domestic databases will keep getting better. The SQL engine is one of the core components of a database. Many domestic vendors are locked in fierce competition with one another, but they also need to deliver features that meet the needs of the majority of users.

Our team has spent many years on system optimization and on developing database operations and maintenance tools. Over those years we have worked with a large number of users and run into many database pitfalls, most of them related to the SQL engine. So today, speaking on behalf of those users, I would like to propose some functional requirements for the SQL engines of domestic databases, in the hope that we will see these features gradually implemented in new versions.

The first requirement is full-featured HASH JOIN

The SQL engines of domestic databases need fully featured HASH JOIN support. As everyone knows, HASH JOIN is the most effective join method for joining large tables. Oracle's HASH JOIN is very powerful: it can handle a wide range of complex join conditions. Many open-source and domestic databases now support HASH JOIN, but their support still has many blind spots. If a particular case happens to be one where HASH JOIN cannot be used, the only option left is to rewrite the SQL, which is a disaster for developers and DBAs.
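To make the build/probe idea concrete, here is a minimal in-memory sketch of the textbook hash join in Python. It is an illustration, not any vendor's implementation; the function name `hash_join` and its `residual` parameter are hypothetical. It hashes the equality part of the join condition and checks any remaining (non-equality) condition on each matched pair, which is one common way a hash join copes with more complex join conditions. Real engines also partition the inputs and spill to disk when the build side does not fit in memory; this sketch omits that.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def hash_join(
    build: Iterable[Row],
    probe: Iterable[Row],
    build_key: str,
    probe_key: str,
    residual: Callable[[Row, Row], bool] = lambda b, p: True,
) -> List[Tuple[Row, Row]]:
    """Inner equi-join: hash the build input, then stream the probe input."""
    # Build phase: one pass over the (ideally smaller) input, keyed on the join column.
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:
        # SQL NULLs never match in an equi-join, so None keys are not hashed.
        if row[build_key] is not None:
            table[row[build_key]].append(row)

    # Probe phase: one pass over the larger input; each lookup is O(1) on average.
    result: List[Tuple[Row, Row]] = []
    for row in probe:
        for match in table.get(row[probe_key], []):
            # Non-equality parts of the join condition are checked per matched pair.
            if residual(match, row):
                result.append((match, row))
    return result


# Usage: join departments (build side) to employees (probe side).
depts = [{"dept_id": 1, "name": "R&D"}, {"dept_id": 2, "name": "Ops"}]
emps = [
    {"emp": "a", "dept_id": 1, "salary": 10},
    {"emp": "b", "dept_id": 2, "salary": 5},
    {"emp": "c", "dept_id": 3, "salary": 7},
]
pairs = hash_join(depts, emps, "dept_id", "dept_id", residual=lambda d, e: e["salary"] > 6)
print([(d["name"], e["emp"]) for d, e in pairs])  # [('R&D', 'a')]
```

Each input is scanned once, which is why this method scales to large tables where a nested loop would compare every pair of rows.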

The second requirement is SQL fingerprints and execution plan fingerprints

An SQL fingerprint is not the same thing as an SQL ID. An SQL ID points to exactly one SQL statement, whereas an SQL fingerprint groups a set of slightly different SQL statements into one class. For example, if two statements differ only in letter case, or only in the value of one variable, they should be treated as the same SQL: their SQL IDs may differ, but they share the same fingerprint. Fingerprints let us find every instance of the same SQL and analyze them together.
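A minimal fingerprinting sketch, assuming a simple regex-based normalizer. The function names and normalization rules are hypothetical, not any product's algorithm: literals become `?`, case and whitespace are folded, and IN lists of placeholders are collapsed before hashing. A production implementation would work on the parser's token stream and respect case-sensitive quoted identifiers, which this sketch lowercases naively.

```python
import hashlib
import re


def normalize_sql(sql: str) -> str:
    """Reduce a statement to a canonical shape (a simplified, regex-based sketch)."""
    s = re.sub(r"'(?:[^']|'')*'", "?", sql)              # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)              # numeric literals -> ?
    s = re.sub(r"\s+", " ", s).strip().lower()            # fold whitespace and case
    s = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "(?)", s)   # IN (?, ?, ?) -> (?)
    return s


def sql_fingerprint(sql: str) -> str:
    """Short, stable hash of the normalized text."""
    return hashlib.sha1(normalize_sql(sql).encode("utf-8")).hexdigest()[:16]


a = "SELECT * FROM orders WHERE id = 42"
b = "select *   from ORDERS where ID = 7"
print(normalize_sql(a))                          # select * from orders where id = ?
print(sql_fingerprint(a) == sql_fingerprint(b))  # True
```

The two statements have different exact texts, and therefore different SQL IDs, but they collapse to one fingerprint and can be analyzed as a single class.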

An execution plan fingerprint identifies identical execution plans. SQL statements with different SQL IDs may use the same execution plan, and each SQL carries an execution plan fingerprint that points to that plan. With execution plan fingerprints we can reduce the number of execution plans stored in memory: whether or not a global execution plan cache is implemented, plans can be kept in a shared memory area where monitoring and analysis staff can use them. Oracle has in fact already implemented most of the functionality resembling SQL fingerprints and execution plan fingerprints, and interested readers may want to study it.
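One way to picture a plan fingerprint, as a sketch under stated assumptions: hash only the plan's shape (operator types, target relations, join conditions), ignore volatile details such as row estimates, and store each distinct plan once. `PlanNode`, `plan_fingerprint`, and `PlanStore` are hypothetical names, and the dictionary merely stands in for a shared memory area.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlanNode:
    op: str                  # operator type, e.g. "HashJoin" or "SeqScan"
    target: str = ""         # relation name or join condition
    est_rows: float = 0.0    # optimizer estimate; excluded from the fingerprint
    children: List["PlanNode"] = field(default_factory=list)


def _shape(node: PlanNode) -> list:
    # Keep only structural attributes so estimate changes do not alter the fingerprint.
    return [node.op, node.target, [_shape(c) for c in node.children]]


def plan_fingerprint(plan: PlanNode) -> str:
    canonical = json.dumps(_shape(plan), separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:16]


class PlanStore:
    """Stand-in for a shared memory area holding each distinct plan once."""

    def __init__(self) -> None:
        self.plans: Dict[str, PlanNode] = {}    # plan fingerprint -> plan
        self.sql_to_plan: Dict[str, str] = {}   # SQL ID -> plan fingerprint

    def register(self, sql_id: str, plan: PlanNode) -> str:
        fp = plan_fingerprint(plan)
        self.plans.setdefault(fp, plan)         # store the plan only the first time
        self.sql_to_plan[sql_id] = fp
        return fp


def join_plan(est: float) -> PlanNode:
    return PlanNode("HashJoin", "o.cust_id = c.id", est,
                    [PlanNode("SeqScan", "orders", est * 10),
                     PlanNode("SeqScan", "customers", 1e4)])


store = PlanStore()
store.register("sql_a", join_plan(100))
store.register("sql_b", join_plan(250))  # different SQL ID and estimates, same plan shape
print(len(store.plans))                  # 1
```

Two SQL IDs now point at one stored plan, which is the memory saving the paragraph describes.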

The third requirement is HINT support

Upgrading the optimizer is very hard. Making it better takes a great deal of money and time, and it cannot possibly be done by relying on a few clever experts. When the CBO optimizer really cannot make the right judgment and would otherwise use a bad execution plan, developers can still use HINTs to force a corrected plan. Some domestic and open-source databases already support hints. However, many of them implement hints as plug-ins that rely on hooks in the database code, and the set of operations their HINTs support is still incomplete.

A plug-in implementation built on hooks is still less efficient than native kernel support, and supporting a rich set of HINTs directly in the kernel is the only way to improve the efficiency of SQL analysis in domestic databases. As for the operations supported, a HINT is not only a way to force a particular execution plan; it can also implement strong and weak read/write splitting in a clustered environment. For example, a hint could set the MASTER selection strategy for the cluster, indicate that an operation may run on a read-only node, or even mark an operation as weakly consistent and specify the maximum acceptable data delay. Such HINTs usually require the cluster environment to be integrated into the database kernel, rather than being provided only as plug-ins.
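The routing hints described above could work roughly like this. This is a hedged sketch: the hint names `READ_ONLY_NODE` and `MAX_DELAY(seconds)` are hypothetical and not taken from any product, while the `/*+ ... */` comment form mirrors the style used by Oracle hints and by PostgreSQL's pg_hint_plan extension. Without the hint a statement goes to the primary; with it, the statement goes to the least-lagged read-only node whose replication delay is within the stated bound, falling back to the primary otherwise. The sketch trusts the developer to hint only read-only statements and does not inspect the SQL for writes.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

HINT_BLOCK = re.compile(r"/\*\+(.*?)\*/", re.S)   # first /*+ ... */ comment
HINT_ITEM = re.compile(r"(\w+)(?:\(([^)]*)\))?")  # NAME or NAME(arg arg ...)


def parse_hints(sql: str) -> Dict[str, List[str]]:
    m = HINT_BLOCK.search(sql)
    if m is None:
        return {}
    return {name.upper(): args.split() for name, args in HINT_ITEM.findall(m.group(1))}


@dataclass
class Node:
    name: str
    read_only: bool
    lag_seconds: float  # replication delay behind the primary


def choose_node(sql: str, nodes: List[Node]) -> Node:
    hints = parse_hints(sql)
    primary = next(n for n in nodes if not n.read_only)
    if "READ_ONLY_NODE" not in hints:
        return primary  # default: strongly consistent reads and writes on the primary
    # Weak consistency: accept a replica whose lag is within MAX_DELAY (default 0).
    max_delay = float(hints["MAX_DELAY"][0]) if hints.get("MAX_DELAY") else 0.0
    ok = [n for n in nodes if n.read_only and n.lag_seconds <= max_delay]
    return min(ok, key=lambda n: n.lag_seconds) if ok else primary


nodes = [Node("primary", False, 0.0), Node("replica1", True, 3.0), Node("replica2", True, 0.5)]
print(choose_node("SELECT /*+ READ_ONLY_NODE MAX_DELAY(5) */ * FROM t", nodes).name)  # replica2
```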

The fourth requirement is native OUTLINES support

When we cannot modify the SQL directly to add HINTs that force a better execution plan, we can only rely on OUTLINES. Conventional OUTLINES apply only to a single SQL ID, so when a statement does not use bind variables, there is no way to specify an OUTLINE through its SQL ID. Yet in many systems these are the most frequently used and most important statements. If OUTLINES could be set by SQL fingerprint, they would be far more widely applicable.
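Here is a sketch of the difference, under simplifying assumptions: the SQL ID is modeled as a hash of the exact statement text, the fingerprint uses a literal-stripping normalization like an SQL fingerprint, and `OutlineStore` and its methods are hypothetical. An outline registered by SQL ID matches only the identical text, while one registered by fingerprint also covers statements that differ only in literal values. The hint string is purely illustrative.

```python
import hashlib
import re
from typing import Dict, Optional


def sql_id(sql: str) -> str:
    # Hypothetical SQL ID: hash of the exact text, so each literal value gets its own ID.
    return hashlib.sha1(sql.encode("utf-8")).hexdigest()[:16]


def fingerprint(sql: str) -> str:
    # Literal-insensitive key: strip literals, fold case and whitespace, then hash.
    s = re.sub(r"'(?:[^']|'')*'", "?", sql)
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)
    s = re.sub(r"\s+", " ", s).strip().lower()
    return hashlib.sha1(s.encode("utf-8")).hexdigest()[:16]


class OutlineStore:
    def __init__(self) -> None:
        self.by_sql_id: Dict[str, str] = {}
        self.by_fingerprint: Dict[str, str] = {}

    def add_for_sql_id(self, sql: str, hints: str) -> None:
        self.by_sql_id[sql_id(sql)] = hints

    def add_for_fingerprint(self, sql: str, hints: str) -> None:
        self.by_fingerprint[fingerprint(sql)] = hints

    def lookup(self, sql: str) -> Optional[str]:
        # An exact SQL ID outline wins; the fingerprint covers literal-only variants.
        hints = self.by_sql_id.get(sql_id(sql))
        if hints is None:
            hints = self.by_fingerprint.get(fingerprint(sql))
        return hints


store = OutlineStore()
store.add_for_sql_id("SELECT * FROM orders WHERE id = 1", "IndexScan(orders)")
print(store.lookup("SELECT * FROM orders WHERE id = 2"))  # None: different SQL ID
store.add_for_fingerprint("SELECT * FROM orders WHERE id = 1", "IndexScan(orders)")
print(store.lookup("SELECT * FROM orders WHERE id = 2"))  # IndexScan(orders)
```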

The fifth requirement is visualization of execution progress for long-running SQL

Provide an external interface view similar to Oracle's V$SESSION_LONGOPS, ideally one that offers more information than Oracle's: for example, which execution plan the current operation comes from (its execution plan fingerprint) and which step of that plan it is at. Of course, progress visualization should show only operations that have been running for a long time. Output should go to the interface only when the execution cost of an operator in the plan exceeds a certain threshold; otherwise the output itself would hurt the efficiency of the SQL engine. This functionality only needs to be implemented for certain operators, not for whole SQL statements. In a SQL engine performance comes first, and visualization is secondary.
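An operator-level progress view with a cost threshold could be sketched as follows. All names (`OperatorProgress`, `LongOpsView`, `cost_threshold`) are hypothetical, and the done/total work counters only loosely echo the SOFAR/TOTALWORK idea in V$SESSION_LONGOPS. Operators whose estimated cost is below the threshold are never registered, so they pay no reporting overhead; each tracked row carries the plan fingerprint and the operator's step in the plan.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OperatorProgress:
    session_id: int
    plan_fingerprint: str   # which execution plan the operator belongs to
    step: int               # position of the operator within that plan
    total_steps: int
    operator: str
    work_done: int = 0
    work_total: int = 0

    @property
    def percent(self) -> float:
        return 100.0 * self.work_done / self.work_total if self.work_total else 0.0


class LongOpsView:
    def __init__(self, cost_threshold: float) -> None:
        self.cost_threshold = cost_threshold
        self._rows: Dict[Tuple[int, int], OperatorProgress] = {}

    def start_operator(self, est_cost: float, progress: OperatorProgress) -> bool:
        # Cheap operators are never registered, so they pay no reporting overhead.
        if est_cost < self.cost_threshold:
            return False
        self._rows[(progress.session_id, progress.step)] = progress
        return True

    def advance(self, session_id: int, step: int, units: int) -> None:
        row = self._rows.get((session_id, step))
        if row is not None:  # untracked operators are silently ignored
            row.work_done = min(row.work_total, row.work_done + units)

    def snapshot(self) -> List[OperatorProgress]:
        return list(self._rows.values())


view = LongOpsView(cost_threshold=1000.0)
view.start_operator(5.0, OperatorProgress(1, "a1b2", 1, 3, "IndexScan", work_total=10))
view.start_operator(50000.0, OperatorProgress(1, "a1b2", 2, 3, "HashJoin", work_total=200))
view.advance(1, 2, 50)
for row in view.snapshot():
    print(row.operator, f"step {row.step}/{row.total_steps}", f"{row.percent:.0f}%")
```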

In fact, the optimizer is the hardest part of the SQL engine to improve. Its refinement has to be driven by a large number of real application cases, some optimizations are very difficult, and an excellent CBO optimizer cannot be built overnight. But until the optimizer is perfect, it must at least be good enough: we should do our best to ensure that developers are not constantly trapped in the dilemma of SQL that will not run properly unless it is rewritten. User application scenarios are very complex, so domestic database developers should concentrate on the problems that must be solved and leave the rest to HINTs and OUTLINES. These may not look like very intelligent ways to make up for the optimizer's shortcomings, but they are necessary. In any case, a database that solves users' problems is a good database.



This article is from the WeChat official account Open Source Software Alliance PostgreSQL Branch (kaiyuanlianmeng).


Copyright notice
This article was created by [PostgreSQLChina]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/175/202206241752060260.html