当前位置:网站首页>Is there a completely independent localization database technology
Is there a completely independent localization database technology
2022-07-06 02:46:00 【Ma Nong, little fat brother】
The conflict between Russia and Ukraine some time ago ,Oracle announce “ Suspend all business in Russia ”, I believe everyone's mood is not to watch fire from afar , But think carefully and fear extremely .
Database name IT One of the three core fields ( The other two are CPU And operating systems ), It has always been monopolized by international giants , Others control the core , Lock your throat whenever you want , There's nothing you can do .
Now the solution to this problem can only be self-improvement , Master the core technology of database in your own hands , Make your own domestic database . Actually , This matter has been discussed for decades in China , Back in the last century 80 In the s, the national team dominated by research institutes and universities began to invest in the research and development of domestic databases , And in 90 In the s, several database products were launched successively . However, it is a pity that these product R & D lacks industrial access from the beginning , Not because of the stimulation of actual demand , And purely to have . such , The expansion of products in the commercial market is also relatively weak . As a chaser , I never saw the back of my opponent .
There's a problem :“ Has China crossed the mountain of database ?” Translation is : Is there a completely self-developed domestic database now ? Yes 100 Multiple , After reading it, you can either popularize database knowledge or promote your own products , Most of the answers did not face this question directly . It's really impossible to face , Because we can't say that we have climbed this mountain .
Current situation of domestic databases
In recent years , Hundreds of domestic databases have sprung up , But how many have original technology ?
It's not much ! You can even say it rudely : Hardly any !
These hundreds of domestic databases , Most of them are based on open source database ,90% More than that . Most of them ( Probably again 90%) Is based on MySQL or PostgreSQL Transformed .
MySQL As the most famous open source database , Because of the large number of users 、 Strong compatibility 、 Rich interfaces and other factors , It is not surprising that many domestic database manufacturers have used it to transform it into their own products , After all, many people are familiar with it , The transformation cost is also a little lower .
however , relative MySQL, be based on PostgreSQL( Be commonly called PG) More encapsulated . This is because PG use BSD The open source license is very loose , It is allowed to modify the source code and then close the source , You don't even need a copyright notice . therefore PG Become the favorite of many domestic database manufacturers , Based on PG Encapsulate your own “ original ” Domestic database , Including some famous manufacturers famous for innovation . As the saying goes “ Foreign open source , We are original ”, Some manufacturers are even too lazy to transform ( It may also be unable to transform ), Even the driver can be borrowed directly .
except MySQL and PG Outside these two camps , There are also some encapsulated based on other open source databases , But the quantity is very small . Some domestic databases seem original , But it's actually based on an ancient open source database that has withdrawn from the Jianghu , Now it's hard to see that it's mistaken for original .
In addition to using open source libraries to encapsulate , There are also some domestic database manufacturers who purchase the source code “ independent ”. image 2015 Several Chinese companies bought Informix Source code to develop their own database .
these “ Borrow others ” Non original database manufacturer , Most do not master core technology , After all, it is not easy to digest tens of millions of lines of code . Although the source code in hand , But it is still difficult to carry out in-depth transformation , Future upgrading and development should also rely on others . Sometimes there are even agreements and legal issues , such as MySQL Now the ownership belongs to Oracle all , God knows when O Can you remember what you will do to us if you are happy .
however , The good news is that , There are still a few valuable manufacturers from 0 Begin to realize autonomously . What is more representative is OceanBase. Because it was born in Internet enterprises , Facing the rapidly expanding business , Continuing to use foreign commercial databases is difficult to support both in cost and capacity , It has a strong motivation to get rid of dependence on foreign products , We must find a way of self-study . Of course , Developing a database from scratch is not easy , This is a hard career that can only be sharpened in ten years , Few manufacturers are willing to endure like this .
besides , We There is another more wonderful thing that has been sharpened in ten years , A product that does not look like a database but can complete a large number of database tasks : Runqian software development Totalizer SPL. It is not only completely self-developed in engineering implementation , Even the theoretical model is original , The breakthrough is not just the database itself , And the theoretical framework behind it , Such products are unique in China .
SPL What is it ? What does it have to do with the database ? What's the effect ? What theory has been broken through behind ? Now let's talk about .
SPL The origin of
SPL The main body of development is Runqian software , You may have heard or used the moistening report , yes 20 Years ago, it was an innovative product to solve the production of Chinese style complex reports , The original nonlinear report model theory is used . We know , Report is a strong data calculation scenario , The data in the database is far from the data to be presented , It takes a lot of complex operations to get . The report tool can only solve a small amount of calculation in the presentation step , There is nothing we can do about the data calculation before entering the report tool . This leads to the fact that there are mature report tools to solve the calculation problems of format and presentation , But report development is still difficult .
For this question , There is no good way in the industry , It can only be complicated SQL( And stored procedures ) Or use high-level languages in applications ( Such as Java) Programming , Very cumbersome and inefficient . And because of SQL and Java Development characteristics of , It will also bring high coupling 、 Maintenance difficulties, etc .
In this context , We hope to find a way to solve the difficulty of data calculation 、 The problem of slow calculation . Through a large number of summary and analysis of various data calculation problems encountered, we found , If we continue to use SQL Our technical system can't solve this problem anyway , At best, do some optimization in the project ( Now most databases are doing ), It's just old wine in new bottles .
SQL The theoretical basis of is relational algebra ,SQL The reason why it is difficult to deal with complex data calculation , The fundamental reason is the theory of relational algebra behind . If you want to solve this problem fundamentally, you can no longer be based on relational algebra .
What to do with that ?
Since there is no ready-made available , You can only invent new , Use new theoretical models to solve computational problems !
however , It's easy to say , It's not easy to do . from 2007 Year begins , It took us more than ten years , It took four major refactorings to stabilize the model and structure , Formed a set of theoretical models —— Discrete data sets , Based on this model SPL(Structured Process Language), Programming language specially used for structured data calculation , With the storage mechanism , It can also be understood as a data warehouse product .
because SPL A new theoretical model is adopted , There are no other products on the market that can be used for reference , It is impossible to have ready-made open source code to “ To borrow ”, You can only develop your own line by line . therefore ,SPL The core operation model code of is completely original from head to toe . Even the foundation of rationalism was invented by ourselves , The code can only be original , Are you independent enough ?
Speaking of this, you may find ,SPL It looks different from traditional databases , How effective is its practical application ?
SPL Application effect
For big data computing tasks , In terms of the applied effect ,SPL The performance in practice is excellent . When implementing complex calculations , Not only is the code short , Performance is usually an order of magnitude faster than traditional databases .
A celestial body computing scene of the National Astronomical Observatory :11 A picture , Each sheet 50 Ten thousand celestial bodies ( The target size is 500 ten thousand ), Astronomical distance ( Trigonometric function calculation ) The nearer celestial body is regarded as the same , You need to put the... In different photos “ identical ” Celestial bodies merge , Attribute re aggregation .
The technical essence of this task is a non equivalent Correlation , The amount of calculation is square ( That is to say 50 ten thousand *50 ten thousand =2500 Billion ).Python Code about 200 That's ok , Single threaded Computing 6.5 God , Estimate by speed , Target 500 The scale of 10000 needs to be close 2 Year time , There is no practicality at all ; The distributed database of a large domestic factory used 100 individual CPU Of SQL The code is also used 3.8 Hours , Calculate the single core computing speed ratio Python Also slow ; and SPL The optimized code implemented is only 50 Multiple lines , Using the characteristics of the task greatly reduces the amount of calculation ( Far from 2500 Billion ), stay 4 Nuclear notebook only 2 The calculation was completed in more than minutes , Calculation 500 The target scale of 10000 can be achieved in just a few hours , Completely practical .
Behind this gap : Limited by the theoretical model SQL, This optimization technique cannot be achieved , We can only watch the consumption of computing resources ;Python Although hard coding can realize the optimization algorithm , But the workload is huge , The code will go far beyond 200 That's ok ; Only SPL, Code shorter , And run faster .
More Than This , In other industries ,SPL The advantages are obvious .
In the group insurance details query scenario of an insurance company ,SPL comparison Oracle performance Promoted 2000 times , At the same time, the amount of code is reduced 5 More than times ……
In the optimization scenario of vehicle insurance batch calculation of an insurance company , Use SPL take RDB Run batch time from 2 Hours to 17 minute , The implementation code changes from the original 2000 The line is shortened to less than 500 That's ok ……
In the customer portrait scene of a bank ,SPL Calculate the performance of the intersection of user portraits and customer groups Promoted 200 times above ……
In the report query scenario of a financial user ,SPL Calculate the report from 3700 The second is shortened to 105 second , Promoted 35 times many ……
……
Similar cases SPL Implemented many , I haven't missed yet , The average acceleration is more than an order of magnitude , At the same time, the amount of code is reduced several times .
Here is also a performance test report :《 Performance test report of national production computing database 》(http://c.raqsoft.com.cn/article/1564972044122). use SPL The operation realized on the domestic chip , Can surpass in Intel Running on the chip Oracle. This is all SPL Theoretical innovation ( Discrete data sets ) The effect .
SPL Why stronger
We see SPL I can't help asking after the application effect of ,SPL What kind of magic can actually achieve these amazing effects ?SPL What is the theoretical basis behind the discrete data set model ?
SPL Its advantages mainly focus on two points , Realize data calculation The code is short ( It's simple ), and Higher performance ( Run fast ). That is SPL Have you changed the speed of the computer ? did not , Software cannot change the performance of hardware .SPL The reason why it is stronger is that it designs many algorithms that others don't have ( And storage mechanism ), Based on these algorithms, the computer can perform less operations , So as to obtain high performance , Most of these algorithms rely on discrete data set theory to be well implemented .
Here is SPL Part of the algorithm , Many of them are SPL Original invention of , For the first time in the industry , Look at a spot and know the whole leopard .
Like the common TopN operation , stay SPL in TopN It is understood as aggregation operation , In this way, the sorting with high complexity can be transformed into aggregation operation with low complexity , And it can also expand the scope of application .
A | ||
1 | =file(“data.ctx”).open().cursor() | |
2 | =A1.groups(;top(10,amount)) | Amount before 10 Order with name |
3 | =A1.groups(area;top(10,amount)) | The amount of each region is in the top 10 Order with name |
and SQL Different ,SPL There is no sort word in the statement that completes this operation , There will be no big sort of action , Calculate in complete set or group TopN The grammar is basically the same , Not only is it easier to write , Higher performance . and SQL You can only write sentences with sort words , If you can run fast, you can only rely on the optimization engine of the database , In simple cases, the database can deal with , But the situation is complex and continuous Oracle Such a senior database will “ Faint ”, Here are relevant detailed test cases : Performance optimization techniques :TopN .
We have already SPL The theory of discrete data set is compiled into a paper ( SPL The paper http://c.raqsoft.com.cn/article/1653097658478 ), It strictly defines the algebraic system of discrete data sets , And describes the difference between it and relational algebra .
High performance does not depend on code , It's algebra , Code is just a means of implementation , The key is SPL The data types, algorithms and storage models provided in the theoretical system behind .
This article Write a simple and fast database language SPL It is explained in a more popular way SPL The efficient principle of . Relational algebra and SQL Like arithmetic in elementary school , Only addition, subtraction, multiplication and division , Discrete datasets and SPL It is equivalent to increasing the logarithm of the power square exponent of middle school . Addition, subtraction, multiplication and division can cope with daily shopping and shopping , But to build an airplane building, you must use more mathematics .
I know that , Let's look at the above-mentioned surpassing on domestic chips Oracle stay Intel The performance on the chip is not magical . Even domestic chips still have a long way to go , be based on SPL Create complete autonomy 、 Efficient domestic databases can also become a reality , So that domestic chips can also take off with wings .
SPL The future of
Of course ,SPL There is still a long way to go , Currently, the released functions are only for OLAP( Data analysis ) scene , It mainly solves the problem of data calculation . We know , Besides calculation, there are transactions in the database , It's often said OLTP Ability . In the transaction oriented scenario ,SPL We will still solve all kinds of problems faced by the current database through innovation .
Or innovation?
Now the cloud on database is the general trend , But simply moving relational databases from the local to the cloud does not reflect the characteristics of Cloud Applications . The basic characteristics of cloud applications are The diversity of data structures . Cloud database should provide services for multiple users at the same time , Different users may have different data structures , The data structure of the same user in different periods of time will also change , In this way, a large number of data with different structures will be accumulated and stored and calculated together . This will face personalization ( Different data structures ) Contradiction with a large number of users , This is a problem that relational databases cannot solve .
in fact ,50 This problem was not considered in the design of the database born years ago ( It's impossible to think of 50 Demand after years ), Therefore, there is almost no ability to design data processing for diverse structures in relational algebra . If you want to solve this problem, you can no longer use the relational algebra system .
meanwhile , Relational databases cost too much to achieve consistency , The consumption of resources is serious , Leading to a decline in concurrency . High concurrency is a typical feature of Cloud Applications , This has become a pair of irreconcilable contradictions . The reason for this problem lies in its data organization mechanism ( data type ), This is still determined by its theoretical relational algebra . If you want to give consideration to consistency and high concurrency at the same time, you have to break the limitation of relational algebra , Organize and store data in another way .
Only by breaking through theoretical limitations can we fundamentally solve the problem ,SPL( Discrete data sets ) At the right time !
This future is not far away ,SPL oriented OLTP The function of has been polished in the laboratory for several years , After a period of improvement, you can light your sword and get out of your body , At that time, the domestic database completely based on the independent original theory will break through the sky .
transcend
meanwhile , Theoretical innovation may also bring another result , That's it : transcend ! Surpass foreign products in the field of database .
We understand , As a chaser , Adopting technology following strategy is hopeless . At present, the vast majority of domestic databases are still relational databases , It can be said that they are all technology followers . Foreign giants have been doing these things for decades , People are strong and money accumulates well , I don't have three heads and six arms , Why surpass others ? The only possibility is that the opponent makes mistakes , But as the top ten, we can't count on the front N Two opponents make mistakes at the same time . And hope that some policy will shut out foreign products , It's also a little unpromising, not , And this is unlikely to happen in this open era .
Then only innovation !
database , We must do better than our opponents , A lot better , Only in this way can we have a chance to surpass , In order to make up for the ecological imperfection . And do better , We need disruptive technology , In the face of new technology, we are on the same starting line with our opponents .
Relational databases have been invented for decades , It has long been unable to meet the more complex application requirements and more powerful hardware environment , Many seemingly simple problems are very difficult to do , Development and maintenance costs are high , Nor can we make full use of computer resources , Endure low performance helplessly .
For those relational database giants , To account to the shareholders , It is necessary to maintain a stable income , It can't just kill itself , The result is a relatively unfavorable situation . This gives opportunities for products that can innovate at the theoretical level , Achieving transcendence is not wishful thinking .
No matter how high-grade the carriage is, it is still a carriage , In any case, optimization still depends on the horse . Nascent car , Of course, there will be all kinds of unaccustomed in operation , There will also be many disappointments in function . But it's engine driven , Over time, it will continue to improve , Its great advantage will surely crush the carriage in an all-round way .
Let's wait and see , Let us also forge ahead !
blockbuster ! Open source SPL The exchange group was established
Easy to use SPL Open source !
In order to provide a platform for interested technicians to communicate with each other ,
Specially opened an exchange group ( The group is completely free , No advertising, no classes )
Friends who need to join the group , Long press to scan the QR code below
Friends interested in this article , Please read the original text to collect ^_^
边栏推荐
- UE4 - how to make a simple TPS role (I) - create a basic role
- MySQL winter vacation self-study 2022 11 (7)
- "Hands on learning in depth" Chapter 2 - preparatory knowledge_ 2.3 linear algebra_ Learning thinking and exercise answers
- Six stone management: why should leaders ignore product quality
- 原型图设计
- Redis cluster deployment based on redis5
- Pure QT version of Chinese chess: realize two-man, man-machine and network games
- Introduction to robotframework (I) brief introduction and use
- [Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 21
- 事故指标统计
猜你喜欢
Referenceerror: primordials is not defined error resolution
高数_向量代数_单位向量_向量与坐标轴的夹角
[matlab] access of variables and files
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 16
MySQL winter vacation self-study 2022 11 (9)
有没有完全自主的国产化数据库技术
CobaltStrike-4.4-K8修改版安装使用教程
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 24
Déduisez la question d'aujourd'hui - 729. Mon emploi du temps I
Fault analysis | analysis of an example of MySQL running out of host memory
随机推荐
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 15
Briefly describe the implementation principle of redis cluster
2345文件粉碎,文件强力删除工具无捆绑纯净提取版
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 22
事故指标统计
Redis delete policy
MySQL winter vacation self-study 2022 11 (8)
Taobao focus map layout practice
Communication between microservices
力扣今日題-729. 我的日程安排錶 I
Differences and usage scenarios between TCP and UDP
How does yyds dry inventory deal with repeated messages in the consumption process?
[Digital IC manual tearing code] Verilog asynchronous reset synchronous release | topic | principle | design | simulation
力扣今日题-729. 我的日程安排表 I
tcpdump: no suitable device found
"Hands on learning in depth" Chapter 2 - preparatory knowledge_ 2.3 linear algebra_ Learning thinking and exercise answers
CSP date calculation
Building the prototype of library functions -- refer to the manual of wildfire
[Yunju entrepreneurial foundation notes] Chapter II entrepreneur test 20
[postgraduate entrance examination English] prepare for 2023, learn list5 words