StoneDB, the integrated real-time HTAP database: how to replace MySQL and improve analytical performance by nearly 100 times
2022-07-26 04:45:00 【StoneDB】

Mainstream industry approaches to building HTAP around MySQL
1. MySQL + Hadoop

2. MySQL + Data Lake

3. MySQL + ClickHouse/Greenplum

4. Divergent Design based on multiple replicas

These approaches share several drawbacks. The system architecture is heavy, and operation and maintenance are complex. TP data reaches the AP system through ETL, so data latency is high and it is hard to meet the business's real-time analysis requirements. A heterogeneous database combination means maintaining two database systems and many technology stacks, which places higher demands on technical staff. NewSQL systems require various compatibility adaptations, which are complex and equally demanding for engineers. To address these problems, we built StoneDB, an open source, integrated real-time HTAP database.

StoneDB connects to MySQL as a plug-in and interacts with the MySQL server layer through the query and write interfaces. The main features of the current integrated architecture are:
Data is organized in columnar storage and combined with efficient compression algorithms, which gives StoneDB both high performance and a storage cost advantage.
Approximate query and parallel processing mechanisms based on the Knowledge Grid let StoneDB minimize IO on irrelevant data when handling massive data and complex queries.
Histograms, data block bitmaps, and other statistics further speed up query processing.
A column-at-a-time execution engine designed for columnar storage further improves execution efficiency.
High-speed data loading is provided.
Let's take a look at StoneDB's architecture design:
Architecture design: data organization

In StoneDB, data is organized by column. This layout is friendly to compression algorithms: a suitable, efficient algorithm can be chosen according to each column's type, data, and other factors, which saves IO and memory resources. It also has the following advantages:
It is cache-line friendly.
During a query, operations on each column run concurrently, and the complete record set is assembled in memory at the end.
An ad hoc query scans only the columns it needs and spends no IO reading the values of other columns.
No indexes need to be maintained, and ad hoc queries on any combination of columns are supported.
It enables Knowledge Grid capabilities that improve data lookup efficiency.
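The benefit for ad hoc queries can be illustrated with a minimal Python sketch. It is not StoneDB code: the table, the column names, and the helper functions `to_columns` and `sum_where` are hypothetical. It only shows that once rows are pivoted into per-column arrays, an aggregate touches just the columns it names.

```python
# Hypothetical sample rows; table and column names are illustrative only.
rows = [
    {"order_id": 1, "seller": 86, "amount": 120},
    {"order_id": 2, "seller": 17, "amount": 75},
    {"order_id": 3, "seller": 86, "amount": 30},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_where(columns, value_col, filter_col, wanted):
    """Sum one column while filtering on another; no other column is read."""
    return sum(
        value
        for value, key in zip(columns[value_col], columns[filter_col])
        if key == wanted
    )


columns = to_columns(rows)
total = sum_where(columns, "amount", "seller", 86)
print(total)
```

Here the `order_id` column is never accessed by `sum_where`, which is the in-memory analogue of not spending IO on columns the query does not mention.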
Architecture design: column-based data compression
As mentioned above, data is organized by column and all records in a column have the same type, so an efficient compression algorithm can be chosen for each data type, because:
The probability of duplicate values within a column is high, so compression is effective.
The data node size is fixed, which maximizes compression performance and efficiency.
Compression is tailored to the specific data type (int, float, date/time, string, etc.).
StoneDB can support more than 20 adaptive compression algorithms. The ones mainly used at present are:
PPM, LZ4, B2, Delta, and others.
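To show why same-typed column data compresses well, here is a minimal sketch of generic delta encoding in Python. It is a textbook version of the idea, not StoneDB's Delta implementation, and the timestamp values are made up for illustration.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from the previous value."""
    if not values:
        return []
    encoded = [values[0]]
    for prev, cur in zip(values, values[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values with a running sum."""
    values = []
    running = 0
    for delta in encoded:
        running += delta
        values.append(running)
    return values


# Hypothetical timestamp column: large values, small gaps between neighbours.
timestamps = [1658800000, 1658800001, 1658800003, 1658800003, 1658800010]
encoded = delta_encode(timestamps)
print(encoded)
```

After the first entry, the encoded column holds only small integers, which a general-purpose compressor or a narrower integer width can store far more cheaply than the original ten-digit values.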
Architecture design: data organization structure and Knowledge Grid

Physical data is divided into fixed-size data blocks for storage. These blocks are usually called Data Nodes and are typically 128KB each, which makes it easier for the system to optimize IO efficiency. The fixed size also allows efficient block-based compression and encryption algorithms. The Knowledge Grid supports the query optimizer, query execution, and the compression algorithms. For example, when processing a query, the optimizer uses the Knowledge Grid to decide which Data Nodes to fetch for the data operations.
Data Node (DN): a data block of fixed size (typically 128KB) that optimizes IO efficiency and allows efficient block-based compression and encryption.
Knowledge Grid (KG): stores metadata.
Metadata Node (MDN): metadata describing a Data Node. It is made up of Knowledge Nodes (KN) and supports the query optimizer, plan execution, and the compression algorithms.
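The relationship between Data Nodes and their metadata can be sketched in Python as follows. This is only an illustration under stated assumptions: the `MetadataNode` fields (min, max, row count) are hypothetical, and nodes are sized by row count here rather than by the 128KB byte size mentioned above.

```python
from dataclasses import dataclass


@dataclass
class MetadataNode:
    """Summary of one data node (hypothetical fields for illustration)."""
    node_id: int
    min_value: int
    max_value: int
    row_count: int


def build_data_nodes(column, rows_per_node):
    """Split a column into fixed-size data nodes; the last node may be shorter."""
    return [
        column[start:start + rows_per_node]
        for start in range(0, len(column), rows_per_node)
    ]


def build_knowledge_grid(data_nodes):
    """Compute one metadata node per data node."""
    return [
        MetadataNode(node_id, min(node), max(node), len(node))
        for node_id, node in enumerate(data_nodes)
    ]


seller_column = [3, 9, 12, 40, 86, 86, 90, 120, 130, 200]
data_nodes = build_data_nodes(seller_column, rows_per_node=4)
grid = build_knowledge_grid(data_nodes)
for entry in grid:
    print(entry)
```

The metadata is tiny compared with the data it describes, so it can be consulted before deciding whether a Data Node needs to be read at all.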
Architecture design - query: Knowledge Grid overview

Architecture design - query: the Knowledge Grid-based optimizer



Architecture design - query: processing flow


For the query select * from xx where seller = 86, the internal execution process is as follows.
Execution plan optimization and execution:
Cost-based optimization driven by the Knowledge Grid
IO thread pool maintenance
Memory allocation and management
SMP support (concurrent queries)
Vectorized execution
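How the Knowledge Grid can cut IO for a predicate such as seller = 86 is sketched below in Python. The three-way classification of nodes (irrelevant, relevant, suspect) and the min/max metadata are assumptions made for this illustration; the article only states that the optimizer uses the Knowledge Grid to decide which Data Nodes to fetch. The sketch covers the filtering step only, not fetching the other columns for select *.

```python
def classify_node(lo, hi, target):
    """Decide from min/max metadata alone whether a node can hold the target."""
    if target < lo or target > hi:
        return "irrelevant"  # no row can match, so the node is skipped
    if lo == hi:
        return "relevant"    # every row equals the target, no row check needed
    return "suspect"         # some rows may match, so the node must be read


def scan_equals(data_nodes, metadata, target):
    """Return matching (node, offset) pairs and how many nodes were read."""
    matches = []
    nodes_read = 0
    for node_id, (lo, hi, row_count) in enumerate(metadata):
        kind = classify_node(lo, hi, target)
        if kind == "irrelevant":
            continue
        if kind == "relevant":
            matches.extend((node_id, offset) for offset in range(row_count))
            continue
        nodes_read += 1  # only suspect nodes are opened and checked row by row
        matches.extend(
            (node_id, offset)
            for offset, value in enumerate(data_nodes[node_id])
            if value == target
        )
    return matches, nodes_read


# Hypothetical seller column, already split into data nodes.
data_nodes = [[3, 9, 12, 40], [86, 86, 86, 86], [85, 86, 90, 120], [130, 200]]
# Metadata would be built once at load time, not on every query.
metadata = [(min(node), max(node), len(node)) for node in data_nodes]

matches, nodes_read = scan_equals(data_nodes, metadata, 86)
print(matches, nodes_read)
```

In this example only one of the four nodes has to be read to evaluate the filter; the other three are resolved from metadata alone.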

Fully compatible with MySQL. In both syntax and ecosystem, MySQL users can switch to StoneDB seamlessly.
Integrated transactions and analytics. No ETL is needed: transactional data is synchronized to the analysis engine in real time, so users get real-time business analysis results.
Fully open source.
10 to 100 times the AP capability of MySQL. Joins across multiple tables with hundreds of millions of rows respond quickly, so decisions do not have to wait for results.
10 times the import speed. AP scenarios involve huge volumes of analysis data, and efficient import makes for a better user experience.
1/10 of the TCO. StoneDB's efficient compression algorithms, seamless business migration, and simple architecture reduce users' TCO.
StoneDB 2.0 will bring a new architecture
StoneDB open source repository
https://github.com/stoneatom/stonedb

Has worked at Huawei, iQIYI, and Peking University on the core architecture design of database kernels. More than 10 years of experience in database kernel development, specializing in query engines, execution engines, massively parallel processing, and related technologies. Holds dozens of database invention patents and is the author of 《PostgreSQL Query Engine Source Code Technical Analysis》.

Graduated from Huazhong University of Science and Technology. Enjoys studying the architecture and source code of mainstream databases. 8 years of experience in database kernel development, previously working on kernel research and development for the distributed databases CirroData, RadonDB, and TDengine. Currently StoneDB kernel architect and a member of the StoneDB project PMC.
https://www.bilibili.com/video/BV1U3411F76U
https://www.bilibili.com/video/BV1gS4y1H7NK
https://www.bilibili.com/video/BV19f4y1Z7JB

This article is from the WeChat official account StoneDB (StoneDB2021).
This article is part of the "OSC Source Creation Plan". You are welcome to join and share.

