Ant Group's time series database CeresDB is officially open source
2022-08-04 19:32:00 【Alipay Technology】
“I'm curious, why is it called CeresDB?”
“I was watching the hit TV series The Expanse (《苍穹浩瀚》) at the time. Ceres was the first asteroid ever discovered, CeresDB sounds a bit like TimeSeries, and it is easy to say.”
“...That simple?”
“Maybe it was just fate.”
1. Background
CeresDB was born at Ant Group. It is a distributed, highly available, highly reliable time series database (TSDB). Hardened over years of Double 11 shopping festivals, it serves as the time series store for Ant's site-wide monitoring data, absorbing trillions of data points every day and supporting multidimensional queries. Today CeresDB is officially open source. Through open source we hope to help users address the pain points of scale and high availability in time series storage, as well as the need for complex analytical computation over time series data. The open-source launch coincides with the release of our open source version 0.2.0.
2. Introduction to time series data
Time series data is a collection of data points indexed by time. Plotting the points against a time axis and connecting them into lines lets you look backward to build multidimensional reports that reveal trends, regularities, and anomalies, and look forward to do big data analysis and machine learning for prediction and early warning.
The term "time series data" comes up often in IoT and in analytical scenarios such as anomaly detection, as in the figure below:
As you can see, time series data is a sequence of values recorded along the time dimension, and consecutive values are comparable, so useful information can be extracted by observing how the sequence changes over time.
The raw data points behind the figure above are:
host=192.168.0.1,cluster=A | timestamp=2022-06-26 15:00 ==> 4
host=192.168.0.1,cluster=A | timestamp=2022-06-26 15:01 ==> 4.8
host=192.168.0.1,cluster=A | timestamp=2022-06-26 15:02 ==> 3.9
host=192.168.0.1,cluster=A | timestamp=2022-06-26 15:03 ==> 4.7
host=192.168.0.1,cluster=A | timestamp=2022-06-26 15:04 ==> 3.9
host=192.168.0.1,cluster=A | timestamp=2022-06-26 15:05 ==> 3.5
host=192.168.0.2,cluster=B | timestamp=2022-06-26 15:00 ==> 2.2
host=192.168.0.2,cluster=B | timestamp=2022-06-26 15:01 ==> 1.3
host=192.168.0.2,cluster=B | timestamp=2022-06-26 15:02 ==> 2.9
host=192.168.0.2,cluster=B | timestamp=2022-06-26 15:03 ==> 2.6
host=192.168.0.2,cluster=B | timestamp=2022-06-26 15:04 ==> 1.4
host=192.168.0.2,cluster=B | timestamp=2022-06-26 15:05 ==> 1.8
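To make the structure of these points concrete, here is a minimal Python sketch that parses lines in the text format above and groups them into one series per tag set. The function names (`parse_point`, `group_series`) are illustrative assumptions and have nothing to do with CeresDB's own ingestion path.

```python
from collections import defaultdict
from datetime import datetime

# A few of the raw lines from the listing above.
RAW = """\
host=192.168.0.1,cluster=A | timestamp=2022-06-26 15:00 ==> 4
host=192.168.0.1,cluster=A | timestamp=2022-06-26 15:01 ==> 4.8
host=192.168.0.2,cluster=B | timestamp=2022-06-26 15:00 ==> 2.2
"""


def parse_point(line):
    """Split one text line into (tags, timestamp, value)."""
    tag_part, rest = line.split(" | ")
    ts_part, value_part = rest.split(" ==> ")
    # Sort the tag pairs so the same tag set always yields the same key.
    tags = tuple(sorted(tuple(kv.split("=", 1)) for kv in tag_part.split(",")))
    ts = datetime.strptime(ts_part.split("=", 1)[1], "%Y-%m-%d %H:%M")
    return tags, ts, float(value_part)


def group_series(text):
    """Group points by tag set: one time series per distinct tag combination."""
    series = defaultdict(list)
    for line in text.strip().splitlines():
        tags, ts, value = parse_point(line)
        series[tags].append((ts, value))
    return dict(series)


series = group_series(RAW)
for tags, points in series.items():
    print(dict(tags), points)
```

Each distinct combination of tags (here `host` and `cluster`) identifies one series, and the timestamp orders the values within it.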
2.1 Characteristics of data writes
Continuous, steady, high-throughput writes: data is usually written at a fixed frequency
More writes than reads: usually only a few key metrics, or the metrics for specific scenarios, are of interest
The newest data is written in real time: as time advances, newly written data carries the latest timestamps, and old data is rarely or never updated
2.2 Characteristics of data query and analysis
Reads are by time range
Recent data is the most likely to be read
Historical data is more likely to be queried at coarse granularity
Queries at multiple precisions
Queries across multiple dimensions
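The query patterns listed above can be sketched in a few lines of Python: a series kept sorted by timestamp supports range reads through binary search, and a coarser precision is obtained by averaging points into fixed-width buckets. The `Series` class and `downsample_avg` function are hypothetical names for illustration only; timestamps are plain integers in seconds.

```python
from bisect import bisect_left


class Series:
    """One time series kept sorted by timestamp (integer seconds)."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, ts, value):
        # This sketch assumes in-order writes, the common case for time series.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order write not handled in this sketch")
        self.timestamps.append(ts)
        self.values.append(value)

    def read_range(self, start, end):
        """Return points with start <= ts < end, located by binary search."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


def downsample_avg(points, step):
    """Average points into buckets of `step` seconds (a coarser precision)."""
    buckets = {}
    for ts, value in points:
        bucket = ts - ts % step
        total, count = buckets.get(bucket, (0.0, 0))
        buckets[bucket] = (total + value, count + 1)
    return [(b, total / count) for b, (total, count) in sorted(buckets.items())]


s = Series()
for i, v in enumerate([4.0, 4.8, 3.9, 4.7, 3.9, 3.5]):
    s.append(i * 60, v)  # one point per minute

recent = s.read_range(60, 240)                      # fine-grained range read
coarse = downsample_avg(s.read_range(0, 360), 180)  # 3-minute averages
```

Recent data is typically read at full precision with something like `read_range`, while older data is more often served from a coarser view such as `coarse`.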
2.3 Characteristics of data storage
Large volume: terabytes or even petabytes, so data compression is key to reducing cost
Clear hot/cold split: the older the data, the less likely it is to be queried and the lower the time precision it requires
Time-bound: data has a life cycle, and data past its life cycle can be cleaned up
Multi-precision storage: to balance storage cost and query efficiency, data needs to be downsampled to several coarser time precisions for storage
Pre-aggregation is possible: many queries use fixed conditions, so for query efficiency pre-aggregated results are usually provided to avoid recomputing on every query
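Two of the storage properties above, compression and life-cycle cleanup, can be illustrated with a small Python sketch. Delta encoding is used here only as a simple, well-known example of why regularly spaced timestamps compress well; it is an assumption for illustration and is not claimed to be the encoding CeresDB uses. The function names are likewise hypothetical.

```python
def delta_encode(timestamps):
    """Store the first timestamp, then only the gaps between neighbours."""
    if not timestamps:
        return []
    return [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]


def delta_decode(encoded):
    """Rebuild the original timestamps by accumulating the gaps."""
    out = []
    current = 0
    for i, x in enumerate(encoded):
        current = x if i == 0 else current + x
        out.append(current)
    return out


def expire(points, now, ttl):
    """Drop points older than `ttl` seconds (life-cycle cleanup)."""
    cutoff = now - ttl
    return [(ts, v) for ts, v in points if ts >= cutoff]


timestamps = [1_000_000, 1_000_060, 1_000_120, 1_000_180]
encoded = delta_encode(timestamps)  # small, repetitive gaps instead of large numbers
restored = delta_decode(encoded)

kept = expire([(100, 1.0), (200, 2.0), (300, 3.0)], now=300, ttl=100)
```

With a fixed write interval, the encoded gaps are small and repetitive, which is the kind of regularity a general-purpose compressor or a specialized encoding can exploit.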
Because time series data is recorded at such scale, real-time writes become a bottleneck and the cost of storing massive amounts of data becomes a new technical challenge. Traditional databases such as MySQL do not take full advantage of the characteristics of time series data, so they struggle with the resulting write performance, analysis performance, and storage cost problems. When time series data is stored in a traditional database, an enterprise's operation and maintenance costs usually rise sharply.
3. The state of time series databases
It is easy to see from the figure above (from DB-Engines Ranking) that time series databases have been developing rapidly in recent years. A number of excellent open source products have emerged and pushed the technology to greater depth.
Looking at the time series database ranking (figure above, from DB-Engines TimeSeries), the top three by popularity are InfluxDB, Kdb+, and Prometheus. The most popular time series database today is undoubtedly InfluxDB, which has long topped the DB-Engines TimeSeries category. InfluxDB has in fact built an ecosystem around time series data: besides the time series store itself, there are many companion projects for alerting, log collection, and so on. Unfortunately, the open source version of InfluxDB provides only a single-node storage engine; clustering is offered as a commercial service.
About two years ago, InfluxDB started a project called InfluxDB IOx. IOx is pronounced "eye-ox" and is short for iron oxide, a nod to rust (Rust). The IOx project is aimed at time series OLAP.
Yes, you read that right: IOx is written in Rust, and the main technology stack of our open source CeresDB is also Rust.
Another popular time series database is Kdb+. Those familiar with Kdb+ know that it is close to standard equipment at Wall Street investment banks and at intraday and high-frequency quant funds. Kdb+ is both a database (kdb) and a vector language (q). It is an in-memory, column-oriented database built on the concept of ordered (time-ordered) tables, and its main data is held in RAM. Kdb+ queries are very fast, which I attribute mainly to three points:
Kdb+ makes full use of vectorization. Vectorization applies one operation to many data points at once, which greatly reduces the number of operations needed and removes the overhead of repeating the operation for every piece of data;
Kdb+ has a built-in query language, so computation runs directly inside the engine with no extra data transfer. Data can be analyzed in place, without moving it over the network to a separate analysis layer. For workloads that scan a large amount of raw data but produce a small aggregated result, in-place analysis can speed up queries by orders of magnitude;
A columnar storage structure is more efficient for analytical queries. The benefits of column storage for queries that touch only a few columns, and its friendliness to vectorized operations, are well known, so I will not elaborate here.
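The difference between row-oriented and column-oriented layouts mentioned above can be shown with a small Python sketch using only the standard library. It illustrates the layout idea, not Kdb+ itself, and plain Python does not perform real SIMD vectorization; the function names are hypothetical.

```python
from array import array

# Row-oriented layout: one record per data point.
rows = [
    {"host": "192.168.0.1", "ts": 0, "value": 4.0},
    {"host": "192.168.0.1", "ts": 60, "value": 4.8},
    {"host": "192.168.0.1", "ts": 120, "value": 3.9},
]


def to_columns(rows):
    """Pivot row records into one sequence per column."""
    return {
        "host": [r["host"] for r in rows],
        "ts": array("q", (r["ts"] for r in rows)),        # 64-bit integers
        "value": array("d", (r["value"] for r in rows)),  # contiguous doubles
    }


def avg_row_store(rows):
    # Walks every row record even though only one field is needed.
    return sum(r["value"] for r in rows) / len(rows)


def avg_column_store(columns):
    # Reads only the "value" column, stored contiguously as doubles.
    col = columns["value"]
    return sum(col) / len(col)


columns = to_columns(rows)
row_avg = avg_row_store(rows)
col_avg = avg_column_store(columns)
```

Both functions return the same average, but the columnar version never touches `host` or `ts`. On real hardware, a contiguous column of doubles is also the shape that vectorized instructions operate on.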
Third is Prometheus. In the cloud native era, Prometheus is the de facto monitoring standard. It is more than a time series database: it is a monitoring system with a full set of functions for data scraping, retrieval, graphing, and alerting. Prometheus was heavily inspired by Google's internal Borgmon system and implements monitoring on a pull model. It is worth emphasizing that PromQL is a query language born for monitoring.
4. What are CeresDB's advantages?
What we are open-sourcing is the latest version of CeresDB. Having introduced three well-known time series databases above, some readers may ask: “How does CeresDB differ from them? Is CeresDB reinventing the wheel?” Many time series databases in the industry focus on monitoring or IoT and support multidimensional queries, but few also handle analytical scenarios over massive amounts of data.
By contrast, the open source CeresDB, built on the internal version, is positioned as a high-performance, distributed, schema-less, next-generation time series database. Unlike traditional time series databases, CeresDB aims to handle not only data with conventional time series characteristics (Timeseries) but also complex analytical scenarios.
In traditional time series databases, the common approach is to build an inverted index on tags. In practice, however, tag complexity varies across scenarios, and in some it is so high that the inverted index stops working altogether. For such scenarios, the approach typical of analytical databases (scan plus pruning) often works better. Drawing on extensive practical experience with time series data inside Ant, CeresDB's overall design considers both Timeseries and Analytical scenarios and uses different storage and query modes for each, in order to perform well across the board.
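The two access paths contrasted above can be sketched side by side in Python. The first builds an inverted index from tag pairs to series ids; the second scans blocks of rows but skips any block whose time range cannot match the query. The data layout (`min_ts`, `max_ts`, `rows`) and all function names are assumptions made for this illustration and do not describe CeresDB's actual storage format.

```python
from collections import defaultdict


def build_inverted_index(series_tags):
    """Map each (tag key, tag value) pair to the set of series ids carrying it."""
    index = defaultdict(set)
    for series_id, tags in series_tags.items():
        for key, value in tags.items():
            index[(key, value)].add(series_id)
    return index


def lookup(index, **conditions):
    """Intersect the posting sets of all tag conditions."""
    sets = [index.get((k, v), set()) for k, v in conditions.items()]
    return set.intersection(*sets) if sets else set()


def scan_with_pruning(blocks, start, end, predicate):
    """Scan rows, skipping blocks whose [min_ts, max_ts] cannot overlap [start, end)."""
    scanned_blocks = 0
    hits = []
    for block in blocks:
        if block["max_ts"] < start or block["min_ts"] >= end:
            continue  # pruned using block metadata only
        scanned_blocks += 1
        for row in block["rows"]:
            if start <= row["ts"] < end and predicate(row):
                hits.append(row)
    return hits, scanned_blocks


series_tags = {
    1: {"host": "192.168.0.1", "cluster": "A"},
    2: {"host": "192.168.0.2", "cluster": "B"},
    3: {"host": "192.168.0.3", "cluster": "A"},
}
index = build_inverted_index(series_tags)
cluster_a = lookup(index, cluster="A")

blocks = [
    {"min_ts": 0, "max_ts": 59, "rows": [
        {"ts": 0, "host": "192.168.0.1", "value": 4.0},
        {"ts": 30, "host": "192.168.0.2", "value": 2.2},
    ]},
    {"min_ts": 60, "max_ts": 119, "rows": [
        {"ts": 60, "host": "192.168.0.1", "value": 4.8},
        {"ts": 90, "host": "192.168.0.2", "value": 1.3},
    ]},
]
hits, scanned = scan_with_pruning(
    blocks, 60, 120, lambda row: row["host"] == "192.168.0.1"
)
```

The index grows with the number of distinct tag combinations, which is why it becomes costly when tags are highly varied. The scan path keeps no per-series structure and relies on cheap block metadata to avoid reading most of the data.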
CeresDB will be compatible with the protocols and ecosystems of traditional time series databases such as Prometheus and OpenTSDB while providing high-throughput writes. It will also support SQL queries and provide analytical (OLAP) capabilities similar to those of Kdb+ and InfluxDB IOx.
5. Why open source?
CeresDB is still at a relatively early stage. Version 0.2.0, released together with the open-source launch, contains only a single-node analytical time series database with an Analytical Engine, plus a basic distributed solution. That said, it is a usable version and has already been deployed in some scenarios within Ant Group. Before this release we accumulated a great deal of experience with time series workloads inside Ant, and in recent years our R&D in this field has also drawn heavily on the community. Now we want to give back to the community a next-generation product that embodies the team's years of experience.
In addition to CeresDB, we will open-source more components in this field later. We believe that developing in the open will help us keep absorbing ideas from the community and move the project forward. Once the project is open source, all R&D and related work will run transparently in the community, and we hope more people will take part in its development. Our long-term goal is for CeresDB to become foundational software that developers genuinely embrace.
6. Release 0.2.0 overview
In v0.2.0, CeresDB completes development of the time series analytical engine and supports reads and writes through common SQL. It implements a static distributed deployment scheme and completes some preliminary work on a cloud native distributed cluster solution. The documentation has also been improved to help developers use and understand CeresDB.
Main features of version 0.2.0:
● Time series analytical engine and table model, with reads and writes through common SQL
● Underlying storage supports local files and Alibaba Cloud OSS
● WAL supports local RocksDB and OBKV
● Deployment supports single-node mode and configuration-file-based distributed mode
● Support for the MySQL wire protocol
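As a rough illustration of what a table model with SQL reads and writes looks like for time series data, the following Python sketch uses the standard library's SQLite as a stand-in. This is not CeresDB: the table name, column names, and types are assumptions for illustration, and CeresDB's own DDL and data types are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for a SQL-speaking TSDB

# Tags identify the series, ts orders the points, value is the measurement.
conn.execute(
    """
    CREATE TABLE demo (
        host    TEXT    NOT NULL,  -- tag column
        cluster TEXT    NOT NULL,  -- tag column
        ts      INTEGER NOT NULL,  -- timestamp in seconds
        value   REAL    NOT NULL   -- field column
    )
    """
)

conn.executemany(
    "INSERT INTO demo (host, cluster, ts, value) VALUES (?, ?, ?, ?)",
    [
        ("192.168.0.1", "A", 0, 4.0),
        ("192.168.0.1", "A", 60, 4.8),
        ("192.168.0.1", "A", 120, 3.9),
        ("192.168.0.2", "B", 0, 2.2),
        ("192.168.0.2", "B", 60, 1.3),
        ("192.168.0.2", "B", 120, 2.9),
    ],
)

# A typical analytical read: aggregate per tag over a time range.
result = conn.execute(
    """
    SELECT host, AVG(value)
    FROM demo
    WHERE ts >= ? AND ts < ?
    GROUP BY host
    ORDER BY host
    """,
    (0, 180),
).fetchall()
conn.close()
```

The point of the table model is that series identity (tags), time, and values are all ordinary columns, so familiar SQL filtering and aggregation apply directly.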
7. Join the CeresDB community
Are you planning or building a project that stores and analyzes time series data? Are you struggling to tune an existing time series store that holds a huge number of time lines?
You are very welcome to join the CeresDB open source community. We look forward to your participation:
Main GitHub repository:
https://github.com/CeresDB/ceresdb
The detailed roadmap is available at: https://github.com/CeresDB/ceresdb/blob/main/docs/dev/roadmap.md
This article is shared from the WeChat official account 支付宝技术 (Ant-Techfin).