OLAP - Druid introduction

2022-06-22 23:47:00 【IT_ one 's mind settles as still water】

Background

Druid is a distributed data store that supports real-time analytics. Put simply, it is a high-performance real-time analytics database. It was created in 2011 by the American advertising technology company MetaMarkets and open-sourced in 2012. Its original website was http://druid.io/. Druid is now released under the Apache License 2.0 and developed at the Apache Software Foundation, which it joined through the Apache Incubator, with the code hosted on GitHub. The current official website is https://druid.apache.org/

(Note: Alibaba also open-sourced a project called Druid, which is a database connection pool. It only shares a name with the Druid discussed here; the two projects are unrelated.)

Features

1. Fast queries

In-memory data storage improves Druid's query speed and gives it fast aggregation and fast OLAP query capabilities. Its multi-tenant design makes it well suited to user-facing analytics applications. Druid can aggregate data at granularities such as 1 minute, 5 minutes, 1 hour, or 1 day.

2. Real-time data ingestion

Druid supports real-time streaming ingestion and provides event-driven data, ensuring the timeliness and consistency of events across real-time and offline environments. It follows a typical Lambda architecture: historical data is never modified, and newly arriving data becomes queryable in real time.

3. Scalable to petabyte-level storage

With its scalable distributed architecture, a Druid cluster can be expanded to petabytes of data and ingest millions of events per second. Timeliness is preserved even as the data volume grows. Druid's aggregated data can be partitioned by time range.

4. Cloud-native architecture, high fault tolerance

Druid can run on commodity hardware as well as in the cloud. It can ingest data from a variety of data systems, including Hadoop, Spark, Kafka, Storm, and Samza.
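The aggregation granularities and time-range partitioning mentioned above both come down to mapping each event timestamp onto the start of a fixed-size time bucket. The following is a minimal Python sketch of that idea, not Druid's own implementation: the granularity names, `floor_to_granularity`, and `count_per_bucket` are hypothetical helpers introduced only for illustration, and timestamps are assumed to be timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical granularity table for this sketch; not Druid's own API.
GRANULARITIES = {
    "minute": timedelta(minutes=1),
    "five_minute": timedelta(minutes=5),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to_granularity(ts: datetime, granularity: str) -> datetime:
    """Round a UTC timestamp down to the start of its time bucket."""
    step = GRANULARITIES[granularity]
    # Number of whole buckets that have elapsed since the epoch.
    buckets = (ts - EPOCH) // step
    return EPOCH + buckets * step


def count_per_bucket(timestamps, granularity):
    """Count how many events fall into each time bucket."""
    counts = {}
    for ts in timestamps:
        bucket = floor_to_granularity(ts, granularity)
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts
```

With this sketch, an event at 23:47:31 UTC falls into the 23:45 bucket at five-minute granularity and into the 23:00 bucket at hourly granularity. Coarser buckets mean fewer stored rows but less time resolution at query time.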

Basic concepts

Design principles

1. Fast queries (Fast Query): partial aggregation (Partial Aggregate) + in-memory storage (In-Memory) + indexes (Index)

2. Horizontal scalability (Horizontal Scalability): distributed data (Distributed Data) + parallelizable queries (Parallelizable Query)

3. Real-time analytics (Realtime Analytics): Immutable Past, Append-Only Future
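The first two principles can be illustrated together: if past data is stored in immutable chunks, each chunk can be aggregated independently and in parallel, and the partial results merged at the end. The sketch below is one possible illustration in Python, not Druid's actual query engine. The `SEGMENTS` data, `partial_aggregate`, and `query_total_clicks` are hypothetical names, and a thread pool stands in for the separate nodes a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical immutable segments: each one holds (page, clicks) rows
# for one time range. Tuples are used so a segment cannot be modified.
SEGMENTS = (
    (("home", 3), ("search", 1)),   # e.g. data for hour 1
    (("home", 2), ("cart", 5)),     # e.g. data for hour 2
    (("search", 4), ("home", 1)),   # e.g. data for hour 3
)


def partial_aggregate(segment):
    """Sum clicks per page within a single segment."""
    totals = Counter()
    for page, clicks in segment:
        totals[page] += clicks
    return totals


def query_total_clicks(segments):
    """Scan the segments in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, segments))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the counts together
    return merged
```

Because the segments never change once written, no locking is needed while they are scanned, and new data can be handled by appending further segments rather than rewriting old ones.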

Data format

Before ingesting data, Druid requires a data source (DataSource) to be defined. A DataSource consists of a timestamp column (Timestamp), dimension columns (Dimension), and metric columns (Metric).

Timestamp column: Druid aggregates rows whose timestamps are close together, and a time range is specified when querying.

Dimension columns: identify the dimensions along which statistics are broken down.

Metric columns: the columns used for aggregation and calculation, such as count and sum.
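To make the three column types concrete, here is a small Python sketch that aggregates raw events into rows keyed by a truncated timestamp and the dimension values, with count and sum as the metrics. It is only an illustration of the data model described above: the event fields (`country`, `device`, `bytes`), the hourly granularity, and the `roll_up` function are all assumptions made for this example and are not taken from Druid's API.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: one timestamp, two dimensions, one numeric value.
RAW_EVENTS = [
    {"timestamp": "2022-06-22T23:01:10", "country": "CN", "device": "ios", "bytes": 100},
    {"timestamp": "2022-06-22T23:15:42", "country": "CN", "device": "ios", "bytes": 250},
    {"timestamp": "2022-06-22T23:59:59", "country": "US", "device": "android", "bytes": 80},
    {"timestamp": "2022-06-23T00:00:05", "country": "CN", "device": "ios", "bytes": 40},
]


def roll_up(events):
    """Aggregate events by (hour, country, device) into count and sum metrics."""
    table = defaultdict(lambda: {"count": 0, "bytes_sum": 0})
    for event in events:
        # Timestamp column: truncate to the start of the hour.
        hour = datetime.fromisoformat(event["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        # Dimension columns: part of the grouping key.
        key = (hour.isoformat(), event["country"], event["device"])
        # Metric columns: updated by aggregation.
        table[key]["count"] += 1
        table[key]["bytes_sum"] += event["bytes"]
    return dict(table)
```

In this example the four raw events collapse into three aggregated rows, because the first two share the same hour and the same dimension values.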