
OLAP - Druid introduction

2022-06-22 23:47:00 IT_ one 's mind settles as still water

Contents

Background

Features

Basic concepts

Design principles

Data format

Data ingestion

Data query

Applicable scenarios


Background

Druid is a distributed data store that supports real-time analytics. Put simply, it is a high-performance real-time analytics database. It was created in 2011 by the American advertising technology company MetaMarkets and open-sourced in 2012. Its original website was http://druid.io/. Druid is now released under the Apache License 2.0 as an Apache project (it went through the Apache Incubator), and its code is hosted on GitHub. The current official website is https://druid.apache.org/

Note: Alibaba also open-sourced a project named Druid, which is a database connection pool. It only shares a name with the Druid described here; the two projects are unrelated.

Features

1. Fast queries

Druid keeps data in memory where it can, which speeds up queries and gives it fast aggregation and fast OLAP query capabilities. Combined with its multi-tenant design, this makes it well suited to user-facing analytics applications. Data can be aggregated at granularities such as 1 minute, 5 minutes, 1 hour, or 1 day.

2. Real-time data ingestion

Druid supports real-time streaming ingestion of event-driven data and keeps events timely and consistent across real-time and offline environments. It follows a typical Lambda architecture: historical data is never modified, while newly arriving data becomes queryable in real time.

3. Scalable, petabyte-level storage

With its scalable distributed architecture, a Druid cluster can be expanded to petabytes of data and can ingest millions of events per second. Queries stay timely even as the data volume grows. Aggregated data can be partitioned by time range.
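To make the idea of partitioning by time range concrete, here is a minimal Python sketch. It is not Druid's actual storage code: the event fields, the one-day chunk size, and the helper names (`day_chunk`, `partition_by_day`, `chunks_for_range`) are all assumptions made for illustration. It only shows why a query over a time range needs to touch just the chunks that overlap that range.

```python
from collections import defaultdict
from datetime import datetime, timezone


def day_chunk(ts: datetime) -> str:
    """Return the UTC calendar day containing ts as an ISO date string."""
    return ts.astimezone(timezone.utc).date().isoformat()


def partition_by_day(events):
    """Group events into per-day chunks (hypothetical one-day chunk size)."""
    chunks = defaultdict(list)
    for event in events:
        chunks[day_chunk(event["timestamp"])].append(event)
    return dict(chunks)


def chunks_for_range(chunks, start: str, end: str):
    """Keep only the chunks whose day falls in the half-open range [start, end)."""
    return {day: rows for day, rows in chunks.items() if start <= day < end}


# Hypothetical sample events.
events = [
    {"timestamp": datetime(2022, 6, 21, 23, 59, tzinfo=timezone.utc), "clicks": 1},
    {"timestamp": datetime(2022, 6, 22, 0, 1, tzinfo=timezone.utc), "clicks": 2},
    {"timestamp": datetime(2022, 6, 22, 12, 0, tzinfo=timezone.utc), "clicks": 3},
]

chunks = partition_by_day(events)
selected = chunks_for_range(chunks, "2022-06-22", "2022-06-23")
```

Here a query for 2022-06-22 reads one chunk and skips the other entirely, which is the property that lets time-partitioned data scale out across machines.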

4. Cloud-native architecture, high fault tolerance

Druid can run on commodity hardware or in the cloud. It can ingest data from a variety of data systems, including Hadoop, Spark, Kafka, Storm, and Samza.

Basic concepts

Design principles

1. Fast Query: Partial Aggregate + In-Memory + Index

2. Horizontal Scalability: Distributed Data + Parallelizable Query

3. Realtime Analytics: Immutable Past, Append-Only Future
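The first two principles can be sketched together: each data partition computes a partial aggregate on its own, the partitions are processed in parallel, and the partial results are merged at the end. The Python below is only an illustration under assumed names (`segments`, `partial_aggregate`, `merge_partials`) and a thread pool standing in for separate nodes; it is not how Druid is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(segment):
    """Compute a per-segment (sum, count) of clicks for each country."""
    partial = {}
    for row in segment:
        total, count = partial.get(row["country"], (0, 0))
        partial[row["country"]] = (total + row["clicks"], count + 1)
    return partial


def merge_partials(partials):
    """Merge per-segment partial results into one final result."""
    merged = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            merged_total, merged_count = merged.get(key, (0, 0))
            merged[key] = (merged_total + total, merged_count + count)
    return merged


# Three hypothetical, immutable data partitions.
segments = [
    [{"country": "US", "clicks": 3}, {"country": "CN", "clicks": 5}],
    [{"country": "US", "clicks": 4}],
    [{"country": "CN", "clicks": 1}, {"country": "CN", "clicks": 2}],
]

# Each partition is aggregated independently; the pool stands in for many nodes.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(partial_aggregate, segments))

result = merge_partials(partials)
```

Because sums and counts can be merged in any order, adding more partitions (and more workers) does not change the answer, which is what makes the query parallelizable.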

Data format

Before ingesting data into Druid, you first define a data source (dataSource). A dataSource is made up of a timestamp column (Timestamp), dimension columns (Dimension), and metric columns (Metric).

Timestamp column: Druid aggregates rows whose timestamps are close together, and every query specifies a time range.

Dimension columns: identify the dimensions along which statistics are broken down.

Metric columns: the columns used for aggregation and calculation, such as count and sum.
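The three column types can be illustrated with a small roll-up sketch in Python. Everything here is hypothetical: the column names (`page`, `country`, `bytes`), the hourly granularity, and the `rollup` helper are assumptions chosen for the example, not Druid's own code. Rows that share the same truncated timestamp and the same dimension values are collapsed into one row holding the aggregated metrics.

```python
from collections import defaultdict
from datetime import datetime


def truncate_to_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour (assumed granularity)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def rollup(rows):
    """Aggregate metrics for rows sharing a time bucket and dimension values."""
    aggregated = defaultdict(lambda: {"count": 0, "bytes_sum": 0})
    for row in rows:
        # Key = truncated timestamp + dimension columns.
        key = (truncate_to_hour(row["timestamp"]), row["page"], row["country"])
        # Metric columns: a row count and a sum.
        aggregated[key]["count"] += 1
        aggregated[key]["bytes_sum"] += row["bytes"]
    return dict(aggregated)


# Hypothetical raw events: one timestamp, two dimensions, one numeric field.
raw = [
    {"timestamp": datetime(2022, 6, 22, 10, 5), "page": "/home", "country": "US", "bytes": 100},
    {"timestamp": datetime(2022, 6, 22, 10, 40), "page": "/home", "country": "US", "bytes": 250},
    {"timestamp": datetime(2022, 6, 22, 11, 2), "page": "/home", "country": "US", "bytes": 50},
]

rolled = rollup(raw)
```

The first two events fall in the same hour with the same dimension values, so they become a single stored row; the third lands in the next hour and stays separate.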

Data ingestion

Druid supports two modes of data ingestion: real-time (streaming) and batch.
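As a rough sketch of what the two modes look like in practice, the Python below builds one streaming (Kafka) spec and one batch spec that share the same schema. The data source name, topic, broker address, file path, and column names are all hypothetical, and the field layout follows the general shape of Druid ingestion specs as I understand them; field names can differ between Druid versions, so check the official documentation for your release before using anything like this.

```python
import json

# Shared schema: timestamp, dimensions, and metrics (all names are hypothetical).
data_schema = {
    "dataSource": "page_events",
    "timestampSpec": {"column": "timestamp", "format": "iso"},
    "dimensionsSpec": {"dimensions": ["page", "country"]},
    "metricsSpec": [
        {"type": "count", "name": "count"},
        {"type": "longSum", "name": "bytes_sum", "fieldName": "bytes"},
    ],
    "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "hour",
        "rollup": True,
    },
}

# Real-time ingestion: assumed shape of a Kafka supervisor spec.
streaming_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": data_schema,
        "ioConfig": {
            "topic": "page_events",  # hypothetical topic
            "inputFormat": {"type": "json"},
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
        },
        "tuningConfig": {"type": "kafka"},
    },
}

# Batch ingestion: assumed shape of a parallel index task reading local files.
batch_spec = {
    "type": "index_parallel",
    "spec": {
        "dataSchema": data_schema,
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "local", "baseDir": "/tmp/page_events", "filter": "*.json"},
            "inputFormat": {"type": "json"},
        },
        "tuningConfig": {"type": "index_parallel"},
    },
}

# Serialize to JSON, the form in which such specs are submitted to a cluster.
streaming_json = json.dumps(streaming_spec, indent=2)
batch_json = json.dumps(batch_spec, indent=2)
```

The point of the sketch is that both modes describe the same timestamp, dimension, and metric columns; only the input side (a stream versus a set of files) changes.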

Data query

Druid supports two kinds of queries: native queries and SQL.
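For comparison, here is a hedged Python sketch of the same question asked both ways: an hourly sum over one day. The data source `page_events` and the metric `bytes_sum` are hypothetical, and the snippet only builds the request bodies; it does not contact a cluster. The native query is a JSON object, while the SQL query is a string wrapped in a small JSON envelope, both sent over Druid's HTTP API.

```python
import json

# Native query: a JSON object describing a timeseries aggregation.
native_query = {
    "queryType": "timeseries",
    "dataSource": "page_events",  # hypothetical data source
    "granularity": "hour",
    "intervals": ["2022-06-22T00:00:00Z/2022-06-23T00:00:00Z"],
    "aggregations": [
        {"type": "longSum", "name": "total_bytes", "fieldName": "bytes_sum"}
    ],
}

# SQL query: the same aggregation expressed in Druid SQL.
# __time is the timestamp column of a Druid data source.
sql_text = (
    "SELECT TIME_FLOOR(__time, 'PT1H') AS hour_start, "
    "SUM(bytes_sum) AS total_bytes "
    "FROM page_events "
    "WHERE __time >= TIMESTAMP '2022-06-22 00:00:00' "
    "AND __time < TIMESTAMP '2022-06-23 00:00:00' "
    "GROUP BY 1"
)
sql_query = {"query": sql_text}

# Request bodies as they would be posted to a cluster.
native_body = json.dumps(native_query)
sql_body = json.dumps(sql_query)
```

Native queries expose every option of the engine directly, while SQL is usually easier to write and read; which one to use is largely a matter of tooling and preference.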

Applicable scenarios

Given the features described above, Druid is a good fit for data scenarios such as:

  • Many queries, few updates

  • Queries are mainly aggregations or group-bys

  • Fast query response is required

  • Both offline and real-time data sources need to be supported

Specific business scenarios :

  • User behavior analysis

  • Real-time monitoring of service performance metrics

  • Digital marketing

  • Business intelligence / OLAP

Copyright notice
This article was written by [IT_ one 's mind settles as still water]. Please include a link to the original when reposting.
https://yzsam.com/2022/173/202206222123586778.html