
Getting to Know Druid.io: A Real-Time OLAP Data Analysis and Storage System

2022-06-13 03:28:00 TRX1024


Notes organized while reading about and learning Druid on the lxw1234.com website.
Reference link: http://lxw1234.com/archives/2015/11/563.htm


1. Introduction

Druid is an open-source, distributed, column-oriented storage system for real-time data analytics. It supports fast aggregation, flexible filtering, millisecond-level queries, and low-latency data ingestion.

Features:

  • Druid is designed with high availability in mind: the failure of any type of node will not stop Druid from working (although the cluster state cannot be updated while that node is down);
  • Druid's components are loosely coupled; if you do not need real-time data, you can ignore the real-time nodes entirely;
  • Druid uses bitmap indexes to speed up queries over its columnar storage, and compresses those bitmap indexes with the CONCISE algorithm, so the generated segments are much smaller than the original text files;
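To make the bitmap-index point concrete, here is a minimal Python sketch, assuming a tiny in-memory segment with two hypothetical dimension columns (`country` and `device`). Each distinct value maps to a bitmap of the row ids that contain it, so a filter such as `country = 'US' AND device = 'ios'` becomes a single bitwise AND, and an OR filter becomes a bitwise OR. It deliberately leaves out CONCISE compression and is not Druid's actual implementation.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to a bitmap (a Python int) of the row ids holding it."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id  # set the bit for this row
    return dict(index)


def rows_of(bitmap):
    """Decode a bitmap back into an ascending list of row ids."""
    return [row_id for row_id in range(bitmap.bit_length()) if (bitmap >> row_id) & 1]


# Hypothetical example data: two dimension columns of a six-row segment.
country = ["US", "CN", "US", "DE", "CN", "US"]
device = ["ios", "android", "android", "ios", "ios", "ios"]

country_idx = build_bitmap_index(country)
device_idx = build_bitmap_index(device)

# country = 'US' AND device = 'ios'  ->  bitwise AND of two bitmaps
us_and_ios = country_idx["US"] & device_idx["ios"]
# country = 'CN' OR country = 'DE'  ->  bitwise OR of two bitmaps
cn_or_de = country_idx["CN"] | country_idx["DE"]

print(rows_of(us_and_ios))  # [0, 5]
print(rows_of(cn_or_de))    # [1, 3, 4]
```

The filter never touches the raw column values; it only combines bitmaps, which is why compressing those bitmaps (as CONCISE does) matters so much for segment size.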

2. Overall Architecture

[Figure: Druid cluster composition and data flow]
Druid contains five types of nodes: Realtime, Historical, Coordinator, Broker, and Indexer.

  • Historical: Historical nodes are the workers that store and query “historical” (non-real-time) data. They load data segments (Data/Segments) from deep storage, respond to query requests from the Broker nodes, and return the results. A Historical node usually keeps a local copy of some of the segments in deep storage, so even if deep storage becomes inaccessible, it can still query the segments it has already synchronized.
  • Realtime: Realtime nodes are the workers that store and query real-time data; they also respond to query requests from the Broker nodes and return results. A Realtime node periodically builds data segments and hands them off to the Historical nodes.
  • Coordinator: The Coordinator node can be thought of as Druid's master. It manages the Historical and Realtime nodes through ZooKeeper, and manages data segments through the metadata stored in MySQL.
  • Broker: The Broker node handles external query requests. It uses ZooKeeper to determine which Realtime and Historical nodes exist and which of them serve the data being queried, forwards the request to those nodes, and finally merges their results and returns them to the client.
  • Indexer: The indexing node is responsible for data ingestion. It loads batch and real-time data into the system and can also modify data already stored in it.
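The Broker's scatter/gather role described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the node names, the `Segment`/`Node` classes, day-granularity segments, and the "count events per day" query are all hypothetical, and the list of nodes is passed in directly rather than discovered through ZooKeeper. Each node computes a partial result over only its local segments, and the Broker adds those partial counts together.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Segment:
    day: str      # hypothetical: each segment covers exactly one day
    events: list  # raw events in this segment (only their number matters here)


@dataclass
class Node:
    name: str
    segments: list = field(default_factory=list)

    def serves_any(self, days):
        """True if this node holds at least one segment relevant to the query."""
        return any(seg.day in days for seg in self.segments)

    def count_by_day(self, days):
        """Partial result computed only over this node's local segments."""
        partial = Counter()
        for seg in self.segments:
            if seg.day in days:
                partial[seg.day] += len(seg.events)
        return partial


def broker_query(nodes, days):
    """Scatter the query to nodes serving relevant segments, then merge partial results."""
    merged = Counter()
    for node in nodes:
        if node.serves_any(days):
            merged.update(node.count_by_day(days))  # Counter.update adds counts
    return dict(sorted(merged.items()))


historical = Node("historical-1", [Segment("2022-06-11", ["a", "b"]),
                                   Segment("2022-06-12", ["a"])])
realtime = Node("realtime-1", [Segment("2022-06-13", ["a", "c", "a"])])

print(broker_query([historical, realtime], {"2022-06-12", "2022-06-13"}))
# {'2022-06-12': 1, '2022-06-13': 3}
```

The sketch assumes each day's data lives on exactly one node; summing is a valid merge here only because counts are additive across disjoint segments.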

Druid has three external dependencies: MySQL, deep storage, and ZooKeeper.

  • MySQL: Stores Druid's metadata, not the actual data. It contains three tables: “druid_config” (usually empty), “druid_rules” (rules used by the Coordinator node, such as which node should load which segment), and “druid_segments” (the metadata of every segment);
  • Deep storage: Stores the segments, i.e. the “cold data”. Druid currently supports local disk, NFS-mounted disks, HDFS, S3, and so on. Data in deep storage comes from two sources: batch ingestion and the Realtime nodes;
  • ZooKeeper: Used by Druid to manage the current cluster state, for example to record which segments have been moved from Realtime nodes to Historical nodes;
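The segment movement that ZooKeeper tracks can be illustrated with a toy model. In this Python sketch, a plain dictionary stands in for ZooKeeper's record of which node serves which segment, and another dictionary stands in for deep storage; the class, function, node, and segment names are hypothetical, and the steps that go through MySQL and the Coordinator are omitted. The ordering it shows is that the Realtime node stops serving a segment only after the Historical node has loaded and announced it, so the segment stays queryable throughout.

```python
class SegmentRegistry:
    """Hypothetical stand-in for the segment announcements kept in ZooKeeper."""

    def __init__(self):
        self._served_by = {}  # segment id -> set of node names serving it

    def announce(self, node, segment_id):
        self._served_by.setdefault(segment_id, set()).add(node)

    def unannounce(self, node, segment_id):
        nodes = self._served_by.get(segment_id, set())
        nodes.discard(node)
        if not nodes:
            self._served_by.pop(segment_id, None)  # nobody serves it any more

    def servers_for(self, segment_id):
        return sorted(self._served_by.get(segment_id, set()))


def hand_off(registry, deep_storage, historical_cache,
             realtime, historical, segment_id, data):
    # 1. The Realtime node persists the finished segment to deep storage.
    deep_storage[segment_id] = data
    # 2. The Historical node loads it into its local cache and announces it.
    historical_cache[segment_id] = deep_storage[segment_id]
    registry.announce(historical, segment_id)
    # 3. Only now does the Realtime node stop serving the segment.
    registry.unannounce(realtime, segment_id)


registry, deep_storage, historical_cache = SegmentRegistry(), {}, {}
seg = "events_2022-06-13T03"  # hypothetical segment id
registry.announce("realtime-1", seg)
print(registry.servers_for(seg))  # ['realtime-1']
hand_off(registry, deep_storage, historical_cache,
         "realtime-1", "historical-1", seg, b"rows")
print(registry.servers_for(seg))  # ['historical-1']
```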

3. Installation and Configuration

Reference: http://lxw1234.com/archives/2015/11/554.htm

4. Data Import

Ways to import data:

  • Files: batch loading from HDFS, S3, local files, and so on.

  • Stream push: Use the Tranquility tool to push data into Druid in real time. This approach is typically used with Kafka, Storm, Spark Streaming, and similar systems.

  • Stream pull: Druid pulls data directly from an external data source. This approach is less commonly used.
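For file-based ingestion, part of the work is splitting the input by time, since each Druid segment covers a time interval. The Python sketch below assumes a hypothetical CSV layout (`timestamp,page,clicks`) and an hourly segment granularity, and simply groups rows into one bucket per hour. Real Druid ingestion is configured through an ingestion spec and also handles parsing, indexing, and compression, none of which is modeled here.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def partition_by_hour(csv_text):
    """Group CSV rows into hourly buckets keyed by the bucket's start time."""
    segments = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.fromisoformat(row["timestamp"])
        bucket_start = ts.replace(minute=0, second=0, microsecond=0)
        segments[bucket_start.isoformat()].append(row)
    return dict(segments)


# Hypothetical raw file contents.
raw = """timestamp,page,clicks
2022-06-13T03:05:00,home,3
2022-06-13T03:40:00,about,1
2022-06-13T04:10:00,home,2
"""

segments = partition_by_hour(raw)
for interval_start, rows in sorted(segments.items()):
    print(interval_start, len(rows))
# 2022-06-13T03:00:00 2
# 2022-06-13T04:00:00 1
```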


Copyright notice
This article was written by [TRX1024]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/02/202202280529584555.html