Getting to know Druid.io: a real-time OLAP data analysis and storage system
2022-06-13 03:28:00 【TRX1024】
Notes compiled while studying Druid from the articles on lxw1234.com
Reference link: http://lxw1234.com/archives/2015/11/563.htm
1. Introduction
Druid is an open-source, distributed, column-oriented storage system for real-time data analysis. It supports fast aggregation, flexible filtering, millisecond-level queries, and low-latency data ingestion.
Features:
- Druid was designed with high availability in mind: the failure of any type of node does not stop Druid from working (although cluster state can no longer be updated);
- Druid's components are loosely coupled; if you do not need real-time data, you can leave out the real-time nodes entirely;
- Druid uses bitmap indexing to speed up queries over column storage, and compresses the bitmap indexes with the CONCISE algorithm, so the generated segments are much smaller than the original text files.
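To make the bitmap indexing idea concrete, here is a minimal sketch in Python. It is not Druid's implementation: the column names and sample rows are hypothetical, plain Python integers stand in for the bitmaps, and the CONCISE compression step is left out entirely. It only illustrates why a filter on several dimensions reduces to a bitwise AND.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an integer whose set bits mark matching rows."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def matching_rows(bitmap):
    """Expand a bitmap back into a sorted list of row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical sample data: two dimension columns and one metric column.
city = ["beijing", "shanghai", "beijing", "shenzhen", "beijing"]
device = ["ios", "android", "android", "ios", "ios"]
clicks = [3, 5, 2, 7, 1]

city_index = build_bitmap_index(city)
device_index = build_bitmap_index(device)

# Filter "city = beijing AND device = ios" with a single bitwise AND,
# then aggregate the metric over the surviving rows only.
selected = city_index["beijing"] & device_index["ios"]
rows = matching_rows(selected)
total_clicks = sum(clicks[r] for r in rows)
print(rows, total_clicks)
```

Because each dimension value has its own bitmap, combining filters never touches the metric column until the final aggregation, which is the property the column store relies on.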
2. Overall architecture
Druid cluster composition and data flow:
Druid contains five types of nodes: Realtime, Historical, Coordinator, Broker, and Indexer.
- Historical: a historical node is the workspace for storing and querying "historical" (non-real-time) data. It loads data segments from deep storage, responds to query requests from Broker nodes, and returns the results. A historical node usually keeps a local copy of some of the segments in deep storage, so even if deep storage is unreachable it can still serve queries on the segments it has already synced.
- Realtime: a real-time node is the workspace for storing and querying real-time data. It also responds to query requests from Broker nodes and returns the results. It periodically builds data segments and hands them off to the historical nodes.
- Coordinator: the coordinator node can be thought of as the master in Druid. It manages historical and real-time nodes through ZooKeeper, and manages data segments through the metadata stored in MySQL.
- Broker: a broker node answers external query requests. It consults ZooKeeper to determine which historical and real-time nodes exist and are serving data, forwards the request to each of them, then merges the results and returns them to the caller.
- Indexer: the indexer node is responsible for data ingestion. It loads batch and real-time data into the system and can modify data already stored in the system.
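The broker's forward-and-merge behavior described in the list above can be sketched roughly as follows. This is a toy model under stated assumptions, not Druid code: the `QueryableNode` and `Broker` classes, the segment ids, and the sample rows are all hypothetical, and the segment-to-node view that Druid obtains from ZooKeeper is replaced here by simply asking each node what it holds.

```python
from collections import Counter


class QueryableNode:
    """Hypothetical stand-in for a historical or real-time node."""

    def __init__(self, name, segments):
        self.name = name
        # segments: {segment_id: list of (dimension_value, metric) rows}
        self.segments = segments

    def query(self, segment_ids):
        """Return partial sums per dimension value for the requested segments."""
        partial = Counter()
        for segment_id in segment_ids:
            for dim, metric in self.segments[segment_id]:
                partial[dim] += metric
        return partial


class Broker:
    """Hypothetical broker: routes by segment, then merges partial results."""

    def __init__(self, nodes):
        self.nodes = nodes

    def segment_locations(self):
        # Assumption: in a real cluster this view comes from ZooKeeper.
        # The first node announcing a segment wins, so a segment held by
        # two nodes is queried only once.
        locations = {}
        for node in self.nodes:
            for segment_id in node.segments:
                locations.setdefault(segment_id, node)
        return locations

    def query(self):
        # Group segments by the node that serves them (scatter).
        per_node = {}
        for segment_id, node in self.segment_locations().items():
            per_node.setdefault(node.name, (node, []))[1].append(segment_id)
        # Merge the partial aggregates from every node (gather).
        merged = Counter()
        for node, segment_ids in per_node.values():
            merged.update(node.query(segment_ids))
        return dict(merged)


historical = QueryableNode(
    "historical-1", {"2015-11-01": [("beijing", 10), ("shanghai", 4)]}
)
realtime = QueryableNode(
    "realtime-1", {"2015-11-02": [("beijing", 2), ("shenzhen", 5)]}
)
result = Broker([historical, realtime]).query()
print(result)
```

Each node returns only a partial aggregate for the segments it holds, so the broker's merge step is a cheap sum over small dictionaries rather than a pass over raw rows.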
Druid has three external dependencies: MySQL, deep storage, and ZooKeeper.
- MySQL: stores Druid's metadata rather than the actual data. It contains three tables: "druid_config" (usually empty), "druid_rules" (rules used by the coordinator node, such as which segment is loaded from which node), and "druid_segments" (the metadata of every segment);
- Deep storage: stores segments and holds the "cold" data. Druid currently supports local disk, NFS-mounted disk, HDFS, S3, and others. Data in deep storage comes from two sources: batch ingestion and the real-time nodes;
- ZooKeeper: used by Druid to manage the current state of the cluster, for example recording which segments have been moved from real-time nodes to historical nodes.
3. Installation and configuration
Reference: http://lxw1234.com/archives/2015/11/554.htm
4. Data import
Ways to import data:
- Files: load from HDFS, S3, local files, and so on.
- Stream push: use the Tranquility tool to push data to Druid in real time. This approach is typically used with Kafka, Storm, Spark Streaming, and similar systems.
- Stream pull: pull data from an external data source. Not commonly used.