In our previous article , We talked about Multimodal index The design of the , It's a way to Lakehouse Server free and high-performance indexing subsystem , To improve query and write performance . In this blog , We discussed the mechanisms needed to build such a powerful index , Design of asynchronous indexing mechanism , Be similar to PostgreSQL and MySQL And other popular database systems , It supports index building without blocking writes .
background
Apache Hudi Transactions and updates / Delete / Change streams are added to tables on top of elastic cloud storage and open file formats . Hudi A key internal component is the transaction database kernel , It coordinates the right Hudi Reading and writing of tables . The index is the latest subsystem of the kernel . All indexes are stored internally Hudi Merge-On-Read (MOR) In the table , namely The metadata table Keep synchronized with the data table on transactions , Even in case of failure . Metadata tables are also built as Hudi The table service manages itself , Just like the data sheet .
motivation
and Hudi Currently, three indexes are supported ; file 、column_stats and bloom_filter, The number and variety of big data make adding more indexes to further reduce I/O Cost and query latency are imperative . One way to build a new index is to stop all writers , Then create a new index partition in the metadata table , Then recover the writer . As we add more indexes , This may not be ideal , because ,a) It needs to be shut down ,b) It will not expand with more indexes . therefore , You need to dynamically add and delete indexes on tables that are concurrent with writes . Asynchronous indexing has two benefits , Improve write latency and decoupling faults . For those who are familiar with the database system “CREATE INDEX” People will understand how easy it is to create an index , Don't worry about continuous writing . Add an asynchronous index to Hudi A rich set of table services is attempted for Lakehouse Bring the ease of use of similar databases 、 Reliability and performance .
Design
The core of asynchronous indexing with ongoing writes is to ensure that these writes can perform consistent updates to the index , Even if historical data is being indexed in the background . One way to deal with this problem is to lock the index partition completely , Until the historical data is indexed and then catch up . However , The possibility of conflict will only increase with the locking of long-running transactions . The solution of this problem depends on Hudi The three pillars of transaction kernel design :
Hudi File layout
Hudi The data files in the table are organized into file groups , Each filegroup contains multiple file slices . Each slice contains a basic file generated at a particular commit , And a set of log files containing updates to the basic files . This makes it possible to see fine-grained concurrency control in the next section . After initializing the filegroup and writing the basic file , Another writer can record updates to the same filegroup , And a new slice will be created .
Hybrid concurrency control
Asynchronous indexing uses a mixture of optimistic concurrency control and log based concurrency control models . Indexing is divided into two stages : Scheduling and execution .
In the scheduling process , Indexer ( The external process responsible for creating the new index ) Get a short lock , And generate an index plan for the data file , Until the last submission time t
. It initializes the metadata partition corresponding to the requested index , And release the lock after this stage . This should take a few seconds , And no index file will be written at this stage .
During execution , Indexer execution plan , The base file will be indexed ( Corresponds to the moment t
Data files for ) Write metadata partition . meanwhile , Regular ongoing writes continue to record updates to log files in the same filegroup as the base files in the metadata partition . After writing the basic document , The indexer will check t
All subsequent submissions have been completed instant, To ensure that each of them adds entries according to its index plan , Otherwise just stop gracefully . This is when optimistic concurrency control starts , Use the metadata table lock to check whether the writer has affected the overlapping files , If there is a conflict , Then stop , Graceful abort ensures that indexing can be retried idempotent .
Hudi Timeline
Hudi Maintains a schedule of all operations performed on the table at different times . Think of it as an event log , As a core part of inter process coordination . Hudi A fine-grained log based concurrency protocol is implemented on the timeline . To distinguish the index from other write operations , We have introduced a time line called “ Indexes ” New actions for . The state transition of this operation is handled by the indexer . The scheduling index will add a “indexing.requested” instant. The execution phase converts the plan into “inflight” state , Then it is finally converted to “completed” state . The indexer locks only when adding events to the timeline , Instead of locking when writing index files .
The advantages of this design are as follows :
- Data writing and indexing are separate , But they know each other .
- It can be extended to other types of indexes .
- It is suitable for batch and streaming workloads .
Use the timeline as the event log , The mixture of the two concurrency models provides excellent scalability and asynchrony , So that the indexing process and writers and other table Services ( Such as compaction and clustering) Running at the same time .
file
More details about the design and implementation of indexer , Please check out RFC-45. To set up and view the running indexer , Please follow Asynchronous index guide .
Future work
The asynchronous indexing function is Lakehouse First in architecture , It's still evolving . Although the index can be created at the same time as the writer , But deleting indexes requires table level locking , Because tables are usually read by other readers / Writer threads use . therefore , One task is to overcome the current limitation by delaying the deletion of indexes and increasing the amount of asynchrony , So that multiple indexes can be created or deleted at the same time . Another task is to enhance the availability of indexers ; And SQL Integrate with other types of indexes , For example, the second bond Bloom Indexes , be based on Lucene Secondary index of (RFC-52) etc. . We welcome more ideas and contributions from the community .
Conclusion
Hudi The multimodal index and asynchronous index functions of show , Transaction data lake is more than table format and metadata . The basic principle of distributed storage system also applies to Lakehouse framework , And the challenges appear on different scales . Asynchronous indexing on this scale will soon become a necessity . We discussed a method that can be extended to other index types 、 Scalable and non blocking design , And will continue to add more functions to the indexing subsystem based on this framework .
In depth understanding of Apache Hudi More articles on asynchronous indexing mechanism
- A thorough understanding of Apache Hudi Multi version cleaning service
Apache Hudi Provides MVCC Concurrency model , Ensure snapshot level isolation between write side and read side . In this blog, we'll show you how to configure to manage multiple file versions , In addition, we will discuss the cleaning mechanisms available to users , To learn how to maintain the required number of old file versions ...
- Apache Hudi asynchronous Compaction How to summarize
This article is about executing asynchronous Compaction Explore the different deployment models of . 1. Compaction about Merge-On-Read surface , The data uses a column Parquet Files and lines Avro File store , Updates are recorded to the increment file ...
- A thorough grasp of Apache Hudi asynchronous Clustering Deploy
1. Abstract In a previous blog post , We introduced Clustering( Clustering ) Table service to reorganize data to provide better query performance , Without reducing the intake rate , And we already know how to deploy synchronization Clustering, In this blog , We ...
- Apache Hudi Global index of heavy feature interpretation
1. Abstract Hudi Tables allow multiple types of operations , Including the very common ones upsert, Of course to support upsert,Hudi Rely on the indexing mechanism to locate which files the records are in . At present ,Hudi Supports partitioned and non partitioned datasets . A partitioned dataset is a set of files ...
- Apache Hudi Design and architecture
thank Apache Hudi contributor: Wang Xianghu translate & Contribution . Welcome to WeChat official account. :ApacheHudi This article will introduce Apache Hudi Basic concepts of . Design and overall infrastructure . 1. Jane ...
- Apache Hudi Shuang company has been integrated by the top cloud service providers in China !
Yes , Recently, Tencent cloud, a domestic cloud service provider, is in its EMR-V2.2.0 Prior integration of Hudi 0.5.1 Version as its cloud data Lake solution to provide services Apache Hudi stay HDFS Insert updates and ...
- Data Lake framework selection is very tangled ? Understand Apache Hudi Core strengths
The original English text :https://hudi.apache.org/blog/hudi-indexing-mechanisms/ Apache Hudi Use the index to locate the filegroup where the delete operation is located . about Copy-On-W ...
- Apache Hudi The efficient migration mechanism of stock table from the perspective of heavy characteristics
1. Abstract With Apache Hudi Become more and more popular , One of the challenges is how to move the inventory history table to Apache Hudi,Apache Hudi Record level metadata is maintained to provide upserts And the core of incremental pull ...
- Apache Hudi In depth analysis of the file marking mechanism of the kernel
1. Abstract Hudi Support automatic cleaning of unsuccessfully submitted data when writing .Apache Hudi When writing, a marking mechanism is introduced to effectively track the data files written to the storage . In this blog , We will explore in depth the design of the existing direct markup file mechanism , And explains its ...
- 【 translate 】 In depth understanding of python3.4 in Asyncio Library and Node.js The asynchronous IO Mechanism
Reprinted from http://xidui.github.io/2015/10/29/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3python3-4-Asyncio%E5%BA%93% ...
Random recommendation
- SQLSERVER The wechat public account has opened Sogou wechat search
SQLSERVER The wechat public account has opened Sogou wechat search Please open the link below http://weixin.sogou.com/gzh?openid=oIWsFt-hiIb_oYqQHaBMoNwRB2wM ...
- python code snippet 19
#coding=utf-8 # function def foo(x): print x foo(123) # import httplib def check_web_server(host,port,path ...
- UIWebView Simple use
stay iOS9.0 After that, it needs to be in ifo.plist Add a configuration file to the file , Otherwise, you can only recognize https The opening web address ,http The beginning is not recognized There are three main steps // 1. obtain URL NSURL *url = [NSURL ...
- chrome Learn the basic information of debugging
Learning links : remote-debugging-port relevant : http://blog.chromium.org/2011/05/remote-debugging-with-chrome-develop ...
- Java Of ++ Self increasing
I remember when I started studying in college C Language , The teacher said : There are two forms of self augmentation , Namely i++ and ++i,i++ It means to assign a value before adding 1,++i It's to add... First 1 Post assignment , After many years of understanding in this way, there was no problem , Until you come across the following code , I wonder if my understanding is wrong : ...
- In depth understanding of javascript And this
javascript Medium this It's rich in meaning , It can be a global object , Current object or random object , It all depends on how the function is called . Functions are called in the following ways : Called as object method . Call as function . Call... As a constructor .apply or c ...
- 【 turn 】 The Qualcomm platform android Environment configuration compilation and development experience summary
Original website :http://blog.csdn.net/dongwuming/article/details/12784535 1. The Qualcomm platform android Development summary 1.1 Build a platform environment and development environment for Qualcomm At Qualcomm ...
- hdu 1695 GCD A class + Euler function
Topic link seek $ x\in[1, a] , y \in [1, b] $ Inside \(gcd(x, y) = k\) Of (x, y) The logarithmic . The problem is equivalent to $ x\in[1, a/k] , y \in [1, ...
- polymer skill
1. Add one div Elements We can make such a thing by ourselves , For example, we give the following example div Element to add a is="demo-test" <script> var Poly ...
- bzoj 4712: The flood
[ Authority questions ][https://www.lydsy.com/JudgeOnline/status.php?problem_id=4712&jresult=4] This dynamic \(dp\) Finally, it's not an independent set / ...