Blog recommendation | Apache Pulsar cross-region replication: scheme selection in practice
2022-07-07 12:55:00 【StreamNative】
Apache Pulsar is a cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight functional computing. Its cloud-native architecture separates compute from storage, which makes dynamic scaling straightforward. It natively supports multi-tenancy and multi-level abstraction through namespaces. Cross-region replication across multiple data centers was considered from the initial design, so Pulsar can replicate data across regions and data centers with mutual backup, covering cross-region replication in many scenarios and at several levels.
Cross-region replication
Pulsar natively supports cross-region replication. Depending on whether messages are written to the other region synchronously or asynchronously, the options can be divided into a synchronous replication scheme and an asynchronous replication scheme, and users can choose between them according to their business needs.
Asynchronous replication
Pulsar has built-in asynchronous multi-cluster cross-region replication. The Geo-replication mechanism synchronizes and backs up cluster data across data centers in different regions. When a data center cluster becomes completely unavailable, service can continue by failing over to a cluster in another data center.
Taking replication from the Region1 cluster to the Region2 cluster as an example, the Geo-replication process works as follows:
1. When a producer writes data to a topic in the Region1 cluster, the local data center persists the message to BookKeeper and, at the same time, creates a Replicator (a Replicator contains a Replication Cursor and a Replication Producer; the cursor records how far replication has progressed);
2. The Replication Producer sends the data in the Region1 topic to the corresponding topic in the remote Region2 cluster;
3. After the remote Region2 cluster receives the request from the Replication Producer, it writes the data to the Region2 topic;
4. After the data is written successfully, the remote Region2 cluster returns an ACK to the cursor of the Region1 cluster;
5. After the Region1 cluster receives the ACK, it continues sending the next message through the Replication Producer;
6. At this point, consumers in Region2 can consume the data produced by producers in the Region1 cluster, and vice versa.
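The six steps above can be sketched as a toy in-memory model. This is only an illustration, not Pulsar's implementation: the `Topic` and `Replicator` classes and their methods are hypothetical names, and a Python list stands in for BookKeeper storage.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy topic: an append-only list standing in for BookKeeper storage."""
    name: str
    messages: list = field(default_factory=list)

    def append(self, payload: str) -> bool:
        self.messages.append(payload)
        return True  # stands in for the ACK returned after a successful write


@dataclass
class Replicator:
    """Toy replicator: a cursor plus a producer that writes to the remote topic."""
    local: Topic
    remote: Topic
    cursor: int = 0  # index of the next local message to replicate

    def replicate_pending(self) -> int:
        """Send messages one at a time; advance the cursor only after an ACK."""
        sent = 0
        while self.cursor < len(self.local.messages):
            acked = self.remote.append(self.local.messages[self.cursor])
            if not acked:
                break  # keep the cursor in place so the message is retried later
            self.cursor += 1
            sent += 1
        return sent


region1 = Topic("region1/my-topic")
region2 = Topic("region2/my-topic")
replicator = Replicator(local=region1, remote=region2)

for payload in ["m1", "m2", "m3"]:
    region1.append(payload)  # step 1: the producer writes to the local topic

replicated = replicator.replicate_pending()  # steps 2 to 5
```

Because the cursor only moves after an ACK, a failed remote write leaves the cursor where it was, and replication resumes from the same message on the next attempt.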
Depending on whether data can flow in both directions between the data center clusters, Pulsar asynchronous replication can be divided into full-mesh, one-way, and failover modes:
• Full mesh: In this mode the topic behaves like one large global topic. When a producer sends a message to the topic in one cluster, the other clusters can consume the data from their own copy of the topic. All clusters that need to be connected can set the same configurationStoreServer parameter to share a global ZooKeeper. The cross-region clusters discover one another through this global ZooKeeper, and when one cluster changes, the other clusters are notified.
• One-way mode: Data is replicated from Cluster1 to Cluster2. Messages that a producer sends to a topic in Cluster1 are automatically synchronized to the same topic in Cluster2, but messages sent to the topic in Cluster2 are not synchronized back to Cluster1.
• Failover mode: A special case of one-way replication, suited to backing up data in a remote data center. The remote cluster has no producers or consumers; only after the active cluster goes down are the producers and consumers switched to the remote cluster.
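One way to picture the three modes is as a set of directed replication links between clusters. The sketch below is a simplified model with hypothetical function names, not a Pulsar API; it assumes the first cluster in the list is the source (or active) cluster for the one-way and failover modes.

```python
from __future__ import annotations


def replication_edges(mode: str, clusters: list[str]) -> set[tuple[str, str]]:
    """Return directed (source, destination) replication links for a mode."""
    if mode == "full-mesh":
        return {(a, b) for a in clusters for b in clusters if a != b}
    if mode in ("one-way", "failover"):
        # Assumption: the first cluster is the source; the rest only receive.
        source, *targets = clusters
        return {(source, target) for target in targets}
    raise ValueError(f"unknown mode: {mode}")


def clusters_receiving(produced_in: str, edges: set[tuple[str, str]]) -> set[str]:
    """Clusters whose topic ends up holding a message produced in `produced_in`."""
    return {produced_in} | {dst for src, dst in edges if src == produced_in}


mesh = replication_edges("full-mesh", ["cluster1", "cluster2", "cluster3"])
one_way = replication_edges("one-way", ["cluster1", "cluster2"])

everywhere = clusters_receiving("cluster2", mesh)      # all three clusters
forward = clusters_receiving("cluster1", one_way)      # cluster1 and cluster2
not_back = clusters_receiving("cluster2", one_way)     # cluster2 only
```

In this model failover uses the same links as one-way replication; the difference described above is operational, since the standby cluster carries no producers or consumers until the switch.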
Synchronous replication
Compared with the asynchronous scheme, synchronous replication provides strong consistency. In this scheme a single Pulsar cluster is distributed across multiple data centers, and a message counts as persisted only after it has been written successfully across data centers or regions, which keeps the data consistent between data centers. Synchronous replication can be implemented by combining the rack-aware and region-aware capabilities of the BookKeeper client with the corresponding parameter settings in broker.conf.
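The write condition described here can be expressed as a small check. This is a conceptual sketch only: `write_is_durable` and its arguments are hypothetical, and the real behavior is governed by the BookKeeper client placement policy and quorum settings rather than by code like this.

```python
from __future__ import annotations


def write_is_durable(acked_regions: list[str], required_regions: set[str]) -> bool:
    """Return True only if every required region acknowledged at least one copy."""
    return required_regions.issubset(acked_regions)


required = {"region1", "region2"}

# One copy acknowledged in each region: the write is considered successful.
both_regions = write_is_durable(["region1", "region2"], required)

# Two copies acknowledged, but both in region1: the write is not yet successful.
local_only = write_is_durable(["region1", "region1"], required)
```

The second case is what distinguishes the synchronous scheme from the asynchronous one: the producer is not acknowledged until the remote region also holds a copy.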
Scheme comparison summary
Cross-region scheme selection in practice and deployment design
Scheme selection in practice
We selected several physical machines in two regions to build the test environment and compared the performance of the synchronous and asynchronous replication schemes. The measured ping latency between the selected cross-region nodes was 1.5 ms. The tests covered two situations: both regions available, and one region failed with only a single region available. The results are as follows:
• Synchronous replication, both regions available
• Synchronous replication, one region failed, only a single region available
• Asynchronous replication, with one region acting as the primary cluster that produces and consumes messages; the produced messages are replicated asynchronously to the other region
Analysis and summary of the results:
1. Latency: The synchronous scheme has slightly higher latency than the asynchronous scheme. Because the network latency between the cross-region nodes was small, the average end-to-end latency of the synchronous scheme was only a few milliseconds higher than that of the asynchronous scheme across the tested scenarios, which is acceptable;
2. Data consistency: The synchronous scheme has a clear advantage over the asynchronous scheme. When an entire region becomes unavailable, it better preserves data availability, with essentially no data inconsistency or data loss;
3. Resource cost: The asynchronous scheme increases storage overhead, so the synchronous scheme has the advantage.
Conclusion: In this scenario, the synchronous scheme is the more suitable choice for cross-region replication.
Deployment design
1. The shared ZooKeeper cluster is deployed across three regions (Region1: 2 + Region2: 2 + Region3: 1). If any one region fails, the other two regions keep the cluster available.
2. The shared BookKeeper cluster consists of a number of Bookie nodes. For cross-region data, multiple replicas are stored on nodes in different regions.
3. Pulsar instances are either single-region or cross-region, and multiple Pulsar instances share one set of cross-region ZooKeeper and BookKeeper clusters. The Broker cluster of a cross-region instance is spread across data centers in different regions. All regions are peers and cooperate to serve clients. When all nodes in one region become unavailable, the Brokers in the other regions can still serve clients normally.
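The 2 + 2 + 1 ZooKeeper layout in item 1 can be checked with simple majority arithmetic, since a ZooKeeper ensemble stays available only while more than half of its servers are up. The helper below is an illustrative sketch with a hypothetical name; the two-region layout is an assumed counterexample, not something from the tests above.

```python
from __future__ import annotations


def survives_region_loss(nodes_per_region: dict[str, int]) -> dict[str, bool]:
    """For each region, report whether a majority remains if that region fails."""
    total = sum(nodes_per_region.values())
    majority = total // 2 + 1
    return {
        region: (total - count) >= majority
        for region, count in nodes_per_region.items()
    }


# Layout from the design: 5 servers, so a majority is 3.
three_regions = survives_region_loss({"Region1": 2, "Region2": 2, "Region3": 1})

# Assumed counterexample: the same 5 servers split over only two regions.
two_regions = survives_region_loss({"Region1": 3, "Region2": 2})
```

With three regions, losing any single region leaves at least 3 of 5 servers. With only two regions, losing the region that holds 3 servers leaves 2, which is below the majority, so the third region is what makes any single-region failure tolerable.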
By default, the open-source Pulsar Broker reads and writes at random. In implementing this design we optimized the Broker read/write strategy as follows:
• Label each Broker and Bookie node to identify the region it belongs to.
• For a cross-region Pulsar instance, when the Broker selects the set of Bookie nodes that store the two replicas, it ensures that the set contains nodes from different regions. When reading data, the Broker gives priority to Bookie nodes in its own region.
• For a single-region Pulsar instance, when the Broker selects the set of Bookie nodes that store the two replicas, it selects only Bookie nodes in the same region as the Broker, and reads are also served only by Bookie nodes in that region.
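The selection rules in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' Broker modification: `Bookie`, `choose_ensemble`, and `read_order` are hypothetical names, and the sketch assumes the read preference is for Bookies in the Broker's own region.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Bookie:
    name: str
    region: str  # the region label attached to the node


def choose_ensemble(bookies: list[Bookie], broker_region: str,
                    cross_region: bool, size: int = 2) -> list[Bookie]:
    """Pick the Bookies that will store the replicas of a ledger."""
    if cross_region:
        # Local Bookies first, then take at most one Bookie per region.
        ordered = sorted(bookies, key=lambda b: b.region != broker_region)
        chosen, seen = [], set()
        for bookie in ordered:
            if bookie.region not in seen:
                chosen.append(bookie)
                seen.add(bookie.region)
            if len(chosen) == size:
                break
    else:
        # Single-region instance: only Bookies in the Broker's region.
        chosen = [b for b in bookies if b.region == broker_region][:size]
    if len(chosen) < size:
        raise ValueError("not enough Bookies to satisfy the placement rule")
    return chosen


def read_order(ensemble: list[Bookie], broker_region: str) -> list[Bookie]:
    """Order Bookies for reads so that same-region nodes are tried first."""
    return sorted(ensemble, key=lambda b: b.region != broker_region)


bookies = [Bookie("bk1", "region1"), Bookie("bk2", "region1"),
           Bookie("bk3", "region2"), Bookie("bk4", "region2")]

cross = choose_ensemble(bookies, "region2", cross_region=True)
single = choose_ensemble(bookies, "region1", cross_region=False)
reads = read_order(cross, "region1")
```

Reading from a same-region Bookie first avoids a cross-region round trip on the read path, while the cross-region placement rule keeps one replica in each region for failover.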
summary
Compared with traditional message queues, Pulsar offers more features and handles many complex scenarios that traditional message queues cannot. Its natural fit for cloud-native environments, support for dynamic scaling, multi-protocol extensions (plugins such as KoP, RoP, and AoP let various clients access the same underlying Pulsar cluster, which greatly reduces the cost of managing and operating middleware), and built-in cross-region replication have made it a first choice for messaging middleware in the cloud-native era.
This article focuses on Pulsar's cross-region replication. It analyzes the architecture, advantages and disadvantages, and applicable scenarios of the asynchronous and synchronous replication schemes, and then describes scheme selection and deployment design in a real environment. We hope it helps readers understand the characteristics of Pulsar cross-region replication and how to choose a scheme that fits their own situation.
References:
• Lin Lin. In-depth Analysis of Apache Pulsar [M]. China Industry and Information Publishing Group: Publishing House of Electronics Industry, 2021.
• [Documentation] Concepts and Architecture - Cross-region replication [1]
• Cloud challenges and solutions for message queues: Tencent Cloud's Apache Pulsar practice
Related reading
• Blog recommendation | Three cross-region replication solutions for Apache Pulsar
• Project | Guarding against downtime: learning Pulsar cross-data-center replication
• How Tencent applies Apache Pulsar in cross-city practice
Reference link
[1] Concepts and Architecture - Cross-region replication: https://pulsar.apache.org/docs/next/concepts-replication