Blog recommendation | Apache Pulsar cross-region replication: scheme selection in practice
2022-07-07 12:55:00 【StreamNative】
Apache Pulsar is a cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight functional computing. Its cloud-native architecture separates compute from storage, which makes dynamic scaling straightforward, and it natively supports multi-tenancy and multiple levels of abstraction such as namespaces. Cross-region replication across multiple data centers was considered from the start of the design, so Pulsar can replicate data across regions and data centers with mutual backup, and can meet cross-region replication requirements in many scenarios and at multiple levels.
Cross-region replication
Pulsar natively supports cross-region replication. Depending on whether messages are replicated synchronously or asynchronously, the options fall into a synchronous replication scheme and an asynchronous replication scheme, and users can choose between them according to their business needs.
Asynchronous replication
Pulsar has a built-in asynchronous, multi-cluster, cross-region replication feature. The Geo-replication mechanism synchronizes and backs up cluster data across data centers located in different regions. When one data center cluster fails and becomes completely unavailable, service can continue by failing over to a cluster in another data center.

Taking replication of data from the Region1 cluster to the Region2 cluster as an example, the Geo-replication process works as follows:
1. When a producer writes data to a topic in the Region1 cluster, the local data center persists the message to BookKeeper and, at the same time, sets up a Replicator (a Replicator contains a Replication Cursor and a Replication Producer; the cursor records how far replication of the data has progressed);
2. The Replication Producer sends the data of the Region1 topic to the topic in the remote Region2 cluster;
3. After receiving the request from the Replication Producer, the remote Region2 cluster writes the data to the Region2 topic;
4. After the data is written successfully, the remote Region2 cluster returns an ACK to the cursor of the Region1 cluster;
5. After receiving the ACK, the Region1 cluster continues by sending the next message through the Replication Producer;
6. At this point, consumers in Region2 can consume the data produced by producers in the Region1 cluster, and vice versa.
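The six steps above can be sketched as a small in-memory simulation. This is a simplified model, not Pulsar's implementation: the `Topic` and `Replicator` classes, their fields, and the method names are all hypothetical, and a plain list stands in for BookKeeper storage.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic in one cluster; the list stands in for BookKeeper storage."""
    cluster: str
    log: list = field(default_factory=list)

    def persist(self, message: str) -> bool:
        self.log.append(message)
        return True  # models the ACK for a successful write


@dataclass
class Replicator:
    """Hypothetical stand-in for a Replicator: a cursor plus a producer."""
    local: Topic
    remote: Topic
    cursor: int = 0  # index of the next local message to replicate

    def replicate_pending(self) -> int:
        sent = 0
        while self.cursor < len(self.local.log):
            message = self.local.log[self.cursor]
            acked = self.remote.persist(message)  # steps 2 and 3
            if not acked:
                break  # keep the cursor in place so the message is retried
            self.cursor += 1  # steps 4 and 5: advance only after the ACK
            sent += 1
        return sent


region1 = Topic("Region1")
region2 = Topic("Region2")
replicator = Replicator(local=region1, remote=region2)

region1.persist("m1")  # step 1: the local write happens first
region1.persist("m2")
replicated = replicator.replicate_pending()
```

Because the cursor only advances after the remote write is acknowledged, a failed remote write leaves the message pending rather than dropping it, which is the property the cursor provides in the description above.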

Depending on whether data can flow in both directions between the data center clusters, Pulsar asynchronous replication can be divided into full-mesh, one-way, and failover modes:
• Full-mesh mode: in this mode the topic behaves like one large global topic. When a producer sends a message to the topic in one cluster, the other clusters can consume that data from their own copy of the topic. All clusters that need to be connected can be configured with the same configurationStoreServers parameter so that they share a global ZooKeeper. The cross-region clusters discover each other through this global ZooKeeper, and when one cluster changes, the other clusters are notified.
• One-way mode: data is replicated from Cluster1 to Cluster2. Messages that a producer sends to a topic in Cluster1 are automatically synchronized to the topic in Cluster2, but messages that a producer sends to the topic in Cluster2 are not synchronized back to Cluster1.
• Failover mode: a special case of one-way replication, suited to data backup in a remote data center. The remote cluster has no producers or consumers; only after the current cluster goes down are the corresponding producers and consumers switched to the remote cluster to continue working.
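The difference between the three modes comes down to which clusters replicate to which, and where clients connect. A minimal sketch of that idea follows; the function names and the dictionary-based topology are illustrative assumptions and do not correspond to any Pulsar API.

```python
def clusters_with_message(origin, replication):
    """Return the clusters whose topic holds a message published in `origin`.

    `replication` maps a source cluster to the set of clusters it replicates to.
    """
    return {origin} | set(replication.get(origin, ()))


def active_cluster(primary, standby, primary_up):
    """Failover mode: clients use the standby only after the primary is down."""
    return primary if primary_up else standby


# Full mesh: every cluster replicates to every other cluster.
full_mesh = {
    "Cluster1": {"Cluster2", "Cluster3"},
    "Cluster2": {"Cluster1", "Cluster3"},
    "Cluster3": {"Cluster1", "Cluster2"},
}
# One-way: only Cluster1 replicates, and only to Cluster2.
one_way = {"Cluster1": {"Cluster2"}}

mesh_result = clusters_with_message("Cluster2", full_mesh)
forward = clusters_with_message("Cluster1", one_way)
reverse = clusters_with_message("Cluster2", one_way)
client_target = active_cluster("Cluster1", "Cluster2", primary_up=False)
```

Failover mode reuses the one-way topology; what changes is only that producers and consumers stay on the primary until it fails.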
Synchronous replication
Compared with the asynchronous scheme, synchronous replication provides strong consistency. In this scheme a single Pulsar cluster is distributed across multiple data centers. When data is persisted, a message counts as written only after it has been successfully written across data centers or regions, which guarantees data consistency between the data centers. Synchronous replication can be implemented by combining the cross-rack and cross-region awareness of the BookKeeper client with certain parameter settings in broker.conf.
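The write rule described above can be illustrated with a small model: a write is confirmed only when enough Bookie nodes, spread over enough regions, have acknowledged it. This is a sketch under stated assumptions, not BookKeeper's actual algorithm; the function `sync_write` and its parameters are hypothetical. In a real deployment this behavior comes from the BookKeeper client placement policy, enabled through broker.conf settings such as bookkeeperClientRackawarePolicyEnabled and bookkeeperClientRegionawarePolicyEnabled, which are presumably among the parameters the article has in mind.

```python
def sync_write(bookie_regions, acks, ack_quorum, min_regions):
    """Confirm a write only if enough bookies in enough regions acknowledged it.

    bookie_regions: dict mapping bookie name -> region label
    acks: set of bookie names that acknowledged the write
    """
    acked_regions = {bookie_regions[name] for name in acks}
    return len(acks) >= ack_quorum and len(acked_regions) >= min_regions


bookies = {"bk1": "region1", "bk2": "region2"}

# Both copies acknowledged, one per region: the write is confirmed.
both_regions = sync_write(bookies, {"bk1", "bk2"}, ack_quorum=2, min_regions=2)
# Only the local copy acknowledged: the write is not confirmed yet.
local_only = sync_write(bookies, {"bk1"}, ack_quorum=2, min_regions=2)
```

Waiting for the remote acknowledgement is what adds cross-region network latency to every write, which is the trade-off measured in the experiments later in the article.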

Scheme comparison summary

Cross-region scheme selection in practice and deployment scheme design
Scheme selection in practice
We selected several physical machines in two regions to build a test environment, then compared and analyzed the performance of the synchronous and asynchronous replication schemes. The ping latency measured between the selected cross-region nodes was 1.5 ms. Experiments covered two situations: both regions available, and one region failed so that only a single region is available. The results are as follows:
• Synchronous replication, both regions available

• Synchronous replication, one region failed, only a single region available

• Asynchronous replication, with one region acting as the primary cluster that produces and consumes messages; the messages produced are asynchronously replicated to the other region

Analysis and summary of the validation results:
1. Latency: the latency of the synchronous scheme is slightly higher than that of the asynchronous scheme. In the scenarios tested, because the network latency between the cross-region nodes is small, the average end-to-end latency of the synchronous scheme was a few milliseconds higher than that of the asynchronous scheme, which is within an acceptable range;
2. Data consistency: the synchronous scheme has a clear advantage over the asynchronous scheme. When a single region becomes entirely unavailable, it better guarantees data availability, and data inconsistency or data loss essentially does not occur;
3. Resource cost: the asynchronous scheme increases storage overhead, so the synchronous scheme has the advantage here.
Conclusion: for this practice scenario, a synchronous scheme is the more appropriate choice for cross-region replication.
Deployment scheme design

1. The shared ZooKeeper cluster is deployed across three regions (Region1: 2 + Region2: 2 + Region3: 1). If any one region fails, the other two regions keep the cluster available.
2. The shared BookKeeper cluster consists of several Bookie nodes. For cross-region data, the multiple copies are stored on nodes in different regions.
3. Pulsar instances are divided into single-region and cross-region instances, and multiple Pulsar instances share one set of cross-region ZooKeeper and BookKeeper clusters. The Broker cluster of a cross-region instance is spread over data centers in different regions. All regions have equal status and cooperate to serve clients; when the nodes of one region become unavailable as a whole, the Brokers in the other regions can still serve clients normally.
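The claim in item 1, that the 2 + 2 + 1 ZooKeeper layout survives the loss of any single region, can be checked with a few lines of arithmetic. The sketch assumes the standard ZooKeeper rule that an ensemble stays available while a strict majority of its nodes is reachable; the variable names are illustrative.

```python
# Number of ZooKeeper nodes deployed in each region, as in item 1.
zk_nodes = {"Region1": 2, "Region2": 2, "Region3": 1}

total = sum(zk_nodes.values())
majority = total // 2 + 1  # strict majority needed for a quorum

# For each region, check whether the remaining nodes still form a quorum
# if that region fails entirely.
survives_loss_of = {
    region: (total - count) >= majority for region, count in zk_nodes.items()
}
```

With five nodes the majority is three, and losing any one region leaves at least three nodes. A two-region 3 + 2 layout would not have this property, because losing the three-node region leaves only two nodes.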
By default, the open-source Pulsar Broker chooses Bookie nodes for reads and writes at random. While implementing this scheme we optimized the Broker's read/write strategy. The specific changes are as follows:
• Each Broker and Bookie node is labeled to identify the region it belongs to.
• For a cross-region Pulsar instance, when the Broker chooses the set of Bookie nodes that will store the two copies, it ensures that the set contains nodes from different regions. When reading data, the Broker gives priority to Bookie nodes in its own region.
• For a single-region Pulsar instance, when the Broker chooses the set of Bookie nodes that will store the two copies, it selects only Bookie nodes in the same region as the Broker. Reads are likewise served from Bookie nodes in that region.
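The labeling and selection rules above can be sketched as two small functions. This is an illustrative model only: the function names, the dictionary of region labels, and the fixed two-copy ensemble are assumptions, and the authors' actual Broker changes are not shown in the article.

```python
def choose_ensemble(broker_region, bookies, cross_region):
    """Pick two bookies for the two copies of the data.

    bookies: dict mapping bookie name -> region label
    cross_region: True for a cross-region instance, False for single-region
    """
    local = sorted(n for n, r in bookies.items() if r == broker_region)
    remote = sorted(n for n, r in bookies.items() if r != broker_region)
    if cross_region:
        if not local or not remote:
            raise ValueError("need bookies in at least two regions")
        return [local[0], remote[0]]  # one copy per region
    if len(local) < 2:
        raise ValueError("need two bookies in the broker's region")
    return local[:2]  # both copies stay in the broker's region


def choose_read_bookie(broker_region, ensemble, bookies):
    """Prefer a bookie in the broker's own region; otherwise use any copy."""
    for name in ensemble:
        if bookies[name] == broker_region:
            return name
    return ensemble[0]


bookies = {"bk-a1": "region1", "bk-a2": "region1", "bk-b1": "region2"}

cross = choose_ensemble("region1", bookies, cross_region=True)
single = choose_ensemble("region1", bookies, cross_region=False)
read_from = choose_read_bookie("region2", cross, bookies)
```

Sorting the candidates makes the sketch deterministic; a real policy would more likely balance load across the eligible nodes.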

Summary
Compared with traditional message queues, Pulsar offers more features and can handle many complex scenarios that traditional message queues cannot. It adapts naturally to cloud-native environments and supports dynamic scaling and multi-protocol extension (plug-ins such as KoP, RoP, and AoP let a single underlying Pulsar cluster serve various kinds of clients, which greatly reduces the cost of managing and operating middleware), along with built-in cross-region replication and other features. These make it a first choice for messaging middleware in the cloud-native era.
This article focuses on Pulsar's cross-region replication feature. It analyzes the architecture, advantages and disadvantages, and applicable scenarios of the asynchronous and synchronous replication schemes, and then, based on a real environment, walks through scheme selection and deployment scheme design. We hope it helps readers understand Pulsar's cross-region replication and how to choose a scheme that fits their own situation.
References:
• Lin Lin. In-Depth Analysis of Apache Pulsar [M]. China Industry and Information Publishing Group: Publishing House of Electronics Industry, 2021.
• [Documentation] Concepts and Architecture - Cross-region replication [1]
• Cloud challenges and solutions for message queues: Tencent Cloud's Apache Pulsar practice
Related reading
• Blog recommendation | Three cross-region replication solutions for Apache Pulsar
• Project | Guarding against downtime: learning about Pulsar cross-data-center replication
• How Tencent applies Apache Pulsar in cross-city practice
Reference links
[1] Concepts and Architecture - Cross-region replication: https://pulsar.apache.org/docs/next/concepts-replication