Blog Recommendation | Apache Pulsar Cross-Region Replication: Scheme Selection in Practice
2022-07-07 12:55:00 【StreamNative】
Apache Pulsar is a cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight function computing. Its cloud-native architecture separates compute from storage, which makes dynamic scaling straightforward. It natively supports multi-tenancy and multi-level abstractions such as namespaces, and cross-region, multi-datacenter replication was considered from the start of its design: data can be replicated across regions and data centers so that they back each other up, meeting cross-region replication needs across many scenarios and levels.
Cross-region replication
Pulsar supports cross-region replication natively. Depending on whether messages are written synchronously or asynchronously across regions, replication falls into a synchronous scheme and an asynchronous scheme, and users can choose according to their business requirements.
Asynchronous replication
Pulsar has built-in asynchronous, multi-cluster, cross-region replication: the Geo-replication mechanism synchronizes data between clusters in data centers located in different regions so that they back each other up. With this scheme, if the cluster in one data center becomes completely unavailable, service can continue by failing over to a cluster in another data center.
Taking replication from the Region1 cluster to the Region2 cluster as an example, the Geo-replication process works as follows:
1. When a producer writes data to a topic in the Region1 cluster, the local data center persists the messages to BookKeeper and, at the same time, creates a Replicator (a Replicator contains a Replication Cursor and a Replication Producer; the cursor records how far the data has been replicated);
2. The Replication Producer sends the data in Region1's topic to the corresponding topic in the remote Region2 cluster;
3. On receiving the Replication Producer's requests, the remote Region2 cluster writes the data to its own topic;
4. Once the data is written successfully, Region2 returns an ACK to the Region1 cluster's Replication Cursor;
5. After receiving the ACK, the Region1 cluster continues sending the next messages through the Replication Producer;
6. In this way, consumers in Region2 can consume the data produced by producers in the Region1 cluster, and vice versa.
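The cursor/ACK loop in steps 1 to 6 can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not Pulsar's actual Replicator: `Topic`, `Replicator`, `replicate_pending`, and the `remote_available` flag are hypothetical names, local lists stand in for BookKeeper ledgers, and the sketch keeps only one message in flight at a time to mirror steps 4 and 5 literally.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy topic: an append-only list standing in for persisted BookKeeper entries."""
    name: str
    log: list = field(default_factory=list)

    def append(self, msg: str) -> int:
        self.log.append(msg)
        return len(self.log) - 1  # position of the persisted entry


class Replicator:
    """Hypothetical geo-replicator = Replication Cursor + Replication Producer."""

    def __init__(self, local: Topic, remote: Topic):
        self.local = local
        self.remote = remote
        self.cursor = 0               # Replication Cursor: next local entry to copy
        self.remote_available = True  # simulates the cross-region link

    def _send(self, msg: str) -> bool:
        """Replication Producer: write to the remote topic; True models the remote ACK."""
        if not self.remote_available:
            return False  # no ACK: remote cluster unreachable
        self.remote.append(msg)
        return True

    def replicate_pending(self) -> int:
        """Copy entries one by one, advancing the cursor only after each ACK."""
        sent = 0
        while self.cursor < len(self.local.log):
            if not self._send(self.local.log[self.cursor]):
                break  # cursor stays put; the entry is retried on the next call
            self.cursor += 1
            sent += 1
        return sent


region1 = Topic("orders-region1")
region2 = Topic("orders-region2")
replicator = Replicator(region1, region2)
for m in ["m1", "m2", "m3"]:
    region1.append(m)                  # step 1: persisted locally first
print(replicator.replicate_pending())  # 3
print(region2.log)                     # ['m1', 'm2', 'm3']
```

Because the cursor only advances after an ACK, setting `replicator.remote_available = False` makes `replicate_pending()` return 0 without losing anything locally; when the link comes back, replication resumes from the recorded cursor position.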
Depending on whether data can flow freely between the data-center clusters, Pulsar asynchronous replication can be divided into full-mesh, one-way, and failover modes:
• Full-mesh: In this mode the topic behaves like one large global topic: when a producer sends a message to the topic in one cluster, the other clusters can consume it from their own copy of the topic. All clusters to be connected can share one global ZooKeeper by configuring the same configurationStoreServers parameter; clusters in different regions discover one another through this global ZooKeeper, and when one cluster changes, the others are notified.
• One-way mode: Data is replicated from Cluster1 to Cluster2. Messages that producers send to the topic in Cluster1 are automatically synchronized to the topic in Cluster2, but messages that producers send to the topic in Cluster2 are not synchronized back to Cluster1.
• Failover mode: A special case of one-way replication, suited to backing up data in a remote data center. The remote cluster has no producers or consumers of its own; only after the current cluster goes down are the corresponding producers and consumers switched to the remote cluster to continue operating.
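The three modes differ only in which replication links exist and where clients connect, so they can be modeled as directed edges between clusters. The sketch below is hypothetical (`replication_edges`, `clusters_that_see`, and `active_cluster` are illustrative names, not Pulsar APIs); in a real deployment the set of clusters a namespace replicates to is configured with `pulsar-admin namespaces set-clusters`. The model assumes a replicated message is not forwarded a second time.

```python
from __future__ import annotations

from itertools import permutations


def replication_edges(mode: str, clusters: list[str]) -> set[tuple[str, str]]:
    """Directed (source, destination) replication links for each mode."""
    if mode == "full-mesh":
        return set(permutations(clusters, 2))  # every cluster to every other
    if mode in ("one-way", "failover"):
        primary, backup = clusters[0], clusters[1]
        return {(primary, backup)}             # Cluster1 -> Cluster2 only
    raise ValueError(f"unknown mode: {mode}")


def clusters_that_see(edges: set[tuple[str, str]], source: str) -> set[str]:
    """Clusters where a message produced in `source` becomes consumable.

    In this model a replicated message is not forwarded again,
    so only direct links out of `source` count.
    """
    return {source} | {dst for src, dst in edges if src == source}


def active_cluster(clusters: list[str], failed: set[str]) -> str:
    """Failover mode: clients use the first healthy cluster in priority order."""
    for cluster in clusters:
        if cluster not in failed:
            return cluster
    raise RuntimeError("no cluster available")


mesh = replication_edges("full-mesh", ["c1", "c2", "c3"])
print(sorted(clusters_that_see(mesh, "c2")))        # ['c1', 'c2', 'c3']
one_way = replication_edges("one-way", ["c1", "c2"])
print(sorted(clusters_that_see(one_way, "c2")))     # ['c2']
print(active_cluster(["c1", "c2"], failed={"c1"}))  # c2
```

Failover shares one-way's edge set; what distinguishes it is that clients attach only to `active_cluster(...)`, which moves to the backup once the primary is marked failed.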
Synchronous replication
Compared with the asynchronous scheme, synchronous replication provides strong consistency. In this scheme a single Pulsar cluster spans multiple data centers, and when data is persisted, each message must be written successfully across data centers/regions before the write counts as complete, which keeps data consistent between data centers. Synchronous replication is achieved by combining the BookKeeper client's rack-aware/region-aware placement with certain parameter settings in broker.conf.
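The sketch below collects the kind of broker.conf settings involved and models the write-confirmation rule that makes the scheme synchronous. The parameter names are existing broker.conf settings, but the values are illustrative assumptions, not the configuration used in this practice; `render_conf` and `entry_confirmed` are hypothetical helpers. With two copies per entry placed in different regions and an ack quorum of 2, an entry is confirmed only after both regions hold it.

```python
from __future__ import annotations

# Parameter names are broker.conf settings; the values are illustrative assumptions.
SYNC_REPLICATION_SETTINGS = {
    "bookkeeperClientRegionawarePolicyEnabled": "true",    # region-aware bookie placement
    "bookkeeperClientReorderReadSequenceEnabled": "true",  # let the policy reorder reads
    "managedLedgerDefaultEnsembleSize": "2",
    "managedLedgerDefaultWriteQuorum": "2",                # two copies, spread across regions
    "managedLedgerDefaultAckQuorum": "2",                  # wait for both copies
}


def render_conf(settings: dict[str, str]) -> str:
    """Render key=value lines in the broker.conf properties format."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


def entry_confirmed(acked_bookies: set[str], ack_quorum: int) -> bool:
    """BookKeeper confirms an entry once ack_quorum bookies in its write set acked it."""
    return len(acked_bookies) >= ack_quorum


bookie_region = {"bk-a": "region1", "bk-b": "region2"}  # hypothetical write set
ack_quorum = int(SYNC_REPLICATION_SETTINGS["managedLedgerDefaultAckQuorum"])
print(entry_confirmed({"bk-a"}, ack_quorum))  # False: only region1 has the entry
acked = {"bk-a", "bk-b"}
print(entry_confirmed(acked, ack_quorum), sorted(bookie_region[b] for b in acked))
# True ['region1', 'region2']
print(render_conf(SYNC_REPLICATION_SETTINGS))
```

With an ack quorum of 1, the first check would return True, meaning a write could be confirmed while only one region holds it; that is the weaker guarantee the synchronous scheme avoids. When a whole region fails, BookKeeper replaces unreachable bookies through an ensemble change so that writes can continue, and whether the new ensemble must still span regions depends on how the placement policy is configured.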
Scheme comparison summary
Cross-region scheme selection in practice and implementation design
Scheme selection in practice
We selected several physical machines in two regions to build a test environment and compared the performance of the synchronous and asynchronous replication schemes. The ping latency measured between the selected cross-region nodes was 1.5 ms. Both the case where both regions are available and the case where one region fails were considered; the failure case was tested only as a scenario in which a single region remains available. The results are as follows:
• Synchronous replication, both regions available
• Synchronous replication, one region failed, only a single region available
• Asynchronous replication: one region serves as the primary cluster for producing and consuming messages, and the messages it produces are asynchronously replicated to the other region
Analysis and summary of the validation results:
1. Latency: The synchronous scheme's latency is slightly higher than the asynchronous scheme's. In the scenarios tested, because the network latency between cross-region nodes is low, the average end-to-end latency of the synchronous scheme was only a few milliseconds higher than that of the asynchronous scheme, which is acceptable;
2. Data consistency: The synchronous scheme has a clear advantage over the asynchronous one. When an entire region becomes unavailable, it better guarantees data availability, with essentially no data inconsistency or data loss;
3. Resource cost: The asynchronous scheme increases storage overhead, so the synchronous scheme has the advantage here.
Conclusion: In this scenario, the synchronous scheme is the more appropriate choice for cross-region replication.
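The storage point in item 3 can be made concrete with simple arithmetic. This sketch assumes each cluster keeps two copies of every message (matching the dual-copy design described below) and that the asynchronous setup has two full clusters; `stored_copies` and the 100 GB volume are hypothetical. In the synchronous scheme one cluster's two copies are spread across the regions, while in the asynchronous scheme each cluster stores the full topic with its own replicas.

```python
def stored_copies(scheme: str, write_quorum: int, clusters: int = 2) -> int:
    """Physical copies of each message under the two schemes."""
    if scheme == "sync":
        return write_quorum             # one cluster; its replicas span the regions
    if scheme == "async":
        return write_quorum * clusters  # each cluster stores the full topic itself
    raise ValueError(f"unknown scheme: {scheme}")


data_gb = 100  # hypothetical volume of produced messages
for scheme in ("sync", "async"):
    copies = stored_copies(scheme, write_quorum=2)
    print(scheme, copies, copies * data_gb)
# sync 2 200
# async 4 400
```

Under these assumptions the asynchronous scheme stores twice as much data for the same cross-region protection.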
Implementation design
1. The shared ZooKeeper cluster is deployed across three regions (Region1: 2 nodes + Region2: 2 nodes + Region3: 1 node). If any one region fails, the other two regions keep the cluster available.
2. The shared BookKeeper cluster consists of multiple Bookie nodes, and the multiple copies of cross-region data are stored on nodes in different regions.
3. Pulsar instances are divided into single-region and cross-region instances, and multiple Pulsar instances share one set of cross-region ZooKeeper and BookKeeper clusters. The Broker cluster of a cross-region instance is spread across data centers in different regions; all regions have equal status and jointly serve clients. When the nodes in one region become unavailable as a whole, Brokers in the other regions can still serve clients normally.
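The reason for the 2 + 2 + 1 ZooKeeper layout in item 1 is quorum arithmetic: ZooKeeper stays available only while a strict majority of its nodes is reachable. The sketch below checks each single-region failure; `quorum_survives` is a hypothetical helper, and the two-region layout is an assumed alternative for comparison.

```python
from __future__ import annotations


def quorum_survives(nodes_per_region: dict[str, int], failed_region: str) -> bool:
    """True if the ZooKeeper ensemble keeps a strict majority after losing one region."""
    total = sum(nodes_per_region.values())
    remaining = total - nodes_per_region[failed_region]
    return remaining > total // 2


three_regions = {"region1": 2, "region2": 2, "region3": 1}  # layout from the design
two_regions = {"region1": 3, "region2": 2}                  # hypothetical alternative

print([quorum_survives(three_regions, r) for r in three_regions])  # [True, True, True]
print([quorum_survives(two_regions, r) for r in two_regions])      # [False, True]
```

With only two regions, one of them must hold the majority, so losing that region loses quorum; the single node in Region3 acts as a tiebreaker so that losing any one region still leaves at least 3 of 5 nodes.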
By default, open-source Pulsar Brokers choose Bookies randomly for reads and writes. During implementation we optimized the Broker's read/write strategy as follows:
• Label each Broker and Bookie node with the region it belongs to.
• For cross-region Pulsar instances: when a Broker selects the set of Bookie nodes that store the two copies, it ensures the set contains nodes from different regions, and when reading data it gives priority to Bookie nodes in its own region.
• For single-region Pulsar instances: when a Broker selects the set of Bookie nodes that store the two copies, it selects only Bookie nodes in the same region as the Broker, and reads are likewise served from Bookie nodes in that region.
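The placement and read rules above can be sketched as follows. This is a minimal illustration of region-labeled selection, not the authors' actual Broker changes and not BookKeeper's placement-policy API: `choose_ensemble` and `read_order` are hypothetical names, and the region labels are plain strings (in real deployments, bookie rack/region labels can be assigned with `pulsar-admin bookies set-bookie-rack`).

```python
from __future__ import annotations

import random


def choose_ensemble(bookies: dict[str, str], broker_region: str,
                    cross_region: bool, size: int = 2,
                    rng: random.Random | None = None) -> list[str]:
    """Pick `size` bookies (bookie id -> region label) to store one ledger's copies."""
    rng = rng or random.Random()
    if cross_region:
        # Cross-region instance: each copy goes to a different region.
        by_region: dict[str, list[str]] = {}
        for bookie, region in sorted(bookies.items()):
            by_region.setdefault(region, []).append(bookie)
        if len(by_region) < size:
            raise ValueError("not enough regions for the requested copies")
        regions = rng.sample(sorted(by_region), size)
        return [rng.choice(by_region[region]) for region in regions]
    # Single-region instance: only bookies in the broker's own region.
    local = sorted(b for b, region in bookies.items() if region == broker_region)
    if len(local) < size:
        raise ValueError("not enough bookies in the broker's region")
    return rng.sample(local, size)


def read_order(ensemble: list[str], bookies: dict[str, str], broker_region: str) -> list[str]:
    """Try same-region bookies first; sorting is stable, so ties keep their order."""
    return sorted(ensemble, key=lambda b: bookies[b] != broker_region)


bookies = {"bk1": "region1", "bk2": "region1", "bk3": "region2", "bk4": "region2"}
rng = random.Random(42)
ensemble = choose_ensemble(bookies, "region1", cross_region=True, rng=rng)
print(sorted(bookies[b] for b in ensemble))                  # ['region1', 'region2']
print(bookies[read_order(ensemble, bookies, "region1")[0]])  # region1
```

For a single-region instance, `choose_ensemble(bookies, "region2", cross_region=False)` returns two distinct Bookies, both labeled `region2`, so both writes and reads stay inside the Broker's region.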
Summary
Compared with traditional message queues, Pulsar offers more features and can handle many complex scenarios that traditional message queues cannot. It is naturally suited to cloud-native environments, supports dynamic scaling and multi-protocol extensions (plugins such as KoP, RoP, and AoP let many kinds of clients connect to the same underlying Pulsar cluster, greatly reducing the cost of managing and operating middleware), and has built-in features such as cross-region replication, which have made it the first choice for messaging middleware in the cloud-native era.
This article focused on Pulsar's cross-region replication, analyzing the architecture, advantages and disadvantages, and applicable scenarios of the asynchronous and synchronous replication schemes, and presented a scheme selection practice and implementation design based on a real environment. We hope it helps readers understand Pulsar's cross-region replication and how to choose a scheme that fits their own situation.
References:
• Lin Lin. In-Depth Analysis of Apache Pulsar [M]. China Industry and Information Publishing Group: Publishing House of Electronics Industry, 2021.
• [Docs] Concepts and Architecture - Cross-region replication [1]
• Challenges and solutions for message queues on the cloud: Tencent Cloud's Apache Pulsar practice
Reference link
[1] Concepts and Architecture - Cross-region replication: https://pulsar.apache.org/docs/next/concepts-replication