Blog recommendation | Apache Pulsar cross-region replication: scheme selection in practice
2022-07-07 12:55:00 【StreamNative】
Apache Pulsar is a cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight function computing. Its cloud-native architecture separates compute from storage, which makes dynamic scaling straightforward. It natively supports multi-tenancy with multiple levels of namespace abstraction. Cross-region replication across multiple data centers was considered from the start of its design, so Pulsar can replicate data across regions and data centers and keep them as mutual backups, meeting cross-region replication needs in many scenarios and at multiple levels.
Cross-region replication
Pulsar natively supports cross-region replication. Depending on whether messages are replicated synchronously or asynchronously, the options fall into a synchronous replication scheme and an asynchronous replication scheme, and users can choose between them according to their business needs.
Asynchronous replication
Pulsar has built-in asynchronous replication across multiple clusters and regions. The Geo-replication mechanism synchronizes and backs up data among clusters in data centers located in different regions. With this scheme, when one data center's cluster becomes completely unavailable, service can continue by failing over to a cluster in another data center.
Taking replication from the Region1 cluster to the Region2 cluster as an example, the Geo-replication process works as follows:
1. When a producer writes data to a topic in the Region1 cluster, the local data center persists the message to BookKeeper and, at the same time, a Replicator is started (a Replicator contains a Replication Cursor and a Replication Producer; the cursor records how far replication of the current data has progressed);
2. The Replication Producer sends the data of the Region1 topic to the topic in the remote Region2 cluster;
3. After receiving the request from the Replication Producer, the remote Region2 cluster writes the data to its own topic;
4. After the data is written successfully, the remote Region2 cluster returns an ACK to the cursor of the Region1 cluster;
5. After receiving the ACK, the Region1 cluster continues by sending the next message through the Replication Producer;
6. At this point, consumers in Region2 can consume the data produced by producers in the Region1 cluster, and vice versa.
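The steps above can be sketched as a small in-memory model. This is only an illustration of the cursor-and-ACK flow, not Pulsar's implementation: the `Topic` and `Replicator` classes and their methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """An append-only list standing in for a topic's persisted messages."""
    messages: list = field(default_factory=list)

    def append(self, msg):
        self.messages.append(msg)
        return True  # models the ACK returned after a successful write


@dataclass
class Replicator:
    """Toy replicator: a cursor plus a producer that targets the remote topic."""
    local: Topic
    remote: Topic
    cursor: int = 0  # index of the next local message to replicate

    def replicate_once(self):
        """Send one message; advance the cursor only after the remote ACK."""
        if self.cursor >= len(self.local.messages):
            return False  # nothing left to replicate
        acked = self.remote.append(self.local.messages[self.cursor])
        if acked:
            self.cursor += 1
        return acked

    def drain(self):
        """Keep replicating until the cursor reaches the end of the local log."""
        while self.replicate_once():
            pass


region1, region2 = Topic(), Topic()
replicator = Replicator(local=region1, remote=region2)

for payload in ["m1", "m2", "m3"]:
    region1.append(payload)  # step 1: the message is persisted locally first

replicator.drain()  # steps 2-5: send, remote write, ACK, next message
```

Because the cursor only moves after an ACK, a replicator that restarts can resume from the recorded position instead of starting over, which is the role the text assigns to the Replication Cursor.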
Depending on how data flows between the data center clusters, Pulsar asynchronous replication can be divided into full-mesh, one-way, and failover modes:
• Full mesh: In this mode the topic behaves like one large global topic. When a producer sends a message to the topic, the other clusters can consume the data from their own copy of the topic. All clusters that need to be connected can be configured with the same configurationStoreServers parameter so that they share a global ZooKeeper. The cross-region clusters discover one another through this global ZooKeeper, and when one cluster changes, the other clusters are notified.
• One-way: Data is replicated from Cluster1 to Cluster2. Messages that a producer sends to a topic in Cluster1 are automatically synchronized to the topic in Cluster2, but messages that a producer sends to the topic in Cluster2 are not synchronized back to Cluster1.
• Failover: A special case of one-way replication, suited to backing up data in a remote data center. The remote cluster has no producers or consumers; only after the current cluster goes down are the producers and consumers switched to the remote cluster to continue working.
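The difference between the modes comes down to which clusters receive a copy of a message produced at a given cluster. The following is a minimal sketch of that rule; the function `replication_targets` and the mode strings are hypothetical and only model the behaviour described above, not Pulsar's configuration API.

```python
def replication_targets(mode, source, clusters, primary=None):
    """Return the clusters that receive a copy of a message produced at `source`.

    mode     -- "full_mesh", "one_way" or "failover" (illustrative labels)
    source   -- cluster where the producer wrote the message
    clusters -- all clusters taking part in replication
    primary  -- replication origin for the one-way and failover modes
    """
    others = [c for c in clusters if c != source]
    if mode == "full_mesh":
        # Every cluster replicates to every other cluster.
        return others
    if mode in ("one_way", "failover"):
        # Only messages produced on the primary are replicated outward.
        return others if source == primary else []
    raise ValueError(f"unknown mode: {mode}")


clusters = ["Cluster1", "Cluster2"]
full_mesh = replication_targets("full_mesh", "Cluster2", clusters)
forward = replication_targets("one_way", "Cluster1", clusters, primary="Cluster1")
backward = replication_targets("one_way", "Cluster2", clusters, primary="Cluster1")
```

In this model failover replicates in the same direction as one-way mode; what sets it apart is operational: clients attach to the remote cluster only after the primary goes down.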
Synchronous replication
Compared with the asynchronous scheme, synchronous replication provides strong consistency. In this scheme a single Pulsar cluster is deployed across multiple data centers, and when data is persisted, a message counts as written only after it has been written successfully across data centers or regions, which keeps the data consistent between data centers. Synchronous replication can be achieved by combining the BookKeeper client's rack-aware and region-aware capability with certain parameter settings in broker.conf.
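The synchronous write rule can be illustrated with a toy check: a write succeeds only when enough bookies have acknowledged it and those bookies span enough regions. This is a sketch under stated assumptions, not BookKeeper's placement policy; `write_is_durable`, `ack_quorum`, and `min_regions` are hypothetical names chosen for this example and do not correspond to real configuration keys.

```python
def write_is_durable(acks, bookie_region, ack_quorum=2, min_regions=2):
    """Decide whether a write may be reported as successful.

    acks          -- names of the bookies that acknowledged the entry
    bookie_region -- mapping of bookie name to its region label
    ack_quorum    -- assumed number of acknowledgements required
    min_regions   -- assumed number of distinct regions that must hold a copy
    """
    regions = {bookie_region[b] for b in acks}
    return len(acks) >= ack_quorum and len(regions) >= min_regions


bookie_region = {"bk1": "Region1", "bk2": "Region1", "bk3": "Region2"}

same_region_only = write_is_durable(["bk1", "bk2"], bookie_region)
across_regions = write_is_durable(["bk1", "bk3"], bookie_region)
```

Two acknowledgements from the same region are not enough in this model, which is why a synchronous write always pays at least one cross-region round trip.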
Scheme comparison summary
Cross-region scheme selection in practice and deployment design
Scheme selection in practice
Several physical machines were selected in two regions to build a test environment and compare the performance of the synchronous and asynchronous replication schemes. The measured ping latency between the selected cross-region nodes was 1.5 ms. The experiments covered two situations: both regions available, and one region failed with only a single region available. The results are as follows:
• Synchronous replication, both regions available
• Synchronous replication, one region failed, only a single region available
• Asynchronous replication, with one region acting as the primary cluster that produces and consumes messages; the messages produced are asynchronously replicated to the other region
Analysis and summary of the validation results:
1. Latency: The synchronous scheme has slightly higher latency than the asynchronous scheme. In the scenarios tested, because the network latency between cross-region nodes was small, the average end-to-end latency measured for the synchronous scheme was a few milliseconds higher than for the asynchronous scheme, which is within an acceptable range;
2. Data consistency: The synchronous scheme has a clear advantage over the asynchronous scheme. When a single region becomes entirely unavailable, it better guarantees data availability, with essentially no data inconsistency or data loss;
3. Resource cost: The asynchronous scheme increases storage overhead, so the synchronous scheme has the advantage here.
Conclusion: For the scenario in this practice, the synchronous scheme is the more appropriate choice for cross-region replication.
Deployment design
1. The shared ZooKeeper cluster is deployed across three regions (Region1: 2 + Region2: 2 + Region3: 1). If any one region fails, the other two regions keep the cluster available.
2. The shared BookKeeper cluster consists of several bookie nodes, and multiple replicas of cross-region data are stored on nodes in different regions.
3. Pulsar instances are divided into single-region and cross-region instances, and multiple Pulsar instances share one set of cross-region ZooKeeper and BookKeeper clusters. The broker cluster of a cross-region instance is spread across data centers in different regions. All regions have equal status and cooperate to serve clients; when the nodes of one region become unavailable as a whole, the brokers in the other regions can still provide service normally.
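The 2 + 2 + 1 ZooKeeper layout in item 1 works because ZooKeeper needs a strict majority of its nodes to stay available. A short check makes the arithmetic explicit; the helper `survives_any_single_region_loss` is a hypothetical name written for this example.

```python
def survives_any_single_region_loss(layout):
    """Check that a majority of nodes remains after losing any one region.

    layout -- mapping of region name to the number of ZooKeeper nodes in it
    """
    total = sum(layout.values())
    majority = total // 2 + 1  # strict majority of the full ensemble
    return all(total - nodes >= majority for nodes in layout.values())


three_regions = {"Region1": 2, "Region2": 2, "Region3": 1}
two_regions = {"Region1": 3, "Region2": 2}

three_region_ok = survives_any_single_region_loss(three_regions)
two_region_ok = survives_any_single_region_loss(two_regions)
```

With five nodes the majority is three. Losing Region1 or Region2 leaves three nodes and losing Region3 leaves four, so the quorum holds. A five-node ensemble split over only two regions cannot offer the same guarantee, because losing the larger region leaves two nodes.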
By default, open-source Pulsar brokers read from and write to bookies at random. While implementing this design we optimized the broker's read/write strategy. The specific changes are as follows:
• Label every broker and bookie node to identify the region the node belongs to.
• For cross-region Pulsar instances, when the broker selects the set of bookie nodes that stores the two replicas, it ensures that the set contains nodes from different regions. When reading data, the broker prefers bookie nodes in its own region.
• For single-region Pulsar instances, when the broker selects the set of bookie nodes that stores the two replicas, it selects only bookie nodes in the same region as the broker, and reads are also served from bookie nodes in that region.
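The labelling and selection rules above can be sketched as follows. This is a simplified model for two replicas, not the authors' actual broker change: `choose_ensemble` and `read_order` are hypothetical functions, and picking candidates in sorted order is an assumption made only to keep the example deterministic.

```python
def choose_ensemble(bookies, broker_region, cross_region):
    """Pick the two bookies that store a ledger's two replicas.

    bookies       -- mapping of bookie name to its region label
    broker_region -- region label of the broker doing the write
    cross_region  -- True for a cross-region instance, False for single-region
    """
    local = sorted(b for b, r in bookies.items() if r == broker_region)
    remote = sorted(b for b, r in bookies.items() if r != broker_region)
    if cross_region:
        if not local or not remote:
            raise ValueError("need bookies in at least two regions")
        return [local[0], remote[0]]  # one replica per region
    if len(local) < 2:
        raise ValueError("need two bookies in the broker's region")
    return local[:2]  # both replicas stay in the broker's region


def read_order(ensemble, bookies, broker_region):
    """Order replicas so bookies in the broker's own region are tried first."""
    return sorted(ensemble, key=lambda b: bookies[b] != broker_region)


bookies = {"bk1": "Region1", "bk2": "Region1", "bk3": "Region2", "bk4": "Region2"}

cross = choose_ensemble(bookies, "Region2", cross_region=True)
single = choose_ensemble(bookies, "Region1", cross_region=False)
reads = read_order(["bk1", "bk3"], bookies, "Region2")
```

Preferring a local replica on reads keeps read traffic inside one region, so only writes pay the cross-region cost.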
Summary
Compared with traditional message queues, Pulsar offers more features and can handle many complex scenarios that traditional message queues cannot. It fits cloud-native environments naturally, supports dynamic scaling and multi-protocol extension (plug-ins such as KoP, RoP, and AoP let various clients connect to the same underlying Pulsar cluster, which greatly reduces the cost of managing and operating middleware), and has built-in features such as cross-region replication. It has become a first choice for messaging middleware in the cloud-native era.
This article focuses on Pulsar's cross-region replication. It analyzes the architecture, advantages, disadvantages, and applicable scenarios of the asynchronous and synchronous replication schemes, then walks through scheme selection and deployment design in a real environment. We hope it helps readers understand Pulsar's cross-region replication and how to choose a scheme that fits their own situation.
Reference material:
• Lin Lin. In-Depth Analysis of Apache Pulsar [M]. Publishing House of Electronics Industry, 2021.
• [Documentation] Concepts and Architecture - Cross-region replication [1]
• Cloud challenges and solutions for message queues: Tencent Cloud's Apache Pulsar practice
Related reading
• Blog recommendation | Three cross-region replication solutions for Apache Pulsar
• Project | Guarding against downtime: learn about Pulsar cross-data-center replication
• How Tencent applies Apache Pulsar in cross-city practice
Reference link
[1] Concepts and Architecture - Cross-region replication: https://pulsar.apache.org/docs/next/concepts-replication