[Distributed theory] (II) Distributed storage
2022-07-07 17:38:00 【Lin like】
Distributed storage
Which came first, distribution or big data? It is a question worth thinking about. On one hand, big data forces distributed storage: a single machine cannot hold all the data, so it has to be spread across many machines. On the other hand, data is naturally generated in a distributed way, yet our usual instinct is to store it centrally because that is easier to manage.
The general idea of distributed storage is to split large data into shards and store them across multiple nodes according to some policy. The policy should distribute data evenly so that node load stays balanced. The distribution should also be stable: a change in the node set must not trigger large-scale data migration. The data must remain reliable once it is spread out, which calls for a redundancy mechanism so that data is not lost when something fails. Finally, distributed data must stay easy to retrieve: after being split up, it can still be aggregated again.
In short, distributed storage has to solve three problems: stability of data distribution, heterogeneity of data nodes, and availability and reliability of data.
- Stability of data distribution: when a node fails (or the node set otherwise changes), there should be no large-scale data migration. This requires a good data distribution algorithm.
- Heterogeneity of data nodes: data nodes differ in performance, so the data distribution algorithm should be able to weight placement toward the more capable nodes.
- Availability and reliability of data: data storage needs a degree of fault tolerance, for example a replica mechanism or a persistence mechanism.
1, Data partitioning mechanism
The stability of data distribution depends on the data partitioning strategy. Several common data partitioning algorithms are:
- Range-based partitioning: for example, by age range or by geographic range;
- List-based partitioning: for example, by country, province, or city;
- Cyclic (modulo) partitioning: for example, taking a value mod a fixed cycle length;
- Hash-based partitioning: the most common approach, for example hashing a key;
- Composite partitioning: a combination of the above methods.
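The first four strategies in the list can be sketched as small routing functions. This is only an illustration: the partition count, the age boundaries, the country mapping, and all function names are assumptions made up for the example, not part of any particular system.

```python
import bisect
import hashlib

NUM_PARTITIONS = 4  # assumed partition count for this example


def range_partition(age, upper_bounds=(18, 40, 65)):
    """Range-based: partition 0 holds ages < 18, partition 1 holds 18..39,
    partition 2 holds 40..64, and the last partition holds everything else."""
    return bisect.bisect_right(upper_bounds, age)


def list_partition(country, mapping=None):
    """List-based: an explicit value-to-partition table (hypothetical mapping)."""
    mapping = mapping or {"CN": 0, "US": 1, "DE": 2}
    return mapping.get(country, 3)  # unlisted values go to a catch-all partition


def modulo_partition(record_id, n=NUM_PARTITIONS):
    """Cyclic: consecutive ids rotate through the partitions."""
    return record_id % n


def hash_partition(key, n=NUM_PARTITIONS):
    """Hash-based: hash the key first, then take the result modulo n."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n
```

A composite scheme would simply chain two of these, for example choosing a group of partitions with `list_partition` and then a partition inside the group with `hash_partition`.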
Hash-based partitioning is the most common partitioning strategy in large-scale distributed systems, so here we mainly discuss several forms of the algorithm:
1, Ordinary hashing: hash a chosen field of the data and partition by the result. The drawback is that when nodes are added or removed, keys are rehashed to different nodes, which causes large-scale data migration;
2, Consistent hashing: nodes and data are both mapped onto a hash ring, and each piece of data is stored on the first node found clockwise from its position. When the node set changes, only the data of the adjacent node has to migrate. One detail: to answer queries, each node needs to maintain an index table that locates where the data is actually stored inside the node. The weakness of this scheme is that some nodes may end up holding more data than others, so their load is high;
3, Consistent hashing with bounded load: each node has a fixed storage limit. When a node reaches its limit, the data continues clockwise to the next node and is stored there. However, this approach does not account for the differences in storage performance between heterogeneous nodes;
4, Consistent hashing with virtual nodes: each physical node is assigned a number of virtual nodes according to its performance. Better-performing nodes get more virtual nodes, so more data lands on them. This variant behaves relatively stably.
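The difference between ordinary hashing (item 1) and consistent hashing with virtual nodes (items 2 and 4) can be sketched as follows. This is a minimal in-memory illustration, not a production implementation: the `HashRing` class, the use of MD5, the node names, and the choice of 100 virtual nodes per unit of weight are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a ring position (first 64 bits of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring; a node's weight controls how many virtual nodes it gets."""

    def __init__(self, weights, vnodes_per_weight=100):
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical node name
        self._per_weight = vnodes_per_weight
        for node, weight in weights.items():
            self.add_node(node, weight)

    def add_node(self, node, weight=1):
        for i in range(weight * self._per_weight):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # First virtual node clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def modulo_moved_fraction(keys, old_n, new_n):
    """Fraction of keys that change partition under plain hash % n."""
    moved = sum(1 for k in keys if _hash(k) % old_n != _hash(k) % new_n)
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10000)]
ring = HashRing({"node-a": 1, "node-b": 1, "node-c": 2})  # node-c is twice as capable
before = {k: ring.lookup(k) for k in keys}
ring.remove_node("node-b")
moved = [k for k in keys if ring.lookup(k) != before[k]]

print("moved on the ring:", len(moved) / len(keys))
print("moved with hash % n:", modulo_moved_fraction(keys, 4, 3))
```

On the ring, the only keys that move after `node-b` is removed are the ones that were stored on `node-b`. With plain `hash % n`, going from 4 to 3 partitions relocates keys on every node. Bounded load (item 3) is not shown here; it would add a per-node capacity check inside `lookup` that keeps walking clockwise when a node is full.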
2, Data replication mechanism
In a distributed environment, how is data kept consistent across replicas? Let's look at several data replication strategies:
Synchronous replication: the client writes a message to the master node, and the master returns success to the client only after it has synchronized the write to the slave nodes. This mechanism ensures strong consistency, but if there are many slave nodes the synchronization delay is long, which inevitably hurts availability. It suits scenarios such as financial transactions. MySQL primary/standby cluster solutions and the replica ack mechanism of a Kafka cluster adopt similar ideas.
Asynchronous replication: the client writes a message to the master node, the master immediately returns success, and then copies the data to the slave nodes asynchronously. This mechanism preserves availability at the expense of consistency: for a while, a client may read different data from the master and from a slave. The scheme suits scenarios with low consistency requirements. MySQL uses it by default in primary/standby mode, and Redis clusters also adopt it to keep performance high.
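The contrast between the two modes can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the `Primary` and `Replica` classes are hypothetical, there is no network or failure handling, and `flush()` merely stands in for a background replication thread.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []

    def apply(self, entry):
        self.log.append(entry)


class Primary:
    def __init__(self, replicas, synchronous):
        self.log = []
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = []  # entries not yet shipped (asynchronous mode only)

    def write(self, entry):
        self.log.append(entry)
        if self.synchronous:
            # Synchronous: every replica applies the entry before we answer.
            for replica in self.replicas:
                replica.apply(entry)
        else:
            # Asynchronous: answer right away and ship the entry later.
            self.pending.append(entry)
        return "ok"

    def flush(self):
        """Stand-in for the background replication of an asynchronous primary."""
        for entry in self.pending:
            for replica in self.replicas:
                replica.apply(entry)
        self.pending.clear()


sync_replica = Replica("r1")
Primary([sync_replica], synchronous=True).write("x=1")
print(sync_replica.log)   # the entry is already on the replica

async_replica = Replica("r2")
async_primary = Primary([async_replica], synchronous=False)
async_primary.write("x=1")
print(async_replica.log)  # still empty: a read from the replica is stale
async_primary.flush()
print(async_replica.log)  # the replica has caught up
```

The window between `write` and `flush` in the asynchronous case is exactly where a client can observe different data on the master and on a slave.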
Semi-synchronous replication: a compromise between the two methods above that balances consistency and availability. There are two variants. In one, the write is considered successful as soon as one slave node acknowledges it. In the other, the write is considered successful once half of the slave nodes have acknowledged it. With this scheme, replicas can still be inconsistent after synchronization, so we have to decide which node's data is authoritative. One approach is to treat the leader's data as authoritative: compare the records index by index, find the first position where a follower disagrees with the leader, and overwrite the follower's data from that position onward so that it is forced to agree with the leader.
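Both ideas in the paragraph above can be sketched briefly: how many acknowledgments each mode waits for, and how a follower is forced to agree with the leader. The mode names, the function names, and the "at least half, rounded up" reading of "half of the slave nodes" are assumptions for this illustration, not the behavior of any specific database.

```python
def required_acks(num_slaves, mode):
    """Number of slave acknowledgments to wait for before reporting success."""
    if mode == "sync":
        return num_slaves
    if mode == "async":
        return 0
    if mode == "semi-sync-one":
        return min(1, num_slaves)
    if mode == "semi-sync-half":
        return (num_slaves + 1) // 2  # at least half, rounded up
    raise ValueError(f"unknown mode: {mode}")


def reconcile(leader_log, follower_log):
    """Force a follower to agree with the leader.

    Returns (index of the first position that differs, repaired follower log).
    Entries before that index are kept; everything after comes from the leader.
    """
    match = 0
    while (match < len(leader_log) and match < len(follower_log)
           and leader_log[match] == follower_log[match]):
        match += 1
    return match, follower_log[:match] + leader_log[match:]


leader = ["set a=1", "set b=2", "set c=3", "set d=4"]
follower = ["set a=1", "set b=2", "set c=9"]  # diverged at index 2
print(required_acks(4, "semi-sync-half"))
print(reconcile(leader, follower))
```

In the example, the follower keeps its first two entries, discards its conflicting third entry, and copies the leader's entries from index 2 onward.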
MySQL cluster solutions can support all three replication methods through configuration.
Reference:
Geek Time, 《Distributed principle and algorithm analysis》