Distributed block device replication: client
2022-07-27 10:39:00 【Kun Yu】
Distributed Replicated Block Device, DRBD for short, is a network-based block replication storage solution in which user-space applications drive a kernel module. It is mainly used to mirror disk partitions, logical volumes and similar block devices between servers. When a user writes data to the local disk, the data is also sent to the disk of another host on the network, so the local host (the primary node) and the remote host (the standby node) stay synchronized in real time. If the local host fails, the remote host still holds a copy of the same data and can continue to serve it, which keeps the data safe.
The core function of DRBD is data mirroring. It mirrors an entire disk device or disk partition over the network: the data of one node is transmitted to the remote node in real time so that the two nodes stay consistent. This is somewhat like RAID over the network.
Chapter preview:
1. DRBD transmission between nodes
2. DRBD client
2.1 Analysis of selected functions
2.2 Command function execution flow
2.3 drbdsetup execution flow
Chapter content:
1. DRBD transmission between nodes
Refer to the following notes on how the bitmap is transferred between nodes:
* At a storage granularity of 4 KiB (one page) per bit,
* storage sizes of several TiB,
* and possibly low-bandwidth replication links,
* the bitmap transfer can take far too long
* if the bitmap is transmitted in plain text.
*
* We try to reduce the bitmap information that is transferred
* by encoding the run lengths of bit polarity.
*
* We never actually need to encode a "zero" (run lengths are positive),
* but we do have to store the value of the first bit.
* The first bit of information therefore tells whether the first run length
* gives the number of set or of unset bits.
*
* We assume that large areas are either completely set or completely unset,
* so any run-length method compresses them well,
* even when each run length is encoded as a fixed-size 32-bit or 64-bit integer.
*
* Still, there may be areas where the polarity flips every few bits,
* and encoding the run-length sequence of those areas with fixed-size
* integers would be much worse than plain text.
* We want to encode small run-length values with a minimal code length,
* while still being able to encode a huge run of all zeros.
*
* Therefore we need a variable-length integer encoding, VLI.
*
* In some cases we would produce more code bits than the plain-text input.
* We need to send such incompressible chunks as plain text, skip over them,
* and then see whether the next chunk compresses better.
*
* We do not care much about an "excellent" compression ratio for large
* run lengths (all set / all clear): whether we reach a factor of 100
* or of 1000 is not a big issue.
* We do not, however, want to waste too much on short run lengths
* in the "noisy" parts of the bitmap.
*
* There are endless variants of VLI. We experimented with:
** simple byte-based encodings
** various bit-based encodings with different code word lengths.
*
* To avoid yet another configuration parameter (the choice of bitmap
* compression algorithm) that would be difficult to explain and tune,
* we simply chose the one variant that gave the best results in all test cases.
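The run-length idea in the notes above can be illustrated with a minimal sketch. This is not DRBD's VLI encoder: the names bitmap_get and rle_encode_bits, the least-significant-bit-first layout, and the fixed-width uint32_t run lengths are all assumptions chosen to keep the example short. It stores the polarity of the first bit once and then only positive run lengths, and it reports failure when a chunk has too many runs, which is the case where a sender would fall back to plain text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return bit number i of a bitmap stored least significant bit first. */
static int bitmap_get(const uint8_t *bitmap, size_t i)
{
    return (bitmap[i / 8] >> (i % 8)) & 1;
}

/*
 * Encode nbits bits of bitmap as run lengths of alternating polarity.
 * *first_bit receives the polarity of the first run; runs[] receives the
 * (always positive) run lengths. Returns the number of runs written, or 0
 * if nbits is 0 or the runs do not fit into max_runs entries. This sketch
 * assumes that no single run is longer than UINT32_MAX bits.
 */
static size_t rle_encode_bits(const uint8_t *bitmap, size_t nbits,
                              int *first_bit, uint32_t *runs, size_t max_runs)
{
    size_t n = 0;
    size_t i = 0;

    assert(bitmap != NULL && first_bit != NULL && runs != NULL);
    if (nbits == 0)
        return 0;

    *first_bit = bitmap_get(bitmap, 0);
    while (i < nbits) {
        int polarity = bitmap_get(bitmap, i);
        uint32_t len = 0;

        /* Count how many consecutive bits share this polarity. */
        while (i < nbits && bitmap_get(bitmap, i) == polarity) {
            len++;
            i++;
        }
        if (n == max_runs)
            return 0;   /* too many runs: not worth compressing */
        runs[n++] = len;
    }
    return n;
}
```

For a 24-bit bitmap made of the bytes 0xF0, 0xFF, 0x01, the function reports a first bit of 0 and three runs: 4 clear bits, 13 set bits, 7 clear bits. A real encoder would then pack each run length with a variable-length code instead of a fixed 32-bit integer.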
2. DRBD client
DRBD consists of the client tools (drbdadm, drbdsetup, drbdmeta), the kernel modules (drbd.ko, drbd_transport_tcp.ko) and related scripts, and is used to build high-availability clusters. Next, we analyze how the DRBD client executes.
Open user/v9/drbdadm_main.c and find the following structure:
struct adm_cmd *cmds[] = {
/* name, function, flags
* sort order:
* - normal config commands,
* - normal meta data manipulation
* - sh-*
* - handler
* - advanced
***/
&attach_cmd,
&disk_options_cmd,
&detach_cmd,
&new_peer_cmd,
&del_peer_cmd,
&new_path_cmd,
&del_path_cmd,
&connect_cmd,
&net_options_cmd,
...
The drbd client defines its command types as an array. A command category may cover several command functions, such as the attach function, the disk options function, and the connect function. Most of the command types are declared as follows:
struct adm_cmd new_minor_cmd;
struct adm_cmd new_resource_cmd;
struct adm_cmd res_options_cmd;
struct adm_cmd res_options_defaults_cmd;
struct adm_cmd attach_cmd;
struct adm_cmd disk_options_cmd;
struct adm_cmd disk_options_defaults_cmd;
struct adm_cmd resize_cmd;
struct adm_cmd new_peer_cmd;
struct adm_cmd del_peer_cmd;
struct adm_cmd new_path_cmd;
struct adm_cmd del_path_cmd;
struct adm_cmd connect_cmd;
struct adm_cmd net_options_cmd;
struct adm_cmd net_options_defaults_cmd;
struct adm_cmd peer_device_options_defaults_cmd;
struct adm_cmd disconnect_cmd;
struct adm_cmd detach_cmd;
struct adm_cmd del_minor_cmd;
struct adm_cmd proxy_conn_down_cmd;
struct adm_cmd proxy_conn_up_cmd;
struct adm_cmd proxy_conn_plugins_cmd;
struct adm_cmd proxy_reconf_cmd;
static const struct adm_cmd invalidate_setup_cmd;
static const struct adm_cmd forget_peer_setup_cmd;
Next, the general calling sequence of these command functions (the _adm_drbdmeta function is invoked through command->function):
rv = parse_options(argc, argv, &cmd, &resource_names);
Parse the arguments passed on the command line to determine whether they name a valid command and resource.
config_file = config_file_from_arg(resource_names[0]);
Locate the configuration file for the resource.
if (config_from_stdin)
    config_save = config_file;
else
    config_save = canonify_path(config_file);
Load the resource configuration.
my_parse();
Parse the resource configuration (command type -> command function).
r = call_cmd(cmd, &ctx, EXIT_ON_FAIL);
Run the command function.
r = run_deferred_cmds();
Run the deferred commands in order, then release resources and exit the tool.
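The dispatch pattern described above, a table of command descriptors that each carry a function pointer, can be sketched as follows. Everything here is hypothetical and greatly simplified: struct demo_cmd, demo_attach, demo_connect, find_demo_cmd and run_demo_cmd are illustration names, not part of drbdadm, and the real struct adm_cmd carries more fields than a name and a function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for drbdadm's struct adm_cmd. */
struct demo_cmd {
    const char *name;
    int (*function)(const char *resource);
};

/* Two placeholder command functions: succeed if a resource name is given. */
static int demo_attach(const char *resource)  { return resource ? 0 : 1; }
static int demo_connect(const char *resource) { return resource ? 0 : 1; }

static struct demo_cmd attach_demo_cmd  = { "attach",  demo_attach };
static struct demo_cmd connect_demo_cmd = { "connect", demo_connect };

/* Table of known commands, in the spirit of the cmds[] array. */
static struct demo_cmd *demo_cmds[] = {
    &attach_demo_cmd,
    &connect_demo_cmd,
};

/* Look a command up by name; returns NULL if it is unknown. */
static struct demo_cmd *find_demo_cmd(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(demo_cmds) / sizeof(demo_cmds[0]); i++)
        if (strcmp(demo_cmds[i]->name, name) == 0)
            return demo_cmds[i];
    return NULL;
}

/* Dispatch: resolve the name, then call through the function pointer. */
static int run_demo_cmd(const char *name, const char *resource)
{
    struct demo_cmd *cmd = find_demo_cmd(name);

    if (cmd == NULL)
        return -1;
    return cmd->function(resource);
}
```

Keeping the commands in a table means the parsing code only has to resolve a name to a descriptor; adding a command is a matter of adding one descriptor and one table entry.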
2.1 Analysis of selected functions
Next, we trace the important functions.
Functions related to kernel communication:
static struct genl_sock *genl_connect(__u32 nl_groups):
struct genl_sock *s = calloc(1, sizeof(*s));
...
s->s_fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_GENERIC);
...
Initializes the Generic Netlink socket descriptor.
genl_send(struct genl_sock *s, struct msg_buff *msg):
struct nlmsghdr *n = (struct nlmsghdr *)msg->data;
...
return do_send(s->s_fd, msg->data, n->nlmsg_len);
Writes data through the Generic Netlink descriptor.
int genl_recv_msgs(struct genl_sock *s, struct iovec *iov, char **err_desc, int timeout_ms):
struct nlmsghdr *nlh;
int c = genl_recv_timeout(s, iov, timeout_ms);
...
Receives data sent by the kernel over Generic Netlink, with a timeout. The core of this function is the call to genl_recv_timeout. Inside genl_recv_timeout:
pfd.fd = s->s_fd;
pfd.events = POLLIN;
if ((poll(&pfd, 1, timeout_ms) != 1) || !(pfd.revents & POLLIN))
...
n = recvmsg(s->s_fd, &msg, flags);
Reads data through the Generic Netlink descriptor (obtained by calling genl_connect).
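The "poll first, then read" pattern used by genl_recv_timeout can be shown on an ordinary file descriptor. The sketch below is an illustration only: read_with_timeout is a hypothetical helper, and it calls read() on a generic descriptor rather than recvmsg() on a netlink socket.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read up to len
 * bytes into buf. Returns the number of bytes read, 0 on timeout (a read
 * at end of file also yields 0), or -1 on error.
 */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(buf != NULL);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;      /* poll itself failed */
    if (rc == 0 || !(pfd.revents & POLLIN))
        return 0;       /* timed out, or nothing readable */
    return read(fd, buf, len);
}
```

Called on the read end of an empty pipe with a 50 ms timeout, the helper returns 0 after the timeout; once two bytes have been written to the other end, the same call returns 2.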
struct genl_sock *genl_connect_to_family(struct genl_family *family):
...
s = genl_connect(family->nl_groups);
...
if (genl_send(s, msg)) {
...
if (genl_recv_msgs(s, &iov, NULL, 3000) <= 0) {
...
Initializes Generic Netlink and connects to the kernel's Generic Netlink service module in order to obtain the group id and the family id.
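Resolving a Generic Netlink family means sending a CTRL_CMD_GETFAMILY request to the kernel's controller family and reading the id from the reply. The sketch below only builds such a request in a buffer; it does not open a socket or talk to the kernel. build_getfamily_request is a hypothetical helper, not the function used by the DRBD tools, and the family name passed in the usage note ("drbd") is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/*
 * Build a CTRL_CMD_GETFAMILY request for family_name into buf, which must
 * be 4-byte aligned. Layout: nlmsghdr, genlmsghdr, one string attribute.
 * Returns the total message length, or 0 if the name is too long or the
 * buffer is too small.
 */
static size_t build_getfamily_request(void *buf, size_t buflen,
                                      const char *family_name, __u32 seq)
{
    struct nlmsghdr *nlh = buf;
    struct genlmsghdr *gh;
    struct nlattr *nla;
    size_t name_len, attr_len, total;

    assert(buf != NULL && family_name != NULL);
    name_len = strlen(family_name) + 1;         /* include the NUL */
    if (name_len > GENL_NAMSIZ)
        return 0;
    attr_len = NLA_HDRLEN + name_len;
    total = NLMSG_HDRLEN + GENL_HDRLEN + NLA_ALIGN(attr_len);
    if (total > buflen)
        return 0;
    memset(buf, 0, total);

    nlh->nlmsg_len = (__u32)total;
    nlh->nlmsg_type = GENL_ID_CTRL;     /* the generic netlink controller */
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq = seq;

    gh = (struct genlmsghdr *)((char *)buf + NLMSG_HDRLEN);
    gh->cmd = CTRL_CMD_GETFAMILY;
    gh->version = 1;

    nla = (struct nlattr *)((char *)gh + GENL_HDRLEN);
    nla->nla_type = CTRL_ATTR_FAMILY_NAME;
    nla->nla_len = (__u16)attr_len;     /* unpadded attribute length */
    memcpy((char *)nla + NLA_HDRLEN, family_name, name_len);
    return total;
}
```

With the name "drbd" the message is 32 bytes long: a 16-byte netlink header, a 4-byte generic netlink header, and a 9-byte attribute padded to 12 bytes. The reply to such a request carries the numeric family id and the multicast groups, which is the information the text above says genl_connect_to_family obtains.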
2.2 Command function execution flow
Next, we use the adm_create_md function to understand what "drbdadm create-md mystore.res" does (adm_create_md may be executed several times over the whole synchronization process):
set_peer_in_resource(ctx->res, true);
Set up the peer in the resource (for example, the net options).
tb = run_adm_drbdmeta(ctx, "read-dev-uuid");
This function starts the drbdmeta program. Inside the function:
if(pid == 0) {
...
rr = _adm_drbdmeta(&local_ctx,
SLEEPS_VERY_LONG|
DONT_REPORT_FAILED,
NULL);
The drbdmeta program is started through m_system_ex -> m__system. Execution then continues in the main function of user/shared/drbdmeta.c:
...
int c = getopt_long(argc, argv, make_optstring(metaopt), metaopt, 0);
After this call has parsed the arguments, execution continues:
cfg = new_cfg();
Allocate the configuration, which holds ops, md_device_name, drbd_dev_name, minor (the device minor number) and so on. This is where the data needed to interact with the kernel driver is kept (through the VFS, for example via open).
if (parse_format(cfg, argv + ai, argc - ai, &ai)) {
...
}
Following the call further:
struct format_ops f_ops[] = {
[DRBD_V06] = {
.name = "v06",
.args = (char *[]){"minor", NULL},
.parse = v06_parse,
.open = v06_md_open,
.close = generic_md_close,
.md_initialize = v06_md_initialize,
.md_disk_to_cpu = v06_md_disk_to_cpu,
.md_cpu_to_disk = v06_md_cpu_to_disk,
.get_gi = m_get_gc,
.show_gi = m_show_gc,
.set_gi = m_set_gc,
.outdate_gi = m_outdate_gc,
.invalidate_gi = m_invalidate_gc,
},
...
}
The f_ops array used by parse_format describes the operations of the virtual block device for each format.
return cfg->ops->parse(cfg, argv + 1, argc - 1, ai);
open reads the respective on-disk members and copies the "superblock" metadata into struct md_cpu.
if (strcmp(cfg->drbd_dev_name, "-")) {
cfg->minor = dt_minor_of_dev(cfg->drbd_dev_name);
...
cfg->lock_fd = dt_lock_drbd(cfg->minor);
...
}
Look up the device minor number and acquire the lock for that minor.
...
rv = command->function(cfg, argv + ai, argc - ai);
Here the call goes to meta_create_md (each command has its own function). Inside meta_create_md:
if (is_v09(cfg)) {
...
}
Check whether the configuration uses the DRBD_V09 format.
...
err = cfg->ops->open(cfg);
Run the open function.
if (err == VALID_MD_FOUND_AT_LAST_KNOWN_LOCATION) {
...
}
Adjust or check the metadata of the standby (offline) node.
err = err || cfg->ops->md_cpu_to_disk(cfg);
Write the cached metadata to disk (the kernel handles the actual data; this is only the logical processing). The largest chunk written at once is about 128M and can be set in the configuration file.
if (!err)
    wipe_after_convert(cfg);
After a successful write, clear the cache.
err = cfg->ops->close(cfg);
Back in meta_create_md, run the close function and leave meta_create_md. One synchronization pass is now complete.
Back in the m__system function ...
*ex = rv;
Return the result to _adm_drbdmeta (what _adm_drbdmeta does is determined by ctx->cmd, that is, by the command->function invoked internally; it can act for the local machine or for the peer machine).
...
The calling relationship with the kernel is covered in the kernel chapter.
}
...
Back in run_adm_drbdmeta, continuing:
if(!device_uuid) {
get_random_bytes(&device_uuid, sizeof(uint64_t));
}
If no device uuid exists yet, generate a random one.
if (send) {
uri = ssprintf("http://"HTTP_HOST"/cgi-bin/insert_usage.pl?"
"nu="U64"&ru="U64"&rs="U64,
ni.node_uuid, device_uuid, device_size);
make_get_request(uri);
}
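If sending is enabled, an HTTP GET request carrying the node uuid, the device uuid and the device size is sent to insert_usage.pl on HTTP_HOST. Judging by the URL, this reports usage information rather than synchronizing data.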
...
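Section 2.2 shows drbdadm starting drbdmeta as a separate program and passing its result back to the caller. The general pattern, run a child process and collect its exit status, can be sketched with fork, execvp and waitpid. run_child is a hypothetical helper written for this illustration; it is not drbdadm's m__system and leaves out timeouts and signal handling.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run the program named by argv[0] with the NULL-terminated argument
 * vector argv and wait for it to finish. Returns the child's exit status
 * (0..255), or -1 if fork or waitpid failed or the child was killed by a
 * signal. A failed exec shows up as exit status 127.
 */
static int run_child(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace this process image with the requested program. */
        execvp(argv[0], argv);
        _exit(127);     /* only reached if execvp failed */
    }
    /* Parent: wait for that specific child and decode its status. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Running sh -c "exit 3" through this helper returns 3, which the caller can then map to its own success or failure handling.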
2.3 drbdsetup execution flow
So far, drbdsetup has been seen being invoked during "/lib/drbd/drbd stop" (an external call; otherwise it is called internally by the other tools). It is generally used to configure the DRBD module loaded in the kernel. Next, enter the main function in user/v9/drbdsetup_main.c:
return drbdsetup_main(argc, argv);
Enter the drbdsetup_main function.
if (!modprobe_drbd()) {
...
}
Check whether drbd.ko is loaded.
maybe_exec_legacy_drbdsetup(argv);
Check whether drbdsetup-83 or drbdsetup-84 should be used and, if so, execute it.
drbd_sock = genl_connect_to_family(&drbd_genl_family);
Initialize Generic Netlink and connect to the kernel's Generic Netlink service module.
if ((context & CTX_MINOR) && !cmd->lockless)
    lock_fd = dt_lock_drbd(minor);
Take the lock for the minor and keep its descriptor.
rv = cmd->function(cmd, argc, argv);
Call the command function.
if ((context & CTX_MINOR) && !cmd->lockless)
    dt_unlock_drbd(lock_fd);
Release the lock.
...
The command has finished and the program exits.
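The lock, call, unlock sequence in drbdsetup_main can be sketched with an advisory file lock. The helpers below are hypothetical (demo_lock, demo_unlock, run_locked, demo_increment), and the use of flock() on a caller-supplied path is an assumption made for the example; how dt_lock_drbd implements its lock is not shown in this article.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Open (creating it if needed) a lock file and take an exclusive,
 * non-blocking advisory lock on it. Returns the descriptor holding the
 * lock, or -1 if the file cannot be opened or is already locked.
 */
static int demo_lock(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the lock taken by demo_lock and close its descriptor. */
static void demo_unlock(int fd)
{
    if (fd >= 0) {
        flock(fd, LOCK_UN);
        close(fd);
    }
}

/* Run fn(arg) while holding the lock; -1 if the lock cannot be taken. */
static int run_locked(const char *path, int (*fn)(void *), void *arg)
{
    int fd = demo_lock(path);
    int rv;

    if (fd < 0)
        return -1;
    rv = fn(arg);
    demo_unlock(fd);
    return rv;
}

/* Example command function: increments the int that arg points to. */
static int demo_increment(void *arg)
{
    (*(int *)arg)++;
    return 0;
}
```

While one descriptor holds the lock, a second run_locked call on the same path fails instead of running its function, which is the property that keeps two commands from operating on the same minor at once.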