[Ceph] CephFS internal implementation (II): examples -- undigested
2022-06-26 15:12:00 【bandaoyu】
In a previous interview I was asked to describe the complete flow of a single request, and my answer was not ideal. Today I am trying to reorganize it and record it here.

Here is an article that explains, in simple terms, the pitfalls the VFS-layer page cache runs into with CephFS and the corresponding strategies.
What happens after mount?
When ceph-fuse is started without the rootpath parameter, the client's root inode is the same as the MDS root. If rootpath is specified, the client's root inode is the inode of the last dentry in rootpath, but the client also keeps a copy of every inode on the path from that inode up to the MDS root inode (the red-circled nodes in the figure). These ancestor inodes are used when computing quota: for example, a quota may be set on the MDS root inode, and without that inode the client could not obtain the quota value. Apart from that, the ancestor inodes serve no other purpose.
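The role of the retained ancestor inodes can be illustrated with a small model. This is only a sketch under assumptions: `Inode`, `build_mount_chain`, `effective_quota`, and the example path and quota value are hypothetical names and data for illustration, not Ceph's data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Inode:
    """Hypothetical, minimal stand-in for a client-side inode."""
    name: str
    parent: Optional["Inode"] = None
    quota_max_bytes: Optional[int] = None  # None means no quota on this inode


def build_mount_chain(rootpath: str, quotas: Dict[str, int]) -> List[Inode]:
    """Return the inodes from the MDS root down to the client's root inode.

    `quotas` maps an absolute path to a byte limit (illustrative input).
    """
    current = Inode(name="/", quota_max_bytes=quotas.get("/"))
    chain = [current]
    path = ""
    for component in [c for c in rootpath.split("/") if c]:
        path += "/" + component
        current = Inode(name=component, parent=current,
                        quota_max_bytes=quotas.get(path))
        chain.append(current)
    return chain


def effective_quota(inode: Inode) -> Optional[int]:
    """Walk up the retained ancestors until an inode with a quota is found."""
    node: Optional[Inode] = inode
    while node is not None:
        if node.quota_max_bytes is not None:
            return node.quota_max_bytes
        node = node.parent
    return None


# The quota is set only on the MDS root; the client is mounted deeper.
chain = build_mount_chain("/volumes/projects", {"/": 10 * 2**30})
client_root = chain[-1]
print(client_root.name, effective_quota(client_root))
```

Without the two ancestors in `chain`, the walk in `effective_quota` would stop at the client's root inode and never see the quota set on `/`.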

Every request that ceph-fuse receives concerns some inode, as the definitions of the Client::ll_xxx functions show, and that inode must already have been obtained from the MDS. For example, right after mount completes, the only metadata on the client is the root inode (and its ancestor inodes). At this point FUSE will not directly ask the client (that is, ceph-fuse) for a child of any inode other than the root; the VFS and FUSE modules guarantee this.
Open file
When a file is opened with the open() system call, the argument is a path. Starting from the root, the FUSE module traverses the path one dentry at a time (through Client::ll_lookup()) to make sure the inode of every dentry is already on the client. If a freshly mounted client requests a deep directory path, it is a single request from the system call's point of view, but the client and the MDS actually communicate many times: the deeper the path, the more round trips.
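A toy model makes the cost of a deep path visible. The sketch below assumes a hypothetical `ToyClient` with a dictionary standing in for the MDS; it counts one simulated round trip per uncached dentry and is not how Client::ll_lookup() is implemented.

```python
from typing import Dict, Tuple


class ToyClient:
    """Toy client that caches dentries and counts simulated MDS round trips."""

    def __init__(self, mds_tree: Dict[Tuple[int, str], int]) -> None:
        self.mds_tree = mds_tree  # (parent ino, name) -> child ino
        self.cache: Dict[Tuple[int, str], int] = {}
        self.round_trips = 0

    def lookup(self, parent_ino: int, name: str) -> int:
        key = (parent_ino, name)
        if key not in self.cache:
            self.round_trips += 1  # cache miss: ask the (simulated) MDS
            self.cache[key] = self.mds_tree[key]
        return self.cache[key]

    def walk(self, path: str, root_ino: int = 1) -> int:
        """Resolve a path one dentry at a time, starting from the root inode."""
        ino = root_ino
        for name in filter(None, path.split("/")):
            ino = self.lookup(ino, name)
        return ino


mds_tree = {(1, "a"): 2, (2, "b"): 3, (3, "c"): 4, (4, "file.txt"): 5}
client = ToyClient(mds_tree)
first_ino = client.walk("/a/b/c/file.txt")
trips_after_first = client.round_trips
client.walk("/a/b/c/file.txt")  # second walk is served from the cache
trips_after_second = client.round_trips
print(first_ino, trips_after_first, trips_after_second)
```

In this model the first open of a four-component path costs four round trips, and a repeated open of the same path costs none because every dentry is already cached.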
Opening a file requires passing a flags argument to open() that indicates read/write access. On the client these system flags are converted to CEPH_FILE_MODE_xxx, and each CEPH_FILE_MODE_xxx corresponds to a set of CEPH_CAP_xxx caps. If the client already holds the caps it needs, it returns directly; otherwise it sends a request to the MDS.
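The flags-to-mode-to-caps chain can be sketched as follows. This is a simplified illustration: the bit values are placeholders rather than Ceph's real constants, the cap sets per mode are reduced to the four file caps discussed in this article, and `flags_to_mode`, `caps_for_mode`, and `open_needs_mds` are hypothetical helper names.

```python
import os
from enum import IntFlag


class FileMode(IntFlag):
    """Stand-in for CEPH_FILE_MODE_xxx (placeholder bit values)."""
    RD = 1
    WR = 2
    RDWR = RD | WR


class Cap(IntFlag):
    """Stand-in for a few CEPH_CAP_FILE_xxx bits (placeholder bit values)."""
    FILE_RD = 1
    FILE_WR = 2
    FILE_CACHE = 4
    FILE_BUFFER = 8


def flags_to_mode(flags: int) -> FileMode:
    """Map open() access flags to a file mode."""
    access = flags & (os.O_RDONLY | os.O_WRONLY | os.O_RDWR)
    if access == os.O_RDONLY:
        return FileMode.RD
    if access == os.O_WRONLY:
        return FileMode.WR
    return FileMode.RDWR


def caps_for_mode(mode: FileMode) -> Cap:
    """Simplified mapping from a file mode to the caps it wants."""
    caps = Cap(0)
    if mode & FileMode.RD:
        caps |= Cap.FILE_RD | Cap.FILE_CACHE
    if mode & FileMode.WR:
        caps |= Cap.FILE_WR | Cap.FILE_BUFFER
    return caps


def open_needs_mds(flags: int, issued: Cap) -> bool:
    """True if some wanted cap is missing, so the MDS must be asked."""
    wanted = caps_for_mode(flags_to_mode(flags))
    return (wanted & issued) != wanted


read_caps = Cap.FILE_RD | Cap.FILE_CACHE
print(open_needs_mds(os.O_RDONLY, Cap(0)))    # nothing issued yet
print(open_needs_mds(os.O_RDONLY, read_caps)) # read caps already held
print(open_needs_mds(os.O_RDWR, read_caps))   # write caps still missing
```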

When there is only one client, it acts as the loner and holds all the caps. This is the simpler case and can usually be completed in a single request. The case of multiple clients is somewhat more complicated.

After client2 sends OP_OPEN to the MDS, the MDS handles it in Server::handle_client_open(). The process consists mainly of the four steps in the figure.
Step one, locking, is routine and prevents the parent node from being deleted.
Step two, issue_new_caps(), records on the MDS the caps that client2 claims to need and drives the lock state through eval(). Because a new client has joined and both clients need to write the file, the IFILE lock moves from its previous EXCL state toward MIX. In this scenario (client1 has opened the file successfully and client2 then issues an open request), the EXCL-to-MIX transition cannot complete on the MDS immediately: client1, as the loner, was granted too many caps, and those caps must be revoked before the transition to MIX can continue. Cap revocation is asynchronous, so the following steps are not blocked.
Step three, check_inode_max_size(), records the range of the file that client2 may write. This is recorded in the journal as the client range and is used for failure recovery. This step takes a wrlock on IFILE. Taking the lock does not fail here because, according to the state machine, EXCL->MIX allows the XCL role to wrlock, and client1, as the loner, is exactly the XCL role. The lock is released once the journal entry has been written out, and the MDS then decides, based on the current state, whether to issue new caps to each client. If the revoke of client1's caps has not finished yet, no new caps are sent, because the lock state has not changed and the MDS must keep waiting (asynchronously, of course) for client1's caps to be returned. None of step three blocks the next step.
Step four sends the inode information back to client2. Only the inode information is sent, not the caps; the caps are delivered to client2 separately by another code path (once the MDS has finished revoking client1's caps).
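The ordering described in these steps (start the revoke, reply with the inode only, grant caps after the revoke completes) can be sketched with a toy state machine. Everything here is an assumption for illustration: `ToyFileLock`, its state names, and its message strings are hypothetical, steps one and three are omitted, and the real MDS lock state machine has many more states.

```python
from typing import List, Optional, Set


class ToyFileLock:
    """Toy model of the IFILE lock moving from EXCL to MIX."""

    def __init__(self, loner: str) -> None:
        self.state = "EXCL"
        self.loner: Optional[str] = loner
        self.wanted: Set[str] = {loner}
        self.pending_revoke: Optional[str] = None
        self.sent: List[str] = []  # messages the toy MDS has sent, in order

    def handle_open(self, client: str) -> None:
        # Step two: record what the new client wants and start the transition.
        self.wanted.add(client)
        if self.state == "EXCL" and client != self.loner:
            self.state = "EXCL->MIX"
            self.pending_revoke = self.loner
            self.sent.append(f"revoke caps from {self.loner}")
        # Step four: the reply carries inode info only and does not wait
        # for the asynchronous revoke to finish.
        self.sent.append(f"open reply (inode only) to {client}")

    def handle_revoke_ack(self, client: str) -> None:
        # The transition completes only after the old loner gives caps back.
        if self.state == "EXCL->MIX" and client == self.pending_revoke:
            self.state = "MIX"
            self.loner = None
            self.pending_revoke = None
            for name in sorted(self.wanted):
                self.sent.append(f"grant caps to {name}")


lock = ToyFileLock("client1")
lock.handle_open("client2")
state_after_open = lock.state
lock.handle_revoke_ack("client1")
print(state_after_open, lock.state)
print(lock.sent)
```

The message log shows that client2 receives its open reply while the lock is still in the intermediate state, and that caps reach both clients only after client1 acknowledges the revoke.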
Read and write files
After the file is opened, the application only holds a file handle at the system level. Before read() or write() is actually called, the file's caps may not yet have been fully granted to the client, so both Client::ll_read() and Client::ll_write() first call get_caps() to make sure the corresponding caps are held, and wait if they are not. (Usually no request is sent to the MDS, because missing caps mean the earlier open() has not truly finished: open() is complete only when 1. the inode information has been sent back and 2. the caps have been granted.)
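The "wait until the caps arrive" behaviour can be sketched with a condition variable. This is a minimal model, assuming a hypothetical `ToyCapWaiter` class with string cap names; it only illustrates the blocking pattern and is not the Client::get_caps() implementation.

```python
import threading
from typing import Set


class ToyCapWaiter:
    """Toy model: readers and writers block until the needed caps are issued."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._issued: Set[str] = set()

    def grant(self, caps: Set[str]) -> None:
        """Called when a (simulated) cap grant arrives from the MDS."""
        with self._cond:
            self._issued |= caps
            self._cond.notify_all()

    def get_caps(self, need: Set[str], timeout: float = 5.0) -> bool:
        """Block until every cap in `need` is issued, or until the timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: need <= self._issued, timeout=timeout)


waiter = ToyCapWaiter()
# Simulate the MDS granting caps shortly after open() has returned.
granter = threading.Timer(0.05, waiter.grant, args=({"FILE_RD", "FILE_CACHE"},))
granter.start()
got_read = waiter.get_caps({"FILE_RD"})                 # blocks until the grant
got_write = waiter.get_caps({"FILE_WR"}, timeout=0.01)  # never granted here
granter.join()
print(got_read, got_write)
```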
Caps involved in reading and writing files:
| cap | purpose |
|---|---|
| CEPH_CAP_FILE_RD | Required to read a file |
| CEPH_CAP_FILE_WR | Required to write a file |
| CEPH_CAP_FILE_CACHE | Required to read a file from this client's cache |
| CEPH_CAP_FILE_BUFFER | Required to write a file into this client's cache |
FILE_RD and FILE_WR are the caps required to read and write a file. FILE_CACHE and FILE_BUFFER correspond to the buffering concepts of a regular file system: every client may cache file contents in its own memory, and using the in-memory data requires the corresponding caps from these two.
get_caps() increments the reference count of these four caps, and the count is decremented after the corresponding read or write returns from RADOS. While a cap's reference count is not 0, the client will not release the cap even if it receives a revoke request from the MDS.
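The interaction between reference counts and revocation can be sketched like this. It is a toy model under assumptions: `ToyCapRefs`, its method names, and the string cap names are hypothetical, and the real client tracks considerably more state.

```python
from collections import Counter
from typing import Set


class ToyCapRefs:
    """Toy model: a revoked cap is released only when its refcount drops to 0."""

    def __init__(self, issued: Set[str]) -> None:
        self.issued = set(issued)
        self.refs: Counter = Counter()
        self.revoking: Set[str] = set()

    def get_cap_ref(self, cap: str) -> None:
        """Take a reference before starting a read or write."""
        if cap not in self.issued:
            raise PermissionError(f"{cap} not issued")
        self.refs[cap] += 1

    def put_cap_ref(self, cap: str) -> None:
        """Drop the reference once the I/O has returned."""
        self.refs[cap] -= 1
        self._release_revoked()

    def handle_revoke(self, caps: Set[str]) -> None:
        """Handle a (simulated) revoke request from the MDS."""
        self.revoking |= caps & self.issued
        self._release_revoked()

    def _release_revoked(self) -> None:
        # Only caps with no in-flight users can actually be given back.
        releasable = {c for c in self.revoking if self.refs[c] == 0}
        self.issued -= releasable
        self.revoking -= releasable


caps = ToyCapRefs({"FILE_WR", "FILE_BUFFER"})
caps.get_cap_ref("FILE_WR")                    # a write is in flight
caps.handle_revoke({"FILE_WR", "FILE_BUFFER"})
held_during_write = set(caps.issued)           # FILE_WR is still pinned
caps.put_cap_ref("FILE_WR")                    # the write has completed
held_after_write = set(caps.issued)
print(held_during_write, held_after_write)
```

In this model FILE_BUFFER is released immediately because nothing references it, while FILE_WR is released only after the in-flight write drops its reference.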
Author: Songxinxin
Link: https://www.jianshu.com/p/79613d9f7160
Source: Jianshu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.