[CEPH] cephfs internal implementation (II): example -- undigested

2022-06-26 15:12:00 bandaoyu

In a recent interview I was asked to describe the complete path of a single request. My answer was not ideal, so today I tried to reorganize it and record it here.

[Figure: ceph-fuse1.png]

There is an article that describes, in plain terms, the pitfalls the VFS-layer page cache runs into on CephFS and the corresponding strategies.

What happens after mount?

When ceph-fuse is started without the rootpath parameter, the client's root inode is the same as the MDS root. If rootpath is specified, the client's root inode is the inode of the last dentry in rootpath, but the client also keeps a copy of every inode on the path from that inode up to the MDS root inode (the nodes circled in red in the figure below). These parent inodes are needed when calculating quota: the quota may be set on the MDS root inode, for example, and a client that does not hold the MDS root inode could not obtain the quota value. Beyond that, the parent inodes have no other use.

[Figure: the case rootpath=/dir]
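The role of the retained parent inodes can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Ceph code: the `Inode` class, the `quota_max_bytes` field and the `effective_quota` helper are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inode:
    """Hypothetical, minimal inode: a name, a parent link and an optional quota."""
    name: str
    parent: Optional["Inode"] = None
    quota_max_bytes: int = 0  # 0 means "no quota set on this inode" in this model


def effective_quota(inode: Inode) -> Optional[Inode]:
    """Walk from the inode up towards the MDS root; return the nearest inode with a quota."""
    node = inode
    while node is not None:
        if node.quota_max_bytes > 0:
            return node
        node = node.parent
    return None


# Quota is set on the MDS root, while the client was mounted with rootpath=/dir.
mds_root = Inode("/", quota_max_bytes=10 * 2**30)
client_root = Inode("dir", parent=mds_root)
some_file = Inode("file", parent=client_root)
```

If the client dropped the link from `dir` to `/`, the walk would stop at the client root and the quota would never be found, which is why the parent chain is kept.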

Every request that ceph-fuse receives is about some inode, as the definitions of the Client::ll_xxx functions show, and that inode must already have been obtained from the MDS. For example, right after mount completes, the only metadata on the client is the root inode (and its parent inodes). At that point FUSE will not ask the client (that is, ceph-fuse) directly for an inode that is not a direct child of the root inode; the VFS and the FUSE module together guarantee this.

Open file

When a file is opened with the open() system call, the argument is a path. The FUSE module walks the path one dentry at a time, starting from the root (through Client::ll_lookup()), to make sure that every inode on the path already exists on the client. If a freshly mounted client is asked for a deeply nested path, it is a single request from the system call's point of view, but the client and the MDS actually exchange many messages: the deeper the path, the more round trips.
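The relation between path depth and round trips can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyMDS`, `ToyClient`, the inode numbers and the unconditional dentry cache are hypothetical, and a real client's cache validity depends on caps and leases, which this sketch ignores.

```python
class ToyMDS:
    """Hypothetical metadata server that only counts lookup round trips."""

    def __init__(self, tree):
        self.tree = tree          # maps (parent_ino, name) -> child_ino
        self.round_trips = 0

    def lookup(self, parent_ino, name):
        self.round_trips += 1
        return self.tree[(parent_ino, name)]


class ToyClient:
    """Hypothetical client that resolves a path one dentry at a time."""

    ROOT_INO = 1

    def __init__(self, mds):
        self.mds = mds
        self.dentry_cache = {}    # simplification: cached entries never expire

    def ll_lookup(self, parent_ino, name):
        key = (parent_ino, name)
        if key not in self.dentry_cache:
            # Cache miss: one round trip to the MDS for this dentry.
            self.dentry_cache[key] = self.mds.lookup(parent_ino, name)
        return self.dentry_cache[key]

    def walk(self, path):
        ino = self.ROOT_INO
        for name in path.strip("/").split("/"):
            ino = self.ll_lookup(ino, name)
        return ino
```

On a freshly mounted toy client, walking a path with three components costs three lookups; walking it again costs none because every dentry is already cached.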
Opening a file requires passing flags to open() that indicate the read/write access mode. On the client these system flags are converted to CEPH_FILE_MODE_xxx, and each CEPH_FILE_MODE_xxx corresponds to a set of caps, CEPH_CAP_xxx. If the client already holds the caps it needs, it returns directly; otherwise it sends a request to the MDS.
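The flags to mode to caps conversion can be sketched as follows. This is a simplified model, not the Ceph implementation: the bit values are illustrative, the helper names `flags_to_mode`, `caps_for_mode` and `open_needs_mds` are hypothetical, and only the four file caps discussed in this article are modelled (the real cap sets contain more bits).

```python
import os
from enum import IntFlag


class FileMode(IntFlag):
    """Stand-in for CEPH_FILE_MODE_xxx; values are illustrative."""
    RD = 1
    WR = 2


class Cap(IntFlag):
    """Stand-in for CEPH_CAP_FILE_xxx; values are illustrative."""
    FILE_RD = 1
    FILE_WR = 2
    FILE_CACHE = 4
    FILE_BUFFER = 8


def flags_to_mode(flags: int) -> FileMode:
    """Reduce open() flags to a read/write mode using only the access-mode bits."""
    access = flags & (os.O_RDONLY | os.O_WRONLY | os.O_RDWR)
    if access == os.O_WRONLY:
        return FileMode.WR
    if access == os.O_RDWR:
        return FileMode.RD | FileMode.WR
    return FileMode.RD


def caps_for_mode(mode: FileMode) -> Cap:
    """Assumed mapping: reading wants RD+CACHE, writing wants WR+BUFFER."""
    caps = Cap(0)
    if mode & FileMode.RD:
        caps |= Cap.FILE_RD | Cap.FILE_CACHE
    if mode & FileMode.WR:
        caps |= Cap.FILE_WR | Cap.FILE_BUFFER
    return caps


def open_needs_mds(flags: int, issued: Cap) -> bool:
    """True when some wanted cap has not been issued yet, so the MDS must be asked."""
    wanted = caps_for_mode(flags_to_mode(flags))
    return (wanted & ~issued) != 0
```

A client that already holds the read caps can satisfy a read-only open locally in this model, but an O_RDWR open still has to go to the MDS.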

[Figure: open() as seen from the client]


When there is only one client, that client acts as the loner and holds all the caps. This is the simple case and usually completes in a single request. The case with multiple clients is a little more complicated.

[Figure: MDS processing when the second client calls open(path, O_RDWR)]

After client2 sends OP_OPEN to the MDS, the MDS handles it in Server::handle_client_open(). The processing consists mainly of the four steps in the figure.
Step one, locking, is routine; it prevents the parent nodes from being deleted.
Step two, issue_new_caps(), records on the MDS the caps that client2 says it wants and drives the lock state machine through eval(). Because a new client has joined and both clients need to write to the file, the IFILE lock moves from its previous EXCL state towards MIX. In this scenario (client1 has opened the file successfully and client2 then sends an open request), the EXCL to MIX transition cannot be completed inside the MDS alone: client1, as the loner, was granted too many caps, and those caps must be revoked before the transition to MIX can continue. Revoking caps is asynchronous, so the following steps are not blocked.
Step three, check_inode_max_size(), records the range of the file that client2 may write. This data is journaled as the client range and is used for failure recovery. The step needs a wrlock on IFILE, and taking it causes no problem here: according to the state machine, the EXCL->MIX state allows the XCL role to take a wrlock, and client1, as the loner, is exactly the XCL role, so locking does not fail. The lock is released once the journal entry has been written to disk. The MDS then decides, based on the current state, whether to send new caps to each client. If the revoke of client1's caps has not completed yet, no new caps are sent to any client, because the lock state has not changed and the MDS must keep waiting for client1's caps to be returned (this wait is, of course, asynchronous). None of step three blocks the next step.
Step four sends the inode information back to client2. Only the inode information is sent, not the caps: the caps are sent to client2 separately, in another flow, once the MDS has finished revoking client1's caps.
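The asynchronous part of this flow (the open reply carries no caps, and caps arrive only after the loner's revoke completes) can be sketched with a toy state machine. This is a deliberately reduced model, not MDS code: `ToyFileLock` and its methods are hypothetical, the state names are plain strings, step three (the client range and the journal) is left out, and the set of caps allowed in MIX is an assumption of the model.

```python
class ToyFileLock:
    """Hypothetical model of the IFILE lock moving from EXCL to MIX."""

    # Assumption of this model: in MIX, clients may read and write
    # but may not cache or buffer file data.
    MIX_CAPS = frozenset({"FILE_RD", "FILE_WR"})

    def __init__(self, loner, loner_caps):
        self.state = "EXCL"
        self.loner = loner
        self.issued = {loner: set(loner_caps)}
        self.wanted = {}
        self.pending_revoke = None

    def handle_open(self, client, wanted):
        # Step two: record what the new client wants and start the transition.
        self.wanted[client] = set(wanted)
        self.issued.setdefault(client, set())
        if self.state == "EXCL":
            self.state = "EXCL_MIX"            # transitional state
            self.pending_revoke = self.loner   # revoke is sent, not awaited
        # Step four: the reply carries inode information only.
        return {"inode": "inode-info", "caps": set(self.issued[client])}

    def handle_revoke_ack(self, client):
        # Runs later, when the loner has given back its extra caps.
        if client != self.pending_revoke:
            return
        self.issued[client] &= self.MIX_CAPS
        self.pending_revoke = None
        self.state = "MIX"
        for other, want in self.wanted.items():
            self.issued[other] |= want & self.MIX_CAPS
```

In this model `handle_open` returns immediately with an empty cap set, and client2 only receives its read and write caps when `handle_revoke_ack` is called for client1.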

Read and write files

Once the file is open, you hold only a file handle at the system level. Before read() or write() is actually called, the file's caps may not have been fully granted to the client, so both Client::ll_read() and Client::ll_write() first call get_caps() to make sure the corresponding caps are held. If they are not, the client simply waits. (Usually it does not send a request to the MDS: missing caps mean that the earlier open() call has not truly finished, since a complete open() requires two results: 1. the inode information has been sent back, and 2. the caps have been granted.)
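The "just wait" behaviour can be sketched with a condition variable. This is an illustrative model only: `CapWaiter`, `grant` and the timeout parameter are hypothetical, and the real client's waiting logic is more involved.

```python
import threading


class CapWaiter:
    """Hypothetical model: block in get_caps() until the needed caps are granted."""

    def __init__(self):
        self._cond = threading.Condition()
        self._issued = set()

    def grant(self, caps):
        """Called when a cap message arrives from the MDS."""
        with self._cond:
            self._issued |= set(caps)
            self._cond.notify_all()

    def get_caps(self, need, timeout=None):
        """Wait until all needed caps are held; return False if the timeout expires."""
        need = set(need)
        with self._cond:
            return self._cond.wait_for(lambda: need <= self._issued, timeout=timeout)
```

A reader thread calling `get_caps({"FILE_RD"})` sleeps without contacting the MDS and wakes up as soon as another thread calls `grant({"FILE_RD"})`.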
The caps involved in reading and writing files are:

cap                     purpose
CEPH_CAP_FILE_RD        required to read a file
CEPH_CAP_FILE_WR        required to write a file
CEPH_CAP_FILE_CACHE     required to read file data from this client's cache
CEPH_CAP_FILE_BUFFER    required to write file data into this client's cache

FILE_RD and FILE_WR are the caps required to read and write files.
FILE_CACHE and FILE_BUFFER correspond to the buffer concept in ordinary file systems: each client can cache file contents in its own memory, and to use that in-memory data it must hold the corresponding two caps.
get_caps() increments the reference count of these four caps, and the count is decremented after the corresponding read or write returns from RADOS. While a cap's reference count is non-zero, the client will not release the cap even if it receives a revoke request from the MDS.
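The interaction between reference counts and revocation can be sketched as a toy model. The class `ToyCaps` and its methods are hypothetical names for illustration; the sketch only captures the rule that a cap under revocation is handed back once its reference count drops to zero.

```python
from collections import Counter


class ToyCaps:
    """Hypothetical per-inode cap bookkeeping on the client."""

    def __init__(self, issued):
        self.issued = set(issued)
        self.refs = Counter()
        self.revoking = set()
        self.released = []        # caps handed back to the MDS, in order

    def get_caps(self, caps):
        """Take a reference on each cap before starting a read or write."""
        caps = set(caps)
        if not caps <= self.issued:
            raise RuntimeError("cap not issued; the caller would have to wait")
        for cap in caps:
            self.refs[cap] += 1

    def put_caps(self, caps):
        """Drop the references after the operation has returned from RADOS."""
        for cap in caps:
            self.refs[cap] -= 1
        self._try_release()

    def handle_revoke(self, caps):
        """The MDS asks for some caps back; release only those not in use."""
        self.revoking |= set(caps) & self.issued
        self._try_release()

    def _try_release(self):
        for cap in sorted(self.revoking):
            if self.refs[cap] == 0:
                self.issued.discard(cap)
                self.revoking.discard(cap)
                self.released.append(cap)
```

A revoke that arrives in the middle of a write is therefore deferred in this model: the cap stays in `issued` until `put_caps` brings its count back to zero.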



Author: Songxinxin
Link: https://www.jianshu.com/p/79613d9f7160
Source: Jianshu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.

Copyright notice
This article was created by [bandaoyu]. Please include the original link when reprinting. Thank you.
https://yzsam.com/2022/177/202206261456117179.html