Relevant implementation records of CSI and local disk
2022-06-28 05:43:00 【Mrpre】
Interface
CSI defines many interfaces, logically grouped into three services. Developers implement the interfaces their driver needs, and the external components (node-driver-registrar, external-provisioner, etc.) call them over a unix socket using gRPC.
In CSI driver code you will often see lines like the following, which register the corresponding gRPC servers:
csi.RegisterIdentityServer(srv, identity)
csi.RegisterControllerServer(srv, ctl)
csi.RegisterNodeServer(srv, node)
The interfaces are split into these categories because they are called by different external components. In practice, the CSI pod you develop can register all three services; when that pod is deployed together with external-provisioner, the sidecar only calls the controller service and nobody calls the other two services, and the same holds for the other sidecars.
Alternatively, you can split your CSI driver into three pods, each registering one service, so that the controller-service CSI pod is deployed with its own external-provisioner, the node-service CSI pod with its own sidecar, and so on, in a microservice style. In practice this is rarely done.
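Whichever layout you choose, the wiring is the same: the driver listens on a unix socket and registers the services it implements, and the sidecars are pointed at that socket via their --csi-address flag. A minimal sketch of that wiring (the socket path and the helper name serveCSI are just this sketch's choices):
package driver

import (
    "net"
    "os"

    "github.com/container-storage-interface/spec/lib/go/csi"
    "google.golang.org/grpc"
)

// serveCSI wires implementations of the three CSI services onto a unix socket,
// which is where the sidecars will call in over gRPC.
func serveCSI(endpoint string, id csi.IdentityServer, ctl csi.ControllerServer, node csi.NodeServer) error {
    _ = os.Remove(endpoint) // clean up a stale socket left by a previous run
    lis, err := net.Listen("unix", endpoint)
    if err != nil {
        return err
    }
    srv := grpc.NewServer()
    csi.RegisterIdentityServer(srv, id)
    csi.RegisterControllerServer(srv, ctl)
    csi.RegisterNodeServer(srv, node)
    return srv.Serve(lis)
}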
Identity service
// The identity service must implement the following interfaces
type IdentityServer interface {
GetPluginInfo(context.Context, *GetPluginInfoRequest) (*GetPluginInfoResponse, error)
GetPluginCapabilities(context.Context, *GetPluginCapabilitiesRequest) (*GetPluginCapabilitiesResponse, error)
Probe(context.Context, *ProbeRequest) (*ProbeResponse, error)
}
GetPluginInfo: returns basic information such as the driver name, typically something like disk.xx.com.
GetPluginCapabilities: returns the plugin's capabilities, e.g. PluginCapability_Service_CONTROLLER_SERVICE and PluginCapability_Service_VOLUME_ACCESSIBILITY_CONSTRAINTS. The former means the plugin provides a controller service; the latter declares topology support, which is critical for a local-disk driver and must be enabled.
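For illustration, here is a minimal identity-service sketch; the driver name disk.xxx.com, the version string and the identityServer type are placeholders of this sketch, and the types come from the github.com/container-storage-interface/spec/lib/go/csi bindings, so check them against the spec version you vendor:
package driver

import (
    "context"

    "github.com/container-storage-interface/spec/lib/go/csi"
)

// identityServer is a hypothetical minimal implementation of csi.IdentityServer.
type identityServer struct{}

func (s *identityServer) GetPluginInfo(ctx context.Context, req *csi.GetPluginInfoRequest) (*csi.GetPluginInfoResponse, error) {
    return &csi.GetPluginInfoResponse{
        Name:          "disk.xxx.com", // the driver name that StorageClass/CSIDriver objects refer to
        VendorVersion: "0.1.0",
    }, nil
}

func (s *identityServer) GetPluginCapabilities(ctx context.Context, req *csi.GetPluginCapabilitiesRequest) (*csi.GetPluginCapabilitiesResponse, error) {
    return &csi.GetPluginCapabilitiesResponse{
        Capabilities: []*csi.PluginCapability{
            // We provide a controller service.
            {Type: &csi.PluginCapability_Service_{Service: &csi.PluginCapability_Service{
                Type: csi.PluginCapability_Service_CONTROLLER_SERVICE}}},
            // Topology support: mandatory for a local-disk driver.
            {Type: &csi.PluginCapability_Service_{Service: &csi.PluginCapability_Service{
                Type: csi.PluginCapability_Service_VOLUME_ACCESSIBILITY_CONSTRAINTS}}},
        },
    }, nil
}

func (s *identityServer) Probe(ctx context.Context, req *csi.ProbeRequest) (*csi.ProbeResponse, error) {
    return &csi.ProbeResponse{}, nil
}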
Controller service
// ControllerServer is the server API for Controller service.
type ControllerServer interface {
CreateVolume(context.Context, *CreateVolumeRequest) (*CreateVolumeResponse, error)
DeleteVolume(context.Context, *DeleteVolumeRequest) (*DeleteVolumeResponse, error)
ControllerPublishVolume(context.Context, *ControllerPublishVolumeRequest) (*ControllerPublishVolumeResponse, error)
ControllerUnpublishVolume(context.Context, *ControllerUnpublishVolumeRequest) (*ControllerUnpublishVolumeResponse, error)
ValidateVolumeCapabilities(context.Context, *ValidateVolumeCapabilitiesRequest) (*ValidateVolumeCapabilitiesResponse, error)
ListVolumes(context.Context, *ListVolumesRequest) (*ListVolumesResponse, error)
GetCapacity(context.Context, *GetCapacityRequest) (*GetCapacityResponse, error)
ControllerGetCapabilities(context.Context, *ControllerGetCapabilitiesRequest) (*ControllerGetCapabilitiesResponse, error)
CreateSnapshot(context.Context, *CreateSnapshotRequest) (*CreateSnapshotResponse, error)
DeleteSnapshot(context.Context, *DeleteSnapshotRequest) (*DeleteSnapshotResponse, error)
ListSnapshots(context.Context, *ListSnapshotsRequest) (*ListSnapshotsResponse, error)
ControllerExpandVolume(context.Context, *ControllerExpandVolumeRequest) (*ControllerExpandVolumeResponse, error)
ControllerGetVolume(context.Context, *ControllerGetVolumeRequest) (*ControllerGetVolumeResponse, error)
}
Unless noted otherwise below, these are all called by external-provisioner.
CreateVolume: create a disk. For a cloud disk this means calling out to the cloud provider's API to create one; for a local disk it may be as simple as creating a directory under some base directory.
DeleteVolume: delete the disk.
ControllerPublishVolume: attach. For a cloud disk this is something like mount -t nfs, attaching the remote disk to a local directory; for a local disk it is not needed. Called by external-attacher.
ControllerUnpublishVolume: detach. Called by external-attacher.
ValidateVolumeCapabilities: roughly, whether the volume may be mounted by multiple nodes; for example, a database's primary and standby instances that need to share the same block volume would rely on this.
ListVolumes: as the name suggests.
GetCapacity: returns the remaining disk capacity. A critical interface, but it only takes effect when both external-provisioner and the CSIDriver object enable storage capacity tracking, see https://github.com/kubernetes-csi/external-provisioner#capacity-support
ControllerGetCapabilities: reports which controller interfaces are supported. External components call this first and then decide whether to call CreateVolume, ControllerPublishVolume, and so on.
CreateSnapshot: snapshots, called by external-snapshotter.
DeleteSnapshot: snapshots, called by external-snapshotter.
ListSnapshots: snapshots, called by external-snapshotter.
ControllerExpandVolume: volume expansion; note that CSI currently does not support shrinking. Called by external-resizer.
ControllerGetVolume: as the name suggests.
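To make the local-disk case concrete, here is a minimal CreateVolume sketch. The base path /data/csi, the topology key topology.disk.xxx.com/hostname and the controllerServer type are assumptions of this sketch, not anything fixed by CSI:
package driver

import (
    "context"
    "os"
    "path/filepath"

    "github.com/container-storage-interface/spec/lib/go/csi"
    "golang.org/x/sys/unix"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

const (
    baseDir     = "/data/csi"                      // hypothetical directory backing the local volumes
    topologyKey = "topology.disk.xxx.com/hostname" // hypothetical topology key, must match NodeGetInfo
)

// controllerServer is a hypothetical partial implementation of csi.ControllerServer.
type controllerServer struct {
    nodeID string // the node this instance runs on (--node-deployment mode)
}

func (s *controllerServer) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
    if req.GetName() == "" {
        return nil, status.Error(codes.InvalidArgument, "volume name is required")
    }
    required := req.GetCapacityRange().GetRequiredBytes()

    // Returning ResourceExhausted lets external-provisioner clear selected-node
    // so that the scheduler picks another node (see the key points below).
    if free := freeBytes(baseDir); free < required {
        return nil, status.Errorf(codes.ResourceExhausted, "need %d bytes, only %d free", required, free)
    }

    // For a local disk, "creating" the volume is just creating a directory.
    if err := os.MkdirAll(filepath.Join(baseDir, req.GetName()), 0o755); err != nil {
        return nil, status.Error(codes.Internal, err.Error())
    }

    return &csi.CreateVolumeResponse{
        Volume: &csi.Volume{
            VolumeId:      req.GetName(),
            CapacityBytes: required,
            // Pin the volume to this node so the PV gets node affinity.
            AccessibleTopology: []*csi.Topology{
                {Segments: map[string]string{topologyKey: s.nodeID}},
            },
        },
    }, nil
}

// freeBytes returns the remaining space of the filesystem backing dir.
func freeBytes(dir string) int64 {
    var st unix.Statfs_t
    if err := unix.Statfs(dir, &st); err != nil {
        return 0
    }
    return int64(st.Bavail) * int64(st.Bsize)
}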
Node service
// NodeServer is the server API for Node service.
type NodeServer interface {
NodeStageVolume(context.Context, *NodeStageVolumeRequest) (*NodeStageVolumeResponse, error)
NodeUnstageVolume(context.Context, *NodeUnstageVolumeRequest) (*NodeUnstageVolumeResponse, error)
NodePublishVolume(context.Context, *NodePublishVolumeRequest) (*NodePublishVolumeResponse, error)
NodeUnpublishVolume(context.Context, *NodeUnpublishVolumeRequest) (*NodeUnpublishVolumeResponse, error)
NodeGetVolumeStats(context.Context, *NodeGetVolumeStatsRequest) (*NodeGetVolumeStatsResponse, error)
NodeExpandVolume(context.Context, *NodeExpandVolumeRequest) (*NodeExpandVolumeResponse, error)
NodeGetCapabilities(context.Context, *NodeGetCapabilitiesRequest) (*NodeGetCapabilitiesResponse, error)
NodeGetInfo(context.Context, *NodeGetInfoRequest) (*NodeGetInfoResponse, error)
}
All of the functions above are called by kubelet. How does kubelet learn about these interfaces and the socket address? That is the job of node-driver-registrar, which registers the driver's socket with kubelet.
NodeStageVolume: mount the specified volume onto the specified global path. For a local disk all this really has to do is a bind mount. The reason the global path exists is so that multiple pods can mount from this one global path, giving the ReadWriteMany effect.
NodeUnstageVolume: the reverse of the above.
NodePublishVolume: mount the specified global path onto the specified pod path. This pod path is in fact a container path; when the container starts, kubelet bind-mounts the pod path into the container's own mount namespace. Again, all this function has to do is a bind mount (see the sketch below).
NodeUnpublishVolume: the reverse of the above.
NodeGetVolumeStats: returns usage statistics for the volume.
NodeExpandVolume: called when the ControllerExpandVolume response sets NodeExpansionRequired.
NodeGetCapabilities: similar to ControllerGetCapabilities, tells kubelet which interfaces the node service supports.
NodeGetInfo: tells kubelet the driver's node id, the maximum number of volumes per node, and the topology information. The topology information is written onto the node's labels, and the node id shows up in the node's annotations, e.g. {csi.volume.kubernetes.io/nodeid: {disk.xxx.com: $nodeid}}.
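Both mount steps for a local disk therefore boil down to bind mounts. A minimal sketch using a raw bind mount via golang.org/x/sys/unix, assuming the volume ID maps back to a directory under the hypothetical /data/csi base path; a production driver would typically use a mount helper library and make these calls idempotent:
package driver

import (
    "context"
    "os"
    "path/filepath"

    "github.com/container-storage-interface/spec/lib/go/csi"
    "golang.org/x/sys/unix"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// nodeServer is a hypothetical partial implementation of csi.NodeServer.
type nodeServer struct {
    nodeID string // the node's name, returned again by NodeGetInfo
}

// bindMount bind-mounts src onto dst, creating dst first.
func bindMount(src, dst string) error {
    if err := os.MkdirAll(dst, 0o755); err != nil {
        return err
    }
    return unix.Mount(src, dst, "", unix.MS_BIND, "")
}

// NodeStageVolume: bind-mount the backing directory onto the global path
// (staging_target_path) so that several pods can later share it.
func (s *nodeServer) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
    src := filepath.Join("/data/csi", req.GetVolumeId()) // hypothetical backing dir, see the CreateVolume sketch
    if err := bindMount(src, req.GetStagingTargetPath()); err != nil {
        return nil, status.Error(codes.Internal, err.Error())
    }
    return &csi.NodeStageVolumeResponse{}, nil
}

// NodePublishVolume: bind-mount the global path onto the pod path
// (target_path), which kubelet then maps into the container.
func (s *nodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
    if err := bindMount(req.GetStagingTargetPath(), req.GetTargetPath()); err != nil {
        return nil, status.Error(codes.Internal, err.Error())
    }
    return &csi.NodePublishVolumeResponse{}, nil
}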
Coordination logic
Take StatefulSet + volumeClaimTemplates + WaitForFirstConsumer as an example. After the StatefulSet's yaml is submitted to the apiserver:
1. The CO creates the PVC.
2. external-provisioner watches the PVC creation; if the PVC's volume.beta.kubernetes.io/storage-provisioner annotation names this driver, it takes ownership of the provisioning.
3. external-provisioner checks the binding mode: with WaitForFirstConsumer, the scheduler must already have placed the pod, and the PVC's volume.kubernetes.io/selected-node annotation records which node the pod is on.
4. external-provisioner calls the local CSI pod's CreateVolume over the unix socket.
5. external-provisioner creates the PV.
6. Once the PV exists, the CO creates a VolumeAttachment object.
7. external-attacher watches the VolumeAttachment object and then calls the local CSI pod's ControllerPublishVolume over the unix socket, setting .Status.Attached to true when done. If the CSIDriver object sets attachRequired to false, the attach flow is skipped entirely (see the CSIDriver sketch after these steps).
8. kubelet sees that a CSI-backed PV has been scheduled to its node, so it waits for the VolumeAttachment's .Status.Attached to become true; once step 7 finishes, kubelet runs its own MountDevice path, which calls NodeStageVolume and NodePublishVolume over the unix socket.
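Step 7 and the storagecapacity points below both hinge on how the CSIDriver object is declared. A minimal sketch that builds the object with the k8s.io/api/storage/v1 types and prints it as JSON; the driver name disk.xxx.com and the two flag values are just the running example here:
package main

import (
    "encoding/json"
    "fmt"

    storagev1 "k8s.io/api/storage/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
    attachRequired := false // skip the external-attacher / VolumeAttachment flow entirely
    storageCapacity := true // let external-provisioner publish CSIStorageCapacity objects

    driver := storagev1.CSIDriver{
        TypeMeta:   metav1.TypeMeta{APIVersion: "storage.k8s.io/v1", Kind: "CSIDriver"},
        ObjectMeta: metav1.ObjectMeta{Name: "disk.xxx.com"},
        Spec: storagev1.CSIDriverSpec{
            AttachRequired:  &attachRequired,
            StorageCapacity: &storageCapacity,
        },
    }

    out, _ := json.MarshalIndent(driver, "", "  ")
    fmt.Println(string(out)) // the manifest you would apply for the driver
}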
storagecapacity
In the scheduler, the path is Filter -> FindPodVolumes -> checkVolumeProvisions: when filtering nodes for a pod, the scheduler takes storage capacity into account. This is the storagecapacity feature, and the numbers come from the CSI controller's GetCapacity interface. GetCapacity can also report MaximumVolumeSize to the scheduler, which makes managing multiple disks much easier. Consider this situation: there are 2 disks, each with 1G left. If you report 2G of capacity to the scheduler, a request for a 2G volume cannot actually be satisfied, because the two disks are not one linear space. Reporting MaximumVolumeSize = 1G as well lets the scheduler place only pods whose volumes fit, e.g. two pods with a 1G volume each.
func (pl *VolumeBinding) Filter(ctx context.Context, cs *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
    node := nodeInfo.Node()
    if node == nil {
        return framework.NewStatus(framework.Error, "node not found")
    }
    state, err := getStateData(cs)
    if err != nil {
        return framework.AsStatus(err)
    }
    if state.skip {
        return nil
    }
    podVolumes, reasons, err := pl.Binder.FindPodVolumes(pod, state.boundClaims, state.claimsToBind, node)
    ......
}
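On the driver side, the numbers the scheduler works with come from the controller's GetCapacity response. A minimal sketch for the two-disk example above, continuing the hypothetical controllerServer from the CreateVolume sketch; the mount points /data/disk1 and /data/disk2 are assumptions, and MaximumVolumeSize is a wrapped Int64Value in recent CSI spec bindings:
package driver

import (
    "context"

    "github.com/container-storage-interface/spec/lib/go/csi"
    "golang.org/x/sys/unix"
    "google.golang.org/protobuf/types/known/wrapperspb"
)

// disks are the hypothetical mount points of the local disks managed on this node.
var disks = []string{"/data/disk1", "/data/disk2"}

func diskFree(path string) int64 {
    var st unix.Statfs_t
    if err := unix.Statfs(path, &st); err != nil {
        return 0
    }
    return int64(st.Bavail) * int64(st.Bsize)
}

// GetCapacity continues the controllerServer sketch from the CreateVolume example.
func (s *controllerServer) GetCapacity(ctx context.Context, req *csi.GetCapacityRequest) (*csi.GetCapacityResponse, error) {
    var total, largest int64
    for _, d := range disks {
        f := diskFree(d)
        total += f
        if f > largest {
            largest = f
        }
    }
    return &csi.GetCapacityResponse{
        // Two disks with 1G free each report 2G in total ...
        AvailableCapacity: total,
        // ... but a single volume can never span them, so tell the scheduler
        // the real per-volume ceiling.
        MaximumVolumeSize: wrapperspb.Int64(largest),
    }, nil
}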
Local disk CSI
Here is a rough record of several key points
1. The storageclass's VolumeBindingMode field needs to be WaitForFirstConsumer. Otherwise the volume is created before the pod is scheduled, and the volume and the pod may end up on different nodes.
2. external-provisioner needs --node-deployment enabled. With it, the selected-node annotation decides which provisioner instance handles the volume; since we are in WaitForFirstConsumer mode, the scheduler has already picked the node based on compute resources. If you do not enable node-deployment, you have to do the RPC yourself so that the node the pod was actually scheduled onto handles the CreateVolume request.
3. external-provisioner and the CSIDriver object both need the storagecapacity feature enabled. There are also Kubernetes version requirements: 1.21 and above (on 1.19 the apiserver launch parameters need to be changed), see https://github.com/kubernetes-csi/external-provisioner#capacity-support. Without this feature, the CSI driver has to actively update node.Status.Capacity to report a custom capacity resource, and then add the corresponding resource to the pod's resources, to get capacity-aware scheduling.
4. If CreateVolume finds there is not enough capacity (for example when volumes are created concurrently), it can return a ResourceExhausted error code. external-provisioner then removes the PVC's selected-node annotation (see external-provisioner's rescheduleProvisioning function), so that the scheduler reschedules the pod; as mentioned above, the rescheduling again takes storagecapacity into account.
5. Topology must be enabled: external-provisioner needs --feature-gates=Topology=true, and the AccessibleTopology in both the CreateVolume and NodeGetInfo responses needs to carry the node id (see the NodeGetInfo sketch after this list).
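As referenced in point 5, here is a minimal NodeGetInfo sketch, continuing the hypothetical nodeServer and topology key from the earlier sketches; the segments returned here are what kubelet writes onto the node's labels, and NodeId is what ends up in the csi.volume.kubernetes.io/nodeid annotation:
package driver

import (
    "context"

    "github.com/container-storage-interface/spec/lib/go/csi"
)

// NodeGetInfo continues the nodeServer sketch from the node-service section.
func (s *nodeServer) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
    return &csi.NodeGetInfoResponse{
        // NodeId ends up in the node's csi.volume.kubernetes.io/nodeid annotation.
        NodeId: s.nodeID,
        // 0 leaves the per-node volume limit up to the CO.
        MaxVolumesPerNode: 0,
        // kubelet writes these segments onto the node as labels; the key and value
        // must match the AccessibleTopology returned by CreateVolume.
        AccessibleTopology: &csi.Topology{
            Segments: map[string]string{topologyKey: s.nodeID},
        },
    }, nil
}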
About topology, several pieces are involved. The PV needs node affinity set (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#node-affinity), and that part is done by external-provisioner. Why does the PV need node affinity? In the Kubernetes model a PV can be accessed across nodes: your pod may be on nodeA while the PV is on nodeB and still be able to access it. The comments in the relevant CSI code also mention this point:
type Volume struct {
......
// Specifies where (regions, zones, racks, etc.) the provisioned
// volume is accessible from.
// A plugin that returns this field MUST also set the
// VOLUME_ACCESSIBILITY_CONSTRAINTS plugin capability.
// An SP MAY specify multiple topologies to indicate the volume is
// accessible from multiple locations.
// COs MAY use this information along with the topology information
// returned by NodeGetInfo to ensure that a given volume is accessible
// from a given node when scheduling workloads.
// This field is OPTIONAL. If it is not specified, the CO MAY assume
// the volume is equally accessible from all nodes in the cluster and
// MAY schedule workloads referencing the volume on any available
// node.
//
// Example 1:
// accessible_topology = {"region": "R1", "zone": "Z2"}
// Indicates a volume accessible only from the "region" "R1" and the
// "zone" "Z2".
//
// Example 2:
// accessible_topology =
// {"region": "R1", "zone": "Z2"},
// {"region": "R1", "zone": "Z3"}
// Indicates a volume accessible from both "zone" "Z2" and "zone" "Z3"
// in the "region" "R1".
AccessibleTopology []*Topology `protobuf:"bytes,5,rep,name=accessible_topology,json=accessibleTopology,proto3" json:"accessible_topology,omitempty"`
As the comment above says, if topology is not set, the PV has no node affinity. A freshly created pod plus PVC still works, but when the pod is deleted and rescheduled, it will not necessarily land on the node where the PV lives.
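For reference, the node affinity that external-provisioner derives from the volume's AccessibleTopology looks roughly like the following when written with the k8s.io/api/core/v1 types (topologyKey is again the hypothetical key from the sketches above):
package driver

import (
    corev1 "k8s.io/api/core/v1"
)

// pvNodeAffinity sketches the spec.nodeAffinity that external-provisioner derives
// from AccessibleTopology, pinning the PV to the node whose topology label equals nodeID.
func pvNodeAffinity(nodeID string) *corev1.VolumeNodeAffinity {
    return &corev1.VolumeNodeAffinity{
        Required: &corev1.NodeSelector{
            NodeSelectorTerms: []corev1.NodeSelectorTerm{{
                MatchExpressions: []corev1.NodeSelectorRequirement{{
                    Key:      topologyKey,
                    Operator: corev1.NodeSelectorOpIn,
                    Values:   []string{nodeID},
                }},
            }},
        },
    }
}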
The scheduler checks this node affinity when filtering nodes, in FindPodVolumes:
// boundClaims are the PVCs of the pod being scheduled that are already bound to a PV
func (b *volumeBinder) FindPodVolumes(pod *v1.Pod, boundClaims, claimsToBind []*v1.PersistentVolumeClaim, node *v1.Node) (podVolumes *PodVolumes, reasons ConflictReasons, err error) {
    ......
    // Check PV node affinity on bound volumes
    if len(boundClaims) > 0 {
        // The core is checkBoundClaims
        boundVolumesSatisfied, boundPVsFound, err = b.checkBoundClaims(boundClaims, node, podName)
        if err != nil {
            return
        }
    }
    return
}
func (b *volumeBinder) checkBoundClaims(claims []*v1.PersistentVolumeClaim, node *v1.Node, podName string) (bool, bool, error) {
    csiNode, err := b.csiNodeLister.Get(node.Name)
    if err != nil {
        // TODO: return the error once CSINode is created by default
        klog.V(4).Infof("Could not get a CSINode object for the node %q: %v", node.Name, err)
    }
    // Walk the PV bound to each of the pod's PVCs
    for _, pvc := range claims {
        pvName := pvc.Spec.VolumeName
        pv, err := b.pvCache.GetPV(pvName)
        if err != nil {
            if _, ok := err.(*errNotFound); ok {
                err = nil
            }
            return true, false, err
        }
        pv, err = b.tryTranslatePVToCSI(pv, csiNode)
        if err != nil {
            return false, true, err
        }
        // Compare the PV's NodeAffinity against node.Labels; note that the topology labels on the
        // node come from NodeGetInfo, whose response kubelet writes onto the node's labels
        err = volumeutil.CheckNodeAffinity(pv, node.Labels)
        if err != nil {
            klog.V(4).Infof("PersistentVolume %q, Node %q mismatch for Pod %q: %v", pvName, node.Name, podName, err)
            return false, true, nil
        }
        klog.V(5).Infof("PersistentVolume %q, Node %q matches for Pod %q", pvName, node.Name, podName)
    }
    klog.V(4).Infof("All bound volumes for Pod %q match with Node %q", podName, node.Name)
    return true, true, nil
}