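As technology advances and applications demand lower latency, higher bandwidth, and stronger security, a clear trend is that more and more application components will be deployed at the network edge managed by enterprises. This series is the Chinese-language edition of the open source e-book Edge Cloud Operations: A Systems Approach, which describes in detail the architecture, functionality, and implementation of an edge cloud built from open source components.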

Chapter 4 Lifecycle Management
Lifecycle management is concerned with updating and evolving a running system over time. We have already walked through the bootstrapping steps of provisioning the hardware and installing the base software platform (Chapter 3), so we now turn our attention to continually upgrading the software that runs on top of that platform. As a reminder, we assume the base platform consists of Linux running on each server and switch, plus Docker, Kubernetes, and Helm, with SD-Fabric controlling the network.
While we could take a narrow view of lifecycle management, and assume the software we want to roll out has already gone through an offline integration-and-testing process (this is the traditional model of vendors releasing a new version of their product), we take a broader approach that starts with the development process, that is, with the creation of new features and capabilities. Including the "innovation" step closes the virtuous cycle shown in Figure 17, which the cloud industry has taught us leads to greater feature velocity.

Figure 17. Virtuous cycle with the goal of improving feature velocity.
Of course, not every enterprise has the same army of developers at its disposal that cloud providers do, but that does not shut it out of this opportunity. Innovation can come from many sources, including open source, so the real objective is to democratize the integration and deployment end of the pipeline. This is precisely the goal of the lifecycle management subsystem described in this chapter.
4.1 Design Overview
Figure 18 gives an overview of the pipeline/toolchain that makes up the two halves of lifecycle management, Continuous Integration (CI) and Continuous Deployment (CD), expanding on the summary given in Chapter 2. The key thing to focus on is the Image and Config Repos in the middle. They represent the "interface" between the two halves: CI produces Docker images and Helm Charts, storing them in the respective repositories, while CD consumes Docker images and Helm Charts, pulling them from the respective repositories.

Figure 18. Overview of the CI/CD pipeline.
The Config Repo also contains declarative specifications of the infrastructure artifacts produced by Resource Provisioning, specifically the Terraform templates and variable files [1]. While the "hands-on" and "data entry" aspects of resource provisioning described in Section 3.1 happen outside the CI/CD pipeline, the ultimate output of provisioning is the Infrastructure-as-Code that gets checked into the Config Repo. These files are inputs to lifecycle management, which means that Terraform gets invoked as part of CI/CD whenever these files change. In other words, CI/CD keeps both the software-related components in the underlying cloud platform and the microservice workloads that run on top of that platform up to date.
[1] We use the term "Config Repo" generically to denote one or more repositories that store all the configuration-related files. In practice, there might be one repo for Helm Charts and another for Terraform templates.
There are three takeaways from this overview. First, by having well-defined artifacts passed between CI and CD (and between Resource Provisioning and CD), all three subsystems are loosely coupled and able to perform their respective tasks independently. Second, all authoritative state needed to successfully build and deploy the system is contained within the pipeline, specifically as declarative specifications in the Config Repo. This is the cornerstone of "Configuration-as-Code" (sometimes also called GitOps), the cloud native approach to CI/CD that this book describes. Third, operators have an opportunity to apply discretion to the pipeline, as denoted by the "Deployment Gate" in the figure, controlling what features get deployed when. This topic is discussed in the sidebar, as well as at other points throughout the rest of this chapter.
Continuous Delivery vs Deployment
You will also hear CD refer to "Continuous Delivery" instead of "Continuous Deployment", but we are interested in the complete end-to-end process, so CD always implies the latter in this book. Keep in mind, though, that "continuous" does not necessarily mean "instantaneous"; a variety of gating functions can be injected into the CI/CD pipeline to control when and how upgrades get rolled out. The important point is that all the stages in the pipeline are automated.
So what exactly does "Continuous Delivery" mean? Arguably, it is redundant when coupled with "Continuous Integration", since the set of artifacts produced by the CI half of the pipeline (e.g., Docker images) is precisely what is being delivered. There is no "next step" unless you also deploy those artifacts. It is hair-splitting, but some would argue that CI is limited to testing new code, and Continuous Delivery corresponds to the final "publish the artifact" step. For our purposes, we lump "publish the artifact" into the CI half of the pipeline.
Further Reading: Weaveworks.
The third repository shown in Figure 18 is the Code Repo (on the far left). Although not explicitly indicated, developers are continually checking new features and bug fixes into this repo, which then triggers the CI/CD pipeline. A set of tests and code reviews are run against these check-ins, with the output of those tests/reviews reported back to developers, who modify their patch sets accordingly. (These develop-and-test feedback loops are implied by the dotted lines in Figure 18.)
The far right of Figure 18 shows the set of deployment targets, with Staging and Production called out as two illustrative examples. The idea is that a new version of the software is deployed first to a set of Staging clusters, where it is subjected to realistic workloads for a period of time, and then rolled out to the Production clusters once the Staging deployments give us confidence that the upgrade is reliable.
This is a simplified depiction of what happens in practice. In general, more than two distinct versions of the cloud software can be deployed at any given time. One reason this happens is that upgrades are typically rolled out incrementally (e.g., a few sites at a time over an extended period of time), meaning that even the production system plays a role in "staging" new releases. For example, a new version might first be deployed on 10% of the production machines, and only once it is deemed reliable is it rolled out to the next 25%, and so on. The exact rollout strategy is a controllable parameter, as described in more detail in Section 4.4.
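To make the incremental rollout concrete, here is a minimal Python sketch that splits a set of production clusters into successive waves. The function name `plan_rollout`, the site names, and the percentages are illustrative assumptions; the actual rollout policy is whatever the operator configures in the deployment tooling.

```python
def plan_rollout(clusters, percentages):
    """Split clusters into successive waves, each a percentage of the whole fleet.

    Percentages are illustrative (e.g., 10% first, then the next 25%).
    Whatever remains after the listed waves goes into one final wave.
    """
    waves, start = [], 0
    for percent in percentages:
        # Ceiling division, so a small fleet still gets at least one cluster per wave.
        size = max(1, -(-percent * len(clusters) // 100))
        wave = clusters[start:start + size]
        if wave:
            waves.append(wave)
        start += size
    if start < len(clusters):
        waves.append(clusters[start:])
    return waves


# Hypothetical fleet of 20 production sites.
sites = [f"site-{i:02d}" for i in range(20)]
waves = plan_rollout(sites, [10, 25])
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {len(wave)} clusters")
```

In practice each wave would be gated on the previous wave being deemed reliable, which this sketch leaves out.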
Finally, two of the CI stages shown in Figure 18 identify a Testing component. One is a set of component-level tests that are run against each patch set checked into the Code Repo. These tests gate integration: a patch is fully merged into the Code Repo only after it passes this preliminary round of tests. Once merged, the pipeline runs a build across all the components, and a second round of testing happens on a Quality Assurance (QA) cluster. Passing these tests gates deployment, but note that testing also happens in the Staging clusters, as part of the CD end of the pipeline. One might naturally wonder how testing continues once software is running in the Production clusters. That happens too, of course, but we tend to call it monitoring and telemetry (and subsequent diagnostics) rather than testing. This is the subject of Chapter 6.
We explore each of the stages in Figure 18 in more detail in the sections that follow, but as we dig into the individual mechanisms, it is helpful to keep a high-level, feature-centric perspective in mind. After all, the CI/CD pipeline is just an elaborate mechanism to help us manage the set of features we want our cloud to support. Each feature starts in development, which corresponds to everything left of the Integration Gate in Figure 18. Once a candidate feature is mature enough to be officially accepted into the main branch of the code repo (i.e., merged), it enters an integration phase, during which it is evaluated in combination with all the other candidate features, both new and old. Finally, whenever a given subset of features is deemed stable and has demonstrated value, it is deployed and ultimately run in production. Because testing is central to the whole lifecycle of a set of features, that is where we start.
4.2 Testing Strategy
Our goal for lifecycle management is to improve feature velocity, but that always has to be balanced against delivering high-quality code: software that is reliable, scales, and meets performance requirements. Ensuring code quality requires that it be subjected to a battery of tests, but the linchpin for doing so "at speed" is the effective use of automation. This section introduces an approach to test automation, but we start by talking about the overall testing strategy.
The best practice for testing in the Cloud/DevOps environment is to adopt a Shift Left strategy, which introduces tests early in the development cycle, that is, on the left side of the pipeline shown in Figure 18. To apply this principle, you first have to understand what types of tests you need. Then you can set up the infrastructure required to automate those tests.
4.2.1 Categories of Tests
With respect to the types of tests, there is a rich vocabulary for talking about QA, but unfortunately, the definitions are often vague, overlapping, and not always applied consistently. The following gives a simple taxonomy that serves our purposes, with different categories of tests organized according to the three stages of the CI/CD pipeline in which they happen (relative to Figure 18):
Integration Gate: These tests are run against every attempt to check in a patch set, so they must complete quickly, which means they are limited in scope. There are two categories of pre-merge tests, unit tests and smoke tests:
Smoke Tests: A form of functional testing, typically run against a set of related modules, but in a shallow/superficial way (so they can run quickly). The etymology of the term "smoke test" is said to come from hardware testing, as in, "does smoke come out of the box when you turn it on?"
QA Cluster: These tests are run periodically (e.g., once a day, once a week), so they can be more extensive. They typically test whole subsystems, or in some cases, the entire system. There are two categories of post-merge/pre-deployment tests, integration tests and performance tests:
Performance Tests: Like functional tests in scope (e.g., at the subsystem level), but they measure quantifiable performance parameters, including the ability to scale the workload, rather than correctness.
Staging Cluster: Candidate releases are run on the Staging cluster for an extended period of time (e.g., multiple days) before being rolled out to Production. These tests are run against a complete and fully integrated system, and are often used to uncover memory leaks and other issues that vary with time and workload. There is just one category of test run in this stage:
Soak Tests: Sometimes referred to as Canary Tests, these place realistic workloads on the complete system, through a combination of artificially generated traffic and requests from real users. Because the full system is integrated and deployed, these tests also serve to validate the CI/CD mechanisms, including, for example, the specs checked into the Config Repo.
Figure 19 summarizes the sequence of tests, highlighting the relationship among them across the lifecycle timeline. Note that the leftmost tests typically happen repeatedly as part of the development process, while the rightmost tests are part of the ongoing monitoring of a production deployment. For simplicity, the figure shows the soak tests as running before deployment, but in practice, new versions of the system are likely to be rolled out incrementally along a continuum.

Figure 19. Sequence of tests along the feature timeline, as implemented by the CI/CD pipeline.
One of the challenges in crafting a testing strategy is deciding whether a given test belongs in the set of smoke tests that gate merging a patch, or in the set of integration tests that happen after a patch is merged into the code repo but before it is deployed. There is no hard-and-fast rule; it is a balancing act. We want to test new software as early as we can, but full integration takes both time and resources (i.e., a realistic platform for running the candidate software).
Related to this tradeoff, testing infrastructure requires a combination of virtual resources (e.g., VMs that are pre-configured with much of the underlying platform) and physical resources (e.g., small clusters that faithfully represent the eventual target hardware). Again, it is not a hard-and-fast rule, but early (Smoke) tests tend to use pre-configured virtual resources, while later (Integration) tests tend to run on representative hardware or clean VMs, with the software built from scratch.
You will also note that we did not call out regression tests in this simple taxonomy. Our view is that regression tests are designed to ensure that a bug is not re-introduced into the code once it has been identified and fixed, which makes them a common source of new tests that can be added to the Unit, Smoke, Integration, Performance, or Soak tests. In practice, most tests are regression tests, independent of where they run in the CI/CD pipeline.
4.2.2 Testing Frameworks
With respect to testing frameworks, Figure 20 shows an illustrative example drawn from Aether. The specifics will vary substantially, depending on the kind of functionality you need to test. In Aether, the relevant components are shown on the right, rearranged to highlight top-down dependencies between subsystems, with the corresponding test-automation tools shown on the left. Think of each of these as a framework for a domain-specific class of tests (e.g., NG40 puts a 5G workload on SD-Core and SD-RAN, while TestVectors injects packets into the switches).

Figure 20. Example testing frameworks used in Aether.
Some of the frameworks shown in Figure 20 were co-developed with the corresponding software component. This is true of TestVectors and TestON, which put customized workloads on Stratum (SwitchOS) and ONOS (NetworkOS), respectively. Both are open source, and hence available to study for insights into the challenges of building a testing framework. In contrast, NG40 is a proprietary framework for emulating 3GPP-compliant cellular network traffic, which, due to its complexity and the value of demonstrating adherence to the 3GPP standard, is a closed, commercial product.
Selenium and Robot are the most general of the five examples. Each is an open source project with an active developer community. Selenium is a tool for automating the testing of web applications, while Robot is a more general tool for generating requests to any well-defined interface. Both systems are frameworks in the sense that developers can write extensions, libraries, drivers, and plugins to test specific features of the User Portal and the Runtime API, respectively [2]. They both illustrate the purpose of a testing framework, which is to provide a means to (1) automate the execution of a range of tests; (2) collect and archive the test results; and (3) evaluate and analyze the test results. In addition, such frameworks need to be scalable themselves when they are used to test systems that are intended to be scalable, as is the case for cloud services.
[2] Selenium is actually available as a library that can be called from within the Robot framework, which makes sense when you consider that a web GUI invokes HTTP operations on a set of HTML-defined elements, such as text boxes, buttons, and drop-down menus.
Finally, as discussed in the previous subsection, each of these testing frameworks requires a set of resources. These resources are for running both the suite of tests (which generates workload) and the subsystem(s) being tested. For the latter, reproducing a full replica of the target cluster for every development team would be ideal, but it is more cost-effective to instantiate virtual environments on demand in the cloud. Fortunately, because the software being developed is containerized and Kubernetes can run in a VM, virtual testing environments are straightforward to support. This means dedicated hardware can be reserved for less frequent (e.g., daily) integration tests.
4.3 Continuous Integration
The Continuous Integration (CI) half of lifecycle management is all about translating source code checked in by developers into a deployable set of Docker images. As discussed in the previous section, this is largely an exercise in running a set of tests against the code, first to test whether it is ready to be integrated and then to test whether it was successfully integrated, where the integration itself is carried out entirely according to a declarative specification. This is the value proposition of the microservices architecture: each component is developed independently, packaged as a container (Docker), and then deployed and interconnected by a container management system (Kubernetes) according to a declarative integration plan (Helm).
But this story overlooks a few important details, which we now fill in, in part by describing some specific mechanisms.
4.3.1 Code Repositories
Code repositories, such as GitHub and Gerrit, typically provide a means to tentatively submit a patch set, triggering a set of static checks (e.g., linter, license, and CLA checks) and giving code reviewers a chance to inspect and comment on the code. This mechanism also provides a means to trigger the build-integrate-test processes discussed next. Once all such checks complete to the satisfaction of the engineers responsible for the affected modules, the patch set is merged. This is all part of the well-understood software development process, so we do not discuss it further. The important takeaway for our purposes is that there is a well-defined interface between code repositories and the subsequent stages of the CI/CD pipeline.
4.3.2 Build-Integrate-Test
The heart of the CI pipeline is a mechanism for executing a set of processes that (a) build the component(s) impacted by a given patch set, (b) integrate the resulting executable images (e.g., binaries) with other images to construct larger subsystems, (c) run a set of tests against those integrated subsystems and post the results, and (d) optionally publish new deployment artifacts (e.g., Docker images) to the downstream image repository. This last step happens only after the patch set has been accepted and merged into the repo (which also triggers the Build stage in Figure 18 to run). Importantly, images are built and integrated for testing in exactly the same way as they are built and integrated for deployment. The design principle is that there are no special cases, just different "off-ramps" from the end-to-end CI/CD pipeline.
There is no topic on which developers have stronger opinions than the merits (and flaws) of different build tools. Old-school C programmers raised on Unix prefer Make. Google developed Bazel and made it available as open source. The Apache Foundation released Maven, which evolved into Gradle. We prefer not to pick sides in this unwinnable debate, but instead acknowledge that different teams can pick different build tools for their individual projects (which we have been referring to in generic terms as subsystems), and we employ a simple second-level tool to integrate the output of all those sophisticated first-level tools. Our choice for the second-level mechanism is Jenkins, a job automation tool that system admins have been using for years, but which has recently been adapted and extended to automate CI/CD pipelines.
At a high level, Jenkins is little more than a mechanism that executes a script, called a job, in response to some trigger. Like many of the tools described in this book, Jenkins has a graphical dashboard that can be used to create, execute, and view the results of a set of jobs, but this is mostly useful for simple examples. Because Jenkins plays a central role in our CI pipeline, it is managed like all the other components we are building, via a collection of declarative specification files that are checked into a repo. The question, then, is what exactly do we specify?
Jenkins supports a scripting language, called Groovy, that can be used to define a Pipeline consisting of a sequence of Stages. Each stage executes some task and tests whether it succeeded or failed. In principle, then, you could define a single CI/CD pipeline for the entire system. It would start with a "Build" stage, followed by a "Test" stage, and then, conditional upon success, conclude with a "Deliver" stage. But this approach does not take into account the loose coupling among all the components that go into building a cloud. Instead, what happens in practice is that Jenkins is used more narrowly to (1) build and test individual components, both before and after they are merged into the code repository; (2) integrate and test various combinations of components, for example, on a nightly basis; and (3) under limited conditions, push the artifacts that have just been built (e.g., Docker images) to the Image Repo.
This is a non-trivial undertaking, and so Jenkins supports tooling to help construct jobs. Specifically, Jenkins Job Builder (JJB) processes declarative YAML files that "parameterize" the Pipeline definitions written in Groovy, producing the set of jobs that Jenkins then runs. Among other things, these YAML files specify the triggers that launch the pipeline (such as a patch being checked into the code repo).
Exactly how developers use JJB is an engineering detail, but in Aether, the approach is for each major component to define three or four different Groovy-based pipelines, each of which corresponds to one of the top-level stages in the overall CI/CD pipeline shown in Figure 18. That is, one Groovy pipeline corresponds to pre-merge build and test, one to post-merge build and test, one to integrate and test, and one to publish artifact. Each major component also defines a collection of YAML files that link component-specific triggers to one of the pipelines, along with the associated set of parameters that define that pipeline. The number of YAML files (and hence triggers) varies from component to component, but one common example is a trigger to publish a new Docker image, which fires on a change to the VERSION file stored in the code repo. (Section 4.5 explains why.)
As an illustrative example, the following is from a Groovy script that defines the pipeline for testing the Aether API, which, as we will see in the next chapter, is auto-generated by the Runtime Control subsystem. We are interested only in the general form of the pipeline for now, so we omit most of the details, but it should be clear from the example what each stage does. (Recall that Kind is Kubernetes in Docker.) The one stage fully depicted in the example invokes the Robot testing framework introduced in Section 4.2.2, with each invocation exercising a different feature of the API. (To improve readability, the example does not show the output, logging, and report arguments to Robot that collect the results.)
pipeline {
    ...
    stages {
        stage("Cleanup") {
            ...
        }
        stage("Install Kind") {
            ...
        }
        stage("Clone Test Repo") {
            ...
        }
        stage("Setup Virtual Environment") {
            ...
        }
        stage("Generate API Test Framework and API Tests") {
            ...
        }
        stage("Run API Tests") {
            steps {
                sh """
                    mkdir -p /tmp/robotlogs
                    cd ${WORKSPACE}/api-tests
                    source ast-venv/bin/activate; set -u;
                    robot ${WORKSPACE}/api-tests/ap_list.robot || true
                    robot ${WORKSPACE}/api-tests/application.robot || true
                    robot ${WORKSPACE}/api-tests/connectivity_service.robot || true
                    robot ${WORKSPACE}/api-tests/device_group.robot || true
                    robot ${WORKSPACE}/api-tests/enterprise.robot || true
                    robot ${WORKSPACE}/api-tests/ip_domain.robot || true
                    robot ${WORKSPACE}/api-tests/site.robot || true
                    robot ${WORKSPACE}/api-tests/template.robot || true
                    robot ${WORKSPACE}/api-tests/traffic_class.robot || true
                    robot ${WORKSPACE}/api-tests/upf.robot || true
                    robot ${WORKSPACE}/api-tests/vcs.robot || true
                """
            }
        }
    }
    ...
}
One thing to notice is that this is another example of a tool using generic terminology in a specific way, one that does not align with our generic use of the same concepts. Each conceptual stage in Figure 18 is implemented by one or more Groovy-defined pipelines, each of which consists of a sequence of Groovy-defined stages. And as we can see in this example, those Groovy stages are fairly low-level operations.
This particular pipeline is part of the post-merge QA testing stage shown in Figure 18, and so is invoked by a time-based trigger. The following snippet of YAML is an example of a job template that specifies such a trigger. Note that the value of the name attribute is what you would see if you looked at the set of jobs in the Jenkins dashboard.
- job-template:
    id: aether-api-tests
    name: 'aether-api-{api-version}-tests-{release-version}'
    project-type: pipeline
    pipeline-file: 'aether-api-tests.groovy'
    ...
    triggers:
      - timed: |
          TZ=America/Los_Angeles
          H {time} * * *
    ...
To complete the picture, the following snippet from another YAML file shows how a repo-based trigger is specified. This example executes a different pipeline (not shown), and corresponds to a pre-merge test that runs when a developer submits a candidate patch set.
- job-template:
    id: 'aether-patchset'
    name: 'aether-verify-{project}{suffix}'
    project-type: pipeline
    pipeline-script: 'aether-test.groovy'
    ...
    triggers:
      - gerrit:
          server-name: '{gerrit-server-name}'
          dependency-jobs: '{dependency-jobs}'
          trigger-on:
            - patchset-created-event:
                exclude-drafts: true
                exclude-trivial-rebase: false
                exclude-no-code-change: true
            - draft-published-event
            - comment-added-contains-event:
                comment-contains-value: '(?i)^.*recheck$'
    ...
The important takeaway from this discussion is that there is no single or global CI job. There are many per-component jobs that independently publish deployable artifacts when conditions dictate. Those conditions include: (1) the component passes the required tests, and (2) the component's version indicates whether or not a new artifact is warranted. We have already talked about the testing strategy in Section 4.2, and we describe the versioning strategy in Section 4.5. These two concerns are at the heart of a sound approach to Continuous Integration. The tooling (Jenkins, in our case) is just a means to that end.
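The two publishing conditions can be sketched as a small Python predicate. This is only an illustration of the decision, not how the Jenkins jobs are written: the `ComponentBuild` type, its fields, and the rule that a `-dev` suffix marks an unfinished version are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ComponentBuild:
    name: str
    version: str           # contents of the component's VERSION file
    tests_passed: bool     # outcome of the required test jobs
    published: frozenset   # versions already present in the image repo (assumed known)


def should_publish(build: ComponentBuild) -> bool:
    """Publish only if the tests passed and the version warrants a new artifact."""
    if not build.tests_passed:
        return False
    if build.version.endswith("-dev"):
        # Assumption: "-dev" versions are still in development and never published.
        return False
    return build.version not in build.published


# Hypothetical component and version numbers.
candidate = ComponentBuild("sd-core", "3.2.4", True, frozenset({"3.2.3"}))
print(should_publish(candidate))
```

Because each component evaluates this condition on its own, no global job is needed to coordinate publication.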
4.4 Continuous Deployment
We are now ready to act on the configuration specs checked into the Config Repo, which include both the set of Terraform templates that specify the underlying infrastructure (we have been calling this the cloud platform) and the set of Helm Charts that specify the collection of microservices (sometimes called applications) to be deployed on that infrastructure. We already know about Terraform from Chapter 3: it is the agent that actually "acts on" the infrastructure-related forms. For its counterpart on the application side, we use an open source project called Fleet.
Figure 21 shows the big picture we are working towards. Notice that both Fleet and Terraform depend on the Provisioning API exported by each backend cloud provider. Roughly speaking, Terraform invokes the "manage Kubernetes" aspects of those APIs, while Fleet invokes the "use Kubernetes" aspects of those APIs.

Figure 21. Relationship between the main CD agents (Terraform and Fleet) and the backend Kubernetes clusters.
The Terraform side of Figure 21 is responsible for deploying (and configuring) the latest platform-level software. For example, if the operator wants to add a server (or VM) to a given cluster, upgrade the version of Kubernetes, or change the CNI plug-in Kubernetes uses, the desired configuration is specified in the Terraform config files. (Recall that Terraform computes the delta between the existing and desired state, and executes the calls required to bring the former in line with the latter.) Anytime new hardware is added to an existing cluster, the corresponding Terraform file is modified accordingly and checked into the Config Repo, triggering the deployment job. We do not reiterate the mechanics of how platform deployments are triggered, because they use exactly the same Jenkins machinery described in Section 4.3.2, except now triggered by changes to the Terraform forms checked into the Config Repo.
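The idea of computing a delta between existing and desired state can be illustrated with a toy Python function. This is a conceptual sketch only, not Terraform's implementation or API; the resource names and Kubernetes version strings are hypothetical.

```python
def compute_delta(existing, desired):
    """Return the changes needed to turn the existing state into the desired state."""
    to_create = {k: v for k, v in desired.items() if k not in existing}
    to_update = {k: v for k, v in desired.items()
                 if k in existing and existing[k] != v}
    to_delete = sorted(k for k in existing if k not in desired)
    return {"create": to_create, "update": to_update, "delete": to_delete}


# Hypothetical cluster: upgrade Kubernetes on two servers and add a third.
existing = {
    "server-1": {"kubernetes": "1.23"},
    "server-2": {"kubernetes": "1.23"},
}
desired = {
    "server-1": {"kubernetes": "1.24"},
    "server-2": {"kubernetes": "1.24"},
    "server-3": {"kubernetes": "1.24"},
}
delta = compute_delta(existing, desired)
print(delta)
```

Only the declared end state is checked into the Config Repo; deriving the individual create, update, and delete actions is left to the tool.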
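The Fleet side of Figure 21 is responsible for installing the collection of microservices that are to run on each cluster. These microservices, organized as one or more applications, are specified by Helm Charts. If we were trying to deploy a single Chart on just one Kubernetes cluster, then Helm alone would be enough. The value of Fleet is that it scales up that process, helping us manage the deployment of multiple Charts across multiple clusters. (Fleet is a spin-off from Rancher, but it is an independent mechanism that can be used directly with Helm.)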
Fleet defines three concepts that are relevant to our discussion. The first is a Bundle, which defines the fundamental unit of what gets deployed. In our case, a Bundle is equivalent to a set of one or more Helm Charts. The second is a Cluster Group, which identifies a set of Kubernetes clusters that are to be treated in an equivalent way. In our case, the set of all clusters labeled Production could be treated as one such group, and all clusters labeled Staging could be treated as another such group. (Here, we are talking about the env label assigned to each cluster in its Terraform spec, as illustrated in the examples in Section 3.2.) The third is a GitRepo, which is a repository to watch for changes to Bundle artifacts. In our case, new Helm Charts are checked into the Config Repo (but as indicated at the beginning of this chapter, there is likely a dedicated "Helm Repo" in practice).
Understanding Fleet is then straightforward. It provides a way to define associations between Bundles, Cluster Groups, and GitRepos, such that whenever an updated Helm Chart is checked into a GitRepo, all Bundles that contain that Chart are (re-)deployed on all associated Cluster Groups. That is to say, Fleet can be viewed as the mechanism that implements the Deployment Gate shown in Figure 18, although other factors can also be taken into account (e.g., not starting a rollout at 5pm on a Friday afternoon). The next section describes a versioning strategy that can be overlaid on this mechanism to control what features get deployed when.
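The association between Bundles, Cluster Groups, and Chart updates can be modeled in a few lines of Python. This is a conceptual sketch under assumed names, not Fleet's API or resource format: the `Bundle` class, `redeploy_targets`, and all chart and cluster names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    name: str
    charts: frozenset       # names of the Helm Charts contained in this Bundle
    cluster_groups: tuple   # Cluster Groups this Bundle is associated with


def redeploy_targets(changed_chart, bundles, groups):
    """Return sorted (bundle, cluster) pairs to (re-)deploy after a Chart update."""
    targets = set()
    for bundle in bundles:
        if changed_chart in bundle.charts:
            for group in bundle.cluster_groups:
                for cluster in groups.get(group, ()):
                    targets.add((bundle.name, cluster))
    return sorted(targets)


# Hypothetical Bundles and Cluster Groups, keyed by the env label.
bundles = [
    Bundle("sd-core", frozenset({"sd-core-chart"}), ("Staging", "Production")),
    Bundle("monitoring", frozenset({"prometheus-chart"}), ("Staging",)),
]
groups = {"Staging": ["staging-1"], "Production": ["prod-1", "prod-2"]}
print(redeploy_targets("sd-core-chart", bundles, groups))
```

A single Chart update fans out to every cluster in every associated group, which is what makes the grouping abstraction convenient for multi-cluster deployments.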
We have focused on Fleet as the agent that triggers the execution of Helm Charts, but we should not lose sight of the central role of the Helm Charts themselves. They are the centerpiece of how we specify service deployments. They identify the interconnected set of microservices to be deployed, and as we will see in the next section, they are the ultimate arbiter of the version of each of those microservices. Later chapters will also describe how these Charts sometimes specify a Kubernetes Operator that is to run when a microservice is deployed, configuring the newly started microservice in some component-specific way. Finally, Helm Charts can specify the resources (e.g., processor cores) each microservice is permitted to consume, including both minimum thresholds and upper limits. Of course, all of this is possible only because Kubernetes supports the corresponding API calls and controls resource usage accordingly.
Note that this last point about resource allocation sheds light on a fundamental characteristic of the kind of edge/hybrid clouds we are focused on: they are typically resource constrained, as opposed to offering the seemingly infinite resources of a datacenter-based elastic cloud. As a consequence, provisioning and lifecycle management are linked by the analysis used to decide (1) what services we want to deploy, (2) how many resources those services require, and (3) how the available resources are to be shared among the planned set of services.
Implementation Details Matter
We are purposely not doing a deep dive into the individual tools that are assembled into the lifecycle management subsystem, but details often matter, and our experience with Fleet provides a good example. As a careful reader may have noticed, we could have used Jenkins to trigger Fleet to deploy an upgraded application, similar to how we do with Terraform. Instead, we decided to use Fleet's internal triggering mechanism because of the convenience of its Bundle and Cluster Group abstractions.
After Fleet came online as the deployment mechanism, developers noticed that the code repo became extremely sluggish. It turned out this was because Fleet polls the specified GitRepos to detect changes to the watched Bundles, and the polling was so frequent that it overloaded the repo. A change to the "polling-frequency" parameter improved the situation, but led people to wonder why Jenkins' trigger mechanism had not caused the same problem. The answer is that Jenkins is better integrated with the repo (specifically, Gerrit running on top of Git): the repo pushes event notifications to Jenkins when a file check-in actually occurs, so no polling is needed.
4.5 Versioning Strategy
The CI/CD toolchain introduced in this chapter works only when applied in concert with an end-to-end versioning strategy, ensuring that the right combination of source modules gets integrated and, later, the right combination of images gets deployed. Remember, the high-level challenge is to manage the set of features that our cloud supports, which is another way of saying that everything hinges on how we version those features.
Our starting point is to adopt the widely accepted practice of Semantic Versioning, where each component is assigned a three-part version number MAJOR.MINOR.PATCH (e.g., 3.2.4), where the MAJOR version increments whenever you make an incompatible API change, the MINOR version increments when you add functionality in a backward-compatible way, and the PATCH version increments for a backward-compatible bug fix.
Further Reading: Semantic Versioning 2.0.0.
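As a small worked example of these rules, the following Python sketch parses a MAJOR.MINOR.PATCH string and computes the next version for a given kind of change. The helper names and the change labels ("incompatible-api", "feature", "bugfix") are hypothetical; resetting the lower-order fields to zero follows the Semantic Versioning 2.0.0 specification.

```python
import re

SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def bump(version, change):
    """Return the next version number for the given kind of change."""
    major, minor, patch = parse(version)
    if change == "incompatible-api":   # MAJOR: incompatible API change
        return f"{major + 1}.0.0"
    if change == "feature":            # MINOR: backward-compatible functionality
        return f"{major}.{minor + 1}.0"
    if change == "bugfix":             # PATCH: backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


for change in ("bugfix", "feature", "incompatible-api"):
    print(change, "->", bump("3.2.4", change))
```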
The following sketches one possible interplay between versioning and the CI/CD toolchain, keeping in mind that there are different approaches to the problem. We break the sequence down into the three main phases of the software lifecycle:
Development Time
Every patch checked into a source code repo includes an up-to-date semantic version number in a VERSION file in the repository. Note that every patch does not necessarily equal every commit, as it is common to make multiple changes to an "in development" version (sometimes denoted 3.2.4-dev, for example). This VERSION file is used by developers to keep track of the current version number, but as we saw in Section 4.3.2, it also serves as a trigger for a Jenkins job that potentially publishes a new Docker or Helm artifact.
The commit that corresponds to the finalized patch is also tagged (in the repo) with the corresponding semantic version number. In git, this tag is bound to a hash that unambiguously identifies the commit, making it the authoritative way of binding a version number to a particular instance of the source code.
Integration Time
The CI toolchain does a sanity check on each component's version number, ensuring it does not regress, and when it sees a new version number for a microservice, it builds a new image and uploads it to the image repo. By convention, this image includes the corresponding source code version number in the unique name assigned to the image.
Deployment Time
The CD toolchain instantiates the set of Docker images, as specified by name in one or more Helm Charts. Since these image names include the semantic version number, by convention, we know the corresponding software version being deployed.
Each Helm Chart is also checked into a repository, and hence has its own version number. Each time a Helm Chart changes, because the version of a constituent Docker image changes, the Chart's version number also changes.
Helm Charts can be organized hierarchically, that is, with one Chart including one or more other Charts (each with its own version number), with the version of the root Chart effectively identifying the version of the system as a whole being deployed.
Note that a commit of a new version of the root Helm Chart could be taken as the signal to the CD half of the pipeline (as denoted by the "Deployment Gate" in Figure 18) that the combination of modules (features) is now ready to deploy. Of course, other factors can also be taken into consideration, such as the time of day, as noted above.
While some of the Source Code -> Docker Image -> Kubernetes Container relationships just outlined can be codified in the toolchain, at least at the level of automated sanity tests that catch obvious mistakes, responsibility ultimately falls to the developers checking in source code and the operators checking in configuration code; they must correctly specify the versions they intend. Having a simple and clear versioning strategy is a prerequisite for doing that job.
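As one example of such an automated sanity test, the following Python sketch checks that a component's version number does not regress and derives an image name that embeds the version. The function names, the registry host, and the `registry/component:version` naming scheme are assumptions made for illustration; the text only states that the version number is part of the image's unique name.

```python
def parse(version):
    """Convert 'MAJOR.MINOR.PATCH' into a tuple of integers for comparison."""
    return tuple(int(part) for part in version.split("."))


def check_version(previous, candidate):
    """Return 'build' for a newer version, 'skip' if unchanged; raise on regression."""
    old, new = parse(previous), parse(candidate)
    if new < old:
        raise ValueError(f"version regressed from {previous} to {candidate}")
    return "skip" if new == old else "build"


def image_name(registry, component, version):
    """Build an image name that embeds the source version (assumed naming scheme)."""
    return f"{registry}/{component}:{version}"


# Tuples compare numerically, so 3.10.0 is correctly treated as newer than 3.9.9.
print(check_version("3.9.9", "3.10.0"))
print(image_name("registry.example.com", "sd-core", "3.10.0"))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "3.10.0" would sort before "3.9.9" and the check would wrongly report a regression.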
Finally, because versioning is inherently related to APIs, with the MAJOR version number incrementing whenever the API changes in a non-backward-compatible way, developers are responsible for ensuring that their software correctly consumes any APIs it depends on. Doing so becomes problematic when persistent state is involved, by which we mean state that must be preserved across multiple versions of the software that accesses it. This is a problem that all operational systems that run continuously have to deal with, and it typically requires a data migration strategy. Solving this problem in a general way for application-level state is beyond the scope of this book, but solving it for the cloud management system (which has its own persistent state) is a topic we take up in the next chapter.
4.6 管理密钥
The discussion so far has glossed over one important detail, which is how secrets are managed. These include, for example, the credentials Terraform needs to access remote services like GCP, as well as the keys used to secure communication among microservices within an edge cluster. Such secrets are effectively part of the hybrid cloud's configuration state, which would imply they are stored in the Config Repo like all other Configuration-as-Code artifacts. The problem is that repositories are typically not designed to be secure.
At a high level, the solution is straightforward. The various secrets needed to operate a secure system are encrypted, and only the encrypted versions are checked into the Config Repo. This reduces the problem to worrying about just one secret, but in effect it only kicks the can down the road. How, then, do we manage (both protect and distribute) the secret needed to decrypt the other secrets? Fortunately, there are mechanisms available to help. Aether, for example, uses two different approaches, each with its own strengths and weaknesses.
One approach is the git-crypt tool, which closely matches the high-level summary outlined above. In this case, the "central processing loop" of the CI/CD mechanism (which corresponds to Jenkins in Aether) is the trusted entity responsible for decrypting the component-specific secrets and passing them along to the various components at deployment time. This "pass along" step is typically implemented using the Kubernetes Secrets mechanism, which is an encrypted channel for sending configuration state to microservices (that is, it is similar to ConfigMaps). This mechanism should not be confused with SealedSecrets (discussed next), because by itself it does not address the larger problem we are discussing here, namely how secrets are managed outside a running cluster.
The advantage of this approach is its generality: it makes few assumptions and works for all secrets and components. The drawback is that it places a great deal of trust in Jenkins, or more to the point, in the practices the DevOps team adopts for how it uses Jenkins.
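To illustrate the "pass along" step in the first approach, the sketch below builds a Kubernetes Secret manifest from a value that the trusted CI/CD loop has already decrypted. The function name, the secret name, the namespace, and the placeholder value are assumptions for illustration. What is not an assumption is that the `data` field of a Secret carries base64-encoded values, and that base64 is an encoding rather than encryption, so a manifest like this belongs in the deployment path and should not be committed to the Config Repo.

```python
import base64
import json
from typing import Dict


def make_secret_manifest(name: str, namespace: str,
                         values: Dict[str, str]) -> dict:
    """Build a Kubernetes Secret manifest from already-decrypted values."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,          # base64-encoded, NOT encrypted
    }


# Hypothetical name, namespace, and placeholder value.
manifest = make_secret_manifest(
    "gcp-credentials", "aether", {"api-key": "example-not-a-real-key"}
)
print(json.dumps(manifest, indent=2))
```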
The second approach is the Kubernetes SealedSecrets mechanism. The idea is to trust a process running within the Kubernetes cluster (technically, this process is known as a Controller) to manage secrets on behalf of all the other Kubernetes-hosted microservices. At runtime, this process creates a private/public key pair and makes the public key available to the CI/CD toolchain. The private key is restricted to the SealedSecrets Controller and is referred to as the sealing key. Without going through the full protocol in detail, the public key is used in combination with a randomly generated symmetric key to encrypt all the secrets that need to be stored in the Config Repo. Later, at deployment time, the individual microservices ask the SealedSecrets Controller to use its sealing key to help them unlock those secrets.
Although this approach is less general than the first (that is, it is specific to protecting secrets within a Kubernetes cluster), it has the advantage of taking humans completely out of the processing loop, since the sealing key is generated programmatically at runtime. One complication, however, is that it is generally preferable to write the sealing key to persistent storage, to guard against having to restart the SealedSecrets Controller. Doing so potentially opens up an additional attack surface that needs to be protected.
Further Reading:
git-crypt - transparent file encryption in git.
"Sealed Secrets" for Kubernetes.
4.7 What About GitOps?
The CI/CD pipeline described in this chapter is consistent with GitOps, an approach to DevOps designed around the idea of Configuration-as-Code, in which the code repository is the single source of truth for building and deploying a cloud native system. The approach is premised on first making all configuration state declarative (for example, specified in Helm Charts and Terraform templates), and then treating the repository as the authoritative record from which the system is built and deployed. Whether you patch a Python file or update a configuration file, the change to the repository triggers the CI/CD pipeline.
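The "every change goes through the repository" idea can be sketched as a function that maps the files touched by a commit to the pipeline stages that need to run. The directory layout (`charts/`), the file suffixes chosen, and the stage names are hypothetical; a real pipeline would express this in its CI system's own trigger configuration.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def plan_actions(changed_files: Iterable[str]) -> List[str]:
    """Map files changed in a commit to (hypothetical) pipeline stages."""
    actions = set()
    for name in changed_files:
        path = PurePosixPath(name)
        if path.suffix in {".tf", ".tfvars"}:
            actions.add("terraform-apply")        # infrastructure changed
        elif "charts" in path.parts:
            actions.add("helm-upgrade")           # deployment config changed
        elif path.suffix == ".py":
            actions.add("build-and-test-image")   # source code changed
    return sorted(actions)


print(plan_actions(["src/app/main.py"]))
print(plan_actions(["infra/cluster.tf", "charts/root/Chart.yaml"]))
```

A source patch and a configuration edit enter the pipeline the same way, as a commit, and differ only in which stages they trigger.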
Although the approach described in this chapter is based on the GitOps model, three considerations mean that GitOps is not the end of the story. All of them hinge on one question: can all the state needed to operate a cloud native system be managed entirely with a repository-based mechanism?
The first consideration is that we need to acknowledge the difference between people who develop software and people who build and operate systems using that software. DevOps (in its simplest formulation) implies there should be no distinction. In practice, developers are often far removed from operators, or more to the point, they are far removed from design decisions about exactly how others will end up using their software. For example, software is usually implemented with a particular set of use cases in mind, but it is later integrated with other software to build entirely new cloud applications that have their own set of abstractions and features, and correspondingly, their own collection of configuration state. This is true for Aether, where the SD-Core subsystem was originally implemented for use in global cellular networks, but is now being repurposed to support private 4G/5G in enterprises.
While such state could indeed be managed in a Git repository, the idea of configuration management by pull request is overly simplistic. There are both low-level (implementation-centric) and high-level (application-centric) variables; in other words, it is common to have one or more layers of abstraction running on top of the base software. In the limit, it may even be an end user (for example, an enterprise user in Aether) who wants to change this state, which implies that fine-grained access control is likely to be required. None of this disqualifies GitOps as a way to manage such state, but it does raise the possibility that not all state is created equal: there is a range of configuration state variables that need to be accessed at different times, by different people with different skill sets, and most importantly, with different levels of privilege.
The second consideration has to do with where configuration state originates. For example, consider the addresses assigned to the servers assembled in a cluster, which might originate in an organization's inventory system. Or, in another example specific to Aether, it is necessary to call a remote Spectrum Access Service (SAS) to learn how to configure the radio settings for the small cells that have been deployed. Naively, you might think of this as a variable you could pull out of a YAML file stored in a Git repository. In general, systems often have to deal with multiple, sometimes external, sources of configuration state, and knowing which copy is authoritative and which is derivative is inherently problematic. There is no single right answer, but situations like this raise the possibility that the authoritative copy of configuration state needs to be maintained apart from any single use of that state.
The third consideration is how frequently this state changes, and hence potentially triggers restarting or possibly even redeploying a set of containers. Doing so certainly makes sense for "set once" configuration parameters, but what about "runtime settable" control variables? What is the most cost-effective way to update system parameters that have the potential to change frequently? Again, this raises the possibility that not all state is created equal, and that there is a continuum of configuration state.
These three considerations point to a distinction between build-time configuration state and runtime control state, which is the topic of the next chapter. We emphasize, however, that there is no single right answer to the question of how to manage such state; drawing a crisp line between "configuration" and "control" is notoriously difficult. Both the repository-based mechanism championed by GitOps and the runtime control alternatives described in the next chapter have value, and the question is which is the better match for any given piece of information that must be maintained for a cloud to operate properly.
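Hello, I'm 俞凡 (Yu Fan). I previously worked in R&D at Motorola and now do technical work at Mavenir. I have a long-standing interest in communications, networking, backend architecture, cloud native, DevOps, CI/CD, blockchain, AI, and related technologies. I enjoy reading and thinking, and I believe in continuous learning and lifelong growth. You are welcome to get in touch and learn together.
WeChat official account: DeepNoMind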
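Original site copyright notice
This article was created by [InfoQ]. When reposting, please include the link to the original. Thank you.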
https://yzsam.com/2022/212/202207311236465198.html