Build cloud native observability capability suitable for organizations
2022-06-30 16:06:00 【Spruce network】
In its definition of cloud native [1], the CNCF names observability as an essential element. Adopting a cloud native application architecture and enjoying the efficiency gains it brings therefore means facing the question of how to build matching observability. Today there is a large puzzle of observability solutions, both open source and commercial, and the CNCF Cloud Native Landscape [2] lists hundreds of related projects. This article summarizes a maturity model for observability capability, in the hope of guiding organizations in choosing an observability approach that suits them.
1.0 | Pillars: Basic observability

Back in 2017, a blog post by Peter Bourgon summarized the three pillars of observability: metrics, tracing, and logging [3]. Over the following years this view was widely accepted in the industry and became the baseline requirement for observability capability, with mature solutions available for each pillar. Among open source components, for example, Prometheus, Telegraf, InfluxDB, and Grafana focus on metrics; SkyWalking, Jaeger, and OpenTracing focus on tracing; and Logstash, Elasticsearch, and Loki focus on logging.
Building the three pillars is the first stage of building observability capability. With open source components it is easy to stand up an out-of-the-box set of observability facilities for each business system. This stage has two main problems:

1) Data silos: When a team faces a business failure, it may need to jump back and forth between the metrics, tracing, and logging systems. Because the data in these systems is not well connected, troubleshooting depends heavily on people linking the information by hand, and sometimes the owners of the different systems all have to be pulled into the investigation.
2) Redundant construction: Collecting observation data depends on StatsD instrumentation, tracing SDK instrumentation, and logging SDK instrumentation, so at this stage observability is generally driven by the business development teams. Each business unit builds observation facilities only for its own use, which leads to duplicated effort across business units. In addition, out-of-the-box solutions often have scalability problems and are hard to grow into a shared service for all businesses.
2.0 | Service: Unified observability
When connecting observation data and tuning the observation systems become increasingly frequent tasks in daily operations work, it is time to prepare to raise observability to the next level. This level is centered on the idea of a service and is built through collaboration between the infrastructure team and the business development teams. The infrastructure team builds a unified observability platform for all businesses, providing the infrastructure for collecting, storing, and retrieving metrics, tracing, and logging data, and supporting the association of different data types to eliminate silos. The business development teams act as consumers, using a unified SDK to inject observation data into this platform.

The first problem is how to associate different types of data at collection time. OpenTelemetry [4] aims to solve this by standardizing how data is collected and transmitted. Under the OpenTelemetry specification, metrics can be linked to traces through exemplars, traces can be linked to logs through TraceID and SpanID, and logs can be linked to metrics through instance name and service name. The OpenTelemetry community has completed version 1.0 of the tracing specification and plans to complete the metrics specification in 2021 and the logging specification in 2022. Although the project is still evolving rapidly, it has already received a great deal of attention and recognition from the industry, which also shows how long the industry has suffered from isolated observability data.
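The three links described above can be sketched in a few lines of plain Python. This is only an illustration of the join keys, not the OpenTelemetry SDK or its data model: the class names, field names, and sample values are all hypothetical, and the log-to-metrics link is reduced to the service name for brevity.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    value: float    # one sampled measurement behind a metric point
    trace_id: str   # trace that produced this measurement


@dataclass
class MetricPoint:
    name: str
    service_name: str
    exemplars: list


@dataclass
class Span:
    trace_id: str
    span_id: str
    service_name: str


@dataclass
class LogRecord:
    trace_id: str
    span_id: str
    service_name: str
    body: str


def spans_for_metric(point, spans):
    """Metrics -> Trace: follow the trace IDs carried by the exemplars."""
    wanted = {e.trace_id for e in point.exemplars}
    return [s for s in spans if s.trace_id in wanted]


def logs_for_span(span, logs):
    """Trace -> Log: match on TraceID and SpanID."""
    key = (span.trace_id, span.span_id)
    return [rec for rec in logs if (rec.trace_id, rec.span_id) == key]


def metrics_for_log(log, points):
    """Log -> Metrics: match on the service name."""
    return [p for p in points if p.service_name == log.service_name]


# Hypothetical sample data for two services.
points = [
    MetricPoint("request.duration", "checkout", [Exemplar(1.8, "t1")]),
    MetricPoint("request.duration", "cart", [Exemplar(0.1, "t2")]),
]
spans = [Span("t1", "s1", "checkout"), Span("t2", "s2", "cart")]
logs = [
    LogRecord("t1", "s1", "checkout", "payment timeout"),
    LogRecord("t2", "s2", "cart", "ok"),
]

slow_spans = spans_for_metric(points[0], spans)      # metric -> trace
related_logs = logs_for_span(slow_spans[0], logs)    # trace -> log
back_to_metrics = metrics_for_log(related_logs[0], points)  # log -> metric
```

Starting from a slow metric point, the exemplar leads to the span, the span leads to the log line, and the log's service name leads back to the metric, so no manual cross-referencing between systems is needed.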
The second problem is the storage of different types of data, which OpenTelemetry unfortunately does not address. Metrics data differs considerably from trace and log data: metrics are usually stored in a TSDB (such as InfluxDB), while traces and logs are usually stored in a search engine (such as Elasticsearch). To serve as a unified observability platform, the system needs to scale horizontally, but a TSDB usually struggles with the high-cardinality problem when storing metrics at the granularity of every microservice and API, and a search engine usually incurs high resource overhead because of full-text indexing. To solve both problems, a common choice is a real-time data warehouse based on sparse indexes, such as ClickHouse, combined with object storage to separate hot and cold data.
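A rough calculation shows why high cardinality becomes a problem once metrics are kept per microservice and per API. The label names and counts below are invented for illustration, and the product is only an upper bound, since not every label combination occurs in practice.

```python
from math import prod


def series_count(label_cardinalities):
    """Upper bound on distinct time series: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sets for one metric name.
coarse = {"service": 200, "status_class": 5}
fine = {"service": 200, "instance": 50, "api_endpoint": 100, "status_code": 10}

coarse_series = series_count(coarse)  # 200 * 5
fine_series = series_count(fine)      # 200 * 50 * 100 * 10

print(f"coarse: {coarse_series:,} series, fine: {fine_series:,} series")
```

Adding instance and endpoint labels moves this single metric from a thousand potential series to ten million, which is the kind of growth that a TSDB indexing every series finds hard to absorb.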
Beyond that, the bigger challenge in turning the observation system into a unified service is that it needs stronger horizontal scalability than the business systems themselves. In complex environments such as hybrid cloud and edge cloud, for example, the observation system should be able to scale out across multiple regions and availability zones and into edge server rooms, so that it can monitor complex services along the full link.
Once data collection and storage are solved, the infrastructure team can open the observation system to the business development teams as a unified service. Two problems remain unsolved at this stage, however:

1) Team coupling: Observability offered as a service must be actively called by the business development team, but the KPI for business assurance is borne by the operations team and does not fall directly on the development team. With cloud native applications iterating at high speed, can every business release be guaranteed to call the observation service 100% of the time? Even if the development team follows the rules strictly, the operations team has no initiative of its own. And from the development team's point of view, the business code has to carry all kinds of SDK calls mandated by the operations team.
2) Observation blind spots: Not every line of code in every software service involved in the application architecture is written by the development team, so intrusive code instrumentation is bound to run into blind spots. Examples on the communication path between two microservices include the API gateway, iptables/ipvs, the host vSwitch, the SLB, Redis cache services, and MQ message queue services, none of which can yield observation data through code instrumentation.
3.0 | Force: Endogenous observability

When the coupling between the development and operations teams begins to hold back the organization, and when observation blind spots in basic services begin to hold back further improvement of business SLOs, it is time to raise the observability level again.
Since every cloud native application needs observability, can the infrastructure provide this capability endogenously, like the Force, present everywhere? The direction is therefore clear: without inserting a single line of observation code into the business code, how much observability can we obtain? The main challenges at this stage come from data collection and storage.
How can the infrastructure acquire application observation data endogenously? One Green Field approach is to use a service mesh. Whether it is a pure service mesh such as Istio or a more radical application runtime such as Dapr, observability has been considered from the start of the design. If all access paths between microservices pass through the service mesh, the problem of collecting observation data can be solved at the infrastructure level. The main challenge here is the change to the application architecture: all applications need to be migrated to the service mesh. Even with a service mesh, blind spots remain on systems such as middleware, databases, caches, and message queues. Perhaps only when the service mesh becomes a layer of the network protocol stack, as TCP is [5], will this approach deliver endogenous observability.
A Brown Field approach is to use BPF for zero-intrusion observation everywhere. BPF is an observation technology built into the Linux kernel. Classic BPF (cBPF) focuses mainly on filtering and capturing network traffic, and it was greatly enhanced in the 4.X kernel versions as eBPF. With eBPF, every TCP/UDP function call (via kprobe) and every HTTP2/HTTPS function call (via uprobe) can be observed end to end, without changing the business code or restarting the business process. With cBPF, metrics, tracing, and logging data for each service access can be extracted from network traffic, which makes it possible to observe performance data as inter-service communication passes through intermediate devices such as virtual machine NICs, host NICs, and SLBs. As Linux kernel 4.X becomes more widely deployed, cloud monitoring leader Datadog has recently released Universal Service Monitoring (USM), a zero-intrusion monitoring capability based on eBPF [6]; in China, the Alibaba Cloud ARMS team has recently released Kubernetes monitoring, a zero-intrusion monitoring product based on eBPF [7]; and in the open source community, SkyWalking v9 has also started to pay attention to eBPF [8]. Note, however, that relying on eBPF alone brings a dependency on the 4.X Linux kernel, so the approach may degenerate into a Green Field solution.
The Force needs to be everywhere, and network traffic has long been everywhere!
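To make the idea of deriving metrics from traffic concrete, the sketch below pairs request and response records and computes per-server latency. It is not BPF code and does not capture anything: the record layout (timestamp, client, server, stream ID, kind) is a hypothetical stand-in for whatever a traffic collector would emit, and the sample values are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (timestamp_seconds, client, server, stream_id, kind)
packets = [
    (0.000, "10.0.0.1", "10.0.1.5", 1, "request"),
    (0.012, "10.0.0.1", "10.0.1.5", 1, "response"),
    (0.020, "10.0.0.2", "10.0.1.5", 7, "request"),
    (0.050, "10.0.0.2", "10.0.1.5", 7, "response"),
]


def latencies_by_server(records):
    """Pair each response with its pending request and collect latencies per server."""
    pending = {}
    result = defaultdict(list)
    for ts, client, server, stream_id, kind in sorted(records):
        key = (client, server, stream_id)
        if kind == "request":
            pending[key] = ts
        elif key in pending:
            # Response seen: latency is the time since the matching request.
            result[server].append(ts - pending.pop(key))
    return dict(result)


latency = latencies_by_server(packets)
avg_ms = {server: round(mean(values) * 1000, 1) for server, values in latency.items()}
```

Because the records come from the traffic itself, the same calculation works for services whose code the team does not own, which is the point of the zero-intrusion approach.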
The data storage challenge is closely tied to full-link monitoring. Observability based on application code often considers only the business and application layers, leaving the network and infrastructure as blind spots. To connect the observation data collected along the intermediate path (the API gateway, iptables/ipvs, the host vSwitch, the SLB, Redis cache services, and MQ message queue services) with the observation data at the application and business layers, we need to build a knowledge graph oriented toward microservices. By synchronizing resource and service information from the cloud platform API, the K8s apiserver, and the service registry, we can build multi-dimensional knowledge graph information for each microservice, such as region/availability zone, VPC/subnet, cloud server/host, container cluster/node/workload, and service name/method name. Attached to the observation data as tags, this information connects the metrics, tracing, and logging data across every layer of the full link.
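The tagging step can be sketched as a lookup against a resource inventory. Everything here is hypothetical: the inventory is a hand-written dictionary keyed by IP address, whereas a real system would synchronize it from the cloud platform API, the K8s apiserver, and the service registry, and the tag names are chosen only for illustration.

```python
# Hypothetical inventory keyed by IP address.
RESOURCE_TAGS = {
    "10.0.0.1": {
        "region": "region-a", "az": "az-1", "vpc": "vpc-prod",
        "host": "node-1", "cluster": "k8s-main",
        "workload": "web", "service": "web-frontend",
    },
    "10.0.1.5": {
        "region": "region-a", "az": "az-2", "vpc": "vpc-prod",
        "host": "node-3", "cluster": "k8s-main",
        "workload": "checkout", "service": "checkout",
    },
}


def enrich(record, inventory):
    """Return a copy of the record with src.* and dst.* resource tags attached."""
    tagged = dict(record)
    for side in ("src", "dst"):
        tags = inventory.get(record.get(f"{side}_ip"), {})
        for key, value in tags.items():
            tagged[f"{side}.{key}"] = value
    return tagged


# A network-level metric and an application-level span for the same call.
flow_metric = {"src_ip": "10.0.0.1", "dst_ip": "10.0.1.5", "latency_ms": 21.0}
app_span = {"src_ip": "10.0.0.1", "dst_ip": "10.0.1.5", "trace_id": "t1"}

tagged_metric = enrich(flow_metric, RESOURCE_TAGS)
tagged_span = enrich(app_span, RESOURCE_TAGS)
```

Once both records carry the same tags, data from the network layer and the application layer can be filtered and grouped by the same dimensions, for example by destination service or availability zone.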
At level 3.0, observability has become an endogenous capability of the cloud infrastructure. Like the Force, it is present in every running application system and in every application system that will be added in the future. It is an innate, basic capability that does not depend on a "call" in the business code to be triggered. It is simply there.
To learn more about cloud native observability practices, you are welcome to join the "Cloud native observability sharing meeting" series of live events hosted by Spruce network. On the evening of July 6, 20:00 to 21:30, Li Qian, senior product manager at Spruce network, will present the talk "HD cloud observability: full-link tracing in practice".
Event registration: https://www.slidestalk.com/m/960/OSCjishuwenzhang
