Introduction to the Alibaba EagleEye system
2022-07-27 17:20:00 【Jade faced Dragon】
1. Problems introduced by microservices
A microservice architecture brings many benefits, such as higher development efficiency and better scalability. But microservices are a double-edged sword and also introduce problems, such as:
- Faults are hard to locate
- Capacity is hard to estimate
- More resources are wasted
- Call links are hard to sort out
These are the problems the EagleEye system is meant to solve.
2. What is EagleEye
EagleEye is a monitoring system built around link (call chain) tracing. It collects, stores, and analyzes call event data in a distributed system to help developers and operations staff with fault diagnosis, capacity estimation, performance bottleneck location, and call link analysis.
EagleEye: a distributed call tracing system that aggregates and analyzes the distributed calls generated by a single front-end request.
TraceId: a globally unique ID that identifies the call chain of one front-end request. When a front-end request reaches the server, EagleEye's instrumentation logic generates the TraceId before the application container performs the actual business processing.
3. Implementation principle
3.1 TraceID

Instrumentation points are embedded uniformly in a set of middleware: wherever requests are sent and received in the distributed invocation framework, the distributed messaging system, the cache system, the unified access layer, and the Web framework layer. This middleware lets the tracing data pass transparently between systems.
When a user request comes in, EagleEye generates a unique TraceID in the middleware of the first server that receives it. The TraceID is passed to downstream systems with every distributed call, and all of the propagated events are written to an RPC log file. A centralized processing cluster then incrementally collects the logs from all machines. The processing logic is simple: clean the data, then build an inverted index. When the system reports an error, the TraceID is written into the exception log as a keyword, so searching by TraceID shows what happened in the system during that call. For example, it is easy to see that the call got stuck on the database call from B to C, which timed out, and so the problem can be traced along the distributed call link. However, the TraceId alone only yields a chronological sequence of call events; what we want is a nested call stack.
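The flow described above can be sketched roughly as follows. This is a minimal illustration, not EagleEye's implementation: the `new_trace_id`, `LogIndex`, and `call` names are hypothetical, a random UUID stands in for the real TraceID format, and an in-memory dictionary stands in for the central processing cluster.

```python
import uuid
from collections import defaultdict


def new_trace_id() -> str:
    # Assumption: a random UUID stands in for EagleEye's real TraceID format.
    return uuid.uuid4().hex


class LogIndex:
    """Toy stand-in for the central cluster: an inverted index from TraceID to events."""

    def __init__(self):
        self._by_trace = defaultdict(list)

    def collect(self, record: dict) -> None:
        # Index each collected log record under its TraceID.
        self._by_trace[record["trace_id"]].append(record)

    def lookup(self, trace_id: str) -> list:
        # Return the events of one request in chronological order.
        return sorted(self._by_trace.get(trace_id, []), key=lambda r: r["ts"])


def call(index, trace_id, caller, callee, ts, status="OK"):
    # Every hop logs the same TraceID, which is what ties the events together.
    index.collect({"trace_id": trace_id, "caller": caller,
                   "callee": callee, "ts": ts, "status": status})


index = LogIndex()
trace_id = new_trace_id()                 # generated once, at the first server
call(index, trace_id, "A", "B", ts=1)
call(index, trace_id, "B", "C", ts=2, status="TIMEOUT")
call(index, new_trace_id(), "A", "D", ts=3)   # an unrelated request

events = index.lookup(trace_id)
for e in events:
    print(e["ts"], e["caller"], "->", e["callee"], e["status"])
```

Looking up the TraceID returns only the two events of that request, in time order, with the timeout on the B to C call visible. The result is still a flat list, which is the limitation the next section addresses.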
3.2 TraceID + RPCID

To restore the call stack we need one more thing, called RPCId (OpenTracing has a similar concept, called SpanID). RPCId is a multi-level sequence. At the first hop its initial value is 0. Each deeper call appends a level: 0.1, then 0.1.1. Each call at the same depth increments the last level: after A has called B (0.1), A's next call, to D, gets 0.2. The RPCId is written to the same RPC log as the call, and it is collected into the central processing cluster together with the call event itself and the TraceId.
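The numbering rule can be sketched as below. The `RpcContext` class and its `next_child` method are hypothetical names chosen for illustration; only the numbering scheme (0, 0.1, 0.1.1, 0.2) comes from the description above.

```python
class RpcContext:
    """Hypothetical per-call context carrying the TraceId and the RPCId."""

    def __init__(self, trace_id: str, rpc_id: str = "0"):
        self.trace_id = trace_id
        self.rpc_id = rpc_id
        self._children = 0  # number of downstream calls made from this hop so far

    def next_child(self) -> "RpcContext":
        # A deeper call appends a level; a sibling call increments the last level.
        self._children += 1
        return RpcContext(self.trace_id, f"{self.rpc_id}.{self._children}")


root = RpcContext("trace-1")      # first hop, A
a_to_b = root.next_child()        # A calls B
b_to_c = a_to_b.next_child()      # B calls C
a_to_d = root.next_child()        # A then calls D

print(root.rpc_id, a_to_b.rpc_id, b_to_c.rpc_id, a_to_d.rpc_id)
```

Each hop only needs its own RPCId and a local counter to name its downstream calls, so no coordination between machines is required.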

After collection, EagleEye does a depth-first traversal by RPCId to obtain the call stack. The call stack in the figure is the actual call stack for placing an order in the real Taobao trading system, and it shows that the call passed through many systems. From EagleEye's point of view, however, it looks as if everything happened locally: it is easy to see whether a call had a problem, where the symptom appeared, and where the root cause finally occurred. Besides the return code of the failed call, the time taken by each call is shown on the right, so we can also see how slow each call was. This is how EagleEye addresses fault location, the first of the four microservice problems: through the inverted index, users can retrieve the full picture of each call.
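One way to sketch the reconstruction: sorting the collected records by RPCId, compared as a tuple of integers, visits them in depth-first (pre-order) order, and the number of dots gives the nesting depth. The record fields (`service`, `cost_ms`, `status`) and the sample values are made up for illustration and are not EagleEye's log format.

```python
def rpc_key(rpc_id: str) -> tuple:
    # "0.1.1" -> (0, 1, 1); tuple comparison yields depth-first pre-order.
    return tuple(int(part) for part in rpc_id.split("."))


def render_call_stack(records: list) -> list:
    lines = []
    for r in sorted(records, key=lambda rec: rpc_key(rec["rpc_id"])):
        depth = r["rpc_id"].count(".")  # nesting level of this call
        indent = "  " * depth
        lines.append(f"{indent}{r['rpc_id']} {r['service']} {r['cost_ms']}ms {r['status']}")
    return lines


# Records for one TraceId, arriving in arbitrary order from different machines.
records = [
    {"rpc_id": "0.2", "service": "D", "cost_ms": 5, "status": "OK"},
    {"rpc_id": "0.1.1", "service": "C", "cost_ms": 3000, "status": "TIMEOUT"},
    {"rpc_id": "0", "service": "A", "cost_ms": 3020, "status": "ERROR"},
    {"rpc_id": "0.1", "service": "B", "cost_ms": 3010, "status": "ERROR"},
]

stack = render_call_stack(records)
print("\n".join(stack))
```

In the rendered stack the error at A and B is the symptom, while the timeout at the most deeply nested call, C, is where the problem originates.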
3.3 Counting calls by a specified tag
If we aggregate call chain data at the trillion-record scale, can we get more valuable information? Besides its unique identifiers, TraceID and RPCID, every call also carries label information (tags). A tag is a common attribute that every call has: which systems the call passed through, the IP of each system called, which machine room it ran in, what the service name was. Some tags can be propagated along the link, such as the entry URL: once it is passed down, every subsequent event of the request is known to have been initiated through that entry. Aggregating by these tags gives call chain statistics. For example, counting call chains by the machine room tag gives a trend chart of the number of calls in each machine room.
What is the benefit? It is most obvious in capacity estimation. Suppose we have a tag for the trade order entry. After aggregating by this tag, we can see not only how a single call behaves but also, across the aggregated call chain data, the total number of calls and the average time taken. This reveals hot spots and bottlenecks in the system, and it can also expose illegitimate traffic, that is, calls we did not know about before.
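A small sketch of this kind of aggregation, assuming each call event carries a `tags` dictionary and a `cost_ms` field. Those field names, the `aggregate_by_tag` helper, and the sample values are illustrative assumptions only.

```python
from collections import defaultdict


def aggregate_by_tag(events: list, tag: str) -> dict:
    # Group events by the value of one tag, then report call count and average time.
    totals = defaultdict(lambda: {"calls": 0, "total_ms": 0})
    for e in events:
        bucket = totals[e["tags"][tag]]
        bucket["calls"] += 1
        bucket["total_ms"] += e["cost_ms"]
    return {
        value: {"calls": t["calls"], "avg_ms": t["total_ms"] / t["calls"]}
        for value, t in totals.items()
    }


events = [
    {"tags": {"entry": "/trade/order", "room": "room-1"}, "cost_ms": 120},
    {"tags": {"entry": "/trade/order", "room": "room-2"}, "cost_ms": 80},
    {"tags": {"entry": "/item/detail", "room": "room-1"}, "cost_ms": 30},
]

by_entry = aggregate_by_tag(events, "entry")
by_room = aggregate_by_tag(events, "room")
print(by_entry)
print(by_room)
```

The same events give different views depending on the tag chosen: grouping by entry shows how much load each entry point generates, and grouping by machine room shows how that load is distributed.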
3.4 Summary
EagleEye has two parts. The first restores the stack of the distributed call chain through TraceId and RPCId, which enables fault location. The second analyzes call chain data: aggregating statistics by tags such as entry, link characteristics, application, and machine room enables capacity estimation, performance bottleneck location, and call chain analysis.
Reference: Alibaba eagle eye technology decryption - See more, learn more, remember more, practice more - Blog Garden