Introduction to Alibaba's EagleEye System
2022-07-27 17:20:00 【Jade faced Dragon】
1. Problems introduced by microservices
Microservice architectures bring many benefits, such as higher development efficiency and better scalability. But microservices are a double-edged sword: they also introduce problems, such as:
- Fault location is difficult
- Capacity estimation is difficult
- More resources are wasted
- Call links are hard to map out
These are the problems that the EagleEye system is designed to solve.
2. What is EagleEye?
EagleEye is a monitoring system built around call-link tracing (distributed tracing). It collects, stores, and analyzes call-event data from a distributed system to help developers and operators diagnose faults, estimate capacity, locate performance bottlenecks, and map out call links.
EagleEye (Eagle Eye): a distributed call-tracing system, that is, a system that summarizes and analyzes all the distributed calls triggered by a single front-end request.
TraceId: a globally unique ID that identifies the call chain of one front-end request. When the front-end request reaches the server, EagleEye's instrumentation logic generates the TraceId before the application container performs the actual business processing.
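The following is a minimal sketch of TraceId generation at the entry point, assuming a hypothetical layout of millisecond timestamp + process id + random hex. The article does not describe EagleEye's real TraceId format, and `new_trace_id` and `handle_entry_request` are illustrative names, not EagleEye APIs.

```python
import os
import time
import uuid


def new_trace_id() -> str:
    """Return a new, globally unique TraceId for an incoming front-end request.

    Hypothetical layout (not EagleEye's documented format):
    13-digit millisecond timestamp + 4-digit process id + 16 random hex chars.
    """
    millis = int(time.time() * 1000)  # time prefix keeps IDs roughly sortable
    pid = os.getpid() % 10_000        # distinguishes processes on the same host
    return f"{millis:013d}{pid:04d}{uuid.uuid4().hex[:16]}"


def handle_entry_request(request: dict, business_handler):
    """Hypothetical entry-point hook: assign the TraceId before business logic runs."""
    request["trace_id"] = new_trace_id()
    return business_handler(request)
```

The timestamp prefix is one common design choice: it makes IDs roughly time-ordered, while the random suffix makes collisions between machines practically impossible.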
3. How it works
3.1 TraceID

Instrumentation points are placed uniformly in a shared set of middleware: wherever requests are sent and received in the distributed invocation framework, the distributed messaging system, the cache system, the unified access layer, and the web framework layer. Because every system goes through this same middleware, the tracing data is passed between systems seamlessly.
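A minimal sketch of how such a middleware hook might carry the TraceId across a call, assuming the ID travels in a hypothetical `X-Trace-Id` request header. The header name and the `on_receive`/`on_send` hooks are illustrative, not EagleEye's actual interfaces; Python's `contextvars` stands in for whatever per-request context the real middleware uses.

```python
import contextvars
import uuid

# Hypothetical header name used to carry the TraceId between systems.
TRACE_HEADER = "X-Trace-Id"

# TraceId of the request currently being handled; contextvars keeps it
# separate per thread and per asyncio task.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def on_receive(headers: dict) -> str:
    """Server-side hook: adopt the caller's TraceId, or create one at the entry point."""
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    current_trace_id.set(trace_id)
    return trace_id


def on_send(headers: dict) -> dict:
    """Client-side hook: copy the current TraceId into an outgoing RPC, message, or cache request."""
    trace_id = current_trace_id.get()
    if trace_id is None:
        return dict(headers)
    return {**headers, TRACE_HEADER: trace_id}
```

Because both hooks live in the shared middleware, business code never has to pass the TraceId explicitly.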
When a user request comes in, EagleEye generates a unique TraceID in the middleware of the first server that receives it. This TraceID is passed to the downstream systems with every distributed call, and every event along the way is written to an RPC log file. A centralized processing cluster then incrementally collects the logs from all machines. The processing logic is simple: clean the data, then build an inverted index keyed by TraceID.
As long as the TraceID is printed in the exception log whenever the system reports an error, you can search for it as a keyword and see everything that happened in the system during that call. For example, the TraceID makes it easy to see that a call was stuck on the database call from B to C, which timed out. This is how problems are traced along a distributed call link.
However, a TraceID alone only yields a flat, chronologically ordered list of call events. What we actually want is a nested call stack.
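The cleaning and inverted-indexing step can be sketched as below, assuming a hypothetical pipe-separated log format `timestamp|traceId|call|elapsedMs|status`; the article does not give EagleEye's real RPC log layout. The result also illustrates the limitation just described: the index returns a flat, time-ordered list, not a call stack.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class CallEvent(NamedTuple):
    timestamp_ms: int
    trace_id: str
    call: str
    elapsed_ms: int
    status: str


def parse_line(line: str) -> Optional[CallEvent]:
    """Cleaning step: parse one hypothetical log line, dropping malformed ones."""
    parts = line.strip().split("|")
    if len(parts) != 5:
        return None
    ts, trace_id, call, elapsed, status = parts
    try:
        return CallEvent(int(ts), trace_id, call, int(elapsed), status)
    except ValueError:
        return None


def build_index(lines) -> dict:
    """Inverted index: TraceId -> that request's call events in chronological order."""
    index = defaultdict(list)
    for line in lines:
        event = parse_line(line)
        if event is not None:
            index[event.trace_id].append(event)
    for events in index.values():
        events.sort(key=lambda e: e.timestamp_ms)
    return dict(index)


logs = [
    "1010|t1|B->C(db)|3000|TIMEOUT",
    "1000|t1|A->B|3020|ERROR",
    "1005|t2|A->D|12|OK",
    "garbage line",
]
index = build_index(logs)
for event in index["t1"]:
    print(event.call, event.elapsed_ms, event.status)
```

Looking up `t1` prints `A->B 3020 ERROR` followed by `B->C(db) 3000 TIMEOUT`, which points at the timed-out database call.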
3.2 TraceID + RPCID

To restore the call stack, we need one more piece of data, called the RPCId (OpenTracing has a similar concept, the SpanID). The RPCId is a hierarchical, multi-level sequence number. At the first hop its initial value is 0. Each call one level deeper appends a component, so it becomes 0.1, and one level further down 0.1.1. Each further call at the same depth increments the last component: after A has called B, its next call, to D, becomes 0.2. The RPCId is printed into the same RPC log entry as the call, and is collected into the central processing cluster together with the call event itself and the TraceId, where they are processed together.
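The numbering rule can be sketched with a small context object that hands out child RPCIds. `RpcContext` and `next_child` are hypothetical names used only to illustrate the scheme described above.

```python
class RpcContext:
    """Tracks the RPCId of one call and numbers the downstream calls it makes.

    Hypothetical helper: the entry call is "0", its first child "0.1",
    the next sibling "0.2", and the first child of "0.1" is "0.1.1".
    """

    def __init__(self, trace_id: str, rpc_id: str = "0"):
        self.trace_id = trace_id
        self.rpc_id = rpc_id
        self._child_count = 0  # downstream calls made so far from this call

    def next_child(self) -> "RpcContext":
        """Create the context for the next downstream call made from this call."""
        self._child_count += 1
        return RpcContext(self.trace_id, f"{self.rpc_id}.{self._child_count}")


root = RpcContext("t1")  # A, the entry point
b = root.next_child()    # A -> B
c = b.next_child()       # B -> C, one level deeper
d = root.next_child()    # A -> D, same depth as B
print(root.rpc_id, b.rpc_id, c.rpc_id, d.rpc_id)  # 0 0.1 0.1.1 0.2
```

Every child shares the parent's TraceId, so the two IDs together identify both the request and the call's position in its tree.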

After collection, EagleEye performs a depth-first traversal ordered by RPCId and obtains a call stack like the one in the figure above. That call stack is the real order-placement call stack from Taobao's trading system, and it shows that this single call passed through many systems. Yet from EagleEye's perspective it looks as if everything happened locally: it is easy to see whether a call has a problem, where the symptom appears, and where the root cause ultimately lies. Besides the return code of each failed call, the right-hand side also shows how long each call took, so slow calls stand out as well. This illustrates how EagleEye solves the first of the four microservice problems listed earlier, difficult fault location: through the inverted index, it lets users see the complete picture of every call.
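One simple way to realize this depth-first traversal, sketched under the assumption that each collected event is an `(rpc_id, call, elapsed_ms)` tuple for a single TraceId: sorting by the RPCId's numeric components yields a pre-order walk of the call tree, and the number of dots gives the nesting depth. The function names are illustrative.

```python
def rpc_sort_key(rpc_id: str) -> tuple:
    """Compare RPCIds numerically per level, so "0.10" sorts after "0.9"."""
    return tuple(int(part) for part in rpc_id.split("."))


def render_call_stack(events) -> str:
    """Rebuild the nested call stack for one TraceId.

    events: iterable of (rpc_id, call, elapsed_ms). Sorting by the numeric
    RPCId tuple gives a depth-first (pre-order) traversal: each parent comes
    before its children, and siblings appear in call order.
    """
    lines = []
    for rpc_id, call, elapsed_ms in sorted(events, key=lambda e: rpc_sort_key(e[0])):
        depth = rpc_id.count(".")  # "0" is depth 0, "0.1" depth 1, ...
        lines.append(f"{'  ' * depth}{rpc_id} {call} {elapsed_ms}ms")
    return "\n".join(lines)


events = [
    ("0.2", "A->D", 15),
    ("0", "entry A", 3040),
    ("0.1.1", "B->C(db)", 3000),
    ("0.1", "A->B", 3020),
]
print(render_call_stack(events))
```

Events can arrive from the log cluster in any order; the output nests `0.1.1 B->C(db) 3000ms` under `0.1 A->B`, showing where the time was spent. Comparing RPCIds as plain strings would be a bug here, since `"0.10"` would sort before `"0.2"`.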
3.3 Counting calls by tag
If we aggregate trillions of call-chain records, can we extract more valuable information? Besides its unique identifiers, the TraceID and RPCID, every call also carries tag information (Tag). What is a tag? It is a common attribute that every call has: for example, which systems the call passed through, the IP of each system it called, which machine room it ran in, and what the service name is. Some tags are propagated along the link, such as the entry URL: once it has been propagated, we know that every downstream event of the request was initiated through that entry. By aggregating these tags we obtain call-chain statistics. For example, counting calls by the machine-room tag gives a trend chart of the number of calls in each machine room.
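A minimal sketch of counting calls by tag for a trend chart, assuming each call record carries a hypothetical `tags` dictionary (keys such as `room` and `entry_url` are invented for illustration) and bucketing by minute. A production pipeline would run this as a distributed aggregation rather than in memory.

```python
from collections import Counter
from typing import NamedTuple


class Span(NamedTuple):
    trace_id: str
    rpc_id: str
    timestamp_ms: int
    elapsed_ms: int
    tags: dict  # hypothetical keys, e.g. "room", "entry_url", "service", "ip"


def count_by_tag(spans, tag: str, bucket_ms: int = 60_000) -> Counter:
    """Count calls per (time bucket, tag value); each bucket is one point on a trend chart."""
    counts = Counter()
    for span in spans:
        value = span.tags.get(tag, "unknown")
        bucket = span.timestamp_ms // bucket_ms * bucket_ms
        counts[(bucket, value)] += 1
    return counts


spans = [
    # The entry URL is propagated, so both calls of t1 carry the same value.
    Span("t1", "0", 0, 40, {"room": "room-a", "entry_url": "/order/create"}),
    Span("t1", "0.1", 5, 30, {"room": "room-b", "entry_url": "/order/create"}),
    Span("t2", "0", 61_000, 12, {"room": "room-a", "entry_url": "/item/detail"}),
]
for (bucket, room), n in sorted(count_by_tag(spans, "room").items()):
    print(bucket, room, n)
```

The same function aggregates by any other tag; grouping by `entry_url` here attributes both calls of `t1` to `/order/create`.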
What are the benefits? They are most obvious in capacity estimation. Suppose we have an entry tag for trade order placement. After aggregating by this tag, we can see not only how a single call behaves, but also the total number of calls and their average latency across all the call-chain data. This reveals hot spots and bottlenecks in the system, and it can also expose illegitimate traffic, that is, traffic we did not know about before.
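The per-entry statistics described here can be sketched as follows, assuming the input is a list of hypothetical `(entry_tag, elapsed_ms)` pairs and that a set of known, registered entries is available for spotting unexpected traffic. Both the input shape and the "known entries" list are assumptions for illustration.

```python
from collections import defaultdict


def summarize_by_entry(calls, known_entries):
    """Per entry tag: total call count and average latency.

    Also returns entries missing from known_entries, a simple way to
    surface traffic that nobody knew about.
    """
    totals = defaultdict(lambda: [0, 0])  # entry -> [call count, total elapsed ms]
    for entry, elapsed_ms in calls:
        totals[entry][0] += 1
        totals[entry][1] += elapsed_ms
    summary = {
        entry: {"calls": n, "avg_ms": total / n}
        for entry, (n, total) in totals.items()
    }
    unexpected = sorted(set(summary) - set(known_entries))
    return summary, unexpected


calls = [("/order/create", 120), ("/order/create", 80), ("/item/detail", 10), ("/debug/dump", 5)]
summary, unexpected = summarize_by_entry(calls, known_entries={"/order/create", "/item/detail"})
print(summary["/order/create"])  # {'calls': 2, 'avg_ms': 100.0}
print(unexpected)                # ['/debug/dump']
```

Sorting the summary by call count or by total time would rank hot spots and bottlenecks for capacity planning.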
3.4 Summary
EagleEye therefore has two parts. The first uses the TraceId and RPCId to restore the stack of a distributed call chain, which enables fault location. The second analyzes call-chain data and aggregates tags such as entry points, link characteristics, applications, and machine rooms, which enables capacity estimation, performance-bottleneck location, and call-chain mapping.
Reference: "Alibaba EagleEye Technology Decrypted", See more, learn more, remember more, practice more (Blog Garden).