Introduction to Alibaba's EagleEye System
2022-07-27 17:20:00 【Jade faced Dragon】
1. Problems introduced by microservices
Microservice frameworks bring many benefits, such as higher development efficiency and better scalability. But microservices are a double-edged sword: they also introduce problems, for example:
- Fault location is difficult
- Capacity estimation is difficult
- More resources are wasted
- Call links are hard to map out
These are exactly the problems that the EagleEye system is designed to solve.
2. What is EagleEye
EagleEye is a monitoring system built around distributed tracing (link tracing). It works by collecting, storing, and analyzing call-event data in a distributed system, helping developers and operators with fault diagnosis, capacity estimation, performance-bottleneck location, and call-link analysis.
EagleEye is a distributed call-tracing system: it aggregates and analyzes all of the distributed calls triggered by a single front-end request.
TraceId: a globally unique ID that identifies the call chain of one front-end request. When a front-end request reaches the server, EagleEye's instrumentation logic generates the TraceId before the application container performs the actual business processing.
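The article does not specify EagleEye's TraceId format. As a hedged illustration, the sketch below assumes one plausible layout (the server IP in hex, a millisecond timestamp, a rolling sequence number, and the process ID); `new_trace_id` and its default IP are hypothetical.

```python
import itertools
import os
import time

# Process-wide counter so that IDs created in the same millisecond still differ.
_seq = itertools.count()


def new_trace_id(server_ip: str = "10.0.0.1") -> str:
    """Hypothetical TraceId layout: IP (hex) + ms timestamp + 4-digit sequence + pid."""
    ip_hex = "".join(f"{int(octet):02x}" for octet in server_ip.split("."))
    millis = int(time.time() * 1000)
    seq = next(_seq) % 10000
    return f"{ip_hex}{millis}{seq:04d}{os.getpid()}"


print(new_trace_id())  # e.g. "0a000001" followed by timestamp, sequence and pid
```

A layout like this can be generated locally on the request path without asking a central ID service, which is one reason such schemes are common; whether EagleEye does exactly this is not stated in the article.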
3. Implementation principles
3.1 TraceID

Instrumentation points are placed uniformly in a single set of middleware: wherever requests are sent or received in the distributed RPC framework, the distributed messaging system, the cache system, the unified access layer, and the web framework layer. Because all systems share this middleware, the instrumented data is passed seamlessly from one system to the next.
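A minimal sketch of what uniform instrumentation in the middleware could look like: one hook runs when a request is received and another when a request is sent, so business code never touches the trace context. The header name `x-trace-id` and the functions `on_receive` / `on_send` are hypothetical, and `uuid4` stands in for whatever ID scheme the real system uses.

```python
import contextvars
import uuid

# Trace context of the request currently being handled in this thread/task.
current_trace: contextvars.ContextVar = contextvars.ContextVar("current_trace", default=None)


def on_receive(headers: dict) -> None:
    """Server-side hook: adopt the caller's TraceId, or start a new trace at the entry."""
    trace_id = headers.get("x-trace-id") or uuid.uuid4().hex
    current_trace.set(trace_id)


def on_send(headers: dict) -> dict:
    """Client-side hook: copy the current TraceId into the outgoing request headers."""
    trace_id = current_trace.get()
    if trace_id is not None:
        headers["x-trace-id"] = trace_id
    return headers


on_receive({})           # entry request: no incoming TraceId, so one is created
print(on_send({}))       # downstream call carries the same TraceId
```

Using `contextvars` keeps the trace ID isolated per thread or asyncio task, which is the usual way to carry request-scoped state in Python middleware.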
When a user request comes in, EagleEye generates a unique TraceID in the middleware of the first server that receives it. This TraceID is passed to downstream systems with every distributed call, and every call event it passes through is recorded in an RPC log file. EagleEye then runs a centralized processing cluster that incrementally collects the logs from all machines. The processing itself is fairly simple: a light cleaning pass followed by building an inverted index keyed by TraceID. Whenever the system reports an error, the TraceID is printed in the exception log; searching with it as the keyword shows everything that happened across the system during that call. For example, from the TraceID it is easy to see that a call got stuck on the database call from B to C and timed out. This makes it easy to trace problems along a distributed call link. However, the TraceId alone only gives us the call events in chronological order; what we actually want is the nested call stack.
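The collect, clean, and index steps can be sketched as follows, assuming a hypothetical pipe-separated log format `traceId|caller|callee|elapsedMs|result`; the real EagleEye log layout is not described in the article.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class RpcEvent:
    trace_id: str
    caller: str
    callee: str
    elapsed_ms: int
    result: str  # e.g. "OK" or "TIMEOUT"


def parse_line(line: str) -> Optional[RpcEvent]:
    """Cleaning step: parse one log line, dropping malformed ones."""
    parts = line.strip().split("|")
    if len(parts) != 5 or not parts[3].isdigit():
        return None
    trace_id, caller, callee, elapsed, result = parts
    return RpcEvent(trace_id, caller, callee, int(elapsed), result)


def build_index(lines):
    """Inverted index: TraceId -> events of that trace, in collection order."""
    index = defaultdict(list)
    for line in lines:
        event = parse_line(line)
        if event is not None:
            index[event.trace_id].append(event)
    return index


def failing_calls(index, trace_id):
    """All calls in one trace whose result is not OK."""
    return [e for e in index.get(trace_id, []) if e.result != "OK"]


logs = [
    "t1|A|B|120|OK",
    "t1|B|C-db|3000|TIMEOUT",
    "t2|A|B|15|OK",
    "garbage line",
]
index = build_index(logs)
print(failing_calls(index, "t1"))  # the B -> C-db call that timed out
```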
3.2 TraceID + RPCID

To restore the call stack we need one more thing, called the RPCId (OpenTracing has a similar concept, called the SpanID). The RPCId is a multi-level sequence. At the first hop its initial value is 0. Each call one level deeper appends a new level: 0 becomes 0.1, and then 0.1.1. Each subsequent call at the same depth increments the last level: after A has called B (0.1), its next call, to D, becomes 0.2. The RPCId is printed into the same RPC log as the call, and is collected into the central processing cluster together with the call event itself and the TraceId.
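The numbering rule can be sketched as a small per-call generator: the service handling call `0.1` hands out `0.1.1`, `0.1.2`, and so on to its downstream calls. The class name is hypothetical; in practice the current RPCId would travel in the request the same way the TraceId does.

```python
class RpcIdGenerator:
    """Hands out RPCIds for the downstream calls made while serving one call."""

    def __init__(self, rpc_id: str = "0"):
        self.rpc_id = rpc_id  # RPCId of the call being served ("0" at the entry)
        self._children = 0    # number of downstream calls made so far

    def next_child(self) -> str:
        """One level deeper than the current call; the last level counts siblings."""
        self._children += 1
        return f"{self.rpc_id}.{self._children}"


a = RpcIdGenerator()                      # A is the entry, RPCId 0
to_b = a.next_child()                     # A -> B
to_c = RpcIdGenerator(to_b).next_child()  # B -> C
to_d = a.next_child()                     # A -> D, same depth as B
print(to_b, to_c, to_d)                   # 0.1 0.1.1 0.2
```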

After collection, EagleEye performs a depth-first traversal ordered by RPCId and obtains a call stack like the one in the figure above, which is the real call stack for placing an order in Taobao's trading system. This call passes through many systems, yet from EagleEye's perspective it looks as if everything happened locally: it is easy to see whether a call has a problem, where the symptom appears, and where the root cause ultimately lies. Besides the return code of each failed call, the time taken by each call is shown on the right, so you can also see how slow each call is. This shows how EagleEye tackles the first of the four microservice problems, difficult fault location: through the inverted index, users can retrieve the full picture of every call.
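Because a parent's RPCId is a prefix of its children's, sorting RPCIds numerically level by level yields a depth-first (pre-order) traversal, and the number of dots gives the nesting depth. A sketch of that reconstruction, assuming each collected record is a hypothetical `(rpc_id, service, elapsed_ms)` tuple:

```python
def rpc_key(rpc_id: str) -> tuple:
    """'0.1.10' -> (0, 1, 10), so that 0.1.10 sorts after 0.1.2."""
    return tuple(int(level) for level in rpc_id.split("."))


def render_call_stack(spans):
    """Indent each call by its depth, in depth-first order."""
    lines = []
    for rpc_id, service, elapsed_ms in sorted(spans, key=lambda s: rpc_key(s[0])):
        depth = rpc_id.count(".")
        lines.append(f"{'  ' * depth}{rpc_id} {service} {elapsed_ms}ms")
    return lines


spans = [("0.2", "D", 30), ("0", "A", 200), ("0.1.1", "C", 150), ("0.1", "B", 160)]
print("\n".join(render_call_stack(spans)))
```

Sorting the IDs as plain strings would put `0.1.10` before `0.1.2`, which is why the key converts each level to an integer.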
3.3 Counting calls by a specified Tag
If we aggregate trillions of call-chain records, can we extract more valuable information? Besides its unique identifiers, the TraceID and RPCID, every call also carries label information (Tags). What is a tag? It is a common attribute that every call has: for example, which systems the call went through, and for each of them, the IP of the machine that was called, its machine room, and the service name. Some tags are propagated along the link, such as the entry URL: once a request has passed through the entry, every event further down the chain is known to have been initiated through that entry. Aggregating these tags gives us call-chain statistics. For example, counting call chains by the machine-room tag yields a trend chart of the number of calls in each machine room.
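A sketch of tag aggregation, assuming each collected call is a dict with a hypothetical `tags` map (the tag names `minute`, `room`, `entry`, and `app` are illustrative, not EagleEye's actual schema). Grouping by a time bucket plus the machine room gives per-room trend data.

```python
from collections import Counter


def count_by(calls, *tag_names):
    """Count calls grouped by the given tags; calls missing any of them are skipped."""
    counts = Counter()
    for call in calls:
        tags = call["tags"]
        if all(name in tags for name in tag_names):
            counts[tuple(tags[name] for name in tag_names)] += 1
    return counts


calls = [
    {"tags": {"minute": "17:20", "room": "room-a", "entry": "/order/create", "app": "trade"}},
    {"tags": {"minute": "17:20", "room": "room-a", "entry": "/order/create", "app": "inventory"}},
    {"tags": {"minute": "17:21", "room": "room-b", "entry": "/item/detail", "app": "item"}},
]
print(count_by(calls, "minute", "room"))  # calls per machine room per minute
```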
What does this buy us? The benefit for capacity estimation is obvious. Suppose we have an entry tag for placing trade orders. After aggregating by this tag, we can see not only how a single call behaves but also, across all the aggregated call-chain data, the total number of calls and the average latency. This reveals hot spots and bottlenecks in the system, and it can also expose illegitimate traffic, that is, callers we did not previously know about.
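The capacity-estimation view can be sketched as per-entry totals and averages. The record fields (`entry`, `elapsed_ms`) and the `known_entries` allow-list used to flag unexpected traffic are assumptions for illustration.

```python
from collections import defaultdict


def entry_stats(calls):
    """Per entry tag: (number of calls, average elapsed time in ms)."""
    totals = defaultdict(lambda: [0, 0])  # entry -> [count, sum of elapsed ms]
    for call in calls:
        stat = totals[call["entry"]]
        stat[0] += 1
        stat[1] += call["elapsed_ms"]
    return {entry: (count, total_ms / count) for entry, (count, total_ms) in totals.items()}


def unexpected_entries(stats, known_entries):
    """Entries that carry traffic but are not on the expected list."""
    return sorted(set(stats) - set(known_entries))


calls = [
    {"entry": "/order/create", "elapsed_ms": 100},
    {"entry": "/order/create", "elapsed_ms": 300},
    {"entry": "/debug/dump", "elapsed_ms": 50},
]
stats = entry_stats(calls)
print(stats)                                         # totals and averages per entry
print(unexpected_entries(stats, ["/order/create"]))  # ['/debug/dump']
```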
3.4 Summary
EagleEye has two parts. The first uses the TraceId and RPCId to restore the stack of a distributed call chain, which enables fault location. The second analyzes call-chain data, aggregating statistics over tags such as entry points, link characteristics, applications, and machine rooms, which enables capacity estimation, performance-bottleneck location, and call-chain mapping.
Reference: "Alibaba EagleEye technology decrypted", on the blog "See more, learn more, remember more, practice more", Blog Garden (cnblogs)