Multi-dimensional Monitoring: The Data Foundation of Intelligent Monitoring
2022-07-03 11:22:00 [Blue Whale Zhiyun]
Preface
Using component monitoring as an example, this article introduces the roadmap for a monitoring product.
The value of an operations and maintenance (O&M) monitoring system is self-evident. It runs through all five O&M functions (release, change, fault handling, experience optimization, and day-to-day requests) and keeps services available across all of them.
Viewed through the characteristics of big data (large volume, multiple dimensions, completeness) [1], building an O&M monitoring system falls into two stages: multi-dimensional monitoring (accumulating data) and intelligent monitoring (using data). Multi-dimensional monitoring makes faults visible and gives complete coverage; intelligent monitoring detects risks in advance and finds the root cause of faults.
Component monitoring is the third layer of the multi-dimensional monitoring system. It mainly monitors the performance indicators of common open-source components and middleware. For Nginx, these include Active Connections (the current number of client connections) and Waiting (the number of connections waiting); for Oracle, they include the SQL hard parse rate and tablespace usage.
By collecting the key performance indicators of components, we can follow component health in real time and find problems early, instead of only checking whether a process or port is alive (a running process or an open port does not mean the service can actually serve requests).
Taking the construction of component monitoring as an example, this article introduces the monitoring product design roadmap through five topics: the composition of multi-dimensional monitoring, the three problems a monitoring product must solve, technology selection for component monitoring, distributing collector configuration from the cloud, and community openness.
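As an illustration of what "collecting a component's performance indicators" can mean in practice, the sketch below parses the plain-text page served by Nginx's stub_status module, which reports Active connections and Waiting among other counters. It is a minimal sketch, not the article's collector: the function name `parse_stub_status` and the output keys are our own choices, and the sample text simply follows the documented stub_status layout.

```python
import re


def parse_stub_status(text: str) -> dict:
    """Turn the plain-text output of Nginx's stub_status page into a metric dict."""
    metrics = {}
    # First line: "Active connections: N"
    active = re.search(r"Active connections:\s*(\d+)", text)
    if active:
        metrics["active_connections"] = int(active.group(1))
    # Three cumulative counters follow the "server accepts handled requests" header.
    counters = re.search(
        r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)", text
    )
    if counters:
        accepts, handled, requests = (int(v) for v in counters.groups())
        metrics.update(accepts=accepts, handled=handled, requests=requests)
    # Last line: per-state connection counts.
    for state in ("Reading", "Writing", "Waiting"):
        match = re.search(rf"{state}:\s*(\d+)", text)
        if match:
            metrics[state.lower()] = int(match.group(1))
    return metrics


SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""

if __name__ == "__main__":
    print(parse_stub_status(SAMPLE))
```

In a real deployment the text would be fetched over HTTP from the stub_status location, and an off-the-shelf exporter would usually do this job, as section 3 discusses.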
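The difference between "the port is alive" and "the component is healthy" can be made concrete. In the sketch below, `port_is_open` performs only a liveness check, while `assess` also compares collected indicators against thresholds. The function names, the three status labels, and any threshold values are illustrative assumptions, not part of the article.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Liveness only: can a TCP handshake with host:port be completed?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def assess(port_open: bool, metrics: dict, thresholds: dict) -> tuple:
    """Combine liveness with indicator thresholds.

    Returns ("down" | "degraded" | "healthy", sorted names of breached metrics).
    A metric missing from `metrics` is not counted as breached.
    """
    if not port_open:
        return "down", []
    breached = sorted(
        name
        for name, limit in thresholds.items()
        if metrics.get(name) is not None and metrics[name] > limit
    )
    return ("degraded" if breached else "healthy"), breached
```

A call such as `assess(port_is_open("127.0.0.1", 80), metrics, {"active_connections": 10000})` can report "degraded" even though the port answers. This is exactly the case a process- or port-only check misses.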
1. The composition of multi-dimensional monitoring
From the perspective of the user access path, monitoring indicators fall into five layers: user, application, component, host, and network. At the user layer, service dial testing simulates user access, so you do not have to wait for user complaints. At the application layer, the call chain traces how the application's calls behave. The other three layers are self-explanatory and are not covered here.
These five layers plus other key indicators (such as logs and business KPI curves) give the monitoring system its multi-dimensional monitoring capability and provide the data support for the second stage, intelligent monitoring.
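One way to picture "multi-dimensional" data is as metric points tagged with their layer plus shared dimensions such as host or service. Those shared dimensions are what later make cross-layer analysis possible. The sketch below assumes a simple in-memory model; the class names, metric names, and dimension keys are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    """The five layers along the user access path."""
    USER = "user"
    APPLICATION = "application"
    COMPONENT = "component"
    HOST = "host"
    NETWORK = "network"


@dataclass
class MetricPoint:
    name: str
    value: float
    layer: Layer
    # Shared dimensions (host, service, ...) let points from different
    # layers be joined later.
    dimensions: dict = field(default_factory=dict)


def by_layer(points):
    """Group points by layer; empty lists reveal layers with no coverage."""
    grouped = {layer: [] for layer in Layer}
    for point in points:
        grouped[point.layer].append(point)
    return grouped


def related(points, key, value):
    """All points, across layers, that share one dimension value."""
    return [p for p in points if p.dimensions.get(key) == value]


# Hypothetical sample data spanning three layers.
points = [
    MetricPoint("dial_test_success_rate", 0.97, Layer.USER, {"service": "shop"}),
    MetricPoint("nginx_active_connections", 291, Layer.COMPONENT,
                {"host": "web-01", "service": "shop"}),
    MetricPoint("cpu_usage", 0.85, Layer.HOST, {"host": "web-01"}),
]
```

Here `related(points, "host", "web-01")` returns both the component-layer and host-layer points for the same machine. `by_layer(points)[Layer.NETWORK]` comes back empty, which flags a layer that is not yet covered.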
2. Three problems a monitoring product must solve
Besides collecting key performance indicators, a monitoring product has to solve three more problems. Only then can fault correlation analysis be carried out later and intelligent O&M scenarios be built.
2.1 Autonomous control of IT systems
Because they lack autonomous control over their IT systems, some large enterprises that are actively embracing the Internet under the "Internet+" wave are either replacing their IT systems or on the way to replacing them.
In response to this situation, some industries have made it clear [2][3] that much more attention must be paid to autonomous control of IT systems.
Therefore, product design should allow users of the monitoring system to take part in developing the whole system or parts of it.
2.2 Refuse to build another chimney
The chimney (silo) architecture is probably the state of most enterprises' IT systems. The systems are not connected to one another, so every newly purchased system is another information island with very little added value.
To make later fault correlation analysis possible and build intelligent O&M scenarios, you can build on a PaaS-based O&M platform [4] and use iPaaS to connect all of the enterprise's internal IT operations systems.
2.3 There are too many components for complete in-house development
The industry uses a huge variety of components, more than 100 in all, spanning databases, storage, HTTP services, message queues, and more. Developing collectors for all of them in-house is clearly unrealistic.
A better approach is to develop collectors in-house for core components and for components with poor industry support, and rely on the industry's years of accumulated work for the rest: reinvent fewer wheels and save society some electricity.
3. Technology selection for component monitoring
Section 2.3 proposed combining in-house development with third-party open-source collectors. Here we take the open-source Prometheus Exporter as the example. The Prometheus Exporter community is very active [5] and supports more than 100 common open-source components. Some large vendors even write Prometheus exporters for their own products: Oracle wrote the WebLogic Exporter and IBM wrote the IBM MQ exporter, while k8s and etcd natively expose metrics that follow the Exporter specification.
With this approach, a single protocol conversion is all it takes to ingest the indicators.
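To make "protocol conversion" concrete, the sketch below reads the Prometheus text exposition format that exporters serve and turns each sample into a record for a monitoring platform. It is a simplified sketch under stated assumptions. The output record layout (`metric`, `dimensions`, `value`, `timestamp_ms`) is a hypothetical internal format, and the metric names in the sample are illustrative. The parser also skips `# HELP`/`# TYPE` metadata and does not unescape label values, so a production converter should use a full parser such as the one in the official Prometheus client libraries.

```python
import re

METRIC_LINE = re.compile(
    r"^([a-zA-Z_:][a-zA-Z0-9_:]*)"   # metric name
    r"(?:\{(.*)\})?"                 # optional label set
    r"\s+(\S+)"                      # sample value
    r"(?:\s+(-?\d+))?\s*$"           # optional timestamp in milliseconds
)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def convert(exposition_text: str, extra_dimensions: dict = None) -> list:
    """Convert exposition-format samples into hypothetical platform records."""
    records = []
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        match = METRIC_LINE.match(line)
        if not match:
            continue  # skip lines this simplified parser does not understand
        name, label_text, value, ts = match.groups()
        dims = dict(LABEL_PAIR.findall(label_text or ""))
        dims.update(extra_dimensions or {})  # e.g. tag the component type
        records.append({
            "metric": name,
            "dimensions": dims,
            "value": float(value),  # float() also accepts NaN, +Inf, -Inf
            "timestamp_ms": int(ts) if ts else None,
        })
    return records


SAMPLE_EXPOSITION = """# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 291
nginx_http_requests_total{host="web-01"} 31070465
"""
```

`convert(SAMPLE_EXPOSITION, {"component": "nginx"})` yields two records, each tagged with the extra `component` dimension so it can join the component layer of the multi-dimensional model.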
4. Experience optimization: distributing collector configuration from the cloud
Once the basic requirements are met, the next step is to optimize the user experience.
Delivering a collector or its configuration to a monitored host usually requires manual deployment or a third-party tool (such as Ansible).
Switching between multiple systems to get one thing done makes for a very poor experience.
One optimization is to use, through iPaaS, the file distribution and command execution capabilities of the control platform layer [4]. Users can then complete the whole configuration process on a single page, which improves efficiency.
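The single-page flow described above boils down to two control-plane calls: push the rendered configuration file, then run a command that reloads the collector. The sketch below assumes a hypothetical `ControlPlane` interface with `push_file` and `run_command` methods. These are not Blue Whale APIs, and the config directory, `key=value` file format, and `systemctl` unit name are likewise illustrative. A recording fake stands in for real hosts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class ControlPlane(Protocol):
    """Hypothetical stand-in for the control platform's file distribution
    and command execution capabilities."""

    def push_file(self, hosts: list[str], path: str, content: str) -> None: ...

    def run_command(self, hosts: list[str], command: str) -> None: ...


@dataclass
class RecordingControlPlane:
    """In-memory fake that records calls instead of touching real hosts."""
    calls: list = field(default_factory=list)

    def push_file(self, hosts, path, content):
        self.calls.append(("push_file", tuple(hosts), path, content))

    def run_command(self, hosts, command):
        self.calls.append(("run_command", tuple(hosts), command))


def deploy_collector_config(plane: ControlPlane, hosts: list[str], component: str,
                            settings: dict, config_dir: str = "/etc/collectors") -> str:
    """Render a key=value config, distribute it, then restart the collector."""
    path = f"{config_dir}/{component}.conf"
    content = "\n".join(f"{k}={v}" for k, v in sorted(settings.items())) + "\n"
    plane.push_file(hosts, path, content)
    # Hypothetical unit name; real collectors may be managed differently.
    plane.run_command(hosts, f"systemctl restart {component}-collector")
    return path
```

Keeping the platform behind an interface means the same page logic works whether the backend is an iPaaS control layer or, during testing, the recording fake.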
5. Community openness
With the basic functions in place and the product experience optimized, the next consideration is product extensibility.
First, let users import their self-developed components with one click. Next, provide a platform where community users can communicate and share freely.
While benefiting from the community's open-source capabilities, the product should also give back to the community.
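One-click import usually means validating a small descriptor before the platform accepts a user's component. The sketch below checks a JSON manifest against an assumed schema (`name`, `version`, `collector`, and a list of `metrics`). The schema is entirely hypothetical and only shows the shape such a check might take.

```python
import json

# Hypothetical manifest schema: field name -> expected Python type after json.loads.
REQUIRED_FIELDS = {"name": str, "version": str, "collector": str, "metrics": list}


def validate_manifest(text: str) -> list:
    """Return a list of problems; an empty list means the manifest can be imported."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["manifest must be a JSON object"]
    problems = []
    for field_name, expected in REQUIRED_FIELDS.items():
        if field_name not in data:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(data[field_name], expected):
            problems.append(f"{field_name} must be a {expected.__name__}")
    metrics = data.get("metrics")
    if isinstance(metrics, list):
        for i, metric in enumerate(metrics):
            # Every declared metric needs at least a non-empty name.
            if not isinstance(metric, dict) or not metric.get("name"):
                problems.append(f"metrics[{i}] must be an object with a name")
    return problems
```

Returning every problem at once, instead of stopping at the first, lets the import page show users everything they need to fix in a single pass.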
6. Conclusion
Compared with intelligent monitoring, multi-dimensional monitoring, which belongs to basic monitoring, is not very eye-catching. Yet it is the data foundation of intelligent monitoring: without the data it provides, intelligent monitoring scenarios such as failure prediction and fault root cause analysis cannot be realized.
When traditional enterprises or Internet enterprises embrace the changes brought by the Internet, they need to think calmly and follow the roadmap step by step.
7. References
[1] Wu Jun. The Age of Intelligence: Big Data and the Intelligent Revolution Redefine the Future [M]. Beijing: CITIC Publishing Group, 2016-08.
[2] People's Bank of China. China Financial Industry Information Technology "13th Five-Year" Development Plan [EB/OL]. 2017-06.
[3] China Banking Regulatory Commission. Regulatory Guidance on the China Banking Information Technology "13th Five-Year" Development Plan (Draft for Comment) [EB/OL]. 2016-07-15.
[4] China Communications Standards Association. Reference Framework and Technical Requirements for Cloud Computing O&M Platforms [EB/OL]. 2017-11-16.
[5] Prometheus. Exporters and Integrations [EB/OL].
Blue Whale Zhiyun
This article was edited and published by Tencent Blue Whale Zhiyun. Tencent Blue Whale Zhiyun ("Blue Whale" for short) is a PaaS-based technical solution that aims to build an industry-leading one-stop automated O&M platform. The Community Edition and Enterprise Edition are both available; you are welcome to try them.
- Official website: https://bk.tencent.com/
- Download: https://bk.tencent.com/download/
- Community: https://bk.tencent.com/s-mart/community/question