New Core, New Speed: The Next-Generation Standard O&M Engine
2022-06-24 05:49:00 【Tencent blue whale assistant】
1. Overview
Standard O&M V3 is a system for orchestrating and executing task flows through a visual graphical interface. It is a lightweight scheduling SaaS product in the Tencent BlueKing (Blue Whale) product family. Built on the API Gateway service of the BlueKing PaaS platform, it connects to the APIs of the various systems inside an enterprise and folds work that used to require switching between multiple systems into a single process, enabling one-click automated scheduling.
Processes arranged by users are scheduled and advanced by bamboo-pipeline, the process engine underneath Standard O&M. After years of use across many scenarios, some design problems in the engine itself have surfaced. To serve users better and let Standard O&M go further, we decided to refactor and upgrade the underlying process engine.
2. Problems with the old engine
The old bamboo-pipeline engine has the following problems in its design and implementation:
- Serialization granularity: process execution data is serialized at too coarse a granularity, which causes extra I/O overhead
- Serialization format: process execution data is serialized with pickle, so the stored data is unreadable and problems are hard to troubleshoot
- Process data storage: process structure data and node data are not stored separately, so growing node output bloats the execution data and hurts engine performance
- Engine architecture: the engine's control plane and execution plane are not layered, so the core scheduling logic is tightly coupled to specific frameworks
2.1. Serialization granularity
After bamboo-pipeline builds the engine's execution data, it serializes the entire process object and stores it in the database.
This granularity is too coarse. For small processes (few nodes) it does not cause much trouble, but for large processes (hundreds or thousands of nodes) the problems become visible.
The engine advances a process as follows:
- Read the current node's information
- Advance and execute the node
- If the node needs polling or has to wait for a callback, yield the engine's scheduling right, so that a scheduling switch occurs
Because the serialization unit is the whole process, the first step has to read all of the process data. Even if the process moves forward by only one node, the data of the entire process must be read, which makes every scheduling switch expensive.
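To make the cost concrete, here is a minimal sketch of what whole-process serialization implies. It is not bamboo-pipeline's actual code: the `make_process` helper and the dictionary layout are hypothetical, chosen only to show that the bytes read to advance a single node grow with the size of the whole process.

```python
import pickle


def make_process(node_count):
    """Build a hypothetical process: a dict of node id -> node data."""
    return {
        f"node_{i}": {"type": "task", "next": f"node_{i + 1}"}
        for i in range(node_count)
    }


def bytes_read_to_advance_one_node(node_count):
    """With whole-process serialization, every step loads the full blob."""
    blob = pickle.dumps(make_process(node_count))  # what would sit in the database
    process = pickle.loads(blob)                   # step 1: read everything
    _current = process["node_0"]                   # only this node is actually needed
    return len(blob)


small = bytes_read_to_advance_one_node(10)
large = bytes_read_to_advance_one_node(1000)
print(small, large)  # the second number is far larger than the first
```

The work done per step is the same in both cases (one node), yet the amount of data read scales with the number of nodes in the process.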
2.2. Serialization mode
bamboo-pipeline uses pickle to serialize process execution data. pickle is Python's built-in binary serialization format, and its protocol changes as Python is upgraded. When bamboo-pipeline moved from Python 2 to Python 3, it had to handle deserialization failures on old data caused by the pickle protocol upgrade and by changes to Python's built-in objects.
In addition, pickled data is not human-readable, which greatly increases the cost of troubleshooting.
2.3. Process data storage method
Nodes generate execution data while a process runs, and this data must also be persisted. bamboo-pipeline stores the execution data together with the process object in the database, so when node output grows, the process execution data bloats and the cost of each scheduling switch rises.
2.4. Engine architecture
The control plane and execution plane of the bamboo-pipeline engine are not layered. As a result, the whole process engine SDK is tightly coupled to specific frameworks (Django, Celery), which hinders later engine upgrades and new feature development and also raises the cost of adoption for users.
3. The design of the new engine
To solve these problems, we designed and implemented a new engine, bamboo-engine, with the following goals:
- Solve the problems found in bamboo-pipeline
- Improve the robustness and fault tolerance of the engine
- Improve the observability of the engine
- Allow both engines to coexist and run side by side after the upgrade, which makes a gray (gradual) switchover possible
For each problem described in the previous section, the new engine's solution is as follows.
3.1. Serialization granularity
bamboo-engine serializes at a finer granularity: the data of each node in a process is split out and stored separately. This lets the engine read data in the smallest unit it needs at runtime (a single node), which cuts unnecessary overhead during scheduling switches and improves scheduling efficiency.
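The idea can be sketched as a store that keeps one record per node. This is an illustration only: `NodeStore`, `advance`, and the record fields are hypothetical names, not bamboo-engine's actual API or schema.

```python
import json


class NodeStore:
    """Hypothetical key-value store holding one JSON record per node."""

    def __init__(self):
        self._rows = {}
        self.bytes_read = 0  # counts how much data the engine had to load

    def save(self, node_id, node):
        self._rows[node_id] = json.dumps(node)

    def load(self, node_id):
        raw = self._rows[node_id]
        self.bytes_read += len(raw)
        return json.loads(raw)


def advance(store, node_id):
    """Read only the current node and return the id of the next one."""
    node = store.load(node_id)
    return node["next"]


store = NodeStore()
for i in range(1000):
    store.save(f"node_{i}", {"type": "task", "next": f"node_{i + 1}"})

next_id = advance(store, "node_0")
print(next_id, store.bytes_read)
```

Advancing one node touches a single small record, no matter how many nodes the process contains.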
3.2. Serialization mode
bamboo-engine serializes data as JSON, removing the dependence on pickle and lowering the cost of future engine upgrades and of troubleshooting.
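The readability difference is easy to see with the standard library. The `node_state` record below is a made-up example, not the engine's real data layout.

```python
import json
import pickle

# Hypothetical node state record, for illustration only.
node_state = {"node_id": "n1", "state": "FINISHED", "retry": 0}

as_pickle = pickle.dumps(node_state)
as_json = json.dumps(node_state, sort_keys=True)

print(as_pickle[:20])  # opaque binary bytes, tied to a pickle protocol version
print(as_json)         # {"node_id": "n1", "retry": 0, "state": "FINISHED"}

# JSON round-trips this record without depending on Python internals.
assert json.loads(as_json) == node_state
```

A JSON row can be inspected directly in the database, whereas a pickled row has to be loaded by a compatible Python interpreter before it can be examined.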
3.3. Process data storage method
Besides splitting each node's data out for storage, bamboo-engine also stores the data produced by node execution separately. This keeps a process's static data and dynamic data apart, so even if a node outputs a large amount of execution data, the execution efficiency of the rest of the process is not affected.
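A minimal sketch of this separation, assuming two independent stores. The names `node_definitions`, `execution_data`, `save_outputs`, and `load_definition` are hypothetical and only illustrate the static/dynamic split.

```python
import json

# Hypothetical stores: static node definitions vs. dynamic execution data.
node_definitions = {}  # node_id -> JSON of the node's structure (static)
execution_data = {}    # node_id -> JSON of the node's outputs (dynamic)


def save_outputs(node_id, outputs):
    """Persist a node's outputs without touching any node definition."""
    execution_data[node_id] = json.dumps(outputs)


def load_definition(node_id):
    """What the scheduler reads when it moves on to a node."""
    return json.loads(node_definitions[node_id])


node_definitions["n1"] = json.dumps({"type": "task", "next": "n2"})
node_definitions["n2"] = json.dumps({"type": "task", "next": None})

# n1 produces a very large output.
save_outputs("n1", {"log": "x" * 1_000_000})

# Scheduling n2 still reads only n2's small definition record.
scheduling_read = len(node_definitions["n2"])
print(scheduling_read, len(execution_data["n1"]))
```

The large output lives in its own record, so the data read to schedule later nodes stays small.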
3.4. Engine architecture
bamboo-engine consists of two parts: the engine and the runtime interface.
The engine module implements the core scheduling logic of a process: the logic for advancing the process, the handling logic for each node type, the process scheduling and switching logic, and so on.
An engine runtime, which implements the runtime interface, provides the engine with runtime data storage, process management, and task dispatch. The relationship between the two is shown in the figure below:
The benefit of abstracting the runtime is that if the default runtime does not meet a project's needs in some respect, a new runtime can be implemented against the runtime interface and plugged straight into the engine.
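The layering can be sketched with an abstract base class. Everything here is an assumption made for illustration: `RuntimeInterface`, `InMemoryRuntime`, this toy `Engine`, and their methods are hypothetical and do not reproduce bamboo-engine's real interface.

```python
from abc import ABC, abstractmethod
from collections import deque


class RuntimeInterface(ABC):
    """Hypothetical runtime contract: data storage and task dispatch."""

    @abstractmethod
    def get_node(self, node_id): ...

    @abstractmethod
    def set_state(self, node_id, state): ...

    @abstractmethod
    def dispatch(self, node_id): ...

    @abstractmethod
    def next_task(self): ...


class InMemoryRuntime(RuntimeInterface):
    """A runtime with no framework dependency, e.g. for tests."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.states = {}
        self.queue = deque()

    def get_node(self, node_id):
        return self.nodes[node_id]

    def set_state(self, node_id, state):
        self.states[node_id] = state

    def dispatch(self, node_id):
        self.queue.append(node_id)

    def next_task(self):
        return self.queue.popleft() if self.queue else None


class Engine:
    """Core scheduling logic; it depends only on the interface."""

    def __init__(self, runtime):
        self.runtime = runtime

    def run(self, start_id):
        self.runtime.dispatch(start_id)
        node_id = self.runtime.next_task()
        while node_id is not None:
            node = self.runtime.get_node(node_id)
            self.runtime.set_state(node_id, "FINISHED")
            if node["next"] is not None:
                self.runtime.dispatch(node["next"])
            node_id = self.runtime.next_task()


runtime = InMemoryRuntime({"a": {"next": "b"}, "b": {"next": None}})
Engine(runtime).run("a")
print(runtime.states)
```

Because `Engine` only calls methods declared on the interface, a runtime backed by a database and a task queue could replace `InMemoryRuntime` without any change to the scheduling logic.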
In addition, to improve the observability of the system, bamboo-engine records core metrics and provides a collection entry point so that it can be connected to monitoring systems such as BlueKing monitoring and Prometheus.
4. Comparison between old and new engines
After Standard O&M was upgraded to the new engine, we ran a comparative test, described below.
The test environment was:
- Machine: MacBook Pro (16-inch, 2019)
- Processor: 2.6 GHz 6-core Intel Core i7
- Memory: 32 GB 2667 MHz DDR4
- OS: macOS Big Sur 11.2.1
- Broker: RabbitMQ 3.8.2
- Engine workers: gevent mode, concurrency 100
Test procedure: on each of the old and new engines, create and start 100 test processes at the same time, then measure the following metrics:
- Average execution time of all nodes in a process (equivalent to process execution time minus scheduling-switch time)
- Average process execution time
- Total execution time of all processes (completion time of the last process to finish minus start time of the first process)
The comparison data are as follows:
| Metric | Old engine | New engine |
|---|---|---|
| Average execution time of all nodes in a process | 298.76s | 166.37s |
| Average process execution time | 339.1s | 172.3s |
| Total execution time of all processes | 355s | 188s |
| Engine internal scheduling time | 339.1 - 298.76 = 40.34s | 172.3 - 166.37 = 5.93s |
As the data show, the throughput of the new engine is roughly twice that of the old engine (188s versus 355s for the same 100 processes), and the engine's internal scheduling time is also significantly lower. The upgrade was effective and worthwhile.
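The derived figures can be rechecked from the measured values in the table. The snippet below is only a verification aid; it uses `Decimal` to avoid floating-point rounding noise, and the dictionary keys are arbitrary labels.

```python
from decimal import Decimal

# Measured values from the table, in seconds.
results = {
    "old": {"node_avg": Decimal("298.76"), "process_avg": Decimal("339.1"), "total": Decimal("355")},
    "new": {"node_avg": Decimal("166.37"), "process_avg": Decimal("172.3"), "total": Decimal("188")},
}

# Internal scheduling time = average process time - average node time.
overhead = {name: r["process_avg"] - r["node_avg"] for name, r in results.items()}

# Throughput ratio for the same 100 processes.
speedup = results["old"]["total"] / results["new"]["total"]

print(overhead["old"], overhead["new"])  # 40.34 5.93
print(round(speedup, 2))                 # 1.89
```

The internal scheduling time works out to 40.34s for the old engine and 5.93s for the new one, and the total-time ratio is about 1.89.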
Source code of the new engine: https://github.com/Tencent/bk-sops/tree/sdk/sdk/bamboo-engine
Download and try it
To try the new subprocess-call feature, go to the BlueKing website (https://bk.tencent.com/download/), download the Community Edition 6.0 base package, and install the Standard O&M SaaS.
Related reading
Play with task arrangement - Flexible application layer process engine
Subprocess call - Process arrangement under complex operation and maintenance scenarios