Resource Cost Optimization Practice of the R&D Team
2022-07-03 13:28:00 【Haibao 7】
Background
Engineers mostly face technical challenges and focus on technical goals, while managers of R&D teams treat project delivery and business requirements as their core objectives. In real projects, the cost of the resources an R&D team needs (physical machines, memory, disks, network bandwidth, and so on) is easy to overlook, or is considered only very late.
In general, the most direct and effective way to meet more demanding technical targets such as concurrency and complexity, or to absorb peak business load, is to invest more resources. Taken as a whole, however, if resource cost is never optimized, the result is the "diminishing marginal utility" shown in the figure below: the gain in technical capability does not keep pace with the growth in resources, and resources may even expand faster than capability improves, which sharply reduces the ROI of the technical project itself.
In the author's more than ten years of work experience, resource cost optimization is better done early than late. In the early stage of a business, many managers may not notice the pressure that resource cost inflation creates. Once the business reaches a certain scale and the machine bill arrives, they exclaim "Why are machines so expensive?" and want to cut costs immediately, but the best moment may already have passed, because technical change is itself a relatively long-term process.
So if you are reading this article and already feel the pressure of rising costs, or are working on cost control, congratulations: this is a happy problem to have. Your business volume has probably reached a certain scale, and your management awareness may already be ahead of business growth (handshake).
This article shares the resource cost optimization practice of the Meituan catering R&D team.
Practice
At a company the size of Meituan, resources are provided by multiple teams. Besides the resources the catering R&D team uses itself, several sibling teams provide resource support or co-build systems, such as SRE, the big data team, the risk control team, and the advertising team. After receiving the cost bill each month, we peel it apart layer by layer, break down every item, and formulate a corresponding solution. The specific process is shown in the figure below.
1. Determine the methodology
As the saying goes, "Preparation brings success; lack of preparation brings failure." Before starting any piece of work, it is essential to evaluate all the elements of its full life cycle, design an overall working framework, and adopt a sound methodology. Cost optimization follows the PDCA (Plan, Do, Check, Act) methodology, which is mature in the industry; each stage contains its own secondary iterations and small loops. The methodology is as follows:
In the Plan (or Standard) stage: build awareness -> set goals -> analyze the current state -> determine the evaluation indicators.
In the Do (execution) stage: decompose into atomic items -> determine the plan -> put it into practice -> optimize the atomic indicators.
In the Check stage: check the specified actions -> evaluate the results of the actions -> locate systemic problems -> correct the standard actions.
In the Act (optimization) stage: regular review -> produce a report -> iterate on understanding -> upgrade the methodology -> set the next-stage goals.
2. Planning stage (Plan & Standard)
The core goal of this stage is to measure your work with two or three indicators. Much work ultimately fails because the people involved have no specific, measurable indicators for it, so they can only judge the results qualitatively (for example "excellent", "good", "not good", "poor") and never quantitatively.
For R&D engineers, a result that cannot be quantified is not scientific. How to determine the indicators, and which indicators should become the work objectives, is a discipline in its own right (to be discussed in a separate article if there is a chance). The suggested steps at this stage are:
Build awareness. This is the team leader's primary responsibility: let team members know how much they spend on resources and why cost control is genuinely meaningful and valuable, so that the team reaches a shared understanding. I have seen teams that advocate cost control but, when it comes to concrete action, either go through the motions or do not know where to start; the effort stays in words and produces no practical result.
Set goals. This step is fairly macro and can be called the "qualitative" stage. What must become clear is which problem the subsequent cost control actions are meant to solve. For example, some teams have a high overall cost; some teams have a low total cost and should actually spend more; some teams spend heavily on non-core services. These goals need to be discussed by the team until everyone agrees, and they can be revised continually in later iterations. Just as "customers never know what they need", many people do not know their own goals; the SMART principles can be used to clarify them.
Analyze the current state. List the relevant cost data to support your judgment as much as possible. This stage should make clear where the team stands on cost optimization and which areas are likely to allow further optimization.
Determine the evaluation indicators. Different professional tracks, and even different people within the same track, evaluate cost with different indicators. This stage should converge on a final set and express the team's future cost optimization in clear numbers. In the catering R&D team, we settled on two core optimization indicators: total cost and cost per order. Any subsequent effort that has no relationship, or only a weak one, with these two indicators can be ignored.
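The two core indicators can be tracked side by side with very little code. The following is a minimal sketch, assuming a hypothetical `MonthlyBill` record; the field names and the figures are invented for illustration and are not the team's real data.

```python
from dataclasses import dataclass


@dataclass
class MonthlyBill:
    month: str         # e.g. "2019-01" (illustrative)
    total_cost: float  # total resource cost for the month, in any currency unit
    orders: int        # number of orders served in the month


def cost_per_order(bill: MonthlyBill) -> float:
    """Total cost divided by order count; 0.0 when there were no orders."""
    return bill.total_cost / bill.orders if bill.orders else 0.0


# Invented numbers: total cost rises while cost per order falls.
bills = [
    MonthlyBill("2019-01", total_cost=1_000_000.0, orders=2_000_000),
    MonthlyBill("2019-02", total_cost=1_100_000.0, orders=2_500_000),
]
for b in bills:
    print(b.month, b.total_cost, round(cost_per_order(b), 3))
```

Watching both numbers matters because they can move in opposite directions, as in the invented data above: a growing business may spend more in total while each order becomes cheaper to serve.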
The main lesson from this stage is "easier said than done". It is easy to come up with one or two directions and goals off the top of your head, but when you finally use data to describe the current state, how do you judge whether you are "excellent", "good", or "failing"? Which team is the benchmark, and why that team? All of this requires careful consideration of headcount, business stage, business volume, and industry characteristics, and it has to be thought through clearly. The workload is no smaller than that of the execution stage itself.
3. Execution phase (Do)
3.1 Establish a thinking process
The execution phase proceeds as: decompose into atomic items -> determine the plan -> put it into practice -> optimize the atomic indicators. There are two core elements: 1) decompose the work related to the core indicators down to the next level; 2) at that level, find a specific person to carry it out, someone able to break the indicators they own down into further detail, much like a tree structure. Decomposing layer by layer in this way, every leaf node has a corresponding owner. This "whole-to-parts" structure is also described in detail in the classic book "The Pyramid Principle".
Decompose into atomic items. This stage should establish a fully detailed hierarchy, using the MECE principle ("mutually exclusive, collectively exhaustive") from the Pyramid Principle to break the work down to the smallest controllable granularity. The dimension along which to split may differ between teams or businesses: some teams split by sub-team, some by business line, some by process. Viewed across teams, the cost control indicator can simply be decomposed into two second-level indicators: "self-use cost" and "allocated cost". "Self-use cost" is the cost a technical team incurs by requesting or using resources to meet its own business needs. "Allocated cost" is the share of another technical team's cost that a team bears because, under some calculation logic, it indirectly uses that team's resources; common examples include the advertising, placement, risk control, and security systems developed by other teams in the company. If the cost can be split down to specific systems, each system can be further broken into finer-grained components, each node being a small "whole-to-parts" structure. Continuing downward with this logic yields "the finest-grained self-use cost that can be acted on" and "the finest-grained allocated cost that can be acted on".
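A decomposition like the one just described maps naturally onto a tree whose leaves are the atomic items. The sketch below is only an illustration under assumptions: `CostNode`, the owner names, and the cost figures are all hypothetical. It rolls leaf costs up to the top indicator and lists atomic items that still lack an owner.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CostNode:
    name: str
    owner: str = ""    # the single person responsible for this node
    cost: float = 0.0  # only meaningful on leaves (atomic items)
    children: List["CostNode"] = field(default_factory=list)

    def total(self) -> float:
        """A leaf returns its own cost; an inner node sums its children."""
        if not self.children:
            return self.cost
        return sum(child.total() for child in self.children)

    def leaves_without_owner(self) -> List[str]:
        """Names of atomic items that nobody is responsible for yet."""
        if not self.children:
            return [] if self.owner else [self.name]
        missing: List[str] = []
        for child in self.children:
            missing.extend(child.leaves_without_owner())
        return missing


# Hypothetical tree: second level is self-use vs. allocated cost.
tree = CostNode("total cost", children=[
    CostNode("self-use cost", children=[
        CostNode("business hosts", owner="alice", cost=60.0),
        CostNode("big data", owner="bob", cost=25.0),
    ]),
    CostNode("allocated cost", children=[
        CostNode("risk control", owner="carol", cost=10.0),
        CostNode("advertising", cost=5.0),  # no owner assigned yet
    ]),
])
print(tree.total())
print(tree.leaves_without_owner())
```

Because every leaf belongs to exactly one parent, the roll-up counts each cost once, which is the "no overlap, no omission" property MECE asks for; the owner check enforces that each leaf has someone accountable.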
Then, following the method described earlier, determine the evaluation indicator for each atomic item; a project that cannot be quantified is just empty talk. This produces a more complete pyramid structure, as shown in the figure below:
Determine the plan. Following the pyramid structure above, each atomic indicator needs to be evaluated and analyzed by the relevant specialists to decide how to optimize it. For example, the cost of system hosts mainly concerns resources such as virtual machines and storage, and the measurement indicators can be set as "resource utilization" and "cost per order". To improve "resource utilization", consider whether currently idle machines can be taken offline and whether online services can be optimized or merged. To improve "cost per order", analyze the system architecture and ask whether the services involved in core process handling can be made more efficient or abstracted into a service platform; this frees up resources tied to "chimney-style" (siloed) construction and makes core processing capacity more centralized and efficient. Combining all such measures gives the final solution.
Put it into practice. Once there is a plan, it must have a single owner (the "primary R", the responsible person). In our experience it is best to have exactly one; otherwise responsibility, authority, and benefit become poorly delineated. This step is also a good opportunity to develop the team's technical and architectural skills.
Optimize the indicators. Different plans have different implementation cycles and costs. After each primary R digs into their area and reports back on the current resource indicators, it may turn out that in theory every indicator needs optimizing, or that some indicators are already very good. At that point you need to identify which resource indicators offer relatively high "leverage". We recommend applying the 80/20 principle: for some indicators, investing 20% of the resources and effort solves 80% of the core problem, so a suitable amount of work yields a higher return. Resources that have no solution, or whose solution is too hard to implement, are best dropped or set aside.
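One way to make "leverage" concrete is to rank candidate optimizations by estimated saving per unit of effort and stop once most of the saving is covered. This is a rough sketch under assumptions: the candidate list, the saving and effort estimates, and the helper names `rank_by_leverage` and `pick_until` are all hypothetical.

```python
def rank_by_leverage(items):
    """Sort (name, estimated_saving, estimated_effort) by saving per unit of effort.

    Effort is assumed to be a positive number.
    """
    return sorted(items, key=lambda it: it[1] / it[2], reverse=True)


def pick_until(items, target_share):
    """Take the highest-leverage items until target_share of the total saving is covered."""
    total = sum(saving for _, saving, _ in items)
    picked, covered = [], 0.0
    for name, saving, _effort in rank_by_leverage(items):
        if covered >= target_share * total:
            break
        picked.append(name)
        covered += saving
    return picked


# Invented estimates, in arbitrary units.
candidates = [
    ("take idle hosts offline", 50.0, 5.0),
    ("merge low-traffic services", 30.0, 10.0),
    ("rebuild middle-layer models", 15.0, 30.0),
    ("tune a rarely used job", 5.0, 20.0),
]
print(pick_until(candidates, 0.8))
```

In this invented data the first two items deliver 80% of the estimated saving for well under a quarter of the total effort, so the remaining two are the natural candidates to set aside.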
3.2 Practical analysis framework
In practice, the process above can again be expressed as a pyramid structure, as shown in the figure below:
With this structure in place, each specialty optimizes its own indicators; if the most detailed indicators are optimized successfully, the top-level indicators are bound to fall. Because each of these indicators has its own underlying business, technical, and even financial logic, some concepts that need attention are explained again here.
In many companies, the machine cost of each technical team is called "website operation and maintenance cost" in finance (Website? That sounds like a concept from the PC era, doesn't it?). At the top level it consists of two categories: "own cost" (what you use yourself) and "allocated cost" (what others provide for your use). Drilling further into own cost, it divides into transaction-related resource cost (related to business processes) and analysis-related big data cost (related to analysis, algorithms, and decision-making).
3.2.1 Business host cost
The resource cost of most business system teams falls into this part, for example the merchant R&D team, order system R&D team, front-end R&D team, supply chain R&D team, marketing system R&D team, and CRM R&D team. The typical carriers of these resources are physical machines, virtual machines, containers, and the storage resources attached to them (databases, caches, key-value stores, and so on), plus the bandwidth, cloud resources, CDN, and similar resources consumed by exchanging and storing data among them.
From a cost control perspective, the first and most basic step for these resources is to watch the resource utilization of the hosts consumed by each service group (OWT); if many hosts have low utilization, take them offline promptly. At the same time, in terms of the technical solution itself, there is always a relative ratio, or weight, between the business capacity a service carries and the resources it consumes. Asking whether some heavily used services can be restructured, decoupled, or transformed architecturally is also very helpful for saving resources. Over the past year, the catering technology department reviewed its core and non-core services along these lines and partially refactored the services that could be optimized, which reduced resource cost compared with the earlier period. The changes in the two main indicators of business host cost are as follows (note: the cost later rose slightly because other businesses were added).
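The utilization check described above can be sketched as a small aggregation over per-host samples. Everything specific here is an assumption made for illustration: the sample rows, the use of average CPU as the utilization measure, and the 5% idle threshold. A real setup would read these values from the team's own monitoring system.

```python
from collections import defaultdict
from statistics import mean

# (service_group, host, average CPU utilization over the period, 0..1); invented data
samples = [
    ("order", "order-01", 0.55),
    ("order", "order-02", 0.04),
    ("merchant", "merchant-01", 0.03),
    ("merchant", "merchant-02", 0.02),
]

IDLE_THRESHOLD = 0.05  # assumed cut-off; tune per team


def group_utilization(rows):
    """Average utilization per service group."""
    by_group = defaultdict(list)
    for group, _host, util in rows:
        by_group[group].append(util)
    return {g: mean(v) for g, v in by_group.items()}


def idle_hosts(rows, threshold=IDLE_THRESHOLD):
    """Hosts whose utilization is below the threshold: candidates to take offline."""
    return [host for _g, host, util in rows if util < threshold]


print(group_utilization(samples))
print(idle_hosts(samples))
```

The group-level average gives the single indicator to report upward, while the host-level list is what an engineer actually acts on.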
3.2.2 Big data cost
The use of data in the Internet industry is relatively mature. The mainstream data processing architecture in the industry is Yarn 2.0 or a similar framework, and the core resource consumption consists of two parts: computing resources based on Containers (vcores + memory) and storage resources based on HDFS:
The first part is storage consumption. The common model in the industry is based on physical HDFS or a similar self-developed storage engine. This part mainly refers to the resources offline ETL uses to store data in partitions (usually by timestamp). A core idea of a data warehouse is to keep "all" the data and then pre-aggregate and process it according to dimensional modeling theory. However, because people understand model construction differently, many data developers take data redundancy on top of the base data for granted. This makes storage grow rapidly, which is a hard problem every data team faces in its management.
Here the catering R&D team mainly uses two approaches:
1. Grade the data models by heat, dividing data into cold, warm, and hot. Only the data that needs to be retained is kept on HDD and SSD in the production environment; unimportant cold data is stored on other media in a heterogeneous way.
2. For the data model itself, rethink the value of the data and how it is stored, and rebuild the data models in the middle layer (aggregation layer). This is a fundamental skill that many data teams neglect.
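The first approach, grading data by heat, needs some rule for deciding which tier a table belongs to. The article does not say which rule the team used; the sketch below assumes a simple one based on days since last access, with hypothetical thresholds and table names.

```python
from datetime import date


def temperature(last_access: date, today: date,
                hot_days: int = 7, warm_days: int = 90) -> str:
    """Classify data as hot, warm, or cold by days since it was last read.

    The 7-day and 90-day thresholds are assumptions for illustration.
    """
    age = (today - last_access).days
    if age <= hot_days:
        return "hot"
    if age <= warm_days:
        return "warm"
    return "cold"


today = date(2019, 2, 21)
# Hypothetical tables and last-access dates.
tables = {
    "dw.order_detail": date(2019, 2, 20),
    "dw.traffic_log_2017": date(2018, 3, 1),
    "dw.merchant_wide": date(2019, 1, 10),
}
tiers = {name: temperature(seen, today) for name, seen in tables.items()}
print(tiers)
```

Tables classified as cold are the ones to consider moving off production HDD and SSD onto cheaper media.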
The in-store (daodian) data team has iterated on the data warehouse twice, each time rebuilding the data marts and wide tables at the middle layer and above based on the new business model, which effectively saved space.
Another technique is compression. Traffic data, for example, is often among the largest consumers of storage, but its format is relatively fixed, so much of it can be compressed or stored in a different format (such as a map type). In our measurements this saved more than 20% of the space used by traffic data.
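Why fixed-format traffic data compresses well is easy to see on synthetic records. The sketch below uses Python's standard `zlib` on invented JSON log lines; the record layout is an assumption, and the ratio it prints applies only to this synthetic data, not to the 20% figure measured by the team.

```python
import json
import zlib

# Synthetic traffic records with a fixed set of keys and highly repetitive values.
records = [
    {"page": "/shop/%d" % (i % 50), "event": "view", "platform": "app",
     "ts": 1550000000 + i}
    for i in range(10_000)
]
raw = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
packed = zlib.compress(raw, 6)  # 6 is zlib's usual default compression level

saved = 1 - len(packed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes, saved={saved:.0%}")

# Compression is lossless: the original bytes come back unchanged.
assert zlib.decompress(packed) == raw
```

The repeated key names and low-cardinality values are what the compressor exploits; changing the storage format so that keys are not repeated per row attacks the same redundancy from the other side.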
It should also be noted that OLAP storage, such as Kylin, Elasticsearch, Druid, and MySQL, also consumes a lot of resources. These stores are mainly used to synchronize files on HDFS to media that front ends can access directly, for use by systems.
Some of them are also based on HDFS (such as Kylin and HBase), while others need separate storage media; their growth rate and retention period need attention too.
The second part is computing consumption. It mainly serves analysis based on complex rules and the computation in machine learning algorithms, that is, real-time ETL and offline ETL scenarios (representative engines include Storm and Flink, as well as MapReduce).
The resources this computation consumes are similar to those of business systems; you can set a few indicators with reference to "resource utilization" and then optimize machines or algorithm logic.
3.2.3 Cost allocation (1): risk control and anti-crawling
In some companies, what one technical team builds may serve the businesses of other teams; the risk control, anti-crawling, and advertising systems mentioned above all provide basic technical capabilities to various businesses. This brings in an important concept: "allocation". There are two allocation rules: one by "actual usage" and the other by "usage proportion". On top of these there may be a hybrid billing mode, namely "allocating the overall cost by the proportion of actual usage". When doing cost control, you need to know exactly which logic this part of the cost is calculated by.
In practice, Meituan's risk control and anti-crawling cost is the overall cost of the risk control technology team, prorated to the business teams. So a business team that wants to reduce this cost should also watch two components. The first is the absolute number of atomic risk control and anti-crawling calls it makes: judge whether the daily total of such requests is reasonable and make sure your own business requests do not grow. The second is the proportion accounted for by your own usage. This has to be analyzed together with the relevant technical teams, to guard against the case where your own absolute usage falls but other businesses' usage falls faster, so that your proportion rises and your cost goes up instead.
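The trap described in the last sentence is easiest to see with numbers. The following sketch assumes pure proportional allocation and a shared cost that stays constant between two periods; the team names and usage figures are invented.

```python
def allocate(total_cost, usage):
    """Split total_cost across teams in proportion to their usage."""
    total_usage = sum(usage.values())
    return {team: total_cost * u / total_usage for team, u in usage.items()}


# Assumption: the shared team's total cost is 100 in both periods.
before = allocate(100.0, {"catering": 40, "other": 60})
after = allocate(100.0, {"catering": 36, "other": 24})

print(before["catering"], after["catering"])
```

Here catering cut its own usage by 10% (from 40 to 36), yet its allocated cost rose from 40 to 60 because the other business cut usage by 60% while the total cost of the shared system stayed the same. Reducing the bill therefore also depends on the shared team lowering its total cost.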
3.2.4 Cost allocation (2): secure data warehouse cost
To support offline data exchange among business teams, Meituan has built a secure data warehouse for cross-team data exchange. This cost is also allocated by the proportion of resources actually used, so likewise there are two components to watch in order to reduce it:
First, the amount you use yourself; the key factor is whether the relevant data models can be made more efficient and smaller through architecture design.
Second, the proportion of your resources in the overall total; here too you need to work with the relevant teams to bring the total cost down. Technology teams in many companies have a similar concept of a shared or co-built data warehouse.
3.2.5 Cost allocation (3): advertising cost
Many Internet companies have a technical team for the advertising business. The main forms of advertising are cost per click (CPC), cost per time (CPT), and so on. The allocation logic here is the same as in the two cases above: the final total cost is apportioned by proportion.
One point deserves attention, though. Because the advertising business logic is not owned by the team's own business side, the part the R&D team can control is relatively small. An effective evaluation system is therefore needed to measure advertising cost. The indicators we use are "cost per thousand impressions" and "cost per thousand yuan of advertising revenue", offered here for reference only.
3.2.6 Other costs
Besides the items above, some new cost items appear every month, and the team should keep a close eye on them.
In practice, the cost of a particular item sometimes rises suddenly in certain months. When that happens, find out whether a new project was launched or whether some indicator was adjusted in the business or the algorithm.
4. Check stage (Check)
Check the specified actions.
Has the specified plan been implemented? Have the people involved taken the specified actions? This step looks only at the process, not the result, and pays particular attention to the executor, the cooperating parties, and the timeline, operating in a project management mindset.
Evaluate the results.
Have the indicators identified earlier been optimized? This step verifies results: make a detailed list of which indicators have been optimized and which have not. Some indicators, such as "resource utilization", show results immediately; others need a full cycle before the result is available.
On this basis you can work backward, evaluating in the order "is the indicator definition flawed -> is the plan flawed -> is there a problem with the executor -> is there a problem with the cooperating parties".
Locate systemic problems.
This step can form a closed loop at a small scale. We suggest designing several optimization plans for a given indicator: if plan A does not work, iterate to plan B immediately, and use quick trial and error to find a reasonable plan.
Correct the standard actions.
During execution, many plans and actions are discovered and corrected on the front line; there is no need to wait for a large-scale review to raise questions and summarize. The owner (primary R) should be aware of this, talk and ask more during implementation, and find the key elements. I believe every engineer has had this experience.
People who have been through a complete project life cycle often become the fastest-growing core members of the team.
In the catering R&D team's practice, the definition of business system indicators offers a similar lesson. When the optimization work started, many items and indicators were defined; business hosts, for example, were divided into cloud storage, bandwidth, CDN, Tair, Redis, and so on. Tracking every item cost RD engineers a great deal of time and energy. Later, after repeated confirmation with the relevant sibling teams, we abstracted upward to a single indicator, "resource utilization of the service group". With it you no longer need to track so many detailed items and only need to watch the usage of the machines behind these services, since machines naturally consume CPU, memory, bandwidth, CDN, and so on. This saves considerable time and operating cost and lets the team focus on optimizing machines and service architecture design.
5. Summarize and review, keep iterating (Act)
Regular review.
Reviewing is a very important skill. In my view, the ability to produce a thorough review reflects, to some degree, your "ability to abstract + ability to think + ability to manage". There are many books on review methodology, so I will not go into detail here. At this stage I suggest focusing on two kinds of "knowing". One is "knowing what you did not know": through the review, master the methods, framework, plans, team quality, and results of cost optimization. The other is "not knowing what you knew": use the results to understand whether you have been on the right path or the wrong one, and turn success that contained an element of luck into "habitual success" in the future.
Produce a report.
Someone seeing the report for the first time should be able to learn cost optimization through one or two rounds of practice.
Iterate on understanding.
Deepen and iterate on the previous process, going through PDCA again, and repeatedly polish your ability to abstract, think, and manage, so that the ROI of your work keeps improving in both depth and breadth.
During iteration there are always surprises and gains. Personally, I originally thought the cost project was purely a management project. In the course of continually achieving cost reductions through technical means, I gained a better understanding of architecture and technology, and many times we had to use innovative approaches to solve problems our predecessors had not cracked. I also obtained seven technology patents related to architecture upgrades, data compression, and technical processing, which is further evidence of improved technical ability.
Summary
Cost optimization may be neglected from time to time, but its importance never goes away. The catering R&D team has been at it for nearly a year and has helped the company save tens of millions in costs. The process is sometimes tedious, sometimes exciting, sometimes annoying and frustrating, and sometimes it comes down to asking yourself: "With the business still running, do you dare cut the redundant machines?" As management becomes increasingly fine-grained, I believe more and more people have similar needs or have already put something into practice. We look forward to working with industry peers to use resources more rationally, save company costs, and keep improving R&D team efficiency, all while preserving technical capability and meeting business needs. I hope this article offers some inspiration.
About the author
Liu Qiang is head of data at the Meituan daodian catering R&D center and a member of the Meituan data technology channel. He joined Meituan-Dianping in 2017 and works in the daodian catering R&D center, where he is responsible for the R&D of the daodian data warehouse, data products, and data systems. Previously he was head of data at several companies.
Build a clock, Xiaoying, Yang Xuan, Yun Jie, Fang Xu, and Pengwen are all engineers on the Meituan catering R&D team and contributed to this article.
Special thanks
This article and the cost optimization work received support and help from Li Wei, Ren Dengjun, Li Wen, Xie Yuchen, Hong Dan, Zuo Pucun, Guoshuyi, Diaoshihan, and others on the Meituan technical team. Thank you!
Reference source :https://tech.meituan.com/2019/02/21/rd-team-resource-cost-optimization-practice.html