DevOps Practice Guide - reading notes (long post warning)
2022-07-04 07:38:00 【zhibo_ lv】
DevOps Practice Guide
About 《DevOps Practice Guide》
- The book is divided into 6 parts:
  - Part 1 gives an overview of the history of DevOps and its three basic principles, the "Three Ways" (literally the "three-step method");
  - Part 2 describes how to start a DevOps transformation;
  - Parts 3 to 5 examine the elements of the Three Ways in depth;
  - Part 6 focuses on how to integrate security and compliance properly into daily work.
- The book covers more than 40 DevOps case studies. Drawing on survey results from world-renowned companies and organizations such as Google, Amazon, and Facebook, it shows how modern operations management improves management efficiency and so helps companies win larger markets and create more profit.
The core idea of the feedback principle is that, although errors are inevitable in complex systems, safeguards can ensure that errors are found and handled quickly, before they turn into quality problems. (Zeng Chaojing)
Agile usually underpins the efficiency of DevOps. DevOps is not just automation, just as astronomy is not just telescopes.
Part 1: Introduction to DevOps
- The Three Ways of DevOps:
  - The principle of flow: it accelerates the forward flow of work from development, through operations, to delivery to the customer;
  - The principle of feedback: it enables organizations to build safe, reliable systems of work and to get feedback from them;
  - The principle of continual learning and experimentation: it fosters a high-trust culture and integrates improvement and innovation into daily work;
Brief history
DevOps builds on bodies of knowledge including Lean, the Theory of Constraints, the Toyota Production System, resilience engineering, learning organizations, safety culture, and human factors, and it draws on high-trust management cultures, servant leadership, and organizational change management. Applying the most trusted of these principles to the IT value stream is what produces DevOps. Practiced across the whole technology value stream, which involves roles such as product management, development, QA, IT operations, and information security, it delivers high quality, reliability, stability, and security at lower cost and with less effort.
Chapter 1: Agile, continuous delivery, and the Three Ways
- Focus on deployment lead time
  - Lead time starts when the work ticket is created and ends when the work is completed. Processing time starts when someone actually begins working on the ticket, so it excludes the time the work spends waiting in a queue.
  - We should focus on reducing lead time rather than processing time. The ratio of processing time to lead time is nevertheless an important measure of efficiency, so we have to reduce the time work spends waiting in queues.
  - Track the rework metric: it reflects the quality of the output. Focus on work that is genuinely useful.
- The Three Ways: the basic principles of DevOps
  - The First Way: work flows quickly from left to right, from development to operations
    - Make work visible;
    - Reduce batch sizes and waiting times;
    - Build quality in, so that defects are not passed downstream;
  - The Second Way: apply continuous, fast feedback
    - Shorten feedback loops;
    - Amplify feedback loops to prevent problems from recurring;
  - The Third Way: build a generative, high-trust culture
    - Take risks actively, and learn from failure as well as from success;
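The difference between lead time and processing time in Chapter 1 can be made concrete with a small calculation. The sketch below is only an illustration: the function name `flow_efficiency` and the three timestamps are assumptions made for the example, not anything taken from the book.

```python
from datetime import datetime, timedelta


def flow_efficiency(created: datetime, started: datetime, finished: datetime):
    """Return (lead_time, processing_time, ratio) for a single work ticket.

    lead_time       = finished - created (includes time waiting in the queue)
    processing_time = finished - started (excludes time waiting in the queue)
    ratio           = processing_time / lead_time
    """
    lead_time = finished - created
    processing_time = finished - started
    return lead_time, processing_time, processing_time / lead_time


# Hypothetical ticket: created on 1 July, picked up on 3 July, done on 4 July.
created = datetime(2022, 7, 1, 9, 0)
started = datetime(2022, 7, 3, 9, 0)
finished = datetime(2022, 7, 4, 9, 0)

lead, processing, ratio = flow_efficiency(created, started, finished)
print(f"lead time: {lead}, processing time: {processing}, ratio: {ratio:.2f}")
```

In this made-up case the ticket waited two days and was worked on for one, so only a third of the lead time was actual processing. Shortening the queue improves the ratio without anyone having to work faster.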
Chapter 2: The First Way, the principle of flow
- Make work visible
  This is the core, because all optimization and analysis first need accurate, visible data
  - Otherwise, teams or people working from incomplete information may bounce work back and forth, and existing problems get passed downstream;
- Limit work in progress
  - An engineer assigned to several projects at once has to switch between tasks, cognitive rules, and goals, and pays the cost of getting back into context each time;
  - For most work items, it is impossible to predict how long they will take until they are finished;
- Reduce batch sizes
  - The smallest batch size is single-piece flow
- Reduce the number of handoffs
  - Information and knowledge are inevitably lost in every handoff;
  - Strive to reduce the number of handoffs, either by automating most operations or by adjusting the organizational structure;
- Continually identify and improve constraints
  - If we optimize a work center that sits before the constraint, work simply piles up faster at the constraint;
  - If we optimize a work center that sits after the constraint, it will be starved of work;
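Limiting work in progress is easy to state but is usually enforced by the board itself. A minimal sketch of that idea follows; the class names `KanbanColumn` and `WipLimitExceeded` are hypothetical and do not come from the book or from any particular tool.

```python
class WipLimitExceeded(Exception):
    """Raised when a column is asked to take more work than its limit allows."""


class KanbanColumn:
    """One column of a kanban board with a work-in-progress (WIP) limit."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.items = []

    def pull(self, item):
        """Start a work item, refusing it if the column is already full."""
        if len(self.items) >= self.wip_limit:
            raise WipLimitExceeded(
                f"{self.name!r} already holds {len(self.items)} item(s); "
                "finish something before starting new work"
            )
        self.items.append(item)

    def finish(self, item):
        """Complete a work item and free up capacity in the column."""
        self.items.remove(item)


in_progress = KanbanColumn("in progress", wip_limit=2)
in_progress.pull("fix login bug")
in_progress.pull("upgrade database driver")
try:
    in_progress.pull("new reporting page")
except WipLimitExceeded as exc:
    print(exc)
```

Refusing the third item makes the queue visible: the only way to start new work is to finish or hand back something already in progress, which is exactly the pressure a WIP limit is meant to create.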
Chapter 3: The Second Way, the principle of feedback
- Work safely within complex systems
  - Manage complex work so that problems in design and operation are revealed;
  - Swarm on problems and solve them together, so that new knowledge is built quickly;
  - Apply new local knowledge globally across the organization;
  - Leaders keep developing people who have these capabilities;
- See problems as they occur
  - The goal is to increase the flow of information in the system earlier, faster, as cheaply as possible, and from as many dimensions as possible, and to establish the cause and effect of problems as clearly as possible.
- Swarm on problems to solve them and build new knowledge
  - Swarming gives everyone involved deeper knowledge of how to manage the system, and turns the inevitable early ignorance into learning;
  Toyota's Andon cord is a very successful example
  - Andon cord: see the Andon system entry on Baidu Baike
  - It prevents problems from being carried downstream, where the cost and effort of repair grow exponentially and technical debt accumulates;
  - It stops the work center from starting new work, which could introduce new errors into the system;
  - If the problem is not solved promptly, the same problem may recur in the next operation and cost even more to fix;
  - Pulling the Andon cord when a problem is found should not be punished; it should be encouraged.
- Ensure quality at the source
  - Examples of ineffective quality controls:
    - Requiring another team to carry out tedious, error-prone, manual tasks that the requester should have automated and run themselves;
    - Requiring approval from busy people who are far from the actual work, forcing them to decide without understanding the work or its potential impact, or simply to rubber-stamp it;
    - Writing large volumes of documentation with questionable detail that is out of date soon after it is written;
    - Pushing large amounts of work to operations teams and expert committees for review and processing, and then waiting for a reply;
- Keep optimizing for downstream work centers
  This is classic advice for optimizing work:
  - When you do not know what to optimize, ask the people downstream.
Chapter 4: The Third Way, the principle of continual learning and experimentation
- Build a learning organization and a safety culture
  - Just culture
    - Responses to incidents and accidents that are seen as unjust impede safety investigations, make people who do safety-critical work fearful rather than mindful, make the organization more bureaucratic rather than more careful, and breed secrecy, evasion of responsibility, and self-protection;
    - When managers punish the people held responsible for an incident, they not only create fear but also cause problems and failures to be concealed until the next catastrophic accident;
    - Without blame there is no fear; without fear people can be honest; and honesty is what effectively prevents accidents;
- Institutionalize the improvement of daily work
  - Improving daily work is even more important than doing daily work;
  - Improve daily work by explicitly reserving time for it, including time to pay down technical debt, fix defects, and refactor and improve code and environments:
    - Reserve a period of time between development cycles;
    - Run improvement blitzes;
    - Reserve 20% of each cycle for engineers to work on problems that interest them;
- Turn local discoveries into global improvements
  - Once results are achieved locally, share them with the rest of the organization
  - Turn all incident reports into a searchable knowledge base
- Leaders reinforce a learning culture
  - Good leadership means creating the conditions in which the team can discover excellence in its daily work.
Ron Westrum's organizational typology model: how organizations process information
Pathological | Bureaucratic | Generative |
---|---|---|
Information is hidden | Information is ignored | Information is actively sought |
Messengers are "shot" | Messengers are neglected | Messengers are trained |
Responsibilities are shirked | Responsibilities are compartmentalized | Responsibilities are shared |
Bridging between teams is discouraged | Bridging between teams is tolerated | Bridging between teams is encouraged |
Failure is covered up | The organization is just and merciful | Failure leads to inquiry into root causes |
New ideas are crushed | New ideas are seen as trouble | New ideas are welcomed |
Part 2: Where to start
How do we take the first step of a DevOps transformation in the organization? Who needs to be involved? How should the team be organized? How do we make sure team members commit their energy and maximize the chance of success?
Chapter 5: Choose the right value stream to start with
- Start with the teams most willing to innovate
  - Begin by focusing on a few pilot areas, make sure they succeed, and then expand gradually;
  A top-down, big-bang transformation is not impossible, but it needs support from the very top
- Expand the scope of DevOps
  - Show results as early as possible and promote them actively. Break large improvement goals into small, incremental steps
  - First find the innovators and early adopters
  - Then win over the followers
  - Finally, deal with the holdouts
Chapter 6: Understand, visualize, and apply the value stream
- Share a common goal
  - For each iteration, the team should define a small set of goals that deliver value and move it toward the long-term goal. At the end of each iteration, the team should review its progress and set new goals for the next iteration
- Keep improvement plans small
- Reserve 20% of development time for non-functional requirements and for reducing technical debt
  - Invest 20% of development and operations time in refactoring, automation, architecture improvements, and non-functional requirements.
  - Marty Cagan, senior vice president of product and design at eBay, pointed out that if an organization is unwilling to pay this "20% tax", technical debt will eventually grow until it consumes all available resources. One day the services become fragile, feature delivery stalls, and every engineer is busy fixing reliability problems or looking for workarounds. Many leading companies went through exactly this early on, and this is the lesson they distilled from it.
- Make work more visible
Chapter 8: Integrate operations into daily development work
- Create shared services to raise development productivity
  - Building production services
  - Deployment pipelines
  - Automated testing tools
  - Production monitoring dashboards
- Embed operations engineers in service teams
  - Each service team has a dedicated operations liaison;
  - Operations engineers attend development team meetings;
  - Operations work is shown on the team's kanban board;
Part 3: Technical practices of the First Way (flow)
Chapter 10: Achieve fast and reliable automated testing
Without automated testing, the more code we write, the more time and money it takes to test it. In most cases, that business model does not scale for any technology organization.
- Automate manual tests as far as possible
  - The purpose of automated testing is to find as many code errors as possible and to reduce the dependence on manual testing;
  - Running a small number of reliable automated tests is usually better than running a large number of manual tests or unreliable automated tests. If a manual test is replaced by an automated one and a defect then reaches production, add the test back to the manual suite, with the aim of automating it eventually;
  - Once the team takes automated testing seriously, start measuring test coverage and make the results visible. The coverage target is typically set at 80%;
  - When a developer commits a change that breaks the build or an automated test, the problem must be fixed immediately. Anyone who needs help fixing it can get whatever resources they need;
  - We believe team goals rank above individual goals: helping someone else move their work forward helps the whole team;
  - Everyone should understand that the job is not only "writing code" but also "running the service";
Chapter 11: Enable and practice continuous integration
The longer developers work in isolation on their own branches, the harder it becomes to merge their changes into trunk. In fact, as the number of branches and the number of changes on each branch grow together, the difficulty of merging soars.
- Adopt continuous integration and trunk-based development
  - The more frequently developers commit, and the smaller each commit is, the closer they get to the ideal state of single-piece flow
  - Reject any commit that takes the system out of a deployable state; this is called a gated commit
  - Committing code daily also forces developers to break their work down further
  - Trunk-based development brings higher productivity, better stability, and even higher job satisfaction and lower burnout.
Chapter 12: Automate releases and make them low-risk
- Automate the deployment process
  - Deploy to every environment in the same way
  - Run smoke tests on each deployment
  - Keep environments consistent
- Decouple deployment from release
  - Blue-green deployment
  - Canary release
  - Cluster immune system
  - Dark launch
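To illustrate decoupling deployment from release, the sketch below shows one common way a canary release can be routed: both versions are deployed, and a percentage setting decides which users see the new one. The function names and the choice of SHA-256 bucketing are assumptions made for this example, not a description of how any company in the book does it.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def serves_new_version(user_id: str, canary_percent: int) -> bool:
    """Return True if this user should be routed to the newly deployed version.

    The new version is already deployed; raising canary_percent from 0 to 100
    is what actually releases it, and lowering it again rolls the release back.
    """
    return bucket(user_id) < canary_percent


users = [f"user-{n}" for n in range(1000)]
for percent in (0, 5, 50, 100):
    exposed = sum(serves_new_version(u, percent) for u in users)
    print(f"canary at {percent}%: {exposed} of {len(users)} users see the new version")
```

Because the bucket depends only on the user id, a given user stays on the same side of the split as the percentage grows, and anyone already exposed at a low percentage remains exposed at a higher one.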
Chapter 13: Architect to reduce release risk
The strangler application pattern is especially well suited to migrating functionality from a monolithic application or tightly coupled services to a loosely coupled architecture.
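The strangler application pattern is often implemented as a routing facade that sits in front of the old system and sends each request either to the legacy code or to a newly extracted service. The sketch below assumes a path-prefix routing rule; `StranglerFacade`, `legacy_app`, and `orders_service` are hypothetical names used only for illustration.

```python
def legacy_app(path: str) -> str:
    """Stand-in for the existing monolith."""
    return f"legacy:{path}"


def orders_service(path: str) -> str:
    """Stand-in for a newly extracted, loosely coupled service."""
    return f"orders-service:{path}"


class StranglerFacade:
    """Routes requests to migrated services first and falls back to the monolith."""

    def __init__(self, legacy):
        self.legacy = legacy
        self.migrated = {}  # path prefix -> handler for functionality already moved

    def migrate(self, prefix, handler):
        """Register a new service as the owner of everything under a path prefix."""
        self.migrated[prefix] = handler

    def handle(self, path):
        for prefix, handler in self.migrated.items():
            if path.startswith(prefix):
                return handler(path)
        return self.legacy(path)


facade = StranglerFacade(legacy_app)
print(facade.handle("/orders/17"))   # still served by the monolith
facade.migrate("/orders", orders_service)
print(facade.handle("/orders/17"))   # now served by the new service
print(facade.handle("/users/3"))     # untouched functionality stays on the monolith
```

Callers only ever talk to the facade, so functionality can be moved one prefix at a time, and a migration can be undone by removing its entry from the routing table.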
Part 4: Technical practices of the Second Way (feedback)
Everyone gets feedback on their work, and information is visible so that everyone can learn from it. Product hypotheses can be tested quickly, which helps us judge whether the features under development contribute to the organization's business goals.
Chapter 18: Establish review and collaboration processes to improve the quality of current work
- Put an end to counterfactual thinking
  - When an incident occurs, ask: Why did this happen? Why did we act as we did at the time? How can we avoid similar problems in the future? Do not assign blame by asking why someone did not do this or that at the time.
- Peer review
  Code review
  - The goal of peer review is to reduce change errors by having fellow engineers examine changes closely. This form of review not only improves the quality of changes but also works as cross-training.
  - The right time for a review is when the code is committed to trunk in version control.
  - Ask a programmer to review 10 lines of code and he will find 10 issues. Ask him to review 500 lines and he will say everything looks good.
- Forms of code review
  - Pair programming
    - Developer + developer
    - Developer + tester
    - Experienced colleague + new colleague
    - Set aside a short period every day for pair programming
  - Review by email
  - Tool-assisted review
  - Group review: one or more engineers review one or more pieces of code together
Part 5: Technical practices of the Third Way (continual learning and experimentation)
Quickly spread what a team or an individual learns in one area to the whole organization
Chapter 19: Integrate learning into daily work
- Establish a just, learning culture
  - Treat failures and defects as opportunities to learn, not opportunities to punish.
  - If the response to incidents and accidents is seen as unjust, it can impede safety investigations, instill fear in people who do safety-critical work, make the organization more bureaucratic rather than more careful, and encourage professional secrecy, evasion, and self-protection.
  - The bad apple theory: human error is not the cause of our problems; on the contrary, it is a consequence of design problems in the tools we provide. If incidents are caused not by "bad apples" but by inevitable design problems in the complex systems we build, then the people who triggered the failure should not be "named, blamed, and shamed". Our goal should always be to maximize opportunities for organizational learning, and to keep stressing that we value surfacing and communicating problems widely in daily work. Only then can we improve the quality and safety of our systems and strengthen the relationships among everyone who works in them.
- Hold blameless post-mortems
  - Schedule the post-mortem as soon as possible after the incident, before memories fade, cause and effect blur, and circumstances change.
  - What we need to do:
    - Build a timeline and gather details about the failure from multiple perspectives, promising not to punish the people who made mistakes;
    - Empower all engineers to improve safety by having them explain in detail how they contributed to the failure;
    - Allow and encourage people who made mistakes to become the experts who teach others not to repeat them;
    - Accept that there is always a discretionary space in which people decide whether to act, and that those decisions can only be judged in hindsight;
    - Propose countermeasures to prevent similar incidents, and record the countermeasures with target dates and owners so that they can be tracked;
  - Who needs to attend the meeting:
    - People who were involved in decisions related to the problem;
    - People who identified the problem;
    - People who responded to the problem;
    - People who diagnosed the problem;
    - People who were affected by the problem;
    - Anyone else interested in attending;
  - Phrases such as "should have" or "could have" are strictly forbidden, because they are counterfactual statements.
- Publish post-mortem results as widely as possible
  - Making post-mortem documents widely available and encouraging others in the organization to read them strengthens organizational learning.
- Lower the tolerance for incidents and look for ever weaker failure signals
  - Everyone should face failure calmly, take responsibility, and learn from it. In fact, as the number of incidents falls significantly, the tolerance for them should fall too, so that we keep learning.
Chapter 20: Set aside time for organizational learning and improvement
- Improvement blitzes
  - An important part of the Toyota Production System, an improvement blitz is a dedicated, concentrated period, usually several days, spent solving one specific problem.
  - An improvement blitz often takes this form: a group gathers to focus on a process that has a problem, with the goal of improving it, and people outside the process give advice to the people who normally work inside it.
- Let everyone learn from each other
  - Whether through traditional instruction (classes, training) or through more experiential or open formats (conferences, seminars, workshops, mentoring), a dynamic learning culture creates opportunities not only to learn but also to teach. We can dedicate organizational time to encourage this kind of teaching and learning.