A trunk-based monorepo can work for toB too: the Tencent Cloud microenterprise team does not leave quality behind
2022-06-24 04:45:00 【economic expert】
Few teams in the company develop on a trunk-based monorepo. The approach has been questioned all along, and it is even more controversial in the toB field, where many teams have all sorts of worries. In practice, the Tencent Cloud microenterprise team was seriously short of delivery bandwidth. The R&D team had to deliver new tasks while maintaining a large amount of old code, most of which had been piled up in a hurry and was basically unmaintainable. Fixing a single old problem often cost a team member several painful days. New deliveries inevitably suffered, and to meet deadlines developers kept staying up late and piling up more "garbage" code. There was no time to improve technical skills, developers felt mistreated by product, and leaders were anxious. Everyone was miserable. Worn down, developers eventually left, and whoever took over the code kept "cheating" and being "cheated" in turn: a vicious cycle. To avoid code corruption and this "Ponzi scheme" of R&D delivery, the team decided to exploit the fact that a trunk-based monorepo magnifies both strengths and weaknesses, face the challenge head on, and leave itself no way back. We started in March 2020. Fortunately, under the pressure of the trunk-based monorepo the team improved quickly, the delivery bandwidth widened, and we began to see the light.
Today I will share the following points:
- DevOps ideas and methodology
- Improving customer feedback handling efficiency
- Improving development efficiency in practice (trunk + monorepo)
- Improving problem-locating efficiency in practice (cloud monitoring + Tianji Pavilion + monorepo integration)
- Improving delivery efficiency in practice
- Basic principles of code design and CR practice
1. DevOps ideas
Our most basic and core idea: sustainable, self-driven, small iterations.
1.1 Forward cycle
Since we started doing DevOps, we have been carrying out a three-step plan, one step at a time.
First step
Build a simple, continuous CI/CD process: a simple delivery flow from Dev to Ops.
Second step
Establish a feedback loop, so that customer feedback reaches development, product, and testing through many channels.
We then iterate on development and upgrades, so that the product keeps improving and customers become more willing to give feedback.
Third step
Within one big cycle, before users give feedback, we run many small iterations internally. Through the following action items, these small iterations ensure that users get a more stable product than before:
- Establish a sound quality inspection mechanism
- Build a learning organization
- Establish a mechanism for rapid release and launch
- Establish a sound code quality interception mechanism
At present, the Tencent Cloud microenterprise team is gradually building out the third step. There is still a long way to go, but as long as the direction is right, we will get there.
1.2 Obstacles to evolution
Whatever the change, when you try to change the established development and delivery practices of a department or group, you will meet resistance.
That is because the people in any group can be divided into five categories:
- Innovators
- Early practitioners
- The early majority
- The late majority
- Laggards
In this situation, how do you quickly drive a DevOps transformation in a group or department?
Evolutionary methodology
Follow two steps:
- Find the innovators and early practitioners in the group and convince them to join the evolution
- Build a learning organization and invite the early majority to take part
As for the late majority and the laggards, once more than half of the people have adopted the change, they will passively accept it too.
1.3 Key measures
How do we measure whether DevOps is done well?
- Customer satisfaction
- R&D happiness
1.3.1 Customer satisfaction
If the products we delivered had no problems, customers would certainly be satisfied. But is that realistic? It is not. Given that, how can we maximize customer satisfaction?
- Respond quickly to customer feedback
- After responding, quickly locate the cause of the fault
- After reassuring the customer, quickly test and deliver the fix
- Release through very fast iterations
1.3.2 R&D happiness
When people first join a project, everything feels fresh, so they bring creativity and passion to their work.
A few years later, as that initial passion fades, many developers find it hard to stay invested in the project, and the team may even lose people.
Why does this happen? Setting aside material factors, I think the causes fall roughly into the following:
- A lot of repetitive work
- Unable to rotate regularly
In our department over the past few years, whoever took over a service would maintain it for the next three or four years. Until the service stopped being maintained or iterated, it belonged to that one person. When that person left the team and handed the service over, the successor also found the code hard to change.
- Locating problems is very difficult
We know how hard it used to be to locate problems at Zhiping. For a problem reported by a customer, we had to go through many steps to find the cause:
1. Dye (mark) the customer's guid
2. Ask the customer to reproduce the bug
3. After the customer reproduces it, check the brief log
4. After checking the brief log, go to the corresponding container to find the detailed log
5. Analyze and locate the problem
Locating a problem often took an hour or even longer, which easily led to customer churn.
In addition, everyone's technical growth curve rises quickly at first and then levels off. Without a corresponding adjustment, it is hard to keep making technical progress.
Based on these issues, we made a DevOps evolution plan.
2. Improving customer feedback handling efficiency
In September 2020, our department's response time to customer feedback was 60+ days. Customers complained frequently, and we had reached the point where we had to act. So we resolved to bring the customer response time down to 7 days in H1 2021.
Over the past six months we talked to external teams many times: why are they able to handle customer feedback well?
Examples we generally perceive as giving very positive feedback are Tencent Cloud Assistant, Blue Shield, and the public accounts of the PCG service assistant.
I think the main points are as follows:
- They respond to problems very promptly
- They proactively push follow-up on progress
- They have an escalation mechanism
Then why do we, as a service provider, find this so hard when communicating with customers about problems?
Buck-passing and hand-offs
2.1 The trouble
The following is the sequence our team went through to resolve a user-reported problem.
In fact this problem is not limited to our team. Any team will have it.
A common scenario:
- The customer reports a problem
- Product assigns the problem to a developer
- The developer thinks it is not a development problem and may be a testing problem
- The tester thinks it is a product positioning problem and pushes it back to product
- Product is confused and hands it back to development
In this seemingly normal flow there is a serious yet commonplace problem:
Every hand-off has a time gap
If we add the hand-off times, it becomes very obvious:
- The customer reports a problem (immediately)
- (1 day) Product assigns the problem to a developer
- (1 day) The developer thinks it is not a development problem and may be a testing problem
- (1 day) The tester thinks it is a product positioning problem and pushes it back to product
- (2 days) Product is confused and hands it back to development
In the end the customer waits a long time for feedback, which damages the brand image in the long run.
A pity
Given this situation, we consulted customer service platforms such as An Deng and t-ara. Unfortunately, for various platform-side reasons, after 1 month of integration work we still could not get this function working.
2.2 A new TAPD feature
Just when we were at a loss, we were pleasantly surprised to find a new scheduled task feature in TAPD.
What does this feature do? You can set up many small tasks that trigger when certain conditions are met. Based on this small feature, we optimized our customer feedback handling.
That is:
- The customer reports a problem (immediately): TAPD immediately creates a group chat and notifies the product staff
- (1 h) Product assigns the problem to a developer: TAPD immediately creates a group chat and notifies the developer
- (1 h) The developer thinks it is not a development problem and may be a testing problem: TAPD immediately creates a group chat and notifies the tester
- (1 h) The tester thinks it is a product positioning problem and pushes it back to product: TAPD immediately notifies the product staff
- ……
- If a problem has been open for more than 2 days, it is automatically escalated to the team leader
- If a problem is still unresolved after more than 1 week, the alert is automatically escalated to the director level
- If a problem is still unresolved after more than 2 weeks, the alert is automatically escalated to the GM level
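The time-based escalation rules above can be sketched as a small function. This is only an illustration in Go, not TAPD's actual configuration or API: the function name `escalationTarget`, the helper `days`, and the role labels are assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// days converts a whole number of days into a time.Duration.
func days(n int) time.Duration { return time.Duration(n) * 24 * time.Hour }

// escalationTarget returns who should additionally be alerted for a ticket
// that has been open for the given duration. The thresholds follow the rules
// in the text (2 days, 1 week, 2 weeks); the role labels are illustrative.
func escalationTarget(open time.Duration) string {
	switch {
	case open > days(14):
		return "GM"
	case open > days(7):
		return "director"
	case open > days(2):
		return "team leader"
	default:
		return "" // no escalation; only the current assignee is notified
	}
}

func main() {
	for _, d := range []int{1, 3, 8, 15} {
		fmt.Printf("open %2d days -> escalate to %q\n", d, escalationTarget(days(d)))
	}
}
```

Checking the largest threshold first keeps each rule to one line and makes it easy to add another level later.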
At present we use this feature on 3 project lines.
2.3 Results
When we discovered this problem in September 2020, the handling time for customer-reported problems was 64 days, and we were often complained about.
To address this we brought customer feedback into the SLA system and reviewed it once a week. With this measure, the handling time gradually fell from 64 days to 11-14 days, but at that level we found it hard to improve further.
After we introduced TAPD's automated task feature in May 2021, the customer feedback handling time in June 2021 broke below 7 days for the first time and fell to 5.01 days. We achieved the goal set for the first half of the year!
On August 1 it fell again, to 4.22 days!
Customer feedback handling time actually involves many things, such as the supporting solutions for rapid development, rapid problem locating, and rapid delivery. I will talk about these later.
3. Agile development
Our department learned from PCG's EPC development process. We studied how they practice it and read many articles, and several colleagues spent a few months practicing alongside the Tencent Docs backend team. From this we distilled our own set of solutions.
3.1 Trunk-based development
Let's first look at the difference between trunk-based development and branch-based development.
3.1.1 Branch-based development
We used to follow a branch-and-merge development model.
In this model everyone has their own small repository and maintains their own branches, periodically merges to master through an MR, and then creates a tag to trigger a release. This model has the following characteristics.
Familiar with the code
Everyone has their own project and knows its code very well, but is completely blind to everyone else's code.
Little CR communication
To make up for the lack of CR, our center organized one round of rotating CR per week, which meant that only part of each person's code got reviewed about once a month.
Slow release speed
At Zhiping, a release ticket is filed on Monday, approved on Tuesday, released to gray on Wednesday, and released to production on Thursday. With this process, each fix takes at least a week to ship. If any step is slow, or the developer forgets to release to production, it may take another week!
This model does suit client (terminal) teams, though. They ship OTA updates on a fixed weekly schedule, customers get the update at a specified time, and the team gets a stable release rhythm without paying a high price.
3.1.2 Trunk-based development
For backend services, I personally think trunk-based development may be the more suitable model.
Familiar with the architecture
In this model each developer's familiarity with a given piece of code may be only about 70%, but everyone knows the overall architecture very well.
Frequent CR communication
CR is mandatory for merging to our trunk. In one year we launched 2,800+ CRs, an average of about 8 per day.
The release speed is very fast
With CR plus other quality assurance measures, we can in fact release a version at any time.
Moreover, our team has already achieved this:
From finding a problem, to fixing it, to releasing to production, the fastest overall time is under half an hour. Compared with the previous one to two weeks, the release speed is not even in the same order of magnitude.
3.2 Collaborative development
Trunk-based development and a monorepo come as a set. If you want trunk-based development, pairing it with a monorepo can lift your development model to an industrial level.
3.2.1 Linkage development
What is linkage development?
For example:
Previously each developer was responsible for one main module plus several business modules. In this model, developing one feature could require three developers.
Slow development
This kind of development has a fatal problem:
Each developer often has their own development rhythm and priorities
It often happens that Zhang San finishes and waits for Li Si, Li Si finishes and waits for Wang Wu, and then everyone has to schedule joint debugging. Sometimes just adding a field takes three or four weeks!
Job rotation is difficult
If two of them swap work, they first face a very long adaptation period, and the hand-over is never seamless.
Worst of all, because code style and coding habits are not unified, after swapping code they look down on each other's work. In the end the tensions inside the group are not resolved but deepened.
Job rotation ends up wasting a lot of time on hand-over, with a result worse than not rotating at all.
No global view
I don't know what it is like for colleagues in online education, but in our department one thing happens often. When promotion defense season comes, I see colleagues carrying notebooks or laptops around, asking one another what a module does, what sits in the middle, what optimizations were made, which components are used, how it performs, and whether there are performance bottlenecks. They have only a very rough understanding of the overall architecture.
3.2.2 Collaborative development
We started collaborative development in a monorepo on March 28, 2020.
In this model all the large components and all the businesses live in one big repository. To develop a feature, only one developer needs to be involved end to end.
Fast development
A large number of components are reused. To develop a feature, one developer is enough to modify the whole link. Other developers only need to take part in CR.
Job rotation is convenient
In fact there is no such thing as job rotation any more. When a developer works on a service they have never touched, there is basically no threshold and they can start developing right away. There is also basically no work hand-over. To give a perhaps imperfect example: one of our core developers was transferred after working on more than ten modules in the monorepo, and none of them required any hand-over.
Clear architecture
With all the business in one big repository, I would not claim that members have a general understanding of every business in the whole department. But for their current project, for example the education project or the industry platform project, members have a very clear understanding of the overall architecture.
Details are less clear
Of course this model has its problems too: the details of some modules are not particularly clear to everyone.
3.3 Collaborative development
Based on this monorepo idea, this is the overall architecture of our current repository.
Looking from the bottom up, we have wrapped some commonly used monitoring systems.
We have also wrapped most of the commonly used storage.
Commonly used logging, notification, function, and general-purpose components are basically all wrapped as well.
Above these are the shared contracts, such as public constants, structures, error codes, protocol libraries, and the compilation environment.
The second layer adapts the framework layer. Whether we write trpc code or taf code today, developers cannot perceive the difference at the coding level.
Once this setup was in place, we found that it benefits the team in many ways.
3.3.1 Quickly build the environment
In the past we set aside a week for every new employee to set up the environment. Now it may take only half an hour to an hour.
When my computer broke down last time I tried it myself: setting up the whole environment took about ten minutes.
3.3.2 Unified error code
The unified error code idea is borrowed from NASA's maintenance manuals. We often see that when something goes wrong on the space station, the astronauts pull out the manual, look up the error code, find the corresponding procedure, and then go and fix the problem.
With unified error codes, the end state we want is:
Any user of our products only needs to tell us a return code, and we roughly know what the problem is.
3.3.3 Unified CI/CD
All pipelines based on the monorepo are unified. A new service only needs to copy someone else's pipeline and change two parameters.
3.3.4 Learning organization
I think the biggest help of the whole development model is that it established a learning organization.
Why a learning organization? Because every problem we find is reported in the group, every CR is initiated in the group, and everyone is invited to review the code, instead of quietly merging as before.
Another advantage is that it drives the evolution of our technology stack. In the past a platform team pushed the packaging of components in the monorepo. Now members package useful components into the public library on their own initiative, such as sensitive word detection and Kafka components. A further benefit is that we have moved from individual combat to group combat, working together to overcome problems quickly.
3.3.5 Code Search
With 100+ services to reference and 100+ wrapped components to build on, service development can be done at scale and as a repeatable process.
3.4 Questions about collaborative development
After talking about collaborative development for so long, some people will object: is the monorepo a cure-all? If so, why hasn't the whole company adopted it, rather than just your small department?
First, let's be clear: monorepo plus trunk-based development is not a silver bullet but an amplifier of code engineering practice. This has in fact been discussed many times on km and Lewen.
Why is it an amplifier? Because:
When you do well in the monorepo, your strengths are magnified without limit.
But when you do badly, your problems are magnified without limit.
3.5 Member feedback
I started this practice at the end of March 2020. At the end of March 2021 I ran a questionnaire and invited the 16 developers then taking part in trunk-based development to give feedback. The proportions are as follows.
PS:
Before and after sharing this course, I ran a pre-class and a post-class poll:
Before class, 23 people voted and 20 were willing to try: 87%
After class, 32 people voted and 31 were willing to try: 97%!
I believe everyone yearns for a happy development model!
3.6 Challenges and solutions
The member feedback above contains a lot of negative feedback. How do we address it?
3.6.1 Code submission conflicts
Our current rule is: each submission should aim for under 200 lines and must not exceed 500 lines. If it exceeds 500 lines, it goes to the director for CR, and other developers do not take part.
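The size rule can be expressed as a simple routing function. This is a sketch of the policy as stated, not the team's actual tooling: the names `routeChange` and `ReviewRoute` and the route labels are invented for the example.

```go
package main

import "fmt"

const (
	targetLines = 200 // preferred upper bound for one submission
	maxLines    = 500 // hard limit for ordinary peer CR
)

// ReviewRoute says who reviews a change.
type ReviewRoute string

const (
	RoutePeer         ReviewRoute = "peer CR"
	RoutePeerOverSize ReviewRoute = "peer CR, over the 200-line target"
	RouteDirector     ReviewRoute = "director CR"
)

// routeChange picks the review route from the number of changed lines.
func routeChange(changedLines int) ReviewRoute {
	switch {
	case changedLines > maxLines:
		return RouteDirector
	case changedLines > targetLines:
		return RoutePeerOverSize
	default:
		return RoutePeer
	}
}

func main() {
	for _, n := range []int{120, 350, 800} {
		fmt.Printf("%4d changed lines -> %s\n", n, routeChange(n))
	}
}
```

Small submissions also shrink the window in which two developers touch the same lines, which is why a size limit helps with merge conflicts on a shared trunk.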
3.6.2 Widely differing code styles
We have set many standards. Take Google's golint: other teams may not follow it completely, but we are very strict about it.
Why strict? Because the golint check is integrated into the compile command, so golint problems are detected at compile time. Beyond that, we have added many checkpoints so that the code everyone writes looks as uniform as if it came off an assembly line. This is explained separately later.
3.6.3 Quality assurance for the common library
This is what we have done worst in our practice. We used to be in a hurry and wrote no unit tests for the public library, so its unit test coverage is only about 30%. Since April 2021 we have required unit tests in CR for the public library, and built a pipeline to scan unit test coverage. (A coverage scan currently takes 50 minutes, so no red line has been set yet.)
3.6.4 Entrained releases
This is a hard problem. We currently have about half a million lines of code and nearly 100 services developed together, so entrained releases (changes that ship along with someone else's release) are inevitable. What we can do is reduce the risk they introduce. Here are two approaches:
- Google requires that everything merged into the mainline be directly releasable. If something is not ready, it is merged behind a switch that is turned off, and once it can be turned on, the switch is deleted. This demands high engineering quality and a very fast risk response plan.
- Tencent Advertising built a powerful switch system. All feature code is controlled by switches, and there is a complete system for notifying owners to delete them. The advantage is that all code is controllable and entrained releases are minimized. More details are in the intranet article 《How Tencent Advertising implements trunk-based development and daily automated releases on a large code base of more than 30 million lines》.
Which of the two is better? I think Google's is the best solution, and Tencent Advertising's is the solution best suited to Tencent. Why? First, we have to admit that the average engineering maturity of Tencent engineers is lower than Google's. It is very hard for us to guarantee, without switches, that code can go straight to production, so Google's approach does not suit us. Why then is Tencent Advertising's approach not the best? Because it raises code complexity in an already very complex repository. As business requirements grow more complex, switches end up everywhere in the business code. Whoever takes over the code later sees a dozen switches and is dumbfounded, and later maintainers dare not delete them, which accelerates code corruption. In our intelligent platform product department, we currently ask every developer to try to keep the trunk releasable at any time. Not everyone can guarantee this, but because we work in small teams of 5-10 people it is still within a controllable range. Overall, we are still exploring how to solve this problem.
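To make the switch idea concrete, here is a minimal in-memory feature switch in Go. It is not Google's or Tencent Advertising's system, whose internals are not described here. The type `Switches`, the switch name `new_greeting`, and the `greeting` function are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Switches is a tiny in-memory feature switch registry.
type Switches struct {
	mu sync.RWMutex
	on map[string]bool
}

// NewSwitches returns a registry in which every switch starts off.
func NewSwitches() *Switches { return &Switches{on: make(map[string]bool)} }

// Enabled reports whether a switch is on; unknown switches count as off,
// so unfinished code merged to trunk stays dark by default.
func (s *Switches) Enabled(name string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.on[name]
}

// Set turns a switch on or off at runtime.
func (s *Switches) Set(name string, enabled bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.on[name] = enabled
}

// greeting stands in for business logic with a new, switch-guarded path.
func greeting(s *Switches, user string) string {
	if s.Enabled("new_greeting") {
		return "Welcome back, " + user + "!"
	}
	return "Hello, " + user
}

func main() {
	s := NewSwitches()
	fmt.Println(greeting(s, "Zhang San")) // old path: switch is off
	s.Set("new_greeting", true)
	fmt.Println(greeting(s, "Zhang San")) // new path after the switch is opened
}
```

Each `if s.Enabled(...)` branch is exactly the kind of extra complexity discussed above, so a switch should be deleted together with the old path once the feature is fully released.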
3.7 Code quality assurance
Next we focus on the solution to differing code styles. We enforce very strict code quality assurance. How strict?
First look at the picture below
From a new hire joining to code being merged into the mainline, we have set up 4 very strict gates.
3.7.1 Gate 1: New hire training
Every new hire, whether from experienced or campus recruitment, receives code CR training after joining:
- How to design the code?
- How to develop with monorepo + trunk?
- How to set up the environment quickly?
- Quickly absorbing the team's coding culture
- What code specifications are there?
Each session takes about one to two hours.
3.7.2 Gate 2: Code check
When writing code, our makefile integrates the golint check and go fmt auto-formatting. Whenever you compile, golint findings are reported and the code is formatted automatically. New hires do not even notice that this happens.
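The team's makefile is not shown in this article, so the following is only an illustration of the formatting half of such a check, written in Go with the standard library package `go/format` (the same formatting that `gofmt` applies). The helper names `formatSource` and `isFormatted` are made up for the example, and golint itself is not reproduced here.

```go
package main

import (
	"fmt"
	"go/format"
)

// formatSource returns src rewritten in canonical gofmt style.
func formatSource(src string) (string, error) {
	out, err := format.Source([]byte(src))
	if err != nil {
		return "", err // src is not syntactically valid Go
	}
	return string(out), nil
}

// isFormatted reports whether src is already in canonical gofmt style.
func isFormatted(src string) (bool, error) {
	out, err := formatSource(src)
	if err != nil {
		return false, err
	}
	return out == src, nil
}

func main() {
	messy := "package demo\nfunc add(a,b int) int {return a+b}\n"

	ok, err := isFormatted(messy)
	fmt.Println("already formatted:", ok, "error:", err)

	clean, err := formatSource(messy)
	if err != nil {
		fmt.Println("format failed:", err)
		return
	}
	fmt.Print(clean)
}
```

A build step can either rewrite files with the formatted output, as described above, or fail when `isFormatted` returns false. Which of the two fits better depends on whether developers should see the changes before they are committed.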
3.7.3 Gate 3: PreMerge
After an MR is initiated, a pipeline starts automatically. (Our devops team is called Gonggong, so this basic support pipeline is called Buzhoushan.) It performs the following checks:
- Tencent code check
+ Woodpecker security check
+ gometalint check
+ Cyclomatic complexity check
- xcheck check
- Unit test check (in progress)
3.7.4 Gate 4: CR
CR is mandatory: code can be merged into master only after at least one person has approved it. The reviewers are all core developers. We also plan to give these core developers Code ReadAbility training. We have looked at how PCG's EPC does it and are drawing up the training plan.
When code is merged into master and later causes problems, our current attribution is 90% developer responsibility and 10% reviewer responsibility.
3.7.5 Code governance results
When the monorepo project was set up in April 2020, we strictly enforced the golint standard plus the 《Tencent code specification》. But even full compliance with the specification could not guarantee that our code was free of problems. A year later, in April 2021, we added Tencent code check and xcheck. To my surprise, once we turned them on, the number of issues jumped from 0 to 1500+!
Many of these were things we had not cared about. After thinking carefully, though, we found that ignoring these rules really does accelerate code corruption. So we started fixing them, and after three months the number of open issues has dropped to 24.
The remaining 24 are all complexity issues, which are hard to fix. For example, a function has more than 200 lines, while each function should be kept to no more than 80 lines. You may think leaving this unfixed does not matter: write comments and documentation and the code stays readable. Is that really so? Not at all! Once a function exceeds 80 lines, whoever takes over your code and has to change it faces a dilemma: refactor or not? If they do not refactor and just add more lines, the code starts to rot. If they do refactor, will that introduce new problems?
Our monorepo now enforces these checks. Once a single function exceeds 80 lines, it cannot be merged into our master branch, which keeps the overall code clean.
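A function-length gate of this kind can be approximated with Go's standard `go/parser` and `go/ast` packages. The checker the team actually uses is not described here, so this is only a sketch: `longFunctions`, `longFunc`, and the way lines are counted (from the `func` keyword to the closing brace) are assumptions.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// maxFuncLines is the per-function limit mentioned in the text.
const maxFuncLines = 80

// longFunc describes one function that is over the limit.
type longFunc struct {
	Name  string
	Lines int
}

// longFunctions parses src and returns every top-level function or method
// whose declaration spans more than limit lines.
func longFunctions(src string, limit int) ([]longFunc, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []longFunc
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // imports, types, vars, consts
		}
		start := fset.Position(fn.Pos()).Line
		end := fset.Position(fn.End()).Line
		if n := end - start + 1; n > limit {
			out = append(out, longFunc{Name: fn.Name.Name, Lines: n})
		}
	}
	return out, nil
}

func main() {
	src := "package demo\n\nfunc short() {}\n\nfunc long() {\n\tprintln(1)\n\tprintln(2)\n}\n"
	// A tiny limit is used here only so the demo input triggers the check;
	// a real gate would pass maxFuncLines.
	found, err := longFunctions(src, 3)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, f := range found {
		fmt.Printf("%s spans %d lines (real limit would be %d)\n", f.Name, f.Lines, maxFuncLines)
	}
}
```

A pipeline step could run such a check over the files touched by an MR and fail the merge when the returned slice is not empty.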
Can this eradicate code corruption? No. We can only slow it down. Where others need to refactor after two years, we can hopefully put refactoring off until five years later. Most importantly, when we do refactor in five years, it should be easy, without cursing while refactoring.
3.8 Industrial development
This is a very big topic. In my opinion the biggest gain from trunk-based development is that we have moved from project-style development toward industrial development.
What is industrial development?
Development that has scale, standardization, and creativity
3.8.1 Scale
In terms of scale, I think we are already very mature.
We have a unified CI/CD pipeline: modify 3-5 parameters of the template and the CI/CD system is connected.
We have a unified code checking system: just develop, compile, and open an MR as usual, and you get a complete set of code checks.
We have a unified compilation environment: whether you use osx, linux, or windows, you can quickly set up an environment to compile and release.
We have 100+ wrapped common components covering 99% of the functions we need, ready to use, and the number grows by 10+ every month.
These barely noticeable parts are the foundation of industrial development. Imagine: each developer who joins is only slightly aware of these things, or not aware at all. Take configuration files or Tianji Pavilion: developers do not need to worry about permission management, functional partitioning, grouping, later upgrades, or how the component architecture works. They can just use them. Can you then devote all your energy to business development and create greater value?
3.8.2 Standardization
Building on that scale, we have also done solid work on standardization.
Code standardization: no developer should find code in the repository unfamiliar. If they do, the CR was not thorough enough.
Coding standardization: which components to use and how to design an efficient function structure are discussed in daily CR, and these code exchanges help everyone grow quickly.
Output standardization: we have done a lot of training and discussion on log output, and how to write efficient logs also comes up in daily CR.
3.8.3 Creativity
With scale and standardization in place, every engineer can be creative on top of them. For example:
- Solving some problems quickly
- Driving improvements to common components
- Iterating the technology stack as a whole and quickly aligning with cloud native
- Turning many local optimizations into global optimizations
These are all areas where we are still evolving, and by working together we have already done quite well.
3.8.4 Results
At present the Tencent Cloud microenterprise team supports 40+ KA customers with different custom features, hundreds of customer bots, 8 government traffic scenarios with different functional characteristics, and other scenarios.
For enterprise projects, support developers from Xi'an can join the repository and contribute under the team's code and quality standards, complete feature development with high quality, and hand the code over afterwards, which indirectly expands the delivery bandwidth.
3.9 Lessons learned
Trunk-based development has not been plain sailing since we adopted it. Here are some of our experiences and lessons.
3.9.1 Summary of experience
Log encapsulation:
Take the log wrapper we implemented at the very beginning. Everyone started using the same component. Later we made it compatible with RPC and added atta reporting and Tianji Pavilion trace log reporting, all without users noticing. A developer may not even know why a single log line they wrote can be queried in Tianji Pavilion and Eagle Eye.
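The idea behind this kind of encapsulation can be sketched in a few lines. This is only an illustration under assumptions: the `Logger` class, its `add_sink` method, and the sink names are hypothetical and are not the team's actual component or any Tianji Pavilion or atta API. The point it shows is that business code calls one function, and new reporting targets are attached behind that function later.

```python
from typing import Callable, Dict, List

# A sink receives one structured log record. All names here are hypothetical.
Sink = Callable[[Dict[str, object]], None]


class Logger:
    """Single logging entry point; reporting targets are pluggable sinks."""

    def __init__(self) -> None:
        self._sinks: List[Sink] = []

    def add_sink(self, sink: Sink) -> None:
        # Called by the platform team, not by business code.
        self._sinks.append(sink)

    def info(self, message: str, **fields: object) -> None:
        record = {"level": "INFO", "message": message, **fields}
        for sink in self._sinks:
            sink(record)


log = Logger()

# Day one: only a local text log exists.
local_lines: List[str] = []
log.add_sink(lambda r: local_lines.append(f"{r['level']} {r['message']}"))


def handle_request(session_id: str) -> None:
    # Business code: one line, unaware of where the record ends up.
    log.info("request handled", session_id=session_id)


handle_request("s-1")

# Later: a trace sink is added centrally; handle_request is not touched.
trace_records: List[Dict[str, object]] = []
log.add_sink(trace_records.append)
handle_request("s-2")
```

After the second sink is registered, the same `log.info` call feeds both targets, which is the "users do not notice" property described above.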
Tianji Pavilion encapsulation:
Connecting to Tianji Pavilion in our repository is very easy. One person needs only 10 minutes to integrate 70+ services and take them online. The efficiency is striking.
CR communication:
Of all our experience, I think the CR communication system has helped everyone's technical growth and continuous progress the most. CR is mandatory for any code that is to be merged into the main branch. Under this mechanism we discuss code very frequently. Many problems surface during these discussions, and they have pushed us to standardize step by step and gradually form a unified development model.
3.9.2 Summary of problems
Quality problems of common components:
For example, as a member of the Tianji Pavilion Oteam, I often follow up on Tianji Pavilion upgrades. Because our Tianji Pavilion is deployed independently, upgrading to the latest version occasionally causes problems.
Once, after I finished an upgrade, an API change in Tianji Pavilion that had not been tested at scale caused reporting to fail for all of our services. The problem was resolved by a fast rollback and did not affect production, but it warned us to be cautious when upgrading common components.
Non-standard operations by team members:
A colleague changed a common library and pushed the change without compiling it, then opened an MR. The reviewer saw that the logic change looked normal and conformed to the code specification, so the change was merged into master. Other people's builds then failed until a unified fix was made later. This warned us that submitted code must be compiled first, and must pass a walkthrough test and functional verification before it is merged into master.
4. Rapid positioning
Having covered rapid development, we turn to the system we built for rapid positioning, that is, for locating problems quickly.
4.1 Monitoring system
We started building a complete monitoring system in April last year.
During the transition, monitoring as a whole was chaotic. The top half of the figure shows the monitoring systems we were connected to at the time:
The original TAF came with service monitoring, PP monitoring, and feature monitoring. We then found Tianji Pavilion very useful. Later the company began promoting the 007 unified monitoring and alarm system, so we also added unified 007 monitoring and reporting.
After connecting to so many monitoring and alarm systems, we found a lot of redundant data reporting. Because we use the monorepo (the "big warehouse") development model, reporting has now been consolidated into cloud monitoring and Tianji Pavilion.
All frameworks and storage now rely on unified cloud monitoring and Tianji Pavilion for monitoring and alarms.
4.2 Tianji Pavilion quick positioning
The company's Tianji Pavilion is very useful. Our department is a major contributor to it and has contributed code and component wrappers for taf-cpp, taf-go, taf-java, and taf-nodejs.
Because our TPS is very large (1000W TPS, that is, 10 million), we deployed some important modules of Tianji Pavilion independently. The collector, ES storage, and API Server shown in the figure have all been independently deployed to the cloud. Independent deployment keeps our Tianji Pavilion stable: even if Tianji Pavilion itself suffers a wide-ranging fault or change, we are not affected.
How to use Tianji Pavilion:
We take a SessionID directly from the group chat and open Tianji Pavilion to see the link log. We then post a screenshot of the problem to the group and can reply quickly. The whole process takes forty or fifty seconds.
Because we have specially invited tensorchen, the head of Tianji Pavilion, to explain its architecture and functions, we will not go into detail here.
4.3 Log system
The monorepo made building this log system very fast. As shown in the figure below, developers need just one line of code to plug into a very large log system without noticing.
In fact, some developers do not know how the back end works. For them, they simply write a log line, and it can be looked up in Tianji Pavilion and Eagle Eye. This is one of the advantages of the monorepo.
5. Fast delivery
When building the delivery system we ran into a problem: every release ticket had to go through a release approval process that involved many people unrelated to the change.
For example:
We turn off a feature, and the corresponding code change is a single Boolean value. Under our original delivery system this needed a release ticket approved by designers, testers, product staff, and business staff. They usually do not read it, and sometimes they are too busy to approve, so the whole process could take two or three days.
To address this, we built a two-person delivery system.
5.1 Two person delivery system
The whole process runs from the MR to the final release in the production environment.
The fastest run takes about 28 minutes and the slowest about forty or fifty minutes. Most of that time is spent in the automatic observation phase: we wait 20 minutes to observe the service's timeout rate and exception rate, and roll back automatically if there is a problem.
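The rollback decision at the end of the observation window can be sketched as follows. This is a minimal illustration, not the team's pipeline: the `Sample` type, the function name, and the 1% thresholds are assumptions. The source only states that timeout rate and exception rate are observed for 20 minutes and that a problem triggers an automatic rollback.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Sample:
    """One observation interval (for example one minute) of a service."""
    requests: int
    timeouts: int
    exceptions: int


def should_roll_back(
    samples: Iterable[Sample],
    max_timeout_rate: float = 0.01,    # assumed threshold, not from the source
    max_exception_rate: float = 0.01,  # assumed threshold, not from the source
) -> bool:
    """Return True if the observed window breaches either threshold."""
    window = list(samples)
    total = sum(s.requests for s in window)
    if total == 0:
        # No traffic was observed, so there is no evidence of a regression.
        # A real pipeline might prefer to extend the window instead.
        return False
    timeout_rate = sum(s.timeouts for s in window) / total
    exception_rate = sum(s.exceptions for s in window) / total
    return timeout_rate > max_timeout_rate or exception_rate > max_exception_rate


# Twenty healthy one-minute samples, then one minute with many timeouts.
healthy = [Sample(requests=1000, timeouts=2, exceptions=1)] * 20
degraded = healthy + [Sample(requests=1000, timeouts=300, exceptions=0)]
```

Aggregating over the whole window, rather than reacting to a single sample, is one possible design choice; a per-interval check would roll back sooner but would be more sensitive to noise.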
Only two people are involved in the whole process: the developer and the tester. With so few people in the loop and no separate manual test pass, releases are very agile.
Some will ask: how does this delivery system ensure delivery quality? The change reaches production in about half an hour, so there is clearly no time for testing. How is code quality guaranteed?
The answer: the purple modules in the figure are quality assurance modules. In addition to these, a service must pass a very strict check before it qualifies as test-free.
5.2 Test free premise
On line coverage and method coverage, our requirements are actually stricter than those of Tencent Advertising, Tencent Docs, and App Bao. However, we do not think line coverage and method coverage mean much on their own: even when our line coverage reaches 90%, our branch coverage may be less than 30%. Of all the coverage indicators, branch coverage is the most important. So where does our confidence to release without testing come from? From filling in branch coverage.
How hard is it to reach 60% branch coverage?
For example: you have written 700 unit tests and line coverage is 85%, but branch coverage is only about 20%. To reach the pass line you may need about 3000 more lines of unit test code.
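A small worked example shows why line coverage can look healthy while branch coverage lags. The function and the hand-rolled branch tracker below are hypothetical and only illustrate the metric; a real project would use a coverage tool rather than manual bookkeeping.

```python
# Every `if` has two outcomes (taken / not taken). Branch coverage counts
# outcomes, while line coverage only counts executed lines.
ALL_BRANCHES = {
    ("heavy", True), ("heavy", False),
    ("member", True), ("member", False),
}
branches_taken = set()


def shipping_fee(weight_kg: float, is_member: bool) -> float:
    fee = 10.0
    heavy = weight_kg > 5
    branches_taken.add(("heavy", heavy))        # manual instrumentation
    if heavy:
        fee += 5.0
    branches_taken.add(("member", is_member))   # manual instrumentation
    if is_member:
        fee *= 0.5
    return fee


def branch_coverage() -> float:
    return len(branches_taken) / len(ALL_BRANCHES)


# One test executes every line of shipping_fee (100% line coverage) ...
assert shipping_fee(6, True) == 7.5
coverage_after_one_test = branch_coverage()    # ... but only 2 of 4 outcomes

# A second test with the opposite inputs covers the remaining outcomes.
assert shipping_fee(1, False) == 10.0
coverage_after_two_tests = branch_coverage()
```

With a single test every line runs, yet only the "true" side of each condition is exercised. Each additional condition doubles the outcomes to cover, which is why raising branch coverage takes far more test code than raising line coverage.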
In addition, we require that for every release the line coverage, method coverage, and branch coverage figures are not lower than in the previous release.
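This "never lower than last time" rule is a ratchet, and a pipeline gate for it can be very small. The sketch below is an assumption about how such a gate might look; the function name and the dictionary layout are hypothetical, and the source does not describe the actual implementation.

```python
from typing import Dict, List

# The three indicators named in the rule above.
METRICS = ("line", "method", "branch")


def coverage_regressions(
    previous: Dict[str, float], current: Dict[str, float]
) -> List[str]:
    """Return the metrics whose coverage dropped since the last release."""
    return [m for m in METRICS if current[m] < previous[m]]


last_release = {"line": 85.0, "method": 90.0, "branch": 60.0}
candidate = {"line": 86.0, "method": 90.0, "branch": 58.5}

# An empty list means the release may proceed; otherwise the gate fails.
regressions = coverage_regressions(last_release, candidate)
```

Here the candidate improves line coverage and keeps method coverage, but the drop in branch coverage alone is enough to block the release.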
We have now built 58 test-free pipelines: 58 services already use this green channel to release quickly.
6 Coding practice
We have practiced these coding conventions for a little over a year. After more than a year of repeated CR, the code submitted now is very standardized, except from new team members.
6.1 MR standard
- Submit no more than 200 lines of code at a time. Between 200 and 500 lines, others may review it if they are in a good mood. Beyond 500 lines, only the director can CR it.
- Binary files may not be committed. Only this keeps the repository from growing too fast. Binaries, doc files, and the like are all stored in Tencent software source and iwiki.
- Do not develop in parallel. After you finish a piece of code and open a CR, do not immediately start another feature. The right thing to do is to urge others to review your code, fix the CR comments as soon as possible, merge into the main branch, and then cut a new branch from the main branch for new development.
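The size and binary-file rules above are mechanical enough to check automatically. The following sketch is illustrative only: the function names, the route labels, and the suffix list are assumptions, and the handling of exactly 500 lines is a guess because the rule does not say which side it falls on.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Illustrative suffixes only; the source just says "binaries, doc files, etc."
FORBIDDEN_SUFFIXES = {".doc", ".docx", ".exe", ".so", ".zip"}


def review_route(changed_lines: int) -> str:
    """Map the size of an MR to who is expected to review it."""
    if changed_lines <= 200:
        return "normal"       # within the limit: regular CR
    if changed_lines <= 500:  # assumption: exactly 500 still counts as this tier
        return "best-effort"  # reviewers may pick it up, no guarantee
    return "director"         # too large: escalated to the director


def forbidden_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that should live outside the repository."""
    return [
        p for p in paths
        if PurePosixPath(p).suffix.lower() in FORBIDDEN_SUFFIXES
    ]


rejected = forbidden_files(["svc/main.go", "docs/design.DOCX", "bin/tool.exe"])
```

A check like this can run before a reviewer is even assigned, so that human attention in CR goes to design rather than to counting lines.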
6.2 Code design
Code design is a subtle topic. Cheaterlin's articles on it are worth reading.
Let me focus on the following points.
- Do not log so much. Some of our services have no anomalies all day, yet print 80G of debug logs in a single day, which is a huge waste of resources.
- Log the key information. A log can run to thousands of lines and still be missing the key information, so reading it tells you nothing.
- Do not use magic numbers or magic strings. Used without any comment, they make the code rot immediately.
- Keep the code clean. When you see something you do not like, list it and deal with it as soon as possible.
- Fix things proactively. Problems found outside CR should be repaired in passing, with a note left to share the experience. The whole team atmosphere improves.
6.3 Code CR in practice
What does our department actually look at during CR? Here is a real example.
Look at the following code, from line 12 to line 32. How many things can a CR find in just 20 lines?
CR example:
The modified code:
6.4 Code specification
Finally, this is the code specification we currently follow.
7 Q&A
Q: Pushing the monorepo forward cannot have been plain sailing. There must have been many ups and downs. Were they recorded: what problems you ran into and how you solved them? We could then use that for reference.
A: Unfortunately we did not keep written records while we were doing this. But essentially every problem we solved is preserved in the monorepo's commit messages.
Q: Is there a problem that left a deep impression on you?
A: Engineers' code literacy. I think this is the biggest challenge. Not everyone follows the rules. Through carelessness or for other reasons, someone changes code in a way that breaks other people's builds. The second issue is that you cannot always release your code immediately, so your change ends up being carried out in someone else's release. These are real problems in monorepo practice. They happen occasionally, but so far there has been no online accident.
Q: But I would worry about a major accident. At the start you presumably did not have many unit tests or automated test cases. How did you guarantee code quality then?
A: Yes. Before the monorepo our unit tests were lacking and everyone did things their own way. After moving into the monorepo these controls became very strict, and we asked everyone to add unit tests gradually. It took us more than half a year to raise unit test coverage across all services from 20% to 80% and branch coverage from 10% to 60%. A lot of people's time went into this.
Q: The CR practice you just described deals with fairly surface-level problems. Is there any tool or method to guarantee this?
A: We do have tools, and there are three sets of checks. But tools are not everything. Code design problems have to be examined by people. Tools can solve 90% of the problems, but the remaining 10% need human eyes, and that 10% is often the most important.
Q: Do you have any good experience with moving old code into the monorepo? Does it all have to be refactored? Or do you leave the old code alone and only do CR for new code?
A: We were doing exactly this a while ago. We took over a batch of code from other colleagues and are now migrating it into the monorepo. We assign interns, fresh graduates, or newly joined colleagues to do the migration, so that newcomers become familiar with the code while bringing it up to standard. The code goes into the monorepo only after everything is fixed, rather than being merged first and fixed later.
If we merged first and governed afterwards, the cleanup would probably drag on for a year or two. So we are very strict about admission.
Q: One more question about CR. Suppose we develop a fairly large feature that needs a lot of code. Your rule is that each CR cannot exceed 200 lines, but 200 lines is not even enough to finish the framework. What should we do in that case?
A: That means your split is not fine-grained enough. When you split the work down to atomic pieces, each piece can be handled in a single function. How much code does a Tencent employee submit on an average development day? 167 lines. So 200 lines is simply one person's workload for a day. If you submit more than 200 lines, it is hard for reviewers to pay attention to your code logic.
Every piece of work can be broken down. Do not think that finishing a big requirement in one go and submitting it all at once will feel fulfilling. Long stretches of coding like that gradually wear people out and make quality problems more likely.