Trunk-based development in a monorepo also works for toB: Tencent Cloud microenterprises do not let quality fall behind

2022-06-24 04:45:00 economic expert

Few teams in the company develop on a trunk in a monorepo, and the approach has always been questioned. In the toB field it is even more controversial, and many teams have all sorts of worries about it.

In practice, Tencent Cloud microenterprises faced a severe shortage of delivery bandwidth. The R&D team had to deliver new work while maintaining a large amount of old code, most of which had been piled up in a hurry and was close to unmaintainable. Solving one old problem often cost a team member several painful days. New deliveries inevitably suffered, and to hit delivery dates the engineers kept staying up late and piling up even more "garbage" code. There was no time to improve technically, the engineers felt pushed around by product, and the leaders were anxious too. Everyone was miserable. Engineers who had bottled all this up eventually had no choice but to leave, and whoever took over from them went on to "trap" others and be "trapped" in turn: a vicious cycle.

To avoid code rot and an R&D delivery "Ponzi scheme", Tencent Cloud microenterprises decided to use the way trunk-based monorepo development magnifies both strengths and weaknesses, face the challenge head-on, and leave itself no way back. We started in March 2020. Fortunately, forced along by the trunk and the monorepo, the team improved quickly, delivery bandwidth widened, and we began to see the light.

Today, I will share the following points :

  • DevOps ideas and methodology
  • Improving customer feedback handling efficiency
  • Practices for improving development efficiency (trunk + monorepo)
  • Practices for locating problems faster (cloud monitoring + Tianji Pavilion + monorepo integration)
  • Practices for delivering faster
  • Basic principles of code design and CR practice

1. DevOps idea

Our most basic, core ideas: sustainable, self-driven, small iterations.

1.1 Forward cycle

Since we first started doing DevOps, we have been carrying out a three-step plan, one step at a time.

DevOps iteration

First step

Build a simple, continuous CI/CD process: a simple delivery flow from Dev to Ops.

The second step

Establish a feedback loop, so that customer feedback reaches our development, product, and test teams through multiple channels.

We then iterate on development and upgrades, so that the product keeps improving and customers give feedback more actively.

The third step

Inside one big cycle, before users give feedback, we run many small iterations internally. Through the following action items, these small iterations make sure users get a more stable product than before:

  • Establish a sound quality inspection mechanism
  • Build a learning organization
  • Establish a mechanism for rapid release and launch
  • Establish a sound code quality interception mechanism

At present, the Tencent Cloud microenterprises team is gradually building the third stage. There is still a long way to go, but as long as the direction is right we will get there.

1.2 Obstacles to evolution

Whatever the change, when you try to change the established development and delivery practices of a department or group, you will meet resistance.

That is because the people in any group fall into five camps.

image.png
  • Innovators
  • Early practitioners
  • The early majority
  • The late majority
  • Diehards

Given this, how do you quickly drive a DevOps transformation in a group or department?

Evolutionary methodology

Follow two steps:

  1. Find the innovators and early practitioners in the group and convince them to take part in the evolution.
  2. Build a learning organization and invite the early majority to join in.

As for the late majority and the diehards, once more than half of the group has adopted the new way, they will passively accept the change.

1.3 Key measures

How do you measure whether DevOps is being done well?

  • Customer satisfaction
  • R&D happiness

1.3.1 Customer satisfaction

If the products we delivered had no problems, customers would certainly be satisfied. But is that realistic? It is not. Given that, how do we maximize customer satisfaction?

image.png
  1. We respond quickly to customer feedback.
  2. After responding, we quickly locate the cause of the fault.
  3. After reassuring the customer, we quickly test and deliver the fix.
  4. We release through very fast iterations.

1.3.2 R & D happiness

When people first join a project, everything feels fresh, so they bring creativity and passion to their work.

A few years later, as that initial passion fades, many engineers find it hard to put enough energy into the project, and this may even lead to people leaving.

Why does this happen? Leaving material factors aside, I think the causes fall roughly into the following parts:

  • A lot of repetitive work
  • Unable to rotate regularly

Over the past few years in our department, a person who took over a service would usually maintain it for the next three or four years. Until the service stopped being maintained or iterated, it belonged to that person. When that person left the team and handed it over, the new owner found the code hard to change.

  • Problems are very hard to locate

We all know how hard it has been for Zhiping to locate problems. When a customer reports a problem, we have to go through many steps to find the cause:

  1. Dye (tag) the customer's guid.
  2. Ask the customer to reproduce the bug.
  3. Once the customer has reproduced it, check the brief log.
  4. After checking the brief log, go to the corresponding container to find the detailed log.
  5. Analyze and locate the problem.

Locating a single problem often takes an hour or much longer, which can easily cost us customers.

In addition, everyone's technical growth curve rises quickly at first and then levels off. Without a corresponding adjustment, it is hard to keep making technical progress.

Based on all of this, we drew up a DevOps evolution plan.

image.png

2. Improving customer feedback handling efficiency

In September 2020 our department's customer feedback response time was 60+ days. Customers complained frequently, and we had reached the point where we had to act. So we resolved to bring the response time down to 7 days in H1 2021.

Over the past six months we talked to external teams many times: why are they able to handle customer feedback well?

The ones most of us notice are Tencent Cloud Assistant, Blue Shield, and PCG's service assistant official accounts, all of which respond very actively.

I think the main points are as follows:

  1. They respond to problems very promptly.
  2. They proactively push follow-up on the process.
  3. They have an escalation mechanism.

So why is it hard for us, as a service provider, to do the same when we communicate with customers about problems?

Buck-passing and hand-offs

2.1 The pain point

The following is the sequence our team used to follow to resolve a user-reported problem.

In fact this is not unique to our team. Any team will run into it.

A common scenario:

 Customer reports a problem
 Product assigns the problem to a developer
 The developer decides it is not their problem and may be a test issue
 Test believes it is a product-definition problem and pushes it to product
 Product is confused and passes it back to development

This seemingly normal hand-off process hides a serious problem that has become the norm:

Every hand-off takes time

Once we add the hand-off times, it becomes obvious:

 Customer reports a problem (immediately)
 (1 day) Product assigns the problem to a developer
 (1 day) The developer decides it is not their problem and may be a test issue
 (1 day) Test believes it is a product-definition problem and pushes it to product
 (2 days) Product is confused and passes it back to development

Adding these up gives 5 days spent on hand-offs alone. In the end the customer waits a very long time for an answer, which damages the brand image in the long run.

A pity

Given this, we consulted customer service platforms such as An Deng and t-ara. Unfortunately, for various reasons on the platform side, after a month of integration work we still could not get this capability working.

2.2 A new TAPD feature

Just when we were at a loss, we were pleasantly surprised to find a new scheduled-task feature in TAPD.

What does it do? You can configure many small tasks that fire when certain conditions are met. Based on this small feature, we optimized how customer feedback is handled.

That is to say:

 Customer reports a problem (immediately)
 TAPD immediately creates a group chat and notifies product
 (1 h) Product assigns the problem to a developer
 TAPD immediately notifies the developer in the group
 (1 h) The developer decides it is not their problem and may be a test issue
 TAPD immediately notifies the tester in the group
 (1 h) Test believes it is a product-definition problem and pushes it to product
 TAPD immediately notifies product
 ……
 If a problem is unresolved for more than 2 days, it is automatically escalated to the team leader
 If a problem is unresolved for more than 1 week, the alert is automatically escalated to the director level
 If a problem is unresolved for more than 2 weeks, the alert is automatically escalated to the GM level
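
The escalation rules above can be sketched as a small lookup. This is only an illustration of the thresholds named in the text (2 days, 1 week, 2 weeks); it is not TAPD's configuration format, and the names `ESCALATION_RULES` and `escalation_target` are hypothetical.

```python
from datetime import timedelta

# Thresholds taken from the escalation rules in the text, highest level first.
# The structure and names are illustrative, not TAPD's actual configuration.
ESCALATION_RULES = [
    (timedelta(weeks=2), "GM"),
    (timedelta(weeks=1), "director"),
    (timedelta(days=2), "team leader"),
]


def escalation_target(unresolved_for: timedelta):
    """Return the highest level to alert for an unresolved ticket, or None."""
    for threshold, level in ESCALATION_RULES:
        # "exceeds" is read as strictly longer than the threshold.
        if unresolved_for > threshold:
            return level
    return None
```

Ordering the rules from the longest threshold down means the first match is always the most senior level that applies.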

We are already using this feature on 3 project lines.

2.3 Results

When we discovered this problem in September 2020, the processing time for customer-reported problems was 64 days, and we were often complained about.

So we brought customer feedback into our SLA system and reviewed it once a week. With this measure, the processing time gradually fell from 64 days to 11-14 days, but once there we found it hard to improve further.

After we introduced TAPD's automated task feature in May 2021, the processing time in June 2021 dropped below 7 days for the first time, to 5.01 days. We had met the goal set for the first half of the year!

By August 1 it had fallen further, to 4.22 days!

image.png

Customer feedback processing time actually touches on many things, such as rapid development, rapid problem location, and rapid delivery. I will cover these supporting solutions below.

3. Agile development

Our department learned from PCG's EPC development process. We studied how they practice it and read many articles, and several colleagues spent a few months practicing alongside the Tencent Docs backend team. From that we distilled our own set of solutions.

3.1 Trunk-based development

Let's first look at the difference between trunk-based development and branch development.

image.png

3.1.1 Branch development

We used to use a branch-and-merge development model.

In this model everyone has their own small repository and maintains their own branch, periodically merges to master through an MR, and then creates a tag to trigger a release. This model has the following characteristics.

Code familiarity

Everyone owns a project and knows its code very well, but other people's code is a complete blind spot for them.

Little CR communication

To make up for the lack of code review, our center organized one CR rotation per week, which meant that only part of each person's code was reviewed, about once a month.

Slow release speed

At Zhiping today, a release ticket is filed every Monday, approved on Tuesday, gray-released on Wednesday, and released to production on Thursday. Under this process, fixing one problem takes at least a week. If any step is slow, or the developer forgets to push the production release, it can take another week!

This model does suit device (terminal) teams well, though. They ship OTA updates on a fixed weekly schedule, customers receive the update at a set time, and the team gets a stable release rhythm without paying a high price.

3.1.2 Trunk-based development

For backend services, I personally think trunk-based development may be the more suitable model.

Familiarity with the architecture

In this model each person may know only about 70% of the code, but is very familiar with the overall architecture.

Frequent CR communication

CR is mandatory for every merge to our trunk. Over the past year we initiated 2800+ CRs, an average of about 8 CRs triggered per day.

Very fast release speed

Because we have CR plus other quality assurance measures, we can in fact release a version at any time.

Moreover, our Tencent Cloud microenterprises team has already achieved the following:

From finding a problem, to fixing it, to releasing to production, the whole process takes no more than half an hour at the fastest. Compared with the week or two it used to take, the release speed is not even in the same order of magnitude.

3.2 Collaborative development

Trunk-based development and a monorepo are a matched set. When you adopt trunk-based development, pairing it with a monorepo lifts your development model to an industrial level.

image.png

3.2.1 Linkage development

What is linkage development?

For example:

Previously each of us was responsible for one main module plus several business modules. Under that model, developing a single feature that crosses modules could require three people.

Slow development

This kind of development has a fatal problem:

Each person usually has their own development rhythm and priorities.

It often happens that Zhang San finishes and waits for Li Si, Li Si finishes and waits for Wang Wu, and only then can we schedule joint debugging. Sometimes just adding a field takes three or four weeks!

Job rotation is difficult

If two of them swap work, each first needs a very long adaptation period, and the hand-over is never seamless.

Worst of all, because code style and coding habits are not unified, after swapping code they look down on each other's work. In the end the tensions in the group are not resolved but deepened.

Job rotation ends up wasting a lot of handover time, and the result is worse than not rotating at all.

No global view

I don't know what it is like for colleagues in online education, but one thing happens often in our department. When promotion-review season comes, I see colleagues carrying notebooks or laptops around, asking someone what they do in a module, what sits in the middle, what optimizations were made or which components were used, how it performs, and whether there are any bottlenecks. They have only a very rough understanding of the overall architecture.

3.2.2 Collaborative development

We started using a monorepo for collaborative development on March 28, 2020.

In this model all the large components and all the business code live in one big repository. Developing a feature needs only one person from start to finish.

Fast development

A large number of components are reused. To develop a feature, one person is assigned and modifies the whole link. Everyone else only needs to take part in CR.

Job rotation is convenient

In fact there is no such thing as job rotation any more. Anyone can pick up a service they have never seen with essentially no barrier and start developing right away. There is also essentially no work handover. An example, perhaps not a perfect one: one of our core developers, who has since transferred, had worked on more than ten modules in the monorepo, and none of them required any handover.

Clear architecture

With all business code in one monorepo, I would not claim that everyone has a general understanding of every business in the whole department. But for their current project, such as the education project or the industry platform project, members understand the overall architecture very clearly.

The details are not clear

Of course this development model has problems too: people are not especially clear about the details of some modules.

3.3 Collaborative development

Based on this monorepo idea, this is the overall architecture of our current repository.

Looking from the bottom up, we have wrapped some commonly used monitoring systems.

We have also wrapped most of the commonly used storage.

The logging, notification, function, and common components we use often are basically all wrapped too.

Above that are the shared protocols, such as shared constants, structs, error codes, protocol libraries, and the build environment.

The second layer adapts the framework layer, so that whether we write trpc code or taf code, people notice no difference at the coding level.

Once this setup was done, we found that it benefits the team in many ways.

3.3.1 Quickly build the environment

In the past we set aside a week for a new employee to set up the environment. Now it may take only half an hour to an hour.

The last time my computer broke down I tried it myself: setting up the whole environment took about ten minutes.

3.3.2 Unified error code

Unified error codes are in effect modeled on NASA's maintenance manuals. We often see that when something goes wrong on the space station, the astronauts pull out the manual, find the error code, look up the corresponding procedure, and go fix the problem.

By unifying error codes, we want to reach this end state:

Any user of Tencent Cloud microenterprises only has to tell us a return code, and we know roughly what the problem is.
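
A unified error code table can be pictured as one shared registry that maps a return code to the module, the meaning, and the next action. The sketch below is a minimal illustration only: the codes, modules, and the names `ErrorInfo`, `ERROR_CODES`, and `explain` are hypothetical, since the article does not list the real ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorInfo:
    module: str   # which component raised the error
    meaning: str  # what went wrong
    action: str   # what the on-call engineer should do next


# Hypothetical entries for illustration; the real table is not in the article.
ERROR_CODES = {
    10001: ErrorInfo("auth", "token expired", "ask the client to refresh its token"),
    20001: ErrorInfo("storage", "write timeout", "check the storage cluster load"),
}


def explain(code: int) -> str:
    """Turn a return code reported by a user into a rough diagnosis."""
    info = ERROR_CODES.get(code)
    if info is None:
        return f"unknown error code {code}"
    return f"[{info.module}] {info.meaning}: {info.action}"
```

Because every service shares the same table, a single return code is enough to narrow a report down to a module and a first step.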

3.3.3 A unified CI/CD

All pipelines built on the monorepo are unified. A new service only needs to copy someone else's pipeline and change two of its parameters.
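
One way to picture "copy the pipeline and change two parameters" is a shared template whose stages stay fixed while only the service-specific fields vary. This is a sketch under assumptions: the article does not say which two parameters change, so `service_name` and `build_target`, like the stage names, are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pipeline:
    service_name: str  # assumed parameter 1
    build_target: str  # assumed parameter 2
    stages: tuple = ("lint", "build", "test", "deploy")  # shared by every service


# The pipeline an existing service already uses, acting as the template.
TEMPLATE = Pipeline(service_name="example-service", build_target="./cmd/example")


def pipeline_for(service_name: str, build_target: str) -> Pipeline:
    """Copy the template and override only the two service-specific fields."""
    return replace(TEMPLATE, service_name=service_name, build_target=build_target)
```

Keeping the stages out of the per-service parameters is what makes every pipeline behave the same way.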

3.3.4 Learning organization

I think the biggest benefit of this whole development model is that it established a learning organization.

Why a learning organization? Because every problem we find is reported in the group chat, and every CR is also initiated in the group, inviting everyone to review one's code, instead of quietly merging as before.

Another advantage is that it drives the evolution of our technology stack. In the past a platform team pushed the packaging of components for the monorepo. Now members package useful components into our common library on their own initiative, such as sensitive word detection and Kafka components. A further benefit is the move from fighting alone to fighting as a group: we can work together to overcome and solve problems quickly.

With 100+ services to use as reference cases and 100+ wrapped components for support, service development can be scaled and made into a process.

3.4 Doubts about collaboration

After I have talked about collaborative development for so long, some people will push back: is the monorepo a cure-all? If so, why hasn't the whole company moved to it, rather than just one small department like yours?

Let us be clear first: trunk-based development in a monorepo is not a silver bullet but an amplifier of code engineering practice. This has in fact been discussed many times on km and le Wen.

Why is it an amplifier? Because:

When you do well in the monorepo, your advantages are magnified without limit.

But when you do badly, your problems are magnified without limit!

3.5 Member feedback

We started this practice at the end of March 2020. At the end of March 2021 I ran a questionnaire survey, inviting the 16 colleagues who were taking part in trunk-based development at the time to give feedback. The proportions are shown below.

image.png

PS:

Before and after sharing this course, I ran a pre-class and a post-class poll:

Before class, 23 people voted and 20 were willing to try it, or 87%.

After class, 32 people voted and 31 were willing to try it, or 97%!

I believe everyone longs for a happy development model!

3.6 Challenges and solutions

The member feedback above contains a lot of negative feedback. How do we address it?

image.png

3.6.1 Code submission conflicts

Our current requirement: every submission should stay within 200 lines and must not exceed 500 lines. A submission over 500 lines goes straight to the director for CR, and other colleagues do not take part.
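
The size rule above can be expressed as a simple routing function. This is a minimal sketch of the thresholds stated in the text (200 and 500 lines); the function name `review_route` and the returned labels are hypothetical, and the article does not describe how the team actually enforces the rule.

```python
def review_route(changed_lines: int) -> str:
    """Decide who reviews a submission based on its size in changed lines."""
    if changed_lines < 0:
        raise ValueError("changed_lines must not be negative")
    if changed_lines <= 200:
        return "normal CR"  # within the 200-line target
    if changed_lines <= 500:
        return "normal CR (over the 200-line target)"  # allowed but discouraged
    return "director CR"  # over 500 lines: reviewed by the director only
```

Small submissions also conflict less often, because each one touches fewer lines that someone else may be changing on the trunk at the same time.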

3.6.2 Code styles are very different

We have set many conventions. First, Google's golint: other teams may not follow this standard fully, but we are very strict about it.

Why strict? Because the golint check is integrated into the build command, so golint problems are detected at compile time. Beyond that, we have added many checkpoints to make sure the code everyone writes looks as if it came off the same assembly line. This is explained in more detail later.

3.6.3 Common library code quality assurance

This is the part of our practice we did worst. We used to be in a hurry and wrote no unit tests when writing the common library, so its unit test coverage is only about 30%. Since April 2021 we have enforced a mandatory unit test and CR policy for the common library, and we built a pipeline to scan unit test coverage. (The coverage scan currently takes 50 minutes, so no red line has been set yet.)

3.6.4 Piggybacked releases

This is a genuinely hard problem. We currently have half a million lines of code and nearly 100 services being developed together, so piggybacked releases (unfinished changes carried along with a release) are inevitable. What we have to do is reduce the risk they introduce. Here are two approaches:

  1. Google requires that everything merged into the mainline can go straight to production. If it cannot, you add code that switches the feature off, and when it is ready to be switched on you delete that code. This practice demands a high level of engineering skill, and it also needs a very fast risk response plan.
  2. Tencent Advertising built a powerful switch system. Every feature's code is controlled by a switch, and there is a well-developed system that notifies owners when switches should be deleted. The advantage is that all code is under control and piggybacked releases are kept to a minimum. More details are in the intranet article 《How Tencent Advertising Implemented Trunk-Based Development and Daily Automated Releases on a Large Code Base of More Than 30 Million Lines》.

Which of the two is better? In my view Google's is the best solution, while Tencent Advertising's is the one that suits Tencent best.

Why? First, we have to admit that the average engineering literacy of Tencent's engineers is lower than Google's. It is very hard for us to guarantee, without switches, that code can go straight to production, so their approach does not suit us.

Then why is Tencent Advertising's approach not the best one? Because it raises code complexity in a repository that is already very complex. As business requirements grow more complicated, the complexity is likely to grow with them, until switches are everywhere in the business code. Whoever takes over the code later sees a dozen switches and is dumbfounded, and later maintainers dare not delete them, which accelerates code rot.

In our intelligent platform product department, we currently ask everyone to keep their code releasable at any time as far as possible. Not everyone can guarantee this, but because we work in small teams of 5-10 people, it is still within a controllable range for now. On the whole, we are still exploring how to solve this problem.
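
The switch approach described above boils down to merging unfinished code to the trunk behind a flag that defaults to off. The sketch below shows only that idea; it is not Tencent Advertising's switch system, and `FeatureSwitches`, `greet`, and the switch name `new_greeting` are hypothetical.

```python
class FeatureSwitches:
    """A tiny in-memory switch registry; real systems load this from config."""

    def __init__(self, enabled=()):
        self._enabled = set(enabled)

    def is_on(self, name: str) -> bool:
        return name in self._enabled

    def turn_on(self, name: str) -> None:
        self._enabled.add(name)


def greet(user: str, switches: FeatureSwitches) -> str:
    # The new behaviour ships to production dark and runs only once switched on.
    if switches.is_on("new_greeting"):
        return f"Welcome back, {user}!"
    return f"Hello, {user}"
```

Each `if switches.is_on(...)` branch is exactly the complexity cost the text warns about, which is why switches need to be deleted once a feature is fully released.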

3.7 Code quality assurance

Next we focus on the solution to inconsistent code styles. We enforce very strict code quality assurance. How strict?

First look at the picture below

From a new hire joining to code being merged into the mainline, we have set up four very strict gates.

3.7.1 The first level: New-hire training

Every new hire, whether an experienced hire or a campus hire, receives code CR training after joining:

  • How to design the code?
  • How to develop with the monorepo + trunk?
  • How to set up the environment quickly?
  • Quickly absorbing the team's code culture
  • What code specifications are there?

Each session takes about an hour or two.

3.7.2 The second level: Code check

Then, when writing code, our makefile integrates the golint check and go fmt auto-formatting. Every build reports golint findings and formats the code automatically, without the new hire even noticing.

3.7.3 The third level: PreMerge

After we open an MR, a pipeline starts automatically. (Our DevOps team is called Gonggong, so this foundational pipeline is called buzhoushan.) What does this "mountain" do?

  • Tencent code check
      + Woodpecker security check
      + gometalint check
      + Cyclomatic complexity check
  • xcheck check
  • Unit test check (in progress)
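
A PreMerge pipeline of this kind can be thought of as a list of named checks that must all pass before an MR may proceed. The following is a sketch of that control flow only. The two checks are trivial stand-ins invented for illustration, not the real Tencent code check, Woodpecker, or xcheck tools, and `run_premerge` is a hypothetical name.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[str], bool]]


def no_debug_print(diff: str) -> bool:
    """Stand-in check: reject diffs that still contain a debug print."""
    return "fmt.Println" not in diff


def touches_a_test(diff: str) -> bool:
    """Stand-in check: require that the diff changes at least one test file."""
    return "_test.go" in diff


def run_premerge(diff: str, checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check and report (passed, names of failed checks)."""
    failed = [name for name, check in checks if not check(diff)]
    return (not failed, failed)
```

Running all checks instead of stopping at the first failure lets the author see every finding in a single pipeline run.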

3.7.4 The fourth level: CR

CR is mandatory for us. Code can be merged into master only after at least one person has approved the CR, and the reviewers are all our core developers. We also plan to provide Code Readability training for these core developers. We have looked at how PCG's EPC does it and are drawing up the Code Readability training plan.

Once code is merged into master, if it later causes a problem, our current attribution is 90% developer responsibility and 10% reviewer responsibility.

3.7.5 Code governance achievements

In April 2020, when the monorepo project was set up, we strictly enforced the golint standard plus the 《Tencent Code Specification》 from the start. But even full compliance with the code specification cannot guarantee that the code is free of problems. A year later, in April 2021, we added Tencent code check and xcheck. To my surprise, once we switched them on, the number of findings jumped from 0 to 1500+.

Many of these were things we had not cared about, but on reflection, not following these rules really does accelerate code rot. So we started fixing them, and after three months the number of open findings has dropped to 24.

image.png

The remaining 24 are all complexity findings, which are hard to fix. For example, a function has more than 200 lines of code, and each function must be brought down to no more than 80 lines. You may think it does not matter if this is left unfixed: just write comments and documentation, and the code stays readable. But is that really so? Not at all! Once a function exceeds 80 lines, whoever takes over your code, or anyone who finds it hard to change, faces a dilemma: refactor it or not? If they do not refactor and simply add more lines, the code starts to rot. If they do refactor, will that introduce new problems?

Our monorepo now enforces these checks. Once a single function exceeds 80 lines, it cannot be merged into our master branch, which keeps the code base clean overall.
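
An 80-line limit per function can be approximated with a very small scanner. The sketch below is not the team's checker: it assumes gofmt-formatted Go source, where a top-level function starts with `func ` at column 0 and ends with `}` at column 0, and it will be fooled by raw string literals that contain such lines. The name `long_functions` is hypothetical.

```python
import re

# Matches "func Name(" and "func (recv T) Name(" at the start of a line.
FUNC_HEADER = re.compile(r"^func\s+(?:\([^)]*\)\s*)?(\w+)")


def long_functions(go_source: str, limit: int = 80):
    """Return (name, line_count) for top-level functions longer than limit."""
    results = []
    start, name = None, ""
    for lineno, line in enumerate(go_source.splitlines(), start=1):
        if start is None:
            match = FUNC_HEADER.match(line)
            # Skip one-line functions such as "func f() {}".
            if match and not line.rstrip().endswith("}"):
                start, name = lineno, match.group(1)
        elif line.startswith("}"):
            length = lineno - start + 1  # header and closing brace included
            if length > limit:
                results.append((name, length))
            start = None
    return results
```

A merge gate would fail the pipeline whenever this list is non-empty for the changed files.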

Of course, can we eradicate code rot this way? No. We can only slow it down. Where others need to refactor after two years, we can only promise that our code may not need refactoring until five years later. Most importantly, when we do refactor in five years, it should be easy, with no cursing while you refactor.

3.8 Industrialized development

This is a very big topic. In my opinion the biggest and best gain from trunk-based development is that we have begun the transition from project-style development to industrialized development.

What is industrialized development?

Development that has scale, standardization, and creativity.

3.8.1 Scale

In terms of scale, I think we are already very mature.

We have a unified CI/CD pipeline: modify 3-5 parameters of the template and your CI/CD system is connected end to end.

We have a unified code detection system: just develop, build, and open an MR as usual, and you are connected to a complete code detection system.

We have a unified build environment: whether you use osx, linux, or windows, you can quickly set up an environment to build and release.

We have 100+ wrapped common components covering 99% of the functionality, ready for you to use, and the number is growing by 10+ every month.

These barely noticeable parts are the foundation of industrialized development. Imagine: every developer who joins is barely aware, or not aware at all, of these things. Take configuration files or Tianji Pavilion: developers do not need to worry about permission management, functional partitioning, grouping, subsequent upgrades, or how the component architecture works. They just use them without friction. Wouldn't that let you put all your energy into business development and create more value?

3.8.2 Standardization

Thanks to our large-scale applications , We have also done a solid job in standardization

Such as code standardization , Make sure that every student doesn't get estranged when they see the code , If there is , Namely CR It was not done in detail

Such as coding standardization , Those components should be connected , How to design an efficient function architecture , These are on a daily basis CR in , There will be code exchange to promote everyone's rapid growth

For example, output standardization , We have done a lot of training and Discussion on log output , If you do an efficient log output, it is also in daily use CR There are in

3.8.3 Creativity

With scale and standardization in place, all engineers can be creative on top of them. For example:

  • Solve problems quickly
  • Drive improvements to common components
  • Iterate the technology stack as a whole and align quickly with cloud native
  • Turn many local optimizations into global optimizations

These are the areas where we are still evolving and working together, and so far we have done quite well.

3.8.4 Results

At present, Tencent cloud microenterprises supports customized features for 40+ different KA (key account) customers, hundreds of customer bots, 8 government traffic scenarios with different functional characteristics, and other scenarios.

For enterprise projects, support developers in Xi'an can enter the repository and join the effort following the cloud microenterprise code and quality specifications, complete feature development with high quality, and hand over the code afterwards, which indirectly expands our delivery bandwidth.

3.9 Lessons learned

Trunk-based development has not been plain sailing for us. Here are some of our experiences and lessons.

image.png

3.9.1 Summary of experience

Log encapsulation:

Take the log wrapper we implemented early on: everyone started using the same component. Later we added RPC compatibility, atta reporting, and Tianji Pavilion trace log reporting, all without users noticing. A developer does not even need to know why a single logged line can then be queried in Tianji Pavilion and Eagle Eye.
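The idea of one logging call that fans out to several back ends can be sketched roughly as follows. This is only an illustration in Python, not the team's actual component: the `Logger` class, `make_file_sink`, and `make_trace_sink` are hypothetical names, and the real atta and Tianji Pavilion reporting clients are replaced here by plain in-memory lists.

```python
from typing import Callable, Dict, List

# A sink receives (trace_id, message). All names here are hypothetical.
Sink = Callable[[str, str], None]


class Logger:
    """Single entry point for callers; back ends are plugged in behind it."""

    def __init__(self) -> None:
        self._sinks: List[Sink] = []

    def add_sink(self, sink: Sink) -> None:
        # New back ends can be added later without touching any call site.
        self._sinks.append(sink)

    def info(self, trace_id: str, message: str) -> None:
        # The one line a developer writes; fan-out happens here.
        for sink in self._sinks:
            sink(trace_id, message)


def make_file_sink(lines: List[str]) -> Sink:
    """Stand-in for a local log file: appends formatted lines to a list."""
    def sink(trace_id: str, message: str) -> None:
        lines.append(f"[{trace_id}] {message}")
    return sink


def make_trace_sink(records: List[Dict[str, str]]) -> Sink:
    """Stand-in for a trace reporting client: stores structured records."""
    def sink(trace_id: str, message: str) -> None:
        records.append({"trace_id": trace_id, "message": message})
    return sink
```

Because callers only ever see `Logger.info`, a new reporting target is added in one place in the shared repository and every service picks it up on its next build.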

Tianji Pavilion encapsulation:

Connecting to Tianji Pavilion in our repository is very easy. One person needs only 10 minutes to integrate 70+ services and bring them online. The efficiency is remarkable.

CR communication:

Of all our experience, I think what helps everyone's technical growth and continuous progress the most is the CR communication system. All code must go through CR before it can be merged into the main branch. Under this mechanism, code communication has become very frequent. Many problems surface during this communication, and these problems push us to gradually standardize and form a unified development style.

3.9.2 Summary of problems

Quality problems of common components

For example, as a member of the Tianji Pavilion Oteam, I often follow up on Tianji Pavilion upgrades. Because our Tianji Pavilion is deployed independently, upgrading to the latest version occasionally causes problems.

Once, after I finished an upgrade, it turned out that an API change in that Tianji Pavilion version had not been tested at scale. As a result, reporting failed for all our services. Although the problem was resolved by a quick rollback and did not affect production, it warned us to be cautious when upgrading public components.

Non-standard operations by members

A colleague changed a public library, pushed the change without compiling it, and opened an MR. The reviewer checked it, found the logic reasonable and the code compliant with the specification, and merged it into master. Later, other people's builds failed until a unified fix resolved the problem. This taught us that submitted code must first compile and pass walk-through testing, and may be merged into master only after functional verification.

4. Rapid positioning

Having covered rapid development, let us turn to how we built a system for locating problems quickly.

4.1 Monitoring system

We started building a complete monitoring system in April last year.

During the transition, the overall monitoring setup was very chaotic. As you can see, the top half of the figure shows the monitoring systems we were connected to at that time:

The original TAF came with service monitoring, PP monitoring, and feature monitoring. Then we found Tianji Pavilion very useful. Later, the company mainly promoted the 007 unified monitoring and alarm system, so we also integrated 007 monitoring and reporting.

After integrating so many monitoring and alarm systems, we found a lot of redundant data reporting. Because we use the big warehouse (monorepo) development model, everything has now been consolidated into cloud monitoring and Tianji Pavilion.

All current frameworks and storage rely on unified cloud monitoring and Tianji Pavilion for monitoring and alarms.

image.png
image.png
image.png
image.png

4.2 Tianji Pavilion quick positioning

The company's Tianji Pavilion is really useful. Our department is a major contributor to Tianji Pavilion and has contributed code and component encapsulation for taf-cpp, taf-go, taf-java, and taf-nodejs.

Meanwhile, because our TPS is very large (1000W, that is, about 10 million TPS), we deployed some important Tianji Pavilion modules independently. The collector, ES storage, and API Server shown in the picture have all been deployed independently in the cloud. Independent deployment ensures the stability of our Tianji Pavilion: even if Tianji Pavilion itself suffers a wide-ranging failure or change, we are not affected.

How to use Tianji Pavilion:

We grab a SessionID directly from the group chat, open Tianji Pavilion, and see the full link log. We then post a screenshot of the problem to the group and can reply quickly. The whole process takes forty or fifty seconds.
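Conceptually, looking up a link log by SessionID amounts to filtering log records by that ID and ordering them by time. The sketch below illustrates only this idea with in-memory data; it says nothing about how Tianji Pavilion is actually implemented, and `LogRecord` and `link_log` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogRecord:
    session_id: str   # ID shared by every hop of one request
    timestamp_ms: int
    service: str
    message: str


def link_log(records: List[LogRecord], session_id: str) -> List[str]:
    """Return the log lines of one session in chronological order."""
    matching = [r for r in records if r.session_id == session_id]
    matching.sort(key=lambda r: r.timestamp_ms)
    return [f"{r.service}: {r.message}" for r in matching]
```

With every service logging through the same shared component, each record carries the session ID by construction, which is what makes this kind of lookup fast.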

Because we have specially invited tensorchen, the head of Tianji Pavilion, to explain its architecture and functions, we will not go into details here.

4.3 Log system

The big warehouse makes this log system very quick to adopt. As shown in the figure below, a developer needs just one line of code to connect to a very large log system without noticing it.

image.png

In fact, some engineers do not know what the back end looks like. As far as they are concerned, they just write a log line, and it can then be found in Tianji Pavilion and Eagle Eye. This is one of the advantages of the big warehouse.

5. Fast delivery

When building the delivery system, we ran into a problem: every release ticket had to go through a release approval process that involved many people unrelated to the change.

For example:

We turn off a feature, and the corresponding code change is a single Boolean value. Under our original delivery system, this needs a release ticket approved by designers, testers, product staff, and business staff. They usually do not read it, and sometimes they are too busy to approve, so the whole process may take two or three days.

In response, we built a two-person delivery system.

5.1 Two person delivery system

The pipeline covers the whole process from MR to final release in the production environment.

The fastest run takes about 28 minutes and the slowest forty or fifty minutes, and most of that time is spent in the automatic observation phase. In this phase we wait 20 minutes to observe the service's timeout rate and exception rate; if there is a problem, the release is rolled back automatically.
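The decision made at the end of the observation phase can be sketched as below. This is an illustration under assumptions, not the team's pipeline: the `Sample` type, the `should_roll_back` function, and the 1% thresholds are all hypothetical, since the text does not state the actual limits.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Sample:
    """Counters collected for one interval of the observation window."""
    requests: int
    timeouts: int
    exceptions: int


def should_roll_back(
    samples: Iterable[Sample],
    max_timeout_rate: float = 0.01,    # assumed threshold, not from the text
    max_exception_rate: float = 0.01,  # assumed threshold, not from the text
) -> bool:
    """Return True if the observed window breaches either threshold."""
    total = timeouts = exceptions = 0
    for s in samples:
        total += s.requests
        timeouts += s.timeouts
        exceptions += s.exceptions
    if total == 0:
        # Assumption: no traffic during the window is treated as healthy.
        return False
    return (timeouts / total > max_timeout_rate
            or exceptions / total > max_exception_rate)
```

Aggregating over the whole window rather than per interval avoids rolling back on a single noisy minute; a real pipeline might reasonably choose the opposite trade-off.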

Only two people are involved in the whole process: the developer and the tester. With so few people in the loop and no separate manual testing stage, releases are very agile.

Some will ask: how does this delivery system ensure delivery quality? After all, the code reaches production in about half an hour, so there is no time to test it. How is code quality guaranteed?

We do have safeguards: the purple modules in the figure are quality assurance modules. In addition to these modules, there is also a very strict test-exemption check.

5.2 Prerequisites for test-free release

image.png

In terms of line coverage and method coverage, we are actually stricter than Tencent Advertising, Tencent Docs, and App Bao. However, we do not think line coverage and method coverage mean much on their own, because even when our line coverage reaches 90%, our branch coverage may be less than 30%. So of all the coverage metrics, the most important is branch coverage. Where does our confidence to release without testing come from? From filling in branch coverage.
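A small example shows why line coverage can look healthy while branch coverage does not. The function below is hypothetical and chosen only for illustration; prices are integers in cents to keep the arithmetic exact.

```python
def apply_discount(price_cents: int, is_member: bool) -> int:
    """Members pay 90% of the price; everyone else pays full price."""
    total = price_cents
    if is_member:
        total = price_cents * 90 // 100
    return total


# A single test with is_member=True executes every line of the function,
# so line coverage is 100%. It takes only one of the two outcomes of the
# `if`, so branch coverage is 50%: the non-member path is never checked.
member_total = apply_discount(1000, True)
```

A second test with `is_member=False` adds no new lines to the coverage report but is the only thing that verifies the full-price path, which is exactly the kind of gap branch coverage exposes.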

How hard is it to reach 60% branch coverage?

For example: you have written 700 unit tests with 85% line coverage, but branch coverage is only about 20%. To reach the passing line, you may need to add another 3000 lines of unit test code!

In addition, we require that for every release the line coverage, method coverage, and branch coverage figures are not lower than in the previous release.
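The two rules above (a 60% branch coverage pass line, and no metric lower than in the previous release) can be sketched as a simple gate. The function name `coverage_gate` and the dictionary layout are assumptions made for this illustration; the real pipeline's interface is not described in the text.

```python
from typing import Dict, List

MIN_BRANCH_COVERAGE = 60.0  # pass line mentioned in the text, in percent
METRICS = ("line", "method", "branch")


def coverage_gate(previous: Dict[str, float],
                  current: Dict[str, float]) -> List[str]:
    """Return a list of failure reasons; an empty list means the gate passes."""
    failures: List[str] = []
    for metric in METRICS:
        # Ratchet: no metric may drop below the previous release.
        if current[metric] < previous[metric]:
            failures.append(
                f"{metric} coverage dropped from "
                f"{previous[metric]}% to {current[metric]}%"
            )
    if current["branch"] < MIN_BRANCH_COVERAGE:
        failures.append(
            f"branch coverage {current['branch']}% is below "
            f"the {MIN_BRANCH_COVERAGE}% pass line"
        )
    return failures
```

The ratchet means coverage can only stay level or rise over time, so the work invested in reaching the pass line cannot silently erode.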

We have now built 58 test-free pipelines; that is, 58 services have begun using this green channel to release quickly to production.

6. Coding practice

We have been working on coding practice for more than a year. After more than a year of repeated CR, the code now being submitted conforms closely to the standard, except for that from new colleagues.

6.1 MR standard

image.png

  1. Each submission may contain no more than 200 lines of code. Between 200 and 500 lines, others may review it when they are in a good mood. Beyond 500 lines, only the director will do your CR.
  2. Binary files may not be submitted; only this keeps the repository from growing too fast. Binaries, doc files, and the like are all stored in Tencent software source and iwiki.
  3. Do not write code in parallel. When you have finished a piece of code and opened a CR, do not immediately start developing another feature. The right thing to do is to urge others to review your code, fix the issues raised in CR as soon as possible, merge into the main branch, and then create a new branch from the main branch for new development.
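The size rule in the first item could be enforced automatically along the following lines. This is a sketch only: `review_route` and its return labels are hypothetical, and treating exactly 500 changed lines as still below the director threshold is an assumption, since the text does not define the boundary.

```python
def review_route(changed_lines: int) -> str:
    """Classify an MR by size according to the 200/500-line rule."""
    if changed_lines < 0:
        raise ValueError("changed_lines must be non-negative")
    if changed_lines <= 200:
        return "normal"        # within the limit: regular CR
    if changed_lines <= 500:   # assumption: exactly 500 is still best-effort
        return "best-effort"   # reviewers may or may not pick it up
    return "director"          # oversized: only the director reviews it
```

A check like this would typically run when the MR is opened, so the author learns about an oversized change before asking anyone for review.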

6.2 Code design

Code design is a subtle art; Cheaterlin's articles are worth reading.

Let me focus on the following points.

image.png
  1. Do not write so many logs. Some of our services run without any anomaly all day, yet print 80 GB of debug logs in a single day, which wastes a great deal of resources.
  2. Log the key information. A log with thousands of lines that still lacks the key information is useless.
  3. Do not use magic numbers or magic strings. Used without any comment, they make the code rot immediately.
  4. Code cleanliness: when you find something you do not like, list it and deal with it as soon as possible.
  5. Fix proactively: when you find problems outside your own CR, fix them in passing and leave a note to share the experience. The whole team atmosphere will be better.
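As a small illustration of the point about magic numbers and magic strings, compare an unexplained literal with a named constant. Everything in this sketch (the retry limit, the status value, the function names) is hypothetical and not taken from the team's code.

```python
# Before (hard to review): what do 3 and "paid" mean here?
#     if attempt < 3 and status != "paid": ...

# After: the intent is stated once, next to the value.
MAX_PAYMENT_RETRIES = 3       # give up after three failed attempts
ORDER_STATUS_PAID = "paid"    # terminal status: nothing left to retry


def should_retry_payment(attempt: int, status: str) -> bool:
    """Retry only while the order is unpaid and attempts remain."""
    return attempt < MAX_PAYMENT_RETRIES and status != ORDER_STATUS_PAID
```

With named constants, a reviewer can question the value itself in CR, and a later change touches one line instead of every place the literal was copied.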

6.3 Code CR actual combat

Our department has been practicing CR all along, and learning through practice what CR really is.

Let's look at the following code: in just 20 lines, from line 12 to line 32, see how many issues a CR can turn up.

image.png

CR example:

image.png

The modified code:

image.png

6.4 Code specification

Finally, here is the code specification we currently follow.

image.png

7. Q&A

Q: Advancing the big warehouse cannot have been plain sailing for you. There must have been many ups and downs. Did you record the problems you ran into and how you solved them, so that we can use them for reference?

A: Unfortunately, we did not keep written records while doing these things. But basically every problem that was solved is preserved in the commit messages of the big warehouse.

Q: Is there any problem that left a deep impression on you?

A: It is really the problem of engineers' coding discipline, which I think is the biggest challenge. Not everyone will follow your rules. Through carelessness or for other reasons, someone may modify code in a way that breaks other people's builds. Second, you may be unable to release your code immediately, or your changes may be carried along in someone else's release. These are problems we encountered in practicing the big warehouse. They happen occasionally, but so far there has been no online incident.

Q: But I would worry about a major accident. For example, at the beginning you presumably did not have many unit tests or automated test cases? How did you guarantee code quality then?

A: Yes. Before the big warehouse, our unit tests were lacking and everyone did things their own way. After moving into the big warehouse, the controls became very strict, and we asked everyone to add unit tests gradually. It took us more than half a year to raise unit test coverage across all services from 20% to 80% and branch coverage from 10% to 60%. A lot of manpower was indeed invested in this.

Q: The CR practice you just described mostly addresses surface-level problems. Are there tools or methods to guarantee this?

A: Yes, we do have tools, and there are three sets of checks. But tools are not everything; code design problems still need human eyes. Tools can solve 90% of the problems, but the remaining 10% need a human reviewer, and that 10% is often the most important.

Q: Do you have any good experience with moving old code into the big warehouse? Does it all have to be refactored? Or do you let the old code in first and then do CR only for new code?

A: In fact, we were doing exactly this a while ago. We took over a batch of code from other colleagues and are migrating it into the big warehouse. We assign interns, fresh graduates, or newly joined colleagues to do the migration, so that newcomers get familiar with the code through the process of bringing it up to standard. Code goes into the big warehouse only after everything is fixed, rather than being merged first and repaired later.

If we merged first and cleaned up afterwards, the cleanup would probably drag on for a year or two. So we are very strict about admission.

Q: Back to CR. Suppose we develop a relatively large feature that requires a lot of code. Your rule is that each CR cannot exceed 200 lines, but 200 lines is not even enough for me to finish the skeleton. What should I do in that case?

A: This shows that your split is not fine-grained enough. When you split the work down to atomic units, each one can be solved within a single function. What is the average amount of code a Tencent employee submits per day of development? 167 lines. So 200 lines is just about one person's daily workload. If you submit more than 200 lines, it is hard for reviewers to pay attention to your code logic.

Every piece of work can be broken down. Do not think that finishing a big requirement in one go and submitting it all at once will be fulfilling. Such long coding stretches gradually tire people out and are instead prone to quality problems.

Original site

Copyright notice
This article was written by [economic expert]. Please include a link to the original when reprinting. Thank you.
https://yzsam.com/2021/09/20210906174920114r.html