
Exploring performance optimization: from 2 months down to 4 hours!

2022-06-12 01:46:00 androidstarjack


Author: Dodger sun | Blog Garden (cnblogs)

https://www.cnblogs.com/flashsun

I used to have no idea how to approach performance optimization or how to think about it, until recently I took over a small company project that, as the Chinese saying goes, was "a sparrow with all five organs": small, but complete. It taught me a lot about performance optimization, or at least about ways of thinking about it. When you take a small loss of efficiency at any single point and multiply it by a large enough factor, the result becomes astronomical. At first my program was estimated to need 2 months to finish; after 2 weeks of continually adjusting its structure and details, I got it down to 4 hours.

I picked up a lot of experience along the way and hope to share it with you. Criticism and corrections are very welcome, so we can all improve together.

1. Project description

Abstracting away the company specifics, the project does roughly the following:

1. Database A holds 20 million user records;

2. Read the users from database A, generate a GUID for each user, and save them to database B;

3. At the same time, generate an association table in database A.


The project requirements are:

1. Users must be saved to database B by calling the SDK's registration interface; inserting them directly through JDBC is not allowed;

2. Data must be recoverable: a rerun skips records that already succeeded, and failed records must be persisted so that you can choose to recover them on the next run;

3. Data must be consistent: if nothing goes wrong, the users in database B must correspond one-to-one with the association table in database A. If errors occur, the successful data plus the recorded error data must together be consistent;

4. It must be as fast as possible: 20 million records in total, to be finished within one day without sacrificing correctness.

2. First edition: procedural (2 months)

Characteristics: procedural, single-threaded, not extensible, extremely tightly coupled, row-by-row inserts, data not recoverable.

[Figure: first-edition flowchart]

The original version combined every shortcoming a project can have. The whole flow read one record from database A, processed it immediately, called the interface to insert it into database B, then concatenated a SQL statement and inserted into database A. There was no counter and no error handling. The resulting code was estimated to need 2 months to process all 20 million records. If even one record failed along the way, we would have to start the 2 months all over again. Absolutely horrible.

This flowchart is practically useless: it follows purely procedural thinking, the whole program lives in one big main method, and the code flow mirrors the business flow step for step. It is easy to come up with, but extremely hard to implement and maintain; the code is long and tangled, and it is almost impossible to extend. Setting aside code aesthetics, there are two main reasons it is so inefficient:

1. The speed of each record is bounded by the slowest link in the whole chain. Imagine that inserting one record into database A's association table got stuck for nearly a minute (an exaggeration): the JVM would spend that entire minute waiting, when it could have been carrying on with the first two steps. It is like doing other things while you wait for the eggs to boil.
2. Inserting a user into database B requires calling the SDK interface (an HTTP request), so every call has to establish a connection, wait for the response, and release the connection again. It is like delivering a box of apples to a friend by splitting it into 100 trips, one apple each.
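The second reason is easy to put numbers on. The minimal Java sketch below is only arithmetic: the number of SDK calls is the record count divided by the records sent per call, rounded up. The batch size of 1,000 is a hypothetical value; the article does not say which batch size it used.

```java
// Illustrative arithmetic only: how many SDK (HTTP) calls 20 million records need.
final class RequestCountEstimate {
    /** Ceiling division: a final, partially filled batch still costs one call. */
    static long requestsNeeded(long records, int recordsPerCall) {
        return (records + recordsPerCall - 1) / recordsPerCall;
    }

    public static void main(String[] args) {
        long records = 20_000_000L;
        System.out.println(requestsNeeded(records, 1));     // one record per call: 20000000
        System.out.println(requestsNeeded(records, 1_000)); // hypothetical batch of 1,000: 20000
    }
}
```

Each call pays the connection setup and round trip once, so cutting the call count by three orders of magnitude removes most of that fixed overhead.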

3. Second edition: object-oriented (21 days)

Characteristics: object-oriented, single-threaded, extensible, slightly coupled, batch inserts, data recoverable.


3.1 Architecture design

The second edition made several improvements based on the problems of the first. The most obvious is the shift from procedural to object-oriented design.

I split the whole process apart and handed each part to a different object. The objects are assigned as follows:

1. A configuration object, BatchStrategy. It reads this task's strategy from a configuration file and passes it to the executors. The configuration covers basics such as the total count, the number of records per query batch and the number per insert batch, plus the data source and details such as the source table name and column names. That way, a similar import from some other database can be handled just by changing the configuration.

2. Three executors. The whole run splits into three parts, reading data, processing data and writing data, handled respectively by three objects: Reader, Processor and Writer. If the logic of one part changes, it can be modified independently without affecting the others.

3. A failed-data handler, ErrorHandler. Whenever a record raises an exception, it is handed to this class, which writes it to a log or applies some other remedy. This decouples the handling of failed data to a certain extent.

This design decouples things to a large degree; the handling of failed data in particular is almost completely decoupled. However, a main method is still needed to call the three objects in turn, so they are not fully decoupled, and that part of main remains procedural and fairly complicated. Moving the logic from main into a service would not solve this either.
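A minimal sketch of this split, under stated assumptions: the article does not show its code, so the BatchStrategy fields, the method signatures of Reader, Processor, Writer and ErrorHandler, and the SecondEditionDriver class are all hypothetical. The driver loop shows why the design is still coupled: every iteration runs read, process and write strictly one after another.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical configuration object; field names are illustrative.
final class BatchStrategy {
    final long total;          // total number of records to migrate
    final int readBatchSize;   // records fetched per query
    final int writeBatchSize;  // records written per insert call
    final String sourceTable;  // e.g. the user table in database A

    BatchStrategy(long total, int readBatchSize, int writeBatchSize, String sourceTable) {
        this.total = total;
        this.readBatchSize = readBatchSize;
        this.writeBatchSize = writeBatchSize;
        this.sourceTable = sourceTable;
    }
}

interface Reader<T> { List<T> read(long offset, int limit); }
interface Processor<T, R> { R process(T record) throws Exception; }
interface Writer<R> { void write(List<R> batch); }
interface ErrorHandler<T> { void handle(T record, Exception cause); }

final class SecondEditionDriver {
    // "main" still orchestrates: each loop waits for read, then process, then write.
    static <T, R> void run(BatchStrategy s, Reader<T> reader, Processor<T, R> processor,
                           Writer<R> writer, ErrorHandler<T> errors) {
        for (long offset = 0; offset < s.total; offset += s.readBatchSize) {
            List<T> page = reader.read(offset, s.readBatchSize);
            if (page.isEmpty()) break;
            List<R> out = new ArrayList<>();
            for (T record : page) {
                try {
                    out.add(processor.process(record));
                } catch (Exception e) {
                    errors.handle(record, e);   // failure handling is delegated
                }
            }
            for (int i = 0; i < out.size(); i += s.writeBatchSize) {
                writer.write(out.subList(i, Math.min(i + s.writeBatchSize, out.size())));
            }
        }
    }
}
```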

3.2 Efficiency problems

The first edition's row-by-row inserts were replaced with batch inserts. For the SDK interface, a whole batch of records is passed in per call, which reduces the number of HTTP requests. For generating the association table, JDBC batching is used: execute was replaced with executeBatch, and the improvement was obvious. Together these two batching changes brought a run that would have taken two months down to 21 days, but that was still astronomical.

As you can see, this gain comes only from fewer HTTP requests and better SQL insert logic. It still does not fix the fatal problem of the first edition: the speed of each cycle is still bounded by the slowest link in the chain. This shows that the three parts are not decoupled: while the other two have not finished their work, each part just waits, and that is the most serious loss of efficiency.
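The JDBC side of the change can be sketched as follows, assuming a hypothetical association table `user_relation(user_id, guid)`: rows are queued with `PreparedStatement.addBatch()` and sent with `executeBatch()` every `batchSize` rows, instead of one `execute()` per row.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class RelationBatchInsert {
    // Hypothetical table and column names.
    static final String SQL = "INSERT INTO user_relation (user_id, guid) VALUES (?, ?)";

    /** Inserts {userId, guid} rows with JDBC batching; returns the number of rows sent. */
    static int insertAll(Connection conn, List<String[]> rows, int batchSize) throws SQLException {
        int sent = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();                  // queue the row instead of executing it
                if (++pending == batchSize) {
                    ps.executeBatch();          // send the queued rows together
                    sent += pending;
                    pending = 0;
                }
            }
            if (pending > 0) {                  // flush the final partial batch
                ps.executeBatch();
                sent += pending;
            }
        }
        return sent;
    }
}
```

How much a batch actually saves depends on the driver. With MySQL Connector/J, for example, batched inserts are generally only rewritten into multi-row INSERT statements when `rewriteBatchedStatements=true` is set on the connection URL.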

4. Third edition: fully decoupled (queues + multithreading), 3 days

Characteristics: object-oriented, multithreaded, extensible, fully decoupled, batch inserts, data recoverable.


4.1 Architecture design

This version was never implemented in code, but it was an important step in the thinking that led to the next edition, so I record it here. It brings two major improvements over the previous version: queues and multithreading.

Queues: using queues fully decouples the executor classes that were still coupled in the previous version and turns a synchronous flow into an asynchronous one; it is also the prerequisite for multithreading. The Reader only reads data and puts it into a queue. It does not care how the next stage, the Processor, handles the queued data, so it can simply keep reading. This achieves full decoupling and also allows multiple threads to process the queued data.

Multithreading: the Processor and the Writer both read data from their own queues and process it; the only difference is that the Processor also puts its results into the next stage's queue. The queues are the thread-safe ConcurrentLinkedQueue, so both stages can run on multiple threads. Because all stages are fully decoupled, an occasional stall in one stage does not hold up the whole process, so efficiency improved by far more than a little.

Another point is that this design guarantees data recoverability. Successful users are saved, so a rerun does not conflict with them, and failed association-table records are recorded as well; on the next run, the Writer first adds those records to its own queue. Data correctness is thus covered by a scheme that, while not perfect, works, and efficiency improved considerably too.
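A minimal sketch of one queue between two stages, assuming an in-memory integer stream in place of real user records: one Reader thread offers items to a `ConcurrentLinkedQueue` without waiting for anyone, while several Processor threads poll it concurrently. The stopping rule (the reader has finished and the queue is empty) is an assumption of this sketch; the article's own end-of-stream mechanism is described in the fourth edition.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class QueueStageDemo {
    /** One reader fills the queue while `workers` threads drain it; returns items processed. */
    static int run(int items, int workers) throws InterruptedException {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>();
        AtomicBoolean readerDone = new AtomicBoolean(false);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers + 1);

        // Reader: only reads and enqueues; it never waits for processing.
        pool.submit(() -> {
            for (int i = 0; i < items; i++) queue.offer(i);
            readerDone.set(true);               // set only after every offer
        });

        // Processors: poll until the reader is done AND the queue is empty.
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                while (true) {
                    Integer item = queue.poll();
                    if (item != null) {
                        processed.incrementAndGet();   // stand-in for real processing
                    } else if (readerDone.get() && queue.isEmpty()) {
                        return;
                    } else {
                        Thread.yield();                // nothing yet: let others run
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }
}
```

Polling a `ConcurrentLinkedQueue` means idle threads spin; a `BlockingQueue` with `take()` would avoid that, at the cost of needing a different shutdown signal.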
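Recoverability can be sketched like this, assuming successes and failures are kept in memory (the real project persisted them, by means the article does not describe): ids that already succeeded are skipped, and failures from the previous run go to the front of the work list. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class RecoverableRun {
    // In the real project these would survive between runs; here they live in memory.
    final Set<String> succeeded = new HashSet<>();
    final List<String> failedLastRun = new ArrayList<>();

    /** One run over all user ids; `write` returns false when an id fails. */
    void run(List<String> allIds, Predicate<String> write) {
        Set<String> work = new LinkedHashSet<>(failedLastRun); // retry earlier failures first
        failedLastRun.clear();
        for (String id : allIds) {
            if (!succeeded.contains(id)) work.add(id);        // skip ids that already succeeded
        }
        for (String id : work) {
            if (write.test(id)) succeeded.add(id);
            else failedLastRun.add(id);                       // keep for the next run
        }
    }
}
```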

4.2 Efficiency problems

Although the run time dropped from 21 days to 3 days, there were still questions to think about. During execution I noticed that the Writer's completed data always trailed the Processor's, which means the Processor is slower than the Writer: before anything is inserted into the database, the Processor has to go through the user-registration business logic. This raises a question: when an upstream stage is slower than the downstream one, does the downstream stage need batching at all? The answer is no. Imagine you are on a production line where the stage before you produces one part every 2 seconds, while you need 1 second per part. Even if batching made your own work faster, from the point of view of optimizing the whole system you should handle each part as soon as it arrives, rather than waiting for 100 parts to accumulate and processing them as a batch.
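The production-line argument as arithmetic: a pipeline's steady-state throughput is capped by its slowest stage, and batching downstream only adds waiting. With the article's numbers (upstream 2 s per part, you 1 s per part) and a hypothetical batch of 100, the first part of each batch sits for (100 - 1) × 2 = 198 s before anything is written, while throughput stays at 0.5 parts per second either way.

```java
final class PipelineMath {
    /** Steady-state throughput (items per second) is set by the slowest stage. */
    static double throughput(double... secondsPerItem) {
        double slowest = 0;
        for (double s : secondsPerItem) slowest = Math.max(slowest, s);
        return 1.0 / slowest;
    }

    /** How long the first item of a batch waits for the rest of the batch to arrive. */
    static double firstItemWait(int batchSize, double upstreamSecondsPerItem) {
        return (batchSize - 1) * upstreamSecondsPerItem;
    }

    public static void main(String[] args) {
        System.out.println(throughput(2.0, 1.0));    // 0.5
        System.out.println(firstItemWait(100, 2.0)); // 198.0
    }
}
```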

There was another problem: we had never considered the Reader's performance. I read the database in batches with LIMIT, and with MySQL a LIMIT with an offset has to read through and discard all the rows before the offset, so it gets slower and slower as the starting position grows. Reading rows 0 to 1,000 is easy, but from 10 million to 20 million it is hopeless. So the efficiency bottleneck ended up in the database read.
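Why offset paging degrades: if each page is fetched with `LIMIT offset, n`, the database walks past `offset` rows before returning `n`, so the k-th page (counting from 0) touches about (k + 1) × n rows. Summed over 20 million records in hypothetical pages of 1,000, that is about 2 × 10^11 row visits, versus roughly 2 × 10^7 if every page could start directly at its first row. The sketch only does that sum; it models row visits, not actual MySQL timings.

```java
final class OffsetPagingCost {
    /** Rows visited when reading `total` rows as LIMIT offset, pageSize pages. */
    static long rowsVisitedWithOffset(long total, long pageSize) {
        long visited = 0;
        for (long offset = 0; offset < total; offset += pageSize) {
            // skip `offset` rows, then read up to pageSize rows
            visited += Math.min(offset + pageSize, total);
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(rowsVisitedWithOffset(20_000_000L, 1_000L)); // 200010000000
    }
}
```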

5. Fourth edition: highly abstract (one-click start), 4 hours

Characteristics: a common interface, multithreaded, extensible, fully decoupled, batch or row-by-row inserts, data recoverable, optimized LIMIT queries.

5.1 Thinking about the architecture

Elegant code should be neat and clean, not long and convoluted. This edition aims for an architecture as simple as the first one, while surpassing all previous versions in performance and scalability.

Summarizing the first three editions, I found that the Reader, Processor and Writer share common traits: start the task, process tasks, end the task. The Reader and Processor also share the ability to pass data to the next stage and to notify the next stage that the data transfer has ended. They are like workstations on a production line: interrelated, yet each running independently. Each station can be started and then process tasks flat out until the previous station tells it that it has finished. The Reader is the first to finish and send the notice; each stage then notifies the next, until the whole process stops. It is a lovely process.


So all three can be treated as a Job, and every stage except the Reader can also interact with the previous stage (the Reader's upstream is really the database). That leads to the following interface design:

[Figure: Job and Interactive interface definitions]

With this interface design, no matter how the implementation classes are written, the main method can be written up front, and it becomes remarkably neat and orderly.

Only the core of main is shown; some details, such as log output and timing, have been removed.

[Figure: core of the main method]
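Since the interface and main-method screenshots did not survive, here is a minimal sketch of the idea rather than the author's code: `Interactive`, `receive` and `closeInteractive` are named in the article, while modelling `Job` as a `Runnable` and the `JobLauncher.runAll` helper are assumptions. Once every stage is a Job, main only has to start them all and wait.

```java
import java.util.ArrayList;
import java.util.List;

/** Every stage (Reader, Processor, Writer): start, process until told to stop, end. */
interface Job extends Runnable { }

/** Every stage except the Reader, whose upstream is the database. */
interface Interactive<T> {
    void receive(T item);       // accept one item from the previous stage
    void closeInteractive();    // the previous stage has finished
}

final class JobLauncher {
    /** The whole of "main": start every job on its own thread, then wait for all of them. */
    static void runAll(List<? extends Job> jobs) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (Job job : jobs) {
            Thread t = new Thread(job);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
    }
}
```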

Next come the implementation classes, which mainly implement three functions:

1. Receive data from the previous stage: this is the implementation of the receive method of the Interactive interface. Following the earlier design, each object holds a ConcurrentLinkedQueue field that receives data from the previous stage.

2. Process data and pass it to the next stage: each object that has a next stage holds a reference to that next-stage object. For example, the Reader holds the Processor, and the Processor holds the Writer. Whenever there is data to add to the next stage's queue, it simply calls that object's receive method.

3. Tell the next stage "I'm done": when its own task ends, an object calls the next stage's closeInteractive method. How each object decides that it is finished depends on the situation. The Reader ends when the amount of data read in batches exceeds the total configured at the start, which means all the data has been read. The Processor ends when the previous stage has told it that it is finished and polling its own queue returns nothing; it then notifies the next stage. This lets the whole process exit safely and in order. However, because the Processor is multithreaded, a single Processor thread cannot just notify the Writer when it finishes. A counter is needed inside the Processor, and only the thread that brings the counter to the expected value sends the end notice.
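The three functions and the end-of-stream counter can be sketched as follows. This is a minimal in-memory sketch under assumptions: the Reader is a lambda that emits integer ids instead of querying database A, the Processor's "registration" just builds a string, and the Writer appends to a list instead of inserting into a table. `receive` and `closeInteractive` follow the article's naming; everything else is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

interface Interactive<T> {
    void receive(T item);     // 1. accept data from the previous stage
    void closeInteractive();  // 3. the previous stage says it is finished
}

/** Writer: one thread, writes each item as soon as it arrives (no batching). */
final class Writer implements Interactive<String>, Runnable {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private volatile boolean upstreamClosed;
    final List<String> written = new ArrayList<>();   // stand-in for the database

    @Override public void receive(String item) { queue.offer(item); }
    @Override public void closeInteractive() { upstreamClosed = true; }

    @Override public void run() {
        while (true) {
            String item = queue.poll();
            if (item != null) written.add(item);
            else if (upstreamClosed && queue.isEmpty()) return;
            else Thread.yield();
        }
    }
}

/** Processor: one instance shared by several threads; the last thread out closes the Writer. */
final class Processor implements Interactive<Integer>, Runnable {
    private final Queue<Integer> queue = new ConcurrentLinkedQueue<>();
    private volatile boolean upstreamClosed;
    private final Writer next;              // 2. reference to the next stage
    private final AtomicInteger running;    // Processor threads still working

    Processor(Writer next, int threads) {
        this.next = next;
        this.running = new AtomicInteger(threads);
    }

    @Override public void receive(Integer id) { queue.offer(id); }
    @Override public void closeInteractive() { upstreamClosed = true; }

    @Override public void run() {
        while (true) {
            Integer id = queue.poll();
            if (id != null) next.receive("user-" + id);        // stand-in for SDK registration
            else if (upstreamClosed && queue.isEmpty()) break;
            else Thread.yield();
        }
        if (running.decrementAndGet() == 0) next.closeInteractive(); // only the last thread signals
    }
}

final class Pipeline {
    static List<String> run(int total, int processorThreads) throws InterruptedException {
        Writer writer = new Writer();
        Processor processor = new Processor(writer, processorThreads);
        // Reader: its upstream is the database, so here it is a plain Runnable.
        Runnable reader = () -> {
            for (int id = 0; id < total; id++) processor.receive(id);
            processor.closeInteractive();
        };
        List<Thread> threads = new ArrayList<>();
        threads.add(new Thread(reader));
        for (int i = 0; i < processorThreads; i++) threads.add(new Thread(processor));
        threads.add(new Thread(writer));
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        return writer.written;
    }
}
```

Using a shared `AtomicInteger` means no Processor thread needs to know about the others; whichever one decrements the counter to zero is, by construction, the last to finish.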

5.2 Efficiency problems

As noted for the previous edition, the Processor is slower than the Writer, so the Writer does not need batching for its inserts; inserting item by item actually improves performance.

The cost of LIMIT on a large table: because we had only tested on the first few million records, we underestimated this efficiency loss. Over the last few million records, each LIMIT query could barely return anything. With that in mind, I picked the only field that had an index and was reasonably easy to sort, the user's mobile phone number (I won't even start complaining about why their design has no usable id...). Each query orders the whole table by mobile number and then applies LIMIT. After each query, the last mobile number returned is saved as a marker of the last record read so far, and the next LIMIT query starts after that number. This way every query runs at the same speed no matter where it starts. Although the early part of the data is read much more slowly than with the previous scheme, this completely solves the excessive waiting of LIMIT on large data volumes and removes that risk.
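The mobile-number trick is what is usually called keyset (or seek) pagination. A minimal sketch, assuming hypothetical table and column names `t_user` and `mobile` for the SQL, and using a `TreeSet` as an in-memory stand-in for the index so it runs without a database. It also assumes mobile numbers are unique; if they were not, rows sharing the last number of a page could be skipped, and a tie-breaking column would be needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class KeysetPaging {
    // What the Reader's queries could look like (hypothetical table/column names).
    static final String FIRST_PAGE = "SELECT * FROM t_user ORDER BY mobile LIMIT ?";
    static final String NEXT_PAGE = "SELECT * FROM t_user WHERE mobile > ? ORDER BY mobile LIMIT ?";

    /** In-memory stand-in for NEXT_PAGE: up to pageSize keys strictly after lastSeen. */
    static List<String> nextPage(NavigableSet<String> index, String lastSeen, int pageSize) {
        NavigableSet<String> rest = (lastSeen == null) ? index : index.tailSet(lastSeen, false);
        List<String> page = new ArrayList<>(pageSize);
        for (String mobile : rest) {
            if (page.size() == pageSize) break;
            page.add(mobile);
        }
        return page;
    }

    /** Reads every key page by page, remembering only the last key of each page. */
    static List<String> readAll(NavigableSet<String> index, int pageSize) {
        List<String> all = new ArrayList<>();
        String lastSeen = null;
        while (true) {
            List<String> page = nextPage(index, lastSeen, pageSize);
            if (page.isEmpty()) return all;
            all.addAll(page);
            lastSeen = page.get(page.size() - 1);   // bookmark for the next query
        }
    }
}
```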

With that, the project architecture is simple again, but it is a completely different kind of simplicity from the first edition's.


6. Thoughts on further optimization

1. The Reader runs on a single thread. Because it reads from the database rather than from a queue, making it multithreaded is somewhat troublesome, but not impossible; this is one possible optimization point.

2. Logging takes up a large share of the time: reading, processing and writing 20 million records produces at least 60 million log lines. Making logging asynchronous should improve efficiency a lot.
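A minimal sketch of asynchronous logging, assuming a hypothetical `AsyncLog` class: callers only enqueue a line, and a single background thread drains a `LinkedBlockingQueue` into whatever sink is supplied. In practice you would more likely switch on the asynchronous mode of your logging framework (for example Logback's AsyncAppender or Log4j 2's asynchronous loggers) than write this yourself.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class AsyncLog implements AutoCloseable {
    // A distinct String object used as an end marker; compared by identity, not content.
    private static final String POISON = new String("EOF");
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread drainer;

    AsyncLog(Consumer<String> sink) {
        drainer = new Thread(() -> {
            try {
                while (true) {
                    String line = queue.take();     // blocks while there is nothing to log
                    if (line == POISON) return;
                    sink.accept(line);              // the slow I/O happens on this thread only
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        drainer.start();
    }

    /** Returns immediately; the worker thread does not wait for I/O. */
    void log(String line) { queue.add(line); }

    /** Flushes everything queued so far, then stops the background thread. */
    @Override public void close() throws InterruptedException {
        queue.add(POISON);
        drainer.join();
    }
}
```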

That is what I learned from optimizing this project; I hope you can give me some advice. Since the code belongs to the company, I won't post it on GitHub, to avoid any conflict of interest, but anyone interested is welcome to message me privately.


Copyright notice
This article was created by [androidstarjack]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/163/202206120142224076.html