当前位置:网站首页>Explore performance optimization! Performance improvement from 2 months to 4 hours!
Explore performance optimization! Performance improvement from 2 months to 4 hours!
2022-06-12 01:46:00 【androidstarjack】
Click on the top “ Terminal R & D department ”
Set to “ Star standard ”, Master more database knowledge with you author : Dodger sun | Blog Garden
https://www.cnblogs.com/flashsun
I don't know what to do with performance optimization , How to think about it , Until recently, I took over a small project of the company , It can be said that the sparrow has five internal organs . Let me learn a lot about performance optimization , Or some ways of thinking . When you really feel the loss of efficiency at any point and magnify it by a certain multiple , It's going to be astronomical . At first, my program calculated that I needed to run 2 Months to finish , after 2 Constantly adjust the structure and details weekly , Improved performance to 4 Hours to complete .
A lot of experience , Hope to share with you , I also hope to criticize and correct more , Common progress .
One 、 Project description
I abstract the company's project content , Probably to do such a thing :
1、 database A There is 2000 10000 user data ;
2、 Will database A User readout in , Build for each user guid, And save to database B in ;
3、 At the same time in the database A Generate association table in ;

Project requirements are :
1、 Save users to database B The procedure of needs to call sdk Interface for registration , Direct operation not allowed jdbc Insert ;
2、 Data needs to be recoverable : Run again to skip successful data ; The data in error should be persisted so that you can choose to recover this part of data next time ;
3、 Data should be consistent : Without error , database B The users of must correspond to the database one by one A Association table . If something goes wrong , Then the correct data plus the recorded error data should be consistent ;
4、 Speed should be as high as possible : common 2000 Ten thousand data , On the premise of correctness , Within one day ;
Two 、 The first edition : Process oriented ——2 Months
features : Process oriented 、 Single thread 、 Non expansion 、 Extreme coupling 、 Insertion by item 、 Data is not recoverable

The original version is a collection of all the shortcomings of a project . The whole process is from A The library reads a piece of data , Do it now , Then call the interface to insert. B library , And then I'm trying to spell the sql sentence , Insert A library . No counter , No error message handling . The resulting code eventually predicts 2000 10000 pieces of data to be processed 2 Months . If even one piece of data in the middle goes wrong , We're going to do it again 2 Months . Absolutely horrible .
This flow chart is equivalent to nonsense , It is based entirely on process oriented thinking , The whole code is in a big main Method , The actual business process is exactly the same as the code process . Simple to think about , But it is extremely difficult to realize and maintain , Code structure is long and confusing . And it's almost non scalable . Let's not talk about the beauty of code design , There are several main reasons why it is so inefficient :
1、 The speed of each piece of data is controlled by the slowest link in the whole chain . Imagine if there was one A The data that the library inserts the associated table is stuck , Waiting for nearly 1 minute ( Exaggerated ), That's a minute jvm Just waiting , It can go on with the first two steps . Just as you can do everything else while you wait for the eggs to boil .
2、 towards B Library insertion user needs to call sdk(HTTP request ) Interface , Then each call needs to establish a connection , Waiting response , Release the link again . Just like you're going to give a friend a box of apples , You divided 100 One at a time , It's time to get on the way .
3、 ... and 、 The second edition : object-oriented ——21 God
features : object-oriented 、 Single thread 、 Scalable 、 Slightly coupled 、 Batch insert 、 Data recoverable

3.1、 Architecture design
Issues designed according to the first edition , There are some improvements in the second edition . Of course, the most obvious is the transformation from process oriented to object-oriented .
I took the whole process apart , Assign to different objects for processing . such , This is how I assign objects :
1、 A configuration object :BatchStrategy. Read the policy of this task from the configuration file and pass it to the executor , Configuration includes basic configuration such as total number , Quantity per batch query , Quantity per batch insert . There are also data sources , Table name of the Tathagata source table 、 Name 、 etc. , In this way, if you change to a similar import of other databases , It can be expanded through configuration .
2、 Three performers : The whole execution process can be divided into three parts : Reading data -- Processing data -- Writing data , It can be handed over to three objects respectively Reader,Processor,Writer Conduct . So if the logic changes , Can be changed independently without affecting other links .
3、 A failure data processing class :ErrorHandler. In this way, whenever there is an exception in the data , Then throw the modified data to this class , Write log in this class , Or other solutions . Decouple the processing of failed data to a certain extent .
This design largely decouples , In particular, the processing of failure data is basically completely decoupled . But because the whole implementation process still needs to have a main To call three object processing tasks respectively , So the three are not completely decoupled ,main Part of the logic is still process oriented , More complicated . Even if main The logic executed in the service, This problem is still unsolved .
3.2、 The efficiency problem
Since the first edition of the article by article insertion is changed to batch insertion . among sdk In the interface part, a batch of data is passed in , Less http Number of requests . The part of generating association table is used jdbc batch operation , Insert the excute Change it to excuteBatch, Efficiency improvement is obvious . Efficiency improvement brought by these two parts of batch , Code that would have taken two months , Promoted to 21 God , But it's still astronomical .
It can be seen that , This efficiency increase is only reducing http Number of requests , Optimize sql The insertion logic aspect of , But it still hasn't solved a fatal problem in the first edition , The speed of one cycle is still controlled by the slowest link in the whole chain , It can be seen from this point that the three are not decoupled , When the other two don't finish the work , Just wait , This is the most serious loss of efficiency .
Four 、 The third edition : Completely decoupled ( queue + Multithreading )——3 God
features : object-oriented 、 Multithreading 、 Scalable 、 Completely decoupled 、 Batch insert 、 Data recoverable .
4.1、 Architecture design
This version has no code implementation , But it's really an important process of thinking over to the next edition , So it is recorded in . There are two major improvements from the previous version : Queues and multithreading .
queue : Among them, the use of queues makes the execution classes not completely decoupled in the previous version , Complete decoupling is realized , Make synchronous process asynchronous , At the same time, it is also the premise of multithreading .Reader All you have to do is read the data , And put it in the queue , As for its next link Processor How to process the data of a queue , It doesn't care , At this time, you can continue to read the data . This enables complete decoupling , Multithreading can also be used to process queue data .
Multithreading :Processor and Writer The things that were done , Is to read the data in its own queue , Then process . It's just Processor Than Writer It also undertakes the process of putting data in the next loop of the queue . The queue here is multithreaded safe ConcurrentLinkedQueue. Therefore, you can use multithreading to perform the tasks of both . Due to the complete decoupling between all links , The occasional card owner on a ring will not affect the progress of the whole process , So I don't know one or two points about efficiency improvement .
Another point is that the recoverability of data is guaranteed in this design , Successful users are saved so that they can run again without conflict , The failed association table data is also recorded , At next run time Writer I will add this part to my queue first , The correctness of the whole data has a plan that is not particularly perfect , There has been a considerable increase in efficiency .
4.2、 The efficiency problem
Although efficiency from 21 Heaven has risen to 3 God , But we still have some questions to think about . Actually found in the process of execution ,Writer The completed data always follows Processor after . This means that Processor Processing speed is slower than Writer, because Processor Before inserting the database, you need to go through the business logic of registering users . There's a problem , When the speed of the previous ring is slower than that of the next ring , Is batch operation necessary ? The answer is not needed . Just imagine , If you're on the production line , Your last ring 2 One part per second , And your speed is 1 Seconds, one. . At this point, even if your batch processing speed is faster , From the perspective of system optimization , You should also have a part to deal with right away , Instead of waiting to accumulate 100 Individual batch processing .
There's another problem , We never thought about it Reader Performance of . Actually, I use limit Operation to batch read database , and mysql Of limit Check the whole table first and then intercept it , When the starting position is large , It's going to get slower and slower .0-1000 It's easy , but 1000 Wan to 2000 Wan Wan is “ Can't do anything ”. Therefore, the bottleneck of efficiency finally falls on the library reading operation .
5、 ... and 、 The Fourth Edition : Highly abstract ( One button start )——4 Hours
features : An interface 、 Multithreading 、 Scalable 、 Completely decoupled 、 Batch or insert item by item 、 Data recoverable 、 Optimized query limit operation
5.1、 Thinking about architecture
Elegant code should be neat and beautiful , It shouldn't be long and complicated . This edition will be designed to be as simple as the first , Performance and scalability surpass all versions of Architecture .
By summarizing the features of the first three editions , I found that whether it was Reader,Processor,Writer, All have common characteristics : Start the task 、 Processing tasks 、 End task . and Reader and Processor There is also a common way to transfer data to the next process , The function of notifying the end of data transmission of the next operation . They're like processes on a production line , They are interrelated and running independently . Each process can be started , Dealing with tasks crazily , Until the end of the last process notice . And the first one that initiated the notice ended Reader, After that, we will inform the next one , Until the whole process stops , It's a wonderful process .

So we can think of all three as Job, except Reader They also have the ability to interact with the previous process ( Actually Reader The last operation of is database ), So we have the following interface design .


With this interface design , No matter how the implementation class is written , The main method can be written out , Become unusually neat and orderly .
Refine only the main part , Some details have been removed , Such as log output 、 Time record, etc .

The next step is to implement the class , Here, the implementation class mainly implements three functions :
1、 Receive data from the previous ring : Belong to Interactive Interface receive Method implementation , Based on the previous design , That is, one of the objects ConcurrentLinkedQueue Properties of type , Used to receive data from the previous ring .
2、 Process data and pass it to the next ring : In every one of them ( With the next ring ) In object properties , Objects placed in the next ring . Such as Reader Must have Processor object ,Processor Want to have Writer, Once there is data to be added to the next ring of queues , Call its receiive The method can .
3、 Tell the next ring I'm done : At the end of this task , Call the closeInteractive Method . And the way each object judges its end depends on the situation , such as Reader The end condition is that the data read in bulk exceeds the data set at the beginning total, Description data reading completed , Can end . and Processor The end condition is , It's told by the last ring that it's over , And from their own queues poll There's nothing going on , Proof should end , Inform the next link after the end . So that the whole process can exit safely and orderly . But because of multithreading , therefore Processor No notice Writer End signal , Need to be in Processor Get a counter inside , Only the thread with the expected number of counters Processor, To initiate an end notice .
5.2、 The efficiency problem :
As proposed in the previous edition ,Processor Processing speed is slower than Writer, therefore Writer No need to use batch To process data insertion , Instead, it's a way to improve performance .
Large amount of data limit Time consuming operation , Because the test part is only in the first few million tests , So we underestimated the loss of efficiency . In the last few million, it can be said that every time limit We can't read anything . With this in mind , I chose the only field that has an index and is slightly easier to sort “ User's mobile number ”,( They didn't want to make complaints about their designs. id...), Each time the whole table sorts the mobile phone numbers , Again limit Inquire about . Save the last cell phone number after query , It becomes an identification of the last piece of data currently read . Next time limit Operation can be started after this mobile number . So that every query, no matter where it starts , The speed is the same . Although the data speed of the previous part is much slower than that of the previous scheme , But it perfectly solves the problem of large data volume limit Excessive waiting time for operation , Prevent the occurrence of danger .
thus , The project architecture is simple again , But compared with the first edition , It's not the same level of simplicity .

6、 ... and 、 Thinking about continuous optimization
1、Reader Part of it is single thread processing , Because the read is from the database , It's not in the queue , So it's a bit of a hassle to design multithreading , But not necessarily , Here is the optimization point
2、 The log part accounts for a large proportion ,2000 Wan tie Du 、 Handle 、 At least 6000 Ten thousand log output . If it is designed to be asynchronous processing , Efficiency will improve a lot .
This is my experience of this project optimization , I hope you can give me some advice . Because the code is for the company to avoid suspicion , Do not send to github 了 , Interested God can talk privately .
——End——
reply 【idea Activate 】 You can get idea How to activate
reply 【Java】 obtain java Relevant video tutorials and materials
reply 【SpringCloud】 obtain SpringCloud Many relevant learning materials
reply 【python】 Get the full set 0 Basics Python Knowledge Manual
reply 【2020】 obtain 2020java Related interview questions tutorial
reply 【 Add group 】 You can join the technical exchange group related to the terminal R & D department
Read more
use Spring Of BeanUtils front , I suggest you understand these pits first !
lazy-mock , A lazy tool for generating backend simulation data
In Huawei Hongmeng OS Try some fresh food , My first one “hello world”, take off !
The byte is bouncing :i++ Is it thread safe ?
One SQL Accidents caused by , Colleagues are fired directly !!
Too much ! Check Alibaba cloud ECS Of CPU Incredibly reach 100%
a vue Write powerful swagger-ui, A little show ( Open source address attached )
Believe in yourself , Nothing is impossible , Only unexpected, not only technology is obtained here !
If you like, just give me “ Looking at ”边栏推荐
- 如何让杀毒软件停止屏蔽某个网页?以GDATA为例
- pip运行报错:Fatal error in launcher: Unable to create process using
- Database
- 初探性能优化!从2个月到4小时的性能提升!
- 西南林业大学“西林链”通过工信部电子标准院功能测试 | FISCO BCOS案例
- [learn FPGA programming from scratch -20]: quick start chapter - operation steps 4-2-quick use of Altera quartz II tool (Modelsim co simulation, program download to altera development board)
- Kmeans from 0 to 1
- 【科普视频】到底什么是透镜天线?
- Chinese Version Vocaloid AI Tuner Feasibility Test
- 小程序111111
猜你喜欢

Concepts of programs, processes, and threads

php开发 博客系统的公告模块的建立和引入

Several visualization methods of point cloud 3D object detection results (second, pointpillars, pointrcnn)

On the night of the joint commissioning, I beat up my colleagues

Detailed explanation and examples of common parameters of curl

MySQL高级部分知识点

西南林业大学“西林链”通过工信部电子标准院功能测试 | FISCO BCOS案例

华为,这也太强了吧..

博文推荐|BookKeeper - Apache Pulsar 高可用 / 强一致 / 低延迟的存储实现

Bracket generation (backtracking)
随机推荐
Installing mysql-5.7 for Linux (centos7)
Chinese Version Vocaloid AI Tuner Feasibility Test
You don't need to show your talent. The method is here. You earn 9844 in 31 days as an we media
Mise en œuvre de l'ombre de l'animation du Vertex de l'unit é
How to buy children's serious illness insurance, what to pay attention to and how to choose products
Blog recommended | bookkeeper - Apache pulsar high availability / strong consistency / low latency storage implementation
Don't write about the full screen explosion, try the decorator mode, this is the elegant way!!
[project training] wechat official account to obtain user openid
Redis實現消息隊列的4種方案
Basic use of MATLAB
Machines / scenarios not suitable for webrdp
Comprehensive quality of teaching resources in the second half of 2019 - subjective questions
Operating mechanism of Google ads bidding
国资入股,建业地产这回稳了吗?
Simulated 100 questions and simulated examination for safety management personnel of metal and nonmetal mines (small open pit quarries) in 2022
2022 tool fitter (Advanced) recurrent training question bank and online simulation examination question bank source: YiDianTong official account of safety production simulation examination
Shadow implementation of unity vertex animation
Dataset how to use dataset gracefully. After reading this article, you can fully understand the dataset in c7n/choerodon/ toothfish UI
感知机从0到1
Introduction to SVM
