
From 5 Seconds to 1 Second: A Performance Optimization with a "Very" Significant Effect

2022-07-23 17:35:00 【Fat technology house】

Performance optimization can sometimes look like a rather "virtual" technical requirement. Unless the code is unbearably slow, few companies are willing to invest resources in it. Even with performance data in hand, it is hard to persuade leadership to spend time cutting a request from 300 ms to 150 ms, because the improvement has no obvious business value.

It's sad, but that is the reality.

Performance optimization is usually started by people who care about technical quality, and they push it forward based on observed metrics. They tend to have a craftsman's spirit, fighting over every millisecond and striving for perfection. Provided, of course, that they have the time.

1. Background and Objectives

We optimized because the situation had become unbearable. The work was an after-the-fact remedy, a problem-driven effort. That is usually fine: business comes first, after all, and pitfalls get filled in as the product iterates.

Let's start with the background. The service we optimized had very unstable response times. As the data volume grew, most requests took about 5-6 seconds! That is beyond what ordinary users will tolerate.

Of course, it needed to be optimized.

To illustrate the optimization target, I sketched its topology. As shown in the figure, this is a set of services in a microservice architecture.

Our optimization target sits relatively far upstream. It calls many downstream service providers through Feign interfaces, aggregates and splices the data it gets back, and finally sends the result to the browser client through the Zuul gateway and Nginx.

To observe the call relationships between services and collect monitoring data, we connected the services to the SkyWalking tracing platform and the Prometheus monitoring platform, gathering the key data needed for optimization decisions. Before optimizing, let's look at the two technical metrics we will refer to.

Throughput: the number of operations per unit of time, such as QPS, TPS, HPS, etc.
Mean response time: the average time taken per request.

The smaller the average response time, the better: for a given level of concurrency, a smaller response time means higher throughput. Throughput can also be raised by making good use of multiple cores, increasing the number of operations per unit of time through parallelism.
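The link between response time and throughput can be made concrete with Little's law (average concurrency = throughput × average response time). This is a standard queueing identity rather than something we measured, and the concurrency value in the usage note below is a hypothetical example:

```java
/**
 * Little's law: average concurrency L = throughput lambda * average response time W.
 * For a fixed number of in-flight requests, lambda = L / W.
 */
final class LittlesLaw {
    /** Throughput in requests per second for a given concurrency and mean response time. */
    static double throughputPerSecond(int concurrency, double avgResponseMillis) {
        if (avgResponseMillis <= 0) {
            throw new IllegalArgumentException("avgResponseMillis must be positive");
        }
        return concurrency / (avgResponseMillis / 1000.0);
    }
}
```

With a hypothetical 50 requests in flight, a 5,000 ms mean response time gives 10 QPS, while 1,000 ms gives 50 QPS: the same instance serves five times the traffic.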

The goal of this optimization was to bring the average response time of certain interfaces below 1 second, and to increase throughput, that is, raise QPS, so that a single instance can handle more concurrent requests.

2. Cutting Time Dramatically with Compression

Let me start with the most important technique, the one that made the system fly: compression.

Inspecting the requests with Chrome's Inspect tool (DevTools), we found a key interface that transferred about 10 MB of data per request. One wonders how much had been stuffed into it.

With that much data, the download alone takes a lot of time. As shown in the figure below, I inspected a request from the juejin home page; its "Content Download" entry is the time spent transferring data over the network. If the user's bandwidth is low, that request takes a very long time.

To reduce transfer time over the network, we enabled gzip compression. gzip trades CPU time for smaller payloads (time for space). For most services the last hop is Nginx, and most people compress at that layer. The main configuration is as follows:

gzip on;
gzip_vary on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
# application/json is listed so that JSON API responses are compressed too
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml application/json;
gzip_disable "MSIE [1-6]\.";

How impressive is the compression ratio? Look at this screenshot: after compression, the data shrank from 8.95 MB to 368 KB, which the browser downloads in an instant.
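To get a feel for why ratios like this are possible, here is a small JDK-only sketch that gzips a synthetic, highly repetitive JSON payload with `java.util.zip` and checks that it round-trips. The payload shape and the `GzipDemo` class are made up for illustration; they are not our real data:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipDemo {
    /** Compresses bytes into the gzip format (the format Nginx's gzip module emits). */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // read after close() so the gzip trailer is included
    }

    /** Decompresses gzip bytes, as a browser does transparently. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a JSON array whose items repeat the same field names and values. */
    static String sampleJson(int items) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < items; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"id\":").append(i)
              .append(",\"status\":\"ACTIVE\",\"owner\":\"team-a\",\"tags\":[\"x\",\"y\"]}");
        }
        return sb.append(']').toString();
    }
}
```

Actual ratios depend entirely on how repetitive the payload is; list-style JSON responses with repeated keys are close to the best case for gzip.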

But wait: Nginx is only the outermost link. We are not done yet; requests can be made even faster.

Look at the request path below. Because we use microservices, the request flow is complicated: Nginx does not call the relevant service directly but calls the Zuul gateway; Zuul calls the actual target service, and the target service in turn calls other services. Intranet bandwidth is still bandwidth, and network latency also affects call speed, so these hops should be compressed as well.

nginx -> zuul -> service A -> service E

Making all Feign calls go through a compressed channel requires extra configuration. Ours are Spring Boot services, so we can use OkHttp, which transparently requests gzip-compressed responses and decompresses them.

Add the dependency:

<dependency>
	<groupId>io.github.openfeign</groupId>
	<artifactId>feign-okhttp</artifactId>
</dependency>

Enable compression on the server side:

server:
  port: 8888
  compression:
    enabled: true
    min-response-size: 1024
    mime-types: ["text/html","text/xml","application/xml","application/json","application/octet-stream"]

Enable OkHttp on the client side:

feign:
  httpclient:
    enabled: false
  okhttp:
    enabled: true

After these compression changes, the average response time of our interface dropped directly from 5-6 seconds to 2-3 seconds, a very significant improvement.

We also worked on the result set: unused objects and fields in the data returned to the front end were trimmed. In general, though, such changes are invasive and require adjusting a lot of code, so we put limited effort into them, and the effect was correspondingly limited.
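One common way to trim a response is to map internal objects to a slimmer view object that holds only the fields the page actually reads. A minimal sketch, where `Order`, `OrderView` and their fields are hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

final class ResponseTrimming {
    /** Hypothetical internal object carrying fields the page never renders. */
    static final class Order {
        final long id;
        final String status;
        final String internalNotes; // never shown on the page
        final byte[] auditLog;      // never shown on the page

        Order(long id, String status, String internalNotes, byte[] auditLog) {
            this.id = id;
            this.status = status;
            this.internalNotes = internalNotes;
            this.auditLog = auditLog;
        }
    }

    /** Hypothetical view object: only the fields the front end actually reads. */
    static final class OrderView {
        final long id;
        final String status;

        OrderView(long id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    /** Maps internal objects to slim views before they are serialized into the response. */
    static List<OrderView> toViews(List<Order> orders) {
        return orders.stream()
                .map(o -> new OrderView(o.id, o.status))
                .collect(Collectors.toList());
    }
}
```

The cost mentioned above shows up here: every consumer of the old, fat object has to be checked before its fields can be dropped from the view.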

3. Fetching Data in Parallel for Faster Responses

Next, we dug into the code logic. As mentioned above, the user-facing interface is really a data aggregation interface. Each request calls dozens of other service interfaces through Feign to fetch data, then splices the results together.

Why was it slow? Because all these requests ran serially! A Feign call is a remote call: it is network-I/O-bound and spends most of its time waiting. If the data dependencies allow it, such calls are ideal candidates for parallel execution.

First, we analyzed the dependencies among these dozens of sub-interfaces to see whether they had strict ordering requirements. If most of them did not, so much the better.

The result was mixed. By call logic, the interfaces fall roughly into two classes, A and B. The class A interfaces must be requested first; once their data has been spliced together, class B can use it. Within class A and within class B, however, there is no ordering requirement.

In other words, we can split this interface into two parts executed in sequence, and within each part fetch the data in parallel.

We then reworked the code according to this analysis. With CountDownLatch from the java.util.concurrent package, this fan-out-and-merge is easy to implement:

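CountDownLatch latch = new CountDownLatch(jobSize);
// submit jobs
executor.execute(() -> {
    try {
        // job code
    } finally {
        latch.countDown(); // count down even if the job throws
    }
});
executor.execute(() -> {
    try {
        // job code
    } finally {
        latch.countDown();
    }
});
...
// end submit
if (!latch.await(timeout, TimeUnit.MILLISECONDS)) {
    // timed out: some jobs are still running; decide how to degrade
}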
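The fragment above leaves out the surrounding structure. Below is a fuller, self-contained sketch of the two-phase scheme: class A calls run in parallel, then class B calls run in parallel using A's results. `TwoPhaseFetch`, the keys and the string values are hypothetical stand-ins for the real Feign calls, and the 1-second timeout is an arbitrary example:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class TwoPhaseFetch {
    /**
     * Runs all tasks in parallel, waits up to timeoutMillis, and returns a snapshot of
     * the results that arrived in time. Suppliers must not return null, because
     * ConcurrentHashMap rejects null values.
     */
    static Map<String, String> fetchAll(ExecutorService executor,
                                        Map<String, Supplier<String>> tasks,
                                        long timeoutMillis) {
        Map<String, String> results = new ConcurrentHashMap<>();
        CountDownLatch latch = new CountDownLatch(tasks.size());
        for (Map.Entry<String, Supplier<String>> task : tasks.entrySet()) {
            executor.execute(() -> {
                try {
                    results.put(task.getKey(), task.getValue().get());
                } finally {
                    latch.countDown(); // always count down, even if the call throws
                }
            });
        }
        try {
            latch.await(timeoutMillis, TimeUnit.MILLISECONDS); // false would mean a timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, return partial results
        }
        return new HashMap<>(results);
    }

    /** Phase A runs first; phase B builds on A's results. All suppliers are stand-ins. */
    static Map<String, String> aggregate(ExecutorService executor) {
        Map<String, Supplier<String>> phaseA = new LinkedHashMap<>();
        phaseA.put("user", () -> "u-42");
        phaseA.put("org", () -> "org-7");
        Map<String, String> a = fetchAll(executor, phaseA, 1_000);

        Map<String, Supplier<String>> phaseB = new LinkedHashMap<>();
        phaseB.put("orders", () -> "orders-of-" + a.get("user"));
        phaseB.put("members", () -> "members-of-" + a.get("org"));
        Map<String, String> b = fetchAll(executor, phaseB, 1_000);

        Map<String, String> merged = new HashMap<>(a);
        merged.putAll(b);
        return merged;
    }
}
```

The total latency becomes roughly the slowest call in phase A plus the slowest call in phase B, instead of the sum of all calls.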

The result was very satisfying: the interface's response time was cut nearly in half! It was now under 2 seconds.

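You may ask: why not use Java parallel streams?

Concurrent programming must be handled with care, especially in business code. Parallel streams run on the JVM-wide ForkJoinPool.commonPool(), whose parallelism defaults to the number of CPU cores minus one, so blocking Feign calls would tie up a pool shared by the whole application. We therefore built a dedicated thread pool for concurrent fetching: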

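// core 100 / max 200 threads; threads beyond 100 start only when the 100-slot queue is full,
// and once both are exhausted the default AbortPolicy throws RejectedExecutionException
final ThreadPoolExecutor executor = new ThreadPoolExecutor(100, 200, 1,
        TimeUnit.HOURS, new ArrayBlockingQueue<>(100));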
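An alternative to CountDownLatch, sketched here as an assumption rather than what we shipped, is CompletableFuture with the dedicated executor passed explicitly, which keeps blocking calls off the common pool. The pool sizes mirror the ones above; the thread-name prefix and `CallerRunsPolicy` are illustrative choices of ours:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class DedicatedPool {
    /** Same sizing as above; named threads make thread dumps easier to read. */
    static ThreadPoolExecutor newFetchPool() {
        AtomicInteger seq = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "aggregate-fetch-" + seq.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        return new ThreadPoolExecutor(100, 200, 1, TimeUnit.HOURS,
                new ArrayBlockingQueue<>(100), factory,
                // when saturated, run the task on the calling thread instead of throwing
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Starts every call on the given pool (never the common ForkJoinPool), then joins in order. */
    static <T> List<T> fetchAll(ThreadPoolExecutor pool, List<Supplier<T>> calls) {
        List<CompletableFuture<T>> futures = calls.stream()
                .map(call -> CompletableFuture.supplyAsync(call, pool))
                .collect(Collectors.toList());
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}
```

All futures are created before any `join()`, so the calls overlap; joining inside the first stream would serialize them again.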

Compression and parallelization were the most effective techniques in our optimization. They directly eliminated most of the request time. But we were still not satisfied, because each request still took more than 1 second.

4. Categorized Caching for Further Speedup

We found that some data was fetched inside loops, producing many redundant requests. That was intolerable.

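for (Item item : list) {            // Item and getId() are illustrative names
    // one remote Feign call per element, even when the same id repeats
    client.getData(item.getId());
}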

Caching these common results greatly reduces the number of network I/O requests and makes the program run much more efficiently.
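The simplest cache for the loop above is a request-scoped map that remembers each distinct key. A minimal sketch, where `LoopMemo` and the `remoteCall` function are hypothetical stand-ins for the real Feign client:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class LoopMemo {
    /**
     * Fetches data for every key in the list but calls the backend only once per
     * distinct key. The map lives only for this request, so there is no staleness
     * to manage.
     */
    static <K, V> Map<K, V> fetchDistinct(List<K> keys, Function<K, V> remoteCall) {
        Map<K, V> results = new HashMap<>();
        for (K key : keys) {
            // Repeated keys are served from the map. A null result is not stored,
            // so a key whose call returns null will be fetched again.
            results.computeIfAbsent(key, remoteCall);
        }
        return results;
    }
}
```

For a list like `["a", "b", "a", "a"]` the backend is called twice instead of four times; the saving grows with how often keys repeat.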

Caching is very important in optimizing most applications. In our scenario, however, its effect was modest compared with compression and parallelism, though it still cut request time by about 30-40 ms.

Here is what we did.

First, data whose code logic is simple and fits the Cache Aside Pattern went into the distributed cache, Redis. Concretely, on reads we read the cache first and fall back to the database on a miss; on updates we update the database first and then delete the cache, deleting it once more after a short delay (delayed double delete). This handles most caching scenarios with simple business logic and largely addresses data consistency.
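The read and write paths described above can be sketched as follows. The two maps are hypothetical in-memory stand-ins for Redis and the database, and the delay before the second delete is a parameter we made up; in production these would be a Redis client, a DAO and a delay tuned to typical read latency:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class CacheAside {
    // Hypothetical stand-ins for Redis and the database.
    final Map<String, String> cache = new ConcurrentHashMap<>();
    final Map<String, String> db = new ConcurrentHashMap<>();

    private final ScheduledExecutorService scheduler;
    private final long secondDeleteDelayMillis;

    CacheAside(ScheduledExecutorService scheduler, long secondDeleteDelayMillis) {
        this.scheduler = scheduler;
        this.secondDeleteDelayMillis = secondDeleteDelayMillis;
    }

    /** Read path: cache first; on a miss, read the database and populate the cache. */
    String read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String fromDb = db.get(key);
        if (fromDb != null) {
            cache.put(key, fromDb);
        }
        return fromDb;
    }

    /** Write path: update the database first, delete the cache, then delete it again later. */
    void update(String key, String value) {
        db.put(key, value);
        cache.remove(key);
        // A reader that loaded the old value from the database just before the update
        // may write it back into the cache after the first delete; the delayed
        // second delete evicts that stale copy.
        scheduler.schedule(() -> { cache.remove(key); }, secondDeleteDelayMillis, TimeUnit.MILLISECONDS);
    }
}
```

The delay only needs to exceed the time a slow reader takes between reading the database and writing the cache; it narrows the inconsistency window but does not close it completely.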

However, that alone was not enough. Some business logic is very complex and its update code is scattered all over, which makes it unsuitable for the Cache Aside Pattern. We found that some of this data has the following characteristics:

After a costly fetch, the data is used again within a very short time
The business can tolerate the data being stale for up to about a second
The data is used across code paths and across threads, in many different ways

For such data, we designed an in-heap cache with a very short lifetime: entries expire after 1 second and are then read from the database again. If a node used to call the service interface 1,000 times per second, this cut it to once per second.

Here we used Guava's LoadingCache, which reduced Feign interface calls by an order of magnitude.

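import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Entries expire 1 second after being written; the next get() reloads them.
LoadingCache<String, String> lc = CacheBuilder
        .newBuilder()
        .expireAfterWrite(1, TimeUnit.SECONDS)
        .build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) throws Exception {
                return slowMethod(key); // the expensive call being cached
            }
        });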
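For readers without Guava on the classpath, the same expire-after-write idea can be approximated with the JDK alone. This is a sketch, not a replacement for Guava: `TtlCache` is a hypothetical name, it has no size bound, and expired entries stay in memory until their key is requested again. The clock is injectable so the expiry logic can be tested:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long writtenAtNanos;

        Entry(T value, long writtenAtNanos) {
            this.value = value;
            this.writtenAtNanos = writtenAtNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // the slow call, e.g. a remote request
    private final long ttlNanos;
    private final LongSupplier clock;    // System::nanoTime in production

    TtlCache(Function<K, V> loader, long ttlNanos, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    /** Returns the cached value if it is younger than the TTL, otherwise reloads it. */
    V get(K key) {
        long now = clock.getAsLong();
        // ConcurrentHashMap.compute() is atomic per key, so concurrent callers of an
        // expired key wait for a single reload instead of all hitting the backend.
        // The loader runs while the map holds a lock, so it should not be extremely slow.
        Entry<V> entry = entries.compute(key, (k, old) -> {
            if (old != null && now - old.writtenAtNanos < ttlNanos) {
                return old;
            }
            return new Entry<>(Objects.requireNonNull(loader.apply(k), "loader returned null"), now);
        });
        return entry.value;
    }
}
```

With a 1-second TTL (`new TtlCache<>(loader, 1_000_000_000L, System::nanoTime)`), a key read 1,000 times per second triggers about one load per second.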

5. MySQL Index Optimization

Our business system uses a MySQL database. Because no professional DBA was involved and the tables were generated by JPA, we found a large number of unsuitable indexes during optimization, which naturally needed fixing.

Because our SQL is sensitive, I'll only discuss some of the index optimization rules we ran into; you can apply them by analogy to your own business system.

Indexes are very useful, but be careful: if you apply a function to a field in a query condition, that field's index will not be used. Two other common causes of index failure are:

The indexed field's type differs from the type of the value passed in, forcing an implicit conversion; for example, comparing a varchar field with an int parameter
The two tables in a join use different character sets, so the index on the join column cannot be used

The most basic rule of MySQL index optimization is the leftmost prefix principle. With three fields a, b and c, if queries filter on a, on a and b, or on a, b and c, a single composite index (a, b, c) is enough, since it also serves (a) and (a, b). Strings can also be indexed by prefix, but that is less common in ordinary applications.
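The leftmost prefix rule can be stated as a tiny function. This is a deliberately simplified model (equality conditions only; range conditions, index condition pushdown, skip scan and the optimizer's cost decisions are ignored), and `LeftmostPrefix` is a hypothetical helper, not a MySQL API:

```java
import java.util.List;
import java.util.Set;

final class LeftmostPrefix {
    /**
     * Returns how many leading columns of a composite index can be used for lookup
     * when the query has equality conditions on equalityColumns.
     */
    static int usableColumns(List<String> indexColumns, Set<String> equalityColumns) {
        int used = 0;
        for (String column : indexColumns) {
            if (!equalityColumns.contains(column)) {
                break; // a gap: the remaining index columns cannot be used for lookup
            }
            used++;
        }
        return used;
    }
}
```

For index (a, b, c): `WHERE a = ? AND b = ?` uses 2 columns, `WHERE a = ? AND c = ?` uses only a, and `WHERE b = ? AND c = ?` cannot use the index for lookup at all.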

Occasionally the MySQL optimizer picks the wrong index, and we have to use FORCE INDEX to specify the one to use. In JPA that means using nativeQuery to write SQL tied to MySQL, which we try to avoid.

Another optimization is reducing lookups back to the table. InnoDB stores rows in a B+ tree clustered by primary key. When a query uses a non-primary-key (secondary) index, InnoDB first finds the primary key in the secondary index and then uses it to locate the row in the clustered index. That extra step is the "back to table" lookup. A covering index avoids it to a large extent and is a common optimization: put the queried fields together with the filter columns into one composite index. It trades space for time.
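The cost of the extra step can be illustrated with a toy model that counts how many index structures a query touches. It is a teaching aid only, not how InnoDB is implemented: sorted maps stand in for B+ trees, names are assumed unique, and the table and column names are hypothetical:

```java
import java.util.TreeMap;

final class CoveringIndexDemo {
    static final class Row {
        final long id;
        final String name;
        final int age;
        final String address;

        Row(long id, String name, int age, String address) {
            this.id = id;
            this.name = name;
            this.age = age;
            this.address = address;
        }
    }

    final TreeMap<Long, Row> clustered = new TreeMap<>();        // PRIMARY KEY (id): full rows
    final TreeMap<String, Long> idxName = new TreeMap<>();       // INDEX (name): name -> id
    final TreeMap<String, Integer> idxNameAge = new TreeMap<>(); // INDEX (name, age); real leaves also hold id
    int lookups; // index structures visited, to compare the two plans

    void insert(long id, String name, int age, String address) {
        clustered.put(id, new Row(id, name, age, address));
        idxName.put(name, id);
        idxNameAge.put(name, age);
    }

    /** SELECT age WHERE name = ? using INDEX (name): find the id, then go back to the table. */
    int ageViaSecondaryIndex(String name) {
        lookups++;
        long id = idxName.get(name);
        lookups++; // the "back to table" lookup in the clustered index
        return clustered.get(id).age;
    }

    /** The same query with the covering INDEX (name, age): the index alone answers it. */
    int ageViaCoveringIndex(String name) {
        lookups++;
        return idxNameAge.get(name);
    }
}
```

The price is visible in `insert`: every extra index is one more structure to store and maintain on each write.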

6. JVM Optimization

I usually leave JVM optimization for last. Moreover, unless the system suffers serious stalls or OOM problems, I don't proactively over-tune it.

Unfortunately, our application was given a large heap (8 GB+), and under JDK 1.8's default Parallel collector it kept stalling. The stalls were not very frequent, but each could last several seconds, which seriously hurt the smoothness of some requests.

At first the program ran "naked" on the JVM: no GC logs and no OOM dumps were kept. To record GC information, we made the following changes.

Step one: add the various parameters used for GC troubleshooting.

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/xxx.hprof  -DlogPath=/opt/logs/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -Xloggc:/opt/logs/gc_%p.log -XX:ErrorFile=/opt/logs/hs_error_pid%p.log

With these, we can upload the generated GC log to gceasy or a similar tool and see the JVM's throughput and the latency of each phase.

Step two: enable Spring Boot's GC metrics and feed them into Prometheus monitoring.

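Add the dependencies to the pom. The Actuator starter provides the endpoints, and the /actuator/prometheus endpoint additionally needs the Micrometer Prometheus registry on the classpath:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>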

Then configure the exposed endpoints. This gives us real-time analysis data as a basis for optimization.

management.endpoints.web.exposure.include=health,info,prometheus
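The GC numbers exported this way ultimately come from the JVM's standard management (JMX) interfaces. For a quick look before Prometheus is wired up, a JDK-only sketch can read the cumulative counters directly; `GcStats` is a hypothetical helper name:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Reads cumulative GC statistics from the standard JMX beans. Collector names depend
 * on the active GC: typically "PS Scavenge" / "PS MarkSweep" for the JDK 8 default
 * Parallel collector and "G1 Young Generation" / "G1 Old Generation" for G1.
 */
final class GcStats {
    /** Returns collector name -> {collection count, accumulated collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 if the value is undefined for this collector.
            stats.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }
}
```

Taking two snapshots a minute apart and diffing them gives GC count and time per minute, which is roughly what a dashboard plots.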

After observing the JVM's behavior, we switched to the G1 garbage collector. G1 works toward a maximum pause-time target, which makes GC pauses smoother. Its main tuning parameters are:

-XX:MaxGCPauseMillis sets the target pause time, which G1 tries its best to meet.
-XX:G1HeapRegionSize sets the size of each heap region. It must be a power of 2 (1 MB to 32 MB on JDK 8); neither too large nor too small. If you are unsure, keep the default.
-XX:InitiatingHeapOccupancyPercent starts the concurrent marking phase once overall heap occupancy reaches this percentage (45% by default).
-XX:ConcGCThreads sets the number of threads used by the concurrent GC phases. The default depends on the platform the JVM runs on; changing it is not recommended.

After switching to G1, those persistent pauses miraculously disappeared! Along the way we hit several memory overflow problems, but with the help of MAT (Eclipse Memory Analyzer), a superb tool, they were all solved easily.

7. Other Optimizations

If the project structure or architecture has fundamental flaws, code-level optimization has only limited effect, as in our case.

Still, the main code needed adjusting to match. We paid special attention to critical code in the time-consuming paths and cleaned up the code uniformly according to our development standards. A few issues left a deep impression.

To reuse map collections, some colleagues cleared them with clear() after every use.

map1.clear();
map2.clear();
map3.clear();
map4.clear();

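These maps held a lot of data, and clear() is not free: as the HashMap source below shows, it nulls out every bucket in the table, so its cost is proportional to the table's capacity, which never shrinks once the map has grown. Calling it repeatedly on large reused maps added up to significant time.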

public void clear() {
    Node<K,V>[] tab;
    modCount++;
    if ((tab = table) != null && size > 0) {
        size = 0;
        for (int i = 0; i < tab.length; ++i)
            tab[i] = null;
    }
}

Similarly, the thread-safe queue ConcurrentLinkedQueue has a size() method that traverses the entire queue, which is O(n); somehow colleagues used it anyway. These are all performance killers.

public int size() {
        restartFromHead: for (;;) {
            int count = 0;
            for (Node<E> p = first(); p != null;) {
                if (p.item != null)
                    if (++count == Integer.MAX_VALUE)
                        break;  // @see Collection.size()
                if (p == (p = p.next))
                    continue restartFromHead;
            }
            return count;
        }
}
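When a frequently read size is needed, one option (an illustration of ours, not the fix we applied) is to keep a separate counter next to the queue. `CountedQueue` is a hypothetical name, and like the queue's own `size()`, its count is only a snapshot under concurrency:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class CountedQueue<E> {
    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger count = new AtomicInteger();

    void offer(E e) {
        queue.offer(e); // unbounded queue: offer never returns false
        count.incrementAndGet();
    }

    E poll() {
        E e = queue.poll();
        if (e != null) {
            count.decrementAndGet();
        }
        return e;
    }

    /** O(1); the counter is updated just after the queue, so it can briefly lag. */
    int size() {
        return Math.max(0, count.get());
    }

    /** If you only need to know whether there is work, isEmpty() does not walk the queue. */
    boolean isEmpty() {
        return queue.isEmpty();
    }
}
```

Often the real fix is simpler: code that calls `size() > 0` usually only needs `isEmpty()`.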

In addition, some service web pages are slow in themselves, because the business logic is complex and the front-end JavaScript executes slowly. Optimizing that code is the front-end team's job. As shown in the figure, the Performance tab in Chrome or Firefox makes it easy to find time-consuming front-end code.

8. Summary

Performance optimization actually follows a routine, but most teams wait for problems to appear before optimizing and rarely plan ahead. With monitoring and APM, things are different: we can get data at any time and work backward from it to drive the optimization.

Some performance problems can be solved at the business-requirements level or at the architecture level. The optimizations that reach the code layer and need programmers to intervene are the ones where the requirements side and the architecture side can no longer change anything, or do not want to.

Performance optimization starts with gathering information and finding the bottleneck, balancing CPU, memory, network, I/O and other resources, then minimizing the average response time and increasing throughput.

Caching, buffering, pooling, reducing lock contention, asynchrony, parallelism and compression are all common optimization techniques. In our scenario, data compression and parallel requests played the biggest role. Together with the other techniques, they brought our business interface from 5-6 seconds down to under 1 second, a quite impressive result. We probably won't need to optimize it again for a long time.