A slow Redis response in production once caused an avalanche

2022-06-09 05:37:00 Die hard audio and video

  • Preface
    Redis runs as a master-slave setup with Sentinel. The Java services run on k8s and use the Spring Boot Actuator for health checks.

The fault unfolded as follows: Redis CPU usage spiked, so Redis responded slowly and dropped its connections to the Java services. This happened just as k8s was running its health check, which found the Java services unhealthy. k8s then killed the Java services, and the business was disrupted.

  • Solutions
  1. First, prevent the avalanche so that the problem does not snowball. Network jitter is unavoidable, but that is no reason to lengthen the k8s check interval or to switch off the k8s health check; that would be giving up the watermelon to pick up sesame seeds. A more appropriate fix is to turn off the Redis check inside the Spring Boot health check, so that a Redis problem cannot drag the Java services down with it.
management.health.redis.enabled=false

  2. Investigate the Redis CPU spike. Choose the method according to how the CPU behaves: if usage stays high all the time, inspect the process directly, since a thread may be deadlocked; if usage is high one moment and lower the next, a flame graph is a better fit.

  1. First check the Redis slow log with slowlog get 5. If slow entries show up and they come from keys or keys * pattern matching, the code needs to be fixed. Use scan instead of keys.
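
As a rough sketch of that replacement, redis-cli has a --scan mode that walks the keyspace with incremental SCAN calls instead of one blocking KEYS call. The host, port, and the session:* pattern are assumptions for illustration, and the REDIS_CLI variable is a hypothetical override so that a different binary path can be used.

```shell
# Assumed connection settings; adjust to your environment.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
REDIS_HOST="${REDIS_HOST:-127.0.0.1}"
REDIS_PORT="${REDIS_PORT:-6379}"

# Print the five most recent slow log entries.
recent_slowlog() {
  "$REDIS_CLI" -h "$REDIS_HOST" -p "$REDIS_PORT" slowlog get 5
}

# List keys matching a pattern through incremental SCAN calls (no blocking KEYS).
scan_keys() {
  "$REDIS_CLI" -h "$REDIS_HOST" -p "$REDIS_PORT" --scan --pattern "$1"
}

# Example: scan_keys 'session:*'
```

The same idea applies in application code: iterate with the client's SCAN support and a cursor rather than calling KEYS.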
  2. If CPU usage still does not come down after the Redis commands have been optimized, follow the official Redis advice and run the following command:
latency doctor

It gives the following result:

  • Deleting, expiring or evicting (because of maxmemory policy) large objects is a blocking operation. If you have very large objects that are often deleted, expired, or evicted, try to fragment those objects into multiple smaller objects.
  • I detected a non zero amount of anonymous huge pages used by your process. This creates very serious latency events in different conditions, especially when Redis is persisting on disk. To disable THP support use the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled', make sure to also add it into /etc/rc.local so that the command will be executed again after a reboot. Note that even if you have already disabled THP, you still need to restart the Redis process to get rid of the huge pages already created.

The report shows that large objects are being deleted. How to find those large objects is left for later.
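
One possible way to find such keys, offered here as an assumption rather than the author's own method, is the --bigkeys mode of redis-cli. It samples the keyspace with SCAN and reports the largest key of each type, and MEMORY USAGE then gives the approximate size of a single key in bytes. The host, port, and function names below are illustrative, and REDIS_CLI is a hypothetical override for the binary path.

```shell
# Assumed connection settings; adjust to your environment.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
REDIS_HOST="${REDIS_HOST:-127.0.0.1}"
REDIS_PORT="${REDIS_PORT:-6379}"

# Sample the keyspace and report the biggest key per data type.
# -i 0.1 sleeps 0.1 s per 100 SCAN calls to reduce load on a busy instance.
find_big_keys() {
  "$REDIS_CLI" -h "$REDIS_HOST" -p "$REDIS_PORT" --bigkeys -i 0.1
}

# Approximate memory footprint of one key, in bytes (Redis 4.0 or later).
key_size_bytes() {
  "$REDIS_CLI" -h "$REDIS_HOST" -p "$REDIS_PORT" memory usage "$1"
}

# Example: find_big_keys, then key_size_bytes 'some:large:key'
```

Like monitor, a full keyspace sample is best run outside peak hours or against a replica.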

THP (transparent huge pages) needs to be disabled; look it up if you want the background on what THP is. After disabling it, the Redis service has to be restarted.
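
A small sketch for checking the THP mode before and after the change. The kernel marks the active mode with square brackets in the sysfs file named in the latency doctor output. The helper names and the optional file argument are hypothetical and exist only to make the check easy to try out.

```shell
# Print the active THP mode (the bracketed word), e.g. "always" or "never".
# Reads the standard sysfs file unless another path is given as $1.
thp_mode() {
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Report whether THP still needs to be disabled.
thp_check() {
  mode="$(thp_mode "$@")"
  if [ "$mode" = "never" ]; then
    echo "THP disabled"
  else
    echo "THP mode is '$mode'; disable it and restart redis-server"
  fi
}
```

Running thp_check on the Redis host prints "THP disabled" once the echo never command from the report has taken effect.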

  • redis monitor
    Redis officially provides the monitor command, which shows every command executed in Redis. The command costs Redis performance, so avoid running it during peak business hours.
redis-cli monitor > redis-monitor.txt
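
To keep that cost bounded, the capture can be limited in time and summarized offline. The sketch below assumes the usual MONITOR line layout (timestamp, [db client], then the quoted command name) and the coreutils timeout command. The function names, connection settings, and 10-second default are illustrative.

```shell
# Assumed connection settings; adjust to your environment.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
REDIS_HOST="${REDIS_HOST:-127.0.0.1}"
REDIS_PORT="${REDIS_PORT:-6379}"

# Capture MONITOR output into file $1 for $2 seconds (default 10), then stop.
capture_monitor() {
  timeout "${2:-10}" "$REDIS_CLI" -h "$REDIS_HOST" -p "$REDIS_PORT" monitor > "$1"
}

# Count commands by name in capture file $1, most frequent first (top $2, default 10).
# Lines with fewer than four fields, such as the initial "OK", are skipped.
top_commands() {
  awk 'NF >= 4 { gsub(/"/, "", $4); print toupper($4) }' "$1" |
    sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Example: capture_monitor redis-monitor.txt 10 && top_commands redis-monitor.txt
```

A command such as KEYS near the top of this summary points back to the slow log findings.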
  • Flame graph
    Sample with perf:
perf record -g --pid $(pgrep redis-server) -F 999 -- sleep 60

View the report with the minimum call graph inclusion threshold set to 0.5%:

perf report -g "graph,0.5,caller"

Generate the flame graph:

git clone https://github.com/brendangregg/FlameGraph.git
perf script > redis.perf.stacks
./FlameGraph/stackcollapse-perf.pl redis.perf.stacks > redis.folded.stacks
./FlameGraph/flamegraph.pl redis.folded.stacks > redis.svg
Copyright notice
This article was written by [Die hard audio and video]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/03/202203021427409508.html