A Kong performance bug kept us up all night
2022-07-26 15:56:00 【InfoQ】
Background
A series of failed attempts
Parameter tuning
- NGINX_WORKER_PROCESSES
- MEM_CACHE_SIZE
- DB_UPDATE_FREQUENCY
- WORKER_STATE_UPDATE_FREQUENCY
- work_mem
- shared_buffers
Clean up the data
Read/write separation for Kong instances
Migrating PostgreSQL to RDS
Rolling back
How to reproduce the problem
Importing the data
psql -h 127.0.0.1 -U kong < kong.sql
curl -X POST http://10.97.4.116:8001/plugins --data "name=prometheus"
Phenomenon one
curl http://10.97.4.116:8001
Phenomenon two
curl -i -X POST http://10.97.4.116:8001/services/ -d 'name=baidu2' -d 'url=http://www.baidu.com'
curl -i -X POST http://10.97.4.116:8001/services/baidu2/routes \
-d "name=test2" \
-d "paths[1]=/baidu2"
curl http://10.97.4.116:8000/baidu2
#!/bin/bash
# 1.sh
curl -i -X POST http://10.97.4.116:8001/services/ -d 'name=baidu' -d 'url=http://www.baidu.com'
curl -i -X POST http://10.97.4.116:8001/services/baidu/routes \
-d "name=test" \
-d "paths[1]=/baidu"
curl -s http://10.97.4.116:8001/services
curl -i -X DELETE http://10.97.4.116:8001/services/baidu/routes/test
for i in `seq 1 100`; do sh 1.sh ; done
curl http://10.97.4.116:8000/baidu2
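To put numbers on the slowdown, one option is to time the proxy request repeatedly while the admin loop is running. The following is a minimal sketch and not part of the original investigation: the helper names `probe` and `count_slow` and the 1-second threshold are hypothetical; only the URL comes from the steps above.

```shell
#!/bin/bash
# Hypothetical helper: print the total time (seconds) of one request to the given URL.
probe() {
  curl -s -o /dev/null --max-time 30 -w '%{time_total}\n' "$1"
}

# Hypothetical helper: read one timing per line on stdin and
# print how many of them exceed the threshold (seconds) given as $1.
count_slow() {
  awk -v limit="$1" '$1 + 0 > limit + 0 { n++ } END { print n + 0 }'
}

# Example run (commented out so the block has no side effects when sourced):
# for i in $(seq 1 60); do probe http://10.97.4.116:8000/baidu2; sleep 1; done | count_slow 1.0
```

Running the commented loop in a second terminal while the `1.sh` loop is active would show how many of 60 proxy requests took longer than one second.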
Accompanying phenomena
- On the Kong instances, CPU and memory both keep rising, and the rise does not stop once the admin API calls have finished. When memory grows past a certain point, the nginx worker processes are OOM-killed and then restarted, which may be why access becomes slow;
- With KONG_NGINX_WORKER_PROCESSES set to 4 and the pod memory set to 4G, overall pod memory is stable at 2.3G. Calling the admin API in a test, however, makes pod memory climb past 4G and triggers a worker OOM, so I raised the pod memory to 8G. Calling the admin API again, pod memory still rose, but it stopped at 4.11G. This seems to mean that setting the pod memory (in G) to twice KONG_NGINX_WORKER_PROCESSES solves the problem (though an important question remains: why does a single admin API call raise memory so much?);
- In addition, when I keep calling the admin API, memory continues to grow and eventually stabilizes at 6.9G.
pmap -x [pid]
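Checking each worker by hand with `pmap -x` gets tedious, so the per-process totals can be summed in a small script. This is a sketch under assumptions: it relies on the `total kB` summary line that procps `pmap -x` prints (Kbytes, RSS, Dirty columns), the function names are hypothetical, and the `pgrep` pattern assumes the workers appear as `nginx: worker process` in the process list.

```shell
#!/bin/bash
# Read `pmap -x` output on stdin and print the RSS value (kB)
# from the "total kB" summary line.
rss_kb() {
  awk '$1 == "total" && $2 == "kB" { print $4 }'
}

# Sum the RSS (kB) of every PID passed as an argument.
# PIDs that have already exited contribute 0.
total_rss_kb() {
  local sum=0 pid kb
  for pid in "$@"; do
    kb=$(pmap -x "$pid" 2>/dev/null | rss_kb)
    sum=$((sum + ${kb:-0}))
  done
  echo "$sum"
}

# Example run (assumed process title, commented out to avoid side effects):
# total_rss_kb $(pgrep -f 'nginx: worker process')
```

Calling `total_rss_kb` every few seconds during the admin API loop gives a rough memory curve for the workers without any extra tooling.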

Conclusion
- The problem is unrelated to the Kong upgrade (0.14 --> 2.2.0); using version 2.2.0 directly shows the same problem;
- Every worker_state_update_frequency interval, Kong rebuilds the router in memory, and memory rises as soon as a rebuild starts. Reading the code, the suspect was the Router.new method, which allocates an lrucache but never calls flush_all. After freeing the lrucache the way the latest 2.8.1 version does, the problem still exists;
- That is, the memory growth comes from other logic in Kong's Router.new method;
- This shows that Kong has a performance bug that is still present in the latest version. Once the number of routes and services reaches a certain order of magnitude, calling the admin API makes the memory of the Kong workers rise rapidly and triggers OOM, which degrades business access performance. A temporary workaround is to reduce NGINX_WORKER_PROCESSES and increase the memory of the Kong pod, so that the memory needed after an admin API call is available without triggering OOM and business traffic is served normally.
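The sizing rule observed in the test (pod memory in G equal to twice the worker count) can be written down as a small helper. This is only a sketch of that rule of thumb: the function name `pod_memory_gb` is hypothetical, and the default of 2G per worker comes from a single test run, not from any Kong recommendation.

```shell
#!/bin/bash
# Hypothetical helper: suggest a pod memory size (in G) for a given
# number of nginx workers. The second argument overrides the assumed
# per-worker allowance, which defaults to 2 (from the test described above).
pod_memory_gb() {
  local workers=$1 per_worker_gb=${2:-2}
  echo $((workers * per_worker_gb))
}

# Example: 4 workers at the default allowance.
# pod_memory_gb 4
```

With 4 workers this prints 8, the pod size used in the test. Sustained admin API calls pushed memory to 6.9G in that same test, so the margin is not large and the per-worker allowance may need raising for larger route and service counts.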