druid.io + tranquility: real-time task performance, summarized from Double 11 2020
2022-07-29 02:00:00 【master-dragon】
This post mainly sums up my druid.io cluster operations work during Double 11, plus some reflections.
I drew a diagram to sum it up and remember it by.
Problem summary
Ops automation?
Many problems are still handled by waiting for an alert and then acting after a manual judgment call.
This can be automated. The judgment conditions can be abstracted into logic (if machine metrics are missing, ask the ops team for them; if the system's own metrics are missing, develop them, and a script is enough). The operations can be encapsulated (each operation paired with a rollback). That is enough to get started.
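The idea of "abstract the condition, encapsulate the operation with a rollback" could be sketched roughly as below. This is a minimal sketch, not the setup actually used on the cluster: the `Remediation` class, the `run_remediations` function, and the metric name `lag` in the usage note are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Remediation:
    """One automated action (hypothetical structure, for illustration only)."""
    name: str
    condition: Callable[[Dict[str, float]], bool]  # the abstracted judgment
    apply: Callable[[], None]                      # the encapsulated operation
    rollback: Callable[[], None]                   # the fallback for that operation


def run_remediations(metrics: Dict[str, float],
                     remediations: List[Remediation]) -> List[str]:
    """Apply every remediation whose condition holds; roll back if apply fails."""
    log = []
    for r in remediations:
        if not r.condition(metrics):
            continue
        try:
            r.apply()
            log.append(f"{r.name}: applied")
        except Exception:
            # The operation failed part-way, so undo it and record that.
            r.rollback()
            log.append(f"{r.name}: rolled back")
    return log
```

A remediation such as `Remediation("scale_out", lambda m: m["lag"] > 1000, scale_out, scale_in)` would then fire only when the (assumed) upstream lag metric crosses the threshold, which is the same judgment a person makes today after reading the alert.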
Repeated requirements / communicating only after the fact?
We tend to think about how to solve a problem only once it has been exposed. In fact it had usually been raised before; it just got no attention because it seemed minor or unimportant.
So simulate early and get hands-on. Simulate failures even when nothing is failing. It is like considering boundary conditions when writing an algorithm: have foresight.
Insufficient design and ability?
The architecture design has flaws, and then there is a lack of ability: you want to change something but cannot, so you have to go through a third party or keep digging deeper, improve your own skills, and then make the architecture better.
How is Double 11 different from normal days?
Not much, because load testing and capacity expansion are prepared in advance. Still, surprises always happen, and it depends on how thorough the preparation was. Where it was not thorough, there are temporary remedies, always aiming for the least impact. It is an opportunity to test the service's capability, and also an opportunity to expose problems and improve.
Nothing especially different to report.
druid.io performance summary
What druid.io + tranquility fears most is two middleManagers going down (machine failure or OutOfMemory). Judging from long operations experience, though, the probability is very small. First, machines rarely go down at the same time, barring a power outage or a machine-room problem. Second, there are enough middleManagers, real-time queries have limits on very large queries, tasks have backups, and there are upstream lag alerts and so on, so alerting is sufficient.
During the whole promotion the tasks were very stable. Query volume rose a lot (but there is query routing and rate limiting, machines can be scaled out, and every machine is monitored, so capacity can be added at any time), and queries were very stable too. I would rate the overall performance 9 out of 10, apart from a few incidents:
- Underestimation leading to temporary expansion
When real-time task traffic rises, expand. Whether this is done manually or automatically today, with a time window and a correct estimate 99% of real-time tasks are basically handled in advance. One 2C4G tranquility instance can consume 10,000 to 30,000 qps of upstream traffic. But an upstream kafka real-time stream of >= 2,000,000 qps really does take a lot of machines, and the kafka partition count has to be expanded within a reasonable range; with too few partitions, consumption cannot keep up.
Still, there will always be underestimates, and this Double 11 we hit one. Thanks to timely alerts and action, a temporary expansion resolved it.
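The sizing arithmetic behind those numbers can be written down as a rough estimate. The per-instance figures (10,000 to 30,000 qps for a 2C4G tranquility instance) come from the text above; the function name and the headroom percentage are assumptions added for illustration, not values used in the actual deployment.

```python
def tranquility_instances_needed(upstream_qps: int,
                                 per_instance_qps: int,
                                 headroom_pct: int = 0) -> int:
    """Estimate how many tranquility instances a stream needs.

    headroom_pct is an assumed safety margin (e.g. 30 means plan for 130%
    of the expected traffic). Integer arithmetic keeps the rounding exact.
    """
    if upstream_qps < 0 or per_instance_qps <= 0 or headroom_pct < 0:
        raise ValueError("expected non-negative traffic and a positive capacity")
    planned = upstream_qps * (100 + headroom_pct)
    capacity = per_instance_qps * 100
    return -(-planned // capacity)  # ceiling division


if __name__ == "__main__":
    # Pessimistic and optimistic per-instance capacity for a 2,000,000 qps stream.
    for capacity in (10_000, 30_000):
        print(capacity, tranquility_instances_needed(2_000_000, capacity, 30))
```

Running the estimate with both ends of the per-instance range gives a band rather than a single number, which makes it easier to see how far an underestimate could be off before the traffic arrives.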
- Data skew
This is a common issue. Continued expansion can solve it, and so can adding a random dimension. In an urgent situation, of course, expand first to ease the skew. Afterwards the data still needs analysis and governance to avoid severe skew. At this stage the partitioning already hashes all dimension values, which in theory is fine, but the degree of skew is not quantified; this may need to be predicted and handled ahead of time.
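How adding a random dimension spreads a hot key can be illustrated with a generic hash-partitioning sketch. This is not tranquility's own partitioner or API: `partition_for`, `add_salt`, and the `salt` field are hypothetical names, and `zlib.crc32` merely stands in for whatever hash the real pipeline uses.

```python
import random
import zlib
from collections import Counter
from typing import Dict, List


def partition_for(event: Dict[str, object], dims: List[str],
                  num_partitions: int) -> int:
    """Pick a partition by hashing the chosen dimension values."""
    key = "|".join(str(event[d]) for d in dims)
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def add_salt(event: Dict[str, object], buckets: int,
             rng: random.Random) -> Dict[str, object]:
    """Return a copy of the event with an extra random dimension."""
    salted = dict(event)
    salted["salt"] = rng.randrange(buckets)
    return salted


if __name__ == "__main__":
    rng = random.Random(0)
    hot = [{"key": "hot"} for _ in range(1000)]
    # Without the salt, every event for the hot key lands on one partition.
    print(Counter(partition_for(e, ["key"], 4) for e in hot))
    # With the salt included in the hash, the same events are spread out.
    salted = [add_salt(e, 16, rng) for e in hot]
    print(Counter(partition_for(e, ["key", "salt"], 4) for e in salted))
```

The trade-off is that the extra dimension multiplies the number of distinct dimension combinations, so it should only be applied where skew is actually observed.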
Personal operations summary
My summary of this on-call shift: same as usual, never get flustered. It is still a matter of analyzing the various metrics, locating the problem, and then acting sensibly; the actions are just a little faster than usual.
Same as usual; there is no need to panic.
---------- Friday, 2020-11-20 20:48:28 CST