Practice of Replacing Ceph with Curve at NetEase Cloud Music
2022-06-28 16:48:00 【InfoQ】

- Poor performance: Single-volume performance was poor (mainly high IO latency, IOPS that would not scale, and susceptibility to interference from other heavily loaded volumes in the cluster). Ceph cloud disks could therefore only serve as system disks or as data disks for writing logs; they could not support middleware workloads.
- IO jitter: We observed that once IO latency exceeds 2 s, disk util can hit 100%, which triggers widespread business alarms and request pile-ups, and in severe cases an avalanche effect. Over the previous two years, Ceph cloud disk IO jitter was very frequent (roughly every month) and each episode lasted for minutes, so many core applications switched to local storage to avoid it.
- IO jitter: Since the switch to Curve cloud disks, the disk IO util monitor has never raised a 100% alarm caused by the distributed storage system. Business stability has improved greatly, and core services have gradually moved back to Curve cloud disks (after all, the business also values the high space utilization, reliability, portability, and fast recovery of cloud disks).
- Performance: On the same hardware, a single Curve volume delivers more than 2x the performance of a Ceph volume, with much lower latency; see the performance comparison figure for details.

- Service upgrade: Common reasons for upgrading the client include bug fixes, new features, and version upgrades. We hit a 32-bit sequence number overflow bug in the Ceph community's messaging module. It appears on long-running clients and causes IO hangs, and fixing it requires updating both the client and the server. There are two ways to update the client: restart the virtual machine's QEMU process, or live-migrate the virtual machine. Both are feasible for a handful of virtual machines, but for hundreds of them the operation is impractical and clearly unacceptable to the business. On the server side, restarting OSD processes during an upgrade also causes some IO jitter, so it has to be done during off-peak hours, and the business has to temporarily disable its disk util alarms.
- Performance: O&M staff mainly care about the overall performance of the storage cluster. If the cluster's total capacity does not match its total performance, you either run out of performance while capacity remains, or create fewer volumes and waste capacity, or keep creating volumes and degrade the IO latency and throughput of every volume. In addition, once the number of volumes in a Ceph cluster reaches a certain scale, overall cluster performance declines as more volumes are added, which hurts single-volume performance even further.
- Algorithm: Because of the limitations of the CRUSH algorithm, data is distributed very unevenly across Ceph OSDs and a lot of space is wasted. In our observation, the gap in space utilization between the fullest and emptiest OSD can reach 50%, so data rebalancing is needed often. Rebalancing migrates large amounts of data and causes IO jitter, and it still cannot fully fix uneven OSD capacity utilization.
- IO jitter: Triggers include replacing a bad disk, node downtime, high IO load, data rebalancing after capacity expansion (without adding a new pool, since too many pools complicate OpenStack maintenance), NIC packet loss, and slow disks.
- Service upgrade: The Curve client supports hot upgrade. Running QEMU processes need neither a restart nor a migration, and the millisecond-level impact is almost imperceptible to workloads inside the virtual machine; see ① for the hot-upgrade architecture design. On the server side, thanks to the quorum-based consensus protocol Raft, upgrading one replica domain at a time keeps the impact on business IO at the level of seconds, and because IO latency stays under 2 s, disk util does not reach 100%.
- Performance: At the same capacity, a Curve cluster can host more volumes while maintaining stable performance.
- Algorithm: Curve's data placement is handled centrally by the MDS service, which keeps it highly balanced. The gap in space utilization between the fullest and emptiest chunkserver stays within 10%, so no data rebalancing is needed.
- IO jitter: In the scenarios where Ceph cloud disks are prone to IO jitter, Curve cloud disks perform more stably; see the detailed Curve vs. Ceph comparison.
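
The point about upgrading the server "by replica domain" rests on Raft's majority rule: a replication group keeps serving writes as long as a strict majority of its replicas is alive. The following is a minimal sketch of that check, not Curve's actual implementation. The copyset names, the zone names, and the helper `domains_safe_to_restart` are hypothetical, and the sketch assumes each replica belongs to exactly one replica domain.

```python
from typing import Dict, List


def has_quorum(alive: int, total: int) -> bool:
    """Raft can commit writes only while a strict majority of replicas is alive."""
    return alive > total // 2


def domains_safe_to_restart(copysets: Dict[str, List[str]]) -> Dict[str, bool]:
    """For each replica domain, report whether taking that domain down alone
    still leaves every replication group with a quorum.

    `copysets` maps a (hypothetical) group name to the domain of each replica.
    """
    all_domains = sorted({d for replicas in copysets.values() for d in replicas})
    result = {}
    for domain in all_domains:
        result[domain] = all(
            has_quorum(sum(1 for d in replicas if d != domain), len(replicas))
            for replicas in copysets.values()
        )
    return result


# Hypothetical layouts: three replicas per group.
well_spread = {
    "copyset-1": ["zone-a", "zone-b", "zone-c"],
    "copyset-2": ["zone-b", "zone-c", "zone-a"],
}
badly_spread = {
    "copyset-1": ["zone-a", "zone-a", "zone-b"],
}

print(domains_safe_to_restart(well_spread))
print(domains_safe_to_restart(badly_spread))
```

With one replica per domain, restarting any single domain leaves two of three replicas alive, so writes continue after at most a leader re-election. If two replicas of a group share a domain, restarting that domain would stall the group, which is why the upgrade order follows replica domains.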

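The utilization figures quoted for OSDs and chunkservers (a gap of up to 50% versus a gap within 10%) can be made concrete with a small calculation. This is only a sketch under stated assumptions: it reads the "gap" as percentage points between the fullest and emptiest node, the sample byte counts are invented for illustration, and neither function is part of the Ceph or Curve tooling.

```python
from typing import Sequence


def utilization_gap(used: Sequence[float], capacity: Sequence[float]) -> float:
    """Gap, in percentage points, between the fullest and emptiest storage node."""
    if not used or len(used) != len(capacity):
        raise ValueError("used and capacity must be non-empty and the same length")
    ratios = [100.0 * u / c for u, c in zip(used, capacity)]
    return max(ratios) - min(ratios)


def needs_rebalance(used: Sequence[float], capacity: Sequence[float],
                    max_gap_pct: float = 10.0) -> bool:
    """Flag a cluster whose utilization gap exceeds the given threshold.

    The 10-point default mirrors the chunkserver figure quoted in the text;
    treating it as a rebalancing trigger is an assumption of this sketch.
    """
    return utilization_gap(used, capacity) > max_gap_pct


# Invented sample data (GB used out of 1000 GB per node).
skewed_used = [900, 400, 650]
balanced_used = [620, 580, 600]
node_capacity = [1000, 1000, 1000]

print(utilization_gap(skewed_used, node_capacity), needs_rebalance(skewed_used, node_capacity))
print(utilization_gap(balanced_used, node_capacity), needs_rebalance(balanced_used, node_capacity))
```

The fullest node bounds how much the whole cluster can safely store, so a wide gap translates directly into capacity that cannot be used without rebalancing.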


- Explore cloud-native middleware scenarios based on Curve block storage, for example running adapted Redis, Kafka, and message queue services on Curve block storage volumes to reduce failover time.
- Bring a cloud-native database based on CurveBS + PolarFS + MySQL online.
- Switch the remaining virtual machines that still use Ceph cloud disks or local storage to Curve block storage volumes.
- GitHub: https://github.com/opencurve/curve
- WeChat group: search for and add the group assistant's WeChat ID, OpenCurve_bot