Replacing Ceph with Curve at NetEase Cloud Music
2022-06-28 15:18:00 【Netease Shufan】

Curve block storage has been running in production for nearly three years. It has withstood a wide range of abnormal and extreme scenarios, and its performance and stability exceed the requirements of our core business.
NetEase Cloud Music background
NetEase Cloud Music is one of the leading online music platforms in China, offering an interactive content community for music lovers. It has built a large, vibrant, and fast-growing business that provides users with community-centered online music services and social entertainment services. Its flagship products include “NetEase Cloud Music” and affiliated social entertainment products such as “LOOK Live”, “Acoustic Wave”, and “Yinjie”. Technology-driven tools let music lovers independently discover, enjoy, share, and create music and music-derived content, and interact with others.
Cloud disk business background at Cloud Music
The Cloud Music services that use cloud disks are mainly Java applications such as the main site, UGC, and the music library. The main site is the core business of Cloud Music and carries the highest SLA level (annual availability >= 99.99%). Providing a stable Cloud Music experience to hundreds of millions of users has always been our key focus and our main difficulty. Before 2019, Cloud Music mainly used Ceph cloud disks. Ceph is widely known to have performance shortcomings in large-scale deployments, and it is hard to guarantee that cloud disk IO latency stays unaffected under abnormal conditions (bad or slow disks, storage node downtime, storage network congestion, and so on). We invested a lot of engineering effort in optimizing and modifying Ceph to address cloud disk IO jitter, but this only eased the problem slightly and never fully solved it. We also put considerable effort into analyzing and tuning performance, but the results still fell short of expectations. This is why we started a project to evaluate the Curve block storage distributed storage system.
Introduction to Curve block storage
Curve block storage integrates well with mainstream cloud computing platforms and offers high performance, easy operation and maintenance, and stable, jitter-free behavior. In our deployment, Curve block storage is connected to Cinder as the storage backend for virtual machine cloud disks, to Nova as the backend for virtual machine system disks, and to Glance as the image storage backend. When a virtual machine is created, Nova uses the Python SDK provided by Curve block storage to clone a new volume to serve as the system disk. When a cloud disk is created, Cinder uses the Python SDK to create an empty volume or to clone a new volume from an existing volume snapshot, which can then be attached to a virtual machine as a cloud disk. The virtual machines use Libvirt as the virtualization control service and QEMU/KVM as the virtualization engine. Curve block storage provides a driver library for Libvirt/QEMU; once it is compiled in, Curve volumes can be used directly as remote storage, with no need to mount them on the local host.
Why choose Curve
1. Business side
i. For our Cloud Music workloads, Ceph cloud disks have two main pain points:
Poor performance: A single volume performs poorly (mainly high IO latency, IOPS that cannot be pushed up, and sensitivity to other heavily loaded volumes in the cluster), so Ceph cloud disks could only be used as system disks or as cloud disks for application logging. They could not support middleware services.
IO jitter: We observed that when IO latency exceeds 2s, disk util can hit 100%, which triggers widespread business alarms and request pile-ups, and in severe cases an avalanche effect. Over the previous two years, IO jitter on Ceph cloud disks was very frequent (roughly every month), and each episode lasted for minutes. As a result, many core applications switched to local storage to avoid these problems.
ii. Advantages of Curve cloud disks:
IO jitter: Since we switched to Curve cloud disks, disk IO util monitoring has never raised a 100% alarm caused by the distributed storage system. Business stability has improved greatly, and core services have gradually moved back to Curve cloud disks (after all, the high space utilization, reliability, portability, and fast recovery of cloud disks are also highly valued by the business).
Performance: On the same hardware, the performance of a single Curve volume is more than 2x that of a Ceph volume, and latency is much lower than Ceph's. The detailed comparison is shown in the following figure:
[Figure: single-volume performance comparison, Curve vs. Ceph]
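The 2s latency threshold described above lends itself to a simple check. The following is a minimal sketch, not part of any Curve or Ceph tooling: `IoSample`, `find_jitter_windows`, and the sample data are hypothetical names and values chosen for illustration. It scans a series of latency samples and reports the time windows in which latency stayed above 2s, the level at which we observed disk util reaching 100%.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Latency above which disk util was observed to reach 100% (from the text).
JITTER_THRESHOLD_S = 2.0


@dataclass
class IoSample:
    timestamp: float  # seconds since the start of the observation
    latency_s: float  # observed IO latency in seconds


def find_jitter_windows(
    samples: List[IoSample], threshold_s: float = JITTER_THRESHOLD_S
) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of consecutive runs above the threshold."""
    windows: List[Tuple[float, float]] = []
    start = None
    last = None
    for sample in samples:
        if sample.latency_s > threshold_s:
            if start is None:
                start = sample.timestamp  # a new jitter window begins
            last = sample.timestamp
        elif start is not None:
            windows.append((start, last))  # latency recovered, close the window
            start = None
    if start is not None:
        windows.append((start, last))  # series ended while still above threshold
    return windows


# Hypothetical samples: two separate jitter episodes.
samples = [
    IoSample(t, lat)
    for t, lat in [(0, 0.004), (1, 0.005), (2, 2.5), (3, 3.1), (4, 0.006), (5, 2.2)]
]
print(find_jitter_windows(samples))
```

In practice the duration of each window matters as much as its count, since minute-level windows are what pushed core applications to local storage.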

2. Operation and maintenance side
i. For our Cloud Music operations scenarios, the main pain points of Ceph are:
Service upgrade: Typical reasons to upgrade the client include bug fixes, new features, and version upgrades. We hit a 32-bit sequence number overflow bug in the Ceph community messenger module. It shows up on long-running clients and causes IO hangs, and both the client and the server must be updated to fix it. There are two ways to update the client: restart the virtual machine's QEMU process, or live-migrate the virtual machine. Both are workable for a handful of virtual machines, but doing this for hundreds of virtual machines is operationally impractical and clearly unacceptable to the business. In addition, restarting OSD processes during a server upgrade causes some IO jitter, so upgrades have to be done during off-peak hours and the business has to temporarily disable disk util alarms.
Performance: Operators mainly care about the overall performance of the storage cluster. If the cluster's total capacity does not match its total performance, we either run short of performance while capacity is still plentiful, or we create fewer volumes and waste capacity; continuing to create volumes degrades the IO latency and throughput of each volume. Moreover, once the number of volumes in a Ceph cluster reaches a certain scale, overall cluster performance declines as more volumes are added, which hurts single-volume performance even more.
Algorithm: Because of the limitations of the CRUSH algorithm, data is distributed very unevenly across Ceph OSDs and a lot of space is wasted. In our observation, the gap in space utilization between the fullest and emptiest OSD can reach 50%, so data rebalancing is needed often. Rebalancing triggers large amounts of data migration, which causes IO jitter, and it still cannot fully fix the uneven OSD capacity utilization.
IO jitter: Triggers include replacing bad disks, node downtime, high IO load, data rebalancing after capacity expansion (without adding a new pool, since too many pools make OpenStack maintenance complicated), NIC packet loss, slow disks, and so on.
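The utilization gap mentioned under "Algorithm" can be made concrete with a small experiment. The sketch below is an illustration under stated assumptions, not real CRUSH and not any storage system's actual placement logic: `place_by_hash` uses a plain SHA-256 hash as a stand-in for pseudo-random placement, `place_least_used` stands in for a centrally coordinated allocator, and the device count, chunk count, and capacity are hypothetical. It measures the gap between the most and least utilized device under each policy.

```python
import hashlib
from typing import Dict, List


def utilization_gap(used: Dict[str, int], capacity: int) -> float:
    """Gap between highest and lowest utilization, in percentage points."""
    ratios = [amount / capacity for amount in used.values()]
    return (max(ratios) - min(ratios)) * 100


def place_by_hash(chunk_ids: List[str], devices: List[str]) -> Dict[str, int]:
    """Pseudo-random placement: each chunk lands on a device chosen by its hash."""
    used = {device: 0 for device in devices}
    for chunk_id in chunk_ids:
        digest = hashlib.sha256(chunk_id.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(devices)
        used[devices[index]] += 1
    return used


def place_least_used(chunk_ids: List[str], devices: List[str]) -> Dict[str, int]:
    """Centralized placement: a coordinator always picks the least-used device."""
    used = {device: 0 for device in devices}
    for _ in chunk_ids:
        target = min(devices, key=lambda device: used[device])
        used[target] += 1
    return used


# Hypothetical cluster: 20 devices, 1000 chunks, capacity of 100 chunks each.
devices = [f"dev-{i}" for i in range(20)]
chunks = [f"chunk-{i}" for i in range(1000)]
capacity = 100

hashed = place_by_hash(chunks, devices)
central = place_least_used(chunks, devices)
print("hash placement gap (points):", utilization_gap(hashed, capacity))
print("centralized placement gap (points):", utilization_gap(central, capacity))
```

The centralized policy can keep devices level because it sees global state at allocation time, whereas hash-based placement only converges statistically and may need rebalancing afterwards.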
ii. By comparison, Curve has clear advantages in each of these areas:
Service upgrade: The client supports hot upgrade. A running QEMU process does not need to be restarted or migrated, and the millisecond-level impact is almost imperceptible to services inside the virtual machine. For the architecture behind hot upgrade, see ①. When the Curve server is upgraded, thanks to the quorum mechanism of the Raft consistency protocol, upgrading one replica domain at a time keeps the impact on business IO at the second level; since IO latency does not exceed 2s, util does not reach 100%.
Performance: At the same capacity, a Curve cluster can host more volumes while maintaining stable performance.
Algorithm: Curve data placement is handled by the centralized MDS service, which keeps the distribution highly balanced. The gap in space utilization between the fullest and emptiest chunkserver does not exceed 10%, so no data rebalancing is needed.
IO jitter: In the scenarios where Ceph cloud disks are prone to IO jitter, Curve cloud disks are more stable. The detailed Curve vs. Ceph comparison is as follows:
[Figure: Curve vs. Ceph behavior under IO jitter scenarios]
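The reasoning behind upgrading the server one replica domain at a time can be sketched as follows. This is a simplified model under stated assumptions, not Curve's upgrade tooling: `has_quorum` and `rolling_upgrade` are hypothetical helpers, and "zone" is used here as a stand-in name for a replica domain. A Raft group keeps serving writes as long as a strict majority of its replicas is up, so taking down one domain at a time is safe when no group has a majority of its replicas in the same domain.

```python
from typing import Dict, List


def has_quorum(replica_up: List[bool]) -> bool:
    """A Raft group can make progress while a strict majority of replicas is up."""
    return sum(replica_up) > len(replica_up) // 2


def rolling_upgrade(groups: Dict[str, List[str]], zones: List[str]) -> bool:
    """Simulate taking one zone down at a time.

    groups maps a group name to the zone of each of its replicas.
    Returns True if every group keeps quorum during every step.
    """
    for zone in zones:
        for replica_zones in groups.values():
            up = [replica_zone != zone for replica_zone in replica_zones]
            if not has_quorum(up):
                return False  # this group would stall while `zone` is restarting
    return True


zones = ["zone-a", "zone-b", "zone-c"]

# Three replicas spread over three zones: any single zone can go down.
spread = {"group-1": ["zone-a", "zone-b", "zone-c"]}
print(rolling_upgrade(spread, zones))

# Two replicas in the same zone: restarting zone-a breaks quorum.
skewed = {"group-2": ["zone-a", "zone-a", "zone-b"]}
print(rolling_upgrade(skewed, zones))
```

Even with quorum intact, a group whose leader sits in the restarting domain still has to elect a new leader, which is consistent with the second-level (rather than zero) IO impact reported above.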



Results of adopting Curve
Curve block storage has been running in production for nearly three years. It has withstood a wide range of abnormal and extreme scenarios, and its performance and stability exceed the requirements of our core business. Common failure scenarios have caused no noticeable IO jitter, and server and client version upgrades have not affected normal business operation. This confirms that our choice was the right one, and we thank the Curve team for their help along the way. Currently, Cloud Music uses Curve block storage for both the cloud disks and the system disks of virtual machines. System disks come in two fixed sizes, 40GB and 60GB. Cloud disks range from a minimum of 50GB to a maximum of 4TB (a soft limit; Curve cloud disks actually support creating PB-scale volumes).
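The disk size rules above can be written down as a small validation helper. This is a sketch for illustration only: `validate_size` and the constants are hypothetical names, not part of Cinder, Nova, or the Curve SDK, and it assumes 1TB = 1024GB.

```python
# Size rules taken from the text; the 1TB = 1024GB conversion is an assumption.
SYSTEM_DISK_SIZES_GB = (40, 60)
CLOUD_DISK_MIN_GB = 50
CLOUD_DISK_MAX_GB = 4 * 1024  # soft limit set by the platform, not by Curve


def validate_size(kind: str, size_gb: int) -> bool:
    """Return True if the requested size is allowed for the given disk kind."""
    if kind == "system":
        return size_gb in SYSTEM_DISK_SIZES_GB
    if kind == "cloud":
        return CLOUD_DISK_MIN_GB <= size_gb <= CLOUD_DISK_MAX_GB
    raise ValueError(f"unknown disk kind: {kind!r}")


print(validate_size("system", 40))   # allowed fixed size
print(validate_size("cloud", 8192))  # above the 4TB soft limit
```

Because the 4TB ceiling is a soft limit, raising it would only mean changing `CLOUD_DISK_MAX_GB` in a check like this one.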
Future plans
Building on Curve block storage:
Explore cloud-native middleware scenarios based on Curve block storage, for example running adapted Redis, Kafka, and message queue services on Curve block storage volumes to reduce failover time.
Launch a cloud-native database based on CurveBS + PolarFS + MySQL.
Switch the remaining virtual machines that use Ceph cloud disks or local storage over to Curve block storage volumes.
The Curve team is also working at full speed on a shared file storage service. NetEase's OpenStack-based private cloud 2.0 platform has evolved into the Kubernetes-based 3.0 platform, and business demand for ReadWriteMany PVC volumes has become increasingly urgent. The Curve team has developed the Curve distributed shared file system, which can store data either in the Curve block storage backend or in an S3-compatible object storage service, and it will also go live as soon as possible.
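For readers less familiar with the ReadWriteMany requirement, the sketch below builds a standard Kubernetes PersistentVolumeClaim manifest that requests that access mode. The field names follow the Kubernetes PVC API; the helper `rwx_pvc`, the claim name, the size, and the storage class name `curvefs-example` are hypothetical placeholders, not names defined by Curve or by our platform.

```python
import json


def rwx_pvc(name: str, storage: str, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest requesting ReadWriteMany access."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany: the volume can be mounted read-write by many nodes.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,  # hypothetical class name
            "resources": {"requests": {"storage": storage}},
        },
    }


manifest = rwx_pvc("shared-data", "100Gi", "curvefs-example")
print(json.dumps(manifest, indent=2))
```

A block volume is normally attached to a single node at a time, which is why this access mode calls for a shared file system rather than block storage alone.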
References:
① https://github.com/opencurve/curve/blob/master/docs/cn/nebd.md
WeChat group: search for and add the group assistant WeChat account OpenCurve_bot