Hot and cold data separation for massive data: scheme and practice
2022-07-03 01:32:00 【ByteDance technical team】
Background
As the financial payment business grows rapidly and order volume is expected to keep increasing, online storage faces growing pressure and capacity has to be planned ahead. The core financial payment services currently store their data in MySQL (InnoDB). Historical orders are rarely accessed yet occupy a large share of database storage, so the goal is to separate historical data from recent production transaction data: the current database keeps only the most recent period of data as the hot store, while historical transactions are moved to another database with compressed storage (RocksDB) as the cold store. This is hot/cold database separation. It substantially reduces database hardware cost and shortens the service downtime caused by expanding online storage when space runs short. The following case study, based on the unified trading system of financial payment, is offered for reference.
Solution
Technology selection
Architecture diagram
data:image/s3,"s3://crabby-images/f7180/f718009ebe894d18274d0ed3e60effa6f9fdbbe5" alt="568a826805010d4bd065ef0d25c9fed0.jpeg"
Scheme analysis
The business scenarios are complex, and working through them one by one would make the workload grow exponentially. Looking at the problem from another angle, database operations come down to queries, inserts, and updates, so it is enough to guarantee at the database interaction layer that these basic operations behave the same after hot/cold separation is introduced. The financial payment code follows a unified layering convention in which all database operations are encapsulated in the database interaction layer, which makes the change relatively easy. Without expanding capacity, the hot store is expected to keep the most recent X days of data (X is adjustable), and data older than X days is archived to the cold store.
data:image/s3,"s3://crabby-images/0658a/0658a1ce88105b0e55a8f22c9a8301e18decfeda" alt="56af68cc8fda71fcc9e49a4cf8de0997.jpeg"
data:image/s3,"s3://crabby-images/81437/814376436597fe8cd242abde896b34a7b4455e4d" alt="6c7f33264c0abc2394644d90c2ff9790.jpeg"
Scheme comparison
Option 1: Fully relieves database storage pressure, but places high performance demands on the cold store. It fits when the inserts, updates, and queries involved can derive the order time from the order number and filter on it, which reduces dependence on the cold store.
Option 2: Suitable when the cold store has lower performance and most of the inserts, updates, and queries involved cannot derive the time from the order number. An archive table in the hot store is needed as an intermediate filter.
Option 3: If the system's scenarios are simple and historical orders are never modified afterwards, data can be archived by scenario.
Option selection
Transaction: The transaction table records the mapping between merchant orders and internal financial payment orders, the transaction amount, the buyer and seller, and other key information. Its most important job is to prevent duplicate transactions. The cold store is slower than the hot store, and merchant order numbers follow no fixed format, so the order time cannot be derived from the order number to filter out cold-store lookups. Hot-store CPU utilization is very low, so computation on the hot database is not a bottleneck. Option 2 is therefore chosen for transactions. The main purpose of the transaction archive table is to reduce the dependence of online transactions on the cold store.
Payment: The payment table records which payment methods a transaction uses, how much to deduct through each method, where to deduct from, where to deduct to, and similar information. The queries, updates, and inserts involved can derive the order time from the transaction number or payment number, which cuts down cold-store queries, so Option 1 is chosen for payment.
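The date filter that payment relies on can be sketched as follows. This is only a minimal sketch under stated assumptions: the article does not give the layout of the payment number, so the code assumes a hypothetical format whose first eight characters are the creation date (`YYYYMMDD`), and `HOT_RETENTION_DAYS` is a made-up value standing in for the adjustable X.

```python
from datetime import date, datetime, timedelta
from typing import Optional

HOT_RETENTION_DAYS = 90  # hypothetical value for the adjustable "X days"


def order_date_from_number(order_no: str) -> Optional[date]:
    """Return the date embedded in an order number, or None if it has none.

    Assumes a hypothetical layout: the first 8 characters are YYYYMMDD.
    """
    try:
        return datetime.strptime(order_no[:8], "%Y%m%d").date()
    except ValueError:
        return None


def may_be_in_cold_store(order_no: str, today: date) -> bool:
    """Decide whether a hot-store miss should be followed by a cold-store query."""
    created = order_date_from_number(order_no)
    if created is None:
        # No usable date in the number: the cold store cannot be ruled out.
        return True
    return created <= today - timedelta(days=HOT_RETENTION_DAYS)
```

A number without a recognizable date always falls through to the cold store, which is the situation the transaction table is in and the reason it needs an archive table instead.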
Basic principles
To guarantee zero incidents and zero asset loss, the following basic principles were set during design. They were enforced layer by layer in development, testing, and code review, which effectively prevents production incidents.
Data insertion uniqueness:
Every unique key of the hot-store archive table must match the unique keys of the hot-store table being archived.
For every order recorded in the hot-store archive table, the cold store must hold the corresponding data.
Cold-store insert: insert into the cold store first, and insert into the hot-store archive table only after that succeeds.
Cold-store update: to refresh cold-store data, delete and then re-insert the cold-store record within the same database transaction.
Hot-store delete: use the cold-store record as the WHERE condition when deleting hot-store data. The delete succeeds only if every hot-store field (including the ID) matches.
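The "cold store first, archive table second" ordering can be illustrated with a small sketch. It assumes two in-memory SQLite databases as stand-ins for the hot store (MySQL) and the cold store (RocksDB); the table names, columns, and the helpers `open_stores` and `archive_order` are all hypothetical and not taken from the real system.

```python
import sqlite3


def open_stores():
    """Create in-memory stand-ins for the hot store and the cold store."""
    hot = sqlite3.connect(":memory:")
    cold = sqlite3.connect(":memory:")
    ddl = "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_no TEXT UNIQUE, amount INTEGER)"
    hot.execute(ddl)
    cold.execute(ddl)
    # The archive table lives in the hot store and shares the unique key.
    hot.execute("CREATE TABLE orders_archive (order_no TEXT UNIQUE, status TEXT)")
    return hot, cold


def archive_order(hot, cold, order_no):
    """Copy one order to the cold store, then record it in the archive table."""
    row = hot.execute(
        "SELECT id, order_no, amount FROM orders WHERE order_no = ?", (order_no,)
    ).fetchone()
    if row is None:
        return False
    # Step 1: idempotent cold-store write. REPLACE removes any old copy and
    # inserts the new one inside a single cold-store transaction.
    with cold:
        cold.execute(
            "INSERT OR REPLACE INTO orders (id, order_no, amount) VALUES (?, ?, ?)", row
        )
    # Step 2: only after the cold write has committed, record the order in
    # the hot-store archive table.
    with hot:
        hot.execute(
            "INSERT OR IGNORE INTO orders_archive (order_no, status) VALUES (?, 'PROCESSING')",
            (order_no,),
        )
    return True
```

If step 2 fails, the cold store merely holds an extra copy and a retry is harmless. The reverse order could leave an archive record with no cold-store data behind it, which is exactly what the principles forbid.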
Data update consistency:
Business updates are never applied in the cold store. All updates must be performed in the hot store. If the data to be updated exists only in the cold store, it must first be synchronized back to the hot store and then updated there.
When a record exists in both the cold store and the hot store, the hot-store copy is authoritative. The only source of cold-store data is hot-store data synchronized to the cold store.
When data is synchronized from the cold store back to the hot store, the changes to the archive table and the transaction table must be made in the same database transaction, and the queries involved must go to the primary (write) database.
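The update path for an order that exists only in the cold store might look like the sketch below. It again uses SQLite connections as stand-ins for both stores, with hypothetical table and column names. The `HOT` archive status for a restored order is inferred from the status names mentioned in the task logic later in this article and should be read as an assumption.

```python
import sqlite3


def update_amount(hot, cold, order_no, new_amount):
    """Update an order, pulling it back from the cold store first if needed."""
    with hot:  # restore, archive status change, and update share one transaction
        found = hot.execute(
            "SELECT 1 FROM orders WHERE order_no = ?", (order_no,)
        ).fetchone()
        if found is None:
            row = cold.execute(
                "SELECT id, order_no, amount FROM orders WHERE order_no = ?",
                (order_no,),
            ).fetchone()
            if row is None:
                return False  # unknown in both stores
            hot.execute(
                "INSERT INTO orders (id, order_no, amount) VALUES (?, ?, ?)", row
            )
            # Assumed status name: HOT marks an archived order that is back
            # in the hot store.
            hot.execute(
                "UPDATE orders_archive SET status = 'HOT' WHERE order_no = ?",
                (order_no,),
            )
        # The business update itself only ever touches the hot store.
        hot.execute(
            "UPDATE orders SET amount = ? WHERE order_no = ?", (new_amount, order_no)
        )
    return True
```

The cold-store copy is left stale on purpose: the hot copy is authoritative, and the next archiving run overwrites the cold record.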
Accuracy of data query:
Single query: when the record is not found in the hot store, query the cold store as well. (If the order date can be derived from the order number, add a date filter to reduce cold-store queries.)
Batch query: when a record exists in both stores, return the hot-store copy.
Batch query: after merging hot and cold results, check whether the original query interface guarantees an order. If it does, sort again after merging.
Reduce cold-store pressure: the cold store is slower, so online real-time transactions should query and depend on it as little as possible (filter by the date in the transaction number or by the archive table).
Separate day thresholds: the database interaction layer uses a threshold of n days and the archiving task uses m days, with m > n required. For example, the model layer queries the cold store only for orders older than n days, while the archiving task archives only data older than m days. Controlling the two separately prevents data from becoming unfindable when the archiving threshold is adjusted.
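The two batch-query rules (hot copy wins, then restore the interface's ordering) can be sketched in a few lines. The row shape (dictionaries with `order_no` and `created_at` keys) and the function name `merge_batch` are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def merge_batch(hot_rows: Iterable[dict], cold_rows: Iterable[dict],
                sort_key: str = "created_at", descending: bool = True) -> List[dict]:
    """Merge hot and cold query results; the hot copy wins on duplicates."""
    merged: Dict[str, dict] = {}
    for row in cold_rows:
        merged[row["order_no"]] = row
    for row in hot_rows:  # hot rows overwrite cold rows with the same number
        merged[row["order_no"]] = row
    # Re-sort, because concatenating two sorted result sets is not sorted.
    return sorted(merged.values(), key=lambda r: r[sort_key], reverse=descending)
```

Loading the cold rows first and letting hot rows overwrite them keeps the precedence rule in one place, so callers cannot accidentally return a stale cold copy.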
Implementation details
Archive table structure
data:image/s3,"s3://crabby-images/01207/012074d72dfc7bd8fce027c7d6a9fcabda6fad82" alt="19cba3ebde7b642f6d9b435f0d2d5d5b.jpeg"
Archive table status flow
data:image/s3,"s3://crabby-images/e5715/e571523bd7422ee2e0211882a58ec7d3a2a542b7" alt="961e52ed71c2a5855f08646ece241d2e.jpeg"
Consistent deletion
All fields of the cold-store record (including the auto-increment id) are used as the WHERE condition for deleting the hot-store row. Deleting the hot-store row and updating the hot-store archive status to COLD must happen in one database transaction, and any failure rolls the whole transaction back.
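A minimal sketch of the consistent delete is shown below, assuming a SQLite connection as a stand-in for the hot store. The column list, the table names, and the function `consistent_delete` are hypothetical, and the sketch assumes that no column is NULL, since `col = ?` never matches a NULL value.

```python
import sqlite3

# Hypothetical column list; a real table would list every field.
COLUMNS = ("id", "order_no", "amount", "status")


def consistent_delete(hot, cold_row):
    """Delete the hot row only if it matches the cold copy field by field.

    Returns True if the row was deleted and the archive record marked COLD.
    """
    where = " AND ".join(f"{col} = ?" for col in COLUMNS)
    try:
        with hot:  # one transaction: commit on success, roll back on error
            cur = hot.execute(f"DELETE FROM orders WHERE {where}", cold_row)
            if cur.rowcount != 1:
                raise LookupError("hot row differs from the cold copy")
            cur = hot.execute(
                "UPDATE orders_archive SET status = 'COLD' WHERE order_no = ?",
                (cold_row[1],),
            )
            if cur.rowcount != 1:
                raise LookupError("archive record missing")
    except LookupError:
        return False
    return True
```

A `False` result means the hot row changed after it was copied. The caller is then expected to synchronize the hot row to the cold store again and retry, as the deletion task described below does.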
Transaction and payment tasks (archiving, deletion, compensation)
Archiving task
Query the hot-store order table for orders older than X days (X is adjustable), synchronize them to the cold store, insert records into the hot-store archive table with the status PROCESSING, and publish a delayed-delete MQ message.
Archive deletion task
A resident service task consumes the delete MQ messages and calls, via RPC, the deletion interface provided by the transaction and payment services. It supports local rate limiting.
Compensation task
Main function: query the hot-store archive table for records that have stayed in PROCESSING for longer than the specified time since their last modification, and force the delete operation. This guards against MQ failures and the occasional lost message: the compensation task works through archive records that are stuck in PROCESSING.
Execution logic
Data archiving task (started once a day)
for {
    initialize the query time range and paging
    for {
        query transaction orders older than X days, limit 1000 (ordered by index, rolling time window)
        if records exist and count <= 1000 {
            for each record {
                // handled by x concurrent workers
                idempotently write the order to the cold store (not guaranteed to be the latest version; only guarantees that cold-store data exists)
                idempotently write the archive record table (status PROCESSING; updated to COLD once the hot-store data is deleted; an existing archive record in HOT status is updated to PROCESSING)
                send a delayed MQ message to delete the hot-store data after X min (configurable)
            }
        }
        if count == 1000 {
            continue    // a full page: fetch the next page of the same window
        }
        move the time window forward
        // no more records in the current window
        if the window end time exceeds the specified time {
            break    // leave the loop; the task ends
        }
        record the current query conditions in Redis so that a later run can resume
    }
}
Deleting hot-store data (MQ consumer)
consume MQ record {
    query the cold store
    consistent delete (open a transaction; conditionally delete the hot-store data; update the archive record status to COLD; commit)
    if the consistent delete fails: synchronize the hot-store data to the cold store again, then retry the consistent delete
}
Compensation task (started every 30 minutes)
for {
    query the archive record table for records in PROCESSING status whose modification time is more than X + Y min ago, limit 1000
    if none exist {
        break
    }
    for each record {
        query the cold store
        consistent delete (open a transaction; conditionally delete the hot-store data; update the archive record status to COLD; commit)
        if the consistent delete fails: synchronize the hot-store data to the cold store again, then retry the consistent delete
    }
}
Rolling time window for the archiving query: the first window starts at a fixed date (the date of the earliest financial payment order) and ends at the specified date. Each subsequent window starts at the previous window's end time and ends at that time plus the configured window length. Each time the query advances to the next window, the state is saved in Redis: the specified date, the time range of the day's task, and the paging position.
data:image/s3,"s3://crabby-images/b3df4/b3df4067182837da64a28841cbda6e9a5a222588" alt="9db50c5c32a1aef60f4e4c5e6f02a0f1.jpeg"
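The rolling window can be expressed as a small generator. This is a sketch only: `rolling_windows` and its parameters are hypothetical names, `step` is assumed to be positive, and `resume_from` stands in for the checkpoint that the article keeps in Redis.

```python
from datetime import datetime, timedelta
from typing import Iterator, Optional, Tuple


def rolling_windows(first_start: datetime, stop_at: datetime, step: timedelta,
                    resume_from: Optional[datetime] = None
                    ) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive [start, end) windows until stop_at is reached.

    If resume_from is given, scanning restarts from that window start
    instead of first_start. step must be a positive duration.
    """
    start = resume_from or first_start
    while start < stop_at:
        end = min(start + step, stop_at)  # the last window may be shorter
        yield start, end
        start = end  # the next window starts where the previous one ended
```

Persisting the start of the window currently being processed after each step is what lets an interrupted run pick up where it stopped rather than rescanning from the earliest order date.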
Concurrent archiving: the archiving task must support sharding across multiple tasks and concurrent processing.
Raising the daily archiving volume: to avoid affecting online transactions, the 24 hours of the day are divided into peak, off-peak, and normal periods, and the archiving speed differs for each.
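Time-of-day throttling could be driven by a simple lookup table, as sketched below. The article names the three periods but gives no hours or rates, so every boundary and number in `SCHEDULE` is invented for illustration.

```python
from datetime import time

# Hypothetical schedule: (start, end, orders archived per second).
SCHEDULE = [
    (time(0, 0), time(7, 0), 2000),   # off-peak: archive fast
    (time(7, 0), time(19, 0), 500),   # normal
    (time(19, 0), time(23, 0), 100),  # peak: archive slowly
    (time(23, 0), time.max, 500),     # normal
]


def archive_rate(now: time) -> int:
    """Return the archiving rate limit that applies at the given time of day."""
    for start, end, rate in SCHEDULE:
        if start <= now < end:
            return rate
    return SCHEDULE[-1][2]  # only reached for time.max itself
```

The archiving workers would consult `archive_rate` before each batch and feed the result into whatever rate limiter they already use.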
Transaction: with an archive table (query, insert, update)
Characteristics: the unique key contains an external order number. Its format is arbitrary and the time cannot be derived from it, so an archive table is required.
Query
The logic is handled uniformly in the database interaction layer.
data:image/s3,"s3://crabby-images/99e03/99e03f19b0ac6597ae6efb8d749326ec93d64f03" alt="2d8e746ae425bfb613579cf436294fec.png"
The following cases can be handled specially to reduce dependence on the cold store.
Single query:
Query by external order number: if the query QPS is high, check the archive table first and query the cold store only when needed.
Query by transaction number: if the time can be derived from the order number, use it to filter by time range before querying the cold store.
Batch query: some management back-office features use paginated queries. When adding cold-store query logic for features that need a wide data range, the start time of the requested time range can be used to decide whether the cold store must be queried at all. When a record exists in both stores, the hot-store copy is kept (duplicates are filtered only among records with the same order number on the same page). If a result is in doubt, query again by order number to get the latest data and confirm. It is also worth confirming with product and operations whether querying only the hot store is acceptable.
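The single-query path with an archive-table filter can be sketched as below. Plain dictionaries stand in for the hot-store order table and the archive table, and `query_cold` is a hypothetical callable wrapping the cold-store lookup. None of these names come from the real code base.

```python
from typing import Callable, Dict, Optional


def query_transaction(order_no: str,
                      hot: Dict[str, dict],
                      archive: Dict[str, str],
                      query_cold: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Look up one transaction, touching the cold store only when needed."""
    row = hot.get(order_no)
    if row is not None:
        return row  # hot data always takes precedence
    # The archive table lives in the hot store, so this check is cheap.
    if order_no not in archive:
        return None  # never archived, so the cold store cannot hold it
    return query_cold(order_no)
```

For a merchant order number that was never seen before (the common case when checking for duplicate transactions), the lookup ends at the archive table and the cold store is never touched.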
Update
The logic is handled uniformly in the database interaction layer.
data:image/s3,"s3://crabby-images/7a65b/7a65be5549d52a41dc163e9eddb357f0439bc54e" alt="2e79b19d19aa402c994a23567a17d90a.jpeg"
Insert
The logic is implemented in the database interaction layer
data:image/s3,"s3://crabby-images/0bc0b/0bc0b4aedcc369f4eef9494d12281c75d00d57d2" alt="e4d384b8d099e2fafc4ef5f4bfe5ea34.jpeg"
Payment: no archive table (query, insert, update)
Characteristics: the unique key is an internal order number, and for the existing main queries the time can be derived from it. No archive table is needed, so this option completely solves the hot-store storage problem.
Query
The logic is handled uniformly in the database interaction layer.
data:image/s3,"s3://crabby-images/694ce/694ce710e3457fbcad77f14ad1474448c7f3e6d0" alt="ce2332d82afad1427db60e27df175e51.jpeg"
The following cases can be handled specially to reduce dependence on the cold store.
Single query:
Query by payment order number: if the time can be derived from the order number, use it to filter by time range before querying the cold store.
Batch query:
Query by transaction number: if the time can be derived from the order number, use it to filter by time range before querying the cold store.
Some management back-office features use paginated queries. When adding cold-store query logic for features that need a wide data range, the start time of the requested time range can be used to decide whether the cold store must be queried at all. When a record exists in both stores, the hot-store copy is kept (duplicates are filtered only among records with the same order number on the same page). If a result is in doubt, query again by order number to get the latest data and confirm. It is also worth confirming with product and operations whether querying only the hot store is acceptable.
Update
The logic is handled uniformly in the database interaction layer.
data:image/s3,"s3://crabby-images/153d7/153d749c6dd2f0d001a16f2b7a5d8468d04977ec" alt="3a9a34ee8aa511497988f4991a65f8c5.jpeg"
Insert
The logic is implemented in the database interaction layer
data:image/s3,"s3://crabby-images/33223/332238d2aaa8608954544eab45871fc0b26e59b7" alt="741b4b62adc225a302ab2c99ef9513e5.jpeg"
Summary
Payment has no archive table, so its database storage pressure is solved completely and a large amount of database storage is saved.
Transaction adds an archive table, so the storage pressure on its hot database is greatly deferred rather than removed. This buys the transaction database extra time before expansion and leaves ample time for further transaction optimization and a longer-term solution to its storage problem.
Results
The storage pressure on the payment database is completely resolved, and the storage pressure on the transaction hot database is effectively relieved.
The number of days retained in the hot store can be adjusted flexibly according to subsequent order volume.
Shortcomings
Option 2 adds an archive table that holds a record for every archived order, so it can only ease the storage pressure on the transaction and payment databases and cannot solve the storage problem completely.
The datafree space released by the transaction table (the free space left inside the table after rows are deleted) cannot be used by the archive table. It can only be reused by the transaction table itself, so the transaction table's datafree space has to be reclaimed from time to time.
data:image/s3,"s3://crabby-images/07ab9/07ab99f6b4323d8d5befb446b98614557a773132" alt="897aed495ab9c3cdba87b3827c51ef8e.jpeg"