Tiktok payment in practice: issuing coupons under 100,000 TPS of traffic
2022-06-23 23:13:00 【ByteDance technical team】
Background
In recent years, Tiktok has run a variety of Spring Festival activities for its users, with hundreds of millions of users taking part each year. For the 2022 Spring Festival, Tiktok payment joined in as well, issuing Tiktok payment coupons to a large number of users to give them a better experience during the festival activities. This was a major challenge for Tiktok payment: the team had never handled a Spring Festival scenario before, so the event was a serious test of the Tiktok payment marketing system.
Introduction to the Tiktok payment marketing system
The current layering and structure of the marketing system are as follows:

The marketing team's business falls into three areas: marketing delivery, marketing activities, and marketing assets. Marketing delivery is responsible for placing marketing benefits, that is, exposing them to users. Marketing activities are responsible for building campaign mechanics and distributing benefits. Marketing assets are responsible for managing users' marketing assets, such as issuing and redeeming coupons and instant discounts. For the Spring Festival coupon issuance, marketing activities connected to the Spring Festival main venue and called the marketing asset interface to issue payment coupons to users.
Challenges
Spring Festival activities are highly concurrent: a large number of users claim coupons at the same time, so the load on the system at peak is very heavy.
The activities are open to users nationwide, so the audience is very broad and user experience matters a great deal; issuing a coupon therefore needs to take as little time as possible.
With so many participants, a large number of payment coupons were expected to be issued during the festival, so fund security needed particular protection.
Solution
Performance guarantee
Asynchronous issuance - Improve interface response speed
Most payment coupons are used in Tiktok e-commerce scenarios, and users do relatively little online shopping during the Spring Festival, so the probability that a user spends a coupon immediately after claiming it is low. As long as the delay between claiming a coupon and actually perceiving it (viewing it or using it) stays under control, the user experience is unaffected. We therefore adopted asynchronous issuance: when marketing receives an issuance request from upstream, it records the order, returns immediately, and tells the upstream that the request was accepted.

One problem with asynchronous issuance is that the user may be told the claim succeeded while the coupon is never actually issued, mostly because of insufficient inventory or interception by risk control. We discussed this with our operations colleagues: during the Spring Festival, the inventory of marketing activities would be provisioned as generously as possible and the risk control interception rate kept to a minimum, intercepting nothing except abnormal coupon farming. This reduces the chance of an asynchronous issuance failing as far as possible.
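The accept-then-return flow described above can be sketched as follows. This is a minimal illustration, not the team's implementation: `IssueRequest`, `Acceptor`, the `"accepted"` status, and the in-memory map standing in for the order table are all assumptions made for the example.

```go
package main

// IssueRequest is a hypothetical coupon issuance request.
type IssueRequest struct {
	SerialNo string // unique serial number of this issuance
	UserID   string
	BatchID  string
}

// Acceptor records an order and queues it, without waiting for the coupon
// to be issued. It is not safe for concurrent use; a real service would
// write the order to a database instead of a map.
type Acceptor struct {
	orders map[string]string // serial number -> status (stands in for the order table)
	queue  chan IssueRequest // buffered local queue, drained by a background consumer
}

// NewAcceptor creates an Acceptor whose local queue holds up to capacity requests.
func NewAcceptor(capacity int) *Acceptor {
	return &Acceptor{
		orders: make(map[string]string),
		queue:  make(chan IssueRequest, capacity),
	}
}

// Accept records the order as "accepted" and tries to queue it without
// blocking. It returns right away; the result only says whether the request
// was queued. An order that was recorded but not queued stays "accepted".
func (a *Acceptor) Accept(req IssueRequest) (queued bool) {
	a.orders[req.SerialNo] = "accepted"
	select {
	case a.queue <- req:
		return true
	default:
		return false
	}
}
```

Recording the order before queueing means that a request which never reaches the queue still leaves a trace that a later repair job could pick up.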
Two-tier local queues - Increase processing capacity and smooth traffic
Once issuance is asynchronous, the next concern is the shape of the traffic: activity traffic mostly looks like an ECG trace, with pronounced peaks and troughs. Borrowing from the producer-consumer model, we introduced queues to smooth it. We first considered RocketMQ, but adding another piece of middleware to the core coupon issuance path means depending on it and having to plan for its availability and disaster recovery. RocketMQ is also a remote queue, so the delay between producer and consumer is hard to control. We therefore designed a local queue model that avoids these problems.

The local queue model is shown above. The queue consumer first obtains a token from the distributed rate limiter; once it has one, it takes a request from the queue, starts a new goroutine to run the activity's award logic, and then repeats the process.
Although a queue clips peaks by itself, it does not let us control the consumption rate precisely. When upstream traffic is too high, the consumer's rate rises with it and can overwhelm the system, so a rate limiting layer is still needed for more precise flow control. Rate limiting at interface granularity alone is too coarse, so we added a layer of business rate limiting in the business logic layer. The limiter is a self-developed distributed rate limiter component: it takes tokens locally first and, when local tokens run short, pulls a batch of tokens from the remote side.
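The "local first, then pull a batch from remote" behaviour can be illustrated with a small sketch. The self-developed component itself is not described in the article, so `RemoteTokens`, `BatchLimiter`, and `fixedBudget` below are hypothetical names, and the remote side is replaced by an in-memory budget.

```go
package main

import "sync"

// RemoteTokens is a hypothetical remote token source, such as a central quota service.
type RemoteTokens interface {
	// Take returns up to n tokens, fewer when the shared budget is running out.
	Take(n int) int
}

// BatchLimiter serves tokens from a local balance and refills that balance
// from the remote source one batch at a time.
type BatchLimiter struct {
	mu     sync.Mutex
	local  int
	batch  int
	remote RemoteTokens
}

// NewBatchLimiter creates a limiter that pulls batch tokens per remote call.
func NewBatchLimiter(remote RemoteTokens, batch int) *BatchLimiter {
	return &BatchLimiter{remote: remote, batch: batch}
}

// Acquire takes one token, going to the remote source only when the local
// balance is empty. It returns false when no token is available anywhere.
func (l *BatchLimiter) Acquire() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.local == 0 {
		l.local = l.remote.Take(l.batch)
	}
	if l.local == 0 {
		return false
	}
	l.local--
	return true
}

// fixedBudget is an in-memory RemoteTokens used only for illustration.
type fixedBudget struct {
	left  int
	calls int // number of remote round trips
}

func (f *fixedBudget) Take(n int) int {
	f.calls++
	if n > f.left {
		n = f.left
	}
	f.left -= n
	return n
}
```

With a batch size of 3, three consecutive `Acquire` calls cost a single remote round trip, which is the point of pulling tokens in batches.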
The model also uses two tiers of queues. The first tier protects the marketing activity layer, and its rate limit is set from the processing capacity of marketing activities. After a request passes the activity decision, it is placed in the second tier queue, whose rate limit is set from the carrying capacity of the marketing asset system. With two tiers, the capacity gap between marketing activities and marketing assets is absorbed, and the throughput of both systems can be maximized.
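A rough sketch of one queue tier, following the consumer loop described above (token first, then one request, then a new goroutine). The token channel stands in for the distributed rate limiter, and `Request` and `runStage` are names invented for this example.

```go
package main

import "sync"

// Request is a hypothetical coupon issuance request.
type Request struct{ UserID, BatchID string }

// runStage consumes one tier: wait for a token, take one request, handle it
// in its own goroutine, repeat. It returns once the token channel is closed
// or the request channel is closed and drained, after all started handlers
// have finished.
func runStage(tokens <-chan struct{}, in <-chan Request, handle func(Request)) {
	var wg sync.WaitGroup
	for range tokens {
		req, ok := <-in
		if !ok {
			break
		}
		wg.Add(1)
		go func(r Request) {
			defer wg.Done()
			handle(r)
		}(req)
	}
	wg.Wait()
}
```

Two tiers are then two `runStage` loops with separate token channels: the first tier's handler runs the activity decision and pushes passing requests into the second tier's channel, and the second tier's handler calls marketing assets. Because each tier has its own token supply, each can be tuned to the capacity of the system it protects.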
Inventory deduction optimization - Reduce hot spots and relieve pressure
For performance reasons, the inventory of each coupon batch in marketing assets is kept in Redis. The inventory logic is currently: receive a user's coupon issuance request -> read Redis to check that the batch has enough inventory -> write Redis to deduct the batch inventory.
To keep traffic even across the Redis cluster, the inventory data of different coupon batches is spread over different Redis shards. However, when one batch is issued intensively over a period of time, most of the traffic still lands on a single shard and creates a Redis hot spot. If the many individual deductions against one batch could be combined, the hot spot would be greatly relieved.

The merged issuance logic is shown in the figure above. Marketing activities first try to take N coupon issuance requests from the second tier queue without blocking. If any are available, they are packaged and sent to marketing assets; if fewer than N are obtained, the queue has no more data for the moment, so the coupons are sent right away rather than waiting. If nothing is obtained, the consumer sleeps for a short random period and tries again, and if it still gets nothing after a limited number of retries, it ends this cycle.
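The non-blocking batch fetch with bounded retries could look roughly like this. It is a sketch under assumptions: `takeBatch`, `nextBatch`, and their parameters are illustrative names, and a buffered Go channel plays the role of the second tier queue.

```go
package main

import (
	"math/rand"
	"time"
)

// Request is a hypothetical coupon issuance request.
type Request struct{ UserID, BatchID string }

// takeBatch pops up to n requests without ever blocking.
func takeBatch(q <-chan Request, n int) []Request {
	batch := make([]Request, 0, n)
	for len(batch) < n {
		select {
		case r, ok := <-q:
			if !ok {
				return batch // queue closed
			}
			batch = append(batch, r)
		default:
			return batch // queue empty for now
		}
	}
	return batch
}

// nextBatch returns a full or partial batch as soon as one is available.
// When the queue is empty it sleeps for a short random time (at most
// maxSleep, which must be positive) and retries up to maxRetries times,
// then gives up and returns nil.
func nextBatch(q <-chan Request, n, maxRetries int, maxSleep time.Duration) []Request {
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if batch := takeBatch(q, n); len(batch) > 0 {
			return batch
		}
		if attempt < maxRetries {
			time.Sleep(time.Duration(rand.Int63n(int64(maxSleep)) + 1))
		}
	}
	return nil
}
```

A partial batch is returned immediately instead of waiting for it to fill up, which keeps latency low when traffic is light.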
When marketing assets receive a merged issuance request, they try to combine the requests that target the same coupon batch and deduct the inventory in one operation. For example, if N different users are each issued a coupon from batch A, deducting 1 at a time takes N writes to Redis; after merging, a single write that deducts N is enough.
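The merge step amounts to counting requests per batch and issuing one decrement per batch. The sketch below uses an in-memory `countingStock` in place of Redis so that the number of writes can be observed; with Redis, each write could be a single `DECRBY`. All type and function names here are assumptions for illustration.

```go
package main

// Request is a hypothetical coupon issuance request.
type Request struct{ UserID, BatchID string }

// mergeDeductions counts how many coupons each batch must give up.
func mergeDeductions(reqs []Request) map[string]int64 {
	counts := make(map[string]int64)
	for _, r := range reqs {
		counts[r.BatchID]++
	}
	return counts
}

// countingStock is an in-memory stand-in for the Redis inventory that also
// counts write operations.
type countingStock struct {
	stock  map[string]int64
	writes int
}

// DecrBy deducts n from a batch in one write.
func (s *countingStock) DecrBy(batchID string, n int64) {
	s.stock[batchID] -= n
	s.writes++
}

// deductMerged performs one write per batch instead of one per request.
func deductMerged(s *countingStock, reqs []Request) {
	for batchID, n := range mergeDeductions(reqs) {
		s.DecrBy(batchID, n)
	}
}
```

For four requests, three against batch A and one against batch B, this performs two writes rather than four.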
In addition, the check performed before deducting inventory does not need to hit Redis every time. It is only a pre-check: whether the deduction finally succeeds depends on the result of the deduction itself. Its main purpose is to block pointless deductions when Redis inventory is insufficient, so it does not have to be precise. We therefore keep the inventory information of each coupon batch in the application's local memory, synchronized from Redis. When issuing a coupon, the application only checks the inventory in local memory and no longer needs to access remote Redis.
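A minimal sketch of such a local pre-check follows. `StockSnapshot` and its methods are hypothetical; how and how often the snapshot is refreshed from Redis is not stated in the article, so `Refresh` simply accepts whatever a sync job would supply. Letting unknown batches through is also an assumption of this sketch.

```go
package main

import "sync"

// StockSnapshot is an approximate, in-process copy of batch inventory.
type StockSnapshot struct {
	mu    sync.RWMutex
	stock map[string]int64
}

// NewStockSnapshot creates an empty snapshot.
func NewStockSnapshot() *StockSnapshot {
	return &StockSnapshot{stock: make(map[string]int64)}
}

// Refresh replaces the snapshot with the latest values read from the remote store.
func (s *StockSnapshot) Refresh(latest map[string]int64) {
	fresh := make(map[string]int64, len(latest))
	for batchID, n := range latest {
		fresh[batchID] = n
	}
	s.mu.Lock()
	s.stock = fresh
	s.mu.Unlock()
}

// MayHaveStock is the cheap pre-check. It only filters out batches known to
// be empty; batches missing from the snapshot are let through so that the
// authoritative deduction decides.
func (s *StockSnapshot) MayHaveStock(batchID string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	n, known := s.stock[batchID]
	return !known || n > 0
}
```

Because the real deduction remains authoritative, a stale snapshot can only cause an unnecessary deduction attempt or a briefly delayed rejection, never an oversell.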
Graceful exit - Improve system robustness
A major drawback of processing data in local queues is that memory is volatile, so the data is not durable. When the application is redeployed or upgraded, coupon issuance requests in the local queue may be lost and the user's request never processed.
To avoid losing in-memory data, the application must detect its exit signal and deal with the data in memory before it exits. We looked into the lifecycle of application instances on ByteDance's cloud. When an instance is terminated, it is first removed from the service registry, which means it stops receiving new external traffic; a SIGINT signal is then sent to the business process. On receiving SIGINT, the application stops consuming the remaining coupon issuance requests in the queue and instead sends them to a remote queue, where they are consumed by other instances that are still alive.
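The hand-off on SIGINT can be sketched with Go's standard `os/signal` package. The `forward` callback stands in for the remote queue producer, which the article does not name; `drainTo` and `handOffOnInterrupt` are illustrative names.

```go
package main

import (
	"os"
	"os/signal"
)

// Request is a hypothetical coupon issuance request.
type Request struct{ UserID, BatchID string }

// drainTo forwards everything still buffered in the local queue and returns
// how many requests were forwarded. It stops at the first forwarding error;
// the request that failed is not put back and would need separate repair.
func drainTo(q <-chan Request, forward func(Request) error) (sent int, err error) {
	for {
		select {
		case r, ok := <-q:
			if !ok {
				return sent, nil
			}
			if err := forward(r); err != nil {
				return sent, err
			}
			sent++
		default:
			return sent, nil
		}
	}
}

// handOffOnInterrupt blocks until the process receives SIGINT, then forwards
// the remaining local requests instead of processing them locally.
func handOffOnInterrupt(q <-chan Request, forward func(Request) error) (int, error) {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt) // os.Interrupt is SIGINT
	defer signal.Stop(sig)
	<-sig
	return drainTo(q, forward)
}
```

This only works because the instance has already been removed from the service registry before the signal arrives, so no new requests enter the local queue while it is being drained.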

Fallback compensation - Ensure eventual consistency
Even with graceful exit in place, extreme cases remain: when the application exits because of a panic, OOM, or a physical machine failure, it never receives SIGINT and cannot run the graceful exit logic. We therefore added a compensation mechanism as a fallback: a scheduled task scans the table and redelivers records that have been stuck in an intermediate state for a long time to the local queue for processing.
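One way to picture the scan is the sketch below. The `Order` fields, the `"accepted"` intermediate state, and the age threshold are assumptions; the article does not give the table schema or the scan interval.

```go
package main

import "time"

// Order is a hypothetical row of the issuance order table.
type Order struct {
	SerialNo  string
	Status    string // "accepted" is the assumed intermediate state
	UpdatedAt time.Time
}

// stuckOrders selects orders that have stayed in the intermediate state for
// longer than maxAge.
func stuckOrders(orders []Order, now time.Time, maxAge time.Duration) []Order {
	var stuck []Order
	for _, o := range orders {
		if o.Status == "accepted" && now.Sub(o.UpdatedAt) > maxAge {
			stuck = append(stuck, o)
		}
	}
	return stuck
}

// redeliver pushes stuck orders back into the local queue without blocking
// and returns how many were queued.
func redeliver(stuck []Order, q chan<- Order) int {
	n := 0
	for _, o := range stuck {
		select {
		case q <- o:
			n++
		default:
			return n // queue full; a later scan will pick up the rest
		}
	}
	return n
}
```

Redelivery is only safe when issuance is idempotent, since a stuck order may in fact have been partly processed before the crash.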

With a scheduled task to compensate, is graceful exit still needed? It is. Deployments restart the application frequently, and at those moments the local queues are likely to hold many unprocessed requests. If everything were left to the scheduled task, the delay between the user claiming a coupon and the coupon actually arriving could be very long and hurt the user experience. Graceful exit and compensation are therefore complementary: graceful exit protects the user experience as far as possible, and fallback compensation ensures eventual consistency of the data.
Green channel - Enhance user experience
Asynchronous issuance assumes a buffer between the user claiming a coupon and actually perceiving it. However, a user may go straight to the Spring Festival wallet to look for the coupon after claiming it, and if the asynchronous issuance has not finished by then, this can lead to customer complaints. We agreed with the upstream Spring Festival main venue that when a user opens the Spring Festival wallet shortly after claiming coupons, the upstream calls the coupon issuance interface again with a green channel flag. On receiving this flag, we switch from asynchronous to synchronous issuance and issue the coupon to that user with priority, protecting the user experience.
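The routing decision can be sketched in a few lines. The `GreenChannel` field, the `Issuer` type, and the `issue` callback are hypothetical; the actual interface contract with the upstream is not given in the article.

```go
package main

import "errors"

var errQueueFull = errors.New("local queue full")

// Request is a hypothetical coupon issuance request.
type Request struct {
	SerialNo     string
	UserID       string
	GreenChannel bool // assumed flag set by the upstream on the repeat call
}

// Issuer routes a request either to the local queue or to synchronous issuance.
type Issuer struct {
	queue chan Request
	issue func(Request) error // synchronous issuance path
}

// Handle reports whether the coupon was issued synchronously.
func (s *Issuer) Handle(req Request) (issuedNow bool, err error) {
	if req.GreenChannel {
		if err := s.issue(req); err != nil {
			return false, err
		}
		return true, nil
	}
	select {
	case s.queue <- req:
		return false, nil
	default:
		return false, errQueueFull
	}
}
```

Since the earlier asynchronous request for the same coupon may still be sitting in the queue, a design like this relies on the issuance path being idempotent so that the coupon is not granted twice.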
Fund risk prevention and control
Besides performance, fund security also needs attention. For this Spring Festival coupon issuance, we mainly took the following prevention and control measures.
Idempotency check
Every coupon issuance generates a globally unique serial number, which is written to the database under a unique index when the coupon is issued. When the user taps repeatedly to claim a coupon, or a network error triggers a retry, the same serial number fails to be written because of the unique index conflict, which prevents the fund loss caused by issuing a coupon twice.
Per-user claim limit
The idempotency check on the serial number solves part of the problem, but professional scalpers may get around it by forging serial numbers. To handle this, we maintain a record of each user's claimed coupons. Issuance checks whether the user has reached the claim limit: if not, the coupon is issued normally and the user's claim record is updated; otherwise the issuance is terminated.
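A compact sketch of the check-and-update step, keyed by user and batch. `ClaimLimiter` is a hypothetical name, and the claim record is held in memory here purely for illustration.

```go
package main

import "sync"

type claimKey struct{ userID, batchID string }

// ClaimLimiter tracks how many coupons of each batch a user has claimed.
type ClaimLimiter struct {
	mu     sync.Mutex
	limit  int
	claims map[claimKey]int
}

// NewClaimLimiter creates a limiter allowing limit claims per user and batch.
func NewClaimLimiter(limit int) *ClaimLimiter {
	return &ClaimLimiter{limit: limit, claims: make(map[claimKey]int)}
}

// TryClaim checks the limit and records the claim in one step, so two
// concurrent requests from the same user cannot both slip under the limit.
func (c *ClaimLimiter) TryClaim(userID, batchID string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := claimKey{userID, batchID}
	if c.claims[k] >= c.limit {
		return false
	}
	c.claims[k]++
	return true
}
```

Unlike the serial number check, this limit does not depend on anything the caller supplies per request, so forging serial numbers does not help.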
Mutual exclusion within coupon batch groups
The per-user claim limit mainly stops the same user from claiming the same batch many times. Across the whole Spring Festival campaign, however, operations staff may create multiple coupon batches for different purposes whose target audiences overlap or are even identical. Issuing several of these batches to the same user drives up marketing cost. We therefore introduced the concept of a coupon batch group: coupons in the same group serve essentially the same marketing purpose, for example acquisition, activation, or retention. Once a user has claimed a coupon from one batch in the group, the user can no longer claim coupons from the other batches in that group. In other words, the batches in a group are mutually exclusive, which avoids subsidizing the same user twice.
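The group rule can be sketched as a lookup from batch to group plus a record of which batch each user has claimed in each group. `GroupGuard` and the group names are invented for the example; treating ungrouped batches as unrestricted, and leaving repeat claims of the same batch to the per-user limit, are assumptions of this sketch.

```go
package main

import "sync"

// GroupGuard enforces mutual exclusion between batches of the same group.
type GroupGuard struct {
	mu      sync.Mutex
	groupOf map[string]string            // batch ID -> group ID
	claimed map[string]map[string]string // user ID -> group ID -> batch already claimed
}

// NewGroupGuard creates a guard from a batch-to-group mapping.
func NewGroupGuard(groupOf map[string]string) *GroupGuard {
	return &GroupGuard{groupOf: groupOf, claimed: make(map[string]map[string]string)}
}

// TryClaim rejects a claim when the user already holds a coupon from a
// different batch in the same group.
func (g *GroupGuard) TryClaim(userID, batchID string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	group, grouped := g.groupOf[batchID]
	if !grouped {
		return true
	}
	byGroup := g.claimed[userID]
	if byGroup == nil {
		byGroup = make(map[string]string)
		g.claimed[userID] = byGroup
	}
	if prev, ok := byGroup[group]; ok {
		return prev == batchID
	}
	byGroup[group] = batchID
	return true
}
```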

Inventory oversold prevention
As noted above, the inventory of each marketing coupon batch is stored in Redis. A deduction against Redis may hit a network timeout, a failure, or another abnormal condition, leaving its result unknown. When this happens we choose to tolerate it: the issuance is treated as failed, the issuance logic ends, and no rollback is performed. When the deduction succeeds, we go on to issue the coupon to the user; if that fails, we can try to roll back the Redis inventory, because the deduction is known to have succeeded in this request. If the rollback itself fails, no further retry is made.

This approach may leave some inventory unsold, but the conservative strategy effectively rules out overselling. It is a trade-off between data consistency and availability.
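The conservative policy can be expressed as a short function. The `Stock` interface and `flakyStock` are hypothetical stand-ins for the Redis inventory; `flakyStock` can simulate a deduction that is applied but reported as a timeout.

```go
package main

import "errors"

var (
	errOutOfStock = errors.New("out of stock")
	errTimeout    = errors.New("timeout: result unknown")
)

// Stock is a hypothetical view of the inventory store.
type Stock interface {
	Deduct(batchID string, n int64) error
	Restore(batchID string, n int64) error
}

// issueOne deducts first and grants second. Any deduction error, including
// one whose outcome is unknown, is treated as a failure without rollback,
// so inventory may be undersold but never oversold.
func issueOne(s Stock, batchID string, grant func() error) error {
	if err := s.Deduct(batchID, 1); err != nil {
		return err
	}
	if err := grant(); err != nil {
		// The deduction is known to have succeeded, so returning the stock
		// is safe. A failed rollback is not retried.
		_ = s.Restore(batchID, 1)
		return err
	}
	return nil
}

// flakyStock is an in-memory Stock used only for illustration.
type flakyStock struct {
	left          int64
	timeoutOnNext bool // apply the next deduction but report a timeout
}

func (f *flakyStock) Deduct(_ string, n int64) error {
	if f.left < n {
		return errOutOfStock
	}
	f.left -= n
	if f.timeoutOnNext {
		f.timeoutOnNext = false
		return errTimeout
	}
	return nil
}

func (f *flakyStock) Restore(_ string, n int64) error {
	f.left += n
	return nil
}
```

In the timeout case the stock has been reduced but no coupon is granted, which is exactly the undersell the text accepts in exchange for never overselling.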
Risk control platform integration
In the coupon issuance path, we also integrated ByteDance's internal risk control platform. The platform collects and analyzes user and device information, identifies scalpers and malicious users through risk assessment, and intercepts their coupon issuance to avoid potential asset loss.
Data monitoring and reconciliation
Beyond these fund control measures, we also monitor issuance closely, including the number of coupons issued per batch, the issuance rate per batch, the backlog of the local queues, and the consumption rate of the local queues. When a metric deviates abnormally from its year-over-year or period-over-period baseline, an alarm fires promptly and engineers step in to investigate. In addition, once a batch has finished issuing, we check data consistency again, including whether the number of coupons issued to users matches the inventory consumed and whether any user has exceeded the claim limit of the batch.
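The post-issuance consistency check could be sketched as below. The inputs (a per-user issued count, the inventory consumed, and the per-user limit) and the `Reconciliation` result are assumptions about what such a check would compare.

```go
package main

import "sort"

// Reconciliation is the result of checking one coupon batch.
type Reconciliation struct {
	Oversold  bool     // more coupons issued than inventory consumed
	Unused    int64    // inventory consumed but never turned into a coupon
	OverLimit []string // users holding more coupons than the batch limit
}

// reconcile compares coupons issued per user with the inventory consumed
// and the per-user claim limit of the batch.
func reconcile(issuedPerUser map[string]int64, consumed, limit int64) Reconciliation {
	var r Reconciliation
	var issued int64
	for userID, n := range issuedPerUser {
		issued += n
		if n > limit {
			r.OverLimit = append(r.OverLimit, userID)
		}
	}
	sort.Strings(r.OverLimit) // stable output for reports
	r.Oversold = issued > consumed
	if consumed > issued {
		r.Unused = consumed - issued
	}
	return r
}
```

Under the conservative deduction strategy a small positive `Unused` value is expected, while `Oversold` or a non-empty `OverLimit` list would indicate a real problem.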
Summary
With the optimizations above, we successfully supported this year's coupon issuance at the Spring Festival main venue and achieved good results:
System
Overall, marketing can sustain an external coupon issuance throughput of 100,000 TPS.
Business
During the Spring Festival, tens of millions of Tiktok payment and DOU Installment coupons were issued, supporting the campaign needs of the two core businesses, Tiktok payment and DOU Installment.
99% of coupons reached the user's account within 0.5 s. The actual delay of asynchronous issuance was very low, the user experience was good, and business expectations were met.
Follow-up plans
The traffic of this year's Spring Festival activities gave marketing a good deal of experience and system capability, but some areas still need continued iteration and improvement:
Standardizing asynchronous issuance. We made a first attempt at asynchronous coupon issuance and applied it to the Spring Festival activities with good results. We expect many scenarios in major promotions such as 618 and Double 11 to suit asynchronous issuance as well, so we plan to standardize the external coupon issuance interface and make asynchronous issuance an optional capability, with the issuance mode selectable and configurable by access party, scenario, and so on.
Generalizing the local queue model. The two-tier local queue designed and built for this event completed the issuance task successfully, with lower task execution delay than a remote queue. Its supporting features, including queue tiering, rate limiting, graceful exit, and compensation, also did a good job of protecting system robustness. We will later abstract this module into a small general-purpose framework so that it can support more business scenarios suited to asynchronous processing.