Xiaomai Technology x Hologres: building a highly available real-time data warehouse for ten-billion-scale advertising data
2022-06-29 17:55:00 【InfoQ】
One 、 Business Introduction
Two 、 Evolution of Xiaomai's data warehouse: from Shence (Sensors Data) to a unified stream-batch real-time warehouse
1、 Shence (Sensors Data) stage
2、 Offline data warehouse (introducing MaxCompute)
- Pull data from the Shence compass server over JDBC, and synchronize it offline to MaxCompute through DataWorks;
- In MaxCompute, model the data in the four warehouse layers (ODS, DWD, DWS, ADS), then synchronize the result data offline to a DB through DataWorks;
- Serve the various analysis requirements of the business systems from that DB.
- The systems pulled data from Shence over JDBC, which made them overly dependent on, and tightly coupled to, a third party. When Shence had a problem, the entire computation pipeline stalled and could not meet the business's need for agile analysis. After Shence recovered, the data had to be re-run by hand, which wasted a lot of manpower.
- After the statistics were computed, loading the results into the DB was slow, which greatly lengthened the run time of the whole pipeline. Meanwhile, the real-time requirements on data computation kept rising, and this architecture could not support them.
- Data volume grew exponentially and analysis dimensions kept multiplying, so the result data approached the granularity of detail data. The existing query engine (the DB) could not support multi-dimensional analysis at that scale. The biggest challenge was low-latency, high-QPS query analysis over 10 billion behavioral records.
- To work around the query engine's inability to handle large data volumes, a lot of pre-computation was done on the data, which caused redundant computation and rising costs.
- At the same time, the number of systems kept growing, so O&M and development costs rose linearly and business demands could not be met quickly.
3、 Unified stream-batch real-time data warehouse (introducing Hologres + Flink)

- Clearer data structure: each data layer has its own scope, which makes tables easier to locate and understand when the business uses them.
- Data lineage: a business table is exposed for business use, but it may be derived from many source tables. If one of the source tables has a problem, we can locate it quickly and accurately, and we clearly understand the scope of each table.
- Less repeated development: with standardized layering and shared middle-layer data, duplicate computation drops and each business table is reused more.
- Simpler handling of complex problems: a complex business process is split into several steps, and each layer handles a single step, which is simpler and easier to understand. It is also easier to keep the data accurate: when data goes wrong, there is no need to repair all of it, only to start from the step where the problem occurred. This is somewhat similar to the fault-tolerance mechanism of Spark RDDs.
- Less business impact: the business may change frequently, and with layering there is no need to re-ingest the data every time the business changes.
- Data is more real-time, so business decisions are made more quickly.
- Data is decoupled from third parties, so the pipeline is more robust.
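The Flink-to-Hologres write path behind this stage can be sketched in Flink SQL. This is a minimal sketch, not the team's actual job: the source uses Flink's built-in `datagen` connector as a stand-in for the real event stream, the table and column names are hypothetical, and the sink options assume the Hologres connector shipped with Alibaba Cloud Realtime Compute for Apache Flink, with every option value left as a placeholder.

```sql
-- Hypothetical source: synthetic events from Flink's built-in datagen connector,
-- standing in for the real event stream (which the article does not describe).
CREATE TEMPORARY TABLE ad_events_src (
  product_id BIGINT,
  device_id  STRING,
  ad_id      BIGINT,
  income     DOUBLE
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '10'
);

-- Hypothetical sink: a Hologres table. All option values are placeholders,
-- and the option names should be checked against the connector documentation.
CREATE TEMPORARY TABLE holo_ad_income_sink (
  product_id BIGINT,
  device_id  STRING,
  ad_id      BIGINT,
  income     DOUBLE
) WITH (
  'connector' = 'hologres',
  'endpoint'  = '<host>:<port>',
  'dbname'    = '<database>',
  'tablename' = '<schema.table>',
  'username'  = '<access-key-id>',
  'password'  = '<access-key-secret>'
);

-- Continuously write the computed rows into Hologres.
INSERT INTO holo_ad_income_sink
SELECT product_id, device_id, ad_id, income
FROM ad_events_src;
```

In a real job the `SELECT` would carry the layer-specific cleaning or aggregation logic; it is kept as a pass-through here to show only the wiring.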
Three 、 Why choose Hologres?
1、 Support high-performance writes and extremely fast complex queries
- Query performance: we verified query performance with both simple and complex SQL taken from our actual business scenarios. Before any optimization, performance was roughly the same as before. After table design and underlying optimization for Hologres, we measured an improvement of about 4x. We will continue performance tuning with colleagues from Alibaba.
- Write performance: in the previous DB environment, writing from MaxCompute to the DB took a long time (about an hour for 100 million rows). Once query traffic ramped up, write performance dropped several-fold and the DB sometimes went down. Writing MaxCompute data to Hologres is far faster: importing 100 million rows completes in a little over 10 seconds.
2、 Meet multiple analysis scenarios
- Real-time data warehouse: Hologres integrates well with Flink. With real-time data collection and Flink real-time computation, data is written directly into Hologres, so applications such as real-time dashboards, real-time monitoring and alerting, real-time recommendation, and real-time training can be built and respond quickly to business needs.
- MaxCompute query acceleration: Hologres can query MaxCompute data directly through foreign tables. If higher performance is required, the data can be imported into Hologres and queried there. With the former approach, offline data can be queried and analyzed without exporting it.
- A good fit for advertising analysis: Hologres provides many analysis functions, such as retention analysis and funnel analysis functions. These suit advertising scenarios well and can be used directly, with no secondary development on our side.
Four 、 Best practices for ten-billion-scale user behavior analysis
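The `set_table_property` calls in this section set table properties that, for `distribution_key`, are declared in the same transaction as the table definition. A minimal sketch of that pattern follows; the table names and the `product_id`, `device_id`, `ad_id`, and `position_id` columns come from the statements in this section, while the column types, the `income` and `label` columns, and the join query are assumptions added for illustration.

```sql
-- Sketch: declare the distribution key together with the table definition.
-- Column types and the 'income' column are hypothetical.
BEGIN;
CREATE TABLE public.holo_ad_income_dt_test (
  product_id  BIGINT NOT NULL,
  device_id   TEXT   NOT NULL,
  ad_id       BIGINT,
  position_id BIGINT,
  income      DECIMAL(18, 4)
);
CALL set_table_property('public.holo_ad_income_dt_test', 'distribution_key', 'product_id,device_id');
COMMIT;

-- When both tables share the same distribution key and the join uses exactly
-- those columns, matching rows are co-located on the same shard.
-- The 'label' column on holo_dws_usr_label_df is hypothetical.
SELECT l.label, SUM(i.income) AS total_income
FROM public.holo_ad_income_dt_test AS i
JOIN public.holo_dws_usr_label_df  AS l
  ON i.product_id = l.product_id
 AND i.device_id  = l.device_id
GROUP BY l.label;
```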

-- Use the same distribution key on both tables so that rows with the same (product_id, device_id) land on the same shard.
CALL set_table_property('holo_dws_usr_label_df', 'distribution_key', 'product_id,device_id');
CALL set_table_property('holo_ad_income_dt_test', 'distribution_key', 'product_id,device_id');
-- Enable bitmap indexes on the columns used in equality filters.
CALL set_table_property('public.holo_ad_income_dt_test', 'bitmap_columns', '"product_id:on","ad_id:on","position_id:on"');
Five 、 High-availability implementation of Hologres read/write separation
1、 Background: reads and writes were not separated
- Around 10 a.m. every day is the peak period for offline (T+1) write tasks. A large number of report statistics tasks finish aggregating during this period, and the writes to Hologres consume a lot of resources.
- Some write tasks carry especially large data volumes: the daily incremental result data reaches hundreds of millions of rows, so the writes take a long time and hold resources throughout. Some result tables have more than 1,000 fields, which consumes even more resources.
- While the writes are running, some MaxCompute tasks read Hologres foreign tables, which increases the number of connections and affects other tasks.
- The reporting period is also the peak period for business queries. A large number of queries run at the same time as a large number of writes, and they interfere with each other.
- Write tasks have an automatic retry mechanism: on every OOM, timeout, or other error, the task automatically re-runs and occupies resources again, so failures spread across more and more write tasks.
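The connection pressure and long-running writes described above can be observed from the standard PostgreSQL `pg_stat_activity` view, which Hologres exposes. The queries below are a generic diagnostic sketch, not something taken from the team's runbook; the `LIMIT` value is arbitrary.

```sql
-- Count current connections per database, user, client application, and state
-- to see which workload is holding the most connections.
SELECT datname, usename, application_name, state, COUNT(*) AS connections
FROM pg_stat_activity
GROUP BY datname, usename, application_name, state
ORDER BY connections DESC;

-- List the longest-running active statements, e.g. large writes that keep
-- holding resources during the reporting peak.
SELECT pid, usename, query_start, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC
LIMIT 20;
```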
2、 Optimization approach: Hologres shared-storage instance deployment
- Divide the business into modules, and migrate the read-only queries of the report backend, Tableau, and production business modules to a read-only instance.
- Synchronization tasks and a small number of read/write tasks stay on the read/write primary instance. Data for different modules is stored in different schemas for easier management.
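The per-module schema layout mentioned above can be sketched as follows. The schema names and the `daily_ad_income` table are hypothetical, chosen only to mirror the modules named in the list; the article does not give the real names.

```sql
-- Run on the read/write primary instance: one schema per business module.
-- Schema names are hypothetical.
CREATE SCHEMA IF NOT EXISTS report;
CREATE SCHEMA IF NOT EXISTS tableau;
CREATE SCHEMA IF NOT EXISTS production;

-- Read-only queries then reference tables by schema-qualified name
-- when connected to the read-only instance.
-- 'daily_ad_income' and its columns are hypothetical.
SELECT product_id, SUM(income) AS total_income
FROM report.daily_ad_income
GROUP BY product_id;
```

Because the instances share storage, the read-only instance sees the same schemas and tables; only the connection endpoint differs between writers and readers.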


3、 Optimization results: system stability improved significantly
Six 、 Business value