Xiaomai Technology x Hologres: Building a Highly Available Real-Time Data Warehouse for Ten-Billion-Scale Advertising
2022-06-29 17:55:00 【InfoQ】
One 、 Business Introduction
Two 、 Evolution of Xiaomai's data warehouse: from Shence (Sensors Data) to a unified stream-batch real-time warehouse
1、 Shence stage
2、 Offline data warehouse (introducing MaxCompute)
- Pull data from the Shence compass server over JDBC, then synchronize it offline to MaxCompute through DataWorks;
- In MaxCompute, model the data in four warehouse layers (ODS, DWD, DWS, ADS), then synchronize the result data offline to a DB through DataWorks;
- Serve the analysis requirements of the business systems from that DB.
This architecture had the following problems:
- The systems pull data from Shence over JDBC, so the pipeline depends heavily on a third party and is tightly coupled. When Shence has a problem, the entire computation cannot proceed and the business's agile analysis needs cannot be met. After Shence recovers, data has to be re-run manually, which wastes a lot of manpower.
- After the statistics are computed, loading the results into the DB is slow, which lengthens the whole pipeline. Meanwhile the real-time requirements on data computation keep rising, and this stage cannot support them.
- Data volume is growing exponentially and analysis dimensions keep multiplying, so the result data has nearly reached the granularity of detail data. The existing query engine (the DB) cannot support multidimensional analysis at this scale. The biggest challenge is low-latency, high-QPS query analysis over ten billion behavioral records.
- To compensate for the query engine's limits on large data volumes, a lot of pre-computation was done, which caused redundant computation and rising costs.
- The number of systems also keeps growing, so O&M and development costs grow linearly and business demands cannot be met quickly.
3、 Unified stream-batch real-time data warehouse (introducing Hologres + Flink)

- Clearer data structure: each data layer has its own scope, which makes tables easier to locate and understand when the business uses them.
- Data lineage: a table exposed to the business may be derived from many source tables. If one of the source tables has a problem, we can locate it quickly and accurately, and clearly understand the scope of each table.
- Less duplicated development: with standardized layering, common middle-layer data is developed once, which reduces repeated computation and increases the reuse of each business table.
- Simplifying complex problems: a complex business process is split into several steps, and each layer handles a single step, which is simpler and easier to understand. It is also easier to keep the data accurate: when data goes wrong, you do not have to repair all of it, only re-run from the step where the problem occurred. This is somewhat similar to the fault-tolerance mechanism of Spark RDDs.
- Reduced business impact: the business may change frequently, and with layering a business change does not require re-ingesting the data.
- Data is more real-time, so business decisions are made faster.
- Data is decoupled from the third party, which makes the system more robust.
Three 、 Why choose Hologres?
1、 Support high-performance writes and extremely fast complex queries
- Query performance: we verified query performance with both simple and complex SQL taken from our actual business scenarios. Before any optimization, performance was roughly the same as before. After table design and low-level optimization on Hologres, we measured an improvement of roughly 4x. We will continue performance tuning together with colleagues from Alibaba.
- Write performance: previously, writing from MaxCompute to the DB took a long time (about an hour for 100 million rows). Once query traffic picked up, write performance dropped several-fold, and the DB sometimes went down. Writing MaxCompute data into Hologres is much faster: importing 100 million rows completes in a little over 10 seconds.
2、 Meet multiple analysis scenarios
- Real-time data warehouse: Hologres integrates well with Flink. Data is collected in real time, computed by Flink, and written directly into Hologres, which lets us build real-time dashboards, real-time monitoring and alerting, real-time recommendation, real-time training, and other applications that respond quickly to business needs.
- MaxCompute query acceleration: Hologres can query MaxCompute data directly through foreign tables. If higher performance is required, the data can be imported into Hologres for faster query processing. With the former approach, offline data can be queried and analyzed without exporting it.
- Fit for advertising analysis scenarios: Hologres offers rich analysis functions, such as retention analysis and funnel analysis functions. These suit advertising scenarios well and can be used directly, with no secondary development on our side.
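The two MaxCompute access paths mentioned in the list above can be sketched as follows. This is a minimal sketch under stated assumptions, not the team's actual setup: the MaxCompute project `mc_project`, the source table `dws_usr_label_df`, and the internal table `holo_usr_label_local` are hypothetical names, and `odps_server` is assumed to be the MaxCompute foreign server available in the Hologres instance.

```sql
-- Path 1: map a MaxCompute table as a foreign table and query it in place.
-- mc_project and dws_usr_label_df are hypothetical names.
IMPORT FOREIGN SCHEMA mc_project LIMIT TO (dws_usr_label_df)
FROM SERVER odps_server INTO public
OPTIONS (if_table_exist 'update');

-- The data stays in MaxCompute; nothing is exported.
SELECT product_id, COUNT(DISTINCT device_id) AS devices
FROM public.dws_usr_label_df
GROUP BY product_id;

-- Path 2: copy the data into an internal Hologres table for faster queries.
-- holo_usr_label_local and its column list are assumptions for illustration.
CREATE TABLE public.holo_usr_label_local (
    product_id text,
    device_id  text
);

INSERT INTO public.holo_usr_label_local (product_id, device_id)
SELECT product_id, device_id
FROM public.dws_usr_label_df;
```

Path 1 avoids a copy and suits occasional queries; Path 2 costs an import step but serves repeated, latency-sensitive queries from Hologres storage.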
Four 、 Best practices for ten-billion-scale user behavior analysis
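The `distribution_key` settings in this section give both tables the same key, so rows with the same `product_id` and `device_id` land on the same shard in each table. That layout presumably targets joins of the shape sketched here. The sketch assumes both tables belong to the same table group; the `label` and `income` columns and the `'ad_001'` value are hypothetical, since only `product_id`, `device_id`, `ad_id`, and `position_id` appear in the original statements.

```sql
-- Hypothetical query: label, income, and the ad_id literal are assumptions.
-- Joining on the full distribution key (product_id, device_id) lets each
-- shard join its own rows without redistributing data across shards.
SELECT l.label,
       SUM(i.income) AS total_income
FROM holo_ad_income_dt_test AS i
JOIN holo_dws_usr_label_df AS l
  ON i.product_id = l.product_id
 AND i.device_id  = l.device_id
WHERE i.ad_id = 'ad_001'  -- equality filter on a column listed in bitmap_columns
GROUP BY l.label;
```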

CALL set_table_property('holo_dws_usr_label_df', 'distribution_key', 'product_id,device_id');
CALL set_table_property('holo_ad_income_dt_test', 'distribution_key', 'product_id,device_id');
CALL SET_TABLE_PROPERTY('public.holo_ad_income_dt_test', 'bitmap_columns', '"product_id:on","ad_id:on","position_id:on"');
Five 、 High availability through Hologres read/write separation
1、 Background: reads and writes were not separated
- Around 10 a.m. every day is the write peak for offline (T+1) tasks. A large number of report statistics tasks converge in this window, and their writes to Hologres consume a lot of resources.
- Some write tasks carry especially large data volumes. Daily incremental result data has reached hundreds of millions of rows, so writes run for a long time and keep holding resources. Some result tables have more than 1,000 fields, which consumes even more resources.
- While writes are running, some MaxCompute tasks read Hologres foreign tables, which increases the number of connections and affects other tasks.
- The report write window is also the peak period for business queries. A large number of queries run at the same time as a large number of writes, and they interfere with each other.
- Write tasks have an automatic retry mechanism. Whenever an OOM, a timeout, or another error occurs, the task automatically re-runs and occupies resources again, so failures spread to more and more write tasks.
2、 Optimization approach: Hologres shared-storage instance deployment
- Divide the business into modules, and migrate the read-only queries of the report backend, Tableau, and the production business modules to the read-only instance.
- Synchronization tasks and a small number of read/write tasks remain on the read/write primary instance. Data for different modules is stored in different schemas, which makes it easier to manage.


3、 Result: system stability improved significantly
Six 、 Business value