4. Data splitting of Flink real-time project
2022-07-03 21:27:00 【Contestant No.1 position】
1. Abstract
The log data collected earlier has been saved to Kafka, where it serves as the ODS layer for logs. The log data read from the Kafka ODS layer falls into 3 types: page logs, start logs, and exposure logs. Although all three are user behavior data, they have completely different data structures, so they need to be split. The split logs are written back to different Kafka topics, which serve as the DWD layer for logs.
The page log is output to the main stream, the start log is output to the start side output stream, and the exposure log is output to the exposure side output stream.
2. Identify new and old users
The client already flags visitors as new or old, but the flag is not accurate enough and needs to be reconfirmed with real-time computation (no business operations are involved; it is only a simple status check).
Data splitting is implemented with side output streams
Based on its content, the log data is divided into 3 types: page logs, start logs, and exposure logs. The data of the different streams is pushed to different downstream Kafka topics.
3. Code implementation
In the app package, create the Flink job BaseLogTask.java.
The job consumes data from Kafka with Flink and stores the consumption checkpoints in HDFS. Remember to create the HDFS path manually and then grant permissions on it.
Checkpointing is optional; you can turn it off during testing.
Utility class MyKafkaUtil.java
4. New and old visitor status repair
Rules for identifying new and old customers
The front end records whether a visitor is new or old, but that flag may be inaccurate, so it is confirmed again here. The state of each mid is saved, with the first visit date as the state value. When a later log arrives from the same device, the date is read from the state and compared with the date the log was generated. If the state is not empty and the state date is not equal to the log date, the device is a returning visitor; if its is_new flag is 1, the flag is repaired.
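The rule above can be sketched in plain Java as follows. This is only a minimal illustration, not the code of BaseLogTask.java: the class name `NewVisitorFixer`, the method `fix`, and the `yyyyMMdd` date strings are hypothetical, and a `HashMap` stands in for the per-mid keyed state that a Flink job would keep. It also assumes that "repairing" the flag means changing is_new from 1 to 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of the is_new repair rule (hypothetical names, no Flink dependency). */
class NewVisitorFixer {
    // mid -> first visit date. Assumption: in a real Flink job this would be
    // keyed state (one value per mid) instead of an in-memory map.
    private final Map<String, String> firstVisitDateByMid = new HashMap<>();

    /** Returns the repaired is_new flag for one log record. */
    String fix(String mid, String isNew, String logDate) {
        if (!"1".equals(isNew)) {
            return isNew; // only records flagged as new need to be checked
        }
        String firstVisitDate = firstVisitDateByMid.get(mid);
        if (firstVisitDate == null) {
            // Empty state: this is the first log seen for the device, remember its date.
            firstVisitDateByMid.put(mid, logDate);
            return "1";
        }
        if (!firstVisitDate.equals(logDate)) {
            // State exists and its date differs from the log date: returning visitor.
            return "0";
        }
        return "1"; // same day as the first visit, still counts as new
    }
}
```

For example, a device whose first log is dated 20220703 keeps is_new = 1 for all logs of that day, while a log dated 20220704 that still carries is_new = 1 comes back as 0.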
5. Data splitting is realized by using side output stream
After the new and old visitor repair above, the log data is divided into 3 types.
Start log tag definition: OutputTag<String> startTag = new OutputTag<String>("start"){};
Exposure log tag definition: OutputTag<String> displayTag = new OutputTag<String>("display"){};
The page log is output to the main stream, the start log is output to the start side output stream, and the exposure log is output to the exposure side output stream.
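The routing decision can be sketched without Flink as shown here. This is an illustration under stated assumptions rather than the job's actual code: the class `LogSplitter` is hypothetical, the three lists stand in for the main stream and the two side outputs, and the JSON field names `start` and `displays` are assumed because the article does not show the log schema. It further assumes that a page log carrying exposure entries stays in the main stream while each entry is also emitted to the exposure side output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the three-way log split (hypothetical names, no Flink dependency). */
class LogSplitter {
    // Stand-ins for the main stream and the two side output streams.
    final List<Map<String, ?>> pageMain = new ArrayList<>();
    final List<Map<String, ?>> startSide = new ArrayList<>();
    final List<Object> displaySide = new ArrayList<>();

    /** Routes one parsed log record; field names "start" and "displays" are assumptions. */
    void process(Map<String, ?> log) {
        if (log.get("start") != null) {
            startSide.add(log); // start log -> start side output
            return;
        }
        pageMain.add(log); // everything else is treated as a page log -> main stream
        Object displays = log.get("displays");
        if (displays instanceof List) {
            // Each exposure entry becomes its own record on the exposure side output.
            for (Object display : (List<?>) displays) {
                displaySide.add(display);
            }
        }
    }
}
```

In a Flink ProcessFunction the same branches would write to `ctx.output(startTag, ...)`, `ctx.output(displayTag, ...)` and `out.collect(...)` instead of the lists.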
The split data is then sent to Kafka:
- dwd_start_log: start log
- dwd_display_log: exposure log
- dwd_page_log: page log