4. Data splitting of Flink real-time project
2022-07-03 21:27:00 【Contestant No.1 position】
1. Abstract
The log data collected earlier has been saved to Kafka, where it serves as the ODS layer for logs. The log data read from the Kafka ODS layer falls into three types: page logs, start logs, and exposure logs. Although all three are user behavior data, they have completely different data structures, so they need to be split and handled separately. The split logs are written back to different Kafka topics, which form the DWD layer for logs.
Page logs are output to the main stream, start logs to the start side output stream, and exposure logs to the exposure side output stream.
2. Identify new and old users
The client already marks each user as new or old, but that mark is not accurate enough, so it needs to be reconfirmed in the real-time computation. No business operations are involved; this is only a simple status check.
Data splitting is implemented with side output streams
Based on its content, the log data is divided into three types: page logs, start logs, and exposure logs. The data of each stream is pushed to a different downstream Kafka topic.
3. Code implementation
In the app package, create the Flink job BaseLogTask.java.
The job consumes data from Kafka through Flink and stores its checkpoints in HDFS. Remember to create the checkpoint path manually and grant the required permissions on it.
Checkpointing is optional; you can turn it off during testing.
MyKafkaUtil.java utility class
4. New and old visitor status repair
Rules for identifying new and old visitors
The front end records whether a visitor is new or old, but that record may be wrong, so it is confirmed again here. State is kept per mid (device id), with the first visit date saved as the state. When a later log arrives for the same device, the date is read from the state and compared with the date on which the log was generated. If the state is not empty and the state date is not equal to the log date, the device belongs to a returning visitor; if its is_new flag is 1, the flag is repaired.
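The repair rule above can be sketched in plain Java, without any Flink dependency. This is only an illustration under stated assumptions: a HashMap stands in for Flink keyed state, is_new is assumed to be the string "0" or "1", the log timestamp is assumed to be epoch milliseconds, and the class name NewVisitorRepair and the time zone parameter are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of the is_new repair rule (hypothetical class, not the article's code). */
class NewVisitorRepair {
    // mid -> first visit date. In a Flink job this would be keyed state, one value per mid.
    private final Map<String, LocalDate> firstVisitByMid = new HashMap<>();
    // Assumption: the zone used to turn a timestamp into a calendar date.
    private final ZoneId zone;

    NewVisitorRepair(ZoneId zone) {
        this.zone = zone;
    }

    /** Returns the corrected is_new flag ("0" or "1") for one log record. */
    String repair(String mid, String isNew, long tsMillis) {
        if (!"1".equals(isNew)) {
            return isNew; // only records that claim to be new need checking
        }
        LocalDate logDate = Instant.ofEpochMilli(tsMillis).atZone(zone).toLocalDate();
        LocalDate firstVisit = firstVisitByMid.get(mid);
        if (firstVisit == null) {
            // No state yet: this really is the first visit, so remember its date.
            firstVisitByMid.put(mid, logDate);
            return "1";
        }
        // State exists: a different date means a returning visitor, so repair the flag.
        return firstVisit.equals(logDate) ? "1" : "0";
    }
}
```

With this sketch, two records for the same mid on the same day both keep is_new = "1", while a record for that mid dated the next day comes back as "0".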
5. Data splitting is realized by using side output stream
After the new and old visitor status has been repaired as above, the log data is divided into three types.
Start log tag definition: OutputTag<String> startTag = new OutputTag<String>("start"){};
Exposure log tag definition: OutputTag<String> displayTag = new OutputTag<String>("display"){};
Page logs are output to the main stream, start logs to the start side output stream, and exposure logs to the exposure side output stream.
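The routing decision can be sketched in plain Java as follows. This is a minimal sketch, not the article's BaseLogTask: the parsed log is modeled as a Map, the field names start and displays are assumptions about the log structure, and so is the choice to emit each entry of a page log's displays list as its own exposure record. In a Flink ProcessFunction, the "start" and "display" branches would correspond to ctx.output(startTag, ...) and ctx.output(displayTag, ...), and the "page" branch to the main-stream collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the three-way log split (hypothetical class and field names). */
class LogSplitSketch {

    /** One routed record: the target stream ("page" is the main stream) and its payload. */
    static final class Routed {
        final String stream;
        final Object payload;

        Routed(String stream, Object payload) {
            this.stream = stream;
            this.payload = payload;
        }
    }

    /** Decides where a parsed log record goes. */
    static List<Routed> route(Map<String, Object> log) {
        List<Routed> out = new ArrayList<>();
        // Assumption: a start log carries a "start" field.
        if (log.get("start") != null) {
            out.add(new Routed("start", log));
            return out;
        }
        // Everything else is treated as a page log and goes to the main stream.
        out.add(new Routed("page", log));
        // Assumption: exposures arrive as a "displays" list on the page log;
        // each entry is emitted separately to the exposure side output.
        Object displays = log.get("displays");
        if (displays instanceof List) {
            for (Object display : (List<?>) displays) {
                out.add(new Routed("display", display));
            }
        }
        return out;
    }
}
```

Keeping the decision in a pure function like this makes it easy to unit test before wiring it into the streaming job.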
After splitting, the data is sent to the following Kafka topics:
- dwd_start_log: Start log
- dwd_display_log: Exposure log
- dwd_page_log: Page log
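The mapping from stream to topic listed above can be written down as a small lookup. This is a hedged sketch: the topic names come from the list, while the class name DwdTopics and the stream names "start", "display", and "page" are illustrative (the first two match the OutputTag ids defined earlier).

```java
/** Sketch of the stream-to-topic mapping (hypothetical helper class). */
class DwdTopics {

    /** Returns the DWD Kafka topic for a given stream name. */
    static String topicFor(String stream) {
        switch (stream) {
            case "start":
                return "dwd_start_log";
            case "display":
                return "dwd_display_log";
            case "page":
                return "dwd_page_log";
            default:
                // Fail loudly rather than silently writing to a wrong topic.
                throw new IllegalArgumentException("unknown stream: " + stream);
        }
    }
}
```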