Flink-based real-time project: user behavior analysis (Part 3: total website page views (PV))
2022-07-27 00:59:00 【A photographer who can't play is not a good programmer】
1. Requirements
1. Count the total page views (PV) of the website.
2. A brief introduction to PV:
One of the simplest indicators of website traffic is the number of page views (Page View, PV). Each time a user opens a page, one PV is recorded, and opening the same page several times adds to the total each time. PV generally grows with the number of visitors, but it does not directly tell you how many real visitors a page has: a single visitor who keeps refreshing the page can also produce a very high PV.
3. Approach: use a tumbling time window to count the website's PV for each hour in real time.
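The approach above can be illustrated without Flink. The following is a minimal sketch in plain Scala, assuming non-negative timestamps in seconds and a window offset of zero. The names `PvSketch`, `Event`, `windowStart` and `pvPerWindow`, as well as the sample records, are hypothetical and only serve the illustration. It shows that a repeated view by the same user still counts toward PV, and that each event falls into exactly one hourly tumbling window.

```scala
object PvSketch {
  // Hypothetical, simplified record: only the fields needed for PV counting
  final case class Event(userId: Long, behavior: String, timestampSec: Long)

  // One hour in milliseconds
  val windowSizeMs: Long = 60 * 60 * 1000L

  // Start of the tumbling window that contains the given timestamp
  // (assumes a non-negative timestamp in ms and a zero window offset)
  def windowStart(tsMs: Long): Long = tsMs - (tsMs % windowSizeMs)

  // Count "pv" events per hourly window, keyed by window start in ms
  def pvPerWindow(events: Seq[Event]): Map[Long, Int] =
    events
      .filter(_.behavior == "pv")
      .groupBy(e => windowStart(e.timestampSec * 1000L))
      .map { case (start, es) => start -> es.size }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event(1L, "pv", 0L),
      Event(1L, "pv", 10L),   // same user again: still one more PV
      Event(2L, "buy", 20L),  // not a page view, filtered out
      Event(2L, "pv", 3600L)  // falls into the next hourly window
    )
    // prints (0,2) then (3600000,1)
    pvPerWindow(events).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Unlike this batch-style sketch, the Flink job emits the count for a window once the watermark passes the end of that window.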
2. Code implementation
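import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.windowing.time.Time

// Input record type: one line of UserBehavior.csv
case class UserBehavior(userId: Long, itemId: Long, categoryId: Int, behavior: String, timestamp: Long)

object PageView {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    // Use event time. This is already the default from Flink 1.12 on;
    // earlier versions default to processing time.
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val data = env.readTextFile("E:\\WY\\programme\\MusicProject\\src\\main\\resources\\UserBehavior.csv")

    // Parse each CSV line into a UserBehavior record
    val dataStream = data.map(line => {
      val arr = line.split(",")
      UserBehavior(arr(0).toLong, arr(1).toLong, arr(2).toInt, arr(3), arr(4).toLong)
    })

    val resultStream = dataStream
      // Timestamps in the file are in seconds and ascending; Flink expects milliseconds
      .assignAscendingTimestamps(_.timestamp * 1000L)
      // Keep only page-view events
      .filter(_.behavior == "pv")
      // Map every event to the same key so that all of them are summed together
      .map(_ => ("pv", 1))
      .keyBy(_._1)
      // One-hour tumbling window (timeWindow is deprecated since Flink 1.12
      // in favor of window(TumblingEventTimeWindows.of(...)))
      .timeWindow(Time.hours(1))
      .sum(1)

    resultStream.print()
    env.execute()
  }
}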
3. Results
