Flink-based real-time project: user behavior analysis (Part 3: total site page view (PV) statistics)
2022-07-26 22:40:00 【不会打球的摄影师不是好程序员】
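1. Requirements:
1. Count the site's total page views (PV).
2. A brief introduction to PV:
The simplest metric for measuring site traffic is the page view count (Page View, PV). Each time a user opens a page, 1 PV is recorded; opening the same page several times adds to the count each time. PV is broadly proportional to the number of visitors, but it does not directly indicate how many real visitors a page has: a single visitor who keeps refreshing the page can also produce a very high PV.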
3. Approach: set up a tumbling time window and count the site's page views (PV) within each hour in real time.
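The PV definition and the hourly tumbling-window idea can be sketched in plain Scala without Flink. This is only an illustration under stated assumptions: the `Visit` class, the `HourlyPv` object, and the sample data are made up for this sketch, and windows are assumed to be aligned to the epoch with no offset.

```scala
// Hypothetical event type for illustration only: one record per page open.
final case class Visit(userId: Long, timestampMs: Long)

object HourlyPv {
  val HourMs: Long = 60L * 60L * 1000L

  // Start of the epoch-aligned one-hour tumbling window that contains the
  // given non-negative timestamp (milliseconds).
  def windowStart(timestampMs: Long): Long = timestampMs - (timestampMs % HourMs)

  // PV per window: every page open counts, including repeated refreshes.
  def pvPerHour(visits: Seq[Visit]): Map[Long, Int] =
    visits
      .groupBy(v => windowStart(v.timestampMs))
      .map { case (start, vs) => start -> vs.size }

  // Distinct visitors per window, to contrast with PV.
  def visitorsPerHour(visits: Seq[Visit]): Map[Long, Int] =
    visits
      .groupBy(v => windowStart(v.timestampMs))
      .map { case (start, vs) => start -> vs.map(_.userId).distinct.size }

  // Made-up sample: user 1 refreshes three times in the first hour,
  // user 2 opens the page once in the first hour and once in the second.
  val sample: Seq[Visit] = Seq(
    Visit(1L, 0L), Visit(1L, 1000L), Visit(1L, 2000L),
    Visit(2L, 3000L), Visit(2L, HourMs + 5L)
  )

  def main(args: Array[String]): Unit = {
    println(pvPerHour(sample))
    println(visitorsPerHour(sample))
  }
}
```

In the first hour of the sample, four page opens come from only two users, which is why PV alone says little about the real number of visitors.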
2. Code implementation
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Input record type: one line of UserBehavior.csv
case class UserBehavior(userId: Long, itemId: Long, categoryId: Int, behavior: String, timestamp: Long)

object PageView {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val data = env.readTextFile("E:\\WY\\programme\\MusicProject\\src\\main\\resources\\UserBehavior.csv")
    // Parse each comma-separated line into a UserBehavior record
    val dataStream = data.map(line => {
      val arr = line.split(",")
      UserBehavior(arr(0).toLong, arr(1).toLong, arr(2).toInt, arr(3), arr(4).toLong)
    })

    val resultStream = dataStream
      // Multiply by 1000 to get the millisecond timestamps Flink expects
      .assignAscendingTimestamps(_.timestamp * 1000L)
      .filter(_.behavior == "pv")
      .map(_ => ("pv", 1))
      .keyBy(_._1)
      // One-hour tumbling event-time window (explicit assigner instead of the deprecated timeWindow)
      .window(TumblingEventTimeWindows.of(Time.hours(1)))
      .sum(1)

    resultStream.print()
    env.execute()
  }
}
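The parsing step in the job assumes every line has five well-formed fields; a single malformed line would throw and fail the job. A minimal sketch of a more defensive parser follows. It is not part of the original project: `ParsedBehavior`, `SafeParse`, and `parseLine` are hypothetical names, and the sample lines in the test values are made up.

```scala
import scala.util.Try

// Hypothetical record type with the same five fields as the job's input record.
final case class ParsedBehavior(userId: Long, itemId: Long, categoryId: Int, behavior: String, timestamp: Long)

object SafeParse {
  // Returns Some(record) for a well-formed line and None for anything else,
  // instead of throwing on a wrong field count or a non-numeric value.
  def parseLine(line: String): Option[ParsedBehavior] = {
    val arr = line.split(",").map(_.trim)
    if (arr.length != 5) None
    else Try(
      ParsedBehavior(arr(0).toLong, arr(1).toLong, arr(2).toInt, arr(3), arr(4).toLong)
    ).toOption
  }

  def main(args: Array[String]): Unit = {
    println(parseLine("1,100,10,pv,1511658000")) // well-formed, made-up line
    println(parseLine("bad,line"))               // wrong field count
  }
}
```

In the job, such a helper would return the job's own record type, and the `map` step could probably be swapped for a `flatMap` over the parsed options so that bad lines are dropped rather than crashing the stream.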
3. Results
