Spark: get the access volume of each time period in the log (entry level - simple implementation)
2022-07-24 14:37:00 【One's cow】
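Using one hour as the time period, count the accesses that fall in each period of the log and print the results to the console.
Here is the code; you will need to supply your own log file. The code assumes that the fourth space-separated field of each line is a timestamp in day/month/year:hour:minute:second form.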
import java.text.SimpleDateFormat
import java.util.Date

import org.apache.spark.{SparkConf, SparkContext}

object RDD_Operator_Transform_groupBy_Test {
  def main(args: Array[String]): Unit = {
    // Create the Spark environment
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("RDD")
    val sc = new SparkContext(sparkConf)

    // RDD operator: groupBy
    println("Access volume for each hour in the log:")
    val rdd = sc.textFile("datas/apache.log") // path to the log file
    val timeRDD = rdd.map(
      line => {
        val datas = line.split(" ") // split the line on spaces
        val time = datas(3) // fourth field (index 3): the timestamp
        // dd = day of month, yyyy = calendar year, HH = hour of day (0-23);
        // DD, YYYY and hh mean day of year, week year and 12-hour clock
        val sdf = new SimpleDateFormat("dd/MM/yyyy:HH:mm:ss")
        val date: Date = sdf.parse(time) // parse the timestamp
        val sdf1 = new SimpleDateFormat("HH") // the time period is one hour
        val hour: String = sdf1.format(date) // two-digit hour, used as the key
        (hour, 1)
      }
    ).groupBy(_._1)

    timeRDD.map {
      case (hour, iter) => // pattern match on (key, grouped values)
        (hour, iter.size)
    }.collect().foreach(println)

    // Shut down the environment
    sc.stop()
  }
}
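The parsing and counting logic can also be tried without a Spark cluster or a log file. The following is a minimal sketch in plain Scala, not the author's implementation: the object `HourlyCountSketch`, the helpers `hourOf` and `countByHour`, and the three sample lines are all made up for illustration, and it swaps `SimpleDateFormat` for `java.time.DateTimeFormatter`. It assumes the same line layout as the program above (timestamp in the fourth space-separated field) and skips lines that do not parse instead of throwing.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object HourlyCountSketch {
  // Assumed layout of the timestamp field, e.g. 17/05/2015:10:05:03
  private val fmt = DateTimeFormatter.ofPattern("dd/MM/yyyy:HH:mm:ss")

  // Hypothetical helper: returns the two-digit hour of a log line,
  // or None when the line is too short or the timestamp does not parse.
  def hourOf(line: String): Option[String] = {
    val fields = line.split(" ")
    if (fields.length < 4) None
    else Try(LocalDateTime.parse(fields(3), fmt)).toOption
      .map(t => f"${t.getHour}%02d")
  }

  // Hypothetical helper: counts lines per hour on an in-memory collection,
  // mirroring the groupBy-then-size step of the Spark program.
  def countByHour(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(hourOf).groupBy(identity).map { case (h, hs) => (h, hs.size) }

  def main(args: Array[String]): Unit = {
    // Made-up sample lines, not taken from any real log
    val sample = Seq(
      "192.0.2.1 - - 17/05/2015:10:05:03 +0000 GET /index.html",
      "192.0.2.2 - - 17/05/2015:10:45:10 +0000 GET /about.html",
      "192.0.2.3 - - 17/05/2015:11:00:00 +0000 GET /index.html"
    )
    countByHour(sample).toSeq.sortBy(_._1).foreach(println)
    // prints:
    // (10,2)
    // (11,1)
  }
}
```

On an RDD, the same helper could plausibly be plugged in as `rdd.flatMap(line => hourOf(line)).map((_, 1)).reduceByKey(_ + _)`. `reduceByKey` sums the counts on each partition before the shuffle, whereas `groupBy` moves every `(hour, 1)` pair across the network first, so it is usually the cheaper choice for simple counting.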
