Spark Day 01: Quick Start
2022-06-26 12:09:00 【There will always be daylight】
1: What is Spark?
Spark is a fast, general-purpose, and scalable in-memory big data analytics and computation engine.
2: The difference between Spark and Hadoop - usage scenarios
Hadoop: one-pass data computation. When the framework processes data, it reads the data from the storage device, performs the logical operations, and then writes the result back to the storage medium.
Spark: the fundamental difference between Spark and Hadoop lies in how data is communicated between multiple jobs. Spark passes data between jobs through memory, while Hadoop passes it through disk.
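As a minimal illustration of in-memory reuse between jobs (not from the original article; it assumes a local SparkContext named sc like the WordCount examples below, and the "datas" path is hypothetical), caching an RDD lets a second job read it from memory instead of recomputing it from storage:

// Hedged sketch: reusing an RDD in memory across two jobs.
val lines = sc.textFile("datas")
// cache() asks Spark to keep the computed partitions in memory after the first job
val cached = lines.cache()
// Job 1: reads the data from storage and populates the cache
val total = cached.count()
// Job 2: served from the in-memory cache instead of re-reading the files
val nonEmpty = cached.filter(_.nonEmpty).count()
println(s"total = $total, nonEmpty = $nonEmpty")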
3: Spark core modules
Spark Core: the most basic and core functionality of Spark
Spark SQL: the component used to work with structured data (see the short sketch after this list)
Spark Streaming: the component for real-time data stream processing
Spark MLlib: the machine learning algorithm library
Spark GraphX: the framework and algorithm library for graph computation
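The modules above are not covered further in this article. As a rough idea of what the Spark SQL module looks like, here is a minimal, hedged sketch (the JSON path and the "age" column are made up for illustration):

// Hedged sketch of the Spark SQL module (not part of the original article).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local")
  .appName("SparkSqlDemo")
  .getOrCreate()

// Read structured data into a DataFrame and query it with SQL
val df = spark.read.json("datas/user.json")
df.createOrReplaceTempView("user")
spark.sql("select * from user where age > 18").show()

spark.stop()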
4: Two ways to implement the WordCount case (the second is more commonly used)

package com.atguigu.bigdata.spark.wc

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Spark01_WordCount {
  def main(args: Array[String]): Unit = {
    // Application: build the connection to the Spark framework (local environment)
    val sparkConf = new SparkConf().setMaster("local").setAppName("WordCount")
    val sc = new SparkContext(sparkConf)

    // Business logic
    // 1. Read the files and get the data line by line, e.g. "hello world"
    val lines: RDD[String] = sc.textFile("datas")

    // 2. Split each line into individual words: "hello world" => hello, world
    val words: RDD[String] = lines.flatMap(_.split(" "))

    // 3. Group the data by word so it is easy to count: (hello, hello, hello) (world, world)
    val wordGroup: RDD[(String, Iterable[String])] = words.groupBy(word => word)

    // 4. Transform the grouped data into counts: (hello, 3) (world, 2)
    val wordToCount = wordGroup.map {
      case (word, list) => (word, list.size)
    }

    // 5. Collect the result and print it to the console
    val array: Array[(String, Int)] = wordToCount.collect()
    array.foreach(println)

    // Close the connection
    sc.stop()
  }
}

package com.atguigu.bigdata.spark.wc

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Spark02_WordCount {
  def main(args: Array[String]): Unit = {
    // Application: build the connection to the Spark framework (local environment)
    val sparkConf = new SparkConf().setMaster("local").setAppName("WordCount")
    val sc = new SparkContext(sparkConf)

    // Business logic
    // 1. Read the files and get the data line by line, e.g. "hello world"
    val lines: RDD[String] = sc.textFile("datas")

    // 2. Split each line into individual words: "hello world" => hello, world
    val words: RDD[String] = lines.flatMap(_.split(" "))

    // 3. Pair each word with the count 1
    val wordToOne = words.map(word => (word, 1))

    // 4. Group the pairs by word so it is easy to count: (hello, hello, hello) (world, world)
    val groupRDD: RDD[(String, Iterable[(String, Int)])] = wordToOne.groupBy(t => t._1)

    // 5. Transform the grouped data into counts: (hello, 3) (world, 2)
    val wordToCount = groupRDD.map {
      case (word, list) =>
        list.reduce((t1, t2) => (t1._1, t1._2 + t2._2))
    }

    // 6. Collect the result and print it to the console
    val array: Array[(String, Int)] = wordToCount.collect()
    array.foreach(println)

    // Close the connection
    sc.stop()
  }
}
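A note on the design choice (not stated explicitly in the original): both versions above rely on groupBy, which shuffles every record across partitions and materializes the whole group for a word before it is counted. The version below uses reduceByKey, which performs grouping and aggregation in a single method and can combine values for the same key before shuffling, so it is usually the preferred way to write WordCount.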
5: Implementing WordCount with a Spark-provided function (reduceByKey)
package com.atguigu.bigdata.spark.wc

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Spark03_WordCount {
  def main(args: Array[String]): Unit = {
    // Application: build the connection to the Spark framework (local environment)
    val sparkConf = new SparkConf().setMaster("local").setAppName("WordCount")
    val sc = new SparkContext(sparkConf)

    // Business logic
    // 1. Read the files and get the data line by line, e.g. "hello world"
    val lines: RDD[String] = sc.textFile("datas")

    // 2. Split each line into individual words: "hello world" => hello, world
    val words: RDD[String] = lines.flatMap(_.split(" "))

    // 3. Pair each word with the count 1
    val wordToOne = words.map(word => (word, 1))

    // The Spark framework provides richer functionality: grouping and aggregation can be done with one method.
    // reduceByKey aggregates the values of all records that share the same key.
    val wordToCount = wordToOne.reduceByKey((x, y) => x + y)

    // Collect the result and print it to the console
    val array: Array[(String, Int)] = wordToCount.collect()
    array.foreach(println)

    // Close the connection
    sc.stop()
  }
}
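To run any of the three versions locally (an assumption about the project layout, not stated in the original), sc.textFile("datas") expects a "datas" file or directory of plain text under the application's working directory. For example, with an input file containing

hello world
hello world
hello

the console output would be the pairs (hello,3) and (world,2), in no particular order.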