Spark in Action 1: Setting Up a Spark Runtime Environment in Single-Node Local Mode
2022-07-03 12:39:00 【星哥玩云】
Preface:
Spark itself is written in Scala and runs on the JVM.
Java version: Java 6 or higher.
1 Download Spark
http://spark.apache.org/downloads.html
Pick whichever version you need; my choice here is:
http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-hadoop1.tgz
If you are an ambitious coder, you can also grab the source yourself: http://github.com/apache/spark
Note: I am running this on Linux. If you don't have a Linux machine, you can run one inside a virtual machine.
2 Extract & Enter the Directory
tar -zvxf spark-1.1.0-bin-hadoop1.tgz
cd spark-1.1.0-bin-hadoop1/
3 Start the Shell
./bin/spark-shell
You will see a lot of startup output printed; at the end, the scala> prompt appears.
4 Trying It Out
Run the following statements, one after another:
val lines = sc.textFile("README.md")
lines.count()
lines.first()
val pythonLines = lines.filter(line => line.contains("Python"))
scala> pythonLines.first()
res0: String = ## Interactive Python Shell
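A couple more actions worth trying in the same session (a minimal sketch; the exact results depend on the contents of your README.md):

// filter() is a lazy transformation; actions like count() and collect() force evaluation.
pythonLines.count()
pythonLines.collect().foreach(println)   // bring the matching lines back to the driver and print them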
--- What is sc?
sc is the SparkContext object that the shell creates for you by default.
For example:
scala> sc
res13: org.apache.spark.SparkContext = org.apache.spark.SparkContext@...
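Since sc already exists, you can use it right away without touching any files. A minimal sketch (the result shown is illustrative):

// Distribute a local collection across the local executor and aggregate it.
val nums = sc.parallelize(1 to 100)
nums.sum()   // res: Double = 5050.0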
Here we are only running locally. [Figure: a preview diagram of how computation is distributed across a cluster]
5 A Standalone Program
We end this section with a complete example.
To get it running smoothly, just follow the steps below.
-------------- The directory structure is as follows:
/usr/local/spark-1.1.0-bin-hadoop1/test$ find .
.
./src
./src/main
./src/main/scala
./src/main/scala/example.scala
./simple.sbt
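If you are building this layout from scratch, commands along these lines (illustrative) will create it:

mkdir -p src/main/scala
touch src/main/scala/example.scala simple.sbt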
simple.sbt contains the following:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
example.scala contains the following:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object example {
  def main(args: Array[String]) {
    // Configure the application: master URL and app name.
    val conf = new SparkConf().setMaster("local").setAppName("My App")
    // Create the context from the configuration.
    val sc = new SparkContext(conf)
    sc.stop()
    //System.exit(0)
    //sys.exit()
    println("this system exit ok!!!")
  }
}
"local" is a cluster URL: it tells Spark how to connect to a cluster. "local" means run in a single thread on the local machine, without connecting to any cluster.
"My App" is the application name, used to identify your program.
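If you want the program to do some real work, here is a sketch of a variant that repeats the shell computation from section 4 (it assumes README.md is present in the directory you launch from; the path is illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object example {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local").setAppName("My App")
    val sc = new SparkContext(conf)
    // Same computation as in the shell session above.
    val lines = sc.textFile("README.md")
    val pythonLines = lines.filter(line => line.contains("Python"))
    println("Lines mentioning Python: " + pythonLines.count())
    sc.stop()
  }
}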
Then run: sbt package
Once that succeeds, run:
./bin/spark-submit --class "example" ./target/scala-2.10/simple-project_2.10-1.0.jar
If the output includes the line "this system exit ok!!!", the program really did execute successfully.
Done!