RDD creation methods in Spark
2022-07-06 02:04:00 【Diligent ls】
There are three ways to create an RDD in Spark: from a collection, from external storage, and from another RDD.
Project dependencies (Maven pom.xml):
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.0.0</version>
    </dependency>
</dependencies>
<build>
    <finalName>SparkCoreTest</finalName>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.4.6</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
1. Create from a collection
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object createrdd {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf()
      .setAppName("SparkCoreTest")
      .setMaster("local[*]")
    val sc: SparkContext = new SparkContext(conf)
    // Use parallelize() to create an RDD
    //val rdd: RDD[Int] = sc.parallelize(Array(1,2,3,4,5,6))
    //rdd.collect().foreach(println)
    // Use makeRDD() to create an RDD
    val rdd1: RDD[Int] = sc.makeRDD(Array(1,2,3,4,5,6))
    // collect() brings the elements back to the driver, which then prints them
    rdd1.collect().foreach(println)
    sc.stop()
  }
}
Note: makeRDD is not exactly equivalent to parallelize. One of its overloads additionally accepts location information (preferred hosts) for each element.
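To make the note above concrete, here is a minimal sketch, assuming a local-mode SparkContext and the Spark 3.0 dependency from the pom. It compares parallelize with an explicit partition count against the makeRDD overload that takes (element, preferred hosts) pairs. The object name `MakeRddLocations`, the method `inspect`, and the hostnames `host-a`/`host-b` are hypothetical and only serve the illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MakeRddLocations {
  // Returns (partitions of the parallelize RDD,
  //          partitions of the location-aware makeRDD RDD,
  //          preferred hosts recorded for its first partition)
  def inspect(): (Int, Int, Seq[String]) = {
    val conf = new SparkConf().setAppName("MakeRddLocations").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // parallelize with an explicit number of slices (partitions)
      val plain: RDD[Int] = sc.parallelize(1 to 6, 3)
      // makeRDD overload taking (element, preferred hosts) pairs:
      // one partition is created per element; the hosts are stored as location hints
      val located: RDD[Int] = sc.makeRDD(Seq(
        (1, Seq("host-a")), // hypothetical hostname
        (2, Seq("host-b"))  // hypothetical hostname
      ))
      val firstPartition = located.partitions(0)
      (plain.getNumPartitions, located.getNumPartitions, located.preferredLocations(firstPartition))
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (plainParts, locatedParts, hints) = inspect()
    println(s"parallelize partitions: $plainParts")
    println(s"makeRDD partitions: $locatedParts, partition 0 prefers: $hints")
  }
}
```

The hostnames are scheduling preferences attached to each partition, so the sketch still runs in local mode where no such hosts exist; parallelize has no way to record them.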
2. Create from a dataset in an external storage system
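import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object crearedd2 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf()
      .setAppName("WC")
      .setMaster("local[*]")
    val sc: SparkContext = new SparkContext(conf)
    // "input" is a path relative to the working directory: a single file or a directory of text files
    val value: RDD[String] = sc.textFile("input")
    // Each element is one line of text; collect() first so the lines
    // are printed on the driver rather than in executor logs on a cluster
    value.collect().foreach(println)
    sc.stop()
  }
}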
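The example above expects an `input` path to already exist. The following self-contained sketch, assuming local mode, first writes a small temporary file and then reads its directory with `textFile`, passing the optional `minPartitions` argument. The names `TextFileDemo` and `readLines` are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TextFileDemo {
  // Writes the given lines to a temp file, reads them back through Spark,
  // and returns (line count, lines as collected on the driver)
  def readLines(lines: Seq[String]): (Long, Array[String]) = {
    val dir = Files.createTempDirectory("rdd-input")
    val file = dir.resolve("part.txt")
    Files.write(file, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
    // deleteOnExit runs in reverse registration order: file first, then directory
    dir.toFile.deleteOnExit()
    file.toFile.deleteOnExit()

    val sc = new SparkContext(new SparkConf().setAppName("TextFileDemo").setMaster("local[*]"))
    try {
      // A directory path reads every file inside it; 2 is the minimum number of partitions
      val rdd: RDD[String] = sc.textFile(dir.toString, 2)
      (rdd.count(), rdd.collect())
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (n, read) = readLines(Seq("a", "b", "c"))
    println(s"$n lines: ${read.mkString(", ")}")
  }
}
```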
3. Create from another RDD
A new RDD is produced by applying a computation (a transformation) to an existing RDD.
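Section 3 has no code, so here is a minimal word-count-style sketch, assuming local mode. Each transformation (`flatMap`, `map`, `reduceByKey`) returns a new RDD derived from its parent, and nothing is computed until the `collect()` action runs. The names `DeriveRdd` and `derive` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DeriveRdd {
  // Derives a chain of RDDs from a source RDD and returns the word counts sorted by word
  def derive(): Array[(String, Int)] = {
    val sc = new SparkContext(new SparkConf().setAppName("DeriveRdd").setMaster("local[*]"))
    try {
      val lines: RDD[String] = sc.makeRDD(Seq("hello spark", "hello scala"))
      // Each step creates a new RDD; the parent RDD itself is left unchanged
      val words: RDD[String] = lines.flatMap(_.split(" "))
      val pairs: RDD[(String, Int)] = words.map(w => (w, 1))
      val counts: RDD[(String, Int)] = pairs.reduceByKey(_ + _)
      // collect() is the action that actually triggers the computation
      counts.collect().sortBy(_._1)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    derive().foreach(println)
}
```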