
RDD creation methods in Spark

2022-07-06 02:04:00 Diligent ls

In Spark, there are three ways to create an RDD: from a collection, from external storage, and from another RDD.

Environment dependencies for creating RDDs
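The dependency listing itself is not preserved here. As a rough sketch only, a project that creates RDDs needs at least the spark-core artifact on its classpath. The sbt fragment below assumes an sbt build, and both version numbers are illustrative assumptions, not values taken from the article; pick the ones that match your cluster.

```scala
// build.sbt (sketch): the versions below are assumptions, not the article's
scalaVersion := "2.12.15"

// spark-core provides SparkConf, SparkContext and RDD
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.0"
```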



1. Create from a collection

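import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object createrdd {
  def main(args: Array[String]): Unit = {
    // Master and app name are set here for a local run; spark-submit can supply them instead
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("createrdd")
    val sc: SparkContext = new SparkContext(conf)

    // Create an RDD with parallelize()
    val rdd: RDD[Int] = sc.parallelize(Array(1, 2, 3, 4, 5, 6))
    rdd.collect().foreach(println)

    // Create an RDD with makeRDD()
    val rdd1: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4, 5, 6))
    rdd1.collect().foreach(println)

    // Release the SparkContext
    sc.stop()
  }
}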

Note: makeRDD is not exactly the same as parallelize. One of its overloads additionally accepts location information (preferred hosts) for each element.
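The difference mentioned in the note can be sketched as follows. The overload of makeRDD that takes a sequence of (element, hosts) pairs records the hosts as location preferences, one partition per element. The object name and the host names "host-a" and "host-b" are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MakeRddLocationSketch {
  // Builds an RDD through the makeRDD overload that carries preferred hosts.
  // The host names are hypothetical placeholders.
  def build(sc: SparkContext): RDD[Int] = {
    val data: Seq[(Int, Seq[String])] = Seq(
      (1, Seq("host-a")),
      (2, Seq("host-b"))
    )
    sc.makeRDD(data)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("MakeRddLocationSketch")
    val sc = new SparkContext(conf)

    val rdd = build(sc)
    // Print the location preference recorded for each partition
    rdd.partitions.foreach { p =>
      println(s"partition ${p.index} prefers ${rdd.preferredLocations(p).mkString(",")}")
    }

    sc.stop()
  }
}
```

The preferences are only hints to the scheduler. In a local run they have no visible effect beyond being reported by preferredLocations.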

2. Create from a dataset in an external storage system

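import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
object crearedd2 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("crearedd2")
    val sc: SparkContext = new SparkContext(conf)
    // Read the files under the "input" path as lines of text
    val value: RDD[String] = sc.textFile("input")
    value.collect().foreach(println)
    sc.stop()
  }
}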
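For a version that does not depend on an existing "input" path, the sketch below writes a small temporary file and reads it back with textFile. It also passes the optional minPartitions argument, which is a lower-bound hint on the number of partitions. The object name, helper names, and file contents are made up for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object TextFileSketch {
  // Writes a two-line temporary file so the example is self-contained
  def writeSample(): Path = {
    val path = Files.createTempFile("rdd-input", ".txt")
    Files.write(path, "hello spark\nhello rdd\n".getBytes(StandardCharsets.UTF_8))
    path
  }

  // Each line of the file becomes one element of the RDD
  def readLines(sc: SparkContext, uri: String): RDD[String] =
    sc.textFile(uri, minPartitions = 2)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("TextFileSketch")
    val sc = new SparkContext(conf)

    val lines = readLines(sc, writeSample().toUri.toString)
    lines.collect().foreach(println)

    sc.stop()
  }
}
```

The same call accepts a directory, a wildcard pattern, or a URI for another file system such as hdfs://, provided that file system is reachable from the cluster.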

3. Create from another RDD

A new RDD is mainly produced by applying a computation to an existing RDD.
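A minimal sketch of this third way, assuming the same six-element collection used earlier: each transformation (here filter and map) returns a new RDD and leaves the source RDD unchanged. The object and method names are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object DeriveRddSketch {
  // Keeps the even numbers of the source RDD and doubles them;
  // filter and map each return a new RDD
  def evensDoubled(sc: SparkContext): RDD[Int] = {
    val source: RDD[Int] = sc.makeRDD(Array(1, 2, 3, 4, 5, 6))
    source.filter(_ % 2 == 0).map(_ * 2)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("DeriveRddSketch")
    val sc = new SparkContext(conf)

    // collect() is an action; it triggers the computation of the derived RDD
    evensDoubled(sc).collect().foreach(println)

    sc.stop()
  }
}
```

Transformations are lazy, so nothing is computed until an action such as collect() runs.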


This article was written by [Diligent ls]. Please include a link to the original when reposting. Thanks.