Spark's action operators
2022-07-01 09:26:00 【Diligent ls】
Transformation operators are lazy: they are not executed immediately, but only when an action operator is encountered.
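To see this laziness, here is a minimal sketch (assuming a SparkContext named sc and the RDD import, as in the snippets below): the println inside map does not run until the collect action is called.

val rdd: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4), 2)
// Building the transformation does not touch the data yet
val doubled: RDD[Int] = rdd.map { x => println(s"mapping $x"); x * 2 }
// Nothing has been printed so far; collect triggers the job and runs map
println(doubled.collect().toList) // List(2, 4, 6, 8)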
1.reduce()
Aggregation: the function f aggregates all elements in the RDD, first aggregating the data within each partition, then aggregating the results across partitions.

val listRDD: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4), 2)
// Action operator
// reduce
// There is computation logic both within and between partitions; within a partition, elements are computed in order and the first element serves as the initial value
// Between partitions the order is not fixed: whichever partition's result arrives first is used as the initial value
val i: Int = listRDD.reduce(_ - _)
println(i)
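Worked through with the two-partition RDD above: partition 0 holds (1, 2) and partition 1 holds (3, 4), so the intra-partition step gives 1 - 2 = -1 and 3 - 4 = -1, and the inter-partition step gives (-1) - (-1) = 0 regardless of which partition finishes first; in general, though, a non-commutative function such as subtraction can produce different results depending on the combining order.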
2.collect()
Returns all elements of the dataset to the driver as an Array.

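A minimal collect sketch, reusing the listRDD defined in the reduce example above:

// collect pulls every element of the RDD back to the driver as an Array
val arr: Array[Int] = listRDD.collect()
println(arr.toList) // List(1, 2, 3, 4)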
3.count()
Returns the number of elements in the RDD.

// count returns the number of elements in the RDD
val l: Long = listRDD.count()
println(l)
4.first()
Returns the first element of the RDD.

// first always returns the first element, taken from partition 0
val i: Int = listRDD.first()
println(i)
5.take()
Returns an array containing the first n elements of the RDD.

// take fetches data starting from partition 0, reading only as many partitions as needed to collect n elements
val ints: Array[Int] = listRDD.take(2)
println(ints.toList)
6.takeOrdered()
Returns an array of the first n elements of the RDD after sorting.

// takeOrdered
// First sorts the data in the RDD, then takes the first n elements
// To sort in reverse order, pass the implicit Ordering parameter explicitly
val array: Array[Int] = listRDD.takeOrdered(3)
val array1: Array[Int] = listRDD.takeOrdered(3)(Ordering[Int].reverse)
println(array.toList)
println(array1.toList)
// By contrast, take(3).sorted takes the data first and then sorts only what was taken
val sorted: Array[Int] = listRDD.take(3).sorted
println(sorted.toList)
7.aggregate()
Elements are first aggregated within each partition using the intra-partition function and the initial value; the partition results are then aggregated using the inter-partition function and the initial value again.

val listRDD: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4), 4)
// The initial value 10 is used in every intra-partition step and once more in the inter-partition step
val i: Int = listRDD.aggregate(10)(_ - _, _ - _)
println(i)
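Worked through with the RDD above (one element per partition): the intra-partition step gives 10 - 1 = 9, 10 - 2 = 8, 10 - 3 = 7 and 10 - 4 = 6, and the inter-partition step computes 10 - 9 - 8 - 7 - 6 = -20; the combining order does not change the answer here, because subtracting each partition result from 10 is the same as subtracting their sum.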
8.fold()
A simplified version of aggregate: the intra-partition and inter-partition logic are the same function.

// fold follows the same computation pattern as aggregate; the initial value is used within each partition and again between partitions
// The intra-partition and inter-partition logic are the same function
val j: Int = listRDD.fold(10)(_ + _)
val f: Int = listRDD.fold(10)(_ - _)
println(j)
println(f)
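With the same four-partition RDD: fold(10)(_ + _) computes 11, 12, 13, 14 inside the partitions and 10 + 11 + 12 + 13 + 14 = 60 across them, while fold(10)(_ - _) gives 9, 8, 7, 6 inside and 10 - 9 - 8 - 7 - 6 = -20 across, so this example prints 60 and -20.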
9.countByKey()
Counts the number of occurrences of each key.

val value: RDD[(String, Int)] = sc.makeRDD(
List(("a", 10), ("b", 7), ("a", 11), ("b", 21)), 4)
val stringToLong: collection.Map[String, Long] = value.countByKey()
// prints something like Map(a -> 2, b -> 2)
println(stringToLong)
10.save
1.saveAsTextFile(path): save as a text file
Saves the elements of the dataset as a text file to HDFS or another supported file system. For each element, Spark calls its toString method and writes the result as a line of text in the file.
2.saveAsSequenceFile(path): save as a Hadoop SequenceFile
Saves the elements of the dataset in Hadoop SequenceFile format to the specified directory, on HDFS or another Hadoop-supported file system. Note: only key-value (K, V) RDDs support this operation.
3.saveAsObjectFile(path): save serialized objects to a file
Serializes the elements of the RDD and stores them in a file.
// saveAsTextFile writes a plain text file that can be read directly
// (each save needs its own target directory; writing to an existing directory throws an error)
value.saveAsTextFile("output")
// saveAsSequenceFile can only be used on key-value (pair) RDDs
value.saveAsSequenceFile("output1")
value.saveAsObjectFile("output2")
11.foreach()
Iterates over each element of the RDD, applying the given function.

val intRDD: RDD[Int] = sc.makeRDD(1 to 20, 5)
// Use collect first, then print on the driver
val ints: Array[Int] = intRDD.collect()
ints.foreach(println)
// Use foreach to print directly
// Printing happens on the executor side, in multiple tasks concurrently, so the overall output order is not guaranteed, while the order within each partition is preserved
intRDD.foreach(println)