Spark's action operator
2022-07-01 09:26:00 【Diligent ls】
Transformation operators are lazy: they are not executed immediately, but only when an action operator is encountered.
1.reduce()
Aggregation: the function f aggregates all elements of the RDD, first aggregating the data within each partition, then aggregating the partition results across partitions.

val listRDD: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4),2)
// Action operator
// reduce
// There is aggregation logic both within and between partitions; within a partition the elements are combined in order, with the first element as the starting value
// Across partitions the order is not fixed: whichever partition finishes first supplies the starting value
val i: Int = listRDD.reduce(_ - _)
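// Worked example (a sketch, assuming the 2-partition listRDD above):
// partition 0 holds (1, 2) -> 1 - 2 = -1; partition 1 holds (3, 4) -> 3 - 4 = -1
// merging the partition results with _ - _ gives -1 - (-1) = 0, whichever partition finishes first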
println(i)
2.collect()
Returns all elements of the dataset to the driver program as an Array.
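A minimal sketch, assuming the same listRDD defined in the reduce example above:

// collect pulls the data of all partitions back to the driver as an Array
val collected: Array[Int] = listRDD.collect()
println(collected.toList)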

3.count()
Returns the number of elements in the RDD.

// count returns the number of elements in the RDD
val l: Long = listRDD.count()
println(l)
4.first()
Returns the first element of the RDD.

// first always returns the first element, which comes from partition 0
val i: Int = listRDD.first()
println(i)
5.take()
Returns an array of the first n elements of the RDD.

// take fetches data starting from partition 0 and only reads as many partitions as needed
val ints: Array[Int] = listRDD.take(2)
println(ints.toList)
6.takeOrdered()
Returns an array of the first n elements of the RDD after sorting.

// takeOrdered
// First sorts the data in the rdd, then takes the first n elements
// To sort in descending order, pass the implicit Ordering parameter explicitly
val array: Array[Int] = listRDD.takeOrdered(3)
val array1: Array[Int] = listRDD.takeOrdered(3)(Ordering[Int].reverse)
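// Expected result (assuming listRDD holds 1, 2, 3, 4): array = (1, 2, 3), array1 = (4, 3, 2)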
println(array.toList)
// Take the data first, then sort the fetched elements locally (unlike takeOrdered, which sorts the whole RDD before taking)
val sorted: Array[Int] = listRDD.take(3).sorted
println(sorted.toList)
7.aggregate()
Aggregates the elements within each partition using the intra-partition function and the initial value, then aggregates the partition results using the inter-partition function, applying the initial value again.

val listRDD: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4),4)
val i: Int = listRDD.aggregate(10)(_ - _, _ - _)
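// Worked example (a sketch, assuming the 4-partition listRDD above, one element per partition):
// within partitions: 10 - 1 = 9, 10 - 2 = 8, 10 - 3 = 7, 10 - 4 = 6
// across partitions: 10 - 9 - 8 - 7 - 6 = -20 (the order of the partition results does not change the outcome here,
// because every result is subtracted from the running accumulator)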
println(i)
8.fold()
A simplified version of aggregate: the intra-partition and inter-partition logic use the same function.

// fold follows the same computation pattern as aggregate; the initial value is applied within each partition and again when merging the partition results
// The intra-partition and inter-partition logic are the same function
val j: Int = listRDD.fold(10)(_ + _)
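// Worked example (a sketch, assuming the 4-partition listRDD above):
// within partitions: 10 + 1, 10 + 2, 10 + 3, 10 + 4; merging: 10 + 11 + 12 + 13 + 14 = 60, so j = 60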
val f: Int = listRDD.fold(10)(_ - _)
println(j)
println(f)
9.countByKey()
Counts the number of occurrences of each key.

val value: RDD[(String, Int)] = sc.makeRDD(
List(("a", 10), ("b", 7), ("a", 11), ("b", 21)), 4)
val stringToLong: collection.Map[String, Long] = value.countByKey()
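// Expected result (for the pairs above): Map(a -> 2, b -> 2) -- countByKey counts key occurrences and ignores the values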
println(stringToLong)
10.save
1.saveAsTextFile(path): save as a text file
Saves the elements of the dataset as a text file to HDFS or another supported file system. For each element, Spark calls its toString method and writes the result as a line of text in the file.
2.saveAsSequenceFile(path): save as a SequenceFile
Saves the elements of the dataset in Hadoop SequenceFile format to the specified directory on HDFS or another Hadoop-supported file system. Note: only key-value (pair) RDDs support this operation.
3.saveAsObjectFile(path): save serialized objects to a file
Serializes the elements of the RDD and stores them in a file.
// Save as a plain text file; it can be read back directly
// Each save requires its own output directory that does not already exist
value.saveAsTextFile("output1")
// saveAsSequenceFile can only be used on key-value (pair) RDDs
value.saveAsSequenceFile("output2")
value.saveAsObjectFile("output3")
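A minimal sketch of reading the saved data back, assuming the three output directories used above:

// text files are read back as an RDD of lines
val textBack: RDD[String] = sc.textFile("output1")
// SequenceFiles are read back as a pair RDD; the key and value types must be supplied
val seqBack: RDD[(String, Int)] = sc.sequenceFile[String, Int]("output2")
// object files are deserialized back into the original element type
val objBack: RDD[(String, Int)] = sc.objectFile[(String, Int)]("output3")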
11.foreach()
Iterates over each element of the RDD.

val intRDD: RDD[Int] = sc.makeRDD(1 to 20, 5)
// collect first, then print on the driver side
val ints: Array[Int] = intRDD.collect()
ints.foreach(println)
// Use foreach to print directly
// Printing happens on the executor side; multiple tasks print concurrently, so the overall order is not guaranteed, but the order within each partition is preserved
intRDD.foreach(println)