Spark's action operator
2022-07-01 09:26:00 【Diligent ls】
Because transformation operators are lazy, they are not executed immediately; execution is triggered only when an action operator is encountered.
1.reduce()
Aggregation: the function f aggregates all elements of the RDD. Data within each partition is aggregated first, then the partition results are aggregated across partitions.
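The two-level computation can be sketched in plain Scala; here nested lists stand in for the RDD's partitions (an illustrative simulation, not the Spark API):

```scala
// Simulating reduce(_ - _) on a 2-partition RDD holding List(1, 2, 3, 4)
val partitions = List(List(1, 2), List(3, 4))
// Within each partition, elements are combined in order (the first element is the seed)
val perPartition = partitions.map(_.reduce(_ - _)) // List(-1, -1)
// Across partitions, results are combined as they arrive; Spark does not
// guarantee which partition's result acts as the seed
val result = perPartition.reduce(_ - _)
println(result) // 0 in this simulation
```

With a non-commutative function like subtraction, the unspecified inter-partition order is exactly why the Spark result can vary from run to run.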

val listRDD: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4), 2)
// Action operator
// reduce
// Computation happens both within and between partitions; within a partition,
// elements are combined in order, with the first element as the initial value
// Partition order is not guaranteed: whichever partition finishes first supplies
// the initial value for the inter-partition computation
val i: Int = listRDD.reduce(_ - _)
println(i)

2.collect()
Returns all elements of the dataset to the driver as an Array.
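A minimal plain-Scala sketch of what collect does; nested lists stand in for partitions (an illustrative simulation, not the Spark API):

```scala
// Two simulated partitions of an RDD
val partitions = List(List(1, 2), List(3, 4))
// collect gathers every partition's data into a single driver-side array
val collected: Array[Int] = partitions.flatten.toArray
println(collected.toList) // List(1, 2, 3, 4)
```

Because collect pulls the entire dataset into driver memory, it should only be used when the result is small.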

3.count()
Returns the number of elements in the RDD.

// count: returns the number of elements in the RDD
val l: Long = listRDD.count()
println(l)

4.first()
Returns the first element of the RDD.

// first: takes the first element, reading from partition number 0
val i: Int = listRDD.first()
println(i)

5.take()
Returns an array of the first n elements of the RDD.

// take: knows which partitions it needs and fetches data starting from partition 0
val ints: Array[Int] = listRDD.take(2)
println(ints.toList)

6.takeOrdered()
Returns an array of the first n elements of the RDD after sorting.

// takeOrdered
// Sorts the RDD's data first, then takes the first n elements
// For descending order, pass an implicit Ordering
val array: Array[Int] = listRDD.takeOrdered(3)
val array1: Array[Int] = listRDD.takeOrdered(3)(Ordering[Int].reverse)
println(array.toList)
// Alternatively, take the data first and then sort it
// (note: this sorts only the n elements that take returns, which is not
// the same as takeOrdered unless the RDD happens to be ordered already)
val sorted: Array[Int] = listRDD.take(3).sorted
println(sorted.toList)

7.aggregate()
Aggregates the elements within each partition using the intra-partition function and the initial value, then aggregates the partition results using the inter-partition function and the same initial value.

val listRDD: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4), 4)
val i: Int = listRDD.aggregate(10)(_ - _, _ - _)
println(i)
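The arithmetic can be traced with a plain-Scala simulation; nested lists stand in for the 4 partitions (an illustrative simulation, not the Spark API):

```scala
// aggregate(10)(_ - _, _ - _) on List(1, 2, 3, 4) split into 4 partitions
val partitions = List(List(1), List(2), List(3), List(4))
// Step 1: fold each partition with the initial value 10
val perPartition = partitions.map(_.foldLeft(10)(_ - _)) // List(9, 8, 7, 6)
// Step 2: fold the partition results, starting again from 10
val result = perPartition.foldLeft(10)(_ - _) // 10 - 9 - 8 - 7 - 6 = -20
println(result) // -20
```

Note that the initial value participates once per partition and once more across partitions, so here it is used 5 times in total.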
8.fold()
A simplified version of aggregate: the intra-partition and inter-partition logic are the same.
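A plain-Scala sketch of why the initial value is applied more than once, assuming the 4-partition listRDD above (an illustrative simulation, not the Spark API):

```scala
// fold(10)(_ + _) on List(1, 2, 3, 4) in 4 partitions
val partitions = List(List(1), List(2), List(3), List(4))
val perPartition = partitions.map(_.foldLeft(10)(_ + _)) // List(11, 12, 13, 14)
val result = perPartition.foldLeft(10)(_ + _) // 10 + 11 + 12 + 13 + 14 = 60
println(result) // 60
```

With n partitions the initial value is applied n + 1 times, which is why fold with a non-neutral initial value such as 10 yields 60 here rather than 20.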

// fold: same computation logic as aggregate; the initial value is used both
// within each partition and again across partitions
// The intra-partition and inter-partition logic are identical
val j: Int = listRDD.fold(10)(_ + _)
val f: Int = listRDD.fold(10)(_ - _)
println(j)
println(f)

9.countByKey()
Counts the number of occurrences of each key.
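The behavior can be sketched in plain Scala (an illustrative simulation, not the Spark API):

```scala
val pairs = List(("a", 10), ("b", 7), ("a", 11), ("b", 21))
// countByKey only counts occurrences per key; the values are ignored
val counts: Map[String, Long] =
  pairs.groupBy(_._1).map { case (k, vs) => (k, vs.size.toLong) }
println(counts)
```

To sum the values per key instead of counting them, reduceByKey would be the appropriate operator.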

val value: RDD[(String, Int)] = sc.makeRDD(
  List(("a", 10), ("b", 7), ("a", 11), ("b", 21)), 4)
val stringToLong: collection.Map[String, Long] = value.countByKey()
println(stringToLong)

10.save
1.saveAsTextFile(path): save as a text file
Saves the elements of the dataset as a text file to HDFS or another supported file system. For each element, Spark calls its toString method to convert it into a line of text in the file.
2.saveAsSequenceFile(path): save as a SequenceFile
Saves the elements of the dataset in the Hadoop SequenceFile format to the specified directory on HDFS or another Hadoop-supported file system. Note: only key-value (KV) type RDDs support this operation.
3.saveAsObjectFile(path): save serialized objects to a file
Serializes the elements of the RDD as objects and stores them in a file.
// Note: each save needs its own output directory; Spark will not write to an existing one
// saveAsTextFile writes plain text that can be read back directly
value.saveAsTextFile("output1")
// saveAsSequenceFile can only be used on key-value (tuple) RDDs
value.saveAsSequenceFile("output2")
value.saveAsObjectFile("output3")

11.foreach()
Iterates over every element of the RDD.

val intRDD: RDD[Int] = sc.makeRDD(1 to 20, 5)
// Collect first, then print with foreach on the driver
val ints: Array[Int] = intRDD.collect()
ints.foreach(println)
// Print directly with foreach
// This prints on the executor side, in multiple threads: the overall output
// is unordered, but elements within a partition stay in order
intRDD.foreach(println)