Adding a Column to a Spark DataFrame
2022-07-25 15:10:00 【南风知我意丿】
Method 1: Use createDataFrame; the new column is produced while building the RDD and the schema
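import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StringType

// Build (value, flag) rows: "F" when the value falls outside [critValueL, critValueR], otherwise "T".
// The rows carry a Double, so this assumes the target column is DoubleType in the schema below.
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  if (v > critValueR || v < critValueL) Row(v, "F") else Row(v, "T")
}
// Extend the selected column's schema with a nullable string column named "flag".
val schema = input.select(targetColumns).schema.add("flag", StringType, true)
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")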
Method 2: Use withColumn; the new column is computed by a UDF
import org.apache.spark.sql.functions.udf

// The UDF takes an Int, so the target column must be an integer column.
val code: Int => String = (arg: Int) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns)
  .withColumn("flag", addCol(input(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")
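The flag from Method 2 can also be written with Spark's built-in when/otherwise column functions, which avoids a UDF altogether. The following is only a sketch under assumptions: the object name FlagWithWhen, the helper addFlag, the column name "value", and the sample thresholds are all hypothetical and do not come from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object FlagWithWhen {
  // Hypothetical helper: marks values outside [lower, upper] as "F", all others as "T".
  def addFlag(df: DataFrame, column: String, lower: Double, upper: Double): DataFrame =
    df.select(col(column).as("idx"))
      .withColumn("flag", when(col("idx") > upper || col("idx") < lower, "F").otherwise("T"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[1]").appName("flag-with-when").getOrCreate()
    import spark.implicits._

    val input = Seq(1.0, 5.0, 9.0).toDF("value") // made-up sample data
    addFlag(input, "value", lower = 2.0, upper = 8.0).show()
    spark.stop()
  }
}
```

Because the condition is an ordinary column expression, it works for any numeric column type, whereas the UDF above is tied to Int.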
Method 3: Use SQL; the new column is written directly into the query
// Register the selected column as a temporary view, then derive the flag with CASE WHEN.
input.select(targetColumns).createOrReplaceTempView("tmp")
val sample3 = ss.sql(
  s"""select distinct $targetColumns as idx,
     |  case when $targetColumns > $critValueR then 'F'
     |       when $targetColumns < $critValueL then 'F'
     |       else 'T' end as flag
     |from tmp""".stripMargin)
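Method 4: The three methods above add a column computed from a condition. To add a column of unique IDs instead, use monotonically_increasing_id. The generated IDs are unique and increasing, but they are not guaranteed to be consecutive.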
// Method 4: add a unique ID column
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())
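When the index has to be consecutive (0, 1, 2, ...), one option is to go through the RDD's zipWithIndex and rebuild the DataFrame with an extended schema, much as in Method 1. This is a sketch under assumptions rather than part of the original post: the object name ConsecutiveIndex, the helper withRowIndex, and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object ConsecutiveIndex {
  // Hypothetical helper: appends a 0-based consecutive Long index as the last column.
  def withRowIndex(df: DataFrame, name: String = "idx"): DataFrame = {
    // zipWithIndex pairs every row with its position across all partitions.
    val indexed = df.rdd.zipWithIndex().map { case (row, i) => Row.fromSeq(row.toSeq :+ i) }
    // The original schema plus one non-nullable Long field for the index.
    val schema = StructType(df.schema.fields :+ StructField(name, LongType, nullable = false))
    df.sparkSession.createDataFrame(indexed, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[2]").appName("consecutive-index").getOrCreate()
    import spark.implicits._

    val input = Seq("a", "b", "c", "d").toDF("v").repartition(2) // made-up sample data
    withRowIndex(input).show()
    spark.stop()
  }
}
```

The trade-off is that zipWithIndex has to run an extra Spark job to count the rows in each partition when the RDD has more than one partition, so monotonically_increasing_id remains the cheaper choice when uniqueness alone is enough.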