Adding a Column to a Spark DataFrame
2022-07-06 00:23:00 【南风知我意丿】
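The snippets in this post use several names that are never defined: `ss`, `input`, `targetColumns`, `critValueL`, and `critValueR`. A minimal setup sketch, assuming a local session, might look like the following; the sample data, the column name `score`, and the threshold values are illustrative assumptions only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumed local session; the post refers to its SparkSession as `ss`.
val ss: SparkSession = SparkSession.builder()
  .master("local[*]")
  .appName("add-column-demo")
  .getOrCreate()
import ss.implicits._

// Hypothetical input: a single numeric column named "score".
val targetColumns: String = "score"
val input: DataFrame = Seq(1.0, 5.0, 9.0, 5.0).toDF(targetColumns)

// Assumed thresholds: values below critValueL or above critValueR get the flag "F".
val critValueL: Double = 2.0
val critValueR: Double = 8.0
```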
Method 1: Use createDataFrame; the new column is added while building the RDD and the schema
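import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}
// Build an RDD[Row] of (value, flag); values outside [critValueL, critValueR] are flagged "F".
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  if (v > critValueR || v < critValueL) Row(v, "F") else Row(v, "T")
}
// Declare the value column as DoubleType so the schema matches the Double stored in each Row,
// even when the original column has a different numeric type.
val schema = StructType(Seq(
  StructField(targetColumns, DoubleType, nullable = true),
  StructField("flag", StringType, nullable = true)))
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")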
Method 2: Use withColumn; the logic for the new column lives in a UDF
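import org.apache.spark.sql.functions.udf
// UDF that flags values outside [critValueL, critValueR] as "F" and all others as "T".
// The argument is typed as Double to match the toDouble conversion used in Method 1.
val code: Double => String = (arg: Double) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns)
  .withColumn("flag", addCol(input(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")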
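For a simple threshold test like this one, the same flag can usually be written with Spark's built-in `when`/`otherwise` column functions instead of a UDF, which keeps the condition visible to the query optimizer. The following is a minimal sketch rather than the author's method; the local session, the column name `score`, and the sample data and thresholds are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

val ss = SparkSession.builder().master("local[*]").appName("when-otherwise-demo").getOrCreate()
import ss.implicits._

// Hypothetical stand-ins for the post's input, targetColumns, critValueL and critValueR.
val targetColumns = "score"
val input = Seq(1.0, 5.0, 9.0).toDF(targetColumns)
val critValueL = 2.0
val critValueR = 8.0

// Flag values outside [critValueL, critValueR] as "F", everything else as "T".
val flagged = input
  .select(targetColumns)
  .withColumn("flag",
    when(col(targetColumns) > critValueR || col(targetColumns) < critValueL, "F").otherwise("T"))
  .withColumnRenamed(targetColumns, "idx")

flagged.show()
```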
Method 3: Use SQL; the new column is defined directly in the SQL statement
// Register the selected column as a temporary view so it can be queried with SQL.
input.select(targetColumns).createOrReplaceTempView("tmp")
val sample3 = ss.sql(
  s"select distinct $targetColumns as idx, " +
  s"case when $targetColumns > $critValueR then 'F' " +
  s"when $targetColumns < $critValueL then 'F' else 'T' end as flag from tmp")
Method 4: The three methods above add a column computed from a condition. To add a column of unique IDs instead, use monotonically_increasing_id
// Method 4: add a column of unique, increasing IDs
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())
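Note that `monotonically_increasing_id` produces unique, increasing 64-bit IDs, but not consecutive ones: the partition ID is encoded in the upper bits, so the values jump between partitions. If consecutive numbers are needed, one common alternative is `row_number` over an ordered window. The sketch below assumes a local session and an illustrative `score` column; a window without `partitionBy` moves all rows into a single partition, so it is only suitable for small data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, monotonically_increasing_id, row_number}

val ss = SparkSession.builder().master("local[2]").appName("row-id-demo").getOrCreate()
import ss.implicits._

// Hypothetical input spread over two partitions.
val input = Seq(10.0, 20.0, 30.0, 40.0).toDF("score").repartition(2)

// Unique and increasing within each partition, but with gaps across partitions.
val withUniqueId = input.withColumn("idx", monotonically_increasing_id())

// Consecutive numbering 1..n, ordered by the score column.
val withRowNumber = input.withColumn("idx", row_number().over(Window.orderBy(col("score"))))

withUniqueId.show()
withRowNumber.show()
```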