Adding a Column to a Spark DataFrame
2022-07-06 00:23:00 【南风知我意丿】
Method 1: use createDataFrame, building the new column while constructing the RDD and the schema
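import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}
// Map each value to Row(value, flag): "F" if it lies outside [critValueL, critValueR], otherwise "T"
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  if (v > critValueR || v < critValueL) Row(v, "F") else Row(v, "T")
}
// The rows hold a Double, so declare DoubleType explicitly; reusing the original column's type
// (e.g. IntegerType) would make the DataFrame fail when it is evaluated
val schema = StructType(Seq(
  StructField(targetColumns, DoubleType, nullable = true),
  StructField("flag", StringType, nullable = true)))
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")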
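To see Method 1 end to end, here is a minimal sketch that runs on a local SparkSession, assuming Spark SQL is on the classpath. The object `FlagViaCreateDataFrame`, the helper `addFlag`, the column name `value`, and the bounds 2.0/8.0 are hypothetical stand-ins for the article's `input`, `targetColumns`, `critValueL` and `critValueR`.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// Hypothetical helper illustrating Method 1: the flag is built together with the RDD and schema
object FlagViaCreateDataFrame {
  def addFlag(spark: SparkSession, input: DataFrame, colName: String,
              critValueL: Double, critValueR: Double): DataFrame = {
    // Each output Row holds the value as a Double plus a "T"/"F" flag
    val rows = input.select(colName).rdd.map { r =>
      val v = r.get(0).toString.toDouble
      Row(v, if (v > critValueR || v < critValueL) "F" else "T")
    }
    // The schema must match the Row contents: a Double field followed by a String field
    val schema = StructType(Seq(
      StructField(colName, DoubleType, nullable = true),
      StructField("flag", StringType, nullable = true)))
    spark.createDataFrame(rows, schema).distinct().withColumnRenamed(colName, "idx")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("method1").getOrCreate()
    import spark.implicits._
    // Hypothetical sample data; 5 appears twice to show the effect of distinct()
    val input = Seq(1, 5, 5, 9).toDF("value")
    addFlag(spark, input, "value", 2.0, 8.0).orderBy("idx").show()
    spark.stop()
  }
}
```

On this sample, the result should contain one row per distinct value, with 1.0 and 9.0 flagged `F` and 5.0 flagged `T`.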
Method 2: use withColumn, computing the new column inside a UDF
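import org.apache.spark.sql.functions.{col, udf}
// Plain Scala function holding the rule; its Int parameter assumes targetColumns is an integer column
val code: Int => String = (arg: Int) => if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns)
  .withColumn("flag", addCol(col(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")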
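A comparable sketch for Method 2, again assuming a local SparkSession with Spark SQL on the classpath; `FlagViaUdf`, `addFlag` and the sample data are hypothetical. Because the UDF takes an `Int`, the bounds are integers here and the column must be of integer type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper illustrating Method 2: the flag is computed by a UDF inside withColumn
object FlagViaUdf {
  def addFlag(input: DataFrame, colName: String, critValueL: Int, critValueR: Int): DataFrame = {
    // Wrap the range check in a UDF; Spark passes each integer value of colName to it
    val addCol = udf((arg: Int) => if (arg > critValueR || arg < critValueL) "F" else "T")
    input.select(colName)
      .withColumn("flag", addCol(col(colName)))
      .withColumnRenamed(colName, "idx")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("method2").getOrCreate()
    import spark.implicits._
    addFlag(Seq(1, 5, 9).toDF("value"), "value", 2, 8).show()
    spark.stop()
  }
}
```

Unlike Methods 1 and 3, this version does not call `distinct()`, so duplicate input values remain duplicate rows. For a simple range check like this, Spark's built-in `when`/`otherwise` column functions are an alternative worth considering, since a UDF is opaque to Spark's optimizer.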
Method 3: use SQL, writing the new column directly into the query
// Expose the selected column to SQL as the temporary view "tmp"
input.select(targetColumns).createOrReplaceTempView("tmp")
// The column name and both bounds are interpolated into the query text
val sample3 = ss.sql(
  s"""select distinct $targetColumns as idx,
     |case when $targetColumns > $critValueR then 'F' when $targetColumns < $critValueL then 'F' else 'T' end as flag
     |from tmp""".stripMargin)
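Method 3 can be sketched the same way. `FlagViaSql`, `addFlag` and the sample data are assumptions for illustration, and a local SparkSession is assumed; the view name `tmp` follows the article. The column name is wrapped in backticks so that Spark SQL parses it as an identifier.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper illustrating Method 3: the flag is a CASE WHEN expression in SQL
object FlagViaSql {
  def addFlag(spark: SparkSession, input: DataFrame, colName: String,
              critValueL: Double, critValueR: Double): DataFrame = {
    // Register the selected column under the temporary view name "tmp"
    input.select(colName).createOrReplaceTempView("tmp")
    // colName and the bounds are spliced into the SQL text, so they must come from trusted code
    spark.sql(
      s"""SELECT DISTINCT `$colName` AS idx,
         |  CASE WHEN `$colName` > $critValueR THEN 'F'
         |       WHEN `$colName` < $critValueL THEN 'F'
         |       ELSE 'T' END AS flag
         |FROM tmp""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("method3").getOrCreate()
    import spark.implicits._
    addFlag(spark, Seq(1, 5, 5, 9).toDF("value"), "value", 2.0, 8.0).orderBy("idx").show()
    spark.stop()
  }
}
```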
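Method 4: the three methods above add a column derived from a condition. To add a column of unique sequence numbers instead, use monotonically_increasing_id. The generated IDs are guaranteed to be unique and monotonically increasing, but not consecutive: the partition ID is stored in the upper 31 bits and the record number within each partition in the lower 33 bits.
// Method 4: add a sequence-number column
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())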
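The gaps in the IDs from Method 4 are easy to observe. The sketch below, assuming a local SparkSession, spreads six hypothetical rows over three partitions with `spark.range`; the object `SequenceIdDemo` and its `ids` helper are illustrative names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

// Sketch for Method 4: inspect the IDs monotonically_increasing_id assigns across partitions
object SequenceIdDemo {
  def ids(spark: SparkSession): Seq[Long] = {
    // 6 rows over 3 partitions (2 rows each), so partition boundaries show up in the IDs
    val df = spark.range(0, 6, 1, 3).toDF("value")
      .withColumn("idx", monotonically_increasing_id())
    // collect() returns rows in partition order
    df.select("idx").collect().map(_.getLong(0)).toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[3]").appName("method4").getOrCreate()
    // Each partition's IDs start at partitionIndex * 2^33
    println(ids(spark).mkString(", "))
    spark.stop()
  }
}
```

With this layout, partition 0 should produce 0 and 1, and partition 1 should start at 2^33 = 8589934592. If strictly consecutive numbers are required, `RDD.zipWithIndex` or a `row_number()` window function are common alternatives, at the cost of extra work: `zipWithIndex` triggers an additional Spark job when there is more than one partition, and a window without `partitionBy` moves all rows into a single partition.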