Adding a Column to a Spark DataFrame
2022-07-06 00:23:00 【南风知我意丿】
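Method 1: Use createDataFrame; the new column is produced while building the RDD and the schema
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}
// ss is the SparkSession; targetColumns is the name of the numeric column to test.
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  // Values outside [critValueL, critValueR] are flagged "F", everything else "T".
  Row(v, if (v > critValueR || v < critValueL) "F" else "T")
}
// The rows hold Doubles, so declare the value column as DoubleType instead of reusing the input schema.
val schema = StructType(Seq(StructField(targetColumns, DoubleType, true), StructField("flag", StringType, true)))
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")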
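Method 2: Use withColumn; the new column is computed by a UDF
import org.apache.spark.sql.functions.udf
// The UDF's argument type must match the column type; use Double here if the column holds doubles.
val code: Int => String = (arg: Int) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns)
  .withColumn("flag", addCol(input(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")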
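The same flag can also be computed without a UDF, using Spark's built-in when/otherwise column functions, which the optimizer can generally handle better than an opaque UDF. The following is a minimal sketch, not the author's method: the local SparkSession, the column name "value", the sample data, and the threshold values are all assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

// Assumed local session for the sketch; in a real job, reuse the existing SparkSession.
val spark = SparkSession.builder().master("local[*]").appName("flag-column-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data and thresholds.
val input = Seq(1.0, 5.0, 9.0).toDF("value")
val critValueL = 2.0
val critValueR = 8.0

// Flag values outside [critValueL, critValueR] as "F", everything else as "T".
val flagged = input
  .withColumn("flag",
    when(col("value") > critValueR || col("value") < critValueL, "F").otherwise("T"))
  .withColumnRenamed("value", "idx")

flagged.show()
```

Because the condition is an ordinary column expression, there is no argument type to keep in sync with the column type, as there is with a UDF.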
Method 3: Use SQL; the new column is written directly into the query
input.select(targetColumns).createOrReplaceTempView("tmp")
// Build the query with string interpolation; the CASE expression encodes the same flag rule.
val sample3 = ss.sql(
  s"select distinct $targetColumns as idx, " +
  s"case when $targetColumns > $critValueR then 'F' " +
  s"when $targetColumns < $critValueL then 'F' else 'T' end as flag from tmp")
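Method 4: The three methods above add a column derived from a condition. To add a column of unique IDs instead, use monotonically_increasing_id
// Method 4: add an ID column. The generated IDs are unique and increasing, but not necessarily consecutive.
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())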
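The IDs produced by monotonically_increasing_id are unique but jump between partitions. Spark's documentation describes the current implementation as placing the partition ID in the upper 31 bits and the record number within the partition in the lower 33 bits. The plain-Scala sketch below imitates that layout to show the effect; sketchId is a hypothetical helper written for illustration, not a Spark API.

```scala
// Hypothetical helper imitating the documented layout:
// partition ID in the upper 31 bits, per-partition record number in the lower 33 bits.
def sketchId(partitionId: Int, recordNumber: Long): Long =
  (partitionId.toLong << 33) + recordNumber

// Two partitions with three records each.
val ids = for {
  p <- 0 until 2
  r <- 0 until 3
} yield sketchId(p, r.toLong)

println(ids.mkString(", "))
// prints: 0, 1, 2, 8589934592, 8589934593, 8589934594
```

If consecutive row numbers are required, the idx column from Method 4 should not be treated as a 0, 1, 2, ... sequence once the data spans more than one partition.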