Adding a Column to a Spark DataFrame
2022-07-06 00:23:00 【南风知我意丿】
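Method 1: use createDataFrame; the new column is built while constructing the RDD and the schema
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StringType

// Map each row to (value, flag): "F" if the value lies outside [critValueL, critValueR], otherwise "T".
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  if (v > critValueR || v < critValueL) Row(v, "F") else Row(v, "T")
}
// Reuse the selected column's schema and append the flag field.
// The rows hold Doubles, so this assumes the target column is already DoubleType.
val schema = input.select(targetColumns).schema.add("flag", StringType, true)
// ss is the SparkSession.
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")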
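Method 2: use withColumn; the logic for the new column lives in a UDF
import org.apache.spark.sql.functions.{col, udf}

// Returns "F" if the value lies outside [critValueL, critValueR], otherwise "T".
// The parameter type (Int here) must match the data type of the target column.
val code: Int => String = (arg: Int) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns)
  .withColumn("flag", addCol(col(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")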
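The same flag can also be computed without a UDF, using Spark's built-in when/otherwise column functions. This is an alternative to Method 2, not part of the original post. The sketch below is a minimal one: the sample data, the column name "value", and the threshold values are assumptions made up for illustration, and it creates its own local SparkSession (in spark-shell, `spark` already exists).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}

// Local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("add-flag-column")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data and thresholds (not from the original post).
val input = Seq(1.0, 5.0, 10.0).toDF("value")
val targetColumns = "value"
val critValueL = 2.0
val critValueR = 8.0

// "F" if the value lies outside [critValueL, critValueR], otherwise "T".
val outOfRange = col(targetColumns) > critValueR || col(targetColumns) < critValueL
val flagged = input
  .select(targetColumns)
  .withColumn("flag", when(outOfRange, lit("F")).otherwise(lit("T")))
  .withColumnRenamed(targetColumns, "idx")

flagged.show()
```

With this sample data, 1.0 and 10.0 fall outside the range and are flagged "F", while 5.0 is flagged "T". Because the condition is an ordinary column expression rather than a UDF, it does not depend on a fixed Scala parameter type.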
Method 3: use SQL; the new column is written directly into the SQL statement
// Register the selected column as a temporary view, then derive the flag with CASE WHEN.
input.select(targetColumns).createOrReplaceTempView("tmp")
val sample3 = ss.sql(
  s"select distinct $targetColumns as idx, " +
  s"case when $targetColumns > $critValueR then 'F' " +
  s"when $targetColumns < $critValueL then 'F' else 'T' end as flag from tmp")
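Method 4: the three methods above add a column computed from a condition. To add a column of unique IDs instead, use monotonically_increasing_id. The generated IDs are unique and increasing, but they are not guaranteed to be consecutive.
// Method 4: add an ID column
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())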
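If consecutive numbers (1, 2, 3, ...) are needed rather than merely unique ones, one option is row_number over a window. This is a different technique from the one in Method 4. The sketch below assumes hypothetical sample data with a column called "name" that gives a meaningful ordering. A window without partitionBy moves all rows into a single partition, so this is only reasonable for small DataFrames.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

// Local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("add-index-column")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data (not from the original post).
val input = Seq("a", "b", "c").toDF("name")

// Method 4 as in the post: unique 64-bit IDs, possibly with gaps between partitions.
val withId = input.withColumn("idx", monotonically_increasing_id())

// Alternative: consecutive 1-based row numbers, ordered by the "name" column.
// No partitionBy is given, so Spark processes the whole window in one partition.
val byName = Window.orderBy("name")
val withRowNumber = input.withColumn("idx", row_number().over(byName))

withId.show()
withRowNumber.show()
```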