Adding a Column to a Spark DataFrame
2022-07-06 00:23:00 【南风知我意丿】
Method 1: use createDataFrame. The new column is built while constructing the RDD and the schema.
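import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StringType

// Build an RDD[Row] holding the original value plus the new flag.
// The Row carries a Double, so the target column is assumed to be DoubleType.
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  // "F" marks values outside [critValueL, critValueR], "T" marks values inside
  Row(v, if (v > critValueR || v < critValueL) "F" else "T")
}
// Extend the selected column's schema with the new nullable string column
val schema = input.select(targetColumns).schema.add("flag", StringType, true)
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")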
Method 2: use withColumn. The new column is computed by a UDF.
import org.apache.spark.sql.functions.udf

// The UDF takes an Int, so the target column must be an integer column
val code: Int => String = (arg: Int) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns)
  .withColumn("flag", addCol(input(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")
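For a simple threshold flag like this one, Spark's built-in `when`/`otherwise` column functions are a possible alternative to a UDF, since built-in expressions stay visible to the optimizer while a UDF is opaque to it. This is not one of the article's methods, only a minimal sketch: the helper `addFlag`, the sample column `value`, and the thresholds 2 and 10 are hypothetical stand-ins for `input`, `targetColumns`, `critValueL`, and `critValueR`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object FlagWithWhen {
  // Hypothetical helper: flags values outside [critValueL, critValueR] with "F"
  def addFlag(input: DataFrame, targetColumns: String,
              critValueL: Int, critValueR: Int): DataFrame =
    input.select(targetColumns)
      .withColumn("flag",
        when(col(targetColumns) > critValueR || col(targetColumns) < critValueL, "F")
          .otherwise("T"))
      .withColumnRenamed(targetColumns, "idx")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val ss = SparkSession.builder().master("local[1]").appName("flag-with-when").getOrCreate()
    import ss.implicits._

    // Made-up sample data with an integer column named "value"
    val input = Seq(1, 5, 12).toDF("value")
    addFlag(input, "value", 2, 10).show()

    ss.stop()
  }
}
```

The result has the same two columns as the UDF version, `idx` and `flag`.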
Method 3: use SQL. The new column is written directly into the SQL statement.
// Register the selected column as a temporary view so SQL can query it
input.select(targetColumns).createOrReplaceTempView("tmp")
val sample3 = ss.sql(
  s"select distinct $targetColumns as idx, " +
  s"case when $targetColumns > $critValueR then 'F' " +
  s"when $targetColumns < $critValueL then 'F' else 'T' end as flag from tmp")
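Method 4: the three methods above add a column derived from a condition. To add a column of unique IDs instead, use monotonically_increasing_id. The generated IDs are unique and monotonically increasing, but not guaranteed to be consecutive.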
// Method 4: add an ID column
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())
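The IDs jump between partitions because, according to the Spark documentation for the current implementation, the partition index goes into the upper 31 bits and the record number within the partition into the lower 33 bits. The plain-Scala sketch below only mimics that layout to show the jumps. `sketchId` is a hypothetical helper, not a Spark API, and the real values depend on how the DataFrame is partitioned.

```scala
object MonotonicIdLayout {
  // Hypothetical helper mimicking the documented layout:
  // partition index in the upper 31 bits, row number in the lower 33 bits
  def sketchId(partitionIndex: Int, rowInPartition: Long): Long =
    (partitionIndex.toLong << 33) + rowInPartition

  def main(args: Array[String]): Unit = {
    // Two partitions with three rows each
    for (p <- 0 to 1; r <- 0L to 2L)
      println(s"partition=$p row=$r id=${sketchId(p, r)}")
    // The first row of partition 1 gets 8589934592 (1L << 33), not 3
  }
}
```

If consecutive numbers such as 1, 2, 3 are required, this function alone does not provide them.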