Adding a Column to a Spark DataFrame
2022-07-06 00:23:00 【南风知我意丿】
Method 1: createDataFrame. The new column is added while building the RDD and its schema.
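import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// Map each value to Row(value, flag): "F" if it lies outside [critValueL, critValueR], otherwise "T"
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  if (v > critValueR || v < critValueL) Row(v, "F") else Row(v, "T")
}
// The first field is now a Double, so declare it as DoubleType rather than reusing the source column's type
val schema = StructType(Seq(
  StructField(targetColumns, DoubleType, nullable = true),
  StructField("flag", StringType, nullable = true)))
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")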
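Below is a self-contained sketch of Method 1. It assumes Spark 3.x (spark-sql) on the classpath and a local master. The object name CreateDataFrameFlagDemo, the helper addFlag, the toy column "value" and the thresholds 1.0 and 5.0 are hypothetical stand-ins for the article's input, targetColumns, critValueL and critValueR. The toy column is deliberately an Int. The rows carry Double values, so the schema declares DoubleType instead of copying the source column's type, which would not match.

```scala
// Assumes Spark 3.x (spark-sql) on the classpath; all names below are hypothetical
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object CreateDataFrameFlagDemo {
  // Method 1: build Row(value, flag) in an RDD, then pair it with a schema
  // that has one extra "flag" column
  def addFlag(ss: SparkSession, input: DataFrame, colName: String,
              critValueL: Double, critValueR: Double): DataFrame = {
    val trdd = input.select(colName).rdd.map { r =>
      val v = r.get(0).toString.toDouble
      Row(v, if (v > critValueR || v < critValueL) "F" else "T")
    }
    // The value is emitted as Double, so the schema must say DoubleType
    val schema = StructType(Seq(
      StructField(colName, DoubleType, nullable = true),
      StructField("flag", StringType, nullable = true)))
    ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(colName, "idx")
  }

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder().master("local[1]").appName("method1-demo").getOrCreate()
    import ss.implicits._
    // Toy Int column with a duplicate value; distinct() keeps a single row for 3
    val input = Seq(0, 3, 3, 7).toDF("value")
    addFlag(ss, input, "value", critValueL = 1.0, critValueR = 5.0).orderBy("idx").show()
    ss.stop()
  }
}
```

Without distinct(), both rows holding the value 3 would remain in the result.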
Method 2: withColumn. The values of the new column are computed by a UDF.
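import org.apache.spark.sql.functions.{col, udf}

// Flag rule: "F" if the value lies outside [critValueL, critValueR], otherwise "T"
val code: Double => String = (arg: Double) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns)
  .withColumn("flag", addCol(col(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")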
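The sketch below runs Method 2 end to end. For comparison it also includes a variant built on the built-in when/otherwise functions, which is not part of the original article. Catalyst treats a UDF as a black box, so a built-in expression is usually the better choice when the rule is this simple. As before, it assumes Spark 3.x on the classpath. The object name UdfFlagDemo, the helpers addFlagUdf and addFlagBuiltin, and the toy data are hypothetical.

```scala
// Assumes Spark 3.x (spark-sql) on the classpath; all names below are hypothetical
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf, when}

object UdfFlagDemo {
  // Method 2: the flag rule lives in a Scala function wrapped as a UDF
  def addFlagUdf(input: DataFrame, colName: String,
                 critValueL: Double, critValueR: Double): DataFrame = {
    val code: Double => String = arg => if (arg > critValueR || arg < critValueL) "F" else "T"
    val addCol = udf(code)
    input.select(colName)
      .withColumn("flag", addCol(col(colName)))
      .withColumnRenamed(colName, "idx")
  }

  // Alternative (not from the article): the same rule as a built-in Column expression
  def addFlagBuiltin(input: DataFrame, colName: String,
                     critValueL: Double, critValueR: Double): DataFrame =
    input.select(colName)
      .withColumn("flag", when(col(colName) > critValueR || col(colName) < critValueL, "F").otherwise("T"))
      .withColumnRenamed(colName, "idx")

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder().master("local[1]").appName("method2-demo").getOrCreate()
    import ss.implicits._
    val input = Seq(0.5, 3.0, 7.0).toDF("value")
    addFlagUdf(input, "value", 1.0, 5.0).show()
    addFlagBuiltin(input, "value", 1.0, 5.0).show()
    ss.stop()
  }
}
```

The two versions differ on null input. Spark returns null for a UDF with a primitive parameter when that input is null. The when/otherwise version treats the null comparison as false and falls through to "T".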
Method 3: SQL. The new column is written directly into the SQL query.
input.select(targetColumns).createOrReplaceTempView("tmp")
// Values outside [critValueL, critValueR] get flag 'F', everything else gets 'T'
val sample3 = ss.sql(
  s"""select distinct $targetColumns as idx,
     |  case when $targetColumns > $critValueR or $targetColumns < $critValueL then 'F' else 'T' end as flag
     |from tmp""".stripMargin)
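Here is a runnable sketch of Method 3 under the same assumptions: Spark 3.x and a local master. SqlFlagDemo, addFlag and the toy data are hypothetical. Backquoting the column name in the SQL text guards against names that are reserved words or contain spaces. The thresholds are spliced into the query string, so this pattern should only be fed trusted values.

```scala
// Assumes Spark 3.x (spark-sql) on the classpath; all names below are hypothetical
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlFlagDemo {
  // Method 3: register the projected column as a temp view and compute the flag with CASE WHEN
  def addFlag(ss: SparkSession, input: DataFrame, colName: String,
              critValueL: Double, critValueR: Double): DataFrame = {
    input.select(colName).createOrReplaceTempView("tmp")
    ss.sql(
      s"""SELECT DISTINCT `$colName` AS idx,
         |  CASE WHEN `$colName` > $critValueR OR `$colName` < $critValueL THEN 'F' ELSE 'T' END AS flag
         |FROM tmp""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder().master("local[1]").appName("method3-demo").getOrCreate()
    import ss.implicits._
    // Toy data with a duplicate; SELECT DISTINCT collapses it
    val input = Seq(0.5, 3.0, 3.0, 7.0).toDF("value")
    addFlag(ss, input, "value", 1.0, 5.0).orderBy("idx").show()
    ss.stop()
  }
}
```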
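Method 4: The three methods above add a column whose value depends on a condition. To add a column of unique IDs instead, use monotonically_increasing_id. The generated IDs are unique and monotonically increasing, but not consecutive. The partition ID occupies the upper 31 bits and the record number within the partition occupies the lower 33 bits.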
// Method 4: add a column of unique row IDs
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())
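The sketch below makes the "not consecutive" caveat visible. It also shows one common way to get consecutive IDs instead: RDD.zipWithIndex combined with createDataFrame, the same RDD-plus-schema approach as Method 1. The object name RowIdDemo, the helper names and the toy data are hypothetical, and Spark 3.x with a local master is assumed. zipWithIndex triggers an extra Spark job when the RDD has more than one partition.

```scala
// Assumes Spark 3.x (spark-sql) on the classpath; all names below are hypothetical
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.monotonically_increasing_id
import org.apache.spark.sql.types.LongType

object RowIdDemo {
  // Method 4: unique, increasing IDs whose values jump between partitions
  def withUniqueId(input: DataFrame): DataFrame =
    input.withColumn("idx", monotonically_increasing_id())

  // Alternative (not from the article): consecutive IDs 0, 1, 2, ... via RDD.zipWithIndex
  def withConsecutiveId(ss: SparkSession, input: DataFrame): DataFrame = {
    val schema = input.schema.add("idx", LongType, nullable = false)
    val rdd = input.rdd.zipWithIndex().map { case (row, i) => Row.fromSeq(row.toSeq :+ i) }
    ss.createDataFrame(rdd, schema)
  }

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder().master("local[2]").appName("method4-demo").getOrCreate()
    import ss.implicits._
    // Six rows spread over three partitions, so the partition ID shows up in the generated IDs
    val input = Seq("a", "b", "c", "d", "e", "f").toDF("name").repartition(3)
    withUniqueId(input).show()
    withConsecutiveId(ss, input).show()
    ss.stop()
  }
}
```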