Adding a column to a Spark DataFrame
2022-07-06 00:28:00 【The south wind knows what I mean】
Contents
- Method 1: Use createDataFrame, with the new column built into the RDD and schema
- Method 2: Use withColumn, with the new column computed by a UDF
- Method 3: Use SQL, with the new column written directly in the query
- Method 4: The three methods above add a column based on a condition; to add a unique sequence number, use monotonically_increasing_id
Method 1: Use createDataFrame, with the new column built into the RDD and schema
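import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}
// Map each row to (value as Double, flag): "F" outside [critValueL, critValueR], otherwise "T".
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  if (v > critValueR || v < critValueL) Row(v, "F") else Row(v, "T")
}
// The rows hold Doubles, so the value column is declared as DoubleType instead of reusing the input schema.
val schema = StructType(Seq(StructField(targetColumns, DoubleType, true), StructField("flag", StringType, true)))
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")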
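The snippet above depends on values defined elsewhere (`input`, `ss`, `targetColumns`, `critValueL`, `critValueR`). The following is a minimal self-contained sketch of the same idea, assuming a local SparkSession. The column name `score`, the sample data, the thresholds, and the helper name `addFlag` are all hypothetical and not from the original. The value column is declared as DoubleType because the rows carry Doubles.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object CreateDataFrameExample {
  // Hypothetical helper: rebuilds the DataFrame from an RDD[Row] plus an explicit schema.
  def addFlag(ss: SparkSession, input: DataFrame, targetColumns: String,
              critValueL: Double, critValueR: Double): DataFrame = {
    val trdd = input.select(targetColumns).rdd.map { x =>
      val v = x.get(0).toString.toDouble
      // "F" when the value lies outside [critValueL, critValueR], otherwise "T".
      if (v > critValueR || v < critValueL) Row(v, "F") else Row(v, "T")
    }
    val schema = StructType(Seq(
      StructField(targetColumns, DoubleType, nullable = true),
      StructField("flag", StringType, nullable = true)))
    ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")
  }

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder().master("local[*]").appName("add-column").getOrCreate()
    import ss.implicits._
    // Made-up sample data and thresholds, for illustration only.
    val input = Seq(5, 15, 25, 15).toDF("score")
    addFlag(ss, input, "score", 10.0, 20.0).orderBy("idx").show()
    ss.stop()
  }
}
```

Because of `distinct()`, the duplicated input value appears only once in the result.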
Method 2: Use withColumn, with the new column computed by a UDF
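import org.apache.spark.sql.functions.udf
// UDF returning "F" when the value lies outside [critValueL, critValueR], otherwise "T".
val code: Int => String = (arg: Int) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns)
  .withColumn("flag", addCol(input(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")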
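A UDF is not the only way to feed `withColumn`. As an alternative that the original does not use, the same flag can likely be expressed with Spark's built-in `when`/`otherwise` column functions, which keeps the logic visible to the query optimizer instead of hiding it inside an opaque function. The sketch below assumes an integer column; the column name `score`, the sample data, the thresholds, and the helper name `addFlag` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object WhenOtherwiseExample {
  // Hypothetical helper: same "F"/"T" rule as the UDF, written as a column expression.
  def addFlag(input: DataFrame, targetColumns: String,
              critValueL: Int, critValueR: Int): DataFrame = {
    val outside = col(targetColumns) > critValueR || col(targetColumns) < critValueL
    input.select(targetColumns)
      .withColumn("flag", when(outside, "F").otherwise("T"))
      .withColumnRenamed(targetColumns, "idx")
  }

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder().master("local[*]").appName("when-otherwise").getOrCreate()
    import ss.implicits._
    // Made-up sample data and thresholds, for illustration only.
    val input = Seq(5, 15, 25).toDF("score")
    addFlag(input, "score", 10, 20).show()
    ss.stop()
  }
}
```

Like the UDF version, this sketch does not call `distinct()`, so duplicate input values are kept.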
Method 3: Use SQL, with the new column written directly in the query
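// Register the selected column as a temporary view so it can be queried with SQL.
input.select(targetColumns).createOrReplaceTempView("tmp")
// targetColumns is the same column-name string used in the methods above.
val sample3 = ss.sql("select distinct " + targetColumns +
  " as idx, case when " + targetColumns + " > " + critValueR + " then 'F'" +
  " when " + targetColumns + " < " + critValueL + " then 'F' else 'T' end as flag from tmp")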
Method 4: The three methods above add a column based on a condition; to add a unique sequence number, use monotonically_increasing_id
// Method 4: add a sequence-number column.
import org.apache.spark.sql.functions.monotonically_increasing_id
// The generated IDs are unique and increasing, but not guaranteed to be consecutive.
val inputnew = input.withColumn("idx", monotonically_increasing_id())
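One point worth illustrating: the IDs from `monotonically_increasing_id` are unique and increasing, but they can jump between partitions, so they are not a 1..n numbering. The sketch below shows the unique-ID column next to one possible alternative for consecutive numbers, `row_number` over a window, which the original does not use. Note that an unpartitioned window moves all rows to a single partition, so it is only reasonable for small data. The column name `score`, the sample data, and the helper names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, monotonically_increasing_id, row_number}

object SequenceNumberExample {
  // Unique, increasing IDs; values may be non-consecutive across partitions.
  def withUniqueId(input: DataFrame): DataFrame =
    input.withColumn("idx", monotonically_increasing_id())

  // Consecutive numbering starting at 1, ordered by the given column.
  // The window has no partitioning, so all rows are processed in one partition.
  def withRowNumber(input: DataFrame, orderCol: String): DataFrame =
    input.withColumn("idx", row_number().over(Window.orderBy(col(orderCol))))

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder().master("local[*]").appName("sequence-number").getOrCreate()
    import ss.implicits._
    // Made-up sample data spread over two partitions, for illustration only.
    val input = Seq(5, 15, 25, 35).toDF("score").repartition(2)
    withUniqueId(input).show()
    withRowNumber(input, "score").show()
    ss.stop()
  }
}
```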