Adding a Column to a Spark DataFrame
2022-07-06 00:28:00 【The south wind knows what I mean】
Contents
- Method 1: Use createDataFrame; the new column is built while constructing the RDD and schema
- Method 2: Use withColumn; the logic for the new column lives in a UDF
- Method 3: Use SQL; the logic for the new column is written directly in the SQL statement
- Method 4: The three methods above add a computed flag column; to add a unique sequence number instead, use monotonically_increasing_id
Method 1: Use createDataFrame; the new column is built while constructing the RDD and schema
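import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StringType
val trdd = input.select(targetColumns).rdd.map { x =>
  val value = x.get(0).toString.toDouble
  // "F" when the value falls outside [critValueL, critValueR], otherwise "T"
  if (value > critValueR || value < critValueL) Row(value, "F") else Row(value, "T")
}
// Append a nullable string column "flag" to the selected column's schema
// (each Row holds a Double, so the selected column is assumed to be DoubleType)
val schema = input.select(targetColumns).schema.add("flag", StringType, true)
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")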
Method 2: Use withColumn; the logic for the new column lives in a UDF
import org.apache.spark.sql.functions.udf
// The UDF takes an Int, so the target column is assumed to be an integer column
val code: Int => String = (arg: Int) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns).withColumn("flag", addCol(input(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")
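The same flag rule can also be written without a UDF, using Spark's built-in when/otherwise column functions. The sketch below is an alternative to the UDF above, not the author's code: the object name FlagColumnSketch, the helper addFlag, the local session, and the sample values are all hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object FlagColumnSketch {
  // Hypothetical helper: flags values outside [lo, hi] with "F", all others with "T".
  def addFlag(df: DataFrame, column: String, lo: Double, hi: Double): DataFrame =
    df.select(column)
      .withColumn("flag", when(col(column) > hi || col(column) < lo, "F").otherwise("T"))
      .withColumnRenamed(column, "idx")

  def main(args: Array[String]): Unit = {
    // Local session and sample data are assumptions made for this demo.
    val spark = SparkSession.builder().master("local[*]").appName("flag-sketch").getOrCreate()
    import spark.implicits._
    val input = Seq(1, 5, 12).toDF("value")
    addFlag(input, "value", lo = 2, hi = 10).show()
    spark.stop()
  }
}
```

Because when/otherwise is a native column expression, Spark can optimize it directly, which a UDF usually prevents. Null handling may differ from the UDF version, so check it if the column can contain nulls.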
Method 3: Use SQL; the logic for the new column is written directly in the SQL statement
input.select(targetColumns).createOrReplaceTempView("tmp")
val sample3 = ss.sql(
  s"select distinct $targetColumns as idx, " +
  s"case when $targetColumns > $critValueR then 'F' " +
  s"when $targetColumns < $critValueL then 'F' else 'T' end as flag from tmp")
Method 4: The three methods above add a computed flag column; to add a unique sequence number instead, use monotonically_increasing_id
// Method 4: add a sequence-number column
import org.apache.spark.sql.functions.monotonically_increasing_id
// The generated IDs are unique and increasing, but not guaranteed to be consecutive
val inputnew = input.withColumn("idx", monotonically_increasing_id())
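monotonically_increasing_id encodes the partition ID in the upper bits of each value, so the numbers can jump between partitions. If a gap-free index (0, 1, 2, ...) is needed, one option is RDD.zipWithIndex. The sketch below is a different technique from the one shown above, offered under assumptions: the object name ConsecutiveIndexSketch, the helper withConsecutiveIndex, and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField}

object ConsecutiveIndexSketch {
  // Hypothetical helper: appends a 0-based, gap-free index column using RDD.zipWithIndex.
  def withConsecutiveIndex(df: DataFrame, name: String = "idx"): DataFrame = {
    // Pair every row with its position, then append that position as the last field.
    val indexed = df.rdd.zipWithIndex.map { case (row, i) => Row.fromSeq(row.toSeq :+ i) }
    val schema = df.schema.add(StructField(name, LongType, nullable = false))
    df.sparkSession.createDataFrame(indexed, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session and sample data are assumptions made for this demo.
    val spark = SparkSession.builder().master("local[*]").appName("index-sketch").getOrCreate()
    import spark.implicits._
    val input = Seq("a", "b", "c", "d").toDF("value").repartition(3)
    withConsecutiveIndex(input).show()
    spark.stop()
  }
}
```

zipWithIndex triggers an extra Spark job when the RDD has more than one partition, so monotonically_increasing_id remains the cheaper choice when uniqueness alone is enough.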