Adding a Column to a Spark DataFrame
2022-07-06 00:28:00 【The south wind knows what I mean】
Contents
- Method 1: Use createDataFrame; the new column is produced while building the RDD and the schema
- Method 2: Use withColumn; the new column is produced by a UDF
- Method 3: Use SQL; the new column is written directly in the SQL statement
- Method 4: The three methods above add a column computed from a condition. To add a unique sequence number instead, use monotonically_increasing_id
Method 1: Use createDataFrame; the new column is produced while building the RDD and the schema
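import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StringType

// Build an RDD of (value, flag) rows; values outside [critValueL, critValueR] are flagged "F".
val trdd = input.select(targetColumns).rdd.map { x =>
  val value = x.get(0).toString.toDouble
  if (value > critValueR || value < critValueL) Row(value, "F") else Row(value, "T")
}
// Reuse the schema of the selected column and append the new string column.
// The rows hold Double values, so the source column is assumed to be DoubleType.
val schema = input.select(targetColumns).schema.add("flag", StringType, true)
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")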
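The snippet above relies on variables defined elsewhere (input, targetColumns, critValueL, critValueR, ss). As a rough self-contained sketch of the same createDataFrame approach, the example below declares the schema explicitly so that the column type always matches the Double values placed in each Row. The object name, the helper name addFlag, the sample data and the thresholds are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// Hypothetical helper object; names and sample values are assumptions for illustration.
object FlagWithCreateDataFrame {

  // Returns distinct (idx, flag) rows, where flag is "F" for values outside [low, high].
  // Assumes the column contains no nulls and that its values parse as Double.
  def addFlag(spark: SparkSession, input: DataFrame, column: String,
              low: Double, high: Double): DataFrame = {
    val rows = input.select(column).rdd.map { row =>
      val value = row.get(0).toString.toDouble
      Row(value, if (value > high || value < low) "F" else "T")
    }
    // Explicit schema: the first field is DoubleType because each Row holds a Double.
    val schema = StructType(Seq(
      StructField("idx", DoubleType, nullable = true),
      StructField("flag", StringType, nullable = true)
    ))
    spark.createDataFrame(rows, schema).distinct()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("flag-rdd").getOrCreate()
    import spark.implicits._
    val input = Seq(1.0, 5.0, 5.0, 12.0).toDF("value")
    addFlag(spark, input, "value", low = 2.0, high = 10.0).show()
    spark.stop()
  }
}
```

Declaring the schema by hand avoids a mismatch when the source column is not already DoubleType, at the cost of no longer inheriting the original column's metadata.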
Method 2: Use withColumn; the new column is produced by a UDF
import org.apache.spark.sql.functions.udf

// UDF that returns "F" for values outside [critValueL, critValueR] and "T" otherwise.
val code: Int => String = (arg: Int) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns).withColumn("flag", addCol(input(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")
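For a simple threshold check like this one, Spark's built-in when/otherwise column functions can express the same flag without a UDF. This is a different technique from the UDF shown above, offered only as a possible alternative. The object name, the helper name addFlag, the sample data and the thresholds are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper object; names and sample values are assumptions for illustration.
object FlagWithWhen {

  // Adds a "flag" column: "F" for values outside [low, high], "T" otherwise.
  // Rows where the column is null fall through to otherwise(...) and get "T".
  def addFlag(df: DataFrame, column: String, low: Double, high: Double): DataFrame =
    df.withColumn(
      "flag",
      when(col(column) > high || col(column) < low, "F").otherwise("T")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("flag-when").getOrCreate()
    import spark.implicits._
    val input = Seq(1.0, 5.0, 12.0).toDF("value")
    addFlag(input, "value", low = 2.0, high = 10.0)
      .withColumnRenamed("value", "idx")
      .show()
    spark.stop()
  }
}
```

Because the condition is an ordinary column expression rather than an opaque function, Spark can analyze it during query planning, which is usually the reason to prefer built-in functions when they are sufficient.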
Method 3: Use SQL; the new column is written directly in the SQL statement
// Register the selected column as a temporary view, then compute the flag in SQL.
input.select(targetColumns).createOrReplaceTempView("tmp")
val sample3 = ss.sql("select distinct " + targetColumns +
  " as idx, case when " + targetColumns + " > " + critValueR + " then 'F'" +
  " when " + targetColumns + " < " + critValueL + " then 'F' else 'T' end as flag from tmp")
Method 4: The three methods above add a column computed from a condition. To add a unique sequence number instead, use monotonically_increasing_id
// Method 4: add a sequence-number column
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())
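Note that monotonically_increasing_id produces values that are unique and increasing but not consecutive, because the partition ID is encoded in the upper bits of each value. If consecutive numbers 1..n are needed, one possible approach is to apply row_number over a window ordered by that ID. The sketch below shows both variants; the object name, helper names, the temporary column name _tmp_id and the sample data are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

// Hypothetical helper object; names and sample values are assumptions for illustration.
object SequenceNumbers {

  // Unique, increasing IDs; gaps between partitions are expected.
  def withUniqueId(df: DataFrame): DataFrame =
    df.withColumn("idx", monotonically_increasing_id())

  // Consecutive numbers 1..n. The window has no partitioning, so Spark moves
  // all rows to a single partition; this is only reasonable for modest data sizes.
  def withConsecutiveId(df: DataFrame): DataFrame = {
    val byTmpId = Window.orderBy("_tmp_id")
    df.withColumn("_tmp_id", monotonically_increasing_id())
      .withColumn("idx", row_number().over(byTmpId))
      .drop("_tmp_id")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("seq-ids").getOrCreate()
    import spark.implicits._
    val input = Seq("a", "b", "c", "d").toDF("name").repartition(2)
    withUniqueId(input).show()
    withConsecutiveId(input).show()
    spark.stop()
  }
}
```

The first variant scales to large inputs because no shuffle is needed; the second trades that scalability for gap-free numbering.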