Adding a column to a Spark DataFrame
2022-07-06 00:28:00 【The south wind knows what I mean】
Contents
- Method 1: use createDataFrame, adding the new column while building the RDD and the schema
- Method 2: use withColumn, with the new column computed by a UDF
- Method 3: use SQL, with the new column computed directly in the query
- Method 4: the three methods above add a computed flag column; to add a unique ID column instead, use monotonically_increasing_id
Method 1: use createDataFrame, adding the new column while building the RDD and the schema
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}
// Build an RDD[Row] that holds each value together with its computed flag
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  if (v > critValueR || v < critValueL) Row(v, "F") else Row(v, "T")
}
// The rows hold Doubles, so the schema declares DoubleType regardless of the input column's type
val schema = StructType(Seq(
  StructField(targetColumns, DoubleType, true),
  StructField("flag", StringType, true)))
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")
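The snippet above relies on `ss`, `input`, `targetColumns`, `critValueL` and `critValueR` being defined elsewhere. Below is a minimal sketch of how Method 1 might run end to end. The object and function names, the local SparkSession, the column name `value`, the sample rows and the thresholds are all assumptions made for illustration. The schema is declared explicitly with DoubleType because the rows are built from Doubles.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object CreateDataFrameSketch {
  // Rebuilds the DataFrame from an RDD[Row] that already carries the flag.
  // Values outside [critValueL, critValueR] get "F", all others "T".
  def addFlag(ss: SparkSession, input: DataFrame, targetColumns: String,
              critValueL: Double, critValueR: Double): DataFrame = {
    val trdd = input.select(targetColumns).rdd.map { x =>
      val v = x.get(0).toString.toDouble
      Row(v, if (v > critValueR || v < critValueL) "F" else "T")
    }
    val schema = StructType(Seq(
      StructField(targetColumns, DoubleType, nullable = true),
      StructField("flag", StringType, nullable = true)))
    ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical local session and toy data, for illustration only.
    val ss = SparkSession.builder().master("local[*]").appName("add-column-sketch").getOrCreate()
    import ss.implicits._
    val input = Seq(1.0, 5.0, 9.0).toDF("value")
    addFlag(ss, input, "value", critValueL = 2.0, critValueR = 8.0).show()
    ss.stop()
  }
}
```

Wrapping the logic in a function keeps the thresholds and column name as explicit parameters, which makes the same code easy to reuse on other columns.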
Method 2: use withColumn, with the new column computed by a UDF
import org.apache.spark.sql.functions.udf
// UDF that flags values outside [critValueL, critValueR] as "F" and all others as "T"
val code: Int => String = (arg: Int) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns)
  .withColumn("flag", addCol(input(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")
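For a simple threshold test like this one, the same flag can probably be expressed with Spark's built-in `when` / `otherwise` column functions instead of a UDF, which lets the optimizer see the condition. This is a different technique from the original article's, shown only as a sketch; the object and function names and the Double-typed thresholds are assumptions.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, when}

object WhenOtherwiseSketch {
  // Column expression: "F" when the value lies outside [critValueL, critValueR], else "T".
  // A null value makes the condition null, so it falls through to the otherwise branch.
  def flagColumn(name: String, critValueL: Double, critValueR: Double): Column =
    when(col(name) > critValueR || col(name) < critValueL, "F").otherwise("T")

  // Same shape of result as Method 2: the target column renamed to idx, plus flag.
  def addFlag(input: DataFrame, targetColumns: String,
              critValueL: Double, critValueR: Double): DataFrame =
    input.select(targetColumns)
      .withColumn("flag", flagColumn(targetColumns, critValueL, critValueR))
      .withColumnRenamed(targetColumns, "idx")
}
```

A call such as `WhenOtherwiseSketch.addFlag(input, targetColumns, critValueL, critValueR)` would stand in for the `udf`-based lines.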
Method 3: use SQL, with the new column computed directly in the query
input.select(targetColumns).createOrReplaceTempView("tmp")
// Values outside [critValueL, critValueR] are flagged 'F', all others 'T'
val sample3 = ss.sql(
  s"select distinct $targetColumns as idx, " +
  s"case when $targetColumns > $critValueR then 'F' " +
  s"when $targetColumns < $critValueL then 'F' else 'T' end as flag from tmp")
Method 4: the three methods above add a computed flag column; to add a unique ID column instead, use monotonically_increasing_id
// Add a unique ID column. The IDs are unique and increasing, but not necessarily consecutive.
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())
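The IDs produced by `monotonically_increasing_id` are unique and increasing but can have gaps between partitions. If consecutive numbers (1, 2, 3, ...) are needed, one possible approach is to rank the rows with `row_number` over a window ordered by that ID. This goes beyond the original article; the object and function names are assumptions, and it is a sketch rather than a recommendation for large data.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

object SequentialIdSketch {
  // Adds a consecutive, 1-based sequence number in column idCol.
  def withSequentialId(input: DataFrame, idCol: String = "idx"): DataFrame = {
    // A window without partitionBy moves every row into a single partition,
    // so this only suits data that fits comfortably on one executor.
    val w = Window.orderBy(monotonically_increasing_id())
    input.withColumn(idCol, row_number().over(w))
  }
}
```

The trade-off is between gap-free numbering and parallelism: `monotonically_increasing_id` alone needs no shuffle, whereas the window version gives up parallelism to guarantee consecutive values.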