当前位置:网站首页>Spark SQL空值Null,NaN判断和处理
Spark SQL空值Null,NaN判断和处理
2022-07-06 00:23:00 【南风知我意丿】
Spark SQL空值Null,NaN判断和处理
Null 和 NaN
null表示无、不存在或无效的对象或地址引用。它在简单的数学运算中会转换为0,它是一个全局对象。null ==false返回的值是false。
undefined是一个全局属性,原始值undefined。它告诉我们有些东西没有赋值,没有定义。undefined不能转换成任何数字,因此在数学计算中使用它,返回的是NaN。
val d: Double = math.sqrt(-1.0)
println(d)
val n: Boolean = math.sqrt(-1.0).isNaN
println(n)

Spark SQL空值Null,NaN判断和处理
val df: DataFrame = session.sql(
s""" |select * from sparktuning.course_pay1 |""".stripMargin)
// 删除所有列的空值和NaN
val resNull=data1.na.drop()
resNull.limit(10).show()
+-------+------+---+------------+--------+-------------+---------+----------+------+
|affairs|gender|age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+---+------------+--------+-------------+---------+----------+------+
| 0| male| 37| 10| no| 3| 18| 7| 4|
| 0| male| 57| 15| yes| 2| 14| 4| 4|
| 0|female| 32| 15| yes| 4| 16| 1| 2|
| 0| male| 22| 1.5| no| 4| 14| 4| 5|
| 0| male| 37| 15| yes| 2| 20| 7| 2|
| 0| male| 27| 4| yes| 4| 18| 6| 4|
| 0| male| 47| 15| yes| 5| 17| 6| 4|
| 0|female| 22| 1.5| no| 2| 17| 5| 4|
| 0|female| 27| 4| no| 4| 14| 5| 4|
| 0|female| 37| 15| yes| 1| 17| 5| 5|
+-------+------+---+------------+--------+-------------+---------+----------+------+
//删除某列的空值和NaN
val res=data1.na.drop(Array("gender","yearsmarried"))
// 删除某列的非空且非NaN的低于10的 --注意字段类型
data1.na.drop(10,Array("gender","yearsmarried"))
//填充所有空值[Boolean]的列 --注意字段类型
df.na.fill(false,Array("courseid"))
//填充所有空值的列
val res123=data1.na.fill("wangxiao123")
res123.limit(10).show()
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
|affairs| gender|age|yearsmarried|children|religiousness|education|occupation| rating|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
| 0| male| 37| 10| no| 3| 18| 7| 4|
| 0|wangxiao123| 27| wangxiao123| no| 4| 14| 6|wangxiao123|
| 0|wangxiao123| 32| wangxiao123| yes| 1| 12| 1|wangxiao123|
| 0|wangxiao123| 57| wangxiao123| yes| 5| 18| 6|wangxiao123|
| 0|wangxiao123| 22| wangxiao123| no| 2| 17| 6|wangxiao123|
| 0|wangxiao123| 32| wangxiao123| no| 2| 17| 5|wangxiao123|
| 0| female| 22| wangxiao123| no| 2| 12| 1|wangxiao123|
| 0| male| 57| 15| yes| 2| 14| 4| 4|
| 0| female| 32| 15| yes| 4| 16| 1| 2|
| 0| male| 22| 1.5| no| 4| 14| 4| 5|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
//对指定列的控制进行填充 -- 多列同值
df1.na.fill(123456,cols = Array("courseid","pointlistid")).show(false)
+---------+--------+-------+-----+-----------+--------+----+
|chapterid|courseid|majorid|money|pointlistid|dt |dn |
+---------+--------+-------+-----+-----------+--------+----+
|4 |123456 |5 |100 |3 |20190722|webA|
|7 |123456 |7 |100 |1 |20190722|webA|
|8 |123456 |3 | |8 |20190722|webA|
|5 |14 |3 |100 |123456 |20190722|webA|
|4 |15 |2 |100 |3 |20190722|webA|
|9 |123456 |8 |100 |7 |20190722|webA|
|7 |17 |7 |100 |123456 |20190722|webA|
|0 |18 |9 | |7 |20190722|webA|
|5 |123456 |8 |100 |4 |20190722|webA|
|4 |20 |1 |100 |123456 |20190722|webA|
|4 |123456 |5 |100 |1 |20190722|webA|
|0 |22 |3 |100 |9 |20190722|webA|
|1 |123456 |8 |100 |0 |20190722|webA|
|4 |24 |0 |100 |5 |20190722|webA|
|9 |123456 |9 |100 |0 |20190722|webA|
+---------+--------+-------+-----+-----------+--------+----+
//对指定列的控制进行填充 -- 多列不同值
df1.na.fill(Map("courseid"->123456,"pointlistid"->654321)).show(false)
+---------+--------+-------+-----+-----------+--------+----+
|chapterid|courseid|majorid|money|pointlistid|dt |dn |
+---------+--------+-------+-----+-----------+--------+----+
|4 |123456 |5 |100 |3 |20190722|webA|
|7 |123456 |7 |100 |1 |20190722|webA|
|8 |123456 |3 | |8 |20190722|webA|
|5 |14 |3 |100 |654321 |20190722|webA|
|4 |15 |2 |100 |3 |20190722|webA|
|9 |123456 |8 |100 |7 |20190722|webA|
|7 |17 |7 |100 |654321 |20190722|webA|
|0 |18 |9 | |7 |20190722|webA|
|5 |123456 |8 |100 |4 |20190722|webA|
|4 |20 |1 |100 |654321 |20190722|webA|
|4 |123456 |5 |100 |1 |20190722|webA|
|0 |22 |3 |100 |9 |20190722|webA|
|1 |123456 |8 |100 |0 |20190722|webA|
|4 |24 |0 |100 |5 |20190722|webA|
|9 |123456 |9 |100 |0 |20190722|webA|
+---------+--------+-------+-----+-----------+--------+----+
//查询空值列
data1.filter("gender is null").select("gender").limit(10).show
+------+
|gender|
+------+
| null|
| null|
| null|
| null|
| null|
+------+
data1.filter("gender is not null").select("gender").limit(10).show
+------+
|gender|
+------+
| male|
|female|
| male|
|female|
| male|
| male|
| male|
| male|
|female|
|female|
+------+
data1.filter( data1("gender").isNull ).select("gender").limit(10).show
+------+
|gender|
+------+
| null|
| null|
| null|
| null|
| null|
+------+
data1.filter("gender<>''").select("gender").limit(10).show
+------+
|gender|
+------+
| male|
|female|
| male|
|female|
| male|
| male|
| male|
| male|
|female|
|female|
+------+
边栏推荐
- Notepad + + regular expression replace String
- FFMPEG关键结构体——AVCodecContext
- Codeforces round 804 (Div. 2) [competition record]
- Global and Chinese markets for pressure and temperature sensors 2022-2028: Research Report on technology, participants, trends, market size and share
- [online chat] the original wechat applet can also reply to Facebook homepage messages!
- Data analysis thinking analysis methods and business knowledge - analysis methods (III)
- Atcoder beginer contest 258 [competition record]
- FFT learning notes (I think it is detailed)
- DEJA_ Vu3d - cesium feature set 055 - summary description of map service addresses of domestic and foreign manufacturers
- Single merchant v4.4 has the same original intention and strength!
猜你喜欢

Detailed explanation of APP functions of door-to-door appointment service

Basic introduction and source code analysis of webrtc threads

MySql——CRUD
![Atcoder beginer contest 254 [VP record]](/img/13/656468eb76bb8b6ea3b6465a56031d.png)
Atcoder beginer contest 254 [VP record]

Classical concurrency problem: the dining problem of philosophers

如何解决ecology9.0执行导入流程流程产生的问题

电机的简介

Teach you to run uni app with simulator on hbuilderx, conscience teaching!!!

常用API类及异常体系

FFMPEG关键结构体——AVFrame
随机推荐
Detailed explanation of APP functions of door-to-door appointment service
《编程之美》读书笔记
[EI conference sharing] the Third International Conference on intelligent manufacturing and automation frontier in 2022 (cfima 2022)
小程序技术优势与产业互联网相结合的分析
如何利用Flutter框架开发运行小程序
Key structure of ffmpeg - avformatcontext
How much do you know about the bank deposit business that software test engineers must know?
从底层结构开始学习FPGA----FIFO IP核及其关键参数介绍
【NOI模拟赛】Anaid 的树(莫比乌斯反演,指数型生成函数,埃氏筛,虚树)
什么叫做信息安全?包含哪些内容?与网络安全有什么区别?
PHP determines whether an array contains the value of another array
2022-02-13 work record -- PHP parsing rich text
Notepad++ regular expression replacement string
Single merchant v4.4 has the same original intention and strength!
Location based mobile terminal network video exploration app system documents + foreign language translation and original text + guidance records (8 weeks) + PPT + review + project source code
Hudi of data Lake (2): Hudi compilation
Start from the bottom structure and learn the introduction of fpga---fifo IP core and its key parameters
Solve the problem of reading Chinese garbled code in sqlserver connection database
OS i/o devices and device controllers
数据分析思维分析方法和业务知识——分析方法(三)