当前位置:网站首页>Spark SQL空值Null,NaN判断和处理
Spark SQL空值Null,NaN判断和处理
2022-07-25 15:10:00 【南风知我意丿】
Spark SQL空值Null,NaN判断和处理
Null 和 NaN
null表示无、不存在或无效的对象或地址引用。它在简单的数学运算中会转换为0,它是一个全局对象。null ==false返回的值是false。
undefined是一个全局属性,原始值undefined。它告诉我们有些东西没有赋值,没有定义。undefined不能转换成任何数字,因此在数学计算中使用它,返回的是NaN。
val d: Double = math.sqrt(-1.0)
println(d)
val n: Boolean = math.sqrt(-1.0).isNaN
println(n)

Spark SQL空值Null,NaN判断和处理
val df: DataFrame = session.sql(
s""" |select * from sparktuning.course_pay1 |""".stripMargin)
// 删除所有列的空值和NaN
val resNull=data1.na.drop()
resNull.limit(10).show()
+-------+------+---+------------+--------+-------------+---------+----------+------+
|affairs|gender|age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+---+------------+--------+-------------+---------+----------+------+
| 0| male| 37| 10| no| 3| 18| 7| 4|
| 0| male| 57| 15| yes| 2| 14| 4| 4|
| 0|female| 32| 15| yes| 4| 16| 1| 2|
| 0| male| 22| 1.5| no| 4| 14| 4| 5|
| 0| male| 37| 15| yes| 2| 20| 7| 2|
| 0| male| 27| 4| yes| 4| 18| 6| 4|
| 0| male| 47| 15| yes| 5| 17| 6| 4|
| 0|female| 22| 1.5| no| 2| 17| 5| 4|
| 0|female| 27| 4| no| 4| 14| 5| 4|
| 0|female| 37| 15| yes| 1| 17| 5| 5|
+-------+------+---+------------+--------+-------------+---------+----------+------+
//删除某列的空值和NaN
val res=data1.na.drop(Array("gender","yearsmarried"))
// 删除某列的非空且非NaN的低于10的 --注意字段类型
data1.na.drop(10,Array("gender","yearsmarried"))
//填充所有空值[Boolean]的列 --注意字段类型
df.na.fill(false,Array("courseid"))
//填充所有空值的列
val res123=data1.na.fill("wangxiao123")
res123.limit(10).show()
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
|affairs| gender|age|yearsmarried|children|religiousness|education|occupation| rating|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
| 0| male| 37| 10| no| 3| 18| 7| 4|
| 0|wangxiao123| 27| wangxiao123| no| 4| 14| 6|wangxiao123|
| 0|wangxiao123| 32| wangxiao123| yes| 1| 12| 1|wangxiao123|
| 0|wangxiao123| 57| wangxiao123| yes| 5| 18| 6|wangxiao123|
| 0|wangxiao123| 22| wangxiao123| no| 2| 17| 6|wangxiao123|
| 0|wangxiao123| 32| wangxiao123| no| 2| 17| 5|wangxiao123|
| 0| female| 22| wangxiao123| no| 2| 12| 1|wangxiao123|
| 0| male| 57| 15| yes| 2| 14| 4| 4|
| 0| female| 32| 15| yes| 4| 16| 1| 2|
| 0| male| 22| 1.5| no| 4| 14| 4| 5|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
//对指定列的控制进行填充 -- 多列同值
df1.na.fill(123456,cols = Array("courseid","pointlistid")).show(false)
+---------+--------+-------+-----+-----------+--------+----+
|chapterid|courseid|majorid|money|pointlistid|dt |dn |
+---------+--------+-------+-----+-----------+--------+----+
|4 |123456 |5 |100 |3 |20190722|webA|
|7 |123456 |7 |100 |1 |20190722|webA|
|8 |123456 |3 | |8 |20190722|webA|
|5 |14 |3 |100 |123456 |20190722|webA|
|4 |15 |2 |100 |3 |20190722|webA|
|9 |123456 |8 |100 |7 |20190722|webA|
|7 |17 |7 |100 |123456 |20190722|webA|
|0 |18 |9 | |7 |20190722|webA|
|5 |123456 |8 |100 |4 |20190722|webA|
|4 |20 |1 |100 |123456 |20190722|webA|
|4 |123456 |5 |100 |1 |20190722|webA|
|0 |22 |3 |100 |9 |20190722|webA|
|1 |123456 |8 |100 |0 |20190722|webA|
|4 |24 |0 |100 |5 |20190722|webA|
|9 |123456 |9 |100 |0 |20190722|webA|
+---------+--------+-------+-----+-----------+--------+----+
//对指定列的控制进行填充 -- 多列不同值
df1.na.fill(Map("courseid"->123456,"pointlistid"->654321)).show(false)
+---------+--------+-------+-----+-----------+--------+----+
|chapterid|courseid|majorid|money|pointlistid|dt |dn |
+---------+--------+-------+-----+-----------+--------+----+
|4 |123456 |5 |100 |3 |20190722|webA|
|7 |123456 |7 |100 |1 |20190722|webA|
|8 |123456 |3 | |8 |20190722|webA|
|5 |14 |3 |100 |654321 |20190722|webA|
|4 |15 |2 |100 |3 |20190722|webA|
|9 |123456 |8 |100 |7 |20190722|webA|
|7 |17 |7 |100 |654321 |20190722|webA|
|0 |18 |9 | |7 |20190722|webA|
|5 |123456 |8 |100 |4 |20190722|webA|
|4 |20 |1 |100 |654321 |20190722|webA|
|4 |123456 |5 |100 |1 |20190722|webA|
|0 |22 |3 |100 |9 |20190722|webA|
|1 |123456 |8 |100 |0 |20190722|webA|
|4 |24 |0 |100 |5 |20190722|webA|
|9 |123456 |9 |100 |0 |20190722|webA|
+---------+--------+-------+-----+-----------+--------+----+
//查询空值列
data1.filter("gender is null").select("gender").limit(10).show
+------+
|gender|
+------+
| null|
| null|
| null|
| null|
| null|
+------+
data1.filter("gender is not null").select("gender").limit(10).show
+------+
|gender|
+------+
| male|
|female|
| male|
|female|
| male|
| male|
| male|
| male|
|female|
|female|
+------+
data1.filter( data1("gender").isNull ).select("gender").limit(10).show
+------+
|gender|
+------+
| null|
| null|
| null|
| null|
| null|
+------+
data1.filter("gender<>''").select("gender").limit(10).show
+------+
|gender|
+------+
| male|
|female|
| male|
|female|
| male|
| male|
| male|
| male|
|female|
|female|
+------+
边栏推荐
- Spark002---spark任务提交,传入json作为参数
- 如何解决Visual Stuido2019 30天体验期过后的登陆问题
- MFC 线程AfxBeginThread基本用法,传多个参数
- As methods for viewing and excluding dependencies
- 35 quick format code
- 37 element mode (inline element, block element, inline block element)
- Stored procedure bias of SQL to LINQ
- 32 use of chrome debugging tools
- Melody + realsense d435i configuration and error resolution
- IP address classification, which determines whether a network segment is a subnet supernetwork
猜你喜欢

MySQL之事务与MVCC

6线SPI传输模式探索

Nacos2.1.0 cluster construction

outline和box-shadow实现外轮廓圆角高光效果

《三子棋》C语言数组应用 --n皇后问题雏形

"How to use" decorator mode

System.AccessViolationException: 尝试读取或写入受保护的内存。这通常指示其他内存已损坏

Yarn: the file yarn.ps1 cannot be loaded because running scripts is prohibited on this system.

【微信小程序】小程序宿主环境详解

45padding won't open the box
随机推荐
oracle_ 12505 error resolution
6线SPI传输模式探索
MeanShift聚类-01原理分析
Bridge NF call ip6tables is an unknown key exception handling
转载----如何阅读代码?
pkg_resources动态加载插件
LeetCode_ String_ Medium_ 151. Reverse the words in the string
Unable to start web server when Nacos starts
I2C device driver hierarchy
27 选择器的分类
Promise对象与宏任务、微任务
Detailed explanation of lio-sam operation process and code
How to realize a correct double check locking
VS2010 add WAP mobile form template
BigDecimal rounds the data
43 box model
Spark002---spark任务提交,传入json作为参数
[C题目]力扣876. 链表的中间结点
The implementation process of inheritance and the difference between Es5 and ES6 implementation
Pl/sql creates and executes ORALCE stored procedures and returns the result set