当前位置:网站首页>Spark SQL null value, Nan judgment and processing
Spark SQL null value, Nan judgment and processing
2022-07-06 00:28:00 【The south wind knows what I mean】
Spark SQL Null value Null,NaN Judge and deal with
Null and NaN
null It means nothing 、 Nonexistent or invalid object or address reference . It can be converted into 0, It's a global object .null ==false The value returned is false.
undefined It's a global property , Original value undefined. It tells us that some things have no assignment , No definition .undefined Cannot convert to any number , So use it in mathematical calculations , The return is NaN.
val d: Double = math.sqrt(-1.0)
println(d)
val n: Boolean = math.sqrt(-1.0).isNaN
println(n)
Spark SQL Null value Null,NaN Judge and deal with
val df: DataFrame = session.sql(
s""" |select * from sparktuning.course_pay1 |""".stripMargin)
// Delete null values and... For all columns NaN
val resNull=data1.na.drop()
resNull.limit(10).show()
+-------+------+---+------------+--------+-------------+---------+----------+------+
|affairs|gender|age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+---+------------+--------+-------------+---------+----------+------+
| 0| male| 37| 10| no| 3| 18| 7| 4|
| 0| male| 57| 15| yes| 2| 14| 4| 4|
| 0|female| 32| 15| yes| 4| 16| 1| 2|
| 0| male| 22| 1.5| no| 4| 14| 4| 5|
| 0| male| 37| 15| yes| 2| 20| 7| 2|
| 0| male| 27| 4| yes| 4| 18| 6| 4|
| 0| male| 47| 15| yes| 5| 17| 6| 4|
| 0|female| 22| 1.5| no| 2| 17| 5| 4|
| 0|female| 27| 4| no| 4| 14| 5| 4|
| 0|female| 37| 15| yes| 1| 17| 5| 5|
+-------+------+---+------------+--------+-------------+---------+----------+------+
// Delete the null value and... Of a column NaN
val res=data1.na.drop(Array("gender","yearsmarried"))
// Delete a column that is not empty and not NaN Below 10 Of -- Note the field type
data1.na.drop(10,Array("gender","yearsmarried"))
// Fill in all null values [Boolean] The column of -- Note the field type
df.na.fill(false,Array("courseid"))
// Fill in all null Columns
val res123=data1.na.fill("wangxiao123")
res123.limit(10).show()
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
|affairs| gender|age|yearsmarried|children|religiousness|education|occupation| rating|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
| 0| male| 37| 10| no| 3| 18| 7| 4|
| 0|wangxiao123| 27| wangxiao123| no| 4| 14| 6|wangxiao123|
| 0|wangxiao123| 32| wangxiao123| yes| 1| 12| 1|wangxiao123|
| 0|wangxiao123| 57| wangxiao123| yes| 5| 18| 6|wangxiao123|
| 0|wangxiao123| 22| wangxiao123| no| 2| 17| 6|wangxiao123|
| 0|wangxiao123| 32| wangxiao123| no| 2| 17| 5|wangxiao123|
| 0| female| 22| wangxiao123| no| 2| 12| 1|wangxiao123|
| 0| male| 57| 15| yes| 2| 14| 4| 4|
| 0| female| 32| 15| yes| 4| 16| 1| 2|
| 0| male| 22| 1.5| no| 4| 14| 4| 5|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
// Fill in the control of the specified column -- Multiple columns with the same value
df1.na.fill(123456,cols = Array("courseid","pointlistid")).show(false)
+---------+--------+-------+-----+-----------+--------+----+
|chapterid|courseid|majorid|money|pointlistid|dt |dn |
+---------+--------+-------+-----+-----------+--------+----+
|4 |123456 |5 |100 |3 |20190722|webA|
|7 |123456 |7 |100 |1 |20190722|webA|
|8 |123456 |3 | |8 |20190722|webA|
|5 |14 |3 |100 |123456 |20190722|webA|
|4 |15 |2 |100 |3 |20190722|webA|
|9 |123456 |8 |100 |7 |20190722|webA|
|7 |17 |7 |100 |123456 |20190722|webA|
|0 |18 |9 | |7 |20190722|webA|
|5 |123456 |8 |100 |4 |20190722|webA|
|4 |20 |1 |100 |123456 |20190722|webA|
|4 |123456 |5 |100 |1 |20190722|webA|
|0 |22 |3 |100 |9 |20190722|webA|
|1 |123456 |8 |100 |0 |20190722|webA|
|4 |24 |0 |100 |5 |20190722|webA|
|9 |123456 |9 |100 |0 |20190722|webA|
+---------+--------+-------+-----+-----------+--------+----+
// Fill in the control of the specified column -- Multiple columns of different values
df1.na.fill(Map("courseid"->123456,"pointlistid"->654321)).show(false)
+---------+--------+-------+-----+-----------+--------+----+
|chapterid|courseid|majorid|money|pointlistid|dt |dn |
+---------+--------+-------+-----+-----------+--------+----+
|4 |123456 |5 |100 |3 |20190722|webA|
|7 |123456 |7 |100 |1 |20190722|webA|
|8 |123456 |3 | |8 |20190722|webA|
|5 |14 |3 |100 |654321 |20190722|webA|
|4 |15 |2 |100 |3 |20190722|webA|
|9 |123456 |8 |100 |7 |20190722|webA|
|7 |17 |7 |100 |654321 |20190722|webA|
|0 |18 |9 | |7 |20190722|webA|
|5 |123456 |8 |100 |4 |20190722|webA|
|4 |20 |1 |100 |654321 |20190722|webA|
|4 |123456 |5 |100 |1 |20190722|webA|
|0 |22 |3 |100 |9 |20190722|webA|
|1 |123456 |8 |100 |0 |20190722|webA|
|4 |24 |0 |100 |5 |20190722|webA|
|9 |123456 |9 |100 |0 |20190722|webA|
+---------+--------+-------+-----+-----------+--------+----+
// Query null column
data1.filter("gender is null").select("gender").limit(10).show
+------+
|gender|
+------+
| null|
| null|
| null|
| null|
| null|
+------+
data1.filter("gender is not null").select("gender").limit(10).show
+------+
|gender|
+------+
| male|
|female|
| male|
|female|
| male|
| male|
| male|
| male|
|female|
|female|
+------+
data1.filter( data1("gender").isNull ).select("gender").limit(10).show
+------+
|gender|
+------+
| null|
| null|
| null|
| null|
| null|
+------+
data1.filter("gender<>''").select("gender").limit(10).show
+------+
|gender|
+------+
| male|
|female|
| male|
|female|
| male|
| male|
| male|
| male|
|female|
|female|
+------+
边栏推荐
- 《编程之美》读书笔记
- [Chongqing Guangdong education] reference materials for Zhengzhou Vocational College of finance, taxation and finance to play around the E-era
- Knowledge about the memory size occupied by the structure
- XML配置文件
- 7.5 decorator
- 时间戳的拓展及应用实例
- Global and Chinese markets for pressure and temperature sensors 2022-2028: Research Report on technology, participants, trends, market size and share
- Global and Chinese markets of POM plastic gears 2022-2028: Research Report on technology, participants, trends, market size and share
- 【线上小工具】开发过程中会用到的线上小工具合集
- Shardingsphere source code analysis
猜你喜欢
LeetCode 1189. Maximum number of "balloons"
Priority queue (heap)
Key structure of ffmpeg - avframe
wx. Getlocation (object object) application method, latest version
Go learning --- structure to map[string]interface{}
Choose to pay tribute to the spirit behind continuous struggle -- Dialogue will values [Issue 4]
关于slmgr命令的那些事
电机的简介
Spark AQE
XML配置文件
随机推荐
DEJA_VU3D - Cesium功能集 之 055-国内外各厂商地图服务地址汇总说明
Global and Chinese market of water heater expansion tank 2022-2028: Research Report on technology, participants, trends, market size and share
关于slmgr命令的那些事
小程序技术优势与产业互联网相结合的分析
如何制作自己的机器人
QT -- thread
Leetcode Fibonacci sequence
Multithreading and high concurrency (8) -- summarize AQS shared lock from countdownlatch (punch in for the third anniversary)
提升工作效率工具:SQL批量生成工具思想
An understanding of & array names
MySQL functions
FFMPEG关键结构体——AVFrame
Model analysis of establishment time and holding time
XML Configuration File
[designmode] adapter pattern
anconda下载+添加清华+tensorflow 安装+No module named ‘tensorflow‘+KernelRestarter: restart failed,内核重启失败
《编程之美》读书笔记
Spark-SQL UDF函数
免费的聊天机器人API
NSSA area where OSPF is configured for Huawei equipment