当前位置:网站首页>Spark SQL null value, Nan judgment and processing
Spark SQL null value, Nan judgment and processing
2022-07-06 00:28:00 【The south wind knows what I mean】
Spark SQL Null value Null,NaN Judge and deal with
Null and NaN
null It means nothing 、 Nonexistent or invalid object or address reference . It can be converted into 0, It's a global object .null ==false The value returned is false.
undefined It's a global property , Original value undefined. It tells us that some things have no assignment , No definition .undefined Cannot convert to any number , So use it in mathematical calculations , The return is NaN.
val d: Double = math.sqrt(-1.0)
println(d)
val n: Boolean = math.sqrt(-1.0).isNaN
println(n)
Spark SQL Null value Null,NaN Judge and deal with
val df: DataFrame = session.sql(
s""" |select * from sparktuning.course_pay1 |""".stripMargin)
// Delete null values and... For all columns NaN
val resNull=data1.na.drop()
resNull.limit(10).show()
+-------+------+---+------------+--------+-------------+---------+----------+------+
|affairs|gender|age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+---+------------+--------+-------------+---------+----------+------+
| 0| male| 37| 10| no| 3| 18| 7| 4|
| 0| male| 57| 15| yes| 2| 14| 4| 4|
| 0|female| 32| 15| yes| 4| 16| 1| 2|
| 0| male| 22| 1.5| no| 4| 14| 4| 5|
| 0| male| 37| 15| yes| 2| 20| 7| 2|
| 0| male| 27| 4| yes| 4| 18| 6| 4|
| 0| male| 47| 15| yes| 5| 17| 6| 4|
| 0|female| 22| 1.5| no| 2| 17| 5| 4|
| 0|female| 27| 4| no| 4| 14| 5| 4|
| 0|female| 37| 15| yes| 1| 17| 5| 5|
+-------+------+---+------------+--------+-------------+---------+----------+------+
// Delete the null value and... Of a column NaN
val res=data1.na.drop(Array("gender","yearsmarried"))
// Delete a column that is not empty and not NaN Below 10 Of -- Note the field type
data1.na.drop(10,Array("gender","yearsmarried"))
// Fill in all null values [Boolean] The column of -- Note the field type
df.na.fill(false,Array("courseid"))
// Fill in all null Columns
val res123=data1.na.fill("wangxiao123")
res123.limit(10).show()
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
|affairs| gender|age|yearsmarried|children|religiousness|education|occupation| rating|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
| 0| male| 37| 10| no| 3| 18| 7| 4|
| 0|wangxiao123| 27| wangxiao123| no| 4| 14| 6|wangxiao123|
| 0|wangxiao123| 32| wangxiao123| yes| 1| 12| 1|wangxiao123|
| 0|wangxiao123| 57| wangxiao123| yes| 5| 18| 6|wangxiao123|
| 0|wangxiao123| 22| wangxiao123| no| 2| 17| 6|wangxiao123|
| 0|wangxiao123| 32| wangxiao123| no| 2| 17| 5|wangxiao123|
| 0| female| 22| wangxiao123| no| 2| 12| 1|wangxiao123|
| 0| male| 57| 15| yes| 2| 14| 4| 4|
| 0| female| 32| 15| yes| 4| 16| 1| 2|
| 0| male| 22| 1.5| no| 4| 14| 4| 5|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
// Fill in the control of the specified column -- Multiple columns with the same value
df1.na.fill(123456,cols = Array("courseid","pointlistid")).show(false)
+---------+--------+-------+-----+-----------+--------+----+
|chapterid|courseid|majorid|money|pointlistid|dt |dn |
+---------+--------+-------+-----+-----------+--------+----+
|4 |123456 |5 |100 |3 |20190722|webA|
|7 |123456 |7 |100 |1 |20190722|webA|
|8 |123456 |3 | |8 |20190722|webA|
|5 |14 |3 |100 |123456 |20190722|webA|
|4 |15 |2 |100 |3 |20190722|webA|
|9 |123456 |8 |100 |7 |20190722|webA|
|7 |17 |7 |100 |123456 |20190722|webA|
|0 |18 |9 | |7 |20190722|webA|
|5 |123456 |8 |100 |4 |20190722|webA|
|4 |20 |1 |100 |123456 |20190722|webA|
|4 |123456 |5 |100 |1 |20190722|webA|
|0 |22 |3 |100 |9 |20190722|webA|
|1 |123456 |8 |100 |0 |20190722|webA|
|4 |24 |0 |100 |5 |20190722|webA|
|9 |123456 |9 |100 |0 |20190722|webA|
+---------+--------+-------+-----+-----------+--------+----+
// Fill in the control of the specified column -- Multiple columns of different values
df1.na.fill(Map("courseid"->123456,"pointlistid"->654321)).show(false)
+---------+--------+-------+-----+-----------+--------+----+
|chapterid|courseid|majorid|money|pointlistid|dt |dn |
+---------+--------+-------+-----+-----------+--------+----+
|4 |123456 |5 |100 |3 |20190722|webA|
|7 |123456 |7 |100 |1 |20190722|webA|
|8 |123456 |3 | |8 |20190722|webA|
|5 |14 |3 |100 |654321 |20190722|webA|
|4 |15 |2 |100 |3 |20190722|webA|
|9 |123456 |8 |100 |7 |20190722|webA|
|7 |17 |7 |100 |654321 |20190722|webA|
|0 |18 |9 | |7 |20190722|webA|
|5 |123456 |8 |100 |4 |20190722|webA|
|4 |20 |1 |100 |654321 |20190722|webA|
|4 |123456 |5 |100 |1 |20190722|webA|
|0 |22 |3 |100 |9 |20190722|webA|
|1 |123456 |8 |100 |0 |20190722|webA|
|4 |24 |0 |100 |5 |20190722|webA|
|9 |123456 |9 |100 |0 |20190722|webA|
+---------+--------+-------+-----+-----------+--------+----+
// Query null column
data1.filter("gender is null").select("gender").limit(10).show
+------+
|gender|
+------+
| null|
| null|
| null|
| null|
| null|
+------+
data1.filter("gender is not null").select("gender").limit(10).show
+------+
|gender|
+------+
| male|
|female|
| male|
|female|
| male|
| male|
| male|
| male|
|female|
|female|
+------+
data1.filter( data1("gender").isNull ).select("gender").limit(10).show
+------+
|gender|
+------+
| null|
| null|
| null|
| null|
| null|
+------+
data1.filter("gender<>''").select("gender").limit(10).show
+------+
|gender|
+------+
| male|
|female|
| male|
|female|
| male|
| male|
| male|
| male|
|female|
|female|
+------+
边栏推荐
- 电机的简介
- Location based mobile terminal network video exploration app system documents + foreign language translation and original text + guidance records (8 weeks) + PPT + review + project source code
- LeetCode 斐波那契序列
- MySQL functions
- XML配置文件
- OS i/o devices and device controllers
- FPGA内部硬件结构与代码的关系
- Yolov5, pychar, Anaconda environment installation
- LeetCode 1189. Maximum number of "balloons"
- 【EI会议分享】2022年第三届智能制造与自动化前沿国际会议(CFIMA 2022)
猜你喜欢
Recognize the small experiment of extracting and displaying Mel spectrum (observe the difference between different y_axis and x_axis)
Introduction of motor
Transport layer protocol ----- UDP protocol
多线程与高并发(8)—— 从CountDownLatch总结AQS共享锁(三周年打卡)
Date类中日期转成指定字符串出现的问题及解决方法
FFmpeg学习——核心模块
Extracting profile data from profile measurement
Calculate sha256 value of data or file based on crypto++
Configuring OSPF GR features for Huawei devices
N1 # if you work on a metauniverse product [metauniverse · interdisciplinary] Season 2 S2
随机推荐
uniapp开发,打包成H5部署到服务器
FFmpeg学习——核心模块
Extracting profile data from profile measurement
SQLServer连接数据库读取中文乱码问题解决
Spark-SQL UDF函数
Codeforces gr19 D (think more about why the first-hand value range is 100, JLS yyds)
MySql——CRUD
[binary search tree] add, delete, modify and query function code implementation
Start from the bottom structure and learn the introduction of fpga---fifo IP core and its key parameters
STM32 configuration after chip replacement and possible errors
Single source shortest path exercise (I)
Hudi of data Lake (1): introduction to Hudi
notepad++正则表达式替换字符串
[EI conference sharing] the Third International Conference on intelligent manufacturing and automation frontier in 2022 (cfima 2022)
Huawei equipment configuration ospf-bgp linkage
权限问题:source .bash_profile permission denied
MySQL storage engine
关于slmgr命令的那些事
LeetCode 6005. The minimum operand to make an array an alternating array
STM32按键消抖——入门状态机思维