当前位置:网站首页>Spark SQL null value, Nan judgment and processing
Spark SQL null value, Nan judgment and processing
2022-07-06 00:28:00 【The south wind knows what I mean】
Spark SQL Null value Null,NaN Judge and deal with
Null and NaN
null It means nothing 、 Nonexistent or invalid object or address reference . It can be converted into 0, It's a global object .null ==false The value returned is false.
undefined It's a global property , Original value undefined. It tells us that some things have no assignment , No definition .undefined Cannot convert to any number , So use it in mathematical calculations , The return is NaN.
val d: Double = math.sqrt(-1.0)
println(d)
val n: Boolean = math.sqrt(-1.0).isNaN
println(n)
Spark SQL Null value Null,NaN Judge and deal with
val df: DataFrame = session.sql(
s""" |select * from sparktuning.course_pay1 |""".stripMargin)
// Delete null values and... For all columns NaN
val resNull=data1.na.drop()
resNull.limit(10).show()
+-------+------+---+------------+--------+-------------+---------+----------+------+
|affairs|gender|age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+---+------------+--------+-------------+---------+----------+------+
| 0| male| 37| 10| no| 3| 18| 7| 4|
| 0| male| 57| 15| yes| 2| 14| 4| 4|
| 0|female| 32| 15| yes| 4| 16| 1| 2|
| 0| male| 22| 1.5| no| 4| 14| 4| 5|
| 0| male| 37| 15| yes| 2| 20| 7| 2|
| 0| male| 27| 4| yes| 4| 18| 6| 4|
| 0| male| 47| 15| yes| 5| 17| 6| 4|
| 0|female| 22| 1.5| no| 2| 17| 5| 4|
| 0|female| 27| 4| no| 4| 14| 5| 4|
| 0|female| 37| 15| yes| 1| 17| 5| 5|
+-------+------+---+------------+--------+-------------+---------+----------+------+
// Delete the null value and... Of a column NaN
val res=data1.na.drop(Array("gender","yearsmarried"))
// Delete a column that is not empty and not NaN Below 10 Of -- Note the field type
data1.na.drop(10,Array("gender","yearsmarried"))
// Fill in all null values [Boolean] The column of -- Note the field type
df.na.fill(false,Array("courseid"))
// Fill in all null Columns
val res123=data1.na.fill("wangxiao123")
res123.limit(10).show()
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
|affairs| gender|age|yearsmarried|children|religiousness|education|occupation| rating|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
| 0| male| 37| 10| no| 3| 18| 7| 4|
| 0|wangxiao123| 27| wangxiao123| no| 4| 14| 6|wangxiao123|
| 0|wangxiao123| 32| wangxiao123| yes| 1| 12| 1|wangxiao123|
| 0|wangxiao123| 57| wangxiao123| yes| 5| 18| 6|wangxiao123|
| 0|wangxiao123| 22| wangxiao123| no| 2| 17| 6|wangxiao123|
| 0|wangxiao123| 32| wangxiao123| no| 2| 17| 5|wangxiao123|
| 0| female| 22| wangxiao123| no| 2| 12| 1|wangxiao123|
| 0| male| 57| 15| yes| 2| 14| 4| 4|
| 0| female| 32| 15| yes| 4| 16| 1| 2|
| 0| male| 22| 1.5| no| 4| 14| 4| 5|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
// Fill in the control of the specified column -- Multiple columns with the same value
df1.na.fill(123456,cols = Array("courseid","pointlistid")).show(false)
+---------+--------+-------+-----+-----------+--------+----+
|chapterid|courseid|majorid|money|pointlistid|dt |dn |
+---------+--------+-------+-----+-----------+--------+----+
|4 |123456 |5 |100 |3 |20190722|webA|
|7 |123456 |7 |100 |1 |20190722|webA|
|8 |123456 |3 | |8 |20190722|webA|
|5 |14 |3 |100 |123456 |20190722|webA|
|4 |15 |2 |100 |3 |20190722|webA|
|9 |123456 |8 |100 |7 |20190722|webA|
|7 |17 |7 |100 |123456 |20190722|webA|
|0 |18 |9 | |7 |20190722|webA|
|5 |123456 |8 |100 |4 |20190722|webA|
|4 |20 |1 |100 |123456 |20190722|webA|
|4 |123456 |5 |100 |1 |20190722|webA|
|0 |22 |3 |100 |9 |20190722|webA|
|1 |123456 |8 |100 |0 |20190722|webA|
|4 |24 |0 |100 |5 |20190722|webA|
|9 |123456 |9 |100 |0 |20190722|webA|
+---------+--------+-------+-----+-----------+--------+----+
// Fill in the control of the specified column -- Multiple columns of different values
df1.na.fill(Map("courseid"->123456,"pointlistid"->654321)).show(false)
+---------+--------+-------+-----+-----------+--------+----+
|chapterid|courseid|majorid|money|pointlistid|dt |dn |
+---------+--------+-------+-----+-----------+--------+----+
|4 |123456 |5 |100 |3 |20190722|webA|
|7 |123456 |7 |100 |1 |20190722|webA|
|8 |123456 |3 | |8 |20190722|webA|
|5 |14 |3 |100 |654321 |20190722|webA|
|4 |15 |2 |100 |3 |20190722|webA|
|9 |123456 |8 |100 |7 |20190722|webA|
|7 |17 |7 |100 |654321 |20190722|webA|
|0 |18 |9 | |7 |20190722|webA|
|5 |123456 |8 |100 |4 |20190722|webA|
|4 |20 |1 |100 |654321 |20190722|webA|
|4 |123456 |5 |100 |1 |20190722|webA|
|0 |22 |3 |100 |9 |20190722|webA|
|1 |123456 |8 |100 |0 |20190722|webA|
|4 |24 |0 |100 |5 |20190722|webA|
|9 |123456 |9 |100 |0 |20190722|webA|
+---------+--------+-------+-----+-----------+--------+----+
// Query null column
data1.filter("gender is null").select("gender").limit(10).show
+------+
|gender|
+------+
| null|
| null|
| null|
| null|
| null|
+------+
data1.filter("gender is not null").select("gender").limit(10).show
+------+
|gender|
+------+
| male|
|female|
| male|
|female|
| male|
| male|
| male|
| male|
|female|
|female|
+------+
data1.filter( data1("gender").isNull ).select("gender").limit(10).show
+------+
|gender|
+------+
| null|
| null|
| null|
| null|
| null|
+------+
data1.filter("gender<>''").select("gender").limit(10).show
+------+
|gender|
+------+
| male|
|female|
| male|
|female|
| male|
| male|
| male|
| male|
|female|
|female|
+------+
边栏推荐
- Ffmpeg learning - core module
- STM32 configuration after chip replacement and possible errors
- uniapp开发,打包成H5部署到服务器
- MySQL global lock and table lock
- Codeforces gr19 D (think more about why the first-hand value range is 100, JLS yyds)
- 7.5 simulation summary
- SQLServer连接数据库读取中文乱码问题解决
- 选择致敬持续奋斗背后的精神——对话威尔价值观【第四期】
- Extension and application of timestamp
- Common API classes and exception systems
猜你喜欢
Tools to improve work efficiency: the idea of SQL batch generation tools
Go learning --- structure to map[string]interface{}
Key structure of ffmpeg -- AVCodecContext
FFMPEG关键结构体——AVFrame
Mysql - CRUD
Configuring OSPF load sharing for Huawei devices
MySQL storage engine
OpenCV经典100题
Knowledge about the memory size occupied by the structure
Anconda download + add Tsinghua +tensorflow installation +no module named 'tensorflow' +kernelrestart: restart failed, kernel restart failed
随机推荐
Room cannot create an SQLite connection to verify the queries
权限问题:source .bash_profile permission denied
Set data real-time update during MDK debug
Yolov5, pychar, Anaconda environment installation
Global and Chinese markets for hinged watertight doors 2022-2028: Research Report on technology, participants, trends, market size and share
There is no network after configuring the agent by capturing packets with Fiddler mobile phones
Uniapp development, packaged as H5 and deployed to the server
Configuring OSPF GR features for Huawei devices
时间戳的拓展及应用实例
Hudi of data Lake (2): Hudi compilation
Key structure of ffmpeg - avformatcontext
Classical concurrency problem: the dining problem of philosophers
提升工作效率工具:SQL批量生成工具思想
FPGA内部硬件结构与代码的关系
Power query data format conversion, Split Merge extraction, delete duplicates, delete errors, transpose and reverse, perspective and reverse perspective
[QT] QT uses qjson to generate JSON files and save them
Global and Chinese market of water heater expansion tank 2022-2028: Research Report on technology, participants, trends, market size and share
Codeforces round 804 (Div. 2) [competition record]
QT -- thread
[Chongqing Guangdong education] Chongqing Engineering Vocational and Technical College