PySpark deduplication with dropDuplicates and distinct; withColumn, lit, col; unionByName, groupBy
2022-07-02 08:59:00 【loong_ XL】
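1. Deduplication: dropDuplicates, distinct
# Both calls return the unique values of the 'dnum' column of DataFrame d.
ff = d.select('dnum').dropDuplicates()
ff.count()  # number of unique dnum values
ff.show()   # print the first 20 rows
fff = d.select('dnum').distinct()  # same result as dropDuplicates() with no arguments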
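2. withColumn, lit, col
withColumn adds a new column to a DataFrame, or replaces an existing column of the same name.
lit creates a column that holds a constant (literal) value.
col refers to an existing column by name.
import pyspark.sql.functions as F
# Add a constant "date" column to every row.
temp_df = temp_df.withColumn("date", F.lit(target_date))
# Remove "[" from the "tags" column; the pattern is a regular expression, so "[" must be escaped.
movie_feature_df = movie_feature_df.withColumn('tags', F.regexp_replace(F.col('tags'), r"\[", ""))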
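3. unionByName, groupBy
import datetime
# Load one partition per day for the last args.range days and stack them into one DataFrame.
play_video_df = None
for i in range(args.range):
    t = target_date - datetime.timedelta(days=i)
    temp_df = spark.sql(
        "select * from ***album where year=%s and month=%s and day=%s" % (t.year, t.month, t.day))
    temp_df = temp_df.withColumn("date", F.lit(target_date))
    if play_video_df is None:
        play_video_df = temp_df
    else:
        # unionByName matches columns by name rather than by position.
        play_video_df = play_video_df.unionByName(temp_df)
target_df = play_video_df
# Keep the highest finish_rate for each (dnum, aid) pair.
# target_movie_df is not defined in this excerpt; presumably it is derived from target_df.
target_groupped_movie_df = target_movie_df.groupBy("dnum", "aid").agg(F.max("finish_rate").alias("finish_rate"))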