PySpark deduplication: dropDuplicates, distinct; withColumn, lit, col; unionByName, groupBy
2022-07-02 08:59:00 【loong_ XL】
1、Deduplication: dropDuplicates, distinct
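# Keep one row per distinct 'dnum' value
ff = d.select(['dnum']).dropDuplicates()
ff.count()
ff.show()
# distinct() gives the same result on this single-column selection
fff = d.select(['dnum']).distinct()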
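2、withColumn, lit, col
withColumn: adds a column (or replaces an existing column with the same name)
lit: creates a column holding a constant (literal) value
col: refers to a column by name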
import pyspark.sql.functions as F
# Add a constant-valued "date" column
temp_df = temp_df.withColumn("date", F.lit(target_date))
# Strip every "[" from the "tags" column
movie_feature_df = movie_feature_df.withColumn('tags', F.regexp_replace(F.col('tags'), r"\[", ""))
3、unionByName, groupBy
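import datetime
import pyspark.sql.functions as F

# Load one day of data per iteration and stack the results by column name
play_video_df = None
for i in range(args.range):
    t = target_date - datetime.timedelta(days=i)
    temp_df = spark.sql(
        "select * from ***album where year=%s and month=%s and day=%s" % (t.year, t.month, t.day))
    temp_df = temp_df.withColumn("date", F.lit(target_date))
    if play_video_df is None:
        play_video_df = temp_df
    else:
        play_video_df = play_video_df.unionByName(temp_df)
target_df = play_video_df
# Keep the highest finish_rate per (dnum, aid).
# target_movie_df is not defined in this snippet; it is presumably derived from target_df.
target_groupped_movie_df = target_movie_df.groupBy("dnum", "aid").agg(F.max("finish_rate").alias("finish_rate"))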