PySpark: changing the column order and saving into an Iceberg database
2022-07-28 08:34:00 【Lu Xinhang】
Create the Spark environment and specify the catalog
import os

from pyspark.sql import SparkSession


def get_spark():
    os.environ.setdefault('HADOOP_USER_NAME', 'root')
    # Errors that the memory settings below are meant to avoid:
    #   total size of serialized results of tasks is bigger than spark.driver.maxResultSize
    #   ERROR DataWritingSparkTask: Aborting commit for partition 2 (task 2, attempt 0, stage 0.0) Out of memory
    spark = SparkSession.builder \
        .config('spark.sql.debug.maxToStringFields', 2000) \
        .config('spark.debug.maxToStringFields', 2000) \
        .config('spark.driver.memory', '16g') \
        .config('spark.executor.memory', '16g') \
        .config('spark.driver.maxResultSize', '4g') \
        .config('spark.network.timeout', 180) \
        .getOrCreate()
    # Register a catalog named "iceberg" that is backed by the Hive metastore
    spark.conf.set("spark.sql.catalog.iceberg",
                   "org.apache.iceberg.spark.SparkCatalog")
    spark.conf.set("spark.sql.catalog.iceberg.type", "hive")
    spark.conf.set("spark.sql.catalog.iceberg.uri",
                   "thrift://192.168.x.xx:9083")
    spark.conf.set("spark.sql.session.timeZone", "GMT+8")
    spark.conf.set("spark.sql.iceberg.handle-timestamp-without-timezone", True)
    return spark
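The DataFrame has its columns in the order A, B, but they must be stored in the database in the order B, A. If you do not change the order before writing, the insert reports wrong-column and mismatch errors.
1. Register the Spark DataFrame as a temporary view, then select from it into a new Spark DataFrame with the columns in the required order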
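spark = get_spark()
spark_df = spark.sql('select A from ***')
# Supply the expression for column B as the second argument
spark_df = spark_df.withColumn("B", )
spark_df.createOrReplaceTempView("test")
# Select the columns in the order the target table expects
DF = spark.sql("select B, A from test")
# insertInto matches columns by position; True means overwrite
DF.write.insertInto('xxx', True)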
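The same reordering can also be done without a temporary view, by selecting the DataFrame's columns in the order of the target table. The following is only a sketch under assumptions: the DataFrame and the table are assumed to use exactly the same column names (including case), and `order_columns` and `insert_in_table_order` are hypothetical helper names, not part of PySpark.

```python
def order_columns(df_columns, table_columns):
    """Return the DataFrame's column names arranged in the table's order."""
    missing = [c for c in table_columns if c not in df_columns]
    extra = [c for c in df_columns if c not in table_columns]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    return list(table_columns)


def insert_in_table_order(spark, df, table_name, overwrite=True):
    """Reorder df to match table_name (hypothetical helper), then insert it."""
    # spark.table(...).columns lists the target table's columns in order.
    target_columns = spark.table(table_name).columns
    ordered = df.select(*order_columns(df.columns, target_columns))
    # insertInto matches columns by position, not by name.
    ordered.write.insertInto(table_name, overwrite=overwrite)
```

With the session from get_spark(), a call such as insert_in_table_order(spark, spark_df, 'xxx') would replace the view and the SQL select. Failing early on a name mismatch is a design choice here, so that data never lands silently in the wrong column.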
2. Create a schema (not recommended)
2.1 Calling spark.createDataFrame(df) directly may fail during schema inference with ValueError: Some of types cannot be determined after inferring
This means there are fields whose type Spark cannot infer.
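This error typically appears when a column contains only null values, so Spark has nothing to infer a type from. A small sketch for finding such columns before calling spark.createDataFrame, assuming the data is available as a list of dicts; `uninferable_fields` is a hypothetical helper name.

```python
def uninferable_fields(rows):
    """Return the field names whose value is None in every row."""
    fields = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)
    return [name for name in fields
            if all(row.get(name) is None for row in rows)]


rows = [
    {"A": "x", "B": None},
    {"A": "y", "B": None},
]
print(uninferable_fields(rows))  # ['B']
```

Fields reported this way need an explicit type, or can be dropped before the DataFrame is created.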
2.2 Specify the type of each field
The data conversion may then fail with errors such as IntegerType can not accept object 2.0 in type <class 'float'>
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("xx", StringType(), True),
    StructField("xx", IntegerType(), True)
])
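The IntegerType error above occurs when a field declared as IntegerType holds a Python float such as 2.0. One possible workaround is to normalize those values before building the DataFrame. This is a sketch, assuming the rows are plain dicts; `to_int_fields` is a hypothetical helper name, and turning NaN into None is an assumption about how missing values should be stored.

```python
import math


def to_int_fields(row, int_fields):
    """Return a copy of row with whole-number floats in int_fields cast to int."""
    fixed = dict(row)
    for name in int_fields:
        value = fixed.get(name)
        if isinstance(value, float):
            if math.isnan(value):
                fixed[name] = None  # assumption: NaN stands for a missing value
            elif value.is_integer():
                fixed[name] = int(value)
            else:
                raise ValueError(f"{name}={value!r} is not a whole number")
    return fixed


print(to_int_fields({"name": "a", "count": 2.0}, ["count"]))
# {'name': 'a', 'count': 2}
```

Rows cleaned this way can then be passed to spark.createDataFrame together with the explicit schema.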