Special Topic: Small Files
2022-07-03 10:47:00 【Samooyou】
Hive small-file merge parameters:
Approaches to merging small files in Spark:
Adopt the community approach from SPARK-24940, which merges small files by means of SQL hints.
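SPARK-24940 added COALESCE and REPARTITION hints to Spark SQL (in Spark 2.4, to the best of our knowledge), written as a comment such as /*+ COALESCE(n) */ immediately after SELECT. The minimal sketch below is not part of Spark: num_output_files and add_coalesce_hint are hypothetical helpers, and the 128 MB target file size is an assumption chosen for illustration. It estimates how many files a result should have and injects the hint into an INSERT ... SELECT statement.

```python
import math
import re

# Hypothetical target size per output file (an assumption, not a Spark default).
TARGET_FILE_BYTES = 128 * 1024 * 1024


def num_output_files(estimated_bytes: int, target_bytes: int = TARGET_FILE_BYTES) -> int:
    """Number of files needed so each one is roughly target_bytes (always at least 1)."""
    return max(1, math.ceil(estimated_bytes / target_bytes))


def add_coalesce_hint(sql: str, num_files: int) -> str:
    """Hypothetical helper: insert /*+ COALESCE(n) */ after the first SELECT keyword."""
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    hinted, count = re.subn(
        r"\bSELECT\b",
        # Keep the keyword's original case and append the hint after it.
        lambda m: f"{m.group(0)} /*+ COALESCE({num_files}) */",
        sql,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("query has no SELECT keyword")
    return hinted


if __name__ == "__main__":
    sql = "INSERT OVERWRITE TABLE dst SELECT * FROM src"
    n = num_output_files(1_000_000_000)  # about 1 GB of output at 128 MB per file
    print(add_coalesce_hint(sql, n))
    # INSERT OVERWRITE TABLE dst SELECT /*+ COALESCE(8) */ * FROM src
```

The regex only hints the first SELECT, so a query with a CTE or subquery ahead of the final SELECT would need the hint placed by hand. COALESCE avoids a full shuffle but can lower write parallelism; REPARTITION adds a shuffle and spreads the data more evenly.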
Automatically merging small result files.
- User side: when spark.sql.shuffle.partitions is set relatively high but the result set is relatively small, many small files are produced. A newly added parameter, spark.sql.result.partitions, controls the number of final output files.
- Platform side: small-file detection is triggered when data is written to disk. In InsertIntoHiveTable, if small-file merging is enabled and the average file size is below a threshold, the files are merged, and the loadTable or loadPartition operation is performed after the merge. (This is enabled by default on the platform side.)
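The platform-side check can be sketched as follows. This is an illustration under stated assumptions, not the platform's actual code: the 16 MB average-size threshold and 256 MB target are modeled on the defaults of Hive's hive.merge.smallfiles.avgsize and hive.merge.size.per.task as we understand them, and plan_merge, MergePlan, and the greedy grouping of consecutive files are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed thresholds, modeled on Hive's hive.merge.smallfiles.avgsize
# and hive.merge.size.per.task defaults.
AVG_SIZE_THRESHOLD = 16_000_000
TARGET_MERGED_SIZE = 256_000_000


@dataclass
class MergePlan:
    merge: bool  # whether a merge step runs before loadTable/loadPartition
    groups: List[List[int]] = field(default_factory=list)  # file indices per merged file


def plan_merge(file_sizes: List[int], merge_enabled: bool = True,
               avg_threshold: int = AVG_SIZE_THRESHOLD,
               target_size: int = TARGET_MERGED_SIZE) -> MergePlan:
    """Hypothetical check run after files are written, before the load step."""
    # A single file (or a disabled switch) never triggers a merge in this sketch.
    if not merge_enabled or len(file_sizes) <= 1:
        return MergePlan(False)
    if sum(file_sizes) / len(file_sizes) >= avg_threshold:
        return MergePlan(False)  # files are already large enough on average
    # Greedily pack consecutive files until the next one would exceed the target.
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(file_sizes):
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        groups.append(current)
    return MergePlan(True, groups)
```

Each group would become one merged file, and the subsequent loadTable or loadPartition would then point at the merged files rather than the originals.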
Dynamically setting the number of shuffle partitions.
Spark's Adaptive Execution feature lets the stage downstream of a shuffle adjust its task count automatically, based on the volume of shuffle data produced by the upstream stage; that is, at shuffle-read time, multiple small partitions are handed…
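The partition-coalescing idea can be sketched as a greedy pass over the upstream shuffle output sizes: adjacent partitions are packed into one downstream task until adding the next one would exceed a target size. This is a simplified illustration, not Spark's implementation; in Spark 3.x the target corresponds roughly to spark.sql.adaptive.advisoryPartitionSizeInBytes (64 MB by default), while coalesce_partitions and the example sizes below are hypothetical.

```python
from typing import List, Tuple


def coalesce_partitions(partition_sizes: List[int],
                        target_size: int) -> List[Tuple[int, int]]:
    """Group adjacent shuffle partitions into downstream tasks (hypothetical helper).

    Returns half-open ranges (start, end) of partition indices; each range is
    read by one task. Only adjacent partitions are combined, so order is kept.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    ranges: List[Tuple[int, int]] = []
    start, current = 0, 0
    for i, size in enumerate(partition_sizes):
        # Close the current task if this partition would push it past the target.
        if i > start and current + size > target_size:
            ranges.append((start, i))
            start, current = i, 0
        current += size
    if partition_sizes:
        ranges.append((start, len(partition_sizes)))
    return ranges


if __name__ == "__main__":
    # Six upstream partitions (sizes in MB) collapse into three downstream tasks.
    print(coalesce_partitions([10, 10, 10, 50, 5, 5], target_size=30))
    # [(0, 3), (3, 4), (4, 6)]
```

A partition larger than the target still gets its own task; splitting skewed partitions is a separate concern that this sketch does not address.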