Solve the problem of too many small files
2022-07-06 09:34:00 【Prism 7】
1. Use Hive's built-in CONCATENATE command to merge small files automatically
Usage: run ALTER TABLE ... CONCATENATE against the table, or against a single partition of a partitioned table.
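A minimal sketch of the command, assuming two hypothetical tables: `logs_flat` (not partitioned) and `logs` (partitioned by `dt`). Note that CONCATENATE is only supported for tables stored as RCFile or ORC.

```sql
-- Table and partition names are hypothetical, for illustration only.

-- Merge the small files of a non-partitioned table.
ALTER TABLE logs_flat CONCATENATE;

-- Merge the small files of one partition of a partitioned table.
ALTER TABLE logs PARTITION (dt = '2022-07-06') CONCATENATE;
```

A single run may not merge everything down to one file, so the command is sometimes run more than once on the same table or partition.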
2. Adjust parameters to reduce the number of map tasks
Merge small files before the map phase runs, so that each mapper takes several files combined into one split as its input. Adjust the minimum split size to control how much data each split carries.
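The settings below are one way this could look in a Hive session on the MapReduce engine. The byte values are illustrative assumptions, not recommendations, and the `mapred.*` property names are the older forms, which newer Hadoop releases treat as deprecated aliases.

```sql
-- Combine small input files into larger splits before the map phase.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Upper bound on the size of one split, in bytes (illustrative: ~256 MB).
SET mapred.max.split.size=256000000;

-- Minimum split size per node and per rack, in bytes (illustrative: ~100 MB).
-- Leftover data smaller than these thresholds is merged into other splits.
SET mapred.min.split.size.per.node=100000000;
SET mapred.min.split.size.per.rack=100000000;
```

Larger split sizes mean fewer map tasks, at the cost of less parallelism per job.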
3. Reduce the number of reducers
The number of reducers determines the number of output files, so you can adjust the reducer count to control how many files a Hive table ends up with.
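As a rough sketch, a table could be rewritten with a fixed reducer count as follows. The table names `source_table` and `target_table` and the value 10 are assumptions made for illustration.

```sql
-- Fix the number of reducers for this session (illustrative value).
SET mapreduce.job.reduces=10;

-- Rewrite the data through a reduce stage. DISTRIBUTE BY rand() spreads
-- rows evenly across the reducers, and each reducer typically writes
-- one output file.
INSERT OVERWRITE TABLE target_table
SELECT * FROM source_table
DISTRIBUTE BY rand();
```

Without the DISTRIBUTE BY clause, a plain SELECT may run as a map-only job, in which case the reducer setting has no effect on the file count.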
4. HAR file
Use Hadoop's archive tool to archive small files: it can pack many small files into a single HAR file.
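From within Hive, archiving works at the partition level. The sketch below assumes a hypothetical table `logs` partitioned by `dt`; the part file size is an illustrative value.

```sql
-- Allow Hive to archive partitions.
SET hive.archive.enabled=true;
-- Let Hive set the parent directory when it creates the archive.
SET hive.archive.har.parentdir.settable=true;
-- Target size of the archive part files, in bytes (illustrative: 1 TB).
SET har.partfile.size=1099511627776;

-- Pack the files of one partition into a HAR file.
ALTER TABLE logs ARCHIVE PARTITION (dt = '2022-07-06');

-- Restore the partition to ordinary files when it needs to be rewritten.
ALTER TABLE logs UNARCHIVE PARTITION (dt = '2022-07-06');
```

An archived partition can still be queried, but it has to be unarchived before it can be overwritten, so this approach suits cold data that is rarely modified.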
5. JVM reuse
By default, Hadoop usually forks a new JVM to run each map and reduce task. JVM startup can then add considerable overhead, especially when a job contains hundreds or thousands of tasks. JVM reuse allows a JVM instance to be reused up to N times within the same job.
The downside of this feature is that, with JVM reuse enabled, the task slots in use stay reserved for reuse and are not released until the job finishes.
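A minimal sketch of enabling it in a Hive session, assuming a classic MapReduce (MRv1) cluster. The value 10 is illustrative, and the setting may have no effect on YARN-based clusters.

```sql
-- Let each JVM run up to 10 tasks of the same job (illustrative value).
-- A value of 1 disables reuse; -1 removes the limit.
SET mapred.job.reuse.jvm.num.tasks=10;
```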