Solving the problem of too many small files
2022-07-06 09:01:00 【棱镜7】
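1. Use Hive's built-in concatenate command to merge small files automatically
Usage: run the command against a whole table or against a single partition.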
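A minimal sketch of the command. The table names `web_logs_raw` and `web_logs` and the partition column `dt` are hypothetical and do not come from the original post. According to the Hive documentation, CONCATENATE applies to tables stored as RCFile or ORC, so this assumes one of those formats.

```sql
-- Merge the small files of an unpartitioned table.
-- `web_logs_raw` is a hypothetical table name.
ALTER TABLE web_logs_raw CONCATENATE;

-- Merge the small files of a single partition.
-- `web_logs` and `dt` are hypothetical names.
ALTER TABLE web_logs PARTITION (dt='2022-07-06') CONCATENATE;
```

A single run may not bring the table down to one file, so it is worth checking the file count afterwards.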
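2. Adjust parameters to reduce the number of map tasks
Merge small files before the map phase runs: several files are combined into a single split that serves as the input of one mapper. The minimum split size can be adjusted as well.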
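The settings below are one way this is commonly configured in a Hive session. The byte values are illustrative assumptions, not recommendations from the original post, and should be tuned to the cluster's block size.

```sql
-- Combine many small files into one split before the map phase.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Upper bound on the size of one combined split (illustrative: about 256 MB).
SET mapred.max.split.size=256000000;

-- Minimum split size per node and per rack (illustrative: about 100 MB).
-- Leftover data below these sizes is merged across nodes or racks.
SET mapred.min.split.size.per.node=100000000;
SET mapred.min.split.size.per.rack=100000000;
```

On newer Hadoop versions the same properties are named `mapreduce.input.fileinputformat.split.maxsize`, `mapreduce.input.fileinputformat.split.minsize.per.node` and `mapreduce.input.fileinputformat.split.minsize.per.rack`.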
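3. Reduce the number of reducers
The number of reducers determines the number of output files, so adjusting the reducer count controls how many files a Hive table ends up with.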
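A sketch of rewriting a table with a fixed reducer count. The table names `source_table` and `target_table` and the value 10 are hypothetical. `DISTRIBUTE BY rand()` is a commonly used trick, not something stated in the original post: it forces a reduce stage and spreads rows roughly evenly, so the output files come out similar in size.

```sql
-- Fix the number of reducers, and therefore the number of output files.
SET mapreduce.job.reduces=10;

-- Rewrite the data through the reducers.
-- `source_table` and `target_table` are hypothetical names.
INSERT OVERWRITE TABLE target_table
SELECT * FROM source_table
DISTRIBUTE BY rand();
```

Instead of a fixed count, `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max` let Hive derive the reducer count from the input size.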
4. HAR archiving
Use Hadoop's archive tool to archive small files; it packs many small files into a single HAR file.
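Hive exposes Hadoop archives at the partition level, which is one way to apply this step without leaving the Hive session. The sketch below assumes a hypothetical table `web_logs` partitioned by `dt`; neither name is from the original post.

```sql
-- Allow archive operations in this session.
SET hive.archive.enabled=true;

-- Pack the files of one partition into a HAR file.
-- `web_logs` and `dt` are hypothetical names.
ALTER TABLE web_logs ARCHIVE PARTITION (dt='2022-07-06');

-- Restore the partition to regular files, for example before rewriting it.
ALTER TABLE web_logs UNARCHIVE PARTITION (dt='2022-07-06');
```

An archived partition can still be queried, but it has to be unarchived before it can be overwritten.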
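5. JVM reuse
By default, Hadoop usually forks a new JVM for each map and reduce task. JVM startup can then add considerable overhead, especially when a job contains hundreds or thousands of tasks. JVM reuse lets a JVM instance be reused up to N times within the same job.
The drawback is that a reused JVM keeps holding its task slot so that it can be reused, and the slot is released only when the job finishes.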
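A sketch of enabling JVM reuse from a Hive session. The value 10 is an illustrative assumption. The task-slot model described above belongs to classic MapReduce (MRv1), so the setting is only expected to take effect on clusters that still support JVM reuse.

```sql
-- Let each JVM run up to 10 tasks of the same job before it exits.
-- The default is 1 (no reuse); -1 means no limit.
SET mapreduce.job.jvm.numtasks=10;
```

On older Hadoop versions the same property is named `mapred.job.reuse.jvm.num.tasks`.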