Solving the Problem of Too Many Small Files
2022-07-06 09:01:00 【棱镜7】
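1. Use Hive's built-in concatenate command to merge small files automatically
Usage: run an ALTER TABLE statement with the CONCATENATE clause against a table or a single partition.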
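A minimal sketch of the command follows. The table names, partition column, and date are hypothetical, and as far as I know CONCATENATE only applies to tables stored as RCFile or ORC, so check the storage format first.

```sql
-- Hypothetical table and partition names, for illustration only.

-- Merge the small files of a non-partitioned table:
ALTER TABLE web_logs CONCATENATE;

-- Merge the small files of one partition of a partitioned table:
ALTER TABLE web_logs_by_day PARTITION (dt = '2022-07-06') CONCATENATE;
```

One pass does not always bring a table down to a single file, so it can be worth checking the file count afterwards and running the statement again if needed.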
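2. Adjust parameters to reduce the number of map tasks
Merge small files before the map phase runs, so that several files are combined into a single split and handed to one mapper as input. Adjust the minimum split size to control how aggressively files are combined.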
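The settings below sketch one common way to do this. The property names are the older mapred.* ones, which newer Hadoop releases still accept as deprecated aliases, and the byte values are illustrative assumptions rather than recommendations.

```sql
-- Combine several small files into one split before the map phase.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Upper bound on the size of one split, in bytes (illustrative: about 256 MB).
SET mapred.max.split.size=256000000;

-- Minimum split size on a single node, in bytes; leftovers smaller than
-- this are candidates for merging with data from other nodes.
SET mapred.min.split.size.per.node=100000000;

-- Minimum split size on a single rack, in bytes; leftovers smaller than
-- this are candidates for merging across racks.
SET mapred.min.split.size.per.rack=100000000;
```

Larger values mean fewer, bigger splits and therefore fewer map tasks, at the cost of less parallelism.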
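3. Reduce the number of reducers
Each reducer writes its own output file, so the number of reducers determines the number of output files. Adjusting the reducer count therefore controls how many files a Hive table ends up with.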
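As a rough illustration, the reducer count can be fixed for a session and a table rewritten through those reducers. The table names and the value 10 are hypothetical, and the target table is assumed to be non-partitioned.

```sql
-- Fix the number of reducers for the statements that follow
-- (10 is an illustrative value).
SET mapreduce.job.reduces=10;

-- Hypothetical tables: rewrite source_table into target_table.
-- DISTRIBUTE BY rand() forces a reduce phase and spreads rows evenly,
-- so each reducer writes one file of roughly similar size.
INSERT OVERWRITE TABLE target_table
SELECT * FROM source_table
DISTRIBUTE BY rand();
```

Distributing by a random value avoids skew, where one reducer would otherwise receive most of the rows and produce one oversized file next to many tiny ones.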
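4. HAR archiving
Use Hadoop's archive facility to archive small files. It packs many small files into a single HAR file, which reduces the number of files the NameNode has to track.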
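From Hive, archiving is applied per partition. The sketch below assumes a hypothetical partitioned table named web_logs_by_day; the part file size is an illustrative value.

```sql
-- Allow Hive to archive partitions.
SET hive.archive.enabled=true;

-- Let Hive set the parent directory when creating the archive.
SET hive.archive.har.parentdir.settable=true;

-- Size of the part files inside the archive, in bytes (illustrative: 1 TB).
SET har.partfile.size=1099511627776;

-- Pack the files of one partition into a HAR file (hypothetical names).
ALTER TABLE web_logs_by_day ARCHIVE PARTITION (dt = '2022-07-06');

-- Restore the partition to regular files when it needs to be rewritten.
ALTER TABLE web_logs_by_day UNARCHIVE PARTITION (dt = '2022-07-06');
```

An archived partition stays queryable, but it has to be unarchived before it can be overwritten, so this approach suits cold data that is rarely modified.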
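5. JVM reuse
Hadoop's default configuration usually forks a new JVM for each map and reduce task. JVM startup can then add considerable overhead, especially when a job contains hundreds or thousands of tasks. JVM reuse lets a JVM instance be reused N times within the same job.
The drawback is that, with JVM reuse enabled, the task slots in use stay occupied so they can be reused, and they are not released until the job finishes.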
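A sketch of the setting, assuming the classic MapReduce property name and an illustrative value of 10. The property belongs to the older MapReduce runtime, so check whether your cluster's execution engine honors it.

```sql
-- Let each JVM run up to 10 tasks of the same job before it exits
-- (10 is an illustrative value; 1 means no reuse).
SET mapred.job.reuse.jvm.num.tasks=10;
```

Because reserved slots are held until the whole job completes, a job with unbalanced tasks can leave slots idle while the slowest task finishes, so a moderate value is usually a safer starting point than a very large one.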