Solving the Problem of Too Many Small Files
2022-07-06 09:01:00 【棱镜7】
1. Use Hive's built-in concatenate command to merge small files automatically
Usage:
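The original usage snippet did not survive, so what follows is a minimal sketch rather than the author's exact commands. The table names, partition column, and partition value are hypothetical. Note that CONCATENATE applies to tables stored as RCFile or ORC.

```sql
-- Hypothetical table names and partition value, for illustration only.

-- Merge the small files of a non-partitioned table.
ALTER TABLE my_table CONCATENATE;

-- Merge the small files of a single partition.
ALTER TABLE my_partitioned_table PARTITION (dt = '2022-07-06') CONCATENATE;
```

One pass may not bring the file count all the way down, so the command is sometimes run more than once on the same table or partition.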
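2. Tune parameters to reduce the number of map tasks
Merge small files before the map phase runs: multiple files are combined into a single split that serves as one mapper's input. Adjust the minimum split size to control how much data each split holds.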
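The source does not list the parameters, so the settings below are one plausible configuration, not the author's. The byte values are arbitrary examples and should be sized against the cluster's HDFS block size.

```sql
-- Combine multiple small files into one split on the map side.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Upper bound on the size of one split, in bytes (example value).
SET mapreduce.input.fileinputformat.split.maxsize=256000000;

-- Minimum split size per node, in bytes (example value).
SET mapreduce.input.fileinputformat.split.minsize.per.node=100000000;

-- Minimum split size per rack, in bytes (example value).
SET mapreduce.input.fileinputformat.split.minsize.per.rack=100000000;
```

Older clusters may use the legacy property names `mapred.max.split.size`, `mapred.min.split.size.per.node`, and `mapred.min.split.size.per.rack` for the same settings.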
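3. Reduce the number of reducers
The number of reducers determines the number of output files, so adjusting the reducer count controls how many files a Hive table ends up with.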
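As a sketch of this idea, the statements below fix the reducer count and rewrite a table through a reduce stage. `target_table` and `source_table` are hypothetical names, and the reducer count of 10 is an arbitrary example.

```sql
-- Fix the number of reducers, and therefore the number of output files.
SET mapreduce.job.reduces=10;

-- DISTRIBUTE BY forces a reduce stage; rand() spreads rows roughly evenly,
-- so the output lands in about 10 files of similar size.
INSERT OVERWRITE TABLE target_table
SELECT * FROM source_table
DISTRIBUTE BY rand();
```

Instead of a fixed count, `hive.exec.reducers.bytes.per.reducer` can be raised so that Hive itself estimates fewer reducers for the same input size.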
4. HAR archiving
Use Hadoop's archive facility to archive small files; it packs many small files into a single HAR file.
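One way to do this from within Hive is its partition archiving support, sketched below. The table name and partition values are hypothetical, and archiving this way works only on partitioned tables.

```sql
-- Enable Hive's archiving support.
SET hive.archive.enabled=true;

-- Tell Hive that the parent directory can be set when creating the archive.
SET hive.archive.har.parentdir.settable=true;

-- Size of the part files that make up the archive, in bytes (example value).
SET har.partfile.size=1099511627776;

-- Pack the files of one partition into a HAR file (hypothetical names).
ALTER TABLE my_partitioned_table ARCHIVE PARTITION (dt = '2022-07-06', hr = '12');

-- Restore the partition to regular files.
ALTER TABLE my_partitioned_table UNARCHIVE PARTITION (dt = '2022-07-06', hr = '12');
```

An archived partition stays readable, but it has to be unarchived before it can be overwritten.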
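5. JVM reuse
Hadoop's default configuration typically forks a new JVM for each map and reduce task. JVM startup can then add considerable overhead, especially when a job contains hundreds or thousands of tasks. JVM reuse allows a JVM instance to be reused up to N times within the same job.
The drawback is that with JVM reuse enabled, the task slots in use stay occupied so that they can be reused, and they are released only once the job has finished.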
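The source does not name the setting. Assuming classic MapReduce, where task slots exist, the reuse count N is commonly set as below; the value 10 is an arbitrary example.

```sql
-- Let each JVM run up to 10 tasks of the same job before it exits.
-- This is the classic MapReduce property name; on YARN-based clusters
-- it may have no effect.
SET mapred.job.reuse.jvm.num.tasks=10;
```

A larger N saves more JVM startups but holds each slot longer, which matters most when a job has a few long-running tasks and other slots sit idle waiting for them.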