Solve the problem of too many small files
2022-07-06 09:34:00 【Prism 7】
1. Use Hive's built-in concatenate command to merge small files automatically
Usage:
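A minimal sketch of the statement form, with Python used only to assemble the text. Hive's syntax is ALTER TABLE ... [PARTITION (...)] CONCATENATE, and it applies to tables stored as RCFile or ORC. The helper name and the table and partition names are hypothetical.

```python
def concatenate_statement(table, partition=None):
    """Build a Hive ALTER TABLE ... CONCATENATE statement.

    `partition` is an optional dict mapping partition column to value;
    leave it out for a non-partitioned table.
    """
    sql = f"ALTER TABLE {table}"
    if partition:
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        sql += f" PARTITION ({spec})"
    return sql + " CONCATENATE;"


# Hypothetical table and partition names, for illustration only.
print(concatenate_statement("logs"))
print(concatenate_statement("logs_by_day", {"day": "20220706"}))
```

For a partitioned table the statement is issued once per partition, so a job that merges a whole table would loop over its partitions.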
2. Adjust parameters to reduce the number of map tasks
Merge small files before the map phase runs, so that each mapper receives several files combined into a single split as input. Adjust the minimum split size to control how much data goes into each split.
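The effect of combining files into splits can be estimated with a simplified model. The sketch below keeps adding files to a split until it reaches a size threshold. The real CombineHiveInputFormat also takes node and rack locality into account, so treat the numbers as rough. The property names are Hive/Hadoop settings, but the values are illustrative assumptions rather than recommendations.

```python
# Session settings commonly used for map-side combining (values are assumptions).
COMBINE_SETTINGS = [
    "SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;",
    "SET mapred.max.split.size=256000000;",
    "SET mapred.min.split.size.per.node=100000000;",
    "SET mapred.min.split.size.per.rack=100000000;",
]


def count_combined_splits(file_sizes, split_size_threshold):
    """Count splits when files are packed until a split reaches the threshold.

    Simplified: ignores data locality and HDFS block boundaries.
    """
    splits = 0
    current = 0
    for size in file_sizes:
        current += size
        if current >= split_size_threshold:
            splits += 1
            current = 0
    if current > 0:
        splits += 1  # leftover files form one last, smaller split
    return splits


small_files = [1_000_000] * 1000  # 1000 files of about 1 MB each
print(len(small_files))                                   # one map task per file without combining
print(count_combined_splits(small_files, 256_000_000))    # map tasks with combining
```

Under these assumptions, 1000 one-file map tasks shrink to 4 combined splits.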
3. Reduce the number of reducers
The number of reducers determines the number of output files, so adjusting the reducer count controls how many files a Hive table contains.
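As a rough illustration of how the reducer count follows from the settings, the sketch below mirrors a simplified form of Hive's estimate: input size divided by hive.exec.reducers.bytes.per.reducer, capped by hive.exec.reducers.max. Setting mapreduce.job.reduces fixes the count directly instead. The function name and the input sizes are assumptions for this example.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def estimated_reducers(input_bytes, bytes_per_reducer, max_reducers):
    """Simplified reducer estimate: ceil(input / bytes_per_reducer), capped.

    Each reducer writes its own output file, so this is also roughly
    the number of files the query adds to the table.
    """
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


# 10 GB of input with 256 MB per reducer, then with 1 GB per reducer.
print(estimated_reducers(10 * GB, 256 * MB, 1009))
print(estimated_reducers(10 * GB, 1 * GB, 1009))
```

Raising the bytes handled per reducer from 256 MB to 1 GB cuts the estimate from 40 output files to 10 in this example.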
4. HAR file
Use Hadoop's archive tool to archive small files; it can pack many small files into a single HAR file.
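A minimal sketch of assembling the archive command, following the form hadoop archive -archiveName NAME -p PARENT SRC... DEST. The helper name and all paths are hypothetical; only the command layout comes from the Hadoop CLI.

```python
import shlex


def hadoop_archive_command(archive_name, parent_dir, sources, dest_dir):
    """Build the argument list for `hadoop archive`.

    `sources` are paths relative to `parent_dir`; the archive is
    created under `dest_dir`.
    """
    if not archive_name.endswith(".har"):
        raise ValueError("archive name should end with .har")
    return ["hadoop", "archive", "-archiveName", archive_name,
            "-p", parent_dir, *sources, dest_dir]


# Hypothetical paths, for illustration only.
cmd = hadoop_archive_command(
    "logs.har", "/user/demo/logs", ["2022-07-05", "2022-07-06"], "/user/demo/archives"
)
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run on a machine with a Hadoop client. The archived files are then read through a har:// path rather than the original directory.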
5. JVM reuse
By default, Hadoop usually forks a new JVM to run each map and reduce task. JVM startup can add considerable overhead, especially when a job contains hundreds or thousands of tasks. JVM reuse allows a JVM instance to be reused up to N times within the same job.
The downside of this feature is that, with JVM reuse enabled, the task slots in use stay reserved for reuse and are not released until the job completes.
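The saving can be estimated with simple arithmetic. The sketch below assumes every JVM runs its full quota of N tasks, which is optimistic. The setting shown is the classic MapReduce property name; whether a given Hadoop version honors it should be checked against that version's documentation.

```python
import math

# Classic MapReduce property for JVM reuse; the value 10 is an assumption.
JVM_REUSE_SETTING = "SET mapred.job.reuse.jvm.num.tasks=10;"


def jvm_launches(num_tasks, tasks_per_jvm):
    """Estimate JVM start-ups needed to run num_tasks tasks.

    tasks_per_jvm is N, the number of tasks one JVM may run;
    a value of 1 means no reuse.
    """
    if tasks_per_jvm < 1:
        raise ValueError("tasks_per_jvm must be at least 1")
    return math.ceil(num_tasks / tasks_per_jvm)


print(jvm_launches(5000, 1))   # no reuse: one JVM per task
print(jvm_launches(5000, 10))  # each JVM runs up to 10 tasks
```

With these example numbers, 5000 JVM start-ups drop to 500, at the cost of holding the slots for longer.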