Solve the problem of too many small files
2022-07-06 09:34:00 【Prism 7】
1. Use Hive's built-in CONCATENATE command to merge small files automatically
Usage:
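The original usage snippet did not survive on this page, so the following is only a minimal sketch of how the command is typically written. The table names `logs_flat` and `logs` and the partition column `dt` are hypothetical. Note that Hive supports CONCATENATE only for tables stored as RCFile or ORC.

```sql
-- Hypothetical table and partition names, for illustration only.
-- Merge the small files of an unpartitioned table:
ALTER TABLE logs_flat CONCATENATE;

-- Merge the small files of one partition of a partitioned table:
ALTER TABLE logs PARTITION (dt = '2022-07-06') CONCATENATE;
```

One pass may not bring a table down to a single file; the command can be run again if too many files remain.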
2. Adjust parameters to reduce the number of map tasks
Merge small files before the map phase runs, so that each mapper receives several files combined into one split as its input. Also adjust the minimum split size.
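The settings involved might look like the sketch below. The property names are the classic Hive/MapReduce ones (newer Hadoop releases also accept `mapreduce.input.fileinputformat.split.*` equivalents), and the byte values are illustrative assumptions rather than recommendations from the original post.

```sql
-- Combine several small files into one split before the map phase.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Upper bound on the size of one combined split, in bytes (about 256 MB here).
SET mapred.max.split.size=256000000;

-- Minimum split size per node and per rack, in bytes (about 100 MB here).
-- Leftover data smaller than these thresholds is merged into larger splits.
SET mapred.min.split.size.per.node=100000000;
SET mapred.min.split.size.per.rack=100000000;
```

Larger values produce fewer, bigger splits and therefore fewer map tasks; suitable sizes depend on the cluster's block size and available memory.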
3. Reduce the number of reducers
The number of reducers determines the number of output files, so you can adjust the reducer count to control the number of files in a Hive table.
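As a rough sketch, the reducer count can be set directly or bounded through Hive's estimation parameters. The values below are illustrative assumptions, and they only take effect for queries that actually have a reduce stage.

```sql
-- Option A: fix the number of reducers (and hence output files) directly.
SET mapreduce.job.reduces=10;

-- Option B: let Hive estimate the count, but steer the estimate.
-- Input bytes handled per reducer; a larger value means fewer reducers.
SET hive.exec.reducers.bytes.per.reducer=256000000;
-- Upper limit on the number of reducers Hive will choose.
SET hive.exec.reducers.max=100;
```

Setting the count too low trades fewer files for longer-running reducers, so the value is usually chosen from the expected output size.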
4. HAR file
Use Hadoop's archive tool to archive small files; it can pack multiple small files into a single HAR file.
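From within Hive, archiving is applied per partition. The sketch below assumes a hypothetical table `logs` partitioned by `dt`; the settings are the ones described in Hive's archiving documentation, and the part-file size shown is an illustrative value.

```sql
-- Enable archiving of partitions into HAR files.
SET hive.archive.enabled=true;
-- Allow Hive to set the parent directory when creating the archive.
SET hive.archive.har.parentdir.settable=true;
-- Size in bytes of the part files inside the archive (illustrative value).
SET har.partfile.size=1099511627776;

-- Hypothetical table and partition: pack the partition's files into one HAR.
ALTER TABLE logs ARCHIVE PARTITION (dt = '2022-07-06');

-- Restore the partition to ordinary files when it needs to be rewritten.
ALTER TABLE logs UNARCHIVE PARTITION (dt = '2022-07-06');
```

An archived partition can still be queried, but reads are typically slower and the partition has to be unarchived before it can be overwritten.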
5. JVM reuse
By default, Hadoop usually forks a new JVM to execute each map and reduce task. JVM startup can then add considerable overhead, especially when a job contains hundreds or thousands of tasks. JVM reuse allows a JVM instance to be reused N times within the same job.
The downside of this feature is that, with JVM reuse enabled, the task slots in use stay occupied so that they can be reused, and they are not released until the job is complete.
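A minimal sketch of enabling reuse for a session follows. The value 10 is an illustrative choice for N, and the property shown is the classic MRv1 name; on YARN-based (MRv2) clusters this setting may have no effect, so treat it as an assumption to check against the cluster's Hadoop version.

```sql
-- Let each JVM run up to 10 tasks of the same job before it exits.
-- A value of 1 disables reuse; -1 means no limit.
SET mapred.job.reuse.jvm.num.tasks=10;
```

Because reserved slots are held until the job finishes, a moderate N is usually safer than unlimited reuse on a shared cluster.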