Explanation of spark common parameters
2022-06-11 02:34:00 【hzp666】
Spark's default configuration file is located at $SPARK_CONF_DIR/spark-defaults.conf on the bastion host; users can inspect it to understand the defaults.
Note that the default values have the lowest priority: if a user explicitly specifies a configuration when submitting a task or in code, the user's configuration takes precedence. Once the meaning of a parameter is understood, it can be adjusted for the specific job (by modifying the --conf submission parameter, not the spark-defaults.conf file).
The common parameters below can be set with --conf XXX=Y. For other parameters and descriptions, refer to Configuration - Spark 3.2.1 Documentation.
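As a quick illustration of the --conf XXX=Y syntax, here is a minimal sketch; the script name and the chosen value are placeholders for illustration, not recommendations from this article.

```shell
# Minimal sketch: override one default at submit time with --conf KEY=VALUE.
# A value passed this way takes precedence over spark-defaults.conf.
# "your_job.py" is a placeholder for the actual application.
spark-submit --conf spark.sql.shuffle.partitions=400 your_job.py
```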
| Parameter name | Recommended value | Description |
|---|---|---|
| spark.master | yarn | Which resource scheduler to use; yarn in general. local can be used for local debugging. |
| spark.submit.deployMode | cluster | Where the driver program runs. client can be used for debugging; cluster is recommended for production jobs. |
| spark.driver.cores | 4 | Maximum number of CPU cores (threads) used by the driver. |
| spark.driver.memory | 4-10g | Memory requested for the driver. |
| spark.executor.memory | See: Spark task tuning techniques | Heap memory requested per executor. |
| spark.python.worker.memory | spark.executor.memory / 2 | The default value is generally used. |
| spark.yarn.executor.memoryOverhead | 3072 | Off-heap memory requested per executor; the default value is generally used. |
| spark.executor.cores | See: Spark task tuning techniques | Maximum number of concurrent tasks per executor. |
| spark.executor.instances | See: Spark task tuning techniques | Number of executors. |
| spark.speculation | Default: false | Speculative execution; defaults to false (off). If a job occasionally gets stuck, try enabling it. |
| spark.default.parallelism | See: Spark task tuning techniques | Controls the default number of RDD partitions. When reading HDFS files, the partition count is determined by the block size and whether the input is merged. |
| spark.sql.shuffle.partitions | | Number of shuffle partitions when executing SQL or SQL-style operators; increase this value when the data volume is large. |
| spark.pyspark.python | python2 / python3 / python3.5 | Specifies the Python version used by pyspark (if you use a Docker image, confirm that the corresponding version exists in the image; the platform base image only contains python2). |
| spark.log.level | Default: info | One of ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL, OFF; case-insensitive. |
| spark.sql.hive.mergeFiles | Default: false | When enabled, automatically merges the small files generated by spark-sql. |
| spark.hadoop.jd.bdp.streaming.monitor.enable | Default: false | Whether to enable the batch-backlog alert for streaming jobs; defaults to false. It can be turned on with --conf spark.hadoop.jd.bdp.streaming.monitor.enable=true. |
| spark.hadoop.jd.bdp.batch.threshold | Default: 10 | Batch-backlog alert threshold for streaming jobs; defaults to 10 and can be adjusted as needed, e.g. --conf spark.hadoop.jd.bdp.batch.threshold=20. |
| spark.hadoop.jd.bdp.user.define.erps | The alert group configured by the platform is used by default | For metrics that only the user needs to watch, such as the streaming batch backlog, the user can define a custom alert group, e.g. --conf spark.hadoop.jd.bdp.user.define.erps="baibing12\|maruilei" (note: multiple people can be configured; separate adjacent ERPs with a vertical bar \|). |
| spark.isLoadHivercFile / spark.sql.tempudf.ignoreIfExists | Default: false | Whether to load all Hive UDFs (only supported under spark-sql; not supported with spark-submit or pyspark). Already enabled inside HiveTask, so users need no extra settings. |
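Putting several of the parameters above together, a cluster-mode PySpark submission could look like the sketch below. This is only an assumed example: the executor sizing numbers and the script name are illustrative, since the article defers executor memory/cores/instances to its "Spark task tuning techniques" reference.

```shell
# Hypothetical sketch combining parameters from the table above.
# Driver settings follow the table's recommended values; executor sizing is illustrative only.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.cores=4 \
  --conf spark.driver.memory=6g \
  --conf spark.executor.memory=6g \
  --conf spark.executor.cores=4 \
  --conf spark.executor.instances=20 \
  --conf spark.yarn.executor.memoryOverhead=3072 \
  --conf spark.pyspark.python=python3 \
  --conf spark.speculation=true \
  your_job.py  # placeholder application script
```

spark.speculation is included here only because the table suggests trying it when a job occasionally gets stuck; for most jobs the default (false) is fine.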