Spark submission parameters -- use of files
2022-07-25 15:15:00 【The south wind knows what I mean】
Project scenario:
We have two clusters, a computing cluster and a storage cluster. The requirement: a Spark job running on the computing cluster reads data from Kafka and writes it into Hive on the storage cluster.
Problem description:
Reading and writing data across clusters mostly works: we tested writing to HBase, and the job on the computing cluster can write into the storage cluster's HBase without problems.
But as soon as the job writes to Hive, the data never lands in the storage cluster's Hive; every run writes into the computing cluster's Hive instead.
What I found hard to understand: when testing from IDEA, the data could be written into the storage cluster's Hive, yet once the job was deployed to the cluster via DolphinScheduler, the writes went astray into the computing cluster's Hive. My resources folder contains the storage cluster's core-site.xml, hdfs-site.xml, and hive-site.xml, and the code also calls a changeNameNode-style method (changeHDFSConf, below). Even so, at runtime the program never switched over to the storage cluster's NameNode.
/**
 * @Author: lzx
 * @Description: switch the job's Hadoop configuration to the storage cluster's HA NameNodes
 * @Date: 2022/5/27
 * @Param session: the SparkSession that has already been built
 * @Param nameSpace: the nameservice (namespace) of the target cluster
 * @Param nn1: ID of NameNode 1
 * @Param nn1Addr: RPC address (host:port) of NameNode 1
 * @Param nn2: ID of NameNode 2
 * @Param nn2Addr: RPC address (host:port) of NameNode 2
 * @return: void
 **/
def changeHDFSConf(session: SparkSession, nameSpace: String, nn1: String, nn1Addr: String, nn2: String, nn2Addr: String): Unit = {
  val sc: SparkContext = session.sparkContext
  sc.hadoopConfiguration.set("fs.defaultFS", s"hdfs://$nameSpace")
  sc.hadoopConfiguration.set("dfs.nameservices", nameSpace)
  sc.hadoopConfiguration.set(s"dfs.ha.namenodes.$nameSpace", s"$nn1,$nn2")
  sc.hadoopConfiguration.set(s"dfs.namenode.rpc-address.$nameSpace.$nn1", nn1Addr)
  sc.hadoopConfiguration.set(s"dfs.namenode.rpc-address.$nameSpace.$nn2", nn2Addr)
  sc.hadoopConfiguration.set(s"dfs.client.failover.proxy.provider.$nameSpace", "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
}
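For reference, a minimal sketch of how the method might be invoked. The nameservice name, NameNode IDs, addresses, and port below are placeholders, not values from this post (only node118 is known, from the analysis below, to belong to the storage cluster); substitute your own cluster's settings:

// Hypothetical values for illustration only
changeHDFSConf(
  session = spark,                         // an existing SparkSession
  nameSpace = "storage-ns",                // placeholder nameservice of the storage cluster
  nn1 = "nn1", nn1Addr = "node118:8020",   // 8020 is a common NameNode RPC port
  nn2 = "nn2", nn2Addr = "node119:8020"
)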
Cause analysis:
1. First I opened the Spark UI, went to the Hadoop parameters under Environment, and searched for nn1 to check whether my changeHDFSConf method had taken effect.
2. The value of dfs.namenode.http-address.hr-hadoop.nn1 was still node03 (computing cluster) rather than node118 (storage cluster), which shows the method had not taken effect.
Why did it not take effect?
Configuration conf = new Configuration();
When a Configuration object is created, its constructor by default loads two Hadoop configuration files, hdfs-site.xml and core-site.xml, which hold the parameter values needed to access HDFS.
My code does exactly this, so why were my files not loaded?
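One way to see which resource actually supplied a value is Configuration.getPropertySources, which reports where each property was read from. A minimal sketch (fs.defaultFS is just an example key):

import org.apache.hadoop.conf.Configuration

val conf = new Configuration()
// Returns the chain of resources the value came from, e.g. "core-default.xml"
// or "core-site.xml"; null if the key is unset.
Option(conf.getPropertySources("fs.defaultFS")).foreach(_.foreach(println))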
3. After some analysis I found the reason: when the code is submitted to the cluster for execution, it loads the cluster's own core-site.xml/hdfs-site.xml and simply discards the configuration files packaged with the code.
Solution:
1. In the code, explicitly add your own configuration files so they take the place of the cluster's, letting the job locate the storage cluster:
val hadoopConf: Configuration = new Configuration()
hadoopConf.addResource("hdfs-site.xml")
hadoopConf.addResource("core-site.xml")
If both configuration resources contain the same configuration item and the item in the earlier resource is not marked final, the later resource overrides the earlier one. In the example above, settings in core-site.xml override the same-named settings in core-default.xml. If an item in the first resource (core-default.xml) is marked final, a warning is issued when the second resource is loaded and the override is ignored.
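The override rule is easy to demonstrate in isolation. A minimal, self-contained sketch using two in-memory resources (the key my.key is made up for the demo):

import java.io.ByteArrayInputStream
import org.apache.hadoop.conf.Configuration

val first  = "<configuration><property><name>my.key</name><value>A</value></property></configuration>"
val second = "<configuration><property><name>my.key</name><value>B</value></property></configuration>"

val conf = new Configuration(false) // false: skip loading the default resources
conf.addResource(new ByteArrayInputStream(first.getBytes("UTF-8")))
conf.addResource(new ByteArrayInputStream(second.getBytes("UTF-8")))

// The later resource wins because my.key is not marked <final>true</final>
// in the first one; if it were, the override would be ignored with a warning.
println(conf.get("my.key")) // prints "B"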
2. Step 1 above alone is not enough. As explained, once the job is packaged and run on the cluster, the core-site.xml/hdfs-site.xml under the resources folder are discarded, so .addResource("hdfs-site.xml") cannot find our own files and falls back to the cluster's configuration files.
3. Place the two configuration files in a directory on the submitting machine, and when submitting the Spark job, point to them in the submission parameters:
--files /srv/udp/2.0.0.0/spark/userconf/hdfs-site.xml,/srv/udp/2.0.0.0/spark/userconf/core-site.xml \
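For context, here is how the flag sits in a full spark-submit invocation. Apart from the --files line taken from above, the master, deploy mode, class name, and jar path are placeholders:

spark-submit \
--master yarn \
--deploy-mode cluster \
--class com.example.Kafka2Hive \
--files /srv/udp/2.0.0.0/spark/userconf/hdfs-site.xml,/srv/udp/2.0.0.0/spark/userconf/core-site.xml \
/path/to/your-job.jar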
4. A further note on files shipped with --files:
If the files already reside on the same file system as the cluster being submitted to, the client reports that the source and destination file systems are the same, and no copy is triggered:
INFO Client: Source and destination file systems are the same. Not copying
If they reside on a different file system, the source files are uploaded from the source path to the current file storage system:
INFO Client: Uploading resource
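On the YARN side, the files listed in --files are localized into each container's working directory, so the addResource calls from step 1 can then resolve them. A minimal sketch, assuming YARN cluster mode; loading via a relative local Path targets the working directory explicitly instead of relying on a classpath lookup:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

val hadoopConf = new Configuration()
// The shipped files sit in the container's working directory under their
// original names, so a relative local Path resolves to them.
hadoopConf.addResource(new Path("core-site.xml"))
hadoopConf.addResource(new Path("hdfs-site.xml"))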