Spark submission parameters -- using --files
2022-07-25 15:15:00 【The south wind knows what I mean】
Project scenario:
We have two clusters (a computing cluster and a storage cluster). The requirement: a Spark job running on the computing cluster reads data from Kafka and writes it into the storage cluster's Hive.
Problem description:
For cross-cluster reads and writes, we first tested HBase: data could be written from the computing cluster into the storage cluster's HBase without any problem.
But when writing to Hive, the data never lands in the storage cluster's Hive; every run writes into the computing cluster's Hive instead.
I found this hard to understand. When testing from IDEA the job does write to the storage cluster's Hive, but once it is packaged, put onto DolphinScheduler and run on the cluster, the writes go astray and end up in the computing cluster's Hive. My resources folder already contains the storage cluster's core-site.xml, hdfs-site.xml and hive-site.xml, and I also wrote a changeNameNode-style method in the code (below), yet at runtime the program still could not switch over to the storage cluster's NameNode.
/**
 * @Author: lzx
 * @Date: 2022/5/27
 * @Description: point the SparkContext's Hadoop configuration at another HDFS nameservice
 * @Param session:   the SparkSession that has already been built
 * @Param nameSpace: nameservice (namespace) of the target cluster
 * @Param nn1:       ID of nn1
 * @Param nn1Addr:   address corresponding to nn1 (host:port)
 * @Param nn2:       ID of nn2
 * @Param nn2Addr:   address corresponding to nn2 (host:port)
 * @return: void
 **/
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

def changeHDFSConf(session: SparkSession, nameSpace: String, nn1: String, nn1Addr: String, nn2: String, nn2Addr: String): Unit = {
  val sc: SparkContext = session.sparkContext
  sc.hadoopConfiguration.set("fs.defaultFS", s"hdfs://$nameSpace")
  sc.hadoopConfiguration.set("dfs.nameservices", nameSpace)
  sc.hadoopConfiguration.set(s"dfs.ha.namenodes.$nameSpace", s"$nn1,$nn2")
  sc.hadoopConfiguration.set(s"dfs.namenode.rpc-address.$nameSpace.$nn1", nn1Addr)
  sc.hadoopConfiguration.set(s"dfs.namenode.rpc-address.$nameSpace.$nn2", nn2Addr)
  sc.hadoopConfiguration.set(s"dfs.client.failover.proxy.provider.$nameSpace", "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
}
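For reference, a hypothetical call to this method could look like the following; the nameservice ID and NameNode addresses are placeholders, not the real cluster values:

// Hypothetical invocation -- all values below are made-up placeholders
changeHDFSConf(
  session   = spark,                          // the SparkSession already built
  nameSpace = "storage-cluster",              // nameservice of the storage cluster
  nn1       = "nn1",
  nn1Addr   = "storage-nn1.example.com:8020",
  nn2       = "nn2",
  nn2Addr   = "storage-nn2.example.com:8020"
)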
Cause analysis:
1. First I opened the Environment tab of the Spark UI and looked at the Hadoop parameters, searching for nn1 to see whether my changeHDFSConf method had taken effect.
2. The result: dfs.namenode.http-address.hr-hadoop.nn1 was still node03 (the computing cluster), not node118 (the storage cluster), so the method had not taken effect. (The same check can also be done in code; see the sketch below.)
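A minimal sketch of that programmatic check, assuming the nameservice is named hr-hadoop as in the parameter above and that spark is the active SparkSession:

// Print the effective HDFS settings to see which cluster the job is pointed at
val hc = spark.sparkContext.hadoopConfiguration
println(hc.get("fs.defaultFS"))
println(hc.get("dfs.namenode.http-address.hr-hadoop.nn1"))
println(hc.get("dfs.namenode.rpc-address.hr-hadoop.nn1"))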
Why didn't it take effect?
Configuration conf = new Configuration();
When a Configuration object is created, its constructor loads Hadoop's configuration files by default, namely core-site.xml and hdfs-site.xml, and these files carry the parameters needed to access HDFS.
I do have this in my code, so why wasn't my configuration loaded?
3. After some analysis I found the reason: when the code is submitted to the cluster for execution, it loads the core-site.xml / hdfs-site.xml on the cluster and simply discards the configuration files packaged with the code.
Solution:
1. In the code, override the cluster's configuration files with your own, so that the job can resolve the storage cluster's addresses:
import org.apache.hadoop.conf.Configuration

// addResource looks the named file up on the classpath
val hadoopConf: Configuration = new Configuration()
hadoopConf.addResource("hdfs-site.xml")
hadoopConf.addResource("core-site.xml")
If two configuration resources contain the same configuration item and the item in the earlier resource is not marked final, the later resource overrides the earlier one. In the example above, settings in core-site.xml override same-named settings in core-default.xml. If an item in the first resource (core-default.xml) is marked final, a warning is emitted when the second resource is loaded. A small sketch of this override behavior follows.
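This is only an illustration of the override order; the file names and property values are invented for the example:

// Illustration only: later resources override earlier ones unless the item is marked final
val conf = new org.apache.hadoop.conf.Configuration(false) // skip loading the default resources
conf.addResource("defaults-example.xml")  // suppose it sets fs.defaultFS = hdfs://computing-cluster
conf.addResource("override-example.xml")  // suppose it sets fs.defaultFS = hdfs://storage-cluster
println(conf.get("fs.defaultFS"))         // would print the value from the later resource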
2. The approach above is not enough on its own. As mentioned earlier, once the job is packaged and run on the cluster, the core-site.xml / hdfs-site.xml under the resources folder are discarded, so .addResource("hdfs-site.xml") cannot find my own files and falls back to the cluster's configuration files.
3. Place the two configuration files in a directory on the machine you submit from, and when submitting the Spark task, pass them in the submit parameters:
--files /srv/udp/2.0.0.0/spark/userconf/hdfs-site.xml,/srv/udp/2.0.0.0/spark/userconf/core-site.xml \
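For context, a full submit command might look roughly like the following; the master, deploy mode, class name and jar path are placeholders for illustration, only the --files parameter is the point here:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.Kafka2Hive \
  --files /srv/udp/2.0.0.0/spark/userconf/hdfs-site.xml,/srv/udp/2.0.0.0/spark/userconf/core-site.xml \
  /path/to/your-job.jar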
4. Additional notes:
Files distributed via --files:
If the files are on the same filesystem as the cluster you are submitting to, the client reports that the source and destination filesystems are the same and no copy is triggered:
INFO Client: Source and destination file systems are the same. Not copying
If they are on a different filesystem from the submission cluster's, the source files are uploaded from the source path to the cluster's filesystem:
INFO Client: Uploading resource
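To tie this back to step 1: once --files has shipped the two XML files, they sit in the working directory of the driver and executor containers (which YARN normally puts on the classpath), so the addResource calls can now find them. A minimal sketch of the runtime side, assuming the SparkSession has already been built as spark:

// The files distributed with --files land in the container's working directory,
// so loading them by name now picks up the storage cluster's settings
// instead of the computing cluster's defaults
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.addResource("core-site.xml")
hadoopConf.addResource("hdfs-site.xml")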