Basic configuration and use of Spark
2022-07-06 17:39:00 【Bald Second Senior brother】
Contents:
Configuring Spark's three modes, and the basics of using Spark
Spark's three modes
Local mode
Ways to set the master in local mode: local (a single thread by default), local[k] (a specified number of threads), local[*] (as many threads as the machine has CPU cores). The executing threads act as the workers.
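The three master strings can be sketched with Spark's `SparkConf` API, the same one used in the WordCount example later in this post. This is only an illustration: the object name `LocalMasters`, the app names, and the thread count 4 are made up here.

```scala
import org.apache.spark.SparkConf

// Hypothetical holder object; only the master strings come from the text above.
object LocalMasters {
  // "local": run everything in a single thread
  val single: SparkConf = new SparkConf().setMaster("local").setAppName("LocalSingle")

  // "local[4]": run with exactly 4 threads (4 is an arbitrary example value)
  val fixed: SparkConf = new SparkConf().setMaster("local[4]").setAppName("LocalFixed")

  // "local[*]": use as many threads as the machine has logical cores
  val allCores: SparkConf = new SparkConf().setMaster("local[*]").setAppName("LocalAllCores")
}
```

Any of these configurations can be passed to `new SparkContext(...)` to start a local run.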
Configuration:
Local mode requires no changes to the configuration files. Once Spark is installed, you can use local mode directly to compute and analyze data.
Standalone mode
It runs in much the same way as Hadoop's ResourceManager.

Configuration files

Modify this file. If it does not exist, rename its template file.
Changes:

Add the host names, just as in Hadoop. They define the machines that serve the cluster.

This file also needs to be modified. If it does not exist, rename its template file.
Changes:

This specifies which machine the master runs on, along with its port number.
The next step is to send the configuration files to every host listed in slaves.

The change to this file specifies the Java installation path.
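Once the standalone master has a host and port, an application can target it with a `spark://host:port` master URL. The sketch below assumes a placeholder host name `master-host` and port 7077 (Spark's default master port); substitute whatever was set in your own configuration. The object name `StandaloneConf` is also made up.

```scala
import org.apache.spark.SparkConf

// Hypothetical example: "master-host" and 7077 are placeholders, not values from this post.
object StandaloneConf {
  // Standalone master URLs have the form spark://<host>:<port>
  val masterUrl: String = "spark://master-host:7077"

  val conf: SparkConf = new SparkConf()
    .setMaster(masterUrl)
    .setAppName("StandaloneExample")
}
```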

History server
To add a history server, modify this configuration file and add the following content:


Note: uncomment these lines.


As above, this configuration file also needs to be modified.
Create the directory in HDFS, and make sure HDFS is running.
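For the history server to have anything to show, applications must write event logs to the HDFS directory created above. The same settings can be applied per application through `SparkConf`, as in this rough sketch. The NameNode address `namenode-host:9000` and the path `/directory` are assumptions for illustration; use the values from your own configuration.

```scala
import org.apache.spark.SparkConf

// Hypothetical example: the HDFS address and path below are placeholders.
object HistoryConf {
  val eventLogDir: String = "hdfs://namenode-host:9000/directory"

  val conf: SparkConf = new SparkConf()
    .setAppName("HistoryExample")
    // Turn on event logging so finished applications can be replayed
    .set("spark.eventLog.enabled", "true")
    // Where event logs are written; the directory must already exist
    .set("spark.eventLog.dir", eventLogDir)
}
```

The history server itself reads from the location given by `spark.history.fs.logDirectory`, which should point at the same directory.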
HA

Modify the configuration file:

Add the following content:

Specify the location of ZooKeeper and of HDFS. At the same time, comment out:

to avoid a conflict.
Then distribute the configuration files to the other hosts.
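With ZooKeeper-based HA there is more than one master, and an application can list all of them in a single master URL of the form `spark://host1:port1,host2:port2` so that it can find whichever one is currently active. A small sketch of building such a URL follows; the helper `HaMasterUrl.build` and the host names in the usage note are hypothetical.

```scala
// Hypothetical helper for illustration; not part of Spark.
object HaMasterUrl {
  // Join every master as host:port, comma-separated, behind a single spark:// prefix
  def build(hosts: Seq[String], port: Int): String = {
    require(hosts.nonEmpty, "at least one master host is required")
    hosts.map(host => s"$host:$port").mkString("spark://", ",", "")
  }
}
```

For example, `HaMasterUrl.build(Seq("master1", "master2"), 7077)` yields `spark://master1:7077,master2:7077`, which can be passed to `setMaster`.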
YARN mode

Benefit: you do not need to build a separate Spark cluster.
Configuration files:
1.

Modify the YARN configuration file in Hadoop.
Changes:

This setting turns off the checks that kill tasks using too much memory. Without it, Spark is shut down automatically once its memory use exceeds a certain limit.
2. Modify

Changes:

Note
Before switching to a different mode, comment out or delete the settings for the other modes in the configuration files.

Running the official example

Note that the master value is different for each mode, and there are a few other differences as well.
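To make the per-mode difference concrete, here is a rough sketch that maps a mode name to the master value it would typically use. The helper `MasterUrls.masterFor` is hypothetical, and the standalone host and port are placeholders rather than values from this post.

```scala
// Hypothetical helper for illustration; not part of Spark.
object MasterUrls {
  // Placeholder host and port for the standalone master; replace with your own.
  val standaloneHost: String = "master-host"
  val standalonePort: Int = 7077

  // Return the master value that corresponds to each deployment mode
  def masterFor(mode: String): String = mode.toLowerCase match {
    case "local"      => "local[*]"
    case "standalone" => s"spark://$standaloneHost:$standalonePort"
    case "yarn"       => "yarn"
    case other        => throw new IllegalArgumentException(s"Unknown mode: $other")
  }
}
```

The returned string can be given to `setMaster` in code or used as the master value on the command line.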
Using the API
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // WordCount in local mode
    // Create the SparkConf object and set the deployment environment (master)
    val config = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    // Create the Spark context object
    val sc = new SparkContext(config)
    // Read the input line by line (use file:///xxxxx for local paths)
    val lines = sc.textFile("file:///opt/module/ha/spark/in")
    // Split each line into individual words
    val words = lines.flatMap(_.split(" "))
    // Map each word to a (word, 1) pair
    val wordToOne = words.map((_, 1))
    // Sum the counts for each word
    val wordToSum = wordToOne.reduceByKey(_ + _)
    // Collect the results and print each pair
    // (printing the array itself would only show its object reference)
    wordToSum.collect().foreach(println)
    // Stop the context to release resources
    sc.stop()
  }
}
Summary:
Today I learned how to use Spark, including how to run the official example, and how to configure its three modes. Spark's configuration files are fairly simple, but it is inconvenient that the settings for different modes cannot coexist. The command for running the official example is also too complex to remember easily, so I will probably still need to check the documentation later. Using Spark from Java is somewhat cumbersome as well, but much simpler than the command line.