Saving Spark data to external data sources
2022-06-29 05:25:00 【wx5ba7ab4695f27】
Save as SequenceFile
    package write

    import org.apache.hadoop.io.compress.GzipCodec
    import org.apache.spark.{SparkConf, SparkContext}

    object saveToSeq {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[*]")
          .setAppName("saveToSeq")
        val sc = new SparkContext(conf)
        val data = List(("name", "xiaoming"), ("age", "18"))
        // A single partition, so a single part file is written
        val rddData = sc.parallelize(data, 1)
        // Write the pairs as a gzip-compressed SequenceFile
        rddData.saveAsSequenceFile("D:\\studyplace\\sparkBook\\chapter4\\result\\1", Some(classOf[GzipCodec]))
        sc.stop()
      }
    }
The first parameter of the saveAsSequenceFile API is the output path; the second parameter sets the compression codec.
The classOf[xxxCodec] object must be wrapped in an Option before it is passed to saveAsSequenceFile. In Scala, Option has two cases: Some, which holds a value, and None, which means there is no value.
Among the compression codecs, GzipCodec has a fairly high compression ratio, so it is a reasonable choice when disk space is short. Bzip2 compresses even further, but it is not suitable for data that is read and written frequently.
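The Option wrapping described above can be sketched as a small helper that turns a codec name into the value saveAsSequenceFile expects. CodecChoice and codecFor are hypothetical names introduced only for illustration, and the sketch assumes the Hadoop classes are on the classpath, as they normally are in a Spark application.

```scala
import org.apache.hadoop.io.compress.{BZip2Codec, CompressionCodec, GzipCodec}

object CodecChoice {
  // Hypothetical helper: map a short codec name to the Option that
  // saveAsSequenceFile takes as its second argument.
  def codecFor(name: String): Option[Class[_ <: CompressionCodec]] =
    name.toLowerCase match {
      case "gzip"  => Some(classOf[GzipCodec])  // good ratio, moderate speed
      case "bzip2" => Some(classOf[BZip2Codec]) // higher ratio, slower
      case _       => None                      // no compression
    }
}
```

With this helper, a call such as rddData.saveAsSequenceFile(path, CodecChoice.codecFor("gzip")) would behave like the example above, while an unknown name falls back to None and writes uncompressed output.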
Save to HDFS
- saveAsTextFile
  Internally it calls the saveAsHadoopFile method, passing TextOutputFormat as the implementation class that formats the output data.
- saveAsHadoopFile
  It checks the URI scheme of the output path: a path that starts with file:/// is saved to the local file system, and a path whose scheme is hdfs:// is written to a file in HDFS.
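To make the scheme check concrete, here is a minimal sketch in plain Scala that reads the scheme of an output path with java.net.URI. TargetFs and describe are hypothetical names, and the host namenode:8020 in the usage note is made up; Hadoop's own path resolution is more involved, so this only illustrates the idea.

```scala
import java.net.URI

object TargetFs {
  // Hypothetical helper: report which file system a save path points at,
  // based only on the URI scheme.
  def describe(path: String): String =
    Option(new URI(path).getScheme) match {
      case Some("file") => "local file system"
      case Some("hdfs") => "HDFS"
      case Some(other)  => s"file system registered for scheme '$other'"
      // A path without a scheme is resolved against Hadoop's fs.defaultFS setting
      case None         => "default file system (fs.defaultFS)"
    }
}
```

For example, TargetFs.describe("file:///tmp/out") returns "local file system" and TargetFs.describe("hdfs://namenode:8020/out") returns "HDFS".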
    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    object saveTohadoop {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("saveTohadoop").setMaster("local[*]")
        val sc = new SparkContext(conf)
        val rddData = sc.parallelize(List(("cat", 20), ("dog", 29), ("pig", 11)), 1)
        // "path" is a placeholder for the output location
        rddData.saveAsNewAPIHadoopFile("path", classOf[Text], classOf[IntWritable], classOf[TextOutputFormat[Text, IntWritable]])
        sc.stop()
      }
    }

Save to MySQL
    package write

    import java.sql.DriverManager
    import org.apache.spark.{SparkConf, SparkContext}

    object saveToMySQL {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[*]").setAppName("saveToMySQL")
        val sc = new SparkContext(conf)
        // Connector/J 5.x driver class; Connector/J 8.x names it com.mysql.cj.jdbc.Driver
        Class.forName("com.mysql.jdbc.Driver")
        val rddData = sc.parallelize(List(("tom", 11), ("jettty", 19)))
        rddData.foreachPartition((iter: Iterator[(String, Int)]) => {
          // Open one connection per partition, on the executor
          val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/spark?useUnicode=true&characterEncoding=utf-8", "root", "123456")
          try {
            conn.setAutoCommit(false)
            val statement = conn.prepareStatement("insert into spark.person (name,age) VALUES (?,?);")
            iter.foreach(t => {
              statement.setString(1, t._1)
              statement.setInt(2, t._2)
              statement.addBatch()
            })
            // Send the whole partition as one batch, then commit it
            statement.executeBatch()
            conn.commit()
          } finally {
            conn.close() // release the connection even if the batch fails
          }
        })
        sc.stop()
      }
    }
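When saving the data, use the foreachPartition method to iterate over every partition of the RDD.
Note: the DriverManager.getConnection call must be placed inside foreachPartition, because a JDBC connection cannot be serialized and has to be created on the executor that processes the partition.
conn.setAutoCommit(false) turns off auto-commit, which is better suited to batch operations on large amounts of data.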
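For larger partitions, one possible refinement of the points above is to flush the batch every fixed number of rows instead of once per partition. The sketch below is an assumption, not part of the original example: JdbcPartitionWriter, write, the open factory parameter and the batchSize default of 1000 are hypothetical, and only the spark.person table and its columns come from the code above.

```scala
import java.sql.Connection

object JdbcPartitionWriter {
  // Writes one partition's rows in chunks of `batchSize` and returns the
  // number of rows sent. `open` creates the connection, so it is only
  // ever called on the executor that processes the partition.
  def write(rows: Iterator[(String, Int)],
            open: () => Connection,
            batchSize: Int = 1000): Int =
    if (rows.isEmpty) 0 // empty partition: do not open a connection at all
    else {
      val conn = open()
      try {
        conn.setAutoCommit(false)
        val stmt = conn.prepareStatement(
          "INSERT INTO spark.person (name, age) VALUES (?, ?)")
        var sent = 0
        rows.grouped(batchSize).foreach { chunk =>
          chunk.foreach { case (name, age) =>
            stmt.setString(1, name)
            stmt.setInt(2, age)
            stmt.addBatch()
          }
          stmt.executeBatch() // send this chunk to the database
          conn.commit()       // commit chunk by chunk
          sent += chunk.size
        }
        sent
      } catch {
        case e: Exception =>
          conn.rollback() // discard the uncommitted chunk
          throw e
      } finally {
        conn.close() // also closes the statement created on it
      }
    }
}
```

It could be called as rddData.foreachPartition { iter => JdbcPartitionWriter.write(iter, () => DriverManager.getConnection(url, user, password)); () }, where url, user and password stand for the connection settings used earlier. Only the factory function is shipped to the executors, so no connection object ever needs to be serialized.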