Spark saving to external data source
2022-06-29 05:25:00 【wx5ba7ab4695f27】
Save as sequenceFile
package write

import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.{SparkConf, SparkContext}

object saveToSeq {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]")
      .setAppName("saveToSeq")
    val sc = new SparkContext(conf)
    val data = List(("name", "xiaoming"), ("age", "18"))
    // A single partition, so a single part file is written
    val rddData = sc.parallelize(data, 1)
    // Write the key/value pairs as a Gzip-compressed SequenceFile
    rddData.saveAsSequenceFile("D:\\studyplace\\sparkBook\\chapter4\\result\\1", Some(classOf[GzipCodec]))
    sc.stop()
  }
}
In the saveAsSequenceFile API, the first parameter is the path to save the file to, and the second parameter sets the compression codec.
The classOf[xxxCodec] object must be wrapped in an Option before it is passed to saveAsSequenceFile. In Scala, Option has two cases: Some, which holds a value, and None, which means there is no value.
Among the compression codecs, GzipCodec has a fairly high compression ratio, so it is a good choice when disk space is limited. Bzip compresses even more, but it is not suitable for scenarios with frequent reads and writes.
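As a minimal sketch of how the Option argument can be built, the helper below maps a codec name to the value passed as the second parameter. The object name `CodecChoice`, the helper `codecFor`, and the use of `args(0)` (output directory) and `args(1)` (codec name) are assumptions made for illustration; they are not part of the original example.

```scala
import org.apache.hadoop.io.compress.{BZip2Codec, CompressionCodec, GzipCodec}
import org.apache.spark.{SparkConf, SparkContext}

object CodecChoice {
  // Hypothetical helper: turn a codec name into the Option that
  // saveAsSequenceFile expects. None means "write uncompressed".
  def codecFor(name: String): Option[Class[_ <: CompressionCodec]] = name match {
    case "gzip"  => Some(classOf[GzipCodec])
    case "bzip2" => Some(classOf[BZip2Codec])
    case _       => None
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("CodecChoice")
    val sc = new SparkContext(conf)
    try {
      val rddData = sc.parallelize(List(("name", "xiaoming"), ("age", "18")), 1)
      // Assumption: args(0) is the output directory, args(1) the codec name
      rddData.saveAsSequenceFile(args(0), codecFor(args(1)))
    } finally {
      sc.stop()
    }
  }
}
```

Passing None (or omitting the second argument, which defaults to None) writes the SequenceFile without compression.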
Save to HDFS
- saveAsTextFile
Essentially calls the saveAsHadoopFile method.
- saveAsHadoopFile
The URI scheme decides where the data goes: a path starting with file:/// saves the data to the local file system, and a path whose scheme is hdfs:// writes the data to a file in HDFS.
When it is called through saveAsTextFile, saveAsHadoopFile uses the TextOutputFormat implementation class to format the output data.
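The scheme-based choice described above can be sketched as follows. The object name `SaveByScheme`, the helper `target`, and the use of `args(0)` as the output URI are assumptions for illustration; the helper only reports what the scheme implies, while the actual resolution is done by Hadoop's file system layer. It expects URI-style paths, not Windows paths with backslashes.

```scala
import java.net.URI
import org.apache.spark.{SparkConf, SparkContext}

object SaveByScheme {
  // Hypothetical helper: describe where a path will be written,
  // based only on its URI scheme.
  def target(path: String): String =
    Option(new URI(path).getScheme) match {
      case Some("file") => "local file system"
      case Some("hdfs") => "HDFS"
      case Some(other)  => s"file system registered for '$other'"
      case None         => "default file system (fs.defaultFS)"
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SaveByScheme")
    val sc = new SparkContext(conf)
    try {
      // Assumption: args(0) is an output URI such as file:///... or hdfs://...
      val out = args(0)
      println(s"Writing to ${target(out)}")
      sc.parallelize(List("cat", "dog", "pig"), 1).saveAsTextFile(out)
    } finally {
      sc.stop()
    }
  }
}
```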
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

object saveTohadoop {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("saveTohadoop").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val rddData = sc.parallelize(List(("cat",20),("dog",29),("pig",11)),1)
    // "path" is a placeholder for the output path; its URI scheme selects the file system
    rddData.saveAsNewAPIHadoopFile("path",classOf[Text],classOf[IntWritable],classOf[TextOutputFormat[Text,IntWritable]])
    sc.stop()
  }
}

Save to MySQL
package write

import java.sql.DriverManager
import org.apache.spark.{SparkConf, SparkContext}

object saveToMySQL {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("saveToMySQL")
    val sc = new SparkContext(conf)
    Class.forName("com.mysql.jdbc.Driver")
    val rddData = sc.parallelize(List(("tom",11),("jettty",19)))
    rddData.foreachPartition((iter: Iterator[(String,Int)]) => {
      // Open the connection inside the partition so it is created where the task runs
      val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/spark?useUnicode=true&characterEncoding=utf-8","root","123456")
      try {
        // Disable auto-commit so the whole batch is committed at once
        conn.setAutoCommit(false)
        val statement = conn.prepareStatement("insert into spark.person (name,age) VALUES (?,?);")
        iter.foreach( t => {
          statement.setString(1,t._1)
          statement.setInt(2,t._2)
          statement.addBatch()
        })
        statement.executeBatch()
        conn.commit()
      } finally {
        // Release the connection even if the batch fails
        conn.close()
      }
    })
    sc.stop()
  }
}
When saving data, use the foreachPartition method to iterate over every partition of the RDD.
Note: DriverManager.getConnection must be called inside foreachPartition, because a JDBC connection cannot be serialized and shipped from the driver to the tasks.
conn.setAutoCommit(false) turns off auto-commit, which is better suited to batch operations on large amounts of data.
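For large partitions, one possible variation is to flush the batch every fixed number of rows instead of accumulating the whole partition in a single batch. The sketch below is an assumption-laden illustration, not the original author's method: the object name `BatchedSaveToMySQL`, the helper `insertInBatches`, and the batch size of 1000 are hypothetical, and committing after each chunk means a failure can leave earlier chunks already written.

```scala
import java.sql.{Connection, DriverManager}
import org.apache.spark.{SparkConf, SparkContext}

object BatchedSaveToMySQL {
  // Hypothetical helper: insert rows in chunks of batchSize and
  // return the number of batches that were executed.
  def insertInBatches(conn: Connection, rows: Iterator[(String, Int)], batchSize: Int): Int = {
    conn.setAutoCommit(false)
    val statement = conn.prepareStatement("insert into spark.person (name,age) VALUES (?,?)")
    try {
      var batches = 0
      // grouped() splits the iterator lazily, so only one chunk is held at a time
      rows.grouped(batchSize).foreach { chunk =>
        chunk.foreach { case (name, age) =>
          statement.setString(1, name)
          statement.setInt(2, age)
          statement.addBatch()
        }
        statement.executeBatch()
        conn.commit() // commit per chunk (a design choice, see the note above)
        batches += 1
      }
      batches
    } finally {
      statement.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("BatchedSaveToMySQL")
    val sc = new SparkContext(conf)
    try {
      val rddData = sc.parallelize(List(("tom", 11), ("jettty", 19)))
      rddData.foreachPartition((iter: Iterator[(String, Int)]) => {
        // The connection is still opened inside foreachPartition
        val conn = DriverManager.getConnection(
          "jdbc:mysql://localhost:3306/spark?useUnicode=true&characterEncoding=utf-8", "root", "123456")
        try insertInBatches(conn, iter, 1000) finally conn.close()
      })
    } finally {
      sc.stop()
    }
  }
}
```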