Learning Spark: Implementing ETL with Spark
2022-07-30 18:45:00 【我爱夜来香A】
I. RDBMS to RDBMS
1. pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>Spark</groupId>
<artifactId>Spark</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.2.0</version>
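</dependency>
<!-- Assumption: the Oracle JDBC driver used by the Main class must be on the classpath; adjust the version to match your database -->
<dependency>
<groupId>com.oracle.database.jdbc</groupId>
<artifactId>ojdbc8</artifactId>
<version>19.3.0.0</version>
</dependency>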
</dependencies>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.2</version>
<executions>
<execution>
<id>compile-scala</id>
<phase>compile</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>test-compile-scala</id>
<phase>test-compile</phase>
<goals>
<goal>add-source</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>2.12.15</scalaVersion>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.2</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>bigData.DataSync</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.6.0</version>
<executions>
<execution>
<goals>
<goal>exec</goal>
</goals>
</execution>
</executions>
<configuration>
<executable>java</executable>
<includeProjectDependencies>true</includeProjectDependencies>
<includePluginDependencies>false</includePluginDependencies>
<classpathScope>compile</classpathScope>
<mainClass>bigData.DataSync</mainClass>
</configuration>
</plugin>
</plugins>
</build>
</project>
2. Main
package bigData

import org.apache.spark.sql.SparkSession
import java.util.Properties

object DataSync {
  def main(args: Array[String]): Unit = {
    // Use the object name as the app name, dropping the '$' that Scala appends to object class names
    val app = this.getClass.getSimpleName.filter(!_.equals('$'))
    // Note: the streaming/Kafka and akka settings below have no effect on this batch JDBC job
    val spark: SparkSession = SparkSession.builder
      .appName(app)
      .master("local[*]")
      .config("spark.shuffle.consolidateFiles", "true")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.streaming.kafka.maxRatePerPartition", "500")
      .config("spark.streaming.stopGracefullyOnShutdown", "true")
      .config("spark.network.timeout", "600")
      .config("spark.streaming.kafka.consumer.poll.ms", "60000")
      .config("spark.core.connection.ack.wait.timeout", "900")
      .config("spark.rpc.message.maxSize", "50")
      .config("spark.akka.timeout", "900")
      .getOrCreate()

    // Source database
    val source_url = "jdbc:oracle:thin:@192.168.0.101:1521/orcl"
    val source_prop = new Properties()
    source_prop.put("user", "scott")
    source_prop.put("password", "scott")
    source_prop.put("driver", "oracle.jdbc.driver.OracleDriver")

    // Target database
    val target_url = "jdbc:oracle:thin:@192.168.0.101:1521/orcl"
    val target_prop = new Properties()
    target_prop.put("user", "scott")
    target_prop.put("password", "scott")
    target_prop.put("driver", "oracle.jdbc.driver.OracleDriver")

    // Read the EMP table from the source and write it to EMP_TEST in the target
    spark.read.jdbc(source_url, "EMP", source_prop).write.jdbc(target_url, "EMP_TEST", target_prop)
  }
}
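3. Results
Note that the code does not check whether the target table already exists. Spark's default save mode is ErrorIfExists, so a second run fails with an error because EMP_TEST was already created by the first run.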
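One way to make the job safe to rerun is to set an explicit save mode on the writer. The sketch below is a minimal illustration, not the author's code: the object name `DataSyncWithSaveMode` and the helper `resolveSaveMode` are hypothetical, and the connection settings are simply copied from the Main listing above. It assumes the Oracle JDBC driver is on the classpath.

```scala
import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

object DataSyncWithSaveMode {
  // Hypothetical helper: maps a command-line flag to a Spark SaveMode.
  // Unrecognized flags fall back to Spark's default, ErrorIfExists.
  def resolveSaveMode(flag: String): SaveMode = flag.toLowerCase match {
    case "overwrite" => SaveMode.Overwrite // replace the existing table contents
    case "append"    => SaveMode.Append    // add rows to the existing table
    case "ignore"    => SaveMode.Ignore    // skip the write if the table exists
    case _           => SaveMode.ErrorIfExists
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("DataSyncWithSaveMode")
      .master("local[*]")
      .getOrCreate()

    // Same connection settings as the Main listing (assumed unchanged)
    val url = "jdbc:oracle:thin:@192.168.0.101:1521/orcl"
    val prop = new Properties()
    prop.setProperty("user", "scott")
    prop.setProperty("password", "scott")
    prop.setProperty("driver", "oracle.jdbc.driver.OracleDriver")

    // Use the first argument as the mode; default to overwrite so reruns succeed
    val mode = resolveSaveMode(args.headOption.getOrElse("overwrite"))

    spark.read.jdbc(url, "EMP", prop)
      .write
      .mode(mode)
      .jdbc(url, "EMP_TEST", prop)

    spark.stop()
  }
}
```

With `overwrite`, each run replaces EMP_TEST with a fresh copy of EMP; with `append`, repeated runs accumulate duplicate rows, so which mode fits depends on whether the sync is a full refresh or an incremental load.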