当前位置:网站首页>Spark学习:用spark实现ETL
Spark学习:用spark实现ETL
2022-07-30 18:45:00 【我爱夜来香A】
一、RDBMS To RDBMS
1、pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>Spark</groupId>
<artifactId>Spark</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.2.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.2</version>
<executions>
<execution>
<id>compile-scala</id>
<phase>compile</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>test-compile-scala</id>
<phase>test-compile</phase>
<goals>
<goal>add-source</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>2.12.15</scalaVersion>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.2</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass></mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.6.0</version>
<executions>
<execution>
<goals>
<goal>exec</goal>
</goals>
</execution>
</executions>
<configuration>
<executable>java</executable>
<includeProjectDependencies>true</includeProjectDependencies>
<includePluginDependencies>false</includePluginDependencies>
<classpathScope>compile</classpathScope>
<mainClass>cn.spark.study.App</mainClass>
</configuration>
</plugin>
</plugins>
</build>
</project>
2、Main
package bigData
import org.apache.spark.sql.SparkSession
import java.util.Properties
object DataSync {
def main(args:Array[String]):Unit = {
val app = s"${
this.getClass.getSimpleName}".filter(!_.equals('$'))
val spark: SparkSession = SparkSession.builder
.appName(app)
.master("local[*]")
.config("spark.shuffle.consolidateFiles", "true")
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.config("spark.streaming.kafka.maxRatePerPartition", "500")
.config("spark.streaming.stopGracefullyOnShutdown", "true")
.config("spark.network.timeout", "600")
.config("spark.streaming.kafka.consumer.poll.ms", "60000")
.config("spark.core.connection.ack.wait.timeout", "900")
.config("spark.rpc.message.maxSize", "50")
.config("spark.akka.timeout", "900")
.getOrCreate()
//数据源
val source_url = s"jdbc:oracle:thin:@192.168.0.101:1521/orcl"
val source_prop = new Properties()
source_prop.put("user","scott")
source_prop.put("password","scott")
source_prop.put("driver","oracle.jdbc.driver.OracleDriver")
//目标库
val target_url = s"jdbc:oracle:thin:@192.168.0.101:1521/orcl"
val target_prop = new Properties()
target_prop.put("user","scott")
target_prop.put("password","scott")
target_prop.put("driver","oracle.jdbc.driver.OracleDriver")
import spark.implicits._
spark.read.jdbc(source_url,"EMP",source_prop).write.jdbc(target_url,"EMP_TEST",target_prop)
}
}
3、运行结果
不过要注意,代码中没有做表已经存在的判断,第二次运行会报错
边栏推荐
- 微信小程序云开发 | 城市信息管理
- ctf.show_web5
- [OC study notes] attribute keyword
- 基于inquirer封装一个控制台文件选择器
- 【PHPWord】Quick Start of PHPWord in PHPOffice Suite
- 【Pointing to Offer】Pointing to Offer 22. The kth node from the bottom in the linked list
- C# wpf borderless window add shadow effect
- 卫星电话是直接与卫星通信还是通过地面站?
- MySQL——基础知识
- ByteArrayInputStream class source code analysis
猜你喜欢

【剑指 Offer】剑指 Offer 22. 链表中倒数第k个节点

Pytorch基础--tensorboard使用(一)

kotlin的by lazy

Delay queue optimization (2)
![[Prometheus] An optimization record of the Prometheus federation [continued]](/img/5d/56e171b7a02584337a0cfe5c731fb2.png)
[Prometheus] An optimization record of the Prometheus federation [continued]

A senior with 13 years of experience in software testing, summed up 5 test employment suggestions....
![【Prometheus】Prometheus联邦的一次优化记录[续]](/img/5d/56e171b7a02584337a0cfe5c731fb2.png)
【Prometheus】Prometheus联邦的一次优化记录[续]

natural language processing nltk

经济新闻:错误# 15:初始化libiomp5md。dll,但发现libiomp5md。已经初始化dll。解决方法

Immersive experience iFLYTEK 2022 Consumer Expo "Official Designated Product"
随机推荐
生物医学论文有何价值 论文中译英怎样翻译效果好
运营 23 年,昔日“国内第一大电商网站”黄了...
ESP8266-Arduino programming example-HC-SR04 ultrasonic sensor driver
scrapy基本使用
kotlin by lazy
7.30模拟赛总结
What is the value of biomedical papers? How to translate the papers into Chinese and English?
不同的路径依赖
Hello, my new name is "Bronze Lock/Tongsuo"
【PHPWord】Quick Start of PHPWord in PHPOffice Suite
Web结题报告
第十六期八股文巴拉巴拉说(MQ篇)
OneFlow source code analysis: Op, Kernel and interpreter
C# wpf 无边框窗口添加阴影效果
怎么样的框架对于开发者是友好的?
MYSQL (Basic) - An article takes you into the wonderful world of MYSQL
MySQL数据类型
CCNA-NAT协议(理论与实验练习)
第14章 类型信息
Recommendation | People who are kind to you, don't repay them by inviting them to eat