Learning Spark: Implementing ETL with Spark
2022-07-30 18:45:00 【我爱夜来香A】
I. RDBMS to RDBMS
1. pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>Spark</groupId>
  <artifactId>Spark</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.2.0</version>
    </dependency>
    <!-- Assumption: the Main class below loads the Oracle JDBC driver, which must be
         on the classpath. These coordinates and this version are illustrative; use
         the driver that matches your database. -->
    <dependency>
      <groupId>com.oracle.database.jdbc</groupId>
      <artifactId>ojdbc8</artifactId>
      <version>19.3.0.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Compiles the Scala sources -->
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
          <execution>
            <id>compile-scala</id>
            <phase>compile</phase>
            <goals>
              <goal>add-source</goal>
              <goal>compile</goal>
            </goals>
          </execution>
          <execution>
            <id>test-compile-scala</id>
            <phase>test-compile</phase>
            <goals>
              <goal>add-source</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>2.12.15</scalaVersion>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.2</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <!-- Builds a single jar that bundles all dependencies -->
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <archive>
            <manifest>
              <!-- Entry point: the DataSync object defined in the Main section -->
              <mainClass>bigData.DataSync</mainClass>
            </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>1.6.0</version>
        <executions>
          <execution>
            <goals>
              <goal>exec</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <executable>java</executable>
          <includeProjectDependencies>true</includeProjectDependencies>
          <includePluginDependencies>false</includePluginDependencies>
          <classpathScope>compile</classpathScope>
          <mainClass>bigData.DataSync</mainClass>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
2. Main
package bigData

import java.util.Properties

import org.apache.spark.sql.SparkSession

object DataSync {
  def main(args: Array[String]): Unit = {
    // A Scala object compiles to a class whose name ends in '$'; drop it for the app name.
    val app = this.getClass.getSimpleName.filter(!_.equals('$'))

    val spark: SparkSession = SparkSession.builder
      .appName(app)
      .master("local[*]")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.network.timeout", "600")
      .config("spark.core.connection.ack.wait.timeout", "900")
      .config("spark.rpc.message.maxSize", "50")
      .getOrCreate()

    // Source database
    val source_url = "jdbc:oracle:thin:@192.168.0.101:1521/orcl"
    val source_prop = new Properties()
    source_prop.put("user", "scott")
    source_prop.put("password", "scott")
    source_prop.put("driver", "oracle.jdbc.driver.OracleDriver")

    // Target database
    val target_url = "jdbc:oracle:thin:@192.168.0.101:1521/orcl"
    val target_prop = new Properties()
    target_prop.put("user", "scott")
    target_prop.put("password", "scott")
    target_prop.put("driver", "oracle.jdbc.driver.OracleDriver")

    // Read EMP from the source and write it to EMP_TEST on the target.
    // No save mode is set, so Spark uses the default, ErrorIfExists.
    spark.read.jdbc(source_url, "EMP", source_prop)
      .write.jdbc(target_url, "EMP_TEST", target_prop)

    spark.stop()
  }
}
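3. Result
Note that the code does not check whether the target table already exists. The write uses Spark's default save mode (SaveMode.ErrorIfExists), so the first run creates EMP_TEST and a second run fails with an error because the table is already there.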
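One way to make the job safe to rerun is to set an explicit save mode on the writer. The following is a minimal sketch, not the original author's code: the object name `DataSyncRerunnable` and the helper `jdbcProps` are hypothetical names introduced for illustration, and the connection details are copied from the listing above. It assumes that replacing the contents of EMP_TEST on every run is acceptable.

```scala
import java.util.Properties

import org.apache.spark.sql.{SaveMode, SparkSession}

object DataSyncRerunnable {

  // Hypothetical helper (not in the original post): bundles the JDBC
  // user, password and driver class into a Properties object.
  def jdbcProps(user: String, password: String, driver: String): Properties = {
    val props = new Properties()
    props.put("user", user)
    props.put("password", password)
    props.put("driver", driver)
    props
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("DataSyncRerunnable")
      .master("local[*]")
      .getOrCreate()

    // Same connection details as the original listing.
    val url = "jdbc:oracle:thin:@192.168.0.101:1521/orcl"
    val props = jdbcProps("scott", "scott", "oracle.jdbc.driver.OracleDriver")

    spark.read.jdbc(url, "EMP", props)
      .write
      // Overwrite replaces the target table if it exists, so reruns succeed.
      // SaveMode.Append would instead add rows to the existing table, and
      // SaveMode.Ignore would leave an existing table untouched.
      .mode(SaveMode.Overwrite)
      .jdbc(url, "EMP_TEST", props)

    spark.stop()
  }
}
```

With `SaveMode.Overwrite`, Spark by default drops and recreates the target table, which discards any indexes or grants defined on it. If that matters, the JDBC writer's `truncate` option (`.option("truncate", "true")`) asks Spark to truncate the existing table instead; whether that fits depends on your schema staying unchanged between runs.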