当前位置:网站首页>Spark学习:为Spark Sql添加自定义优化规则
Spark学习:为Spark Sql添加自定义优化规则
2022-07-31 13:03:00 【我爱夜来香A】
一、自定义优化规则
Spark在2.2版本引入了一个强大的特性,添加钩子和拓展点,允许用户自定义优化规则
1、实现自定义规则 (静默规则,通过 set spark.sql.planChangeLog.level=WARN,确认执行到就行)
case class MyPushDown(spark: SparkSession) extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
.... }
}
2、创建自己的 Extension 并注入
class MySparkSessionExtension extends (SparkSessionExtensions => Unit) {
override def apply(extensions: SparkSessionExtensions): Unit = {
extensions.injectOptimizerRule {
session =>
new MyPushDown(session)
}
}
}
3、通过 spark.sql.extensions 提交
bin/spark-sql --jars my.jar --conf MySparkSessionExtension
二、代码实现
1、pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>Spark</groupId>
<artifactId>Spark</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.2.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.2</version>
<executions>
<execution>
<id>compile-scala</id>
<phase>compile</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>test-compile-scala</id>
<phase>test-compile</phase>
<goals>
<goal>add-source</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>2.12.15</scalaVersion>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.2</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass></mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.6.0</version>
<executions>
<execution>
<goals>
<goal>exec</goal>
</goals>
</execution>
</executions>
<configuration>
<executable>java</executable>
<includeProjectDependencies>true</includeProjectDependencies>
<includePluginDependencies>false</includePluginDependencies>
<classpathScope>compile</classpathScope>
<mainClass>cn.spark.study.App</mainClass>
</configuration>
</plugin>
</plugins>
</build>
</project>
2、class
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
case class MyRule(spark: SparkSession) extends Rule[LogicalPlan] {
logWarning("成吉思汗的规则")
override def apply(plan: LogicalPlan): LogicalPlan = plan
}
import org.apache.spark.sql.SparkSessionExtensions
class MySparkSessionExtension extends (SparkSessionExtensions => Unit) {
override def apply(extensions: SparkSessionExtensions): Unit = {
extensions.injectOptimizerRule {
session => MyRule(session)
}
}
}
3、使用maven打成jar包
三、结果验证

大功告成!完美!!!
边栏推荐
- Double non-one into bytes!!Pure dry goods sharing
- 基于verilog的CRC校验(汇总)
- Optimization of five data submission methods
- ERROR 1064 (42000) You have an error in your SQL syntax; check the manual that corresponds to your
- CWE4.8 -- 2022年危害最大的25种软件安全问题
- Introduction to using NPM
- 函数的参数
- 计算机复试面试问题(计算机面试常见问题)
- EXCEL如何快速拆分合并单元格数据
- 亲测可用!!!WPF中遍历整个窗口的所有TextBox组件,对每个输入框做非空判断。
猜你喜欢

PHP序列化:eval

报错IDEA Terminated with exit code 1

golang-gin-pprof-使用以及安全问题

阿里三面:MQ 消息丢失、重复、积压问题,怎么解决?

PyQt5 rapid development and actual combat 10.2 compound interest calculation && 10.3 refresh blog clicks

ERROR 2003 (HY000) Can‘t connect to MySQL server on ‘localhost3306‘ (10061)解决办法

IDEA连接MySQL数据库并执行SQL查询操作

AMBA APB学习记录(AMBA 3/4)

关于MySQL主从复制的数据同步延迟问题

Istio微服务治理网格的全方面可视化监控(微服务架构展示、资源监控、流量监控、链路监控)
随机推荐
vivado里那些看不懂的原语
Invalid bound statement (not found)出现的原因和解决方法
Google Chrome(谷歌浏览器)安装使用
sqlalchemy determines whether a field of type array has at least one consistent data with an array
The 2nd activity of the TOGAF10 Standard Reading Club continues wonderfully, and the highlights will be reviewed!
关于我放弃考研这件事儿
【牛客刷题-SQL大厂面试真题】NO3.电商场景(某东商城)
Build a Valentine's Day confession website (super detailed process, package teaching package)
报错:npm ERR code EPERM
CentOS7 —— yum安装mysql
EXCEL如何快速拆分合并单元格数据
小试牛刀—猜数字游戏
Use IN List Population in Your JDBC Application to Avoid Cursor Cache Contention Issues
PyQt5 rapid development and actual combat 10.1 Get city weather forecast
基于去噪自编码器的故障隔离与识别方法
SAP ABAP OData 服务如何支持 $filter (过滤)操作试读版
硬盘分区,拓展C盘,不重装系统,不重装D盘软件的全教程。
2022年最新重庆建筑安全员模拟题库及答案
FIFO深度计算学习记录(汇总)
P5019 [NOIP2018 提高组] 铺设道路