Spark Learning: Adding Custom Optimization Rules to Spark SQL
2022-07-31 13:16:00 【I love night Laixiang A】
一、Custom optimization rules
Spark 2.2 introduced a powerful feature: hooks and extension points that let users plug their own optimization rules into Spark SQL.
1、Implement a custom rule (a silent rule; run set spark.sql.planChangeLog.level=WARN to confirm that it is executed)
case class MyPushDown(spark: SparkSession) extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    ....
  }
}
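To make the elided rule body more concrete, here is a minimal sketch of a rule that actually rewrites something. It is not the author's MyPushDown: the name RemoveMultiplyByOne is hypothetical, and the three-field Multiply pattern assumes the Spark 3.2.0 dependency used in the pom.xml later in this article (other Spark versions may declare Multiply with a different number of fields).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{Literal, Multiply}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.IntegerType

// Hypothetical example rule: rewrites `expr * 1` to `expr` for integer expressions.
case class RemoveMultiplyByOne(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    // Assumption: Multiply(left, right, failOnError) as declared in Spark 3.2.
    // The guard keeps the result type identical to the original expression.
    case Multiply(left, Literal(1, IntegerType), _) if left.dataType == IntegerType =>
      left
  }
}
```

Expressions that do not match the case are left untouched, so the rule is safe to run repeatedly until the optimizer reaches a fixed point.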
2、Create your own Extension and inject the rule
class MySparkSessionExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule { session =>
      new MyPushDown(session)
    }
  }
}
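When the session is created in application code rather than through bin/spark-sql, the same injection can be done on the session builder with withExtensions. The following is a sketch under assumptions: NoOpRule, WithExtensionsDemo, the local[*] master, and the sample query are all placeholders for illustration and are not part of the original article.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical placeholder rule that returns the plan unchanged.
case class NoOpRule(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

object WithExtensionsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")               // assumption: local test run
      .appName("with-extensions-demo")
      // Register the rule directly, without a separate extension class.
      .withExtensions(_.injectOptimizerRule(session => NoOpRule(session)))
      .getOrCreate()

    // Any query that goes through the optimizer will now also run NoOpRule.
    spark.range(5).selectExpr("id * 2 AS doubled").show()
    spark.stop()
  }
}
```

This variant is convenient for unit tests, because the rule and the session live in the same program and no jar has to be built first.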
3、Submit it through spark.sql.extensions
bin/spark-sql --jars my.jar --conf spark.sql.extensions=MySparkSessionExtension
二、Code implementation
1、pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>Spark</groupId>
  <artifactId>Spark</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.2.0</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
          <execution>
            <id>compile-scala</id>
            <phase>compile</phase>
            <goals>
              <goal>add-source</goal>
              <goal>compile</goal>
            </goals>
          </execution>
          <execution>
            <id>test-compile-scala</id>
            <phase>test-compile</phase>
            <goals>
              <goal>add-source</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>2.12.15</scalaVersion>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.2</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <archive>
            <manifest>
              <mainClass></mainClass>
            </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>1.6.0</version>
        <executions>
          <execution>
            <goals>
              <goal>exec</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <executable>java</executable>
          <includeProjectDependencies>true</includeProjectDependencies>
          <includePluginDependencies>false</includePluginDependencies>
          <classpathScope>compile</classpathScope>
          <mainClass>cn.spark.study.App</mainClass>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
2、Classes
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A no-op rule: it logs a warning when it is instantiated and returns the plan unchanged.
case class MyRule(spark: SparkSession) extends Rule[LogicalPlan] {
  logWarning("The rules of Genghis Khan")
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

import org.apache.spark.sql.SparkSessionExtensions

// The extension: injects MyRule into the optimizer of every new session.
class MySparkSessionExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule { session =>
      MyRule(session)
    }
  }
}
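3、Package the project into a jar with Maven (for example, mvn clean package; the assembly plugin above is bound to the package phase)
三、Verifying the result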
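One way to check from code that an injected rule really runs is to count how often its apply method is called. This is only a sketch: CountingRule, VerifyRuleDemo, the local[*] master, and the sample query are hypothetical names and settings chosen for illustration, and the rule is registered through withExtensions instead of the jar built above.

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Shared counter, kept in the companion object so every rule instance updates it.
object CountingRule {
  val calls = new AtomicInteger(0)
}

// Hypothetical rule: leaves the plan unchanged but records each invocation.
case class CountingRule(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    CountingRule.calls.incrementAndGet()
    plan
  }
}

object VerifyRuleDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")              // assumption: local test run
      .appName("verify-rule-demo")
      .withExtensions(_.injectOptimizerRule(session => CountingRule(session)))
      .getOrCreate()

    // Running a query forces the optimizer, and with it the injected rule, to execute.
    spark.range(10).filter("id > 3").collect()

    println(s"CountingRule.apply was called ${CountingRule.calls.get()} time(s)")
    assert(CountingRule.calls.get() > 0, "the injected rule was never applied")
    spark.stop()
  }
}
```

The exact count depends on how many optimizer iterations Spark performs, so the check only asserts that the rule was applied at least once.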

Done! It works perfectly!