
Spark Project Packaging Optimization in Practice

2022-06-24 07:05:00 Angryshark_128

Problem description

When developing Spark projects in Scala or Java, you repeatedly build, package, and upload the project. Because the Spark-related dependencies are large, each packaging round takes a long time, which is especially painful during remote development and debugging, so the process is worth optimizing.

Optimization schemes

Scheme 1: upload the full jar once, then incrementally update class files

POM configuration (Maven)

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    
    <!-- ... -->
    
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <!-- Build configuration -->
  <build>
    <resources>
      <resource>
        <directory>src/main/resources</directory>
      </resource>
    </resources>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <configuration>
          <recompileMode>incremental</recompileMode>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.4.1</version>
        <configuration>
          <!-- get all project dependencies -->
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <!-- bind to the packaging phase -->
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
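With this POM in place, a standard package run produces both artifacts (command shown for reference; add flags such as -DskipTests to taste):

mvn clean package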

Packaging with the above configuration yields two jars: *-1.0-SNAPSHOT.jar and *-1.0-SNAPSHOT-jar-with-dependencies.jar. The latter is executable on its own, but because it bundles all dependencies, even a very simple project can weigh in at one or two hundred MB.

How it works: a jar file is really just an ordinary ZIP archive. Unpacking it reveals the bundled dependencies, the compiled class files, static resource files, and so on. In other words, each time we modify code and repackage, only a handful of class files or static resources actually change, so subsequent updates only need to replace those updated class files inside the jar.

Example: write a simple sparktest project. Packaging it produces the two jars sparktest-1.0-SNAPSHOT.jar and sparktest-1.0-SNAPSHOT-jar-with-dependencies.jar.


Here, sparktest-1.0-SNAPSHOT-jar-with-dependencies.jar is the independently executable jar; upload it to the server to run it. Opening the jar with any unzip tool shows its directory structure.
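Equivalently, since a jar is a ZIP archive, the JDK's jar tool can list its contents from the command line:

jar tf sparktest-1.0-SNAPSHOT-jar-with-dependencies.jar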


Inside it, the App*.class files are the compiled output of the main application code.


After modifying the App.scala code, recompile once; the fresh App*.class files appear under the target/classes directory.
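Because recompileMode is set to incremental in the POM above, a plain compile should regenerate only the changed classes; a typical invocation:

mvn compile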


Upload the updated class files to the same directory as the jar on the server, then swap them into the archive:

jar uvf sparktest-1.0-SNAPSHOT-jar-with-dependencies.jar App*.class

Note: if the class files are not at the jar's root, recreate the same directory structure locally first and then update, e.g.

jar uvf sparktest-1.0-SNAPSHOT-jar-with-dependencies.jar com/example/App*.class
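To confirm the entry was actually replaced, list the archive in verbose mode (jar tvf shows sizes and timestamps) and filter for the class name:

jar tvf sparktest-1.0-SNAPSHOT-jar-with-dependencies.jar | grep App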

Scheme 2: upload dependencies separately from the project; afterwards only the project jar is updated

POM configuration (Maven)

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- ... -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Copy all dependencies into target/lib -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <executions>
          <execution>
            <id>copy-dependencies</id>
            <phase>package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <outputDirectory>target/lib</outputDirectory>
              <excludeTransitive>false</excludeTransitive>
              <stripVersion>true</stripVersion>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <!-- Scala compilation plugin -->
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.3.1</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <!-- Jar plugin: package the project code only, without dependencies -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <!-- <addClasspath>true</addClasspath> -->
              <mainClass>com.oidd.App</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>

After packaging, the target directory contains a slim project jar plus a lib directory holding all the dependencies.
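Since stripVersion is true, the copied dependency jars lose their version suffixes; the resulting layout looks roughly like this (illustrative, reusing the sparktest project name from scheme 1):

target/
├── sparktest-1.0-SNAPSHOT.jar
└── lib/
    ├── scala-library.jar
    ├── spark-core_2.11.jar
    └── ...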


Upload the jar and the lib folder to the server; subsequent updates only require replacing the project jar. When executing spark-submit, add the dependency jars under lib/ via --jars (which takes a comma-separated list of paths).
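A sketch of the submit command, assuming the layout above and the com.oidd.App main class declared in the POM; the --master value is a placeholder for your cluster, and the shell substitution joins lib/*.jar into the comma-separated list that --jars expects:

spark-submit \
  --class com.oidd.App \
  --master yarn \
  --jars $(echo lib/*.jar | tr ' ' ',') \
  sparktest-1.0-SNAPSHOT.jar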


Copyright notice
This article was written by [Angryshark_128]. Please include the original link when reposting:
https://yzsam.com/2022/175/202206240050384938.html