Compiling Hudi
2022-07-31 02:40:00 【hyunbar】
Versions
centos: centos8
hudi: 0.10.1
spark: 3.1.3
scala: 2.12
1. Install Maven
-------
### 1.1 Manual installation
(1) Download Maven
https://maven.apache.org/download.cgi
(2) Upload and extract Maven
tar -zxvf apache-maven-3.6.1-bin.tar.gz -C /bigdata/
(3) Add the environment variables to /etc/profile
#MAVEN_HOME
export MAVEN_HOME=/bigdata/apache-maven-3.6.1
export PATH=$PATH:$MAVEN_HOME/bin
source /etc/profile
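Step (3) can also be scripted. The following is a minimal sketch, assuming bash; the function name `add_maven_env` is hypothetical. It appends the three lines shown above only if the file does not already export MAVEN_HOME, so running it twice does not duplicate them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Maven): append the Maven variables to a
# profile file once.
add_maven_env() {
  local profile="$1" maven_home="$2"
  # Skip if MAVEN_HOME is already exported in this file.
  if grep -q '^export MAVEN_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi
  {
    echo '#MAVEN_HOME'
    echo "export MAVEN_HOME=$maven_home"
    # Single quotes keep $PATH and $MAVEN_HOME literal in the file.
    echo 'export PATH=$PATH:$MAVEN_HOME/bin'
  } >> "$profile"
}

# Example (writing to /etc/profile needs root):
# add_maven_env /etc/profile /bigdata/apache-maven-3.6.1 && source /etc/profile
```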
(4) Verify the installation
$ mvn -v
Apache Maven 3.6.3
Maven home: /bigdata/apache-maven-3.6.1
Java version: 1.8.0_321, vendor: Oracle Corporation, runtime: /bigdata/module/jdk1.8.0_321/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.13.0-44-generic", arch: "aarch64", family: "unix"
(5) Edit settings.xml to use the Aliyun mirror
<mirror>
  <id>nexus-aliyun</id>
  <mirrorOf>central</mirrorOf>
  <name>Nexus aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>

### 1.2 Install with apt or yum
apt install maven
2. Install git
-------
yum install git
$ git --version
git version 2.25.1
3. Build Hudi
--------
### 3.1 Clone the source from a mirror in China
git clone --branch release-0.10.1 https://gitee.com/apache/Hudi.git
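### 3.2 Modify pom.xml
vim Hudi/pom.xml
Add the Aliyun repository:
<repository>
  <id>nexus-aliyun</id>
  <name>nexus-aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>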
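### 3.3 Build
Build options for different Spark versions, as listed in the Hudi documentation. The available profiles vary between releases, so confirm them against the README of the branch you cloned: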
| Maven build options | Expected Spark bundle jar name | Notes |
| :-- | :-- | :-- |
| (empty) | hudi-spark-bundle\_2.11 (legacy bundle name) | For Spark 2.4.4 and Scala 2.11 (default options) |
| `-Dspark2.4` | hudi-spark2.4-bundle\_2.11 | For Spark 2.4.4 and Scala 2.11 (same as default) |
| `-Dspark2.4 -Dscala-2.12` | hudi-spark2.4-bundle\_2.12 | For Spark 2.4.4 and Scala 2.12 |
| `-Dspark3.1 -Dscala-2.12` | hudi-spark3.1-bundle\_2.12 | For Spark 3.1.x and Scala 2.12 |
| `-Dspark3.2 -Dscala-2.12` | hudi-spark3.2-bundle\_2.12 | For Spark 3.2.x and Scala 2.12 |
| `-Dspark3` | hudi-spark3-bundle\_2.12 (legacy bundle name) | For Spark 3.2.x and Scala 2.12 |
| `-Dscala-2.12` | hudi-spark-bundle\_2.12 (legacy bundle name) | For Spark 2.4.4 and Scala 2.12 |
mvn clean package -DskipTests -Dspark3 -Dscala-2.12
The build took a full weekend day, but it finally succeeded.
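To avoid mistyping the option pairs, the table can be wrapped in a small lookup. This is only a sketch, assuming bash; `hudi_build_opts` is a hypothetical name, and it encodes just the non-legacy rows of the table above. Whether a given profile actually exists still depends on the Hudi release being built.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Spark/Scala version pair to the Maven build
# options from the table (non-legacy rows only).
hudi_build_opts() {
  case "$1:$2" in
    2.4:2.11) echo "-Dspark2.4" ;;
    2.4:2.12) echo "-Dspark2.4 -Dscala-2.12" ;;
    3.1:2.12) echo "-Dspark3.1 -Dscala-2.12" ;;
    3.2:2.12) echo "-Dspark3.2 -Dscala-2.12" ;;
    *)
      echo "unsupported combination: Spark $1, Scala $2" >&2
      return 1
      ;;
  esac
}

# Example: print the resulting command instead of running it.
# echo mvn clean package -DskipTests $(hudi_build_opts 3.1 2.12)
```

Printing the command first makes it easy to check the options before starting a build that may run for hours.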

### 4. Problems encountered
#### **Q1: dependencies at io.confluent:kafka-avro-serializer:jar**
[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.10.1: Failed to collect dependencies at io.confluent:kafka-avro-serializer:jar:5.3.4: Failed to read artifact descriptor for io.confluent:kafka-avro-serializer:jar:5.3.4: Could not transfer artifact io.confluent:kafka-avro-serializer:pom:5.3.4 from/to maven-default-http-blocker (http://0.0.0.0/): Blocked mirror for repositories: [nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public/, default, releases)] -> [Help 1]
Solution: keep the original repository and mirror entries enabled as well, because the Aliyun repository does not host this artifact.
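The error text names `maven-default-http-blocker`, the built-in mirror that Maven 3.8.1 and later use to block repositories served over plain `http://`. So it may also be worth checking which plain-HTTP URLs the configuration contains. The sketch below assumes bash and GNU or POSIX `grep`; `list_http_urls` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct plain-HTTP URLs found in a Maven
# XML file (settings.xml or pom.xml).
list_http_urls() {
  # -o prints only the matched text; the pattern stops at '<', a double
  # quote, or whitespace. XML namespace URLs such as
  # http://maven.apache.org/POM/4.0.0 also match and can be ignored.
  grep -o 'http://[^<"[:space:]]*' "$1" | sort -u
}

# Example:
# list_http_urls ~/.m2/settings.xml
# list_http_urls Hudi/pom.xml
```

Any repository or mirror URL this prints is a candidate for switching to `https://`, if the mirror offers it.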
Starting from version 0.11, Hudi no longer requires spark-avro to be specified using --packages.

#### **Q2: The goal you specified requires a project to execute but there is no POM in this directory (/root). Please verify you invoked Maven from the correct directory**
Solution: run Maven from the directory that contains pom.xml (here, the cloned Hudi directory).
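A small guard can catch this before Maven starts. The following is a minimal sketch, assuming bash; `in_maven_project` is a hypothetical name. It refuses to run the command unless the given directory contains a pom.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command inside a directory only if that
# directory contains a pom.xml.
in_maven_project() {
  local dir="$1"
  shift
  if [ ! -f "$dir/pom.xml" ]; then
    echo "no pom.xml in $dir" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$dir" && "$@" )
}

# Example:
# in_maven_project Hudi mvn clean package -DskipTests -Dspark3 -Dscala-2.12
```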