Metadata Management: Building, Integrating, Deploying, and Testing Apache Atlas
2022-06-29 15:45:00 【韧小钊】
Apache Atlas Integration and Deployment
1. Background
Atlas collects Hive metadata through the Hive hook, with Kafka acting as the middleware. Collecting directly (synchronously) through the Hive hook would impact the performance of the metadata source, so Kafka is used to relay the messages instead. This article deploys the embedded build package from the previous post as-is (generic; only the configuration files need changes) and adds Kafka, Hive, HBase, and Solr deployments on top. This is just a quick record for now; someday, when I finally give in and buy a workstation, I will write up these cluster deployments properly. Lately I have deployed until I was sick of it, and the computer suffered too: a failed hard drive, blue screens, and so on...
2. Base Components
2.1 Hadoop
Already deployed on the VM earlier (version 2.7.3), so this step is skipped.
2.2 Kafka
kafka_2.13-3.2.0.tgz download link (fairly fast; if that fails, download from the official Apache site instead).
config/server.properties
Just extract the archive and point it at the ZooKeeper address; everything here is single-node, since resources are limited.
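A minimal sketch of the server.properties entries touched here, assuming ZooKeeper runs on 192.168.38.10:2181 and the broker must be reachable from other hosts (the log directory is an assumed path; adjust to your environment):
# config/server.properties (single-node broker)
broker.id=0
# ZooKeeper address; the /kafka chroot matches atlas.kafka.zookeeper.connect used later
zookeeper.connect=192.168.38.10:2181/kafka
# advertise the broker on the host IP so the Hive hook and Atlas can reach it
listeners=PLAINTEXT://192.168.38.10:9092
# where the broker keeps its log segments (assumed path)
log.dirs=/home/kafka/kafka-logs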
- Start command
nohup bin/kafka-server-start.sh config/server.properties &
- List topics
./kafka-topics.sh --bootstrap-server localhost:9092 --list
- Consume a specific topic from the beginning
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic ATLAS_HOOK --from-beginning
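Atlas is configured below to create its notification topics automatically (atlas.notification.create.topics=true), but on a single broker they can also be pre-created by hand; a sketch:
./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic ATLAS_HOOK --partitions 1 --replication-factor 1
./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic ATLAS_ENTITIES --partitions 1 --replication-factor 1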
2.3 Hive
Download it from the official site yourself (or spend a couple of download points for the faster mirror if you have them).
- conf/hive-env.sh
The main change here is adding the HIVE_AUX_JARS_PATH variable; this path comes into play again when deploying Atlas.
HADOOP_HOME=/root/hadoop-2.7.3
export HIVE_CONF_DIR=/root/apache-hive-3.1.3-bin/conf
export HIVE_AUX_JARS_PATH=/home/atlas/apache-atlas-2.2.0/hook/hive
- bin/hive
Specify the HBase location here.
- conf/hive-site.xml
Add the hook here (a MySQL metastore also needs to be deployed; alternatively you can skip it and use the embedded Derby mode, which should not affect collection. The more you try, the more you realize how little you know.)
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.38.10:3306/hive_metastore?createDatabaseIfNotExist=true&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>[email protected]</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>192.168.38.10</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- Hive metastore schema verification -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <!-- Metastore event notification authorization -->
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
  </property>
</configuration>
- conf/atlas-application.properties
Copy this file over from the Atlas conf directory.
- Initialize the metastore schema
bin/schematool -dbType mysql -initSchema
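After initialization, the metastore schema version can be checked against MySQL with the same tool; a quick sanity check:
bin/schematool -dbType mysql -info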
- Start HiveServer2
nohup bin/hive --service hiveserver2 &
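Before moving on it is worth confirming that HiveServer2 is actually listening; a simple check (port 10000 is the default Thrift port, and the log path assumes the default hive-log4j2 location for the root user):
# HiveServer2 should be listening on its Thrift port
ss -lntp | grep 10000
# startup errors (e.g. a broken metastore connection) show up in the Hive log
tail -n 100 /tmp/root/hive.log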
2.4 zookeeper-3.4.1
Skipped.
2.5 HBase and Solr
Download link
This article uses the HBase and Solr bundled in the embedded build package directly; only the relevant configuration needs to be modified.
- hbase-env.sh
ZooKeeper is deployed separately, so set this to false here:
export HBASE_MANAGES_ZK=false
- hbase-site.xml
<configuration>
  <!--
    The following properties are set for running HBase as a single process on a
    developer workstation. With this configuration, HBase is running in
    "stand-alone" mode and without a distributed file system. In this mode, and
    without further configuration, HBase and ZooKeeper data are stored on the
    local filesystem, in a path under the value configured for `hbase.tmp.dir`.
    This value is overridden from its default value of `/tmp` because many
    systems clean `/tmp` on a regular basis. Instead, it points to a path within
    this HBase installation directory. Running against the `LocalFileSystem`, as
    opposed to a distributed filesystem, runs the risk of data integrity issues
    and data loss. Normally HBase will refuse to run in such an environment.
    Setting `hbase.unsafe.stream.capability.enforce` to `false` overrides this
    behavior, permitting operation. This configuration is for the developer
    workstation only and __should not be used in production!__
  -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>
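With HBASE_MANAGES_ZK=false, HBase looks for ZooKeeper on localhost:2181 by default; if ZooKeeper runs on a different host, the quorum must also be set explicitly. A sketch, using this article's host:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>192.168.38.10</value>
</property>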
- Start
bin/start-hbase.sh
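jps can confirm the HBase processes came up before continuing (with hbase.cluster.distributed=true, HMaster and HRegionServer run as separate processes):
jps -l | grep -i hbase
# expect org.apache.hadoop.hbase.master.HMaster and org.apache.hadoop.hbase.regionserver.HRegionServer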
- Solr start/stop
bin/solr start
bin/solr stop -p 8983
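The Atlas configuration below uses Solr in cloud mode with a ZooKeeper URL, so Solr should be started in SolrCloud mode and the three index collections Atlas expects must exist. A sketch, assuming the Solr bundled in the Atlas package and the configset shipped in Atlas's conf/solr directory (adjust paths to your layout):
# start Solr in cloud mode against the standalone ZooKeeper
bin/solr start -c -z 192.168.38.10:2181
# create the collections Atlas indexes into
bin/solr create -c vertex_index -d /home/atlas/apache-atlas-2.2.0/conf/solr -shards 1 -replicationFactor 1
bin/solr create -c edge_index -d /home/atlas/apache-atlas-2.2.0/conf/solr -shards 1 -replicationFactor 1
bin/solr create -c fulltext_index -d /home/atlas/apache-atlas-2.2.0/conf/solr -shards 1 -replicationFactor 1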
3. Apache Atlas
Configuration
3.1 conf/atlas-application.properties
The built package already contains all of these entries; only the addresses and paths need to be adjusted.
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus
atlas.graph.storage.hostname=192.168.38.10
atlas.graph.storage.hbase.regions-per-server=1
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository
atlas.graph.index.search.backend=solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=192.168.38.10:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=false
atlas.graph.index.search.max-result-set-size=150
atlas.notification.embedded=false
atlas.kafka.data=/home/atlas/apache-atlas-2.2.0/data/kafka
atlas.kafka.zookeeper.connect=192.168.38.10:2181/kafka
atlas.kafka.bootstrap.servers=192.168.38.10:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000
atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
atlas.enableTLS=false
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true
atlas.authentication.method.ldap.type=none
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties
atlas.rest.address=http://192.168.38.10:21000
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=192.168.38.10:2181
atlas.server.ha.enabled=false
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER
atlas.metric.query.cache.ttlInSecs=900
atlas.search.gremlin.enable=false
atlas.ui.default.version=v1
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
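With atlas.authentication.method.file=true, logins come from conf/users-credentials.properties; the stock file in the build ships an admin user (admin/admin). The format is user=GROUP::sha256(password); a sketch of such an entry, where the hash is SHA-256 of "admin":
admin=ADMIN::8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918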
3.2 conf/atlas-env.sh
export MANAGE_EMBEDDED_CASSANDRA=false
export MANAGE_LOCAL_ELASTICSEARCH=false
export HBASE_CONF_DIR=/home/atlas/hbase/conf
3.3 apache-atlas-2.2.0-hive-hook.tar.gz
Download link: extract apache-atlas-2.2.0-hive-hook.tar.gz and copy its contents into the Atlas installation directory; this is the path referenced by the HIVE_AUX_JARS_PATH variable added to Hive earlier.
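A sketch of the unpack-and-copy step, assuming everything lives under /home/atlas and that the hook tarball extracts to an apache-atlas-hive-hook-2.2.0 directory (adjust names if your build differs):
cd /home/atlas
tar -zxvf apache-atlas-2.2.0-hive-hook.tar.gz
# merge hook/ and hook-bin/ into the Atlas installation referenced by HIVE_AUX_JARS_PATH
cp -r apache-atlas-hive-hook-2.2.0/* apache-atlas-2.2.0/
# the Hive hook also reads the Atlas client config from Hive's conf directory
cp apache-atlas-2.2.0/conf/atlas-application.properties /root/apache-hive-3.1.3-bin/conf/
With HBase, Solr, ZooKeeper and Kafka running, Atlas itself is started with bin/atlas_start.py and serves its UI at the atlas.rest.address configured above (port 21000).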
4. Testing
Create a database and a table in Hive:
[[email protected] bin]# ./beeline -u jdbc:hive2://192.168.38.10:10000 -n root
Connecting to jdbc:hive2://192.168.38.10:10000
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://192.168.38.10:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
+----------------+
1 row selected (3.811 seconds)
0: jdbc:hive2://192.168.38.10:10000> create database testatlas;
No rows affected (0.375 seconds)
0: jdbc:hive2://192.168.38.10:10000> use testatlas;
No rows affected (0.152 seconds)
0: jdbc:hive2://192.168.38.10:10000> CREATE TABLE atlas_table_test(id int,name string);
No rows affected (2.664 seconds)
0: jdbc:hive2://192.168.38.10:10000> show tables;
+-------------------+
| tab_name |
+-------------------+
| atlas_table_test |
+-------------------+
1 row selected (0.195 seconds)
0: jdbc:hive2://192.168.38.10:10000> select * from atlas_table_test;
+----------------------+------------------------+
| atlas_table_test.id | atlas_table_test.name |
+----------------------+------------------------+
+----------------------+------------------------+
No rows selected (2.983 seconds)
0: jdbc:hive2://192.168.38.10:10000>

After a short while the new objects become visible in Atlas (historical data is not synced automatically; it has to be imported with hook-bin/import-hive.sh, as sketched below).
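A sketch of the historical import, run from the Atlas installation directory (the script prompts for the Atlas username and password; -d restricts the import to one database, per the Atlas Hive hook documentation):
# import everything the Hive metastore currently knows about
hook-bin/import-hive.sh
# or only a single database
hook-bin/import-hive.sh -d testatlas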
- Enter the HBase shell
bin/hbase shell
- List tables
list
- Scan the full audit table
scan "apache_atlas_entity_audit"
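The same entities can also be checked through the Atlas REST API (default admin/admin credentials assumed); a sketch:
curl -u admin:admin "http://192.168.38.10:21000/api/atlas/v2/search/basic?typeName=hive_table&query=atlas_table_test"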
