Metadata Management: Compiling, Integrating, Deploying, and Testing Apache Atlas
2022-06-29 15:45:00 【韧小钊】
Apache Atlas Integrated Deployment
1. Background
Atlas collects Hive metadata through the Hive hook, with Kafka as the middleware in between. Collecting synchronously through the hook alone would hurt the performance of the metadata source, so Kafka carries the messages instead. This article deploys straight from the embedded build package covered in the previous post (generic: only the config files need changing), and adds Kafka, Hive, HBase, and Solr. Just a quick record this time; one day, when I finally give in and buy a workstation, I'll write up these cluster deployments properly. Lately I have deployed myself sick, and my machine along with me: a dead hard drive, blue screens, and so on...
2. Base Components
2.1 Hadoop
Already deployed on the VM earlier (version 2.7.3), so skipped here.
2.2 Kafka
kafka_2.13-3.2.0.tgz download link: fairly fast; if it fails, download from the official site instead.
- config/server.properties
Just extract the archive and set the ZooKeeper address (a minimal sketch follows the commands below). Everything here is single-node; no way around it, resources are limited.
- Start command
nohup bin/kafka-server-start.sh config/server.properties &
- List topics
./kafka-topics.sh --bootstrap-server localhost:9092 --list
- Consume a topic from the beginning
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic ATLAS_HOOK --from-beginning
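For reference, a minimal single-node server.properties sketch. The listener IP, log directory, and the /kafka ZooKeeper chroot are assumptions borrowed from the atlas.kafka.* settings later in this article, not values the original post shows:
# config/server.properties (sketch; single broker)
broker.id=0
# match atlas.kafka.bootstrap.servers
listeners=PLAINTEXT://192.168.38.10:9092
# match atlas.kafka.data
log.dirs=/home/atlas/apache-atlas-2.2.0/data/kafka
# match atlas.kafka.zookeeper.connect (note the /kafka chroot)
zookeeper.connect=192.168.38.10:2181/kafka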
2.3 Hive
Download it from the official site yourself; if you have download points to spare, two of them buy the fast download.
- conf/hive-env.sh
The main change here is adding the HIVE_AUX_JARS_PATH variable; its path comes up again when deploying Atlas.
HADOOP_HOME=/root/hadoop-2.7.3
export HIVE_CONF_DIR=/root/apache-hive-3.1.3-bin/conf
export HIVE_AUX_JARS_PATH=/home/atlas/apache-atlas-2.2.0/hook/hive
- bin/hive
Point it at the HBase installation.
- conf/hive-site.xml
Add the hook (you also need MySQL deployed; a note on the MySQL driver follows the config. Alternatively skip MySQL and use the embedded metastore, which shouldn't affect collection. The more I try, the more ignorant I feel.)
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.38.10:3306/hive_metastore?createDatabaseIfNotExist=true&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>[email protected]</value>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>192.168.38.10</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<!-- Hive metastore schema verification -->
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<!-- Metastore event notification API authorization -->
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
</configuration>
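One prerequisite the config above implies but doesn't spell out: the MySQL Connector/J jar must be on Hive's classpath, or schematool will fail to load com.mysql.cj.jdbc.Driver. A hedged sketch; the exact jar version is an assumption:
# assumption: Connector/J 8.x, matching com.mysql.cj.jdbc.Driver in hive-site.xml
cp mysql-connector-java-8.0.29.jar /root/apache-hive-3.1.3-bin/lib/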
- conf/atlas-application.properties
Copy it over from the atlas/conf directory.
- Initialize the metastore schema
bin/schematool -dbType mysql -initSchema
- Start hiveserver2
nohup bin/hive --service hiveserver2 &
2.4 ZooKeeper 3.4.1
Skipped.
2.5 HBase and Solr
Download link
This article uses the HBase and Solr bundled in the embedded build package; just adjust the relevant configs.
- hbase-env.sh
ZooKeeper is deployed separately, so disable HBase's managed ZooKeeper:
export HBASE_MANAGES_ZK=false
- hbase-site.xml
<configuration>
<!--
  The following properties are set for running HBase as a single process on a
  developer workstation. With this configuration, HBase is running in
  "stand-alone" mode and without a distributed file system. In this mode, and
  without further configuration, HBase and ZooKeeper data are stored on the
  local filesystem, in a path under the value configured for `hbase.tmp.dir`.
  This value is overridden from its default value of `/tmp` because many
  systems clean `/tmp` on a regular basis. Instead, it points to a path within
  this HBase installation directory. Running against the `LocalFileSystem`, as
  opposed to a distributed filesystem, runs the risk of data integrity issues
  and data loss. Normally HBase will refuse to run in such an environment.
  Setting `hbase.unsafe.stream.capability.enforce` to `false` overrides this
  behavior, permitting operation. This configuration is for the developer
  workstation only and __should not be used in production!__
-->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>./tmp</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
</configuration>
- Start HBase
bin/start-hbase.sh
- Solr start/stop
bin/solr start
bin/solr stop -p 8983
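A caveat from the Atlas docs rather than the original post: with atlas.graph.index.search.solr.mode=cloud (set below), Solr is normally started in cloud mode against ZooKeeper, and the three Atlas index collections must exist before Atlas starts. A sketch, assuming the paths and addresses used elsewhere in this article:
bin/solr start -c -z 192.168.38.10:2181
# create the indexes Atlas expects, using the Solr config shipped with Atlas
bin/solr create -c vertex_index   -d /home/atlas/apache-atlas-2.2.0/conf/solr -shards 1 -replicationFactor 1
bin/solr create -c edge_index     -d /home/atlas/apache-atlas-2.2.0/conf/solr -shards 1 -replicationFactor 1
bin/solr create -c fulltext_index -d /home/atlas/apache-atlas-2.2.0/conf/solr -shards 1 -replicationFactor 1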
3. Apache Atlas
3.1 conf/atlas-application.properties
The compiled package's file already contains all of these entries; just change the addresses and paths.
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus
atlas.graph.storage.hostname=192.168.38.10
atlas.graph.storage.hbase.regions-per-server=1
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository
atlas.graph.index.search.backend=solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=192.168.38.10:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=false
atlas.graph.index.search.max-result-set-size=150
atlas.notification.embedded=false
atlas.kafka.data=/home/atlas/apache-atlas-2.2.0/data/kafka
atlas.kafka.zookeeper.connect=192.168.38.10:2181/kafka
atlas.kafka.bootstrap.servers=192.168.38.10:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000
atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
atlas.enableTLS=false
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true
atlas.authentication.method.ldap.type=none
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties
atlas.rest.address=http://192.168.38.10:21000
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=192.168.38.10:2181
atlas.server.ha.enabled=false
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER
atlas.metric.query.cache.ttlInSecs=900
atlas.search.gremlin.enable=false
atlas.ui.default.version=v1
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
3.2 conf/atlas-env.sh
export MANAGE_EMBEDDED_CASSANDRA=false
export MANAGE_LOCAL_ELASTICSEARCH=false
export HBASE_CONF_DIR=/home/atlas/hbase/conf
3.3 apache-atlas-2.2.0-hive-hook.tar.gz
Download link. Extract apache-atlas-2.2.0-hive-hook.tar.gz and copy its contents into the Atlas install directory; this is the location the HIVE_AUX_JARS_PATH variable added in the Hive config points to. A sketch of the copy step follows.
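A sketch of the copy, plus starting Atlas itself, which the original post doesn't show. The extracted directory name apache-atlas-hive-hook-2.2.0 and the install paths are assumptions consistent with the paths used earlier:
tar -zxvf apache-atlas-2.2.0-hive-hook.tar.gz
cp -r apache-atlas-hive-hook-2.2.0/* /home/atlas/apache-atlas-2.2.0/
# start Atlas; the UI listens on port 21000 per atlas.rest.address above
cd /home/atlas/apache-atlas-2.2.0 && bin/atlas_start.py
# quick health check (admin/admin is the default file-auth credential
# from conf/users-credentials.properties unless changed)
curl -u admin:admin http://192.168.38.10:21000/api/atlas/admin/version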
4. Testing
Create a database and a table in Hive:
[[email protected] bin]# ./beeline -u jdbc:hive2://192.168.38.10:10000 -n root
Connecting to jdbc:hive2://192.168.38.10:10000
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://192.168.38.10:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
+----------------+
1 row selected (3.811 seconds)
0: jdbc:hive2://192.168.38.10:10000> create database testatlas;
No rows affected (0.375 seconds)
0: jdbc:hive2://192.168.38.10:10000> use testatlas;
No rows affected (0.152 seconds)
0: jdbc:hive2://192.168.38.10:10000> CREATE TABLE atlas_table_test(id int,name string);
No rows affected (2.664 seconds)
0: jdbc:hive2://192.168.38.10:10000> show tables;
+-------------------+
| tab_name |
+-------------------+
| atlas_table_test |
+-------------------+
1 row selected (0.195 seconds)
0: jdbc:hive2://192.168.38.10:10000> select * from atlas_table_test;
+----------------------+------------------------+
| atlas_table_test.id | atlas_table_test.name |
+----------------------+------------------------+
+----------------------+------------------------+
No rows selected (2.983 seconds)
0: jdbc:hive2://192.168.38.10:10000>
After a little while the objects show up in Atlas (pre-existing data is not synced automatically; it has to be imported with hook-bin/import-hive.sh, sketched below).
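A sketch of that import. import-hive.sh ships with the hook package, reads the Hive config to locate the metastore, and prompts for an Atlas username and password; the exported paths are assumptions matching sections 2.3 and 3.3:
export HIVE_HOME=/root/apache-hive-3.1.3-bin
export HIVE_CONF_DIR=/root/apache-hive-3.1.3-bin/conf
cd /home/atlas/apache-atlas-2.2.0
hook-bin/import-hive.sh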
- Enter the HBase shell
bin/hbase shell
- List tables
list
- Full-table scan
scan "apache_atlas_entity_audit"
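To double-check from the Atlas side, the v2 basic-search REST endpoint can confirm the new table was registered; a sketch, assuming the default admin/admin credentials:
curl -u admin:admin "http://192.168.38.10:21000/api/atlas/v2/search/basic?typeName=hive_table&query=atlas_table_test"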