Metadata Management: Compiling, Integrating, Deploying, and Testing Apache Atlas
2022-06-29 15:45:00 【韧小钊】
Apache Atlas Integrated Deployment
1. Background
Atlas collects Hive metadata through the Hive hook, with Kafka as the middleware in between. Collecting synchronously through the hook alone would hurt the performance of the metadata source, so Kafka carries the messages instead. This article deploys straight from the embedded build package covered in the previous post (generic: only the config files need changing), and adds Kafka, Hive, HBase, and Solr. Just a quick record this time; one day, when I finally give in and buy a workstation, I'll write up these cluster deployments properly. Lately I have deployed myself sick, and my machine along with me: a dead hard drive, blue screens, and so on...
2. Base Components
2.1 Hadoop
Already deployed on the VM earlier (version 2.7.3), so skipped here.
2.2 Kafka
kafka_2.13-3.2.0.tgz download link: fairly fast; if it fails, download from the official site instead.
- config/server.properties
Just extract the archive and set the ZooKeeper address (a minimal sketch follows the commands below). Everything here is single-node; no way around it, resources are limited.
- Start command
nohup bin/kafka-server-start.sh config/server.properties &
- List topics
./kafka-topics.sh --bootstrap-server localhost:9092 --list
- Consume a topic from the beginning
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic ATLAS_HOOK --from-beginning
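For reference, a minimal single-node server.properties sketch. The listener IP, log directory, and the /kafka ZooKeeper chroot are assumptions borrowed from the atlas.kafka.* settings later in this article, not values the original post shows:
# config/server.properties (sketch; single broker)
broker.id=0
# match atlas.kafka.bootstrap.servers
listeners=PLAINTEXT://192.168.38.10:9092
# match atlas.kafka.data
log.dirs=/home/atlas/apache-atlas-2.2.0/data/kafka
# match atlas.kafka.zookeeper.connect (note the /kafka chroot)
zookeeper.connect=192.168.38.10:2181/kafka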
2.3 Hive
Download it from the official site yourself; if you have download points to spare, two of them buy the fast download.
- conf/hive-env.sh
The main change here is adding the HIVE_AUX_JARS_PATH variable; its path comes up again when deploying Atlas.
HADOOP_HOME=/root/hadoop-2.7.3
export HIVE_CONF_DIR=/root/apache-hive-3.1.3-bin/conf
export HIVE_AUX_JARS_PATH=/home/atlas/apache-atlas-2.2.0/hook/hive
- bin/hive
Point it at the HBase installation.
- conf/hive-site.xml
Add the hook (you also need MySQL deployed; a note on the MySQL driver follows the config. Alternatively skip MySQL and use the embedded metastore, which shouldn't affect collection. The more I try, the more ignorant I feel.)
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.38.10:3306/hive_metastore?createDatabaseIfNotExist=true&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>[email protected]</value>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>192.168.38.10</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<!-- Hive metastore schema verification -->
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<!-- Metastore event notification API authorization -->
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
</configuration>
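One prerequisite the config above implies but doesn't spell out: the MySQL Connector/J jar must be on Hive's classpath, or schematool will fail to load com.mysql.cj.jdbc.Driver. A hedged sketch; the exact jar version is an assumption:
# assumption: Connector/J 8.x, matching com.mysql.cj.jdbc.Driver in hive-site.xml
cp mysql-connector-java-8.0.29.jar /root/apache-hive-3.1.3-bin/lib/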
- conf/atlas-application.properties
Copy it over from the atlas/conf directory.
- Initialize the metastore schema
bin/schematool -dbType mysql -initSchema
- Start hiveserver2
nohup bin/hive --service hiveserver2 &
2.4 ZooKeeper 3.4.1
Skipped.
2.5 HBase and Solr
Download link
This article uses the HBase and Solr bundled in the embedded build package; just adjust the relevant configs.
- hbase-env.sh
ZooKeeper is deployed separately, so disable HBase's managed ZooKeeper:
export HBASE_MANAGES_ZK=false
- hbase-site.xml
<configuration>
<!--
  The following properties are set for running HBase as a single process on a
  developer workstation. With this configuration, HBase is running in
  "stand-alone" mode and without a distributed file system. In this mode, and
  without further configuration, HBase and ZooKeeper data are stored on the
  local filesystem, in a path under the value configured for `hbase.tmp.dir`.
  This value is overridden from its default value of `/tmp` because many
  systems clean `/tmp` on a regular basis. Instead, it points to a path within
  this HBase installation directory. Running against the `LocalFileSystem`, as
  opposed to a distributed filesystem, runs the risk of data integrity issues
  and data loss. Normally HBase will refuse to run in such an environment.
  Setting `hbase.unsafe.stream.capability.enforce` to `false` overrides this
  behavior, permitting operation. This configuration is for the developer
  workstation only and __should not be used in production!__
-->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>./tmp</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
</configuration>
- Start HBase
bin/start-hbase.sh
- Solr start/stop
bin/solr start
bin/solr stop -p 8983
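A caveat from the Atlas docs rather than the original post: with atlas.graph.index.search.solr.mode=cloud (set below), Solr is normally started in cloud mode against ZooKeeper, and the three Atlas index collections must exist before Atlas starts. A sketch, assuming the paths and addresses used elsewhere in this article:
bin/solr start -c -z 192.168.38.10:2181
# create the indexes Atlas expects, using the Solr config shipped with Atlas
bin/solr create -c vertex_index   -d /home/atlas/apache-atlas-2.2.0/conf/solr -shards 1 -replicationFactor 1
bin/solr create -c edge_index     -d /home/atlas/apache-atlas-2.2.0/conf/solr -shards 1 -replicationFactor 1
bin/solr create -c fulltext_index -d /home/atlas/apache-atlas-2.2.0/conf/solr -shards 1 -replicationFactor 1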
3. Apache Atlas
3.1 conf/atlas-application.properties
The compiled package's file already contains all of these entries; just change the addresses and paths.
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus
atlas.graph.storage.hostname=192.168.38.10
atlas.graph.storage.hbase.regions-per-server=1
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository
atlas.graph.index.search.backend=solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=192.168.38.10:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=false
atlas.graph.index.search.max-result-set-size=150
atlas.notification.embedded=false
atlas.kafka.data=/home/atlas/apache-atlas-2.2.0/data/kafka
atlas.kafka.zookeeper.connect=192.168.38.10:2181/kafka
atlas.kafka.bootstrap.servers=192.168.38.10:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000
atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
atlas.enableTLS=false
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true
atlas.authentication.method.ldap.type=none
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties
atlas.rest.address=http://192.168.38.10:21000
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=192.168.38.10:2181
atlas.server.ha.enabled=false
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER
atlas.metric.query.cache.ttlInSecs=900
atlas.search.gremlin.enable=false
atlas.ui.default.version=v1
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
3.2 conf/atlas-env.sh
export MANAGE_EMBEDDED_CASSANDRA=false
export MANAGE_LOCAL_ELASTICSEARCH=false
export HBASE_CONF_DIR=/home/atlas/hbase/conf
3.3 apache-atlas-2.2.0-hive-hook.tar.gz
Download link. Extract apache-atlas-2.2.0-hive-hook.tar.gz and copy its contents into the Atlas install directory; this is the location the HIVE_AUX_JARS_PATH variable added in the Hive config points to. A sketch of the copy step follows.
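A sketch of the copy, plus starting Atlas itself, which the original post doesn't show. The extracted directory name apache-atlas-hive-hook-2.2.0 and the install paths are assumptions consistent with the paths used earlier:
tar -zxvf apache-atlas-2.2.0-hive-hook.tar.gz
cp -r apache-atlas-hive-hook-2.2.0/* /home/atlas/apache-atlas-2.2.0/
# start Atlas; the UI listens on port 21000 per atlas.rest.address above
cd /home/atlas/apache-atlas-2.2.0 && bin/atlas_start.py
# quick health check (admin/admin is the default file-auth credential
# from conf/users-credentials.properties unless changed)
curl -u admin:admin http://192.168.38.10:21000/api/atlas/admin/version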
4. Testing
Create a database and a table in Hive:
[[email protected] bin]# ./beeline -u jdbc:hive2://192.168.38.10:10000 -n root
Connecting to jdbc:hive2://192.168.38.10:10000
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://192.168.38.10:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
+----------------+
1 row selected (3.811 seconds)
0: jdbc:hive2://192.168.38.10:10000> create database testatlas;
No rows affected (0.375 seconds)
0: jdbc:hive2://192.168.38.10:10000> use testatlas;
No rows affected (0.152 seconds)
0: jdbc:hive2://192.168.38.10:10000> CREATE TABLE atlas_table_test(id int,name string);
No rows affected (2.664 seconds)
0: jdbc:hive2://192.168.38.10:10000> show tables;
+-------------------+
| tab_name |
+-------------------+
| atlas_table_test |
+-------------------+
1 row selected (0.195 seconds)
0: jdbc:hive2://192.168.38.10:10000> select * from atlas_table_test;
+----------------------+------------------------+
| atlas_table_test.id | atlas_table_test.name |
+----------------------+------------------------+
+----------------------+------------------------+
No rows selected (2.983 seconds)
0: jdbc:hive2://192.168.38.10:10000>
After a little while the objects show up in Atlas (pre-existing data is not synced automatically; it has to be imported with hook-bin/import-hive.sh, sketched below).
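A sketch of that import. import-hive.sh ships with the hook package, reads the Hive config to locate the metastore, and prompts for an Atlas username and password; the exported paths are assumptions matching sections 2.3 and 3.3:
export HIVE_HOME=/root/apache-hive-3.1.3-bin
export HIVE_CONF_DIR=/root/apache-hive-3.1.3-bin/conf
cd /home/atlas/apache-atlas-2.2.0
hook-bin/import-hive.sh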
- Enter the HBase shell
bin/hbase shell
- List tables
list
- Full-table scan
scan "apache_atlas_entity_audit"
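To double-check from the Atlas side, the v2 basic-search REST endpoint can confirm the new table was registered; a sketch, assuming the default admin/admin credentials:
curl -u admin:admin "http://192.168.38.10:21000/api/atlas/v2/search/basic?typeName=hive_table&query=atlas_table_test"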