Clickhouse distributed cluster construction
2022-07-28 14:11:00 【Big data Institute】
01 ClickHouse Cluster Installation and Deployment
1.1 Installing and Deploying Zookeeper
1. Installation modes
Zookeeper has three installation modes:
Standalone mode: Zookeeper runs on a single server; suitable for test environments.
Pseudo-cluster mode: multiple Zookeeper instances run on one physical machine; suitable for test environments.
Distributed cluster mode: Zookeeper runs on a cluster of machines; suitable for production environments.
2. Cluster planning
(1) Host plan
The smallest Zookeeper ensemble has 3 nodes. In production, 3 Zookeeper nodes are enough for clusters of up to 100 nodes, and 5 nodes are enough for clusters of up to 500 nodes.
Service | Hadoop3-01 | Hadoop3-02 | Hadoop3-03 |
Zookeeper | yes | yes | yes |
(2) Software plan
Use the JDK version commonly used in production, JDK 1.8 (the newest release at the time of writing was Java 10).
Software | Version | Bits |
JDK | 1.8 | 64 |
CentOS | 7 | 64 |
Zookeeper | zookeeper-3.5.6 | stable release |
(3) User plan
All big data platform cluster software is installed uniformly under the hadoop user.
Node name | Group | User |
Hadoop3-01 | hadoop | hadoop |
Hadoop3-02 | hadoop | hadoop |
Hadoop3-03 | hadoop | hadoop |
(4) Directory plan
For easier unified management, plan the software, script, and data directories in advance.
Name | Path |
Software directory | /home/hadoop/app |
Script directory | /home/hadoop/tools |
Data directory | /home/hadoop/data |
3. JDK installation
Zookeeper is written in Java and runs on the JVM, so a JDK runtime environment must be installed first.
(1) Download the JDK
Download the desired JDK version from the official site; here we install JDK 1.8 and upload the package to the /home/hadoop/app directory.
(2) Extract the JDK
Extract the JDK installation package with the tar -zxvf command.
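Since the real JDK tarball is environment-specific, here is a minimal end-to-end sketch of the extraction step against a stand-in archive (the jdk-demo.tar.gz name and directory layout are placeholders invented for this example):

```shell
# Build a stand-in archive so extraction can be demonstrated end to end;
# on the real host you would run only the final tar -zxvf against the
# uploaded JDK tarball in /home/hadoop/app.
mkdir -p /tmp/tar-demo && cd /tmp/tar-demo
mkdir -p jdk1.8.0_51/bin && touch jdk1.8.0_51/bin/java
tar -zcf jdk-demo.tar.gz jdk1.8.0_51
rm -rf jdk1.8.0_51

# The extraction command itself, as in the guide:
tar -zxvf jdk-demo.tar.gz
ls jdk1.8.0_51/bin    # java
```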

(3) Create a soft link
To make version upgrades easier, create a jdk soft link pointing at the real JDK install path, using the following command: ln -s jdk1.8.0_51 jdk
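A runnable sketch of this version-switching pattern, run in a scratch directory (the version directory names are placeholders; the -fn flags are added only so the sketch can be re-run):

```shell
# The generic "jdk" link lets JAVA_HOME stay constant across upgrades.
mkdir -p /tmp/app-demo && cd /tmp/app-demo
mkdir -p jdk1.8.0_51 jdk1.8.0_301

ln -sfn jdk1.8.0_51 jdk     # point the link at the version in use
readlink jdk                # jdk1.8.0_51

# Upgrading later is just re-pointing the link:
ln -sfn jdk1.8.0_301 jdk
readlink jdk                # jdk1.8.0_301
```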

(4) Configure environment variables
1) Modify /etc/profile
This approach applies to all users and all shells on the machine, so it is only recommended when the machine is used purely for development; system-wide environment variables can raise security concerns.
vi /etc/profile
JAVA_HOME=/home/hadoop/app/jdk
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:/home/hadoop/tools:$PATH
export JAVA_HOME CLASSPATH PATH
2) Modify the .bashrc file
This approach is safer because it scopes the environment variables to a single user. To grant a specific user these variables, modify only that user's .bashrc file.
vi ~/.bashrc
JAVA_HOME=/home/hadoop/app/jdk
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:/home/hadoop/tools:$PATH
export JAVA_HOME CLASSPATH PATH

(5) Source the profile
Run source ~/.bashrc to make the newly configured environment variables take effect.
(6) Check whether the JDK installed successfully
Run java -version to check the JDK version; if the current JDK version is printed, the installation succeeded.

(7) Sync the JDK package to the other nodes
Use the script command deploy.sh jdk1.8.0_51 /home/hadoop/app/ slave to sync the JDK package to the other nodes, then repeat steps (2) through (6) on each node to complete its JDK installation.
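deploy.sh is a site-specific helper script, not a standard tool. The sketch below shows roughly what such a sync does, with hostnames taken from the host plan; scp is echoed rather than executed so nothing is actually copied:

```shell
# Dry run: echo the copy commands instead of executing them. On a real
# cluster, drop the echo and rely on passwordless SSH for the hadoop user.
src=jdk1.8.0_51
dest=/home/hadoop/app/
for host in hadoop3-02 hadoop3-03; do
  echo scp -r "$src" "hadoop@$host:$dest"
done
```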

4. Zookeeper installation
(1) Download Zookeeper
Mirror for current Apache Zookeeper releases:
http://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/
Archive of all Apache Zookeeper releases:
https://archive.apache.org/dist/zookeeper/

Note: download the Zookeeper binary installation package, not the source package. Starting from the source package fails with the following error:
Error: Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain
(2) Extract Zookeeper
Extract the Zookeeper installation package with the tar -zxvf command.

(3) Create a soft link
To make version upgrades easier, create a zookeeper soft link pointing at the real Zookeeper install path, using the following command: ln -s zookeeper-xxx zookeeper

(4) Modify the zoo.cfg configuration file
# The number of milliseconds of each tick
# (heartbeat interval between Zookeeper servers, and between clients and servers)
tickTime=2000
# The number of ticks that the initial synchronization phase can take
# (maximum heartbeat intervals tolerated while followers connect and sync with the leader)
initLimit=10
# The number of ticks that can pass between sending a request and getting an
# acknowledgement (leader/follower request and reply timeout)
syncLimit=5
# the directory where the snapshot is stored; do not use /tmp for storage.
# Create this data directory in advance.
dataDir=/home/hadoop/data/zookeeper/zkdata
# Transaction log directory; create it in advance.
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections; increase this to handle more clients
#maxClientCnxns=60
# Be sure to read the maintenance section of the administrator guide before
# turning on autopurge:
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours; set to "0" to disable the auto purge feature
#autopurge.purgeInterval=1
# server.<id>=<host>:<peer communication port>:<leader election port>
server.1=hadoop3-01:2888:3888
server.2=hadoop3-02:2888:3888
server.3=hadoop3-03:2888:3888
5. Sync the Zookeeper install directory to the other nodes
Distribute the entire Zookeeper install directory to the other nodes:
deploy.sh apache-zookeeper-3.5.6 /home/hadoop/app/ slave

Then create the soft link on each node:
ln -s apache-zookeeper-3.5.6 zookeeper

6. Create the planned directories
runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdata" all

runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdatalog" all

7. Set each node's server id
On each node, go to the /home/hadoop/data/zookeeper/zkdata directory and create a file named myid containing 1, 2, or 3 respectively, matching the server.N entries in zoo.cfg.
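The host-to-id mapping can be scripted. The sketch below simulates locally under /tmp what runs once on each node; on the real cluster the path is /home/hadoop/data/zookeeper/zkdata and each node writes only its own id:

```shell
# One directory per host stands in for the three nodes; the id written to
# myid must match that host's server.N line in zoo.cfg.
i=1
for host in hadoop3-01 hadoop3-02 hadoop3-03; do
  dir=/tmp/zk-demo/$host/zkdata
  mkdir -p "$dir"
  echo "$i" > "$dir/myid"
  i=$((i + 1))
done
cat /tmp/zk-demo/hadoop3-02/zkdata/myid   # 2
```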



8. Test run
Start Zookeeper:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" all

Check the Zookeeper processes:
runRemoteCmd.sh "jps" all

Check the Zookeeper status:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh status" all
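A healthy 3-node ensemble reports exactly one leader and two followers in the zkServer.sh status output. The sketch below checks that invariant against simulated output; on the real cluster you would pipe the runRemoteCmd.sh output in instead:

```shell
# Simulated "Mode:" lines as zkServer.sh status prints them across 3 nodes.
status_output='Mode: follower
Mode: leader
Mode: follower'

leaders=$(printf '%s\n' "$status_output" | grep -c 'Mode: leader')
followers=$(printf '%s\n' "$status_output" | grep -c 'Mode: follower')
echo "leaders=$leaders followers=$followers"   # leaders=1 followers=2
```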

1.2 ClickHouse Cluster Deployment
1. Prepare the other ClickHouse nodes
Complete the single-node ClickHouse installation and deployment on every other node first.
2. ClickHouse cluster configuration
(1) Configure metrika.xml
Taking the hadoop3-1 node as an example, add the following configuration to the metrika.xml file:
vi /etc/clickhouse-server/config.d/metrika.xml

<yandex>
    <!-- ClickHouse cluster configuration -->
    <clickhouse_remote_servers>
        <cluster_2shards_0replicas>
            <shard>
                <replica>
                    <host>hadoop3-1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>hadoop3-2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_2shards_0replicas>
    </clickhouse_remote_servers>
    <!-- Zookeeper configuration -->
    <zookeeper-servers>
        <node index="1">
            <host>hadoop3-1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>hadoop3-2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>hadoop3-3</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <macros>
        <replica>hadoop3-1</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
</yandex>
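A quick sanity check of a metrika.xml before restarting: the cluster definition should contain two shards and three Zookeeper nodes. The sketch greps a trimmed copy written to /tmp so it is self-contained; on a real node, point cfg at /etc/clickhouse-server/config.d/metrika.xml instead:

```shell
# Write a trimmed stand-in copy of the configuration, then count the
# <shard> and <node index> entries it declares.
cfg=/tmp/metrika-demo.xml
cat > "$cfg" <<'EOF'
<yandex>
  <clickhouse_remote_servers>
    <cluster_2shards_0replicas>
      <shard><replica><host>hadoop3-1</host><port>9000</port></replica></shard>
      <shard><replica><host>hadoop3-2</host><port>9000</port></replica></shard>
    </cluster_2shards_0replicas>
  </clickhouse_remote_servers>
  <zookeeper-servers>
    <node index="1"><host>hadoop3-1</host><port>2181</port></node>
    <node index="2"><host>hadoop3-2</host><port>2181</port></node>
    <node index="3"><host>hadoop3-3</host><port>2181</port></node>
  </zookeeper-servers>
</yandex>
EOF
grep -c '<shard>' "$cfg"        # 2
grep -c '<node index=' "$cfg"   # 3
```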
(2) Configure config.xml
In the global configuration file config.xml, use the <include_from> tag to pull in the configuration just defined:
vi /etc/clickhouse-server/config.xml

<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>
<!-- reference the Zookeeper configuration defined in metrika.xml -->
<zookeeper incl="zookeeper-servers" optional="false" />
<!-- uncomment this line so other nodes can reach this node's ClickHouse -->
<listen_host>::</listen_host>

(3) Repeat the configuration on the other cluster nodes
Add the metrika.xml configuration file on each remaining node and modify its global config.xml. In config.xml, each node must set its own macro definition; taking the hadoop3-2 node as an example:
<macros>
<replica>hadoop3-2</replica>
</macros>
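Since the <replica> macro is the only per-node difference, each node's snippet can be generated from a template. A sketch using sed, where the __HOST__ placeholder is invented for this example:

```shell
# Print the per-node macros snippet for every host in the cluster.
template='<macros><replica>__HOST__</replica></macros>'
for host in hadoop3-1 hadoop3-2 hadoop3-3; do
  printf '%s\n' "$template" | sed "s/__HOST__/$host/"
done
```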
3. Start the ClickHouse cluster
(1) Start the Zookeeper cluster
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" all
(2) Start the ClickHouse cluster
Start ClickHouse with the default configuration:
sudo /etc/init.d/clickhouse-server start
(3) Verify the ClickHouse cluster
Start the clickhouse client on each node and query the cluster information.
If the configured shard and replica information is visible, the clickhouse cluster deployment was fully successful.

ClickHouse provides a Zookeeper proxy table among its system tables, so Zookeeper data can be queried with SQL.
# Query the Zookeeper root directory
select * from system.zookeeper where path = '/'
# Query the ClickHouse directory
select * from system.zookeeper where path = '/clickhouse'