ClickHouse Distributed Cluster Construction
2022-07-28 14:11:00 【Big data Institute】
01 ClickHouse Cluster Installation and Deployment
1.1 Installing and Deploying ZooKeeper
1. Installation modes
ZooKeeper has three installation modes:
Standalone mode: ZooKeeper runs on a single server; suitable for test environments.
Pseudo-cluster mode: multiple ZooKeeper instances run on one physical machine; suitable for test environments.
Distributed cluster mode: ZooKeeper runs on a cluster of machines; suitable for production environments.
2. Cluster planning
(1) Host plan
The smallest ZooKeeper cluster has 3 nodes. In production, 3 ZooKeeper nodes are enough for clusters of fewer than 100 nodes, and 5 are enough for clusters of fewer than 500 nodes.
Node | Hadoop3-01 | Hadoop3-02 | Hadoop3-03
Zookeeper | yes | yes | yes
(2) Software plan
Use JDK 1.8, the newer version commonly used in production (Java 10 was the newest release at the time of writing).
Software | Version | Notes
JDK | 1.8 | 64-bit
CentOS | 7 | 64-bit
ZooKeeper | 3.5.6 | stable release
(3) User plan
All big data platform software is installed under a single hadoop user.
Node | Group | User
Hadoop3-01 | hadoop | hadoop
Hadoop3-02 | hadoop | hadoop
Hadoop3-03 | hadoop | hadoop
(4) Directory plan
For ease of unified management, plan the software, script, and data directories in advance.
Name | Path
Software directory | /home/hadoop/app
Script directory | /home/hadoop/tools
Data directory | /home/hadoop/data
3. JDK installation
ZooKeeper is written in Java and runs on the JVM, so a JDK must be installed first.
(1) Download the JDK
Download the desired version from the official site; here we install JDK 1.8 and upload the archive to the /home/hadoop/app directory.
(2) Extract the JDK
Extract the JDK archive with the tar -zxvf command.

(3) Create a symbolic link
To make future version switches easier, create a jdk symlink pointing to the actual installation path:
ln -s jdk1.8.0_51 jdk

(4) Configure environment variables
1) Modify /etc/profile
This file applies to all users and all shells. Use it only if the machine is dedicated to development: because every user's shell can read these environment variables, it may raise security concerns.
vi /etc/profile
JAVA_HOME=/home/hadoop/app/jdk
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:/home/hadoop/tools:$PATH
export JAVA_HOME CLASSPATH PATH
2) Modify .bashrc
This approach is safer because it scopes the variables to a single user. To grant one user these environment variables, modify only that user's ~/.bashrc:
vi ~/.bashrc
JAVA_HOME=/home/hadoop/app/jdk
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:/home/hadoop/tools:$PATH
export JAVA_HOME CLASSPATH PATH

(5) Source the profile
Run source ~/.bashrc so the newly configured environment variables take effect.
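As a quick sanity check (a sketch; paths follow the directory plan above), you can confirm that the jdk symlink's bin directory now leads the PATH and will therefore win command lookup:

```shell
# Sketch: verify PATH ordering after sourcing the profile.
# /home/hadoop/app/jdk is the symlink created earlier (assumed paths).
JAVA_HOME=/home/hadoop/app/jdk
PATH="$JAVA_HOME/bin:/home/hadoop/tools:$PATH"
first_entry=${PATH%%:*}        # the first PATH component wins lookup
echo "$first_entry"            # -> /home/hadoop/app/jdk/bin
```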
(6) Verify the JDK installation
Run java -version to check the JDK version; if the current version is displayed, the JDK installed successfully.

(7) Synchronize the JDK to the other nodes
Use the deploy script deploy.sh jdk1.8.0_51 /home/hadoop/app/ slave to copy the JDK directory to the other nodes, then repeat steps (3) to (6) on each node to finish its JDK installation.
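Note that deploy.sh and runRemoteCmd.sh are site-local helper scripts, not part of any Hadoop or ZooKeeper distribution; this guide assumes you already have them. A hypothetical sketch of what they are assumed to do (host lists and group names are illustrative, taken from the host plan above):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two helper scripts used throughout this guide.
MASTER=hadoop3-01
SLAVES="hadoop3-02 hadoop3-03"

hosts_for() {                     # map a group name to a host list
  case "$1" in
    slave) echo "$SLAVES" ;;
    all)   echo "$MASTER $SLAVES" ;;
    *)     echo "unknown group: $1" >&2; return 1 ;;
  esac
}

deploy() {                        # deploy.sh <src> <dest_dir> <group>
  local src=$1 dest=$2 host
  for host in $(hosts_for "$3"); do
    scp -r "$src" "$host:$dest"
  done
}

run_remote_cmd() {                # runRemoteCmd.sh "<cmd>" <group>
  local cmd=$1 host
  for host in $(hosts_for "$2"); do
    echo "*** $host ***"
    ssh "$host" "$cmd"
  done
}
```

Both helpers assume passwordless ssh between the nodes has already been set up for the hadoop user.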

4. ZooKeeper installation
(1) Download ZooKeeper
Apache ZooKeeper current releases (Tsinghua mirror):
http://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/
Apache ZooKeeper all releases:
https://archive.apache.org/dist/zookeeper/

Note: each version ships two tarballs, the ZooKeeper binary package (its name contains "bin", e.g. apache-zookeeper-3.5.6-bin.tar.gz) and the source package. If you install the source package by mistake, starting the server fails with:
Error: Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain
(2) Extract ZooKeeper
Extract the ZooKeeper archive with the tar -zxvf command.

(3) Create a symbolic link
To make future version switches easier, create a zookeeper symlink pointing to the actual installation path:
ln -s apache-zookeeper-3.5.6 zookeeper

(4) Modify the zoo.cfg configuration file

# The number of milliseconds of each tick:
# the heartbeat interval between ZooKeeper servers, and between clients and servers.
tickTime=2000

# The number of ticks that the initial synchronization phase can take:
# the maximum number of heartbeat intervals a follower may use to connect to and sync with the leader.
initLimit=10

# The number of ticks that can pass between sending a request and getting an acknowledgement
# (message exchanges between leader and followers).
syncLimit=5

# The directory where the snapshot is stored; do not use /tmp.
# Create this data directory in advance.
dataDir=/home/hadoop/data/zookeeper/zkdata

# Transaction log directory; create it in advance.
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog

# The port at which the clients will connect.
clientPort=2181

# The maximum number of client connections; increase it if you need to handle more clients.
#maxClientCnxns=60

# Be sure to read the maintenance section of the administrator guide before turning on autopurge:
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir.
#autopurge.snapRetainCount=3
# Purge task interval in hours; set to "0" to disable the auto purge feature.
#autopurge.purgeInterval=1

# server.<service number of each node>=<server address>:<quorum communication port>:<election port>
server.1=hadoop3-01:2888:3888
server.2=hadoop3-02:2888:3888
server.3=hadoop3-03:2888:3888
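The two limits above are measured in ticks, not milliseconds, so the effective timeouts implied by these values work out as follows:

```shell
# Effective timeouts implied by the zoo.cfg values above.
tickTime=2000    # ms per tick
initLimit=10     # ticks a follower may take to connect and sync with the leader
syncLimit=5      # ticks allowed between a request and its acknowledgement
echo "initial sync timeout: $(( tickTime * initLimit )) ms"   # 20000 ms
echo "request timeout:      $(( tickTime * syncLimit )) ms"   # 10000 ms
```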
5. Synchronize the ZooKeeper installation directory to the other nodes
Distribute the whole ZooKeeper installation directory:
deploy.sh apache-zookeeper-3.5.6 /home/hadoop/app/ slave

Then create the symbolic link on each node:
ln -s apache-zookeeper-3.5.6 zookeeper

6. Create the planned directories
runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdata" all
runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdatalog" all

7. Set each node's service number
On each node, go to the /home/hadoop/data/zookeeper/zkdata directory and create a file named myid containing that node's number: 1, 2, or 3, matching the server.N entries in zoo.cfg.
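Writing myid by hand on every node is error-prone. Under the hostname convention used above (hadoop3-01 to hadoop3-03), the number can be derived from the hostname; the helper below is a sketch under that assumption:

```shell
#!/usr/bin/env bash
# Sketch: derive a node's myid from its numeric hostname suffix,
# assuming hosts are named hadoop3-01, hadoop3-02, hadoop3-03.
myid_from_host() {
  local suffix=${1##*-}      # "hadoop3-02" -> "02"
  echo $(( 10#$suffix ))     # force base 10 to drop the leading zero -> 2
}

myid_from_host hadoop3-01    # -> 1
myid_from_host hadoop3-03    # -> 3
# On each node you would then write the file planned above:
#   myid_from_host "$(hostname)" > /home/hadoop/data/zookeeper/zkdata/myid
```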



8. Test run
Start ZooKeeper:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" all

Check the ZooKeeper processes:
runRemoteCmd.sh "jps" all

Check the ZooKeeper status:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh status" all
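A healthy 3-node ensemble reports exactly one leader and two followers. A small sketch for checking that, assuming the stock "Mode: leader" / "Mode: follower" lines that zkServer.sh status prints:

```shell
# Count "Mode: leader" lines in the combined status output of all nodes.
count_leaders() {
  grep -c '^Mode: leader$'
}
# Example with simulated output from three nodes:
printf 'Mode: follower\nMode: leader\nMode: follower\n' | count_leaders   # -> 1
```

In practice you would pipe the output of the runRemoteCmd.sh status command above into count_leaders and expect the result 1.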

1.2 ClickHouse Cluster Deployment
1. Prepare ClickHouse on the other nodes
Complete the single-node ClickHouse installation and deployment on every node in the cluster.
2. ClickHouse cluster configuration
(1) Configure metrika.xml
Taking the hadoop3-1 node as an example, add the following configuration to the metrika.xml file:
vi /etc/clickhouse-server/config.d/metrika.xml

<yandex>
    <!-- ClickHouse cluster configuration -->
    <clickhouse_remote_servers>
        <cluster_2shards_0replicas>
            <shard>
                <replica>
                    <host>hadoop3-1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>hadoop3-2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_2shards_0replicas>
    </clickhouse_remote_servers>
    <!-- ZooKeeper configuration -->
    <zookeeper-servers>
        <node index="1">
            <host>hadoop3-1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>hadoop3-2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>hadoop3-3</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <macros>
        <replica>hadoop3-1</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
</yandex>
(2) Configure config.xml
In the global configuration file config.xml, reference the file just defined with the <include_from> tag:
vi /etc/clickhouse-server/config.xml

<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>

<!-- Reference the ZooKeeper configuration defined in metrika.xml -->
<zookeeper incl="zookeeper-servers" optional="false" />

<!-- Uncomment this so that other nodes can access ClickHouse on the current node -->
<listen_host>::</listen_host>

(3) Repeat the above configuration on the other nodes of the cluster
Add the metrika.xml configuration file on each remaining node and modify its global config.xml. In metrika.xml, each node must set its own macro definition; taking hadoop3-2 as an example:
<macros>
<replica>hadoop3-2</replica>
</macros>
3. Start the ClickHouse cluster
(1) Start the ZooKeeper cluster
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" all
(2) Start the ClickHouse cluster
Start ClickHouse with the default configuration on every node:
sudo /etc/init.d/clickhouse-server start
(3) Verify the ClickHouse cluster
Start the clickhouse client on each node and query the cluster information.
If you can see the shard and replica information of the configured cluster, the ClickHouse cluster deployment succeeded.
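One way to inspect the configured shards is to query the system.clusters table. The sketch below builds the query (it assumes clickhouse-client is on the PATH and uses the cluster name from the metrika.xml above; the actual client call is shown but not executed):

```shell
# Build the verification query for the cluster defined in metrika.xml.
CLUSTER=cluster_2shards_0replicas
check_query() {
  printf "SELECT shard_num, replica_num, host_name FROM system.clusters WHERE cluster = '%s'" "$1"
}
echo "$(check_query "$CLUSTER")"
# Against a running node (not executed here):
#   clickhouse-client --host hadoop3-1 --query "$(check_query "$CLUSTER")"
```

The result should list one row per shard/replica, matching the two shards configured above.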

Among its system tables, ClickHouse provides a ZooKeeper proxy table, so the data stored in ZooKeeper can be queried with SQL.
-- Query the ZooKeeper root directory
select * from system.zookeeper where path = '/'
-- Query the ClickHouse directory
select * from system.zookeeper where path = '/clickhouse'