Clickhouse distributed cluster construction
2022-07-28 14:11:00 【Big data Institute】
01 ClickHouse Cluster Installation and Deployment
1.1 Installing and Deploying Zookeeper
1. Installation modes
Zookeeper has three installation modes:
Standalone mode: Zookeeper runs on a single server; suitable for test environments.
Pseudo-cluster mode: multiple Zookeeper instances run on one physical machine; suitable for test environments.
Distributed cluster mode: Zookeeper runs on a cluster of machines; suitable for production environments.
2. Cluster planning
(1) Host plan
The smallest Zookeeper ensemble has 3 nodes. In production, 3 Zookeeper nodes are enough for clusters of up to 100 nodes, and 5 nodes are enough for clusters of up to 500 nodes.
Service | Hadoop3-01 | Hadoop3-02 | Hadoop3-03 |
Zookeeper | yes | yes | yes |
(2) Software plan
Use the JDK version commonly used in production, JDK 1.8 (the newest release at the time of writing was Java 10).
Software | Version | Bits |
JDK | 1.8 | 64 |
CentOS | 7 | 64 |
Zookeeper | zookeeper-3.5.6 | stable release |
(3) User plan
All big data platform cluster software is installed uniformly under the hadoop user.
Node name | Group | User |
Hadoop3-01 | hadoop | hadoop |
Hadoop3-02 | hadoop | hadoop |
Hadoop3-03 | hadoop | hadoop |
(4) Directory plan
For easier unified management, plan the software, script, and data directories in advance.
Name | Path |
Software directory | /home/hadoop/app |
Script directory | /home/hadoop/tools |
Data directory | /home/hadoop/data |
3. JDK installation
Zookeeper is written in Java and runs on the JVM, so a JDK runtime environment must be installed first.
(1) Download the JDK
Download the desired JDK version from the official site; here we install JDK 1.8 and upload the package to the /home/hadoop/app directory.
(2) Extract the JDK
Extract the JDK installation package with the tar -zxvf command.
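Since the real JDK tarball is environment-specific, here is a minimal end-to-end sketch of the extraction step against a stand-in archive (the jdk-demo.tar.gz name and directory layout are placeholders invented for this example):

```shell
# Build a stand-in archive so extraction can be demonstrated end to end;
# on the real host you would run only the final tar -zxvf against the
# uploaded JDK tarball in /home/hadoop/app.
mkdir -p /tmp/tar-demo && cd /tmp/tar-demo
mkdir -p jdk1.8.0_51/bin && touch jdk1.8.0_51/bin/java
tar -zcf jdk-demo.tar.gz jdk1.8.0_51
rm -rf jdk1.8.0_51

# The extraction command itself, as in the guide:
tar -zxvf jdk-demo.tar.gz
ls jdk1.8.0_51/bin    # java
```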

(3) Create a soft link
To make version upgrades easier, create a jdk soft link pointing at the real JDK install path, using the following command: ln -s jdk1.8.0_51 jdk
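A runnable sketch of this version-switching pattern, run in a scratch directory (the version directory names are placeholders; the -fn flags are added only so the sketch can be re-run):

```shell
# The generic "jdk" link lets JAVA_HOME stay constant across upgrades.
mkdir -p /tmp/app-demo && cd /tmp/app-demo
mkdir -p jdk1.8.0_51 jdk1.8.0_301

ln -sfn jdk1.8.0_51 jdk     # point the link at the version in use
readlink jdk                # jdk1.8.0_51

# Upgrading later is just re-pointing the link:
ln -sfn jdk1.8.0_301 jdk
readlink jdk                # jdk1.8.0_301
```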

(4) Configure environment variables
1) Modify /etc/profile
This approach applies to all users and all shells on the machine, so it is only recommended when the machine is used purely for development; system-wide environment variables can raise security concerns.
vi /etc/profile
JAVA_HOME=/home/hadoop/app/jdk
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:/home/hadoop/tools:$PATH
export JAVA_HOME CLASSPATH PATH
2) Modify the .bashrc file
This approach is safer because it scopes the environment variables to a single user. To grant a specific user these variables, modify only that user's .bashrc file.
vi ~/.bashrc
JAVA_HOME=/home/hadoop/app/jdk
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:/home/hadoop/tools:$PATH
export JAVA_HOME CLASSPATH PATH

(5) Source the profile
Run source ~/.bashrc to make the newly configured environment variables take effect.
(6) Check whether the JDK installed successfully
Run java -version to check the JDK version; if the current JDK version is printed, the installation succeeded.

(7) Sync the JDK package to the other nodes
Use the script command deploy.sh jdk1.8.0_51 /home/hadoop/app/ slave to sync the JDK package to the other nodes, then repeat steps (2) through (6) on each node to complete its JDK installation.
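deploy.sh is a site-specific helper script, not a standard tool. The sketch below shows roughly what such a sync does, with hostnames taken from the host plan; scp is echoed rather than executed so nothing is actually copied:

```shell
# Dry run: echo the copy commands instead of executing them. On a real
# cluster, drop the echo and rely on passwordless SSH for the hadoop user.
src=jdk1.8.0_51
dest=/home/hadoop/app/
for host in hadoop3-02 hadoop3-03; do
  echo scp -r "$src" "hadoop@$host:$dest"
done
```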

4. Zookeeper installation
(1) Download Zookeeper
Mirror for current Apache Zookeeper releases:
http://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/
Archive of all Apache Zookeeper releases:
https://archive.apache.org/dist/zookeeper/

Note: download the Zookeeper binary installation package, not the source package. Starting from the source package fails with the following error:
Error: Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain
(2) Extract Zookeeper
Extract the Zookeeper installation package with the tar -zxvf command.

(3) Create a soft link
To make version upgrades easier, create a zookeeper soft link pointing at the real Zookeeper install path, using the following command: ln -s zookeeper-xxx zookeeper

(4) Modify the zoo.cfg configuration file
# The number of milliseconds of each tick
# (heartbeat interval between Zookeeper servers, and between clients and servers)
tickTime=2000
# The number of ticks that the initial synchronization phase can take
# (maximum heartbeat intervals tolerated while followers connect and sync with the leader)
initLimit=10
# The number of ticks that can pass between sending a request and getting an
# acknowledgement (leader/follower request and reply timeout)
syncLimit=5
# the directory where the snapshot is stored; do not use /tmp for storage.
# Create this data directory in advance.
dataDir=/home/hadoop/data/zookeeper/zkdata
# Transaction log directory; create it in advance.
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections; increase this to handle more clients
#maxClientCnxns=60
# Be sure to read the maintenance section of the administrator guide before
# turning on autopurge:
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours; set to "0" to disable the auto purge feature
#autopurge.purgeInterval=1
# server.<id>=<host>:<peer communication port>:<leader election port>
server.1=hadoop3-01:2888:3888
server.2=hadoop3-02:2888:3888
server.3=hadoop3-03:2888:3888
5. Sync the Zookeeper install directory to the other nodes
Distribute the entire Zookeeper install directory to the other nodes:
deploy.sh apache-zookeeper-3.5.6 /home/hadoop/app/ slave

Then create the soft link on each node:
ln -s apache-zookeeper-3.5.6 zookeeper

6. Create the planned directories
runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdata" all

runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdatalog" all

7. Set each node's server id
On each node, go to the /home/hadoop/data/zookeeper/zkdata directory and create a file named myid containing 1, 2, or 3 respectively, matching the server.N entries in zoo.cfg.
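The host-to-id mapping can be scripted. The sketch below simulates locally under /tmp what runs once on each node; on the real cluster the path is /home/hadoop/data/zookeeper/zkdata and each node writes only its own id:

```shell
# One directory per host stands in for the three nodes; the id written to
# myid must match that host's server.N line in zoo.cfg.
i=1
for host in hadoop3-01 hadoop3-02 hadoop3-03; do
  dir=/tmp/zk-demo/$host/zkdata
  mkdir -p "$dir"
  echo "$i" > "$dir/myid"
  i=$((i + 1))
done
cat /tmp/zk-demo/hadoop3-02/zkdata/myid   # 2
```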



8. Test run
Start Zookeeper:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" all

Check the Zookeeper processes:
runRemoteCmd.sh "jps" all

Check the Zookeeper status:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh status" all
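A healthy 3-node ensemble reports exactly one leader and two followers in the zkServer.sh status output. The sketch below checks that invariant against simulated output; on the real cluster you would pipe the runRemoteCmd.sh output in instead:

```shell
# Simulated "Mode:" lines as zkServer.sh status prints them across 3 nodes.
status_output='Mode: follower
Mode: leader
Mode: follower'

leaders=$(printf '%s\n' "$status_output" | grep -c 'Mode: leader')
followers=$(printf '%s\n' "$status_output" | grep -c 'Mode: follower')
echo "leaders=$leaders followers=$followers"   # leaders=1 followers=2
```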

1.2 ClickHouse Cluster Deployment
1. Prepare the other ClickHouse nodes
Complete the single-node ClickHouse installation and deployment on every other node first.
2. ClickHouse cluster configuration
(1) Configure metrika.xml
Taking the hadoop3-1 node as an example, add the following configuration to the metrika.xml file:
vi /etc/clickhouse-server/config.d/metrika.xml

<yandex>
    <!-- ClickHouse cluster configuration -->
    <clickhouse_remote_servers>
        <cluster_2shards_0replicas>
            <shard>
                <replica>
                    <host>hadoop3-1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>hadoop3-2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_2shards_0replicas>
    </clickhouse_remote_servers>
    <!-- Zookeeper configuration -->
    <zookeeper-servers>
        <node index="1">
            <host>hadoop3-1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>hadoop3-2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>hadoop3-3</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <macros>
        <replica>hadoop3-1</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
</yandex>
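A quick sanity check of a metrika.xml before restarting: the cluster definition should contain two shards and three Zookeeper nodes. The sketch greps a trimmed copy written to /tmp so it is self-contained; on a real node, point cfg at /etc/clickhouse-server/config.d/metrika.xml instead:

```shell
# Write a trimmed stand-in copy of the configuration, then count the
# <shard> and <node index> entries it declares.
cfg=/tmp/metrika-demo.xml
cat > "$cfg" <<'EOF'
<yandex>
  <clickhouse_remote_servers>
    <cluster_2shards_0replicas>
      <shard><replica><host>hadoop3-1</host><port>9000</port></replica></shard>
      <shard><replica><host>hadoop3-2</host><port>9000</port></replica></shard>
    </cluster_2shards_0replicas>
  </clickhouse_remote_servers>
  <zookeeper-servers>
    <node index="1"><host>hadoop3-1</host><port>2181</port></node>
    <node index="2"><host>hadoop3-2</host><port>2181</port></node>
    <node index="3"><host>hadoop3-3</host><port>2181</port></node>
  </zookeeper-servers>
</yandex>
EOF
grep -c '<shard>' "$cfg"        # 2
grep -c '<node index=' "$cfg"   # 3
```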
(2) Configure config.xml
In the global configuration file config.xml, use the <include_from> tag to pull in the configuration just defined:
vi /etc/clickhouse-server/config.xml

<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>
<!-- reference the Zookeeper configuration defined in metrika.xml -->
<zookeeper incl="zookeeper-servers" optional="false" />
<!-- uncomment this line so other nodes can reach this node's ClickHouse -->
<listen_host>::</listen_host>

(3) Repeat the configuration on the other cluster nodes
Add the metrika.xml configuration file on each remaining node and modify its global config.xml. In config.xml, each node must set its own macro definition; taking the hadoop3-2 node as an example:
<macros>
<replica>hadoop3-2</replica>
</macros>
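Since the <replica> macro is the only per-node difference, each node's snippet can be generated from a template. A sketch using sed, where the __HOST__ placeholder is invented for this example:

```shell
# Print the per-node macros snippet for every host in the cluster.
template='<macros><replica>__HOST__</replica></macros>'
for host in hadoop3-1 hadoop3-2 hadoop3-3; do
  printf '%s\n' "$template" | sed "s/__HOST__/$host/"
done
```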
3. Start the ClickHouse cluster
(1) Start the Zookeeper cluster
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" all
(2) Start the ClickHouse cluster
Start ClickHouse with the default configuration:
sudo /etc/init.d/clickhouse-server start
(3) Verify the ClickHouse cluster
Start the clickhouse client on each node and query the cluster information.
If the configured shard and replica information is visible, the clickhouse cluster deployment was fully successful.

ClickHouse provides a Zookeeper proxy table among its system tables, so Zookeeper data can be queried with SQL.
# Query the Zookeeper root directory
select * from system.zookeeper where path = '/'
# Query the ClickHouse directory
select * from system.zookeeper where path = '/clickhouse'