
ClickHouse Learning (V): Cluster Operation

2022-07-29 05:33:00 【Crying dogs in the sun】

Contents

Replicas
- ZooKeeper configuration
- Test
Sharded cluster
- Setup
- Test

Replicas

ZooKeeper configuration

Direct modification in config.xml
Just replace the ZooKeeper hosts in the file with your own.


Using an external file
Create a metrika.xml file in /etc/clickhouse-server/config.d:

<?xml version="1.0"?>
<yandex>
  <zookeeper-servers>
    <node index="1">
      <host>spark01</host>
      <port>2181</port>
    </node>
    <node index="2">
      <host>spark02</host>
      <port>2181</port>
    </node>
    <node index="3">
      <host>spark03</host>
      <port>2181</port>
    </node>
  </zookeeper-servers>
</yandex>

Copy the file to the other machines.
Then add the following to /etc/clickhouse-server/config.xml:

<zookeeper incl="zookeeper-servers" optional="true" />
<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>

Distribute config.xml to the other nodes in the cluster.

This completes the configuration
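
As a quick sanity check, the ZooKeeper connection can be probed from clickhouse-client through the system.zookeeper table once clickhouse-server has been restarted. This is only a minimal sketch, assuming the server has loaded the configuration above; the root path '/' is chosen purely for illustration.

```sql
-- List the children of the ZooKeeper root node.
-- system.zookeeper requires a condition on path in the WHERE clause.
-- If ZooKeeper is not configured or not reachable, the query fails with an
-- error instead of returning rows.
SELECT name
FROM system.zookeeper
WHERE path = '/';
```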

Test

Replication synchronizes data only, not the table structure, so the table has to be created on every replica.

/clickhouse/table/01/t_order_rep is the table's path in ZooKeeper, where 01 identifies the shard.
rep_102 is the name of the replica.

Create the table on each of the three machines.

On spark01:
create table t_order_rep2 (
 id UInt32,
 sku_id String,
 total_amount Decimal(16,2),
 create_time Datetime
) engine =ReplicatedMergeTree('/clickhouse/table/01/t_order_rep','rep_101')
 partition by toYYYYMMDD(create_time)
 primary key (id)
 order by (id,sku_id);
 
On spark02:
create table t_order_rep2 (
 id UInt32,
 sku_id String,
 total_amount Decimal(16,2),
 create_time Datetime
) engine =ReplicatedMergeTree('/clickhouse/table/01/t_order_rep','rep_102')
 partition by toYYYYMMDD(create_time)
 primary key (id)
 order by (id,sku_id);

On spark03:
create table t_order_rep2 (
 id UInt32,
 sku_id String,
 total_amount Decimal(16,2),
 create_time Datetime
) engine =ReplicatedMergeTree('/clickhouse/table/01/t_order_rep','rep_103')
 partition by toYYYYMMDD(create_time)
 primary key (id)
 order by (id,sku_id);
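
To confirm that the three tables registered as replicas of the same shard, system.replicas can be queried on each node. This is a sketch, assuming the tables were created as shown above; the comments describe what one would expect to see rather than captured output.

```sql
SELECT
    database,
    table,
    zookeeper_path,   -- expected to be '/clickhouse/table/01/t_order_rep'
    replica_name,     -- rep_101, rep_102 or rep_103, depending on the node
    total_replicas,   -- replicas of this table known to ZooKeeper
    active_replicas   -- replicas that currently hold a ZooKeeper session
FROM system.replicas
WHERE table = 't_order_rep2';
```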

Insert data on spark01:

insert into t_order_rep2 values
(101,'sku_001',1000.00,'2020-06-01 12:00:00'),
(102,'sku_002',2000.00,'2020-06-01 12:00:00'),
(103,'sku_004',2500.00,'2020-06-01 12:00:00'),
(104,'sku_002',2000.00,'2020-06-01 12:00:00'),
(105,'sku_003',600.00,'2020-06-02 12:00:00');

The data can then be queried on spark02 and spark03.
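
One way to check this is to run the same queries on spark02 and spark03. This is a minimal sketch; hostName() is used only to label which server answered. Replication is asynchronous, so the rows may take a moment to appear.

```sql
-- How many rows this replica currently holds.
SELECT hostName() AS host, count() AS row_count
FROM t_order_rep2;

-- The replicated rows themselves.
SELECT *
FROM t_order_rep2
ORDER BY id;
```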

Sharded cluster

Sharding distributes the data of one table across different nodes; the Distributed table engine then stitches the data back together for use.
The Distributed table engine stores no data itself. It only manages access to the underlying shards.

Setup

Create two shards. The first shard has two replicas (spark01 and spark02); the second has one (spark03).

Create a metrika-shard.xml file in config.d:

<?xml version="1.0"?>
<yandex>
  <remote_servers>
    <clusters> <!-- Cluster name -->
      <shard> <!-- First shard of the cluster -->
        <internal_replication>true</internal_replication>
        <replica> <!-- First replica of this shard -->
          <host>spark01</host>
          <port>9000</port>
        </replica>
        <replica> <!-- Second replica of this shard -->
          <host>spark02</host>
          <port>9000</port>
        </replica>
      </shard>
      <shard> <!-- Second shard of the cluster -->
        <internal_replication>true</internal_replication>
        <replica> <!-- First replica of this shard -->
          <host>spark03</host>
          <port>9000</port>
        </replica>
      </shard>
    </clusters>
  </remote_servers>
  <zookeeper-servers>
    <node index="1">
      <host>spark01</host>
      <port>2181</port>
    </node>
    <node index="2">
      <host>spark02</host>
      <port>2181</port>
    </node>
    <node index="3">
      <host>spark03</host>
      <port>2181</port>
    </node>
  </zookeeper-servers>
  <macros>
    <shard>01</shard> <!-- Shard ID; differs per machine according to the shard it hosts -->
    <replica>rep_1_1</replica> <!-- Replica name; differs on every machine -->
  </macros>
</yandex>

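Distribute the file to the other nodes in the cluster.

On spark02, modify the macros section of the file so that the shard and replica values match that node.

On spark03, modify the macros section of the file in the same way.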

In config.xml, change the included file name to metrika-shard.xml.

Distribute config.xml to the other nodes in the cluster.

The service must be restarted every time config.xml is changed.
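
After the restart, the cluster layout and the per-node macros can be inspected from clickhouse-client. This is a sketch, assuming the cluster is named clusters as in metrika-shard.xml. On spark01 the second query would be expected to show shard = 01 and replica = rep_1_1, the values configured above; the other nodes should show their own values.

```sql
-- Shards and replicas of the cluster as the server sees them.
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'clusters';

-- Macro values of the node the client is connected to.
SELECT macro, substitution
FROM system.macros;
```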

Test

First create the local table on every shard:

create table st_order_mt on cluster clusters (
 id UInt32,
 sku_id String,
 total_amount Decimal(16,2),
 create_time Datetime
) engine =ReplicatedMergeTree('/clickhouse/tables/{shard}/st_order_mt','{replica}')
 partition by toYYYYMMDD(create_time)
 primary key (id)
 order by (id,sku_id);
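
To see how the {shard} and {replica} macros were expanded on each node, system.replicas can be read across the whole cluster. This is a sketch, assuming the installed ClickHouse version provides the clusterAllReplicas table function.

```sql
-- One row per node that hosts st_order_mt, with its resolved
-- ZooKeeper path and replica name.
SELECT
    hostName() AS host,
    zookeeper_path,
    replica_name
FROM clusterAllReplicas('clusters', system.replicas)
WHERE table = 'st_order_mt';
```

Replicas of the same shard should share one zookeeper_path and differ only in replica_name.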

Then create the Distributed table:

create table test2 on cluster clusters
(
 id UInt32,
 sku_id String,
 total_amount Decimal(16,2),
 create_time Datetime
)engine = Distributed(clusters,default, st_order_mt,hiveHash(sku_id));

Insert data into the Distributed table:

insert into test2 values
(201,'sku_001',1000.00,'2020-06-03 12:00:00'),
(202,'sku_002',2000.00,'2020-06-01 12:00:00'),
(203,'sku_004',2500.00,'2020-06-01 12:00:00'),
(204,'sku_002',2000.00,'2020-06-01 12:00:00'),
(205,'sku_003',600.00,'2020-06-02 12:00:00');
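
The queries below sketch one way to see where the rows ended up. They assume both shards keep the default weight, so that under ClickHouse's documented routing rule a remainder of 0 goes to the first shard and a remainder of 1 to the second. The _shard_num virtual column depends on the installed version. Inserts through a Distributed table are forwarded asynchronously by default, so the rows may take a moment to appear.

```sql
-- Read through the Distributed table: rows from all shards.
SELECT * FROM test2 ORDER BY id;

-- Read the local table: only the rows stored on the connected node's shard.
SELECT * FROM st_order_mt ORDER BY id;

-- Row count per shard, via the _shard_num virtual column of Distributed tables.
SELECT _shard_num, count() AS row_count
FROM test2
GROUP BY _shard_num
ORDER BY _shard_num;

-- Value of the sharding expression modulo the total weight (2) for each sku_id.
SELECT sku_id, hiveHash(sku_id) % 2 AS remainder
FROM (SELECT arrayJoin(['sku_001', 'sku_002', 'sku_003', 'sku_004']) AS sku_id);
```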