Data warehouse 4.0 notes - user behavior data collection III
2022-07-23 11:41:00 【Silky】
1 Kafka cluster installation
Extract the installation package:
[zhang@hadoop102 software]$ tar -zxvf kafka_2.11-2.4.1.tgz -C /opt/module/

Rename the directory:
[zhang@hadoop102 module]$ mv kafka_2.11-2.4.1/ kafka
Create a logs folder:
[zhang@hadoop102 kafka]$ mkdir logs

Modify the configuration file:
[zhang@hadoop102 kafka]$ cd config/
[zhang@hadoop102 config]$ vi server.properties


Modify or add the following settings:
# Globally unique broker ID; no two brokers may share a value
broker.id=0
# Enable topic deletion
delete.topic.enable=true
# Directory where Kafka stores its data (in Kafka, "log" means the message data, not the server's runtime log)
log.dirs=/opt/module/kafka/data
# ZooKeeper cluster connection address
zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181/kafka



[zhang@hadoop102 config]$ cd ..
[zhang@hadoop102 kafka]$ cd ..
[zhang@hadoop102 module]$ ll
Distribute the installation to the other nodes:
[zhang@hadoop102 module]$ xsync kafka/


On hadoop103 and hadoop104, change broker.id in server.properties so that every broker has a unique value.
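The per-host change can also be scripted rather than edited by hand. The following is a minimal sketch, assuming IDs 1 and 2 go to hadoop103 and hadoop104 (the notes only fix broker.id=0 on hadoop102) and that GNU sed is available. The function names broker_id_for_host and set_broker_id are illustrative, not part of Kafka.

```shell
#!/bin/bash
# Map a host name to a broker ID.
# Assumption: hadoop102 -> 0, hadoop103 -> 1, hadoop104 -> 2.
broker_id_for_host() {
    case "$1" in
        hadoop102) echo 0 ;;
        hadoop103) echo 1 ;;
        hadoop104) echo 2 ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Rewrite the broker.id line of a server.properties file in place.
# $1 = path to server.properties, $2 = broker ID
set_broker_id() {
    sed -i "s/^broker\.id=.*/broker.id=$2/" "$1"
}
```

On each node it could be called as set_broker_id /opt/module/kafka/config/server.properties "$(broker_id_for_host "$(hostname)")", provided hostname returns the short name used above.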



Configure environment variables:
[zhang@hadoop102 kafka]$ sudo vim /etc/profile.d/my_env.sh

[zhang@hadoop102 kafka]$ source /etc/profile.d/my_env.sh
Distribute the file, then source it on the other two nodes:
[zhang@hadoop102 kafka]$ sudo /home/zhang/bin/xsync /etc/profile.d/my_env.sh
[zhang@hadoop103 config]$ source /etc/profile.d/my_env.sh
[zhang@hadoop104 config]$ source /etc/profile.d/my_env.sh
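The notes do not show what was added to my_env.sh. A typical addition, assuming Kafka is installed in /opt/module/kafka as above, might look like this (the variable name KAFKA_HOME is a common convention, not something the notes confirm):

```shell
# Kafka environment variables (assumed content; the original file is not shown)
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin
```

With the bin directory on PATH, the Kafka scripts can be run from any directory without the bin/ prefix.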
Start the cluster by running the start command on each node in turn:
[zhang@hadoop102 kafka]$ bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties

[zhang@hadoop103 kafka]$ bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties

[zhang@hadoop104 kafka]$ bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties

2 Kafka cluster start/stop script
Create the script kf.sh in the /home/zhang/bin directory:
[zhang@hadoop102 bin]$ vim kf.sh

Add the following script content:
#!/bin/bash
# Start or stop Kafka on every broker host over ssh.
case "$1" in
"start"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo " -------- starting Kafka on $i -------"
        ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
    done
};;
"stop"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo " -------- stopping Kafka on $i -------"
        # kafka-server-stop.sh takes no arguments
        ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh"
    done
};;
esac

[zhang@hadoop102 bin]$ chmod 777 kf.sh

Stopping the cluster with the script (kf.sh stop) works.

Starting it with the script (kf.sh start) also succeeds.

3 Common Kafka commands
List Kafka topics:
[zhang@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181/kafka --list
Create a Kafka topic
From the /opt/module/kafka/ directory, create the log topic:
[zhang@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181,hadoop103:2181,hadoop104:2181/kafka --create --replication-factor 1 --partitions 1 --topic topic_log
Delete a Kafka topic:
[zhang@hadoop102 kafka]$ bin/kafka-topics.sh --delete --zookeeper hadoop102:2181,hadoop103:2181,hadoop104:2181/kafka --topic topic_log
Produce messages:
[zhang@hadoop102 kafka]$ bin/kafka-console-producer.sh \
--broker-list hadoop102:9092 --topic topic_log
>hello world
>zhang zhang
Consume messages:
[zhang@hadoop102 kafka]$ bin/kafka-console-consumer.sh \
--bootstrap-server hadoop102:9092 --from-beginning --topic topic_log
4 Project experience: Kafka stress test

Limit the network bandwidth of hadoop102, hadoop103, and hadoop104 to 100 Mbps.
Shut down the hadoop102 host and clone hadoop105 from it (changing the IP address and hostname).
Leave the bandwidth of hadoop105 unlimited.
When shutting down the cluster, always stop Kafka first and ZooKeeper second. If ZooKeeper is stopped first, Kafka cannot shut down properly.






Connect to hadoop105 from Xshell.
Start hadoop102.

Make sure the network of hadoop105 is not limited.
Create a test topic with 3 partitions and 2 replicas:
[zhang@hadoop105 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181,hadoop103:2181,hadoop104:2181/kafka --create --replication-factor 2 --partitions 3 --topic test

Kafka producer stress test
[zhang@hadoop105 kafka]$ bin/kafka-producer-perf-test.sh --topic test --record-size 100 --num-records 10000000 --throughput -1 --producer-props bootstrap.servers=hadoop102:9092,hadoop103:9092,hadoop104:9092
record-size: the size of a single message, in bytes.
num-records: the total number of messages to send.
throughput: the maximum number of messages sent per second. A value of -1 disables throttling, so the producer sends as fast as it can and the test measures its maximum throughput.
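Before reading the test output, it helps to know the ceiling that the network imposes. The sketch below is plain arithmetic on the numbers above (100 Mbps links, 100-byte records, 10,000,000 records). The function names are illustrative, and it assumes 1 MB = 1024 × 1024 bytes and 1 Mbit = 1,000,000 bits.

```shell
#!/bin/bash
# Upper bound on throughput imposed by one link, in MB/sec (1 MB = 1024*1024 bytes).
# $1 = link speed in Mbps (1 Mbit = 1,000,000 bits)
link_limit_mb_per_sec() {
    awk -v mbps="$1" 'BEGIN { printf "%.2f\n", mbps * 1000000 / 8 / (1024 * 1024) }'
}

# Total payload of a test run, in MB.
# $1 = record size in bytes, $2 = number of records
total_payload_mb() {
    awk -v size="$1" -v n="$2" 'BEGIN { printf "%.2f\n", size * n / (1024 * 1024) }'
}

link_limit_mb_per_sec 100          # prints 11.92
total_payload_mb 100 10000000      # prints 953.67
```

If the MB/sec figure reported by the test approaches the per-link limit, the network rather than the producer is probably the limiting factor.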

batch.size defaults to 16 KB (16384 bytes).
If batch.size is too small, throughput drops. For example, a batch size of 0 disables batching entirely, so messages are sent one at a time.
If batch.size is too large, message latency increases. For example, if the batch size is 64 KB and the batch takes 5 seconds to fill before it is sent, the messages in it are delayed by up to 5 seconds.

Summary
When both batch.size and linger.ms are set, a batch is sent as soon as either condition is met.
Kafka tuning has to balance high throughput against latency.
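One way to explore this trade-off is to repeat the producer test with different batch.size and linger.ms values passed through --producer-props. The sketch below only prints the commands so they can be reviewed before running. The function name perf_cmd and the chosen values are illustrative assumptions, not settings from the original test.

```shell
#!/bin/bash
# Print a kafka-producer-perf-test.sh command for one batch.size / linger.ms pair.
# $1 = batch.size in bytes, $2 = linger.ms in milliseconds
perf_cmd() {
    echo "bin/kafka-producer-perf-test.sh --topic test --record-size 100" \
         "--num-records 10000000 --throughput -1" \
         "--producer-props bootstrap.servers=hadoop102:9092,hadoop103:9092,hadoop104:9092" \
         "batch.size=$1 linger.ms=$2"
}

# Example sweep: 16 KB (the default), 32 KB and 64 KB batches with a 5 ms linger.
for batch in 16384 32768 65536
do
    perf_cmd "$batch" 5
done
```

Comparing the records/sec and latency columns across the runs shows how larger batches trade latency for throughput on a given cluster.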
Kafka consumer stress test
[zhang@hadoop105 kafka]$ bin/kafka-consumer-perf-test.sh --broker-list hadoop102:9092,hadoop103:9092,hadoop104:9092 --topic test --fetch-size 10000 --messages 10000000 --threads 1
--broker-list: the Kafka cluster address
--topic: the name of the topic
--fetch-size: the amount of data fetched per request
--messages: the total number of messages to consume

Increase the fetch-size value and observe how consumption throughput changes.

Summary
Consumer throughput is affected by network bandwidth and by fetch-size.
Project experience: calculating the number of Kafka partitions

The number of partitions is generally set to between 3 and 10.
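The notes give only the typical range. A commonly cited rule of thumb (an addition here, not something the notes state) sizes the partition count as the target throughput divided by the lower of the per-partition producer and consumer throughputs, each measured with a stress test on a single-partition topic. A minimal sketch with made-up figures (target 100 MB/s, producer 20 MB/s, consumer 50 MB/s); the function name partitions_needed is illustrative:

```shell
#!/bin/bash
# partitions = ceil(target / min(producer, consumer)), all in MB/s.
# $1 = target throughput
# $2 = producer throughput per partition
# $3 = consumer throughput per partition
partitions_needed() {
    awk -v t="$1" -v p="$2" -v c="$3" 'BEGIN {
        m = (p < c) ? p : c          # the slower side limits each partition
        n = int(t / m)
        if (n * m < t) n++           # round up
        print n
    }'
}

partitions_needed 100 20 50   # prints 5
```

The result is a starting point only; the measured throughputs depend on message size, replication, and hardware, so the figure should be checked against a real test run.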