Blog Deep Dive | Apache InLong Uses Apache Pulsar for Data Ingestion into the Data Warehouse
2022-06-11 09:17:00 【StreamNative】
About Apache Pulsar
Apache Pulsar is a top-level project of the Apache Software Foundation and a next-generation cloud-native distributed messaging and streaming platform. It combines messaging, storage, and lightweight functional computing in one system and uses an architecture that separates compute from storage. It supports multi-tenancy, persistent storage, and cross-region data replication across multiple data centers, and offers streaming data storage with strong consistency, high throughput, low latency, and high scalability.
GitHub address: http://github.com/apache/pulsar/
This article is reposted from the Apache InLong official WeChat account. Original address: https://mp.weixin.qq.com/s/WgVJzu77Hncu-okce8_qaQ
Apache InLong has added the ability to ingest data through Apache Pulsar. It takes full advantage of Pulsar's technical strengths over other message queues (MQs) to provide a complete solution for data-access scenarios with high data-quality requirements, such as finance and billing. Below, we walk through a complete example of how to use Apache InLong to ingest data via Apache Pulsar.

Apache InLong (incubating) Overview
Apache InLong (Chinese name meaning "winged dragon", https://inlong.apache.org) is a one-stop data stream ingestion service platform that Tencent donated to the Apache community. It provides automated, secure, reliable, and high-performance data transmission, making it easier for businesses to build stream-based data analysis, modeling, and applications. InLong was originally named TubeMQ and focused on high-performance, low-cost message queuing. To further unlock the ecosystem around TubeMQ, we upgraded the project to InLong, with a focus on building a one-stop data stream ingestion platform. Apache InLong is modeled on TDBank, which is used internally at Tencent and handles data ingestion and processing at the trillion-record scale. It integrates the whole data-processing pipeline of collection, aggregation, storage, and sorting, and is easy to use, flexibly scalable, stable, and reliable.

Apache InLong covers the whole life cycle from data collection to landing, and provides a different processing module for each stage of the data:
• inlong-agent: the data collection agent. It reads regular logs from specified directories or files and reports them line by line; DB collection, HTTP reporting, and other capabilities will be added in the future.
• inlong-dataproxy: a proxy component based on Flume-ng. It supports blocking on send and spill-to-disk retransmission, and can forward the data it receives to different MQs (message queues).
• inlong-tubemq: Tencent's self-developed message queuing service, focused on high-performance storage and transmission of massive data in big-data scenarios, with strengths proven in large-scale production and low cost.
• inlong-sort: performs ETL on data consumed from different MQs, then aggregates it and writes it into storage systems such as Hive, ClickHouse, HBase, and Iceberg.
• inlong-manager: provides complete data service management and control, including metadata, task flows, permissions, and OpenAPI.
• inlong-website: the front-end pages for managing data access, which simplify use of the whole InLong control platform.
Apache Pulsar Overview

Apache Pulsar is a pub/sub messaging system designed with storage separated from compute. Compared with traditional partition-based MQs, this compute/storage separation and its segment-based storage give Pulsar several advantages:
• Brokers and Bookies are independent of each other, so each can be scaled out and fail over independently.
• Brokers are stateless, so they can be brought online and offline quickly, which suits cloud-native scenarios.
• The storage of a partition is not limited by the capacity of a single node.
• Partition data is distributed evenly.
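To make the last two points concrete, here is a toy Python model, assumed for illustration only and not Pulsar's or BookKeeper's actual placement policy: a partition's log is cut into fixed-size segments, and each segment is written to an ensemble of bookies chosen round-robin. The segment size, bookie names, and round-robin rule are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def place_segments(num_messages: int, messages_per_segment: int,
                   bookies: List[str], ensemble_size: int) -> List[List[str]]:
    """Cut one partition's log into fixed-size segments and assign each segment
    to `ensemble_size` bookies, starting one bookie later for every new segment.
    This round-robin rule is a toy policy, not BookKeeper's real placement logic."""
    if not 1 <= ensemble_size <= len(bookies):
        raise ValueError("ensemble_size must be between 1 and the number of bookies")
    num_segments = -(-num_messages // messages_per_segment)  # ceiling division
    placement = []
    for seg in range(num_segments):
        start = seg % len(bookies)
        placement.append([bookies[(start + i) % len(bookies)] for i in range(ensemble_size)])
    return placement


def load_per_bookie(placement: List[List[str]]) -> Dict[str, int]:
    """Count how many segment copies each bookie holds."""
    return dict(Counter(b for ensemble in placement for b in ensemble))


if __name__ == "__main__":
    bookies = ["bookie-1", "bookie-2", "bookie-3", "bookie-4"]
    placement = place_segments(8000, 1000, bookies, ensemble_size=2)
    # One partition's 8 segments end up spread over all 4 bookies.
    print(load_per_bookie(placement))
    # prints {'bookie-1': 4, 'bookie-2': 4, 'bookie-3': 4, 'bookie-4': 4}
```

Because a single partition spans many segments on many bookies, its total size is bounded by the cluster rather than by one disk, and adding a bookie simply gives new segments another place to land.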
Prerequisites
• Install Apache Pulsar, version 2.6+
• Install Apache Hive, version 2.3+
Install InLong
InLong can be deployed in one step with Docker Compose, or on ordinary machines from the binary installation package:
• Docker Compose deployment: https://inlong.apache.org/zh-CN/docs/next/deployment/docker
• Installation package deployment: https://inlong.apache.org/zh-CN/docs/next/deployment/bare_metal
Unlike InLong TubeMQ, if you use Apache Pulsar you must configure the Pulsar cluster information when installing the Manager component, in the following format:
```properties
# Pulsar admin URL
pulsar.adminUrl=http://127.0.0.1:8080,127.0.0.2:8080,127.0.0.3:8080
# Pulsar broker address
pulsar.serviceUrl=pulsar://127.0.0.1:6650,127.0.0.2:6650,127.0.0.3:6650
# Default tenant of Pulsar
pulsar.defaultTenant=public
```
Create data access
Configure the data stream group information

When creating data access, you can choose Pulsar as the message middleware for the data stream group. The Pulsar-related configuration items are:
• Queue model: parallel or sequential. With the parallel model you can set the number of topic partitions; the sequential model uses a single partition.
• Write quorum: the number of copies written for each message.
• Ack quorum: the number of Bookies that must acknowledge a write before it is confirmed.
• Retention time: how long messages already acknowledged by consumers are retained.
• TTL: the expiration time of unacknowledged messages.
• Retention size: the maximum size of acknowledged messages that is retained.
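The queue-model and quorum options interact, so it can help to check them before submitting. The sketch below is a hypothetical client-side check, not part of InLong: the `PulsarStreamSettings` class and `validate` function are names invented here. It encodes two rules stated above or known from BookKeeper: the sequential model means one partition, and the ack quorum cannot exceed the write quorum (which in turn cannot exceed the ensemble size, not modeled here). Retention and TTL are left out, since their special values (such as unlimited retention) are Pulsar-specific.

```python
from dataclasses import dataclass


@dataclass
class PulsarStreamSettings:
    """Hypothetical mirror of the Pulsar options on the InLong form.
    Field names are illustrative only, not InLong's real API."""
    queue_model: str   # "parallel" or "sequential"
    partitions: int    # topic partitions; only used by the parallel model
    write_quorum: int  # copies written for each message
    ack_quorum: int    # bookie acks required before a write is confirmed


def validate(s: PulsarStreamSettings) -> int:
    """Check the settings for consistency and return the number of
    topic partitions the stream would get."""
    if s.queue_model not in ("parallel", "sequential"):
        raise ValueError(f"unknown queue model: {s.queue_model!r}")
    # BookKeeper requires ack quorum <= write quorum.
    if not 1 <= s.ack_quorum <= s.write_quorum:
        raise ValueError("expected 1 <= ack quorum <= write quorum")
    if s.queue_model == "sequential":
        return 1  # ordering is kept by using a single partition
    if s.partitions < 1:
        raise ValueError("the parallel model needs at least one partition")
    return s.partitions
```

For example, `validate(PulsarStreamSettings("parallel", 4, 3, 2))` returns 4, while any sequential stream yields exactly one partition regardless of the partition count entered.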
Configure the data stream

When configuring the message source, for the path of the data source file, refer to the detailed File Agent guide for inlong-agent [1].
Configure the data format

Configure the Hive cluster
Save the Hive stream and click "Submit for approval".

Data access approval
Go to the Approval Management page, click My Approvals, and approve the access application submitted above. Once it is approved, the topic and subscription required by the data stream are created synchronously in the Pulsar cluster.
We can use the command-line tool on the Pulsar cluster to check whether the topic was created successfully.
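The same check can also be scripted against the Pulsar admin REST API. The sketch below assumes the broker exposes the standard `GET /admin/v2/persistent/{tenant}/{namespace}` endpoint, which returns a JSON list of topic names (with partitioned topics typically listed as `...-partition-N`). The tenant `public` and namespace `b_test_group` are taken from this article's example; `ADMIN_URL` and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, List

# Assumption: any single broker from pulsar.adminUrl in the Manager config.
ADMIN_URL = "http://127.0.0.1:8080"


def list_topics_url(admin_url: str, tenant: str, namespace: str) -> str:
    """Admin REST endpoint that lists the persistent topics of a namespace."""
    return f"{admin_url.rstrip('/')}/admin/v2/persistent/{tenant}/{namespace}"


def topic_exists(topics: Iterable[str], full_name: str) -> bool:
    """True if the topic itself, or any of its partitions, is in the list."""
    prefix = full_name + "-partition-"
    return any(t == full_name or t.startswith(prefix) for t in topics)


def fetch_topics(admin_url: str, tenant: str, namespace: str) -> List[str]:
    """Fetch the topic list from a running broker (requires network access)."""
    url = list_topics_url(admin_url, tenant, namespace)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running cluster, `topic_exists(fetch_topics(ADMIN_URL, "public", "b_test_group"), "persistent://public/b_test_group/test_stream")` should return True once approval has completed. The command-line equivalents are `pulsar-admin topics list public/b_test_group` and, for the subscription, `pulsar-admin topics subscriptions persistent://public/b_test_group/test_stream`.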

Configure the Agent
When configuring the Agent, create the file in the directory specified when the data access was created:
```shell
touch /data/test_file.txt
```
Following the data source format defined when the data stream was created, write data to the file (more rows can be added in the same format):
```shell
echo -e "1|test\n2|test" >> /data/test_file.txt
```
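For more than a couple of rows, a small script avoids malformed lines (wrong field count, or a stray delimiter inside a value) that would not map cleanly onto the Hive columns. This is a sketch under assumptions: the `|` delimiter and the two-field schema (id, name) are inferred from the sample rows above, and `format_rows`/`append_rows` are hypothetical helpers, not InLong tools.

```python
from pathlib import Path
from typing import Iterable, List, Sequence

DELIMITER = "|"   # assumed from the sample rows "1|test"
NUM_FIELDS = 2    # assumed schema: id, name


def format_rows(rows: Iterable[Sequence[object]]) -> List[str]:
    """Render rows as delimiter-separated lines, rejecting rows with the wrong
    number of fields or values containing the delimiter or a newline."""
    lines = []
    for row in rows:
        if len(row) != NUM_FIELDS:
            raise ValueError(f"expected {NUM_FIELDS} fields, got {len(row)}: {row!r}")
        fields = [str(v) for v in row]
        if any(DELIMITER in f or "\n" in f for f in fields):
            raise ValueError(f"field contains delimiter or newline: {row!r}")
        lines.append(DELIMITER.join(fields))
    return lines


def append_rows(path: Path, rows: Iterable[Sequence[object]]) -> int:
    """Append rows to the file the Agent watches; returns the number of lines written."""
    lines = format_rows(rows)  # validate everything before touching the file
    with path.open("a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)
```

`append_rows(Path("/data/test_file.txt"), [(1, "test"), (2, "test")])` appends the same two rows as the echo command above.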
Verify that the data has landed
Finally, log in to the Hive cluster and use a Hive SQL query to check whether data has been inserted into the test_stream table.
Troubleshooting
If the data is not written to the Hive cluster correctly, check whether the relevant information has been synchronized to DataProxy and Sort:
• Check whether conf/topics.properties in the InLong DataProxy directory contains the correct topic entry for the data stream, for example: b_test_group/test_stream=persistent://public/b_test_group/test_stream
• Check whether the data stream's configuration has been pushed to the ZooKeeper path that InLong Sort watches: get /inlong_hive/dataflows/{{sink_id}}
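The first check can be scripted. This sketch assumes, based only on the single example entry in this article, that each entry maps `group/stream` to `persistent://{tenant}/{group}/{stream}`; the parser is deliberately minimal (no Java properties escapes or line continuations), and both function names are hypothetical.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value parser: skips blank lines and '#' or '!' comments.
    Does not handle Java .properties escapes or line continuations."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_stream_topic(text: str, group: str, stream: str, tenant: str = "public") -> str:
    """Return a short diagnosis for the group/stream entry in topics.properties."""
    props = parse_properties(text)
    key = f"{group}/{stream}"
    expected = f"persistent://{tenant}/{group}/{stream}"  # assumed naming rule
    if key not in props:
        return f"missing entry for {key}"
    if props[key] != expected:
        return f"unexpected topic for {key}: {props[key]} (expected {expected})"
    return "ok"
```

Reading the file with `Path("conf/topics.properties").read_text()` and passing it in with `"b_test_group"` and `"test_stream"` should return `"ok"` when DataProxy has picked up the stream.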
References
[1] File Agent detailed guide: https://inlong.apache.org/docs/next/modules/agent/file#file-agent-configuration
This article is from the WeChat official account ApachePulsar.