Nebula Importer data import practice
2022-07-04 19:14:00 【InfoQ】
Preface
- Streaming data from platforms such as Kafka and Pulsar needs to be imported into the Nebula Graph database
- Batch data needs to be read from a relational database (such as MySQL) or a distributed file system (such as HDFS)
- Large batches of data need to be generated as SST files that Nebula Graph can recognize
- Importer is suited to importing the contents of local CSV files into Nebula Graph
- Migrating data between different Nebula Graph clusters
- Migrating data between different graph spaces within the same Nebula Graph cluster
- Migrating data between Nebula Graph and other data sources
- Running graph computation in combination with Nebula Algorithm
Using Nebula Importer
[[email protected] importer]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
Stepping: 7
CPU MHz: 2499.998
BogoMIPS: 4999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-15
Disk: SSD
Memory: 128G
Cluster environment
- Nebula version: v2.6.1
- Deployment method: RPM
- Cluster size: three replicas, six nodes
Data scale
+---------+--------------------------+-----------+
| "Space" | "vertices" | 559191827 |
+---------+--------------------------+-----------+
| "Space" | "edges" | 722490436 |
+---------+--------------------------+-----------+
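To get a feel for what these row counts mean in wall-clock terms, here is a minimal back-of-the-envelope sketch. It assumes a constant import rate of about 200,000 rows/s, which is roughly the `Rows AVG` value the Importer log in this article reports; the function name `estimate_import_seconds` is hypothetical, and the result is an estimate, not a measured duration.

```python
# Row counts taken from the data-scale table above.
VERTICES = 559_191_827
EDGES = 722_490_436


def estimate_import_seconds(total_rows: int, rows_per_second: float) -> float:
    """Return a naive duration estimate that assumes a constant import rate."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return total_rows / rows_per_second


total_rows = VERTICES + EDGES
# ASSUMPTION: a steady ~200,000 rows/s, loosely based on the "Rows AVG"
# figures in the Importer log; real throughput varies during a run.
seconds = estimate_import_seconds(total_rows, 200_000)
print(f"{total_rows} rows, about {seconds / 3600:.1f} hours at 200k rows/s")
```

Because the rate fluctuates with concurrency and disk I/O, treat the figure as an order-of-magnitude guide for scheduling only.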
Importer configuration
# Graph version; set to v2 when connecting to Nebula Graph 2.x.
version: v2
description: Relation Space import data
# Whether to delete the temporarily generated log and error data files.
removeTempFiles: false
clientSettings:
  # Number of retries when an nGQL statement fails to execute.
  retry: 3
  # Number of concurrent Nebula Graph clients.
  concurrency: 5
  # Cache queue size of each Nebula Graph client.
  channelBufferSize: 1024
  # Graph space that the data is imported into.
  space: Relation
  # Connection information.
  connection:
    user: root
    password: ******
    address: 10.0.XXX.XXX:9669,10.0.XXX.XXX:9669
  postStart:
    # Operations to run after connecting to the Nebula Graph server and before inserting data.
    commands: |
    # Interval between running the commands above and starting to insert data.
    afterPeriod: 1s
  preStop:
    # Operations to run before disconnecting from the Nebula Graph server.
    commands: |
# Path of the output file for errors and other log messages.
logPath: /mnt/csv_file/prod_relation/err/test.log
....
50 03 15 * * /mnt/csv_file/importer/nebula-importer -config /mnt/csv_file/importer/rel.yaml >> /root/rel.log
2022/05/15 03:50:11 [INFO] statsmgr.go:62: Tick: Time(10.00s), Finished(1952500), Failed(0), Read Failed(0), Latency AVG(4232us), Batches Req AVG(4582us), Rows AVG(195248.59/s)
2022/05/15 03:50:16 [INFO] statsmgr.go:62: Tick: Time(15.00s), Finished(2925600), Failed(0), Read Failed(0), Latency AVG(4421us), Batches Req AVG(4761us), Rows AVG(195039.12/s)
2022/05/15 03:50:21 [INFO] statsmgr.go:62: Tick: Time(20.00s), Finished(3927400), Failed(0), Read Failed(0), Latency AVG(4486us), Batches Req AVG(4818us), Rows AVG(196367.10/s)
2022/05/15 03:50:26 [INFO] statsmgr.go:62: Tick: Time(25.00s), Finished(5140500), Failed(0), Read Failed(0), Latency AVG(4327us), Batches Req AVG(4653us), Rows AVG(205619.44/s)
2022/05/15 03:50:31 [INFO] statsmgr.go:62: Tick: Time(30.00s), Finished(6080800), Failed(0), Read Failed(0), Latency AVG(4431us), Batches Req AVG(4755us), Rows AVG(202693.39/s)
2022/05/15 03:50:36 [INFO] statsmgr.go:62: Tick: Time(35.00s), Finished(7087200), Failed(0), Read Failed(0), Latency AVG(4461us), Batches Req AVG(4784us), Rows AVG(202489.00/s)
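A monitoring script can read these `Tick` lines to track progress and spot failed rows. The following is a minimal sketch that assumes the log format is exactly what the lines above show; `parse_tick` and `has_failures` are hypothetical helper names, not part of Nebula Importer.

```python
import re
from typing import Iterable, Optional

# Pattern for the statsmgr "Tick" lines shown above.
# ASSUMPTION: the format matches these samples; other Importer versions may differ.
TICK_RE = re.compile(
    r"Tick: Time\((?P<time_s>[\d.]+)s\), Finished\((?P<finished>\d+)\), "
    r"Failed\((?P<failed>\d+)\), Read Failed\((?P<read_failed>\d+)\), "
    r"Latency AVG\((?P<latency_us>\d+)us\), "
    r"Batches Req AVG\((?P<req_us>\d+)us\), "
    r"Rows AVG\((?P<rows_per_s>[\d.]+)/s\)"
)


def parse_tick(line: str) -> Optional[dict]:
    """Parse one Tick line into a dict, or return None if the line is not a Tick."""
    match = TICK_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "time_s": float(fields["time_s"]),
        "finished": int(fields["finished"]),
        "failed": int(fields["failed"]),
        "read_failed": int(fields["read_failed"]),
        "latency_us": int(fields["latency_us"]),
        "req_us": int(fields["req_us"]),
        "rows_per_s": float(fields["rows_per_s"]),
    }


def has_failures(lines: Iterable[str]) -> bool:
    """Return True if any Tick line reports failed or unreadable rows."""
    for line in lines:
        tick = parse_tick(line)
        if tick and (tick["failed"] > 0 or tick["read_failed"] > 0):
            return True
    return False


sample = (
    "2022/05/15 03:50:11 [INFO] statsmgr.go:62: Tick: Time(10.00s), "
    "Finished(1952500), Failed(0), Read Failed(0), Latency AVG(4232us), "
    "Batches Req AVG(4582us), Rows AVG(195248.59/s)"
)
parsed = parse_tick(sample)
print(parsed["finished"], parsed["rows_per_s"], has_failures([sample]))
```

In a real job, the lines would come from the file that the cron entry redirects output to, and a `True` result from `has_failures` would be the trigger for an alert.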
Actual time taken
Some notes
- About concurrency: as mentioned in a related question, concurrency can be set to the number of CPU cores; it indicates how many clients connect to the Nebula server. In practice, you have to trade off import speed against server load. In our tests, setting concurrency too high drove disk I/O too high and triggered some of our configured alarms, so we do not recommend raising concurrency blindly. Make the trade-off based on tests with your actual workload.
- Importer cannot resume from a breakpoint. If something goes wrong, it has to be handled manually. In practice, we analyze the Importer log with a program and handle each case accordingly. If any part of the data hits an unexpected error, an alert is sent and someone intervenes manually to prevent accidents.
- After the Hive table is generated, the data is transferred to the Nebula server. The actual time this task takes depends heavily on Hadoop resources. If resources are insufficient, the Hive table and CSV generation can be slow while Importer still starts on schedule, so this needs to be anticipated in advance. We compare the Hive task end time with the Importer task start time to decide whether the Importer process should run normally.
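The last note, comparing the Hive task end time with the Importer start time, could be sketched as a small guard that runs before the import. This is one possible interpretation rather than the authors' actual implementation: `importer_may_start` is a hypothetical name, and the Hive end times in the example are made up for illustration. Only the 03:50 start time comes from the cron entry in this article.

```python
from datetime import datetime
from typing import Optional


def importer_may_start(
    hive_finished_at: Optional[datetime], importer_start_at: datetime
) -> bool:
    """Allow the import only if the Hive/CSV export finished before the scheduled start."""
    if hive_finished_at is None:
        # Export is still running, or its end time is unknown.
        return False
    return hive_finished_at <= importer_start_at


# Scheduled Importer start, matching the 03:50 cron entry shown earlier.
scheduled = datetime(2022, 5, 15, 3, 50)

# ASSUMPTION: hypothetical Hive end times, for illustration only.
print(importer_may_start(datetime(2022, 5, 15, 2, 30), scheduled))  # export done in time
print(importer_may_start(datetime(2022, 5, 15, 4, 10), scheduled))  # export finished late
print(importer_may_start(None, scheduled))                          # export not finished
```

When the guard returns `False`, the job could skip the import and raise an alert instead of loading stale or partial CSV files.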