Nebula Importer data import practice
2022-07-04 19:14:00 【InfoQ】
Preface
- Streaming data from platforms such as Kafka or Pulsar needs to be imported into the Nebula Graph database
- Batch data needs to be read from a relational database (such as MySQL) or a distributed file system (such as HDFS)
- Large volumes of data need to be generated as SST files that Nebula Graph can recognize
- Importer is suited to importing the contents of local CSV files into Nebula Graph
- Migrating data between different Nebula Graph clusters
- Migrating data between different graph spaces in the same Nebula Graph cluster
- Migrating data between Nebula Graph and other data sources
- Running graph computation in combination with Nebula Algorithm
- Migrating data between different Nebula Graph clusters
- Migrating data between different graph spaces in the same Nebula Graph cluster
- Migrating data between Nebula Graph and other data sources
![](/img/bf/e819955d17e29e62616a5c99381a5d.png)
Using Nebula Importer
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
Stepping: 7
CPU MHz: 2499.998
BogoMIPS: 4999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-15
Disk: SSD
Memory: 128G
Cluster environment
- Nebula version: v2.6.1
- Deployment method: RPM
- Cluster size: three replicas, six nodes
Data scale
+---------+--------------------------+-----------+
| "Space" | "vertices"               | 559191827 |
+---------+--------------------------+-----------+
| "Space" | "edges"                  | 722490436 |
+---------+--------------------------+-----------+
Importer configuration
# Nebula Graph version; set to v2 when connecting to 2.x.
version: v2
description: Relation Space import data
# Whether to delete the temporary log and error data files that are generated.
removeTempFiles: false
clientSettings:
  # Number of retries when an nGQL statement fails to execute.
  retry: 3
  # Number of concurrent Nebula Graph clients.
  concurrency: 5
  # Buffer queue size for each Nebula Graph client.
  channelBufferSize: 1024
  # Graph space in Nebula Graph to import the data into.
  space: Relation
  # Connection information.
  connection:
    user: root
    password: ******
    address: 10.0.XXX.XXX:9669,10.0.XXX.XXX:9669
  postStart:
    # Operations to run after connecting to the Nebula Graph server and before inserting data.
    commands: |
    # Interval between running the commands above and starting to insert data.
    afterPeriod: 1s
  preStop:
    # Operations to run before disconnecting from the Nebula Graph server.
    commands: |
# Path of the output file for errors and other log information.
logPath: /mnt/csv_file/prod_relation/err/test.log
....
50 03 15 * * /mnt/csv_file/importer/nebula-importer -config /mnt/csv_file/importer/rel.yaml >> /root/rel.log
2022/05/15 03:50:11 [INFO] statsmgr.go:62: Tick: Time(10.00s), Finished(1952500), Failed(0), Read Failed(0), Latency AVG(4232us), Batches Req AVG(4582us), Rows AVG(195248.59/s)
2022/05/15 03:50:16 [INFO] statsmgr.go:62: Tick: Time(15.00s), Finished(2925600), Failed(0), Read Failed(0), Latency AVG(4421us), Batches Req AVG(4761us), Rows AVG(195039.12/s)
2022/05/15 03:50:21 [INFO] statsmgr.go:62: Tick: Time(20.00s), Finished(3927400), Failed(0), Read Failed(0), Latency AVG(4486us), Batches Req AVG(4818us), Rows AVG(196367.10/s)
2022/05/15 03:50:26 [INFO] statsmgr.go:62: Tick: Time(25.00s), Finished(5140500), Failed(0), Read Failed(0), Latency AVG(4327us), Batches Req AVG(4653us), Rows AVG(205619.44/s)
2022/05/15 03:50:31 [INFO] statsmgr.go:62: Tick: Time(30.00s), Finished(6080800), Failed(0), Read Failed(0), Latency AVG(4431us), Batches Req AVG(4755us), Rows AVG(202693.39/s)
2022/05/15 03:50:36 [INFO] statsmgr.go:62: Tick: Time(35.00s), Finished(7087200), Failed(0), Read Failed(0), Latency AVG(4461us), Batches Req AVG(4784us), Rows AVG(202489.00/s)
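The Tick lines above follow a fixed layout, so they can be turned into structured records for monitoring. The following is a minimal Python sketch, assuming the log format is exactly as shown in the excerpt; the name `parse_tick` and the dictionary keys are made up for illustration and are not part of Importer.

```python
import re

# Pattern for the "Tick" lines printed by statsmgr.go, as they appear in the
# log excerpt. The layout is assumed from that excerpt only.
TICK_RE = re.compile(
    r"Tick: Time\((?P<time>[\d.]+)s\), "
    r"Finished\((?P<finished>\d+)\), "
    r"Failed\((?P<failed>\d+)\), "
    r"Read Failed\((?P<read_failed>\d+)\), "
    r"Latency AVG\((?P<latency_us>\d+)us\), "
    r"Batches Req AVG\((?P<req_us>\d+)us\), "
    r"Rows AVG\((?P<rows_per_s>[\d.]+)/s\)"
)


def parse_tick(line):
    """Return the counters of one Tick line as a dict, or None for other lines."""
    m = TICK_RE.search(line)
    if m is None:
        return None
    return {
        "time_s": float(m["time"]),
        "finished": int(m["finished"]),
        "failed": int(m["failed"]),
        "read_failed": int(m["read_failed"]),
        "latency_us": int(m["latency_us"]),
        "req_us": int(m["req_us"]),
        "rows_per_s": float(m["rows_per_s"]),
    }


# First Tick line from the log excerpt.
sample = (
    "2022/05/15 03:50:11 [INFO] statsmgr.go:62: Tick: Time(10.00s), "
    "Finished(1952500), Failed(0), Read Failed(0), Latency AVG(4232us), "
    "Batches Req AVG(4582us), Rows AVG(195248.59/s)"
)
tick = parse_tick(sample)
print(tick["finished"], tick["failed"], tick["rows_per_s"])
```

In this excerpt, Finished divided by Time is close to the reported Rows AVG (1952500 rows in about 10 s), so either value can be used to track throughput.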
Actual time consumed
Some notes
- About concurrency: as mentioned in a Q&A thread, concurrency can be set to the number of CPU cores; it specifies how many clients connect to the Nebula server. In practice, you have to trade off import speed against server load. In our tests, setting concurrency too high drove disk I/O too high and triggered some of our alarms, so we do not recommend raising it blindly; tune it based on tests with your actual workload.
- Importer cannot resume from a breakpoint; if something goes wrong, it has to be handled manually. In practice, we analyze Importer's log programmatically and handle each case accordingly. If any part of the data hits an unexpected error, an alarm is raised and someone intervenes manually to prevent accidents.
- After the Hive table is generated, the data is transferred to the Nebula server. The actual time this part of the task takes depends closely on Hadoop resources. When resources are insufficient, generating the Hive table and the CSV files can be slow while Importer still runs as scheduled, so this needs to be anticipated in advance. We compare the Hive task end time with the Importer task start time to determine whether the Importer process is running normally.
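The last two notes can be combined into one automated check. The sketch below is only an illustration under stated assumptions: the alarm rule (any failed row, or a Hive task that ended after Importer started) is a guess at the kind of logic described, not the authors' actual program; the function names are hypothetical, and the Hive end time in the usage example is invented.

```python
import re
from datetime import datetime

# "Failed(n), Read Failed(m)" as it appears in Importer's Tick lines.
FAILED_RE = re.compile(r"Failed\((\d+)\), Read Failed\((\d+)\)")


def failed_rows(log_lines):
    """Return the largest (Failed, Read Failed) counters found in the log."""
    failed = read_failed = 0
    for line in log_lines:
        m = FAILED_RE.search(line)
        if m:
            failed = max(failed, int(m.group(1)))
            read_failed = max(read_failed, int(m.group(2)))
    return failed, read_failed


def needs_alarm(log_lines, hive_end, importer_start):
    """Hypothetical rule: alarm if any row failed, or if the Hive task ended
    after Importer had already started (the CSV files may be incomplete)."""
    failed, read_failed = failed_rows(log_lines)
    return failed > 0 or read_failed > 0 or hive_end > importer_start


# Usage: one Tick line from the log above; the Hive end time is made up.
lines = [
    "2022/05/15 03:50:36 [INFO] statsmgr.go:62: Tick: Time(35.00s), "
    "Finished(7087200), Failed(0), Read Failed(0), Latency AVG(4461us), "
    "Batches Req AVG(4784us), Rows AVG(202489.00/s)"
]
hive_end = datetime(2022, 5, 15, 3, 20)        # assumed value
importer_start = datetime(2022, 5, 15, 3, 50)  # from the crontab entry
print(needs_alarm(lines, hive_end, importer_start))
```

A real deployment would read the full log file and take the Hive end time from the scheduler; both sources are outside the scope of this sketch.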