Nebula Importer data import practice
2022-07-04 19:14:00 【InfoQ】
Preface
- Streaming data from Kafka or Pulsar needs to be imported into the Nebula Graph database
- Batch data needs to be read from a relational database (such as MySQL) or a distributed file system (such as HDFS)
- Large volumes of data need to be converted into SST files that Nebula Graph can recognize
- Importer is suited to importing the contents of local CSV files into Nebula Graph
- Migrating data between different Nebula Graph clusters
- Migrating data between different graph spaces within the same Nebula Graph cluster
- Migrating data between Nebula Graph and other data sources
- Combining with Nebula Algorithm for graph computation
Using Nebula Importer
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
Stepping: 7
CPU MHz: 2499.998
BogoMIPS: 4999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-15
Disk: SSD
Memory: 128G
Cluster environment
- Nebula version: v2.6.1
- Deployment method: RPM
- Cluster size: three replicas, six nodes
Data scale
+---------+--------------------------+-----------+
| "Space" | "vertices" | 559191827 |
+---------+--------------------------+-----------+
| "Space" | "edges" | 722490436 |
+---------+--------------------------+-----------+
Importer configuration
# Nebula Graph version; set to v2 when connecting to 2.x.
version: v2
description: Relation Space import data
# Whether to delete the temporarily generated log and error data files.
removeTempFiles: false
clientSettings:
  # Number of retries when an nGQL statement fails.
  retry: 3
  # Number of concurrent Nebula Graph clients.
  concurrency: 5
  # Buffer queue size for each Nebula Graph client.
  channelBufferSize: 1024
  # Nebula Graph space to import the data into.
  space: Relation
  # Connection information.
  connection:
    user: root
    password: ******
    address: 10.0.XXX.XXX:9669,10.0.XXX.XXX:9669
  postStart:
    # Operations to run after connecting to the Nebula Graph server and before inserting data.
    commands: |
    # Interval between running the commands above and starting to insert data.
    afterPeriod: 1s
  preStop:
    # Operations to run before disconnecting from the Nebula Graph server.
    commands: |
# Output path for errors and other log information.
logPath: /mnt/csv_file/prod_relation/err/test.log
....
50 03 15 * * /mnt/csv_file/importer/nebula-importer -config /mnt/csv_file/importer/rel.yaml >> /root/rel.log
2022/05/15 03:50:11 [INFO] statsmgr.go:62: Tick: Time(10.00s), Finished(1952500), Failed(0), Read Failed(0), Latency AVG(4232us), Batches Req AVG(4582us), Rows AVG(195248.59/s)
2022/05/15 03:50:16 [INFO] statsmgr.go:62: Tick: Time(15.00s), Finished(2925600), Failed(0), Read Failed(0), Latency AVG(4421us), Batches Req AVG(4761us), Rows AVG(195039.12/s)
2022/05/15 03:50:21 [INFO] statsmgr.go:62: Tick: Time(20.00s), Finished(3927400), Failed(0), Read Failed(0), Latency AVG(4486us), Batches Req AVG(4818us), Rows AVG(196367.10/s)
2022/05/15 03:50:26 [INFO] statsmgr.go:62: Tick: Time(25.00s), Finished(5140500), Failed(0), Read Failed(0), Latency AVG(4327us), Batches Req AVG(4653us), Rows AVG(205619.44/s)
2022/05/15 03:50:31 [INFO] statsmgr.go:62: Tick: Time(30.00s), Finished(6080800), Failed(0), Read Failed(0), Latency AVG(4431us), Batches Req AVG(4755us), Rows AVG(202693.39/s)
2022/05/15 03:50:36 [INFO] statsmgr.go:62: Tick: Time(35.00s), Finished(7087200), Failed(0), Read Failed(0), Latency AVG(4461us), Batches Req AVG(4784us), Rows AVG(202489.00/s)
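The Tick lines above carry enough information for a rough back-of-the-envelope estimate of the total import time. The following is a minimal sketch, not part of the original workflow: the function names `parse_tick` and `estimate_seconds` are hypothetical, and the estimate assumes that every vertex and edge in the data scale table corresponds to one imported row and that the throughput stays roughly constant.

```python
import re

# Pattern for the statsmgr "Tick" lines shown in the log excerpt above.
TICK_RE = re.compile(
    r"Tick: Time\((?P<time>[\d.]+)s\), Finished\((?P<finished>\d+)\), "
    r"Failed\((?P<failed>\d+)\), Read Failed\((?P<read_failed>\d+)\), "
    r"Latency AVG\((?P<latency_us>\d+)us\), Batches Req AVG\((?P<req_us>\d+)us\), "
    r"Rows AVG\((?P<rows_per_s>[\d.]+)/s\)"
)


def parse_tick(line):
    """Parse one Tick line into a dict, or return None if it is not a Tick line."""
    m = TICK_RE.search(line)
    if m is None:
        return None
    return {
        "time_s": float(m["time"]),
        "finished": int(m["finished"]),
        "failed": int(m["failed"]),
        "read_failed": int(m["read_failed"]),
        "latency_us": int(m["latency_us"]),
        "req_us": int(m["req_us"]),
        "rows_per_s": float(m["rows_per_s"]),
    }


def estimate_seconds(total_rows, rows_per_s):
    """Rough duration estimate, assuming a constant import rate."""
    return total_rows / rows_per_s


# Last Tick line from the excerpt above.
SAMPLE = (
    "2022/05/15 03:50:36 [INFO] statsmgr.go:62: Tick: Time(35.00s), "
    "Finished(7087200), Failed(0), Read Failed(0), Latency AVG(4461us), "
    "Batches Req AVG(4784us), Rows AVG(202489.00/s)"
)

if __name__ == "__main__":
    tick = parse_tick(SAMPLE)
    # Vertices + edges from the data scale table (assumed: one row each).
    total_rows = 559191827 + 722490436
    eta = estimate_seconds(total_rows, tick["rows_per_s"])
    print(f"estimated import time: {eta / 3600:.2f} hours")
```

At roughly 200,000 rows per second, a full load of this data set would be on the order of a couple of hours under these assumptions; an import that covers only part of the space would be proportionally shorter.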
Actual time taken
Some notes
- About concurrency: as mentioned in a related question, concurrency can be set to the number of CPU cores; it determines how many clients connect to the Nebula server. In practice, you have to trade off import speed against server load. In our tests, setting concurrency too high pushed disk IO too high and triggered some of our configured alarms, so we do not recommend raising it blindly. Make the trade-off based on tests against your actual workload.
- Importer cannot resume from a breakpoint. If something goes wrong, it has to be handled manually. In practice, we analyze the Importer log programmatically and handle each case accordingly. If any part of the data hits an unexpected error, an alert is sent and someone intervenes manually to prevent accidents.
- The Hive tables are generated first and then transferred to the Nebula server. The actual time this part of the task takes depends heavily on Hadoop resources. When resources are insufficient, the Hive tables and CSV files may be generated slowly while Importer still starts on schedule, so this needs to be anticipated in advance. We compare the Hive task end time with the Importer task start time to decide whether the Importer process should run as normal.
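The last two notes can be sketched in code. This is only an illustration under stated assumptions, not the authors' actual tooling: `needs_alert` and `safe_to_import` are hypothetical helper names, the log format is taken from the Tick lines shown earlier, and the 10-minute safety margin is an arbitrary placeholder.

```python
import re
from datetime import datetime, timedelta

# Matches the failure counters in a statsmgr Tick line.
FAIL_RE = re.compile(r"Failed\((\d+)\), Read Failed\((\d+)\)")


def needs_alert(log_lines):
    """Return True if any Tick line reports failures, or if no Tick line exists.

    A log without any Tick line is treated as suspicious, since the import
    may never have started.
    """
    seen_tick = False
    for line in log_lines:
        m = FAIL_RE.search(line)
        if m:
            seen_tick = True
            if int(m.group(1)) > 0 or int(m.group(2)) > 0:
                return True
    return not seen_tick


def safe_to_import(hive_end, importer_start, margin=timedelta(minutes=10)):
    """Return True if the Hive task ended at least `margin` before Importer starts.

    `hive_end` is None when the Hive task has not finished yet.
    The default margin is a hypothetical value; tune it for your pipeline.
    """
    return hive_end is not None and hive_end + margin <= importer_start


if __name__ == "__main__":
    line = (
        "2022/05/15 03:50:11 [INFO] statsmgr.go:62: Tick: Time(10.00s), "
        "Finished(1952500), Failed(0), Read Failed(0), Latency AVG(4232us), "
        "Batches Req AVG(4582us), Rows AVG(195248.59/s)"
    )
    print("alert needed:", needs_alert([line]))
    print(
        "safe to import:",
        safe_to_import(datetime(2022, 5, 15, 3, 0), datetime(2022, 5, 15, 3, 50)),
    )
```

A wrapper script could run the second check before launching Importer and the first check after it exits, sending a notification whenever either one fails.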