Getting to Know LDBC SNB, a Powerful Tool for Graph Database Performance and Scenario Testing
2022-06-27 15:49:00, Huawei Cloud Developer Alliance
Abstract: This article introduces the data generator for the Interactive workload (hereinafter "Datagen") and explains how LDBC SNB data is used in GES, Huawei's graph engine service.
This article is shared from the Huawei Cloud community post 《[Graph Database Performance and Scenario Testing Tool LDBC SNB] Part 1: Introduction to the Data Generator & Application to the GES Service》, author: Farce and ball.
LDBC SNB predefines the nodes and relationships, the data generator, and the system test cases, forming a logically self-consistent data "Wulin" (martial-arts world). Graph database products tested against the LDBC SNB standard are like the wandering swordsmen (Xiake) of this world: all must follow the same "Wulin rules" (the test cases). Who will defeat every master and become the leader of the alliance?
LDBC SNB overview
LDBC SNB is short for The Linked Data Benchmark Council's Social Network Benchmark (official website: http://ldbcouncil.org). LDBC is an industry consortium dedicated to graph data management. It develops standard benchmarks for systematically measuring the functionality and performance of different graph database products. SNB is a set of benchmarks built around a social-network scenario and consists of the Interactive workload and the Business Intelligence workload.
The LDBC SNB project has 3 components: the data generator (Datagen), the test driver (Test Driver, which runs the benchmark), and the reference implementations (Reference Implementation; implementations are currently available for two query languages, Cypher (Neo4j) and SQL (PostgreSQL)).
LDBC SNB has two workloads:
1、Interactive workload: suited to transactional online query scenarios, such as basic create/read/update/delete operations, shortest path, and multi-hop queries;
2、Business Intelligence workload: suited to complex queries and large-scale offline graph analytics based on enterprise business scenarios.
Datagen, the Test Driver, and the reference implementations all differ between the two workloads.
Contents
I. Introduction to Datagen
- Data model
  - Data types
  - Data schema
- Installation and running of Datagen
- Datagen parameter settings
  - General parameters
  - Scale factor
  - Serialization modes
II. Application of LDBC SNB in GES
I. Introduction to Datagen
Data model
Data Types
Datagen supports the following property data types. Each property can be either single-valued or a list.

(Screenshot from the official specification: http://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf)
Data Schema

(Screenshot from the official specification: http://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf)
As shown in the figure, the data generated by Datagen follows a predefined graph model that includes:
8 node types: organization, place, tag, tagClass, person, forum, post, comment
15 relationship types, listed in the table below:

These predefined nodes and relationships make up the data "Wulin" in which every product under test must compete by the same "Wulin rules" (the test cases). Who will prevail? Let's wait and see.
Installation and running
In the Interactive workload, Datagen is built on Hadoop; in the BI workload, it is built on Spark.
This walkthrough uses the Hadoop-based Datagen in pseudo-distributed mode.
1) Download the Hadoop-based LDBC Datagen
GitHub: ldbc/ldbc_snb_datagen_hadoop (https://github.com/ldbc/ldbc_snb_datagen_hadoop), "The Hadoop-based variant of the SNB Datagen"
2) Run it with pseudo-distributed Hadoop
cd ldbc_snb_datagen_hadoop/
cp params-csv-composite.ini params.ini
wget http://archive.apache.org/dist/hadoop/core/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar xf hadoop-3.2.1.tar.gz
export HADOOP_CLIENT_OPTS="-Xmx2G"
# set this to the Hadoop 3.2.1 directory
export HADOOP_HOME=`pwd`/hadoop-3.2.1
./run.sh
3) Fixing a missing jar package at build time (the error is shown below)

Solution:
Download dsol-xml-1.6.9.jar (the author did this in a Windows environment) from https://simulation.tudelft.nl/maven/dsol/dsol-xml/1.6.9/

Manually install the missing jar package into the local Maven repository:
mvn install:install-file -Dfile=dsol-xml-1.6.9.jar -DgroupId=dsol -DartifactId=dsol-xml -Dversion=1.6.9 -Dpackaging=jar
4) Run again to complete the build
sh run.sh
The generated data files are stored in ${outputDir}/social_network.
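After a run, a quick way to see what Datagen produced is to list the CSV files under ${outputDir}/social_network and group them by the entity or relationship they hold. The sketch below is a hypothetical helper, not part of Datagen. It assumes part files are named like person_0_0.csv (the name used in Figure 3 of this article), i.e. a name followed by two numeric suffixes, and it searches subdirectories too because the exact folder layout is not described here.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: "<name>_<n>_<m>.csv", e.g. "person_0_0.csv" or
# "person_knows_person_0_0.csv". The two numbers are treated as opaque suffixes.
PART_SUFFIX = re.compile(r"^(?P<name>.+?)_\d+_\d+\.csv$")


def group_part_files(file_names):
    """Group part-file names by the entity/relationship name they start with."""
    groups = defaultdict(list)
    for file_name in file_names:
        match = PART_SUFFIX.match(file_name)
        if match:  # files that do not follow the pattern are ignored
            groups[match.group("name")].append(file_name)
    return {name: sorted(files) for name, files in groups.items()}


def inventory(output_dir):
    """Recursively scan <output_dir>/social_network and group its CSV files."""
    root = Path(output_dir) / "social_network"
    return group_part_files(path.name for path in root.rglob("*.csv"))
```

For example, `inventory("/data/ldbc_out")` (a placeholder path) returns a dict such as `{"person": [...], "person_knows_person": [...]}`, which makes it easy to spot a missing entity before importing.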
Parameter settings
(The parameter names below omit the prefix "ldbc.snb.datagen."; the full form of each parameter is "ldbc.snb.datagen.xxx".)
1) General parameters
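To see the effective settings under the short names used in this section, params.ini can be parsed into a dict with the prefix stripped. This is a minimal sketch under one assumption: every setting is a `key:value` line, as in the serializer lines listed later in this article, and lines starting with `#` are commented out. The function names are hypothetical.

```python
PREFIX = "ldbc.snb.datagen."


def parse_params(text):
    """Parse 'key:value' lines of a Datagen params.ini into {short_key: value}.

    Blank lines and '#' comment lines are skipped; only the first ':' splits
    key from value, so values may themselves contain ':'.
    """
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key:value line, ignore it
        key = key.strip()
        if key.startswith(PREFIX):
            key = key[len(PREFIX):]  # e.g. "generator.scaleFactor"
        params[key] = value.strip()
    return params


def full_name(short_key):
    """Restore the full parameter name from its short form."""
    return PREFIX + short_key
```

Usage: `parse_params(open("params.ini").read())["generator.scaleFactor"]` returns the configured scale factor value, if the file sets one.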

2) Scale factor
LDBC SNB can generate graph datasets of different sizes. The number of nodes and edges corresponding to each generator.scaleFactor value is shown in the following table:

(Screenshot from the official specification: http://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf)
3) Serialization modes
Datagen mainly provides 4 CSV serialization modes, each producing data in a different format.
CsvBasic
The basic serialization mode: each node type and each relationship type between nodes is written to its own CSV file, as shown in Figure 1:

Figure 1: Each node type and each relationship type has its own CSV file; for example, the person_xx.csv files hold the attribute data of all person nodes.
If an attribute is multi-valued (for example, the email property of person), its records are written to a separate CSV file, with one line per value, as shown in Figure 2:

Figure 2: The email attribute of person is stored separately, with one record per email address
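In CsvBasic, a multi-valued attribute is spread over several rows of its own file, so reading it back means grouping rows by the owning node id. The sketch below assumes a header row, `|` as the column delimiter, and rows of the form `<person id>|<value>`; the header names in the usage example are illustrative, not taken from real Datagen output.

```python
import csv
import io
from collections import defaultdict


def collect_multi_values(csv_text, delimiter="|"):
    """Group a CsvBasic multi-valued attribute file by node id.

    Returns {node_id: [value, value, ...]} in file order.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    next(reader, None)  # skip the header row
    values = defaultdict(list)
    for row in reader:
        if len(row) >= 2:  # ignore blank or malformed lines
            values[row[0]].append(row[1])
    return dict(values)
```

For example, the text `"Person.id|email\n1|a@x.org\n1|b@x.org\n"` yields `{"1": ["a@x.org", "b@x.org"]}`. Merging these lists back into the person record is essentially what the next mode, CsvComposite, does at generation time.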
CsvComposite (the data produced by this mode is the closest to the CSV format supported by GES)
Building on CsvBasic, multi-valued attributes are merged into the same record as the other attributes, as shown in Figure 3, and their multiple values are combined into a list separated by semicolons, as shown in Figure 4:

Figure 3: The attribute records of person nodes are merged into person_0_0.csv

Figure 4: The two list attributes, language and email, are merged into a single line
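Reading CsvComposite data therefore means splitting the list columns on semicolons. This sketch assumes a header row, `|` as the column delimiter, and `language` and `email` as the list columns (from Figure 4); the sample values in the test are made up.

```python
import csv
import io


def read_composite_rows(csv_text, list_columns=("language", "email"), delimiter="|"):
    """Read CsvComposite-style rows, turning ';'-separated list columns into lists."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text), delimiter=delimiter):
        for column in list_columns:
            if column in record:
                raw = record[column] or ""  # missing trailing fields come back as None
                record[column] = raw.split(";") if raw else []
        rows.append(dict(record))
    return rows
```

An empty list column becomes `[]` rather than `[""]`, so a person without any email address does not appear to have one blank address.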
CsvMergeForeign
Building on CsvBasic, if a relationship between nodes is one-to-many, it is merged into the attribute file of the node on the "many" side as a foreign key, as shown in Figure 5:

Figure 5: The comment-hasCreator->person, comment-isLocatedIn->place, comment-replyOf->post, and comment-replyOf->comment relationships are merged into the comment attribute file
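The foreign-key merge works because, seen from the comment side, each of these relationships has at most one target: a comment has exactly one creator, even though a person creates many comments. The sketch below illustrates the idea on in-memory records; it is a hypothetical helper, not Datagen's serializer code, and the column name `creator` is illustrative.

```python
def merge_foreign_key(nodes, edges, column):
    """Attach a many-to-one relationship to node records as a foreign-key column.

    nodes:  list of dicts, each with an "id" key (e.g. comment records)
    edges:  (source_id, target_id) pairs where each source has at most one target
            (e.g. comment-hasCreator->person)
    column: name of the new foreign-key column (e.g. "creator")
    Nodes without an outgoing edge get None.
    """
    target_of = {}
    for source_id, target_id in edges:
        if source_id in target_of:
            # a second target means the relationship is not many-to-one
            raise ValueError(f"{source_id} has more than one {column}")
        target_of[source_id] = target_id
    return [{**node, column: target_of.get(node["id"])} for node in nodes]
```

The same call with comment-replyOf->post and comment-replyOf->comment edges would produce two separate columns, since a comment replies either to a post or to another comment.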
CsvCompositeMergeForeign
A combination of CsvComposite and CsvMergeForeign: list attributes are merged, and one-to-many relationships are compressed into foreign keys, as shown in Figure 6:

Figure 6: The place column is the foreign-key representation of the person-isLocatedIn->place relationship, while language and email are shown as lists
The parameter values for each serialization mode are as follows:
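A graph engine usually wants relationships as separate edge records, so CsvCompositeMergeForeign rows have to be split back into node attributes and edges. This sketch assumes `|` as the column delimiter, an `id` column, `place` as the foreign-key column (from Figure 6), and `language`/`email` as list columns; it is illustrative only.

```python
import csv
import io


def split_composite_merge_foreign(csv_text, fk_column="place",
                                  list_columns=("language", "email"), delimiter="|"):
    """Split person rows into (node_records, edges) for person-isLocatedIn->place."""
    nodes, edges = [], []
    for record in csv.DictReader(io.StringIO(csv_text), delimiter=delimiter):
        target = record.pop(fk_column, "")  # remove the foreign key from the node
        if target:
            edges.append((record["id"], target))
        for column in list_columns:
            if column in record:
                raw = record[column] or ""
                record[column] = raw.split(";") if raw else []
        nodes.append(dict(record))
    return nodes, edges
```

This is roughly the inverse of the foreign-key merge: the node keeps its own attributes, and the relationship becomes an explicit (source, target) pair.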
CsvBasic
- ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvBasicDynamicActivitySerializer
- ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvBasicDynamicPersonSerializer
- ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvBasicStaticSerializer
CsvComposite
- ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvCompositeDynamicActivitySerializer
- ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvCompositeDynamicPersonSerializer
- ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvCompositeStaticSerializer
CsvMergeForeign
- ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvMergeForeignDynamicActivitySerializer
- ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvMergeForeignDynamicPersonSerializer
- ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvMergeForeignStaticSerializer
CsvCompositeMergeForeign
- ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvCompositeMergeForeignDynamicActivitySerializer
- ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvCompositeMergeForeignDynamicPersonSerializer
- ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvCompositeMergeForeignStaticSerializer
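The four blocks above follow one pattern: the mode name is the class-name prefix for the activity, person, and static serializers. A small generator avoids copy-paste mistakes when switching modes in params.ini. This sketch only reproduces the class names listed above; the function name is hypothetical.

```python
SERIALIZER_PACKAGE = "ldbc.snb.datagen.serializer.snb.csv"
MODES = ("CsvBasic", "CsvComposite", "CsvMergeForeign", "CsvCompositeMergeForeign")


def serializer_params(mode):
    """Return the three serializer 'key:value' lines for one CSV mode."""
    if mode not in MODES:
        raise ValueError(f"unknown serialization mode: {mode}")
    classes = {
        "dynamicActivitySerializer":
            f"{SERIALIZER_PACKAGE}.dynamicserializer.activity.{mode}DynamicActivitySerializer",
        "dynamicPersonSerializer":
            f"{SERIALIZER_PACKAGE}.dynamicserializer.person.{mode}DynamicPersonSerializer",
        "staticSerializer":
            f"{SERIALIZER_PACKAGE}.staticserializer.{mode}StaticSerializer",
    }
    return [f"ldbc.snb.datagen.serializer.{key}:{value}" for key, value in classes.items()]
```

Usage: `print("\n".join(serializer_params("CsvComposite")))` prints the three lines to paste into params.ini.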
II. Application of LDBC SNB in GES
The datasets generated by Datagen differ from the format GES expects in the following 3 ways:
- Nodes with different labels may have the same id;
- The knows relationship is bidirectional, but Datagen stores each pair only once;
- There is no label column.
The DatagenToGES data conversion script (based on the CsvComposite serialization mode) converts LDBC data into the GES format. It must be run in a Python 3.6 environment.
The DatagenToGES script does the following:
- Maps the 8 node types to the numeric prefixes 1-8 and converts each original id into a new 20-byte id that starts with its numeric prefix, which resolves id collisions between nodes with different labels;
- Adds the reverse edges to the knows edge file;
- Adds a label column.
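The three conversion steps above can be sketched as follows. This is not the DatagenToGES code: the assignment of prefix digits to node types is hypothetical (the article only says the 8 types map to 1-8), zero-padding is one assumed way to reach a fixed 20-character id, and the GES column layout is not shown here, so the label is simply added as a dict field. One prefix digit plus up to 19 digits is enough for any non-negative 64-bit integer id. Edge endpoints would be converted with the same id function as the nodes.

```python
# Hypothetical prefix assignment, in the order the node types are listed in this article.
TYPE_PREFIX = {
    "organization": 1, "place": 2, "tag": 3, "tagClass": 4,
    "person": 5, "forum": 6, "post": 7, "comment": 8,
}
ID_LENGTH = 20


def to_ges_id(node_type, original_id):
    """Prefix the original id with its type digit and zero-pad to 20 characters."""
    value = int(original_id)
    if value < 0:
        raise ValueError("expected a non-negative id")
    prefix, digits = str(TYPE_PREFIX[node_type]), str(value)
    if len(prefix) + len(digits) > ID_LENGTH:
        raise ValueError(f"id {original_id} does not fit in {ID_LENGTH} characters")
    return prefix + digits.zfill(ID_LENGTH - len(prefix))


def add_reverse_edges(edges):
    """Return the knows edges plus the reverse of each one, skipping duplicates.

    Each edge is a tuple (source, target, *properties), e.g. a creation date.
    """
    result = list(edges)
    seen = set(result)
    for source, target, *props in edges:
        reverse = (target, source, *props)
        if reverse not in seen:
            seen.add(reverse)
            result.append(reverse)
    return result


def add_label_column(records, label):
    """Add an explicit label field to every record."""
    return [{**record, "label": label} for record in records]
```

Because the prefix digit differs per type and every id has the same length, person 1 and post 1 end up as distinct ids.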
File format before conversion (CsvComposite serialization mode):


File format after conversion:
Converting a dataset with scale factor 100 using DatagenToGES takes about half an hour.
Core code snippet of the data conversion script:

Import the converted LDBC SNB data (sample dataset: SF0.1) into GES and run the PageRank algorithm. The result is as follows:

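The PageRank result in GES is computed by the engine itself, and this article does not give its parameters. To sanity-check scores on a small sample of the converted knows edges, a generic power-iteration PageRank is enough. The damping factor 0.85 and 50 iterations are conventional defaults assumed here, not GES settings, and this is not GES's implementation.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list of (source, target) pairs.

    Rank held by dangling nodes (no out-edges) is spread evenly over all nodes,
    so the scores always sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for source in nodes:
            targets = out_links[source]
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

Since knows is bidirectional, feeding in the edges after reverse edges have been added treats each friendship symmetrically.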