Introducing LDBC SNB, a powerful tool for graph database performance and scenario testing
2022-06-27 15:49:00 【Huawei cloud developer Alliance】
Abstract: This article introduces the data generator for the Interactive workload (hereinafter "Datagen") and shows how LDBC SNB data is used with GES, Huawei's graph engine service.
This article is shared from the Huawei Cloud community post "[Graph database performance and scenario testing tool LDBC SNB] Series: Introduction to the data generator & applying it to the GES service", author: Farce and ball
The article covers two topics: an introduction to Datagen, the data generator for the Interactive workload, and how LDBC SNB data is used with GES. LDBC SNB's preset nodes and relationships, its data generator, and its test cases together form a self-consistent data "Wulin" (martial-arts world). Graph database products tested against LDBC SNB are like the wandering swordsmen in it: all must follow the same "Wulin rules" (the test cases). Who can defeat all the masters and become leader of the alliance?
LDBC SNB overview
LDBC SNB stands for the Linked Data Benchmark Council's Social Network Benchmark (official website: http://ldbcouncil.org). LDBC is an industry consortium dedicated to advancing graph data management. It has developed a set of standard benchmarks for systematically measuring the functionality and performance of different graph database products. SNB is a group of benchmarks built around a social network scenario, consisting of an Interactive workload and a Business Intelligence workload.
The LDBC SNB project has 3 components: the data generator (Datagen), the test driver (Test Driver, used to execute the benchmark), and the test case implementation (Reference Implementation; implementations are currently available for two query languages, Cypher (Neo4j) and SQL (PostgreSQL)).
LDBC SNB has two workloads:
1. Interactive workload: suited to transactional online query scenarios, such as basic create, read, update, and delete operations, shortest path, and multi-hop queries.
2. Business Intelligence workload: suited to complex queries and large-scale offline graph analysis based on enterprise business scenarios.
Datagen, the Test Driver, and the test case implementation all differ between the two workloads.
Chapter overview
1. Introduction to Datagen
- Data model
- Data types
- Data schema
- Installing and running Datagen
- Datagen parameter settings
- General parameters
- Scale factor
- Serialization modes
2. Using LDBC SNB with GES
1. Introduction to Datagen
Data model
Data types
Datagen supports the following property data types. Each property can hold either a single value or a list.

(Screenshot from the official specification: http://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf)
Data schema

(Screenshot from the official specification: http://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf)
As the figure shows, the data generated by Datagen follows a preset graph model, which includes:
8 node types: organization, place, tag, tagClass, person, forum, post, and comment
15 relationship types, listed in the following table:

These preset nodes and relationships form a self-consistent data "Wulin". Graph database products tested against LDBC SNB are like the wandering swordsmen in it: all must follow the same "Wulin rules" (the test cases). Who can defeat all the masters and become leader of the alliance? Let's wait and see.
Installing and running Datagen
For the Interactive workload, Datagen runs on Hadoop; for the BI workload, it runs on Spark.
This walkthrough uses the Hadoop-based Datagen in pseudo-distributed mode.
1) Download the Hadoop-based LDBC Datagen
GitHub - ldbc/ldbc_snb_datagen_hadoop: The Hadoop-based variant of the SNB Datagen
2) Run it with pseudo-distributed Hadoop
cd ldbc_snb_datagen_hadoop/
cp params-csv-composite.ini params.ini
wget http://archive.apache.org/dist/hadoop/core/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar xf hadoop-3.2.1.tar.gz
export HADOOP_CLIENT_OPTS="-Xmx2G"
# set this to the Hadoop 3.2.1 directory
export HADOOP_HOME=`pwd`/hadoop-3.2.1
./run.sh
3) Resolve the missing JAR reported at compile time (the error is shown below)

Solution:
Download the JAR (here, from a Windows environment) from https://simulation.tudelft.nl/maven/dsol/dsol-xml/1.6.9/

Manually install the missing JAR into the local Maven repository:
mvn install:install-file -Dfile=dsol-xml-1.6.9.jar -DgroupId=dsol -DartifactId=dsol-xml -Dversion=1.6.9 -Dpackaging=jar
4) Run again to complete the build
sh run.sh
The generated data files are stored in ${outputDir}/social_network.
Parameter settings
(The parameter descriptions below omit the prefix "ldbc.snb.datagen."; the full parameter name has the form "ldbc.snb.datagen.xxx".)
1) General parameters

2) Scale factor
LDBC SNB can generate graph datasets of different sizes. The number of vertices and edges for each value of generator.scaleFactor is shown in the following table:

(Screenshot from the official specification: http://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf)
3) Serialization modes
Datagen has 4 main serialization modes for CSV files, and each produces a different data layout.
CsvBasic
The basic serialization mode: each node type and each relationship type is written to its own CSV file, as shown in Figure 1:

Figure 1: Each node type and each relationship type has its own CSV file; the person_xx.csv files all hold attribute data of person nodes.
If an attribute has multiple values, for example the email attribute of person, the email records are written to a separate CSV file with one email per row, as shown in Figure 2:

Figure 2: The email attribute of person is stored separately, with multiple emails shown as multiple records
CsvComposite (data generated in this mode is the closest match to the CSV format supported by GES)
Building on CsvBasic, multi-valued attributes are merged into the same record as the other attributes, as shown in Figure 3, and the multiple values are combined into a semicolon-separated list, as shown in Figure 4.

Figure 3: The attribute records of person nodes are merged into person_0_0.csv

Figure 4: The two list attributes, language and email, are merged onto one line
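The difference between CsvBasic and CsvComposite for multi-valued attributes can be sketched in Python. This is a minimal illustration, not Datagen's own code: the function name merge_multivalued and the sample ids and email addresses are made up, and only the semicolon separator comes from the description above.

```python
from collections import defaultdict


def merge_multivalued(rows):
    """Collapse (id, value) rows into one semicolon-separated list per id."""
    merged = defaultdict(list)
    for entity_id, value in rows:
        merged[entity_id].append(value)
    return {entity_id: ";".join(values) for entity_id, values in merged.items()}


# CsvBasic-style input: one row per (person id, email) pair (hypothetical values).
basic_rows = [
    ("933", "a@example.com"),
    ("933", "b@example.com"),
    ("1129", "c@example.com"),
]

# CsvComposite-style result: one list-valued email field per person.
composite = merge_multivalued(basic_rows)
for person_id, emails in composite.items():
    print(person_id, emails)
```

The order of values inside each list follows the order of the input rows.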
CsvMergeForeign
Building on CsvBasic, if a relationship between nodes is one-to-many, it is merged into the node's attribute file as a foreign key, as shown in Figure 5.

Figure 5: The comment-hasCreator->person, comment-isLocatedIn->place, comment-replyOf->post, and comment-replyOf->comment relationships are merged into the comment attribute file
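The foreign-key merge can be pictured with a small Python sketch. It is only an illustration under stated assumptions: merge_foreign_key, the column name creator, and the sample rows are hypothetical, and the sketch assumes each source node appears at most once in the edge list (which is what makes the relationship expressible as a single column).

```python
def merge_foreign_key(node_rows, edge_rows, column):
    """Add the edge target as an extra column on each node row.

    Assumes every source id occurs at most once in edge_rows.
    """
    target_by_source = dict(edge_rows)
    merged = []
    for row in node_rows:
        new_row = dict(row)
        # Nodes without a matching edge get an empty foreign-key field.
        new_row[column] = target_by_source.get(row["id"], "")
        merged.append(new_row)
    return merged


# Hypothetical comment nodes and comment-hasCreator->person edges.
comments = [{"id": "1", "content": "hi"}, {"id": "2", "content": "ok"}]
has_creator = [("1", "933"), ("2", "1129")]

merged_comments = merge_foreign_key(comments, has_creator, "creator")
print(merged_comments)
```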
CsvCompositeMergeForeign
A combination of CsvComposite and CsvMergeForeign: list attributes are merged and one-to-many relationships are folded into the node files as foreign keys, as shown in Figure 6.

Figure 6: The place column is the foreign-key representation of the person-isLocatedIn->place relationship, while language and email are shown as lists
The parameter values for each serialization mode are as follows:
CsvBasic
- ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvBasicDynamicActivitySerializer
- ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvBasicDynamicPersonSerializer
- #ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvBasicStaticSerializer
CsvComposite
- ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvCompositeDynamicActivitySerializer
- ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvCompositeDynamicPersonSerializer
- ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvCompositeStaticSerializer
CsvMergeForeign
- ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvMergeForeignDynamicActivitySerializer
- ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvMergeForeignDynamicPersonSerializer
- ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvMergeForeignStaticSerializer
CsvCompositeMergeForeign
- ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvCompositeMergeForeignDynamicActivitySerializer
- ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvCompositeMergeForeignDynamicPersonSerializer
- ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvCompositeMergeForeignStaticSerializer
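The twelve values above follow one naming pattern, so they can be derived from the mode name. The Python sketch below shows that pattern; the helper serializer_params is hypothetical and is not part of Datagen, and the key:value output format simply mirrors the list above.

```python
PREFIX = "ldbc.snb.datagen.serializer"


def serializer_params(mode):
    """Return the three serializer settings for a mode such as 'CsvComposite'."""
    base = PREFIX + ".snb.csv"
    return {
        PREFIX + ".dynamicActivitySerializer":
            "{}.dynamicserializer.activity.{}DynamicActivitySerializer".format(base, mode),
        PREFIX + ".dynamicPersonSerializer":
            "{}.dynamicserializer.person.{}DynamicPersonSerializer".format(base, mode),
        PREFIX + ".staticSerializer":
            "{}.staticserializer.{}StaticSerializer".format(base, mode),
    }


# Print the settings for one mode in the same key:value form as the list above.
for key, value in serializer_params("CsvComposite").items():
    print("{}:{}".format(key, value))
```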
2. Using LDBC SNB with GES
The dataset generated by Datagen differs from the GES format in 3 ways:
- Vertices with different labels may have duplicate ids;
- The knows relationship is bidirectional;
- There is no label column.
The DatagenToGES data conversion script (based on the CsvComposite serialization mode) converts LDBC data into the GES format. It must be run in a Python 3.6 environment.
The DatagenToGES script does the following:
- Maps the 8 node types to numeric prefixes 1-8 and converts each original id into a new 20-byte id that starts with the numeric prefix, resolving duplicate ids across vertices with different labels;
- Adds reverse-edge data to the knows edge file;
- Adds a label column.
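The three functions listed above might look roughly like the following Python sketch. It is not the DatagenToGES script itself: the order of the prefix assignment, the zero-padding used to reach 20 characters, and all function names are assumptions made for illustration.

```python
# Hypothetical prefix assignment; the article only says the 8 types map to 1-8.
LABEL_PREFIX = {
    "organization": 1, "place": 2, "tag": 3, "tagClass": 4,
    "person": 5, "forum": 6, "post": 7, "comment": 8,
}
ID_LENGTH = 20  # total length of the new id


def new_id(label, original_id):
    """Prefix digit plus the original id, zero-padded to 20 characters (padding is assumed)."""
    return str(LABEL_PREFIX[label]) + str(original_id).zfill(ID_LENGTH - 1)


def convert_vertex(label, original_id, properties):
    """Return a vertex row with the new id followed by an added label column."""
    return [new_id(label, original_id), label] + list(properties)


def add_reverse_edges(knows_edges):
    """Append a (dst, src) edge for every (src, dst) knows edge."""
    return list(knows_edges) + [(dst, src) for src, dst in knows_edges]


# The same original id under two labels no longer collides.
print(new_id("person", "933"), new_id("post", "933"))
print(convert_vertex("person", "933", ["Mahinda", "Perera"]))
print(add_reverse_edges([("a", "b")]))
```

A fixed-width id keeps the prefix digit in a predictable position, which is one plausible reason for the 20-byte length.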
File format before conversion (CsvComposite serialization mode):


File format after conversion:
Converting a dataset with scale factor 100 using DatagenToGES takes about half an hour.
Core code snippet of the data conversion script:

Import the converted LDBC SNB data (the sample data is SF0.1) into GES and run the PageRank algorithm. The result is as follows:
