Graph data models for system design
2022-06-12 17:39:00 【JavaEdge】
Many-to-many relationships are an important distinguishing feature between data models. If the data is mostly one-to-many (a tree structure), or if records have no relationships with one another, the document model is the most appropriate. If many-to-many relationships are common, the relational model can handle the simple cases, but as the connections within the data become more complex, it becomes more natural to model the data as a graph.
A graph consists of two kinds of objects:
- Vertices (also known as nodes or entities)
- Edges (also known as relationships or arcs)
Many kinds of data can be modeled as a graph. Typical examples:
- Social networks: the vertices are people, and the edges indicate which people know each other
- The web graph: the vertices are web pages, and the edges represent HTML links to other pages
- Road or rail networks: the vertices are junctions, and the edges represent the roads or railway lines between them
Many well-known algorithms operate on these graphs. For example, car navigation systems search for the shortest path between two points in a road network, and PageRank computes the popularity of a page on the web graph to determine its ranking in search results.
In the examples above, all the vertices in a graph represent the same kind of thing (people, web pages, or junctions). However, graphs are not limited to such homogeneous data. An equally powerful use of graphs is to provide a consistent way of storing completely different types of objects in a single data store. For example, Facebook maintains a single large graph with many different types of vertices and edges:
- Vertices represent people, locations, events, check-ins, and comments made by users
- Edges indicate which people are friends with each other, where a check-in took place, who commented on which post, and who attended which event
This article uses the example shown in Figure 1. It could come from a social network or a genealogical database. It shows two people, Lucy from Idaho and Alain from Beaune, France, who are married and currently live in London.
There are several different but related ways of structuring and querying data in a graph. This section discusses the property graph model and the triple-store model.
Property graphs
In the property graph model, each vertex consists of:
- A unique identifier
- A set of outgoing edges
- A set of incoming edges
- A collection of properties (key-value pairs)
Each edge consists of:
- A unique identifier
- The vertex at which the edge starts (the tail vertex)
- The vertex at which the edge ends (the head vertex)
- A label describing the kind of relationship between the two vertices
- A collection of properties (key-value pairs)
A graph can be stored in two relational tables, one for vertices and one for edges. This schema uses the PostgreSQL JSON data type to store the properties of each vertex or edge. The head and tail vertex are stored for each edge, so if you want the set of incoming or outgoing edges of a vertex, you can query the edges table by head_vertex or tail_vertex respectively.
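The two-table layout described above can be sketched in a few lines. This is a minimal illustration rather than the article's exact schema: it assumes SQLite (through Python's built-in sqlite3 module) instead of PostgreSQL, stores the properties as JSON text, and the index names, sample rows, and the helpers `outgoing_edges` and `incoming_edges` are made up for the example.

```python
import json
import sqlite3

# In-memory database; the schema mirrors the two tables described in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (
        vertex_id  INTEGER PRIMARY KEY,
        properties TEXT               -- JSON-encoded key-value pairs
    );
    CREATE TABLE edges (
        edge_id     INTEGER PRIMARY KEY,
        tail_vertex INTEGER REFERENCES vertices (vertex_id),
        head_vertex INTEGER REFERENCES vertices (vertex_id),
        label       TEXT,
        properties  TEXT
    );
    -- Indexes on both ends of an edge make traversal efficient in both directions.
    CREATE INDEX edges_tails ON edges (tail_vertex);
    CREATE INDEX edges_heads ON edges (head_vertex);
""")

# Hypothetical sample data: Lucy was born in Idaho, which is within the United States.
conn.executemany("INSERT INTO vertices VALUES (?, ?)", [
    (1, json.dumps({"name": "Lucy", "type": "person"})),
    (2, json.dumps({"name": "Idaho", "type": "state"})),
    (3, json.dumps({"name": "United States", "type": "country"})),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2, "born_in", "{}"),
    (2, 2, 3, "within", "{}"),
])


def outgoing_edges(vertex_id):
    """Edges that start at the vertex: look it up as the tail."""
    rows = conn.execute(
        "SELECT label, head_vertex FROM edges WHERE tail_vertex = ? ORDER BY edge_id",
        (vertex_id,),
    )
    return rows.fetchall()


def incoming_edges(vertex_id):
    """Edges that end at the vertex: look it up as the head."""
    rows = conn.execute(
        "SELECT label, tail_vertex FROM edges WHERE head_vertex = ? ORDER BY edge_id",
        (vertex_id,),
    )
    return rows.fetchall()


idaho_out = outgoing_edges(2)  # the "within" edge pointing at the United States
idaho_in = incoming_edges(2)   # the "born_in" edge coming from Lucy
```

Because both tail_vertex and head_vertex are indexed, each lookup is a single indexed query, whichever direction the edge is followed in.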
Important aspects of the graph model:
- Any vertex can have an edge connecting it to any other vertex. There is no schema that restricts which kinds of things can or cannot be associated
- Given any vertex, you can efficiently find both its incoming and its outgoing edges and thus traverse the graph, that is, follow a path through a chain of vertices both forward and backward (this is why the tail_vertex and head_vertex columns are indexed in Figure 2-2)
- By using different labels for different kinds of relationships, you can store several different kinds of information in a single graph while still maintaining a clean data model
These features give graphs a great deal of flexibility for data modeling, as Figure 1 illustrates. The figure shows several things that are difficult to express in a traditional relational schema, such as different kinds of regional structures in different countries (China has provinces and regions, whereas the United States has counties and states), quirks of history, and varying granularity of data (Lucy's current residence is specified as a city, whereas her place of birth is specified only at the level of a state).
The graph can be extended to include many other facts about Lucy and Alain, or about other people. For instance, you could use it to indicate any food allergies they have (by introducing a vertex for each allergen, and an edge between a person and an allergen to indicate an allergy), and link the allergens to a set of vertices that show which foods contain which substances. You could then write a query to find out what is safe for each person to eat. Graphs are good for evolvability: as you add features to an application, a graph can easily be extended to accommodate changes in the application's data structures.
Example: a Cypher query to find people who emigrated from the United States to Europe
MATCH
  (person) -[:BORN_IN]->  () -[:WITHIN*0..]-> (us:Location {name:'United States'}),
  (person) -[:LIVES_IN]-> () -[:WITHIN*0..]-> (eu:Location {name:'Europe'})
RETURN person.name
Graph queries in SQL
Example 2 suggested that graph data can be represented in a relational database. Does that mean it can also be queried with SQL?
The answer is yes, but with some difficulty. In a relational database, you usually know in advance which joins a query needs. In a graph query, you may need to traverse a variable number of edges before you find the vertex you are looking for; that is, the number of joins is not fixed in advance.
Since SQL:1999, this idea of variable-length traversal paths in a query can be expressed using recursive common table expressions (the WITH RECURSIVE syntax). Example 2-5 uses this technique to express the same query in SQL (finding the names of people who emigrated from the United States to Europe). The technique is currently supported by PostgreSQL, IBM DB2, Oracle, and SQL Server, but the syntax is very clumsy compared with Cypher.
WITH RECURSIVE
  -- in_usa is the set of vertex IDs of all locations within the United States
  in_usa(vertex_id) AS (
      -- Start with the vertex whose name property is 'United States'; it is the first element of the set in_usa
      SELECT vertex_id FROM vertices WHERE properties->>'name' = 'United States'
    UNION
      -- Follow all incoming within edges of the vertices in in_usa and add their tail vertices to the same set, until no new vertices are found
      SELECT edges.tail_vertex FROM edges
        JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
        WHERE edges.label = 'within'
  ),
  -- in_europe is the set of vertex IDs of all locations within Europe
  in_europe(vertex_id) AS (
      -- Do the same starting from the vertex whose name property is 'Europe', building the set in_europe
      SELECT vertex_id FROM vertices WHERE properties->>'name' = 'Europe'
    UNION
      SELECT edges.tail_vertex FROM edges
        JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
        WHERE edges.label = 'within'
  ),
  -- born_in_usa is the set of vertex IDs of all people born in the US
  born_in_usa(vertex_id) AS (
    -- For each vertex in in_usa, follow incoming born_in edges to find people born somewhere in the United States
    SELECT edges.tail_vertex FROM edges
      JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
      WHERE edges.label = 'born_in'
  ),
  -- lives_in_europe is the set of vertex IDs of all people living in Europe
  lives_in_europe(vertex_id) AS (
    -- Similarly, for each vertex in in_europe, follow incoming lives_in edges to find people who live in Europe
    SELECT edges.tail_vertex FROM edges
      JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
      WHERE edges.label = 'lives_in'
  )
SELECT vertices.properties->>'name'
FROM vertices
-- Finally, join to intersect the set of people born in the US with the set of people living in Europe
JOIN born_in_usa     ON vertices.vertex_id = born_in_usa.vertex_id
JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;
If the same query can be written in 4 lines in one query language but requires 29 lines in another, that alone shows that different data models suit different use cases. It is therefore important to pick a data model that is appropriate for your application.
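To try the recursive query on a small data set, the sketch below runs an equivalent query against SQLite, which also supports WITH RECURSIVE. It is an illustration under several assumptions: the name is stored in a plain `name` column instead of a PostgreSQL JSON property, the regional hierarchy is shortened, and the sample rows are made up to match the Lucy and Alain example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (vertex_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (tail_vertex INTEGER, head_vertex INTEGER, label TEXT);
""")

# Hypothetical, simplified sample data.
conn.executemany("INSERT INTO vertices VALUES (?, ?)", [
    (1, "Lucy"), (2, "Alain"), (3, "Idaho"), (4, "United States"),
    (5, "London"), (6, "England"), (7, "Europe"), (8, "France"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (1, 3, "born_in"),  (3, 4, "within"),                    # Lucy: Idaho -> United States
    (1, 5, "lives_in"), (5, 6, "within"), (6, 7, "within"),  # London -> England -> Europe
    (2, 8, "born_in"),  (8, 7, "within"),                    # Alain: France -> Europe
    (2, 5, "lives_in"),
])

QUERY = """
WITH RECURSIVE
  in_usa(vertex_id) AS (
      SELECT vertex_id FROM vertices WHERE name = 'United States'
    UNION
      SELECT edges.tail_vertex FROM edges
        JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
       WHERE edges.label = 'within'
  ),
  in_europe(vertex_id) AS (
      SELECT vertex_id FROM vertices WHERE name = 'Europe'
    UNION
      SELECT edges.tail_vertex FROM edges
        JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
       WHERE edges.label = 'within'
  ),
  born_in_usa(vertex_id) AS (
      SELECT edges.tail_vertex FROM edges
        JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
       WHERE edges.label = 'born_in'
  ),
  lives_in_europe(vertex_id) AS (
      SELECT edges.tail_vertex FROM edges
        JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
       WHERE edges.label = 'lives_in'
  )
SELECT vertices.name
  FROM vertices
  JOIN born_in_usa     ON vertices.vertex_id = born_in_usa.vertex_id
  JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id
 ORDER BY vertices.name
"""

# Names of the people who were born in the US and now live in Europe.
emigrants = [name for (name,) in conn.execute(QUERY)]
print(emigrants)
```

Alain also lives in Europe in this data set, but he was born in France, so the final join filters him out.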
Triple-stores and SPARQL
The triple-store model is mostly equivalent to the property graph model, using different words to describe the same ideas. It is nevertheless worth discussing, because there are various tools for triple-stores that can be valuable additions when building applications.
In a triple-store, all information is stored in the form of very simple three-part statements: (subject, predicate, object). For example, in the triple (Jim, likes, bananas):
- Jim is the subject, which is equivalent to a vertex in a graph
- likes is the predicate (the verb)
- bananas is the object
The object is one of two things:
- A value of a primitive data type, such as a string or a number. In that case, the predicate and object of the triple are equivalent to the key and value of a property on the subject vertex. For example, (lucy, age, 33) is like a vertex lucy with properties {"age": 33}
- Another vertex in the graph. In that case, the predicate is an edge in the graph, the subject is the tail vertex, and the object is the head vertex. For example, in (lucy, marriedTo, alain) the subject lucy and the object alain are both vertices, and the predicate marriedTo is the label of the edge that connects them
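The correspondence between triples and a property graph can be made concrete with a small sketch. The `Vertex` wrapper and the function `to_property_graph` are hypothetical names introduced only for this illustration; they are not part of any triple-store API. The wrapper exists so that an object that is a vertex can be told apart from a primitive value such as a string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """Marks a value as a graph vertex rather than a primitive value."""
    name: str


def to_property_graph(triples):
    """Split (subject, predicate, object) triples into vertex properties and edges."""
    properties = {}  # vertex name -> {key: value}
    edges = []       # (tail vertex, label, head vertex)
    for subject, predicate, obj in triples:
        properties.setdefault(subject.name, {})
        if isinstance(obj, Vertex):
            # The object is another vertex: the predicate is an edge label.
            properties.setdefault(obj.name, {})
            edges.append((subject.name, predicate, obj.name))
        else:
            # The object is a primitive value: the predicate is a property key.
            properties[subject.name][predicate] = obj
    return properties, edges


lucy, alain = Vertex("lucy"), Vertex("alain")
triples = [
    (lucy, "age", 33),
    (lucy, "marriedTo", alain),
]
properties, edges = to_property_graph(triples)
print(properties)  # {'lucy': {'age': 33}, 'alain': {}}
print(edges)       # [('lucy', 'marriedTo', 'alain')]
```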
Example 3: part of the data in Figure 1, represented as Turtle triples
The vertices of the graph are written as _:someName. The name has no meaning outside of the file that defines it; it exists only to tell which triples refer to the same vertex. When the predicate represents an edge, the object is another vertex, as in _:idaho :within _:usa . When the predicate represents a property, the object is a string literal, as in _:usa :name "United States" .
Repeating the same subject over and over is tedious when you define several triples about it. You can use semicolons to say multiple things about the same subject, which makes the Turtle format more concise and readable. Example 3 can be rewritten in this more concise syntax.
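As a rough illustration of the two notations, the sketch below renders a few triples first with one statement per triple and then with semicolons grouping statements about the same subject. It is not the article's original listing: the predicates `:name` and `:bornIn`, the class `:Person`, and the helper functions are assumptions made for the example, and the terms are written as ready-made Turtle strings to keep the code short.

```python
# Each triple is (subject, predicate, object), already written as Turtle terms.
# "a" is Turtle shorthand for rdf:type.
triples = [
    ("_:lucy", "a", ":Person"),
    ("_:lucy", ":name", '"Lucy"'),
    ("_:lucy", ":bornIn", "_:idaho"),
    ("_:idaho", ":name", '"Idaho"'),
]


def to_turtle_verbose(triples):
    """One statement per triple, repeating the subject every time."""
    return [f"{s} {p} {o} ." for s, p, o in triples]


def to_turtle_concise(triples):
    """Group statements by subject and separate them with semicolons."""
    grouped = {}  # subject -> list of "predicate object" strings, in input order
    for s, p, o in triples:
        grouped.setdefault(s, []).append(f"{p} {o}")
    return [f"{s} " + "; ".join(pairs) + " ." for s, pairs in grouped.items()]


verbose = to_turtle_verbose(triples)
concise = to_turtle_concise(triples)
print("\n".join(verbose))
print("\n".join(concise))
```

A complete Turtle file would also need a prefix declaration (for example `@prefix : <urn:example:> .`, where the URI is an arbitrary placeholder) before these lines so that names such as :Person resolve.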
The semantic web
If you read more about triple-stores, you may get pulled into a maelstrom of articles about the semantic web.
The triple-store data model is in fact completely independent of the semantic web. For example, Datomic is a triple-store that has nothing to do with the semantic web. But since the two are so closely linked in many people's minds, the relationship is worth clarifying.
The semantic web is fundamentally a simple and reasonable idea: websites already publish information as text and pictures for humans to read, so why not also publish it in a machine-readable format for computers to read? The Resource Description Framework (RDF) was intended as such a mechanism. It lets different websites publish data in a consistent format, so that data from different websites can be combined automatically into a web of data, a kind of internet-wide database of everything.
However, the semantic web was overhyped in the early 2000s and has so far shown little sign of being realized in practice, which has made many people skeptical of it. It has also been criticized for a dizzying array of acronyms, overly complex standards proposals, and arrogant self-promotion.
