Graph data models for system design

2022-06-12 17:39:00 【JavaEdge】

Many-to-many relationships are an important distinguishing feature between data models. If your data is mostly one-to-many (tree-structured) or has no relationships between records, the document model is the most appropriate. If many-to-many relationships are common, the relational model can handle simple cases, but as the connections within your data grow more complex, it becomes more natural to model the data as a graph.

A graph consists of two kinds of objects:

Vertices (also known as nodes or entities)
Edges (also known as relationships or arcs)

Many kinds of data can be modeled as a graph. Typical examples:

Social graphs: vertices are people, and edges indicate which people know each other
Web graph: vertices are web pages, and edges indicate HTML links to other pages
Road or rail networks: vertices are junctions, and edges represent the roads or railway lines between them

Many well-known algorithms operate on these graphs. For example, car navigation systems search for the shortest path between two points in a road network, and PageRank can be run on the web graph to estimate the popularity of a web page and thus its ranking in search results.
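To make the shortest-path idea concrete, here is a minimal sketch of Dijkstra's algorithm in Python. It assumes a hypothetical road network stored as a dictionary mapping each junction to a list of (neighbouring junction, road length) pairs with non-negative lengths; production navigation systems typically layer further optimizations on top of this basic idea.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical road network: junction -> list of (neighbouring junction, road length).
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path(graph: Graph, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm: return (total distance, path), or None if goal is unreachable."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}          # predecessor of each junction on its best-known path
    heap = [(0.0, start)]              # priority queue ordered by distance from start
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                   # stale queue entry; a shorter distance was already settled
        visited.add(node)
        if node == goal:
            # Walk the predecessor chain back to the start to rebuild the path.
            path = [node]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for neighbour, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    return None


roads: Graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 7)],
    "B": [("D", 1)],
}
print(shortest_path(roads, "A", "D"))  # (4.0, ['A', 'C', 'B', 'D'])
```

The detour A → C → B → D (length 4) beats the direct-looking route A → B → D (length 5), which is exactly the kind of answer a navigation system needs to find.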

In these examples, all the vertices in a graph represent the same kind of thing (people, web pages, or road junctions, respectively). However, graphs are not limited to such homogeneous data: an equally powerful use of graphs is to provide a consistent way of storing completely different types of objects in a single datastore. For example, Facebook maintains a single graph with many different types of vertices and edges:

Vertices represent people, locations, events, check-ins, and comments made by users
Edges indicate which people are friends with each other, which check-in happened in which location, who commented on which post, who attended which event, and so on

This article uses the example graph shown in Figure 1.

It could be taken from a social network or a genealogical database. It shows two people, Lucy from Idaho and Alain from Beaune, France. They are married and currently live in London.

There are several different, but related, ways of structuring and querying data in graphs. This section discusses the property graph model and the triple-store model.

Property graphs

In the property graph model, each vertex consists of:

A unique identifier
A set of outgoing edges
A set of incoming edges
A collection of properties (key-value pairs)

Each edge consists of:

A unique identifier
The vertex at which the edge starts (the tail vertex)
The vertex at which the edge ends (the head vertex)
A label describing the kind of relationship between the two vertices
A collection of properties (key-value pairs)

A graph store can be thought of as consisting of two relational tables, one for vertices and one for edges.

This schema uses the PostgreSQL json type to store the properties of each vertex or edge. The head and tail vertex are stored for each edge; if you want the set of incoming or outgoing edges for a vertex, you can query the edges table by head_vertex or tail_vertex, respectively.
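The two-table layout can be sketched with Python's built-in sqlite3 module. This is an approximation, not the article's PostgreSQL schema: SQLite has no json column type, so the properties are assumed to be stored as JSON-encoded text, and the helper functions add_vertex, add_edge, outgoing, and incoming are hypothetical names chosen for illustration.

```python
import json
import sqlite3

# Sketch of the vertices/edges layout in SQLite (assumption: properties stored as JSON text).
SCHEMA = """
CREATE TABLE vertices (
    vertex_id  INTEGER PRIMARY KEY,
    properties TEXT                 -- JSON-encoded key-value pairs
);
CREATE TABLE edges (
    edge_id     INTEGER PRIMARY KEY,
    tail_vertex INTEGER REFERENCES vertices (vertex_id),
    head_vertex INTEGER REFERENCES vertices (vertex_id),
    label       TEXT,
    properties  TEXT
);
-- Index both ends so incoming and outgoing edges can be found efficiently.
CREATE INDEX edges_tails ON edges (tail_vertex);
CREATE INDEX edges_heads ON edges (head_vertex);
"""


def open_graph() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_vertex(conn: sqlite3.Connection, vertex_id: int, **props: object) -> None:
    conn.execute("INSERT INTO vertices VALUES (?, ?)", (vertex_id, json.dumps(props)))


def add_edge(conn: sqlite3.Connection, tail: int, label: str, head: int, **props: object) -> None:
    conn.execute(
        "INSERT INTO edges (tail_vertex, head_vertex, label, properties) VALUES (?, ?, ?, ?)",
        (tail, head, label, json.dumps(props)),
    )


def outgoing(conn: sqlite3.Connection, vertex_id: int):
    # Outgoing edges: the vertex is the tail, so the lookup uses the edges_tails index.
    return conn.execute(
        "SELECT label, head_vertex FROM edges WHERE tail_vertex = ? ORDER BY edge_id",
        (vertex_id,),
    ).fetchall()


def incoming(conn: sqlite3.Connection, vertex_id: int):
    # Incoming edges: the vertex is the head, so the lookup uses the edges_heads index.
    return conn.execute(
        "SELECT label, tail_vertex FROM edges WHERE head_vertex = ? ORDER BY edge_id",
        (vertex_id,),
    ).fetchall()


conn = open_graph()
add_vertex(conn, 1, name="Lucy")
add_vertex(conn, 2, name="Idaho", type="state")
add_vertex(conn, 3, name="United States", type="country")
add_edge(conn, 1, "born_in", 2)
add_edge(conn, 2, "within", 3)
print(outgoing(conn, 2))  # [('within', 3)]
print(incoming(conn, 2))  # [('born_in', 1)]
```

Because every edge stores both endpoints, the same table answers "where does this edge go?" and "what points here?" without any schema changes.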

Some important aspects of this model:

Any vertex can have an edge connecting it with any other vertex. There is no schema that restricts which kinds of things can or cannot be associated.
Given any vertex, you can efficiently find both its incoming and its outgoing edges, and thus traverse the graph, i.e., follow a path through a chain of vertices both forward and backward. (This is why Example 2 indexes both the tail_vertex and head_vertex columns.)
By using different labels for different kinds of relationships, you can store several different kinds of information in a single graph while still maintaining a clean data model.
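The three properties above can be illustrated with a small in-memory structure. This is a hypothetical sketch, not how any particular graph database is implemented: each edge is indexed under both its tail and its head vertex, and traverse follows edges with one label, forward or backward, zero or more times.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Edge:
    edge_id: int
    tail: int          # vertex where the edge starts
    head: int          # vertex where the edge ends
    label: str         # kind of relationship, e.g. "within" or "born_in"
    properties: Dict[str, object] = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph with indexes on both edge endpoints."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Dict[str, object]] = {}
        self.out_edges: Dict[int, List[Edge]] = defaultdict(list)
        self.in_edges: Dict[int, List[Edge]] = defaultdict(list)
        self._next_edge_id = 0

    def add_vertex(self, vertex_id: int, **properties: object) -> None:
        self.vertices[vertex_id] = properties

    def add_edge(self, tail: int, label: str, head: int, **properties: object) -> Edge:
        # Any vertex may be connected to any other: no schema check is performed.
        edge = Edge(self._next_edge_id, tail, head, label, properties)
        self._next_edge_id += 1
        self.out_edges[tail].append(edge)
        self.in_edges[head].append(edge)
        return edge

    def traverse(self, start: int, label: str, backward: bool = False) -> Set[int]:
        """All vertices reachable from start via zero or more edges carrying this label."""
        seen = {start}
        stack = [start]
        while stack:
            v = stack.pop()
            edges = self.in_edges[v] if backward else self.out_edges[v]
            for e in edges:
                if e.label != label:
                    continue       # other relationship types share the graph but are skipped
                nxt = e.tail if backward else e.head
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


g = PropertyGraph()
g.add_vertex(1, name="United States", type="country")
g.add_vertex(2, name="Idaho", type="state")
g.add_vertex(3, name="Lucy")
g.add_edge(2, "within", 1)
g.add_edge(3, "born_in", 2)
print(sorted(g.traverse(2, "within")))                 # [1, 2]
print(sorted(g.traverse(1, "within", backward=True)))  # [1, 2]
```

The backward traversal from "United States" collects every location inside it, which is the building block for queries like "who was born somewhere in the US?".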

These features give graphs a great deal of flexibility for data modeling, as Figure 1 illustrates. It shows a few things that would be difficult to express in a traditional relational schema, such as different kinds of regional structures in different countries (China has provinces and regions, whereas the United States has counties and states), quirks of history, and data of varying granularity (Lucy's current residence is specified as a city, whereas her place of birth is specified only at the level of a state).

You could imagine extending the graph to include many other facts about Lucy and Alain, or about other people. For instance, you could use it to record any food allergies they have (by introducing a vertex for each allergen, and an edge between a person and an allergen to indicate an allergy), and link the allergens to a set of vertices that show which foods contain which substances. Then you could write a query to find out what is safe for each person to eat. Graphs are good for evolvability: as you add features to an application, a graph can easily be extended to accommodate changes in its data structures.
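The allergy query described above can be sketched in a few lines of Python. The edge labels allergic_to and contains, and all of the food data, are hypothetical; the point is only that the new information fits into the same (tail, label, head) edge format without changing anything that was already stored.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edges (tail, label, head) extending the example graph with allergens.
EDGES: List[Tuple[str, str, str]] = [
    ("lucy", "allergic_to", "peanuts"),
    ("alain", "allergic_to", "gluten"),
    ("satay", "contains", "peanuts"),
    ("baguette", "contains", "gluten"),
    ("salad", "contains", "lettuce"),
]


def safe_foods(edges: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """For each person with a recorded allergy, the foods containing none of their allergens."""
    allergies: Dict[str, Set[str]] = {}    # person -> allergens
    ingredients: Dict[str, Set[str]] = {}  # food -> substances it contains
    for tail, label, head in edges:
        if label == "allergic_to":
            allergies.setdefault(tail, set()).add(head)
        elif label == "contains":
            ingredients.setdefault(tail, set()).add(head)
    # A food is safe when its substances and the person's allergens do not overlap.
    return {
        person: {food for food, subs in ingredients.items() if not subs & allergens}
        for person, allergens in allergies.items()
    }


print(safe_foods(EDGES))
```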

// Cypher query to find people who emigrated from the United States to Europe

MATCH
  (person) -[:BORN_IN]-> () -[:WITHIN*0..]-> (us:Location {name:'United States'}),

  (person) -[:LIVES_IN]-> () -[:WITHIN*0..]-> (eu:Location {name:'Europe'})
RETURN person.name
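To show what the pattern above asks for, here is a rough Python equivalent over a hypothetical subset of Figure 1. It is only a sketch of the query's meaning, not how a Cypher engine executes it: WITHIN*0.. is modeled as "follow WITHIN edges backward from the named region zero or more times", and a person matches if both their birthplace and their residence fall inside the respective sets.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical subset of Figure 1: vertex id -> name, plus labelled edges (tail, label, head).
NAMES: Dict[str, str] = {
    "usa": "United States", "idaho": "Idaho", "europe": "Europe",
    "uk": "United Kingdom", "london": "London", "france": "France",
    "beaune": "Beaune", "lucy": "Lucy", "alain": "Alain",
}
EDGES: List[Tuple[str, str, str]] = [
    ("idaho", "WITHIN", "usa"),
    ("uk", "WITHIN", "europe"), ("london", "WITHIN", "uk"),
    ("france", "WITHIN", "europe"), ("beaune", "WITHIN", "france"),
    ("lucy", "BORN_IN", "idaho"), ("lucy", "LIVES_IN", "london"),
    ("alain", "BORN_IN", "beaune"), ("alain", "LIVES_IN", "london"),
]


def within(region_name: str) -> Set[str]:
    """Locations reached by following WITHIN edges backward zero or more times."""
    frontier = {v for v, n in NAMES.items() if n == region_name}
    found = set(frontier)              # zero hops: the region itself
    while frontier:
        frontier = {t for t, label, h in EDGES
                    if label == "WITHIN" and h in frontier and t not in found}
        found |= frontier
    return found


def emigrants(born: str, lives: str) -> List[str]:
    born_places, live_places = within(born), within(lives)
    people = {t for t, label, h in EDGES if label == "BORN_IN" and h in born_places}
    people &= {t for t, label, h in EDGES if label == "LIVES_IN" and h in live_places}
    return sorted(NAMES[p] for p in people)


print(emigrants("United States", "Europe"))  # ['Lucy']
```

Note that the Cypher version never says how many WITHIN hops to follow; the loop in within keeps going until no new locations appear.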

Graph queries in SQL

Example 2 showed that graph data can be represented in a relational database. Does that mean it can also be queried with SQL?

The answer is yes, but with some difficulty. In a relational database, you usually know in advance which joins a query needs. In a graph query, you may need to traverse a variable number of edges before you find the vertex you're looking for; that is, the number of joins is not fixed in advance.

Since the SQL:1999 standard, such variable-length traversal paths can be expressed in a query using recursive common table expressions (the WITH RECURSIVE syntax). Example 2-5 uses this technique to express the same query in SQL (finding the names of people who emigrated from the United States to Europe). Recursive common table expressions are supported in PostgreSQL, IBM DB2, Oracle, and SQL Server, but compared with Cypher the syntax is still very clumsy.

WITH RECURSIVE
  -- in_usa is the set of vertex IDs of all locations within the United States
  in_usa(vertex_id) AS (
    -- First find the vertex whose name property is 'United States' and make it the first element of in_usa
    SELECT vertex_id FROM vertices WHERE properties->>'name' = 'United States'
    UNION
    -- Follow all incoming 'within' edges of vertices in in_usa and add their tail vertices to the same set, until every such edge has been visited
    SELECT edges.tail_vertex FROM edges
      JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
      WHERE edges.label = 'within'
  ),

  -- in_europe is the set of vertex IDs of all locations within Europe
  in_europe(vertex_id) AS (
    -- Do the same starting from the vertex whose name property is 'Europe', building the set in_europe
    SELECT vertex_id FROM vertices WHERE properties->>'name' = 'Europe'
    UNION
    SELECT edges.tail_vertex FROM edges
      JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
      WHERE edges.label = 'within'
  ),

  -- born_in_usa is the set of vertex IDs of all people born in the US
  born_in_usa(vertex_id) AS (
    -- For each vertex in in_usa, follow incoming 'born_in' edges to find people born somewhere in the United States
    SELECT edges.tail_vertex FROM edges
      JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
      WHERE edges.label = 'born_in'
  ),

  -- lives_in_europe is the set of vertex IDs of all people living in Europe
  lives_in_europe(vertex_id) AS (
    -- Similarly, for each vertex in in_europe, follow incoming 'lives_in' edges to find people who live in Europe
    SELECT edges.tail_vertex FROM edges
      JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
      WHERE edges.label = 'lives_in'
  )

SELECT vertices.properties->>'name'
FROM vertices

-- Join to find those people who were both born in the US *and* live in Europe
-- Finally, intersect the set of people born in the United States with the set of people living in Europe via joins
JOIN born_in_usa ON vertices.vertex_id = born_in_usa.vertex_id
JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;
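The recursive query can be tried out locally with Python's sqlite3 module, since SQLite also supports WITH RECURSIVE. This sketch makes two assumptions that differ from the PostgreSQL original: properties are stored as JSON text, and because JSON operator support varies between SQLite builds, a small Python function registered as prop(json_text, key) stands in for properties->>'key'. The data is a hypothetical subset of Figure 1.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical helper standing in for PostgreSQL's properties->>'key'.
conn.create_function("prop", 2, lambda props, key: json.loads(props).get(key))
conn.executescript("""
CREATE TABLE vertices (vertex_id INTEGER PRIMARY KEY, properties TEXT);
CREATE TABLE edges (edge_id INTEGER PRIMARY KEY, tail_vertex INTEGER,
                    head_vertex INTEGER, label TEXT, properties TEXT);
""")
names = {1: "United States", 2: "Idaho", 3: "Europe", 4: "United Kingdom",
         5: "London", 6: "Lucy", 7: "Alain", 8: "France"}
conn.executemany("INSERT INTO vertices VALUES (?, ?)",
                 [(i, json.dumps({"name": n})) for i, n in names.items()])
conn.executemany(
    "INSERT INTO edges (tail_vertex, head_vertex, label, properties) VALUES (?, ?, ?, '{}')",
    [(2, 1, "within"), (4, 3, "within"), (5, 4, "within"), (8, 3, "within"),
     (6, 2, "born_in"), (6, 5, "lives_in"), (7, 8, "born_in"), (7, 5, "lives_in")])

QUERY = """
WITH RECURSIVE
  in_usa(vertex_id) AS (
    SELECT vertex_id FROM vertices WHERE prop(properties, 'name') = 'United States'
    UNION
    SELECT edges.tail_vertex FROM edges
      JOIN in_usa ON edges.head_vertex = in_usa.vertex_id WHERE edges.label = 'within'
  ),
  in_europe(vertex_id) AS (
    SELECT vertex_id FROM vertices WHERE prop(properties, 'name') = 'Europe'
    UNION
    SELECT edges.tail_vertex FROM edges
      JOIN in_europe ON edges.head_vertex = in_europe.vertex_id WHERE edges.label = 'within'
  ),
  born_in_usa(vertex_id) AS (
    SELECT edges.tail_vertex FROM edges
      JOIN in_usa ON edges.head_vertex = in_usa.vertex_id WHERE edges.label = 'born_in'
  ),
  lives_in_europe(vertex_id) AS (
    SELECT edges.tail_vertex FROM edges
      JOIN in_europe ON edges.head_vertex = in_europe.vertex_id WHERE edges.label = 'lives_in'
  )
SELECT prop(vertices.properties, 'name') FROM vertices
JOIN born_in_usa ON vertices.vertex_id = born_in_usa.vertex_id
JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;
"""

print(conn.execute(QUERY).fetchall())  # [('Lucy',)]
```

Alain lives in London but was born in France, so only Lucy survives the final join, matching the Cypher result.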

If the same query can be written in 4 lines in one query language but requires 29 lines in another, that shows that different data models are designed to satisfy different use cases. It is important to pick a data model that is suitable for your application.

Triple-stores and SPARQL

The triple-store model is mostly equivalent to the property graph model, using different words to describe the same ideas. It is nevertheless worth discussing, because there are various tools and languages for triple-stores that can be valuable additions to your toolbox for building applications.

In a triple-store, all information is stored in the form of very simple three-part statements: (subject, predicate, object). For example, in the triple (Jim, likes, bananas):

Jim is the subject, which is equivalent to a vertex in a graph
likes is the predicate (verb)
bananas is the object. The object is one of two things:
- A value of a primitive datatype, such as a string or a number. In that case, the predicate and object of the triple are equivalent to the key and value of a property on the subject vertex. For example, (lucy, age, 33) is like a vertex lucy with properties {"age": 33}.
- Another vertex in the graph. In that case, the predicate is an edge in the graph, the subject is the tail vertex, and the object is the head vertex. For example, in (lucy, marriedTo, alain), the subject lucy and the object alain are both vertices, and the predicate marriedTo is the label of the edge that connects them.
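The two cases for the object can be made concrete with a short Python sketch that converts triples into the property graph form described earlier. The Vertex wrapper type is a hypothetical convention for telling vertex references apart from literal values; real triple-stores make this distinction with their own syntax (for example, the _: prefix in Turtle).

```python
from typing import Dict, List, NamedTuple, Tuple, Union


class Vertex(NamedTuple):
    """A reference to a graph vertex, identified by name (like _:lucy in Turtle)."""
    name: str


Value = Union[str, int, float]                   # primitive datatypes
Triple = Tuple[Vertex, str, Union[Vertex, Value]]

TRIPLES: List[Triple] = [
    (Vertex("lucy"), "name", "Lucy"),
    (Vertex("lucy"), "age", 33),
    (Vertex("lucy"), "marriedTo", Vertex("alain")),
    (Vertex("alain"), "name", "Alain"),
]


def to_property_graph(triples: List[Triple]):
    """Split triples into vertex properties (object is a value) and edges (object is a vertex)."""
    properties: Dict[str, Dict[str, Value]] = {}
    edges: List[Tuple[str, str, str]] = []       # (tail vertex, label, head vertex)
    for subject, predicate, obj in triples:
        if isinstance(obj, Vertex):
            edges.append((subject.name, predicate, obj.name))
        else:
            properties.setdefault(subject.name, {})[predicate] = obj
    return properties, edges


props, edges = to_property_graph(TRIPLES)
print(props)  # {'lucy': {'name': 'Lucy', 'age': 33}, 'alain': {'name': 'Alain'}}
print(edges)  # [('lucy', 'marriedTo', 'alain')]
```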
Example 3: A subset of the data in Figure 1, represented as Turtle triples

In this example, vertices of the graph are written as _:someName. The name has no meaning outside this file; it exists only so that we can tell which triples refer to the same vertex. When the predicate represents an edge, the object is a vertex, as in _:idaho :within _:usa. When the predicate represents a property, the object is a string literal, as in _:usa :name "United States".

Repeating the same subject over and over again is rather tedious, but semicolons let you say multiple things about the same subject. This makes the Turtle format concise and quite readable; Example 3 can be rewritten much more compactly using this syntax.
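The semicolon shorthand can be illustrated by a small Python function that writes triples in a Turtle-like form, either one statement per line or grouped by subject. This is a simplified sketch under stated assumptions, not a complete Turtle serializer: it assumes strings beginning with _: are vertex (blank node) references, that every predicate uses the default : prefix, and it omits the @prefix declaration a real Turtle file would need.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, int]


def format_object(obj: Value) -> str:
    # Assumption: strings starting with "_:" refer to vertices; other strings are literals.
    if isinstance(obj, str) and obj.startswith("_:"):
        return obj
    if isinstance(obj, str):
        # Escape backslashes and double quotes inside a quoted string literal.
        return '"' + obj.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(obj)  # integers can be written bare in Turtle


def to_turtle(triples: List[Tuple[str, str, Value]], compact: bool = True) -> str:
    if not compact:
        return "\n".join(f"{s} :{p} {format_object(o)} ." for s, p, o in triples)
    grouped: Dict[str, List[str]] = {}
    for s, p, o in triples:
        grouped.setdefault(s, []).append(f":{p} {format_object(o)}")
    # Semicolons separate predicate-object pairs that share the same subject.
    return "\n".join(f"{s} " + "; ".join(pairs) + " ." for s, pairs in grouped.items())


EXAMPLE = [
    ("_:usa", "name", "United States"),
    ("_:usa", "type", "country"),
    ("_:idaho", "name", "Idaho"),
    ("_:idaho", "within", "_:usa"),
]
print(to_turtle(EXAMPLE, compact=False))
print(to_turtle(EXAMPLE))
```

The compact form writes each subject once, which is the readability gain the semicolon syntax is meant to provide.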

The semantic web

If you read more about triple-stores, you may get sucked into a maelstrom of articles about the semantic web.

The triple-store data model is completely independent of the semantic web; for example, Datomic is a triple-store that has nothing to do with it. But since the two are so closely linked in many people's minds, it is worth clarifying the distinction.

At its core, the semantic web is a simple and reasonable idea: websites already publish information as text and pictures for humans to read, so why not also publish it as machine-readable data for computers? The Resource Description Framework (RDF) is intended as such a mechanism: it lets different websites publish data in a consistent format, so that data from different websites can be automatically combined into a web of data, a kind of internet-wide database of everything.

However, the semantic web was overhyped in the early 2000s and so far has shown no convincing sign of being realized in practice, which has made many people skeptical of it. It has also been criticized for a dizzying plethora of acronyms, overly complex standards proposals, and hubris.


Copyright notice

This article was written by [JavaEdge]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/163/202206121731438583.html