Property graphs (PG) and RDF graphs are two popular database graph models, but they are not interoperable: data modeled in PG cannot be directly integrated with other data modeled in RDF. This lack of interoperability also impedes the use of the tools of one model when data are modeled in the other.
In this paper, we propose PRSC, a configurable conversion to transform a PG into an RDF graph. This conversion relies on PG schemas and user-defined mappings called PRSC contexts. We also formally prove that a subset of PRSC contexts, called well-behaved contexts, can be used to convert the resulting RDF graph back to the original PG, and we provide the related algorithm. Algorithms for conversion and reversion are available as open-source implementations.
Graphs are a popular data model in which knowledge is represented through objects and links between these objects. Today, two graph models are predominantly used: Property Graphs and RDF.
Property Graphs (PGs) are a family of implementations in which data are represented with nodes and edges; labels and properties (key-value pairs) can be attached to these nodes and edges. Property Graphs are not a uniform model: some implementations, like Neo4j,1
https://neo4j.com/
only allow exactly one label for each edge. Most PG engines offer easy-to-use graph query languages, like Cypher [16] and Gremlin [28], that rely on graph traversal. While no uniform standard has been settled yet, the Property Graph Schema Working Group2
is working towards defining a schema language for PGs and a unified formalization of the PG model. In the rest of this paper, following other authors [2], the variability of implementations is neglected and PGs are considered as a uniform model.
Another popular graph model is the Resource Description Framework (RDF) model [34]. In this model, data are represented with triples that represent links between resources. The resources and the links between them are identified through Internationalized Resource Identifiers (IRIs). This data model is a W3C standard and has been studied through a large number of works, like RDFS [9] and OWL [24] for inference, or SHACL [22] for data validation. This model has been extended by RDF-star [20], which helps writing statements about triples in a more concise manner, but does not provide exactly the same modeling capabilities as PGs. For example, multiple links with the same predicate between two resources are still not supported and require extra constructs, like the occurrence pattern described in the “Triples and occurrences” section of the RDF-star community group final report [19].
While PG and RDF are both based on the idea of using graph data, the choice of one removes the ability to use the tools developed for the other. The maintainers of Amazon Neptune, a graph database service that can support both models independently, report that their users choose the solution that best suits their current use case, then struggle because they are stuck with the tools of that model, even when the tools of the other model would better answer a new business problem [23]. Generally speaking, this diversity of graph models, and more precisely the lack of interoperability between them, hinders graph database adoption.
Numerous authors [14,29] highlight the benefits of interoperable systems, and in particular of interoperability between PGs and RDF graphs [15,32,35]. We can distinguish two levels of interoperability: syntactic interoperability and semantic interoperability. The former consists in having common data formats, for example JSON or CSV. The latter additionally requires that data use shared vocabularies. Tim Berners-Lee theorized the 5-star Open Data model.3
In this model, datasets are given a number of stars depending on how interoperable they are. RDF graphs are by design the best candidates to reach the top of the 5-star model, in particular through the use of shared vocabularies. So, while internally developers may choose RDF or PG databases based on their preferences and the convenience of the associated tools, RDF is better suited as a pivot model for the interoperable exchange of data.
Among the many questions raised by the interoperability between PGs and RDF graphs, the scenario we want to support is the conversion from PGs to RDF (syntactic interoperability) without information loss, and while promoting the use of existing shared vocabularies (semantic interoperability). The requirement of using existing shared vocabularies requires the converter to be highly customizable, in particular in terms of how the produced triples are structured. On the other hand, the requirement of not losing information, i.e. producing reversible conversions, requires a formal study of the possible conversions.
In a previous paper [10], we introduced the motivations behind PREC (PG to RDF Experimental Converter), a user-configured mapping from Property Graphs (PGs) to RDF graphs, and proposed a mapping language, specialized for the PG-to-RDF conversion problem, that lets the user describe how to convert the node labels, the edges and the properties of the original PG to an RDF graph. By converting the data stored in PGs into RDF, users are then able to use all the tools available for RDF. In this paper, we introduce a new mapping language, named PRSC,4
PG to RDF: Schema-driven Converter, pronounced “presque”.
driven by a schema and a description of how to convert the elements of each type in the schema to RDF. This mapping language is formally defined, and the conditions under which the conversion produced by PRSC is reversible are also defined. Compared to the previous mapping language, this language contains fewer terms and concepts, and should be easier to use. The PRSC engine is available under the MIT license5
and can connect to both a Cypher endpoint and a Gremlin endpoint.
The rest of this paper is organized as follows. Section 2 gives an overview of PRSC to explain its principles. Section 3 gives generic formal definitions of PGs and RDF graphs. Section 4 gives a formal definition of a PRSC conversion, which essentially formalizes the overview of Section 2. Section 5 studies reversibility, and proves that some contexts can convert a PG into an RDF graph without information loss. Section 6 discusses existing works that ease interoperability between PGs and RDF graphs, and compares them to PRSC. Section 7 discusses the proposed solution and describes some future works.
PRSC in practice
The Property Graph shown in Fig. 1 describes the relationship between Tintin and Snowy. It is composed of two nodes. The first one holds the label Person and two properties: the first property has “name” as its key and “Tintin” as its value, the second has “job” as its key and “Reporter” as its value; or, more simply, its name is “Tintin” and its job is “Reporter”. The other node only has one property: the name “Snowy”. These two nodes are connected by an edge that holds one label, TravelsWith, and a property whose key is “since” and whose value is “1978”.
A similar example represented in RDF-star is shown in Listing 1. Most information that was in the PG is represented by the triples on lines 1–4 and 6. The information about the date since which Tintin travels with Snowy is represented through a nested RDF-star triple.
A small PG about Tintin that serves as a running example in this paper.
An example of an RDF-star graph in Turtle Format
Using the user-defined mapping, PRSC is able to convert the PG in Fig. 1 into the RDF-star graph in Listing 1 and, more generally, any Property Graph with the same schema into the corresponding RDF graph. In this paper, we consider that Property Graphs with the same schema as the one in Fig. 1 are Property Graphs where all nodes have either (1) the “Person” label, a “name” property and a “job” property, or (2) no label and only a “name” property, and (3) where all edges have the “TravelsWith” label and a “since” property. The mapping the user must provide to the PRSC engine is shown in Listing 2 and is in Turtle-star format [27]. Rules are split in two parts:
The target part, which describes which elements of the Property Graph are targeted. The target is described according to three criteria: (1) whether the element must be an edge or a node, (2) the labels and (3) the properties of the element.
The production part, which describes the triples to produce with a list of template triples. Values in the pvar namespace are mapped to blank nodes in the resulting RDF graph. The literals that use prec:valueOf as their datatype are converted to the corresponding property values in the RDF graph.
The PRSC context that maps the PG running example to the RDF graph running example
The mapping, named a PRSC context, shown in Listing 2 reads as follows:
The first rule is named _:PersonRule (line 1).
The rule is used for all PG nodes (line 3) that only have the node label “Person” (line 4) and have the properties “name” and “job” (line 5). In our example, the node corresponding to Tintin matches this description, but the one corresponding to Snowy does not, as it lacks the Person label and the job property.
It will produce three triples:
One triple with a blank node as its subject, rdf:type as its predicate and ex:Person as its object (line 8). Each node from the Property Graph is identified by a distinct blank node. In this example, _:n1 rdf:type ex:Person will be produced.
Another triple with the same blank node as its subject, foaf:name as its predicate and a literal that matches the value of the name property in the PG (line 9). The PRSC engine converts all literals whose datatype is prec:valueOf into the value of the corresponding property in the PG. In this example, _:n1 foaf:name "Tintin" will be produced.
One last triple is produced with the same blank node as its subject, ex:profession as its predicate and a literal corresponding to the value of the property job (line 10). In this example, _:n1 ex:profession "Reporter" will be produced.
The second rule is named _:NamedRule (line 12).
It is applied to nodes (line 14) that have no labels and only one property: name (line 15). This is the case of the PG node used to describe Snowy, but not of the one that describes Tintin, as the latter has an extra label and an extra property.
These PG nodes will be converted into one triple with a blank node that identifies the PG node as its subject, foaf:name as its predicate and the literal that corresponds to the value of the name property as its object (line 18). In this example, the triple _:n2 foaf:name "Snowy" is produced.
The third rule is named _:TravelsWithRule (line 20):
It is used to convert edges (line 22) whose only label is “TravelsWith” (line 23) and with one and only one property named “since” (line 24).
These edges are converted by producing a triple with the identifier of the source PG node as the subject, ex:isTeammateOf as the predicate and the identifier of the destination PG node as the object (line 27). In this example, the triple _:n1 ex:isTeammateOf _:n2 is produced.
A triple with a quoted triple is created by the template on line 28: the triple created by line 27 is used in the subject position, ex:since is used as the predicate and the value of the “since” property is used as the object. In our example, the triple << _:n1 ex:isTeammateOf _:n2 >> ex:since 1978 is produced.
Note that in this example, pvar:self is not used on lines 27 and 28. If it were used, it would be mapped to a blank node that identifies the edge. The consequence of not using it is a smaller RDF graph, at the cost that, if several PG edges with the “TravelsWith” label were present between the same two nodes, the RDF representations of these edges would be merged.
Note that this mechanism of using quoted triples to describe templates in a pure Turtle-star file was already presented in our previous work [10]. Compared to R2RML [14], it lets the user describe the triples to produce with a syntax close to the triples that will actually be produced, but it is currently impossible to express templated terms in every position. For example, if the user wants to generate an IRI such as <http://example.org/person/{name}> in subject position, in which the {name} part is substituted with the value of the name property, R2RML offers this as a native feature. As PRSC is currently unable to generate arbitrary IRIs, nodes and edges from the PG are always mapped to blank nodes, and minting IRIs is left for future work.
General definitions
This section introduces some standard definitions, mostly inspired from previous works.
Notations and conventions
Let be the set of all strings. Strings are noted between quotes. For example, , , and are strings.
(Domain and image of a function).
For all partial functions , and are defined as follows:
.
For the partial function , with , .
Let S be a set, we recall that denotes the set of all parts of S.
Compatible functions
For all functions f, we recall that f can be seen as a set of pairs (x, f(x)). Conversely, a set S of 2-tuples can be seen as a function iff (if and only if) no two tuples of S share the same first element.
Consider the three functions , , exposed in Table 1.
Some functions defined both with the usual function notation and with a set notation
As , and can be defined with a set, it is possible to use the usual set operations.
The set is a function: the first element of all tuples has a different value. Using a function notation, it may be written as:
In contrast, is not a function: both and are members of the set , so would have to be equal to both and , which are different values.
Instead of using the notation , the notation is sometimes used to clarify the fact that a set is a function. For example, may be noted as .
(Functions compatibility).
Two functions f and g are compatible iff their union f ∪ g is a function.
In other words, two functions f and g are compatible iff, for every common input, they share the same output, i.e., f(x) = g(x) for every x in the domain of both f and g.
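To make this test concrete, here is a minimal Python sketch (the representation of functions as dictionaries and the function name are ours, not part of the paper's formalism): two functions are compatible when they agree on every shared input.

```python
def compatible(f: dict, g: dict) -> bool:
    """Return True iff f and g agree on every common input (compatibility definition above)."""
    return all(f[x] == g[x] for x in f.keys() & g.keys())

# Compatible: both map the common input 2 to "b", so their union is still a function.
assert compatible({1: "a", 2: "b"}, {2: "b", 3: "c"})
# Not compatible: the common input 1 would be mapped to two different outputs.
assert not compatible({1: "a"}, {1: "z"})
```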
Property Graph
(Property Graph).
Following the definition of Angles [2], a property graph is defined as the tuple , where:
N and E are finite sets with N ∩ E = ∅; they are respectively the set of nodes and the set of edges of the property graph.
The next two components are total functions from E to N; they map each edge to its source and destination nodes respectively.
The next component is a total function that maps each node and edge to its set of labels.
The last component is a partial function that describes the properties of the elements, where V is the set of all possible property values. Considering that a property is a key-value pair, it expects two inputs, a node or an edge and a property key, and outputs the property value.
The set of all property graphs is denoted .
In this paper, property graph nodes and edges are grouped under the term element.
In the rest of the paper, when any PG x is introduced, we allow ourselves to use the symbols , , , , and without explicitly defining x as the tuple .
(Running example of a Property Graph).
The PG shown in Fig. 1 can formally be defined as the PG denoted with
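Since the formal tuple is not reproduced above, the following Python sketch shows one possible formalization of the PG of Fig. 1. The field names and the element identifiers are our own choices; we already use blank-node-style identifiers (_:n1, _:n2, _:e1), anticipating the convention introduced in Section 4.2.

```python
from dataclasses import dataclass

@dataclass
class PropertyGraph:
    nodes: set        # finite set of node identifiers
    edges: set        # finite set of edge identifiers, disjoint from nodes
    src: dict         # total function: edge -> source node
    dst: dict         # total function: edge -> destination node
    labels: dict      # total function: element -> set of labels
    properties: dict  # partial function: (element, property key) -> property value

# One possible formalization of the running example of Fig. 1.
tintin_pg = PropertyGraph(
    nodes={"_:n1", "_:n2"},
    edges={"_:e1"},
    src={"_:e1": "_:n1"},
    dst={"_:e1": "_:n2"},
    labels={"_:n1": {"Person"}, "_:n2": set(), "_:e1": {"TravelsWith"}},
    properties={
        ("_:n1", "name"): "Tintin", ("_:n1", "job"): "Reporter",
        ("_:n2", "name"): "Snowy",
        ("_:e1", "since"): "1978",
    },
)
```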
(The empty PG).
The empty PG, which is the PG that contains no nodes and no edges, is formalized as follows: with , and .
Renaming Property Graphs and isomorphism
The chosen formal definition of the running example is not the only one that is possible: for example an arbitrary element named a could have been used in place of as the first listed node identifier in Example 3.
(Renaming function).
For all sets , , , where and , a renaming from and to and is a bijective function where , , .
Let be another formal definition of the PG in Fig. 1 that uses the sets and .
An example of a renaming function from to is .
(Property Graph renaming).
Let be a property graph, and be two sets and ϕ be a renaming function from and to and . We define the renamed PG as follows:
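The componentwise definition was lost in extraction; the following is a sketch of the usual construction, where src, dst, lbl and prop are our own names for the source, destination, label and property functions of the original PG.

```latex
% Sketch of the renamed PG, assuming G = (N, E, src, dst, lbl, prop) and a renaming phi:
\begin{align*}
N^\phi &= \phi(N), \qquad E^\phi = \phi(E),\\
src^\phi(\phi(e)) &= \phi(src(e)), \qquad dst^\phi(\phi(e)) = \phi(dst(e)) && \forall e \in E,\\
lbl^\phi(\phi(m)) &= lbl(m), \qquad prop^\phi(\phi(m), k) = prop(m, k) && \forall m \in N \cup E,\ \forall k.
\end{align*}
```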
Let us consider back , the PG about Tintin defined in Example 3, the other formalization of the same PG and the renaming function introduced in the Example 4.
The PG produced by is
(Isomorphic property graph).
Two PGs and are isomorphic iff there exists a renaming function ϕ from and to and such that . Note that, because renaming functions are bijective, the inverse renaming function exists and maps the second PG back to the first.
Note that both and from Examples 3 and 4 match the graphical representation given in Fig. 1. An informal way to define the isomorphism between two PGs is to check if they have the same graphical representation.
Existing works [16,33] on PG query languages focus on extracting the properties of some nodes and edges, and never look for the exact identity of the elements. It is therefore possible to affirm that the exact identity is not important, and that if two PGs are isomorphic, they can be considered the same PG for all practical purposes.
RDF-star definition
(Atomic RDF terms).
Let I be the infinite set of IRIs, L be the set of literals and B be the infinite set of blank nodes. The sets I, L and B are pairwise disjoint.
IRIs, literals and blank nodes are grouped under the name “Atomic RDF terms”.
Notation: in the examples, the IRIs, i.e. the elements of I, will either be noted as full IRIs between angle brackets, e.g. <http://example.org/Tintin>, or by using prefixes to shorten the IRI, e.g. ex:Tintin. The list of prefixes used in this paper is described in Table 2.
Literals, the elements of L, can be noted either by using the usual tuple notation, e.g. or with the compact notation .
Finally, the blank nodes, the elements of B, are denoted by blank node labels prefixed with the two symbols “_:”, e.g., _:n1 or _:tintin.
List of prefixes used in this paper
Prefix   IRI
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#
xsd      http://www.w3.org/2001/XMLSchema#
ex       http://example.org/
prec     http://bruy.at/prec#
pvar     http://bruy.at/prec-var#
(RDF(-star) triples and graphs).
The set of all RDF triples6 is denoted and is defined as follows:
, , , .
, , and for all , and defined as above, , and are members of .
A subset of is an RDF graph.
Both the atomic RDF terms defined in Definition 8 and RDF triples are terms. A triple used in another triple, in subject or object position, is a quoted triple. An RDF triple cannot contain itself and cannot be nested infinitely.
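The formulas of this definition were lost in extraction; the following is a sketch of one standard way to define the set of finitely nested RDF-star triples, using our own symbol for the set of triples.

```latex
% Sketch: finitely nested RDF-star triples (our notation \mathcal{T}).
\begin{align*}
\mathcal{T}_0 &= (I \cup B) \times I \times (I \cup B \cup L)\\
\mathcal{T}_{i+1} &= (I \cup B \cup \mathcal{T}_i) \times I \times (I \cup B \cup L \cup \mathcal{T}_i)\\
\mathcal{T} &= \textstyle\bigcup_{i \geq 0} \mathcal{T}_i
\end{align*}
```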
For the sake of readability, although RDF-star is not yet part of the official RDF recommendation [34], we conflate RDF-star and RDF in this paper. When we mention an RDF triple or an RDF graph, we allow them to contain quoted triples.
The triple is an element of . Its Turtle representation is ex:tintin rdf:type ex:Person.
The RDF graph exposed in Listing 1 is composed of 5 triples written in Turtle format. In our formalism, the second triple, _:tintin foaf:name "Tintin", is .
is an element of .
is an element of that has a quoted triple in subject position.
(Term membership).
The ∈ operator is extended to triples to check if a term is part of a triple.
(Term membership examples).
.
.
(List of blank nodes used in an RDF graph).
For every RDF graph , is the set of blank nodes in i.e., .
PRSC enables the user to convert any Property Graph to an RDF graph by using user-defined templates.
Property graphs with blank nodes
By default, every node and edge of the PG to convert is mapped by PRSC to an arbitrary fresh blank node. This mapping is arbitrary because blank nodes are essentially interchangeable, and because they have no global identifiers that would allow mapping to a specific blank node anyway. This mapping is used throughout the transformation. For the sake of conciseness, we introduce the following modelling trick.
We note that in the respective definitions of PGs and RDF, the sets N and E of nodes and edges (in any PG) and the global set B of blank nodes (in RDF) are very loosely characterized. The only constraints are that N and E are finite and disjoint, and that B is disjoint from the sets of IRIs and literals. Theoretically, nothing prevents a property graph from taking its nodes and edges in the set B, in other words, from having and .
Our arbitrary mapping from to fresh blank nodes can then be used as a renaming function to build a new PG that is isomorphic to the original one. As noted in Section 3.3.1, the two PGs are indistinguishable for any practical purpose, including the transformation to an RDF graph. Without loss of generality, all the definitions and algorithms in this paper expect PGs whose nodes and edges are elements of B. This amounts to assuming that the arbitrary mapping to blank nodes has been set in advance and is carried by the PG to transform: every element of is identified with the blank node it maps to.
Conversely, when we study the reversibility of some contexts in Section 5, we prove that the produced BPG is exactly the original one. Producing a PG with arbitrary elements, and proving that it was isomorphic to the original, would be much more complex.
(Blank Node Property Graph).
is the set of property graphs with blank nodes only (BPG), i.e..
By defining the renaming function , it is possible to build the BPG :
By construction, is isomorphic to , and as , .
This may raise the question of the meaning of two BPGs that share the same blank nodes, in particular if one blank node is used as a node in one BPG and as an edge in the other. However, this question could also be raised for all PGs: what would it mean for two PGs to have common elements in their respective sets of nodes and edges? In general, it would not hold any semantics, as PG elements are considered locally for a given PG. However, in Section 5.3.4, we will define operators for PGs, even the ones without blank nodes, that will consider PGs with shared PG elements and their possible relations.
Type of a PG element and PG schemas
We define the type of a PG element and PG schemas as follows. Let be a PG.
(Property keys of an element).
We recall that in Section 2 and in Definition 3, properties are described as key-value pairs.
is the function that maps a PG element (node or edge) of the PG to the list of property keys for which it has a value, i.e., with , .
(Type of a PG element).
A type is a triple composed of 1) the kind of the PG element, i.e. whether it is a node or an edge, 2) a set of labels and 3) a set of property keys. The set of all types is denoted and is defined as .
The type of an element is
A set of PG types is named a schema.
The functions , and are defined for types such that , , , .
Table 3 shows the types of the PG elements in the running example.
The types of the elements in the PG
If two PGs and are isomorphic, their elements share the same type.
Indeed, by definition, there exists a renaming function ϕ from the elements of to the elements of , and , .
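A sketch of the type computation of Definition 14, reusing the PropertyGraph sketch of Section 3 (the function name and the "node"/"edge" markers are our own conventions):

```python
def element_type(pg, m):
    """Type of a PG element (Definition 14): (kind, set of labels, set of property keys)."""
    kind = "node" if m in pg.nodes else "edge"
    labels = frozenset(pg.labels[m])
    keys = frozenset(k for (e, k) in pg.properties if e == m)
    return (kind, labels, keys)

# On the running example (cf. Table 3):
#   element_type(tintin_pg, "_:n1") -> ("node", {"Person"}, {"name", "job"})
#   element_type(tintin_pg, "_:n2") -> ("node", frozenset(), {"name"})
#   element_type(tintin_pg, "_:e1") -> ("edge", {"TravelsWith"}, {"since"})
```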
Template triples
PRSC resorts to a mechanism of templating: to produce an RDF graph from a PG, we use tuples of three elements, named template triples, that will be mapped to proper RDF triples.
(Placeholders).
There are four distinct elements, not included in any of the previously defined sets, named , , and .7
In practice, the implementation accompanying this paper maps the property-value placeholder to the IRI prec:valueOf and the other placeholders, prefixed with ?, to the pvar namespace. Examples given in Turtle reflect the implementation instead of fully fitting the theoretical definitions.
Let . Its elements serve as placeholders that will be replaced by the blank nodes that represent the nodes and edges of the PG.
Let . Elements of P can be noted with the same syntax as literals, for example is the pair . Each element of P serves as a placeholder to be replaced with an RDF literal that represents the value of a property in the PG.
(Template triples).
A template triple is a member of and is defined as follows:
, , , .
, , and for all , and defined as above, , and are members of .
Note that unlike RDF triples, the elements of cannot contain blank nodes but can contain placeholders: members can be used in subject (first) and/or object (third) position, as they will later be mapped to blank nodes, and members of P are allowed in object position, as they will later be mapped to literals. Like RDF triples, template triples cannot contain themselves and cannot be nested infinitely.
Any subset of is named a template graph. The PRSC engine will use template graphs to produce RDF graphs.
The triple is both an element of and an element of . Indeed, both and allow IRIs in the subject, predicate and object positions.
The triple is a member of but not of . The first member is an element of , which is only allowed in template triples. The two other members are IRIs, and is a subset of .
The triple is also a member of but not of . While the first two members are IRIs, the third member is an element of P. Again, elements of P are only allowed by template triples.
(Placeholders and template triple membership).
The ∈ operator, which we extended in Definition 10, is further extended to placeholders and template triples:
PRSC context
In this paper, the notion of PRSC context is the keystone to let the user drive the conversion from a PG to an RDF graph. It maps PG types to template graphs. The algorithm proceeds by looping on each node and edge of the PG, computing its type, finding the associated template graph in the context, and replacing the placeholders of this template graph with data extracted from the PG to produce an RDF graph.
(PRSC Context).
A PRSC context is a partial function that maps types to template graphs.
All template graphs must be valid, i.e. for all types, the placeholders used in the associated template graph must be consistent with the type: (1) for any given property key, its associated placeholder may only be used in template graphs associated with types that contain the property key, for example the placeholder may only be used if the property key is in the type associated to this template; (2) and templates associated to node types are not allowed to use the placeholders and , as they are related to the source or the destination of an edge.
Formally, all template graphs used by a context are valid iff :
, .
.
The set of all context functions is denoted .
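The validity conditions above can be checked mechanically. Below is a hedged Python sketch with our own data conventions (not the PRSC syntax): a context is a dict from types (kind, labels, keys) to sets of template triples; IRIs are plain strings, "?self", "?source" and "?destination" are the blank-node placeholders, and ("valueOf", key) is a property-value placeholder.

```python
def terms_of(triple):
    """All terms appearing in a (possibly nested) template triple."""
    for t in triple:
        if isinstance(t, tuple) and len(t) == 3:
            yield from terms_of(t)   # nested (quoted) template triple
        else:
            yield t

def is_valid_context(context) -> bool:
    """Check the two validity conditions of a PRSC context (sketch)."""
    for (kind, _labels, keys), template_graph in context.items():
        for triple in template_graph:
            for term in terms_of(triple):
                # (1) a property placeholder may only refer to a key of the type
                if isinstance(term, tuple) and term[0] == "valueOf" and term[1] not in keys:
                    return False
                # (2) node types may not use the edge-only placeholders
                if kind == "node" and term in ("?source", "?destination"):
                    return False
    return True
```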
(Complete PRSC contexts for a given PG).
A PRSC context is said to be complete for a property graph iff there is a template graph defined for each type used in . The set of all complete contexts for a PG is noted .
Note that the type system described in Definition 14 is trivial to resolve as the type of a PG element m, denoted by , only depends on its kind (node or edge), the list of its labels and the list of its property keys. For this reason, checking if a PRSC context is complete for a property graph is also trivial as it consists in computing the type of each PG element of and checking if all types are in the domain of .
Table 4 shows an example of a complete context function for our running example. First, the function is a context, as all template graphs are valid: the placeholders and are only used in template graphs whose associated type contains the corresponding property key. The fact that the property key in the third type has no associated placeholder occurrence in the template graph does not invalidate the context. The placeholders and are only used in the third type, which is an edge type. Then, all three types used by our running example have an associated template graph, so it is a complete context for the PG shown in Example 1.
An example of a complete context for the Tintin Property Graph
The function exposed in Table 5 is not complete for the PG as its domain lacks the type of and the type of .
An incomplete context for the Tintin PG
The function exposed in Table 6 is not a context because , which is used in the template graph mapped to the first listed type , is not a value in .
A function that is not a context
Application of a PRSC context on a PG
We now define formally the conversion operated by PRSC. A PRSC conversion of a PG depends on a chosen context .
(Property value conversion).
For the conversion of property values to literals, we consider that we have a fixed total injective function , common for all PGs and contexts. We suppose that is reversible, i.e. we are able to compute .
(The function).
The operation that produces an RDF graph from the application of a PRSC context on a property graph is noted . The result of the function is the union of the RDF graphs built by converting each element of the PG into RDF. The conversion of a single element is materialized by the function.
, , , with defined as follows:
As said previously, the result of is the union of the graphs produced by , i.e.
Table 7 shows the resolution of on the running example.
Application of a PRSC context on the running example
The resolution of is as follows:
Algorithm 1 gives an algorithmic view of the function presented by Definition 21.
The function
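Algorithm 1 is not reproduced here; the following Python sketch follows the same logic under the data conventions of the earlier sketches (PropertyGraph and element_type are reused from them; the ("literal", v) wrapper stands in for the fixed value-to-literal conversion function, and all names are ours).

```python
def beta(term, pg, m):
    """Replace the placeholders of a template term for the PG element m (sketch).
    Following Section 4.2, the PG elements are assumed to already be blank nodes."""
    if isinstance(term, tuple) and len(term) == 3:       # nested template triple
        return tuple(beta(t, pg, m) for t in term)
    if term == "?self":
        return m
    if term == "?source":
        return pg.src[m]
    if term == "?destination":
        return pg.dst[m]
    if isinstance(term, tuple) and term[0] == "valueOf":
        return ("literal", pg.properties[(m, term[1])])  # property value as a literal
    return term                                          # IRIs and constant literals are kept as-is

def prsc(pg, context):
    """Sketch of the conversion: union of the converted template graphs of each element."""
    rdf = set()
    for m in pg.nodes | pg.edges:
        template_graph = context[element_type(pg, m)]    # the context is assumed complete
        rdf |= {beta(triple, pg, m) for triple in template_graph}
    return rdf

# On the running example with the context of Listing 2, this yields (cf. Section 2):
#   ("_:n1", "rdf:type", "ex:Person"), ("_:n1", "foaf:name", ("literal", "Tintin")),
#   ("_:n1", "ex:profession", ("literal", "Reporter")), ("_:n2", "foaf:name", ("literal", "Snowy")),
#   ("_:n1", "ex:isTeammateOf", "_:n2"),
#   (("_:n1", "ex:isTeammateOf", "_:n2"), "ex:since", ("literal", "1978"))
```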
Complexity analysis
In this section, we present the metrics used to evaluate the complexity, then analyze the complexity of the function.
Functions considered constant
For a given PG , the complexity of the functions , , and are considered constant.
The complexity of the functions , the and the is also considered constant.
Evaluating if something is a member of a given set, for example if an entity is part of the set , is generally considered to be constant time thanks to hash maps.8
Inserting and searching in a hash map is not strictly speaking a constant time operation but has an amortized constant complexity, and is linear in the worst case.
Considered metrics
For a given PG and a given context , the following metrics are considered:
The number of nodes and edges in , denoted .
The size of the biggest template graph, denoted .
The complexity of the types in the context, denoted , reflected by the number of labels and the number of properties of the type with the highest number of labels and properties. Note that since the context has to be complete for the PG, i.e. all elements of the PG must have their type in the context, is also an upper bound of the type complexity of the types in the PG.
The number of types supported by the context .
In RDF-star, quoted triples can be used as subject or object of other triples, without limit on how deeply triples can be nested. In practice, however, it is rare to have more than one level of nesting. Usually, users are expected to use atomic RDF triples like _:tintin :travelsWith _:haddock or to use RDF-star triples with a depth of one like << _:tintin :travelsWith _:haddock >> :since 1978. We therefore consider the depth of any triple to be bound by a constant. As a consequence, in all functions processing terms and triples recursively (such as β in Algorithm 1), we can ignore the recursion depth in the complexity analysis.
In all complexity analyses, all metrics are considered to be non-zero. Indeed, if there are no elements or if the biggest template graph is empty, the produced RDF graph will be empty, so this case is not interesting. As we add one to the number of labels and properties, the type complexity can never be zero, even if the context only supports nodes and edges with no labels and no properties.
Complexity of calls in the function
For each given PG element m, the complexity of a call to is :
The type of m in the PG must be computed. has a complexity of :
Evaluating if m is a node or an edge is constant as checking if an element is a member of a set is constant.
Calls to and are considered constant in Section 4.6.1.
In the complexity analysis, we consider that is implemented as a hash map from types to template graphs. A call would first need to compute the hash of the type. To do so, it has to look at all the labels and properties in the type in a deterministic order; for this, the labels and keys need to be sorted, which has a complexity of . After the hash has been computed, the cost of retrieving the template graph is amortized constant. The overall complexity of a call is .
Complexity of
Given a PG and a context ,
Calls to in line 4 have a complexity of .
Calls to the β function in line 7 are constant, as it has been assumed that the depth of the most nested triple is low enough to be ignored and the operations it performs are in constant time.
There are two for loops, one iterating on all PG elements () and one iterating on all template triples of a template graph (). All instructions in the but the one on line 4 are computed in constant time. Line 4 is inside the first loop but outside the second loop.
The function has an complexity.
PRSC reversibility
When PGs are converted into RDF graphs, an often desired property is not to have any information loss. Determining whether or not a conversion induces information loss amounts to checking if the conversion is reversible, i.e. if, from the output, it is possible to compute back the input. The reversion is studied relatively to the used PRSC context: the PRSC context is used as an input of both the PRSC algorithm and the reversion algorithm. In other words, we consider that the information stored in the PRSC context does not need to be stored in the produced RDF graph to obtain a reversible conversion.
This section first shows that not all PRSC contexts are reversible. Then, properties of PRSC contexts are exhibited, leading to the description of a subset of reversible PRSC contexts, i.e. contexts that we prove do not induce information loss.
Reversibility in this paper
In this paper, we call a function f reversible if, in practice, we can recover x from . This implies that:
The function f must be injective. Indeed, if two different values x and can produce the same value y, it is impossible to know if the value responsible for producing y was x or .
The inverse function must be computable and tractable in reasonable time. By counter-example, a public-key encryption function is supposed to be injective. It is theoretically possible, although prohibitively costly, to decipher a given message by applying the encryption function on all possible inputs until the result is the original encrypted message. This is not the kind of “reversibility” we are interested in.
We say that a context is reversible if, for any PG such that the context is complete for , it is possible to recover from the context and the result of .
More formally, when studying reversibility, we want to check if for a given , we are able to define a tractable function such that , .
(A trivially non-reversible context).
Consider the context that returns, for all types, the empty template graph, i.e., . As it is complete for all property graphs, it can be used on any property graph. However, applying it to any PG produces the empty RDF graph. Therefore, the use of this context makes the function not injective, and therefore not reversible.
(A more realistic example of a non-reversible context).
Another example of a non-reversible context is the context shown in Table 4: while this context can be applied to PGs in which edges have the “since” property, the value of this property will never appear in the produced RDF graph.
As not all contexts are reversible, the next sections focus on characterizing some contexts that produce reversible conversions.
Well-behaved contexts
Characterization function
To be able to reverse back to the original PG, we need a way to distinguish the triples that may have been produced by a given member of from the ones that cannot have been produced by it. For this purpose, this section introduces the κ function. This function must verify that, for every template triple and every triple t, if and only if t can be produced from by the β function. It would then follow that two template triples that may produce the same triple have the same image through κ.
(Characterization function).
The κ function maps:
Each template triple to a super-set of the triples it is able to generate.
Each RDF triple t to a super-set of the RDF triples that a template triple able to generate t may also generate. For example, a literal may be generated by any element of P, and an element of P may generate any literal. Therefore, the κ function maps all literals to the set of all literals.
The κ function is extended to all template graphs and RDF graphs as .
Table 8 provides an example of applying κ on the running example context of Table 4.
The running example context with the corresponding values through κ
(κ on terms and triples is, as expected, a super-set of the possible generated values).
When comparing the definition of the κ function with the β functions defined in Section 4.5, it appears that:
For elements in B, , L and P, the image of κ is equal to the corresponding image set of the β function.
For elements in I, the image of κ is equal to a singleton containing that element; β maps any IRI to itself.
If the given term is a triple, the image of κ is the cross product of the application of the κ function to the terms that compose the RDF triple. As β on triples recursively applies itself to the three terms in the triple, we can see that .
Therefore, if x is a term or an RDF triple, for any β function, .
(The result of is, as expected, a subset of the result of κ).
The function, from which is defined, uses β on each template triple. After β is applied, the union of the singletons containing each triple is computed. This is similar to the definition of κ on a set of triples.
From Remark 4, it can be deduced that if is a set of template triples, .
(A template and its produced values share the same image through κ).
When using the κ function, elements in B and both map to B, and elements in L and P both map to L. Elements in I are wrapped into a singleton and both and apply the function recursively on their members.
When using the β function:
Elements in map for all PGs to elements of and , which are both subsets of B.
Elements in P map to elements in , which is a subset of L.
Elements in L and I are mapped to themselves.
Elements in apply the β function recursively on their members.
Therefore, ,
As mentioned previously, the role of κ is to allow us to determine whether two template triples with placeholders may produce the same triple. It maps all placeholders to a super-set9 of all the elements they can generate with the function. All RDF triples are mapped by the κ function to a set of triples of which they are a member.
Note that as κ maps to a super-set, it may catch false positives. For example, P can only generate elements in , but the κ function considers that all elements of L can be generated from P. For the scope of this paper, κ catching false positives is considered acceptable, as we are only trying to prove the reversibility of a given class of contexts, rather than to characterize the whole class of reversible contexts.
If a triple is generated by a template graph, then there exists a template triple with the same image through κ.
Per Definition 21 of , a triple can only be generated by a template graph through the application of β to one of its template triples. Per Remark 6, the generated triple and the corresponding template triple have the same image through κ. □
(Unique template triple).
A template triple is unique in a set of template triples if no other template triple in the set has an image through κ equal to its own.
It is defined as follows with :
Combined with Remark 6, this tells us that any triple with the same image through κ as cannot have been generated by any other element of than itself. This leads us to Theorem 1 below.
(Triples produced by a template triple).
In the result of the function, if a data triple and a unique template triple have the same value through κ, then the data triple must have been produced by this template triple.
,,, let,,:
We prove the theorem by contradiction.
Let us suppose that:
(A)
(B1) , i.e.
(B2)
(C)
cannot be a member of the set , as it explicitly excludes it. As we reached a contradiction, it means that . □
Theorem 1 allows us to link an RDF triple to the unique template triple that produced it. Then by comparing the terms of the RDF triple to the corresponding placeholders in the template triple, we will be able to reconstruct the original PG.
Well-behaved PRSC context
In this section, we define a subset of contexts that we call well-behaved PRSC contexts. In the next section, we will prove that these contexts are reversible.
(Well-behaved contexts).
A PRSC context is well-behaved if it conforms to the following three criteria:
,
Element provenance: all generated triples must contain the blank node that identifies the node or the edge it comes from. This is achieved by using the placeholder in all template triples:
,
Signature template triple: contains at least one template triple, called its signature and noted , that will produce triples that no other template in can produce. This will allow, for each blank node in the produced RDF graph, to identify its type in the original PG.
, ,
No value loss: for all elements in the PG, we do not want to lose information stored in properties, nor for edges, the source and destination node. Each of these pieces of information must be present in an unambiguously recognizable triple pattern.
,
The set of all well-behaved contexts is , and the set of all well-behaved contexts for a PG is . and .
(Handling multiple candidates).
In the case where there are multiple candidate template triples to become the signature template triple, the choice of the signature among the candidates is generally not important.
To make the choice deterministic, the chosen signature template triple could be the first in lexicographic order. For the presented algorithms, the choice of the signature template triple does not matter and leads to the same output.
(The template graphs used in well-behaved contexts are not empty).
A well-behaved context cannot map a type to an empty template graph: the signature template triple criterion ensures that every template graph contains at least one template triple: , .
(Inside a well-behaved context, all template graphs are different from all others).
For any well-behaved context , two types cannot share the same template graph. Indeed, if two types share the same template graph, i.e. there are two types and with such that , it would contradict the signature template triple criterion as it would lead to .
Table 8 studies the context used in our running example, exposed in Example 12.
The type matches all criteria of a well-behaved PRSC context:
All triples contain .
At least one template triple is a signature: the image through κ of is not contained in the image through κ of other types. It is also the case of .
The properties and have a unique template triple inside .
The type violates the signature template triple criterion as , its only template triple, is shared with the type ,
The type violates the element provenance criterion as is missing. It also violates the no value loss criterion as the term is missing from any template triple.
For all these reasons, this context is not well-behaved.
(A well-behaved context for the running example).
Let be the function described in Table 9. In this new context, an arbitrary IRI is used to sign the PG nodes that have no labels and only a name, and a classic RDF reification is used to model the PG edges.
This context is well-behaved:
An example of a complete and well-behaved context for the Tintin Property Graph
appears in all triples,
Template triples that are signature are marked with a ⋆. At least one signature triple appears for each type,
All property keys have a unique template triple.
The RDF graph produced by the application of the well-behaved context on the running example PG
Listing 3 is the RDF graph produced by the application of the context on the PG . Each part that starts with a # corresponds to the application of the function to the PG element described in the comment. The elements are ordered in the same order as their types in Table 9, and the RDF triples and the template triples that produced them are also in the same order.
(Relationship between the empty PG and the empty RDF graph with well-behaved PRSC context).
For all well-behaved PRSC contexts, the only PG that can produce the empty RDF graph is the empty PG:
Indeed, Remark 8 ensures that the template graphs are non-empty. So any application of the function with any well-behaved context produces at least one RDF triple. As the produced RDF graph is the union of the graphs produced by the use of on each node and edge, the only way to have an empty result is to have no node nor edge in the property graph.
Complexity analysis and implementation
PRSC well-behaved contexts will be proved to be reversible in Section 5.3, meaning that producing an RDF graph from them preserves all the information stored in the PG. It is therefore important to determine whether, in practice, it is possible to check that a PRSC context is well-behaved.
The values through κ of two given terms are either disjoint or equal:
Consider any atomic RDF term t:
If , its value through κ is the singleton composed of the element t. No other term can be mapped by κ to a super-set of this singleton; in particular, no term can be mapped to I.
If , the value through κ is L. No other term can be mapped to a super-set or a subset of L.
If , the value through κ is B. No other term can be mapped to a super-set or a subset of B.
As L, I and B are pairwise disjoint, for two given atomic RDF terms, the values through κ are either disjoint or equal.
For two given RDF triples composed of atomic terms, their values through κ are equal to the Cartesian product of the values through κ of their components. As the values through κ of their components are either equal or disjoint, the values through κ of the triples are also either equal or disjoint. By induction, this is true for any two RDF triples, even if their subject or object are also triples. □
(Implementing κ and complexity analysis).
The function κ is defined to return sets, some of them being infinite sets. While this definition is useful to prove different theorems in this paper, it is not practical from an implementation perspective.
Let λ and δ be two distinct values that are not members of the set I. We propose below an alternative function, to be used instead of κ in algorithms, which replaces:
the sets L and B with the two constants λ and δ that are not elements of I,
the cross product with a simple triple containing the values returned for each component of x when x is a triple.
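A possible Python implementation of this alternative function, under the term conventions used in the earlier sketches (IRIs are plain strings, blank nodes and blank-node placeholders start with "_:" or "?", literals and property-value placeholders are 2-tuples; the function name kappa_hat is ours):

```python
LAMBDA, DELTA = object(), object()   # the two constants λ and δ, which are not IRIs

def kappa_hat(x):
    """Practical stand-in for κ on a term or a triple (sketch)."""
    if isinstance(x, tuple) and len(x) == 3:
        return tuple(kappa_hat(t) for t in x)        # triple: componentwise value
    if isinstance(x, tuple) and len(x) == 2:
        return LAMBDA                                # literal or ("valueOf", key) placeholder
    if isinstance(x, str) and (x.startswith("_:") or x.startswith("?")):
        return DELTA                                 # blank node or blank-node placeholder
    return x                                         # IRI: kept as-is (a singleton in κ)

def kappa_hat_graph(graph):
    """Value through κ of a whole (template or RDF) graph."""
    return {kappa_hat(t) for t in graph}
```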
The complexity of the is:
For any x that is neither a triple nor a graph, calls to this function can be done in constant time, by simply checking the type of x.
When x is a triple, calls to this function involve recursive calls up to the depth of x, which we consider to be bounded by a constant (see Section 4.6.2). So it is also done in constant time.
When x is a graph, calls to this function involve calling it on each triple of the graph. As the call on a triple is constant, the call on the graph x has a complexity linear in the size of the graph.
Note that:
For two triples, checking if their values through are equal can be done in constant time.
Thanks to Lemma 2, checking if the value through of a triple t is included in the values through of a graph can be done in linear time by iterating over each triple of the graph and comparing the values through of the triples t and .
(Complexity of checking if a PRSC context is a well-behaved).
The first task to check if a context is well-behaved consists in computing the value through κ of all the template triples used in it. As the depth of a template triple is considered to be negligible, the complexity is the number of template triples, bounded by the number of types multiplied by the size of the biggest template graph: .
After the values through κ of all template triples have been computed, for each type, we need to check if the type complies with the three criteria of Definition 24.
The element provenance criterion consists in checking if is in all template triples of all types. This task has an complexity for each template triple and an overall complexity for the whole context.
The signature template triple criterion consists in checking if there is at least one signature template triple in the template graph of each type, i.e. checking if the value through κ of one of the template triples of each type is not contained in the set of the values through κ of the template graphs of the other types. As hash sets make the membership check constant, this task has an complexity for a single candidate template triple, and an overall complexity for the whole context.
For a given type, checking the no value loss criterion consists in checking if a template triple can be found in the template graph for each placeholder, i.e. a placeholder corresponding to each property key of the type; and if the PG element is an edge, the and placeholders must also be found. Thanks to hash sets, checking if a template triple is inside its template graph is constant. Implementing the test by following the definition leads to an complexity for each type and an complexity for the whole context.
The final complexity of checking if a context is a well-behaved PRSC context is:
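Reusing kappa_hat from the previous sketch, the three criteria of Definition 24 can be checked as follows. This is a simplified sketch under our own data conventions (types are (kind, labels, keys) triples, a context is a dict from types to sets of template triples); in particular, the no-value-loss check below only tests the presence of each placeholder, not the unambiguity of the triple carrying it.

```python
def contains_term(triple, term):
    """True iff term occurs in the template triple, possibly inside a nested triple."""
    for t in triple:
        if isinstance(t, tuple) and len(t) == 3 and contains_term(t, term):
            return True
        if t == term:
            return True
    return False

def is_well_behaved(context) -> bool:
    kappas = {typ: {kappa_hat(t) for t in tg} for typ, tg in context.items()}
    for typ, template_graph in context.items():
        kind, _labels, keys = typ
        # 1) element provenance: every template triple mentions ?self
        if not all(contains_term(t, "?self") for t in template_graph):
            return False
        # 2) signature: at least one triple whose κ-value no other type can produce
        others = set().union(*[k for t2, k in kappas.items() if t2 != typ])
        if not any(kappa_hat(t) not in others for t in template_graph):
            return False
        # 3) no value loss: each property key (and ?source/?destination for edges)
        #    appears in some template triple (unambiguity is omitted in this sketch)
        needed = [("valueOf", k) for k in keys]
        if kind == "edge":
            needed += ["?source", "?destination"]
        for term in needed:
            if not any(contains_term(t, term) for t in template_graph):
                return False
    return True
```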
Reversion algorithm
Algorithm 2 aims to convert an RDF graph that was produced from a PG and a known well-behaved context back into the original PG.
It is a four-step algorithm: 1) it finds the elements of the PG, by assuming they are the same as the blank nodes in the RDF graph, 2) it gives a type to all PG elements with the function in Algorithm 3,10
To help the comprehension of Algorithm 3, we recall that for a given set A, the mathematical notation means that in the set A, there is one and only one element, denoted by a, that matches . By extension, means that there is one and only one element in the set A that is denoted by a.
3) it assigns each triple to a single PG element, corresponding to the production of the function, with the function in Algorithm 4, and 4) it looks for the source, destination and properties of all elements with the function in Algorithm 5.
The main algorithm to convert back an RDF graph into a PG by using a context
Associate the elements of the future PG with their types
Associate each triple to the element that has produced it
Produce a PG from the previous analysis of the elements and triples
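Algorithms 2–5 are not reproduced here; the following Python sketch follows the same four steps under the data conventions of the earlier sketches (PropertyGraph is reused; all function names are ours). It is a simplification: instead of first determining the type of each blank node through signature triples as in Algorithm 3, it accepts the first type whose whole template graph can be matched with the element at the ?self positions, which suffices to illustrate the reversion on well-behaved contexts.

```python
def blank_nodes(graph):
    """Step 1: collect every blank node occurring in the (possibly nested) triples."""
    def terms(t):
        for x in t:
            if isinstance(x, tuple) and len(x) == 3:
                yield from terms(x)
            else:
                yield x
    return {x for t in graph for x in terms(t)
            if isinstance(x, str) and x.startswith("_:")}

def unify(template, data, m):
    """Match a data triple against a template triple for element m; return the bindings
    of ?source / ?destination / ("valueOf", key) placeholders, or None (simplified)."""
    bindings = {}
    def walk(tp, dt):
        if isinstance(tp, tuple) and len(tp) == 3:
            return isinstance(dt, tuple) and len(dt) == 3 and all(walk(a, b) for a, b in zip(tp, dt))
        if tp == "?self":
            return dt == m
        if tp in ("?source", "?destination") or (isinstance(tp, tuple) and tp[0] == "valueOf"):
            if tp in bindings and bindings[tp] != dt:
                return False
            bindings[tp] = dt
            return True
        return tp == dt                         # constant IRIs and literals must match exactly
    return bindings if walk(template, data) else None

def revert(graph, context):
    """Sketch of the reversion: rebuild a PG from an RDF graph and a well-behaved context."""
    nodes, edges, src, dst, labels, properties = set(), set(), {}, {}, {}, {}
    for m in blank_nodes(graph):                                    # step 1
        for (kind, lbls, keys), template_graph in context.items():  # steps 2 and 3 (simplified)
            bindings, ok = {}, True
            for tpl in template_graph:
                match = next((u for t in graph if (u := unify(tpl, t, m)) is not None), None)
                if match is None:
                    ok = False
                    break
                bindings.update(match)
            if not ok:
                continue
            (nodes if kind == "node" else edges).add(m)              # step 4
            labels[m] = set(lbls)
            for k in keys:
                properties[(m, k)] = bindings[("valueOf", k)][1]     # unwrap ("literal", value)
            if kind == "edge":
                src[m], dst[m] = bindings["?source"], bindings["?destination"]
            break
    return PropertyGraph(nodes, edges, src, dst, labels, properties)
```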
Further subsections prove that for all , for all PGs , applying these algorithms to actually produces , meaning that the reversion algorithm is a sound and complete implementation of for well-behaved contexts. Applying this algorithm to an arbitrary RDF graph and/or an arbitrary context is out of the scope of this paper.
Finding the elements of the PG
The first step of the algorithm relies on the assumption that the blank nodes of the RDF graph and the elements of the original PG are the same.
(Equality between the elements of a PG and the blank nodes of the RDF graph).
If the RDF graph has been produced from a PG and a well-behaved PRSC context, then the set of all blank nodes of is the set of PG elements of .
The function, described in Section 4.5, produces specific triples depending on the given template. The template graphs cannot contain blank nodes: the blank nodes produced by are forced to be the elements of the converted BPG. So .
From Remark 8, we know that contains at least one triple pattern . Combined with the element provenance criterion from Definition 24-1, we know that . When β is applied to , a triple that contains m is forced to appear, meaning that .
□
Theorem 2 proves the correctness of the step in Algorithm 2.
Finding the type related to each element
In this part of the proof, we show that the function from Algorithm 3 is correct, i.e. it is able to recover the right type of every element m in the original graph.
If a data triple shares the same value through κ as one of the signature triples of a type (Definition 24-2), then the element from which the data triple was produced must be of this type:
, ,
□
(Formalizing ).
For a given blank node/PG element b, and , introduced in Algorithm 3, can be formally defined as:
They give the set of all node types and edge types, respectively, for which one of their signature triple could have produced a triple with b.
( correctness).
Even though the functions are defined using only the context and the produced RDF graph, they can be used to compute the type of any blank node in the original PG:
,
,and.
Table 10 provides an overview of the cardinality of the different sets.
We are going to restrict the portion of the graph where such triples may be located:
We see that all triples contributing to must have been produced by the signature triple template applied to a node from the PG. Also remember that must contain b.
If , then the signature triple of must have generated a containing b (since it must contain , according to Definition 24-1), so . Furthermore, no other node can produce a containing b ( is the only blank node placeholder in node type templates), so cannot contain any other type. Therefore .
If , it is impossible to produce the blank node b from any node (again, is the only blank node placeholder in node type templates). No containing b can be found, so is empty.
The reasoning for when b is an edge is similar to the one for when b is a node: only b can produce triples containing itself, and it will, because having at least one signature triple with is imposed by Definition 24. So .
Finally, a blank node can appear in any number of triples that share the same value through κ with an edge signature template triple: an edge signature template triple can contain or , that can be mapped to any node depending on the PG. So can contain an arbitrary number of types in that case. □
Theorem 3 not only shows that the function in Algorithm 3 will always find the right type by using , i.e. that it is computable from and , but Table 10 also explicitly shows that the Error(No type found) scenario cannot appear if the RDF graph was produced from a PG, making the function both sound and complete.
(Using different signatures to determine the types).
As mentioned previously, the choice of the signature template triple of a given type is not important for the reversion algorithm. Choosing one or another signature template triple only leads to other data triples being used to determine the type of the elements. However, the end result does not change.
Finding the generated triples for each PG element
For each PG element, given the produced RDF graph and the type of this PG element, we are able to compute the list of RDF triples produced from this PG element. In other words, Algorithm 4 correctly partitions the RDF graph into sub-graphs describing each element of the original PG.
In Algorithm 4, assuming that the passed value of is equal to , , .
As , each triple is a member of at least one . For all triples , the element provenance criterion ensures that . So the first step that consists in listing in the set the blank nodes in , and consider that m is part of the set is correct: the actual element m is in the set.
The algorithm associates each triple with a single :
Let’s first recall that blank nodes in can only be produced via the placeholders from : , , and . Let’s also recall that every triple pattern in the Well-behaved context must contain (per the element provenance criterion).
If contains only one blank node m, then m must come from , and the corresponding triple pattern must then belong to . must then have been produced by so putting it in is correct.
If contains multiple blank nodes, must have been produced by a template triple with several placeholders from .
Node template graphs can contain only one placeholder from : . No could then have produced . It follows that must have been produced by the template graph of an edge.
Edge template graphs can contain several placeholders from . But by definition of β, only can be mapped to an edge (when ); and are always mapped to nodes. Of the multiple blank nodes in , exactly one of them, m, must therefore be an edge and come from . Following the same reasoning as above (the case where contained only one blank node), we conclude that must have been produced by and that putting it in is correct.
Error(No element provenance) will never be raised if was produced by : each triple will contain at least one blank node generated from (per the element provenance criterion), and if there are multiple blank nodes we showed that there must be only one edge blank node.
As each triple in is attributed in to the right element m that produced it from , , . □
Building the PG element
Projecting Property Graphs. As an RDF graph is defined as a set of RDF triples, any subset of that set, as well as the union of two RDF graphs, is formally defined and is also an RDF graph. Algorithm 5 reconstructs the original PG in an iterative manner. To prove its correctness, we need operators similar to ⊂ and ∪, but for our formalization of PGs.
In this section, the projection of a Property Graph is defined by focusing only on a single PG element, node or edge. The concept of merging PGs, which is the inverse of the projection, is also defined.
Let be a PG.
(π projection of a Property Graph on an element).
The π projection of a PG on a node is equal to the PG with only the node itself. The π projection of a PG on an edge is the edge, and its source and destination nodes without the labels and properties of these nodes.
, is a PG such that:
If , , ,
If , , , ,
,
, , all other values are undefined.
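As an illustration of this definition, here is a minimal sketch using an ad-hoc Python representation of a PG (a set of nodes, a map from each edge to its endpoints, and dictionaries for labels and properties); the class and field names are ours and do not follow the paper's formalization literally.

```python
# Hypothetical PG representation and π projection (names are ours).
from dataclasses import dataclass, field

@dataclass
class PG:
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)   # edge -> (source, destination)
    labels: dict = field(default_factory=dict)  # element -> set of labels
    props: dict = field(default_factory=dict)   # (element, key) -> value

def project(pg, m):
    """π projection of pg on the element m."""
    out = PG()
    if m in pg.nodes:
        out.nodes = {m}                         # only the node itself
    else:
        src, dst = pg.edges[m]
        out.nodes = {src, dst}                  # endpoints, without labels/properties
        out.edges = {m: (src, dst)}
    out.labels = {m: set(pg.labels.get(m, set()))}
    out.props = {(e, k): v for (e, k), v in pg.props.items() if e == m}
    return out
```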
(Property Graph merge operator ⊕).
The merge operator ⊕ is the inverse of the projection operator π. It can only be used on two PGs that are compatible, i.e. (1) a PG element defined as a node in one is not defined as an edge in the other, (2) an edge defined in both PGs has the same source and destination in both, and (3) if the value of a property key on the same PG element is defined in both PGs, the values must be the same. The ⊕ operator builds a PG with the PG elements, labels and properties of both PGs.
We now define the ⊕ merge operator on property graphs. (or ) is defined only if:
is compatible with and is compatible with (see compatibility definition in Section 3.2).
is compatible with .
Its value is with:
, .
, .
,
, .
⊕ is commutative and associative, and its neutral element is the empty PG.
⊕ is defined by using the ∪ operator, which is commutative, associative and whose neutral element is ∅. The equivalent of ∅ for PGs is . □
The ⊕ merge of the π projection of a PG on all its PG elements is equal to the PG itself:
While the ⊕ operator has been primarily designed as the inverse operation of the projection operator π, it can only merge two PGs that are consistent with each other. If a PG defines an entity as a node, the other PG cannot define it as an edge. If a PG has a given source and destination for a given edge, the other one cannot define another source or destination. Properties must also be consistent in both PGs. The presented version of the merge operator can only add information, and when merging two PGs would lead to an inconsistent PG, the merge operator is undefined.
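A companion sketch of the ⊕ merge, assuming the PG class from the previous sketch is in scope: it checks the three compatibility conditions and otherwise unions each component; the function name and the None-on-incompatibility convention are ours.

```python
# Hypothetical ⊕ merge over the PG representation sketched above (names are ours).
def merge(g1, g2):
    """⊕ merge of two compatible PGs; returns None when they are incompatible."""
    # (1) an element must not be a node in one PG and an edge in the other
    if (g1.nodes & set(g2.edges)) or (g2.nodes & set(g1.edges)):
        return None
    # (2) an edge defined in both PGs must have the same endpoints
    for e in set(g1.edges) & set(g2.edges):
        if g1.edges[e] != g2.edges[e]:
            return None
    # (3) a property defined in both PGs must have the same value
    for k in set(g1.props) & set(g2.props):
        if g1.props[k] != g2.props[k]:
            return None
    out = PG()
    out.nodes = g1.nodes | g2.nodes
    out.edges = {**g1.edges, **g2.edges}
    out.labels = {m: g1.labels.get(m, set()) | g2.labels.get(m, set())
                  for m in set(g1.labels) | set(g2.labels)}
    out.props = {**g1.props, **g2.props}
    return out

# Merging the projections of all elements of a PG rebuilds all its information, e.g.
# functools.reduce(merge, (project(pg, m) for m in [*pg.nodes, *pg.edges]), PG()).
```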
Relationship between the function and projections
We are now going to redefine the function using the π operator.
The RDF graph built by from a PG with a context is equal to:
The function is defined in such a way that the RDF triples it produces from an element m are only influenced by:
m itself.
Its labels, i.e..
Its property values, i.e., .
If m is an edge, its source and destination nodes, i.e. and .
The template graph .
Therefore the following equality can be asserted, , , :
can be considered as the minimal required Property Graph to produce the RDF triples related to the element m in the PG . If we can prove that the reversion algorithm constructs all graphs and merges them with the ⊕ operator, then it means that we have properly reconstructed the PG.
Completing the proof of the reversion algorithm
We now return to proving that, for all well-behaved contexts , the function presented in Algorithm 2 is an implementation of the function. We consider a PG for which we know the value of . We focus on the last line of Algorithm 2, where the function is invoked.
To prove the correctness of the function in Algorithm 5, starting from an empty PG g, we are going to show that at each iteration, we are adding to the PG g the π projection of on an element m. After iterating on all elements, as we merged the π projection of all PG elements, the PG g ends up being equal to the PG itself.
(Merging the projection of one PG element to the reconstructed PG).
In Algorithm 5, assuming that the parameter is equal to and is a total function that maps all PG elements b to , at the end of an iteration on an element , after line 13, the computed PG is equal to , where is the PG g at the beginning of the iteration, between lines 3 and 4.
The PG is described in Table 11. Bold values are the ones for which we need to prove that we compute the correct value: , and . Other values are trivially correct by construction.
Description of the PG projection that is built in Algorithm 5
In the following, we want to check that properly returns . Proofs for / and / are identical.
The set is filled by iterating on all such that . The no value loss criterion ensures that at least one such template triple exists, so the loop in is iterated at least once.
Theorem 1 ensures that the set built in the loop of the function will always have exactly one element, which we name . Error(Unique data triple is not unique) can therefore never be raised if was produced by PRSC. By definition of the function, in and in are at the same position.
After the loop, because only is added to in the loop, Error(Not exactly one value for a placeholder) can never be raised.
The last instructions differ for / and P. In the case of and , the obtained value is directly the value of the PG node; in the case of P, the obtained RDF literal needs to be converted into the proper PG property value, which is possible because is assumed to be computable in Section 4.5.
properly computes the values that are missing in . When these values are extracted, they are directly merged with the ∪ operator into the property graph g. Values that were already known or that can be computed from the values that were just extracted, i.e. and , are also merged into g.
As all values of are merged into , □
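The value-extraction step used in this proof can be sketched as follows: given the sub-graph attributed to an element and one template triple of its type that mentions the wanted placeholder, find the unique matching data triple and read the value at the placeholder's position. The simplified matching and the names matches and extract_value are ours; property-value placeholders would be handled in the same way.

```python
# Hypothetical sketch of value extraction from an element's sub-graph (names are ours).
PLACEHOLDERS = {"?self", "?source", "?destination"}

def matches(template_triple, data_triple, element):
    """κ-style comparison, simplified: constants must be identical and
    "?self" must map to the element itself."""
    for t, d in zip(template_triple, data_triple):
        if t == "?self":
            if d != element:
                return False
        elif t not in PLACEHOLDERS and t != d:
            return False
    return True

def extract_value(element_subgraph, element, template_triple, placeholder):
    """Return the term at the position of `placeholder` in the unique
    data triple matching `template_triple`."""
    pos = template_triple.index(placeholder)
    candidates = [d for d in element_subgraph
                  if matches(template_triple, d, element)]
    if len(candidates) != 1:
        raise ValueError("Unique data triple is not unique")
    return candidates[0][pos]

# Example: recover the source node of an edge from its sub-graph.
tmpl = ("?self", "ex:from", "?source")
sub = {("_:e", "ex:from", "_:a"), ("_:e", "ex:since", '"1929"')}
assert extract_value(sub, "_:e", tmpl, "?source") == "_:a"
```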
(Completeness of ).
In the case where is built from a PG , the value that a placeholder is mapped to is the same everywhere, so we never run the risk of encountering multiple values, i.e. Error(Not exactly one value for a placeholder) is never raised. Furthermore, the proof of Lemma 5 shows that Error(Unique data triple is not unique) cannot be raised, because we know that each unique template triple has produced exactly one data triple.
(Merging the projection of all PG elements to the reconstructed PG).
Under the same assumptions as Lemma 5, the PG returned by Algorithm 5 is the original PG, i.e. the PG that was used to produce the RDF graph.
The PG g in the algorithm is initialized to . Lemma 5 shows that after each iteration in the loop with an element b, the PG g is ⊕-merged with the PG . The loop iterates on all elements in the PG , so after all the iterations, the PG g is equal to:
□
As in Algorithm 5 correctly reconstructs , and as its value is directly returned by the function in Algorithm 2, we have finally proven that the latter is a sound and complete implementation of the function for any well-behaved PRSC context .
Complexity analysis
Let us now discuss the complexity of the function described in Algorithm 2. In this discussion, a new metric is considered: the number of triples in the RDF graph: . It is assumed that we have first checked if the context is a well-behaved PRSC context, and computed the function so it can now be called in constant time.
Extracting the list of blank nodes of an RDF graph on line 2 has a linear complexity of .
The function in Algorithm 3 uses three nested loops and only constant-time operations: its complexity is .
It loops over all triples in the template graph such that they are . Evaluating itself has an complexity, so the overall complexity of evaluating all is .
Inside the loop, building the set forces a loop over all , multiplying the complexity by a factor.
In the context of the function, the function is always called with a template graph from the PRSC context and a sub-graph of the RDF graph as . The complexity of the calls of the function in the function is .11
Note that if has been generated by the function, the number of triples in the graph is less than or equal to the number of triples in the template graph it has been generated from. In this case, the complexity of calling in the context of the function is .
The function loops on all PG elements.
Notice that although is called multiple times, it can be computed once and its value cached. Its cost is , as mentioned in Section 4.6.3.
There are at most calls of the function. All of them have a complexity.
The complexity of each iteration is
The overall complexity of the function is
The overall complexity of the function presented in Algorithm 2, for an RDF graph produced from a PG and a well-behaved PRSC context , is:
The function is computable in polynomial time w.r.t. all the considered metrics, and is therefore considered tractable. Furthermore, in our implementation, the algorithm has been optimized to significantly lower the complexity. However, for the sake of concision, we do not describe these optimizations in this paper.
Edge-unique extension
In many cases, there is only one edge of a given type between two nodes, like the “TravelsWith” edge in our running example, or relationships like knowing someone or a parental relationship. For this kind of edge, it is more intuitive to represent it with a simple RDF triple, and get rid of the blank node corresponding to the edge. However, well-behaved PRSC contexts require in edge templates. In this section, we propose an extension that allows to be missing from edge templates while still producing reversible conversions.
A context for the Tintin PG with the “since” property
Consider the Tintin PG shown in Fig. 1 and the context shown in Table 12, which uses RDF-star to convert the “since” property. The output of PRSC from those two inputs is shown in Listing 4. By looking at the produced RDF graph, it appears that it captures all the information of the PG. More generally, RDF graphs produced by this context would always be reversible as long as the source PG does not contain multiple “TravelsWith” edges between two given nodes.
The output of PRSC for the Tintin PG and the context shown in Table 12
(Edge-unique extension).
a) In a context , an edge-unique type is an edge type such that:
complies with the no value loss criterion (as per Definition 24-3) and is not empty.
For all template triples :
and
is a signature template triple (as per Definition 24-2), i.e. no other type has a template triple that shares its value through κ.
is a unique template triple, i.e. no other template triple in shares its value through κ.
b) A PG is said to be edge-unique valid for a context if, for all edge-unique types in the context, there is at most one edge of this type between two given nodes (a check sketched after this definition):
c) The function is introduced to serve as a proxy to the function, to be applied only if the given PG is edge-unique valid relative to the given context:
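As an illustration of part (b), here is a minimal sketch of the edge-unique validity check, with an ad-hoc representation in which edges maps each edge to its (source, destination) pair and edge_types gives the type of each edge; all names are ours.

```python
# Hypothetical edge-unique validity check (names and representation are ours).
from collections import Counter

def is_edge_unique_valid(edges, edge_types, edge_unique_types):
    """True iff, for every edge-unique type, there is at most one edge of
    that type between any given pair of (source, destination) nodes."""
    seen = Counter()
    for e, (src, dst) in edges.items():
        t = edge_types[e]
        if t in edge_unique_types:
            seen[(t, src, dst)] += 1
            if seen[(t, src, dst)] > 1:
                return False
    return True

# Two "TravelsWith" edges between the same nodes make the PG invalid
# for a context in which "TravelsWith" is an edge-unique type.
edges = {"e1": ("tintin", "haddock"), "e2": ("tintin", "haddock")}
types = {"e1": "TravelsWith", "e2": "TravelsWith"}
assert not is_edge_unique_valid(edges, types, {"TravelsWith"})
```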
Theorem 7 shows that is reversible up to an isomorphism.
Let be a context such that each type either a) matches the constraints of a type in a well-behaved PRSC context in Definition 24, or b) is an edge-unique type.
For every two BPGs and such that , and are isomorphic.
There is an algorithm such that for all BPGs , from the RDF graph and the context , the algorithm computes a PG such that , i.e. from the produced RDF graph and the context, it is possible to compute a PG that is isomorphic to the original one.
The context is composed of two parts: a) the well-behaved part and b) the edge-unique part. The well-behaved part has been proved to be reversible. As the template triples used for edge-unique types are signatures, their value through κ differs from the value through κ of the triples of the well-behaved part: triples produced from edge-unique types are therefore distinguishable from the rest of the RDF graph.
Denote by W the set of all types in the well-behaved part and by U the set of types in the edge-unique part. Let be a PG such that exists. It is possible to split using W and U:
It is also possible to split by defining a predicate that uses κ to filter the triples that come from types in the well-behaved part: , , , .
From all the theorems on well-behaved contexts, there is a bijection between and .
All template triples used in the template graph of edge-unique types are both signatures and unique: from any triple in , it is possible to find which template triple produced it. Consider an arbitrary edge u whose type is an edge-unique type, i.e. . As edge-unique template graphs must also comply with the no value loss criterion, all the properties, the source node and the destination node of u can be found unambiguously in . The only missing information is the edge identity, i.e. the blank node u itself.
By using a fresh blank node for u, it is possible to build a PG isomorphic to from ; by extension, a PG isomorphic to from ; and by extension, a PG isomorphic to from . □
Discussion about the constraints on well-behaved PRSC contexts
In this section, we discuss the acceptability of the different constraints posed by PRSC well-behaved contexts in terms of usability. In other words, to what extent do they limit what can be achieved with PRSC?
The no value loss criterion on well-behaved contexts ensures that the data are still present and can be found unambiguously: as its name implies, this constraint is obviously required to avoid information loss. Therefore, it should not be perceived as overly constraining when building PRSC contexts.
The signature template triple is a method to force the user to type the resources, which is usually considered to be good practice. The type can either be explicit, through a triple with as the predicate, or implicit, through a property that is only used by this type. For example, the template graph for a type Person could contain a template triple of the form . The constraint of a signature composed of only one triple can be considered too strong: one may want to write a context that works for all PGs. For example, many authors [20,32] propose to map each label to an RDF type or to a literal used as the object of a specific predicate like pgo:label. More generally, users may want to use a composite key to sign their types. For these kinds of mappings, our approach of identifying the type by finding a single signature template triple is not sufficient: it requires finding all the signature template triples and deciding with which type they are associated, for example through a Formal Concept Analysis process. This could be studied as a future extension of the PRSC reversion algorithm.
The element provenance constraint may hinder the integration of RDF data coming from a PG with regular RDF data: it forces the user to keep the structure exposed in the PG, with blank nodes representing the underlying structure of the PG. The edge-unique extension makes it possible to relax this constraint, by avoiding the representation of PG edges as RDF nodes.
Related works
Many works already exist to address the interoperability between PGs and RDF.
A common pivot for PGs and RDF
To achieve interoperability, some authors propose to store the data in another data model, and then to expose it through the usual PG and RDF APIs. Angles et al. propose multilayer graphs [4], of which the OneGraph vision from Lassila et al. [23] is a more concrete version. These works propose to describe the data with a list of edges, each with a source, a label and a destination. All edges are associated with an identifier, which can be used as the source or the destination of other edges. However, the authors note that implementing interoperability between the OneGraph model and the existing PG and RDF APIs raises several challenges.
With their unified relational storage scheme [35], Zhang et al. propose to store the data in relational databases. While they specify how to store both models in a similar relational database structure, they do not mention how they align the data coming from one model with the data coming from the other, for example to match the PG label “Person” with the RDF type foaf:Person.
The Singleton Property Graph model proposed by Nguyen et al. [25] is an abstract graph model, based on the RDF Singleton Property pattern, that can be implemented both as a PG and as an RDF graph. They also describe how to convert a regular RDF graph or a regular PG into a Singleton Property Graph. But the use of the Singleton Property pattern induces the creation of many different predicates, which hinders the performance of many RDF database systems, as shown by Orlandi et al. [26].
From PGs to RDF
In terms of PG to RDF conversion, the most impactful work is probably RDF-star [17–20], an extension of the RDF model originally proposed by Olaf Hartig and Bryan Thompson to bridge the gap between PGs and RDF by allowing the use of triples in the composition of other triples. Indeed, the most blatant difficulty when converting a PG to RDF is converting the edge properties. Moreover, most PG engines support multi-edges, i.e. two edges of the same type between the same two nodes. The naive approach, which uses the source node, the type of the edge and the destination node as respectively the subject, the predicate and the object of an RDF triple, would collapse the multi-edges; and converting each edge property to an RDF-star triple that uses the former triple as its subject would lead to the properties of each multi-edge being merged. Khayatbashi et al. [21] study on a larger scale the different mappings described by Hartig and benchmark them. By allowing triples to be used as the subject and the object of other triples, it is possible to emulate the edge properties of PGs. While these mappings allow some kind of user customization, by letting users choose the IRIs used, they never consider using different model structures for different PG types during the same conversion. For example, consider a PG with two types of edges: one with the “knows” label and one with the “marriedTo” label. The mappings described in these works do not enable the user to model edges with the “knows” label as a predicate and edges with the “marriedTo” label as a proper RDF resource. To tackle the edge property problem, Das et al. study how to use already existing reification techniques to represent properties [12]: the modelings that do not rely on quads can be used when writing a PRSC context.
Tomaszuk et al. propose the Property Graph Ontology (PGO) [32], an ontology to describe PGs in RDF. As this solution only describes the structure of the PG in RDF, the produced data is forced to use this ontology, to the exclusion of other already existing RDF ontologies, unless further transformations are applied. Thanks to the Neosemantics12
https://github.com/neo4j-labs/neosemantics
plugin developed by Barrasa, Neo4j is able to benefit from RDF-related tools like ontologies, and performs a two-way conversion from and to RDF-star data. However, the PG to RDF conversion performed by Barrasa tends to assert all the triples it can, even for PG edges that may describe facts that only hold with some probability or that are time-restricted: if the marriage between Alice and Bob ended in 2017, the triple :Alice :marriedto :Bob should probably not be produced.
Gremlinator [31] allows users to query a PG and an RDF database by using the SPARQL language. This is a first step towards federated queries. However, it assumes that data stored in the PG and data stored in the RDF graph have a similar modeling, and it does not support RDF-star.
Instead of having a fixed mapping, our previous work proposes the mapping language PREC [10] to drive the conversion from PG to RDF. Delva et al. propose RML-star [13], an extension of RML [14] and R2RML [30] that introduces new RML directives to generate RDF-star triples. As discussed in Section 2, the format in which the template triples are described in the present work is closer to the produced triples, at the cost of reducing the ability to produce templated IRIs or terms.
To alleviate the requirement for users to manually write the mapping, Fathy et al. proposed ProGOMap [15], an engine that first generates a putative ontology for the terms in a PG, aligns this ontology with a real-world ontology, and finally converts the PG to an RDF graph with an RML mapping generated from the alignment. The authors motivate the choice of automating the ontology alignment process by mentioning that writing mappings manually can be time-consuming. While this may be true, we do not think that this hinders the usefulness of PRSC as 1) the schema part of a PRSC context may be generated automatically, and nothing prevents a tool from generating the template triples automatically, 2) a PG schema for which writing template triples is time-consuming may be an indicator that the schema is too complex, and therefore that the structure of the data stored in the PG should be cleaned, not only to make the conversion to RDF easier, but also as a benefit for the PG itself, and 3) while the process of aligning terms of the putative ontologies to terms of real ontologies indeed increases data interoperability, it still relies on the idea that the structure of the PG is somewhat close to the structure of the desired RDF graph. However, in this paper, we advocate that this may not be the case, and that the desired method to model each edge type depends on the semantics held by each edge type.
In general, to the best of our knowledge, most existing works only tackle the syntactic aspect of converting PG data to RDF. While this level of interoperability is appreciated, it is not enough to properly use the converted data in any existing RDF application without further modification of the RDF graph. Other existing works provide a level of semantic interoperability. However, they tend to choose one of the many syntactic representations of edges that exist, despite RDF ontologies having a great variety of patterns to model knowledge. PRSC gives full control to the user, by letting them both choose the syntactic representation and use shared vocabularies in the produced RDF graph. The use of PG schemas in PRSC contexts guides users in the process of writing the context. The study on PRSC well-behaved contexts, and the related discussion on their constraints in Section 5.5, indicate whether the PRSC context written by the user leads to a reversible conversion, and consequently whether any information is lost in the process of converting the data stored in the PG into RDF.
From RDF to PGs
Abuoda et al. [1] study the different RDF-star to PG approaches and identify two classes: the RDF-topology preserving transformation, which converts each term into a PG node, and the PG transformation, which converts literals into property values. They also evaluate the performance of these different approaches. The PRSC reversion algorithm, and the general philosophy of this work, clearly fall under the latter category. The former can be considered as using a PG database to store RDF data.
Angles et al. [5] discuss different methods to transform an RDF graph into a PG. They propose different mappings, including an RDF-topology preserving one and a PG transformation. Atemezing and Huynh [6] propose to use a mapping similar to the former to publish and explore RDF data with PG tools, namely Neo4j. However, these works offer little customization to the user.
With G2GML [11], Chiba et al. propose to convert RDF data by using queries: the output of the query is transformed into a PG by describing a template PG, similar to a Cypher insert query. This approach can be considered a counterpart of PRSC, but for converting RDF into PGs.
PG schemas
Finally, the “Property Graph needs a Schema” Working Group proposes a formal definition of PG schemas [3]. Some PG engines, like TigerGraph, are based on the use of schemas. For PG engines that do not enforce a schema at creation time, like Neo4j or Amazon Neptune, the schema may be extracted from the data, as proposed by Bonifati et al. [8] or Beeren [7]. As PRSC uses schemas for mapping between PGs and RDF graphs, these approaches may be used to automatically list the types existing in the PG to convert, i.e. the target of the rule part in Listing 2. Then the user would only have to provide the way to convert these types into RDF, i.e. the template graph part.
Conclusion
This work improves interoperability between the two worlds of Property Graphs and RDF graphs. We have presented PRSC, a mapping language to convert PGs into RDF graphs. A mapping, named PRSC context, is written by the user and is driven by a schema: PG elements are converted according to their type. By letting the user configure the conversion, we aim to better integrate PG data into already existing RDF graphs: the produced RDF graphs can be made to use a specific vocabulary, or comply with specific shapes.
We have also proved that some PRSC contexts, named well-behaved PRSC contexts, are reversible: they do not induce any information loss, and therefore it is possible to reverse back to the original PG from the produced RDF graph. Finally, we broaden the realm of reversible contexts with the edge-unique extension. Other existing works focus either on describing a syntactic construction or on providing a semantic construction that relies on a specific syntactic pattern. PRSC lets the user specify the semantics, and we have proved the reversibility of the two most popular methods to model edges: as an RDF resource in PRSC well-behaved contexts, and as a predicate through the edge-unique extension.
For big PGs, fully converting them into RDF may not scale. For this reason, future work includes studying how to use PRSC contexts not only for PG conversion but also to convert SPARQL queries into the usual PG query languages Cypher and Gremlin. This would not only address the scalability issues, but also avoid data duplication and help with federated queries.
The expressiveness of PRSC contexts could also be extended. As it is currently presented, PRSC contexts are unable to reproduce RDF graphs complying with some ontologies, for example the PG ontology [32]. To solve this issue, PRSC contexts should be able to introduce new blank nodes, and not be limited to the ones in the original PG. This would lead to new challenges, as the presented reversion algorithm relies on the fact that all blank nodes are PG elements.
Other extensions of expressiveness may also be interesting. For example, types in PRSC contexts are closed, in the sense that a complying element must have exactly the properties of the type and no others. Allowing extra properties in elements of the PGs would be useful, but raises the challenge of converting properties that are not known in advance.
To let PG data further benefit from the tools that have been developed around RDF, PRSC could also be explored in two directions. The use of blank nodes for the PG elements may not be suitable in all cases, especially in a linked data context. PRSC could be extended to mint IRIs for the nodes and edges of the PG, but this would require extending the mapping language to express these template IRIs. It would also require adapting the reversion algorithm to differentiate the minted IRIs from the “static” IRIs of the template, which would require additional precautions on well-behaved contexts.
The provided reversion algorithm works not only for RDF graphs that were produced by PRSC, but on any compatible RDF graph. One way to use it would be to convert a PG to an RDF graph with PRSC, modify the produced RDF graph with RDF-specific tools, e.g. a reasoner designed for RDF graphs, and then transform the RDF graph back into a PG, which could be considered equivalent to using a reasoner on the PG. However, this requires formally characterizing the modifications that can be performed on the RDF triples while maintaining the ability to convert the graph back to a PG.
References
1.
G.Abuoda, D.Dell’Aglio, A.Keen and K.Hose, Transforming RDF-star to Property Graphs: A preliminary analysis of transformation approaches, in: Proceedings of the QuWeDa 2022: 6th Workshop on Storing, Querying and Benchmarking Knowledge Graphs Co-Located with 21st International Semantic Web Conference (ISWC 2022), Hangzhou, China, 23–27 October 2022, M.Saleem and A.N.Ngomo, eds, CEUR Workshop Proceedings, Vol. 3279, CEUR-WS.org, 2022, pp. 17–32, https://ceur-ws.org/Vol-3279/paper2.pdf.
2.
R.Angles, The property graph database model, in: Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management, Cali, Colombia, May 21–25, 2018, D.Olteanu and B.Poblete, eds, CEUR Workshop Proceedings, Vol. 2100, CEUR-WS.org, 2018, https://ceur-ws.org/Vol-2100/paper26.pdf.
4.
R.Angles, A.Hogan, O.Lassila, C.Rojas, D.Schwabe, P.Szekely and D.Vrgoč, Multilayer graphs: A unified data model for graph databases, in: Proceedings of the 5th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), GRADES-NDA ’22, Association for Computing Machinery, New York, NY, USA, 2022. ISBN 9781450393843. doi:10.1145/3534540.3534696.
5.
R.Angles, H.Thakkar and D.Tomaszuk, Mapping RDF databases to property graph databases, IEEE Access 8 (2020), 86091–86110. doi:10.1109/ACCESS.2020.2993117.
6.
G.A.Atemezing and A.Huynh, Knowledge Graph publication and browsing using Neo4J, in: 1st Workshop on Squaring the Circles on Graphs, SEMANTiCS, Amsterdam, Netherlands, 2021.
7.
N.Beeren, Designing a Visual Tool for Property Graph Schema Extraction and Refinement: An Expert Study, CoRR, 2022, arXiv:2201.03643.
8.
A.Bonifati, S.Dumbrava, E.Martinez, F.Ghasemi, M.Jaffré, P.Luton and T.Pickles, DiscoPG: Property Graph schema discovery and exploration, Proc. VLDB Endow. 15(12) (2022), 3654–3657, https://www.vldb.org/pvldb/vol15/p3654-bonifati.pdf. doi:10.14778/3554821.3554867.
10.
J.Bruyat, P.-A.Champin, L.Médini and F.Laforest, PREC: Semantic translation of property graphs, in: 1st Workshop on Squaring the Circles on Graphs, SEMANTiCS, Amsterdam, Netherlands, 2021, https://hal.archives-ouvertes.fr/hal-03407785.
11.
H.Chiba, R.Yamanaka and S.Matsumoto, G2GML: Graph to graph mapping language for bridging RDF and Property Graphs, in: Proceedings of the ISWC 2020 Demos and Industry Tracks: From Novel Ideas to Industrial Practice Co-Located with 19th International Semantic Web Conference (ISWC 2020), Globally Online, November 1–6, 2020 (UTC), K.L.Taylor, R.S.Gonçalves, F.Lécué and J.Yan, eds, CEUR Workshop Proceedings, Vol. 2721, CEUR-WS.org, 2020, pp. 363–368, https://ceur-ws.org/Vol-2721/paper591.pdf.
12.
S.Das, J.Srinivasan, M.Perry, E.I.Chong and J.Banerjee, A tale of two graphs: Property Graphs as RDF in Oracle, in: Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24–28, 2014, S.Amer-Yahia, V.Christophides, A.Kementsietsidis, M.N.Garofalakis, S.Idreos and V.Leroy, eds, OpenProceedings.org, 2014, pp. 762–773. doi:10.5441/002/EDBT.2014.82.
13.
T.Delva, J.Arenas-Guerrero, A.Iglesias-Molina, Ó.Corcho, D.Chaves-Fraga and A.Dimou, RML-star: A declarative mapping language for RDF-star generation, in: Proceedings of the ISWC 2021 Posters, Demos and Industry Tracks: From Novel Ideas to Industrial Practice Co-Located with 20th International Semantic Web Conference (ISWC 2021), Virtual Conference, October 24–28, 2021, O.Seneviratne, C.Pesquita, J.Sequeda and L.Etcheverry, eds, CEUR Workshop Proceedings, Vol. 2980, CEUR-WS.org, 2021, https://ceur-ws.org/Vol-2980/paper374.pdf.
14.
A.Dimou, M.V.Sande, P.Colpaert, R.Verborgh, E.Mannens and R.V.de Walle, RML: A generic language for integrated RDF mappings of heterogeneous data, in: Proceedings of the Workshop on Linked Data on the Web Co-Located with the 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea, April 8, 2014, C.Bizer, T.Heath, S.Auer and T.Berners-Lee, eds, CEUR Workshop Proceedings, Vol. 1184, CEUR-WS.org, 2014, https://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf.
15.
N.Fathy, W.Gad, N.Badr and M.Hashem, ProGOMap: Automatic generation of mappings from Property Graphs to ontologies, IEEE Access 9 (2021), 113100–113116. doi:10.1109/ACCESS.2021.3104293.
16.
N.Francis, A.Green, P.Guagliardo, L.Libkin, T.Lindaaker, V.Marsault, S.Plantikow, M.Rydberg, P.Selmer and A.Taylor, Cypher: An evolving query language for Property Graphs, in: Proceedings of the 2018 International Conference on Management of Data, SIGMOD ’18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 1433–1445. ISBN 9781450347037. doi:10.1145/3183713.3190657.
17.
O.Hartig, Foundations of RDF⋆ and SPARQL⋆ (an alternative approach to statement-level metadata in RDF), in: Proceedings of the 11th Alberto Mendelzon International Workshop on Foundations of Data Management and the Web, Montevideo, Uruguay, June 7–9, 2017, J.L.Reutter and D.Srivastava, eds, CEUR Workshop Proceedings, Vol. 1912, CEUR-WS.org, 2017, https://ceur-ws.org/Vol-1912/paper12.pdf.
18.
O.Hartig, Foundations to query labeled Property Graphs using SPARQL, in: Joint Proceedings of the 1st International Workshop on Semantics for Transport and the 1st International Workshop on Approaches for Making Data Interoperable Co-Located with 15th Semantics Conference (SEMANTiCS 2019), Karlsruhe, Germany, September 9, 2019, L.Kaffee, K.M.Endris, M.Vidal, M.Comerio, M.Sadeghi, D.Chaves-Fraga and P.Colpaert, eds, CEUR Workshop Proceedings, Vol. 2447, CEUR-WS.org, 2019, https://ceur-ws.org/Vol-2447/paper3.pdf.
19.
O.Hartig, P.-A.Champin, G.Kellogg and A.Seaborne, RDF-star and SPARQL-star, W3C Community Group Report, 2021, https://www.w3.org/2021/12/rdf-star.html.
20.
O.Hartig and B.Thompson, Foundations of an alternative approach to reification in RDF, 2014, arXiv preprint arXiv:1406.3399.
21.
S.Khayatbashi, S.Ferrada and O.Hartig, Converting property graphs to RDF: A preliminary study of the practical impact of different mappings, in: GRADES-NDA ’22: Proceedings of the 5th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), Philadelphia, Pennsylvania, USA, 12 June 2022, V.Kalavri and S.Salihoglu, eds, ACM, 2022, pp. 10:1–10:9. doi:10.1145/3534540.3534695.
22.
D.Kontokostas and H.Knublauch, Shapes Constraint Language (SHACL), W3C Recommendation, W3C, 2017, https://www.w3.org/TR/2017/REC-shacl-20170720/.
23.
O.Lassila, M.Schmidt, O.Hartig, B.Bebee, D.Bechberger, W.Broekema, A.Khandelwal, K.Lawrence, C.López-Enríquez, R.Sharda and B.B.Thompson, The OneGraph vision: Challenges of breaking the graph model lock-in, Semantic Web 14(1) (2023), 125–134. doi:10.3233/SW-223273.
24.
D.L.McGuinness, F.van Harmelen et al., OWL Web Ontology Language Overview, W3C Recommendation 10(10), 2004.
25.
V.Nguyen, H.Y.Yip, H.Thakkar, Q.Li, E.Bolton and O.Bodenreider, Singleton Property Graph: Adding a Semantic Web abstraction layer to graph databases, in: Proceedings of the Blockchain Enabled Semantic Web Workshop (BlockSW) and Contextualized Knowledge Graphs (CKG) Workshop Co-Located with the 18th International Semantic Web Conference, BlockSW/CKG@ISWC 2019, Auckland, New Zealand, October 27, 2019, R.Samavi, M.P.Consens, S.Khatchadourian, V.Nguyen, A.P.Sheth, J.M.Giménez-García and H.Thakkar, eds, CEUR Workshop Proceedings, Vol. 2599, CEUR-WS.org, 2019, https://ceur-ws.org/Vol-2599/CKG2019_paper_4.pdf.
26.
F.Orlandi, D.Graux and D.O’Sullivan, Benchmarking RDF metadata representations: Reification, singleton property and RDF, in: 15th IEEE International Conference on Semantic Computing, ICSC 2021, Laguna Hills, CA, USA, January 27–29, 2021, IEEE, 2021, pp. 233–240. doi:10.1109/ICSC50631.2021.00049.
28.
M.A.Rodriguez, The Gremlin graph traversal machine and language (invited talk), in: Proceedings of the 15th Symposium on Database Programming Languages, DBPL 2015, Association for Computing Machinery, New York, NY, USA, 2015, pp. 1–10. ISBN 9781450339025. doi:10.1145/2815072.2815073.
30.
S.Sundara, S.Das and R.Cyganiak, R2RML: RDB to RDF Mapping Language, W3C Recommendation, W3C, 2012, https://www.w3.org/TR/2012/REC-r2rml-20120927/.
31.
H.Thakkar, D.Punjani, J.Lehmann and S.Auer, Two for one: Querying property graph databases using SPARQL via gremlinator, in: Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), Houston, TX, USA, June 10, 2018, A.Arora, A.Bhattacharya, G.H.L.Fletcher, J.L.Larriba-Pey, S.Roy and R.West, eds, ACM, 2018, pp. 12:1–12:5. doi:10.1145/3210259.3210271.
32.
D.Tomaszuk, R.Angles and H.Thakkar, PGO: Describing property graphs in RDF, IEEE Access 8 (2020), 118355–118369. doi:10.1109/ACCESS.2020.3002018.
33.
O.van Rest, S.Hong, J.Kim, X.Meng and H.Chafi, PGQL: A property graph query language, in: Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, GRADES ’16, Association for Computing Machinery, New York, NY, USA, 2016. ISBN 9781450347808. doi:10.1145/2960414.2960421.
34.
D.Wood, M.Lanthaler and R.Cyganiak, RDF 1.1 Concepts and Abstract Syntax, W3C Recommendation, W3C, 2014, https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.
35.
R.Zhang, P.Liu, X.Guo, S.Li and X.Wang, A unified relational storage scheme for RDF and Property Graphs, in: Web Information Systems and Applications, W.Ni, X.Wang, W.Song and Y.Li, eds, Springer International Publishing, Cham, 2019, pp. 418–429. ISBN 978-3-030-30952-7. doi:10.1007/978-3-030-30952-7_41.