
Abstract

In recent years, semantic sensor networks have been proposed. They apply ontologies to provide query capabilities and data access and allow users to express their needs at a conceptual level. In such sensor networks, a large number of web ontologies are created separately by sensors to represent their own knowledge. These ontologies are distributed across different sensors and provide knowledge for semantic queries, so locating the ontologies needed for a given semantic query has become a pressing issue. To address this issue, we propose an approach based on a structured peer-to-peer protocol to publish shareable ontologies on different sensors and automatically discover the ontologies useful for a given SPARQL query. Given a SPARQL query, our approach locates the desired ontologies and sends the query to them to find solutions. In addition, if solutions can be found in a published web ontology, our approach guarantees that the ontology is discovered and that its solutions are returned for the query. We conduct three experiments to evaluate the approach, and the results demonstrate that it is effective and efficient.

Keywords

Ontology publication, ontology location, SPARQL query, peer-to-peer network, semantic sensor network

Introduction

Ontologies are formal, explicit specifications of a shared conceptualization used to represent domain knowledge,¹ and they have recently been widely applied to represent knowledge in the Semantic Web. Such ontologies are therefore also called web ontologies. With the rapid development of the Semantic Web, W3C Recommendations for web ontologies such as RDF, OWL,² and SPARQL³ are widely used. In fact, a large number of web ontologies can be discovered via Swoogle,⁴ a search engine for web ontologies. These ontologies can serve web-based recommendation systems and question answering systems.^5,6 As ontology technology has become more practical, it has been increasingly applied to sensor networks in recent years to provide query capabilities and data access and to allow users to express their needs at a conceptual level.^7–9 These sensor networks are also called semantic sensor networks.^7,9,10 In general, a large number of web ontologies in semantic sensor networks are created separately by sensors to represent their own knowledge of a given domain.^9,11–13 These ontologies can also serve as a knowledge base for question answering systems shared by users of sensor networks.

However, an unsolved issue is how to efficiently discover or locate the required ontologies when a semantic query is given to a semantic sensor network. In recent years, a great deal of attention has been paid to this issue in practice as well as in research.^8,9,14,15 At present, most approaches to this issue are based on a client–server (C/S) architecture. In these approaches, knowledge resources, that is, ontologies, are all collected and stored on centralized servers, and users query and share the ontologies under some kind of centralized control. These approaches are often considered ineffective and inappropriate for sharing knowledge^6,16,17 because they do not meet the dynamic and autonomous requirements of knowledge management in semantic sensor networks.^9,14 Other approaches introduce knowledge sharing in a decentralized architecture for semantic sensor networks based on existing peer-to-peer (P2P) protocols. These approaches typically organize the sensors that hold ontologies as an unstructured P2P network according to the similarity of the knowledge in the ontologies located on different sensors. If a semantic query is given, these approaches try to route the query to the sensors whose ontologies may contain answers for the query. A daunting task in these approaches is calculating the similarity of knowledge in different ontologies on different sensors. In addition, unstructured P2P networks limit their scalability and effectiveness.

In this article, we propose an approach to publish and automatically discover the relevant ontologies on different sensors for a given semantic query based on a structured P2P protocol.¹⁸ In our approach, a sensor that has shareable ontologies can publish them directly through the structured P2P protocol according to the entities and their attributes appearing in the ontologies. If a sensor is given a semantic query, the approach enables the sensor, acting as a requestor agent, to find the useful ontologies efficiently and then send the query to the other sensors where at least one of these ontologies is located and can be checked for answers. The approach ensures that sensors can find all the published ontologies related to the given query. It gives the users of a semantic sensor network the capability to automatically discover and share ontologies that are useful for processing their semantic queries. We conduct three experiments to evaluate the effectiveness and efficiency of our approach, and the results show that it is efficient and effective.

The remainder of this article is organized as follows. Section “Basic ideas and overview of our approach” discusses the basic ideas of our approach. Section “Process of ontology location” presents the process of discovering useful ontologies in a semantic sensor network. Section “Implementation of our approach” discusses the implementation of our approach. Section “Evaluation” presents our experiments. Section “Related work” addresses the related work. Section “Conclusion” draws the conclusion.

Basic ideas and overview of our approach

In this section, in order to clearly describe the basic design ideas of our approach, we first discuss the related techniques, such as web ontology and P2P network. Then, we present an overview of our approach. To facilitate the description of our approach, without loss of generality, in the remainder of this article a sensor is also called a node.

Web ontology and basic idea for its publication

In accordance with W3C Recommendations, a web ontology is a formal description of a given domain, which defines entities and their relationships to represent knowledge in that domain. In a web ontology, the entities, which include properties, classes, and individuals, constitute the primitive terms that express the basic notions of the domain. For example, the entity http://www.ourexample.com/exp_onto.owl#Student in an ontology can represent all students in the described domain. Expressions in a web ontology represent complex notions composed from entities. Axioms in a web ontology are statements asserted to be true, which are applied to expressions and entities to state explicit facts.

Essentially, web ontologies can be viewed as a group of entities. For a given ontology, we can take the entities and their properties as the indices of the ontology. Then according to each index we can publish and retrieve the ontology on a structured P2P network. This is our idea to publish and share web ontology on a semantic sensor network. We will address its specific implementation in detail in section “Overview of our approach.”

P2P networks and their application

P2P network systems usually have a large number of autonomous nodes, also known as peer nodes, which can be accessed directly in an open distributed environment. In a P2P system, each node has the same degree of autonomy and can both provide services to and obtain services from other nodes. Thus, a P2P system usually does not need any hierarchical organization or central control structure. It can avoid the performance bottlenecks and single points of failure of centralized systems, and it is usually a scalable, high-performance service sharing architecture.¹⁹ Current P2P systems are widely applied to file sharing, multimedia transmission, real-time communication, collaborative work, distributed storage, and distributed computing. According to their resource discovery mechanisms, P2P systems can be divided into two types: structured and unstructured. Unstructured P2P systems typically organize the autonomous nodes into a random topology, a graph whose edges represent the neighbor relationships between nodes. These systems often use flooding or random walks to find the shared resources provided by other nodes in the topology.

However, unstructured P2P systems are not efficient in a large-scale network environment. Structured P2P systems typically organize autonomous nodes into an ordered graph and, for every shareable resource on any node, designate a node that is responsible for publishing the resource according to the resource’s attributes. Therefore, a structured P2P system can find the node responsible for the published information according to the attributes of the desired resource. This kind of P2P system provides a very efficient resource sharing mechanism that is suitable for large-scale network application environments.

Chord¹⁸ is a typical structured P2P system, which uses consistent hashing²⁰ to assign a unique value as the key of each autonomous node. It then organizes the nodes as a ring according to the order of their keys. For any shareable resource r with a property p on any node, Chord first uses the same consistent hash function to generate a unique value k for property p. Then, according to the value k, Chord designates a specific node N to store property p and resource r as a pair <p, r>. Here, resource r is usually represented by the URI of the resource. This process is called resource publication.

Thus, if a resource with property p is desired, Chord uses the same consistent hash function to calculate the hash value k of property p. Then, according to the value k, the node N responsible for k can be found. Chord can then fetch all the pairs associated with property p from node N and obtain the corresponding resources from those pairs. This process is called resource discovery. Resource discovery on Chord is very efficient: it finds the desired resources with only O(log n) messages, where n is the number of nodes in the system. Moreover, Chord provides good fault recovery capabilities.

Given these advantages in resource publication and discovery, we apply Chord in our approach for ontology publication and location on a semantic sensor network. We first design two functions according to the Chord protocol as follows:

publicOnto(idx, otg) is used to publish the URI of ontology otg based on the index idx, which is an entity described in ontology otg. Chord assigns index idx a key and then saves index idx and ontology otg as a pair <idx, otg> on a specific node.

locateOnto(idx) is used to get all the ontologies that have been published through Chord by any node with an entity as index idx.
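As a rough illustration of these two functions, the following Python sketch models the ring as an in-memory object. It is not the Open Chord implementation used later in this article: the class name ToyRing, the node names, and the 32-bit identifier space are assumptions made for the example, and finger tables, node joins, and networking are omitted.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


def chord_key(text, bits=32):
    """Hash a string onto the identifier ring [0, 2**bits)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyRing:
    """In-memory stand-in for a Chord ring (hypothetical, no networking)."""

    def __init__(self, node_names):
        # Node keys come from the same hash function that is used for indices.
        self.nodes = sorted((chord_key(name), name) for name in node_names)
        self.keys = [key for key, _ in self.nodes]
        # Per-node storage: index -> set of ontology URIs.
        self.store = {name: defaultdict(set) for _, name in self.nodes}

    def successor(self, key):
        """Return the first node whose key is >= key, wrapping around the ring."""
        pos = bisect_left(self.keys, key) % len(self.nodes)
        return self.nodes[pos][1]

    def public_onto(self, idx, otg):
        """publicOnto(idx, otg): save the pair <idx, otg> on the responsible node."""
        node = self.successor(chord_key(idx))
        self.store[node][idx].add(otg)
        return node

    def locate_onto(self, idx):
        """locateOnto(idx): return every ontology URI published under idx."""
        node = self.successor(chord_key(idx))
        return set(self.store[node].get(idx, set()))
```

Because publication and location hash the same index with the same function, both calls reach the same node, which is what allows any sensor to find an ontology published by any other sensor.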

Overview of our approach

In our approach, the sensors in a semantic sensor network act as nodes that construct a Chord P2P network and jointly maintain the ontology publication information used to discover relevant ontologies for SPARQL query processing. When a node with ontologies joins the P2P system, it can publish its shareable ontologies as follows:

For each ontology otg, get all the entities in it as a set S. The entities in set S, together with the roles in which they appear in ontology otg, represent the knowledge that the ontology describes.

For each entity ety in set S, create a corresponding index idx and then use the function publicOnto(idx, otg) to publish ontology otg, where otg refers to the URI of the ontology.

Once a node Nd receives a SPARQL query Qry, it should automatically collaborate with other nodes to find related ontologies to process query Qry as follows:

First, node Nd parses query Qry to create the indices for the query. The indices represent the knowledge that an ontology must contain if it is to be checked for answers to the query.

Using the indices and the function locateOnto(idx) defined in section “P2P networks and their application,” together with some strategies, node Nd locates exactly the ontologies from which answers to query Qry can be obtained.

For each located ontology otg, node Nd sends the whole query Qry to the node ndO where ontology otg is located.

Node ndO reasons over its ontology otg to get solutions for query Qry and then returns all the solutions to node Nd.

Finally, node Nd collects all the solutions returned by the other nodes, reorganizes them according to the requirements of query Qry, and sends them to its requestor.

The specific methods to create the indices for ontologies and SPARQL query for publication and location and the specific strategies to discover ontologies for SPARQL query processing will be discussed in detail in the following section.

Process of ontology location

In this section, first we discuss the basics of SPARQL and our basic ideas of creating indices for SPARQL query to discover the related ontologies. Then, we address our process to publish ontologies and our algorithm to discover ontologies for a given query.

SPARQL basics

As a W3C Recommendation, SPARQL³ is a query language and data access protocol designed for the RDF data model,²¹ which can be used to query any data represented in that model. Since web ontologies in the Semantic Web that comply with OWL² or RDFS²² are built on the RDF data model, they can be queried with SPARQL.

Each SPARQL query has a matching pattern that consists of one or more pattern clauses. Any matching pattern of a SPARQL query can be automatically transformed into a semantically equivalent matching pattern in which the pattern clauses are all triples; this form is called a triplet pattern.³ Figures 1 and 2 show an example of this conversion. The triples in a triplet pattern are similar to RDF triples, except that the subject, predicate, or object may be replaced by a variable to be queried.

Figure 1.

Matching pattern with initial pattern clauses in a SPARQL query.

Figure 2.

Triplet pattern of matching pattern in Figure 1.

A matching pattern of a query is also called a graph pattern, which is used to match a subgraph of the RDF model being queried. In the matching process, RDF terms in the subgraph are substituted for the variables in the corresponding graph pattern. The solutions of the query are parts of the RDF graph. In a SPARQL query, graph patterns can be divided into the following five categories: Alternative Graph Pattern, Basic Graph Pattern, Optional Graph Pattern, Named Graph Pattern, and Group Graph Pattern. A complex graph pattern in a SPARQL query is built by nesting combinations of these basic matching patterns, so SPARQL can express arbitrarily complex graph patterns. In a nested pattern of a SPARQL query, the type of the outermost matching pattern is called the master graph pattern of the query.

Basic Graph Pattern is a matching pattern consisting of one pattern clause to be matched, while Group Graph Pattern usually consists of several pattern clauses that must all be matched. Thus, if any answers to a query with a Basic Graph Pattern or Group Graph Pattern can be obtained from an RDF graph, each entity appearing in the matching pattern of the query must appear in that RDF graph. In addition, if the graph pattern of the SPARQL query is converted into a triplet pattern, then for every RDF term and its role in a specific triple, the RDF graph must specify or imply at least one triple in which the RDF term plays the same role. This is an important clue for discovering the ontologies useful for a query to be processed.

Optional Graph Pattern usually contains an optional pattern that extends the solutions of its query. Named Graph Pattern is used to designate the RDF graphs to be matched against. Thus, these two patterns cannot provide useful clues for discovering the ontologies useful for a query. Alternative Graph Pattern usually has two or more alternative patterns to be tried. To obtain specific clues from an Alternative Graph Pattern, we should decompose it: if a SPARQL query with an Alternative Graph Pattern is given, we can break the query into several sub-queries without Alternative Graph Pattern. The process is as follows:

Convert the query’s graph pattern into a semantically equivalent graph pattern in which Alternative Graph Pattern is the query’s outermost graph pattern. Figures 3 and 4 show an example of this conversion.

Take each alternative part of the query pattern as an independent graph pattern to create a new sub-query for the initial query.

Figure 3.

A query pattern with Alternative Graph Pattern.

Figure 4.

A query pattern with Alternative Graph Pattern as outermost graph pattern.

There is no doubt that, if a query Q is broken into several sub-queries by the process above, the results of query Q are the union of the results of its sub-queries. Moreover, any answer that a sub-query obtains from a web ontology is also an answer that query Q obtains from that ontology. Thus, the ontologies discovered for the sub-queries can be put together as the ontologies useful to the initial query.

Given query Q, if we take pattern clauses appearing in Basic, Group, or Alternative Graph Pattern and the parts that are specified with any one of the keywords FILTER, OPTIONAL, and GRAPH, as atomic formulas, then the graph pattern of query Q can be viewed as a logical expression. So we can convert the logical expression into disjunctive normal form (DNF), which is a logical formula with a disjunction of conjunctive clauses. Then we obtain a semantically equivalent query pattern of query Q, which takes Alternative Graph Pattern as outermost graph pattern. Finally, for each conjunctive clause in the DNF we can create a sub-query for query Q.
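The conversion to DNF can be sketched in Python as follows. The nested-tuple representation of a graph pattern is an assumption made for illustration: an atom is a string standing for one pattern clause or one FILTER, OPTIONAL, or GRAPH part, ("and", ...) stands for a group, and ("or", ...) stands for an Alternative Graph Pattern. Since the atoms are never negated, distributing conjunctions over disjunctions is sufficient.

```python
from itertools import product


def to_dnf(pattern):
    """Return a list of conjunctive clauses; each clause is a list of atoms."""
    if isinstance(pattern, str):
        return [[pattern]]
    op, *parts = pattern
    dnfs = [to_dnf(part) for part in parts]
    if op == "or":
        # A disjunction concatenates the clauses of its operands.
        return [clause for dnf in dnfs for clause in dnf]
    if op == "and":
        # A conjunction distributes over the disjunctions of its operands:
        # pick one clause from each operand and merge the picked clauses.
        return [sum(choice, []) for choice in product(*dnfs)]
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical pattern: t1 AND (t2 UNION (t3 AND t4)).
pattern = ("and", "t1", ("or", "t2", ("and", "t3", "t4")))
sub_queries = to_dnf(pattern)
# sub_queries == [["t1", "t2"], ["t1", "t3", "t4"]]
```

Each inner list corresponds to one conjunctive clause and therefore to one sub-query of query Q.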

Index creation for SPARQL query

According to the semantics of SPARQL queries discussed above, we can design a method to create indices for a given SPARQL query in order to discover the ontologies that can be checked for answers to the query. Our method focuses on SPARQL queries without Alternative Graph Pattern because, as discussed above, a query with Alternative Graph Pattern can be replaced by several semantically equivalent sub-queries without it. The method consists of the following steps:

Step 1. If a SPARQL query is given, extract the query’s graph pattern GHP.

Step 2. From pattern GHP, remove the parts that are syntactically specified with any one of the keywords OPTIONAL, GRAPH, and FILTER and obtain a refined pattern GP.

Step 3. Convert pattern GP into a triplet pattern TP.

Step 4. From each triple tpl in triplet pattern TP, take out the entities, that is, the terms that are not literals, variables, or blank nodes. Then, for each entity ety in the triple tpl, record the entity and its role rl (subject, predicate, or object) in the triple as a pair <ety, rl>.

So, given the triplet pattern of a query, we can get a set of pairs, called erPairs, where each pair consists of an entity and its role in a triple of the query’s triplet pattern. For example, for the triplet pattern in Figure 2, erPairs is

{<rdf:type, predicate>, <dcc:textbook, object>, <dcc:title, predicate>, <“Semantic Web Tutorial”, object>, <dcc:author, predicate>, <dcc:team, object>}.

Step 5. Remove from erPairs the pairs whose entities belong to the RDF, RDFS, or OWL vocabularies, because these RDF terms are meta-language elements that appear in almost every web ontology. The RDF terms in the remaining pairs are all domain entities, that is, individuals, properties, or classes.

Step 6. Replace namespace prefix with its IRI for each entity in erPairs so that each element is represented unambiguously.
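Steps 4 to 6 can be sketched in Python as follows, assuming that Steps 1 to 3 have already produced the triplet pattern as a list of (subject, predicate, object) string tuples. The string conventions (variables start with "?", blank nodes with "_:", literals with a double quote) and the dcc namespace IRI are assumptions made for the example. The sketch follows the wording of Step 4 and skips literals, and it expands prefixes before filtering, which gives the same result as applying Step 5 before Step 6.

```python
META_NAMESPACES = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",  # rdf:
    "http://www.w3.org/2000/01/rdf-schema#",        # rdfs:
    "http://www.w3.org/2002/07/owl#",               # owl:
)
ROLES = ("subject", "predicate", "object")


def is_entity(term):
    """Keep IRIs and prefixed names; drop variables, blank nodes, literals."""
    return not term.startswith(("?", "_:", '"'))


def expand(term, prefixes):
    """Step 6: replace a namespace prefix with its IRI."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    prefix, _, local = term.partition(":")
    return prefixes[prefix] + local


def er_pairs(triplet_pattern, prefixes):
    pairs = set()
    for triple in triplet_pattern:                      # Step 4
        for term, role in zip(triple, ROLES):
            if is_entity(term):
                pairs.add((expand(term, prefixes), role))
    # Step 5: drop RDF, RDFS, and OWL vocabulary terms.
    return {(e, r) for e, r in pairs if not e.startswith(META_NAMESPACES)}


prefixes = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcc": "http://example.org/dcc#",  # hypothetical namespace
}
pattern = [
    ("?book", "rdf:type", "dcc:textbook"),
    ("?book", "dcc:title", '"Semantic Web Tutorial"'),
    ("?book", "dcc:author", "dcc:team"),
]
pairs = er_pairs(pattern, prefixes)
```

For this hypothetical pattern, pairs contains four elements: dcc:textbook and dcc:team as objects, and dcc:title and dcc:author as predicates, each with its prefix expanded.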

Here, each pair in erPairs can be taken as an index of the given SPARQL query. According to the semantics of SPARQL as discussed in section “SPARQL basics,” we can state the following theorem.

Theorem

If an ontology can be checked out as a source of solutions for a given query, then for each pair Pr in the pair set erPairs of the query, the ontology must specify or imply a triple tpl that includes the entity ety of pair Pr, and the role of ety in the triple tpl must be the same as its role in pair Pr.

For any pair Pr in the query’s erPairs, if an ontology neither specifies nor implies such a triple tpl, then some triple in the triplet pattern of the query cannot be matched in the ontology. Therefore, no solution for the query can be reasoned out from this ontology. In the following subsections, we discuss how to apply this conclusion to ontology publication and location.

Publication of web ontology

To efficiently discover ontologies according to the indices created for a given SPARQL query and the conclusion stated above, we can design a method to publish an ontology based on its entities and the roles in which they appear. The method is as follows:

Step 1. Given an ontology onto, extract all the entities appearing in the ontology onto as a set eSet.

Step 2. For each entity ety in the set eSet, check the ontology onto to decide whether there are triples where the role of entity ety is subject (or object, or predicate). If such a triple is present, put ety into sSet (or oSet, or pSet).

Step 3. Take each element emt in the set sSet (or oSet, or pSet) together with its role subject (or object, or predicate) as a pair <emt, subject> (or <emt, object>, or <emt, predicate>) to make an index idx for the ontology onto.

Step 4. According to each index idx, publish the ontology onto using the function publicOnto(idx, onto) as discussed above.
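The four publication steps can be sketched in Python as follows. The sketch assumes that the ontology is available as a list of (subject, predicate, object) string tuples that already contains every triple to be indexed (including implied triples, if a reasoner has materialized them) and that public_onto is any callable with the signature of publicOnto. Whether meta-vocabulary terms such as rdf:type are also published is not specified above; this sketch publishes them.

```python
ROLE_NAMES = ("subject", "predicate", "object")


def publication_indices(triples):
    """Steps 1-3: collect one <entity, role> index per entity and role."""
    indices = set()
    for triple in triples:
        for term, role in zip(triple, ROLE_NAMES):
            # Blank nodes and literals are not entities, so they are skipped.
            if not term.startswith(("_:", '"')):
                indices.add((term, role))
    return indices


def publish(onto_uri, triples, public_onto):
    """Step 4: call publicOnto once per index; return the number of calls."""
    indices = publication_indices(triples)
    for idx in sorted(indices):
        public_onto(idx, onto_uri)
    return len(indices)


# Hypothetical ontology with three asserted triples.
triples = [
    ("ex:Alice", "rdf:type", "ex:Student"),
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "ex:name", '"Bob"'),
]
published = []
count = publish("http://example.org/people.owl", triples,
                lambda idx, otg: published.append((idx, otg)))
```

An entity that appears in several roles, such as ex:Bob here, yields one index per role, so the number of publication calls is the number of distinct <entity, role> pairs rather than the number of triples.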

Location of web ontology for a given query

According to the method of ontology publication in section “Publication of web ontology” and conclusion in section “Index creation for SPARQL query,” we can design algorithm oLocation in Figure 5 to discover the useful ontologies for a given SPARQL query. The basic idea of the algorithm is that for a given query Q, we discover the ontologies based on each pair pr in its erPairs and then obtain their intersection as the useful ontologies to the query.

Figure 5.

The algorithm oLocation for ontology location for a query.

In this algorithm, erPairs in line 6 is the pair set defined in section “Index creation for SPARQL query.” The function locateOnto(idx) in line 10 is defined in section “P2P networks and their application” and is used to discover useful ontologies in a semantic sensor network.
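The basic idea of oLocation, intersecting the ontology sets located for each pair, can be sketched in Python as follows. This is not a transcription of Figure 5: the function name o_location, the callable locate_onto, and the early exit on an empty intersection are assumptions of the sketch.

```python
def o_location(er_pairs, locate_onto):
    """Intersect the ontology sets located for every pair in erPairs."""
    ontos = None
    for pair in er_pairs:
        found = set(locate_onto(pair))
        ontos = found if ontos is None else ontos & found
        if not ontos:
            break  # no ontology can satisfy every pair, so stop early
    # A query without any entity yields no index and thus no located ontology.
    return ontos or set()


# Hypothetical publication state: index -> ontologies published under it.
published = {
    ("ex:title", "predicate"): {"o1", "o2"},
    ("ex:team", "object"): {"o2", "o3"},
}
locate = lambda idx: published.get(idx, set())
useful = o_location([("ex:title", "predicate"), ("ex:team", "object")], locate)
# useful == {"o2"}
```

Only o2 was published under both indices, so by the theorem above it is the only ontology that can provide solutions for the query.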

In practice, an import relationship may exist between two ontologies: an ontology may be imported by another ontology as an independent part. If ontologies A and B are both useful for a given query and ontology B is imported by ontology A, then ontology B need not be checked for the query because it will be loaded as a part of ontology A.

To reduce redundant ontologies, we must remove from the useful ontology set of a query the ontologies that are directly or indirectly imported by other ontologies in the set. If, during the publication of ontology O, the URLs of the ontologies directly imported by O are published under O’s URL using the function publicOnto defined in section “P2P networks and their application,” we can design the algorithm in Figure 6 to remove redundant ontologies. Its strategy is as follows: given a set S of ontologies, for each ontology O remaining in S, get the ontologies directly or indirectly imported by O and remove them from S if they appear in it.

Figure 6.

The algorithm to reduce the redundant ontology.

In this algorithm, the function locateOnto(idx) in lines 8 and 12 is defined in section “P2P networks and their application.” In practice, before the algorithm oLocation returns ontos, ontos should be refined by the algorithm removeRedundantOnto. If we focus only on the network resources that the algorithms consume, the number T of accesses to the P2P network needed to process a given query is bounded as follows:

$T \leq |erPairs| + \sum_{i=1}^{|ontos|} |ontoSet_i| + |ontos|$

In the formula, |erPairs| is the number of pairs in erPairs of a given query in algorithm oLocation, |ontos| is the number of ontologies in the set ontos, and |ontoSet_i| is the number of elements in the set ontoSet corresponding to the i-th element of ontos in algorithm removeRedundantOnto.
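The redundancy-removal strategy and its contribution to the bound can be sketched in Python as follows. This is not a transcription of Figure 6: it assumes that locate_onto(uri) returns the ontologies directly imported by the ontology with that URI, and it counts one network access per call. Each ontology kept in the set costs one access for itself plus one per ontology it directly or indirectly imports, which corresponds to the terms |ontos| and the sum of |ontoSet_i|; the remaining term |erPairs| comes from oLocation.

```python
def remove_redundant_onto(ontos, locate_onto):
    """Drop ontologies imported, directly or indirectly, by another member.

    Returns the reduced set and the number of locate_onto calls made.
    """
    remaining = set(ontos)
    calls = 0
    for onto in sorted(ontos):
        if onto not in remaining:
            continue  # already removed as an import of an earlier ontology
        imported, frontier = set(), [onto]
        while frontier:
            current = frontier.pop()
            calls += 1  # one access to the P2P network
            for dep in locate_onto(current):
                if dep != onto and dep not in imported:
                    imported.add(dep)
                    frontier.append(dep)
        remaining -= imported
    return remaining, calls


# Hypothetical import graph: A imports B, and B imports C.
imports = {"A": {"B"}, "B": {"C"}, "C": set(), "D": set()}
kept, calls = remove_redundant_onto({"A", "B", "D"}, lambda u: imports.get(u, set()))
# kept == {"A", "D"} and calls == 4
```

Here B is removed because A imports it, and it is therefore never expanded itself, which is one reason the formula is an upper bound rather than an equality.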

Implementation of our approach

In this section, we design the architecture for our approach in a sensor node and outline the functionality of each component in the architecture. As shown in Figure 7, it consists of one Ontology Repository and six brokers, which are as follows:

Ontology Repository. It holds zero or more local ontologies, which are maintained by a sensor node.

Query Parser. It parses a SPARQL query and distributes it. When it receives a SPARQL query Q from the requestor, it first breaks Q into several sub-queries if necessary. Then, for query Q or its sub-queries, it derives the corresponding erPairs defined in section “Index creation for SPARQL query.” Next, it sends the erPairs to the Ontology Discoverer to find the ontologies (the set ontoS) useful to query Q. Finally, it sends query Q and ontoS to the Query Processor.

Ontology Discoverer. Given the erPairs of a query Q or its sub-queries, it finds out all the related ontologies as the set ontoS for query Q.

Query Processor. It constructs solutions for a query Q from the answers obtained from each related ontology. When receiving a query Q and its related ontology set ontoS, it first extracts the URI of each ontology O in ontoS and determines the node N where ontology O is located. Then it sends query Q and the URI of ontology O to the Inference Engine of node N and collects the results from node N. Finally, when it has collected the results for every ontology in ontoS, it reorganizes them and sends them to the requestor. In practice, if SPARQL keywords such as ORDER BY, DISTINCT, REDUCED, OFFSET, and LIMIT appear in query Q, then in order to give the requestor sound results, query Q must be preprocessed before it is sent to the related ontologies and the solutions must be reorganized before they are sent to the requestor. We do not discuss this in detail in this article.

Inference Engine. Given a query and the URI of an ontology, it checks the local ontology corresponding to the URI to get answers for the query.

Ontology Publisher. It is used to publish the shareable ontologies in repository.

P2P Node. It implements the two functions to publish and locate ontologies based on Chord protocol as discussed in section “P2P networks and their application.”

Figure 7.

Our approach’s architecture in a sensor node.

In order to evaluate our approach, we implement the architecture. We assume that all the ontologies in a semantic sensor network comply with the OWL standard. We use the open source development kits Pellet,²³ Jena,²⁴ and Open Chord²⁵ for our implementation. Jena is a Semantic Web framework that provides a Java application programming interface (API) to extract data from and write to RDF graphs. Pellet is a Java-based OWL description logic (DL) reasoner that works with both the Jena and OWL API libraries. Open Chord is an implementation of the Chord protocol, which provides an interface for Java applications to take part as peers in a P2P network.

Evaluation

In this section, we design three experiments to evaluate our approach’s ability, effectiveness, and efficiency as follows:

The first experiment is designed to evaluate our approach’s ability to achieve solutions for a given query in a given context.

The second experiment is designed to evaluate our approach’s requirement of network resources for ontology publication.

The third experiment is designed to evaluate our approach’s requirement of network resources for query processing.

Network resources in this article refer to the number of accesses to the P2P network and the number of items published on or retrieved from the P2P network when an ontology on a sensor node is published or a semantic query is processed.

Experiment setup

To store and share an RDF model in a distributed manner on a structured P2P network, several approaches^26–30 have been presented as distributed RDF repositories, such as RDFPeers,²⁶ which stores a triple at three nodes by applying hash functions to its subject, predicate, and object, respectively. All the nodes in the P2P network know which nodes are responsible for the triples they are searching for according to the subject, predicate, or object. If the triples exist in the semantic sensor network, the approach ensures that they will be found. Based on these ideas, we design two further methods to publish ontologies and process SPARQL queries as follows:

The first method (M2) publishes, for a given web ontology, only the RDF triples explicitly specified in the ontology. Then, for a SPARQL query, it retrieves from the P2P network the connected sub-graph of each entity in the query’s graph pattern and creates a temporary web ontology to process the query.

The second method (M3) publishes, for a given web ontology, all the RDF triples specified and implied in the web ontology. Because all the possible RDF triples are published, it can obtain all the related RDF triples based on the entities in the graph pattern of a SPARQL query. A temporary web ontology is then created for the query, from which all the possible results can be reasoned out.
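The RDFPeers-style triple placement on which M2 and M3 build can be sketched in Python as follows. The sketch is an assumption about how such a method could count published items, not the implementation used in the experiments: a dictionary stands in for the P2P network, and each triple is stored once under its subject, once under its predicate, and once under its object.

```python
from collections import defaultdict


def publish_triples(triples):
    """Index every triple under its subject, predicate, and object.

    Returns the index and the number of insertions (one per key per triple).
    """
    index = defaultdict(set)
    inserted = 0
    for s, p, o in triples:
        for key in (s, p, o):
            index[key].add((s, p, o))
            inserted += 1
    return index, inserted


# Hypothetical triples of a small ontology.
triples = [
    ("ex:Alice", "rdf:type", "ex:Student"),
    ("ex:Alice", "ex:knows", "ex:Bob"),
]
index, inserted = publish_triples(triples)
# inserted == 6, and index["ex:Alice"] holds both triples
```

Under this assumption, the number of insertions grows with three times the number of published triples, and M3 additionally publishes the implied triples.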

Like our approach (M1), approaches M2 and M3 guarantee that the solutions are obtained if a SPARQL query is given and a web ontology from which solutions can be reasoned out has been published. Because applications based on a distributed P2P network always consume network resources, such as network accesses and items published on or retrieved from the network, an application is superior if it consumes fewer network resources for the same task. Thus, in our experiments the performance of our approach M1 is compared with that of approaches M2 and M3 in the following aspects: when an ontology is published, how many times each approach accesses the P2P network and how many items it publishes; and when a query is processed, how many times each approach accesses the network and how many items it retrieves.

For our experiments, we first download 16 ontologies from the ontology repository TONES³¹ as experimental data, listed in Table 1.

Table 1.

Web ontologies downloaded from the ontology repository TONES.

No.	URI of ontologies from TONES
O1	http://keg.cs.tsinghua.edu.cn/hands-on/people.owl
O2	http://keg.cs.tsinghua.edu.cn/ontology/software
O3	http://www.mindswap.org/ontologies/family.owl
O4	http://www.co-ode.org/ontologies/pizza/pizza.owl
O5	http://www.owl-ontologies.com/Movie.owl
O6	http://www.bpiresearch.com/BPMO/2004/03/03/cdl/Countries
O7	http://www.semanticweb.org/ontologies/2007/9/AirSystem.owl
O8	http://protege.stanford.edu/plugins/owl/owl-library/koala.owl
O9	http://www.loa-cnr.it/ontologies/DUL.owl
O10	http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl
O11	http://www.mindswap.org/ontologies/debugging/university.owl
O12	http://www.co-ode.org/amino-acid/2006/05/18/amino-acid.owl
O13	http://www.mindswap.org/dav/commonsense/food/foodswap.owl
O14	http://www.estrellaproject.org/lkif-core/role.owl
O15	http://www.estrellaproject.org/lkif-core/lkif-top.owl
O16	http://www.semanticweb.org/ontolgies/chemical

Then 15 SPARQL queries are designed with the individuals, classes, and properties, which are all entities in ontologies in Table 1. Without loss of generality, each query’s graph pattern consists of three triples and three variables, the style of which is shown in Figure 8.

Figure 8.

Style of graph pattern of each query.
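As a rough illustration of this query shape, the sketch below models a graph pattern with three triple patterns and three variables as plain Python tuples and separates its variables from the entities it mentions. The `ex:` names and the concrete pattern are hypothetical and are not taken from Figure 8; skipping terms of the RDF, RDFS, and OWL vocabularies is likewise an assumption made here for illustration.

```python
# Prefixes of built-in vocabularies; treating them as non-entities is an assumption.
VOCAB_PREFIXES = ("rdf:", "rdfs:", "owl:")

# Hypothetical graph pattern: three triple patterns over three variables.
PATTERN = [
    ("?x", "rdf:type", "ex:Student"),
    ("?x", "ex:takesCourse", "?y"),
    ("?y", "ex:taughtBy", "?z"),
]


def variables(pattern):
    """Return the set of variables (terms starting with '?') in the pattern."""
    return {term for triple in pattern for term in triple if term.startswith("?")}


def entities(pattern):
    """Return the named classes, properties, and individuals used in the pattern."""
    return {
        term
        for triple in pattern
        for term in triple
        if not term.startswith("?") and not term.startswith(VOCAB_PREFIXES)
    }


if __name__ == "__main__":
    print(len(PATTERN), sorted(variables(PATTERN)), sorted(entities(PATTERN)))
```

In this toy pattern, the entities are the only terms that an entity-based lookup could use as keys; the variables carry no information for locating ontologies.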

We do not discuss the experimental environment in detail, because our experiments focus only on the number of accesses to the P2P network and the number of items published or retrieved, neither of which depends on the concrete experimental environment.

Analysis of results

Experiment 1 is conducted to evaluate our approach’s ability to get the solutions for a given query. In this experiment, for each SPARQL query (in the style of Figure 8), all the web ontologies in Table 1 are reasoned over, one by one, to get solutions for the query. The solutions obtained in this way are exactly those that these ontologies can provide for the query. For each query, the total number of solutions is shown in row M of Table 2.

Table 2.

Totals of each query’s solutions obtained by our approach.

	Q1	Q2	Q3	Q4	Q5	Q6	Q7	Q8	Q9	Q10	Q11	Q12	Q13	Q14	Q15
M1	28	6	18	14	0	20	168	12	26	36	0	1	6	420	210
M	28	6	18	14	0	20	168	12	26	36	0	1	6	420	210

Then, we publish all the ontologies in Table 1 using approach M1 and process each of the 15 SPARQL queries. For each query, we count the number of solutions that approach M1 gets; the counts are shown in row M1 of Table 2. The figures in row M1 are identical to the corresponding figures in row M. This means that our approach discovers all the ontologies that match a query and obtains every solution they can provide.

Experiment 2 is designed to evaluate our approach’s requirement of network accesses for ontology publication. When each web ontology in Table 1 is published using approaches M1, M2, and M3, respectively, we count the items they insert into the P2P network. With each of the three approaches, inserting one item into the P2P network requires one access to the network. We can therefore take the total number of items published as the number of network accesses needed for ontology publication.

The results of this experiment are shown in Figure 9. The bars M1, M2, and M3 in Figure 9 denote their numbers of items to be inserted into the P2P network (i.e. the number of times to access the P2P network) when a corresponding web ontology is published using the approaches M1, M2, and M3, respectively.

Figure 9.

Numbers of network accesses for ontology publication with the approaches M1, M2, and M3.

Figure 9 shows that the numbers of network accesses for approaches M2 and M3 are always greater than that of approach M1 when publishing web ontologies. The multiples are shown in Figure 10. For approach M2, the multiples range between 1.09 and 16, with an average value of 4.82; for approach M3, they range between 6.522 and 112.7, with an average value of 49.36. This means that approach M1 can save a lot of network resources when publishing ontologies. The reason is that ontology publication using approach M1 is based only on the entities in the ontology, while approach M2 must publish each specified triple three times and approach M3 must publish every specified or implied triple three times. As a knowledge base, an ontology usually contains a large number of triples, all of which must be published by M2 and M3. In particular, further implied RDF triples can be reasoned out from it, and these must also be published by M3.

Figure 10.

Multiples of items published for each ontology from M2 and M3 to M1.
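The difference in publication cost can be sketched on a toy ontology. The following Python example is a minimal illustration under stated assumptions, not the implementation evaluated above: it assumes that M1 publishes one item per distinct entity (ignoring RDF, RDFS, and OWL vocabulary terms), that M2 publishes each specified triple three times, and that M3 publishes each specified or implied triple three times. The `infer` helper is a hypothetical stand-in for a reasoner and applies only two RDFS rules (transitivity of `rdfs:subClassOf` and type inheritance); the `ex:` names are invented.

```python
VOCAB_PREFIXES = ("rdf:", "rdfs:", "owl:")
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

# Hypothetical toy ontology as (subject, predicate, object) triples.
TOY_ONTOLOGY = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", TYPE, "ex:Student"),
    ("ex:alice", "ex:knows", "ex:bob"),
}


def infer(triples):
    """Naive closure under subClassOf transitivity and type inheritance."""
    closed = set(triples)
    while True:
        derived = {
            (s, p, o2)
            for (s, p, o) in closed
            if p in (SUBCLASS, TYPE)
            for (s2, p2, o2) in closed
            if p2 == SUBCLASS and s2 == o
        }
        if derived <= closed:  # fixpoint reached, nothing new was derived
            return closed
        closed |= derived


def items_m1(triples):
    """Assumed M1 cost: one item per distinct non-vocabulary entity."""
    return len({t for tr in triples for t in tr if not t.startswith(VOCAB_PREFIXES)})


def items_m2(triples):
    """M2 cost: each specified triple is published three times."""
    return 3 * len(triples)


def items_m3(triples):
    """M3 cost: each specified or implied triple is published three times."""
    return 3 * len(infer(triples))


if __name__ == "__main__":
    print(items_m1(TOY_ONTOLOGY), items_m2(TOY_ONTOLOGY), items_m3(TOY_ONTOLOGY))
```

Even in this four-triple example the ordering M1 < M2 < M3 appears, and the gap between M2 and M3 grows with the number of triples that reasoning can derive.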

Experiment 3 is designed to evaluate our approach’s requirement of network accesses for query processing. In this experiment, three tests are conducted. In the first test, all ontologies in Table 1 are published using approach M1 and then the 15 constructed SPARQL queries are processed. Each time a query is processed, the number of items retrieved and the number of accesses to the P2P network are recorded. The second and third tests repeat this procedure using approaches M2 and M3, respectively. The rows of Table 3 record, for each query, the number of network accesses and the number of items retrieved with approaches M1, M2, and M3.

Table 3.

Requirement of network resources for query processing with approaches M1, M2, and M3.

Query	Number of times to access the network			Number of items retrieved from the network
Query	M1	M2	M3	M1	M2	M3
Q1	5	1383	6	5	2009	8547
Q2	6	1383	6	6	1726	8547
Q3	6	22	6	6	403	104
Q4	5	56	6	5	2614	208
Q5	5	129	5	5	1708	499
Q6	5	171	4	8	1639	1025
Q7	5	124	5	5	1692	564
Q8	6	231	6	6	1774	946
Q9	5	231	6	5	504	946
Q10	5	171	5	6	1735	1025
Q11	4	1383	5	4	1789	8547
Q12	5	98	5	5	229	429
Q13	5	98	6	5	1821	429
Q14	5	129	6	5	2022	499
Q15	4	129	5	4	1926	499
Total	76	5738	82	80	23,591	32,814

Table 3 shows that approach M2 needs far more accesses to the P2P network than approach M1 for query processing. The multiples from approach M2 to M1 range between 3.67 and 345.8, with an average value of 75.5 (the ratio of the column totals, 5738/76). This is because approach M2 has to access the network iteratively, according to the entities in the required RDF sub-graphs, when getting the useful sub-graphs for a query, while approach M1 just uses the entities in the query to discover the desired ontologies. On the other hand, the multiples from approach M3 to M1 range between 0.8 and 1.25, with an average of 1.08. In fact, the numbers for approach M1 in Table 3 are almost identical to the corresponding ones for approach M3. This is because, given a query, approaches M1 and M3 both rely only on the entities in the query to locate the desired ontologies or retrieve the relevant triples. Approach M1 usually ignores terms of the RDF, RDFS, or OWL vocabularies, while approach M3 cannot omit them. However, approach M1 must retrieve the imported ontologies for each ontology discovered, while approach M3 does not need to do so. Thus, their numbers of network accesses for a query do not differ greatly.
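The access multiples quoted above can be recomputed from Table 3. The short Python sketch below copies the access counts from the table and derives the smallest and largest per-query multiples together with the ratio of the column totals; the reported averages of 75.5 and 1.08 agree with the latter. The names `ACCESSES` and `multiples` are introduced here only for illustration.

```python
# Number of network accesses per query (Q1..Q15), copied from Table 3.
ACCESSES = {
    "M1": [5, 6, 6, 5, 5, 5, 5, 6, 5, 5, 4, 5, 5, 5, 4],
    "M2": [1383, 1383, 22, 56, 129, 171, 124, 231, 231, 171, 1383, 98, 98, 129, 129],
    "M3": [6, 6, 6, 6, 5, 4, 5, 6, 6, 5, 5, 5, 6, 6, 5],
}


def multiples(other, base="M1"):
    """Return (smallest, largest, totals-ratio) of `other` relative to `base`."""
    ratios = [o / b for o, b in zip(ACCESSES[other], ACCESSES[base])]
    overall = sum(ACCESSES[other]) / sum(ACCESSES[base])
    return min(ratios), max(ratios), overall


if __name__ == "__main__":
    for name in ("M2", "M3"):
        low, high, overall = multiples(name)
        print(f"{name}/M1: min={low:.2f} max={high:.2f} totals-ratio={overall:.2f}")
```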

In addition, Figure 11 illustrates the numbers of items retrieved from the network by the different approaches. The bars M1, M2, and M3 in Figure 11 denote the numbers of items retrieved with approaches M1, M2, and M3, respectively, when a SPARQL query is processed. Figure 11 indicates that approaches M2 and M3 always retrieve far more items than approach M1 for SPARQL query processing.

Figure 11.

Numbers of items retrieved from the network for each query using the approaches M1, M2, and M3.

Given a query, approaches M2 and M3 retrieve many times more items than approach M1. The multiples are shown in Figure 12. The line M2 shows that the multiples range between 17.33 and 2136.8, with an average value of 410.18. The line M3 shows that the multiples range between 45.8 and 447.25, with an average of 294.89.

Figure 12.

Multiples of numbers of items retrieved for each query from M2 and M3 to M1.

It can be seen that the amount of data that approach M1 needs to retrieve from the network is far less than that of approaches M2 and M3. This is because approaches M2 and M3 have to get all triples related to each entity in the desired connected sub-graphs. Since the two approaches have published all the triples specified or implied for each entity in the published ontologies, a large number of triples are retrieved for a query.

The three experiments show that our approach is both effective and efficient. When publishing an ontology or processing a query, it needs only a small number of network accesses and publishes or retrieves only a small amount of data. It therefore consumes fewer network resources than approaches M2 and M3 for the same task.

Related work

In recent years, semantic sensor networks have been proposed that apply ontologies to provide query capabilities and data access, and research on them is now extensive. Li and Taylor⁷ offer a semantic service-oriented framework for sensor network services that aims to improve query processing. Their framework focuses on query processing to allow distributed end users to request streams of interest easily and efficiently, based on the principle of pushing the query down to the network nodes as much as possible. Corcho et al.⁹ propose an ontology-based approach for providing data access and query capabilities to streaming data sources. In addition, they describe the theoretical foundations and technologies that enable exposing semantically enriched sensor metadata, querying sensor observations through SPARQL extensions using query rewriting and data translation techniques according to mapping languages, and managing both pull and push delivery modes. Huang and Javed⁸ propose a Semantic Web architecture that allows sensor data to be understood and processed in a meaningful way by a variety of applications with different purposes. In this architecture, they develop ontologies for sensor data and use the Jena API for processing, which includes querying and inference over the sensor data. Barnaghi et al.¹³ propose common standards and logical description frameworks, based on the work of the Semantic Web community, to create a sensor data description model. Within these frameworks, they design a sensor data ontology based on the Sensor Web Enablement (SWE) and SensorML data component models and describe how semantic relationships and operational constraints are deployed in a uniform structure to describe heterogeneous sensor data.

In addition, distributed approaches for knowledge sharing on the Semantic Web, especially those based on a P2P network, are technically similar to the work in this article and have attracted more and more attention in recent years. Tian et al.³² present an approach named SemanticPeer, based on P2P techniques, which provides an ontology-based P2P lookup service. First, it divides ontologies into private and common ontologies. Then it extends the Chord protocol with an index table and an express table to publish resources described in private and common ontologies, respectively. Given a semantic query, it can route to a node with a private ontology that can be used to process the query. Similarly, Gao et al.³³ present an approach that can find the node responsible for a kind of resource according to the classification of a target resource. But these approaches depend on a common ontology or a unified classification of resources.

The method proposed in this article is completely different from the above approaches. When publishing web ontologies on a structured P2P network, it does not rely on common ontologies or on a given scheme for ontology description, but only on the entities described in these ontologies.

Moreover, Cai and Frank²⁶ and Pellegrino et al.²⁷ present scalable distributed RDF storage and sharing approaches, which publish each RDF triple at three nodes on a structured P2P network by applying hash functions to its subject, predicate, and object. This ensures that all queries can find the desired triples. However, if RDF triples are published directly, semantic retrieval is not supported, because implicit triples are not inferred. To address this issue, Kohigashi et al.²⁸ focus on class hierarchies of RDF resources. But their approach looks for RDF triples based only on the class hierarchies, and problems such as load balancing and reliability are encountered when RDF triples are distributed on a P2P network. To address these problems, Kaoudi et al.²⁹ and Rizzo et al.³⁰ present methods for distributed and reliable RDF storage, respectively, but they do not focus on semantic query processing. Although these approaches ensure that RDF triples in different web ontologies can be shared directly, it is difficult to process semantic queries with them.
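The three-fold triple publication described above can be illustrated with a small Chord-style sketch. The Python example below is only an illustration under stated assumptions, not the implementation of any of the cited systems: the ring size `M_BITS`, the node names, and the in-memory `store` dictionary are all hypothetical. Each triple is stored at the successor node of the hash of its subject, its predicate, and its object, so that a lookup by any one term reaches a node holding the triple.

```python
import hashlib
from bisect import bisect_left

M_BITS = 16  # size of the identifier ring; a small value assumed for readability


def ring_id(text):
    """Map a string onto the identifier ring using SHA-1."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M_BITS)


def successor(node_ids, key):
    """Return the first node id >= key on the sorted ring, wrapping around."""
    index = bisect_left(node_ids, key)
    return node_ids[index % len(node_ids)]


def publish(triple, node_ids, store):
    """Store the triple at the nodes responsible for its three terms."""
    for term in triple:
        node = successor(node_ids, ring_id(term))
        store.setdefault(node, set()).add(triple)


def lookup(term, node_ids, store):
    """Fetch all stored triples mentioning `term` from its responsible node."""
    node = successor(node_ids, ring_id(term))
    return {t for t in store.get(node, set()) if term in t}


if __name__ == "__main__":
    nodes = sorted({ring_id(f"node-{i}") for i in range(8)})  # hypothetical peers
    store = {}
    triple = ("ex:alice", "ex:knows", "ex:bob")
    publish(triple, nodes, store)
    print(lookup("ex:knows", nodes, store))
```

Because only explicitly published triples are stored, a lookup in this sketch cannot return a triple that would first have to be inferred, which is the limitation noted above.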

Many works focus on distributed management and sharing of knowledge based on an unstructured P2P network in a virtual community. Wang et al.³⁴ propose an approach for knowledge sharing in a virtual community. Their approach organizes the nodes with ontologies as an unstructured P2P network. When a node receives a semantic query, it sends the query to its neighbors until the desired ontologies are discovered for the query. Similar approaches, all based on unstructured P2P networks, are discussed in previous works.^6,35–37 They differ in the strategies used to construct the unstructured P2P network, but they all try to connect nodes with similar knowledge so as to improve the routing efficiency of semantic queries. However, the routing mechanism of an unstructured P2P network also limits their efficiency in locating ontologies. Yin et al.³⁸ and Xu and Yin,³⁹ respectively, discuss the collaborative location sharing of content and services. In a broader field, Yu and colleagues^40,41 use a learning-based approach to classify and share information resources.
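The neighbor-forwarding behaviour of such unstructured networks can be sketched as a breadth-first flood. The Python example below is a generic illustration and not the routing strategy of any cited work; the hop limit `ttl`, the toy topology, and the set of nodes holding a matching ontology are assumptions introduced here.

```python
from collections import deque


def flood(neighbors, holders, start, ttl):
    """Flood a query breadth-first; return (matching nodes found, messages sent)."""
    visited = {start}
    queue = deque([(start, ttl)])
    found, messages = set(), 0
    while queue:
        node, hops_left = queue.popleft()
        if node in holders:
            found.add(node)
        if hops_left == 0:
            continue  # hop limit reached, do not forward further
        for nxt in neighbors.get(node, ()):
            messages += 1  # every forwarded copy costs one message
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, hops_left - 1))
    return found, messages


# Hypothetical topology; only node "e" holds an ontology matching the query.
NEIGHBORS = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

if __name__ == "__main__":
    print(flood(NEIGHBORS, {"e"}, "a", ttl=2))
    print(flood(NEIGHBORS, {"e"}, "a", ttl=3))
```

In this toy network the matching node is missed with a hop limit of 2 and found only with a limit of 3, at the cost of more messages, which illustrates why routing in unstructured networks limits the efficiency of locating ontologies.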

Distributed ontology techniques^42–44 have also been presented and discussed in recent years, but they focus only on web ontology integration, mapping, and distributed reasoning.

Conclusion

In this article, we propose an approach based on a structured P2P network that automatically publishes and discovers shareable, related ontologies and processes SPARQL queries for knowledge sharing in a semantic sensor network. For a given SPARQL query, our approach is guaranteed to find all the shareable ontologies published in the semantic sensor network from which solutions for the query can be reasoned out. The semantic sensors can therefore publish their ontologies to share with other sensors and discover useful ontologies from other sensors in the network when processing their SPARQL queries. The approach is particularly suitable for knowledge sharing in a semantic sensor network, where large numbers of ontologies are created independently by sensors and are shared and leveraged for their own purposes.

We also conducted three experiments to evaluate the approach’s effectiveness and efficiency. The experimental results show that it obtains all the solutions for each query while consuming fewer network resources than the two compared approaches. In the near future, we plan to continue our research work in the following aspects:

The graph pattern of a SPARQL query should be investigated further, so as to find clues other than the entities and their roles appearing in it that allow the desired ontologies to be located more efficiently.

Mapping words in natural language to existing entities in ontologies should be explored, so that users can generate their semantic queries automatically.

Footnotes

Handling Editor: Francisco Vasques

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China under Grant Nos 61472112 and 61672200 and the National Key Technology R&D Program under Grant No. 2015BAH17F02.

References

1. Maedche. Ontology learning for the semantic web. New York: Springer Science & Business Media, 2012.

2. Motik, Cuenca Grau, Horrocks, et al. OWL 2 web ontology language: profiles. W3C Recommendation. 27 October 2009.

3. Harris, Seaborne, Prud’hommeaux. SPARQL 1.1 query language. W3C Recommendation. 2013, p.21.

4. Singh, Jain. Information retrieval (IR) through semantic web (SW): an overview. arXiv:1403.7162, 2014, https://arxiv.org/abs/1403.7162

5. Zhong, Liu, Yao. Advances in web intelligence. Beijing, China: Higher Education Press, 2011.

6. Zhen, Jiang, Song. Distributed recommender for peer-to-peer knowledge sharing. Inform Sciences 2010; 180(18): 3546–3561.

7. Taylor. A framework for semantic sensor network services. In: Proceedings of the PhD Symposium at the 6th International Conference on Service Oriented Computing (ICSOC), Sydney, Australia, 1 December 2008, pp.347–361. Berlin: Springer.

8. Huang, Javed MK. Semantic sensor information description and processing. In: Proceedings of the 2nd international conference on sensor technologies and applications, Cap Esterel, 25–31 August 2008, pp.456–461. New York: IEEE.

9. Corcho, Calbimonte, Jeung, et al. Enabling query technologies for the semantic sensor web. Int J Semant Web Inf 2012; 8: 43–63.

10. Imai, Hirota, Satake, et al. Semantic sensor network for physically grounded applications. In: Proceedings of the 9th international conference on control, automation, robotics and vision, Singapore, 5–8 December 2006, pp.1–6. New York: IEEE.

11. Neuhaus, Compton. The semantic sensor network ontology: a generic language to describe sensor assets. In: Proceedings of the AGILE pre-conference workshop challenges in geospatial data harmonisation, Hannover, 2 June 2009, pp.1–33.

12. Compton, Henson, Lefort, et al. A survey of the semantic specification of sensors. In: Proceedings of the 2nd International Workshop on Semantic Sensor Networks (SSN09), collocated with the 8th International Semantic Web Conference (ISWC), Washington, DC, 26 October 2009, vol. 522, pp.17–32.

13. Barnaghi, Meissner, Presser, et al. Sense and sens’ability: semantic data modelling for sensor networks. Ict-mobilesummit Conference 2009; 2009: 818–819.

14. Corcho, García-Castro. Five challenges for the semantic sensor web. Semant Web 2010; 1(12): 121–125.

15. Calbimonte, Jeung, Corcho, et al. Semantic sensor data search in a large-scale federated sensor network. International Conference on Semantic Sensor Networks, 2011; 839: 23–38.

16. Panahi, Watson, Partridge. Towards tacit knowledge sharing over social web tools. J Knowl Manag 2013; 17(3): 379–397.

17. Baddar SAH, Merlo, Migliardi. Anomaly detection in computer networks: a state-of-the-art review. J Wirel Mob Netw Ubiq Comput Depend Appl 2014; 5(4): 29–64.

18. Stoica, Morris, Liben-Nowell, et al. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans Network 2003; 11(1): 17–32.

19. Buyukkaya, Abdallah, Simon. A survey of peer-to-peer overlay approaches for networked virtual environments. Peer Peer Netw Appl 2015; 8(2): 276–300.

20. Peng, Han. A consistent hashing based data redistribution algorithm. In: Proceedings of the international conference on intelligent science and big data engineering, Suzhou, China, 14–16 June 2015, pp.559–566. New York: Springer International Publishing.

21. Prud’Hommeaux, Seaborne. SPARQL query language for RDF. W3C Recommendation. 2008, p.15.

22. Franconi, Gutierrez, Mosca, et al. The logic of extensional RDFS. In: Proceedings of the international semantic web conference, Sydney, Australia, 21–25 October 2013, pp.101–116. Berlin: Springer.

23. Khan, Kumar. OWL, RDF, RDFS inference derivation using Jena semantic framework pellet reasoner. In: Proceedings of the international conference on advances in engineering & technology research, Unnao, India, 1–2 August 2014, pp.1–8. New York: IEEE.

24. Ameen, Khan KUR, Rani BP. Reasoning in semantic web using Jena. Comput Eng Intell Syst 2014; 5(4): 39–47.

25. Zave. Using lightweight modeling to understand chord. ACM SIGCOMM Comput Commun Rev 2012; 42(2): 49–57.

26. Cai, Frank. RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In: Proceedings of the 13th international conference on world wide web (WWW), New York, NY, 17–20 May 2004, pp.650–657. New York: ACM.

27. Pellegrino, Huet, Baude, et al. A distributed publish/subscribe system for RDF data. In: Proceedings of the 6th international conference on data management in cloud, grid and P2P systems, Prague, 28–29 August 2013, pp.39–50. Berlin: Springer.

28. Kohigashi, Takahashi, Harumoto, et al. A peer-to-peer information sharing method for RDF triples based on RDF schema. In: Proceedings of the IWANN 2009 Workshops, Salamanca, Spain, 10–12 June 2009, pp.646–650. Berlin: Springer.

29. Kaoudi, Kyzirakos, Koubarakis. SPARQL query optimization on top of DHTs. In: Proceedings of the international semantic web conference (ISWC), Shanghai, China, 7 November 2010, pp.418–435. Berlin: Springer.

30. Rizzo, Di Gregorio, Di Nunzio, et al. A peer-to-peer architecture for distributed and reliable RDF storage. In: Proceedings of the 1st international conference on networked digital technologies, Ostrava, 28–31 July 2009, pp.94–99. New York: IEEE.

31. D’Aquin, Noy NF. Where to publish and find ontologies? A survey of ontology libraries. Web Semant 2012; 11: 96–111.

32. Tian, Dai. SemanticPeer: an ontology-based P2P lookup service. In: Proceedings of the international conference on grid and cooperative computing (GCC), Shanghai, China, 7–10 December 2003, pp.464–467. Berlin: Springer.

33. Gao, Qiu, et al. An interest-based P2P RDF query architecture. In: Proceedings of the 1st international conference on semantics, knowledge and grid, Beijing, China, 27–29 November 2005, p.11. New York: IEEE.

34. Wang C-Y, Yang H-Y, Chou ST. Using peer-to-peer technology for knowledge sharing in communities of practices. Decis Support Syst 2008; 45: 528–540.

35. Ardissono, Bosio. Context-dependent awareness support in open collaboration environments. User Model User-Adap 2012; 22(3): 223–254.

36. Loizou SK. Intelligent support for knowledge sharing in virtual communities. Leeds: University of Leeds, 2010.

37. Zhang, Ackerman, Adamic. Expertise networks in online communities: structure and algorithms. In: Proceedings of the 16th international conference on World Wide Web (WWW), Banff, Alberta, Canada, 8–12 May 2007, pp.221–230. New York: ACM.

38. Yin, Deng, et al. Colbar: a collaborative location-based regularization framework for QoS prediction. Inform Sciences 2014; 265: 68–84.

39. Yin. Collaborative recommendation with user generated content. Eng Appl Artif Intell 2015; 45: 281–294.

40. Rui, Tao. Click prediction for web image reranking using multimodal sparse coding. IEEE T Image Process 2014; 23(5): 2019–2032.

41. Yang, Fei, et al. Deep multimodal distance metric learning using click constraints for image ranking. IEEE T Cybernetics 2016; 47: 4014–4024.

42. Mossakowski, Codescu, Neuhaus, et al. The distributed ontology, modeling and specification language (DOL). In: Koslow, Buchsbaum (eds) The road to universal logic. Springer International Publishing, 2015, pp.489–520.

43. Lee, Park, et al. An intelligent query processing for distributed ontologies. J Syst Software 2010; 83(1): 85–95.

44. Jiang, Tang, Wang, et al. Representation and reasoning of context-dependant knowledge in distributed fuzzy ontologies. Expert Syst Appl 2010; 37(8): 6052–6060.