Abstract
Keywords
Introduction
Ontologies are formal, explicit specifications of a shared conceptualization to represent domain knowledge, 1 which are widely applied recently to represent knowledge in the Semantic Web. So the ontologies are also called web ontologies. With the rapid development of the semantic network, some W3C Recommendations for web ontology such as RDF, OWL, 2 and SPARQL 3 are widely used. In fact, a large number of web ontologies can be discovered via Swoogle, 4 which is a search engine for web ontologies. These ontologies can serve for web-based recommendation systems and question answering systems.5,6 With the increase of practicality of ontology technology, it has been increasingly applied to sensor networks in recent years to provide query capabilities and data access and to allow users to express their needs at a conceptual level.7–9 These sensor networks are also called as semantic sensor networks.7,9,10 In general, a large number of web ontologies in semantic sensor networks are separately created by sensors to represent their own knowledge of a given domain.9,11–13 These ontologies in semantic sensor networks can also serve as a knowledge base for question answering systems shared by users of sensor networks.
However, an unsolved issue is how to efficiently discover or locate the required ontologies when a semantic query is given to a semantic sensor network. In recent years, a great deal of attention is paid to this issue in practice as well as in research.8,9,14,15 At present, most of the approaches to deal with this issue are based on a client–server (C/S) architecture. Within the approaches, knowledge resources, that is, ontologies, are all collected and stored in some centralized servers. Users can query and share the ontologies under some kinds of centralized control. These approaches are often considered ineffective and inappropriate to share knowledge6,16,17 because they do not meet the dynamic and autonomous requirements of knowledge management in semantic sensor networks.9,14 In this article, we introduce knowledge sharing in a decentralized architecture for semantic sensor networks based on existing peer-to-peer (P2P) protocols. These approaches typically organize the sensors with ontologies as an unstructured P2P network according to the similarity of their knowledge in ontologies located on different sensors. If a semantic query is given, these approaches try to route the query to the sensors, where the ontologies may be checked for answers for the query. There is a daunting task that these approaches need to calculate the similarity of knowledge in different ontologies on different sensors. In addition, unstructured P2P networks also limit their scalability and effectiveness.
In this article, we propose an approach to publish and automatically discover the relevant ontologies on different sensors for a given semantic query based on the structured P2P protocol. 18 In our approach, if a sensor has some shareable ontologies, it can directly publish them by structured P2P protocol according to the entities and their attributes appearing in the ontologies. If a sensor is given a semantic query, the approach can enable the sensors as a requestor agent to seek out the useful ontologies efficiently and then send the query to the other sensors, where at least one of the ontologies may be checked for answers for the query, respectively. The approach makes sure that sensors can find out all the related ontologies published for the given query. It provides the users for a semantic sensor network with the capability to automatically discover and share ontologies which are useful to process their semantic query. In this article, we conduct three experiments to evaluate the effectiveness and efficiency of our approach. The results of the three experiments show that our approach is efficient and effective.
The remainder of this article is organized as follows. Section “Basic ideas and overview of our approach” discusses the basic ideas of our approach. Section “Process of ontology location” presents the discovering process for useful ontology in a semantic sensor network. Section “Implementation of our approach” discusses the implementation of our approach. Section “Evaluation” presents our experiments. Section “Related work” addresses the related work. Section “conclusion” draws the conclusion.
Basic ideas and overview of our approach
In this section, in order to clearly describe the basic design ideas of our approach, we first discuss the related techniques, such as web ontology and P2P network. Then, we present an overview of our approach. To facilitate the description of our approach, without loss of generality, in the remainder of this article a sensor is also called a node.
Web ontology and basic idea for its publication
In accordance with W3C Recommendations, a web ontology is a formal description of a given domain, which defines some entities and their relationships to represent knowledge in a specific domain. In web ontology, the entities, which include properties, classes, and individuals, constitute the primitive terms to express the basic notions in the domain. For example, the entity
Essentially, web ontologies can be viewed as a group of entities. For a given ontology, we can take the entities and their properties as the indices of the ontology. Then according to each index we can publish and retrieve the ontology on a structured P2P network. This is our idea to publish and share web ontology on a semantic sensor network. We will address its specific implementation in detail in section “Overview of our approach.”
P2P networks and their application
P2P network systems usually have a large number of autonomous nodes, also known as peer nodes, which can be accessed directly in an open distributed environment. In the P2P system, each P2P node has the same degree of autonomy and can provide services to other nodes and obtain services from other nodes. Thus, a P2P system usually does not need any hierarchical organization or central control structure. It can be a good solution to centralized system performance bottlenecks and single point of failure. Usually, it is a scalable, high-performance service sharing architecture. 19 The current P2P systems are widely applied to file sharing, multimedia transmission, real-time communication, collaborative work, distributed storage, and distributed computing. According to different resource discovery mechanisms, P2P systems can be divided into two types: structured and unstructured P2P systems. Unstructured P2P systems typically organize autonomous nodes in the system into random topologies, where the edges in the graph represent the neighbor relationships between nodes. These systems often use flooding or random walking to find the shared resources provided by other nodes on the topology map.
But the efficiency of the unstructured P2P systems is not high in a large-scale network environment. Structured P2P systems typically organize autonomous nodes into an ordered graph in the system and specify a node that is responsible for the distribution of the resource for any of the shareable resources on any node according to the resource’s attributes. Therefore, the structured P2P system can find the node responsible for the information released according to the attributes of the desired resource. This kind of P2P system can provide a very efficient resource sharing mechanism, suitable for large-scale network application environment.
Chord
18
is a typical structured P2P system, which uses the consistent hash
20
to specify a unique value as the key for each autonomous node. Then, it organizes the nodes as a ring according to the order of their keys. For any shareable resource
Thus, if the resource with property
With advantages of resource publication and discovery, here we apply Chord to our approach for ontology publication and location on a semantic sensor network. So, first we design two functions according to Chord protocol as follows:
publicOnto(
locateOnto(
Overview of our approach
In this article, our approach takes the sensors in a semantic sensor network as nodes to construct a Chord P2P network and together maintain the information of ontology publication to discover relevant ontologies for SPARQL query processing. If any node with some ontologies joins to the P2P system, it can publish its shareable ontologies as follows:
For each ontology
For each entity
Once a node
First, the node
According to the indices and the function locateOnto(
For each located ontology
The node
Finally, node
The specific methods to create the indices for ontologies and SPARQL query for publication and location and the specific strategies to discover ontologies for SPARQL query processing will be discussed in detail in the following section.
Process of ontology location
In this section, first we discuss the basics of SPARQL and our basic ideas of creating indices for SPARQL query to discover the related ontologies. Then, we address our process to publish ontologies and our algorithm to discover ontologies for a given query.
SPARQL basics
As a recommendation for the W3C, SPARQL 3 is a query language and data acquisition protocol designed for the RDF data model, 21 which can be used to query any data represented by the RDF data model. Since web ontologies in Semantic Web which are compliant to OWL 2 or RDFS 22 are built on the RDF data model, web ontologies can be queried by SPARQL query.
Each SPARQL query has a matching pattern that consists of one or more pattern clauses. Any matching pattern of SPARQL query can be automatically transformed into a semantic equivalent matching pattern, in which the pattern clauses are all triples and also called triplet pattern. 3 For example, in Figures 1 and 2 the conversion is shown. The triples in the triplet pattern are similar to RDF triples, except that the predicate, subject, or object may be replaced by the variables to be queried.

Matching pattern with initial pattern clauses in a SPARQL query.

Triplet pattern of matching pattern in Figure 1.
A matching pattern of a query is also called a graph pattern which is used to match a subgraph of the RDF model being queried. In the matching process, RDF terms in subgraph are substituted for the variables in the corresponding graph pattern. The solutions of the query are parts of the RDF graph. In a SPARQL query, graph pattern can be divided into the following five different categories: Alternative Graph Pattern, Basic Graph Pattern, Optional Graph pattern, Named Graph Pattern and Group Graph Pattern. The complex graph pattern in the SPARQL query is nested by a combination of some basic matching patterns. Thus, SPARQL can express any complex graph pattern. In a nested pattern of a SPARQL query, the type of the outermost matching pattern is called the master graph pattern of the query.
Basic Graph Pattern is a matching pattern consisting of one pattern clause to be matched, while Group Graph Pattern usually consists of several pattern clauses to be all matched. Thus, each entity appearing in the matching pattern of a query with Basic Graph Pattern or Group Graph Pattern must appear in an RDF graph if any answers can be checked out from the RDF graph for the query. In addition, if the graph pattern of the SPARQL query is converted into a triplet pattern, for any RDF terms and their roles in a specific triple, the RDF graph must have at least one triple specified or implied, where the RDF term is the same role triple. This is an important clue to discover useful ontology for a query to be processed.
Optional Graph Pattern usually contains an optional pattern to extend the solutions for the pattern’s query. Named Graph Pattern is used to designate the RDF graphs to be matched against. Thus, these patterns cannot provide useful clues to discover ontologies useful for its query. Alternative Graph Pattern usually has two or more alternative patterns to be tried. To mine the specific clues to discover ontologies useful from Alternative Graph Pattern, we should decompose Alternative Graph Pattern in a SPARQL query. In fact, if a SPARQL query with an Alternative Graph Pattern is given, we can break it into several sub-queries without Alternative Graph Pattern. The process to break the query with Alternative Graph Pattern is listed as follows:
Convert the query’s graph pattern into a semantically equivalent graph pattern, where Alternative Graph Pattern is the query’s outermost graph pattern. For example, in Figures 3 and 4 the conversion is shown.
Take each alternative part of the query pattern as an independent graph pattern to create a new sub-query for the initial query.

A query pattern with Alternative Graph Pattern.

A query pattern with Alternative Graph Pattern as outermost graph pattern.
There is no doubt that, if a query
Given query
Index creation for SPARQL query
According to the semantics of SPARQL query discussed above, we can design a method to create indices for a given SPARQL query to discover useful ontologies which can be checked out for the query as answers. In our method, we just focus on the SPARQL query without Alternative Graph Pattern because a query with Alternative Pattern can be broken into several semantic equivalence sub-queries without Alternative Pattern to substitute for the query as discussed above. The method consists of the following steps:
So, given a triple pattern of a query we can get a set of pairs, called {<rdf:type, predicate>, <dcc:textbook, object>, <dcc:title, predicate>, <“Semantic Web Tutorial”, object>, <dcc:author, predicate>, <dcc:team, object>}.
Here, each pair in
Theorem
If an ontology is likely to be checked out as a part of solutions for a given query, for each pair
For any pair
Publication of web ontology
To efficiently discover ontologies according to the indices created for a given SPARQL query and the conclusion as addressed above, we can design a method to publish an ontology based on entities and their possible roles appearing. The method is as follows:
Location of web ontology for a given query
According to the method of ontology publication in section “Publication of web ontology” and conclusion in section “Index creation for SPARQL query,” we can design algorithm

The algorithm
In this algorithm,
In practice, the import relationship may exist between two ontologies. An ontology may be imported by another ontology as an independent part. If the ontologies
To reduce the redundant ontology, we must remove the ontologies which are directly or indirectly imported by other ontology in the useful ontology set of a query. So, if the URLs of the ontologies directly imported by ontology

The algorithm to reduce the redundant ontology.
In this algorithm, the function locateOnto(
In the formula, |
Implementation of our approach
In this section, we design the architecture for our approach in a sensor node and outline the functionality of each component in the architecture. As shown in Figure 7, it consists of one Ontology Repository and six brokers, which are as follows:

Our approach’s architecture in a sensor node.
In order to evaluate our approach, we implement the architecture of our approach. We suppose that all the ontologies in a semantic sensor network are in compliance with OWL standard. We use the open source development kits Pellet, 23 Jena, 24 and Open Chord 25 for our implementation. Jena is a Semantic Web framework to provide a Java application programming interface (API) to extract data from and write to RDF graphs. Pellet is a Java-based OWL description logic (DL) reasoner in conjunction with both Jena and OWL API libraries. Open Chord is an implementation for Chord protocol, which provides an interface for Java applications to take part as a peer for P2P network.
Evaluation
In this section, we design three experiments to evaluate our approach’s ability, effectiveness, and efficiency as follows:
The first experiment is designed to evaluate our approach’s ability to achieve solutions for a given query in a given context.
The second experiment is designed to evaluate our approach’s requirement of network resources for ontology publication.
The third experiment is designed to evaluate our approach’s requirement of network resources for query processing.
Network resources in this article are the number of times to access the P2P network and the quantities of the items publishing on or retrieving from the P2P network when ontology on the sensor node is published, or a semantic query is processed.
Experiment setup
In order to distributedly store and share an RDF model based on the structured P2P network, several approaches26–30 are presented as distributed RDF repositories, such as RDFPeers, 26 which can save a triple at three nodes by applying hash functions to its predicate, subject, and object, respectively. All the nodes in the P2P network know which nodes are responsible for the triples they are searching for according to their predicate, subject, or object. If the triples exist in our semantic sensor network, the approach ensures that the RDF triples will be found. Based on the ideas above, we can redesign two methods to publish ontologies and process SPARQL query as follows:
The first method (M2) is that, if a web ontology is given, it just publishes all the specified RDF triples explicitly in the ontology. Then, for a SPARQL query, from the P2P network it should retrieve the connected sub-graph of each entity in the query’s graph pattern so as to create a temporary web ontology to process it.
The second method (M3) is that, if a web ontology is given, it publishes all the RDF triples specified and implied in the web ontology. Because all the possible RDF triples are published, based on the entities in the graph pattern in a SPARQL query, it can obtain all the related RDF triples. Then a temporary web ontology is created for the query, and it can reason out all the possible results for the query.
Similar to our approach (M1) in this article, if a SPARQL query is given and a web ontology is published to reason out solutions for it, approach M2 and approach M3 guarantee to achieve the solutions. As applications based on distributed P2P network always consume network resources, such as network accessing, items publishing or retrieving from the network, it will be superior if an application consumes fewer network resources for a same task. Thus, in our experiments the performance of our approach M1 is compared with that of the approaches M2 and M3 based on the following aspects: when an ontology is published, how many times they should access the P2P network, and how many items they should publish, respectively; and when a query is processed, how many times they should access the network, and how many items they should retrieve, respectively.
In addition, in our experiments first 16 ontologies are downloaded from the ontology repository TONES 31 as our experimental data listed in Table 1.
Web ontologies downloaded from the ontology repository TONES.
Then 15 SPARQL queries are designed with the individuals, classes, and properties, which are all entities in ontologies in Table 1. Without loss of generality, each query’s graph pattern consists of three triples and three variables, the style of which is shown in Figure 8.

Style of graph pattern of each query.
In our experiments, we do not discuss the experimental environment in detail because our experiments just focus on the number of times to access the P2P network and the quantities of item publication or retrieval, which do not involve in concrete experimental environment.
Analysis of results
Totals of each query’s solutions obtained by our approach.
Then, we publish all the ontologies in Table 1 by approach M1 and process each SPARQL query in Figure 8. For each query, we counted the number of solutions that our approach M1 gets, which are shown in row M1 in Table 2. In Table 2, the figures in row M1 are identical to the corresponding figures in row M. This means that our approach ensures that all ontologies that match the query will be checked out as the answer for the query.
The results of this experiment are shown in Figure 9. The bars M1, M2, and M3 in Figure 9 denote their numbers of items to be inserted into the P2P network (i.e. the number of times to access the P2P network) when a corresponding web ontology is published using the approaches M1, M2, and M3, respectively.

Numbers of network accesses for ontology publication with the approaches M2 and M3 to M1.
Figure 9 shows that the number of accesses to the network for the approaches M2 and M3 are always far more than that of the approach M1 when publishing web ontologies. The multiple is shown in Figure 10. For approach M2, the range of multiples is between 1.09 and 16 which has an average value of 4.82; for approach M3, the range is between 6.522 and 112.7 which has an average value of 49.36. It means that the approach M1 can save a lot of network resources on the publication of ontologies. The reason is that, ontology publication using approach M1 is just based on entities in ontology, while approach M2 must publish each specified triple three times and approach M3 must publish all of the possible triples three times too. As a knowledge base, an ontology usually contains a mass of triples that must be published by M2 and M3. Especially, it can reason out more RDF triples implied, which must be published by M3.

Multiples of items published for each ontology from M2 and M3 to M1.
Requirement of network resources for query processing with approaches M1, M2, and M3.
From Table 3, we can find out that the approaches M2 and M3 usually need more number of accesses to the P2P network than approach M1 for query processing. The range of the multiples from approach M2 to M1 is between 3.67 and 345.8 which has an average value of 75.5. It is because approach M2 has to access the network iteratively according to the entities in the RDF sub-graphs required when getting the useful sub-graphs for a query, while approach M1 just uses the entities in the query to discover the desired ontologies. On the other hand, the range of the multiples from approach M2 to M3 is between 0.8 and 1.25 and has an average of 1.08. As a matter of fact, the numbers of approach M1 in Table 3 are almost identical to the corresponding ones of approach M3. Their differences are slight. It is because, if a query is given, approaches M1 and M3 are all just according to the entities in the query to locate the ontologies desired or retrieve relevant triples. But approach M1 usually ignores RDF terms of RDF, RDFS, or OWL vocabularies, while approach M2 cannot omit them. However, approach M1 must retrieve ontologies imported for each ontology discovered, while approach M2 does not need to do so. Thus, their numbers of times to access the network for a query do not show very large difference.
In addition, Figure 11 illustrates the numbers of the items retrieved from the network according to different approaches. The bars M1, M2, and M3 in Figure 11 denote the numbers of the items retrieved with the approaches M1, M2, and M3, respectively, when a SPARQL query is processed. Figure 10 indicates that the approaches M2 and M3 always retrieve far more items than the approach M2 for SPARQL query processing.

Numbers of items retrieved from the network for each query using the approaches M1, M2, and M3.
Given a query, the approaches M2 and M3 actually need many times the items to be retrieved than approach M1. In Figure 12, the multiples are shown. The line M2 shows that the range of the multiples is between 17.33 and 2136.8 and has an average value of 410.18. The line M3 shows that the range is between 45.8 and 447.25 and has an average of 294.89.

Multiples of numbers of items retrieved for each query from M2 and M3 to M1.
It can be seen that the amount of data the approach M1 needs to retrieve from the network is less than that of the approaches M2 and M3. This is because the approaches M2 and M3 have to get all triples related to each entity in connected sub-graphs desired, while the two approaches have published all the triples specified or implied according to each entity in the ontologies published, and a large number of triples will be retrieved for a query.
Based on the three experiments, we can find that our approach is very efficient and effective. When publishing an ontology or processing a query, it just needs only a small amount of network access and a small amount of data publication and acquisition. This means that it only consumes less network resources. It is obviously superior.
Related work
In recent years, semantic sensor networks are proposed and apply ontology to provide query capabilities and data access. At present, research on semantic sensor networks is very extensive. Li and Taylor 7 offer a semantic service-oriented framework for sensor network services that aims to improve query processing. Their framework focuses on query processing to allow distributed end users to request streams of interest easily and efficiently, based on the principle of pushing the query down to the network nodes as much as possible. Corcho et al. 9 propose an ontology-based approach for providing data access and query capabilities to streaming data sources. In addition, they describe the theoretical foundations and technologies that enable exposing semantically enriched sensor metadata, and querying sensor observations through SPARQL extensions, using query rewriting and data translation techniques according to mapping languages, and managing both pull and push delivery modes. Huang and Javed 8 propose a Semantic Web architecture to allow the sensor data to be understood and processed in a meaningful way by a variety of applications with different purposes. In the architecture, they develop ontologies for sensor data and use the Jena API for processing which includes querying and inference over the sensor data. Barnaghi et al. 13 propose the common standards and logical description frameworks based on the Semantic Web community to create a sensor data description model. In the frameworks, they design a sensor data ontology which is created based on the Sensor Web Enablement (SWE) and SensorML data component models and describe how the semantic relationship and operational constraints are deployed in a uniform structure to describe the heterogeneous sensor data.
In addition, technically similar to the research work in this article, distributed approaches for knowledge sharing on Semantic Web, especially based on a P2P network, are attracting more and more attention in recent years. Tian et al. 32 present the approach named Semantic Peer based on P2P techniques, which provide an ontology-based P2P lookup service. First, it divides ontology into private and common ontologies. Then they extend the Chord protocol with an index table and an express table to publish resources described in private and common ontologies, respectively. Given semantic query, it can route to a node with a private ontology which can be used to process the query. Similarly, Gao et al. 33 present a similar approach which can find out the node’s responsibility for the kind of resource according to the classification of a target resource. But these approaches depend on a common ontology or a unified classification of resources.
The method proposed in this article is completely different from the above approaches. It does not rely on common ontologies or a given proposal for ontology description when publishing web ontologies on a structural P2P network, but just based on the entities being described in these ontologies.
Moreover, Cai and Frank 26 and Pellegrino et al. 27 present a scalable distributed RDF storage and sharing approach, which publishes each RDF triple at three nodes on a structural P2P network by applying hash functions according to its predicate, subject, and object. Therefore, the method ensures that all queries can find the desired triples. However, if RDF triples are directly published, it does not support semantic retrieval because implicit triplets will not be inferred. To address this issue, Kohigashi et al. 28 focus on class hierarchies of RDF resources. But their approach looks for RDF triples just based on the class hierarchies and some problems such as load balance and reliability will be encountered when RDF triples are distributed on a P2P network. For this issue, Z Kaoudi et al. 29 and G Rizzo et al. 30 presented methods for distributed and reliable RDF storage, respectively. But they do not focus on semantic query processing. Although these approaches can make sure that RDF triples in different web ontologies can be directly shared but it is difficult for processing semantic query with these approaches.
In fact, many works have been carried out that just focus on distributed management and sharing of knowledge based on unstructured P2P network in a virtual community. Wang et al. 34 propose an approach for knowledge sharing in a virtual community. Their approach organizes the nodes with ontologies as an unstructured P2P network. When a node receives a semantic query, it will send the query to its neighbors until the desired ontologies are discovered for the query. Similar approaches are also discussed in previous works,6,35–37 which are all based on unstructured P2P. The differences among them are the strategies to construct their unstructured P2P. But they all try to create a connection between the nodes with similar knowledge so as to improve semantic query’s routing efficiency. However, unstructured P2P also limits their efficiency to locate ontologies due to their routing mechanism. Yin et al. 38 and Xu and Yin, 39 respectively, discuss the collaborative location sharing of content and services. In a broader field, Yu and colleagues40,41 use a learning-based approach to classify and share information resources.
In fact, distributed ontology techniques42–44 are presented and discussed in recent years. But they just focus on web ontology integration, mapping, and distributed reasoning.
Conclusion
In this article, we propose an approach that is based on a structured P2P network to publish and discover shareable and related ontologies and to process SPARQL queries for knowledge sharing in a semantic sensor network automatically. For a given SPARQL query, our approach makes sure to find out all the shareable ontologies published in a semantic sensor network, which can be reason out solutions for the query. So the semantic sensors in our approach can publish their ontologies to share with other sensors and discover useful ontologies from other sensors in the network when processing their SPARQL queries. It is particularly suitable for knowledge sharing in a semantic sensor network, where the numbers of ontologies are created, respectively, by sensors independently and shared and leveraged for their own purposes.
We also conducted three experiments to evaluate the approach’s effectiveness and efficiency. Our experimental results show that it is obviously superior. In the near future, we plan to continue our research work in the following aspects:
SPARQL query’s graph pattern should be investigated further so as to find out other clues to more efficiently locate the desired ontologies rather than the entities and their roles appearing in it.
It should be tried to map words in natural language to existing entities in ontologies to facilitate users to automatically generate their semantic queries.
