Abstract
Introduction

The paper’s contributions. A three-fold perspective on the knowledge representation efforts for RDF Stream Processing, based respectively on the FAIR principles, a meta [C]o[NC]e[PT]ualization, and the [C]ommon [E]vent [M]odel.
In recent years, the Semantic Web community has witnessed a growing interest in streaming data for application domains that combine Data Variety (i.e., highly heterogeneous data sources) with the need to process data as soon as possible, before they are no longer useful (Data Velocity). Examples of such application domains include Smart Cities, Industry 4.0, and Social Media Analytics. Stream Reasoning (SR) [29] is a research initiative that combines Semantic Web and Stream Processing technologies to address both challenges at the same time. SR counts several research outcomes that span Continuous Querying, Incremental Reasoning, and Complex Event Recognition [31]. RDF Stream Processing (RSP) is a subarea of SR that focuses on the processing of RDF Streams [65]. In particular, the research activities around RSP include a growing number of applied research works, owing to the availability of working prototypes, benchmarks, and libraries [49] that, in turn, spawn research on Streaming Linked Data (SLD) [67,70].
As data streams become more available on the Web, the community has started discussing best practices to publish data streams in an interoperable manner. To this end, the FAIR data initiative is promising. Indeed, Tommasini et al. reinterpreted some of the steps of the linked data lifecycle to answer the question “
Tommasini et al. consider several resources published under the SR umbrella. A number of works emerged that show how to access and process data streams on the Web [49]. Even though a number of domain-specific ontologies have been used in SLD applications, little has been done regarding the data modelling and knowledge representation efforts that SLD applications entail.
In this paper, we dig deeper into this claim by surveying the related literature and isolating such efforts. In particular, we investigated research papers that apply RSP, i.e. a subset of SR, as a solution. Like in similar works, we systematically select the papers, defining inclusion criteria and filtering methods. We extracted the ontologies used in these selected papers to model the data streams. We study such ontologies from three perspectives: (i) A
Figure 1 summarizes our three-fold perspective, designed to highlight different aspects concerning knowledge representation for SLD by progressively zooming in. Indeed, higher levels offer a broader analysis than the ones below, encouraging a holistic view of the central concepts, i.e., Data Streams and their interrelations (30k), the classes and properties characterizing the content of data streams (10k), and the structure of the event as the unit of information that populates the streams (1k).
This section presents the fundamental notions needed to understand the paper’s content. In particular, we present the survey methodology and the Streaming Linked Data lifecycle.
Survey methodology
Our survey follows the guidelines of the systematic mapping research method [21], which has already been used successfully for surveys in the Semantic Web [55]. In particular, our investigation aims at answering the following research question (RQ):
The integration of heterogeneous data is a significant part of Semantic Web Research. In addition, RQ1 includes two main components, i.e.,
To collect relevant studies, we initially conducted a keyword-based search on Google Scholar, IEEE Xplore, and ScienceDirect, and investigated the citations of the retrieved papers to find further relevant studies. We used the following keywords to retrieve 620 papers:
Stream Reasoning
RDF Stream Processing
Streaming Linked Data
Linked Stream Data
Incremental Reasoning
Ontology AND Streaming/Dynamic
Ontology AND Event
Observation AND Ontology
The next steps of our collection apply a number of filters to reduce the number of papers and narrow the analysis. To this end, we identified the inclusion criteria (IC) listed below. Notably, IC1-4 are based on the papers’ metadata, while IC5 and IC6 are content-based.
papers should be written in English
papers should be peer-reviewed
papers should be published in the last 10 years,
papers should have at least 10 citations.
papers should
papers should present/reuse a domain-specific ontology to model the data in the processed streams,
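The metadata-based criteria (IC1-4) lend themselves to a mechanical filter. The following Python sketch is purely illustrative: the `Paper` record, its field names, the sample entries, and the reference year are assumptions made for the example and do not describe the tooling actually used in this survey. The content-based criteria (IC5 and IC6) require reading the papers and are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record; the field names are assumptions for this sketch.
    title: str
    language: str
    peer_reviewed: bool
    year: int
    citations: int


def passes_metadata_criteria(paper, current_year, max_age=10, min_citations=10):
    """Return True if the paper satisfies the metadata-based criteria IC1-4."""
    return (
        paper.language == "English"                 # IC1: written in English
        and paper.peer_reviewed                     # IC2: peer-reviewed
        and current_year - paper.year <= max_age    # IC3: published in the last 10 years
        and paper.citations >= min_citations        # IC4: at least 10 citations
    )


# Made-up candidates, one failing each of IC2, IC3, and IC4.
candidates = [
    Paper("Hypothetical paper A", "English", True, 2019, 42),
    Paper("Hypothetical paper B", "English", False, 2020, 15),
    Paper("Hypothetical paper C", "English", True, 2005, 300),
    Paper("Hypothetical paper D", "English", True, 2021, 3),
]

# The reference year 2023 is an assumption of this example.
selected = [p.title for p in candidates if passes_metadata_criteria(p, current_year=2023)]
print(selected)
```

Keeping the thresholds as parameters makes it easy to check how sensitive the resulting selection is to, e.g., the citation cut-off.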
Like in [55], we apply
Our analysis identified 32 papers from which we extracted 10 ontologies. The extracted ontologies are commonly used in one or more of the identified papers. The last step of our analysis was dividing the ontologies into two groups. The first group addresses SLD from a publication/discovery standpoint. Given the abstract view, we name this group the Thirty-Thousand Foot View. The second group looks at SLD from a processing standpoint, which is a lower level of abstraction. Therefore, we name this group the Ten-Thousand Foot View. We also notice that, within the latter group, there is an even lower level of abstraction, which we call the Thousand Foot View and which concerns the representation of data points within the streams. Figure 2 visualizes the selection process, while Table 1 lists the selected ontologies, their prefixes, the views they cover, and the papers they originated from.

Collection and filtering methodology visualized.
Ontologies for Streaming Linked Data: summary. (✓: supported,
In this section, we present some essential concepts that will recur throughout the remainder of the paper.
This paper also focuses on research works that leverage time as a measure of
Such works focus on abstractions such as streams or events. The former represents unbounded yet ordered data using non-strict temporal ordering, which is leveraged to define the processing semantics. In these regards, we say time plays the role of
The latter, i.e.,
Last but not least, it is worth mentioning
Streaming Linked Data

Streaming Linked Data life-cycle from [17].
To address this issue, a unified formalization of continuous query processing over RDF streams, known as RSP-QL, was introduced in [30], alongside the RSP4J library [65]. The former integrates the evaluation semantics of continuous queries over RDF streams with the operational semantics of windows, enabling the characterization of existing SPARQL extensions for continuous querying. The latter aims at unifying existing RSP systems via a single API inspired by RSP-QL primitives. Together, they helped push the state of the art through the formalisation and prototyping of new languages [61] and systems [57].
This section details the selected SR ontologies that will be investigated using the proposed Thirty-Thousand, Ten-Thousand, or Thousand Foot View.
Foundational ontologies
Summary of foundational ontologies
We first describe four general ontologies that are frequently imported into the SR ontologies we discuss later. Moreover, we highlight the parts of their conceptualizations that are relevant for understanding the content of the paper and summarize them in Table 2.
When surveying the literature, we found that the following ontologies are being used for the description and modelling of streaming data as Web resources:
The
The
The
We additionally identified the following prominent ontologies used in applied RSP research, and we investigate their structure and internals when used for knowledge representation in stream reasoning applications:
The
The
The
The
The
The
All surveyed ontologies, their prefixes, and the views they cover are summarized in Table 1. Figure 4 visualizes the dependencies between the selected SLD ontologies and the imported concepts or complete ontologies that they share. Certain SLD ontologies do not import a whole ontology but only a limited subset of its concepts. This is visualized with a full dependency arrow in Fig. 4, while complete imports of ontologies are visualized with dashed arrows. Note that the figure only depicts overlapping imports, i.e., imported ontologies shared by at least two ontologies. Ontologies imported by a single SLD ontology are omitted to keep the figure readable.

Overview of dependencies between the selected SLD ontologies and the imported concepts/ontologies they share.
The Thirty-Thousand Foot View for SLD observes data streams as Web resources, i.e., the fundamental building blocks of the World Wide Web, and focuses on their metadata, governance, and provenance. Therefore, we reformulate our research question as follows:
Only four of the ten selected ontologies have the notion of data streams as Web resources; the others are not included in this discussion. These four ontologies are VoCaLS, SAO/CES, LDES, and IoTStream.
Analysis framework
Our analysis builds upon the preliminary adaptation of the FAIR principles proposed in [67]. The original FAIR Principles [73] are reported below:
Notably, the Thirty-Thousand Foot View does not aim at assessing whether existing ontologies follow the FAIR principles themselves (a similar effort has been made in previous research [53]). Instead, the analysis investigates whether existing ontologies allow sharing FAIR streaming data on the Web. The analysis focuses on the ontological level and its (potential) applications. Definition 1 introduces the notion of Web Stream, which is a prerequisite for identifying streams on the Web.
Definition 1 captures the double nature of Web Streams, which are resources themselves (indeed, they are identifiable) but also “contain”, i.e., refer to, other resources on the Web. Such a two-fold nature extends to the data and metadata levels. Therefore, we can distinguish between stream-wide and event-wide (meta)data, which relate to the stream resource and its content, respectively [66]. Stream-wide (meta)data contain information about the whole stream, for instance, who the publisher is or a list of known consumers; on the metadata level, we find the date when the stream was first issued, descriptive statistics about the data, or the formats in which the stream is available. Event-wide (meta)data concern each Web resource within the stream. For instance, a resource can refer to a domain-specific entity, which in turn depends on where the stream originates (e.g., for an IoT stream monitoring the location of people, an entity can be a given Point of Interest or a person). Event-wide metadata describe the event order, duration, or location. Notably, the punctuation mechanism needed to enable continuous processing is usually based on time. However, it can be generalised to any Boolean predicate related to order that leverages event-wide metadata [68].
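To make the distinction between stream-wide and event-wide (meta)data more concrete, the following Python sketch models a Web Stream as an identifiable resource that carries its own metadata and refers to a sequence of events, each with event-wide metadata. It also illustrates punctuation as an arbitrary Boolean predicate over event-wide metadata. All class names, field names, IRIs, and values are assumptions introduced for this illustration and are not taken from any of the surveyed ontologies.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    resource: str   # identifier of the Web resource carried by the event
    metadata: dict  # event-wide metadata, e.g., order, duration, location


@dataclass
class WebStream:
    iri: str        # the stream is itself an identifiable resource
    metadata: dict  # stream-wide metadata, e.g., publisher, issue date, formats
    events: list = field(default_factory=list)


def punctuate(events, boundary):
    """Split events into finite chunks.

    `boundary(prev, curr)` is any Boolean predicate over the event-wide
    metadata of two consecutive events; a chunk is closed when it holds.
    """
    chunks, current = [], []
    for event in events:
        if current and boundary(current[-1].metadata, event.metadata):
            chunks.append(current)
            current = []
        current.append(event)
    if current:
        chunks.append(current)
    return chunks


# Hypothetical stream with made-up IRIs and metadata values.
stream = WebStream(
    iri="http://example.org/streams/people-location",
    metadata={"publisher": "http://example.org/org/acme", "issued": "2023-01-01"},
    events=[
        Event(f"http://example.org/obs/{i}", {"time": t, "seq": i})
        for i, t in enumerate([1, 4, 12, 15, 27])
    ],
)

# Time-based punctuation: close a chunk whenever the 10-unit time bucket changes.
by_time = punctuate(stream.events, lambda prev, curr: prev["time"] // 10 != curr["time"] // 10)

# Order-based punctuation without time: close a chunk every three events.
by_count = punctuate(stream.events, lambda prev, curr: curr["seq"] % 3 == 0)

print([len(c) for c in by_time], [len(c) for c in by_count])
```

The same `punctuate` function serves both cases; only the predicate changes, which is the sense in which punctuation generalises beyond time.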
Summary of the thirty-thousand-foot view, i.e., compliance of the selected ontologies (top) with FAIR principles (left) and our analysis dimensions (left) (terminological level only). Legend: ◇ = possible; ✓ = supported;
partially supported; [S]tream; [E]vent; [G]eneral; [D]escriptive; [C]ontext; [P]rovenance; [I]ndexing; [U]nordered; [N]ot [A]pplicable
We now analyze the selected ontologies w.r.t. the FAIR data principles. While Table 3 summarises the answers to the individual principles, we organize the discussion along the following dimensions by answering the related questions:
D.1
D.2
Among the selected ontologies, only five have a conceptualization that can be coherently aligned with Web Streams and, thus, allow representing stream-level data.
Regarding metadata,
Finally, all the selected ontologies use OWL (Frappe, VoCaLS, SAREF, SAO/CES, IoTStream, SSN/SOSA) or RDFS (Activity Streams, SIOC) as ontological languages to implement their formalization.
D.3
All the selected ontologies support and encourage using RDF (Streams) to represent data and metadata (F3). However, not all focus on the stream and event levels.
D.4
Among the selected ontologies, only
Moreover,
Regarding provenance (R2), all the ontologies, except for LDES, which is not focused on processing, include dedicated classes and properties for tracking the provenance of streaming analysis, i.e.,
Finally,
D.5
Linking across resources is essential to the Semantic Web and, more generally, to interoperability. The FAIR principles also encourage it, which translates, at the ontological level, into the explicit possibility of linking to external resources (outside the (meta)data semantics). Not all the ontologies support it explicitly, but only
Unfortunately, there is no way to verify whether the linked resources follow the FAIR principles by only looking at the ontological level. However, if we limit our indirect assessment to the selected ontologies, any interlinked Stream that reuses a combination of the selected ones would be FAIR.

Combination of VoCaLS with SAO and SSN ontologies to increase FAIR coverage. Prefixes omitted.
It is important to note that not every ontology needs to cover all aspects. It is possible to combine ontologies with different capabilities to obtain complete coverage. A combination of VoCaLS with SAO and SSN was already explored in the original VoCaLS paper [63] and is reintroduced in Listing 1. We use the SOSA/SSN vocabularies to represent the source device and the observation data it produces, and SAO to describe information about the output of a stream observation, in addition to capturing the stream and streaming services metadata. The listing reflects an interpretation of Table 3, which shows that combining VoCaLS with complementary ontologies such as SAO or IoTStream can increase the FAIRness of the streams.
From our discussion emerges a clear need for greater emphasis on adhering to the FAIR principles and addressing the challenges specific to stream reasoning, ensuring that data streams are not only analyzed in real-time but are also readily discoverable, accessible, interoperable, and reusable for both current and future research and applications.
When modelling an ontology for SLD, the primary goal should be to maximize FAIR coverage. The rapid development of SLD technologies has led to these aspects being overlooked. Indeed, it is not uncommon for a single ontology in this domain to fall short of meeting all the FAIR principles (see Table 3). In such cases, it is advisable to combine multiple ontologies to bridge these gaps and maximize FAIR coverage collectively, thereby enhancing the effectiveness of stream reasoning systems.
Maximize FAIR coverage in new design; Combine ontologies to maximize FAIR coverage not just for domain modelling compliance;
Ten-thousand foot view: Streams’ structure

Streaming Linked Data abstractions.
The Ten-thousand Foot View focuses on the ontological level and analyses the nature and nurture of the conceptualization of the selected ontologies used for representing streaming data within a given domain.
According to our Thirty-Thousand Foot View analysis (see Table 3), only eight of the ten selected ontologies describe concepts to represent the streaming data at the event level. These eight ontologies include SSN/SOSA, SAREF, IoTStream, SIOC, LODE, ActS, Frappe, and SAO/CES. The other ontologies are not included in this discussion.
In the related literature [3,31,49], dynamic data are typically divided into two kinds of abstractions, i.e., unbounded time-ordered data a.k.a.
SLD focuses on query answering over RDF Streams, i.e., Continuous Computations (see Definition 3) that assume the form of Continuous Queries (CQ), which are a special class of queries that listen to updates and allow interested users to receive new results as soon as data becomes available.
On the other hand, Time-varying abstractions represent the result of Continuous Computations and, as the term suggests, capture the changes that occur to data as a function of time. Definition 4 formalizes the notion and specializes the definition.
Many extensions of SPARQL exist [31] to perform Continuous Queries over RDF Streams, and the RSP-QL [30] reference model aims at unifying the formal semantics of existing SPARQL extensions. Its abstractions are shown in Fig. 5. A common aspect of these languages is the notion of windowing, which enables stateful computation over a stream. Window Operators, a.k.a. Stream-to-Relation (S2R) operators, chunk the stream into finite portions over which computations can terminate. Once windows are applied, operators that involve Time-varying abstractions can be traced back to their original versions, which are applicable to static data (R2R). Finally, the class of operators that transforms Time-varying data back into streams is called Relation-to-Stream (R2S). According to RSP-QL, a Time-varying RDF Graph results from applying a window operator over a stream.
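The interplay of the three operator classes can be illustrated with a small Python sketch. It is a deliberately simplified model and not an implementation of RSP-QL: windows are time-based with a fixed width and slide and start at time 0, every window is reported once, the R2R step is a plain filter rather than a SPARQL query, and the only R2S operator shown is one that emits all results of each evaluation (in the spirit of RStream). The function names, the sample observations, and the threshold are assumptions made for this example.

```python
def s2r(stream, width, slide):
    """S2R: chunk a time-ordered stream of (timestamp, triple) pairs into
    finite windows [start, start + width); yield (window_end, triples)."""
    if not stream:
        return
    last = stream[-1][0]
    start = 0
    while start <= last:
        end = start + width
        yield end, [triple for t, triple in stream if start <= t < end]
        start += slide


def r2r(graph, threshold):
    """R2R: an ordinary computation over static data; here, select the
    subjects whose simple result exceeds the threshold."""
    return [s for s, p, o in graph if p == "sosa:hasSimpleResult" and o > threshold]


def r2s_rstream(evaluations):
    """R2S: turn the time-varying results back into a stream by emitting
    every result of every evaluation, annotated with the evaluation time."""
    for time, bindings in evaluations:
        for binding in bindings:
            yield time, binding


# Made-up observations; identifiers and values are illustrative only.
stream = [
    (1, (":obs1", "sosa:hasSimpleResult", 21.0)),
    (4, (":obs2", "sosa:hasSimpleResult", 27.5)),
    (7, (":obs3", "sosa:hasSimpleResult", 30.1)),
    (11, (":obs4", "sosa:hasSimpleResult", 19.8)),
]

results = list(
    r2s_rstream((end, r2r(graph, 25.0)) for end, graph in s2r(stream, width=10, slide=5))
)
print(results)
```

In this sketch, the sequence of window contents produced by `s2r` plays the role of the Time-varying RDF Graph: at each evaluation time it denotes one finite graph, to which the static `r2r` computation can be applied.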
Last but not least,
Summary of the ten-thousand foot view analysis

Decision diagram for assigning the meta-structure in the ten-thousand foot view. Red arrow is “no”, green arrow is “yes”.
In this section, we use the data dichotomy explained above to study the meta-conceptualization of the selected ontologies whose concepts align with it. For this reason, LDES is not taken into account in this discussion.
An ontology used for SR typically consists of five levels, i.e.,
The detailed analysis of the selected ontologies is presented below and summarized in Table 4.
The decision diagram in Fig. 6 is structured to guide knowledge workers operating within the SLD context at the Ten-Thousand Foot View. The diagram helps determine the classification of ontology concepts based on time. For instance, if the question is whether “time is part of the conceptualization” and the answer is “no”, then the concept is “Time Agnostic”. If the answer is “yes”, further decisions based on “occurrence”, “endurance”, and “change” lead to the classification of the concept into one of the other levels. The diagram provides a structured approach to categorizing ontology concepts by their relationship with time, which aligns with Definitions 2, 3, 4, and the general notion of time presented in Section 2.
Interestingly, L4 is where the selected ontologies differ the most. SSN/SOSA distinguish between the
We can see that most ontologies distribute their complexity across different temporal levels, facilitating the alignment with SR applications.
The selected ontologies include complex concepts whose definitions require expressive language constructs. Such constructs, in turn, affect the expressivity of the ontology that includes them. In the following, we discuss these nuances, focusing on how they relate to our meta-structure (see Fig. 6). Moreover, we discuss opportunities for reasoning optimizations. Table 5 summarises the expressivity of each ontology in terms of minimum OWL 2 profile and Description Logic (DL). We refer the reader to Baader et al. [5] for a complete introduction to DLs, as it is out of scope for this paper.
Ontology expressivity in terms of OWL2 profile and description logic
We now zoom deeper into various complex definitions and their structural relation to SR tasks. As the goal in SR applications is to reason upon the events in the stream and combine them with other contextual data, we investigate complex concept definitions that span across levels (L1-L5), with particular emphasis on L1. We define complex concept definition in DL notation, i.e.
We focus on reasoning on instance level (ABox), through definitions defined across the five ontology meta-structures. We differentiate between complex definitions using either existential in the subclass definition (i.e.
We identified four interesting reasoning perspectives based on the position of L1 in the complex definitions, i.e. either in
Various reasoning classes that influence an ontology’s SR abilities. (U = universal, E = existential, E
Both
(
So even though most ontologies were very expressive at first glance, they mainly use this expressivity to define restrictions on the various concepts, while the inference tasks are typically reserved for application-specific logic.
At this level of analysis, we recommend four lessons to enhance the effectiveness of data processing. Firstly, practitioners should carefully examine the expressivity of imported ontologies and strive to limit their complexity, ensuring that the ontologies used align closely with the specific requirements of their applications. Indeed, we observed that, despite attempts to keep the ontology profile down to OWL 2 QL, resolving all the imports makes the overall profile much more complex (OWL 2 DL). Secondly, it is advisable to keep the reasoning expressivity low when defining the concepts related to events. Recent results on hierarchical reasoning show how SLD applications could benefit from this modelling practice [18], which also helps streamline the processing of streaming data by avoiding unnecessary complexity in stream reasoning tasks. Thirdly, it is essential to avoid Reasoning Perspective 1, where event data significantly influence the classification of more static data. This approach can be challenging to optimize and may lead to inefficiencies in data handling [16]. Finally, when selecting ontologies for integration in the stream reasoning context, aim for those that exhibit clear differentiation in their meta-structure (see Fig. 6), as identifying the change frequency of instances based on their assigned concepts allows the processing to be optimized. Indeed, differentiation helps avoid redundancy and promotes effective knowledge representation and data integration within this dynamic and evolving domain [40].
By heeding these lessons, the field of SLD can better manage the intricacies that occur when modelling a domain that presents streaming data and continuous information needs.
Check the expressivity of the imported ontologies and try to limit the imported expressivity. Keep the reasoning expressivity of the concepts that define the event as low as possible. Avoid Reasoning Perspective 1 in which the event data influence the classification of the more static data, as it is not trivial to optimize. Aim for a clear differentiation in the ontology meta-structure.
Thousand foot view: Streams’ content
The Thousand Foot View of SLD focuses on the stream’s internals. In particular, we study the notion of Ontology Kernel (see Definition 5), and how the selected ontologies implement it. We reuse the ontologies introduced in the Ten-Thousand Foot View. Only eight of the ten selected ontologies describe concepts to represent the stream’s internals. These eight ontologies include SSN/SOSA, SAREF, IoTStream, SIOC, LODE, ActS, Frappe, and SAO/CES. The other ontologies are not included in this discussion.
Analysis framework

Kernel structure.
The Common Event Model (CEM) was initially proposed by Westermann and Jain for multimedia applications [72]. CEM is designed for historical event analytics. Thus, it does not relate to L4 and L5. When porting CEM to SR/RSP, we must reinterpret some aspects. Traditionally, data streams are characterized by a form of
Our analysis highlights the relation between the Kernel and the meta-conceptualisation levels (cf. Section 5). Figure 7 depicts such relation enumerating the levels across the CEM dimensions, which are:

Overview of the RDF event shapes.
Overview of ontology kernel analysis for informational and experiential information
We now align each of the ontologies with the CEM. We distinguish the Informational and Experiential discussion over the two levels L1 and L2. The higher the level, the further away from the core. L1 is one property link away from the core (e.g., a type assertion and linked entities), while L2 requires two hops (e.g., types of the linked entities of L1 or additional entities). We provide a summary of the analysis for the Informational and Experiential discussion in Table 7 and for the Spatial and Temporal discussion in Table 8.
Overview of ontology kernel analysis for spatial and temporal information
On L2, informational data include the types of the L1 linked Entities which describe the Static level of the ontology. In particular, the IoT ontologies (SSN, SOSA, IoTStream, and SAO) link the
Note that many of the classes of Informational L1 align with the Instantaneous level of the Ten-Thousand Foot View, even though these are two different ways of looking at the classes of the ontologies. In the previous view, we looked at the classes that had a temporal annotation, while in this view we look at the classes used for modelling the events. They align because the events themselves are what change over time.
Interestingly, we see that most ontology models rely on
Chains are not particularly useful, as they only allow moving from the core of the kernel to the outer level through Informational Entity relations. At the end of the chain, there can optionally be only Informational Type or Experiential data, as these data end the chain. Cycles share the same fate, as they only allow cycling through Informational Entity relations, without any Experiential or Type data, as these data would end the cycle. Trees can model all data but tend to describe unnecessary static data. Stars can model Informational L1, both the type of the event itself and the linked Entities, while describing the data in the Experiential L1, making them ideal for event modelling. Tables 9 and 10 summarize the analysis.
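As a rough illustration of how the four shapes differ structurally, the following Python sketch classifies the set of triples describing one event by looking only at the directed graph reachable from the event node. The classification rules are simplifying assumptions for this example (they follow the shape names commonly used for query graphs) and are not the formal definitions used in our analysis; the function name and the sample identifiers are hypothetical, while the properties are borrowed from SOSA for readability.

```python
from collections import defaultdict


def classify_event_shape(triples, root):
    """Classify the triples describing one event, under simplified rules:
    cycle: some path starting at the root returns to a node on that path;
    star:  every triple has the root (the event) as subject;
    chain: acyclic and every subject has exactly one outgoing edge;
    tree:  any other acyclic structure.
    """
    out = defaultdict(list)
    for s, _p, o in triples:
        out[s].append(o)

    visiting, done = set(), set()

    def has_cycle(node):
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        for nxt in out.get(node, []):
            if has_cycle(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    if has_cycle(root):
        return "cycle"
    if set(out) <= {root}:
        return "star"
    if all(len(objects) == 1 for objects in out.values()):
        return "chain"
    return "tree"


star = [
    (":obs1", "rdf:type", "sosa:Observation"),
    (":obs1", "sosa:madeBySensor", ":sensor1"),
    (":obs1", "sosa:hasSimpleResult", "21.5"),
    (":obs1", "sosa:resultTime", "2023-01-01T10:00:00Z"),
]
chain = [
    (":obs1", "sosa:madeBySensor", ":sensor1"),
    (":sensor1", "sosa:isHostedBy", ":platform1"),
]
cycle = [
    (":obs1", "sosa:madeBySensor", ":sensor1"),
    (":sensor1", "sosa:madeObservation", ":obs1"),
]
tree = star + [(":sensor1", "sosa:isHostedBy", ":platform1")]

shapes = [classify_event_shape(g, ":obs1") for g in (star, chain, cycle, tree)]
print(shapes)
```

Under these rules, a single triple leaving the event counts as a star, and the tree example shows how adding one hop of static context to a star already changes the shape.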
Mapping of the RDF structures on the event kernel using the SOSA ontology.
Structural analysis vs query shapes
RDF shapes alignment with the kernel and ontology levels
Understanding the structure of the events is important because it clarifies how a query can optimally interact with the events, which opens many opportunities for optimization. For example, Stars could be represented as a table (instead of an RDF graph), allowing part of the querying to be offloaded to lower-level processing techniques that operate before the conversion to RDF, which can improve performance [11]. Fernandez et al. [33] showed that identifying regularities in the structure of the data in the stream improves transmission through structure-tailored compression techniques. Furthermore, Bonte et al. [15] showed that understanding the structure of the events in the stream allows the continuous query evaluation process to be optimized. These kinds of optimizations can, in turn, lead to better modelling guidelines for SLD ontologies.
Finally, at the lowest level of our analysis, we share several key lessons that have emerged. To promote streamlined processing in real-time environments, we advise keeping the core kernel of the data model as concise as possible, or at least limiting the expressiveness of the ontological fragment that it uses. Indeed, the more properties constitute the kernel, the higher the risk of encountering unexpected dependencies with static knowledge (see Perspectives in Section 5.3). Additionally, event structures that can be easily translated into simpler representations, such as the Star model, can be optimised for matching independently from the window [45]. When incorporating temporal information, adhering to widely accepted temporal concepts like
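The idea of handling Star-shaped events as table rows can be sketched as follows. The example is a minimal Python illustration under stated assumptions: the column names, the sample readings, the threshold, and the helper `row_to_star` are hypothetical, and the mapping to SOSA terms is only one possible choice. It does not reproduce the techniques of the cited works.

```python
def row_to_star(row):
    """Convert one tabular record into the Star-shaped triples of an event;
    every triple has the event itself as subject."""
    event = row["id"]
    return [
        (event, "rdf:type", "sosa:Observation"),
        (event, "sosa:madeBySensor", row["sensor"]),
        (event, "sosa:hasSimpleResult", row["value"]),
        (event, "sosa:resultTime", row["time"]),
    ]


# Made-up raw readings, as they might arrive before any conversion to RDF.
rows = [
    {"id": ":obs1", "sensor": ":sensor1", "value": 21.0, "time": "2023-01-01T10:00:00Z"},
    {"id": ":obs2", "sensor": ":sensor1", "value": 27.5, "time": "2023-01-01T10:00:05Z"},
    {"id": ":obs3", "sensor": ":sensor2", "value": 30.1, "time": "2023-01-01T10:00:07Z"},
]

# The selection is evaluated on the table, so only matching rows are converted.
hot_rows = [row for row in rows if row["value"] > 25.0]
triples = [triple for row in hot_rows for triple in row_to_star(row)]

print(len(rows) * 4, "triples without push-down,", len(triples), "with push-down")
```

The sketch works because each property of a Star maps to exactly one column. Multi-valued properties or links that need a second hop would not fit a single flat row, which is one reason why the shape of the event matters for this kind of optimization.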
These lessons collectively advance the field of SLD, enabling more effective management and utilization of dynamic and evolving datasets.
Keep the kernel as small as possible. Rely on an event structure that can easily be translated to simpler representations, such as the Star. When modelling temporal information, regardless of the need for point or time semantics, use widely accepted existing temporal concepts such as For spatial information, refrain from introducing custom location-specific concepts and reuse concepts from the
Related surveys
Dell’Aglio et al. [31] recently surveyed the state of the art of stream reasoning research. They first identified 9 requirements that a stream reasoning system should satisfy and then analyzed the compliance of existing works with them. Although the authors discuss streaming annotation, which is comparable to our Thirty-Thousand Foot View, they do not explicitly compare the ontologies themselves.
Margara et al. [49] also surveyed solutions for stream reasoning and RDF stream processing. The focus of this survey was on comparing system capabilities and identifying limitations in terms of RDF stream processing. Although related to potential future work, we did not include
In the context of the Semantic Web for the Internet of Things, the work of Szilagy et al. [60] is related. The authors discuss the advantages of semantic annotation for solving interoperability issues in the IoT domain. Then, they propose a specialized version of the Semantic Web stack for IoT. Although Szilagy et al. propose to compare four ontologies, including SSN, the comparison is not the main focus of their work. Moreover, the analysis’s scope is limited to IoT and does not include ontologies like SIOC and LODE.
Finally, Gyrard et al. [36] describe a Linked Open Vocabulary (LOV) for IoT projects (LOV4IoT). LOV4IoT identified existing IoT ontologies, re-engineered the vocabularies to make them interoperable, and cataloged them. However, they did not investigate each of the ontologies’ capabilities for modelling data streams and LOV4IoT is limited to IoT applications.
Conclusion
In this paper, we surveyed the work on KR for SLD. In particular, we presented 1) a Thirty-Thousand Foot View observing streams as Web resources, 2) a Ten-Thousand Foot View that observes the nature and nurture of the ontologies for streaming data starting from a bottom-up approach, and 3) a Thousand Foot View, which zooms further in and discusses how different ontologies model the events in the stream. Our analysis can be summarised as follows:
From
As not all ontologies cover all aspects and different views, to be compliant with the SLD principles, a combination of SR ontologies is recommended.
As future work, we plan to extend the analysis to include a
Our analysis introduced a number of reasoning perspectives, which open opportunities to design an ontology profile that enables the reasoning optimizations identified by the different perspectives. Our analysis frameworks also open various directions in terms of optimized processing. For example, the Ten-Thousand Foot View enables optimizations by explicitly defining the interaction between the data in the stream (instantaneous level) and more slowly changing data. Similarly, the Thousand Foot View enables optimizations by identifying the different shapes of events. In terms of knowledge representation, we have identified opportunities to define ontology metrics for SLD ontologies, starting from our analysis frameworks.
Most importantly, our analysis frameworks can help evaluate future ontologies for SLD and serve as a guideline for high-quality knowledge representation.
