Abstract
Semantic Web technologies aim to simplify the distribution, sharing and exploitation of information and knowledge across multiple distributed actors on the Web. As with all technologies that manipulate information, there are privacy and security implications, and data policies (e.g., licenses and regulations) that may apply to both data and software artifacts. Additionally, Semantic Web technologies could contribute to the more intelligent and flexible handling of privacy, security and policy issues, by supporting information integration and sense-making. In order to better understand the scope of existing work on this topic, we examine 78 articles from dedicated venues, including this special issue, the PrivOn workshop series, two SPOT workshops, as well as the broader literature that connects the Semantic Web research domain with issues relating to privacy, security and/or policies. Specifically, we classify each paper according to three taxonomies (one for each of the aforementioned areas), in order to identify common trends and research gaps. We conclude by summarising the strong focus on relevant topics in Semantic Web research (e.g., information collection, information processing, policies and access control), and by highlighting the need to further explore under-represented topics (e.g., malware detection, fraud detection, and supporting policy validation by data consumers).
Introduction
Privacy, security and the proper handling of data-related policies are topics that affect all technological areas, but they have been under-explored in relation to Semantic Web technologies. Indeed, much research in the Semantic Web and Linked Data domain has focused on enabling the sharing of open datasets. However, as Semantic Web technologies and principles gain traction, both in use cases that deal with sensitive data and in industrial contexts, it is necessary to investigate the potential privacy and security issues: for example, how these technologies might create new or more complex threats to privacy, how they might make the security of deployed systems harder to ensure, and how managing, tracking and enforcing policies associated with data becomes more complex.
Although the widespread use of Semantic Web technologies and Linked Data leads to new security, privacy and policy-related problems, at the same time they can also be seen as part of the solution. For example, more accurate models for detecting security issues can be built through the semantic analysis of the data. Additionally, the meaningful interpretation of personal data exchanged between individuals and various other web entities could be used to empower web users to better control those interactions, and therefore better manage their online privacy. The machine-readable and machine-processable representation of data-related policies can also bring many advantages to companies through the automation of tasks related to policy-management.
The goal of this paper is to provide a brief overview of recent work on security, privacy and policy related challenges associated with Semantic Web technologies. The information presented herein is based on analysing the articles published in this special issue of the Semantic Web Journal, therefore acting as an editorial for it, as well as looking at the five editions of the

A taxonomy of activities creating privacy problems, from [39].
Privacy, security and policy topics in data and information management are closely related to each other, but each is also complex and multifaceted in its own right. They each represent a wide range of issues and challenges, to which a variety of solutions have been applied in other domains. While not all of those issues and challenges necessarily apply to Semantic Web technologies, it is worth looking at them broadly, in order to understand where works by the Semantic Web community tend to place themselves, and where gaps still exist.
A taxonomy of privacy
One of the most highly cited works used to “classify privacy” is an article entitled “A taxonomy of privacy” by Daniel Solove [39]. In that article, Solove argues (as many authors did before him) that privacy is an ambiguous, polysemic and often subjective term that therefore cannot be reduced to a simple concept, and especially cannot be considered purely from the point of view of the law. Instead of proposing a definition of privacy, Solove focuses on privacy threats which, he argues, can be listed and defined in a more robust manner. This taxonomy of privacy problems is depicted in Fig. 1, where information-based activities that are known to create problems are divided into four main categories: information collection, information processing, information dissemination, and invasion.
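For readers who want to use this taxonomy as an annotation vocabulary, the following is a minimal sketch of how it could be encoded. The four top-level categories are those named above; the sub-activities are not listed in this text and are filled in here from Solove's article [39], so they should be checked against Fig. 1. The names `SOLOVE_TAXONOMY` and `group_of` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Optional

# Top-level categories come from the text above; the sub-activities are an
# assumption taken from Solove's article [39] and should be verified there.
SOLOVE_TAXONOMY: Dict[str, List[str]] = {
    "information collection": ["surveillance", "interrogation"],
    "information processing": [
        "aggregation", "identification", "insecurity",
        "secondary use", "exclusion",
    ],
    "information dissemination": [
        "breach of confidentiality", "disclosure", "exposure",
        "increased accessibility", "blackmail", "appropriation", "distortion",
    ],
    "invasion": ["intrusion", "decisional interference"],
}


def group_of(activity: str) -> Optional[str]:
    """Return the top-level category of a sub-activity, or None if unknown."""
    needle = activity.strip().lower()
    for group, activities in SOLOVE_TAXONOMY.items():
        if needle in activities:
            return group
    return None
```

With such an encoding, an annotation at the sub-activity level (for instance "identification") can be rolled up to its top-level category when counting papers per category.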
Classification of security incidents
Security is also a broad term that can be applied to many different areas. However, considering the scope of this article, we focus here on cyber-security, which relates to security issues and challenges associated with computing devices, applications and networks. Several organisations have proposed classifications of the issues and problems associated with cyber-security, including the Software Engineering Institute [5] and the European Union Agency for Network and Information Security (ENISA) [29]. Those tend to overlap and cover similar aspects, as they focus on the incidents or problems that might occur in relation to cyber-security. Here we choose to apply the taxonomy from the European Cybercrime Centre (EUROPOL) [14], as it focuses specifically on threats and issues that are related to technological systems. This taxonomy of incidents is reproduced in Table 1. Naturally, only a subset of those threats is expected to be relevant for Semantic Web technologies.
Classification of cybersecurity incidents from EUROPOL [14]
There are several types of policies that are related to the present study. Those include privacy and security policies that strongly overlap, in their content, with the two previous classifications. We additionally consider in this category the specific tasks that are associated with the management of and compliance with policies associated with the distribution of intellectual property (IP) assets, especially software and data licenses, as well as terms of use of services and regulatory obligations. As far as we are aware, there does not exist a taxonomy of activities or issues associated with this area. We therefore take inspiration from existing literature, especially in the area of software license management, to devise a simple taxonomy of tasks associated with privacy, security, distribution and usage policies for IP assets. This taxonomy, which is presented in Table 2, is relevant for policies that relate to data or software artifacts, including services.
Collection: Existing works around semantic web security, privacy and policy
Based on the taxonomies described above, our goal is to review the privacy, security and policy research contributions associated with Semantic Web technologies. This includes both the use of Semantic technologies to support the resolution of specific privacy, security and policy issues, as well as works that tackle privacy, security and policy issues emerging from the application of semantic technologies. To do that, we create a corpus of papers and articles that directly address one or more of those aspects. We start with the works published in the Special Issue of the Semantic Web Journal on Security, Privacy and Policies (for which this article acts as editorial), namely:
We also include in this analysis all the papers presented during the PrivOn workshop series, which was co-located with the International Semantic Web Conference (ISWC) from 2013 to 2017, and relevant papers from the SPOT workshop, which was co-located with the Extended Semantic Web Conference (ESWC) in 2009 and 2010. Finally, in order to obtain relevant works outside of those specific venues, we perform several Google Scholar searches, by associating keywords strongly related to Semantic Web technologies, including Semantic Web, semantics, Linked Data, ontology, RDF, OWL, SPARQL
Taxonomy of tasks associated with IP distribution and usage policies
Sample paper classification
We analyse the corpus of references collected according to the method described in the previous section by manually annotating each paper using the three taxonomies previously described. In doing so, we do not assume that any paper should only be represented by one category, or one taxonomy, as many works span across several topics with varying levels of generality. For example, the articles included in the special issue of the Semantic Web Journal are classified as depicted in
We also add another category to indicate whether the paper or article presents an issue, challenge or problem, or a solution. Unsurprisingly, considering that most works come from computing or other strongly technical disciplines, the large majority of the references relate to works presenting solutions (66 out of 78).
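The annotation scheme described above (several categories per paper, across up to three taxonomies, plus a problem/solution flag) can be sketched as a small data structure. This is only an illustration under stated assumptions: the paper identifiers and the security and policy category labels below are placeholders, not entries from the actual corpus or from Tables 1 and 2, and the names `Annotation`, `count_by_taxonomy` and `count_by_kind` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Annotation:
    """Manual annotation of one paper; a paper may carry several categories."""
    paper_id: str
    kind: str  # "problem" or "solution"
    privacy: Set[str] = field(default_factory=set)
    security: Set[str] = field(default_factory=set)
    policy: Set[str] = field(default_factory=set)


def count_by_taxonomy(annotations: Iterable[Annotation]) -> Counter:
    """Count papers with at least one annotation in each taxonomy."""
    counts: Counter = Counter()
    for a in annotations:
        for taxonomy in ("privacy", "security", "policy"):
            if getattr(a, taxonomy):
                counts[taxonomy] += 1
    return counts


def count_by_kind(annotations: Iterable[Annotation]) -> Counter:
    """Count papers presenting a problem versus a solution."""
    return Counter(a.kind for a in annotations)


# Placeholder corpus: identifiers and non-privacy labels are invented.
corpus = [
    Annotation("paper-A", "solution",
               privacy={"information collection"},
               security={"placeholder-security-category"}),
    Annotation("paper-B", "solution",
               policy={"placeholder-policy-task"}),
    Annotation("paper-C", "problem",
               privacy={"information processing",
                        "information dissemination"}),
]

print(dict(count_by_taxonomy(corpus)))
print(dict(count_by_kind(corpus)))
```

Because a paper is counted once per taxonomy in which it has at least one annotation, the per-taxonomy totals can add up to more than the number of papers, which is consistent with the counts reported in this article for the full corpus.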
Works with a strong focus on privacy
Also unsurprisingly, considering the nature of semantic web technologies and their purpose, many of the references included in our corpus relate to privacy (37 out of 78), with at least one annotation from the privacy taxonomy. A particularly frequent annotation there relates to the
Another common category addressed by works in our corpus is the one of
Besides the aforementioned groups, several works including [32] or [6] look at privacy from a broader perspective, especially connecting privacy issues around
Works with a strong focus on security
While rarely considered a core topic for Semantic Web research, many (46 out of 78) of the works in our corpus relate, in one way or another, to the topic of security. Most of those however focus entirely on the area of
Another interesting area in terms of
Interestingly other common security topics in relation to the
Works with a strong focus on policies
With the increasing amount of (creative) content being published online, policies about IP distribution and usage are becoming more and more important, as they allow for the association of constraints relating to use and reuse. In this context, the contribution of Semantic Web technologies and languages is twofold: they may be used to support the
From the point of view of supporting and easing the activities of Producers, several approaches have been proposed in recent years. Concerning
Other challenges deal with the Consumer point of view, where issues like compatibility testing and usage monitoring need to be addressed to assist Consumers in gaining a better understanding of the policies, thus supporting the compliant usage of protected resources. Works considering for example
As can be seen from the analysis described in the previous sections, and further from the annotated corpus of collected references, research work related to Semantic Web technologies has, at least for privacy and security, focused strongly on a small subset of issues and challenges. Indeed, the strong prominence of references related to controlling data collection mechanisms and access control shows that, as is often the case in primarily technological disciplines, privacy and security are often reduced to those basic issues. While in security, some works have been looking at
Beyond security, the contrast between the description and study of privacy in the social sciences, which portray the issue as a complex, multifaceted and interdisciplinary notion, and its treatment in the Semantic Web literature is striking. Many of the papers reviewed consider privacy as a single, specific (and often purely technical) challenge, related most often either to identification, or to the control of either data collection or data access. Again, with some exceptions, very few works really consider the potential of Semantic Web technologies to either create or address issues such as appropriation, distortion or, more broadly, information dissemination, and none has considered the challenges associated with invasion. While this is not necessarily surprising, considering the technological nature of Semantic Web research, its purpose, and the specific issues it tackles, it is disappointing to see that these technologies are not being used more creatively to address other challenges where their sense-making and inferential capabilities would no doubt have benefits. It is also disappointing that, as far as we could see from the references collected, those technologies are rarely included in broader, interdisciplinary discussions about their potential privacy implications.
The policy part of our brief analysis stands out from the other two as being somewhat more varied. Unsurprisingly, issues of policy communication have attracted more consideration, as they fall more directly within the remit of the representation languages and formalisms of the Semantic Web. However, a few works have started to appear that use those representational capabilities to support interpreting, monitoring and reasoning upon policies (often related to privacy and access control, but also related to intellectual property management). Those works address issues of rights associated with information assets, and therefore overlap with research in legal informatics, where Semantic Web technologies have made many contributions (which are, however, mostly out of the scope of this article). There is nevertheless much work to be done, from the few starting points we encountered, on the implications of using Semantic Web technologies to support both data producers and consumers (including private individuals) in understanding, combining and interpreting policies in a meaningful and valuable way.
