Abstract
Introduction
Cultural Heritage (CH) is one of the most vibrant application domains of the Semantic Web (SW), as demonstrated by the two recent special issues on CH and SW in the Semantic Web journal [1,3]. Institutions such as museums, libraries, or galleries invest substantial resources in the publication of their collections as Linked Open Data (LOD). Aggregators such as Europeana [19] help consolidate CH digital resources in the LOD cloud. Moreover, cross-domain datasets like Wikidata and DBpedia provide extensive coverage of the CH domain; in this regard, organizations such as the Smithsonian Institution are adapting their strategies to interlink their archives with Wikidata and other Wikimedia projects to increase their visibility [21].
All these efforts have crystallized in the creation of an emergent CH LOD cloud. Despite the availability of large amounts of CH content as LOD, its exploitation is challenging. On this subject, the primary concern in [11] is the remaining work in democratizing the use of CH LOD. [26] also raises this problem, attributing it to the lack of technical competencies in query languages and to difficulties in understanding data models. Access to LOD is not a challenge specific to the CH domain; it affects the whole SW community [7,8,15]. This is especially problematic for web developers who are not familiar with SW technologies such as SPARQL, RDF, and OWL. Instead, this group of users is more accustomed to Representational State Transfer (REST) [13] Application Programming Interfaces (APIs) and the JavaScript Object Notation (JSON) data interchange format [5]. For this reason, there are several proposals of API generators for LOD such as RAMOSE [10], OBA [14], or our previous work CRAFTS [30].
In this paper, we propose LOD4Culture, a novel web application that exploits CH LOD for tourism and arts learning purposes. The application offers an interactive map for exploring world-wide
The rest of the paper is organized as follows: Section 2 reviews existing proposals consuming CH LOD. Section 3 provides a technical description of LOD4Culture, including its requirements, its data sources, its architecture, its main functionalities, and its implementation details. Section 4 reports on the impact of LOD4Culture so far. The paper ends with a discussion in Section 5.
Related work
The CH community is carrying out a tremendous effort in digitalizing their contents and publishing them as LOD. This is the case of individual institutions, such as Rijksmuseum [12]. To avoid fragmentation of CH datasets, aggregators like Europeana [19] provide single access to myriads of artworks. CH consortiums like American Art Collaborative [22] or Sampo Model and Portal Series [17] have also contributed to the emergent CH LOD. Studies such as [11,26] survey some of the most relevant initiatives in CH data publication. Nonetheless, cross-domain community-based datasets like Wikidata and DBpedia are also a rich source of CH LOD. Indeed, Wikidata aims to establish itself as a central hub for data integration, data enhancement, and data management in the CH domain.1
The availability of CH LOD has fueled the development of websites and portals for browsing and searching CH content, making it more accessible for educational use [25]. It is often the case that CH LOD publishers provide their own websites and portals over their data. One example is the Europeana website2
Other initiatives reuse existing CH LOD for proposing new interactive applications. One example is openArtBrowser [16], an artwork browser that uses Wikidata as source. This application does not directly access Wikidata; instead, openArtBrowser employs an offline batch process for extracting artworks from Wikidata, converting them to JSON objects, and uploading the output into a text-based search engine, supporting faceted search over the extracted artwork collection. This workflow simplifies web application design by eliminating the need for interaction with the Wikidata endpoint, resulting in lower latency due to the use of a dedicated text-based datastore. However, this component must be maintained and the transformation process must be frequently updated in order to keep consistency with Wikidata.
In our previous work we have proposed CasualLearn [27], a mobile application for CH education. It exploits a dataset of 10K geolocalized learning tasks that were generated from DBpedia, Wikidata, and the Open Data portal of Castile and León.8
[26] addresses the challenge of CH LOD exploitation following the question answering paradigm through virtual assistants (VAs). Their approach consists of a generator of VAs that can be employed with SPARQL endpoints, enabling users to pose questions in natural language through their smartphones. This solution has been tested in several case studies in the CH domain.
Beyond the aforementioned initiatives, the usage of CH LOD is still scarce [11]. [18] discusses the difficulties in creating CH LOD-based applications, requiring expertise in SW technologies not common among the CH community; moreover, it identifies a lack of tools for dealing with LOD to facilitate its exploitation in particular application cases. In addition, we can identify several factors that contribute to the complexity of accessing CH LOD: intricate ontologies, multiple endpoints, and a lack of control over the triplestores. The surveyed approaches address this complexity by utilizing a single endpoint, which they commonly manage. An alternative is to transform CH LOD into other types of data stores, as in the case of openArtBrowser, but this path entails other problems such as inconsistencies that can hinder some of the benefits of LOD.
All the previous approaches but [26] provide visualizations for accessing CH LOD. They typically expose HTML pages about artworks built over RDF data. Text search is the main mechanism for accessing CH contents. Faceted search and interlinking to other entities are employed in some cases, providing new ways of discovering CH content. Some applications like BiographySampo and CasualLearn employ maps, but this is not so common in this domain. While artworks are not normally geolocalized, CH sites like monuments or museums (typically containing artworks) are – this could be exploited with map visualizations in a new breed of CH LOD-based applications, providing highly contextual information and allowing users to quickly and easily grasp the cultural sites of interest in a specific area. Note that visualizations of geospatial data have their own challenges [29], such as providing suitable representations at different scales or processing large volumes of data in an efficient way.
The analysis in Section 2 evidenced the existence of a deluge of CH LOD, although applications that exploit it are relatively scarce. This section describes LOD4Culture, a novel application that aims to harness the wealth of CH content in Wikidata and DBpedia for tourism and arts education in informal learning – a popular trend in the educational domain that refers to learning processes which occur spontaneously, outside a formal educational setting [23]. Wikidata and DBpedia are community-based sources that are especially suitable for such purposes, as they have a global scope, are frequently updated, and offer multilingual content. The key idea of LOD4Culture is to offer an interactive map to display CH sites, allowing users to discover interesting cultural places in a contextual manner, such as in the vicinity of their current location or a chosen area of interest. In addition, users should be able to filter CH sites by specific types, such as palaces. The application will also offer a resource view for browsing detailed information about CH sites and related entities, such as artists and artworks. The target audience of LOD4Culture is lay users; therefore, the application will be designed to hide RDF and SPARQL, as well as to be responsive and portable to cater to a wide range of devices, including mobile phones. The following subsections present the requirements, the employed data sources, the architecture of LOD4Culture, and further design and implementation details. Employed prefixes and namespaces are listed in Table 1.
Prefixes and namespaces employed in this paper
The previous paragraph succinctly describes the design principles of LOD4Culture, which are derived from our own experience in the field, from the gaps found in the literature (see Section 2), and from the feedback obtained from prospective users in testing early prototypes of LOD4Culture. Table 2 summarizes the main requirements for LOD4Culture. This application should provide two main functionalities: exploration of CH sites through an interactive map (FR0) and browsing of CH entities (FR1). The former functionality is intended for discovering CH sites in an area of interest defined by the user. The scope should be world-wide and the map view should be adaptable to different zoom levels (FR2), thus allowing the exploration of small areas – showing a marker for each site, as is typical in map applications – but also large ones, providing appropriate aggregating mechanisms to avoid cluttering the map view with too many markers. The application should allow the user to filter CH sites by a specific type (FR3).
Requirements for LOD4Culture
With respect to browsing of CH entities, LOD4Culture should provide suitable visualizations (FR4) with the information available in the target LOD sources (labels, descriptions, images, locations, etc.). A CH entity may correspond to a CH site, an artwork, or an artist. During a session, the user will typically navigate the map and then switch to the entity browser by selecting a site of interest. Entity browsing should be further facilitated by interlinking sites with their artworks, these with their creators, and so on (FR5). In addition, the application should include a textbox for searching CH entities by label (FR6).
Beyond the aforementioned functionalities, LOD4Culture has to comply with a number of non-functional requirements. First, the application should be portable to facilitate its usage in a wide range of devices, including mobile phones, tablets, and desktop computers (NFR0). Since prospective users are not necessarily fluent in Semantic Web technologies, LOD4Culture should provide a user interface that effectively hides RDF annotations or SPARQL queries from end users (NFR1). Furthermore, the application should be responsive in order to keep user engagement; this may be difficult to achieve with third-party services (especially SPARQL endpoints), but LOD4Culture should provide mechanisms to keep latency low (NFR2). Finally, the application should be localized to English and Spanish (NFR3).
LOD4Culture uses Wikidata and DBpedia as sources of CH LOD. Both datasets are well known in the Semantic Web community; they are open and collaborative projects that are used in a myriad of applications by researchers and industry practitioners. While the scope of Wikidata and DBpedia is cross-domain, CH is well covered in both sources; many CH entities from all over the world can be found, in many cases with very detailed annotations. CH entities are not uniformly annotated in Wikidata and DBpedia though; well-known items commonly include very rich descriptions, but less popular CH entities are sparsely annotated.
Wikidata and DBpedia contain complementary information about CH entities. Wikidata includes very fine-grained annotations such as inception dates, creators, heritage designations, materials, architects, subelements, widths, and heights, whereas DBpedia shines at providing very thorough comments, Wikipedia categories, images, and mappings to Wikidata. Some CH entities include image annotations in DBpedia, but not in Wikidata – this is very typical of contemporary paintings, e.g. Picasso’s Three Musicians (check
Interestingly, population completeness of CH entities is significantly higher in Wikidata than in DBpedia. As an example, querying the number of castles gives a value of 28,678 in Wikidata9 Note that the query in Listing 2 also requires a heritage designation, a geolocation, sitelinks, statements, and a label in English or Spanish. This is why only 13K castles were extracted, as shown in Table 3.
Unfortunately, the Wikidata endpoint is not responsive enough for supporting the exploration of CH sites (FR0) with a low latency (NFR2). Listing 1 presents a typical query for fulfilling requirement FR0, but the Wikidata endpoint takes several seconds to answer it.10 The latency of a similar query in the English DBpedia endpoint is one order of magnitude lower, i.e. hundreds of milliseconds, which is acceptable for an interactive application.

Query to the Wikidata endpoint for retrieving the number of castles in a bounding box with the following WGS84 coordinates: South 41°, North 42°, West −5°, East −4°
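As an illustration of the kind of query described in this caption, the following JavaScript sketch assembles a counting query for a bounding box. It is not the query of Listing 1: it assumes the bounding-box service (`wikibase:box`) and the prefixes predeclared by the Wikidata Query Service, it assumes that castles are identified by `wd:Q23413`, and the helper name `buildCountQuery` is hypothetical. The actual query may include further constraints.

```javascript
// Hypothetical helper (not part of LOD4Culture): builds a SPARQL query that
// counts the instances of a Wikidata class located inside a bounding box.
function buildCountQuery(classId, { south, north, west, east }) {
  // WKT points are written as "Point(longitude latitude)".
  const southWest = `Point(${west} ${south})`;
  const northEast = `Point(${east} ${north})`;
  return [
    "SELECT (COUNT(DISTINCT ?site) AS ?count) WHERE {",
    // Instances of the class or of any of its subclasses.
    `  ?site wdt:P31/wdt:P279* wd:${classId} .`,
    "  SERVICE wikibase:box {",
    "    ?site wdt:P625 ?location .",
    `    bd:serviceParam wikibase:cornerSouthWest "${southWest}"^^geo:wktLiteral .`,
    `    bd:serviceParam wikibase:cornerNorthEast "${northEast}"^^geo:wktLiteral .`,
    "  }",
    "}",
  ].join("\n");
}

// Bounding box of the caption: South 41, North 42, West -5, East -4.
// The class identifier Q23413 (castle) is an assumption of this sketch.
const castleCountQuery = buildCountQuery("Q23413", {
  south: 41,
  north: 42,
  west: -5,
  east: -4,
});
console.log(castleCountQuery);
```

Keeping the bounding box as a parameter means that the same template can serve any map view; only the four coordinates change between requests.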
CHsites contains the essential information about CH sites to fulfill functionality FR0 of LOD4Culture. This dataset was created by submitting a series of SPARQL CONSTRUCT queries to the Wikidata endpoint. Listing 2 shows the prototypical query for castles that retrieves labels, descriptions, and images for presentation purposes; types for supporting type filtering (FR3); geocoordinates and administrative territorial location for positioning; and sitelinks and statements for measuring CH site popularity. We decided to borrow terms from the DBpedia and W3C Geo ontologies – such as Only museums, national heritage sites, and UNESCO World Heritage Sites did not require a heritage designation; in these cases the triple pattern

SPARQL CONSTRUCT query for generating a graph of RDF triples about castles with a heritage designation from the Wikidata endpoint
Types of CH sites extracted from Wikidata
CHsites is currently published in a SPARQL endpoint13 Endpoint URL
Statistics of the CHsites dataset
LOD4Culture is designed as a web application to facilitate portability (NFR0), allowing its deployment on multiple devices and platforms, as web browsers are ubiquitous nowadays. Traditional web applications require full-page reloads from the server, which negatively impacts performance. This limitation is addressed in single-page applications (SPAs), web applications that initially load a single web document and then update their body content with data from the server, resulting in performance gains and a more dynamic experience [28]. Given the latency requirement of LOD4Culture (NFR2), the SPA approach is embraced in this application.
LOD4Culture exposes the routes in Table 5. R0 is a landing page that presents the application and includes links to R1 and R2. Route R1 corresponds to the interactive map functionality (FR0);
Routes exposed in LOD4Culture. Query parameters marked with ∗ are required
Route R2 is used to get a representation of a CH entity (functionality FR1).
After embracing the SPA principles and defining the routes, the architecture of LOD4Culture can be designed; it is graphically depicted in Fig. 1. The

Architecture of LOD4Culture.
The Appendix depicts the configuration file of Note that this identifier was chosen before naming the application as LOD4Culture.
Sample calls to the LOD4Culture API
The As map applications typically employ a spherical Mercator projection, vertical sides of cells will look longer near the poles than near the Equator.
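The effect of the projection on grid cells can be illustrated with a small JavaScript sketch. It assumes a grid of cells with equal sides in degrees, aligned to a global origin, with a side of 360/2^zoom degrees; this cell size and all helper names are assumptions made for illustration, not values taken from LOD4Culture.

```javascript
// Assumed cell side (in degrees) for a given zoom level.
function cellSide(zoom) {
  return 360 / 2 ** zoom;
}

// Returns the cells of the global grid that cover the given bounds.
// Cells are aligned to multiples of the cell side, so two users looking
// at overlapping areas at the same zoom level obtain identical cells.
function gridCells(bounds, zoom) {
  const side = cellSide(zoom);
  const firstRow = Math.floor(bounds.south / side);
  const lastRow = Math.ceil(bounds.north / side) - 1;
  const firstCol = Math.floor(bounds.west / side);
  const lastCol = Math.ceil(bounds.east / side) - 1;
  const cells = [];
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      cells.push({
        south: row * side,
        north: (row + 1) * side,
        west: col * side,
        east: (col + 1) * side,
      });
    }
  }
  return cells;
}

// Height of a latitude span on a spherical Mercator map, in projected units.
// Only meaningful away from the poles, where the projection diverges.
function projectedHeight(south, north) {
  const y = (lat) => Math.log(Math.tan(Math.PI / 4 + (lat * Math.PI) / 360));
  return y(north) - y(south);
}

const viewCells = gridCells({ south: 41, north: 42, west: -5, east: -4 }, 8);
console.log(viewCells);

// The same span in degrees is drawn taller at 60 degrees than at the Equator.
const exampleSide = cellSide(8);
console.log(projectedHeight(0, exampleSide), projectedHeight(60, 60 + exampleSide));
```

Aligning cells to a global origin, rather than to the current view, is a design choice of this sketch: it makes cell boundaries independent of where the user happens to be looking.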
A grid cell is the unit of work for displaying CH sites in the map. After computing the grid, the
Once cell data is ready, the
Check
When a marker of a CH site is first clicked, the
The threshold of 10 CH sites per cell is a configuration parameter of the application. It is employed to avoid plotting too many markers in a cell since they will look cluttered in the map. This is especially relevant with low zoom levels (0–9) in which a cell can concentrate hundreds or even tens of thousands of CH sites – it is also quite inefficient to gather the information of all the CH sites in a very large area. Instead, LOD4Culture will plot a single cluster marker with the number of CH sites in the cell. A simple approach for positioning a cluster marker is to use the center of the cell; however, the result does not look good as the grid division is too evident to the user and, worse, cluster markers may be positioned in awkward places, e.g. in the middle of the ocean. Instead, the
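The role of the threshold can be sketched in JavaScript as follows. The sketch assumes that a cell is resolved by first obtaining its number of CH sites and only then, if the count does not exceed the threshold, requesting the sites themselves. The functions `countSites` and `fetchSites` are hypothetical stand-ins for API calls, treating a count of exactly 10 as plottable is an assumption, and the positioning of cluster markers is deliberately left out.

```javascript
// Threshold stated in the text; a configuration parameter of the application.
const MAX_SITES_PER_CELL = 10;

// Decides what to do with a cell once its number of CH sites is known.
function planCell(count, threshold = MAX_SITES_PER_CELL) {
  if (count === 0) return "skip"; // nothing to plot
  return count <= threshold ? "fetch-sites" : "cluster";
}

// Resolves a cell in two steps: count first, site data only if needed.
// countSites and fetchSites are hypothetical async functions.
async function loadCell(cell, { countSites, fetchSites }) {
  const count = await countSites(cell);
  const plan = planCell(count);
  if (plan === "fetch-sites") {
    return { kind: "sites", cell, sites: await fetchSites(cell) };
  }
  if (plan === "cluster") {
    return { kind: "cluster", cell, count }; // a single marker with the count
  }
  return { kind: "empty", cell };
}

// Usage with in-memory stubs instead of real API calls.
const stubApi = {
  countSites: async (cell) => cell.total,
  fetchSites: async (cell) =>
    Array.from({ length: cell.total }, (_, i) => `site-${i}`),
};
loadCell({ total: 3 }, stubApi).then((r) => console.log(r.kind, r.sites.length));
loadCell({ total: 250 }, stubApi).then((r) => console.log(r.kind, r.count));
```

Separating the decision (`planCell`) from the network calls keeps the threshold logic trivially testable and avoids requesting site data for crowded cells.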

Snapshots of the map interface of LOD4Culture. (a) Route /app/?loc=41.623655,3.801270,5z: the map covers a very large area that includes South-West Europe and North Africa; cluster markers aggregating CH sites are shown in Europe and the Mediterranean coast of Africa, while single CH site markers are displayed in the rest of Africa. (b) Route /app/?loc=42.757188,-6.908581,8z: the map is positioned in the last part of the Way of St James (North-West Spain); cluster markers are positioned in the main cities of the view, while single CH site markers are plotted along the North-East border of Portugal and Spain; the taxonomy of CH site types is displayed in the top-left form. (c) Route /app/?loc=42.757188,-6.908581,8z&siteType=http://www.wikidata.org/entity/Q2977: the map covers the same area as (b), but only cathedrals are shown due to the type filter set in the top-left form; a popup of León Cathedral is displayed. (d) Route /app/?loc=41.893061,12.482765,15z: the map is centered in a tiny area, corresponding to the city center of Rome; the search textbox shows a list of 10 suggestions for input text
In order to support filtering of CH sites by type (FR3), the Note that the counting of the CH site members shown in the application is an approximation. For performance reasons – there are more than 4.4K CH site types – only the direct members of each CH site type are retrieved (check model element
During a session with LOD4Culture, a user will navigate the map and may find a CH site of their interest, e.g. Cathedral of León in Fig. 2(c), or one of the artworks at a site. Alternatively, a user may use the search text box to find CH sites, artists, or artworks (FR6) – see Fig. 2(d) for an example. LOD4Culture supports browsing of such CH entities (FR1) that employ R2-compliant routes. Refreshing the browser URL will activate the

Snapshots of the CH entity interface of LOD4Culture. (a) Extract of the visualization of CH site National Sculpture Museum – route /app/?type=Site&uri=http://www.wikidata.org/entity/Q1581475 – showing label, short description, types and image from Wikidata; and long description and Wikipedia categories from DBpedia; source pages and social media buttons are also included. (b) Extract of the visualization of CH site Museo del Prado – route /app/?type=Site&uri=http://www.wikidata.org/entity/Q160112 – showing basic information, e.g. country, and a browsable list of the 5K artworks retrieved from Wikidata; clicking on an artwork will lead to a new CH entity in LOD4Culture, e.g. painting ‘Las Meninas’. (c) Extract of the visualization of artwork ‘Nighthawks’ – route /app/?type=Artwork&uri=http://www.wikidata.org/entity/Q83872 – showing an image and a list of factual information; blue buttons link to other CH entities in LOD4Culture, e.g. artist Edward Hopper; gray buttons display modal windows with the corresponding source Wikidata or DBpedia pages. (d) Extract of the visualization of artist Leonardo da Vinci – route /app/?type=Artist&uri=http://www.wikidata.org/entity/Q762 – showing label, short and long descriptions, types, and Wikipedia categories; the search textbox shows a list of suggestions for
In order to answer an entity request from the
For example, Wikidata includes more than 22K artworks in the collection of Metropolitan Museum of Art (
LOD4Culture is coded in JavaScript; this programming language is the natural choice for developing web applications. The implementation effort has been considerably reduced by the usage of a number of JavaScript libraries. Notably, the
The user interface is built with Bootstrap24
Since LOD4Culture needs to be localized to English and Spanish (requirement NFR3),
The source code of LOD4Culture is available on GitHub.29
Before releasing LOD4Culture to the public, we conducted several user tests with early prototypes of the application. We engaged approximately 18 prospective users with a mixed background, including secondary school art teachers, secondary school students, retirees, computer scientists, technology education experts, and computer science graduate students – note that no Semantic Web experts participated and that this group adequately represents both the tourism and education cases. We received feedback from most of the users (but not all) through email messages, informal talks, and observations. This was key in improving the prototype, resulting in the release of a more refined application. Initially, the prototype only displayed museums on the map, but user feedback led us to expand the scope to include other CH sites, which are listed in Table 3 of the manuscript. Some users challenged us to show in LOD4Culture lesser-known CH sites, such as the abandoned village of Castrotorafe31 Accessible on the LOD4Culture map, check
We also obtained very valuable feedback on usability issues. For example, we observed a user struggling to zoom in on a picture in the entity interface of LOD4Culture. This was due to a viewport meta tag32
The test site of LOD4Culture33 See footnote 30.
As a result of this activity in social networks, we were contacted by DBpedia to feature a short community article about LOD4Culture in the DBpedia Newsletter of April 2022.34
Since traffic in the test site is tracked with Google Analytics, the uptake of LOD4Culture can be easily analyzed. Table 7 summarizes the collected data, obtained in February 2023. In a period of barely a year, more than 1.7K users have employed the application in 2.8K sessions (with an average time of 1 minute and 57 seconds). 64.6% of new users came to LOD4Culture directly, i.e. entering the URL in the browser, 17.2% were acquired through organic40 Organic means that traffic is earned, not paid.
Uptake of the test site of LOD4Culture
Tracked data also includes latencies of the pages served with LOD4Culture. A map page takes 1.4 seconds on average; the time elapsed on mobile devices is just 0.8 seconds, since small screens require fewer grid cells than desktop computers. In the case of CH entity pages, the average latency is 2.4 seconds.
LOD4Culture is a CH LOD-based application that fulfills all the requirements in Table 2. LOD4Culture adheres to the REST principles of web design [13], so all the application state is encapsulated in the URL (see Table 5). Consequently, URLs are updated as the user interacts with the application, allowing them to be shared and ensuring that a URL will produce the same view regardless of the device employed.
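The idea of keeping the map state in the URL can be sketched in JavaScript with the `loc` parameter that appears in the sample routes of Fig. 2, e.g. `/app/?loc=41.623655,3.801270,5z`. The reading of the three components as latitude, longitude, and zoom level is inferred from those examples, and the helper names `parseLoc` and `formatLoc` are hypothetical; this is not the routing code of LOD4Culture.

```javascript
// Parses the query string of a map route into a map state, or returns null
// if the "loc" parameter is missing or malformed.
function parseLoc(search) {
  const loc = new URLSearchParams(search).get("loc");
  if (loc === null) return null;
  // Expected shape: <latitude>,<longitude>,<zoom>z
  const match = /^(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(\d+)z$/.exec(loc);
  if (!match) return null;
  return {
    lat: Number(match[1]),
    lng: Number(match[2]),
    zoom: Number(match[3]),
  };
}

// Serializes a map state back into a "loc" query parameter.
function formatLoc({ lat, lng, zoom }) {
  return `loc=${lat.toFixed(6)},${lng.toFixed(6)},${zoom}z`;
}

const mapState = parseLoc("?loc=42.757188,-6.908581,8z");
console.log(mapState);
console.log(formatLoc(mapState));
```

Because parsing and formatting are inverse operations on well-formed input, a view restored from a shared URL yields the same URL again, which is what makes the links shareable across devices.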
The interactive map of LOD4Culture is arguably the most challenging part of the application, especially for accommodating different zoom levels (FR2). The solution devised is highly adaptable by partitioning the map view in a grid and by retrieving cell data in a two-step process: first requesting the number of CH sites in a cell, and then obtaining site data if it makes sense to plot it. In this way, network exchanges are reduced and client resources are not wasted by plotting hundreds or thousands of markers in a tiny area. In contrast, the browser of CH entities is much simpler, as the
The complexity of the application is greatly reduced by the use of a CRAFTS API. The interactive map only needs four template queries to gather the required data. Similarly, the CH entity browser just uses three RDF resources defined in the API model to fulfill its data needs (see sample calls C7–9 in Table 6). From the point of view of the application, it only sees JSON data and REST API calls. Behind the scenes, the CRAFTS API provides single access to three different endpoints by translating API requests into SPARQL queries. Despite this simplicity of use, the configuration file of LOD4Culture has been carefully crafted to match the needs of the application. In the case of template queries, this involves the selection of the prototype queries and then defining the parameters, e.g. the SPARQL query in Listing 1 is mapped to the
Since LOD4Culture is an interactive application, we have taken special care to meet latency requirements (NFR2). We employ client-side caching throughout a user session to avoid duplicated requests to the API. Server-side caching in CRAFTS can benefit the whole community of LOD4Culture users, as well as data providers that will receive less traffic. In this regard, the use of a common grid with fixed lengths per zoom level contributes to making server-side caching more effective. Despite all these measures, latency is quite dependent on data providers. As discussed in Section 3.2, we had to set up our own dataset due to latency problems with Wikidata. Thanks to SW technologies, the burden of gathering the data and uploading it into a new endpoint was negligible, and the process can be easily replicated.
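Client-side caching during a session can be sketched in JavaScript as a thin wrapper around the function that calls the API. The wrapper below is an assumption about how such a cache might look, not the implementation used in LOD4Culture; `fetchJson` is a hypothetical function that takes a URL and returns a promise of the parsed response.

```javascript
// Wraps a request function so that each URL is requested at most once
// per session; repeated calls reuse the stored promise.
function createCachedFetcher(fetchJson) {
  const cache = new Map();
  return function cachedFetch(url) {
    if (!cache.has(url)) {
      const pending = fetchJson(url).catch((error) => {
        cache.delete(url); // do not keep failed requests in the cache
        throw error;
      });
      cache.set(url, pending);
    }
    return cache.get(url);
  };
}

// Usage with a stub that counts how often the "API" is actually called.
// The URLs are placeholders, not real LOD4Culture API routes.
let apiCalls = 0;
const stubFetchJson = (url) => {
  apiCalls += 1;
  return Promise.resolve({ url });
};
const cachedFetch = createCachedFetcher(stubFetchJson);
cachedFetch("/example/cell?id=1");
cachedFetch("/example/cell?id=1"); // served from the cache
cachedFetch("/example/cell?id=2");
console.log(apiCalls);
```

Storing the promise rather than the resolved value also covers requests that are still in flight, so two map components asking for the same cell at the same time trigger a single request.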
All in all, LOD4Culture can be used for tourism and CH education purposes by lay users, i.e. without requiring knowledge of SW technologies. More than 1.7K users have employed the application in a year. We have received very positive feedback through social media and the application has been featured in DBpedia Newsletter and in the open data applications list of datos.gob.es. Our future work includes user studies in educational and tourism settings in order to better assess the potential of LOD4Culture. While the application can be useful in detecting inaccuracies or gaps in CH data, we aim to investigate how it can also facilitate contributions to the LOD sources.
