Abstract
Introduction
Data on the Web has steadily grown in the last decades. However, the heterogeneity of the data published on the Web has hindered its consumption and usage [59]. This scenario has fostered data transformation and publication of data as Knowledge Graphs in both academic and industrial environments [35]. These Knowledge Graphs normally expose Web data expressed in RDF and modeled according to an ontology.
A large number of techniques that query or translate data into RDF have been proposed. They follow two approaches, namely: (1) RDF materialization, which consists of translating data from one or more heterogeneous sources into RDF [5,49]; or (2) virtualization (Ontology-Based Data Access) [51,68], which consists of translating a SPARQL query into one or more equivalent queries that are distributed and executed on the original data source(s), and whose results are transformed back to the SPARQL results format [11]. Both types of approaches rely on an essential element, a mapping document, which is the key enabler for performing the required translation.
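The materialization approach can be illustrated with a minimal sketch. The IRIs, record fields, and mapping rule below are our own illustrative assumptions, not part of any of the cited languages; a virtualization engine would instead rewrite a SPARQL query against the source, which this sketch does not show.

```python
# Minimal sketch of RDF materialization: records from a (hypothetical)
# source are translated into N-Triples lines according to a tiny
# hard-coded mapping rule. The namespace and field names are assumed
# for illustration only.
EX = "http://example.org/"

def materialize(rows):
    """Translate source records into N-Triples lines."""
    triples = []
    for row in rows:
        subject = f"<{EX}city/{row['id']}>"
        triples.append(f"{subject} <{EX}name> \"{row['name']}\" .")
    return triples

rows = [{"id": 1, "name": "Madrid"}, {"id": 2, "name": "Oviedo"}]
for line in materialize(rows):
    print(line)
```

A declarative mapping document externalizes exactly the rule hard-coded here: which source fields produce which subjects, predicates, and objects.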
Mapping languages allow representing the relationships between the data model in heterogeneous sources and an RDF version that follows the schema of an ontology, i.e., they define the rules on how to translate non-RDF data into RDF. The original data can be expressed in a variety of formats such as tabular, JSON, or XML. Due to the heterogeneous nature of data, the wide variety of techniques, and the specific requirements that some scenarios may impose, an increasing number of mapping languages have been proposed [24,29,46]. The differences among them are usually based on three aspects: (a) the focus on one or more data formats, e.g., the W3C Recommendation R2RML focuses on SQL tabular data [19]; (b) a specific requirement they address, e.g., SPARQL-Generate [44] allows the definition of functions in a mapping for cleaning or linking the generated RDF data; or (c) whether they are designed for a scenario that has special requirements, e.g., the WoT-mappings [15] were designed as an extension of the WoT standard [40] and used as part of the Thing Descriptions [39].
As a result, the diversity of mapping languages provides a rich variety of options for tools to translate data from heterogeneous formats into RDF, in many different scenarios [21,41,48,55]. However, these tools are mostly tied to one mapping language, and sometimes they do not even implement the entire language specification [5,9]. Deciding which language and technique should be used in each scenario becomes a costly task, since the choice of one language may not cover all the needed requirements [20]. Some scenarios require a combination of mapping languages due to their different features, which requires the use of different techniques. In many cases, this diversity leads to ad hoc solutions that reduce reproducibility, maintainability, and reusability [36].
Mapping languages for KG construction maintain the same bottom-line idea and purpose: to describe and establish the relationships between data sources and the schema provided by an ontology. Therefore, it can be assumed that mapping languages share common inherent characteristics that can be modeled.
This paper presents the Conceptual Mapping ontology, which aims to gather the expressiveness of existing declarative mapping languages and represent their shared characteristics. The Conceptual Mapping ontology has been developed based on the requirements extracted from the Mapping Challenges proposed by the community1
The Conceptual Mapping ontology has been developed following the LOT Methodology [53]. It reuses existing standards such as DCAT [1] and WoT Security.2
The rest of this article is structured as follows. Section 2 provides an overview of relevant works centered on mapping languages. Section 3 describes the methodology used to develop the ontology. Section 4 presents the purpose and scope of the ontology, its requirements, and how they are extracted. Section 5 shows details about the ontology conceptualization and evaluation, and some examples. Section 6 illustrates how the ontology is published and maintained. Finally, Section 7 summarizes the work presented and draws some conclusions and future steps.
Analyzed mapping languages and their corresponding references
In this section, the current landscape of mapping languages is described first, regardless of the approach they follow, i.e., RDF materialization or virtualization. Then, previous works comparing mapping languages are surveyed.
The different scenarios in which mapping languages are used, and their specific requirements, have led to the creation of several mapping languages as well as extensions tailored to specific domains. This section presents and describes existing mapping languages, listed in Table 1. Depending on their syntax, they can be classified as RDF-based, SPARQL-based, or based on other schemas. It is worth mentioning that some mapping languages have become W3C recommendations, namely R2RML [19] and CSVW [66]. The surveyed languages include those considered relevant because of their widespread use, unique features, and current maintenance. Deprecated or obsolete languages are not included.

Existing mapping languages and their relationships.
In this category, we can also find more languages not related to R2RML. XLWrap [43] is focused on transforming spreadsheets into different formats. CSVW [66] enables tabular data annotation on the Web with metadata, but also supports the generation of RDF. Finally, WoT Mappings [15] are oriented to be used in the context of the Web of Things.
As the number of mapping languages increased and their adoption grew wider, comparisons between these languages inevitably occurred. This is the case of, for instance, SPARQL-Generate [44], which is compared to RML in terms of query/mapping complexity; and ShExML [29], which is compared to SPARQL-Generate and YARRRML from a usability perspective.
Some studies dig deeper, providing qualitative complex comparison frameworks. Hert et al. [33] provide a comparison framework for mapping languages focused on transforming relational databases to RDF. The framework is composed of 15 features, and the languages are evaluated based on the presence or absence of these features. The results lead the authors to divide the mappings into four categories (direct mapping, read-only general-purpose mapping, read-write general-purpose mapping, and special-purpose mapping), and to ponder the heavy reliance of most languages on SQL to implement the mapping, and the usefulness of read-write mappings (i.e., mappings able to write data in the database). De Meester et al. [20] show an initial analysis of five similar languages (RML + FnO, xR2RML, FunUL, SPARQL-Generate, YARRRML), discussing their characteristics according to three categories: non-functional, functional, and data source support. The study concludes by remarking on the need to build a more complete and precise comparative framework and calling for more active participation from the community to build it. To the best of our knowledge, there is no comprehensive work in the literature comparing all existing languages.
Methodology
This section presents the methodology followed for developing the Conceptual Mapping ontology. The ontology was developed following the guidelines provided by the Linked Open Terms (LOT) methodology. LOT is a well-known and mature lightweight methodology for the development of ontologies and vocabularies that has been widely adopted in academic and industrial projects [53]. It is based on the previous NeOn methodology [64] and includes four major stages: Requirements Specification, Implementation, Publication, and Maintenance (Fig. 2). In this section, we describe these stages and how they have been applied and adapted to the development of the Conceptual Mapping ontology.

Workflow proposed by the LOT methodology [53].
This stage refers to the activities carried out for defining the requirements that the ontology must meet. At the beginning of the requirements identification stage, the goal and scope of the ontology are defined. Next, the domain is analyzed in more detail by looking at the documentation, previously published data, standards, formats, etc. In addition, use cases and user stories are identified. Then, the requirements are specified in the form of competency questions and statements.
In this case, the specification of requirements includes purpose, scope, and requirements. The requirements are specified as facts rather than competency questions and validated with Themis [26], an ontology evaluation tool that allows validating requirements expressed as tests rather than SPARQL queries. The authors consider this approach to be adequate in this case since (1) there are no use cases, as this ontology is a mechanism for representing mapping languages’ features; and (2) there are no SPARQL queries, because these result from competency questions, which are in turn extracted from use cases and user stories. Further details are shown in Section 4.
Implementation
The goal of the Implementation stage is to build the ontology using a formal language, based on the ontological requirements identified in the previous stage. From the set of requirements, a first version of the model is conceptualized. The model is subsequently refined by running the corresponding evaluations. Thus, the implementation process follows iterative sprints; once the model passes all evaluations and meets the requirements, it is considered ready for publication.
The conceptualization is carried out representing the ontology in a graphical language using the Chowlk notation [10] (as shown in Fig. 4). The ontology is implemented in OWL 2 using Protégé. The evaluation checks different aspects of the ontology: (1) requirements are validated using Themis [26], (2) inconsistencies are found with the Pellet reasoner, (3) OOPS! [54] is used to identify modeling pitfalls, and (4) FOOPS! [31] is run to check the FAIRness of the ontology. Further details are described in Section 5.
Publication
The publication stage addresses the tasks related to making the ontology and its documentation available. The ontology documentation was generated with Widoco [30], a built-in documentation generator in OnToology [2], and it is published with a W3ID URL.5
Maintenance
Finally, the last stage of the development process, maintenance, refers to ontology updates as new requirements are found and/or errors are fixed. The ontology presented in this work promotes the gathering of issues and new requirements through the issue tracker of the ontology GitHub repository. Additionally, it provides change control, and the documentation enables access to previous versions. Further details are shown in Section 6.
Conceptual mapping requirements specification
This section presents the purpose, scope, and requirements of the Conceptual Mapping ontology. In addition, it also describes from where and how the requirements are extracted: by analyzing the mapping languages (presented as a comparative framework) and the Mapping Challenges proposed by the community.
Purpose and scope
The Conceptual Mapping ontology aims at gathering the expressiveness of declarative mapping languages that describe the transformation of heterogeneous data sources into RDF. This ontology-based language rests on the assumption that all mapping languages, used for the same basic purpose of describing data sources in terms of an ontology to create RDF, must share some basic patterns and inherent characteristics. Inevitably, not all features are common. As described in previous sections, some languages were developed for specific purposes, others extend existing languages to cover additional use cases, and others are in turn based on languages that already provide them with certain capabilities. The Conceptual Mapping ontology is designed to represent and articulate these core features, which are extracted from two sources: (1) the analysis of current mapping languages, and (2) the limitations of current languages identified by the community. These limitations, proposed by the W3C Knowledge Graph Construction Community Group,6
This ontology also has some limitations. As presented in Section 2, mapping languages can be classified into three categories according to the schema on which they are based: RDF-based, SPARQL-based, and based on other schemas. Conceptual Mapping is included in the first category and, as such, has the same inherent capabilities and limitations as RDF-based languages regarding the representation of the language as an ontology. This implies that it is feasible to represent their expressiveness, whether by reusing classes and/or properties or by creating equivalent constructs. Languages based on other approaches usually follow schemas that make them relatable to ontologies. This can be seen in the correspondence between YARRRML and RML: RML is written in Turtle syntax, while YARRRML [34] is mainly used as a user-friendly syntax to facilitate the writing of RML rules. It is based on YAML and can easily be translated into RML.7
Lastly, SPARQL-based languages pose a challenge. SPARQL is a rich and powerful query language [50] to which these mapping languages add further capabilities (e.g., SPARQL-Generate, Facade-X). It has an innate flexibility and capabilities sometimes not comparable to those of the other languages. For this reason, representing every single capability and feature of SPARQL-based languages is out of the scope of this article. Given the differences in representation paradigm between RDF and SPARQL for creating mappings, it cannot be ensured that the Conceptual Mapping covers all the possibilities that a SPARQL-based language can.
This subsection presents a comparison framework that collects and analyzes the main features included in mapping language descriptions. It aims to fill the aforementioned gap on language comparison. The diversity of the languages that have been analyzed is crucial for extracting relevant features and requirements. For this reason, the framework analyzes languages from the three categories identified in Section 2.
The selected languages fulfill the following requirements: (1) they are widely used, relevant, and/or include novel or unique features; (2) they are currently maintained and not deprecated; (3) they are not a serialization or a user-friendly representation of another language. For instance, D2RQ [8] and R2O [6] were superseded by R2RML, which is included in the comparison. XRM [69] is not included either, because it provides a syntax for CSVW, RML, and R2RML, which are also included.
The following RDF-based languages are included: R2RML [19], RML [24], KR2RML [60], xR2RML [46], R2RML-F [22], FunUL [38], XLWrap [43], WoT mappings [15], CSVW [66], and D2RML [12]. The SPARQL-based languages that were analyzed are: XSPARQL [7], TARQL [65], SPARQL-Generate [44], Facade-X [18] and SMS2 [63]. Finally, we selected the following languages based on other formats: ShExML [29], Helio Mappings [14] and D-REPR [67].
These languages have been analyzed based on their official specification, documentation, or reference paper (listed in Table 1). Specific implementations and extensions that are not included in the official documentation are not considered in this framework. The cells (i.e., language features) marked “*” in the framework tables indicate that there are non-official implementations or extensions that include the feature.

Input source data and reference ontology that represents information on cities and their location.
The framework has been built as a result of analyzing the common features of the aforementioned mapping languages, and also the specific features that make them unique and suitable for some scenarios. It includes information on data sources, general features for the construction of RDF graphs, and features related to the creation of subjects, predicates, and objects. In the following subsections, the features of each part of the framework are explained in detail. The language comparison for data sources is provided in Table 2, for triples creation in Table 3, and for general features in Table 4. All these tables are presented in Appendix B.
Throughout the section, there are examples showing how different languages use the analyzed features. The example is built upon two input sources: an online JSON file, “coordinates.json”, with geographical coordinates (Fig. 3(b)); and a table from a MySQL database, “cities” (Fig. 3(c)). The reference ontology is depicted in Fig. 3(a). It represents information about cities and their locations. The expected RDF output of the data transformation is shown in Listing 1. Each mapping represents only the relevant rules that the subsection describes. The entire mapping can be found in the examples section of the ontology documentation. 5
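The running example can be approximated with a short sketch. Since the exact contents of “coordinates.json”, the “cities” table, the ontology IRIs, and Listing 1 are not reproduced here, all field names and namespaces below are assumptions; plain Python literals stand in for the HTTP-accessible JSON file and the MySQL table.

```python
import json

# Hedged sketch of the running example: a JSON source with coordinates
# and a "cities" table are combined into RDF triples. A real mapping
# engine would retrieve the JSON over HTTP and the table over
# JDBC/ODBC; the field names ("city", "lat", "population") and the
# ontology namespace are illustrative assumptions.
EX = "http://example.org/city-ontology#"

coordinates_json = json.loads('[{"city": "Madrid", "lat": "40.41", "long": "-3.70"}]')
cities_table = [{"name": "Madrid", "population": 3300000}]

def materialize(cities, coords):
    index = {c["city"]: c for c in coords}  # index the JSON records by key
    triples = []
    for row in cities:
        s = f"<{EX}{row['name']}>"
        triples.append(f"{s} <{EX}population> \"{row['population']}\" .")
        if row["name"] in index:  # join condition on the city name
            c = index[row["name"]]
            triples.append(f"{s} <{EX}lat> \"{c['lat']}\" .")
    return triples
```

The join on the city name mirrors what the mapping rules in the following listings express declaratively.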

Expected RDF output for the data sources and the ontology in Fig. 3
Table 2 shows the ability of each mapping language to describe a data source in terms of retrieval, features, security, data format and protocol.
Half of the languages do not allow the definition of security terms. Some languages are specific to RDB terms (R2RML and its extensions, with
Regarding protocols, all languages consider local files, except WoT mappings, which are specific to HTTP(s). Support for HTTP(s) and database access (especially through the ODBC and JDBC protocols) is also very common. Only XSPARQL, TARQL, D-REPR, and XLWrap describe local files exclusively.
The features provided by each language are closely related to the data formats that are covered. Queries are common for relational databases and NoSQL document stores, while iterators are common for tree-like formats. Some languages also enable the description of delimiters and separators for tabular formats (e.g., CSVW defines the class
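The iterator pattern for tree-like formats can be sketched as follows. The dot-separated path syntax is a simplified assumption (real languages typically use JSONPath or XPath expressions), and the document structure is invented for illustration.

```python
import json

# Sketch of an "iterator" over a tree-like source: tree formats
# (JSON, XML) are traversed with an iteration pattern, whereas tabular
# sources are queried. The path syntax below (dot-separated keys
# leading to an array) is a simplified, assumed stand-in for JSONPath.
def iterate(document, path):
    """Yield one record per element of the array reached by `path`."""
    node = document
    for key in path.split("."):
        node = node[key]
    yield from node

doc = json.loads('{"results": {"cities": [{"name": "Madrid"}, {"name": "Oviedo"}]}}')
records = list(iterate(doc, "results.cities"))
```

Each yielded record then plays the role a table row plays for query-based sources: the unit over which subject, predicate, and object rules are evaluated.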
The most used format is tabular (RDB and CSV). Some languages, such as SMS2, ShExML, RML, SPARQL-Generate, Helio, and D2RML, can also process RDF graphs (e.g.


Table 3 shows how the different languages describe the generation of triples. We assess whether they generate the
The categories

xR2RML mapping rules to describe the ontology depicted in Fig. 3(a)

RML + FnO mapping rules (written in YARRRML) to describe the ontology depicted in Fig. 3(a)
Table 4 shows the features of mapping languages regarding the construction of RDF graphs such as
Most RDF-based languages allow static assignment to named graphs. R2RML, RML, R2RML-F, FunUL, and D2RML also enable dynamic definitions (e.g.,
Allowing conditional statements is uncommon; it is only considered by the SPARQL-based languages (with the exception of SMS2), XLWrap, and D2RML (e.g.
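A conditional mapping rule can be sketched as follows. The class IRIs and the population threshold are illustrative assumptions, not taken from any of the cited languages' specifications.

```python
# Sketch of a conditional mapping rule: the generated triple depends on
# a condition evaluated over the source record, in the spirit of the
# conditional constructs of SPARQL-based languages, XLWrap, and D2RML.
# The namespace, class names, and threshold are assumed for illustration.
EX = "http://example.org/"

def conditional_type(row):
    """Assign an ontology class depending on a condition over the record."""
    cls = "LargeCity" if row["population"] > 1_000_000 else "City"
    return f"<{EX}{row['name']}> a <{EX}{cls}> ."

triples = [conditional_type(r) for r in
           [{"name": "Madrid", "population": 3300000},
            {"name": "Oviedo", "population": 220000}]]
```

In a declarative language, the branch above would be stated as a condition attached to the mapping rule rather than as imperative code.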
Linking rules using join conditions imply evaluating whether the selected fields are equal. Since the join condition is the most common mechanism, applying the equality operator is the preferred choice. Only a few languages (e.g., Helio) consider other similarity functions to perform link discovery, such as the Levenshtein and Jaro-Winkler distances.
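Similarity-based linking can be sketched as follows. The threshold and the `link` helper are our own illustrative assumptions; Helio's actual linking syntax is declarative and is not reproduced here.

```python
# Sketch of link discovery beyond strict equality: records from two
# sources are linked when the Levenshtein (edit) distance between their
# key fields falls under a threshold, rather than requiring exact joins.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def link(left, right, key, threshold=2):
    """Pair records whose key values are at most `threshold` edits apart."""
    return [(l, r) for l in left for r in right
            if levenshtein(l[key], r[key]) <= threshold]
```

With a threshold of 0, this degenerates to the equality join that most languages support.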

Helio mapping with linking rules to describe the ontology depicted in Fig. 3(a)

SPARQL-Generate query with conditional rules to describe the ontology depicted in Fig. 3(a)
Following its inception, the W3C Knowledge Graph Construction Community Group6 defined a series of challenges for mapping languages based on the experience of its members in using declarative mappings.1 These challenges are a summary of the limitations of current languages. They have been partially and independently addressed in some of the analyzed languages, such as RML [23] and ShExML [27]. These challenges are summarized as follows:
Conceptual mapping requirements
In order to extract the requirements that serve as the basis for the development of the Conceptual Mapping ontology, we take as input the analysis from the comparison framework and the Mapping Challenges described in previous sections. From a combination of their features, we extract 30 requirements. These requirements are expressed as facts, and are available in the ontology repository and portal.10
The requirements gathered range from general-purpose to fine-grained details. The general-purpose requirements refer to the basic fundamental capabilities of mappings, e.g., to create the rules to generate RDF triples (cm-r8) from reference data sources (cm-r7). The requirements at the next level of detail involve specific restrictions and functionalities, e.g., indicating the specific type (whether they are IRIs, blank nodes, etc.) of subjects (cm-r16), predicates (cm-r17), objects (cm-r18), named graphs (cm-r19), datatypes (cm-r20), and language tags (cm-r21), as well as the possibility of using linking conditions (cm-r23) and functions (cm-r15). Finally, some requirements refer to specific details or features regarding the description of data sources (e.g., cm-r4, cm-r6) and transformation rules (e.g., cm-r14, cm-r22, cm-r25).
Not all the features observed in the comparison framework have been added to the set of requirements. Some features are highly specific and supported by only a minority of languages, sometimes a single one. Such detailed features were included in the requirements that build the core specification of the Conceptual Mapping only when they address the basic functionality of the language. The remaining details are left to be included as extensions. This differentiation and the modeling criteria are explained further in Section 5.
This section describes in detail the activities and tasks carried out to implement the ontology, which consist of the conceptualization of the model, its encoding in a formal language, and its evaluation to fix errors and inconsistencies and to ensure that it meets the requirements. Additionally, an example of the ontology’s use is presented at the end of the section.
Ontology conceptualization
The ontology’s conceptualization is built upon the requirements extracted from experts’ experience: a thorough analysis of the features and capabilities of current mapping languages, presented as a comparative framework, and the languages’ limitations discussed by the community and denoted as Mapping Challenges. The resulting ontology model is depicted in Fig. 4. This model represents the core specification of the Conceptual Mapping ontology, which contains the essential features to cover the requirements. Some detailed features are also included when considered important to the language’s expressiveness, or needed for the language’s main functionality. Other detailed features are considered as extensions, as explained further in this section. For description purposes, we divide the ontology into two parts, Statements and Data Sources, which compose the core model. Neither part can describe a complete mapping on its own; for that reason, they are not separated into independent modules.

Visual representation of the conceptual mapping ontology created using the Chowlk diagram notation [10].
Data properties in the
In order to represent the fragments of data that are referenced in a statement map, the class
A source frame corresponds to a data source (with
Resource maps are expressed with an evaluable expression (
A special case of a statement map is a conditional statement map (
The following ontology design patterns have been applied in the conceptualization as they are common solutions to the problem of representing taxonomies and linked lists:
The SKOS vocabulary has been reused to represent some coding schemes, such as the protocol taxonomy and the function taxonomy. The design pattern consists of having an instance of the class
Ontology evaluation
The ontology, once implemented, has been evaluated in different ways to ensure that it is correctly implemented, has no errors or pitfalls, and meets the requirements.
With these evaluations, we can conclude that the ontology is correctly encoded and implemented, and that it meets the requirements specified in Section 4.
The Conceptual Mapping ontology has been designed as a core ontology. However, as time passes, new requirements may emerge. In order to include these new requirements, new modules of the Conceptual Mapping ontology shall be developed. It is worth mentioning that this is a common practice for ontologies, which is highly suitable for adapting an existing ontology to new scenarios through ontology modules specialized for a specific set of requirements. A clear example of this is the SAREF ontology,13

CSV extension conceptualization.
In the case of the Conceptual Mapping, a sample extension16

Description with the conceptual mapping of two data sources (a JSON file and a relational database), their access and fields
This section builds a mapping in three steps (data sources in Listing 8, triples in Listing 9, and special statements in Listing 10) to illustrate how the proposed language can describe data with different features. The mapping uses the data sources “coordinates.json” (Fig. 3(b)) and “cities” (Fig. 3(c)) as input and the ontology depicted in Fig. 3(a) as reference to create the output RDF shown in Listing 1. Additionally, Appendix A contains a second example that illustrates features different from the ones represented in this section, to provide more insight into the expressiveness of this language.

Description with the conceptual mapping of the creation of regular statements from the data sources described in Listing 8

The ontology is considered ready for publication when it passes all evaluations. This means that it is correctly implemented in the formal language (OWL) and meets the requirements.
In order to publish the ontology, the first step required is to create the ontology documentation. We used Widoco [30], integrated inside the OnToology [2] system, to automatically generate and update the HTML documentation every time there is a commit in the GitHub repository where the ontology is stored. This documentation contains the ontology metadata, links to the previous version, a description of the ontology, the diagram, and detailed examples of the capabilities of the language. It is published using a W3ID URL 5 and under the CC BY-SA 4.0 license.
The HTML documentation is not the only documentation resource provided. An overview of all resources is provided in the ontology portal. 3 This portal shows in a table the ontologies associated with the Conceptual Mapping ontology. For now, the core (Conceptual Mapping) and an extension to describe CSV files in detail (Conceptual Mapping – CSV Description) are available. For each ontology, links to the HTML documentation, the requirements, the GitHub repository, the Issue Tracker, and the releases are provided.
The maintenance is supported by the Issue Tracker,17
This paper presents the Conceptual Mapping, an ontology-based mapping language that aims to gather the expressiveness of current declarative mapping languages. In order to build this ontology, we first conducted an extensive analysis of the state-of-the-art mapping specifications (presented as a comparison framework) and the mapping challenges proposed by the community, improving the understanding of current mapping languages and expanding previous studies on the comparison of language characteristics. This analysis then allowed us to develop a unique model that aims to integrate the common features of existing languages, acknowledging the limitations of representing the full potential of SPARQL-based languages such as SPARQL-Generate or Facade-X. Next, the approach was evaluated by validating that the constructs provided by this language can address the requirements extracted from the two-fold analysis. Thus, we ensure that this language covers the required expressiveness. The language is formalized as an ontology that is available online along with its documentation.
Our future work lines include exploring the limitations of the current scope and addressing the gap to be able to represent the expressiveness of SPARQL-based languages. Similarly to a programming language, SPARQL-based languages can specify “instructions” to describe and transform data that is not accessible by other languages, because of inner restrictions or simply because they lack the necessary constructs. At some point, modeling constructs for each specific use case becomes unfeasible, impractical and, very likely, too verbose. Despite the difficulties, we plan to keep extending our ontology with new modules as new issues arise, addressing its limitations to a reasonable extent. We also want to explore the possibility of implementing this ontology as a common interchange language for mapping translation purposes [16,37], which we believe can help build bridges toward mapping interoperability. We also consider the integration of the mapping translation step into the common workflow for constructing virtual and materialized Knowledge Graphs, using this conceptual model as the core resource for carrying out this process. Furthermore, we want to integrate this ontology, with the translation step between different specifications, into previous work on mapping rules management, MappingPedia [56]. In this manner, we aim to help users and practitioners during the selection of mapping languages and engines, not forcing them to select the ones that are only under their control, but enabling them to select the ones that best fit their own specific use cases. Finally, we want to specify the correspondence of concepts between the considered mapping languages and the Conceptual Mapping, and to formally define the semantics and operators required to perform the mapping translation, adapting previous works on schema and data translations [3,4].

