Abstract
Introduction
Over the last two decades, ontologies have seen widespread use for a variety of purposes. Some, such as the Gene Ontology [16], have found significant use by third parties. However, the majority of ontologies have seen hardly any re-use outside the use cases for which they were originally designed [27,37].
It behooves us to ask why this is the case, in particular since the heavy re-use of ontologies was part of the original conception for the Semantic Web field. Indeed, many use cases have high topic overlap, so that re-use of ontologies on similar topics should, in principle, lower development cost. However, according to our experience, it is often much easier to develop a new ontology from scratch than it is to try to re-use and adapt an existing ontology. We can observe that this sentiment is likely shared by many others, as new development of an ontology so often seems to be preferred over adapting an existing one.
We posit, based on our experience, that four of the major issues preventing widespread re-use are (i) differing representational granularity, (ii) lack of conceptual clarity in many ontologies, (iii) lack and difficulty of adherence to established good modeling principles, and (iv) lack of re-use emphasis and process support in available ontology engineering tooling. We explain these aspects in more detail in the following. As a remedy for these issues, we propose tool-supported modular ontology modeling, as laid out in this paper.
Note that a fine-grained ontology can be populated with coarse-granularity data; the converse is not true. If a use case requires fine-granularity data, a coarse-grained ontology is essentially useless. On the other hand, using a fine-grained ontology for a use case that requires only coarse granularity data is unwieldy due to (possibly massively) increased size of ontology and data graph.
Even more problematic is that two use cases may differ in granularity in different ways in different parts of the data, respectively the ontology. That is, the level of abstraction is not uniform across the data. For example, one use case may call for details on the statistical models underlying population data, but not for measurement instruments for temperatures, whereas another use case may only need estimated population figures, but require calibration data for temperature measurements. Essentially, this means that attempting to re-use a traditional ontology may require modifying it in very different ways in different parts of the ontology. An additional complication is that ontologies are traditionally presented as monolithic entities, and it is often hard to determine where exactly to apply such a change in granularity.
We can briefly illustrate this using an example from the OAEI (the Ontology Alignment Evaluation Initiative).
By definition, an ontology with high conceptual clarity will be much easier to re-use, simply because it is much easier to understand the ontology in the first place. Thus, a key quest for ontology research is to develop ontology modeling methodologies which make it easier to produce ontologies with high conceptual clarity.
These are ontological terms: a perdurant is “an entity that only exists partially at any given point in time,” and an endurant is “an entity that can be observed as a complete concept, regardless of the point in time.”
A further issue is that even in cases where the aforementioned re-use challenges are manageable, implementing and subsequently maintaining re-use in practice is problematic due to
Furthermore, through ontology re-use, the ontologist commits to a design and logic built by a third party. As the resulting ontology evolves, keeping track of the provenance of re-used ontological resources and their locally instantiated representations may become important, e.g., to resolve design conflicts resulting from differing requirements, or to keep up-to-date with the evolution of the re-used ontology. This is particularly important in case remote resources are reused directly rather than through cloning into a local representation (e.g., using owl:imports).
Processes and tools should be sought that make it possible to leverage modeling experience by seasoned experts, without actually requiring their direct involvement. This was one of the original ideas behind ontology re-use which, unfortunately, did not quite work out that well, for reasons including those mentioned above. Our modularization approach, however, together with the systematic utilization of ontology design patterns, and our accompanying tools, gives us a means to address this issue.
The notion of a module is central to our approach.
Note that modules, in this sense, indicate a departure from a more traditional perspective on ontologies, where they are often viewed as enhanced taxonomies, with a strong emphasis on the structure of the class subsumption hierarchy. Modules can contain their own taxonomy structure, guided by the design logic of the module, that ideally integrates into a usability-wise coherent taxonomy of the ontology as a whole; but the latter is not a hard requirement. From our perspective, the occurrence of subclass relationships within an ontology is not a key guiding principle for modeling or ontology organization. As we will see, modules make it possible to approach ontology modeling in a divide-and-conquer fashion: first by modeling one module at a time, and then by connecting the modules. Other divide-and-conquer approaches have also recently been proposed [41,59], and while they seem to be compatible with ours, exact relationships still need to be established.
Modules furthermore provide an easy way of avoiding the hassle of dealing with ontologies that are large and monolithic: understanding an ontology amounts to understanding each of its modules, and then their interconnections. This, at the same time, provides a recipe for documentation which resonates with domain experts’ conceptualizations (which were captured by means of the modules), and thus makes the documentation and ontology easier to understand. Additionally, using modules facilitates modification, and thus adapting an ontology to a new purpose, as a module is much more easily replaced by a new module with, for instance, higher granularity, because the module inherently identifies where changes should be localized.
The systematic use of ontology design patterns is another key component of our approach.
In our approach, well-designed ontology design patterns, provided as templates to the ontology modelers, make it easier to follow already established good modeling principles, as the patterns themselves will already reflect them [24]. When a module is to be modeled, within our process there will always be a check whether some already existing ontology design pattern is suitable to be adapted for the purpose. Modules, as such, are often derived from patterns as templates.
The principles and key aspects laid out above are tied together in a clearly defined modular ontology modeling process which is laid out below, and which is a refinement – with some changes of emphasis – of the eXtreme Design methodology [6]. It is furthermore supported by a set of tools developed to support this process, in particular the CoModIDE plug-in for Protégé, which we will discuss in detail below. Also central to our approach is that it is usually a collaborative process, with a (small) team that jointly has the required domain, data, and ontology engineering expertise, and that the actual modeling work utilizes schema diagrams as the central artifact for modeling, discussion, and documentation.
This paper is structured as follows. Section 2 describes our related work – this covers precursor methods, the eXtreme Design methodology, and overviews of concepts fundamental to our approach. Section 3 describes our modular ontology modeling process in detail. Section 4 presents CoModIDE as a tool for supporting the development of modular ontologies through a graphical modeling paradigm, as well as a rigorous evaluation of its effectiveness and usability. Section 5 describes additional, supporting infrastructure and other resources for the MOMo process. Finally, in Section 6, we conclude.
This paper significantly extends [51] and summarizes several other workshop and conference papers: [50,53], and [52].
Ontology engineering methods
The ideas underpinning the Modular Ontology Modeling methodology build on years of prior ontology engineering research, covering organizational, process, and technological concerns that impact the quality of an ontology development process and its results.
The METHONTOLOGY methodology is presented by Fernández et al. in [15]. It is one of the earlier attempts to develop a development method specifically for ontology engineering processes (prior methods often include ontology engineering as a sub-discipline within knowledge management, conflating ontology-specific issues with other, more general types of issues). Fernández et al. suggest, based largely on the authors’ own experiences of ontology engineering, an ontology lifecycle consisting of six sequential work phases or states: specification, conceptualization, formalization, integration, implementation, and maintenance.
The On-To-Knowledge Methodology (OTKM) [61] is, similarly to METHONTOLOGY, a methodology for ontology engineering that covers the big steps but leaves out the detailed specifics. OTKM is framed as covering both ontology engineering and a larger perspective on knowledge management and knowledge processes, but it heavily emphasizes the ontology development activities and tasks (in [61] denoted the Knowledge Meta Process).
DILIGENT, by Pinto et al. [43], is an abbreviation for DIstributed, Loosely-controlled and evolvInG Engineering of oNTologies.
In all three of these well-established methods, the process steps that are defined are rather coarse-grained. They give guidance on overall activities that need to be performed in constructing an ontology, but more fine-grained guidance (e.g., how to solve common modeling problems, how to represent particular designs on the concept or axiom level, or how to work around limitations in the representation language) is not included. It is instead assumed that the reader is familiar with such specifics of constructing an ontology. This lack of guidance is arguably a contributor to the issues preventing re-use discussed in Section 1.
Ontology design patterns
Ontology Design Patterns (ODPs) were introduced at around the same time independently by Gangemi [17] and Blomqvist and Sandkuhl [8], as potential solutions to the drawbacks of classic methods described above. The former defines such patterns by way of the characteristics that they display, including examples such as
A substantial body of work has been developed based on this idea, by a sizable distributed research community.
MOMo extends those methods, but also incorporates results from our past work on how to document ODPs [26,29,32], how to implement ODP support tooling [22], and how to instantiate patterns into modules by “stamping out copies” [24].
The eXtreme Design (XD) methodology [6] was originally proposed as a reaction to previous waterfall-oriented methods (e.g., some of those discussed above). XD instead borrows from agile software engineering methods, emphasizing a divide-and-conquer approach to problem-solving, early or continuous deployment rather than a “one-shot” process, and early and frequent refactoring as the ontology grows. Crucially, XD is built on the reuse of ontological best practices via ODPs.

eXtreme Design method overview, from [6].
The XD method consists of a number of tasks, as illustrated in Fig. 1. The first two tasks deal with establishing a project context (i.e., introducing initial terminology and obtaining an overview of the problem) and collecting initial requirements in the form of a prioritized list of user stories (describing the required functionality in layman’s terms). These steps are performed by the whole XD team together with the customer, who is familiar with the domain and who understands the required functionalities of the resulting ontology. The later steps of the process are performed by pairs of developers (in the figure, these steps are enclosed in the large box). Each pair begins by selecting the top-prioritized user story that has not yet been handled, and transforms that story into a set of requirements in the form of competency questions (data queries), contextual statements (invariants), and reasoning requirements. Customer involvement at this stage is required to ensure that the user story and the elicited requirements have been properly understood.
The development pair then selects one or a small set of interdependent competency questions for modeling. They attempt to match these against a known ODP, possibly from a designated ODP library. The ODP is adapted and integrated into the ontology module under development (or, if this iteration covers the first requirements associated with a given user story, a new module is created from it). The module is tested against the selected requirements to ensure that it covers them properly. If that is the case, then the next set of requirements from the same user story is selected, a pattern is found, adapted, and integrated, and so on. Once all requirements associated with one user story have been handled, the module is released by the pair and integrated with the ontology developed by the other pairs in the development team. The integration may be performed either by the development pair themselves, or by a specifically designated integration pair.
XD has been evaluated experimentally and observationally, with results indicating that the method contributes to reduced error rates in ontologies [5,7], increased coverage of project requirements [5], and that pattern usage is perceived as useful and helpful by inexperienced users [5,7,20]. However, results also indicate that there are pitfalls associated with a possibility of over-dependence on ODP designs, as noted in [20].
SAMOD [42], or Simplified Agile Methodology for Ontology Development, is another agile, test-driven approach to ontology engineering.
Hammar [23] presents a set of proposed improvements to the XD methodology under the umbrella label “XD 1.1”. These include (1) a set of roles and role-specific responsibilities in an XD project, (2) suggestions on how to select and implement other forms of ontology re-use in XD than just patterns (e.g.,
XD, SAMOD, and XD 1.1 emphasize the need for suitable support tooling for, e.g., finding suitable ODPs, instantiating those ODPs into an ontology, and executing tests across the ontology or parts of it. In developing MOMo and the CoModIDE platform, we propose and develop solutions to two additional support tooling needs.
Graphical conceptual modeling
[19] proposes three factors (see Fig. 2) that influence the construction of a conceptual model, such as an ontology.

Factors affecting conceptual modeling, from [19].
Graphical conceptual modeling approaches have been extensively explored and evaluated in fields such as database modeling, software engineering, business process modeling, etc. Studying model grammar, [58] compares EER notation with an early UML-like notation from a comprehensibility point of view. This work observes that restrictions are easier to understand in a notation where they are displayed coupled to the types they apply to, rather than the relations they range over. [10] proposes a quality model for EER diagrams that can also extend to UML. Some of the quality criteria in this model that are relevant to graphical modeling of OWL ontologies include
[1] studies the usability of UML, and reports that users perceive UML class diagrams (closest in intended use to ontology visualizations) to be less easy-to-use than other types of UML diagrams; in particular, relationship multiplicities (i.e., cardinalities) are considered frustrating by several subjects. UML displays such multiplicities by numeric notation on the end of connecting lines between classes. [36] analyses UML and argues that while it is a useful tool in a design phase, it is overly complex and as a consequence, suffers from redundancies, overlaps, and breaks in uniformity. [36] also cautions against using difficult-to-read and -interpret adornments on graphical models, as UML allows.
Various approaches have been developed for presenting ontologies visually and enabling their development through a graphical modeling interface, the most prominent of which is probably
For such collaborative modeling use cases, the commercial offering
CoModIDE is partially based on the Protégé plugin
Modular Ontology Modeling (MOMo)
(Momo is the protagonist of the 1973 fantasy novel “Momo” by Michael Ende; the antagonists are the Men in Grey, who cause people to waste time.)
In this part of the paper, we lay out the key components – namely schema diagrams, our approach to OWL axiomatization, ontology design patterns, and the concept of modules already mentioned previously – as well as the process which ties them together. In Sections 4 and 5, we discuss our supporting tools and infrastructure; however, they should be considered just one possible instantiation of the more general MOMo methodology. Indeed, most of the first part of the MOMo process is, in our experience, best done in analog mode, armed with whiteboards, flip-charts, and a suitable modeling team.
Team composition is of critical importance for establishing a versatile modular ontology. Different perspectives are very helpful, as long as the group does not lose focus. Arrival at a consensus model between all parties which constitutes a synthesis of different perspectives is key, and such a consensus is much more likely to be suitable to accommodate future use cases and modifications. It is therefore advisable to have more than one domain expert with overlapping expertise, and more than one ontology engineer on the team. Based on our experiences, three types of participants are needed in order to have a team that can establish a modular ontology: domain experts, ontology engineers, and data scientists. Of course some people may be able to fill more than one role. An overall team size of 6–12 people appears to be ideal, based on our experiences (noted in Section 5.5). Meetings with the whole team will be required, but in the MOMo process most of the work will fall on the ontology engineers between the meetings.
Schema diagrams
Schema diagrams are a primary tool of the MOMo process. In particular, they are the visual vehicle used to coalesce team discussions into a draft model and used centrally in the documentation. This diagram-based approach is also reflected in our tools, which we will present in Sections 4–5.
Let us first explain what we do – and do not – mean by schema diagram, using Fig. 3 as an example. (The schema diagrams in this paper were produced with yEd.)
Our schema diagrams are labeled graphs that indicate OWL entities and their (possible) relationships. Nodes can be labeled by (1) classes (EntityWithProvenance – rectangular, orange, solid border), (2) modules (Agent, PersonRecord, ProvenanceActivity – rectangular, light blue, dashed border), (3) controlled vocabularies (DocumentTypes, LicenseInformation – rectangular, purple, solid border), or (4) datatypes (xsd:string, xsd:anyURI – oval, yellow, solid border). Arrows can be white-headed and unlabeled, indicating a subclass relationship (the arrow between PersonRecord and EntityWithProvenance), or labeled with the name of a property; whether the property is a data or an object property is identified by the target of the arrow, which is a datatype node in the former case.

Indication of a module in a diagram means that behind the node (light blue, dashed border) there may be a complex model in its own right, which would be discussed, depicted, and documented separately. For example, PersonRecord in the Enslaved Ontology is a complex module with several sub-modules. The diagram in Fig. 3 “collapses” this into a single node, in order to emphasize what is essential for the Provenance module. Controlled vocabularies are predefined sets of IRIs with a specific meaning that is documented externally (i.e., not captured in the ontology itself). A typical example would be IRIs for physical units like meter or gram or, as in our example diagram, IRIs for specific copyright licenses, such as CC-BY-SA. Datatypes are the concrete datatypes allowed in OWL. This type of schema diagram underlies a study on automatic schema diagram creation from OWL files [50].
Note that our schema diagrams do not carry a formal semantics; as we discuss below, a certain amount of ambiguity is in fact intended.
As already mentioned, OWL axioms are the key constituents of an ontology as a data artifact, although in our experience quality documentation is of at least the same importance. As has been laid out elsewhere [30], axiomatizations can have different interpretations, and while they can, for example, be used for performing deductive reasoning, this is not their main role as part of the MOMo approach. Rather, for our purposes, axioms serve to narrow down the possible interpretations of the schema diagram.
As such, we recommend a rather complete axiomatization, as long as it does not force an overly specific reading on the ontology. We usually use the checklist from the OWLAx tool [47] to axiomatize with simple axioms. More complex axioms, in particular those that span more than two nodes in a diagram, can be added conventionally or by means of the ROWLTab Protégé plug-in [45,46].
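To illustrate the kinds of simple axioms such a checklist produces, consider a schema diagram edge from class $A$ via property $r$ to class $B$. Typical candidate axioms, shown here in description logic notation, include the following; only those matching the intended meaning, as confirmed with the domain experts, would actually be asserted:

```latex
\begin{align*}
\exists r.\top &\sqsubseteq A && \text{(domain restriction)}\\
\top &\sqsubseteq \forall r.B && \text{(range restriction)}\\
A &\sqsubseteq \exists r.B && \text{(existential: every $A$ has some $r$-successor in $B$)}\\
A &\sqsubseteq {\leq}1\, r.B && \text{(at most one $r$-successor in $B$)}
\end{align*}
```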
As already mentioned, Ontology Design Patterns (ODPs) originated in the early 2000s as reusable solutions to frequently occurring ontology design problems. Most ODPs can currently be found on the ontologydesignpatterns.org portal, and they appear to be of very varied quality, both in terms of their design and their documentation, following a variety of different design principles. While they have proven useful to the community [28], as part of MOMo we re-imagine ontology design patterns and their use.
Most importantly, rather than working with a crowd-sourced collection of ODPs, there seems to be a significant advantage in working with a well-curated pattern library, such as MODL.

Schema diagram of the MODL Provenance ODP. It is based on the core of PROV-O [44].
As an example, a schema diagram for the MODL Provenance pattern is provided in Fig. 4. In MOMo, the pattern would be used as a template for a corresponding module.
An (ontology) module, in the sense used here, captures one of the key notions of the domain to be modeled.
Modules can be overlapping, or nested. While they are often based on some shared semantics, as encoded in an ODP, this is not a hard requirement; the purpose of a module is to encapsulate a set of interrelated functionality, and the choice of which classes and properties the module covers can be, and often is, guided not only by the semantics of the domain, but also by the development context and use case. For example, in the context of Fig. 3, the PersonRecord class could reasonably be considered to be outside the module. Likewise, the EntityWithProvenance class may or may not be considered part of the PersonRecord module. The latter may depend on the question of how “central” provenance for person records is in the application context of the ontology. In this sense, ontology modules are ambiguous in their delineation, just as the human concepts they are based on.

Schema diagram of a supply chain ontology currently under development by the authors.
As a data artifact, though – i.e., in the OWL file of the ontology – we will use the above-mentioned Ontology Pattern Language OPLa to identify modules; i.e., the ontology engineers will have to make an assessment of how to delineate each module. OPLa will furthermore be used to identify the ODPs (if any) which were used as templates for a module.
Finally, an ontology’s modules will drive the documentation, which will usually discuss each module in turn, with separate schema diagrams, axioms, examples and explanations, and will only at the very end discuss the overall ontology which is essentially a composition of the modules. In a diagram that encompasses several modules, the modules can be identified visually using frames or boxes around sets of nodes and arrows. An example for this is given in Fig. 5. Several modules are identified by grey boxes in this diagram, including nested modules such as on the lower right.
We now describe the Modular Ontology Modeling workflow that we have been applying and refining over the past few years. It borrows significantly from the eXtreme Design approach described in Section 2.3, but has an emphasis on modularization, systematic use of schema diagrams, and late-stage OWL generation. Table 1 summarizes the steps of the workflow, and the following sections discuss each step in more detail. A walk-through tutorial for the approach can be found in [56].
MOMo workflow
This workflow is not necessarily a strict sequence, and work on later steps may cause reverting to an earlier step for modifications. Sometimes subsequent steps are done together, e.g., 4 and 5, or 7 and 8.
Steps 1 through 4 can usually be done through a few shorter one-hour teleconferences (or meetings), the number of which depends a lot on the group dynamics and prior experience of the participants. This sequence would usually also include a brief tutorial on the modeling process. If some of the participants already have a rather clear conception of the use cases and data sources, then 2 or 3 one-hour calls would often suffice.
In our experience, synchronous engagement (in the sense of longer meetings) of the modeling team usually cannot be avoided for step 5. Ideally, these would be conducted as in-person meetings, which for efficiency should usually be set up for 2 to 3 consecutive days. Online meetings can be almost as effective, but for this we recommend several – at least 3 – consecutive half-day sessions of about 4–5 hours each.
Steps 6 to 10 are mostly up to the ontology engineers on the team; however, they would request feedback and correctness checks from the data and domain experts. This can be done asynchronously, but depending on preference could also include some brief teleconferences (or meetings).
As the first step, the use case, i.e., the problem to be addressed, should be described. The output description can be very brief, e.g., a paragraph of text, and it does not necessarily have to be very crisp. In fact it may describe a set of related use cases rather than one specific use case, and it may include future extensions which are currently out of scope. Setting up a use case description in this way alerts the modeling team to the fact that the goal is to arrive at a modular ontology that is extensible and re-usable for adjacent but different purposes. In addition to capturing the problem itself, the use case descriptions can also describe existing data sources that the ontology needs to be able to represent or align against, if any.

Example use case description, taken from [56].
An example for such a use case description can be found in Fig. 6. In this particular case, the possible data sources would be a set of different recipe websites such as allrecipes.com.
Competency questions are examples of queries of interest, expressed in natural language, that should be answerable from the data graph with which the ontology will be populated. Competency questions help to refine the use case scenario, and can also serve as a sanity check on the adequacy of the data sources for the use case. While competency questions can often be gathered during work on the use case description, it is sometimes also helpful to collect them from potential future users. For example, for an ontology on the history of the slave trade [54], professionals, school children, and some members of the general public were asked to provide competency questions. A few examples are provided in Fig. 7. We have found experientially that 10–12 sufficiently different competency questions are usually enough.

Example competency questions, taken from [54].
This is a central step which sets the stage for the actual modeling work in step 5. The main idea is that each of the identified key notions will become a module; however, during modeling, some closely related notions may also become combined into a single module. It is also possible that at a later stage it is realized that a key notion had been forgotten, which is easily corrected by adding the new key notion to the previous list.
The key notions are determined by the modeling team, by taking into consideration the use case description, the possible data sources, and the competency questions from the previous steps. One approach, which can help guide this elicitation is to generalize use case descriptions and/or competency questions into
The list of key notions can act not only as a feature inclusion list, but also as a control to help prevent feature creep; in our experience, it is not unusual for modelers to try to generalize their modeling early on, including additional concepts and relations that are not strictly speaking part of the project requirements. By keeping track of requirements and their provenance, from use case descriptions through competency questions through key notions and subsequently modules, one can prevent such premature generalization. Ideally this workflow is supported by integrated requirements management tooling that provides traceability of those requirements.
An example for key notions, for the recipe scenario from Fig. 6, is given in Fig. 8.

In MOMo, we utilize pattern libraries such as MODL. For each of the key notions identified in the previous step, we thus attempt to find a pattern from the library which seems close enough or modifiable, so that it can serve as a template for a first draft of a corresponding module. For example, for
For some key notions there may be different reasonable choices for a pattern. For example,
In some cases, there will be no pattern in the library which can reasonably be used as a template. This is of course fine; it just means that the module will have to be developed from scratch.
Create schema diagrams for modules
This step usually requires synchronous work sessions by the modeling team, led by the ontology engineers. The key notions are looked at in isolation, one at a time, although of course the ontology engineers should simultaneously keep an eye on basic compatibility between the draft modules. The modeling order is also important. It often helps to delay the more complicated, involved or controversial modules, and focus first on modules that appear to be relatively clear or derivable from an existing pattern. It is also helpful to begin with notions that are most central to the use case.
A typical modeling session could begin with a discussion as to which pattern may be most suitable to use as a template (thus overlapping with step 4). Or it could start with the domain experts attempting to explain the key notion, and its main aspects, to the ontology engineers. The ontology engineers would query about details of the notion, and also about available data, until they can come up with a draft schema diagram which can serve as a prompt for further discussion.
Indeed, the idea of prompting with schema diagrams is in our experience a very helpful one for these modeling sessions. A prompt in this sense does not have to be exact or even close to the eventual solution. Rather, the diagram used as a prompt reflects an attempt by the ontology engineer based on their current, and often naturally limited, understanding of the key notion. Usually, such a prompt will prompt(!) the domain and data experts to point out the deficiencies of the prompt diagram, thus making it possible to refine or modify it, or to completely reject it and come up with a new one. Discussions around the prompts also sometimes expose disagreements between the different domain experts in the team, in which case the goal is to find a consensus solution. It is important, though, that the ontology engineers attempt to keep the discussion focused mostly on the notion currently being modeled.
Ontology engineers leading the modeling should also keep in mind that schema diagrams are highly ambiguous. This is important for several reasons.

A minimalistic provenance module based on the MODL Provenance pattern shown in Fig. 4.
For instance, some critique by a domain expert may be based on an unintended interpretation of the diagram. When appropriate, the ontology engineers should therefore explain the meaning of the diagram in natural language terms, such as “there is one
Furthermore, eventually (see the next step) the ontology engineers will have to convert the schema diagrams into a formal model which will no longer be ambiguous. The ontology engineers should therefore be aware that they need to understand how to interpret the diagram in the same way as the domain experts. This can usually be done by asking the domain experts – during this step or a subsequent one – concrete questions about the intended meaning, e.g., whether a person can have several children, or at most one, etc.
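Such a question directly determines an axiom choice. For example (property name hypothetical), if the domain experts confirm that a person can have at most one child in the intended data, this corresponds to a functionality axiom in description logic notation:

```latex
\top \sqsubseteq \mathord{\leq}1\,\mathsf{hasChild}.\top
% every individual has at most one hasChild filler
```

If several children are possible, the axiom is simply omitted.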
It is of course possible that a module may use a pattern as a template, but end up being a highly simplified version of the pattern. E.g., the provenance module depicted in Fig. 9 was derived from the pattern depicted in Fig. 4, as discussed in Section 3.4.
We consider the documentation to be a primary part of an ontology: In the end, an OWL file alone, in particular if sizable, is really hard to understand, and it will mostly be humans who will deal with the ontology when it is populated or re-used. In MOMo, creation of the documentation is in fact an integral part of the modeling process, and the documentation is a primary vehicle for communication with the domain and data experts in order to polish the model draft.
MOMo documentations – see [55] for an example – discuss each of the modules in turn, and for each module, a schema diagram is given together with the formal OWL axioms (and possible additional axioms not expressible in OWL) that will eventually be part of the OWL file. Since the documentation is meant for human consumption, we prefer to use a concise formal representation of axioms, usually using description logic syntax or rules, together with an additional listing of the axioms in a natural language representation.
Domain and data experts can be asked specific questions, as mentioned above, to determine the most suitable axioms. Sometimes the choice of axiom may appear arbitrary, but it nevertheless has a direct bearing on the data graph. An example for this would be whether the property
In our experience, using axioms that only contain two classes and one property suffices to express an overwhelming majority of the desired logical theory [12]. We are thus utilizing the relatively short list of 17 axiom patterns that was determined for support in the OWLAx Protégé plug-in [47] and that can also be found discussed in [56]. More complex axioms can of course also be added as required. Axioms can often also be derived from the patterns used as templates.
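A few of these axiom patterns, instantiated over generic classes $A$, $B$ and a property $R$, look as follows (a selection for illustration, not the full list of 17 from [47,56]):

```latex
A \sqsubseteq B                       % subclass
\exists R.\top \sqsubseteq A          % (global) domain
\top \sqsubseteq \forall R.B          % (global) range
\exists R.B \sqsubseteq A             % scoped domain
A \sqsubseteq \forall R.B             % scoped range
A \sqsubseteq \exists R.B             % existential
A \sqsubseteq \mathord{\leq}1\,R.B    % scoped functionality
```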
We would like to mention, in particular, two types of axioms that we found very helpful. One of them is the use of scoped domain and range axioms.
Scoped domain (resp., range) axioms differ from unscoped or global ones in that they make the domain (resp., range) contingent on the range (resp., domain). In formal terms, a domain axiom is of the form
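In description logic notation, the global and scoped variants can be contrasted as follows, for a property $R$ and classes $A$ and $B$:

```latex
\exists R.\top \sqsubseteq A     % global domain: any subject of R is an A
\exists R.B    \sqsubseteq A     % scoped domain: subjects of R with a B-filler are A
\top \sqsubseteq \forall R.B     % global range: any object of R is a B
A    \sqsubseteq \forall R.B     % scoped range: R-objects of A-subjects are B
```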
Using scoped versions of domain and range helps to avoid making overly general domain or range axioms. E.g., if you specify two global domains for a property
To give an example, consider the two scoped domain axioms
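As a hypothetical illustration (class and property names ours), consider a pair of scoped domain axioms for a shared property hasName:

```latex
\exists \mathsf{hasName}.\mathsf{PersonName} \sqsubseteq \mathsf{Person}
\exists \mathsf{hasName}.\mathsf{PlaceName}  \sqsubseteq \mathsf{Place}
```

With two global domain axioms for hasName instead, every subject of hasName would be inferred to be both a Person and a Place, which is almost certainly unintended.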
We generally recommend using scoped versions of domain and range axioms – and, likewise, of functionality, inverse functionality, and cardinality axioms – instead of the global versions. This makes the axioms easier to re-use, and avoids overly general axioms which may be undesirable in a different context.
Create ontology schema diagram from the module schema diagrams, and add axioms spanning more than one module
A combined schema diagram, see Fig. 5 for an example, can be produced from the diagrams for the individual modules. In our experience, it is best to focus on understandability of the diagram [32,50]. The following guidelines should be applied with caution – exceptions in the right places may sometimes be helpful.
Arrange key classes in columns and rows.
Prefer vertical or horizontal arrows; this will automatically happen if classes are arranged in columns and rows.
Avoid sub-class arrows: We have found that sub-class arrows can sometimes be confusing for readers who are not intimately familiar with their formal logical meaning. E.g., in Fig. 5, SourceRole is a subclass of ParticipantRole, which means that a container may assume SourceRole. However, the diagram does not show a direct arrow from Container to the box containing SourceRole, and this in some cases makes the diagram harder to understand, in particular if there is an abundance of sub-class relationships.
Prefer straight arrows.
Avoid arrow crossings; if they are needed, make them near perpendicular.
Use “module” boxes (light blue with dashed border) to refer to distant parts of the diagram to avoid cluttering the diagram with too many arrows.
Avoid partial overlap of module groupings (grey boxes) in the diagram, even if modules are in fact overlapping. This is generally done by duplicating class nodes.
Break any guideline if it makes the diagram easier to understand.
The schema diagram for the entire ontology should then also be perused for additional axioms that may span more than one module. These axioms will often be rather complex, but they can often be expressed as rules. For complex axioms, rules are preferable over OWL axioms since they are easier for humans to understand and create [46]; the ROWLtab Protégé plug-in [45] can for example be used to convert many of these rules into OWL.
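One standard way to perform such a rule-to-OWL conversion is the rolification technique, on which the ROWLtab line of work is based. For instance, a rule of the form $C(x) \wedge R(x,y) \wedge D(y) \rightarrow S(x,y)$ can be encoded as

```latex
C \sqsubseteq \exists R_C.\mathsf{Self} \qquad
D \sqsubseteq \exists R_D.\mathsf{Self} \qquad
R_C \circ R \circ R_D \sqsubseteq S
```

where $R_C$ and $R_D$ are fresh properties introduced solely for the encoding.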
Reflect on entity naming and all axioms
Good names for ontology entities, in particular classes and properties, are very helpful for making an ontology easier to understand and therefore to re-use. We use a mix of common sense and practice, and our own naming conventions, which we have found to be useful. We list the most important ones in the following.
The entity names (i.e., the last part of the URI, after the namespace) should be descriptive. Avoid the encoding of meaning in earlier parts of the URI. An exception would be concrete datatypes such as xsd:string.
Begin class names and controlled vocabulary names with uppercase letters, and properties (as well as individuals and datatypes) with lowercase letters.
Use CamelCase for enhanced readability of composite entity names. E.g., use AgentRole rather than Agentrole, and use hasQuantityValue rather than hasquantityvalue.
Use singular class names, e.g., Person instead of Persons.
Use class names that are specific, and that help to avoid common misunderstandings. For example, use ActorRole instead of Actor, to avoid accidental subClassing with Person.
Whenever possible, use directional property names, and in particular avoid using nouns as property names. E.g., use hasQuantityValue instead of quantityValue. The inverse property could then be consistently named quantityValueOf. Other examples would be providesAgentRole and assumesAgentRole.
Make particularly careful choices concerning property names, and ensure that they are consistent with the domain and range axioms chosen. E.g., a hasName property should probably never have a domain (other than owl:Thing), as many things can indeed have names.
It is helpful to keep these conventions in mind from the very start. However, during actual modeling sessions, it is often better to focus more on the structure of the schema diagram that is being designed, and to delay a discussion on most appropriate names for ontology entities. These can be relatively easily changed during the documentation phase.
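Conventions like these lend themselves to mechanical checking. The following is a minimal, purely heuristic sketch (the rules encoded are our reading of the list above, not part of any MOMo tooling):

```python
import re

def naming_issues(name, kind):
    """Return convention violations for an entity name; kind is 'class' or 'property'."""
    issues = []
    # CamelCase: letters and digits only, no underscores or other separators
    if "_" in name or not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", name):
        issues.append("use CamelCase without underscores or special characters")
    if kind == "class" and name[:1].islower():
        issues.append("class names should begin with an uppercase letter")
    if kind == "property" and name[:1].isupper():
        issues.append("property names should begin with a lowercase letter")
    # crude plural detection; 'ss' endings (e.g. Process) are not plurals
    if kind == "class" and name.endswith("s") and not name.endswith("ss"):
        issues.append("prefer singular class names (heuristic)")
    return issues

print(naming_issues("Persons", "class"))
print(naming_issues("hasQuantityValue", "property"))
```

A real linter would of course need exceptions (e.g., for acronyms), but even a rough check of this kind is useful during the documentation phase.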
Create OWL file(s)
Creation of the OWL file can be done using CoModIDE (discussed below). The work could be done in parallel with writing up the documentation; however, we describe it as the last point in order to emphasize that most of the work on a modular ontology is done conceptually, using discussions, diagrams, and documentation, and that the formal model, in the form of an OWL file, is really only the final step in the creation.
For the sake of future maintainability, the generated OWL file should incorporate OPLa annotations that identify modules and their provenance; such annotations are created by CoModIDE.
CoModIDE
CoModIDE is intended to simplify MOMo-based ontology engineering projects. Per the MOMo methodology, initial modeling rarely needs to (or should) make use of the full set of language constructs that OWL 2 provides; instead, at these early stages of the process, work is typically carried out graphically – whether on whiteboards, in vector drawing software, or even on paper. This limits the modeling constructs to those that can be expressed intuitively using graphical notations, i.e., schema diagrams.9 We find that the partial solutions users typically develop fit on a medium-sized whiteboard; but whether this is a naturally manageable size for humans to operate with, or whether it is the result of constraints of, or conditioning to, the available tooling, i.e., the size of the whiteboards often mounted in conference rooms, we cannot say.
Per MOMo, the formalization of the developed solution into an OWL ontology is carried out after-the-fact, by a designated ontologist with extensive knowledge of both the language and applicable tooling. However, this comes at a cost, both in terms of hours expended, and in terms of the risk of incorrect interpretations of the previously drawn graphical representations (the OWL standard does not define a graphical syntax, so such human-generated representations are sometimes ambiguous). CoModIDE intends to reduce these costs by bridging this gap: it provides tooling that supports both user-friendly schema diagram composition according to our graphical notation described in Section 3.2 (using both ODP-based modules and “free-hand” modeling of classes and relationships), and direct OWL file generation.
The design criteria for CoModIDE, derived from the requirements discussed above, are as follows:
CoModIDE should support visual-first ontology engineering, based on a graph representation of classes, properties, and datatypes. This graphical rendering of an ontology built using CoModIDE should be consistent across restarts, machines, and operating systems.
CoModIDE should support the types of OWL 2 constructs that can be easily and intuitively understood when rendered as a schema diagram. To model more advanced constructs (unions and intersections in property domains or ranges, the property subsumption hierarchy, property chains, etc.), the user can drop back into the standard Protégé tabs.
CoModIDE should embed an ODP repository. Each included ODP should be free-standing and completely documented. There should be no external dependency on anything outside of the user’s machine.10 Our experience indicates that while our target users are generally enthusiastic about the idea of reusing design patterns, they are quickly turned off the idea when they are faced with patterns that lack documentation or that exhibit link rot.
CoModIDE should support simple composition of ODPs; patterns should snap together like Lego blocks, ideally with potential connection points between the patterns lighting up while dragging compatible patterns. The resulting ontology modules should maintain their coherence and be treated like modules in a consistent manner across restarts, machines, etc. A pattern or ontology interface concept will need to be developed to support this.
CoModIDE is developed as a plugin to the versatile and well-established Protégé ontology engineering environment. The plugin provides three Protégé views, and a tab that hosts these views (see Fig. 10). The

CoModIDE user interface featuring 1) the schema editor, 2) the pattern library, and 3) the configuration view.
When a pattern is dragged onto the canvas, the constructs in that pattern are copied into the ontology (optionally having their IRIs updated to correspond with the target ontology namespace), but they are also annotated using the OPLa vocabulary, to indicate 1) that they belong to a certain pattern-based module, and 2) what pattern that module implements. In this way module provenance is maintained, and modules can be manipulated (folded, unfolded, removed, annotated) as needed.
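Such a pattern-provenance annotation could look as follows in Turtle. Note that the namespace, IRIs, and module names below are purely illustrative, and the OPLa property names should be checked against the OPLa vocabulary itself:

```turtle
@prefix opla: <http://ontologydesignpatterns.org/opla/> .
@prefix ex:   <https://example.org/myOntology#> .

# illustrative only: a class belongs to a module...
ex:AgentRole
    opla:isNativeTo ex:AgentRoleModule .

# ...and the module records which pattern it implements
ex:AgentRoleModule
    opla:reusesPatternAsTemplate
        <http://ontologydesignpatterns.org/wiki/Submissions:AgentRole> .
```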
We have evaluated CoModIDE through a four-step experimental setup, consisting of: a survey to collect subject background data (familiarity with ontology languages and tools), two modeling tasks, and a follow-up survey to collect information on the usability.13 As measured by the System Usability Scale (SUS), described further in Section 4.2.2.
When using CoModIDE, a user takes less time to produce correct and reasonable output than when using Protégé.
A user will rate CoModIDE with a higher SUS score than Protégé alone.
During each of the modeling tasks, participants were asked to generate a
The following sections provide a brief overview of each of the steps. The source material for the entire experiment is available online.14
When recruiting our participants for this evaluation, we did not place any requirements on ontology modeling familiarity. However, to establish a shared baseline knowledge of foundational modeling concepts (such as one would assume participants would have in the MOMo scenario we try to emulate, see above), we provided a 10 minute tutorial on ontologies, classes, properties, domains, and ranges. The slides used for this tutorial may be found online with the rest of the experiment’s source materials.
a priori survey
The purpose of the a priori survey was to gauge the participants’ prior familiarity with ontology modeling. Participants rated their agreement with the following statements:
I have done ontology modeling before.
I am familiar with Ontology Design Patterns.
I am familiar with Manchester Syntax.15 This is asked as Manchester Syntax is the default syntax in Protégé. The underlying assumption is that the manual addition of axioms with the expression editor in Protégé would be faster given familiarity with Manchester Syntax.
I am familiar with Description Logics.
I am familiar with Protégé.
Finally, we asked the participants to describe their relationship to the test leader (e.g., student, colleague, same research lab, not familiar).

Task A schema diagram.
In Task A, participants were to develop an ontology to model how an analyst might generate reports about an ongoing emergency. The scenario identified two design patterns to use:
Figure 11 shows how these patterns are instantiated and connected together. Overall, the schema diagram contains seven concepts, one datatype, one subclass relation, one data property, and six object properties.

Task B schema diagram.
In Task B, participants were to develop an ontology to capture the steps of an experiment. The scenario identified two design patterns to use:
Figure 12 shows how these patterns are instantiated and connected together. Overall, the schema diagram contains six concepts, two datatypes, two subclass relations, two data properties, and four object properties (one of which is a self-loop).
a posteriori survey
The
Additionally, we inquire about CoModIDE-specific features. These statements are also rated using a Likert scale. However, we do not use this data in our evaluation, except to inform our future work. Finally, we request free-text comments on CoModIDE’s features.
Participant pool composition
Of the 21 subjects, 12 reported some degree of familiarity with the authors, while 9 reported no such connection. In terms of self-reported ontology engineering familiarity, the responses are as detailed in Table 2. It should be observed that responses vary widely, with a relative standard deviation (
Metric evaluation
We define our two metrics as follows:
For these metrics, we generate simple statistics that describe the data, per modeling task. Tables 3(a) and 3(b) show the mean, standard deviation, and median for the Time Taken and Correctness of Output, respectively.
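These summary statistics can be computed directly with the Python standard library. The figures below are hypothetical placeholders for one task, not the measured data reported in Table 3:

```python
import statistics

# Hypothetical completion times in minutes; the actual measurements
# are those reported in Tables 3(a) and 3(b) of the paper.
times_protege = [34.0, 41.5, 29.0, 52.0, 38.5]
times_comodide = [21.0, 25.5, 19.0, 30.0, 24.5]

def summarize(values):
    """Mean, sample standard deviation, and median, as reported in Table 3."""
    return (statistics.mean(values),
            statistics.stdev(values),
            statistics.median(values))

print(summarize(times_protege))
print(summarize(times_comodide))
```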
Mean, standard deviation, relative standard deviation, and median responses to a priori statements
Mean, standard deviation, relative standard deviation, and median responses to
Summary of statistics comparing Protégé and CoModIDE
In addition, we examine the impact of our control variables (CV). This analysis is important, as it provides context for representativeness or bias in our data set. These are reported in Table 3(c). CV1–CV5 correspond exactly to those questions asked during the
Significance of results
Mean, standard deviation, and median SUS score for each tool. The maximum score is 100
We analyze the SUS scores in the same manner. Table 5 presents the mean, standard deviation, and median of the data set. The maximum possible score on the scale is 100. Table 3(d) presents our observed correlations with our control variables.
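The SUS score itself is computed with the standard scoring formula (this is the generic SUS procedure, not something specific to this study): odd-numbered items are positively worded and even-numbered items negatively worded, and the result is scaled to 0–100.

```python
def sus_score(responses):
    """Standard SUS scoring: responses is a list of 10 Likert values in 1..5,
    in questionnaire order."""
    assert len(responses) == 10
    odd = sum(r - 1 for r in responses[0::2])   # items 1, 3, 5, 7, 9
    even = sum(5 - r for r in responses[1::2])  # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5

# A maximally positive answer sheet yields the maximum score of 100:
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # → 100.0
```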
Finally, we compare each metric for one tool against the other. That is, we want to know if our results are statistically significant – that, as the statistics in Table 3 suggest, CoModIDE does indeed perform better for both metrics and the SUS evaluation. To do so, we calculate the probability
Free text comment fragments per category
18 of the 21 subjects opted to leave free-text comments. We applied fragment-based qualitative coding and analysis to these comments. That is, we split the comments apart per the line breaks entered by the subjects, read through the fragments and generated a simple category scheme, and then re-read the fragments and applied these categories to them (allowing at most one category per fragment) [9,49]. The subjects left between one and six fragments each, for a total of 49 fragments for analysis, of which 37 were coded, as detailed in Table 6.
Of the 18 participants who left comments, 3 left comments containing no codable fragments; these either commented upon the subject’s own performance in the experiment, which is covered in the aforementioned completion metrics, or were simple statements of fact (e.g.,
Participant pool composition
The data indicates no correlation (bivariate correlation
Metric evaluation
Before we can determine if our results confirm H1 and H2, we must first examine the correlations between our results and the control variables gathered in the
As shown in Table 3(c), the metric
To confirm H1, we look at the metrics separately.
This is particularly interesting; given the above analysis of CV correlations where we see no (or very weak) correlations between prior ontology modeling familiarity and CoModIDE modeling results, and the confirmation of H1, that CoModIDE users perform better than Protégé users, we have a strong indicator that we have achieved increased
When comparing the SUS score evaluations, we see that the usability of Protégé is strongly influenced by familiarity with ontology modeling and familiarity with Protégé itself. The magnitude of the correlation suggests that newcomers to Protégé do not find it very usable. CoModIDE, on the other hand, is weakly negatively correlated along these CVs. This suggests that switching to a graphical modeling paradigm may take some adjusting.
However, we still see that the SUS scores for CoModIDE have a greater mean, tighter
As such, by confirming H1 and H2, we may say that CoModIDE improves the approachability of ontology engineering, especially for those not familiar with ontology modeling – with respect to our participant pool. However, we suspect that our results are generalizable, due to the strength of the statistical significance (Table 4) and participant pool composition (Section 4.3.1).
Free-text responses
The fragments summarized in Table 6 paint a quite coherent picture of the subjects’ perceived advantages and shortcomings of CoModIDE, as follows:
We note that there is a near-unanimous consensus among the subjects that graphical modeling is intuitive and helpful. When users are critical of the CoModIDE software, these criticisms are typically aimed at specific and quite shallow bugs or UI features that are lacking. The only consistent criticism of the modeling method itself relates to the difficulty in constructing self-links (i.e., properties that have the same class as domain and range).
CoModIDE 2.0
Since the evaluation, we have made significant progress on improving CoModIDE. Aside from bug fixes and general quality-of-life improvements (i.e., versions 1.1.1 and 1.1.2) addressing many of the free-text responses in Section 4.4.3, we have implemented additional key aspects of the MOMo methodology. In particular, they are as follows.
The
We have also added functionality to assist in navigating a complex pattern space through the notion of interfaces. That is, patterns are categorized based on the roles that they may play. For example, a more general ontology may call for some pattern that satisfies a spatial extent modeling requirement. To borrow from software engineering terms, one could imagine several different implementations of a “spatial extent” interface.
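The software engineering analogy can be made concrete with a short sketch (all names hypothetical; this is the analogy, not CoModIDE’s actual implementation):

```python
from abc import ABC, abstractmethod

class SpatialExtent(ABC):
    """'Interface': any pattern satisfying a spatial-extent requirement."""
    @abstractmethod
    def classes(self):
        ...

class PointBasedExtent(SpatialExtent):
    def classes(self):
        return {"SpatialExtent", "Point", "Coordinate"}

class RegionBasedExtent(SpatialExtent):
    def classes(self):
        return {"SpatialExtent", "Region", "Boundary"}

def candidates(patterns, required="SpatialExtent"):
    """Any pattern exposing the required notion can fill the slot."""
    return [p for p in patterns if required in p.classes()]

print([type(p).__name__
       for p in candidates([PointBasedExtent(), RegionBasedExtent()])])
```

Just as a caller depends only on the interface, a general ontology depends only on the modeling requirement, and any conforming pattern can be plugged in.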
In addition, we have added simple, manual alignment to external ontologies. More information on this upper alignment tool for CoModIDE can be found in [11].
In order to improve the extensibility of the platform, we have reworked the overarching conceptual framework for functionality in CoModIDE. Functionality is now categorized into so-called toolkits which communicate through a newly implemented message bus. This allows for a relatively straightforward integration process for external developers.
It is also important to recall that CoModIDE is not just a development platform, but a tool that enables research into ontology engineering. To that point, we have implemented an opt-in telemetry agent that collects and sends anonymized usage characteristics back to the developers. This records session lengths, clicks, and other such metrics that give us insight into how ontologies are authored in a graphical environment.
Additional infrastructure and resources
The modular ontology design library (MODL)
The Modular Ontology Design Library (MODL) is both an artifact and a framework for creating collections of ontology design patterns [53]. MODL is a method for establishing a well-curated and well-documented collection of ODPs that is structured using OPLa. This allows for a queryable interface when the MODL is very large, or if the MODL is integrated into other tooling infrastructure. For example, CoModIDE uses OPLa annotations to structure, define, and relate patterns in its internal MODL, as described in Section 4.1.
OPLa annotator
The OPLa Annotator [52] is a standalone plugin for Protégé. This plug-in allows for the guided creation of
ROWLTab
ROWLTab [46] is another standalone plugin for Protégé. It is based on the premise that some ontology users, and frequently non-ontologists, find conceptualizing knowledge through rules to be more convenient. This plugin allows the user to enter SWRL rules which will then, when applicable, be converted into equivalent OWL axioms. An extension to this plug-in, detailed in [48], allows for existential rules.
SDOnt
SDOnt [50] is an early tool for generating schema diagrams from OWL files. Unlike other visual OWL generators, SDOnt does not aim to produce a strictly unambiguous diagram. Instead, it generates schema diagrams in the style that has been described in Section 3.2 based on the TBox of the input OWL file. This program only requires Java to run and can be run on any OWL ontology; although, as with any graph visualization, it tends to work best with smaller schemas.
Example modular ontologies
In this section, we provide a brief directory of existing modular ontologies, organized by modeling challenges. These can serve as inspiration for the prospective modeler, or, in the spirit of the MOMo methodology, the modeler may wish to reuse or adapt their modules for new or similar use cases.
Highly spatial data
It is common to have to model data that has a strong spatial dimension. The challenges that accompany this are, unfortunately, myriad. In the GeoLink Modular Ontology [34] we utilize the Semantic Trajectory pattern to model discrete representations of continuous spatial movement. Ongoing work regarding the integration of multiple datasets (e.g., NOAA storm data, USGS earthquake data, and FEMA disaster declarations)17 Respectively, these are the National Oceanic and Atmospheric Administration, the United States Geological Survey, and the Federal Emergency Management Agency. See
Elusive ground truth
Sometimes, it is necessary to model data where it is not known whether it is true, or where it is necessary to knowingly ingest possibly contradictory knowledge. In this case, we suggest a records-based approach, with a strong emphasis on the first-class modeling of provenance. That is, knowledge or data is not modeled directly; instead, we model a container for the data, which is then strongly connected to its provenance. An example of this approach can be found in the Enslaved Ontology [54], where historical data may contradict or conflict with itself, based on the interpretations of different historians.
Rule-based knowledge
In some cases, it may be necessary to directly encode rules or conditional data, such as attempting to describe the action-response mechanisms when reacting to an event. The methods for doing so, and the modules therein associated, can be found in the Modular Ontology for Space Weather Research [57] and in the Domain Ontology for Task Instructions [13].
Shortcuts & views
Shortcuts and Views are used to manage the tension between a rich and detailed ontological model and convenience for data providers, publishers, and consumers. That is, it is frequently desirable to have high fidelity in the underlying knowledge model, which may result in a model that is confusing or unintuitive to the non-ontologist. As such, shortcuts can be used to simplify navigation or publishing according to the model. These shortcuts are also formally described, allowing for navigation between levels of abstraction. A full examination of these constructs is out of scope, but examples of shortcuts and views, alongside their use, can be found in the GeoLink Modular Ontology [34], the tutorial for modeling Chess Game Data [35], and in the Enslaved Ontology [54].
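In the simplest case (property names here are hypothetical), a shortcut that lets a data publisher connect an event directly to a person, bypassing an intermediate role construct, can be formally tied to the detailed model via an OWL property chain axiom:

```latex
\mathsf{providesAgentRole} \circ \mathsf{isPerformedBy}
  \sqsubseteq \mathsf{hasParticipant}
% data stated with the shortcut remains consistent with,
% and derivable from, the detailed role-based model
```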
The re-use of ontologies for new purposes, or adapting them to new use-cases, is frequently very difficult. In our experience, this is the case for several reasons: (i) differing representational granularity, (ii) lack of conceptual clarity in ontology design, (iii) the difficulty of adhering to good modeling principles, and (iv) a lack of re-use emphasis and process support in available ontology engineering tooling. In order to address these concerns, we have developed the Modular Ontology Modeling (MOMo) workflow and its supporting tooling infrastructure, CoModIDE (the Comprehensive Modular Ontology Integrated Development Environment – “commodity”).
In this paper, we have presented the MOMo workflow in detail, from introducing the schema diagram as the primary conceptual vehicle for communicating between ontology engineers and domain experts, to presenting several experiences in executing the workflow across many distinct domains with different use cases and data requirements.
We have also shown how the CoModIDE platform allows ontology engineers, irrespective of previous knowledge level, to develop ontologies more correctly and more quickly than by using standard Protégé; that CoModIDE has a higher usability (SUS score) than standard Protégé; and that the CoModIDE issues that concern users primarily derive from shallow bugs as opposed to methodological or modeling issues. Taken together, this implies that the modular graphical ontology engineering paradigm is a viable way of supporting the MOMo workflow.
Future work
From here, there are still many avenues of investigation remaining, pertaining to both the MOMo workflow and CoModIDE.
Regarding the workflow, we will continue to execute it in new domains and observe differences in experiences. Currently, we are examining how to better incorporate spatially-explicit modeling techniques. In addition, we wish to further explore how schema diagrams may represent distinctly different semantics, such as ShEx [4] or SHACL [33], rather than OWL.
We also foresee the continued development of the platform. As mentioned in Section 4.5, we have improved its internal structure so that it may support bundled pieces of functionality. In particular, we will develop such toolkits for supporting holistic ontology engineering projects, going beyond just the modeling process. This will include the incorporation of ontology alignment systems so that CoModIDE may export automatic alignments alongside the designed deliverable, and the incorporation of recommendation software, perhaps based on input seed data. Further, we see a route for automatic documentation in the style of our own technical reports. Finally, we wish to examine collected telemetry data in order to analyse how users develop ontologies in a graphical modeling paradigm.
