Abstract
Introduction
Knowledge graphs (KGs) are an established paradigm for effectively and efficiently integrating heterogeneous data (Hitzler, 2021; Hogan et al., 2022; Noy et al., 2019). Many methodologies for creating KGs (and the ontologies that act as their schema (Hitzler & Krisnadhi, 2016)) have been developed over the years (Fernandez-Lopez et al., 1997), which recommend or otherwise emphasize the use of various techniques. These range from the use of upper ontologies (Gangemi et al., 2002; Smith, 1998) and ontology design patterns (Blomqvist et al., 2016; Gangemi & Presutti, 2009; Shimizu et al., 2023) to the use of LLMs, either alone (Meyer et al., 2023) or combined with other methods (Shimizu & Hitzler, 2025).
Evaluation of KGs (or the ontologies that act as their schemas) can be done in many ways (Gómez-Pérez, 2004; Raad & Cruz, 2015), including the use of large language models (Tsaneva et al., 2024), logical and mathematical characteristics (Guarino & Welty, 2004), heuristics (Poveda-Villalón et al., 2014), or competency questions (Mansfield et al., 2021). On the other hand, validation tools (e.g., SHACL (Knublauch & Kontokostas, 2017) or ShEx (Baker & Prud’hommeaux, 2019)) can measure whether or not the KG adheres to its schema.
As such, these approaches also vary widely along which dimensions the evaluation occurs (e.g., is the ontology well-formed?) and how the quality is reported (i.e., quantitative or qualitative reporting). Of particular importance, in any case, is determining whether or not the KG resulting from executing a methodology indeed serves the needs of the stakeholders. For example, competency questions act both as a guide during development (in many methodologies) and as a mechanism to confirm whether the KG appropriately models—and returns—the correct data (Antia & Keet, 2023).
Beyond these particular assessments of quality, however, is also whether or not a KG is appropriate for
The work presented in this article explores how various graph structures, such as those that would be produced via different KG or ontology development methodologies, affect the performance of various KGE models, when evaluated against the link prediction task. To the authors’ knowledge, beyond their own work (Dave et al., 2024) that this paper extends, there has yet to be any comprehensive investigation in this area (although recently a pipeline for
Specifically, these are: an instance graph, which we call SKG-4; SKG-4 plus type annotations, which we call SKG-5; SKG-5 plus superclasses for each type, which we call SKG-6; SKG-5 with reified properties, which we call SKG-5r; SKG-5r with shortcuts, which we call SKG-5rs; and SKG-5rs with added contextual nodes, which we call SKG-5rsc.
We furthermore note that these various representations span a range of complexity. On one hand, they represent a richer ontological reality, but on the other hand, simpler semantics (and thus KG structures) are easier to consume and query. This is in line with how patterns can be used to flatten or expand
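To illustrate the difference between the reified and shortcut variants, consider the following sketch. The property names (`subject`, `predicate`, `object`) and node labels here are illustrative stand-ins, not the exact vocabulary used in the SKGs:

```python
def reify(triple, idx, keep_shortcut=False):
    """Replace a direct triple (h, p, t) with a statement node linked to
    its constituent parts (as in SKG-5r); optionally also keep the original
    direct edge as a 'shortcut' (as in SKG-5rs)."""
    h, p, t = triple
    stmt = f"stmt_{idx}"  # fresh node standing for the reified statement
    out = [(stmt, "subject", h), (stmt, "predicate", p), (stmt, "object", t)]
    if keep_shortcut:
        out.append(triple)  # the shortcut edge bypasses the reification
    return out

# Reification alone replaces one edge with three; the shortcut adds a fourth.
assert len(reify(("alice", "attended", "event1"), 0)) == 3
assert ("alice", "attended", "event1") in reify(
    ("alice", "attended", "event1"), 0, keep_shortcut=True)
```

Contextual nodes (SKG-5rsc) would then attach additional metadata to the statement node in the same fashion.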
Concretely, this article contributes: (a) various synthetic graphs and a mechanism for their generation, (b) the FB15k isotopes: FB15k-238 and FB15k-239, 1 (c) the scripts and configuration files to generate these datasets, (d) a thorough evaluation of the effects that the incorporation of increasing metadata has on the performance of the KGE models in the link prediction task, 2 (e) the creation of SKG-237, a graph mimicking FB15k-237 in structure with respect to node count, edge count, node/edge ratio, and degree centrality, that is trained and validated in the same way as the ones above on TransE, (f) the creation of synthetic knowledge graphs (SKGs) of increasing complexity, showcasing their generation, training, and evaluation on different hyperparameters from our original isotopes, along with visualizations using t-SNE and UMAP, and (g) a discussion of results and insights.
Related Work
Iferroudjene et al. (2023) argue that the removal of Freebase
Overall, we see that deductive reasoning is quite difficult outside of the symbolic algorithms dedicated to it. In particular, neurosymbolic methods (e.g., as found in Hitzler et al., 2023) struggle considerably. As deductive reasoning is a major hurdle for approaching human-level cognition, this provides further motivation for understanding how the presence (or absence) of semantic information impacts KGEs.
The importance of evaluating KGEs with respect to the underlying semantics of the graph has been raised in recent research. When evaluating embedding performance, for instance, Jain et al. (2021) mention the importance of evaluating how well these embeddings preserve the semantic links within the KG in addition to using ordinary metrics. Our goal of understanding embedding behavior in synthetic KGs supports this. The work of Gutierrez Basulto and Schockaert (2018) additionally points out the significance of matching vector space representations to basic ontology rules and terminology, claiming that a more thorough examination of how well embedding methods work with complex semantic structures is essential—a realization that directs our investigation of synthetic KGs.
Additionally, Kang et al. (2021) showed how conditional information can make connections in data clearer, which motivated us to use t-SNE visualizations to uncover important patterns in our embeddings. Their research on demonstrating dataset properties guided our approach of using these visualizations to identify patterns, improve our understanding of groups, and identify data cluster divisions. Building on this visualization method, Damrich et al. (2023) reveal how UMAP and t-SNE can be used to effectively study high-dimensional data. Their usage of similar learning methods to modify embeddings offers a helpful perspective on how visualizing embeddings from models such as TransE might highlight structural ties in the data, which we apply in our own visualizations of our synthetic KGs.
Knowledge Graph Embedding Models
We utilize the DGL-KE library 3 for scalable training and evaluation of KGE models.
KGE models that implement an additive scoring function can be categorized as
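As a concrete illustration of an additive scoring function, the translational model TransE scores a triple (h, r, t) as the negative distance between the translated head and the tail. The following is a minimal sketch (pure Python, not the DGL-KE implementation):

```python
import math

def transe_score(h, r, t):
    """Additive (translational) TransE score: -||h + r - t||_2.

    h, r, t are embedding vectors (lists of floats); a well-modeled triple
    places t close to h + r, yielding a score near zero, while a corrupted
    triple scores more negatively."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy embeddings: the tail is (approximately) the head translated by r.
h, r, t = [0.1, 0.2], [0.3, -0.1], [0.4, 0.1]
good = transe_score(h, r, t)          # close to zero
bad = transe_score(h, r, [1.0, 1.0])  # corrupted tail: more negative
assert good > bad
```

In link prediction, candidate tails (or heads) are ranked by this score, and the rank of the true entity feeds into metrics such as MRR and HITS@k.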
Methodology
In this section, we describe how we created the various synthetic KG and FB15k isotopes developed for our evaluation. Specific implementation details, including hyper-parameters, are detailed in Section 3.4.
Creating SKG-4, SKG-5, and SKG-6
We created a total of six synthetic datasets to further our investigation regarding the graph structure of a KG and how that may affect the link prediction aspect of KGEs. The structure of, or template for, each of these synthetic KGs (SKGs) is shown in Figure 1.

This figure shows the various schema diagrams for the synthetic KG isotopes. We have used consistent coloring across all figures to demonstrate correspondence. For clarity, in SKG-5RSC, we denote
We describe the

This figure shows a basic schema diagram modeling a reified node, with labels corresponding to Figure 1. The reification is represented by the
For more context, you can find some examples in Appendix A.1.
In this study, we currently instantiate each template 1,000 times. This can be improved in the future to produce templates that interlink or somehow connect via nodes. As it stands, each SKG has 1,000 disconnected components.
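The instantiation step can be sketched as follows. The template triples and naming scheme here are hypothetical stand-ins for the actual SKG templates; the point is that each copy receives fresh entity identifiers, so the copies remain disconnected:

```python
# Hypothetical template: triples whose subjects/objects are placeholders
# (prefixed with "?") to be freshened per instantiation.
TEMPLATE = [
    ("?person", "attended", "?event"),
    ("?event", "heldAt", "?venue"),
]

def instantiate(template, n):
    """Stamp out n disjoint copies of the template, suffixing each
    placeholder with the copy index so that components stay disconnected."""
    triples = []
    for i in range(n):
        for s, p, o in template:
            triples.append((f"{s[1:]}_{i}", p, f"{o[1:]}_{i}"))
    return triples

skg = instantiate(TEMPLATE, 1000)
assert len(skg) == 1000 * len(TEMPLATE)
# Entities from different copies never co-occur in a triple, so the
# resulting graph has 1,000 disconnected components.
```

Interlinking the components, as noted above, would amount to adding triples whose subject and object carry different copy indices.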
It is important to clarify that the training, validation, and test splits are constructed to maintain strict disjointness of entities across these subsets, especially for SKG-4 which consists of 1,000 disconnected components. This design ensures that entities appearing in the test set are not seen during training, reflecting a realistic and challenging open-world link prediction scenario. Thus, standard transductive embedding models face limitations, as they cannot leverage embeddings for unseen entities at test time. This setup was deliberately chosen to evaluate model generalization capabilities under such constraints. Furthermore, while SKG-4 presents disconnected graph components, the synthetic dataset generation process preserves internal structural patterns similar to those in real-world KGs, thereby providing meaningful evaluation benchmarks despite the absence of entity overlap across splits.
FB15k-237 is published with the data split to allow for training, evaluation, and validation of KGE models. This research introduces

Graphical overview of adding semantics to FB15k and the method of testing trained models. (a) This represents the types of triples contained in each of the datasets. The yellow ellipses are a set of triples extracted from
We provide Table 1 as a summary of the count of entities, edges, and triples per data split in FB15k-237 and our augmentations.
Comparison of Different Counts for the Freebase Subset and the Created Augmentations.
We constructed a synthetic graph with the same numbers of unique nodes, unique predicates, and triples as FB15k-237. However, the exact
We stress that SKG-237 was not produced randomly. The centrality characteristics of nodes, edge frequency per predicate, and node degree marginal distributions were instead preserved by carefully sampling and connecting entities and relations. We did not attempt to replicate semantic aspects such as topic-driven grouping, inverse relation pairings, and ontological hierarchies. The objective of SKG-237 is to separate semantic content from structural form so that we can determine whether structural similarity is sufficient to support efficient KGEs.
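One way to preserve per-predicate edge frequencies and node degrees while destroying semantic content is to repeatedly swap tails between triples that share a predicate. The following is a simplified sketch of this idea, not the exact procedure used to build SKG-237:

```python
import random

def shuffle_tails(triples, rounds=10, seed=42):
    """Swap tails between random pairs of triples that share a predicate.

    This preserves each node's head/tail degree and each predicate's
    frequency, but decouples tails from their original heads, stripping
    the graph of semantic regularities."""
    rng = random.Random(seed)  # seeded for reproducibility
    by_pred = {}
    for i, (_, p, _) in enumerate(triples):
        by_pred.setdefault(p, []).append(i)
    out = list(triples)
    for _ in range(rounds * len(triples)):
        idxs = by_pred[rng.choice(list(by_pred))]
        if len(idxs) < 2:
            continue  # nothing to swap within this predicate
        i, j = rng.sample(idxs, 2)
        (h1, p, t1), (h2, _, t2) = out[i], out[j]
        out[i], out[j] = (h1, p, t2), (h2, p, t1)
    return out

orig = [("a", "knows", "b"), ("c", "knows", "d"), ("e", "likes", "f")]
mixed = shuffle_tails(orig)
# Per-node degrees and per-predicate triple counts are unchanged.
```

Because every swap keeps both the head multiset and the tail multiset of each predicate fixed, the marginal degree distributions of the original graph survive the shuffling.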
Implementation
Our graphs are generated using a set of scripts which can be found online. Research artifacts include the scripts for generating the SKG and FB15k isotopes, for calculating the ratio and centrality metrics, for generating the visualizations, and a container for training the KGE models, as well as each of the graphs themselves. They are provided through a Zenodo repository 7 and a GitHub repository 8 under the MIT License, which is also included in the repository.
The KGE models, except TransD, are trained through the Deep Graph Library - Knowledge Embedding (DGL-KE) library (Zheng et al., 2020). Experiments using TransD employed
Hyper-parameters play a crucial role in training machine learning models, and adjustments to them have a sizable impact on model performance; choosing them for KG embedding model training is thus a difficult but important issue (Lloyd et al., 2023). Due to the smaller size of these synthetic KGs, and incompatibilities between the graph size and the DGL-KE configuration, we used different hyper-parameter configurations for them. Further, as we were not able to identify the hyper-parameters used in the initial publications of the KGE models, we opted to standardize their values across our experimentation with the implemented models in DGL-KE. As used by DGL-KE, the list of hyper-parameters 9 is found in Table 2.
The Hyper-Parameter Settings Used for the Training and Evaluation During Training of the KGE Models With Respect to FB15k-237, FB15k-238, and FB15k-239.
The experiment consists of four overall analyses:
The Standardized Lowest Hyper-Parameter Settings Used for the Training and Evaluation During Training of the KGE Models for the SKGs.
As a straightforward and widely used model that offers a clear baseline for evaluating embedding accuracy and link prediction performance, we chose to focus on TransE as our starting point. TransE’s translation-based approach fits in well with our goal of investigating how structural features affect model behavior, and initial testing showed that it is very sensitive to graph structure changes. To compare the synthetic KG with complex models created in future investigations, it was the perfect place to start when evaluating how well it represents underlying relationships.
The DGL-KE library provides an evaluation mechanism, configured with
Graph Metrics
Important insights into the dynamics and structure of the underlying data are obtained by investigating graph metrics in the context of KGs. The following explains why each metric was selected and what it implies for each dataset investigated.
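For reference, degree and closeness centrality can be computed as follows. This is a pure-Python sketch on an undirected toy graph; in practice, a graph library such as networkx would typically be used:

```python
from collections import deque

def degree_centrality(adj):
    """Each node's degree normalized by (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj, v):
    """(n - 1) divided by the sum of BFS distances from v to all
    reachable nodes; higher means v is 'closer' to the rest of the graph."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Toy star graph: a hub connected to three leaves.
adj = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
assert degree_centrality(adj)["hub"] == 1.0
assert closeness_centrality(adj, "hub") == 1.0  # hub is one hop from all
```

Betweenness centrality, the third metric reported in the tables, additionally requires counting shortest paths through each node and is omitted here for brevity.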
Results
We report our results along three dimensions, the graph centrality metrics, KGE model performance on the link prediction task (including both the evaluation for the SKG isotopes and the ablation-like study for FB15k isotopes), and visualizations using t-SNE (Kang et al., 2021) and UMAP (Damrich et al., 2023).
Graph Metrics of the Isotopes
Tables 4, 5, and 6 present important metrics, such as the total number of facts, nodes, edges, and the edge-to-node ratio, for the datasets and synthetic KGs. Additionally, they offer broad information for degree centrality, betweenness centrality, and closeness centrality, displaying the average, maximum, and minimum values for each of these metrics across the datasets.
The Graph Metrics for FB15k-237, FB15k-238, and FB15k-239.
The Graph Metrics for SKG-4, SKG-5, and SKG-6.
Note: The arrows in the difference column indicate the direction of change: an upward arrow (
The Graph Metrics for SKG-4, SKG-5, SKG-5r, SKG-5rs, and SKG-5rsc.
Note: The arrows in the difference column indicate the direction of change: an upward arrow (
As can be seen in Table 7, even though SKG-237 matches FB15k-237 in structure with respect to node, fact, and edge counts and degree centrality values, the training/evaluation and visualization results differ substantially, as shown in Table 8.
The Graph Metrics for SKG-237.
The Results of Our Evaluation of SKG-237 for TransE, Using the Standardized Hyper-Parameters.
Note: We note a
Table 9 refers to the evaluation results of TransE.
The Performance Results for the SKG Isotopes Using TransE With the Standardized Hyper-Parameters.
Table 10 reports the model performances when trained with their respective KGs. Models trained according to a specific FB15k-
The Results of Evaluating Each of the Models Against Their Respective Training Data.
Note: This table reports the results of testing each of the models against solely the FB15k-237 training data (i.e.,
The Results of Our Ablation-Like Study, Where We Change Which Component of the Data Against Which We Evaluate.
Note:
As a space-saving measure, the evaluation of FB15k-237 is reported only once, as the second test comparing the various trained models repeats the evaluation of FB15k-237 on its own test data. If there are no
Table 11 reports the result of the aforementioned
Captions of the figures are included in the Appendix so as to not overwhelm the narrative.
Discussion
KGE Performance Over SKG Isotopes
For SKG-4, which has no hierarchical relationships or any sort of additional “semantic complexity,” we observe the strongest performance across most metrics. As this synthetic KG is less complex and generally consistent, embedding models can easily pick up patterns. We note that these values will still be limited due to the low connectivity between template structure instantiations. Yet, when we begin introducing semantic annotations (in the form of
KGE Performance Over FB15k Isotopes
First, across the different isotopes, we see that the inclusion of the additional semantic data drastically improves the performance of
We also test if the presence of additional semantic metadata present during training improves link prediction
The key takeaway, rather than just new models or new evaluations, is that models that are not meant to handle these sorts of relations (notably, TransE) still have an increased performance on the link prediction task when we remove these relations (i.e., the ones that TransE would not handle well) from the evaluation, indicating that their presence
KGE Performance Over SKG-237
Despite utilizing a KG that has the same structural characteristics as FB15k-237 (nodes, predicates, and triples), the TransE model does not do well on link prediction, according to the results. Low HITS@1, HITS@3, and HITS@10 scores, along with poor Mean Reciprocal Rank (MRR) and Mean Rank (MR) values, show that the model struggles to properly rank pertinent entities, even among the top 10 predictions. This suggests that although the synthetic KG shares structural similarities with FB15k-237, it does not contain the semantic relationships that underlie the original dataset. Thus, we note that to some extent TransE requires that the KG indeed more closely mimic real-world data. Further exploration is required to determine the exact connection between recurring entities in triples and the appearance of entities consistently in appropriate domains and ranges of relations. That is to say, we suspect that in order for a KG to be TransE-learnable, a minimum semantics is required in the graph.
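For reference, MR, MRR, and HITS@k are all simple functions of the rank assigned to the true entity in each test triple; a minimal sketch (the example ranks are illustrative, not taken from our results):

```python
def rank_metrics(ranks):
    """Compute MR, MRR, and HITS@{1,3,10} from a list of ranks,
    where rank 1 means the true entity was scored highest."""
    n = len(ranks)
    return {
        "MR": sum(ranks) / n,                 # mean rank: lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank
        **{f"HITS@{k}": sum(r <= k for r in ranks) / n for k in (1, 3, 10)},
    }

# Illustrative ranks for three test triples.
m = rank_metrics([1, 2, 50])
assert m["HITS@1"] == 1 / 3 and m["HITS@10"] == 2 / 3
# MRR weights top ranks heavily: a rank of 50 contributes almost nothing.
```

This weighting is why the low MRR and HITS@k values for SKG-237 indicate that the true entity is consistently ranked far down the candidate list.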
Ablation-Like Study With FB15k Isotopes
The purpose of our ablation-like study is to determine how different training data influences the model and, subsequently, if the end results change for different test data. For example, the
Overall, we see that when looking to improve performance for link prediction, for simple assertional relationships,
TransD performs second best when
Discussion of Graph Metrics
The graphs for SKG-4, SKG-5, and SKG-6 become progressively more complex, as reflected by their increased node and edge counts and edge-to-node ratios, which indicate a higher degree of connectivity. Degree and betweenness centrality show that important central nodes become more frequent, even while many nodes remain less connected. According to closeness centrality, nodes become easier to reach in SKG-5, but somewhat less so in SKG-6.
When reification relationships (r), shortcuts (s), and contextual information (c) are added, the metrics for SKG-4 through SKG-5rsc clearly show a pattern of a growing complexity. A significant increase is seen in the overall number of facts, nodes, and edges; denser graphs are demonstrated by a higher edge-to-node ratio. There is a range of degree centrality values, with some nodes growing closer together while others stay just moderately connected. Although betweenness centrality points to the rise in important nodes, especially in SKG-5rs, overall average values are still low, suggesting that there is no dominant centralization. As shortcuts and context are introduced, nodes become easier to access, thus increasing total graph connection, according to closeness centrality values. These trends demonstrate the graphs’ increasing structural changes and depth as more semantic layers are added.
With the most facts, nodes, and edges, FB15k-239 has the largest graph structures, according to the metrics. In bigger datasets, the edge-to-node ratio drops from FB15k-237 to FB15k-239, indicating a lower relative graph density. While average values drop across the datasets, degree centrality measurements indicate that FB15k-237 has a greater range with higher maximum values, suggesting better balanced connection in larger graphs. Although betweenness centrality varies throughout the datasets, FB15k-237 has somewhat higher maximum values, suggesting that some nodes are essential for connecting. More direct node interaction is suggested by FB15k-237’s greater maximum and average closeness centrality scores.
So far, we have managed to replicate the graph structure of FB15k-237 with respect to node count, edge count, node/edge ratio, and degree centrality.
Most nodes do not act as important information-transfer facilitators, given the low average and maximum betweenness centrality numbers, thereby pointing to a graph structure in which no single node controls the shortest paths.
Also, nodes appear to be in a similar location with respect to their average distance to every other node, based on the very small variety of closeness centrality values. This suggests that a lot of nodes in the graph are relatively easy to find, indicating a balanced connectivity pattern.
Discussion of Visualization Results
The distribution of the training embeddings for FB15k-237, FB15k-238, and FB15k-239 can be seen in Figures 4, 5, and 6, showing discrete clusters within each dataset. The visualizations show distinct regions with dense node clusters and only minimal areas of scattered nodes. This suggests that the model has effectively discovered significant connections between the KG’s entities.

TransE embedding visualizations for FB15k-237.

TransE embedding visualizations for FB15k-238.

TransE embedding visualizations for FB15k-239.
The entities and relations are typically well-separated, suggesting that the model obtained distinct representations of different entity and relation types. The relationships between entities in the embedding space appear to be implied by the model, as shown by the red crosses that depict relations appearing between clusters.
TransE t-SNE and UMAP visualizations are showcased in Figure 7(a) and (b). Notably, the nature of the clustering is quite different. While neither t-SNE nor UMAP is appropriate for strictly defining cluster membership, they can give an understanding of what clusters might exist. In this case, we might suspect that the centrality metrics for SKG-237 are misleading. Future work should include investigations of centrality metrics beyond the average.

TransE embedding visualizations for SKG-237.

(a) t-SNE embeddings for SKG-4 with version-1 hyperparameters. (b) t-SNE embeddings for SKG-4 with version-2 hyperparameters.

(a) UMAP embeddings for SKG-4 with version-1 hyperparameters. (b) UMAP embeddings for SKG-4 with version-2 hyperparameters.

(a) t-SNE embeddings for SKG-5 with version-1 hyperparameters. (b) t-SNE embeddings for SKG-5 with version-2 hyperparameters.

(a) UMAP embeddings for SKG-5 with version-1 hyperparameters. (b) UMAP embeddings for SKG-5 with version-2 hyperparameters.

(a) t-SNE embeddings for SKG-6 with version-1 hyperparameters. (b) t-SNE embeddings for SKG-6 with version-2 hyperparameters.

(a) UMAP embeddings for SKG-6 with version-1 hyperparameters. (b) UMAP embeddings for SKG-6 with version-2 hyperparameters.
The dense mixing observed here may lead to the model mis-ranking predictions because of similar embeddings for different entities. In contrast, the UMAP plot in Figure 7(b) displays elements within close clusters, although this separation might simply represent key structural differences rather than expressing the more complicated semantic connections required for accurate predictions.
Plots for the SKG isotopes are shown in Figures 14–19, displaying the TransE training results. They give us insight into the overall clustering and show that the lack of interconnections between template structure instantiations has a negative impact. Yet, in higher isotopes, we also notice a distinct lack of clustering based on type (i.e., consistent use of a type for the range of a property does not seem to overtly influence the distribution of embeddings).

Visualization of t-SNE and UMAP embeddings for SKG-4.

Visualization of t-SNE and UMAP embeddings for SKG-5.

Visualization of t-SNE and UMAP embeddings for SKG-5r.

Visualization of t-SNE and UMAP embeddings for SKG-5rs.

Visualization of t-SNE and UMAP embeddings for SKG-5rsc.

Visualization of t-SNE and UMAP embeddings for SKG-6.
The t-SNE and UMAP visualizations both demonstrate similar cluster formation for SKG-4, seen in Figure 14, suggesting that entities with similar semantic properties are grouped together. Based on embedding values, the color coding indicates that different entities have different semantic characteristics. Along with SKG-5 and SKG-6, shown in Figures 15 and 19 respectively, these clusters show the most separation, confirming the results discussed above regarding the highest evaluation results.
The remaining visualizations, of the extended versions of SKG-5 (SKG-5r/5rs/5rsc) in Figures 16, 17, and 18 respectively, show many small, tight clusters, again reflecting the evaluation results.
Surprisingly, creating SKG-237 with exactly the same triple, node, and predicate counts and degree centrality as FB15k-237 was not enough to serve as a controlled environment in terms of training and evaluation results.
The semantic connections represented in the synthetic graph may not capture the complex patterns found in FB15k-237, despite its structural features (such as node/edge counts and centrality measurements) being identical. Hence, we took the next step in creating SKG-4/5/6 and the variations of SKG-5 (SKG-5/5r/5rs/5rsc).
In summary, controlling graph structure has yielded important information on KGE’s performance. With its simple structure, SKG-4 provides the best link prediction results, indicating that model performance is improved by minimal complexity. While adding complexity enhances semantic depth, it also makes prediction more difficult. This is true for SKG-5 and its modified versions, which include reification, shortcuts, and contextual information.
Simpler structures, such as SKG-4, are shown to form more distinct clusters, while more complicated graphs create a balance between relationships and group formation. According to these findings, adjusting graph complexity affects how effective KGEs are; simplicity and structural depth must be balanced. The experiment described in this article invites further investigation toward understanding the impact of a KG’s schema on KGE model performance. The results of our experiment suggest that a threshold of semantic inclusion exists that can assist in link prediction for all models. Understanding the effects of graph metrics and structure on embedding outcomes is essential. Getting the best results out of knowledge graph embeddings can be complex, as demonstrated by the impact of these metrics and the careful tuning of training and evaluation parameters.
Future Work
We have identified some next steps in this line of research:
- Replicate the experiment on other benchmarks (e.g., YAGO (Tanon & Weikum, 2020) or WN18RR (Bordes et al., 2013)).
- Replicate the experiment using additional models (e.g., deep learning techniques for KGEs (Dettmers et al., 2018)), which may better incorporate semantics, as well as establish that there are no differences between implementations (e.g., Ali et al., 2021).
- Increase the number of tested isotopes by adding even more semantic metadata and varying graph structures.
- Examine the impact on other downstream tasks (e.g., entity clustering (Wang et al., 2017)).
- Examine how different embedding models are capable (or not) of handling various KG characteristics, especially when the graph may have
Footnotes
Acknowledgments
The authors acknowledge support from the National Science Foundation (NSF) under Grant #2333532; Proto-OKN Theme 3: An Education Gateway for the Proto-OKN. The authors would like to thank Brandon Dave for his earlier contributions to this work, namely through (Dave et al., 2024).
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
