Abstract
Introduction
The use of linked data models in the humanities and in cultural heritage institutions to structure, store, share and link knowledge about our historical past has seen a marked increase in interest and implementation [19]. Evidence of this is the growing size of Wikidata, the Wikimedia Foundation’s collaborative multilingual knowledge graph [2]. Sharing information in this way provides opportunities for increasing its accessibility and findability, as well as technologies for efficiently integrating previously unstructured, siloed data. Despite these affordances, there remains a gap in access between those familiar with Semantic Web principles, who can write SPARQL queries to explore data, and those new to these technologies. In working to fill this gap, we asked how we can generate multimedia stories from data stored in a public knowledge graph.
To tell a story one has to put together diverse information about people, places, time periods, and things. We detail here how a machine, through the power of the Semantic Web, can compile scattered and diverse materials and information to construct stories. Through the case of the ERC Starting Grant project “Agents of Change: Women Editors and Socio-Cultural Transformation in Europe, 1710–1920” (acronym WeChangEd), we detail how to move from archive, to a structured data model and relational database, to Linked Open Data on Wikidata, to an application powered by the Stories Services API that tells machine-readable stories of women editors in Europe. We show that WeChangEd Stories can be an important tool for recounting and sharing the past.
WeChangEd: Women editors in Europe
The WeChangEd research project1
This project is carried out with a team of seven researchers and seven student interns with complementary language skills and methodological expertise in literary studies, the digital humanities and the social sciences. This has resulted in two main outputs: 1) a comprehensive database of women editors and their periodicals; 2) a series of thematic sub-projects, including four doctoral dissertations (see e.g. [21]). The four dissertations, all unpublished and submitted to Ghent University in 2020, are: Mariia Alesina, “Femininity at the Crossroads: Negotiating National and Gender Peripherality in the Russian Fashion Journal Modnyi Magazin (1862–1883)”; Christina Bezari, “‘Restless Agents of Progress’: Female Editorship, Salon Sociability and Modernisation in Spain, Italy, Portugal, and Greece (1860–1920)”; Charlotte D’Eer, “Women Editors in the German-Language Periodical Press (1740–1920): Transnational Emotional Networks”; and Eloise Forestier, “Women Editors Conducting Deliberative Democracy: A Transnational Study of Liberty, Equality, and Justice in Nineteenth-Century Periodicals”.
To accurately take stock of women editors in Europe over the period 1710–1920, the WeChangEd team developed a data model [16]. Using the collaborative object-oriented relational database-based research environment nodegoat [18] researchers started to collect, organize, and structure this information. This included a unique identifier for each person, organization, and periodical, allowing us to classify information not only on women editors themselves, but also the people in their lives (e.g. partners, colleagues, co-editors, family, and so forth), as well as information on the periodicals they edited, and the organizations that these periodicals may have emerged from, were supported by, or served as official organ for. This resulted in a dataset which includes around 1700 persons, 1600 periodicals, 200 organizations and numerous links between these entities. This data is available as CSV files upon request at [23], but it is also publicly available on Wikidata as we detail below.
Inherent to the WeChangEd project is the aim of facilitating future research on these women editors and periodicals, and of increasing knowledge about women in this period in particular, who are often absent from mainstream records on these periodicals because they lacked the fundamental rights that would have allowed them to hold formal positions as editors. The project team therefore wanted to ensure that the information on these editors would be more easily identifiable and accessible than it would be in a private database or in a non-digitized document in a library or archive. There was thus a need to ensure both the findability of the information on the web and user-friendliness in accessing and exploring it. Consequently, the WeChangEd team elected to make this data available on Wikidata and to develop stories from the data using the Stories Services API, which resulted in the WeChangEd Stories App that we detail further here.
Semantic web
The vision of the Semantic Web is an accumulation of interconnected data from heterogeneous sources connected to points of reference for which we define meaning. This entity- and link-based architecture allows for navigation of data from many databases or collections via known points of representation. An organization well-known for cultivating the technologies necessary for the Semantic Web is the World Wide Web Consortium or W3C. In 2009, members of a W3C group stated that: “Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.” [25]. The Wikidata knowledge base fulfills the requirements outlined by the W3C in that each resource has a unique identifier and is linked to other resources by properties, and that all of the data is machine actionable as well as editable by both humans and machines.
The Wikidata data model
Wikidata is the knowledge base of structured data that anyone can edit [24]. This community-edited knowledge base contains multilingual structured data from many domains, from computing to biodiversity to cultural heritage [4]. A sister project to the Wikipedias, Wikidata is a project of the Wikimedia Foundation. The data in Wikidata is available under the Creative Commons Zero (CC0) license, meaning it is free for anyone to reuse for any purpose.
The data model of Wikidata consists of Items, Properties and unique identifiers. In Fig. 1 we see a screenshot of the Wikidata item for Lady Mary Wortley Montagu, a person who is also in the WeChangEd database. The Wikidata identifier for this item (in the red rectangle) is Q235121. Two properties, ‘instance of’ and ‘image’ (in the blue rectangles), are used as the predicates of statements. Three further properties, ‘stated in’, ‘retrieved’, and ‘reference URL’, are used within the gray reference blocks to provide sourcing information for the claims.

Screenshot of the item for Lady Mary Wortley Montagu in Wikidata.
There are more than 8,000 properties in use in Wikidata at the time of this writing. These properties are used to express statements of fact about items. Aligning data with Wikidata requires selecting properties to express the types of statements you’d like to make about your dataset.
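The structure described above (an item with an identifier, statements made with properties, and references attached to statements) can be sketched in a few lines of Python. This is only an illustrative model, not Wikidata's actual storage format: the class names are our own, and the values in angle brackets are placeholders rather than data copied from the live item.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Reference:
    # A reference is a small set of (property label, value) pairs,
    # e.g. 'stated in', 'retrieved', 'reference URL'.
    parts: List[Tuple[str, str]]


@dataclass
class Statement:
    prop: str                      # property used as the predicate
    value: str                     # the value of the statement
    references: List[Reference] = field(default_factory=list)


@dataclass
class Item:
    qid: str                       # unique identifier, e.g. "Q235121"
    label: str
    statements: List[Statement] = field(default_factory=list)

    def values_of(self, prop: str) -> List[str]:
        """Return the values of all statements that use the given property."""
        return [s.value for s in self.statements if s.prop == prop]


# Placeholder values in angle brackets are NOT taken from Wikidata.
montagu = Item(
    qid="Q235121",
    label="Lady Mary Wortley Montagu",
    statements=[
        Statement(
            prop="instance of",
            value="human",
            references=[Reference(parts=[("stated in", "<source item>"),
                                         ("retrieved", "<date>")])],
        ),
        Statement(prop="image", value="<file name on Wikimedia Commons>"),
    ],
)
```

Calling `montagu.values_of("instance of")` returns the values of every statement that uses that property, which is the basic lookup that later queries and applications build on.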
The WeChangEd team partnered with two Wikidata experts in order to align their data with Wikidata, write the data to Wikidata, and create a Wikidata-powered application for visualization of the data. We outline the steps we took to accomplish this data contribution. The first step was to access the data available from the WeChangEd database. It is possible to export data from Nodegoat, the platform used to store the WeChangEd database, in a number of different formats: CSV, ODT, and JSON. Data export in the JSON-LD format is possible as well, but this is undocumented and cumbersome. The data was exported as CSV files and a Python script was used to detect inconsistencies in relation properties (relations in Nodegoat have to be added in both directions and can have different properties). After a few iterations of updating the source data, exporting, and checking, the exported CSV files were reshaped to prepare the data for wikification.
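A consistency check of the kind described above can be sketched as follows. This is not the script used in the project: the column names (`source`, `relation`, `target`), the relation labels, and the table of inverse relations are assumptions made for illustration, and a real Nodegoat export will be shaped differently.

```python
import csv
import io

# Hypothetical table of relation types and their expected inverses.
INVERSE = {
    "spouse of": "spouse of",
    "co-editor of": "co-editor of",
    "parent of": "child of",
    "child of": "parent of",
}


def find_unreciprocated(rows, inverse):
    """Return relations whose inverse is missing in the other direction."""
    seen = {(r["source"], r["relation"], r["target"]) for r in rows}
    problems = []
    for source, relation, target in sorted(seen):
        expected = inverse.get(relation)
        if expected is None:
            problems.append((source, relation, target, "unknown relation type"))
        elif (target, expected, source) not in seen:
            problems.append(
                (source, relation, target, f"missing inverse '{expected}'")
            )
    return problems


# Hypothetical sample export; identifiers are invented for the example.
SAMPLE_CSV = """source,relation,target
E1,spouse of,P7
P7,spouse of,E1
E1,co-editor of,E2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = find_unreciprocated(rows, INVERSE)
for problem in problems:
    print(problem)
```

In this sample, the `co-editor of` relation from `E1` to `E2` is reported because no matching relation from `E2` to `E1` exists, which is the kind of discrepancy that would be corrected in the source data before the next export.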
We then used the OpenRefine5
We proposed a new property in Wikidata for WeChangEd to help us identify the items in the dataset and write SPARQL queries related to the data. We proposed the property as an external identifier using Wikidata’s community property proposal process.
After reconciling the WeChangEd data with Wikidata we then needed to write statements to Wikidata to contribute the data. We used a tool called WikidataIntegrator (WDI) to write the WeChangEd data to Wikidata. WDI is a Python library for interacting with data from Wikidata [26]. WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub.
We used the WDI library to prepare scripts for the bot to write statements with our selected properties and values from the WeChangEd data. We proposed the bot plan to the community on March 6, 2020 and it was approved on March 21, 2020.
We used an object-oriented-programming approach with the django-wikidata-api9
We used the EditGroups tool10

Frequency of the number of statements per item in Wikidata for WeChangEd dataset.
We used the wbstack platform to create an instance of Wikibase for testing.
Aligning the WeChangEd data with Wikidata required finding patterns to express relationships in the original data and mapping them to items and properties from the Wikidata knowledge base. We created a data model using relevant Wikidata properties to represent the data created by the WeChangEd project team. In the WeChangEd data, information was collected about the start and end dates of an editor’s involvement with various periodicals. In Wikidata, start and end dates are represented as qualifiers to other statements. In this case we created statements on the items for the periodicals using the property ‘editor’ and then applied the ‘start time’ and ‘end time’ qualifiers to those statements.
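To make the qualifier pattern concrete, the following sketch builds the JSON structure of one such statement in the Wikibase data format, using plain Python dictionaries rather than the WikidataIntegrator library. The property numbers are those of ‘editor’ (P98), ‘start time’ (P580) and ‘end time’ (P582); the helper names are our own, and the years in the usage example are illustrative values, not figures taken from the WeChangEd dataset.

```python
import json

# Calendar model URI for the proleptic Gregorian calendar.
CALENDAR = "http://www.wikidata.org/entity/Q1985727"


def year_snak(prop, year):
    """Build a time value with year precision for the given property."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "time",
            "value": {
                "time": f"+{year:04d}-00-00T00:00:00Z",
                "timezone": 0,
                "before": 0,
                "after": 0,
                "precision": 9,  # 9 means the value is precise to the year
                "calendarmodel": CALENDAR,
            },
        },
    }


def editor_claim(editor_qid, start_year, end_year):
    """Build an 'editor' statement qualified with start and end years."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P98",  # editor
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {"entity-type": "item", "id": editor_qid},
            },
        },
        "qualifiers": {
            "P580": [year_snak("P580", start_year)],  # start time
            "P582": [year_snak("P582", end_year)],    # end time
        },
    }


# Illustrative values only: the years are NOT taken from the dataset.
claim = editor_claim("Q235121", 1737, 1738)
print(json.dumps(claim, indent=2))
```

A claim of this shape would be attached to the item of the periodical, so that the editor is the value of the statement and the dates qualify the period of that editorship.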
Once we aligned the data with Wikidata, we were able to create an average of seven statements per item. The chart in Fig. 2 represents the frequency of the number of statements across items with a WeChangEd identifier in July 2020. Many of these items now have more than seven statements. Some have dozens and a few items already have hundreds of statements. To quickly see the subgraph of all statements added to Wikidata by the WeChangEd integration, we can run a SPARQL query for all statements with provenance sourced to the WeChangEd project.
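The kind of summary shown in Fig. 2 can be computed from a simple mapping of items to their number of statements. The sketch below assumes such a mapping has already been obtained (for instance from the Wikidata Query Service); the item identifiers and counts are invented for the example.

```python
from collections import Counter


def statement_frequency(statement_counts):
    """Map 'number of statements' to 'number of items with that many'."""
    return dict(sorted(Counter(statement_counts.values()).items()))


def mean_statements(statement_counts):
    """Average number of statements per item."""
    return sum(statement_counts.values()) / len(statement_counts)


# Hypothetical counts; the keys are placeholders, not real Wikidata items.
counts = {"item-a": 7, "item-b": 7, "item-c": 12, "item-d": 2}

frequency = statement_frequency(counts)
average = mean_statements(counts)
print(frequency)
print(average)
```

Plotting `frequency` as a bar chart would give a distribution of the same kind as the one in the figure.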
Donating this data to Wikidata contributed to the diversity and breadth of data in the knowledge base. At the time of this writing, of the set of items in Wikidata that describe people, eighty percent represent males. When WeChangEd completed this donation, it had contributed 851 WeChangEd Persons and related bibliographical data, 1,687 WeChangEd Periodicals, and 219 WeChangEd Organizations to Wikidata. The majority of these persons were women. This is a contribution that works to counteract the gender gap in Wikidata [7,8].
The WeChangEd team assembled extensive data about changes in the editors’ names over time. These changes were due to marriage, noble titles, use of pseudonyms, or being referred to by their husbands’ names. We added each of these names as aliases in their Wikidata items. This means that end-users of Wikidata who search for these people will now be more likely to find the correct Wikidata item regardless of the version of the name they are using to search. In Table 1 we list the Wikidata properties we used to contribute statements about the people in the WeChangEd dataset.
Table of properties added for people from the WeChangEd dataset
Of the 1,687 periodical titles in the WeChangEd dataset, roughly 1,550 were new to Wikidata. In August 2020 there were more than 53,000 periodical titles in Wikidata. The WeChangEd data donation increased the coverage of periodicals from the 1700s, 1800s, and early 1900s. Researchers who consult Wikidata now benefit from readily available information about where these publications were published, who edited them, and the dates of their runs. In Table 2 we list the Wikidata properties we used to contribute statements about the periodicals in the WeChangEd dataset.
Table of properties added for periodicals from the WeChangEd dataset
Of the 219 organizations in the WeChangEd dataset, roughly 150 were new to Wikidata. Many of these organizations had a mission related to women’s rights. The WeChangEd data donation helps fill gaps in coverage of organizations with specific social functions. In Table 3 we list the Wikidata properties we used to contribute statements about the organizations in the WeChangEd dataset. Creating these new items in Wikidata is a step toward addressing the gaps in information about women and organizations that center women’s voices. We offer this as an example of how this can be achieved, in the hope that other researchers will be inspired to consider contributing to Wikidata. Working toward making data coverage more equitable will require the work of many thousands of editors. Making diverse areas of human knowledge available in Wikidata is part of Wikidata’s Development Plan.
Table of properties added for organizations from the WeChangEd dataset
This project enhanced Wikidata by serving as a model for using software tools to help automate data creation that is practical for humanists. WikidataIntegrator has been used by many groups in the biological and life sciences to import datasets [13,15,26,27]. There are also several projects that center topics relevant to the humanities in relation to Wikidata [5,6,11]. Our use of WikidataIntegrator in the field of literary studies demonstrates workflows that may be useful for other humanists who are not yet familiar with Wikidata.
As part of the WeChangEd project, the research team organized public workshops and lectures so that others interested in learning about Wikidata could gain skills and knowledge [1]. These events were attended by students, researchers, faculty members, and others; no prior experience with these technologies was required for participation. Because the events were framed as appropriate for newcomers, people who do not self-identify as technologists felt comfortable attending. Events like this help spread awareness of Semantic Web technologies to additional audiences.
Once we contributed the WeChangEd data to Wikidata we quickly experienced the benefits of a collaborative knowledge base in the form of error correction and enrichment. There were a small number of cases where the dates of birth and the dates of death for an individual were reversed in the Nodegoat database. These were quickly identified by a Wikidata editor who regularly runs a query to catch items where the death date is chronologically before the birth date. This user messaged us to let us know of the error and also corrected the values in Wikidata.
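A check of this kind is straightforward to run locally before uploading data. The sketch below is our own illustration, not the query used by the Wikidata editor: it assumes each record has hypothetical `id`, `birth` and `death` fields holding ISO-formatted dates, and the sample records are invented.

```python
from datetime import date


def reversed_life_dates(people):
    """Return the ids of records whose death date precedes the birth date."""
    flagged = []
    for person in people:
        born, died = person.get("birth"), person.get("death")
        # Records with a missing date cannot be checked and are skipped.
        if born and died and date.fromisoformat(died) < date.fromisoformat(born):
            flagged.append(person["id"])
    return flagged


# Invented sample records for illustration only.
sample = [
    {"id": "editor-1", "birth": "1801-05-02", "death": "1870-11-30"},
    {"id": "editor-2", "birth": "1885-03-14", "death": "1820-07-01"},
    {"id": "editor-3", "birth": "1790-01-01", "death": None},
]

flagged = reversed_life_dates(sample)
print(flagged)
```

Here only `editor-2` is flagged, since its death date falls before its birth date; `editor-3` is skipped because one of the dates is unknown.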
The WeChangEd team recorded Virtual International Authority File (VIAF) and International Standard Name Identifier (ISNI) identifiers for many of the resources in their dataset. These external identifiers helped us confirm matches with existing Wikidata items. Wikidata is recognized as a hub of external identifiers [10]. External identifiers are a type of property in Wikidata, and there are thousands of these properties in use. After aligning the WeChangEd data with Wikidata, we not only knew the VIAF and ISNI identifiers for resources, we also had access to all of the identifiers that were already in Wikidata to describe them. In Fig. 3 we see a graph visualization of the external identifiers for people in the WeChangEd dataset that are currently available from Wikidata. The identifiers are represented by the blue ovals and the people are represented by the white ovals. There are now dozens of external identifiers available for some of these people, compared with the two external identifiers recorded in the original WeChangEd dataset. The process of aligning the WeChangEd data with Wikidata expanded our ability to quickly find pathways to additional sources of information about these editors. Researchers interested in these editors may be inspired to consult other repositories and databases beyond Wikidata to learn more about these people. Such research is facilitated by the identifiers listed on each editor’s Wikidata item, which make this information quick to find.

Graph of external identifiers for people in the WeChangEd dataset available in Wikidata in July, 2020.
For resources in the WeChangEd dataset for which we had references to scholarly works, we were able to connect the statements to their supporting publications, as seen in Fig. 4. Each of these publications has its own Wikidata item, so metadata about the publications is available for consultation and reuse. Metadata about these publications is used in the WeChangEd stories application.

Usage of the ‘described by source’ property.
After contributing the WeChangEd data to Wikidata we were able to write SPARQL queries to ask questions of this data in combination with other data in the knowledge base. We wrote a query to return items that have both a WeChangEd ID and an image.
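A query of this kind can be assembled and sent to the Wikidata Query Service from Python using only the standard library. The sketch below is not the query from the project: the function names are our own, ‘image’ is property P18, and the property number passed for the WeChangEd ID in the usage example (P7947) is an assumption that should be verified on Wikidata before use.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def items_with_image_query(id_property, limit=50):
    """Build a SPARQL query for items that have the given ID and an image."""
    return f"""
SELECT ?item ?itemLabel ?image WHERE {{
  ?item wdt:{id_property} ?wechangedId ;
        wdt:P18 ?image .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def run_query(query, user_agent="wechanged-example/0.1 (illustrative)"):
    """Send the query to the public endpoint and return the result rows.

    Requires network access; the user agent string is a placeholder.
    """
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


# "P7947" is assumed to be the WeChangEd ID property; verify before use.
query = items_with_image_query("P7947", limit=10)
print(query)
```

Passing the resulting string to `run_query` would return one row per matching item, each with the item URI, its English label and the image URL.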
We also wrote a query for items from the WeChangEd dataset that have something named after them.
In August 2020 there were more than 550 works that have a resource from the WeChangEd dataset listed as a main subject.
Contributing the WeChangEd data to Wikidata allows us to use tools in the Wikimedia ecosystem to interact with this data. We can now use the MediaWiki API and the Wikidata Query Service, and these tools allow us to get the data out in a range of formats.
Thanks to the many options for getting data out of Wikidata, we were able to create an application for the display of the WeChangEd dataset. This application fetches data from Wikidata and presents it in the format of an interactive website. The web application allows users to explore the WeChangEd dataset.

Relevant people moment from Lady Mary Wortley Montagu’s story.
The WeChangEd Stories application centers the lives of the editors in the WeChangEd dataset. Stories are generated automatically based on the Wikidata identifier. This means that the WeChangEd team did not have to design, arrange or organize any data. Once the WeChangEd data had been written to Wikidata, the stories combined elements from their dataset with additional data already found in Wikidata. Images invite human attention and engagement. We designed the stories to showcase images so that people get a sense of the contextual details of an editor’s life. Exploring different aspects of a person’s work and relationships helps users relate to a person, and holds attention longer than a collection of facts presented as text.
For each editor, the application presents a series of moments highlighting the organizations, places, and other people with which the person is associated. There is a moment that displays all publications that the person edited, authored, or of which the person is a main subject. The moments can include video or images of the person. Images were not part of the original WeChangEd dataset, but thanks to Wikimedia Commons, many images of editors as well as some of the periodicals they edited are available for reuse in the application. In Fig. 5, we see a screen capture of one of the moments in the story for Lady Mary Wortley Montagu showing people significant to her life. These connections were drawn from statements on the Lady Mary Wortley Montagu item in Wikidata as well as other Wikidata items that reference her item. In this way we use information in Wikidata, such as images and descriptions of the type of connection, to add context to Lady Mary’s story.
Lady Mary Wortley Montagu’s story20
We used Python as the primary programming language for the backend of the application. We used the Django framework for the Stories Service layer. For storing configurations about the presentation metadata of each story we used a PostgreSQL database. For added performance in our API, we use Redis to cache the results of the Wikidata SPARQL queries used throughout the application. To offload long-running processes we have a Celery server for queuing complex SPARQL queries, and the frontend polls the API to retrieve the results once the tasks have been executed.
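The caching idea can be illustrated without any external service. The sketch below is not the application's code: it uses an in-memory dictionary as a stand-in for Redis, and the class name, key scheme and time-to-live are assumptions made for the example. The point it shows is that identical SPARQL queries are answered from the cache instead of being sent to the endpoint again.

```python
import hashlib
import time


class QueryCache:
    """In-memory stand-in for a Redis cache of SPARQL query results."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (time stored, result)

    @staticmethod
    def key_for(query):
        # Collapse whitespace so formatting differences share one cache key.
        normalized = " ".join(query.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        return "sparql:" + digest

    def get_or_fetch(self, query, fetch):
        """Return a cached result, or call fetch(query) and cache it."""
        key = self.key_for(query)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        result = fetch(query)
        self._store[key] = (now, result)
        return result


# A fake fetch function records how often the "endpoint" is contacted.
calls = []


def fake_fetch(query):
    calls.append(query)
    return [{"item": "Q235121"}]


cache = QueryCache(ttl_seconds=60)
first = cache.get_or_fetch("SELECT ?item  WHERE { }", fake_fetch)
second = cache.get_or_fetch("SELECT ?item WHERE { }", fake_fetch)
print(len(calls))
```

In a deployment along the lines described above, the dictionary would be replaced by a Redis client, and slow queries would be handed to a task queue rather than fetched inline.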
The Stories Services team maintains a package for working with Wikidata data in a Django application.
There are two primary functions of our frontend: 1) rendering a story and 2) managing the presentation of a story. For Story rendering, we developed react-stories-api,23
For managing the collection and story presentation, we developed a Publisher Workspace to serve as the visual frontend of API operations. While the Stories Services API layer powers the data in the WeChangEd website, the Publisher Workspace is where admin users can rearrange the ordering of moments, modify the story metadata itself, and most importantly, enhance the stories with curated content such as images, videos, and links found outside of Wikidata. The Publisher Workspace is built using React.js and has react-stories-api as a core dependency to provide publishers real-time previewing of their story selections using the same presentation library as their hosted site.
Syndicating data from public knowledge graphs as part of our research projects allows us to connect data to webs of other data [19]. All items with a WeChangEd ID are now connected by properties to other items in the Wikidata knowledge base. The SPARQL query language allows us to ask questions that make use of any of these connections. Rather than maintaining a database silo that only the original research team can consult, we have contributed the WeChangEd data to an international, public database that anyone with access to the internet may consult. More people can now share in the knowledge produced by this research team. In Fig. 6, we see a screen capture from Lady Mary’s Story showing some of the external identifiers that provide readers additional pathways to information about this person in datasets beyond Wikidata. This moment is powered by the external identifier properties in Wikidata that have been added to Lady Mary’s item.

Screen capture from Lady Mary’s Story showing some of the external identifiers that provide readers additional pathways to information about this person in datasets beyond Wikidata.
As more researchers decide to publish their data to Wikidata, data from teams working on a diversity of topics may be added. Topical coverage in Wikidata is currently uneven [9]. Our hope is that by demonstrating the value of contributing data to Wikidata to audiences such as researchers in the Humanities, we will encourage researchers to help address the gaps in topical coverage. Providing concrete examples of strategies we have tried to engage with Wikidata is a way to promote the option of contributing data to Wikidata.
Researchers and technologists have created many visualization options for data sourced from public knowledge graphs. Once data is published in a public knowledge graph, we can reuse frameworks and packages for presenting visualizations of this data for a variety of purposes. For example, the Wikidata Query Service provides support for creating graphs, charts, bubble diagrams, network graphs, and image grids from a menu in the user interface for the SPARQL endpoint. Researchers do not have to create visualizations using other software tools; they can select the options they would like directly from the Wikidata Query Service [12]. This allows researchers who are new to data visualization to access additional formats when communicating their results.
Publishing data in a public knowledge graph allows us to communicate with a wider audience. Anyone searching Wikidata itself, or any project reusing Wikidata data, will find references to the WeChangEd project on any of the statements the project team contributed. This audience is wide and still growing, as the data will persist for future searches for years to come.
From the perspective of digital preservation, donating the WeChangEd dataset to Wikidata will result in greater longevity of the data. A project-team based preservation strategy is more costly and time-intensive to maintain because the team would have to provide server space for the database and train people to maintain the data. Wikidata will be maintained for the duration of the Wikimedia Foundation itself.
This project is an example of an interactive application built to showcase a specific view of a Wikidata subgraph. The subgraph highlighted is the set of resources with a WeChangEd identifier.
Archival work in retracing people of the past is often a laborious and time-intensive task, where the discovery of related items is not always immediate. Leveraging Semantic Web technologies in the WeChangEd project allowed us to reach a wider audience for our work and to develop interactive, user-friendly presentation formats for our research that tell stories from data stored in Wikidata. The model described here, which moves from archive, to a structured data model and relational database, to publishing this information via Wikidata, to the use of the Stories Services API to generate multimedia stories, can be an important tool for recounting and sharing the past.
Publishing this data in a public knowledge graph ensures that this information can be integrated and made accessible in the growing knowledge base of the Semantic Web, and more specifically Wikidata. This contrasts with the common practice in the Humanities of archiving databases locally and/or making CSV files available on websites or upon request, which creates silos of knowledge. Anyone who consults Wikidata now has access to the dataset created by the WeChangEd team about women editors in Europe and the organizations and periodicals with which they were affiliated. This dataset is woven into the Wikidata knowledge base via property relationships with other types of resources. Thanks to Wikidata we now know the geo-coordinates of all of the locations in the WeChangEd dataset. These affordances for both Wikidata and the research project were outlined in detail in Section 6.
As more humanists, social scientists, and other researchers choose to contribute their data to Wikidata we will all benefit. As researchers add data, the breadth and complexity of the questions we can ask about the data we have contributed will increase. As more data is curated in Wikidata additional relationships between resources are created through the use of properties. Relationships between editors in the dataset and other people described in Wikidata will continue to grow over time. The sets of information connected to the WeChangEd dataset will be enriched and expanded over time, making this an evolving dataset.
A research team partnering with Wikidata experts was an effective collaboration. This partnership allowed us all to learn from one another. The Wikidata experts relied on the WeChangEd team for domain knowledge and feedback on data modeling decisions. The WeChangEd team was able to continue their normal research activities without having to learn every detail of how to work with Wikidata.
In addition, the WeChangEd Stories App, developed using the Stories Services API to merge this distributed data, afforded a unique, interactive and user-friendly way to view data about women editors sourced from a knowledge graph. It also provided a new entry point into the data and a way of discovering the diverse and distributed material on the Web about women editors specifically. It showcases a new way of telling stories from machine-readable information, in a manner that is visually appealing and more accessible than writing SPARQL queries. Building applications that syndicate data from Wikidata allows us to leverage a general-purpose knowledge graph with a growing number of references back to scholarly literature. Using frameworks developed by the Wikidata community allows us to rapidly provision interactive sites that will help us engage new audiences.
