Abstract
Keywords
The long-awaited FAIR digital object
In a world as digitally connected as ours, it is sometimes difficult to remember that computers were not always able to communicate with each other. In fact, this feature, which seems so obvious now and which we take as a given, was at one point an engineering grand challenge. The solution to this problem, recounted below, is part of long-term trends toward information exchange and data interoperability. It is instructive to revisit this “ancient history” as it holds many parallels for the immediate future of academic publishing and scholarship in general.
In 2019, the former program director of the NSFnet, George Strawn, reminded us of the 50th anniversary of the Internet [1]. By this, Strawn meant that version one of the ARPAnet had begun operation in 1969. Only four years later, the core technology that makes “interoperable networks” possible was invented by Robert Kahn and Vint Cerf. The engineering challenge at that time was to somehow interconnect early computer networks which had little standardisation in software or hardware. How could numerous networks be connected without having to create all pair-wise adaptors, or forcing every network to rebuild from scratch according to a single standard? Although it is today a problem hardly worth thinking about, it had by then become a serious research program in the Defense Advanced Research Projects Agency and culminated with version two of the ARPAnet (1983–1989). The brilliant solution was the Transmission Control Protocol/Internet Protocol (TCP/IP), a common and minimal standard that all networks could connect to with relatively minor modifications. In this way, network interoperability could be established among a diverse assemblage of home-grown technologies. Very different implementations (e.g., networks made of copper wires versus networks made of microwaves) could, with TCP/IP, seamlessly interconnect, behaving as a single, larger, virtual network. The approach proved to be economical and technically scalable (reaching hundreds of nodes) with benefits that would come to surprise even the engineers who built it.
For the next 10 years (1985–1995), Strawn had various responsibilities for the NSFnet, a federally funded project to extend the TCP/IP network to the computer networks at US colleges and universities and to link them to emerging supercomputer centers (constituting, by then, already thousands of nodes). Curiously, Strawn discovered in 1991 that this US government-funded academic research network had apparently become so useful that it was dominated by private industry members. It was then agreed in 1992 that the NSFnet, top-heavy with commerce, would be better managed by the private sector (accomplished in 1995), giving birth to what we now know as the Internet (soon thereafter reaching millions, then billions of nodes). Today TCP/IP continues to route our daily, and now global, data exchanges, from web browsing to music streaming, to social media and emails, to video calls. In 2004, Kahn and Cerf would win the highest honor in computer science, the Turing Award, for their accomplishments [2].
Remarkably, in 1995, the same year that the commercial Internet was born, Kahn along with Robert Wilensky articulated another seminal vision:
Kahn and Wilensky proposed a new infrastructure that would be “open in its architecture and which supports a large and extensible class of distributed digital information services”. These services would act on the “basic entities to be found in such a system, in which information in the form of digital objects is stored, accessed, disseminated, and managed”. In essence, Kahn and Wilensky were proposing for information what Kahn and Cerf had accomplished for networks: a technology solution that would interconnect, in a decentralized manner, data and services into a seamless virtual database. The digital object would be for data interoperation what TCP/IP had become for network interoperation.
The vision behind this Digital Object Architecture percolated among technical experts for decades [4]. During that time, retrieval and access solutions for specialised problems permitted the discussion around the Digital Object Architecture to remain exploratory and theoretical. However, by the early 21st century, the information overload anticipated by Kahn and Wilensky had begun to manifest, and this set in motion a renewed interest in how data might be more automatically interconnected.
As early as 2010, biologists, already feeling the weight of data overload and convinced of the value of the semantic web and linked data, proposed a minimal schema (and its representation in the machine-readable Resource Description Framework) to make individual subject-predicate-object combinations (i.e., semantic assertions) stand-alone publications, complete with provenance and Globally Unique, Persistent and Resolvable Identifiers (GUPRIs). Biological data such as gene-disease associations or protein-protein interactions, increasingly produced in large volumes in automated laboratories, could thus be more effectively captured, exchanged, and reused. In turn, authors could receive fine-grained credit for each individual datum, and ambiguities that plague free-text and ordinary spreadsheet formats could be altogether avoided. This approach was named
In another example where data overload elicited renewed interest in machine-actionability, a Lorentz Center workshop was held in Leiden, in 2014, entitled
The commentary and the FAIR Principles were immediately well received and embraced by the international stakeholder community, especially among policy makers, funders, and publishers. Currently the commentary is cited on average 7 times per day and, according to Google Scholar, has by now accumulated over 11,000 citations.
Starting in 2020, the first discussions arose between communities interested in translating the FAIR Principles into practical implementations. These included GO FAIR [15] and communities that had ongoing discussions around digital objects, supported primarily in the Research Data Alliance [16,17]. It became clear that the progress being made in FAIR had something critical to offer to the ongoing discussions around digital objects [18–20]. In October 2022, the First International Conference on FAIR Digital Objects was held in Leiden with keynotes provided by both Robert Kahn and George Strawn [21,22].
In its current form, a FAIR Digital Object (FDO) is a self-describing, machine-actionable unit of (digital) information. It tells machine agents “What it is, what can be done with it, and what users are allowed to do with it”. In keeping with the FAIR Principles, all the various components of the FDO have GUPRIs, which makes them visible to machines, the first step towards their automated interpretation. The essence of the FDO is a minimal (and hopefully soon standardized) metadata record with two essential elements: a description of the type of the object and the location of the object. Each type of FDO has a metadata schema and vocabulary appropriate for it. FDOs can be thought of as minimal, standardized, machine-readable metadata tags that can be used to describe any resource (including non-digital entities such as, for example, butterfly specimens in a museum). With a minimal system of standardised self-description, it then becomes possible to build services that can operationalize the F, A, I and R functions, much as Kahn and Wilensky envisioned in 1995. The present grand challenge in the FDO community is to define the technical specifications around the minimal standards and build performant services that run FDOs [23].
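As an illustration only, the minimal record described above (an identified object with a type and a location) might be sketched in Python as follows. This is not the emerging FDO specification: the class name `FdoRecord`, its field names, the helper functions and all `example.org` identifiers are hypothetical, and the URL check is only a crude stand-in for a real GUPRI resolution service.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class FdoRecord:
    """Hypothetical minimal FDO-style metadata record (not a standard)."""
    identifier: str   # identifier of the record itself
    object_type: str  # identifier naming the type of the described object
    location: str     # where the object (or its digital surrogate) is found


def looks_resolvable(value: str) -> bool:
    """Crude stand-in for a GUPRI check: accept absolute http(s) URLs only."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_minimal_fdo(record: FdoRecord) -> bool:
    """True when every component of the record carries a URL-like identifier."""
    components = (record.identifier, record.object_type, record.location)
    return all(looks_resolvable(value) for value in components)


# A non-digital entity (a museum specimen) described by a digital record.
specimen = FdoRecord(
    identifier="https://example.org/fdo/123",
    object_type="https://example.org/types/ButterflySpecimen",
    location="https://example.org/museum/drawer-7/specimen-42",
)
```

A service could call `is_minimal_fdo(specimen)` before attempting any further machine interpretation; in practice, the check would resolve each identifier rather than merely inspect its syntax.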
Curiously, however, at the First International Conference on FAIR Digital Objects mentioned above, collaborative work at the GO FAIR Foundation was reported that suggested a nanopublication assertion, when constructed in a particular manner, would be a close approximation to, if not conformant with, the emerging FDO specification [24,25]. Specifically, nanopublications with assertions that explicitly give resource types and locations, along with the GUPRI services provided on the existing Nanopublication Server Network, already provide an example of the open architecture supporting “a large and extensible class of distributed digital information services” envisioned in the Digital Object Architecture. Although simple and intriguing in hindsight, the idea that a certain class of nanopublications might be instances of FDOs came as a sudden surprise during very practical work using nanopublications to represent community-specific FAIR Implementation Profiles [26]. If the close correspondence of the nanopublication and FDO specifications is shown to hold, then we come to the remarkable conclusion that the community has already been “doing” FDOs for most of the last decade and that nanopublications are currently among the most technologically mature and widely used examples of FDOs.
APIN and FAIR Connect
Given the convergent technology trends between FDOs and nanopublications, academic publishers broadly construed, be they society publishers, preprint platforms or commercial houses, were called to action by GO FAIR in 2020 to develop and promote best practices for “publishing for machines”. The initiative was called the Academic Publishers Implementation Network (APIN), and it proposed the collective, pro-active formulation of protocols and standards that would publish all scientific research material, from data and code to the narrative text, as FDOs [27]. Of course, in the absence of finalised and widely endorsed specifications for FDOs, APIN would proceed with its own version of a minimal, open standard. This bottom-up approach taken by APIN is inspired by developments in the early Internet, where researchers and engineers had been encouraged to make progress by the mantra “rough consensus, running code”.
By September 2022, IOS Press and the GO FAIR Foundation took the first concrete steps in response to the APIN call, and founded
Realizing that data stewards around the world often confront common problems that are typically solved in isolation and without knowledge of previously created solutions, FAIR Connect aims to publish their creations as nanopublication-based FDOs. More specifically, in FAIR Connect, authors create “articles” by filling out nanopublication templates that help structure and focus the content to guarantee machine-readability [32–35]. Once this information is published as nanopublication-based FDOs, it can be automatically transcribed into human-readable prose in any human language.
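The template-driven workflow described above can be illustrated with a small Python sketch: an author supplies values for a fixed set of template fields, the result is a list of subject-predicate-object statements, and those statements are then rendered as prose in more than one language. This is a sketch under stated assumptions, not the FAIR Connect implementation: the `TEMPLATE` fields, the `PHRASES` table, the function names and all `example.org` identifiers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One subject-predicate-object assertion."""
    subject: str
    predicate: str
    obj: str


# Hypothetical template: predicates are fixed, the author supplies the values.
TEMPLATE = {
    "has_type": "https://example.org/vocab/hasType",
    "has_location": "https://example.org/vocab/hasLocation",
}

# Hypothetical per-language phrasing for each predicate.
PHRASES = {
    "en": {
        TEMPLATE["has_type"]: "{s} is of type {o}.",
        TEMPLATE["has_location"]: "{s} can be found at {o}.",
    },
    "nl": {
        TEMPLATE["has_type"]: "{s} is van het type {o}.",
        TEMPLATE["has_location"]: "{s} is te vinden op {o}.",
    },
}


def fill_template(subject: str, values: dict[str, str]) -> list[Statement]:
    """Turn author input into statements; reject incomplete input."""
    missing = sorted(set(TEMPLATE) - set(values))
    if missing:
        raise ValueError(f"missing template fields: {missing}")
    return [Statement(subject, TEMPLATE[name], values[name]) for name in TEMPLATE]


def render(statements: list[Statement], lang: str) -> str:
    """Transcribe the structured statements into prose in the given language."""
    phrases = PHRASES[lang]
    return " ".join(
        phrases[st.predicate].format(s=st.subject, o=st.obj) for st in statements
    )
```

Because the template fixes the predicates, the same statements can be rendered through any phrase table, which is the sense in which structured content can be transcribed into different human languages.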
FAIR Connect articles provide descriptions of so-called FAIR Supporting Resources (FSRs) which are explicitly defined in a FAIR ontology [36]. FAIR Connect FSR types are limited in number and include FAIR Data Policies; FAIR Data Stewardship Plan Templates; FAIR Implementation Profiles; FAIR Enabling Resources (having 12 sub-types); Data Steward Professional Profiles; FAIR Data Stewardship Events; FAIR Practices; FAIR Supporting Services; and formalized, short-form articles describing published FSRs. As FDOs, FSRs enjoy fine-grained search, explicit access points, zero ambiguity and automated interoperation. Hence, in FAIR Connect, Sage/IOS Press has accepted the primary challenge of APIN “to ‘publish for machines’ in alignment with the FAIR principles”.
Although the FAIR Connect platform is intended to offer streamlined capability to publish and retrieve FSR nanopublications, the content is itself open and available to anyone via the Nanopublication Server Network. Indeed, as has always been the case, any organisation is welcome to launch and run services on the open and decentralized Nanopublication Server Network. While others are encouraged to also build dedicated search capabilities fit to (their) purpose, FAIR Connect nonetheless aspires to diamond open access (free to publish, free to read) via the custodianship of the FAIR Connect Foundation.
The benefits of the FDO approach are immediately apparent. First, data stewards practicing their craft have a near-real-time communication platform for exchanging resources that they create and manage. This will mitigate the rampant and needless “reinvention of the wheel” and could lead to significant cost savings and accelerated resource FAIRification worldwide. Second, the reuse of FSRs can be tracked in the emerging collection of domain-specific FAIR Implementation Profiles, themselves represented as nanopublications. This allows data stewards publishing at FAIR Connect to also receive recognition and credit for contributions that would otherwise be invisible. Third, FSR nanopublications created in FAIR Connect are subject to a rapid and lightweight process of editorially controlled peer review that is itself mediated by nanopublications [37]. Part of this review process includes the qualification and endorsement of FSRs according to explicit criteria that may be set forth by any third party. These qualifications are issued, again, as nanopublication-based FDOs and can be used in the search for FSRs having desired qualifications. These qualifications may come from recognised expert communities, funding organizations or publishers and will help drive FAIR convergence to high-quality and widely used FAIR implementations.
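To make the qualification-based search concrete, the following Python sketch treats each qualification as its own small record that points at an FSR, names the issuing third party and names the criterion that was met; searching then reduces to filtering those records. This is a sketch under stated assumptions rather than a description of the FAIR Connect services: the `Qualification` class, the `find_fsrs` function and all `example.org` identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Qualification:
    """Hypothetical record of one third-party endorsement of an FSR."""
    fsr: str        # identifier of the FSR being qualified
    issuer: str     # identifier of the party issuing the qualification
    criterion: str  # identifier of the explicit criterion that was met


def find_fsrs(
    qualifications: Iterable[Qualification],
    issuer: Optional[str] = None,
    criterion: Optional[str] = None,
) -> List[str]:
    """Return the sorted identifiers of FSRs having the desired qualifications.

    A filter left as None matches every qualification.
    """
    matches = {
        q.fsr
        for q in qualifications
        if (issuer is None or q.issuer == issuer)
        and (criterion is None or q.criterion == criterion)
    }
    return sorted(matches)


FUNDER_A = "https://example.org/org/funder-a"
COMMUNITY_B = "https://example.org/org/community-b"
OPEN_LICENCE = "https://example.org/criteria/open-licence"
PERSISTENT_IDS = "https://example.org/criteria/persistent-ids"

QUALIFICATIONS = [
    Qualification("https://example.org/fsr/1", FUNDER_A, OPEN_LICENCE),
    Qualification("https://example.org/fsr/2", FUNDER_A, PERSISTENT_IDS),
    Qualification("https://example.org/fsr/2", COMMUNITY_B, OPEN_LICENCE),
]
```

Keeping qualifications separate from the resources they describe means that any third party can add an endorsement without modifying the FSR itself; for example, `find_fsrs(QUALIFICATIONS, issuer=FUNDER_A)` selects only the resources endorsed by that one hypothetical funder.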
Taken together, FAIR Connect extends a long-term trend in information technology towards increasing interoperability of information, and leverages the recent developments in the FAIR Principles, FAIR Digital Objects and the Nanopublication Server Network to create a fully machine-actionable academic publication platform supporting the proceedings of professional data stewardship. As an APIN initiative, the FAIR Connect platform itself becomes an exemplar of lightweight and open tools that can help academic publishers to realistically transition to FDOs.
