Abstract
Curiosity is an intrinsically motivated search for information. It is enduring and open-ended, and may have evolved to help humans build accurate mental representations of our ever-changing environments. Due to the significant role that curiosity plays in our lives, several theoretical constructs have sought to explain how we engage in its practice. Yet, quantitative validation of these accounts has remained elusive due to the fundamental challenge of constructing formal models of mental representations of knowledge. We overcome this challenge by conceptualizing curiosity as the process of building a growing knowledge network. We find that different theoretical accounts may be explanatory in different contexts, thereby offering a pluralistic view of curiosity.
Significance statement
Introduction
Humans must manage uncertainty and embrace change to thrive in a complex and dynamic environment (Gottlieb et al., 2013). To this end, we continually consume information to construct and maintain accurate mental models of the world (Johnson-Laird, 2010; Valadao et al., 2015). Information-seeking behavior may be driven by a variety of intrinsic and extrinsic factors. Arising from the latter, information acquisition is an intermediate step towards attaining a specific goal—such as increased wealth or social recognition—that is ultimately rewarding (Dweck, 1986). By contrast, the intrinsic motivation to seek information is commonly conceptualized as curiosity (Gottlieb et al., 2013; Kidd and Hayden, 2015; Loewenstein, 1994).
Given the significant role that it plays in our daily behavior and decision-making, several theories have sought to explain how individuals practice curiosity. The
One such model that has shown promise is a network model where knowledge is composed of discrete and yet interconnected concepts (Chrastil and Warren, 2014, 2015; Ericson and Warren, 2020; Peer et al., 2021; Schapiro et al., 2016; Stiso et al., 2022; Warren et al., 2017). In graph learning studies, volunteers are shown sequences of images on a screen, where, unbeknownst to the volunteers, each image corresponds to a node in an underlying network (Lynn and Bassett, 2020). Based solely on observed transitions, and despite being unaware of the underlying network’s structure, participants successfully infer statistical regularities from the temporal order in which images appear (Garvert et al., 2017; Kahn et al., 2018; Schapiro et al., 2013; Schapiro et al., 2016; Tompson et al., 2019). Crucially, the structure of the pre-defined experimental graph can be recovered from neural activity by decoding simultaneously acquired functional magnetic resonance imaging (fMRI) data (Garvert et al., 2017; Tompson et al., 2020). The sequential manner in which stimuli are presented in graph learning tasks can be conceived of as a walk prescribed by the experimenter in a limited knowledge space of objects, images, concepts, or movements. Curiosity, too, can be conceived of as a walk, but one that is largely self-directed and purposeful across the vast landscape of knowledge. To evaluate curious walks, recent work gathered browsing histories from individuals who freely explored the online encyclopedia Wikipedia. Structural features of the knowledge networks that participants walked upon (Figure 1(a)) were found to be associated with curiosity, as measured by an independent index of participants’ sensitivity to information deprivation (Lydon-Staley et al., 2021).
Figure 1. Connectional approach to curiosity. (a) A participant constructs a growing knowledge network through curiosity-driven self-directed exploration of Wikipedia, a vast networked landscape of information. Nodes represent unique Wikipedia pages. Edges represent hyperlinks between nodes. Nodes are colored to denote the order in which they are visited. (b) Gaps in a knowledge network can be formalized using algebraic topology and tracked in several topological dimensions. The green and blue gaps represent a 0-dimensional and 1-dimensional cavity, respectively. (c) Compression progress aims to construct internal representations of the world that are both storage efficient and generalizable. In a knowledge network, all concepts that belong to the same cluster can be represented parsimoniously at a higher level of abstraction using their cluster identity. The unclustered network has 9 nodes and 12 edges, while the clustered network only has 3 nodes and 3 edges. (d) A mechanical network can possess several spatial configurations, any of which can be arrived at from any of the others through a series of conformational changes. We formalize and measure knowledge network flexibility as the number of available conformational degrees of freedom.
Here, we leverage this framework and cast curiosity as a network building process. This approach allows us to take qualitative explanations for information-seeking behavior, such as the information gap theory and compression progress theory, and operationalize them in quantitative statistics. Information gap theory posits that humans add information to regulate uncertainty by filling gaps (Gottlieb et al., 2013; Kidd and Hayden, 2015; Loewenstein, 1994). This theory can be operationalized by treating gaps in networks as topological cavities, and by tracking their evolution using techniques from applied algebraic topology (Bianconi, 2021; Ghrist, 2007; Hatcher, 2002) (Figure 1(b)). In contrast, compression progress theory posits that humans subtract or discard information (Lynn and Bassett, 2021; Schmidhuber, 2008; Zhou et al., 2020) due to limited cognitive capacity (Shiffrin and Schneider, 1977; Zhou et al., 2020, 2020). Compressing a network while maintaining meaningful latent structure requires that we discard some irrelevant information while maintaining important information about past experiences and present priorities (Lynn et al., 2020; Lynn and Bassett, 2021; Momennejad, 2020; Zhou et al., 2020a). This theory can be operationalized by measuring the compressibility of a network, an information-theoretic quantity that captures the ability of a network to be compressed (Lynn and Bassett, 2021) (Figure 1(c)). Via the network operationalization of these two theories, we come to see that curiosity is marked as a process by which networks of knowledge densify and simplify, raising the question of what alternative process might drive them to sprawl and become complex.
To address this question, we expand beyond historical accounts to operationalize our own conformational change theory of curiosity. The conformational change theory suggests that information-seeking behavior results in the creation of expansive knowledge networks (Zurn et al., 2021) embedded in a conceptual geometry (Figure 1(d)). The notion of a conceptual geometry is motivated by prior studies of neural population geometry and the fact that information can be embedded and processed in locally Euclidean geometric representations to solve complex tasks (Chung and Abbott, 2021). The geometry provides a key affordance for curiosity—conceptual flexibility—as the knowledge network can mechanically conform into different shapes. While some concepts are separated from other concepts by fixed distances of shared-versus-unshared meaning, other concept pairs can move closer together or farther apart as inter-concept relations shorten or lengthen depending on time and context (Kim et al., 2019). This flexibility allows us to draw from past experience, cohere the past with newly learned information, monitor conflict, and respond appropriately in different contexts (Botvinick and Braver, 2015; Karuza et al., 2016; Tenenbaum et al., 2011); it may also subserve the unexpected conceptual combinations that accompany imaginative thought and support serendipitous discoveries (Copeland, 2019; McAllister et al., 2012). Mechanically akin to conformational change in proteins, the flexible reshaping of the knowledge network can only occur if concepts are sparsely connected; densely connected linkage networks embedded in a Euclidean geometry are rigid. Hence, the conformational change theory of curiosity posits a drive for conceptual flexibility that leads networks to sprawl and become complex.
As is now evident, each of the three theories is motivated by a distinct and uniquely important psychological drive: to reduce uncertainty by learning a missing piece of information, to discover latent patterns by distilling fundamental epistemic elements, and to reshape information by flexibly reconfiguring knowledge networks. Here, we test each theory through parallel analyses of the growth of individual and collective knowledge networks derived from Wikipedia. At the individual scale, we construct knowledge networks for 149 individuals using their Wikipedia browsing histories (Lydon-Staley et al., 2021) (Figure 1(a)). At the collective scale, we extract Wikipedia networks to assess knowledge growth in 30 disciplines such as calculus, economics, and linguistics (Ju et al., 2020). We treat Wikipedia pages as nodes in both sets of networks and add edges between them according to the presence of hyperlinks between pages. For the data on individuals, we specify network growth using the order in which individuals visit pages; for the data on collectives, we use the years in which different concepts originate. To model the random growth of knowledge in both data sets, we create 25 degree-preserving edge-rewired versions of each network. We test the predictions of the three theories by comparing measurements of relevant features from empirically observed knowledge networks to those from the related null networks. First, considering the information gap theory, we expect to find fewer-than-chance topological cavities in growing empirical knowledge networks due to the hypothesized drive to close knowledge gaps when they are perceived. Second, considering compression progress theory, we hypothesize that growing knowledge networks will exhibit greater-than-chance compressibility due to the hypothesized drive to distill fundamental epistemic elements (Zhou et al., 2020a).
Third, considering conformational change theory (Zurn et al., 2021), we hypothesize that knowledge networks will possess greater-than-chance capacity for conformational changes due to the hypothesized drive for conceptual flexibility. In testing these hypotheses, we demonstrate the utility of the network approach in quantitatively validating existing theoretical constructs of curiosity as well as in formulating new ones.
Results
Network growth formalism
Before testing the predictions of the three theories, we clarify the network formalism upon which they are operationalized. Consider a graph
Information gap theory
The information gap theory posits that curiosity is the drive to collect units of knowledge that fill gaps in one’s internal representation of the world (Loewenstein, 1994). When we model internal representations as networks, missing information can be usefully operationalized as topological cavities, which can be tracked in a principled manner using tools from applied algebraic topology (see Sec.
Figure 2. Probing information gaps as topological cavities in growing knowledge networks. We operationalize information gaps as topological cavities (also referred to as
Considering information gap theory, we hypothesized that empirical knowledge networks would contain fewer cavities than topologically similar edge-rewired null model networks. To test this hypothesis, we compute persistent homology for filtrations of individual and collective knowledge networks in dimensions 0, 1, and 2. We find that the number of 0-cycles, or disconnected network components, increases as individual knowledge networks grow, and does so at a steeper rate in null networks than in empirical networks (Figure 2(b)). For collective knowledge networks, we find that the number of disconnected components first increases and then decreases both in the empirical and in the null model networks, albeit with significantly different peak values (Figure 2(c)). In both data sets, for a significant duration of growth, Betti curves for observed networks are lower than those for null model networks. In dimensions 1 and 2, we find that the number of cycles increases as individual and collective knowledge networks grow (Figure 2(d)–(i)). This temporal trajectory could arise from the fact that filling gaps by forging new connections can open new gaps, making it prohibitively difficult to track (and fill) gaps among an increasingly large number of items. In support of information gap theory, the rate at which 1-cycles increase is lower for the empirical networks than for the null networks (Figure 2(e) and (f)). In contrast to information gap theory, the rate at which 2-cycles increase is higher in the empirical networks than in the null networks (Figure 2(h) and (i)). The marked growth of 2-dimensional cavities could reflect an alternative drive to expand and complexify knowledge networks. All empirical Betti curves are significantly different from the Betti curves for the null model data (
Compression progress theory
Originally proposed as a general algorithmic framework for reinforcement learning, compression progress theory posits that curiosity is the drive to continually improve the compression of a learner’s mental model of the world (Schmidhuber, 2008). By conceptualizing mental models as knowledge networks, we can measure compressibility using recent advances at the intersection of information theory and network science (Lynn and Bassett, 2021). To compute the compressibility of a network, we begin by considering a random walk
Figure 3. Quantifying compression progress using network compressibility. (a) A random walk
Considering compression progress theory, we hypothesized that growing knowledge networks would be more compressible than topologically similar edge-rewired null model networks. We test our hypothesis by computing network compressibility for each subgraph in filtrations of individual and collective knowledge networks. We find that compressibility increases monotonically as knowledge networks grow. At all stages of growth, and in support of our hypothesis, networks for individuals exhibit greater-than-expected compressibility (Figure 3(c)). This same trend holds, but to a much weaker extent in the collective knowledge networks. While the early stages of growth evince greater separation between empirical and null compressibility values, the two curves overlap in later stages of growth (Figure 3(d)). Based on non-parametric permutation testing, compressibility curves for individual and collective knowledge networks are significantly different from their null model counterparts (
Conformational change theory
A learner practising curiosity solely according to information gap theory strives for growth and completeness of knowledge. By contrast, a learner practising curiosity solely according to compression progress theory strives to uncover the latent organization of the world. In the process, neither individual can keep pace with the growing complexity of the environment, with a rapidly expanding frontier of ignorance as new unknowns become accessible. Crucially, both theories suggest how we can usefully add or relinquish information but neither acknowledges the worth of what we already possess. Prior work has shown that curiosity-driven information acquisition is not only about growing or shedding knowledge, but also about retreading and reconsidering what one presently holds (Lydon-Staley et al., 2021; Zhou et al., 2020a). Following Zurn et al. (2021), we propose that such reflection entails moving concepts flexibly in relation to one another. Specifically, we define curiosity as the process of constructing knowledge networks with a finely arbitrated balance between local internal rigidity and global external flexibility. Rigidity and flexibility are mechanical notions that require an object of interest to be embedded in physical space. Therefore, drawing inspiration from a rich literature on cognitive maps (see Supplementary Materials for background), we assume that knowledge networks are embedded in Euclidean space where they possess several degrees of freedom. We then measure flexibility as a network’s ability to undergo conformational changes (Kim et al., 2019) and formalize our account as the
Before measuring the conformational flexibility of growing knowledge networks, we offer a brief introduction to mechanical networks. Consider a triangular network in two dimensions. Each of its nodes can be located with two coordinates (Figure 4(a)). This network has three available rigid-body motions: horizontal translation, vertical translation, and rotation. Next, consider a network comprised of 4 nodes and 4 edges (Figure 4(b)). This network possesses the same rigid-body motions as are available to the triangle. Additionally, the quadrilateral possesses a conformational degree of freedom. A conformational change in a network alters the Euclidean distance between unconnected pairs of nodes. For instance, if a pair of adjacent nodes in the quadrilateral is held fixed in space, the remaining nodes can be moved freely while sweeping across an angle
Figure 4. Conformational change in mechanical networks. (a) In two-dimensional space, a network with three nodes and three edges has three rigid-body degrees of freedom: horizontal translation, vertical translation, and rotation. (b) In addition to the three rigid-body motions, a quadrilateral frame also possesses a

Crucially, equation (2) relies on the linear independence of edges. Linear independence entails that there are no redundant edges to over-constrain a set of nodes beyond the formation of a rigid cluster yielding a state of self-stress, and that the network does not exist in a rare and pathological geometry known as a kinematic bifurcation (Kim et al., 2019; Mao and Lubensky, 2018). States of self-stress imply that edges within a network bear internally balanced forces. A negative value for the number of conformational degrees of freedom would indicate that the network, when considered in its entirety, is over-constrained. In our framework, we assume that such states, wherein competing constraints between concepts cannot be resolved, are aversive to humans. This assumption could be tested in future work by correlating individual-level measurements of self-stress with the Need for Closure Scale (NFCS) (Roets and Van Hiel, 2007; Webster and Kruglanski, 1994). We alleviate the tension by incrementing the dimensionality by 1 when needed. Specifically, we increment
Figure 5. Conformational change theory of curiosity. We propose that in the networked space of the mind, while some concepts and their relationships have fixed locations, others can move flexibly in a context-dependent manner. Such flexibility affords curious humans the ability to rethink and reconfigure what they already know in light of new information. We formalize flexibility as the number of conformational degrees of freedom (
We hypothesized that knowledge networks would possess greater conformational flexibility than corresponding null model networks. We test this hypothesis by computing the number of conformational degrees of freedom in filtrations of individual and collective knowledge networks (Figure 5(c) and (e)). In parallel, we track the minimum embedding dimensionality required to prevent self-stress from developing in the growing networks (Figure 5(b) and (d)). We find that individual knowledge networks need greater dimensionality and possess greater conformational flexibility than null model networks (Figure 5(b) and (c)). By contrast, measurements of dimensionality and flexibility for collective networks cannot be as easily distinguished from their corresponding null model data (Figure 5(d) and (e)). However, for both data sets, the empirical curves for dimensionality and conformational flexibility are significantly different from the curves for the null model data (
Discussion
In this work, we formalize curiosity as the process of constructing a growing knowledge network. We leverage tools from network science to quantitatively examine several theoretical constructs for curiosity such as the information gap theory and compression progress theory. Information gap theory suggests that curiosity is the drive to obtain units of knowledge that fill gaps in understanding (Loewenstein, 1994). Compression progress theory posits that curiosity is the drive to uncover the latent organization of the world (Schmidhuber, 2008). We probe information gaps as topological cavities in growing knowledge networks and quantify compression progress using network compressibility. The two theories offer complementary perspectives on curiosity; the information gap theory suggests that new information is acquired to fill knowledge gaps, whereas the compression progress theory suggests that new information is used to distill the essential epistemic elements of knowledge. While these perspectives describe how knowledge networks become denser and simpler through information acquisition, an alternative formulation is needed to explain how they become expansive and more complex. Therefore, we build upon a recently proposed conceptual framework (Zurn et al., 2021) to develop the conformational change theory of curiosity. We posit that knowledge networks are embedded in a Euclidean geometry, which allows concepts to move flexibly in relation to one another. We then view curiosity as the practice of constructing mechanically flexible knowledge networks. Formally, we measure conceptual flexibility as the number of conformational degrees of freedom available to a growing knowledge network. Throughout our investigations, we take a multi-scale view and probe evidence for each theory in individuals and in collectives. 
Across the two scales, we determine the precise contexts in which each theoretical account is explanatory, thereby clarifying their complementary and specific affordances.
Information gap theory and topological cavities in knowledge networks
Information gap theory suggests that humans tolerate a finite amount of uncertainty in their knowledge of the world (Loewenstein, 1994). Exposure to a small amount of previously unknown information brings into focus the presence of a knowledge gap, pushing the level of uncertainty past an acceptable threshold. This increased uncertainty prompts a search for information to fill the knowledge gap and resolve the unknown. In this work, we formalize gaps as topological cavities in growing knowledge networks and track their evolution in dimensions 0, 1, and 2 (Bianconi, 2021; Ju et al., 2020; Sizemore et al., 2018). Each dimension is characterized by a different kind of topological gap: 0-dimensional gaps correspond to disconnected network components, 1-dimensional gaps correspond to loop-like holes, and 2-dimensional gaps correspond to pocket-like voids. Across all dimensions, we find that the number of cavities increases as individual knowledge networks grow. Stated differently, associations between familiar concepts remain undiscovered even as we acquire more information. Hence, in addition to the common view of an expanding frontier of ignorance, knowledge growth is accompanied by an ever-expanding interior of ignorance (Ju et al., 2020). Except for the 0-th dimension, we report similar results for knowledge networks built collectively. Filling a 0-dimensional cavity entails adding an edge between two disconnected network components. Such edges may be easier for collectives to add than for individuals since interdisciplinary sub-fields within scientific domains are motivated to link disparate sub-areas of knowledge (Okamura, 2019). Importantly, and in support of the information gap theory, the number of 0- and 1-dimensional cavities is lower in observed individual and collective knowledge networks than in the corresponding null model data, reflecting a downward pressure on the number of gaps created, consistent with a gap-filling drive. 
Therefore, from a networks perspective, gaps—as envisioned by information gap theory, those that are prioritized for filling—may best correspond to topological cavities of dimensions 0 and 1. Stated differently, information gap theory provides an explanation for the markedly damped growth of lower dimensional cavities; however, a different account is needed to explain the contrasting proliferation of higher-dimensional cavities, both in individuals and in collectives.
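To make the 0-dimensional case concrete, the following is a minimal sketch of how the number of disconnected components could be tracked as a network grows. It is an illustration under stated assumptions, not the persistent homology pipeline used for the reported results: the function name `betti0_curve` and the input format (a list of pairs, each holding a newly added node and the edges it forms with nodes already present) are hypothetical, and cavities in dimensions 1 and 2 require the algebraic-topological tools cited above rather than this simple component count.

```python
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]
GrowthStep = Tuple[Hashable, List[Edge]]


def betti0_curve(growth: List[GrowthStep]) -> List[int]:
    """Number of connected components after each node is added.

    `growth` is a hypothetical input format: each step holds a new node
    and the edges linking it to nodes that are already in the network.
    """
    parent = {}

    def find(x: Hashable) -> Hashable:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = 0
    curve = []
    for node, new_edges in growth:
        parent[node] = node  # a new node starts as its own component
        components += 1
        for u, v in new_edges:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                # This edge fills a 0-dimensional gap by merging two components.
                parent[root_u] = root_v
                components -= 1
        curve.append(components)
    return curve


# Toy growth sequence (illustrative data only).
toy_growth = [
    ("A", []),
    ("B", []),
    ("C", [("A", "C")]),
    ("D", [("B", "D"), ("C", "D")]),
]
toy_curve = betti0_curve(toy_growth)
```

In this toy sequence the count rises while unrelated pages are added and falls when a new page links two previously separate components, which is the pattern of opening and filling 0-dimensional gaps discussed above.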
Compression progress theory and efficient network representations of knowledge
To gain a deeper intuition, we turn to compression progress theory, which derives inspiration from resource limitations that underpin brain function (Schmidhuber, 2008). We represent knowledge as a network of concepts and their inter-relationships, and we compute network compressibility (Lynn and Bassett, 2021) to determine whether curiosity drives compression. We find that growing individual knowledge networks consistently exhibit greater-than-expected compressibility, consistent with the theory. This finding can be contextualized by considering that as we interact with the world, we encounter and consume large quantities of information. Constructing perfectly accurate mental models would entail storing each unit of acquired knowledge separately. However, finite resources constrain us to build compressed or efficient abstractions of observed data that can generalize across contexts (Tenenbaum et al., 2011). According to compression progress theory, information that—when acquired—facilitates such abstraction is more valuable (Schmidhuber, 2008). Our results support this proposition and suggest that individuals preferentially seek such information. By contrast, the compressibility curve for collective knowledge networks tends to align with the curve for the corresponding null model data in later stages of growth. This finding can be contextualized by considering the fact that collectives can store vast quantities of detailed information in a distributed manner and, hence, do not face the same resource limitations that individuals do. In summary, while compression progress theory is supported by our data from individual knowledge networks, the building of collective knowledge networks appears to require a different account.
Conformational change theory and the mechanical flexibility of knowledge networks
The conformational change theory of curiosity is an alternative account that is built on two assumptions. First, we assume that humans encode conceptual knowledge in cognitive networks. Second, we assume that knowledge networks are embedded in Euclidean space, where they possess several degrees of freedom. Both assumptions are predicated on how humans encode spatial and abstract knowledge (Garvert et al., 2017; Peer et al., 2021; Stiso et al., 2022; Warren, 2019). Evidence from spatial navigation studies demonstrates that mental representations of space take the form of labeled cognitive graphs. Each node represents a physical location and is accompanied by local metric information such as angles and Euclidean distances to its immediate neighbors (Chrastil and Warren, 2014; Peer et al., 2021; Warren, 2019). Furthermore, hexadirectional modulation, the telltale signature associated with an underlying map-like neural code, is observed in neural signals when individuals navigate discrete and continuous abstract concept spaces (Constantinescu et al., 2016; Park et al., 2021) (see Supplementary Materials for details on mental representations of spatial and non-spatial knowledge). Building on Euclidean cognitive graphs, we operationalize conceptual flexibility in knowledge networks as the number of conformational degrees of freedom. We find that growing individual knowledge networks have greater-than-expected embedding dimensionality and conformational flexibility. According to conformational change theory, embedding dimensionality increments when growing knowledge networks become over-constrained and develop self-stress. We find that such stress arises more frequently in individual knowledge networks than in null model data. This observation is consistent with the conformational change theory of curiosity, and suggests that individuals’ idiosyncratic acquisition of information leads to a frequent reshaping of concept relations based on context. 
By contrast, in knowledge networks built collectively we find that the evolution of mechanical features-of-interest cannot be distinguished from their evolution in null model data. Collective networks grow through a dynamic interplay of consensus and dissensus between large groups of individuals. Therefore, it is possible that due to the long time scales that we focus on in this study, dynamic events associated with collective knowledge growth, such as paradigm shifts, are simply concealed from view in local sectors of each field.
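The counting argument behind conformational flexibility can be sketched in a few lines. The equation used in the analysis is not reproduced in this text, so the rule below is an assumption: the standard Maxwell count, in which a network of N nodes and E linearly independent edges embedded in d dimensions has dN − E − d(d+1)/2 conformational degrees of freedom, the last term being the number of rigid-body motions. The function names and the `start_dim` parameter are hypothetical. The sketch does reproduce the two worked cases described in the text (a triangle and a quadrilateral in two dimensions) and the rule that dimensionality is incremented by 1 whenever the count becomes negative.

```python
def conformational_dof(n_nodes: int, n_edges: int, dim: int) -> int:
    """Maxwell count (assumed form): node coordinates minus edge
    constraints minus rigid-body motions (translations and rotations)."""
    rigid_body = dim * (dim + 1) // 2
    return dim * n_nodes - n_edges - rigid_body


def min_dimension_without_self_stress(
    n_nodes: int, n_edges: int, start_dim: int = 1
) -> int:
    """Smallest dimension, found by incrementing by 1 from `start_dim`
    (a hypothetical starting value), at which the count is non-negative."""
    dim = start_dim
    while conformational_dof(n_nodes, n_edges, dim) < 0:
        if dim >= max(n_nodes - 1, 1):
            # The count is only meaningful while dim stays below n_nodes.
            raise ValueError("edge count is too large for a simple graph")
        dim += 1
    return dim


# Worked cases from the text, in two dimensions.
triangle_dof = conformational_dof(n_nodes=3, n_edges=3, dim=2)
quadrilateral_dof = conformational_dof(n_nodes=4, n_edges=4, dim=2)

# A fully connected network on 4 nodes is over-constrained in two dimensions.
complete4_dof_2d = conformational_dof(n_nodes=4, n_edges=6, dim=2)
complete4_dim = min_dimension_without_self_stress(n_nodes=4, n_edges=6)
```

Under this count the triangle has no conformational freedom, the quadrilateral has one degree of freedom, and the fully connected four-node network must be lifted to three dimensions before the count stops being negative. The count is valid only when edges are linearly independent, as noted in the Results.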
Implications for reinforcement learning
The computational metrics that we examine here are relevant not only for the study of human curiosity, but also potentially for that of artificial intelligence. Compressibility, for instance, was originally proposed as an intrinsic learning signal to guide reinforcement learning (Schmidhuber, 2008). In both single and multi-agent settings, the design of intrinsic (or curiosity-based) reward signals for reinforcement learning is an increasingly important area for further research (Aubret et al., 2019), and may benefit from computational insights into human behavior, such as those derived from our analyses here. Our work provides several candidate metrics—such as the number of topological cavities, network compressibility, and conformational flexibility—that can act as suitable curiosity-based signals for tasks where the environment can be modeled as a network. Information acquisition in reinforcement learning is a means to an end, where the end is a reward associated with the successful completion of a specific task (Sutton and Barto, 2018). An agent seeking to collect high total reward during interactions with its environment must strike a balance between exploitation and exploration. The agent must exploit, or productively use, those actions that are currently known to yield high reward but must also occasionally explore untested actions that may eventually turn out to be better. In many real-world settings, external rewards are highly infrequent or even completely absent and, thus, cannot reliably guide behavior. In such sparse reward environments, curiosity-like intrinsic motivations can lead to improved exploration and, by extension, improved task performance (Pathak et al., 2017; Savinov et al., 2018). At the collective level, models of intelligence tend to characterize the interactions between multiple agents. 
It remains to be seen, however, whether features such as coordination or cooperation emerge from prescriptive rules that describe individual agents’ motivations. Our work represents an initial step in this direction.
Conclusion
We conceptualize curiosity as the process of knowledge network building in order to examine three theoretical accounts: information gap theory, compression progress theory, and conformational change theory. Formalizing curiosity in terms of networks helps us to
Methods
Data
Knowledge networks built by individuals
Knowledge networks for individuals are constructed with data obtained from the “Knowledge Networks Over Time” (KNOT) study (Lydon-Staley et al., 2020, 2020; Lydon-Staley et al., 2021). These data are comprised of Wikipedia browsing histories of 149 individuals (121 women, 26 men, 2 other) collected between October 2017 and July 2018. At the beginning of the study, all interested participants attended a laboratory session, where they received training in a daily assessment protocol and were guided through the installation of a tracking software called
We treat all pages visited by an individual as nodes in a knowledge network. Edges between nodes are specified based on the presence of hyperlinks. Prior work has found that pairs of pages connected by hyperlinks are significantly more similar to each other than pairs that are not (Lydon-Staley et al., 2021). Thus, we add an undirected and unweighted edge between Page 1 and Page 2 if either Page 1 links to Page 2 or Page 2 links to Page 1. Hyperlinks are not required to exist bidirectionally for an edge to exist between two nodes. We determine the presence of hyperlinks based on how Wikipedia appeared on August 1, 2019. Each node (or page) in the browsing data is accompanied by an index that denotes the temporal order in which it was visited. Each new session begins at the last visited page of the previous session. We stitch data acquired across 21 days to build a comprehensive browsing history. For every individual, the nodes and edges as well as the order of node visitation are used to specify a growing knowledge network.
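The edge rule described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the pipeline used in the study: the function name build_knowledge_network, the input formats, and the choice to index a revisited page by its first visit are all hypothetical.

```python
def build_knowledge_network(visits, hyperlinks):
    """Build an undirected, unweighted knowledge network from a browsing history.

    visits:     page titles in the order they were visited (repeats allowed)
    hyperlinks: set of directed (source, target) pairs between pages
    Returns (first_visit, edges): a dict mapping each page to its order of
    first visit (an assumption for revisited pages), and a set of
    frozenset({page_a, page_b}) edges.
    """
    first_visit = {}
    for page in visits:
        first_visit.setdefault(page, len(first_visit))

    pages = list(first_visit)
    edges = set()
    for i, a in enumerate(pages):
        for b in pages[i + 1:]:
            # A link in either direction is enough to connect the two pages.
            if (a, b) in hyperlinks or (b, a) in hyperlinks:
                edges.add(frozenset((a, b)))
    return first_visit, edges


# Hypothetical browsing history and hyperlink structure.
visits = ["Page 1", "Page 2", "Page 1", "Page 3"]
hyperlinks = {("Page 1", "Page 2"), ("Page 3", "Page 2")}
first_visit, edges = build_knowledge_network(visits, hyperlinks)
```

In this toy example, Page 3 links to Page 2 but not the reverse, and the two pages are still joined by an edge.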
Knowledge networks built collectively
In its role as an encyclopedia, Wikipedia represents a large repository of knowledge acquired over thousands of years through collective human effort. Building on prior work, we construct domain-specific collective knowledge networks by taking subgraphs of the larger Wikipedia network (Ju et al., 2020). Information in Wikipedia is organized in a hierarchical manner, which makes it possible to identify articles that pertain to a particular domain of interest. We capitalize on this structure to construct knowledge networks for the following thirty topics: abstract algebra, accounting, biophysics, Boolean algebra, calculus, cognitive science, commutative algebra, dynamical systems and differential equations, dynamical systems, earth science, economics, education, energy, evolutionary biology, geology, geometry, group theory, immunology, linear algebra, linguistics, meteorology, molecular biology, number theory, optics, philosophy of language, philosophy of law, philosophy of mind, philosophy of science, sociology, and software engineering. All pages listed under a topic are treated as nodes in the topic’s network. For instance, the network for molecular biology contains pages for “allele,” “lymphocyte,” and “antibody” as nodes. Similar to knowledge networks for individuals, edges between nodes are specified based on the presence of hyperlinks. Typically, articles also contain information about the year in which the concept they describe first became known; the year attribute is used as an index to specify node order in a growing graph. For instance, benzene was first isolated by Michael Faraday in 1825. Therefore, in the collective knowledge network for chemistry, the node for benzene is added before other nodes timestamped after the year 1825. More details on the network construction process (such as the procedure followed when a page has no year attribute) are available from Ref. Ju et al., 2020.
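The use of the year attribute to order nodes in a growing graph can be sketched as follows. This is a minimal illustration, not the procedure of Ju et al. (2020): the function name growing_graph is hypothetical, ties in year are broken alphabetically as an assumption, and apart from benzene (1825) the nodes and years are placeholders.

```python
def growing_graph(node_years, edges):
    """Yield (node, edges_so_far) as nodes are added in order of year.

    node_years: dict mapping node name to the year its concept became known
    edges:      iterable of undirected (node_a, node_b) pairs
    An edge appears as soon as both of its endpoints are present.
    """
    # Ties in year are broken alphabetically (an assumption).
    ordered = sorted(node_years, key=lambda n: (node_years[n], n))
    present = set()
    current_edges = set()
    for node in ordered:
        present.add(node)
        for a, b in edges:
            if node in (a, b) and a in present and b in present and a != b:
                current_edges.add(frozenset((a, b)))
        yield node, set(current_edges)


# Placeholder data: only the benzene year comes from the text above.
node_years = {"benzene": 1825, "concept_x": 1800, "concept_y": 1865}
edges = [("benzene", "concept_y"), ("concept_x", "benzene")]
steps = list(growing_graph(node_years, edges))
```

Each yielded step is one snapshot of the growing network, so the node for benzene enters before any node with a later year.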
Detecting topological cavities
In order to identify cavities of various dimensions in a network, we construct a higher-order relational object known as a
In a clique complex, a
The symmetric difference is an associative operation that returns the union of two sets without their intersection. A set of
The graph filtration from equation (1) induces a related filtration of clique complexes
Notation for information gap theory.
For a more comprehensive treatment of topological data analysis, we direct the interested reader to Refs. Bianconi, 2021; Carlsson, 2009; Ghrist, 2007; Hatcher, 2002; Sizemore et al., 2019; Zomorodian and Carlsson, 2005.
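As a small illustration of the symmetric difference described above, the following sketch combines the edge sets of two triangles that share an edge. Storing each edge as an unordered pair and the helper name edge are assumptions made for this example only.

```python
def edge(u, v):
    """Represent an undirected edge as an unordered pair (hypothetical helper)."""
    return frozenset((u, v))


# Two triangles that share the edge a-c.
triangle_abc = {edge("a", "b"), edge("b", "c"), edge("a", "c")}
triangle_acd = {edge("a", "c"), edge("c", "d"), edge("a", "d")}

# Python's ^ operator on sets is the symmetric difference: the union of the
# two sets without their intersection. The shared edge a-c drops out,
# leaving the four-edge cycle a-b-c-d.
combined = triangle_abc ^ triangle_acd

# Associativity: the grouping of operands does not change the result.
extra = {edge("a", "b"), edge("d", "e")}
left = (triangle_abc ^ triangle_acd) ^ extra
right = triangle_abc ^ (triangle_acd ^ extra)
```

The shared edge cancels because it appears in both sets, which is the behavior that makes the symmetric difference a natural way to add collections of edges modulo 2.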
Computing network compressibility
In order to estimate the compressibility of a network, we consider a binary graph
Notation for compression progress theory.
Computing mechanical network features
Consider a set of nodes
To linear order, this constraint can be modified by taking the total derivative of both sides and dividing by 2 to yield
This line of reasoning was first put forth by Maxwell (1864).
Notation for conformational change theory.
Statistical testing
We use non-parametric permutation testing to determine whether feature curves, such as those for compressibility and conformational flexibility, for empirical knowledge networks differ significantly from those for corresponding null model networks (Ramsay and Silverman, 2005). For a given feature, we first compute the area
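A non-parametric permutation test of this kind can be sketched as follows. Because the description above is cut off, the details here are assumptions: the area is taken to be the area under each network's feature curve (trapezoidal rule), the test statistic is the difference in mean area between empirical and null networks, and the function names, number of permutations, and two-sided comparison are hypothetical choices.

```python
import random


def area_under_curve(y, dx=1.0):
    """Trapezoidal area under a curve sampled at evenly spaced points."""
    return sum((y[i] + y[i + 1]) * dx / 2 for i in range(len(y) - 1))


def mean(values):
    return sum(values) / len(values)


def permutation_test(empirical_curves, null_curves, n_perm=2000, seed=0):
    """Return (observed difference in mean area, two-sided p-value)."""
    rng = random.Random(seed)
    emp = [area_under_curve(c) for c in empirical_curves]
    null = [area_under_curve(c) for c in null_curves]
    observed = mean(emp) - mean(null)

    pooled = emp + null
    n_emp = len(emp)
    extreme = 0
    for _ in range(n_perm):
        # Shuffle group labels by shuffling the pooled areas.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_emp]) - mean(pooled[n_emp:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Placeholder feature curves, not data from the study.
empirical = [[10, 10], [11, 11], [12, 12], [13, 13], [14, 14]]
null_model = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]
observed, p_value = permutation_test(empirical, null_model)
```

Shuffling the pooled areas simulates the null hypothesis that the empirical and null-model labels are exchangeable, so a small p-value indicates that the observed difference is unlikely under that hypothesis.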
Citation diversity statement
Recent work in a number of scientific fields has identified a bias in citation practices such that papers by women and other minority scholars are under-cited relative to the number of such papers in the field (Bertolero et al., 2020; Caplar et al., 2017; Chatterjee and Werner, 2021; Dion et al., 2018; Dworkin et al., 2020; Fulvio et al., 2021; Maliniak et al., 2013; Mitchell et al., 2013; Wang et al., 2021). Here, we sought to proactively choose references that reflect the diversity of the field in thought, form of contribution, gender, race, ethnicity, and other factors. First, we predicted the gender of the first and last authors of each reference using databases that store the probability of a first name being carried by a woman (Dworkin et al., 2020; Zhou et al., 2020a). By this measure (and excluding self-citations to the first and last authors of our current paper), our references contain 16.27% woman(first)/woman(last), 11.80% man/woman, 19.30% woman/man, 52.63% man/man citation categorizations. This method is limited in that (a) names, pronouns, and social media profiles used to construct the databases may not, in every case, be indicative of gender identity and (b) it cannot account for intersex, non-binary, or transgender people. Second, we obtained predicted racial/ethnic category of the first and last author of each reference using databases that store the probability of a first and last name being carried by an author of color (Ambekar et al., 2009; Sood and Laohaprapanon, 2008). By this measure (and excluding self-citations), our references contain 4.67% author of color/author of color, 9.86% white author/author of color, 20.34% author of color/white author, and 65.12% white author/white author citation categorizations. 
This method is limited in that (a) names, Census entries, and Wikipedia profiles used to make the predictions may not be indicative of racial/ethnic identity, and (b) it cannot account for Indigenous and mixed-race authors, or those who may face differential biases due to the ambiguous racialization or ethnicization of their names. We look forward to future work that could help us to better understand how to support equitable practices in science.
Supplemental Material
sj-pdf-1-col-10.1177_15459683211207633 – Supplemental Material for Curiosity as filling, compressing, and reconfiguring knowledge networks by Shubhankar P Patankar, Dale Zhou, Christopher W Lynn, Jason Z Kim, Mathieu Ouellet, Harang Ju, Perry Zurn, David M Lydon-Staley and Dani S Bassett in Collective Intelligence
Supplemental Material, sj-pdf-1-chc-10.1177_26339137231207633 for Supplemental Material for Curiosity as filling, compressing, and reconfiguring knowledge networks; by Shubhankar P Patankar, Dale Zhou, Christopher W Lynn, Jason Z Kim, Mathieu Ouellet, Harang Ju, Perry Zurn, David M Lydon-Staley and Dani S Bassett