Abstract
Describing and understanding hierarchical structures of psychological constructs, such as personality (Markon et al., 2005), intelligence (McGrew, 2009), and psychopathology (Lahey et al., 2021), is fundamental to advancing their theory and measurement (Kotov et al., 2017). Refining and confirming these structures ensures that subsequent models adequately capture the underlying conceptual complexity. Empirical investigations to establish these hierarchies, however, face many obstacles, such as the interrelations between constructs (Marsh et al., 2010), potential overlaps in content (e.g., jingle-jangle; Condon et al., 2020; Wulff & Mata, 2023), and other nuanced measurement issues (Achenbach, 2021; Bringmann et al., 2022). As a result, developing methodological approaches to effectively recover the hierarchical organizations of psychological constructs remains a challenge with significant implications for theory and practice.
The majority of efforts to develop and evaluate hierarchical structures have used top-down methods that impose
A growing number of researchers have called for personality to be explored from the “bottom-up”—to investigate relationships at the
In light of the need for a comprehensive, bottom-up approach to assess the hierarchical structure of personality (Condon et al., 2020), we developed a network psychometrics framework called
Methodological challenges for hierarchical structures
There are a number of important methodological challenges that hinder the development and validation of hierarchical structures (Clark & Watson, 2019), including, but not limited to, violations of local independence, wording effects, dimensionality assessment for each level of the hierarchy, and structural robustness. Common scale development practices start with a set of target constructs (e.g., personality trait domains) and develop each construct independently (e.g., Extraversion; Lambert & Newman, 2023). For example, the IPIP-NEO defines Extraversion as someone who is outgoing and enjoys interacting with the external world (Johnson, 2014). From this definition, Friendliness, Gregariousness, Assertiveness, Activity-level, Excitement-seeking, and Cheerfulness serve as narrow characteristics that target specific
Similar to how these constructs are developed, conventional psychometric approaches validate each facet or trait one-by-one, taking their constituent items at face value (Achenbach, 2021). Classical test theory, for example, identifies items most strongly correlated with the overall sum score in their respective facet or trait (i.e., item-total correlations) and removes items with low correlations. In modern test theory, factor analysis and item response theory are typically conducted dimension-by-dimension in large item pools to evaluate how item parameters correspond to their underlying latent construct (DeVellis, 2007; Forbes, Baillie et al., 2024; Irwing et al., 2024; Reise & Waller, 2009). In both approaches, facets and traits are often “siloed” relative to each other, which prevents simultaneous evaluations of relations across the item pool.
Although this process aims to ensure homogeneous constructs (Burisch, 1984), the lack of cross-level evaluation can lead to numerous issues such as local independence violations, substantial cross-loadings, method effects, and the merging or collapsing of facets or traits, when all items are analyzed together (Marsh et al., 2010, 2013). An example from the IPIP-NEO includes the items “Am afraid to draw attention to myself” in the Self-consciousness facet of Neuroticism, “Don’t like to draw attention to myself” (reverse) in the Assertiveness facet of Extraversion, and “Dislike being the center of attention” in the Modesty facet of Agreeableness. Despite belonging to three different facets and traits, these items strongly correlate due to their shared content of seeking attention. Each item fits within its prescribed facet and trait, so when researchers evaluate them independently using conventional procedures, these cross-level relations remain hidden (only to reappear later as additional “semantic” dimensions or correlated residuals in item-level factor models; Marsh et al., 2010).
To date, there are two main approaches to estimate hierarchical structures from large item pools: hierarchical clustering and bass-ackwards (Goldberg, 2006; Forbes, Baillie et al., 2024; Forbes, Watts et al., 2024; but see Condon, 2022). Agglomerative hierarchical clustering is a bottom-up approach that develops a tree-like hierarchy by merging variables one-by-one (Ward, 1963). In contrast, the bass-ackwards approach establishes a hierarchy from the top-down by extracting the first principal component or factor that explains the maximum possible variance across all variables and continuing to extract dimensions until there is no meaningful variance to extract (Goldberg, 2006). Recent improvements on the bass-ackwards approach include removing redundant components, identifying statistical artifacts, and examining relationships among components across all levels (Forbes, 2024). Despite the recent improvements and empirical applications of these approaches in personality (Goldberg, 2006) and psychopathology (Forbes, Baillie et al., 2024; Forbes, Watts et al., 2024), several limitations and challenges remain unaddressed.
One important limitation is that both clustering and factor analytic methods often struggle to recover the correct simulated structure when item pools are large or dimension sizes vary (i.e., number of variables per dimension; Hands & Everitt, 1987; MacCallum et al., 1999, 2001; Milligan, 1981). Both conditions are likely to occur in hierarchical structures. Large item pools arise from the common psychometric practice of creating many variables per dimension to increase internal consistency (DeVellis, 2007). Variable dimension sizes may occur during the item refinement phase of scale development or if cross-dimension item interrelations have not been previously explored (e.g., attention-seeking).
A related limitation is the lack of supporting simulation evidence. Simulation studies allow researchers to evaluate how well different methods recover the underlying hierarchical structure by systematically varying parameters with known values (e.g., factor loadings, items per specific factor, number of specific and general factors; Jiménez et al., 2023). These studies are essential to determine whether a method is likely to work across diverse datasets or only under specific conditions (Siepe et al., 2024). Although hierarchical clustering and bass-ackwards have been applied empirically (Forbes, Baillie et al., 2024; Forbes, Watts et al., 2024), neither approach, to our knowledge, has been rigorously evaluated in its ability to recover simulated hierarchical structures.
A core methodological challenge in the identification of hierarchical structures and large item pools is the assumption of local independence. This assumption is central to latent variable models, stating that items are uncorrelated after accounting for latent variables (Chen & Thissen, 1997; Holland & Rosenbaum, 1986). Violations of local independence can happen for many reasons such as acquiescence and shared semantic content (Leising et al., 2024). These violations can cause problems ranging from minor to severe, including model misspecification (Montoya & Edwards, 2021), biased model parameters (Edwards et al., 2018), and inaccurate estimates of internal structure (Wood et al., 1996).
A fundamental limitation in latent variable approaches is that specifying the correct number of latent variables is required to accurately detect violations of local independence, yet local independence violations adversely impact dimensionality assessment (Flores-Kanter et al., 2021; Montoya & Edwards, 2021). The circular nature of this problem can lead to an impasse in exploratory situations. Indeed, although methods have been developed to detect these violations in the factor analysis (Ferrando et al., 2022), item response theory (Chen & Thissen, 1997; Edwards et al., 2018), and exploratory structural equation modeling (ESEM; Saris et al., 2009) frameworks, they all rely on the correct specification of the number of factors to work as intended (Christensen et al., 2023). If such violations are present and left unchecked, the validity of the estimated structure is questionable.
Another methodological challenge stems from the item content itself. In personality research (and much of psychology), it is common for items to be administered using a Likert scale (e.g., strongly disagree to strongly agree) with some items reverse-keyed to control for response biases (e.g., “Don’t like to draw attention to myself” in the Assertiveness facet of Extraversion). However, this practice can introduce wording effects, a form of systematic method variance that arises when people provide inconsistent answers to positively and negatively worded items measuring the same construct (Gu et al., 2015; Kam, 2018). These effects emerge from various response biases, such as carelessness, acquiescence, and item difficulty, particularly for negatively worded statements (Swain et al., 2008; Weijters et al., 2013). When people agree with both positively and negatively phrased items that are logically opposite, it indicates potential biases rather than true trait variance. In the context of latent variable models, wording effects can distort factor solutions by introducing artificial method factors (DiStefano & Motl, 2006), altering factor loadings (Arias et al., 2020), and affecting the estimation of correlations between substantive traits. Failing to account for wording effects may lead to inflated or deflated trait correlations depending on how method variance interacts with the factor structure (Nieto et al., 2021). If unmodeled, wording effects can obscure the true hierarchical organization of traits, emphasizing the need for appropriate modeling strategies to mitigate their impact (Schmalbach et al., 2020).
When violations of local independence and wording effects both occur, accurately recovering hierarchical structures becomes a formidable task. These problems, coupled with the lack of simulation evidence to recover dimensions in large item pools, may render most conclusions about empirically derived hierarchies speculative at best. Achenbach (2021, p. 65) summarizes these issues succinctly: “Despite the growing popularity of hierarchical dimensional models, their value may be undercut if researchers fail to deal scientifically with the many details to be mastered in properly constructing, testing, and applying such models.”
Taxonomic graph analysis framework
Our primary aim is to develop a network psychometrics framework that combines several recently developed methods, validated via extensive simulations, to identify hierarchical structures from the bottom-up. The TGA framework consists of seven primary steps: (1) assess and mitigate local independence violations, (2) assess and mitigate wording effects (if necessary), (3) estimate a network, (4) reach a consensus on the number of dimensions and item assignments, (5) determine the robustness of these dimensions and assignments, (6) compute network scores for the next level, and (7) estimate
[Figure: Taxonomic graph analysis framework]
Step 1: Remove redundancies
The framework begins by addressing local independence violations by identifying and removing redundant items. Although network proponents have criticized latent variable models for their local independence assumption (Cramer et al., 2012), substantial shared variance over-and-above other relations in a network can lead to similar effects such as distorted dimension recovery (e.g., minor dimensions) and biased parameter estimates (e.g., centrality confounded by redundancy rather than true relative position; Fried & Cramer, 2017). Unique Variable Analysis (UVA; Christensen et al., 2023) was developed to assess and mitigate this issue using a network estimation method followed by the graph theoretic measure of weighted topological overlap (wTO; Nowick et al., 2009) to identify redundant nodes in the network. Recent simulation work has demonstrated that UVA is effective across data types and conditions, performing as well as or better than alternative methods like ESEM modification indices (Saris et al., 2009) and correlated residuals (Christensen et al., 2023; Ferrando et al., 2022). In contrast to conventional procedures, UVA can be applied
Step 2: Mitigate wording effects
The second step is to evaluate wording effects by fitting a random intercept factor model where regular- and reverse-keyed items (without recoding) load onto a single latent factor with loadings fixed to one, enabling the estimation of an additional variance component that captures response biases and individual differences in scale usage (Maydeu-Olivares & Coffman, 2006). This model relaxes the assumption that all people use the response scale identically and accounts for variance that would otherwise distort dimensionality estimates (García-Pardina et al., 2024). The presence of wording effects is determined by model convergence. If the model converges, it suggests that wording effects are likely present with their magnitude reflected in the size of the loadings on the random intercept factor; if it fails to converge, wording effects are negligible or absent. When wording effects are detected, the residual correlation matrix, which has the variance attributed to the random intercept factor removed from the sample correlation matrix, is used for dimensionality estimation; otherwise, the sample correlation matrix is retained. Recent simulation research has demonstrated that the random intercept exploratory graph analysis (riEGA) provides more accurate dimensionality estimates than the random intercept parallel analysis in the presence of wording effects (García-Pardina et al., 2024). By explicitly modeling and removing the variance associated with wording effects before dimensionality assessment, riEGA increases the accuracy of dimension recovery and mitigates distortions introduced by response biases.
Step 3: Estimate network structure
After mitigating violations of local independence and wording effects, the network structure can be estimated. A common network estimation method is the graphical least absolute shrinkage and selection operator with extended Bayesian information criterion for model selection (commonly referred to as EBICglasso; Chen & Chen, 2008; Epskamp & Fried, 2018; Foygel & Drton, 2010; Friedman et al., 2008). The result of the EBICglasso algorithm is a sparse network where nodes (circles) represent variables and edges (lines) represent regularized partial correlations (Epskamp & Fried, 2018). Although there are many other network estimation methods that could be used (e.g., non-regularized methods; Williams et al., 2019), the EBICglasso has consistently demonstrated comparable or better performance over other methods for dimension recovery (Christensen et al., 2024; Golino & Epskamp, 2017; Golino et al., 2020).
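For readers less familiar with the estimator, the block below sketches the standard graphical lasso and EBIC formulation described in the cited sources. It is not necessarily the exact parameterization used in this study. The symbols are notation introduced here for illustration: $S$ is the sample correlation matrix, $\Theta$ the precision (inverse covariance) matrix, $\lambda$ the regularization penalty, $\gamma$ the EBIC hyperparameter, $E$ the number of non-zero edges, $N$ the sample size, and $P$ the number of variables.

```latex
% Graphical lasso: penalized Gaussian log-likelihood over precision matrices
\hat{\Theta}_{\lambda} = \arg\max_{\Theta \succ 0}
  \left\{ \log\det\Theta - \operatorname{tr}(S\Theta)
          - \lambda \sum_{i \neq j} \lvert \theta_{ij} \rvert \right\}

% Extended BIC used to choose among the candidate values of lambda
\mathrm{EBIC}_{\gamma} = -2\,\ell(\hat{\Theta}_{\lambda}) + E \log N + 4\gamma E \log P,
\qquad
\ell(\Theta) = \tfrac{N}{2}\left[ \log\det\Theta - \operatorname{tr}(S\Theta) \right]

% Edge weights: partial correlations obtained from the selected precision matrix
\rho_{ij \cdot \text{rest}} = -\frac{\hat{\theta}_{ij}}{\sqrt{\hat{\theta}_{ii}\,\hat{\theta}_{jj}}}
```

Under this formulation, the $\lambda$ with the lowest EBIC is retained, and any $\hat{\theta}_{ij}$ shrunk to exactly zero yields an absent edge, which is what makes the resulting network sparse.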
Step 4: Reach dimension consensus
On the network structure, a community detection algorithm can be applied to identify
Step 5: Establish dimension robustness
The next step evaluates the consensus solution’s robustness using Bootstrap EGA (bootEGA; Christensen & Golino, 2021), which assesses the generalizability of dimensions through a resampling with replacement bootstrap procedure. This process applies EGA or riEGA (if wording effects are present) with the most common consensus method (using the same
Step 6: Compute network scores
Once a robust consensus solution for communities and item assignments is established, network loadings and subsequent scores can be computed. Simulation studies have demonstrated that network loadings can accurately capture the same patterns as factor loading patterns when data are generated from a latent factor model (Christensen et al., 2025). Network scores, calculated by multiplying data by network loadings (Golino et al., 2022), can estimate factor correlations with comparable accuracy to exploratory factor analysis when data are generated from factor models (Christensen et al., 2025). A unique feature of network loadings (and subsequently scores) is that they do not require or depend on (factor) rotations to make their estimates accurate and interpretable.
Step 7: N-level communities
The network scores from each level can be used to compute the next level’s dimensions by repeating Steps 3–6 until reaching unidimensionality. One limitation of contemporary network psychometric methods is that if all nodes are connected to
Summary
Taken together, TGA provides a systematic, bottom-up framework based on network psychometrics to identify hierarchical structures. Each step of TGA addresses methodological challenges that have historically been obstacles to the accurate identification of hierarchical structures in psychology (i.e., siloed item analyses, violations of local independence, wording effects, dimensionality assessment for each level of the hierarchy, and verifying the robustness of the hierarchical structure). To date, TGA offers one of the most comprehensive approaches to investigate hierarchical constructs from the bottom-up.
Personality taxonomies
Personality reflects variations in thoughts, feelings, and behaviors that occur within and across people (Funder, 2009), reliably predicting a range of important work and life outcomes across cultures, time, and raters (Barrick & Mount, 1991; Kim et al., 2019; Ozer & Benet-Martínez, 2006; Roberts & DelVecchio, 2000; Soto, 2019). Personality is typically organized into hierarchical structures based on lexical patterns of covariation (Markon, 2009) that are consolidated around the “Big Few” such as the five- and six-factor models of personality prevalent today (John et al., 2008). The psycholexical approach, which examined how trait adjectives cluster based on covariation of natural language, led to the discovery of consistently replicable dimensions that are known today as the Big Five (Allport & Odbert, 1936; Cattell, 1943; Goldberg, 1990; Tupes & Christal, 1961). These Big Five have inspired many subsequent frameworks: lexically derived Big Five (Goldberg, 1990; McCrae & John, 1992), questionnaire-based Five Factor Model (NEO-PI-R; Costa & McCrae, 1992), circumplex models (AB5C; Hofstee et al., 1992), and the Big Six model with HEXACO (Ashton & Lee, 2020). A consensus on their content and structure remains elusive between these models and other frameworks (Baumert et al., 2017; Block, 2010; Christensen et al., 2019; Condon et al., 2020; Mõttus et al., 2020; Schwaba et al., 2020).
Above the trait level, there is a general consensus around two meta-traits, such as Alpha and Beta (Digman, 1997) or Stability and Plasticity (DeYoung et al., 2002), that reflect shared variance in lower-level traits, representing tendencies toward restraint and control or exploration and engagement, respectively. Alternative cross-lexical models include the Big Two, such as Dynamism and Social Self-Regulation (Saucier et al., 2014), which are similar to Getting Ahead and Getting Along (Hogan, 1982), and the Big Three supertraits (Dynamism, Affiliation, and Order; De Raad et al., 2014) that reproduce across languages. Some have even proposed a single, general factor of personality (Musek, 2007). Below the Big Few, structures vary by inventory: aspects serve as intermediaries between traits and facets (DeYoung et al., 2007), while facets represent narrower characteristics with no clear consensus on their number or content (ranging from twenty to sixty; Ashton & Lee, 2020; Irwing et al., 2024; Johnson, 2014; McCrae, 2015; Christensen et al., 2019; Schwaba et al., 2020). At the lowest level, there appear to be as many personality nuances (e.g., items) as there are stars in the sky (Condon et al., 2020).
Beyond typical trait constellations, alternate structures and dimensions of personality have been proposed that encompass individual differences related to, but not captured by, the Big Few (Block, 2010). These alternatives include maladaptive personality models, which capture tendencies toward disordered thoughts, feelings, and behaviors such as Detachment, Antagonism, Disinhibition, and Psychoticism (Krueger et al., 2012), competing three-factor models (e.g., PEN model of Psychoticism, Extraversion, and Neuroticism; Eysenck et al., 1985), and the Dark Triad framework which considers non-pathological personality traits of Narcissism, Machiavellianism, and Psychopathy (Paulhus & Williams, 2002). Finally, some facets within the Big Few structures have been considered traits themselves, such as Risk Propensity (Highhouse et al., 2022) and Impulsivity (DeYoung & Rueter, 2016; Whiteside & Lynam, 2001), making their location in the hierarchy uncertain.
IPIP-NEO personality hierarchy
Overall, there are many competing structures of personality with little consensus around the organization of each level in the hierarchy (Block, 2010; Mõttus et al., 2020). Clarifying the personality hierarchy can have substantial consequences for descriptive, predictive, and explanatory efforts (Baumert et al., 2017; Blum et al., 2021). Accordingly, there are increasing calls for refinement of the Big Few frameworks and facets with novel methodological approaches from the bottom-up (e.g., Castro et al., 2021; Condon et al., 2020; Roberts & Yoon, 2022; Thielmann et al., 2022). When attempting to assess a personality hierarchy from the bottom-up, a broad item pool is recommended (Condon et al., 2020), making the 300-item IPIP-NEO inventory an ideal starting point (Goldberg, 1999).
The IPIP-NEO has several advantages including a large data repository that contains over 300,000 observations (Kajonius & Johnson, 2019). In addition, it serves as an open-source proxy for the widely used NEO-PI-R (Costa & McCrae, 1992), which makes its relations closely aligned with the broader NEO-PI-R literature because of its similar hierarchical structure of 30 facets that form the Big Five. Finally, the IPIP-NEO has been recognized for having broad (but not exhaustive) coverage of the universe of personality content (Condon et al., 2020). Altogether, the IPIP-NEO framework offers a useful foundation on which to build a personality taxonomy from the bottom-up. Clarifying the IPIP-NEO hierarchy through novel methodological approaches can lead to more accurate measurement, substantive interpretations, and predictions at each level (Blum et al., 2021; Condon et al., 2020; Irwing et al., 2024; Mõttus et al., 2020).
Present research
Although the IPIP-NEO has a well-defined theoretical structure, its empirical structure has not been rigorously examined, to our knowledge, using exploratory methods on the entire item pool. TGA provides a systematic approach to identify hierarchical dimensions starting from the items while accounting for methodological challenges that have historically impeded the accurate recovery of hierarchical structures. Such an approach opens the door to discover novel dimensions or alternative organizations that may not have been considered previously within the constraints of its item pool. Therefore, a primary goal of the present research is to exemplify the TGA framework by evaluating the hierarchical structure of the IPIP-NEO inventory from the bottom-up. A second goal is to clarify the extent to which the recovered hierarchy aligns with or deviates from our current understanding of how the item pool organizes into facets, traits, and meta-traits. By identifying areas of convergence and divergence, we can provide valuable insights into the structure of personality as operationalized with the IPIP-NEO and highlight possible refinements to its conceptual framework. These findings have the potential to enhance our understanding of the IPIP-NEO hierarchy and its relations to other personality frameworks, and may improve subsequent prediction and explanation efforts. Overall, our findings can contribute to the ongoing dialogue on personality structure and provide a foundation for further research to investigate the hierarchical nature of individual differences.
Method
Participants
We used archival personality data from Johnson’s IPIP-NEO repository (e.g., Kajonius & Johnson, 2019). The IPIP-NEO 300-item dataset has 307,313 cases and is available on their OSF (https://osf.io/tbmh5). In the original database, the reverse-keyed items were recoded in the direction of their theoretical facet. To perform TGA, the reverse-keyed items were recoded back to their original direction. The dataset was subset to include only respondents based in the United States of America (U.S.A.) between 19 and 69 years old for a working sample of
Measures
The IPIP-NEO was created by evaluating a large item pool (over 1,000 items) and constructing 30 facets that mirrored the facets of the NEO-PI-R (Goldberg, 1999). The IPIP-NEO structure contains 6 facets per FFM trait and 10 items per facet: Openness to Experience (Adventurousness, Artistic Interests, Emotionality, Imagination, Intellect, Liberalism), Conscientiousness (Achievement-striving, Cautiousness, Dutifulness, Orderliness, Self-discipline, Self-efficacy), Extraversion (Activity-level, Assertiveness, Cheerfulness, Excitement-seeking, Friendliness, Gregariousness), Agreeableness (Altruism, Cooperation, Modesty, Morality, Sympathy, Trust), and Neuroticism (Anger, Anxiety, Depression, Immoderation, Self-consciousness, Vulnerability). Although the facets that correspond between the IPIP-NEO and NEO-PI-R correlate strongly (on average, 0.73; Goldberg, 1999), many facets have different labels (e.g., Immoderation and Impulsivity, respectively). The ranges for Cronbach’s
Statistical analysis
Missingness
Although the rate of missing data was small (0.4%), there were 183,923 total missing responses due to the large sample. Missing data were imputed by taking the rounded mean of each variable (e.g., a mean of 3.56 was imputed as 4; a mean of 2.21 as 2), as is appropriate when the missing data rate is minor (i.e., < 1–2%; Widaman, 2006).
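As a minimal sketch of rounded-mean imputation, the snippet below fills missing responses for a single item. The function names and the toy responses are hypothetical and are not taken from the project code. It assumes that ties at .5 round up, which is why it avoids Python's built-in `round()` (which rounds half to even).

```python
import math

def round_half_up(x):
    """Round to the nearest integer, with .5 rounding up.
    (Python's built-in round() rounds half to even instead.)"""
    return math.floor(x + 0.5)

def impute_rounded_mean(column):
    """Replace None entries with the rounded mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = round_half_up(sum(observed) / len(observed))
    return [fill if v is None else v for v in column]

# Hypothetical 5-point Likert responses for one item; None marks a missing response.
item = [4, 3, None, 5, 2, None, 4]
imputed = impute_rounded_mean(item)  # observed mean is 3.6, so 4 is imputed
```

Because the imputed value is an integer on the original response scale, the item remains ordinal and can still be used to compute polychoric correlations.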
Taxonomic graph analysis
Step 1: Remove redundancies
UVA was iteratively applied to remove redundancies by estimating an EBICglasso network, computing wTO, and removing all but one item from redundant item sets that had wTO values greater than 0.20 (Christensen et al., 2023). UVA first estimated a network using the EBICglasso (Epskamp & Fried, 2018) on polychoric correlations and then computed wTO (Nowick et al., 2009) to quantify the extent to which each pairwise combination of nodes (items)
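The weighted topological overlap computation can be sketched as follows. This is a hedged illustration of the signed wTO formula commonly attributed to Nowick et al. (2009), assuming a symmetric weighted network with a zero diagonal. The toy network and function name are hypothetical, and the UVA implementation used in the study may differ in detail.

```python
def weighted_topological_overlap(adj):
    """Signed wTO for a symmetric weighted adjacency matrix
    (list of lists with a zero diagonal)."""
    p = len(adj)
    # Node strength: sum of absolute edge weights attached to each node.
    strength = [sum(abs(adj[i][u]) for u in range(p) if u != i) for i in range(p)]
    wto = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            # Connections that i and j share through every other node u.
            shared = sum(adj[i][u] * adj[u][j] for u in range(p) if u not in (i, j))
            denom = min(strength[i], strength[j]) + 1 - abs(adj[i][j])
            wto[i][j] = wto[j][i] = (shared + adj[i][j]) / denom
    return wto

# Hypothetical four-item network: items 0 and 1 are strongly connected.
network = [
    [0.0, 0.6, 0.1, 0.0],
    [0.6, 0.0, 0.1, 0.0],
    [0.1, 0.1, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
wto = weighted_topological_overlap(network)
p = len(network)
# Flag pairs above the 0.20 cutoff reported in the text.
flagged = [(i, j) for i in range(p) for j in range(i + 1, p) if wto[i][j] > 0.20]
```

In this toy network only the pair (0, 1) exceeds the cutoff, so one of those two items would be a candidate for removal.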
Step 2: Mitigate wording effects
To account for potential wording effects, riEGA (García-Pardina et al., 2024) was estimated using the sample polychoric correlation matrix of the items and the maximum likelihood estimator within a random intercept model. If the model fails to converge, it is assumed that wording effects are negligible, and the sample correlation matrix is used for network estimation. However, when the model converges, as observed in this study, the network is estimated using the residual correlation matrix after controlling for the random intercept factor. This approach was applied across all consensus runs to ensure congruent network estimation. riEGA was applied to the first-level only because wording effects are only relevant at the item-level.
Step 3: Estimate network structure
The EBICglasso (Epskamp & Fried, 2018) was applied by searching along 100 values of
Step 4: Reach dimension consensus
The lower order Louvain algorithm (Jiménez et al., 2023) with the most common consensus clustering approach (Golino & Christensen, 2025) was applied to the EBICglasso network. The number of repetitions in the most common consensus approach was incrementally increased in powers of ten starting with 10^2 (100) up to 10^6 (1,000,000) to reach a stable solution. At each increment, 100 applications of the most common consensus clustering approach were used to establish the stability of the most common solution with the goal of achieving the same solution across all 100 applications. The first-level achieved this aim at 10^4 (10,000) repetitions (as well as 10^5 and 10^6) and both the second- and third-level achieved this aim for all repetition increments (i.e., 10^2 through 10^6). For first-level analyses, some consideration should be given to communities with only two nodes, as traditional psychometric practices suggest that at least three items are required to adequately assess a dimension (Gorsuch, 1997). In these cases, UVA heuristics can be applied: retain the item with the lowest maximum wTO with all other variables.
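The logic of most common consensus clustering and the 100-application stability check can be illustrated with a short sketch. The `toy_clusterer` below is a hypothetical stand-in for a single stochastic Louvain run and does not perform community detection. Only the tallying logic is meant to mirror the procedure described above, and all names are illustrative.

```python
import random
from collections import Counter

def canonical(labels):
    """Relabel communities by order of first appearance so that
    partitions differing only in label names compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)

def most_common_consensus(cluster_once, repetitions, seed):
    """Run a stochastic clusterer `repetitions` times and return the most
    frequent partition together with its share of the runs."""
    rng = random.Random(seed)
    tally = Counter(canonical(cluster_once(rng)) for _ in range(repetitions))
    solution, count = tally.most_common(1)[0]
    return solution, count / repetitions

def toy_clusterer(rng):
    """Hypothetical stand-in for one stochastic Louvain run on six items."""
    labels = [0, 0, 0, 1, 1, 1] if rng.random() < 0.8 else [0, 0, 1, 1, 1, 1]
    # Community label names are arbitrary, so flip them at random.
    return [1 - lab for lab in labels] if rng.random() < 0.5 else labels

solution, share = most_common_consensus(toy_clusterer, repetitions=1000, seed=1)

# Stability check: do 100 independent applications agree on one solution?
applications = {most_common_consensus(toy_clusterer, 100, seed=s)[0] for s in range(100)}
stable = len(applications) == 1
```

In this sketch, increasing `repetitions` makes it less likely that a minority partition wins by chance, which is the rationale for stepping through powers of ten until all applications agree.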
Step 5: Establish dimension robustness
bootEGA (Christensen & Golino, 2021) using the resampling with replacement procedure was applied after a consensus solution was reached. The resampling with replacement procedure randomly draws an observation from the original sample and copies its responses into a replicate sample. This observation is placed back into the original sample (replacement) and then another observation is drawn from the original sample (meaning the same observation can be drawn more than once). These draws repeat (resampling) until the replicate sample has the same number of observations as the original sample. The same procedures that were performed on the original sample up to this step are then applied to the replicate sample (e.g., riEGA, network estimation, most common consensus clustering using the largest number of repetitions in Step 4). This process was repeated 500 times to create a sampling distribution of the communities and item assignments. Using the 500 replicate results, item stability, or the proportion of replicates in which each item was assigned to the community found in the empirical most common consensus solution (Step 4), was computed. Following Christensen and Golino (2021), item stabilities less than 0.75 are considered unstable. For the first-level, if any items were unstable (item stability < 0.75), they were removed from the analysis and Steps 2–5 were repeated.
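A minimal sketch of the two ingredients described above, resampling with replacement and item stability, is given below. It assumes that community labels in each replicate have already been aligned to the empirical solution, an alignment step this sketch omits. The function names and toy assignments are hypothetical.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) rows with replacement; the same row can recur."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def item_stability(empirical, replicate_assignments):
    """Proportion of replicates in which each item lands in its empirical
    community (assumes replicate labels are aligned to the empirical ones)."""
    n_reps = len(replicate_assignments)
    return [
        sum(rep[i] == empirical[i] for rep in replicate_assignments) / n_reps
        for i in range(len(empirical))
    ]

# A replicate sample has the same number of rows as the original sample.
rows = [[1, 2], [3, 4], [5, 1], [2, 2]]
replicate = bootstrap_sample(rows, random.Random(0))

# Hypothetical community assignments for four items across four replicates.
empirical = [0, 0, 1, 1]
replicates = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
stability = item_stability(empirical, replicates)
unstable = [i for i, s in enumerate(stability) if s < 0.75]
```

Here the second item matches its empirical community in only half of the replicates, so it falls below the 0.75 cutoff and would be flagged for removal.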
Step 6: Compute network scores
Network loadings and scores were computed following Christensen et al. (2025). Network loadings have different magnitudes relative to factor loadings but maintain interpretable effect sizes of small (0.20), moderate (0.35), and large (0.50). Network scores were computed by multiplying the standardized data by the network loadings, using only each item’s loading on its assigned community (Golino et al., 2022).
Step 7: N-level dimensions
The network scores were used to estimate the Pearson’s correlation matrix for the next level of the hierarchy and Steps 3–6 were repeated until a single dimension was identified. To evaluate whether the final level of the hierarchy was unidimensional, the unidim metric was used (Revelle & Condon, 2025). unidim is the product of two indices:
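To make the scoring rule concrete, the sketch below standardizes each item and sums the products of standardized responses and assigned-community loadings for each person. It is an illustration under simplifying assumptions: the loadings, memberships, and data are made up, and the scoring function used in the study may apply additional weighting or rescaling that is not shown here.

```python
import math

def standardize(column):
    """Convert a column to z-scores using the sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def network_scores(data, loadings, membership):
    """data: rows are people, columns are items.
    loadings[j]: item j's loading on its assigned community.
    membership[j]: index of the community item j is assigned to."""
    n_items = len(data[0])
    z_cols = [standardize([row[j] for row in data]) for j in range(n_items)]
    n_comm = max(membership) + 1
    scores = [[0.0] * n_comm for _ in data]
    for j in range(n_items):
        for i in range(len(data)):
            # Each item contributes only to its own community's score.
            scores[i][membership[j]] += z_cols[j][i] * loadings[j]
    return scores

# Hypothetical responses (4 people, 3 items), loadings, and assignments.
data = [[1, 2, 5], [2, 3, 4], [4, 4, 2], [5, 5, 1]]
loadings = [0.4, 0.5, 0.3]
membership = [0, 0, 1]
scores = network_scores(data, loadings, membership)
```

The resulting score matrix has one column per community, and correlating those columns is what produces the input for the next level of the hierarchy.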
Openness and transparency
All code, output, supplemental tables, and a link to the Johnson IPIP-NEO repository are available on the project OSF (https://osf.io/hwpa9). Analyses were conducted in R (version 4.3.1; R Core Team, 2023) using
Results
Procedural results
Over three passes with UVA, 46 locally dependent sets of variables were identified and resolved to reduce the dataset from 300 to 249 items. Next, using the residual correlation matrix from the riEGA, the EBICglasso network was estimated, and the lower order Louvain with most common consensus was applied for one million repetitions, reaching perfect agreement across 100 applications. In this application, 7 two-node communities were identified and one node from each was removed, resulting in 242 items.
Applying these same steps (except for UVA) to the 242 items, the next cycle had zero two-node communities; however, further inspection of the communities revealed one set of items that formed a music dimension (theoretical facet and trait in parentheses): “Dislike loud music” (Excitement-seeking of Extraversion), “Like music” (Artistic Interests of Openness to Experience), and “Do not like concerts” (Artistic Interests of Openness to Experience). Despite the semantic similarity of the item content, their broader association patterns differed: “Dislike loud music” was more related to Recklessness, “Like music” was more related to Artistic Interests, and “Do not like concerts” was more related to Gregariousness. Of the three, “Like music” was retained due to its association with its theoretically intended dimension (Artistic Interests) and the other two items were removed.
Item content nested in their first-, second-, and third-level dimensions.
Network scores were computed, and the second level was established using these scores. At the second level, only one pass through the steps was necessary (consensus and robustness were achieved; all stabilities = 1.00), resulting in 6 second-level dimensions (Table 1). Similarly, network scores were computed, and a single pass was needed to establish 3 third-level dimensions (all stabilities = 1.00; Table 1). As expected, the fourth-level resulted in a single dimension according to the lower order Louvain algorithm. At this stage, the unidim metric was applied and there was no statistical support for unidimensionality (
IPIP-NEO taxonomic network.
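As a rough illustration of how one level feeds the next, the sketch below computes a Pearson correlation matrix from dimension scores. Unit-weighted mean scores stand in for network scores, which is a simplifying assumption (network scores weight items by their network loadings), and the data and function names are hypothetical.

```python
from math import sqrt


def mean_scores(responses, membership):
    """Unit-weighted mean score per dimension for each respondent.
    This is a stand-in for network scores, which weight items by loadings."""
    dims = sorted(set(membership))
    scores = []
    for row in responses:
        per_dim = []
        for d in dims:
            items = [x for x, m in zip(row, membership) if m == d]
            per_dim.append(sum(items) / len(items))
        scores.append(per_dim)
    return scores


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(scores):
    """Correlation matrix across dimensions (columns of the score table)."""
    columns = list(zip(*scores))
    return [[pearson(a, b) for b in columns] for a in columns]


# Hypothetical responses: five respondents, four items, two dimensions.
responses = [
    [1, 2, 5, 4],
    [2, 2, 4, 4],
    [3, 4, 3, 2],
    [4, 4, 2, 3],
    [5, 5, 1, 1],
]
membership = [0, 0, 1, 1]
next_level_input = correlation_matrix(mean_scores(responses, membership))
```

The resulting matrix would then be passed to the network estimation and community detection steps for the next level, and the cycle would stop once a single dimension is returned.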
Mapping the theoretical structure to the empirical structure
To interpret and label the network dimensions at each level, several of the authors met to review the content of each dimension and label it accordingly. The authors aimed to keep the dimension interpretations in line with the existing theoretical IPIP structure, inventories (e.g., HEXACO; Ashton & Lee, 2007), and research (e.g., Impulsivity, Integrity; Laginess, 2016; Whiteside & Lynam, 2001), keeping the theoretical labels for dimensions that retained most of their theoretically aligned content and considering new labels for dimensions with new orientations. Network loadings were used to identify the key defining features of each dimension at every level (first-level loadings on the second-level dimensions are provided in Table 1). At the same time, GPT-4, a foundational language model (OpenAI, 2023), was used to augment the human decision-making process (see Brynjolfsson, 2022); the model was given detailed prompts and item-based descriptions of the communities. The authors met again to compare the sets of human and GPT-4 labels in order to finalize the dimension interpretation and labeling.
To compare the theoretical and empirical (TGA) facets (first-level) and traits (second-level), the reduced item set (221 items) was grouped according to the theoretical and empirical assignments, respectively. The first analysis depicts the proportion of items from each theoretical facet that were identified in an empirical facet (Figure 3). Dimensions were “mapped” within their trait and organized from most alike to least alike (based on proportions). To provide an example, the retained Adventurousness items (9 out of 10 possible) are considered. Eight of the nine items (
Figure 3. Proportion correspondence map of the first-level dimensions.
Figure 4. Correlation correspondence map of the first-level dimensions.
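The proportion correspondence amounts to a row-normalized cross-tabulation of theoretical by empirical item assignments. The sketch below shows one way to compute it; the facet labels (T1, T2, E1, E2), the item assignments, and the function name are hypothetical placeholders and not the study's data or code.

```python
from collections import Counter, defaultdict


def proportion_map(theoretical, empirical):
    """For each theoretical facet, return the proportion of its retained
    items assigned to each empirical facet (each row sums to 1)."""
    counts = defaultdict(Counter)
    for t, e in zip(theoretical, empirical):
        counts[t][e] += 1
    return {
        t: {e: n / sum(row.values()) for e, n in row.items()}
        for t, row in counts.items()
    }


# Hypothetical assignments for eight retained items.
theoretical = ["T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2"]
empirical = ["E1", "E1", "E1", "E2", "E2", "E2", "E2", "E2"]
proportions = proportion_map(theoretical, empirical)
```

Here three of the four T1 items land in E1 (0.75) and one lands in E2 (0.25), whereas all T2 items land in E2 (1.0). Proportions are computed over retained items only, so removed items do not enter the denominator.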

Overall, Openness to Experience retained the highest proportion of items within its theoretical domain and facets, with the exception of Emotionality, which moved to Neuroticism and still kept a substantial proportion of its items (0.750; Figure 3). Although a substantial proportion of the Neuroticism and Conscientiousness items also remained within their respective domains, more than half did not align with their intended facets. The majority of Neuroticism’s Anxiety (0.900) and Vulnerability (0.875) items merged into a single dimension of Anxiety. Neuroticism gained a novel but weakly loading dimension of Dominance (Table 1), which had its highest proportion of items from Cooperation (0.857; Agreeableness) and Assertiveness (0.600; Extraversion). The Achievement-striving facet of Conscientiousness split almost evenly between Work Ethic (0.556) and Determination (0.333), and its Orderliness facet retained all of its items (1.000) as well as one or two items from the other Conscientiousness facets (except for Self-efficacy).
Sociability emerged mostly from a mix of Extraversion (Cheerfulness, Gregariousness, Friendliness) and Agreeableness (Sympathy, Altruism, Modesty, Trust) facets as well as some additional items from Neuroticism (Self-consciousness). The Extraversion facets retained all or most of their items (1.000, 1.000, 0.800, respectively) whereas the Agreeableness facets retained the majority of their items (0.556, 0.667, 0.625, 0.750, respectively). Integrity arose from the two facets of Dutifulness (0.875 total; Conscientiousness) and Morality (0.777 total; Agreeableness), with their items spread across Integrity’s facets: Manipulativeness (0.250 and 0.444, respectively), Fairness (0.250 and 0.333, respectively), and Honesty (0.375 and 0.000, respectively). Impulsivity emerged from a smattering of traits with each of its respective facets representing a higher proportion from one theoretical trait: Recklessness (0.556; Extraversion), Excitement-seeking (0.333; Extraversion), Cautiousness (0.875; Conscientiousness), and Immoderation (0.857; Neuroticism).
The correlations between the theoretical and empirical facets revealed several noteworthy patterns (Figure 4). For the 16 empirical facets that retained a theoretical label, their correlations were substantial with their theoretical counterpart (
The first-level dimensions with the weakest loadings on their respective second-level dimension—Dominance (Neuroticism), Calmness (Conscientiousness), and Humility (Sociability)—displayed more complex correlational patterns than the other first-level dimensions within their respective domains. Dominance showed a strong negative correlation with Self-consciousness (
The correlation patterns across the first-level dimensions for the novel second-level dimensions—Sociability, Integrity, and Impulsivity—were largely consistent with the proportions observed in Figure 3. Sociability’s first-level dimensions showed moderate-to-large positive correlations for most of the facets in Extraversion and Agreeableness. Attention-seeking had the largest deviations with primarily negative correlations on the Agreeableness facets and especially Morality (
Discussion
The present study developed a comprehensive psychometric network framework called Taxonomic Graph Analysis (TGA) to estimate hierarchical structures from the bottom-up. TGA offers a robust framework that addresses key methodological challenges in hierarchical personality assessment, including local independence violations, wording effects, dimensionality assessment, and structural stability, enabling a rigorous bottom-up investigation of psychological constructs without the constraints of traditional top-down assumptions. This framework was applied to a large, U.S.-based dataset (
Overlooking the challenges addressed by TGA, such as local independence violations, wording effects, and the stability of the dimensions, would have resulted in a substantially different hierarchical structure. For example, applying the standard EGA approach to the full 300-item set would have resulted in 61 first-level communities, with some representing split dimensions (Flores-Kanter et al., 2021; Wood et al., 1996) or minor dimensions due to redundancy (Ferrando et al., 2022). If local independence violations were handled, but wording effects were not, then some dimensions would split on the semantic polarity of the items (e.g., Empathy split into positively and negatively worded dimensions). After addressing local independence violations and wording effects, 26 additional items still needed to be removed due to structural instability (e.g., weakly loading items or items that loaded substantially on multiple dimensions). These findings highlight the importance of TGA to ensure that the recovered structure is not distorted by methodological artifacts, resulting in a clearer, more interpretable, and more stable hierarchy.
First-level structure
Applying TGA to the IPIP-NEO allowed items to freely associate and new dimensions to emerge from the bottom-up, with substantial departures from the theoretical structure. Relative to the existing structure, some items formed larger, broader dimensions involving the merging of theoretical facets (e.g., Gregariousness and Friendliness, Anxiety and Vulnerability); other items formed refined, narrower dimensions involving few items from a single theoretical facet (e.g., Honesty from Dutifulness, Calmness from Activity-level). Some items remained in their theoretical facet (e.g., Artistic Interests, Anger); other items formed a facet distinct from the IPIP-NEO facet structure (e.g., Determination, Recklessness). Of the 30 theoretical dimensions, only 16 emerged empirically (53.3%), with the proportion of theoretical items composing them varying considerably (from 0.333 to 1;
These results reiterate the challenges of ensuring that constructs remain homogeneous as item pools grow in the face of “bloated specifics” or “cheating by repeating” semantic variations (Cattell & Tsujioka, 1964; Reise et al., 2018). Additionally, conventional psychometric practices tend to analyze items in silos, assuming content composition is homogenous and overlooking much of the complexity in high-dimensional data (Achenbach, 2021; Condon et al., 2020). The identification of these cross-facet and cross-domain assignments can substantively inform researchers as to why certain facets and domains in the IPIP-NEO tend to be correlated (e.g., Openness to Experience and Agreeableness; Lawn et al., 2023), and why outcomes related to these theoretical dimensions are related to multiple different facets and traits (Mõttus, 2016). These results are particularly relevant in relation to the widespread practice of performing theoretical parceling (i.e., creating facet scores based on theory) to explore the second-order structure of personality (e.g., Costa & McCrae, 1995; Sanz-García et al., 2024). Although parceling can be advantageous under certain scenarios (Little et al., 2013), it can obscure, rather than clarify, the structure of the data if the complete item-level structure (analyzed simultaneously) has not been established empirically (Bandalos, 2002; Little et al., 2013; Marsh et al., 2013).
Second-level structure
At the second level, there were six dimensions identified that ranged from nearly identical to theory (Openness to Experience) to mostly a reorganization of content (Conscientiousness and Neuroticism) to the mixing of Extraversion and Agreeableness (Sociability; Blain et al., 2023) to the emergence of relatively novel dimensions not identified in the Big Five model (Integrity and Impulsivity; Laginess, 2016; Whiteside & Lynam, 2001). Only half of the second-level dimensions had content that represented the majority of one theoretical trait domain (i.e., Openness to Experience, Conscientiousness, and Neuroticism).
In the case of the Openness to Experience dimension, most empirical facets were consistent with their theoretical counterpart. The only notable change was that the Emotionality facet moved to Neuroticism. The Neuroticism dimension also merged two theoretical facets, Anxiety and Vulnerability, into a single dimension (Anxiety), and had all items belonging to the Depression facet removed completely due to local dependence and low stability. A novel dimension of Dominance emerged that loaded weakly, reflecting a confrontational social orientation (a mix of Extraversion and Agreeableness facets). Although not explicitly defined within the IPIP-NEO, Dominance has frequently appeared as a component of personality and group dynamics (Anderson & Kilduff, 2009). The empirical Conscientiousness dimension was largely composed of its theoretical items; however, there were substantial changes in their organization. Achievement-striving split into two dimensions reflecting Determination (goal-pursuit) and Work Ethic (DeYoung, 2015; Jayawickreme et al., 2019; Kanfer et al., 2017). A novel but weakly (negatively) loading facet of Calmness was identified, indicating a general penchant toward an easy-going and passive pace of life. The Self-efficacy and Self-discipline dimensions that emerged were partial representations of their theoretical facets, whereas Orderliness was composed completely of its original items plus individual items from several other Conscientiousness facets.
The remaining three second-level dimensions—Sociability, Integrity, and Impulsivity—represent major departures from the traditional IPIP-NEO structure and FFM framework. Sociability represented a mix of Extraversion and Agreeableness, capturing the affiliative content (Gregariousness) that broadly characterizes Extraversion and the prosocial elements of Agreeableness (Empathy and Trust). The Extraversion side of Sociability also captured positive affect (Cheerfulness), resonating with the Enthusiasm aspect of the Big Five Aspect Scale; the Agreeableness side of Sociability consisted of Empathy, resonating with the Compassion aspect of the Big Five Aspect Scale (DeYoung et al., 2007). This finding corroborates recent suggestions of an interstitial trait of Affiliation, which blends these two aspects (Blain et al., 2023) as well as the affiliative role of empathy in interpersonal relationships (Ringwald & Wright, 2021). The label Sociability was selected to capture broader social engagement characteristics (e.g., Attention-seeking, Trust) that extend beyond its affiliative content.
Our findings establish Integrity as a distinct personality dimension that integrates elements from the theoretical facets of Agreeableness (Morality) and Conscientiousness (Dutifulness; Laginess, 2016). Integrity aligned primarily with the Honesty content of HEXACO’s Honesty-Humility factor, with an explicit dimension of Honesty and the dimensions of Manipulativeness and Fairness corresponding to HEXACO’s theoretical facets of Sincerity and Fairness, respectively (Ashton & Lee, 2007). Although Integrity overlaps conceptually with Honesty-Humility, the theoretical Modesty content aligned more closely with Sociability, reinforcing the idea that social self-presentation tendencies operate independently from moral character (Hart et al., 2023). Integrity captures moral or socially normative behaviors (Honesty and Fairness) versus more antisocial behaviors (Manipulativeness) associated with the Dark Triad traits of Narcissism, Machiavellianism, and Psychopathy (Howard & Manix, 2022; Howard & Van Zandt, 2020; Paulhus & Williams, 2002) and the Dark Factor of Personality (Moshagen et al., 2018). This association is further underscored by research on workplace psychopathy, which highlights dishonesty and manipulation as key drivers of unethical decision-making and counterproductive behaviors (Hart et al., 2023; Smith & Lilienfeld, 2013). Importantly, by emphasizing manipulation, this conception of Integrity extends beyond merely telling the truth or following rules—it represents the rejection of deceptive, self-serving behaviors (Miller & Schlenker, 2011).
The final second-level dimension identified was Impulsivity, which was defined in earlier frameworks as a component of Psychoticism (Eysenck et al., 1985) and more recently operationalized as a heterogeneous mix of traits often associated with facets of Conscientiousness, Extraversion, and Neuroticism (Whiteside & Lynam, 2001; Zuckerman et al., 1993). The theoretical dimensions of Impulsivity (Whiteside & Lynam, 2001) correspond to specific empirical facets where Premeditation is consistent with the facet of Cautiousness, Sensation-Seeking aligns with Excitement-Seeking and Recklessness, and Urgency is associated with Immoderation, supporting the view that Impulsivity is a multifaceted trait (de Vries et al., 2009; Sharma et al., 2014). This placement in the IPIP-NEO hierarchy raises questions about whether Impulsivity should be conceptualized as a broad trait domain rather than as a subordinate facet within existing frameworks (DeYoung & Rueter, 2016) and whether it may be part of a unique system that works in conjunction with other personality traits (Mullins-Sweatt et al., 2019).
Third-level structure
At the third level, three dimensions emerged that appeared to resemble conceptualizations of the Big Two or Three meta-traits (De Raad et al., 2014; DeYoung, 2006, 2015; Digman, 1997; Saucier et al., 2014). Two of the dimensions retained the labels of Stability and Plasticity (DeYoung, 2006) because they closely resembled the original concepts, despite Agreeableness not emerging as a dimension and its content being redistributed to different traits. In the reorganization of the IPIP-NEO components, Stability (Conscientiousness and Neuroticism) and Plasticity (Sociability and Openness to Experience) appear to align with their theoretical interpretations: Stability as an adaptive system of purposeful behavior reflecting motivational control in the pursuit of goals (Conscientiousness) that emerges from low emotional volatility (Neuroticism), and Plasticity as an orientation toward exploring novel internal (Openness to Experience) and external (Sociability) states in the pursuit of goals (DeYoung, 2006, 2015; Fleeson & Jayawickreme, 2015).
The final third-level dimension, Disinhibition, is a novel addition to the IPIP-NEO and Big Few personality hierarchies, and is composed of Integrity and Impulsivity, representing a broad dispositional tendency characterized by reduced behavioral and emotional regulation, integrating aspects of both externalizing and self-regulatory processes (Mullins-Sweatt et al., 2019). This structure aligns with prior hierarchical models of personality that have consistently identified Disinhibition versus Constraint as a major superordinate dimension spanning normal and maladaptive traits (De Raad et al., 2014; Eysenck et al., 1985; Markon et al., 2005). The integration of Impulsivity and Integrity under this overarching construct reflects a balance between a propensity for rash, sensation-seeking behavior and a susceptibility to ethical and self-control failures, suggesting that Disinhibition encompasses not only impulsive action but also a broader failure to regulate behavior in accordance with internalized social and moral standards (Joyner et al., 2021). This conceptualization is further supported by meta-analytic findings indicating that Disinhibition is closely tied to Conscientiousness but also incorporates elements of low Agreeableness, reinforcing its role as a dimension spanning externalizing and self-regulatory traits (Sharma et al., 2014). Additionally, recent research has emphasized that Disinhibition is an important predictor of a range of maladaptive behaviors, including risk-taking, rule-breaking, and interpersonal insensitivity (Mullins-Sweatt et al., 2019; Ro et al., 2023).
Despite mostly maladaptive connotations, moderate levels of Disinhibition may be adaptive for goal disengagement, acting as a switch that allows someone to stop old, ineffective strategies (Stability) and start new ones (Plasticity) in the pursuit of long-term goals (Clark & Watson, 2008). As people navigate environments in pursuit of their goals, they need to dynamically identify and employ strategies to overcome obstacles that challenge their goal pursuit (DeYoung, 2015; Wrosch et al., 2003). Disinhibition may be an important component of self-regulatory or cybernetic systems of motivated behavior (Carver & White, 1994; Elliot & Thrash, 2002; McNaughton & Gray, 2000; Higgins & Cornwell, 2016; Monni et al., 2020), such that an adaptive state of Disinhibition may be beneficial in situations where normative behaviors are no longer beneficial, allowing people to disengage from current strategies and search for new ones (e.g., Asch’s social conformity experiments, Milgram’s obedience to authority studies, and the Bystander effect; Hirsh et al., 2010; van den Bos et al., 2011). The convergence of Integrity and Impulsivity within this framework suggests that Disinhibition is not merely a reflection of momentary impulse control failures but rather a more pervasive meta-trait that influences behavioral regulation across multiple domains.
Limitations
Although the components of TGA have been thoroughly vetted through simulation studies, some with item pools as large as 180 items (Jiménez et al., 2023), none of the applications so far have included item pools and structures as large as that of the IPIP-NEO. Future simulation studies should evaluate the TGA methods under similar conditions. Our results are also based on a single personality inventory, which risks mono-operation bias (Gallagher et al., 2020). Although the IPIP-NEO is a broad, open-access inventory based on the widely used NEO-PI-R (Costa & McCrae, 2008) and IPIP (Goldberg et al., 2006), future research should continue to investigate other self-report inventories and adjective-based approaches to evaluate item pools beyond those used in this study (including Big Few alternatives; Feher & Vernon, 2021). Similarly, researchers should use TGA in the exploration and evaluation of other hierarchical constructs related to personality, such as intelligence, attitudes, and psychopathology. Aside from the inventory, our data were restricted to a single country (U.S.A.), which may limit the generalizability of the results. Future work should continue contributing to open-source personality data so that researchers and practitioners have access to large, representative datasets that comprehensively sample the content space of personality and can continue exploring its structure.
Conclusion
The taxonomic structure of personality is the foundation on which subsequent predictive and explanatory models of personality are built (Baumert et al., 2017; Mõttus et al., 2020). Although most contemporary theories of personality start with the Big Few and work down, more recent calls have emphasized the need for more bottom-up, exploratory approaches to validate item-level structures that are often neglected in theory-driven approaches (e.g., Condon et al., 2020). To date, a core constraint on such analyses has been the limited availability of validated statistical methods to evaluate complex, hierarchical structures in this manner. This study introduces a promising statistical framework, TGA, that has the capability to capture the full complexity of personality and build from the bottom-up.
Supplemental Material
Supplemental Material - Revisiting the IPIP-NEO personality hierarchy with taxonomic graph analysis
Supplemental Material for Revisiting the IPIP-NEO personality hierarchy with taxonomic graph analysis by Andrew Samo, Luis Eduardo Garrido, Francisco J Abad, Hudson Golino, Samuel T McAbee, and Alexander P Christensen in European Journal of Personality.
Footnotes
Acknowledgments
The first author would like to thank Christopher M. Gallagher for the introduction to EGA.
Author contributions
Andrew Samo: Conceptualization, data curation, formal analysis, methodology, validation, visualization, writing—original draft, and writing—review and editing. Luis Eduardo Garrido: Conceptualization, formal analysis, methodology, validation, writing—original draft, and writing—review and editing. Francisco J Abad: Formal analysis, methodology, validation, and writing—review and editing. Hudson Golino: Formal analysis, methodology, software, and writing—review and editing. Samuel T McAbee: Conceptualization, supervision, and writing—review and editing. Alexander P Christensen: Conceptualization, data curation, formal analysis, methodology, resources, software, supervision, validation, visualization, writing—original draft, and writing—review and editing.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.