Abstract
Among the most reproducible findings in the literature on general cognitive ability is the
One challenge arises out of the fact that the
Here, we asked whether a lack of attention to development has limited a comprehensive understanding both of the
The first account,
A second influential account is Cattell’s investment theory (Cattell, 1971). This is based on a division of cognitive abilities into crystallized abilities (knowledge-based) and fluid abilities (flexible skills not dependent on acquired knowledge or skills). The theory is based on a central developmental claim, namely that fluid abilities are invested in order to acquire crystallized abilities. Recent work (Weiland, Barata, & Yoshikawa, 2014) suggests that executive-function scores at the beginning of a preschool year predict improvements in vocabulary performance at the end of the year but not vice versa. Research on a large cross-sectional sample (Valentin Kvist & Gustafsson, 2008) found that the factor structure of general and fluid abilities within and across groups was compatible with investment theory. However, these findings are ambiguous (Valentin Kvist & Gustafsson, 2008), and other researchers found no such effect (Christensen, Batterham, & Mackinnon, 2013), only the reverse pattern (Fuhs & Day, 2011) or an effect only in one cohort (Ferrer & McArdle, 2004). Similarly, Schmidt and Crano (1974) used cross-lagged panel analysis to test investment theory but found evidence that both crystallized and fluid abilities are related over time, concluding that investment theory cannot account for this pattern.
A third developmental account is the
Several challenges preclude strong inferences regarding the best model of cognitive development. First, the studies discussed in the preceding paragraphs drew their samples from various points in the life span, which may be governed by different developmental mechanisms. Second, several reports have relied on statistical techniques, such as cross-lagged panel models (Schmidt & Crano, 1974), that are not ideally suited to studying change. Third, other studies have relied on cross-sectional cohorts, which limits the range of inferences that can be made (e.g., Gignac, 2014; Valentin Kvist & Gustafsson, 2008). Most important, although several studies tested specific theories (e.g., Ferrer & McArdle, 2004; Ghisletta & Lindenberger, 2003; McArdle et al., 2002; McArdle et al., 2000), to the best of our knowledge, no study has directly compared these three prominent accounts of development. Our aim in this study was to fill this gap by exploiting innovations in structural equation modeling (McArdle & Hamagami, 2001) that are uniquely suited to directly compare these three accounts. To do this, we exploited data from a large developmental cohort measured on two domain-representative (crystallized and fluid) standardized subtests, Matrix Reasoning and Vocabulary from the second edition of the Wechsler Abbreviated Scale of Intelligence (WASI-II; Wechsler, 2011). Using a latent change score (LCS) framework, we modeled the three theoretical accounts of change in cognitive abilities as three different LCS models.
Method
Sample
We recruited 784 participants (401 female, 383 male; mean age: 19.05 years, range: 14.10–24.99) for the University of Cambridge-University College London Neuroscience in Psychiatry Network (NSPN) cohort. This sample size has been shown to be sufficient to fit moderately complex structural equation models with adequate power (e.g., Wolf, Harrington, Clark, & Miller, 2013). We tested 563 of these participants a second time, on average 1.48 years later (range: 0.98–2.62 years). Those who returned for a second wave did not differ significantly from those who did not return on Time 1 Vocabulary scores,
Measures
Participants were tested using the Matrix Reasoning and Vocabulary subtests from the WASI-II. Matrix Reasoning measures fluid and visual intelligence by means of a series of incomplete visual matrices; participants pick one out of five options that best completes the matrix. The Vocabulary subtest measures participants’ breadth of word knowledge and verbal concepts; examiners present words or concepts orally and ask participants to verbally define and describe them. Both subtests have excellent interrater reliability (
Modeling framework
To tease apart candidate mechanisms of development, we fitted a series of LCS models (Kievit et al., 2017; McArdle & Hamagami, 2001; McArdle et al., 2000). These models conceptualize differences between successive measurements as latent change factors. Crucially, this allowed us to directly model within-subjects changes as a function of structural parameters, which made these models more suitable for our purposes than latent growth curve models (McArdle & Hamagami, 2001). The basic equation of the LCS model specifies the score of person
A key step in the LCS model specification is to set the regression weight β
These change scores were then modeled as perfect indicators of a latent factor of change scores. In cases in which there was only one observed variable, or indicator, per construct, the LCS factor was construed as the difference between these indicators over time. In the absence of coupling, the intercept of the simple LCS model gives approximately identical results as a paired-samples
The self-feedback parameter (β) is thought to reflect a combination of effects, including regression to the mean and a dampening effect induced by an end horizon for rapid development (i.e., individuals reaching their performance ceiling). The coupling parameter (γ) is of special importance for several developmental accounts. It captures whether the change in
We fitted models for

Illustrations of the (a)
Second, investment theory implies that scores in fluid abilities (here indexed by Matrix Reasoning scores) should positively influence the degree of change in crystallized abilities (indexed by Vocabulary scores), such that individuals with greater fluid ability will, on average, improve more in crystallized abilities than peers with lower Matrix Reasoning scores at Time 1. This process was modeled by a single coupling parameter from Matrix Reasoning scores at Time 1 on the Vocabulary change factor at Time 2 (Fig. 1b). Finally, the mutualism model (Fig. 1c) predicts bivariate coupling between both cognitive abilities; specifically, higher starting points in vocabulary would lead to larger gains in matrix reasoning and vice versa. In all models, we added age as a covariate to account for differences in baseline scores but did not include age anywhere else in the model (i.e., we hypothesized that the dynamics of change were fully captured by the change dynamics proposed by each theory).
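As a compact restatement of the two coupling structures just described, the change equations can be sketched as below. This is an illustrative sketch rather than the fitted model specification: β (self-feedback) and γ (coupling) follow the modeling framework, whereas the subscripts, the intercepts μ, the residuals ε, and the assumption that self-feedback is retained for both tests are introduced here for illustration only.

```latex
% Investment theory (Fig. 1b): a single coupling path from
% Matrix Reasoning at Time 1 to change in Vocabulary.
\[
\begin{aligned}
\Delta\mathrm{Voc}_{i} &= \mu_{\Delta\mathrm{Voc}} + \beta_{\mathrm{Voc}}\,\mathrm{Voc}_{i,T1} + \gamma_{1}\,\mathrm{Mat}_{i,T1} + \varepsilon_{\Delta\mathrm{Voc},i} \\
\Delta\mathrm{Mat}_{i} &= \mu_{\Delta\mathrm{Mat}} + \beta_{\mathrm{Mat}}\,\mathrm{Mat}_{i,T1} + \varepsilon_{\Delta\mathrm{Mat},i}
\end{aligned}
\]

% Mutualism (Fig. 1c): coupling in both directions.
\[
\begin{aligned}
\Delta\mathrm{Voc}_{i} &= \mu_{\Delta\mathrm{Voc}} + \beta_{\mathrm{Voc}}\,\mathrm{Voc}_{i,T1} + \gamma_{1}\,\mathrm{Mat}_{i,T1} + \varepsilon_{\Delta\mathrm{Voc},i} \\
\Delta\mathrm{Mat}_{i} &= \mu_{\Delta\mathrm{Mat}} + \beta_{\mathrm{Mat}}\,\mathrm{Mat}_{i,T1} + \gamma_{2}\,\mathrm{Voc}_{i,T1} + \varepsilon_{\Delta\mathrm{Mat},i}
\end{aligned}
\]
```

Under this sketch, investment theory corresponds to γ₁ > 0 with γ₂ fixed to 0, and mutualism corresponds to both γ₁ and γ₂ being positive.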
Model fit and comparison
Models were estimated in the lavaan software package (Version 0.5-22; Rosseel, 2012) using full information maximum likelihood with robust standard errors to account for missingness and nonnormality. No observations were excluded. We assessed overall model fit via the chi-square test, the root-mean-square error of approximation (RMSEA; acceptable fit: < .08, good fit: < .05), the comparative fit index (CFI; acceptable fit: .95–.97, good fit: > .97), and the standardized root-mean-square residual (SRMR; acceptable fit: .05–.10, good fit: < .05; Schermelleh-Engel, Moosbrugger, & Müller, 2003). We compared the three models in three ways: overall model fit (cf. Schermelleh-Engel et al., 2003), information criteria (viz., Akaike’s information criterion, AIC, and Bayesian information criterion, BIC), and Akaike weights (Wagenmakers & Farrell, 2004), which use differences in AICs to quantify the relative likelihood of a model being the best among the set of competitors, given the data.
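To make the Akaike-weight comparison concrete, the following minimal Python sketch computes weights from a set of AIC values using the standard formula (each model's relative likelihood, exp(−Δ/2), normalized across the candidate set). The function name and the AIC values are hypothetical and chosen for illustration; they are not the values reported in Table 2.

```python
import math


def akaike_weights(aics):
    """Return Akaike weights for a list of AIC values."""
    best = min(aics)
    # Difference between each model's AIC and the lowest AIC in the set
    deltas = [a - best for a in aics]
    # Relative likelihood of each model, given the data
    rel_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_likelihoods)
    # Normalize so that the weights sum to 1 across the candidate set
    return [r / total for r in rel_likelihoods]


# Hypothetical AIC values, for illustration only (NOT the values in Table 2)
aics = {"g factor": 1010.0, "investment": 1006.0, "mutualism": 1000.0}
weights = dict(zip(aics, akaike_weights(list(aics.values()))))

for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

Because the weights depend only on AIC differences, adding a constant to every AIC leaves them unchanged; the ratio of two weights gives the relative evidence for one model over the other within the set being compared.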
Results
Raw scores and descriptive statistics for the Matrix Reasoning and Vocabulary subtests are shown in Table 1, and the association between age and score on each test is shown in Figure 2. Before fitting the models shown in Figure 1, we fitted two univariate LCS models to Vocabulary and Matrix Reasoning scores in order to quantify change within each domain. Both models fitted the data well: Matrix Reasoning: χ²(1) = 3.098,
Table 1. Raw Scores and Descriptive Statistics for Matrix Reasoning and Vocabulary Scores
Note: The Matrix Reasoning and Vocabulary subtests were taken from the second edition of the Wechsler Abbreviated Scale of Intelligence (Wechsler, 2011).

Fig. 2. Scatterplots showing the association between age and score on the Matrix Reasoning subtest (top) and Vocabulary subtest (bottom) of the second edition of the Wechsler Abbreviated Scale of Intelligence (Wechsler, 2011). Lines connect the rescaled scores of those individuals who completed the test at both waves.
Having shown, as expected, a growth in scores in both domains, we next fitted all three models (
In Table 2, we report the fit statistics for each of the three competing models. This comparison suggests that the mutualism model fitted the data best, showing excellent model fit on all indices. The two alternative models (investment and
Table 2. Fit Statistics for Each of the Three Models
Note: For root-mean-square errors of approximation (RMSEAs), 90% confidence intervals are given in brackets. CFI = comparative fit index; SRMR = standardized root-mean-square residual.

Fig. 3. Akaike’s information criterion and Bayesian information criterion (a) and normalized probabilities using Akaike weights (b), for each of the three models.
Having established the superior fit of the mutualism model, we next investigated its estimated parameters in more detail (see Fig. 4; Table S1 in the Supplemental Material available online contains all parameter estimates and 95% confidence intervals). As expected, Matrix Reasoning and Vocabulary scores at Time 1 were positively correlated, and age at first testing predicted scores on both tasks at Time 1. In addition to significant latent change intercepts (i.e., increasing scores), variance of change scores led to a substantial drop in model fit when fixed to 0: Matrix Reasoning: Δχ²(1) = 82.43,

Fig. 4. Estimated parameters for the mutualism model. Values in Roman are standardized parameter estimates, and values in italics are unstandardized parameter estimates (with standard errors in parentheses). See Figure 1 for an explanation of the notational system used. Further results are given in Table S1 in the Supplemental Material. Mat = Matrix Reasoning; Voc = Vocabulary; T1 = Time 1; T2 = Time 2.
Using Equation 3 and the estimated parameters of the full mutualism model (Fig. 4), we next visualized the expected change between Time 1 and Time 2. To do this, we created a vector field plot (e.g., McArdle et al., 2000, p. 69) in which each arrow represents a (hypothetical) bivariate score at Time 1 (base of each arrow) and model-implied expected score at Time 2 (end of arrow) across a range of possible scores. Figure 5 shows the vector field plot and highlights regions where the mutualistic effects are easiest to see.
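The construction of such a vector field can be sketched in a few lines of Python. The change equations below take an assumed linear form (intercept plus self-feedback plus coupling), and every parameter value and name is a hypothetical placeholder, not an estimate from Figure 4; the sketch is meant only to show how each arrow is derived from a Time 1 score pair.

```python
def expected_time2(mat_t1, voc_t1, p):
    """Model-implied Time 2 scores for one (hypothetical) Time 1 score pair."""
    # Expected change = intercept + self-feedback * own Time 1 score
    #                   + coupling * other domain's Time 1 score
    d_mat = p["int_mat"] + p["beta_mat"] * mat_t1 + p["gamma_voc_to_mat"] * voc_t1
    d_voc = p["int_voc"] + p["beta_voc"] * voc_t1 + p["gamma_mat_to_voc"] * mat_t1
    return mat_t1 + d_mat, voc_t1 + d_voc


# Hypothetical parameter values, for illustration only (NOT the Figure 4 estimates)
params = {
    "int_mat": 4.0, "beta_mat": -0.5, "gamma_voc_to_mat": 0.15,
    "int_voc": 10.0, "beta_voc": -0.3, "gamma_mat_to_voc": 0.3,
}

# Grid of hypothetical Time 1 scores: (Matrix Reasoning, Vocabulary)
grid = [(m, v) for m in range(16, 33, 4) for v in range(40, 81, 10)]

# Each arrow runs from the Time 1 pair (base) to the expected Time 2 pair (head)
arrows = [((m, v), expected_time2(m, v, params)) for m, v in grid]

for base, head in arrows[:3]:
    print(f"T1 {base} -> expected T2 ({head[0]:.2f}, {head[1]:.2f})")
```

With positive coupling values, arrows that start at the same Matrix Reasoning score point further toward higher Matrix Reasoning when the starting Vocabulary score is higher, which is the pattern the shaded regions of a vector field plot are designed to highlight. The `arrows` list could be passed to a plotting routine (e.g., a quiver plot) to draw the field.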

Fig. 5. Vector field plot for the mutualism model showing model-implied changes between Time 1 and Time 2. The dots represent the Time 1 Matrix Reasoning and Vocabulary scores of a randomly selected subset of individuals, and each arrow represents a model-implied change between Time 1 (base of arrow) and Time 2 (head of arrow). The horizontal shaded rectangle illustrates the positive effect of higher Vocabulary scores on expected change in Matrix Reasoning scores. The vertical shaded rectangle illustrates that there was a negligible expected Vocabulary improvement for low Matrix Reasoning ability (arrows below 24 on the
Although analytic work (van der Maas et al., 2006) has demonstrated that a
In the three models examined here, we included age as a linear covariate to account for individual differences due to age at Time 1 (we will describe alternative parametrizations of age in the Discussion). This reflects a hypothesis that age affects scores at Time 1 but that all aspects of development over time can be captured within the model. Allowing age to directly predict change scores did not improve model fit, Δχ²(2) = 0.33,
Discussion
In a large (
We can hypothesize several mechanisms to explain the coupling parameters, both direct and indirect. One direct pathway may be that a greater facility with vocabulary and verbal skills allows for swifter, more accurate decomposition of reasoning problems into constituent elements, as well as decreased working memory demands for maintenance of such elements, especially in younger adults. A more indirect pathway, in line with the gene-environment interactions mentioned previously, is that greater vocabulary may be an easily detectable marker of higher cognitive ability, which leads to real-world feedback effects in the form of more academically challenging classes or environments to support perceived ability in a manner that generalizes to other domains. A final, intriguing possibility is that traditionally fluid tasks such as Matrix Reasoning may in fact reflect a hybrid of purely fluid abilities (or learning potential) and more strategic, verbal components akin to crystallized abilities (Kühn & Lindenberger, 2016). This would explain both the life-span trajectories of fluid abilities and the considerable secular gains in fluid abilities in the 20th century (Flynn, 1987).
Our findings suggest a need for a shift away from a narrow focus on desirable cognitive end goals (e.g., adequate performance on abilities such as vocabulary or mathematics) and toward a simultaneous view across abilities that may have less intrinsic interest but are essential in their capacity to support successful development. For example, skills such as processing speed or working memory may be less important in isolation but may be coupled to other cognitive skills (Kail, 2007), which in turn may affect later life socioeconomic outcomes. In other words, to facilitate early detection and possibly even effective intervention, it may pay off to focus on abilities that have the strongest coupling strengths rather than solely on outcomes that are currently below some desirable threshold. For example, Quinn, Wagner, Petscher, and Lopez (2015) used dynamic models to show that vocabulary was a leading indicator of gains in reading comprehension but not vice versa. Such a finding offers insight into the causal pathways underlying reading difficulties in children and can inform appropriate interventions. Similarly, disruptions to typical development were reported by Ferrer, Shaywitz, Holahan, Marchione, and Shaywitz (2010), who observed that within a subgroup with dyslexia (or “persistently poor readers,” p. 94), the coupling between IQ and reading ability observed in typical groups was absent. This not only suggests a possible mechanism for developmental disorders but also shows how multivariate longitudinal models can allow for early detection of developmental challenges that are likely to self-reinforce over time.
Although we compared various developmental models and quantified longitudinal coupling, our research has certain limitations. First and foremost, we focused on two cognitive subtests alone, which yielded a relatively simplistic
An additional challenge with repeated measures data is the improvement in test scores due to practice effects, which may inflate developmental gains or attenuate age-related decline (Rabbitt, Diggle, Smith, Holland, & McInnes, 2001; Salthouse & Tucker-Drob, 2008). Although, in our sample, practice effects may have led to greater increases in scores between Time 1 and Time 2, it is unlikely that these effects affected our conclusions regarding mutualism. Such practice effects would lead to an increase in test scores that is a combination of true (developmental) gains and increases due to practice effects (although see Lövdén, Ghisletta, & Lindenberger, 2004, on the interpretation of practice effects). Notably, if one interprets the gains between Time 1 and Time 2 as a combination of “true” gains and practice effects, this would entail an underestimate of the mutualism effect (as the effect size reflects the prediction of the total gains rather than the non-practice-related gains). In principle, a sufficiently large number of time points spaced at unequal retest intervals would allow for a decomposition of retest effects, but both practical difficulties and the inherent collinearity of retest occasions with time intervals have proved methodologically challenging (Hoffman, Hofer, & Sliwinski, 2011).
Finally, we observed our effects in adolescents and young adults, which limited the generalizability of our observations to this developmental period alone. We hypothesize that the coupling effects we observed are likely to be stronger earlier in life and the self-feedback parameters weaker, as developmental change in higher cognitive abilities is most rapid during pre- and early adolescence. Considering these effects at the other end of the life span yields several intriguing questions. It is conceivable that mutualism occurs only during early development, with other processes and mechanisms taking over after initial peaks are reached. However, we suggest that studying later life decline from the perspective of mutualism might prove a promising avenue for future work. If dynamic coupling is crucial for maintenance of cognitive abilities in later life, this may explain why declines are often strongly correlated (see Ghisletta & Lindenberger, 2003; Tucker-Drob, 2011, for further exploration of this hypothesis). Using large longitudinal cohorts and similar tests across the entire life span will allow for the investigation of possible “regime changes” within the same cohort.
Future work should study multiwave, multidomain cognitive data using principled model-selection methods to better capture the underlying dynamics of cognitive development. Data of high temporal resolution would allow one to move beyond group-level dynamics of individual differences to the ultimate goal, namely that of estimating individual differences in intraindividual dynamics over time. The investigation of individual coupling parameters across domains and across the life span is likely to yield a wealth of information on cognitive development in health and disease. The recent convergence of novel modeling techniques, large-scale data-gathering ability via tools such as smartphones, and the integration of behavioral data sets with data from neural and genetic sources of evidence together promise to provide new insight into some of the most elusive, yet fundamental, questions in cognitive psychology.
Supplemental Material
Kievit_Supplemental_Material_rev – Supplemental material for Mutualistic Coupling Between Vocabulary and Reasoning Supports Cognitive Development During Late Adolescence and Early Adulthood
Supplemental material, Kievit_Supplemental_Material_rev for Mutualistic Coupling Between Vocabulary and Reasoning Supports Cognitive Development During Late Adolescence and Early Adulthood by Rogier A. Kievit, Ulman Lindenberger, Ian M. Goodyer, Peter B. Jones, Peter Fonagy, Edward T. Bullmore, and Raymond J. Dolan in Psychological Science
Supplemental Material
OpenPracticesDisclosure_Kievit_corrigendum – Supplemental material for Mutualistic Coupling Between Vocabulary and Reasoning Supports Cognitive Development During Late Adolescence and Early Adulthood
Supplemental material, OpenPracticesDisclosure_Kievit_corrigendum for Mutualistic Coupling Between Vocabulary and Reasoning Supports Cognitive Development During Late Adolescence and Early Adulthood by Rogier A. Kievit, Ulman Lindenberger, Ian M. Goodyer, Peter B. Jones, Peter Fonagy, Edward T. Bullmore, and Raymond J. Dolan in Psychological Science
