Abstract
Introduction
The study of the mind is one of the main research topics in cognitive science (Friedenberg & Silverman, 2006), and the purpose of cognitive-scientific study is to learn the mechanisms that underlie cognitive performance (Sternberg, 1986). Understanding these brain mechanisms requires a comprehensive and efficient tool for assessing cognitive abilities (Gur et al., 2010), along with identifying and understanding the structure of human cognition (Wasserman, 2019). Even when the domains of cognitive ability are assumed to be identified, however, developing a practical assessment tool remains a challenge because of the lack of consensus between theoretical and practical aspects of intelligence (Canivez & Youngstrom, 2019). Despite this challenge, many tools and scales have been developed on the strength of psychometric theories (Beaujean, 2015; Bruijnen et al., 2020; Caemmerera et al., 2020; Dombrowski et al., 2021; Geisinger, 2019; McDicken et al., 2019). Based on the Cattell-Horn-Carroll (CHC; Bryan & Mayer, 2020; Flanagan et al., 2013; McGrew, 2009; Schneider & McGrew, 2012, 2018) theory of cognitive abilities, this study demonstrates how an innovative psychometric method, automatic item generation (AIG; Drasgow et al., 2006; Embretson & Yang, 2006; Gierl & Haladyna, 2013; Gierl et al., 2021; Irvine & Kyllonen, 2002), helps to develop a new, practical assessment tool for cognitive ability.
Cognitive ability tests are often used as repeated measures in longitudinal studies, although serial testing has an inherent limitation known as the “practice effect” (Kaufman, 1990; Temkin et al., 1999). For example, in many clinical settings, serial cognitive tests are administered to investigate the effect of treatments on changes in cognitive abilities (Cerulla et al., 2019; Elman et al., 2018; D. M. Jacobs et al., 2017) and to make decisions about disease progression or recovery (Beglinger et al., 2005). Serial cognitive assessment is also essential when new pharmaceutical treatments for conditions influencing cognition are developed (Beglinger et al., 2005). A similar demand exists in educational settings, particularly in longitudinal studies. There too, repeated testing may produce a practice effect, defined as a change in a person’s test performance on re-testing (Hausknecht et al., 2007), driven by examinees’ memory of and familiarity with items.
Although individually administered cognitive tests provide rich information about examinees, group-based tests have received considerable attention in both clinical and educational settings, especially as diagnostic tests for screening purposes. Clinicians consider group tests because short clinical test intervals leave little time to screen a patient’s current ability (Sparrow & Davis, 2000). In educational settings, individually administered ability tests, though very helpful for diagnostic purposes, have shortcomings for talent identification (Lohman & Gambrell, 2012). In particular, they raise an equity issue: the high cost of individualized assessment limits which children can be assessed, so only those who can afford to pay for tests are given the opportunity for testing or retesting (Renzulli, 2005). The nonverbal battery of the Cognitive Abilities Test (CogAT; Lohman, 2011) and the Naglieri Nonverbal Ability Test (NNAT; Naglieri, 2008), both administered as group tests, were developed to identify gifted students in educational settings, but the CogAT nonverbal test suffers from practice effects (Lohman & Gambrell, 2012). Thus, there has been a demand for a group-based measure of nonverbal cognitive ability that reduces practice effects.
To address the issues of practice effects and group administration, we developed a new measure of cognitive ability using automatic item generation (AIG; Drasgow et al., 2006; Embretson & Yang, 2006; Gierl & Haladyna, 2013; Irvine & Kyllonen, 2002). AIG is a method of producing computerized item models that generate item instances with similar psychometric properties, such as item difficulty. The approach is therefore well suited to reducing practice effects. Compared to the traditional (manual, one-item-at-a-time) item writing process, the AIG-based method is an effective and efficient assessment development process (Wainer, 2002). AIG also offers other benefits: it increases test security by minimizing item exposure, reduces the unit cost per assessment item, and improves test-retest reliability in repeated-measures situations, because it can efficiently generate isomorphic items (e.g., at similar difficulty levels) from item models (Choi, 2018; Wainer, 2002).
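The item-model idea can be sketched in a few lines of Python. This is a hypothetical number-analogy model of our own, not the software the authors used: one parameterized template yields many surface-different but isomorphic instances that share one solution rule.

```python
import random

def number_analogy_model(rng, k_range=(2, 4), base_range=(2, 9)):
    """A hypothetical item model: (a : a*k) :: (b : ?).

    Every instance shares the same solution logic (multiply by k), so the
    generated items are isomorphic: structurally identical, surface-different.
    """
    k = rng.randint(*k_range)
    a, b = rng.sample(range(*base_range), 2)
    key = b * k
    # Distractors also follow fixed rules, keeping their plausibility stable.
    options = sorted({key, b + k, key + 1, a * k})
    return {"stem": f"{a} : {a * k}  ::  {b} : ?", "options": options, "key": key}

rng = random.Random(0)
instances = [number_analogy_model(rng) for _ in range(3)]
```

Because every instance is scored by the same rule, the instances can substitute for one another across repeated administrations, which is the property the text describes as isomorphism.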
In addition, we applied the Cattell-Horn-Carroll theory of cognitive abilities, a comprehensive and integrated model that precisely specifies the range of cognitive abilities. Accordingly, the objectives of the current study are twofold: (1) to demonstrate how the new nonverbal cognitive ability test was developed using AIG, and (2) to examine its psychometric properties. To meet these objectives, we hypothesize that the new CHC-based measure of cognitive ability is valid and reliable, and we test this hypothesis via empirical data analysis.
Method
Cognitive Domains in the Measure of Cognitive Ability
To address the demand for group-administered nonverbal cognitive measures, we developed a new measure of cognitive ability (the MOCA), including item models that measure fluid reasoning (Gf) and visual processing (Gv) in the CHC theory. The notion of an item model in AIG restructures the process of establishing guidelines and standards in traditional item writing through computer coding. AIG allows item/test writers to specify the constructs to be measured a priori using item/test design principles (Bormuth, 1970; Mislevy, 2018; Thorndike, 1971). This matters because the cognitive domains have been identified over the past century, and their structure is well understood and widely accepted within the CHC theory.
In the MOCA, fluid reasoning (Gf) refers to the ability to solve unfamiliar problems without relying on previous knowledge; the Gf of the MOCA is specified with three sub-domains: (1) inductive reasoning (IR), (2) general sequential reasoning (SR), also known as deductive reasoning, and (3) quantitative reasoning (QR). Visual processing (Gv) refers to the ability to use visual imagery to solve problems; the Gv of the MOCA is measured by examining the ability to use simulated mental imagery. For example, the MOCA measures Gv by asking examinees to simulate how the movement of one figure affects another, or how figures might look from a different angle, as discussed in detail in the Results section.
Item Models Created in the Measure of Cognitive Ability (MOCA)
Initially, we developed 100 item models covering the two broad abilities, Gf and Gv, which generate item instances measuring four sub-domains: inductive reasoning (IR), sequential reasoning (SR), and quantitative reasoning (QR) within fluid reasoning (Gf), and visualization (Vz) within visual processing (Gv). We then assembled two forms of the MOCA test, each consisting of 36 item models. Since 18 item models are shared between the forms as common item models, 54 of the 100 item models were used in total. Table 1 summarizes the structure of each form with the difficulty levels of the item models. Across both forms, there are 36 unique item models (18 per form) and 18 common item models. In terms of cognitive domains, there are 45 item models in Gf and 9 in Gv. All item models were created using the CAFA AIG software (Choi & Zhang, 2019).
Form of the MOCA Based on the Item Models in the Sub-Domains.
Empirical Data
Data used in this study, approved by the university IRB (with both students’ and parents’ consent), were obtained from the MOCA administered at two time points with a 2-week interval. Because the test is nonverbal, teachers only helped students access their accounts and take the MOCA online. A total of 1,198 participants from fifth graders (
Psychometric Properties of the MOCA
Confirmatory factor analysis
Based on the structure of the cognitive domains described in the CHC theory, we conducted confirmatory factor analyses (CFAs) to examine whether the structure is supported by the empirical data. We thus fit two CFAs to the two datasets obtained from the two forms separately. At Time 1, there are 602 students in the first dataset, obtained from the first form (Form A), and 596 students in the second dataset, from the second form (Form B).
Model evaluation and estimation
In the CFA, we applied weighted least squares estimation with the chi-square test of the exact-fit hypothesis in Mplus 8.4 (Muthén & Muthén, 1998-2017; see the Supplemental Material). We then evaluated the hypothesized models using approximate fit indices, including the root mean square error of approximation (RMSEA), comparative fit index (CFI), and standardized root mean square residual (SRMR), with the following criteria for
Factor reliability
As a psychometric property indicating the degree to which factor scores are precise, the reliability of each factor was examined using the factor reliability (
where
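A commonly used formula for factor (composite) reliability is coefficient omega; the sketch below is our own illustration of that standard formula, under the assumption of standardized loadings, and is not necessarily the exact expression the authors used.

```python
def composite_reliability(loadings):
    """Coefficient omega for one factor, given standardized loadings.

    With standardized indicators, each residual variance is 1 - lambda**2, so
    omega = (sum lambda)**2 / ((sum lambda)**2 + sum residual variances).
    """
    s = sum(loadings)
    resid = sum(1.0 - lam ** 2 for lam in loadings)
    return s ** 2 / (s ** 2 + resid)

# Four indicators each loading .8 on the factor give omega of about .88.
omega = composite_reliability([0.8, 0.8, 0.8, 0.8])
```

Higher and more uniform loadings push omega toward 1, which is why reliabilities above the .70 criterion (Nunnally & Bernstein, 1994) accompany strong loadings.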
Construct validity
As a further psychometric evaluation, we examined construct validity by obtaining convergent and discriminant validity, in addition to showing that factor loadings are greater than .45 (Brown, 2006, 2015). Convergent and discriminant validity indicate, respectively, how accurately the underlying constructs are measured and how distinct they are from one another. As noted in Brown (2015), CFA results provide evidence of how strongly the indicators of a latent variable are interrelated (convergent validity) and how weakly the latent variables are correlated with each other (discriminant validity). Convergent validity was supported by factor reliabilities greater than .70 (Nunnally & Bernstein, 1994), and discriminant validity by factor correlations lower than .80 (Brown, 2015).
Results
Measure of Cognitive Ability (MOCA)
Item models in inductive reasoning (IR)
In the MOCA, as the key sub-domain of fluid reasoning (Gf), IR is defined as “the ability to discern rules and patterns in what is observed” (Schneider & Newman, 2015). IR ability is measured in two sub-tests, figure classification (FC) and figure matrices (FM), depicted in Figure 1. One item instance from an FC item model is shown in Figure 2a with a set of figures as a

Cognitive domains of two broad abilities (Gf and Gv) and four narrow abilities (IR, SR, QR, and Vz) in the MOCA.

Two types of item instances for the inductive reasoning with the fluid reasoning (Gf): (a) an item in figure classification and (b) an item in figure matrices.

Cognitive model of an item (Figure 2a) in figure classification.
In Figure 2b, we demonstrate an item instance from an FM item model. Examinees are asked to find the proper object for the lower right-hand corner. To solve the problem, examinees must induce the following relations: (1) the shape changes from row to row, (2) two different objects move toward the center from column to column, and (3) the two objects end up at the center. These relations can be induced from A1 and A2 in Figure 2b. If examinees understand the three relations, they can select the proper image for A3 from the options, which are generated from the alternatives in the item model. Under the category of cognition of figural relations, P. I. Jacobs and Vandeventer (1972) pointed out that item models of this kind for IR ability can be generated using principles of cognitive test items, and that changing patterns and objects helps reduce practice effects. The procedure is shown in the flowchart in Figure 4.

Cognitive model of an item (Figure 2b) in figure matrices.
Item models in quantitative reasoning (QR)
QR, referring to “the ability to reason with quantities, mathematical relations, and operators” (Schneider & McGrew, 2018), is assessed in two further sub-tests of the MOCA, Number Analogies (NA) and Number Puzzles (NP). For example, as shown in Figure 5a for the NA sub-test, examinees must identify the relation between the two given pairs in A1 and A2 and then select the most appropriate number for the question mark (?) in A from the options below. In the NP item presented in Figure 5b, a system of equations with variables denoted by the question mark (?) and the diamond (◊) is provided. Unlike the items discussed before, examinees first solve the equation in A1 by making both sides of the equation equal. After identifying ◊ (=16), examinees substitute it into equation A, which becomes 48 ÷ ? = 16, and then select the number from the options that makes both sides of equation A equal. The NP sub-test thus differs from NA, FM, and FC in that examinees must apply a backward rather than a forward approach.
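The two-step backward procedure for the NP example above (solve A1 for ◊ = 16, then solve 48 ÷ ? = 16) can be written out directly; the function name and argument layout here are our own illustration, not part of the MOCA.

```python
def solve_number_puzzle(diamond_value, dividend):
    """Backward procedure for the Number Puzzles example in the text.

    Step 1: equation A1 fixes the diamond (its exact form is not shown
    here, so its solved value is passed in directly).
    Step 2: substitute into equation A, dividend / ? = diamond, and
    solve for the question mark.
    """
    question_mark = dividend // diamond_value
    return question_mark

print(solve_number_puzzle(16, 48))  # 48 / ? = 16  ->  ? = 3
```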

Two types of item instances for the quantitative reasoning with the fluid reasoning (Gf): (a) an item in number analogies and (b) an item in number puzzle.
Item models in sequential reasoning (SR)
SR, defined as “the ability to reason logically using known premises and principles” (Schneider & McGrew, 2012), is assessed in the Number Series (NS) sub-test of the MOCA, where a series of numbers is presented. To solve an NS problem, examinees must identify the relation among the numbers in the series and select from the options below which number should follow. An example NS item is shown in Figure 6a.
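An NS item model of the kind described can be sketched as follows. This is a hypothetical arithmetic-progression model of our own, not the authors' actual item model, but it shows how one rule yields a family of series items.

```python
import random

def number_series_model(rng, length=4):
    """A hypothetical Number Series item model: an arithmetic progression
    whose common difference the examinee must infer to supply the next term.
    """
    a = rng.randint(1, 10)   # starting value
    d = rng.randint(2, 9)    # common difference to be inferred
    series = [a + i * d for i in range(length)]
    key = a + length * d
    # Distractors reuse plausible wrong rules: off-by-one, a doubled step,
    # or repeating the last shown term.
    options = sorted({key, key - 1, key + d, series[-1]})
    return {"series": series, "options": options, "key": key}
```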

One item instance for the number series in the fluid reasoning (Gf) and one item instance for the visualization in the visual processing (Gv): (a) an item in number series and (b) an item in visualization.
Item models in visualization (Vz)
Vz within visual processing (Gv) is defined as “the ability to perceive complex patterns and mentally simulate how they might look when transformed (e.g., rotated, twisted, inverted, changed in size, partially obscured)” (Schneider & McGrew, 2018, p. 126). It is the key ability of Gv and is measured in the Paper Folding (PF) sub-test, where a series of figures in A is presented as shown in Figure 6b. The figures show a square or ⌂-shaped piece of paper that is folded and then punched with holes in circle, triangle, clover, or other shapes. A one-headed arrow indicates how the paper is folded. Examinees must visualize what figure A looks like when the paper is unfolded, and select the most appropriate figure from the possible answers.
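The mental simulation a PF item requires amounts to reflecting hole positions across the fold line. A minimal sketch of that transformation (the coordinates and fold axis are our own illustration):

```python
def unfold_holes(holes, fold_x):
    """Mirror punched holes across a vertical fold line at x = fold_x.

    A hole punched through the folded paper appears twice after unfolding:
    at its own position and at its reflection across the fold line.
    """
    unfolded = set(holes)
    for x, y in holes:
        unfolded.add((2 * fold_x - x, y))
    return sorted(unfolded)

# Paper folded in half at x = 2, one hole punched at (1, 1):
print(unfold_holes([(1, 1)], fold_x=2))  # [(1, 1), (3, 1)]
```

A hole lying exactly on the fold line maps onto itself, which is why such holes appear only once in the unfolded answer figure.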
Reliabilities and Validity Based on CFA of MOCA
Model evaluation
We fit the second-order factor model depicted in Figure 7 (with the items in Figure 1) to the two datasets collected from the two forms of the MOCA at Time 1. The hierarchical structure distinguishes two levels: the sub-domains IR, QR, SR, and Vz are nested within Gf (IR, QR, and SR) and Gv (Vz), where Gv serves as a phantom construct of Vz. We obtained good fit indices for both datasets: RMSEA = 0.032 and 0.026, CFI = 0.969 and 0.977, and SRMR = 0.072 and 0.062 for Form A and Form B, respectively.

Second-order factor structure for the cognitive domains in the MOCA.
Factor reliability and validity
We examined the five factor reliabilities for IR (.938 and .877), SR (.899 and .934), QR (.962 and .960), Vz (.916 and .923), and Gf (.808 and .917) for Form A and Form B, respectively, where Gv is identical to Vz. These factor reliabilities, together with factor loadings greater than .45 on all four sub-domains (except item 2 of QR in Form B, .429; item 27 of IR in Form A, .317; and the IR loading on Gf in Form A, .350), provided evidence of convergent validity. Factor correlations between Gf and Gv were .685 for Form A and .642 for Form B, both lower than .80, providing evidence of discriminant validity. Thus, based on the CFA results for Form A and Form B, the construct validity of the MOCA was confirmed.
Item Analyses Based on Two Parameter Logistic (2PL) Model for the MOCA
Item parameter estimates of Form A and Form B
The hierarchical factor structure of the MOCA was confirmed via CFA. Next, we examined the conformity between the a priori difficulty levels of the item models from AIG and the difficulty levels of the item instances estimated via item response theory (IRT). When selecting the 54 item models for the two MOCA forms, Form A and Form B, we considered three difficulty classes: low, medium, and high. Because of the complexity of the answering procedures shown in Figures 3 and 4, the three classes need not map onto clean cut-offs in the difficulties estimated via IRT. Comparing the three classes, we found that items in the low class have significantly lower estimated difficulties than items in the medium or high classes (
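For reference, the 2PL model behind these estimates gives the probability of a correct response as a logistic function of ability θ, item discrimination a, and item difficulty b:

```python
import math

def p_correct_2pl(theta, a, b):
    """Two-parameter logistic (2PL) IRT model: the probability that an
    examinee with ability theta answers correctly an item with
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability is exactly .5; a larger b (a harder item)
# shifts the whole response curve to the right.
```

This is why a lower estimated b for the low-difficulty class means those items are easier at every ability level.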
Difficulty Parameter Estimates of the MOCA Items Using 2PL IRT Model.
Discriminant Parameter Estimates of the MOCA Items Using 2PL IRT Model.
Test-Retest Reliabilities of Narrow Domains at Form A and Form B.
Test-retest reliability based on two time points
Using the sample of examinees who took the MOCA at both Time 1 and Time 2, we examined test-retest reliability using Pearson correlation coefficients based on the narrow-ability factor scores of IR, QR, SR, and Vz. All correlations of narrow abilities between Time 1 and Time 2 were significant (

Correlations of narrow abilities across time points.
Practice Effects
Based on the narrow abilities of IR, QR, SR, and Vz, we examined the mean differences over time using dependent sample t tests.
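The dependent-samples t statistic on the Time 2 − Time 1 difference scores follows the standard formula; this is a generic sketch, not the authors' analysis code.

```python
import math
from statistics import mean, stdev

def paired_t(time1, time2):
    """Dependent-samples t statistic for two administrations.

    t = mean(d) / (sd(d) / sqrt(n)) for difference scores d; a t near
    zero indicates little mean change across administrations, i.e.,
    little evidence of a practice effect.
    """
    diffs = [b - a for a, b in zip(time1, time2)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```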
Practice Effects via Dependent Sample t Tests.
Discussion
The MOCA is not an exhaustive measure of all the cognitive domains depicted in Figure 1; it measures four narrow abilities, inductive reasoning (IR), sequential reasoning (SR), quantitative reasoning (QR), and visualization (Vz), nested within two broad abilities, fluid reasoning (Gf) and visual processing (Gv). These four domains are essential in educational settings, however, which is why many other group-administered measures also include them. Unlike those measures, the MOCA was created using AIG, which is beneficial in educational settings because the test can be used for repeated measurement.
The MOCA has been used in educational settings to understand students’ cognitive strengths and weaknesses, and it will also be launched in clinical settings, including an atopic dermatitis school program (Jang et al., 2015). Although Elbin et al. (2019) recently showed that the Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT), a computerized neurocognitive battery for short-term serial assessment of neurocognitive functioning, is suitable for repeated administration, the use of that battery is limited to clinical settings. The MOCA also covers broad age groups because it is a nonverbal cognitive test.
In this study, the MOCA has been validated psychometrically, including item analysis and an examination of practice effects. The reliabilities of its cognitive domains and the hierarchical structure of its six subtests were also validated via CFA. These reliability and validity results carry extra weight because the properties were established at the level of the item models of automatic item generation (AIG) rather than at the level of individual item instances. In other words, the MOCA developed using AIG is psychometrically sound at the item-model level and is therefore applicable to intensive longitudinal data analysis, since massive item production is possible thanks to the isomorphism of item instances generated from the item models. Thus, every MOCA administration can differ for each individual yet conform to the same item-model set in measuring cognitive ability.
With these psychometric properties, the MOCA can be used with broader age ranges of students, and its capability as a group-administered measure contributes to educational equity in access to cognitive ability testing. As an AIG-implemented measure, the MOCA not only maintains item security, preventing practice effects from acting as construct-irrelevant variance, but also yields more economical tests for students by generating many item instances from each item model. As we go through the COVID-19 pandemic, educators and parents are concerned about disparities in students’ academic achievement, which is associated with cognitive ability. The MOCA could serve as a tool to examine the development of students’ cognitive ability.
Limitations
In this study, the number of assessments was limited to two time points; practice effects should also be examined with intensive longitudinal data. In addition, linking in the item analysis was ignored because this study focused on the psychometrics of reliability and validity rather than scale development. In future research, we will examine the status of cognition by incorporating anchor items. Although we considered two forms of the MOCA, they have not been developed as age- and grade-based assessment tools; in educational settings, we will consider MOCA tests tailored by grade level and gifted/special education status, and in clinical settings, tailored MOCA tests consisting of different sets of cognitive domains. As for any new measure of cognitive ability, it is also important to explore concurrent validity with other cognitive measures. However, because the MOCA is unique in implementing AIG, a concurrent validity study cannot simply examine associations with other cognitive measures; it requires adjustments between AIG and traditional item generation. We plan to study concurrent validity in future work.
Concluding Remarks
The MOCA, based on the CHC theory and the AIG method, was developed as a new measure of cognitive ability suited to studies requiring repeated measures of cognition. This psychometrically sound measure will contribute to cognitive science by giving researchers a tool to measure cognition.
Supplemental Material
Supplemental material, sj-docx-1-sgo-10.1177_21582440221095016, for “Development of a New Measure of Cognitive Ability Using Automatic Item Generation and Its Psychometric Properties” by Ji Hoon Ryoo, Sunhee Park, Hongwook Suh, Jaehwa Choi and Jongkyum Kwon in SAGE Open.
Footnotes
Declaration of Conflicting Interests
Funding
Institutional Review Board (IRB)
Supplemental Material
References
