Abstract
The Woodcock–Johnson V Tests of Cognitive Abilities (WJ V COG; McGrew et al., 2025) is a comprehensive assessment of cognitive abilities for use with individuals aged four through adulthood. The WJ V COG is co-normed with tests of academic skills, oral language, and other cognitive and linguistic abilities related to academic achievement. Unlike previous editions, administration of the WJ V COG is entirely digital. New tests are featured in both the standard and extended test sets. Scoring takes place through the Riverside Insights online platform, and score interpretation is similar to previous editions of the WJ COG.
Specific Description
The WJ V retains the Cattell–Horn–Carroll (CHC)-based framework of its predecessors, with updates to align with current theory (Schneider & McGrew, 2018). The eight broad ability clusters assessed by the WJ V COG are largely consistent with those measured on the WJ IV. However, Auditory Processing (Ga) is no longer included as a broad ability cluster in the COG battery; it can instead be assessed using the Virtual Test Library (VTL). Additionally, Long-Term Retrieval (Glr) has been divided into Long-Term Storage (Gl) and Retrieval Fluency (Gr).
[Table: Tests Included in the WJ IV COG and WJ V COG. Note. ᵃIncluded in the WJ V COG but not included in the WJ IV COG.]
Composites
The General Intellectual Ability (GIA), now recommended for use with examinees ages six and above, includes the first eight tests measuring seven CHC broad abilities. The GIA is an unweighted average of these eight tests, a change from previous editions, in which tests were weighted by their theorized influence on general intelligence. To account for their outsized influence, Gc and Gf abilities are represented together in a second test, Verbal Analogies. Semantic Word Retrieval was previously termed “Retrieval Fluency” in the WJ IV Oral-Language battery. The Number Series and Phonological Processing tests are no longer included in the WJ V COG based on concerns that scores were suppressed for examinees with foundational academic skill deficits (LaForte et al., 2025).
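The computational change can be stated compactly. In the generic notation below (ours, not the manual's), the WJ IV GIA weighted its component tests by their theorized g loadings, whereas the WJ V GIA weights its eight tests equally:

$$
\mathrm{GIA}_{\text{WJ IV}} \propto \sum_{i} w_i\, T_i \quad \Bigl(\textstyle\sum_i w_i = 1,\ w_i \text{ varying by test}\Bigr)
\qquad \text{vs.} \qquad
\mathrm{GIA}_{\text{WJ V}} \propto \frac{1}{8}\sum_{i=1}^{8} T_i
$$

Here $T_i$ stands for the scaled score contributed by each component test; the proportionality signs acknowledge that the published composite is subsequently transformed to the standard score metric.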
As with the WJ IV COG, the Brief Intellectual Ability (BIA) index on the WJ V COG is intended to be a quick estimate or screener for overall cognition and includes three tests combined in an unweighted average. The BIA still includes Oral Vocabulary and Verbal Attention but now includes Matrices rather than Number Series.
The Gf-Gc Composite remains a four-test composite as it was on the WJ IV COG, though it only includes one test that was previously included in the composite on the WJ IV COG, Oral Vocabulary. Analysis-Synthesis is new to the Gf-Gc composite, and Matrices and Verbal Analogies are new to the WJ V COG itself. The examiner’s manual suggests that the Gf-Gc composite is a measure of
Administration
The standard easel administration has been replaced with a digital testing platform for the WJ V (Mather et al., 2025). The test administrator’s system is browser-based and hosted on riversideinsights.com; it can be accessed through a laptop or desktop computer. The examinee view is deployed from the examiner’s device. The test authors recommend that an iPad be used for the examinee, as iPads were used during data collection and timed tests may be affected if a different type of tablet is used. An internet connection is required throughout administration; there is currently no offline option.
Tests are selected by the examiner prior to launching the test portal. Examiners can create specific test sets and save them for common referral concerns and can rearrange the administration order. Examiners can also add additional tests during administration. After administration has begun, the examiner is presented with test administration instructions, such as allowable prompts, as previously shown in the test easels. Verbal directions are displayed on the examiner’s screen to read aloud. Some tasks include video examples on the examinee’s device.
Scoring System
With digital administration, answers are captured on the touchscreen device, basal and ceiling rules are automatically applied, and examinee responses are automatically scored. Basals, however, can be adjusted manually by the examiner. After the ceiling of a test is reached, the examiner is automatically prompted to begin the next test. At the time of this publication, the WJ V does not allow examiners to test the limits by administering items beyond the test ceiling. Upon test completion, age and grade equivalents are displayed. Standard scores are not provided until a report is run by the examiner.
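To illustrate the kind of logic the platform automates, the sketch below applies hypothetical basal and ceiling rules. The function, the rule lengths, and the return convention are our invention for illustration, not the WJ V’s actual criteria:

```python
def apply_basal_ceiling(responses, basal_n=6, ceiling_n=6):
    """Illustrative basal/ceiling logic (hypothetical rules).

    A basal is established after `basal_n` consecutive correct responses;
    testing stops once `ceiling_n` consecutive errors occur. `responses`
    is an ordered list of 1 (correct) / 0 (incorrect) item scores.
    Returns (basal_established, index_of_last_item_administered).
    """
    correct_run = incorrect_run = 0
    basal = False
    for i, r in enumerate(responses):
        if r == 1:
            correct_run += 1
            incorrect_run = 0
            if correct_run >= basal_n:
                basal = True
        else:
            incorrect_run += 1
            correct_run = 0
            if incorrect_run >= ceiling_n:
                return basal, i  # ceiling reached; stop testing here
    return basal, len(responses) - 1  # item pool exhausted

# Six correct then six incorrect: basal met, ceiling reached at index 11
print(apply_basal_ceiling([1] * 6 + [0] * 6))
```

A real implementation would also handle start points, reversal rules, and manual basal overrides, which the examiner can still apply on the WJ V platform.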
Interpretation
Interpretation of the WJ V COG continues to allow for the analysis of an individual’s strengths and weaknesses across domains and of discrepancies between expected and actual performance. In the previous edition, the methods for this interpretation style were termed variations and comparisons; comparison procedures is now used as the umbrella term for all “data-based score predictions” to reduce confusion for users (LaForte et al., 2025). The statistical procedures for making these score predictions remain consistent with previous versions; the authors state that they have been refined since their first introduction and are outlined in the technical manual. Consistent with modern practice, the WJ V now provides percentile base rates to indicate how unexpected a score is for an individual. Notably, the WJ V no longer allows for comparisons at the test level, as all comparison procedures use cluster scores.
Technical Adequacy
Test Construction
The WJ V was constructed following recommendations from the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) and the Guidelines for Technology-Based Assessment (International Test Commission & Association of Test Publishers, 2022). The technical manual outlines the steps taken for the user interface/user experience design, content adaptation, and test logic/workflow design. Visual Working Memory and Symbol Inhibition were developed as “digital native” tests.
Like previous editions, the WJ V COG is based on the Rasch model of measurement. Tests used either a dichotomous Rasch model, a Rasch partial-credit model, or a Rasch rating scale model (Symbol Inhibition only). Test items and examinees were placed onto an equal-interval logit scale and transformed to the W scale.
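For reference, the dichotomous Rasch model referred to above expresses the probability of a correct response as a function of the difference between person ability and item difficulty (standard Rasch notation, not drawn from the manual):

$$
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}
$$

where $\theta_p$ is the ability of person $p$ and $b_i$ is the difficulty of item $i$, both on the same logit scale. The partial-credit and rating scale models generalize this expression to items with more than two score categories.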
Norm Sample
Norm data were collected from 5,837 examinees between early 2022 and 2024. The sample was representative of the 2020 United States Census. The test developers conducted a simulation study to compare four multiple matrix sampling designs. The selected design defined subgroups based on geographic region, age, gender, race/ethnicity, and educational level. Therefore, not every examinee in the norm sample was administered every WJ V test; rather, each examinee was assigned to a test set. Examinee scores were differentially weighted to ensure the sampling variables reflected the population. Acknowledging that human ability traits typically are not distributed in a precise normal curve, and instead display positive skew, the test developers used half-norm distributions (above and below the median)
Item-Level Analysis
Individual test items were reviewed using several methods. Rasch outputs were considered for the gradient of
Differential item functioning (DIF) analyses were used to examine whether subgroups of examinees had more difficulty with specific test items. Subgroup analyses included sex (male and female), race (White and non-White), and ethnicity (Hispanic and not Hispanic). If an item showed a statistically significant DIF contrast between subgroups in any direction, the item was flagged for further review.
Items were reviewed by 13 experts, including educators, professors, and practitioners with different areas of specialization, who examined item content with a specific focus on bias and sensitivity. Test items were included on the WJ V only if they were not flagged during the bias and sensitivity review and if any observed DIF was determined to result from low response rates in a subgroup or from unexpected responses based on person-by-item residuals, rather than from a potential source of bias.
Reliability Evidence
To ensure precision, several types of reliability were examined. Marginal item response theory (IRT) reliability provided information about the internal consistency of each test. Reliability coefficients were calculated using conditional standard errors of measurement for all tests across age groups 4–5, 6–9, 10–14, 15–19, 20–49, and 50–80+.
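For context, reliability and the standard error of measurement are linked by a standard psychometric identity (generic notation; the WJ V’s conditional SEMs vary by score level rather than relying on a single score standard deviation $\sigma_x$):

$$
\mathrm{SEM} = \sigma_x \sqrt{1 - r_{xx}}
\quad \Longleftrightarrow \quad
r_{xx} = 1 - \frac{\mathrm{SEM}^2}{\sigma_x^2}
$$

so reporting conditional SEMs across tests and age bands is equivalent to characterizing precision, and hence reliability, at each level of performance.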
Reliability coefficients for the measured age groups were at or above
Test–retest reliability, which reflects the consistency of scores when the same examinee takes a test on two occasions, was examined for Semantic Word Retrieval and Phonemic Word Retrieval only, with reliability coefficients at or above 0.85 across ages. Alternate-form reliability, which considers the correlation between two forms of a test, was assessed for three of the five tests that have a Form A and Form B. Reliability coefficients for Number-Pattern Matching and Letter-Pattern Matching were above 0.85 across age groups. Coefficients were slightly lower for Symbol Inhibition, at 0.81 for the age 4–19 sample and 0.78 for the adult sample.
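Both test–retest and alternate-form coefficients of this kind are, at base, Pearson correlations between paired scores. A minimal sketch, using invented score pairs purely for illustration:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between paired scores (e.g., Form A vs. Form B)."""
    mx, my = mean(x), mean(y)
    # Sample covariance of the paired scores
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical paired standard scores for two alternate forms
form_a = [95, 102, 88, 110, 100, 97, 105, 92]
form_b = [93, 104, 90, 108, 101, 95, 107, 94]
print(round(pearson_r(form_a, form_b), 2))  # prints 0.96
```

With these hypothetical pairs the coefficient is about 0.96, in the range reported for the pattern-matching tests; real alternate-form studies would of course use the published norming samples.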
Validity Evidence
A notable strength of the WJ V is the volume of validity studies, which exceeds that of typical test manuals. Analyses confirmed the CHC model’s hierarchy of cognitive abilities, demonstrating developmental changes in both broad and narrow factors across age groups. Data emphasizing the general factor (
The authors presented structural evidence using a split-sample (separate model development and model validation samples) three-stage analysis. Multiple exploratory procedures, including cluster analysis, exploratory factor analysis, multidimensional scaling, and psychometric network analysis, were applied to six age-differentiated model development samples. The synthesis of the four exploratory methods suggested 11 CHC broad abilities. Exploratory model development and evaluation procedures were then used to refine three categories of CHC models (i.e., traditional hierarchical
Criterion-related validity was evidenced by moderate to strong correlations between the GIA and total scores from other cognitive measures (e.g., WISC-V, SB5, DAS-II) and within matched CHC domains (e.g., Gf). Cognitive–achievement correlations aligned with theory (e.g., Gf and math problem-solving). Of note, correlations between the GIA and achievement are lower for the WJ V than for the WJ IV. However, the authors expected this change given the removal of Ga and Gq tests from the standard cognitive battery and view it as an improvement in the effort to minimize predictor–outcome contamination (LaForte et al., 2025). Clinical validity was explored using data from individuals with eight different diagnoses, providing further descriptive support. Additional information can be found in the WJ V Technical Abstract available through Riverside Insights.
Summary and Recommendations
The WJ V COG is a modern and clinically valuable assessment tool. Its digital administration is designed to decrease common administration errors, such as violations of basal/ceiling and test-by-pages rules (Ramos et al., 2009; Spenceley et al., 2017), and to facilitate automatic scoring. The updated battery includes five new tests and removes three. We feel that the removal of Phonological Processing is an improvement over the WJ IV COG, in which Ga tests were often challenging for individuals with language-based learning disabilities.
The tests comprising the CHC clusters in the WJ V COG have largely changed compared to the WJ IV COG, though the underlying CHC theory remains similar. For this and a variety of other reasons (the Flynn effect, differences in norm groups, item content, etc.), the authors suggest “extreme caution” when comparing an individual’s WJ V scores to WJ IV scores, which some clinicians may find limiting for their practice (LaForte et al., 2025). However, we feel that these limitations are general to the nature of intelligence tests and not exclusive to the WJ V COG.
Recommended materials for the WJ V COG include a laptop and iPad with internet connection. Test sets can be easily created, saved, and revised based on referral questions. Like any test, administration requires practice to build fluency. However, we feel the Riverside Insights platform is user-friendly and is an improvement in the examiner-examinee interface compared to other cognitive tests that currently offer digital administration.
Some clinicians may find the digital-only option limiting, for example, in settings with unreliable internet or for use with early childhood or aging populations. Given that the GIA is not recommended for children under six and that reliability estimates for ages 4–5 are relatively low, other assessments are likely better suited to obtaining an overall estimate of cognitive functioning for young children. Validity evidence involved traditional approaches using factor analysis, as well as newer techniques such as exploratory graph analysis. The Technical Manual features an impressive number of techniques and studies used to consider item fairness, reliability, internal validity, and external validity. Overall, we believe the robust psychometric evidence of the WJ V COG, alongside its engaging digital design, makes it an exciting option for professionals in clinical and educational settings.
Footnotes
Ethical Considerations
IRB or ethical board approval was not sought or needed based on the article type (Test Review) and because this manuscript does not include research on human subjects.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
