Abstract
The Woodcock–Johnson V Tests of Cognitive Abilities (WJ V COG; McGrew et al., 2025) is a comprehensive assessment of cognitive abilities for use with individuals aged four through adulthood. The WJ V COG is co-normed with tests of academic skills, oral language, and other cognitive and linguistic abilities related to academic achievement. Unlike previous editions, administration of the WJ V COG is entirely digital. New tests are featured in both the standard and extended test sets. Scoring takes place through the Riverside Insights online platform, and score interpretation is similar to previous editions of the WJ COG.
Specific Description
The WJ V retains the Cattell–Horn–Carroll (CHC)-based framework of its predecessors, with updates to align with current theory (Schneider & McGrew, 2018). The eight broad ability clusters assessed by the WJ V COG are largely consistent with those measured on the WJ IV. However, Auditory Processing (Ga) is no longer included as a broad ability cluster in the COG battery; it can instead be assessed using the Virtual Test Library (VTL). Additionally, Long-Term Retrieval (Glr) has been divided into Long-Term Storage (Gl) and Retrieval Fluency (Gr).
[Table: Tests Included in the WJ IV COG and WJ V COG. Note. ᵃIncluded in the WJ V COG but not included in the WJ IV COG.]
Composites
The General Intellectual Ability (GIA), now recommended for use with examinees ages six and above, includes the first eight tests measuring seven CHC broad abilities. The GIA is an unweighted average of these eight tests, a change from previous editions, in which tests were weighted by their theorized influence on general intelligence. To account for their outsized influence, Gc and Gf abilities are represented together in a second test, Verbal Analogies. Semantic Word Retrieval was previously termed “Retrieval Fluency” in the WJ IV Oral-Language battery. The Number Series and Phonological Processing tests are no longer included in the WJ V COG based on concerns that scores were suppressed for examinees with foundational academic skill deficits (LaForte et al., 2025).
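The computational change can be stated compactly. In the generic notation below (ours, not the manual's), the WJ IV GIA weighted its component tests by their theorized g loadings, whereas the WJ V GIA weights its eight tests equally:

$$
\mathrm{GIA}_{\text{WJ IV}} \propto \sum_{i} w_i\, T_i \quad \Bigl(\textstyle\sum_i w_i = 1,\ w_i \text{ varying by test}\Bigr)
\qquad \text{vs.} \qquad
\mathrm{GIA}_{\text{WJ V}} \propto \frac{1}{8}\sum_{i=1}^{8} T_i
$$

Here $T_i$ stands for the scaled score contributed by each component test; the proportionality signs acknowledge that the published composite is subsequently transformed to the standard score metric.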
As with the WJ IV COG, the Brief Intellectual Ability (BIA) index on the WJ V COG is intended to be a quick estimate or screener for overall cognition and includes three tests combined in an unweighted average. The BIA still includes Oral Vocabulary and Verbal Attention but now includes Matrices rather than Number Series.
The Gf-Gc Composite remains a four-test composite as it was on the WJ IV COG, though it only includes one test that was previously included in the composite on the WJ IV COG, Oral Vocabulary. Analysis-Synthesis is new to the Gf-Gc composite, and Matrices and Verbal Analogies are new to the WJ V COG itself. The examiner’s manual suggests that the Gf-Gc composite is a measure of
Administration
The standard easel administration has been replaced with a digital testing platform for the WJ V (Mather et al., 2025). The test administrator’s system is browser-based and hosted on riversideinsights.com; it can be accessed through a laptop or desktop computer. The examinee view is deployed from the examiner’s device. The test authors recommend that an iPad be used for the examinee, as iPads were used during data collection and timed tests may be affected if a different type of tablet is used. An internet connection is required throughout administration; there is currently no offline option.
Tests are selected by the examiner prior to launching the test portal. Examiners can create specific test sets and save them for common referral concerns and can rearrange the administration order. Examiners can also add additional tests during administration. After administration has begun, the examiner is presented with test administration instructions, such as allowable prompts, as previously shown in the test easels. Verbal directions are displayed on the examiner’s screen to read aloud. Some tasks include video examples on the examinee’s device.
Scoring System
With digital administration, answers are captured on the touchscreen device, basal and ceiling rules are automatically applied, and examinee responses are automatically scored. Basals, however, can be adjusted manually by the examiner. After the ceiling of a test is reached, the examiner is automatically prompted to begin the next test. At the time of this publication, the WJ V does not allow examiners to test the limits by administering items beyond the test ceiling. Upon test completion, age and grade equivalents are displayed. Standard scores are not provided until a report is run by the examiner.
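To illustrate the kind of logic the platform automates, the sketch below applies hypothetical basal and ceiling rules. The function, the rule lengths, and the return convention are our invention for illustration, not the WJ V’s actual criteria:

```python
def apply_basal_ceiling(responses, basal_n=6, ceiling_n=6):
    """Illustrative basal/ceiling logic (hypothetical rules).

    A basal is established after `basal_n` consecutive correct responses;
    testing stops once `ceiling_n` consecutive errors occur. `responses`
    is an ordered list of 1 (correct) / 0 (incorrect) item scores.
    Returns (basal_established, index_of_last_item_administered).
    """
    correct_run = incorrect_run = 0
    basal = False
    for i, r in enumerate(responses):
        if r == 1:
            correct_run += 1
            incorrect_run = 0
            if correct_run >= basal_n:
                basal = True
        else:
            incorrect_run += 1
            correct_run = 0
            if incorrect_run >= ceiling_n:
                return basal, i  # ceiling reached; stop testing here
    return basal, len(responses) - 1  # item pool exhausted

# Six correct then six incorrect: basal met, ceiling reached at index 11
print(apply_basal_ceiling([1] * 6 + [0] * 6))
```

A real implementation would also handle start points, reversal rules, and manual basal overrides, which the examiner can still apply on the WJ V platform.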
Interpretation
Interpretation of the WJ V COG continues to allow for the analysis of an individual’s strengths and weaknesses across domains and of discrepancies between expected and actual performance. In the previous edition, the methods for this interpretation style were termed variations and comparisons; comparison procedures is now used as the umbrella term for all “data-based score predictions” to reduce confusion for users (LaForte et al., 2025). The statistical procedures for making these score predictions remain consistent with previous versions; the authors state that they have been refined since their first introduction and are outlined in the technical manual. Consistent with modern practice, the WJ V now provides percentile base rates to indicate how unexpected a score is for an individual. Notably, the WJ V no longer allows for comparisons at the test level, as all comparison procedures use cluster scores.
Technical Adequacy
Test Construction
The WJ V was constructed following recommendations from the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) and the Guidelines for Technology-Based Assessment (International Test Commission & Association of Test Publishers, 2022). The technical manual outlines the steps taken for the user interface/user experience design, content adaptation, and test logic/workflow design. Visual Working Memory and Symbol Inhibition were developed as “digital native” tests.
Like previous editions, the WJ V COG is based on the Rasch model of measurement. Tests used either a dichotomous Rasch model, a Rasch partial-credit model, or a Rasch rating scale model (Symbol Inhibition only). Test items and examinees were placed onto an equal-interval logit scale and transformed to the W scale.
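For reference, the dichotomous Rasch model referred to above expresses the probability of a correct response as a function of the difference between person ability and item difficulty (standard Rasch notation, not drawn from the manual):

$$
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}
$$

where $\theta_p$ is the ability of person $p$ and $b_i$ is the difficulty of item $i$, both on the same logit scale. The partial-credit and rating scale models generalize this expression to items with more than two score categories.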
Norm Sample
Norm data were collected from 5,837 examinees between early 2022 and 2024. The sample was representative of the 2020 United States Census. The test developers conducted a simulation study to compare four multiple matrix sampling designs. The selected design defined subgroups based on geographic region, age, gender, race/ethnicity, and educational level. Therefore, not every examinee in the norm sample was administered every WJ V test; rather, each examinee was assigned to a test set. Examinee scores were differentially weighted to ensure the sampling variables reflected the population. Acknowledging that human ability traits typically are not distributed in a precise normal curve, and instead display positive skew, the test developers used half-norm distributions (above and below the median)
Item-Level Analysis
Individual test items were reviewed using several methods. Rasch outputs were considered for the gradient of
Differential item functioning (DIF) analyses were used to examine whether subgroups of examinees had more difficulty with specific test items. Subgroup analyses included sex (male and female), race (White and non-White), and ethnicity (Hispanic and not Hispanic). If an item showed a statistically significant DIF contrast between subgroups in any direction, the item was flagged for further review.
Items were reviewed by 13 experts, including educators, professors, and practitioners with different areas of specialization, who examined item content with a specific focus on bias and sensitivity. Test items were included on the WJ V only if they were not flagged during the bias and sensitivity review and if any observed DIF was determined to result from low response rates in a subgroup or from unexpected responses based on person-by-item residuals, rather than from a potential source of bias.
Reliability Evidence
To ensure precision, several types of reliability were examined. Marginal item response theory (IRT) reliability provided information about the internal consistency of each test. Reliability coefficients were calculated using conditional standard errors of measurement for all tests across age groups 4–5, 6–9, 10–14, 15–19, 20–49, and 50–80+.
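For context, reliability and the standard error of measurement are linked by a standard psychometric identity (generic notation; the WJ V’s conditional SEMs vary by score level rather than relying on a single score standard deviation $\sigma_x$):

$$
\mathrm{SEM} = \sigma_x \sqrt{1 - r_{xx}}
\quad \Longleftrightarrow \quad
r_{xx} = 1 - \frac{\mathrm{SEM}^2}{\sigma_x^2}
$$

so reporting conditional SEMs across tests and age bands is equivalent to characterizing precision, and hence reliability, at each level of performance.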
Reliability coefficients for the measured age groups were at or above
Test–retest reliability, which reflects the consistency of scores when the same examinee takes a test on two occasions, was examined for Semantic Word Retrieval and Phonemic Word Retrieval only, with reliability coefficients at or above 0.85 across ages. Alternate-form reliability, which considers the correlation between two forms of a test, was assessed for three of the five tests that have a Form A and Form B. Reliability coefficients for Number-Pattern Matching and Letter-Pattern Matching were above 0.85 across age groups. Coefficients were slightly lower for Symbol Inhibition, at 0.81 for the age 4–19 sample and 0.78 for the adult sample.
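Both test–retest and alternate-form coefficients of this kind are, at base, Pearson correlations between paired scores. A minimal sketch, using invented score pairs purely for illustration:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between paired scores (e.g., Form A vs. Form B)."""
    mx, my = mean(x), mean(y)
    # Sample covariance of the paired scores
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical paired standard scores for two alternate forms
form_a = [95, 102, 88, 110, 100, 97, 105, 92]
form_b = [93, 104, 90, 108, 101, 95, 107, 94]
print(round(pearson_r(form_a, form_b), 2))  # prints 0.96
```

With these hypothetical pairs the coefficient is about 0.96, in the range reported for the pattern-matching tests; real alternate-form studies would of course use the published norming samples.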
Validity Evidence
A notable strength of the WJ V is the volume of validity studies, which exceeds that of typical test manuals. Analyses confirmed the CHC model’s hierarchy of cognitive abilities, demonstrating developmental changes in both broad and narrow factors across age groups. Data emphasizing the general factor (
The authors presented structural evidence using a split-sample (separate model development and model validation samples) three-stage analysis. Multiple exploratory procedures, including cluster analysis, exploratory factor analysis, multidimensional scaling, and psychometric network analysis, were applied to six age-differentiated model development samples. The synthesis of the four exploratory methods suggested 11 CHC broad abilities. Exploratory model development and evaluation procedures were then used to refine three categories of CHC models (i.e., traditional hierarchical
Criterion-related validity was evidenced by moderate to strong correlations between the GIA and total scores from other cognitive measures (e.g., WISC-V, SB5, DAS-II) and within matched CHC domains (e.g., Gf). Cognitive–achievement correlations aligned with theory (e.g., Gf and math problem-solving). Of note, correlations between the GIA and achievement are lower for the WJ V than for the WJ IV. However, the authors expected this change given the removal of Ga and Gq tests from the standard cognitive battery and view it as an improvement in the effort to minimize predictor–outcome contamination (LaForte et al., 2025). Clinical validity was explored using data from individuals with eight different diagnoses, providing further descriptive support. Additional information can be found in the WJ V Technical Abstract available through Riverside Insights.
Summary and Recommendations
The WJ V COG is a modern and clinically valuable assessment tool. Its digital administration is designed to decrease common administration errors, such as violations of basal/ceiling and test-by-pages rules (Ramos et al., 2009; Spenceley et al., 2017), and to facilitate automatic scoring. The updated battery includes five new tests and removes three. We feel that the removal of Phonological Processing is an improvement over the WJ IV COG, in which Ga tests were often challenging for individuals with language-based learning disabilities.
The tests comprising the CHC clusters in the WJ V COG have largely changed compared to the WJ IV COG, though the underlying CHC theory remains similar. For this and a variety of other reasons (the Flynn effect, differences in norm groups, item content, etc.), the authors suggest “extreme caution” when comparing an individual’s WJ V scores to WJ IV scores, which some clinicians may find limiting for their practice (LaForte et al., 2025). However, we feel that these limitations are general to the nature of intelligence tests and not exclusive to the WJ V COG.
Recommended materials for the WJ V COG include a laptop and iPad with internet connection. Test sets can be easily created, saved, and revised based on referral questions. Like any test, administration requires practice to build fluency. However, we feel the Riverside Insights platform is user-friendly and is an improvement in the examiner-examinee interface compared to other cognitive tests that currently offer digital administration.
Some clinicians may find the digital-only option limiting, for example, in settings with unreliable internet or for use with early childhood or aging populations. Given that the GIA is not recommended for children under six and that reliability estimates for ages 4–5 are relatively low, other assessments are likely better suited to obtaining an overall estimate of cognitive functioning for young children. Validity evidence involved traditional approaches using factor analysis, as well as newer techniques such as exploratory graph analysis. The Technical Manual features an impressive number of techniques and studies used to consider item fairness, reliability, internal validity, and external validity. Overall, we believe the robust psychometric evidence of the WJ V COG, alongside its engaging digital design, makes it an exciting option for professionals in clinical and educational settings.
Footnotes
Ethical Considerations
IRB or ethical board approval was not sought or needed based on the article type (Test Review) and because this manuscript does not include research on human subjects.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
