Sage Journals: Discover world-class research

Abstract

The use of multiple criteria/modes for gifted identification is generally believed to improve the validity and reliability of measurements in several ways, including being more appropriate for identifying culturally, linguistically, and economically diverse students. Many districts using multiple criteria/modes combine them in some fashion, including in the use of an identification matrix. However, the method for combining these measures itself can be psychometrically unsound and has been previously shown to influence which students are identified as gifted. This study explored measurement invariance of an in-use gifted identification matrix with 22,280 Kindergarten and fifth grade students to determine if the matrix performs equivalently across demographic/grade level groups. Although some matrix components performed similarly across various groups, no single component of the matrix functioned equivalently across all demographic/grade level groups. Patterns in the results are discussed, as well as recommendations for future use of identification matrices with diverse student populations.

Keywords

gifted identification multiple measures identification matrix representation diversity equity equality measurement invariance

The use of multiple measures to identify gifted students is the most commonly reported method of gifted identification used in schools since at least 2008 (National Association for Gifted Children, 2009). Often, these multiple measures are combined into an identification matrix in order to simplify the identification process for the teachers and administrators involved. An identification matrix is an organizational tool for recording and collating identification data from a variety of different measures (McCabe, 1978). These identification matrices can be constructed to combine the results from many different types of measures, including test scores, grades, and nominations across a variety of methods. However, how those multiple measures are selected, combined, and used may result in differences in outcomes between our various student populations (Lee et al., 2024; Moon, 2017). Even though the same multiple identification measures may be used for all students within a district for the sake of equality (equal treatment), differences in how the combined instruments actually measure could result in inequities, especially for already underrepresented populations (Ford et al., 2020). These differences in identification outcomes could exacerbate historical and ongoing systemic injustices in the enrollment of underrepresented groups in gifted and talented education (List & Dykeman, 2021; Peters, 2022), all the while schools using these multiple-measure identification procedures believe they are improving outcomes for the diverse students they serve. While there have been explorations of individual instruments and decision rules for their combination (Lakin, 2018; McBee et al., 2014; Pereira, 2021; Peters & Gentry, 2013), the potential differential impacts of the construction and use of matrices on our culturally, linguistically, and economically diverse (CLED) students have just begun to be explored in research on gifted identification (Peters et al., 2025). Our study proposes to add to the body of knowledge in this new and growing area of research through an exploration of psychometric properties of one district's in-use multiple-measure identification matrix.

Literature Review

In both quantitative and qualitative research fields, there is general agreement that using multiple measurements improves the reliability and validity of the measured variable (Whitley & Kite, 2013). From construct validity to triangulation, multiple measures are used across the research methodology range to improve the chances of accurate and trustworthy findings (FairTest, 2007). As Worrell (2009, p. 243) stated, “outstanding accomplishments by children and adults are multivariate in nature and require multivariate explanations.” The most commonly reported method of gifted identification used in schools is the use of multiple measures (National Association for Gifted Children, 2009). These multiple measurements can differ both in criteria (e.g., ability, creativity, and leadership) and in mode (e.g., observation, performance, and portfolio). Using multiple measures that differ in both criteria and mode is a long-standing method for examining the construct validity of a multifaceted measurement, such as giftedness (Campbell & Fiske, 1959).

There are many reasons for using multiple measures in gifted identification (Rinn et al., 2020). In some states, legislators have passed laws requiring multiple-criteria identification processes after lobbying by gifted researchers, educators, and parents (Krisel & Brown, 1997). In other states, the use of multiple criteria was imposed through legal mandate (Lohman & Renzulli, 2007; Romey, 2006). For example, the Alabama Department of Education entered into a consent decree with the federal Office of Civil Rights in 1999 to adopt a multiple-criteria approach to gifted identification (Romey, 2006). In 2007, the Wisconsin Department of Public Instruction was required by a State Circuit judge to create specific rules for its school districts to follow when using multiple measures to identify gifted children (Lohman & Renzulli, 2007).

Others moved to a multiple-criteria and/or multiple-mode identification process with the belief it would be more inclusive in identifying CLED students (National Association for Gifted Children, 2019). CLED students are often underserved in school-based programs, at least in part due to being underidentified for those programs (Long et al., 2023; Peters, 2022). In a systematic literature review, Mun et al. (2020) found that almost half of the reviewed articles provided recommendations for the use of multiple measures to increase the identification rates of CLED students. After the implementation of a new multiple-criteria rule following legislation requiring multiple-criteria identification for gifted students in the state of Georgia in 1995, Krisel and Brown (1997) found more students from underrepresented populations were being identified. In the decade following the implementation of Georgia's multiple-criteria rule, the percentages of traditionally underrepresented students identified in Georgia through the use of multiple criteria continued to increase dramatically, with Black students increasing in identification by 206% and Latinx students by 570% (Stephens, 2009).

Difficulties Identifying Gifted CLED Students

The programming standard for assessment, outcome 2.3.1, from the National Association for Gifted Students (2019) states that, “educators select and use equitable approaches and assessments that minimize bias for referring and identifying students with gifts and talents” (p. 2). Some researchers believe adding measures that better represent diverse talents and experiences of CLED students will improve our ability to identify those students (Joseph & Ford, 2006). However, instruments may also be selected for use in a multiple measure system based on their face validity; appearing to capture a broader, unbiased picture of our CLED learners than our standardized ability/achievement tests, while instead introducing bias and lowering the reliability of our overall matrix (McBee, 2006). Some of the instruments suggested for use, at least partially due to a belief that they would be better at identifying CLED students, include teacher rating/behavioral scales, native-language and/or nonverbal instruments, and other nontraditional alternative assessments (Joseph & Ford, 2006).

Teacher ratings are often used as one component in gifted identification (Carman, 2013). Lohman and Renzulli (2007) noted that including measures such as teacher ratings and behavioral checklists to the ability and achievement tests already in use for gifted identification could help increase the diversity of the population of identified students. However, recent research has shone a harsh light on the quality, reliability, and validity of teacher rating scales and other teacher nomination processes (Hodges et al., 2018). Teacher ratings have been strongly linked to the individual teacher performing the rating (McCoach et al., 2024) and the grade level of the students being assessed by those teacher ratings (Marsili & Pellegrini, 2022). Additionally, their use as part of the gifted identification process has been viewed as inequitable for Black students (Britten, 2021; Ford, 2010) and a potential opening for bias in the identification process (McBee, 2006).

Tests designed specifically for native Spanish speakers, such as the Logramos achievement test (Riverside Insights, 2019), or nonverbal tests such as the Naglieri Nonverbal Abilities Test (NNAT; Naglieri, 1997), are often suggested as a way to reduce the verbal load for emergent bilingual (EB) students (Abbott & McQuarrie, 2015; Lakin, 2010) and be more culture-fair (Naglieri, 2008; Naglieri & Ford, 2003). The Logramos has been developed to align with the Iowa Assessments and has been nationally normed to cover “the many diverse characteristics” of the Spanish-speaking bilingual/emergent bilingual student population (Aparicio, n.d.), to “parallels the scope and sequence” of the Iowa Assessments (Riverside Insights, 2019, p. 1), and is expected to function similarly to the Iowa Assessments for Spanish-speaking students (Logramos Third Edition, 2014). Nonverbal and other nontraditional identification methods have faired less well in identifying CLED students. Research exploring the effectiveness of the use of nonverbal instruments at identifying gifted CLED students has occasionally found positive results (Naglieri et al., 2004; Naglieri & Ford, 2003; Naglieri & Ronning, 2000), but more often has not found the use of nonverbal tests to address the gap in scoring between CLED and non-CLED students (Carman et al., 2020; Giessman et al., 2013; Hodges et al., 2018; Lohman et al., 2008; Lohman & Lakin, 2021).

Identification for gifted programs often involves the use of one or more measures that have been found in research to differentiate in identification ability between underrepresented and overrepresented groups (Carman et al., 2020; Lee et al., 2024). It is possible the use of such measures could at least partially explain the continuing underrepresentation of CLED students in gifted identification (Ford et al., 2020). Latinx students (Godinez-Cedillo, 2022; Lewis et al., 2007; Peters et al., 2024), Black students (Ford et al., 2020; Peters et al., 2024; Ricciardi et al., 2020), female students (Petersen, 2013; Ricciardi et al., 2020), twice-exceptional students (Jung & Hay, 2018; Peters & Johnson, 2024), multicultural and low-income students (Lee et al., 2022; Ricciardi et al., 2020), and emergent bilingual students (Abedi, 2002; Peters & Johnson, 2024; Ricciardi et al., 2020) are only a few of the many CLED student groups which have been historically and currently underrepresented in gifted identification and then underserved in gifted programming (List & Dykeman, 2021).

One of the reasons these instruments may be differing in their identification ability may not be due to the instruments themselves, but rather due to systemic inequalities, experienced historically as well as in present day, which can lead to a variety of outcomes that can have a negative effect on students’ opportunity and ability to learn, causing group differences in achievement, ability, and other related tests often used in gifted identification (Erwin & Worrell, 2012). Long et al. (2023) recently explored potential competing explanations for underrepresentation in gifted identification among various CLED student groups and found a majority of identification disparities could be traced to differences in students’ early academic abilities, suggesting differences in early opportunities to learn (OTL) may drive the underrepresentation we persistently see. While an extended exploration of the causes of these inequities is beyond the scope of this article, we point to Peters (2022) as an excellent review of many of the factors involved in these persistent issues.

Although using the same multiple measures for every student promotes equality, or equal treatment, using the same measures for everyone does not necessarily improve equity, or equality of outcomes (Ferlazzo, 2023). After all, if student starting lines for a race are in different locations, we should be unsurprised when students achieve different race times, even though they were all measured by the same finish line (Long, 2022). When making high-stakes test-based decisions, as is the case in gifted identification, it is important to use methods which are equivalently valid for all students, no matter where their starting line. Using multiple measures is generally thought to improve the reliability, validity, and fairness of the gifted identification process and, thus, increase the identification rates for CLED students, but this outcome has not been consistently supported in the literature (Callahan et al., 2012; McBee et al., 2014; Plucker & Callahan, 2014).

Matrices in the Gifted Identification Process

One tool for making decisions using multiple criteria/modes in the gifted identification process is to use an identification matrix. One of the earliest reports of matrix use in gifted identification comes in 1978 in a report by McCabe, who suggested the use of a matrix to record and organize the data created by using multiple measures to identify gifted children. He suggested using a matrix “could help make a broad, comprehensive definition of giftedness a definition which could be practical and workable as well” (McCabe, 1978, p. 6). Almost 30 years later, Lohman and Renzulli (2007, p. 1) remarked that “it is common practice to collect many different kinds of information about students, arrange this information in a matrix, and then combine it in some way to decide which children to admit to the G&T program.” Many districts find the use of an identification matrix to combine those multiple measures to simplify the identification process for the teachers and administrators involved. However, while the use of multiple identification measures is common across districts, the methods for combining those multiple measures into a singular matrix assessment and the instruments chosen to be included in those matrices are not.

There are many ways to build and use an identification matrix, and Moon (2017) discussed two common usage scenarios. The first scenario involves a multistage identification process in which a student proceeds through a series of screenings on a variety of instruments during the identification process. In this scenario, the student must meet or exceed one or more cutoffs to proceed to the next screening stage, where they must meet or exceed at least one additional screening cutoff before being identified as gifted. This more linear screening model has at least one first-stage instrument that the student must pass before proceeding to the rest of the screening instruments. The second scenario envisions the student being assessed on multiple instruments and having the scores from all the instruments considered concurrently using a point-based system with one cutoff for gifted identification. In this model, student achievement on every instrument contributes to the final decision (Moon, 2017).

There have been both positive and negative findings in research examining the use of identification matrices to promote equity. Pearson (2001) explored the effects of implementing a multicriteria identification matrix on the proportional identification of Black and Latinx students in Alabama. After implementation of the multicriteria matrix, the identification rate for both Black and Latinx students increased by a small percentage in its first year of implementation (Pearson, 2001). Romey (2006) later extended Pearson's study of the effects of the implementation in Alabama and found certain instruments were used more frequently as part of the identification matrix in districts which were more successful in reaching proportional identification for culturally diverse students. However, Romey also found other district-level factors related to an increased likelihood of identifying a proportionally representative student group, such as SES, were not included in the matrix calculations. Additionally, Romey called into question the use of matrix components that did not have well-established reliability and validity, as that could have effects on which students are identified. Lidz and Macrine (2001) proposed adding a dynamic assessment to a multicriteria screening battery in a district that had previously been identifying less than 1% of their CLED students as gifted. The addition of the dynamic assessment, when used with the rest of the criteria, increased the district's identification rate to 5%, resulting in an identified pool of students that more closely matched their proportions of representation in the school population (Lidz & Macrine, 2001). Creating and implementing a matrix resulted in an over 1000% increase (from n = 3 to n = 41) in the number of CLED students initially recommended for the gifted program in a large suburban school district (George, 1992). Lakin (2018) explored the use of differing combination rules within a multiple-criteria model on the identification of diverse students and found that the strongest driver of a diverse identified population was the size of the program rather than the combination rule selected.

Because multiple types of matrices are used and not all matrices are built the same, the use of an identification matrix could cause differences in outcomes depending upon which matrix is used, how the matrix is created, and the makeup of its component parts (Peters et al., 2025). Research outside the field of gifted education described the various effects of combining multiple criteria measures into a unidimensional score. Combining multiple criteria into one matrix-based score may be a better predictor of student giftedness than using those multiple criteria alone. Walters (2011) found that a summed score composed of multiple criteria to be equivalent to a weighted score composed of the same criteria in their ability to predict recidivism, and that either method (summed or weighted) for combining the multiple criteria resulted in a better prediction than using each criterion individually.

However, the method of combining multiple criteria into one score made a significant difference in outcomes in other studies (Timbie & Normand, 2008; Wilson, 2008). Timbie and Normand (2008) found significant differences in the classification of hospitals’ value depending on the combination method used to create the value variable. Similarly, Wilson (2008) explored multiple methods for combining student assessment results and found a large variation in number of students classified as failing depending upon the combination method used.

Finally, even the most effective method for combining multiple criteria may not be useful if the method is too complicated for practitioners to use. Teixeira-Pinto and Normand (2008) examined the effects of multiple methods of combining multiple best practice indicators to classify hospitals’ performance into two categories (superior/not superior). They found the best fitting method of classification was a complex statistical model-based score rather than a classification method based on simple average measure scores. However, they cautioned that using a model-based score made it difficult for nonstatisticians to understand and use the score.

One potential cause of differences in identification outcomes between demographic groups could be that some of the matrices used for gifted identification are not psychometrically/theoretically sound (Callahan et al., 2013). While the assessments selected to be part of a multicriteria matrix may individually be unbiased, there could be bias generated as part of the method used to select and/or combine those assessments and the timing of when those assessments are given. Using definitions of giftedness alone to decide which tests are included, how they are combined, and to what degree those tests are represented without evaluating the psychometric properties of their combination may result in differences in the abilities of the students who are identified and in how many students are identified (Callahan et al., 2013; McBee et al., 2014). Plucker and Callahan (2014) stated, “simply using more measures is not as important as how those measures are actually used” (p. 395). Including more “authentic” measures to broaden the criteria/method without regard to their reliability or validity will still lower the overall reliability and validity of the matrix (Lohman, 2012), as will oversampling from any one domain because it is measured by multiple of the included instruments (Callahan et al., 2013; Lohman, 2012). The method selected for combining various scores on assessments can have large effects on both who is selected and how similar (or different) the selected individuals are (McBee et al., 2014). Combining multiple instruments that measure different constructs into one identification matrix increases measurement error, which increases the chance students are identified as gifted when they should not be or are not identified when they should have been (Moon, 2017).

Timing of when to assess students for inclusion in gifted services can also affect which students are identified (Carman et al., 2018; Hodges et al., 2018). Testing for gifted services tends to be conducted in early elementary school rather than middle school or later (Hodges et al., 2018; Sternberg & Davidson, 2005). Researchers have found significant differences in identification rates among CLED students based on the grade in which they were identified (Hodges et al., 2018; Lohman & Korb, 2006). Marsili and Pellegrini’s (2022) systematic review and meta-analysis across 29 studies found school level to have a significant moderator effect on the relationship between traditional identification measures and nominations, with higher relationships occurring in elementary school as opposed to middle school. Lohman and Korb (2006) note that students’ scores on identification instruments may change over time due to multiple factors, including maturation, quality of instruction, and other personal and social factors. When decisions for which instruments to include, how to combine those instruments, and when to assess with those instruments are arbitrary, it can result in undesirable consequences, such as lowered matrix reliability and validity (Lohman & Renzulli, 2007).

Measurement Invariance

The 2014 Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association & National Council on Measurement in Education, 2014) define measurement bias as nonconstruct-related characteristics of a test which may impact the scores differentially for some subgroups. One of the ways researchers measure nonconstruct-related characteristics impact on instrument performance is measurement invariance. Measurement invariance explores the extent to which an instrument or subscale performs equivalently (i.e., has equal meaning and interpretation) across examinee groups (French & Finch, 2006). There are multiple methods for determining measurement invariance, including multisample confirmatory factor analysis, which explores the structure of an instrument/construct across groups. This comparison iterates with increasingly restrictive models until the process results in a model that is found to show a significant decrease in fit. As the model structure is found to be more equivalent between tested groups, the researcher can assume better measurement invariance (French & Finch, 2006).

Measurement invariance is an important part of validating an instrument for group comparisons, as scores on noninvariant measures may be due to extraneous variables and should not be used for group comparisons due to the increased chance of inappropriate conclusions (Warne, 2023). Peters and Gentry (2013) found the HOPE Scale to be measurement noninvariant for gender and income groups, even though there was no differential item functioning (DIF) found due to gender or income. As a result, they recommended the instrument should not be used to compare across gender and income groups, but rather within those groups instead (Peters & Gentry, 2013). In exploring the same scale for measurement invariance between English language learner (ELL) and English proficient (EP) students, Pereira (2021) also found significant differences in the underlying factor structure for those two demographic groups. He too recommends the HOPE Scale not be used to compare scores between ELL and EP students (Pereira, 2021). Warne (2023) explored four versions of the Wechsler tests for measurement invariance in four developing African nations as compared to American measurement models. While some of the samples did reach strict measurement invariance, other samples did not, leading to the conclusion that, while some American test batteries can produce validly interpretable scores using an international comparison group, other instruments/samples may not support comparisons across national groups.

While it is possible instruments themselves may be biased, the use of instruments where little to no DIF has been measured within an identification matrix can still result in differences in which students are identified, potentially due to systematic inequities among those minoritized groups, including lack of OTL (Erwin & Worrell, 2012; Long et al., 2023). Instruments may also be equal but not produce equitable results (measurement noninvariant) through differences in group scores outside of the latent construct of the instrument itself. It is possible for demographic groups to have differences in percentage passing or on average scores even if there is no psychometric bias found (Jonson & Geisinger, 2022). Worrell (2009) noted that mean score differences between groups do not necessarily indicate bias, and, in fact, it would be surprising if one did not find such score differences given the known gaps in school achievement between demographic groups. As noted in Peters (2022), there are many inequities faced by CLED students that could result in real group differences which are environmental and societal in origin. The American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014) note that, while group differences in testing outcomes should trigger additional scrutiny of the instrument for test bias, these differences could be a result of real differences between groups on the construct being measured or a combination of real group differences and test bias. Because one may not be able to rule out all sources of bias, researchers must be careful in their interpretations and continue to work toward improving test design to help eliminate potential sources of bias while maintaining instrument validity (American Educational Research Association, American Psychological Association & National Council on Measurement in Education, 2014).

We were unable to find any previous articles on the measurement invariance of an entire gifted identification matrix, although there were several studies involving measurement invariance on individual instruments which may be included in some matrices (Lee et al., 2022; Pereira, 2021; Peters & Gentry, 2013; Warne, 2011, 2023) and also research on the effects of combinations of multiple instruments that did not explore measurement invariance (Lakin, 2018; Lee et al., 2024; McBee et al., 2014; Peters et al., 2025). As there has not yet been an exploration of measurement invariance in an identification matrix in the literature, the purpose of our study is to examine the psychometric properties of a real-world, in-use multiple-measure identification matrix in order to determine if there are differences in student identification outcomes by demographic/grade level variables and if those differences in identification outcome might be attributable to measurement noninvariance between different demographic/grade level groups.

Research Questions

Before examining an instrument's measurement invariance across groups within a dataset, it is useful to know if differences in instrument outcome between groups exist. Our first three research questions examine these differences.
RQ1: Are there differences in matrix scores/outcome by grade level?

RQ2: Are there differences in matrix scores/outcomes by demographic (gender, race/ethnicity, SES, EB, SPED)?

RQ3: Are there differences in matrix scores/outcomes by demographic within grade level?

Once we have examined the differences in instrument outcome, we then move to examining the matrix invariance. These parallel research questions are:
RQ4: Are there differences in matrix performance (noninvariance) by grade level?

RQ5: Are there differences in matrix performance (noninvariance) by demographic (gender, race/ethnicity, SES, EB, SPED)?

RQ6: Are there differences in matrix performance (noninvariance) by demographic within grade level?

Method

Participants

All research was conducted under institutional review board approval. Participants were 22,280 Kindergarten and fifth grade students (n's = 13,364 and 8,916, respectively) in a large, urban public school district in the South-Central region of the United States (see Table 1 for demographic characteristics).

Table 1.
Demographic Characteristics of Sample.

Kindergarten 5th Grade

Sample size (n) 13364 8916

Age (M years) 5.3 10.4

Gender

Female (%) 49.5 49.1

Male (%) 50.5 50.9

Race/ethnicity

Latinx (%) 65.4 66.4

Black (%) 21.8 25.8

White (%) 8.1 4.9

Asian (%) 3.4 2.9

Native American (%) 0.1 0.1

Multiracial (%) 1.1 0.5

Low SES (%) 81.2 85.7

EB (%) 47.5 41.3

SPED (%) 2.8 8.0

Note. EB=emergent bilingual.

SES level was measured using student FRPL status, which includes parental income and number of family members, in accordance with state guidelines. While FRPL is not a perfect measure of student socioeconomic status, it is a close approximation and has value for use beyond the calculation of household income alone (Domina et al., 2018). EB and SPED status were based on designation by the participating school district.

Matrix Instruments

The identification matrix developed by the participating district is a two-page document that details how to combine points derived from student scores on an achievement test (Iowa Assessments [IOWA]/Logramos), an ability test (Cognitive Abilities Test [CogAT7]), report card, and teacher recommendation into a total matrix score. Students who earn above a set total matrix point cutoff are automatically qualified for the gifted program, while students earning at least 90% of the cutoff score will also qualify if at least 16% of their points come from the CogAT7 and at least 32% of their points come from the IOWA/Logramos.

Iowa Assessments/Logramos

The Iowa Assessments (IOWA; Dunbar & Welch, 2022) are a set of multiple-level achievement tests normed for grades K-12. They can be administered online or paper-and-pencil and take between 2 to 4 hours to administer depending upon the level selected. The Core Battery contains multiple subtests, including English, language arts, and math, and comprises approximately 145 to 200 questions depending on the level. Reliabilities on the Core subtests are mostly in the .80 s and .90 s. Concurrent validity of the assessment was examined through a comparison with scores on the CogAT with the same standardization sample. Students in the participating district were administered Core Battery level 5 (Kindergarten) or level 11 (5th grade) midway through the school year. The core test was used to produce both an English/Language Arts (ELA) and Math score, both of which were included as part of the matrix calculations in the form of a national percentile rank (NPR).

The Logramos (Riverside Publishing, 2014) is an achievement measure specifically designed for native Spanish speakers. It “parallels the scope and sequence” of the IOWA while using Spanish vocabulary that is commonly used in Spanish-speaking countries (Riverside Insights, 2019, p. 1; Riverside Publishing Company, 2012). It is available for use in grades K-8. The Core Battery, made of the same subscales as the IOWA, was given to native Spanish-speaking students in the participating district as an alternative to the IOWA. The ELA and Math scores from the Logramos were used in the matrix calculations in the form of an NPR.

NPR scores from the IOWA/Logramos ELA and Math portions contributed separately to the total matrix score, so a student could earn points from both ELA and Math. Scores on the ELA and Math tests contributed on the same scale, where scoring at the 70th to 79th percentile earns seven points, 80th to 84th earns 10 points, 85th to 89th earns 13 points, 90th to 94th earns 16 points, and 95th to 99^h earns 20 points. If a student earns in the highest percentile range on both exams, the IOWA/Logramos could contribute a maximum of 40 points toward their total matrix score.

Cognitive Abilities Test

The CogAT7 (Lohman, 2011) is an ability test that measures the Verbal, Nonverbal, and Quantitative domains in a group setting. The participating district only administers the nonverbal portion as part of their universal screening. The CogAT produces score reports measured in multiple ways, including a Standard Age Score (SAS), with a mean of 100 and a standard deviation of 16. Split-half reliabilities for the CogAT7 across all grade levels and domains are reported at .80 and higher, with reliabilities increasing as student grade level increases (Warne, 2015). Concurrent validity and confirmatory factor analysis provided evidence of instrument validity. In the participating district, earning an SAS score of 100 and above contributes points towards the total matrix score, with a score between 100 and 103 adding five points, 104 and 108 adding 10 points, 109 and 113 adding 15 points, 114 and 120 adding 20 points, 121 and 125 adding 25 points, and 126 and 130 adding 30 points.

Report Card

Additional matrix points were contributed from the students’ report cards. For Kindergarten students, scores in the core content areas (Language Arts, Math, Science, and Social Studies) from their most recent nine weeks report card were added together, and the resulting score was assessed on a set scale based on the range of total points achieved. For the fifth grade students, all grades from students’ prior year final report card were averaged to determine each student's score, which was then compared against several ranges to determine how many matrix points to award. For both grades, students who earned an average score below 80% did not earn any points, between 80% and 84% earned five points, 85% and89% earned 10 points, 90% and 94% earned 15 points, and students earning in the 95% and 100% range earned 20 points toward their total matrix score.

Teacher Recommendation

Students’ primary teachers were asked to fill out a district-created teacher recommendation form adapted from the Scales for Rating the Behavioral Characteristics of Superior Students (SRBCSS; Renzulli et al., 2002) for each student in kindergarten and fifth grade as part of the district's universal screening process. This form assesses students on their general intellectual, creative, and leadership abilities, with six to seven questions per ability category. Teachers rated each student from Rarely (1) to Consistently most of the time (5) for each question. Point totals were added up for each recommendation, and the matrix points were awarded based on the total number of recommendation points earned, with scores between 60 and 69 contributing four points, 70 and 79 contributing six points, 80 and 89 contributing eight points, and 90 and 100 contributing 10 points toward their total matrix score.

Procedures/Data Collection

All participants were administered the identification matrix instruments during the district's regular annual universal screening process for grades K and 5. All students were administered the CogAT7 and either the Iowa Test of Basic Skills or Logramos based upon their EB status. Additional measures for the matrix included teacher recommendations and student report cards. Only students whose file included all scores on all matrix measures were included in this analysis.

The participating district provided archival data from one full academic year of assessment. Data provided by the district included de-identified student demographic data (including age, grade, EB status, SPED status, FRPL status, federal aggregated ethnicity code, and gender), along with both scores and matrix points from the teacher recommendations, IOWA/Logramos ELA and Math, report card, and CogAT7.

Coding

We created dummy codes for the nominal-level variables, including grade level, gender, ethnicity, FRPL status, EB status, and SPED status. Due to the low numbers of students from Pacific Islander, Native American, and multi-ethnic backgrounds, we grouped these students into an “Other” category for analyses involving race/ethnicity. We coded identification outcome as 0/1, where nonidentified students score 0 and identified students score 1.

Data Analyses

Our first three research questions examined differences in scores/outcomes by both grade level and demographics using a series of independent-samples t-tests, two-way contingency table tests, and analysis of variances (ANOVAs) in SPSS, depending upon the level of measurement and number of groups in the independent and dependent variables. Type I error inflation was controlled using Holm's Sequential Bonferroni (1979) for all family-wise t-tests, and Games-Howell post hoc tests (Games & Howell, 1976) following ANOVAs.

For the last three research questions, matrix performance (invariance) across different groups (by grade and/or demographics) was explored through multiple hierarchical multigroup invariance analyses in Mplus. A one factor confirmatory factor analysis (CFA) was conducted to examine the model fit of the matrix for all participants. Acceptable or good model fit would indicate that the matrix fit the sample when all participants were considered together. Multigroup analyses were then conducted to determine if the matrix items fit the latent factor differently by grade level, demographic factors, and demographic factors within grade level. In the base model, the items were freely estimated across groups. In the subsequent models, each item loading was constrained to be equal across groups one at a time, followed by testing whether the model fit significantly changed with the additional constraint. A statistically significant change in the chi-square statistic indicated a between group difference in the item loading, and a nonsignificant change in the chi-square statistic indicated the item loading was similar between the groups.

Positionality Statement

All three authors have been previously identified as gifted at some point in their K-12 education. The three authors identify as White, with one identifying as male and the two others identifying as female. While one author is an immigrant, all authors are native English speakers. All three authors have children in the US public education system, and one of the children has been identified as gifted while the other child is twice-exceptional. One author is an Educational Psychologist, one a Developmental Psychologist, and the third a Scholarship of Teaching and Learning researcher. Growing up in public school gifted education settings, one of the authors was educated in quota-based gifted magnet program and was able to experience having diverse gifted classmates from elementary through high school, another of the authors was unable to participate in gifted programming until secondary school due to the use of quota-based gifted programming, and the third author did not enter gifted programming until secondary school because there was no gifted programming offered in their elementary school. With our past experiences in gifted public education and with gifted/2e children, the authors have a personal interest in making sure our gifted identification processes are as equitable as possible, so that all gifted children, including our own, have the opportunity to participate in gifted programs.

Results

Research Questions 1–3

The first three research questions all focused on differences in total matrix scores and identification outcome between student grade level and/or demographic characteristics in our sample. Table 2 presents the means and standard deviations on the total matrix score across all demographic groups in our sample.
RQ1: Are There Differences in Matrix Scores/Outcome by Grade Level?

Table 2.
Means and Standard Deviations of Overall Matrix Score by Demographic Group.

Demographic Overall Kindergarten 5th Grade

x̄ s x̄ s x̄ s

Overall 12.38 15.85 13.18 17.24 10.88 13.35

Gender

Female 11.46 15.41 14.27 17.68 11.87 13.63

Male 13.31 16.23 12.50 16.76 9.92 13.00

Race/ethnicity

Latinx 11.13 14.65 11.93 16.00 9.95 12.29

Black 10.47 14.05 11.07 15.27 9.70 12.28

Non-Latinx Caucasian 25.09 20.79 25.82 21.56 23.31 18.69

Asian 22.57 21.00 22.03 21.70 23.89 19.16

I/PI/Other 22.36 20.85 24.72 21.96 15.97 15.98

SES (FRPL)

Low 10.41 14.05 11.11 15.40 9.43 11.81

Average to High 21.94 20.08 23.17 20.96 19.50 17.97

EB

EB 10.41 14.42 12.31 16.25 7.13 9.70

Non-EB 13.98 16.75 14.33 18.04 13.50 14.85

SPED

SPED 3.82 8.33 5.95 11.54 2.70 5.69

Non-SPED 12.81 16.01 13.59 17.33 11.59 13.58

Note. CFI=comparative fit index; EB=emergent bilingual; FRPL=free and reduced price lunch; PI=Pacific Islander; RMSEA=root mean square error of approximation; SES=socioeconomic status; SPED=special education.

We conducted an independent-samples t-test to determine if there were significant differences in total matrix score (dependent variable) by student grade level (independent variable). Levene's test for equality of variances was determined to be significant for the analysis, so equal variances were not assumed in calculating the corresponding t-test. Kindergarten students scored significantly higher than fifth grade students on total matrix scores, t(21793.62) = 12.16, p < .001, with a 95% confidence interval of the difference of 2.10 to 9.90 and a Cohen's d of 0.16. We examined differences between grade level (independent variable) on identification outcome (dependent variable) using a two-way contingency table analysis, χ²(1, N = 22280) = 83.18, p < .001, Cramér's V = .06.
RQ2: Are There Differences in Matrix Scores/Outcomes by Demographic Groups (Gender, Race/Ethnicity, SES, EB, SPED)?

We conducted multiple independent-samples t-tests to determine if there were differences in total matrix scores between groups by gender, SES, EB, and SPED status. Similar to the first t-test, all groups exhibited unequal variances, so equal variances were not assumed in the calculation of the t-statistics. Significant differences were found across all groups on the total matrix scores, as can be seen on Table 3 below, with weak to moderate effects found.

Table 3.
Independent-Samples t-Tests Between Student Demographic Groups.

Comparison t df p 95% CI d

Gender −8.7 22140.41 <.001 [−2.27, −1.44] 0.12

SES 33.68 4578.54 <.001 [10.85, 12.19] 0.76

EB 17.08 22219.42 <.001 [3.16, 3.98] 0.23

SPED 32.62 1533.51 <.001 [8.45, 9.53] 0.57

Note. CI=confidence interval; EB=emergent bilingual.

We conducted a one-way ANOVA to determine if there were differences in total matrix scores across racial/ethnic groups. The data did not meet the assumption of homogeneity of variance and there were large differences in sample sizes among racial/ethnic groups, so we opted to conduct the ANOVAs using the Welch statistic (Welch, 1951) with post hoc Games-Howell comparisons. The Welch statistics are indicated for use when the assumption of homogeneity of variance is violated, as it does not assume equality of population variances (Green & Salkind, 2016). The Games-Howell post hoc test, an extension of the Tukey test, corrects for family-wise type I error inflation and is designed for use after conducting an ANOVA with unequal variances and differences in sample sizes among the groups (Shingala & Rajyaguru, 2015). The ANOVA was significant at the p < .001 level, F_Welch(4, 1245.28) = 231.23, η² = .07, a moderate effect. Significant pairwise differences were found at the .05 level between 7 of the 10 comparisons. All groups scored significantly higher on the total matrix than the Latinx and Black groups, while the White, Asian, and Other groups did not score significantly different than each other.

We examined differences between all demographic groups (independent variables) and identification outcomes (dependent variable) using two-way contingency tables. As expected, all demographic groups displayed significant differences in identification outcomes at the p < .001 level, with Cramér's Vs ranging from .031 to .197.
RQ3: Are There Differences in Matrix Scores/Outcomes by Demographic within Grade Level?

Kindergarten: We conducted multiple independent-samples t-tests to determine if there were differences in total matrix scores between kindergarten-level groups by gender, SES, EB, and SPED status. Similar to the first t-test, all groups exhibited unequal variances, so equal variances were not assumed in the calculation of the t-statistics. Significant differences were found across all groups on the total matrix scores. Weak to moderate effects found with Cohen's d. Results of the t-tests can be found on Table 4.

Table 4.
Independent-Samples t-Tests Between Kindergarten Student Demographic Groups.

Comparison t df p 95% CI d

Gender −5.94 13287.58 <.001 [−2.35, −1.19] 0.10

SES 27.22 3169.25 <.001 [11.20, 12.94] 0.73

EB 6.81 13361.89 <.001 [1.44, 2.60] 0.12

SPED 12.43 424.27 <.001 [6.43, 8.85] 0.44

Note. CI=confidence interval; EB=emergent bilingual.

We conducted a one-way ANOVA to determine if there were differences in total matrix scores across racial/ethnic groups for kindergarten-level groups. Neither the kindergarten nor the fifth grade groups met the assumption of homogeneity of variance and both grades exhibited large differences in sample sizes among racial/ethnic groups, so we opted to conduct the ANOVAs using the Welch statistic with post hoc Games-Howell comparisons. The ANOVA was significant at the p < .001 level, F_Welch(4,900.33) = 146.90, η² = .07, a moderate effect. Significant pairwise differences were found at the p < .05 level between seven of the 10 comparisons. Latinx and Black groups scored significantly lower than all groups except each other, while the White and Asian groups did not score significantly different than the Other group.

We examined differences between all demographic groups (independent variables) and identification outcomes (dependent variable) using two-way contingency tables for the kindergarten group. As expected, all demographic groups displayed significant differences in identification outcomes at the p < .001 level except for the EB group which was still statistically significant, p = .007, with Cramér's Vs ranging from .023 to .197.

Fifth Grade: We conducted multiple independent-samples t-tests to determine if there were differences in total matrix scores between fifth grade-level groups by gender, SES, EB, and SPED status. Similar to the first t-test, all groups exhibited unequal variances, so equal variances were not assumed in the calculation of the t-statistics. Significant differences were found across all groups on the total matrix scores with weak to moderately strong Cohen's d calculated. Results of the t-tests can be found on Table 5.

Table 5.
Independent-Samples t-Tests Between Fifth Grade Student Demographic Groups.

Comparison t df p 95% CI d

Gender −6.92 8852.83 <.001 [−2.51, −1.40] 0.15

SES 19.33 1464.29 <.001 [9.05, 11.09] 0.78

EB 24.5 8870.02 <.001 [5.86, 6.88] 0.49

SPED 34.06 1553.76 <.001 [8.37, 9.40] 0.68

Note: CI=confidence interval; EB=emergent bilingual.

We conducted a one-way ANOVA to determine if there were differences in total matrix scores across racial/ethnic groups for the fifth grade-level group. The ANOVA was significant at the p < .001 level, F_Welch(4,341.26) = 80.82, η² = .07, a moderate effect. Post hoc tests were conducted using Games-Howell due to the lack of equal variances and differences in sample sizes among the groups. Significant pairwise differences were found at the .05 level between 8 of the 10 comparisons. Latinx and Black groups scored significantly lower than all groups except each other, and the White and Asian groups scored significantly higher than all groups except each other.

We examined differences between all demographic groups (independent variables) and identification outcomes (dependent variable) using two-way contingency tables. The gender variable did not display significant differences in identification outcome (p = .122). The racial/ethnic variable did not meet the chi-square assumptions of at least five individuals expected per group due to small sample size in the Other category for fifth graders. All other demographic groups displayed significant differences in identification outcomes at the p < .001 level, with Cramér's Vs ranging from .066 to .190.

Research Questions 4–6

A CFA was conducted to test the fit of a one factor model to the matrix items. The fit statistics confirmed acceptable fit (χ²(3) = 194.06, p < .001; RMSEA = .05 (confidence interval [CI]: .05–.06); CFI = .99).
RQ4: Are There Differences in Matrix Performance (Invariance) by Grade Level?

Model fit for ELA score and math score did not worsen when constrained to be equal between grade levels, which indicates factor loadings were similar for fifth graders and kindergartners. See Table 6 for factor loadings, 95% confidence intervals (CIs), and p-values. Model fit worsened when all other items were constrained to be equal between grade levels, which indicates factor loadings differed between the grade levels (i.e., invariance occurred). The factor loadings for CogAT7 score, report card score, and teacher recommendation score were all higher for fifth graders than for kindergarteners.
RQ5: Are There Differences in Matrix Performance (Invariance) by Demographic Groups (Gender, Race/Ethnicity, SES, EB, SPED)?

Table 6.
Analysis of Research Question 4: Matrix Invariance by Grade Level.

Instrument Factor Loadings (95% CI) Δχ² p-Value

Kindergarten 5th Grade

ELA 0.96 (0.93, 1.00) 0.95 (0.92, 0.97) 0.4 .55

Math 1.04 (1.00, 1.08) 1.06 (1.03, 1.09) 0.4 .55

CogAT7 1.40 (1.32, 1.46) 0.93 (0.89, 0.96) 158.9 <.001*

Report card 1.13 (1.09, 1.18) 0.74 (0.71, 0.77) 185.4 <.001*

Teacher Recommendation 0.67 (0.64, 0.70) 0.50 (0.48, 0.51) 88.8 <.001***

Note. df = 1 for all Δχ² tests; p < .05, p < .01, p < .001; ELA and Math scores are constrained to the same p-value. CI=confidence interval; CogAT= Cognitive Abilities Test; ELA=English/Language Arts.

Factor loadings, 95% CIs, and p-values related to whether there are differences in matrix performance for different demographic groups can be found on Table 7. Overall, we found matrix noninvariance (i.e., significant decrease in model fit when all other items were constrained to be equal) in 33 out of the 45 (73%) outcomes, and 25 (56% of the total) were significant at the .001 level. Noninvariance was found in 8 out of 9 demographic groups for the ELA and Math scores. The other measures frequently exhibited noninvariance, with Report Card scores having the fewest cases of noninvariance (5 of 9 demographic groups). When we compared Asians to non-Asians and SPED students to non-SPED students, all five measures exhibited noninvariance when the other items were constrained to be equal. Black (vs. non-Black) and Other (vs. not Other) showed the most invariance when compared to other demographic groups.

Table 7.
Analysis of Research Question 5: Matrix Invariance by Demographics.

Instrument Factor Loadings (95% CI) Δχ² p-Value

Male Female

ELA 0.85 (0.82, 0.87) 1.06 (1.03, 1.09) 97.4 <.001*

Math 1.18 (1.14, 1.22) 0.94 (0.92, 0.97) 97.4 <.001*

CogAT7 1.14 (1.09, 1.19) 1.00 (0.96, 1.04) 19.5 <.001*

Report card 0.88 (0.84, 0.92) 0.83 (0.80, 0.87) 3.0 .09

Teacher recommendation 0.59 (0.56, 0.61) 0.52 (0.50, 0.54) 17.6 <.001*

Latinx Not Latinx

ELA 0.91 (0.88, 0.93) 1.01 (0.98, 1.05) 21.5 <.001*

Math 1.11 (1.07, 1.14) 0.99 (0.95, 1.02) 21.5 <.001*

CogAT7 0.97 (0.93, 1.01) 1.17 (1.12, 1.22) 37.5 <.001*

Report card 0.79 (0.76, 0.82) 0.92 (0.88, 0.97) 25.5 <.001*

Teacher recommendation 0.56 (0.54, 0.58) 0.53 (0.50, 0.55) 3.8 .05

Black Not Black

ELA 0.97 (0.92, 1.02) 0.93 (0.91, 0.95) 2.2 .07

Math 1.03 (0.98, 1.08) 1.07 (1.05, 1.10) 2.2 .14

CogAT7 1.15 (1.08, 1.22) 1.06 (1.03, 1.10) 5.5 .02*

Report card 0.91 (0.85, 0.97) 0.84 (0.81, 0.86) 4.6 .03*

Teacher recommendation 0.58 (0.55, 0.62) 0.54 (0.52, 0.56) 3.9 .05*

White Not White

ELA 1.18 (1.10, 1.25) 0.92 (0.89, 0.93) 51.5 <.001*

Math 0.85 (0.80, 0.91) 1.10 (1.07, 1.13) 51.5 <.001*

CogAT7 1.00 (0.92, 1.07) 1.08 (1.04, 1.11) 3.2 .07

Report card 0.81 (0.75, 0.87) 0.86 (0.83, 0.89) 2.5 .11

Teacher recommendation 0.45 (0.42, 0.48) 0.57 (0.56, 0.59) 37.3 <.001*

Asian Not Asian

ELA 0.76 (0.67, 0.85) 0.96 (0.94, 0.99) 16.7 <.001*

Math 1.32 (1.17, 1.47) 1.04 (1.01, 1.06) 16.7 <.001*

CogAT7 2.09 (1.84, 2.34) 1.01 (0.98, 1.04) 112.3 <.001*

Report card 1.40 (1.24, 1.57) 0.82 (0.80, 0.85) 69.9 <.001*

Teacher recommendation 0.66 (0.58, 0.74) 0.54 (0.52, 0.56) 10.0 .002

Other Not Other

ELA 1.12 (0.94, 1.31) 0.95 (0.92, 0.97) 4.2 .04*

Math 0.89 (0.75, 1.04) 1.06 (1.04, 1.08) 4.2 .04*

CogAT7 1.14 (0.93, 1.35) 1.06 (1.03, 1.10) 0.4 .51

Report card 0.80 (0.63, 0.96) 0.86 (0.83, 0.88) 0.5 .48

Teacher recommendation 0.46 (0.37, 0.55) 0.55 (0.54, 0.57) 3.2 .07

FRPL Not FRPL

ELA 0.92 (0.90, 0.95) 1.02 (0.97, 1.08) 11.9 <.001*

Math 1.08 (1.06, 1.11) 0.98 (0.93, 1.03) 11.9 <.001*

CogAT7 1.07 (1.04, 1.11) 1.04 (0.97, 1.11) 0.7 .40

Report card 0.87 (0.84, 0.90) 0.82 (0.77, 0.88) 2.2 .14

Teacher recommendation 0.59 (0.57, 0.61) 0.46 (0.43, 0.48) 54.5 <.001*

EB Not EB

ELA 0.77 (0.74, 0.80) 1.08 (1.05, 1.11) 181.2 <.001*

Math 1.30 (1.25, 1.35) 0.93 (0.90, 0.95) 181.2 <.001*

CogAT7 0.97 (0.91, 1.02) 1.10 (1.06, 1.14) 17.0 <.001*

Report card 0.70 (0.66, 0.74) 0.92 (0.89, 0.96) 57.6 <.001***

Teacher recommendation 0.56 (0.53, 0.58) 0.53 (0.51, 0.55) 1.8 .19

SPED Not SPED

ELA 0.87 (0.81, 0.93) 0.95 (0.93, 0.97) 5.4 .02*

Math 1.15 (1.07, 1.23) 1.05 (1.03, 1.08) 5.4 .02*

CogAT7 1.41 (1.28, 1.54) 1.06 (1.02, 1.09) 28.5 <.001*

Report card 1.14 (1.03, 1.25) 0.85 (0.82, 0.87) 30.8 <.001*

Teacher recommendation 0.95 (0.88, 1.02) 0.54 (0.52, 0.56) 164.0 <.001***

Note. df = 1 for all Δχ² tests; p < .05, p < .01, p < .001; ELA and Math scores are constrained to the same p-value. CI=confidence interval; CogAT= Cognitive Abilities Test; ELA=English/Language Arts.

The instrument-demographic combinations with the highest change in chi-square values included (1) the ELA and Math measures in EB students such that ELA factor loadings were higher for non-EB students and Math factor loadings were higher for EB students; (2) for SPED students, factor loadings for teacher recommendations were higher than for non-SPED students; (3) both CogAT7 and report cards had higher factor loadings for Asian students than non-Asian students; and (4) ELA scores had higher factor loadings for girls and Math scores had higher factor loadings for boys.
RQ6: Are There Differences in Matrix Performance (Invariance) by Demographic within Grade Level?

Factor loadings, 95% CIs, and p-values related to whether there are differences in matrix performance for different demographic groups can be found in Table 8. We examined the results for patterns of invariance across the two-grade levels, five measures, and nine demographic groups. Overall, noninvariance was found somewhat more often with fifth graders compared to kindergarteners. Out of all comparisons for fifth graders, noninvariance occurred in 76% (34 out of 45) of the measures with most (71%; 32 out of 45) significant at the .001 level, compared to kindergarteners in which noninvariance occurred for 60% (27 out of 45) of the measures with much fewer (29%; 13 out of 45) significant at the .001 level. For kindergarteners, the CogAT7 measure was the most likely to exhibit differential factor loading, and for fifth graders, all measures demonstrated noninvariance for six or seven out of the nine demographic groups.

Table 8.
Analysis of Research Question 6: Matrix Invariance by Demographics by Grade Level.

Instrument Kindergarten 5th Grade

Factor loadings (95% CI) Δχ² p Factor loadings (95% CI) Δχ² p

Male Female Male Female

ELA 0.91 (0.87, 0.94) 0.99 (0.95, 1.03) 10.5 .001 0.72 (0.68, 0.76) 1.28 (1.21, 1.35) 215.7 <.001*

Math 1.11 (1.06, 1.15) 1.01 (0.97, 1.05) 10.5 .001 1.39 (1.32, 1.46) 0.78 (0.74, 0.83) 215.7 <.001*

CogAT7 0.99 (0.94, 1.04) 0.92 (0.88, 0.97) 3.8 .05 1.72 (1.60, 1.83) 1.24 (1.17, 1.32) 48.9 <.001*

Report card 0.77 (0.73, 0.82) 0.79 (0.74, 0.83) 0.3 .60 1.31 (1.24, 1.38) 1.10 (1.04, 1.15) 22.1 <.001*

Teacher recommendation 0.53 (0.51, 0.56) 0.49 (0.46, 0.51) 6.3 .01* 0.76 (0.71, 0.82) 0.62 (0.58, 0.66) 40.5 <.001***

Latinx Not Latinx Latinx Not Latinx

ELA 0.92 (0.89, 0.95) 0.99 (0.95, 1.04) 5.9 .01* 0.86 (0.81, 0.90) 1.12 (1.05, 1.18) 44.0 <.001***

Math 1.09 (1.05, 1.13) 1.01 (0.96, 1.06) 5.9 .01* 1.17 (1.11, 1.23) 0.90 (0.85, 0.95) 44.0 <.001*

CogAT7 0.81 (0.77, 0.85) 1.15 (1.09, 1.21) 86.6 <.001* 1.65 (1.55, 1.76) 1.23 (1.14, 1.31) 38.3 <.001*

Report card 0.66 (0.62, 0.70) 0.93 (0.88, 0.99) 61.9 <.001* 1.38 (1.31, 1.45) 0.99 (0.94, 1.05) 69.6 <.001*

Teacher recommendation 0.49 (0.47, 0.52) 0.52 (0.49, 0.55) 1.7 .20 0.80 (0.75, 0.85) 0.57 (0.53, 0.61) 50.7 <.001*

Black Not Black Black Not Black

ELA 0.91 (0.85, 0.97) 0.94 (0.91, 0.97) 0.9 .36 1.13 (1.04, 1.22) 0.92 (0.88, 0.96) 20.6 <.001*

Math 1.10 (1.03, 1.18) 1.06 (1.03, 1.10) 0.8 .36 0.89 (0.82, 0.96) 1.09 (1.05, 1.14) 20.6 <.001*

CogAT7 1.05 (0.97, 1.14) 0.95 (0.91, 0.99) 5.3 .02* 1.34 (1.22, 1.46) 1.50 (1.42, 1.58) 4.4 .04*

Report card 0.78 (0.70, 0.87) 0.77 (0.74, 0.80) 0.1 .76 1.19 (1.09, 1.28) 1.18 (1.13, 1.24) 0.0 .97

Teacher recommendation 0.53 (0.49, 0.58) 0.50 (0.48, 0.52) 1.3 .25 0.69 (0.63, 0.76) 0.68 (0.65, 0.72) 0.1 .79

White Not White White Not White

ELA 1.13 (1.04, 1.22) 0.92 (0.89, 0.94) 24.1 <.001* 1.34 (1.18, 1.50) 0.90 (0.87, 0.94) 38.1 <.001*

Math 0.88 (0.81, 0.95) 1.09 (1.06, 1.12) 24.1 <.001* 0.75 (0.66, 0.83) 1.11 (1.06, 1.15) 38.1 <.001

CogAT7 1.04 (0.94, 1.13) 0.93 (0.89, 0.97) 4.7 .03* 1.00 (0.86, 1.13) 1.57 (1.49, 1.64) 39.9 <.001***

Report card 0.86 (0.78, 0.94) 0.76 (0.72, 0.79) 5.8 .02* 0.79 (0.71, 0.87) 1.28 (1.23, 1.34) 65.8 <.001***

Teacher recommendation 0.47 (0.43, 0.51) 0.52 (0.50, 0.54) 4.0 .05* 0.41 (0.35, 0.47) 0.75 (0.71, 0.79) 69.9 <.001*

Asian Not Asian Asian Not Asian

ELA 0.78 (0.66, 0.90) 0.96 (0.94, 0.99) 7.0 .008 0.72 (0.60, 0.85) 0.98 (0.94, 1.02) 12.3 <.001*

Math 1.29 (1.09, 1.49) 1.04 (1.01, 1.07) 7.0 .008 1.39 (1.15, 1.62) 1.02 (0.98, 1.06) 12.3 <.001*

CogAT7 2.39 (2.01, 2.77) 0.90 (0.86, 0.93) 113.4 <.001* 1.62 (1.29, 1.94) 1.45 (1.38, 1.51) 1.1 .29

Report card 1.71 (1.45, 1.97) 0.73 (0.70, 0.76) 100.0 <.001*** 0.95 (0.78, 1.12) 1.21 (1.16, 1.26) 6.2 .01*

Teacher recommendation 0.70 (0.59, 0.81) 0.49 (0.48, 0.51) 17.8 <.001*** 0.63 (0.51, 0.76) 0.69 (0.66, 0.73) 0.8 .39

Other Not Other Other Not Other

ELA 1.13 (0.92, 1.34) 0.94 (0.92, 0.97) 3.5 .06 1.30 (0.72, 1.87) 0.96 (0.92, 1.00) 1.6 .20

Math 0.89 (0.73, 1.05) 1.06 (1.03, 1.09) 3.5 .06 0.77 (0.43, 1.11) 1.04 (1.00, 1.08) 1.6 .20

CogAT7 1.18 (0.94, 1.43) 0.95 (0.91, 0.98) 3.9 .05* 1.05 (0.49, 1.61) 1.46 (1.39, 1.53) 1.5 .23

Report card 0.85 (0.55, 1.05) 0.78 (0.75, 0.81) 0.6 .46 0.79 (0.45, 1.12) 1.19 (1.15, 1.24) 3.0 .08

Teacher recommendation 0.47 (0.37, 0.56) 0.51 (0.49, 0.53) 0.7 .40 0.53 (0.25, 0.81) 0.69 (0.66, 0.72) 0.9 .35

FRPL Not FRPL FRPL Not FRPL

ELA 0.94 (0.91, 0.97) 0.99 (0.92, 1.05) 1.6 .21 0.87 (0.83, 0.91) 1.21 (1.11, 1.32) 43.3 <.001*

Math 1.07 (1.03, 1.10) 1.02 (0.95, 1.08) 1.6 .21 1.15 (1.10, 1.21) 0.83 (0.75, 0.90) 43.3 <.001*

CogAT7 0.90 (0.86, 0.94) 1.07 (0.99, 1.16) 13.6 <.001* 1.72 (1.63, 1.81) 0.96 (0.86, 1.07) 92.2 <.001*

Report card 0.75 (0.72, 0.79) 0.86 (0.78, 0.93) 6.0 .01* 1.40 (1.34, 1.46) 0.80 (0.74, 0.86) 123.9 <.001*

Teacher recommendation 0.53 (0.51, 0.55) 0.46 (0.43, 0.50) 9.0 .003 0.82 (0.78, 0.87) 0.43 (0.38, 0.48) 117.6 <.001*

EB Not EB EB Not EB

ELA 0.80 (0.77, 0.84) 1.06 (1.02, 1.09) 83.6 <.001* 0.60 (0.54, 0.65) 1.14 (1.08, 1.19) 153.7 <.001*

Math 1.24 (1.19, 1.30) 0.95 (0.91, 0.98) 83.6 <.001* 1.68 (1.53, 1.83) 0.88 (0.84, 0.92) 153.7 <.001*

CogAT7 0.84 (0.78, 0.89) 1.04 (0.99, 1.08) 31.2 <.001* 2.40 (2.16, 2.65) 1.22 (1.16, 1.29) 116.4 <.001*

Report card 0.61 (0.56, 0.65) 0.89 (0.85, 0.93) 72.1 <.001* 2.25 (2.08, 2.43) 0.99 (0.94, 1.03) 329.3 <.001*

Teacher recommendation 0.50 (0.47, 0.52) 0.51 (0.48, 0.53) 0.3 .60 1.23 (1.12, 1.34) 0.58 (0.55, 0.61) 174.8 <.001*

SPED Not SPED SPED Not SPED

ELA 0.87 (0.74, 0.99) 0.95 (0.92, 0.98) 1.5 .23 0.91 (0.83, 1.00) 0.97 (0.93, 1.00) 1.1 .30

Math 1.15 (0.99, 1.32) 1.05 (1.02, 1.08) 1.5 .23 1.09 (0.99, 1.20) 1.04 (0.99, 1.08) 1.1 .30

CogAT7 1.15 (0.94, 1.35) 0.95 (0.92, 0.99) 3.7 .05 2.28 (2.03, 2.53) 1.41 (1.35, 1.48) 51.0 <.001*

Report card 0.88 (0.69, 1.06) 0.78 (0.75, 0.81) 1.1 .30 2.03 (1.85, 2.22) 1.15 (1.11, 1.20) 112.9 <.001*

Teacher recommendation 0.71 (0.59, 0.83) 0.51 (0.49, 0.52) 14.2 <.001* 1.44 (1.32, 1.57) 0.66 (0.62, 0.69) 229.5 <.001l

Note. df = 1 for all Δχ² tests; p < .05, p < .01, * p < .001; ELA and Math scores are constrained to the same p-value. CI=confidence interval; CogAT= Cognitive Abilities Test; ELA=English/Language Arts.

For fifth graders, all five measures had different factor loadings when all other items were constrained to be equal, at the .001 significance level for the following demographic student groups: gender, Latinx, White, FRPL, and EB. For kindergarteners, no demographic group exhibited this pattern of results, although EB students had four out of five measures demonstrating noninvariance. For kindergarteners, three demographic student groups (Black, Other, and SPED) had a single measure demonstrating noninvariance. For fifth-graders one demographic student group (Black) had no measures showing noninvariance. All other demographic groups had at least three measures showing noninvariance.

Summary of Results

A summary of all significant results for the three matrix invariance questions can be found on Table 9.

Table 9.
Significant Factor Loadings for Research Questions 4–6.

Comparison Iowa/Logramos CogAT7 Report Card Teacher Recommendation

ELA Math

RQ4: Grade Fifth Fifth Fifth

RQ5: Demographics

Gender Female Male Male Male

Racial/ethnic identity

Latinx Non-Latinx Latinx Non-Latinx Non-Latinx

Black Black Black Black

White White Non-White Non-White

Asian Non-Asian Asian Asian Asian Asian

Other Other Non-Other

Socioeconomic status Non-FRPL FRPL FRPL

Emergent bilingual Non-EB EB Non-EB Non-EB

Special education status Non-SPED SPED SPED SPED SPED

RQ6: Demographics by grade

kindergarten

Gender Female Male Male

Racial/ethnic Identity

Latinx Non-Latinx Latinx Non-Latinx Non-Latinx

Black Black

White White Non-White White White Non-White

Asian Non-Asian Asian Asian Asian Asian

Other Other

Socioeconomic status Non-FRPL Non-FRPL FRPL

Emergent bilingual Non-EB EB Non-EB Non-EB

Special education status SPED

Fifth grade

Gender Female Male Male Male Male

Racial/ethnic identity

Latinx Non-Latinx Latinx Latinx Latinx Latinx

Black Black Non-Black Non-Black

White White Non-White Non-White Non-White Non-White

Asian Non-Asian Asian Non-Asian

Other

Socioeconomic status Non-FRPL FRPL FRPL FRPL FRPL

Emergent bilingual Non-EB EB EB EB EB

Special education status SPED SPED SPED

Note. Group with the significantly stronger relationship is indicated on the table. Empty space indicates equivalent strength between groups. CogAT= Cognitive Abilities Test; EB=emergent bilingual; ELA=English/Language Arts.

Discussion

Explanation of Findings

Our first three research questions (RQ1–3) explored differences in student total matrix scores and identification outcomes by student demographic/grade level. We found significant differences between all demographic and grade level groups for both total matrix scores and identification outcomes, which supports our inquiry into the measurement invariance of the overall matrix. For our first research question, we found kindergarten students scored significantly higher than fifth grade students on the total matrix score but had a significantly larger standard deviation. This finding is in line with Lohman and Korb (2006), which found that expected scores over time may decrease, at least in part due to regression to the mean, while variance should also decrease if the instruments are scaled using a Rasch (1960) model. There was also a significantly different identification outcome between groups, matching the significant difference in scores. Lohman and Korb (2006) state that “even for highly reliable test scores, approximately half of the students who score in the top 3% of the score distribution in 1 year will not fall in the top 3% of the distribution in the next year” (p. 478). However, the effect sizes for both findings were weak, which may indicate this finding was more strongly influenced by the large sample size used rather than grade level effects.

Our second research question explored differences in total matrix scores and identification outcomes by demographic group, and, once again, we found significant differences for both total matrix score and outcome. This also aligns with previous findings in gifted identification literature. Hodges et al. (2018) found significant differences in proportional identification rates across student race/ethnicity in a recent meta-analysis of 54 studies. Significant differences in gifted identification based on ethnicity, gender, poverty, and emergent bilingual status were also found by Ricciardi et al. (2020). Most effects were in the weak to small range, again indicating large sample size may have affected the results. However, a few comparisons showed stronger effects, with a moderately strong effect found for the difference in total score between students who have FRPL and non-FRPL, a moderate effect for the difference in total score between students who qualify for SPED services and those who do not, and a moderate effect for race/ethnicity on total matrix scores.

For our third research question, exploring demographic differences in total matrix scores and identification outcomes by grade level, we again found significant differences between all groups at all levels for both total matrix score and identification outcome. Similar to the first two questions’ results, most effect sizes were weak to small with a few exceptions. Student FRPL status showed a moderately strong effect on total matrix score for both the kindergarten and fifth grade sample. EB and SPED status had moderate effects on total matrix scores for fifth grade students, but weaker effects for kindergarteners. Student race/ethnicity had moderate effects on total matrix scores for both kindergarten and fifth grade students. Significant differences in identification by demographic between grade levels has previously been found by Ricciardi et al. (2020) and Hodges et al. (2018) among others. All effects of group membership on identification outcomes were weak to small across all three research questions.

For the three matrix measurement invariance questions (RQ4–6), there were significant differences in model fit across all three comparisons. When the five matrix components (ELA, Math, CogAT, report card, and teacher recommendations) were set to be equivalent, model fit worsened for different grade levels, demographic groups, and demographic groups within each grade level. This worsening of model fit indicates the matrix does not function equivalently across demographics/grade levels. Combining these diverse measures into one identification matrix did not remove the differential functioning of the individual matrix instrument components, as discussed in Moon (2017). While some matrix components function equivalently for certain comparisons, no matrix component consistently functions equivalently across all comparison groups. This is a similar finding to many prior studies that explored the measurement invariance of component instruments separately (Lee et al., 2022; Pereira, 2021; Peters & Gentry, 2013; Warne, 2011, 2023).

Some patterns did emerge in the strength and significance of factor loadings of matrix components between groups. For readers unfamiliar with factor loadings, the higher (or stronger) the factor loading is for any group, the stronger it effects that group, but not necessarily in a positive or negative way, similar to the magnitude of a correlation that indicates strength without direction. If a matrix component has a significantly stronger loading for one group in comparison to another, it indicates that component has more weight in the overall total score for that group than the other group, but that weight could be positive or negative in nature. So, a matrix component that has a significantly stronger loading for males means that it affects males more than females in determining gifted identification in this sample, but not necessarily in a positive way.

Across all comparisons, neither the CogAT7 nonverbal score nor the report card produced any recognizable pattern of effect for any one demographic/grade level group. This means neither the CogAT7 nonverbal nor the report card disadvantaged/advantaged any one group consistently. However, the other three matrix components did produce repeated similar results for various demographic groups across all three comparisons (all demographics, demographics for kindergarten, demographics for fifth grade). For teacher recommendations, across all demographic comparisons, students who are identified as either male, non-White, on FRPL, and/or using SPED services had significantly stronger factor loadings than their comparison demographic group. Teacher recommendations had significantly stronger effects (positive or negative) on gifted identification for members of those groups than on members of other demographic groups. This is in line with McBee (2006), which found teacher nominations to be more less to identify Black, Hispanic, and low-SES students. The IOWA/Logramos Math score also displayed repeated results for specific demographic groups across all demographic comparisons, with students identified as either male, Latinx, non-White, Asian, and/or EB showing significantly stronger factor loadings than their demographic counterparts. Math scores had significantly stronger effects (positive or negative) on gifted identification for members of those groups than on members of other demographic groups. Finally, IOWA/Logramos ELA scores also displayed repeated results for specific demographic groups across all demographic comparisons, although these groups were almost completely diametrically opposite than the previous findings for Math scores and teacher recommendations. The demographic groups with the significantly stronger loadings across all demographic comparisons on the ELA scores were either female, non-Latinx, White, non-Asian, and/or non-EB. ELA scores had significantly stronger effects (positive or negative) on gifted identification for members of those groups than on members of other demographic groups. These results align with the general body of literature in the field, including Lewis et al. (2007), who found an achievement test to identify significantly fewer ethnically diverse students than White students, Petersen (2013), who found achievement tests were more likely to identify boys than girls, and Abedi (2002), who found significant differences in performance on achievement tests based on EB status.

Limitations and Implications

There are multiple limitations on both the generalizability and validity of these findings. Similar to Carman et al. (2020), this research used a sample generated by one very large district, which has effects on both external and statistical conclusion validity of our findings. The participating district, while very large, is also very diverse. However, that diversity is spread across more than 100 elementary schools and each school within the district has a different assortment of CLED students, with a high level of economic, ethnic/racial, and native language self-segregation between schools. In as much as the participating district mirrors the demographic makeup of districts around the country, our results may not be as applicable to districts that are demographically different. Additionally, our results are based on the in-use identification matrix of the participating district. While the results of exploring the measurement invariance of a single district's matrix will not necessarily generalize across districts and matrices, we are able to conduct an exploration of measurement invariance within a single district's identification matrix because all the students within that single district took the same measures and were measured through the same single matrix. While matrices overall will differ by district, components included, and student body makeup, we believe our exploration of measurement invariance within one district's matrix adds to the overall gifted identification literature and fills a knowledge gap. Districts that use different matrix components or who combine those components using different methods may find different results, although we suspect different combinations of instruments will produce similar results if not combined in statistically appropriate ways. Unfortunately, this is in keeping with the general lack of common identification methods across schools/districts/states/countries. If our educational system (or field) were to agree upon a common definition (and therefore common identification method) of giftedness, any study of identification methods would then have more generalizability. Until we come to a common consensus on how to identify the gifted, districts that use multiple measure matrix identification may wish to explore differing combination rules and/or different measures statistically, to see if changing the components or how they are combined could lead to improved and equitable identification for the gifted CLED students.

The use of large sample sizes can lead to a greater likelihood of finding statistical significance for many statistical analyses. We calculated, interpreted, and presented effect sizes and CIs as a means of counterbalancing potentially inflated statistically significant findings. While almost every comparison was found to be statistically significant, our effect sizes were mostly weak to small, which could indicate artificial inflation of our statistical significance findings due to the size of our sample.

Additionally, our analyses do not reflect the actual implementation of the identification matrix in the participating district. The participating district includes the option to add additional points to the total matrix scores of students who, by virtue of their demographic groups membership(s), have experienced historically less OTL than their majority counterparts. While we chose to remove consideration of those extra points from our exploration of matrix functioning because it would most likely have been a confound in our study, that removal also made our findings less reflective of actual practice in this one district. Future studies could explore the functioning of the identification matrix including the OTL scores and determine if the additional points added to the total makes a difference in who is selected and how the matrix functions for the various demographic groups.

These patterns of effects, where a particular matrix component more strongly affects one group than another even across grade level, should be explored further, especially in areas where these findings are in line with previous literature. For components that can be affected by training, such as teacher recommendations, these patterns may signal areas for further focus during teacher development sessions. For components that are less affected by training, districts may want to monitor the outcomes of those instruments to identify if there are patterns of differences in scores among demographic groups in their students and potentially apply targeted remediation/development or move to a within-group comparison model. A model for closing these scoring gaps, whether they be matrix-specific or not, can be found in Plucker and Peters (2016) Excellence Gaps Intervention Model, which proposes six areas for targeted interventions at the national, district, and classroom level.

Although using multiple criteria/modes is a widely supported recommendation, the use of those instruments in matrix form offers no silver bullet to the problem of underrepresentation in gifted identification. Similar to previous results by Pereira (2021) and Peters and Gentry (2013), we found significant differences in how this identification matrix functioned by demographic groups, which could have effects on which students are identified if the results of the matrix are used without modification. The multiple significant differences in factor loadings for many of the demographic groups indicate that using this identification matrix to compare demographically different students will leave some students at a disadvantage either through producing biased scores or because of true score differences between the demographic groups (Pereira, 2021; Peters, 2022). The use of multiple criteria/modes is important for capturing a clearer picture of the talents and abilities of the students we are considering for enrichment programs but using those scores in a comparative or competitive way can lead to significant harm for CLED students. While we encourage the continued use of multiple criteria/modes, we strongly discourage districts from using the results of those matrices to compare students from differing demographics. Identifying students with matrix-based results through the use of within-group comparisons rather than in comparison to overall cross-group scores may result in more equitable selection.

In addition to our efforts to create better instruments and to use those instruments in less-biased ways, we should also be expanding our efforts not on ever more complex identification models to try to capture every single gifted student, but rather on providing opportunities to learn at an earlier stage (frontloading), expanding access to targeted advanced programming that is responsive to local needs, and providing support to retain those students we’ve identified, among other areas (Plucker et al., 2022). In areas/districts with little funding for enrichment, we might change our framing and simplify instead, determine the resources each school/district has and the enrichment programs they are capable of offering and identifying for those specific programs rather than trying to perfectly and broadly identify students who will then not be well-served by the schools in which they are identified (Gubbins et al., 2021).

Using an identification matrix is an easy way to combine scores on a variety of measures that is simple for teachers and administrators to use without advanced training in statistical analysis. We encourage the continued use of identification matrices to combine diverse instruments in a nongatekeeping manner but suggest that the results of those matrices be used to identify students within similar backgrounds, such as through the use of building/local norms (Carman et al., 2020; Peters et al., 2019), rather than district wide, and as only a part of a broader model for equitable identification and service. Districts should be aware that using multiple measures, even in matrix format, will not in itself result in more equitable identification decisions, but may be part of creating a more equitable system for identification overall.

	Kindergarten	5th Grade
Sample size (n)	13364	8916
Age (M years)	5.3	10.4
Gender
Female (%)	49.5	49.1
Male (%)	50.5	50.9
Race/ethnicity
Latinx (%)	65.4	66.4
Black (%)	21.8	25.8
White (%)	8.1	4.9
Asian (%)	3.4	2.9
Native American (%)	0.1	0.1
Multiracial (%)	1.1	0.5
Low SES (%)	81.2	85.7
EB (%)	47.5	41.3
SPED (%)	2.8	8.0

Demographic	Overall	Kindergarten	5th Grade
Overall	12.38	15.85	13.18	17.24	10.88	13.35
Gender
Female	11.46	15.41	14.27	17.68	11.87	13.63
Male	13.31	16.23	12.50	16.76	9.92	13.00
Race/ethnicity
Latinx	11.13	14.65	11.93	16.00	9.95	12.29
Black	10.47	14.05	11.07	15.27	9.70	12.28
Non-Latinx Caucasian	25.09	20.79	25.82	21.56	23.31	18.69
Asian	22.57	21.00	22.03	21.70	23.89	19.16
I/PI/Other	22.36	20.85	24.72	21.96	15.97	15.98
SES (FRPL)
Low	10.41	14.05	11.11	15.40	9.43	11.81
Average to High	21.94	20.08	23.17	20.96	19.50	17.97
EB
EB	10.41	14.42	12.31	16.25	7.13	9.70
Non-EB	13.98	16.75	14.33	18.04	13.50	14.85
SPED
SPED	3.82	8.33	5.95	11.54	2.70	5.69
Non-SPED	12.81	16.01	13.59	17.33	11.59	13.58

Comparison	t	df	p	95% CI	d
Gender	−8.7	22140.41	<.001	[−2.27, −1.44]	0.12
SES	33.68	4578.54	<.001	[10.85, 12.19]	0.76
EB	17.08	22219.42	<.001	[3.16, 3.98]	0.23
SPED	32.62	1533.51	<.001	[8.45, 9.53]	0.57

Comparison	t	df	p	95% CI	d
Gender	−5.94	13287.58	<.001	[−2.35, −1.19]	0.10
SES	27.22	3169.25	<.001	[11.20, 12.94]	0.73
EB	6.81	13361.89	<.001	[1.44, 2.60]	0.12
SPED	12.43	424.27	<.001	[6.43, 8.85]	0.44

Comparison	t	df	p	95% CI	d
Gender	−6.92	8852.83	<.001	[−2.51, −1.40]	0.15
SES	19.33	1464.29	<.001	[9.05, 11.09]	0.78
EB	24.5	8870.02	<.001	[5.86, 6.88]	0.49
SPED	34.06	1553.76	<.001	[8.37, 9.40]	0.68

Instrument	Factor Loadings (95% CI)	Δχ²	p-Value
ELA	0.96 (0.93, 1.00)	0.95 (0.92, 0.97)	0.4	.55
Math	1.04 (1.00, 1.08)	1.06 (1.03, 1.09)	0.4	.55
CogAT7	1.40 (1.32, 1.46)	0.93 (0.89, 0.96)	158.9	<.001***
Report card	1.13 (1.09, 1.18)	0.74 (0.71, 0.77)	185.4	<.001***
Teacher Recommendation	0.67 (0.64, 0.70)	0.50 (0.48, 0.51)	88.8	<.001***

Instrument	Factor Loadings (95% CI)	Δχ²	p-Value
ELA	0.85 (0.82, 0.87)	1.06 (1.03, 1.09)	97.4	<.001***
Math	1.18 (1.14, 1.22)	0.94 (0.92, 0.97)	97.4	<.001***
CogAT7	1.14 (1.09, 1.19)	1.00 (0.96, 1.04)	19.5	<.001***
Report card	0.88 (0.84, 0.92)	0.83 (0.80, 0.87)	3.0	.09
Teacher recommendation	0.59 (0.56, 0.61)	0.52 (0.50, 0.54)	17.6	<.001***
	Latinx	Not Latinx
ELA	0.91 (0.88, 0.93)	1.01 (0.98, 1.05)	21.5	<.001***
Math	1.11 (1.07, 1.14)	0.99 (0.95, 1.02)	21.5	<.001***
CogAT7	0.97 (0.93, 1.01)	1.17 (1.12, 1.22)	37.5	<.001***
Report card	0.79 (0.76, 0.82)	0.92 (0.88, 0.97)	25.5	<.001***
Teacher recommendation	0.56 (0.54, 0.58)	0.53 (0.50, 0.55)	3.8	.05
	Black	Not Black
ELA	0.97 (0.92, 1.02)	0.93 (0.91, 0.95)	2.2	.07
Math	1.03 (0.98, 1.08)	1.07 (1.05, 1.10)	2.2	.14
CogAT7	1.15 (1.08, 1.22)	1.06 (1.03, 1.10)	5.5	.02*
Report card	0.91 (0.85, 0.97)	0.84 (0.81, 0.86)	4.6	.03*
Teacher recommendation	0.58 (0.55, 0.62)	0.54 (0.52, 0.56)	3.9	.05*
	White	Not White
ELA	1.18 (1.10, 1.25)	0.92 (0.89, 0.93)	51.5	<.001***
Math	0.85 (0.80, 0.91)	1.10 (1.07, 1.13)	51.5	<.001***
CogAT7	1.00 (0.92, 1.07)	1.08 (1.04, 1.11)	3.2	.07
Report card	0.81 (0.75, 0.87)	0.86 (0.83, 0.89)	2.5	.11
Teacher recommendation	0.45 (0.42, 0.48)	0.57 (0.56, 0.59)	37.3	<.001***
	Asian	Not Asian
ELA	0.76 (0.67, 0.85)	0.96 (0.94, 0.99)	16.7	<.001***
Math	1.32 (1.17, 1.47)	1.04 (1.01, 1.06)	16.7	<.001***
CogAT7	2.09 (1.84, 2.34)	1.01 (0.98, 1.04)	112.3	<.001***
Report card	1.40 (1.24, 1.57)	0.82 (0.80, 0.85)	69.9	<.001***
Teacher recommendation	0.66 (0.58, 0.74)	0.54 (0.52, 0.56)	10.0	.002**
	Other	Not Other
ELA	1.12 (0.94, 1.31)	0.95 (0.92, 0.97)	4.2	.04*
Math	0.89 (0.75, 1.04)	1.06 (1.04, 1.08)	4.2	.04*
CogAT7	1.14 (0.93, 1.35)	1.06 (1.03, 1.10)	0.4	.51
Report card	0.80 (0.63, 0.96)	0.86 (0.83, 0.88)	0.5	.48
Teacher recommendation	0.46 (0.37, 0.55)	0.55 (0.54, 0.57)	3.2	.07
	FRPL	Not FRPL
ELA	0.92 (0.90, 0.95)	1.02 (0.97, 1.08)	11.9	<.001***
Math	1.08 (1.06, 1.11)	0.98 (0.93, 1.03)	11.9	<.001***
CogAT7	1.07 (1.04, 1.11)	1.04 (0.97, 1.11)	0.7	.40
Report card	0.87 (0.84, 0.90)	0.82 (0.77, 0.88)	2.2	.14
Teacher recommendation	0.59 (0.57, 0.61)	0.46 (0.43, 0.48)	54.5	<.001***
	EB	Not EB
ELA	0.77 (0.74, 0.80)	1.08 (1.05, 1.11)	181.2	<.001***
Math	1.30 (1.25, 1.35)	0.93 (0.90, 0.95)	181.2	<.001***
CogAT7	0.97 (0.91, 1.02)	1.10 (1.06, 1.14)	17.0	<.001***
Report card	0.70 (0.66, 0.74)	0.92 (0.89, 0.96)	57.6	<.001***
Teacher recommendation	0.56 (0.53, 0.58)	0.53 (0.51, 0.55)	1.8	.19
	SPED	Not SPED
ELA	0.87 (0.81, 0.93)	0.95 (0.93, 0.97)	5.4	.02*
Math	1.15 (1.07, 1.23)	1.05 (1.03, 1.08)	5.4	.02*
CogAT7	1.41 (1.28, 1.54)	1.06 (1.02, 1.09)	28.5	<.001***
Report card	1.14 (1.03, 1.25)	0.85 (0.82, 0.87)	30.8	<.001***
Teacher recommendation	0.95 (0.88, 1.02)	0.54 (0.52, 0.56)	164.0	<.001***

Instrument	Kindergarten	5th Grade
ELA	0.91 (0.87, 0.94)	0.99 (0.95, 1.03)	10.5	.001**	0.72 (0.68, 0.76)	1.28 (1.21, 1.35)	215.7	<.001***
Math	1.11 (1.06, 1.15)	1.01 (0.97, 1.05)	10.5	.001**	1.39 (1.32, 1.46)	0.78 (0.74, 0.83)	215.7	<.001***
CogAT7	0.99 (0.94, 1.04)	0.92 (0.88, 0.97)	3.8	.05	1.72 (1.60, 1.83)	1.24 (1.17, 1.32)	48.9	<.001***
Report card	0.77 (0.73, 0.82)	0.79 (0.74, 0.83)	0.3	.60	1.31 (1.24, 1.38)	1.10 (1.04, 1.15)	22.1	<.001***
Teacher recommendation	0.53 (0.51, 0.56)	0.49 (0.46, 0.51)	6.3	.01*	0.76 (0.71, 0.82)	0.62 (0.58, 0.66)	40.5	<.001***
	Latinx	Not Latinx			Latinx	Not Latinx
ELA	0.92 (0.89, 0.95)	0.99 (0.95, 1.04)	5.9	.01*	0.86 (0.81, 0.90)	1.12 (1.05, 1.18)	44.0	<.001***
Math	1.09 (1.05, 1.13)	1.01 (0.96, 1.06)	5.9	.01*	1.17 (1.11, 1.23)	0.90 (0.85, 0.95)	44.0	<.001***
CogAT7	0.81 (0.77, 0.85)	1.15 (1.09, 1.21)	86.6	<.001***	1.65 (1.55, 1.76)	1.23 (1.14, 1.31)	38.3	<.001***
Report card	0.66 (0.62, 0.70)	0.93 (0.88, 0.99)	61.9	<.001***	1.38 (1.31, 1.45)	0.99 (0.94, 1.05)	69.6	<.001***
Teacher recommendation	0.49 (0.47, 0.52)	0.52 (0.49, 0.55)	1.7	.20	0.80 (0.75, 0.85)	0.57 (0.53, 0.61)	50.7	<.001***
	Black	Not Black			Black	Not Black
ELA	0.91 (0.85, 0.97)	0.94 (0.91, 0.97)	0.9	.36	1.13 (1.04, 1.22)	0.92 (0.88, 0.96)	20.6	<.001***
Math	1.10 (1.03, 1.18)	1.06 (1.03, 1.10)	0.8	.36	0.89 (0.82, 0.96)	1.09 (1.05, 1.14)	20.6	<.001***
CogAT7	1.05 (0.97, 1.14)	0.95 (0.91, 0.99)	5.3	.02*	1.34 (1.22, 1.46)	1.50 (1.42, 1.58)	4.4	.04*
Report card	0.78 (0.70, 0.87)	0.77 (0.74, 0.80)	0.1	.76	1.19 (1.09, 1.28)	1.18 (1.13, 1.24)	0.0	.97
Teacher recommendation	0.53 (0.49, 0.58)	0.50 (0.48, 0.52)	1.3	.25	0.69 (0.63, 0.76)	0.68 (0.65, 0.72)	0.1	.79
	White	Not White			White	Not White
ELA	1.13 (1.04, 1.22)	0.92 (0.89, 0.94)	24.1	<.001***	1.34 (1.18, 1.50)	0.90 (0.87, 0.94)	38.1	<.001***
Math	0.88 (0.81, 0.95)	1.09 (1.06, 1.12)	24.1	<.001***	0.75 (0.66, 0.83)	1.11 (1.06, 1.15)	38.1	<.001**
CogAT7	1.04 (0.94, 1.13)	0.93 (0.89, 0.97)	4.7	.03*	1.00 (0.86, 1.13)	1.57 (1.49, 1.64)	39.9	<.001***
Report card	0.86 (0.78, 0.94)	0.76 (0.72, 0.79)	5.8	.02*	0.79 (0.71, 0.87)	1.28 (1.23, 1.34)	65.8	<.001***
Teacher recommendation	0.47 (0.43, 0.51)	0.52 (0.50, 0.54)	4.0	.05*	0.41 (0.35, 0.47)	0.75 (0.71, 0.79)	69.9	<.001***
	Asian	Not Asian			Asian	Not Asian
ELA	0.78 (0.66, 0.90)	0.96 (0.94, 0.99)	7.0	.008**	0.72 (0.60, 0.85)	0.98 (0.94, 1.02)	12.3	<.001***
Math	1.29 (1.09, 1.49)	1.04 (1.01, 1.07)	7.0	.008**	1.39 (1.15, 1.62)	1.02 (0.98, 1.06)	12.3	<.001***
CogAT7	2.39 (2.01, 2.77)	0.90 (0.86, 0.93)	113.4	<.001***	1.62 (1.29, 1.94)	1.45 (1.38, 1.51)	1.1	.29
Report card	1.71 (1.45, 1.97)	0.73 (0.70, 0.76)	100.0	<.001***	0.95 (0.78, 1.12)	1.21 (1.16, 1.26)	6.2	.01*
Teacher recommendation	0.70 (0.59, 0.81)	0.49 (0.48, 0.51)	17.8	<.001***	0.63 (0.51, 0.76)	0.69 (0.66, 0.73)	0.8	.39
	Other	Not Other			Other	Not Other
ELA	1.13 (0.92, 1.34)	0.94 (0.92, 0.97)	3.5	.06	1.30 (0.72, 1.87)	0.96 (0.92, 1.00)	1.6	.20
Math	0.89 (0.73, 1.05)	1.06 (1.03, 1.09)	3.5	.06	0.77 (0.43, 1.11)	1.04 (1.00, 1.08)	1.6	.20
CogAT7	1.18 (0.94, 1.43)	0.95 (0.91, 0.98)	3.9	.05*	1.05 (0.49, 1.61)	1.46 (1.39, 1.53)	1.5	.23
Report card	0.85 (0.55, 1.05)	0.78 (0.75, 0.81)	0.6	.46	0.79 (0.45, 1.12)	1.19 (1.15, 1.24)	3.0	.08
Teacher recommendation	0.47 (0.37, 0.56)	0.51 (0.49, 0.53)	0.7	.40	0.53 (0.25, 0.81)	0.69 (0.66, 0.72)	0.9	.35
	FRPL	Not FRPL			FRPL	Not FRPL
ELA	0.94 (0.91, 0.97)	0.99 (0.92, 1.05)	1.6	.21	0.87 (0.83, 0.91)	1.21 (1.11, 1.32)	43.3	<.001***
Math	1.07 (1.03, 1.10)	1.02 (0.95, 1.08)	1.6	.21	1.15 (1.10, 1.21)	0.83 (0.75, 0.90)	43.3	<.001***
CogAT7	0.90 (0.86, 0.94)	1.07 (0.99, 1.16)	13.6	<.001***	1.72 (1.63, 1.81)	0.96 (0.86, 1.07)	92.2	<.001***
Report card	0.75 (0.72, 0.79)	0.86 (0.78, 0.93)	6.0	.01*	1.40 (1.34, 1.46)	0.80 (0.74, 0.86)	123.9	<.001***
Teacher recommendation	0.53 (0.51, 0.55)	0.46 (0.43, 0.50)	9.0	.003**	0.82 (0.78, 0.87)	0.43 (0.38, 0.48)	117.6	<.001***
	EB	Not EB			EB	Not EB
ELA	0.80 (0.77, 0.84)	1.06 (1.02, 1.09)	83.6	<.001***	0.60 (0.54, 0.65)	1.14 (1.08, 1.19)	153.7	<.001***
Math	1.24 (1.19, 1.30)	0.95 (0.91, 0.98)	83.6	<.001***	1.68 (1.53, 1.83)	0.88 (0.84, 0.92)	153.7	<.001***
CogAT7	0.84 (0.78, 0.89)	1.04 (0.99, 1.08)	31.2	<.001***	2.40 (2.16, 2.65)	1.22 (1.16, 1.29)	116.4	<.001***
Report card	0.61 (0.56, 0.65)	0.89 (0.85, 0.93)	72.1	<.001***	2.25 (2.08, 2.43)	0.99 (0.94, 1.03)	329.3	<.001***
Teacher recommendation	0.50 (0.47, 0.52)	0.51 (0.48, 0.53)	0.3	.60	1.23 (1.12, 1.34)	0.58 (0.55, 0.61)	174.8	<.001***
	SPED	Not SPED			SPED	Not SPED
ELA	0.87 (0.74, 0.99)	0.95 (0.92, 0.98)	1.5	.23	0.91 (0.83, 1.00)	0.97 (0.93, 1.00)	1.1	.30
Math	1.15 (0.99, 1.32)	1.05 (1.02, 1.08)	1.5	.23	1.09 (0.99, 1.20)	1.04 (0.99, 1.08)	1.1	.30
CogAT7	1.15 (0.94, 1.35)	0.95 (0.92, 0.99)	3.7	.05	2.28 (2.03, 2.53)	1.41 (1.35, 1.48)	51.0	<.001***
Report card	0.88 (0.69, 1.06)	0.78 (0.75, 0.81)	1.1	.30	2.03 (1.85, 2.22)	1.15 (1.11, 1.20)	112.9	<.001***
Teacher recommendation	0.71 (0.59, 0.83)	0.51 (0.49, 0.52)	14.2	<.001***	1.44 (1.32, 1.57)	0.66 (0.62, 0.69)	229.5	<.001***l

Comparison	Iowa/Logramos	CogAT7	Report Card	Teacher Recommendation
RQ4: Grade			Fifth	Fifth	Fifth
RQ5: Demographics
Gender	Female	Male	Male		Male
Racial/ethnic identity
Latinx	Non-Latinx	Latinx	Non-Latinx	Non-Latinx
Black			Black	Black	Black
White	White	Non-White			Non-White
Asian	Non-Asian	Asian	Asian	Asian	Asian
Other	Other	Non-Other
Socioeconomic status	Non-FRPL	FRPL			FRPL
Emergent bilingual	Non-EB	EB	Non-EB	Non-EB
Special education status	Non-SPED	SPED	SPED	SPED	SPED
RQ6: Demographics by grade
kindergarten
Gender	Female	Male			Male
Racial/ethnic Identity
Latinx	Non-Latinx	Latinx	Non-Latinx	Non-Latinx
Black			Black
White	White	Non-White	White	White	Non-White
Asian	Non-Asian	Asian	Asian	Asian	Asian
Other			Other
Socioeconomic status			Non-FRPL	Non-FRPL	FRPL
Emergent bilingual	Non-EB	EB	Non-EB	Non-EB
Special education status					SPED
Fifth grade
Gender	Female	Male	Male	Male	Male
Racial/ethnic identity
Latinx	Non-Latinx	Latinx	Latinx	Latinx	Latinx
Black	Black	Non-Black	Non-Black
White	White	Non-White	Non-White	Non-White	Non-White
Asian	Non-Asian	Asian		Non-Asian
Other
Socioeconomic status	Non-FRPL	FRPL	FRPL	FRPL	FRPL
Emergent bilingual	Non-EB	EB	EB	EB	EB
Special education status			SPED	SPED	SPED

Footnotes

ORCID iDs

Christine A. P. Walther

Robert A. Bartsch

Carol A. Carman

Ethical Considerations

This study received expedited IRB approvals,due to the archival nature of the educational data required,from both the IRB panels of our institution and the IRB of the partnering school district.

Funding

The authors received no financial support for the research,authorship,and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research,authorship,and/or publication of this article.

Data Availability Statement

Restrictions apply to the availability of these data,which were used under IRB permission of our participating school district for this study. Data may be available from the authors,subject to the permission of the partnering school district.

Author Biographies

Christine A. P. Walther is a professor in the Department of Psychology at the University of Houston—Clear Lake. She teaches undergraduate and graduate courses in Research Design,Statistics,and Adolescent Development. Her research interests include trajectories of substance use from adolescence to adulthood and investigation of classroom practices that increase understanding of research methods and statistics.

Robert A. Bartsch is a professor of Psychology at the University of Houston—Clear Lake. He teaches Social Psychology,Research and Statistics,and Critical Thinking. His research interests include the scholarship of teaching and learning.

Carol A Carman is a professor in the Department of Clinical Laboratory Sciences at the University of Texas Medical Branch at Galveston. She teaches courses in Applied Statistics,Research Design and Analysis,and Survey Design. Her research interests include giftedness,diversity,and scholarship of teaching and learning.

References

Abbott

M. L.

McQuarrie

(2015). Equitable assessment for hearing and deaf English language learners: An investigation of the impact of verbal load on PASS processes. In Papadopoulos

T. C.

Parrila

R. K.

Kirby

J. R.

(Eds.), Cognition, intelligence, and achievement (pp. 291–308). Academic Press. https://doi.org/10.1016/B978-0-12-410388-7.00015-4

Abedi

(2002). Standardized achievement tests and English language learners: Psychometrics issues. Educational Assessment, 8(3), 231–257. https://doi.org/10.1207/S15326977EA0803_02

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Author. https://www.testingstandards.net/open-access-files.html

Aparicio

(n.d.). The importance of equity in assessments: The development of Logramos Tecera Edicion, a comprehensive assessment of achievement in Spanish. Houghton Mifflin Harcourt. https://www.hmhco.com/∼/media/sites/home/hmh-assessments/assessments/logramos-3/pdf/logramos-third-edition_whitepaper.pdf%3Fla%3Den?srsltid=AfmBOopUvQeG7scfjx9z-00qlOG8fhvJ_hlReQEun7rYXtmnNzwTcki3

Britten

C. Y.

(2021). A single case study investigating the role of teacher perceptions in identifying underrepresented gifted students [Unpublished doctoral dissertation]. Northcentral University.

Callahan

C. M.

Moon

T. R.

(2013). Status of elementary gifted programs. National Research Center on the Gifted and Talented.

Callahan

C. M.

Renzulli

J. S.

Delcourt

A. B.

Hertberg

H. L.

(2012). Considerations for identification of gifted and talented students: An introduction to identification. In Callahan

C. M.

Hertberg-Davis

(Eds.), Fundamentals of gifted education: Considering multiple perspectives (pp. 83–91). Routledge.

Campbell

D. T.

Fiske

D. W.

(1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(3), 81–105. https://doi.org/10.1037/h0046016

Carman

C. A.

(2013). Comparing apples and oranges: Fifteen years of definitions of giftedness in research. Journal of Advanced Academics, 24(1), 52–70. https://doi.org/10.1177/1932202X12472602

10.

Carman

C. A.

Walther

C. A.

Bartsch

R. A.

(2020). Differences in using the cognitive abilities test (CogAT7) nonverbal battery versus the Naglieri Nonverbal Ability Test (NNAT2) to identify the gifted/talented. Gifted Child Quarterly, 64(3), 171–191. https://doi.org/10.1177/0016986220921164

11.

Carman

C. A.

Walther

C. A. P.

Bartsch

R. A.

(2018). Using the Cognitive Abilities Test (CogAT) 7 nonverbal battery to identify the gifted/talented: An investigation of demographic effects and norming plans. Gifted Child Quarterly, 62(2), 193–209. https://doi.org/10.1177/0016986217752097

12.

Domina

Pharris-Ciurej

Penner

A. M.

Penner

E. K.

Brummet

Porter

S. R.

Sanabria

(2018). Is free and reduced-price lunch a valid measure of educational disadvantage? Educational Researcher, 47(9), 539–555. https://doi.org/10.3102/0013189X18797609

13.

Dunbar

Welch

(2022, December 14). About the Iowa Assessments. https://info.riversideinsights.com/hubfs/2022%20Site%20Redesign%20Collateral/Iowa%20Assessments/Iowa%20Assessments%20Overview%20Brochure%202022.pdf

14.

Erwin

J. O.

Worrell

F. C.

(2012). Assessment practices and the underrepresentation of minority students in gifted and talented education. Journal of Psychoeducational Assessment, 30(1), 74–87. https://doi.org/10.1177/0734282911428197

15.

FairTest. (2007). Mission statement. https://fairtest.org/mission-statement/

16.

Ferlazzo

(2023, November 24). There’s a difference between equity and equality. Schools need to understand that. Education Week. https://www.edweek.org/leadership/opinion-theres-a-difference-between-equity-and-equality-schools-need-to-understand-that/2023/11

17.

Ford

D. Y.

(2010). Underrepresentation of culturally different students in gifted education: Reflections about current problems and recommendations for the future. Gifted Child Today, 33(3), 31–35. https://doi.org/10.1177/107621751003300308

18.

Ford

D. Y.

Wright

B. L.

Trotman Scott

(2020). A matter of equity: Desegregating and integrating gifted and talented education for underrepresented students of color. Multicultural Perspectives, 22(1), 28–36. https://doi.org/10.1080/15210960.2020.1728275

19.

French

B. F.

Finch

W. H.

(2006). Confirmatory factor analytic procedures for the determination of measurement invariance. Structural Equation Modeling, 13(3), 378–402. https://doi.org/10.1207/s15328007sem1303_3

20.

Games

P. A.

Howell

J. F.

(1976). Pair wise multiple comparison procedures with unequal n's and/or variances. Journal of Educational Statistics, 1(2), 13–125. https://doi.org/10.2307/1164979

21.

George

B. G.

(1992). Developing an appropriate methodology to identify minority students for the Gifted Program (ED347749). ERIC. https://eric.ed.gov/?id=ED347749

22.

Giessman

J. A.

Gambrell

J. L.

Stebbins

M. S.

(2013). Minority performance on the Naglieri nonverbal ability test, second edition, versus the Cognitive Abilities Test, form 6: One gifted program’s experience. Gifted Child Quarterly, 57(2), 101–109. https://doi.org/10.1177/0016986213477190

23.

Godinez-Cedillo

(2022). When “best” is not enough: The underrepresentation of Latinx students in gifted education programs in “best practice” North Carolina public school districts [Unpublished honors thesis]. University of North Carolina at Chapel Hill.

24.

Green

S. B.

Salkind

N. J.

(2016). Using SPSS for Windows and Macintosh: Analyzing and Understanding Data (8th ed.). Pearson.

25.

Gubbins

E. J.

Siegle

Ottone-Cross

McCoach

D. B.

Dulong Langley

Callahan

C. M.

Brodersen

A. V.

Caughey

(2021). Identifying and serving gifted and talented students: Are identification and services connected? Gifted Child Quarterly, 65(2), 115–131. https://doi.org/10.1177/0016986220988308

26.

Hodges

Tay

Maeda

Gentry

(2018). A meta-analysis of gifted and talented identification practices. Gifted Child Quarterly, 62(2), 147–174. https://doi.org/10.1177/0016986217752107

27.

Holm

(1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70. http://www.jstor.org/stable/4615733

28.

Jonson

J. L.

Geisinger

K. F.

(2022). Conceptualizing and contextualizing fairness standards, issues, and solutions across professional fields in educational psychology. In Jonson

J. L.

Geisinger

K. F.

(Eds.), Fairness in educational and psychological testing: Examining theoretical, research, practice, and policy implications of the 2014 standards (pp. 1–9). American Educational Research Association.

29.

Joseph

L. M.

Ford

D. Y.

(2006). Nondiscriminatory assessment: Considerations for gifted education. Gifted Child Quarterly, 50(1), 42–51. https://doi.org/10.1177/001698620605000105

30.

Jung

J. Y.

Hay

(2018). Identification of gifted and twice-exceptional students. In Jolly

J. L.

Jarvis

J. M.

(Eds.), Exploring gifted education: Australian and New Zealand perspectives (pp. 12–31). Taylor & Francis Group.

31.

Krisel

S. C.

Brown

R. S.

(1997). Georgia’s journey toward multiple-criteria identification of gifted students. Roeper Review, 20(2), A-1–A-3. https://doi.org/10.1080/02783199709553867

32.

Lakin

J. M.

(2010). Comparison of test directions for ability tests: Impact on young English-language learner and non-ELL students [Unpublished doctoral thesis]. University of Iowa.

33.

Lakin

J. M.

(2018). Making the cut in gifted selection: Selection decisions and their impact on program diversity. Gifted Child Quarterly, 62(2), 210–219. https://doi.org/10.1177/0016986217752099

34.

Lee

Gentry

Maeda

(2022). Validity evidence of the HOPE scale in Korea: Identifying gifted students from low-income and multicultural families. Gifted Child Quarterly, 66(1), 23–40. https://doi.org/10.1177/00169862211024590

35.

Lee

L. E.

McCoach

D. B.

Stambaugh

Makel

M. C.

Peters

S. J.

McBee

Johnson

K. R.

(2024). Identification criteria decisions impact cognitive ability profiles and demographic representation [Conference session]. American Educational Research Association Annual Meeting, Philadelphia, PA, USA.

36.

Lewis

J. D.

DeCamp-Fritson

S. S.

Ramage

J. C.

McFarland

M. A.

Archwamety

(2007). Selecting for ethnically diverse children who may be gifted using Raven’s standard progressive matrices and Naglieri nonverbal abilities test. Multicultural Education, 15(1), 38–42.

37.

Lidz

C. S.

Macrine

S. L.

(2001). An alternative approach to the identification of gifted culturally and linguistically diverse learners: The contribution of dynamic assessment. School Psychology International, 22(1), 74–96. https://doi.org/10.1177/01430343010221006

38.

List

Dykeman

(2021). Disproportionalities in gifted and talented education enrollment rates: An analysis of the U.S. civil rights data collection series. Preventing School Failure: Alternative Education for Children and Youth, 65(2), 108–113. https://doi.org/10.1080/1045988X.2020.1837061

39.

Logramos Third Edition Preliminary Technical Summary. (2014). Houghton Mifflin Harcourt.

40.

Lohman

D. F.

(2011). Introducing CogAT Form 7. https://www.hmhco.com/_/media/sites/home/hmh-assessments/assessments/cogat/pdf/cogat-cognitively-speaking-v7-aug-2011.pdf?la=en

41.

Lohman

D. F.

(2012). Decision strategies. In Hunsaker

S. L.

(Ed.), Identification: The theory and practice of identifying students for gifted and talented education services (pp. 217–248). Creative Learning Press.

42.

Lohman

D. F.

Korb

K. A.

(2006). Gifted today but not tomorrow? Longitudinal changes in ability and achievement during elementary school. Journal for the Education of the Gifted, 29(4), 451–484 https://doi.org/10.4219/jeg-2006-245

43.

Lohman

D. F.

Korb

K. A.

Lakin

J. M.

(2008). Identifying academically gifted English-language learners using nonverbal tests: A comparison of the Raven, NNAT, and CogAT. Gifted Child Quarterly, 52(4), 275–296. https://doi.org/10.1177/0016986208321808

44.

Lohman

D. F.

Lakin

J. M.

(2021). Nonverbal test scores as one component of an identification system: Integrating ability, achievement, and teacher ratings. In VanTassel-Baska

J. L.

(Ed.), Alternative assessments with gifted and talented students (pp. 41–66). Taylor & Francis Group. https://doi.org/10.4324/9781003232988-3

45.

Lohman

D. F.

Renzulli

(2007). A simple procedure for combining ability test scores, achievement test scores, and teacher ratings to identify academically talented children. https://drive.google.com/file/d/1PMhYyFQcFSitfZ3wv5Tts_415HW85mri/view

46.

Long

D. A.

(2022). Equity, not just equality: How equality of educational outcome policies could help narrow excellence and identification gaps. Gifted Child Quarterly, 66(2), 105–107. https://doi.org/10.1177/00169862211037944

47.

Long

D. A.

McCoach

D. B.

Siegle

Callahan

C. M.

Gubbins

E. J.

(2023). Inequality at the starting line: Underrepresentation in gifted identification and disparities in early achievement. AERA Open, 9, https://doi.org/10.1177/23328584231171535

48.

Marsili

Pellegrini

(2022). The relation between nominations and traditional measures in the gifted identification process: A meta-analysis. School Psychology International, 43(4), 321–338. https://doi.org/10.1177/01430343221105398

49.

McBee

M. T.

(2006). A descriptive analysis of referral sources for gifted identification screening by race and socioeconomic status. Journal of Advanced Academics, 17(2), 103–111. https://doi.org/10.4219/jsge-2006-686

50.

McBee

M. T.

Peters

S. J.

Waterman

(2014). Combining scores in multiple-criteria assessment systems: The impact of combination rules. Gifted Child Quarterly, 58(1), 69–89 https://doi.org/10.1177/0016986213513794

51.

McCabe

M. V.

(1978). A matrix based plan to help identify gifted students (ED155868). ERIC. https://eric.ed.gov/?q=ED155868&id=ED155868

52.

McCoach

D. B.

Peters

S. P.

Gambino

A. J.

Long

Siegle

(2024). Who are we measuring? Teacher effects in gifted and talented teacher rating scales. Exceptional Children, 90(4), 422–441. https://doi.org/10.1177/00144029241247035

53.

Moon

T. R.

(2017). Uses and misuses of matrices in identifying gifted students: Considerations for better practice. In Callahan

C. M.

Hertberg-Davis

H. L.

(Eds.), Fundamentals of gifted education: Considering multiple perspectives (2nd ed, pp. 116–124). Routledge.

54.

Mun

R. U.

Hemmler

Langley

S. D.

Ware

Gubbins

E. J.

Callahan

C. M.

McCoach

D. B.

Siegle

(2020). Identifying and serving English learners in gifted education: Looking back and moving forward. Journal for the Education of the Gifted, 43(4), 297–335. https://doi.org/10.1177/0162353220955230

55.

Naglieri

J. A.

(1997). Naglieri Nonverbal Ability Test: Multilevel technical manual. Harcourt Brace.

56.

Naglieri

J. A.

(2008). Naglieri Nonverbal Ability Test: Multilevel form technical report. Pearson.

57.

Naglieri

J. A.

Booth

A. L.

Winsler

(2004). Comparison of Hispanic children with and without limited English proficiency on the Naglieri Nonverbal Ability Test. Psychological Assessment, 16(1), 81–84 https://doi.org/10.1037/1040-3590.16.1.81

58.

Naglieri

J. A.

Ford

D. Y.

(2003). Addressing underrepresentation of gifted minority children using the Naglieri Nonverbal Ability Test (NNAT). Gifted Child Quarterly, 47(2), 155–160 https://doi.org/10.1177/001698620304700206

59.

Naglieri

J. A.

Ronning

M. E.

(2000). Comparison of White, African American, Hispanic, and Asian children on the Naglieri Nonverbal Ability Test. Psychological Assessment, 12(3), 328–334 https://doi.org/10.1037/1040-3590.12.3.328

60.

National Association for Gifted Children. (2009). State of the states in gifted education: 2008-2009. Author.

61.

National Association for Gifted Children. (2019). 2019 Pre-K-Grade 12 Gifted Programming Standards: Standard 2. http://www.nagc.org/sites/default/files/standards/Programming%20Standard%202%20Assessment.pdf

62.

Pearson

N. N.

(2001). An examination of the policy changes made to the identification process for gifted services in the state of Alabama: The impact of these changes on the identification of Hispanic and African American children [Unpublished doctoral dissertation]. University of Alabama, Tuscaloosa.

63.

Pereira

(2021). Finding talent among elementary English learners: A validity study of the HOPE teacher rating scale. Gifted Child Quarterly, 65(2), 153–166. https://doi.org/10.1177/0016986220985942

64.

Peters

McCoach

D. B.

Little

C. A.

(2025). Gifted identification matrices: An exploratory study of measurement considerations [Conference session]. American Educational Research Association Annual Meeting, Denver, CO, United States.

65.

Peters

S. J.

(2022). The challenges of achieving equity within public school gifted and talented programs. Gifted Child Quarterly, 66(2), 82–94. https://doi.org/10.1177/00169862211002535

66.

Peters

S. J.

Gentry

(2013). Additional validity evidence and across-group equivalency of the HOPE Teacher Rating Scale. Gifted Child Quarterly, 57(2), 85–100. https://doi.org/10.1177/0016986212469253

67.

Peters

S. J.

Johnson

(2024). Where are the English learners and students with disabilities in gifted education? AERA Open, 10(1), 1–17. https://doi.org/10.1177/23328584241258480

68.

Peters

S. J.

Johnson

Makel

M. C.

Carter

J. S.

III (2024). Who’s got talent for identifying talent? Predictors of equitable gifted identification for Black and Hispanic students. Gifted Child Quarterly, 68(3), 238–246. https://doi.org/10.1177/00169862241240483

69.

Peters

S. J.

Rambo-Hernandez

K. E.

Makel

M. C.

Matthews

M. S.

Plucker

J. A.

(2019). The effect of local norms on racial and ethnic representation in gifted education. AERA Open, 5(2), 1–18. https://doi.org/10.1177/2332858419848446

70.

Petersen

(2013). Gender differences in identification of gifted youth and in gifted program participation: A meta-analysis. Contemporary Educational Psychology, 38(2013), 342–348. https://doi.org/10.1016/j.cedpsych.2013.07.002

71.

Plucker

J. A.

Callahan

C. M.

(2014). Research on giftedness and gifted education: Status of the field and considerations for the future. Exceptional Children, 80(4), 390–406. https://doi.org/10.1177/0014402914527244

72.

Plucker

J. A.

Peters

S. J.

(2016). Excellence gaps in education: Expanding opportunities for talented students. Harvard Education Press.

73.

Plucker

J. A.

Wells

Meyer

M. S.

(2022). Identification policy: Addressing equity and excellence for advanced learners. Gifted Child Today, 45(4), 201–211. https://doi.org/10.1177/10762175221110942

74.

Rasch

(1960). Probabilistic models for some intelligence and attainment tests. Denmarks Pædagogiske Institut.

75.

Renzulli

J. S.

Smith

L. H.

White

A. J.

Callahan

C. M.

Hartman

R. K.

Westberg

K. L.

(2002). Scales for rating the behavior characteristics of superior students (Revised edition). Creative Learning Press.

76.

Ricciardi

Haag-Wolf

Winsler

(2020). Factors associated with gifted identification for ethnically diverse children in poverty. Gifted Child Quarterly, 64(4), 243–258. https://doi.org/10.1177/0016986220937685

77.

Rinn

A. N.

Mun

R. U.

Hodges

(2020). 2018-2019 State of the States in Gifted Education. National Association for Gifted Children and the Council of State Directors of Programs for the Gifted. https://www.nagc.org/2018-2019-state-states-gifted-education

78.

Riverside Insights. (2019, January). Logramos: Tercera edicion [Advertisement].

79.

Riverside Publishing. (2014). Logramos research and development guide. Author.

80.

Riverside Publishing Company. (2012). 2012 Logramos®, Second Edition Order Form. Retrieved from http://www.Riversidepublishing.com/products/orderform/Logramos2ndedition.pdf

81.

Romey

E. A.

(2006). A quantitative study of the Alabama gifted matrix identification process: Implications for underserved populations (Publication No. 3231246) [Doctoral dissertation, University of Connecticut]. ProQuest Dissertations and Theses Global.

82.

Shingala

M. C.

Rajyaguru

(2015). Comparison of post hoc tests for unequal variance. International Journal of New Technologies in Science and Engineering, 2(5), 22–33.

83.

Stephens

V. C.

(2009). Comparison of the performance of gifted students identified through either the psychometric approach or the multiple criteria approach (Publication No. 3349312) [Doctoral dissertation, Northcentral University]. ProQuest Dissertations and Theses Global.

84.

Sternberg

R. J.

Davidson

J. E.

(2005). Conceptions of giftedness. Cambridge University Press.

85.

Teixeira-Pinto

Normand

S. L.

(2008). Statistical methodology for classifying units on the basis of multiple-related measures. Statistics in Medicine, 27(9), 1329–1350. https://doi.org/10.1002/sim.3187

86.

Timbie

J. W.

Normand

S. L.

(2008). A comparison of methods for combining quality and efficiency performance measures: Profiling the value of hospital care following acute myocardial infarction. Statistics in Medicine, 27(9), 1351–1370. https://doi.org/10.1002/sim.3082

87.

Walters

G. D.

(2011). Taking the next step: Combining incrementally valid indicators to improve recidivism prediction. Assessment, 18(2), 227–233. https://doi.org/10.1177/1073191110397484

88.

Warne

R. T.

(2011). An investigation of measurement invariance across genders on the Overexcitability Questionnaire-Two. Journal of Advanced Academics, 22(4), 578–593. https://doi.org/10.1177/1932202X11414821

89.

Warne

R. T.

(2015). Test review: Cognitive Abilities Test, form 7 (CogAT7). Journal of Psychoeducational Assessment, 33(2), 188–192. https://doi.org/10.1177/0734282914548324

90.

Warne

R. T.

(2023). Tests of measurement invariance of three Wechsler intelligence tests in economically developing nations in South Asia and Sub-Saharan Africa. Gifted and Talented International. https://doi.org/10.1080/15332276.2023.2245007

91.

Welch

(1951). On the comparison of several mean values: An alternative approach. Biometrika, 38(3-4), 330–336. https://doi.org/10.1093/biomet/38.3-4.330

92.

Whitley

B. E.

Jr. Kite

M. E.

(2013). Principles of research in behavioral science (3rd ed.). Routledge.

93.

Wilson

(2008). Combining assessment scores: A variable feast. Medical Teacher, 30(4), 428–430. https://doi.org/10.1080/01421590802043843

94.

Worrell

F. C.

(2009). Myth 4: A single test score or indicator tells us all we need to know about giftedness. Gifted Child Quarterly, 53(4), 242–244. https://doi.org/10.1177/0016986209346828

Equality Versus Equity in Multiple Measures: Gifted Identification Matrix Assessment Across Demographic Groups

Abstract

Keywords

Literature Review

Difficulties Identifying Gifted CLED Students

Matrices in the Gifted Identification Process

Measurement Invariance

Research Questions

Method

Participants

Matrix Instruments

Iowa Assessments/Logramos

Cognitive Abilities Test

Report Card

Teacher Recommendation

Procedures/Data Collection

Coding

Data Analyses

Positionality Statement

Results

Research Questions 1–3

Research Questions 4–6

Summary of Results

Discussion

Explanation of Findings

Limitations and Implications

Footnotes

ORCID iDs

Ethical Considerations

Funding

Declaration of Conflicting Interests

Data Availability Statement

Author Biographies

References