Sage Journals: Discover world-class research

Abstract

Critical thinking stands out as one of the most important cognitive abilities needed for effective adaptation to a knowledge-based society in the 21st century. Despite its significance, there remains a lack of consensus regarding the conceptual and methodological frameworks for measuring it. This study aimed to design and validate a comprehensive assessment scale for critical thinking. A table of specifications was constructed based on the critical thinking components agreed upon in the current literature, from which corresponding items were formulated and subsequently validated by expert judges. Following adjustments to the test, 258 Colombian participants completed it. Sampling adequacy (KMO), Bartlett’s test of sphericity, and collinearity were verified, and the data were subjected to exploratory factor analysis. Reliability analysis was conducted using McDonald’s ω, Cronbach’s α, Guttman’s λ6, and Greatest Lower Bound (GLB) statistics. The final test comprised 17 items organized into 2 factors, demonstrating robust content and internal structure validity, as well as high levels of precision and internal consistency (overall GLB of .93). The Critical Thinking Evaluation Scale (CTES) exhibits validity and reliability for use within the Colombian population, and its adaptation for other contexts and countries, both Spanish- and English-speaking, is recommended. The Spanish version, the validated English version for potential adaptations, and the scoring norms are provided in the attached documents.

Plain language summary

Scale to evaluate critical thinking

Critical thinking is one of the most important cognitive skills, but there is no comprehensive instrument for evaluating it. This study therefore aimed to design and validate a comprehensive evaluation scale for critical thinking. A table of specifications covering the components of critical thinking was constructed, and the items were designed and then validated by expert judges. After adjustments to the test, 258 Colombian participants answered it, and validity and reliability analyses were carried out. The final test comprised 17 items organized into 2 factors, demonstrating strong content and internal structure validity, as well as high precision and internal consistency (overall GLB of .93). The Critical Thinking Evaluation Scale (CTES) shows high validity and reliability for use in the Colombian population, and its adaptation is recommended for other contexts and countries, both Spanish- and English-speaking. The Spanish version, the validated English version for possible adaptations, and the scoring norms are provided in the attached documents.

Keywords

critical thinking; reliability and validity; research methods; social sciences; measurement and scaling methods; educational measurement & assessment; education; self-regulation

Critical thinking (CT) is a fundamental cognitive capacity for successful integration into the knowledge societies of the 21st century (Alsaleh, 2020; Kocak et al., 2021; Nussbaum et al., 2021; Wechsler et al., 2018). It entails developing robust decision-making and problem-solving skills that apply across diverse practical contexts in an increasingly complex environment, as well as addressing significant issues within specific disciplinary domains (Butler et al., 2012; Dwyer et al., 2014; Niu et al., 2013). Studies show that good critical thinkers make better decisions, even under pressure (Ellerton, 2022; Gambrill, 2006; Nussbaum et al., 2021); exhibit fewer cognitive biases (Facione & Facione, 2001; Georgiadou et al., 2018; Hong & Choi, 2015); engage more actively as well-informed citizens (Shutaleva, 2021); and often have better employability prospects (Dwyer et al., 2014). However, because CT has drawn attention from many scholars and educators interested in developing thinking skills, each approaching it from a different angle, there is little conceptual and methodological consensus on how to measure it (Bernard et al., 2008; Niu et al., 2013). Consequently, multiple conceptualizations of CT persist, depending on the field or disciplinary context under study (Butler et al., 2012; Ossa-Cornejo et al., 2017; Valenzuela & Nieto, 2008a).

Existing theoretical frameworks characterize CT as a purposeful, reasoned, and goal-directed thinking process comprising a set of fundamental cognitive skills (An Le & Hockey, 2022; Black, 2012; Dwyer et al., 2014; Nieto & Saiz, 2008; Valenzuela & Nieto, 2008a). These skills enable individuals to discern and interpret information (Valenzuela & Nieto, 2008a), scrutinize its validity, assess its reliability, question its origins (Halpern, 2014; Shutaleva, 2021), and construct coherent explanations and conclusions (Nussbaum et al., 2021; Schroyens, 2005).

Although the cognitive aspect predominates (Ossa-Cornejo et al., 2017), CT cannot be defined solely by its constituent skills, because proficiency in these skills does not guarantee skilled critical thinking (Nieto & Saiz, 2008; Saiz et al., 2015; Wechsler et al., 2018). Individuals must also recognize when it is appropriate to use these skills and be willing and motivated to do so (Dwyer et al., 2014; Ku, 2009; Valenzuela & Nieto, 2008a). Thus, the behavioral component of CT manifests in the synergy between these components and their practical application (Halpern, 1998, 2014).

In an attempt to resolve the conceptual discrepancy, an interdisciplinary and international panel of CT experts formulated the Delphi Report (American Philosophical Association [APA], 1990), presenting CT as a construct organized around two dimensions: cognitive abilities and affective dispositions. CT is defined as:

Purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based (Facione, 1990, p. 3).

This conceptual perspective frames CT as a multidimensional construct, consolidates the principal components agreed upon in the current literature, and represents the most widely accepted definition of proficient CT (Alsaleh, 2020; Beckie et al., 2001; Dwyer et al., 2014; Miele & Wigfield, 2014; Sorensen & Yankech, 2008; Wechsler et al., 2018). Under this approach, CT comprises six core cognitive skills (interpretation, analysis, evaluation, inference, explanation, and self-regulation), each with its respective sub-skills, of which analysis, evaluation, and inference are particularly significant (Dwyer et al., 2014); and two affective dispositions (approach to life and living in general, and approach to specific issues, questions, or problems), along with their sub-components (Facione, 1990, 2011; Ossa-Cornejo et al., 2021; Valenzuela & Nieto, 2008a). The cognitive abilities and their sub-skills are defined in Table 1.

Table 1.

Definitions of CT’s Cognitive Abilities and Sub-skills (Facione, 1990, 2011).

Ability	Definition	Sub-skill	Definition
Analysis	Recognize the structure of an argument in order to identify and examine its verbal and graphic statements and representations, their conceptual relationships, and the role each plays in reaching a conclusion.	Idea examination	Define, contrast, and relate the role of various ideas, concepts, or statements in argumentation, reasoning, or persuasion.
		Argument identification	Determine whether a set of verbal or graphic representations supports or denies a statement, opinion, or viewpoint.
		Argument analysis	Determine whether a set of verbal or graphic representations provides sufficient reasoning to justify an opinion.
Evaluation	Examine descriptions of a person’s perception, experience, situation, judgment, belief, or opinion, and evaluate the logical strength of the actual or intended relationships among them to judge their credibility.	Claim evaluation	Recognize relevant facts to evaluate the credibility, acceptability, probability of truth, and trustworthiness of information.
		Argument evaluation	Judge whether the credibility of an argument justifies accepting it as true (deductively true) or most likely true (inductively justified), while raising questions or objections and evaluating whether these point to weaknesses in the argument.
Inference	Identify and secure the elements needed to draw reasonable conclusions. This involves formulating hypotheses that take into account the relevant information and the consequences derived from any set of verbal or graphic representations.	Evidence consultation	Assess the relevant information and argumentative support required to determine the acceptability, plausibility, or credibility of a premise, and formulate a strategy to gather scientific information.
		Consideration of alternatives	Generate multiple plans to achieve a goal while considering the potential consequences of each decision. This involves formulating various alternatives to solve a problem, postulating different assumptions regarding a question, and projecting alternative hypotheses regarding an event.
		Drawing conclusions	Determine which of several possible conclusions should be considered in a given situation by appropriately employing analogical, arithmetic, dialectical, and scientific reasoning, judging which is most strongly justified or supported and which should be rejected or considered less plausible based on the available evidence.
Interpretation	Understand and express the meaning of various situations, experiences, data, events, judgments, conventions, beliefs, rules, procedures, or criteria.	Categorization	Formulate distinctions or frameworks for understanding, describing, or characterizing experiences, situations, beliefs, events, or information.
		Decoding	Detect and describe the informative content, affective meaning, directive functions, intentions, motives, purposes, social meaning, criteria, values, procedures, points of view, rules, and relationships expressed through verbal, behavioral, or graphic representations.
		Meaning clarification	Make explicit the contextual, conventional, or intentional meanings of ideas conveyed in verbal, graphic, or behavioral representations to eliminate unintended confusion, vagueness, or ambiguity.
Self-regulation	Consciously monitor the process and results of one’s cognitive activities, applying analysis and evaluation skills to question, confirm, validate, or correct them.	Self-examination	Evaluate both the process and results of cognitive activities, assessing whether they are influenced by deficiencies in self-knowledge, stereotypes, prejudices, emotions, or any other factor that may limit objectivity or rationality.
		Self-correction	Design reasonable procedures to correct the errors identified during self-examination and their underlying causes.

Furthermore, the affective dispositions that characterize proficient critical thinkers are: curiosity regarding diverse issues; acquiring and maintaining well-rounded knowledge; readiness to recognize and capitalize on opportunities for critical thinking; trust in structured deliberative processes; self-assurance in reasoning abilities; receptiveness to diverse perspectives; adaptability in considering alternative viewpoints; comprehension of others’ perspectives; impartiality in evaluating reasoning; honesty in confronting personal biases, prejudices, stereotypes, and inclinations; caution in suspending, formulating, or revising judgments; willingness to reevaluate positions where honest introspection warrants change; clarity in articulating questions or concerns; organization in handling complex tasks; diligence in seeking pertinent information; rationality in selecting and applying standards; attentiveness to current issues; perseverance in the face of challenges; and a degree of precision appropriate to the subject and context (Facione, 1990, 2011).

A literature review identified prominent instruments for assessing CT based on the Delphi panel’s definition, including the California Critical Thinking Skills Test (CCTS), the Test of Everyday Reasoning (TER), and the Critical Thinking Disposition Inventory (Facione, 2011; Ricketts & Rudd, 2004). However, none of these instruments measures both dimensions of CT simultaneously: the first two evaluate only cognitive skills, while the third assesses only dispositions.

Other instruments that are not based on the Delphi panel’s framework, such as the Watson-Glaser Critical Thinking Appraisal (Watson & Glaser, 1980), the Ennis-Weir Critical Thinking Essay Test (Werner, 1991), the Cornell Test of Critical Thinking (Ennis & Millman, 2005), the Halpern Critical Thinking Assessment using Everyday Situations (Halpern, 1998), and the Salamanca Critical Thinking Test (Rivas & Saiz, 2012), evaluate CT solely through its cognitive abilities. Conversely, tests such as the Motivational Scale of Critical Thinking (EMPC) (Valenzuela & Nieto, 2008b) are grounded solely in motivational dispositions.

Given the contemporary understanding of CT as a synthesis of highly interrelated skills and dispositions operating jointly and complementarily (Bernard et al., 2008; Ossa-Cornejo et al., 2017), the lack of psychometric tests that assess CT as a multidimensional concept comprising cognitive skills and affective dispositions, and notably, the absence of Latin American assessments (Ossa-Cornejo et al., 2017), the present study aimed to design and validate a CT assessment scale based on the theoretical framework provided by the Delphi report.

Method

Design

The present study adopts a quantitative empirical approach with an instrumental design, aiming to develop and validate a critical thinking assessment scale (Ato et al., 2013).

Participants

A non-probabilistic convenience sample was recruited online, resulting in 258 individuals (55.04% women) aged 18 to 63 years (M = 39.68; SD = 14.48). Participants had diverse educational backgrounds: high school (3.87%), technologist (3.49%), technician (5.43%), undergraduate (18.22%), professional (27.90%), specialization (19.38%), master’s (19.38%), and doctorate (2.33%). Inclusion criteria were being of legal age and having Colombian nationality or residence. Sampling adequacy was confirmed with a Kaiser-Meyer-Olkin (KMO) statistic of .907, above the minimum acceptable value of .8 (Pérez & Medrano, 2010).

Procedure

Initially, a literature review identified the APA Delphi panel’s conceptualization of CT as the most appropriate framework (APA, 1990). The primary CT components were delineated, and a table of specifications was developed assigning each component a percentage weight and the corresponding number of items (see Appendix A). Content validity was then assessed independently by five expert psychologists (three Ph.D. holders and two Ph.D. candidates), all with extensive research experience relevant to the design and subject matter of the present study. One expert specialized in assessment and evaluation; another had considerable expertise in university teaching, research methodology, and psychometrics; and the remaining three specialized, respectively, in thinking skills and cognitive development, teaching and consulting in life-skills education, and linguistic and decision-making processes. The items were rated for relevance, clarity, sufficiency, and necessity using a scoring scheme adapted from Escobar-Pérez and Cuervo-Martínez (2008). The ratings were analyzed using Lawshe’s Content Validity Index (CVI), with values above 0.6 deemed satisfactory (Tristán-López, 2008), and the scale was adjusted based on the results.
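To make the content-validity step concrete, the sketch below computes Lawshe-type ratios for individual items. The article does not report exactly how its CVI values were derived from the judges’ ratings, so this is an illustration under stated assumptions rather than the authors’ procedure: it assumes Lawshe’s original ratio CVR = (n_e − N/2)/(N/2), where n_e is the number of judges rating an item as essential and N is the number of judges, and the modified ratio CVR′ = n_e/N associated with Tristán-López (2008), with the CVI taken as the mean CVR′ of the items above the 0.6 cut-off. The function names and the example counts are hypothetical.

```python
def _check_counts(n_essential: int, n_judges: int) -> None:
    """Reject impossible judge counts."""
    if n_judges <= 0 or not 0 <= n_essential <= n_judges:
        raise ValueError("require n_judges > 0 and 0 <= n_essential <= n_judges")


def lawshe_cvr(n_essential: int, n_judges: int) -> float:
    """Original Lawshe content validity ratio, ranging from -1 to 1."""
    _check_counts(n_essential, n_judges)
    half = n_judges / 2
    return (n_essential - half) / half


def modified_cvr(n_essential: int, n_judges: int) -> float:
    """Modified ratio CVR' = n_e / N (assumed Tristán-López form), from 0 to 1."""
    _check_counts(n_essential, n_judges)
    return n_essential / n_judges


def content_validity_index(cvr_values, cutoff: float = 0.6) -> float:
    """Hypothetical helper: mean CVR' of the items that exceed the cut-off."""
    kept = [v for v in cvr_values if v > cutoff]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical example: five judges, four items rated "essential" by 5, 4, 3 and 2 judges.
ratios = [modified_cvr(n, 5) for n in (5, 4, 3, 2)]  # [1.0, 0.8, 0.6, 0.4]
print(lawshe_cvr(4, 5), ratios, content_validity_index(ratios))
```

With five judges, an item rated essential by four of them gives CVR = 0.6 on Lawshe’s original scale and CVR′ = 0.8 on the modified scale; only the first two example items exceed the 0.6 cut-off, so their mean (0.9) is the illustrative CVI.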

The scale was administered online via Microsoft Forms. Participants were required to confirm their eligibility, provide demographic information (age in years, sex, and academic level), and follow the instructions for responding to the scale items. The responses were compiled into a database on which the validity and reliability analyses were conducted.

Data Analysis

To examine the internal structure of the test, sampling adequacy (KMO), Bartlett’s test of sphericity (p < .05 expected), and collinearity (inter-item correlations below .9) were assessed (Pérez & Medrano, 2010). Once these assumptions were confirmed, an exploratory factor analysis (EFA) was performed using weighted least squares extraction with promax (oblique) rotation, given the measurement scale of the items and the theoretically expected correlation between factors (Lloret-Segura, 2014). Items with factor loadings below 0.4, and factors comprising fewer than three items, were removed iteratively until every retained item loaded on a single factor.
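The three preconditions named above can be checked directly from an item correlation matrix. The sketch below is a NumPy/SciPy reimplementation offered for illustration only; the article’s analyses were run in JASP, and the function names and example matrix here are hypothetical. It assumes the usual formulas: Bartlett’s statistic χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom; the overall KMO as the sum of squared off-diagonal correlations divided by that sum plus the sum of squared off-diagonal partial correlations; and collinearity screened by the largest absolute off-diagonal correlation against the .9 limit.

```python
import numpy as np
from scipy.stats import chi2


def _off_diagonal(matrix: np.ndarray) -> np.ndarray:
    """Return the off-diagonal elements of a square matrix as a flat array."""
    mask = ~np.eye(matrix.shape[0], dtype=bool)
    return matrix[mask]


def bartlett_sphericity(R, n: int):
    """Bartlett's test that the population correlation matrix is an identity matrix."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, log_det = np.linalg.slogdet(R)  # numerically stable ln|R|
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * log_det
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)


def kmo(R) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    # Partial correlation of each item pair, controlling for all other items.
    partial = -inv / np.outer(scale, scale)
    r2 = np.sum(_off_diagonal(R) ** 2)
    q2 = np.sum(_off_diagonal(partial) ** 2)
    return r2 / (r2 + q2)


def max_abs_correlation(R) -> float:
    """Largest absolute inter-item correlation, to compare against the .9 limit."""
    return float(np.max(np.abs(_off_diagonal(np.asarray(R, dtype=float)))))


# Hypothetical example: six items that all correlate .49 with each other, n = 258.
p, r = 6, 0.49
R = (1 - r) * np.eye(p) + r * np.ones((p, p))
stat, df, p_value = bartlett_sphericity(R, n=258)
print(f"KMO = {kmo(R):.3f}, chi2({df:.0f}) = {stat:.1f}, p = {p_value:.3g}, "
      f"max |r| = {max_abs_correlation(R):.2f}")
```

Working from the correlation matrix rather than raw responses keeps the checks independent of how items are scored; an identity matrix gives a Bartlett statistic of zero, which is the null hypothesis the test is meant to reject.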

Reliability was assessed with McDonald’s ω, Cronbach’s α, Guttman’s λ6, and the Greatest Lower Bound (GLB) for the complete test and for each factor, with values above .7 taken as indicative of internal consistency (Chadha, 2009). Normality was assessed with the Kolmogorov-Smirnov test. Because normality was not confirmed, Pearson product-moment correlations (r) between each item and the total test score, and Spearman rank correlations between items of the same factor, were analyzed, with values above .3 expected for all (Chadha, 2009). Finally, rules for scoring and interpreting the test were developed: given the sample size (n > 200; Aragón, 2011), direct scores were converted to Z scores for interpretation (Valero, 2013) (see Appendix C). All analyses were conducted in JASP (JASP Team, 2022).
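Two of the four reliability coefficients have closed forms that are easy to reproduce from an item covariance matrix, and the Z-score conversion used for the norms is a one-line standardization. The sketch below is illustrative, not the JASP implementation used in the study. It assumes Cronbach’s α = k/(k − 1) × (1 − sum of item variances / variance of the total score) and Guttman’s λ6 = 1 − (sum of each item’s residual variance after regressing it on the other items) / (variance of the total score). McDonald’s ω and the GLB need a fitted factor model or a numerical optimization and are left to dedicated software. Using the sample standard deviation for the Z scores is an assumption, and the function names are hypothetical.

```python
import numpy as np


def cronbach_alpha(cov) -> float:
    """Cronbach's alpha from an item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    total_variance = cov.sum()  # variance of the summed score
    return k / (k - 1) * (1 - np.trace(cov) / total_variance)


def guttman_lambda6(cov) -> float:
    """Guttman's lambda-6 from an item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    # Residual variance of item j after regressing it on all other items
    # equals 1 / [cov^-1]_jj.
    residual_variances = 1.0 / np.diag(np.linalg.inv(cov))
    return 1 - residual_variances.sum() / cov.sum()


def z_scores(raw_scores) -> np.ndarray:
    """Standardize raw (direct) scores with the sample mean and sample SD."""
    x = np.asarray(raw_scores, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


# Hypothetical example: six standardized items with a common correlation of .49.
p, r = 6, 0.49
C = (1 - r) * np.eye(p) + r * np.ones((p, p))
print(round(cronbach_alpha(C), 3), round(guttman_lambda6(C), 3))
print(z_scores([10, 12, 14]))  # [-1.  0.  1.]
```

For items with equal variances and equal intercorrelations, α reduces to the Spearman-Brown form k·r / (1 + (k − 1)·r), which offers a quick way to sanity-check the function.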

Ethical Considerations

This research received approval from the Research and Ethics Subcommittee of the researchers’ Faculty of Psychology, with record number 158. Participants’ rights were upheld throughout the research: participation was entirely voluntary, and their dignity, integrity, privacy, and autonomy were maintained. Before the questionnaire was administered, they were given the opportunity to provide informed consent; the consent form included information about the study’s authors, purpose, and justification, the benefits of participating, the procedure to be followed, and the confidentiality and anonymity agreements (American Psychological Association [APA], 2017).

Participants were assured that the study posed no risk to their well-being, in accordance with Article 11 of Resolution 8430 of 1993 (Colombian Health Ministry, 1993). Their data were used strictly for research purposes, and no feedback on the results was provided because the validation process was ongoing.

Results

The table of specifications, constructed in alignment with the two dimensions of CT proposed by the Delphi report and their sub-components, guided the development of the Critical Thinking Evaluation Scale (CTES) (see Appendix A). The results of the expert-judge validation of the items are shown in Table 2.

Table 2.

Lawshe’s Content Validity Index (CVI) for Each Item According to the Scoring Criteria by Judges and Subsequent Decisions on Their Permanence in the Test.

Item/criteria	1	2	3	4	5	6	7	8	9	10	11	12	13	14
Relevance	1	1	1	1	0.5	1	1	0.75	0.75	1	1	1	0.75	1
Sufficiency	1	1	0.75	1	0.75	1	1	0.75	0.75	1	1	1	0.75	1
Need	1	1	1	1	0.5	1	1	0.75	0.75	1	1	1	0.75	1
Clarity	0.75	1	0.5	0.75	0.25	1	1	0.75	0.5	1	0.75	1	0.5	0.75
Decision	M	S	D	M	D	S	S	M	D	S	M	S	D	M
Item/criteria	15	16	17	18	19	20	21	22	23	24	25	26	27	28
Relevance	0.75	1	0.75	0.75	1	1	1	1	0.75	1	1	1	1	1
Sufficiency	0.75	1	0.75	0.75	1	1	1	1	0.75	1	1	1	1	1
Need	0.75	1	0.75	0.5	1	1	1	0.5	0.75	1	1	0.75	1	1
Clarity	1	1	0.75	0.75	1	1	1	1	0.75	1	1	0.75	1	0.75
Decision	S	S	M	D	S	M	S	D	M	M	S	D	S	M
Item/criteria	29	30	31	32	33	34	35	36	37	38	39	40	41	42
Relevance	1	0.75	1	0.75	1	0.5	0.5	0.75	0.75	0.5	0.75	0.5	0.5	0.5
Sufficiency	1	0.75	1	0.75	0.75	0.25	0.25	0.5	0.5	0.25	0.75	0.5	0.5	0.5
Need	1	0.75	0.75	1	1	0.5	0.25	0.5	0.75	0.25	0.75	0.75	0.75	0.5
Clarity	0.75	1	1	0.75	1	0.75	0.75	0.75	0.5	0.5	0.75	0.75	0.5	0.75
Decision	M	D	S	M	S	M	D	M	M	D	S	M	D	D
Item/criteria	43	44	45	46	47	48	49	50	51	52	53	54	55	56
Relevance	1	0.75	1	0.75	0.75	0.5	0.75	1	1	1	1	0.75	1	1
Sufficiency	0.75	0.75	0.75	0.5	0.25	0.75	0.75	1	1	1	1	0.75	0.75	1
Need	1	0.75	0.75	0.75	0.75	0.5	0.75	1	1	1	1	0.75	1	1
Clarity	1	1	0.75	0.75	0.5	0.5	0.75	0.75	0.75	1	1	0.5	1	1
Decision	S	M	M	M	D	D	S	S	D	M	S	D	S	S
Item/criteria	57	58	59	60	61	62	63	64	65	66	67	68	69	70
Relevance	0.75	1	0.75	1	1	0.5	0.25	0.25	1	0.25	1	0.75	0.75	1
Sufficiency	0.75	1	0.75	0.75	1	0.5	0.5	0.5	1	0.25	0.5	0.75	0.75	0.75
Need	0.75	1	1	1	1	0.5	0.25	0.25	1	0.25	1	0.5	0.75	1
Clarity	1	1	1	1	1	0.25	0.75	0.5	1	0.25	1	0.75	0.75	1
Decision	D	S	D	S	S	D	D	D	S	D	M	D	D	S
Item/criteria	71	72	73	74	75	76	77	78	79	80	81	82	83	84
Relevance	1	1	1	1	1	1	1	1	1	0.75	1	1	1	0.75
Sufficiency	1	1	1	1	1	1	1	1	1	0.75	0.75	0.75	0.75	0.5
Need	1	1	1	1	1	1	1	1	1	0.75	1	1	1	0.75
Clarity	1	1	1	0.75	1	1	1	1	0.75	0.75	0.75	1	1	1
Decision	D	S	D	M	S	D	S	D	M	D	D	M	S	D
Item/criteria	85	86	87	88	89	90	91	92	93	94
Relevance	0.5	0.25	1	1	1	1	1	1	1	1
Sufficiency	0.25	0.25	0.75	0.75	1	1	1	0.75	1	1
Need	0.75	0.5	1	1	1	1	1	1	1	1
Clarity	1	0.75	0.75	1	1	1	1	1	1	1
Decision	D	D	D	S	D	S	M	D	D	S

Note. For the decision, S means that the item was kept the same, M means that it was kept and modified, and D means that it was deleted.

Based on the CVI, the expert-judge validation led to the deletion of 22 items (CVI < 0.6), the retention of 32 items without changes (CVI > 0.7), and the modification of 25 items based on each judge’s qualitative feedback. Two motivational disposition sub-categories were removed because none of their items reached the minimum sufficient value on two or more rating criteria. In addition, items with fewer CVI ratings of 1 or 0.75 were deleted to preserve the percentage weights of the original theoretical proposal. Consequently, the scale comprised 57 items, which were administered to 258 participants for validation. Exploratory factor analysis was then conducted to examine the underlying factor structure of the CTES (see Table 3).
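The counts reported above can be cross-checked against the Decision rows of Table 2. The short tally below transcribes those rows (items 1 to 94, in order) and counts each code: it gives 32 items kept unchanged, 25 modified, and 37 deleted, so the 57 administered items are the kept and modified items together. The 37 deletions presumably combine the 22 low-CVI items with the items removed to preserve the theoretical percentage weights. The variable names are illustrative.

```python
from collections import Counter

# Decision codes transcribed from Table 2, items 1-94 in order:
# S = kept unchanged, M = kept and modified, D = deleted.
decision_rows = [
    "M S D M D S S M D S M S D M",  # items 1-14
    "S S M D S M S D M M S D S M",  # items 15-28
    "M D S M S M D M M D S M D D",  # items 29-42
    "S M M M D D S S D M S D S S",  # items 43-56
    "D S D S S D D D S D M D D S",  # items 57-70
    "D S D M S D S D M D D M S D",  # items 71-84
    "D D D S D S M D D S",          # items 85-94
]
decisions = [code for row in decision_rows for code in row.split()]
counts = Counter(decisions)
administered = counts["S"] + counts["M"]  # items that went on to the field test

print(len(decisions), dict(counts), administered)
```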

Table 3.

Factor Loadings and Uniqueness From the Exploratory Factor Analysis.

Item	Factor 1	Factor 2	Uniqueness
Item-22	0.736		0.588
Item-51	0.730		0.506
Item-16	0.706		0.504
Item-46	0.679		0.556
Item-52	0.592		0.630
Item-39	0.580		0.628
Item-48	0.567		0.595
Item-25	0.435		0.597
Item-18		0.751	0.453
Item-31		0.743	0.496
Item-6		0.642	0.555
Item-30		0.607	0.594
Item-4		0.574	0.728
Item-10		0.561	0.747
Item-57		0.484	0.717
Item-17		0.471	0.681
Item-55		0.433	0.599

Note. Items’ order was randomized for test administration purposes.

The final scale consisted of 17 items distributed across 2 factors: Factor 1 comprised 8 items, while Factor 2 included 9 items. Reliability values for each factor and the overall scale demonstrated high internal consistency and appropriate reliability (see Table 4).

Table 4.

Reliability Statistics of the General Test and Each Factor.

Dimension	McDonald’s ω	Cronbach’s α	Guttman’s λ6	Greatest lower bound (GLB)
General	.891	.890	.903	.937
Factor 1	.848	.847	.839	.881
Factor 2	.841	.839	.833	.878

All statistics exceeded .8, indicating robust internal consistency. Positive and significant correlations (p < .01) were observed among all items within both factors, and all item-test correlations exceeded .4 (Table 5). Removing any single item would lower every reliability coefficient of its factor, indicating that each item contributes to the high reliability of the test (Table 5).
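The logic of Table 5 (recomputing reliability with each item left out in turn, and correlating each item with the rest of its factor) can be sketched for Cronbach’s α as follows. This is an illustrative NumPy sketch, not the JASP routine used in the study. It covers only α; it assumes the item-test correlation is the corrected item-rest version, since the article does not state whether the item itself was included in the total; and it runs on simulated, hypothetical data (four items driven by one factor plus one pure-noise item) rather than on the study’s responses.

```python
import numpy as np


def cronbach_alpha_from_data(X) -> float:
    """Cronbach's alpha from an (n_respondents, n_items) score matrix."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_variances = X.var(axis=0, ddof=1).sum()
    total_variance = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)


def alpha_if_item_deleted(X) -> np.ndarray:
    """Alpha recomputed with each item removed in turn."""
    X = np.asarray(X, dtype=float)
    return np.array([cronbach_alpha_from_data(np.delete(X, j, axis=1))
                     for j in range(X.shape[1])])


def item_rest_correlations(X) -> np.ndarray:
    """Pearson correlation of each item with the sum of the remaining items."""
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=1)
    return np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1]
                     for j in range(X.shape[1])])


# Simulated (hypothetical) data: four items load .8 on one factor; item 5 is noise.
rng = np.random.default_rng(0)
n = 2000
factor = rng.standard_normal(n)
signal_items = 0.8 * factor[:, None] + 0.6 * rng.standard_normal((n, 4))
noise_item = rng.standard_normal((n, 1))
X = np.hstack([signal_items, noise_item])

alpha_all = cronbach_alpha_from_data(X)
print(round(alpha_all, 3))
print(np.round(alpha_if_item_deleted(X), 3))
print(np.round(item_rest_correlations(X), 3))
```

In this simulation, deleting any of the four signal items lowers α while deleting the noise item raises it, and the noise item’s item-rest correlation is near zero. Table 5 shows only the first pattern: no deletion raises any coefficient.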

Table 5.

Item Hypothetical Elimination and Item-Test Correlation.

Factor	Item	McDonald’s ω if item deleted	Cronbach’s α if item deleted	Guttman’s λ6 if item deleted	GLB if item deleted	Item-test correlation
1	22	.834	.834	.820	.872	.566
	51	.824	.823	.811	.861	.639
	16	.822	.822	.810	.861	.648
	46	.829	.828	.814	.854	.594
	52	.832	.832	.819	.858	.563
	39	.833	.832	.818	.861	.561
	48	.832	.831	.817	.865	.571
	25	.834	.833	.822	.868	.551
2	18	.811	.809	.801	.851	.656
	31	.815	.812	.803	.854	.635
	6	.819	.816	.804	.852	.612
	30	.822	.819	.811	.858	.575
	4	.834	.831	.822	.871	.462
	10	.835	.833	.824	.876	.449
	57	.832	.829	.820	.870	.487
	17	.828	.825	.814	.859	.522
	55	.825	.822	.809	.850	.552

According to the results, the cumulative proportion of variance explained by the final two-factor solution was 40.2%.
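As a consistency check, this figure can be recovered from Table 3. With standardized items, each item contributes one unit of variance, and the share of that variance captured by the factors is its communality (1 minus its uniqueness). The proportion of variance explained by the solution is therefore the mean communality. The short calculation below applies this to the 17 uniqueness values in Table 3 and yields about 0.4015, that is, the reported 40.2%.

```python
# Uniqueness values transcribed from Table 3 (Factor 1 items, then Factor 2 items).
uniqueness = [
    0.588, 0.506, 0.504, 0.556, 0.630, 0.628, 0.595, 0.597,         # Factor 1
    0.453, 0.496, 0.555, 0.594, 0.728, 0.747, 0.717, 0.681, 0.599,  # Factor 2
]

# Communality = 1 - uniqueness; with unit item variances, the mean communality
# is the proportion of total variance explained by the factor solution.
communalities = [1 - u for u in uniqueness]
explained = sum(communalities) / len(communalities)

print(f"Variance explained: {explained:.1%}")  # Variance explained: 40.2%
```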

Discussion

This study aimed to design and validate the Critical Thinking Evaluation Scale, given the pivotal role of critical thinking (CT) in 21st-century society underscored by various scholars (Alsaleh, 2020; Kocak et al., 2021; Nussbaum et al., 2021; Wechsler et al., 2018). Despite the acknowledged significance of CT, there remains a notable scarcity of psychometric instruments that assess it comprehensively as a multidimensional construct (Ossa-Cornejo et al., 2017). We therefore sought to address this gap by constructing a scale grounded in the theoretical framework of the Delphi panel, which integrates the main CT components agreed upon in the literature: cognitive skills and affective dispositions.

Our methodology involved constructing a table of specifications based on CT components, followed by item construction, expert validation, and analysis using Lawshe’s Content Validity Index. After adjustments, the scale was administered to a sample of 258 Colombian individuals. The assumptions of sampling adequacy (KMO), Bartlett’s sphericity, and collinearity were then confirmed, and exploratory factor and reliability analyses were conducted.

The results yielded a 17-item CT evaluation scale with two factors and high overall reliability indices, supporting the precision and internal consistency of the test (Chadha, 2009). Factor 1 comprises 8 items and Factor 2 comprises 9. Factor 1, termed Analytical Ability, primarily encompasses cognitive skills related to evaluation and analytical information processing (6 of 7 items), while Factor 2, termed Argumentative Ability, combines motivational dispositions (3 items) with cognitive skills, reflecting the strategic application of skills and cognitive strategies in generating and using information.

These findings align with established definitions of CT as the active and skillful application, analysis, and evaluation of information (Alsaleh, 2020; Choy & Cheah, 2009; Nussbaum et al., 2021; Paul & Elder, 2003; Paz et al., 2010; Tung & Chan, 2009). Moreover, by incorporating both cognitive skills and affective dispositions within a unified measurement framework, the scale provides comprehensive coverage of the construct. To our knowledge, it is the first scale in the existing literature to encompass CT as a multidimensional construct in this way, which is a significant contribution. The validity of the instrument is supported by evidence based on content (expert-judge validation) and on internal structure (the alignment between the factor analysis clusters and the theoretical proposal, the proportion of explained variance, and adequate item-test correlations) (Barraza, 2007).

Self-regulation is the only CT component initially included whose items do not appear in the final version of the scale after the corresponding analyses. Although some literature treats self-regulation as a metacognitive component of CT (Facione, 1990), the scale’s broader theoretical framework still encompasses this aspect. Moreover, empirical evidence suggests that metacognition, while related, constitutes a distinct cognitive process that improves the direction and prediction of CT outcomes (Choy & Cheah, 2009; Dawson, 2008; Dwyer, 2011; Dwyer et al., 2014; Ghanizadeh, 2011; Heydarnejad et al., 2021; Kuhn & Dean, 2004; Magno, 2010; Melsert & Bicalho, 2012).

The present study has several limitations. First, neither predictive nor convergent validity was tested. Because, according to the literature review, this is the first scale designed to cover the Delphi panel’s conceptualization of CT thoroughly, no attempt was made to obtain validity evidence based on response processes or on relations with other variables. Future studies should therefore examine predictive validity, for example by correlating CT scores with academic performance in different knowledge areas or by comparing CT-trained and untrained individuals. Second, only exploratory factor analyses were performed, because the primary aim was to design a scale and test its metric properties for the first time; future research should test this factorial structure through confirmatory factor analysis with independent samples in diverse contexts. Third, convergent validity with other established CT measures was not analyzed. Although no existing instrument covers all dimensions of CT, applying the CTES alongside other measures to assess the correlation between them is advisable.

In conclusion, this research presents a robust and reliable psychometric instrument for evaluating analytical and argumentative CT skills and dispositions in the Colombian population. The scale, named the Critical Thinking Evaluation Scale (CTES), is provided in an application-ready version together with norms for scoring and interpretation (see Appendices B, C, and D), facilitating its use in various contexts to enhance CT skills and motivational dispositions. CT is a fundamental skill for students, enabling them to plan their learning, evaluate their performance, and monitor their progress effectively (Alwehaibi, 2012; Silva & Rodriguez, 2011). It is equally applicable in scientific and business contexts (Lin, 2014), and within organizations, strong CT abilities facilitate problem identification, contextualization according to complexity, and the application of methodologically sound solutions (Zúñiga, 2015). The CTES is thus expected to help address the difficulty of constructing adequate instruments for measuring and evaluating CT given the existing conceptual diversity (Dwyer et al., 2014; Ossa-Cornejo et al., 2017). In today’s rapidly evolving information society, the ability to identify individuals with adequate CT skills is more crucial than ever.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article is derived from the project “Risk and Protection Factors Associated with Risk Behaviors and Problems Affecting Mental Health in Children and Adolescents: Understanding, Analysis and Modification of Risk and Protection Factors Associated with Risk Behaviors”, with funding code PSIPHD-4-2023, of the Faculty of Psychology and Behavioral Sciences, of the Universidad de La Sabana.

ORCID iD

Fernando Riveros Munévar

Data Availability Statement

Data are available upon request to the authors.

References

Alsaleh

N. J.

(2020). Teaching critical thinking skills: Literature review. Turkish Online Journal of Educational Technology, 19(1), 21–39. https://files.eric.ed.gov/fulltext/EJ1239945.pdf.

Alwehaibi

(2012). Novel program to promote critical thinking among higher education students: Empirical study from Saudi Arabia. Asian Social Science, 8(11), 193. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=b10c926849ffcf01a5cbc54b121caf184eb082f0.

American Philosophical Association (APA). (1990). Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction. Executive Summary “The Delphi Report”. https://philarchive.org/archive/faccta.

American Psychological Association (APA). (2017). Ethical principles of psychologists and code of conduct. American Psychological Association. https://www.apa.org/ethics/code.

An Le, D. T. B., & Hockey (2022). Critical thinking in the higher education classroom: Knowledge, power, control, and identities. British Journal of Sociology of Education, 43(1), 140–158. https://doi.org/10.1080/01425692.2021.2003182

Aragón (2011). Evaluación psicológica: historia, fundamentos teórico-conceptuales y psicometría [Psychological assessment: History, theoretical-conceptual foundations, and psychometrics]. Manual Moderno.

Ato, López, & Benavente (2013). Un sistema de clasificación de los diseños de investigación en psicología [A classification system for research designs in psychology]. Anales de Psicología, 29(3), 1038–1059. https://doi.org/10.6018/analesps.29.3.178511

Barraza (2007). La consulta a expertos como estrategia para la recolección de evidencias de validez basadas en el contenido [Expert consultation as a strategy for collecting content-based validity evidence]. Investigación educativa duranguense, 7, 5–14. https://dialnet.unirioja.es/servlet/articulo?codigo=2358908

Beckie, T. M., Lowry, L. W., & Barnett (2001). Assessing critical thinking in baccalaureate nursing students: A longitudinal study. Holistic Nursing Practice, 15(3), 18–26. https://pdfs.journals.lww.com/hnpjournal/2001/04000/Assessing_Critical_Thinking_in_Baccalaureate.6.pdf

Bernard, R. M., Zhang, Abrami, P. C., Sicoly, Borokhovski, & Surkes, M. A. (2008). Exploring the structure of the Watson–Glaser critical thinking appraisal: One scale or many subscales? Thinking Skills and Creativity, 3(1), 15–22. https://doi.org/10.1016/j.tsc.2007.11.001

Black (2012). An overview of a programme of research to support the assessment of critical thinking. Thinking Skills and Creativity, 7, 122–133. https://doi.org/10.1016/j.tsc.2012.04.003

Butler, H. A., Dwyer, C. P., Hogan, M. J., Franco, Rivas, S. F., Saiz, & Almeida, L. S. (2012). The Halpern critical thinking assessment and real-world outcomes: Cross-national applications. Thinking Skills and Creativity, 7(2), 112–121. https://doi.org/10.1016/j.tsc.2012.04.001

Chadha (2009). Applied psychometry. SAGE Publications.

Choy, S. C., & Cheah, P. K. (2009). Teacher perceptions of critical thinking among students and its influence on higher education. International Journal of Teaching and Learning in Higher Education, 20(2), 198–206. https://files.eric.ed.gov/fulltext/EJ864337.pdf

Colombian Ministry of Health. (1993, October 4). Resolución N° 8430 de 1993 [Resolution No. 8430 of 1993]. https://www.minsalud.gov.co/sites/rid/lists/bibliotecadigital/ride/de/dij/resolucion-8430-de-1993.pdf

Dawson, T. L. (2008). Metacognition and learning in adulthood. Developmental Testing Service, LLC.

Dwyer, C. P. (2011). The evaluation of argument mapping as a learning tool [Doctoral thesis, National University of Ireland]. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=07433d5ecc9d4c65cd05eb642ea937e91eea73b0

Dwyer, C. P., Hogan, M. J., & Stewart (2014). An integrated critical thinking framework for the 21st century. Thinking Skills and Creativity, 12, 43–52. https://doi.org/10.1016/j.tsc.2013.12.004

Ellerton (2022). On critical thinking and content knowledge: A critique of the assumptions of cognitive load theory. Thinking Skills and Creativity, 43, 100975. https://doi.org/10.1016/j.tsc.2021.100975

Ennis, R. H., & Millman (2005). Cornell critical thinking test, level X. Midwest Publications.

Escobar & Cuervo-Martínez (2008). Validez de contenido y juicio de expertos: Una aproximación a su utilización [Content validity and expert judgment: An approach to their use]. Avances en medición, 6, 27–36. https://gc.scalahed.com/recursos/files/r161r/w25645w/Juicio_de_expertos_u4.pdf

Facione, P. A. (1990). Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction. Research findings and recommendations. California State University: ERIC.

Facione, P. A. (2011). Critical thinking: What it is and why it counts. Insight Assessment, 2007(1), 1–23. https://d1wqtxts1xzle7.cloudfront.net/71022740/what_why98-libre.pdf

Facione, P. A., & Facione, N. C. (2001). Analyzing explanations for seemingly irrational choices: Linking argument analysis and cognitive science. International Journal of Applied Philosophy, 15(2), 267–28. https://w.insightassessment.com/var/ezflow_site/storage/pdf/IJAP_Analysis_Paper.pdf

Gambrill (2006). Critical thinking in clinical practice: Improving the quality of judgments and decisions. John Wiley & Sons.

Georgiadou, Rahanu, Siakas, K. V., McGuinness, Edwards, J. A., Hill, Khan, Kirby, Cavanagh, & Knezevic (2018). Fake news and critical thinking in information evaluation. https://eprints.mdx.ac.uk/24364/1/BIHAC%202018%20Georgiadou%20et%20al%20.

Ghanizadeh (2011). An investigation into the relationship between self-regulation and critical thinking among Iranian EFL teachers. Journal of Technology & Education, 5(2), 117–124. https://jte.sru.ac.ir/article_292_00817d7decdf06118d88b083a2687ce9.pdf

Halpern, D. F. (1998). Teaching critical thinking across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist, 53(4), 449–455. https://psycnet.apa.org/fulltext/1998-00766-023.pdf

Halpern, D. F. (2014). Thought and knowledge: An introduction to critical thinking (5th ed.). Psychology Press.

Heydarnejad, Hosseini, & Ghonsooly (2021). The relationship between critical thinking, self-regulation, and teaching style preferences among EFL teachers: A path analysis approach. Journal of Language and Education, 7(1), 98–110. https://doi.org/10.17323/jle.2021.1110

Hong, Y. C., & Choi (2015). Assessing reflective thinking in solving design problems: The development of a questionnaire. British Journal of Educational Technology, 46(4), 848–863. https://doi.org/10.1111/bjet.12181

JASP Team. (2022). JASP (Version 0.16.2) [Computer software]. https://jasp-stats.org/

Kocak, Coban, Aydin, & Cakmak (2021). The mediating role of critical thinking and cooperativity in the 21st century skills of higher education students. Thinking Skills and Creativity, 42, 100967. https://doi.org/10.1016/j.tsc.2021.100967

Ku, K. Y. L. (2009). Assessing students’ critical thinking performance: Urging for measurements using multi-response format. Thinking Skills and Creativity, 4(1), 70–76. https://doi.org/10.1016/j.tsc.2009.02.001

Kuhn & Dean (2004). Metacognition: A bridge between cognitive psychology and educational practice. Theory into Practice, 43(4), 268–274. https://www.jstor.org/stable/pdf/3701534.pdf

Lin, S. S. (2014). Science and non-science undergraduate students’ critical thinking and argumentation performance in reading a science news report. International Journal of Science and Mathematics Education, 12, 1023–1046. https://doi.org/10.1007/s10763-013-9451-7

Lloret-Segura, Ferreres-Traver, Hernández-Baeza, & Tomás-Marco (2014). El análisis factorial exploratorio de los ítems: una guía práctica, revisada y actualizada [Exploratory factor analysis of items: A practical, revised, and updated guide]. Anales de Psicología, 30(3), 1151–1169. https://revistas.um.es/analesps/article/download/analesps.30.3.199361/165441

Magno (2010). The role of metacognitive skills in developing critical thinking. Metacognition and Learning, 5(2), 137–156. https://doi.org/10.1007/s11409-010-9054-4

Melsert, A. L. D. M., & Bicalho, P. P. G. D. (2012). Mismatches between a critical practice in psychology and traditional views on education. Psicologia Escolar e Educacional, 16(1), 153–160. https://doi.org/10.1590/S1413-85572012000100016

Miele & Wigfield (2014). Quantitative and qualitative relations between motivation and critical analytic thinking. Educational Psychology Review, 26(4), 519–541. https://doi.org/10.1007/s10648-014-9282-2

Nieto, A. M., & Saiz (2008). Relación entre las habilidades y las disposiciones del pensamiento crítico [The relationship between critical thinking skills and dispositions]. Motivación y emoción: Contribuciones actuales, 2, 255–263. https://www.pensamiento-critico.com/archivos/motdispopc.pdf

Niu, Behar-Horenstein, & Garvan (2013). Do instructional interventions influence college students’ critical thinking skills? A meta-analysis. Educational Research Review, 9, 114–128. https://doi.org/10.1016/j.edurev.2012.12.002

Nussbaum, Barahona, Rodríguez, Guentulle, Lopez, Vázquez-Uscanga, & Cabezas (2021). Taking critical thinking, creativity, and grit online. Educational Technology Research and Development, 69(1), 201–206. https://doi.org/10.1007/s11423-020-09867-1

Ossa-Cornejo, C. J., Palma-Luengo, M. R., Martín, L. S., Nelly, Quintana-Abello, I. M., & Díaz-Larenas, C. H. (2017). Analysis of critical thinking measuring instruments. Ciencias psicológicas, 11(1), 19–28. https://doi.org/10.22235/cp.v11i2.1343

Paul & Elder (2003). La mini-guía para el pensamiento crítico: Conceptos y herramientas [The miniature guide to critical thinking: Concepts and tools]. Fundación para el Pensamiento Crítico. https://www.criticalthinking.org/resources/PDF/SP-ConceptsandTools.pdf

Paz, J. S., Molina, E. C., & Sánchez, L. P. (2010). Pensamiento crítico y capacidad intelectual [Critical thinking and intellectual capacity]. Faísca, 15(17), 98–110. https://dialnet.unirioja.es/descarga/articulo/3548104.pdf

Pérez & Medrano (2010). Análisis factorial exploratorio: Bases conceptuales y metodológicas [Exploratory factor analysis: Conceptual and methodological bases]. Revista Argentina de Ciencias del Comportamiento, 2(1), 58–66. https://dialnet.unirioja.es/descarga/articulo/3161108.pdf

Ricketts, J. C., & Rudd (2004). The relationship between critical thinking dispositions and critical thinking skills of selected youth leaders in the national FFA organization. Journal of Southern Agricultural Education Research, 54(1), 21–33. https://www.researchgate.net/profile/John-Ricketts/publication/266160642_The_Relationship_between_Critical_Thinking_Dispositions_and_Critical_Thinking_Skills_of_Selected_Youth_Leaders_in_the_National_FFA_Organization.pdf

Rivas, S. F., & Saiz (2012). Validación y propiedades psicométricas de la prueba de pensamiento crítico PENCRISAL [Validation and psychometric properties of the PENCRISAL critical thinking test]. Revista Electrónica de Metodología Aplicada, 17(1), 18–34. https://dialnet.unirioja.es/descarga/articulo/4107460.pdf

Saiz, Rivas, S. F., & Olivares (2015). Collaborative learning supported by rubrics improves critical thinking. Journal of the Scholarship of Teaching and Learning, 15(1), 10–19. https://doi.org/10.14434/josotl.v15i1.1290

Schroyens (2005). Knowledge and thought: An introduction to critical thinking. Experimental Psychology, 52(2), 163–164. https://psycnet.apa.org/fulltext/2005-02752-008.pdf

Shutaleva (2021). Critical thinking in media sphere: Attitude of university teachers to fake news and its impact on the teaching. Journal of Management Information and Decision Sciences, 24(51), 1–12. https://philarchive.org/archive/SHUCTI

Silva & Rodrigues (2011). Critical thinking: Its relevance for education in a shifting society. Revista de Psicología, 29(1), 175–195. https://www.redalyc.org/pdf/3378/337829518007.pdf

Sorensen & Yankech (2008). Precepting in the fast lane: Improving critical thinking in new graduate nurses. The Journal of Continuing Education in Nursing, 39(5), 208–216. https://doi.org/10.3928/00220124-20080501-07

Tristán-López (2008). Modificación al modelo de Lawshe para el dictamen cuantitativo de la validez de contenido de un instrumento objetivo [A modification of Lawshe’s model for the quantitative assessment of the content validity of an objective instrument]. Avances en Medición, 6, 37–48. https://www.humanas.unal.edu.co/lab_psicometria/application/files/9716/0463/3548/VOL_6._Articulo4_Indice_de_validez_de_contenido_37-48.pdf

Tung, C. A., & Chang, S. Y. (2009). Developing critical thinking through literature reading. Feng Chia Journal of Humanities and Social Sciences, 19, 287–317. http://www.cocd.fcu.edu.tw/wSite/publicfile/Attachment/f1262069682958.pdf

Valenzuela & Nieto, A. M. (2008a). Motivación y pensamiento crítico: Aportes para el estudio de esta relación [Motivation and critical thinking: Contributions to the study of this relationship]. Revista Electrónica de Motivación y Emoción, 11(28). https://d1wqtxts1xzle7.cloudfront.net/49226601/article3-libre.pdf

Valenzuela & Nieto, A. M. (2008b). Motivación y disposiciones como predictores del desempeño del pensamiento crítico [Motivation and dispositions as predictors of critical thinking performance]. In Emoción y Revista Motivación: investigaciones actuales (pp. 135–150). Publicaciones Universidad de La Laguna. https://d1wqtxts1xzle7.cloudfront.net/51915693/Investigaciones_Actuales_en_Motivacion_y_Emocion-libre.pdf

Valero (2013). Transformación e interpretación de las puntuaciones [Transformation and interpretation of scores]. Universitat Oberta de Catalunya. https://openaccess.uoc.edu/bitstream/10609/69325/1/Psicometr%C3%ADa_M%C3%B3dulo%204_Transformaci%C3%B3n%20e%20interpretaci%C3%B3n%20de%20las%20puntuaciones.pdf

Watson & Glaser, E. M. (1980). Critical thinking appraisal, forms A and B. Harcourt, Brace and World.

Wechsler, S. M., Saiz, Rivas, S. F., Vendramini, C. M. M., Almeida, L. S., Mundim, M. C., & Franco (2018). Creative and critical thinking: Independent or overlapping components? Thinking Skills and Creativity, 27, 114–122. https://doi.org/10.1016/j.tsc.2017.12.003

Werner, P. H. (1991). The Ennis-Weir critical thinking essay test: An instrument for testing and teaching. Journal of Adolescent & Adult Literacy, 34(6), 494. https://www.proquest.com/openview/c276907d387feded6ac6c65788b73bef/1?pq-origsite=gscholar&cbl=42001

Zúñiga (2015). Las competencias de pensamiento crítico y liderazgo colaborativo en las prácticas de psicología organizacional [Critical thinking and collaborative leadership competencies in organizational psychology practice]. Divulgare boletín científico de la escuela superior de Actopan, 2(4). https://doi.org/10.29057/esa.v2i4.1625