The Debate Skills Scale: Development and Psychometric Validation of a Multidimensional Self-Report Measure for High School Students

Abstract

We developed and validated the Debate Skills Scale (DSS), grounded in social cognitive theory, to address the lack of an instrument for assessing debate competencies among high school students; existing measures primarily assess general critical thinking rather than debate performance. Data were collected from 389 students, and a 49-item draft scale was generated based on literature and expert input. Exploratory factor analysis (EFA) revealed a four-factor structure explaining 50.95% of the variance and reduced the scale to 29 items. Four further items were removed during confirmatory factor analysis (CFA), resulting in a final 25-item scale. CFA supported this structure, with internal consistency reliability ranging from acceptable to excellent, although lower reliability was observed for the Speaking Skills subscale. Discriminative validity analyses indicated that the DSS differentiated students based on debate experience and performance. The DSS provides a psychometrically validated self-report instrument for assessing perceived debate competencies and may inform instructional planning.

Keywords

Debate Skills Scale, self-report measurement, debate competencies, high school students, psychometric validation

Introduction

Debate is widely implemented in secondary education and frequently embedded within advanced academic and gifted programming as a structured pedagogical method for cultivating higher-order reasoning, argumentation, and strategic communication (Baketa et al., 2023; Shum et al., 2024; Thornhill-Miller et al., 2023). Within gifted education in particular, debate functions as a performance-based domain through which students demonstrate cognitive complexity, rhetorical fluency, and ethical engagement.

Despite this instructional importance, the field lacks a domain-specific, psychometrically validated instrument designed to measure debate-specific competencies distinct from general critical thinking constructs (Ennis, 2009). This measurement gap limits systematic evaluation of perceived debate competencies and constrains enrichment-oriented instructional planning.

Contemporary educational paradigms emphasize assessing complex competencies beyond knowledge acquisition (Bhardwaj et al., 2025; González-Pérez & Ramírez-Montoya, 2022), reinforcing the need for reliable tools to evaluate structured argumentation and strategic communication in performance-based domains such as debate (Walker & Kettler, 2020).

However, such competencies are often difficult to observe directly in classroom settings, increasing the need for well-designed self-report instruments. Social cognitive theory posits that individuals’ beliefs about their capabilities influence performance behaviors, persistence, and strategic engagement (Bandura, 2006). In performance-based domains such as debate, students’ perceived self-efficacy may shape how effectively they construct arguments, respond to counterclaims, and regulate their speaking performance. Accordingly, measuring self-perceived debate competencies provides theoretically grounded insight into mastery and agency in performance. Within social cognitive theory, self-efficacy develops through mastery experiences, observational learning, social persuasion, and performance feedback (Bandura, 2006). In debate settings, repeated argumentative exchanges and peer modeling shape performance beliefs related to argument construction, communication, ethical positioning, and speaking regulation. Accordingly, a multidimensional measurement framework grounded in social cognitive theory is theoretically warranted.

Importantly, self-efficacy is not a generalized trait but a domain-specific construct that varies across contexts and performance conditions (Bandura, 2006). Recent scale development studies reinforce this distinction. For example, Wang and Chuang (2024) demonstrated that general technology self-efficacy measures were insufficient to capture artificial intelligence–specific performance beliefs, necessitating the development of a domain-tailored instrument. Similarly, Greco et al. (2022) emphasized that academic self-efficacy is multidimensional and must reflect context-bound competencies rather than global confidence beliefs. These findings support the argument that debate-related self-efficacy cannot be adequately inferred from general critical thinking or academic self-efficacy scales. Instead, a theoretically grounded, domain-specific instrument is required to capture the performative, interactive, and strategic nature of debate.

Research in advanced academic education has increasingly emphasized the need for domain-specific assessment of performance-based competencies (Thornhill-Miller et al., 2023). However, few instruments isolate debate-specific competencies as a distinct domain of performance. Precise measurement becomes particularly important in enrichment-oriented contexts where instructional decisions rely on valid assessment of students’ demonstrated strengths. Recent psychometric research further underscores the necessity of domain-specific measurement. Xu et al. (2025) demonstrated that generic competency frameworks fail to capture context-dependent performance behaviors, thereby strengthening construct validity when domain-tailored instruments are developed. By extension, debate, particularly in gifted and performance-oriented settings, requires a similarly specialized measurement framework.

Educational methods play a central role in fostering 21st-century competencies such as critical thinking, collaboration, and communication (González-Pérez & Ramírez-Montoya, 2022). Among these, debate functions as a performance-based pedagogical strategy through which students develop structured argumentation, strategic rebuttal, verbal fluency, and ethical engagement (Baketa et al., 2023; Shum et al., 2024). Beyond general cognitive skills, debate has been associated with civic awareness and democratic engagement (Baketa et al., 2023), strengthening students’ capacity for complex analysis, persuasive reasoning (Demir et al., 2016), and cognitive and interpersonal flexibility in dynamic learning contexts (Örün & Sever, 2025).

Debate Technique and Education

Debate is explicitly included in the 2024 Turkish Language and Literature Curriculum as a recommended instructional activity (MoNE, 2024a, 2024b). As both a pedagogical method and oral discourse practice, it supports cognitive, emotional, and social development (Çabuk & Yeni, 2016). Through structured engagement, students analyze, defend, and evaluate competing perspectives while developing tolerance and adaptability (Kennedy, 2009).

Historically rooted in the dialectical traditions of Ancient Greece and later integrated into Islamic intellectual practices as “Jedel,” debate has long functioned as a structured method for examining opposing perspectives (Benli, 2019; Büyükdinç, 2007; Darby, 2007). Although debate draws upon critical thinking, it often requires participants to argue positions independent of personal beliefs, emphasizing rhetorical performance rather than internal conviction (Ennis, 2009).

In practice, debate requires participants to present contrasting perspectives on real or hypothetical issues. It fosters group collaboration, links discussions to real-life scenarios, and encourages the use of structured rhetorical techniques and evidence presentation within assigned perspectives (Graefe, 2024). The process often culminates in a reflective summary, where participants refine their stances or introduce additional evidence to support their claims. In competitive settings, a jury may evaluate debates, providing feedback and scoring based on argument quality and engagement (Arung & Jumardin, 2016; Iman, 2017). One distinction is crucial for researchers: debate can facilitate the development of critical thinking, but it does not inherently demonstrate it, especially when argument roles are assigned regardless of personal beliefs (Elder & Paul, 2020).

As curriculum implementers, teachers shape the educational value of debate by contextualizing it within student-centered pedagogies and guiding critical reflection beyond rhetorical performance (Kim et al., 2024). Allowing students to participate in topic selection enhances motivation and engagement (Zare & Othman, 2015). Additionally, structured guidance from teachers fosters autonomy and accountability, enabling students to take ownership of their learning (Boumediene et al., 2021). Teacher preparation is, therefore, critical. Recent findings emphasize that teachers with training in gifted education are better equipped to identify and support high-ability learners, suggesting that the DSS could complement professional development in this area (Woo et al., 2024).

The benefits of debate extend beyond intellectual development. Research links debate to enhanced critical thinking, public speaking, emotional intelligence, and respect for diverse perspectives (Boumediene et al., 2021; Chikeleze et al., 2018; Iman, 2017; Rodriguez-Dono & Hernandez-Fernandez, 2021). It has also been associated with resilience, collaboration, adaptive problem-solving (Darby, 2007), and creativity and teamwork in active learning environments (Laia, 2019; Lumbangaol & Mazali, 2020). However, without structured debriefing or self-evaluation mechanisms, the extent to which these outcomes translate into sustained and measurable competencies remains uncertain (Bandura, 2006).

In summary, the debate technique is a comprehensive teaching strategy that cultivates students’ cognitive, social, and emotional capacities. By promoting active engagement, critical analysis, and collaboration, it equips learners with the skills necessary to address complex real-world scenarios, which makes it a versatile instructional tool, particularly when aligned with curriculum objectives and implemented through reflective pedagogy.

Review of Existing Debate-Related Scales

The debate technique has been widely used in educational contexts, particularly in language learning, to foster argumentation, public speaking, and structured reasoning skills, which, in turn, may support the development of critical thinking (Rosyid & Hidayati, 2019). Several established instruments assess reasoning and critical thinking, such as the New Jersey Test of Reasoning Skills (1983), the California Critical Thinking Dispositions Inventory (1992), and the James Madison Test of Critical Thinking (2004); however, these tools evaluate general cognitive tendencies rather than debate-specific performance behaviors (Ennis, 2009). For example, the California Critical Thinking Dispositions Inventory evaluates general reasoning dispositions but does not assess time-bound argumentative exchange or structured rebuttal performance in dialogic contexts (Ennis, 2009). This limitation aligns with broader psychometric findings indicating that transferring general self-efficacy or reasoning instruments into specialized performance domains risks underrepresentation of their constructs (Wang & Chuang, 2024; Xu et al., 2025). Similarly, the James Madison Test of Critical Thinking measures analytical reasoning but does not capture interactive strategic communication under assigned positions. The New Jersey Test of Reasoning Skills focuses on logical problem-solving rather than rhetorical fluency, ethical positioning, or performance regulation within competitive debate formats.

Although adapted versions such as the Cornell Critical Thinking Test Level X (Özensoy, 2011) and the UF/EMI Critical Thinking Disposition Scale (Ertaş Kılıç & Şen, 2014) have been used in debate-related research, their focus remains on general cognitive dispositions rather than debate-specific performance behaviors.

To provide a structured comparison of representative instruments and their limitations for debate assessment, Table 1 summarizes their primary purposes, sample contexts, and conceptual constraints.

Table 1.

Comparison of Existing Instruments and Their Limitations for Debate Assessment.

Instrument	Primary Purpose	Sample Context	Limitations for Debate Assessment	Why Insufficient for Debate?
California Critical Thinking Dispositions Inventory	Measures general critical thinking dispositions	Secondary and higher education students	Focuses on reasoning tendencies rather than performance behaviors	Does not assess structured rebuttal, time-regulated exchange, or rhetorical performance
James Madison Test of Critical Thinking	Assesses analytical reasoning ability	College-level students	Emphasizes logical analysis in written format	Lacks measurement of interactive communication and strategic argumentation
New Jersey Test of Reasoning Skills	Measures logical reasoning skills	School-aged students	Focuses on cognitive problem-solving tasks	Does not capture debate-specific fluency, ethical positioning, or oral performance regulation

As summarized in Table 1, existing instruments primarily assess general reasoning or cognitive dispositions. None of these tools operationalize debate as an interactive, performance-based domain requiring strategic rebuttal, communicative fluency, ethical positioning, and time-regulated speaking.

Despite these contributions, there is a notable absence of a valid and reliable tool to assess debating skills. This reinforces the distinction between evaluating general cognitive tendencies and assessing debate-specific skills such as structured rebuttal, speaking fluency, and strategic engagement. Given this conceptual gap, a new tool is needed to address the performative, rhetorical, and ethical aspects of debate, distinct from general cognitive traits.

Grounded in social cognitive theory, we hypothesized a four-factor structure reflecting (a) cognitive mastery in argumentation and strategy, (b) social modeling and communicative interaction, (c) moral agency in ethical collaboration, and (d) performance efficacy in speaking skills. Consistent with multidimensional self-efficacy models in academic contexts (Greco et al., 2022), we conceptualized debate competence as a structured set of interrelated but distinct performance beliefs rather than a unidimensional construct.

Each proposed factor was conceptually derived from core components of social cognitive theory, linking domain-specific self-efficacy beliefs to cognitive mastery, social modeling processes, moral agency, and performance regulation. This study addresses this critical gap by developing and validating the Debate Skills Scale (DSS) specifically for high school students. The research has two primary objectives:

To establish the validity and reliability of the DSS.

To analyze the DSS's exploratory and confirmatory factor structures for high school students.

Method

Research Design

This study employed an exploratory sequential mixed-methods design (QUAL → QUAN), in which qualitative findings informed subsequent quantitative scale validation (Creswell & Plano Clark, 2018). This research design collects and analyses qualitative and quantitative data during distinct phases, with one method taking precedence (Leech & Onwuegbuzie, 2009). The qualitative phase involved multiple steps to inform the development of the DSS items: an extensive literature review, analysis of existing scale items, observation of debate processes, interviews with students participating in debates, and consultations with experts in the field. These qualitative steps informed the conceptual framework of the DSS and guided the formulation of item content grounded in real-world debate experiences.

Subsequently, the DSS items were presented to participants, and the collected data were subjected to quantitative analysis, constituting the study's main phase. This sequential integration of qualitative insights and quantitative validation ensured a comprehensive approach to developing and testing the DSS. The study design reflects a qualitative-to-quantitative sequence, with the qualitative phase shaping item development and the quantitative phase testing psychometric properties.

Study Group

The sample for this study comprised 389 high school students enrolled in various institutions across Istanbul during the 2023–2024 academic year. Participants were enrolled in public high schools and general secondary schools; no International Baccalaureate, Advanced Placement, or formally designated gifted-track programs were included. Participants were selected using a convenience sampling strategy, which allowed for practical access to students actively engaged in debate events. Inclusion criteria required participants to be enrolled in grades 9–12, to have participated in at least one structured debate activity, and to demonstrate sufficient Turkish literacy to comprehend scale items. Exclusion criteria included incomplete responses exceeding 10% missing data and failure to provide informed consent. Missing data were minimal (<3%). Cases with substantial missing responses were removed using listwise deletion. The remaining missing values were handled using full information maximum likelihood (ML) during CFA estimation. High school students were selected based on prior findings that indicate their prominent and consistent participation in debate practices during formal education (Kılıç, 2013). The DSS was administered in group classroom settings using paper-and-pencil forms. Completion time ranged between 12 and 15 min. Administration was conducted under the researcher's supervision to ensure standardized instructions. The demographic characteristics of the participants are presented in Table 2, offering a detailed overview of their distribution across relevant categories. These demographic distributions offer contextual grounding for subsequent validity analyses and ensure transparency in sample representation.

Table 2.

Demographic Information of EFA and CFA Participants.

EFA Sample				CFA Sample
Variable	Category	N	%	Variable	Category	N	%
Gender	Male	69	34.2	Gender	Male	66	35.3
Gender	Female	133	65.8	Gender	Female	121	64.7
Grade level	Preparatory	38	18.8	Grade level	Preparatory	30	16.0
	Grade 9	57	28.2		Grade 9	50	26.7
	Grade 10	53	26.2		Grade 10	45	24.1
	Grade 11	38	18.8		Grade 11	36	19.3
	Grade 12	16	7.9		Grade 12	26	13.9
Total		202	100	Total		187	100

EFA = exploratory factor analysis; CFA = confirmatory factor analysis.

The exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) samples together comprised 389 participants, collected in two separate phases. Specifically, data from 202 participants were used for the EFA, while data from 187 participants were allocated to the CFA. Among the participants, 135 (34.70%) were male, and 254 (65.30%) were female.

Institutional Review Board approval was obtained from İstanbul Sabahattin Zaim University. No personally identifiable information was collected during the study. The research posed minimal risk, as it involved only completing a self-report scale to evaluate students’ debate-related competencies. The items in the DSS were shared in advance with both students and school administrators to ensure transparency and comprehension. High school students from public schools in two provinces of Türkiye were included through voluntary participation, with permission from school principals. No incentives were offered, and no classroom instruction was disrupted during data collection. The findings are intended to support students’ skill development and inform future instructional practices related to argumentation and communication.

Development Process of the Measurement Tool

The DSS development process followed the methodological principles outlined by DeVellis (2014), which include eight key steps in scale construction: defining the construct, generating an item pool, determining the format, conducting expert review, pretesting, analyzing items, performing factor analysis, and assessing reliability. The construct to be measured was defined as a latent psychological skill set involving observable and self-reported indicators of debate-specific competencies, such as argument construction, strategic rebuttal, verbal fluency, and ethical collaboration. An initial pool of 80 items was developed based on an extensive literature review, existing instruments related to communication and reasoning skills, structured observations of school debate activities, and semistructured interviews with 12 students who had participated in interschool debate competitions. Items reflected four expected domains: argumentation, speaking ability, teamwork/ethics, and debate strategy. These domains were theoretically derived and empirically validated.

The qualitative interview data were analyzed using inductive content analysis. Two independent coders reviewed the transcripts and developed initial open codes. Codes were compared, refined through discussion, and organized into higher-order categories and themes. Intercoder agreement was calculated using percent agreement and reached 87%, indicating satisfactory reliability. A codebook outlining definitions and exemplar quotations is provided in Appendix A. Table 3 presents the code–category–theme structure that informed item generation.

Table 3.

Code–Category–Theme Structure Informing DSS Item Development.

Theme	Category	Example Codes	Item Domain Reflected
Argumentative reasoning	Evidence use	Citing sources, counter-argument	Argumentation and strategy
Strategic engagement	Rebuttal management	Responding under time pressure	Argumentation and strategy
Communicative fluency	Verbal clarity	Articulation, tone control	Speaking skills
Ethical collaboration	Respectful discourse	Turn-taking, fairness	Ethics and cooperation

DSS = Debate Skills Scale.
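
Percent agreement, as reported for the two coders above, is the share of coded segments to which both coders assigned the same code. The following Python sketch illustrates the calculation under stated assumptions: the function name, the segment codes, and the number of segments are hypothetical and are not drawn from the study transcripts.

```python
def percent_agreement(coder_a, coder_b):
    """Return the percentage of segments given the same code by both coders."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty set of segments.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical codes for eight transcript segments (illustration only).
coder_a = ["evidence", "rebuttal", "clarity", "fairness",
           "evidence", "rebuttal", "clarity", "evidence"]
coder_b = ["evidence", "rebuttal", "clarity", "fairness",
           "evidence", "clarity", "clarity", "evidence"]

agreement = percent_agreement(coder_a, coder_b)
print(f"Percent agreement: {agreement:.1f}%")
```

Percent agreement does not correct for agreement expected by chance; chance-corrected indices such as Cohen's kappa are typically lower for the same data.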

To ensure content validity, the study adhered to the guidelines of Lawshe (1975) and McKenzie et al. (1999), which recommend consulting at least five experts during scale development. Content validity was assessed using Lawshe's (1975) method. Content Validity Ratio (CVR) values were calculated, and items that did not meet the threshold (CVR < .99 for five experts) were revised or eliminated. The expert panel consisted of two faculty members specializing in Turkish Education, two literature teachers, and one debate expert. These experts assessed the clarity, appropriateness, and relevance of the items. Based on their evaluations, items were added, revised, or removed as necessary, resulting in the elimination of 31 items. The full item reduction process across all stages is summarized in Appendix B.
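
Lawshe's (1975) Content Validity Ratio is computed as CVR = (n_e − N/2) / (N/2), where n_e is the number of experts rating an item as essential and N is the panel size. The sketch below, in which the function and variable names are assumptions chosen for illustration, shows why a threshold of .99 with five experts is met only when all five rate an item as essential.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2)."""
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must lie between 0 and n_experts.")
    half = n_experts / 2
    return (n_essential - half) / half


N_EXPERTS = 5          # panel size reported in the article
CVR_THRESHOLD = 0.99   # threshold reported in the article for five experts

cvr_by_count = {}
for n_essential in range(N_EXPERTS + 1):
    cvr = content_validity_ratio(n_essential, N_EXPERTS)
    cvr_by_count[n_essential] = cvr
    decision = "retain" if cvr >= CVR_THRESHOLD else "revise or eliminate"
    print(f"{n_essential} of {N_EXPERTS} experts: CVR = {cvr:+.2f} -> {decision}")
```

With four of five experts in agreement the ratio is .60, so under this threshold a single dissenting rating is enough to flag an item for revision or elimination.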

The revised version was reviewed by a separate group of five students for clarity and comprehensibility prior to implementation, serving as a basic pilot for language control. However, no formal pilot data analysis was conducted. Although the EFA sample size approached recommended participant-to-item ratios, the ratio (approximately 4:1) was slightly below conservative guidelines and is acknowledged as a limitation (Worthington & Whittaker, 2006). The final draft of the DSS consisted of 49 items and utilized a 5-point Likert-type response format (1 = “strongly disagree” to 5 = “strongly agree”).
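
The item counts and the participant-to-item ratio reported in this article can be tallied as a quick consistency check. The sketch below uses only figures stated in the article (80 initial items; 31, 20, and 4 items removed at expert review, EFA, and CFA, respectively; 202 EFA participants); the variable names are illustrative assumptions.

```python
initial_pool = 80
removed_by_stage = {"expert review": 31, "EFA": 20, "CFA": 4}

# Track how many items remain after each stage.
remaining = initial_pool
items_after = {}
for stage, n_removed in removed_by_stage.items():
    remaining -= n_removed
    items_after[stage] = remaining
    print(f"After {stage}: {remaining} items")

# Participant-to-item ratio for the EFA, based on the 49-item draft.
efa_sample_size = 202
ratio = efa_sample_size / items_after["expert review"]
print(f"EFA participant-to-item ratio: {ratio:.2f}:1")
```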

Data Analysis

The validity and reliability of the 49-item draft scale were assessed using SPSS and AMOS. To evaluate construct validity, an EFA was conducted using principal axis factoring (PAF) to identify latent constructs from the items. Before EFA, assumptions of normality, linearity, and absence of multicollinearity were verified to ensure the robustness of the factor structure. PAF was chosen because it better captures latent psychological constructs than data-reduction techniques (Fabrigar et al., 1999).

Following EFA, CFA was conducted using ML estimation in AMOS 24 to validate the proposed factor structure. Model fit was evaluated according to Hu and Bentler's (1999) criteria (CFI ≥ .90, RMSEA ≤ .08, SRMR ≤ .08). The model demonstrated acceptable fit (χ²/df = 2.1, RMSEA = .058, CFI = .94, TLI = .92, SRMR = .042), supporting the four-factor structure identified during EFA. This two-step approach ensured that the scale's factor structure was theoretically sound and statistically robust. Cronbach's alpha coefficients were calculated to assess internal consistency. Additionally, McDonald's Omega values were computed for each subscale, providing more robust estimates for multidimensional constructs (Hayes & Coutts, 2020). Subscale alpha values ranged from .78 to .91, indicating strong internal consistency.
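
Cronbach's alpha for a subscale with k items is α = [k / (k − 1)] × [1 − (sum of item variances) / (variance of total scores)]. The following sketch is not the SPSS procedure used in the study; it is a minimal illustration using Python's standard library, and the response matrix is hypothetical (five respondents, three items on a 5-point scale).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("At least two items and two respondents are required.")
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("Total scores have zero variance.")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses (rows = respondents, columns = items); illustration only.
responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```

McDonald's omega, also reported in the study, requires factor loadings from a fitted measurement model and is therefore not covered by this sketch.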

Prior to performing EFA and CFA, the appropriateness of the sample was evaluated using the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy and Bartlett's Test of Sphericity. Additionally, skewness and kurtosis values were examined for normality. All values remained within ±2, supporting the use of ML estimation (Gravetter & Wallnau, 2011). All analyses were conducted after confirming that multicollinearity was absent among the items, ensuring a distinct measurement of the underlying constructs (Kline, 2023). Multivariate normality was assessed using Mardia's coefficient and indicated no severe deviation. The KMO value of .876 indicated meritorious sampling adequacy (Hair et al., 2014), while Bartlett's test of sphericity (χ²(1176) = 4491.119, p < .001) confirmed that the data were suitable for factor analysis (Fabrigar et al., 1999). Items with factor loadings below .40 or high cross-loadings on multiple factors were removed. During CFA, a total of four items were excluded, resulting in a final scale of 25 items. A detailed overview of item reduction across stages is provided in Appendix B.
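
The degrees of freedom for Bartlett's test equal the number of distinct off-diagonal correlations among p items, that is, p(p − 1)/2. As a brief check, the sketch below (with an illustrative function name) reproduces the reported degrees of freedom for the 49-item draft.

```python
def bartlett_degrees_of_freedom(n_items):
    """Number of distinct item pairs: p * (p - 1) / 2."""
    if n_items < 2:
        raise ValueError("At least two items are required.")
    return n_items * (n_items - 1) // 2


draft_items = 49  # size of the draft scale submitted to EFA
df = bartlett_degrees_of_freedom(draft_items)
print(f"Bartlett's test df for {draft_items} items: {df}")
```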

Researchers’ Positionality

The primary researcher has prior experience in debate coaching and curriculum-based argumentation instruction. To minimize potential bias, qualitative coding was conducted independently by two researchers, and all analytic decisions were documented. The research team includes members with experience in psychometric analysis and mixed-methods research design.

Findings

Results of EFA

Based on prior literature and expert input, the scale was expected to reflect four potential debate skill domains: argumentation/strategy, communication, ethics/collaboration, and speaking. The emergence of four factors in the EFA was consistent with this theoretical expectation, supporting the scale's construct validity. An EFA was conducted on a sample of 202 participants using the initial 49 items of the scale, yielding four distinct dimensions. A CFA was subsequently conducted on an independent sample of 187 participants to validate the factor structure. Following EFA, 20 items were removed due to low or cross-factor loadings, yielding a 29-item structure that explained 50.95% of the total variance. The 29 items were distributed as follows: 16 in Factor 1, four in Factor 2, five in Factor 3, and four in Factor 4. The KMO coefficient was .876, indicating good sampling adequacy for factor analysis. In addition to theoretical expectations, the scree plot and eigenvalue criteria provided support for the four-factor solution. The scree plot illustrating the number of factors obtained through EFA is presented in Figure 1, providing a visual representation of the factor structure.

Figure 1.

Scree plot graph.

As a result of the EFA conducted on the 49 items, the factor loadings and the total variances explained by each factor are presented in Table 4. This table provides a detailed breakdown of each factor's contribution to the overall variance, offering a clear representation of the scale's dimensional structure.

Table 4.

Factor Loadings From the Exploratory Factor Analysis.

			Rotated Factor Loadings
	Item No	Communality	Factor 1	Factor 2	Factor 3	Factor 4
1	24	.534	.692
2	32	.493	.668
3	33	.449	.667
4	37	.511	.664
5	39	.580	.647
6	23	.459	.643
7	34	.492	.639
8	3	.415	.615
9	29	.507	.612
10	26	.442	.612
11	2	.422	.596
12	40	.515	.580
13	43	.425	.559
14	41	.387	.549
15	36	.513	.544
16	21	.451	.532
17	12	.820		.863
18	13	.732		.799
19	11	.685		.765
20	14	.429		.510
21	47	.572			.730
22	27	.489			.666
23	18	.425			.636
24	17	.501			.612
25	49	.466			.536
26	42	.479				.659
27	35	.567				.631
28	31	.516				.613
29	30	.498				.486
Total Variance Explained		50.95%	30.55%	8.21%	6.43%	5.76%

In the scale development process, factor loadings were categorized as high (.60 or above) or medium (.30–.59), regardless of their sign (Büyüköztürk et al., 2018). Items with factor loadings below .30 were removed, and as shown in Table 4, the remaining factor loadings ranged from .486 to .863. The EFA revealed that the 29-item structure accounted for 50.95% of the total variance. Specifically, the first factor accounted for 30.55% of the variance; the second, 8.21%; the third, 6.43%; and the fourth, 5.76%.
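
The classification rule and the variance totals described above can be applied directly to the values in Table 4. In the sketch below, the loadings and variance percentages are transcribed from Table 4; the function name is an assumption, and loadings between .59 and .60 (which the stated bands do not cover explicitly) are assumed here to count as medium.

```python
# Rotated loadings transcribed from Table 4, grouped by factor.
loadings = {
    "Factor 1": [.692, .668, .667, .664, .647, .643, .639, .615, .612, .612,
                 .596, .580, .559, .549, .544, .532],
    "Factor 2": [.863, .799, .765, .510],
    "Factor 3": [.730, .666, .636, .612, .536],
    "Factor 4": [.659, .631, .613, .486],
}
variance_explained = {"Factor 1": 30.55, "Factor 2": 8.21,
                      "Factor 3": 6.43, "Factor 4": 5.76}


def classify_loading(loading):
    """Classify a loading by absolute size (assumption: below .60 is medium)."""
    size = abs(loading)
    if size >= 0.60:
        return "high"
    if size >= 0.30:
        return "medium"
    return "below threshold"


counts = {"high": 0, "medium": 0, "below threshold": 0}
for factor_loadings in loadings.values():
    for value in factor_loadings:
        counts[classify_loading(value)] += 1

all_loadings = [v for values in loadings.values() for v in values]
total_variance = round(sum(variance_explained.values()), 2)

print(f"Items: {len(all_loadings)}, counts by band: {counts}")
print(f"Smallest loading: {min(all_loadings)}, largest: {max(all_loadings)}")
print(f"Total variance explained: {total_variance}%")
```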

The EFA identified four distinct factors within the scale. The first factor, consisting of 16 items (2, 3, 21, 23, 24, 26, 29, 32, 33, 34, 36, 37, 39, 40, 41, 43), was labeled “Argumentation and Strategy Skills.” Examples of this factor include: “I can bring a new angle to the match” and “I can develop logical and coherent arguments.” The second factor, comprising four items (11, 12, 13, 14), was designated as “Communication Skills,” with example items such as: “I use my body language effectively.” The third factor, which includes five items (17, 18, 27, 47, 49), was named “Debate Ethics and Collaboration Skills.” An example of this factor is: “I use respectful language in arguments.” Finally, the fourth factor, consisting of four items (30, 31, 35, 42), was identified as “Speaking Skills.” An illustrative item for this factor is: “I do not fall into repetition in my speeches.” All items were originally developed and validated in Turkish. English translations are provided for reference purposes only. The full set of final DSS items is presented in Appendix C.

Results of CFA

CFA was conducted on an independent sample of 187 participants to validate the structure identified through EFA. Figure 2 presents the path diagram and factor loadings for the four-factor structure of the DSS, providing a comprehensive visualization of the model.

Figure 2.

Path diagram of CFA results.

During CFA, four items (M14, M27, M35, M42) were removed due to insufficient standardized factor loadings. This resulted in a final 25-item structure distributed as 16 items in Argumentation and Strategy Skills, three in Communication Skills, four in Debate Ethics and Collaboration Skills, and two in Speaking Skills. The removed items were theoretically reviewed and found to duplicate content already measured in other factors or to reflect ambiguous behavioral indicators not central to the core construct definitions. Their removal improved model fit without altering the underlying four-factor structure suggested by EFA, preserving theoretical integrity. All remaining standardized loadings exceeded the acceptable threshold of .30, ranging from .52 to .98. Examination of the path diagram revealed that all items demonstrated good, very good, or excellent factor loadings, except for one item (M2), which had an acceptable factor loading of .52. The fit indices for the DSS are presented in Table 5, providing further evidence of the DSS's structural validity.

Table 5.

Model Fit Indices of the DSS Confirmatory Factor Analysis.

Fit Index	Value
χ²/df	1.63
RMSEA	.063
RMSEA (90% CI)	[.052, .074]
CFI	.913
TLI	.902
IFI	.915
SRMR	.067
RMR	.053
GFI	.826

Note. DSS = Debate Skills Scale; RMSEA = Root Mean Square Error of Approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; IFI = Incremental Fit Index; SRMR = Standardized Root Mean Square Residual; RMR = Root Mean Square Residual; GFI = Goodness-of-Fit Index.

Model fit was evaluated according to Hu and Bentler (1999).

The removal of the four items did not result in the emergence of new latent structures or the collapse of existing dimensions. Instead, it strengthened the original factor solution by enhancing clarity and reliability without deviating from the EFA's theoretical framework. The χ²/df (CMIN/df), RMSEA, CFI, TLI, IFI, RMR, and SRMR values all indicated acceptable fit, suggesting that the model aligns well with the observed data. Although GFI was slightly below conventional cutoffs, the other fit indices supported acceptable model fit (Hu & Bentler, 1999). Modification indices were reviewed, but no error covariances were added because they lacked theoretical justification; the model achieved acceptable fit without such modifications.
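To illustrate how the fit verdict can be checked, the following minimal Python sketch compares the Table 5 values with rule-of-thumb cutoffs. The cutoff values and the names FIT, CUTOFFS, and meets_cutoff are assumptions introduced for this example. They reflect commonly cited conventions for acceptable fit, are not thresholds stated in this article, and stricter criteria exist.

```python
# Fit indices as reported in Table 5.
FIT = {
    "chi2/df": 1.63, "RMSEA": 0.063, "CFI": 0.913, "TLI": 0.902,
    "IFI": 0.915, "SRMR": 0.067, "RMR": 0.053, "GFI": 0.826,
}

# ASSUMED rule-of-thumb cutoffs for "acceptable" fit (not taken from the article).
# "max" means the index should not exceed the cutoff; "min" means it should reach it.
CUTOFFS = {
    "chi2/df": ("max", 3.0), "RMSEA": ("max", 0.08), "CFI": ("min", 0.90),
    "TLI": ("min", 0.90), "IFI": ("min", 0.90), "SRMR": ("max", 0.08),
    "RMR": ("max", 0.08), "GFI": ("min", 0.90),
}


def meets_cutoff(name, value):
    """Return True if the index satisfies its assumed cutoff."""
    kind, cutoff = CUTOFFS[name]
    return value <= cutoff if kind == "max" else value >= cutoff


# Indices that fall short of the assumed cutoffs.
below_cutoff = [name for name, value in FIT.items() if not meets_cutoff(name, value)]
print(below_cutoff)
```

Under these assumed cutoffs only GFI falls short, which matches the interpretation given in the text.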

The correlation coefficients between the subdimensions of the measurement tool and the items were analyzed to assess internal consistency, as Karakoç and Dönmez (2014) suggested. These coefficients provide insight into the degree of alignment between items within each factor and the overall structure of the DSS. The correlation coefficients for the four-factor structure, as identified through EFA and confirmed through CFA, are presented in Table 6, offering a detailed evaluation of the DSS's internal consistency.

Table 6.

Pearson Correlations Between the Debate Skills Scale (DSS) and Its Subdimensions.

Dimensions	DSS	ASS	CS	DECS	SS
Debate Skills Scale (DSS)	1	.952	.599	.622	.617
Argumentation and strategy skills (ASS)	.952	1	.399	.459	.525
Communication skills (CS)	.599	.399	1	.348	.322
Debate ethics and collaboration skills (DECS)	.622	.459	.348	1	.281
Speaking skills (SS)	.617	.525	.322	.281	1

**p < .001.

An examination of Table 6 reveals statistically significant correlations among the subdimensions of the DSS, indicating high internal consistency. Specifically, a moderate positive correlation was observed between the ASS subdimension and the CS subdimension (r = .399, p < .01), as well as between the ASS subdimension and the DECS subdimension (r = .459, p < .01). The ASS subdimension also demonstrated a moderate positive correlation with the SS subdimension (r = .525, p < .01) and a high positive correlation with the DSS score (r = .952, p < .01). Similarly, the CS subdimension showed moderate positive correlations with the DECS subdimension (r = .348, p < .01), the SS subdimension (r = .322, p < .01), and the DSS score (r = .599, p < .01). The DECS subdimension exhibited a weaker positive correlation with the SS subdimension (r = .281, p < .01) and a moderate positive correlation with the DSS score (r = .622, p < .01). The SS subdimension likewise correlated moderately with the DSS score (r = .617, p < .01). All relationships were statistically significant at the .01 level.

The consistent positive direction of these correlations indicates that the measurement tool demonstrates high internal consistency. This finding supports the DSS's construct validity, confirming its reliability and suitability for assessing the intended constructs.
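A correlation matrix of this kind can be sketched in a few lines of Python. The item-to-factor mapping below follows the final 25-item structure reported above. Everything else is an assumption made for illustration: the use of summed scores, the helper names score and pearson, and the synthetic random responses, which are not the study data and will not reproduce the Table 6 values.

```python
import random
from math import sqrt

# Final item-to-factor mapping (25 items) as reported after CFA.
FACTORS = {
    "ASS": [2, 3, 21, 23, 24, 26, 29, 32, 33, 34, 36, 37, 39, 40, 41, 43],
    "CS": [11, 12, 13],
    "DECS": [17, 18, 47, 49],
    "SS": [30, 31],
}


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def score(responses):
    """Sum item ratings into subscale scores and a total (summed scoring is an assumption)."""
    scores = {
        name: [sum(r[i] for i in items) for r in responses]
        for name, items in FACTORS.items()
    }
    scores["DSS"] = [sum(vals) for vals in zip(*(scores[name] for name in FACTORS))]
    return scores


# Synthetic 5-point responses for 50 hypothetical respondents (NOT the study data).
random.seed(0)
all_items = [i for items in FACTORS.values() for i in items]
responses = [{i: random.randint(1, 5) for i in all_items} for _ in range(50)]

scores = score(responses)
names = ["DSS", *FACTORS]
matrix = {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}

for a in names:
    print(a, " ".join(f"{matrix[a][b]:.3f}" for b in names))
```

Because a Pearson matrix is symmetric, the value for any pair of subscales should be identical in both triangles, which is a quick consistency check for a table such as Table 6.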

The DSS items aim to differentiate individuals with the measured characteristic from those without. To achieve this, a comparison was conducted by forming upper and lower groups comprising the top 27% and bottom 27% of participants based on their item scores (Büyüköztürk, 2018). The discrimination of the 25 items in the DSS was analyzed using an independent-samples t-test across these groups. Table 7 presents detailed information on the discrimination of the DSS items, demonstrating their ability to effectively distinguish between individuals with varying levels of the measured characteristic.

Table 7.

Comparison of Item Scores Between the Upper and Lower 27% Groups.

Item	Upper/Lower	N	M	SD	t	p
M2	Upper	52	4.7115	.53638	7.066	p < .001
	Lower	51	3.6275	.95835
M3	Upper	52	4.5577	.57440	9.735	p < .001
	Lower	51	3.0588	.94682
M11	Upper	52	4.4808	.75382	6.765	p < .001
	Lower	51	3.0392	1.32606
M12	Upper	52	4.4808	.69987	7.403	p < .001
	Lower	51	3.0000	1.24900
M13	Upper	52	4.2308	.75707	6.432	p < .001
	Lower	51	3.0000	1.14891
M17	Upper	52	4.7308	.56414	6.773	p < .001
	Lower	51	3.6667	.97297
M18	Upper	52	4.5577	.75182	3.927	p < .001
	Lower	51	3.8235	1.10826
M21	Upper	52	4.4615	.69906	11.061	p < .001
	Lower	51	2.7059	.90098
M23	Upper	52	4.7500	.51924	10.905	p < .001
	Lower	51	3.0196	1.00976
M24	Upper	52	4.7115	.53638	8.100	p < .001
	Lower	51	3.2745	1.15028
M26	Upper	52	4.6538	.55606	10.711	p < .001
	Lower	51	3.1176	.86364
M29	Upper	52	4.7692	.42544	10.261	p < .001
	Lower	51	3.4902	.78416
M30	Upper	52	4.3846	.69038	8.615	p < .001
	Lower	51	3.0196	.90532
M31	Upper	52	3.7885	.82454	7.659	p < .001
	Lower	51	2.4510	.94475
M32	Upper	52	4.6154	.59914	9.677	p < .001
	Lower	51	3.3333	.73937
M33	Upper	52	4.7115	.49849	11.253	p < .001
	Lower	51	3.0588	.92546
M34	Upper	52	4.6346	.52502	11.012	p < .001
	Lower	51	3.1176	.84017
M36	Upper	52	4.8269	.38200	8.766	p < .001
	Lower	51	3.5294	.98697
M37	Upper	52	4.7885	.45747	11.414	p < .001
	Lower	51	3.0196	1.00976
M39	Upper	52	4.9423	.23544	13.764	p < .001
	Lower	51	3.3137	.81216
M40	Upper	52	4.6154	.59914	10.168	p < .001
	Lower	51	2.9608	.99922
M41	Upper	52	4.7692	.73071	6.678	p < .001
	Lower	51	3.6667	.93095
M43	Upper	52	4.6731	.73354	10.870	p < .001
	Lower	51	2.6275	1.13068
M47	Upper	52	4.5769	.77576	3.900	p < .001
	Lower	51	3.8431	1.10223
M49	Upper	52	4.8654	.34464	5.311	p < .001
	Lower	51	3.9804	1.14000

An analysis of Table 7 reveals a statistically significant difference favoring the upper group for all items (p < .001). This result indicates that each item on the DSS effectively distinguishes between individuals with high and low levels of the measured characteristic, thereby confirming the DSS's discriminatory capability.
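The upper-lower group procedure can be sketched as follows. This is a minimal illustration under stated assumptions: respondents are ranked by total score, group size is the rounded 27% of the sample with no special handling of ties, and the pooled-variance form of the independent-samples t-test is used. The function names and the synthetic data are hypothetical and are not the study data.

```python
import random
from math import sqrt
from statistics import mean, variance


def upper_lower_groups(totals, fraction=0.27):
    """Return index lists for the top and bottom `fraction` of respondents by total score."""
    k = round(len(totals) * fraction)
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    return order[-k:], order[:k]


def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


# Synthetic data: a latent ability level drives 25 five-point item responses.
random.seed(1)
N_ITEMS = 25


def respond(ability):
    """Simulate one respondent's ratings, clipped to the 1..5 range."""
    return [min(5, max(1, round(ability + random.gauss(0, 1)))) for _ in range(N_ITEMS)]


data = [respond(random.uniform(1.5, 4.5)) for _ in range(187)]
totals = [sum(row) for row in data]

upper, lower = upper_lower_groups(totals)
t_values = [
    pooled_t([data[i][j] for i in upper], [data[i][j] for i in lower])
    for j in range(N_ITEMS)
]
print(len(upper), len(lower), round(min(t_values), 2))
```

A positive t value for every item means the upper group scored higher on that item, which is the pattern Table 7 reports. Statistical software may instead report a Welch-corrected statistic when group variances differ, so values from this sketch need not match such output exactly.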

Findings Regarding Reliability

Analysis of Internal Consistency and Reliability

The internal consistency of the DSS and its subdimensions was assessed using Cronbach's alpha (α) coefficients. The ASS subdimension yielded α = .912 (excellent), the CS subdimension α = .864 (good), the DECS subdimension α = .708 (good), and the SS subdimension α = .637 (acceptable); the total DSS yielded α = .917 (excellent). These labels follow the interpretation criteria set out by Kılıç (2016), which categorize Cronbach's alpha coefficients as follows: > .90 = excellent, .70–.89 = good, .60–.69 = acceptable, .50–.59 = poor, and < .50 = undesirable. Internal consistency reliability therefore ranged from acceptable to excellent across all dimensions, with strong values for the overall scale and most subdimensions. However, the Speaking Skills subscale showed acceptable but comparatively lower reliability (α = .637), suggesting the need for further item development in future research.
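For readers who want to see how such coefficients are obtained and labeled, the sketch below implements the standard Cronbach's alpha formula and applies the Kılıç (2016) bands quoted above to the reported values. The function names are hypothetical, the tiny data set is invented for illustration, and treating a value of exactly .90 as excellent is an assumption, since the quoted bands leave that boundary open.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def label(alpha):
    """Verbal label following the bands quoted from Kılıç (2016)."""
    if alpha >= 0.90:  # exactly .90 treated as excellent (assumption)
        return "excellent"
    if alpha >= 0.70:
        return "good"
    if alpha >= 0.60:
        return "acceptable"
    if alpha >= 0.50:
        return "poor"
    return "undesirable"


# Coefficients as reported in the text.
REPORTED = {"ASS": 0.912, "CS": 0.864, "DECS": 0.708, "SS": 0.637, "DSS": 0.917}
labels = {name: label(value) for name, value in REPORTED.items()}
print(labels)

# Invented two-item example: perfectly consistent items give alpha = 1.
demo = [[1, 2], [2, 3], [3, 4]]
print(cronbach_alpha(demo))
```

Alpha depends on the number of items as well as on inter-item covariance, which is one reason a two-item subscale such as Speaking Skills tends to yield a lower coefficient than a 16-item subscale.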

Evaluation of Unidimensionality

In addition to internal consistency, the unidimensionality of the DSS was assessed using second-order CFA and unidimensional factor analysis to determine whether all items could be consolidated into a single overarching construct of debate abilities. The fit indices did not reach acceptable thresholds, indicating that a unidimensional structure was not supported.

Thus, these findings indicate that the DSS cannot be regarded as a unidimensional scale. Each subdimension (Argumentation and Strategy Skills, Communication Skills, Debate Ethics and Collaboration Skills, and Speaking Skills) should be considered an independent yet interconnected element of students’ debate competency. These findings substantiate the theoretical justification for a multidimensional framework, consistent with the definition of debate skills, which encompasses cognitive, social, ethical, and expressive competencies. Consequently, future implementations of the DSS should score and interpret each subdimension separately, while acknowledging their interrelated contributions to the overarching construct of debate skills.

Discussion

This study contributes to the existing literature by highlighting the essential role of cognitive skills, particularly critical thinking, in assessing debate ability (Ennis, 2009). Based on this theoretical perspective, the developed scale's Argumentation and Strategy Skills dimension embodies the cognitive foundation for systematic, evidence-based thinking. This study builds upon previous research by developing and validating a psychometrically sound, multidimensional tool to assess high school students’ debate competencies. Importantly, the DSS is a self-report measure and therefore captures students’ perceived debate competencies rather than observed debate performance. Accordingly, results should be interpreted as beliefs about debate-related skills (e.g., self-efficacy–aligned perceptions), not as direct indicators of competitive outcomes or external ratings. The full scale and sample items are provided in Appendix C to support transparency and future use.

Compared with instruments that emphasize a single facet (e.g., general reasoning dispositions or speaking confidence), the DSS operationalizes debate as a multidimensional competency profile that includes cognitive (argumentation/strategy), communicative, ethical/collaborative, and speaking-related components. This broader operationalization is intended to reduce construct underrepresentation when debate is used as a performance-based instructional method.

The scale development process progressed through successive refinement stages. The initial 80-item pool was reduced to 49 items following expert review. EFA refined the DSS to a 29-item structure across four factors, accounting for 50.95% of the total variance. CFA subsequently supported the four-factor structure and resulted in a final 25-item scale.

The first and most significant aspect, Argumentation and Strategy Skills, consists of 16 items that represent technical debate knowledge, the formulation of logical arguments, the presentation of organized speeches, and the application of evidence-based argumentation. This feature aligns closely with other studies highlighting structured reasoning and critical thinking as fundamental components of debate ability (Chikeleze et al., 2018). Recent studies also demonstrate how debate fosters critical participatory literacy and student agency, positioning argumentation not just as persuasion but as a tool for civic engagement and problem-solving (Malloy et al., 2020).

The second category, Communication Skills, comprises four components that emphasize both nonverbal and verbal communication, including body language, tone, diction, and audience participation. These competencies align with 21st-century educational frameworks, such as the P21 model (Partnership for 21st Century Learning, 2019), which emphasizes effective communication in student development. They are also consistent with research highlighting the importance of collaborative meaning-making and intercultural communication in advanced academic contexts, where debate functions as both a cognitive and socioemotional learning tool (El Majidi et al., 2023, 2024). Debate, as a pedagogical approach, fosters the development of these skills through active engagement and reflective discourse (Demir et al., 2016). These findings are consistent with broader 21st-century skill frameworks emphasizing the “4Cs”: creativity, critical thinking, communication, and collaboration (Thornhill-Miller et al., 2023).

The third element, Debate Ethics and Collaboration Skills, encompasses students’ capacity to engage in ethical and constructive interactions. These qualities are crucial for advancing democratic principles and fostering interpersonal respect, as evidenced by previous research (Graefe, 2024; Rodriguez-Dono & Hernández-Fernández, 2021; Woo et al., 2024; Zare & Othman, 2015). This dimension illustrates the overarching social roles of debate in promoting mutual understanding and collaborative problem-solving (Demir et al., 2016). Moreover, recent findings suggest that equitable debate structures can empower underrepresented students and reduce systemic barriers in advanced academic programs (Baketa et al., 2023).

The fourth element, Speaking Skills, was deliberately differentiated from general communication to highlight linguistic clarity, coherence, and fluency. While both communication and speaking involve expression, the communication factor captured broader interpersonal and collaborative elements, whereas the speaking factor reflected performance-based delivery skills. This distinction highlights the multidimensional nature of debate competencies and facilitates a more focused assessment of linguistic competence. Items within this dimension pertain to maintaining focus, avoiding redundancy, and ensuring the relevance of spoken information. These features correspond with research indicating that debate involvement improves speaking skills and communication confidence (Chikeleze et al., 2018; Iman, 2017). Parallel findings on cognitive engagement in accelerated curricula demonstrate that nuanced skill sets, such as communication and argumentation, play critical roles in sustaining motivation and achievement in advanced academic contexts (Shum et al., 2024). Collectively, these four dimensions provide a comprehensive and conceptually robust framework for evaluating perceived debate competencies.

Implications for Classroom Practice

The DSS can be used as a formative assessment to support debate instruction by helping teachers and students identify perceived strengths and instructional needs across four domains (argumentation/strategy, communication, ethics/collaboration, and speaking). For example, teachers can administer the DSS before and after a debate unit to monitor perceived growth and to plan targeted mini-lessons (e.g., rebuttal structure, evidence use, or speaking clarity).

In enrichment-oriented settings, DSS profiles may inform differentiated supports. Students reporting high argumentation/strategy but lower speaking skills may benefit from structured delivery practice, while students reporting lower ethics/collaboration may benefit from explicit norms, role rotation, and reflection routines that promote respectful discourse.

Because the DSS is a self-report tool, it is best used alongside complementary evidence (e.g., teacher observations, peer feedback, debate rubrics, or performance ratings) to triangulate instructional decisions and reduce the risk of over-interpreting perceptions as performance.

Beyond classroom practice, the DSS may also inform broader research and policy discussions. Prior research suggests that debate participation is associated with gains in critical thinking, literacy, and postsecondary outcomes, particularly among historically underrepresented students (Schueler & Larned, 2025). In this context, the DSS provides a structured framework for examining how students perceive their debate-related competencies across multiple domains.

The scale may support equity-oriented reflection in advanced academic settings by helping educators monitor participation patterns and responsiveness to students who may be less confident or less vocal. However, claims regarding reductions in opportunity or excellence gaps remain provisional until future studies link DSS scores to longitudinal outcomes and independent performance indicators.

Beyond Türkiye, cultural adaptation and cross-national validation may extend the scale's applicability to diverse educational systems (Ho et al., 2025). Its multidimensional framework provides researchers with a structured tool for examining debate-related competencies in cognitive, ethical, and communicative domains.

Conclusion, Limitations, and Future Directions

This research introduces the DSS as a valid and reliable tool for evaluating the complex aspects of debate competencies among high school students. The DSS addresses a significant methodological gap in educational assessment by integrating cognitive, communicative, ethical, and expressive elements. The multidimensional structure illustrates the complexity of debate as a pedagogical and developmental tool, providing educators with a nuanced framework for assessing students’ perceived competencies in debate contexts.

Although this study contributes valuable insights, it has several limitations. First, the sample included substantially more female than male participants across both the EFA and CFA samples, which may limit generalizability given evidence of gendered participation and outcomes in some competitive debate contexts. Second, data were collected in Istanbul, Türkiye; debate norms and ethical frameworks may vary across cultural contexts (e.g., emphases on autonomy/individual rights versus collective harmony/duty), so the DSS may require cultural adaptation and revalidation before use in other settings. Third, the DSS relies on self-report, which is susceptible to social desirability bias, self-enhancement, and differences in self-awareness; thus, scores reflect perceived competencies rather than observed performance. Fourth, convergent and discriminant validity evidence with external measures (e.g., communication apprehension, critical thinking, self-efficacy, or independent debate performance ratings) was beyond the scope of this study and should be examined in future research. Fifth, the EFA sample size was close to the minimum recommendations for the initial 49-item pool, which may affect the stability of the factor solution and warrants replication with larger samples. Finally, subscale lengths were unequal (e.g., a 16-item factor and a two-item factor), which may influence reliability estimates and content coverage; future work should consider further item development to strengthen the shorter subscales.

Future research should address these limitations by implementing the DSS across various educational stages, encompassing primary, middle, and postsecondary levels. Further, integrating the DSS with studies on cognitive engagement and motivational dynamics could deepen understanding of how debate sustains student participation and achievement across diverse contexts (Phuti et al., 2023; Romero-Díaz de la Guardia, 2022). In addition, further studies could explore how the DSS contributes to identifying gifted and high-ability learners, as debate can nurture motivation, empowerment, and advanced cognitive engagement (El Majidi et al., 2023; Malloy et al., 2020). Cross-national studies, such as recent work on gifted education in South Korea, also demonstrate the value of culturally responsive approaches and the need for adaptable instruments across diverse systems (Kim et al., 2024). The translation and cultural adaptation of the DSS may facilitate its application in cross-national studies, enabling comparative analyses of debate competencies across different educational systems. In advanced academic contexts, the DSS may provide a structured way to examine students’ perceived higher-order competencies associated with debate. However, its use for identifying gifted or high-ability learners would require additional criterion-based validation with external performance indicators. Such cross-contextual use may also support equitable pathways for advanced academics by ensuring that enrichment opportunities, such as debate, are accessible to diverse learners across different educational systems. Integrating observational or performance-based assessments with self-report data would enhance the instrument's construct validity and offer a more comprehensive perspective on students’ abilities.

The DSS is a theoretically grounded and psychometrically validated instrument that enhances the measurement of debate skills in educational research. Its application can potentially enhance assessment practices and instructional design, fostering inclusive, respectful, and critically engaged classroom discourse across diverse educational contexts. In addition, comparative and cross-cultural studies highlight the importance of adapting debate-based assessments across different systems to ensure equitable access to enrichment and advanced learning opportunities worldwide (Ho et al., 2025; Malloy et al., 2020).

Footnotes

Acknowledgments

The authors would like to thank the participating students for their valuable contributions.

ORCID iDs

Esra Töre

Kamil Arif Kırkıç

Ethics Approval

This study was approved by the Ethics Committee of İstanbul Sabahattin Zaim University, Approval No. 2024/01, date: February 16, 2024. All participants were informed about the purpose of the study and provided written informed consent. Participation was entirely voluntary, and participants were assured of confidentiality, anonymity, and the right to withdraw at any stage. No identifiable personal information was collected. The study posed minimal risk, involving only completion of a self-report scale. Sample items were shared in advance. Students were recruited voluntarily through school administrators in two cities in Türkiye.

Consent to Participate

Participation was voluntary, and informed consent was obtained from all students and their parents.

Consent for Publication

This study did not involve individual person data; hence consent for publication was not applicable.

Author Contributions

All authors contributed to the study design, data analysis, and manuscript preparation. The first author led the conceptual framing and manuscript revisions.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is an output of the scientific research project titled “Applied development model for twenty-first century skills: Enhancing high school students’ debate skills within the framework of faculty-school collaboration,” supported by Istanbul Sabahattin Zaim University.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Author Biographies

Esra Töre is a faculty member in the Department of Educational Sciences at Istanbul Sabahattin Zaim University and a visiting scholar at Indiana University. Her research focuses on educational leadership and organizational behavior in education.

Kamil Arif Kırkıç is a faculty member in the Department of Educational Sciences at Istanbul Sabahattin Zaim University. He holds a PhD from Marmara University and specializes in curriculum development, program evaluation, instructional design, teacher education, and innovative learning methods.

Burak Uzun is a teacher at Silivri Necip Sarıbekir Vocational and Technical Anatolian High School, under the Ministry of National Education in İstanbul, Türkiye. His professional interests include vocational education and teacher professional development.

Appendix A

Codebook for Qualitative Item Development Phase.

Appendix B

Item Reduction Flow.

Appendix C

Final Version of the Debate Skills Scale (DSS) (25 Items)

The DSS was originally developed and validated in Turkish, and the items below are presented in their original Turkish form. The full scale is provided to support transparency and use in educational contexts. English translations are provided for reader convenience only.

Instructions: Please indicate the extent to which each statement reflects your debate skills by selecting the most appropriate option ranging from “strongly disagree” to “strongly agree.”

References

Arung & Jumardin (2016). Improving the students’ speaking skill through debate technique. Journal of English Education, 1(1), 70–76. https://doi.org/10.31327/jee.v1i1.85

Baketa, Kovačić, & Mornar (2023). The role of debate in cognitive development and development of civic awareness of young people: A qualitative analysis. Sociologija, 65(3), 400–417. https://doi.org/10.2298/SOC2303400B

Bandura (2006). Guide for constructing self-efficacy scales. In Pajares & Urdan (Eds.), Self-efficacy beliefs of adolescents (Vol. 5, pp. 307–337). Information Age Publishing.

Benli (2019). Klasik Türk edebiyatında münazara [Debate in classical Turkish literature] [Unpublished doctoral dissertation]. Istanbul University.

Bhardwaj, Zhang, Tan, Y. Q., & Pandey (2025). Redefining learning: Student-centered strategies for academic and personal growth. Frontiers in Education, 10, Article 1518602. https://doi.org/10.3389/feduc.2025.1518602

Boumediene, Hamadi, N. A., & Berrahal, K. F. (2021). Classroom debate to enhance critical thinking skills. El Bahith for Sport and Social Sciences, 4(7), 441–457.

Büyükdinç, N. F. (2007). Osmanlı medreselerinde bir öğretim metodu olarak münâzara ve Ahmet Cevdet Paşa’nın Âdâb-ı Sedâd adlı eseri [Debate as a teaching method in Ottoman madrasas and Ahmet Cevdet Pasha's work titled Âdâb-ı Sedâd] [Unpublished master's thesis]. Marmara Üniversitesi.

Büyüköztürk, Ş. (2018). Sosyal bilimler için veri analizi el kitabı [Handbook of data analysis for social sciences]. Pegem Akademi.

Büyüköztürk, Ş., Çokluk, Ö., & Şekercioğlu (2018). Sosyal bilimler için çok değişkenli istatistik: SPSS ve LISREL uygulamaları [Multivariate statistics for social sciences: SPSS and LISREL applications] (4th ed.). Pegem Akademi.

Çabuk & Yeni (2016). Okul öncesi eğitimde yeni bir teknik: Münazara [A new technique in preschool education: Debate]. Kastamonu Education Journal, 24(5), 2439–2456.

Chikeleze, Johnson, & Gibson (2018). Let’s argue: Using debate to teach critical thinking and communication skills to future leaders. Journal of Leadership Education, 17(2), 123–137. https://doi.org/10.12806/V17/I2/A4

Creswell, J. W., & Plano Clark, V. L. (2018). Designing and conducting mixed methods research (3rd ed.). Sage.

Darby (2007). Debate: A teaching-learning strategy for developing competence in communication and critical thinking. Journal of Dental Hygiene, 81(4), 1–10.

Demir, M. K., Şahin, Ç., & Tutkun (2016). Sosyal bilgiler dersi için sınıf öğretmeni adaylarının münazara konusu oluşturma becerilerinin değerlendirilmesi [Evaluation of classroom teacher candidates’ ability to create a debate topic for social studies course]. Journal of Kazım Karabekir Education Faculty, 32, 51–66.

DeVellis, R. F. (2014). Ölçek geliştirme: Kuram ve uygulamalar [Scale development: Theory and applications] (T. Totan, Trans.). Nobel Academic Publishing.

Elder & Paul (2020). Critical thinking: Tools for taking charge of your learning and your life (4th ed.). Rowman & Littlefield.

El Majidi, De Graaff, & Janssen (2023). Debate pedagogy as a conducive environment for L2 argumentative essay writing. Language Teaching Research, 30(4), 1764–1788. https://doi.org/10.1177/13621688231156998

El Majidi, De Graaff, & Janssen (2024). Debate as a pedagogical tool for developing speaking skills in second language education. Language Teaching Research, 28(6), 2431–2452. https://doi.org/10.1177/13621688211050619

Ennis, R. H. (2009). An annotated list of critical thinking tests. Retrieved May 12, 2024, from https://criticalthinking.net/wp-content/uploads/2024/04/An-Annotated-List-of-English-Language-Critical-Thinking-Tests.pdf

Ertaş Kılıç & Şen, A. İ. (2014). UF/EMI eleştirel düşünme eğilimi ölçeğini Türkçeye uyarlama çalışması [Adaptation of the Critical Thinking Disposition Scale into Turkish]. Education and Science, 39(Supplement 2), 1–12. https://doi.org/10.15390/eb.2014.3632

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299. https://doi.org/10.1037/1082-989X.4.3.272

González-Pérez, L. I., & Ramírez-Montoya, M. S. (2022). Components of education 4.0 in 21st century skills frameworks: Systematic review. Sustainability, 14(3), Article 1493. https://doi.org/10.3390/su14031493

Graefe, A. K. (2024). Gifted high school students’ perceptions of the impact of classroom power dynamics on motivation and empowerment. Journal of Advanced Academics, 35(1), 6–55. https://doi.org/10.1177/1932202X231220414

Gravetter, F. J., & Wallnau, L. B. (2011). Essentials of statistics for the behavioral sciences (8th ed.). Wadsworth Cengage Learning.

Greco, Annovazzi, Palena, Camussi, Rossi, & Steca (2022). Self-efficacy beliefs of university students: Examining factor validity and measurement invariance of the new Academic Self-Efficacy Scale. Frontiers in Psychology, 12, 1–14. https://doi.org/10.3389/fpsyg.2021.498824

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2014). Multivariate data analysis (7th ed., Pearson New International Edition). Pearson Education Limited.

Hayes, A. F., & Coutts, J. J. (2020). Use Omega rather than Cronbach’s alpha for estimating reliability. Communication Methods and Measures, 14(1), 1–24. https://doi.org/10.1080/19312458.2020.1718629

Ho, S. Y., Cowan, Bache, & Nagtzaam (2025). Beyond diplomats: Why model UNs should be for everyone—To advance education for sustainable development and build capacity for intergovernmental engagement. Sustainable Earth Reviews, 8(1), Article 9. https://doi.org/10.1186/s42055-025-00111-3

Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

Iman, J. N. (2017). Debate instruction in EFL classroom: Impacts on the critical thinking and speaking skill. International Journal of Instruction, 10(4), 87–108. https://doi.org/10.12973/iji.2017.1046a

Karakoç, F. Y., & Dönmez (2014). Ölçek geliştirme çalışmalarında temel ilkeler [Basic principles in scale development studies]. World of Medical Education, 13(40), 39–49. https://doi.org/10.25282/ted.228738

Kennedy, R. R. (2009). The power of in-class debates. Active Learning in Higher Education, 10(3), 225–236. https://doi.org/10.1177/1469787409343186

Kim, Ryoo, & Lee (2024). Development of a scale to measure the parental competency of science-gifted students in South Korea. Journal of Advanced Academics, 35(1), 125–155. https://doi.org/10.1177/1932202X231219277

Kılıç (2013). Sampling methods. Journal of Mood Disorders, 3(1), 44–46. https://doi.org/10.5455/jmood.20130325011730

Kılıç (2016). Cronbach’s alpha reliability coefficient. Psychiatry and Behavioral Sciences, 6(1), 47. https://doi.org/10.5455/jmood.20160307122823

Kline, R. B. (2023). Principles and practice of structural equation modeling (5th ed.). Guilford Press.

Laia (2019). Improving the students’ ability in speaking by using debate technique at the tenth grade of SMK Negeri 1 Aramo. Journal of English Language Teaching, 4(1), 1–19. https://doi.org/10.30998/scope.v4i01.4408

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x

Leech, N. L., & Onwuegbuzie, A. J. (2009). A typology of mixed methods research designs. Quality & Quantity, 43(2), 265–275. https://doi.org/10.1007/s11135-007-9105-3

Lumbangaol, R. R., & Mazali, M. R. (2020). Improving students’ speaking ability through debate technique. A Journal of Culture, English Language Teaching, Literature and Linguistics, 7(2), 163–172. https://doi.org/10.22219/celtic.v7i2.13674

Malloy, J. A., Tracy, K. N., Scales, R. Q., Menickelli, & Scales, W. D. (2020). It’s not about being right: Developing argument through debate. Journal of Literacy Research, 52(1), 79–100. https://doi.org/10.1177/1086296X19896495

McKenzie, J. F., Wood, M. L., Kotecki, J. E., Clark, J. K., & Brey, R. A. (1999). Establishing content validity: Using qualitative and quantitative steps. American Journal of Health Behavior, 23(4), 311–318. https://doi.org/10.5993/AJHB.23.4.9

43. MoNE [Ministry of National Education]. (2024a). Ortaokul Türkçe dersi öğretim programı: Türkiye Yüzyılı Maarif Modeli. [Secondary school Turkish language curriculum: Türkiye century education model]. Millî Eğitim Bakanlığı Yayınları.

44. MoNE [Ministry of National Education]. (2024b). Ortaöğretim Türk Dili ve Edebiyatı Öğretim Programı (Hazırlık, 9, 10, 11 ve 12. Sınıflar): Türkiye Yüzyılı Maarif Modeli. [Secondary education Turkish language and literature curriculum (Preparatory, Grades 9–12): Türkiye century education model]. Ankara: Millî Eğitim Bakanlığı Yayınları.

45. Örün, Ö., & Sever (2025). Cognitive flexibility: An educational approach. In Erdem & Kaya (Eds.), Resilience, adaptability, and cultural awareness within the educational landscape (pp. 179–208). Springer.

46. Özensoy, A. U. (2011). Eleştirel okumaya göre düzenlenmiş sosyal bilgiler dersinin eleştirel düşünme becerisine etkisi. [The effect of social studies course organized according to critical reading on critical thinking skills]. Mersin University Journal of Faculty of Education, 7(2), 13–25. https://doi.org/10.17860/efd.31172

47. Partnership for 21st Century Learning. (2019). Battelle for Kids. Retrieved May 12, 2024, from https://www.battelleforkids.org/insights/p21-resources/

48. Phuti, Koloi-Keaikitse, Tsheko, G. N., & Oppong (2023). Developing and validating a soft skills assessment scale for psychoeducational assessment. SAGE Open, 13(4), 1–15. https://doi.org/10.1177/21582440231218066

49. Rodriguez-Dono & Hernández-Fernández (2021). Fostering sustainability and critical thinking through debate—A case study. Sustainability, 13(11), 1–24. https://doi.org/10.3390/su13116397

50. Romero-Díaz de la Guardia, J. J., García-Garnica, Chacón-Cuberos, & Expósito-López (2022). Psychometric validation of a teamwork skills scale in a vocational training context. SAGE Open, 12(2), 1–12. https://doi.org/10.1177/21582440221103256

51. Rosyid & Hidayati, I. N. (2019). Thinking critically through debating: Promoting students’ HOTS and speaking competence. The 10th AISOFOLL, 78–88.

52. Schueler, B. E., & Larned, K. E. (2025). Interscholastic policy debate promotes critical thinking and college-going: Evidence from Boston public schools. Educational Evaluation and Policy Analysis, 47(1), 108–134. https://doi.org/10.3102/01623737231200234

53. Shum, K. Z., Suldo, S. M., Shaunessy-Dedrick, & O’Brennan, L. M. (2024). A qualitative exploration of the facilitators and barriers of cognitive engagement among ninth-grade students in accelerated curricula. Journal of Advanced Academics, 35(1), 89–124. https://doi.org/10.1177/1932202X231223760

54. Thornhill-Miller, Camarda, Mercier, Burkhardt, J. M., Morisseau, Bourgeois-Bougrine, Vinchon, El Hayek, Augereau-Landais, Mourey, Feybesse, Sundquist, & Lubart (2023). Creativity, critical thinking, communication, and collaboration: Assessment, certification, and promotion of 21st century skills for the future of work and education. Journal of Intelligence, 11(3), Article 54. https://doi.org/10.3390/jintelligence11030054

55. Walker & Kettler (2020). Developing critical thinking skills in high ability adolescents: Effects of a debate and argument analysis curriculum. Talent, 10(1), 21–39. https://doi.org/10.46893/talent.758473

56. Wang, Y. Y., & Chuang, Y. W. (2024). Artificial intelligence self-efficacy: Scale development and validation. Education and Information Technologies, 29(4), 4785–4808. https://doi.org/10.1007/s10639-023-11967-0

57. Woo, Cumming, T. M., & O’Neill, S. C. (2024). South Korean University lecturers’ opinions about initial teacher education in gifted education. Journal of Advanced Academics, 35(3), 482–508. https://doi.org/10.1177/1932202X241230712

58. Worthington, R. L., & Whittaker, T. A. (2006). Scale development research: A content analysis and recommendations for best practices. The Counseling Psychologist, 34(6), 806–838. https://doi.org/10.1177/0011000006288127

59. Zhang & Wang (2025). The development and validation of a scale on student AI literacy in L2 writing: A domain-specific perspective. Journal of Second Language Writing, 69(September), 101227. https://doi.org/10.1016/j.jslw.2025.101227

60. Zare & Othman (2015). Students’ perceptions toward using classroom debate to develop speaking skills. Asian Social Science, 11(9), 158–170. https://doi.org/10.5539/ass.v11n9p158