Abstract
We developed and validated the Debate Skills Scale (DSS), grounded in social cognitive theory, to address the lack of an instrument for assessing debate competencies among high school students, as existing measures primarily assess general critical thinking rather than debate performance. Data were collected from 389 students, and a 49-item draft scale was developed from the literature and expert input. Exploratory factor analysis (EFA) revealed a four-factor structure explaining 50.95% of the variance and reduced the scale to 29 items. Subsequent confirmatory factor analysis (CFA) removed four items, resulting in a final 25-item scale. CFA supported this structure, with internal consistency reliability ranging from acceptable to excellent, although lower reliability was observed for the Speaking Skills subscale. Discriminative validity analyses indicated that the DSS differentiated students based on debate experience and performance. The DSS provides a psychometrically validated self-report instrument for assessing perceived debate competencies and may inform instructional planning.
Keywords
Introduction
Debate is widely implemented in secondary education and frequently embedded within advanced academic and gifted programming as a structured pedagogical method for cultivating higher-order reasoning, argumentation, and strategic communication (Baketa et al., 2023; Shum et al., 2024; Thornhill-Miller et al., 2023). Within gifted education in particular, debate functions as a performance-based domain through which students demonstrate cognitive complexity, rhetorical fluency, and ethical engagement.
Despite this instructional importance, the field lacks a domain-specific, psychometrically validated instrument designed to measure debate-specific competencies as distinct from general critical thinking constructs (Ennis, 2009). This measurement gap limits systematic evaluation of perceived debate competencies and constrains enrichment-oriented instructional planning, which motivates the development of such a measure.
Contemporary educational paradigms emphasize assessing complex competencies beyond knowledge acquisition (Bhardwaj et al., 2025; González-Pérez & Ramírez-Montoya, 2022), reinforcing the need for reliable tools to evaluate structured argumentation and strategic communication in performance-based domains such as debate (Walker & Kettler, 2020).
However, such competencies are often difficult to observe directly in classroom settings, increasing the need for well-designed self-report instruments. Social cognitive theory posits that individuals’ beliefs about their capabilities influence performance behaviors, persistence, and strategic engagement (Bandura, 2006). In performance-based domains such as debate, students’ perceived self-efficacy may shape how effectively they construct arguments, respond to counterclaims, and regulate their speaking performance. Accordingly, measuring self-perceived debate competencies provides theoretically grounded insight into mastery and agency in performance. Within social cognitive theory, self-efficacy develops through mastery experiences, observational learning, social persuasion, and performance feedback (Bandura, 2006). In debate settings, repeated argumentative exchanges and peer modeling shape performance beliefs related to argument construction, communication, ethical positioning, and speaking regulation. Accordingly, a multidimensional measurement framework grounded in social cognitive theory is theoretically warranted.
Importantly, self-efficacy is not a generalized trait but a domain-specific construct that varies across contexts and performance conditions (Bandura, 2006). Recent scale development studies reinforce this distinction. For example, Wang and Chuang (2024) demonstrated that general technology self-efficacy measures were insufficient to capture artificial intelligence–specific performance beliefs, necessitating the development of a domain-tailored instrument. Similarly, Greco et al. (2022) emphasized that academic self-efficacy is multidimensional and must reflect context-bound competencies rather than global confidence beliefs. These findings support the argument that debate-related self-efficacy cannot be adequately inferred from general critical thinking or academic self-efficacy scales. Instead, a theoretically grounded, domain-specific instrument is required to capture the performative, interactive, and strategic nature of debate.
Research in advanced academic education has increasingly emphasized the need for domain-specific assessment of performance-based competencies (Thornhill-Miller et al., 2023). However, few instruments isolate debate-specific competencies as a distinct domain of performance. Precise measurement becomes particularly important in enrichment-oriented contexts where instructional decisions rely on valid assessment of students’ demonstrated strengths. Recent psychometric research further underscores the necessity of domain-specific measurement. Xu et al. (2025) demonstrated that generic competency frameworks fail to capture context-dependent performance behaviors, thereby strengthening construct validity when domain-tailored instruments are developed. By extension, debate, particularly in gifted and performance-oriented settings, requires a similarly specialized measurement framework.
Educational methods play a central role in fostering 21st-century competencies such as critical thinking, collaboration, and communication (González-Pérez & Ramírez-Montoya, 2022). Among these, debate functions as a performance-based pedagogical strategy through which students develop structured argumentation, strategic rebuttal, verbal fluency, and ethical engagement (Baketa et al., 2023; Shum et al., 2024). Beyond general cognitive skills, debate has been associated with civic awareness and democratic engagement (Baketa et al., 2023), strengthening students’ capacity for complex analysis, persuasive reasoning (Demir et al., 2016), and cognitive and interpersonal flexibility in dynamic learning contexts (Örün & Sever, 2025).
Debate Technique and Education
Debate is explicitly included in the 2024 Turkish Language and Literature Curriculum as a recommended instructional activity (MoNE, 2024a, 2024b). As both a pedagogical method and oral discourse practice, it supports cognitive, emotional, and social development (Çabuk & Yeni, 2016). Through structured engagement, students analyze, defend, and evaluate competing perspectives while developing tolerance and adaptability (Kennedy, 2009).
Historically rooted in the dialectical traditions of Ancient Greece and later integrated into Islamic intellectual practices as “Jedel,” debate has long functioned as a structured method for examining opposing perspectives (Benli, 2019; Büyükdinç, 2007; Darby, 2007). Although debate draws upon critical thinking, it often requires participants to argue positions independent of personal beliefs, emphasizing rhetorical performance rather than internal conviction (Ennis, 2009).
In practice, debate requires participants to present contrasting perspectives on real or hypothetical issues. It fosters group collaboration, links discussions to real-life scenarios, and encourages the use of structured rhetorical techniques and evidence presentation within assigned perspectives (Graefe, 2024). The process often culminates in a reflective summary, where participants refine their stances or introduce additional evidence to support their claims. In competitive settings, a jury may evaluate debates, providing feedback and scoring based on argument quality and engagement (Arung & Jumardin, 2016; Iman, 2017). One distinction is crucial for researchers: debate can facilitate the development of critical thinking, but it does not inherently demonstrate it, especially when argument roles are assigned regardless of personal beliefs (Elder & Paul, 2020).
As curriculum implementers, teachers shape the educational value of debate by contextualizing it within student-centered pedagogies and guiding critical reflection beyond rhetorical performance (Kim et al., 2024). Allowing students to participate in topic selection enhances motivation and engagement (Zare & Othman, 2015). Additionally, structured guidance from teachers fosters autonomy and accountability, enabling students to take ownership of their learning (Boumediene et al., 2021). Teacher preparation is, therefore, critical. Recent findings emphasize that teachers with training in gifted education are better equipped to identify and support high-ability learners, suggesting that the DSS could complement professional development in this area (Woo et al., 2024).
The benefits of debate extend beyond intellectual development. Research links debate to enhanced critical thinking, public speaking, emotional intelligence, and respect for diverse perspectives (Boumediene et al., 2021; Chikeleze et al., 2018; Iman, 2017; Rodriguez-Dono & Hernandez-Fernandez, 2021). It has also been associated with resilience, collaboration, adaptive problem-solving (Darby, 2007), and creativity and teamwork in active learning environments (Laia, 2019; Lumbangaol & Mazali, 2020). However, without structured debriefing or self-evaluation mechanisms, the extent to which these outcomes translate into sustained and measurable competencies remains uncertain (Bandura, 2006).
In summary, the debate technique is a comprehensive teaching strategy that cultivates students’ cognitive, social, and emotional capacities. By promoting active engagement, critical analysis, and collaboration, it equips learners with the skills necessary to address complex real-world scenarios, particularly when aligned with curriculum objectives and implemented through reflective pedagogy.
Review of Existing Debate-Related Scales
The debate technique has been widely used in educational contexts, particularly in language learning, to foster argumentation, public speaking, and structured reasoning skills, which, in turn, may support the development of critical thinking (Rosyid & Hidayati, 2019). Although several established instruments assess reasoning and critical thinking, such as the New Jersey Test of Reasoning Skills (1983), the California Critical Thinking Dispositions Inventory (1992), and the James Madison Test of Critical Thinking (2004), these tools evaluate general cognitive tendencies rather than debate-specific performance behaviors (Ennis, 2009). For example, the California Critical Thinking Dispositions Inventory evaluates general reasoning dispositions but does not assess time-bound argumentative exchange or structured rebuttal performance in dialogic contexts (Ennis, 2009). This limitation aligns with broader psychometric findings indicating that transferring general self-efficacy or reasoning instruments into specialized performance domains risks underrepresenting the target constructs (Wang & Chuang, 2024; Xu et al., 2025). Similarly, the James Madison Test of Critical Thinking measures analytical reasoning but does not capture interactive strategic communication under assigned positions. The New Jersey Test of Reasoning Skills focuses on logical problem-solving rather than rhetorical fluency, ethical positioning, or performance regulation within competitive debate formats.
Although adapted versions such as the Cornell Critical Thinking Test Level X (Özensoy, 2011) and the UF/EMI Critical Thinking Disposition Scale (Ertaş Kılıç & Şen, 2014) have been used in debate-related research, their focus remains on general cognitive dispositions rather than debate-specific performance behaviors.
To provide a structured comparison of representative instruments and their limitations for debate assessment, Table 1 summarizes their primary purposes, sample contexts, and conceptual constraints.
Table 1. Comparison of Existing Instruments and Their Limitations for Debate Assessment.
As summarized in Table 1, existing instruments primarily assess general reasoning or cognitive dispositions. None of these tools operationalize debate as an interactive, performance-based domain requiring strategic rebuttal, communicative fluency, ethical positioning, and time-regulated speaking.
Despite these contributions, there is a notable absence of a valid and reliable tool to assess debating skills. This reinforces the distinction between evaluating general cognitive tendencies and assessing debate-specific skills such as structured rebuttal, speaking fluency, and strategic engagement. Given this conceptual gap, a new tool is needed to address the performative, rhetorical, and ethical aspects of debate, distinct from general cognitive traits.
Grounded in social cognitive theory, we hypothesized a four-factor structure reflecting (a) cognitive mastery in argumentation and strategy, (b) social modeling and communicative interaction, (c) moral agency in ethical collaboration, and (d) performance efficacy in speaking skills. Consistent with multidimensional self-efficacy models in academic contexts (Greco et al., 2022), we conceptualized debate competence as a structured set of interrelated but distinct performance beliefs rather than a unidimensional construct.
Each proposed factor was conceptually derived from core components of social cognitive theory, linking domain-specific self-efficacy beliefs to cognitive mastery, social modeling processes, moral agency, and performance regulation. This study addresses this critical gap by developing and validating the Debate Skills Scale (DSS) specifically for high school students. The research has two primary objectives:
(1) To establish the validity and reliability of the DSS.
(2) To analyze the DSS's exploratory and confirmatory factor structures for high school students.
Method
Research Design
This study employed an exploratory sequential mixed-methods design (QUAL → QUAN), in which qualitative findings informed subsequent quantitative scale validation (Creswell & Plano Clark, 2018). This research design collects and analyses qualitative and quantitative data during distinct phases, with one method taking precedence (Leech & Onwuegbuzie, 2009). The qualitative phase involved multiple steps to inform the development of the DSS items: an extensive literature review, analysis of existing scale items, observation of debate processes, interviews with students participating in debates, and consultations with experts in the field. These qualitative steps informed the conceptual framework of the DSS and guided the formulation of item content grounded in real-world debate experiences.
Subsequently, the DSS items were presented to participants, and the collected data were subjected to quantitative analysis, constituting the study's main phase. This sequential integration of qualitative insights and quantitative validation ensured a comprehensive approach to developing and testing the DSS. The study design reflects a qualitative-to-quantitative sequence, with the qualitative phase shaping item development and the quantitative phase testing psychometric properties.
Study Group
The sample for this study comprised 389 high school students enrolled in various institutions across Istanbul during the 2023–2024 academic year. Participants were enrolled in public high schools and general secondary schools; no International Baccalaureate, Advanced Placement, or formally designated gifted-track programs were included. Participants were selected using a convenience sampling strategy, which allowed for practical access to students actively engaged in debate events. Inclusion criteria required participants to be enrolled in grades 9–12, to have participated in at least one structured debate activity, and to demonstrate sufficient Turkish literacy to comprehend scale items. Exclusion criteria included incomplete responses exceeding 10% missing data and failure to provide informed consent. Missing data were minimal (<3%). Cases with substantial missing responses were removed using listwise deletion. The remaining missing values were handled using full information maximum likelihood (ML) during CFA estimation. High school students were selected based on prior findings that indicate their prominent and consistent participation in debate practices during formal education (Kılıç, 2013). The DSS was administered in group classroom settings using paper-and-pencil forms. Completion time ranged between 12 and 15 min. Administration was conducted under the researcher's supervision to ensure standardized instructions. The demographic characteristics of the participants are presented in Table 2, offering a detailed overview of their distribution across relevant categories. These demographic distributions offer contextual grounding for subsequent validity analyses and ensure transparency in sample representation.
Table 2. Demographic Information of EFA and CFA Participants.
Note. EFA = exploratory factor analysis; CFA = confirmatory factor analysis.
The exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) sample consisted of 389 participants, collected in two separate phases. Specifically, data from 202 participants were utilized for the EFA, while 187 participants were allocated for the CFA. Among the participants, 135 (34.70%) were male, and 254 (65.30%) were female.
Institutional Review Board approval was obtained from İstanbul Sabahattin Zaim University. No personally identifiable information was collected during the study. The research posed minimal risk, as it involved only completing a self-report scale to evaluate students’ debate-related competencies. The items in the DSS were shared in advance with both students and school administrators to ensure transparency and comprehension. High school students from public schools in two provinces of Türkiye were included through voluntary participation, with permission from school principals. No incentives were offered, and no classroom instruction was disrupted during data collection. The findings are intended to support students’ skill development and inform future instructional practices related to argumentation and communication.
Development Process of the Measurement Tool
The DSS development process followed the methodological principles outlined by DeVellis (2014), which include eight key steps in scale construction: defining the construct, generating an item pool, determining the format, conducting expert review, pretesting, analyzing items, performing factor analysis, and assessing reliability. The construct to be measured was defined as a latent psychological skill set involving observable and self-reported indicators of debate-specific competencies, such as argument construction, strategic rebuttal, verbal fluency, and ethical collaboration. An initial pool of 80 items was developed based on an extensive literature review, existing instruments related to communication and reasoning skills, structured observations of school debate activities, and semistructured interviews with 12 students who had participated in interschool debate competitions. Items reflected four expected domains: argumentation, speaking ability, teamwork/ethics, and debate strategy. These domains were theoretically derived and empirically validated.
The qualitative interview data were analyzed using inductive content analysis. Two independent coders reviewed the transcripts and developed initial open codes. Codes were compared, refined through discussion, and organized into higher-order categories and themes. Intercoder agreement was calculated using percent agreement and reached 87%, indicating satisfactory reliability. A codebook outlining definitions and exemplar quotations is provided in Appendix A. Table 3 presents the code–category–theme structure that informed item generation.
Table 3. Code–Category–Theme Structure Informing DSS Item Development.
Note. DSS = Debate Skills Scale.
To ensure content validity, the study adhered to the guidelines of Lawshe (1975) and McKenzie et al. (1999), which recommend consulting at least five experts during scale development. Content validity was assessed using Lawshe's (1975) method. Content Validity Ratio (CVR) values were calculated, and items that did not meet the threshold (CVR < .99 for five experts) were revised or eliminated. The expert panel consisted of two faculty members specializing in Turkish Education, two literature teachers, and one debate expert. These experts assessed the clarity, appropriateness, and relevance of the items. Based on their evaluations, items were added, revised, or removed as necessary, resulting in the elimination of 31 items. The full item reduction process across all stages is summarized in Appendix B.
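The content validity screening described above can be illustrated with a minimal sketch of Lawshe's (1975) ratio, CVR = (n_e − N/2) / (N/2), where n_e is the number of experts rating an item as essential and N is the panel size. The item labels and rating counts below are hypothetical and are not taken from the study; only the panel size of five and the .99 threshold come from the text.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2)."""
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must lie between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


def meets_threshold(n_essential: int, n_experts: int = 5, critical: float = 0.99) -> bool:
    """True when an item reaches the critical CVR (.99 for five experts, as reported in the text)."""
    return content_validity_ratio(n_essential, n_experts) >= critical


# Hypothetical counts of "essential" ratings per item (illustration only).
essential_counts = {"item_A": 5, "item_B": 4, "item_C": 3}
decisions = {item: meets_threshold(count) for item, count in essential_counts.items()}
```

With five experts, only unanimous agreement (CVR = 1.00) reaches the .99 threshold, which is why this criterion is strict for small panels.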
The revised version was reviewed by a separate group of five students for clarity and comprehensibility prior to implementation, serving as a basic pilot for language control. However, no formal pilot data analysis was conducted. Although the EFA sample size approached recommended participant-to-item ratios, the ratio (approximately 4:1) was slightly below conservative guidelines and is acknowledged as a limitation (Worthington & Whittaker, 2006). The final draft of the DSS consisted of 49 items and utilized a 5-point Likert-type response format (1 = “strongly disagree” to 5 = “strongly agree”).
Data Analysis
The validity and reliability of the 49-item draft scale were assessed using SPSS and AMOS. To evaluate construct validity, an EFA was conducted using principal axis factoring (PAF) to identify latent constructs from the items. Before EFA, assumptions of normality, linearity, and absence of multicollinearity were verified to ensure the robustness of the factor structure. PAF was chosen because it better captures latent psychological constructs than data-reduction techniques (Fabrigar et al., 1999).
Following EFA, CFA was conducted using ML estimation in AMOS 24 to validate the proposed factor structure. Model fit was evaluated according to Hu and Bentler's (1999) criteria (CFI ≥ .90, RMSEA ≤ .08, SRMR ≤ .08). The model demonstrated acceptable fit (χ²/df = 2.1, RMSEA = .058, CFI = .94, TLI = .92, SRMR = .042), supporting the four-factor structure identified during EFA. This two-step approach ensured that the scale's factor structure was theoretically sound and statistically robust. Cronbach's alpha coefficients were calculated to assess internal consistency. Additionally, McDonald's omega values were computed for each subscale, providing more robust estimates for multidimensional constructs (Hayes & Coutts, 2020). Subscale alpha values ranged from .64 to .91, indicating acceptable to excellent internal consistency.
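As a minimal illustration, the reported fit indices can be compared against the cutoffs stated above. The dictionary and function names are hypothetical, and the sketch covers only the three indices for which the text gives an explicit criterion; it is not output from AMOS.

```python
import operator

# Cutoffs as stated in the text (CFI >= .90, RMSEA <= .08, SRMR <= .08).
CRITERIA = {
    "CFI": (operator.ge, 0.90),
    "RMSEA": (operator.le, 0.08),
    "SRMR": (operator.le, 0.08),
}

# Values reported for the four-factor CFA model.
REPORTED = {"CFI": 0.94, "RMSEA": 0.058, "SRMR": 0.042}


def check_fit(indices: dict, criteria: dict = CRITERIA) -> dict:
    """Return, for each index with a stated cutoff, whether the value meets it."""
    return {
        name: compare(indices[name], cutoff)
        for name, (compare, cutoff) in criteria.items()
        if name in indices
    }


fit_ok = check_fit(REPORTED)
```

Each of the three reported values satisfies its stated cutoff, consistent with the conclusion of acceptable fit.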
Prior to performing EFA and CFA, the appropriateness of the sample was evaluated using the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy and Bartlett's Test of Sphericity. Additionally, skewness and kurtosis values were examined for normality. All values remained within ±2, supporting the use of ML estimation (Gravetter & Wallnau, 2011). All analyses were conducted after confirming that multicollinearity was absent among the items, ensuring a distinct measurement of the underlying constructs (Kline, 2023). Multivariate normality was assessed using Mardia's coefficient and indicated no severe deviation. The KMO value of .876 indicated meritorious sampling adequacy (Hair et al., 2014), while Bartlett's test of sphericity (χ²(1176) = 4491.119, p < .001) confirmed that the data were suitable for factor analysis (Fabrigar et al., 1999). Items with factor loadings below .40 or high cross-loadings on multiple factors were removed. Twenty items were removed following EFA and a further four during CFA, resulting in a final scale of 25 items. A detailed overview of item reduction across stages is provided in Appendix B.
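The degrees of freedom reported for Bartlett's test follow directly from the number of items, since df = p(p − 1)/2. The sketch below shows this together with the standard chi-square approximation, χ² = −[(n − 1) − (2p + 5)/6] · ln|R|. The function names are hypothetical, and the log-determinant of the correlation matrix is taken as an input because it is not reported in the text.

```python
def bartlett_df(n_items: int) -> int:
    """Degrees of freedom for Bartlett's test of sphericity: p(p - 1) / 2."""
    return n_items * (n_items - 1) // 2


def bartlett_chi_square(n_cases: int, n_items: int, log_det_r: float) -> float:
    """Chi-square approximation: -[(n - 1) - (2p + 5) / 6] * ln|R|.

    log_det_r is the natural log of the determinant of the item
    correlation matrix; it must be computed from the raw data.
    """
    return -((n_cases - 1) - (2 * n_items + 5) / 6) * log_det_r


# 49 draft items give the degrees of freedom reported in the text.
df_draft_scale = bartlett_df(49)
```

For the 49-item draft scale this yields 1176 degrees of freedom, matching the value reported with the test statistic.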
Researchers’ Positionality
The primary researcher has prior experience in debate coaching and curriculum-based argumentation instruction. To minimize potential bias, qualitative coding was conducted independently by two researchers, and all analytic decisions were documented. The research team includes members with experience in psychometric analysis and mixed-methods research design.
Findings
Results of EFA
Based on prior literature and expert input, the scale was expected to reflect four potential debate skill domains: argumentation/strategy, communication, ethics/collaboration, and speaking. The emergence of four factors in the EFA was consistent with this theoretical expectation, supporting the scale's construct validity. An EFA was conducted on a sample of 202 participants using the initial 49 items of the scale, yielding four distinct dimensions. A CFA was subsequently conducted on an independent sample of 187 participants to validate the factor structure. Following EFA, 20 items were removed due to low or cross-factor loadings, yielding a 29-item structure that explained 50.95% of the total variance. The 29 items were distributed as follows: 16 in Factor 1, four in Factor 2, five in Factor 3, and four in Factor 4. The KMO coefficient was .876, indicating good sampling adequacy for factor analysis. In addition to theoretical expectations, the scree plot and eigenvalue criteria provided support for the four-factor solution. The scree plot illustrating the number of factors obtained through EFA is presented in Figure 1, providing a visual representation of the factor structure.

Figure 1. Scree plot.
As a result of the EFA conducted on the 49 items, the factor loadings and the total variances explained by each factor are presented in Table 4. This table provides a detailed breakdown of each factor's contribution to the overall variance, offering a clear representation of the scale's dimensional structure.
Table 4. Factor Loadings From the Exploratory Factor Analysis.
In the scale development process, factor loadings were categorized as high (.60 or above) or medium (.30–.59), regardless of their sign (Büyüköztürk et al., 2018). Items with factor loadings below .30 were removed, and as shown in Table 4, all remaining factor loadings were above .50. The EFA revealed that the 29-item structure accounted for 50.95% of the total variance. Specifically, the first factor accounted for 30.55% of the variance; the second, 8.21%; the third, 6.43%; and the fourth, 5.76%.
The EFA identified four distinct factors within the scale. The first factor, consisting of 16 items (2, 3, 21, 23, 24, 26, 29, 32, 33, 34, 36, 37, 39, 40, 41, 43), was labeled “Argumentation and Strategy Skills.” Example items for this factor include: “I can bring a new angle to the match” and “I can develop logical and coherent arguments.” The second factor, comprising four items (11, 12, 13, 14), was designated as “Communication Skills,” with example items such as: “I use my body language effectively.” The third factor, which includes five items (17, 18, 27, 47, 49), was named “Debate Ethics and Collaboration Skills.” An example item for this factor is: “I use respectful language in arguments.” Finally, the fourth factor, consisting of four items (30, 31, 35, 42), was identified as “Speaking Skills.” An illustrative item for this factor is: “I do not fall into repetition in my speeches.” All items were originally developed and validated in Turkish. English translations are provided for reference purposes only. The full set of final DSS items is presented in Appendix C.
Results of CFA
CFA was conducted on an independent sample of 187 participants to validate the structure identified through EFA. Figure 2 presents the path diagram and factor loadings for the four-factor structure of the DSS, providing a comprehensive visualization of the model.

Figure 2. Path diagram of CFA results.
During CFA, four items (M14, M27, M35, M42) were removed due to insufficient standardized factor loadings. This resulted in a final 25-item structure distributed as 16 items in Argumentation and Strategy Skills, three in Communication Skills, four in Debate Ethics and Collaboration Skills, and two in Speaking Skills. These items were theoretically reviewed and found to duplicate content already measured in other factors or to reflect ambiguous behavioral indicators not central to the core construct definitions. Their removal improved model fit without altering the underlying four-factor structure suggested by EFA, preserving theoretical integrity. All standardized loadings exceeded the acceptable threshold of .30, ranging from .52 to .98. Examination of the path diagram revealed that all items demonstrated good, very good, or excellent factor loadings, except for one item (M2), which had an acceptable factor loading of .52. The fit indices for the DSS are presented in Table 5, providing further evidence of the DSS's structural validity.
Table 5. Model Fit Indices of the DSS Confirmatory Factor Analysis.
Note. DSS = Debate Skills Scale; RMSEA = Root Mean Square Error of Approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; IFI = Incremental Fit Index; SRMR = Standardized Root Mean Square Residual; RMR = Root Mean Square Residual; GFI = Goodness-of-Fit Index.
Model fit was evaluated according to Hu and Bentler (1999).
The removal of the four items did not result in the emergence of new latent structures or the collapse of existing dimensions. Instead, it strengthened the original factor solution by enhancing clarity and reliability without deviating from the theoretical framework established in the EFA. The fit indices of the DSS, including CMIN/df, RMSEA, CFI, TLI, IFI, RMR, and SRMR, indicated acceptable fit, suggesting that the model aligns well with the observed data. Although modification indices were reviewed, no error covariances were added because they lacked theoretical justification. The model achieved an acceptable fit without requiring such modifications. Although GFI was slightly below conventional cutoffs, other fit indices indicated acceptable model fit (Hu & Bentler, 1999).
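The item bookkeeping across the EFA and CFA stages can be traced with a short sketch. It assumes that the labels M14, M27, M35, and M42 refer to the item numbers 14, 27, 35, and 42 listed for the EFA factors; that mapping is an assumption, although the resulting counts agree with those reported in the text. The variable names are hypothetical.

```python
# Item numbers per factor as reported after EFA (29 items in total).
efa_factors = {
    "Argumentation and Strategy Skills": [2, 3, 21, 23, 24, 26, 29, 32, 33, 34,
                                          36, 37, 39, 40, 41, 43],
    "Communication Skills": [11, 12, 13, 14],
    "Debate Ethics and Collaboration Skills": [17, 18, 27, 47, 49],
    "Speaking Skills": [30, 31, 35, 42],
}

# Items dropped during CFA (assumed to map M14 -> 14, M27 -> 27, and so on).
removed_in_cfa = {14, 27, 35, 42}

# Keep every item that was not removed, factor by factor.
final_factors = {
    name: [item for item in items if item not in removed_in_cfa]
    for name, items in efa_factors.items()
}

items_per_factor = {name: len(items) for name, items in final_factors.items()}
total_items = sum(items_per_factor.values())
```

Under this assumption the factors retain 16, 3, 4, and 2 items, for a total of 25, so all four dimensions survive the removal.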
The correlation coefficients between the subdimensions of the measurement tool and the items were analyzed to assess internal consistency, as Karakoç and Dönmez (2014) suggested. These coefficients provide insight into the degree of alignment between items within each factor and the overall structure of the DSS. The correlation coefficients for the four-factor structure, as identified through EFA and confirmed through CFA, are presented in Table 6, offering a detailed evaluation of the DSS's internal consistency.
Table 6. Pearson Correlations Between the Debate Skills Scale (DSS) and Its Subdimensions.
**p < .001.
An examination of Table 6 reveals statistically significant positive correlations among the DSS subdimensions: Argumentation and Strategy Skills (ASS), Communication Skills (CS), Debate Ethics and Collaboration Skills (DECS), and Speaking Skills (SS). Specifically, ASS correlated moderately with CS (r = .399, p < .01), DECS (r = .459, p < .01), and SS (r = .525, p < .01), and strongly with the total DSS score (r = .952, p < .01). Similarly, CS showed moderate positive correlations with DECS (r = .348, p < .01), SS (r = .322, p < .01), and the DSS score (r = .599, p < .01). DECS likewise showed moderate positive correlations with SS (r = .622, p < .01) and the DSS score (r = .617, p < .01). All relationships were statistically significant at the .01 level.
The consistent positive direction of these correlations indicates that the measurement tool demonstrates high internal consistency. This finding supports the DSS's construct validity, confirming its reliability and suitability for assessing the intended constructs.
The DSS items aim to differentiate individuals with the measured characteristic from those without. To achieve this, a comparison was conducted by forming upper and lower groups comprising the top 27% and bottom 27% of participants based on their item scores (Büyüköztürk, 2018). The discrimination of the 25 items in the DSS was analyzed using an independent-samples t-test across these groups. Table 7 presents detailed information on the discrimination of the DSS items, demonstrating their ability to effectively distinguish between individuals with varying levels of the measured characteristic.
Comparison of the Values of the 27% of Upper-Lower Groups.
An analysis of Table 7 reveals a statistically significant difference favoring the upper group for all items (p < .001). This result indicates that each item on the DSS effectively distinguishes between individuals with high and low levels of the measured characteristic, thereby confirming the DSS's discriminatory capability.
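The upper-lower group comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: respondents are ranked by total scale score, the t statistic uses the pooled-variance (Student) form, and the data and function names (`upper_lower_groups`, `pooled_t`) are hypothetical. The source does not specify which t-test variant was used.

```python
from math import sqrt
from statistics import mean, variance


def upper_lower_groups(item_scores, total_scores, fraction=0.27):
    """Return (lower, upper) item scores for the bottom and top `fraction`
    of respondents, ranked by total score (ranking by total is an assumption)."""
    n = len(total_scores)
    if len(item_scores) != n:
        raise ValueError("item_scores and total_scores must have equal length")
    k = max(2, round(n * fraction))  # group size; at least 2 for a variance
    if 2 * k > n:
        raise ValueError("sample too small for non-overlapping groups")
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:k]]
    upper = [item_scores[i] for i in order[-k:]]
    return lower, upper


def pooled_t(upper, lower):
    """Independent-samples t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(upper), len(lower)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(upper) + (n2 - 1) * variance(lower)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    return (mean(upper) - mean(lower)) / se, df


# Hypothetical data for ten respondents (NOT the study data).
total_scores = [40, 95, 60, 110, 55, 100, 70, 45, 105, 80]
item_scores = [1, 4, 3, 5, 2, 5, 3, 2, 4, 3]

lower, upper = upper_lower_groups(item_scores, total_scores)
t_value, df = pooled_t(upper, lower)
print(f"t({df}) = {t_value:.3f}")
```

A large positive t indicates that the item separates high scorers from low scorers. Obtaining a p value additionally requires the t distribution, for example through a statistics package.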
Findings Regarding Reliability
Analysis of Internal Consistency and Reliability
The internal consistency of the DSS and its subdimensions was assessed using Cronbach's alpha (α) coefficients. The ASS subdimension yielded α = .912 (excellent), the CS subdimension α = .864 (good), the DECS subdimension α = .708 (good), and the SS subdimension α = .637 (acceptable); the total DSS yielded α = .917 (excellent). These labels follow the interpretation criteria set out by Kılıç (2016), which categorize Cronbach's alpha coefficients as follows: > .90 = excellent, .70–.89 = good, .60–.69 = acceptable, .50–.59 = poor, and < .50 = undesirable. Internal consistency therefore ranged from acceptable to excellent and was strong for the overall scale and most subdimensions. However, the Speaking Skills subscale showed acceptable but comparatively lower reliability (α = .637), suggesting the need for further item development in future research.
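For readers who wish to reproduce this type of analysis, the following minimal Python sketch computes Cronbach's alpha from an item-response matrix using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores) and applies the interpretation bands attributed above to Kılıç (2016). The response data and function names are hypothetical, and the treatment of values falling exactly on or between the published band edges (e.g., .90 or .895) is an assumption.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("each respondent needs the same number (>= 2) of items")
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def interpret_alpha(alpha):
    """Label alpha using the bands cited in the text; edge handling is assumed."""
    if alpha >= 0.90:
        return "excellent"
    if alpha >= 0.70:
        return "good"
    if alpha >= 0.60:
        return "acceptable"
    if alpha >= 0.50:
        return "poor"
    return "undesirable"


# Hypothetical responses: five respondents by three items (NOT the study data).
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f} ({interpret_alpha(alpha)})")

# Labels for the coefficients reported in the text.
for name, value in [("ASS", 0.912), ("CS", 0.864), ("DECS", 0.708),
                    ("SS", 0.637), ("DSS", 0.917)]:
    print(name, value, interpret_alpha(value))
```

Alpha tends to increase with the number of items, so coefficients for short subscales are expected to be lower than those for long subscales, all else being equal.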
Evaluation of Unidimensionality
In addition to internal consistency, the unidimensionality of the DSS was assessed using second-order CFA and unidimensional factor analysis to determine whether all items could be consolidated into a single overarching construct of debate abilities. The fit indices did not reach acceptable thresholds, indicating that a unidimensional structure was not supported.
Thus, these findings indicate that the DSS cannot be regarded as a unidimensional scale. Each subdimension (Argumentation and Strategy Skills, Communication Skills, Debate Ethics and Collaboration Skills, and Speaking Skills) should be considered a distinct yet interconnected element of students’ debate competency. These findings substantiate the theoretical justification for a multidimensional framework, consistent with the definition of debate skills, which encompasses cognitive, social, ethical, and expressive competencies. Consequently, future applications of the DSS should score and interpret each subdimension separately, while acknowledging their interrelated contributions to the overarching construct of debate skills.
Discussion
This study contributes to the existing literature by highlighting the essential role of cognitive skills, particularly critical thinking, in assessing debate ability (Ennis, 2009). Based on this theoretical perspective, the developed scale's Argumentation and Strategy Skills dimension embodies the cognitive foundation for systematic, evidence-based thinking. This study builds upon previous research by developing and validating a psychometrically sound, multidimensional tool to assess high school students’ debate competencies. Importantly, the DSS is a self-report measure and therefore captures students’ perceived debate competencies rather than observed debate performance. Accordingly, results should be interpreted as beliefs about debate-related skills (e.g., self-efficacy–aligned perceptions), not as direct indicators of competitive outcomes or external ratings. The full scale and sample items are provided in Appendix C to support transparency and future use.
Compared with instruments that emphasize a single facet (e.g., general reasoning dispositions or speaking confidence), the DSS operationalizes debate as a multidimensional competency profile that includes cognitive (argumentation/strategy), communicative, ethical/collaborative, and speaking-related components. This broader operationalization is intended to reduce construct underrepresentation when debate is used as a performance-based instructional method.
The scale development process progressed through successive refinement stages. The initial 80-item pool was reduced to 49 items following expert review. EFA refined the DSS to a 29-item structure across four factors, accounting for 50.95% of the total variance. CFA subsequently supported the four-factor structure and resulted in a final 25-item scale.
The first and largest factor, Argumentation and Strategy Skills, consists of 16 items that represent technical debate knowledge, the formulation of logical arguments, the presentation of organized speeches, and the application of evidence-based argumentation. This factor aligns closely with other studies highlighting structured reasoning and critical thinking as fundamental components of debate ability (Chikeleze et al., 2018). Recent studies also demonstrate how debate fosters critical participatory literacy and student agency, positioning argumentation not just as persuasion but as a tool for civic engagement and problem-solving (Malloy et al., 2020).
The second factor, Communication Skills, comprises four items that emphasize both nonverbal and verbal communication, including body language, tone, diction, and audience participation. These competencies align with 21st-century educational frameworks, such as the P21 model (Partnership for 21st Century Learning, 2019), which emphasizes effective communication in student development. They also align with research highlighting the importance of collaborative meaning-making and intercultural communication in advanced academic contexts, where debate functions as both a cognitive and socioemotional learning tool (El Majidi et al., 2023, 2024). Debate, as a pedagogical approach, fosters the development of these skills through active engagement and reflective discourse (Demir et al., 2016). These findings are consistent with broader 21st-century skill frameworks emphasizing the “4Cs”: creativity, critical thinking, communication, and collaboration (Thornhill-Miller et al., 2023).
The third factor, Debate Ethics and Collaboration Skills, encompasses students’ capacity to engage in ethical and constructive interactions. These qualities are crucial for advancing democratic principles and fostering interpersonal respect, as evidenced by previous research (Graefe, 2024; Rodriguez-Dono & Hernandez-Fernandez, 2021; Woo et al., 2024; Zare & Othman, 2015). This factor reflects the broader social role of debate in promoting mutual understanding and collaborative problem-solving (Demir et al., 2016). Moreover, recent findings suggest that equitable debate structures can empower underrepresented students and reduce systemic barriers in advanced academic programs (Baketa et al., 2023).
The fourth factor, Speaking Skills, was deliberately differentiated from general communication to highlight linguistic clarity, coherence, and fluency. While both communication and speaking involve expression, the communication factor captured broader interpersonal and collaborative elements, whereas the speaking factor reflected performance-based delivery skills. This distinction highlights the multidimensional nature of debate competencies and facilitates a more focused assessment of linguistic competence. Items within this dimension pertain to maintaining focus, avoiding redundancy, and ensuring the relevance of spoken information. These features correspond with research indicating that debate involvement improves speaking skills and communication confidence (Chikeleze et al., 2018; Iman, 2017). Parallel findings on cognitive engagement in accelerated curricula demonstrate that nuanced skill sets, such as communication and argumentation, play critical roles in sustaining motivation and achievement in advanced academic contexts (Shum et al., 2024). Collectively, these four factors provide a comprehensive and conceptually robust framework for evaluating perceived debate competencies.
Implications for Classroom Practice
The DSS can be used as a formative assessment to support debate instruction by helping teachers and students identify perceived strengths and instructional needs across four domains (argumentation/strategy, communication, ethics/collaboration, and speaking). For example, teachers can administer the DSS before and after a debate unit to monitor perceived growth and to plan targeted mini-lessons (e.g., rebuttal structure, evidence use, or speaking clarity).
In enrichment-oriented settings, DSS profiles may inform differentiated supports. Students reporting high argumentation/strategy but lower speaking skills may benefit from structured delivery practice, while students reporting lower ethics/collaboration may benefit from explicit norms, role rotation, and reflection routines that promote respectful discourse.
Because the DSS is a self-report tool, it is best used alongside complementary evidence (e.g., teacher observations, peer feedback, debate rubrics, or performance ratings) to triangulate instructional decisions and reduce the risk of over-interpreting perceptions as performance.
Beyond classroom practice, the DSS may also inform broader research and policy discussions. Prior research suggests that debate participation is associated with gains in critical thinking, literacy, and postsecondary outcomes, particularly among historically underrepresented students (Schueler & Larned, 2025). In this context, the DSS provides a structured framework for examining how students perceive their debate-related competencies across multiple domains.
The scale may support equity-oriented reflection in advanced academic settings by helping educators monitor participation patterns and responsiveness to students who may be less confident or less vocal. However, claims regarding reductions in opportunity or excellence gaps remain provisional until future studies link DSS scores to longitudinal outcomes and independent performance indicators.
Beyond Türkiye, cultural adaptation and cross-national validation may extend the scale's applicability to diverse educational systems (Ho et al., 2025). Its multidimensional framework provides researchers with a structured tool for examining debate-related competencies in cognitive, ethical, and communicative domains.
Conclusion, Limitations, and Future Directions
This research introduces the DSS as a valid and reliable tool for evaluating the complex aspects of debate competencies among high school students. The DSS effectively addresses a significant methodological gap in educational assessment by integrating cognitive, communicative, ethical, and expressive elements. The multidimensional structure illustrates the complexity of debate as a pedagogical and developmental tool, providing educators with a nuanced framework for assessing students’ perceived competencies in debate contexts. Although this study contributes valuable insights, it has several limitations. First, the sample included substantially more female than male participants across both the EFA and CFA samples, which may limit generalizability given evidence of gendered participation and outcomes in some competitive debate contexts. Second, data were collected in Istanbul, Türkiye; debate norms and ethical frameworks may vary across cultural contexts (e.g., emphases on autonomy/individual rights versus collective harmony/duty), so the DSS may require cultural adaptation and revalidation before use in other settings. Third, the DSS relies on self-report, which is susceptible to social desirability bias, self-enhancement, and differences in self-awareness; thus, scores reflect perceived competencies rather than observed performance. Fourth, convergent and discriminant validity evidence with external measures (e.g., communication apprehension, critical thinking, self-efficacy, or independent debate performance ratings) was beyond the scope of this study and should be examined in future research. Fifth, the EFA sample size was close to the minimum recommendations for the initial 49-item pool, which may affect the stability of the factor solution and warrants replication with larger samples. 
Finally, subscale lengths were unequal (e.g., a 16-item factor and a two-item factor), which may influence reliability estimates and content coverage; future work should consider further item development to strengthen the shorter subscales.
Future research should address these limitations by implementing the DSS across various educational stages, encompassing primary, middle, and postsecondary levels. Further, integrating the DSS with studies on cognitive engagement and motivational dynamics could deepen understanding of how debate sustains student participation and achievement across diverse contexts (Phuti et al., 2023; Romero-Díaz de la Guardia, 2022). In addition, further studies could explore how the DSS contributes to identifying gifted and high-ability learners, as debate can nurture motivation, empowerment, and advanced cognitive engagement (El Majidi et al., 2023; Malloy et al., 2020). Cross-national studies, such as recent work on gifted education in South Korea, also demonstrate the value of culturally responsive approaches and the need for adaptable instruments across diverse systems (Kim et al., 2024). The translation and cultural adaptation of the DSS may facilitate its application in cross-national studies, enabling comparative analyses of debate competencies across different educational systems. In advanced academic contexts, the DSS may provide a structured way to examine students’ perceived higher-order competencies associated with debate. However, its use for identifying gifted or high-ability learners would require additional criterion-based validation with external performance indicators. Such cross-contextual use may also support equitable pathways for advanced academics by ensuring that enrichment opportunities, such as debate, are accessible to diverse learners across different educational systems. Integrating observational or performance-based assessments with self-report data would strengthen the evidence for the instrument's construct validity and offer a more comprehensive perspective on students’ abilities. The DSS is a theoretically grounded and psychometrically validated instrument that enhances the measurement of debate skills in educational research.
Applying the DSS can potentially enhance assessment practices and instructional design, fostering inclusive, respectful, and critically engaged classroom discourse across diverse educational contexts. In addition, comparative and cross-cultural studies highlight the importance of adapting debate-based assessments across different systems to ensure equitable access to enrichment and advanced learning opportunities worldwide (Ho et al., 2025; Malloy et al., 2020).
Footnotes
Acknowledgments
The authors would like to thank the participating students for their valuable contributions.
Ethics Approval
This study was approved by the Ethics Committee of İstanbul Sabahattin Zaim University, Approval No. 2024/01, date: February 16, 2024. All participants were informed about the purpose of the study and provided written informed consent. Participation was entirely voluntary, and participants were assured of confidentiality, anonymity, and the right to withdraw at any stage. No identifiable personal information was collected. The study posed minimal risk, involving only completion of a self-report scale. Sample items were shared in advance. Students were recruited voluntarily through school administrators in two cities in Türkiye.
Consent to Participate
Participation was voluntary, and informed consent was obtained from all students and their parents.
Consent for Publication
This study did not involve any individual person's data; hence, consent for publication was not applicable.
Author Contributions
All authors contributed to the study design, data analysis, and manuscript preparation. The first author led the conceptual framing and manuscript revisions.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is an output of the scientific research project titled “Applied development model for twenty-first century skills: Enhancing high school students’ debate skills within the framework of faculty-school collaboration,” supported by Istanbul Sabahattin Zaim University.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data supporting the findings of this study are available from the corresponding author upon reasonable request.
Author Biographies
Appendix C
Final Version of the Debate Skills Scale (DSS) (25 Items)
The DSS was originally developed and validated in Turkish, and the items below are presented in their original Turkish form. The full scale is provided to support transparency and use in educational contexts. English translations are provided for reader convenience only.
Instructions: Please indicate the extent to which each statement reflects your debate skills by selecting the most appropriate option ranging from “strongly disagree” to “strongly agree.”
