Abstract
Recent initiatives have sought to integrate dynamic assessment and diagnostic assessment to facilitate language learning. Extending this line of innovation, the present study uses the example of refusals to illustrate the development and validation of a computerized dynamic diagnostic assessment of pragmatic competence (CDDA-P). To enhance the authenticity of assessed pragmatic performance, a bottom-up approach was adopted in item design, in which response options were empirically derived from a corpus of productions by 335 language users. The effectiveness of the CDDA-P was examined from three key perspectives: its ability to diagnose learners’ strengths and weaknesses, to identify their zones of proximal development (ZPDs) and learning potential, and to promote the development of learners’ pragmatic performance. A pretest–immediate posttest–delayed posttest design was employed to track changes in 66 Chinese learners’ performance before and after the implementation of the CDDA-P. Findings reveal that the CDDA-P can provide a fine-grained diagnosis of learners’ strengths and weaknesses in performing L2 refusals and identify their diverse ZPDs and learning potential. Furthermore, learners demonstrated significant improvement after mediation. This study presents both theoretical and methodological implications for the development of integrated dynamic and diagnostic language assessment, while also offering insights into promoting L2 pragmatic competence through assessment.
Introduction
In recent years, the field of language testing and assessment has witnessed several advancements in dynamic assessment and diagnostic assessment, as an effort to strengthen the integration of learning, instruction, and assessment. Dynamic assessment incorporates instructional intervention as an integral part of the assessment process (Lantolf & Poehner, 2011). It serves as a mechanism for exploring the extent to which learners can surpass their current language abilities, thereby promoting their learning and development. Although dynamic assessment is closely associated with mediation and instructional support, its diagnostic potential has not been fully explored in more formal or structured assessment contexts (Leontjev et al., 2026). By comparison, diagnostic assessment measures clearly defined language abilities, provides personalized feedback on learners’ strengths and weaknesses, and has enjoyed a longer tradition of research and practice in the field (Lee, 2015). While both approaches fall under the larger umbrella of learning-oriented assessment, they differ in emphasis. A principled integration of dynamic assessment and diagnostic assessment has the potential to enhance the diagnostic, intervention, and predictive functions of assessment.
One way to integrate the two assessment approaches is through the development and use of computerized dynamic assessment (C-DA), which provides real-time, standardized intervention (Poehner & Lantolf, 2013) while simultaneously diagnosing learners’ strengths and weaknesses using items mapped to specific underlying abilities. C-DA is typically operationalized through two distinct formats: the cake format, which embeds mediation directly within the test sequence, and the sandwich format, which delivers mediation between a pretest and a posttest (Sternberg & Grigorenko, 2002). Since its introduction in the early 2000s (Guthke & Beckmann, 2000), C-DA of language abilities (e.g., in reading, Yang & Qian, 2020; in listening, Pileh Roud & Hidri, 2021; in writing, Vakili & Ebadi, 2022; in vocabulary, Ebadi et al., 2018; and in grammar, Poehner & Leontjev, 2018) has gained increasing prominence in the field. More recently, several studies have been devoted to assessing and teaching second language (L2) learners’ pragmatic competence through C-DA (Alavi et al., 2020; Qin & van Compernolle, 2021; Zangoei et al., 2019). However, several methodological gaps remain that limit the diagnostic utility, generalizability, and authenticity of current C-DA research in L2 pragmatics. First, existing studies tend to lack detailed diagnoses of test-takers’ strengths and weaknesses in specific dimensions of the competence being assessed. Second, they have predominantly employed the cake format, which limits their ability to evaluate learning outcomes beyond the scope of the C-DA session. Finally, they have predominantly relied on researcher-generated multiple-choice discourse completion tasks (DCTs). The constrained nature of this approach has long raised validity concerns (Kasper, 2008), as limiting responses to predefined options may fail to capture the diversity of learners’ pragmatic performance.
To address these concerns, the present study develops a computerized dynamic diagnostic assessment of pragmatic competence (CDDA-P) that integrates diagnostic and dynamic features into a single computerized assessment platform. Taking the speech act of refusal as an example, this study examines the effectiveness of the CDDA-P in diagnosing learners’ strengths and weaknesses in English refusals, predicting learners’ zones of proximal development (ZPDs) and learning potential, and promoting their pragmatic performance.
Literature Review
Dynamic Assessment and Computerized Dynamic Assessment
Dynamic assessment finds its theoretical roots in Russian psychologist Lev S. Vygotsky’s Sociocultural Theory. The essence of dynamic assessment lies in the provision of external tools or support, known as mediation, by instructors when learners encounter difficulties in doing tasks (Vygotsky, 1978). With the help of mediation, learners can complete tasks beyond the difficulty level of their own actual development (zone of actual development, ZAD), showcasing their developing abilities (i.e., ZPD) (Vygotsky, 1981). From this perspective, dynamic assessment, through the systematic manipulation of mediation, can provide fine-grained information about learners’ current and emergent abilities, thereby potentially supporting changes in learners’ micro-level cognitive processes and contributing to their learning gains (Lidz & Gindis, 2003). This sets it apart from traditional static assessment methods that primarily assess ZAD.
Dynamic assessment can be implemented in two formats: the cake format and the sandwich format (Sternberg & Grigorenko, 2002). In the cake format, mediation is integrated into the test to provide hints for learners whenever they encounter difficulties. Although the cake format facilitates immediate mediation, it is limited in evaluating developmental gains over time or knowledge retention beyond the testing session. By contrast, the sandwich format incorporates mediation between a pretest and a posttest. Both the pretest and posttest are traditional static assessments: the former establishes the baseline of learners’ performance, while the latter captures changes attributable to mediation. It is the integration of these three stages that renders the assessment dynamic (Haywood & Lidz, 2007). By distinguishing baseline performance from post-mediation outcomes, this format is particularly well suited to evaluating the effectiveness of the mediation and tracking the development of learners’ actual and potential abilities over time.
Despite its recognized potential, dynamic assessment has been difficult to implement in large-scale standardized assessment contexts due to its reliance on human mediation (Poehner & Lantolf, 2013). To address this limitation, C-DA has been developed, enabling computers to assume the role of a mediator by providing progressively explicit hints to test-takers (Poehner et al., 2015). In C-DA, instruction and assessment are dialectically integrated and conceptualized as a unified process rather than as separate activities (Lantolf & Poehner, 2004). This integration is operationalized through three types of scores (Kozulin & Garb, 2002): an actual score, representing learners’ independent performance (ZADs); a mediated score, capturing learners’ performance under mediation (ZPDs); and a learning potential score (LPS), which indexes learners’ responsiveness to mediation. LPS refers to “the openness to mediation that enables individuals to appropriate (take over) the mediational process and to use it to regulate their own intellectual activity” (Poehner & Lantolf, 2013, p. 329), highlighting the extent to which learners can internalize instructional support and transfer mediated assistance to subsequent, unassisted performance.
Pragmatic Competence and the Speech Act of Refusal
Conceptualizations of pragmatic competence originate in early models of communicative competence (Taguchi, 2019). Hymes’s (1972) distinction between linguistic and sociolinguistic competence established the foundational insight that successful communication requires more than grammatical knowledge. This view was elaborated in subsequent frameworks (Canale, 1983; Canale & Swain, 1980; Celce-Murcia et al., 1995), culminating in Bachman and Palmer’s (1996, 2010) influential model, which explicitly defined pragmatic competence as comprising functional knowledge and sociolinguistic knowledge. Together, these components capture the ability to perform communicative functions and to adapt language use to situational and social constraints.
Another line of inquiry was advanced by Thomas (1983), who, drawing on Leech’s (1983) typology of pragmatics, distinguished pragmalinguistic competence from sociopragmatic competence. Pragmalinguistic competence concerns the linguistic realization of communicative acts, whereas sociopragmatic competence involves judgments of appropriateness grounded in social norms and cultural values. This bipartite distinction was widely adopted and remains central to contemporary theorizing (Roever, 2022). Jung (2002) expanded this framework by incorporating the ability to perform speech acts, convey and interpret non-literal meanings, perform politeness functions, perform discourse functions, and use cultural knowledge, thereby shifting the emphasis from traditional discourse competence to conversational ability. Later, Roever (2011) proposed four components: the ability to recognize and express speech styles, contextualization cues, and discourse structures in extended monologues; the ability to engage in dialogic interaction; the ability to produce and recognize routine formulae; and the ability to comprehend implicatures. While this framework excels in clarity and operational strength, it overlooks critical dimensions such as appropriateness. More recent proposals have sought to highlight the role of multimodal/semiotic resources and learner agency (e.g., Ren, 2022; Taguchi, 2019); however, these classifications lack empirical validation.
In light of the theoretical diversity of pragmatic competence, the present study seeks to conceptualize pragmatic competence along three interrelated dimensions: pragmalinguistic knowledge, sociopragmatic knowledge, and actional competence. Pragmalinguistic knowledge refers to learners’ control of linguistic forms to achieve communicative purposes. Sociopragmatic knowledge involves understanding the social and cultural appropriateness of these forms in specific contexts. Actional competence refers to learners’ ability to perform speech acts in particular communicative events by selecting and deploying appropriate linguistic resources.
Refusal is widely recognized as a significant challenge for language learners in general (Bella, 2014; Eslami, 2010) and Chinese learners in particular (Chang & Ren, 2020; Lee, 2016), due to its face-threatening and intricate nature. Inappropriate refusals may lead to unintended offense and communication breakdown (Taguchi, 2013). Thus, assessing L2 learners’ ability to refuse and teaching effective refusal strategies appear to be essential. Giving dispreferred responses to requests, invitations, offers, and suggestions, known as eliciting acts of refusal, often leads to negative consequences for the interlocutor (Beebe et al., 1990). Different eliciting acts impose different requirements on both form and content for a refusal to be regarded as appropriate, which makes refusals inherently complex (Eslami, 2010). The appropriateness of refusals is also affected by the power relation between the interlocutors (Al-Mahrooqi & Al-Aghbari, 2016; Nelson et al., 2002). Power refers to the “degree to which the hearer can impose his own plans and his own self-evaluation (face) at the expense of S’s (speaker’s) plans and self-evaluation” (Brown & Levinson, 1987, p. 77). Research suggests that different strategies and forms should be used to refuse interlocutors with different levels of power (Shishavan & Sharifian, 2016).
The assessment of refusal has predominantly been approached using two methods (Taguchi & Li, 2020). The contrastive linguistics approach establishes native speakers’ production as a baseline and compares learners’ production against it. For example, Lee (2016) examined refusals by L2 learners of English at three grade levels and found that, as the grade level increased, learners’ use of direct strategies decreased. The rating scale approach evaluates learners’ performance using a rating scale. Hudson et al. (1995) established a comprehensive framework for assessing refusal production through 5-point analytic rating scales on six dimensions: (1) use of correct speech acts, (2) use of formulaic expressions, (3) amount of speech/information, (4) levels of formality, (5) levels of directness, and (6) levels of politeness. In the present study, the contrastive linguistics approach was employed during the test item compilation stage, while the rating scale approach was used to evaluate participants’ production in the pretest, immediate posttest, and delayed posttest.
Computerized Dynamic Assessment of L2 Pragmatic Competence
Research on the C-DA of pragmatic competence is relatively rare compared with research on language skills, with only a limited number of studies identified (Alavi et al., 2020; Qin & van Compernolle, 2021; Zangoei et al., 2019). Zangoei et al. (2019) pioneered C-DA of L2 pragmatic competence using a cake format C-DA with multiple-choice DCTs adapted from Derakhshan (2014) and Xu (2015), encompassing routines, implicature, and speech acts. Their results showed significant differences in test-takers’ responsiveness to mediation based on ZPD levels, highlighting the limitations of static tests that overlook learners’ potential and focus solely on actual performance. Alavi et al. (2020) examined requests and apologies using Hudson et al.’s (1995) multiple-choice DCTs and found that the cake format C-DA positively impacted pragmatic competence. Qin and van Compernolle (2021) applied Taguchi et al.’s (2013) multiple-choice DCTs in a cake format C-DA for indirect meaning comprehension, revealing mediated performance improvements and notable variations in learners’ mediation responses. They concluded that combining mediated and actual performance may offer more comprehensive insights than actual performance alone.
Although these earlier attempts made contributions to the C-DA of pragmatic competence, they appear to face some major limitations. First, they all adopted the cake format, which is limited in assessing the development of learners’ actual abilities, as the mediation is confined to the same session. Second, the tests were adapted from researcher-generated items in previous studies. While such items have their strengths, they might limit the contextual authenticity of the intervention materials, as they are often shaped by the designers’ expectations and may not fully capture the natural variation in learners’ pragmatic expressions. As such, incorporating learner-generated data into item development (Bardovi-Harlig & Su, 2023; Liu, 2006) has been suggested as a promising alternative for potentially enhancing contextual authenticity in assessment.
Diagnostic Assessment
The theoretical roots of diagnostic language assessment date back to the 1960s. Lado (1961, p. 369) defined it as “an achievement test that reveals learners’ strengths and weaknesses in specific language skills or components.” Alderson (2005) suggested that the purpose of diagnostic testing is to identify which areas of learners’ performance require further assistance, uncover potential underlying causes, and provide timely feedback to facilitate subsequent learning. Lee (2015) argued that diagnostic assessment analyzes potential reasons for learners’ weaknesses, and provides timely feedback to promote future learning.
Several influential diagnostic language assessments have previously been designed and implemented. DIALANG, which is no longer available, offered information on reading, listening, writing, vocabulary, grammar, and learner self-assessment, linking performance to the Common European Framework of Reference for Languages (CEFR; Alderson, 2005). The Diagnostic English Language Needs Assessment (DELNA) identifies strengths and weaknesses in listening, reading, and writing skills to support learners’ academic success (Elder & Erlam, 2001). Similarly, the Diagnostic English Language Tracking Assessment (DELTA) is a collaborative project that provides students with diagnostic data to track their strengths and weaknesses in academic English reading and writing (Urmston et al., 2013). These large-scale diagnostic tests often serve as frameworks and references for the design of local diagnostic assessments.
Complementing these large-scale assessments, localized diagnostic instruments have been developed to address specific learner populations and linguistic features. The UDig diagnostic system is a comprehensive online English diagnostic platform based on China’s Standards of English Language Ability, covering listening, speaking, writing, reading, grammar, and vocabulary (Foreign Language Teaching and Research Press, 2023). Research on UDig has examined its potential to link assessment and instruction (e.g., Jin & Yu, 2023), inform curriculum-based diagnosis (e.g., Fan et al., 2021), validate cognitive and learner-related dimensions (e.g., Sun & Song, 2024), and enhance linguistic development (e.g., Pan & Yin, 2019; Yang & Hu, 2024). Similarly, Xie (2020) adopted an item-bank approach in a Hong Kong university setting to diagnose linguistic problems in students’ academic writing, offering fine-grained and contextualized feedback on lexical and syntactic skills. This body of research underscores the pedagogical value of localized diagnostic assessment, illustrating how such tools may bridge assessment and instruction while addressing the nuanced needs of specific learner groups.
Research in diagnostic language assessment has made developments in the provision of fine-grained feedback and the expansion from large-scale standardized systems to localized contexts. Despite these advancements, most existing diagnostic tests focus primarily on receptive skills (listening, reading) and language components (such as grammar and vocabulary). To date, there appears to be no publicly available diagnostic assessment targeting pragmatic competence. Although previous research has explored dynamic assessment of pragmatic competence, such work has frequently emphasized how instructional support promotes learner development rather than diagnostic precision. This conceptual gap calls for an approach that combines diagnostic assessment with dynamic assessment. In response, the present study seeks to move beyond the emphasis on mediated performance to more systematically identify learners’ strengths and weaknesses across dimensions of pragmatic competence.
Computerized Dynamic Diagnostic Assessment: A Novel Approach to Merging Dynamic Assessment and Diagnostic Assessment
The characteristics of dynamic and diagnostic assessment potentially allow for their integration to provide a more comprehensive report of learners’ abilities. Dynamic assessment is intended to predict and foster learners’ development, whereas diagnostic assessment aims to offer detailed insights into test-takers’ strengths and weaknesses at a micro level. Hence, their combination may supplement language teaching and learning more effectively than either approach alone. In response to Kunnan and Jang’s (2009) suggestion that diagnostic feedback be integrated into C-DA test design, efforts have been made in the assessment of listening (Kamrood et al., 2019; Poehner et al., 2015) and reading (Yang & Qian, 2020). Meng and Fu (2023) made further advancements by using a cognitive diagnostic assessment approach to make item attributes more accurate, focusing specifically on the domain of English as a Foreign Language (EFL) listening. However, insufficient attention has been paid to pragmatics.
As described above, the present study developed a computerized dynamic diagnostic assessment of pragmatic competence (CDDA-P) by integrating C-DA and diagnostic assessment. This platform was intended to address the limitations of previous C-DA applications in pragmatics by: (a) seeking to provide fine-grained, diagnostic feedback on specific pragmatic dimensions; (b) integrating cake and sandwich formats of dynamic assessment to better capture learning potential and outcomes; and (c) utilizing a bottom-up, empirically derived approach to test item design in an effort to enhance authenticity.
The present study investigated the effectiveness of the platform in diagnosing learners’ strengths and weaknesses, identifying their ZPDs and learning potential, and promoting learners’ pragmatic performance through the following research questions (RQs):
RQ1. What are the strengths and weaknesses, diagnosed by the CDDA-P, in Chinese EFL learners’ pragmatic performance in different types of refusals?
RQ2. Can the CDDA-P identify learners’ zones of proximal development and learning potential?
RQ3. Is the CDDA-P more effective than traditional static assessment in enhancing Chinese EFL learners’ pragmatic performance?
Method
Research Design
This study consists of two phases: (1) the development of the CDDA-P and (2) its implementation and initial validation. In the first phase, both the CDDA-P and its counterpart for comparison, the computerized static assessment of pragmatic competence (CSA-P), were developed. The CSA-P was included to serve as a comparison condition, allowing for an evaluation of the relative effectiveness of the CDDA-P in the context of pragmatic learning and assessment. The second phase adopted a quasi-experimental design, focusing on the application of the CDDA-P for diagnosing, predicting, and promoting L2 learners’ pragmatic competence. Specifically, the CDDA-P and the CSA-P were administered to the experimental and comparison groups, respectively. In addition to these assessments, a series of online pragmatic tests were administered throughout the study, including a pretest, an immediate posttest, and a delayed posttest, to monitor participants’ pragmatic performance over time.
Participants
A total of 401 participants were included in this study across two phases: platform development and initial validation.
Platform development phase (n = 335): This phase included 14 native English speakers, 73 high school students, 126 non-English major college students, and 122 English major college students. Participants completed oral and written DCTs to elicit their pragmatic responses to refusals. The collected responses were used to construct the item pool for the CDDA-P and CSA-P.
Validation phase (n = 66): This phase involved 66 Chinese college students for the evaluation of the CDDA-P’s effectiveness. These participants, aged 17 to 21 (M = 18.69, SD = 0.857), were divided into two equivalent groups: a CDDA-P group and a CSA-P group. Prior to the main study, the Quick Placement Test (QPT; University of Cambridge Local Examination Syndicate, 2001) was administered to classify participants. Based on their scores, 34 participants (7 males, 27 females) were assigned to the CDDA-P group, with a mean QPT score of 43.03 (SD = 4.026). The CSA-P group consisted of 32 participants (4 males, 28 females) with an average QPT score of 43.03 (SD = 5.058). An independent samples t-test confirmed no significant difference in QPT scores between the two groups (t = −.02, p = .999), indicating comparable English proficiency across both groups.
Instruments
The instruments used in this study included the QPT, a background questionnaire, a semi-structured interview, three written DCTs (one each for the pretest, immediate posttest, and delayed posttest), as well as the CDDA-P and the CSA-P.
The background questionnaire was used to collect participants’ demographic information including gender, age, major, student number, and years of learning English. Semi-structured interviews were employed to gain in-depth insights into participants’ impressions and feedback on the assessment (see Supplementary Appendix A and all other supplementary materials on the Open Science Framework [OSF]; Lu, 2026). To ensure clarity, both the questionnaire and interviews were conducted in Chinese. The written DCTs, adapted from previous studies (Beebe et al., 1990; Félix-Brasdefer, 2008; Lee, 2013; Tanck, 2004), included eight refusal scenarios, two request scenarios, and four disagreement scenarios (see Supplementary Appendix B). Refusal scenarios were designed to reflect eliciting acts and power relations in line with the CDDA-P framework. Requests and disagreements acted as fillers in order to help minimize participant bias, maintain engagement, and strengthen the study’s ecological validity. Three parallel versions of the DCT were developed for the pre-, post-, and delayed assessments. Each version contained the same number and types of scenarios, with matched contextual variables (eliciting acts and power relations) and identical rating criteria for the refusal items. The three versions underwent expert judgment to confirm their equivalence in content and expected difficulty.
The CSA-P was administered in a traditional static format as a multiple-choice test, with content identical to that of the CDDA-P. The primary difference between the two groups lay in the provision of prompts: while the CDDA-P group received interactive, real-time metapragmatic prompts during the task, the CSA-P group completed the test without any in-task support. However, following test completion, participants in the CSA-P group were immediately provided with the correct answers and detailed metapragmatic explanations, which mirrored the most explicit prompts used in the CDDA-P. This design aimed to simulate a conventional posttest feedback environment and facilitate self-regulated learning.
Test Design
The design of the test items consisted of two phases: option development and prompt design. The development of item options was informed by a pool of 335 language users’ actual production in DCTs. Despite criticisms of DCTs (e.g., Woodfield, 2008), they effectively capture what is typical and socially acceptable in given situations (Golato, 2003) and enable the collection of substantial data in one session (Nguyen, 2019). Drawing on prior research, the DCTs featured 16 scenarios across eight context types to assess specific dimensions of refusals (see Table 1). The DCTs were administered to 321 Chinese English learners and 14 native English speakers to ensure data diversity. Their responses were coded following Beebe et al.’s (1990) scheme, which classifies refusals into semantic formulas and adjuncts. Semantic formulas are primary expressions used to convey a refusal, while adjuncts are supplementary expressions that support refusals but cannot stand alone as a refusal.
Table 1. Types of refusal measured by each item.
Subsequently, we enlisted the assistance of an expert with extensive experience in teaching L2 pragmatics, as well as two graduate students specializing in this area. They evaluated the contextual appropriateness of the refusal responses based on politeness theory (Brown & Levinson, 1987). In line with relevant literature (Al-Gahtani & Roever, 2018; Al-Mahrooqi & Al-Aghbari, 2016; Taguchi, 2013) and the examination of learners’ refusals in this study, three key factors were identified as crucial to determining the appropriateness of refusals: directness, justification, and formality of language. Adjuncts such as appreciation, positive opinion, and pause fillers were found to be less important, as they cannot function independently as refusals. Table 2 presents the raters’ judgments of the appropriateness of these factors. These factors also demonstrated instructional relevance, as learners frequently misused them, indicating a clear need for targeted teaching.
Table 2. Appropriateness of factors.
Native speaker responses were labeled as the most appropriate examples, while learners’ responses were used as distractors, reflecting varying levels of pragmatic competence. Based on the developmental framework derived from previous studies (e.g., Bella, 2014; Taguchi, 2013), we selected the most representative responses that could effectively illustrate the different stages in learners’ pragmatic development in the act of refusal. Specifically, as learners advance in proficiency, they tend to shift from direct strategies toward more indirect forms of refusal (e.g., using justification) and demonstrate increased sensitivity to contextual factors, such as the power relationship between the speaker and the addressee, by adjusting the formality of their expressions accordingly. For instance, excessive formality with peers and excessive informality with professors both constitute sociopragmatic failures. Therefore, the multiple-choice options in the CDDA-P were carefully designed to reflect progression in pragmatic appropriacy by systematically varying key features such as directness, justification, and formality. These selected responses were then combined to form the response options in the CDDA-P. The rationale for using a combination of responses from different learners is that such an approach better reflects the developmental trajectory of pragmatic competence than the performance of any single learner.
Table 3 presents an example item from the CDDA-P involving the refusal of a professor’s request. In this context, two key dimensions of pragmatic appropriacy were considered: directness and justification. The most appropriate response, modeled after native speakers, featured an indirect refusal with a specific justification. In contrast, the distractors, modeled after learners’ responses, varied along these two dimensions: a direct refusal with a specific justification, a direct refusal with a vague justification, and an indirect refusal with a vague justification.
Table 3. A sample test item.
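The structure of the sample item described above can be sketched in code. This is a minimal illustration only: the names `Option` and `build_options` are hypothetical, the option wording is omitted, and only the two dimensions named in the text (directness and justification) are encoded.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class Option:
    """One response option, described by the two dimensions varied in the sample item."""
    directness: str      # "indirect" or "direct"
    justification: str   # "specific" or "vague"

    @property
    def is_key(self) -> bool:
        # In the sample item, the key (modeled after native speakers) is the
        # indirect refusal with a specific justification.
        return self.directness == "indirect" and self.justification == "specific"


def build_options() -> List[Option]:
    """Cross the two dimensions to obtain one key and three distractors."""
    return [Option(d, j) for d, j in product(("indirect", "direct"), ("specific", "vague"))]


options = build_options()
key = [o for o in options if o.is_key]
distractors = [o for o in options if not o.is_key]
print(len(key), len(distractors))  # 1 3
```

Crossing the two binary dimensions yields exactly the four options listed for the sample item: one key and three distractors.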
In the second phase, mediational prompts were designed based on Poehner’s (2005) mediation typology and the Regulatory Scale (Aljaafreh & Lantolf, 1994). These prompts were tailored to learners’ needs and graduated in explicitness. The first prompt, Recognition of error, prompts the test-taker to attempt the task again. The second prompt, Reminder of directions, provides a general clue to guide the test-taker. The third prompt, Offer of choices, narrows the guidance by using “yes or no” questions to help the test-taker select an appropriate approach. The final prompt, Presentation of the appropriate response and explanation, supplies the correct answer with a comprehensive explanation. The specific wording of each prompt was tailored to the pragmatic knowledge being assessed by each item. In the sample item, the prompts were designed to guide learners to consider the two key dimensions—directness and justification—when evaluating the appropriacy of each response. To facilitate participants’ understanding, the prompts were written in Chinese. Two American teachers were consulted to proofread and review all materials, and necessary refinements were made to ensure the quality and accuracy of the assessment materials. Examples of mediational prompts are shown in Table 4.
Prompt Design.
Platform Development
The CDDA-P platform comprised a login page, an instruction page, a practice page, the mediation session with 16 test items, and a learner’s scoring profile page (see Supplementary Appendix C). Participants first read the instructions and completed a practice item to become familiar with the test procedures. Additionally, they could revisit previously answered items to reinforce learning, though submitting new responses was not permitted. During the mediation session (Figure 1), learners had up to four chances to answer each item. A correct answer on the first attempt earned four points, and each incorrect attempt reduced the item score by one point; if all four attempts failed, the correct answer was revealed and the item score was zero. After the treatment, learners received three scores: the actual score, the mediated score, and the LPS. The actual score, representing unmediated ability, is the sum of points earned from first-attempt correct answers. The mediated score reflects performance with support and is the total of all points earned across attempts. The LPS is calculated using Kozulin and Garb’s (2002) formula, which helps differentiate between students with high and low learning potential:
$$\mathrm{LPS} = \frac{2 \times \mathrm{MS} - \mathrm{AS}}{\mathrm{Max}}$$
where MS is the mediated score, AS is the actual score, and Max is the maximum obtainable score (64 for the 16 four-point items).
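The scoring rules described above can be illustrated with a minimal Python sketch. It is not the platform's actual code: the function names, the input encoding (the attempt number on which each item was answered correctly, or `None` if the answer had to be revealed), and the response pattern are hypothetical. The LPS line assumes the commonly cited form of Kozulin and Garb's (2002) formula, (2 × mediated score − actual score) / maximum score.

```python
POINTS_FIRST_ATTEMPT = 4  # from the text: a first-attempt correct answer earns four points


def item_points(correct_attempt):
    """Points for one item.

    correct_attempt is 1-4 (the attempt on which the learner answered
    correctly) or None if all four attempts failed and the answer was revealed.
    """
    if correct_attempt is None:
        return 0
    if not 1 <= correct_attempt <= POINTS_FIRST_ATTEMPT:
        raise ValueError("correct_attempt must be 1-4 or None")
    # One point is deducted for each incorrect attempt before the correct one.
    return POINTS_FIRST_ATTEMPT - (correct_attempt - 1)


def score_session(correct_attempts):
    """Return (actual score, mediated score, LPS) for one learner."""
    max_score = POINTS_FIRST_ATTEMPT * len(correct_attempts)
    # Actual score: points from first-attempt correct answers only.
    actual = sum(POINTS_FIRST_ATTEMPT for a in correct_attempts if a == 1)
    # Mediated score: all points earned across attempts.
    mediated = sum(item_points(a) for a in correct_attempts)
    # Assumed form of the Kozulin and Garb (2002) formula.
    lps = (2 * mediated - actual) / max_score
    return actual, mediated, lps


# Hypothetical 16-item pattern: 10 items correct at once, 6 on the third attempt.
hypothetical_pattern = [1] * 10 + [3] * 6
print(score_session(hypothetical_pattern))  # (40, 52, 1.0)
```

This invented pattern yields an actual score of 40, a mediated score of 52, and an LPS of 1, the same values as the sample profile reported in the Results. Other response patterns could produce the same totals.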

The procedure of the CDDA-P.
The scoring profile page presented five types of information: (1) an overall view of learners’ performances, including their actual scores, mediated scores, LPSs, and the total time spent on the platform; (2) the number of hints used for each item in a tabular format; (3) learners’ actual and mediated scores by eliciting acts and power, shown in the form of bar charts; (4) a diagnosis of learners’ relative strengths and weaknesses; and (5) suggestions on how to refuse in an appropriate way.
Data Collection
Test Administration
The test was administered to 66 college students in two quiet classrooms on campus. The study spanned one month, with the procedure shown in Figure 2. In the first week, participants completed a background questionnaire, the QPT, and the pretest. Based on the QPT results, they were divided into two equivalent groups, CDDA-P and CSA-P. In the second week, they completed the treatment and the immediate posttest. Six participants in each group were selected randomly for interviews. In the fourth week, all the participants completed the delayed posttest.

Test administration procedures.
Scoring Procedure
Two raters scored the pretest, immediate, and delayed posttest data. The raters consisted of (1) an experienced English teacher who was a native speaker of American English, and (2) a PhD student in applied linguistics researching L2 pragmatics and language assessment. They were selected because of their relevant professional and academic backgrounds in English language teaching, L2 pragmatics, and language assessment. The raters were trained by the first author.
The raters used a 6-point holistic rating scale that consisted of three criteria: communicative function, appropriateness, and grammaticality (Li et al., 2019, see Supplementary Appendix D). To ensure rating consistency, the two raters were trained through the following steps: (1) reviewing, familiarizing themselves with, and discussing all materials, including the assessment tasks, scenarios, rating scale, and coding scheme for refusals; (2) jointly grading 5% of the data, during which they shared their understanding of rating criteria through practicing scoring, discussing cases, and revising previously assigned scores. After training, the two raters completed independent scoring of the remaining 95% of the cases. The two raters discussed all discrepancies until they reached an agreement. Inter-rater reliability, as measured by Pearson’s correlation coefficient between the two raters’ total scores, was .802.
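For readers unfamiliar with the reliability index mentioned above, the following minimal Python sketch shows how Pearson's correlation between two raters' total scores can be computed. The rater totals are invented for illustration and are not the study's data, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)


# Hypothetical total scores given by two raters to the same six learners.
rater_1 = [18, 22, 25, 30, 27, 20]
rater_2 = [17, 24, 24, 31, 25, 22]
print(round(pearson_r(rater_1, rater_2), 3))
```

Pearson's r reflects how consistently the two raters rank learners, not whether they assign identical scores, so it is often reported alongside a procedure for resolving discrepancies, as was done here.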
Data Analysis
Table 5 shows the alignment between the RQs, data sources, and data analysis methods. Specifically, we performed a paired samples t-test to reveal any differences between Chinese EFL learners’ actual and mediated scores of refusals. We ran a two-factor repeated measures ANOVA to examine whether the pretest, immediate posttest, and delayed posttest scores differed for the two groups of Chinese learners, followed by paired comparisons to locate any differences.
Alignment between the RQs, data, and analyses undertaken.
Results
Cronbach’s alpha for the pooled actual and mediated scores was .753, which we considered acceptable.
Learners’ Strengths and Weaknesses Diagnosed by the CDDA-P
In line with previous research (Jang, 2009; Toprak & Cakir, 2021) and based on the performance of the examinees, this study defined a strength as an accuracy rate of .85 or higher and a weakness as an accuracy rate of .60 or lower. Learners are considered to have moderate mastery in a given dimension when their accuracy rate is between .60 and .85. By using two thresholds, we aimed to increase the discriminability between strong and weak performance.
Figure 3 displays learners’ mean accuracy rate based on actual performance. Among the refusal scenarios, learners performed best without mediation on refusing invitations (accuracy rate = 66.91%, SD = 0.20, range [25.00%, 100.00%]), followed by refusing suggestions (accuracy rate = 55.88%, SD = 0.29, range [0.00%, 100.00%]), refusing offers (accuracy rate = 47.06%, SD = 0.27, range [0.00%, 100.00%]), and worst on refusing requests (accuracy rate = 41.91%, SD = 0.13, range [25.00%, 75.00%]). When refusing interlocutors of different power, learners were better at appropriately refusing an interlocutor of higher power (accuracy rate = 63.97%, SD = 0.18, range [12.50%, 87.50%]) than an interlocutor of equal power (accuracy rate = 41.91%, SD = 0.16, range [0.00%, 75.00%]). As observed, learners in the current study showed considerable variation in performing L2 refusals: they demonstrated relatively stronger abilities in refusing invitations and interlocutors of higher power and displayed weaknesses when refusing an interlocutor of equal power, refusing requests, refusing suggestions, and refusing offers. Among the traits under consideration, learners performed best on the use of adjuncts (accuracy rate = 61.40%, SD = 0.16, range [25.00%, 87.50%]), and showed similar performance on refusal directness (accuracy rate = 52.04%, SD = 0.13, range [15.39%, 69.23%]) and the use of justification (accuracy rate = 54.62%, SD = 0.15, range [21.43%, 78.57%]). In contrast, learners needed improvement on language formality (accuracy rate = 26.47%, SD = 0.18, range [0.00%, 60.00%]).
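The two-threshold rule can be expressed as a short Python sketch. The cutoffs (.85 and .60) come from the text, while the function name and labels are hypothetical and the platform's own implementation may differ.

```python
def mastery_level(accuracy, strength_cutoff=0.85, weakness_cutoff=0.60):
    """Classify an accuracy rate (0-1) using the two cutoffs stated in the text."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a proportion between 0 and 1")
    if accuracy >= strength_cutoff:
        return "strength"
    if accuracy <= weakness_cutoff:
        return "weakness"
    return "moderate"


# Illustrative proportions, not individual learners' data.
for rate in (0.90, 0.70, 0.50):
    print(rate, mastery_level(rate))
```

Because the cutoffs are inclusive, an accuracy rate of exactly .85 counts as a strength and exactly .60 as a weakness. Only values strictly between the two fall into the moderate band.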

Mean accuracy rate based on actual performance.
To better display the detailed diagnostic feedback provided by the CDDA-P, we randomly chose one candidate’s scoring profile, as shown in Figure 4. The scoring profile showcased Zhang’s overall performance in the upper-left section. Zhang achieved a score of 40 out of 64 for his actual performance and 52 for his mediated performance, resulting in an LPS of 1. The bar chart provides a summary of Zhang’s performance across different types of refusal scenarios. The categorization presented in the bottom left offers insight into Zhang’s specific strengths and weaknesses. This can help learners gain a comprehensive understanding of their performance, especially the areas that require improvement. As shown in the figure, Zhang demonstrated notable strengths in refusing offers and invitations and in refusing people of higher power, but exhibited weaknesses in handling requests, suggestions, people of equal power, and formality.

Zhang’s scoring profile.
Learners’ Zones of Proximal Development and Learning Potential Identified by the CDDA-P
To account for learners’ responsiveness to mediation, Table 6 presents learners’ actual scores, mediated scores, and LPSs. A glance at Table 6 highlights a noteworthy observation. The mean mediated score (MS = 50.71) far exceeds the actual score (AS = 33.88), showing an average improvement of 16.83 points. Additionally, learners demonstrated greater uniformity in their performance after mediation (SD = 4.78).
Descriptive Statistics of Scores and Test Time in the CDDA-P Group.
Note. LPS: learning potential score.
To examine whether there were differences between actual and mediated scores, a paired samples t-test was conducted. A one-sample K–S test showed that the gain scores were normally distributed, satisfying the assumption of the paired samples t-test. As shown in Table 7, the results indicated a significant difference between actual and mediated scores (t = 18.489, p < .001, d = 3.17), implying a substantial gap between learners’ ZADs and ZPDs. To further analyze whether the CDDA-P could effectively identify learners’ ZPDs, we examined those with the same actual score of 32 points and displayed their actual and mediated scores in Figure 5. The figure highlights notable differences in learners’ mediated scores, revealing diverse potential for development. For instance, Learners 5 and 7 showed strong internalization of the assistance provided, exhibiting enhanced responsiveness. In contrast, Learners 4 and 6 had relatively lower ZPDs, indicating they required more explicit prompts to handle items beyond their current ability. These findings suggest that for some participants, the CDDA-P mediation was either insufficiently effective or misaligned with their ZPDs. The significant variability in ZPDs pointed to differing levels of responsiveness to mediation, potentially impacting future development rates.
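As a plausibility check on the reported effect size, and not a description of the authors' own computation, one common convention for paired designs divides the t statistic by the square root of the sample size. The sketch assumes n = 34 for the CDDA-P group, which is the sum of the three learning-potential groups reported in this section (4 + 15 + 15):

```latex
d = \frac{t}{\sqrt{n}} = \frac{18.489}{\sqrt{34}} \approx \frac{18.489}{5.831} \approx 3.17
```

Under these assumptions the result matches the reported d = 3.17. Other definitions of Cohen's d for repeated measures would give different values.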
Paired Samples t-test of Mediated and Actual Scores of the CDDA-P Group.
Note. MS: mediated score; AS: actual score.

Variations in mediated scores for participants sharing an actual score of 32 points.
Regarding learning potential, Table 6 shows that learners’ LPS ranged from 0.88 to 1.16. In line with Kamrood et al. (2019), we divided all learners into three groups with high, moderate, and low learning potential. Four test-takers fell into the low learning potential group (0.88 < LPS ⩽ 0.97). Nearly half of the test-takers (n = 15) belonged to the moderate group (0.97 < LPS ⩽ 1.06). The rest (n = 15) had high learning potential (1.06 < LPS ⩽ 1.16). Figure 6 illustrates the significant variability in LPS among participants, highlighting their diverse responses to the CDDA-P mediation. LPS also varied considerably among learners with the same actual scores, demonstrating that LPS could discriminate among learners who had the same ZAD.
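The three-band grouping can be sketched in Python as follows. The cut points (0.97 and 1.06) are taken from the text, while the function name and band labels are hypothetical. Values at or below the lowest observed LPS are treated as "low" here, which is an assumption, since the reported interval for the low group is open at 0.88.

```python
def lps_band(lps, low_cutoff=0.97, high_cutoff=1.06):
    """Assign a learning potential band using the cut points reported in the text."""
    if lps <= low_cutoff:
        return "low"
    if lps <= high_cutoff:
        return "moderate"
    return "high"


# An LPS of 1 (the value in the sample scoring profile) falls in the moderate band.
print(lps_band(1.0))
```

Two learners with the same actual score can land in different bands if their mediated scores differ, which is the discrimination the LPS is meant to provide.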

Variations of learners’ LPSs according to actual scores.
The Effectiveness of the CDDA-P in Promoting Learners’ Pragmatic Performance
To evaluate the impact of the intervention on learners’ appropriate use of refusals and the persistence of this effect, a repeated measures ANOVA was conducted to compare pretest, immediate posttest, and delayed posttest scores for both the CDDA-P and CSA-P groups. The descriptive statistics are shown in Table 8. The data met normality assumptions, confirming the suitability of repeated measures ANOVA. Results indicated a significant main effect for measurement time (F = 52.792, p < .001, ηp² = .452) and group (F = 13.891, p < .001, ηp² = .178). The interaction between measurement time and group was also significant (F = 22.801, p < .001, ηp² = .263), necessitating further analysis of individual group effects. Within the CDDA-P group, paired comparisons revealed significant differences across all three tests, specifically, for the pretest–immediate posttest (Mdiff = −7.618, 95% CI [−9.251, −5.985], p < .001, d = −2.09), the pretest–delayed posttest (Mdiff = −5.882, 95% CI [−7.652, −4.113], p < .001, d = −1.67), and the immediate posttest–delayed posttest (Mdiff = 1.735, 95% CI [.338, 3.132], p = .01, d = .53). This showed that the intervention produced positive, sustained improvements in pragmatic competence. In contrast, within the CSA-P group, no significant differences were found among the three tests, specifically, for the pretest–immediate posttest (Mdiff = −1.469, 95% CI [−3.152, .214], p = .107, d = −.46), the pretest–delayed posttest (Mdiff = −1.469, 95% CI [−3.293, .355], p = .156, d = −.38), and the immediate posttest–delayed posttest (Mdiff ≈ .00, 95% CI [−1.440, 1.440], p ≈ 1.00, d ≈ .00). This showed that the traditional static assessment yielded only minor, non-significant gains in pragmatic competence.
Descriptive Statistics of Scores in the Pretest, Immediate, and Delayed Posttest.
Between-group comparisons were also conducted. Results showed that there was no significant difference in the pretest (Mdiff = −.947, 95% CI [−2.711, .818], p = .288, d = −.26) between the CDDA-P and the CSA-P group, while a significant difference was detected in the immediate posttest (Mdiff = 5.202, 95% CI [3.575, 6.829], p < .001, d = 1.57) and the delayed posttest (Mdiff = 3.467, 95% CI [1.605, 5.329], p < .001, d = .91). On average, the CDDA-P learners scored 5.20 points higher than the CSA-P group in the immediate posttest and 3.47 points higher in the delayed posttest, as shown in Table 8. Thus, the CDDA-P group outperformed their CSA-P counterparts in both the immediate and delayed posttests, suggesting that the CDDA-P was more effective in enhancing learners’ pragmatic performance.
The semi-structured interview results also underscored the CDDA-P’s facilitative role in promoting learning. The CDDA-P test-takers held that the test was motivating and learning-oriented because the prompts provided chances to rethink their choices, and the hints instructed them on how to perform refusals appropriately in different contexts. For example, CDDA-P04 remarked that “ . . . the platform gives a prompt when I choose a wrong answer, which can promote thinking” and CDDA-P03 related that “I’ve learned something like being tactful and giving specific reasons. Pay attention to status and occasions. Don’t be too formal when rejecting classmates and show more respect when rejecting elders and teachers.” Some learners even demonstrated an acquired ability to perform context-appropriate refusals, adapting their language and strategies based on the power of the interlocutor and eliciting acts. For instance, CDDA-P06 stated that
When refusing an invitation, it’s better to express appreciation first, then give a specific reason why I can’t come. Finally, apologize and give a specific remedial plan such as going out next time. Also, it’s better to refuse a person of a higher social than me in a mild and polite tone.
Unlike the CDDA-P learners, the CSA-P group, who took the traditional static assessment, showed little enthusiasm during the interview. They tended to be less motivated by the test and uncertain about the learning benefits they gained from the CSA-P. For example, when asked about the facilitative role that the test played in enhancing their performance, CSA-P02 hesitantly said “I’m not sure about it. Maybe a little” and further explained, “I’ve learned to change some words I use, such as ‘could’, ‘I’m afraid’, and be grateful.” Please refer to Supplementary Appendix E for more examples from the interview data.
Discussion
This study contributes to the field of language testing by developing a CDDA-P platform and demonstrating its potential as a methodologically innovative approach to assessing L2 pragmatic competence. The CDDA-P platform addresses key challenges in current language testing by integrating diagnostic and dynamic assessment principles to diagnose learners’ strengths and weaknesses across various types of refusals, to predict their zone of proximal development and learning potential, and to enhance their pragmatic performance.
The scoring profile developed in this study builds on prior research and represents a meaningful contribution to the development of dynamic assessment. Prior studies on C-DA often provided limited information, just focusing on actual scores, mediated scores, and LPSs. In contrast, the current scoring profile offers a more comprehensive analysis of learner performance. It integrates diagnostic and dynamic features, enabling educators to not only identify learners’ current strengths and weaknesses but also to understand the extent of their potential for growth with sufficient support. Diagnostically, it analyzes learner output across various types of refusals, such as refusing an invitation or a request. This fine-grained data helps distinguish learners with similar overall performance by identifying distinct pragmatic tendencies. Dynamically, it tracks the level of mediation each learner requires, providing insight into their ability to benefit from instruction. For example, two learners may both misuse direct refusals in a delicate situation, but while one improves with minimal prompting, the other continues to struggle, suggesting different instructional needs. LPS, in particular, functions as a predictive marker of responsiveness to pedagogical input. As Poehner and Yu (2022) noted, LPS is a key indicator of the learners’ capacity to respond to instructional interventions, thus providing a theoretical basis for the dynamic interpretation of assessment results. The incorporation of visual elements such as charts and graphs into the scoring profile further enhances the interpretability of the assessment results. Interview data corroborate these benefits: CDDA-P participants not only found the test more engaging but also showed greater metapragmatic awareness compared to CSA-P peers (see Supplementary Appendix E). 
While static assessments can offer some diagnostic utility when well-designed, the CDDA-P approach goes beyond mere diagnosis by offering a dynamic perspective on learner development, making it a valuable tool for informing instruction and fostering individualized learning strategies.
The current study also contributes to the ongoing discussion on the predictive function of dynamic assessment. In line with previous research on C-DA (Kamrood et al., 2019; Poehner & Lantolf, 2013; Poehner et al., 2015), the study confirmed the predictive function of the CDDA-P, which offered more comprehensive insights into both learners’ ZADs and their ZPDs. The scores offer complementary perspectives on learners’ abilities: actual scores provide insight into learners’ existing abilities, mediated scores show learners’ capacity to utilize instructional support, which can shed light on their ZPDs, and LPSs help assess learners’ learning potential and predict their future development (Poehner & Yu, 2022). In this regard, instructors should embrace the multifaceted nature of these scores and gauge the level of teaching support and assistance required to facilitate learners’ learning, rather than fixate on a single score as the sole determinant of learners’ abilities, as Poehner et al. (2015) have warned. The scoring profiles generated by the CDDA-P provided a more integrated perspective on learners’ established competencies, developing abilities, and learning potential. This comprehensive overview enables stakeholders to tailor learning plans to each learner’s needs and developmental trajectory.
With regard to the third research question, the results suggest that the CDDA-P has a positive effect on learners’ pragmatic performance, although part of the gain faded over time. While participants demonstrated improved performance in the immediate posttest, a decline was observed in the delayed posttest conducted two weeks later. Nonetheless, their delayed posttest scores remained significantly higher than their pretest scores, indicating that the intervention had a measurable and lasting impact. These findings support the Vygotskian view of learning as a process that requires ongoing and scaffolded support to consolidate newly acquired abilities. Consequently, the study underscores the importance of integrating repeated, formative interventions into language instruction to maintain and reinforce learning gains over time.
Furthermore, we adopted the bottom-up approach to test item development, grounded in a systematic analysis of both native speaker and learner data. This method enabled us to design a test that is not only contextually authentic but also pedagogically meaningful. By closely aligning the test items with the actual language use patterns of the target population, we aimed to capture pragmatic language use in a realistic and relevant manner. This alignment is particularly critical in assessing pragmatic competence, where learner performance is shaped by a complex interplay of contextual, task-related, linguistic, and individual factors (Bachman & Palmer, 2010). We therefore argue that incorporating learner data into the test development process not only enhances the authenticity of the items but also leads to a more valid and nuanced assessment of learners’ pragmatic development.
Conclusion, Implications, and Suggestions for Further Research
In this exploratory study, we developed an online dynamic diagnostic assessment of pragmatic competence based on Chinese English learners’ refusal productions and demonstrated its effectiveness in diagnosing learners’ strengths and weaknesses in performing English refusals, uncovering their learning potential, as well as promoting their pragmatic performance.
This study holds several implications for the broader field of diagnostic assessment and computerized dynamic assessment, with specific relevance to L2 pragmatic dynamic assessment research. Macroscopically, this study proposes a novel integration framework that combines the principles of diagnostic and dynamic assessment, thereby expanding the scope and depth of information that can be obtained from a single assessment context. This integration enhances the applicability of pragmatic assessments, offering a more holistic view of learner performance and development. Mesoscopically, this study introduced a unique combination of the cake format and the sandwich format in the context of C-DA. By integrating both formats, the study offers a broader perspective on learner performance and learning, potentially informing further efforts to enhance dynamic assessment frameworks. Microscopically, the study contributed to the field of L2 pragmatic assessment by providing an innovative way of test compilation based on language users’ pragmatic production that could be used in future research.
As an early attempt at establishing a computerized dynamic diagnostic assessment of pragmatic competence, this study has several recommendations for future research. First, alternative methodologies to DCTs, such as interactive role-play scenarios, could be used to elicit participants’ pragmatic production in spoken communications. Second, we developed and initially validated the CDDA-P, taking the speech act of refusal as an illustrative case. Given that pragmatic competence is a multifaceted construct that encompasses three interrelated dimensions (namely pragmalinguistic knowledge, sociopragmatic knowledge, and actional competence), it remains to be explored whether similar outcomes would emerge using other speech acts and in different contexts. Third, while the use of a multiple-choice format offers administrative advantages, it also restricts learners to a limited set of pre-determined options. Future research is advised to adopt more open-ended and interactive methods of data collection, such as role plays. Fourth, while learner-generated options offer a more authentic reflection of learners’ pragmatic competence, they also pose challenges in terms of generalizability. Future research could explore cross-cultural applications and assess the method’s applicability in language testing of other languages or other learner groups. Finally, the text-based scenarios confined the CDDA-P to the linguistic dimension of pragmatic competence. Future research should therefore employ multimodal approaches (e.g., video-based scenarios) to capture and assess pragmatic competence as it is enacted through diverse semiotic resources.
Footnotes
Author Contributions
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Planning Office for Philosophy and Social Sciences of the People’s Republic of China (20BYY108).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
All supplementary appendices (A–E) are openly available on the Open Science Framework (OSF) project website (Lu, 2026) under a CC-BY 4.0 International license at
.
