Abstract
Machine translation (MT) errors may be seen as a limitation, but it is precisely these imperfections that provide learners with valuable opportunities for error-driven practice through post-editing (PE), thus developing their language proficiency and, in turn, supporting translation performance. However, little is known about how machine translation post-editing (MTPE) benefits intermediate learners of English as a foreign language (EFL). This empirical study explores the role of strategically informed MTPE practices in supporting intermediate Chinese EFL learners’ linguistic and translation development. Participants completed human translation and MTPE tasks guided by PE strategies and then reflected on their edits. Text complexity analysis using Eng-Editor revealed that post-edited texts exhibited significantly higher textual and lexical complexity than human translations, as well as more complex noun phrases and non-finite verb phrases, although the overall syntactic complexity index did not differ significantly. Correct revisions only marginally outnumbered incorrect ones, suggesting that intermediate learners still struggle with error detection and correction. These findings underscore the need for clearer strategic support in PE tasks. Sentiment analysis of learner reflections, based on SnowNLP with contextual adjustments, revealed a neutral-to-negative attitude toward PE, highlighting challenges related to syntactic proficiency and technical support. This study contributes empirical evidence on pedagogical MTPE use with intermediate learners, leading to refined PE guidelines and a multi-task design that emphasizes both selection competence and linguistic development to enhance translation performance.
Keywords
1. Introduction
Recent decades have witnessed continuous advancements in translation technologies, evolving from rule-based and statistical machine translation (MT) to neural machine translation (NMT), and now to cutting-edge Artificial Intelligence Generated Content (AIGC) systems. In earlier stages, MT systems often struggled to produce high-quality texts, resulting in confusing and sometimes comical outputs, which limited their widespread adoption. However, improvements in NMT quality and the availability of accessible NMT tools have increased their use in various language learning contexts; as Ducar and Schocket (2018, p. 782) noted, ‘it (MT) plays an increasingly important role in the twenty-first-century student’s approach to language learning.’ Although emerging large language models (LLMs) may offer superior translation quality, NMT tools remain particularly relevant in pedagogical settings, where the primary goal is not the efficiency of the final output but the development of learners’ analytical and evaluative skills in translation. From this developmental perspective, it is precisely the inherent imperfections in MT output that provide authentic opportunities for learners to engage in post-editing (PE), a process of identifying and correcting errors through revision techniques such as addition, omission, rewriting, and paraphrasing for clarity (Niño, 2008).
Previous research has consistently reported positive linguistic, cognitive, and affective effects when integrating MT and PE into language classrooms (Garcia, 2011; Kliffer, 2005, 2008; J. Lee & Liao, 2011; Niño, 2009; Rico et al., 2018; Shei, 2002; Yamada, 2020). However, despite this growing integration, these studies have predominantly focused on advanced learners, with limited attention given to intermediate learners, who occupy a transitional stage of language development between basic communicative ability and more advanced linguistic competence. It is understandable that earlier studies paid little attention to intermediate learners: when MT output quality was relatively low, fully identifying and correcting MT errors demanded more language proficiency and cognitive effort than most intermediate learners could offer. As MT quality continues to improve, it further reduces learners’ reliance on foreign language expertise (Pym, 2013), making it increasingly feasible to integrate MT into classrooms for intermediate learners, particularly through PE tasks. Some efforts have already been made to introduce MT into intermediate classrooms as a tool to compensate for learners’ limited productive resources, allowing them to engage with complex texts that might otherwise be beyond their reach and helping to foster their PE skills and writing development (Garcia, 2011; Tsai, 2019; Yamada, 2014).
Despite these promising developments, systematic empirical investigation of MTPE pedagogy tailored for intermediate learners remains scarce, especially within the Chinese educational context. Chinese intermediate language learners typically study translation as non-professional translators and exhibit considerable proficiency disparities, and thus stand to benefit greatly from MT’s capacity to level the playing field within the same classroom (J. Lee & Liao, 2011). At the same time, as Fu and Xie (2015) pointed out, translation technology instruction should be differentiated between English and non-English majors, yet the latter, who are often intermediate learners, receive limited attention in MT pedagogy. Moreover, existing studies remain largely product-oriented, focusing on the quality of learner translations rather than on how MT and PE may support the development of translation competence. For intermediate learners, who have foundational skills but still struggle with complex syntax and vocabulary, translation competence is closely tied to linguistic ability. In this respect, learners’ syntactic complexity serves as a key indicator of their language proficiency and depth of linguistic awareness during translation. MTPE tasks provide accessible linguistic models while requiring learners to critically evaluate and revise MT outputs, thereby creating opportunities for noticing and syntactic development. From a pedagogical perspective, it is therefore important to investigate the potential of MT and PE tasks to foster syntactic development and how instructional guidance may support learners in deciding whether to accept or revise MT-generated structures. This decision-making process, as argued by Pym (2003), constitutes a crucial component of translation competence. Meanwhile, intermediate learners’ affective attitudes toward MT and PE may further mediate how they approach these tasks and the extent to which they benefit from them.
However, few studies have examined these linguistic and affective dimensions together among intermediate learners, leaving uncertainties about how NMT‑supported translation activities influence both syntactic outcomes and emotional responses in classroom contexts.
Overall, while prior research has leveraged PE to refine MT outputs and enhance final products, fewer studies have explored the pedagogical potential of NMT in language learning and translation training among intermediate learners. Given that NMT’s language proficiency often surpasses that of most college students in their second language (L2) (Yamada, 2020), there is a clear need for instructional approaches that enable students to engage with MT critically and productively in order to derive linguistic, cognitive, and translation-related benefits. Building on prior work, the present study compares intermediate learner translations produced with and without MTPE to investigate the feasibility of MTPE tasks, their potential impact on the text complexity of learner translations, and learners’ attitudes toward MT and PE. Drawing on Pym’s minimalist view of translation competence, the study further explores whether and how MTPE may support the development of syntactic, lexical, and strategic skills as well as the role of affective engagement in mediating these processes. The findings are intended to inform evidence-based pedagogical guidance on MTPE practices for intermediate learners of English as a foreign language (EFL), with implications for fostering translation competence and more constructive learner engagement with MT.
2. Literature Review
2.1. Application of MT for Educational Purposes
Niño (2009) identified four main educational uses of MT: as a bad model, as a good model, for vocational purposes, and as a computer-assisted language learning (CALL) tool. In this framework, using MT as a ‘bad model’ does not imply pedagogical inadequacy; rather, it involves deliberately exploiting imperfect MT output as a resource for error identification and correction. By contrast, the ‘good model’ approach treats MT as a relatively reliable reference translation, while vocational and CALL-oriented uses emphasize professional training and general language support, respectively. This section focuses on MT as a ‘bad model’, where learners’ engagement centers on diagnosing and revising errors in raw MT output as a core learning activity. ‘According to Fotos (1993) and Ellis (1990), formal instruction in the form of error detection and correction can raise learners’ consciousness of grammatical structures in the target language in order to promote advanced levels of target language attainment’ (quoted in Niño, 2008, p. 43). Empirical evidence from Niño’s (2008) study with advanced Spanish learners further demonstrates that error correction was an effective teaching activity for enhancing awareness of language differences and promoting language development. Kliffer (2005, 2008) compared MT output, human translation, and MTPE through error analysis in translation classes. Both of his experiments found that using MT helps students improve text quality and better understand the nature of translation. Most importantly, PE ‘gave students insight into the huge challenges which have confronted MT, especially the questions of how to deal with syntactic and lexical ambiguity, non-literal language, and inferencing’ (Kliffer, 2008, p. 63). Apart from enhancing translation awareness, subsequent studies (Garcia, 2011; J. Lee & Liao, 2011; Niño, 2008) have consistently reported several linguistic advantages of MTPE, including improvements in text quality, heightened language awareness, increased language autonomy, and reduced disparity between students with different language proficiency levels. However, the extent to which these benefits generalize across linguistic domains remains contested.
The direction of language transfer significantly influences the effectiveness of MT in language learning. Most studies have focused on using MT to generate texts from learners’ first language (L1) into their second language (L2), which students then post-edit to improve their language skills, particularly in writing (Garcia & Peña, 2011; S. M. Lee, 2019; Tsai, 2019; Xu, 2020). In translation learning, research has examined both directions of translation, despite the widely accepted translation ethic favoring translation into L1. Studies incorporating MTPE from L2 to L1 in translation classes reported improvements in text quality and cognitive benefits (Garcia, 2011; Kliffer, 2005; J. Lee & Liao, 2011). However, these improvements were mainly observed at the lexico-grammatical level, with less impact on syntactic and discourse-level features. This underscores the importance of explicitly training learners’ syntactic competence through PE tasks. Addressing this gap is particularly important in classroom contexts, where learners’ syntactic competence often constrains their translation performance.
2.2. Chinese–English Machine Translation Errors and PE Guidelines
Although new neural algorithms have improved NMT output in many respects, NMT still has problems in three areas: grammatical inaccuracy, issues of formality, and pragmatic breakdowns (Ducar & Schocket, 2018). In the English–Chinese translation direction, common MT errors identified in recent studies include vocabulary errors, syntactic errors, textual errors, and formatting mistakes (Guo & Hu, 2021; M. Li, 2021; X.-L. Wang et al., 2023). Vocabulary errors account for more than 70% of all error types (M. Li & Zhu, 2013), while syntactic problems, especially word order errors and structurally incomplete or jumbled sentences, are consistently identified as the most difficult to correct (Qiu et al., 2022). Terminology, style, and syntactic complexity are equally difficult for both MT and human translation (Qian et al., 2022). F. Q. Li (2021) and Zhao et al. (2024) identified similar error types in C-E NMT output, including grammatical mistakes, awkward or literal lexical choices, omissions, unclear references, syntactic errors, and problems with sentence structure. These findings on MT errors can inform the compilation of PE guidelines that help users identify and correct typical MT errors. For intermediate EFL learners, who often struggle with syntax, collocations, and pragmatic appropriateness, such guidelines offer structured support for improving their PE performance.
Language service providers (LSPs) employ different PE guidelines in professional settings according to their specific needs. Flanagan and Christensen (2014) developed a set of PE guidelines by adapting commercial PE guidelines for translator training purposes. Hu and Cadwell (2016) compared five sets of general guidelines proposed by LSP representatives and scholars at two levels, namely light PE and full PE. Their analysis revealed substantial overlap among these guidelines and suggested that tailored PE guidelines be developed for each project beforehand. Because these guidelines are primarily designed for professional translators and LSPs, their level of detail and productivity-oriented focus may not align with the pedagogical needs and processing capacity of intermediate EFL learners. The language pair is another important consideration in composing PE guidelines: as Bowker and Ciro (2019, p. 91) explained, the ‘types of problems that will be present in machine translation output will be tied more directly to the specific language pair and the particular machine translation system that was used.’ It is therefore not feasible to offer generic guidelines for self-post-editing; PE guidelines need to be language-specific.
Regarding C-E PE guidelines, Zhu et al. (2020) proposed eight guidelines based on Newmark’s translation theory, organized into four levels with two points at each level: at the text level, add or delete information as needed; at the reference level, follow authoritative translations and contextual cues; at the coherence level, improve cohesion through word substitution and sentence restructuring; and at the naturalness level, choose words according to collocation rules and cultural context. Their study showed that these guidelines were feasible and yielded desirable output from Google Translate. However, the guidelines are designed mainly to improve MT output rather than users’ language competence. Drawing on the findings of these studies, the author reorganizes and adapts the PE guidelines for language development purposes. The resulting PE guidelines are explicitly tailored to intermediate EFL learners, prioritizing frequent error types and providing linguistic support rather than exhaustive professional criteria.
2.3. NMT for L2 Intermediate Language Learners
Previous studies generally indicate that NMT tools can be highly beneficial as learning aids, particularly in enhancing students’ PE and writing skills. However, findings on their effectiveness conflict across proficiency levels. Some studies viewed MT as more beneficial for beginning to intermediate learners (Garcia & Peña, 2011; Lee, 2020; Niño, 2009), while others found that advanced learners performed better in identifying and correcting errors in MT output than beginners or lower-intermediate learners, for whom NMT can actually be detrimental (Chung, 2020; Chung & Ahn, 2021; Niño, 2008; Qi et al., 2024; Xu, 2022). Chung (2020) reported that, when post-editing NMT texts, advanced learners were more critical and identified 73% of errors, compared with 54% for intermediate learners. Chung and Ahn (2021) also found that advanced learners benefit more from MT than lower-intermediate learners and that the two groups benefit in different areas: advanced learners benefit more in sentence reconstruction, whereas lower-intermediate learners benefit more in correcting grammar errors.
In contrast, Xu (2022) suggested that whereas advanced students tended to make word-level changes by substituting their original vocabulary with the vocabulary MT suggested, intermediate learners leveraged MT more holistically, extending revisions beyond lexical adjustments. The intermediate-level student in that study expanded her revisions from the word level to the sentence level, actively using various resources, including her linguistic knowledge, a dictionary, Google Search, and Japanese native speakers, to decide whether and how to accept or reject MT’s suggestions. Relatedly, Qi et al. (2024) found that trainee translators who were advanced language learners outperformed the intermediate group in MT error detection and correction in the lexicon category. Moreover, if students stop using the NMT tool when learning a foreign language, its short-term effect vanishes, and they are left with only the language competencies they had acquired earlier in their foreign language learning (Klimova et al., 2023). These studies therefore present a rather mixed picture of how intermediate learners engage with NMT. Some research suggests that limited linguistic proficiency may constrain their ability to critically evaluate MT output, whereas other studies indicate that intermediate learners can still benefit from MT through grammar correction or broader revision processes. Overall, although intermediate learners typically possess sufficient linguistic resources to work with MT output, they often lack the strategic competence needed for systematic error detection and higher-level revision in MTPE tasks.
Taken together, research on the application of MT in language education continues to grow, but most studies still prioritize the quality of final translations. Although researchers have investigated how students of varying proficiency levels benefit from MT, findings remain contradictory, particularly regarding the extent to which intermediate learners can use NMT productively. O’Neill (2019) emphasized that teachers should guide students to use online tools responsibly and effectively, suggesting that students’ L2 proficiency is an important prerequisite for the effective use of NMT. MT tools should not be used indiscriminately without adequate language competence, an understanding of MT’s benefits and limitations, and clear guidelines. Moreover, existing research has rarely examined learners’ affective responses in MTPE tasks or how strategically designed PE guidelines might help intermediate EFL learners engage critically with NMT output and develop their translation competence. Against this background, the present study examines the translation performance of intermediate Chinese EFL learners with and without MT assistance, exploring how tailored PE guidelines may support their language proficiency, especially at the syntactic level, and shape their attitudes toward MT and PE. This study addresses the following research questions:
Research question 1: What are the differences in text complexity between L2 learner translations and post-edited MT texts?
Research question 2: How do PE strategies influence learners’ accuracy and effectiveness in identifying and correcting MT errors?
Research question 3: How do learners perceive the role of PE tasks in developing their linguistic and translation competencies?
3. Methods
3.1. Participants
Forty-two adult native speakers of Chinese (55% female, aged 19–21 years) participated in the study. All participants were second-year non-English majors enrolled in an elective English translation course. Although no separate standardized proficiency test was administered for this study, participants’ English proficiency can be operationalized as intermediate, corresponding to the B1 band of the Common European Framework of Reference for Languages (CEFR). This classification was based on institutional placement criteria and empirical benchmarks. Specifically, as second-year university students, the participants were enrolled in a curriculum aligned with Levels 4–5 of China’s Standards of English Language Ability (CSE), which recent Rasch-based mapping studies have shown to correspond broadly to CEFR B1 (Peng & Liu, 2021, 2024).
The study was conducted mid-semester, after students had received instruction on basic translation principles, strengths and limitations of MT, and key linguistic contrasts between Chinese and English, such as analytic versus synthetic language features, hypotaxis and parataxis, and voice distinctions. Prior to the empirical study, students applied these strategies in a pilot PE task following the same procedure; although no formal data were collected, informal feedback gathered through group discussions indicated that students found the strategies useful for understanding linguistic differences. This preliminary application served as an initial validation of the pedagogical framework and PE guidelines, offering practical insights for their refinement.
3.2. Material and Procedure
Participants were given a Chinese tourist text of 279 characters to translate into English. The text was selected for its moderate complexity, appropriate for learners at the B1 proficiency level. The translation task had two stages. First, participants completed a human translation (HT) task within 10 minutes; they were informed that they were not required to translate the entire text within this time limit. Immediately afterwards, they post-edited the machine-translated output generated by DeepL, also within 10 minutes. The 10-minute limit for both tasks reflects the time constraints typical of classroom-based translation practice.
During the human translation stage, no dictionaries or external resources were allowed. DeepL was chosen as the MT tool due to both its robust performance in C-E translation and its tendency to produce a certain number of errors, providing meaningful practice opportunities for PE tasks (Cai, 2023; Gao et al., 2024; F. Q. Li, 2021; Yu, 2024).
Both the human translations and post-edited versions were evaluated for lexical, syntactic, and overall text complexity using Eng-Editor (Jin & Lu, 2018; Jin et al., 2020), an online text assessment system built on a corpus of about 7,000 samples across Chinese EFL proficiency levels. The system draws on curriculum-based vocabulary lists and integrates the L2 Syntactic Complexity Analyzer (L2SCA) for syntactic analysis. In Eng-Editor, each complexity index is reported as a level-based difficulty score in which the integer denotes CSE proficiency level and the decimal marks the text’s percentile ranking within that level in the benchmark corpus. These indices are calibrated to CSE Levels 3–7, corresponding respectively to junior secondary English, the national college entrance examination, CET-4, CET-6, and postgraduate entrance requirements. L2SCA indices demonstrate high reliability (Cronbach’s α > 0.80) and strong validity against human L2 ratings (Lu, 2010), and prior studies have confirmed their suitability for both learner writing and translated texts (Liu & Afzaal, 2021).
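To make the score format concrete, the following Python sketch decodes a level-based score into its CSE level and within-level percentile. It goes beyond what the text states in one respect, which is an assumption: the decimal part is read as a two-digit percentile (so 5.08 would mean Level 5, 8th percentile). The names `decode_eng_editor_score` and `CSE_LEVEL_LABELS` are hypothetical; Eng-Editor is a web system, and this is not its API.

```python
# Level labels as described in the text (CSE Levels 3-7).
CSE_LEVEL_LABELS = {
    3: "junior secondary English",
    4: "national college entrance examination",
    5: "CET-4",
    6: "CET-6",
    7: "postgraduate entrance requirements",
}


def decode_eng_editor_score(score: float) -> tuple[int, int, str]:
    """Split a score such as 5.08 into (CSE level, within-level percentile, label).

    Assumption: the decimal part is a two-digit percentile (5.08 -> 8th percentile).
    """
    level = int(score)
    if level not in CSE_LEVEL_LABELS:
        raise ValueError(f"{score} lies outside the calibrated CSE Levels 3-7")
    # round() absorbs binary floating-point artefacts in the subtraction.
    percentile = round((score - level) * 100)
    return level, percentile, CSE_LEVEL_LABELS[level]


print(decode_eng_editor_score(5.08))  # a Level 5 score
print(decode_eng_editor_score(3.98))  # a Level 3 score near the top of its band
```

Under this reading, a score such as 3.98 sits at the upper end of Level 3 rather than at Level 4, which is why small decimal differences near a level boundary can matter when interpreting the results.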
After completing the PE task, students were asked to write reflections on their experience using MT for C-E translation. They were encouraged to comment on their satisfaction with the MT output, the usefulness of the PE strategies, and their perceptions of PE tasks. These reflections were analyzed in a Python environment using jieba for Chinese text segmentation, SnowNLP for baseline sentiment polarity scoring, and scikit-learn for TF–IDF (term frequency – inverse document frequency) keyword extraction. To improve the model’s sensitivity to translation-related evaluative language, a hybrid sentiment analysis approach was implemented in which SnowNLP’s polarity scores were supplemented with a small, domain-informed pedagogical lexicon. Instead of relying solely on generic polarity classification, the analysis incorporated simple contextual adjustments to account for contrastive expressions and negation, enabling a more accurate interpretation of evaluative shifts in learner reflections and yielding a nuanced understanding of their attitudes and perceptions.
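One way such contextual adjustments could work is sketched below in Python. The lexicon entries, weights, contrast and negation rules, and the 0.4/0.6 label thresholds are all hypothetical illustrations, not the study's actual settings. To keep the sketch dependency-free, the baseline score and the token list are passed in; in a pipeline like the one described, they would come from SnowNLP (`SnowNLP(text).sentiments`, a probability of positive sentiment between 0 and 1) and jieba (`jieba.lcut(text)`).

```python
# Hypothetical domain lexicon: token -> polarity weight (illustrative values only).
PEDAGOGICAL_LEXICON = {
    "有用": 0.3,   # useful
    "帮助": 0.2,   # help
    "自然": 0.2,   # natural
    "困难": -0.3,  # difficult
    "生硬": -0.3,  # stiff, unidiomatic
    "错误": -0.2,  # error
}
NEGATORS = {"不", "没", "没有"}               # not / did not
CONTRASTIVES = {"但是", "但", "不过", "然而"}  # but / however


def adjusted_sentiment(baseline: float, tokens: list[str], scale: float = 0.5) -> float:
    """Shift a SnowNLP-style polarity score (0 = negative, 1 = positive)."""
    # Index of the last contrastive marker, or -1 if there is none.
    pivot = max((i for i, t in enumerate(tokens) if t in CONTRASTIVES), default=-1)
    shift = 0.0
    for i, token in enumerate(tokens):
        weight = PEDAGOGICAL_LEXICON.get(token)
        if weight is None:
            continue
        # Negation: flip the polarity if the preceding token is a negator.
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        # Contrast: the clause after "but" usually carries the writer's stance,
        # so down-weight evaluations before the marker and up-weight those after it.
        if pivot >= 0:
            weight *= 1.5 if i > pivot else 0.5
        shift += weight
    # Keep the adjusted score within SnowNLP's [0, 1] range.
    return min(1.0, max(0.0, baseline + scale * shift))


def polarity_label(score: float, low: float = 0.4, high: float = 0.6) -> str:
    """Map a score to a coarse label using hypothetical cut-offs."""
    if score < low:
        return "negative"
    if score > high:
        return "positive"
    return "neutral"


# "MT is useful, but the sentences are stiff", pre-segmented for illustration.
tokens = ["机器翻译", "有用", "，", "但是", "句子", "生硬"]
score = adjusted_sentiment(0.7, tokens)
print(round(score, 2), polarity_label(score))
```

In this toy example, the contrastive marker pulls a positive baseline of 0.7 down to about 0.55, so the reflection is labelled neutral: the kind of evaluative shift that a generic polarity score alone would miss.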
Together, these procedures form an integrated methodological framework for examining learners’ linguistic and translation performance as well as their evaluative responses. The methodology reflects a smart pedagogy design, integrating NMT with structured PE strategies to promote translation competence by leveraging accessible digital tools and scaffolded practice.
3.3. PE Strategies
The PE strategies designed to support linguistic and translation development draw on two main sources. The first is research on common MT errors, which informs the strategic sub-competence underlying PE competence (Konttinen et al., 2020) and directs learners’ attention to specific linguistic features in MT output where inaccuracies are likely. The second is research on linguistic differences between Chinese and English, which guides learners to select sentence structures that better suit the target language context rather than settling for translations that are grammatically correct but unnatural. Accordingly, the guidelines should, on the one hand, help students locate potential problems in the MT output. On the other hand, because Chinese college students’ translation performance largely depends on bilingual sub-competences such as constructing well-formed sentences and composing coherent texts (Ma, 2013), the guidelines integrate syntactic considerations based on well-established C-E contrasts. In this way, they enhance bilingual competence by encouraging learners to select translations that are not only grammatically correct but also collocationally appropriate, well-structured, and coherent.
Based on the above analysis, the study proposes seven guidelines addressing sentence construction, grammar, and textual coherence, which are common sources of difficulty in C-E translation, as well as language errors that MT frequently produces for this language pair.
Check collocations for Chinese four-character phrases: MT outputs often replicate Chinese sentence structures, resulting in expressions that are grammatically correct but awkward or unidiomatic in English, especially with four-character idiomatic phrases, which rely on specific word combinations or rhythmical patterns that rarely have direct equivalents in English.
Verify tenses for inflection accuracy: As Chinese lacks inflection, tense errors are common in learners’ translations. Although NMT has improved in this area, careful checking of tenses and agreement remains necessary (TAUS, 2016).
Check repetitive words: Repetition in Chinese may serve rhetorical or clarifying functions, but should be reduced or replaced in English translations for naturalness.
Refine figurative language: Figurative expressions are challenging for both MT and human translators; post-editors must ensure accurate conveyance of connotative meanings.
Adjust flat sentence construction: Chinese tends to use sequential, simple sentences, whereas English favors complex sentences with extensive modifiers. PE should adapt flat, top-heavy structures to natural English style.
Replace verbs with nouns properly: English often prefers nominalizations where Chinese uses verbs; appropriate substitutions improve translation quality.
Properly use passive voice: Passive constructions are more common and neutral in English, but can carry negative connotations in Chinese; post-editors should apply passive voice appropriately.
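Two of these checks, repeated words and passive voice, lend themselves to simple automatic prompts. The Python sketch below is a hypothetical classroom aid, not part of the study's materials: it lists content words that recur in an English MT output and flags "be + -ed/-en" sequences as passive-voice candidates. Both heuristics are deliberately crude (the stopword list is an assumption, and words such as "open" will be flagged falsely), so the output can only prompt the learner's own judgement, consistent with the guidelines' role as reminders rather than rules.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; extend as needed).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "with",
             "as", "by", "at", "it", "is", "are", "was", "were", "this", "that"}
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def repeated_words(text: str, min_count: int = 2) -> list[str]:
    """Content words occurring at least min_count times ('Check repetitive words')."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return sorted(w for w, c in counts.items() if c >= min_count)


def passive_candidates(text: str) -> list[str]:
    """'be' + a word ending in -ed/-en as possible passives ('Properly use passive voice')."""
    words = re.findall(r"[A-Za-z']+", text)
    return [f"{prev} {word}" for prev, word in zip(words, words[1:])
            if prev.lower() in BE_FORMS and word.lower().endswith(("ed", "en"))]


mt = ("Five mountains do not see the mountain, "
      "the return of the Yellow Mountain does not see the mountain")
print(repeated_words(mt))
print(passive_candidates("Huangshan is located in Anhui Province."))
```

Whether a flagged repetition should be reduced, or a flagged passive kept, remains the learner's decision; the helper only narrows where to look.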
Unlike efficiency-focused guidelines such as those from TAUS, these strategies emphasize syntactic choices. They are specifically tailored to intermediate learners, serving as reminders of alternative linguistic options and supporting informed decision-making during PE. These PE strategies formed the instructional basis for the PE tasks in this study. Their effectiveness and areas for improvement are discussed in Section 5.
4. Data Analysis and Results
This study intends to provide a detailed description and grounded interpretation of intermediate Chinese learners’ MTPE performance by analyzing three types of data: the text complexity of participants’ human translations and PE texts, the revisions participants made in PE tasks, and their reflections on using MT and the PE guidelines.
4.1. Research Question 1: Text Complexity in HT and PE
To evaluate text complexity, Eng-Editor was employed to analyze participants’ human translations and the corresponding PE texts of the same source material. Eng-Editor offers robust measures of text complexity, with an intuitive interface and clear numerical outputs that make it accessible and easy to use. To ensure the statistical stability and representativeness of the linguistic metrics, a strict screening criterion was applied: only translations of at least 100 tokens were included in the complexity analysis, as linguistic indices are highly sensitive to text length (Hwang & Polio, 2023). Consequently, the 15 participants who met this threshold in both the HT and PE tasks were retained for the final comparative analysis. Although this approach reduces the sample size, it prioritizes data integrity and avoids the skewed results associated with very short texts. After the text complexity indices were obtained from Eng-Editor, all statistical analyses were conducted in IBM SPSS (Version 31.0). Normality tests indicated that some variables deviated significantly from normality (p < .05); therefore, depending on the distribution, either a paired-samples t-test (for normally distributed data) or a Wilcoxon signed-rank test (for non-normally distributed data) was used. Statistical significance was set at p < .05, and the results are presented in Table 1.
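The study ran these tests in SPSS. As a hedged illustration of the selection logic only, the stdlib-only Python sketch below applies the 100-token filter and then computes either a paired-samples t statistic or a Wilcoxon signed-rank z. Several points are assumptions: the normality decision is passed in as a flag (it could come, for example, from `scipy.stats.shapiro`); the z uses the normal approximation without a tie correction, so it may differ slightly from SPSS output; differences are taken as HT minus PE, which yields negative statistics when PE scores are higher, consistent with the negative t and z values reported in Table 1; and all function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

MIN_TOKENS = 100  # length threshold described in the text


def eligible(ht_tokens: int, pe_tokens: int) -> bool:
    """Keep a participant only if both translations reach the threshold."""
    return ht_tokens >= MIN_TOKENS and pe_tokens >= MIN_TOKENS


def paired_t(ht: list[float], pe: list[float]) -> float:
    """Paired-samples t statistic on the differences HT minus PE."""
    d = [a - b for a, b in zip(ht, pe)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def wilcoxon_z(ht: list[float], pe: list[float]) -> float:
    """Wilcoxon signed-rank z (normal approximation, no tie correction)."""
    d = [a - b for a, b in zip(ht, pe) if a != b]  # zero differences are dropped
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):  # tied |d| values share their average rank
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (min(w_plus, w_minus) - mu) / sigma


def compare(ht: list[float], pe: list[float], normal: bool) -> tuple[str, float]:
    """Paired t-test for normal data, otherwise Wilcoxon signed-rank."""
    return ("t", paired_t(ht, pe)) if normal else ("z", wilcoxon_z(ht, pe))


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


ht = [1.0, 2.0, 3.0, 4.0]  # toy values, not study data
pe = [2.0, 4.0, 5.0, 7.0]
print(compare(ht, pe, normal=True))
print(compare(ht, pe, normal=False))
```

A p-value for the t statistic would additionally require the t distribution, which the standard library lacks; `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` provide complete tests if SciPy is available.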
Text Complexity in Different Translations.
Notes. HT = human translation; MT = machine translation; PE = post-editing. t-values represent paired-samples t-tests (for normal data); z-values represent Wilcoxon signed-rank tests (for non-parametric data). Normality was verified via the Shapiro–Wilk test in IBM SPSS 31.0.3. ***p < .001, *p < .05, ns = not significant.
As presented in Table 1, the comparative analysis shows significant differences between HT and PE across several linguistic measures. Marked gains were observed in lexical and textual elaboration: lexical complexity increased from 3.97 (SD = 0.26) to 5.08 (SD = 0.14), and text complexity rose from 3.99 (SD = 0.14) to 5.01 (SD = 0.07), with both improvements reaching high significance (Wilcoxon p < .001). Interpreted within Eng-Editor’s CSE-aligned scale, these values indicate that learners’ lexical and text-level performance shifted from the Level 4 range (Gaokao level) toward Level 5 (CET-4), which corresponds to their expected proficiency band. Correspondingly, the number of complex noun phrases rose from 13.53 (SD = 2.71) to 21.00 (SD = 2.03), which was also highly significant (t = –8.96, p < .001). The frequency of non-finite verb phrases increased from 1.60 (SD = 1.40) to 2.53 (SD = 1.13), representing a statistically significant improvement (z = –2.12, p = .034).
Other syntactic measures showed relatively small numerical differences that did not reach statistical significance. While syntactic complexity showed a slight numerical rise from 3.92 (SD = 0.14) to 3.98 (SD = 0.09), a paired t-test indicated that this change was not statistically significant (t = –1.66, p = .119). On the CSE scale, these scores fall within the Level 3 to Level 4 range. Similarly, no significant difference was detected in the frequency of finite subordinate clauses (HT: 4.20, SD = 2.18; PE: 4.33, SD = 1.11; z = –0.14, p = .888). Overall, standard deviation values indicate moderate inter-learner variation across measures.
To provide a benchmark for these findings, Table 1 also displays the metrics for the MT output and the reference translations. Notably, while the PE lexical complexity (5.08) closely approached both the MT (5.05) and the professional reference (5.22), the PE syntactic complexity (3.98) remained aligned with the MT output (3.99) but was substantially lower than the reference level (4.31).
4.2. Research Question 2: Accuracy and Effectiveness of PE Strategies
Regarding revisions, all 151 edits made by the 42 participants during the PE tasks were collected and categorized into three groups: correct, incorrect, and unnecessary revisions. A separate analysis was conducted for the 15 participants whose translations were included in the linguistic complexity analysis. To ensure reliability, the author coded the revisions three times over two days. Between the first and second rounds, 13 discrepancies were noted, yielding an intra-rater agreement of 91.4% (138 of 151 revisions coded identically). The third round matched the second exactly.
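The reliability figure above is a simple percent agreement: items coded identically in two rounds divided by all items. A minimal sketch follows; the function name is hypothetical, and the label lists are synthetic stand-ins constructed only to reproduce the reported counts (151 revisions, 13 discrepancies), not the study’s actual codes. This index is not chance-corrected.

```python
def percent_agreement(round_a, round_b):
    """Percentage of items assigned the same category in two coding rounds."""
    if len(round_a) != len(round_b) or not round_a:
        raise ValueError("both rounds must code the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(round_a, round_b))
    return 100 * matches / len(round_a)


# Synthetic labels reproducing the reported counts: 151 items, 13 disagreements.
first_round = ["correct"] * 151
second_round = ["correct"] * 138 + ["incorrect"] * 13
print(round(percent_agreement(first_round, second_round), 1))  # prints 91.4
```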
Across the full cohort, the 151 revisions comprised 59 incorrect, 32 unnecessary, and 60 correct revisions, resulting in a correct revision rate of approximately 39.7%. The subset of 15 participants, selected for fine-grained linguistic analysis, produced 66 revisions (26 incorrect, 13 unnecessary, and 27 correct revisions), yielding a slightly higher correct revision rate of 40.9%. The distribution of revision categories is summarized in Table 2.
Revision Categories.
As shown in Table 2, the proportional distribution of revision types was similar between the two groups: incorrect revisions accounted for 39.1% and 39.4%, unnecessary revisions for 21.2% and 19.7%, and correct revisions for 39.7% and 40.9% for the full cohort and the subset, respectively. This consistency suggests that the 15-participant subset reasonably reflects the PE behavior of the full cohort.
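The percentages in Table 2 follow directly from the revision counts, and the largest between-group gap (about 1.5 percentage points, for unnecessary revisions) gives a concrete sense of how similar the two distributions are. The short sketch below recomputes both from the counts reported above; the helper names are hypothetical.

```python
# Revision counts reported in the text.
FULL_COHORT = {"incorrect": 59, "unnecessary": 32, "correct": 60}  # 151 revisions
SUBSET = {"incorrect": 26, "unnecessary": 13, "correct": 27}       # 66 revisions


def distribution(counts):
    """Percentage share of each revision category, rounded to one decimal place."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


def max_gap(dist_a, dist_b):
    """Largest percentage-point difference between two category distributions."""
    return max(abs(dist_a[c] - dist_b[c]) for c in dist_a)


full = distribution(FULL_COHORT)
sub = distribution(SUBSET)
print(full)  # {'incorrect': 39.1, 'unnecessary': 21.2, 'correct': 39.7}
print(sub)   # {'incorrect': 39.4, 'unnecessary': 19.7, 'correct': 40.9}
print(round(max_gap(full, sub), 1))  # 1.5
```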
Table 3 illustrates examples of revision coding. In the first example, one student revised the MT output ‘Five mountains do not see the mountain, the return of the Yellow Mountain does not see the mountain’ (五岳归来不看山,黄山归来不看岳) to ‘The return of five mountains do not see the mountain, the return of the Yellow Mountain does not see the mountain.’ The revision remained awkward and failed to convey the meaning of the original Chinese saying: after visiting the Five Sacred Mountains (Wuyue), one no longer wishes to see other mountains, and after returning from Mount Huangshan, one no longer wishes to see even the Five Sacred Mountains. It was therefore classified as an incorrect revision. The second example shows an unnecessary revision, in which the student altered an already accurate MT phrase to ‘There is a saying in China.’ The third example shows a correct revision, in which the nonsensical MT phrase ‘visit the victory’ was appropriately corrected to ‘appreciate the scenery’, conveying the intended meaning and improving readability.
Examples of Revision Categories.
Note. Boldface indicates the machine-translated segments targeted for revision and the corresponding changes made by participants during post-editing, illustrating examples of incorrect, unnecessary, and correct revisions.
4.3. Research Question 3: Learners’ Attitudes toward MT and PE
For learners’ reflections, 17 texts were collected from the 42 participants (a 40.5% response rate). The low response rate is attributed to the voluntary nature of the task and to the scheduling of the reflection session after a demanding translation task. While this limited sample size necessitates caution in generalizing the findings, the reflections add valuable qualitative depth to the quantitative data. Table 4 presents the descriptive statistics of the sentiment scores across all texts, showing a distribution leaning toward neutrality and critical introspection regarding MTPE tasks. The majority of the reflections (58.8%, n = 10) were classified as neutral (scores above 0.40 and up to 0.60), while 35.3% (n = 6) exhibited negative sentiment (scores ⩽ 0.40). Only one reflection (5.9%) showed distinctly positive sentiment (0.653). The mean score of 0.428 reflects an overall cautious and evaluative stance toward MTPE tasks, and the wide range between the maximum (0.653) and minimum (0.253) scores underscores the heterogeneity of learner experiences.
Quantitative Distribution and Descriptive Statistics of Sentiment Polarity.
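The banding and descriptive statistics described above can be sketched as follows. In SnowNLP, `SnowNLP(text).sentiments` returns a score between 0 and 1 (the estimated probability that the text is positive); the “contextual adjustments” used in this study are not specified here, so the sketch assumes already-adjusted scores as input. The negative cut-off (⩽ 0.40) is taken from the text, whereas treating neutral as above 0.40 and up to 0.60, and positive as above 0.60, is an assumption. The example scores are hypothetical, not the study’s data.

```python
import statistics

NEG_MAX = 0.40      # negative: score <= 0.40 (as stated in the text)
NEUTRAL_MAX = 0.60  # assumption: neutral is (0.40, 0.60], positive is > 0.60


def band(score):
    """Map a sentiment score in [0, 1] to a polarity band."""
    if score <= NEG_MAX:
        return "negative"
    if score <= NEUTRAL_MAX:
        return "neutral"
    return "positive"


def summarize(scores):
    """Band counts, band percentages, and descriptive statistics for a set of scores."""
    labels = [band(s) for s in scores]
    n = len(scores)
    counts = {b: labels.count(b) for b in ("negative", "neutral", "positive")}
    return {
        "counts": counts,
        "percent": {b: round(100 * c / n, 1) for b, c in counts.items()},
        "mean": round(statistics.mean(scores), 3),
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical adjusted scores (not the study's data).
example = [0.253, 0.35, 0.45, 0.55, 0.653]
print(summarize(example))
```

Applied to the study’s band counts (6 negative, 10 neutral, 1 positive out of 17), the same percentage formula gives the 35.3%, 58.8%, and 5.9% reported above.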
Table 5 summarizes the top 10 keywords that characterize the thematic focus of students’ reflections. The highest weights were assigned to general task-related terms, including ‘Translation’ (1.745) and ‘Machine Translation’ (1.474). Beyond these primary task descriptors, the list contains several evaluative and technical terms. Specifically, ‘High Difficulty’ and ‘Issues/Problems’ exhibited weights of 0.853 and 0.691, respectively. Furthermore, challenges related to semantic and syntactic processing were represented by ‘Connotation’ (0.737), ‘Vocabulary’ (0.596), and ‘Word Order’ (0.530). These values provide a quantitative summary of the most prominent topics addressed in the student corpus. When interpreted in conjunction with the sentiment analysis results, these keywords further suggest that learners’ reflections were characterized by a critical and problem-oriented stance toward MTPE tasks.
TF–IDF (Term Frequency – Inverse Document Frequency) Keyword Extraction of Students’ Reflections: Top 10.
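To make the keyword weighting in Table 5 more concrete, the sketch below implements one common TF–IDF formulation: relative term frequency within each reflection multiplied by a smoothed inverse document frequency, ln((1 + N) / (1 + df)) + 1, with scores summed across reflections. This is an assumption: the exact weighting scheme and the Chinese word segmentation used in the study are not described here (a segmenter such as jieba, whose `jieba.analyse.extract_tags(..., withWeight=True)` returns TF–IDF-weighted keywords, is one possibility), so the sketch takes pre-segmented tokens and its absolute weights need not match Table 5. The function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=10):
    """Rank terms by TF-IDF summed over pre-tokenized documents."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    scores = Counter()
    for doc in docs:
        tf = Counter(doc)
        length = len(doc)
        for term, count in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            scores[term] += (count / length) * idf
    return scores.most_common(top_k)


# Toy pre-segmented reflections (hypothetical).
docs = [
    ["translation", "difficulty", "word order"],
    ["translation", "vocabulary"],
    ["translation", "difficulty"],
]
for term, weight in tfidf_keywords(docs):
    print(f"{term}: {weight:.3f}")
```

Because the scores are summed rather than normalized, a term that appears in most reflections (here “translation”) can still rank first despite a low IDF, which is consistent with general task terms topping a keyword list.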
5. Discussion and Implications
5.1. Text Complexity Gains and Syntactic Development-Oriented PE Guidelines for Intermediate Learners
In relation to research question 1, the quantitative analysis reveals that PE texts exhibit higher lexical and text complexity than human translations, in line with Wang et al.’s (2021) observation that PE tasks are associated with increased lexical elaboration and perceived text quality. This suggests that learners actively generate more sophisticated vocabulary by adapting MT outputs. A possible explanation for this tendency can be found in Belam (2003), who noted that PE tasks allowed students to focus on new vocabulary, grammatical points, and style (language learning), as well as on translation skills, text analysis skills, and the importance of register in different communicative situations. From a grammatical standpoint, previous studies have shown that MT output generally meets, and in some cases exceeds, the minimum language requirements of English-speaking universities (Mundt & Groves, 2016), making it a useful source of reference and meaningful input for trainee translators in the process of language learning (Qi et al., 2024). For intermediate EFL learners, therefore, MT-supported PE may facilitate vocabulary expansion and lexical variation, provided that learners are guided to critically evaluate and adapt MT suggestions rather than reproduce them mechanically.
While students successfully leveraged MT to compensate for lexical deficits, they largely adhered to MT’s syntactic structures. The frequency of finite subordinate clauses is only slightly higher in PE texts (4.33) than in human translations (4.20) and machine translation (4.00). The lack of significant growth in this area (p = .888) suggests that intermediate learners continue to rely on clause-based strategies to express meaning rather than developing more advanced phrasal constructions, such as complex noun phrases and non-finite verb phrases, which require higher syntactic proficiency to master. In addition, syntactic complexity in PE texts (3.98) rose only slightly and non-significantly relative to human translations (3.92) and is nearly identical to that of MT texts (3.99). This lack of significant improvement may reflect both task-related constraints and intermediate learners’ still-limited syntactic repertoire. Under these conditions, learners tended to prioritize lexical adjustments over deeper syntactic restructuring of MT output. This pattern further indicates reliance on MT-provided sentence structures and reflects a tendency toward syntactic simplification, consistent with the simplification hypothesis in translation (Kwok et al., 2025; Liu & Afzaal, 2021). These syntactic profiles fall short of the level of syntactic generation generally expected of university-level EFL learners. Overall, while learners actively engage in lexical and textual generation through MT input, their syntactic development remains limited at the sentence and phrase levels. Intermediate learners therefore appear to require explicit, form-focused support to move beyond clause-based structures and to experiment with more complex syntactic patterns in their translations.
In relation to research question 2, the revision data also support the above analysis. Most revisions focused on vocabulary-level issues, many of which were unnecessary or incorrect. For example, one student incorrectly revised ‘collectively’ to ‘parallely’ in ‘The five most famous mountains in China are collectively called . . .’ (中国最有名的五座山合称 . . .), and some students unnecessarily added ‘many’ before ‘tourists’ in ‘tourists came here to explore . . .’ (游客到此探幽访胜 . . .). The relatively high rates of incorrect and unnecessary revisions are consistent with the observation that NMT systems can exhibit more advanced language proficiency than intermediate learners (Yamada, 2020). The observed 39.7% correct revision rate reveals a performance-competence gap: MTPE outputs exhibit higher overall text complexity even though learners had documented difficulty correcting specific MT errors. Moreover, correct revisions were most frequently associated with tense correction, whereas errors involving syntactic issues were seldom tackled, suggesting that the task’s pedagogical value lies in its diagnostic power to expose syntactic blind spots rather than in its ease of completion.
Although such unsuccessful revisions may contribute to negative perceptions of PE tasks, some correct revisions were associated with positive learning experiences. Some students commented that ‘The task is much easier for me after learning the differences on sentence structures between Chinese and English.’ For example, the Chinese saying ‘五岳归来不看山,黄山归来不看岳’ exemplifies a paratactic construction that requires hypotactic restructuring in English, and several students revised it effectively: ‘As long as you have seen the Five Sacred Mountains, you will not be interested in other mountains. As long as you have seen Mount Huangshan, you will not be interested in the Five Sacred Mountains.’
The above instances illustrate the potential of PE guidelines to facilitate learners’ linguistic and translation development: intermediate learners are at a level most sensitive to grammatical errors (Lee, 2020) but need guidance to make corrections actively. However, the scarcity of syntactic revisions and the limited syntactic complexity of PE texts suggest that PE tasks should place more emphasis on sentence-level structures. Collectively, these patterns indicate that intermediate learners prioritize lexical adjustments over deeper structural revisions, revealing a gap between lexical gains and syntactic development. To address this imbalance, PE training should explicitly draw learners’ attention to syntactic contrasts between Chinese and English. Building on these empirical findings, the present study proposes tentative refinements for future research and pedagogical practice, designed to foreground these contrasts and strengthen learners’ sentence-level competence in C-E translation. The revised guidelines are as follows:
Adopt authoritative translations for terminology and culture-loaded expressions.
Use synonyms to enhance cohesion and avoid repetition.
Convert verbs to nouns where appropriate.
Employ passive voice when actions predominate.
Use non-finite verbs to avoid flat sentence structure.
Transform paratactic Chinese into hypotactic English.
In the revised set, the original seven guidelines have been streamlined into six, facilitating easier memorization and more effective application. The first guideline addresses lexical issues commonly arising from MT outputs, while the remaining five target cohesion and sentence-level issues identified in the present analysis. These refinements aim to shift intermediate learners’ focus from predominantly lexical adjustments toward more balanced lexico-syntactic development in future MTPE implementations. Moreover, as suggested by Yamada (2023), these guidelines can also be used as prompts for LLMs to facilitate targeted translation training.
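As a rough illustration of how the six revised guidelines might be turned into an LLM prompt, the sketch below assembles a plain-text prompt from a source sentence and its MT draft. It only builds the string and calls no model or API; the function name, the prompt wording, and the instruction to flag problems rather than rewrite the text (so that the final revision remains the learner’s own selection) are assumptions, not a procedure reported in this study.

```python
REVISED_GUIDELINES = [
    "Adopt authoritative translations for terminology and culture-loaded expressions.",
    "Use synonyms to enhance cohesion and avoid repetition.",
    "Convert verbs to nouns where appropriate.",
    "Employ passive voice when actions predominate.",
    "Use non-finite verbs to avoid flat sentence structure.",
    "Transform paratactic Chinese into hypotactic English.",
]


def build_pe_prompt(source_zh, mt_output, guidelines=REVISED_GUIDELINES):
    """Assemble a prompt asking an LLM to check an MT draft against the PE guidelines."""
    numbered = "\n".join(f"{i}. {g}" for i, g in enumerate(guidelines, start=1))
    return (
        "You are assisting an intermediate EFL learner with Chinese-English post-editing.\n"
        f"Guidelines:\n{numbered}\n\n"
        f"Source (Chinese): {source_zh}\n"
        f"Machine translation: {mt_output}\n\n"
        # Assumption: feedback flags problems instead of supplying a finished translation.
        "For each guideline, point out any segment of the machine translation that "
        "violates it and explain why. Do not rewrite the whole text; the learner "
        "will make the final revision."
    )


print(build_pe_prompt(
    "五岳归来不看山,黄山归来不看岳",
    "Five mountains do not see the mountain, the return of the Yellow Mountain "
    "does not see the mountain",
))
```

Asking for diagnosis rather than a corrected translation is one way to keep the LLM in a feedback role, so that learners still practice evaluating and selecting revisions themselves.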
5.2. Learners’ Attitudes and Translation Competence as Generation and Selection Abilities
In relation to research question 3, despite the higher text complexity achieved through PE, learners’ perceptions of MTPE tasks were primarily neutral. While a minority of learners found the MTPE tasks beneficial, the majority reported challenges or frustrations in completing the translation tasks. Combining the TF–IDF keywords with the sentiment distribution reveals that learners focused on the agency of the translator (Rank 3) within the PE workflow (Rank 4), coupled with persistent sensitivity to linguistic constraints (Ranks 9 and 10). Rather than expressing generalized dissatisfaction, learners tended to anchor their critical evaluations in concrete challenges related to cultural connotation and linguistic adequacy. This suggests that their reservations toward MTPE stem from task-specific problem awareness rather than resistance to technology itself. A closer examination of the keywords further clarifies the sources of learners’ cautious attitudes. Terms such as ‘connotation’, ‘vocabulary’, and particularly ‘word order’ suggest that learners became more aware of linguistic differences between the source and target languages during the post-editing process. The prominence of ‘word order’ indicates that learners paid particular attention to structural differences between languages, which often required syntactic restructuring beyond simple correction. Addressing these issues required learners not only to identify potential problems in MT output but also to decide the extent to which the translation should be revised. For intermediate learners, whose ability to critically evaluate and revise translation options is still developing, such decision-making can create additional cognitive pressure and uncertainty. The refined guidelines proposed earlier in this study may therefore help reduce some of this uncertainty by offering more structured support for MTPE tasks.
These findings echo reports that PE can be more demanding than translation without MT support (O’Brien, 2022; Yamada, 2020), pointing to a potential disconnect between measurable performance improvements and learners’ subjective experience of the task. Together, they highlight the need to rethink the training of translation competence for intermediate learners in the digital era.
Early research on translation competence primarily approached it from the perspective of bilingual ability, viewing it as ‘an innate skill’ naturally possessed by bilinguals. In recent years, however, translation competence has been widely conceptualized as a complex, multicomponent construct encompassing various interrelated skills and knowledge areas (EMT, 2022; PACTE Group, 2005). Nevertheless, Pym (2003, 2013) criticized the tendency among theorists to continually expand these models by adding new skills, particularly in response to the increasing reliance on electronic tools. He argued that such expansions are conceptually flawed, as they fail to keep pace with market demands. Instead, Pym (2003) advocated a minimalist framework that reduces translation competence to two core abilities: generation and selection. Under this framework, translation competence is the ability to generate a series of more than one viable target text (TT) for a pertinent source text (ST) and to select only one viable TT from this series, quickly and with justified confidence. The framework moves away from rigid notions of equivalence and compartmentalized competencies, instead prioritizing adaptability, problem-solving, and purpose-driven reasoning, qualities essential for effective translation in a context increasingly shaped by MT and electronic tools.
Pym’s approach aligns well with the realities of MT and PE, where the machine often assumes the role of initial text generation and the human translator’s core task shifts toward evaluating, modifying, and selecting among candidate translations. In the revision analysis, correct revisions (39.7%) were outnumbered by incorrect and unnecessary revisions combined (60.3%), indicating intermediate learners’ insufficient ability to detect subtle MT errors involving nuances, idiomatic expressions, or culturally loaded content. While some learners produced more accurate translations in their human translation tasks, they retained incorrect MT outputs during PE practice, exhibiting a tendency to trust machine-generated translations and perceive them as superior to their own. This highlights the need to enhance skills such as creating or modifying scholarly texts to facilitate translation by MT systems and refining MT outputs to improve their accuracy and readability (Bowker & Ciro, 2019; O’Brien & Ehrensberger-Dow, 2020).
This perspective provides a valuable theoretical lens for understanding how MT and PE can be harnessed to develop translation competence. Given their wide availability, MT systems can be leveraged to perform the generation role, producing varied translations with which learners can practice their selection skills. The distribution of revision types in this study indicates that the chosen texts provided ample opportunities for error correction, supporting their suitability for training intermediate learners. The texts used in this study and throughout the course are primarily general and culturally relevant, focusing on topics related to Chinese culture rather than specialized content. This approach caters to the needs and interests of non-English majors, offering an appropriate level of complexity for learners to develop translation skills and confidence without being overwhelmed by technical terminology (Kliffer, 2005). Correcting MT errors helps learners develop overt language knowledge, while accepting accurate MT results fosters covert language knowledge, both of which are crucial for improving overall language proficiency. Because general-domain texts on Chinese culture generate meaningful opportunities for error identification and correction, such PE activities not only enhance students’ translation competence but also promote their language learning.
Building on Pym’s definition of translation competence as the abilities of generation and selection, this study proposes a pedagogical design for intermediate EFL learners that integrates multiple translation tasks to develop learners’ linguistic and translation competence. Specifically, the design uses two MT outputs for learners to practice evaluative selection, while also incorporating two sentences on which learners engage in post-editing and human translation, respectively. This approach emphasizes the development of selection skills by guiding learners to choose appropriate MT tools for generation and to evaluate and select the more suitable of the two MT outputs. Such a focus is expected not only to enhance the learning experience but also to foster more positive sentiment toward translation tasks. At the same time, the inclusion of PE and human translation in the designed tasks provides intermediate learners, who often lack confidence in full-text translation, with moderate opportunities to refine their linguistic abilities without being overwhelmed. This balanced design is intended to address the critique that PE activities often lack the authentic language training necessary to fully develop both linguistic and translation competence (Kliffer, 2008).
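One way to make the structure of the proposed multi-task unit explicit is as a small data structure, sketched below. All field and method names are hypothetical, and requiring a written justification before a selection is accepted is an illustrative reading of Pym’s “justified confidence”, not a feature specified by the design itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TaskUnit:
    """One unit of the proposed multi-task design (hypothetical field names)."""
    selection_source: str           # segment for which two MT tools generate candidates
    mt_candidates: Tuple[str, str]  # generation is delegated to the two MT tools
    pe_source: str                  # sentence whose MT draft the learner post-edits
    pe_draft: str                   # the MT draft to be post-edited
    ht_source: str                  # sentence the learner translates without MT
    chosen: Optional[int] = None
    justification: str = ""
    post_edit: str = ""
    human_translation: str = ""

    def select(self, index: int, justification: str) -> str:
        """Selection step: keep one candidate and record why it was chosen."""
        if index not in (0, 1):
            raise ValueError("choose one of the two MT outputs (0 or 1)")
        if not justification.strip():
            raise ValueError("a selection must be justified")
        self.chosen = index
        self.justification = justification
        return self.mt_candidates[index]

    def is_complete(self) -> bool:
        """True once the selection, post-editing, and human translation are all done."""
        return (
            self.chosen is not None
            and bool(self.post_edit.strip())
            and bool(self.human_translation.strip())
        )
```

Keeping the three activities in one unit reflects the balance described above: selection practice on MT-generated candidates, alongside short PE and human translation tasks that keep learners producing language themselves.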
6. Conclusions and Limitations
This study explored the effects of MTPE practice on the text complexity of translations produced by intermediate Chinese learners, as well as learners’ revision behaviors and their perceptions of MT and PE strategies. The findings reveal that MT combined with PE strategies offers distinct benefits but also presents challenges that require targeted pedagogical intervention.
First, the study finds that MTPE substantially supports learners in producing texts of significantly greater lexical and textual complexity, indicating that MT can serve as a valuable resource for exposing learners to more sophisticated linguistic input. However, its impact on syntactic development remains limited, as learners made few syntactic enhancements in either PE outputs or human translations.
Second, the revision analysis reveals that PE tasks based on the selected texts provided ample opportunities for learners to engage in error detection and correction, but students struggled particularly with sentence-level revisions. This indicates that learners’ selection competence, the ability to critically evaluate MT output and choose appropriate linguistic forms, requires explicit instructional support, highlighting the need for PE guidelines that specifically target C-E syntactic differences.
Third, learners’ reflections show that while few students viewed the MTPE tasks positively, the majority experienced challenges and frustration, reflecting difficulties in both generation and selection at the intermediate proficiency level. Nevertheless, the combined keyword analysis indicates that students’ neutral and critical stance is grounded in specific, identifiable translation problems rather than vague dissatisfaction. This suggests that MTPE, while pedagogically promising, must be implemented with proficiency-appropriate task design, scaffolding, and feedback to enhance learners’ engagement and confidence.
Taken together, these findings suggest that intermediate learners require (1) targeted syntactic PE guidance to fully leverage MTPE, (2) explicit instructional support to develop selection competence through effective MTPE, and (3) proficiency-appropriate task design to address negative attitudes. The study contributes to MT‑mediated translation pedagogy by proposing a set of PE principles emphasizing syntactic refinement and critical evaluation. As MT tools continue to evolve, MTPE remains a valuable instructional practice, not only for producing translations but, more importantly, for fostering learners’ linguistic and translation competence.
Despite these insights, several limitations must be acknowledged. First, the sample sizes for the main analyses were small. The linguistic complexity analysis was restricted to the 15 learners who produced sufficient text in both tasks, and the sentiment analysis was based on 17 voluntary reflections. These constraints, though necessary for metric reliability and authentic reflection, reduce statistical power and may favor more proficient or motivated learners, limiting generalizability. The findings should thus be viewed as indicative and exploratory. Second, the within-participants design and classroom-based setting introduced potential confounds. Without a control group, causal attributions to the PE guidelines are tentative, as factors such as technology or task familiarity may also contribute. Moreover, the time-constrained classroom environment likely increased cognitive load, reflecting typical student conditions rather than professional translation processes. Finally, the use of a single text and genre further restricts the scope of the findings.
These limitations, while inherent to an exploratory study with intermediate learners, highlight opportunities for future research, such as larger-scale replications with diverse participant pools, flexible timing, multi-genre texts, and control groups to disentangle variables. Future studies may also explore integrating LLMs for personalized, real-time feedback to support intermediate learners’ transition from sentence-level revision to more autonomous translation decision-making, thereby reducing cognitive burden while scaffolding the development of both generation and selection competences.
Footnotes
Acknowledgements
The author would like to thank all the participants for their involvement in this study. The author is also grateful to the anonymous reviewers for their valuable comments.
Ethical Considerations
This study involved teaching experiments that did not require formal ethical approval. All participants were informed about the study objectives and procedures, and their participation was voluntary. No sensitive personal data were collected.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by Guangdong Provincial Education Science Project (Higher Education Special Project) (2022GXJK143), South China Agricultural University (JG2023103), Guangdong-Hong Kong-Macao Greater Bay Area University Online Open Course Alliance (WGKM2024041). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data will be made available on request. For requesting data, please write to the corresponding author.
