Abstract
Automated program repair (APR) has been studied extensively in recent years. Existing approaches mainly generate single-position patches that fail to address multilocation faults effectively. While existing multistep repair approaches can iteratively generate patches for each fault position sequentially, their data augmentation methodologies lack rationality and deviate from real-world scenarios. Furthermore, they overlook the interdependencies between faulty statements, leading to patches learned from erroneous contextual patterns. In this article, we propose MuTemAPR, an APR approach that iteratively generates multilocation patches. MuTemAPR incorporates templates with neural machine translation. Specifically, our method introduces three key innovations. First, we design a template-based data augmentation framework that transforms single-line faulty code into multilocation faulty code through 35 mutation templates. It simulates a real-world environment by establishing variable-type mapping tables for more accurate repair augmentation. Second, we propose a reinforced faulty context training method that employs progressive annotation to incrementally learn repair processes from top to bottom in multifault code. Third, we implement a semantic constraint mechanism during training that enforces syntactic and semantic rules through differential analysis between templates, input code, and generated patches. We evaluate MuTemAPR on the widely used Defects4j benchmark. Experimental results demonstrate that our approach can effectively repair multilocation faults, successfully fixing five additional bugs compared with state-of-the-art methods on Defects4j v1.2 and v2.0.
Introduction
With the exponential growth in software scale and complexity, the cost of manual program repair has become increasingly prohibitive. 1 In recent years, numerous researchers have focused on automated program repair (APR). Existing approaches can be broadly categorized into search-based and deep learning-based methods.
Template-based approaches are a type of search-based approach: they first define a set of code-matching templates and repair rules.2–14 During execution, these approaches iteratively match faulty statements against the predefined templates. When a match succeeds, they use search algorithms to extract contextual parameters from the faulty code, which are then inserted into fix patterns to generate patches. Although straightforward and efficient, such methods cannot fix faults that fall outside their predefined template patterns.
Among the existing solutions, deep learning-based methods demonstrate superior repair capabilities.15–28 Most existing neural approaches adopt the neural machine translation (NMT) mechanism, in which models receive faulty code snippets as input and directly output repair patches. By leveraging the capacity of deep learning to capture hidden patterns in large-scale datasets, these methods exhibit significant advantages over template-based approaches.26–28 Recent work has integrated deep learning with templates, resulting in Tenure, a template-based neural program repair framework. 29 This hybrid method employs two neural networks: the first determines the appropriate repair template and identifies key repair parameters, whereas the second generates statement-level patches. A patch recovery tool then converts these outputs into source code patches. By combining the strengths of both paradigms, this method achieves notable performance improvements. However, critical limitations persist. Tenure does not incorporate code syntax and semantic constraints during model training; instead, it retains the conventional beam search strategy of prior NMT-based work, generating numerous candidate patches through beam search before performing semantic filtering and compilation checks. Because the semantic constraints are applied only after beam search, a substantial number of inherently invalid patches (e.g., null-pointer checks for integer variables) are produced during the search phase. This design flaw significantly reduces both model accuracy and efficiency.
Furthermore, all the aforementioned methods perform single-position repair: they feed a faulty function into the model, generate candidate patches, and output a validated patch. This paradigm limits their capacity to address complex faults. Existing repair results show that these methods can only handle faults spanning one to three lines28,29 and perform poorly on more difficult faults. When facing complex faults, they often repair only some of the faulty statements rather than achieving a complete repair. Likewise, they cannot address multilocation faults.
To address these limitations, Iter 30 proposed an iterative methodology that integrates fault localization, patch generation, and patch validation. In each iteration, it validates the patches from the previous iteration, performs fault localization again, and generates subsequent patches. Although this mitigates the incomplete repairs inherent to single-step generation and enables multilocation repair, three limitations remain. First, its training data are constructed by mutating only a single position in a correct code snippet, ignoring authentic multilocation fault patterns. This data augmentation method severely reduces repair effectiveness on multilocation faults (see the “Data Augmentation Strategy” section). Second, Iter does not control what the model can see during training. It feeds the model faulty code obtained by mutating the same correct code block at different positions, without any masking. The model can therefore produce correct repairs by memorizing the content of this code block rather than by genuinely learning to repair (see the “Model Training Strategy” section). Third, Iter ignores the constraints of code syntax and semantic rules and instead treats compilation errors caused by violating them as ordinary faults. This does not match real-world conditions, where most faulty programs still compile and are free of syntax and semantic errors. Therefore, inconsistencies in code syntax and semantics should be mitigated during both model training and inference (see the “Rule-Guided Semantic Filter” section).
In this article, we propose MuTemAPR, a novel neural program repair approach that incorporates templates to generate multilocation patches. Our goal is to extend existing single-location repair methods to multilocation repair and thereby improve repair accuracy. To this end, we make improvements in three aspects.
First, we propose a new data augmentation method for multilocation repair. We analyze each training sample separately and extract its code information, such as variable types and the input and output parameter types of functions. We then match the code against 35 repair templates and mutate non-faulty locations into faulty ones. During mutation, we fill in template parameters according to the extracted code information so that the generated training data are, as far as possible, free of compilation errors, which aligns with real-world conditions. For each training sample, we randomly generate one to three faults to balance training effectiveness and efficiency.
Second, we propose an ordered training strategy for generating multilocation patches. To prevent the model from memorizing the different training samples obtained by mutating the same correct code instead of learning the repair rules, we sort the samples generated from the same original code in descending order of the number of faults. During training, we first input the samples with more faults and then those with fewer faults. This simulates the actual repair process and prevents the model from seeing the correct repairs at other fault locations. We also enrich the labels by marking the other faulty statements, which reduces the model’s dependence on their content.
Third, we propose a syntax and semantic rule filter applied during training. We first dynamically construct the code information of each code block during training. When the model generates a patch, we check it against this code information. If any part of the generated content violates the syntax or semantic rules, we set its probability to 0. An additional loss term measures the difference between the model’s outputs before and after filtering, which guides the model toward patches that respect code syntax and semantic rules.
We carried out comprehensive experiments on Defects4j. The results show that MuTemAPR outperforms all selected baselines, repairing 5 more faults than Tenure and 12 more than Iter. We also studied MuTemAPR under smaller beam sizes, and the experiments show that its design allows it to achieve comparatively better results within a smaller beam search range. The main contributions of this article are as follows:
We propose a new data augmentation method that expands single-location fault data into multilocation fault data.
We present a new training strategy for multilocation faults that keeps the model from over-memorizing the training data.
We introduce a rule-guided syntax and semantic filter that corrects the model’s generation during training, improving generation accuracy and reducing the number of useless candidate patches.
We compare MuTemAPR with other models on the widely used Defects4j dataset, and the experimental results demonstrate the effectiveness of our approach.
The rest of this article is organized as follows. The “Related Work” section reviews work related to our approach. The “Methodology” section presents our methodology. The “Evaluation” section describes the experimental setup, reports the evaluation results, and discusses the performance of the model. The “Conclusion” section concludes this article.
Related Work
Template-based APR methods2–14,31–34 identify fault types through manual definition or by mining code repositories, and they fix the code through pattern matching. Among them, TBar 14 consolidates 35 template types from existing methods and achieves the best performance among template-based program repair approaches.
With the widespread application of deep learning, deep learning-based repair methods have been proposed. These techniques adopt the neural machine translation architecture: the encoder first learns an intermediate representation of the faulty code, and the decoder then generates the corresponding patch. As an early example, SequenceR 21 uses long short-term memory (LSTM) 35 networks as both the encoder and the decoder. Subsequently, DLFix 22 employs a tree-based recurrent neural network with the abstract syntax tree (AST) as the code representation. CoCoNuT 23 uses two models to learn the context code and the faulty code separately, and it combines models trained with different hyperparameters to find correct patches more comprehensively.
To ensure that the generated code complies with the syntax rules of the programming language, some studies embed syntax and semantic rules into the model. Recoder 25 adopts a syntax-guided edit decoder with a provider/decider architecture to accurately predict edit operations, ensuring the syntactic correctness of the repaired program; it also handles project-specific identifiers by generating placeholders. Knod 28 uses a novel three-stage tree decoder to directly generate the AST of the patch code and thereby capture the code structure. It also integrates syntax and semantic rules into decoding during both training and inference through domain-rule distillation, improving repair effectiveness and generalization. Transfer-PR 27 pretrains 12 binary classifiers to select repair templates. Tenure 29 combines the strengths of template- and NMT-based APR methods. It constructs a large-scale dataset covering over 30 templates and 1 special template, uses an LSTM-based encoder–decoder model to learn semantic and syntactic features and to directly generate repair templates together with the necessary key information, which must then be deregularized, and mitigates the out-of-vocabulary (OOV) problem with a copy mechanism optimized through two models. More recently, large language models have also been applied to APR.36–41
Iterative repair methods have also been proposed in recent years and are regarded as practical approaches for addressing multilocation faults. Iter 30 iteratively improves partial single-location and multilocation patches, integrating fault localization, patch generation, and patch verification into an iterative loop, which is carried out during both the training and inference phases.
Methodology
Overview
In this article, we propose MuTemAPR, a new APR method. The overall structure of MuTemAPR is shown in Figure 1.

Structure of MuTemAPR.
Training Phase
The training phase of MuTemAPR consists of three main components: a data augmentation module, a sequential inputter, and a transformer neural network with semantic constraints. Given a code block with a single fault mined from open-source projects, the data augmentation module generates several variants of it, depending on the mutability of the code and the requirements of the dataset (see the section “Data Augmentation Strategy”). The sequential inputter then sorts the augmented data, applies special fault markings, and feeds the data to the model in order. During training, MuTemAPR compares the code information with the content of the generated patches to determine whether they conform to the syntax and semantic rules, thereby improving the accuracy of the model.
Inference Phase
During inference, MuTemAPR adopts an iterative repair process that treats fault localization, patch generation, and patch validation as one cycle. It repeats this cycle n times to address both incomplete patch generation and multilocation faults. During inference, MuTemAPR also applies the filter to check the semantics of the generated patches and eliminate invalid templates.
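The repair cycle described above can be sketched as follows. This is a minimal illustration, not MuTemAPR's implementation: the callables `localize`, `generate`, and `validate` are hypothetical stand-ins for the fault localizer, the neural patch generator, and the test runner, and we assume (as in Iter-style repair) that a candidate is kept as a partial patch whenever it reduces the number of failing tests. The defaults of three rounds and two candidates per round mirror the setting reported in the “Experimental setting” section.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (not MuTemAPR's actual API):
#   localize(program)       -> suspicious locations, most suspicious first
#   generate(program, loc)  -> candidate programs patched at `loc`, best first
#   validate(program)       -> number of failing tests
Localize = Callable[[str], List[int]]
Generate = Callable[[str, int], List[str]]
Validate = Callable[[str], int]


def iterative_repair(program: str, localize: Localize, generate: Generate,
                     validate: Validate, rounds: int = 3,
                     candidates_per_round: int = 2) -> Tuple[str, int]:
    """Repeat localize -> generate -> validate for up to `rounds` cycles.

    A candidate is kept as a partial patch if it lowers the number of failing
    tests; the next cycle starts from the partially patched program.
    """
    failing = validate(program)
    for _ in range(rounds):
        if failing == 0:
            break                                  # fully repaired
        best: Optional[Tuple[str, int]] = None
        for loc in localize(program):
            for cand in generate(program, loc)[:candidates_per_round]:
                remaining = validate(cand)
                if remaining < failing and (best is None or remaining < best[1]):
                    best = (cand, remaining)
            if best is not None:
                break                              # stop at the first location that improves
        if best is None:
            break                                  # no progress in this cycle
        program, failing = best
    return program, failing


toy = "aXbX"  # each "X" stands for a faulty statement
print(iterative_repair(
    toy,
    localize=lambda p: [i for i, c in enumerate(p) if c == "X"],
    generate=lambda p, i: [p[:i] + "y" + p[i + 1:], p[:i] + "z" + p[i + 1:]],
    validate=lambda p: p.count("X"),
))  # ('ayby', 0)
```

In the toy run, each cycle fixes one fault, so two of the three cycles suffice; a single-round run would stop at the partial patch `"aybX"` with one failing test left.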
In the following, we introduce: (1) Data augmentation strategy: we adapt 35 repair templates from existing work and apply them in reverse as code mutations to expand the multilocation fault dataset. (2) Model training strategy: it enables progressive learning of multilocation fault repair and prevents excessive memorization during training. (3) Rule-guided semantic filter: by building code information mappings and constraining the generated patches, it reduces the selection of templates that violate semantic rules.
Data augmentation strategy
Most training sets used in existing deep learning-based methods contain single-line or single-location faults. Even when using iterative methods, only one fault is introduced into the correct function. Consequently, the models trained in this way can only perform single-location repairs.
We perform multilocation mutation on the datasets provided by existing methods. To simulate faults more realistically and reduce compilation problems in the training code, we first collect code information and build a code information table. We parse the code block into an AST and collect its internal data, including classes, variables, methods, and parameters. Table 1 shows an example of code information collection. Based on this information, we dynamically fill in semantically appropriate data during data augmentation.
Example of code information collection
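A code information table like the one in Table 1 might be represented as follows. This is a sketch under stated assumptions: the class name `CodeInfoTable`, its fields, and its query methods are hypothetical, and the entries are filled in by hand rather than collected from a Java AST. The queries illustrate the kind of lookups the mutation templates depend on, such as finding another variable of the same type or a method with identical parameter types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Java primitive ("basic") types; anything else is treated as a reference type.
BASIC_TYPES = {"byte", "short", "int", "long", "float", "double", "char", "boolean"}


@dataclass
class MethodInfo:
    param_types: List[str]
    return_type: str


@dataclass
class CodeInfoTable:
    """Hypothetical per-block table of declared names and their types.

    A real pipeline would fill it by walking the AST of the code block.
    """
    classes: List[str] = field(default_factory=list)
    variables: Dict[str, str] = field(default_factory=dict)    # name -> type
    methods: Dict[str, MethodInfo] = field(default_factory=dict)

    def type_of(self, name: str) -> Optional[str]:
        return self.variables.get(name)

    def is_basic(self, name: str) -> bool:
        return self.type_of(name) in BASIC_TYPES

    def variables_of_type(self, type_name: str, exclude: str = "") -> List[str]:
        """Same-typed candidates, e.g. for a variable-replacement mutation."""
        return sorted(n for n, t in self.variables.items()
                      if t == type_name and n != exclude)

    def methods_with_params(self, param_types: List[str], exclude: str = "") -> List[str]:
        """Methods with identical parameter types, e.g. for swapping a call."""
        return sorted(m for m, info in self.methods.items()
                      if info.param_types == param_types and m != exclude)

    def remove_variable(self, name: str) -> None:
        """Keep the table in sync when a mutation deletes a declaration."""
        self.variables.pop(name, None)


table = CodeInfoTable(
    classes=["Parser"],
    variables={"count": "int", "limit": "int", "name": "String"},
    methods={"indexOf": MethodInfo(["String"], "int"),
             "lastIndexOf": MethodInfo(["String"], "int")},
)
print(table.variables_of_type("int", exclude="count"))  # ['limit']
```

Keeping the table mutable matters because, as described next, each mutation may change which names and types remain available for later mutations of the same block.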
We select the templates provided in TBar, which aggregates 35 common repair templates, and transform them into mutation templates by reversing their repair actions. An overview is given in Table 2, and detailed descriptions are available at https://github.com/soriyuzh/MuTemAPR Templates. Template 36 in Table 2 is taken from Tenure; it is not used for data augmentation, only for patch generation. During mutation, we parse the source code into an AST and traverse each non-leaf node from front to back, which simulates situations in which multiple fault locations are interrelated. During traversal, MuTemAPR matches the 35 templates against the AST subtrees. When a subtree conforms to a template’s rules and the information required for its mutation exists in the code information table, we mutate the subtree. Taking Template 1 as an example, MuTemAPR must find an exp variable in an if statement that satisfies the template rules, and the code information table must record exp as having a nonbasic type. The code information table is updated alongside each mutation to support subsequent mutations of the same code snippet. Moreover, when performing multilocation mutations on a code block, MuTemAPR records the node position of each faulty statement to avoid mutating the same statement repeatedly.
Overview of 36 templates
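The front-to-back traversal and the bookkeeping of mutated positions described above can be sketched as follows. For brevity, the sketch assumes a flattened view in which each statement node carries its source position and text, uses a plain name-to-type dictionary in place of the full code information table, and implements a single illustrative template (a null-check removal in the spirit of Templates 2–6) with a regular expression instead of AST matching. All names (`Stmt`, `Template`, `mutate_front_to_back`, `remove_null_check`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Table = Dict[str, str]  # simplified code information table: variable -> type
BASIC = {"byte", "short", "int", "long", "float", "double", "char", "boolean"}


@dataclass
class Stmt:
    pos: int    # position of the statement node in source order
    code: str   # statement text (Java source)


@dataclass
class Template:
    number: int
    # Returns the mutated statement, or None if the template does not match
    # or the table lacks the information the mutation needs.
    apply: Callable[[Stmt, Table], Optional[str]]
    # Optional hook that keeps the table consistent after a mutation.
    update_table: Optional[Callable[[Stmt, Table], None]] = None


NULL_CHECK = re.compile(r"if \((\w+) != null\) (.+)")


def remove_null_check(stmt: Stmt, table: Table) -> Optional[str]:
    """Illustrative null-check removal: `if (x != null) body` -> `body`."""
    m = NULL_CHECK.fullmatch(stmt.code)
    if m is None:
        return None
    var, body = m.groups()
    # Mutate only if the table knows `var` and records a nonbasic type.
    if var not in table or table[var] in BASIC:
        return None
    return body


def mutate_front_to_back(stmts: List[Stmt], templates: List[Template], table: Table,
                         max_faults: int, mutated: Set[int]) -> List[Tuple[int, int]]:
    """Mutate statements in source order; returns (position, template number) pairs."""
    new_faults = []
    for stmt in stmts:                     # front-to-back traversal
        if len(mutated) >= max_faults:
            break
        if stmt.pos in mutated:            # never mutate the same statement twice
            continue
        for template in templates:
            result = template.apply(stmt, table)
            if result is not None:
                stmt.code = result
                if template.update_table is not None:
                    template.update_table(stmt, table)
                mutated.add(stmt.pos)
                new_faults.append((stmt.pos, template.number))
                break
    return new_faults


stmts = [Stmt(0, "String s = read();"),
         Stmt(1, "if (s != null) use(s);"),
         Stmt(2, "return s;")]
done: Set[int] = set()
print(mutate_front_to_back(stmts, [Template(2, remove_null_check)], {"s": "String"}, 3, done))
# [(1, 2)]
```

Passing the same `mutated` set into later passes is what prevents a statement that has already been turned into a fault from being mutated again.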
According to our statistics on Defects4j v1.2 and v2.0, 78% of faulty versions have three or fewer faults and 22% have four or more. Therefore, MuTemAPR allows a code block to contain at most three faults. For an original code block
We adopt a polling approach to traverse the templates for code mutation. The polling strategy iterates through the templates in order of their template numbers. If a template does not satisfy the matching conditions against the information table of the current code block, it is skipped and the next template is attempted. For instance, if the
Notably, the 13th template (Remove Buggy Statement 1) and the 14th template (Move Statement 1) apply to almost any code block. MuTemAPR therefore uses them only when no other template can mutate a given code block. This significantly reduces the proportion of samples generated by these two templates in the dataset (from 19% to 7%). All the information used during mutation is taken from the context of the code block itself, which keeps the generated faults realistic and improves the authenticity and effectiveness of the data augmentation.
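The polling order and the last-resort use of templates 13 and 14 can be sketched as follows, under our own reading of the description above: we assume that “polling” means a round-robin over template numbers that resumes after the template used for the previous mutation, and that templates 13 and 14 are tried only after every other template has been skipped. The function names and the `Mutator` signature (a block in, a mutated block or `None` out) are hypothetical; the three-fault cap follows the Defects4j statistics above.

```python
import random
from typing import Callable, Dict, Optional, Tuple

MAX_FAULTS = 3           # at most three faults per code block
FALLBACK = {13, 14}      # templates that match almost any code block

# A mutator returns the mutated block, or None if its matching conditions fail.
Mutator = Callable[[str], Optional[str]]


def sample_fault_count(rng: random.Random) -> int:
    """Draw how many faults to inject into one block (1..MAX_FAULTS)."""
    return rng.randint(1, MAX_FAULTS)


def poll_templates(block: str, templates: Dict[int, Mutator],
                   start_after: int = 0) -> Optional[Tuple[int, str]]:
    """Try templates in numeric order, resuming after `start_after` (round-robin).

    Templates 13 and 14 are appended at the end as a last resort.
    """
    regular = sorted(n for n in templates if n not in FALLBACK)
    order = [n for n in regular if n > start_after] + [n for n in regular if n <= start_after]
    order += sorted(n for n in templates if n in FALLBACK)
    for number in order:
        mutated = templates[number](block)
        if mutated is not None:          # matching conditions satisfied
            return number, mutated
    return None                          # no template applies; block stays unchanged
```

With this ordering, a block that template 5 has just mutated is next offered to template 9, which spreads mutations across templates instead of letting the lowest-numbered applicable template dominate.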
MP1: Remove Checker
Templates 1 to 8 are all mutation variants for removing valid checkers. Specifically, Template 1 is used to remove type-conversion checkers. Templates 2–6 are for removing null-pointer checks, and Templates 7–8 are for removing range checks when retrieving data from arrays.
MP2: Remove Statement
Templates 9–12 are used to delete some normal statements. Among them, Template 9 deletes expression statements; Template 10 deletes return statements; Template 11 deletes exception-catching statements; Template 12 deletes if-comparison statements.
MP3: Insert Statement
Template 13 inserts a random statement: it randomly selects a position and inserts additional code that is consistent with the code information table.
MP4: Move Statement
Template 14 randomly selects a non-faulty statement and moves it to a random position.
MP5: Mutate Condition
Templates 15–17 are used to mutate conditional statements. Template 15 is for replacing conditional expressions, Template 16 is for randomly inserting conditional expressions and conditional symbols, and Template 17 is for randomly deleting a part of the conditional expression.
MP6: Mutate Clone
Template 18 is used to remove the clone method of the parent class and instead directly generate a new one, which results in the inability to inherit existing attributes and other content.
MP7: Mutate Data Type
Templates 19 and 20 are used to replace variable types. Since our method only focuses on analyzing the context within a single code block and cannot obtain global information, it is impossible to determine whether the types before and after replacement have an inheritance relationship. Therefore, this mutation pattern is more likely to lead to compilation errors.
MP8: Mutate Integer Division Operation
Templates 21–23 mutate integer division operations. Templates 21 and 22 remove the floating-point type conversion in a division to simulate a forgotten cast, and Template 23 removes an implicit type conversion.
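To make the effect of Templates 21 and 22 concrete, the sketch below removes a floating-point cast from a division written as a Java expression string. It is a text-level simplification (the approach itself operates on ASTs and consults the code information table), and the function names are hypothetical. The helper `java_int_div` reproduces Java’s truncating integer division to show why the mutant is a genuine fault: with `sum = 7` and `count = 2`, the original expression `(double) sum / count` yields 3.5, whereas the mutant `sum / count` yields 3.

```python
import re
from typing import Optional

# Matches a floating-point cast such as "(double) " or "(float)".
FLOAT_CAST = re.compile(r"\((?:double|float)\)\s*")


def drop_float_cast(expr: str) -> Optional[str]:
    """Illustrative Template 21/22-style mutation on a Java expression string.

    Removes the first floating-point cast in a division, so that
    `(double) sum / count` becomes `sum / count`.
    """
    if "/" not in expr:
        return None                       # not a division: template does not apply
    mutated = FLOAT_CAST.sub("", expr, count=1)
    return mutated if mutated != expr else None


def java_int_div(a: int, b: int) -> int:
    """Java integer division truncates toward zero (Python's // floors)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q
```

The distinction in `java_int_div` matters for negative operands: Java evaluates `-7 / 2` to `-3`, while Python's `-7 // 2` is `-4`.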
MP9: Mutate Literal Expression
Templates 24 and 25 mutate literals, including booleans, numbers, and strings. Template 24 randomly replaces a literal with another literal of the same type, and Template 25 replaces a literal with an expression.
MP10: Mutate Method Invocation Expression
Templates 26–29 are used to replace the functions and their parameters in method invocations. Template 26 is for replacing the function name, Template 27 is for randomly replacing the input parameters inside the function, Template 28 is for randomly adding parameters inside the function, and Template 29 randomly deletes parameters inside the function.
MP11: Mutate Operators
Templates 30–32 replace operators. Template 30 replaces one operator with another of the same kind (e.g., relational or arithmetic). Template 31 changes the precedence of arithmetic operators. Template 32 replaces a null-pointer check with an instanceof check.
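Template 30 can be illustrated at the text level as follows. The sketch is hypothetical: it covers only relational and arithmetic operators, picks the operator and its replacement with a caller-supplied random generator, and uses a deliberately simple tokenizer instead of the AST matching the approach relies on.

```python
import random
import re
from typing import Optional

OPERATOR_GROUPS = [
    ["<", "<=", ">", ">=", "==", "!="],   # relational
    ["+", "-", "*", "/", "%"],            # arithmetic
]
# Two-character operators come first so that "<=" is not read as "<".
OPERATOR = re.compile(r"<=|>=|==|!=|[<>+\-*/%]")


def replace_operator(expr: str, rng: random.Random) -> Optional[str]:
    """Template 30-style mutation: swap one operator for another of the same kind.

    Toy tokenizer: it does not handle "++", "->", compound assignments, etc.
    """
    matches = list(OPERATOR.finditer(expr))
    if not matches:
        return None                       # no operator: template does not apply
    m = rng.choice(matches)
    group = next(g for g in OPERATOR_GROUPS if m.group() in g)
    replacement = rng.choice([op for op in group if op != m.group()])
    return expr[:m.start()] + replacement + expr[m.end():]
```

Restricting replacements to the operator’s own group keeps the mutant type-correct (a relational expression stays boolean), which is in line with the goal of producing faulty code that still compiles.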
MP12: Mutate Return Statement
Template 33 randomly replaces the literals, variables, conditional expressions, and other contents in a return statement with similar expressions.
MP13: Mutate Variable
Template 34 is used to replace one variable with another, and Template 35 is used to replace an expression with a random variable.
Model training strategy
Existing methods28–30,42 adopt only traditional training strategies, in which the data are shuffled and fed in random order in each epoch. Because multiple training samples are derived by mutating the same original code block, they are similar to one another. This causes the model to memorize the correct content of each code snippet rather than learn how to repair it. Therefore, we propose a novel training strategy: MuTemAPR first labels the different faults in a code block and then trains in order of the number of faults.
Fault Labeling
Existing methods only perform simple <pos> and <eos> markings on faulty statements, representing the start and end positions of the faulty statements, respectively. In this article, we first sort all the faults in a code block. The sorting rules are as follows: (1) If fault

Example of fault labeling.
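A minimal sketch of such labeling follows. It assumes the simplest ordering consistent with the top-to-bottom annotation described in the abstract: faults are ordered by line number, the current target is wrapped in `<pos>` and `<eos>`, and every other faulty statement receives an extra marker. The `<other>` token and the function name are hypothetical placeholders, not the tokens used by MuTemAPR.

```python
from typing import List, Sequence

TARGET_START, TARGET_END = "<pos>", "<eos>"
OTHER_FAULT = "<other>"   # hypothetical marker for non-target faulty statements


def label_faults(lines: Sequence[str], fault_lines: Sequence[int]) -> List[List[str]]:
    """Return one labeled copy of the block per fault, ordered top to bottom."""
    ordered = sorted(set(fault_lines))
    faulty = set(ordered)
    samples = []
    for target in ordered:
        labeled = []
        for i, line in enumerate(lines):
            if i == target:
                labeled.append(f"{TARGET_START} {line} {TARGET_END}")
            elif i in faulty:
                labeled.append(f"{OTHER_FAULT} {line}")   # flag, but do not target
            else:
                labeled.append(line)
        samples.append(labeled)
    return samples
```

Marking the non-target faults tells the model that their content is unreliable context, which is the motivation given above for enriching the labels.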
Ordered Training
MuTemAPR does not adopt the traditional approach of randomly inputting data for training. Instead, we train on the code with more faults and then on the code with fewer faults. This approach offers several advantages: It enables the model to learn the multilocation fault repair process and prevents the model from prematurely encountering the correct content within the code block.
Take Figure 3 as an example. After data augmentation, we have three datasets: one containing code with three faults, one with two faults, and one with one fault. All three are derived from the original code through the data augmentation strategy, so some statements that are correct in the original code are faulty within the augmented samples.

Process of ordered training.
During training, we first input the data from and train the model for several epochs. Subsequently, we input the data from
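The ordered schedule can be sketched as below. The iteration budgets are the ones reported in the “Experimental setting” section (30k, 65k, and 100k iterations for the three-, two-, and one-fault datasets, with a batch size of 6); the function names, the `step` callback, and the cyclic batching are hypothetical simplifications of a real training loop.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Iterations per stage, keyed by the number of faults per sample.
ITERATIONS: Dict[int, int] = {3: 30_000, 2: 65_000, 1: 100_000}


def ordered_schedule(datasets: Dict[int, Sequence],
                     iterations: Dict[int, int] = ITERATIONS) -> List[Tuple[int, int]]:
    """Stages as (num_faults, iterations), most faults first."""
    return [(k, iterations[k]) for k in sorted(datasets, reverse=True)]


def train_ordered(datasets: Dict[int, Sequence], step: Callable[[list], None],
                  iterations: Dict[int, int] = ITERATIONS, batch_size: int = 6) -> None:
    """Run each stage to completion before moving to the next, fewer-fault stage."""
    for num_faults, iters in ordered_schedule(datasets, iterations):
        data = datasets[num_faults]
        for it in range(iters):
            start = (it * batch_size) % len(data)   # cycle through this stage's data
            step(list(data[start:start + batch_size]))
```

Samples could still be shuffled within a stage; the essential property is that stages never interleave, so the model meets the most heavily mutated variants of a code block before the lightly mutated ones.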
Rule-guided semantic filter
Template Semantic Rules
MuTemAPR employs a graph-transformer neural network, 43 which has shown the best performance in single-location repair. The graph-transformer learns structural information from ASTs and places more emphasis on the structural features of code than the traditional transformer. MuTemAPR adopts an iterative prediction method, treating fault localization, patch generation, and patch verification as an integrated process. We train the graph-transformer on the 36 templates from Meng et al. 29 However, existing methods overlook whether the generated patches comply with syntax and semantic rules. Although Zhang and Wang 44 proposed some repair methods for templates, these methods are insufficient. Building on their work, we present an inspection scheme based on the code information table: we use the table constructed in the “Data Augmentation Strategy” section to check the rationality of the generated patches.
The specific details of semantic detection are shown in Table 3. According to Rules 1 and 2, Template Mutate Integer Division Operation 1–3 can only be used when the divisor and/or the dividend belong to the basic numeric types. Rule 3 states that using Template Mutate Method Invocation Expression 1 requires that the input parameters of method1 and method2 have the same type. Rule 4 indicates that Template Mutate Method Invocation Expression 2 can only be used when the type of
Semantic rules and their relation to templates
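The filtering step can be sketched as a mask over the model’s template distribution: any template whose rule fails for the current code context gets probability 0, and the remaining mass is renormalized. Only Rules 1–3 as paraphrased above are encoded, with Rules 1 and 2 read strictly as requiring both operands to be basic numeric types. The template identifiers, the context keys (`operand_types`, `old_params`, `new_params`), and the function names are hypothetical.

```python
from typing import Callable, Dict

NUMERIC = {"byte", "short", "int", "long", "float", "double", "char"}
Context = Dict[str, list]


def numeric_division(ctx: Context) -> bool:
    # Rules 1-2 (read strictly): both operands must have basic numeric types.
    types = ctx.get("operand_types")
    return bool(types) and all(t in NUMERIC for t in types)


def same_param_types(ctx: Context) -> bool:
    # Rule 3: the swapped methods must take the same parameter types.
    return "old_params" in ctx and ctx["old_params"] == ctx.get("new_params")


RULES: Dict[str, Callable[[Context], bool]] = {
    "MutateIntegerDivisionOperation1": numeric_division,
    "MutateIntegerDivisionOperation2": numeric_division,
    "MutateIntegerDivisionOperation3": numeric_division,
    "MutateMethodInvocationExpression1": same_param_types,
}


def filter_templates(probs: Dict[str, float], ctx: Context) -> Dict[str, float]:
    """Zero out templates whose rule fails, then renormalize the rest."""
    masked = {t: (p if RULES.get(t, lambda c: True)(ctx) else 0.0)
              for t, p in probs.items()}
    total = sum(masked.values())
    if total == 0.0:
        return masked          # nothing valid survives; the caller must handle this
    return {t: p / total for t, p in masked.items()}
```

Applying the mask before beam search, rather than after it as in Tenure, means invalid templates never occupy beam slots in the first place.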
Semantic Loss Update
We utilize an additional loss to calculate the difference between
Among them,
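The text above does not fix a concrete formula for this loss. One natural choice, assumed here rather than taken from MuTemAPR, measures the difference as the KL divergence from the filtered distribution q (the model distribution p restricted to rule-conforming templates V and renormalized) to p. Because q is proportional to p on V, this reduces to KL(q || p) = -log(sum of p_i over i in V), that is, the negative log of the probability mass the model already places on valid templates. The sketch below checks this identity numerically; all names are hypothetical, and a real implementation would compute the term on tensors inside the training loop.

```python
import math
from typing import Dict, Set


def filtered(probs: Dict[str, float], valid: Set[str]) -> Dict[str, float]:
    """q: zero out rule-violating templates and renormalize (needs nonzero valid mass)."""
    mass = sum(p for t, p in probs.items() if t in valid)
    return {t: (p / mass if t in valid else 0.0) for t, p in probs.items()}


def kl(q: Dict[str, float], p: Dict[str, float]) -> float:
    """KL(q || p), skipping entries where q is zero."""
    return sum(qi * math.log(qi / p[t]) for t, qi in q.items() if qi > 0)


def semantic_loss(probs: Dict[str, float], valid: Set[str]) -> float:
    """Assumed form: KL(q || p) = -log(probability mass on valid templates)."""
    mass = sum(p for t, p in probs.items() if t in valid)
    return -math.log(mass) if mass > 0 else math.inf
```

The term is zero when the model puts no mass on invalid templates and grows as it does, so adding it (with some weight) to the usual cross-entropy pushes the model toward rule-conforming templates.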
Evaluation
Benchmark
In our experiments, we employ Defects4j v1.2.0 and v2.0.0, 46 both of which are widely used for evaluating APR. Defects4j v1.2 contains 395 real bugs from 6 open-source Java projects, as outlined in Table 4. Defects4j v2.0 adds 444 further bugs and has been used for evaluation in recent research.
Summary of Defects4j v1.2.0
Experimental setting
Our dataset is sourced from Tenure, 29 which includes the two thousand Java projects on GitHub with the most stars. Inside,
Training dataset
The model is trained for 100k iterations on the one-fault dataset, 65k on the two-fault dataset, and 30k on the three-fault dataset, with a batch size of 6. Each iteration count was selected by grid search over 30k to 100k in steps of 5k, based on overall performance. For each dataset, we hold out 10,000 samples for validation and another 10,000 for testing. The detailed hyperparameter settings 47 are presented in Table 6. We use an improved version of the patch recovery tool from Tenure. After the generated repair information is recovered into code, we perform deregularization to restore custom variable and function names. 48 During iterative repair, we conduct three rounds of repair and generate two candidate patches in each round, consistent with Iter. 30
Setups for experiment
Since MuTemAPR is an automatic repair approach that combines fault localization and patch generation, we conduct experiments with Ochiai-based fault localization. 49 Ochiai is among the most widely used spectrum-based fault localization techniques in existing APR research, and Ochiai-based localization reflects a realistic, standard automated repair process. We use the GZoltar tool to perform fault localization on the Defects4j dataset. 50 The experiments are executed on a server with a 64-bit, 20-core 2.1 GHz CPU, 502 GB of RAM, and two NVIDIA Tesla V100 GPUs with 64 GB of memory in total, running Ubuntu 18.04. 51
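For reference, Ochiai scores a program element as ef / sqrt((ef + nf) * (ef + ep)), where ef and ep count the failing and passing tests that execute the element and nf counts the failing tests that do not. The sketch below computes this score from a boolean coverage matrix and ranks statements by it. It is a self-contained illustration of the formula, not GZoltar’s implementation or API, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def ochiai(ef: int, ep: int, total_failed: int) -> float:
    """Ochiai suspiciousness; total_failed = ef + nf."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0


def rank_statements(coverage: Sequence[Sequence[bool]], passed: Sequence[bool]) -> List[int]:
    """coverage[t][s] is True if test t executes statement s; passed[t] is the test verdict.

    Returns statement indices, most suspicious first (ties broken by index).
    """
    total_failed = sum(1 for ok in passed if not ok)
    n = len(coverage[0]) if coverage else 0
    scores = []
    for s in range(n):
        ef = sum(1 for t, row in enumerate(coverage) if row[s] and not passed[t])
        ep = sum(1 for t, row in enumerate(coverage) if row[s] and passed[t])
        scores.append(ochiai(ef, ep, total_failed))
    return sorted(range(n), key=lambda s: (-scores[s], s))
```

For example, with two failing tests that both execute statement 0 and no passing test touching it, statement 0 scores 1.0 and heads the ranking; the iterative repair loop then tries patches at the top-ranked locations first.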
Result and discussion
RQ1. MuTemAPR Performance in APR
To investigate RQ1, we compare our method with state-of-the-art methods: one template-based APR method, TBar 14; five NMT-based APR methods, CoCoNuT, 23 RewardRepair, 20 Recoder, 25 Transfer-PR, 27 and Knod 28; and Tenure, 29 an APR method that combines templates with deep learning. We also compare it with the state-of-the-art multiround repair method Iter. 30 All evaluations use Defects4j with Ochiai fault localization. Results for CoCoNuT and Transfer-PR on Defects4j v2.0 with Ochiai are not publicly available.
As Table 7 shows, with Ochiai fault localization, MuTemAPR outperforms all the other compared methods, repairing at least five more bugs than any of them. Data augmentation expands the multifault training samples, ordered training improves learning efficiency, and semantic filtering reduces invalid patches; together, the three yield the performance improvement. MuTemAPR outperforms TBar mainly because it uses deep learning to select the most appropriate template instead of traversing and matching every template.
Results of comparative study for Automated Program Repair
Compared with NMT-based methods, our method integrates template information, which makes the generated content more concise and reduces the probability of errors during generation. In addition, we expand the training dataset through data augmentation, and our augmentation method better reflects real-world faulty code. Compared with Tenure, which also combines templates with NMT, we add semantic restriction rules during model training, so the model learns how each template relates to the code semantics. During inference, these rules filter out templates that violate the semantic constraints, making the generated content more accurate. Compared with Iter, which also uses iterative training, we further improve the augmentation and training methods to prevent the model from memorizing correct statements during training.
On Defects4j v2.0, the difference between our method and Tenure is small, only two bugs. This is because our templates are derived from TBar, which was built on Defects4j v1.2.0 and lacks analysis of the fault data in v2.0.0.
RQ2. MuTemAPR Performance at a Small Beam Size
Table 8 presents the performance of Tenure and MuTemAPR under different beam sizes. The selected values follow the experimental settings of existing mainstream methods and cover three typical scales: small, medium, and large. As the beam size decreases, both MuTemAPR and Tenure correctly repair fewer bugs. However, MuTemAPR’s count declines more slowly than Tenure’s. This is because MuTemAPR incorporates semantic constraints into the model to regulate template selection, and its training process better reflects real-world repair scenarios. MuTemAPR therefore generates more patches that conform to syntax rules, which makes each candidate more likely to be usable: templates that violate semantic rules are filtered out during generation rather than occupying slots in the beam.
Results of the comparative study of automated program repair at small beam sizes
RQ3. Ablation Study of MuTemAPR
Based on the components of our method, we conduct the following ablation experiments to demonstrate the effectiveness of each component. We set up four control groups: (1) the method without data augmentation (w/o data augmentation); (2) the method without special fault labeling (w/o fault labeling); (3) the method without ordered training (w/o ordered training); and (4) the method without the semantic filter that constrains the syntax of the patches generated by the model (w/o semantic filter).
Table 9 shows that each major component of our method contributes positively to repair performance. Data augmentation contributes five correct repairs to MuTemAPR, and fault labeling contributes three more, because the special label enables the model to learn the repair process in an orderly manner. Ordered training contributes the most: with it, MuTemAPR repairs 11 more bugs. Ordered training is highly effective for two reasons. First, it does not repeat training data, so the model avoids simply memorizing samples. Second, it better simulates the scenario of repairing multiple positions. The data augmentation strategy and the training strategy are the core contributions of this article: training on realistic multifault data significantly enhances the model's learning ability and yields more effective patches. Although this study focuses on common faults spanning at most three locations, the iterative repair mechanism theoretically supports faults at more locations through additional iterations. Finally, the semantic filter contributes seven correct repairs, which we attribute to the additional syntactic constraints that allow the model to learn effective code semantics during training.
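The fault labeling and ordered training discussed above, together with the iterative repair loop, can be sketched as follows. This is an illustration under stated assumptions, not MuTemAPR's code: the marker tokens `<FAULT>`/`</FAULT>`, the helper names, and the line-aligned buggy/fixed representation are hypothetical. Each fault yields exactly one training sample (no repeated data). Faults are processed top to bottom, and fixes above the current fault are already applied, so the model sees the context it would face partway through a repair.

```python
from typing import Callable, List, Tuple

# Hypothetical marker tokens for the fault currently being repaired.
START, END = "<FAULT>", "</FAULT>"


def mark(lines: List[str], idx: int) -> str:
    """Wrap line `idx` in fault markers and join the method back into one string."""
    marked = lines[:idx] + [f"{START} {lines[idx]} {END}"] + lines[idx + 1:]
    return "\n".join(marked)


def progressive_samples(buggy: List[str], fixed: List[str],
                        faults: List[int]) -> List[Tuple[str, str]]:
    """Build one (source, target) pair per fault, ordered top to bottom.

    Assumes `buggy` and `fixed` are line-aligned. Faults above the current one
    are already repaired in the source; faults below it are still present.
    """
    current = list(buggy)
    samples = []
    for idx in sorted(faults):
        samples.append((mark(current, idx), fixed[idx]))
        current[idx] = fixed[idx]  # apply this fix before moving to the next fault
    return samples


def iterative_repair(buggy: List[str], faults: List[int],
                     model: Callable[[str], str]) -> List[str]:
    """Inference counterpart: one model call per fault location, top to bottom."""
    current = list(buggy)
    for idx in sorted(faults):
        current[idx] = model(mark(current, idx))
    return current


buggy = ["int total = 1;", "for (int i = 0; i <= n; i++) {", "  total += i;", "}", "return total;"]
fixed = ["int total = 0;", "for (int i = 0; i < n; i++) {", "  total += i;", "}", "return total;"]

samples = progressive_samples(buggy, fixed, faults=[1, 0])
for src, tgt in samples:
    print(src, "\n=>", tgt, "\n")

# Stand-in "model" for the demo: looks up the marked line in a table of known fixes.
FIXES = {b: f for b, f in zip(buggy, fixed) if b != f}


def oracle(src: str) -> str:
    return FIXES[src.split(START)[1].split(END)[0].strip()]


repaired = iterative_repair(buggy, [1, 0], oracle)
print(repaired == fixed)  # True
```

Because inference repeats the same single-location step, nothing in the loop is tied to three locations. This is consistent with the claim that more faults can be handled through additional iterations, although repair quality at higher counts is untested here.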
Ablation study results for MuTemAPR
Threats to Validity
The main threat to external validity is the programming language: the method has been evaluated only on Java, not on other programming languages. However, the data-processing methods can be applied to other languages, and the semantic inspection rules can be adapted to them as well. Another threat to external validity is the quality of the test code. To mitigate it, we use Defects4j, a widely used benchmark built from real-world bugs, as the evaluation dataset. The main threat to internal validity concerns the criteria for judging patches as correct. To mitigate this risk, all candidate fixes are examined manually, and a fix is deemed correct only if human reviewers confirm it. Under the review criteria, a patch must pass all Defects4j test cases, follow standard code syntax, contain no redundant logic, and be consistent with the functionality of the original code. In addition, the fault localization tool GZoltar may introduce evaluation bias because of version issues with its dependent libraries, which may in turn affect the quality of the generated patches. We improve reproducibility by using an open-source toolchain and a patch dataset. Finally, this article also draws on previously published research results.
Conclusion
In this article, we propose MuTemAPR, a template-based neural program repair approach for generating multilocation patches. We put forward a scheme that leverages templates to augment training data. We also incorporate an ordered training strategy that enables the model to learn how to repair multilocation faults without memorizing patch content. Finally, we apply additional code semantic rules to constrain the patches generated by the model, thereby enhancing their usability. An experimental study on 839 real-world bugs from the widely used Defects4j benchmark demonstrates that MuTemAPR can outperform existing APR techniques. So far, our method has been verified only on Java. Future work could extract new templates from the bug repositories of recent open-source projects written in Java and other programming languages, thereby accommodating more types of faults.
Authors’ Contributions
Conceptualization, T.Z.; Methodology, T.Z.; Software, T.Z.; Validation, Y.Z.; Formal analysis, T.Z.; Investigation, Y.Z.; Resources, T.Z.; Data curation, Y.Z.; Writing—original draft preparation, T.Z.; Writing—review and editing, Y.Z.; Visualization, T.Z.; Supervision, Y.Z.; Project administration, T.Z.
Data Sharing Agreement
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Footnotes
Author Disclosure Statement
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding Information
The author(s) received no financial support for the research, authorship, and/or publication of this article.
