Abstract
Software evolution is an important issue for the development of high-quality software. Explosive development of the Internet demands software being evolved ever rapidly. However, legacy systems are very likely to outgrow the graphical model. As the complexity of systems keep increasing, more nodes and edges are added to the diagram, this lead to the diagram getting less readability. In the meanwhile, it is hard for software reengineers to investigate the legacy system in the higher abstraction view. Therefore, for the strategy of coping with legacy system evolution, this paper proposes a dynamic model slicing approach to facilitate the legacy system evolution. The case study shows the proposed approach is useful and efficient.
Introduction
Because of the critical value to the business process, there are a significant number of legacy systems kept in operation. These legacy systems are vital to the enterprise in that they are tightly coupled with the enterprise’s information and production infrastructure and have been thoroughly tested during many years’ operation. Undoubtedly, more and more existing software will become to legacy ones due to the rapid development of information technology and constantly changing requirements.
Software evolution [1, 2] is a way out. Being a broader name to software maintenance, software evolution refers to the study and management of the process of making changes to software over time which comprises development activities, maintenance activities and a sequence of software reengineering activities. Software reengineering [1] is the core technique for successful software evolution which can be seen as a combination of reverse engineering, functional restructuring and forward engineering. The purpose of software reengineering is to utilise the advantages of existing systems and new technologies. On one hand, existing systems can be reengineered by taking advantages of new technologies, while on the other hand, new development efforts derive benefit from reusing existing system. Software productivity and quality can be improved by using software reengineering across the whole life cycle [3]. The typical software reengineering techniques focus on the process starting from program source code which is thought as the only reliable information in a legacy system. During the reengineering, the program specifications can be recovered from the legacy source code and will then facilitate legacy system’s migration in the following forward engineering phase.
However, as legacy systems are always huge in size, models extracted from such systems are likely to become large and complex. Most legacy systems consist of many interdependent elements, such as classes, procedures, data structures, variables and binary modules. These elements are often heavily coupled between each other in intricate ways, like procedure calls, inheritance relationships and variable references. In such situation, modification at one program segment may affect other modules in unexpected and undesirable ways. Hundreds of objects are involved in thousands of interactions which makes the extracted models from such large architecture harder to read and understand. Moreover, it tends to be tedious and poor readability on one side and it is valuable to judge the impact of a certain change of one model elements on other parts on the other side [4].
For the strategy of coping with legacy system evolution, it is important to comprehend and assess the change impact during software evolution activities. Therefore, it is necessary to consider the relationship between the functional elements which will be reflected in the relationships in the extracted models when representing the legacy systems in the higher abstraction view. Legacy systems always contain many cooperating components, and these components are often organised into identifiable clusters, namely, subsystems. That is, components which contribute to the same business logic are included in a subsystem. Hence, according to domain-expert knowledge, a legacy system is preliminarily decomposed based on the discrepancy of business logic that embedded in various subsystems. However, as soon as extracted models become large, navigating through large models becomes a tedious task that hinders their understanding. Therefore, there exists a real need to support the comprehension of such kind of models so as to facilitate the legacy system decomposition.
The aim of this paper is to answer three research questions:
What kind of models are (not) used to being sliced and why? What dependency relationships are (not) used in model slicing and why? How effective/ineffective is model slicing in comprehension during software evolution?
Corresponding to these research questions, based on our previous work on security driven software evolution [5, 6], we present the following contributions in this paper:
A framework for slicing models with UML class and communication diagrams, An improved decomposition algorithm to extract the independent components based on the proposed model slicing algorithm, A conducted experiment showing that the proposed framework performs well in terms of correctness and navigation effort, on extracted UML models.
The remainder part of this paper is arranged as follows. Section 2 reviews the related works in this area. Section 3 illustrates our proposed model and its detailed context, while Section 4 elaborates the various dependency relationships and an intermediate representation diagram of them is illustrated in Section 5 with a slicing algorithm to slice the diagram. Section 6 describes a system decomposition approach based on the proposed model slicing algorithm. Section 7 exemplifies the proposed approach by a case study and results have been discussed. Finally, Section 8 concludes by summarizing the key contributions and outlining the future steps.
To understand and test a large software product is a very challenging task. One way to ease it is program slicing, which is a technique emphasising on some certain behavioural aspects of a program and removing non-relevant codes to this behaviour of the program [4]. Another is model based slicing [7], which is a technique decomposing large software architecture model into smaller models to identify relevant model parts and extracting related model elements throughout model that corresponds to user defined slicing criterion.
Traditional program slicing techniques [8, 9, 10] usually focus on analysing the data or control dependency relationships among the program statements. It is not the case when slicing the architectural models of software is performed because there are several other kinds of relationships among models, such as the relationship between class and class, operation and class, operation and operation, object and class, and object and object etc.
While the problem of software program slicing is a well-studied topic, there are relatively few works in slicing for model slicing. There are no studies which have investigated the work practices of model slicing for software evolution. In this section, an overview of model based slicing will be revisited, including the various general approaches and techniques used to compute slices.
Unified Modelling Language (UML) is widely used to represent and construct the architecture of software system with the help of its various model diagrams. With increase in the size and complexity of software product, UML models representing the high level design of the product tend to become huge and complicated which may comprise thousands of interactions involving hundreds of objects [4]. For better visualisation of software architecture, impact analysis and for test case generation the properties of system architecture with slicing can be taken into account.
Model slicing
Model slicing can be done on different kinds of models with different purposes, such as UML diagrams, architectural description model, infinite state chart, feature model, and any other kinds of dedicated-purpose models. Accordingly, various slicing techniques are used. In this section, UML-based model slicing techniques will be reviewed as dependency relationships, control and data flow, UML/OCL (Object Constraint Language) constraints, model transformation and feature. Dependency types are analysed in this method and usually Dependency Graph is constructed as an intermediate representation step while slicing UML Models that can describe the various types of dependencies. Researches using dependency relationships can be found in the following works. Zhao describes a new dependence analysis technique in [11] to support the software architecture development. The work is extended by Zhao [12] in which a static architecture slicing technique is introduced. Wu and Yi [13] develop an approach that comprises different class relationships to define dependency relationships among classes. Van Langehove [14] propose an algorithm to reduce the number of interference dependencies in state charts by using the concept of slicing with concurrent states. Wang et al. [15] present an approach to slicing hierarchical automata for model checking in UML state diagram. Samuel et al. [16] present a methodology to generate dynamic slices and test cases with the help of UML communication diagram. Lallchandani and Mall [17] propose a technique for constructing dynamic slices of UML models using the integrated state-based information. This work is extended by Lallchandani and Mall [4] in which a dynamic model slicing method is proposed based on the proposed intermediate representation named MDG which is constructed from UML class diagram and sequence diagram.
As most of the dependencies are usually static which cannot depict the dynamic aspect of the system, Control and Data Flow are used as the important aspect of system modelling or UML models that describe the nature of every component, their behaviour, and interaction with other components. Korel et al. [18] present an approach to slicing state-based models called Extended Finite State Machines. Similar work has been done [19]. Lano [19] defined that slicing can be carried out for UML state machines, using data and control flow analysis to remove elements of the machine that do not contribute to the value of a set of features in a selected state of the machine. Samuel and Mall [16] slice UML activity diagram to generate test cases on the basis of control and data flow.
OCL is used to specify constraints on objects in the UML. OCL provides data types and operators to express the constraints of UML models. Shaikh et al. [20] proposed a verification technique to check the correctness of model with the help of slicing in which OCL constraints have been taken as input and slicing is done by using dependency graph. Swain et al. [21] proposed an algorithm for automatic generation of test cases from sequence diagrams by expressing class diagram in OCL.
Modelling transformation is also used as a kind of slicing technique in the following work. Lano and Kolahdouz-Rahimi [22] defined the technique for slicing of UML model using Model Transformation. The proposed technique uses class diagrams, individual state machines and communicating sets of state machines to perform slicing of UML Model. Ujhelyi et al. [23] proposed dynamic backward slicing of model transformations technique. They used model transformation language as a core of technique with the help of Dynamic Backward slicing by considering the Execution traces of program to generate final slice.
Recent research slices on feature model to understand cross-cutting concerns. Archer et al. [24, 25] proposed a new slicing technique on the feature model by taking cross-cutting constraints into account with respect to set of features which are acting as slicing criteria. By extended the previous author, Hubaux et al. [26] proposed a slice feature diagram to design three different views of an input diagram to provide more flexibility to the configuration environment.
Slicing and software evolution
In the domain of software evolution, slicing techniques have been used to help system understanding, change impact analysis, test cases generation and so on.
Priyadarshniwangoo [27] presents a classification algorithm for categorizing evolutionary organisms using slicing techniques. The algorithm efficiently discovers the evolutionary relationship between organisms. The UML model is automatically generated from the source code to validate the forward and backward dynamic slicing algorithm.
Change impact analysis is an important part of software evolution. Wen et al. [28] proposed a change impact analysis technique based on evolution slicing to tackle such big software evolution data at the code level. This technique firstly distinguishes the modified elements, and then constructs the evolution slice to assist software developers and maintainers to make evolution decisions. Nejati et al. [29] propose an approach to automatically identify the impact of requirements changes on system design by applying a static slicing algorithm to extract an estimated set of impacted model elements. Hearnden [30] proposed an incremental change propagation for automating software evolution in Model-Driven Architecture by using rule factorization and term slicing for the analysis of Tefkat transformations. [31] presents model-based impact analysis approach for state-based systems. The proposed approach uses model dependencies to automatically measure the expected impact for a requested change instead of relying on the expertise of system maintainer.
Paper [32] presented a slice-based estimation approach for maintenance by examining a case study of the GNU Linux kernel with over 900 versions using a lightweight slicing approach and encodes the forward decomposition static slice profiles for all variables in all the files in the system.
Gallagher et al. [33] formalize decomposition slicing, which identifies interferences between software components and isolates the components to be changed. Lity et al. [34] proposed a novel approach for incremental model slicing for delta-oriented software product lines. A variant-specific dependency graphs as well as an incremental slice computation is achieved based on the specification of model changes between variants by means of model regression deltas.
Framework of the proposed approach.
From the discussion above, it can be concluded that even though there are some researches on slicing in the domain of software evolution, however, most of them are at code level. Therefore, it is necessary to enrich the work at model level to facilitate the system understanding during the software evolution.
The objective of the proposed approach is to ease the model understanding during the software evolution. Firstly, we comprehend the legacy systems and extract higher representation models from legacy systems. Then, the interdependencies among the elements in the models will be examined. Hence, construction of the dependency diagrams will be made to lay a foundation to slicing. Finally, we can slice the dependency diagram according to the given slice criteria. Based on reengineering techniques and domain knowledge, a model slicing approach is proposed to facilitate the model understanding for legacy system evolution.
Figure 1 illustrates the framework of the proposed approach, which is linked by an evolution process model and comprises 6 main modules as follows:
Stage 1 Comprehension. System Comprehension contains the methods of understanding legacy systems with the help of domain knowledge. The process includes pre-processing, static analysis and dynamic analysis. It takes source codes as the input and then outputs UML diagrams including class diagrams and several communication diagrams which indicate the interaction among system objects. Stage 2 Dependency Analysis. For the slicing technique used in this paper, different dependency relationships among classes will be taken into consideration. This stage is the kernel of the proposed framework which includes analysing the various interdependencies among the UML model elements, including relation dependency, control dependency, data dependency and member dependency. Dependency Analysis takes the elements in class diagram and communication diagram as the input and produce the dependency relationships among them as the output. Stage 3 CSDG Construction. This stage comprises the procedures of structuring the interdependency diagram based on the dependency relationships analysed in the previous phase. The output is a constructed intermediate representation of dependency relationship, named Class Scenario Dependency Graph (CSDG). Stage 4 Slicing. Slicing stage involves applying the proposed algorithm to reduce the CSDG diagram with the given slicing criteria. The output is the sliced diagram which is tightly coupled in elements interdependency. Stage 5 Decomposition. System decomposition stage is composed of computing the cohesiveness among the model elements based on the relation weight analysis and producing the clusters (also called subsystems) via a decomposition algorithm. There are two main challenges to be taken up. One is how to determine the cohesion degree in a cluster. The other is what kind of intermediate diagram can be used to facilitate the partition. After examining the dependency relationship among classes and objects in UML diagrams, the proposed CSDG graph is used as the intermediate representation to serve the system partition and each type of edges in CSDG is weighed as the parameter to decide the cohesion degree. Moreover, a decomposition algorithm is proposed to search high independent clusters on the basis of CSDG and independent metric.
UML model dependency analysis
As mentioned in Section 2, there are various kinds of models and slicing techniques. In this section, we clarify the reasons why we choose UML as the slicing model and dependency relationships as the model slicing technique. For the further discussing of model slicing in Section 5, we present the detailed definition of each dependency relationship as well.
UML model selection
The first step of the proposed framework is to extract higher level models from the legacy system. The modelling of a legacy system concentrates on the reflection and comprehension of the legacy system at higher level of abstraction. The main purpose is to understand the structure of the target legacy system and its main tasks.
As legacy systems have static and dynamic characteristics that display their functionalities and represent their structural and behavioural characteristics respectively. Static information describes the structure of the software while dynamic information specifying the runtime behaviour. Static analysis along with dynamic analysis for the legacy system contributes to the various software artefacts and their relationships.
In real systems, the static analysis itself is not adequate for a complete system understanding. For example, class diagram only provides a static view of the class hierarchy. It is not possible to statically discover the real methods and related components that called in a call instruction referring to an interface or class. In practice, sometimes this set is still large and reverse engineers hardly want to screen all of them to detect the actual instances. Considering these difficulties, it is necessary to step through dynamic model analysis to learn what components are instantiated at run time and how they interact with each other. UML has proved to be a good platform for modelling real systems. When modelling a legacy system with UML, the information in the legacy system is refined by using the UML diagrams. Through the extraction of UML diagrams from legacy code, the transformation has realized analysis platform on UML to be helpful on the comprehension of legacy systems based on the general analysis language UML.
UML has static and dynamic modelling advantages. It satisfies the needs of software evolution. At the same time, UML presents a visual description of the system and makes the process of software evolution easily acceptable. Meanwhile, many tools support the transformation from UML diagrams to code, and UML facilities the reusability of software evolution, which is also helpful for forward engineering in the process of reengineering.
However, in practice, it is not necessary to use all of UML diagrams to model those legacy systems. Some of the UML diagrams are similar. For example, the class diagram is the most fundamental of the diagrams for modelling the structure of legacy system. An object has the same characteristics as the corresponding class. A class diagram is the abstraction of the common characteristics of the object group. Therefore, if class diagram is used to model the static system structure, the object diagram is superfluous. The behavioural models e.g., sequence diagrams and communication diagrams, are used to depict a sequence of actions in an interaction through which a use case is realized. Communication diagram emphasizes on the interaction among objects while sequence diagrams put more emphasis on the actual order. Actually, they are semantically equivalent. The static model cannot depict the behaviours of objects. Dynamic models, on the other hand, do not adequately represent considerations of structure and dependency management. Therefore, after checking reverse engineering tools which can be used to extract models from source code, class diagram and communication diagram are chosen as the representation models to be sliced.
UML model definition
Class Diagram is an important part of UML static modelling in which the classes of the systems and their relationships have been depicted. Class diagram in nature reflects the objects and the static relationship among them in software system. Object diagram instantiates the class diagram. Communication diagram, on the other hand, is another representation of object diagram which shows the message flow between objects in the software application and implies the basic associations (relationships) between classes. They are very useful for visualizing the way several objects collaborate to get a job done and for comparing a dynamic model with a static model.
Thus, in this paper, the class diagram dependency is analysed with the help of communication diagram. Since communication diagram describes execution scenarios of the system, the scenario path will be considered in our proposed method.
For consistency, the definition of UML class diagram is given below.
UML class diagram depicts a variety of relationships among classes. In UML class diagram, relations are represented by association line and navigation arrow. Dependency, is a form of association that specifies a kind of dependency relationship between two classes, dependency is displayed as a dashed line with an open arrow that points from the client model element to the supplier model element. Different from other relations, dependency relation is very useful when describing class relationships with no attributes visible. For example, dependency relation shows whether one packet is aware of the existence of the other packet. The component of one packet does not reference any class, component, interface, method or service of the other packet if there is no dependency relation between them.
As mentioned in the previous section, communication diagram in nature is a flow chart depicting an execution process. It is obvious that communication diagram includes control dependency information. Scenarios describe the flow of events when executing a use case and each event flow is called a scenario. Scenario path represents the complete trace of threads execution in communication diagram and it is the trace of participated objects interaction as well.
As described in [35], “a scenario is a formal description of the flow of events that occur during the execution of a use case instance, and it defines the specific sequence of events between the system and the external actors”. For the reasons above, scenario is used from communication diagram to compensate the limitation of class diagram by reflecting the control dependency in communication diagram to class diagram.
In communication diagrams, objects are shown with association connectors between them. Messages are added to the associations and show as short arrows pointing in the direction of the message flow. The sequence of messages is shown through a numbering scheme. Links between objects on communication diagrams are corresponding to the relationships on class diagrams.
Message is the only way to communicate between objects in UML interaction diagram which conveys information with the expectation that action will occur. Links are thought as the instances of associations. It is against the rule to create a link in a communication diagram if there is no relationship (i.e. association, aggregation, or composition) among the objects in the corresponding class diagram.
A message is shown as an arrow line from the sender message end to the receiver message end. A label is used to identify the number, guard condition, iteration or return value of message. Designer can find out the collaborations among objects, trace the execution path and the variation of messages by identifying the syntax. The objects in the communication diagram can be instantiated from the classes. There exists a corresponding relation in the class diagram for each link between the objects in the communication diagram. It is exactly based on which can be used to aid the dependency analysis in the proposed method.
Dependency analysis
Program dependence is the dependency relationships between the statements within a program. Data and control flows in the program source code determine the dependency relationship which can be represented by some forms of visualisation using program dependency analysis techniques [13].
In a program with sequence structure, to compute the dependency set for a statement s is a process of computing the reachability of the corresponding graph. UML is a kind of modelling language independent of normal programming languages. Therefore, the existing dependency analysis methods for program cannot be applied directly to analyse UML class diagram.
Relation dependency analysis
There are several kinds of relationships among classes in UML class diagram, such as aggregation, association, composition, and generalization/specialization. Association relation is a generic relationship between two classes, which indicates there are some kinds of relation among them but it cannot describe the relation concretely. However, the association relationship is not treated as a kind of dependency relationships in this research. It is because the association relationship can be addressed through the message transferring in an interaction or operation invocation, for example, the dependencies are represented from the control dependency, call dependency or data dependency which are discussed in the following section. Therefore, the relation dependences among classes are expressed as the following.
Aggregation relation. Aggregation relation is one of special form of association relation. Aggregation represents a “whole-part” relationship, which is known as “has a” relationship.
Composition relation. Composition is also named as strong aggregation which is known as “owns a” relationship.
Generalisation relation. Generalisation relationship indicates the inheritance relation and known as “is a” relationship.
Relation dependency analysis of UML class diagram is the process to analyse how a class is correlated with other classes or how an instance of a class is correlated with the other class’s instances.
Control dependency analysis
Because of the limitation of class diagram, there are only static structure information of classes can be shown in class diagram. Control dependency cannot be analysed solely on class diagram, however, it can be performed with the help of communication diagram.
The UML specification describes a precondition as: “an optional set of constraints specifying what must be fulfilled when the behaviour is invoked [36]”. For communication diagram, guard condition is a kind of precondition.
For better understanding of control dependency analysis, a set of dependency relations is defined. For uniformity, ci.oper_pi represents an operation of ci while an oper_p without class identifier represents any operation in the given class diagram and attr_a without class identifier represents any attribute in the given class diagram.
For For an operation Let
In non-concurrent event flow, it is obvious to conclude that control dependencies among operations are transitive.
Data dependency analysis
Data dependency is such a situation in which the execution of a statement is dependent on the value of some relevant operations. In UML class diagram, data dependency means the value of a variable defined by an attribute or operation is referred by another attribute or operation.
In the background of software evolution and security domain, data as an important information carrier should be treated as assets and kept confidential, integrated and available so as to immune from security threat. CIA is a widely used metric for measuring of information systems security, emphasising on the three key security properties of confidentiality, integrity and availability of information. Confidentiality is related to the border concept of data privacy and refers to limiting information access and disclosure to authorised users. Data integrity, namely, that data should remain integrity and have not been modified inappropriately. To analyse read dependency between attribute and operation among objects of classes will lay a foundation of data confidentiality analysis, similarly, to analyse write dependency between attribute and operation will contribute to data integrity analysis.
CALL( RA( WR(
If If If
Class diagram of CSDG meta-model.
For a given UML class diagram
From the Definitions 9 and 10, it can be concluded that DD
Flowchart of Algorithms 1 and 2.
In a class diagram, the attribute and operation are the inherent elements of a class. However, when it comes to analyse the different dependency relationships among various classes in the class diagram, it is necessary to represent this kind of relationships in a formal and clearly way which is the basis of the subsequent model slicing.
If If
UML model slicing
Thus far we have elaborated the analysis of dependency relationships among exacted UML model. Various kinds of dependencies existing among different model elements are usually considered when determining the impact of a change made in the system architecture [4]. In this section we briefly describe the central artefact and the corresponding strategies of the proposed slicing method. An intermediate representation for UML models is the prerequisite for UML model slicing which is proposed and named as Class Scenario Dependency Graph (CSDG) in this section.
Class scenario dependency graph (CSDG)
CSDG definition
This section presents an intermediate representation of the extracted UML models, named CSDG which lays a foundation for the subsequent model slicing.
A class diagram of CSDG meta-model is shown in Fig. 2. The structural design of CSDG is illustrated with the elements involved which can be roughly classified into two categories, node and edge. Each instance of CSDG consists of a set of nodes along with a set of edges describing the different kinds of dependencies.
The various CSDG nodes are listed as:
Class nodes. Class are represented by CL Attribute nodes. The attributes of a class CL are represented by AT Operation nodes. The operations called by a class CL are represented by OP
The edges representing the various kinds of dependencies relationships among the above nodes can be listed as:
Member dependency edge. Member dependency edges represent the composition of operations, attributes and class which means that AT node and OP node are the member of CL node. Relationship dependency edge. Relationship dependency edges represent the way of how classes and instances of classes are interrelating with each other. CSDG represents the class relationships in the same way to its corresponding class diagram which can be classified into association, generalisation, composition and aggregation. Message dependency edge. Message dependency edges represent messages flows among the objects which can be classified into the following types according to different message transferring.
Data dependency edge. Data dependency edges represent the flows of data where arise the class operation. There exists the data dependency if the parameter and return values of the operation directly or indirectly make use of the class attribute. Control dependency edge. Control dependency edges exist when whether or not the operation of an object is executed is determined by another operation of the object or another object. Call dependency edge. Call dependency edges represent the flow of operation calls invoked by objects.
Dependency semantic in CSDG
Dependency semantic in CSDG
For better understanding of CSDG, a graphical method to represent the dependencies in UML diagram is given in this section.
From the description of message dependency edge, it can be concluded that:
For any pair of nodes
Both One of the nodes is OP node, the other is either an AT node, or an OP node, and there exists a data dependency A message dependency edge can exist due to one of the following:
Both nodes of Both nodes of Both nodes of
Semantics analysis of dependency in CSDG
In the description of CSDG, the semantics of edges depend on the pair of nodes and the specific scenario. The same notations to [4] are used to represent the dependency semantic in CSDG. The edge is represented using a pair of nodes (from-node, to-node). The meaning of the from-node is that the dependency starts with from-node and ends with to-node. The direction from the from-node to the to-node depicts the dependency in the CSDG. The various dependencies semantics in CSDG and their corresponding notations are depicted in Table 1.
CSDG construction
In this section, we explain the construction of the CSDG by using the proposed construction algorithm. The construction procedure in the pseudocode takes two types of elements as the parameters. One is the elements from class diagram and the other is those from communication diagram. The constructed CSDG is returned as the output. The construction algorithm is composed of three steps. It identifies the three types of nodes from class diagram first. Then, member and relation dependency edges can be identified from the class diagram and added to the CSDG. Finally, various message edges are examined respectively and included into the CSDG based on the interactions in the communication diagram. Figure 3 shows the flowchart of the procedures of CSDG construction (left part) and slicing (right part). The detailed construction and slicing procedure are described in Algorithms 1 and 2, respectively.
Slicing algorithm
CSDG is the intermediate representation diagram of the exacted UML models from legacy systems. Once it has been constructed, it can be fed into the procedure of slicing. This section presents our slicing algorithm in the form of pseudocode. It takes the constructed CSDG and the slicing criteria as the input, and returns the sliced diagram which shows the interdependencies between the slicing criteria and other elements in the diagrams.
System decomposition
So far, we have presented the core artefacts of our proposed approach for both the conceptual definition and process implementation in Sections 4 and 5 respectively. In this section we present how the proposed model slicing method can be applied in software evolution domain to facilitate the decomposition of a legacy system.
As legacy systems are always huge in size, it is useful to break a complicated system down into a collection of small and manageable subsystems which respectively contains relevant functionalities. Clustering technique is usually used to break down a system into a set of meaningful sub-clusters by using some certain decomposition criteria. High cohesion, low coupling and interface minimization are the goals that such criteria try to achieve.
Legacy systems always contain a great number of cooperating components, and these components are often organised into identifiable clusters, namely, subsystems. That is, components which contribute to the same business logic are included in a subsystem. Hence, according to domain-expert knowledge, a legacy system is preliminarily decomposed based on the discrepancy of business logic that embedded in various subsystems.
Relationship type weight
As described in previous section, there are different types of relationships among classes in object oriented systems which play different roles in system architecture. Relationships between classes can tell people something about cohesion and coupling of modules, or about the layers built into the legacy system. For example, class B inherits from class A, class B is hardly independent from class A because the methods and attributes of class B may be inherited from class A. When it comes to system decomposition, if class A and class B are separated into two different components, reusability of the component where class B hosts may be low due to its tight dependence on the component where class A hosts. It will make the difference if the relationship between class A and Class B is message link. Therefore, taking into consideration of relationships among different classes in system decomposition may improve the correctness of decomposition.
For a given object oriented system, CSDG defined in Section 4.4 can be constructed based on the extracted static and dynamic models from the object-oriented system to represent the dependency relationship. Suppose
It can be concluded from Eq. (1), the weight of each edge in CSDG only depends on its relationship type. There are three types of node in CSDG (class node CL, attribute node AT and operation node OP) which accurately describing to what extent the classes are dependent on each other when analysing the dependency relationships among classes. Obviously, it is meaningless if AT nodes and OP nodes are separated from CL nodes and thereby they should be treated as a whole when considering system decomposition. Therefore, the value of Wmem should not be taken into consideration of independency metric calculation defined in Section 4.3.2 and thereby its value is set to 0 for the purpose of uniform.
Based on the coupling degree of class diagram relationships, there exists: generalization
Weight value for edge in CSDG
Results of one iteration for WebStoreApp
Class diagram of WebStoreApp bean classes.
In this paper, a greedy algorithm based on [37] is proposed to search for high independent clusters within limited scope of each node taking relationships among different classes into consideration. In [38], suppose that
where
Partition result of WebStoreApp application
Sequence diagram of Cart update operation.
Sequence diagram of Add Product to Cart operation.
Part of CSDG for WebStore application.
Slicing output using Product as the slicing criteria.
System use case diagram of WebStoreApp application.
Inspired by QMOOD, for a given
In Eq. (3), three considerations are taken into account including cohesion within subgraph, coupling between subgraph and its complement, the size of subgraph. Cohesion can be obtained by calculating the sum of weights of inner relationships within the subgraph, while coupling is achieved by calculating the sum of weights of outward relationship, and size of subgraph
Large software systems are usually composed of basic clusters and composite clusters due to the different abstraction level. Composite clusters can be further decomposed into basic clusters. The proposed decomposition algorithm is composed of two phases. In first phase IM in Eq. (3) is used to compute the independency metric so that the basic clusters can be identified and then removed from the graph. The remaining of the graph is treated as composite cluster on which further decomposition can be performed in second phase. Detailed description of the proposed decomposition algorithm is specified in Algorithm 3.
Introduction
WebStoreApp [39] is an open source online shopping application project based on GNU General Public License version 2.0 (GPLv2) which allows customers to shop brand new products as well as used items. WebStoreApp is written in Java using Struts 2.x and Hibernate 3.x frameworks. It Includes 62 classes, including 21 action classes (servlet), 12 beans classes (EJB), 12 interface classes and 1 POJO class implementation of Facade design pattern etc. It contains 10740 lines of code and provides the basic function for online shopping. Figures 5 and 6 show the user account screenshot and order processing screenshot of this application respectively.
In this section, VP-UML [40] is used to generate class diagram and sequence diagram from the source code of WebStoreApp. Visual Paradigm for UML (VP-UML) is a powerful UML CASE Tool from the OMG. All of the bean classes and their relationship are shown in Fig. 4. Several extracted sequence diagrams are depicted in Figs 5 and 6. Even though the dynamic diagram extracted from the source program is sequence diagram by using VP-UML, it is equivalent in semantics.
CSDG construction of WebStoreApp
This section a CSDG is constructed based on the extracted system diagrams according to the algorithm described in Section 4.4. Figures 4–6 are selected as the source graphs to construct CSDG. In view of readability and complexity of whole system, only parts of them are shown in Fig. 7.
Using the model slicing algorithm in Algorithm 2, Fig. 6 is sliced based on the given slicing criteria and the result is shown in Fig. 8.
Legacy system decomposition
According to the algorithm of system partition proposed in Section 6.2, the Independence Metrics are computed and the result is shown in Table 3.
After iteration computing for several times, the final partition result is shown in Table 4.
From the result of system partition, it can be concluded that there are five main modules in WebStoreApp which can be described as a system use case diagram in Fig. 9.
Conclusion and future work
Conclusion
Undoubtedly, more and more existing software will become to legacy ones due to the rapid development of information technology and constantly changing requirements. However, legacy systems are very likely to outgrow the graphical model which makes it a challenging task to understand and test such systems. The goal of this paper is to provide a suitable, effective and reusable approach to facilitating the evolution of legacy systems by slicing the extracted models derived from source codes.
In the introduction, we put up with the research questions of our paper. We conclude the paper by answering each of them.
What kind of models are (not) used to being sliced and why? It is difficult to slice extracted models as the concerns of legacy systems are scattered in different views in UML. As legacy systems have static and dynamic characteristics that display their functionalities and represent their structural and behavioural characteristics respectively. Both static and dynamic views of the system must be taken into consideration while understanding the functionalities and behaviours of the system. Therefore, in this paper, we choose class diagrams and communication diagrams as the source models to be sliced after considering the reverse engineering tools. What dependency relationships are (not) used in model slicing and why? Dependency relationships both in static and dynamic diagrams are taken into consideration. We group them as Relation Dependency, Control Dependency, Data Dependency and Member Dependency. Each of them is further classified. Relation Dependency is classified as Aggregation, Composition, and Generalization. Data Dependency is classified as Call, Read and Write. Member Dependency is composed of Operation and Attribute dependency. How effective/ineffective is model slicing in comprehension during software evolution? We applied the proposed approach in system clustering with a real-life case study (Section 7). The result shows the proposed framework simplified the extracted models by reducing the irrelevant or independent elements while considering the specified given element in the models and then facilitating the legacy system understanding by feeding the sliced models as the input while performing system decomposition.
We have reviewed the related research in Section 2. Even though there are various model slicing methods have been proposed for different purposes aiming at different types of models, few of them is proposed for software evolution. In Section 6, we applied the proposed method to decompose the legacy system based on the work of Luo et al. [37]. The proposed algorithm in this paper improves the accuracy comparing with the method in [37] for the following reasons. Firstly, the graph used in the decomposition algorithms in [37] is static diagram, while the graph used in this paper is a combination of static and dynamic diagram. Dynamic diagram depicts the interactivity between classes which cannot be shown in static diagram, and improves the accuracy when evaluating the independency among classes. Secondly, the initial node set of a cluster is obtained by setting the minimal path length as 4 in the algorithm in [37], while it is obtained by model slicing according to the dependency extent among classes in the proposed algorithm in this paper which is more accurate than that in [37].
Limitations
The first limitation of our method is dynamic diagrams cannot always be extracted from source code completely. Some models cannot satisfactorily be extracted from source code. These models are highly dependent on user interaction to identify external actors and their roles or are behaviour models of highly reactive systems with many external events. A developer/user community must be available in order to identify these external actors and the roles that they play in regards to the system or these actors with their roles must be available in up-to-date documentation.
The second limitation is the poor quality of the legacy system may affect the efficiency of the proposed approach. In this research, the legacy system is analysed by using reverse engineering tools. If the legacy systems are designed by standard programming, such as comments, document and coding format, etc., the proposed approach will be more effective. Otherwise, human intervention is needed to improve the quality of model extraction.
Future work is planned in various directions. First, regarding the evaluation method used, we utilized a real-life example as a case study (Section 7), together with basic analysis. It is comparably efficacious in illustrating the value of the work in general. While in a certain sense our evaluation approach can be improved with more slicing criteria and more case studies. Next, one related area of model slicing research that we have not pursued in this paper is that of methodology quality analysis, such as time complexity, completeness and consistency. In fact, there is very little work done with respect to ensuring engineered methodologies possess qualities. Last, the development of a comprehensive tool can be an important addition to improving the ease of use and efficacy of applying the proposed framework.
Footnotes
Acknowledgments
This work was supported by Natural Science Foundation of Liaoning Province (201602583).
Authors’ Bios
