Abstract
Data grid systems are utilized to share, manage, and process large data sets. On the other hand, an increasing number of applications with real-time constraints arise in several disciplines of science and engineering. The performance of a data grid system for real-time applications is highly dependent on the underlying job scheduling, data scheduling, and data replication algorithms and advance reservation mechanism. Thus, in the literature, there are numerous studies that propose solutions to the job/data scheduling, data replication, and advance reservation problems. In these studies, a number of simulators, emulators, or test beds have been used to evaluate the proposed algorithms. Furthermore, these simulators/emulators usually adopt fixed-grid models, which in turn dictate specific job/data scheduling and data replication mechanisms. In the literature, there is no unified framework for modeling grid systems with different architectures, which can allow researchers to develop new grid system models and evaluate them in a flexible manner. This paper presents a unique framework for modeling real-time data grid systems that attempts to unify a large class of job scheduling, data scheduling, and data replication algorithms based on several system services. Then, in order to enable the development of these algorithms under different system models, DGridSim is realized to be a multi-model discrete-event simulator, and its capabilities are exemplified by means of a set of simulation results. The main contribution of the research is DGridSim, which can model and simulate a variety of different data grid system models by means of several system services and their interactions.
1. Introduction
Data grids are envisaged to run real-time, data-intensive applications emerging from many disciplines of science and engineering, such as geographic information science, high-energy physics, and remote instrumentation.1–3 For example, the Large Hadron Collider (LHC) is a particle accelerator used to perform four major physics experiments. These experiments together will produce nearly 15 petabytes of experimental data each year. The data are initially stored at CERN and made available to scientists across the globe. In order to deliver these data through high bandwidth links and to execute data intensive jobs, a grid system called the Worldwide LHC Computing Grid (WLCG) has been built. 4 On the other hand, in climate research, climate simulations and observations studies will produce hundreds of petabytes of data. Based on such a massive data set, climate scientists strive to advance their understanding of phenomena such as climate variability, water cycle, sea-level rise, etc., which requires the use of peta-scale computing power and advanced networking and analysis infrastructure. 5 Fortunately, there is a variety of operational grid computing systems, e.g. EGI (European Grid Infrastructure), 6 Open Science Grid, 7 SEE-GRID (South Eastern European Grid-enabled e-Infrastructure), 8 EELA-2 (E-science grid facility for Europe and Latin America), 9 XSEDE (Extreme Science and Engineering Discovery Environment), 10 with a support for running compute- and data-intensive applications.
The data grid applications are typically composed of a number of jobs, each of which is associated with a real-time requirement as well as a demand for access to terabytes/petabytes of data. Thus, these jobs must be executed in compliance with their real-time requirements, which further require the real-time dissemination of job data, on a data grid infrastructure. In order to successfully complete the execution of real-time, data-intensive applications before their deadlines, a data grid infrastructure that manages all of its resources in a way to support the real-time operation, which will be referred to as a real-time data grid, must be architected. Unfortunately, in the literature, there is no such model available that thoroughly describes how a real-time data grid system should be built based on a set of commonly known grid services.
Based on the fact that the execution of jobs with data and deadline must be supported, a real-time data grid system model should at least include a job scheduling mechanism to find out computing elements to run jobs, a data scheduling means to discover storage and networking elements to retrieve job data, and a reservation method to guarantee the exclusive use of system resources to meet the associated deadlines. All three mechanisms, as well as data replication, as shown in the literature, are also crucial to boosting the system performance in terms of maximizing the number of jobs completed before their deadlines.2–4,11,12 Thus, a real-time data grid system model must accommodate job and data scheduling, data replication, and advance reservation mechanisms based on commonly known grid services, e.g. scheduling, replica location, reservation services, etc.
On the other hand, fine-tuning a data grid system, which also includes the development of job and data scheduling, data replication, and advance reservation algorithms, is a non-trivial task because of its highly sophisticated, distributed, dynamic, and heterogeneous nature. In order to mitigate such a complex task, according to the literature studies related to data grids have been generally carried out through simulators, such as Optorsim, 13 GridSim,14,15 and SimGrid. 16
Optorsim is a simulator designed by the European Data Grid Project. 13 Its design derives directly from the architecture of the data grid project. Authors state that the main focus of Optorsim is ‘to study the complex nature of a typical Data Grid and evaluate the effectiveness of replica optimization algorithms within such an environment’. It provides users with simulated architecture and programming interfaces of a data grid to evaluate and validate their replication strategies. The main advantage of Optorsim is that it performs two-stage optimization. Scheduling decisions are based on both the location of data and the status of network links between grid sites, while (re)optimization during the run-time of a job takes into account dynamic variations in the distribution of data and in the behavior of network resources.
The GridSim simulation tool is implemented in Java and employs a discrete-event simulation infrastructure named SimJava2. 14 It is implemented in layers for extensibility. GridSim allows the modeling of different resource characteristics and types. It supports a reservation-based mechanism for resource allocation and its computing nodes can run jobs in both space- or time-shared mode. GridSim has the ability to schedule compute- and/or data- intensive jobs. GridSim is very detailed in its simulation of the components of a Grid and introduces an economic model to manage the use of Grid resources through the buying and selling of resources. It is designed primarily to study job scheduling algorithms. GridSim was later extended to handle the simulation of data grids. 15
SimGrid is a simulator designed with the intent of studying job scheduling algorithms in heterogeneous platforms. 16 Starting from version 2, SimGrid has started to use analytical models in the data transfers in favor of the wormhole models used previously. SimGrid by default uses the flow level analytical TCP model MaxMin sharing strategy. With version 3.3, it has also implemented Vegas and Reno analytical TCP models. With pseudo-model integration of GTNets, it enables packet level simulation of data Grid networks as well.
A comparison of simulators, emulators, and experimental platforms for grid systems, including Optorsim, GridSim, SimGrid, ChicSim, 3 GangSim, 17 MicroGrid, 18 and Grid’5000, 19 can be found in the literature.20,21 In addition to these simulators, in this study DGridSim, a discrete-event simulator, is introduced. The main motivation behind DGridSim is to provide an all-in-one platform that enables the simulation of popular data grid system models, decoupled job and data scheduling, 3 hierarchical data organization, 4 hierarchical job scheduling, 22 etc., and the development of real-time job and data scheduling and data replication algorithms for these system models. Fortunately, this initial motivation has led to the development of a simulator that has contributed to the literature on the simulation of data grids in the following ways:
DGridSim is capable of simulating different data grid system models based on several well-defined services,3,4,22 which is really unique among the simulators in the literature with respect to the best knowledge of authors. Most simulators typically adopt a fixed (grid) system model and this pre-determined model is used in the development of all related algorithms.
DGridSim allows the simulation of job scheduling, data scheduling, and data replication algorithms altogether for its grid models so that a real data grid system with all relevant services can be simulated. To the best knowledge of authors, no other simulator has ability to evaluate job and data scheduling and data replication algorithms simultaneously for different scheduling and replication models available in the literature.
DGridSim focuses on real-time data grid systems that require the completion of all jobs before their deadlines. In order to provide such a guaranteed real-time service with jobs, all computing/networking/storage resources simulated in DGridSim must be reserved through a reservation mechanism by either a job scheduling or data scheduling mechanism. To the best knowledge of authors, the simulation of real-time data grid systems in which both job executions and data transfers must be completed before their deadlines has not been attempted before.
The rest of the paper is organized as follows: Section 2 defines what grid computing is and its concepts. Section 3 presents the fundamentals of a data grid system model in DGridSim. Section 4 describes four different data grid system models supported by DGridSim. Section 5 explains the software architecture in addition to the conceptual aspects of DGridSim. Section 6 introduces DGridSim GUI to highlight its unique features. Section 7 shows some simulation results related to different DGridSim models and provides the related analyses. Finally, Section 8 concludes the study.
2. Grid computing
According to Foster et al., 23 a ‘grid is a system that coordinates resource sharing and problem solving in dynamic, multi-institutional virtual organizations’. The term grid is chosen as an analogy to an electrical power grid, since these two concepts are similar in the following respects, namely a network infrastructure reaching almost everyone, generation of (electrical or computing) power in large installations, and simple integration of additional distributed providers (solar or wind energy, or special hardware). 24 It should be noted that the roots of grid computing could be tracked back to the late 1980s. Yet, the Globus Toolkit, a general middleware for grid systems, has popularized grid computing. After the introduction of grid computing, a variety of grid systems have been proposed, which include computational grids, e-science grids, desktop grids, data grids, etc. 11
Within the grid computing framework, a virtual organization is a collaboration of users or organizations to share resources in a controlled fashion, so that members may collaborate to achieve a shared goal. 23 Virtual organizations define the resources available for its collaborators and the policies for sharing the resources in addition to the protocols and mechanisms for applications to determine the suitability and accessibility of available resources. For example, a virtual organization may be composed of a hierarchy of regional, national, and international virtual organizations and sharing of data collections is governed by the relationship among these organizations in data grid architecture. 2
A grid can also be viewed as a seamless computing environment in which a wide variety of resources, including supercomputers, storage systems, and scientific instruments that are geographically distributed and owned by different organizations, are shared in order to solve large-scale computational and data intensive problems in science, engineering, and commerce. 25 In order to build such a computing environment, a grid infrastructure must have a number of capabilities, including discovery of resources, uniform access to remote computational and data resources, job and data request submission, monitoring, steering of job execution, job scheduling, publication and replication of data sets, etc. These and other related capabilities are provided by various grid components that are grouped into four layers:2,23,25grid fabric (all resources that are accessible from anywhere on the internet), core grid middleware, user-level middleware (application development environments, programming tools, and resource brokers), and grid applications and portals. Among the four layers, the grid middleware is software that plays a prominent role in turning a radically heterogeneous environment into a virtually homogeneous one. It should be emphasized here that DGridSim follows this layered architecture in its design to closely mimic grid systems, as will be explained in the next section.
3. Data grid system in DGridSim
DGridSim mimics a layered grid architecture composed of four layers, namely grid fabric, communication, grid services, and applications, which were inspired by previous studies.2,23,25 DGridSim implementations of each layer are explained below in a bottom-up fashion.
3.1. Grid fabric
DGridSim models a data grid system by a collection of sites (virtual organizations) interconnected by an internet network. A site, on the other hand, is composed of computing elements to run user jobs, storage elements to keep job data, and a local area network to provide a connection between computing and storage elements.23,25 A site with only computing elements or only storage elements (network attached storage) is also possible. Among grid sites, DGridSim designates one of them as grid management site where several grid level services are hosted. Furthermore, a computing element in every site is allocated to provide a variety of site level services.
Local area network in a site is composed of a gateway router and dedicated links between gateway router and every computing and storage element, which results in a local area network with star topology. Internet, on the other hand, is a network that provides communication among computing and storage elements of different sites. In the internet network, there are edge routers to which site gateway routers are connected and core routers to which edge or core internet routers are attached by dedicated links. There is a random network topology to form the connections among edge and core routers in the internet.
Within the scope of the grid fabric, DGridSim needs to simulate the use of resources, computing and storage elements, and network links that will be shared by the applications running on the system, which will be further elaborated in Sections 4 and 5.
3.2. Communication
In order to support real-time computing on a data grid, the network infrastructure must be capable of assuring end-to-end delay bounds for the data transfers. The problem of providing end-to-end delay bounds (or QoS guarantees in general) for applications has been the subject of many studies in the grid community,2,12,26,27 as well as the network community.28–31
In DGridSim, the data transfers occur either between two storage elements to replicate data, or from storage elements to computing elements to simulate the data accesses from respective jobs. Moreover, all data transfers are associated with a deadline. Based on previous and the related studies,26–31 when a job data with deadline needs to be moved from a source to destination, DGridSim either ensures the timely delivery of data, or rejects the related data transfer request by means of its data scheduling mechanism, which will be explained in detail Sections 4 and 5.
3.3. Grid services
In DGridSim, simulation of job scheduling, data scheduling, and data replication are all defined based on a set of services. These services are split into two groups, namely grid level services (e.g. grid job submission service, grid data scheduling service, etc.) provided by grid management site and site level services (e.g. site job submission service, site data scheduling service, etc.) supported by site management computing element. All DGridSim services are split into five groups, which are job scheduling, data scheduling, data replication, advance reservation, and information, according to their main role during a simulation.
3.3.1. Job Scheduling
According to several comprehensive surveys,11,32–35 a grid job scheduler can be organized in three different ways: centralized, hierarchical, and distributed. Consequently, DGridSim is implemented to support these three job scheduling models, as indicated in Table 1. It should be emphasized that supporting multiple job scheduling models is a unique feature of DGridSim among the simulators in the literature.20,21
Data grid system models supported by DGridSim.
In order to uniformly model the job scheduling models, DGridSim includes three grid level services—grid job submission service, grid job scheduling service, and grid job dispatch service—and three site level services—site job submission service, site job scheduling service, and site job dispatch service. Note that these services (and all other services) will be software components in a real Grid implementation, and the scheduling services will be all together in charge of mapping jobs to Grid resources under multiple criteria and Grid environment conditions.
3.3.2. Data Scheduling
Within the framework of data grids, the design of data-aware job scheduling algorithms has become an important problem.2,35–38 In addition, with the advance of the next generation networks with bandwidth guarantees, 31 several centralized data scheduling algorithms are proposed to schedule the data transfers with deadline on such networks.39–41
Based on the studies related to the data-aware job scheduling and data scheduling in the next generation networks, DGridSim is designed in such a way that it can provide the simulation of job and data scheduling algorithms together. According to Table 1, there are two different data scheduling models supported by DGridSim, namely centralized and hierarchical.39–42 That is, DGridSim allows a data-aware job scheduling algorithm to make use of both data scheduling models in a decoupled fashion in Model I, II, and IV. However, in Model III, a single coupled job and data scheduling algorithm will be in charge of producing both job and data scheduling decisions.
Similar to the modeling of job scheduling, DGridSim includes three grid level services, grid data submission service, grid data scheduling service, and grid data dispatch service, and three site level services, site data submission service, site data scheduling service, and site data dispatch service.
3.3.3. Data replication
Different data grid system models exist with respect to the organization of data sources. According to available surveys,2,43 federative and hierarchical data organization models are the most prevalent ones, and most of the dynamic data replication algorithms in the literature assume either of the two models. 43 DGridSim supports both the federative and hierarchical, which is a multi-tier computing model inspired by the WLCG, 44 data organization models.
Independent from the data organization model, the data replication algorithms in the literature can be grouped into two main classes, namely centralized and distributed (decentralized), with respect to the organization of replication decision-making authority.43,45 Furthermore, centralized data replication algorithms are naturally push-based (source-driven), 46 and distributed data replication algorithms can be implemented in either push-based or pull-based (receiver-driven) fashions.3,4 Consequently, in order to encompass most of the data replication algorithms,2,43 DGridSim is realized to support the simulation of centralized and distributed data replication algorithms for both hierarchical and federative data organization models, which is a unique capability among the simulators,20,21 as shown in Table 1.
In order to model such different data organization and data replication architectures, DGridSim consists of two grid level services, grid data replication service and grid replica location service, and two site level services, site data replication service and site replica location service.
3.3.4. Resource reservation
In the real-time data grid system, the key to providing performance guarantees with the real-time applications is the resource reservation model and related services that implements this reservation model. Fortunately, in the literature, a wealth of studies has addressed the advance reservation of computing, networking, and storage elements.12,27,42,47,48 Inspired by these studies, an advance reservation model is realized in DGridSim for the reservation of computing bandwidths in the computing elements, storage space and read/write bandwidths in the storage elements, and bandwidths in the network links. In DGridSim, there are two services related to the reservation of all grid resources, namely grid reservation service and site reservation service.
3.3.5. System information
The grid or site level job and data scheduling algorithms need the system information. The required information can be static (computing element powers, link bandwidths, etc.) as well as dynamic (available link bandwidths, network topology, etc.). In order to provide such static and dynamic information about various resources, grid information service for the whole grid and site information service for every site are realized by DGridSim. These information services are also standard in typical grid systems.1,2,22,23
3.4. Applications
The data grid systems are typically deployed to run the application models of independent tasks, bag of tasks, and workflows. 2 Among these three models, DGridSim currently supports the data intensive applications that can be considered as a collection of independent tasks (jobs), each of which requires multiple job data, similar to the astronomy image-processing application. 49 In the literature, there are numerous studies that address the scheduling problem of independent jobs on grids.2,11,33,35,49
4. System models in DGridSim
DGridSim supports the four different system models listed in Table 1, all of which are inspired from the studies in the literature, as explained in the previous section, so as to provide a unified and unique platform for the simulation of job scheduling, data scheduling, and data replication algorithms proposed or to be developed for the most common Data Grid system models in the literature.
In DGridSim, there are two fundamental mechanisms, namely job scheduling and data scheduling. Furthermore, both job and data scheduling mechanisms should exist at both grid level and site level. Consequently, these scheduling mechanisms will be elaborated separately at the grid level and site level in detail in the following sections. On the other hand, the data replication is complimentary to both job and data scheduling, and it is built on top of the data scheduling mechanism.
Thanks to the design of DGridSim, in which the modularity and reusability of its components are highly regarded, a single figure (Figure 1) is considered to be sufficient to describe all grid/site level job/data scheduling mechanisms. With respect to Figure 1,
All grid/site level job/data scheduling mechanisms can be completed in six steps, which are Submission (1), Enqueue/Notification (2), Scheduling Initiation (3), Scheduling (4-a, 4-b, and 4-c), Scheduling Completion (5), and Dispatching (6).
All star-labeled services will always be at the same level, that is, they are either all grid level or site level services. For example, while the grid level job/data scheduling mechanism is described, they will be all grid level services, such as Grid Job/Data Submission Service, Grid Job/Data Scheduling Service, Grid Information Service, Grid Reservation Service.
Submission, Scheduling, and Dispatch Service components are all deployed for either the job or data scheduling mechanism regardless of grid or site level. For example, while the grid/site level data scheduling mechanism is explained, they will become Grid/Site Data Submission Service, Grid/Site Data Scheduling Service, and Grid/Site Data Dispatch Service.
Interactions between the services of different levels are possible: Grid Job Dispatch Service to Site Job Submission Service in Models I, II, and III and Site Data Scheduling Service to Grid Data Submission Service in Models I and IV.
The solid lines represent the service interactions common to all system models, e.g. User/Service submits its job and data request through a Submission Service component; while the dashed lines show optional interactions, e.g. a grid level job scheduling algorithm in Grid Job Scheduling Service may obtain reservation tables from Grid Reservation Service.

Job and data scheduling and data replication mechanisms supported by DGridSim.
The way DGridSim implements these radically distinct system models on a set of common services is further elaborated in the following sections.
4.1. Model I grid systems
According to Table 1, DGridSim realizes hierarchical job and data scheduling mechanisms that enable decoupled job and data scheduling for Model I. Decoupling job and data scheduling results in employing different algorithms to schedule job and data separately.
4.1.1. Job scheduling
DGridSim instantiates a single grid level job scheduling mechanism for whole system (indicated as Grid=Yes in Table 1) and a site level job scheduling mechanism for each site (marked as Site=Yes in Table 1). In Model I, Grid Job Scheduling Service uses a job scheduling algorithm so as to assign submitted jobs to grid sites, while Site Job Scheduling Service calls for a site level job scheduling algorithm in order to find out computing elements on which the job deadlines will be met. Specifically, Figure 1 shows how DGridSim implements the proposed grid/site level job scheduling mechanisms:
[Submission]: Grid/Site Job Submission Service receives jobs from User/Grid Job Dispatch Service.
[Enqueue/Notification]: Grid/Site Job Submission Service places any incoming job in a job queue and informs Grid/Site Job Scheduling Service.
[Scheduling Initiation]: Grid/Site Job Scheduling Service calls for a job scheduling algorithm which can be either an online or offline algorithm.
Online job scheduling algorithm: All incoming jobs are immediately scheduled.
Offline job scheduling algorithm: Job queue is periodically checked and retrieved for job scheduling.
[Scheduling]: Job scheduling:
Based on the information required by the job scheduling algorithm, Grid/Site Job Scheduling Service may query one or more services or no service at all. For example, Grid Job Scheduling Service may interact with Grid Information Service for system-wide information, Grid Replica Location Service for job data locations, or Grid Reservation Service for resource reservation tables.
Once all required information are obtained, a hierarchical grid level job scheduling algorithm determines a site for each submitted job, which completes the job scheduling phase at the grid level. On the other hand, a site level job scheduling algorithm tries to discover a computing element and a deadline-satisfying reservation window on this computing for every submitted job to its site. If the job scheduling algorithm cannot find out a deadline-satisfying reservation window, Site Job Scheduling Service deletes this job from the queue.
Otherwise, Site Job Scheduling Service (SJSS) submits a reservation request with reservation window for the chosen computing element to Site Reservation Service. If Site Reservation Service notifies SJSS about the failure of computing element reservation, SJSS deletes the job from the queue.
Otherwise, SJSS submits a data transfer request to Site Data Submission Service, where the request consists of information related to job data, a deadline for all data transfers, and a computing element on which the related job is assigned. If SJSS is notified about a failed data transfer request by Site Data Submission Service, SJSS cancels the computing element reservation and deletes the corresponding job from the queue. Otherwise, such a job is now guaranteed to finish before its deadline.
[Scheduling Completion]: Grid/Site Job Scheduling Service sends a successfully scheduled job to Grid/Site Job Dispatch Service and removes it from the job queue.
[Dispatching]: Grid Job Dispatch Service forwards a scheduled job to the respective Site Job Submission Service, while Site Job Dispatch Service invokes the job on the chosen computing element according to the related reservation. Note that all job data will be available at this computing element before the job execution starts, as guaranteed by Site Data Submission Service.
4.1.2. Data scheduling
With respect to Table 1, DGridSim includes a single grid level data scheduling mechanism for whole system (indicated as Grid=Yes in Table 1) and a site level data scheduling mechanism for every site (marked as Site=Yes in Table 1). In Model I, Grid Data Scheduling Service runs a data scheduling algorithm in order to compute deadline-satisfying paths for the inter-site data transfers. On the other hand, Site Data Scheduling Service performs the task of scheduling in-site data transfers for the job data that are available within site and submitting a data transfer request to Grid Data Submission Service for inter-site data transfers for those job data that cannot be retrieved within site. Grid/site level data scheduling mechanism realized by DGridSim for Model I in Figure 1 works as follows:
[Submission]: Grid/Site Data Submission Service receives data transfer requests from Site Data Scheduling Service/Site Job Scheduling Service.
[Enqueue/Notification]: Grid/Site Data Submission Service inserts any incoming data request in a queue and informs Grid/Site Data Scheduling Service.
[Scheduling Initiation]: Since the paths over which job data will be transferred have not been determined for any data request in the queue yet, Grid/Site Data Scheduling Service calls for either an online or offline data scheduling algorithm to schedule inter-site/in-site data transfers.
[Scheduling]: Data scheduling:
Grid/Site Data Scheduling Service queries Grid/Site Replica Location Service in order to obtain the source storage elements for job data. In addition, with respect to the information required by the data scheduling algorithm, Grid/Site Data Scheduling Service may query Grid/Site Information Service or Grid/Site Reservation Service.
After having all related information, a grid/site level data scheduling algorithm attempts to determine all inter-site/in-site paths and related storage and network elements with available reservation windows in order to meet the job deadline. If the data scheduling algorithm fails in finding such path(s) based on the available storage and link bandwidths, Grid/Site Data Scheduling Service deletes this data request from the queue. Furthermore, Grid Data Scheduling Service notifies Site Data Scheduling Service through Grid Data Submission Service accordingly.
Otherwise, Grid/Site Data Scheduling Service submits a reservation request to Grid/Site Reservation Service only for those reservation windows and resources that are determined by the data scheduling algorithm. If the reservation service is unable to honor this reservation request, Grid/Site Data Scheduling Service deletes the data request from the queue. Otherwise, all inter-site/in-site job data are now guaranteed to be at its target computing element to meet the job deadline. Grid Data Scheduling Service notifies Site Data Scheduling Service via Grid Data Submission Service whether or not this data transfer request succeeds, which completes the grid level data scheduling. Note that the site level data scheduling needs for the following extra phase for the inter-site job data, which really initiates the grid level data scheduling.
If there are some job data (inter-site data) that cannot be found within site, after running the site level data scheduling algorithm and succeeding in making all reservations related to the in-site data transfers, Site Data Scheduling Service submits a data transfer request for the inter-site data to Grid Data Submission Service so that they can be transferred to the target computing element in this site. If Grid Data Submission Service notifies Site Data Scheduling Service about the failure of its data transfer request, Site Data Scheduling Service cancels all in-site storage and network element reservations for the in-site data transfers and deletes the data request from the queue.
[Scheduling Completion]: Grid/Site Data Scheduling Service sends a successfully scheduled data transfer request to Grid/Site Data Dispatch Service and removes it from the queue.
[Dispatching]: Grid/Site Data Dispatch Service is responsible for completing all data transfers for a job, each of which starts from a source storage element, finishes at target computing element, and uses network elements between them. Note that the usage of all storage and network elements during a data transfer by Grid Data Dispatch Service conforms to the related reservations made by the scheduling service.
4.2. Model II grid systems
From now on, instead of describing job and data scheduling mechanisms in detail for each supported model, the differences from the respective Model I mechanisms will only be highlighted. As will be clear in the following elaborations, regardless of job or data scheduling at grid or site level, the fourth step (job or data scheduling phase) mostly needs to be modified with respect to the model.
In Model II, DGridSim instantiates a grid level job scheduling mechanism in which Grid Job Scheduling Service (GJSS) employs a centralized job scheduling algorithm, which is opposed to the hierarchical one in Model I. The centralized job scheduling algorithm determines a computing element (instead of a site) and a deadline-satisfying reservation window on this computing element for every job. Once such a reservation window is found, GJSS will reserve it through Grid Reservation Service, and then submit a data transfer request to Grid Data Submission Service to make the job data available before its deadline at the computing element. It should be emphasized that, in Model II, job scheduling is completely decoupled from data scheduling. That is, a grid level data scheduling mechanism will be responsible for making data scheduling decisions and making job data available at a specific computing element.
DGridSim sets up a site level job scheduling mechanism for every site in which Site Job Scheduling Service runs a dummy scheduling algorithm that basically corresponds to repeating the scheduling decisions made by the grid level centralized job scheduling algorithm.
Similar to Model I, Model II instantiates a grid level data scheduling mechanism. The main difference between Model I and Model II is due to the source of data transfer request. In Model I, Grid Data Submission Service receives data transfer requests from Site Data Scheduling Service in Step 1, and Grid Data Submission Service notifies Site Data Scheduling Service in return whether or not this data transfer request succeeds in Step 4-c. However, in Model II, Grid Job Scheduling Service submits all data transfer requests, and Grid Data Submission Service informs Grid Job Scheduling Service accordingly.
Since Grid Job Scheduling Service submits all data transfer requests, and they are all scheduled by a centralized data scheduling algorithm for every job submitted to the system, DGridSim does not instantiate a site level data scheduling mechanism for Model II.
4.3. Model III grid systems
Among the four different models, Model III is the only one with the coupled job and data scheduling. Model III is quite similar to Model II, so it will be compared against Model II. In Model III, DGridSim creates a grid level job scheduling mechanism with coupled job and data scheduling algorithm, which is different from Model II with a centralized, decoupled job scheduling algorithm As a result, the centralized, coupled job and data scheduling algorithm not only discovers a computing element and a deadline-satisfying reservation window on this computing element for every job, but also determines reservation windows on storage element(s), from which job data will be fetched, and on network elements to be used during job data transfers to this target computing element. Provided that a set of reservation windows that guarantee the timely completion of job are found, Grid Job Scheduling Service will reserve all of them together through Grid Reservation Service, and then submit a data transfer request to Grid Data Submission Service to start both in-site and inter-site data transfers according to the related data transfer reservations.
Different from Model II with a centralized, decoupled data scheduling algorithm, Model III instantiates a grid level data scheduling mechanism with dummy data scheduling algorithm. That is, the grid level data scheduling mechanism of Model III has the sole task of carrying out the already scheduled either in-site or inter-site data transfers, and all the scheduling of data transfer requests are handled by Grid Job Scheduling Service.
Similar to Model II, DGridSim sets up a site level job scheduling mechanism with dummy job scheduling algorithm for every site and it does not foster a site level data scheduling mechanism at all.
4.4. Model IV grid systems
Common to Model I, II, and III is a single grid level job scheduling mechanism. However, in Model IV, there is no grid level job scheduling mechanism since Model IV schedules jobs in a distributed fashion without the coordination of a central authority.
In Models I, II, and III, there is a site level job scheduling mechanism. However, Model II and III employ a dummy job scheduling algorithm, while Model I requires a site level job scheduling algorithm to assign incoming jobs to computing elements within its site. On the other hand, Model IV needs a site level job scheduling mechanism with distributed job scheduling algorithm. In the distributed job scheduling, Site Job Submission Service receives jobs only from User within this site or from Site Job Dispatch Service of another buddy site of this site. Then, the job scheduling algorithm first tries to determine a computing element within its site and a deadline-satisfying reservation window on this computing element for every job. If it succeeds, the rest of the site level job scheduling process is the same as that of Model I. Otherwise, instead of immediately deleting such a job from the queue as in Model I, Model IV uses a distributed scheduling algorithm so as to possibly migrate this job to one of its buddy sites. All related details about the buddy set based distributed real-time task scheduling could be found in Shin and Chang. 50
Model IV requires a single grid level data scheduling mechanism and a site level data scheduling mechanism for every site. These data scheduling mechanisms in Model IV are exactly same as that of Model I.
4.5. Data replication
Another unique feature of DGridSim is its ability to support the simulation of various data replication algorithms proposed for both federative and hierarchical data organization models. As shown in Table 1, DGridSim supports the simulation of distributed-pull/push and centralized-push data replication algorithms for Model I and IV, centralized-push data replication algorithms for Model II and III. The realization of data replication algorithms in DGridSim can be split into three phases, and these phases are briefly elaborated in the following:
Collecting statistical information using data transfer requests: In order to obtain some statistical information related to the inter-site data transfer requests that are required by the data replication algorithm of interest, Grid Data Replication Service coordinates with Grid Data Scheduling Service in the centralized-push model. On the other, Site Data Replication Service interacts with Site Data Scheduling Service in the distributed-pull model and Grid Data Scheduling Service in the distributed-push model.
Running data replication algorithm: Grid Data Replication Service runs an offline centralized-push based algorithm, while Site Data Replication Service calls for an online distributed-pull/distributed-push based algorithm. In general, a data replication algorithm uses the statistical information collected so far and decides on possible job data replications, each of which is composed of a particular job data to replicate and source and/or destination sites of this replication with respect to the data organization model.
Initiating data transfers between source and destination sites for data replication: In order to realize the replication decisions made by a data replication algorithm, Grid Data Replication Service submits a data transfer request to Grid Data Scheduling Service in the centralized-push model, while Site Data Replication Service informs Site Data Scheduling Service, which in turn sends a request to Grid Data Scheduling Service in the distributed-pull/distributed-push models. It should be emphasized that Grid Data Scheduling Service takes cares of regular inter-site data transfer requests together with the ones due to the data replication.
4.6. Advance reservations
As a real-time data grid system simulator, DGridSim allows the advance reservation of computing elements, storage elements, and links. Specifically, Grid/Site Reservation Service keeps track of all reservations that have been made for those resources under its control in a reservation table. In the reservation table, for every resource, there exists an exclusive row into which reservation objects are inserted. Each reservation object, on the other hand, is defined by a quartet: reservation ID, reservation start time, reservation finish time, reservation size.
In DGridSim, Grid/Site Job Scheduling Service and Grid/Site Data Scheduling Service components can query a Grid/Site Reservation Service for obtaining its reservation table and later submit one or more reservation requests to them so as to guarantee the exclusive use of some resources. The details of advance reservation mechanism are explained below:
Querying reservation service: A scheduling service sends a reservation query message to a reservation service, where a reservation query message is composed of resource IDs or resource type, reservation start time, reservation finish time, and reservation size. Upon receiving such a query message, a reservation service looks up those rows in its reservation table that correspond to resource IDs or resource type in order to find one or more reservation windows in each row, where any window must fit in between reservation start time and reservation finish time while providing at least reservation size. All such reservation windows found and the related resource IDs are returned in a query response message to the related scheduling service.
Making a reservation: After receiving query response message and then discovering one or more windows to meet job/data transfer deadline, a scheduling service needs to send a reservation message composed of one or more reservation creation objects of the form {resource ID, reservation start time, reservation finish time, reservation size}. After successful insertion of all reservation objects into the reservation table, a reservation service returns a reservation ID for the further inquiries to scheduling service.
Canceling a reservation: When a scheduling service decides on cancelling some reservations, it sends a cancel reservation message to a reservation service where the message consists of the reservation ID of those reservations to be canceled. Once a cancel reservation message has been received, this reservation service removes those reservation objects from its reservation table.
5. Software implementation
DGridSim is a simulator that has been designed to achieve three main objectives: modularity, extensibility, and layered architecture. Modularity in this study is defined as the ease of modifying algorithms in existing services. Extensibility is defined as adding new services without burden. Layered architecture means that DGridSim is built in layers where the higher layers make use of the functionalities provided by the lower layers.
DGridSim is written in C++ programming language using C++SIM20 discrete-event simulator library. 51 The C++SIM20 library allows the development of process-oriented discrete-event simulation programs. A program using C++SIM20 library models a system as a collection of C++SIM20 processes which interact with each other by using the C++SIM20 structures. In the following, the software implementation details of DGridSim using C++SIM20 (CSIM shortly) are briefly explained.
5.1. System architecture
DGridSim has a layered system architecture with a well-defined object hierarchy. The hierarchy between objects is established using interfaces (Abstract Base Classes in C++), which provides a flexible and expandable environment for researchers.
DGridSim has four different layers: at the bottom, Base Layer has been developed using C++SIM20 library and it contains low level simulation utilities, such as registering events, advancing simulation time, etc. In the middle, Grid Component Library (GCL) Layer defines interfaces and contracts for all grid services. Furthermore, all four system models have been established by means of these grid services in this layer. User Code Layer at the top consists of the standard and specialized implementations of the interfaces defined in GCL Layer and it is the layer into which researchers can inject their own implementations of different grid algorithms. Finally, Graphical User Interface Layer interacts with GCL Layer and User Code Layer to set up the current simulation scenario as explained in the next section.
If the predefined service contracts or grid models are not adequate, new interfaces should be defined in GCL Layer. It is expected that no intervention should be necessary to Base Layer.
5.2. Resource components
In DGridSim, grid fabric is composed of three types of resource components: computing elements, storage elements, and network links. These components have some definite properties, but they do not have any business logic. For example, a storage element has properties such as storage space and read/write bandwidth. Consequently, the resource components are passive in nature, and resource managers are required to perform necessary operations on them. In DGridSim, there is a single Computing Resource Manager, Storage Resource Manager, and Link Resource Manager in order to manage the related computing, storage, and network resources, respectively. It should be noted that these resource managers are not shown in Figure 1, since they are included in Grid Reservation Service and Site Reservation Service.
DGridSim adopts the style of space-shared computing for its computing elements, that is, a computing element can run a single job at any given time. It associates a computing bandwidth parameter in terms of MIPS (Millions of Instructions Per Second) rating with every computing element so as to enable a job scheduling algorithm to compute how much time it will take the execution of and the reservation size of a job on a particular computing element. Specifically, DGridSim simulates the space-shared use of computing elements for running jobs as follows:
In order to ensure that whole computing bandwidth of a computing element is used up by an individual job, Grid/Site Reservation Service with the help of Computing Resource Manager always reserves a reservation window on any computing element for the execution of a single job.
A job is assigned different states, namely JOB_ SUBMITTED, JOB_RUNNING, etc., to keep track of its life cycle during the simulation of its execution in the system. When a job is scheduled to a computing element, its state is set to JOB_SCHED ULED_TO_PROCESSOR.
Once the state of a job is JOB_SCHEDULED_ TO_PROCESSOR, Site Job Dispatch Service waits until the scheduled start time of this job, and simulates its execution on the related computing element by means of passing the simulation time as much as the job execution time on the computing element.
In DGridSim, there are one or more storage elements within a site, and every storage element and job data are accompanied by storage capacity and job size parameters in megabytes (MB). During a simulation, storage elements keep master and replica copies of job data: Master job data are created on the storage elements depending on the data organization model chosen at the start of simulation, and exist until the end of simulation. They do not consume any storage space, that is, a storage element can keep as many master job data as required by the simulation scenario. On the other hand, there is no replica job data on the storage elements at the start of simulation. While simulation progresses, jobs start submitting data transfer requests. Meanwhile, Grid/Site Data Replication Service may call for a data replication algorithm according to the data replication model of simulation scenario. If this algorithm determines that a job data must be replicated on another storage element, the related replication service initiates a data transfer request with source and destination sites and job data to be replicated. Then, upon receiving such a request, Grid Data Scheduling Service carries out an inter-site data transfer to copy this job data to the remote destination storage element. In this way, the number of replica job data starts from zero and increases till the end of simulation. Different from the master copies, the replica job data consume storage space, so each storage element can keep a limited number of them. Furthermore, they can be deleted for opening some space for the new replica job data, as it will be explained shortly.
Based on the aforementioned explanations, the only way of consuming the storage space on storage elements in DGridSim is to copy replica job data on them through inter-site data transfers by means of Grid Data Scheduling Service. Specifically, DGridSim simulates the use of storage elements for storing replica job data as follows:
During an inter-site data transfer, a replica job data must not be deleted from the source storage element until the end of transmission. In order to ensure this, Grid Reservation Service is inquired to put a lock on this job data before the start of data transfer with the help of Storage Resource Manager. Note that it is possible put multiple locks on the same job data for securing different data transmissions. Once the data transfer is completed, Grid Reservation Service is asked to remove a lock on the job data.
After having a lock on the source job data and finding a path to the destination storage element, Grid Reservation Service is queried if the destination storage element has enough available capacity to keep this new replica job data. Grid Reservation Service returns one of the possible three results according to the reply from Storage Resource Manager: (i) There is sufficient unused storage space. (ii) The amount of unused storage space plus the total size of currently unlocked replica job data is equal to or greater than the size of new replica. In this case, Storage Resource Manager calls for a data replacement algorithm to delete one or more replica job data to free enough unused space for new replica. (iii) The difference between the capacity of storage element and the total size of currently locked replica job data is less than the size of new replica. In case (iii), Grid Data Scheduling Service drops the related data transfer request, so there will be no change in the status of destination storage element. In the first two cases, however, just enough storage space on the destination storage element is initially reserved, so unused storage space cannot be used for keeping any other job data than the current one. Then, at the end of data transfer, this reservation is cancelled, and its unused storage space is decreased accordingly.
It should be emphasized that DGridSim instantiates Grid and Site Replica Location Service components to keep track of the current locations of all master and replica job data available on the storage elements throughout the system.
In addition to the storage elements, every computing element is further equipped with a storage device with virtually infinite capacity so that it can provide enough storage space to store any job data temporarily during its execution. Furthermore, this storage device can be written at the maximum speed of device write bandwidth in MB/s during a data transfer from a source storage element. Note that there is no read to be performed from any storage device, which implies that a computing element cannot be the source of any data transfer. On the other hand, every storage element can be read and written at the maximum speeds of storage read bandwidth and storage write bandwidth in MB/s, respectively. In DGridSim, in order to simulate the use of these read and write bandwidths, computing element-gateway links and storage element-gateway links within sites are used, so the simulation of read/write bandwidths is no different than that of link bandwidths, as it will be explained below.
As a real-time data grid simulator, DGridSim provides end-to-end guaranteed network service for all data transfers. In order to provide such a service, first of all, DGridSim associates a bandwidth parameter in MB/s with every network link, and models any data transfer as a flow with source and destination elements, start and finish times, and job data size. Note that modeling the data transfers by flows is due to the fact that the connection-oriented next generation networks provide flows with bandwidth guarantees as well.28–31 During the simulation of execution of data-intensive jobs, a number of flows will be simultaneously scheduled on the network, so they all share the bandwidth of network links. Specifically, DGridSim simulates the use of network links for transferring job data as follows:
Grid Reservation Service and Site Reservation Service manage the sharing of links with the help of Network Resource Manager. These components ensure that the amount of flow traversing a link never exceeds its bandwidth in any reservation window.
Grid Data Scheduling Service and Site Data Scheduling Service components are in charge of computing the bandwidth requirements of flows via the related data scheduling algorithms. While an inter-site flow is being scheduled, Grid Reservation Service is queried to provide all links with reservation windows that meet the reservation constraints, e.g., reservation start and finish times and reservation size. Based on the returned set of links, a data scheduling algorithm attempts to find out a path from a source storage element to either another storage element or computing element. When a path is found, Grid Reservation Service is asked to insert a reservation creation object for every link on this path.
After successfully establishing a path, Grid Data Dispatch Service waits until the scheduled start time of data transfer, and simulates the use of network links by means of passing the simulation time as much as the data transfer time of the related job data on this path.
5.3. Service components
DGridSim simulates the function of each grid service, such as submission, scheduling, replication, etc., by means of a distinctive CSIM process. It should be noted that a CSIM process is not a real UNIX process; it is rather like a forked child thread in UNIX terminology. The main properties of CSIM processes pertaining to the modeling Grid services are as follows: (i) A CSIM process can be either active or suspended. When it is active, it continues to be active until it suspends itself by executing either a hold statement to advance the simulation time (e.g., waiting for the start time of a scheduled job) or a wait statement to wait for an event to occur (e.g., waiting on the global done event). Note that when a process is active, no simulated time passes. Once the time specified in a hold statement elapses, a suspended process becomes active again. (ii) Processes appear to operate simultaneously with other active processes at the same points in simulated time. This enables the illusion of all Grid services in a chosen simulation scenario running in parallel. Consequently, when processes (or services) are active, they will be performing their related functions that are described in the previous section in a parallel fashion in the simulated time. (iii) DGridSim empowers its processes to interact with each other through static function calls, so their relatively complicated interactions can be adequately modeled.
With the start of simulation, the main simulation process, namely sim, first becomes active. Inside of this process, all Grid services are started to set up the simulation environment. For example, initialization of all Model I services is shown in Figure 2. In this figure, Construct Simulation step creates sim process, which will further instantiate a few simulation primitives such as Grid Builder, Management Site Builder, Site Builder, and Grid. After its creation, sim process invokes Grid Builder first. Grid Builder yields and configures a user process, grid fabric, including routers, links, etc., and reservation service. Next, Grid Builder asks Management Site Builder to create a management site, and sets this management site. Then, Grid Builder inquiries Site Builder to create all other sites, and adds each of them to the grid. Finally, Grid Builder notifies Grid primitive to run all processes that correspond to all grid sites, reservation service, and user, which completes the initialization of the simulation environment.

Starting all grid services for Model I.
6. DGridSim GUI
DGridSim has a graphical user interface (GUI) that greatly facilitates the use of simulator and further highlights its unique features. An example snapshot of DGridSim for General tab is given in Figure 3. In order to create a simulation scenario, a user should first set the system model using Grid Model segment under General tab, which will make GUI show all currently implemented algorithms with respect to the chosen model.

A snapshot from DGridSim GUI for Model I.
In order to simulate data grid systems (Figure 3), Model I has been chosen. In Model I, the job scheduling is carried out in a hierarchical fashion. In DGridSim, seven different grid level online/offline scheduling algorithms are realized: Random, EDF (Earliest Deadline First), MCTF (Minimum Completion Time First), MCTFwDP (Minimum Completion Time First with Data Present), MMwDP (MinMin with Data Present), RT MCTFwDS (Real-Time Minimum Completion Time First with Data Staging) and RT MMwDP (Real-Time MinMin with Data Present), where EDF, MMwDP, and RT MMwDS are examples of offline scheduling algorithms. It should be emphasized that some job and data scheduling algorithms in DGridSim have been already proposed in the literature, e.g. Random, EDF, MCTF, etc. Yet, the others are developed to guide researchers how to implement sophisticated algorithms, e.g. RT MCTFwDS and RT MMwDS, based on some grid services implemented in DGridSim. In addition to seven grid level job scheduling algorithms, DGridSim offers an online site scheduling algorithm, namely RT Max Max (Real-Time MaxMax).
Similar to the job scheduling, in Model I, the data scheduling is carried out in hierarchical fashion as well. DGridSim currently supports three different grid level offline data scheduling algorithms: MinDelay/FPF (Minimum Delay Feasible Path First), MinDelay/FPF + MOFF (Minimum Delay Feasible Path First, Maximum Outgoing Flows First), and kDP + BSMP (k-shortest Disjoint Paths, Balanced Share Multi Path). On the other hand, because of the star connected local area network topology, there is always a single path between any storage and computing elements within a site. As a result, an online, built-in site level data scheduling algorithm easily identifies such unique paths whenever a job data needs to be retrieved from in-site storage elements. This built-in algorithm, however, is not shown in GUI.
In all four models, data organization can be either federative or hierarchical. Regardless of the chosen data organization model, DGridSim offers four options for the data replication: None (no data replication), MRD (Most Requested Data, pull-based and distributed), RT DIW (Real-Time Distributed Weighted, push-based, distributed), and RT CENT (Real-Time Centralized Weighted, push-based, centralized).
In all system models, two options are provided for Data Replacement if a data replacement algorithm is chosen: (1) None: Storage elements reject the replication of job data if they do not have enough storage space to keep them. (2) Least Recently Used (LRU): If a replica job data needs to be copied into a storage element and this storage element does not have enough storage space to keep it, it calls for Least Recently Used algorithm to find out some erasable job data. If LRU algorithm succeeds, those job data as determined by LRU are deleted before the replication.
As shown in Figure 3, DGridSim GUI has other tabs in addition to General. Under Sites, Jobs, Network, and Custom tabs, a user can set all related simulation parameters. For example, Sites tab include such parameters as site count, min/max storage element, min/max storage element capacity, min/max computing element, min/max computing element capacity, etc. All user settable parameters will be revealed in Simulation section. In order to start a simulation, Start button under Execute tab must be pressed. Simulation will be repeated as many times as the number specified by Re-run Count cell and will automatically use one or more CPU cores as indicated by Concurrent Instances cell under Execute tab. Since the snapshots of DGridSim GUI for the other system models are very similar to Figure 3, they are not included in this study.
7. Simulation experimentation
In this section, a set of simulation results will be presented to highlight some unique capabilities of DGridSim.
DGridSim creates a new data grid system based on the simulation parameters under Sites, Jobs, Network, and Custom tabs when a new simulation has been started. The data grid system created will be composed of sites, and Site Count parameter determines the number of sites for the simulated Data Grid system. Remember that sites are split into Tier-0, Tier-1, and Tier-2 sites in the hierarchical data organization model. As a result, in order to control the structure of site hierarchy, DGridSim uses Tier 1 Site Count parameter. That is, DGridSim divides Site Count number of sites into a single Tier-0 site, Tier 1 SiteCount number of Tier-1 sites, (Site Count - Tier 1 Site Count - 1) number of Tier-2 sites. Furthermore, it ensures that every Tier-1 site has at least one child Tier-2 site.
Regardless of the data organization model, every site is equipped by a set of computing elements, storage elements, and a local area network. The parameters related to forming a site except its local area network are as follows: Min/Max Computing Element denotes the minimum/maximum number of computing elements within a site, where every computing element is associated with a computing power in terms of MIPS rating between Min CE Capacity and Max CE Capacity. Min/Max Storage Element shows its minimum/maximum number of storage elements, where every storage element has a storage capacity in terms of MB between Min SE Capacity and Max SE Capacity. In the simulation results presented, data grid systems are formed with respect to the data organization model as follows.
Data grid systems with federative data organization: DGridSim creates a data grid system of Site Count=20 sites, each of which includes U~[Min Computing Element=24, Max Computing Element=36] heterogeneous computing elements and a single U~[Min Storage Element=1, Max Storage Element=1] storage element. The computing elements have MIPS rating of U~[Min CE Capacity=8000, Max CE Capacity=12,000] and storage elements have storage capacity of U~[Min SE Capacity=80,000, Max SE Capacity=120,000] MB. Note that U~ means uniformly distributed. It should be emphasized that similar input data modeling can also be found in other grid system simulators,13,14,21 and the related research studies.39,45,46
Data grid systems with hierarchical data organization: DGridSim forms a data grid system of Site Count=25 sites, which correspond to one Tier-0 site, Tier 1 Site Count=4 Tier-1 sites, and (25-1-Tier 1 Site Count)=20 Tier-2 sites. Remember that Tier-0 and Tier-1 sites have only storage elements, while all Tier-2 sites own both storage and computing elements. Consequently, deploying twenty Tier-2 sites in this organization equals the total computing power of sites in the federative data organization model. Every Tier-1 site is equipped with a single storage element U~[Min Storage Element=1, Max Storage Element=1] whose storage capacity is U~[Min SE Capacity=200,000, Max SE Capacity=300,000] MB, while Tier-0 site is assumed to have an infinite storage capacity. Every Tier-2 site, on the other hand, has U~[Min Computing Element=24, Max Computing Element=36] heterogeneous computing elements with computing power U~[Min CE Capacity=8000, Max CE Capacity=12,000] MIPS, and a single storage element U~[Min Storage Element=1, Max Storage Element=1] whose storage capacity is determined based on the total storage capacity of all Tier-1 sites and Tier SE Capacity Ratio parameter (0 < Tier SE Capacity Ratio ≤ 1.0). That is, the mean total value of Tier-1 storage capacity is equal to the mean total value of Tier-2 storage capacity times Tier SE Capacity Ratio. As a result, the mean total value of Tier-2 storage capacity is (250,000 × 4) × 0.5 = 500,000 MB, if Tier SE Capacity Ratio = 0.5, which results in the mean storage capacity value of 500,000/20 = 25,000 MB per site.
In a data grid system established by DGridSim, there are two types of networks: a local area network for every site and a single inter-site (internet) network that connects all local area networks. Similar to the studies in the literature, these networks are defined based on a few parameters related to routers and links as follows.
In DGridSim, a local area network consists of a single gateway router, links (gateway-computing element) that connect this router to computing elements, and links (gateway-storage element) that connect it to storage elements within this site in a star topology fashion. Thus, the number of links is equal to the number of computing and storage elements in total. In the star connected network, Min/Max CE Link Bandwidth and Min/Max SE Link Bandwidth denote the minimum/maximum value of gateway-computing element and gateway-storage element link bandwidths in terms of MB/s, respectively, while the delay of any in-site link is between Min Site Link Delay and Min Site Link Delay seconds. In addition to these six parameters that are sufficient to define a local area network with star topology, DGridSim needs some other parameters for the generation of an inter-site network: Min/Max Routers is the minimum/maximum number of inter-site network routers (core routers), Min/Max Internet Links is the minimum/maximum number of links that are between core routers and between core and gateway routers, and Min/Max Internet Link Bandwidth and Min/Max Internet Link Delay are the related bandwidth and delay values of these links. It should be emphasized here that DGridSim generates a random internet network topology by means of an in-house topology generation algorithm. This algorithm first randomly determines the number of routers and links according to the aforementioned parameter values, and then creates a connected, random network topology.
During the simulations, the following network parameter values are used. For the local area networks, gateway-computing element links have bandwidth of U~[Min CE Link Bandwidth=400, Max CE Link Bandwidth=600] MB/s; gateway-storage element links have bandwidth of U~[Min SE Link Bandwidth=800, Max SE Link Bandwidth=1200] MB/s; all in-site links have delay of U~[Min Site Link Delay=0.008, Max Site Link Delay=0.0012] sec. In the internet, on the other hand, there are U~[Min Routers=8, Max Routers=12] core routers and U~[Min Internet Links=16, Max Internet Links=24] links whose bandwidths are U~[Min Internet Link Bandwidth=120, Max Internet Link Bandwidth=180] MB/s and delays are U~[Min Internet Link Delay=0.0025, Max Internet Link Delay=0.0075] sec. Site and Network tabs together have all of these parameters.
After the data grid system fabric has been formed, jobs start coming into the system according to Poisson process with Inter-arrival Time=5 sec. DGridSim will create Job Count=2000 number of independent jobs until the end of a simulation run. Each job is modeled by means of a size (Min/Max Job Size) in terms of MI (Million Instruction) to determine its computing time on any computing element, a deadline (Min/Max Deadline) by which it must be completed, and a number of randomly assigned job data (Min/Max Job Data) with respect to a job data access pattern that must be present at the computing element before its execution starts. During the simulations, job sizes are U~[Min Job Size=4,800,000, Max Job Size=7,200,000] MI; their deadlines are U~[Min Deadline=400, Max Deadline=600] sec; their number of job data are U~[Min Job Data=2, Max Job Data=4]. Finally, DGridSim instantiates Data-item Count=10,000 different job data, each of which has a size of U~[Min DI Size=800, Max DI Size=1200] MB. It should be noted here that job data are randomly distributed to all sites in the federative data organization, while they all are initially stored in Tier-0 site in the hierarchical model. Furthermore, they are not deleted during the simulation, and they do not occupy any storage space, whereas their replicas consume it. Jobs tab in GUI has all these job related parameters.
Finally, Custom tab has several user-adjustable simulation parameters that set a) periods for the offline job/data scheduling services and data replication services, and information services b) replication thresholds for the data replication algorithms, c) a data access pattern among random, geometric, and Zipf for jobs, d) buddy set size and the related computing load threshold values.
Because of the space limitations, it is impossible to demonstrate all features supported by DGridSim. As a result, for each system model, a representative set of simulation results are included here to report the satisfiability (the ratio of number of jobs finished before their deadlines to total number of jobs) under varying simulation parameters. In these simulation studies, MinDelay/FPF was chosen as the grid level data scheduling algorithm because of its lower running times; RT CENT was selected as the data replication algorithm since it can be used for all models; LRU was used to complement the data replication algorithm.
All representative sets of simulation results are summarized in Tables 2, 3, and 4. The reported satisfiability data in these tables are average satisfiability figures that are obtained when the following simulation stopping criteria are all achieved: a) Simulation is repeated at least 10 times for the current scenario. b) Relative statistical error (the ratio of the half-width of the confidence interval to the mean of satisfied jobs) for 90% confidence interval is less than or equal to the accuracy value of 0.05.
Performance of job scheduling algorithms with the increasing number of jobs.
Performance of job scheduling algorithms with the increasing mean number of job data.
Performance of job scheduling algorithms with the increasing storage capacity.
7.1. Results for Model I
Three Grid scheduling algorithms are chosen for Model I simulation studies: MCTF, MCTFwDP and MMwDP. MCTF and MCTFwDP are online algorithms, while MMwDP is an offline one. Different from MCTF, MCTFwDP and MMwDP algorithms take into account the location of data (with the extension wDP) while choosing a site for every submitted job. In Table 2, the performance of these algorithms is reported with respect to increasing Job Count, while keeping all other aforementioned simulation parameters unchanged. According to Table 2, the performance of these Model I algorithms slightly decreases as the number of jobs increases, which can be elaborated as follows. Increasing the number of jobs also inflates the number of data transfers that must be completed before the start times of related jobs. On the other hand, when the network reaches a saturation point due to the already scheduled data transfers, it becomes much harder to schedule new jobs due to the scarce network bandwidth, which can lead to dropping more jobs than before. In Table 2, the best performance is obtained by MCTFwDP, which is followed by MMwDP and MCTF in federative model. In hierarchical model, however, the best results are obtained by MMwDP, which is followed by MCTFwDP and MCTF.
In Table 3, the mean number of job data is increased from one (U~[Min Job Data=1, Max Job Data=1]) to five (U~[Min Job Data=4, Max Job Data=6]) for Job Count=2000. As far as the performance of Model I algorithms is concerned, they all demonstrate a deteriorating performance with the increasing number of job data. This performance loss can be explained as follows. Increasing the number of job data results in more network contention. That is, in order to complete the execution of 2000 jobs, the mean number of data transfers required is 2000 and 10,000 if the mean number of job data is one and five, respectively. Such an escalated contention in the network, on the other hand, makes more jobs miss their deadlines, since a job is deemed to be failed even if one of its job data cannot be delivered on time. In Table 3, MCTFwDP and MMwDP are the best performing algorithms for the federative and hierarchical models, respectively.
In Table 4, the mean total storage capacity is varied among Low, Medium, and High, where Low halves Min/Max SE Capacity values, while High doubles them for Job Count=2000 and U~[Min Job Data=2, Max Job Data=4]. All Model I algorithms tend to increase their real-time performances with the increasing storage capacity. This performance gain can be attributed to the fact that the increasing storage capacity boosts the efficiency of the data replication algorithm. That is, the more the storage capacity of a site is, the more replicas can be stored within this site, which increases the temporal data locality in the site. Increasing the temporal data locality, on the other hand, not only reduces the number of costly inter-site data transfers, but also allows more jobs to be completed before their deadlines. Similar to the results in Table 2 and Table 3, MCTFwDP and MMwDP are superior for the federative and hierarchical models, respectively.
7.2. Results for Model II
Three Grid scheduling algorithms, namely FACEF, FACEFwDP, and bFACEFwDP, are selected for the simulations of Model II. FACEF and FACEFwDP are online algorithms, while bFACEFwDP is the offline version of FACEFwDP. With respect to Table 2, both FACEF and FACEFwDP algorithms retain their performance values, while bFACEFwDP experiences performance loss while the number of jobs increases. It is interesting to note that FACEF and FACEFwDP are the only algorithms among all algorithms that show a stable performance against the inflated number of data transfer requests. This capability of FACEF and FACEFwDP is further supported by the results in Table 3 and in Table 4 in which they continue to show a stable performance with the increasing number of job data and the increasing data replication efficiency, respectively. The reason behind this phenomenon is that FACEF and FACEFwDP algorithms reject too many jobs due to the unavailability of computing elements, which results in scheduling a relatively small number of data transfer requests, and creating an illusion of the independence from the increasing number of job data.
In all three tables, FACEF and FACEFwDP show a very similar and stable performance, which is always superior to bFACEFwDP, regardless of the data organization model. In addition, the performance results related to bFACEFwDP in Tables 2, 3, and 4 are similar to those of Model I algorithms in the respective tables. Thus, they are not repeated here.
It is also imperative that Model I and Model II algorithms are compared against each other. In the tables, Model I algorithms generally perform better than Model II ones. The performance of Model I algorithms is mostly limited by the data locality and available network capacity. On the other hand, Model II algorithms are mainly bound by the computing capacity as compared to the job data related mechanisms.
7.3. Results for Model III
Similar to the Model II simulation studies, FACEF, FACEFwDP, and bFACEFwDP grid level job scheduling algorithms are chosen for the simulations of Model III. It should be emphasized that even though Model II and Model III algorithms share the same names, they are essentially different algorithms. This is also evident from the simulation results presented for Model II and Model III systems.
The performance of Model III algorithms with respect to the increasing job size, the increasing number of job data, and the increasing storage capacity is very similar to that of Model I algorithms, which is why it is not elaborated further here. In addition, neither FACEF nor FACEFwDP shows a noticeable performance similar to Model II FACEF and FACEFwDP algorithms. Among Model III algorithms, FACEFwDP is the best one, followed by FACEF and bFACEFwDP regardless of the data organization model.
When Model I, II, and III algorithms are compared, Model III algorithms are inferior to both Model I and Model II algorithms. It should be also noted that Model III algorithms, different from Model II algorithms, are affected by the data locality and available network capacity similar to Model I algorithms. Finally, decoupling job and data scheduling, Model II vs. Model III algorithms, boosts the real-time system performance, which is consistent with the previous results in the literature.
7.4. Results for Model IV
In Model IV, job scheduling is distributed among the site level schedulers, for which DGridSim supports RT Min Max algorithm. During the simulations, Buddy Size=5 and buddy-sets are randomly formed.
Similar to the previous results, the performance of RT Min Max gets worse with the increasing number of jobs and number of job data, and with the decreasing storage capacity. Furthermore, RT Min Max is mostly bound by the data locality and available network capacity similar to Model I algorithms.
Among all algorithms, it is compelling to observe that RT Min Max is the second best algorithm after Model I MCTFwDP for the federative model and the third best after Model I MMwDP for the hierarchical model. Thus, RT Min Max is superior to Model II and Model III algorithms in general.
In order to complement all simulation results presented in Table 2, 3, and 4, the mean running times of DGridSim are reported in Tables 5 and 6, which are obtained on a computer with Intel i7 2.00 GHz processor, 1.5 GB of RAM, and 64-bit Windows 7 operating system. According to Table 5, the running times of DGridSim rises with the increasing number of jobs. It should be noted that increasing the number of jobs causes the number of job data to be scheduled to increase as well. As a result, taking both the increased number of job and job data into account, the running times of DGridSim grow mostly linearly with the number of jobs. In Table 6, the number of jobs is fixed at Job Count=2000, and the number of sites is varied from Site Count=20/25 to 100/125 for the federative and hierarchical data organization models, respectively. Increasing the number of sites rises the computation times of the respective job scheduling algorithms. Since Model II and Model III grid level scheduling algorithms try to find out computing elements for the submitted jobs, their running times are affected the most, as it was observed in Table 6. Once again, there is mostly a linear relationship between the number of sites and the running times of DGridSim.
The running times of DGridSim with the increasing number of jobs.
The running times of DGridSim while the increasing number of sites.
8. Conclusions
This study presents a framework for modeling real-time data grid systems based on a set of well-defined service descriptions and their interactions, which can be used by researchers to create their own data grid simulators, emulators, or real life implementations. This framework further provides an infrastructure in which a wide range of real-time job scheduling, data scheduling and data replication algorithms from the literature, or new algorithms can be embedded. Within this framework, DGridSim, which is a multi-model, discrete-event simulator, is developed. DGridSim embodies many unique features compared to similar simulator studies in the literature. The most distinguished features of DGridSim can be listed as follows: It supports four different data grid system models. All computing, storage, and networking resources are reserved by a reservation mechanism before they are used. DGridSim is designed in a modular and flexible fashion so that the researchers can add their own job scheduling, data scheduling, and data replication algorithms to any of the four models. DGridSim includes the source codes of many job scheduling, data scheduling, and data replication algorithms to help the researchers to write their own codes. Furthermore, it has a GUI that makes it easy to use and create a desired simulation environment.
According to the presented simulation results, it is evident that the grid system model, which is defined by the job and data scheduling and data replication mechanisms to be deployed, has a profound impact on its real-time performance. The hierarchical and distributed job scheduling models together with a hierarchical data scheduling mechanism have mostly provided the best results. The scalability of these approaches as compared to Model II and Model II with the centralized grid level job scheduling further makes them more appealing for the real Grid implementations. In addition, decoupling job and data scheduling mechanisms should be strived for, since it boosts the real-time performance in parallel to the previous work in the literature. Finally, the data organization model of Data Grid system determines the way jobs access their related data under the control of a data scheduling and data replication algorithms, which greatly affects the system performance.
As a future work, other types of applications including bag-of-tasks or workflows in addition to independent tasks will be supported; other known job/data scheduling and data replication algorithms from the literature will be included in order to provide an all-in-one platform to study all aspects of real-time data grid systems
Footnotes
Funding
This work was supported by the Institute of Scientific and Technological Research of Turkey (grant number 108E232).
