Abstract
Today’s cloud providers strive to attract customers with better services and less downtime in a highly competitive market. The need for minimizing the operational cost unavoidably leads cloud providers to rely on third party remote administrators for fulfilling regular maintenance tasks. In such a scenario, the lack of trust in those third party remote administrators paired with the extra privileges granted to them to complete the maintenance tasks usually implies undesirable security threats. A dishonest remote administrator, or an attacker armed with the stolen credential of a remote administrator, can pose severe insider threats to both the cloud provider and its tenants. In this paper, we take the first step towards understanding and mitigating such insider threats of remote administrators in clouds. Specifically, we first model the maintenance task assignments and their corresponding security impact due to privilege escalation. We then mitigate such impact through optimizing the task assignments with respect to given constraints. Finally, the simulation results demonstrate the effectiveness of our solution in various scenarios.
Introduction
Cloud computing has become the cost-saving IT solution for 73% of organizations worldwide [23] and is predicted to grow to a $300 billion business by 2021 [25]. Cloud computing is also affecting our daily lives through its impact on politics (e.g., politicians are increasingly turning to social networks, which are mostly cloud-based), education (e.g., Massive Open Online Courses (MOOCs) are mostly delivered via the cloud), healthcare, entertainment, etc. The success of cloud computing comes from the many benefits it brings to IT management, e.g., pervasive access from anywhere with an Internet connection, the flexibility of scaling services up or down to fit changing needs, and the efficiency of deploying applications quickly without worrying about the underlying costs or maintenance of the infrastructure.
On the other hand, the widespread adoption of cloud computing also attracts more attention to its unique security and privacy challenges [18,58]. In particular, as the cloud service market becomes more and more competitive, cloud providers are striving to attract customers with better services and less downtime at a lower cost. Consequently, the search for an advantage in cost and efficiency will inevitably lead cloud providers to follow a similar path as what has been taken by their tenants, i.e., outsourcing cloud maintenance tasks to remote administrators including those from specialized third party maintenance providers [11]. Such an approach may also lead to many benefits due to resource sharing, e.g., the access to specialized and experienced domain experts, the flexibility (e.g., less need for full-time onsite staff), and the lower cost (due to the fact that such remote administrators are usually shared among many clients).
However, the benefits of outsourcing cloud maintenance tasks come at an apparent cost, i.e., the increased insider threats from remote administrators. Specifically, in order to complete their assigned maintenance tasks, the remote administrators must be provided with necessary privileges, which may involve accesses to physical and/or virtual resources of the underlying cloud infrastructure. Armed with such privileges, a dishonest remote administrator, or an attacker with the stolen credentials of such an administrator, can pose severe insider threats to both the cloud tenants (e.g., causing a large scale leak of confidential user data) and the provider (e.g., disrupting the cloud services or abusing the cloud infrastructure for illegal activities) [17]. On the other hand, cloud providers are under the obligation to prevent such security or privacy breaches caused by insiders [19], either as part of the service level agreements, or to ensure compliance with security standards (e.g., ISO 27017 [32]). Therefore, there is a pressing need to better understand and mitigate the insider threats of remote administrators in clouds.
Dealing with the insider threat of remote administrators in clouds faces unique challenges. First, there is a lack of public access to the detailed information regarding cloud infrastructure configurations and typical maintenance tasks performed in clouds. Evidently, most existing works on insider attacks in clouds either stay at a high level or focus on individual nodes instead of the infrastructure [11,35,60] (a more detailed review of related work will be given in Section 6). Second, cloud infrastructures can be quite different from typical enterprise networks in terms of many aspects of security. For instance, multi-tenancy means there may co-exist different types of insiders with different privileges, such as administrators of a cloud tenant, those of the cloud provider, and third party remote administrators. Also, virtualization means a more complex attack surface consisting of not only physical nodes but also virtual or hypervisor layers. To the best of our knowledge, there is a lack of any concrete study in the literature on the insider attack of remote administrators in cloud data centers.
In this paper, we take the first step towards understanding and mitigating the insider threat of remote administrators in clouds. Specifically, we first model the maintenance tasks and their corresponding privileges based on industrial practices from major cloud vendors and providers. We then model the insider threats posed by remote administrators assigned to maintenance tasks by applying existing security metrics; remote administrators possess elevated privileges due to the assigned maintenance tasks, and those privileges correspond to initially satisfied security conditions, which are normally only accessible to external attackers after exploiting certain vulnerabilities. Such a model allows us to formulate the mitigation of the insider threats of remote administrators as an optimization problem and solve it using standard optimization techniques. We evaluate our approach through simulations, and the results demonstrate the effectiveness of our solution under various situations. The main contribution of this paper is twofold:
To the best of our knowledge, this is the first study on the insider threat of remote administrators in cloud infrastructures. As cloud providers leverage third parties for better efficiency and cost saving, our study demonstrates the need to also consider the security impact, and our model provides a way for quantitatively reasoning about the tradeoff between such security impact and other related factors.
By formulating the optimization problem of mitigating the insider threat of remote administrators through optimal task assignments, we provide a relatively effective solution, as evidenced by our simulation results, for achieving the optimal tradeoff between security and other constraints using standard optimization techniques.
The preliminary version of this paper has previously appeared in [4]. In this paper, we have substantially improved and extended the previous version, with the following most significant extensions. First, while the previous version focuses on physical and virtual resources (e.g., physical hosts and virtual machines), we have added Section 3.4 to additionally consider a higher level of abstraction, i.e., services or business functions which may involve multiple physical and virtual resources; we do so by integrating the service dependency graph concept [1,59] with the existing resource graph in order to model the impact of service dependencies on cloud security during maintenance time. Second, while the previous version only relies on the k-zero day safety metric [62,63], which only considers the shortest attack path and zero day exploits, we have added Section 3.3 to additionally model the impact of all possible attack paths and exploits of known vulnerabilities using the Bayesian network-based security metric. Those extensions in our models correspondingly lead to additional use cases (Section 4.2) and a series of new simulations (Section 5).
The remainder of this paper is organized as follows. Section 2 presents a motivating example and discusses maintenance tasks and privileges. Section 3 presents the models of task assignment and insider threat. Section 4 formulates the optimization problem and discusses several use cases. Section 5 gives simulation results and Section 6 discusses related work. Finally, Section 7 concludes the paper.
Preliminaries
This section gives a motivating example and discusses maintenance tasks and privileges.
Motivating example
The insider threat of remote administrators depends on the underlying cloud infrastructures. Therefore, we will need the detailed configuration of cloud data centers in order to construct a concrete example of such insider threats. A key challenge here is the lack of public access to detailed information regarding hardware and software configurations deployed in real cloud data centers. Consequently, most existing works focus on either high level frameworks and guidelines for risk and impact assessment [36,43,52], or specific vulnerabilities or threats in clouds [20,55], with a clear gap between the two. To overcome such a limitation, we choose to devise our own fictitious, but realistic, cloud data center design by piecing together publicly available information gathered from various cloud vendors and providers [3], as shown in Fig. 1.

An example of cloud data center.
The above configuration is based on existing concepts and common practices borrowed from major cloud vendors and providers to make our design more representative. For example, we borrow the multi-layer concept and some hardware components, e.g., Carrier Routing System (CRS), Nexus (7000, 5000, 2000), Catalyst 6500, and MDS 9000, from the cloud data center design of Cisco [8]. We synthesize various concepts of VMware vSphere [31] for the main functionality of hardware components in our cloud infrastructure (e.g., authentication servers, DNS, and SAN). We also assume the cloud employs OpenStack as its operating system [45]. The infrastructure provides access to both cloud users and remote administrators through the three-layer design. Layer 1 connects the cloud to the Internet and includes the authentication servers, DNS, and Neutron Server. Layer 2 includes the rack servers and compute nodes. Layer 3 includes the storage servers. OpenStack components run on the authentication servers, DNS server (a Neutron component provides address translation to machines running the requested services), and compute nodes (Nova to host and manage VMs, Neutron to connect VMs to the network, and Ceilometer to calculate the usage) to provide cloud services.
Such a cloud data center may require many maintenance tasks to be routinely performed to ensure the normal operation of the hardware and software components. Such maintenance tasks may be performed by both internal staff working onsite and remote administrators, including those from specialized third party providers. In our example, assume the cloud provider decides to rely on third party remote administrators for the regular maintenance of the five compute nodes (nodes #1–5 in Fig. 1), the authentication server (node #6), and the two controllers (nodes #7 and #8). As an example, Table 1 shows the maintenance tasks that need to be performed on those nodes. For simplicity, we only consider three types of tasks here (more discussions about maintenance tasks will be given in the next section).
An example of required maintenance tasks
In such a scenario, the cloud provider naturally faces security challenges due to the fact that necessary privileges must be granted to allow the third party remote administrators to perform their assigned maintenance tasks. For instance, the task of reading log files needs certain read privilege to be granted, whereas modifying configuration files and installing a new system would demand much higher levels of privileges. Even though the cloud provider may (to some extent) trust the third party maintenance provider as an organization, the granted privileges may allow a dishonest remote administrator, or attackers with stolen credentials of a remote administrator, to launch an insider attack and cause significant damage to the cloud provider and its tenants. It is in the cloud provider’s best interest to better understand and proactively mitigate such potential threats. However, this toy example is enough to demonstrate that there exist many challenges in modeling and mitigating such threats.
First, as demonstrated in Table 1, there may exist complex relationships between maintenance tasks and corresponding privileges needed to fulfill such tasks, relationships between different privileges (e.g., a root privilege implies many other privileges), and dependency relationships between services or business functions and the underlying physical and virtual resources used to host such services or functions. Those relationships will determine the extent of an insider threat.
Second, the insider threat will also depend on which nodes in the cloud infrastructure are involved in the assigned tasks, e.g., an insider with privileges on the authentication servers (node #6 in Fig. 1) or on the compute nodes (nodes #1–5) may have very different security implications.
Third, the extent of the threat also depends on the configuration (e.g., the connectivity and firewalls). For instance, an insider having access to the controller node #8 would have a much better chance to compromise the storage servers than one with access to the other controller node #7.
Finally, while an obvious way to mitigate the insider threat is to assign fewer tasks to each remote administrator so as to limit his/her privileges, as our study will show, the effectiveness of such an approach depends on many other factors and constraints, e.g., the number of tasks to be assigned, the number of available remote administrators, and constraints such as each administrator being assignable only to a limited number of tasks due to availability, or to a subset of tasks due to his/her skill set.
Clearly, modeling and mitigating the insider threat of remote administrators may not be straightforward even for such a simplified example (the solution for this example scenario is given in Section 4.2), and the scenario is likely far more complex for real clouds than the one demonstrated here. The remainder of the paper will present a systematic approach to tackle those challenges.
There exist different types of administrators in cloud data centers who perform maintenance tasks either onsite or through remote accesses [11]. For example, hardware administrators have physical access to the cloud data center to perform maintenance on the physical components. Security team administrators are responsible for maintaining the cloud security policies. Remote administrators (RAs) perform maintenance tasks on certain nodes of the infrastructure through network connections from remote sites. The first two types can be considered relatively more trustworthy due to their usually limited quantity and the fact they work onsite and directly for the cloud provider. The last type is usually considered riskier due to two facts, i.e., they work through remote accesses which are susceptible to attacks (e.g., via stolen credentials), and they may be subcontracted through third party companies, which means less control by the cloud provider. In this paper, we focus on the security risk of such remote administrators (RAs), even though our models and mitigation solution may be adapted to deal with other types of administrators and users if necessary.
There exists only limited public information about the exact maintenance tasks performed by major cloud providers. We have collected such information from various sources, and our findings are summarized in Table 2, which shows sample maintenance tasks mentioned by Amazon Web Services [6], Google Cloud [27], and Microsoft Azure [41]. As to the privileges required for typical maintenance tasks, Bleikertz et al. provided five sample privileges required for maintaining the compute nodes in clouds [11], which we will borrow for our further discussions, as shown in Table 3.
Maintenance tasks in popular cloud platforms
Privileges used in this work
To simplify our discussions, our running example will be limited to ten maintenance tasks on three compute nodes with corresponding privileges on such nodes, as shown in Table 4. Later in Section 4.2, we will expand the scope to discuss the solution for our motivating example which involves all the eight nodes.
Maintenance tasks and privileges for the running example
This section presents our threat model and the proposed models of the maintenance task assignment and insider threats.
The threat model and maintenance task assignment model
The in-scope threats we consider include insider attacks from dishonest remote administrators or attackers with stolen credentials of such administrators. We assume the majority of remote administrators are trusted, and if there are multiple dishonest administrators (or attackers with their credentials), they do not collude (a straightforward extension of our models, which considers each possible combination of administrators as one insider, can accommodate such colluding administrators). The third party provider is considered trusted as an organization and it will collaborate with the cloud provider to implement the intended task assignment. The cloud provider is concerned about certain critical assets inside the cloud, such as physical or virtual resources and services or business functions, and it is aware of the constraints on task assignments such as the number of remote administrators, their availability and skill set, etc. Finally, as a preventive solution, our mitigation approach is intended to complement existing vulnerability scanners, intrusion detection systems, and other prevention or mitigation solutions.
The cloud provider assigns the maintenance tasks to remote administrators (RAs) based on given constraints (e.g., which tasks may be assigned to each RA), and consequently each RA obtains the privileges required by those tasks. This can be modeled as follows (with a syntax similar to that of [51]).
(Maintenance task assignment model).
Given a set of remote administrators, a set of maintenance tasks T, a set of privileges P, the remote administrator task relation, and the task privilege relation, a maintenance task assignment is given by function
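The ingredients of the task assignment model can be illustrated with a minimal Python sketch. All identifiers, the toy data, and the choice to represent an assignment as a mapping from each task to one RA are assumptions made for illustration; the privilege names are placeholders and do not reproduce Table 3.

```python
from typing import Dict, Set

# Hypothetical toy data (not taken from the paper's tables).
TASKS: Set[str] = {"t1", "t2", "t3"}

# Remote administrator task relation: tasks each RA is allowed to take.
RA_TASK: Dict[str, Set[str]] = {"RA1": {"t1", "t2"}, "RA2": {"t2", "t3"}}

# Task privilege relation: privileges each task requires.
TASK_PRIV: Dict[str, Set[str]] = {
    "t1": {"read"},
    "t2": {"read", "write_L1"},
    "t3": {"write_L2"},
}


def is_valid(assignment: Dict[str, str]) -> bool:
    """Assumed form: assignment maps every task to exactly one permitted RA."""
    return set(assignment) == TASKS and all(
        task in RA_TASK[ra] for task, ra in assignment.items()
    )


def privileges_of(ra: str, assignment: Dict[str, str]) -> Set[str]:
    """Union of the privileges required by all tasks assigned to this RA."""
    privs: Set[str] = set()
    for task, owner in assignment.items():
        if owner == ra:
            privs |= TASK_PRIV[task]
    return privs
```

For example, assigning t1 and t2 to RA1 and t3 to RA2 is valid under these relations and leaves RA1 with the read and write_L1 placeholders only.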
The insider threat model
To model various resources and their relationships in cloud infrastructures, we borrow the resource graph concept [69,72] to represent hardware hosts (e.g., servers and networking devices), software resources (e.g., network services and applications) running on such hosts (only remotely accessible resources are considered), and the causal relationships between different resources (e.g., a zero day exploit on the Web server may lead to user privilege on that server which subsequently causes the application server to be accessible). This concept is more formally stated in Definition 2 and will be illustrated through an example.
(Resource graph [69,72]).
Given a network with the set of hosts H, the set of resources R, with the resource mapping
To quantify the insider threat of remote administrators based on resource graphs, we extend the k-zero day safety security metric [62,63]. Roughly speaking, the metric starts with the worst case assumption that the relative severity of unknown (zero day) vulnerabilities is not measurable; it then simply counts how many different resources must be compromised through such unknown vulnerabilities in order to compromise a given critical asset. A larger count indicates a relatively more secure network, since the likelihood of having more unknown vulnerabilities all available at the same time, inside the same network, and exploitable by the same attacker, would be significantly lower. The following provides a simplified version of this concept, which will be illustrated through an example.
(Attack path and k-zero day safety [62,63]).
Given a resource graph

Modeling insider threat using the resource graph.
Figure 2 shows an example resource graph for our running example (the dashed lines and shades represent our extension to the model, which can be ignored for now and will be discussed later in Section 4.2; also, only a small portion of the resource graph is shown here due to space limitations). Each triple inside an oval indicates a potential zero day or known exploit in the format
In Fig. 2, the left-hand side box indicates the normal resource graph which depicts what an external attacker may do to compromise the critical asset
Next, given the maintenance task assignment for each RA, we can obtain all the possible paths he/she may follow in the resource graph, starting from all the initially satisfied conditions (e.g.,
Given the maintenance task assignment (i.e.,
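The idea of measuring an insider by the shortest path from his/her initially satisfied conditions can be sketched as follows. This is a deliberately simplified illustration, not the metric's full definition: it assumes every exploit has a single pre-condition and counts zero day exploits along a path rather than distinct resources. The condition names and exploit list are hypothetical.

```python
import heapq

# (pre-condition, post-condition, is_zero_day); hypothetical toy exploits.
EXPLOITS = [
    ("user@http", "root@http", True),
    ("root@http", "user@app", True),
    ("user@app", "root@db", True),
    ("user@http", "user@app", False),  # known vulnerability, costs no zero day
]


def k_value(initial_conditions, asset, exploits=EXPLOITS):
    """Fewest zero day exploits needed to reach `asset` from any initial condition."""
    graph = {}
    for pre, post, zero_day in exploits:
        graph.setdefault(pre, []).append((post, 1 if zero_day else 0))

    best = {cond: 0 for cond in initial_conditions}
    heap = [(0, cond) for cond in initial_conditions]
    heapq.heapify(heap)
    while heap:  # Dijkstra over conditions with 0/1 edge weights
        cost, cond = heapq.heappop(heap)
        if cond == asset:
            return cost
        if cost > best.get(cond, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(cond, []):
            if cost + weight < best.get(nxt, float("inf")):
                best[nxt] = cost + weight
                heapq.heappush(heap, (cost + weight, nxt))
    return float("inf")  # asset unreachable from the given conditions
```

An external attacker would start from publicly reachable conditions only, whereas an RA starts from every condition implied by the privileges of his/her assigned tasks, which can only shorten the path and hence lower k.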
The Bayesian network model
The previous section has applied the k-zero day safety metric to model the insider threat of remote administrators. This is a conservative model since the k value is defined based on the shortest attack paths, which an attacker may or may not be able to follow in practice. Moreover, the model only considers zero day exploits; known vulnerabilities do not contribute to the k value. In this section, we extend this model by applying the Bayesian network (BN)-based metric [24] instead of the k-zero day safety metric.
The BN-based metric is based on the conditional probability of reaching the given critical assets given that all initial conditions are satisfied. We first construct a Bayesian network based on the resource graph and the conditional probability that each exploit can be executed given that its pre-conditions are all satisfied. Such conditional probabilities can be assigned to both known vulnerabilities based on standard vulnerability scores (e.g., the CVSS scores [40]), and zero day exploits based on a nominal value (e.g., 0.08 [44]). Therefore, the model captures both zero day and known vulnerabilities, and it also takes all attack paths into consideration. Finally, the model can also capture additional causal dependencies, e.g., the same vulnerability appearing on multiple hosts may yield a higher probability (e.g., 0.9 in our examples).
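As a rough illustration of how such probabilities propagate, the sketch below assigns each exploit a conditional probability and computes the chance of reaching a condition in a small acyclic graph. It is an approximation under stated assumptions, not full Bayesian network inference: pre-conditions and alternative exploits are treated as independent, known vulnerabilities use CVSS base score divided by 10 (an assumed convention), and the exploit table is hypothetical. Only the nominal zero day value 0.08 comes from the text.

```python
ZERO_DAY_PROB = 0.08  # nominal value cited in the text


def exploit_probability(cvss_score=None):
    """Known vulnerability: CVSS / 10 (assumed mapping); zero day: nominal value."""
    return ZERO_DAY_PROB if cvss_score is None else cvss_score / 10.0


# name -> (pre-conditions, post-condition, CVSS score or None for a zero day)
EXPLOITS = {
    "e1": (["c0"], "c1", 7.5),
    "e2": (["c0"], "c1", None),
    "e3": (["c1"], "asset", 5.0),
}


def condition_probability(cond, initial, exploits=EXPLOITS):
    """Probability that `cond` is satisfied, assuming an acyclic graph."""
    if cond in initial:
        return 1.0
    fail = 1.0  # probability that every exploit yielding `cond` fails
    for pres, post, cvss in exploits.values():
        if post != cond:
            continue
        p = exploit_probability(cvss)
        for pre in pres:  # all pre-conditions must hold (conjunction)
            p *= condition_probability(pre, initial, exploits)
        fail *= 1.0 - p  # any one successful exploit suffices (disjunction)
    return 1.0 - fail
```

In this toy graph, an attacker starting from c0 reaches the asset with probability 0.385, while an insider who already holds c1 reaches it with probability 0.5.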
By applying the BN-based metric to the resource graph given in Definition 4, we can obtain the probability for each RA to compromise the given critical assets given all the privileges implied by a maintenance task that is assigned to the RA. Since an RA may be assigned to multiple maintenance tasks, the RA can compromise the critical assets as long as at least one of the assigned tasks enables him/her to do so; the corresponding probability can be computed as in Equation (1). We redefine the insider threat model based on those discussions in Definition 5. The model also allows assigning the relative likelihood of each RA misbehaving, which can be estimated based on either the background (e.g., third party RAs should be assigned a higher probability than RAs of the cloud provider) or behavior-based detection results if available.
(The BN-based insider threat model).
Given the maintenance task assignment (i.e.,
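The "at least one assigned task suffices" reasoning can be illustrated by a standard complement-product (noisy-OR) computation. This sketch assumes the per-task probabilities are independent and that the misbehaving likelihood acts as a simple multiplier; neither assumption is confirmed by the text, so the function below should not be read as a transcription of Equation (1).

```python
from math import prod


def ra_compromise_probability(task_probs, misbehave_prob=1.0):
    """P(at least one task enables the compromise), scaled by P(RA misbehaves).

    task_probs: per-task probabilities of reaching the critical asset
    misbehave_prob: assumed relative likelihood that this RA is an insider
    """
    at_least_one = 1.0 - prod(1.0 - p for p in task_probs)
    return misbehave_prob * at_least_one
```

For instance, two tasks that each give a 0.5 chance yield 0.75 for the RA, and an RA with no tasks yields 0.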
The service dependency model
The insider threat models introduced in previous sections are based on resource graphs, which are mainly designed to model hardware and software resources. The resource graphs, however, do not directly indicate any higher level services or business functions, the relationships between such services or functions, or their dependencies on the underlying hardware and software resources. For this purpose, the concept of service dependency graph [1,59] has been proposed to model security impact on services [15]. For example, Fig. 3 demonstrates an example in which the lower figure shows an attack graph (which is syntactically equivalent to a resource graph but designed for exploits of known vulnerabilities) depicting various exploits and their relationships, and the upper figure shows the service dependency graph depicting various services; the dashed line edges show the dependencies between the services and corresponding resources involved in the vulnerabilities. The service dependency graph and the attack graph can be integrated and flattened as an extended model. This model can be used to identify attack paths exploiting services or leading to critical assets given as services.

An example of service dependency graph.
We apply the service dependency model [59] to extend our insider threat models introduced in previous sections. For example, the left side of Fig. 4 shows examples of service dependency-related exploits which are integrated into the previous resource graph. Each triple inside a shadowed oval indicates a service dependency exploit, and the plaintext nodes in between shadowed ovals indicate the type of impacted service, which can run on either virtual or physical resources; the dashed lines show the dependencies between services. This model can be used to represent various causal relationships between services and resources. For instance, all services running on top of a server may be compromised if attackers gain full control over the server; a server (and other services) may not be affected when one of the services running on the server is compromised; and a service involving multiple resources may be compromised either when one of the resources is compromised (e.g., a Web service may become unavailable if either the Web, application, or database server is down) or only when multiple resources are compromised at the same time (e.g., a Web service might be supported by multiple redundant Web servers).
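The two ways a multi-resource service may be affected (any one resource versus all redundant resources) can be sketched with a small predicate. The function name, the `redundant` flag, and the server names are hypothetical and only illustrate the distinction drawn above.

```python
def service_compromised(resources, compromised, redundant):
    """Decide whether a service is affected by a set of compromised resources.

    resources: resources the service depends on
    compromised: resources currently under the attacker's control
    redundant: True if the resources are replicas (all must fall),
               False if the service needs every one of them (any one suffices)
    """
    hits = [r in compromised for r in resources]
    return all(hits) if redundant else any(hits)
```

A Web service that chains Web, application, and database servers is affected once any one of them is compromised, whereas a service backed by redundant Web servers is affected only when every replica is compromised.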

Modeling insider threats using service dependency graph.
We formalize the service dependency resource graph concept in Definition 6. The model extends the resource graph by adding nodes for services and their pre- and post-conditions, edges connecting services to those conditions, and edges inter-connecting the services (or their pre- and post-conditions) and the pre- and post-conditions of exploits (or the exploits). Definition 7 then extends the previous insider threat models based on the service dependency resource graph. We will apply this model in the upcoming sections to study the solution for mitigating the insider threats of remote administrators, and to conduct simulations to evaluate the effectiveness of the solution. In practice, the choice between those different models (e.g., k-zero day safety versus BN, or whether to consider service dependencies) will depend on the needs of specific applications and the available information or assumptions.
Given a network with the set of hosts H, the set of resources R, with the resource mapping
(The service dependency-based insider threat model).
Given the maintenance task assignment (i.e.,
The mitigation and use cases
In this section, we formulate the optimization-based solution for mitigating the insider threat of remote administrators during maintenance task assignment.
The optimization-based mitigation
Based on our definitions of the maintenance task assignment model and the insider threat model, we formulate the problem of optimal task assignment in Definition 8. The remote administrator task relation
(The optimal task assignment problem).
Given a resource graph G, the set of remote administrators
The Optimal Task Assignment Problem (Definition 8) is NP-hard.
First, evaluating the
In our study, we use the genetic algorithm (GA) [26] to optimize the maintenance task assignments by maximizing k. Specifically, the resource graph is taken as input to the optimization algorithm, with the (either average case or worst case) insider threat value k as the fitness function. We try to find the best task assignment for maximizing the value k within a reasonable number of generations. The constraints can be given either through defining the remote administrator task relation
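To make the search loop concrete, the following is a stripped-down evolutionary sketch in Python: elitist selection plus swap mutation, without crossover, so it is simpler than a full genetic algorithm and is not the toolbox configuration used in this study. The per-task k values, the number of RAs, the capacity limit, and the rule that an RA's k is the minimum over its tasks are all hypothetical stand-ins for evaluating a real resource graph.

```python
import random

TASK_K = {1: 3, 2: 3, 3: 2, 4: 2, 5: 1, 6: 1}  # hypothetical k per task
N_RAS, MAX_TASKS = 3, 2                         # assumed constraints


def fitness(assign):
    """Average k over RAs holding tasks; an RA's k is its lowest task k."""
    ks = []
    for ra in range(N_RAS):
        mine = [TASK_K[t] for t, owner in assign.items() if owner == ra]
        if mine:
            ks.append(min(mine))
    return sum(ks) / len(ks)


def feasible(assign):
    owners = list(assign.values())
    return all(owners.count(ra) <= MAX_TASKS for ra in range(N_RAS))


def random_assignment(rng):
    while True:  # rejection sampling until the capacity constraint holds
        candidate = {t: rng.randrange(N_RAS) for t in TASK_K}
        if feasible(candidate):
            return candidate


def mutate(assign, rng):
    child = dict(assign)
    t1, t2 = rng.sample(list(child), 2)
    # Swapping owners keeps every RA's task count, hence feasibility.
    child[t1], child[t2] = child[t2], child[t1]
    return child


def optimize(generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [random_assignment(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = [mutate(rng.choice(elite), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)
```

Because the elite is carried over unchanged, the best fitness never decreases between generations; constraints are enforced here by construction rather than through penalty terms.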
We demonstrate our solution through several use cases with different constraints. The first three use cases are based on the five remote administrators and ten maintenance tasks presented in Table 4, and the fourth use case is based on the motivating example shown in Section 2.1. The last use case is based on the service dependency resource graph with three remote administrators and five maintenance tasks.
Maintenance tasks assignments for use case A
Use case A. In this case, each RA should be assigned exactly two tasks (e.g., to evenly distribute the tasks among all the RAs). The three tables shown in Table 5 show three possible assignments and the corresponding k values. Also, Fig. 2 shows an example path (dashed lines) for tasks assigned to RA C1 based on the left table, and also the shortest path yielding the minimum k value. We use the GA to find the optimal task assignment that meets the constraint given in this case; as shown in the right table, the maximal average of k values among all RAs is
Maintenance task assignments for use case D (the motivating example)

Resource graph for the motivating example.
Use case B. In this case, each RA should be assigned with at least one task (e.g., to ensure all RAs are employed while there is no consideration for their workload). The optimal task assignment under this constraint is (RA1{8,9,10}, RA2{4,5}, RA3{3}, RA4{1,2}, and RA5 {6,7}). This relaxed constraint improves the average k from 2.2 in the previous example to 2.8, which shows relaxing the constraint may increase k (which means less threat).
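The constraints of use cases A and B can be expressed as simple predicates over an assignment. The sketch below checks the use case B assignment quoted above; the helper names are hypothetical, and the k values themselves are not recomputed since that requires the resource graph.

```python
# Assignment reported for use case B (RA -> set of task numbers).
USE_CASE_B = {
    "RA1": {8, 9, 10},
    "RA2": {4, 5},
    "RA3": {3},
    "RA4": {1, 2},
    "RA5": {6, 7},
}
ALL_TASKS = set(range(1, 11))  # the ten tasks of the running example


def covers_all_tasks_once(assignment, tasks=ALL_TASKS):
    """Every task appears in exactly one RA's set."""
    assigned = [t for ts in assignment.values() for t in ts]
    return sorted(assigned) == sorted(tasks)


def exactly_n_each(assignment, n):
    """Use case A style constraint: every RA holds exactly n tasks."""
    return all(len(ts) == n for ts in assignment.values())


def at_least_one_each(assignment):
    """Use case B style constraint: no RA is left without a task."""
    return all(len(ts) >= 1 for ts in assignment.values())
```

The reported assignment covers all ten tasks and satisfies the at-least-one constraint, but not the exactly-two constraint of use case A, which is what makes the larger search space of use case B available to the optimizer.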
Use case C. In this case, each RA can handle a fixed subset of tasks (e.g., due to the level of training or skill). In our example, we assume RA1 can be assigned to any task requiring the read privilege, RA2 to tasks requiring write level 1 privilege, RA3 to tasks requiring write level 1 and 2, RA4 to tasks requiring write level 3, and RA5 can be assigned to any task. After applying our solution, the optimal assignment yields the maximal average of k values to be
Use case D. This case shows the optimal maintenance task assignment for the tasks discussed in our motivating example in Section 2.1. We have eight RAs and each RA can handle a maximum of two tasks. The upper table in Table 6 shows the 15 maintenance tasks to be assigned. In Table 6, the four tables on the bottom show four different scenarios of tasks assigned to RAs, and each table yields a different average k. The bottom table on the right shows the optimal task assignment in terms of the average
Use case E. In this use case, we demonstrate how service dependencies may affect the task assignment. We have five maintenance tasks as presented in Table 7. Assume all VMs running on the http compute node have backups but some VMs running on the app compute node do not have a backup. The critical asset is given as the DB service. We have three RAs each of which can be assigned with a maximum of two maintenance tasks. Table 8 shows two possible ways to assign the maintenance tasks to RAs and the corresponding k values. We use the GA to find the optimal task assignment that satisfies the constraints given in this case. As shown in the right table, the maximal average of k values among all RAs is
Maintenance tasks and privileges for the service dependency
Maintenance tasks assignments for use case E

The service dependency resource graph for use case E.

The average k among 500 RAs before and after applying the mitigation solution.

The average k among different number of RAs before and after the solution.

The minimum k for 500 RAs.

The minimum k for varying # of RAs.
This section shows simulation results on applying our mitigation solution under various constraints.
Experimental settings. All simulations are performed using a virtual machine equipped with a 3.4 GHz CPU and 4 GB RAM in the Python 2.7.10 environment under Ubuntu 12.04 LTS and the MATLAB R2017b’s GA toolbox. To generate a large number of resource graphs and service dependency graphs for simulations, we start with seed graphs with realistic configurations similar to Fig. 1 and then generate random resource graphs and service dependency graphs by injecting new nodes and edges into those seed graphs. Those resource graphs and service dependency graphs are used as the input to the optimization toolbox, where the fitness function is to maximize the average or worst case insider threat values (given in Definition 4 and Definition 7). We also use the optimization toolbox with a fitness function that minimizes the probability of reaching the critical asset under the BN-based metric (given in Eq. (1)). Both settings are evaluated under various constraints, e.g., the number of available RAs and maintenance tasks, how many tasks may be assigned to each RA, assigning a fixed number of RAs with specific privileges, and assigning some of the maintenance tasks to the local administrators. We repeat each simulation on 300 different resource graphs to obtain the average result.
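The graph generation step described above (injecting new nodes and edges into a seed graph) could look roughly like the following sketch. The generation rules, the `new_` node prefix, and the retry bound are assumptions for illustration and do not describe the actual simulation code.

```python
import random


def grow_graph(seed_edges, extra_nodes, extra_edges, rng_seed=0):
    """Return (nodes, edges) after injecting random nodes and directed edges."""
    rng = random.Random(rng_seed)
    edges = set(seed_edges)
    nodes = sorted({n for edge in seed_edges for n in edge})

    for i in range(extra_nodes):
        new = f"new_{i}"  # hypothetical naming for injected nodes
        # Attach each new node to an existing one so it is reachable.
        edges.add((rng.choice(nodes), new))
        nodes.append(new)

    added, attempts = 0, 0
    while added < extra_edges and attempts < 100 * (extra_edges + 1):
        attempts += 1
        edge = tuple(rng.sample(nodes, 2))  # two distinct endpoints
        if edge not in edges:
            edges.add(edge)
            added += 1
    return nodes, edges
```

Repeating this with different `rng_seed` values would yield a family of graphs derived from one seed configuration, over which results can then be averaged.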
The average case insider threats. The objective of the first two simulations is to study how the average case insider threat (i.e., the average of k values among all RAs) may be improved through our mitigation solution under constraints on the number of tasks and RAs, respectively. In Fig. 7, the number of available RAs is fixed at 500, while the number of maintenance tasks is varied between 500 and 2,000 along the X-axis. The Y-axis shows the average of k values among all RAs. The solid lines represent the results after applying our mitigation solution under constraints about the maximum number of tasks assigned to each RA. The dashed lines represent the results before applying the mitigation solution.
Results and Implications: From the results, we can make the following observations. First, the mitigation solution successfully reduces the insider threat (increasing the average of k values) in all cases. Second, the results before and after applying the solution decrease (meaning increased insider threat) following similar linear trends as the number of maintenance tasks increases, until each RA reaches its full capacity. Finally, the result for a maximum of four tasks per RA after applying the solution is close to the result for a maximum of ten tasks per RA before applying the solution, which means the mitigation solution may allow more than twice as many tasks to be assigned to the same number of RAs while yielding the same level of insider threat.
In Fig. 8, the number of maintenance tasks is fixed at 2,500 while the number of RAs is varied between 400 and 1,000 along the X-axis. The Y-axis shows the average of k values among all RAs. The solid lines represent the results after applying the mitigation solution and the dashed lines represent the results before applying the solution. All the lines start with sufficient numbers of RAs for handling all the tasks since we only consider one round of assignment. We apply the same constraint as in the previous simulation.
Results and Implications: Again, we can see the mitigation solution successfully reduces the insider threat (increasing the average of k values) in all cases. More interestingly, we can observe the trend of the lines as follows. The dashed lines all follow a similar near-linear trend, which is expected since a larger number of RAs means less insider threat, as each RA will be assigned fewer tasks and hence given fewer privileges. On the other hand, most of the solid lines follow a similar trend of starting flat, then increasing almost linearly before reaching the plateau. This trend indicates that the mitigation solution can significantly reduce the insider threat when the number of RAs is within a certain range, past which it becomes less effective (because each RA already receives minimum privileges). The trend of four tasks per RA is slightly different mostly due to the limited number of RAs.
The worst case insider threats. The objective of the next two simulations is to study how the worst case insider threat (i.e., the minimum k values among all RAs) behaves under the mitigation solution. Figure 9 and Fig. 10 are based on a similar X-axis and constraints as the previous two simulations, whereas the Y-axis shows the minimum k among all RAs (averaged over 300 simulations).
Results and Implications: In Fig. 9, we can see that the minimum k values also decrease (meaning more insider threat) almost linearly as the number of tasks increases. In contrast to the previous simulation, the minimum k values are always lower than the average k values, which is expected. In Fig. 10, we can see the minimum k values also increase almost linearly before reaching the plateau as the number of RAs increases. In contrast to the previous simulation, the increase here is slower, which means the worst case results (minimum k values) are more difficult to improve with an increased number of RAs. Also, we can see that the worst case results reach the plateau later (e.g., 900 RAs for 8 tasks per RA) than the average case results (700 RAs).
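The two quantities compared in the average case and worst case simulations can be sketched as follows. This is a minimal illustration under the assumption that each maintenance task carries a k value and that an RA's k is the smallest k among its assigned tasks; the task names, k values, and function names are hypothetical and not taken from the simulations.

```python
def ra_k(tasks, task_k):
    """k of one RA, assumed to be the smallest k among its assigned tasks."""
    return min(task_k[t] for t in tasks)

def average_and_minimum_k(assignment, task_k):
    """Return (average k, minimum k) over all RAs that hold at least one task."""
    ks = [ra_k(tasks, task_k) for tasks in assignment.values() if tasks]
    return sum(ks) / float(len(ks)), min(ks)

# Hypothetical k values for four maintenance tasks.
task_k = {"t1": 2, "t2": 4, "t3": 3, "t4": 5}

# Two ways of assigning the same four tasks to two RAs.
before = {"ra1": ["t1", "t2"], "ra2": ["t3", "t4"]}
after = {"ra1": ["t1", "t3"], "ra2": ["t2", "t4"]}

print(average_and_minimum_k(before, task_k))  # (2.5, 2)
print(average_and_minimum_k(after, task_k))   # (3.0, 2)
```

In this toy example, regrouping the tasks raises the average k from 2.5 to 3.0, while the minimum k stays at 2 because the lowest-k task must still be assigned to some RA.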
The impact of the highest privileges. The objective of the next two simulations is to study how the average case insider threat (i.e., the average of k values among all RAs) behaves when we grant some RAs the highest privilege (W3) under our mitigation solution. In Fig. 11, the number of available RAs is fixed at 500 and each RA can handle at most four tasks, while the number of maintenance tasks is varied between 500 and 2,000 along the X-axis. The Y-axis shows the average k among all RAs. The solid lines show the average k after applying our mitigation solution under constraints on the number of RAs granted the W3 privilege before task assignment, namely 20 RAs, 10 RAs, and no RA, respectively.
Results and Implications: From the results, we can make the following observations. Granting the highest privilege to some of the RAs before assigning maintenance tasks can increase the average k to some degree when compared to the case where RAs are only granted privileges based on the tasks to be performed. However, the average k decreases more slowly in the case where RAs are granted privileges based only on the maintenance tasks. As we can see in the figure, the average k for 20 RAs granted the highest privilege decreases faster (i.e., the insider threat increases faster) than the others, which is expected because the highest privilege does not always correspond to the shortest path (e.g., W3 on the http node corresponds to a longer path than W2 on the app node).
In Fig. 12, the number of maintenance tasks is fixed at 2,500 while the number of RAs is varied between 400 and 1,000 along the X-axis. The Y-axis shows the average k among all RAs. Each RA can perform at most 10 maintenance tasks. The solid lines show the average k after applying our mitigation solution under constraints on the number of RAs granted the W3 privilege before task assignment, namely 40 RAs, 20 RAs, and no RA, respectively.
Results and Implications: From the results, we can make the following observations. Granting the highest privilege to some of the RAs before assigning maintenance tasks can increase the average k in all cases, and all cases follow a similar trend of starting flat, then increasing almost linearly before reaching the plateau. This trend shows that granting the highest privilege to some RAs will increase the average k as the number of RAs increases while the number of tasks is fixed.

The average k among 500 RAs with some of them granted the highest privilege.

The average k among different numbers of RAs with some of them granted the highest privilege.
The objective of the next two simulations is to study how the worst case insider threat (i.e., the minimum k values among all RAs) behaves under the mitigation solution when we grant some RAs the highest privilege (W3). In Fig. 13, the number of available RAs is fixed at 500 and each RA can handle at most four tasks, while the number of maintenance tasks is varied between 500 and 2,000 along the X-axis. The Y-axis shows the minimum k among all RAs. In Fig. 14, the number of maintenance tasks is fixed at 2,500 while the number of RAs is varied between 400 and 1,000 along the X-axis. The Y-axis shows the minimum k among all RAs. Each RA can perform at most 10 maintenance tasks. The solid lines show the minimum k after applying our mitigation solution under constraints on the number of RAs granted the W3 privilege before task assignment, namely 10 RAs, 20 RAs, and no RA in Fig. 13, and 20 RAs, 40 RAs, and no RA in Fig. 14.
Results and Implications: From the results, we can make the following observations. The minimum k in Fig. 13 follows a trend similar to that in Fig. 11: it decreases almost linearly as the number of tasks increases, and decreases faster as the number of RAs granted the highest privilege increases. In Fig. 14, the minimum k increases almost linearly before reaching the plateau.

The minimum k among 500 RAs with some of them granted the highest privilege.

The minimum k among different numbers of RAs with some of them granted the highest privilege.
The impact of local administrators. The objective of the next two simulations is to study how the average case insider threat (i.e., the average of k values among all RAs) behaves when we add a local administrator (LA) to perform the MTs whose k value is equal to the minimum k. In Fig. 15, the number of available RAs is fixed at 500, while the number of maintenance tasks is varied between 500 and 2,000 along the X-axis. In Fig. 16, the number of maintenance tasks is fixed at 2,500, while the number of RAs is varied between 400 and 1,000 along the X-axis. The Y-axis shows the average k among all RAs in both figures. The solid lines show the average k after applying our mitigation solution under the constraint that an LA performs the MTs whose k value is equal to the minimum k.
Results and Implications: In Fig. 15, we can see that the average k mostly decreases slowly. The local administrator performs the MTs that correspond to the shortest path (minimum k). Increasing the number of tasks that can be assigned to each RA can increase the average k, and the average k then decreases more slowly. In Fig. 16, we can see that increasing the number of RAs and eliminating the highest risk tasks (minimum k) by assigning those tasks to the LAs will increase the average k linearly before reaching the plateau.
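The constraint used in these two simulations, where the tasks with the minimum k are handed to a local administrator and only the remaining tasks go to RAs, can be sketched as follows. The function name and the task k values are hypothetical and serve only to illustrate the split.

```python
def split_for_local_admin(task_k):
    """Send every task whose k equals the overall minimum k to the LA.

    Returns (tasks for the LA, remaining {task: k} to be assigned to RAs).
    """
    lowest = min(task_k.values())
    la_tasks = sorted(t for t, k in task_k.items() if k == lowest)
    ra_tasks = {t: k for t, k in task_k.items() if k != lowest}
    return la_tasks, ra_tasks

# Hypothetical k values for four maintenance tasks.
task_k = {"t1": 1, "t2": 3, "t3": 1, "t4": 4}
la_tasks, ra_tasks = split_for_local_admin(task_k)
print(la_tasks)                # ['t1', 't3']
print(min(ra_tasks.values()))  # 3
```

After the split, the smallest k that any RA can receive in this example rises from 1 to 3, which is the effect that removing the highest risk tasks is meant to achieve.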

The average k among 500 RAs while assigning some of the tasks to a local administrator.

The average k among different numbers of RAs while assigning some of the tasks to a local administrator.
The impact of service dependencies. The objective of the next simulations is to study how the service dependency can affect the average k. In Fig. 17, the number of available RAs is fixed at 500, while the number of maintenance tasks is varied between 500 and 2,000 along the X-axis. The Y-axis shows the average k among all RAs. The solid lines show the results of average k after considering the service dependency in our mitigation solution under constraints about the maximum number of tasks assigned to each RA. The dashed lines represent the results without considering the service dependency in our mitigation solution.
Results and Implications: In Fig. 17, we can see that considering the service dependencies will decrease the average k because we have to consider more critical assets at the same time (e.g., if we want to secure the database VM service from being compromised, we will need to consider any service connected to the database VM as a critical asset). Also, we can see that the result of k when considering the service dependency is relatively close to the result of k without considering it, which means the mitigation solution is relatively effective for protecting services as well as resources.
In Fig. 18, the number of maintenance tasks is fixed at 2,500 while the number of RAs is varied between 400 and 1,000 along the X-axis. The Y-axis shows the average k among all RAs. The solid lines show the results of average k after considering the service dependency in our mitigation solution under constraints about the maximum number of tasks assigned to each RA. The dashed lines represent the results without considering the service dependency in our mitigation solution.
Results and Implications: From the results, we can make the following observations. Considering the service dependency will increase the average k almost linearly before reaching the plateau. However, when we compare the average k with the result of the resource graph, we find that the average k under service dependency is lower, which is expected because we are essentially considering more critical assets under the service dependency.
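The reason why service dependencies lower k can be illustrated with a small sketch. It assumes, as a simplification, that k is the number of hops on the shortest path from an RA's privilege to the nearest critical asset, and that every service directly connected to a critical service also becomes critical. The graph, the node names, and the dependency map below are hypothetical.

```python
from collections import deque

def shortest_k(graph, start, targets):
    """Breadth-first hop count from start to the nearest target (None if unreachable)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def with_dependencies(critical, depends):
    """Add every service directly connected to a critical service."""
    expanded = set(critical)
    for service, linked in depends.items():
        if service in critical:
            expanded.update(linked)
    return expanded

# Hypothetical chain of privileges and services; each edge stands for one exploit.
graph = {
    "W1@http": ["W2@http"],
    "W2@http": ["app_service"],
    "app_service": ["W2@app"],
    "W2@app": ["db_service"],
}
depends = {"db_service": ["app_service"]}  # app_service is connected to db_service

critical = {"db_service"}
print(shortest_k(graph, "W1@http", critical))                              # 4
print(shortest_k(graph, "W1@http", with_dependencies(critical, depends)))  # 2
```

Enlarging the set of critical assets can only shorten the distance to the nearest one, so k under service dependency never exceeds k on the plain resource graph in this simplified model.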

The average k among 500 RAs with and without considering service dependency.

The average k among different numbers of RAs with and without considering service dependency.
The objective of this simulation is to show how the minimum k behaves when considering the service dependency. In Fig. 19, the number of available RAs is fixed at 500, while the number of maintenance tasks is varied between 500 and 2,000 along the X-axis. The Y-axis shows the minimum k among all RAs. In Fig. 20, the number of maintenance tasks is fixed at 2,500 while the number of RAs is varied between 400 and 1,000 along the X-axis. The Y-axis shows the minimum k among all RAs.
Results and Implications: In Fig. 19, we can see that the minimum k values also decrease almost linearly as the number of tasks increases, which means the insider threat increases. From the results in Fig. 20, we can see that the minimum k values also increase almost linearly before reaching the plateau as the number of RAs increases. The increase here is slower than that in Fig. 18, which means the worst case results (minimum k values) are more difficult to improve.

The minimum k among 500 RAs under service dependency.

The minimum k among different numbers of RAs under service dependency.
The BN-based metric. The objective of this simulation is to apply the BN-based metric instead of the k-zero day safety metric on both the resource graph and the service dependency graph. In Figs 21 and 22, the number of available RAs is fixed at 500, while the number of maintenance tasks is varied between 500 and 2,000 along the X-axis. The Y-axis shows the average probability to compromise the critical asset among all RAs.
Results and Implications: From the results, we can make the following observations. In Fig. 21, the probability to reach a critical asset increases almost linearly when the number of maintenance tasks increases while the number of RAs is fixed. Also, we find that increasing the maximum number of tasks that can be assigned to each RA slows the rate at which the probability increases, which means a slower increase in the insider threat. In Fig. 22, we find the result follows a trend similar to that of the previous figure. However, the probability to reach the critical assets in the service dependency resource graph is much higher than that in the resource graph (e.g., for a maximum of four tasks for each RA, the probability is almost 25% higher in the service dependency resource graph), which is expected because the service dependency means attackers would have more options to compromise the critical assets.
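To see why additional options raise the probability, consider the following simplified sketch. It is not Eq. (1): it assumes that each exploit succeeds with probability equal to its CVSS score divided by 10, that exploits are independent, and that a node is reached if at least one incoming exploit succeeds. The graph, the scores, and the function name are hypothetical.

```python
def reach_probability(edges, start, target, order):
    """Probability of reaching target from start in an acyclic graph.

    edges maps (src, dst) to a CVSS score in [0, 10]; order lists the nodes
    in topological order. Incoming exploits are treated as independent.
    """
    prob = {node: 0.0 for node in order}
    prob[start] = 1.0
    for node in order:
        if node == start:
            continue
        miss = 1.0  # probability that every incoming exploit fails
        for (src, dst), cvss in edges.items():
            if dst == node:
                miss *= 1.0 - prob[src] * (cvss / 10.0)
        prob[node] = 1.0 - miss
    return prob[target]

# One path from the RA's privilege to the database.
single = {("ra", "app"): 5.0, ("app", "db"): 8.0}
p_single = reach_probability(single, "ra", "db", ["ra", "app", "db"])

# A dependent service adds a second path to the same asset.
double = dict(single)
double.update({("ra", "svc"): 5.0, ("svc", "db"): 8.0})
p_double = reach_probability(double, "ra", "db", ["ra", "app", "svc", "db"])

print(round(p_single, 2), round(p_double, 2))  # 0.4 0.64
```

With one path the probability is 0.5 × 0.8 = 0.4; adding a second, equally likely path gives 1 − (1 − 0.4)² = 0.64, which mirrors the higher probabilities observed on the service dependency resource graph.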

The probability to compromise critical assets based on the resource graph.

The probability to compromise critical assets based on the service dependency resource graph.
The insider threat is a challenging issue for both traditional networks and clouds. Ray and Poolsapassit propose an alarm system to monitor the behavior of malicious insiders using the attack tree [49]. Mathew et al. use the capability acquisition graphs (CAG) to monitor the abuse of privileges by malicious insiders [38]. Sarkar et al. propose DASAI to analyze whether a process contains a step that meets the insider attack condition [53]. Chinchani et al. propose a graph-based model for insider attacks and measure the threat [16]. Althebyan and Panda propose a prediction and detection model for insider attacks based on knowledge gathered by the internal users during work time in the organization [5]. Bishop et al. present an insider threat definition based on security policies and determine the source of risk [10]. Roy et al. study an employee assignment problem to find an optimal assignment of tasks to employees based on constraints in role-based access control [50].
There exists only limited effort on insider attacks in the context of clouds. Our previous work focuses on applying different threat modeling techniques to cloud data center infrastructures where the focus is on external attackers [3]. Gruschka and Jensen devise a high level attack surface framework to show from where the attack can start [28]. NIST emphasizes the importance of security measurement and metrics for cloud providers in [43]. A framework is proposed by Luna et al. for cloud security metrics using basic building blocks [37]. Resource graphs can be automatically generated by modeling the network and vulnerabilities, and many useful analyses may be performed using resource graphs [7,56,67,68]; however, our work is the first to use resource graphs for modeling insider attacks.
There exist many works on network security metrics in general [47,65]. Some of those works focus on modeling known vulnerabilities for network security [2,68] while other works focus on modeling unknown vulnerabilities (zero day attacks) [62,64,70,73], which are usually considered unmeasurable due to the uncertainties involved [39]. A BN-based security metric applies resource graphs to measure the security level of a network [24]; the metric converts the CVSS scores of vulnerabilities into attack probabilities and then obtains the overall attack likelihood for reaching critical assets. We apply this metric to measure insider threats in this paper. Security metrics and measurements in clouds still face many challenges as shown in [14]. Following security standards is shown to be not enough to ensure the security of cloud infrastructures, and security metrics may help to evaluate the security level [9]. Halabi and Bellqich use the Goal-Question-Metric to develop a quantitative evaluation metric that helps the cloud provider to evaluate its cloud security service and to know the level of security [30]. Finally, there exist some works focusing on high level risk assessment for clouds, such as the framework to evaluate the security of clouds based on the security impact in six categories and abstract levels of security impact [52].
The proactive mitigation of security threats can be performed through network hardening. Early works on network hardening focus on breaking all the attack paths that an attacker can follow to compromise a critical asset, either in the middle of the paths or at the beginning (disabling initial conditions) [57,61,66]. Network hardening using optimization is proposed by Gupta et al. in [29], refined with multiple objective optimization by Dewri et al. in [21] and with dynamic conditions by Poolsappasit et al. in [48], and extended as vulnerability analysis with cost/benefit assessment [22] and risk assessment [71]. More recent works [12,13] focus on combining multiple hardening options through optimization, and improving the diversity of networks, respectively. We borrow the optimization-based hardening techniques [12,29] to mitigate the insider threats in this paper. In the context of clouds, there are works on securing the cloud from insider attacks by limiting the trust on the compute node [60]. Li et al. focus on supporting users to configure privacy protection in compute nodes [35]. Closest to our work, Bleikertz et al. focus on securing the cloud during maintenance time by limiting the privileges granted to the remote administrators based on the tasks assigned to each administrator [11].
There are many works that focus on the service dependency. Some of those existing works focus on the mission impact, which relies on the service dependency model to capture the threats that can impact a mission [33,54]. Chen et al. [15] show how the service dependency can affect the system since compromising a service can affect other services hosted in the same network. Natarajan et al. [42] develop a tool for automatically finding dependencies between services in large networks. Another work develops different techniques to monitor service dependencies in distributed systems [46]. Closest to our work, Sun et al. [59] embed mission impact and service dependency in an attack graph to find how service dependency can impact different missions. We borrow their model of service dependency in evaluating the insider threats of remote administrators.
Conclusion
In this paper, we have modeled the insider threat during maintenance task assignment for cloud providers to better understand the threats posed by third party remote administrators. We have formulated the optimal assignment as an optimization problem and applied a standard optimization technique to derive solutions under different constraints. We have extended our insider threat models to consider service dependencies. Also, we have applied the BN-based metric to take all attack paths and known vulnerabilities into consideration. Based on such models, we have conducted simulations for different use cases whose results show our solution can significantly reduce the insider threat of remote administrators. The limitations and corresponding future directions are as follows. First, the current mitigation solution is static in nature, and we will devise incremental solutions to handle streams of new maintenance tasks and dynamics (e.g., RAs joining or leaving, changing priority or weight of tasks, etc.). Second, our model has kept the cost implicit, and we will consider explicit cost models (e.g., based on the nature of the tasks, the amount or duration of tasks, and privileges needed) and incorporate such cost models into the mitigation solution.
Acknowledgments
Authors with Concordia University are partially supported by the Natural Sciences and Engineering Research Council of Canada under Discovery Grant N01035. Sushil Jajodia was supported in part by the National Institute of Standards and Technology grants 60NANB16D287 and 60NANB18D168, National Science Foundation under grant IIP-1266147, Army Research Office under grant W911NF-13-1-0421, and Office of Naval Research under grant N00014-15-1-2007.
