Abstract
Role mining, the process of deriving a set of roles from the available user-permission assignments, is considered to be an essential step in successful implementation of Role-Based Access Control (RBAC) systems. Traditional role mining techniques, however, are not equipped to handle temporal extensions of RBAC like the Temporal-RBAC (TRBAC) model. In this paper, we formally define the problem of finding a minimal set of roles from temporal user-permission assignments, such that in the resulting TRBAC system, users acquire either the same or a subset of the permissions originally assigned to them for the complete or partial durations of time as specified in the input. We show that the problem is NP-complete and propose a greedy algorithm for solving it. Our algorithm first derives a set of candidate roles from the temporal user-permission assignments and then selects the least possible number of roles from the candidate role set. The final output consists of a set of roles, a user-to-role assignment relation, a role-to-permission assignment relation and a role enabling base describing the time durations for which each role is enabled. Performance of the proposed approach has been evaluated on a number of synthetic as well as real-world datasets.
Keywords
Introduction
Over the years, Role-Based Access Control (RBAC) has become a well-accepted and successful model for enforcing secured access to sensitive information and restricted resources [10,34]. In RBAC, roles are defined by an organization to which users are assigned. One or more permissions are associated with each role. Users acquire necessary authorizations to access resources through these permissions by being members of appropriate roles. In an RBAC system, the set of permissions included in a role is at the disposal of any user assigned to that role for an unrestricted period of time until the role is revoked from the user, typically by an administrator. Thus, RBAC does not impose any constraint on the time duration for which roles can be available to the users. In many real-life scenarios, however, there is a need to restrict the availability of permissions to users only for specific periods of time. To achieve this, it is necessary to impose a constraint on the duration for which a user can assume a particular role assigned to him. Since the basic RBAC model cannot handle such a temporal dimension associated with roles, several extensions of RBAC have been proposed like the Temporal Role-Based Access Control (TRBAC) model [2] and the Generalized Temporal Role-Based Access Control (GTRBAC) model [23]. These models allow roles to be enabled for specific sets of time intervals and remain disabled for the rest of the time.
For organizations choosing to use RBAC, defining a complete and correct set of roles is a primary requirement. This is necessary to ensure that we can derive the most benefit from RBAC. The process of determining the set of roles is known as Role Engineering [8], which can be of two types: top-down [33] and bottom-up [9,38]. The top-down approach analyzes and decomposes the organizational business processes into smaller units in order to identify the permissions required to carry out specific tasks. It is not viable when the number of business processes, users and permissions becomes very large, consequently increasing the number of roles. In contrast, the bottom-up approach, often called role mining, forms the roles in an algorithmic way that can be easily automated. The input to role mining is a set of direct user-permission assignments. Role mining generates a set of roles, a user-role assignment relation denoting which roles are assigned to each user and a role-permission assignment relation capturing the permissions associated with each role. Various kinds of metrics have been considered for optimality during role mining, most notably the number of roles generated [35] and the weighted structural complexity [30]. Since role mining might result in roles that are semantically less meaningful, a combination of top-down and bottom-up approaches, known as hybrid role engineering, has recently been proposed [13,30].
The traditional role mining techniques [3,5,9,25,38] do not consider any temporal information that could be associated with the user-permission assignments while deriving a set of roles. Hence, these are not suitable for mining roles in the context of temporal role-based access control models like TRBAC [2]. We call such user-permission assignments as temporal user-permission assignments and the process of mining roles from them as temporal role mining. Each user, on being assigned a subset of the derived set of roles, acquires the corresponding permissions only for those durations of time for which the roles are enabled. Thus, the process of temporal role mining not only generates the set of roles, user-role assignments and role-permission assignments, but also the sets of time intervals during which each role is enabled (captured in a Role Enabling Base – REB).
It may be noted that the sets of time intervals for which users acquire various permissions through their assigned roles after role mining can be required to match either exactly or with limited deviations from the sets of time intervals for which they were originally assigned those permissions. This would depend on specific organizational needs and it affects the nature of the target role set generated by the process of temporal role mining. The problem of finding a minimal set of roles from initial temporal user-permission assignments can have different formulations depending on whether an exact or an near-to-exact (partial) match is the final objective. In this paper, a generalized formulation of the temporal role mining problem for TRBAC is presented that incorporates the notions of both exact and partial match. We name this problem as the Generalized Temporal Role Mining Problem (GTRMP).
The main contributions of the current work can be summarized as follows:
We formally define the Generalized Temporal Role Mining Problem and show it to be NP-complete. This problem definition extends the work presented in [28] by incorporating both exact and inexact variants of the problem. The acceptable level of mismatch for the inexact variant can be set appropriately by specifying the value of an input parameter to the problem.
We propose a greedy algorithm to solve GTRMP. It incorporates several improvements over the one proposed in [28].
Results of extensive experiments in terms of the number of roles produced and the overall execution time with both synthetic as well as real-world datasets are presented. These show significant improvement in comparison to the previous algorithm. Moreover, results on real-world datasets indicate that the proposed approach can be effectively used for temporal role mining in practical applications.
The proposed approach operates in two phases, namely, (a) generation of candidate roles and (b) selection of a minimum cardinality subset of candidate roles and assigning them to appropriate users so that the intended type of match (exact or partial) between the role mining output and the input temporal user-permission assignments is obtained. For the second phase, four different greedy choices are proposed using which candidate role selection can be done.
The rest of the paper is organized as follows. Section 2 discusses some of the preliminaries on RBAC, TRBAC and role mining. Section 3 formally defines the Generalized Temporal Role Mining Problem. In Section 4, we present our approach to solve the problem and illustrate its working with an example. Experimental results, both on synthetic as well as real-world datasets, are presented in Section 5. Section 6 reviews existing literature and Section 7 concludes the paper with some insight into future work.
Preliminaries
This section presents some of the preliminary concepts related to the RBAC and TRBAC models as well as role mining that would help in understanding our proposed work.
Role-Based Access Control (RBAC)
In the core RBAC model [10] (the
Set of permissions,
Core RBAC also has sessions, which have not been included here as they are not directly relevant in the context of role mining.
Temporal Role-Based Access Control (TRBAC)
The TRBAC model [2] is an extension of RBAC in which each role is enabled for a set of time intervals as specified in a Role Enabling Base (REB). Otherwise, they remain disabled. In order to specify the sets of time intervals for which a role can be enabled, TRBAC uses the periodic expression formalism [2]. A periodic expression denotes an infinite set of time intervals represented in terms of Calendars, each of which is a set of consecutive time intervals. Let
As an example, the periodic expression
Role mining
Role mining is the process of deriving a set of roles from direct user-permission assignments represented in the form of a boolean matrix called the UPA matrix. The rows of UPA correspond to users and the columns correspond to permissions. An entry (
Mining roles in the context of TRBAC
In this section, we first describe how the temporal user-permission assignments can be represented in the form of a matrix. We then present a generalized version of the temporal role mining problem, which we name as GTRMP.
Temporal UPA matrix
Unlike the UPA matrix introduced in Section 2.3 that captures static binding of permissions to users, a temporal user-permission assignment relation describes the set of time intervals for which one or more permissions are available to each user. It is assumed that such information is available with the system administrator and is provided as input for temporal role mining. We represent these temporal assignments in the form of a matrix and name it as the Temporal UPA (TUPA) matrix. The rows of TUPA correspond to users and the columns represent permissions. Each cell (
Example Temporal User-Permission Assignment (TUPA) matrix
Example Temporal User-Permission Assignment (TUPA) matrix
Table 1 shows an example TUPA matrix in which user
Given a TUPA matrix as input, temporal role mining derives a set of roles ROLES, a user-role assignment relation UA, a role-permission assignment relation PA and a role enabling base REB. UA and PA are boolean matrices similar to the ones obtained through traditional role mining as described in Section 2.3. The need to generate the REB makes the temporal role mining process different and more challenging than the traditional role mining algorithms.
Once temporal role mining is complete and a TRBAC system has been obtained, the sets of time intervals for which the users acquire various permissions through roles and their enabling conditions, can be determined from UA, PA and REB. These temporal user-permission assignments in the generated TRBAC system either match exactly or there are some mismatches with the TUPA, which could be of the following types:
Mismatch in the set of permissions of any user. Mismatch only in the set of time intervals during which any permission is available to a user. Mismatch both in the set of permissions and the time intervals of availability of those permissions. one or more users acquire less number of permissions after temporal role mining than they were originally assigned; while the users acquire only those permissions that were originally assigned to them, the sets of time intervals for which they get these permissions are subsets of the sets of time intervals corresponding to the original permission assignments; a combination of the two.
Thus, depending on the type of mismatch, the following situations may arise (under closed world assumption, where only more restricted availability of permissions to users is considered):
We say that the output of temporal role mining is temporally δ-consistent with the input TUPA, if there are δ or lesser number of mismatches of the types mentioned above. We call δ as the temporal mismatch limit. Depending on the value of δ, the problem of temporal role mining can have two versions: the exact version in which
(Generalized Temporal Role Mining Problem (GTRMP)).
Given a set of users USERS, a set of permissions PRMS, a temporal user-permission assignment TUPA and a temporal mismatch limit
Note that an exact version of the problem where
It needs to be emphasized here that, GTRMP is not a trivial extension of δ-approx RMP introduced in [35], where the threshold is simply the number of 0s and 1s obtained by combining the UA and PA matrices generated by the role mining procedure that do not match with the input UPA matrix. GTRMP can be reduced to δ-approx RMP if a mismatch only in the permission component of the temporal mismatch limit δ is considered and the sets of time intervals corresponding to all the input temporal user-permission assignments are taken to be the same. Thus, δ-approx RMP is also a special case of GTRMP. Besides, the sets of time intervals corresponding to the temporal user-permission assignments can span over several calendars and can be different from each other.
The Generalized Temporal Role Mining Problem is NP-complete. The proof is given in the Appendix.
Greedy algorithm for solving GTRMP
In this section, we propose a heuristic algorithm for solving GTRMP that takes as input a TUPA matrix and a value of the temporal mismatch limit δ. We first generate a set of candidate roles from TUPA. A greedy heuristic then selects the least possible number of roles out of these candidate roles so that the output is temporally δ-consistent with the TUPA. This section also illustrates the working of our approach using an example.
Algorithm description
The following two sub-sections describe our approach for solving GTRMP using two phases.
Creation of candidate roles
We first describe how the set of candidate roles is created from an input TUPA matrix. The set of candidate roles consists of three types of roles – unit roles, initial roles and generated roles.
Unit roles. As mentioned in Section 3.1, each user-permission combination in TUPA consists of one or more entries of the form
Initial roles. In this step, for each user, one or more initial roles are created by taking into consideration the set of time intervals corresponding to each permission assignment. If one or more permissions are assigned to a user u for the same set of time intervals, then the user u, the assigned permissions and the corresponding set of time intervals are combined together to create an initial role.
We illustrate the process of creating the set of initial roles, denoted by
If
If
If
If
If
In general, if a user is assigned n permissions and t distinct sets of time intervals exist across those n permission assignments, then a total of t initial roles will be created for that user.
Generated roles. Pairwise intersection among the initial roles is next carried out to create roles consisting of a set of permissions and a set of time intervals that are common between any two users. We call this set of newly created roles as
Role
The set of candidate roles, denoted by
Role selection
The set
At the end of the role selection phase, at least
Strict Coverage of Triples (SCT): Select the role that fully covers the maximum number of uncovered or already partially covered triples. If there is a tie, break the tie by selecting the role that partially covers the maximum number of uncovered or already partially covered triples.
Weak Coverage of Triples (WCT): Select the role that covers (either fully or partially) the maximum number of uncovered or already partially covered triples. If there is a tie, break the tie by selecting the role that fully covers the maximum number of uncovered or already partially covered triples.
Strict Coverage of Uncovered Triples (SCUT): Select the role that fully covers the maximum number of uncovered triples. If there is a tie, break the tie by selecting the role that partially covers the maximum number of uncovered triples. If no such role exists, select the role that fully covers the maximum number of already partially covered triples and break the tie by selecting the role that partially covers the maximum number of already partially covered triples.
Weak Coverage of Uncovered Triples (WCUT): Select the role that covers (either fully or partially) the maximum number of uncovered triples. If there is a tie, break the tie by selecting the role that fully covers the maximum number of uncovered triples. If no such role exists, select the role that covers (either fully or partially) the maximum number of already partially covered triples and break the tie by selecting the role that fully covers the maximum number of already partially covered triples.
Since at least
SCT gives priority to roles that fully cover more number of triples in a single iteration. Its tie breaking logic ensures that triples that remain partially covered as a result of selecting a role in a particular iteration may be covered fully by some other role in a subsequent iteration, and in the final output, lesser number of uncovered triples are present than partially covered ones. In contrast, WCT selects roles in such a manner that both full and partial coverage of triples are ensured as a specific number of triples need to be fully covered and also it is desirable to have lesser number of triples left uncovered. WCT breaks ties in such a manner that more number of triples get fully covered in each iteration.
SCUT initially tries to select a role that fully covers the maximum number of triples not covered so far. This implies that SCUT gives higher priority to the uncovered triples over the partially covered ones, as we intend to have less number of uncovered triples at the end of role selection. Tie is broken by selecting the role that fully covers the maximum number of uncovered triples, thereby giving precedence to roles that will result in more number of uncovered triples getting fully covered after being selected. Proceeding in this manner, after first few iterations, every triple of the TUPA will be partially covered. In successive iterations, roles that fully cover the maximum number of such already partially covered triples are selected and if there is a tie, it is broken by selecting the role that partially covers the maximum number of already partially covered triples, thereby preferring triples getting partially covered rather than remaining uncovered. In contrast, WCUT first selects those roles that cover (either fully or partially) more number of uncovered triples, thus giving priority to full and partial coverage of triples as opposed to their remaining uncovered. Tie is broken by giving preference to the role that fully covers more number of uncovered triples. Subsequently, when some of the triples get fully covered and the rest get partially covered, WCUT selects a role that covers (either fully or partially) the maximum number of triples that are already partially covered, and in the event of a tie, breaks it by selecting the role that partially covers the highest number of already partially covered triples.
Thus, SCT and WCT do not take into explicit consideration whether a particular triple has already been partially covered during role selection, whereas SCUT and WCUT first consider only those triples that have not been covered so far while selecting a role. Hence, SCT and WCT as well as SCUT and WCUT are pairs of complementary choices, i.e., the initial criterion for role selection in one becomes the criterion for tie breaking in the other. The four choices for role selection essentially differ from each other with regard to the nature of coverage of the different types of triples.
After the best candidate role is selected in each iteration, the fully and partially covered triples are marked appropriately and Γ is updated. The algorithm terminates when
We summarize the steps of our proposed approach before presenting the detailed algorithms.
Candidate role generation:
Creation of three types of roles from the input TUPA:
Unit roles: Single permission roles that can be assigned to a single user and can be enabled for a single set of time intervals, created corresponding to each triple of the TUPA.
Initial roles: Created by combining the different permissions assigned to a single user for the same sets of time intervals.
Generated roles: Produced by performing pairwise intersection among the initial roles.
Creation of candidate role set by taking union of the sets of unit, initial and generated roles.
Merging of appropriate candidate roles.
Role selection:
Iterative selection of a minimal cardinality subset of the candidate role set using any one of the four greedy heuristics, namely, SCT, WCT, SCUT and WCUT.
Selecting the best candidate role in each iteration and updating the TUPA.
Execution stops when required number of triples have been fully covered.
Further merging among the selected candidate roles, if possible.
Generating the set of roles R, the user-role assignment relation UA, the role-permission assignment relation PA and the role enabling base REB as output.
The algorithm for determining the set of candidate roles (Det-Cand-Roles) is shown in Algorithm 1. Line 1 initializes

Det-Cand-Roles

Sel-Fin-Roles
The algorithm for selecting the final set of roles (Sel-Fin-Roles) is shown in Algorithm 2. The input to Sel-Fin-Roles is the set
The overall time complexity is
Simplified representation of the TUPA shown in Table 1
We illustrate the working of our proposed approach with the help of the TUPA given in Table 2, which is a simplified representation of the TUPA of Table 1.
Candidate role generation
The TUPA matrix is given as input to the algorithm Det-Cand-Roles. First, the set
Role selection
The algorithm Sel-Fin-Roles takes the set If the value of δ is set to 2, then the final number of roles would be 5 if SCT or WCT is used, whereas it would be 6 if SCUT or WCUT is used. The cells highlighted in gray in Table 3 indicate the roles that are not selected when
Candidate role selection
Candidate role selection
Output for the TUPA of Table 2
In this section, we evaluate the performance of our proposed approach in terms of the number of roles produced and the overall execution time. The algorithms were implemented in C on a 3 GHz Core 2 Duo processor machine having 2 GB memory and running Fedora 13 as the operating system. We carried out experiments using both synthetically generated datasets as well as real-world datasets. Appropriate code optimization was done to reduce the overall execution time. During the creation of initial and generated roles, whenever a new role is created, it is first checked if any existing role contains the same set of permissions and is enabled for the same set of time intervals. If so, the two sets of users are merged. As a result, the sizes of the initial and generated role sets are reduced, which in turn reduces the time required to create the candidate role set. Creating the initial and generated roles in this manner also decreases the overall execution time if the number of distinct sets of time intervals is increased, which is reflected in the experimental results. We conclude this section with a discussion, summarizing the observations from the results of both types of datasets.
Results on synthetic datasets
This sub-section first describes how the synthetic datasets are generated followed by results of our experiments on these datasets.
Dataset generation
For synthetically generating the datasets, a simulator has been built that creates TUPA matrices of different dimensions. For our experiments, number of users has been taken to be 100, 200 and 400 and the number of permissions is considered to be half, equal and double the number of users. The number of distinct sets of time intervals is varied from 1 to 3. When it is 1, all the permissions are assigned for the same set of time intervals. This case is denoted as 1D. When 2 distinct sets of time intervals are present, they can be related to one another in three different ways: (i) one set of time intervals contains the other (2C), (ii) the two sets of time intervals intersect, but neither contains the other (2O) and (iii) the two sets of time intervals are mutually disjoint (2D). For 3 distinct sets of time intervals, the interrelationships are of the following types: (i) two sets of time intervals intersect one another, but neither of them contains the other and the third one is disjoint from the two (2O1D), (ii) all the three sets of time intervals are mutually disjoint (3D), (iii) one set of time intervals contains another and the third is disjoint from the two (2C1D), (iv) one set of time intervals contains the other, which in turn contains the third (3C) and (v) all the three sets of time intervals intersect one another, but none of them contains any of the remaining two (3O). Each user-permission assignment of the TUPA matrices is randomly associated with one or more of the above mentioned types of sets of time intervals when the distinct sets of time intervals is 2 and 3. The value of the temporal mismatch limit δ is taken as 0, 0.5, 1 and 2 percent of the total number of triples in the TUPA. Since these datasets are randomly generated, each experiment was repeated 30 times. For all the experiments, we report the median number of roles obtained and the mean execution time over all 30 runs. In our results, we show how the number of roles produced and the execution time of our approach are affected due to variation in the number of users, number of permissions, value of δ and the different time interval relationships. Performance of all the four proposed heuristics were found to be similar both in terms of the number of roles and the execution time, with the SCT heuristic described in Section 4.1.2 producing the best results in most of the cases. Hence in the rest of this section, we report the results obtained by using SCT for role selection.
Results
Tables 5–7 show the median number of roles obtained when the number of users are 100, 200 and 400, respectively. Results have been reported in each of the three tables for all the nine cases of time interval relationships mentioned in the previous sub-section and for all values of δ.
Number of roles for 100 users
Number of roles for 100 users
Number of roles for 200 users
Number of roles for 400 users
The temporal role mining problem reduces to the non-temporal role mining problem for case 1D with
It may be observed from the tables that, cases 2C and 3C respectively produce the least number of roles among all the combinations of 2 and 3 distinct sets of time intervals most of the time. In these cases, due to containment of the sets of time intervals, a user is effectively assigned a permission for a single set of time intervals, thereby resulting in limited number of time interval splitting during role selection. The number of roles obtained for cases 2O, 2O1D and 3O are most of the times higher than those obtained for cases 2C, 2C1D and 3C, respectively, since overlapping time intervals result in more splitting of time intervals than contained time intervals. However, cases 2C1D and 2O1D produce equal number of roles in many cases. Presence of the single disjoint set of time intervals effectively increases the total number of distinct sets of time intervals from 1 to 2 for case 2C1D in comparison to cases 2C and 3C, but for 2O1D, the total number of distinct sets of time intervals remains 3. As a result, 2C1D sometimes produces the same number of roles as that produced by 2O1D, which is not observed for cases 2C–2O and 3C–3O. Disjoint sets of time intervals, though do not result in time interval splitting, increase the effective number of distinct sets of time intervals in comparison to contained sets of time intervals. So, case 2D, in general, generates more roles than 2C and cases 3D and 2C1D produce higher number of roles than 3C. For similar reasons, 2C1D produces lesser number of roles than 3D. However, 2C1D sometimes generates the same number of roles as that produced by 3D, thereby showing that the cumulative effect of contained and disjoint sets of time intervals can produce results similar to the case containing only disjoint sets of time intervals.
Between the cases 2O and 2D, the former, in general, generates higher or equal number of roles. Splitting of the sets of time intervals both during candidate role generation as well as selection due to the presence of overlapping sets of time intervals, sometimes outweighs the effect of the presence of the disjoint sets of time intervals, resulting in more number of roles in case of overlapping time intervals. However, such splitting can, at times, produce effects similar to that observed in the case of disjoint sets of time intervals, ultimately producing equal number of roles. Similar reasoning applies for cases 3D and 3O. In general, the number of roles for 2O1D is lesser or equal to 3O. The lesser number of roles for 2O1D are produced due to the presence of more number of overlapping time intervals in 3O, leading to greater amount of time interval splitting. However, the combination of overlapping and disjoint sets of time intervals, in some cases, produces an effect similar to that of overlapping sets of time intervals, thereby generating equal number of roles for 2O1D and 3O.
It can be observed that as the number of distinct sets of time intervals increases from 1 to 2, the number of roles generated also increases due to the splitting of time intervals and the presence of more number of disjoint sets of time intervals. When the number of distinct sets of time intervals is 3, the number of triples in the TUPA matrix is more than that for 2 distinct sets of time intervals. Since splitting of time intervals is not carried out during initial role creation, roles that can fully cover a large number of triples are created, which ultimately does not increase the final number of roles obtained for cases of 3 distinct sets of time intervals than those of 2 distinct sets of time intervals. The tables further show that the number of roles increases with increase in the number of users and permissions. The increase in the number of roles with an increase in the number of permissions is more prominent in case of higher number of users.
Execution time (in ms) for 100 users
Execution time (in s) for 200 users
Execution time (in s) for 400 users
Next, the execution time of our algorithm is reported in Tables 8–10 for the same datasets considered in Tables 5–7. It is seen from the tables that the execution time increases with increase in the number of users and permissions. The rate of increase matches with the time complexity of the algorithm mentioned in Section 4.1. Also, increasing the value of δ reduces the execution time as expected. It is observed from the tables that case 3D takes less time than 2D, which in turn takes lesser time than 1D. This is because, with an increase in the number of disjoint sets of time intervals, more users are assigned different permissions for the disjoint sets of time intervals. As a result, pairwise intersection among the initial roles leads to the creation of a smaller number of generated roles. Thus, the overall size of the candidate role set is reduced, which in turn, reduces the total execution time.
When two or more sets of time intervals overlap or are contained within one another, the execution time is much higher than the cases having all disjoint sets of time intervals since intersecting or contained time intervals, in general, result in the creation of higher number of generated roles. So, 3D has a lower execution time than both 2O1D and 2C1D, which again have lower execution time than 3O and 3C, respectively. Moreover, contained time intervals result in higher execution time than overlapping time intervals. Though in case of contained time intervals, the effective number of distinct sets of time intervals is less than that of overlapping time intervals, the former results in a higher number of time interval splitting during the creation of generated roles as compared to overlapping time intervals, thereby creating more number of candidate roles. Therefore, 2C, 3C and 2C1D, in general, take more time than 2O, 3O and 2O1D, respectively. It may be noted that, 2D, 2C and 2O respectively have longer execution time than 3D, 3C and 3O. This shows that with increase in the number of distinct sets of time intervals, the execution time decreases as a result of the code optimization mentioned at the beginning of this section. Thus, it can be concluded that the execution time not only depends on the number of distinct sets of time intervals, but also on the interrelationships among those intervals.
In this sub-section, we present results on the number of roles generated and the corresponding execution time for several real-world datasets introduced in [9]. We compare the performance of our current approach in terms of these two evaluation criteria with the approach presented in [28]. The datasets used for performance evaluation were also used by a number of other non-temporal role mining algorithms for reporting their results [24,31].
Dataset description
Table 11 shows the details of the UPA matrices of the real-world datasets along with the number of roles generated by the non-temporal role mining algorithm of [9]. UA and PA matrices for these datasets are first obtained by decomposing the respective UPA matrices using the biclique cover based approach proposed in [9]. Since these datasets are non-temporal in nature, the REBs are synthetically generated. For creating the REBs, 10 distinct sets of time intervals are considered and these time intervals are related to one another via the relationships of containment, overlap, consecutiveness and disjointedness. This is done keeping in mind real-life scenarios, where different roles can be enabled for different sets of time intervals and the interrelationships among these intervals can be of varied nature. Each role of a particular dataset is enabled for any one of the 10 sets of time intervals. The real UA and PA matrices and the synthetically generated REBs are combined to derive the TUPA matrices, which serve as input to our temporal role mining algorithm. Thus, each user-permission assignment in a TUPA is associated with one or more of these 10 sets of time intervals. Since the REBs are randomly generated, 30 different REBs are considered for each dataset, resulting in that many number of random temporal UPA variants. The value of δ is taken to be 0, 0.5, 1 and 2 percent of the total number of temporal user-permission assignments in each of the TUPA matrices. For each dataset, the median number of roles over the 30 executions is computed, which is reported as a percentage of the number of non-temporal roles given in the rightmost column of Table 11. In case of the overall execution time, the mean of the 30 executions has been reported. The next sub-section also presents a comparison of the proposed approach with the approach presented in [28]. Henceforth, for the sake of convenience we refer to the approach proposed in the current paper as PROP and that given in [28] as DBS. Since DBS provides an exact solution, only the results for
Real-world datasets
Real-world datasets
Figure 1 shows the number of roles produced by PROP and DBS as percentage of the corresponding non-temporal roles for all the nine datasets. When

Roles generated for real-world datasets.
Execution time for the real-world datasets is reported in Table 12. Since the overall execution time of PROP did not show much variation with increase in the value of δ, we report the execution time for only
Execution time (in s) for real-world datasets
In this sub-section, we summarize our observations from the results obtained from both synthetic and real-world datasets. Results from both these datasets show that with an increase in the value of δ, the number of roles produced and the execution time decreases. Thus, for systems where a limited degree of mismatch is tolerable, a considerably smaller role set can be obtained in a reduced amount of time than that obtained when no mismatch is allowed. The number of roles and execution time are also affected by the interrelationships among the time intervals. In general, contained sets of time intervals produce lesser number of roles than both overlapping and disjoint sets of time intervals. Also, it is observed that overlap among the sets of time intervals results in time interval splitting, leading to higher number of roles than disjoint sets of time intervals. However, since presence of disjoint sets of time intervals increases the effective number of distinct sets of time intervals, in certain cases they produce the same number of roles as that produced by overlapping sets of time intervals. Increase in the number of disjoint sets of time intervals reduces the overall execution time. Disjoint sets of time intervals result in lower execution time than overlapping sets of time intervals, which in turn, have lower execution time than contained sets of time intervals. Moreover, comparative results on the real-world datasets show that our current approach outperforms the approach proposed in [28].
Related work
Role mining, a bottom-up role engineering technique, is the process of deriving a set of roles from the available user-permission assignments. Vaidya et al. define Basic-RMP as the problem of finding a minimal set of roles from the input user-permission assignment relation [35,36]. They prove it to be NP-complete and also map the problem to the Minimum Tiling Problem of databases. A heuristic algorithm for finding the minimum tiling is adopted to solve Basic-RMP. Two algorithms were proposed by them in [38], namely, CompleteMiner and FastMiner to derive a set of candidate roles from a given user-permission assignment relation via subset enumeration. Several variants of the role mining problem have been proposed in the existing literature of which Edge-RMP [25] aims to find a set of roles, a UA and a PA so that
Lu et al. [25] model RMP as an optimal boolean matrix decomposition (OBMD) problem. They consider different variants of OBMD and solve them using binary integer programming. Frank et al. in [12] propose a probabilistic model that clusters boolean data. In this model, a single object can simultaneously belong to multiple clusters. The authors have shown that the proposed method can be used to solve the role mining problem. The user-permission assignment data given as input to role mining can contain several erroneous assignments which can propagate to the mined set of roles. Such erroneous assignments are called noise. Vaidya et al. in [37] propose a noise model that considers both over-assignment and under-assignment noise. They show that the algorithm proposed in [36] for solving δ-approx RMP can be used to handle over-assignment errors. However, this algorithm is not able to handle under-assignment errors.
Blundo and Cimato propose a Simple Role Mining Algorithm (SMA) [3]. In each iteration, SMA selects a particular row of the UPA, either randomly or depending on whether the user corresponding to that row has the maximum or the minimum number of permissions. The permissions associated with the selected row are taken collectively to form a candidate role. When a new candidate role is generated, the rows corresponding to the users whose permissions are already covered by the current set of candidate roles are removed from the UPA. The algorithm terminates when all the rows of the UPA have been considered. In [32], the authors propose role mining algorithms based on two machine learning techniques, Latent Dirichlet Allocation (LDA) and Author-Topic Model (ATM). These algorithms reflect the pattern of usage of permissions by the users and also user attributes. A number of genetic algorithm based approaches have been proposed to solve the role mining problem [21,42].
Ma et al. [27] present an approach for role mining where each permission is assigned a particular weight depending on its importance. Visual Role Mining (VRM) [6] is a role mining approach that derives a set of roles from the available user-permission assignments by visually analyzing their graphical representation. In [40], Xu and Stoller propose role mining algorithms which optimize several policy quality metrics that are based on the policy size and role interpretability. Verde et al. [39] propose a six step methodology to make role mining scalable and hence suitable for deployment in the real world. Gal-Oz et al. [14] propose an algorithm for deriving roles from the usage pattern information present in web applications.
In [5] the authors propose a three-step methodology to reduce the complexity of the role mining process and also the administration cost by deriving the role set only from the stable user-permission assignments, i.e., user-permission assignments that belong to roles having weight above a predefined threshold value. The unstable assignments are used to create single-permission roles. In order to elicit comprehensible roles having strong relationships with the business structures of the respective organizations, a role generating procedure has been proposed in [7], which incorporates business information in the process of finding roles.
Baumgrass and Strembeck [1] propose a set of guidelines that allow migration of current-state RBAC models derived from role mining approaches to target-state RBAC models derived from role engineering approaches. Hachana et al. [17] formally define the problem of comparing two sets of roles and prove it to be NP-complete. They also propose a greedy heuristic for solving the problem. Hybrid role engineering techniques developed by combining top-down and bottom-up strategies have been proposed in [11,41].
Enumerating roles in presence of various constraints are also available in the literature. In [4,18,22,24], role mining algorithms considering various cardinality constraints like maximum number of users that can be assigned to a role, maximum number of roles per user, and maximum number of permissions per role have been proposed. Lu et al. in [26] propose a variant of RMP, called the constraint-aware role mining problem (CRM) that allows negative permissions in roles. Such negative permission roles can account for the separation of duty constraints. In order to solve CRM, they propose the Extended Boolean Matrix Decomposition (EBMD) approach. Hu et al. [19] use answer set programming to find RBAC configurations that conform with several constraints which reflect various security and business requirements.
A survey of the existing literature indicates that though a considerable amount of work exists on role mining in RBAC systems, none of the existing work, except for the one proposed by us in [28], focuses on mining roles in temporal RBAC systems. However, our current work is different from the work presented in [28] in terms of the formulation of a generalized version of the temporal role mining problem and also the greedy approach proposed to solve the problem.
Conclusion and future directions
Temporal role mining provides a method for deriving roles in TRBAC systems that assign permissions to users for specific sets of time intervals. In this paper, we have presented a formal definition of the Generalized Temporal Role Mining Problem (GTRMP), which takes into consideration both notions of exact and partial matches between the input temporal user-permission assignments and those obtained from the output of the temporal role mining process. The problem has been shown to be NP-complete and we have proposed a greedy heuristic for solving it. Results of extensive experiments show that permitting a limited degree of mismatch, though results in restricted availability of certain permissions to some users, reduces the number of roles significantly, especially when the number of users and permissions is large.
In this work, the number of roles has been considered to be the only optimization metric. This metric is useful for organizations which consider the cost of administering the resulting TRBAC system only in terms of the number of roles. However, the administrative cost may also include other aspects like the cost of administering the user-role assignment relation, role-permission assignment relation, role enabling base and the role hierarchy. So, metrics that consider the sizes of the user-role and role-permission assignment relations as well as the size of the role enabling base and role hierarchy apart from the size of the mined role set can be designed for overall cost optimization. Various other temporal constraints present in the TRBAC model other than enabling and disabling of roles can be taken into consideration while devising the temporal role mining techniques. Apart from these, features such as temporal role hierarchies and temporal Separation of Duty (SoD) constraints of the GTRBAC model [23] may also be considered during temporal role mining. In temporal role hierarchies, the inheritance relationships among the roles (and consequently the associated permissions) are dependent on the enabling duration of the roles and temporal SoDs reflect conflicts among time intervals and role assignment to users and consequently the availability of permissions included in those roles. We would also like to look into hybrid temporal role engineering techniques in the context of temporal RBAC models which will help to create roles incorporating the business information available within any organization. Another future direction of work is to design tools for building the TUPA matrix from available information including access logs. Such a tool would aid the system administrator in his task of creating the TUPA matrix.
Footnotes
Complexity analysis of GTRMP
Here, we present a formal analysis of the complexity class of GTRMP. We first formulate a decision version of the problem called Decision-GTRMP (DGTRMP) as follows.
