Abstract
Role mining is to define a role set to implement the role-based access control (RBAC) system and regarded as one of the most important and costliest implementation phases. While various role mining models have been proposed, we find that user experience/perception – one ultimate goal for any information system – is surprisingly ignored by the existing works. One advantage of RBAC is to support multiple role assignments and allow a user to activate the necessary role to perform the tasks at each session. However, frequent role activating and deactivating can be a tendinous thing from the user perspective. A user-friendly RBAC system is expected to assign few roles to every user. So in this paper we propose to incorporate to the role mining process a user-role assignment constraint that mandates the maximum number of roles each user can have. Under this rationale, we formulate user-oriented role mining as the user role mining problem, where all users have the same maximal role assignments, the personalized role mining problem, where users can have different maximal role assignments, and the approximate versions of the two problems, which tolerate a certain amount of deviation from the complete reconstruction. The extra constraint on the maximal role assignments poses a great challenge to role mining, which in general is already a hard problem. We examine some typical existing role mining methods to see their applicability to our problems. In light of their insufficiency, we present a new algorithm, which is based on a novel dynamic candidate role generation strategy, tailored to our problems. Experiments on benchmark data sets demonstrate the effectiveness of our proposed algorithm.
Introduction
As opposed to the permission-based access control mechanism, which assigns permissions required to perform tasks to users directly, the role-based access control (RBAC) mechanism defines a role set by associating permissions to roles and then assigns roles, rather than a number of permissions, to users. RBAC has been regarded as a de facto access control model. Its many advantages include the convenience of authorization allocation and the reduction of the system administrative workload. To benefit from those advantages, enterprises still employing their old access control systems need to migrate to RBAC. To accomplish the migration, the first phase is to define a good role set. While the role defining problem is seemingly straightforward, it has been recognized as one of the costliest phases in the implementation of RBAC and poses a great challenge to the system engineers. The difficulty comes from the fact that a RBAC system engineer usually has little knowledge on the semantic meanings of user responsibilities and business processes within an enterprise.
Role mining has proven to be an effective (machine-operated) means of discovering a good role set. Its key idea is to utilize data mining technologies to extract patterns from existing permission assignments of the old access control system, which are then used to establish roles. This greatly facilitates the implementation of RBAC (by migrating from the old access control system). In the literature, role mining has been extensively studied. In a nutshell, the existing literature investigates role mining with different objectives, including minimization of the number of roles, minimization of the administration cost, minimization of the complexity of the role hierarchy structure, and others. However, we find that few existing works considered to improve end-user experience (of the underlying RBAC system), which should be one important goal for any practical information system. Needless to say, users’ experience/perception of a system represents system usability and directly affects the eventual success of the system in practice. As such, we argue that user-friendliness should be an essential criterion for evaluating the quality of role mining.
In this paper, we study user-oriented role mining, being the first to explore the role mining problem from the end-user perspective. RBAC allows a user to assume multiple roles. A user can switch roles at different sessions to activate necessary permissions. However, frequent role switching can be a tedious thing for users. Our daily experiences tell us that end users often prefer fewer role assignments; as long as a user acquires all the needed permissions, the fewer roles she has to bear, the better usability she may feel upon the system. That is, from the end-user perspective, a good RBAC system should have as few user-role assignments as possible. This in fact coincides with an advantage of RBAC: recall that one reason accounting for the wide acceptance of RBAC is that it allows users to carry very few roles while enjoying their (potentially many) access rights. However, on the flip side, if we create a unique role for every user, in which case user-role assignments are trivially the most sparse, then the resultant RBAC system would contain too many roles. This would contradict to a premise of RBAC, which is to map permission roles with functional roles within an organization. As such, a user-oriented RBAC solution should not compromise other advantages of RBAC. To this end, we propose to limit the maximum number of roles a user can take on top of regular role mining. Such a strategy would well balance user friendliness and other system quality factors such as administrative overhead. While the idea is clear, the added constraint poses extra challenges to role mining, considering that role mining in general has already been a hard problem. Towards tackling the obstacle, we make the following contributions: (1) we formulate user-oriented role mining as four specific problems that are the user RMP, the approximate user RMP, the personalized RMP, and the approximate personalized RMP; (2) we study the theoretical properties of the four problems and formulate them as standard optimization problems; (3) in view of the weaknesses of the existing role mining algorithms, we present efficient algorithms, tailored to the user-oriented role mining problems; (4) to investigate the effectiveness of our algorithm, we conduct experiments on benchmark data sets and obtain promising experimental results.
This paper is an extension of a paper with the same title accepted in The Proceeding of the 27th Annual IFIP WG 11.3 Working Conference on Data and Applications Security and Privacy [13]. The extended version significantly extends the original conference paper. In particular, it addresses two new problems: the personalized RMP and the approximate personalized RMP. The properties of these two problems are presented. Variants of original algorithms are also expanded. Four different methods for candidate role generation are investigated and compared with respect to performances. Several new experiments are conducted on new examples of configuration.
The remainder of the paper is organized as follows. Section 2 reviews existing role mining works in the literature. Section 3 presents the user-oriented role mining problem. Section 4 presents optimization models. The heuristic algorithm is provided in Section 5. Experimental results on benchmark access control data sets are reported in Section 6. Section 7 concludes the paper.
Related work
The concept of role engineering was introduced in 1995 by Coyne [2]. It aims to define an architectural structure that is complete, correct and efficient to specify the organization’s security policies and business functions. Coyne’s approach is a top-down process oriented strategy for role definition. With the top-down approach, one generally starts from requirements and successively refines the definitions to reflect the business functions. Fernandez and Hawkins [5] used cases to determine the needed permissions of roles. Brooks [1] developed a graphical user interface to migrating to a role-based environment. Roeckle et al. [24] proposed to deduce roles by analyzing business processes. Shin et al. [26] conducted a top-down role engineering by examining backward and forward information flows and employing unified modeling language. Neumann and Strembeck [22] discovered roles by aggregating permissions derived from analyzing usage scenarios. Kern et al. [11] proposed an iterative-incremental approach, that considerers different stages of the role life-cycle including analysis, design, management and maintenance.
Top-down approaches are not practical for enterprises of medium or large size, because it is difficult for an access control engineer to understand the semantic meanings of a large number of operational processes and user duties in a short time. For large-scale problems, bottom-up approaches, that discover roles in an automated fashion, are preferred. Bottom-up approaches typically utilize existing user-permission assignments to formulate roles. In particular, data mining techniques are often employed in bottom-up approaches to identify promising roles. Kuhlmann et al. [12] coined the concept of role mining using data mining. In [25], an algorithm ORCA was proposed to build a hierarchy of permission clusters using data mining technologies. However, overlap between roles is not allowed in ORCA, which contradicts to normal practice in real applications. Vaidya et al. [31] proposed a subset enumeration approach, which can effectively overcome this limitation. Lu et al. [14,15] formulated the role mining problem as binary linear integer program, that enables role engineers to utilize optimization software packages to discover roles. Ene et al. [4] provided some fast exact and heuristic algorithms for discovering the minimum role set.
An inherent issue with various bottom-up role discovering approaches is that there is no unanimous notion for goodness of a role set. Vaidya et al. [28] proposed to use the number of roles to evaluate the goodness of a role set. They [29] also introduced to use the administrative task as an evaluative criterion. Role hierarchy is another important evaluative criterion, as it is closely related to the semantic meanings of roles. Related works on role hierarchy include [3,9,19]. Molloy et al. [20] further introduced a weighted structural measure, which could describe most of studied evaluative criteria. Ma et al. [18] also used weights to combine multiple objectives. Our work in this paper strengthens this line of research by incorporating user experience/perception as an extra evaluative criterion in role mining.
Beside the above works on finding a role set with respect to different criteria, there are other interesting works with different flavors. Lu et al. [14] presented an optimization framework for role engineering. They even extended their work to incorporate negative authorizations in [16,17] so that the discovered RBAC solutions support either negative role assignments or negative permissions in roles. The role mining problem can be related to many problems outside the access control field, such as the discrete basis problem [23], the database tiling problem [8], the overlapping clustering problem [27], among others. So the approaches designed for those problems could be utilized for the role mining purpose. For example, Frank et al. [6] provided a probabilistic model to analyze the relevance of different kinds of business information for defining roles that can both explain the given user-permission assignments and describe the meanings from the business perspective. They [7] also introduced a model to take the attributes of users into account. Some research works were concerned about data cleanliness, as collected user-permission assignments data may be contaminated. Studies on role mining in the presence of noisy data are presented in [21,30].
Problem definitions
In this section, we study and formulate user-oriented role mining. As we have discussed in the Introduction, from users’ perspective a user-friendly RBAC system should assign as few roles as possible to each user; no user is happy with being overwhelmed by assuming too many roles (titles). Ideally, a user would wish to carry only one role given that the role provides with him all the necessary access privileges for him to work and function smoothly. Indeed, in reality most organization’s systems are designed that way. For example, in a school system, the majority of people carry only one role among STUDENT, FACULTY, STAFF and VISITOR. In a software company, most employees are either ACCOUNTANT, ENGINEER or MANAGER. Thus, user-oriented role mining is characterized with the fact that the maximal role assignment for each user should be constrained. This gives rise to the definition of user-oriented role mining, as stated below. User-Oriented Role Mining is to discover a RBAC solution with respect to some evaluation criterion, such that after being assigned to roles, every user gets the same permissions as before, and the user-role assignments satisfy the predefined policy on the maximal role assignments. (User-Oriented Role Mining).
Existing evaluation criteria
To concretize user-oriented role mining, we have one question to answer: what evaluation criterion should be used to evaluate the goodness of a RBAC solution? In the literature, role mining is typically formulated as certain optimization problems with objectives and constraints. As summarized by Molloy et al. at [19], there are five main factors which can be used to evaluate the goodness of a RBAC solution. They are the number of roles
Almost all role mining evaluative criteria can be generally described with the weighted structural complexity measure introduced in [19], which sums up the above five factors, with possibly different weights for each factor.
(Weighted structural complexity measure).
Given a weight vector
Role mining in general involves minimizing
The sum of user-role assignments and role-permission assignments of
Other evaluative criteria, such as minimizing the complexity of the resultant role hierarchy [19], are also more from the system administrator perspective, rather than the end-user perspective.
Notations
By examining existing evaluative criterion, we found that the only factor in the Weighted Structural Complexity Measure that matters to end-users is the size of user-role assignments
Given the evaluation criteria, we study four variants of the user-oriented role mining. They are user RMP, approximate RMP, personalized RMP, and approximate personalized RMP. The notations to be used in their definitions are provided in Table 1. Note that we represent all data sets, i.e. user-permission assignments, user-role assignments, and role-permission assignments, in the binary matrix form. It helps to build a connection with the existing work of Boolean matrix decomposition. In the following, we will define the four user-oriented role mining variants.
RMP commonly refers to discovering roles from existing user-permission assignments without much consideration of the semantics of operations within an institution/organization. One basic requirement of role mining is that the discovered roles should well reconstruct the original user-permission assignments. The strictest case is completeness; in other words, for every user there exists an assignment of roles such that the user gets the same permissions (no more no less) as being assigned before. As user-permission assignments, user-role assignments, role-permission assignments are represented by binary matrices
However, the basic RMP originates purely from the administrative angle and does not reflect any user perspective. An optimal solution for a basic RMP instance may lead to a system with the least administrative cost. But it does not necessarily mean that the system would be favored by the users. For example, a RBAC solution with 10 roles and each user gets up to 2 roles may be favored by users over a RBAC solution with 8 roles and each user gets at least 6 roles. From the user perspective, a good RBAC system is a system with less role assignments to users, so users do not have to frequently switch roles between sessions. For example, a doctoral student who works as teaching assistant at a university would wish to get only one role that is designed for student teaching assistant rather than two roles, one for student and the other for faculty. That is why we propose the user RMP. The user RMP essentially is extended from the basic RMP. The only difference is the uniform constraint on the maximal role assignments for all users. The user RMP can be used to model the role mining case that the system engineer has prior knowledge on the maximal role assignments. The knowledge would help filter out many unwanted RBAC solutions and improve the quality of the mined role set. Note that the user RMP assumes all users have the same tolerance of excess role assignments, which might not be true in some cases. We will address the issue later by proposing other RMP variants. The formal definition of user RMP is given as below.
(User RMP).
Given m users, n permissions, user-permission assignments
It can be roughly formulated as below:
The completeness constraint for the basic RMP is not necessary in some cases. Case 1: a bottom-up role mining approach is used to explore potentially promising roles, rather than finalization. The advantage of bottom-up role mining is being able to discover roles in an automated fashion by utilizing a computer. However, it typically ignores the semantics within an institution/organization due to the difficulty of processing semantic information. As a result, it happens that the derived roles do not meet practical needs, e.g. permissions grouped in a role do not have any semantic connection. In this case, it is not necessary to enforce the completeness constraints, as the enforcement might miss some potentially interesting roles. Case 2: the existing user-permission assignments contain noise, e.g. a personnel change occurs and is not timely reported after the data is collected. For that case, enforcing the completeness constraints would cause the over-fitting problem. Relaxing the completeness constraint could alleviate the problem. The approximate RMP is the RMP with the objective of minimizing the number of roles and the constraint that the reconstructed user-permission assignments approximate the original assignments to a certain degree. Same as the basic RMP, the approximate RMP does not consider the user perspective. Users would always prefer a RBAC solution with few role assignments, regardless of the overall system administrative cost. By extending the approximate RMP, we propose and study the approximate user RMP, which is the same as the approximate RMP, except that a user cannot be assigned to more than a certain number of roles. The number could come from prior knowledge or experience. Again here we assume that all users have the tolerance of excess role assignments.
(Approximate user RMP).
The approximate user RMP is the same as the user RMP, except for that
It can be described as the following optimization problem.
Both the user RMP and approximate user RMP assume that all users have the same tolerance of excess role assignments and exert the same constraint of the maximal role assignment for all users without catering for users’ concrete needs. It is possible that some users have high tolerance, while some others have low tolerance. If we enforce the same maximal role assignment constraint, the constraint may be too strict for some users, which limits the feasible solution space and results a poor role set, and too loose for some other users, which leads to a role set of low score in user satisfaction. To deal with the problem, we propose and study the personalized RMP variants. The core is the concept of personalized tolerance. Instead of exerting the same constraint for all users, we allow users to express their preferences on tolerance of excess role assignments, which can be done by conducting a survey. So for every user, we include a personalized constraint
(Personalized RMP).
Given m users, n permissions, user-permission assignments
The personalized RMP is also a minimization problem. The only difference from the user RMP is that each user is subject to different limits on the maximal role assignments.
The approximate personalized RMP is extended from the approximate RMP with incorporation of personalized tolerance. Its formal definition is as below.
(Approximate personalized RMP).
The approximate personalized RMP is relaxed from the personalized RMP, such that
It can be described using optimization model as well.
The regular RMPs and personalized RMPs differ not only in their formulations, but in solutions. If we apply standard optimization solvers, they would appear no much difference, because a RMP problem and its personalized variant have the same objective and the same number of constraints. But as all RMP variants studied in this paper are hard problems (will be proved later), we have to resort to heuristic algorithms for large instances. Heuristic algorithms designed for a RMP problem and its personalized variant do differ. Intuitively, it is more difficult to solve personalized RMPs as one needs to carter for various needs. As an analogy, it would be relatively easy to teach a class of students with the same background. If students come with different prior knowledge, a teacher may have to slow down to take care of everyone so to maximize the overall class performance. We will provide detailed discussion on the difference of solutions later.
Theoretical analysis and optimization model
In this section, we analyze computational complexity and build optimization models for the four user-oriented RMP variants. The results would provide us some insights into the problems. First, we study their computational complexity.
The user RMP, the personalized RMP, and their approximate versions are NP-hard.
The basic RMP is known to be NP-hard, as it can be reduced to the classic NP-hard set cover problem [28]. The basic RMP problem can be further reduced to the user RMP problem. If we set the value of the maximal role assignment to be a sufficiently large number, then the user RMP problem is essentially the same as the basic RMP problem. So the user RMP problem is NP-hard. As the user RMP problem is a special case of the approximate user RMP problem with the error ratio being 0, so the approximate user RMP problem is NP-hard as well. The personalized RMP problem is also NP-hard, as it is generalized from the user RMP. Since the approximate personalized RMP problem is generalized from the personalized RMP problem, so it is NP-hard as well.
Next, we will build optimization models for the four problems. Among many existing role mining approaches, the optimization approach has been favored by researchers, due to the existence of many public and commercial optimization software packages. The user-oriented RMP problems can be formulated by optimization models as well, which enables an engineer to directly adopt an existing software package. In the previous section, when we define the four problems, we provide rough optimization models, which cannot be directly fitted into any optimization software package. In the following, we will elaborate the optimization models and put them in the standard optimization form.
We formulate the user RMP first, which can be viewed as a variant of the basic RMP with a constraint that the number of user-role assignments is less than δ. Suppose we have located a set of q candidate roles, represented by a binary matrix
Binary variable
The first constraint enforces that if user i has permission j, at least one role containing permission j has to be assigned to user i.
The second constraint enforces that if user i has no permission j, no role containing permission j can be assigned to user i.
The third constraint
The approximate user RMP can be viewed as a relaxed version of the user RMP by not strictly enforcing the reconstruction conditions. As opposed to top-down role discovery approaches, which define roles by examining the semantic meaning of a large number of operational processes and user responsibilities, role mining is a bottom-up approach, which mines roles by analyzing patterns from existing user-permission assignments. So role mining is typically suitable for implementing RBAC systems for large-scale organization. As role mining discovers roles purely from existing assignments, so the result may not satisfy practical needs and thus additional steps may be required. From that perspective, it may not be necessary to discover a role set and role assignments to completely reconstruct the excising assignments. By relaxing the competentness constraint, we might be able to discover better roles. It is the rational of the approximate spareness RMP.
To model the approximation constraint, we introduce some slacking variables to the constraints that enforce completeness. The final optimization model is as below.
In the first two constraints,
The third and fourth constraints convert
The constraint of
The optimization model for the personalized RMP problem can be obtained by simply modifying the optimization model for the user RMP problem such that the constraint
The optimization models in the previous section allow us to directly adopt fruitful optimization research results. However, the problems are NP-hard in nature, which means that for problems of mid to large data size, specially designed efficient heuristics are still required. So in this section, we provide fast heuristic algorithms for the presented four problems. We start with the approximate personalized RMP problem, as other problems can be viewed as its special cases. For instance, if all users have the same limit on the maximal role assignments, the approximate personalized RMP problem is equivalent to the approximate user RMP problem. If the allowed approximation errors are 0, then the approximate personalized RMP problem is equivalent to the personalized RMP problem. So a solution to the approximate personalized RMP problem can be easily extended to the other three problems. In the following, we assume that no two users have the same permissions in a data set, because duplicate users do not affect the final RBAC solution. This assumption is also employed in other role mining methods, e.g. [31]. In practice, we can achieve that by simply removing users with the same permission assignments. To do so, we group all users who have the exact same set of permissions, which can be done in a single pass over the data by maintaining a hash table of the sets of permissions gradually discovered.
Recall that the approximate personalized RMP is to find a minimum set of roles to reconstruct the existing user-permission assignments with the constraints that (i) the maximal role assignments for user i are not more than a given number
The general structure of our algorithm is to iteratively choose a candidate role and assign it to users until the total reconstruction errors are less than the given threshold δ, while the constraints on the maximal role assignments for all users are carefully enforced from start to finish. The procedure can be described in Algorithm 1, where RSelector and CGenerator refer to the function of selecting a candidate role form the candidate pool and the function of generating candidate roles respectively.

Approximate personalized RMP
Indeed, the approach of discovering roles in an iterative fashion has been observed in many role mining methods, e.g. the Lattice Model [3], the FastMinder [31] and the optimization-based heuristic [14]. The distinguishing elements of our algorithm are (i) the way of generating candidate roles and (ii) the rule for role selection at each iteration.
One key feature of our algorithm is a dynamic role generation strategy. Given n permissions, there are
Motivated from the existing role mining literature, we consider three different ways of generating candidate roles. The first way is called itself, which is inspired by [15]. In [15], every user’s permission set
Existing assignments
Existing assignments

CGenerator – itself
The second method is called intersection, which is inspired by [31]. In [31], they generate an initial candidate role set by making every user’s permission set
The third method is called association, which is inspired by [23]. The original association method takes advantage of the relationship between permissions (columns of a binary matrix) and generates candidate roles of permissions, which frequently occur together. Given an input matrix
For illustration purpose, we formally present the itself candidate role generation method in Algorithm 2. The other two candidate role generation methods can be derived from it.
In this section, we study how to select roles from a candidate role pool and assign them to users. We also discuss how to enforce maximal role assignment constraints for each user along the iterative role selection process. We tried the following candidate role selection strategies. (i) greedy – choose the candidate role that covers the most remaining permission assignments; (ii) fewest – select the candidate role with the fewest permissions; (iii) most – select the candidate role with the most permissions; (iv) rand – choose a random candidate role. Among which, the first strategy is widely used in heuristic algorithms for various role mining problems. But it is the most time consuming, because at each iteration the algorithm needs to examine all candidate roles in order to find the best one. The last strategy is the fastest one, as there is nearly no time cost for role selection. But since the selection process is blinded, the quality of the returned role set has no guarantee. The time cost of the second and third strategies are in the between. They have been used to solve the basic RMP problem in [3] and are reported of decent results. For illustration purpose, the complete procedure of the greedy role selection method is given in Algorithm 3. We will omit the algorithm for the other selection methods, as it is not difficult to derive them by modifying Algorithm 3.

RSelector – greedy
To enforce the constraint that no user gets more than
The key of the above user-oriented role mining algorithm is the continuous updating of candidate roles. At each iteration of the algorithm, candidate roles are generated and a candidate role is selected. So the total computations then depend on the number of iterations, the candidate role generation method and the candidate role selection method. Consider the most computationally expensive one, the intersection candidate role generation method and the greedy role selection method. Suppose the data set consists of m users and n permissions. According to our algorithm, at each iteration, at least one user’s permissions are completely covered. So the maximum required iterations are m. At each iteration step, each candidate role is compared against each remaining user. Denote the number of candidate roles as l, which is m for the itself candidate role generation method,
Experiments and results
In this section, we conduct experiments on both synthetic and real data sets to examine the effect of the maximal role assignment constraint on the mined RBAC solutions and evaluate the performance of the heuristic algorithms.
Synthetic data sets
The synthetic data sets are generated as follows. Firstly, generate a set of unique roles

Results on user RMP. (Colors are visible in the online version of the article;

Results on approximate user RMP. (Colors are visible in the online version of the article; https://dx-doi-org.web.bisu.edu.cn/10.3233/JCS-140519.)

Results on personalized RMP. (Colors are visible in the online version of the article; https://dx-doi-org.web.bisu.edu.cn/10.3233/JCS-140519.)
The first experiment is to study the effect of the maximal role assignment constraints with respect to the evaluation criterion, the number of roles. We utilize the heuristic algorithm with the itself candidate role generation method and the greedy candidate role selection method. We generate

Results on approximate personalized RMP. (Colors are visible in the online version of the article; https://dx-doi-org.web.bisu.edu.cn/10.3233/JCS-140519.)
The second experiment is to study the algorithms. The framework of our algorithms is dynamic and iterative in nature. At each iteration, a set of candidate roles are generated and then some role selection method is applied. The iteration terminates when some condition is satisfied. We tried several candidate role generation methods, including itself, intersection, and association, and several role selection method, including greedy, fewest, most and random. Firstly, we study the role generation methods. We consider the user RMP problem and run the iterative algorithm with the three different candidate role generation methods respectively coupled with the same greedy candidate role generation method. All experiments are run against the same data set used before. The maximal role assignment constraint is set to 3. The experimental results are reported in Table 3. It shows that itself and association have the comparable computing time cost, while intersection takes much more time. The reason is that intersection generates
Comparison on number of roles for candidate role generation methods
Secondly, we study the role selection methods. The experimental setting is as follows. The user RMP problem is considered. A testing data set is generated with m and n fixed to 50 and the maximal role assignment to 3. We vary the true number of roles k from 4 to 10 and the density rate of 1’s elements from 0.05 to 0.25. For each data generation parameters set, we generate five matrices. We run the iterative algorithm coupled with the greedy, fewest, most, and rand candidate role generation methods respectively against the testing data sets. We report the average results across the five matrices generated with the same parameters setting in Figs 5–8. We observe that in terms of the number of roles, the fewest and greedy methods have comparable performance and are better than the most and rand method. We also observe that in terms of computing time, the greedy has the worst performance. The reason is that the greedy candidate role generation method needs to compute the contribution of every candidate role, which takes much time. While, the fewest and most methods only need to count the number of contained permission for every candidate role. The rand method incurs the least computing time, because a random candidate role is randomly picked at each iteration. Surprisingly, although the random method has no rational behind, its performance in terms of both the number of returned roles and computing time is decent. It suggests that for large instances an iterative role search algorithm coupled with the random candidate role selection method could be a practical solution. Another surprising observation is that for quite a few parameter settings, the fewest method performs better than the greedy method. Greedy search is a classic searching strategy used in optimization. It intends to find a global optimal (or good) solution by traversing a series of local optimal solutions. The fewest method, mathematically speaking, removes a clique with few edges from a bipartite graph at each iteration. But it is hard to mathematically explain why the fewest method performs better than the greedy method. But the experimental result implies that in practice one could try both methods and pick the better returned solution.

Results on average number of roles by varying density parameter. (Colors are visible in the online version of the article;

Results on average computing time by varying density parameter. (Colors are visible in the online version of the article;

Results on average number of roles by varying true number of roles. (Colors are visible in the online version of the article;

Results on average computing time by varying true number of roles. (Colors are visible in the online version of the article;
We also conduct experiments on benchmark access control data sets. They are americas_small, apj, healthcare, domino, firewall1 and firewall2, which can be found at the HP website.2
The first experiment evaluates the user-oriented exact RMP. We want to know whether our user-oriented role mining approach can effectively enforce the maximal role assignment constraint and whether the output of the algorithm is comparable to the optimal RBAC solution without the constraint. From the previous experimental results on synthetic data sets, we learned that the itself candidate role generation method coupled with the greedy candidate role selection method returns good solutions and also runs fast. So in the following experiments, we will use that strategy. We also learned that our algorithm exhibits similar behavior on the four user-oriented role mining variants. So for convenience, we here only consider the user RMP problem, for which all users have the same maximal role assignment constraint. For reference convenience, we call our user-oriented role mining algorithm as Dynamic. We run Dynamic algorithm on those real data sets with different maximal role assignment constraints. We compare our results with the benchmark role mining algorithm, Lattice [3]. As far as we know, Lattice has the best reported result with respect to the minimization of the number of roles and the minimization of the system administrative cost. The experimental results are reported in Tables 5–8. In these tables, δ denotes the approximation rate. The exact RMP requires the approximation rate to 1. So we only look at the portion of the results with
Data description
fire1
americas_small
fire2
domino
healthcare
apj
In the results, when t decreases, the size of
Furthermore, we are pleased to see that even with the constraint being enforced, the complexity of the RBAC solution returned by Dynamic is still comparable to that of Lattice. For example, in Table 9, when t is 2, Dynamic returns a RBAC solution with only 19 roles, while Lattice returns a solution with 15 roles and the maximum role assignments of 4. Another observation is that the
The second experiment is to study the user-oriented approximate RMP. We want to know how the RBAC solution varies with the approximation rate. We run the Dynamic algorithm by varying the value of δ from 0.95 to 0.80. Results are reported in Tables 5–8.We observe that when the complexity of the RBAC solutions decrease drastically when δ increases. For instance, in Table 5, with t of 8, only 39 roles are required to cover the 95 percent of permission assignments (i.e.,
To summarize, the two experiments have demonstrated the effectiveness of our user-oriented RMP approach. We highlight that one primary advantage of Dynamic is that it allows a RBAC engineer to tune the maximal role assignment constraint to reflect the real need. This feature is not supported by any existing role mining method. More importantly, the overall system complexity of the resultant solution is comparable to that of the optimal solution without any constraint.
In this paper, we studied the role mining problem from the end-user perspective. Unlike other existing role mining approaches which primarily aim to reduce the administrative workload, our approach strives to incorporate better user experience into the role decision process. As end-users prefer simple role assignments, we add a constraint that mandates the maximum number of roles a user can have to the role mining process. The number usually can be determined in practice by a brief study on the general business processes of an organization. Basing on this rationale, we formulated user-oriented role mining as four specific problems, the user RMP, the approximate user RMP, the personalized RMP, and the approximate personalized RMP. The user RMP is applicable to the case where there is a prior knowledge on the maximal role assignments that a user can take. The personalized RMP is applicable the case that users are allowed to express their own interests on the upper limits of role assignments. Their approximation versions can be used in cases, where exact completeness is not required, to explore interesting roles. We studied existing role mining methods, and found that some of them can be applied to our problems with simple modification. For better efficiency, we also designed new algorithms tailored to our problems, which are based on a dynamic candidate role generation strategy. Experimental results demonstrate the effectiveness of our approach in discovering a user-oriented RBAC solution while without increasing the overall administrative workload too much.
Future work can go along two directions. One is to study the feasibility of employing some statistical measures such as Bayesian information criterion to facilitate the role mining process. The motivation is that sometimes the accurate sparseness constraint (the maximum role that a user can have) is not available. We could employ some statistical criteria to choose the RBAC model with a good balance of model complexity and describability. The other direction is to consider the dynamic sparseness constraint. In this work, we assume that the same sparseness constraint is enforced to everyone. However, it might be the case that some user requires many role assignments due to some need. In such cases, a more practical role mining approach is to minimize the sparsity of the whole user-role assignments rather than enforcing a sparseness constraint for every user.
