Abstract
We examine multiagent resource allocation to maximize global utility in settings with dynamic arrivals, proposing a distributed coordination framework that allows resource preemptions. We then introduce the concept of resource reservations to address expected future changes in the task allocation environment; accounting for switching costs and considering irrevocable allocations both become desirable. We address these problems using dummy agents and validate our approach using simulations comparing utility with and without dummy agents. We discuss the value of this framework in the context of social computing, enabling improved interactions for teamwork and coordination in domains like health care for emergency response management.
Introduction
Multiagent systems are of interest to the artificial intelligence, industrial engineering, and operations research communities as a method of solving resource allocation problems. There are a number of reasons for this interest, including increased robustness (since individual components can fail without catastrophic loss) (e.g. [1]); natural mappings to problems where the parties involved in the allocation have some degree of autonomy (e.g. [2]); and as a useful heuristic approach in problems which are too large to solve directly (e.g. [3]).
In order to enable better teamwork among human users, models for multiagent resource allocation may be employed. Motivated by scenarios of allocating doctors to patients in emergency medical settings, we examine the challenge of deciding whether to delay assignment of resources which may not be interruptible once they begin processing. We offer a solution guided by the concept of a reservation, resting on the integration of planning and learning, and on the modelling of bother incurred by these human resources.
We will begin by summarizing a multiagent resource allocation framework. After presenting the details of this framework, we proceed to outline a proposal for anticipating future changes through the use of what we refer to as resource reservations. We then clarify how this approach can be leveraged in scenarios where there are irreversible allocations. Validations are presented, using simulated scenarios, to quantify the value of the proposed solutions. In all, we offer a new direction for extending the scope of resource allocation decision making, intended for environments that are highly dynamic, and where new work may arise at short notice.
Within the area of multiagent systems, resource allocation tasks consist of problems wherein a set of agents, with individual preferences, is to be allocated a set of (possibly) heterogeneous resources. Ideally, the allocations are selected to maximize a value function. In this work, we are especially concerned with iterated versions of this problem, where agents may repeatedly renegotiate a given allocation as new agents arrive. We will focus primarily on distributed approaches (e.g. [4, 5, 6]), as there is a natural mapping from these representations to the sorts of problems we envision the system being used to solve, for example, problems in the domain of medical resource allocation, where distributions of doctors’ time may be required to change frequently [7, 8].
Literature review
Within Multiagent Systems, the assignments of objects, items, or tasks to agents has been widely studied for decades. For a more comprehensive overview of the area, we refer the interested reader to [2] or [9].
The area is organized in two parallel literatures, considering similar problems and developing similar solutions. In Multiagent Task Allocation, a central agent divides a larger project into smaller units, and distributes them among a population of assistant agents to maximize efficiency, utility, or throughout. This distribution may be planned centrally [10, 11], or emerge dynamically through negotiation among the assistants [12, 13, 14, 15], or through distributed heuristic techniques [16, 17, 18]. In general, self-organizing systems are more robust in large scale deployments [9].
Resource Allocation considers the closely related complementary problem where a central agent distributes resources to agents in response to their individual needs or requests. This scenario models a wide range of real-world problems [19, 20, 21, 22, 23]. As with Task Allocation approaches, Resource Allocation approaches can be categorized by the method used to assign resources to agents. In centralized approaches, agents typically submit information about their requirements to a central planner, who computes an allocation directly [24, 25, 26, 23, 27]. These approaches typically offer strong theoretical guarantees on performance, but require significant computation and communication in large scale systems [9]. In contrast, self-organizing systems have the beneficiary agents negotiate or heuristically coordinate among themselves to assign the resources [28, 29, 30, 31]. As with Task Allocation, self-organizing approaches to Multiagent Resource Allocation generally scale better, providing reduced communication and coordination costs, at the expense of robust theoretical guarantees on efficiency or throughput [9].
Active research in multiagent resource allocation focused on theoretical systems, and recent surveys [9] of the field call for the adoption of more real-world constraints, dynamic re-allocation of resources, and the design of self-organizing systems for cooperative environments, topics addressed by the work in this paper. Other topics, including the interplay between related tasks, dynamically learning the skill profiles of agents during allocation, and the design of self-organizing systems for common emerging physical domains like the smart grid are not touched on directly.
This paper is best understood in the context of self-organizing solutions to Multiagent Resource Allocation, and adopts a heuristic approach, a strategy which has been shown to be successful in a variety of applications [22, 32, 29]. We discuss how to improve upon these models by introducing the concept of “dummy” agents, tasked with securing the right to use resources at a future point on behalf of needy agents who may potentially appear but who have not yet entered the system, and so cannot yet negotiate on their own behalf. The benefits of this approach and the effectiveness of existing systems are evaluated in simulations designed to mimic the essential features of medical resource allocation problems, where patients come and go, and patients’ needs dynamically change over time.
A framework for preemptive resource allocation
In this section, we present an overview of a framework for multiagent resource allocation that supports preemption. We begin by articulating the problem that this model is intended to address, followed by comparisons with other existing models. We then proceed to present the formal specification that is the underpinning for the system that we designed, following this with a complete, but concise description of that system.1
The problem
In our research, we are trying to address the tension arising from two opposed objectives. Specifically, we wish to (i) preserve agent autonomy while (ii) ensuring that critical agents can get access to the resources they need, even and especially when those resources are currently assigned to agents with much lesser needs. In order to do so, we seek to improve upon existing multiagent resource allocation solutions.
Distributed approaches to multiagent resource allocation problems possess different levels of agent autonomy. In a fully distributed approach, agents typically have individual ownership of some initial set of resources, which they can then voluntarily exchange with others for resources that they prefer. They exercise full control in deciding whether to exchange their resources for those of other agents and thus may be selfish. Alternatively, it is possible to adopt a convention that agents will not be purely self interested, but will cooperate with one another to some extent – for example, allowing the formation of coalitions in which various subgroups collaborate to produce mutually beneficial allocations (e.g. [33]).
In this work, we adopt an approach between these two views. Our agents act as individuals, and plan only to satisfy their own resource requirements. However, they are also much more cooperative than in many existing distributed systems, and might accept trades of resources which are not advantageous to them personally, if doing so improves the global welfare of the agents as a whole. Similarly to Paulussen et al. [7], we accomplish this by supporting preemption of resources, wherein one agent will relinquish its resource to another agent with a greater need. Accurately determining whether such a trade is possible requires estimating the opportunity cost (the utility the agent could have obtained from taking some other action) of having a preempted agent hold its resource. That is, we must be able to determine what the agent’s behaviour will be following a preemption, and what the utility derived from that behaviour is likely to be. Our system provides a more accurate estimation of preemption costs by combining planning techniques (which predict future behaviours) with learning techniques (which determine the expected value of those behaviours) in a novel fashion.
Formal problem specification
We define a resource allocation problem as a set of tasks
The solution to such a resource allocation problem is an allocation, defined as a function over
An optimal allocation maximizes utility. In this work we will consider a utility function EU:
This accumulates the change in utility for each task based on the total time spent in the system, from time of arrival to time of completion (one tick after the last tick at which a resource is allocated to the task).
Our system models distributed resource allocation problems using four types of components and four domain-specific functions.
In our system, a
A type The resource A countdown
Tasks are supported by
An associated task they are responsible for resolving. A plan for resolving their task: a sequence of actions that the agent will take, based on its understanding of environmental conditions, to acquire needed resources. The actions are attempts to acquire resources, and are selected based on the agent’s beliefs about the expected value and probability of obtaining each resource. An expected utility A length of time since the plan was generated
Task agents also have two state variables which respectively measure churn (how quickly the environment is changing) and congestion (demand for resources the agents need). These concepts are clarified further in the subsection below. In particular, we explain how modelling churn leads an agent to construct shorter or longer plans.
In our system,
Resources are supported by
Four additional domain-specific functions are employed by resource proxy agents:
Our proposed system can be summarized as follows. At each time tick, a set of new tasks to be completed arrives, and each task is assigned an agent. Each agent requests the resource needed for its task from that resource’s proxy agent, and all incoming requests to a resource are batched together. The request with the highest positive improvement in expected utility is then accepted. The gain and loss in utility can be calculated directly from the type of each task (revealed by task agents when requesting resources) and the known time to complete each task. The potential
A task agent generates a plan for satisfying the resource needs of its user, provided that the agent has no such plan in the present step. In this work we model each plan as a Transfer-Of-Control (TOC) strategy [35, 12]: a series of attempts to acquire resources, with each new step only executed if the previous step has not been successful. For example, such a plan might be to approach resource
where
Agents model their environment using two parameters which facilitate replanning. The parameter
where
where
An agent also estimates the demand for resources in the system (
The resource proxy agent filters incoming requests; only the request with the highest positive improvement in expected utility made during a tick is forwarded to the underlying resource. This batching of resource requests is particularly valuable when a resource has just completed a task, as otherwise low-priority tasks may give up a reliable resource, only to be displaced by the next-highest priority task and so on, in sequence. The resource proxy agent calculates the expected utility of allowing a preemption according to the following equation:
where req is the task requesting a resource
The Gain portion of this equation corresponds to the utility the preempting agent would receive by acquiring the resource right now. This gain is the result of the task the agent seeks to complete being finished earlier than it would be otherwise. The Loss portion corresponds to the change in utility for the agent currently holding the resource, if it were to yield its resource to the preempting task, and then the resource after that task is complete, reacquire it. The Regain portion is the expected utility the preempted agent will acquire from executing its current plan instead of waiting to reacquire the lost resource.
Agents that have been preempted many times will report lower
Note that resources accept requests according to the domain-specific function
A final defining characteristic of the model is its applicability to scenarios where tasks may arrive dynamically. Since our solution does not require the task agents to possess full knowledge of competing requests, new tasks can be inserted at any time without disruption.
As mentioned, the overall design goals of our solution for coordination among the agents are inspired by the transfer of control paradigm of Scerri et al. [12, 35]. Our work aligns well with the extension of this framework constructed by Cheng and Cohen [32], which enables both transferring decision making control and communicating in order to determine whether that control will be accepted.
The main strength of the model presented here comes from the combination of several factors. In order to allow as many preemptions as possible which increase the utility of the system overall, we present a novel way of computing opportunity costs. This calculation requires us to know what the preempted agent could do instead of holding its current resource, and exactly how beneficial such behaviours would be. While Paulussen et al. [7] considered only the immediate actions an agent could take when computing opportunity costs, we can do better. When a preempting agent requests a resource, we use the value of the preempted agent’s plan as an estimate for the value of the alternative actions this agent would take if the preemption were allowed. Using the planning method allows our model to both avoid currency and utilize preemption in a principled and broader way than in previous approaches.
It is not sufficient to use these estimates directly however, because an individual agent’s plan can be overly optimistic in two ways as a result of using only local information when making decisions. First, the agent might generate a plan for its current environment, only to have that environment change. If the agent allows its resource to be preempted on the basis of such a plan, the resulting preemption may actually decrease the utility of the system. Second, the agent is not directly aware of the resource demands of others in the system, so it may assume that the acquisition of another resource will be easy when high demand may make it quite hard. The use of our heuristic learning models for churn and congestion is thus a central element of the framework, allowing agents to dynamically adjust their estimates for how long a plan will be useful and the probability they will be able to hold onto any resources they acquire, using only local information. We also clarify what kind of communication between agents can facilitate the decision making about preemption of resources (information about backup plans, for instance).
Expanding the multiagent resource allocation framework
In this section, we introduce the concept of reservations of resources, and discuss how to address irreversible allocations, in an effort to expand the scope of what may be handled for distributed multiagent resource allocation with dynamic task arrivals.5
Towards a reservation system for resource allocation
If there is a cost associated with reassigning resources, then there may be times where it is more efficient to keep a resource idle rather than locking it into an allocation where it provides less benefit than it might be able to provide in the future. Under these circumstances, the higher the cost, the easier it is to find situations where waiting can help. In this section, we outline a proposal that allows waiting through reservation.
Opportunity cost and the value of waiting
Opportunity cost is the drop in utility due to losing the ability to perform an action; for example, consider a case where you have $10, you can buy coffee for $1 to gain 1 utility, and tomorrow you can buy a movie ticket for $10 to gain 20 utility. (Assume that having extra money left over has no value.) Buying coffee now gives more utility than not buying coffee (1 utility for buying, 0 for waiting), but it comes with the opportunity cost of 20 utility because you won’t be able to afford the movie ticket later. The correct decision in this case is to keep money in reserve in order to take advantage of the future payoff. Note that opportunity cost is highly dependent on the expected changes in environment, so potential tasks which can use the reserved resource should feel free to challenge the decision to reserve a resource under the hope that the opportunity cost has lowered. For example, if you find out that other plans will prevent you from seeing the movie tomorrow, then the opportunity cost drops from 20 utility to 0 utility and so there is no need to conserve money now when you could use it to gain utility.
The opportunity cost in a preemption environment is usually smaller, but still exists. In the example above, imagine if you could borrow $1 from a friend at the cost of 2 utility due to the headache of tracking debt. In this case, the total cost of buying the coffee drops to just 2 utility, because you can still see the movie by borrowing money (i.e. there is no opportunity cost to buying the coffee, just the cost of borrowing money). The opportunity cost is still higher than the utility gain from buying coffee, but if there is some probability that you won’t be able to see the movie even if you can pay for it then not buying coffee now could be a wasted opportunity. With risk-neutral preferences, you should buy coffee if there is a 50% or higher chance of missing the movie if you can borrow money. This is because if you buy coffee (utility gain 1), and you will go to see a movie with probability
When waiting is valuable
Waiting is an expensive operation for resources in a rate-limited resource allocation environment, as cutting down on the amount of time that a resource is in use effectively reduces the pool of resources available, slowing the completion of all tasks currently in the system. However, there are still edge cases where hasty allocations can similarly waste resource capacity.
Consider a utility function where each task applies a constant cost to the system for every time step that the task remains in the system. Each time step that a resource waits thus applies a potential cost to the system (i.e. a loss of utility from delaying a task) equal to the highest single step cost among available tasks, as the resource could have been applied towards finishing that task one step faster. This captures how there is no wait cost when there are already enough resources to handle all existing tasks.
For a concrete example: Consider a computing cluster with two servers (resources) available,
Consider a case with only one resource and generalized tasks. Take two tasks:
Task Task
The net cost for finishing the first task first thus comes to
As expected, if
As tasks take longer amounts of time to complete, the late-arriving task needs less of an advantage to be worth delaying the currently-available task. In particular, a currently-available task which requires many steps to finish anyway may be worth putting on hold to quickly complete new tasks that require less time.
The analysis becomes more difficult when there are multiple resources involved, as different resource assignments must be considered and the specifics of future arrivals become important. Suppose now that there are resources
Task Task
Suppose that
The middle outcome is strictly worse than the last outcome where the correct final allocation is known. Optimal allocation thus gives the system
Thus, when preemption is prevalent, reserving resources is not for the benefit of the tasks that receive the reserved resources. Those tasks could have preempted the resource even if it had not been reserved. Reservation instead improves utility for tasks that are warned away from taking resources that they will not be able to keep.
The value of a wait action can be provided in a task-centred system by dummy task agents, which are agents representing a blank ‘dummy’ task which corresponds to waiting for a real task that will enter the system later. Unlike other tasks this dummy task can always be interrupted (even in environments where preemption is difficult or impossible), as a resource ‘performing’ this task is understood to be on standby until a sufficiently important task interrupts it. Once the real task in question arrives in the system, the dummy agent grants its resource to that task and seeks a new incoming task to represent.
A dummy agent competes with other task agents for resources by reporting the utility of waiting, measured by examining the opportunity cost of allocating the resource now. If the utility of waiting is higher than the utility gain from allocating the resource to any other task that approached it, then the resource is allocated to the dummy agent instead of working on a real task; by definition, the cost of allocating the resource is higher than the cost of leaving it idle in this case. By reusing the concept of a task agent, the dummy agent seamlessly provides a wait action to resources as an alternative to other available tasks. Note that the opportunity cost of waiting increases sharply when more than one resource waits, as the system would need to expect multiple tasks to enter in a short span of time (otherwise the time spent waiting could be used to complete other tasks) in order to use multiple reserved resources. For this reason, the rest of this paper assumes that only one resource will wait at a time; possible extensions are given in Section 7. If every resource evaluates the cost of waiting and concludes that it is in the system’s best interests for one resource to wait, then resources will need to coordinate to determine which resource waits. In simplistic solutions using local information exclusively, every resource might decide to wait, incurring heavy real costs as available work is ignored. The use of a dummy agent allows resources to implicitly coordinate with each other on when to wait, as only the resource approached by the dummy agent will wait. The dummy agent however can reuse the existing planning mechanism that task agents use to select resources in order to choose a good resource to wait for impending tasks.
It is expensive for a system to not work at full capacity, as every tick spent waiting for the arrival of a task is a tick not spent clearing the tasks that already exist in the system. For waiting to be worth the trouble, truly critical tasks must arrive, where the time savings offered by a dummy agent for these critical tasks are enough to outweigh the loss of processing capacity.
In a proof-of-concept implementation, opportunity cost was simulated by asking an oracle for the highest-priority task that will arrive in the next time step and granting that priority to the dummy task. This is an idealistic level of information, though it may be justified in scenarios where tasks are discovered before resources can be assigned to them (e.g. a computing example where tasks must have large amounts of data uploaded to a cluster before computation can start, or a medical example where ambulance drivers can call ahead to the hospital to alert them to incoming patients, so the tasks are known before they start). Even using perfect knowledge, this simple method slightly overestimates the opportunity cost, as the current tick may still be usable; if the dummy task prevents a slightly-lower priority task from getting an early start which is worth making the slightly-higher priority task wait, then there is utility loss. For example, consider a system where a task that costs 90 utility per tick is already present (Task A), and next tick a new task will arrive that costs 100 utility per tick (Task B). If the only available resource can finish either task in three ticks, then if it started on Task A now and only addressed Task B after finishing with Task A, then the total cost will be 90
A more realistic implementation would not have perfect knowledge of incoming tasks. This could be an issue for resources with disparate compatibilities, as choosing the correct resource to wait becomes nontrivial: Even if the distribution of incoming task types is known, if no one resource is compatible with all high-probability task types, it is difficult for the dummy agent to choose a resource to approach. For example, consider a case of 10 task types, with larger task types having higher priority. If there is a high probability of a new task of type 7 arriving in the next step, then the dummy agent would approach a resource compatible with type 7 tasks. However, if all resources compatible with type 7 tasks are incompatible with type 8 tasks, and the distribution gives an equally high chance of type 7 or type 8 tasks arriving, then the dummy agent must pick according to some other criterion. While this may sound strange in the abstract, consider a medical example where type 7 tasks are patients with broken bones and type 8 tasks are patients with internal injuries; if resources in this case are doctors, then finding specialists in both injuries may be difficult. Our implementation does not address this problem, instead simply choosing a resource to reserve based on which resource is best suited for the most important expected incoming task. A more detailed approach would be to have the agent consider the expected value of reserving each potential resource, given its beliefs about the distribution of incoming tasks and the skills of each doctor. We discuss this option in more detail in Section 7.
Irreversible allocation
As described previously, easy access to preemption greatly softens the impact of any mistakes made when allocating resources, as a task must always yield its resource to another task of sufficiently greater need. However, not all problem domains allow for such easy reallocation. Consider the scheduling of underground construction for a water company, a problem domain where tasks involve tearing up sections of roadway and shutting off water flow in order to perform maintenance. While a resource here is a construction crew, work cannot be interrupted in the middle of replacing a section of underground water pipe, as the interruption in service is unacceptable. Even if a pipe were to break elsewhere, taking the construction crew away from the job in progress would leave just as bad of an emergency at the unfinished job site. In these domains where allocation is irreversible (until the task is complete), there is far less leniency towards bad initial allocations that ignore new incidents.
In operating systems, a computational task that requires exclusive use of a resource and cannot be interrupted by others is referred to as an atomic task. In a general-purpose computing environment, arbitrary computational tasks may depend on each other in ways that make it dangerous to preempt complex resources (e.g. printers, communication channels) from tasks in progress. While modern multi-tasking allows arbitrary processes to be paused and resumed with no loss of state, external resources may carry their own associated state which is not automatically managed by multi-tasking. This is an analogous case of irreversible allocation, and problems such as priority inversion (where a low-priority task holds a resource required by a high-priority task while a medium-priority task uses all the computational time and prevents the low-priority task from finishing) and deadlock (two tasks each need two resources to finish, each task holds one resource, and neither task is willing to abandon its resource, thus neither task can finish) arise in computational environments from poor allocation.
While the system described thus far cannot suffer from priority inversion or deadlock as each task is assumed to require only one resource, high-priority tasks can still be forced to accept low-quality resources if they arrive at a time when all the suitable resources are already allocated. Long-term plans can be valuable in such situations, as a perfect resource which will be available shortly may be preferable over a terrible resource available now, at least for the task that actually receives the perfect resource out of all those competing for it.
Switching costs
The applicability of the dummy agent depends on whether the environment permits preemption. While not all domains permit perfect preemption, preemption is not a binary feature of an environment; reallocating resources between tasks in progress can be expensive without being impossible, and this can be modelled through switching costs.
In computer scheduling, switching costs typically refer to the cost associated with recording the current state of a process and loading the state in a stored process in order to switch execution from the first process to the second. Psychology also uses the term switching cost for the cognitive effort required for a human to stop performing one task and start another, with distractions causing undue effort as they cause subjects to lose their place in an ongoing task.
Resetting task progress whenever a preemption occurs serves as a kind of switching cost, though as delays have different utility impact depending on task severity, the magnitude of this cost varies greatly. Actual difficulty in switching between tasks can be modelled as an additional cost; making such a cost independent of task type allows for analysis of a gradient of problems between a preemption environment and an environment where all allocations are irreversible.
Equation (6) of Section 3.4 does not account for switching costs, as they are independent of the available variables involved in the utility formula (old completion time, new completion time, and task type). However, switching costs can be implemented with a simple adjustment where the baseline utility for preemption requests is calculated; applying a switching-cost penalty here to any case where a task leaves a resource will adjust the behaviour of the model to reflect these switching costs. To make this modification in the system of [29], which we build upon, all that is needed is an additional cost added on line 17 of Algorithm 8 (the core algorithm used in that paper to compute preemption costs). The only other adjustment required is for evaluation purposes, as whatever mechanism used to record the total utility cost incurred by the system must correctly apply switching costs.
Consider Eq. (6) above, which simply examines the expected cost of preemption and approves the preemption request when positive. Every additional cost placed on preemption will reduce the number of cases where this value is positive, making the system more conservative with changing allocations. As the cost increases, eventually the switching cost exceeds any possible benefit from preemption, at which point allocations are irreversible.
Changes in allocation strategy
When resource allocation is irreversible, there is value in spending more time to work out a good allocation. Environments that permit preemption allow sub-optimal allocations to improve through a sequence of trades, encouraging the quick selection of a simple reasonable allocation which can be improved upon later. Opportunity cost becomes far more significant, as a system operating at full capacity (often necessary to maximize utility) is also incapable of responding quickly to new tasks. Preemption allows all currently-working resources to act as a reserve for emergency arrivals, so the lack of preemption makes the idea of reserve resources tenable again.
Our learning model for congestion is not meaningful in an irreversible environment, as to be expected since its learning events rely on preemption. The congestion model only modifies the backup plan value for use in approving preemption, so when preemption is removed this model cannot contribute. However, the model for churn is still valuable, as the ability to look several steps into the future is far more valuable when resources can only be claimed when idle. Asking resource proxy agents when a resource will be free again helps create good plans, but when this information is out of date (for example, if multiple tasks are implicitly queuing up to take a resource once it finishes its current task) the churn model allows a task agent to recognize when it is better to plan myopically and take whatever resources are currently available.
Bother modelling is no longer relevant in an irreversible environment (provided that idle resources are always willing to work), though the bother model can be used to bridge the difference between preemption and irreversible environments. Using
Validation
We begin with a test to confirm the value of our fundamental approach to multiagent resource allocation, as outlined in Section 3. We compare performance against a random assignment system (in which resources were assigned to agents in a random order, without considering desirability or agents’ priorities), and an idealized system in which the supply of resources is finite, but all resources are able to satisfy agents’ needs in a single timestep (whereas, in the real simulation system, agents’ needs are distributed uniformly at random, and take an average of 5 timesteps for a random resource to resolve). The simulations contained 50 resources, and 500 agent types. The needs of an agent were resolved by possessing a resource for a number of timesteps dependent on the resource and agent types (selected uniformly at random between 1 and 11 steps, at the start of a simulation). Fifty resources were used in all simulations, but the number of agents varied between 100 and 2,000. As shown in Fig. 1, the cost per agent (i.e. the product of wait times and an agent’s type) was far higher for the random system, and grew far faster for that system as well.6
Costs per agent (the product of waiting time and an agent’s type) for the heuristic system, a random allocation, and an unattainable idealized system. Points are the average of 100 random allocation problems.
Benefits of using a dummy agent plotted against number of agents that enter the system (evenly distributed over the first 50 ticks, 
We then proceeded to demonstrate performance advantages for our proposed reservation system and its use of dummy agents, as outlined in Section 4.1. In order to indicate the importance of waiting, a resource-constrained simulation environment was used with dynamic arrivals of agents (each with an associated task) over time. When there are only a handful of resources available to process tasks, the impact on the system from having a single resource wait is relatively large and thus more likely to illustrate the benefits and failings of this approach. To further illustrate the difference, a skewed bimodal distribution was used: Tasks assigned to one of two types: minimum or maximum severity, with 90% of tasks at minimum severity. Minimum severity tasks had an associated cost of 1 unit per timestep of delay, while maximum-severity tasks had a cost of 500 per timestep. One simulated resource could complete a task of either type in 4 ticks, while all other resources required 6 ticks to complete a task; the expected lower bound on cost per task would thus be 500
Figure 2 shows the change in average cost7 with 5, 10, and 15 resources between using a dummy agent or not. Under these idealized conditions, adding a dummy agent allows the system to use its resources more efficiently, though not statistically significantly so (pairwise Welsh’s t-test,
Unfortunately, it is the huge disparity in task priorities which allows this improvement in efficiency. Figure 3 shows the results of a similar simulation, differing only in the severities of incoming tasks: While 10% of tasks were still maximum (500) severity, the remaining 90% tasks were at medium (250) severity instead of minimum (1) severity. Once the baseline work available to the system is significant, the loss of system capacity due to the dummy agent accumulates over time and drives down system efficiency as the backlog of meaningful tasks grows. Similar results can be seen in Fig. 4 which uses a uniform distribution of task severities from 1 to 500. The distribution of available tasks can entirely decide the efficacy of this dummy agent.
Benefits of using a dummy agent plotted against number of agents that enter the system (evenly distributed over the first 50 ticks, 
Benefits of using a dummy agent plotted against number of agents that enter the system (evenly distributed over the first 50 ticks, tasks with 
A huge disparity in task priorities is necessary but not sufficient; the value of the dummy agent arises from the ability to respond to rare high-severity events. Figure 5 shows a simulation where tasks are evenly split between maximum (500) and minimum (1) severity rather than making maximum-severity tasks rare. A backlog of high-severity tasks leaves no available work for a dummy agent, as the high-severity tasks already in the system are assured to immediately receive the resources which finish performing work on other high-severity tasks. The resource constraint here works against the efficacy of the dummy agent, as the dummy agent can only provide value through reserving resources if there are resources available to reserve. When all resources are occupied by tasks of the same importance that the dummy agent represents, then the dummy agent cannot provide any benefit.
Benefits of using a dummy agent plotted against number of agents that enter the system (evenly distributed over the first 50 ticks, 
While there is reason to believe that the system will benefit from allowing resources to wait for more critical tasks, the practical benefit is difficult to provide when the environment permits preemption. The ability to revoke resources from tasks in progress reduces the opportunity cost to such a degree that only very specific scenarios permit improvement. Preemption is a valuable tool which should be exploited when available, but not all real-world scenarios permit preemption. To explore the full potential of the dummy agent, we will need to examine cases where preemption is difficult or impossible, resulting in irreversible allocations.
In a system allowing preemption, there is some inherent inefficiency to the use of a dummy agent to reserve resources. A resource allocated to a dummy agent is incapable of performing useful work for the system, restricting the potential benefit; of course, as long as the dummy agent prevents low-priority tasks from wasting time by acquiring the resource and then losing it before completion, then there is a chance of benefit. Additional costs for preemption make this benefit easier to achieve, as preventing these switching costs justifies losing more work. However, this dynamic changes abruptly once allocations are irrevocable (as discussed in Section 4.2). As the dummy agent does not represent a real task, a resource can be allocated to the dummy agent without locking that resource to a task. This means that important tasks can immediately acquire a resource held in reserve by the dummy agent. If the resource had instead been allocated to a low-priority task, then an important task which arrives later is completely unable to access the resource until that low-priority task has completed.
Benefits of using a dummy agent in the presence of 
To construct a scenario where the dummy agent is likely to be important, assume the following: tasks require a sizable investment of time to complete, independent of their severity; critically-important tasks are rare enough to not usually create a backlog; and dummy agents have enough advance warning to reserve a slot if necessary. Figure 6a shows the results of a simulation run under these assumptions where every preemption carries an additional cost of 500 utility, thus requiring severity-500 tasks to provide some proof of benefit in order to preempt severity-1 tasks. Figure 6b shows a similar simulation with a switching cost of 1000 utility, making preemption even more difficult. Figure 6c shows the results of such a simulation where preemption is impossible. As expected, a single dummy agent makes more of a difference when there are fewer resources in the system, as it has the capability to reserve a larger fraction of the available resources. The impact is also much greater when preemption is impossible than when preemption is merely costly, and higher costs increase the potential benefit; while the 500-utility switching cost does not show a statistically significant difference between using a dummy agent or not (pairwise Welsh’s t-test,
The assumption that all tasks require significant portions of time to complete is fairly reasonable. In many domains there are “maintenance” tasks that require a nontrivial amount of work and must be performed eventually but rarely pose a critical threat to performance. In a computing domain, disk defragmentation and system updates fit this role; in medical domains, consider cosmetic and reconstructive surgery as examples of uninterruptible low-priority tasks.
In domains where critical tasks are common enough to create a backlog, there is less value to be gained from reserving resources as the critical tasks will naturally acquire any resources that complete prior tasks. If a block of time is spent completing critical tasks back-to-back, the only improvement granted by the dummy agent exists at the beginning of this block where critical resources begin completion slightly earlier than they would otherwise. However, if critical tasks are rare then without a dummy agent it will be far more common to have all resources tied up with non-critical tasks whenever a critical task arrives, increasing the proportion of time spent waiting for non-critical tasks to complete. Removing this initial wait cost on multiple occasions produces the improvements visible in Figure 6c. Naturally, reserving resources is only useful when there is a reserved resource ready for a newly-arrived critical task. This is where knowledge of future arrivals can be useful.
The topic of multiagent resource allocation has been examined by a variety of researchers. Application areas include rich contexts such as manufacturing flow lines for small and medium sized enterprises [36, 37] and alleviating traffic congestion [38]. We will focus our discussion in this section on different approaches developed by artificial intelligence researchers, distinguishing those intended for competitive environments from ones aimed at enabling solutions in cooperative settings.
In this paper we imagine that there is a common aim among agents in the environment, for the allocation of tasks to have some overall benefit to the organization. In contrast, a host of researchers have focused their attention on examining theoretical properties of resource allocation solutions, specifically for competitive scenarios where agents are self-interested. Bachrach and Rosenschein [39] explore mechanisms with a central authority interacting through payment protocols in order to converge to an optimal solution, when the marginal production of each resource is diminishing. Roos and Rothe [40] are also interested in maximizing social welfare, discovering complexity results such as areas of intractability. Examining theoretical guarantees for optimal outcomes that minimize inequality is the focus of the work of Schneckenburger et al. [41]: the hurdles that arise in practice are discussed as well. One paper that begins to bridge the gap between competitive and cooperative contexts is that of Li and Soh [42]: this work emphasizes the value of communication and coordination between the agents, elaborating on the role of negotiation.
Our work aligns most closely with that of researchers who are focused on cooperative settings. Indeed, a primary motivation for our solution was the context of mass casualty incidents affecting emergency rooms at hospitals. We envision our system being especially applicable to medical scheduling problems of the sort considered by Paulussen et al. [22, 7], for example, those where a group of patients with varied conditions and health states need to be assigned to see a group of doctors in an order which minimizes the years of potential life lost. We are particularly interested in this approach because it uses a fully distributed method for the assignment of agents to resources, while also allowing for agents with great immediate need to acquire necessary resources (which may happen, for instance, with Mass Casualty Incidents).
One aim of our work is to improve upon the resource allocation solution developed by Paulussen et al. [22, 7]. Similar to Paulussen, we assume that each patient can be assigned an agent to negotiate on their behalf. But we move away from Paulussen’s proposal of providing agents with currency proportionate to their needs, as this can produce problems. For example, after several rounds, low priority patients may have amassed capital to acquire resources, even if global utility would be better served by treating the more critical patients first. We also seek to improve upon the conservative approach to allowing preemptions adopted by Paulussen et al. [7]. In particular, they utilize a worst case approximation of the opportunity cost of preemptions, and thus underestimate the value of the behaviours an agent would engage in following a preemption. It is assumed that cyclical preemptions would result in the last agent in the cycle (the “victim”) being left with no resource at all, and do not consider what actions it could take beyond a failed attempt to acquire an already-claimed resource. This will naturally cause the preempted agent to underestimate its ability to acquire a new resource, and thus overestimate the costs associated with yielding its current resource to the preempting agent. In our solution, we strive to reason with more accurate estimates of the costs.
Another paper by Chapman et al. [15] takes a different approach, where resources actively seek out tasks to perform. This approach allows preemption as well as cooperation, as no work is lost if resources switch to new tasks and multiple resources can expedite a single task. However, the passive nature of tasks prevents this approach from rapidly responding to new arrivals, and the assumption that resources are identical removes the problem of determining which resource is best suited to a given task (focusing instead on spatial constraints). Our work focuses on these problems. We note our approach is specifically designed to handle heterogeneous resources, in direct contrast with the distributed, dynamic multiagent proposals of other artificial intelligence researchers, which operate under the assumption of homogeneity, such as that of Parker and Gini [4].8
Conclusions and future work
In this paper, we presented a novel approach to dynamic resource allocation when task switching is costly or impossible, using a “dummy agent” to reserve resources that might be useful later, if new urgent tasks arrive in the system. We showed that this simple addition can significantly improve performance, while maintaining the appealing properties of the underlying dynamic resource allocation system from [29, 44], such as its lack of currency, decentralized nature, and adaptability. This represents an important improvement in the range of situations the system can handle, significantly improving its real-world applicability. This research therefore sheds light on how multiagent systems solutions can be designed to enable effective teamwork and collaboration in challenging, dynamic environments.
Addressing allocation of resources in critical medical scenarios such as mass casualty incidents certainly requires being careful when committing a resource to a task (due to being unable to interrupt and reassign). Our work has demonstrated the potential benefits introduced by allowing for reservations on some of the most valuable resources; further, since we integrate into our solution a modelling of bother (which can equate as well with fatigue, for humans), we take into account true real-world concerns, when determining when to initiate the social interactions that will end up providing effective allocation solutions.
In our treatment of preemptions, we have examined more closely how to enable future planning for resources which may be needed, but where the preemption act is not yet imminent. This has been achieved through our concept of reservations and we have worked out in detail how the agents can reason appropriately in order to enable the resource allocation to proceed effectively. Introducing dummy agents proves to be particularly valuable. Central to our overall solution we have as well the dual threads of planning and learning: we are sensitive to dynamic changes in churn and congestion. Communication between agents is also essential for our coordination to succeed: the elements required for effective planning within the system are shared and thus can influence the decision making process.
Multiagent resource allocation has been proposed for a host of other social situations. Natural disaster scenarios are the focus of the work of [45], where spatial and communication constraints need to be addressed when determining how to perform the allocation. Supporting consumers in service selection for computing resources is the backdrop of the effort by [46], which promotes a distributed solution to cope with dynamic change, in the face of imperfect information. Enabling interaction between robots and humans motivates the research being done by a host of researchers (e.g. [47, 4, 48]), with approaches that introduce sequential auctions to negotiate for optimal solutions, support long term coalitions sharing utilities with other teams, and address tolerance of failures through a centralized lookahead. What is not considered in these solution frameworks is addressing the overarching concern of committing a task to a critical resource, leaving that resource unavailable for a more demanding task that arrives dynamically soon after, a central focus of the work we present in this paper. For future work, we could imagine examining the applications mentioned above more closely, possibly integrating into our proposed solution new methods for learning and planning which are ideally suited for these other social scenarios. It may also be useful to examine our research as part of a larger effort to enable teamwork in health care settings, considering related work such as [49].
There are several clear extensions for our research. First of all, because the existing algorithms are fairly computationally efficient (capable of resolving a system with 5000 tasks and 100 resources during one test), future improvements can freely focus on adding more detail to the model to improve the utility of allocations. For example, we could enhance the learning and decision making capabilities of the dummy agents, or of the task agents with more computationally complex approaches.
The “dummy agent” as presented in this paper has room for expansion. The algorithms provided rely on oracular knowledge of incoming tasks, which is justifiable in some domains for a short horizon (provided there is a gap between becoming aware of a necessary task and becoming capable of addressing the task). There are clear paths for advancement here: First, this limited knowledge of the future can be harnessed more fully in an attempt to derive benefit in more general scenarios than the selection provided; second, the oracular assumption can be dropped in favour of a predictive approach which uses prior knowledge to anticipate future task arrivals and take action based on imperfect information. In this latter case, it would be important to solve the problem of selecting the optimal resource to reserve in cases where the specific type of expected task is uncertain, as an ideal algorithm would reserve resources that are suited to incoming tasks. Another potential expansion would be the introduction of multiple dummy agents, thus allowing the system to keep more than one resource reserved; these dummy agents would need to coordinate with each other in order to avoid reserving more resources than there are expected task arrivals.
Footnotes
We assume human resources, which might decline to fulfill frivolous requests when their BSF is high.
Per our assumption of human resources, time ticks in plans are identical to time ticks used to measure task completion (i.e. resource requests are processed on the same timescale as task completion).
These parameters are presented generically for completeness, but we use fixed values in our experiments.
Cost is simply a negative utility incurred by delays in processing. Costs are averaged over all tasks, over all ticks.
Acknowledgments
The authors gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada.
Authors’ Bios
