Using reservations for multiagent resource allocation with costly preemption

Abstract

We examine multiagent resource allocation to maximize global utility in settings with dynamic arrivals, proposing a distributed coordination framework that allows resource preemptions. We then introduce the concept of resource reservations to address expected future changes in the task allocation environment; accounting for switching costs and considering irrevocable allocations both become desirable. We address these problems using dummy agents and validate our approach using simulations comparing utility with and without dummy agents. We discuss the value of this framework in the context of social computing, enabling improved interactions for teamwork and coordination in domains like health care for emergency response management.

Keywords

Multiagent resource allocation preemption proxy agents plan-based coordination

1. Introduction

Multiagent systems are of interest to the artificial intelligence, industrial engineering, and operations research communities as a method of solving resource allocation problems. There are a number of reasons for this interest, including increased robustness (since individual components can fail without catastrophic loss) (e.g. [1]); natural mappings to problems where the parties involved in the allocation have some degree of autonomy (e.g. [2]); and as a useful heuristic approach in problems which are too large to solve directly (e.g. [3]).

In order to enable better teamwork among human users, models for multiagent resource allocation may be employed. Motivated by scenarios of allocating doctors to patients in emergency medical settings, we examine the challenge of deciding whether to delay assignment of resources which may not be interruptible once they begin processing. We offer a solution guided by the concept of a reservation, resting on the integration of planning and learning, and on the modelling of bother incurred by these human resources.

We will begin by summarizing a multiagent resource allocation framework. After presenting the details of this framework, we proceed to outline a proposal for anticipating future changes through the use of what we refer to as resource reservations. We then clarify how this approach can be leveraged in scenarios where there are irreversible allocations. Validations are presented, using simulated scenarios, to quantify the value of the proposed solutions. In all, we offer a new direction for extending the scope of resource allocation decision making, intended for environments that are highly dynamic, and where new work may arise at short notice.

Within the area of multiagent systems, resource allocation tasks consist of problems wherein a set of agents, with individual preferences, is to be allocated a set of (possibly) heterogeneous resources. Ideally, the allocations are selected to maximize a value function. In this work, we are especially concerned with iterated versions of this problem, where agents may repeatedly renegotiate a given allocation as new agents arrive. We will focus primarily on distributed approaches (e.g. [4, 5, 6]), as there is a natural mapping from these representations to the sorts of problems we envision the system being used to solve, for example, problems in the domain of medical resource allocation, where distributions of doctors’ time may be required to change frequently [7, 8].

2. Literature review

Within Multiagent Systems, the assignments of objects, items, or tasks to agents has been widely studied for decades. For a more comprehensive overview of the area, we refer the interested reader to [2] or [9].

The area is organized in two parallel literatures, considering similar problems and developing similar solutions. In Multiagent Task Allocation, a central agent divides a larger project into smaller units, and distributes them among a population of assistant agents to maximize efficiency, utility, or throughout. This distribution may be planned centrally [10, 11], or emerge dynamically through negotiation among the assistants [12, 13, 14, 15], or through distributed heuristic techniques [16, 17, 18]. In general, self-organizing systems are more robust in large scale deployments [9].

Resource Allocation considers the closely related complementary problem where a central agent distributes resources to agents in response to their individual needs or requests. This scenario models a wide range of real-world problems [19, 20, 21, 22, 23]. As with Task Allocation approaches, Resource Allocation approaches can be categorized by the method used to assign resources to agents. In centralized approaches, agents typically submit information about their requirements to a central planner, who computes an allocation directly [24, 25, 26, 23, 27]. These approaches typically offer strong theoretical guarantees on performance, but require significant computation and communication in large scale systems [9]. In contrast, self-organizing systems have the beneficiary agents negotiate or heuristically coordinate among themselves to assign the resources [28, 29, 30, 31]. As with Task Allocation, self-organizing approaches to Multiagent Resource Allocation generally scale better, providing reduced communication and coordination costs, at the expense of robust theoretical guarantees on efficiency or throughput [9].

Active research in multiagent resource allocation focused on theoretical systems, and recent surveys [9] of the field call for the adoption of more real-world constraints, dynamic re-allocation of resources, and the design of self-organizing systems for cooperative environments, topics addressed by the work in this paper. Other topics, including the interplay between related tasks, dynamically learning the skill profiles of agents during allocation, and the design of self-organizing systems for common emerging physical domains like the smart grid are not touched on directly.

This paper is best understood in the context of self-organizing solutions to Multiagent Resource Allocation, and adopts a heuristic approach, a strategy which has been shown to be successful in a variety of applications [22, 32, 29]. We discuss how to improve upon these models by introducing the concept of “dummy” agents, tasked with securing the right to use resources at a future point on behalf of needy agents who may potentially appear but who have not yet entered the system, and so cannot yet negotiate on their own behalf. The benefits of this approach and the effectiveness of existing systems are evaluated in simulations designed to mimic the essential features of medical resource allocation problems, where patients come and go, and patients’ needs dynamically change over time.

3. A framework for preemptive resource allocation

In this section, we present an overview of a framework for multiagent resource allocation that supports preemption. We begin by articulating the problem that this model is intended to address, followed by comparisons with other existing models. We then proceed to present the formal specification that is the underpinning for the system that we designed, following this with a complete, but concise description of that system.1

3.1 The problem

In our research, we are trying to address the tension arising from two opposed objectives. Specifically, we wish to (i) preserve agent autonomy while (ii) ensuring that critical agents can get access to the resources they need, even and especially when those resources are currently assigned to agents with much lesser needs. In order to do so, we seek to improve upon existing multiagent resource allocation solutions.

Distributed approaches to multiagent resource allocation problems possess different levels of agent autonomy. In a fully distributed approach, agents typically have individual ownership of some initial set of resources, which they can then voluntarily exchange with others for resources that they prefer. They exercise full control in deciding whether to exchange their resources for those of other agents and thus may be selfish. Alternatively, it is possible to adopt a convention that agents will not be purely self interested, but will cooperate with one another to some extent – for example, allowing the formation of coalitions in which various subgroups collaborate to produce mutually beneficial allocations (e.g. [33]).

In this work, we adopt an approach between these two views. Our agents act as individuals, and plan only to satisfy their own resource requirements. However, they are also much more cooperative than in many existing distributed systems, and might accept trades of resources which are not advantageous to them personally, if doing so improves the global welfare of the agents as a whole. Similarly to Paulussen et al. [7], we accomplish this by supporting preemption of resources, wherein one agent will relinquish its resource to another agent with a greater need. Accurately determining whether such a trade is possible requires estimating the opportunity cost (the utility the agent could have obtained from taking some other action) of having a preempted agent hold its resource. That is, we must be able to determine what the agent’s behaviour will be following a preemption, and what the utility derived from that behaviour is likely to be. Our system provides a more accurate estimation of preemption costs by combining planning techniques (which predict future behaviours) with learning techniques (which determine the expected value of those behaviours) in a novel fashion.

3.2 Formal problem specification

We define a resource allocation problem as a set of tasks $T$ , a set of resources $R$ , and the following functions: completion: $T,R\mapsto\mathbb{N}$ , taking a task and a resource and returning the amount of time required for that task to be completed by that resource; and $\textit{arrival}:T,\mathbb{N}\mapsto\{\textit{true},\textit{false}\}$ , taking a task and a time tick and returning whether that task has entered the system by that tick.

The solution to such a resource allocation problem is an allocation, defined as a function over $R,\mathbb{N}\mapsto\{T\cup\emptyset\}$ taking a resource and a tick and returning the task (if any) that the resource is allocated to at that tick. An allocation must satisfy the following constraints:

•
$\forall t\in T,\exists r\in R,n\in\mathbb{N}\ [\textit{allocation}(r,n+i)=t\ % \forall i\in[0,\textit{completion}(t,r)]\land\forall q\in R,m\in\mathbb{N}\ [m% \geqslant n+\textit{completion}(t,r)\implies\textit{allocation}(q,m)\neq t]]$ (all tasks are completed, and not allocated further resources after completion)
•
$\forall t\in T,r\in R,n\in\mathbb{N},\lnot arrival(t,n)\implies\textit{% allocation}(r,n)\neq t$ (tasks cannot be allocated before they arrive)
•
$\forall r,q\in R,n\in\mathbb{N}(r\neq q\wedge\textit{allocation}(r,n)\neq% \emptyset\implies\textit{allocation}(r,n)\neq\textit{allocation}(q,n))$ (each task is allocated at most one resource)

An optimal allocation maximizes utility. In this work we will consider a utility function EU: $\mathbb{N}$ , $\mathbb{N}$ , $T\mapsto\mathbb{R}$ , that takes the old and new completion ticks of a task and returns the change in utility from this change in completion time. Using this, an allocation’s utility can be determined by:

$\displaystyle\textit{EU}_{\textit{allocation}}=\sum_{t\in T}\textit{EU}(% \mathop{\text{argmin}}\limits_{n\in\mathbb{N}}(\textit{arrival}(t,n)=\textit{% true}),\mathop{\text{argmin}}\limits_{n\in\mathbb{N}}(\exists r\in R\ \textit{% allocation}(r,n)=t)+1,t)$ (1)

This accumulates the change in utility for each task based on the total time spent in the system, from time of arrival to time of completion (one tick after the last tick at which a resource is allocated to the task).
3.3 Components

Our system models distributed resource allocation problems using four types of components and four domain-specific functions.

In our system, a task represents a need for some user which must be satisfied through the acquisition of a resource. A task is described by the following information:

•
A type $\theta_{t}\in\Theta_{T}$ .
•
The resource $r$ currently assigned to this task.
•
A countdown $\textit{time}\_\textit{left}$ of time remaining before the current resource $r$ will complete the task, or $\infty$ if $r$ is NULL.

Tasks are supported by task agents, which we will refer to simply as “agents” throughout. We define an agent by:

•
An associated task they are responsible for resolving.
•
A plan for resolving their task: a sequence of actions that the agent will take, based on its understanding of environmental conditions, to acquire needed resources. The actions are attempts to acquire resources, and are selected based on the agent’s beliefs about the expected value and probability of obtaining each resource.
•
An expected utility $\textit{eu}_{\textit{plan}}$ describing the value of executing its plan. The effective coordination of requests for resources in the system depends on sharing this expected utility, as explained below.
•
A length of time since the plan was generated $t_{\textit{update}}$ , used to determine which step of the plan to execute next.

Task agents also have two state variables which respectively measure churn (how quickly the environment is changing) and congestion (demand for resources the agents need). These concepts are clarified further in the subsection below. In particular, we explain how modelling churn leads an agent to construct shorter or longer plans.

In our system, resources are characterized by a type $\theta_{r}$ , which is a member of a finite set of possible types $\Theta_{R}$ . In order to reason about the likelihood that task agent plans will be successful, we assume that resources are human beings who will ultimately be engaged in interaction, in order to be asked to perform tasks. Following Cohen et al. [34], we assign a “bother-so-far” value BSF to each resource. This user modelling factor is intended to capture the extent to which a resource is being overwhelmed by requests from the system. We thus integrate the option of denying any new requests.

Resources are supported by proxy agents which limit the number of requests the system makes of humans to prevent unnecessarily increasing bother. The resource requests made by task agents are filtered by these proxy agents which evaluate the expected utility of granting the request, only forwarding those requests which increase net utility.

Four additional domain-specific functions are employed by resource proxy agents:

•
$\textit{processing}\_\textit{time}(\theta_{r},\theta_{t})$ is the length of time that a task with type $\theta_{t}$ needs to control a resource with type $\theta_{r}$ in order to be processed.
•
$\textit{EU}(\textit{old},\textit{new},\theta_{t})$ describes the expected gain in individual utility if a task of type $\theta_{t}$ is completed at time new instead of at time old.
•
$P_{\textit{stop}}(\textit{EU}_{\textit{net}},\textit{BSF})$ describes the probability that a resource with total bother BSF would agree to begin resolution of a task, assuming an overall gain of $\textit{EU}_{\textit{net}}$ .2
•
$\textit{bother}\_\textit{increase}\_\textit{function}(\textit{EU}_{\textit{net% }},\textit{BSF},\theta_{r})$ describes the increase in a resource’s BSF from a request to take an action with net utility $EU_{net}$ , given their BSF and type.

3.4 Algorithm description

Our proposed system can be summarized as follows. At each time tick, a set of new tasks to be completed arrives, and each task is assigned an agent. Each agent requests the resource needed for its task from that resource’s proxy agent, and all incoming requests to a resource are batched together. The request with the highest positive improvement in expected utility is then accepted. The gain and loss in utility can be calculated directly from the type of each task (revealed by task agents when requesting resources) and the known time to complete each task. The potential $eu_{\textit{plan}}$ is known from the reported utility of an agent’s backup plan (a required calculation of each agent requesting or receiving a resource).

A task agent generates a plan for satisfying the resource needs of its user, provided that the agent has no such plan in the present step. In this work we model each plan as a Transfer-Of-Control (TOC) strategy [35, 12]: a series of attempts to acquire resources, with each new step only executed if the previous step has not been successful. For example, such a plan might be to approach resource $r_{1}$ during the first and second tick,3 then approach resource $r_{2}$ during the third tick due to some perceived opportunity, then return to approaching resource $r_{1}$ during the fourth tick. If the agent successfully receives any of these resources, then the plan has succeeded and the rest of the plan is discarded. If the agent reaches the end of this plan without receiving a resource, then the agent generates a new plan. In our implementation, we use an unconstrained dynamic program to generate these TOC plans. The agent’s plan consists of a set of actions $X$ , selected by the planning technique to maximize the recursive valuation function $V$ :

$\displaystyle V(X)=\textit{EU}(X_{i})\times P(X_{i})+(1-P(X_{i}))\times V(X% \setminus X_{i})$ (2)

where $X_{i}$ is the first action in the list, EU is the expected gain in utility if the action succeeds, and $P$ is the probability of the action succeeding (based on the bother model). Both EU and $P$ can be computed from the domain-specific functions EU and $P_{\textit{stop}}$ mentioned above.

Agents model their environment using two parameters which facilitate replanning. The parameter $a.m_{\textit{churn}}$ is agent $a$ ’s estimate of the extent of churn in the system. This is learned heuristically using “churn events”: interactions where the agent discovers its plans for acquiring resources are based on information that is no longer true because the environment has changed. These events are triggered every time the agent has to regenerate its plan due to a failed validation attempt which indicated that the pre-planned action could no longer improve utility. The model’s internal value is used to determine how much effort should be put into planning out future actions further into the future. In particular, the plan length used by an agent is given by:

$\displaystyle\textit{plan}\_\textit{length}=\left\lfloor\frac{k-1}{1+a.m_{% \textit{churn}}}\right\rfloor+1$ (3)

where $k$ is the maximum allowable plan length (limit of computational resources), and $a.m_{\textit{churn}}$ is an exponentially weighted decaying sum of the number of churn events experienced by the agent, given by:

$\displaystyle a.m_{\textit{churn}}=\sum_{i\in I}c\,\alpha^{t_{i}}$ (4)

where $c$ is the penalty for a churn event, $\alpha$ is a decay rate,4 and $t_{i}$ is the time since the event $i$ . Thus, the more frequently the plan is regenerated, the higher the churn value is, and the shorter new plans are. If the environment is stable enough for plans to be reliable, then the churn value will decay and permit the creation of longer plans.

An agent also estimates the demand for resources in the system ( $a.m_{\textit{congestion}}$ ); there is a penalty ( $c^{\prime}$ ) for if the agent learns that what it expects is inaccurate, which can also decay over time ( $\alpha^{\prime}$ ). Events for this model occur when a held resource is taken before a task is completed. As a result, an agent will adjust the expected utility of its plan per Eq. (5) below (using $V(X)$ from Eq. (2)).

$\displaystyle a.eu_{\textit{plan}}=\left\lfloor\frac{V(X)}{1+a.m_{\textit{% congestion}}}\right\rfloor$ (5)

The resource proxy agent filters incoming requests; only the request with the highest positive improvement in expected utility made during a tick is forwarded to the underlying resource. This batching of resource requests is particularly valuable when a resource has just completed a task, as otherwise low-priority tasks may give up a reliable resource, only to be displaced by the next-highest priority task and so on, in sequence. The resource proxy agent calculates the expected utility of allowing a preemption according to the following equation:

$\displaystyle\textit{EU}(X_{i})=\textit{EU}\ (\textit{req.time}\_\textit{left}% ,\textit{pt}_{r,\textit{req}},\textit{req}.\theta_{t})\hskip 62.596063pt\text{% (Gain)}+\ \textit{EU}\ (\textit{own.time}\_\textit{left},\textit{pt}_{r,% \textit{req}}+\textit{pt}_{r,\textit{own}},\textit{own}.\theta_{t})\ \text{(% Loss)}+\ \textit{own.eu}_{\textit{plan}}\hskip 147.954331pt\text{(Regain)}$ (6)

where req is the task requesting a resource $r$ currently owned by task own, and $\textit{pt}_{r,t}$ gives the processing time for a resource $r$ to complete a task $t$ .

The Gain portion of this equation corresponds to the utility the preempting agent would receive by acquiring the resource right now. This gain is the result of the task the agent seeks to complete being finished earlier than it would be otherwise. The Loss portion corresponds to the change in utility for the agent currently holding the resource, if it were to yield its resource to the preempting task, and then the resource after that task is complete, reacquire it. The Regain portion is the expected utility the preempted agent will acquire from executing its current plan instead of waiting to reacquire the lost resource.

Agents that have been preempted many times will report lower $\textit{eu}_{\textit{plan}}$ values (because of higher congestion experienced). This reflects the idea that in truth, they will likely be left without a resource if preemption occurs. Preemption is allowed if net utility is positive. Having resource proxy agents perform these calculations works towards maximizing $\textit{EU}_{\textit{allocation}}$ by allowing the agent to automatically filter out any requests that will reduce overall expected utility.

Note that resources accept requests according to the domain-specific function $P_{\textit{stop}}$ , which abstracts the probability of a resource with a given bother level BSF agreeing to address a new task, given that doing so has an expected utility of $\textit{EU}_{\textit{net}}$ . This captures the idea that resources may have some autonomy (e.g. human resources). If the request is accepted, the resource is allocated to that task and the task’s processing time is set.

A final defining characteristic of the model is its applicability to scenarios where tasks may arrive dynamically. Since our solution does not require the task agents to possess full knowledge of competing requests, new tasks can be inserted at any time without disruption.

As mentioned, the overall design goals of our solution for coordination among the agents are inspired by the transfer of control paradigm of Scerri et al. [12, 35]. Our work aligns well with the extension of this framework constructed by Cheng and Cohen [32], which enables both transferring decision making control and communicating in order to determine whether that control will be accepted.

3.5 Comments

The main strength of the model presented here comes from the combination of several factors. In order to allow as many preemptions as possible which increase the utility of the system overall, we present a novel way of computing opportunity costs. This calculation requires us to know what the preempted agent could do instead of holding its current resource, and exactly how beneficial such behaviours would be. While Paulussen et al. [7] considered only the immediate actions an agent could take when computing opportunity costs, we can do better. When a preempting agent requests a resource, we use the value of the preempted agent’s plan as an estimate for the value of the alternative actions this agent would take if the preemption were allowed. Using the planning method allows our model to both avoid currency and utilize preemption in a principled and broader way than in previous approaches.

It is not sufficient to use these estimates directly however, because an individual agent’s plan can be overly optimistic in two ways as a result of using only local information when making decisions. First, the agent might generate a plan for its current environment, only to have that environment change. If the agent allows its resource to be preempted on the basis of such a plan, the resulting preemption may actually decrease the utility of the system. Second, the agent is not directly aware of the resource demands of others in the system, so it may assume that the acquisition of another resource will be easy when high demand may make it quite hard. The use of our heuristic learning models for churn and congestion is thus a central element of the framework, allowing agents to dynamically adjust their estimates for how long a plan will be useful and the probability they will be able to hold onto any resources they acquire, using only local information. We also clarify what kind of communication between agents can facilitate the decision making about preemption of resources (information about backup plans, for instance).

4. Expanding the multiagent resource allocation framework

In this section, we introduce the concept of reservations of resources, and discuss how to address irreversible allocations, in an effort to expand the scope of what may be handled for distributed multiagent resource allocation with dynamic task arrivals.5

4.1 Towards a reservation system for resource allocation

If there is a cost associated with reassigning resources, then there may be times where it is more efficient to keep a resource idle rather than locking it into an allocation where it provides less benefit than it might be able to provide in the future. Under these circumstances, the higher the cost, the easier it is to find situations where waiting can help. In this section, we outline a proposal that allows waiting through reservation.

4.1.1 Opportunity cost and the value of waiting

Opportunity cost is the drop in utility due to losing the ability to perform an action; for example, consider a case where you have $10, you can buy coffee for $1 to gain 1 utility, and tomorrow you can buy a movie ticket for $10 to gain 20 utility. (Assume that having extra money left over has no value.) Buying coffee now gives more utility than not buying coffee (1 utility for buying, 0 for waiting), but it comes with the opportunity cost of 20 utility because you won’t be able to afford the movie ticket later. The correct decision in this case is to keep money in reserve in order to take advantage of the future payoff. Note that opportunity cost is highly dependent on the expected changes in environment, so potential tasks which can use the reserved resource should feel free to challenge the decision to reserve a resource under the hope that the opportunity cost has lowered. For example, if you find out that other plans will prevent you from seeing the movie tomorrow, then the opportunity cost drops from 20 utility to 0 utility and so there is no need to conserve money now when you could use it to gain utility.

The opportunity cost in a preemption environment is usually smaller, but still exists. In the example above, imagine if you could borrow $1 from a friend at the cost of 2 utility due to the headache of tracking debt. In this case, the total cost of buying the coffee drops to just 2 utility, because you can still see the movie by borrowing money (i.e. there is no opportunity cost to buying the coffee, just the cost of borrowing money). The opportunity cost is still higher than the utility gain from buying coffee, but if there is some probability that you won’t be able to see the movie even if you can pay for it then not buying coffee now could be a wasted opportunity. With risk-neutral preferences, you should buy coffee if there is a 50% or higher chance of missing the movie if you can borrow money. This is because if you buy coffee (utility gain 1), and you will go to see a movie with probability $p>$ 0.5, the expected cost incurred tomorrow is $2p$ (i.e. the probability you need to borrow money, times the cost of borrowing). In contrast, if you cannot borrow money, the gain of buying coffee is the same (1 utility), but the potential loss is now $20p$ instead, so buying coffee is only rational if there is a 95% or higher chance of missing the movie (i.e. $p<$ 0.05, so $20p<$ 1) if you can’t borrow money.

4.1.2 When waiting is valuable

Waiting is an expensive operation for resources in a rate-limited resource allocation environment, as cutting down on the amount of time that a resource is in use effectively reduces the pool of resources available, slowing the completion of all tasks currently in the system. However, there are still edge cases where hasty allocations can similarly waste resource capacity.

Consider a utility function where each task applies a constant cost to the system for every time step that the task remains in the system. Each time step that a resource waits thus applies a potential cost to the system (i.e. a loss of utility from delaying a task) equal to the highest single step cost among available tasks, as the resource could have been applied towards finishing that task one step faster. This captures how there is no wait cost when there are already enough resources to handle all existing tasks.

For a concrete example: Consider a computing cluster with two servers (resources) available, $r_{1}$ and $r_{2}$ . A task $t_{1}$ already exists in the cluster with a severity of 200, and in one tick a new task $t_{2}$ with severity 401 will enter the cluster. Let $r_{1}$ be a standard server that can complete either task in 6 ticks, while $r_{2}$ is a high-end server that can complete either task in 3 ticks. Before $t_{2}$ enters the cluster, it appears strictly better to assign $t_{1}$ to the high-powered $r_{2}$ to save on 3 ticks of processing time at 200 utility lost per tick. However, if the cluster makes this seemingly reasonable allocation, then one tick later $t_{2}$ enters the cluster and has the severity to preempt $r_{2}$ from $t_{1}$ . This follows from how delaying the completion of $t_{1}$ by 4 ticks (instead of being completed in 2 more ticks it would be completed in 6, whether by running on $r_{1}$ or by waiting for $r_{2}$ to complete $t_{2}$ and restarting on $r_{2}$ ) costs 800 utility while delaying the completion of $t_{2}$ by 2 ticks (waiting for $t_{1}$ to complete in 2 ticks before starting on $r_{2}$ rather than preempting $r_{2}$ and starting immediately) costs 802 utility. The first tick of computation spent by $r_{2}$ on $t_{1}$ is thus wasted. However, if the powerful $r_{2}$ had been reserved for $t_{2}$ in that first tick by allocating $r_{1}$ to $t_{1}$ instead, then $t_{1}$ would be completed in a total of 6 ticks (instead of 7 ticks, the first of which was wasted due to preemption) while $t_{2}$ would still be completed in 3 ticks, for a total savings of 200 utility. If there was an unacceptably high penalty to stopping a task in progress, the less-severe $t_{1}$ is completed in 3 ticks while the more-severe $t_{2}$ is completed in 5 ticks (for a total cost of 3 $\times$ 200 $+$ 5 $\times$ 401 $=$ 2605 utility), as opposed to $t_{1}$ being completed in 6 ticks on $r_{1}$ while the powerful $r_{2}$ , reserved for $t_{2}$ , completes $t_{2}$ in 3 ticks (for a total cost of 6 $\times$ 200 $+$ 3 $\times$ 401 $=$ 2403 utility, saving 202 utility).

Consider a case with only one resource and generalized tasks. Take two tasks:

•
Task $t_{1}$ , which costs $x$ utility per step and will require $y$ steps to finish, and exists in the system right now.
•
Task $t_{2}$ , which costs $a$ utility per step and will require $b$ steps to finish, and will enter the system in $c$ steps.

The net cost for finishing the first task first thus comes to $x\times y+a\times(b+y-c)$ , while the cost for waiting for the second task comes to $x\times(y+b+c)+a\times b$ . In order for waiting to be worthwhile, the following relation must therefore hold:

$\displaystyle x\times y+a\times(b+y-c)>x\times(y+b+c)+a\times b$ $\displaystyle a\times(y-c)>x\times(b+c)$ $\displaystyle\frac{a}{x}>\frac{b+c}{y-c}$

As expected, if $t_{1}$ could be completed before $t_{2}$ arrives (if $y<c$ ) then there is no reason not to finish $t_{1}$ first. For tasks that take similar amounts of time to complete, $t_{2}$ must have higher severity than $t_{1}$ ; taking the example above where either task could be completed in 3 ticks and $t_{2}$ is delayed by 1 tick, $t_{2}$ must have higher severity by a factor of $\frac{3+1}{3-1}=$ 2 to be worth waiting for.

As tasks take longer amounts of time to complete, the late-arriving task needs less of an advantage to be worth delaying the currently-available task. In particular, a currently-available task which requires many steps to finish anyway may be worth putting on hold to quickly complete new tasks that require less time.

The analysis becomes more difficult when there are multiple resources involved, as different resource assignments must be considered and the specifics of future arrivals become important. Suppose now that there are resources $r_{1}$ and $r_{2}$ , and adjust the tasks from before:

•
Task $t_{1}$ , which costs $x$ utility per step and will require $y_{1}$ steps to finish with $r_{1}$ or $y_{2}$ steps to finish with $r_{2}$ , and exists in the system right now.
•
Task $t_{2}$ , which costs $a$ utility per step and will require $b_{1}$ steps to finish with $r_{1}$ or $b_{2}$ steps to finish with $r_{2}$ , and will enter the system in $c$ steps.

Suppose that $r_{1}$ is a better resource than $r_{2}$ (thus $y_{1}<y_{2}$ and $b_{1}<b_{2}$ ). Also assume that $r_{1}$ is not so much better than $r_{2}$ that the best case is always to have $r_{1}$ complete both tasks, as that analysis is shown above. Then there are three possible outcomes:

(iii) (i)
$t_{1}$ is finished by $r_{1}$ and $t_{2}$ is finished by $r_{2}$ , at cost $(x\times y_{1})+(a\times b_{2})$
(ii)
$t_{1}$ is assigned $r_{1}$ which is then taken by $t_{2}$ which finishes while $t_{1}$ is finished by $r_{2}$ , at cost $(x\times(c+y_{2}))+(a\times b_{1})$
(iii)
$t_{1}$ is finished by $r_{2}$ and $t_{2}$ is finished by $r_{1}$ , at cost $(x\times y_{2})+(a\times b_{1})$

The middle outcome is strictly worse than the last outcome where the correct final allocation is known. Optimal allocation thus gives the system $x\times c$ utility every time that the last outcome is better than the first outcome. Longer lookahead improves resource reservation decisions (capped by the processing time of tasks using the best resource), and this benefit is dependent on the priority of the less important task – if some tasks are of negligible importance to the system, then it is easiest to simply preempt resources from those tasks when necessary.

Thus, when preemption is prevalent, reserving resources is not for the benefit of the tasks that receive the reserved resources. Those tasks could have preempted the resource even if it had not been reserved. Reservation instead improves utility for tasks that are warned away from taking resources that they will not be able to keep.
4.1.3 Dummy agent approach

The value of a wait action can be provided in a task-centred system by dummy task agents, which are agents representing a blank ‘dummy’ task which corresponds to waiting for a real task that will enter the system later. Unlike other tasks this dummy task can always be interrupted (even in environments where preemption is difficult or impossible), as a resource ‘performing’ this task is understood to be on standby until a sufficiently important task interrupts it. Once the real task in question arrives in the system, the dummy agent grants its resource to that task and seeks a new incoming task to represent.

A dummy agent competes with other task agents for resources by reporting the utility of waiting, measured by examining the opportunity cost of allocating the resource now. If the utility of waiting is higher than the utility gain from allocating the resource to any other task that approached it, then the resource is allocated to the dummy agent instead of working on a real task; by definition, the cost of allocating the resource is higher than the cost of leaving it idle in this case. By reusing the concept of a task agent, the dummy agent seamlessly provides a wait action to resources as an alternative to other available tasks. Note that the opportunity cost of waiting increases sharply when more than one resource waits, as the system would need to expect multiple tasks to enter in a short span of time (otherwise the time spent waiting could be used to complete other tasks) in order to use multiple reserved resources. For this reason, the rest of this paper assumes that only one resource will wait at a time; possible extensions are given in Section 7. If every resource evaluates the cost of waiting and concludes that it is in the system’s best interests for one resource to wait, then resources will need to coordinate to determine which resource waits. In simplistic solutions using local information exclusively, every resource might decide to wait, incurring heavy real costs as available work is ignored. The use of a dummy agent allows resources to implicitly coordinate with each other on when to wait, as only the resource approached by the dummy agent will wait. The dummy agent however can reuse the existing planning mechanism that task agents use to select resources in order to choose a good resource to wait for impending tasks.

It is expensive for a system to not work at full capacity, as every tick spent waiting for the arrival of a task is a tick not spent clearing the tasks that already exist in the system. For waiting to be worth the trouble, truly critical tasks must arrive, where the time savings offered by a dummy agent for these critical tasks are enough to outweigh the loss of processing capacity.

In a proof-of-concept implementation, opportunity cost was simulated by asking an oracle for the highest-priority task that will arrive in the next time step and granting that priority to the dummy task. This is an idealistic level of information, though it may be justified in scenarios where tasks are discovered before resources can be assigned to them (e.g. a computing example where tasks must have large amounts of data uploaded to a cluster before computation can start, or a medical example where ambulance drivers can call ahead to the hospital to alert them to incoming patients, so the tasks are known before they start). Even using perfect knowledge, this simple method slightly overestimates the opportunity cost, as the current tick may still be usable; if the dummy task prevents a slightly-lower priority task from getting an early start which is worth making the slightly-higher priority task wait, then there is utility loss. For example, consider a system where a task that costs 90 utility per tick is already present (Task A), and next tick a new task will arrive that costs 100 utility per tick (Task B). If the only available resource can finish either task in three ticks, then if it started on Task A now and only addressed Task B after finishing with Task A, then the total cost will be 90 $\times$ 3 $+$ 100 $\times$ 5 $=$ 270 $+$ 500 $=$ 770 utility. However, if the resource waits for one tick to finish Task B as soon as possible, and no other resources are available to resolve Task A until after Task B is complete, then the total cost will be 90 $\times$ 6 $+$ 100 $\times$ 3 $=$ 540 $+$ 300 $=$ 840 utility which is worse. With that said, in a system with perfect lookahead this corner case is the only way for the dummy agent to negatively impact the system. In order to see improvement from waiting, the ratio of task importance must be higher as the task completion time decreases, as the single missed step becomes more important when it can complete more of the currently-available task.

A more realistic implementation would not have perfect knowledge of incoming tasks. This could be an issue for resources with disparate compatibilities, as choosing the correct resource to wait becomes nontrivial: Even if the distribution of incoming task types is known, if no one resource is compatible with all high-probability task types, it is difficult for the dummy agent to choose a resource to approach. For example, consider a case of 10 task types, with larger task types having higher priority. If there is a high probability of a new task of type 7 arriving in the next step, then the dummy agent would approach a resource compatible with type 7 tasks. However, if all resources compatible with type 7 tasks are incompatible with type 8 tasks, and the distribution gives an equally high chance of type 7 or type 8 tasks arriving, then the dummy agent must pick according to some other criterion. While this may sound strange in the abstract, consider a medical example where type 7 tasks are patients with broken bones and type 8 tasks are patients with internal injuries; if resources in this case are doctors, then finding specialists in both injuries may be difficult. Our implementation does not address this problem, instead simply choosing a resource to reserve based on which resource is best suited for the most important expected incoming task. A more detailed approach would be to have the agent consider the expected value of reserving each potential resource, given its beliefs about the distribution of incoming tasks and the skills of each doctor. We discuss this option in more detail in Section 7.

4.2 Irreversible allocation

As described previously, easy access to preemption greatly softens the impact of any mistakes made when allocating resources, as a task must always yield its resource to another task of sufficiently greater need. However, not all problem domains allow for such easy reallocation. Consider the scheduling of underground construction for a water company, a problem domain where tasks involve tearing up sections of roadway and shutting off water flow in order to perform maintenance. While a resource here is a construction crew, work cannot be interrupted in the middle of replacing a section of underground water pipe, as the interruption in service is unacceptable. Even if a pipe were to break elsewhere, taking the construction crew away from the job in progress would leave just as bad of an emergency at the unfinished job site. In these domains where allocation is irreversible (until the task is complete), there is far less leniency towards bad initial allocations that ignore new incidents.

In operating systems, a computational task that requires exclusive use of a resource and cannot be interrupted by others is referred to as an atomic task. In a general-purpose computing environment, arbitrary computational tasks may depend on each other in ways that make it dangerous to preempt complex resources (e.g. printers, communication channels) from tasks in progress. While modern multi-tasking allows arbitrary processes to be paused and resumed with no loss of state, external resources may carry their own associated state which is not automatically managed by multi-tasking. This is an analogous case of irreversible allocation, and problems such as priority inversion (where a low-priority task holds a resource required by a high-priority task while a medium-priority task uses all the computational time and prevents the low-priority task from finishing) and deadlock (two tasks each need two resources to finish, each task holds one resource, and neither task is willing to abandon its resource, thus neither task can finish) arise in computational environments from poor allocation.

While the system described thus far cannot suffer from priority inversion or deadlock as each task is assumed to require only one resource, high-priority tasks can still be forced to accept low-quality resources if they arrive at a time when all the suitable resources are already allocated. Long-term plans can be valuable in such situations, as a perfect resource which will be available shortly may be preferable over a terrible resource available now, at least for the task that actually receives the perfect resource out of all those competing for it.

4.2.1 Switching costs

The applicability of the dummy agent depends on whether the environment permits preemption. While not all domains permit perfect preemption, preemption is not a binary feature of an environment; reallocating resources between tasks in progress can be expensive without being impossible, and this can be modelled through switching costs.

In computer scheduling, switching costs typically refer to the cost associated with recording the current state of a process and loading the state in a stored process in order to switch execution from the first process to the second. Psychology also uses the term switching cost for the cognitive effort required for a human to stop performing one task and start another, with distractions causing undue effort as they cause subjects to lose their place in an ongoing task.

Resetting task progress whenever a preemption occurs serves as a kind of switching cost, though as delays have different utility impact depending on task severity, the magnitude of this cost varies greatly. Actual difficulty in switching between tasks can be modelled as an additional cost; making such a cost independent of task type allows for analysis of a gradient of problems between a preemption environment and an environment where all allocations are irreversible.

Equation (6) of Section 3.4 does not account for switching costs, as they are independent of the available variables involved in the utility formula (old completion time, new completion time, and task type). However, switching costs can be implemented with a simple adjustment where the baseline utility for preemption requests is calculated; applying a switching-cost penalty here to any case where a task leaves a resource will adjust the behaviour of the model to reflect these switching costs. To make this modification in the system of [29], which we build upon, all that is needed is an additional cost added on line 17 of Algorithm 8 (the core algorithm used in that paper to compute preemption costs). The only other adjustment required is for evaluation purposes, as whatever mechanism used to record the total utility cost incurred by the system must correctly apply switching costs.

Consider Eq. (6) above, which simply examines the expected cost of preemption and approves the preemption request when positive. Every additional cost placed on preemption will reduce the number of cases where this value is positive, making the system more conservative with changing allocations. As the cost increases, eventually the switching cost exceeds any possible benefit from preemption, at which point allocations are irreversible.

4.2.2 Changes in allocation strategy

When resource allocation is irreversible, there is value in spending more time to work out a good allocation. Environments that permit preemption allow sub-optimal allocations to improve through a sequence of trades, encouraging the quick selection of a simple reasonable allocation which can be improved upon later. Opportunity cost becomes far more significant, as a system operating at full capacity (often necessary to maximize utility) is also incapable of responding quickly to new tasks. Preemption allows all currently-working resources to act as a reserve for emergency arrivals, so the lack of preemption makes the idea of reserve resources tenable again.

Our learning model for congestion is not meaningful in an irreversible environment, as to be expected since its learning events rely on preemption. The congestion model only modifies the backup plan value for use in approving preemption, so when preemption is removed this model cannot contribute. However, the model for churn is still valuable, as the ability to look several steps into the future is far more valuable when resources can only be claimed when idle. Asking resource proxy agents when a resource will be free again helps create good plans, but when this information is out of date (for example, if multiple tasks are implicitly queuing up to take a resource once it finishes its current task) the churn model allows a task agent to recognize when it is better to plan myopically and take whatever resources are currently available.

Bother modelling is no longer relevant in an irreversible environment (provided that idle resources are always willing to work), though the bother model can be used to bridge the difference between preemption and irreversible environments. Using $P_{\textit{stop}}$ functions that have high rates of rejection will make allocations more difficult to interrupt, encouraging similar behaviour as in a fully irreversible environment.

5. Validation

We begin with a test to confirm the value of our fundamental approach to multiagent resource allocation, as outlined in Section 3. We compare performance against a random assignment system (in which resources were assigned to agents in a random order, without considering desirability or agents’ priorities), and an idealized system in which the supply of resources is finite, but all resources are able to satisfy agents’ needs in a single timestep (whereas, in the real simulation system, agents’ needs are distributed uniformly at random, and take an average of 5 timesteps for a random resource to resolve). The simulations contained 50 resources, and 500 agent types. The needs of an agent were resolved by possessing a resource for a number of timesteps dependent on the resource and agent types (selected uniformly at random between 1 and 11 steps, at the start of a simulation). Fifty resources were used in all simulations, but the number of agents varied between 100 and 2,000. As shown in Fig. 1, the cost per agent (i.e. the product of wait times and an agent’s type) was far higher for the random system, and grew far faster for that system as well.6

Figure 1.

Costs per agent (the product of waiting time and an agent’s type) for the heuristic system, a random allocation, and an unattainable idealized system. Points are the average of 100 random allocation problems.

Figure 2.

Benefits of using a dummy agent plotted against number of agents that enter the system (evenly distributed over the first 50 ticks, 90% low-severity and 10% high-severity tasks). Smaller values are good. (All error bars show a 95% confidence interval. Figures are best viewed in colour.)

We then proceeded to demonstrate performance advantages for our proposed reservation system and its use of dummy agents, as outlined in Section 4.1. In order to indicate the importance of waiting, a resource-constrained simulation environment was used with dynamic arrivals of agents (each with an associated task) over time. When there are only a handful of resources available to process tasks, the impact on the system from having a single resource wait is relatively large and thus more likely to illustrate the benefits and failings of this approach. To further illustrate the difference, a skewed bimodal distribution was used: Tasks assigned to one of two types: minimum or maximum severity, with 90% of tasks at minimum severity. Minimum severity tasks had an associated cost of 1 unit per timestep of delay, while maximum-severity tasks had a cost of 500 per timestep. One simulated resource could complete a task of either type in 4 ticks, while all other resources required 6 ticks to complete a task; the expected lower bound on cost per task would thus be 500 $\times$ 4 $\times$ 0.1 $=$ 200 if minimum-severity tasks are ignored and all maximum-severity tasks are completed on the 4-tick resource without delay. The simulated dummy agent was granted 5 steps of perfect lookahead, and behaved as any other agent would when representing a task with severity equal to the highest severity in the lookahead window. This reduced scenario is ideal for a dummy agent, as there is a single preferential resource which will not always be occupied with maximum-severity tasks, and the minimum-severity tasks are inexpensive to delay when a resource is reserved.

Figure 2 shows the change in average cost7 with 5, 10, and 15 resources between using a dummy agent or not. Under these idealized conditions, adding a dummy agent allows the system to use its resources more efficiently, though not statistically significantly so (pairwise Welsh’s t-test, $n=$ 50, on each set of common parameter settings).

Unfortunately, it is the huge disparity in task priorities which allows this improvement in efficiency. Figure 3 shows the results of a similar simulation, differing only in the severities of incoming tasks: While 10% of tasks were still maximum (500) severity, the remaining 90% tasks were at medium (250) severity instead of minimum (1) severity. Once the baseline work available to the system is significant, the loss of system capacity due to the dummy agent accumulates over time and drives down system efficiency as the backlog of meaningful tasks grows. Similar results can be seen in Fig. 4 which uses a uniform distribution of task severities from 1 to 500. The distribution of available tasks can entirely decide the efficacy of this dummy agent.

Figure 3.

Benefits of using a dummy agent plotted against number of agents that enter the system (evenly distributed over the first 50 ticks, 90% medium-severity and 10% high-severity tasks). Smaller values are good. (All error bars show a 95% confidence interval. Figures are best viewed in colour.)

Figure 4.

Benefits of using a dummy agent plotted against number of agents that enter the system (evenly distributed over the first 50 ticks, tasks with uniformly distributed severity). Smaller values are good. (All error bars show a 95% confidence interval. Figures are best viewed in colour.)

A huge disparity in task priorities is necessary but not sufficient; the value of the dummy agent arises from the ability to respond to rare high-severity events. Figure 5 shows a simulation where tasks are evenly split between maximum (500) and minimum (1) severity rather than making maximum-severity tasks rare. A backlog of high-severity tasks leaves no available work for a dummy agent, as the high-severity tasks already in the system are assured to immediately receive the resources which finish performing work on other high-severity tasks. The resource constraint here works against the efficacy of the dummy agent, as the dummy agent can only provide value through reserving resources if there are resources available to reserve. When all resources are occupied by tasks of the same importance that the dummy agent represents, then the dummy agent cannot provide any benefit.

Figure 5.

Benefits of using a dummy agent plotted against number of agents that enter the system (evenly distributed over the first 50 ticks, 50% low-severity and 50% high-severity tasks). Smaller values are good. (All error bars show a 95% confidence interval. Figures are best viewed in colour.)

While there is reason to believe that the system will benefit from allowing resources to wait for more critical tasks, the practical benefit is difficult to provide when the environment permits preemption. The ability to revoke resources from tasks in progress reduces the opportunity cost to such a degree that only very specific scenarios permit improvement. Preemption is a valuable tool which should be exploited when available, but not all real-world scenarios permit preemption. To explore the full potential of the dummy agent, we will need to examine cases where preemption is difficult or impossible, resulting in irreversible allocations.

5.1 Validation: Revisiting dummy agents

In a system allowing preemption, there is some inherent inefficiency to the use of a dummy agent to reserve resources. A resource allocated to a dummy agent is incapable of performing useful work for the system, restricting the potential benefit; of course, as long as the dummy agent prevents low-priority tasks from wasting time by acquiring the resource and then losing it before completion, then there is a chance of benefit. Additional costs for preemption make this benefit easier to achieve, as preventing these switching costs justifies losing more work. However, this dynamic changes abruptly once allocations are irrevocable (as discussed in Section 4.2). As the dummy agent does not represent a real task, a resource can be allocated to the dummy agent without locking that resource to a task. This means that important tasks can immediately acquire a resource held in reserve by the dummy agent. If the resource had instead been allocated to a low-priority task, then an important task which arrives later is completely unable to access the resource until that low-priority task has completed.

Figure 6.

Benefits of using a dummy agent in the presence of switching costs (a), large switching costs (b), and irrevocable allocation (c), plotted against number of agents that enter the system (evenly distributed over the first 50 ticks). Smaller values are good. (All error bars show a 95% confidence interval. Figures are best viewed in colour.)

To construct a scenario where the dummy agent is likely to be important, assume the following: tasks require a sizable investment of time to complete, independent of their severity; critically-important tasks are rare enough to not usually create a backlog; and dummy agents have enough advance warning to reserve a slot if necessary. Figure 6a shows the results of a simulation run under these assumptions where every preemption carries an additional cost of 500 utility, thus requiring severity-500 tasks to provide some proof of benefit in order to preempt severity-1 tasks. Figure 6b shows a similar simulation with a switching cost of 1000 utility, making preemption even more difficult. Figure 6c shows the results of such a simulation where preemption is impossible. As expected, a single dummy agent makes more of a difference when there are fewer resources in the system, as it has the capability to reserve a larger fraction of the available resources. The impact is also much greater when preemption is impossible than when preemption is merely costly, and higher costs increase the potential benefit; while the 500-utility switching cost does not show a statistically significant difference between using a dummy agent or not (pairwise Welsh’s t-test, $n=$ 50, on each set of common parameter settings), the 1000-utility switching cost creates just enough of a difference to be significant, and the irrevocable scenario shows a clear improvement.

The assumption that all tasks require significant portions of time to complete is fairly reasonable. In many domains there are “maintenance” tasks that require a nontrivial amount of work and must be performed eventually but rarely pose a critical threat to performance. In a computing domain, disk defragmentation and system updates fit this role; in medical domains, consider cosmetic and reconstructive surgery as examples of uninterruptible low-priority tasks.

In domains where critical tasks are common enough to create a backlog, there is less value to be gained from reserving resources as the critical tasks will naturally acquire any resources that complete prior tasks. If a block of time is spent completing critical tasks back-to-back, the only improvement granted by the dummy agent exists at the beginning of this block where critical resources begin completion slightly earlier than they would otherwise. However, if critical tasks are rare then without a dummy agent it will be far more common to have all resources tied up with non-critical tasks whenever a critical task arrives, increasing the proportion of time spent waiting for non-critical tasks to complete. Removing this initial wait cost on multiple occasions produces the improvements visible in Figure 6c. Naturally, reserving resources is only useful when there is a reserved resource ready for a newly-arrived critical task. This is where knowledge of future arrivals can be useful.

6. Related work

The topic of multiagent resource allocation has been examined by a variety of researchers. Application areas include rich contexts such as manufacturing flow lines for small and medium sized enterprises [36, 37] and alleviating traffic congestion [38]. We will focus our discussion in this section on different approaches developed by artificial intelligence researchers, distinguishing those intended for competitive environments from ones aimed at enabling solutions in cooperative settings.

In this paper we imagine that there is a common aim among agents in the environment, for the allocation of tasks to have some overall benefit to the organization. In contrast, a host of researchers have focused their attention on examining theoretical properties of resource allocation solutions, specifically for competitive scenarios where agents are self-interested. Bachrach and Rosenschein [39] explore mechanisms with a central authority interacting through payment protocols in order to converge to an optimal solution, when the marginal production of each resource is diminishing. Roos and Rothe [40] are also interested in maximizing social welfare, discovering complexity results such as areas of intractability. Examining theoretical guarantees for optimal outcomes that minimize inequality is the focus of the work of Schneckenburger et al. [41]: the hurdles that arise in practice are discussed as well. One paper that begins to bridge the gap between competitive and cooperative contexts is that of Li and Soh [42]: this work emphasizes the value of communication and coordination between the agents, elaborating on the role of negotiation.

Our work aligns most closely with that of researchers who are focused on cooperative settings. Indeed, a primary motivation for our solution was the context of mass casualty incidents affecting emergency rooms at hospitals. We envision our system being especially applicable to medical scheduling problems of the sort considered by Paulussen et al. [22, 7], for example, those where a group of patients with varied conditions and health states need to be assigned to see a group of doctors in an order which minimizes the years of potential life lost. We are particularly interested in this approach because it uses a fully distributed method for the assignment of agents to resources, while also allowing for agents with great immediate need to acquire necessary resources (which may happen, for instance, with Mass Casualty Incidents).

One aim of our work is to improve upon the resource allocation solution developed by Paulussen et al. [22, 7]. Similar to Paulussen, we assume that each patient can be assigned an agent to negotiate on their behalf. But we move away from Paulussen’s proposal of providing agents with currency proportionate to their needs, as this can produce problems. For example, after several rounds, low priority patients may have amassed capital to acquire resources, even if global utility would be better served by treating the more critical patients first. We also seek to improve upon the conservative approach to allowing preemptions adopted by Paulussen et al. [7]. In particular, they utilize a worst case approximation of the opportunity cost of preemptions, and thus underestimate the value of the behaviours an agent would engage in following a preemption. It is assumed that cyclical preemptions would result in the last agent in the cycle (the “victim”) being left with no resource at all, and do not consider what actions it could take beyond a failed attempt to acquire an already-claimed resource. This will naturally cause the preempted agent to underestimate its ability to acquire a new resource, and thus overestimate the costs associated with yielding its current resource to the preempting agent. In our solution, we strive to reason with more accurate estimates of the costs.

Another paper by Chapman et al. [15] takes a different approach, where resources actively seek out tasks to perform. This approach allows preemption as well as cooperation, as no work is lost if resources switch to new tasks and multiple resources can expedite a single task. However, the passive nature of tasks prevents this approach from rapidly responding to new arrivals, and the assumption that resources are identical removes the problem of determining which resource is best suited to a given task (focusing instead on spatial constraints). Our work focuses on these problems. We note our approach is specifically designed to handle heterogeneous resources, in direct contrast with the distributed, dynamic multiagent proposals of other artificial intelligence researchers, which operate under the assumption of homogeneity, such as that of Parker and Gini [4].8

7. Conclusions and future work

In this paper, we presented a novel approach to dynamic resource allocation when task switching is costly or impossible, using a “dummy agent” to reserve resources that might be useful later, if new urgent tasks arrive in the system. We showed that this simple addition can significantly improve performance, while maintaining the appealing properties of the underlying dynamic resource allocation system from [29, 44], such as its lack of currency, decentralized nature, and adaptability. This represents an important improvement in the range of situations the system can handle, significantly improving its real-world applicability. This research therefore sheds light on how multiagent systems solutions can be designed to enable effective teamwork and collaboration in challenging, dynamic environments.

Addressing allocation of resources in critical medical scenarios such as mass casualty incidents certainly requires being careful when committing a resource to a task (due to being unable to interrupt and reassign). Our work has demonstrated the potential benefits introduced by allowing for reservations on some of the most valuable resources; further, since we integrate into our solution a modelling of bother (which can equate as well with fatigue, for humans), we take into account true real-world concerns, when determining when to initiate the social interactions that will end up providing effective allocation solutions.

In our treatment of preemptions, we have examined more closely how to enable future planning for resources which may be needed, but where the preemption act is not yet imminent. This has been achieved through our concept of reservations and we have worked out in detail how the agents can reason appropriately in order to enable the resource allocation to proceed effectively. Introducing dummy agents proves to be particularly valuable. Central to our overall solution we have as well the dual threads of planning and learning: we are sensitive to dynamic changes in churn and congestion. Communication between agents is also essential for our coordination to succeed: the elements required for effective planning within the system are shared and thus can influence the decision making process.

Multiagent resource allocation has been proposed for a host of other social situations. Natural disaster scenarios are the focus of the work of [45], where spatial and communication constraints need to be addressed when determining how to perform the allocation. Supporting consumers in service selection for computing resources is the backdrop of the effort by [46], which promotes a distributed solution to cope with dynamic change, in the face of imperfect information. Enabling interaction between robots and humans motivates the research being done by a host of researchers (e.g. [47, 4, 48]), with approaches that introduce sequential auctions to negotiate for optimal solutions, support long term coalitions sharing utilities with other teams, and address tolerance of failures through a centralized lookahead. What is not considered in these solution frameworks is addressing the overarching concern of committing a task to a critical resource, leaving that resource unavailable for a more demanding task that arrives dynamically soon after, a central focus of the work we present in this paper. For future work, we could imagine examining the applications mentioned above more closely, possibly integrating into our proposed solution new methods for learning and planning which are ideally suited for these other social scenarios. It may also be useful to examine our research as part of a larger effort to enable teamwork in health care settings, considering related work such as [49].

There are several clear extensions for our research. First of all, because the existing algorithms are fairly computationally efficient (capable of resolving a system with 5000 tasks and 100 resources during one test), future improvements can freely focus on adding more detail to the model to improve the utility of allocations. For example, we could enhance the learning and decision making capabilities of the dummy agents, or of the task agents with more computationally complex approaches.

The “dummy agent” as presented in this paper has room for expansion. The algorithms provided rely on oracular knowledge of incoming tasks, which is justifiable in some domains for a short horizon (provided there is a gap between becoming aware of a necessary task and becoming capable of addressing the task). There are clear paths for advancement here: First, this limited knowledge of the future can be harnessed more fully in an attempt to derive benefit in more general scenarios than the selection provided; second, the oracular assumption can be dropped in favour of a predictive approach which uses prior knowledge to anticipate future task arrivals and take action based on imperfect information. In this latter case, it would be important to solve the problem of selecting the optimal resource to reserve in cases where the specific type of expected task is uncertain, as an ideal algorithm would reserve resources that are suited to incoming tasks. Another potential expansion would be the introduction of multiple dummy agents, thus allowing the system to keep more than one resource reserved; these dummy agents would need to coordinate with each other in order to avoid reserving more resources than there are expected task arrivals.

Footnotes

For a more detailed view of this system and its validation in comparison with competitors see [].

We assume human resources, which might decline to fulfill frivolous requests when their BSF is high.

Per our assumption of human resources, time ticks in plans are identical to time ticks used to measure task completion (i.e. resource requests are processed on the same timescale as task completion).

These parameters are presented generically for completeness, but we use fixed values in our experiments.

For the implementation in this Section, assume that $c=c^{\prime}=$ 1 and $\alpha=\alpha^{\prime}=$ 0.9 as is the case in []. These parameters are defined in Section 3.4.

Note that these results are consistent with previously published results for a similar framework [].

Cost is simply a negative utility incurred by delays in processing. Costs are averaged over all tasks, over all ticks.

Note however, that heterogeneous solutions have been offered for some specific problem domains, like grid computation scheduling [].

Acknowledgments

The authors gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada.

Authors’ Bios

	Graham Pinhey completed his M.Math in Artificial Intelligence (2014) at the University of Waterloo in Waterloo, ON, Canada. He received an NSERC Alexander Graham Bell Master’s Canadian Graduate Scholarship in 2013 to fund work on his Masters thesis on using reservations to improve handling of dynamic task arrivals in multiagent systems.
	John A. Doucette is an Assistant Professor in the Division of Natural Science at New College of Florida, in Sarasota, FL, USA. His doctoral work received thesis prizes from the Canadian Artificial Intelligence Association the University of Waterloo’s Faculty of Mathematics in 2016. His research interests include topics in machine learning, computational social choice, multiagent coordination, and privacy enhancing technologies.
	Robin Cohen is a Professor in the David R. Cheriton School of Computer Science at the University of Waterloo in Waterloo, Ontario, Canada where she has been a faculty member since 1984. She served as Associate Dean of Research in the Faculty of Mathematics at the University of Waterloo from 2011–2014 and has held a number of senior administrative positions at the university. Prof. Cohen has been an active member of the AAMAS and Artificial Intelligence communities since their inception. She became a Senior Member of AAAI in 2014. She is also a founding member of the User Modeling community. Her current research interests include trust modeling in multiagent systems and coordination in cooperative multiagent systems.

References

Lai

Rasmusson

Adar

Zhang

and Huberman

B.A.

, Tycoon: An implementation of a distributed, market-based resource allocation system, Multiagent and Grid Systems 1(3) (2005), 169–182.

Chevaleyre

Dunne

P.E.

Endriss

Lang

Lemaitre

Maudet

Padget

Phelps

Rodriguez-Aguilar

J.A.

and Sousa

, Issues in multiagent resource allocation, Informatica 30(1) (2006), 3–31.

Sandholm

, Contract types for satisficing task allocation, in: AAAI Spring Symposium Series: Satisficing Models, March 23–25, 1998, pp. 68–75.

Parker

and Gini

, Tasks with cost growing over time and agent reallocation delays, in: Proceedings of AAMAS 2014, May 5–9, 2014, pp. 381–388.

Edwin

and Cox

M.T.

, Resource coordination in single agent and multiagent systems, in: Proceedings of the 13th IEEE conference on Tools with Artificial Intelligence, November 7–9, 2001, pp. 18–24.

Vermeulen

Bohte

Somefun

and Poutré

, Multi-agent pareto appointment exchanging in hospital patient scheduling, Service Oriented Computing and Applications 1(3) (2007), 185–196.

Paulussen

T.O.

Zöller

Heinzl

Pokahr

Braubach

and Lamersdorf

, Dynamic patient scheduling in hospitals, in: Proceedings of Multi-Konferenz Wirtschaftsinformatik, Agent Technology in Business Applications Track, March 9–11, 2004, pp. 149–174.

Einav

Feigenberg

Weissman

Zaichik

Caspi

Kotler

and Freund

H.R.

, Evacuation priorities in mass casualty terror-related events: Implications for contingency planning, Annals of Surgery 239(3) (2004), 304–310.

Zhang

and Vasilakos

A.V.

, A survey of self-organization mechanisms in multiagent systems, IEEE Transactions on Systems, Man, and Cybernetics: Systems 47(3) (2017), 441–461.

10.

Shehory

and Kraus

, Methods for task allocation via agent coalition formation, Artificial intelligence, 101(1–2) (1998), 165–200.

11.

Zheng

and Koenig

, Reaction functions for task allocation to cooperative agents, in: Proceedings of AAMAS 2008, May 12–16, 2008, pp. 559–566.

12.

Scerri

Pynadath

D.V.

and Tambe

, Towards adjustable autonomy for the real world, Journal of Artificial Intelligence Research 17(1) (2002), 171–229.

13.

Abdallah

and Lesser

, Learning the task allocation game, in: Proceedings of AAMAS 2006, May 8–12, 2006, pp. 850–857.

14.

De Weerdt

Zhang

and Klos

, Distributed task allocation in social networks, in: Proceedings of AAMAS 2007, May 14–18, 2007, pp. 76–84.

15.

Chapman

A.C

Micillo

R.A.

Kota

and Jennings

N.R.

, Decentralised dynamic task allocation: A practical game: Theoretic approach, in: Proceedings of AAMAS 2009, May 10–15, 2009, pp. 915–922.

16.

Wang

and Liu

, Ant colony optimization for task allocation in multi-agent systems, China Communications 10(3) (2013), 125–132.

17.

Macarthur

K.S.

Stranders

Ramchurn

S.D.

and Jennings

N.R.

, A distributed anytime algorithm for dynamic task allocation in multi-agent systems, in: Proceedings of AAAI 2011, August 7–11, 2011, pp. 701–706.

18.

Dos Santos and Bazzan

A.L.C.

, Distributed clustering for group formation and task allocation in multiagent systems: A swarm intelligence approach, Applied Soft Computing 12(8) (2012), 2123–2131.

19.

Sousa

Ramos

and Neves

, The fabricare scheduling prototype suite: Agent interaction and knowledge base, Journal of Intelligent Manufacturing 14(5) (2003), 441–455.

20.

Wei

Vasilakos

A.V.

Zheng

and Xiong

, A game-theoretic method of fair resource allocation for cloud computing services, The Journal of Supercomputing 54(2) (2010), 252–269.

21.

Feldmann

Gairing

Lücking

Monien

and Rode

, Selfish routing in non-cooperative networks: A survey, in: Proceedings of MFCS 2003, August 28–September 1, 2003, pp. 21–45.

22.

Paulussen

T.O.

Jennings

N.R.

Decker

K.S.

and Heinzl

, Distributed patient scheduling in hospitals, in: Proceedings of IJCAI 2003, August 9–15, 2003, pp. 1224–1232.

23.

Hajaj

Dickerson

J.P.

Hassidim

Sandholm

and Sarne

, Strategy-proof and efficient kidney exchange using a credit mechanism, in: Proceedings of AAAI 2015, January 25–30, 2015, pp. 921–928.

24.

Cramton

Shoham

and Steinberg

, An overview of combinatorial auctions, ACM SIGecom Exchanges 7(1) (2007), 3–14.

25.

Krishna

, Auction Theory, Academic Press, 2009.

26.

Wurman

P.R.

Wellman

M.P.

and Walsh

W.E.

, A parametrization of the auction design space, Games and Economic Behavior, 35(1–2) (2001), 304–338.

27.

Krysta

Manlove

Rastegari

and Zhang

, Size versus truthfulness in the house allocation problem, in: Proceedings of ACM EC 2014, June 8–12, 2014, pp. 453–470.

28.

Fatima

S.S.

and Wooldridge

, Adaptive task resources allocation in multi-agent systems, in: Proceedings of AGENTS 2001, May 28–June 01, 2001, pp. 537–544.

29.

Doucette

J.A.

Pinhey

and Cohen

, Multiagent resource allocation for dynamic task arrivals with preemption, ACM Transactions on Intelligent Systems and Technology, 8(1) (2016), 3.

30.

Schlegel

and Kowalczyk

, Towards self-organising agent-based resource allocation in a multi-server environment, in: Proceedings of AAMAS 2007, May 14–18, 2007, pp. 74–81.

31.

Lesser

and Sim

K.M.

, Strategic agents for multi-resource negotiation, Autonomous Agents and Multi-Agent Systems 23(1) (2011), 114–153.

32.

Cheng

M.Y.K.

and Cohen

, A hybrid transfer of control model for adjustable autonomy multiagent systems, in: Proceedings of AAMAS 2005, New York, NY, USA, July 25–29, 2005, ACM, pp. 1149–1150.

33.

Sandhlom

T.W.

and Lesser

V.R.T.

, Coalitions among computationally bounded agents, Artificial Intelligence, 94(1–2) (1997), 99–137.

34.

Cohen

Cheng

M.Y.K.

and Fleming

M.W.

, Why bother about bother: Is it worth it to ask the user? in: Proceedings of the AAAI Fall Symposium on Mixed-Initiative Problem-Solving Assistants, November 4–6, 2005, pp. 6.

35.

Scerri

Pynadath

D.V.

and Tambe

, Why the elf acted autonomously: Towards a theory of adjustable autonomy, in: Proceedings of AAMAS 2002, July 15–16, 2002, pp. 857–864.

36.

Barenji

A.V.

Barenji

R.V.

Roudi

and Hashemipour

, A dynamic multi-agent-based scheduling approach for smes, The International Journal of Advanced Manufacturing Technology, 89(9–12) (2017), 3123–3137.

37.

Barenji

A.V.

and Barenji

R.V.

, Improving multi-agent manufacturing control system by indirect communication based on ant agents, Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering 231(6) (2017), 447–458.

38.

Desai

Loke

S.W.

and Desai

, Cooperative vehicles for robust traffic congestion reduction: An analysis based on algorithmic, environmental and agent behavioral factors, PLoS One, 12(8) (2017), e0182621.

39.

Bachrach

and Rosenschein

J.S.

, Coalitional skill games, in: Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems-Volume 2, International Foundation for Autonomous Agents and Multiagent Systems, 2008, pp. 1023–1030.

40.

Roos

and Rothe

, Complexity of social welfare optimization in multiagent resource allocation, in: Proceedings of AAMAS 2010, May 10–14, 2010, pp. 641–648.

41.

Schneckenburger

Dorn

and Endriss

, The atkinson inequality index in multiagent resource allocation, in: Proceedings of AAMAS 2017, May 8–12, 2017, pp. 272–280.

42.

and Soh

L.-K.

, Hybrid negotiation for resource coordination in multiagent systems, Web Intelligence and Agent Systems: An International Journal 3(4) (2005), 231–259.

43.

Lee

, Resource Allocation and Scheduling in Heterogeneous Cloud Environments, University of California, Berkeley, 2012.

44.

Pinhey

Doucette

J.A.

and Cohen

, Distributed multiagent resource allocation with adaptive preemption for dynamic tasks, in: Proceedings of AAMAS 2014, May 5–9, 2014, pp. 1441–1442.

45.

Zhang

and Bai

, A dynamic coordination approach for task allocation in disaster environments under spatial and communicational constraints, in: Proceedings of MIPC 2014, July 28, 2014, pp. 35–41.

46.

Jacyno

Bullock

Payne

and Luck

, Understanding decentralised control of resource allocation in a minimal multi-agent system, in: Proceedings of AAMAS 2007, May 14–18, 2007, pp. 1243–1245.

47.

Heap

B.G.J.

and Pagnucco

, Repeated sequential auctions with dynamic task clusters, in: Proceedings of AAAI 2012, July 22–26, 2012, pp. 1997–2002.

48.

Tsalatsanis

Yalcin

and Valavanis

K.P.

, Dynamic task allocation in cooperative robot teams, International Journal of Advanced Robotic Systems 6(4) (2009), 309–318.

49.

Amir

Grosz

Stern

and Sanders

, Information sharing for care coordination, in: Proceedings of HAIDM 2013, May 7, 2013, pp. 82–88.