Abstract
Interaction protocols are commonly used in agent-based systems. They ensure good coordination between agents by proposing a specific message exchange pattern. However, these interaction protocols are not perfect; they need further extensions to offer, among other things, better performance and scalability, mainly when tight deadlines are involved. In this case, participants often fail to answer some requests before their deadlines due to overload, bottlenecks, slow networks, or being busy or blocked. Designing agents without considering this issue may decrease their sociability, which wastes valuable chances to reach the best outcomes. The proposed approach uses the participant's experience to train supervised learning models that predict whether replies will reach initiators before their deadlines, thereby enabling a prioritization mechanism for handling interaction requests more effectively. The proposed approach has been evaluated using multiple Contract Net interaction scenarios from two case studies on the JADE platform. The promising results show a significant increase in agents’ sociability, measured by a new metric that we propose called Sociability Degree via Interaction Protocols (SDIP); this increase was maintained even when systems scale up in terms of the number of agents and initiated interactions.
Keywords
Introduction
The agent-oriented paradigm offers a powerful way to solve problems that cannot be handled by monolithic systems. Developing multiagent systems (MAS) has attracted many researchers in various domains. Consequently, several multiagent systems development methodologies and platforms have been proposed in the literature1,2 to facilitate and support their design and implementation. However, maintenance support for these systems has not been sufficiently addressed.3,4
Every system has to be maintained at some point, whether for corrective, perfective, preventive or adaptive purposes. Software maintenance and evolution in general is still a hard, complex and costly phase of the agent-based application life cycle. 5 However, it is more economical to maintain current code than to substitute it with new code.6-9 Enhancing multiagent systems 10 has attracted many researchers, even recently, who focus on changing agents’ architectures, protocols 11 and organizations or on improving agent properties like deliberation,12-14 cooperation, 15 matchmaking, 16 communication, 17 interaction protocols, 18 reactivity, 19 etc.
In multiagent systems, agents interact to solve problems. For better interactions, a variety of protocols are usually used. These interaction protocols have limitations that appear under different circumstances, whether because of an increased number of agents,20-23 a slow or overloaded network, or blocked or busy agent states.
When an agent receives multiple interaction requests in a short time, it may not be able to answer some of them before their specified deadlines. By default, the agent does not have any response policy that prioritizes certain messages over others. This means that the requests are served in the order in which they are received (FIFO), which can cause many interaction failures. 24
The proposed approach for enhancing the agent's sociability through interaction protocols consists of ordering the received requests according to their deadlines. Participants use supervised learning models, trained on past data, to predict whether a received request can be answered before its deadline. The requests can therefore be classified into three categories: (i) requests without deadlines, (ii) requests with deadlines which are predicted to be failed interactions, and (iii) requests with deadlines which are predicted to be successful interactions. The suggested solution has been validated with several scenarios on two case studies under JADE. 25 The results are very promising and show a considerable enhancement of agent sociability, as measured by the Sociability Degree via Interaction Protocols (SDIP) metric that we propose.
The remainder of this paper is structured as follows. In section 2, a brief overview of major related work is given. The problem statement and the proposed approach to enhance agent sociability are introduced in section 3. Section 4 validates and discusses the proposed approach. Section 5 gives some conclusions and future work directions.
Literature review
The evaluation and enhancement of multiagent systems has been the topic of much research. Recent works have proposed methodologies and approaches to evaluate them,26-28 while others proposed approaches to enhance some of their properties.29-35 Integrating machine learning algorithms into these systems to enhance agents has also been proposed recently.36-42 Unfortunately, none of these works addresses the agents’ sociability.
This study is about enhancing the agents’ sociability when it is implemented using interaction protocols. This exact topic has received little attention. Therefore, to remain within the scope of this study, only research related to enhancing interaction protocols in terms of their performance and scalability is examined.
In 18, the authors proposed a semi-centralized protocol, which uses a dedicated central agent with specific roles. The proposed protocol extends the well-known Contract Net Protocol (CNP) to support efficient and trusted task negotiation in multiagent systems with low communication overhead. They introduced a new agent that plays the role of the system manager. Initiators ask the manager for a list of participants with good reputations only, to reduce the possibility of failure while carrying out the assigned task. Their proposed protocol shows significant improvement in time and communication overhead under various conditions. This enhancement was not evaluated in a scalability study; therefore, their proposal needs to be verified with large scale systems, especially since the centralized manager agent can become a bottleneck.
In 43, the authors proposed and implemented an updated version of CNP, called updated-CNP, in order to save the overhead of restarting the CNP execution between a set of agents willing to coordinate with each other. They added an additional step to the conventional CNP. As long as the final report has not yet been sent, changes are allowed during the interaction. They validated their approach using a predator-prey case study implemented with the JADE and JATLite platforms. The updated-CNP results were compared with the conventional ones using three parameters: updated tasks, task repetitions and communication overhead. Performance was enhanced in terms of reaching agreements when modifications are requested during the interaction, without repeating the whole process.
In 44, the authors proposed an improvement to the Contract Net with Confirmation Protocol (CNCP). A procedure was developed for determining the participant's bidding threshold. The authors specified a general description formula for the participants’ DoA (Degree of Availability) and its weight coefficient, with an enhanced function for bid evaluation. The results of their simulations show that the proposed improvement can effectively enhance the performance of CNCP in large scale multiagent systems.
To make Contract Net scalable in large grid resource allocation, the authors of 45 suggested two strategies: (i) to handle an increased number of resources, a queue-length-based announcement strategy; and (ii) to handle an increased number of users, a decentralized broker system in which the brokers communicate with each other. The two strategies are combined to show that the resulting strategy can handle both a large number of users and a large number of resources. They presented detailed simulation results to show that their approach significantly improves the scalability of the system.
To solve the Contract Net protocol's scalability issue, the authors in 46 suggested an instance-based learning (IBL) system that uses previously stored instances to pick a target agent, thereby avoiding the expensive bidding task. The strategy was applied in a virtual centralized healthcare network that uses the Contract Net protocol for the distribution of services across hospitals. Experimental tests show that performance was substantially increased with the use of IBL. With regard to the number of tasks, the system is more scalable.
The authors in 24 extended the Contract Net protocol to: (i) allow parallel negotiations, (ii) optimize negotiation time, (iii) reduce the contractors’ decommitment situations, (iv) detect failures and prevent negotiations with blocked agents. The first CNP limitation they describe arises when concurrent CFPs are received, since these are answered sequentially. Combined with the CFPs' deadlines and a possibly blocked participant, this causes some contracts to be lost. The authors introduced new communication primitives to the Contract Net protocol in order to create interaction states which prevent the agent from facing faulty situations. The proposed approach was evaluated using four experimental groups, each with two sub-experiments with different numbers of initiators and participants. The performance metric used is the negotiation time, which was indeed reduced by the proposed approach. Their work addresses the task allocation problem, where participants propose their availability times to initiators. Their availability can change depending on the response of the initiators and the pre-scheduled tasks. The context of our study is much wider than theirs, and addresses any interaction protocol which has deadline constraints in large scale systems.
In 47, the authors proposed a formal model which can handle temporal aspects in a CNP interaction. In their study, CFPs’ deadlines are taken into consideration in order to prevent failures during interactions, using Timed Colored Petri Nets to model the CNP process. The authors distinguished three possible situations for the arrival of proposals: (1) all the proposals are received before the deadline, (2) some of the proposals are received after the deadline, and (3) all the proposals are received after the deadline. According to these situations, initiators and participants can change their states by canceling and exiting the interaction to avoid blocking situations. Along with 24 and 45, this study is one of the few that focus on the temporal aspect of the CNP. Unfortunately, the authors did not aim to increase the number of proposals received before the deadlines, but only to manage agents’ states.
The works presented above have not investigated the sociability property of agents during their interactions. To our knowledge, this paper is the first to investigate the problem and to show the capability of a participant agent to respond to several incoming Contract Net interaction requests (initiations) before their deadlines. Based on previous interactions, the participant is able to prioritize the received initiations using machine learning algorithms to maximize the number of interactions which are successfully established.
Table 1 provides a synthesized overview of the literature review, structured around ten evaluation criteria aimed at facilitating a deeper understanding and comparison of the various studies:
(1) Does the study consider CFP deadlines by analyzing potential consequences of deadline violations and suggesting preventive measures?
(2) Does the proposed enhancement require changes in the structure of the original Contract-Net protocol?
(3) Can the participant participate in multiple interactions simultaneously in the study?
(4) Does the participant receive a large number of CFPs in the study?
(5) Does the proposed approach increase or decrease the number of participants for a given CFP in order to enhance the CNP?
(6) Does the proposed approach increase or decrease the chance to obtain a better proposal?
(7) Does the proposed approach have mechanisms to manage blocked or failure situations during the interaction?
(8) Does the proposed approach allow the initiator to avoid repeating the task announcement if it fails to reach an agreement?
(9) Does the proposed approach reduce the CNP interaction time?
(10) Does the proposed approach increase the agent's sociability by regaining the initiations lost due to CFP deadlines or other causes?
Synthesized overview of the literature review.
This section introduces the problem of using a basic FIFO strategy to handle interaction initiations in multiagent systems, which often leads to delayed or failed responses due to deadline constraints. To address this, a machine learning-based approach is proposed to prioritize interaction initiations based on their probability of success. The goal is to enhance agent sociability by allowing agents to intelligently select and respond to initiations that are more likely to result in successful interactions.
Problem statement
Multiagent systems offer distributed solutions to various problems, in which agents can accomplish their goals by coordinating and interacting through interaction protocols such as Contract Net (CNP), Request, etc. Conventional interaction protocols are not perfect and need to be extended to better handle some system situations. 48
Initiators can send initiations, such as requests or calls for proposals (CFPs), to a number of participants. These initiations may be bound to a limited time called the deadline, which is the waiting time before the evaluation of responses starts. Participants must answer before the deadline is reached, but sometimes they do not have enough time to do so. Initiations can be delayed due to network problems, or can remain in the participant's message queue for a long time if the participant is busy or blocked. 24
Participants respond to initiations in the order in which they are received. As a result, they may serve an initiation without a deadline, one whose deadline has expired, or one whose deadline will expire before the answer reaches the initiator. In all these situations, valuable response time is wasted that could have been used to serve other initiations that still have enough time. Both initiators and participants thus lose possible interactions which might have given the best offers (Figure 1).

Failed and succeeded interactions between initiators and participants.
A possible interaction scenario is presented in Figure 2. A participant agent receives four CFPs from different initiators to initiate their interactions. The participant can participate in one or multiple interactions simultaneously. In both cases, it cannot prepare several proposals at the same time.

A possible interaction scenario with decreased agent sociability.
The CFPs are received at different times with different deadline constraints. The participant was unavailable during the reception of the first three CFPs, which decreases the remaining times for CFP_In2 and CFP_In3, whereas CFP_In1 has no time limit. CFP_In4 arrives last, during proposal creation for Initiator1, so it waits until the participant answers all the previous messages.
In the presented scenario, we suppose that there is no network delay and that the CFPs are delivered quickly to the participant, which can prepare one proposal in about 2 s. The remaining time to the deadline for each CFP runs from the moment the participant starts to serve it until the deadline is reached. Therefore, the longer a CFP waits in the queue, the lower its chances of being answered before the deadline and of initiating a successful interaction. An increased number of CFPs affects the others as well, because more proposing time for some CFPs means more waiting time for others. If the network is slow, it has already consumed part of the time before the deadline, which makes the situation even worse.
The participant in the presented scenario was able to answer the first CFP, CFP_In1, which does not have any deadline constraint. This increased the waiting time of CFP_In3 from 4 s (while the participant was unavailable) to 6 s (after preparing a proposal for the previous CFP); adding 2 s to prepare its own proposal makes it 8 s. Even if the delivery time is ignored, the deadline is reached at that moment, so this interaction may or may not succeed. For the remaining CFPs, it was too late for the participant to respond, because their waiting time had exceeded the deadline.
In such scenarios, it is impossible to answer all the CFPs in time, but managing them beforehand can increase the chance of initiating interactions, which means more successful interactions and better agent sociability.
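The way waiting time accumulates under FIFO can be illustrated with a minimal Python sketch. The CFP names, arrival times, deadlines and the fixed 2 s proposing time below are hypothetical values chosen for illustration and are not the values of Figure 2; network delay is assumed to be zero, as in the scenario above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CFP:
    name: str
    arrival: float             # seconds, when the CFP enters the queue
    deadline: Optional[float]  # absolute deadline in seconds; None = no deadline


def serve_fifo(cfps: List[CFP], available_at: float,
               proposing_time: float) -> List[Tuple[str, float, bool]]:
    """Serve CFPs in arrival order; report (name, finish time, met deadline)."""
    clock = available_at
    results = []
    for cfp in sorted(cfps, key=lambda c: c.arrival):
        start = max(clock, cfp.arrival)   # wait for the participant or for the CFP
        finish = start + proposing_time   # only one proposal is prepared at a time
        in_time = cfp.deadline is None or finish <= cfp.deadline
        results.append((cfp.name, finish, in_time))
        clock = finish
    return results


# Hypothetical workload: the participant is unavailable until t = 4 s.
queue = [
    CFP("A", arrival=0.0, deadline=None),
    CFP("B", arrival=1.0, deadline=5.0),
    CFP("C", arrival=2.0, deadline=9.0),
]
for name, finish, in_time in serve_fifo(queue, available_at=4.0, proposing_time=2.0):
    print(name, finish, "in time" if in_time else "late")
```

With these assumed values, serving the deadline-free CFP "A" first pushes the reply to "C" past its deadline, although "C" could have been answered in time had it been served first.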
Many tools and methodologies have been proposed in the literature to support the development of multiagent systems. However, maintenance of existing multiagent systems, which is an important research area, is not explored enough.3,19,49
The proposed approach enhances agent sociability through interaction protocols and machine learning algorithms. While its message queue is not empty, it allows the participant agent to avoid responding to three kinds of initiation: (i) an initiation with no deadline, (ii) an initiation with an expired deadline, or (iii) an initiation whose deadline will expire before the response reaches the initiator.
The goal of the proposed approach is to prioritize the received interaction requests based on their probability of success (Figure 3). According to the initiation types (CFPs), three priority levels have been proposed.

CFP message handling in the participant agent side using the proposed approach.
At first, two types of CFPs can be categorized in the agent default message queue (MsgQ1): (i) CFPs with deadlines, called “CFPD”, and (ii) CFPs without deadlines, called “CFPND”.
To predict if a CFPD has a chance to be replied before its deadline or not, the participant agent must save the history of its interactions. The saved data are used as training set for several machine learning algorithms. Next, these algorithms are validated to be able to select the best model for the prediction process (enhanced mode). While the interaction is in the enhanced mode, the participant keeps registering its history to retrain the algorithms afterwards (Figure 4).

Global overview of the proposed approach for continuous dataset updating.
The proposed approach consists of predicting if the current interaction initiation is going to be successful or not based on three attributes: Remaining_time, Reply_time and a class attribute called outcome.
The first attribute is the Remaining_time (Figure 5), which is the difference between the deadline time and the message's read time (when it is extracted from the message queue), as calculated in equation (1). If the system under enhancement is distributed over different machines, then these machines must have synchronized clocks, which can easily be achieved using a single time server.

The proposed attributes.
The second attribute is the Reply_time (Figure 5), which is the sum of two times (equation (2)): (a) the proposing time, which is the time necessary to prepare a single proposal. It can vary because it depends on the executed task and the environment used. Therefore, the mean of the proposing times is recalculated after each proposal and used as the proposing time. (b) The delivery or transmission time, which is also estimated. It represents the time necessary to deliver the proposal to the initiator. If the MAS platform does not record the delivery time, as in the case of JADE, it can be obtained as the difference between the time at which a message is posted in the participant's message queue and the time at which it was sent by the initiator, the latter being embedded in the sent CFP. For real distributed MAS over LAN or WAN, a synchronized clock is required to obtain accurate timestamps in UTC (Coordinated Universal Time).
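The two numeric attributes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and class names are hypothetical, all timestamps are assumed to be expressed in the same unit on synchronized clocks, and the incremental mean is one possible way to keep the mean proposing time up to date.

```python
def remaining_time(deadline_ts: float, read_ts: float) -> float:
    """Time left when the CFP is read from the queue: deadline minus read time."""
    return deadline_ts - read_ts


def delivery_time(post_ts: float, send_ts: float) -> float:
    """Estimated transmission time: queue posting time minus the embedded send time."""
    return post_ts - send_ts


class RunningMean:
    """Mean of the observed proposing times, updated after each proposal."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def reply_time(mean_proposing: float, delivery: float) -> float:
    """Estimated reply time: mean proposing time plus estimated delivery time."""
    return mean_proposing + delivery


# Hypothetical values, in seconds.
proposing = RunningMean()
proposing.update(2.0)
proposing.update(4.0)
print(remaining_time(deadline_ts=10.0, read_ts=6.0))
print(reply_time(proposing.mean, delivery_time(post_ts=1.5, send_ts=1.0)))
```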
For distributed MAS over WAN, where multiple network types can be deployed (GSM, satellite, wired, etc.) and networks may suffer from perturbations, additional attributes that describe the network state more precisely can be beneficial. Using the external IP address and the initiator's AID, many possible new features can be obtained via an API, such as container name, country, location, distance, ISP, network type, etc. Feature selection is therefore what adapts the proposed approach to WAN-distributed MAS with variable network capabilities.
In this study, many concerns already have to be combined in order to validate the proposed approach. Consequently, only local interactions are considered; the Reply_time attribute will therefore not be as effective as the Remaining_time, unless the participant offers different services with different proposing times.
The third attribute is the class named outcome initialized with “0”. When the initiator responds back to the participant's proposal with an accept or a reject, this means that the interaction was successful and the outcome attribute gets updated with “1”.
The training set is collected during a normal interaction (Algorithm 1) where only initiations with deadline constraints are committed at this stage (Filtering step). For every initiation, the participant creates a data instance containing the calculated remaining time, reply time and the outcome of the received initiation (Figure 6). When the desired training set size is achieved then the participant starts the training phase.

The data collection phase of the participant agent.
Depending on the executed scenario and the stress of the MAS in terms of CFP frequencies, the number of collected instances needed to trigger the training phase (“n”) may not yield a high-performing model at this initial stage with normal interaction, because the dataset will inevitably exhibit a large class imbalance in which the failure class dominates. This imbalance results from the FIFO policy in queue management on the participant side. Therefore, we either have to collect data with less stressed scenarios with reasonable CFP frequencies, or allow continuous dataset updating and model retraining during the enhanced mode. Even if the agent starts with a model of moderate performance, it continues to collect data, updates the dataset with more successful instances, and retrains the model. This iterative process progressively enhances the model's performance over time. Either way, the circumstances of the scenarios are replicated for the normal interaction and the proposed approach.
The ARFF file in
The training phase is presented in Algorithm 2, where the dataset is used to train and validate several machine learning algorithms (binary classifiers). The classifiers used are voted perceptron, 51 logistic regression, 52 multilayer perceptron, stochastic gradient descent (SGD), 53 simple logistic,54,55 random forest, 56 random tree, 57 support vector machine using sequential minimal optimization (SMO),58,59 naïve Bayes 60 and Bayes networks; 61 all of them are validated using the 10 fold cross-validation method. The participant selects and saves the best model to be used later for predicting interactions’ outcomes.
The 10 fold cross-validation technique is used to obtain an estimate of the classifier's performance, by exploiting the entire dataset. This is achieved by doing several test iterations on different training and test sets, then averaging the results. The validation consists of randomly segmenting the initial data set into 10 disjoint subsets numbered from 1 to 10. At every test's iteration, only one subset is used as a test set while the other 9 subsets are used as a training set.
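The fold construction described above can be sketched as follows. This is a generic illustration in Python, not the implementation used in the study; the function names and the `evaluate` callback are hypothetical, and the sketch does not attempt to reproduce how Weka builds its folds internally.

```python
import random
from typing import Callable, List


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Randomly split the indices 0..n-1 into k disjoint folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n: int, k: int,
                   evaluate: Callable[[List[int], List[int]], float]) -> float:
    """Average the score returned by evaluate(train_idx, test_idx) over the k folds."""
    folds = k_fold_indices(n, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # The other k-1 folds together form the training set of this iteration.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k


# Hypothetical usage: report the training-set size seen in each iteration.
print(cross_validate(1000, 10, lambda train, test: float(len(train))))
```

In practice, `evaluate` would train a classifier on the training indices and return its score on the test indices.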
The enhancement phase
Algorithm 3 implements the sociability enhancement process. As mentioned previously, two types of CFP messages can be received by the participant: CFPD (with deadlines) and CFPND (without deadlines).
When a CFPD is received, the participant calculates the following: the remaining_time, by subtracting the current timestamp from the CFP's deadline timestamp; the delivery_time, as the difference between the CFP's post timestamp in the queue and its original send timestamp; and the reply_time, estimated as the sum of the meanProposingTime and the delivery_time.
The participant predicts the outcome of this CFPD using the Predict() function with the computed remaining_time, reply_time and the saved model. Each interaction is stored in CFP_Map by calling SaveCFP(), which records the CFP key, remaining_time, reply_time, the predictedOutcome, and an initial realOutcome set to 0.
If predictedOutcome equals 1, the participant starts the interaction immediately by calling StartInteractionProtocol(). Once the proposal is created (via create_proposal()), the actual proposal time is used to update the meanProposingTime dynamically, reflecting recent interaction speeds. If the initiator responds (checked via initiator_responded()), the realOutcome is updated to 1 using UpdateRealOutcome().
If the predictedOutcome is 0, the CFPD is postponed by enqueuing it into MsgQ2 (the postponed CFPs with deadlines queue). Any CFPND (without deadline) is directly enqueued into MsgQ3.
When the main queue MsgQ1 becomes empty, the participant starts handling postponed CFPDs from MsgQ2. Before responding, it verifies whether each CFPD's deadline has not yet passed (current_time() < cfp.deadline_timestamp). If valid, it proceeds similarly by starting the interaction, creating the proposal, updating meanProposingTime, and possibly updating the realOutcome.
If no CFPD remains and MsgQ1 is still empty, the participant processes the CFPNDs from MsgQ3. These are handled without deadline constraints but still update meanProposingTime based on actual proposal times.
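The three-queue handling described above can be sketched in Python as follows. This is a simplified illustration, not Algorithm 3 itself: it assumes that no new CFPs arrive while the queues are being processed, that the delivery time is zero (so the reply time equals a fixed proposing time), and that a trivial stand-in rule replaces the trained model. The names `handle_cfps` and `naive_predict`, and all timestamps, are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class CFP:
    name: str
    deadline: Optional[float]  # absolute deadline in seconds; None marks a CFPND


def handle_cfps(msg_q1: Deque[CFP], now: float, proposing_time: float,
                predict: Callable[[float, float], bool]) -> List[str]:
    """Return the order in which CFPs are answered under the three-queue policy."""
    msg_q2: Deque[CFP] = deque()  # postponed CFPDs (predicted to fail)
    msg_q3: Deque[CFP] = deque()  # CFPNDs (no deadline)
    answered: List[str] = []

    while msg_q1:
        cfp = msg_q1.popleft()
        if cfp.deadline is None:
            msg_q3.append(cfp)
        elif predict(cfp.deadline - now, proposing_time):
            answered.append(cfp.name)   # predicted success: answer immediately
            now += proposing_time
        else:
            msg_q2.append(cfp)          # predicted failure: postpone

    while msg_q2:
        cfp = msg_q2.popleft()
        if now < cfp.deadline:          # skip CFPDs whose deadline has passed
            answered.append(cfp.name)
            now += proposing_time

    while msg_q3:
        answered.append(msg_q3.popleft().name)
        now += proposing_time
    return answered


def naive_predict(remaining: float, reply: float) -> bool:
    """Stand-in for the trained model: succeed if there is more time than needed."""
    return remaining > reply


inbox = deque([CFP("c1", None), CFP("c2", 8.0), CFP("c3", 7.0), CFP("c4", 11.0)])
print(handle_cfps(inbox, now=4.0, proposing_time=2.0, predict=naive_predict))
# prints ['c2', 'c4', 'c1']
```

With these assumed values, "c3" is postponed and later dropped because its deadline has passed by the time the main queue is empty, while the deadline-free "c1" is answered last.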
When all queues are empty, and the number of entries in CFP_Map exceeds or equals Retrain_trigger_threshold, and no ongoing interactions exist (all_interactions_completed()), the participant updates the dataset using UpdateDatasetFromCFPs(). Then it retrains the model via RetrainModel(). If the new model outperforms the previous one (model_performance_improved()), it replaces the model by calling ReplaceModel(), clears CFP_Map, and resets the Retrain_trigger_threshold to its original value N. Otherwise, the threshold is increased by the current size of CFP_Map, delaying retraining until more data is collected.
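The retraining trigger described above can be sketched as a small Python class. The class name `RetrainController` and the `retrain_improves` callback are hypothetical; the callback stands in for the sequence UpdateDatasetFromCFPs(), RetrainModel() and model_performance_improved(), and `idle` stands in for the condition that all queues are empty and all interactions are completed.

```python
from typing import Callable, Dict, Tuple


class RetrainController:
    """Decides when to retrain, following the threshold rule described above."""

    def __init__(self, base_threshold: int) -> None:
        self.base_threshold = base_threshold  # the original value N
        self.threshold = base_threshold       # Retrain_trigger_threshold
        # key -> (remaining_time, reply_time, real_outcome); stands in for CFP_Map
        self.cfp_map: Dict[str, Tuple[float, float, int]] = {}

    def maybe_retrain(self, idle: bool, retrain_improves: Callable[[], bool]) -> bool:
        """Return True if a retrained model replaced the previous one."""
        if not idle or len(self.cfp_map) < self.threshold:
            return False
        if retrain_improves():
            self.cfp_map.clear()                  # instances are now in the dataset
            self.threshold = self.base_threshold  # reset to N
            return True
        self.threshold += len(self.cfp_map)       # wait for more data before retrying
        return False


controller = RetrainController(base_threshold=2)
controller.cfp_map = {"k1": (4.0, 2.0, 1), "k2": (1.0, 2.0, 0)}
print(controller.maybe_retrain(idle=True, retrain_improves=lambda: False))
print(controller.threshold)
```

Raising the threshold after an unsuccessful retraining avoids retraining repeatedly on nearly the same data.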
The following sequence diagram (Figure 7) shows the same scenario presented in the problem statement section (Figure 2), but performed using the proposed approach. The participant agent begins to reply to the received CFPs when it becomes available. The participant replies to CFP_In3 first, instead of CFP_In1, which was filtered as a CFPND and postponed until last. CFP_In3 has 8 s before its deadline, of which 4 s were wasted while the participant was blocked; this leaves only 4 s of remaining time until the deadline. During this remaining time, the participant spends approximately 2 s to send a reply, leaving an extra 2 s.

The previous interaction scenario using the proposed approach.
Based on agent history and the two attributes (Remaining_time and Reply_time), the participant predicts that the reply to CFP_In3 has a high chance of success and replies to it immediately. Applying the same process to the next CFP in the message queue, CFP_In2, gives a remaining time of 1 s and a reply time of 2 s. In this case, the participant predicts that the reply to CFP_In2 has a low chance of success and will most likely fail. CFP_In2 is therefore postponed, and CFP_In4 takes its place with a remaining time of 5 s, a reply time of 2 s and a prediction of a successful reply. When the participant has no other messages waiting in the queue, it starts to reply to the postponed CFPD0 messages. CFP_In2 is the message to be answered at this step, although logically it should not be, because its deadline has already passed. If the list of postponed CFPD0 messages contained other messages, replying to CFP_In2 would be inappropriate, because it would just waste 2 s of reply time and delay the next postponed messages. When all the CFPDs have been answered, the participant answers the CFPND messages postponed earlier. With the proposed approach, 3 out of 4 CFPs are answered successfully. With the normal interaction, only 1 message out of 4 is certain to succeed, because one of the replies arrives at the initiator at approximately the same time as the deadline is reached, so it could fail once the initiator's message reading time is taken into account.
To validate the proposed approach, two case studies have been used. Both of them are implementations of the Contract Net protocol under JADE platform. In each case study, different experiments have been performed under different conditions to create multiple scenarios. The same circumstances for each scenario have been replicated for the normal interaction and the proposed approach for fair comparisons.
The first case study
In each scenario of this implementation, multiple initiators are created to send CFPs to the same participant, where each CFP message may or may not have a deadline constraint (chosen randomly). Furthermore, a time delay of 1 s is used between the creation of initiators to reduce the load. The participant agent takes about 2 s on average just to prepare the proposal message.
In the data collection phase, the participant agent interacts in a normal CNP to register its interactions’ history. When the training set reaches 1000 instances, the participant trains multiple machine learning classifiers (training phase) offered by Weka and validates them using the 10 fold cross-validation method. The used classifiers are logistic regression, multilayer perceptron, stochastic gradient descent (SGD), simple logistic, voted perceptron, random forest, random tree, support vector machine (SMO), naïve Bayes and Bayes network.
Weka offers different metrics to evaluate the classifiers’ performance. These metrics can be calculated using the confusion matrix, which helps to visualize the distribution of the correctly and incorrectly classified instances. Weka presents the confusion matrix differently from some other sources. The rows in Weka's confusion matrix represent the instances in the actual class, while the columns represent the instances in the predicted class. Since this is a binary classification, only two classes are involved: the interaction success class (a = 1) and the interaction failure class (b = 0).
Weka's confusion matrix definition is presented in Table 2. If we take the interaction success class “a” as the positives and the interaction failure class “b” as the negatives, then:
- TP (true positives) are the interactions correctly predicted to succeed,
- FP (false positives) are the interactions incorrectly predicted to succeed,
- TN (true negatives) are the interactions correctly predicted to fail,
- FN (false negatives) are the interactions incorrectly predicted to fail.
The confusion matrix definition in Weka where actual elements in rows and predicted elements in columns.
Table 3 presents the confusion matrix of the logistic regression's 10 fold cross-validation. Using this matrix, we can find the number of correctly and incorrectly predicted interactions, which leads to the first metric, Accuracy: the ratio of the number of correct predictions to the total number of predictions. For this classifier, accuracy is 99.5%: 995 interactions were correctly classified and 5 were incorrectly classified.
The confusion matrix of the logistic regression classifier validated using 10 fold cross-validation.
Table 4 shows some detailed metrics for each class resulted from the validation of the logistic regression model using 10 fold cross-validation. The TP Rate (True Positive Rate) also called the
The detailed results for the 10 fold cross-validation of the logistic regression classifier.
Weka also reports the weighted average of each rate, which gives a single value for both classes (equation (3)). The weighted average is used instead of a simple average because the data set can be imbalanced, meaning that one class has more instances than the other. Therefore, the number of instances in each class is used as the weight. In the training set used here, 378 instances are labeled as class “a” (Weight_a) and 622 as class “b” (Weight_b).
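Assuming the usual instance-weighted mean, the weighted average of a per-class metric can be written as follows. The symbols m_a and m_b are notation introduced here for the value of the metric on class “a” and class “b”; only the two weights come from the training set.

```latex
\[
\text{WeightedAvg}(m)
  = \frac{m_a \cdot \text{Weight}_a + m_b \cdot \text{Weight}_b}{\text{Weight}_a + \text{Weight}_b}
  = \frac{378\, m_a + 622\, m_b}{1000}
\]
```

Because class “b” holds 62.2% of the instances, its score contributes more to the weighted value than the score of class “a”.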
The FP Rate (False Positive Rate) is the number of instances incorrectly predicted as a given class divided by the total number of instances of the other class. If class “a” is the positives, then the FP rate is the number of interactions incorrectly predicted to succeed divided by the number of all failed interactions.
The Precision, also called the positive predictive value (PPV), answers the question: how many selected elements are relevant? It is the number of correct predictions of a class divided by the total number of predictions of that class.
The F-Measure is the harmonic mean of precision and recall, which gives an overall score of how well the model is performing. The weighted average F-Measure is calculated from the F-Measures of both classes, not from the weighted averages of precision and recall.
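The per-class metrics and the two ways of averaging the F-Measure can be illustrated with a short Python sketch. The class sizes (378 and 622) and the total of 5 errors come from the text; the split of those errors into 2 false negatives and 3 false positives is hypothetical, as are the function names.

```python
def class_metrics(tp, fn, fp, tn):
    """Metrics for one class treated as the positive class (assumes non-zero denominators)."""
    tp_rate = tp / (tp + fn)      # also called recall
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return {"tp_rate": tp_rate, "fp_rate": fp_rate,
            "precision": precision, "f_measure": f_measure}


def weighted_average(value_a, value_b, weight_a, weight_b):
    """Average of two per-class values weighted by the class sizes."""
    return (value_a * weight_a + value_b * weight_b) / (weight_a + weight_b)


# 378 actual "a" and 622 actual "b" instances, 5 errors in total.
# The 2/3 split of the errors is a hypothetical choice for illustration.
tp, fn = 376, 2   # row of actual class "a"
fp, tn = 3, 619   # row of actual class "b"

metrics_a = class_metrics(tp, fn, fp, tn)
# For class "b" the roles are swapped: its true positives are the TN of class "a".
metrics_b = class_metrics(tp=tn, fn=fp, fp=fn, tn=tp)

accuracy = (tp + tn) / (tp + fn + fp + tn)

# Weighted F-Measure as described: average the two per-class F-Measures.
weighted_f = weighted_average(metrics_a["f_measure"], metrics_b["f_measure"], 378, 622)

# The alternative that is NOT used: F-Measure of the averaged precision and recall.
avg_precision = weighted_average(metrics_a["precision"], metrics_b["precision"], 378, 622)
avg_recall = weighted_average(metrics_a["tp_rate"], metrics_b["tp_rate"], 378, 622)
f_of_averages = 2 * avg_precision * avg_recall / (avg_precision + avg_recall)

print(round(accuracy, 3), weighted_f, f_of_averages)
```

On a matrix this clean the two averaging strategies give almost the same number; they diverge more visibly when the two classes have very different precision and recall.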
Weka also offers two important metrics for each class, based on the ROC (Receiver Operating Characteristic) and PRC (Precision-Recall) curves. The ROC curve is constructed from the TPR and FPR, while the PRC is constructed from precision and recall. The values reported by Weka for these metrics are the Area Under Curve (AUC), obtained using the getROCArea() and getPRCArea() functions of the weka.classifiers.evaluation.ThresholdCurve Java class. The ROC AUC can be interpreted as the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one.
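The area under the ROC curve equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. It is a plain-Python illustration, not Weka's ThresholdCurve implementation, and the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC computed as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one instance of each class")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of success and the real outcomes.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75
```

A value of 1.0 means every successful interaction was scored above every failed one, while 0.5 corresponds to random ranking.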
In certain applications, one class can be more important than the other, so the classifier must trade off between metrics to classify the desired class as well as possible, even if the classification performance for the other class decreases to an acceptable limit (e.g., healthy versus diseased patients). In this study, predicting correctly in both classes is very important: if the model classifies an interaction that actually fails as a successful one, the participant loses time processing its CFP message, and this delay propagates to the rest of the CFPs in the message queue. Therefore, to describe the results of both classes in less space, the weighted averages of the previous metrics are presented.
Table 5 summarizes the results of the 10-fold cross-validation method applied to the selected classifiers. The accuracy of the classifiers was very promising, ranging from 93.1% with the naïve Bayes algorithm to 99.6% with simple logistic and voted perceptron. As reported in previous studies, accuracy can be misleading on imbalanced datasets; consequently, comparing classifiers using ROC and PRC is more appropriate. The ROC is based on the TP rate and the FP rate, and the best ROC values are obtained with high TP rates and low FP rates. This is the case for logistic regression, multilayer perceptron, simple logistic and random forest, where the TP rate is above 99.3% and the FP rate is below 0.7%, with an AUC of 0.999. The PRC is based on precision and recall, both of which must be high for the PRC to be good. This is the case for the same classifiers, which achieve more than 99.3% in precision and recall and a PRC of 0.999. The other classifiers also performed very well in this validation; SMO and naïve Bayes registered more false positives and false negatives than the other classifiers, but they still have very good ROC and PRC values. Since multiple classifiers have the same performance measures, logistic regression was selected randomly among them as the model for predicting interaction outcomes in the enhancement phase.
Validation results of the selected classifiers using 10-fold cross-validation for the first case study.
In the enhancement phase, different interaction scenarios were performed to validate the proposed approach. In each scenario (iteration), a single participant receives one CFP from each of multiple initiators. The number of CFPs was increased by 20 in each new scenario, from 20 initiators in the first to 500 initiators in the last (25 iterations). The CFPs’ deadlines were randomly generated from 0 to 14 s, then saved so that they could be used twice, in the same order, during the same iteration: first for the normal CNP interactions and then for the proposed approach's interactions (each iteration has one scenario performed twice, which means 50 experiments in total). The 25 scenarios are random and different from each other. The participant records its predictions and the corresponding real outcomes so that the enhancement process can be evaluated. During these evaluations, the recorded real outcomes are not used to update the data set or the model.
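The experimental design above can be sketched as follows. The function name, the seed and the use of whole seconds for the deadlines are assumptions made for illustration; the actual experiments were run on the JADE platform.

```python
import random


def build_scenarios(first=20, last=500, step=20, max_deadline_s=14, seed=None):
    """Generate one saved list of CFP deadlines per scenario.

    Each scenario has `step` more CFPs than the previous one. Deadlines are
    drawn uniformly from 0..max_deadline_s (whole seconds is an assumption).
    """
    rng = random.Random(seed)
    scenarios = []
    for cfp_count in range(first, last + 1, step):
        deadlines = [rng.randint(0, max_deadline_s) for _ in range(cfp_count)]
        scenarios.append(deadlines)
    return scenarios


scenarios = build_scenarios(seed=42)

# Every saved deadline list is replayed twice, in the same order:
# once with the normal CNP and once with the proposed approach.
runs = [(approach, deadlines)
        for deadlines in scenarios
        for approach in ("normal_cnp", "proposed_approach")]

print(len(scenarios), len(runs))  # 25 50
```

Reusing the same saved list for both runs is what makes the two approaches comparable within an iteration, since they face identical deadlines in identical order.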
The results of the normal CNP and the proposed approach are presented in Table 6. The improvement percentage defined by equation (4) has been used for each scenario to show the impact of the proposed approach on the sociability of the participant agent.
Replied CFPs using normal CNP and the proposed approach for the first case study.
The bold columns indicate the data used to calculate the improvement percentage according to Equation (4).
As expected, in both approaches all the CFPND (CFPs without deadlines) were replied to successfully. Unfortunately, this was not the case for the CFPD. Only 1 to 15% of the CFPD succeeded in the majority of the normal CNP scenarios. When the number of CFPs was increased, the number of successful replies in the normal CNP remained very low and stable across all the scenarios. This is explained by the growing number of failed replies: the more CFPs were involved, the more failures were obtained.
Taking the 460 CFPs scenario as an example: there are 32 CFPND (CFPs without deadlines) that are excluded from the evaluation because they always succeed. This leaves 428 CFPD (CFPs with deadlines). Of these 428 CFPD, only 3 succeeded using the normal Contract Net Protocol, while 115 succeeded with the proposed approach. Going from 3 to 115 successful interactions is an increase of (115 − 3)/3 ≈ 37.33 times the baseline, which corresponds to the 3733% improvement. Even though the proposed approach still lost 313 of the 428 CFPD, this is still considered a promising result compared to the normal interaction, which succeeded in only 3 interactions.
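The arithmetic of this example can be checked with a small helper. It assumes that equation (4) is the relative increase over the normal CNP result, which is the reading that reproduces the 3733% figure; the function name is hypothetical.

```python
def improvement_percentage(normal_successes, proposed_successes):
    """Relative increase of successful replies over the normal CNP, in percent.

    Assumed form of equation (4): (proposed - normal) / normal * 100.
    """
    if normal_successes == 0:
        raise ValueError("improvement is undefined when the baseline has no successes")
    return (proposed_successes - normal_successes) / normal_successes * 100


# 460 CFPs scenario: 3 successes with the normal CNP, 115 with the proposed approach.
print(round(improvement_percentage(3, 115)))  # 3733
```

The baseline appears in the denominator, so very small baselines such as 3 successes produce very large percentages even when the absolute gain is 112 interactions.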
In software quality of multiagent systems’ literature
In this paper, sociability is addressed from the point of view of interaction protocols, focusing particularly on an agent's ability to respond to interaction initiations (CFPs) within a limited time. While this may initially appear narrow, interaction protocols contribute to the concept of sociability through structured, goal-oriented communication patterns, such as the FIPA Contract-Net Protocol. These protocols go beyond simple ACL message exchanges and are central to enabling the major properties of sociability: cooperation, negotiation, and communication.
We formally define sociability in the context of interaction protocols as the capacity of an agent to preserve interactions. This means that the agent commits to an interaction before its deadline, ensuring that it is not wasted at the very first stage. Accordingly, we propose a new metric named Sociability Degree via Interaction Protocols (SDIP), defined as follows:
- P is the set of all preserved interactions.
- I is the set of all interaction opportunities, manifested in the number of received initiations.
How can the preserved interactions P be identified?
By examining the FIPA interaction protocols, 62 we can divide them into two categories. In the first category, after the participant submits a proposal, the initiator replies back, as in the Contract Net protocol, the FIPA Dutch Auction Interaction Protocol and the FIPA English Auction Interaction Protocol.
In the second category, the initiator only sends the initial message without providing any further reply to the participant, as in the FIPA Propose Interaction Protocol, the Request When Interaction Protocol, the FIPA Request Interaction Protocol and the FIPA Subscribe Interaction Protocol, among others.
Thus, for the first category where the initiator is expected to respond, it becomes possible for the participant to define and track the set of preserved interactions P, since these interactions involve a clear acknowledgment or feedback from the initiator.
Therefore, for each interaction i ∈ I, we define an indicator function Ri as:
Using this, we define the set of preserved interactions as:
For the second category, the preserved interactions set P for a given participant must be identified by every initiator for each initiation. The condition for preserving an interaction in this case is:
In this case, for each interaction i ∈ I, we define an indicator function Ri as:
The SDIP is then given by:
This ratio quantifies the fraction of interaction opportunities that were successfully preserved by the agent, meaning the agent's proposals were actually received and answered by the initiator, thus confirming that these interactions were not wasted.
In our study, since the initiator replies to the participant's proposal, we were able to calculate the SDIP from the participant side using only the CFPD.
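Read as a ratio of preserved interactions to received initiations, SDIP = |P| / |I|, the metric can be sketched for the first protocol category as follows. The representation of the indicator Ri as a list of booleans and the function name are assumptions; the counts are those of the 460 CFPs scenario.

```python
def sdip(initiator_replied):
    """Sociability Degree via Interaction Protocols, sketched as |P| / |I|.

    `initiator_replied` holds one boolean per received initiation i in I,
    playing the role of the indicator Ri: True when the initiator answered
    the participant's proposal, i.e. the interaction was preserved.
    """
    if not initiator_replied:
        raise ValueError("SDIP is undefined when no initiation was received")
    preserved = sum(1 for replied in initiator_replied if replied)
    return preserved / len(initiator_replied)


# 460 CFPs scenario: 428 CFPD, of which 3 (normal CNP) or 115 (proposed approach) succeeded.
normal_cnp = [True] * 3 + [False] * 425
proposed = [True] * 115 + [False] * 313

print(round(sdip(normal_cnp), 3))  # 0.007
print(round(sdip(proposed), 3))    # 0.269
```

Only the CFPD are passed in, in line with the choice to leave CFPND out of the evaluation because they always succeed.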
With the proposed approach, the results were considerably better: the participant was able to successfully reply to between 30% and 75% of the CFPD, with a mean SDIP over all scenarios equal to 0.47. When the number of CFPs was increased, the number of successful replies increased as well, although it sometimes decreased compared to the previous iteration, as shown in Figure 8. This is because each scenario has different deadlines, in a different order, than the previous one.

Comparison of the number of succeeded replies between the normal CNP and the proposed approach for the first case study.
Compared to the normal CNP, the number of failed replies decreased significantly, as shown in Figure 9. In percentage terms, the participant's failed replies remained at about 29% of the CFPD. The improvement percentages in Table 6 show the effectiveness of the proposed approach in improving agent sociability, by up to about 3700%, especially when the number of received CFPs increases.

Comparison of the number of failed replies between the normal CNP and the proposed approach for the first case study.
In this implementation, the participant offers three different services: (i) a data search service, which takes 10 s to perform, (ii) a file download service, which takes 20 s to perform, and (iii) a file upload service, which takes 30 s to perform. Each initiator can request only one service, with a deadline of 1 or 2 min (CFPD only). Furthermore, a 10 s interval separates the creation of successive initiators (Figure 10). The validation was performed in 8 iterations, each identified by its number of CFPD, ranging from 10 to 200. Three scenarios were performed for each iteration, which means 24 different scenarios. Each scenario is performed twice, once for the normal CNP and once for the proposed approach (48 experiments in total).

A scenario of the second case study.
The previously selected classifiers were applied to this case study as well and validated using the 10-fold cross-validation method. This data set has 1035 instances, of which 568 are labeled as class 0 and 467 as class 1. Note that the Remaining_time attribute cannot predict the outcome alone in this case, because there are three different proposing times depending on the selected service. Therefore, unlike in the first case study, the Reply_time attribute has more impact on the prediction process. The validation results presented in Table 7 show very high performance scores for all the classifiers, especially logistic regression, which is used as the prediction model for the enhancement phase.
Validation results of the selected classifiers using 10-fold cross-validation for the second case study.
The results in Table 8 show the scalability problem during the normal CNP evaluation. In this case, the participant could not increase its sociability when more initiators (or CFPs) were involved. The percentage of successful replies dropped from 70% in the first iteration to 2% and 3% in the last two iterations. In contrast, with the proposed approach, the participant successfully replied to between 50% and 70% of the received initiations during all the iterations. The improvement percentage of all the performed scenarios is presented as well. The enhancement was noticed from the 3rd scenario of the first iteration, with a 40% performance increase, up to a 2800% performance increase in the 2nd scenario of the 7th iteration.
The successfully replied CFPs during the second case study.
To visualize the agent's sociability difference between the normal CNP and the proposed approach, the means of the succeeded replies’ percentages have been used (Table 9). In each iteration, the mean of the three scenarios is calculated then turned into a percentage using equation (11).
The mean percentages of the succeeded replies for the second case study.
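The mean percentage for one iteration can be sketched as below, assuming that equation (11) divides the mean number of succeeded replies over the three scenarios by the number of CFPD of that iteration. The function name and the three sample counts are hypothetical.

```python
def mean_success_percentage(succeeded_per_scenario, cfpd_count):
    """Mean number of succeeded replies over the scenarios, as a percentage of the CFPD.

    Assumed form of equation (11): mean(succeeded) / cfpd_count * 100.
    """
    if not succeeded_per_scenario or cfpd_count <= 0:
        raise ValueError("need at least one scenario and a positive CFPD count")
    mean_succeeded = sum(succeeded_per_scenario) / len(succeeded_per_scenario)
    return mean_succeeded / cfpd_count * 100


# Hypothetical counts for the three scenarios of an iteration with 10 CFPD.
print(round(mean_success_percentage([7, 6, 6], 10), 1))  # 63.3
```

Averaging over the three scenarios before converting to a percentage smooths out the effect of one unusually easy or hard draw of deadlines.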
As presented in Figure 11, when the number of CFPD was increased, the normal CNP could not stabilize the participant's sociability. The mean percentage of successful replies dropped quickly and markedly, from 63% to 4%. On the other hand, the proposed approach recorded only a slow and slight drop in the participant's sociability, which was maintained above 54% during all iterations.

Comparing mean percentages of the succeeded replies for the normal CNP against the proposed approach for the second case study.
In order to measure the performance of the proposed approach, a confusion matrix must be formed for all iterations using the predictedOutcome and realOutcome variables described before. Afterwards, recall, precision, accuracy and f-measure defined previously are used as performance metrics.
The performance measures presented in Table 10 show strong results for all the metrics used. Very few false positives and false negatives were registered in this evaluation. Since the metrics remain nearly stable across all scenarios, with scores between 96.6% and 100%, the performance of the proposed approach, in terms of correct predictions rather than the number of agent interactions, is not affected by the increased number of messages. However, evaluating the scalability and the performance limits of the proposed approach requires more tests with higher numbers of messages.
Performance measures of the proposed approach for the first case study.
The bold columns indicate the data used to calculate the improvement percentage according to Equation (4).
The results of the second case study were no different from those of the first. Table 11 contains the confusion matrices of the three scenarios of each iteration. Only two false positives were recorded, in the first scenarios of the 5th and 6th iterations, and one false negative, in the 3rd scenario of the 6th iteration. All the other outcomes were predicted correctly. The performance measurements were calculated from these confusion matrices and are presented in Table 12. The results are very promising, showing high scores for the performance metrics in all the scenarios. These high scores do not mean that the prediction system is perfect or never wrong. Rather, they indicate that this problem is well suited to machine learning solutions, provided that the best model, the right attributes and a real training set gathered from experience are selected.
Confusion matrix for each scenario for the second case study.
Performance measures of the proposed approach for the second case study.
The bold columns indicate the data used to calculate the improvement percentage according to Equation (4).
In machine learning, underfitting and overfitting are two important issues that must be avoided. Underfitting occurs when a model does not fit the training data well enough, which leads to poor predictions. Overfitting occurs when a model fits the training data very well but performs poorly on unseen data. Under these definitions, the selected logistic regression model is far from being underfitted, because it did very well on the training data, and far from being overfitted, because it performed as well on the unseen data as it did on the training data. 10-fold cross-validation is one of many methods that can be used to guard against these problems, since every instance is used for testing by a model that was not trained on it. Therefore, the classifiers that obtained good scores during the validation phase are neither overfitted nor underfitted. Besides, the proposed approach allows the participant agent to update its dataset over time, so that the model in use remains fitted and up to date.
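The way 10-fold cross-validation keeps training and test data apart can be illustrated with a plain-Python split of instance indices. This is a simplified sketch, not Weka's procedure: it does not stratify the folds by class, and the function name and seed are assumptions.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train, test) index lists so that each instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 1000 instances, as in the training set of the first case study.
splits = list(k_fold_indices(1000, k=10))
for train, test in splits:
    # The model for this fold would be trained on `train` and scored on `test`.
    assert not set(train) & set(test)

print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 900 100
```

Because each score is computed on instances that the corresponding model never saw, a large gap between training and cross-validated performance would reveal overfitting.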
Conclusion and future work
Interaction protocols are widely used in multiagent systems. Their standard implementations are not perfect and need to be extended in many situations. This paper presented a problem related to initiation deadlines, which decreases agents' sociability, wastes valuable interactions and can lead to non-optimal results when the system scales up. This problem did not get any attention from researchers despite its importance.
In this paper, we presented a novel approach to enhance agent sociability by extending its interactions with machine learning. The responder agent is able to decide, based on its interaction history, whether an interaction initiation has to be answered immediately or postponed. Several interaction scenarios on two case studies were performed, in which different machine learning algorithms were validated using the 10-fold cross-validation method. The obtained results were very promising: the number of successful interactions for the tested participant agent increased significantly, from 225% up to 3733% in the first case study and from 18% to 2800% in the second one. Furthermore, the selected classification model performed very well during all the scenarios in terms of precision, recall and accuracy.
As future work, we plan to: (i) replicate the study on other case studies, (ii) incorporate more participants to evaluate whether the learning phase has an impact on the system's performance, (iii) evaluate the proposed approach on a distributed example over WANs to further investigate the effectiveness of the Reply_time attribute on the classification, and (iv) code the proposed approach as a separate module using aspect-oriented programming for better reuse and maintenance.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
