A testing framework for JADE agent-based software

Abstract

Multi-agent systems are proposed as a solution to meet today's software requirements: open and distributed architectures with dynamic and adaptive behaviour. Like any other software, the multi-agent system development process is error-prone; thus testing is a key activity to ensure the quality of the developed product. This paper focuses on agent testing, as the agent is the primary artefact of any multi-agent system's testing process. A framework called JADE Testing Framework (JTF) is proposed for testing agents on the JADE platform. JTF allows agents to be tested at two levels: the unit level (inner components) and the agent level (agent interactions). JTF results from the integration of two testing solutions: JAT, a well-known framework for testing JADE agent interactions, and UJade, a new solution developed for agent unit testing. UJade also provides a toolbox that enhances JAT's capabilities. Evidence of JTF's usability and effectiveness in JADE agent testing is provided by an empirical study conducted on seven multi-agent systems. The results of the study show that, when an agent's code can be tested at either the agent or the unit level, UJade requires less testing effort than JAT, and that JTF provides better testing capabilities, with tests that are more effective than those developed using UJade or JAT alone.

Keywords

Agent-oriented software engineering; testing framework; unit testing; mutation testing; multi-agent system; JADE

1. Introduction

Like all human activities, software development is error-prone, and multi-agent system (MAS) development is no exception. However, unlike other software, agent-based software involves additional characteristics such as autonomy, proactivity, reactivity, social ability and adaptability, which make its verification more challenging. M. Winikoff noted: “One of the strengths of agents and multi-agent systems is that they are able to deal with a range of situations, balancing proactivity and reactivity as needed … However, how can I be confident that my system will work – reasonably – in all situations?” [56].

Since the field's inception in the 1980s, considerable research has been conducted on this topic. A recent mapping study on the verification and validation of emergent behaviour in software engineering [5] pointed out that multi-agent systems are the area in which most of the research has been conducted so far: of 168 identified papers, 73 (43%) concerned the verification of multi-agent system behaviours. A wide variety of verification techniques are used throughout the different stages of the agent-based software life cycle, from formal and model-based methods at the design stage [24, 37, 38, 57], through testing and debugging at the development stage [13, 19, 25, 44, 52, 60], to runtime checking of agents and agent-based systems [1, 6, 54, 58]. Among these, model-based methods and testing are the most widely used [2]. A complete and detailed systematic literature review and mapping of multi-agent system verification is presented in [2].

Multi-agent system testing is commonly addressed at five levels [45]. It starts with Unit level testing, in which all the units that make up the agent, such as goals, knowledge, plans and the reasoning engine, are tested. Agent level testing is then seen as integration testing of these units: the agent's capability to reach its goals and to interact with its environment is verified. Next comes Integration (Group) level testing, in which the emerging properties and the collective behaviour resulting from the interaction of a group of agents are tested. At the fourth level, System (Society) level testing, the agent society as a whole is examined: the emergent and macroscopic behaviour, together with the quality properties the system is supposed to exhibit (e.g., openness, fault tolerance and Auto-X properties), is tested. Finally, the acceptability of the multi-agent system is tested at the Acceptance level.

In this paper, the primary interest is in the first two testing levels, the unit level and the agent level, and in particular in testing JADE agents. JADE (Java Agent DEvelopment framework) [3] is one of the most widely used platforms for developing agent-based software systems [32]. A review of the specialised literature, together with a search of open-source code repositories for JADE agent test code, revealed that testing this type of agent is poorly covered: the existing testing solutions focus only on interaction testing and on debugging the exchanged Agent Communication Language (ACL) messages (i.e., the second level) [13, 25]. To the best of our knowledge, no working solution for effective agent unit testing exists so far.

It has been suggested that the JUnit framework can be used to test an agent's units [13], since a JADE agent is simply Java code. However, the specificities of JADE agent code make it hard, and sometimes impossible, to use JUnit alone. JADE's programming principles recommend that an agent's internal logic (behaviours) and knowledge (beliefs) be protected from outside access and manipulation [3]. Furthermore, an agent is more than a simple object; it is a complex entity whose code is scattered across multiple classes and which is coupled at runtime with the JADE platform. Its units cannot be tested effectively without taking the nature of this code into account.

To handle these issues, a complete functional testing framework for JADE agents, named JADE Testing Framework (JTF), is proposed. It allows JADE agents to be tested effectively at two levels: at the unit level, via a subcomponent named UJade, built upon the JUnit, Mockito and PowerMockito frameworks; and at the agent (interaction) level, via the JAT4 subcomponent, an enhanced version of the JADE Agent Testing (JAT) framework [13], a well-known framework for JADE agent interaction testing.

Before proceeding further, it is worth noting that throughout this paper the terms agent testing and agent-level testing (or agent level for short) are used as follows: agent testing denotes the task of testing a JADE agent's functionality as part of a multi-agent system development cycle, whereas agent-level testing refers to the second abstraction level in the classification of Nguyen et al. [45], at which the agent is considered during that testing task.

The remainder of this paper is structured as follows. The existing JADE agent testing solutions are presented in Section 2. Section 3 specifies the requirements of an effective JADE agent testing framework. Section 4 presents the proposed solution, detailing the JADE testing framework architecture and describing how test cases can be implemented with it. To provide evidence of the framework's usability and effectiveness, an empirical study based on the mutation testing technique was conducted on seven multi-agent systems; its methodology, results and threats to validity are presented in Sections 5, 6 and 7, respectively. Finally, Section 8 draws conclusions and outlines future work.

2. Literature review

2.1 General JADE agent testing solutions (independent of any development methodology)

The earliest work on JADE agent testing is the JADE Test Suite [9], proposed by JADE's development team.1 The framework was developed to allow test suites to be built effortlessly, cheaply and incrementally. It is based on a two-level model. At the first level (unit), the agent's atomic capabilities are tested, such as the ability to create, send or receive messages, or to reply to an incoming request from another agent. At the second level (agent), the agent's functionality (a set of capabilities) is tested. The framework was later enhanced in the context of work on the PASSI methodology [12, 41]: the Agent Factory tool [17] was integrated to allow the generation of mock agents, which are used to simulate the environment of the agent under test. The JADE Test Suite is poorly documented; the existing document [16] is incomplete and does not explain how test cases are coded. Furthermore, no code using the tool could be found.

R. Coelho et al. proposed the JADE Agent Testing (JAT) framework for unit testing JADE-based multi-agent systems [13, 14]. Although the authors call it unit testing, it is oriented towards agent interaction testing. JAT adopts a role-based approach to test case definition: for each role of the agent under test, a mock agent is defined. A JAT mock agent is a fake implementation of a real agent. It implements a test script in which the agent under test is stimulated by sending it messages, and the conformance of the replies it sends back is then verified. JAT uses aspect-oriented techniques to monitor and control the asynchronous execution of the agents during testing. In addition, to ease the implementation of mocks, JAT proposes a template-based mechanism that can automatically generate mock agent code from an XML specification of an interaction protocol.

Later, an adapted version of JAT for BDI agents, named JAT4BDI, was proposed [18]. In this version, the monitoring capability was enhanced to observe the agent's reasoning cycle while it interacts. In addition to verifying the agent's communication, assertions on the agent's goals, beliefs and plans can be formulated. Both JAT and JAT4BDI adopt a black-box testing strategy: only agent interactions and their consequences on the agent's mental state (goals, beliefs, etc.) are tested, leaving the agent's internal structure and internal logic uncovered.

2.2 Development-methodology-centred testing solutions

As part of the Tropos methodology, an agent-oriented development methodology, Nguyen et al. proposed a goal-oriented approach for defining JADE agent test cases [46]. For each goal of the agent under test, a test suite is defined to verify the goal's fulfilment. A special agent (the Tester Agent) executes the test suite against the agent under test: it stimulates the agent under test with messages and then verifies the correctness of its answers. Alongside the Tester Agent, the resulting testing tool, named eCAT, has two other components: a Monitoring Agent, which monitors and debugs communications and events among agents, and a test suite editor that semi-automatically generates test suite skeletons from XML specifications.

Like the works mentioned above, Nguyen et al. adopted a black-box testing strategy, relying on the agent's sociability to test it. However, some of an agent's goal-achievement plans (behaviours) may contain no communication acts (e.g., inner goals), leaving the Tester Agent unable to verify them. Moreover, achieving a goal does not always mean that the plan was executed correctly, without errors.

Another methodology-based work is [14], within the INGENIAS methodology. The authors proposed two tools for debugging and testing JADE agent interactions. The first is a debugging tool, the ACLAnalyser, which tracks and logs individual agent conversations. The generated log can be used either to check the correctness of the interaction sequence and to detect overhead data exchanged between agents and unbalanced execution configurations, or for more advanced analyses, such as applying data mining techniques to discover emergent patterns at the system (social) level. The second tool is a testing framework built upon JUnit and integrated into the INGENIAS Development Kit. It uses a model-driven approach to generate test suite skeletons and allows interactions to be tested by verifying their effects on the agent's mental state.

The INGENIAS approach has some downsides. The ACLAnalyser is more about visualising inter-agent communication relationships than verifying them: the conformance of exchanged messages (e.g., message content, performative) cannot be checked. Moreover, like previous works, the proposed test framework is centred on the effects of interactions and does not consider the agent's behaviours.

In [10], Á. Carrera et al. proposed the BEAST methodology, an agile agent testing methodology based on the Behaviour Driven Development technique [47]. The BEAST Tool2 automatically generates JUnit test case skeletons from Behaviour Driven Development scenario specifications. Similarly to JAT, BEAST adopts a black-box testing strategy based on mocks. To provide more flexible mocking capabilities, the Mockito framework3 was integrated into the tool. This makes it possible both to mock ordinary Java objects and to ease the configuration of the three mock agent types that the tool supports (ResponderMockAgent, ListenerMockAgent, MediatorMockAgent). Moreover, a set of utility methods is provided to access and manipulate both the JADE platform and the agent under test, either to set up the test fixture or to verify the interactions and mental state of the agent under test.

Moreover, BEAST limits tests to the agent's interactions and the evolution of its mental state, and it provides only three types of mocks. Unlike JAT's mocks, whose behaviours can be defined by the developer to implement complex interaction schemas (such as protocols), BEAST's mocks are treated as black boxes: only their communication methods can be stubbed, to set the messages to send or receive.

2.3 Summary

The existing JADE agent testing solutions focus only on agent-level testing, that is, testing the agent's interactions and their effects on its mental state, leaving agent unit testing uncovered. This conclusion is easily confirmed by looking at the JADE-based multi-agent systems available in code repositories such as GitHub, GitLab, SourceForge and the Google Code Archive. A review of these repositories found few projects that provide agent test cases, and even fewer in which agent units are tested. In the projects identified, an object-oriented approach was adopted: the agent's methods were tested in isolation using JUnit. To do so, the tester had to make the agent's methods public, which goes against agent programming principles [3], which state that an agent's inner state (mental state and inner behaviour) should be protected from external access.

The assumption that a JADE agent is just Java code, so that the JUnit framework can be used to test agent units [13], does not hold in practice. The specificities of JADE agent code make it hard, and sometimes impossible, to test a JADE agent's units (behaviours, methods, beliefs, etc.) using JUnit alone. JADE's API promotes a high level of abstraction: it lets the developer focus only on implementing the inner logic of the agent's behaviours, while all aspects of behaviour management (scheduling, execution, etc.) and of agent life-cycle state management are hidden. Therefore, a JADE agent's behaviour cannot be tested effectively without running it.

These problems are compounded by the fact that an agent may have different behaviours and different execution states. Testing an agent behaviour requires being able to bring the agent to a desired testing state. JUnit provides neither appropriate mechanisms for easily manipulating the agent's inner state nor a means of simulating its environment (JADE platform services or other agents).

At the agent level, meanwhile, JADE Agent Testing (JAT) is a good choice. It has already proved its efficacy in agent interaction testing [13, 18]. However, as at the unit level, the lack of mechanisms for directly manipulating the agent's inner state to bring it to a desired testing state forces the tester to drive the agent under test to that state indirectly, via messages. Therefore, before testing a desired interaction scenario, JAT mock agents must first conduct all the interaction sequences that bring the agent under test from its initial state to the desired one. For instance, in [14], the original JAT paper, the authors used the traditional book trading system [8] to demonstrate JAT's usability (in this system, a buyer looks for the seller offering the best price and buys a book from it). To test the book purchasing scenario from the buyer agent's side, the JAT mock seller agent must first carry out the interaction sequence that leads the buyer to select it as the seller, before the correctness of the purchase interactions can be checked. This approach is time-consuming and error-prone, especially for complex interactions, which calls for a better alternative.

As an alternative to the previous solutions, the JADE testing framework, a complete, effective and easy-to-use framework for testing JADE agents, is presented in the following sections. It was developed to overcome the limits of existing frameworks. Besides allowing agent interactions to be tested effectively and efficiently, it goes beyond black-box testing strategies to unbox the agent and test its units effectively, all with an agent-oriented testing syntax.

3. Requirements for a complete JADE agent testing framework

The main goal is to implement a JADE agent testing framework that allows any JADE agent to be functionally tested effectively. The framework should meet the following requirements.

Starting from a JADE agent's code or specification, a developer should be able to use the framework to write and run functional tests on the agent code. No assumption should be made about the agent code, except that it is the code of a JADE agent.

An agent can be tested at two levels. At the first level, the agent is treated as a white box: each of the units that make it up is examined to verify its functionality. At the second level, seen as an integration test of those units, the agent is treated more as a black box: the agent's capability to reach its goals and to interact with its environment is verified.

Testing at the unit level requires being able to access, manipulate and verify a JADE agent's inner elements, namely its methods, its behaviours (which define its inner logic) and its fields (which represent its beliefs and mental state). The framework should allow this while providing the mechanisms needed to overcome the restriction on external access to the agent's state.

Furthermore, JADE agent behaviours are complex objects, not just simple methods: they can be combined to construct more complex behaviours, and their execution logic is coupled with that of the JADE platform. As mentioned above, JADE's programming API promotes a high level of abstraction. Therefore, testing only a behaviour's methods is not sufficient; the whole behaviour needs to be run to verify it. This may look like an integration test of the behaviour, but it is required in order to test an agent behaviour as a unit. The framework should provide a mechanism to isolate a specific agent behaviour and to run and verify the agent with this behaviour, in any desired state.
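To illustrate why a behaviour has to be run rather than simply called, the following plain-Java sketch models a JADE-like cooperative scheduler in which the platform, not the test, repeatedly calls action() until done() returns true. All classes here (SimpleBehaviour, MiniAgent, ThreeStepBehaviour, NoisyBehaviour) are hypothetical simplifications, not JADE or JTF code; the allowed filter is an assumed stand-in for the behaviour-isolation mechanism this requirement asks for.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Simplified, hypothetical stand-in for a JADE behaviour: the scheduler, not
// the developer, calls action() repeatedly until done() returns true.
abstract class SimpleBehaviour {
    abstract void action();
    abstract boolean done();
}

// Hypothetical mini-agent with a cooperative round-robin scheduler. The
// 'allowed' filter decides which added behaviours are actually scheduled,
// so that a single behaviour can be isolated for testing.
class MiniAgent {
    private final Deque<SimpleBehaviour> readyQueue = new ArrayDeque<>();
    private final Predicate<SimpleBehaviour> allowed;

    MiniAgent(Predicate<SimpleBehaviour> allowed) {
        this.allowed = allowed;
    }

    void addBehaviour(SimpleBehaviour b) {
        if (allowed.test(b)) {
            readyQueue.addLast(b);   // blocked behaviours are silently dropped
        }
    }

    // Runs scheduled behaviours until none remain or the step budget is spent;
    // returns the number of action() calls performed.
    int run(int maxSteps) {
        int steps = 0;
        while (!readyQueue.isEmpty() && steps < maxSteps) {
            SimpleBehaviour b = readyQueue.pollFirst();
            b.action();
            steps++;
            if (!b.done()) {
                readyQueue.addLast(b);  // unfinished: reschedule at the tail
            }
        }
        return steps;
    }
}

// Behaviour under test: it needs three scheduler turns to complete.
class ThreeStepBehaviour extends SimpleBehaviour {
    final List<Integer> trace = new ArrayList<>();
    private int calls = 0;

    @Override void action() { trace.add(++calls); }
    @Override boolean done() { return calls >= 3; }
}

// An unrelated behaviour that never finishes.
class NoisyBehaviour extends SimpleBehaviour {
    int calls = 0;

    @Override void action() { calls++; }
    @Override boolean done() { return false; }
}
```

With the filter in place, a test can observe the three scheduler turns of the target behaviour, while the unrelated never-ending behaviour, which would otherwise keep the agent busy, is never scheduled.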

At the agent testing level, where the agent as a whole is considered, the tester should be able to put the agent in a desired situation, and run and monitor it to verify how it behaves to reach its goals, how it interacts with its environment, how its mental state evolves, etc. Furthermore, it should be possible to verify any interaction situation with other agents or systems that may populate the agent's environment, in terms of adherence to the interaction protocol (if one is used), the correctness of the sequence and content of the exchanged messages, etc.

Many testing situations, at either the unit or the agent level, may require mocks to simulate an external entity with which the element under test is coupled or interacts, because using the real entity is infeasible (e.g., it is not yet developed, or a third-party system is not yet connected) or expensive (in terms of resources or time). The framework should provide the capability to mock these entities (another agent, a system, a JADE platform service, etc.) and to customise (stub) their answers and/or behaviour (in the case of a mock agent) according to the testing scenario.

The framework should also be user-friendly: an agent-oriented test syntax and assertions should be provided to ease test writing, and unit and agent tests should be run in the same way. At the end of each test, a report should be generated indicating the result (pass or fail). For a failed test, enough information should be provided to determine where in the code and why it failed.

4. JADE testing framework presentation

4.1 JADE testing framework architecture

Figure 1.

JADE testing framework general architecture.

Figure 2.

JADE testing framework use case.

Based on the previous requirements, the JADE testing framework (JTF) was developed to provide a simple, complete and effective solution for JADE agent testing. JTF consists of three components (Fig. 1): UJade for agent unit testing, JAT4 for agent interaction testing, and a domain-specific language (DSL) that provides a set of assertions adapted to JADE agent testing. JTF is built upon JUnit, Mockito and PowerMockito.4

JUnit is the most popular and powerful Java testing framework. Building on it, firstly, flattens the JTF learning curve, since JTF tests, whether unit-level or agent-level, are basically JUnit tests. Secondly, it eases the integration of JTF with IDEs and build tools (such as Eclipse and Maven). Lastly, most libraries of the JUnit ecosystem can still work with JTF; JaCoCo for code coverage and the PITest library for mutation testing, both used later in this paper, are examples of such libraries.

PowerMockito5 is an extension of the Mockito6 library, one of the best Java mocking libraries. PowerMockito enhances Mockito by allowing private and final methods of Java classes to be accessed, manipulated and mocked. This suits the current context perfectly, since JADE agents' methods, behaviours and fields are meant to be private and inaccessible from outside [3].
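The kind of white-box access that PowerMockito grants can be sketched with plain java.lang.reflect. The agent class and the WhiteBox helper below are hypothetical and only illustrate the principle of reading, setting and invoking private members; PowerMock's own utilities have their own names and signatures.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical agent whose beliefs and internal logic are private, as JADE's
// programming guidelines recommend.
class PrivateSeller {
    private Map<String, Integer> catalogue = new HashMap<>();

    private boolean isAvailable(String title) {
        return catalogue.containsKey(title);
    }
}

// Minimal reflective helpers, similar in spirit to a white-box utility.
// This is plain java.lang.reflect, NOT PowerMock's API. Only the target's
// own class is searched (superclasses are not).
final class WhiteBox {
    private WhiteBox() {}

    @SuppressWarnings("unchecked")
    static <T> T getField(Object target, String name) {
        try {
            Field f = target.getClass().getDeclaredField(name);
            f.setAccessible(true);             // bypass the private modifier
            return (T) f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read field " + name, e);
        }
    }

    static void setField(Object target, String name, Object value) {
        try {
            Field f = target.getClass().getDeclaredField(name);
            f.setAccessible(true);
            f.set(target, value);              // e.g. inject a test fixture
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot write field " + name, e);
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T invoke(Object target, String name, Class<?>[] paramTypes, Object... args) {
        try {
            Method m = target.getClass().getDeclaredMethod(name, paramTypes);
            m.setAccessible(true);
            return (T) m.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke " + name, e);
        }
    }
}
```

A test can thus put the agent into a desired state (setField) and exercise or inspect private logic (invoke, getField) without making anything public.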

Starting from a test specification (Fig. 2), which can be either a single test case or a test suite (a set of test cases bound together in one executable unit) written in compliance with the JUnit test structure, a developer can then use either an IDE or a build tool to run the test(s). JTF uses the PowerMockito test runner to run the tests and generate the test report. The report is a standard JUnit report indicating, for each executed test, whether it passed or failed and, for a failed test, where and why it failed.

Before continuing with the description of JTF's inner structure, it is worth mentioning that the book trading system [8] is used as a case study in the following paragraphs. All the code presented was written while testing this system with JTF.

Figure 3.

JADE testing framework package diagram.

4.1.1 UJade

The main idea behind UJade (Fig. 3, package jtf.level.unit) is to use PowerMockito's capabilities to generate a spied version of the agent under test, which can be run like any other JADE agent and on which it is possible, on the one hand, to monitor and verify the agent's inner actions (behaviours) and, on the other hand, to access, manipulate and assert the agent's mental state. For instance, PowerMockito's white-box feature makes it possible to access and set (resp. call) the fields (resp. methods) of the agent's classes, while its stubbing and mocking capabilities can be used to set up the test fixture both on the agent under test (for example, to put it into the desired testing state, the test developer can stub some of the agent's methods, such as message reception or behaviour adding) and on the JADE platform, whose services can be mocked to return predefined answers (for example, setting the Directory Facilitator (DF) service's answers to agent service search requests).

Additionally, UJade provides a set of utility functions (Fig. 3, sub-package jtf.level.unit.mockito) that extend PowerMockito's capabilities, allowing the tester to:

• Easily access and manipulate agent behaviours that are implemented as inner or anonymous Java classes.
• Block the execution of undesired behaviours while spying on the desired one (UJade provides a special stubbing function for the agent's addBehaviour() method).
• Capture the return values of the agent's and behaviours' methods.
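The third capability, capturing return values, can be sketched in plain Java as a wrapper that records each result before handing it back to the caller. ResultCaptor is a hypothetical name and not UJade's API. With Mockito itself, a comparable effect is commonly obtained through a custom Answer that calls InvocationOnMock.callRealMethod() and stores the returned value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: wraps a function so that every value it returns is
// recorded while the caller still receives the real result. It mirrors the
// idea of capturing return values of a spied method; it is not UJade's API.
final class ResultCaptor<T, R> implements Function<T, R> {
    private final Function<T, R> delegate;
    private final List<R> results = new ArrayList<>();

    ResultCaptor(Function<T, R> delegate) {
        this.delegate = delegate;
    }

    @Override
    public R apply(T input) {
        R result = delegate.apply(input);  // run the real logic
        results.add(result);               // remember what it returned
        return result;
    }

    List<R> allValues() {
        return Collections.unmodifiableList(results);
    }

    R lastValue() {
        if (results.isEmpty()) {
            throw new IllegalStateException("no value captured yet");
        }
        return results.get(results.size() - 1);
    }
}
```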

Figure 4.
Sequence diagram for a typical UJade unit test.

Figure 4 presents a sequence diagram of a typical agent unit test (testing an agent behaviour and the agent's mental state). The "spy and setup" fragment describes how UJade uses PowerMockito to create a spy version of the agent under test (AUT), whose inner actions and state are then stubbed and changed to put it in a desired testing state. In the "Mock AUT's Environment" fragment, the environmental objects and systems that the element under test depends on or is coupled with are mocked using Mockito and PowerMockito. Next, once the agent test has run, UJade calls PowerMockito's verification functionality to verify the agent's inner actions. Finally, the agent's mental state is asserted (fragment "assert AUT Inner State") to verify its evolution.

It is noteworthy that the first attempts to use PowerMockito on JADE agents were unsuccessful: the spy version could not be launched correctly. A review of the JADE Agent class code revealed a bug in the control flow of the run() method, precisely within the initialization method (init()) of the agent's active life cycle (code line 1488, Listing 1).

To spy on an object, Mockito, the core component of PowerMockito, creates a modified copy of that object (i.e., a spy). It is recommended that all interactions take place with the generated spy, not with the real object [43], so that inner actions can be recorded and verified. However, a deviation from coding style conventions (how to refer to the outer class from a non-static nested class) in the init() method of the inner class ActiveLifeCycle of the Agent class (code lines 1538–1543, Listing 2) results in the real agent's methods setActiveState(), notifystarted() and setup() being called instead of those of the spy, thereby launching the real agent rather than the spy. This could easily have been avoided if the expression “Agent.this.” had been used in these method calls.
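The following plain-Java sketch, which assumes a simplified model rather than JADE's actual Agent code, shows the general mechanism by which calls can escape a spy: an instance of a non-static inner class keeps a hidden reference to the outer object that created it, so when a spy is initialised by copying the original's fields (roughly what Mockito does), calls made from that inner object still reach the original instance. BaseAgent, LifeCycle and SpyAgent are hypothetical names.

```java
import java.util.List;

// Simplified, hypothetical model of an agent that owns an instance of a
// non-static inner class created in its constructor. Every inner-class
// instance keeps a hidden reference to the outer object that created it.
class BaseAgent {
    final List<String> log;
    LifeCycle lifeCycle;

    BaseAgent(List<String> log) {
        this.log = log;
        this.lifeCycle = new LifeCycle();   // bound to *this* instance
    }

    class LifeCycle {
        void init() {
            setup();   // resolves against the creating (outer) instance
        }
    }

    void setup() { log.add("real setup"); }

    void run() { lifeCycle.init(); }
}

// Stand-in for a spy: a subclass instance whose fields are copied from the
// original object and which overrides setup() to record the call instead.
class SpyAgent extends BaseAgent {
    SpyAgent(BaseAgent original) {
        super(original.log);
        this.lifeCycle = original.lifeCycle;  // copied field: still bound to 'original'
    }

    @Override
    void setup() { log.add("spy setup"); }
}
```

Calling run() on the spy records "real setup" rather than "spy setup": the original object, not the spy, is started, which is the symptom described above.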

To fix this bug, an AspectJ aspect was implemented to intercept the faulty calls and correct them at runtime. The decision to use the aspect-oriented programming approach [30] for this correction, instead of directly changing the JADE platform code, was driven by the desire to keep JTF as decoupled as possible from the version of the running JADE platform.

Listing 1.
A code snippet of agent’s run() method.

Listing 2.
A code snippet of init() method.

4.1.2 JAT4

The second testing component of the JADE Testing Framework (JTF) is an upgraded version of the JADE Agent Testing (JAT) framework, named JAT4, which integrates PowerMockito and supports the JUnit 4 syntax. With JAT4, it is easy to write more concise, interaction-centred tests: unlike the original JAT, where the mock agent must conduct all the interaction sequences needed to bring the agent under test from its initial state to the desired one, JAT4 uses PowerMockito features to access and manipulate the inner state of the agent under test directly, and to stub and mock both the inner actions of the agent under test and JADE platform services.

It is noteworthy that the original JAT code had to be refactored and evolved to support the current JADE platform version (4.5.0) and to overcome the library version conflicts7 that prevented its integration with UJade into one testing framework. JAT4 works with JADE platform versions 3.x and 4.x. In addition, to ease a possible extension of JTF, especially since there is another JAT version for JADE BDI agents (JAT4BDI), JAT4 was implemented as a separate project and is included in JTF as an external dependency (Fig. 3, JAT4.jar).

Figure 5 depicts a sequence diagram of a typical JAT4 test case. A test generally starts by creating one or more JAT mock agents with which the agent under test is going to interact. JAT4 relies heavily on these mock agents to implement the testing scenario and to evaluate the answers of the agent under test. Optionally, PowerMockito's mocking capability can be used, as at the unit level, to mock non-agent environmental objects. Next, depending on the testing scenario, JAT4 can run either a spied agent with stubbed actions and a predefined mental state (as at the unit level, to put it in a desired state) or the agent as it is, in its initial state. JAT4 then synchronises and monitors the mock agent(s) and the agent under test. Once the mocks finish their task, they notify JAT4 of the results. Finally, as at the unit level, the agent's mental state can be asserted to verify its evolution.
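The synchronisation step, in which the test thread waits for the mocks to report their results, can be sketched with a CountDownLatch. MockCoordinator and its methods are hypothetical names chosen for illustration, not JAT4's API; the timeout is an assumed safeguard so that an unresponsive agent under test fails the test instead of hanging it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical coordinator (not JAT4's API): the test thread blocks until
// every mock agent has reported a verdict, or until a timeout expires.
final class MockCoordinator {
    private final CountDownLatch pending;
    private final Map<String, Boolean> verdicts = new ConcurrentHashMap<>();

    MockCoordinator(int mockCount) {
        this.pending = new CountDownLatch(mockCount);
    }

    // Called by each mock agent when its scripted interaction is over.
    void report(String mockName, boolean passed) {
        verdicts.put(mockName, passed);   // recorded before the count-down
        pending.countDown();
    }

    // Called by the test thread. Returns true only if all mocks finished in
    // time and none of them reported a failure.
    boolean awaitAllPassed(long timeout, TimeUnit unit) {
        try {
            boolean finished = pending.await(timeout, unit);
            return finished && !verdicts.containsValue(Boolean.FALSE);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt flag
            return false;
        }
    }
}
```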

Figure 5.

Sequence diagram for a typical JAT4 agent test.

Listing 3.

Mock-buyer’s behaviour.

4.1.3 JADE testing framework’s domain-specific language

JTF is based on JUnit, a testing framework with a general-purpose assertion language, hence the need for a more expressive syntax better suited to JADE agent testing. The adopted domain-specific language (DSL) is based on the AssertJ library8 (Fig. 3, package jtf.assertion.jade). AssertJ provides a rich set of assertions that, besides easing test writing, improve the readability and understandability of test code (i.e., better code quality) compared with basic JUnit assertions [34]. Choosing AssertJ rather than other assertion libraries such as Hamcrest9 has been reported to increase tester productivity [35, 36]. The current DSL proposes five sets of assertions (package jtf.assertions.jade):

• Assertions related to the agent's definition (AID, launch arguments, state, etc.).
• Assertions to verify the structure of the agent's composite behaviours.
• Assertions related to the agent and the descriptions of its services in the DF catalogues.
• Assertions on ACL messages.
• Assertions for verifying ontologies.
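To show how a fluent, AssertJ-style assertion collapses several checks into one readable chain, here is a self-contained sketch over a simplified message type. SimpleMessage, MessageAssert and its has... methods are hypothetical; they stand in for, and are not, JTF's DSL or JADE's ACLMessage.

```java
import java.util.Objects;

// Hypothetical, minimal stand-in for a JADE ACL message
// (not jade.lang.acl.ACLMessage).
final class SimpleMessage {
    final String performative;
    final String sender;
    final String content;
    final String conversationId;

    SimpleMessage(String performative, String sender, String content, String conversationId) {
        this.performative = performative;
        this.sender = sender;
        this.content = content;
        this.conversationId = conversationId;
    }
}

// Hypothetical AssertJ-style fluent assertion: each check returns 'this', so
// several conditions read as one chain. This is not JTF's actual DSL.
final class MessageAssert {
    private final SimpleMessage actual;

    private MessageAssert(SimpleMessage actual) {
        this.actual = actual;
    }

    static MessageAssert assertThat(SimpleMessage actual) {
        if (actual == null) {
            throw new AssertionError("Expected a message but none was received");
        }
        return new MessageAssert(actual);
    }

    MessageAssert hasPerformative(String expected) {
        return check("performative", expected, actual.performative);
    }

    MessageAssert hasSender(String expected) {
        return check("sender", expected, actual.sender);
    }

    MessageAssert hasContent(String expected) {
        return check("content", expected, actual.content);
    }

    MessageAssert hasConversationId(String expected) {
        return check("conversation-id", expected, actual.conversationId);
    }

    // Fails with a message that states which property differed and how.
    private MessageAssert check(String property, String expected, String found) {
        if (!Objects.equals(expected, found)) {
            throw new AssertionError("Expected " + property + " <" + expected
                    + "> but was <" + found + ">");
        }
        return this;
    }
}
```

A single chained statement such as assertThat(reply).hasPerformative("INFORM").hasContent("The hobbit") replaces several separate assertEquals calls, and the failure message names the offending property.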

Listing 3 gives the code of a mock buyer behaviour implemented to test (at the agent level) the seller agent's order processing. In lines 17 to 33, the mock buyer verifies whether the seller replies to its book order with the correct message format. Without the proposed DSL, five assertions (commented-out lines 26 to 30) are required to test the reply message format; with JTF's DSL, a single assertion suffices (line 33).

4.2 Test case implementation with the JADE testing framework

The JADE Testing Framework (JTF) is based on JUnit 4, so test case definitions and test method implementations follow the JUnit 4 syntax. To define a test case, a tester extends one of two classes: UjadeTest.java for a unit-level test case or JAT4.java for an agent-level test case. These classes contain all the behind-the-scenes code needed to prepare (resp. clean) the test environment before (resp. after) the test case runs, for example: launching the JADE platform; cleaning the platform and killing agents after each test execution; and shutting down the platform once the test case ends.
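The behind-the-scenes work of such a base class follows the template-method pattern. The sketch below assumes a fake in-memory platform (FakePlatform, AgentTestBase and TestBody are hypothetical names, not JTF classes) and runs the set-up and clean-up steps listed above around each test body; in JUnit 4, these steps would typically live in @BeforeClass, @After and @AfterClass methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an agent platform managed by the base class.
final class FakePlatform {
    private final List<String> agents = new ArrayList<>();
    private boolean running;

    void start() { running = true; }

    void launch(String agentName) {
        if (!running) {
            throw new IllegalStateException("platform not started");
        }
        agents.add(agentName);
    }

    void killAll() { agents.clear(); }
    void shutdown() { running = false; }
    int agentCount() { return agents.size(); }
    boolean isRunning() { return running; }
}

// Template-method sketch of a base test class: subclasses only write test
// bodies; the base class prepares and cleans the environment around them.
abstract class AgentTestBase {
    protected final FakePlatform platform = new FakePlatform();

    interface TestBody {
        void run(FakePlatform platform);
    }

    final void runAll(List<TestBody> tests) {
        platform.start();                 // launch the platform once
        try {
            for (TestBody t : tests) {
                try {
                    t.run(platform);
                } finally {
                    platform.killAll();   // clean up agents after every test
                }
            }
        } finally {
            platform.shutdown();          // shut down when the test case ends
        }
    }
}
```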

Listing 4 and Listing 5 show the test code skeletons for JTF unit-level and agent-level test methods, respectively, and Listing 6 presents a typical code skeleton of a JAT mock agent's behaviour (its action() method). The bold lines represent the main instructions, whereas the others are optional, depending on the test case. In the following listings, the acronym AUT denotes the agent under test.

Listing 4.

A unit-level test case code skeleton.

Listing 5.

An agent-level test case code skeleton.

Listing 6.

JAT mock agent’s behaviour’s action method.

Listing 7.

Bookseller unit-level test case.

Listing 8.

Bookseller agent.

Listing 9.

The bookseller agent-level test case.

Listing 10.

The bookseller’s behaviour: Purchase-order server.

As an example of a test case implementation with JTF, partial test code for the seller agent of the book trading system is presented in Listing 7 and Listing 9. In the first test case (Listing 7), the seller's start-up inner actions (setup method, Listing 8) are tested using UJade (unit test). The test method starts by creating a mock of the DFService and a spied instance of the seller with three books as initialization arguments (lines 49–53); the spy's addBehaviour() method is stubbed to ensure that no behaviour is added (line 54), so that only the start-up code is executed. After the spied seller is launched (line 58), its inner action addBehaviour() is verified to check whether it was called twice, as expected (code lines 56 and 57, Listing 8); if so, the behaviours passed as arguments are captured and their class types are asserted later, in lines 84–86. Then, in lines 63 to 74, the seller's service registration in the DF (code lines 44–55, Listing 8) is verified, to check that the mocked DFService was solicited appropriately and that the published service description is correct. Lines 76–81 check whether the seller's catalogue was initialized correctly with only two books instead of three, since the third book's price is incorrect (code line 51, Listing 7).
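The structure of this unit test can be reproduced without JADE or Mockito by using hand-written test doubles. In the sketch below, BookSeller, Directory, RecordingDirectory and SpiedSeller are hypothetical simplifications: the recording directory plays the role of the mocked DFService, and the subclass plays the role of the spy whose addBehaviour() is stubbed so that behaviours are captured instead of scheduled. The "title=price" argument format, the behaviour names and the "book-selling" service type are assumptions made for illustration, loosely modelled on JADE's book-trading example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical port for the directory facilitator, so a test can substitute it.
interface Directory {
    void register(String agentName, String serviceType);
}

// Simplified, hypothetical seller: parse "title=price" arguments, keep valid
// prices only, register the service, then add two behaviours.
class BookSeller {
    final Map<String, Integer> catalogue = new LinkedHashMap<>();
    private final String name;
    private final Directory directory;

    BookSeller(String name, Directory directory) {
        this.name = name;
        this.directory = directory;
    }

    void setup(String... args) {
        for (String arg : args) {
            String[] parts = arg.split("=", 2);
            if (parts.length != 2) {
                continue;                           // malformed entry: skip it
            }
            try {
                int price = Integer.parseInt(parts[1].trim());
                if (price > 0) {
                    catalogue.put(parts[0].trim(), price);
                }
            } catch (NumberFormatException e) {
                // invalid price: the book is not added to the catalogue
            }
        }
        directory.register(name, "book-selling");
        addBehaviour("OfferRequestsServer");
        addBehaviour("PurchaseOrdersServer");
    }

    // In a real agent this would schedule the behaviour.
    void addBehaviour(String behaviourName) {
        throw new UnsupportedOperationException("no scheduler in this sketch");
    }
}

// Hand-written test doubles standing in for the Mockito mock and spy.
class RecordingDirectory implements Directory {
    final List<String> registrations = new ArrayList<>();

    @Override
    public void register(String agentName, String serviceType) {
        registrations.add(agentName + ":" + serviceType);
    }
}

class SpiedSeller extends BookSeller {
    final List<String> addedBehaviours = new ArrayList<>();

    SpiedSeller(String name, Directory directory) {
        super(name, directory);
    }

    @Override
    void addBehaviour(String behaviourName) {
        addedBehaviours.add(behaviourName);   // capture instead of scheduling
    }
}
```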

Meanwhile, Listing 9 presents a partial agent-level test case of the seller agent, with a test method (testPurchaseOrderServer) implemented to test the purchase interactions between the seller and a buyer from the seller's side (Listing 10). The designed test scenario uses a mock buyer that asks the seller for two books: the first is available and the second is not. The seller is expected to reply to the mock with two messages: an INFORM message for the first request and a FAILURE message for the second. The seller's catalogue should also be updated as a result of these interactions.

The test starts by creating and launching the seller and a mock buyer (code lines 20–22, Listing 9); the test thread is then suspended until the interaction between the seller and the mock ends (code line 23, Listing 9). The instruction on line 24 verifies that the test passed from the mock's side. Finally, the updated state of the seller's catalogue is asserted (lines 26 and 27, Listing 9).

On the mock buyer's side, all the interaction-scenario logic and the associated verification instructions are implemented in a dedicated behaviour (Listing 3, presented earlier). The mock initiates the communication by sending the seller a purchase request for the book “The hobbit” (lines 17–22, Listing 3) and then checks whether the seller has replied with an INFORM message (lines 24–34, Listing 3). It then sends a second request for the book “Moonless” (lines 36–38, Listing 3) and verifies that the seller's answer this time is a FAILURE message (lines 41–47, Listing 3). If all the previous interactions and verifications succeed, the mock sets the test as passed from its side (line 48, Listing 3); otherwise, it sets it as failed (line 52, Listing 3).
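The scenario above can be pictured with the following plain-Java sketch, which uses threads and blocking queues instead of JADE agents and ACL messages. All types (`Msg`, `SellerThread`, `MockBuyer`, `PurchaseScenario`) are hypothetical and are not the JAT API; the sketch only mirrors the structure described: the mock plays a scripted exchange and records its own verdict, while the test thread waits on a latch until the interaction ends and then checks the seller's state.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an ACL message: a performative and a content string.
class Msg {
    final String performative;
    final String content;
    Msg(String performative, String content) { this.performative = performative; this.content = content; }
}

// Simplified seller: serves purchase orders from its inbox until it receives STOP.
class SellerThread extends Thread {
    final BlockingQueue<Msg> inbox = new LinkedBlockingQueue<>();
    private final BlockingQueue<Msg> replyTo;
    private final Map<String, Integer> catalogue;

    SellerThread(Map<String, Integer> catalogue, BlockingQueue<Msg> replyTo) {
        this.catalogue = catalogue;
        this.replyTo = replyTo;
    }

    @Override
    public void run() {
        try {
            while (true) {
                Msg order = inbox.take();
                if (order.performative.equals("STOP")) {
                    return;
                }
                boolean sold = catalogue.remove(order.content) != null;
                replyTo.put(new Msg(sold ? "INFORM" : "FAILURE", order.content));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Mock buyer: plays the scripted scenario, checks each reply, records its verdict.
class MockBuyer extends Thread {
    final CountDownLatch done = new CountDownLatch(1);
    volatile boolean passed;
    private final BlockingQueue<Msg> sellerInbox;
    private final BlockingQueue<Msg> inbox;

    MockBuyer(BlockingQueue<Msg> sellerInbox, BlockingQueue<Msg> inbox) {
        this.sellerInbox = sellerInbox;
        this.inbox = inbox;
    }

    @Override
    public void run() {
        try {
            passed = expect("The hobbit", "INFORM") && expect("Moonless", "FAILURE");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            sellerInbox.offer(new Msg("STOP", "")); // let the seller thread terminate
            done.countDown();                      // wake up the waiting test thread
        }
    }

    // Sends one purchase order and checks the reply's performative (2 s timeout).
    private boolean expect(String title, String performative) throws InterruptedException {
        sellerInbox.put(new Msg("ACCEPT_PROPOSAL", title));
        Msg reply = inbox.poll(2, TimeUnit.SECONDS);
        return reply != null && reply.performative.equals(performative);
    }
}

class PurchaseScenario {
    // Plays the test method's role: launch both threads, suspend until the mock
    // signals the end of the interaction, then return the mock's verdict.
    static boolean run(Map<String, Integer> catalogue) {
        BlockingQueue<Msg> buyerInbox = new LinkedBlockingQueue<>();
        SellerThread seller = new SellerThread(catalogue, buyerInbox);
        MockBuyer buyer = new MockBuyer(seller.inbox, buyerInbox);
        seller.start();
        buyer.start();
        try {
            boolean finished = buyer.done.await(5, TimeUnit.SECONDS);
            seller.join(5000); // the catalogue is read only after the seller has stopped
            return finished && buyer.passed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

After `PurchaseScenario.run(catalogue)` returns true, the test asserts on the catalogue, as lines 26 and 27 of Listing 9 do.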

5. Experimental design

To demonstrate the effectiveness of the JADE Testing Framework (JTF) for testing JADE agents, an experimental comparison of the testing capabilities of JTF, JAT alone and UJade alone was conducted. The investigated research questions were:

RQ1.1:
Does JTF (i.e., the integration of UJade with JAT) allow better agent testing than UJade alone?
RQ1.2:
Does JTF (i.e., the integration of UJade with JAT) allow better agent testing than JAT alone?

In the experiment, tests for 23 agents from seven open-source multi-agent systems were implemented using each solution. The effectiveness of each solution's test suite was then assessed using the mutation testing technique (MTT) [20, 50]. MTT is known to be a good indicator of test quality: unlike coverage metrics, which only indicate whether a line or instruction is executed by the tests, MTT also assesses how well that instruction is tested [11, 28].

During experiment planning, it was found that some pieces of agent code can be completely tested (100 per cent code coverage) at either the unit level or the agent level (see Section 5.2.1, Test design technique):

•
An agent behaviour with a one-way communication schema (it only sends or only receives messages).
•
An agent implementing the predefined JADE interaction protocols.

These dual-test-level pieces of code are communication-oriented. They would normally be tested at the agent level, but UJade's capabilities allow them to be tested at the unit level as well. Thus, an additional research question was raised:

RQ2:
Which of the test levels, the unit-level with UJade or the agent-level with JAT, requires less effort to write tests for the dual-test-level codes?

5.1 Experimental procedure

The research questions were answered as follows. First, the dual-test-level cases (RQ2) were examined: all agent code with dual-test-level possibilities was identified, and the associated tests were implemented at both levels (the unit level with UJade and the agent level with JAT). The construction effort of the resulting tests was then assessed and analysed to determine the least effort-consuming way to test these cases.

Afterwards, the two main research questions (RQ1.1 and RQ1.2) were explored. All the agents of the selected systems were tested at both the unit and agent levels, using UJade and JAT, respectively. The resulting test cases were then grouped into three categories:

•
The UJade category: contains the unit-level test cases (implemented using UJade).
•
The JAT category: contains the agent-level test cases (implemented using JAT).
•
The JTF category: contains the test cases of both levels. To avoid test case duplication caused by the dual-test-level code, the findings of the preceding analysis (RQ2) were used to determine which test level is less effort-consuming and, consequently, which test cases to retain.

Each of these categories was evaluated using the mutation testing technique. This technique consists of generating faulty versions of the agents under test (called mutants) by deliberately introducing small syntactic changes (mutations) into them; each category's test suite is then executed on these mutants to check whether it detects them. If any test fails on a mutant, the mutant is considered killed; otherwise, it is said to be live. The ratio of killed mutants to all mutants is the mutation score, or mutation adequacy [20, 50]. A test suite with a high score is considered to have strong testing power and high fault-detection effectiveness [11, 23, 29].

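A minimal sketch of the mutation-score computation just described, in plain Java with no PIT dependency. Tests are modelled as predicates that return true when they pass on a given program version; the subject is a hypothetical `price <= budget` check, and its three hand-written mutants only imitate PIT's Conditionals Boundary, Negate Conditionals and Remove Conditionals operators.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Mutation score = killed mutants / all mutants. A mutant is killed as soon as
// one test of the suite fails on it.
class MutationScore {
    static <M> double score(List<M> mutants, List<Predicate<M>> suite) {
        if (mutants.isEmpty()) {
            throw new IllegalArgumentException("no mutants");
        }
        int killed = 0;
        for (M mutant : mutants) {
            for (Predicate<M> test : suite) {
                if (!test.test(mutant)) { // the test fails on this mutant
                    killed++;
                    break;
                }
            }
        }
        return (double) killed / mutants.size();
    }

    // Toy subject: the original code is "price <= budget"; three mutants of it.
    static List<BiPredicate<Integer, Integer>> affordabilityMutants() {
        BiPredicate<Integer, Integer> boundary = (price, budget) -> price < budget; // Conditionals Boundary
        BiPredicate<Integer, Integer> negated = (price, budget) -> price > budget;  // Negate Conditionals
        BiPredicate<Integer, Integer> removed = (price, budget) -> true;            // Remove Conditionals
        return Arrays.asList(boundary, negated, removed);
    }
}
```

A suite with a single below-budget case (5, 10) kills only the negated mutant (score 1/3); adding an at-budget case (10, 10) and an over-budget case (11, 10) kills all three (score 1.0), which is the kind of difference in testing power the mutation score is meant to expose.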
5.2 Testing strategy

To establish solid ground for this empirical analysis and to reduce threats to its internal validity, a unified testing strategy was adopted, in which:

•
A common test design technique, based on the McCabe structured test approach [42], was defined to drive test-case designing.
•
The test completion criterion was set to 100% McCabe (complexity) coverage (i.e., all basis paths).
•
The EclEmma10 tool was chosen to compute the agent’s code coverage.
•
The PIT [27] Tool11 was chosen to execute the mutation test.

Additionally, to quantify the effort of writing tests (test construction effort), the definition of [7] was adopted, in which the size of the test suite is considered a strong indicator of testing effort. Bruntink and van Deursen proposed two metrics to measure test size: the number of test lines of code (TLOC) and the number of JUnit assert methods (NbAssert) used to test a class. However, this work was originally proposed for the object-oriented context; to handle the specificities of the agent-oriented paradigm, two additional metrics were adopted: the number of test cases (NTC) and the number of mocks used to test an agent (NbMock: the number of JAT mocks plus PowerMockito spies and mocks, Eq. (1)). Furthermore, the definition of the number of assertions (NbAssert) was slightly modified to account for JTF's capabilities: besides formulating assertions on an agent's inner state, JTF also allows the agent's inner logic to be verified. Therefore, the number of Mockito verify instructions called in an agent's test is also counted in the NbAssert metric (Eq. (2)).

$\displaystyle\text{NbMock}=\text{Nb\_JAT\_Mock}+\text{Nb\_Spy}+\text{Nb\_Mock}$ (1)

$\displaystyle\text{NbAssert}=\text{Nb\_Assert}+\text{Nb\_Verify}$ (2)

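The paper does not describe how the four metrics were counted, so the following is only a naive, text-based sketch of Eqs. (1) and (2), under loudly stated assumptions: NTC is the number of `@Test` annotations, TLOC is the number of non-blank lines, Nb_Assert counts JUnit-style `assert...(` calls, Nb_Verify counts Mockito `verify(` calls, Nb_Spy and Nb_Mock count `spy(` and `mock(` calls, and the number of JAT mock agents is supplied by the caller because it cannot be recognised textually here. `TestSizeMetrics` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive extraction of the test-size metrics from one test class's source text.
class TestSizeMetrics {
    final int ntc;
    final int tloc;
    final int nbAssert;
    final int nbMock;

    TestSizeMetrics(String source, int nbJatMocks) {
        ntc = count(source, "@Test\\b");
        int lines = 0;
        for (String line : source.split("\n")) {
            if (!line.trim().isEmpty()) {
                lines++;
            }
        }
        tloc = lines;
        // Eq. (2): NbAssert = Nb_Assert + Nb_Verify
        nbAssert = count(source, "\\bassert\\w*\\s*\\(") + count(source, "\\bverify\\s*\\(");
        // Eq. (1): NbMock = Nb_JAT_Mock + Nb_Spy + Nb_Mock
        nbMock = nbJatMocks + count(source, "\\bspy\\s*\\(") + count(source, "\\bmock\\s*\\(");
    }

    private static int count(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }
}
```

A textual count like this ignores comments and helper methods; a parser-based count would be more robust, but the sketch is enough to show how the two modified metrics aggregate their components.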
5.2.1 Test design technique

In the McCabe structured testing technique, also known as basis path testing, the control flow structure of the program is used to establish path coverage criteria [55]. The McCabe technique was chosen because it promotes analytical rather than arbitrary test case design, since the focus is on program logic; it requires fewer test cases than the branch testing technique; and it covers every statement of the program at least once [4, 55].
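As an illustration of the completion criterion, the sketch below computes McCabe's cyclomatic complexity V(G) = E − N + 2 for a single connected control-flow graph given as an edge list; V(G) is also the number of basis paths to cover. The example graph is hypothetical: node 0 is the entry, 1 a `while` test, 2 an `if` test, 3 and 4 its two branches, and 5 the exit.

```java
import java.util.HashSet;
import java.util.Set;

// McCabe cyclomatic complexity of one connected control-flow graph:
// V(G) = E - N + 2, with E edges and N nodes (one connected component).
class McCabe {
    static int cyclomatic(int[][] edges) {
        Set<Integer> nodes = new HashSet<>();
        for (int[] edge : edges) {
            nodes.add(edge[0]);
            nodes.add(edge[1]);
        }
        return edges.length - nodes.size() + 2;
    }
}
```

For the edges {0,1}, {1,2}, {1,5}, {2,3}, {2,4}, {3,1}, {4,1}, V(G) = 7 − 6 + 2 = 3, i.e. two decisions plus one, so three basis paths (for example 0-1-5, 0-1-2-3-1-5 and 0-1-2-4-1-5) and therefore three test cases are needed.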

To ensure consistent test case design across all agents of the systems under test (SUT), a set of guidelines was defined for each testing level. At the unit level, the following elements of the agent under test (AUT) are tested: the agent's methods, behaviours, sub-behaviours and behaviour methods, using the guidelines below (Listing 11).

Listing 11.

Agent’s units test steps.

Listing 12.

Agent-level test steps.

At the agent level, since behaviours are the main building blocks of JADE agents [3], all possible interaction scenarios are identified (by following the McCabe technique) for each behaviour of the agent under test that contains communication code, and are tested using JAT mock agents. Each interaction scenario (scenario under test, ScUT) is tested following the guidelines in Listing 12.

The dual-test-level situations were identified while defining the guidelines. The first arises in behaviours that are mainly communication-oriented and whose communication schema is simple: one-way communication only, either receiving or sending messages. Such behaviours would be tested at the agent level, like any other interaction behaviour, following the guidelines of Listing 12. However, they can also be tested at the unit level (lines 19–24, Listing 11), either by faking message reception and then testing its effects on the agent's inner state (for message reception), or by verifying the arguments of the agent's send method calls (for message sending).
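The unit-level option for a receive-only behaviour can be sketched as follows. This is a hypothetical, JADE-free analogue rather than UJade code: `Mailbox`, `PriceListener`, `FakeMailbox` and the "item:price" content format are assumptions. The fake mailbox replays canned messages, standing in for the mocked message reception, and the test then checks the effect on the agent's beliefs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Source of incoming messages; returns null when no message is waiting.
interface Mailbox {
    String receive();
}

// Hypothetical one-way (receive-only) behaviour: each action() call consumes
// at most one message and stores the announced price as a belief.
class PriceListener {
    final Map<String, Integer> beliefs = new HashMap<>();
    private final Mailbox mailbox;

    PriceListener(Mailbox mailbox) {
        this.mailbox = mailbox;
    }

    void action() {
        String content = mailbox.receive();
        if (content == null) {
            return; // in a JADE behaviour, block() would typically be called here
        }
        String[] parts = content.split(":");
        beliefs.put(parts[0], Integer.parseInt(parts[1]));
    }
}

// Test double replaying canned messages, in place of real message reception.
class FakeMailbox implements Mailbox {
    private final Deque<String> queued = new ArrayDeque<>();

    FakeMailbox(String... messages) {
        for (String m : messages) {
            queued.add(m);
        }
    }

    @Override
    public String receive() {
        return queued.poll();
    }
}
```

Calling `action()` twice on a listener fed with the single message "book:20" should leave exactly one belief, book = 20; the second call finds no message and changes nothing.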

The second dual-test-level situation concerns tests of JADE agent interaction protocols. To relieve developers of the burden of implementing protocols and checking message-flow consistency, the JADE API12 provides a set of ready-made interaction protocol classes (defined as agent behaviours) that developers can extend by overriding only the methods responsible for preparing the messages to be sent (prepareRequests(), prepareResultNotification(), prepareCfps(), etc.) or for processing the received messages (handleAgree(), handleInform(), handleAllResponses(), etc.), depending on the interaction situation. These protocols can be tested either at the agent level, with JAT mocks like any other interaction behaviour, or at the unit level, where the overridden methods are tested like any other behaviour method.
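At the unit level, an overridden protocol hook is simply called with canned replies. The sketch below is hypothetical: JADE's real handleAllResponses() takes `Vector` parameters and works on ACL messages, whereas `Reply` and `CheapestBidInitiator` are simplified stand-ins showing a reverse-auction style selection (accept the cheapest proposal, reject the others, ignore refusals).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified reply from a contract-net round.
class Reply {
    final String sender;
    final String performative;
    final int price;

    Reply(String sender, String performative, int price) {
        this.sender = sender;
        this.performative = performative;
        this.price = price;
    }
}

// Hypothetical initiator whose hook mirrors an overridden handleAllResponses().
class CheapestBidInitiator {
    final List<String> accepted = new ArrayList<>();
    final List<String> rejected = new ArrayList<>();

    void handleAllResponses(List<Reply> responses) {
        Reply best = null;
        for (Reply r : responses) {
            if ("PROPOSE".equals(r.performative) && (best == null || r.price < best.price)) {
                best = r;
            }
        }
        for (Reply r : responses) {
            if (!"PROPOSE".equals(r.performative)) {
                continue; // refusals are neither accepted nor rejected
            }
            if (r == best) {
                accepted.add(r.sender);
            } else {
                rejected.add(r.sender);
            }
        }
    }
}
```

A unit-level test feeds, for example, proposals of 120 and 90 plus one refusal, and asserts that only the 90 bidder is accepted and only the 120 bidder is rejected, without running any agent or mock agent.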

5.3 Subject systems

Seven multi-agent systems (MAS) were selected as the sample for the experiment (Table 1). Four of them come from GitHub repositories (MAS 1, 3, 6 and 7). MAS 2 was developed as a case study in the JAT paper [13], MAS 4 comes from the JADE platform distribution package, and MAS 5 comes from the Google Code project archive. Table 1 gives some basic statistics about these systems: for each system, the number of agents, the total lines of agent code and the cyclomatic complexity of the agent code (CC, calculated by EclEmma). Systems 1, 2 and 7 are relatively small, while systems 3 and 5 are medium-sized; each of these systems has two or three agents. System 6 is the largest and most complex one, with 9 agents and a total cyclomatic complexity of 206.

Table 1
The multi-agent systems under study

MAS	System	Agents	Lines of code (LOC)	Cyclomatic complexity (CC)
Sys 1	Auction System -1-	2	322	29
Sys 2	Auction System -2-	3	397	47
Sys 3	Auction System -3-	3	525	79
Sys 4	Book Trading System	2	266	52
Sys 5	Hospital-Appointment Allocation System	2	520	100
Sys 6	Home Automation System	9	1972	206
Sys 7	Treasure Hunt System	2	254	46
	Total	23	4256	559

The first system13 is an auction system (reverse auction). It comprises two agent types, a company agent and a carrier agent (Fig. 6). The company agent looks for an agent to do a job at the lowest possible price, while the carrier agent is a bidder that tries to get the highest possible price for doing the job. When there are multiple bidders, they underbid each other to force the others to forfeit.

The second system (Fig. 7) is another implementation of the previous auction system. It contains three agent types: the enterprise agents sell goods, and the bargainer agent wants to buy an item at the lowest possible price. The difference from the first system is that, once the bargainer has identified the candidate enterprises, it delegates the price negotiation process to an auction agent.

Fig. 6.

Auction system #1.

Fig. 7.

Auction system #2.

The third system14 implements a first-price sealed-bid auction (Fig. 8), in which bidders place their bids in sealed envelopes and simultaneously send them to an auctioneer, who announces the bidder with the highest bid as the winner. The system comprises three agent types: Auctioneer, BidderHuman and BidderComp. The auctioneer agent is responsible for the main auction events. BidderHuman and BidderComp are both bidder agents: the BidderHuman agent is controlled by a human user via a GUI (graphical user interface), whereas BidderComp is an autonomous agent.

Fig. 8.

Auction system #3 (first-price sealed-bid).

The fourth system is the aforementioned classic book trading system [8] available with the JADE platform distribution15 (Fig. 9). The system contains two agent types: BookSeller and BookBuyer. The BookBuyer agent tries to buy a book at a low price from one of the existing BookSeller agents.

Fig. 9.

Book trading system.

Fig. 10.

Hospital-appointment allocation system.

Fig. 11.

Home automation architecture.

The fifth system16 is a hospital-appointment allocation system (Fig. 10). It has two types of agents: a hospital agent responsible for appointment management and a patient agent that tries to get an appointment that matches its preferences.

The sixth system17 is a home automation solution in which the JADE framework acts as an intermediate layer responsible for managing interactions between physical hardware devices, such as sensors and actuators (Fig. 11). Each physical device (button, light bulb, receivers, brightness or temperature sensors, infrared remote control, etc.) is associated with an agent that acts as a wrapper for the device, abstracting how to interact with it. The hardware devices are physically connected to a set of interconnected microcontrollers that together form a mesh network. The system architecture consists of a building with rooms, each of which may contain one or more devices.

The current system version comprises nine (9) agent types: a demo agent responsible for launching the system, a controller agent that interacts with the user (displaying the system state and receiving user commands), a room agent, a building agent, a bulb agent, a toggle switch agent, two sensor agents (temperature and light sensors) and a MeshNet gateway agent that interconnects the software and the hardware.

The last system18 simulates a treasure hunt game (Fig. 12). The system comprises two agents: a “game master” that hides a treasure on a grid and a “player” agent that tries to find the treasure's location. At each run cycle, the player agent moves by one square on the grid, and the game master tells it whether it is getting closer to or further away from the treasure.

Fig. 12.

Treasure hunt system.

6. Experimental results

6.1 Answering the dual-test-level question

To answer RQ2, the code of the 23 agents was reviewed. As a result, 30 dual-test-level cases were identified. To test them, 163 test cases were developed using both solutions: 126 at the unit level using UJade and 37 at the agent level with JAT. Table 2 gives the distribution of test cases per dual-test-level case type.

Table 2
Test cases per dual-test-level case type

Dual-test-level case type	Cases (N)	Unit level	Agent level
One-way communication	3	3	3
Interaction protocols	27	123	34
Sum	30	126	37

The collected data show that one-way communication cases required fewer test cases to be covered (one test case per dual-test-level case per level) than the interaction protocol case type. The reason is that, in the former, the communication schema is simple (only sending or only receiving messages) and can easily be tested either at the unit level using UJade (by mocking message reception or spying on the agent to verify message sending) or at the agent level with JAT (by implementing a simple reactive JAT mock agent).

However, the interaction protocol cases required more test cases, nearly 6 per case. Moreover, the number of test cases developed at the unit level was more than three times the number developed at the agent level (123 vs 34). This is due to the way this case type was tested: with UJade at the unit level, each method of the interaction protocol was tested separately, and the protocols implemented 3 methods on average; with JAT at the agent level, by contrast, the interaction protocol was tested as one entity via JAT mocks, each test case verifying a possible interaction scenario that may cover several methods at once.

To ease the comparison of test effort across levels, test effort was modelled as a weighted sum of the four adopted metrics (Eq. (3)). The function's weights were determined using the Analytic Hierarchy Process (AHP) rating technique [49], chosen because of its low computation time and because the consistency of the adopted ratings can easily be checked. All AHP-related calculations were performed using the PriEsT19 tool (Priority Estimation Tool) [51]. Table 3 presents the chosen pairwise comparison matrix. The consistency ratio of the resulting weights is 0.009, below the recommended threshold of 0.1 [49]. It is worth mentioning that AHP proposes a fundamental comparison scale of nine points; only a 5-point scale (1 to 5) was used here, since levels 7 to 9 are reserved for comparison decisions based on demonstrated claims and facts, which is not the case in the current study.

$\displaystyle\textit{Test\_Effort}=0.449\times\textit{TLOC}+0.235\times\textit{NTC}+0.235\times\textit{NbMock}+0.081\times\textit{NbAssert}$ (3)

Table 3

Analytic Hierarchy Process (AHP) pairwise comparison matrix

	Number of test-cases (NTC)	Test lines of code (TLOC)	Number of mocks (NbMock)	Number of assertions (NbAssert)
Number of test-cases (NTC)	–	0.5	1	3
Test lines of code (TLOC)	2	–	2	5
Number of mocks (NbMock)	1	0.5	–	3
Number of assertions (NbAssert)	0.33	0.2	0.33	–
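To make the link between Table 3 and the weights of the effort function explicit, the sketch below derives AHP priorities with the row geometric-mean method, with the diagonal entries set to 1 as AHP requires. This prioritization method is an assumption: PriEsT offers several, so the values may differ from the reported ones in the third decimal. With Table 3 (order NTC, TLOC, NbMock, NbAssert) it yields approximately 0.235, 0.449, 0.235 and 0.08, so TLOC, rated most important in every comparison, receives the largest weight.

```java
// AHP priority weights from a pairwise comparison matrix using the
// row geometric-mean approximation: w_i = (prod_j a_ij)^(1/n), then normalise.
class Ahp {
    static double[] weights(double[][] a) {
        int n = a.length;
        double[] w = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            double product = 1;
            for (int j = 0; j < n; j++) {
                product *= a[i][j];
            }
            w[i] = Math.pow(product, 1.0 / n);
            sum += w[i];
        }
        for (int i = 0; i < n; i++) {
            w[i] /= sum; // weights sum to 1
        }
        return w;
    }
}
```

Usage: `Ahp.weights(new double[][] { {1, 0.5, 1, 3}, {2, 1, 2, 5}, {1, 0.5, 1, 3}, {0.33, 0.2, 0.33, 1} })`.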

6.1.1 One-way communication

Figure 13 shows the measured values of the test-effort metrics for the three one-way communication cases at each level. The overall effort values obtained from Eq. (3) are shown as a separate bar (green); the locations of the cases in the systems under test (SUTs) are given in Table 4. Testing one-way communication cases at the unit level required less effort than at the agent level. TLOC with UJade is always lower than TLOC with JAT, because JAT mocks are JADE agents that must be coded manually, whereas UJade mocks and spies are generated automatically (by PowerMockito) and less code is required to define their behaviour (method stubbing). Also, the number of assertions is generally higher at the unit level than at the agent level, since UJade allows more verification possibilities than JAT: with JAT, only message conformance and the effects of messages on the agent's beliefs can be verified, whereas with UJade the agent's inner-actions can be verified as well.

Table 4
One-way communication cases

MAS	Agent	Case
Sys 3	BidderComp	C 1
	BidderHuman	C 2
Sys 5	HospitalAgent	C 3

Table 5 presents the average values of the test-effort metrics for the three one-way communication cases. The data show that testing one-way communication code at the unit level with UJade reduces the number of test lines of code by 26% and provides better verification capabilities, since more assertion and verification instructions can be formulated. However, this comes at a cost: 33% more mocks must be defined than at the agent level. Overall, the effort is reduced by 24% when UJade is used instead of JAT. Owing to the low number of cases (only 3), the significance of these findings could not be verified statistically.

Table 5

Average test-effort measurement values (one-way communication)

Test framework	TLOC	NbAssert	NbMock	NTC	Test effort
UJade	30.33	2.00	2.67	1.00	14.65
JAT	41.00	1.00	2.00	1.00	19.20
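The reported totals can be checked with a short sketch of the weighted-sum effort function. The weight assignment used here (TLOC 0.449, NTC 0.235, NbMock 0.235, NbAssert 0.081) is the one produced by the Table 3 ratings; applied to the rounded Table 5 averages it gives about 14.64 for UJade and 19.20 for JAT (within rounding of the reported 14.65 and 19.20) and a reduction of about 24%, and applied to the Table 7 averages it gives about 37.41 and 48.24.

```java
// Test effort as the weighted sum of the four test-size metrics.
class TestEffort {
    static double effort(double ntc, double tloc, double nbMock, double nbAssert) {
        return 0.235 * ntc + 0.449 * tloc + 0.235 * nbMock + 0.081 * nbAssert;
    }

    // Relative reduction of unit-level effort with respect to agent-level effort.
    static double reduction(double unitEffort, double agentEffort) {
        return 1.0 - unitEffort / agentEffort;
    }
}
```

For example, `TestEffort.effort(1.00, 30.33, 2.67, 2.00)` reproduces the UJade row of Table 5 up to rounding.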

Fig. 13.

One-way communication test effort’s metrics measurements (unit vs agent level).

Fig. 14.

Interaction protocol cases: Test effort’s metrics measurements (unit vs agent level).

Fig. 15.

Interaction protocol cases: The overall test effort values (unit vs agent level).

6.1.2 Interaction protocol

Considering the interaction protocol cases, the measured values of the test-effort metrics for the 27 cases at both levels are presented in Fig. 14 (Table 6 shows the locations of these cases in the SUTs). For lack of space, the overall test effort values are shown in a separate chart (Fig. 15).

Table 6
Interaction protocol cases

MAS	Agent	Case
Sys 1	CarrierAgent	C1-C2
	CompanyAgent	C3
Sys 5	PatientAgent	C4
Sys 6	Building	C5
	Bulb	C6-C9
	Controller	C10
	LightSensor	C11-C15
	MeshNetGateway	C16
	Room	C17-C18
	TempSensor	C19-C23
	ToggleSwitch	C24-C27

The charts suggest that the test efforts of interaction protocol cases follow the same trends as those of one-way communication cases: lower TLOC and higher NbAssert values at the unit level (as mentioned previously, JAT requires more code to implement mocks, and only assertions on the messages and beliefs of the agent under test can be formulated).

However, unlike the previous dual-test-level cases, the interaction protocols required more test cases at the unit level than at the agent level, while the number of mocks needed was nearly the same. The exception is case 3 (C3), where the complexity of the interaction scenarios (high TLOC and NbAssert) required 17 mocks with JAT against 9 with UJade. The protocol implemented in this case is the iterated contract net protocol, used by the company agent (the agent under test) to interact with multiple bidders, which are supposed to underbid each other in the hope that the others forfeit. Testing it required a high number of JAT mock agents (17) to cover the different interaction scenarios, whereas fewer mocks (only 9) were required at the unit level, since UJade provides a set of functionalities that help to directly call and test the interaction protocol methods (the methods that handle message reception or prepare messages for sending).

Table 7 presents the average values of the test-effort metrics. Testing the interaction protocols at the unit level is more efficient than at the agent level: less test code is required (25% less) and better verification capabilities are provided, with nearly the same number of mocks. However, about 3.6 times as many test cases must be implemented as with JAT (4.56 vs 1.26 per case). The overall test effort is reduced by 22%.

Table 7

Average test-effort metric values (interaction protocols)

Test framework	NTC	TLOC	NbAssert	NbMock	Test effort
UJade	4.56	76.48	10.59	4.85	37.41
JAT	1.26	102.67	8.07	5.07	48.24

To test the significance of this finding, a lower-tailed Mann-Whitney U test was performed on the overall test effort values with the following hypotheses (alpha $=$ 0.05):

H0:

Testing interaction protocols at the unit level is not less effort-consuming than at the agent level.

H1:

Testing interaction protocols at the unit level is less effort-consuming than at the agent level.

The statistical test was conducted using the XLSTAT20 tool. Table 8 presents the results. The $p$-value ($<$ 0.0001) is lower than the significance level alpha (0.05); thus the null hypothesis H0 is rejected and the alternative hypothesis H1 is accepted. Testing interaction protocols at the unit level is therefore less effort-consuming, and the difference is statistically significant.

Table 8

Mann-Whitney test results

U	155
Expected value	364.500
Variance (U)	3333.481
$p$ -value (one-tailed)	$<$ 0.0001
alpha	0.05
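The quantities in Table 8 follow the standard large-sample form of the test, sketched below with midranks for ties and a tie-corrected variance. With n1 = n2 = 27, E[U] = 27 × 27 / 2 = 364.5, matching Table 8; without ties the variance would be 27 × 27 × 55 / 12 = 3341.25, so the slightly lower reported variance (3333.481) is consistent with a tie correction. XLSTAT may compute the p-value differently (for example exactly), so the sketch stops at the z statistic rather than reproducing the tool's p-value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Mann-Whitney U for samples x and y (both non-empty) with the normal approximation.
// U counts pairs in which an x value exceeds a y value (ties count 1/2), so a
// small U supports the lower-tailed alternative "x tends to be lower than y".
class MannWhitney {
    final double u;
    final double expected;
    final double variance;
    final double z;

    MannWhitney(double[] x, double[] y) {
        int n1 = x.length, n2 = y.length, n = n1 + n2;
        double[] sorted = new double[n];
        System.arraycopy(x, 0, sorted, 0, n1);
        System.arraycopy(y, 0, sorted, n1, n2);
        Arrays.sort(sorted);

        Map<Double, Double> midrank = new HashMap<>();
        double tieTerm = 0; // sum of (t^3 - t) over groups of tied values
        int i = 0;
        while (i < n) {
            int j = i;
            while (j < n && sorted[j] == sorted[i]) {
                j++;
            }
            int t = j - i;
            midrank.put(sorted[i], i + (t + 1) / 2.0); // mean of 1-based ranks i+1 .. j
            tieTerm += (double) t * t * t - t;
            i = j;
        }

        double rankSumX = 0;
        for (double v : x) {
            rankSumX += midrank.get(v);
        }
        u = rankSumX - n1 * (n1 + 1) / 2.0;
        expected = n1 * (double) n2 / 2.0;
        variance = n1 * (double) n2 / 12.0 * ((n + 1) - tieTerm / ((double) n * (n - 1)));
        z = (u - expected) / Math.sqrt(variance);
    }
}
```

Here x would hold the 27 unit-level effort values and y the 27 agent-level ones; a strongly negative z corresponds to unit-level efforts that tend to be lower.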

From the two previous analyses’ results, it can be concluded that testing one-way communication schemas and interaction protocols using UJade is less effort-requiring than using JAT.

6.2 Answering the main research questions

To answer the main research questions (RQ1.1 and RQ1.2), 431 test cases (360 at the unit level and 71 at the agent level) were developed to test the 23 agents under test. These test cases were divided into three categories. The first category is the UJade test suite, consisting of 360 unit-level test cases. The second is the JAT test suite, with 71 test cases, representing all the cases that can be implemented with JAT to test the 23 agents. The third is the JTF category, with 394 test cases (360 at the unit level and 34 at the agent level); it results from aggregating the previous two categories, without duplication, into one complete solution (JTF). As suggested by the answer to the previous research question, the dual-test-level code was tested at the unit level.

Table 9 shows the code coverage (instruction and complexity) of each category's test suite. The coverage data need to be interpreted in light of the following facts. Firstly, EclEmma uses a modified version of the cyclomatic metric (the strict cyclomatic metric), which differs from the one used for test case design (the ordinary cyclomatic metric): the strict metric takes into account the side effect of Boolean operators in binary decisions, such as “if” and “while” statements, incrementing the complexity by one for each Boolean operator. Secondly, EclEmma does not count exception handling (try/catch blocks) as branches when computing complexity.21 Lastly, the test case design guidelines specify that empty catch blocks, or catch blocks with a single instruction that prints the exception trace to the console (System.out), are not tested. This decision was based on the assumption that such blocks are not worth testing, since they contain no agent inner-action or belief manipulation to verify.

The first fact explains why complexity coverage is not 100% for some agents, even though the implemented test cases were designed to meet this completion criterion. The BookBuyerAgent is one of these agents: its JTF instruction coverage is 100%, but its complexity coverage is 92%. The last two facts explain why, in the cases tagged with (*), complexity coverage is 100% but instruction coverage is not.
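The gap described by the first fact can be illustrated with a toy count that follows the description above; it is a simplification, not EclEmma's actual bytecode-level algorithm. Each decision is represented by the number of Boolean operators (&&, ||) in its condition: the ordinary count is decisions + 1, while the strict count adds one per Boolean operator.

```java
// Toy comparison of the ordinary and strict cyclomatic counts for one method.
// operatorsPerDecision[k] = number of && / || operators in the k-th condition.
class ComplexityCount {
    static int ordinary(int[] operatorsPerDecision) {
        return operatorsPerDecision.length + 1;
    }

    static int strict(int[] operatorsPerDecision) {
        int complexity = operatorsPerDecision.length + 1;
        for (int ops : operatorsPerDecision) {
            complexity += ops; // each Boolean operator adds one
        }
        return complexity;
    }
}
```

For a hypothetical method with `if (a && b)` and `while (c)`, the ordinary count is 3 and the strict count is 4, so a suite designed to cover the 3 ordinary basis paths can satisfy the design criterion while the coverage tool still reports less than 100% complexity coverage.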

Table 9
Agent’s code coverage JAT vs UJade vs JTF

MAS	Agent	Nbr instr	Instr. cov. JAT	Instr. cov. UJade	Instr. cov. JTF	Complexity	Compl. cov. JAT	Compl. cov. UJade	Compl. cov. JTF	Tag
Sys 1	CarrierAgent	124	85%	98%	98%	7	57%	86%	86%
	CompanyAgent	685	91%	99%	99%	22	86%	100%	100%	*
Sys 2	AuctionAgent	401	100%	61%	100%	12	100%	58%	100%
	BargainerAgent	462	97%	99%	99%	17	82%	100%	100%	*
	EnterpriseAgent	436	99%	89%	99%	18	94%	83%	94%
Sys 3	Auctioneer	746	96%	93%	100%	44	75%	86%	93%
	BidderComp	330	96%	55%	99%	14	93%	79%	100%	*
	BidderHuman	398	96%	99%	99%	21	95%	100%	100%	*
Sys 4	BookBuyerAgent	522	92%	88%	100%	38	74%	76%	92%	-
	BookSellerAgent	242	96%	60%	99%	14	86%	64%	93%
Sys 5	HospitalAgent	612	99%	61%	99%	35	80%	54%	83%
	PatientAgent	1118	96%	70%	98%	65	74%	74%	91%
Sys 6	Building	268	77%	95%	96%	14	71%	100%	100%	*
	Bulb	957	87%	96%	96%	42	74%	95%	95%
	Controller	216	97%	97%	97%	9	89%	100%	100%	*
	LightSensor	909	96%	97%	97%	35	74%	89%	91%
	MeshNetGateway	368	86%	96%	96%	17	94%	100%	100%	*
	Room	494	93%	97%	97%	22	77%	100%	100%	*
	TempSensor	930	97%	97%	97%	31	81%	97%	97%
	ToggleSwitch	685	97%	97%	97%	28	79%	96%	96%
	Demo	420	0%	94%	94%	8	0%	100%	100%	*
Sys 7	Player	362	89%	100%	100%	28	82%	100%	100%
	GameMaster	260	94%	100%	100%	18	89%	100%	100%

The collected data show that merging UJade and JAT (i.e., JTF) results in better code coverage than JAT or UJade alone; in the worst case, JTF has the same code coverage as JAT (for 2 agents) or UJade (for 14 agents).

As mentioned earlier, coverage metrics can be misleading indicators of test quality. To overcome this drawback, mutation testing was executed on the test suites of the three categories using the PIT22 tool. Following the finding of [11] that the level of fault revelation of tests (test effectiveness) depends highly on the strength of the mutation test, a list of 15 mutation operators (mutators) was retained. Table 10 lists the retained mutators;23 a more detailed description of these mutators is available in [26].

The first 13 mutators are defined in the PIT documentation as the “stronger group” configuration [26]. The last two mutator types (Non-Void Method Calls, Inline Constant) were added to better fit the mutator list to the context of the current study. In JADE agents, beliefs are sometimes defined as constant variables with inline initialisation instructions; the “Inline_Constant” operator helps to check whether the tests can detect changes in the agent's initial state.

Combined with “Void_Method_Calls”, “Non_Void_Method_Calls” makes it possible to alter the JADE agent inner-actions that are supposed to be tested by UJade and JAT. For instance, these mutators can: mutate agent behaviour, by removing calls to behaviour adding/removing methods (e.g., addBehaviour(), addSubBehaviour()) or to the agent's life-state manipulation methods (doDelete(), block(), doSuspend(), doWait()), etc.; mutate agent communication, by blocking message sending (removing calls to the send() method), altering the received messages (changing the value returned by the receive() and blockingReceive() methods), and mutating the messages to be sent or received (mutating receivers, message content and performative, etc.); and mutate the agent's interaction with the platform, by altering communication with the DFService (removing the call to register(), mutating the description of the services to advertise, and changing the result of DF searches) and by blocking the creation and launching of new agents.
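The effect of removing a send() call can be pictured at source level with the hypothetical sketch below (PIT itself mutates bytecode, not source). `OrderHandler` records outgoing replies through its own `send()` method, a stand-in for `Agent.send()`; `OrderHandlerMutant` is what the Void Method Calls mutator would leave after deleting the first call. A test that checks only the belief update lets the mutant survive, whereas one that also checks the outgoing reply kills it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical purchase-order handler; send() records reply performatives.
class OrderHandler {
    final Set<String> catalogue;
    final List<String> sent = new ArrayList<>();

    OrderHandler(String... books) {
        catalogue = new HashSet<>(Arrays.asList(books));
    }

    void send(String performative) {
        sent.add(performative);
    }

    void handle(String title) {
        if (catalogue.remove(title)) {
            send("INFORM");
        } else {
            send("FAILURE");
        }
    }
}

// Source-level picture of a Void Method Calls mutant of handle().
class OrderHandlerMutant extends OrderHandler {
    OrderHandlerMutant(String... books) {
        super(books);
    }

    @Override
    void handle(String title) {
        if (catalogue.remove(title)) {
            // send("INFORM") removed by the mutator
        } else {
            send("FAILURE");
        }
    }
}

class KillCheck {
    // Checks only the catalogue update: passes on both versions (mutant survives).
    static boolean stateOnlyTest(OrderHandler h) {
        h.handle("The hobbit");
        return !h.catalogue.contains("The hobbit");
    }

    // Also checks the outgoing reply: fails on the mutant (mutant killed).
    static boolean stateAndMessageTest(OrderHandler h) {
        h.handle("The hobbit");
        return !h.catalogue.contains("The hobbit")
                && h.sent.equals(Collections.singletonList("INFORM"));
    }
}
```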

Table 10

Mutation operators

Mutator	Description
Conditionals Boundary	Replace the relational operators: “ $<$ ” to “ $<=$ ”, “ $<=$ ” to “ $<$ ”, “ $>$ ” to “ $>=$ ”, “ $>=$ ” to “ $>$ ”
Increments	Replace increment instruction with decrement and vice versa.
Invert Negatives	Inverts the negation sign of integer and floating-point numbers.
Math	Replace binary arithmetic operations on either integer or floating-point numbers with another operation: “ $+$ ” to “ $-$ ”, “*” to “/”, “%” to “*”, “&” to “|”, “ $<<$ ” to “ $>>$ ”, “ $>>>$ ” to “ $<<$ ”
Negate Conditionals	Mutate boolean conditionals: “ $==$ ” to “! $=$ ”, “! $=$ ” to “ $==$ ”, “ $<=$ ” to “ $>$ ”, “ $>=$ ” to “ $<$ ”, “ $<$ ” to “ $>=$ ”, “ $>$ ” to “ $<=$ ”
Empty returns	Replace return values with an ‘empty’ value for types: String to “”, Integer, Short, Long, Character, Float, Double to 0, Optional to Optional.empty(), List to Collections.emptyList(), Set to Collections.emptySet().
False Returns	Replace primitive and boxed Boolean return values with false.
True returns	Replace primitive and boxed Boolean return values with true.
Null returns	Replace return values with null.
Primitive returns	Replace int, short, long, char, float and double return values with 0.
Remove Conditionals	Remove conditional statements such that the guarded “if” block is always executed, even when an else block is present.
Experimental Switch	Mutate the switch statement by replacing the default label (if it is used) with the first label of the switch statement, then replacing all the other labels with the old default one.
Void Method Calls	Remove calls to void methods.
Non-Void Method Calls	Remove calls to non-void methods. Their return values are replaced by the Java default value of their return types: false for boolean; 0 for int, byte, short and long; 0.0 for float and double; ‘\u0000’ for char; and null for objects.
Inline Constant	Mutates the inline constant instruction by replacing the literal value assigned to this constant, for instance, for Boolean constant replace the unmutated value true with false and replace false with true.

Using the previous list of mutators, PIT introduced 3794 mutations into the code of the 23 agents, nearly one mutation per LOC (0.89 per LOC). It is worth mentioning that, owing to the presence of flaky tests,24 the mutation test was run several times (10 times), and the mutation scores differed slightly from run to run. However, the standard deviations of the overall mutation scores over these 10 runs (the mutation score of the 7 systems altogether per run) were small (0.64%, 0.66% and 0.80% for the UJade, JAT and JTF test suites, respectively), so this phenomenon can be neglected without affecting the findings of this analysis. These flaky tests are mainly caused by the concurrent execution of the agents' threads (agent under test and mocks) and the JTF test runner thread, and by the Synchro Wait25 routines heavily used in both the JADE platform and JTF. These causes are well-known roots of flakiness in multithreaded systems [22, 33, 39].
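The run-to-run spread reported above can be computed as in the following sketch. The pooling of all agents' mutants into one overall score per run follows the text; whether the study used the sample (n − 1) or population (n) standard deviation is not stated, so the sample form is an assumption, and the numbers in the usage note are illustrative only.

```java
// Overall mutation score per run and its spread across repeated runs.
class RunStats {
    // Pooled score of one run: all killed mutants over all mutants, across agents.
    static double pooledScore(int[] killedPerAgent, int[] mutantsPerAgent) {
        int killed = 0, total = 0;
        for (int i = 0; i < killedPerAgent.length; i++) {
            killed += killedPerAgent[i];
            total += mutantsPerAgent[i];
        }
        return (double) killed / total;
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleStdDev(double[] xs) {
        if (xs.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double m = mean(xs), squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }
}
```

For illustrative overall scores of 0.95, 0.96 and 0.97 over three runs, the sample standard deviation is 0.01, i.e. one percentage point.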

Table 11 presents the results of the best run: for each agent, it gives the number of mutations introduced and the mutation score of each category’s test suite.

The collected data show that the JTF test suites achieved a better mutation score than the JAT-alone or UJade-alone test suites or, in the worst cases, the same score. In 7 cases (tag T), the JTF test suites scored better than both other suites; in 12 cases (tag U), they had the same score as UJade; in one case (tag J), the same score as JAT; and in three cases (tag –), all the suites had the same score. Meanwhile, when comparing JAT and UJade mutation scores, JAT scored better than UJade in 7 cases, whereas UJade scored better than JAT in 13 cases.

Additionally, the obtained mutation scores also validate the quality and rigour of the adopted test design technique: despite the strength of the mutation test conducted (0.89 mutations per LOC and 15 classes of mutation operators), the mutation scores were high, with median mutation scores of 96%, 96% and 99% for the UJade, JAT and JTF test suites, respectively.

Table 11

Mutation test results

MAS	Agent	No. of mutations	Mutation score UJade	Mutation score JAT	Mutation score JTF	Tag
Sys 1	CarrierAgent	122	99%	93%	99%	U
	CompanyAgent	176	99%	93%	99%	U
Sys 2	AuctionAgent	125	56%	100%	100%	J
	BargainerAgent	144	99%	97%	99%	U
	EnterpriseAgent	153	87%	99%	99%	T
Sys 3	Auctioneer	159	92%	97%	99%	T
	BidderComp	67	60%	96%	99%	T
	BidderHuman	72	99%	93%	99%	U
Sys 4	BookBuyerAgent	166	89%	87%	100%	T
	BookSellerAgent	80	35%	96%	99%	T
Sys 5	HospitalAgent	206	50%	99%	100%	T
	PatientAgent	384	69%	98%	98%	T
Sys 6	Building	81	96%	81%	96%	U
	Bulb	306	96%	87%	96%	U
	Controller	84	98%	98%	98%	–
	LightSensor	344	97%	97%	97%	U
	MeshNetGateway	87	94%	89%	94%	U
	Room	171	97%	95%	97%	U
	TempSensor	351	97%	97%	97%	–
	ToggleSwitch	240	98%	98%	98%	–
	Demo	118	93%	0%	93%	U
Sys 7	Player	92	100%	86%	100%	U
	GameMaster	66	100%	91%	100%	U
	Sum	3794
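The median mutation scores quoted for each test suite can be recomputed from the per-agent percentages of Table 11. The minimal Java sketch below assumes the rounded percentages shown in the table (unrounded scores are not reported); the class and field names are illustrative only.

```java
import java.util.Arrays;

/** Sketch: medians of the per-agent mutation scores listed in Table 11. */
public class MedianScores {

    /** Median of an int array; averages the two middle values when the length is even. */
    static double median(int[] values) {
        int[] s = values.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Rounded scores (%) in Table 11 row order, from Sys 1 CarrierAgent to Sys 7 GameMaster.
    static final int[] UJADE = {99, 99, 56, 99, 87, 92, 60, 99, 89, 35, 50, 69,
                                96, 96, 98, 97, 94, 97, 97, 98, 93, 100, 100};
    static final int[] JAT   = {93, 93, 100, 97, 99, 97, 96, 93, 87, 96, 99, 98,
                                81, 87, 98, 97, 89, 95, 97, 98, 0, 86, 91};
    static final int[] JTF   = {99, 99, 100, 99, 99, 99, 99, 99, 100, 99, 100, 98,
                                96, 96, 98, 97, 94, 97, 97, 98, 93, 100, 100};

    public static void main(String[] args) {
        // Prints: UJade 96%, JAT 96%, JTF 99%
        System.out.printf("UJade %.0f%%, JAT %.0f%%, JTF %.0f%%%n",
                median(UJADE), median(JAT), median(JTF));
    }
}
```

With 23 agents, the median is simply the 12th value of each sorted column, which is less sensitive than the mean to outliers such as the 0% JAT score of the Demo agent.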

To determine whether the better performance of JTF compared with JAT and UJade is statistically significant, two upper-tailed Wilcoxon signed-rank tests (statistic V) were performed on the following hypotheses (alpha = 0.05):

Test 1:

H0.1: JTF does not allow a better mutation score than UJade. H1.1: JTF has a better mutation score than UJade.

Test 2:

H0.2: JTF does not allow a better mutation score than JAT. H1.2: JTF has a better mutation score than JAT.

These statistical tests were also conducted using the XLSTAT tool. Tables 12 and 13 present the results of the tests. The p-values, computed using 10,000 Monte Carlo simulations, are lower than the significance level alpha (0.005 and less than 0.0001, respectively). Thus, the null hypotheses H0.1 and H0.2 are rejected, and the alternative hypotheses H1.1 and H1.2 are accepted. This means that JTF achieves better mutation scores than UJade and JAT, and the differences are statistically significant.

Table 12

Wilcoxon signed-rank test 1 results

V	36
Expected value	18.000
Variance (V)	51.000
p-value (one-tailed)	0.005
alpha	0.05

Table 13

Wilcoxon signed-rank test 2 results

V	190
Expected value	95.000
Variance (V)	617.500
p-value (one-tailed)	< 0.0001
alpha	0.05
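The statistics reported in Tables 12 and 13 can be illustrated with a short sketch. The Java code below is a minimal implementation written for illustration, not XLSTAT's algorithm: it drops zero differences, ranks the absolute differences (averaging tied ranks), sums the ranks of the positive differences to obtain V, and computes E(V) = n(n+1)/4 and Var(V) = n(n+1)(2n+1)/24 without tie correction. Instead of Monte Carlo simulation, it enumerates all 2^n sign patterns, which gives the exact one-tailed p-value for small n.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: upper-tailed Wilcoxon signed-rank test with an exact (enumerated) p-value. */
public class WilcoxonSketch {

    /** Non-zero pairs n, statistic V, its mean and variance under H0, exact one-tailed p. */
    record Result(int n, double v, double expected, double variance, double pExact) {}

    static Result test(double[] x, double[] y) {
        // 1. Paired differences x - y; zero differences are discarded.
        List<Double> d = new ArrayList<>();
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) d.add(x[i] - y[i]);
        }
        int n = d.size();
        // 2. Rank |d| in ascending order; tied magnitudes share their average rank.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(Math.abs(d.get(a)), Math.abs(d.get(b))));
        double[] rank = new double[n];
        for (int i = 0; i < n; ) {
            int j = i;
            while (j + 1 < n && Math.abs(d.get(order[j + 1])) == Math.abs(d.get(order[i]))) j++;
            double avg = (i + j + 2) / 2.0; // average of the 1-based ranks i+1 .. j+1
            for (int k = i; k <= j; k++) rank[order[k]] = avg;
            i = j + 1;
        }
        // 3. V = sum of the ranks of the positive differences.
        double v = 0;
        for (int i = 0; i < n; i++) if (d.get(i) > 0) v += rank[i];
        // 4. Mean and variance of V under H0 (no tie correction).
        double expected = n * (n + 1) / 4.0;
        double variance = n * (n + 1) * (2.0 * n + 1) / 24.0;
        // 5. Exact upper-tailed p-value: all 2^n sign patterns are equally likely under H0
        //    (practical only for small n, e.g. n <= 20).
        long total = 1L << n, atLeastAsLarge = 0;
        for (long mask = 0; mask < total; mask++) {
            double s = 0;
            for (int i = 0; i < n; i++) if (((mask >> i) & 1) == 1) s += rank[i];
            if (s >= v - 1e-9) atLeastAsLarge++;
        }
        return new Result(n, v, expected, variance, (double) atLeastAsLarge / total);
    }

    // Rounded JTF and UJade scores (%) in Table 11 row order.
    static final double[] JTF = {99, 99, 100, 99, 99, 99, 99, 99, 100, 99, 100, 98,
                                 96, 96, 98, 97, 94, 97, 97, 98, 93, 100, 100};
    static final double[] UJADE = {99, 99, 56, 99, 87, 92, 60, 99, 89, 35, 50, 69,
                                   96, 96, 98, 97, 94, 97, 97, 98, 93, 100, 100};

    public static void main(String[] args) {
        Result r = test(JTF, UJADE);
        // Prints: n=8 V=36 E(V)=18.0 Var(V)=51.0 exact p=0.0039
        System.out.printf("n=%d V=%.0f E(V)=%.1f Var(V)=%.1f exact p=%.4f%n",
                r.n(), r.v(), r.expected(), r.variance(), r.pExact());
    }
}
```

Applied to the rounded JTF and UJade percentages of Table 11, the sketch finds n = 8 non-zero differences, all positive, hence V = 36, E(V) = 18 and Var(V) = 51, as in Table 12, with an exact p-value of 1/256 ≈ 0.0039. Likewise, the values in Table 13 (E(V) = 95, V = 190) correspond to n = 19 non-zero differences, all positive; Test 2 cannot be recomputed from the rounded percentages alone, because several JTF minus JAT differences round to zero.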

From the aforementioned analysis, it can be concluded, in answer to the main research questions (RQ1.1 and RQ1.2), that the integration of UJade with JAT (i.e., JTF) allows better agent testing than JAT or UJade alone.

7. Threats to validity

Scientific studies are exposed to validity threats and limitations. The experiments in this study were planned and executed carefully to minimise as many threats as possible. For internal validity, attention focused on two phases: test implementation and mutation-test execution. To reduce the threats associated with test writing, a rigorous testing strategy was defined in which the McCabe technique was chosen to ensure the thoroughness and uniformity of tests (each instruction is tested at least once), and a set of guidelines was defined to guide the single test developer in his task. The test developer is a PhD candidate with relevant experience in JADE agents and Java test implementation. Furthermore, all test code was reviewed by the same developer three months later to detect possible errors. The high mutation scores obtained in the second experiment (median scores of 96%, 96% and 99%), despite the strength of the retained mutators list (0.89 mutations per LOC), attest to the quality and rigour of the test implementation activity.
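McCabe's technique [42, 55] builds on the cyclomatic complexity V(G) = E − N + 2P of a unit's control-flow graph (E edges, N nodes, P connected components), which equals the number of linearly independent (basis) paths and hence the number of test cases in a basis-path test set. The minimal Java sketch below computes V(G) for a hypothetical method with a single if/else; the node numbering and class name are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: McCabe's cyclomatic complexity V(G) = E - N + 2P of a control-flow graph. */
public class CyclomaticSketch {

    /** Directed edge of a control-flow graph between two numbered nodes. */
    record Edge(int from, int to) {}

    /** Nodes are inferred from the edges, so every node must appear in at least one edge. */
    static int cyclomatic(List<Edge> edges, int connectedComponents) {
        Set<Integer> nodes = new HashSet<>();
        for (Edge e : edges) {
            nodes.add(e.from());
            nodes.add(e.to());
        }
        return edges.size() - nodes.size() + 2 * connectedComponents;
    }

    public static void main(String[] args) {
        // CFG of a method with one if/else: 1 = condition, 2 = then, 3 = else, 4 = exit.
        List<Edge> cfg = List.of(new Edge(1, 2), new Edge(1, 3), new Edge(2, 4), new Edge(3, 4));
        // Prints: V(G) = 2, i.e. two basis paths and therefore two test cases.
        System.out.println("V(G) = " + cyclomatic(cfg, 1));
    }
}
```

For a single method P = 1, so V(G) also equals the number of binary decisions plus one, which is a quick manual check when designing the tests of an agent behaviour.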

For the mutation-test execution, all phases (mutant generation, test execution and measurement collection) were performed automatically with PIT, which considerably reduced the risks associated with these phases. PIT is a well-documented Java mutation-testing tool that has already proved its efficiency in this field [59, 31, 53, 15].

Construct validity threats were mitigated in three ways: by using the mutation-test technique rather than coverage metrics to evaluate the effectiveness of the JADE Testing Framework (mutation testing is a good indicator of test effectiveness); by adopting a set of standard metrics for code coverage, test effort and mutation score; and by performing all measurement and calculation tasks with automated software tools such as EclEmma, PIT and XLSTAT.

As for conclusion validity, both main conclusions were statistically demonstrated: the greater effectiveness of the JADE Testing Framework compared with JAT or UJade alone, and the lower test-writing effort of UJade compared with JAT for interaction protocols. However, the suitability of UJade compared with JAT for one-way communication testing could not be assessed statistically because of the small number of cases (only three).

The main drawback of this study, which may constrain the generalisability of the obtained results, is the sample size: only 23 agents were tested. To mitigate this limitation, the sample was built from different multi-agent systems of different sizes, ranging from small (treasure hunt system) to medium (home automation system), and with a variety of agent types (reactive, proactive, autonomous, subordinate, etc.). The lack of benchmarks and of open industrial systems was, and still is, a known issue in multi-agent systems research [21, 40, 48]. We believe that the JADE Testing Framework is in its early stages and that further studies are necessary to enhance it and demonstrate its capabilities.

8. Conclusions and future work

Like all human activities, software development is error-prone, and multi-agent systems development is no exception. In this work, agent testing on the JADE platform is investigated. The review of the literature and of open-code repositories revealed that unit testing of this platform’s agents is not well covered: all existing solutions focus only on agent interaction debugging and testing. To the best of our knowledge, no work has so far been proposed to test the units of JADE agents. It is commonly assumed that the JUnit framework can be applied to test JADE agent code like any other Java code, which is wrong: the specificity of JADE agent code makes it hard, and sometimes impossible, to test an agent’s units (inner behaviour, inner state, beliefs, etc.).

In this work, a framework for testing JADE agents at the unit and agent levels was proposed. The JADE Testing Framework (JTF) is the result of integrating two solutions: the JADE Agent Testing (JAT) framework, a known framework for testing JADE agent interactions; and UJade, a new framework built upon JUnit, Mockito and PowerMockito that aims to test JADE agents effectively at the unit level and to enhance the capabilities of the JAT framework. Moreover, the JADE Testing Framework proposes a domain-specific language (DSL) for writing agent tests, developed to facilitate test writing and improve the quality of test code.
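The idea of testing an agent's inner logic in isolation, at the unit level, can be sketched in plain Java. The sketch below is not UJade's API and uses no JADE class: `Outbox`, `BidLogic` and `RecordingOutbox` are hypothetical stand-ins, and a hand-written test double replaces the Mockito and PowerMockito mocks that UJade builds on. It checks the unit's inner state and outgoing messages directly, without starting a platform or a second agent.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of unit-level testing with a test double; all types are hypothetical, not JADE or UJade API. */
public class UnitLevelSketch {

    /** Hypothetical port standing in for the agent's message-sending facility. */
    interface Outbox {
        void send(String receiver, String content);
    }

    /** Hypothetical bidding logic of an agent behaviour, with inner state (lastBid). */
    static class BidLogic {
        static final int INCREMENT = 10;
        private final Outbox outbox;
        private int lastBid;

        BidLogic(Outbox outbox) {
            this.outbox = outbox;
        }

        void onCallForProposal(String auctioneer, int currentPrice, int budget) {
            int bid = currentPrice + INCREMENT;
            if (bid <= budget) {
                lastBid = bid;                              // inner state updated only on a proposal
                outbox.send(auctioneer, "PROPOSE " + bid);
            } else {
                outbox.send(auctioneer, "REFUSE");
            }
        }

        int lastBid() {
            return lastBid;
        }
    }

    /** Hand-written mock: records outgoing messages instead of delivering them. */
    static class RecordingOutbox implements Outbox {
        final List<String> sent = new ArrayList<>();

        @Override
        public void send(String receiver, String content) {
            sent.add(receiver + ":" + content);
        }
    }

    public static void main(String[] args) {
        RecordingOutbox mock = new RecordingOutbox();
        BidLogic logic = new BidLogic(mock);
        logic.onCallForProposal("auctioneer", 50, 100); // affordable: a proposal of 60
        logic.onCallForProposal("auctioneer", 95, 100); // 105 exceeds the budget: a refusal
        // Prints: [auctioneer:PROPOSE 60, auctioneer:REFUSE], lastBid = 60
        System.out.println(mock.sent + ", lastBid = " + logic.lastBid());
    }
}
```

Such a test needs neither a running container nor a mock agent answering over the network, which is the kind of agent-level setup that interaction-oriented tools like JAT require.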

The usability and effectiveness of the JADE Testing Framework were evaluated in realistic cases. For this purpose, an empirical study with two experiments was conducted on a sample of 23 agents from 7 multi-agent projects. In the first experiment, agent code that can be tested at either the unit or the agent level was explored. The results showed that, in these special cases, UJade requires less test-writing effort than JAT. In the second experiment, the quality of the tests developed using the JADE Testing Framework was evaluated using the mutation-test technique; the experiment showed that the JADE Testing Framework enables writing highly effective tests, more effective than those written with UJade or JAT alone.

Looking forward, we intend to enhance this work in a number of ways. We want to carry out an additional study to validate the JADE Testing Framework on other multi-agent system projects. We also plan to investigate agent testability to identify and understand the factors that influence JADE agent testing effort since, during the experiments, we noticed that tests were easy to develop for some agents, whereas for others the task was time- and effort-consuming.

Furthermore, we intend to improve the JADE Testing Framework by: firstly, extending it to support the integration test level; secondly, integrating multithreading and concurrency testing libraries such as ConcurrentUnit26 or Awaitility27 to better handle synchronisation between the JADE Testing Framework test-running thread and the thread of the agent under test, and to reduce the number of flaky tests (the current JADE Testing Framework version uses a time delay, the Java instruction Thread.sleep(), as part of the adopted solution, which can also negatively affect test running time); thirdly, improving the expressiveness of the JADE Testing Framework DSL with more JADE-test-oriented instructions and assertions, to enhance the quality of test code and reduce test-writing effort; and, lastly, integrating JAT4BDI, the JAT version for JADE BDI agents.
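The difference between a fixed Thread.sleep() delay and the condition polling offered by libraries such as Awaitility can be sketched in plain Java. `waitUntil` below is a hypothetical helper, not Awaitility's API: it re-checks a condition at a short interval and returns as soon as the condition holds, so a test waits no longer than needed and fails only after a generous timeout, instead of relying on one fixed delay that may be too short (flaky) or too long (slow).

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Sketch: bounded polling instead of a fixed Thread.sleep() before asserting. */
public class PollingWaitSketch {

    /** Polls condition every pollInterval; true as soon as it holds, false once timeout elapses. */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() - deadline < 0) {  // overflow-safe comparison of nanoTime values
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
        }
        return condition.getAsBoolean();            // final check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean replyReceived = new AtomicBoolean(false);
        // Stand-in for the thread of the agent under test: it replies after some delay.
        Thread agent = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            replyReceived.set(true);
        });
        agent.start();
        boolean observed = waitUntil(replyReceived::get, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("reply observed: " + observed);
        agent.join();
    }
}
```

The timeout only bounds the worst case: when the agent replies quickly the test proceeds immediately, which addresses both the flakiness and the running-time concerns raised about a fixed delay.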

Footnotes

1. https://jade.tilab.com/download/add-ons/.

2. http://github.com/gsi-upm/BeastTool/.

3. https://site.mockito.org/.

4. JUnit 4 (v4.13.1), PowerMockito 2 (v2.0.9) and Mockito 3 (v3.3).

5. https://github.com/powermock/powermock/wiki/Mockito.

6. https://site.mockito.org/.

7. JAT uses JUnit 3 and JADE 3.0.

8. https://assertj.github.io/doc/.

9. http://hamcrest.org/JavaHamcrest/.

10. https://www.eclemma.org/.

11. http://pitest.org/.

12. https://jade.tilab.com/doc/api/index.html.

13. https://github.com/Adrrei/Multi-Agent-Systems-In-JADE.

14. https://github.com/ardiyu07/jade-blind-auction.

15. https://jade.tilab.com/download/jade/.

16. https://code.google.com/archive/p/mascoursework/.

17. https://github.com/AL333Z/JadeHomeAutomation.

18. https://github.com/Syncrossus/JadeTreasureHunt.

19. https://sourceforge.net/projects/priority/.

20. https://www.xlstat.com.

21. https://www.eclemma.org/jacoco/trunk/doc/counters.html.

22. PIT version 1.6.2, https://mvnrepository.com/artifact/org.pitest/pitest-maven/1.6.2.

23. PIT was configured with the mutator options STRONGER, NON_VOID_METHOD_CALLS and INLINE_CONSTS.

24. Tests that non-deterministically pass or fail on the same version of the code.

25. Tests that make asynchronous calls without properly waiting for the calls to return.

26. https://github.com/jhalterman/concurrentunit.

27. http://www.awaitility.org.

Acknowledgments

This work was supported by the General Directorate of Scientific Research and Technological Development (DGRSDT), affiliated with the Ministry of Higher Education and Scientific Research (MESRS) of Algeria.

The authors also wish to thank Francisco Cunha and Carlos Lucena, from the Pontifical Catholic University of Rio de Janeiro, Brazil, for sharing the source code of the JADE Agent Testing framework and the case studies developed for its validation.

Authors’ Bios

	Ayyoub Kalache is an Assistant Professor at the University of Constantine 1, Algeria. He is a member of the research team “DISE” (distributed-intelligent systems engineering) of the ReLa(CS)2 (Research laboratory on computer science complex systems) Laboratory at the University of Oum El Bouaghi, Algeria. He is currently a PhD candidate. His main areas of interest include object- and agent-oriented software engineering, and software quality assessment and assurance.
	Mourad Badri is a full professor at the Department of Mathematics and Computer Science of the University of Quebec, Trois-Rivières, Québec, Canada. He is also the director of the Software Engineering Research Laboratory. He holds a PhD in computer science (software engineering) from the National Institute of Applied Sciences in Lyon, France. His main areas of interest include object and aspect-oriented software engineering, software quality attributes, software quality assurance, software maintenance and evolution as well as aspect mining and refactoring.
	Farid Mokhati is a full professor of Computer Science at the Department of Mathematics and Computer Science of the University of Oum El-Bouaghi in Algeria. He holds a University accreditation (Habilitation Universitaire) in Computer Science (Distributed Artificial Intelligence) awarded by Badji Mokhtar University (Annaba) in Algeria. He is currently the head of the DISE team in the ReLa(CS)2 Laboratory at the University of Oum El Bouaghi, Algeria. His main areas of interest include object- and agent-oriented software engineering and formal methods.
	Mohamed Chaouki Babahenini is a researcher and head of real-time rendering group at LESIA Laboratory. He is also a full professor at the Department of Computer science of the Biskra University in Algeria, where he received a Ph.D. in 2006. His current research interests are real-time rendering, 3D reconstruction, point-based rendering and data mining. He has co-authored many papers in these fields.

References

1. N.A. Bakar and Selamat, Runtime Verification of Multi-agent Systems Interaction Quality, in: Intelligent Information and Database Systems, Berlin, Heidelberg, 2013, pp. 435–444. doi: 10.1007/978-3-642-36546-1_45.

2. N.A. Bakar and Selamat, Agent systems verification: Systematic literature review and mapping, Appl. Intell. 48(5) (May 2018), 1251–1274. doi: 10.1007/s10489-017-1112-z.

3. F.L. Bellifemine, Caire and Greenwood, Developing multi-agent systems with JADE, vol. 7. John Wiley & Sons, 2007.

4. Bharat Kumar, Harish and Sravan Kumar, A Catholic and Enhanced Study on Basis Path Testing to Avoid Infeasible Paths in CFG, in: Global Trends in Information Systems and Software Applications, vol. 270, P.V. Krishna, M.R. Babu and Ariwa, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 386–395. doi: 10.1007/978-3-642-29216-3_42.

5. Brings, Daun, Keller, Aluko Obe and Weyer, A systematic map on verification and validation of emergent behavior in software engineering research, Future Gener. Comput. Syst. 112 (Nov. 2020), 1010–1037. doi: 10.1016/j.future.2020.06.049.

6. Briola, Mascardi and Ancona, Distributed Runtime Verification of JADE Multiagent Systems, in: Intelligent Distributed Computing VIII, Cham, 2015, pp. 81–91. doi: 10.1007/978-3-319-10422-5_10.

7. Bruntink and van Deursen, An empirical study into class testability, J. Syst. Softw. 79(9) (2006), 1219–1232. doi: 10.1016/j.jss.2006.02.036.

8. Caire, JADE Programming-Tutorial-for-beginners, TILAB, formerly CSELT, 2009. [Online]. Available: https://jade.tilab.com/doc/tutorials/JADEProgramming-Tutorial-for-beginners.pdf.

9. Caire, Cossentino, Negri, Poggi and Turci, Multi-Agent Systems Implementation and Testing, Vienna, Austria, 2004, 11.

10. Á. Carrera, C.A. Iglesias and Garijo, Beast methodology: An agile testing methodology for multi-agent systems based on behaviour driven development, Inf. Syst. Front. 16(2) (Apr. 2014), 169–182. doi: 10.1007/s10796-013-9438-5.

11. T.T. Chekam, Papadakis, Le Traon and Harman, An Empirical Study on Mutation, Statement and Branch Coverage Fault Revelation That Avoids the Unreliable Clean Program Assumption, in: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), Buenos Aires, May 2017, pp. 597–608. doi: 10.1109/ICSE.2017.61.

12. Chella, Cossentino, Sabatucci and Seidita, From passi to agile passi: Tailoring a design process to meet new needs, in: Proceedings. IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 2004 (IAT 2004), 2004, pp. 471–474. doi: 10.1109/IAT.2004.1342998.

13. Coelho, Cirilo, Kulesza, von Staa, Rashid and Lucena, JAT: A Test Automation Framework for Multi-Agent Systems, in: 2007 IEEE International Conference on Software Maintenance, 2007, pp. 425–434. doi: 10.1109/ICSM.2007.4362655.

14. Coelho, Kulesza, von Staa and Lucena, Unit testing in multi-agent systems using mock agents and aspects, in: Proceedings of the 2006 International Workshop on Software Engineering for Large-Scale Multi-Agent Systems – SELMAS ’06, Shanghai, China, 2006, p. 83. doi: 10.1145/1138063.1138079.

15. Coles, Laurent, Henard, Papadakis and Ventresque, PIT: a practical mutation testing tool for Java (demo), in: Proceedings of the 25th International Symposium on Software Testing and Analysis, New York, NY, USA, Jul. 2016, pp. 449–452. doi: 10.1145/2931037.2948707.

16. Cortese, Caire and Bochicchio, JADE Test Suite User Guide, TILab, 2005. [Online]. Available: https://jade.tilab.com/doc/tutorials/JADE_TestSuite.pdf.

17. Cossentino, Sabatucci and Chella, Designing JADE systems with the support of CASE tools and patterns, Spec. Issue JADE Telecom Ital. J. EXP Sept., 2003.

18. Cunha, Diniz Da Costa, Viana and C.J. Pereira De Lucena, JAT4BDI: An Aspect-Based Approach for Testing BDI Agents, in: 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), vol. 2, Dec. 2015, pp. 186–189. doi: 10.1109/WI-IAT.2015.121.

19. N.E.H. Dehimi, Mokhati and Badri, Testing HMAS-based applications: An ASPECS-based approach, Eng. Appl. Artif. Intell. 46 (Nov. 2015), 232–257. doi: 10.1016/j.engappai.2015.09.013.

20. R.A. DeMillo, R.J. Lipton and F.G. Sayward, Hints on test data selection: Help for the practicing programmer, Computer 11(4) (Apr. 1978), 34–41. doi: 10.1109/C-M.1978.218136.

21. Dix, Hindriks, Logan and Wobcke, Engineering Multi-Agent Systems (Dagstuhl Seminar 12342), Dagstuhl Rep. 2(8) (2012), 74–98. doi: 10.4230/DagRep.2.8.74.

22. Eck, Palomba, Castelluccio and Bacchelli, Understanding Flaky Tests: The Developer’s Perspective, in: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, New York, NY, USA, 2019, pp. 830–840. doi: 10.1145/3338906.3338945.

23. P.G. Frankl, S.N. Weiss and Hu, All-uses vs mutation testing: An experimental comparison of effectiveness, J. Syst. Softw. 38(3) (Sep. 1997), 235–253. doi: 10.1016/S0164-1212(96)00154-9.

24. Gammie and van der Meyden, MCK: Model Checking the Logic of Knowledge, in: Computer Aided Verification, Berlin, Heidelberg, 2004, pp. 479–483. doi: 10.1007/978-3-540-27813-9_41.

25. J.J. Gómez-Sanz, Botía, Serrano and Pavón, Testing and Debugging of MAS Interactions with INGENIAS, in: Agent-Oriented Software Engineering IX, Berlin, Heidelberg, 2009, pp. 199–212. doi: 10.1007/978-3-642-01338-6_15.

26. Henry Coles, Mutation operators, PIT Mutation Testing, 2020. https://pitest.org/quickstart/mutators/ (accessed Jan. 20, 2021).

27. Henry Coles, Laurent, Henard, Papadakis and Ventresque.

28. Inozemtseva and Holmes, Coverage is not strongly correlated with test suite effectiveness, in: Proceedings of the 36th International Conference on Software Engineering, Hyderabad, India, May 2014, pp. 435–445. doi: 10.1145/2568225.2568271.

29. Just, Jalali, Inozemtseva, M.D. Ernst, Holmes and Fraser, Are mutants a valid substitute for real faults in software testing?, in: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering – FSE 2014, Hong Kong, China, 2014, pp. 654–665. doi: 10.1145/2635868.2635929.

30. Kiczales, Hilsdale, Hugunin, Kersten, Palm and W.G. Griswold, An Overview of AspectJ, in: Proceedings of the 15th European Conference on Object-Oriented Programming, Berlin, Heidelberg, 2001, pp. 327–353.

31. Kintis, Papadakis, Papadopoulos, Valvis, Malevris and Le Traon, How effective are mutation testing tools? An empirical analysis of Java mutation testing tools with manual analysis and real faults, Empir. Softw. Eng. 23(4) (Aug. 2018), 2426–2463. doi: 10.1007/s10664-017-9582-5.

32. Kravari and Bassiliades, A survey of agent platforms, J. Artif. Soc. Soc. Simul. 18(1) (2015), 11. doi: 10.18564/jasss.2661.

33. Lam, Muşlu, Sajnani and Thummalapenta, A study on the lifecycle of flaky tests, in: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, New York, NY, USA, Jun. 2020, pp. 1471–1482. doi: 10.1145/3377811.3381749.

34. Leotta, Cerioli, Olianas and Ricca, Fluent vs Basic Assertions in Java: An Empirical Study, in: 2018 11th International Conference on the Quality of Information and Communications Technology (QUATIC), Sep. 2018, pp. 184–192. doi: 10.1109/QUATIC.2018.00036.

35. Leotta, Cerioli, Olianas and Ricca, Two experiments for evaluating the impact of Hamcrest and AssertJ on assertion development, Softw. Qual. J. 28(3) (Sep. 2020), 1113–1145. doi: 10.1007/s11219-020-09507-0.

36. Leotta, Cerioli, Olianas and Ricca, Hamcrest vs AssertJ: An Empirical Assessment of Tester Productivity, in: Quality of Information and Communications Technology, Cham, 2019, pp. 161–176. doi: 10.1007/978-3-030-29238-6_12.

37. Lomuscio and Raimondi, MCMAS: An open-source model checker for the verification of multi-agent systems, Int. J. Softw. Tools Technol. Transf. 19(1) (Feb. 2017), 9–30. doi: 10.1007/s10009-015-0378-x.

38. Lomuscio and Raimondi, MCMAS: A Model Checker for the Verification of Multi-Agent Systems, in: Computer Aided Verification, Berlin, Heidelberg, 2009, pp. 682–688. doi: 10.1007/978-3-642-02658-4_55.

39. Luo, Hariri, Eloussi and Marinov, An empirical analysis of flaky tests, in: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, New York, NY, USA, Nov. 2014, pp. 643–653. doi: 10.1145/2635868.2635920.

40. Mascardi et al., Engineering multi-agent systems: State of affairs and the road ahead, ACM SIGSOFT Softw. Eng. Notes 44(1) (Mar. 2019), 18–28. doi: 10.1145/3310013.3322175.

41. Massimo Cossentino, From Requirements to Code with PASSI Methodology, in: Agent-Oriented Methodologies, Brian Henderson-Sellers and Paolo Giorgini, Eds. Hershey, PA, USA: IGI Global, 2005, pp. 79–106. doi: 10.4018/978-1-59140-581-8.ch004.

42. T.J. McCabe, A Complexity Measure, IEEE Trans. Softw. Eng. SE-2(4) (Dec. 1976). doi: 10.1109/TSE.1976.233837.

43. Mockito, Spy (Mockito 3.11.0 API), javadoc, 2021. https://javadoc.io/doc/org.mockito/mockito-core/latest/org/mockito/Spy.html (accessed Jun. 05, 2021).

44. Nascimento, Alencar, Lucena and Cowan, A metadata-driven approach for testing self-organizing multiagent systems, IEEE Access 8 (2020), 204256–204267. doi: 10.1109/ACCESS.2020.3036668.

45. C.D. Nguyen, Perini, Bernon, Pavón and Thangarajah, Testing in Multi-Agent Systems, in: Agent-Oriented Software Engineering X, Berlin, Heidelberg, May 2009, pp. 180–190. doi: 10.1007/978-3-642-19208-1_13.

46. C.D. Nguyen, Perini and Tonella, A Goal-Oriented Software Testing Methodology, in: Agent-Oriented Software Engineering VIII: 8th International Workshop, AOSE 2007, Honolulu, HI, USA, May 14, 2007, Revised Selected Papers, Berlin, Heidelberg, 2008, pp. 58–72. doi: 10.1007/978-3-540-79488-2_5.

47. North, Introducing BDD, Dan North & Associates, 2006. https://dannorth.net/introducing-bdd/ (accessed May 20, 2021).

48. Pěchouček and Mařík, Industrial deployment of multi-agent technologies: Review and selected case studies, Auton. Agents Multi-Agent Syst. 17(3) (Dec. 2008), 397–431. doi: 10.1007/s10458-008-9050-0.

49. T.L. Saaty, How to make a decision: The analytic hierarchy process, Eur. J. Oper. Res. 48(1) (Sep. 1990). doi: 10.1016/0377-2217(90)90057-I.

50. Sayward, DeMillo, T.A. Budd and R.J. Lipton, The design of a prototype mutation system for program testing, in: Managing Requirements Knowledge, International Workshop on, Los Alamitos, CA, USA, 1978, p. 623. doi: 10.1109/AFIPS.1978.195.

51. Siraj, Mikhailov and J.A. Keane, PriEsT: An interactive decision support tool to estimate priorities from pairwise comparison judgments, Int. Trans. Oper. Res. 22(2) (Mar. 2015), 217–235. doi: 10.1111/itor.12054.

52. A.M. Tiryaki, Öztuna, Dikenelli and R.C. Erdur, SUNIT: A Unit Testing Framework for Test Driven Development of Multi-Agent Systems, in: Agent-Oriented Software Engineering VII, Berlin, Heidelberg, 2007, pp. 156–173. doi: 10.1007/978-3-540-70945-9_10.

53. O.L. Vera-Pérez, Monperrus and Baudry, Descartes: A PITest Engine to Detect Pseudo-Tested Methods: Tool Demonstration, in: 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), Sep. 2018, pp. 908–911. doi: 10.1145/3238147.3240474.

54. Wang and Zhu, CATest: A Test Automation Framework for Multi-agent Systems, in: 2012 IEEE 36th Annual Computer Software and Applications Conference, Izmir, Turkey, Jul. 2012, pp. 148–157. doi: 10.1109/COMPSAC.2012.24.

55. A.H. Watson, D.R. Wallace and T.J. McCabe, Structured testing: A testing methodology using the cyclomatic complexity metric, vol. 500. US Department of Commerce, Technology Administration, National Institute of …, 1996.

56. Winikoff, Challenges and Directions for Engineering Multi-agent Systems, presented at the Dagstuhl Seminar 12342 (August 2012), Dagstuhl, 2012. [Online]. Available: http://arxiv.org/abs/1209.1428.

57. Wooldridge, Fisher, M.-P. Huget and Parsons, Model checking multi-agent systems with MABLE, in: Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 2, New York, NY, USA, Jul. 2002, pp. 952–959. doi: 10.1145/544862.544965.

58. Liu and Wang, Runtime Verification of Multi-Agent Self-Adaptive System, in: 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD), May 2021, pp. 12–17. doi: 10.1109/CSCWD49262.2021.9437643.

59. Zhang, Harman, Hao, Jia and Zhang, Predictive mutation testing, IEEE Trans. Softw. Eng. 45(9) (Sep. 2019), 898–918. doi: 10.1109/TSE.2018.2809496.

60. Zhang, Thangarajah and Padgham, Automated Testing for Intelligent Agent Systems, in: Agent-Oriented Software Engineering X, vol. 6038, M.-P. Gleizes and J.J. Gomez-Sanz, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 66–79. doi: 10.1007/978-3-642-19208-1_5.

