Abstract
The rapid development of object oriented programming (OOP) technology has made it one of the mainstream programming technologies that has been widely used in the design and development of object oriented software (OOS). The inheritance, encapsulation and polymorphism properties of object-oriented language can improve the reusability, scalability and interoperability of software while increasing the difficulty of testing OOS. Researchers have proposed a variety of testing methods to test OOS among which random testing (RT) has been widely used due to its simplicity and ease of use. An OMISS-ARTsum algorithm is proposed in this paper that uses improved OMISS random test FSCS-ART with max-sum standard, which is an implementation version of fixed-sized-candidate-set ART. The OMISS-ARTsum algorithm calculates the total distance between a candidate test case and the executed test case set before the next test case is selected from the set of candidate test cases. Unlike the traditional max-sum based FSCS-ART algorithm, OMISS-ARTsum does not calculate the distance between each executed test case and the candidate case and then sum up the total distance, but uses the method of summing up all the executed test cases and the candidate cases. The information of executing test cases is saved as a whole and the distance between the executed test case set and candidate cases is calculated at the same time. Experiment shows that compared to the OMISS-ART algorithm, the proposed OMISS-ARTsum algorithm can reduce the time overhead.
Introduction
In recent years, object oriented programming (OOP) technology has experienced rapid development that has made it the mainstream programming technology and has been widely used in the design and development of object oriented software (OOS). The three characteristics of object-oriented languages, i.e., inheritance, encapsulation and polymorphism, [1, 2] help to improve software reusability, scalability and interoperability while increasing the difficulty of OOS testing [3]. Software testing is an important means of guaranteeing software quality [4]. For this reason, researchers have proposed a variety of test methods to test the security of OOS [5–9].
A very important issue in software testing is the selection of test cases. Effective software testing automatically picks up test cases because manual selection is time-consuming and labor-intensive. Random testing (RT) [10–12] is a technique for automatically selecting test cases. The principle is simple, that is, to randomly select test cases from the entire input field. RT has a wide range of practicalities, is easy to implement in automated test tools, has no overhead for selecting test cases from a test input set, and does not treat different test cases differently when test cases are selected. Therefore, the technology that guarantees software quality [13–15] makes it easy to evaluate the effectiveness of software and is also widely used in the industry [16].
Researchers have found that defective software usually has a continuous failure area. Therefore, if the test cases selected by the random test can be evenly distributed throughout the input domain, the effectiveness of the test method will be greatly improved. Based on this, Chen et al. proposed adaptive random testing (ART) [17] that is dedicated to enhancing the ability of RT to detect faults. ART guides the generation of test cases by using a similarity/non-similarity measure to distribute test cases as evenly as possible throughout the input domain [18, 19]. ART has been widely used in programs with numerical inputs. However, due to the complex structure of OOS programs, the difficulty of calculating the differences between test inputs has increased, making the application of ART to object-oriented software a great challenges.
To quantify the differences between OOS test cases, researchers have proposed the concept of distance metrics for measuring the distance between non-numeric inputs and some formulas. Ciupa et al. proposed a measure mechanism for calculating the distance between two objects from the same class or derived from the same ancestor and proposed an object-oriented adaptive random test algorithm based on this distance measure mechanism [20, 21]. Object-oriented software (ARTOO) selects the candidate case that is the farthest from all the test cases in the executed test case set from the set of candidate test cases as the next test to be executed.
Lin et al. claimed that the input space of object-oriented programs is usually multi-dimensional [22] and extended ARTOO to propose a divergence-oriented approach to adaptive random testing (DO-ART). DO-ART selects the test cases furthest from all used test cases from the test case pool so that the selected test cases are evenly distributed across the test case pool. Chen et al. observed that OOS test cases usually contain multiple objects and multiple methods of multiple classes [23], including a collection of objects and a sequence of method calls, and the distance metric used by ARTOO and DO-ART cannot calculate this. Therefore, they proposed a measure of the difference in test input in object-oriented software, which was named the object and method invocation sequence similarity (OMISS).
Let’s now review some technologies on software testing. First is random testing in which test cases are randomly selected from the entire input domain. Because of its simple principle and easy implementation, it is widely used in OOS testing. Ciupa et al. studied the predictability of randomized tests [24] and, through experiments, found out that from the beginning of the experiment, after 10 minutes, 30% of all failures were detected, and after 90 minutes, 38% of all failures were detected. At the same time, they also found that the relative number of faults detectable by random tests is predictable, but the faults detected based on different test case generators are different, ie the type of faults detectable by RT is unpredictable. Moreover, the time when the RT detected the first error was also predictable. They found that RT could quickly find the first error, as in their experiments, always within 30 seconds.
Ciupa et al. concluded that the number of faults detected by different test case generators is roughly the same [25] and, therefore, the number of detected faults can be used as a standard to judge whether a test method is good and accurate. Through experimental research, it was found that each test method has its specific detectable abnormalities. Almost no test method can detect all the abnormalities that could be detected by other test methods, that is, no test method can completely replace test methods. Therefore, they concluded that RT can be used in conjunction with other test methods to find different faults.
Random testing refers to random selection of test cases from the input domain, i.e., each test case is equally likely to be selected. Adaptive random testing selects test cases from the entire input domain according to certain rules, i.e., each test case is no longer equally likely to be selected. Chen et al. found that test cases in a program are clustered according to whether they can reveal the nature of the fault, i.e., the test cases causing the errors are gathered together, and so are the safe test cases. Based on this finding, Chen et al. proposed that if the test cases selected by the random test were evenly distributed throughout the input domain, the likelihood of these test cases detecting errors would increase. Therefore, Chen et al. proposed ART [26], which uses some test case selection rules to guide the generation of test cases, which can greatly improve the ability of random tests to find errors. Chen et al. also proposed an adaptive random test for a fixed number of candidate test cases (SCS-ART). The algorithm has a set of candidate test cases and a set of executed test cases. Each time the candidate test case set is selected, the farthest distance from all the test cases in the executed test case set is executed. The candidate use case is the next pending use case [27]. Experiments showed that SCS-ART greatly improved the ability of RT detection errors [28].
ARTOO [21] is an object-oriented adaptive random test method for testing Eiffel programs. Its test input contains a specific method and an object, which is the call object of the method. When the next test case is selected, the target object and the parameters of the method are selected, which selects the object furthest from the used object and the parameter farthest from the used parameter. Therefore, the distance metric in ARTOO is used to calculate the distance between two objects, which consists of the basic distance, the type distance and the recursive distance.
DO-ART [22] is an extension of ARTOO. Its test case involves a selected object and a sequence of method calls with length n. The last method called is the method being tested. The previous n-1 call method is used to create selected object. DO-ART uses the distance measure of ARTOO, but prepares a set of test inputs for the test program with each test input being significantly different from the other test inputs. OMISS-ART [23] is a similarity measurement mechanism that uses test input in an object-oriented program. The distance metric used in OMISS-ART calculates the distance between multiple objects, containing multiple classes and test cases for multiple methods.
Unit testing of OOS based on RT technology has been receiving attention in the software testing community [10, 29–32]. D’Amorim et al. proposed unit testing as an important, common and substantial part of software development [33]. They analyzed that unit testing typically involves two main technologies, i.e., automated test generation techniques and automated test classification techniques. Among them, automatic test generation techniques include random generation and symbolic execution and automatic test classification techniques include techniques based on uncaught exceptions inferred from manually provided tests and operational models that violate rules. Consequently, they developed a model-based symbol testing method for OOS unit testing and conducted experimental research to compare the performance of the method and the other three technologies. The other three technologies are model-based random testing, abnormal symbol based testing and random tests based on exceptions. They found that random generation is uncertain and depends on random seeds that can produce different test cases while symbolic execution has opposite properties due to incompleteness of theorem prover and incompleteness of the constraint solver. They also found that these technologies are complementary in revealing faults and it is recommended that test tools and test users apply several techniques to the same test code to find errors more effectively.
Barus et al. proposed an adaptive random test algorithm (ARTsum) [34] for testing structured software with non-numeric inputs, which has a linear time overhead, i.e., the time it takes to pick n test cases is O(n). They proposed that in order to apply ART to software with non-numeric inputs, a suitable distance tool for measuring the difference between non-numeric inputs is needed. They used the concept of classification and selection from class partition testing to measure the distance between structured test cases. Then a new FSCS-ART algorithm is implemented based on the classification-selection distance metric and the max-sum selection criteria. When ARTsum picks the next test case from the set of candidate test cases, it calculates the sum of the distances between each candidate use case and all executed test cases and selects the largest test case as the next pending test case. ARTsum uses an integer tuple S to hold information about all executed test cases and the classification-selection distance metric to calculate each candidate use case and executed test.
Adaptive random testing
Chen et al. found that the distribution of the test input causing the fault showed three modes [26], namely, dot, strip and block. Inputs that cause errors in the dot pattern are not aggregated into one or more regions, but are widely dispersed throughout a large portion of the input domain. Inputs that cause errors in the strip mode form the shape of the strip. Inputs that cause errors in the block mode are concentrated in a closed area. Studies have shown that the distribution of errors that cause the formation of strips and blocks are widespread and the point-like differences are relatively rare. Based on this, Chen et al. proposed adaptive random test to make the test cases distributed as evenly as possible throughout the input domain so that when error, strip and block distributions are presented, errors can be found more quickly, thereby increasing randomness. Test failure detection efficiency.
Many researchers have found that the inputs that cause faults tend to form a continuous region [35]. Based on this finding, Chen et al. proposed an adaptive random test [17, 26] to make the test input more evenly distributed throughout the input domain, thus improving the effectiveness of random testing. In order to achieve a uniform distribution of test inputs, researchers have proposed a number of adaptive random test algorithms.[22, 36–43] ART is no longer a random selection of test cases, but a selection of test cases through certain rules, among which are distance-based selection, selection based on exclusion regions and selection based on region division.
FSCS-ART [26] is the most classic and most important adaptive random test algorithm based on distance operation. Each time a test case is selected, it has a fixed number of candidate test cases and then the candidate test case is calculated and executed. FSCS-ART has two sets of test inputs, one is the executed set called E and the other is the candidate set called C, and the test cases in C are randomly generated. The core idea of FSCS-ART is to select the candidate test case furthest away from the test case in distance E from C according to the distance metric formula. In determining the farthest concept, a commonly used standard is max-min. The algorithm of the standard is used to first calculate the distance between each candidate test case and each test case in E, record the minimum distance, and then select the smallest distance. An alternative criterion is max-sum, which uses the standard algorithm to calculate the distance between each candidate test case and each test case in E, and then selects the distance and the largest candidate use case. Use of max-sum to select the standard FSCS-ART is shown in the following.
RRT (Restricted Random Testing) [44] is the most primitive random region-based random test algorithm. It divides a removal area with the same radius around each test case that has been executed but does not reveal the error. The test case can be selected from the removal area and can only be selected from outside the removal area.
ART-B and ART-RP are two basic randomization test algorithms based on region partitioning [38]. After ART-B randomly selects a test case from an unselected area, it divides the area and continues to pick. Each time ART-RP randomly selects test cases from the largest blank area, the area is divided at the selected test cases after selection.
MART (Mirror Adaptive Random Testing) [39] uses multiple selection rules. It first divides the input domain into multiple different regions, such as m, and then uses FSCS-ART to pick test cases in the first region and m-1 tests in other m-1 regions using mapping methods.
Algorithm validity metrics are used to describe the ability and efficiency of the algorithm to detect errors. Commonly used metrics are: F-measure (Fm), i.e., the number of test cases spent on finding the first error, Fm-time, i.e., the time it takes to find the first error, E-measure (Em), i.e., the number of failures detected by a fixed number of test cases, P-measure (Pm), i.e., the probability of discovering at least one failure in the program with a fixed number of test cases [45]. The lower the Fm of an algorithm, the lower the Fm-time, and the higher the Em, the higher the Pm and the more effective the algorithm is.
Efficient adaptive random algorithm based on omiss metrics
This section presents the OMISSsum metrics and formulas for calculating the distance between a test case and n test cases based on the OMISS metric [23]. Then, based on the OMISSsum distance metric, a time-efficient object-oriented adaptive random test algorithm is proposed.
Data abstraction, inheritance, and dynamic binding are three basic concepts of object-oriented programming [3]. Most object-oriented programming languages use classes for data abstraction and use class derivation to inherit from another class: derived classes inherit members of the base class. Dynamic binding allows the compiler to decide whether to use a function defined in the base class or a function defined in a derived class. A class is an abstraction of a series of concrete concepts that look different but essentially the same. It consists of properties and methods. An object is an instance of a class that has some specific property values and common behavior. The properties section includes custom properties and properties that are inherited from the parent class. In addition, attributes can be divided into reference attributes and non-reference attribute. The type of the reference attribute is a reference type, such as a class, etc., and the type of the non-reference attribute is a basic type, such as bool, int, char, and so on.
For the two parts: t.OBJ and t.MINV, t.OBJ represents a collection of objects and objects in an object collection can come from many different classes and t.MINV represents a sequence of method calls. When the method being called is non-static, the method call requires objects and methods, where the object is the caller, the method is the callee, and the object and method are from the same class. Therefore, for any non-static methods called in t.MINV, the corresponding call object must appear in t.OBJ. In general, for test input, the object part and the method call part are not empty. However, in the case of testing only the correctness of the constructor, the method call sequence part is empty whereas in the case of testing only the correctness of the static method, the object set part is empty.
Chen et al. proposed to use the OMISS metric to describe the difference between the two test cases in the OOS that have the test case structure, i.e., test case distance in OMISS metrics [23]. Testcase Distance is equal to the sum of the distance between the object set (TCobjDist) and the distance between the method call sequences (TCmSeqDist). That is, the distance between the two test cases t1 and t2 is: TestcaseDistance (t1, t2)
The distance between the method call sequences includes the difference in length between the two method call sequences (LenD), the difference in the method set (MsD), and the difference in the method call sequence (SD). The difference based on the method set is equal to 1 minus the number of methods in which the intersection of t1.MINV and t2.MINV is divided by the number of methods of t1.MINV and t2.MINV. The difference in the method call sequence treats the method call sequence as an ordered list and calculates the ratio of the number of positions in the two method sequences that are not the same at the same position to the length of the shorter method sequence.
When the distance between object sets in OMISS is calculated, first the set with fewer objects is expanded by adding empty objects so that the expanded object set and the set with more objects contains the same number of objects. For example, suppose there are k1 objects in t1.OBJ, k2 objects in t2.OBJ, and k2 is smaller than k1, then t2.k1-k2 empty objects should be added to OBJ so that there are k1 objects in t2.OBJ. Then the order of the objects in t1.OBJ is fixed and the k1 objects in t2.OBJ are arranged to be aligned with the objects in t1.OBJ for a total number of k1! arrangement. Then, the distances between the two objects t1.OBJ and t2.OBJ are calculated. The smallest total distance among the k1! total number of distances is the distance between t1.OBJ and t2.OBJ.
Object distance is used to measure the difference between two objects. Object distance in OMISS (ObjDist) is equal to the sum of the distance of the object’s behavior part (BehDist) and the distance of the object’s attribute part (AttDist). That is, the distance between two objects p and q is ObjDist(p, q) = BehDist(p.M, q.M)+AttDist(p.A, q.A) if neither p nor q is null, where pM represents the behavior part of the object p, i.e., the method part of the class associated with the object p and pA represents the attribute part of the object p. The distance between the behavioral parts of the two objects p and q includes the difference in the number of methods and the set of methods contained in the class they are associated with. When the behavior distance is calculated, we should first consider the behavior part of the class associated with p and q as the two method sets pM and qM. BLenD is equal to the absolute value of the difference between the number of methods included in pM and qM where kp . M represents the number of methods contained in pM; the difference in method content BMsD is equal to 1 minus the ratio of the number of methods included in the intersection of pM and qM to the number of methods contained in the union of pM and qM where pAN represents the set of non-reference attributes contained in object p and pAR represents the set of reference attributes contained in object p.
Calculation of the distance of the reference properties of the objects p and q iteratively uses the formula for calculating the distance between the object sets, i.e., pAR and qAR are treated as two object sets, and the formula is refDist (p.AR, q.AR)*TCobjDist (p.AR, q.AR) where α is a non-negative number, usually α = 1/2.
The non-reference attribute distance between two objects p and q contains two parts, one is the difference in attribute type (typeDist) and the other is the difference in attribute value, i.e., the basic distance (secDist). The basic distance is treated as an auxiliary distance in the OMISS metric and is saved in the form of the distance representation a + bi where a is the primary distance and b is the auxiliary distance because the OMISS distance definition considers the structural difference of the object to be more important than the value of the constructed object.
We now propose an object-oriented adaptive random test algorithm with lower time overhead, which takes nearly linear time in selecting n test cases. Compared to the FSCS-ART algorithm, the algorithm also has a set of candidate test cases, but uses max-sum to pick the criteria and use a new way to calculate the sum of the distances of each candidate test case from all previously executed test cases. Before describing the algorithm, let’s briefly review the basic implementation of the FSCS-ART algorithm based on the OMISS metric using the max-sum standard. Assume that n test cases have been selected and executed, which is denoted as E = t1, t2,..., tn. The test case distance in the OMISS metric is equal to the sum of the method call sequence distance and the object set distance. Object distance is equal to method distance, reference attribute distance and non-reference attribute
As discussed in the previous section, it takes O(n) time to calculate the sum of the distances between a test case and n test cases using the OMISS metric. This section improves the formula for calculating the total distance between a test case and n test cases using the OMISS metric, and gives a new calculation formula so that the loop used in the new calculation formula is independent of the number of executed test cases, i.e., it has nothing to do with n. Since the new calculation formula is based on the OMISS summation formula, the new formula for calculating the total distance between a test case and n test cases is called the OMISSsum metric. The main idea of the OMISSsum metric is to treat the n executed test cases as a whole and use only one OMISSsum metric to get the total distance.
As described, the test case t consists of two parts: a collection of objects in t.OBJ-t and the method call sequence in t.MINV-t. Let’s define the test case set E as two parts: E.OBJ and E.MINV. E.OBJ is a collection of object collections where each object in the inner set of objects comes from a different executed test case. E.MINV is a collection of method sequences. If E = t1, t2, and t1.OBJ = o11, o12, o13 and t1.MINV = m11,m12, t2.OBJ = o21, o22, o23, t2.MINV = m21, m22, m23, then E.OBJ = o11, o21, o12, o22,o13, o23, E.MINV = m11, m12, m21, m22, m23.
Given the test case t and the test case set E, the distance between t and E (sum_TestcaseDistance) is equal to the sum of the object set distance (sum_TCobjDist) and the method call sequence distance (sum_TCmSeqDist), i.e., sum_TestcaseDistance(t, E) = sum_TCobjDist(t.OBJ, E.OBJ)+sum_TCmSeqDist(t.MINV, E.MINV).
To calculate the distance between a collection of objects, i.e., the distance between t.OBJ and E.OBJ, we need to add a null value to a collection with fewer elements according to the number of elements in t.OBJ and E.OBJ to make the two collections have the number of elements. For example, for E.OBJ = o11, o21, o12, o22, o13, o23, if t.OBJ = o1, o2, we need to add an empty object to t.OBJ, and if t.OBJ = o1, o2, o3, o4, we need add an empty collection of objects to E.OBJ. The distance between t.OBJ and E.OBJ is equal to the minimum of the sum of the distances of the objects in t.OBJ and the elements in E.OBJ, which are differently arranged.
In this section, the three algorithms, i.e., OMISS-ARTsum, OMISS-ART and RT, are compared. For each method, 200 experiments are carried out with different seeds and the average was taken as the final result.
The variant test [48] is a fault-based widely used software testing technique that provides a test standard for testing the effectiveness of test case sets in fault detection capabilities. In the mutation analysis, for a given test input, if the running variant program (the wrong version obtained by injecting a simple syntax to change the original program) is different from the result of running the original program, the implanted error is found. This means that the test input has detected a fault. In the experimental analysis, the mutation test is also used to study the effectiveness and efficiency of the proposed algorithm.
In order to obtain the validity and time overhead of the OMISS-ARTsum algorithm for fault detection, we choose to study six target programs. These six programs are all from open source websites [49–52] and are implemented in C# or C++. Table 1 summarizes the details of the six target programs and their implant errors. As can be seen from the table, these target programs have advantages and disadvantages as experimental objects. Calendar and SATM are small-scale libraries that are implanted with 5 and 4 failures, respectively. The size of the NSort library and the Wavelet Library library are also relatively small, respectively, with 15 and 14 faults being implanted. The IceChat and the Foundation libraries are large and have the most number of failure implants, 24 and 28, respectively. In all six procedures, faults are implanted into randomly selected methods and none of methods is implanted with two or more errors. Thirteen common mutation operations [48, 53] are: arithmetic operator substitution, logical operator substitution, relational operator substitution, constant substitution scalar, scalar replacement scalar, scalar substitution constant, array reference replacement constant, subclass type New method calls, parameter order changes, accessor method changes, access modifier changes, hidden variable missing, and replacement of attributes with member fields.
Class libraries
Class libraries
In the experiment setup, for each target program, each test algorithm used a different seed to run 200 times to obtain the average of Fm. In order to obtain the number of errors detected when executing a fixed number of test cases, i.e. Em, each test algorithm performed a total number of 5000 test cases. In order to study the number of detected test cases as the number of test cases increases, the number of detected faults and the time required to select test cases, when the number of test cases n = 100, 500, 1000, 1500,..., 4000 is recorded as well as the detected number of failures and the time spent. The test case picking time is also the average of 200 test runs using different seeds.
When generating a test case pool, each test case contains a maximum of 5 objects and methods and the test case pool contains a total of 10,000 test cases. The number of test cases (k) to be included is set to 10 when the algorithm uses the set of candidate test cases. The experiment showed that the efficiency of fault detection increases with increasing k, but the detection efficiency increases slowly after k is greater than 10 [31]. When the algorithm calculates the distance between test cases, the recursion depth is set to 4 if recursive calculation is involved. All experiments were performed on a machine with an Intel dual-core i3-4170, 3.70 GHz processor, 4GB RAM,
With the rapid development of object-oriented programming technology, object-oriented software is widely used in all walks of life. The inheritance, encapsulation and polymorphism of object-oriented languages, although beneficial to reusability, scalability and interoperability of software, increase the difficulty of testing OOS.Researchers have proposed a variety of test methods to test the security of OOS, which is one of the most popular testing techniques due to its simplicity and ease of implementation. In order to improve the ability of RT to detect faults, researchers proposed ART that is committed to guiding the generation of test cases and distributing them evenly throughout the test input domain. ART is primarily used for programs with numeric inputs. In order to apply ART to OOS, a metric needs to be designed to calculate the distance between object-oriented software test cases. Currently, the metrics for calculating the distance between OOS test cases are the ARTOO and the OMISS metrics. The ARTOO metric is used to calculate the distance between objects and the OMISS metric is used to calculate the distance between a test case containing a collection of objects and a sequence of method calls. Since the FSCS-ART algorithm using ARTOO metrics or OMISS metrics is needed to calculate the distance between the candidate test case and all executed test cases (n test cases) when selecting the next test case, there is the need to use n ARTOO metrics or OMISS metrics. In order to reduce the number of calculations and the calculation time, two improved metrics were proposed in this paper. Three object-oriented adaptive random test algorithms were proposed based on the corresponding metrics. A large number of experiments were carried out to verify the feasibility of the proposed algorithm. The main work can be summarized as follows. We analyzed the distance metric for the object-oriented software test for processing the distance between test cases with multiple objects and multiple methods - OMISS distance metric, and improved the OMISS distance metric to calculate the distance between a test case and n test cases with a formula. We presented the OMISS-ARTsum algorithm based on the improved formula for the OMISSsum metric. OMISS-ARTsum is based on the OMISSsum metric, max-sum picking standard FSCS-ART algorithm. When the OMISS-ARTsum algorithm selects the next test case to be executed from the set of candidate test cases, the information of all executed test cases is saved as a whole and the distance between the executed test case set and the candidate use case is calculated once.
Footnotes
Acknowledgments
The work in this paper has been supported by NQI Research on Key Technologies of Resource Aggregation and Sharing (No. 2017YFF0209602).
