Abstract
Assembly optimisation activities that involve assembly sequence planning and assembly line balancing have been extensively studied because of the importance of optimal assembly efficiency to manufacturing competitiveness. Numerous research works in assembly sequence planning and assembly line balancing mainly focus on developing algorithms to solve problems and to optimise assembly sequence planning and assembly line balancing. However, there is a scarcity in works that focus on developing problems to test these algorithms. In optimisation algorithm development, testing algorithms by a broad range of test problems is crucial to identify their strengths and weaknesses. This article proposes a generator of assembly sequence planning and assembly line balancing test problems with tuneable complexity levels. Experiments confirm that the selected combination of input attributes does control the generated assembly sequence planning and assembly line balancing problem complexity, and also that the generated problems can be used to identify the suitability of a given algorithm to problem types.
Introduction
In manufacturing, assembly optimisation involves bringing and joining parts and/or subassemblies together to make the process as efficient as possible. Assembly sequence planning (ASP) and assembly line balancing (ALB) are classified among major topics in assembly optimisation because both are directly related to assembly efficiency.1,2 Recently, researchers have discovered benefits of solving and optimising ASP and ALB problems together,3,4 leading to increased research focus on testing new or improved algorithms that operate on these combined problems. In order to assess the performance of new or improved algorithms and to compare them with existing algorithms, a wide range of test problems are required. In ASP and ALB optimisation works that focus on algorithm development or improvement, researchers have used two approaches to test algorithm performance. One approach is to test the algorithms using specific case studies.5,6 Another acknowledged approach is to adopt the test problems that are frequently used in literature.7,8 These approaches lack generality because there has been no investigation into the fit of algorithms to problem types. Algorithms have not been tested with a wide range of problem types.
The most frequently used test problem in ASP is an assembly of a transmission-type part with eleven components, presented by De Fazio and Whitney. 9 This problem has been presented in many articles, such as Chen and Liu, 10 Smith and Liu 11 and Wang et al., 12 to evaluate algorithm performance. Other than this widely used problem example, most ASP test problems found in literature have only been used within the same research group. There is thus no accepted standard ASP test problem for evaluating algorithm performance. On the other hand, in ALB optimisation, development of test problems was started in 1960s, resulting in many that have been developed and collected by different researchers. These problems vary in task size from eight to 297 tasks. The famous ALB problems, such as the eight-tasks by Bowman, 45 tasks by Kilbridge and Wester, 70 tasks by Tonge, 111 tasks by Arcus and 297 tasks by Scholl, are still being used until today to evaluate algorithm performance for line balancing problems. 13
Although these few benchmark ASP and ALB problems are available for comparing algorithm performance, there is no standard test problem set that covers a wide variety of problem difficulties, especially to test the combined ASP and ALB optimisation. Not only is this important for enhancing the researchers’ understanding of their algorithm, it will also help users in selecting which algorithm is more appropriate to their requirements. In order to facilitate such experimentation, a set of problems with a controllable complexity level is needed. One way to address this is to devise a test problem generator (TPG) with tuneable difficulty level that can systematically generate a set of test problems with a desired mix of complexity levels.
This article proposes a TPG with a tuneable complexity level for combined ASP and ALB problems. ‘TPG for ASP and ALB’ explains the requirements and specifications for the proposed TPG. ‘TPG development’ will explain the methodology of the TPG development, which is divided into graph and data generation methodology. Then, ‘Experimental design’ describes the experimental design to test the proposed TPG for ASP and ALB. ‘Results and discussion’ presents and discusses the experimental results in this work. Finally, we present the authors’ conclusions on the proposed TPG according to experimental results.
TPG for ASP and ALB
In the mathematical optimisation community, the importance of TPGs is widely appreciated. Although algorithm development is important, any new algorithm should ideally be tested with a wide range of problem types before making any conclusion on their usefulness. 14 Most of ASP and ALB works focus on proposing and demonstrating algorithm performance on specific ASP and/or ALB problems. There is a lack of investigation into testing and validating the performance of algorithms on wider classes of problems.
A TPG will be useful to provide a wide range of ASP and ALB problems with differing characteristics and difficulties. In many cases, the problem difficulty is only determined by the size of the problem. While this is correct in certain cases, this overlooks the influence of many other attributes on problem difficulty. Additionally, TPGs will also be useful to identify which algorithm may be more suitable for a given type of problem. This knowledge is very important to help users to choose the right algorithm, and also for researchers to identify opportunities for further improvement in a particular algorithm.
To provide the mentioned benefits, the TPG must satisfy the following requirements.
Representation. The problems are generated on the basis of assembly task and represented using a precedence graph. This was the common way to represent task-based assembly problems in earlier works. 2
Output. The TPG is expected to produce precedence graphs that represent task-based assembly problems. Besides that, the TPG must also be able to generate assembly data, which consists of assembly direction and tool for ASP and assembly time for ALB. These types of data are selected based on popularity from a literature survey. 2
Tuneable difficulty level. One of the important features expected in a TPG is a tuneable difficulty level. This feature will ensure that test problems are generated within known difficulty ranges as required.
There are not many proposals in literature on methods for generating test problems in this domain. Furthermore, existing proposals are limited to generating test problems for ALB. Bhattacharjee and Sahu’s proposal is to generate a random precedence graph to represent an ALB problem. 15 In this approach, the assembly problem is generated randomly, and then the problem difficulty is measured to determine its complexity level. Later, a systematic data generator for assembly line balancing was proposed by Otto et al. 16 Besides presenting a systematic method for generating precedence graphs, this work also demonstrates that common graph structures in real-world assembly problems, i.e. chains, bottlenecks and modules, can be generated on a precedence graph. This approach is also able to generate problems at the desired difficulty levels. 16
Otto’s work is the one closest to our stated requirements because this work fulfils requirements (1) and (3). In Otto’s work, ALB problems are generated based on assembly tasks and represented using precedence graphs. It also gives users the ability to create test problem difficulty at the desired level of difficulty. However, since this work was specifically developed for ALB problems, it fails requirement (2). Therefore, in this article, the ALB-only systematic data generator proposed by Otto in 2011 will be expanded to incorporate both ASP and ALB test problems.
TPG development
The TPG was developed using the methodology presented in Figure 1. The details of each stage are explained in this section.

Test problem generator development flow.
The first stage in developing the TPG is to identify the input and output elements. Next is the independent development of automated generators for assembly graph, ASP and ALB data. Finally, the outputs from graph and data generators are synchronised and combined to produce a complete test problem set. A worked example of the proposed TPG, with outputs for each step, is presented in Table 1.
Example of test problem generation process.
Input and output elements of assembly test problems
The mapping of input and output variables is shown in Figure 2. The tuneable inputs are presented in bold and italic font.

Problem generator input and output map.
Tuneable input elements
The tuneable input elements are variables that are used to control the problem difficulty generated by the TPG. In this work, one new tuning variable is proposed and the rest are adopted from previous works. The TPG is conceptually divided into two parts: the generation of assembly graphs and the generation of assembly data. The next section will discuss the tuneable input variables for each part. Although the tuneable input variables for ALB has been discussed in earlier works, no clear link has been suggested in literature between input and specific difficulty levels for ASP. 13
Tuneable input for assembly graph
Two tuneable inputs will be used to generate precedence at a specific complexity level. The first input variable to measure graph complexity is n, the number of nodes in a graph. In ASP and ALB contexts, graph nodes represent assembly tasks for a given problem. The number of possible assembly sequences will exponentially increase with the number of nodes. In surveyed literature, the size of ASP problems varies between 5 to 75 nodes; in ALB, 86% of surveyed ALB articles used between 7 to 150 nodes, while the remaining 14% used up to 300 nodes.
Another graph input variable conceptually linked to graph difficulty is order strength. Order strength (OS) measures the relative number of precedence relation in a graph. By increasing the relative number of precedence relations, the resulting graph is expected to be more complicated.13,15 Order strength is defined as a total number of ordering relation in transitive closure divided by the possible number of ordering relation for particular graph. The order strength is calculated as
where R is the total number of ordering relations and P is the possible number of ordering relations
where n is the number of nodes.
The order strength value varies between [0, 1]. OS = 0 shows that there is no precedence relation in the graph and OS = 1 shows that there is only one feasible sequence for a particular problem. The order strength attribute is used together with order strength tolerance (δ OS ) since it is difficult (impossible in some cases) to meet the exact order strength value.
Tuneable input for assembly data
Previously, a number of time-related measures for ALB data have been proposed, such as ratio between maximum and minimum completion time between assembly lines, 17 standard deviation, 15 and time variability ratio. A time variability ratio (TV) has consistently been used in previous works and is selected for use in this work. A time variability ratio indicates the range of task time of all tasks dispersed between the assembly lines. A time variability ratio is calculated as
where t max is the maximum task time, t min is the minimum task time and ct max is the maximum cycle time.
A smaller time variability ratio value indicates that existing task times are distributed in a smaller range, which leads to an increased level of problem complexity. The t max constraint in equation (4) is introduced to avoid generation of uniformly small task time, which leads to inconsistency of difficulty levels. The ct max constraint is explained in ‘Other input elements’.
Meanwhile, in the ASP problem domain, no variable for measuring data complexity has been established. In this work, the ASP data considered are assembly directions and assembly tools. This type of data can be measured by considering how many times (i.e. frequency) a similar direction or tool appears in the problem. A common optimisation objective is to minimise direction or tool changes in a sequence of tasks. Thus, the frequency ratio (FR) is proposed to be used as an input variable that measures ASP data complexity.
where f min is the minimum data frequency and f max is the maximum data frequency.
Data with a higher frequency ratio is harder to arrange to achieve the minimum number of changes because the choice and variability of data are high. This type of data will usually produce a higher number of changes compared with smaller frequency ratio data. The details of graph and assembly data attributes level are shown in Table 2. In this table, the attribute level for the ‘number of nodes’ is proposed based on a survey on problem sizes, as mentioned in ‘Tuneable input elements’, while the proposed classification of frequency ratio and time variability levels are based on a few initial tests. The proposed classification of order strength levels is adopted from literature review. 16
Assembly graph and assembly data attribute levels.
The tuneable input variables are classified into low, medium and high levels because of nonlinearity of the problem. Although the general trend of problem difficulty over tuneable variables can be predicted, when tuning for a targeted difficulty level, too small variable changes may lead to inconsistent difficulty levels. The classification of level difficulties, as in Table 2, can be used as a guideline for users in selecting appropriate difficulty levels for their use. To reduce the possibility of inconsistent difficulty levels, it is suggested to use the midpoint of the medium level to generate a medium difficulty problem.
Other input elements
Apart from the tuneable elements, there are other ‘compulsory’ inputs that are required for generating a complete problem. Although some of these variables have implications to the problem difficulty level, they are not used here as a means to control the problem difficulty because of a lack of agreement in literature. These inputs are: number of stages (s), maximum cycle time (ct max ), number of assembly direction (n dir ) and number of assembly tool (n tool ). Number of stages (s) refers to the number of columns that contain nodes in a specific precedence graph. In Figure 3, the example graph consists of three stages (hence s = 3), which are shown separated by dotted lines. This variable determines the basic shape of the graph, where a smaller number of stages will produce graphs with more parallel nodes. The maximum cycle time (ct max ) is the upper limit of allowable cycle time. This variable is calculated from the required production rate of the assembly line. The number of directions and number of tools are also required to generate ASP data.

Example of precedence graph.
Another important element of the TPG is the pseudo-random number generator that underlies most of the data generation algorithm. In this work, the pseudo-random generator used is Mersenne Twister with the range between [0, 232−1] for a 32-bit integer. 18 Appropriate use of seed values ensures that all results are reproducible.
Output elements
There are two sets of outputs generated by the proposed TPG. The first output is the assembly precedence graph (e.g. Figure 3), represented by a precedence matrix, which is an n × n matrix filled with 1 or 0 values (Table 3). The left-most column shows assembly tasks and the top row shows the follower tasks. The value 1 shows that the task j must be performed after task i.
Example of a precedence matrix.
The second output is a data matrix that consists of assembly directions, assembly tools and assembly time associated with every task. This data is generated according to the required difficulty level as determined in tuneable input variables.
Assembly graph generation
In this work, the systematic graph generation method is adopted from Otto’s work. 16 The five steps below are as proposed in that work.
Step 1. Provide all the compulsory inputs. The compulsory inputs are number of nodes (n), desired order strength (OS d ), order strength tolerance (δ OS ) and number of stages (s).
Step 2. Generate and distribute the nodes in all stages using uniform distribution.
Step 3. Connect every node in stage k >1 with exactly one random node in stage k − 1. This step is important to keep the nodes in their original stages.
Step 4. Calculate the order strength using equation (1). If the order strength is within OS d ± δ OS , then terminate the process. Otherwise, continue with Step 5.
Step 5. Select a node i in stage k < s and insert an arc to a random node j in stage m > k until the desired order strength is achieved. A direct arc from node i to node j is allowed only if:
Task i has no restriction, such as isolated node or special structure;
The order strength values have not exceeded the desired upper limit.
ASP data generation
In this work, the ASP data that are considered are ‘assembly direction’ and ‘assembly tool change’. The following steps are applied to generate these data. Besides number of tasks, n, the required input in ASP data generation is ASP ‘data frequency ratio’, FR.
Step 1. Calculate all possible lower (L limit ) and upper (U limit ) limits of data frequencies according to FR. The L limit and U limit represent the minimum and maximum number of times that a particular direction or tool appear in the generated problem. These limits must fulfil the following constraints
Equations (6) and (7) ensure that the summation of generated data within upper and lower limits matches the number of tasks, n. In these equations, n type represent the number of direction (n dir ) or number of tool (n tool ) type. In this work, six major direction axes (+x, −x, +y, −y, +z, −z) are considered, thus n type for n dir is equal to six. Meanwhile the n type for n tool depends on the number of tool types in a particular assembly line.
Step 2. Randomly select a pair of lower and upper limits from the set of possible limits determined in Step 1. Generate remaining data frequencies using uniform distribution. The summation of data frequencies must be equal to n.
Step 3. Generate the ASP data based on frequencies (Step 2) in random order.
ALB data generation
The ALB data to be generated is the ‘task time’ for all nodes. The required inputs are ‘maximum cycle time’ (ct max ) and ‘time variability ratio’ (TV). This data is generated in two steps:
Step 1. Calculate all possible limit of task time based on TV. The upper limit must not exceed ct max . Randomly select an upper and lower limit from all possible limit pairs.
Step 2. Generate the remaining task times between upper and lower limits using uniform distribution.
Combine and synchronise the graph and data output
Synchronisation of ASP-specific and ALB-specific outputs is straightforward because both ASP and ALB representations are both developed using the same assembly task basis. 19 Data generated in ‘ASP data generation’ and ‘ALB data generation’ are directly linked with assembly tasks and no further adjustment is needed. In this synchronisation step, the output data consisting of ASP data from ‘ASP data generation’ and ALB data from ‘ALB data generation’ are combined to establish a data matrix. In the data matrix, the assembly direction data is located on the first column, assembly tool data in the second column and assembly data for ALB in the third column.
The final process in this step is to transform the precedence graph into a precedence matrix as explained in ‘Output elements’. This is an important process to synchronise the format of the assembly graph into readable computer language.
Experimental design
This section describes the set-up of the experimental design to assess problems generated using the proposed TPG. The experiment is divided into two phases. In Phase 1, the experiment will focus on the ability of the TPG to generate problems at a desired complexity level by manipulating the tuneable input attributes. Then, in Phase 2, the generated problems from the TPG will be used to evaluate the performance of a set of selected algorithms. The purpose of the second phase experiment is to identify if the generated problems from the TPG can be used to characterise the best and worst performance of each algorithm.
Phase 1. Testing of tuneable input
The experiment in this phase is conducted by dividing all the tuneable input variables into five levels, as presented in Table 4.
Tuneable input level setting.
A reference variable setting (datum) is selected as a baseline, while the rest of the problem variable settings are generated by changing only one variable value at a time. In this case, level 3 is selected as the reference variable setting because it is in the middle between the minimum and maximum value. The complete experimental table for Phase 1 is shown in Table 5.
Experimental table for Phase 1.
From Table 5, 17 test problems are generated by changing one variable at a time. Problem 1 represents the reference variable setting, problems 2–5 examine the effect of n, problems 6–9 for effect of OS, problems 10–13 for effect of TV and problems 14–17 for effect of FR.
In order to solve precedence graphs, the topological sort algorithm is used to generate feasible assembly sequences. This approach will ensure that the generated sequences are always feasible by sorting the nodes into ‘available’ and ‘unavailable’ tasks, during the sequence generation process. 20
To test the generated problems, three different algorithms were selected for each problem type. For an ASP problem, a multi-objective genetic algorithm (MOGA) that was used in Choi et al. 21 is chosen. This algorithm is selected because, in common with this work, it used task-based representation in representing ASP problems. Additionally, a genetic algorithm is one of the most frequently used algorithms for solving and optimising ASP problems. 2 In this algorithm, the fitness function for ASP is
where dc is the number of direction changes, tc is the number of tool changes, dc max is the maximum possible number of direction changes, tc max maximum possible number of tool changes and dc max , tc max is the number of nodes – 1.
To test the ALB problem, an ant colony optimisation (ACO) algorithm that has been used for a simple assembly line balancing problem (SALBP) in Bautista and Pereira 22 is used. This algorithm is selected based on citation popularity. In addition, an ant colony algorithm is also one of the frequently used algorithms to solve and optimise the ALB problem. 2 In this algorithm, the fitness function is designed as
where ct is the cycle time, nws is the number of workstations, wload is the workload variance, ctmax is the maximum possible cycle time, nws max is the maximum possible number of workstations and wload max is the maximum possible workload variance.
Finally, for an integrated ASP and ALB problem, a hybrid genetic algorithm (HGA) that used in Chen et al. 3 is selected. This algorithm is also selected based on the popularity of this work for integrated ASP and ALB. The fitness function for this problem is designed as
Phase 2. Algorithm testing using generated problems
In Phase 2, the algorithms’ performance to generate a Pareto optimal solution for a combined ASP and ALB problem are tested. The purpose of this test is to determine whether the problems generated by the TPG have sufficient variety that enables users to perceive differences in algorithm performance. To perform this test, the MOGA and ACO algorithm, previously used to optimise ASP and ALB independently, will be used to optimise a combined ASP and ALB problem alongside a HGA. The objective function set for this experiment is as follows.
f 1 = minimise number of direction change
f 2 = minimise number of tool change
f 3 = minimise cycle time
f 4 = minimise number of workstations
f 5 = minimise workload deviation
In order to evaluate the performance of each algorithm when dealing with different complexity problems, the following performance indicators are adopted from Deb 23 and Yoosefelahi et al. 24 are used.
Number of non-dominated solutions in Pareto optimal, : Show the number of non-dominated solutions generated by each algorithm in Pareto solution. Higher indicates better algorithm performance.
Error ratio, ER: ER is given by dividing the number of solutions which are not members of the Pareto optimal set with the total number of solutions generated by algorithm q. Smaller ER indicates better algorithm performance.
Generational distance, GD: GD finds an average distance of solution with the nearest Pareto optimal solution. A smaller GD indicates better algorithm performance.
where s q is the number of solutions generated by algorithm q
where
iv. Spacing: This indicator measures the relative distance between each solution.
where
v. Maximum Spread, Max spread : Measures the extent of solution distribution found by the algorithm. Larger maximum spread is better.
Results and discussion
Phase 1: Results
The output from Phase 1 experiments are presented in Figures 4 to 7, showing the average of best fitness value from ten runs.

Average of best fitness for a range of n (number of tasks).

Average of best fitness for different OS value.

Average of best fitness for different frequency ratio.

Average of best fitness for different time variability ratio.
Number of tasks (n)
Figure 4 shows the effect of n on the ASP, ALB and combined ASPALB problem difficulties. In all cases, the problems with a larger number of tasks tend to be found to have better fitness, although they have similar tuneable input setting for OS, FR and TV. This output pattern is related with increments of problem difficulties when the number of tasks is increased. The output trend is also consistent with previous works, such as in Scholl, 13 Bhatterji and Sahu 15 and Otto et al. 16
Order strength (OS)
Figure 5 shows the effect of OS change for ASP problems with 15, 20, 40, 60 and 80 tasks. In these graphs, the ASP problems with high OS values tend to produce better fitness values compared with low and medium OS values. A similar output pattern is also found in ALB and combined ASPALB problems, as shown in Figure 5. This result indicates that problems with higher OS values will have lesser difficulty levels compared with low OS values. This finding corroborates a few previous works,25–27 while contradicting a few works that associate higher OS values with greater complexity.13,16
This mismatch is due to the dissimilar approaches used in solving the precedence graph. In the works that directly used generated permutation as an assembly sequence, precedence graphs with higher OS values are harder to solve. Direct permutation has a high probability of generating infeasible sequences; since the numbers of precedence constraints in high OS graphs are higher than low OS graph while the search space for both conditions remains the same.
On the other hand, in the works that ensures the feasibility of sequence, such as using topological sort, the precedence graph with higher OS is easier to solve, because of differences in the search space size. The OS value directly influences the number of possible feasible sequences in a precedence graph. In this case, the number of feasible sequences in high OS is smaller than in low OS, because the precedence constraints limit the flexibility of re-sequencing. Since the search space for the precedence graph with high OS is smaller than a low OS, it is easier to generate a solution with better fitness in high OS graphs than with low OS graphs.
Nevertheless, there is inconsistency in outputs for ASP with OS 0.5 and 0.6, ALB with OS 0.4 and 0.5 and combined ASPALB with OS 0.5 and 0.6. For these cases, the problem with a smaller OS emerges with better fitness compared with a larger OS. A likely explanation is that the chosen OS gaps for these problems are too small, since it does not happen in larger OS gaps, such as between OS 0.6 and 0.4 or smaller. Small OS gap means that there is only small search space difference between the two problems that has influenced the inconsistency of results for both conditions. Therefore, to ensure a clear separation between one difficulty levels with another, OS gaps that are too small should be avoided. More investigation is needed to fully investigate the effect of OS.
Frequency ratio (FR)
The output from the ASP problem in Figure 6 shows that the proposed complexity attributes of FR can be used to control the ASP data complexity.
ASP data with a high FR will have a wider range of choices that directly increase the size of search space. In contrast, ASP data with a low FR have a smaller search space owing to a more limited data variety. As a consequence, the algorithms found it more difficult to achieve minimum direction and tool change for ASP data with a higher FR.
Time variability ratio (TV)
ALB results in Figure 7 confirm that the time variability ratio adopted from previous works is effective to control the assembly time data complexity.13,16
The assembly times with higher TV are easier to arrange because the combination of small and large task times tend to fit the cycle time better than uniformly large task times (low TV). Finally the combined ASPALB outputs in this figure clearly show that the TV input variable is able to control the assembly data difficulties as expected.
The results of the tuneable input test show that the ASP and ALB problem complexity can be controlled via the input attributes of the TPG. Although the early assumption that the precedence graph with higher OS will have greater complexity is unfounded, this attribute’s usefulness is maintained by redefining its value: to generate precedence graphs with a low complexity, a higher OS level must be used, while for graphs with a high complexity, the OS must be set to the lower level. It is found that the selection of a tuneable input level is also important to ensure that the desired problem difficulty is achieved. Selection of proper gaps between one level to another is very important to avoid inconsistent problem difficulty.
In order to test the significance of the results, statistical tests are performed. In this case, an ANOVA test is carried out to test if there are any significant differences between the results of one level with results from another level. The null hypothesis stated that there would be no difference among five tuneable input levels means. The summary of the ANOVA test is presented in Table 6.
Summary of the ANOVA test.
n: number of tasks; OS: order strength; FR: frequency ratio; TV: time variability ratio; SSB: sum of square between groups; SSW: sum of square within groups; SS: sum of square total; MSB: mean squares between groups; MSW: mean squares within groups.
In this case, the critical f-value (f*) that is acquired with a 0.05 level of significance from the f-distribution table is 2.22. 28 Table 6 consistently shows larger f-values compared with f*. Since all the f-values are larger than f*, the null hypothesis for all tuneable input is rejected. In other words, it shows that at 0.05 confidence levels, there are statistically significant differences between levels for n, OS, FR and TV. However, this test does not tell us the exact groups or levels that have statistically significant difference in means. Therefore, a posteriori test known as Tukey’s honestly significant different test (Tukey’s HSD test) is performed.
Tukey’s HSD test compares the mean of the rejected null hypothesis with the means of other groups to identify if there is any significant difference between the mean of one level with another. The value of the absolute difference between two means will be compared with a critical HSD as proposed in the Tukey’s table. 28 The summary of the Tukey’s HSD test at 0.05 confidence interval is presented in Table 7. From Table 7, the absolute mean difference between two levels for n and TV consistently indicates larger values than the critical HSD value. It shows that there are significant differences in all levels for n and TV. It means that the problem difficulties for these variables can be statistically distinguished between each level.
Summary of Tukey’s HSD test.
HSD: honestly significant different; n: number of tasks; OS: order strength; FR: frequency ratio; TV: time variability ratio.
Meanwhile, in the OS and FR variables, the absolute mean difference also shows larger values than critical HSD except for the cases between OS values of 0.2 and 0.3, OS values 0.5 and 0.6, FR values 0.3 and 0.4, and FR values 0.5 and 0.6. This result is related to the selection of appropriate gaps between levels, since it only occurs between adjacent levels. Consistent with earlier discussions on the effect of OS change on the problem difficulties (Figure 5), too small gaps between consecutive levels should be avoided.
Phase 2: Results
In this phase, 25 problems with different difficulty settings are used to demonstrate the usefulness of the TPG for testing the performance of algorithms. The set-up is for multi-objective optimisation of ASP, ALB, and ASPALB problems. The assembly problem for this experiment is set up as in Tables 8 and 9.
Problem setting for experiments in Phase 2.
Attribute settings for different graph and data difficulty levels.
n: number of tasks; OS: order strength; FR: frequency ratio; TV: time variability ratio.
The tuneable input for these test problems are grouped into graph difficulty (n and OS variables) and data difficulty (FR and TV variables). The data of test problems that are used in this work can be downloaded from http://public.cranfield.ac.uk/s135489/ASP_ALB_Test_Problems_Data.zip.
The results from Phase 2 experiments are summarised in Table 10. Numbers in brackets are weighting values that are assigned to each algorithm based on its performance for the respective indicator. For every indicator in a given problem, the best result is assigned a weight value of 3, while the second and third positions are assigned weight values of 2 and 1, respectively. Then, the algorithm ranking is made through comparison of the weighted sums.
Summary of the result of experiments on selected multi-objective algorithms.
Numbers in brackets are weighting values from the best (weight = 3) to worst (weight = 1) performance.
Based on the results in Table 10, the HGA consistently shows the best performance in all problems having low and low–medium graph difficulties (problems 1–10). Meanwhile, the MOGA shows better performance compare with the ACO algorithm for problems 1–3, but then showed inconsistent performance for problems 4–10. In problems 4–10, the ACO algorithm starts to overcome the MOGA performance in some cases.
Meanwhile, for the problems with medium and medium–high graph difficulties (problems 11–20), the HGA and ACO algorithms alternately lead the algorithms in the first rank. However, when the graph difficulty is increased to a high difficulty (problems 21–25), ACO has consistently shown a better performance and followed by HGA and MOGA. The relative performance of each algorithm is presented graphically in Figure 8.

Algorithm’s ranking for the range of test problems.
Conclusions
In this work, a TPG with tuneable complexity for ASP and ALB problems has been proposed. A set of experiments has been conducted to assess the TPG. Experimental results confirm that problem complexities can be controlled by tuneable input variables.
The results from the Phase 1 experiments that test the effects of tuneable inputs confirm the ability of the TPG to generate problems with varying complexity levels. The problem difficulties will increase when using a larger number of tasks (n), smaller order strength (OS) value, larger frequency ratio (FR) or smaller time variability ratio (TV). As presented in the results, n and OS influence the assembly graph difficulties, while FR and TV influence the assembly data difficulties. The results of statistical tests confirmed that there are significant differences of the problem difficulties when changing the value of n and TV variables. On the other hand, the significant difference of problem difficulties can also be achieved by selecting the appropriate value for OS and FR, as suggested in Table 2.
The results of the algorithm performance experiment in Phase 2 show that the HGA consistently performed well in optimising problems with low and medium difficulties. Meanwhile, the ACO showed good performance in problems with a high level of difficulty. Based on the performance of both algorithms, the HGA is recommended for integrated ASP and ALB problems with low and medium difficulties, while the ACO is for ASP and ALB problems at high difficulty. These findings confirm that the problems generated by the TPG offer a sufficient range of problem varieties to be used in algorithm testing. The generated problems were found to be useful to identify the strengths and weaknesses of the tested algorithms.
Although further experiments are needed to confirm these strengths and weaknesses, the TPG has provided an important path by supplying a variety of ASP and ALB problems for systematic testing. Therefore, it can be concluded that the proposed TPG is able to generate combined ASP and ALB problems in a wide range of difficulties.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
