Abstract
The clustering of spatial-geographic units, zones or areas has been used to solve problems related to Territorial Design. Clustering adapts to the definition of territorial design for a specific problem, which demands spatial data processing under clustering schemes with topological requirements for the zones. For small sized instances, once the geographical compactness is attended to as an objective function, this problem has been solved by exact methods with an acceptable response time. However, for larger instances and due to the combinatory nature of this problem, the computational complexity increases and the use of approximated methods becomes a necessity, in such a way that when geographical compactness was the only cost function a couple of approximated methods were incorporated giving satisfactory results. A particular case of this kind of problems that has had our attention in recent years is the classification by partitioning of AGEBs (Basic Geographical Units by its initials in Spanish). Some work has been made related to the formation of compact groups of AGEBs, but additional re-strictions were often not considered. A very interesting and highly demanded application problem is to extend the compact clustering to form groups with for some of its descriptive variables. This problem translates to a multi-objective approach that has to pursue two objectives to attain a balance between them. At this point, to reach a set of non-dominated and non-comparable solutions, a method has been included that allows obtaining the Pareto front through the Hasse diagram, which implies proposing a mathematical programming model and the synthetic resulting between compactness and homogeneity.

Introduction
Partitioning problems have been widely studied in the literature, however, their combinatorial nature makes their resolution difficult using exact methods due to the computational complexity associated with the hard NP category. When only one objective is treated, is possible to obtain good results in acceptable computational time. However, when partitioning is multiobjective partitioning problem, The difficulty lies in solving at least two aspects: (a) the generation of a set of suboptimal and non-dominated solutions and (b) the design of a partitioning algorithm that builds groups (clusters), simultaneously satisfying two objective functions. In other words, the existence of clustering problems for spatial data implies proposing solutions within the framework of partitioning with the use of heuristic methods, even with the incorporation of multi-objective techniques when dealing with more than one objective. In this sense, and especially for census grouping problems, the resolution focuses on obtaining compact and homogeneous groupings for population/census scenarios.
The grouping of census data with their respective variants presents serious problems given the selection of variables that will intervene in the grouping. On the other hand, the cluster must also be compact considering the spatial location of these data. This gives rise to proposing a method that groups census data with population characteristics to respond to homogeneity and, in parallel, geometrically compact groupings are achieved by taking the spatial position in
To solve this problem, we have taken advantage of previous results on geographic partitioning [1], basic multi-objective theory [2], the Variable Neighborhood Search VNS heuristic [3], and Maxima properties [4].
At this point, the main contribution of this work is to integrate within the multi-objective partitioning algorithm both VNS to achieve approximate solutions and the multi-objective method based on order theory, which will choose from these solutions the ones that are not dominated.
In the territory design problem that is addressed, the objective is to obtain a partition of geographics objects (Basic Geostatistical Areas). Its composition is given by 2 components: geographic coordinates in the
A set of solutions is sought, made up of partitions, where each element of the partition is geographically very close to each other such that the intra-group geometric compactness is satisfied. For the homogeneity objective, the balance is calculated for some census variables that define the problem and it is incorporated into the Variable Environment Search within the optimization process to achieve approximate solutions that conform to the process of identifying the non-dominated solutions (Minima) [3, 4].
This work is organized as follows: Introduction in Section 1. In Section 2 the definitions, basic concepts and brief state of the art are exposed. The model for compactness is described in Section 3. Section 4 presents the modeling for homogeneity. Section 5 deals with introducing multiobjective concepts and order theory. Finally, Section 6 exposes the application on a drainage problem and shows the set of solutions not dominated through the Minimum that are reflected in the associated Pareto Front. Finally the conclusions and references are presented.
Preliminaries
In science and engineering problems, the cost of evaluating the functions that describe them is high, specifically in optimization processes when the dimensions of the multi-objective search space are expanded, that cost rises exponentially.
Analytical optimization methods have been used for decades and for real problems, the number of variables that must be taken into account and the mathematical nature of the models require simplifications that allow the application of multi-objective strategies that support committed optimization with more than one criterion cost or fitness, which optimize objectives simultaneously by putting them in competition with each other, so that for each value of a fitness or cost function the best of the other objectives is obtained. Thus, the solution of the general problem is a combination of the optimal solutions of each objective, so said solution is not unique.
Related works
For the labor problems that we are presenting, some articles present different solutions, for example [6], have focused on improving an evolutionary metaheuristic based on vector evaluation to solve two multi-objective problems 1) environmental-economic dispatch and 2) portfolio optimization. The purpose is to deal with two populations independently and exchange information between them. The first population evolves according to the best individual of the second population and vice versa. The algorithmic choice will be executed in each generation stochastically between three evolutionary algorithms. To evaluate the results, the authors use an established metric in multi-objective evolutionary algorithms called hypervolume [6].
Multi-objective evolutionary algorithms (MOEA) optimize objectives by registering the solutions on the Pareto front. On the other hand, for the performance of the MOAS, the rate of convergence and the diversity of the population are used. In multi-objective clustering problems, to design a good algorithm with a high rate of convergence and diversity, k-means clustering is used because it provides diversity in the decision search space, then it is combined with a real-parameter version of the enhanced environmental adaptation method (IEAM-RP) because IEAM-RP converge very fast [7].
Vehicular networks have unique capabilities and are associated with vehicle communications to specific infrastructure to reduce traffic dynamics and support providers and the driver, then, it is possible to rely on security and communication alerts. Although the authors do not treat the problem as multi-objective, a subtle optimization/clustering way is identified in the location-allocation context to solve the neighborhood location registration and communications problem. The information of these messages in the communications is optimized and only important information is transmitted to the relevant vehicles and for the optimization of communication, an Artificial Bee Colony (ABC) swarm intelligence algorithm is used with a diffuse component with which, they achieve greater precision of optimization in the transmission of information with the minimum delay and with the lowest energy consumption [8].
For our work, we agree that the sets involved in this section are immersed in a metric space
A partition of a set
A k-partition of a set
It should be noted that the power set contains all the subsets, therefore, all the classes or groups that can be formed from
The distance between two nonempty sets of objects
then
By consistency for each
Considering that an object can be represented in the plane under a graph, the following is defined: An undirected graph is a pair of sets
The order of
If a graph is not connected, we say that it is disjointed or disconnected. If the distance between
Statement 1. A component of
Statement 2. A subset
Compactness
Geometric compactness in land design has not been precisely defined. In this context, the researchers have concentrated on describing the measure quantitatively based on the problem, which has implied the existence of many compactness proposals [10].
Intuitively, geometric compactness can be understood when we imagine that the shape of each area grouping resembles a square, a circle or a convex geometric figure, this means that in some cases the measurements are not totally convincing [11]. In other words, geometric compactness has been expressed as a condition that seeks to avoid the creation of areas of irregular or non-polygonal shape. It is also considered that the compactness constitutes clarity in the delimitation of the zones and it has been appreciated that the compact zones are easier to administer and evaluate due to the reduction of transfer times and communication difficulties [12].
According to the above, we have defined compactness as follows:
To formalize this idea when the objects are AGEBs, the distance between groups is first defined.
If we denote by
A group
A group
An example of how two objects meet the above constraints is shown in Fig. 1.
Four compact groups.
Considering the previous, a group
A partition
Note that with this definition the space
Compactness representation.
Another useful way of looking at the compactness of a group
Taking into account the previous observation, it is define the compactness of a partition as follows:
The following proposition establishes an equivalence with Definition 1.
Demostration. If
There is an imprecision in terms of homogeneity and balance. What we are trying to find is that the groups of zones have equilibrium or balance with respect to one or more properties of the geographical units that form them. For example, zones can be designed that have the same workload, the same travel times, or the same percentages of socio-economic representation. In general, it is not possible to achieve the exact equilibrium, therefore, it is chosen to calculate the deviation with respect to the ideal arrangement. The greater the deviation, the worse the equilibrium of the zone.
Population balance
In the design of zones for a logistical problem, for example, it is desirable that all zones contain the same amount of population or at least that the population difference between each zone is the minimum possible to optimize transfers. Different measures can be defined to calculate the population balance of zones, how-ever, each measure depends on the specific problem and all of them lead to very similar results.
The formulas for homogeneity are listed below:
The simplest way to measure the population balance is to add the absolute values of the difference between the population of each area and the population average per area
where The difference in population between the most populated area and the least populated area Sometimes this difference is divided by the population average The division of the most populated area by the least populated The following method is given by the function
Where:
In this way, it is sought that the population of each area is within the interval
It is observed that this function will take the value of 0 if the population of each zone is in the interval
Homogeneity for spatial data of a population makes sense when the values of the population variables in a grouping process have “more or less” the same values, that is, homogeneous groups are those that want the groups to maintain ap-proximately the same amount of a population variable, for example, each group having roughly the same number of inhabitants.
Homogeneous grouping has many applications; It is useful when you want to obtain groups of zones to which workloads or resources must be assigned equally.
For example, assume that it is necessary to supply a state of the republic with a service
Homogeneity over agebs for population variables
If we denote by
A group
When only homogeneity is considered, the results are not practical since in the absence of compacidade, the dispersion of the objects is an expected consequence (see Fig. 3) [14].
The solution to the homogeneous classification in which the objects that make up each of the four groups are scattered throughout the state. Each group has roughly the same number of dwellings.
According to the above, a desirable characteristic is that the groups in an area have a balance with respect to one or more properties of the geographic units that form them. A population variable can be considered as a function
For example,
Note that Definition 5, in practice, turns out to be an ideal condition, since in many territorial design problems this condition is clearly impossible. To cite an example, if
where
Note that a
If a partition
Then if the bound for the homogeneity error is small and the function
In many problems of territorial design, more than one population variable may intervene, for example, it may be desirable to consider an area
for each
where
In the search for strategies that support the simultaneous optimization of more than one cost or aptitude criterion, multi-object optimization arises, which seeks to optimize objectives simultaneously and in competition, in such a way that for each value of a cost function it is obtained the best combination of objectives. This situation indicates that the solution of this type of problem is a combination of the optimal solutions for each objective, so this solution is not unique [15, 16].
It is possible to say that the implementation of algorithms to solve multiobjective optimization problems is divided into two categories that obey the type of objective functions used. The first group assumes low computational cost objective functions and the overall efficiency of the optimization algorithms is pursued. The second group focuses on building algorithms where the computational cost is high. In this situation it is assumed that the biggest problem is centered on the space of solutions and the competition between the objective functions, therefore, it is interesting that the algorithmic procedure in the optimization reduces the number of evaluations of the cost functions since it has been experienced that computationally, these govern the duration and process of the optimization.
In general, it is true that most of the optimization problems in the real world are non-linear and multimodal, with more than one objective to solve, which are contradictory to each other and subject to various complex constraints. Even for single-objective problems, it is common that there are no optimal solutions that will satisfy them. Consequently, multi-objective optimization is defined as the problem of finding a vector of decision variables that satisfies certain restrictions and optimizes a vector of functions whose elements represent the objective functions [17]. Multi-objective optimization is not restricted to the search for a single solution, but for a set of solutions called Non-Dominated solutions. Each solution of this set is said to be a Pareto Optimum and they are represented in the space of the values of the objective functions, which make up what is known as the Pareto Front.
In a specific problem, obtaining the Pareto Front is the main purpose of Multiobjective Optimization. Multi-objetive scheme.
Multi-objective problems can be made clearer by identifying the relationships between the characteristics of the problem, its constraints, and the main objectives to be improved together. For these types of problems, it is possible to have an expression as a mathematical function. When referring to the improvement as a whole, it is said that all functions must be optimized simultaneously, thus defining a problem of the type described below:
Minimize
Given the
and is evaluated in
Pareto front
Another common option to be used as Pareto dominance is expressed as:
with
In this way, the answer to the problem of finding non-dominated solutions is known as the solution set of the problem. In the other hand, the group of values of the objective function with domain restricted to the vectors of the solution set (the non-dominated vectors), is what is well know as the Pareto front.
In summary, the Pareto optimal set is the solution space of the problem and the Pareto Front is its image with respect to the function
The Pareto optimal for a given multiobjective problem is a partially ordered set, seen formally. In multiobjective problems, we look for the minimal elements of the solution space
Order relations
Reflexivity: Every element of A is related to itself, that is to say
Antisymmetry: If two elements of
Transitivity:
A relation of order R on a set A can be denoted by the ordered pair
If
If
If we have
The following statement is of vital importance to generate non-dominated solutions:
From a partial order, the dominance relationship
When it happens that
A
Let
Proof:
Let
So it fulfills the following implications:
(a)
If
Proof:
By definition
They are certain implications. The last expression can be written as
Let
To conclude this subsection, minimal and maximal are cited as components of a Hasse diagram.
Maximal and minimal elements in a Hasse diagram
Let
Intuitively, an element n of a CPO
Similarly, an element
A partition of spatial data is sought whose composition is given by 2 components: geographic coordinates in the
The first component determines a distance matrix to optimize geometric compactness (goal 1: minimization of Euclidean distances). The vector of population variables is used to optimize homogeneity (second objective: minimize some of the census variables). The problem to be solved must be described based on the population variables, therefore, before optimizing the objective functions, a selection of the variables of interest is necessary [20].
In this sense, the partition that is sought is a set of solutions such that each one is made up of objects that are geographically very close and balanced for some census variable. Under these conditions, we are facing a bi-objective problem of territory design and high computational complexity, a situation that requires the use of heuristic methods to achieve approximate solutions and multi-objective techniques to find a set of pairs of non-dominated solutions that form the front of Pareto.
Solution strategy
The problem to be solved is partitioning in TD, it is discrete, combinatorial, integer-mixed where the partitioning takes geographic objects as objects to be grouped and the algorithm considers a cost function that minimizes distances between objects and centroids and that according to the partitioning by medoids, compactness is indirectly satisfied provided that Definitions 1–3 and/or Proposition 1 of Section 3 are incorporated into the construction of the groups in the implementation. On the homogeneity side, this is optimized by seeking a balance between a census variable of interest. Both homogeneity and compactness are optimized at the same time to obtain the set of non-dominated solutions.
The grouping strategy consists of randomly choosing objects as centroids that identify the number of groups. Those non-centroid objects that have the shortest distance to a given centroid are the members of a group. This idea together meets the convenient compactness properties and is implemented in the creation of groups under the minimization of distances. At the same time, the homogeneity of the created grouping is calculated since the domain in this problem is a partition, that is, in multi-objective problems the function to be optimized has the same domain for all objectives, thus on the same partition compactness and homogeneity are optimized [21]. At this point it is important to observe that in multi-objective optimization problems the best alternative
The heuristics incorporated into the grouping algorithm should focus on finding a random initial solution (current solution), which is a partition where compactness and homogeneity have been minimized, obtaining a pair
Application
The goal is obtaining a set of groups of spatial data which composition is given by two components: geographical coordinates on the plane
The description vector is used to optimize the second objective function and consists in minimizing the homogeneity of a given census variable. The selection of the variables will be made over the data that relates the population to the tap water and drainage network services [22].
According to the National Institute of Statistics and Goegraphy (INEGI) their census data have allowed making diagnoses about sufficiency or deficit of services at home. These indicators have been the base to develop construction, expansion, improvement, financing and characterization strategies for houses. With the census variables it will be possible to carry out studies about the deterioration of houses, the access to basic services that provide comfort, ease the domestic work and improve the quality of life.
We focus in offering a partitioning of 2 groups of population related to tap water and drainage network services. These groups are formed by spatial objects known as Agebs. Each Ageb consists of a vector of variables, which represent geostatistical data from a census.
An important factor is making a decision about the viability of assigning resources to vulnerable population sector. We have chosen the population with partial tap water and drainage network services as study case. This means that the variables described on the list below are the ones that participate in the grouping and to make sure all of the Agebs are involved; we have filtered the variables with values between 0 and 100%.
Assuming that there is a government program to support this population sector, they need a set of groupings that show the distribution of this population regarding the following census variables, which are strongly related with each other:
Z119 Total of inhabited houses. Z120 Inhabited private houses. Z137 Private houses with drainage connected to the septic tank, ravine, crack, river, lake or sea. Z138 Private houses without drainage network service. Z140 Private houses with tap water inside the building. Z141 Private houses with tap water within the land. Z142 Private houses with tap water available by transportation (public water tap or from another home) Z143 Private houses that only have tap water and drainage network services. Z144 Private houses that only have drainage network and electricity services. Z145 Private houses that only have tap water and electricity services. Z146 Private houses that have tap water, drainage network and electricity services. Z147 Private houses that don’t have tap water, drainage network nor electricity services. Z136 Private houses with drainage connected to the public network [22, 23].
Application description
A model prior to the one presented in this work was developed and applied to the variables related to water and drainage described in the previous paragraph. It was considered that 8, 32 and 64 groups maintaining homogeneity in the variable private houses that don’t have tap water, drainage nor electricity services (Z147) [19].
Since in this work we have proposed a new model with the objective functions of Sections 3 and 4, we present here a different test with two groups but without altering the Pareto dominance strategy for the non-dominated solutions.
The first column of Table 1 belongs to the cost of the homogeneity function
Result of a bi-objective partition with two groups
Maximum solutions that form the pareto front
Maximum solutions that form the pareto front
In Fig. 4 it can be seen that the Pareto front has been formed according to the non-dominated solution method applied to the problem of water and drainage.
Pareto chart for the problem about houses with drainage and water problems.
If the decision maker is interested in sacrificing compactness or homogeneity he must choose a solution from the marked points of the graph.
Since it is a minimization problem, the values in Table 1 of the previous section correspond to the minimums for the objectives of compactness and homogeneity.
An important contribution is that the multiobjective technique in this work has overcome the problems where Pareto dominance through minimals did not achieve an adequate set of solutions and some pairs of dominated solutions lagged behind minimals [5, 23].
However, this problem was solved with another modeling although, it generated satisfactory results, it has been important to express homogeneity and compactness in another way to determine if the implementation of the measures that we have included in this manuscript are better than in past implementations, which it is translated as future work, however, the authors estimate that the current modeling exceeds our previous contributions. On the other hand, the model for compactness and homogeneity is innovative, however, other restrictions still need to be addressed and the model faithfully implemented, which translates into future work.
