The paper considers new classes of spaces of finite, bounded, measurable multisets with different metrics, pseudometrics, quasimetrics, symmetrics, and some properties of these metrics. We discuss the possibilities to apply new types of metrics for estimating proximity of objects with many numerical and/or verbal attributes. We use the introduced indexes of similarity and dissimilarity of objects represented as multisets in new methods of group multiple criteria decision making.
Necessity to assess a similarity or dissimilarity of objects (options, alternatives) is presented often in various theoretical and practical problems of decision making, data analysis, pattern recognition, artificial intelligence, machine learning, bio-informatics, chemistry, and other areas. The similarity and dissimilarity of objects are evaluated by their proximity in the attribute space [1, 30].
In mathematics, there are known a lot of attribute spaces with various metrics for characterizing object features [5, 12]. We mention the spaces of vectors, sequences, functions, sets, fuzzy sets, and so on [10, 28]. Metrics in spaces of multisets or sets with repeating elements were introduced firstly by the author [15, 21] and studied in a small number of works [2, 27]. This paper describes new classes of multiset spaces. We discuss how to use new families of metrics for estimating the similarity and dissimilarity of objects with many numerical and/or verbal attributes that exist in several different exemplars (copies).
The paper has the following structure. Section 2 contains basic notions of the multiset theory. In Section 3, we define families of multisets, introduce a new concept of multiset measure. Section 4 describes new classes of spaces of finite, bounded, measurable multisets with different metrics, pseudometrics, quasimetrics, and symmetrics. Section 5 considers important properties of metrics generated by multiset measure. In Section 6, we show how to represent and compare multi-attribute objects using the proposed notions. Section 7 presents the indexes of similarity and dissimilarity of objects, generalized the known indexes for multisets and applied for group multiple criteria choice. Section 8 concludes the paper.
Basic notions of multiset theory
Let us consider the basic definitions of the multiset theory [3, 29]. A multisetA drawn from an ordinary (crisp) set X = {x1, x2, …} with different elements, called the domain, is defined as the following collection of identical elements
Here is the multiplicity function that marks the number of occurrence of an element x ∈ X in the multiset A. It is denoted by the symbol °. A multiset becomes a set when kA (x) = χA (x), where χA (x) =1 if x ∈ A, and χA (x) =0 if x ∉ A. A combination kA (x) °x of k identical elements x is called the component of multiset A and denoted also as xkA(x) A one-element set {x} = {1°x} = {x1} is called the simple set or singleton. A zero-element set {0°x} = {x0} will be named the zeron. A one-component multiset {k°x} = {xk} is called the simple multiset, multiple singleton, or multiton.
The set SuppA = {x|x ∈ X, χSuppA (x) = min [kA(x), 1]} is named the support set or carrier of a multiset A. The maximum value (x) = altA of the multiplicity function is called the height of a multiset A. The cardinality of a multiset A is defined as a total number of all elements of a multiset A: cardA = |A| = ΣxkA (x), and the dimensionality of a multiset A is defined as a total number of all distinct elements of a multiset A: dim A =/A/= ΣxχA (x) = |SuppA|. For instance, the cardinality |A| of a n-dimensional multiset A = {kA (x1) °x1, …, kA (xn) °xn} over a n-element set X = {x1, …, xn} is equal to kA (x1)+ … + kA (xn) = m, and the dimensionality /A/ of a multiset A is equal to χA (x1) + … + χA (xn) = n = |X|.
Multisets A and B are said to be equal (A = B) when kA (x) = kB (x), ∀ x ∈ X, and unequal (A ≠ B) when kA (x) ≠ kB (x) for at least one x ∈ X. For equal multisets, we have |A| = |B|, /A/=/B/, altA = altB, SuppA = SuppB. Multisets A and B are named equicardinal if |A| = |B|; equidimensional if /A/=/B/; equivalued if multisets are equicardinal and equidimensional.
A multiset B is said to be included in a multiset A (B ⊆ A), when kB (x) ≤ kA (x), ∀x ∈ X. Then a multiset B is called the submultiset or multisubset of a multiset A, and a multiset A is called the overmultiset or parent multiset of a multiset B. For B ⊆ A, we have |B| ≤ |A|, /B/≤/A/, altB ≤ altA, SuppB ⊆ SuppA. As for sets the fulfillment of conditions A ⊆ B and B ⊆ A implies the equality of multisets: A = B.
A multiset is called the constant multisetN[h] if kN[h] (x) = h = const, , the empty multiset ∅ if k∅ (x) = 0; the maximal multisetZ if , ∀x ∈ X. In the last case all multisets A of a multiset family A are submultisets of the multiset Z. Note that the empty multiset ∅ is the constant multiset N[0] of the height 0. Any ordinary set A, in particular, the domain X and the support set SuppA of a multiset A, are the constant multisets N[1] of the height 1. The maximal multiset Z with kZ (x) = k is the constant multiset N[k] of the height k.
We define the following operations on multisets [3, 29]:
Many properties of operations on multisets are analogous to the properties of operations on ordinary sets. Among them are the idempotency of union and intersection A ∪ … ∪ A = A, A ∩ … ∩ A = A, involution of the complement , identity, commutativity, associativity and distributivity of some operations. As for sets, not all operations on multisets are mutually commutative, associative, and distributive. For multisets, as well as for sets, there is a duality of operations of union and intersection with respect to a multiset complement (an analogue of the de Morgan’s laws). For multisets, there is a new kind of duality of arithmetic addition and subtraction:
For multisets, some properties of the operations, existed for sets, are absent, but new properties appear that have no analogues for sets [19–21]. For example, the equalities: , , where U is the universal set, always hold for sets. Unlike sets, for multisets, the following equalities are valid: , . At the same time, it can be either , , , and , , . The cases are also possible when and . In addition, the empty multiset ∅ and the maximal multiset Z are mutually complementary , , and .
Families and measures of multisets
Let us consider some special types of families and functions of sets and multisets [3, 29].
The family of all subsets of a set A that includes the set A and the empty set ∅ is called the power set or Boolean of a set A and often denoted as P (A).
The family of all different submultisets of a multiset A over a domain X is called the macroset or Boolean of a multiset A and will be denoted as P (A). Here the word “different“ plays an important role, since the multiset theory, in general, allows for a repetition of elements in a multiset and multisets in a family of multisets. In the family P (A), different submultisets are always presented in single exemplars, including the multiset A, which is the maximal multiset for the family P (A), the support set SuppA, and the empty multiset ∅. So, the Boolean P (A) is the set of different submultisets of a multiset A.
The family of all possible submultisets of a multiset A over a domain X is called the power multiset or multiboolean of a multiset A and will be denoted as Q (A). The multiboolean Q (A), in contrast to the Boolean P (A), is a multiset, in which several identical exemplars of submultisets of a multiset A are allowed. The Boolean P (A) of a multiset A is the support set SuppQ (A) of the multibulean Q (A). If A ⊆ A, then P (A) ⊆ P (A) ⊆ Q (A). Some of the elements of the Booleans P (A), P (A), and the multiboolean Q (A) are themselves sub(multi)sets of other (multi)sets of these families.
A family of multisets over a domain X such that every element x ∈ X appears no more than k times in any multiset A, that is kA (x) ≤ k, , is called the k-bounded series or k-bounded repetition of multisets and denoted as P[k]. The maximal multiset for the family P[k] is the constant multiset Z[k] with the multiplicity function kZ[k] (x) = k = const, ∀x ∈ X. Thus, the k-bounded series P[k] is the Boolean P (Z[k]) of the multiset Z[k]. In particular, 0-bounded series P[0] consists only of the empty multiset ∅, 1-bounded series P[1] coincides with the Boolean P (X) of the set X.
The cardinality of a familyP (A) of sets and familiesP (A), Q (A), P[k] of multisets is determined as a total number of its elements (that is, respectively, subsets of a set A and submultisets of a multiset A). For example, the cardinality cardP (A) = |P (A) | of the Boolean P (A) of the n-element set A ={ x1, …, xn } is equal to 2n = 2|A| = 2cardA. The cardinality cardP (A) = |P (A) | of the Boolean P (A) of a n-dimensional multiset A ={ kA (x1) °x1, …, kA (xn) °xn } over a n-element set X ={ x1, …, xn } is equal to . The cardinality cardQ (A) = |Q (A) | of the multiboolean Q (A) of a n-dimensional multiset A = {kA (x1) °x1, …, kA (xn) °xn } over a n-element set X ={ x1, …, xn } is equal to 2m = 2|A| = 2cardA. The cardinality cardP[k] = |P[k]| of the k-bounded series P[k] of multisets over the finite set X ={ x1, …, xn } is equal to (1 + k) n.
A family of multisets over a set X ={ x1, x2, … } is called the semiring of multisets if it contains the empty multiset ∅, intersection Ai ∩ Aj of multisets Ai, Aj ∈ E, and any multiset A ∈ is representable as a finite union of disjoint submultisets As ⊆ A; the ringk of multisets if it contains the empty multiset ∅, union Ai ∪ Aj, sum Ai + Aj and difference Ai - Aj of multisets Ai, Aj ∈ k; the algebra of multisets if it is a ring that includes the maximal set, named the unit of algebra, and is closed under finite unions, additions and complements of multisets; σ-ringkσ of multisets if it is a ring and contains a countable union and sum of multisets As ∈ kσ; δ-ringkδ of multisets if it is a ring and contains a countable intersection of multisets As ∈ kδ; σ-algebraσ of multisets if it is a σ-ring, contains the unit of algebra, and is closed under countable unions, additions and complements of multisets. In particular, every ring of multisets is a semiring of multisets, every σ-algebra of multisets is a δ-algebra of multisets, and vice versa.
The Boolean P (A) of a set A is the minimal algebra of subsets of a set A, which is the unit of algebra. The Boolean algebra B (A) of subsets of a set A ⊆ X is the σ-algebra of a set X, which is the unit of algebra, and the family P (A) is the support of the algebra. The Boolean P (A) of a n-dimensional multiset A ={ kA (x1) °x1, …, kA (xn) °xn } over a finite set X ={ x1, …, xn } is the minimal algebra of submultisets of a multiset A, which is the unit of algebra. The Boolean algebra B (A) of submultisets of the multiset A ={ kA (x1) °x1, …, kA (x2) °x2 } over an infinite set X ={ x1, x2, … } is the σ-algebra of a multiset A, which is the unit of algebra.
A measure of a set is of great importance in many areas of mathematics [14]. A set measure intuitively associates with a “massiveness” of a set and depends on the number of its elements. A notion of a set measure appears as a generalization of the concepts of length of a segment, area of a plane figure, volume of a spatial body. We introduce a notion of a measure of a multiset as follows [21].
A non-negative real-valued function , defined on a family A = { Ai } i∈I of multisets, is called the strongly additive or strongly finitely additive measure of a multiset if the equality
holds for any finite collection A of multisets, and called the stronglyσ-additive or strongly countably additive measure of a multiset if the equality
holds for any countable collection A of multisets.
A family A is a semiring , ring k, σ-ring kσ, or σ-algebra σ of submultisets Ai of some multiset A ={ kA (x1) °x1, kA (x2) °x2, … } over a set X ={ x1, x2, … }. Such a multiset A can be, for example, the maximal multiset Z ={ kZ (x1) °x1, kZ (x2) °x2, … }, constant multiset Z[k] ={ k°x1, k°x2, … }, n-dimensional multiset ={ k (x1) °x1, …, k (xn) °xn }. From the identity of the empty multiset ∅ =∅ + ∅ and the equality (2) follows: m (∅) = m (∅ + ∅) = m (∅) + m (∅) = 2m (∅). So, m (∅) = 0.
If multisets of an family A are disjoint (Ai ∩ Aj = ∅, i ≠ j, AiAj ∈ A), then a sum of multisets is equal to union: . In this case, the expressions (2) and (3) can be written as
A measure m of a multiset satisfying the condition (4) is called weakly additive or weakly finitely additive, and satisfying the condition (5) is called weaklyσ-additive or weakly countably additive.
Obviously, the strongly additive function of a multiset is weakly additive. The converse, generally, is not true. The weak additivity (4), (5) of a multiset function coincides with the finite and countable additivity of a set function. For sets, the operation of addition is not feasible, and the strong additivity of a function is absent.
The measure m of a multiset has also the following properties: a monotonicity m (A) ≤ m (B) ⇔ A ⊆ B; a symmetry ; a continuity ; an elasticity m (b • A) = bm (A), where b ⩾ 1 is an integer. Thus, features of multiset functions are more complicate and diverse than features of set functions.
Let the measure of a singleton {xi} be equal to m ({ xi }) = wi, 0≤ wi < ∞. Then, according to the equalities (2) and (3), the function
defined on a family A = { Ai } i∈I of multisets Ai = kA (xi) °xi, is strongly additive or strongly σ-additive measure of a multiset. If we put all the numbers wi = 1 in formula (6), then we obtain: m (A) = ∑ikA (xi) = |A|. Thus, the multiset cardinality is a special case of the multiset measure.
Let us note other important properties of the multiset measure stipulated by the strong additivity of a measure and generalizing some properties of the multiset cardinality [13]:
Define the space of multisets with a measure as a pair (σ (Z), m), where σ (Z) is σ-algebra of the maxi-mal multiset Z, m is σ-finite measure given on σ-ring kσ of submultisets . Necessary and sufficient conditions for the measurability of a multiset are formulated as follows. A multiset A belonging to σ-ring kσ is measurable if and only if there exists a measurable multiset B ∈ kσ; such that the inequality m (AΔB) < ɛ holds for any ɛ > 0 [21]. The sum, union and intersection of a finite number of measurable multisets, difference and symmetric difference of two measurable multisets, reproduction of a measurable multiset are also the measurable multisets.
A real-valued function dX, defined on the direct product X × X of a set X, is called the metric on a set X if for any elements of X the following requirements are fulfilled: (10) the axiom of symmetry dX (x, y) = dX (y, x); (20) the axiom of identity dX (x, y) = 0 ⇔ x = y; (30) the axiom of a triangle dX (x, y) ≤ dX (x, z) + dX (z, y). The non-negativity of a metric dX (x, y) ⩾ 0 follows from the requirements (10 - 30). A set X with a metric dX given on X is called the metric space and denoted as (X, dX). A numerical value of function dX (x, y) is called the distance between the elements x and y of a set X.
A metric space X is said to be metrically convex or d-convex if, for any different elements x, y ∈ X, x ≠ y, there exists a distinct element z ∈ X, z ≠ x, z ≠ y such that the triangle inequality (30) is transformed into the equality dX (x, y) = dX (x, z) + dX (z, y). This metric dX is named d-convex. The property of the metric convexity of a space (X, dX) is a specific analogue of the ternary relation [x, z, y] “the element z is placed between the elements x and y” that is given by the condition inf(x, y) ≤ z ≤ sup(x, y) in a partially ordered set.
For evaluating a mutual disposition of elements of a set X, other indexes of the elements’ proximity are used, which correspond to the incomplete axiomatics of the metric space. A real-valued function dX, defined on the direct product X × X of a set X, satisfying the axiom (10) of symmetry and the coincidence condition (40) dX (x, x) = 0 for all x ∈ X, is called the quasimetric on a set X, and a set X with a quasimetric dX is called the quasimetric space. A quasimetric dX, satisfying the axiom (30) of a triangle, is called the pseudometric on a set X, and a set X with a pseudometric dX is called the pseudometric space. In quasi- and pseudometric spaces, the axiom (20) of identity for dX is not satisfied, in general, that is, the condition dX (x, y) = 0 does not imply the equality of the elements x and y. The coincidence condition (40) is weaker than the axiom (20) of identity. Therefore, every metric is also a quasimetric and pseudometric. The converse is not generally true.
A real-valued function dX, defined on the product X × X of a set X, satisfying the axioms (10) of symmetry and (20) of identity, is called the symmetric on a set X, and a set X with a symmetric dX is called the proximity space. A symmetric dX, satisfying the inequality (50) for any elements of a set X, is called the ultrametric on a set X or the ultrametric distance between the elements x and y. It is easy to verify that the axiom (30) of a triangle holds for an ultrametric, and the inequality (50) is stronger than the inequality (30). Therefore, every ultrametric is also a metric, and a space (X, dX) with an ultrametric dX is a metric space. The converse is not generally true.
Consider now possible approaches to forming metric spaces (A, dA) on a family A of multisets.
The Boolean P () of submultisets As ⊆ of n-dimensional multiset = { kX (x1) °x1, …, kX (xn) °xn } over a finite set X ={ x1, …, xn } forms the metric space of finite multisets [8]:
p ⩾ 1 is an integer;
The Boolean P () of submultisets of n-dimen-sional multiset over a finite set X ={ x1, …, xn } is equivalent to the set of n-dimensional vectors s = (ys1, …, ysn) with real components , i = 1, …, n. The metric spaces Pp, P∞ of finite multisets are metrically convex, embeddable into the vector spaces and the spaces lp of bounded numerical sequences, everywhere dense, separable, incomplete, noncompact, but relatively compact.
The σ-algebra σ () of submultisets As ⊆ of a multiset ={ k (x1) °x1, k (x2) °x2, … } over an arbitrary set X ={ x1, x2, … }, where multiplicities of a submultiset As satisfy the restriction kAs (xi)≤ ks < ∞ or for all integers p ⩾ 1, forms the metric space of bounded multisets [26]:
The σ-algebra σ () of bounded submultisets of a multiset over an arbitrary set X ={ x1, x2, … } is equivalent to the set of bounded numerical sequences Ys ={ ysi } = { ys1, ys2, … } with real terms , satisfying the restriction |ysi|≤ ks < ∞ for all integers p ⩾ 1. The metric spaces p, ∞ of bounded multisets are metrically convex, embeddable in the spaces lp of bounded numerical sequences, separable, complete, noncompact, but locally compact.
Note that the cardinality |AΔB| = ∑i|kA (xi) - kB (xi)| of symmetric difference of multisets A and B, presenting in the expressions (11–13) and (15–17), satisfies the axioms (10 - 30) of a metric space.
The space (σ (Z), m) of multisets with a measure, where σ (Z) is σ-algebra of the maximal multiset Z ={ kZ (x1) °x1, kZ (x2) °x2, … } over a set X ={ x1, x2, … }, a multiset measure m is the strongly σ-additive and completely σ-finite, forms the metric spaces of measurable multisetsZqp = (σ (Z), dZqp), q = 1, 2, 3, 4, p ⩾ 1 is an integer, with the functions [15, 21]:
The functions dZ3p and dZ4p are not defined for A = B =∅, therefore it is assumed by definition that dZ3p (∅, ∅) = dZ4p (∅, ∅) = 0. The function dZ1p gives the mapping , and the functions dZ2p, dZ3p, dZ4p give the mapping . The functions dZqp (A, B), q = 1, 2, 3 are pseudometrics satisfying the axioms (10) of symmetry, (30) of a triangle, and the coincidence condition (40). The function dZ4p (A, B) is a quasimetric satisfying the axiom (10) of symmetry and the coincidence condition (40).
In general, the equality A = B does not follow from the condition m (AΔB) = 0. For multisets that differ by a multiset of zero measure, the condition m (AΔB) = 0 implies the so-called m-equalities of multisets A = mB, A1 ∪ A2 = mB1 ∪ B2, A1 ∩ A2 = mB1 ∩ B2, which are valid almost everywhere. Then the axiom (20) of identity holds for the functions dZqp (A, B), q = 1 -4. The functions dZ1p, dZ2p, dZ3p become metrics, and the function dZ4p becomes a symmetric almost everywhere on the corresponding space of measurable multisets [21, 26].
The metric spaces Z1p, Z2p of measurable multisets are homeomorphic, complete, separable, the spaces Z11, Z21 are metrically convex and isometric to the space l1 of bounded numerical sequences. The metric space Z31 is metrically convex.
We shall call the pseudometric dZ1p (A, B) on the space (σ (Z), m) of measurable multisets general or basic, the pseudometric dZ2p (A, B) completely averaged, the pseudometric dZ3p (A, B) locally averaged, the quasimetric dZ4p (A, B) averaged. The general pseudometric dZ1p (A, B) marks the proximity of two multisets A and B in the original space. The completely averaged pseudometric dZ2p (A, B) marks the proximity of two multisets A and B reduced to the maximal possible distance in the original space. The locally averaged pseudometric dZ3p (A, B) marks the proximity of two multisets A and B reduced to the joint “common part” of these two multisets in the original space. The averaged quasimetric dZ4p (A, B) marks the proximity of two multisets A and B reduced to the maximal “common part” of these two multisets in the original space.
When we replace the multiplicity function kA (x) of a multiset by the characteristic function χA (x) of a set in (11–18), we obtain the metric spaces of the finite and bounded sets. Replacing the multiplicity function kA (x) of a multiset by the characteristic function χA (x) of a set in (19–22) we define the metric spaces of measurable sets. In these cases, the general metrics dX11 (A, B) = m (AΔB) and dX11 (A, B) = |AΔB| are called the Fréchet distance or a measure metric. The locally averaged metric dX31 (A, B) = m (AΔB)/(A ∪ B) is called the Steinhaus distance, and the metric dX31 (A, B) = |AΔB|/|A ∪ B| is called the biotopic distance [5, 13].
Properties of multiset measure metrics
Consider some features of the metrics dZ1p, dZ2p, dZ3pdZ4p generated by a multiset measure [21, 26].
The general pseudometric dZ1p (A, B) = [m (AΔB)] 1/-p and the completely averaged pseudometric dZ2p (A, B) = [m (AΔB)/m (Z)] 1/-p are continuous, uniformly continuous and equicontinuous functions, the locally averaged pseudometric dZ3p (A, B) = [m (AΔB)/m (A ∪ B)] 1/-p and the averaged quasimetric dZ4p (A, B) = [m (AΔB)/m (A + B)] 1/-p are piecewise continuous functions of its variables almost everywhere on the corresponding metric space for any number p. The pseudometrics dZqp (A, B), q = 1, 2, 3 and the quasimetric dZ4p (A, B) are not similar to any metrics of other metric spaces, in particular, to the Minkowski-type metrics (13), (17). For wi = 1 for all i in the formula (6), the general pseudometric dZ11 (A, B) = |AΔB| coincides with the Hamming-type metrics (11), (15).
Metrics generated by a multiset measure have also some geometric properties that are associated with the different types of transformations of spaces Zqp = (σ (Z), dZqp) of measurable multisets.
The general pseudometric dZ1p (A, B) = [m (AΔB)] 1/-p for any integer p ⩾ 1 is:
– invariant with respect to the “symmetric” shift dZ1p (A, B) = dZ1p (AΔ, BΔ) to an arbitrary multiset ; the “additive” shift dZ1p (A, B) = dZ1p (A + , B + ) to an arbitrary multiset ; the “conjunctive” shift dZ1p (A, B) = dZ1p (A ∪ , B ∪ ) to a disjoint multiset such that A∩ = B ∩ = ∅;
– invariant with respect to the exclusion dZ1p (A, B) = dZ1p (A - , B - ) of a multiset ⊆ A ∩ B; the exclusion dZ1p (A, B) = dZ1p ( - A, - B) from a multiset ⊇ A ∪ B; the exclusion dZ1p (A, B) = dZ1p ((A ∪ B) - , - (A ∩ B)) of a multiset placed between multisets A and B such that A ∩ B ⊆ ⊆ A ∪ B;
– invariant with respect to the transformations dZ1p;
– elastic with respect to the extension dZ1p (A, B) = (1/ - b) 1/-pdZ1p (B • A, B • B), where b ⩾ 0 is an integer;
– for p = 1 maximally additive dZ11 (A, B) = dZ11 (A ∪ , B ∪ ) + dZ11 (A ∩ , B ∩ ) with respect to an arbitrary multiset ;
– for p = 1 d-convex dZ11 (A, B) = dZ11 (A, ) + dZ11 (, B) with respect to a multiset placed between multisets A and B such that A ∩ B ⊆ ⊆ A ∪ B.
The completely averaged pseudometric dZ2p (A, B) = [m (AΔB)/m (Z)] 1/-p for any integer p ⩾ 1 is:
– invariant with respect to the “symmetric” shift dZ2p (A, B) = dZ2p (AΔ, BΔ) to an arbitrary multiset ; the “additive” shift dZ2p (A, B) = dZ2p (A + , B + ) to an arbitrary multiset ; the “conjunctive” shift dZ2p (A, B) = dZ2p (A ∪ , B ∪ ) to a disjoint multiset such that A∩ = B ∩ = ∅;
– invariant with respect to the exclusion dZ2p (A, B) = dZ2p (A - , B - ) of a multiset ⊇ A ∪ B; the exclusion dZ2p (A, B) = dZ2p ( - A, - B) from a multiset ⊇ A ∪ B; the exclusion dZ2p (A, B) = dZ2p ((A ∪ B) - , - (A ∩ B)) of a multiset placed between multisets A and B such that A ∩ B ⊆ ⊆ A ∪ B;
– invariant with respect to the transformations dZ2p (A, B) = dZ2p (A ∪ B, A ∩ B) = dZ2p (A - B, ;
– invariant with respect to the extension dZ2p (A, B) = dZ2p (b • A, b • B), where b ⩾ 0 is an integer;
– for p = 1 maximally additive dZ21 (A, B) = dZ21 (A ∪ , B ∪ ) + dZ21 (A ∩ , B ∩ ) with respect to an arbitrary multiset ;
– for p = 1 d-convex dZ21 (A, B) = dZ21 (A, ) + dZ21 (, B) with respect to a multiset placed between multisets A and B such that A ∩ B ⊆ ⊆ A ∪ B.
The locally averaged pseudometric dZ3p (A, B) = [m (AΔB)/m (A ∪ B)] 1/-p for any integer p ⩾ 1 is:
– invariant with respect to the transformation dZ3p (A, B) = dZ3p (A ∪ B, A ∩ B);
– invariant with respect to the extension dZ3p (A, B) = dZ3p (b • A, b • B), where b ⩾ 0 is an integer;
– for p = 1 d-convex dZ31 (A, B) = dZ31 (A, ) + dZ31 (, B) with respect to a multiset = A ∪ B.
The averaged quasimetric dZ4p (A, B) = [m (AΔB)/m (A + B)] 1/-p for any integer p ⩾ 1 is:
– invariant with respect to the transformation dZ4p (A, B) = dZ4p (A ∪ B, A ∩ B);
– invariant with respect to the extension dZ4p (A, B) = dZ4p (b • A, b • B), where b ⩾ 0 is an integer.
Representation and comparison of multi-attribute objects
There are a lot of problems, where the objects are described by manifold characteristics: quantitative, qualitative, mixed, continuous or discrete. In addition, these objects can exist also in several exemplars (versions, copies) with different values of attributes, a convolution of which is impossible, or mathematically incorrect. Various versions of a multi-attribute object arise, for instance, in the cases, when an object is evaluated by several experts on many criteria with numerical and/or verbal scales, or the characteristics of object are calculated several times by different methods, or measured several times by different tools. Examples of such tasks are collective classification and ordering multicriteria decisions, recognition of graphic symbols, processing text documents.
The multiplicity and repeatability of factors that describe the objects complicate and render more difficult the solution of these problems. We discuss here possible ways of representing and comparing such objects [17, 24].
Firstly, consider the situation when objects O1, …, Oq, which are characterized by the attributes K1, …, Kn with rating scales , l = 1, …, n, exist in single versions. Traditionally, each object Oi, i = 1, …, q is presented as a vector or tuple , is the numerical or verbal grade on a scale Xl of the attribute Kl. The vector/tuple i is a point of n-dimensional space X = X1 × … × Xn formed by score scales of the attributes K1, …, Kn.
The situation becomes more complicated when there are several versions , s = 1, …, t of each object Oi, i = 1, …, q, which differ in values of the attributes K1, …, Kn. In such cases, no one, but t vectors/ tuples correspond to the object Oi. A vector/tuple describes one of versions of the object Oi. A component is the value of attribute Kl in the s-th version of the object Oi that is equal to a grade xj, j = 1, …, h of rating scale Xl ={ x1, …, xh }, or , el = 1, …, hl of rating scale , l = 1, …, n.
In n-dimensional attribute space X = X1 × … × Xn, the object Oi is now represented as not a single point i, but a group (“cloud”) of t points . Individual values of attributes describing versions of the object Oi (estimates of different experts, characteristics measured in different ways or by tools) may be similar, different, and even contradictory, that in turn can lead to incomparability of vectors/tuples representing the object Oi. A collection of multi-attribute objects O1, …, Oq, each of which is represented in the attribute space X = X1 × … × Xn as the own “cloud” consisting of t different points, can have a rather complex structure that is difficult to analyze. Therefore, it is desirable to simplify a description and aggregate a representation of such multi-attribute objects.
One of possible approaches to simplifying a description of the multi-attribute object Oi, represented as the group of vectors , is as follows. Since components of a vector are numerical values, we can specify a single vector instead of several vectors corresponding to the object Oi. Components of a vector are to be defined in n-dimensional attribute space X = X1 × … × Xn from additional formal conditions or some meaningful considerations. For example, it could be a vector with the averaged or weighted values of vectors’ components in a group, a vector the most closed to all vectors in a group, or a vector that is the center of a group.
However, when rating scales have discrete numerical estimates, a vector with “averaged” components may not physically exist in the original attribute space, since there are no corresponding numerical grades on scales X1 × … × Xn. So, one must either expand the primary rating scales by introducing intermediate numerical grades, or consider the rating scales as continuous. Strictly speaking, both the first and the second change the original formulation of a problem.
When objects are represented by tuples with symbolic, verbal or mixed components, several tuples can not be replaced, even in principle, by a single tuple with averaged, weighted, mixed values, since these operations are mathematically unrealizable.
In order to operate with such vectors, we use the formalism of multiset theory, which allows us to take into account simultaneously the heterogeneous features, possible combinations of characteristic values, the presence of different versions of objects.
Let all attributes K1, …, Kn have the same rating scale X ={ x1, …, xh }. Take out the set X ={ x1, …, xh } as the domain and correspond a multiset
to an object Oi, i = 1, …, q. Here a multiplicity kAi (xj) shows how many times an estimate xj, j = 1, …, h occurs in the description of object Oi.
Let now each attribute Kl has the own rating scale , l = 1, …, n. Introduce a combined scale (hyperscale) of attributes – the set , which consists of n groups of characteristics and combines all gradations of estimates on all rating scales. Then a multiset
over the domain X = X1 ∪ … ∪ Xn is corresponded to an object Oi. Here a multiplicity shows how many times an estimate , el = 1, …, hl of the attribute Kl occurs in the description of object Oi. The expression (24) can be easily rewritten in the “usual” form (23) by making changes: .
The variety of operations on multisets provides the ability to aggregate multi-attribute objects in different ways. It is common to combine objects into groups (classes) using the operation of addition of vectors or union of sets. A group Df of objects described by multisets can be formed as a sum f = ∑iAi, kf (xj) = ∑ikAi (xj), union f = ∪ iAi, , intersection f = ∩ iAi, or as a weighted linear combination f = ∑ibi • Ai, f = ∪ ibi • Ai, f = ∩ ibi • Ai of multisets corresponding to objects. While adding multisets, all properties (all values of all attributes) of all members of a group are aggregated. While uniting or intersecting multisets, the best properties (the maximum values of all attributes) or, respectively, the worst properties (the minimum values of all attributes) of individual members of a group are increased.
Let a multiset , i = 1, …, q, over a set X ={ x1, …, xh } or of estimates corresponds to a s-th version , s = 1, …, t of multi-attribute object Oi, and a multiset Ai ={ kAi (x1) °x1, …, kAi (xh) °xh } corresponds to an object Oi. We form the multiset Ai representing the object Oi by the weighted addition of multisets describing versions of this object as follows: . The multiplicity function is equal to , j = 1, …, h. An integer coefficient c<s> marks a significance of the s-th version of object (expert competence, measurement accuracy).
Consider an illustrative example. Let objects O1, …, O10 be papers submitted at any Conference. Two experts review each paper and estimate it by the following criteria: K1. Presentation and clarity; K2. Originality of paper; K3. Significance of theory; K4. Significance of application; K5. Appropriateness of title, abstract and keywords; K6. Appropriateness of figures and tables; K7. Bibliography; K8. Overall evaluation. The qualitative criteria K1, …, K8 have rating scale with following verbal grades: a – excellent; b – good; c – satisfactory; d – poor; e – null. Every expert gives also one of the recommendations: ra – to accept the paper; rb – to reject the paper.
Table 1 shows the expert estimates of versions of papers O1, …, O10 presented as tuples 1, …, 10 and multisets A1, …, A10 over the set X ={ a, b, c, d, e }. For instance, two versions and of the paper O1 are represented by two tuples and , and by two multisets and . Suppose that both reviewers are equally competent: c<1> = c<2> = 1. Then the estimates of the paper O1 as a whole is described by the multiset .
The expert estimates of papers as tuples and multisets over the set X
O ∖ K
K1
K2
K3
K4
K5
K6
K7
K8
O ∖ X
a
b
c
d
e
b
a
b
a
b
a
b
a
4
4
0
0
0
a
a
a
a
b
b
b
a
5
3
0
0
0
b
e
d
e
c
d
d
d
0
1
1
4
2
c
d
e
e
b
c
c
c
0
1
4
1
1
e
e
c
e
b
e
e
e
0
1
1
0
6
e
d
c
e
a
d
e
c
1
0
2
2
3
a
c
d
b
b
a
b
a
3
3
1
1
0
b
b
c
a
b
a
c
b
2
4
2
0
0
b
b
b
b
b
a
b
b
1
7
0
0
0
a
a
c
b
b
b
a
b
3
4
1
0
0
a
a
b
b
b
a
a
b
4
4
0
0
0
b
a
b
b
b
b
a
b
2
6
0
0
0
b
a
d
c
c
c
e
c
0
1
3
2
2
c
d
e
b
d
b
d
d
0
2
2
3
1
b
a
b
d
c
b
a
b
2
4
1
1
0
a
b
a
c
b
a
b
b
3
4
1
0
0
c
d
c
e
c
c
d
c
0
0
5
2
1
b
c
d
d
d
c
c
d
0
1
3
4
0
a
a
b
a
c
a
a
a
6
1
1
0
0
c
b
c
b
d
b
d
c
0
3
3
2
0
Table 2 shows the estimates of papers O1, …, O10 presented as multisets A1, …, A10 over the expanded set X = X1 ∪ … ∪ X8 ∪ R = {a1, b1, c1, d1, e1; a2, b2, c2, d2, e2; a3, b3, c3, d3, e3; a4, b4, c4, d4, e4; a5, b5, c5, d5, e5; a6, b6, c6, d6, e6; a7, b7, c7, d7, e7; a8, b8, c8, d8, e8; ra, rb}. Here Xl = {al, bl, cl, dl, el} is the rating scale of the attribute Kl, l = 1 – 8, and R = {ra, rb} is the set of the expert recommendations. The multiset A1 = {1°a1, 1 ° b1, 0 ° c1, 0 ° d1, 0 ° e1; 2 ° a2, 0 ° b2, 0 ° c2, 0 ° d2, 0 ° e2; 1 ° a3, 1 ° b3, 0 ° c3, 0 ° d3, 0 ° e3; 2 ° a4, 0 ° b4, 0 ° c4, 0 ° d4, 0 ° e4; 0 ° a5, 2 ° b5, 0 ° c5, 0 ° d5, 0 ° e5; 1 ° a6, 1 ° b6, 0 ° c6, 0 ° d6, 0 ° e6; 0 ° a7, 2 ° b7, 0 ° c7, 0 ° d7, 0 ° e7; 2 ° a8, 0 ° b8, 0 ° c8, 0 ° d8, 0 ° e8; 2 ° ra, 0 ° rb} over the set describes the estimates of the paper O1.
The expert estimates of papers as multisets over the set
O∖X
a1b1c1d1e1
a2b2c2d2e2
a3b3c3d3e3
a4b4c4d4e4
a5b5c5d5e5
a6b6c6d6e6
a7b7c7d7e7
a8b8c8d8e8
rarb
A1
1 1 0 0 0
2 0 0 0 0
1 1 0 0 0
2 0 0 0 0
0 2 0 0 0
1 1 0 0 0
0 2 0 0 0
2 0 0 0 0
2 0
A2
0 1 1 0 0
0 0 0 1 1
0 0 0 1 1
0 0 0 0 2
0 1 1 0 0
0 0 1 1 0
0 0 1 1 0
0 0 1 1 0
0 2
A3
0 0 0 0 2
0 0 0 1 1
0 0 2 0 0
0 0 0 0 2
1 1 0 0 0
0 0 0 1 1
0 0 0 0 2
0 0 1 0 1
0 2
A4
1 1 0 0 0
0 1 1 0 0
0 0 1 1 0
1 1 0 0 0
0 2 0 0 0
2 0 0 0 0
0 1 1 0 0
1 1 0 0 0
2 0
A5
1 1 0 0 0
1 1 0 0 0
0 1 1 0 0
0 2 0 0 0
0 2 0 0 0
1 1 0 0 0
1 1 0 0 0
0 2 0 0 0
2 0
A6
1 1 0 0 0
2 0 0 0 0
0 2 0 0 0
0 2 0 0 0
0 2 0 0 0
1 1 0 0 0
2 0 0 0 0
1 1 0 0 0
2 0
A7
0 1 1 0 0
0 0 0 1 1
0 0 0 1 1
0 1 1 0 0
0 0 1 1 0
0 1 1 0 0
0 0 0 1 1
0 1 1 0 0
0 2
A8
1 1 0 0 0
1 1 0 0 0
1 1 0 0 0
0 0 1 1 0
0 1 1 0 0
1 1 0 0 0
1 1 0 0 0
0 2 0 0 0
2 0
A9
0 1 1 0 0
0 0 1 1 0
0 0 1 1 0
0 0 0 1 1
0 0 1 1 0
0 0 2 0 0
0 0 1 1 0
0 0 1 1 0
0 2
A10
1 0 1 0 0
1 1 0 0 0
0 1 1 0 0
1 1 0 0 0
0 0 1 1 0
1 1 0 0 0
1 0 0 1 0
1 0 1 0 0
1 1
Group multiple criteria choice
The problem of multiple criteria choice is formulated as follows. There is given a collection of objects (options, alternatives) O1, …, Oq, which are evaluated by one or several experts upon many criteria K1, …, Kn. Each criterion Kl has a scale , l = 1, …, n, the discrete numerical and/or verbal grades of which are ordered or not. Based on knowledge of experts and/or preferences of decision-makers (DM), it is required: to select one or several best objects; to order all objects; and to assign all objects into several classes (categories).
Consider a multi-attribute object Oi as a point of metric space (A, dA) of multisets, A ={ A1, …, Am }, dA is a metric of type (11–22). In practical problems, a multiset measure can be given as a linear combination (6): , wi > 0 is a significance (weight) of the element xi or the attribute Kl, l = 1, …, n. A proximity of the analyzed objects in the attribute space is characterized either by dissimilarity or similarity of object properties. So, dissimilarity between objects Oi and Oj can be evaluated, for example, by one of the metrics (19–22), which we rewrite as follows:
A similarity of the objects Oi and Oj can be evaluated by one of the following indicators:
Here the factors
The relations Hij = Mij - Lij, Nij = Mij + Lij hold.
The expressions s01, s11, s21, s31, s41 (29–33) for p = 1 generalize the known non-metric indexes of objects’ similarity such as, respectively, the measure of absolute similarity, the Russell-Rao similarity measure, the simple matching coefficient, the Jacquard coefficient or the Rogers-Tanimoto measure, the Sorensen index [1, 30].
We used the proposed indexes of dissimilarity (25–28) and similarity (29–33) between objects, which exist in several copies with many different, in particular, contradictory values of quantitative and/or qualitative attributes, in the new elaborated methods of group verbal decision analysis [17, 24].
The method ARAMIS (Aggregation and Ranking Alternatives nearby the Multi-attribute Ideal Situations) focuses on group ordering objects with many verbal attributes [23, 24]. This method is based on the estimations of the objects’ proximity to any “ideal” object in the attribute space and allows us to rank collectively multi-attribute objects without a pre-construction of individual rankings.
In the illustrative example, the best paper O+ and the worst paper O- are described by the multisets A+ ={ 10° a, 0° b, 0° c, 0° d, 0° e } and A- ={ 0° a, 0° b, 0° c, 0° d, 10° e }. Calculating the distances d (Ai, Aj) between object i and objects O+, O-, one can arrange all objects with respect to closeness to the best O+ or worst O- objects. The descending ordering of papers nearby the best paper O+ is as follows: O1 ≻ O6 ≻ O10 ≻ O4 ≻ O8 ≻ O5 ≻ O3 ≻ O2 ≻ O7 ≻ O9. There are here two groups of papers: O1, O6, O10, O4, O8, O5 (with the distances 14–24); and O3, O2, O7, O9 (with the distances 30–32). The first group can be considered as the accepted papers, the second group as the rejected papers.
The methods CLAVA-HI and CLAVA-NI (CLustering Alternatives with Verbal Attributes – HIerarchical and NonhIerarchical) allow us to generate clusters of several copies of objects with many verbal attributes, when a number of clusters is fixed or not fixed in advance [15, 24]. These methods are based on the proximity of objects in the multiset metric space. In hierarchical clustering, one combines step by step the nearest pairs of objects Ou, Ov and/or clusters Du, Dv such that , and forms new clusters as a sum, union or intersection of multisets describing objects/ clusters. Finally, two or more groups of objects are built.
In the example, the first cluster Dh includes the papers O1, O4, O5, O6, O8, O10 with, in general, the ‘high’ estimates a, b, which can be considered as accepted. The second cluster Dl includes the papers O2, O3, O7, O9 with, in general, the ‘low’ estimates c, d, e, which can be considered as rejected.
The method MASKA (abbreviation of the Russian words Multi-Attribute Consistent Classification of Alternatives) focuses on a group distribution of objects, described by many verbal attributes, between several classes [18, 22–24]. This method allows us to find collective rules that aggregate a lot of manifold and may be inconsistent individual rules for sorting multi-attribute objects.
Individual sorting rules of several persons are expressed by multisets Ai representing the objects Oi (the rows in Table 2). By adding the multisets Ai in accordance with any “majority of voices”, one constructs two new multisets that characterize a class Da of more preferable and a class Db of less preferable objects. For each l-th group of “substantial” attributes, one finds such pairs of submultisets Qla and Qlb that are placed at the maximum distance d (Qla, Qlb) in the multiset metric space and define the best decomposition of objects into the classes Da and Db. Various combinations of “substantial” attributes, included in these submultisets, form the generalized collective decision rule for choice of the multi-attribute objects that aggregates individual sorting rules with the acceptable approximation accuracy.
In the example, the class Da consists of the more preferable papers O1, O4, O5, O6, O8, O10; the class Db consists of the less preferable papers O2, O3, O7, O9. Two generalized collective decision rules for sorting papers are formulated as follows. “Papers with the estimates good or excellent by the criteria K2, K6, and the estimate excellent, good or satisfactory by the criterion K1, in other words, papers with the estimates (a2 or b2), and (a6 or b6), and (a1 or b1 or c1), are to be accepted”. “Papers with the estimates satisfactory, poor or null by the criteria K2, K6, and the estimate poor or null by the criterion K1, in other words, papers with the estimates (c2 or d2 or e2), and (c6 or d6 or e6), and (d1 or e1), are to be rejected”.
The multi-method technology PAKS-M (abbreviation of the Russian words Progressive Aggregation of the Classified Situations by many Methods) is elaborated for solving difficult problems of multiple criteria choice [25]. For instance, to select the preferable object, described by many numerical and/or verbal attributes, from a small collection of incomparable objects. This technology provides reducing the dimension of the attribute space, constructing several hierarchical systems of composite criteria and an integral quality index with verbal scales, which aggregate many initial attributes, and, finally, ordering and/or classification of multi-attribute objects using several decision making methods.
Conclusion
The paper describes new classes of spaces of finite, bounded and measurable multisets with new types of metrics that can be used in various methods of classification, sorting, ranking multi-attribute objects. The representation of such objects as multisets and application of new proposed indicators of the objects’ similarity or dissimilarity expand the range of problems under consideration. New tools allow us to solve new problems never being sold earlier, and solve known traditional problems in more simple and effective ways.
Underline that the new methods ARAMIS, CLAVA-HI, CLAVA-NI, MASKA for group ordering and sorting multi-attribute objects are unique and have no analogues. These methods operate with objects described by many numerical, symbolic and/or verbal attributes and existed in several different copies. In these methods, verbal attributes are not transformed in or replaced by any numerical ones as, for instance, in MAUT, TOPSIS and fuzzy methods. These methods take into account diversity and contradictions of knowledge of many experts and/or preferences of decision makers without searching for a consistency of actors’ judgments.
The developed methods and technologies of group verbal decision analysis were applied for solving practical choice problems. Among them are the competitive selection of research projects [18], multi-attribute classification of credit cardholders [22], multi-aspect evaluation of efficiency of research projects [24], and multiple criteria selection of the advanced computing complex [25].
Footnotes
Acknowledgments
This work is partially supported by the Russian Foundation for Basic Research (projects 16-29-12864, 17-07-00512, 17-29-07021, 18-07-00132, 18-07-00280).
References
1.
M.R.Anderberg, Cluster Analysis for Applications, Academic Press, 1973.
2.
I.Batyrshin, Uncertainties with memory in construction of strict monotonic t-norms and t-conorms for finite ordinal scales: Basic definitions and applications, Applied and Computational Mathematics, 10(3) (2011), 498–513.
C.Brink, Some background on multisets, Automated Reasoning Project, Technical Report TR-ARP 2/87, RSSS, Australian National University, 1987.
5.
M.M.Deza and E.Deza, Encyclopedia of distances, Springer, 2009.
6.
J.C.Gower, Similarity, dissimilarity and distance, measures of, in: Encyclopedia of Statistical Sciences,John Wiley, 12, 2008, pp. 7730–7738.
7.
P.Jaccard, Etude comparative de la distribution florale dans une portion des Alpes et du Jura, Bulletin de la Société Vaudoise des Sciences Naturelles37(142) (1901), 547–579.
8.
W.A.Kosters and J.F.J.Laros, Metrics for mining multisets, in: Research and Development in Intelligent Systems XXIV, Proceedings of AI-2007,Springer-Verlag, 2008, pp. 294–303.
9.
M.Krawczak and G.Szkatula, Bidirectional comparison of multi-attribute qualitative objects, Information Sciences436-437 (2018), 367–387.
10.
L.A.Lyusternik and V.I.Sobolev, Elements of functional analysis, Nauka, 1965(in Russian).
11.
P.C.Mahalanobis, On the generalised distance in statistics, Proceedings of the National Institute of Science of India12 (1936), 49–55.
12.
E.Marczewski and H.Steinhaus, On a certain distance of sets and the corresponding distance of functions, Colloquium Mathematicum6 (1958), 319–327.
13.
M.O'Searcoid, Metric Spaces,SpringerLondon, 2009.
14.
J.Oxtoby, Measure and category,Springer-Verlag, 1971.
15.
A.B.Petrovsky, An axiomatic approach to metrization of multiset space, in: Multiple Criteria Decision Making,Springer-Verlag, 1994, pp. 129–140.
16.
A.B.Petrovsky, Metric spaces of multisets, Doklady of Academy of Sciences344(2) (1995), 175–177(in Russian).
17.
A.B.Petrovsky, Structuring techniques in multiset spaces, in: Multiple Criteria Decision Making,Springer-Verlag, 1997, pp. 174–184.
18.
A.B.Petrovsky, Multiattribute sorting of qualitative objects in multiset spaces, in: Multiple Criteria Decision Making in New Millennium,Springer-Verlag, 2001, 124–131.
19.
A.B.Petrovsky, Basis Concepts of Multiset Theory, Editorial URSS, 2002(in Russian).
20.
A.B.Petrovsky, Operations with multisets, Doklady Mathematics67(2) (2003), 296–299.
21.
A.B.Petrovsky, Spaces of Sets and Multisets, Editorial URSS, 2003(in Russian).
22.
A.Petrovsky, Multi-attribute classification of credit card-holders: Multiset approach, International Journal of Management and Decision Making7(2/3) (2006), 166–179.
23.
A.Petrovsky, Group verbal decision analysis, in: Encyclopedia of Decision Making and Decision Support Technologies,IGI Global, 1, 2008, pp. 418–425.
24.
A.Petrovsky, Group multiple criteria decision making: Multiset approach, in: Recent Developments and New Directions in Soft Computing. Studies in Fuzziness and Soft-Computing, 317, Springer International Publishing, 2014, pp. 19–33.
25.
A.B.Petrovsky and V.N.Lobanov, Multi-criteria choice in the attribute space of large dimension: Multi-method technology PAKS-M, Scientific and Technical Information Processing42(5) (2015), 76–86.
26.
A.B.Petrovsky, Indicators of similarity and dissimilarity of multi-attribute objects in metric spaces of sets and multisets, Artificial Intelligence and Decision Making (Iskusstvennyiy intellekt i prinyatiye resheniy)4 (2017), 78–94. (in Russian)
27.
A.Rebai, Canonical fuzzy bags and bag fuzzy measure as a basis for MADM with mixed non cardinal data, European Journal of Operational Research78 (1994), 34–48.
28.
L.Sheremetov, I.Batyrshin, D.Filatov, J.Martinez and H.Rodriguez, Fuzzy expert system for solving lost circulation problem, Applied Soft Computing8 (2008), 14–29.
29.
D.Singh, A.M.Ibrahim, T.Yohana and J.N.Singh, An overview of the applications of multisets, Novi Sad Journal of Mathematics37(2) (2007), 73–92.
30.
P.H.A.Sneath and R.R.Sokal, Numerical Taxonomy. The Principles and Practice of Numerical Classification, Freeman, 1973.