Towards a general theory of similarity and association measures: Similarity, dissimilarity and correlation functions

Abstract

Similarity, correlation and association measures play an important role in statistics, information retrieval, data mining and data science, classification and machine learning, recommender systems and decision-making. They have numerous applications in ecology, social and behavioral sciences, biology and bioinformatics, social network and time series analysis, image and natural language processing. Often measures with the same name introduced on different domains have different properties, and measures with the same properties have different names. To unify the analysis of measures defined on different domains, this paper considers these measures as functions defined on a universal domain and satisfying some sets of properties. The general properties of similarity functions (SF) and dissimilarity functions (DF), under the joint name of resemblance functions (RF), are studied on a universal domain and illustrated by examples on specific domains. Known and new methods of constructing similarity measures are considered. This paper discusses the following aspects of RF: relationship with fuzzy (valued) relations, T-transitivity and triangle inequality, Minkowski distance and data transformation, cosine SF, RF on domains with involution (negation), aggregation and transformations of RF, and visualization of RF. The paper also considers the lattice of RF, composition and min-transitive transformations of SF (fuzzy proximity relations), applications to hierarchical clustering, and non-probabilistic entropy of RF. In addition, the paper proposes a method of constructing correlation functions (association measures) using SF. The Pearson correlation and Yule’s Q association coefficients are obtained as particular cases of the general method.
One can use the paper as a survey of works on similarity and dissimilarity measures on specific domains, as a guide for constructing new similarity and correlation measures, as a basis for the study of mathematical properties of resemblance functions on universal and specific domains, and also as part of a course on Data Science.

1 Introduction

Over many decades, the number of works on measures of similarity, resemblance, correlation, association, relationship, interestingness, comparison etc. has increased dramatically [3, 68–71]. These measures play an important role in statistics, information retrieval, data mining and data science, classification and machine learning, recommender systems and decision-making. They have numerous applications in ecology, social and behavioral sciences, biology and bioinformatics, social network and time series analysis, image and natural language processing. Although similarity and association (correlation) measures are often used for different tasks, for example for data classification [29, 63] and for analysis of relationships between variables [27, 57], in recent years they have often been considered together as descriptive measures of relationship [27, 58].

There are several mutually related approaches to analysis of similarity and association measures:

Prescriptive approach: propose a measure (for a specific domain) and study its properties. Some examples: Jaccard similarity measure [29], Pearson’s product moment correlation coefficient [57] and many others.

Descriptive approach: systematize, survey and compare different measures. Several surveys on similarity and association measures for different domains have been published [5, 66].

Normative approach: define measures as functions satisfying some set of reasonable properties, construct measures satisfying these properties, and analyze properties of different measures on some domain. Examples can be found in some of the surveys mentioned above and also in [5, 70].

Visualization: visualize measures and data or give them a graphical interpretation to better understand their properties, relationships and the differences between them [17, 68].

Often the similarity, correlation or association measures introduced on different domains have the same name but different properties, and measures with the same properties have different names. To avoid this terminological confusion in the literature and to unify the analysis of measures defined on different domains, this paper considers these measures as functions defined on a universal domain and satisfying simple sets of properties. The general properties and methods of construction of such functions are studied on a universal domain and illustrated on specific domains. This paper uses a normative approach to the analysis of similarity and association measures. Such an approach was used in [12–14] for constructing association (correlation) measures by means of similarity measures, when both types of measures were considered as functions satisfying some sets of properties. As a particular case of the general methods of constructing association measures from similarity measures proposed in [12, 15], one can obtain Pearson’s product moment correlation coefficient. These results show the importance of similarity measures not only for the analysis of similarity in data, but also for constructing measures of more sophisticated associations in data. For this reason, this paper studies in detail general concepts related to similarity measures that can serve as a part of the general theory of similarity, correlation and association measures.

The paper considers similarity measures as symmetric and reflexive similarity functions taking values in the interval [0, 1]. Such functions have been considered in the theory of fuzzy (valued) relations initiated by Lotfi Zadeh [1, 2] and later studied in many works, see for example [4, 69]. For this reason, some terminology, notation and results from the theory of valued relations are used here for similarity functions. Generally, one can consider similarity functions as fuzzy (valued) proximity relations [36] and vice versa, but there are some differences between them.

Fuzzy relations appeared as a generalization of crisp, non-fuzzy relations, and for them the property of transitivity is very important [2, 65]. Zadeh introduced [2] similarity relations as symmetric, reflexive and max-min transitive functions generalizing equivalence relations. The concept of equivalence relation plays a significant role in mathematics and applications. Such a relation defines a partition of the relation’s domain into equivalence classes. A fuzzy similarity relation defines a hierarchy of partitions of the domain into equivalence classes, and a hierarchical clustering of a finite set can be considered as a transformation of the proximity relation given on this set into a similarity relation [7, 65]. More generally, the T-transitivity of fuzzy relations is studied [21, 36], where T is a t-norm [47].

The paper considers similarity functions defined on a universal domain as models of similarity measures defined on specific domains. For example, on the set of binary vectors with n dichotomous (Yes/No or 1/0) attribute values, dozens of similarity measures have been introduced [5, 49]. Such measures are usually not max-min transitive [30]. For similarity measures (SM) on some specific domain, the following problems are important: the selection of an SM suitable for the task considered, construction of a new SM, comparison of SM, possible transformations of SM, visualization of SM etc.

The paper discusses similarity functions (SF) and dissimilarity functions (DF), which are easily obtained one from another. Generally, this is a reason to consider only similarity functions [49], but these two types of functions have different interpretations and methods of construction. Therefore, both are considered here together under the joint name resemblance functions (RF) [5].

The paper studies the general properties of RF defined on a universal domain and methods of their construction. Some results discussed in this paper are new; some are introduced as extensions of properties and methods considered in different works on fuzzy relations and on similarity measures for specific domains [4, 65 etc.]. The paper surveys some important works studying the properties of RF.

The paper considers SF as a particular case of association functions, which in general can take negative values. SF are also used for the construction of correlation functions (measures), see Section 13 and [13].

The paper has the following structure. Section 2 considers some popular domains of SF. Section 3 defines the main properties of SF, DF and RF and gives examples of these functions. Sections 4 and 5 compare RF with T-transitive fuzzy relations. Section 6 studies the methods of construction of DF by Minkowski distance and data transformation. Section 7 considers RF on domains with ordering and involution (negation). Section 8 studies transformations and aggregation of RF. Section 9 considers visualization of SF on 2 × 2 tables. Sections 10 and 11 describe the lattice of RF and non-probabilistic entropy of RF. Section 12 focuses on composition and min-transitive transformations of fuzzy relations with application to hierarchical clustering. Section 13 considers the method of constructing correlation functions using RF. Section 14 discusses related works and conclusions.

2 Similarity measures on universal set and on specific domains

A large number of measures have been introduced for measuring similarity, correlation or association between binary vectors, between measurements or observations of variables, between time series, between fuzzy sets etc. Often these measures use similar formulas with similar properties, and this is a reason to consider these measures generally as functions defined on a universal domain. Such a general approach to the analysis of similarity measures can simplify the transfer of results obtained for such measures on one domain (for one type of data) to another domain (for another type of data). Considering measures as functions satisfying some properties has other advantages. It makes it possible to analyze relationships between different types of functions describing measures, and to formulate the problem of developing general methods of constructing functions satisfying given properties [13]. Using these methods, one can introduce new measures on specific domains that are free of the drawbacks of measures used before.

In this paper, similarity measures are considered as functions S : Ω × Ω → [0, 1] defined on a universal domain Ω and taking values in the interval of real values [0,1], such that for any elements (objects) x and y in Ω the reasonable properties of symmetry: S (x, y) = S (y, x), and reflexivity: S (x, x) =1, are fulfilled. The first property says that if we compare x with y and y with x, the similarity between them will be the same. The second means that the similarity of any element x with itself is maximal. Note that although the class of similarity functions can serve as a model of a large number of similarity measures for different applications, it is possible to consider measures of relationship or association that do not satisfy these simple properties. For example, the measure of inclusion of one set in another is reflexive but not symmetric. The correlation coefficient is symmetric and reflexive but takes values in the interval [–1,1], see Section 13. One can use the number of messages between the nodes of some social network as a measure of the relationship between them. Such a measure will not be reflexive because some nodes will not send messages to themselves; the numbers of messages between two nodes in the two directions can be different, i.e. the measure is non-symmetric; and the number of messages can be greater than one. In some applications, it is reasonable to transform relationship measures into similarity functions by symmetrization, normalization and other transformations. However, these measures can also be studied as special classes of association functions [5, 61].

This paper considers similarity measures as functions defined on a universal set Ω. The properties of these functions can easily be extended to a specific domain. On such domains, these functions can have additional properties related to the structure of the data from this domain. Below are examples of some popular specific domains Ω:
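The transformation of a relationship measure into a similarity function can be illustrated by a minimal sketch. The recipe below (averaging the two directions, dividing by the largest off-diagonal value, and setting the diagonal to 1) is only one assumed choice among many, and the function name `counts_to_similarity` and the sample data are hypothetical.

```python
def counts_to_similarity(counts):
    """Turn a square matrix of non-negative message counts into a similarity matrix.

    Assumed recipe: symmetrize by averaging both directions, normalize by the
    largest off-diagonal value, and set the diagonal to 1 (reflexivity).
    """
    n = len(counts)
    # Symmetrization: average the counts in both directions.
    sym = [[(counts[i][j] + counts[j][i]) / 2 for j in range(n)] for i in range(n)]
    # Normalization constant: the largest symmetrized off-diagonal value.
    top = max((sym[i][j] for i in range(n) for j in range(n) if i != j), default=0)
    sim = [[1.0] * n for _ in range(n)]  # diagonal stays 1
    for i in range(n):
        for j in range(n):
            if i != j:
                sim[i][j] = sym[i][j] / top if top > 0 else 0.0
    return sim


# Hypothetical counts: messages[i][j] = number of messages sent from node i to node j.
messages = [[0, 8, 1],
            [2, 0, 0],
            [3, 0, 0]]
S = counts_to_similarity(messages)
```

With these sample counts the result is symmetric, reflexive and takes values in [0, 1], so it satisfies the two defining properties stated above.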

the set of membership or probability values Ω = [0, 1];

the set of real values Ω = R;

the set of nonnegative real values Ω = R⁺;

the rating scale Ω = {1, 2, 3, 4, 5};

the rating scale with linguistic labels, e.g. Ω = {very bad, bad, not good not bad, good, very good}.

2.1 The set of n-tuples

In this paper, special attention will be given to similarity functions defined on the set Ω of n-tuples x = (x₁, …, x_n), such that for all i = 1, …, n, the elements x_i take values in some set L. Depending on application, L can be the set of binary values {0,1}, the interval [0,1], the set of real values, the set of nonnegative real values, a rating scale etc.

In different applications, the n-tuple x = (x₁, …, x_n) can have the following interpretations:

the vector of n attribute values of the object x;

the results of n measurements (observations) of the variable x for n sampling units;

the time series values ordered in time with a fixed step between time points;

the ranking of n objects such that all x_i obtain different values from the set of ranks {1, …, n};

the ranking of n objects such that all values x_i in x are unique;

the rating profile of a user that evaluates n items in a rating scale, e.g. in {1,2,3,4,5};

the rating utility profile such that x_i is the real valued utility of the rating of the item i;

the membership values (in [0,1]) of n elements of discrete fuzzy set;

the finite probabilistic distribution such that $\sum_{i = 1}^{n} x_{i} = 1$.

Generally, it is possible to consider n-tuples where x_i are the sets of membership values of element i in intuitionistic, hesitant and other fuzzy sets.

In many cases, a measure defined on the set of n-tuples with one interpretation can be used on the set of n-tuples with another interpretation. For example, Pearson’s product moment correlation coefficient, initially introduced as a measure of correlation of two variables given by n measurements for n units of a sample or a population, is often used as a measure of similarity or correlation between vectors, time series, rating profiles etc. Note that sometimes a measure having a good interpretation on one type of data can be misleading or can have drawbacks when it is used for another type of data, due to specific properties of the new domain. For example, the Pearson correlation coefficient can be misleading on the set of time series or on the set of bipolar rating profiles [16, 18].

2.2 Binary n-tuples and 2×2 tables

As a special case of the set of n-tuples the paper considers the set Ω of all binary n-tuples x = (x₁, …, x_n), x_i ∈ {0, 1}, i = 1, …, n, of the length n > 1. An n-tuple x represents the object x described by n binary features, attributes or properties such that x_k = 1 if the object x possesses the attribute k, and x_k = 0 in the opposite case. Denote by U the set of all n attributes. For two objects x = (x₁, …, x_n) and y = (y₁, …, y_n) denote by X and Y the sets of attributes possessed by x and y, respectively. $\bar{X}$ and $\bar{Y}$ will denote the complements of the sets X and Y, i.e. the sets of attributes not possessed by x and y, respectively.

Another source of binary n-tuples is dichotomous (Yes/No) variables x and y measured for n sampling units. For example, the values 1 and 0 of x_k can denote presence or absence of lung cancer in the k-th patient, respectively, and y_k will have the value 1 or 0 if this patient smokes or does not, respectively. In this case, X is the set of patients with lung cancer and Y is the set of smoking patients.

The first type of data appears in classification tasks, and dozens of similarity measures for such data have been introduced [29, 63]. The second type of data usually appears in statistics, where different association measures between variables are defined [27]. Lately both types of data have been considered together because they are represented by 2 × 2 tables and both types of measures are constructed similarly [28, 42]. Below, for definiteness, we will refer to the first interpretation of data.

Generally, one can use similarity or association measures on 2 × 2 tables as measures of similarity or association between any subsets X and Y of a finite universal set.

The following four numbers are calculated for two binary n-tuples x = (x₁, …, x_n) and y = (y₁, …, y_n):

a = |X ∩ Y|, the number of attributes k possessed both by x and y (such that x_k = 1, y_k = 1);

$b = | X \cap \bar{Y} |$, the number of attributes k possessed only by x (such that x_k = 1, y_k = 0);

$c = | \bar{X} \cap Y |$, the number of attributes k possessed only by y (such that x_k = 0, y_k = 1);

$d = | \bar{X} \cap \bar{Y} |$, the number of attributes k possessed neither by x nor by y (such that x_k = 0, y_k = 0).

The following equalities hold: a + b + c + d = n, |U| = n, |X| = a + b, |Y| = a + c, $| \bar{X} | = c + d$, $| \bar{Y} | = b + d$.
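The four numbers can be computed directly from two binary n-tuples. The following minimal sketch assumes the n-tuples are given as Python sequences of 0s and 1s; the helper name `table_counts` and the sample vectors are illustrative only.

```python
def table_counts(x, y):
    """Return (a, b, c, d) of the 2x2 table for two binary n-tuples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = sum(1 for xk, yk in zip(x, y) if xk == 1 and yk == 1)  # both possess k
    b = sum(1 for xk, yk in zip(x, y) if xk == 1 and yk == 0)  # only x possesses k
    c = sum(1 for xk, yk in zip(x, y) if xk == 0 and yk == 1)  # only y possesses k
    d = sum(1 for xk, yk in zip(x, y) if xk == 0 and yk == 0)  # neither possesses k
    return a, b, c, d


x = (1, 1, 0, 0, 1, 0)
y = (1, 0, 1, 0, 1, 0)
a, b, c, d = table_counts(x, y)  # (2, 1, 1, 2)
```

For these sample vectors, a + b + c + d = 6 = n, |X| = a + b = 3 and |Y| = a + c = 3, in agreement with the equalities above.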

The numbers a and d give the numbers of positive and negative matches, respectively [29, 63]. Usually the numbers a, b, c, d are organized as a 2×2 table called a contingency table, see Table 1.

Table 1
2×2 table

		Y	$\bar{Y}$	
		1	0	
X	1	a	b	a + b
$\bar{X}$	0	c	d	c + d
		a + c	b + d	n

The following sections consider some popular similarity measures on 2×2 tables for illustrating general properties of similarity measures, the methods of their transformation and construction.

3 Similarity and dissimilarity functions

3.1 Similarity functions

Let Ω be a universal set. A function S : Ω × Ω → [0, 1] is called a similarity function on Ω if for all x, y in Ω it is symmetric:

$S (x, y) = S (y, x),$ (1)

and reflexive:

$S (x, x) = 1 .$ (2)

A similarity function S is strictly reflexive if

$S (x, y) < 1 \text{ for all } x, y \text{ in } Ω \text{ such that } x \neq y .$

A similarity function S is 0-normal on Ω if

$S (x, y) = 0 \text{ for some } x, y \text{ in } Ω .$

Such elements x and y will be called 0-opposite in Ω for function S.
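On a finite set, the properties defined above can be checked by direct enumeration. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the elements of `omega` are assumed to be pairwise distinct, and exact comparisons are used because the sample values are exactly representable (in general a numerical tolerance would be needed).

```python
from itertools import combinations


def is_similarity_function(S, omega):
    """Check the [0, 1] range, symmetry (1) and reflexivity (2) of S on a finite set."""
    in_range = all(0 <= S(x, y) <= 1 for x in omega for y in omega)
    symmetric = all(S(x, y) == S(y, x) for x, y in combinations(omega, 2))
    reflexive = all(S(x, x) == 1 for x in omega)
    return in_range and symmetric and reflexive


def is_strictly_reflexive(S, omega):
    """S(x, y) < 1 for all pairs of distinct elements."""
    return all(S(x, y) < 1 for x, y in combinations(omega, 2))


def is_zero_normal(S, omega):
    """S(x, y) = 0 for at least one pair (a pair of 0-opposite elements)."""
    return any(S(x, y) == 0 for x in omega for y in omega)


def example_similarity(x, y):
    """An illustrative similarity function on a subset of [0, 1]."""
    return 1 - abs(x - y)


omega = [0.0, 0.25, 0.5, 1.0]
```

For this example, the elements 0 and 1 are 0-opposite, so the function is 0-normal on the chosen set.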

3.1.1 Remark

This paper uses for similarity functions the terminology and notation of the theory of fuzzy (or valued) relations, where a similarity function is called a symmetric and reflexive fuzzy relation [2], a fuzzy similitude relation [4] or a valued proximity (tolerance) relation [36]. The properties of symmetry and reflexivity are considered in many works, e.g. in [5, 44]. In [44], a symmetric and reflexive E-coefficient is considered that also satisfies the transitivity property: if S (x, y) =1 and S (y, z) =1 then S (x, z) =1.

Generally, we use the term similarity function for a function defined on a universal set Ω that can represent some specific domain, e.g. the set of all real valued n-tuples. For a specific domain Ω, the similarity function can have some additional properties depending on the properties of Ω.

If a similarity function is used for measuring similarity between objects of a specific domain Ω then it can serve as a similarity measure. Note that the converse is not always true. The term similarity measure is sometimes used for functions that are not reflexive, not symmetric, or take values outside the interval [0,1]. Such functions are not similarity functions. On the other hand, a “correlation measure” that is symmetric, reflexive and takes values in [0,1] is a similarity function [71].

When the values of a similarity function (or similarity measure) are calculated between objects of some finite set X, this function defines a valued proximity relation between the elements of the set X [36]. In this case, if not misleading, we denote the set X by Ω.

3.2 Dissimilarity functions

A function D : Ω × Ω → [0, 1] satisfying for all x, y in Ω the properties of symmetry:

$D (x, y) = D (y, x),$

and irreflexivity: $D (x, x) = 0,$

will be called a dissimilarity function on Ω. A dissimilarity function D is strictly irreflexive if D (x, y) > 0 for all x, y in Ω such that x ≠ y. A dissimilarity function D is 1-normal on Ω if D (x, y) = 1 for some x, y in Ω. Such elements x and y will be called 1-opposite in Ω for the function D.

Dissimilarity functions are dual to similarity functions. A similarity function S and a dissimilarity function D will be called complementary to each other if for all x, y in Ω:

$S (x, y) + D (x, y) = 1 .$ (3)

From Equation (3) it follows for all x, y in Ω:

$S (x, y) = 1 - D (x, y), D (x, y) = 1 - S (x, y) .$

A similarity function is strictly reflexive or 0-normal if and only if its complementary dissimilarity function is strictly irreflexive or 1-normal, respectively.

Due to duality of similarity and dissimilarity functions one can consider only one of these functions, but these measures have different interpretations and methods of construction. For this reason, it is useful to consider these functions together as resemblance functions.

3.3 Resemblance functions

A resemblance function is a symmetric function R : Ω × Ω → [0, 1] that is reflexive or irreflexive. Two resemblance functions R₁ and R₂ are called resemblance functions of the same type if both are reflexive or both are irreflexive.

Denote P (S) and P (D) the sets of all similarity and all dissimilarity functions, respectively, defined on the set Ω. Then P (R) = P (S) ∪ P (D) is the set of all resemblance functions defined on Ω. Section 10 considers the lattice of resemblance functions.

3.4 Examples of resemblance functions

3.4.1 Resemblance functions on 2 × 2 tables

Consider some popular similarity measures on 2×2 tables [28, 42] together with their complementary dissimilarity functions. Such measures are usually used in the classification of objects described by n binary attributes or features. These measures can also be used for measuring similarity and dissimilarity between sets and between n measurements of two dichotomous variables.

In this paper, to simplify notation, the same similarity measure S on 2 × 2 tables is denoted as:

functions S (x, y), where x = (x₁, …, x_n) and y = (y₁, …, y_n) are n-tuples with binary components,

functions S (X, Y), where X and Y are the sets of attributes possessed by n-tuples x and y,

functions S (a, b, c, d) with elements of 2 × 2 tables defined by x and y, such that a + b + c + d = n,

functions S (a, d), when it is possible, due to b + c = n - (a + d). Such representation is used in visualization of these measures as functions of arguments a and d in Section 9. See also [17].

To avoid confusions, the reader can use different notations for these functions.

Jaccard:

$S (x, y) = \frac{a}{a + b + c} = \frac{| X \cap Y |}{| X \cup Y |} = \frac{a}{n - d},$ $D (x, y) = \frac{b + c}{a + b + c} = \frac{| X \oplus Y |}{| X \cup Y |},$

Simple Matching:

$\begin{matrix} S (x, y) = \frac{a + d}{a + b + c + d} \\ = \frac{| X \cap Y | + | \bar{X} \cap \bar{Y} |}{n} = \frac{a + d}{n}, \end{matrix}$

$\begin{matrix} D (x, y) = \frac{b + c}{a + b + c + d} \\ = \frac{| X \cap \bar{Y} | + | \bar{X} \cap Y |}{n} = \frac{| X \oplus Y |}{n} . \end{matrix}$

The complementary dissimilarity function for Simple Matching has also another form:

$D (x, y) = \frac{1}{n} \sum_{i = 1}^{n} | x_{i} - y_{i} | .$

Hence, the Simple Matching similarity measure is nothing else but the complementary measure of the normalized Manhattan distance.

Ochiai (cosine):

$S (x, y) = \frac{a}{\sqrt{(a + b) (a + c)}} = \frac{| X \cap Y |}{\sqrt{| X | | Y |}},$ $D (x, y) = \frac{1}{2} \sum_{i = 1}^{n} {(\frac{x_{i}}{\sqrt{a + b}} - \frac{y_{i}}{\sqrt{a + c}})}^{2} .$

Ochiai (cosine) similarity measure is the complementary similarity function of the dissimilarity function obtained from the Euclidean distance between normalized n-tuples, see the next section and Section 6.

Note that when the denominator in the formula of the Jaccard or Ochiai similarity measure equals 0, the measure can be evaluated as 1 due to the reflexivity of similarity functions, see Section 9.2.

The properties of similarity functions as functions S (X, Y) of the sets of attributes X and Y and as functions S (a, b, c, d) of parameters a, b, c, d, will have the following forms:

symmetry:

$\begin{matrix} S (X, Y) = S (Y, X), \\ S (a, b, c, d) = S (a, c, b, d), \end{matrix}$

reflexivity:

S (X, X) = 1, S (a, 0, 0, d) = 1,

strict reflexivity:

$\begin{matrix} S (X, Y) < 1 \text{ if } X \neq Y, \\ S (a, b, c, d) < 1 \text{ if } b + c > 0, \end{matrix}$

0-normality:

$\begin{matrix} S (X, Y) = 0 \text{ for some } X, Y, \\ S (a, b, c, d) = 0 \text{ for some } a, b, c, d . \end{matrix}$

The Jaccard, Simple Matching and Ochiai similarity measures are strictly reflexive and 0-normal similarity functions.
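The three measures and their properties can be sketched as functions of a, b, c, d. The handling of a zero denominator here is an assumption made for this illustration only: the value 1 is returned when b + c = 0 (the two n-tuples coincide) and 0 otherwise; the convention actually adopted in the paper is the one described in Section 9.2.

```python
from math import sqrt


def jaccard(a, b, c, d):
    # The denominator is zero only when a = b = c = 0, i.e. x = y = (0, ..., 0).
    return 1.0 if a + b + c == 0 else a / (a + b + c)


def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)


def ochiai(a, b, c, d):
    denom = sqrt((a + b) * (a + c))
    if denom == 0:
        # Assumed convention: 1 for identical n-tuples, 0 otherwise.
        return 1.0 if b + c == 0 else 0.0
    return a / denom


measures = (jaccard, simple_matching, ochiai)

# Symmetry: S(a, b, c, d) = S(a, c, b, d).
symmetric = all(S(3, 2, 1, 4) == S(3, 1, 2, 4) for S in measures)
# Reflexivity: S(a, 0, 0, d) = 1.
reflexive = all(S(3, 0, 0, 2) == 1 for S in measures)
# 0-normality: S(0, b, c, 0) = 0 for b, c > 0.
zero_normal = all(S(0, 2, 3, 0) == 0 for S in measures)
```

For the table a = 2, b = 1, c = 1, d = 2 the Jaccard value is 2/4 = 0.5, while Simple Matching and Ochiai both give 2/3.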

It is surprising that the Russel & Rao measure mentioned in many works as a similarity measure is symmetric but not reflexive, hence it is not a similarity function:

Russel & Rao:

$S (x, y) = \frac{a}{a + b + c + d} = \frac{| X \cap Y |}{n} = \frac{a}{n} .$

The popular coefficient of association:

Yule’s Q:

$A_{YuleQ} (x, y) = \frac{ad - bc}{ad + bc},$

is a symmetric and reflexive function, but it is not a similarity function because it can be negative. It is an example of a correlation function (see Section 13).
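Both observations can be checked numerically. The sketch below implements the two formulas as given; the function names and sample tables are illustrative, and no special handling of a zero denominator in Yule’s Q is attempted (a table with ad + bc = 0 would raise ZeroDivisionError).

```python
def russel_rao(a, b, c, d):
    return a / (a + b + c + d)


def yule_q(a, b, c, d):
    return (a * d - b * c) / (a * d + b * c)


# Comparing an n-tuple with itself gives b = c = 0.
rr_self = russel_rao(2, 0, 0, 3)  # 2/5, so reflexivity fails whenever d > 0
q_self = yule_q(2, 0, 0, 3)       # ad/ad = 1, so Q is reflexive here

# A table with bc > ad gives a negative value of Q.
q_negative = yule_q(1, 3, 3, 1)   # (1 - 9)/(1 + 9) = -0.8
```

Russel & Rao equals 1 on (a, 0, 0, d) only when d = 0, i.e. when the n-tuple consists of ones only.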

3.4.2 Cosine similarity measure

On the set of non-negative real valued n-tuples x = (x₁, …, x_n) such that x ≠ (0, …, 0), the following measure is often used.

Cosine similarity measure:

$cos (x, y) = \frac{\sum_{i = 1}^{n} x_{i} y_{i}}{\sqrt{\sum_{i = 1}^{n} x_{i}^{2}} \sqrt{\sum_{i = 1}^{n} y_{i}^{2}}} .$

It is symmetric, reflexive and takes values in [0,1], hence it is a similarity function. It is a 0-normal similarity function: two n-tuples x and y are 0-opposite, i.e. cos (x, y) = 0, if and only if they are orthogonal.

Note that on the set of all real valued n-tuples the cosine measure will not be a similarity function because it can be negative.

The cosine similarity function has the following complementary dissimilarity function:

$D (x, y) = \frac{1}{2} \sum_{i = 1}^{n} {(\frac{x_{i}}{\sqrt{\sum_{j = 1}^{n} x_{j}^{2}}} - \frac{y_{i}}{\sqrt{\sum_{j = 1}^{n} y_{j}^{2}}})}^{2},$ (4)

defined by Euclidean distance between normalized n-tuples. Section 6 considers the general class of such dissimilarity functions.

For binary n-tuples, cosine similarity measure coincides with Ochiai similarity measure.
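The relation between the cosine similarity function and the dissimilarity function of Equation (4) can be checked numerically. This is a minimal sketch for non-zero, non-negative n-tuples given as Python sequences; the function names and sample data are illustrative.

```python
from math import sqrt


def cosine(x, y):
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return dot / (sqrt(sum(xi * xi for xi in x)) * sqrt(sum(yi * yi for yi in y)))


def cosine_dissimilarity(x, y):
    """Equation (4): half the squared Euclidean distance between normalized n-tuples."""
    nx = sqrt(sum(xi * xi for xi in x))
    ny = sqrt(sum(yi * yi for yi in y))
    return 0.5 * sum((xi / nx - yi / ny) ** 2 for xi, yi in zip(x, y))


x = (3.0, 0.0, 4.0)
y = (0.0, 5.0, 0.0)   # orthogonal to x, so cos(x, y) = 0
z = (1.0, 2.0, 2.0)

# Complementarity: cos(x, z) + D(x, z) = 1 up to rounding error.
total = cosine(x, z) + cosine_dissimilarity(x, z)

# On binary n-tuples the cosine equals the Ochiai value a / sqrt((a + b)(a + c)).
u = (1, 1, 0, 0, 1, 0)
v = (1, 0, 1, 0, 1, 0)   # a = 2, a + b = 3, a + c = 3
binary_cos = cosine(u, v)
```

Expanding the square in Equation (4) gives (1/2)(1 + 1 - 2 cos (x, y)) = 1 - cos (x, y), which is what the numerical check confirms.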

Non-negative n-tuples appear, for example, in text processing, when the text is represented by a vector x of n linguistic features and x_i is equal to the number of appearances of the i-th feature in the text [59]. In such n-tuples n can be greater than 1000.

4 Similarity functions and fuzzy relations

4.1 Min-transitivity and ultrametric inequality

A similarity function S is min-transitive if for all x, y, z in Ω it is fulfilled:

$S (x, z) ⩾ min (S (x, y), S (y, z)) .$ (5)

For similarity functions min-transitivity is equivalent to the following condition fulfilled for all x, y, z in Ω:

$\begin{matrix} \text{if } S (x, z) ⩾ max (S (x, y), S (y, z)) \\ \text{then } S (x, y) = S (y, z) . \end{matrix}$

The last condition means that if S is min-transitive then for any three objects x, y, z in Ω at least two of three values S (x, y), S (y, z) and S (x, z) are equal.

Zadeh introduced the property of min-transitivity under the name of max-min transitivity for fuzzy similarity relations [2].

A similarity function S is min-transitive if and only if its complementary dissimilarity function D is ultrametric, i.e. for all x, y, z in Ω it fulfills the ultrametric inequality:

$D (x, z) \leq max (D (x, y), D (y, z)) .$ (6)

An ultrametric dissimilarity function D on Ω defines a hierarchy of nested partitions of the set Ω and is used in hierarchical clustering [45]. A min-transitive similarity function is a valued equivalence (similarity) relation defining a hierarchy of partitions of the set Ω into equivalence classes of this relation, which gives a natural interpretation for these partitions [2, 65]. See the next section and Section 12.
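On a finite set, inequalities (5) and (6) can be verified by enumerating all triples. The sketch below assumes the similarity function is given as a matrix (a list of rows); the function names and the two sample matrices are illustrative, and the values are chosen to be exactly representable so that exact comparisons are safe.

```python
from itertools import product


def is_min_transitive(S):
    """Check inequality (5) for every triple of indices."""
    idx = range(len(S))
    return all(S[i][k] >= min(S[i][j], S[j][k]) for i, j, k in product(idx, repeat=3))


def is_ultrametric(D):
    """Check inequality (6) for every triple of indices."""
    idx = range(len(D))
    return all(D[i][k] <= max(D[i][j], D[j][k]) for i, j, k in product(idx, repeat=3))


# The two smallest of the three off-diagonal values are equal: min-transitive.
S_good = [[1.0, 0.75, 0.25],
          [0.75, 1.0, 0.25],
          [0.25, 0.25, 1.0]]
D_good = [[1 - s for s in row] for row in S_good]  # complementary dissimilarity

# All three off-diagonal values differ: not min-transitive.
S_bad = [[1.0, 0.75, 0.25],
         [0.75, 1.0, 0.5],
         [0.25, 0.5, 1.0]]
```

In the second matrix, S (x₁, x₃) = 0.25 is smaller than min (0.75, 0.5) = 0.5, which violates (5).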

4.2 Fuzzy (valued) relations

A fuzzy relation S on the set Ω was introduced by Lotfi Zadeh [1, 2] as an extension of the characteristic function of a binary non-fuzzy relation and is defined for all x, y in Ω by its membership function μ_S (x, y) taking values in [0,1]. The value μ_S (x, y) denotes the grade of membership of the ordered pair (x, y) in S, or the strength of the relation S between x and y. A fuzzy relation S on a finite set Ω which is symmetric, reflexive and max-min transitive, i.e. $μ_{S} (x, z) ⩾ max_{y \in Ω} min (μ_{S} (x, y), μ_{S} (y, z))$ for all x, z in Ω, is called a similarity relation [2] or fuzzy equivalence relation. Often, the membership function μ_S (x, y) is denoted by S (x, y) and max-min transitivity is given as min-transitivity, Equation (5) [4, 36]. The similarity relation S defines a hierarchy of nested crisp equivalence relations obtained by “alpha-cuts” of S, and hence nested partitions of the set Ω into equivalence classes of these relations. The similarity relation is used as the basis for fuzzy relational clustering [65]. A similarity function can serve as a fuzzy relation on a finite set of objects when relationships like transitivity between different pairs of objects are considered.

Usually, min-transitivity is not required of similarity measures; for this reason, this property is not required of similarity functions either. Moreover, as De Baets et al. [30] showed, a wide class of parametric similarity measures does not contain min-transitive measures, see Section 5.5.
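The hierarchy of nested partitions can be illustrated on a small example. The sketch below assumes that S is a symmetric, reflexive and min-transitive matrix, so that for every level the crisp relation S (x, y) ⩾ level is an equivalence relation; the function name `alpha_cut_classes` and the sample matrix are hypothetical.

```python
def alpha_cut_classes(S, alpha):
    """Partition indices 0..n-1 into classes of the crisp relation S[i][j] >= alpha.

    Assumes S is symmetric, reflexive and min-transitive, so comparing an
    element with one representative of a class is sufficient.
    """
    classes = []
    for i in range(len(S)):
        for cls in classes:
            if S[i][cls[0]] >= alpha:
                cls.append(i)
                break
        else:
            classes.append([i])  # i starts a new equivalence class
    return classes


S = [[1.0, 0.75, 0.25],
     [0.75, 1.0, 0.25],
     [0.25, 0.25, 1.0]]

# Lowering the level merges classes: each partition is nested in the next one.
levels = {alpha: alpha_cut_classes(S, alpha) for alpha in (1.0, 0.75, 0.25)}
```

For this matrix the three levels give the partitions {0}, {1}, {2}; then {0, 1}, {2}; and finally {0, 1, 2}.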

4.3 Non-fuzzy equivalence relations

A binary (given by pairs of objects) non-fuzzy (crisp) proximity relation R on a set Ω is defined as a set of ordered pairs: R ⊆ {(x, y) | x, y ∈ Ω}, satisfying for all x, y in Ω the properties:

$\begin{matrix} \text{if } (x, y) \in R \text{ then } (y, x) \in R \quad \text{(symmetry)}, \\ (x, x) \in R \quad \text{(reflexivity)} . \end{matrix}$

The proximity relation R on Ω can be represented by its characteristic (indicator) function: 1_R (x, y) =1, if (x, y) ∈ R and 1_R (x, y) =0, in the opposite case. The function S (x, y) = 1_R (x, y) satisfies symmetry Equation (1), reflexivity Equation (2) and takes values in {0, 1} ⊂ [0, 1], hence it is a similarity function.

A non-fuzzy equivalence relation R on Ω is a transitive proximity relation, i.e. for all x, y, z in Ω it is fulfilled:

$\text{if } (x, y) \in R \text{ and } (y, z) \in R \text{ then } (x, z) \in R .$

This transitivity condition is equivalent to the min-transitivity Equation (5) of the similarity function S (x, y) = 1_R (x, y).

5 T-transitivity and triangle inequality

5.1 Distance functions and metrics

A symmetric and irreflexive function d (x, y) taking values in the set of non-negative real values is referred to as a distance function. A pseudo-metric is a dissimilarity or distance function satisfying for all x, y, z in Ω the triangle inequality:

$d (x, z) \leq d (x, y) + d (y, z) .$ (7)

A strictly irreflexive pseudo-metric is called a metric.

Note that the ultrametric inequality is stronger than the triangle inequality: since max (u, v) ≤ u + v for all non-negative u and v, every function satisfying Equation (6) also satisfies Equation (7).

If d is a distance function and g is a non-negative non-increasing function such that g (0) =1, then S (x, y) = g (d (x, y)) is a similarity function. Below are examples of similarity functions and their complementary dissimilarity functions obtained from a distance function d:

$S (x, y) = \frac{1}{1 + d (x, y)}, D (x, y) = \frac{d (x, y)}{1 + d (x, y)},$ $S (x, y) = exp (- d^{2} (x, y)), D (x, y) = 1 - S (x, y) .$

If d (x, y) ≤ M for all x, y in Ω and some positive number M, then the functions

$D (x, y) = \frac{d (x, y)}{M}, S (x, y) = 1 - \frac{d (x, y)}{M},$

are complementary dissimilarity and similarity functions. If d (x, y) = M for some x, y in Ω then S and D will be 0- and 1-normal functions, respectively.
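The three constructions of this subsection can be sketched as transformations of a distance value d ⩾ 0. The function names below are illustrative; the bound M must be supplied by the user and is assumed to satisfy d ≤ M on the domain considered.

```python
from math import exp


def sim_inverse(d):
    """S = 1 / (1 + d); the complementary dissimilarity is d / (1 + d)."""
    return 1 / (1 + d)


def sim_gauss(d):
    """S = exp(-d^2); the complementary dissimilarity is 1 - S."""
    return exp(-d * d)


def sim_bounded(d, M):
    """S = 1 - d / M for a distance bounded by M > 0; D = d / M."""
    if not 0 <= d <= M:
        raise ValueError("expected 0 <= d <= M")
    return 1 - d / M


# Each transformation maps distance 0 to similarity 1 (reflexivity) ...
at_zero = (sim_inverse(0), sim_gauss(0), sim_bounded(0, 4))
# ... and is non-increasing in d.
decreasing = sim_inverse(1) >= sim_inverse(2) and sim_gauss(1) >= sim_gauss(2)
```

Only the bounded construction reaches the value 0, at d = M, which is why it can yield 0-normal and 1-normal functions.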

5.2 Metrics on the set of real n-tuples

Below are the most popular metrics on the set Ω of real n-tuples. Denote: x = (x₁, …, x_n), y = (y₁, …, y_n).

Minkowski distance of order p ⩾ 1:

$d (x, y) = \sqrt[p]{\sum_{i = 1}^{n} {| x_{i} - y_{i} |}^{p}} .$

For p = 2 we obtain Euclidean distance: $d (x, y) = \sqrt{\sum_{i = 1}^{n} {| x_{i} - y_{i} |}^{2}} .$

Manhattan (taxicab) distance (for p = 1):

$d (x, y) = \sum_{i = 1}^{n} | x_{i} - y_{i} | .$

The book [33] gives more examples of distances.
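The Minkowski distance can be sketched in a few lines of Python. The function name and the example tuples are illustrative assumptions.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance of order p >= 1 between two real n-tuples."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if len(x) != len(y):
        raise ValueError("tuples must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
euclidean = minkowski(x, y, p=2)  # order p = 2
manhattan = minkowski(x, y, p=1)  # order p = 1
print(euclidean, manhattan)
```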

5.3 T-transitive similarity functions

A similarity function S is T-transitive if for all x, y, z in Ω it is fulfilled:

$S (x, z) ⩾ T (S (x, y), S (y, z)),$

where T is a t-norm, i.e. commutative, associative, and monotonic binary operation on [0, 1] satisfying for all a in [0,1] the boundary condition: T (a, 1) = a. Below there are examples of basic t-norms [47] defined for all a, b in [0,1]:

$\begin{matrix} T_{M} (a, b) = \min (a, b) & \text{(minimum)} \\ T_{P} (a, b) = ab & \text{(product)} \\ T_{L} (a, b) = \max (a + b - 1, 0) & \text{(Łukasiewicz t-norm)} \end{matrix}$

In this notation, min-transitive similarity function is T_M-transitive.
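For a finite set Ω, T-transitivity can be checked directly on the matrix of similarity values. The following Python sketch does this for the three basic t-norms; the matrix S and all function names are illustrative assumptions.

```python
from itertools import product

def t_min(a, b):
    return min(a, b)

def t_prod(a, b):
    return a * b

def t_luk(a, b):
    return max(a + b - 1.0, 0.0)

def is_t_transitive(S, t_norm, tol=1e-12):
    """Check S[i][k] >= T(S[i][j], S[j][k]) for all index triples."""
    n = len(S)
    return all(
        S[i][k] + tol >= t_norm(S[i][j], S[j][k])
        for i, j, k in product(range(n), repeat=3)
    )

# A symmetric, reflexive matrix on three elements (made-up values).
S = [
    [1.0, 0.8, 0.5],
    [0.8, 1.0, 0.6],
    [0.5, 0.6, 1.0],
]
for name, t in [("min", t_min), ("product", t_prod), ("Lukasiewicz", t_luk)]:
    print(name, is_t_transitive(S, t))
```

Here S fails min-transitivity because S (1, 3) = 0.5 is less than min(0.8, 0.6), but it satisfies the two weaker conditions, in line with T_L ≤ T_P ≤ T_M.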

5.4 T-transitivity and metrics

Menger [51] considered a probabilistic relation E such that E (a, b) denotes the probability that a and b are equal. This function satisfies symmetry, reflexivity and the property: E (a, c) ⩾ E (a, b) E (b, c). Hence this probabilistic relation is a similarity function satisfying product-transitivity: E (a, c) ⩾ T_P (E (a, b), E (b, c)). The function d (a, b) = - log(E (a, b)) then satisfies the triangle inequality Equation (7) and is a metric (a pseudo-metric if E (a, b) =1 is possible for a ≠ b).

Bezdek and Harris [21] studied different types of transitivity of symmetric and reflexive fuzzy relations. It was shown, for example, that if a fuzzy relation R is T_L-transitive (called max-Δ transitive) then the function D = 1 - R is a pseudo-metric.
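Both statements of this subsection rest on a short computation, sketched here under the assumption that E takes positive values. Taking the negative logarithm of the product-transitivity inequality E (a, c) ⩾ E (a, b) E (b, c) gives:

$- \log E (a, c) \leq - \log E (a, b) - \log E (b, c), \text{ i.e. } d (a, c) \leq d (a, b) + d (b, c) .$

Similarly, since R takes non-negative values, T_L-transitivity of R reads R (x, z) ⩾ R (x, y) + R (y, z) - 1, and substituting R = 1 - D gives:

$1 - D (x, z) ⩾ (1 - D (x, y)) + (1 - D (y, z)) - 1, \text{ i.e. } D (x, z) \leq D (x, y) + D (y, z) .$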

5.5 Transitivity of parametric families of similarity measures on 2×2 tables

T-transitivity of cardinality-based parametric similarity measures and its possible relationship with the triangle inequality were studied in [30]. In the notation considered before, these similarity measures on 2 × 2 tables are represented as follows:

$S (x, y) = \frac{K_{4} (b + c) + a K_{1} + d K_{2}}{K_{3} (b + c) + a K_{1} + d K_{2}},$ (8)

where K₃ > K₄ ⩾ 0 and K_i, i = 1, 2, are positive real parameters.

The paper [30] notes that this family does not contain min-transitive similarity measures. It proves that a similarity measure from this family is T_L-transitive if and only if K₃ ⩾ max(K₁, K₂), where T_L-transitivity is given in the form: S (x, z) ⩾ S (x, y) + S (y, z) -1. The complementary dissimilarity measure is then a pseudo-metric satisfying the triangle inequality.

It is also proved in [30] that a similarity measure from this family is T_P-transitive if and only if

$K_{3} K_{4} ⩾ max (K_{1}^{2}, K_{2}^{2}) .$ (9)

It should be noted that a consistent similarity measure on 2 × 2 tables must satisfy S (x, y) =0 if x_i ≠ y_i for all i = 1, …, n, i.e. if x and y have neither positive (a = 0) nor negative (d = 0) matches. For such similarity measures K₄ = 0 in Equation (8), hence Equation (9) does not hold for positive K₁ and K₂, and consistent similarity measures cannot be T_P-transitive.

It is surprising that the consistent similarity measures on 2 × 2 tables from the parametric family Equation (8), which includes most of the popular similarity measures, can be neither min-transitive nor T_P-transitive. If such similarity measures are T_L-transitive then they are complementary to pseudo-metrics.

5.6 Parametric families of similarity measures on 2 × 2 tables

Gower and Legendre [40] considered the following parametric families of similarity measures for 2 × 2 tables:

$T_{θ} = \frac{a}{a + θ (b + c)}, S_{θ} = \frac{a + d}{a + d + θ (b + c)},$

where θ > 0. These families will also be referred to as the (a)-family and the (a + d)-family of similarity measures, respectively. Batyrshin et al. [17] introduced the following parametric (a + pd)-family of similarity measures:

$S_{a + pd} (x, y) = \frac{a + pd}{a + pd + θ (b + c)},$

where θ > 0 and p belongs to [0,1]. This formula gives (a)-family T_θ when p = 0 and (a + d)-family S_θ when p = 1.

These three parametric measures belong to the class of parametric measures Equation (8), hence these measures do not satisfy the min-transitivity and T_P-transitivity properties.

For consistent similarity measures Equation (8) with K₄ = 0, dividing the numerator and denominator of Equation (8) by the positive value K₁ we obtain:

$S (x, y) = \frac{a + d p_{2}}{p_{3} (b + c) + a + d p_{2}},$

where p₂ = K₂/K₁ and p₃ = K₃/K₁. The condition of T_L-transitivity takes the form: p₃ ⩾ max(1, p₂). Thus, T_L-transitive similarity measures from the families of similarity measures T_θ, S_θ and S_{a+pd} are characterized by the condition θ ⩾ 1.
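The condition θ ⩾ 1 can be illustrated by a brute-force check on small binary tuples. The Python sketch below is a numerical illustration only, not a proof. It assumes the (a)-family T_θ, excludes the all-zero tuple (for which T_θ is undefined), and uses hypothetical function names. The value θ = 1 gives the Jaccard measure and θ = 1/2 gives the Dice measure.

```python
from fractions import Fraction
from itertools import product

def table_2x2(x, y):
    """Counts (a, b, c, d) of the 2 x 2 table for binary tuples x and y."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d

def t_theta(x, y, theta):
    """(a)-family measure a / (a + theta * (b + c)), in exact arithmetic."""
    a, b, c, _ = table_2x2(x, y)
    return Fraction(a) / (a + theta * (b + c))

def triangle_violations(theta, n=3):
    """Triples (x, y, z) for which D = 1 - T_theta breaks the triangle inequality."""
    points = [t for t in product((0, 1), repeat=n) if any(t)]  # all-zero tuple excluded
    return [
        (x, y, z)
        for x, y, z in product(points, repeat=3)
        if 1 - t_theta(x, z, theta) > (1 - t_theta(x, y, theta)) + (1 - t_theta(y, z, theta))
    ]

jaccard_bad = triangle_violations(Fraction(1))   # theta = 1
dice_bad = triangle_violations(Fraction(1, 2))   # theta = 1/2
print(len(jaccard_bad), len(dice_bad))
```

For θ = 1/2 the triple x = (1, 0, 0), y = (1, 1, 0), z = (0, 1, 0) is one of the violations: D (x, z) = 1 while D (x, y) + D (y, z) = 2/3.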

6 Data transformation

6.1 Transformations on universal domain

Consider the general method of construction of resemblance functions based on data transformation.

Proposition 1. Let F : Ω → Ω₁ be a mapping (transformation) from the universal set Ω into the set Ω₁ and R₁ : Ω₁ × Ω₁ → [0, 1] be a resemblance function on Ω₁ then the function defined for all x, y in Ω by:

$R (x, y) = R_{1} (F (x), F (y)),$

is the resemblance function R : Ω × Ω → [0, 1] on the set Ω of the same type as R₁.

The proof is straightforward.

A typical example of data transformation F is a data normalization. In this case, Proposition 1 says: normalize data and apply some resemblance function to normalized data. However, for specific domains Ω and Ω₁ the transformation F can have a variety of forms. For example, Sections 6.3–6.7 introduce p-transformation of data applied together with Minkowski distance of order p for constructing dissimilarity functions on a domain Ω.

6.2 Transformations of binary n-tuples

The method of construction of similarity measures on 2 × 2 tables can be represented in a general form as the composition of two mappings: F : Ω × Ω → Ω₁ and F₁ : Ω₁ → [0, 1], such that

$S (x, y) = F_{1} (F (x, y)),$

where Ω is the set of binary n-tuples, Ω₁ is the set of non-negative integer 4-tuples (a, b, c, d) such that a + b + c + d = n and F₁ satisfies the properties: F₁ (a, 0, 0, d) =1 (for reflexivity), and F₁ (a, b, c, d) = F₁ (a, c, b, d) (for symmetry). Here F₁ (a, b, c, d) denotes F₁ ((a, b, c, d)). The transformation F is given by a 2 × 2 table, see Section 2.2, transforming the pair of n-tuples (x, y) into 4-tuple (a, b, c, d). The transformation F₁ is defined by a specific similarity measure transforming four values a, b, c, d into a similarity value from [0,1].

6.3 P-transformations in constructing dissimilarity functions by Minkowski distance of order p

Consider data transformations partially based on results of [12]. A real valued n-tuple z = (z₁, …, z_n) satisfying for some real p ⩾ 1 the property:

$\sum_{i = 1}^{n} {| z_{i} |}^{p} = 1,$

is called p-normalized (or p-standardized). Denote $Ω_{n}^{p}$ the set of p-normalized real n-tuples.

A function $F : Ω \to Ω_{n}^{p}$ transforming elements x of a set Ω into p-normalized n-tuples F (x) = (F (x) ₁, …, F (x) _n) in $Ω_{n}^{p}$ such that for all x in Ω it fulfills:

$\sum_{i = 1}^{n} {| F (x)_{i} |}^{p} = 1,$ (10)

is called p-transformation (p-normalization or p-standardization) of elements of Ω.

Proposition 2. Let F (x) be a p-transformation of elements of the set Ω into n-tuples then the function

$D (x, y) = \frac{1}{2} \sqrt[p]{\sum_{i = 1}^{n} {| F (x)_{i} - F (y)_{i} |}^{p}},$ (11)

is the dissimilarity function (metric) on Ω.

Proof. From |a - b| ≤ |a| + |b|, from the Minkowski inequality: $\sqrt[p]{\sum_{i = 1}^{n} {| a_{i} + b_{i} |}^{p}} \leq \sqrt[p]{\sum_{i = 1}^{n} {| a_{i} |}^{p}} + \sqrt[p]{\sum_{i = 1}^{n} {| b_{i} |}^{p}}$ , and from Equation (10) it follows that $\sqrt[p]{\sum_{i = 1}^{n} {| F (x)_{i} - F (y)_{i} |}^{p}} \leq 2$ and hence 0 ≤ D (x, y) ≤1 for all x, y in Ω. Symmetry, irreflexivity and the triangle inequality of D follow from the corresponding properties of the Minkowski distance ■

From Proposition 2 it follows that the function:

$D (x, y) = \frac{1}{2^{p}} \sum_{i = 1}^{n} {| F (x)_{i} - F (y)_{i} |}^{p},$ (12)

is also the dissimilarity function on Ω. For Euclidean (p = 2) and Manhattan (p = 1) distances it gives the following dissimilarity functions:

$D (x, y) = \frac{1}{4} \sum_{i = 1}^{n} {(F (x)_{i} - F (y)_{i})}^{2},$ (13) $D (x, y) = \frac{1}{2} \sum_{i = 1}^{n} | F (x)_{i} - F (y)_{i} | .$ (14)

6.3.1 Remark

The considered method of construction of dissimilarity functions uses the description of objects of the set Ω by n features (attributes, properties or measurements) obtained in two ways:

The function F is a feature function assigning to each object from Ω a n-tuple of n feature values F (x) = (F (x) ₁, …, F (x) _n).

The elements of the set Ω are given by m-tuples of real values x = (x₁, …, x_m) and F (x) = (F (x) ₁, …, F (x) _n) is a transformation of m-tuple x = (x₁, …, x_m) into a real valued n-tuple F (x). Often, n = m, but generally these two numbers can be different. For example in [12, 16] the moving approximation transform replaces the sequence of m time series values by a sequence of n local trends, where n < m.

From a dissimilarity function one obtains the complementary similarity function by: S (x, y) =1 - D (x, y).

6.4 Dissimilarity functions for non-negative n-tuples

Proposition 3. Let F be a p-transformation of elements of the set Ω into n-tuples, and for all x, y in Ω it fulfills: F (x) _i ⩾ 0, F (y) _i ⩾ 0 for all i = 1, …, n, then the functions

$D (x, y) = \sqrt[p]{\frac{1}{2} \sum_{i = 1}^{n} {| F (x)_{i} - F (y)_{i} |}^{p}},$ (15) $D (x, y) = \frac{1}{2} \sum_{i = 1}^{n} {| F (x)_{i} - F (y)_{i} |}^{p}$ (16)

are dissimilarity functions on Ω.

Proof. From |a - b| ≤ max(a, b) for all a, b ⩾ 0 it follows: |a - b|^p ≤ max(a^p, b^p) ≤ a^p + b^p. Hence, for p-transformation F from Equation (10) it follows:

$\sum_{i = 1}^{n} {| F (x)_{i} - F (y)_{i} |}^{p} \leq \sum_{i = 1}^{n} ({| F (x)_{i} |}^{p} + {| F (y)_{i} |}^{p}) = 2,$

that gives for Equations (15), (16): 0 ≤ D (x, y) ≤1. The symmetry and irreflexivity of D are evident ■

From Equation (16) for p = 2 we obtain that:

$D (x, y) = \frac{1}{2} \sum_{i = 1}^{n} {(F (x)_{i} - F (y)_{i})}^{2},$ (17)

is a dissimilarity function. Compare it with Equation (13). For p = 1 from Equation (16) we obtain Equation (14).

6.5 Constructing p-transformations

Proposition 4. Let f (x) be a mapping of elements x of Ω into real valued n-tuples: f (x) = (f (x) ₁, …, f (x) _n) and Ω_{f=0} = {x ∈ Ω | f (x) = (0, …, 0)} be a proper subset of Ω. Then the function F defined on Ω ∖ Ω_{f=0} by

$F (x)_{i} = \frac{f (x)_{i}}{\sqrt[p]{\sum_{j = 1}^{n} {| f (x)_{j} |}^{p}}}, i = 1, \dots, n,$ (18)

is a p-transformation on Ω ∖ Ω_{f=0} for any p ⩾ 1.

The proof is straightforward.

This proposition together with the previous propositions makes it possible to construct dozens of dissimilarity and complementary similarity functions.
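The construction of Propositions 2 to 4 can be sketched in Python as follows. The sketch assumes that the objects are already given as real tuples, so that f is the identity; the function names are hypothetical.

```python
def p_normalize(f, p=2.0):
    """Equation (18): divide a real tuple by its p-norm (zero tuple excluded)."""
    norm = sum(abs(v) ** p for v in f) ** (1.0 / p)
    if norm == 0.0:
        raise ValueError("the zero tuple has no p-normalization")
    return [v / norm for v in f]

def dissimilarity_eq11(fx, fy, p=2.0):
    """Equation (11) for p-normalized tuples of arbitrary sign."""
    return 0.5 * sum(abs(a - b) ** p for a, b in zip(fx, fy)) ** (1.0 / p)

def dissimilarity_eq15(fx, fy, p=2.0):
    """Equation (15) for p-normalized tuples with non-negative components."""
    if min(fx) < 0 or min(fy) < 0:
        raise ValueError("Equation (15) assumes non-negative components")
    return (0.5 * sum(abs(a - b) ** p for a, b in zip(fx, fy))) ** (1.0 / p)

x = [3.0, -4.0]
fx = p_normalize(x)                    # 2-normalized copy of x
f_neg = p_normalize([-v for v in x])   # 2-normalized copy of -x
print(dissimilarity_eq11(fx, fx), dissimilarity_eq11(fx, f_neg))
print(dissimilarity_eq15(p_normalize([1.0, 0.0]), p_normalize([0.0, 1.0])))
```

In this example the largest value 1 of Equation (11) is reached for a tuple and its opposite, and the largest value 1 of Equation (15) for two non-negative tuples with disjoint supports.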

6.6 Cosine similarity measure

For a non-negative 2-transformation F (x), i.e. with $\sum_{i = 1}^{n} F (x)_{i}^{2} = 1$, from Equation (17) we obtain the similarity function:

$\begin{matrix} S (x, y) = 1 - \frac{1}{2} \sum_{i = 1}^{n} {(F (x)_{i} - F (y)_{i})}^{2} = \sum_{i = 1}^{n} F (x)_{i} F (y)_{i} = \\ = \frac{\sum_{i = 1}^{n} F (x)_{i} F (y)_{i}}{\sqrt{\sum_{i = 1}^{n} F (x)_{i}^{2}} \sqrt{\sum_{i = 1}^{n} F (y)_{i}^{2}}} = \cos (F (x), F (y)) . \end{matrix}$

On the set Ω of non-negative n-tuples x = (x₁, …, x_n), using f (x) _i = x_i and p = 2 in Equation (18) we obtain:

$F (x)_{i} = \frac{x_{i}}{\sqrt{\sum_{j = 1}^{n} x_{j}^{2}}},$

that gives the cosine similarity measure:

$S (x, y) = \frac{\sum_{i = 1}^{n} x_{i} y_{i}}{\sqrt{\sum_{i = 1}^{n} x_{i}^{2}} \sqrt{\sum_{i = 1}^{n} y_{i}^{2}}} = cos (x, y) .$

From Equation (17) we then obtain the dissimilarity function Equation (4), complementary to the cosine similarity measure and considered in Section 3.4.2.

A soft cosine similarity measure, which takes into account the similarity between features, is considered in [62].
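The equality between the complement of Equation (17) and the cosine can be checked numerically. The following Python sketch assumes non-negative, non-zero tuples; the function names and test vectors are illustrative.

```python
import math

def cosine(x, y):
    """Cosine similarity of two non-zero real tuples."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def two_normalize(x):
    """Equation (18) with f(x)_i = x_i and p = 2."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / norm for a in x]

def cosine_via_eq17(x, y):
    """1 - D(x, y), where D is Equation (17) applied to 2-normalized tuples."""
    fx, fy = two_normalize(x), two_normalize(y)
    return 1.0 - 0.5 * sum((a - b) ** 2 for a, b in zip(fx, fy))

x, y = [1.0, 2.0, 3.0], [2.0, 0.0, 1.0]
print(cosine(x, y), cosine_via_eq17(x, y))
```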

6.7 P-transformations with aggregation functions

The general methods of p-normalization (p-standardization) using aggregation functions [20, 41] are considered in [12]. For example on the set Ω of n-tuples x = (x₁, …, x_n) define in Equation (18):

$f (x)_{i} = x_{i} - g (x),$

where g (x) is an aggregation function, e.g. mean, median, maximum or minimum of n values of n-tuple x = (x₁, …, x_n). For example, for arithmetic mean:

$g (x) = \bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i},$

and for p = 2, Equation (18) gives the following popular transformation of data:

$F (x)_{i} = \frac{x_{i} - \bar{x}}{\sqrt{\sum_{j = 1}^{n} {(x_{j} - \bar{x})}^{2}}},$

and Equation (13) gives the dissimilarity function:

$D (x, y) = \frac{1}{4} \sum_{i = 1}^{n} {(\frac{x_{i} - \bar{x}}{\sqrt{\sum_{j = 1}^{n} {(x_{j} - \bar{x})}^{2}}} - \frac{y_{i} - \bar{y}}{\sqrt{\sum_{j = 1}^{n} {(y_{j} - \bar{y})}^{2}}})}^{2} .$

This function is used in Section 13 for constructing Pearson’s correlation coefficient [12, 15].
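The link between this dissimilarity function and Pearson's correlation coefficient r can be illustrated numerically. The Python sketch below relies on the algebraic identity r = 1 - 2D, which follows from expanding the square for centered, 2-normalized data; it is an illustration under that assumption and not the construction of Section 13. Function names and data are hypothetical.

```python
import math

def center_and_normalize(x):
    """Subtract the arithmetic mean, then 2-normalize (constant tuples excluded)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        raise ValueError("constant tuples are mapped to zero and must be excluded")
    return [v / norm for v in centered]

def dissimilarity_eq13(x, y):
    """Equation (13) applied to centered, 2-normalized tuples."""
    fx, fy = center_and_normalize(x), center_and_normalize(y)
    return 0.25 * sum((a - b) ** 2 for a, b in zip(fx, fy))

def pearson(x, y):
    """Pearson's correlation coefficient computed from its usual formula."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

x, y = [1.0, 2.0, 3.0, 5.0], [2.0, 1.0, 4.0, 6.0]
print(1.0 - 2.0 * dissimilarity_eq13(x, y), pearson(x, y))
```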

6.8 Similarity functions for real n-tuples

From Equation (13) for 2-normalized F (x) it follows:

$\begin{matrix} S (x, y) = 1 - \frac{1}{4} \sum_{i = 1}^{n} {(F (x)_{i} - F (y)_{i})}^{2} = \frac{1}{2} (1 + \sum_{i = 1}^{n} F (x)_{i} F (y)_{i}) = \\ = \frac{1}{2} (1 + \cos (F (x), F (y))) . \end{matrix}$

Note that here the cosine takes values in [–1,1], hence the cosine itself is not a similarity function, whereas S is.
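The step from squared differences to the inner product uses only 2-normalization. Expanding the square and using $\sum_{i = 1}^{n} F (x)_{i}^{2} = \sum_{i = 1}^{n} F (y)_{i}^{2} = 1$ gives:

$\sum_{i = 1}^{n} {(F (x)_{i} - F (y)_{i})}^{2} = \sum_{i = 1}^{n} F (x)_{i}^{2} + \sum_{i = 1}^{n} F (y)_{i}^{2} - 2 \sum_{i = 1}^{n} F (x)_{i} F (y)_{i} = 2 - 2 \sum_{i = 1}^{n} F (x)_{i} F (y)_{i} .$

Multiplying by 1/4 and subtracting the result from 1 yields the expression for S (x, y) of this subsection; the same expansion with the factor 1/2 yields the cosine similarity of Section 6.6.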

6.9 Resemblance functions for finite probabilistic distributions

Proposition 5. Let x = (x₁, …, x_n) be 1-normalized, i.e. $\sum_{i = 1}^{n} | x_{i} | = 1$ , then $F (x) = (\sqrt[p]{| x_{1} |}, \dots, \sqrt[p]{| x_{n} |})$ is p-normalized.

Proof. $\sum_{i = 1}^{n} {| F (x)_{i} |}^{p} = \sum_{i = 1}^{n} {| \sqrt[p]{| x_{i} |} |}^{p} = \sum_{i = 1}^{n} | x_{i} | = 1$ ■

From Propositions 3 and 5 it follows

Proposition 6. Let x = (x₁, …, x_n) and y = (y₁, …, y_n) be 1-normalized then the functions

$\begin{matrix} D (x, y) = \sqrt[p]{\frac{1}{2} \sum_{i = 1}^{n} {| \sqrt[p]{| x_{i} |} - \sqrt[p]{| y_{i} |} |}^{p}}, \\ D (x, y) = \frac{1}{2} \sum_{i = 1}^{n} {| \sqrt[p]{| x_{i} |} - \sqrt[p]{| y_{i} |} |}^{p}, \end{matrix}$ (19)

are the dissimilarity functions.

If 1-normalized data are non-negative, i.e.:

$\sum_{i = 1}^{n} x_{i} = 1 \text{ and } x_{i} ⩾ 0 \text{ for all } i = 1, \dots, n,$

then one can use the following dissimilarity function obtained from Equation (19)

$D (x, y) = \frac{1}{2} \sum_{i = 1}^{n} {| \sqrt[p]{x_{i}} - \sqrt[p]{y_{i}} |}^{p},$

and for p = 2:

$D (x, y) = \frac{1}{2} \sum_{i = 1}^{n} {(\sqrt{x_{i}} - \sqrt{y_{i}})}^{2} .$ (20)

When x = (x₁, …, x_n) and y = (y₁, …, y_n) are probability distributions (1-normalized) then the function:

$D (x, y) = \sum_{i = 1}^{n} {(\sqrt{x_{i}} - \sqrt{y_{i}})}^{2},$

is called Hellinger discrimination, Matusita measure [32] or Squared-Chord distance [26]. Note that the last measure is not a dissimilarity function because it takes values in [0,2].

If x = (x₁, …, x_n) and y = (y₁, …, y_n) are probability distributions then the dissimilarity function Equation (20) has the following complementary similarity function:

$S (x, y) = \sum_{i = 1}^{n} \sqrt{x_{i} y_{i}},$

called the Bhattacharyya coefficient [3, 32] and equal to the cosine between the vectors $(\sqrt{x_{1}}, \dots, \sqrt{x_{n}})$ and $(\sqrt{y_{1}}, \dots, \sqrt{y_{n}})$ .
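The complementarity of Equation (20) and the Bhattacharyya coefficient can be verified on small examples. The Python sketch below assumes that x and y are finite probability distributions; the function names and the example distributions are illustrative.

```python
import math

def dissimilarity_eq20(x, y):
    """Equation (20): half of the sum of squared differences of square roots."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(x, y))

def bhattacharyya(x, y):
    """Bhattacharyya coefficient: sum of sqrt(x_i * y_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(x, y))

x = [0.2, 0.3, 0.5]
y = [0.5, 0.25, 0.25]
print(1.0 - dissimilarity_eq20(x, y), bhattacharyya(x, y))

# Distributions with disjoint supports are maximally dissimilar.
print(dissimilarity_eq20([1.0, 0.0], [0.0, 1.0]))
```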

7 Resemblance functions on domains with ordering and involution (negation)

7.1 Resemblance functions on ordered domain

A similarity function S on a set Ω with a partial ordering ≤ is called monotone on Ω if for all x, y, z in Ω it is fulfilled:

$\text{if } x \leq y \leq z \text{ then } S (x, z) \leq \min (S (x, y), S (y, z)) .$

This property considered together with min-transitivity Equation (5) of S gives very strong restrictions on possible values of S:

$\text{if } x \leq y \leq z \text{ then } S (x, z) = \min (S (x, y), S (y, z)) .$

Dually, for a dissimilarity function the monotonicity property has the form:

$\text{if } x \leq y \leq z \text{ then } D (x, z) ⩾ \max (D (x, y), D (y, z)) .$

The monotonicity property for distances on partially ordered set was considered in [7] and for valued relations in [36].

7.2 Involution (negation) on Ω

Consider the properties of resemblance functions related with an involution (negation, complement, reflection) operation introduced on the domain. For example, in [13] involution (reflection) is used in definition and construction of association (correlation) measures.

A function N : Ω → Ω is called an involution on Ω if for all x in Ω it fulfills:

$N (N (x)) = x .$

A non-trivial involution N, i.e. an involution such that N (x) ≠ x for some x in Ω, will be called an (involutive) negation on Ω. A negation on Ω will also be denoted by N_Ω. Generally, the terms negation and non-trivial involution can be used interchangeably, but negation is more common.

From the definition of the negation, it follows that it is a bijective mapping of Ω onto Ω and N⁻¹ (x) = N (x) for all x in Ω.

An element x of Ω such that N (x) = x will be called a fixed point of N in Ω and denoted by x_FP. The set of all fixed points of N in Ω will be denoted FP (N, Ω). For specific domains Ω, this set can be empty, finite or infinite. Denote Ω_N = Ω ∖ FP (N, Ω). This set is used in the definition of association (correlation) measures [13], see also Section 13.

7.3 Negation on ordered domain

A negation N on a set Ω with a partial ordering ≤ will be called antitone if for all x, y in Ω it is fulfilled:

$\text{if } x \leq y \text{ then } N (y) \leq N (x) .$

7.3.1 Examples of negations

Let Ω be the set of all n-tuples x = (x₁, …, x_n) of the length n > 1, and x_i, for all i = 1, …, n, takes values in some set L. Let N_L be a negation on L. Define negation N_Ω on Ω by: N_Ω (x) = (N_L (x₁), …, N_L (x_n)). From the involutivity of N_L it follows the involutivity of N_Ω. If not misleading, denote the negation of n-tuples and the negation of their elements by the same letter N, i.e. N (x) = (N (x₁), …, N (x_n)).

If L is the set of real numbers, define N (x) = - x = (- x₁, …, - x_n). The n-tuple x_FP = (0, …, 0) is the unique fixed point of this negation.

If L = {0, 1} then the n-tuple x = (x₁, …, x_n) is a vector of binary attributes or the sequence of n measurements of a dichotomous variable. Define the negation on Ω by: N (x) = (1 - x₁, …, 1 - x_n). This negation has no fixed points and FP (N, Ω) =∅.

If L = [0, 1] and x = (x₁, …, x_n) is a list of membership values (membership function) of a finite fuzzy set u = {u₁, …, u_n} then N (x) = (1 - x₁, …, 1 - x_n) is the negation of the fuzzy set x [1]. The n-tuple x_FP = (0.5, …, 0.5) is the unique fixed point of this negation. Generally, on [0,1] the involutive negation is defined as a strictly decreasing function that can differ from N (x) =1 - x [10, 67].

Let L = {1, 2, …, m} be a bipolar scale [18] with the negation N_L (k) = m - k + 1, and x = (x₁, …, x_n) be a rating profile of the user x evaluating n items in the rating scale L. Then this negation and corresponding negation N_Ω have no fixed points if m is even. If m is odd, i.e. m = 2r + 1 for some integer r, then N_L has the fixed point C = r + 1 and N_Ω has the unique fixed point: x_FP = (C, …, C).

On all considered above domains of n-tuples define the partial ordering as follows:

$x \leq y \text{ if } x_{i} \leq y_{i} \text{ for all } i = 1, \dots, n .$

Then the considered negations will be antitone.
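The bipolar-scale example can be checked by enumeration. The Python sketch below assumes the rating scale L = {1, …, m} with N_L (k) = m - k + 1 and componentwise ordering of rating profiles; the function names are hypothetical.

```python
from itertools import product

def negate_rating(k, m):
    """Negation on the scale {1, ..., m}: N_L(k) = m - k + 1."""
    return m - k + 1

def negate_profile(x, m):
    """Componentwise negation of a rating profile."""
    return tuple(negate_rating(k, m) for k in x)

def scale_fixed_points(m):
    """Fixed points of N_L on {1, ..., m}."""
    return [k for k in range(1, m + 1) if negate_rating(k, m) == k]

def leq(x, y):
    """Componentwise partial ordering of profiles."""
    return all(a <= b for a, b in zip(x, y))

m, n = 5, 2
profiles = list(product(range(1, m + 1), repeat=n))
involutive = all(negate_profile(negate_profile(x, m), m) == x for x in profiles)
antitone = all(
    leq(negate_profile(y, m), negate_profile(x, m))
    for x in profiles for y in profiles if leq(x, y)
)
profile_fixed_points = [x for x in profiles if negate_profile(x, m) == x]
print(involutive, antitone, scale_fixed_points(5), scale_fixed_points(4), profile_fixed_points)
```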

7.4 Consistent resemblance functions

A similarity function S on the set Ω with a negation N is non-contradictive or consistent if for all x in Ω it is fulfilled:

$S (x, N (x)) = 0 .$

Dually we have for consistent dissimilarity function:

$D (x, N (x)) = 1 .$

If a similarity function on the set Ω is consistent then any element of Ω is opposite to its negation.

A similarity function S cannot be consistent on Ω if negation N has a fixed point in Ω. Indeed, in this case due to reflexivity of S it should be:

$S (x_{FP}, N (x_{FP})) = S (x_{FP}, x_{FP}) = 1 .$

Some domains Ω with a negation N having fixed points and non-consistent similarity function S can be converted to the domains with consistent similarity function S after elimination of fixed points from Ω, i.e. after replacing Ω by Ω_N = Ω ∖ FP (N, Ω).

Note that the consistency of similarity functions (similarity measures) defined on 2 × 2 tables considered in Section 5.5 coincides with the consistency defined here.

7.5 Co-symmetric resemblance functions

A resemblance function R on the set Ω with negation N is called co-symmetric if

$R (N (x), N (y)) = R (x, y), \text{ for all } x, y \text{ in } Ω .$

A resemblance function is co-symmetric if and only if its complementary resemblance function is also co-symmetric.

As shown in [13], a resemblance function R on the set Ω with negation N is co-symmetric if and only if it satisfies the property:

$R (x, N (y)) = R (N (x), y), \text{ for all } x, y \text{ in } Ω .$

7.5.1 Remark

Co-symmetric metrics on the set of fuzzy sets with the fuzzy complement N were considered in 1978 in [6] under the name symmetric metrics. In [9], such metrics were studied on Kleene algebras. In [5], a co-symmetric resemblance measure on binary vectors is called self-complementary. In [35], a co-symmetric similarity measure is called a proximity measure. In [25], this property is used in the definition of a restricted equivalence function. Co-symmetric similarity measures are used in the construction of association (correlation) measures in [13]. The co-symmetrization of resemblance functions is considered in Section 8.6.

7.5.2 Examples of consistent and co-symmetric similarity measures on 2×2 tables

The properties of consistency and co-symmetry for similarity measures on 2×2 tables have the form:

consistency:

$S (X, \bar{X}) = 0, S (0, b, c, 0) = 0,$

co-symmetry:

$S (\bar{X}, \bar{Y}) = S (X, Y), S (d, c, b, a) = S (a, b, c, d) .$

The last relation means that if we replace in the formula S (a, b, c, d) the variables a, b, c, d by variables d, c, b, a, respectively, we obtain the formula equivalent to S (a, b, c, d).

Any reasonable similarity function on the set of binary n-tuples x = (x₁, …, x_n) should be consistent. From N (0) =1, N (1) =0, it follows: N (x_i) ≠ x_i, for all i = 1, …, n, hence any n-tuple x has neither positive (a = 0) nor negative (d = 0) matches with its negation N (x). The consistency of similarity measures on 2×2 tables is as follows:

$S (x, y) = 0 \text{ if } a = d = 0 .$

For example, the Jaccard, Simple Matching and Ochiai similarity measures are consistent.

The Simple Matching similarity measure is co-symmetric but Jaccard and Ochiai similarity measures are not co-symmetric.
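These claims can be checked by enumerating all 2×2 tables with a fixed n. The Python sketch below is an illustration with hypothetical function names. It assumes the convention S = 1 when b = c = 0 (identical tuples) and, for the Ochiai measure, S = 0 when a = 0 and the tuples differ, since the usual formulas are undefined in these boundary cases.

```python
import math

def jaccard(a, b, c, d):
    return 1.0 if b + c == 0 else a / (a + b + c)

def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)

def ochiai(a, b, c, d):
    if b + c == 0:
        return 1.0
    if a == 0:
        return 0.0
    return a / math.sqrt((a + b) * (a + c))

def tables(n):
    """All non-negative integer 4-tuples (a, b, c, d) with a + b + c + d = n."""
    return [
        (a, b, c, n - a - b - c)
        for a in range(n + 1)
        for b in range(n + 1 - a)
        for c in range(n + 1 - a - b)
    ]

def is_consistent(S, n):
    """S(0, b, c, 0) = 0 for all b + c = n."""
    return all(S(0, b, n - b, 0) == 0 for b in range(n + 1))

def is_cosymmetric(S, n):
    """S(d, c, b, a) = S(a, b, c, d) for all tables."""
    return all(math.isclose(S(a, b, c, d), S(d, c, b, a)) for a, b, c, d in tables(n))

for S in (jaccard, simple_matching, ochiai):
    print(S.__name__, is_consistent(S, 4), is_cosymmetric(S, 4))
```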

7.6 Transformation of negations

Proposition 7. Suppose N : Ω → Ω is a negation on the set Ω and φ : Ω → Ω₁ is a bijective mapping of Ω onto a set Ω₁ then the function N₁ : Ω₁ → Ω₁ defined for all y in Ω₁ by

$N_{1} (y) = φ (N (φ^{- 1} (y))),$

is a negation on Ω₁.

From Proposition 7 it follows

Proposition 8. Suppose N : Ω → Ω is a negation on the set Ω and φ : Ω → Ω is a bijective mapping of Ω onto Ω then the function

$N_{1} (x) = φ^{- 1} (N (φ (x))),$

defined for all x in Ω, is a negation on Ω.

The proof of both propositions is straightforward.

Proposition 7 generalizes the method of constructing negation on bipolar scale where N defines negation on the set of indexes of gradations of the scale and φ defines the mapping of this set on another set of indexes, or on the set of linguistic labels of the scale, or on the set of utility values [18].

Proposition 8 generalizes the method of constructing involutive negations on [0,1] proposed in [67] where the fuzzy negation N (x) =1 - x was used.

8 Transformations of resemblance functions

8.1 Equivalent transformations of resemblance functions

Two resemblance functions R₁ and R₂ of the same type defined on the set Ω are called equivalent (by ordering) if for all x, y, u, v in Ω it fulfills:

$R_{1} (x, y) \leq R_{1} (u, v) \text{ if and only if } R_{2} (x, y) \leq R_{2} (u, v) .$

It is clear that two equivalent resemblance functions should have the same type.

A continuous, strictly increasing function φ : [0, 1] → [0, 1] such that φ (0) =0 and φ (1) =1 is called an automorphism of the interval [0,1].

Proposition 9. If R is a resemblance function on Ω and φ is an automorphism of the interval [0,1] then the function

$R_{1} (x, y) = φ (R (x, y)),$

defined for all x, y in Ω will be a resemblance function equivalent to R. If R is 0-normal (1-normal) then R₁ is 0-normal (1-normal). If R is strictly reflexive (strictly irreflexive) then R₁ is strictly reflexive (strictly irreflexive).

Proposition 10. If R is a resemblance function on the set Ω with negation N and/or with partial ordering ≤, φ is an automorphism of the interval [0,1] and R₁ (x, y) = φ (R (x, y)), for all x, y in Ω, then if R satisfies one of these properties:

min-transitivity,

ultrametric inequality,

monotonicity,

co-symmetry,

consistency,

then R₁ also satisfies this property.

Below are examples of the simplest equivalent transformations of resemblance functions defined by the parametric family of automorphisms of [0,1] φ (x) = x^k, k > 0 (here with k = 2 and k = 1/2):

$R_{1} (x, y) = R^{2} (x, y), R_{1} (x, y) = \sqrt{R (x, y)} .$
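Equivalence by ordering can be tested directly on a finite set of points. The following Python sketch uses the similarity function R (x, y) = 1/(1 + |x - y|) on a few real numbers as an assumed example; the function names are hypothetical.

```python
import math
from itertools import product

def r(x, y):
    """Example similarity function on real numbers."""
    return 1.0 / (1.0 + abs(x - y))

def r_squared(x, y):
    return r(x, y) ** 2

def r_sqrt(x, y):
    return math.sqrt(r(x, y))

def r_reversed(x, y):
    """Reverses the ordering of pairs; used only as a negative example."""
    return 1.0 - r(x, y)

def same_ordering(R1, R2, points):
    """R1(x, y) <= R1(u, v) if and only if R2(x, y) <= R2(u, v), for all pairs."""
    pairs = list(product(points, repeat=2))
    return all(
        (R1(x, y) <= R1(u, v)) == (R2(x, y) <= R2(u, v))
        for (x, y), (u, v) in product(pairs, repeat=2)
    )

points = [0.0, 1.0, 2.5, 4.0]
print(same_ordering(r, r_squared, points))
print(same_ordering(r, r_sqrt, points))
print(same_ordering(r, r_reversed, points))
```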

8.1.1 Remark

Due to the limitations on the size of the paper, the proofs of all statements of propositions of this and the following sections are not included. Some proofs are straightforward, some of them and other types of transformations of resemblance functions can be found in [5, 50].

8.2 0-(1-)normality transformations

Proposition 11. Let S be a similarity function on Ω such that $a = \min_{x, y \in Ω} (S (x, y)) < 1$ and f : [a, 1] → [0, 1] is a strictly increasing function such that f (a) =0, f (1) =1 then

$S_{1} (x, y) = f (S (x, y)),$

is a 0-normal similarity function. If f is a bijection then S and S₁ are equivalent.

Such a function f is called a 0-normality transformation. Below is the linear 0-normality transformation:

$S_{1} (x, y) = \frac{S (x, y) - a}{1 - a} .$

Dually, we have

Proposition 12. Let D be a dissimilarity function on Ω such that $b = \max_{x, y \in Ω} (D (x, y)) > 0$ and g : [0, b] → [0, 1] is a strictly increasing function such that g (0) =0, g (b) =1 then

$D_{1} (x, y) = g (D (x, y)),$

is a 1-normal dissimilarity function. If g is a bijection then D and D₁ are equivalent.

Such a function g is called a 1-normality transformation. Below is the linear 1-normality transformation:

$D_{1} (x, y) = \frac{1}{b} D (x, y) .$

8.3 Negation based transformation

Proposition 13. If R is a resemblance function on the set Ω with negation N then the function:

$R_{1} (x, y) = R (N (x), N (y)),$

defined for all x, y in Ω, will be a resemblance function of the same type as R. If R is 0-normal (1-normal) then R₁ is 0-normal (1-normal). If R is consistent then R₁ is consistent. If R is co-symmetric then R₁ = R.

8.4 Aggregation of resemblance functions

A function g (a₁, …, a_m) of m arguments a_i ∈ [0, 1], i = 1, …, m, taking values in [0,1], satisfying the properties: g (0, …, 0) =0, g (1, …, 1) =1, and strictly increasing in each variable is called an aggregation function [41]. It will be called conjunctive if g (a₁, …, a_m) =0 when a_i = 0, for some i = 1, …, m, and g will be called disjunctive if g (a₁, …, a_m) =1 when a_i = 1, for some i = 1, …, m.

Proposition 14. If R₁, …, R_m, m > 1, are resemblance functions on Ω of the same type and g is an aggregation function then the function

$R (x, y) = g (R_{1} (x, y), \dots, R_{m} (x, y)),$

defined for all x, y in Ω is a resemblance function of the same type satisfying the following properties:

If all R_i are strictly reflexive (strictly irreflexive) then R is strictly reflexive (strictly irreflexive).

If all R_i are 0-normal similarity functions such that for some x and y in Ω it is fulfilled R_i (x, y) =0 for all i = 1, …, m, then R is 0-normal.

If all R_i are 1-normal dissimilarity functions such that for some x and y in Ω it is fulfilled R_i (x, y) =1 for all i = 1, …, m, then R is 1-normal.

If all R_i are similarity functions, one of R_i is 0-normal and g is conjunctive then R is 0-normal.

If all R_i are dissimilarity functions, one of R_i is 1-normal and g is disjunctive then R is 1-normal.

If Ω has a negation and all R_i are consistent then R is consistent.

If Ω has a negation and all R_i are co-symmetric then R is co-symmetric.

The proof of the proposition is straightforward.

Below there are examples of aggregation of resemblance functions:

Mean aggregation:

$R (x, y) = \frac{1}{m} \sum_{i = 1}^{m} R_{i} (x, y) .$

Weighted aggregation:

$R (x, y) = \sum_{i = 1}^{m} w_{i} R_{i} (x, y), \text{ where } w_{i} ⩾ 0 \text{ and } \sum_{i = 1}^{m} w_{i} = 1 .$

Convex combination:

$R (x, y) = w_{1} R_{1} (x, y) + w_{2} R_{2} (x, y),$

where w₁, w₂ ⩾ 0 and w₁ + w₂ = 1.

Conjunctive aggregations:

$S (x, y) = \min \{ S_{1} (x, y), \dots, S_{m} (x, y) \},$ $S (x, y) = S_{1} (x, y) \cdot \dots \cdot S_{m} (x, y),$ $S (x, y) = \sqrt[m]{S_{1} (x, y) \cdot \dots \cdot S_{m} (x, y)} .$

Disjunctive aggregation:

$D (x, y) = \max \{ D_{1} (x, y), \dots, D_{m} (x, y) \} .$
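Aggregation of resemblance functions can be sketched in Python as a higher-order function. The two similarity functions on real numbers and all names below are illustrative assumptions.

```python
import math

def s_inverse(x, y):
    return 1.0 / (1.0 + abs(x - y))

def s_gauss(x, y):
    return math.exp(-((x - y) ** 2))

def aggregate(functions, g):
    """Return R(x, y) = g([R_1(x, y), ..., R_m(x, y)])."""
    def R(x, y):
        return g([f(x, y) for f in functions])
    return R

def mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))

aggregators = {"mean": mean, "min": min, "product": math.prod, "geometric": geometric_mean}
for name, g in aggregators.items():
    R = aggregate([s_inverse, s_gauss], g)
    print(name, R(2.0, 2.0), R(1.0, 3.0), R(3.0, 1.0))
```

In each case the aggregated function keeps reflexivity and symmetry of the aggregated similarity functions, as stated in Proposition 14.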

8.5 Symmetrization of functions

From some reflexive or irreflexive association measure that is not symmetric, it is possible to construct a symmetric resemblance measure.

Proposition 15. If A : Ω × Ω → [0, 1] is a function on the set Ω and g is a commutative aggregation function then the following function:

$R (x, y) = g (A (x, y), A (y, x)),$

is a symmetric function. If A is reflexive then R is a similarity function. If A is irreflexive then R is a dissimilarity function.

Examples of simplest symmetrizations:

$\begin{matrix} R (x, y) = \frac{1}{2} (A (x, y) + A (y, x)), \\ R (x, y) = max (A (x, y), A (y, x)), \\ R (x, y) = min (A (x, y), A (y, x)), \\ R (x, y) = A (x, y) \cdot A (y, x) . \end{matrix}$

8.6 Co-symmetrization of resemblance functions

Proposition 16. If R is a resemblance function on the set Ω with negation N and g is a commutative aggregation function then the following function:

$R_{1} (x, y) = g (R (x, y), R (N (x), N (y))),$

is a co-symmetric resemblance function of the same type as R. If R is consistent then R₁ is consistent. If R is co-symmetric and g satisfies the property g (a, a) = a for all a in [0,1] then R₁ = R.

Below are the simplest co-symmetrizations:

$\begin{matrix} R_{1} (x, y) = \frac{1}{2} (R (x, y) + R (N (x), N (y))), \\ R_{2} (x, y) = max (R (x, y), R (N (x), N (y))), \\ R_{3} (x, y) = min (R (x, y), R (N (x), N (y))), \\ R_{4} (x, y) = R (x, y) \cdot R (N (x), N (y)) . \end{matrix}$

If in these formulas R is co-symmetric then R₁ = R₂ = R₃ = R.

8.7 Examples of transformations of similarity measures on 2×2 tables

8.7.1 Equivalent transformations

From Ochiai similarity measure:

$S (x, y) = \frac{a}{\sqrt{(a + b) (a + c)}},$

one can obtain an equivalent similarity measure:

$S_{1} (x, y) = S^{2} (x, y) = \frac{a^{2}}{(a + b) (a + c)} .$

8.7.2 Symmetrization

Consider the inclusion association measure:

$A (x, y) = \frac{a}{a + b} = \frac{| X \cap Y |}{| X |} .$

It is reflexive but not symmetric. By symmetrization one can obtain, for example, the following similarity functions (measures):

$\begin{matrix} S (x, y) = A (x, y) \cdot A (y, x) = \frac{a^{2}}{(a + b) (a + c)} = \frac{{| X \cap Y |}^{2}}{| X | | Y |}, \\ S (x, y) = \frac{1}{2} (A (x, y) + A (y, x)) = \frac{a (2 a + b + c)}{2 (a + b) (a + c)} = \frac{| X \cap Y | (| X | + | Y |)}{2 | X | | Y |} . \end{matrix}$

The first function is called the Sorgenfrei similarity measure [28] and the second the Kulczynski similarity measure [5].
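Both symmetrizations can be reproduced with a few lines of Python operating on finite sets. The sketch assumes non-empty sets X and Y and uses exact fractions; the function names and the example sets are illustrative.

```python
from fractions import Fraction

def inclusion(X, Y):
    """Inclusion measure A(x, y) = |X & Y| / |X| for a non-empty set X."""
    return Fraction(len(X & Y), len(X))

def sorgenfrei(X, Y):
    """Product symmetrization of the inclusion measure."""
    return inclusion(X, Y) * inclusion(Y, X)

def kulczynski(X, Y):
    """Mean symmetrization of the inclusion measure."""
    return (inclusion(X, Y) + inclusion(Y, X)) / 2

X, Y = {1, 2, 3}, {2, 3, 4, 5}
a, b, c = len(X & Y), len(X - Y), len(Y - X)   # a = 2, b = 1, c = 2

print(inclusion(X, Y), inclusion(Y, X))        # not symmetric
print(sorgenfrei(X, Y), Fraction(a * a, (a + b) * (a + c)))
print(kulczynski(X, Y), Fraction(a * (2 * a + b + c), 2 * (a + b) * (a + c)))
```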

8.7.3 Similarity measure using negative matching

Most known similarity measures use in their formulas the number of positive matches a or both the numbers of positive and negative matches a and d. However, it is also reasonable to introduce a similarity measure using only negative matches.

Negative Matching:

$S (x, y) = \frac{d}{d + b + c} = \frac{| \bar{X} \cap \bar{Y} |}{| \bar{X} \cup \bar{Y} |} = \frac{d}{n - a},$ $D (x, y) = \frac{b + c}{d + b + c} = \frac{| X \oplus Y |}{| \bar{X} \cup \bar{Y} |} .$

The function S is reflexive and symmetric hence it is the similarity function. This similarity measure obtained from Jaccard similarity measure replacing attributes by their negations. Similarly, one can construct negative matching similarity measures from any non-co-symmetric positive matching similarity measures. See also [5] for “complementary” measures, where the term “complementary” has another meaning than in this paper.

The introduced similarity measure can be used when there exist sets of “permissions” (“recommendations”) and “restrictions” (“prohibitions”) on the attributes (properties) that the objects x and y from the domain Ω can possess. The similarity between the sets of permissions in x and y can be calculated by the Jaccard similarity measure. The similarity between the sets of restrictions in x and y can be calculated by the negative matching measure.

Examples of such data are dietary, religious and customary prescriptions in different countries or societies. Comparing two diets x and y, one can measure the similarity between the recommendations “what to eat” by the Jaccard similarity measure and the similarity between the recommendations “what not to eat” by the negative matching similarity measure.

Similarly to the parametric family of (a)-similarity functions T_θ, we introduce the parametric family of (d)-similarity functions:

$S_{d} (x, y) = \frac{d}{d + θ (b + c)} .$

8.7.4 New co-symmetric similarity measure

The co-symmetrization of the Jaccard similarity measure amounts to the aggregation of the positive and negative matching similarity measures. Below, a co-symmetric similarity measure is obtained as the product of the Jaccard and Negative Matching similarity functions:

$\begin{matrix} S (x, y) = \frac{ad}{(a + b + c) (d + b + c)} = \frac{| X \cap Y |}{| X \cup Y |} \cdot \frac{| \bar{X} \cap \bar{Y} |}{| \bar{X} \cup \bar{Y} |} = \frac{| X \cap Y |}{| X \cup Y |} \cdot \frac{| \overline{X \cup Y} |}{| \overline{X \cap Y} |} = \\ = \frac{ad}{(n - a) (n - d)} = \frac{ad}{ad + n (b + c)}, \\ D (x, y) = \frac{n (b + c)}{ad + n (b + c)} . \end{matrix}$
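The last equality for S rests on a short expansion of the denominator, which uses only a + b + c + d = n. It is written out here for convenience:

```latex
(n - a)(n - d) = n^{2} - n (a + d) + ad = n \, (n - a - d) + ad = n (b + c) + ad .
```

The formula for D then follows from D = 1 - S.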

Similarly, it is possible to co-symmetrize the parametric (a)-family of similarity measures T_θ by aggregating it with the (d)-family using some aggregation function g:

$S (x, y) = g (\frac{a}{a + θ (b + c)}, \frac{d}{d + θ (b + c)}) .$

9 Visualization of similarity measures

Visualization of resemblance functions offers a new view of these measures and often helps to understand their properties. Unfortunately, it is not simple to visualize similarity measures defined on an arbitrary domain.

9.1 Visualization of similarity measures on 2 × 2 tables

Batyrshin et al. [17] proposed a method of 3D visualization of similarity measures for binary data (on 2 × 2 tables). From a + b + c + d = n it follows that b + c = n - (a + d), and hence some similarity measures can be represented as functions S (a, d) of the two arguments a and d, with the domain being the triangle a ≥ 0, d ≥ 0, a + d ≤ n. Instead of a, b, c, d one can use relative frequencies, obtained by dividing these values by n, or percentages, obtained by multiplying the relative frequencies by 100%.

For example, the parametric family of (a)-similarity measures

$T_{θ} = \frac{a}{a + θ (b + c)},$

is represented as a function of a and d:

$S (a, d) = \frac{a}{a + θ (n - (a + d))} .$

When θ = 1 we obtain the Jaccard similarity measure

$S (a, d) = \frac{a}{n - d},$

presented in Fig. 1a). If in the formula of some similarity measure the parameters b and c cannot be grouped into sub-formulas (b + c), the visualization can be made for different proportions of b and c in the sum b + c. Figure 1b) depicts the surface of the Ochiai similarity measure when b = s(b + c), with s = 0.5. For other values of s in [0,1] the surfaces will be similar.

Fig. 1. Visualization of similarity measures.

It is surprising that the surfaces of the similarity measures depend mainly on their properties. Due to reflexivity, all surfaces include the horizontal line S (a, d) = 1 over the diagonal of the domain, where a + d = n and b = c = 0. The surfaces of all similarity functions with the single factor a in the numerator include the line S (a, d) = 0 where a = 0.
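The two boundary lines mentioned above can be checked on a discrete grid. The following minimal Python sketch tabulates the Jaccard surface S (a, d) = a / (n - d) over the integer points of the triangle a + d ≤ n. The function name and the grid size n = 10 are assumptions made for this illustration, and the value at the point a = 0, d = n is set to 1 by the reflexivity convention.

```python
def jaccard_surface(n):
    """Values S(a, d) = a / (n - d) on the integer grid a >= 0, d >= 0, a + d <= n."""
    surface = {}
    for a in range(n + 1):
        for d in range(n + 1 - a):
            if d == n:
                # Here a = 0 and b = c = 0, so the formula gives 0/0;
                # by reflexivity the value is taken to be 1.
                surface[(a, d)] = 1.0
            else:
                surface[(a, d)] = a / (n - d)
    return surface

surf = jaccard_surface(10)

# Reflexivity: S = 1 along the diagonal a + d = n.
assert all(surf[(a, 10 - a)] == 1.0 for a in range(11))
# Single factor a in the numerator: S = 0 along the line a = 0 (for d < n).
assert all(surf[(0, d)] == 0.0 for d in range(10))
```

The resulting dictionary can be passed to any 3D plotting tool to obtain a surface of the kind shown in Fig. 1a).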

For example, the Jaccard and Ochiai (cosine) similarity measures have different formulas, but as one can see from Fig. 1a) and 1b) the surfaces of the corresponding functions are very similar if we compare them with the surfaces of other similarity measures. This explains why in some works, where hierarchical clustering of similarity measures is considered, the Jaccard and Ochiai similarity measures are classified together as similar measures [5 , 52].

Generally, the parametric (a)-family of similarity functions (measures) T_θ can be used for approximating similarity measures whose formulas contain the single parameter a in the numerator. For example, when θ = 0.59 the surface of the T_θ similarity measure almost coincides with the surface of the Ochiai (cosine) similarity measure and, hence, T_0.59 can be used instead of this popular measure.

The surfaces of Simple Matching and all similarity measures of the (a + d)-family S_θ (see Section 5.6) contain the line S (a, d) =1 when a + d = n, and the point S (a, d) =0 when a = d = 0 (see Fig. 1c). A change of the parameter θ causes the transformation of the plane triangle in Fig. 1c) into convex or concave triangles [17] having as limits the triangles of the values near 0 or near 1 in almost all points. Hence, this transformation can be very drastic.

Figures 1e)–1i) depict the surfaces of the similarity measures obtained by aggregating the Jaccard similarity measure presented in Fig. 1a) and the Negative Matching similarity measure depicted in Fig. 1d) with different aggregation operators. The obtained measures can also be considered as the results of co-symmetrization of the Jaccard similarity measure using different aggregation functions, because the Negative Matching S_NM and the Jaccard S_J similarity functions are related for all binary n-tuples x and y as follows:

$S_{NM} (x, y) = S_{J} (N (x), N (y)) .$

Using formulas: a + b + c = n - d and d + b + c = n - a the following representations of aggregated measures can be considered:

Min-aggregation, Fig. 1e):

$\begin{matrix} S (x, y) = \min \{ \frac{a}{a + b + c}, \frac{d}{d + b + c} \} = \\ = \min \{ \frac{a}{n - d}, \frac{d}{n - a} \} . \end{matrix}$

Max-aggregation, Fig. 1f):

$\begin{matrix} S (x, y) = \max \{ \frac{a}{a + b + c}, \frac{d}{d + b + c} \} = \\ = \max \{ \frac{a}{n - d}, \frac{d}{n - a} \} . \end{matrix}$

Mean-aggregation, Fig. 1g):

$\begin{matrix} S (x, y) = \frac{1}{2} (\frac{a}{a + b + c} + \frac{d}{d + b + c}) = \\ = \frac{1}{2} (\frac{a}{n - d} + \frac{d}{n - a}) . \end{matrix}$

Product-aggregation, Fig. 1h):

$S (x, y) = \frac{a}{a + b + c} \cdot \frac{d}{d + b + c} = \frac{ad}{(n - a) (n - d)} .$

Square root-product-aggregation, Fig. 1i):

$S (x, y) = \sqrt{\frac{ad}{(n - a) (n - d)}} .$

Co-symmetric similarity functions have the surfaces symmetric with respect to the diagonal a = d : S (a, d) = S (d, a).
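The five aggregations and the symmetry S (a, d) = S (d, a) can be illustrated as follows. This is a minimal Python sketch that works with the (a, d) representation of the measures; the dictionary of aggregation functions and all names are hypothetical, and the boundary points a = n or d = n (where a denominator vanishes) are deliberately left out.

```python
import math

def jaccard_ad(a, d, n):
    """Jaccard measure as a function of a and d: a / (n - d)."""
    return a / (n - d)

def negative_matching_ad(a, d, n):
    """Negative Matching measure as a function of a and d: d / (n - a)."""
    return d / (n - a)

# Aggregation functions g(u, v) used for co-symmetrization.
AGGREGATIONS = {
    "min": min,
    "max": max,
    "mean": lambda u, v: (u + v) / 2,
    "product": lambda u, v: u * v,
    "sqrt_product": lambda u, v: math.sqrt(u * v),
}

def aggregated(name, a, d, n):
    """Aggregate the Jaccard and Negative Matching values with g = AGGREGATIONS[name]."""
    g = AGGREGATIONS[name]
    return g(jaccard_ad(a, d, n), negative_matching_ad(a, d, n))

# Every aggregated measure is symmetric with respect to the diagonal a = d.
for name in AGGREGATIONS:
    assert aggregated(name, 3, 5, 10) == aggregated(name, 5, 3, 10)
```

Since swapping a and d swaps the Jaccard and Negative Matching values, any symmetric aggregation function g yields a surface symmetric with respect to the diagonal a = d.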

9.2 Classification of similarity measures on 2 × 2 tables

Based on the properties of the formulas and on the visualization of similarity functions, consider the following classes of (reflexive) similarity functions on 2 × 2 tables (examples are given in parentheses):

Co-symmetric (Simple Matching).

Non-cosymmetric (Jaccard).

S (a, d) =0 if and only if (iff) a = 0 (Jaccard).

S (a, d) =0 iff d = 0 (Negative Matching).

S (a, d) =0 iff a = d = 0 (Simple Matching).

S (a, d) =0 iff a = 0 or d = 0 (Min-co-symmetrization of the Jaccard measure).

Some of these classes are also considered in [42].

Note that for the argument values (a = n, d = 0) and (a = 0, d = n) it follows from a + b + c + d = n that b = c = 0, and in some formulas of similarity measures the denominators will be equal to zero. In such cases, due to the reflexivity of all similarity functions, the values of these functions are set to 1:

$S (n, 0, 0, 0) = S (0, 0, 0, n) = 1 .$

10 Lattice of resemblance functions

Define the operations intersection ∩ and union ∪ of resemblance functions R₁ and R₂ on a set Ω for all x, y in Ω as follows:

$\begin{matrix} (R_{1} \cap R_{2}) (x, y) = \min \{ R_{1} (x, y), R_{2} (x, y) \}, \\ (R_{1} \cup R_{2}) (x, y) = \max \{ R_{1} (x, y), R_{2} (x, y) \} . \end{matrix}$

These operations define the partial ordering of resemblance functions by:

$R_{1} \subseteq R_{2} \text{ if and only if } R_{1} \cap R_{2} = R_{1} \text{ and } R_{1} \cup R_{2} = R_{2},$

or, equivalently, by:

$R_{1} \subseteq R_{2} \text{ if } R_{1} (x, y) \leq R_{2} (x, y) \text{ for all } x, y \text{ in } Ω .$

We write R₂ ⊇ R₁ if and only if R₁ ⊆ R₂.

We write R₁ ⊂ R₂ if R₁ ⊆ R₂ and R₁ ≠ R₂.

Denote by P (R), P (S) and P (D) the sets of all resemblance, similarity and dissimilarity functions on Ω, respectively, with the operations ∩ and ∪ defined above. For any similarity functions S₁, S₂ ∈ P (S) and dissimilarity functions D₁, D₂ ∈ P (D) it holds that:

$\begin{matrix} S_{1} \cap S_{2}, S_{1} \cup S_{2} \in P (S); \\ D_{1} \cap D_{2}, D_{1} \cup D_{2} \in P (D); \\ S_{1} \cap D_{1} \in P (D), S_{1} \cup D_{1} \in P (S) . \end{matrix}$ (21)

P (R) is a distributive lattice [23] with sublattices P (S) and P (D).

Consider the following similarity and dissimilarity functions defined for all x, y in Ω:

$\begin{matrix} S_{0} (x, y) = \begin{cases} 1, & \text{if } x = y \\ 0, & \text{otherwise} \end{cases}, \\ S_{I} (x, y) = 1, \\ D_{0} (x, y) = 0, \\ D_{I} (x, y) = \begin{cases} 0, & \text{if } x = y \\ 1, & \text{otherwise} \end{cases} . \end{matrix}$

The functions S₀ and S_I are the minimal and the maximal elements in the lattice P (S), respectively. The functions D₀ and D_I are the minimal and the maximal elements in P (D), respectively. The functions D₀ and S_I are the minimal and the maximal elements in P (R), respectively.

If A : Ω × Ω → [0, 1] is a symmetric function then

$S = A \cup S_{0}, D = A \cap D_{I},$

are similarity and dissimilarity functions, respectively.

The complement N (R) of a resemblance function R on Ω is defined for all x, y in Ω by:

$N (R (x, y)) = 1 - R (x, y) .$

It is clear that N is an involution (negation):

$N (N (R)) = R .$ (22)

The negation of the similarity and dissimilarity functions will be equal to their complementary dissimilarity and similarity functions, respectively:

$N (S) = D, N (D) = S .$ (23)

The lattice P (R) with the negation N is a Kleene (normal De Morgan) algebra [9], where for any resemblance functions R₁ and R₂ in P (R) the following properties hold:

De Morgan Laws:

$N (R_{1} \cap R_{2}) = N (R_{1}) \cup N (R_{2}),$ $N (R_{1} \cup R_{2}) = N (R_{1}) \cap N (R_{2}) .$

Normality:

$R_{1} \cap N (R_{1}) \subseteq R_{2} \cup N (R_{2}) .$

From Equations (21)–(23) and the De Morgan laws it follows that for any resemblance function R in P (R): R ∩ N (R) = D and R ∪ N (R) = S for some complementary dissimilarity and similarity functions D and S.

Set-theoretic operations on fuzzy relations are considered in fuzzy set theory [2, 65]. The negation of resemblance functions is defined as the fuzzy negation introduced by Zadeh [1].
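For a finite Ω the lattice operations and the negation act entrywise on matrices of resemblance values, so the listed properties can be checked directly. The following minimal Python sketch assumes resemblance functions stored as nested lists; the function names and the two sample similarity matrices are hypothetical, and the values are chosen as binary fractions so that the floating-point comparisons are exact.

```python
def meet(r1, r2):
    """Intersection of resemblance functions: pointwise minimum."""
    return [[min(u, v) for u, v in zip(p, q)] for p, q in zip(r1, r2)]

def join(r1, r2):
    """Union of resemblance functions: pointwise maximum."""
    return [[max(u, v) for u, v in zip(p, q)] for p, q in zip(r1, r2)]

def neg(r):
    """Complement N(R) = 1 - R."""
    return [[1 - v for v in row] for row in r]

def leq(r1, r2):
    """Partial order: R1 is included in R2 iff R1(x, y) <= R2(x, y) everywhere."""
    return all(u <= v for p, q in zip(r1, r2) for u, v in zip(p, q))

# Two hypothetical similarity functions on a three-element set.
s1 = [[1, 0.75, 0.25], [0.75, 1, 0.5], [0.25, 0.5, 1]]
s2 = [[1, 0.5, 0.5], [0.5, 1, 0.125], [0.5, 0.125, 1]]

assert neg(neg(s1)) == s1                                  # involution (22)
assert neg(meet(s1, s2)) == join(neg(s1), neg(s2))         # De Morgan law
assert neg(join(s1, s2)) == meet(neg(s1), neg(s2))         # De Morgan law
assert leq(meet(s1, neg(s1)), join(s2, neg(s2)))           # normality

# R ∩ N(R) is a dissimilarity function and R ∪ N(R) its complement.
d = meet(s1, neg(s1))
s = join(s1, neg(s1))
assert all(d[i][i] == 0 and s[i][i] == 1 for i in range(3))
assert neg(d) == s
```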

10.1 Examples

For the Jaccard S_J and Simple Matching S_SM similarity measures it holds that S_J ⊆ S_SM.

The min-aggregation and max-aggregation of the Jaccard S_J and Negative Matching S_NM similarity measures are represented, respectively, as follows:

$S_{1} = S_{J} \cap S_{NM}, S_{2} = S_{J} \cup S_{NM} .$

11 Non-probabilistic entropy

11.1 Entropy of resemblance functions

A non-negative real function H defined on the lattice P (R) of resemblance functions on Ω is called a non-probabilistic measure of entropy if for all R₁ and R₂ in P (R) it satisfies:

P1. H (D₀) =0.

P2. H (R₁) = H (N (R₁)).

P3. If R₁ ∩ N (R₁) ⊂ R₂ ∩ N (R₂) then H (R₁) < H (R₂).

P4. H (R₁ ∪ R₂) + H (R₁ ∩ R₂) = H (R₁) + H (R₂).

P2 says that the complementary similarity and dissimilarity functions have equal entropy values.

Such measure of entropy can be defined by:

$H (R) = 0.5 (d (D_{0}, S_{I}) - d (R, N (R))),$ (24)

where d is a co-symmetric positive metric defined on P (R) [9]. This formula says that the entropy of a resemblance function R increases when the distance d between R and the complementary function N (R) decreases. If R is a similarity (dissimilarity) function then N (R) is its complementary dissimilarity (similarity) function, hence from Equation (24) it follows for the complementary similarity S and dissimilarity D functions:

$H (S) = H (D) = 0.5 (d (D_{0}, S_{I}) - d (S, D)) .$

The normalized entropy of a similarity function S is represented as follows:

$H (S) = H (D) = 1 - D_{R} (S, D) = S_{R} (S, D),$

$H (S) = S_{R} (S, N (S)),$ (25)

where $D_{R} (S, D) = \frac{d (S, D)}{d (D_{0}, S_{I})}$ is the dissimilarity function between resemblance functions and S_R is its complementary similarity function on the set of resemblance functions. The last formulas have the following interpretation.

The entropy of a similarity measure S equals the similarity S_R between S and its complementary dissimilarity measure D. The more similar a similarity measure S is to its negation, the higher the entropy of S.

The following complementary similarity and dissimilarity measures have the maximal entropy values:

$S_{M} (x, y) = \begin{cases} 1, & \text{if } x = y \\ 0.5, & \text{otherwise} \end{cases},$ $D_{M} (x, y) = \begin{cases} 0, & \text{if } x = y \\ 0.5, & \text{otherwise} \end{cases} .$

The entropy of a resemblance function can be considered as a measure of indiscernibility of objects with respect to the similarity measure S: the larger the entropy of S, the more indiscernible the similarity values S (x, y) and S (u, v) of different pairs of objects from the domain Ω. For S_M it holds that S_M (x, y) = S_M (u, v) for all x, y, u, v from Ω such that x ≠ y and u ≠ v. One can also consider the problem of an equivalent transformation of a similarity measure S into a similarity measure S* with minimal entropy value.

Another interpretation of the entropy of a resemblance function is related to the interpretation of the entropy of a fuzzy set [31]: H (S) is a global measure of uncertainty in the classification of pairs of objects of Ω into the same or different classes based on the information about the similarity (dissimilarity) between them. The maximal uncertainty appears for the similarity function S_M, when the similarity between all different objects equals 0.5. In this case, the uncertainty of the decision whether or not to classify different objects into the same class is maximal.
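Formula (24) can be tried out on a small finite domain. The sketch below is a Python illustration under a stated assumption: the metric d is taken to be the sum of absolute differences of the resemblance values over all pairs of objects. This choice is made here only for illustration and is not prescribed by the text; the function names and the sample matrices are hypothetical as well.

```python
def neg(r):
    """Complement N(R) = 1 - R."""
    return [[1 - v for v in row] for row in r]

def l1_distance(r1, r2):
    """Assumed metric d: sum of |R1(x, y) - R2(x, y)| over all pairs (x, y)."""
    return sum(abs(u - v) for p, q in zip(r1, r2) for u, v in zip(p, q))

def entropy(r):
    """H(R) = 0.5 * (d(D0, SI) - d(R, N(R))), Equation (24)."""
    n = len(r)
    d0 = [[0.0] * n for _ in range(n)]   # minimal element D0
    s_i = [[1.0] * n for _ in range(n)]  # maximal element SI
    return 0.5 * (l1_distance(d0, s_i) - l1_distance(r, neg(r)))

n = 3
d0 = [[0.0] * n for _ in range(n)]
s_m = [[1.0 if i == j else 0.5 for j in range(n)] for i in range(n)]
s = [[1, 0.75, 0.25], [0.75, 1, 0.5], [0.25, 0.5, 1]]  # hypothetical similarity

assert entropy(d0) == 0.0                 # property P1
assert entropy(s) == entropy(neg(s))      # property P2
assert entropy(s) < entropy(s_m)          # S_M has the larger entropy
```

With this metric the sample similarity function has entropy 2.0, while S_M on three objects has entropy 3.0.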

11.1.1 Remark

The concept of the non-probabilistic measure of entropy of fuzzy sets was introduced in [31], where different interpretations of the measure of entropy of fuzzy sets are given. A strictly monotonic (P3) measure of entropy of fuzzy sets was considered in [6] and further extended to Kleene algebras in [9]. This paper extends the non-probabilistic measure of entropy of the elements of a Kleene algebra to resemblance functions. The relationships between entropy and distance are studied, for example, in [4, 72].

11.2 Entropy of elements of the domain

One can also use the formula (25) of the non-probabilistic entropy of resemblance functions for the entropy of the elements of the set Ω associated with a similarity function S:

$H_{S} (x) = S (x, N (x)),$

where S is a co-symmetric similarity function defined on the set Ω with a negation N. For a consistent similarity function S it holds that S (x, N (x)) = 0, and the entropy of all elements of the set Ω equals zero: H_S (x) = 0. For a non-consistent similarity function S, the entropy equals the similarity between the element and its negation. The entropy of fixed points has the maximal value: H_S (x_FP) = 1.

For example, all reasonable similarity functions on the set Ω of binary n-tuples x = (x₁, …, x_n) (on 2 × 2 tables) are consistent and all elements x of Ω have the entropy value H_S (x) =0. Because the similarity functions on 2 × 2 tables can be considered also as similarity functions on the sets of attributes X possessed by n-tuples x, the entropy of these sets also will have the value zero: H_S (X) =0. This situation is similar to the entropy of fuzzy sets when the entropy of crisp sets equals to zero.

On the set [0,1] the negation N (x) =1 - x has the fixed point x_FP = 0.5. The similarity measure S (x, y) =1 - |x - y| will define the following measure of entropy: H_S (x) = S (x, N (x)) =1 - |2x - 1| with H (0.5) =1 and H (0) = H (1) =0.

On the set of real numbers the negation N (x) = - x has the fixed point x_FP = 0. The similarity measure $S (x, y) = \frac{1}{1 + | x - y |}$ defines the measure of entropy: $H_{S} (x) = \frac{1}{1 + 2 | x |}$ . This entropy has the maximal value H_S (0) = 1 at the fixed point of the negation and approaches zero as |x| increases.
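The two examples above translate directly into code. The following short Python sketch evaluates H_S (x) = S (x, N (x)) for both domains; the function names are hypothetical.

```python
def entropy_unit_interval(x):
    """H_S(x) = S(x, 1 - x) with S(x, y) = 1 - |x - y| on [0, 1]."""
    return 1 - abs(x - (1 - x))

def entropy_real_line(x):
    """H_S(x) = S(x, -x) with S(x, y) = 1 / (1 + |x - y|) on the real numbers."""
    return 1 / (1 + abs(x - (-x)))

# Maximal entropy at the fixed points of the negations.
assert entropy_unit_interval(0.5) == 1
assert entropy_real_line(0) == 1
# Zero entropy at the ends of the unit interval.
assert entropy_unit_interval(0) == 0 and entropy_unit_interval(1) == 0
```

On the real line the entropy is an even function of x, since it depends on x only through |x|.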

12 Composition and transitive transformation of proximity relations in hierarchical clustering

12.1 Composition of proximity relations

Consider proximity relations (similarity functions) S₁ and S₂ on a finite set Ω. We use here the terms proximity relation and similarity function interchangeably. The max-min composition S₁ ∘ S₂ of proximity relations is defined [1, 2] (see also [4, 65]) for all x, z in Ω as follows:

$(S_{1} \circ S_{2}) (x, z) = \max_{y \in Ω} \min (S_{1} (x, y), S_{2} (y, z)) .$

Denote S₁ ⊇ S₂ if S₁ (x, y) ⩾ S₂ (x, y) for all x, y in Ω. For all x, z in Ω it holds that:

$\begin{matrix} (S_{1} \circ S_{2}) (x, z) ⩾ \min (S_{1} (x, z), S_{2} (z, z)) ⩾ \\ \min (S_{1} (x, z), 1) = S_{1} (x, z), \end{matrix}$

i.e. S₁ ∘ S₂ ⊇ S₁. Similarly: S₁ ∘ S₂ ⊇ S₂. Max-min composition of proximity relations is reflexive but generally, it is not symmetric. It can be converted to proximity relation (to similarity function on Ω) by the symmetrization method considered in Section 8.5.
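The properties of the max-min composition stated above can be observed on a small example. The following Python sketch assumes proximity relations on a three-element set stored as nested lists; the function names and the two sample relations are hypothetical.

```python
def compose(s1, s2):
    """Max-min composition: (S1 ∘ S2)(x, z) = max over y of min(S1(x, y), S2(y, z))."""
    n = len(s1)
    return [[max(min(s1[x][y], s2[y][z]) for y in range(n))
             for z in range(n)]
            for x in range(n)]

def contains(r1, r2):
    """R1 ⊇ R2: R1(x, y) >= R2(x, y) for all x, y."""
    return all(u >= v for p, q in zip(r1, r2) for u, v in zip(p, q))

# Two hypothetical proximity relations (reflexive and symmetric).
s1 = [[1, 0.8, 0.1], [0.8, 1, 0.2], [0.1, 0.2, 1]]
s2 = [[1, 0.3, 0.1], [0.3, 1, 0.9], [0.1, 0.9, 1]]

c = compose(s1, s2)

assert contains(c, s1) and contains(c, s2)   # the composition includes both relations
assert all(c[i][i] == 1 for i in range(3))   # reflexive
assert c[0][2] != c[2][0]                    # but not symmetric in this example
```

Here (S₁ ∘ S₂)(x, z) takes the value 0.8 for the first and third objects in one order and 0.2 in the other, so a symmetrization step is indeed needed to return to a proximity relation.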

12.2 Min-transitive closure of proximity relations

The composition of a proximity relation S defined on Ω with itself gives a new symmetric and reflexive proximity relation S ∘ S on Ω such that S ∘ S ⊇ S. If S is a valued equivalence relation, then from the min-transitivity of S it follows that S ⊇ S ∘ S. Hence, S is a valued equivalence relation if and only if S ∘ S = S. Denote $S^{k + 1} = S^{k} \circ S$, k = 1, 2, …, where S¹ = S. All $S^{k}$ are proximity relations and $S^{k + 1} \supseteq S^{k}$ for all k = 1, 2, …. The transitive closure $\hat{S}$ of S is defined by: $\hat{S} = \bigcup_{k = 1}^{\infty} S^{k}$ . The transitive closure $\hat{S}$ of any proximity relation S is a min-transitive (equivalence, similarity) relation. For a finite Ω with n elements it holds that $\hat{S} = S^{n - 1} = S^{n} = \dots$ . As a shortcut method for obtaining $\hat{S}$ one can calculate $S^{2}, S^{4}, \dots, S^{2^{k}}$, finishing when $S^{2^{k - 1}} = S^{2^{k}}$ for some k or when k ⩾ log₂ (n - 1) [34, 65].

A min-transitive closure of proximity relations is a closure operation on the lattice P (S) of all proximity relations (similarity functions) defined on Ω satisfying the following properties [7 , 23]:

$\begin{matrix} S \subseteq \hat{S}, \\ \text{if } S_{1} \subseteq S \text{ then } {\hat{S}}_{1} \subseteq \hat{S}, \\ \hat{\hat{S}} = \hat{S} . \end{matrix}$

Because all S^k, k = 1, 2, …, and transitive closure $\hat{S}$ are again similarity functions on Ω they can be considered as transformations of similarity functions.
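The shortcut method of repeated squaring described in this section can be sketched as follows. This is a minimal Python illustration, assuming a proximity relation on a finite set stored as a nested list; the function names and the sample relation are hypothetical, and the loop simply stops as soon as squaring no longer changes the relation.

```python
def compose(s1, s2):
    """Max-min composition of two relations on the same finite set."""
    n = len(s1)
    return [[max(min(s1[x][y], s2[y][z]) for y in range(n))
             for z in range(n)]
            for x in range(n)]

def transitive_closure(s):
    """Min-transitive closure computed by repeated squaring S, S^2, S^4, ..."""
    closure = s
    while True:
        squared = compose(closure, closure)
        if squared == closure:  # S^(2^k) equals the previous power: closure reached
            return closure
        closure = squared

def is_min_transitive(s):
    n = len(s)
    return all(s[x][z] >= min(s[x][y], s[y][z])
               for x in range(n) for y in range(n) for z in range(n))

# Hypothetical non-transitive proximity relation.
s = [[1, 0.8, 0.1], [0.8, 1, 0.5], [0.1, 0.5, 1]]
s_hat = transitive_closure(s)

assert not is_min_transitive(s)
assert is_min_transitive(s_hat)
assert transitive_closure(s_hat) == s_hat  # closure is idempotent
```

In this example the value 0.1 between the first and third objects is raised to 0.5, the strength of the best chain through the second object.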

12.3 Transitive closure and hierarchical clustering

Tamura et al. [65] proposed a hierarchical clustering algorithm that transforms a non-transitive proximity relation S given on a set X into its transitive closure $\hat{S}$, which defines a hierarchical partition of X into the equivalence classes of the valued equivalence relation $\hat{S}$ . This method coincides with spanning tree clustering and with a version of single linkage clustering [34, 45].

Hubert and Baker [43] used the min-max composition $D^{r} = D^{r - 1} \circ D$ of a dissimilarity function D and applied complete linkage clustering to $D^{r}$. Dually, it is possible to apply single linkage clustering to the max-min composition $S^{k}$ of a proximity relation S.
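How a valued equivalence relation defines a hierarchy of partitions can be sketched in a few lines. The Python code below assumes that the input matrix is already min-transitive (for example, a transitive closure computed beforehand) and cuts it at each of its distinct values; the function names and the sample relation are hypothetical, and this is an illustration rather than the algorithm of [65] itself.

```python
def partition_at(e, alpha):
    """Equivalence classes of the crisp relation e[x][y] >= alpha.

    Since e is assumed min-transitive, comparing x with one representative
    of each class is sufficient.
    """
    classes = []
    for x in range(len(e)):
        for cls in classes:
            if e[x][cls[0]] >= alpha:
                cls.append(x)
                break
        else:
            classes.append([x])  # x starts a new class
    return classes

def hierarchy(e):
    """Partitions for all distinct levels of e, from the finest to the coarsest."""
    levels = sorted({v for row in e for v in row}, reverse=True)
    return [(alpha, partition_at(e, alpha)) for alpha in levels]

# Hypothetical valued equivalence relation on three objects.
e = [[1, 0.8, 0.5], [0.8, 1, 0.5], [0.5, 0.5, 1]]

for alpha, classes in hierarchy(e):
    print(alpha, classes)
```

Lowering the level merges classes, so the sequence of partitions is nested, as in a dendrogram of single linkage clustering.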

12.4 Min-transitive transformation of proximity relation and invariant hierarchical clustering

Batyrshin [7] proposed a general scheme of hierarchical clustering algorithms that are invariant under monotone transformations of similarity values and under the initial numbering of objects. The scheme is based on a min-transitive transformation of a given proximity relation S into a valued equivalence relation E. Some versions of these algorithms were published in [8, 11].

These algorithms transform the given proximity relation S into valued equivalence relation E by:

$E = TC (F (S)),$ (26)

where TC is the max-min transitive closure of the proximity relation F (S) obtained as a correction of the proximity relation S such that F (S) ⊆ S. The resulting equivalence relation defines a hierarchy of partitions of the set Ω into equivalence classes. Since the transitive closure TC is invariant under both types of data transformation mentioned above, the resulting clustering algorithm will also be invariant if the data correction procedure F is invariant.

The proposed method of hierarchical clustering is based on the general solution (26) of the problem of optimal approximation of a proximity relation in the lattice of proximity relations (similarity functions) with a closure operation [7, 11].

The invariant correction procedure F is based on the analysis of the neighborhoods of elements x and y. Denote $Ω_{x, y} = Ω \setminus \{ x, y \}$. One version of this algorithm calculates the following sets of “neighbors” of x and y:

$V_{x, y} = \{ z \in Ω_{x, y} \mid \max \{ S (x, z), S (y, z) \} ⩾ S (x, y) \},$ $W_{x, y} = \{ z \in Ω_{x, y} \mid \min \{ S (x, z), S (y, z) \} ⩾ S (x, y) \},$

where V_x,y is the set of “all” neighbors of x and y, while W_x,y is the set of “common” neighbors of x and y. The correction F (S (x, y)) of the value S (x, y) depends on the relative part of W_x,y in the set V_x,y “supporting” the similarity value S (x, y). On many real problems of hierarchical clustering, the proposed scheme of clustering algorithms based on the correction procedure F showed a good approximation of the optimal hierarchical clustering [7, 11].
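The two neighbor sets can be computed directly from their definitions. The following Python sketch covers only this step; the correction F itself is not specified in this section and is therefore not implemented. The function name and the sample proximity relation are hypothetical.

```python
def neighbor_sets(s, x, y):
    """Return (V, W): 'all' and 'common' neighbors of x and y in the relation s."""
    others = [z for z in range(len(s)) if z not in (x, y)]
    # V: at least one of x, y is as similar to z as x and y are to each other.
    v = {z for z in others if max(s[x][z], s[y][z]) >= s[x][y]}
    # W: both x and y are as similar to z as x and y are to each other.
    w = {z for z in others if min(s[x][z], s[y][z]) >= s[x][y]}
    return v, w

# Hypothetical proximity relation on four objects.
s = [[1, 0.6, 0.7, 0.2],
     [0.6, 1, 0.8, 0.9],
     [0.7, 0.8, 1, 0.3],
     [0.2, 0.9, 0.3, 1]]

v, w = neighbor_sets(s, 0, 1)
assert w <= v  # common neighbors always form a subset of all neighbors
```

For the pair of objects 0 and 1 in this example, object 2 is a common neighbor, while object 3 is a neighbor of object 1 only.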

13 Similarity and correlation functions

Let N be a negation on a set Ω, and V be a subset of Ω_N = Ω ∖ FP (N, Ω) closed under negation N. A function A : V × V → [-1, 1] satisfying for all x, y ∈ V the following properties:

A (x, y) = A (y, x), (symmetry)

A (x, x) =1, (reflexivity)

A (x, N (y)) = - A (x, y), (inverse relationship)

will be called a correlation function on V. These properties were considered in [13] as the properties of an association measure generalizing Pearson’s product-moment correlation coefficient:

$A (x, y) = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}} .$ (27)

The following theorem is a particular case of the general method of construction of correlation functions using co-symmetric similarity functions (similarity measures) considered in [12, 13].

Theorem 1. Let Ω be a set with a negation N and S be a consistent co-symmetric similarity function on a set V ⊆ Ω ∖ FP (N, Ω) closed under the negation N. Then the function A : V × V → [-1, 1] defined by:

$A (x, y) = S (x, y) - S (x, N (y)),$ (28)

for all x, y in V, is a correlation function on V.

Proof. Symmetry of A follows from the symmetry and co-symmetry (Section 7.5) of S: A (y, x) = S (y, x) - S (y, N (x)) = S (x, y) - S (N (x), y) = S (x, y) - S (x, N (y)) = A (x, y).

Reflexivity of A follows from the reflexivity and consistency of S : A (x, x) = S (x, x) - S (x, N (x)) =1 - 0 =1.

Inverse relationship follows from the involutivity of N: A (x, N (y)) = S (x, N (y)) - S (x, N (N (y))) = S (x, N (y)) - S (x, y) = - (S (x, y) - S (x, N (y))) = - A (x, y). ■

Formula (28) has a simple interpretation: the correlation between x and y is positive if x is more similar to y than to the negation of y, and the correlation is negative in the opposite case.

From Equation (28) one obtains the following formula in terms of the complementary dissimilarity function D = 1 - S:

$A (x, y) = D (x, N (y)) - D (x, y) .$ (29)

Proposition 17. Let Ω be the set of real n-tuples x = (x₁, …, x_n) with the negation N (x) = (- x₁, …, - x_n). Then the function (29) defines on the set V of non-constant n-tuples Pearson’s product-moment correlation coefficient (27) if D is the dissimilarity function on V given by:

$D (x, y) = \frac{1}{4} \sum_{i = 1}^{n} {(\frac{x_{i} - \bar{x}}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}}} - \frac{y_{i} - \bar{y}}{\sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}})}^{2} .$

This dissimilarity function is considered in Section 6.7. One can check that it is a co-symmetric and consistent dissimilarity function and that Pearson’s correlation coefficient is a correlation function. A simple proof of Proposition 17 can be found in [15].
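Proposition 17 can also be checked numerically. The following Python sketch computes the correlation by Equation (29) from the dissimilarity function D and compares it with Pearson’s coefficient (27) computed directly. The function names and the sample n-tuples are hypothetical, and floating-point results are compared up to rounding error.

```python
import math

def normalize(x):
    """Center x and scale it to unit Euclidean norm (x must be non-constant)."""
    mean = sum(x) / len(x)
    centered = [xi - mean for xi in x]
    norm = math.sqrt(sum(ci * ci for ci in centered))
    return [ci / norm for ci in centered]

def dissimilarity(x, y):
    """D(x, y): one quarter of the squared distance between the normalized tuples."""
    zx, zy = normalize(x), normalize(y)
    return 0.25 * sum((u - v) ** 2 for u, v in zip(zx, zy))

def negate(x):
    """Negation N(x) = (-x_1, ..., -x_n)."""
    return [-xi for xi in x]

def correlation(x, y):
    """A(x, y) = D(x, N(y)) - D(x, y), Equation (29)."""
    return dissimilarity(x, negate(y)) - dissimilarity(x, y)

def pearson(x, y):
    """Pearson's coefficient (27) as the dot product of the normalized tuples."""
    zx, zy = normalize(x), normalize(y)
    return sum(u * v for u, v in zip(zx, zy))

# Hypothetical sample data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

assert math.isclose(correlation(x, y), pearson(x, y))
assert math.isclose(correlation(x, x), 1.0)                            # reflexivity
assert math.isclose(correlation(x, negate(y)), -correlation(x, y))    # inverse relationship
```

The agreement follows from expanding the squares: with r denoting the Pearson coefficient, D (x, y) = (1 - r)/2 and D (x, N (y)) = (1 + r)/2 for normalized tuples, and the difference of the two equals r.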

Proposition 18. Let

$\begin{matrix} S (x, y) = \frac{ad}{ad + bc} \\ = \frac{| X \cap Y | \cdot | \bar{X} \cap \bar{Y} |}{| X \cap Y | \cdot | \bar{X} \cap \bar{Y} | + | X \cap \bar{Y} | \cdot | \bar{X} \cap Y |} \end{matrix}$

be a similarity measure on 2 × 2 tables; then by Equation (28) it defines Yule’s Q association coefficient:

$A_{YuleQ} (x, y) = \frac{ad - bc}{ad + bc} .$

One can easily check that S is a consistent co-symmetric similarity function and that the popular A_YuleQ coefficient of association is a correlation function.

The papers [12–14] give more general methods of constructing correlation functions on almost any domain where a negation and a co-symmetric similarity function are defined. Note that in these works the negation, co-symmetry and consistency are called reflection, cancellation of reflections and non-similarity of reflections, respectively.
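Proposition 18 can be verified on binary n-tuples. The Python sketch below builds the correlation by Equation (28) from the similarity S = ad / (ad + bc) and compares it with Yule’s Q. The function names and the sample tuples are hypothetical, exact rational arithmetic is used, and tables with ad + bc = 0 are assumed not to occur.

```python
from fractions import Fraction

def table(x, y):
    """Counts (a, b, c, d) of the 2x2 table for binary tuples x and y."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d

def negate(x):
    return tuple(1 - xi for xi in x)

def similarity(x, y):
    """S(x, y) = ad / (ad + bc); assumes ad + bc != 0."""
    a, b, c, d = table(x, y)
    return Fraction(a * d, a * d + b * c)

def correlation(x, y):
    """A(x, y) = S(x, y) - S(x, N(y)), Equation (28)."""
    return similarity(x, y) - similarity(x, negate(y))

def yule_q(x, y):
    a, b, c, d = table(x, y)
    return Fraction(a * d - b * c, a * d + b * c)

# Hypothetical sample tuples with a = 2, b = 1, c = 1, d = 3.
x = (1, 1, 0, 0, 1, 0, 0)
y = (1, 0, 0, 1, 1, 0, 0)

assert correlation(x, y) == yule_q(x, y)
assert correlation(x, x) == 1                               # reflexivity
assert correlation(x, negate(y)) == -correlation(x, y)      # inverse relationship
```

Negating y exchanges a with b and c with d in the table, so S (x, N (y)) = bc / (ad + bc), which gives the numerator ad - bc of Yule’s Q.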

14 Related works and conclusion

It is impossible to survey the hundreds of works related to the topics discussed in this paper. Some of them are given in the list of references.

The concept of fuzzy relation proposed by Lotfi Zadeh [1, 2] and studied in many works on fuzzy sets and relations [4 , 65, etc.] was used in this paper as the basis for consideration of similarity measures as functions defined on universal domain and satisfying some sets of properties.

Sup-min-transitivity (or min-transitivity) of fuzzy similarity relations was introduced by Lotfi Zadeh in 1971 [2]. Later, T-transitivity of fuzzy relations was studied in many works [21, 36]. From the results obtained by De Baets et al. [30] it follows (see Section 5.5 above) that the wide parametric class of consistent similarity measures on 2 × 2 tables satisfies neither min-transitivity nor product-transitivity. Only T_L-transitivity, which is dual to the triangle inequality of metrics, can be fulfilled. These considerations show that, although there is a large intersection between the properties of fuzzy similarity relations and similarity measures (similarity functions), they differ both in properties and in applications. Roughly speaking, the theory of valued (fuzzy) proximity and similarity relations studies relationships between the proximity values (S(x,y), S(y,z), etc.) of different pairs of objects, whereas the works on similarity measures consider mainly the methods of construction and transformation of similarity measures. This paper, by defining similarity functions as fuzzy (valued) relations and considering the methods of construction and transformation of these functions, paves the way for the integration of the methods developed in the theory of fuzzy relations and in the works on similarity or resemblance measures. The last sections consider the concepts of lattice, entropy and transitive transformations of resemblance functions developed in the theory of fuzzy sets and relations.

Due to the difference in the methods of construction of similarity and dissimilarity measures, these measures are considered in this paper together as similarity functions (SF) and dissimilarity functions (DF) under the general name of resemblance functions (RF). This paper is also related to [5], which uses a general definition of resemblance measures as symmetric real-valued functions that are reflexive or irreflexive. The work [5] studies transformations of resemblance measures; some of them are extended here to RF on a universal domain. The work [33] describes many distances on different domains. One can use them for defining new similarity measures in various tasks.

The paper proposes new methods (partially based on [12]) of constructing DF based on Minkowski distance and p-transformation of data.

Several dozen similarity measures on 2 × 2 tables are considered in the works on classification and data analysis [5, 49, etc.]. Some of them are used in this paper for illustrating the properties of RF and their transformations. New RF are proposed here as the result of aggregation or co-symmetrization of known RF. The methods of visualization of similarity measures on 2 × 2 tables proposed in [17] are used here to demonstrate the classes of these measures.

The new important concept explicitly introduced in the analysis of similarity and association measures here and in [13] is an involution (reflection, negation) operation defined on the domain of these measures. Involutive negation is widely used in the theory of fuzzy sets, but in statistics and in the analysis of similarity and association measures the involution or negation is usually not explicitly considered. The concepts of consistency and co-symmetry of similarity functions, which are related to an involution (reflection), are called in [13] non-similarity of reflections and cancellation (or permutation) of reflections, respectively.

This paper is a sequel to the works [12–14], where association measures (called here correlation functions (CF)) and similarity measures are considered as functions with given properties, and association measures are constructed by means of similarity measures and pseudo-difference operations studied in the theory of aggregation functions [41]. Section 13 considers a particular case of the more general methods of constructing CF from [13], which one can use for obtaining CF on almost any domain with a negation and a co-symmetric SF. Due to the limitations and drawbacks of the Pearson correlation coefficient used as a similarity or correlation measure on different domains [16, 18], new correlation coefficients more suitable for specific domains can be introduced.

Generally, any real valued function of two or more arguments establishes some relationship between them and can be considered as an association function (AF). Similarity and dissimilarity functions are special types of symmetric AF taking values in [0,1]. Section 13 considers symmetric and reflexive correlation functions taking values in [– 1,1]. Sections 8.7.2 and 10 consider the methods of construction of RF from AF that can be non-symmetric or non-reflexive. As a sequel of this work, it is supposed to extend the approach developed in this paper on other types of association functions.

The results obtained in this paper for resemblance functions on a universal domain can be transferred to specific domains. On some domains, resemblance functions can have additional properties related to the properties of these domains. The consideration of RF on a universal domain serves as a bridge for transferring these functions from one domain to another. However, in this case the similarity measure on the new domain can have drawbacks due to the differences in the properties of these domains.

Due to the limitations on the size of the paper it does not consider the similarity measures on many specific domains, for example, on interval [0,1], on the sets of fuzzy sets of different types (intuitionistic, hesitant, soft fuzzy sets etc.), on the sets of time series, graphs, images, texts and so on. Some similarity measures on these sets one can find in the works included in the list of references or can construct with the methods considered in this paper. Vice versa, the methods of analysis and construction of similarity measures on specific domains the reader can extend on universal domain as proposed in this paper.

The paper includes many examples illustrating properties and methods of transformation or construction of similarity, dissimilarity and correlation functions. The author hopes that these examples will be helpful for researchers applying similarity and association measures in ecology, social and behavioral sciences, bioinformatics, time series analysis, natural language processing etc. The results of the paper one can use as a part of the course on Data Analysis, Data Mining or Data Science.

Acknowledgments

The author would like to thank the Editor-in-Chief, Dr. Reza Langari, for the opportunity to publish the paper in the presented form in this journal. The research was partially supported by projects SIP 20181315, IPN, and A1-S-43766, FSSEP02-C-2018-1, CONACYT, Mexico.

References

1. L.A. Zadeh, Fuzzy sets, Information and Control 8 (1965), 338–353.

2. L.A. Zadeh, Similarity relations and fuzzy orderings, Information Sciences 3 (1971), 177–200.

3. F.J. Aherne, N.A. Thacker and P.I. Rockett, The Bhattacharyya metric as an absolute similarity measure for frequency coded data, Kybernetika 34 (1998), 363–368.

4. A.N. Averkin, I.Z. Batyrshin, A.F. Blishun, V.B. Silov and V.B. Tarasov, Fuzzy Sets in Models of Control and Artificial Intelligence, D.A. Pospelov, ed., Nauka, Moscow, 1986 (in Russian).

5. Batagelj and Bren, Comparing resemblance measures, Journal of Classification 12 (1995), 73–90.

6. I.Z. Batyrshin, On measures of entropy of fuzzy sets, in: Operations Research and Analytical Design in Technique, KAI Publisher, Kazan, 1978, pp. 40–45 (in Russian).

7. I.Z. Batyrshin, Methods of system analysis based on weighted relations, PhD Dissertation, Moscow Power Engineering Institute, Moscow, 1982 (in Russian).

8. I.Z. Batyrshin and V.A. Shuster, The structure of semantic spaces of verbal estimates of actions, in: Fundamental Questions of the Theory of Knowledge, Transactions of the University of Tartu 688 (1985), 20–38 (in Russian).

9. I.Z. Batyrshin, On fuzzinesstic measures of entropy on Kleene algebras, Fuzzy Sets and Systems 34 (1990), 47–60.

10. Batyrshin, On the structure of involutive, contracting and expanding negations, Fuzzy Sets and Systems 139 (2003), 661–672.

11. Batyrshin and Rudas, Invariant hierarchical clustering schemes, in: Perception-based Data Mining and Decision Making in Economics and Finance, Springer, 2007, pp. 181–206.

12. Batyrshin, Constructing time series shape association measures: Minkowski distance and data standardization, in: BRICS-CCI2013, Brazil, https://arxiv.org/pdf/1311.1958v3.

13. I.Z. Batyrshin, On definition and construction of association measures, Journal of Intelligent & Fuzzy Systems 29 (2015), 2319–2326.

14. I.Z. Batyrshin, Association measures on [0,1], Journal of Intelligent & Fuzzy Systems 29 (2015), 1011–1020.

15. Batyrshin and Kreinovich, One more geometric interpretation of Pearson’s correlation, Thailand Statistician 13(1) (2015), 125–126.

16. Batyrshin, Solovyev and Ivanov, Time series shape association measures and local trend association patterns, Neurocomputing 175 (2016), 924–934.

17. I.Z. Batyrshin, Kubysheva, Solovyev and L.A. Villa-Vargas, Visualization of similarity measures for binary data and 2 × 2 tables, Computación y Sistemas 20 (2016), 345–353.

18. Batyrshin, Monroy-Tenorio, Gelbukh, L.A. Villa-Vargas, Solovyev and Kubysheva, Bipolar rating scales: A survey and novel correlation measures based on nonlinear bipolar scoring functions, Acta Polytechnica Hungarica 14 (2017), 33–57.

19. F.B. Baulieu, A classification of presence/absence based dissimilarity coefficients, Journal of Classification 6 (1989), 233–246.

20. Beliakov, Pradera and Calvo, Aggregation functions: A guide for practitioners, vol. 221, Springer, Heidelberg, 2007.

21. J.C. Bezdek and J.D. Harris, Fuzzy relations and partitions: An axiomatic basis for clustering, Fuzzy Sets and Systems 1 (1978), 111–127.

22. Bhattacharyya, On a measure of divergence between two statistical populations defined by their probability distribution, Bulletin of the Calcutta Mathematical Society 35 (1943), 99–110.

23. Birkhoff, Lattice theory, 3rd ed., American Mathematical Society, 1967.

24. Bouchon-Meunier, Rifqi and Bothorel, Towards general measures of comparison of objects, Fuzzy Sets and Systems 84 (1996), 143–153.

25. Bustince, Barrenechea and Pagola, Restricted equivalence functions, Fuzzy Sets and Systems 157 (2006), 2333–2346.

26. S.H. Cha, Comprehensive survey on distance/similarity measures between probability density functions, Intern J Math Models and Methods in Applied Sciences 1 (2007), 300–307.

27. P.Y. Chen and P.M. Popovich, Correlation: Parametric and nonparametric measures, Sage, Thousand Oaks, CA, 2002.

28. S.S. Choi, S.H. Cha and C.T. Charles, A survey of binary similarity and distance measures, Journal of Systemics, Cybernetics and Informatics 8 (2010), 43–48.

29. H.T. Clifford and Stephenson, An introduction to numerical classification, Academic Press, New York, 1975.

30. De Baets, Janssens and De Meyer, On the transitivity of a parametric family of cardinality-based similarity measures, International Journal of Approximate Reasoning 50 (2009), 104–116.

31. De Luca and Termini, A definition of a nonprobabilistic entropy in the setting of fuzzy sets, Inform Control 20 (1972), 301–312.

32. K.G. Derpanis, The Bhattacharyya measure, Mendeley Computer 1 (2008), 1990–1992.

33. M.M. Deza and Deza, Encyclopedia of distances, 2nd ed., Springer, Berlin, Heidelberg, 2013.

34. J.C. Dunn, A graph theoretic analysis of pattern classification via Tamura’s fuzzy relation, IEEE Transactions on Systems, Man, and Cybernetics 3 (1974), 310–313.

35. Fan and Xie, Some notes on similarity measure and proximity measure, Fuzzy Sets and Systems 101 (1999), 403–412.

36. J.C. Fodor and M.R. Roubens, Fuzzy preference modelling and multicriteria decision support, vol. 14, Springer Science & Business Media, 1994.

37. F.J. Garcia-Lopez, Batyrshin and Gelbukh, Analysis of relationships between tweets and stock market trends, Journal of Intelligent and Fuzzy Systems 34(5), 3337–3347.

38. González-Caballero, Díaz, Espín and Montes, New measures of similarity based on fuzzy implications, Journal of Intelligent & Fuzzy Systems 33 (2017), 3493–3503.

39. J.C. Gower and G.J.S. Ross, Minimum spanning trees and single linkage cluster analysis, Applied Statistics (1969), 54–64.

40. J.C. Gower and Legendre, Metric and Euclidean properties of dissimilarity coefficients, Journal of Classification 3 (1986), 5–48.

41. Grabisch, J.L. Marichal, Mesiar and Pap, Aggregation Functions, Cambridge Univ. Press, Cambridge, UK, 2009.

42. W.J. Heiser and M.J. Warrens, Families of relational statistics for 2 × 2 tables, in: Advances in Interdisciplinary Applied Discrete Mathematics, H. Kaul, H.M. Mulder, eds., World Scientific, Singapore, 2010, pp. 25–52.

43. L.J. Hubert and F.B. Baker, An empirical comparison of baseline models for goodness-of-fit in r-diameter hierarchical clustering, in: Classification and Clustering, J. Van Ryzin, ed., Academic Press, 1977, pp. 131–153.

44. Janson and Vegelius, Measures of ecological association, Oecologia 49 (1981), 371–376.

45. S.C. Johnson, Hierarchical clustering schemes, Psychometrika 32 (1967), 241–254.

46. Kaufmann, Introduction to the Theory of Fuzzy Subsets, vol. 2, Academic Press, 1975.

47. E.P. Klement, Mesiar and Pap, Triangular norms, vol. 8, Springer Science & Business Media, 2000.

48. Legendre and L.F. Legendre, Numerical ecology, 2nd English ed., Elsevier, 1998.

49. M.-J. Lesot, Rifqi and Benhadda, Similarity measures for binary and numerical data: A survey, Int J Knowledge Engineering and Soft Data Paradigms 1 (2009), 63–84.

50. Li, Qin and He, Some new approaches to constructing similarity measures, Fuzzy Sets and Systems 234 (2014), 46–60.

51. Menger, Probabilistic theories of relations, Proceedings of the National Academy of Sciences 37 (1951), 178–180.

52. A.D.S. Meyer, A.A.F. Garcia, A.P.D. Souza and C.L.D. Souza Jr, Comparison of similarity coefficients used for cluster analysis with dominant markers in maize (Zea mays L), Genetics and Molecular Biology 27 (2004), 83–91.

53. Mirkin, Core concepts in data analysis: Summarization, correlation and visualization, Springer Science & Business Media, 2011.

54. Ovchinnikov, Similarity relations, fuzzy partitions, and fuzzy orderings, Fuzzy Sets and Systems 40 (1991), 107–126.

55. G.V. Rauschenbach, Proximity and similarity measures, in: Analysis of non-numerical information in sociological research, Nauka, Moscow, 1985, pp. 169–202 (in Russian).

56. Recasens, Indistinguishability Operators: Modelling Fuzzy Equalities and Fuzzy Equivalence Relations, vol. 260, Springer Science & Business Media, 2010.

57. J.L. Rodgers and W.A. Nicewander, Thirteen ways to look at the correlation coefficient, The American Statistician 42 (1988), 59–66.

58. M.E. Rodriguez-Salazar, Alvarez-Hernandez and Bravo-Nunez, Coeficientes de Asociación, Plaza y Valdes Editores, Mexico, 2001.

59. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley Publishing Co., Inc., Boston, MA, USA, 1989.

60. B.I. Semkin, Descriptive sets and their application, in: Systems Research, vol. 1: Complex Systems Analysis, DVNTs AN SSSR, Vladivostok, 1973, pp. 83–94 (in Russian).

61. B.I. Semkin, Elementary theory of similarities and its use in biology and geography, Pattern Recognition and Image Analysis 22 (2012), 92–98.

62. Sidorov, Gelbukh, Gomez-Adorno and Pinto, Soft similarity and soft cosine measure: Similarity of features in vector space model, Computación y Sistemas 18 (2014), 491–504.

63. P.H. Sneath and R.R. Sokal, Numerical Taxonomy. The Principles and Practice of Numerical Classification, Freeman and Company, San Francisco, 1973.

64. Szmidt and Kacprzyk, Distances between intuitionistic fuzzy sets, Fuzzy Sets and Systems 114 (2000), 505–518.

65. Tamura, Higuchi and Tanaka, Pattern classification based on fuzzy relations, IEEE Transactions on Systems, Man, and Cybernetics 1 (1971), 61–66.

66. P.N. Tan, Kumar and Srivastava, Selecting the right interestingness measure for association patterns, in: Proc Eighth ACM SIGKDD Int Conf Knowledge Discovery and Data Mining, 2002, pp. 32–41.

67. Trillas, Sobre funciones de negación en la teoría de conjuntos difusos, Stochastica 3 (1979), 47–60.

68. Tversky, Features of similarity, Psychological Review 84 (1977), 327–352.

69. Valverde and Ovchinnikov, Representations of T-similarity relations, Fuzzy Sets and Systems 159 (2008), 2211–2220.

70. Vangelis, On the utility of the E-correlation coefficient concept in psychological research, Educational and Psychological Measurement 38 (1978), 605–611.

71. Xu and Xia, On distance and correlation measures of hesitant fuzzy information, International Journal of Intelligent Systems 26 (2011), 410–425.

72. R.R. Yager, A note on fuzziness in a standard uncertainty logic, IEEE Trans. Systems Man Cybernet 9 (1979), 387–388.
