Abstract
The paper draws attention to the use of Symbolic Data Analysis (SDA) in the field of Official Statistics. It is composed of three sections presenting three pilot techniques in the field of SDA. The three contributions are: a technique based on the notion of exactly mergeable summaries for the creation of symbolic objects; a model-based approach for interval data as an innovative parametric strategy in this context; and measures of similarity defined between a class and a collection of classes, based on the frequency of the categories which characterize them.
The paper shows the effectiveness of the proposed approaches as prototypes of the numerous techniques developed within the SDA framework and opens the way to possible further developments.
Introduction
This paper addresses the contribution of Symbolic Data Analysis [1] to the domain of Official Statistics (OSs). As is well known, OSs are presented in the form of aggregate data, primarily as summary measures of complex economic and social phenomena with high variability in time and space. They are mostly provided as composite indicators, distributions of phenomena at the spatial level, usually referring to different categories of reference populations or strata of the target population. The more recent use of Big Data techniques, such as Machine Learning, increasingly requires adequate synthesis to reduce the dimensionality of data. OSs are reported in the form of aggregates both as the outcome of appropriate summaries and in order to preserve the confidentiality and privacy of the data. Symbolic Data Analysis was initiated at the end of the 1980s with the pioneering work of Edwin Diday [2, 3, 4], and it has represented a new branch of research with the aim of modeling statistical units no longer through single (categorical or numerical) values observed on a set of characters, and collected in a classical table (
The development of SDA was made possible by the theoretical and conceptual framework elaborated by Edwin Diday and by the involvement of numerous researchers, not only from Europe, who collaborated on two major European projects between the late 1990s and early 2000s. These latter allowed for developing Symbolic Data Analysis methods, standardizing data representation, and exploring applications in Official Statistics. The first European research project, “Symbolic Objects Data Analysis System”, SODAS (1996–1999), gathered 17 teams working in SDA, including National Statistical Institutes (NSI’s). The project led to the first statistical package for SDA, ’SODAS’, which made it possible for Data Analysis researchers and users alike to produce, edit, and analyze symbolic data. At the same time, the first book on SDA “Analysis of Symbolic Data” [1] was published. SODAS was followed by another European project, “Analysis System of Symbolic Official Data”, ASSO, gathering 15 teams, including three NSI’s. The ASSO project allowed the development of new methodologies and the publishing of a second book – “Symbolic Data Analysis and the Sodas Software” [5].
These projects fostered the development of a field of research that, through doctoral theses, workshops, conference sessions, and publications in scientific journals in the fields of Statistics, Data Analysis, and Computational Statistics, has made considerable progress over the past twenty years and spread far beyond Europe. In the 1990s, Big Data analysis was in its early days and was not yet a challenge for the statistical community, whereas data confidentiality and privacy were already issues that statisticians had to deal with. SDA has been a pioneering line of research for the analysis of Big Data and of complex, aggregated data, which today represent, in various fields, the information to be processed. Since most techniques for analyzing large amounts of data are based on synthesis functions, dimension reduction, and the analysis of aggregated data, SDA methods can provide strong support. This is demonstrated, e.g., by the use of data in the form of distributions for the synthesis of data streams, sensor data, or high-frequency time sequences. Related to this new line of research for Big Data are the recent proposals for metrics and dissimilarity measures to compare descriptions of aggregate data and classes. Among them are the new developments on concordance and discordance, introduced through the relentless research work and innovative propositions of Edwin Diday and some colleagues who have recently collaborated with him. This paper, far from summarising the enormous scope in which SDA research has developed and branched out, brings together some of the research contributions presented at the NTTS2023 conference and mentions a final scientific inheritance left by Edwin Diday.
The population pyramids of China and Vanuatu.
The manuscript is organized as follows. In the second section, written by V. Batagelj, the notion of exactly mergeable summaries is introduced and discussed. The third section presents a contribution by P. Brito and A.P. Duarte Silva on parametric models for interval data for the discovery of patterns and trends, with an application to the Portuguese Labour Force Survey. In the fourth section, provided by S. Korenjak-Černe and J. Dobša and based on the theoretical contributions of E. Diday, two examples show the applicability of concordance and discordance in two very distinct areas: one using data from the international measurement of reading achievement, and the other concerning the representation of textual data for automatic classification. Finally, the Conclusion section summarises the presented approaches, highlighting their main contributions.
Motivation
In our program Clamix [6] for clustering symbolic data, they are represented by discrete distributions
– fixed space is required for the description of a unit/cluster;
– the description of a union of two disjoint clusters can be obtained from their descriptions.
In this paper, we will elaborate on the second observation.
For example, let us consider the population pyramids of the world’s countries. How should we join the population pyramids of China and Vanuatu (see Fig. 1)?
When comparing two countries
In the analysis of large data sets, aggregation is a standard way of reducing the size (complexity) of the data. Several books dealing with the theoretical and algorithmic background of traditional aggregation (replacing the values of a variable over a group by a single value) have recently been published [7, 8, 9, 10, 11].
Data analysis programs provide aggregation functions such as means (arithmetic, geometric, harmonic), median, mode, min, max, product, bounded sum, counting, etc. [12]. Special care has to be given to variables measured on different measurement scales.
In theoretical discussion the traditional aggregation functions are usually “normalized” to the interval
Besides determining a representative value for a group of measurements, traditional aggregation functions are mainly used to combine partial criteria into a single criterion (multicriteria optimization and decision-making) or to express the membership degree in combined fuzzy sets.
A problem with traditional aggregation is that often too much information is discarded, thus reducing the precision of the obtained results.
A much better summarization of the original data, preserving more information, can be achieved by representing aggregated data using selected types of complex data such as symbolic objects [2, 13], compositions [14], functional data [15], etc. In the Symbolic Data Analysis (SDA) framework, much work is devoted to the summarization process, for example, the function classic.to.sym in RSDA [16], and the SODAS or SYR software.
Mergeable summaries
In complex data analysis the measured values over a selected subset of units
An interesting question is, which complex data types are compatible with the merging of disjoint sets of units
When selecting a name for this kind of summary, we were inclined towards the term hierarchical or mergeable summary. A web search showed that the term mergeable summary had already been proposed and elaborated in [17]. Mergeable summaries enable parallelization in big data algorithms and stream processing. Summarization in big data is not deterministic and allows some errors. A summary is mergeable if the error and space (size of the summary) do not increase after the merge.
In this paper, we will discuss exactly mergeable summaries “without errors”.
A summary
We can consider merging as a partially defined binary operation
Simple examples
We assume that a numerical variable
Let
It is easy to check that the following summaries are exactly mergeable:
The distribution of values of variable
Then the distribution of additional values of variable
where
This result can be extended to higher moments.
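As a minimal sketch of this idea, assume that the summary of a set of units stores the count, the sum, and the sum of squares of the variable. The class name `MomentSummary` and its `merge` method are illustrative and are not taken from the paper. Under that assumption, the mean and variance of a union of two disjoint sets can be recovered from the two summaries alone:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MomentSummary:
    """Hypothetical summary of a numerical variable over a set of units."""
    n: int        # number of units
    s1: float     # sum of values
    s2: float     # sum of squared values

    @classmethod
    def from_values(cls, values):
        values = list(values)
        return cls(len(values), sum(values), sum(v * v for v in values))

    def merge(self, other):
        # Valid for disjoint sets of units: every component is additive.
        return MomentSummary(self.n + other.n, self.s1 + other.s1, self.s2 + other.s2)

    @property
    def mean(self):
        return self.s1 / self.n

    @property
    def variance(self):
        # Population variance: E[x^2] - (E[x])^2.
        return self.s2 / self.n - self.mean ** 2


a = MomentSummary.from_values([1.0, 2.0, 3.0])
b = MomentSummary.from_values([10.0, 20.0])
merged = a.merge(b)
direct = MomentSummary.from_values([1.0, 2.0, 3.0, 10.0, 20.0])
```

Here `merged` and `direct` hold the same components, so no access to the original values is needed after summarization. Storing sums of higher powers would follow the same pattern; a numerically careful version would probably store centred moments instead of raw sums of squares.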
Set membership count
Counting the number of units from
is an exactly mergeable summary.
Combining exactly mergeable summaries
Let
is an exactly mergeable summary.
Since min and max are mergeable summaries also their composition – the interval summary of the variable
is an exactly mergeable summary. Let
Let
is called a bar chart.
Let
is called a histogram.
A histogram (and also a bar chart) is essentially a frequency distribution
Therefore, since set membership counts are exactly mergeable, the bar charts and histograms are exactly mergeable summaries.
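The following sketch illustrates the merging of interval summaries and bar charts for two disjoint sets of units. The function names and the toy data are hypothetical; a bar chart is represented here simply as a `collections.Counter` of category frequencies.

```python
from collections import Counter


def interval_summary(values):
    """Interval summary (min, max) of a numerical variable."""
    values = list(values)
    return (min(values), max(values))


def merge_intervals(a, b):
    # The min of the union is the min of the minima; likewise for max.
    return (min(a[0], b[0]), max(a[1], b[1]))


def bar_chart(categories):
    """Frequency of each category over a set of units."""
    return Counter(categories)


def merge_bar_charts(a, b):
    # Counts over disjoint sets of units simply add up.
    return a + b


# Hypothetical toy data for two disjoint groups of units.
group_a = ["basic", "secondary", "basic"]
group_b = ["higher", "basic"]
merged_chart = merge_bar_charts(bar_chart(group_a), bar_chart(group_b))

ages_a, ages_b = [17, 22, 24], [31, 44]
merged_interval = merge_intervals(interval_summary(ages_a), interval_summary(ages_b))
```

A histogram can be merged in the same way, provided that both summaries use the same bin boundaries.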
Proving that a summary is not exactly mergeable
If for a summary
– a contradiction.
Note that also
Conclusions
In measurement theory [18, 19] measurement scales are divided into absolute, ratio, interval, ordinal, and nominal. The corresponding “best representatives” are count, geometric mean, average (arithmetic mean), median, and mode. The count is an exactly mergeable summary (2.4.1.1). So are the geometric mean (2.4.1.7) and the average (2.4.1.5), provided that we keep also the size of the corresponding set of units.
Median and mode are not exactly mergeable. A good exactly mergeable alternative is to use the corresponding frequency distribution (histogram or bar chart). In the case of a large number of categories, less frequent categories can be combined into a common category. By using the frequency distribution also for aggregation of numerical (ratio and interval) variables, we get a uniform representation for all types of variables.
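A small counterexample makes the first statement concrete. In the sketch below (toy values chosen for illustration only), two sets have the same median, yet their unions with the same third set have different medians, so the median of a union cannot be a function of the medians alone. The frequency distribution, by contrast, merges exactly and still allows the median to be recovered.

```python
from collections import Counter
from statistics import median

a1 = [1, 2, 3]      # median 2
a2 = [0, 2, 10]     # median 2 as well
b = [4, 5, 6]       # a disjoint set of units, median 5

# Same pair of input medians (2 and 5), different medians of the union:
median_union_1 = median(a1 + b)   # values 1, 2, 3, 4, 5, 6
median_union_2 = median(a2 + b)   # values 0, 2, 4, 5, 6, 10


def median_from_counts(counts):
    """Median recovered from a frequency distribution (value -> count)."""
    return median(sorted(counts.elements()))


# Frequency distributions over disjoint sets add up exactly.
merged_counts = Counter(a2) + Counter(b)
```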
Discovering patterns and trends with interval-valued data P. Brito, A.P. Duarte Silva
Context
Array of interval-valued data
This study concerns the Portuguese Labour Force Survey (LFS), analysing data from the 1st trimester of 2008 and the 4th trimester of 2010. We only consider people who were unemployed at the time of the survey (had no job and were looking for one), and focus on the Activity Time (AT, in years) and the Unemployment Time (UT, in months). Disregarding records with missing values, and keeping only those from mainland Portugal (i.e. excluding Madeira and Azores), we end up with 1150 observations in 2008 and 1569 in 2010.
These micro-data were then aggregated, in each case, on the basis of Gender (Mas, Fem), Region (North, Centre, Lisbon and Tagus Valley (LTV), South), Age-Group (Young: 15–24, Prime: 25–44, Mature: 45 and above) and Education (Basic or less, Secondary, Higher). This leads to 58 sociological groups in 2008 (T1) and 68 in 2010 (T4), as some of the 72 possible combinations do not occur; these groups constitute the statistical units to be analysed.
We note that although the individuals at micro data level are not the same in 2008 and 2010, the aggregate units formed correspond to the same sub-populations – e.g., Young Women, from the North, with Secondary Education – and data are hence comparable at aggregate level.
The objective of this study is to cluster the aggregate units in each year and to compare the obtained partitions, trying to gain insights into the dynamics between 2008 and 2010. For this purpose, we rely on the parametric model for interval-valued variables proposed in [20] and the model-based clustering methodology developed in [21]. Data aggregation as well as all analyses are done with the R package MAINT.Data [22, 23].
Let
where
Let
In our case, for each aggregate unit, in each year, the minimum and maximum values of the Activity Time (AT) and of the Unemployment Time (UT) were recorded. As a result, each group is described by two intervals that represent the range of variation of the Activity Time and the Unemployment Time within the group in the corresponding year. Table 2 displays some rows of the 2008 and 2010 data arrays.
Data in 2008 (left) and 2010 (right), partial views
Comparing the interval data arrays for 2008 and 2010, we observe that in several cases the UT interval became much wider over the two years, with the maximum value of UT showing a large increase. This is especially the case for groups with higher education levels, and is less frequent and less clear in groups with basic education. Examples of such cases are the Mature Women from the South with Secondary education (Fem-South-Mat-Sec), for whom the UT interval went from
Interval data for 2008 (left) and 2010 (right), groups with superior education (top), secondary education (center), and basic education (bottom).
Figure 2 displays the interval-valued data separately for the two years under analysis and the three education levels. We note that in 2008 the UT intervals (along the horizontal axis) differ considerably across education levels, getting larger as education level decreases. However, these intervals are much wider in 2010 than in 2008 for groups with superior education (upper figures), to some extent for groups with secondary education (middle figures), but not so much for groups with basic education (lower figures). This effect reduces the differences in the UT intervals across education levels observed in 2008. In both years, the Activity Time variability increases as education level decreases (intervals along the vertical axis get wider). However, we do not observe remarkable changes in AT from 2008 to 2010 for any of the three education levels.
The value of an interval-valued variable
The Gaussian model (see [20]) assumes a multivariate Normal distribution for the MidPoints
We note that the model does not allow considering observations with degenerate intervals, where the range is null.
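The sketch below illustrates the MidPoint and Log-Range representation and why degenerate intervals must be excluded. It assumes that the MidPoint is the centre of the interval and that the Log-Range is the natural logarithm of its length; the function names are hypothetical and do not correspond to the MAINT.Data interface.

```python
import math


def midpoint_log_range(lower, upper):
    """Map an interval [lower, upper] to (MidPoint, Log-Range)."""
    if upper <= lower:
        # Degenerate (null range) intervals have no finite Log-Range.
        raise ValueError("degenerate or invalid interval")
    return ((lower + upper) / 2.0, math.log(upper - lower))


def to_interval(mid, log_range):
    """Inverse transform back to the interval bounds."""
    half = math.exp(log_range) / 2.0
    return (mid - half, mid + half)


# Hypothetical unit: Unemployment Time ranging from 2 to 10 months.
mid, log_r = midpoint_log_range(2.0, 10.0)
```

Under this parametrisation any real pair (MidPoint, Log-Range) maps back to a valid interval with a strictly positive range, which is what makes an unconstrained Normal model usable.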
This model allows for the application of classical inference methods; however one should keep in mind that the MidPoint
– Non-restricted configuration: allowing for non-zero correlations among all MidPoints and Log-Ranges;
– Interval-valued variables
– MidPoints (Log-Ranges) of different variables may be correlated, but no correlation between MidPoints and Log-Ranges is allowed:
– All MidPoints and Log-Ranges are uncorrelated, both among themselves and between each other:
From the Normality assumption it obviously follows that imposing non-correlations with Log-Ranges is equivalent to imposing non-correlations with Ranges. In cases
The mean vector and the variance-covariance matrix may be estimated by maximum likelihood. In the restricted configurations
Model-based Clustering considers the data as coming from a distribution that is a mixture of several components [24, 25, 26]. Each component is then associated with a cluster, characterized by a conditional density/mass function, and has a probability or “weight”. When the conditional probability is specified as the multivariate Gaussian, the model will be a finite mixture of multivariate Normals, known as the Gaussian mixture model.
The model parameters for each component, and the membership (posterior) probabilities of each unit, must be estimated; this is commonly accomplished by the Expectation-Maximisation (EM) algorithm [27]. This algorithm alternates an expectation (E) step, where the expectation of the log-likelihood at the current parameter estimates is computed, and a maximisation (M) step, where parameters are estimated by maximising the expected log-likelihood found in the E step.
Model-based Clustering of interval-valued data has been developed in [21], considering the Gaussian model described above (see Section 3.3), where the EM algorithm has been suitably adapted to the likelihood maximisation for the different covariance configurations.
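To make the alternation of the two steps concrete, the following is a minimal sketch of EM for a univariate Gaussian mixture on toy data. It is a simplification for illustration only: the methodology of [21] works with multivariate Normals on MidPoints and Log-Ranges under the covariance configurations described above, and the function names, the starting values, and the variance floor used here are assumptions.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(data, means, variances, weights, n_iter=50):
    """EM for a univariate Gaussian mixture; returns parameters and posteriors."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    post = []
    for _ in range(n_iter):
        # E step: posterior membership probabilities at the current estimates.
        post = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            post.append([d / total for d in dens])
        # M step: weighted maximum likelihood updates.
        for j in range(k):
            nj = sum(p[j] for p in post)
            weights[j] = nj / len(data)
            means[j] = sum(p[j] * x for p, x in zip(post, data)) / nj
            variances[j] = max(
                sum(p[j] * (x - means[j]) ** 2 for p, x in zip(post, data)) / nj,
                1e-6,  # floor to avoid a collapsing component
            )
    return means, variances, weights, post


# Hypothetical toy data: two well-separated groups.
data = [0.9, 1.0, 1.1, 1.2, 0.8, 5.0, 5.1, 4.9, 5.2, 4.8]
means, variances, weights, post = em_gaussian_mixture(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
# Assign each unit to the component with the highest posterior probability.
labels = [max(range(2), key=lambda j: p[j]) for p in post]
```

Each unit is assigned to the component with the highest posterior probability, which is how a partition is read off a fitted mixture.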
Best partition in 2008
BIC values for 2008 (left) and 2010 (right).
The finite mixture model with
where all weights
with
In Model-based Clustering of interval data,
Mean values by component in 2008
The covariance configuration and the number of components
The method presented above was applied to the LFS data described in Section 3.1, separately for 2008 and 2010. However, we had to disregard six units in 2008 and two in 2010, since they presented a degenerate interval in at least one of the two variables. The units removed in 2008 were: F-Center-Young-Sup, F-LTV-Young-Sup, F-North-Young-Sup, F-South-Mature-Sup, M-Center-Prime-Sec, M-Center-Young-Sup, and in 2010: F-Center-Young-Sec and M-North-Young-Sup.
Figure 3 shows the BIC values for
We notice that the lowest BIC value is attained for configuration
The best partition obtained for 2008 is displayed in Table 3, and Table 4 gathers the mean values per component for the four indicators. Figure 4 displays the parallel coordinate plot for this partition, providing some insights for its characterisation.
Parallel coordinate plot for 2008, best model.
Component 5 comprises Mature units with Basic education, and is characterised by high Activity and Unemployment Times, both with high variability (as conveyed by the log-ranges).
Component 1 is also composed of Mature units, now mostly with some education, with high Activity Times and lower Unemployment Times, both with not so high variability.
Component 3 is formed by Young and a few Prime units; it is characterised by low Activity and Unemployment Times, both with low variability.
Component 4 is mostly composed of Prime units with some education, and a few Young units. Its characterisation is similar to that of Component 3, but less pronounced; however, Unemployment Time shows higher variability.
Finally, Component 2 comprises Prime units mostly with Basic education; it shows intermediate Activity and Unemployment Times, both with relatively high variability.
Table 5 describes the best partition obtained for 2010, Table 6 provides the corresponding mean values, and Fig. 5 displays the parallel coordinate plot.
Best partition in 2010
Parallel coordinate plot for 2010, best model.
Component 4 gathers Young units and is characterised by low Activity and Unemployment Times, both with low variability. It roughly corresponds to Component 3 of 2008, although now with no Prime units. In comparison, and as expected, Activity Time is on average lower and with lower variability. However, Unemployment Time is now on average higher, and with higher variability, which may be a consequence of the European sovereign debt crisis.
Component 1 is essentially formed by Mature units, irrespective of education level, coming from CP1 and CP5 of the 2008 partition. Activity and Unemployment Times are high, as expected, both with relatively high variability.
Component 2 is mostly composed of Prime units, with a few Mature. It is characterised by the high variability of both Activity and Unemployment Times, and by high Unemployment Time.
Finally, Component 3, similarly to Component 4 of 2008, gathers mainly Prime and some Young units, with intermediate Activity and Unemployment Times, and respective variabilities.
We note that, unlike what happened in 2008, in 2010 education seems not to play a role in the formation of clusters. In particular, in 2008 there is a clear separation of Mature groups in two clusters, mainly distinguished in terms of education, while in the 2010 partition most Mature are grouped in one single cluster.
As a final remark, we observe that, although the individuals at micro data level are not the same in 2008 and 2010, the aggregate units formed correspond to the same sub-populations, allowing for a comparative analysis of the resulting partitions.
In both years, given the two variables used, age seems to be a driving force in the formation of clusters; in 2008 education also appears to play a role, but not so much in 2010.
It is noteworthy that, although some cluster correspondences may be identified, the partition is essentially not stable from 2008 to 2010: the sociological units gather in a different way. This may be the result of the change in the Portuguese labour market created by the economic crisis.
A “similarity” or a “concordance” in data analysis is a mathematical model of the corresponding word, “similarity” or “concordance”, as used in natural language. A similarity measure quantifies the similarity between two objects and is symmetric, whereas a concordance measures the similarity between an object and a collection of objects and is therefore not symmetric. Thus, similarity and concordance express two different kinds of knowledge.
Mean values by component in 2010
The concordance measure based on symbolic data description was introduced by Diday in 2020 [29]. It is named s-concordance with the prefix “s” because it is defined for symbolic data where the objects represent aggregations of individuals, i.e., classes, so the definition falls within the framework of symbolic data analysis (SDA) [29]. A class has high concordance with a given collection of classes for a category
The definition of s-concordance is based on two functions:
An axiomatic definition of s-concordance was given by Diday ([29], [30]) where he presented examples of concordance and related discordance measures. In our illustrative examples, we have focused on three of them:
Note that none of the pairs of these concordance and discordance measures are complementary opposites.
Two very different examples of the use of these measures are given below.
In the first example, we are interested in the concordances and discordances of countries based on the proportion of students who scored at or above a high (high or advanced) level on the traditional paper reading assessment in the Progress in International Reading Literacy Study (PIRLS [31]). The PIRLS is an international assessment and research project designed to measure the reading achievement of fourth graders and the instructional practices of schools and teachers. The reading achievement scale is derived from several variables that measure the quality of reading. More detailed information can be found on the website of the cited reference (PIRLS 2016 User Guide, Chapter 4, p. 63). We focus our study on the 2016 survey data from fifty participating countries.
In this case, individuals are students and classes are countries. The collection
Distribution of countries by proportion of students with high or advanced levels of traditional reading achievement in the PIRLS 2016 survey
As can be seen from Table 7, almost half of the countries have more than
Positioning of the countries based on their values of the functions 
The second chosen s-concordant measure
There are seven countries where the proportion of well-skilled students is above
Collection of textual documents
In the second example, we explore the use of the discordance measure as an alternative to the well-known Tf-Idf (term frequency - inverse document frequency) measure in Text Mining. In the vector space model, a collection of documents is represented by a term-document matrix in which the columns correspond to documents and the rows to the index terms used to index the collection. The basic idea of Tf-Idf (see [32]) is to characterize a category (here presence of the term
the proportion of documents within the class containing that term is high and; the classes of the given partition
There are several variants of the Tf-Idf measure. We will use its basic form, which is for the term
where
Note that the definition of Tf-Idf does not take into account the differences between classes due to the number of documents in which the term occurs. In the definition of the s-discordance, however, these differences are included in the function
We illustrate the difference between these measures in the collection of 15 documents (book titles) from the field of data mining (DM documents), linear algebra (LA documents), and one document combining these two fields (Table 8) [33].
Values of Tf-Idf and s-discordance measures of index terms for classes DM (data mining documents) and LA (linear algebra documents).
The list of index terms consists of the terms that appear in at least two documents, after removing so-called stop words (words commonly used in a language) and mapping word variants to their base form. In this way, a list of 16 index terms was created.
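The selection of index terms and the construction of the term-document matrix can be sketched as follows. The titles and the stop-word list are hypothetical and are not the documents of Table 8, and the mapping of word variants to their base form is omitted for brevity.

```python
from collections import Counter

# Hypothetical stop-word list and titles, for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

titles = [
    "Introduction to Data Mining",
    "Data Mining and Matrix Methods",
    "Matrix Analysis",
    "Linear Algebra and Matrix Theory",
    "Linear Algebra for Data Analysis",
]


def tokenize(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def index_terms(docs, min_docs=2):
    """Terms occurring in at least `min_docs` documents, sorted alphabetically."""
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return sorted(t for t, df in doc_freq.items() if df >= min_docs)


def term_document_matrix(docs, terms):
    """Rows are index terms, columns are documents; entries are term counts."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    return [[c[t] for c in counts] for t in terms]


terms = index_terms(titles)
matrix = term_document_matrix(titles, terms)
```

Class-level quantities, such as the proportion of documents of a class that contain a given term, can then be computed from the rows of this matrix.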
The values of the function
To distinguish relevant terms for classes based on defined measures of s-discordance and Tf-Idf we define
We have shown two examples of possible applications of the new measures s-concordance and s-discordance from different contexts.
In the first example, these measures were used to measure the concordance and discordance of a single country with the collection of all countries. We used data from a PIRLS 2016 survey where we focused on at least a high level of traditional paper reading assessment. Measures of s-concordance and s-discordance are used to compare the country to the collection of all countries. Since in the definitions of s-concordance and s-discordance we use the distribution of classes included with the function
In the second example, we used the measure of s-discordance to identify relevant terms that are characteristic of a particular class of documents. We compared our results with those obtained using the Tf-Idf measure. In our case, the s-discordance measure identified more relevant terms for classes.
The Tf-Idf measure detects a term as relevant for a class if that term occurs only in that class, while the s-discordance measure detects a term as relevant for a class if the frequency of its occurrence in that class is greater than in other classes. Because of this property, the s-discordance measure could be used to extend the weighting of terms when representing a document in a vector space model, in order to improve classification performance. It could also be used in sentiment analysis to build a lexicon automatically, by calculating the s-discordance of index terms for classes of documents with positive and negative sentiment.
Conclusion
In conclusion, this paper draws attention to Symbolic Data Analysis (SDA) in the field of Official Statistics, with a number of examples in the various sections that corroborate the potential of the proposed methods. SDA has been a pioneering line of research in the treatment of unconventional data, i.e. data in aggregated form or data that are complex by their very nature. Nowadays, we refer to the latter as data with a greater degree of granularity. The interest behind SDA is to be able to extend statistical techniques and the analysis of basic data to data representing classes of individuals. These are typical data from Official Statistics which, for reasons of confidentiality, synthesis and relative classification, are appropriately expressed in ranges of values and/or in divisions linked to spatial classifications. The several SDA pilot contributions presented in this paper highlight the application value and the prospects that can be opened up for new developments in the extension of analysis techniques to Big Data. These require, as shown, summarization methods and appropriate data aggregation techniques that can be provided through symbolic data modelling. Each section emphasises the importance of a contribution.
Section 2 presents new aggregation methods based on the notion of exactly mergeable summaries. This property is generally not invoked in data reduction techniques and in the choice of appropriate summaries and representations. For this reason, it makes an original contribution to the creation of a methodological framework for the aggregation of data in the form of intervals, bar charts, and histograms. Particularly interesting is the extension of the concept of mergeable summaries to moments and distributions by adding the size of the set of units. Finally, this first contribution shows how the concept of mergeability can be extended to already synthesised or higher-level data.
The second contribution presented in this paper, in Section 3, focuses on an innovative approach to identify patterns and trends by analysing interval data, with an application to data from the Portuguese Labour Force Survey. This work shows how the representation of data in the form of intervals allows summarising well the information expressed by aggregated numerical data when there are no assumptions about the distribution within the given intervals of values. However, the strength of the contribution is to consider the variability referring to the entire range of values between the minimum and maximum, rather than just the central value as the centre of the distribution of values within the range. The work presented in Section 3 focuses on a parametric model-based clustering approach for aggregate interval-valued data. The application strength of the proposed methodology is demonstrated on data from the Portuguese Labour Force Survey – a typical study provided by NSI’s – for which it is proposed to cluster sociological units described by the corresponding range of Activity Time and Unemployment Time. Although the individuals at the micro-data level are not the same in 2008 and 2010, the aggregate units formed correspond to the same sub-populations, allowing for a comparative analysis of the resulting partitions. The proposed methodology also made it possible to assess the major influence of the analysis variables, among which age plays a more prevalent role than education.
The third contribution refers to one of the latest works that Edwin Diday was developing with the section co-authors on s-concordance and s-discordance measures. These are intended to express the degree of similarity between a class of individuals and a collection of classes based on the frequency of the categories used to describe them. Two applications highlight the value of the proposed measures and their applicative contribution. The first concerns data from the international measurement of reading achievement among young students in the Progress in International Reading Literacy Study (PIRLS) 2016 survey in fifty participating countries. The proposed measures are used to assess the concordance and discordance of a single country with the collection of all countries. A few countries with the highest percentage of students who are well skilled in reading are identified as the countries with the highest s-discordance values, as they are very different from most other countries in this respect. The second application, on a collection of textual documents, highlights the effectiveness of using the discordance measure to identify relevant terms that are characteristic of a particular class of documents, as an alternative to the usual term frequency - inverse document frequency (Tf-Idf) measure in Text Mining. As the main result of this application, the novel measure recognized relevant terms more successfully than Tf-Idf.
SDA is still a flourishing and exciting field of research, with obvious potential in the context of Official Statistics, both for analysing aggregate data and for providing tools for constructing composite indicators that take into account the distribution of observed phenomena over time and space. We are therefore confident that attention may be focused on the ongoing and future developments of SDA. The field of Official Statistics remains particularly relevant to take advantage of the application of these techniques and to suggest new approaches. A collaboration with researchers from the NSI’s is desirable to demonstrate the explanatory power of the SDA approach, also compared with modern Machine Learning techniques that often do not guarantee equal interpretative ability.
Acknowledgments
Special acknowledgments go to Edwin Diday who, with his passion and focus on the research that has crowned his entire life, honored the NTTS2023 conference session and the event as a whole with his presence.
Section 2 by V. Batagelj is an elaboration of ideas presented at the 7th Workshop on Symbolic Data Analysis, SDA 2018, held in Viana do Castelo, Portugal, 18–20 October 2018. It was presented at NTTS2023 – Conference on New Techniques and Technologies for Statistics, 6–10 March 2023 (Brussels, Belgium). This work is supported in part by the Slovenian Research Agency (research program P1-0294, research program CogniCom (0013103) at the University of Primorska, and research projects J5-2557, J1-2481, and J5-4596), and prepared within the framework of the COST action CA21163 (HiTEc).
The work of P. Brito and A.P. Duarte Silva (presented in Section 3) was supported by National Funds through the Portuguese funding agency, FCT – Fundação para a Ciência e a Tecnologia, within projects LA/P/0063/2020, DOI 10.54499/LA/P/0063/2020; and UID/GES/00731/2019.
Section 4 by S. Korenjak-Černe, J. Dobša, and E. Diday is based on the work “An illustration of the use of the measures s-concordance and s-discordance in applications”, presented at the Conference on New Techniques and Technologies for Statistics NTTS2023, 6–10 March 2023, Brussels, Belgium. This work is supported in part by the Slovenian Research Agency (research program P1-0294).
