Abstract

Compositional data are data in which the relevant information is contained in the ratios between the values of the variables, called compositional parts. Typical examples are chemical concentrations in some sample medium or export figures from trade in various commodity groups. In the first example, an increase in the concentration of one element necessarily leads to a decrease in the concentrations of other elements, because the concentration is expressed in a relative unit (e.g., mg/kg). In the second example, the raw export numbers of countries of different size and GDP are not directly comparable. In both examples, the ratios between the variables contain the meaningful information, and analyzing these ratios is the task of compositional data analysis.
This special issue on ‘Compositional Data Modelling’ outlines ongoing research on selected topics of compositional data analysis. It can be considered an outcome of the Fifth International Workshop on Compositional Data Analysis (CoDaWork), which was held from 3 to 7 June 2013 in Vorau, a wonderful small village in the Austrian mountains, and for which the Guest Editors were among the main organizers. This workshop series intends to bring together specialists in the field as well as practitioners who are confronted with this kind of data.
This issue includes five articles, which we have ordered alphabetically by the first author’s name.
The first article, by Di Marzio et al., extends parametric regression for compositional data to the nonparametric case. The authors cover all three situations: the response, the predictors, or both are compositions. Local constant and local linear smoothing are employed, and all methods are formulated coherently in the Aitchison geometry on the simplex.
The second article, by Martín-Fernández et al., treats the important problem of count zeros. Generally, zeros in a compositional dataset are difficult to handle because the conventional log-ratio approach fails: the logarithm of a ratio involving a zero is undefined. By count zeros, the authors mean zeros in discrete compositional count data, which can result from insufficiently large samples. A Bayesian-multiplicative treatment is proposed, involving a Dirichlet prior distribution. Different zero replacements are then obtained by different parameterizations of the prior distribution.
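To make the idea of a Bayesian-multiplicative replacement concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single vector of counts, a Dirichlet prior written as a prior strength times a vector of prior proportions, and the usual multiplicative adjustment of the non-zero parts; the function name `bayes_multiplicative_replace` and its parameters `strength` and `prior` are hypothetical.

```python
import numpy as np

def bayes_multiplicative_replace(counts, strength=None, prior=None):
    """Replace count zeros in one count vector (illustrative sketch).

    counts   : non-negative counts for the D parts (total must be > 0)
    strength : assumed prior strength s; defaults to D, i.e. a Dirichlet
               prior with all parameters equal to 1
    prior    : assumed prior proportions t (summing to 1); defaults to 1/D
    """
    counts = np.asarray(counts, dtype=float)
    D = counts.size
    n = counts.sum()
    if prior is None:
        prior = np.full(D, 1.0 / D)
    if strength is None:
        strength = float(D)

    zero = counts == 0
    # Posterior mean of a part with zero count under Dirichlet(s * t).
    replacement = strength * prior / (n + strength)
    # Observed proportions of the non-zero parts are shrunk multiplicatively,
    # so their mutual ratios are preserved and the result sums to 1.
    proportions = counts / n
    shrink = 1.0 - replacement[zero].sum()
    return np.where(zero, replacement, proportions * shrink)

x = bayes_multiplicative_replace([10, 0, 5, 0, 5])
print(x, x.sum())
```

Changing `strength` and `prior` in this sketch corresponds to choosing a different parameterization of the prior, which in turn yields a different replacement value for the zeros.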
The third article, by Mert et al., is devoted to high-dimensional compositions as they appear in bioinformatics or chemometrics. The aim is a dimension reduction in the spirit of principal component analysis, but the components are constructed as so-called balances, which represent coordinates with respect to an orthonormal basis in the simplex. The ultimate goal is to reduce the number of variables involved in the construction of the balances, which is why the authors use the concept of sparseness.
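As an illustration of what a single balance is, the sketch below computes the standard balance between two disjoint groups of parts: a normalized log-ratio of their geometric means. It does not reproduce the sparse construction of Mert et al.; the function name `balance` and the index arguments are hypothetical and chosen only for this example.

```python
import numpy as np

def balance(x, num_idx, den_idx):
    """Balance between two disjoint groups of strictly positive parts.

    x       : composition (any positive scale; closure is not required)
    num_idx : indices of the parts in the numerator group
    den_idx : indices of the parts in the denominator group
    """
    x = np.asarray(x, dtype=float)
    r, s = len(num_idx), len(den_idx)
    # The mean of logs is the log of the geometric mean of each group.
    log_gm_num = np.log(x[num_idx]).mean()
    log_gm_den = np.log(x[den_idx]).mean()
    # The normalizing constant gives the coordinate unit length
    # in the Aitchison geometry.
    return np.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)

x = np.array([0.5, 0.2, 0.2, 0.1])
b = balance(x, [0, 1], [2])          # uses only three of the four parts
b_scaled = balance(10 * x, [0, 1], [2])
print(b, b_scaled)                    # rescaling leaves the balance unchanged
```

A balance that involves only a few parts, as in this example, is the kind of component a sparse construction favours, because it can be interpreted in terms of a small number of variables.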
The fourth article, by Pawlowsky-Glahn et al., describes how the total, i.e., the abundance, mass or amount, can be considered in addition to the relative information. The usual approaches to dealing with this problem are (i) to take the logarithms of the compositions and (ii) to normalize the compositions to a constant sum and consider the total sum as an additional component. The authors study these different procedures from a mathematical point of view.
The final article, by van den Boogaart et al., is devoted to regression with a compositional response, where missing values or values below a detection limit may occur. Usually, some form of imputation is applied first so that the regression coefficients can be estimated. Here, the authors instead employ techniques such as Gibbs sampling and the Metropolis–Hastings algorithm.
We would like to thank all the authors for their work, as well as all referees for sending their reviews in time. Furthermore, we thank Herwig Friedl, Brian Marx and Jeffrey Simonoff for inviting us to coordinate this collective effort.
