A Bayesian approach for the segmentation of series with a functional effect

Abstract

In some application fields, series are affected by two different types of effects: abrupt changes (or change-points) and functional effects. We propose here a Bayesian approach that allows us to estimate these two parts. The underlying piecewise-constant part (associated with the abrupt changes) is expressed as the product of a lower triangular matrix by a sparse vector, and the functional part as a linear combination of functions from a large dictionary, from which we want to select the relevant ones. The problem thus reduces to a global sparse estimation problem, for which a stochastic search variable selection approach is used. The performance of the proposed method is assessed using simulation experiments. Applications to three real datasets from geodesy, agronomy and economics are also presented.

Keywords

Segmentation; functional effect; dictionary approach; Bayesian inference; variable selection

1 Introduction

The motivation for this work initially comes from the change-point detection framework. Indeed, change-point detection problems arise in many fields such as biology (Boys and Henderson 2004), geodesy (Williams 2003; Bertin et al. 2017), meteorology (Caussinus and Mestre 2004; Fearnhead 2006; Wyse et al. 2011; Ruggieri 2013) or astronomy (Dobigeon et al. 2007), for example. The statistical purpose is therefore the detection of these change-points. However, some additional effects have been observed in some series from these fields. For example, in genomics, for comparative genomic hybridization (CGH) data, the change-points are associated with chromosomal aberrations and the studied series can also contain technical artefacts like ‘waves’ (see Picard et al. 2011 and references therein for more details). In geodesy, GPS coordinate series are used to determine accurate station velocities in tectonic studies. These series are affected by two different types of biases: abrupt changes due to equipment changes or earthquakes (change-points) and environmental effects such as soil moisture or atmospheric pressure changes, which appear in these series as periodic signals (Dong et al. 2002). An example of such a series is given in Figure 1, where the vertical dotted lines correspond to known changes (antenna changes, clock changes, receiver changes, etc.). The correct and separate estimation of these two effects (the change-points and the ‘additional’ effect) is crucial. Both problems have been considered in a frequentist inference framework, by Picard et al. (2011) for CGH data and by Bertin et al. (2017) for GPS series. They propose to consider the ‘additional’ effect as a functional effect and to estimate it using splines, wavelets or a dictionary approach. This latter approach was motivated by the fact that for the GPS application, contrary to the CGH one, the interpretation of the functional part in terms of known functions is important.

Figure 1:

Example of a GPS coordinate series (series YAR2 studied in this article). The known changes reported in databases (Altamimi et al. 2011) are indicated by the vertical dotted lines

In this article, we consider the problem in a Bayesian framework and propose a novel approach that allows us to estimate both the segmentation part (the change-points and the means over the segments) and the functional part. The Bayesian approach has the advantage that expert knowledge can be introduced through prior distributions, for example, the known changes in the GPS application. Moreover, posterior distributions allow the uncertainty to be quantified, giving in particular posterior probabilities or credible intervals for the change-point locations. This is of particular interest for practitioners.

Segmentation part: Several methods have been proposed in a Bayesian framework for multiple change-point detection problems. One of the first Bayesian methods, proposed by Barry and Hartigan (1993), used recursive techniques based on filtering distributions, and methods using the same approach were then developed (see Fearnhead 2006; Erdman and Emerson 2008; Wyse et al. 2011). Other methods applied Bayesian decision theory through the minimization of simple cost functions (Hannart and Naveau 2009), dynamic programming recursions (Ruggieri 2013) or non-parametric Bayesian approaches (Martínez and Mena 2014 and references therein). A great number of methods are based on Markov chain Monte Carlo (MCMC) algorithms. In particular, Boys and Henderson (2004) or Tai and Xing (2010) use reversible jump Markov chain Monte Carlo (RJMCMC) algorithms, based on the fact that the dimension of the model changes with the number of change-points. However, these reversible jump algorithms converge slowly and good mixing of the chains is often difficult to achieve. To circumvent this problem, some authors proposed to parametrize the change-points through latent binary variables (see for instance Lavielle and Lebarbier 2001; Dobigeon et al. 2007; Harlé et al. 2016).

Here, following the work of Harchaoui and Lévy-Leduc (2010) in a frequentist framework, we propose to consider the change-point detection problem as a variable selection issue in a regression framework, which, to our knowledge, is new in the Bayesian framework. More precisely, the associated part in the model is expressed as a product of a lower triangular matrix by a sparse vector (with non-zero coordinates corresponding to change-point positions).

Functional part: To estimate the functional part of the model, we follow Bertin et al. (2017), and we consider a dictionary approach (Bickel et al. 2009) where the functional part is represented by a linear combination of functions composing the dictionary. This dictionary contains a large variety of functions, such as wavelets, splines or Fourier bases, which leads to a more efficient approach than separate strategies to capture different effects of the functional part of the series. The large size of the dictionary then leads to a selection issue: we seek a sparse representation of the functional part in terms of these functions.

The new modelling of the change-point problem and the way we propose to determine the functional effect lead to a unified estimation strategy in terms of variable selection. Many efficient methods have been developed for Bayesian variable selection in a regression framework, in particular, the well-known stochastic search variable selection (SSVS) approach (George and McCulloch 1993; George and McCulloch 1997) that we consider in this article.

The remainder of the article is organized as follows. Section 2 presents the hierarchical Bayesian model considered, and Section 3 outlines the procedure used to estimate the model parameters. In Section 4, the performance of the proposed method is studied through simulations. The advantages of the Bayesian approach are also illustrated in three real data studies in the fields of geodesy, agriculture and economics. Finally, in Section 6, we discuss and summarize the results and benefits of the procedure.

2 Model

2.1 Segmentation model with functional part

We observe a series $Y = (Y_{1}, \dots, Y_{n})^{'}$ that satisfies

Y_{t} = μ_{k} + f (x_{t}) + ε_{t}, \forall t \in I_{k} = (τ_{k - 1}, τ_{k}], k \in {1, \dots, K},

(2.1)

where $K$, the total number of segments of the series, is unknown, the $ε_{t}$ are i.i.d. centred Gaussian variables with variance $σ^{2}$, $x_{t}$ is a covariate (the simplest one is the time $t$), $f$ is an unknown function to be estimated, $τ_{k} \in \{2, \dots, n - 1\}$ is the $k$th change-point and $μ_{k}$ is the mean of the series on the segment $I_{k}$. We assume that $τ_{0} < τ_{1} < τ_{2} < \dots < τ_{K}$ and use the convention $τ_{0} = 0$ and $τ_{K} = n$.

A classical approach in non-parametric frameworks is to expand the functional part $f$ with respect to an orthonormal basis, such as Fourier or wavelet ones (see Härdle et al. 1998 and references therein). Following Bickel et al. (2009) or Bertin et al. (2017), we choose here to adopt a dictionary approach that consists in finding an over-complete representation of $f$. More precisely, we expand $f$ with respect to a large family of functions $(ϕ_{j})_{j = 1, \dots, M}$, called a dictionary, which can, for example, be the union of two orthonormal bases. Then $f$ is assumed to be of the form

f (x) = \sum_{j = 1}^{M} λ_{j} ϕ_{j} (x),

where $λ = (λ_{1}, \dots, λ_{M})^{'} \in ℝ^{M}$ is a vector of coordinates of $f$ in the dictionary and

(f (x_{1}), \dots, f (x_{n}))^{'} = F λ,

where $F$ is the $n \times M$ matrix $F = (ϕ_{j} (x_{i}))_{i, j}$. Note that since large dictionaries are considered, this allows us to obtain a sparse representation of the function $f$, that is, the vector $λ$ is expected to have few non-zero coordinates.

To estimate the change-points in the series, we follow the strategy proposed by Harchaoui and Lévy-Leduc (2010), which consists in reframing this task in a variable selection context. We denote by $X$ , the $n \times n$ lower triangular matrix having only $1$ ’s on the diagonal and below it. We consider the $n \times 1$ vector $β$ with only $K$ non-zero coefficients at positions $(τ_{k} + 1)_{k = 0, \dots, K - 1}$ with $β_{τ_{k} + 1} = μ_{k + 1} - μ_{k}$ and using the convention $μ_{0} = 0$ . Note that the segmentation (the change-points $τ_{k}$ and the means $μ_{k}$ ) will be recovered by the vector $β$ .
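The reparametrization above can be illustrated with a short sketch. It is only an illustration under our own conventions: the function names `beta_from_segmentation` and `apply_X` and the toy values are hypothetical, and the code exploits the fact that multiplying by a lower triangular matrix of ones amounts to a cumulative sum.

```python
from itertools import accumulate


def beta_from_segmentation(n, taus, mus):
    """Build the sparse vector beta from change-points and segment means.

    taus: [tau_0, ..., tau_K] with tau_0 = 0 and tau_K = n.
    mus:  [mu_1, ..., mu_K], the segment means.
    """
    beta = [0.0] * n
    previous_mean = 0.0  # convention mu_0 = 0
    for tau, mu in zip(taus[:-1], mus):
        # The 1-based position tau + 1 is the 0-based index tau.
        beta[tau] = mu - previous_mean
        previous_mean = mu
    return beta


def apply_X(beta):
    """X is lower triangular with ones, so X beta is a cumulative sum."""
    return list(accumulate(beta))


# Toy example (hypothetical values): n = 10, change-points after t = 3 and t = 7.
taus = [0, 3, 7, 10]
mus = [2.0, 0.0, 3.0]
beta = beta_from_segmentation(10, taus, mus)
signal = apply_X(beta)
```

In this toy case `beta` has exactly $K = 3$ non-zero entries, and `signal` is the piecewise-constant mean, which shows how the change-points and the means can be read off from $β$.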

The model (2.1) can be rewritten as follows:

Y = X β + F λ + ε,

where $ε = (ε_{1}, \dots, ε_{n})^{'}$. Our objective is now to estimate the parameters $β$, $λ$ and $σ^{2}$. Since both $β$ and $λ$ vectors are expected to be sparse, we propose using Bayesian methods of variable selection for their estimation.

2.2 Bayesian hierarchical framework

Following George and McCulloch (1993), we first introduce latent variables $γ$ and $r$ to identify non-null components of the vectors $β$ and $λ$. The vector $γ = (γ_{1}, \dots, γ_{n})$ is such that $γ_{i} = I_{\{β_{i} \neq 0\}}$, where $I$ denotes the indicator function, and the vector $r = (r_{1}, \dots, r_{M})$ satisfies $r_{j} = I_{\{λ_{j} \neq 0\}}$. The unknown numbers of non-zero coordinates of $γ$ and $r$ are denoted by $d_{γ} = K$ and $d_{r}$, respectively. The product $X β$ is equal to $X_{γ} β_{γ}$, where $X_{γ}$ is the $n \times d_{γ}$ matrix containing only the columns $j$ of $X$ for which $γ_{j}$ is non-zero and $β_{γ}$ is a $d_{γ} \times 1$ vector containing only the non-zero coefficients of $β$. Similarly, we can express $F λ$ as $F_{r} λ_{r}$, where $F_{r}$ is an $n \times d_{r}$ matrix and $λ_{r}$ is a $d_{r} \times 1$ vector. The model (2.1) can then be rewritten as:

Y = X_{γ} β_{γ} + F_{r} λ_{r} + ε,

where the parameters to be estimated are $θ = \{β_{γ}, γ, λ_{r}, r, σ^{2}\}$.

Then, as usual in a Bayesian context, these parameters are treated as random variables, assumed here to be independent, and we consider the following prior distributions. The $γ_{i}$ are independent Bernoulli variables with parameter $0 \leq π_{i} \leq 1$ for $i = 2, \dots, n$ and with $π_{1} = 1$ by convention. The $r_{j}$ are also independent Bernoulli variables with parameter $0 \leq η_{j} \leq 1$ for $j = 1, \dots, M$. The noise variance follows the Jeffreys prior, $π (σ^{2}) \propto σ^{- 2}$. The conditional distribution of $β_{γ} | γ, σ^{2}$ is the classical $g$-prior of Zellner (1986) given by $β_{γ} | γ, σ^{2} \sim N_{d_{γ}} (0, c_{1} σ^{2} {(X_{γ}^{'} X_{γ})}^{- 1})$. Finally, the conditional distribution of $λ_{r} | r, σ^{2}$ is also a $g$-prior, with $λ_{r} | r, σ^{2} \sim N_{d_{r}} (0, c_{2} σ^{2} {(F_{r}^{'} F_{r})}^{- 1})$.

The posterior distribution of $θ$ has the following expression:

\begin{matrix} π (θ | Y) & = & \frac{π (Y | θ) π (β_{γ} | γ, σ^{2}) π (λ_{r} | r, σ^{2}) π (γ) π (r) π (σ^{2})}{π (Y)}, \end{matrix}

(2.2)

where

π (Y | θ) = {(\frac{1}{2 π σ^{2}})}^{\frac{n}{2}} exp (- \frac{1}{2 σ^{2}} {(Y - X_{γ} β_{γ} - F_{r} λ_{r})}^{'} (Y - X_{γ} β_{γ} - F_{r} λ_{r})) .

3 MCMC schemes

A classical approach for the computational scheme would be to estimate all of the parameters $(β_{γ}, γ, λ_{r}, r, σ^{2})$ at the same time using a Metropolis-within-Gibbs algorithm combined with the grouping (or blocking) technique of Liu (1994). Indeed, $β_{γ}$ and $γ$, as well as $λ_{r}$ and $r$, cannot be considered separately. An iteration of the algorithm would consist of three steps: update of $β_{γ}, γ ∣ λ_{r}, r, σ^{2}, Y$, update of $λ_{r}, r ∣ β_{γ}, γ, σ^{2}, Y$ and update of $σ^{2} ∣ β_{γ}, γ, λ_{r}, r, Y$. However, this algorithm has some drawbacks. First, the update rates for $(β_{γ}, γ)$ and $(λ_{r}, r)$ will be very low since it is difficult to make good proposals for both $β_{γ}$ and $γ$ or for both $λ_{r}$ and $r$. Second, as explained in Lavielle and Lebarbier (2001, Section 4), in a Bayesian segmentation framework, the posterior mean of $β_{γ}$, obtained from the posterior distribution of $(β_{γ}, γ)$, is of no interest. The interpretation of this mean is not obvious since it is calculated over all the possible configurations of change-points. A solution is to use the posterior distribution of $β_{γ}$ conditionally on $γ$. The same drawback applies to the functional part.

For these two reasons, we propose the following two-step strategy. The first step aims at detecting the positions of the change-points and at selecting the functions, that is, at estimating the latent vectors $γ$ and $r$. To this end, the parameters $β_{γ}$, $λ_{r}$ and $σ^{2}$ can be considered as nuisance parameters, and we use the joint posterior distribution integrated with respect to $β_{γ}$, $λ_{r}$ and $σ^{2}$. This can be viewed as a collapsing technique, see Liu (1994) and van Dyk and Park (2008). In the second step, we estimate $β_{γ}$, $λ_{r}$ and $σ^{2}$ conditionally on $γ$ and $r$. The MCMC scheme is then as follows:

Estimation of $γ$ and $r$ : use of a Metropolis–Hastings algorithm to draw from the joint posterior distribution $π (γ, r | Y)$ integrated with respect to $β_{γ}$ , $λ_{r}$ and $σ^{2}$ .

Estimation of $β_{γ}$, $λ_{r}$ and $σ^{2}$: given the estimates $\hat{γ}$ and $\hat{r}$, use of a Gibbs sampler algorithm.

In the following subsections, we give some details of both steps.

3.1 Metropolis–Hastings algorithm

The joint posterior distribution integrated with respect to $β_{γ}$ , $λ_{r}$ and $σ^{2}$ is the following (see details in 7):

π (γ, r | Y) \propto (1 + c_{1})^{- d_{γ} / 2} π (γ) π (r) g (γ, r, Y),

where

\begin{matrix} g (γ, r, Y) = & {(\frac{|{(F_{r}^{'} (U_{γ}^{- 1} + \frac{I}{c_{2}}) F_{r})}^{- 1}|}{| c_{2} (F_{r}^{'} F_{r})^{- 1} |})}^{1 / 2} \\ \times {[\frac{1}{2} Y^{'} (U_{γ}^{- 1} - U_{γ}^{- 1} F_{r} {(F_{r}^{'} (U_{γ}^{- 1} + \frac{I}{c_{2}}) F_{r})}^{- 1} F_{r}^{'} U_{γ}^{- 1}) Y]}^{- n / 2}, \end{matrix}

and

U_{γ} = {(I - \frac{c_{1}}{1 + c_{1}} X_{γ} {(X_{γ}^{'} X_{γ})}^{- 1} X_{γ}^{'})}^{- 1} .
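As a sanity check on these expressions, it may help to write out the special case without a functional part (the purely parametric segmentation model). The reduction below is our own reading of the formulas, obtained by dropping every factor that involves $F_{r}$ and by inverting the definition of $U_{γ}$; the symbol $P_{γ}$ is introduced here for convenience and is not part of the original notation.

```latex
% Special case: no functional part, so g reduces to a single quadratic form.
\begin{aligned}
U_{\gamma}^{-1} &= I - \frac{c_{1}}{1 + c_{1}} P_{\gamma},
\qquad P_{\gamma} = X_{\gamma} (X_{\gamma}' X_{\gamma})^{-1} X_{\gamma}', \\
\pi(\gamma \mid Y) &\propto (1 + c_{1})^{-d_{\gamma}/2} \, \pi(\gamma)
\left[ \tfrac{1}{2} \left( Y'Y - \frac{c_{1}}{1 + c_{1}} \, Y' P_{\gamma} Y \right) \right]^{-n/2}.
\end{aligned}
```

When $γ_{1} = 1$, the columns of $X_{γ}$ span the vectors that are constant on each segment, so $P_{γ} Y$ is the vector of segment-wise empirical means and $Y^{'} P_{γ} Y$ is the sum over segments of the squared segment sum divided by the segment length. This makes the collapsed posterior cheap to evaluate in the segmentation-only case.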

To sample from $π (γ, r | Y)$, a Metropolis–Hastings algorithm is used. At iteration $s$, a candidate $(γ^{*}, r^{*})$ is proposed from $(γ^{(s)}, r^{(s)})$, and using a symmetric transition kernel, the acceptance rate is:

ρ ((γ^{(s)}, r^{(s)}); (γ^{*}, r^{*})) = min {1, \frac{π (γ^{*}, r^{*} | Y)}{π (γ^{(s)}, r^{(s)} | Y)}} .

To have a symmetric transition kernel, two kinds of proposals are used (each one of probability 1/2): either $k$ components of $γ^{(s)}$ are randomly changed, or $l$ components of $r^{(s)}$ are randomly changed (a value of 0 is switched to 1, and conversely), where $k$ and $l$ are two fixed integers. We modify only one of the two latent vectors at each iteration since proposals with modifications of both vectors have too low acceptance rates. In this algorithm, by convention, we suppose $γ_{1} = 1$ (time 0 is a change-point, corresponding to $τ_{0} = 0$) and $r_{1} = 1$ (the constant function is always selected).
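The flip proposal and acceptance step can be sketched in Python for the simpler model without a functional part. This is a minimal illustration and not the implementation used in the article: it assumes a common prior probability `prior` for every position, relies on the fact that projecting $Y$ on the columns of $X_{γ}$ yields segment means, and uses hypothetical names (`log_posterior`, `metropolis_hastings`) and a toy series.

```python
import math
import random


def log_posterior(gamma, y, c1=50.0, prior=0.01):
    """Unnormalized log pi(gamma | Y) for the model WITHOUT functional part.

    gamma[i] = 1 means a new segment starts at 0-based index i (gamma[0] is 1).
    Uses Y' P_gamma Y = sum over segments of (segment sum)^2 / (segment length).
    """
    n = len(y)
    starts = [i for i, g in enumerate(gamma) if g == 1]
    ends = starts[1:] + [n]
    fit = sum(sum(y[s:e]) ** 2 / (e - s) for s, e in zip(starts, ends))
    quad = sum(v * v for v in y) - c1 / (1.0 + c1) * fit
    d = len(starts)
    # Bernoulli prior on gamma_2, ..., gamma_n (gamma_1 is fixed to 1).
    log_prior = (d - 1) * math.log(prior) + (n - d) * math.log(1.0 - prior)
    return -0.5 * d * math.log(1.0 + c1) + log_prior - 0.5 * n * math.log(quad)


def metropolis_hastings(y, n_iter=3000, burn_in=1000, k=1, seed=0):
    """Flip k random components of gamma (never gamma[0]); symmetric proposal."""
    rng = random.Random(seed)
    n = len(y)
    gamma = [1] + [0] * (n - 1)
    current = log_posterior(gamma, y)
    counts = [0] * n
    for s in range(n_iter):
        candidate = gamma[:]
        for i in rng.sample(range(1, n), k):
            candidate[i] = 1 - candidate[i]
        proposed = log_posterior(candidate, y)
        # Accept with probability min(1, posterior ratio).
        if proposed >= current or rng.random() < math.exp(proposed - current):
            gamma, current = candidate, proposed
        if s >= burn_in:
            counts = [c + g for c, g in zip(counts, gamma)]
    return [c / (n_iter - burn_in) for c in counts]


# Toy series (hypothetical): mean 0 on t = 1..20, mean 5 on t = 21..40,
# plus a small deterministic wiggle standing in for noise.
y = [(0.0 if t <= 20 else 5.0) + 0.1 * math.sin(t) for t in range(1, 41)]
freq = metropolis_hastings(y)
selected = [i + 1 for i, p in enumerate(freq) if p > 0.5]  # 1-based positions
```

Working on the log scale avoids numerical underflow in the ratio of posteriors, and flipping a uniformly chosen set of components keeps the proposal symmetric, so no proposal correction appears in the acceptance step.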

The number of iterations of this algorithm is $b + m$, where $b$ corresponds to the burn-in period. Then, $γ$ and $r$ are estimated using the sequences $\{γ^{(s)}\}$ and $\{r^{(s)}\}$, for $s = b + 1, \dots, b + m$. The most relevant positions for the change-points and the most relevant functions for the functional part are those supported by both the data and the prior information, that is, those corresponding to the components of $γ$ and $r$ with the highest posterior probabilities. In practice, the selected components are those whose posterior probability exceeds a given threshold. As in Muller, Parmigiani, Robert and Rousseau (2004) or Muller, Parmigiani and Rice (2006), we choose the threshold that minimizes a loss function. Here, the loss function is the sum of the numbers of false discoveries and false negatives ($FD + FN$), which leads to a threshold of $1 / 2$. This also corresponds to selecting the median probability model of Barbieri and Berger (2004), which has been shown to have greater predictive power than the most probable model in many circumstances.
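The thresholding rule can be written in a few lines. The sketch below is illustrative only: the helper names and the six-draw chain are hypothetical, and components are indexed from 0.

```python
def inclusion_probabilities(samples, burn_in):
    """Empirical posterior inclusion probability of each component after burn-in."""
    kept = samples[burn_in:]
    return [sum(column) / len(kept) for column in zip(*kept)]


def median_probability_model(samples, burn_in, threshold=0.5):
    """0-based indices of the components whose probability exceeds the threshold."""
    probs = inclusion_probabilities(samples, burn_in)
    return [i for i, p in enumerate(probs) if p > threshold]


# Hypothetical chain: 6 draws of a 4-component latent vector, first 2 are burn-in.
chain = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 1, 0],
]
probs = inclusion_probabilities(chain, burn_in=2)
selected = median_probability_model(chain, burn_in=2)
```

Here the first and third components are kept, since their empirical inclusion probabilities (1 and 0.75) exceed $1 / 2$.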

3.2 Gibbs sampler algorithm

Once $γ$ and $r$ have been estimated, our goal is to estimate $β_{γ}$ , $λ_{r}$ and $σ^{2}$ from the distribution $π (β_{γ}, λ_{r}, σ^{2} | r, γ, Y) \propto π (β_{γ}, λ_{r}, σ^{2}, r, γ | Y) .$ Note that we use $γ$ and $r$ instead of $\hat{γ}$ and $\hat{r}$ for notational simplicity.

A Gibbs sampler algorithm is then used. At each iteration, each of the three parameters is drawn from its full conditional distribution, given by:

\begin{matrix} β_{γ} | λ_{r}, σ^{2}, r, γ, Y \sim N_{d_{γ}} (\frac{T_{γ} X_{γ}^{'} (Y - F_{r} λ_{r})}{σ^{2}}, T_{γ}), \\ λ_{r} | β_{γ}, σ^{2}, r, γ, Y \sim N_{d_{r}} (\frac{W_{r} F_{r}^{'} (Y - X_{γ} β_{γ})}{σ^{2}}, W_{r}), \\ σ^{2} | β_{γ}, λ_{r}, r, γ, Y \sim IG (a, \frac{b}{2}), \end{matrix}

where the notation $IG$ stands for the inverse gamma distribution, $T_{γ} = σ^{2} {[\frac{1 + c_{1}}{c_{1}} X_{γ}^{'} X_{γ}]}^{- 1}$, $W_{r} = σ^{2} {[\frac{1 + c_{2}}{c_{2}} F_{r}^{'} F_{r}]}^{- 1}$, $a = \frac{n}{2} + \frac{d_{γ}}{2} + \frac{d_{r}}{2}$ and

b = {(Y - X_{γ} β_{γ} - F_{r} λ_{r})}^{'} (Y - X_{γ} β_{γ} - F_{r} λ_{r}) + β_{γ}^{'} (\frac{X_{γ}^{'} X_{γ}}{c_{1}}) β_{γ} + λ_{r}^{'} (\frac{F_{r}^{'} F_{r}}{c_{2}}) λ_{r} .

To estimate $β_{γ}$, $λ_{r}$ and $σ^{2}$, empirical posterior means are computed using only post-burn-in iterations. Using the estimators $\hat{β}$ and $\hat{λ}$ of $β$ and $λ$, we then obtain the estimator

\hat{f} (\cdot) = \sum_{j = 1}^{M} {\hat{λ}}_{j} ϕ_{j} (\cdot)

of the function $f$, the estimators ${\hat{τ}}_{k}$ of the change-points and the estimators ${\hat{μ}}_{k}$ of the means (see Section 2.1). The estimated number of change-points is given by

\hat{K} = \sum_{i = 1}^{n} I_{\{{\hat{β}}_{i} \neq 0\}} .
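The three full conditional draws can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than the authors' R implementation: the function name `gibbs_sampler`, the toy data and the default settings are hypothetical, the $σ^{2}$ draw assumes an inverse gamma full conditional with shape $a$ and scale $b / 2$, and the toy dictionary omits the constant function so that $F_{r}$ is not collinear with the first column of $X_{γ}$.

```python
import numpy as np


def gibbs_sampler(y, X_g, F_r, c1=50.0, c2=50.0, n_iter=2000, burn_in=500, seed=0):
    """Posterior means of (beta_gamma, lambda_r, sigma^2) given X_gamma and F_r."""
    rng = np.random.default_rng(seed)
    n, d_g = X_g.shape
    d_r = F_r.shape[1]
    XtX, FtF = X_g.T @ X_g, F_r.T @ F_r
    # T_gamma / sigma^2 and W_r / sigma^2 do not depend on the current state.
    T_unit = np.linalg.inv((1.0 + c1) / c1 * XtX)
    W_unit = np.linalg.inv((1.0 + c2) / c2 * FtF)
    beta, lam, sigma2 = np.zeros(d_g), np.zeros(d_r), 1.0
    draws = []
    for s in range(n_iter):
        # The factor sigma^2 in T_gamma cancels the 1 / sigma^2 in the mean.
        beta = rng.multivariate_normal(T_unit @ X_g.T @ (y - F_r @ lam), sigma2 * T_unit)
        lam = rng.multivariate_normal(W_unit @ F_r.T @ (y - X_g @ beta), sigma2 * W_unit)
        resid = y - X_g @ beta - F_r @ lam
        b = resid @ resid + beta @ XtX @ beta / c1 + lam @ FtF @ lam / c2
        a = 0.5 * (n + d_g + d_r)
        # Inverse gamma draw: sample Gamma(shape=a, scale=2/b) and invert.
        sigma2 = 1.0 / rng.gamma(a, 2.0 / b)
        if s >= burn_in:
            draws.append(np.concatenate([beta, lam, [sigma2]]))
    mean = np.mean(draws, axis=0)
    return mean[:d_g], mean[d_g:-1], mean[-1]


# Hypothetical toy data: one change-point after t = 30 and a single sine function.
rng = np.random.default_rng(1)
t = np.arange(1, 61)
X_g = np.column_stack([np.ones(60), (t >= 31).astype(float)])
F_r = np.sin(2 * np.pi * t / 20).reshape(-1, 1)
y = X_g @ np.array([1.0, 2.0]) + F_r @ np.array([0.5]) + rng.normal(0.0, 0.1, 60)
beta_hat, lam_hat, sigma2_hat = gibbs_sampler(y, X_g, F_r)
```

With $c_{1} = c_{2} = 50$, the conditional means are least-squares fits shrunk by the factor $c / (1 + c) \approx 0.98$, so the posterior means should land close to, but slightly below in absolute value, the coefficients used to generate the data.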

4 Simulation study

In this section, we conduct a simulation study to assess the performance of our proposed method to estimate both the ‘parametric’ part (the segmentation part) and the ‘non-parametric’ part (the functional part). Our method is called SegBayes_SP for ‘semi-parametric’. Moreover, in order to investigate the impact of taking into account the functional part on the estimation of the segmentation part, we compare the results of SegBayes_SP with those of the same Bayesian procedure that includes only the segmentation part in the model (i.e., if the model is supposed to be $Y_{t} = μ_{k} + ε_{t}, \forall t \in I_{k} = (τ_{k - 1}, τ_{k}], k \in {1, \dots, K}$ ). This case is called SegBayes_P for ‘parametric’. The estimation is still obtained using a Metropolis–Hastings algorithm to estimate $γ$ , followed by a Gibbs sampler to estimate $β_{γ}$ and $σ^{2}$ . Section 4.1 contains our simulation design, the parameters needed for the procedures and the quality criteria. Section 4.2 gives the results.

4.1 Simulation design, parameters of the procedures and quality criteria

Simulation design: We consider model (2.1) with $x_{t} = t$ and a function $f$ which is a mixture of a sine function and three peaks:

f (t) = 0.3 \times sin (2 π \frac{t}{20}) + 1.5 I_{0.1 \times n} (t) - 2 I_{0.5 \times n} (t) + 3 I_{0.6 \times n} (t),

(4.1)

where $I_{A}$ stands for the indicator function of the set $A$. Note that this function contains both smooth components and local irregularities (see plot ($h$) of Figure 4). We simulate 100 series of length $n = 100$ with $K = 4$ segments and the mean of each segment takes a value in $\{0, 1, 2, 3, 4, 5\}$ randomly, without replacement over adjacent intervals (two adjacent intervals cannot have the same mean). The positions of the three change-points are randomly chosen with the following constraints: they are positioned at a distance from the peaks of at least 3, and each segment is at least of length 5. In order to consider several change-point detection difficulties, four levels of noise $σ$ are considered: $σ \in \{0.1, 0.5, 1, 1.5\}$ (the more $σ$ increases, the more difficult the detection becomes).
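One way to generate a series following this design is sketched below. Several details are our own assumptions: the constraints are enforced by rejection sampling, the distance to a peak is measured as the absolute difference between the change-point $τ_{k}$ and the peak position, and the function names and seed are hypothetical.

```python
import math
import random

PEAKS = (10, 50, 60)  # positions 0.1 n, 0.5 n and 0.6 n for n = 100


def f(t):
    """Functional part (4.1) for n = 100: a sine wave plus three peaks."""
    peak_heights = {10: 1.5, 50: -2.0, 60: 3.0}
    return 0.3 * math.sin(2 * math.pi * t / 20) + peak_heights.get(t, 0.0)


def draw_change_points(rng, n=100, n_cp=3, min_len=5, min_gap=3):
    """Rejection sampling of change-points satisfying both constraints."""
    while True:
        taus = sorted(rng.sample(range(1, n), n_cp))
        bounds = [0] + taus + [n]
        long_enough = all(b - a >= min_len for a, b in zip(bounds, bounds[1:]))
        far_from_peaks = all(abs(tau - p) >= min_gap for tau in taus for p in PEAKS)
        if long_enough and far_from_peaks:
            return taus


def draw_means(rng, n_seg=4, values=(0, 1, 2, 3, 4, 5)):
    """Segment means such that two adjacent segments never share the same mean."""
    means = [rng.choice(values)]
    while len(means) < n_seg:
        means.append(rng.choice([v for v in values if v != means[-1]]))
    return means


def simulate_series(sigma, n=100, seed=0):
    rng = random.Random(seed)
    taus, means = draw_change_points(rng, n), draw_means(rng)
    bounds = [0] + taus + [n]
    y = []
    for k, (a, b) in enumerate(zip(bounds, bounds[1:])):
        # Segment I_k = (tau_{k-1}, tau_k] in 1-based time.
        y += [means[k] + f(t) + rng.gauss(0.0, sigma) for t in range(a + 1, b + 1)]
    return y, taus, means


y, taus, means = simulate_series(sigma=0.5)
```

Repeating the call with 100 different seeds for each of the four noise levels would give a design with the same shape as the one described above.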

Parameters: Several parameters or quantities need to be fixed in the procedures. For the procedure SegBayes_SP, we consider the following dictionary which contains 151 functions: 128 Haar functions ( $t \mapsto 2^{7 / 2} I_{[0, 1]} (\frac{2^{7} t}{100} - k)$ , $k = 0, \dots, 2^{7} - 1$ ), the Fourier functions ( $t \mapsto sin (2 π j \frac{t}{n})$ , $t \mapsto cos (2 π j \frac{t}{n})$ , $j = 1, \dots, 10$ ), the functions $t \mapsto t$ and $t \mapsto t^{2}$ , and the constant function. Note that Table 1 gives the indexes of these functions (those of $f$ given by (4.1) are in particular 11, 51, 61 and 110).
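The stated indexes of the components of $f$ can be checked numerically. The sketch below follows the indexing of Table 1 (123 entries) and, as a simplifying assumption, replaces each scaled Haar function by a unit peak at the corresponding time point; the names `dictionary`, `f_from_dictionary` and `f_direct` are hypothetical.

```python
import math

N = 100


def dictionary():
    """List of 123 functions ordered as in Table 1 (position i holds index i + 1)."""
    functions = [lambda t: 1.0]  # index 1: constant term
    for k in range(1, N + 1):  # indexes 2..101: unit peak at t = k
        functions.append(lambda t, k=k: 1.0 if t == k else 0.0)
    for j in range(1, 11):  # indexes 102..121: sine/cosine pairs
        functions.append(lambda t, j=j: math.sin(2 * math.pi * j * t / N))
        functions.append(lambda t, j=j: math.cos(2 * math.pi * j * t / N))
    functions.append(lambda t: float(t))  # index 122
    functions.append(lambda t: float(t) ** 2)  # index 123
    return functions


phi = dictionary()
# Sparse coordinates of f in (4.1), keyed by the 1-based indexes of Table 1.
lam = {11: 1.5, 51: -2.0, 61: 3.0, 110: 0.3}


def f_from_dictionary(t):
    """Evaluate f as the sparse linear combination of dictionary functions."""
    return sum(coef * phi[index - 1](t) for index, coef in lam.items())


def f_direct(t):
    """Evaluate f directly from (4.1) with n = 100."""
    peak_heights = {10: 1.5, 50: -2.0, 60: 3.0}
    return 0.3 * math.sin(2 * math.pi * t / 20) + peak_heights.get(t, 0.0)


max_gap = max(abs(f_from_dictionary(t) - f_direct(t)) for t in range(1, N + 1))
```

Under these assumptions the two evaluations of $f$ agree up to rounding error, which confirms that indexes 11, 51 and 61 carry the three peaks and index 110 the sine term of period 20.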

For both procedures, we use the same prior parameters. With respect to the Metropolis–Hastings algorithm and the Gibbs sampler, we run each one for 20 000 iterations including 5 000 burn-in iterations. The parameters $c_{1}$ and $c_{2}$ are fixed at 50, which is quite standard and recommended for instance by Smith and Kohn (1997). The initial numbers of segments and dictionary functions are 3. The number of change-points proposed to be changed at each iteration is 2, as well as the number of functions of the dictionary. The initial probability for each position of change-point is 0.01, as well as the initial probability for each function of the dictionary. To select relevant change-points and functions, we used a threshold equal to 1/2 (see Section 3.1 for a justification of this threshold).

Quality criteria: To study the performance of the procedures, the same criteria are used for both the segmentation and the functional parts. The first one is the root mean squared error (RMSE) that allows us to assess the quality of the estimation. The other ones (the false discovery rate ( $FDR$ ) and the false negative rate ( $FNR$ )) evaluate the quality of detection of the change-points and selection of functions. These criteria are described in detail in the following:

For the segmentation part,

- the root mean squared distance between the true mean and its estimate is $RMSE (μ) = \sqrt{\frac{1}{n} \sum_{t = 1}^{n} (μ (t) - \hat{μ} (t))^{2}}$ , with $μ (t) = μ_{k}$ for $t \in I_{k} = (τ_{k - 1}, τ_{k}]$ , $k = 1, \dots, K$ and $\hat{μ} (t) = \hat{μ_{k}}$ for $t \in \hat{I_{k}} = ({\hat{τ}}_{k - 1}, {\hat{τ}}_{k}]$ , $k = 1, \dots, \hat{K}$ .

- the proportion of erroneously detected change-points among detected change-points, denoted by $FDRbp$, and the proportion of undetected change-points among true change-points, denoted by $FNRbp$.

For the functional part,

- the root mean squared distance between $f$ and its estimate: $RMSE (f) = \sqrt{\frac{1}{n} \sum_{t = 1}^{n} (f (t) - \hat{f} (t))^{2}}$ .

- the proportion of erroneously detected functions among detected functions, denoted by $FDR (f)$ , and the proportion of undetected functions among true functions denoted by $FNR (f)$ .

Note that a perfect segmentation results in both $FDRbp$ and $FNRbp$ being null. Likewise, both $FDR (f)$ and $FNR (f)$ being null is equivalent to a perfect selection of the functions.

The averages of these criteria over the 100 simulations are considered. Note that we also look at the estimation of the standard deviation of the noise, $\hat{σ}$.
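These criteria are straightforward to compute. The sketch below is an illustration with hypothetical names and toy values; returning 0 when a denominator is empty is our own convention, not something stated in the text.

```python
import math


def rmse(truth, estimate):
    """Root mean squared distance between two sequences of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(truth, estimate)) / len(truth))


def fdr_fnr(true_set, detected_set):
    """False discovery rate among detections, false negative rate among true items."""
    true_set, detected_set = set(true_set), set(detected_set)
    false_pos = len(detected_set - true_set)
    false_neg = len(true_set - detected_set)
    # Convention (assumption): a rate with an empty denominator is set to 0.
    fdr = false_pos / len(detected_set) if detected_set else 0.0
    fnr = false_neg / len(true_set) if true_set else 0.0
    return fdr, fnr


# Hypothetical example: true change-points 7, 18, 36; detected 7, 18, 35, 60.
fdr_bp, fnr_bp = fdr_fnr({7, 18, 36}, {7, 18, 35, 60})
error = rmse([0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 1.0, 1.0])
```

Note that a change-point detected at 35 instead of 36 counts both as a false discovery and as a false negative, so these rates penalize small shifts as strictly as gross errors.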

4.2 Results

Both procedures were implemented in R (R Core Team, 2009) and run on quad-core 2.8 GHz Intel Xeon X5560 processors with 8 MB cache and 32 GB of physical memory. In particular, the standard Metropolis–Hastings algorithm for procedure SegBayes_SP took 169 minutes for the 400 simulations (100 series for each of the 4 noise levels), that is, approximately 25 seconds per simulation.

In the following, we first analyse the results obtained with the 100 simulated series and then we make some remarks on the study of a particular series.

Figure 2:

Average quality criteria on 100 simulated series with $σ = 0.1$ , $σ = 0.5$ , $σ = 1$ and $σ = 1.5$ , for SegBayes_SP (semi-parametric model) and SegBayes_P (parametric model)

Overall results and comparison between SegBayes_SP and SegBayes_P: Figure 2 shows the average quality criteria over the 100 simulated series for the 4 different values of $σ$ , and Figure 3 gives the average of the estimates $\hat{σ}$ .

When the detection problem is easy ($σ = 0.1$), SegBayes_SP tends to recover the two parts of the model correctly. Indeed, the selected number of segments is close to the true one and the change-points are well positioned (small $FDRbp$ and $FNRbp$). The same occurs with the selected functions. When the functional part is not taken into account, using the procedure SegBayes_P, the segmentation needs to compensate: the number of segments is overestimated, leading to a poor estimation of the segmentation part (higher $RMSE (μ)$ and $FDRbp$). This is illustrated in Figure 5 on one simulated series where the procedure selects false positive change-points to fit the series well (in particular to catch the peaks of the function). The higher $FNRbp$ compared to SegBayes_SP is explained by the fact that the change-points are positioned more precisely when the functional part is well estimated, in particular at positions where the jump of the means is small.

When the noise $σ$ increases, the two procedures tend to underestimate the number of segments and, for SegBayes_SP, also the number of functions. These results were expected (and are generally observed for the segments in segmentation problems). Indeed, in this case, the segmentation and the functional part tend to be confused with each other and with the errors. To avoid false detection and selection, one may prefer to select fewer change-points and functions. This is particularly true for the functional part, for which we observed a strong decrease in the number of selected functions from $σ = 0.5$ onwards. Consequently, $RMSE (f)$ increases and $FNR (f)$ is close to $1$. Moreover, since the functional part is not well estimated, the two procedures lead to the same results for the estimation of the segmentation part (same values of the different criteria for the quality of the segmentation).

Finally, the quality of the estimation of both parts is related to the quality of the estimation of $σ$ : estimates $\hat{σ}$ are good for $σ = 0.1$ and $σ = 0.5$ , but they appear too large for $σ = 1$ and $σ = 1.5$ (see Figure 3).

An advantage of these Bayesian procedures is that we obtain empirical posterior probabilities for the possible change-points and functions. This will be illustrated in the following paragraph.

Figure 3:

Average of the estimates $\hat{σ}$ on 100 simulated series with $σ = 0.1$ , $σ = 0.5$ , $σ = 1$ and $σ = 1.5$ , for SegBayes_SP (semi-parametric model) and SegBayes_P (parametric model)

Results for particular series: We look here in detail at particular series simulated as follows: three change-points are considered at positions 7, 18 and 36, the means over the four segments are 2, 0, 2 and 3, respectively, and we consider two series: one with $σ = 0.1$ and the second one with $σ = 1$ . We choose these series because they have two easy to detect change-points (jumps from 2 to 0 and from 0 to 2), and one change-point which is more difficult to detect (the jump from 2 to 3).

The results of the procedure SegBayes_SP for the first series with $σ = 0.1$ (low noise) are given in Figure 4. We observe that, in this case, the true change-points and functions are exactly recovered (see plots ( $a$ ) and ( $b$ )) and the two parts of the model are well estimated (see plots ( $g$ ) and ( $h$ )) leading to a good estimation of the whole (see plot ( $f$ )). Note that Table 1 gives the correspondences between the selected functions from the dictionary and their indexes, and by convention the change-point at time 0 and the constant function are selected.

The advantage of a Bayesian approach compared to a frequentist one is that the posterior probabilities give some additional interesting information. For instance, the posterior probabilities of the change-points at positions 0, 7 and 18 are 1, while those of the change-points 35 and 36 are 0.48 and 0.55, respectively (see plots ( $a$ ) and ( $b$ )). Indeed, the choice between change-points 35 and 36 is not so easy for the sampler (see the traces in plots ( $c$ ) and ( $d$ )). Also observe that we can deduce from the traces that the posterior probability of having a change-point in $\{35, 36\}$ is close to 1. Moreover, instead of using a threshold to obtain $\hat{γ}$ and $\hat{r}$, another option would have been to use the posterior modes. However, this approach generally leads to more false positive change-points. This has been observed in many series and is also the case in this particular series. Here, using the posterior modes results in the detection of the true change-points at positions 7 and 18 as previously, but also of a change-point at position 35 instead of 36 and of a false positive change-point at position 11.
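The deduction from the traces can be made explicit: the probability of having a change-point in a set of neighbouring positions is the fraction of post-burn-in iterations in which at least one of the corresponding components equals 1. The traces below are hypothetical and much shorter than real ones; they only illustrate the computation.

```python
def frequency(trace):
    """Fraction of iterations in which the indicator equals 1."""
    return sum(trace) / len(trace)


# Hypothetical post-burn-in traces of two neighbouring gamma components
# (1 = change-point present at that iteration).
trace_35 = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
trace_36 = [0, 1, 1, 0, 0, 1, 1, 0, 1, 1]

p_35 = frequency(trace_35)
p_36 = frequency(trace_36)
# Probability that at least one of the two positions carries a change-point.
p_either = frequency([max(a, b) for a, b in zip(trace_35, trace_36)])
```

In this toy case neither marginal probability is large, yet the sampler always places a change-point at one of the two positions, which is the kind of information a single point estimate would hide.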

On the same simulated series, the result of the procedure SegBayes_P (without considering the functional part) is given in Figure 5. The change-points 7, 18, 36, 59 and 60 are selected with high posterior probabilities. The last two detected change-points are false positives, but they correspond to the Haar 60 from the functional part. As explained in the previous paragraph, when the functional part is ignored, the segmentation tends to capture it.

Figure 4:

Results of SegBayes_SP on the particular simulated series with $σ = 0.1$ . Posterior probabilities for the $γ$ and $r$ components (plots ( $a$ ) and ( $b$ )). Traces of the $γ$ components 35, 36 and 11 (plots ( $c$ ), ( $d$ ) and ( $e$ )). The series, the whole true expectation and its estimation, the true positive (TP), false positive (FP) and false negative (FN) change-points are also represented (plot ( $f$ )). The true and estimated segmentation part and functional part (plots ( $g$ ) and ( $h$ ))

Table 1:

Functions from the dictionary and their corresponding indexes

Index	Function
1	constant term
2	Haar function at $t = 1$
3	Haar function at $t = 2$
⋮	⋮
101	Haar function at $t = 100$
102	$sin (2 π \times 1 \times \frac{t}{100})$
103	$cos (2 π \times 1 \times \frac{t}{100})$
⋮	⋮
120	$sin (2 π \times 10 \times \frac{t}{100})$
121	$cos (2 π \times 10 \times \frac{t}{100})$
122	$t$
123	$t^{2}$

Figure 5:

Result of SegBayes_P on the particular simulated series with $σ = 0.1$ : the series, the whole true expectation and its estimation, the true positive (TP), false positive (FP) and false negative (FN) change-points are also represented

As pointed out previously, the case $σ = 1$ is challenging since a jump of 1 (or even 2) in the mean of the series becomes difficult to detect, as do peak functions, which can be confused with the noise. In Figure 6, we can see that using SegBayes_SP, the posterior probabilities of the true change-points at positions 7 and 18 are 0.60 and 1, respectively, while the posterior probability of the change-point at position 36, which is not selected, is 0.19. Looking at the series, this is expected since the jump at position 36 is not marked. Concerning the functional part, only the Haar 60 is detected, with posterior probability 1. The result of the procedure SegBayes_P on this same series is not very good, since only the true change-point 18 is detected, with a posterior probability of 0.72.

Figure 6:

Results of SegBayes_SP on the particular simulated series with $σ = 1$ . Top: posterior probabilities for the $γ$ and $r$ components. Bottom: the series, the whole true expectation and its estimation, the true positive (TP), false positive (FP) and false negative (FN) change-points are also represented

Sensitivity and convergence: To study the sensitivity of the estimates $\hat{γ}$ and $\hat{r}$ to the choice of prior parameters, we ran the Metropolis–Hastings algorithm on the same particular series as before with $σ = 0.1$ , with different choices of prior parameters. Table B1 in Appendix B shows the results obtained for several simulation scenarios (21 different runs).

On average, the procedure is not overly sensitive to the choice of the prior parameters: of the 21 runs, 10 detected exactly the true change-points and functions, and 8 detected the true change-points and functions with a shift or an ‘exchange’. For instance, runs 9 and 13 select change-points at positions 37 and 35, respectively, instead of 36. Runs 6, 11, 16 and 19 select a change-point at position 10 instead of 7, and functions 9 and 10 (Haar 8 and Haar 9) instead of 11 (Haar 10).

Some other sensitivity remarks can still be made. First, we can see that too small a value for $c_{1}$ and $c_{2}$ should not be used since it results in too many undetected change-points and functions (see run 2). Moreover, the number of components of $γ$ and $r$ to be changed at each iteration should not be too high. Indeed, in this case, the proposed changes are too difficult to accept, leading to a poor acceptance rate. For instance, for runs 12 and 15, the acceptance rates for $γ$ and $r$ are respectively 0.05% and 0.02% (instead of 3% for the other runs). Consequently, for these two runs, some false-positive change-points and functions are selected. In contrast, the initial number of segments and functions, and the values of the probabilities $π_{l}$ and $η_{j}$, do not seem to influence the number and the validity of the selected change-points and functions from the dictionary too much (see runs 4 to 9 and 16 to 21).
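
The role of the number of components changed at each iteration can be illustrated with a generic Metropolis–Hastings sketch on a binary vector. This is not the authors' implementation: the function name `flip_metropolis_hastings`, the argument `n_flip` and the toy target below are assumptions made for illustration, and the log-posterior is passed in as a function because the integrated posterior (3.1) is not reproduced here. The sketch also handles a single binary vector, whereas the procedure updates both $γ$ and $r$.

```python
import numpy as np

def flip_metropolis_hastings(log_post, init, n_iter, n_flip=1, seed=0):
    """Sample a binary vector with a symmetric 'flip n_flip components' proposal."""
    rng = np.random.default_rng(seed)
    state = np.array(init, dtype=int)
    current = log_post(state)
    chain = np.empty((n_iter, state.size), dtype=int)
    n_accepted = 0
    for it in range(n_iter):
        proposal = state.copy()
        # Choose n_flip distinct components uniformly at random and flip them.
        idx = rng.choice(state.size, size=n_flip, replace=False)
        proposal[idx] = 1 - proposal[idx]
        candidate = log_post(proposal)
        # The proposal is symmetric, so the acceptance ratio is the posterior ratio.
        if np.log1p(-rng.random()) < candidate - current:
            state, current = proposal, candidate
            n_accepted += 1
        chain[it] = state
    return chain, n_accepted / n_iter

# Toy target (hypothetical): independent components with known inclusion probabilities.
p = np.array([0.9, 0.1, 0.5])
log_odds = np.log(p / (1 - p))

def toy_log_post(g):
    return float(g @ log_odds)

chain, rate = flip_metropolis_hastings(toy_log_post, [0, 0, 0], n_iter=20000)
freq = chain[5000:].mean(axis=0)  # empirical inclusion frequencies after burn-in
print(freq, rate)
```

Flipping several components at once requires all the flips to be jointly favourable for the move to be accepted, which is one way to read the low acceptance rates reported for runs 12 and 15.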

To study the convergence, we ran the algorithm three times with the same prior parameters as run 1, with 20 000 iterations (5 000 of burn-in), and once with 50 000 iterations (10 000 of burn-in). The results are given in Table B2 in Appendix B. We observe that 20 000 iterations, including 5 000 of burn-in, seem to be enough to reach convergence since the results obtained are similar for these four runs (and similar to those of the previous runs 1, 5, 8, 11, 14, 17 and 20, which have the same prior parameters). The acceptance rates for $γ$ and $r$ are not very high in general (around 3%). However, if we look in more detail at the traces, it appears that usually, once a true change-point or a true function from the dictionary is selected, it remains selected until the end of the algorithm, while a position which is not a change-point is alternately selected and unselected. Consequently, when most of the change-points and bias functions have been selected, the chain is not updated much, resulting in a poor acceptance rate.

5 Application

5.1 Geodetic data

In this section, we propose to use our procedure in the geodetic field for the problem of homogenization of GPS series. Indeed, such series are used to determine accurate station velocities for tectonic and Earth mantle studies (King et al. 2010). However, they are affected by two effects: (i) abrupt changes that are related to equipment changes (documented or not), earthquakes or changes in the raw data processing strategy and (ii) periodic signals that are due to environmental effects, such as soil moisture or atmospheric pressure changes. The correct detection of these effects is fundamental for the aforementioned application.

Here, we consider a particular series (the height coordinate of the series) from the GPS station in Yarragadee, Australia, YAR2 at the weekly scale. The size of the series is $n = 781$. The data can be downloaded at http://sideshow.jpl.nasa.gov/post/series.html. We refer the readers to Bertin et al. (2017) for more details about the problem and the data.

We apply our proposed procedure SegBayes_SP to this series with a dictionary of $195$ functions that includes the constant function and the Fourier functions $t \mapsto \sin (2 π w_{i} t)$ and $t \mapsto \cos (2 π w_{i} t)$, with natural frequencies $w_{i} = i / T$, where $T = \max (t) - \min (t)$ (Scargle 1982). Only the frequencies for which the period $T / i$ is larger than 8 weeks are kept (smaller periods are generally negligible, see Ray et al. 2008).
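
The size of this dictionary can be checked with a few lines of arithmetic. The sketch below assumes that the time index consists of 781 consecutive weekly epochs, so that $T = 780$ weeks; the real series contains missing data, so this is a simplification. The function name `fourier_frequencies` is illustrative.

```python
import numpy as np

def fourier_frequencies(t, min_period=8.0):
    """Natural frequencies w_i = i / T whose period T / i exceeds min_period."""
    t = np.asarray(t, dtype=float)
    T = t.max() - t.min()
    # Largest integer i such that T / i > min_period (strict inequality).
    i_max = int(np.ceil(T / min_period)) - 1
    i = np.arange(1, i_max + 1)
    return i / T, T

t = np.arange(781)               # assumption: 781 consecutive weekly epochs
w, T = fourier_frequencies(t)
n_functions = 1 + 2 * w.size     # constant term, plus one sine and one cosine per frequency
periods = 1.0 / w                # periods in weeks
print(T, w.size, n_functions)
print(periods[[14, 18, 29]])     # periods for i = 15, 19 and 30
```

Under this assumption, 97 frequencies satisfy the constraint, which gives $1 + 2 \times 97 = 195$ functions, and the frequencies $i = 15$, $19$ and $30$ correspond to periods of 52, about 41 and 26 weeks.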

The Metropolis–Hastings algorithm is run for 100 000 iterations (30 000 burn-in), with $c_{1} = c_{2} = 50$ . The initial number of segments and functions is 5, the number of change-points or functions proposed to be changed at each iteration is 1. The Bayesian framework offers the opportunity to take into account the available change-point information: the initial probability for a position which is not associated with a known equipment change is set to 0.01 and the initial probability for a position associated with a known equipment change is set to 0.5. Concerning the Gibbs sampler algorithm, we run it for 100 000 iterations including 50 000 burn-in iterations and we choose $c_{1} = c_{2} = 50$ . On quad-core processors 2.8GHz Intel X5560 Xeons with 8MB cache size and 32 GB total physical memory, the standard Metropolis–Hastings algorithm for procedure SegBayes_SP took 536 minutes.

The results are given in Figure 7. We call ‘validated change-points’ the detected change-points that are documented in databases (in red); ‘unreported change-points’ the detected change-points that are not documented in databases (in blue); and ‘missed change-points’ the changes reported in databases that are not detected here (in green). These labels should be interpreted with caution. Indeed, small earthquakes may not have been reported in databases, and some changes of equipment may have no impact or a delayed impact. Moreover, in view of the posterior probabilities obtained, we have chosen a threshold of $0.8$.

Figure 7:

Result of the procedure SegBayes_SP on the YAR2 series. Top: posterior probabilities of the change-points and functions. Bottom: Estimated expectation and validated, unreported and missed change-points (based on known equipment changes and malfunctions)

A total of 5 of the 14 change-points detected by our procedure correspond exactly or are close to known changes reported in databases: GPS week 1 085, which corresponds exactly to a clock change, and GPS weeks 1 689 and 1 708, which correspond exactly to radome radar changes. The change-points at GPS weeks 1 057 and 1 494 may be associated with the receiver changes at GPS weeks 1 052 and 1 479. At the beginning of the series, the large number of positions declared as change-points may be due to the particularly erratic behaviour of the data, which contrasts with the rest of the series. Indeed, we can observe that this part contains two known changes at GPS weeks 1 016 (a clock change) and 1 020 (a receiver change), followed by many missing data and then again a receiver change at GPS week 1 052. A possible explanation of this erratic behaviour could then be a calibration problem of the equipment during this period. Note that we also applied our procedure with non-informative priors for the positions of the change-points (all the initial probabilities equal to 0.01). In this case, several documented change-points are no longer detected, which shows the value of using prior knowledge.
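
The comparison between detected and documented changes can be sketched as a nearest-neighbour matching. The tolerance of 15 weeks used below is a hypothetical choice, not a value given in the text, and the two lists contain only the GPS weeks named in this section: the nine other detected change-points are not listed, so the 'missed' output is relative to these five detections only.

```python
def classify_change_points(detected, known, tolerance):
    """Split change-points into validated, unreported and missed lists."""
    validated, unreported, matched = [], [], set()
    for d in sorted(detected):
        # Closest documented change to the detected position.
        nearest = min(known, key=lambda k: abs(k - d))
        if abs(nearest - d) <= tolerance:
            validated.append(d)
            matched.add(nearest)
        else:
            unreported.append(d)
    missed = [k for k in sorted(known) if k not in matched]
    return validated, unreported, missed

# GPS weeks quoted in the text (subset of the full results).
known = [1016, 1020, 1052, 1085, 1479, 1689, 1708]
detected = [1057, 1085, 1494, 1689, 1708]

# tolerance=15 is an assumption chosen so that 1 494 is matched to 1 479.
validated, unreported, missed = classify_change_points(detected, known, tolerance=15)
print(validated, unreported, missed)
```

A smaller tolerance would move the change-points at GPS weeks 1 057 and 1 494 from the validated list to the unreported one, which illustrates why these labels depend on a matching convention.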

Four functions were selected: $\sin (2 π \times \frac{t}{52})$, $\cos (2 π \times \frac{t}{52})$, $\sin (2 π \times \frac{t}{41})$ and $\sin (2 π \times \frac{t}{26})$. These functions provide relevant geodetic information. In particular, the selection of the functions with periods of 52 and 26 weeks is consistent with the fact that atmospheric pressure can be approximated by periodic signals with dominant annual and semi-annual periods (Dong et al. 2002).

5.2 Exchange rate data

In this section, we propose to use our procedure in econometrics, for the analysis of an exchange rate series. More precisely, we study the daily records of the Mexican peso/US dollar exchange rate from January 2007 to December 2012 (data available at www.federalreserve.gov). These data were studied by Martínez and Mena (2014) with a Bayesian non-parametric method.

We apply our proposed procedure SegBayes_SP to this series with a dictionary of $23$ functions that includes the following functions of time $\{t \mapsto t^{j}, j = 1, 2\}$ and the Fourier functions $t \mapsto \sin (2 π i t / n)$ and $t \mapsto \cos (2 π i t / n)$ for $i = 1, \dots, 10$. The Metropolis–Hastings algorithm is run for 100 000 iterations (30 000 burn-in), with $c_{1} = c_{2} = 50$. The initial number of segments and functions is 5, and the number of change-points or functions proposed to be changed at each iteration is 1.

The initial probability for each possible function is 0.01, as well as the initial probability for a position. Concerning the Gibbs sampler algorithm, we run it for 100 000 iterations including 50 000 burn-in iterations and we choose $c_{1} = c_{2} = 50$ .

At the top of Figure 8, the posterior probabilities of the change-points and functions are given. At the bottom of Figure 8, the series with both the estimation of the expectation and the estimated change-points are given. The five most probable change-points detected are dated 3 October 2008, 14 January 2009, 26 March 2009, 3 April 2009 and 14 September 2011.

These are close to the ones obtained by Martínez and Mena (2014) and, as explained in that article, four can be related to events in Mexico and the USA: (i) September–October 2008: the 2007–2008 financial crisis; (ii) March–May 2009: the flu pandemic suffered in Mexico; (iii) the US debt-ceiling crisis in 2011. The other change-points detected by Martínez and Mena (2014), which do not correspond to known events, are not detected by our procedure, most likely because of the presence of the functional part.

Figure 8:

Result of the procedure SegBayes_SP on the Mexican peso/US dollar exchange rate series. Top: posterior probabilities of the change-points and functions. Bottom: estimated expectation and change-points

5.3 Agronomic data

In this section, we propose to use our procedure to analyse the Périgord black truffle production in France. A decrease in the production has been observed, and researchers in agronomy and truffle orchard managers are interested in identifying the causes of this phenomenon. In particular, the specialists want to detect abrupt changes due to sociological factors such as wars and rural desertification. Moreover, the production can be affected by periodic climatic variations and long-term tendencies like climatic warming (see Le Tacon et al. (2014)). Using data from the French Ministry of Agriculture, the annual production of Périgord black truffle in the Vaucluse was calculated for the years 1904 to 1972. The access to and the calculation of these reliable data was not easy, and the reader can refer to Le Tacon (2017) for more information. Our procedure SegBayes_SP was applied to these data, with a dictionary of 110 functions that includes the constant function, 26 Fourier functions ($t \mapsto \sin (2 π j \frac{t}{n})$, $t \mapsto \cos (2 π j \frac{t}{n})$, $j = 1, \dots, 13$), 64 Haar functions ($t \mapsto 2^{6 / 2} I_{[0, 1]} (\frac{2^{6} t}{100} - k)$, $k = 0, \dots, 2^{6} - 1$), 14 B-splines of order 3 (13 knots), and the functions $t \mapsto t^{1 / 3}$, $t \mapsto t^{1 / 2}$, $t \mapsto t$, $t \mapsto t^{2}$ and $t \mapsto \log (t)$. The Metropolis–Hastings algorithm is run for 100 000 iterations (30 000 burn-in), with $c_{1} = c_{2} = 50$. The initial number of segments is 3, the initial number of functions is 5, and the number of change-points or functions proposed to be changed at each iteration is 1. The initial probability for each possible function is 0.01, the initial probability for the years 1914, 1918, 1939 and 1945 (corresponding to the beginning and end of the First and Second World Wars) is 0.5 and the initial probability for the other years is 0.01.
The Gibbs sampler algorithm was run for 100 000 iterations including 30 000 burn-in iterations and we choose $c_{1} = c_{2} = 50$ . On quad-core processors 2.8GHz Intel X5560 Xeons with 8MB cache size and 32 GB total physical memory, the standard Metropolis–Hastings algorithm for procedure SegBayes_SP took 88 seconds.
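
The informative prior on the change-point positions can be written down explicitly. The sketch below assumes that every year from 1904 to 1972 is a candidate position and that the indicators are a priori independent Bernoulli variables (the prior itself is defined earlier in the article and is not reproduced here); the function name `change_point_prior` is illustrative.

```python
import numpy as np

def change_point_prior(positions, informed, p_informed=0.5, p_default=0.01):
    """Prior probability of a change-point at each candidate position."""
    positions = np.asarray(positions)
    prior = np.full(positions.size, p_default)
    # Positions backed by expert knowledge receive a larger prior probability.
    prior[np.isin(positions, informed)] = p_informed
    return prior

years = np.arange(1904, 1973)              # 1904, ..., 1972
war_years = [1914, 1918, 1939, 1945]       # beginning and end of the two world wars
prior = change_point_prior(years, war_years)

# Under the independence assumption, the prior expected number of
# change-points is the sum of the individual probabilities.
expected_nb = prior.sum()
print(years.size, round(float(expected_nb), 2))
```

With these values the prior expects fewer than three change-points over the 69 years, most of the mass being placed on the four war-related years.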

Figure 9:

Result of the procedure SegBayes_SP on the Périgord black truffle production dataset. Top: posterior probabilities of the change-points and functions. Bottom: the series and its estimated expectation

At the top of Figure 9, the posterior probabilities of the change-points and functions are given. At the bottom of Figure 9, the series and its estimated expectation are plotted. Only one change-point is detected with a threshold of $0.5$ and two at years 1917 and 1938 with a threshold of $0.3$, identifying the two World War periods. After the first war, the truffle orchards had not been managed for several years and the canopy had gradually closed, which decreased the light and water availability and led to the disappearance of some truffle orchards (see Le Tacon et al. (2014)). Moreover, after the second war, a massive rural exodus in France led to a general agricultural decline, but, especially so, concerning truffle orchards. Concerning the functional part, only one function has been selected using a threshold of 0.5, the Haar function associated with the year 1915. Looking at the series, this year can be characterized as exceptional, with the highest production in the whole of the 20th century. The use of our method on these data does not show any long-term tendency which can be attributed to climate change. Note that without considering the functional part of the model, this exceptional year would have been captured by the segmentation part, which is less relevant in terms of interpretation for the specialists.

6 Discussion

In this article, we propose a novel Bayesian method to segment a series with functional effects. The functional part is estimated by a linear combination of functions from a dictionary. Since the dictionary can be large, this approach is flexible and allows us to estimate functions with both smooth components and local irregularities. The estimation procedure consists in selecting the relevant functions of the dictionary. For the change-points, following Harchaoui and Lévy-Leduc (2010), the associated part of the model is reformulated as a variable selection problem. Together, these two considerations result in the estimation of sparse vectors, to which an SSVS approach is applied.

We show the good performance of our procedure in a simulation study and on three real datasets. In particular, in the three examples, expected change-points and functions of interest are obtained. The flexible modelling of the functional part allows us to recover periodic components suggested in previous works in the GPS example, or an exceptional year of production (as an irregularity) for the Périgord black truffle production data.

Our method is based on MCMC algorithms (Metropolis–Hastings algorithm and Gibbs sampler). Although these algorithms can take more time than those used in a frequentist approach, our procedure benefits from the Bayesian framework in two important respects. The first one is that posterior distributions of the parameters are obtained. From these distributions, different quantities can easily be derived, such as credible intervals for the means, the change-points and the selected functions, or the probability of having a change-point in a given interval. Note that obtaining such information is a very intricate task in a frequentist framework. The second one is that we can introduce expert knowledge through prior distributions (see Section 5). For example, as shown in the GPS example, for which information about potential change-points is available, some change-points are not detected when non-informative priors are used, whereas they are detected when prior knowledge is taken into account. Note that for this field of application, until recently, the detection of abrupt changes was done by visual inspection.

An important issue is the choice of the criterion used to estimate the parameters $γ$ and $r$. As proposed by Müller, Parmigiani, Robert and Rousseau (2004), the criterion used in this work minimizes a loss function which is the sum of the numbers of false discoveries and false negatives, leading to a threshold of $1 / 2$ for the posterior probabilities. In the simulation study, this criterion seems to outperform another strategy based on the posterior mode, which leads to more false-positive change-points. Moreover, other thresholds can be used to minimize different loss functions (see Müller, Parmigiani, Robert and Rousseau (2004) or Müller, Parmigiani and Rice (2006)). Finally, another way to select the final change-points and functions from the dictionary would be to run the algorithm, say three times, and to take the intersection of the three results, for both the change-points and the bias functions. That would lead to perfect results for most of the groups of three runs from the sensitivity analysis.
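
Both selection rules are simple to express once the sampled indicators are stored as a 0/1 array. The sketch below is illustrative: the function names and the toy chain are assumptions, while the selected indexes used in the intersection example are those reported for runs 22, 24 and 25 in Table B2.

```python
import numpy as np

def select_by_threshold(chain, burn_in, threshold=0.5):
    """Posterior inclusion probabilities and 1-based indexes above the threshold."""
    kept = np.asarray(chain)[burn_in:]
    post_prob = kept.mean(axis=0)          # frequency of selection after burn-in
    selected = np.flatnonzero(post_prob > threshold) + 1
    return post_prob, selected

def intersect_runs(*runs):
    """Indexes selected in every run."""
    return sorted(set.intersection(*(set(r) for r in runs)))

# Toy chain (hypothetical): 5 iterations of a 3-component indicator vector.
chain = [[0, 0, 0], [1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
post_prob, selected = select_by_threshold(chain, burn_in=1)
print(post_prob, selected)

# Selected functions and change-points of runs 22, 24 and 25 (Table B2).
functions = intersect_runs([1, 11, 51, 61, 110],
                           [1, 9, 11, 51, 61, 110],
                           [1, 11, 51, 61, 110])
change_points = intersect_runs([7, 18, 36], [8, 18, 36], [7, 18, 36])
print(functions, change_points)
```

For these three runs, the intersection removes the extra function 9 selected by run 24, but it also drops the first change-point, since run 24 places it at position 8 instead of 7. The intersection therefore filters false positives but does not repair shifted detections.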

To use our procedure, hyper-parameters must be chosen, but the sensitivity analysis shows that the procedure is not overly sensitive to these choices.

Finally, this work could be extended in the following two ways: the analysis of multiple series instead of a single one, which would improve the estimation of the functional part when this part is shared by all the series, as in Picard et al. (2011), and the analysis of time-dependent series.

Acknowledgments

Meili Baragatti, Karine Bertin, Emilie Lebarbier and Cristian Meza are supported by FONDECYT grants 1141256 and 1141258, and by the MATH-AmSud projects 16-MATH-03 SIDRE and 18-MATH-07 SaSMoTiDep.

The authors are grateful to F. Le Tacon and J.L. Dupouey for permission to use the agronomic data, which were difficult to collect.

Appendix Integration of the joint posterior distribution

Integrating the joint posterior (2.2) with respect to $β_{γ}$ , we obtain:

\begin{matrix} π (γ, λ_{r}, r, σ^{2} | Y) & \propto & (2 π σ^{2})^{- n / 2} (1 + c_{1})^{- d_{γ} / 2} π (γ) π (r) σ^{- 2} \\ & & \times \exp [- \frac{1}{2 σ^{2}} (Y - F_{r} λ_{r})^{'} \{I - \frac{c_{1}}{1 + c_{1}} X_{γ} (X_{γ}^{'} X_{γ})^{- 1} X_{γ}^{'}\} (Y - F_{r} λ_{r})] \\ & & \times (2 π)^{- d_{r} / 2} {|c_{2} σ^{2} (F_{r}^{'} F_{r})^{- 1}|}^{- 1 / 2} \exp [- \frac{1}{2 σ^{2}} λ_{r}^{'} \frac{F_{r}^{'} F_{r}}{c_{2}} λ_{r}] . \end{matrix}

Integrating with respect to $λ_{r}$ , we obtain:

\begin{matrix} π (γ, r, σ^{2} | Y) & \propto & (2 π σ^{2})^{- n / 2} (1 + c_{1})^{- d_{γ} / 2} π (γ) π (r) σ^{- 2} {(\frac{|{(F_{r}^{'} (U_{γ}^{- 1} + \frac{I}{c_{2}}) F_{r})}^{- 1}|}{| c_{2} (F_{r}^{'} F_{r})^{- 1} |})}^{1 / 2} \\ & & \times \exp [- \frac{1}{2 σ^{2}} Y^{'} (U_{γ}^{- 1} - U_{γ}^{- 1} F_{r} {(F_{r}^{'} (U_{γ}^{- 1} + \frac{I}{c_{2}}) F_{r})}^{- 1} F_{r}^{'} U_{γ}^{- 1}) Y] . \end{matrix}

Finally, integrating over $σ^{2}$, we obtain the integrated posterior (3.1).
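
The integration with respect to $λ_{r}$ relies on completing the square in the exponent. The following sketch checks this identity numerically on random inputs. It assumes that $U_{γ}^{- 1}$ denotes the matrix $I - \frac{c_{1}}{1 + c_{1}} X_{γ} (X_{γ}^{'} X_{γ})^{- 1} X_{γ}^{'}$ appearing in braces in the first display, and that $X_{γ}$ is made of selected columns of a lower triangular matrix of ones; the dimensions and the selected columns are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c1, c2 = 30, 50.0, 50.0
t = np.arange(1, n + 1)
Y = rng.normal(size=n)

# Assumption: X_gamma holds the selected columns of a lower triangular matrix of ones.
X = np.tril(np.ones((n, n)))[:, [0, 9, 19]]
# Two illustrative dictionary functions and an arbitrary coefficient vector.
F = np.column_stack([np.sin(2 * np.pi * t / n), np.cos(2 * np.pi * t / n)])
lam = rng.normal(size=2)

# Assumed definition of U_gamma^{-1}, read off the first display.
P = X @ np.linalg.solve(X.T @ X, X.T)
U_inv = np.eye(n) - c1 / (1 + c1) * P

A = F.T @ (U_inv + np.eye(n) / c2) @ F      # precision of lambda_r, up to sigma^2
m = np.linalg.solve(A, F.T @ U_inv @ Y)     # centre of the completed square

# Exponent before integration (without the factor -1 / (2 sigma^2)).
resid = Y - F @ lam
lhs = resid @ U_inv @ resid + lam @ (F.T @ F) @ lam / c2

# Completed square plus the remainder that survives the integration.
remainder = Y @ (U_inv - U_inv @ F @ np.linalg.solve(A, F.T @ U_inv)) @ Y
rhs = (lam - m) @ A @ (lam - m) + remainder
print(np.isclose(lhs, rhs))
```

The quadratic term in $λ_{r} - m$ integrates to a Gaussian normalizing constant proportional to $σ^{d_{r}} |A^{- 1}|^{1 / 2}$, which, combined with the determinant of the prior covariance, gives the ratio of determinants in the second display.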

Table B1:

SegBayes_SP: prior parameters used in the different runs of the Metropolis–Hastings algorithm applied on the particular series with $σ = 0.1$ . The number of iterations is 20 000 with a burn-in of 5 000 iterations for all runs

Run	Values of $c_{1}$ and $c_{2}$	Initial nb of segments	Initial nb of functions	Nb of $γ$ comp. changed at each iter.	Nb of $r$ comp. changed at each iter.	Values of the $π_{l}$	Values of the $η_{j}$	Selected change-points	Selected functions (function of index 1, the constant term, always selected)
1	50							7, 18, 36	11, 51, 61, 110
2	10							7, 18	61
3	500	3	3	2	2	1/100	1/100	7, 18, 36	11,51,61,110
4		1						7, 18, 36, 49, 50	11, 61, 110
5		3						7, 18, 36	11, 51, 61, 110
6	50	10	3	2	2	1/100	1/100	10, 18, 36	9, 10, 51, 61, 110
7			1					7, 18, 36	11, 51, 61, 110
8			3					7, 18, 36	11, 51, 61, 110
9	50	3	10	3	2	1/100	1/100	7, 18, 37	11, 51, 61, 110
10				1				7, 18, 36	11, 51, 61, 110
11				2				10, 18, 36	9, 10, 51, 61, 110
12	50	3	3	5	2	1/100	1/100	7, 18, 35, 74, 82, 98	11, 37, 51, 61, 110
13					1			7, 18, 35	11, 51, 61, 110
14					2			7, 18, 36	11, 51, 61, 110
15	50	3	3	2	5	1/100	1/100	7, 18, 36, 59, 60	11, 16, 51, 68, 110, 120
16						1/20		10, 18, 36	9, 10, 51, 61, 110
17						1/100		7, 18, 36	11, 51, 61, 110
18	50	3	3	2	2	1/500	1/100	7, 18, 36	11, 51, 61, 110
19							1/20	10, 18, 36	9, 10, 51, 61, 110
20							1/100	7, 18, 36	11, 51, 61, 110
21	50	3	3	2	2	1/100	1/500	7, 18, 36, 59, 60	11, 51, 110

Table B2:

SegBayes_SP: results of four runs of the Metropolis–Hastings algorithm applied on the particular series with $σ = 0.1$, with the same prior parameters as run 1 in Table B1

Run	Number of iterations	Burn-in	Selected change-points	Selected functions
22	20 000	5 000	7, 18, 36	1, 11, 51, 61, 110
23	20 000	5 000	7, 18, 36	1, 51, 61, 76, 110
24	20 000	5 000	8, 18, 36	1, 9, 11, 51, 61, 110
25	50 000	10 000	7, 18, 36	1, 11, 51, 61, 110

References

Altamimi, Collilieux and Métivier (2011) ITRF2008: An improved solution of the International Terrestrial Reference Frame. Journal of Geodesy, 85, 457–73.

Barbieri and Berger (2004) Optimal predictive model selection. The Annals of Statistics, 32, 870–97.

Barry and Hartigan (2004) A Bayesian analysis for change point problems. Journal of the American Statistical Association, 88, 309–19.

Bertin, Collilieux, Lebarbier and Meza (2017) Semi-parametric segmentation of multiple series using a DP-Lasso strategy. Journal of Statistical Computation and Simulation, 87, 1255–68.

Bickel, Ritov and Tsybakov (2009) Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics, 37, 1705–32.

Bottolo and Richardson (2010) Evolutionary stochastic search for Bayesian model exploration. Bayesian Analysis, 5, 583–618.

Boys and Henderson (2004) A Bayesian approach to DNA sequence segmentation. Biometrics, 60, 573–88.

Caussinus and Mestre (2004) Detection and correction of artificial shifts in climate series. Applied Statistics, 53, 405–25.

Dobigeon, Tourneret and Scargle (2007) Joint segmentation of multivariate astronomical time series: Bayesian sampling with a hierarchical model. IEEE Transactions on Signal Processing, 55, 414–23.

Dong, Fang, Bock, Cheng and Miyazaki (2002) Anatomy of apparent seasonal variations from GPS-derived site position time series. Journal of Geophysical Research (Solid Earth), 107, ETG 9-1–ETG 9-16.

Erdman and Emerson (2008) A fast Bayesian change point analysis for the segmentation of microarray data. Bioinformatics, 24, 2143–48.

Fearnhead (2006) Exact and efficient Bayesian inference for multiple changepoint problems. Statistics and Computing, 16, 203–13.

George and McCulloch (1993) Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88, 881–89.

George and McCulloch (1997) Approaches for Bayesian variable selection. Statistica Sinica, 7, 339–73.

Hannart and Naveau (2009) Bayesian multiple change points and segmentation: Application to homogenization of climatic series. Water Resources Research, 45, 1944–73.

Harchaoui and Lévy-Leduc (2010) Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association, 105, 1480–93.

Härdle, Kerkyacharian, Picard and Tsybakov (1998) Wavelets, approximation, and statistical applications: Vol. 129. Lecture Notes in Statistics. New York, NY: Springer-Verlag.

Harlé, Chatelain, Gouy-Pailler and Achard (2016) Bayesian model for multiple change-points detection in multivariate time series. IEEE Transactions on Signal Processing, 64, 4351–62.

King, Altamimi, Boehm, Bos, Dach, Elosegui, Fund, Hernandez-Pajares, Lavallée and Cerveira (2010) Improved constraints on models of glacial isostatic adjustment: A review of the contribution of ground-based geodetic observations. Surveys in Geophysics, 31, 465–507.

Lavielle and Lebarbier (2001) An application of MCMC methods for the multiple change-points problem. Signal Processing, 81, 39–53.

Le Tacon (2017) Les truffes: Biologie, écologie et domestication. AgroParisTech.

Le Tacon, Marçais, Courvoisier, Murat, Montpied and Becker (2014) Climatic variations explain annual fluctuations in French Périgord black truffle wholesale markets but do not explain the decrease in black truffle production over the last 48 years. Mycorrhiza, 24, S115–25.

Liu (1994) The collapsed Gibbs sampler in Bayesian computations with application to a gene regulation problem. Journal of the American Statistical Association, 89, 958–66.

Martínez and Mena (2014) On a nonparametric change point detection model in Markovian regimes. Bayesian Analysis, 9, 823–58.

Müller, Parmigiani, Robert and Rousseau (2004) Optimal sample size for multiple testing: The case of gene expression microarrays. Journal of the American Statistical Association, 99, 990–1001.

Müller, Parmigiani and Rice (2006) FDR and Bayesian multiple comparisons rules. Proceedings of the Valencia/ISBA 8th World Meeting on Bayesian Statistics. URL http://biostats.bepress.com/jhubiostat/paper115

Picard, Lebarbier, Hoebeke, Rigaill, Thiam and Robin (2011) Joint segmentation, calling and normalization of multiple CGH profiles. Biostatistics, 12, 413–28.

R Core Team (2009) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.

Ray, Altamimi, Collilieux and van Dam (2008) Anomalous harmonics in the spectra of GPS position estimates. GPS Solutions, 12, 55–64.

Ruggieri (2013) A Bayesian approach to detecting change points in climatic records. International Journal of Climatology, 33, 520–28.

Scargle (1982) Studies in astronomical time series analysis. II: Statistical aspects of spectral analysis of unevenly spaced data. The Astrophysical Journal, 263, 835–53.

Smith and Kohn (1997) Non parametric regression using Bayesian variable selection. Journal of Econometrics, 75, 317–44.

Tai and Xing (2010) A fast Bayesian change point analysis for the segmentation of microarray data. Bioinformatics, 24, 2143–48.

van Dyk and Park (2008) Partially collapsed Gibbs samplers: Theory and methods. Journal of the American Statistical Association, 103, 790–96.

Williams (2003) Offsets in global positioning system time series. Journal of Geophysical Research (Solid Earth), 108, 2310.

Wyse, Friel and Rue (2011) Approximate simulation-free Bayesian inference for multiple changepoint models with dependence within segments. Bayesian Analysis, 6, 501–28.

Zellner (1986) On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In PK Goel and A Zellner eds. Bayesian inference and decision techniques: Essays in honour of Bruno De Finetti, pages 233–43. New York: Elsevier Science Publishers, Inc.
