Abstract
This study explores the influence of observations in a time series or a discrete time signal. The goal is to detect abnormal observations from a frequency domain point of view, whereas most relevant studies have taken a time domain point of view. The concept of the influence function from robust statistics is borrowed to identify influential observations in a time series. An empirical version of the influence function on the discrete Fourier transform of a time series is designed, and a statistic is then proposed to identify influential observations of a time series from the frequency domain point of view. Although the proposed statistic can be calculated with simple arithmetic operations, case studies show that the proposed method is capable of identifying influential or abnormal observations of a time series. Identifying such observations gives a better understanding of the nature of a time series and helps to control possible future influential observations.
Keywords
Introduction
When dealing with a time series or a discrete time signal, it is important to be aware that influential observations may exist. This study explores the influence of observations in a time series. Most previous studies have addressed this issue in the time domain; this study instead approaches it from the frequency domain and attempts to identify the observations that affect the frequency with the largest magnitude in the discrete Fourier transform (DFT) of a time series or a discrete time signal.
Peña (1990) studied how to identify influential observations in univariate autoregressive integrated moving average (ARIMA) time series models and presented influence statistics based on the Mahalanobis distance. Lefrançois (1991) presented a method to obtain various measures of influence for the autocorrelation function, as well as thresholds for declaring an observation over-influential. Bruce and Martin (1989) proposed diagnostics that measure the change in the parameter estimates of ARIMA models fitted to time series formed by deleting observations from the whole data. Gupta et al. (2013) provided a comprehensive and structured overview of a large set of outlier definitions for different types of temporal data. Shittu and Shangodoyin (2008) considered the identification of outliers in the frequency domain using the spectral method. More recently, Ren et al. (2019) proposed a novel algorithm based on the spectral residual. In particular, Chen and Liu (1993) proposed an outlier detection procedure for ARIMA time series models that detects several outlier types, such as ‘innovative outliers’, ‘additive outliers’, ‘level shifts’ and ‘temporary and seasonal changes’.
The idea in this study is rooted in the concept of the influence function in robust statistics. An empirical version of the influence function for the discrete Fourier transform of a time series is derived, and a statistic is then proposed to identify influential observations of a time series from the frequency domain point of view. The proposed method, which is based on the influence function, offers a straightforward way to identify outliers or influential observations. The contents of this paper are as follows: (1) show how to measure the influence of an observation on the DFT; (2) propose a way to test whether an observation is influential or not by referring to the
Two data sets were considered, one of fine dust levels and the other of retail sales. Both the proposed method and, as a reference, the widely used method of Chen and Liu (1993) are applied to identify influential or abnormal observations in these data sets. We argue that the proposed method performs at the same level as the reference method but is easier to use.
Methodology
Influence function
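Let $T$ be a statistical functional and $F$ a distribution function. The influence function of $T$ at $F$ is

$$IF(x; T, F) = \lim_{\varepsilon \downarrow 0} \frac{T\big((1-\varepsilon)F + \varepsilon \delta_x\big) - T(F)}{\varepsilon},$$

where $\delta_x$ denotes the point mass at $x$. The influence function signifies the effect of an infinitesimal contamination at the point $x$ on the estimate $T(F)$, standardized by the mass of the contamination.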
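The empirical version of the influence function can be obtained by replacing $F$ with the empirical distribution function $F_n$, where $F_n$ puts mass $1/n$ on each of the observations $x_1, \dots, x_n$.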
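For example, the influence function of the population mean $\mu = T(F) = \int x \, dF(x)$ is

$$IF(x; \mu, F) = x - \mu,$$

and its corresponding empirical influence function (EIF) is

$$EIF(x_i) = x_i - \bar{x},$$

where $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$ is the sample mean.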
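For the mean example, a minimal NumPy sketch can make the definition concrete: it computes the EIF $x_i - \bar{x}$ directly and checks one value against the contamination definition using a small but finite $\varepsilon$, with $F$ taken to be the empirical distribution. The function names `eif_mean` and `contamination_ratio` and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def eif_mean(x):
    """Empirical influence function of the mean: EIF(x_i) = x_i - mean(x)."""
    x = np.asarray(x, dtype=float)
    return x - x.mean()

def contamination_ratio(x, point, eps=1e-6):
    """Finite-eps version of the IF definition at F_n for T = mean:
    [T((1 - eps) F_n + eps * delta_point) - T(F_n)] / eps."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t_fn = x.mean()
    # Contaminated distribution: mass (1 - eps)/n on each x_i plus mass eps at `point`.
    values = np.append(x, point)
    weights = np.append(np.full(n, (1.0 - eps) / n), eps)
    t_mixed = np.average(values, weights=weights)
    return (t_mixed - t_fn) / eps

x = np.array([2.0, 3.0, 4.0, 11.0])   # sample mean is 5.0
print(eif_mean(x))                    # values -3, -2, -1, 6
print(contamination_ratio(x, 11.0))   # approximately 6.0, matching EIF at x = 11
```

The finite-$\varepsilon$ ratio agrees with $x_i - \bar{x}$, which is the sense in which the EIF is the influence function evaluated at $F_n$.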
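The discrete Fourier transform transforms a sequence of $n$ numbers $x_0, x_1, \dots, x_{n-1}$ into another sequence of complex numbers $X_0, X_1, \dots, X_{n-1}$ defined by

$$X_k = \sum_{t=0}^{n-1} x_t \, e^{-2\pi i k t / n},$$

for $k = 0, 1, \dots, n-1$.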
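To make the transform concrete, the sketch below evaluates the DFT directly from its defining sum and checks the result against `numpy.fft.fft`. The unscaled, zero-based convention is an assumption (it is the one NumPy uses); other texts scale by $1/n$ or $1/\sqrt{n}$ or index from $t = 1$, which changes magnitudes by a constant factor or phases by a unit-modulus factor, but not which frequency has the largest magnitude.

```python
import numpy as np

def dft(x):
    """Direct O(n^2) DFT: X_k = sum_{t=0}^{n-1} x_t * exp(-2*pi*i*k*t/n), k = 0..n-1."""
    x = np.asarray(x, dtype=complex)
    n = x.size
    t = np.arange(n)
    k = t.reshape(-1, 1)                   # column of frequency indices
    W = np.exp(-2j * np.pi * k * t / n)    # n x n DFT matrix
    return W @ x

x = np.array([1.0, 2.0, 0.0, -1.0])
X = dft(x)
freqs = np.arange(x.size) / x.size         # frequency k/n in cycles per time step
print(np.allclose(X, np.fft.fft(x)))       # True
print(X[0])                                # X_0 is the plain sum of the data
```

In practice the FFT computes the same values in $O(n \log n)$ time; the direct form is shown only to mirror the definition.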
In fact, the discrete Fourier transform
for measuring the influence of an observation
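Because the DFT is linear in the data, the term contributed by a single observation $x_t$ to $X_k$ is exactly $x_t e^{-2\pi i k t/n}$, so the effect of setting $x_t$ to the series mean (zero after centering) can be computed without recomputing the transform. The sketch below uses this to measure how much $|X_k|$ at the dominant frequency drops for each observation. This case-deletion measure is an illustrative assumption, not the influence statistic proposed in this study; `dominant_index` and `deletion_influence` are hypothetical names.

```python
import numpy as np

def dominant_index(x):
    """Index k in 1..n//2 with the largest |X_k|; k = 0 (the mean) is excluded."""
    X = np.fft.fft(np.asarray(x, dtype=float))
    return 1 + int(np.argmax(np.abs(X[1 : len(X) // 2 + 1])))

def deletion_influence(x, k):
    """Hypothetical case-deletion measure: for each t, the drop in |X_k| when x_t
    is set to 0 (the mean of a centred series). Not the paper's statistic."""
    x = np.asarray(x, dtype=float)
    n = x.size
    X_k = np.fft.fft(x)[k]
    t = np.arange(n)
    contrib = x * np.exp(-2j * np.pi * k * t / n)  # term of x_t in X_k (linearity)
    return np.abs(X_k) - np.abs(X_k - contrib)

rng = np.random.default_rng(0)
x = rng.normal(size=24)
x -= x.mean()                      # centre the series
k = dominant_index(x)
infl = deletion_influence(x, k)
print(k, int(np.argmax(infl)))     # dominant index and most influential position
```

A positive value means that removing the observation weakens the dominant component; a negative value means the observation was partly cancelling it.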
On the other hand, consider the linear process of the variable
where
where
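As a numerical illustration of the frequency-domain view of a linear process, the sketch below assumes the usual form $x_t = \sum_j \psi_j w_{t-j}$ with white noise $w_t$ of variance $\sigma^2$, and takes the AR(1) case $\psi_j = \phi^j$, whose spectral density is $f(\omega) = \sigma^2 / |1 - \phi e^{-2\pi i \omega}|^2$. For large $n$, the periodogram ordinate $I(\omega_j)$ at a Fourier frequency $\omega_j = j/n$ behaves approximately like $f(\omega_j)$ times a standard exponential variable (equivalently, $2I/f$ is approximately $\chi^2_2$), so the ratio $I/f$ should average close to one. The AR(1) model, the $1/n$ periodogram scaling, and the function names are assumptions for illustration.

```python
import numpy as np

def simulate_ar1(n, phi, sigma=1.0, burn=500, seed=0):
    """AR(1) x_t = phi * x_{t-1} + w_t, a linear process with psi_j = phi**j."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=sigma, size=n + burn)
    x = np.empty(n + burn)
    x[0] = w[0]
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + w[t]
    return x[burn:]  # drop burn-in so the series is close to stationary

def periodogram(x):
    """I(omega_j) = |sum_t x_t exp(-2 pi i omega_j t)|^2 / n at omega_j = j/n,
    for j = 1..floor((n-1)/2) (zero and Nyquist frequencies excluded)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    j = np.arange(1, (n - 1) // 2 + 1)
    I = np.abs(np.fft.fft(x)[j]) ** 2 / n
    return j / n, I

def ar1_spectral_density(omega, phi, sigma=1.0):
    """f(omega) = sigma^2 / |1 - phi * exp(-2 pi i omega)|^2."""
    return sigma**2 / np.abs(1 - phi * np.exp(-2j * np.pi * omega)) ** 2

omega, I = periodogram(simulate_ar1(4096, phi=0.5))
ratio = I / ar1_spectral_density(omega, phi=0.5)
print(ratio.mean())  # close to 1, since I/f is roughly Exp(1) at each Fourier frequency
```

The individual ratios scatter widely around one, which is why a single periodogram ordinate is a noisy estimate of the spectral density even though it is asymptotically unbiased.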
Based on Eqs (5) and (6), consider the following two expressions:
and
If the quantity in Eq. (8) is significantly different from that in Eq. (7), it could be said that
Given a time series Calculate an estimate For any
then identify
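The sketch below is a hedged stand-in for a screening step of this kind, not the statistic proposed in this study. It measures, for each observation, the case-deletion change in $|X_k|$ at the dominant frequency, standardizes these changes with the median and the median absolute deviation (MAD), and flags observations with $|z| > c$. The robust standardization, the cut-off `c = 3.0`, and the name `flag_influential` are assumptions.

```python
import numpy as np

def flag_influential(x, c=3.0):
    """Hypothetical screening rule: flag observations whose case-deletion effect
    on |X_k| at the dominant frequency is extreme (robust z-score > c)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                      # centre the series
    n = x.size
    X = np.fft.fft(x)
    k = 1 + int(np.argmax(np.abs(X[1 : n // 2 + 1])))    # dominant frequency index
    t = np.arange(n)
    contrib = x * np.exp(-2j * np.pi * k * t / n)         # term of x_t in X_k
    infl = np.abs(X[k]) - np.abs(X[k] - contrib)          # drop in |X_k| when x_t -> 0
    med = np.median(infl)
    mad = 1.4826 * np.median(np.abs(infl - med))          # robust scale estimate
    z = (infl - med) / mad
    return k / n, np.flatnonzero(np.abs(z) > c)

t = np.arange(48)
x = np.cos(2 * np.pi * t / 12)    # 12-step cycle
x[24] += 6.0                      # inject one spike in phase with the cycle
freq, flagged = flag_influential(x)
print(freq, flagged)
```

With a 12-step cycle the dominant frequency is $4/48 = 1/12 \approx 0.083$ cycles per step, and only the injected spike at index 24 is flagged.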
Two examples, (i) a data set of fine dust levels in a city in Asia and (ii) the retail sales data from Hillmer et al. (1982), were selected for the empirical studies. We aim to confirm through data analysis that the proposed method is useful for identifying influential observations. After the observations are de-trended and centered, the periodogram is used as an estimator of the spectral density. The R package ‘tsoutliers’ implements the procedure of Chen and Liu (1993) for automatic detection of outliers in time series.
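The preprocessing described above can be sketched as follows. The text does not specify how the trend is removed, so a least-squares linear trend is assumed here; the periodogram is then evaluated at the Fourier frequencies $j/n$, and its largest ordinate gives the dominant frequency. The synthetic series and the name `dominant_frequency` are hypothetical.

```python
import numpy as np

def dominant_frequency(y):
    """Remove a least-squares linear trend (assumed detrending method), centre,
    and return the Fourier frequency j/n with the largest periodogram ordinate."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n)
    slope, intercept = np.polyfit(t, y, 1)
    r = y - (slope * t + intercept)        # de-trended series
    r = r - r.mean()                       # explicit centring
    j = np.arange(1, n // 2 + 1)
    I = np.abs(np.fft.fft(r)[j]) ** 2 / n  # periodogram at omega_j = j / n
    return j[np.argmax(I)] / n, I

t = np.arange(120)
y = 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + 0.5 * np.cos(2 * np.pi * t / 6)
f, I = dominant_frequency(y)
print(f, 1 / f)  # about 0.083 and a period of about 12 steps
```

For monthly data, a dominant frequency near $1/12 \approx 0.083$ corresponds to a 12-month cycle, and one near $1/6 \approx 0.166$ to a 6-month cycle.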
Fine dust data
A fine dust data set of a city in Asia, from April 2008 to November 2017, is plotted in Fig. 1. In the spectral analysis, the periodogram indicates that the dominant frequency is about 0.083
The tsoutliers procedure identifies observations 21 (additive outlier), 30 (additive outlier), 36 (additive outlier), 45 (temporary change), 57 (level shift), 61 (level shift), 80 (additive outlier) and 82 (level shift) as outliers. It was executed with a critical value of 3.0, as proposed by Chen and Liu (1993). The tsoutliers procedure flags as outliers or influential observations those observations whose fitted values differ significantly from the observed values.
Although the results of the two methods have several things in common, there are also some notable differences. For example, observation 9 is identified as influential by the proposed method but not by tsoutliers; in fact, observation 9 has the second largest value in the series. Observations 110 and 112 have relatively low values, and the proposed method flags them as possibly influential, while tsoutliers does not.
The time series of the fine dust levels. The influential observations are marked by numbers, and the observations detected by ‘tsoutliers’ are marked as solid circles.
Influence statistics for the fine dust data when 
We consider the same data set analyzed by Chen and Liu (1993): the monthly retail sales of various stores from January 1967 to September 1979, originally discussed in Hillmer et al. (1982). The monthly retail sales are plotted in Fig. 3. The periodogram indicates that the dominant frequency is about 0.166
On the other hand, the proposed method identifies all December sales, as well as some January and February sales in later years, as outliers or influential observations (Figs 3 and 4). Unlike in previous years, January sales from 1974 onward show a relatively large decline compared with December.
The time series of the sales data. The influential observations are marked by numbers, and the observations detected by ‘tsoutliers’ are marked as solid circles.
Influence statistics for the sales data when 
A method for finding outliers or influential observations in time series data has been designed from a frequency domain point of view. The proposed method is designed to find the observations affecting the dominant frequency. The case studies show that the proposed method performs at a level comparable to the well-known method of Chen and Liu (1993) while being easier to use. The proposed method is expected to provide additional insights for identifying anomalous observations.
