Abstract
Time series pattern discovery is of great importance in a large variety of environmental and engineering applications, from supporting predictive models to helping to understand hidden underlying processes. This work develops a multiresolution time series method for extracting patterns in weather records, particularly temperature data. The topic is important because, given a warming climate, morbidity and mortality are expected to rise as heatwave frequency and intensity increase. By analysing summer temperature quantiles at different levels of coarseness, it was found that compounding models can contain a complete description of severe weather events. This new multiresolution quantile approach is developed as an extension of the symbolic aggregate approximation of the temperature time series in which quantiles are computed at every stretch of the piecewise partition. The process is iterated at different scales of the partition, and it was found to be a very useful approach for finding patterns related to both heatwave periods and intensities. The method is successfully tested using real weather records from Brazil (Recife) and the UK (London), and it was found that in both locations heatwave intensity and frequency are increasing at a substantial rate. In addition, it was found that the rate of increase in intensity of the heatwaves is far outstripping the rate of increase in mean summer temperature: by a factor of 2 in Recife and a factor of 6 in London. The approach will be of use to those looking at the impact of future climates on civil engineering, water resources, energy use, agriculture and health care, or those looking for sustained extreme events in any time series.
Introduction
Multiresolution time series analysis [34] decomposes the data into components at different frequencies. The approach is often used to allow ordinary predictive models to deal with the higher frequency components using smaller time frames, while automatically using larger time steps for the lower frequencies. Multiresolution time series analysis also helps noise removal processes and data compression [30].
This paper proposes a time series decomposition in various time-steps, thereby generating new data over different scales from a common source. As the proposal focuses on summer temperature time series, the desire is to better explain the extremes of the historical series in comparison to only using mean conditioned models. The work proposes a multiresolution quantile (MRQ) approach, computing the quantiles for each level of resolution in the time series.
In order to represent the warmer temperatures, the quantiles under analysis need to be focused on both upper and lower quantiles. On one side, quantile regressions at upper values gather how weather interacts with the near maximum temperatures during the day. On the other side, analysing lower quantiles provides insights about when these values are relatively high during the night and the potential factors affecting these circumstances. This is key, as it is known that the lack of a diurnal cycle was one of the reasons for the high mortality during the Paris heatwave of 2003 [18,31]. Short distances between upper and lower quantiles are of key importance in establishing criteria regarding the existence of heatwave events, conditioned to steadily higher minimum temperatures [14]. The analyses of previous heatwaves are of particular interest for those looking to create future weather time series when looking at ventilation and overheating issues in the built environment [10,12,17], drought periods in agriculture [36], the management of water resources [3,8] and energy systems [11,13].
Various methods for the multiresolution analysis of time series are discussed in the literature. The discrete wavelet transform (DWT) is a natural way to reach such a decomposition and is useful in indexing time series [7]. However, different alternatives have been proposed for those looking for improved efficacy. For example, those based on fuzzy inductive reasoning [6] decompose the time series into a trend series and another complementary series describing the deviation from this trend. Alternatives based on symbolic aggregate approximation (SAX) have also been successfully developed [5,25]. The main advantage of SAX based methods is that they provide a suitable avenue for pattern recognition models associated with time series [28].
There are a few works related to multiresolution time series with respect to climate or weather time series, including solar irradiance and climate reconstructions [27] and a hybrid approach to removing noise from a climate time series [19]. However, the work introduced in this paper is closer to the hierarchy-of-clusters approach proposed for use in multiresolution image analysis [22], being a variation of 1d-SAX [23] that extends the more common SAX methodology to deal with time series quantiles. MRQ starts with the typical piecewise aggregate approximation (PAA) to divide the series into segments of equal length [16]. After that, the quantiles for each segment of data are saved for further analysis instead of only the average values, and SAX is run on these quantities. Finally, the differences between upper and lower quantiles at each resolution level are computed by a lower-bounding distance measure [20]. Possible heatwave events could then be detected based on the persistence of minimum distances at various time-based resolution levels.
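As an illustration of the lower-bounding distance step, the following minimal sketch assumes that the measure is the standard SAX MINDIST between two symbolic words, with breakpoints taken as equiprobable quantiles of a standard Gaussian. The function name `mindist` and its arguments are illustrative choices, not notation from this paper.

```python
from math import sqrt
from statistics import NormalDist


def mindist(word_a, word_b, series_length, alphabet_size):
    """Standard SAX lower-bounding distance between two equal-length words.

    Assumption: symbols are lowercase letters 'a', 'b', ... and breakpoints
    split N(0, 1) into alphabet_size equiprobable regions.
    """
    nd = NormalDist()
    breakpoints = [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    total = 0.0
    for ca, cb in zip(word_a, word_b):
        r, c = sorted((ord(ca) - ord("a"), ord(cb) - ord("a")))
        # Identical or adjacent symbols contribute zero distance.
        if c - r > 1:
            total += (breakpoints[c - 1] - breakpoints[r]) ** 2
    return sqrt(series_length / len(word_a)) * sqrt(total)


# Hypothetical words for an upper- and a lower-quantile series.
print(mindist("dcdd", "abca", series_length=540, alphabet_size=4))
```

Small values of this distance indicate that the two symbolic series stay close to each other, which is the situation of interest when lower quantiles remain near the upper ones.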
Indexing and mining time series for multiresolution purposes
Indexing time series is traditionally used as a way to efficiently store a large temporal database [15]. However, its use has been expanded to allow the extraction of patterns in time series. While common indexing and mining models are based on measures formed from the average, the method proposed here is quantile based: this is to allow for better treatment of the higher temperatures.
Multiresolution based on average values
Wavelets are mathematical functions that represent data or other functions in terms of the averages and differences to a prototype function [2]. An interesting wavelet property w.r.t. multiresolution approaches is that the first coefficients forming a wavelet expression contain an overall approximation of the time series under study, while additional coefficients take into account data characteristics in greater detail [37]. This endows the wavelets with the properties required for a suitable methodology for investigating time series at various resolutions. Haar’s discrete wavelet transform [7] is a widely used case in which the prototype function follows an orthonormal system, at discrete times, for the space of square-integrable functions on the unit interval.

Example of a combination of various wavelet resolution levels for a time series.
Haar’s DWTs are ultimately made up of step functions. Approaches based on piecewise aggregate approximation (PAA), which divides the time series dataset into equally spaced segments, provide similar results to Haar’s DWT in terms of time series decomposition. Consequently, DWT results and PAA based outcomes are equivalent when computing distances between time series [16]. The efficient results found by using PAA based methods, together with their capability to be straightforwardly part of further data mining processes, represent a key advantage for the work presented here.
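The relationship between PAA and the Haar step functions can be illustrated with a minimal sketch. It assumes segment lengths that are powers of two and uses the unnormalised Haar averages (repeated pairwise means); the function names and the toy series are hypothetical and are not taken from the data analysed in this paper.

```python
from statistics import mean


def paa(series, segment_length):
    """Piecewise aggregate approximation: the mean of each equal-length segment."""
    if len(series) % segment_length != 0:
        raise ValueError("series length must be a multiple of segment_length")
    return [
        mean(series[i:i + segment_length])
        for i in range(0, len(series), segment_length)
    ]


def haar_averages(series, levels):
    """Repeated pairwise averaging (unnormalised Haar approximation).

    Assumes len(series) is divisible by 2 ** levels.
    """
    current = list(series)
    for _ in range(levels):
        current = [(current[i] + current[i + 1]) / 2
                   for i in range(0, len(current), 2)]
    return current


# Hypothetical toy series of 8 values.
toy = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 8.0, 10.0]
print(paa(toy, 4))            # segment means for segments of length 4
print(haar_averages(toy, 2))  # two Haar averaging levels give the same values
```

PAA does not require power-of-two segment lengths, which is one practical reason for preferring it when segments are defined in days.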
The symbolic aggregate approximation of time series (SAX) [20] represents the time series as a sequence of symbols. It was primarily developed to reduce the dimensionality of a numerical series into a short chain of characters. However, it has been found useful for data mining tasks such as: indexing [32], clustering [1,24], and classification [38]. SAX is based on three steps:
Divide a time series into segments of length L.
Compute the average of the time series on each segment.
Represent the average values as a symbol from an alphabet of size N.
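The three steps above can be sketched as follows. This is a minimal illustration rather than a reference implementation: it assumes the series is z-normalised first and that the alphabet breakpoints are equiprobable quantiles of a standard Gaussian, as in the standard SAX formulation. The function name `sax` and the toy series are hypothetical.

```python
import string
from statistics import NormalDist, mean, pstdev


def sax(series, segment_length, alphabet_size):
    """Convert a numeric series into a SAX word.

    Assumptions: the series is not constant, and its length is a multiple of
    segment_length (otherwise the last segment is simply shorter).
    """
    # z-normalise so that Gaussian breakpoints are meaningful.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma for x in series]
    # Steps 1 and 2: split into segments of length L and average each one.
    means = [mean(z[i:i + segment_length])
             for i in range(0, len(z), segment_length)]
    # Step 3: map each average to a symbol from an alphabet of size N.
    nd = NormalDist()
    breakpoints = [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = string.ascii_lowercase[:alphabet_size]
    return "".join(letters[sum(m > b for b in breakpoints)] for m in means)


# Hypothetical toy series: low, medium and high plateaus.
toy = [1.0] * 4 + [5.0] * 4 + [9.0] * 4
print(sax(toy, segment_length=4, alphabet_size=3))
```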
The time series division is based on a previous PAA phase. SAX is based on the assumption that time series values follow a Gaussian distribution for each of the segments into which PAA divided the series. The conversion of the average values into a symbol makes use of (
This alphabetic approach is then useful in further analyses using methods such as hashing [35], variations of Markov models [4,21], and suffix tree approaches [29]. In addition, it is automatically associated with a sliding-window approach in which every time-frame is encoded by a letter. Figure 2 represents the output of a single application of the SAX process to a temperature time series from London (April to September, 1989).

Example of the SAX conversion process for a time series with length 549, w = 9 and resolution 4 (a, b, c, d). Temperature variations from the long term baseline.
The SAX variation known as 1d-SAX [23] extends the usual alphabetic symbols to a system able to contain information about the average and the trend of the series within a segment. A natural extension of 1d-SAX takes median values on each interval, and provides similar results to linear regression. The new MRQ method proposed generalizes this approach by computing upper and lower quantiles at every segment of the PAA partition. The PAA partition becomes a bi-level PAA-quantile based partition (PAA-Q), and we have termed the modified SAX approach SAX-Quantile, or SAX-Q for short. PAA-Q is defined in a similar way to the single PAA for a time series of length n.
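A minimal sketch of the PAA-Q idea is given below: each PAA segment is summarised by a lower and an upper quantile instead of its mean. The quantile estimator (linear interpolation between order statistics), the function names and the toy series are assumptions made for illustration; the paper's own definition of PAA-Q may differ in detail.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def paa_q(series, segment_length, q_low=0.05, q_high=0.95):
    """Return one (lower quantile, upper quantile) pair per PAA segment."""
    return [
        (quantile(series[i:i + segment_length], q_low),
         quantile(series[i:i + segment_length], q_high))
        for i in range(0, len(series), segment_length)
    ]


# Hypothetical toy series with two segments of five values each.
toy = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
print(paa_q(toy, segment_length=5, q_low=0.25, q_high=0.75))
```

Running SAX separately on the two resulting sequences (lower and upper quantiles) then gives the two symbolic words that SAX-Q works with.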
This multiresolution approach facilitates working at the different coarseness levels of PAA-Q used in SAX-Q to allow for posterior analysis [26]. MRQ focuses specifically on quantile information for discovering patterns, in our case of steadily high summer temperatures at upper and lower quantiles, i.e. heatwaves. We believe that having enough variety in the coarseness levels of the PAA-Q partition, and computing upper and lower quantiles at each resolution level, allows heatwave events to be identified successfully.
First, with larger PAA-Q segments, the series is filtered to allow the rapid identification of patterns associated with high temperatures even at lower quantiles. This is easily achieved by splitting the series using a binary SAX codification of values lower and higher than a certain threshold. For the set of segments of interest (those encoded as high temperatures) we approach a finer PAA-Q resolution. The overall MRQ process is:
Set PAA to coarsest scale with segment:
Compute upper and lower quantiles:
Approach SAX-Q coding as ‘1’ those segments encoded with higher alphabet values for both lower and upper quantiles; ‘0’ otherwise
Target the set of PAA-Q encoded as ‘1’
Define ‘runs’ as sub-series segment of consecutive PAA-Q stretches coded as ‘1’
Set PAA-Q for the runs to a finer scale:
Go to 1. (a)
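The iterative process above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the binary coding uses fixed thresholds `thr_low` and `thr_high` for the lower and upper quantiles (the paper codes segments by their SAX-Q alphabet values), the same thresholds are reused at every scale, and all function names and the toy series are hypothetical.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def code_segments(series, start, stop, seg_len, thr_low, thr_high,
                  q_low=0.05, q_high=0.95):
    """Code each segment of series[start:stop] as 1 if both quantiles are high."""
    coded = []
    for a in range(start, stop, seg_len):
        b = min(a + seg_len, stop)
        seg = series[a:b]
        hot = quantile(seg, q_low) > thr_low and quantile(seg, q_high) > thr_high
        coded.append((a, b, 1 if hot else 0))
    return coded


def runs_of_ones(coded):
    """Merge consecutive segments coded 1 into (start, stop) runs."""
    runs, current = [], None
    for a, b, flag in coded:
        if flag:
            current = (current[0], b) if current else (a, b)
        elif current:
            runs.append(current)
            current = None
    if current:
        runs.append(current)
    return runs


def mrq(series, seg_lengths, thr_low, thr_high):
    """Refine candidate windows from the coarsest to the finest segment length."""
    windows = [(0, len(series))]
    for seg_len in seg_lengths:
        refined = []
        for start, stop in windows:
            coded = code_segments(series, start, stop, seg_len, thr_low, thr_high)
            refined.extend(runs_of_ones(coded))
        windows = refined
    return windows


# Hypothetical toy series: a cold stretch, one mild value, then a hot stretch.
toy = [0.0] * 8 + [2.0] + [10.0] * 7
print(mrq(toy, seg_lengths=[8, 4], thr_low=4.0, thr_high=4.0))
```

In this toy case the coarse scale keeps the whole second half of the series, and the finer scale then trims the run so that it excludes the segment containing the mild value.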
Two meteorological databases, from Brazil and the UK, have been analysed. Both of them contain 50 years of data taken at 8 hour intervals during the period 1961 to 2010.
Brazil summer database
Average RMSE in the PAA approximation of the Recife 1961–2010 8 hourly temperature time series at different segment sizes
The Brazil weather database was collected at Curado weather station in Recife (Pernambuco State, Brazil). For each one of the 50 years analysed, the months of October to March (of the following year) were selected to represent the summer period. The average temperature for this period is 27.59°C with an associated standard error of 1.71. The choice of the number of segments to approach the first PAA basis partition takes into account a suitable length of every segment and the error related to the approximation (see Table 1). Thus, for a segment length of size 30 (10 days) the average root mean square error (RMSE) over the 50 years is only
Figure 3 shows the performance of SAX-Q for a given PAA-Q partition and one selected summer. The summer of 1989–1990 is chosen for consistency with the choice of using the summer of 1989 for the UK analysis. Upper and lower quantiles, together with the median (quantile 50%) are shown in Fig. 3.

SAX-Q conversion process for Recife’s temperature time series with length 540, w = 18, resolution 4 (a, b, c, d), and quantiles 0.05 and 0.95. Temperature variations in summer 1989–1990 are given w.r.t. the baseline period of 1961–2010.
After obtaining the output of the MRQ process for Recife’s summer temperatures, it is possible to identify the warmest temperatures with the beginning and the end of the summer period. Figure 4 shows the histogram of the normalised probabilities regarding the frequencies of 2-level SAX-Q in the observed time series. This shows that there is often a brief period of high temperatures at the beginning of the summer, and that high temperatures also occur from the end of December until February.

Probability of warmest temperatures during Recife summers (1961–2010).
Figure 5(a) shows how the frequency of periods of high temperature increases in the latter years of the study period. This figure uses a colour scale in which light red represents no heatwaves that year, red is 1 period of high temperatures, and dark red is 2 or more periods of high temperatures. It can be seen how the ratio of years with heatwaves to years without is 10/13 in the first half of data. However, post year 1985, the situation is completely different, with two or more heatwaves in some years and the ratio of years with heatwaves to years without increasing to 14/7.

Change in frequency and intensity of heatwave candidates. Recife 1961–2005. (a) Frequency of warmer temperatures. (b) Trend of average temperatures associated with heatwave events.
Figure 5(b) complements the information given in Fig. 5(a) by showing how the mean temperature during each heatwave changes over the 50 years of the study. It is clear that there is an increasing trend in the mean, so not only is the number of heatwaves increasing, their magnitude is also increasing, with the mean heatwave temperature being 1.5°C higher in 2005 than at the start of the data. Applying linear regression suggests an increase in heatwave magnitude of 0.4°C per decade; this should be compared to the overall temperature trend during the summer period, which is less than 0.2°C per decade.
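The per-decade trends quoted above come from ordinary linear regression of temperature against year. A minimal sketch of that calculation is shown below; the function name and the synthetic values are hypothetical and are deliberately not the Recife record, they only show how a slope in °C per decade and the ratio between two such slopes are obtained.

```python
def slope_per_decade(years, values):
    """Ordinary least-squares slope of values against years, in units per decade."""
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(values) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (v - mean_value) for y, v in zip(years, values))
    return 10 * sxy / sxx


# Synthetic values lying exactly on straight lines (not observed data).
years = [1961, 1971, 1981, 1991, 2001]
heatwave_means = [20.0, 20.5, 21.0, 21.5, 22.0]  # rises 0.5 per decade
summer_means = [15.0, 15.1, 15.2, 15.3, 15.4]    # rises 0.1 per decade

heat_trend = slope_per_decade(years, heatwave_means)
summer_trend = slope_per_decade(years, summer_means)
print(heat_trend, summer_trend, heat_trend / summer_trend)
```

With the figures reported in the text, 0.4°C per decade against less than 0.2°C per decade, the same ratio gives the factor of 2 quoted for Recife.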
The UK weather data was collected at the Heathrow weather station (London, UK) [33]. For each one of the 50 years analysed, the months of April to September were selected to represent the summer period. The average temperature for this period is 15.78°C with an associated standard error of 4.71. The temperature is clearly below Recife’s values. However, the proposed MRQ process automatically adapts itself to locally set conditions for screening heatwaves. The choice of the number of segments to approach the first PAA basis partition takes into account a suitable length for every segment and the error related to the approximation (see Table 2). Thus, for a segment length of size 61 (20 days) the average RMSE over the 50 years is
Average RMSE in PAA approximation of time series at different segment sizes. London 1961–2010
Figure 6 shows the performance of SAX-Q for a given PAA-Q partition and the summer of 1989. The year 1989 was chosen as it represents a moderately warm year and is used as a standard warm summer by the Chartered Institution of Building Services Engineers (CIBSE, UK) [9].

SAX-Q conversion process for London’s temperatures time series with length 549, w = 9, resolution 4 (a, b, c, d), and quantiles 0.05 and 0.95. Temperature variations in summer 1989 w.r.t. the baseline period 1961–2010.
After obtaining the output of the MRQ process for London it is found that the warmest temperatures typically occur during the last week of July and the first days of August. Figure 7 shows the histogram of the normalised probabilities of 2-level SAX-Q in the observed time series.

Probability of warmest temperatures during London summers (1961–2010).
Figure 8(a) shows how the heatwave frequency changes over the study period. In the first 25 years the ratio of years with heatwaves to years without is 18/7. However, post year 25 (1986), the ratio of years with heatwaves to years without increases to 21/4. This increase in heatwave frequency is even higher than that observed for the previous case-study of Recife.
Figure 8(b) shows how the mean temperature associated with each heatwave has increased, with the mean heatwave temperature being about 4°C higher towards the end of the series than near the beginning. Applying linear regression suggests an increase in heatwave magnitude of nearly 1°C per decade; this should be compared to the overall temperature trend during the summer period, which is less than 0.15°C per decade.

Change in frequency and intensity of heatwave candidates. London 1961–2010. (a) Frequency of warmer temperatures. (b) Trend of average temperatures associated with heatwave events.
It is interesting to note the difference in the values of the candidate heatwave temperatures identified by the process for Recife and London. While the candidate temperatures for heatwaves in Recife have a minimum value over 27°C, in London 18°C at night is enough to indicate the temperature is potentially part of a heatwave – a 9°C difference between the two cities. However, the average temperature of the selected dataset in Recife is around 29°C, compared with 24°C in London, only a 5°C difference.
The proposed method is blind to the temperatures themselves and suggests a possible heatwave based only on the variability and the size of the variation; it is therefore useful in that it is universal. The corollary, however, is that if given an unsuitable time series it might identify heatwaves where the maximum temperature reached is still below any common definition of a heatwave. In a practical application there would hence need to be a heuristic for rejecting unsuitable time series or locations, or one applied after identifying the heatwave; for example, that the peak temperature must be at least 28°C. However, this constant would depend greatly on the location: 28°C is a high temperature in London, for example, but not in Saudi Arabia.
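A post-identification heuristic of this kind could be sketched as below. This is only an illustration: the candidate format (start and stop indices into the temperature series), the function name and the toy values are assumptions, and the 28°C threshold is the location-dependent example given in the text.

```python
def filter_by_peak(series, candidates, min_peak):
    """Keep only candidate windows whose peak temperature reaches min_peak.

    candidates: list of (start, stop) index pairs into series (assumed format).
    """
    return [
        (start, stop)
        for start, stop in candidates
        if max(series[start:stop]) >= min_peak
    ]


# Hypothetical temperatures (in °C) and two candidate windows.
temps = [19.0, 24.0, 26.5, 25.0, 21.0, 27.0, 29.5, 30.0, 28.0, 22.0]
candidates = [(1, 4), (5, 9)]
print(filter_by_peak(temps, candidates, min_peak=28.0))
```

The threshold is passed as a parameter precisely because a value suitable for London would be unsuitable elsewhere.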
The paper presents a new method based on a fundamental modification of the SAX approach for time series, called SAX-Q, which focuses on the symbolic approximation of the quantiles instead of using the mean. Its iterative use leads to an automatic detection process for periods of interest w.r.t. time series extremes.
Multi-resolution analysis appears to be an ideal tool to uncover relationships at different temporal scales in weather data that are normally hidden by background noise. A posterior analysis of a multi-resolution SAX-Q has been carried out on the summer temperature records of Brazil and the UK, and heatwaves have been identified in these two very different climates, as the methodology automatically adapts itself to local conditions. Fundamental questions about when, how often and how intensely high temperatures occur have been answered for both locations. Most importantly, in both locations heatwave intensity and frequency are increasing at a substantial rate. In addition, it was found that the rate of increase in intensity of the heatwaves is far outstripping the rate of increase in mean summer temperature: by a factor of 2 in Recife and a factor of 6 in London.
An interesting extension to this work would come from the relaxation of the PAA-Q partition proposed, by instead using a dynamical basis for segments or change points for the initial SAX-Q partition of the time series. This might provide better insights about the length of the hottest periods lying in the data.
Acknowledgements
This research has been performed under the COLBE (The Creation of Localized Current and Future Weather for the Built Environment) project funded by the UK’s Engineering and Physical Sciences Research Council (EP/M021890/1).
