## Abstract

In the current context of global infectious disease risks, a better understanding of the dynamics of major epidemics is urgently needed. Time-series analysis has appeared as an interesting approach to explore the dynamics of numerous diseases. Classical time-series methods can only be used for stationary time-series (in which the statistical properties do not vary with time). However, epidemiological time-series are typically noisy, complex and strongly non-stationary. Given this specific nature, *wavelet analysis* appears particularly attractive because it is well suited to the analysis of non-stationary signals. Here, we review the basic properties of the wavelet approach as an appropriate and elegant method for time-series analysis in epidemiological studies. The wavelet decomposition offers several advantages that are discussed in this paper based on epidemiological examples. In particular, the wavelet approach permits analysis of transient relationships between two signals and is especially suitable for gradual change in force by exogenous variables.

## 1. Introduction

Globalization has altered human infectious disease risks due to land use changes, intensification of agriculture, loss of biodiversity, climate change, population increase, urbanization, and increased travel. Emerging and re-emerging infectious diseases are nowadays an increasing public health challenge. Disease emergences reveal the complex dynamical relationships between humans, pathogens and the global environment (Morens *et al*. 2004; Woolhouse *et al*. 2005). Recently, new concerns about global warming have yielded numerous studies about the role of climate variability and climate change in interannual disease patterns (McCarthy *et al*. 2001; Cazelles *et al*. 2005; Patz *et al*. 2005; McMichael *et al*. 2006).

Within this context, developing effective public health policy requires integrating data that are diverse and highly variable in quality. Mathematical models of disease transmission can provide a framework for improving our understanding of the complex dynamics of infectious disease epidemics (Anderson & May 1991). This is crucial in attempts to design effective intervention and control strategies. Since the early 1900s, sophisticated mathematical and statistical methods have been used (Ross 1911; Greenwood 1924). Nevertheless, the lack of appropriate datasets has impeded the validation of mechanistic mathematical models. More recently, time-series methods have appeared as an interesting alternative and have been used to explore the dynamics of numerous epidemics (e.g. Helfenstein 1986; Catalano & Serxner 1987).

Time-series are often used in short-term analyses of air pollution and human health (Dominici *et al*. 2002; Bell *et al*. 2004). To take into account the time dependences, trends and cycles in epidemiological time-series, spectral analysis has also been used. For instance, Bishop (1977) has presented a brief review illustrated by epidemiological examples. Bloomfield (1976) has investigated the use of Fourier analysis as a tool for determining whether aggravation of asthma symptoms is related to daily minimum temperature and/or to atmospheric pollutants. Spectral techniques have also been used to remove seasonal components before using regression models for analysing the relation between levels of particulate air pollution and daily mortality (Schwartz 1993; Dominici *et al*. 2002). Pascual and co-authors have used spectral techniques coupled with other approaches (singular spectrum analysis and non-parametric regression models) to study the influence of large-scale climatic oscillations on cholera epidemics (Pascual *et al*. 2000; Rodó *et al*. 2002).

These classical techniques can only be used for time-series in which the statistical properties do not vary with time, i.e. are stationary. However, epidemiological time-series are typically noisy, complex and strongly non-stationary. Long-term changes in climate, human demography and/or social features of human populations have generated non-stationarities in numerous epidemics datasets as underlined by the analyses of measles and whooping cough recordings (Duncan *et al*. 1996; Rohani *et al*. 1999). Recently, a robust relationship between El Niño oscillations and cholera prevalence in Bangladesh has been clearly demonstrated though the association was transient over time (Rodó *et al*. 2002).

To overcome the problems of analysing non-stationary time-series, it has recently been proposed to apply wavelet analysis to characterize them and also to estimate dependencies among non-stationary signals. Wavelet analysis performs a time-scale decomposition of the signal, which means the estimation of its spectral characteristics as a function of time (Lau & Weng 1995; Torrence & Compo 1998). This approach reveals how the different scales (i.e. the periodic components) of the time-series change over time. Cross-wavelets and wavelet coherency generalize these methods, allowing the analyses of scale dependencies between the two signals.

Wavelet analysis appears particularly attractive given the specific nature of epidemiological and environmental time-series and the relationships between them. In 2001, Grenfell and collaborators published a paper that analysed synchrony patterns of measles in the UK (Grenfell *et al*. 2001). They used wavelets to show for the first time a progressive increase in epidemic phase with time, accompanying the increasing trend in vaccination rates. Since this work, several applications of wavelet analysis have been published. Broutin *et al*. (2005) performed a wavelet-based analysis of pertussis time-series in 12 countries to detect and quantify periodicity and synchrony between them. They showed a clear 3–4-year cycle in all countries, but the main finding was that this periodicity was transient and no global pattern in the effect of vaccination on pertussis dynamics emerged. Some of these papers were based on the characterization of time-series and the analysis of their possible association with environmental signals. Cazelles *et al*. (2005) used this approach to demonstrate a highly significant but discontinuous association between El Niño, precipitation and dengue epidemics in Thailand. This transient association has important consequences for the dynamics of this epidemic: when association with El Niño is strong, high synchrony of dengue epidemics over Thailand is observed. When this association is absent, the seasonal dynamics become dominant and the synchrony initiated in Bangkok collapses. Chaves & Pascual (2006) recently described the oscillating dynamics of cutaneous leishmaniasis incidence in Costa Rica using several methods (including wavelets) and provide evidence for their association to climate variability. Constantin de Magny *et al*. (2006) documented an association between cholera incidence in Ghana and some climatic proxies (local precipitations and surface temperature) in the 1980s. In a subsequent work, they also demonstrated a significant synchrony of cholera epidemics across countries of the Gulf of Guinea, synchrony that would be driven by the association between global and local climatic forcing (Constantin de Magny *et al*. submitted). Researchers have used the wavelet approach to compare the frequency features of simulated and observed data (e.g. Koelle & Pascual 2004). Wavelet analysis of epidemiological time-series has also been compared with other classical spectral techniques (e.g. José & Bishop 2003). Other analyses investigated the phenomenon of population synchrony where wavelets are employed to extract the phase of the time-series (e.g. Rohani *et al*. 2003 or Xia *et al*. 2004).

In the following, we present the basic ideas of the wavelet analysis and its key features illustrated with both synthetic and observed epidemiological examples. Our main goal is to emphasize the advantages of these techniques in the context of time-series analyses in epidemiology.

## 2. Wavelet analysis

Although Fourier analysis is well suited to quantifying constant periodic components in a time-series, it is not able to characterize signals whose frequency content changes with time. On the other hand, a Fourier decomposition may determine all the spectral components embedded in a signal and does not provide any information about when they are present. To overcome this problem, several solutions have been developed to simultaneously decompose a signal as a function of both time *t* and frequency *f* (period or scale *a*).

Gabor (1946) introduced a windowed Fourier decomposition to quantify the time–frequency content of signals (figure 1). The time–frequency localization of this approach is, however, inefficient because the frequency resolution is the same for all the frequencies. A transient (with higher frequencies) needs a high time resolution to be well localized in time (figure 1). In contrast, a low-frequency structure might need a small time resolution (figure 1).

The wavelet transform decomposes a signal using functions (*wavelets*) that narrow when high-frequency features are present and widen on low-frequency structures (Daubechies 1992; Lau & Weng 1995). This decomposition yields a good localization in both time and frequency (figure 1), which is well suited for investigating the temporal evolution of aperiodic and transient signals. Indeed, wavelet analysis is the time–frequency decomposition with the optimal trade-off between time and frequency resolution (Lau & Weng 1995; Mallat 1998).

### 2.1 Wavelet transform

The wavelet transform of a signal *x*(*t*) (a time-series) is defined as(2.1)where ‘^{*}’ denotes the complex conjugate form. In the definition, parameters *a* and *τ* denote the dilation (scale factor) and translation (time shift), respectively. The wavelet transform can be thought as a cross-correlation of a signal *x*(*t*) with a set of wavelets *ψ*((*t*−*τ*)/*a*) of various ‘widths’ or scales *a*, at different time position *τ* (figure 2*a*). The wavelet coefficients, *W*_{x}(*a*, *τ*), represent thus the contribution of the scales *a* at different time positions *τ*. In equation (2.1), the factor normalizes the wavelets so that they have unit variance and hence are comparable for all scales *a*.

Note that the choice of the wavelet function *ψ*(*t*) is not arbitrary. This function is normalized to have unitary variance and it verifies . The wavelet decomposition is therefore a linear representation of the signal where the variance is preserved (Daubechies 1992). This implies that the original signal can be recovered by means of the inverse wavelet transform,(2.2)where and denotes the Fourier transform of *ψ*(*t*).

The wavelet transform appears basically as a linear filter whose response function is given by the wavelet function. By means of the inverse transform, the original signal can be recovered by integrating over all scales *a* and locations *τ*, respectively. Nevertheless, one can also limit the integration over a chosen range of scales, *a*_{1} to *a*_{2}, to perform a band-pass filtering of the original time-series in this chosen range of scales.

### 2.2 Different wavelets

There are several considerations in making the choice of a wavelet, for example, real versus complex wavelets, continuous or discrete wavelets, orthogonal versus redundant decompositions. Briefly, the continuous wavelets often yield a redundant decomposition (the information extracted from a given scale band slightly overlaps that extracted from neighbour scales) but they are more robust to noise as compared with other decomposition schemes. Discrete wavelets have the advantage of fast implementation but generally the number of scales and the time invariant property (a filter is time invariant if shifting the input in time correspondingly shifts the output) strongly depend on the data length. If quantitative information about phase interactions between two time-series is required, continuous and complex wavelets provide the best choice (further details can be found in Mallat 1998). However, all the wavelets share a general feature: low oscillations have good frequency and poor time resolution, whereas fast oscillations have good time resolution but a lower frequency resolution.

One particular complex continuous wavelet, the Morlet wavelet, is defined as(2.3)This wavelet is the product of a complex sinusoidal exp(−i2*ω*_{0}*t*) by a Gaussian envelope (exp(−*t*^{2}/2)) where *ω*_{0} is the central angular frequency of the wavelet. The term *π*^{−1/4} is a normalization factor to ensure unit variance. For our applications we have chosen this wavelet.

For the Morlet wavelet, the relation between frequencies and wavelet scales is given by . When *ω*_{0}≈2*π*, the wavelet scale *a* is inversely related to the frequency, i.e. *f*≈1/*a*. This greatly simplifies the interpretation of the wavelet analysis and one can replace, on all equations, the scale *a* by the period 1/*f* (or wavelength).

### 2.3 Wavelet power spectrum

In some sense, the wavelet transform can be regarded as a generalization of the Fourier transform and by analogy with spectral approaches, one can compute the *local wavelet power spectrum* by . The Fourier spectrum of a signal can be compared with the global wavelet power spectrum which is defined as the averaged energy (the averaged variance) contained in all wavelet coefficients of the same scale (period) *a*.(2.4)with the variance of the time-series *x* and *T* the duration of the time-series. Another interesting computation is the mean variance at each time location, obtained by averaging the scale components,(2.5)

Figure 2 attempts to visualize the underlying processes associated with the computation of the wavelet transform and the wavelet power spectrum. In figure 2*a*, Morlet wavelets of scales (periods) *a*=2 and 4-year centred at two different time locations *τ*_{i} are shown superimposed on an epidemiological time-series. The time-series used is the number of infectious cases generated by a susceptible–exposed–infectious–recovered model (SEIR) described in caption of figure 2. In the case of good matching between the time-series of the infectious *x*(*t*) and the wavelets *ψ*((*t*−*τ*)/*a*), as shown for *a*=2-year at location *τ*_{1}=4.2 year, the integral of the product of the time-series with this wavelet (2.1) produces a large value for the wavelet transform, and then a large value for the wavelet power spectrum *S*_{x}(*a*, *τ*_{1}) at this position *τ*_{1} (figure 2*b*). When the matching is low, as for the Morlet wavelet with *a*=4-year at location *τ*_{2} (figure 1*a*), *S*_{x}(*a*, *τ*_{2}) takes low values (figure 2*b*). By moving the wavelet along the time-series (by increasing the *τ* parameter), structures relating to a specific scale (period) *a* can be identified by high value of *S*_{x}(*a*, *τ*). This process is repeated over continuous range of *a* and *τ* for identifying all the coherent structures within the time-series producing a two-dimensional surface of *S*_{x}(*a*, *τ*) (figure 2*c*). In this example, with the wavelet power spectrum, we are able to recover the two dominant periodic components of the SEIR model with periods 1- and 2-year in accordance with the classical approaches (Aron & Schwartz 1984).

### 2.4 Wavelet coherency and phase difference

In many applications, it is desirable to quantify statistical relationships between two non-stationary signals. In Fourier analysis, the coherency is used to determine the association between two signals, *x*(*t*) and *y*(*t*). The coherence function is a direct measure of the correlation between the spectra of two time-series (Chatfield 1989). To quantify the relationships between two non-stationary signals, the following quantities can be computed: the *wavelet cross-spectrum* and the *wavelet coherence*.

The wavelet cross-spectrum is given by with ‘^{*}’ denoting the complex conjugate. As in the Fourier spectral approaches, the wavelet coherency is defined as the cross-spectrum normalized by the spectrum of each signal,(2.6)where ‘〈 〉’ denotes a smoothing operator in both time and scale. The smoothing can be obtained by a convolution with a constant-length window function both in the time and scale axis: denotes a smoothing operation with the weight function *U*_{δ,Δ}(*α*,*t*) that satisfies (Chatfield 1989). The values of *R*_{x,y}(*a*, *τ*) are thus bounded by 0≤*R*_{x,y}(*a*, *τ*)≤1. The wavelet coherency is equal to 1 when there is a perfect linear relation at particular time and scale between the two signals, and equal to 0 if *x*(*t*) and *y*(*t*) are independent. The advantage of these ‘wavelet-based’ quantities is that they may vary in time and can detect transient associations between studied time-series (Liu 1994).

As with the Morlet wavelet the *W*_{x}(*a*, *τ*) is a complex number, one can write *W*_{x}(*a*, *τ*) in terms of its phase *ϕ*_{x}(*a*, *τ*) and modulus ‖*W*_{x}(*a*, *τ*)‖ (Le Van Quyen *et al*. 2001). The local phase of the Morlet wavelet transform is proportional to the ratio between the imaginary part () and the real part () of the wavelet transform,(2.7)The phase of a given time-series *x*(*t*) can be viewed as the position in the pseudo-cycle of the series and it is parameterized in radian ranging from −*π* to *π*. The phases can then be useful to characterize possible phase relationships between *x*(*t*) and *y*(*t*) by computing the phase difference *ϕ*_{x,y}(*a*, *τ*)=*ϕ*_{x}(*a*, *τ*)−*ϕ*_{y}(*a*, *τ*) or(2.8)A unimodal distribution of the phase difference (for the chosen range of scales or periods) indicates that there is a preferred value of *ϕ*_{x,y}(*a*, *τ*) and thus a statistical tendency for the two time-series to be phase locked. Conversely, the lack of association between the phase of *x*(*t*) and *y*(*t*) is characterized by a broad and uniform distribution. To quantify the spread of the phase difference distribution, one can use circular statistics or quantities derived from the Shannon entropy (Pikovsky *et al*. 2001; Cazelles & Stone 2003).

### 2.5 Assessment of statistical significance

As with other time-series methods, it is crucial to asses the statistical significance of the patterns exhibited by the wavelet approach. To this end, bootstrap methods have been used to quantify the statistical significance of the computed patterns. The idea is to construct, from observed time-series, control datasets, which share with the original series some properties but are constructed under the following null hypothesis: *the variability of the observed time-series or the association between two time-series is no different to that expected from a purely random process*. The construction of our control datasets was performed by classical bootstrap (Efron & Tibshirani 1993), but other resampling schemes are possible (Cazelles & Stone 2003). To test whether the raw time-series is inconsistent with the null hypothesis, we have computed the wavelet transform and related quantities for each time-series of the control set. Then we can compare the original values computed from raw series with the distribution of the same quantities under the null hypothesis, extracting, for instance, the 99th or the 95th quantiles of these distributions.

Epidemiological and environmental time-series are very often short and noisy. The values of the wavelet transform are generally corrupted as the wavelet approaches the edges of the time-series, creating a boundary effect. Further, the affected region increases in extent as the scale (period) parameter *a* increases. This region is known as the cone of influence (Torrence & Compo 1998) and the spectral information within this cone is likely to be less accurate.

### 2.6 Available wavelet toolboxes

Several wavelet toolboxes are catalogued on the Amara's wavelet page (www.amara.com/current/wavelet.html). A general time–frequency toolbox developed for GNU Octave and Matlab is available at tftb.nongnu.org/. Another toolbox widely used in epidemiology and geosciences is that of Torrence & Compo (1998) (atoc.colorado.edu/research/wavelets/). Nevertheless, these toolboxes only consider univariate time-series analysis. Other collections of functions have been recently developed to explore bivariate time-series (including wavelet coherence) in geosciences: Maraun & Kurths (2004) propose a set of R functions (www.agnld.uni-potsdam.de/∼maraun/wavelets/) whereas Grinsted *et al*. (2004) have extended the Matlab functions of Torrence & Compo (www.pol.ac.uk/home/research/waveletcoherence/). The Matlab functions used in our research, as well as some examples of analysis of epidemiological time-series can be downloaded at ecologie.snv.jussieu.fr/cazelles/wavelets/.

## 3. Analysis of epidemiological time-series

### 3.1 Characterization of time-evolving epidemics

The wavelet approach enables the evolution of the oscillating characteristics of a given time-series to be described. To illustrate this, we use a synthetic time-series generated by a classical SEIR model with chaotic and transient dynamics (as described in the caption of figure 2; Tidd *et al*. 1993; Engbert & Drepper 1994).

The results of wavelet analysis are summarized in figure 3*a*–*d*. Figure 3*a* displays the transient dynamics of the infectious population simulated by the SEIR model. Figure 3*b* displays the classical Fourier spectrum (Chatfield 1989) of this time-series. There are large peaks at a period of 1, 2 and around 3–4 years. Although these periodic modes are present in the time-series (figure 3*a*), the Fourier spectrum is not able to specify when these modes are present. Conversely, the wavelet power spectrum can quantify the time evolution of these oscillatory modes and show when they are dominant (figure 3*c*).

This synthetic time-series is dominated by a 2-year mode (and its 1-year subharmonic) before time index 1340 (figure 3*c*). After this, the dynamics become more complex (chaotic, see Tidd *et al*. 1993) with numerous significant modes of oscillation, but the 3–4 year mode dominates (figure 3*c* and also figure 3*b*). Note that the wavelet power spectrum is coherent with the time-series, with a major shift in the periodicity around time index 1340. We can quantify the shift in the dominant mode by estimating the variance of the time-series explained by different modes using the scale-averaged wavelet power (2.5). Figure 3*d* shows the time evolution of the percentage of the variance explained by the 2- and the 3–4 year modes. In the first part of the time-series before the time index 1340, the 2-year mode explains more than 40% of the variance. After this, the 3–4 year mode becomes dominant, explaining more than 30% on an average of the variance.

The above example shows that wavelet analysis allows the detection of shifts in the periodic components of a given time-series. To illustrate the method using an observed time-series, we have analysed weekly measles notifications in the city of York (UK) for the pre-vaccination era, 1944–1966 (figure 3*e*–*g*; data from Grenfell *et al*. 2001). This series is dominated by the 2-year periodic mode (figure 3*f*). As with classical approaches, the wavelet power spectrum reveals not only the importance of this oscillatory mode, but also its time localization. Before 1960, the 2-year mode dominates (figure 3*g*) as for other cities (Grenfell *et al*. 2001) but after 1960, this periodic mode disappears and is replaced by more aperiodic dynamics (figure 3*g*). This observation is supported by examining the time evolution of the percentage of the variance explained by the 2-year mode. Figure 3*h* shows that the 2-year mode explains on average more than 50% of the variance before 1960 and a significant decrease of the dominance of this mode around 1960. This graph (figure 3*h*) also shows the rapid alternation of the 2-year and the 1-year modes after 1960.

### 3.2 Analysis of transient relationship between epidemics and environmental forcing

Cholera is a highly contagious disease caused by specific strains of the bacterium *Vibrio choleræ* that is naturally present in the environment and is autochthonous to coastal and estuarine ecosystems (Colwell 1996). Cholera is known to be influenced by climate and ocean conditions (Bouma & Pascual 2001; Pascual *et al*. 2002). Previous studies of cholera population dynamics have proposed links between oceanographic environmental conditions and human cases of cholera morbidity and mortality (Epstein 1993; Colwell & Huq 2001). Climate may also affect the dynamics of cholera by shifting pathogen abundance, host species abundance, population dynamics or community interaction. For example, cholera outbreaks in Peru and Bangladesh have been linked to periodic climatic cycles such as the El Niño Southern oscillation (ENSO; Colwell 1996; Salazar-Lindo *et al*. 1997; Pascual *et al*. 2000; Rodó *et al*. 2002). For example, Rodó *et al*. (2002) used singular spectral analysis and Fourier analysis in order to isolate the main interannual variability in long-term Bangladesh cholera time-series. In this study, the authors showed that the strong association between cholera dynamic and ENSO is discontinuous in time owing to the shifts in the ENSO frequency spectrum. They needed to analyse data in three successive time periods, with classical Fourier spectrum, in order to capture these changes. Our initial aim was to determine whether the associations between ENSO and cholera outbreaks in Ghana (Western Africa) are similar to those observed in Peru and/or Bangladesh, but with a method that potentially is able to capture the non-stationary features of the observed dynamics if present. Then we analysed with wavelets reported cholera cases for Ghana for the years 1975–2002, compiled from the weekly epidemiological record (available at the WHO website: www.who.int/wer/en/). As a proxy for ENSO, we used the southern oscillation index (SOI) for the same years (www.cpc.ncep.noaa.gov/data/indices/soi).

Figure 4*a* (i) displays the observed time-series. Results from the classical Fourier spectral analysis tell us that both time-series have similar oscillating components mainly around the 2–3 year and the 4–5 year modes (figure 4*a*, ii). This could suggest a possible association between cholera dynamics and ENSO. These associations estimated by the classical coherence (Chatfield 1989) are shown to be high for the 1.2-year, the 2.5-year and the 4-year modes (figure 4*d*, ii). Nevertheless, the results of the wavelet analysis lead to more restrained conclusions.

For cholera incidence in Ghana, the wavelet power spectrum shows a continuous oscillating mode at both 3–5 year and 6–8 year during the whole time period (figure 4*b*). Nevertheless, these modes of oscillation vary in strength. The dominant modes are the 3-year mode for 1989–1995, the 4-year mode for 1989–1996, around the 6-year mode for 1975–1985 and the 8-year mode for 1990–1998 (figure 4*b*, i). Wavelet analysis of SOI detected a significant 4-year periodic mode for the whole time-series but the SOI wavelet power spectrum is characterized by a strong 4-year periodic component during both the 1980–1989 and 1995–2002 time periods (figure 4*c*, i). During 1989–1995, one observed a shift of the dominant mode to the 5-year periodic band. The right of figure 4*c*,*d* display the average wavelet power spectrum (2.4) that are very similar than those obtained with the Fourier method (figure 4*a*, ii). Although wavelet analysis reveals that these two time-series have similar oscillating components, these components are not always present at the same time. In fact, the wavelet coherence (that quantifies the association between the time-series) is significant only for 1980–1985 around the 4-year mode (figure 4*d*, i). These results are in contrast to those obtained with the classical approach (figure 4*d*, ii).

Wavelet analysis suggests that the incidence of cholera cases in Ghana is weakly coherent with ENSO except for 1980–1985 in the 4-year periodic band. This pattern of association appears more complicated than those observed in other regions (Salazar-Lindo *et al*. 1997; Pascual *et al*. 2000). This particular analysis reveals that the timing of association between disease dynamics and environmental forcing must not be disregarded. Wavelet analysis is one of the statistical tools with the power account for such time-evolving relationships.

### 3.3 Analysis of spatial synchrony of epidemics

Using monthly data for Thailand, Cummings *et al*. (2004) identified travelling waves of dengue hemorrhagic fewer (DHF), initiated in the capital city, Bangkok. In subsequent research, it has been demonstrated that this synchrony is transient and linked to El Niño (Cazelles *et al*. 2005).

The data are monthly reports of DHF in each of 72 provinces of Thailand (Cummings *et al*. 2004). We analysed two incidence time-series from this dataset: the incidence in Bangkok and the averaged incidence for the rest of Thailand (figure 5*a*). The oscillations of the dengue incidence time-series are dominated by the annual mode, but they also have a statistically significant common mode of oscillation around a period of 2–3 year (Cazelles *et al*. 2005).

We computed the different temporal associations between the DHF incidence in Bangkok and the rest of Thailand, showing three main regions of high and significant coherence (figure 5*b*). The first one is for the 2–3 year periodic band for the time period 1983–1991, the second is for the 1-year mode for 1983–1985 and for 1991–1997, and the last for the 5-year mode after 1988. We also analysed the phases of the time-series that permit to obtain information about the possible delay in the relationship (i.e. in phase or out of phase relations; figure 5*c*). This phase analysis is completed by the computation of the evolution of periodic components in the 2–3 year and the 1-year modes for DHF in Bangkok and in the rest of Thailand (figure 5*d*,*e*). The two incidence series are phase locked with a mean delay of three months for the 2–3 year band (figure 5*c*,*d*), but mainly within the period 1983–1992 where there is high coherence with El Niño oscillations (Cazelles *et al*. 2005). For 1983–1986 and 1991–1997, the seasonal mode is dominant. During these years, phase locking is seen only for the seasonal mode and the dengue incidence in Bangkok follows the incidence in the remainder of Thailand with an average delay of one month (figure 5*e*).

The wavelet allows us to explore the transient nature of the spatial synchrony of dengue incidence in Thailand and the potential influence of El Niño on these complex phenomena. With wavelets, we are able to state that the observed travelling waves in the 2–3 year mode described by Cummings *et al*. (2004) coincides with a period of high coherence between dengue and El Niño, convincingly suggesting that climate may be involved in these travelling waves (Cazelles *et al*. 2005).

## 4. Concluding remarks

Wavelet analysis is an important addition to time-series methods with practical applications in epidemiology. We have reviewed the basic concepts of the wavelet analysis and illustrated the approach using examples, which appear particularly interesting from an epidemiological point of view. Wavelet analysis can help us to interpret multi-scale, non-stationary time-series data and reveals features we could not see otherwise. This is clearly the case for both the complex relationship between cholera incidence in Ghana and SOI and for the transient spatial synchrony of DHF in Thailand.

The major aims of much current epidemiological research are to characterize and understand disease processes and the potential influence of exogenous changes on these processes. In this context, as experiment is usually not possible, the retrospective (historical) approaches are frequently employed. Retrospective approaches use a mode of analysis, which is dependent on the comparative and observational richness of the data. A key requirement is to take into account the major characteristics of the observations that mirror the underlying properties of the system. As stressed in §1, epidemiological and environmental time-series observed are typically non-stationary: epidemiological time-series can change dramatically with time. These characteristics may make use of traditional techniques inappropriate and results from classical approaches must be interpreted with caution.

Using the wavelet approach, we have shown that it is possible to study irregular, non-stationary and noisy time-series, and also to analyse weak and transient interactions between such series. Wavelet power spectra quantify the main periodic component of a given time-series and its time evolution. Wavelet coherence is used to quantify the degree of linear relation between two non-stationary time-series in the time–frequency domain. The main advantage of the wavelet approach is the ability to analyse transient dynamics, for one-dimensional signals and for the association between two time-series. A further interesting aspect is the analysis of the phases of the studied time-series (Grenfell *et al*. 2001; Cazelles & Stone 2003). Phase analysis is a nonlinear technique that makes possible to study rather weak interactions (Cazelles & Stone 2003). Indeed, the notion of phase synchronization implies only some inter-dependence between phases, whereas the irregular amplitudes may remain uncorrelated (see Rohani *et al*. 2003 or Xia *et al*. 2004).

As a statistical analysis, wavelet approaches provide no information about the underlying epidemiological mechanisms. There is no single relation neither between cyclical features and biological or epidemiological mechanisms nor between relationships and mechanisms since a given pattern of association between series may be generated by a wide variety of different mechanisms. However, like the other correlative approaches, wavelet analysis can provide useful clues about the nature of the underlying epidemiological processes (e.g. Grenfell *et al*. 2001 or Cazelles *et al*. 2005). Such clues pave the way for future modelling approaches in which explicit mechanisms can be incorporated. A great strength of wavelet analysis is its freedom from stationarity assumptions, providing a condensed, quantitative, temporally explicit summary of time-series. Wavelet analysis can be used as a first step for exploring the complexity of both the observed environmental and the epidemiological signals before the modelling stage. But this modelling approach must also take into account these non-stationary features. In this context, Bayesian approaches such as Kalman filtering (e.g. Cazelles & Chau 1997) or particle filters (e.g. Andrieu *et al*. 2001) seem very promising.

We hope that wavelet analysis will be more widely employed to analyse epidemiological time-series. The wavelet approach should be employed alongside other time-series tools to examine directly the relationship (association, dependence, synchrony) between studied time-series. We believe that wavelet analysis should yield significant future advances in our understanding of epidemiological process in our changing world.

## Footnotes

- Received November 22, 2006.
- Accepted January 2, 2007.

- © 2007 The Royal Society