Abstract
A cardinal challenge in epidemiological and ecological modelling is to develop effective and easily deployed tools for model assessment. The availability of such methods would greatly improve understanding, prediction and management of disease and ecosystems. Conventional Bayesian model assessment tools such as Bayes factors and the deviance information criterion (DIC) are natural candidates but suffer from important limitations because of their sensitivity and complexity. Posterior predictive checks, which use summary statistics of the observed process simulated from competing models, can provide a measure of model fit but appropriate statistics can be difficult to identify. Here, we develop a novel approach for diagnosing misspecifications of a general spatiotemporal transmission model by embedding classical ideas within a Bayesian analysis. Specifically, by proposing suitably designed noncentred parametrization schemes, we construct latent residuals whose sampling properties are known given the model specification and which can be used to measure overall fit and to elicit evidence of the nature of misspecifications of spatial and temporal processes included in the model. This model assessment approach can readily be implemented as an addendum to standard estimation algorithms for sampling from the posterior distributions, for example Markov chain Monte Carlo. The proposed methodology is first tested using simulated data and subsequently applied to data describing the spread of Heracleum mantegazzianum (giant hogweed) across Great Britain over a 30year period. The proposed methods are compared with alternative techniques including posterior predictive checking and the DIC. Results show that the proposed diagnostic tools are effective in assessing competing stochastic spatiotemporal transmission models and may offer improvements in power to detect model misspecifications. Moreover, the latentresidual framework introduced here extends readily to a broad range of ecological and epidemiological models.
1. Introduction
Stochastic spatiotemporal models are playing an increasingly important role in epidemiological and ecological studies relating to transmission of diseases [1,2], invasion of alien species [3] and population movements in response to climate changes [4]. It is well known that the predicted dynamics of such systems can be extremely sensitive to the choice of model, with consequent implications for the design of control strategies [1,5–7], but as yet there is a lack of effective model assessment tools described in the literature. For example, studies of foot and mouth disease have cited the importance of selecting between a longtailed spatial kernel (see §2) versus a localized spatial kernel, but this modelchoice problem is far from being resolved [5,8]. Further modelchoice problems arise in relation to the parametric form of the distributions of incubation and infectious periods in models of measles [9–11], and in relation to diseases such as smallpox [12] and AIDS [13].
Bayesian model assessment techniques appear appealing [14–16] particularly because many of the above studies use Bayesian techniques for model fitting. However, it is well known that this approach is extremely sensitive to prior assumptions regarding the distribution of parameters within the competing models. Moreover, the computational challenges presented by Bayesian modelchoice algorithms can be prohibitive. An alternative approach, based on the deviance information criterion (DIC) [17], is known to be problematic when applied to processes that are only partially observed [18], as in the case of most realworld systems. Furthermore, neither of these approaches leads to a diagnostic tool for assessing the fit of a model in absolute terms; rather they provide an assessment of the relative fit of the competing models. Posterior predictive checks [19], whereby summary statistics of the observed process are compared to their predictive distributions, do provide a measure but appropriate statistics can be difficult to identify [20].
We address this gap in available methodology by pursuing the following objectives:

— to innovate a statistically sound framework for assessing stochastic spatiotemporal models, which can be readily implemented as an addendum to a Bayesian analysis and which avoids the sensitivity and complexity of Bayesian model assessment;

— to illustrate how the approach can be targeted to assess particular aspects of a spatiotemporal stochastic model, here principally the choice of spatial kernel; and

— to demonstrate the effectiveness of the approach using simulated data and to apply it to an ecological dataset describing the spread of an alien species throughout Great Britain.
The approach adopted (see §2) involves representing stochastic spatiotemporal models using appropriately designed noncentred parametrization schemes [21], from which latentresidual processes can be defined. The assessment of fit of a model to a given dataset is then achieved in the Bayesian framework by imputing these residuals, and testing them for compliance with their (known) sampling model using classical tests. The approach has its roots in the framework proposed in [19], and extended in [20,22]. The key innovation in this paper is to design the residual processes so that the resulting tests are sensitive to misspecification of specific aspects of the model under consideration. In particular, we formulate tests for detecting misspecification of the spatial transmission kernel as this aspect typically has major implications for control strategies, for example based on culling or removal of susceptibles in the neighbourhood of infected individuals.
2. Model and methods
2.1. Spatiotemporal stochastic model
We consider a broad class of spatiotemporal stochastic models exemplified by the SEIR epidemic model with susceptible (S), exposed (E), infectious (I) and removed (R) compartments. Suppose that we have a spatially distributed population indexed 1, 2,…, denote ξ_{S}(t), ξ_{E}(t), ξ_{I}(t) and ξ_{R}(t) as the set of indices for individuals who are in classes S, E, I and R, respectively, at time t, and let S(t), E(t), I(t) and R(t) be the respective numbers of individuals in these classes at time t. Then individual j∈ξ_{S}(t) becomes exposed during [t, t + dt) with probability 2.1where α represents a primary infection rate and β is a contact parameter. The term K(d_{ij}, κ) characterizes the dependence of the infectious challenge from infective i to j as a function of distance d_{ij} and is known as the spatial kernel function. Following exposure, the random times spent by individuals in classes E and I are modelled using a suitable distribution such as a Gamma or a Weibull distribution [12,23,24]. Specifically, we use a Gamma(μ, σ^{2}) parametrized by the mean, μ, and variance, σ^{2}, for the random time x spent in class E with density function 2.2For the random time x spent in class I, we use a Weibull(γ, η) parametrized by the shape and scale with density function 2.3All sojourn times are assumed independent of each other given the model parameters.
2.2. Latent residuals
Let Z denote the complete set of data (that is the time, nature and affected individual for all transitions between states) describing an epidemic generated randomly from the above model parametrized by θ. Then, as long as the sampling properties of the stochastic model described by equation (2.1) are preserved, we can consider Z to be generated in nonunique ways. In particular, we consider Z as a deterministic function h_{θ}(·) of a random vector where the components of the latter are generated as a random sample from a Uniform(0, 1) distribution. That is 2.4
This representation is essentially a functional model in the sense of [25] and is an illustration of the concept of generalized residuals proposed in [26]. Note that the selection of a residual process and a function h_{θ}(·) which together specify the model given by equation (2.1) can be effected in a multiplicity of ways. In §2.2.2, we consider how this selection may be done in order to facilitate the design of statistical tests based on that can be sensitive to misspecification of particular aspects of the model.
The particular construction we exploit involves a process that can be partitioned into four independent random samples from Uniform(0, 1) and expressed as where each j = 1, 2, 3, 4, is a vector which determines events relating to a different aspect of the process. The process defines a set of populationlevel thresholds from which the time of each subsequent exposure can be determined by considering the integrated infectious challenge. For the kth exposure (ordered temporally), and specify the quantile of the random sojourn times in class E and I, respectively. The residuals determine the particular infectious contacts that generate each exposure. We now describe concisely how the epidemic process can be constructed through the residuals , and A full description of the residuals can be found in the electronic supplementary material.
2.2.1. Exposure time residuals
We refer to the residuals as exposure time residuals (ETRs). Starting from the (k − 1)th exposure event, we define the accumulated infectious challenge in the population by time t as where t_{k−}_{1} is the time of the (k − 1)th exposure event. The time of kth exposure is then determined from 2.5
2.2.2. Infectionlink residuals
We refer to the residuals as infectionlink residuals (ILRs). Given that the kth exposure event occurs during (t_{k}, t_{k} + dt), and given the other transitions that have occurred prior to t_{k}, the probability that the respective contact is between susceptible and infective is given by 2.6
Note that the primary infection process can be accommodated by adding a notional and permanently infectious individual which presents a challenge α to every susceptible. To generate the particular infectionlink from the residual, , we arrange the p_{ij} in the ascending order p_{(1)}, … , p_{(m)}, where m = S(t_{k})(I(t_{k}) + 1) is the total number of ‘active’ links. The link responsible for the kth exposure is then determined from 2.7
so that individual j becomes exposed due to contact with individual i, and p_{ij} = p_{(s′)}. The inclusion of the ordering operation in this process is motivated by our aim of designing tests that may be sensitive to misspecification of the spatial kernel function in the model equation (2.1). Suppose that the modelled kernel function deviates from reality in a systematic way—for example by declining too rapidly or too slowly with distance. Then a heuristic argument (see the electronic supplementary material, Section 1.4) suggests that we may see a correspondingly systematic deviation from the U(0, 1) distribution when the residuals are imputed (as described in Imputation of infectionlink residuals) from observations, with a concentration, or scarcity of residuals at the extremes, so that model misspecification may be readily detected using standard tests of the fit of the imputed to the uniform distribution.
2.2.3. Latent time residuals
We refer to the residuals as latent time residuals (LTRs). For the kth exposure, define the accumulated pressure of becoming infectious by time t as 2.8where f(y) and F(y) are the density and cumulative distribution function of the latent period, respectively. The time of becoming infectious is then determined from 2.9We remark that the time of recovery can be determined similarly by using r_{4k} and an appropriate sojourn time distribution in class I.
2.3. Bayesian inference and model assessment
It is now standard practice to conduct Bayesian analyses of partially observed epidemics using the process of data augmentation supported by computational techniques such as Markov chain Monte Carlo (MCMC) methods [5,20,27,28]. Given partial data y, these approaches involve simulating from the joint posterior distribution where z represents the complete epidemic data as above. As applied to fit models in this paper, this approach is described more fully in the electronic supplementary material. As the complete epidemic z is reconstructed, or ‘imputed’, it naturally lends itself to the residualbased testing methods now described.
Given a random draw from it is generally straightforward to invert equation (2.4) to impute the corresponding residual by sampling it uniformly from the set (see §2.3.1), the set of residuals mapped to by Therefore, a sample from the posterior distribution can easily be generated with a minor insert to an existing algorithm. On applying a classical test for consistency with the uniform distribution to the (specifically, we use an Anderson–Darling hypothesis test [29]; see also the electronic supplementary material) a posterior distribution of pvalues, is generated, from which evidence against the modelling assumptions can be discerned. In Bayesian parlance, we note that the pair represents a noncentred parametrization.
2.3.1. Imputation of infectionlink residuals
The imputation of and given is straightforward by inverting the procedure specified by equation (2.5) and equation (2.9), respectively. Imputation of is achieved by inverting equation (2.7) but, as the infection link is discrete and the space of residuals continuous, the imputation process warrants description here. The particular infection link for the kth exposure event is randomly chosen from the links between the corresponding exposed individual k and according to probabilities p_{ik} defined in equation (2.6). The ranking of this particular infection link, , is then determined among all links between and infective i ∈ ξ_{I}(t). Finally, the residual is imputed as a random draw 2.10
Bayesian residuals have been used in other contexts [30]. In [20], it was shown that Bayesian latent residuals based on Sellke thresholds [31] could be used to assess the fit of spatiotemporal models. However, the specific approach is problematic when the epidemic is small as thresholds must be imputed even for uninfected individuals. By contrast, the construction proposed here requires residuals to be imputed only for each infection event and avoids this shortcoming. Moreover, as the components of the residual process each relate to a different aspect of the stochastic model, then it is plausible that testing for misspecification of a given aspect may be best achieved by considering only the relevant component of In particular, misspecification of a spatial kernel or the latent period distribution may be assessed by examining the posterior samples of (ILR) and (LTR), respectively. We stress again the importance for the detection of misspecified spatial kernels of the ordering operation (see §2.2.2) in the construction of the ILR, which is included with the expectation that it leads to systematic, detectable and interpretable deviations from U(0, 1) in the imputed residuals. This issue is discussed further in §3.1 and the electronic supplementary material, Section 1.4. As described in §§3 and 4, this hopedfor sensitivity is indeed achieved.
2.4. Interpretation of latentresidual tests
Posterior distributions of pvalues arising from a classical test applied to a latent process have been exploited in [20,22,32]. For completeness, we include some comments on the statistical interpretation of such distributions in the Bayesian context. To the Bayesian observer of data y, represents their posterior belief regarding the pvalue that a classical observer of would compute. Should this distribution be concentrated on small values, the Bayesian would infer that the classical observer may reject the hypothesis that the was generated as a random sample from a U(0, 1) distribution. The latter hypothesis is a key assumption for the functionalmodel representation given in equation (2.4) so that the classical observer would likewise question the validity of this model. Therefore, the Bayesian observer can extract from summaries, for example (as used here), and interpret them as measures of evidence against their model assumptions.
3. Simulated example
To test the methodology, we apply it to analyse spatiotemporal epidemics simulated in a population of size N = 1000, whose locations are sampled independently from a uniform distribution over a square region, between times t = 0 and t = t_{max} = 50. We assume that the entire population is susceptible at time 0, that the epidemic evolves according to equation (2.1) with α = 0.001, β = 3, and that the sojourn times in classes E and I follow Gamma(5, 2.5) and Weibull(2, 2) distributions, respectively. The observations y constitute only the precise times and locations of transitions from E to I and from I to R that occur during the observation period. Figure 1 illustrates the spatiotemporal progression of a typical realization of y.
To assess our modeltesting framework, we fit to the simulated data y a model with the correct structure (Case A), and three further models in which the spatial kernels (Cases B and C) and the latent period distribution (Case D) have been misspecified, respectively. Specifically, we consider

— Case A: and the latent period is distributed as Gamma(μ, σ^{2});

— Case B: and the latent period is distributed as Gamma(μ, σ^{2});

— Case C: and the latent period is distributed as Gamma(μ, σ^{2}); and

— Case D: and the latent period is distributed as Exp(μ).
A Weibull infectious period is fitted in all cases. Uniform priors, which should be constrained to bounded regions to ensure a proper posterior distribution, are specified for all model parameters estimated in the following analyses.
In each case, we use standard MCMC and data augmentation to generate a sample from from which—see §2—we impute posterior samples of the ILRs () and of the LTRs (). In addition, we impute posterior samples of The Anderson–Darling test for consistency with the uniform distribution is applied to each sample of the residuals. Posterior distributions of the model parameters are described in detail in the electronic supplementary material.
Table 1 shows the values of , j = 1, 2, 3, from three independently simulated replicates, y, of the epidemic. From and , it appears that these posterior summaries systematically give evidence against the model when the spatial kernel and the latent period have been misspecified, respectively. On the other hand, suggests no evidence against the model specifications in Cases B, C and D. Note that, for ease of comparison, we only present the value of for relevant cases. Values in cases not presented range from 3 to 6% and, therefore, suggest no evidence against the respective model specification.
3.1. Diagnosing model misspecification
We now illustrate the insights our approach offers for understanding the causes of model inadequacy. Specifically, having observed considerable evidence against a model from the measure , we show that the pattern of residuals can suggest the manner in which the fitted model may be deficient. Consider two (symmetric) scenarios. In Scenario I, the epidemic is simulated from a kernel K(d, κ_{2}) = d^{−}^{2.8} and fitted with a kernel K(d, κ_{1}) = exp(−κ_{1}d); in Scenario II, the epidemic is simulated from a kernel K(d, κ_{1}) = exp(−0.03d) and fitted with a kernel K(d, κ_{2}) = Under the assumption that the fitted model is correct, the imputed ILR should resemble samples from U(0, 1). To highlight any systematic deviations from this null hypothesis, figure 2 presents the histogram formed from the subset of the ILR processes that produce small pvalues (<0.05) revealing a symmetry between the two scenarios. Scenario I and Scenario II, respectively, lead to a concentration or a scarcity of residuals at the extremes of the unit interval. This symmetry suggests that the nature of the incompatibility of the spatial kernel may be diagnosed from systematically different deviations of the distribution of the ILR from U(0, 1). This is discussed in more detail in the electronic supplementary material, Section 1.4.
3.2. Comparison with common Bayesian model checking techniques
One common tool for model checking is the DIC [17]. The model with smallest DIC corresponds to the best model and, conventionally, models whose DIC exceeds that of the best model by more than 10 units are considered to display substantial evidence of poor fit. A key limitation of this approach is that it is known to be problematic when applied to processes that are only partially observed, as in the case of most realworld systems, where the DIC cannot be uniquely defined [18]. Following [18], we compute two versions of DIC, 3.1
and 3.2where X and y represent the unobserved and observed data, respectively, and is often estimated by posterior point estimates, for example the posterior mean, which is used here (note that DIC_{1} and DIC_{2} are referred to as DIC_{4} and DIC_{8} in [18]). The quantities and represent contributions to the likelihood from both the observation model and the process model in the first case, and the observation model alone in the second. Note that calculation of each version of the DIC requires expectations of these quantities which can be estimated using MCMC techniques. The main difference between DIC_{1} and DIC_{2} is that the first takes the unobserved data into account. However, there is no absolute theoretical justification for a preference of one definition over another.
Table 2 shows that, although both versions of DIC can differentiate the relative goodnessoffit between Case A and Case C as well as that between Case A and Case D, DIC_{2} misleadingly suggests that the fit for Case B is better than that for Case A. Note that the DIC is not a direct measure of model adequacy and only measures the relative goodnessoffit between two models. Moreover, as shown in table 2, the ranking of models can also vary between different versions of DIC.
A further approach to model checking is to compare the posterior predictive distribution of summary statistics with their observed values [20,28]. In the electronic supplementary material, Section 6, we consider posterior predictive checks based on common spatial autocorrelation measures including Moran's I and Geary's c indices [33], where application to simulated data shows that these measures are insensitive to the choice of model. Such summary statistics are based only on partially observed data and therefore reflect the behaviour of the competing models averaged over the missing data. By contrast, our method is based on the full posterior distribution of (imputed) unobserved data and other model parameters, which may explain its higher sensitivity to model misspecification.
We further consider the performance of DIC and posterior predictive checks in an application to British floristic atlas data in §4.
4. Case study: spread of giant hogweed in Great Britain
Invasive alien species represent a major threat to ecosystems and cause significant environmental and financial loss worldwide [34]. Heracleum mantegazzianum (giant hogweed) causes significant problems in Great Britain and has rapidly spread since 1970 [27]. We apply our testing framework to British floristic atlas data which assess the presence of giant hogweed over a square lattice of 10 × 10 km resolution in 1970, 1987 and 2000. In total, 2838 such squares are considered to be habitable for the giant hogweed (see [27]). These are classified as susceptible or colonized at the given survey times according to the absence or presence of giant hogweed in the lattice. These data are well suited to testing our methodology. Detection of the species is relatively easy because of its height (more than 2 m), so that the number of false absences in the dataset should be limited. Moreover, the data give ‘snapshots’ of the distribution at three distinct times (from 1970 to 2000) and over a large region making them particularly suitable for inferring the spatiotemporal transmission mechanism.
We first represent the lattice of square regions as a lattice of points where the position of a point is given by the centre of the square which it represents. Figure 3 shows the snapshots of the spread of giant hogweed in Great Britain taken at three distinct times (1970, 1987 and 2000). In the light of the aggressive nature of giant hogweed, we assume that, once colonized, sites can immediately start to colonize other sites and remain colonized. In the terminology of the epidemic model, we consider, therefore, the E and I classes to be a single class and dispense with the recovery class R from our model. Effectively, we fit an S–I (susceptible–infectious) model to the presence/absence data and use our model assessment methods to compare the goodnessoffit of several formulations, discussed in detail in the electronic supplementary material. In summary, the models differ in the choice of spatial kernel and with regard to the inclusion of terms quantifying the suitability of each site, j, for the species. Suitability is represented by a measure c_{j} ∈ [0, 1], where c_{j} are taken from an earlier analysis [27] in which an extensive range of covariates including average temperature, altitude and other factors were considered in their estimation. With suitability included, the instantaneous rate at which a susceptible site j becomes colonized equation (2.1) is moderated by a factor c_{j}.
Full model specifications and posterior estimates of the model parameters are described in the electronic supplementary material. Specifically, we consider three forms of spatial kernel with and without homogeneous suitabilities giving rise to six models:

— Model 1 (M1, kernel A): K(d_{ij}, κ_{1}) = exp(−κ_{1}d_{ij}), heterogeneous suitabilities, c_{j};

— Model 2 (M2, kernel B): K(d_{ij}, κ_{2}) = 1/(1 + d_{ij}/κ_{2}), heterogeneous suitabilities, c_{j};

— Model 3 (M3, kernel C): K(d_{ij}, κ_{3}) = d_{ij}^{−κ3}, heterogeneous suitabilities, c_{j};

— Model 4 (M4, kernel A): K(d_{ij}, κ_{1}) = exp(−κ_{1}d_{ij}), homogeneous suitabilities, c_{j} = 1;

— Model 5 (M5, kernel B): K(d_{ij}, κ_{2}) = 1/(1 + d_{ij}/κ_{2}), homogeneous suitabilities, c_{j} = 1; and

— Model 6 (M6, kernel C): K(d_{ij}, κ_{3}) = d_{ij}^{−κ3}, homogeneous suitabilities, c_{j} = 1.
These models are fitted to the data using Bayesian methods as described in the electronic supplementary material. For the simple SI formulation, the residual process reduces to and specifies exposure times and infection links. We apply three tests to imputed values of these residuals for each of the models. As with the simulated example, we investigate and arising from an Anderson–Darling test applied to the respective subset of residuals. We also consider a combined test, with pvalue , based on a test statistic whose distribution under the modelling assumptions is , and report for each model. Conclusions arising from the various tests are now presented.
4.1. Model assessment and implications for control strategies
From table 3, we first note that, regardless of whether dependence on suitability is included, and are larger for the models with Cauchyform kernel (kernel B, M2 and M5) than for the respective models with exponentially bounded kernel (kernel A, M1 and M4) or with powerlaw kernel (kernel C, M3 and M6), suggesting that the Cauchy kernel typically provides a poorer fit. When dependence on suitability is not included (M4, M5 and M6), the fact for M4 (kernel A) and for M6 (kernel C) suggests there are substantial probabilities that the U(0, 1) hypothesis for the imputed residuals would be rejected by the classical observer and calls these models into question. By contrast, the results for M1, M2 and M3 present no evidence against the model with exponentially bounded kernel (A) and powerlaw kernel (C), while there remains a substantial posterior probability of rejection (0.82) for the model with Cauchyform kernel (B). Figure 4 presents samples from for M1 and M2, highlighting the evidence against kernel B. It is clear from other results in table 3 that the evidence against a given model arises from the ILR residuals the test based on alone presents little evidence against any of the models M1–M6, while the evidence arising from the combined test is typically weaker than that from the tests of alone.
In summary, we find evidence that the dispersion mechanism for hogweed cannot be represented adequately by the Cauchy dispersal kernel, while no evidence against the exponentially bounded kernel and powerlaw kernel is found as long as habitat heterogeneity is accommodated. Figure S8 (see electronic supplementary material) shows that the longtail behaviour of M2 tends to induce a scarcity of residuals at the left end and a concentration of residuals at the right end of the unit interval. Although the ILR residuals were constructed with assessment of spatial kernels in mind, comparison of between models with and without the dependence on suitability (i.e. comparison between M1 and M4 and that between M3 and M6) highlights the potential for the method to detect misspecification of other aspects of the model. This is not surprising given the key role of and the suitabilities in the construction of the colonization links.
Giant hogweed spreads its seed mostly through wind, water and human activities [35]. Localized dispersal mechanisms typically involve the dispersal by wind or animal activities. Human activities, such as soil transport and transport of seeds adhering to car tyres, are mainly responsible for longdistance dispersal. Understanding of the importance of short and longdistance dispersal provides valuable insight for devising appropriate control strategies [36,37]. Our results and figure 5 clearly suggest that the spread of hogweed is mainly via a nearestneighbour mechanism. Given this highly localized dispersal mechanism, and the constraints imposed by the lattice structure of the hogweed data (§4) which forces a minimum distance of 10 km between two sites (in contrast to a more general continuous space in the simulated example), the difference between an exponentially bounded kernel and a powerlaw kernel becomes insignificant (also see figure 5). Hence, the two models display similar goodnessoffit. This suggests that control measures—for instance, education programmes of increasing public awareness and participation in prevention and reporting [38], and field survey and subsequent eradication measures [35,39]—may be most effectively deployed by focusing implementation on the neighbourhood of known colonizations.
4.2. Comparison with deviance information criterion and posterior predictive checks
Similar to §3, we compute two versions of DIC, DIC_{1} and DIC_{2} (see equations (3.1) and (3.2)), for M1 and M2 and the corresponding models M4 and M5 which do not take the suitability into account. From table 4, we note that, although both versions of DIC can differentiate the relative goodnessoffit between M1 and M2, they unreasonably indicate that M4 (exponentially bounded kernel without considering suitability) is the best or the second best model.
In §3 and in the electronic supplementary material, Section 6, we have shown that summary statistics quantifying spatial autocorrelation may be less sensitive to model misspecification than our latentresidual approach. We therefore focus on more intuitive summary statistics based on the number of colonizations. We also adopt a conservative approach, running forward simulations of the fitted model using point estimates of parameters, here the posterior mean. Moreover, these simulations are conditioned on the colonized sites observed in 1970. We focus on comparing models M1 and M2 (which include dependence on suitability and for which our residual analysis shows a significant difference in goodnessoffit) and examine the following predictive outcomes: the predictive distribution of the number of colonized sites, at the end of the observation period (2000), within annular regions centred on a given location (see the electronic supplementary material, figure S7, for a representation of the regions) and the numbers of reported colonized sites at the second and third observation times (1987 and 2000). Figure 6 and table 5 compare predicted distributions and the actual observations.
Figure 6 and table 5 suggest that, similar to §3.2, model checks based on these apparently reasonable summary statistics may be insensitive to the choice of model. Figure 5 shows that M1 and M2 represent very different transmission mechanisms, with kernel B exhibiting a strong propensity for longrange transmission. Figure 7 shows that the estimates of transmission rates can be different when different kernels are fitted. Nevertheless, the predictive distributions of the summary statistic appear consistent with observed values for both M1 and M2.
5. Discussion
Earlier work [40] has championed the view that Bayesian and classical reasoning are natural approaches to follow for parameter estimation and model criticism, respectively, and therefore should be used in combination. We have presented a statistical framework that combines classical and Bayesian reasoning in testing for misspecifications of a spatiotemporal model by investigating the consistency of imputed latent residuals with a known sampling distribution using a classical hypothesis test. In particular, we have shown how such residuals can be tailored so as to be sensitive to misspecifications of the spatial kernel or the sojourntime distributions within a spatiotemporal model. Analysis of simulated epidemics and data on the spread of an invasive species in the UK demonstrates how our modeltesting framework can be used to diagnose the model fit and how the results can be interpreted in practice. In §3.1, we also discuss how one might perform the complementary diagnosis of the type of incompatibility of a spatial kernel by examining deviations (from the null hypothesis that the fitted model is correct) in the distribution of residuals after observing considerable evidence against the model.
Predicted dynamics of epidemiological and ecological systems can be extremely sensitive to the choice of model, with consequent implications for the design of control strategies [1,5–7]. For example, choices of temporal distributions influence both epidemic final size and the persistence of disease, and have important implications for devising effective control strategies which target symptomatic subjects and the timing of infectiousness [6,12,13]. There are examples of such effects related to the parametric form of incubation and infectious period distributions in models of measles [9–11], smallpox [12] and AIDS [13]. It is well known that the spatial transmission mechanisms are difficult to assess in practice yet have major implications for optimal control strategies. Studies of animals and plant diseases such as foot and mouth and citrus canker have cited the importance for selecting between a longtailed spatial kernel versus a localized spatial kernel when devising the most appropriate strategy of culling [5,8,41]. Therefore, we believe that the methodology presented here, based on ILRs, is a novel and potentially powerful tool for diagnosing misspecification of a spatial kernel which can provide valuable insights to modellers in practice. Moreover, we remark that the principles introduced here should be readily extendable allowing the construction of analogous residuals for a wide range of processes included in models in ecology and epidemiology.
We also believe the approach offers several advantages over alternatives. Bayesian model assessment approaches, for example Bayes factors, are known to be sensitive to selection of prior distributions and are challenging computationally [42,43]. Moreover, they allow only relative comparison of competing models, a disadvantage shared by information criteria measures, for example the DIC [17]. The latter is also problematic when dealing with partially observed processes [18], the norm in epidemiological studies, where the DIC is not uniquely defined. By contrast, the tests based on latent residuals offer an assessment of model discrepancy in absolute terms. Posterior predictive checks that use only partially observed data may be insensitive to the model choice (as shown in §§3 and 4.2) even if summary statistics are appropriately chosen. A key feature of the proposed tests is that they can be easily embedded within any Bayesian analysis of a spatiotemporal system that makes use of data augmentation. Also, in contrast to other approaches, for example posterior predictive checks, our method uses the full posterior distribution of unobserved data and model parameters, and may offer a higher sensitivity to model misspecifications. As it is common practice to conduct Bayesian analyses of partially observed epidemics using data augmentation supported by computational techniques, for example MCMC methods [5,20,27,28], the framework represents a potentially valuable addendum to the modeltesting toolkit used in epidemiological and ecological studies.
Funding statement
We acknowledge financial support from Heriot–Watt University and the Scottish Government's Rural and Environment Science and Analytical Services Division (RESAS).
Acknowledgements
The distribution data were obtained from the National Biodiversity Network Gateway and are compiled from numerous sources including the Countryside Council for Wales, Bristol Regional Environmental Records Centre, the Scottish Wildlife Trust and Scottish Borders Biological Records Centre (for details see www.nbn.org.uk/). We also thank Dr Stephen Catterall for helping with the hogweed data and providing suitability estimates used in §4. Finally, we are thankful to the Editor and three referees for their constructive reviews of the manuscript, which have helped us to introduce some further insights into the paper.
 Received November 23, 2013.
 Accepted January 20, 2014.
 © 2014 The Author(s) Published by the Royal Society. All rights reserved.