## Abstract

Biological invasions have movement at the core of their success. However, due to difficulties in collecting data, medium- and long-distance dispersal of small insects has long been poorly understood and likely to be underestimated. The agricultural release of parasitic hymenoptera, a group of wasps that are critical for biological pest control, represents a rare opportunity to study the spread of insects on multiple spatial scales. As these insects are typically less than 1 mm in size and are challenging to track individually, a first-time biocontrol release will provide a known spatial position and time of initial release for all individuals that are subsequently collected. In this paper, we develop and validate a new mathematical model for parasitoid wasp dispersal from point release, as in the case of biocontrol. The model is derived from underlying stochastic processes but is fully deterministic and admits an analytical solution. Using a Bayesian framework, we then fit the model to an Australian dataset describing the multi-scale wind-borne dispersal pattern of *Eretmocerus hayati* Zolnerowich & Rose (Hymenoptera: Aphelinidae). Our results confirm that both local movements and long-distance wind dispersal are significant to the movement of parasitoids. The model results also suggest that low velocity winds are the primary indicator of dispersal direction on the field scale shortly after release, and that average wind data may be insufficient to resolve long-distance movement given inherent nonlinearities and heterogeneities in atmospheric flows. The results highlight the importance of collecting wind data when developing models to predict the spread of parasitoids and other tiny organisms.

## 1. Background

Movement is a fundamental characteristic of living organisms, and understanding movement is crucial for understanding the invasion [1] and persistence [2] of populations in ecosystems. A particular challenge is characterizing and predicting movement and its consequences when the organism both (i) is difficult to detect, and (ii) has the potential for stratified movement over different spatial scales using different modes of movement. Many organisms share both of these characteristics, such as species with dispersal polymorphism [3] or organisms that undergo long-distance anthropogenic dispersal [4,5]. In this paper, we used the first-release of a small parasitoid wasp, *Eretmocerus hayati* Zolnerowich & Rose (Hymenoptera: Aphelinidae), as a case study to quantify stratified dispersal on multiple scales. Such small entomophagous insects (less than 1 mm) are significant for biocontrol of crop pests in agro-ecosystems [6], and *E. hayati* undergoes a form of stratified dispersal [7] that is common in small insects: a combination of wind-borne dispersal over long distances with short-range dispersal governed by different mechanisms [8].

The challenge presented by stratified movement is both theoretical and empirical. The theoretical challenge is that models are required that can integrate different modes and scales of movement, such as individuals' response to local cues that trigger long-distance dispersal as well as aerodynamics at both the local and geophysical scales. The empirical difficulty lies in the challenge of obtaining multi-scale data to verify such models. The size of a study area is often constrained by research feasibility, and consequently there exists a bias towards measurement of routine small-scale movements compared to long-distance emigration and immigration movements [9]. Furthermore, wind data, if collected, is often temporally averaged and only collected at one location. The resolution of data collection is chosen to match a particular mode of movement. As a consequence, integrating multiple spatial scales is typically done by only including movement behaviours relevant to the focal scale and neglecting movement occurring on larger spatial scales. As a consequence of the latter, rates of spread tend to be underestimated due to study design [9], and the neglect of infrequent long-distance dispersal events has been identified as a cause of spread-rate underestimation in the literature [1].

Accurate descriptions and predictive models of long-distance dispersal events are important given their significance to population dynamics. For example, the agro-ecosystems that insect pests and biocontrol agents inhabit are typically fragmented, such that the functional connectivity of the landscape—which incorporates both distance between suitable habitat and dispersal ability—is a key determinant of spread [10]. Long-distance dispersal capacity may reduce colonization lag leading to greater biocontrol efficacy [11], while local-scale search ability is necessary for effective pest suppression [6].

Local-scale and individual-level processes are also important to dispersal dynamics because these factors are often the trigger for shifts between different modes of dispersal. For example, the physiological internal state of the individual [12] and the organism's reaction to environmental cues (e.g. [13,14]) may initiate long-distance dispersal. Even during the so-called ‘passive’ dispersal (long-distance dispersal primarily by wind or water advection), the individual controls entry into, exit from, and movement within the advective stream (examples include spider ballooning [15,16] and the vertical migration of plankton [17]). These particulars of dispersal control can lead to biologically significant differences in dispersal distance [18]. Therefore, a fundamental challenge in the study of long-distance dispersal is how to effectively integrate processes on multiple spatial scales and meaningfully confront hypotheses (i.e. models) with data.

Even rudimentary attempts to model dispersal on multiple spatial scales can reveal new insights into the fundamental biological and physical processes. Particularly in the case of simple models, mismatches between the theoretical distributions and data are especially useful for falsifying hypotheses about movement processes and identifying gaps in our understanding [1]. The subsequent challenge is then to obtain the data necessary to refine and validate more sophisticated models, and to develop the modelling techniques necessary to overcome data limitations.

For example, fast-moving individuals dispersing across large spatial scales are unlikely to be detected, but it may be feasible to infer the presence of an organism from their impact—e.g. infections, parasitizations or evidence of predation. With an additional modelling layer inferring presence from impact, a link can be made between the question of interest and the type of data that is feasible to obtain. Such complex models can be fitted using Bayesian frameworks, which have been used to answer a variety of ecological hypotheses in recent years [19]. For example, Ovaskainen *et al.* [20] used Bayesian methods to discern the relationship between emigration rate and age, and Bayesian methods have also been used to estimate extinction risks [21], parameters for demographic models [22], epidemiological parameters [23] and gene frequencies [24]. In contrast to mechanistic mathematical simulation models, which are typically validated with data using ad hoc approaches combined with parameter sensitivity analysis, the key benefit of the Bayesian approach is that it can connect model outputs to proxy data in a systematic way. The kinds of complex and multi-scale spatial models that are needed for characterizing stratified dispersal require long computation run-time, which may mean that full characterization of the Bayesian posterior distribution is not possible. However even in such cases, Bayesian methods are valuable because they calibrate model parameters in a way that is both rigorous and transparent.

Our study builds upon previous work that used a simple model to identify wind advection as a necessary process to explain the dispersal of an introduced *E. eretmocerus* population [25]. Three shortcomings of this earlier study were that (i) the diffusion processes occurring during wind-borne flight were not modelled explicitly, but rather an approximation was made by assuming that spread occurred over the entire spatial grid cell; (ii) model fitting was performed based upon a match between the presence/absence in the model and the data, rather than between inferred and predicted densities; and (iii) active flight behaviour was not considered in a probabilistic sense, but operated as a global switch mechanism based on changing environmental conditions.

In this study, diffusion is modelled explicitly during both local and wind-borne dispersal modes, and active flight behaviour is modelled nonlinearly based on a probabilistic interpretation of flight decisions given environmental variables. Simplicity and mathematical transparency were valued over mechanistic detail in order to assess the performance of our hypotheses. Additionally, we use a Bayesian framework to infer observed parasitizations from modelled population densities and thus estimate the parameter values of the parasitoid dispersal mode by comparing to data. Parameters from a related whitefly parasitoids [26] were used to construct the priors for the Bayesian framework.

## 2. Material and methods

Data were taken from a first-time biocontrol release of a small parasitoid wasp, *E. hayati*, near the town of Kalbar in eastern Australia, described in detail in earlier studies [7,25]. In the earlier study [25], a model for the parasitoid dispersal was fitted using the Kalbar data, and then tested on a later and separate first-time release near the town of Carnarvon in Western Australia. This model, upon which our study builds, assumed that dispersal was via simple wind-advection and was fitted by matching emergence of parasitoids in the field from the F_{1} generation to the predicted presence of females from the F_{0} or release generation. More specifically, the earlier model fitted three parameters: (i) a maximum wind-speed at which females would undertake a wind-borne flight, (ii) a diurnal time-window during which the wind-borne flights could occur, and (iii) a dispersal distance factor *f* that scaled flight time and wind speed to displacement distance. Importantly, in this earlier model, no diffusion processes were explicitly modelled at the landscape scale; rather, a grid cell size of 500 × 500 m was used, which in effect spread females evenly over the area of each cell after every wind-borne flight. By contrast, in this study, we model diffusion explicitly.

We outline the additional specific assumptions of the previous model below for easy comparison with our new model. It was assumed that females would undertake at most one wind-borne flight per day, with equal probability of occurring during any 30 min time interval within the time-window provided that the average wind-speed during that interval was lower than the maximum permitted. Another simplifying assumption was that females would travel at the same speed and in the same direction as the averaged wind, and that the flight time was the same for all individuals. Flight distance was found by multiplying wind speed and flight time by the factor *f*. Oviposition occurred for the females' whole lifetime, such that the presence of a female at a location translated directly to detection of emerging F_{1}. It was assumed that variance in development time was low such that the predicted presence of females in the model could be fitted to the emergence data with a single time-shift.

As a starting point for the new model analysed in this paper, we will maintain the following assumptions from [25] about *E. hayati* flight. We refer the reader to the earlier paper for detailed discussion of these assumptions [25].

— Each wasp will undertake at most one wind-borne flight per day. This is a simplifying assumption; however, we extend the previous work by allowing the probability to undertake the daily flight to be fitted by the model.

— The mean direction of this flight will be in the same direction as the average wind velocity during the flight, and to a distance proportional to the wind's speed. The proportionality factor describing wind speed to drift is fitted by the model.

— Wind velocity is assumed spatially constant in the study area, though it varies temporally according to collected data. It should be noted that during the original release experiment, wind measurements were recorded at 1.8 m off the ground at the release point only, and wind vectors were averaged over 30 min intervals. Therefore, wind variability across the landscape, vertically (e.g. boundary layer effects, eddies) and on shorter timescales (e.g. gusting, calms) are neglected.

— Wind-borne flights have a fixed duration, initially set at 30 min. This duration is comparable to flight durations measured in flight-chamber experiments for another

*Eretmocerus*species [27].— Wasps are less likely to initiate wind-borne flight in higher wind velocities. In many insect species, individuals will delay flight when wind speeds make flight difficult or dangerous (e.g. [28]). In the previous study [25], a parameter specifying the maximum wind speed at which flights occurred was found to be a key parameter determining fit between the model and data.

Daily movement for each wasp can be thought of as consisting of two cases: (i) the wasp takes one wind-borne flight at some point during the day, which we model as drift diffusion or (ii) the wasp only moves around locally, which we model as diffusion with possibly a slight drift in the day's (weighted) average wind direction (see Results). For the purposes of this model, both modes of dispersal will be considered passive and independent of the movement of other wasps; we leave it to future studies to determine the relative effect of factors such as host distribution, landscape heterogeneity and wasp aggregation due to mating.

Departing from [25], we will explicitly model both diffusive processes by assuming that the probability distribution for each wasp's dispersal on a given day is determined via a special case of the Fokker–Planck equation
2.1with wind drift vector ** μ**(

*t*) = (

*μ*

_{1}(

*t*),

*μ*

_{2}(

*t*))

^{T}and diffusion tensor In this equation, the first spatial derivative models drift in the direction of

**(**

*μ**t*) while the second derivative models Brownian motion with two dimensional diffusivity

*D*. Since the initial condition for equation (2.1) is given to be a point mass at the origin (each day's dispersal is considered separately), solutions are given analytically as a bivariate normal distribution function with wind-based mean

**(**

*μ**t*) and covariance matrix

*D*which we will assume is non-singular so that it can be specified in terms of correlation

*ρ*and standard deviations in the

*x*- and

*y*-directions,

*σ*

_{x}and

*σ*

_{y}, respectively.

For wasps that only move around locally, ** μ** is taken to be a small constant (based on average wind direction) or zero so that the dynamics are primarily diffusion. Additionally, the diffusion tensor is assumed to be smaller than that for wind-based flight. For wasps taking a wind-based flight, the timing of the flight is considered to be active behaviour by the wasp based on the time of day and past and current wind conditions. The effective

**for a given wasp could be calculated as the integral of the wasp's wind drift while it was in flight, and this is aggregated over all possible flight take-off times based on take-off time probabilities achieve the time-dependent drift vector**

*μ**μ*(

*t*) seen in equation (2.1). Total wasp dispersal for the day is then a realization of a mixture distribution obtained from the two cases of wind-borne and local movement, based on the probability of taking a wind-borne flight during the day given environmental conditions. Finally, we take each day's dispersal to be independent of all previous days so that we can run this process on a discrete daily time step for as long as desired, aggregating to obtain the total wasp dispersal at the end.

### 2.1. Active behaviour: initiating wind-borne flight

During any given day *d*, we assume that the probability of a wasp initiating a wind-borne flight at time *t* (measured in hours) is described by a density function *h*(*t*, **w**_{d}(*t*); *θ*_{h}) where **w**_{d}(*t*) is a function giving the day's wind velocity at time *t* and *θ*_{h} is a set of shape parameters required to define the components described below (see the electronic supplementary material, appendix A for a complete, explicit mathematical description of this function). Integrating *h*(*t*, **w**_{d}(*t*); *θ*_{h}) from *t* = 0 to 24 h yields a value between zero and one which represents the probability that a wasp takes a wind-borne flight during the day. This function completely describes the active-flight component of our model, and in our formulation, *h* can be understood as having three main components: (i) a probability density function *f*(*t*) based on daylight availability which specifies the hours in which wasps are most likely to fly, (ii) a scaling function *g*(∥**w**_{d}(*t*)∥) based upon wind velocity, and (iii) a redistribution function that raises the probability of taking a wind-borne flight later in the day if conditions were previously unfavourable.

Our implementation of *f*(*t*), the daylight probability density function, is the difference of two logistic functions scaled by the integral of *f*(*t*) from *t* = 0 to 24 hours (see figure 1, yellow dashed line). It requires four parameters, *a*_{1}, *a*_{2}, *b*_{1} and *b*_{2}, the first two of which locate the centre of the first and second logistic, respectively, and the second two modify the steepness of the incline and decline of the functions. Similarly, *g*(∥**w**_{d}(*t*)∥) is implemented as a single, decreasing logistic function which takes on values between zero and one depending on how probable it is that a wasp would fly at that wind speed. Its two parameters, *a*_{g} and *b*_{g}, locate and scale the function, respectively (see the electronic supplementary material, appendix A for explicit equations for *f*, *g* and all other functions).

A naive approach to implementing active flight take-off behaviour would be to simply multiply these two functions together, obtaining a density for the time that wind-borne flight is initiated. This approach, however, assumes that each moment's probability is independent of wind conditions earlier in the day. The behaviour it models could be imagined as a process where, before the day starts, each wasp randomly chooses a moment *t*_{0} to initiate wind-borne flight and then actually takes the flight with probability *g*(∥**w**_{d}(*t*_{0})∥). If conditions are poor, the wasp only moves in a more local manner.

To allow previous conditions to increase the likelihood of flight later in the day, we combine *f* and *g* in a way that considers the average decrease of probability in all previous moments weighted by the cumulative distribution function of *f* (lost flight opportunity later in the day has less effect than if the poor conditions occurred early). The result we call *h*(*t*, **w**_{d}(*t*); *θ*_{h}). An example of this redistribution can be seen in figure 1, and for specifics of the function, we direct the reader to the electronic supplementary material, appendix A.

### 2.2. Probability aggregation

Having determined the probability that a wasp initiates wind-borne flight at any point during the day, the total spatial probability distribution for each wasp's movement at the end of a given day is a weighted average of local diffusive movement and drift-diffusion based on wind conditions during flight. The expected spatial population density of wasps is given by multiplying the number of wasps by this probability distribution. Assuming that movement on each day is determined independent of all previous days, we convolute the distribution of each successive day to the current wasp distribution in order to determine each new, aggregate spatial density. This process yields the final model simulations seen in figure 3.

### 2.3. Bayesian modelling framework

In order to compare model results to field data and thus assess, the model's relative performance under differing parameter choices, it is necessary to introduce a framework that can estimate the likelihood of observing the collected data under a given model scenario. The details of this framework are made more complex by the fact that on the landscape scale, wasps were not directly observed in the field but rather leaves were collected and observed for emerging offspring over a number of days. Each collection location (sentinel field) varied in size, and collection was not standardized in any rigorous way—collectors were instructed to search out leaves holding the host species, ideally displaying signs of having been parasitized. The time duration of this collection was unspecified. In the release field, leaves were collected in the same collection grid as had been previously used for direct observation of wasps [7]. The location of each sentinel field and the release field can be seen in figure 2.

To produce the likelihood of the model given emergence data from the sentinel fields, we assumed that the effective area canvassed by the collector in each field was roughly equivalent between fields and that parasitized hosts do not leave or enter the field prior to collection. Each sentinel field was then assigned an observation probability describing the likelihood that each given parasitized host present in the field during collection would be collected and a wasp later observed to emerge. To account for the differing size of the sentinel fields, these probabilities were given a Beta prior with the mean chosen to be the fraction of the field effectively canvassed by the collector, thus conveying *a priori* information that a parasitized host in a larger field is less likely to be collected. A more detailed description of the priors is given in the electronic supplementary material, appendix B. Fecundity was assumed constant in time for the duration of the study, based on [29].

To model the number of parasitized hosts present in each field at collection time, we assumed that time from oviposition to emergence was 19–25 days, distributed approximately according to a truncated normal distribution with variance of 2 days. Although the literature on laboratory observed times often suggests a shorter period [30,31], the field data observed by Kristensen *et al*. necessitated the use of this longer time distribution [7,25] based on the number of emergences observed at longer times from collection. The mean number of collected parasitized hosts could then be calculated based on the population density history and the probability of collection in each field. Actual per-day emergence observations were then assumed to be a Poisson distribution parametrized by this mean, resulting in a likelihood of the emergence data given the model. Calculations for the release field were done in a similar manner, but restricted to the collection grid.

Owing to the 25 × 25 m cell resolution of the model, adult count data collected in each cardinal direction out to a maximum distance of 22 m was not included in this study. Direct observation of adults in the release field grid was included (figure 4) and used to fit model parameters alongside emergence data. The likelihood of the data given the model was once again assumed to be Poisson. To calculate the mean of the distribution, we assumed the probability of observing a given parasitoid present in a grid cell was equivalent for each cell scaled by the number of leaves turned over, which was either 90 or 270 depending on the cell. Parasitoid movement during the data collection period was not considered.

### 2.4. Implementation details

In order to implement the wasp dispersal model in a simulation, certain decisions must be made concerning the time and space discretization. In discretizing time, we approximated all described processes using a left Riemann sum: more precisely, we converted all probability densities that were a function of time (e.g. *h*(*t*, **w**_{d}(*t*); *θ*_{h}) and its subfunctions) into probability mass functions by assuming that wind velocity is constant on each time interval, which we chose to be 1 min in length. Since wind data were only provided in 30 min intervals, we used a linear interpolation to specify velocity on a per-minute basis.

The grid resolution for spatial discretization was chosen as a balance between computational cost when calculating maximum *a posteriori* parameter estimates and the need to maintain a fine enough resolution to place each point in the release field collection grid into different discretization cells. Since all wasp movement represents a solution to equation (2.1) starting from a point mass, and thus is a normal distribution, spatial discretization requires evaluating the bivariate normal cumulative distribution function to acquire total density in each cell. Though we implement this calculation using a Fortran library [32] through SciPy, this process typically represents the primary computational bottleneck since it must be done at each point in the time discretization for as many cells as constitutes the non-negligible area of the probability distribution, defined in our code as the smallest square block of cells centred at the mean which integrates to at least 0.999.

Our implementation of the model is written in Python 3.5 and has been specifically designed for general reuse and transparency, including basic documentation and pervasive comments within the code. Daily dispersal is computed in parallel using the Python multiprocessing library, and the convolution of each resulting distribution if via FFT may be performed on a GPU if CUDA and the necessary Python libraries are installed. Field data is organized using the Pandas library, which has some similarities to R. The Bayesian modelling framework is built and implemented using the PyMC library. While the time required to run the model for a given parameter set is dependent on domain size, resolution and dispersal variance, typical run times for a domain of 64 km^{2} are around 1 min for 19 simulated days. The source code for this implementation available and maintained at https://github.com/mountaindust/Parasitoids [33].

## 3. Results

Results for the maximum *a posteriori* estimate of the model parameters (electronic supplementary material, table A2) reveal local and wind-based diffusion coefficients that are largely in line with findings in the literature [7]. Local movement was found to be uncorrelated, with a standard deviation of several metres in either direction. The data further suggested that there was no local drift based on average wind direction. For wind-based diffusion, results suggested standard deviations around 0.15 km with an east–west bias. This bias is likely to be caused by the species' phototactic response to the rising and setting of the sun [34–36]. In contrast to local movement, this diffusion was found to be positively correlated. The resulting slope of the best linear unbiased prediction of *y* given *x* for this distribution is 0.84 which, in its north-eastern direction, corresponds to 40° north from east. This is roughly the direction of the sentinel fields E and G (figure 2), so it is likely that this is an effort on the part of the maximum *a posteriori* algorithm to either spread wasps further in these fields' direction or to make up for a fixed flight duration, as this is approximately the average wind heading for winds that wasps actually fly in (the average heading for wind speeds under 1 m s^{−1} is 57.4° north from east; see the electronic supplementary material, figure A1).

Wind-based flight time was held constant at 30 min when estimating parameters, with wasps able to fly a distance proportional to the per-minute wind speed. The constant of proportionality relating wind speed to parasitoid drift, *μ*_{r}, was fit to data as 1.18. This value is similar to what was found by Kristensen *et al*. [7] to provide the best fit for data collected from the Kalbar site and reinforces a roughly 30 min flying period. However, maximum *a posteriori* parameter estimates fit the model so that wasps fly during a more extended portion of the day than might be expected, roughly 7.18 to midnight. At the Kalbar site, at the time of year the field study was conducted, sunrise and sunset occurred at 6.00 and 20.00, respectively. We consider it likely that these extended flight times reflect lack-of-fit to emergence data further away from the release site as described below. Flight probability as a function of wind speed was fit to be 0.5 at 1.26 m s^{−1}. With the fitted scaling parameter *b*_{w} = 3.91, the probability of wind-advected flight with wind speed of 0 m s^{−1} at takeoff is roughly 0.99, and at 2 m s^{−1}, it is about 0.05.

Figure 3*a*–*d* shows the model prediction for wasp density (number/625 m^{2}) 3, 6, 9 and 19 days post release, respectively. Predictions of 3, 6 and 9 days were chosen to reflect the times at which data for direct observation counts of adult parasitoids were recorded for the release field (see figure 4). 19 days post release, leaves were collected in each of seven fields (red outlines in figure 3*a*–*d*, labelled A–G in figure 3*e*–*f*) and subsequently observed for emergences (figure 3*e*) in order to assess long-distance dispersal. Figure 3*e*–*f* shows emergence observations (number of emergences) (*e*) and the model's projected emergences (number/100 m^{2}) in the field (*f*) from hosts in a parasitized state 19 days post release using the parameters electronic supplementary material, table A2.

As can be seen in figure 3*f*, our model broadly captures emergence trends in fields close to the release field A, but largely underestimates the number of parasitoids in fields further away. Also of particular note is the dual spatial-scale nature of the wasp distribution in figure 3*a*–*c*. Over the entire landscape, the distribution may be characterized as heavy-tailed with the tail skewed roughly in the average direction of the wind (northwest). However, on a more local spatial scale where the wasp density is higher (yellow and green), the distribution is skewed instead to the northeast in a direction more closely following the average low-velocity wind heading. This dual-spatial scale dynamic continues up until 13 days post release, when local dynamics begin to merge with the average full velocity profile of the wind. At 16 days post release, the wasp density is similar in character to that seen in figure 3*d*, where it should be noted that the centre of the distribution actually lies to the north of the release field rather than the northwest, even though the distribution is skewed to the northwest with the average wind. Unfortunately, there were no sentinel fields in the northwest direction for direct comparison with data.

Figure 4 shows the model's projected density of parasitoids (individuals/100 m^{2}) over the release field as a surface plot along with the observed parasitoid densities normalized by collection methodology (shown as bars). The largest peak in this local projected parasitoid density falls northwest and northeast of the release site, which correspond to the average wind direction and average low-velocity wind direction, respectively. This is in general agreement with observed data and the northeast direction is further represented in the model outside the release field by a pocket of relatively high population density that predominately aligns to the northeast through 9 days post release (figure 3*a*–*c*).

These observations are also evident when examining the maximum *a posteriori* estimates for parameters in the Bayesian modelling framework (electronic supplementary material, table A3). Specifically, the data suggest a probability of 0.037 to observe any single given emerging parasitoid in field G that was incubating on the collection day. *A priori*, this number seems rather high given that the parasitized host is around a millimetre in size, possibly out of view on foliage, and could be anywhere in the sentinel field. Probabilities fitted for fields closer to the release point were anywhere from 0.00031 to 0.0034, with the exception of field D which was 0.033 and field E which was 0.0123. We note that the probability for observing a given emerging wasp in the release field was 0.064, but in this case, the area was heavily restricted to a sampling grid, and the probability only applies to incubating wasps located within this grid.

In addition to maximum *a posteriori* results, we fed the model into an adaptive Metropolis–Hastings Monte Carlo Markov Chain (MCMC) algorithm in order to analyse the shape of the posterior distribution for each of the model's parameters. This algorithm features block-updating using a multivariate normal jump distribution whose covariance is tuned during sampling [37,38]. Given that our model is spatially explicit, with a discrete daily time-step, and a Bayesian modelling layer connecting proxy data to predicted wasp density, analytical derivatives are not available and therefore Langevin and Hamiltonian MCMC methods (such as those seen in [23]) cannot be used. Additionally, posterior distributions could not be obtained. It is important to note that, even after coding for efficiency, realizations of a fully spatial, multi-scale model take far longer to obtain than is common in statistical Bayesian posterior analysis. One million model realizations represents a typical number for fully formed and converged posteriors. In our case, 10 000 model realizations requires approximately one week to produce. Therefore, after 220 000 model samples representing three months of computation time, the algorithm has yet to converge to a posterior distribution.

## 4. Discussion

On a spatial scale of less than 1 km, the model reproduces results seen in the field, suggesting that passive drift-diffusion with active behaviour at take-off may often be sufficient to capture the spread of small insects from point release locally. Despite the coarse nature of our field data, the Bayesian framework was clearly able to separate out two modes of diffusion, local and wind-based, when given similar priors (see the electronic supplementary material, table A1). The east–west phototactic bias noted in the literature was also reproduced on the landscape scale. A strong fitted correlation coefficient for wind-based flight suggests that the model could be improved by allowing flight duration to vary, which would roughly stretch the resulting distribution along the direction of the wind. Finally, our model indicates that within the release field and the fields immediately surrounding it, the predominant local dispersal direction may be forecast for up to two weeks by averaging over wind velocities with magnitude less than 1 m s^{−1}.

On spatial scales over 1 km, our results strongly suggest that drift-diffusion based solely on spatially homogeneous and temporally averaged wind velocities are insufficient to capture the long-distance dispersal of parasitoid wasps in the field. This fact is evident both in the inflated observation probabilities returned by the Bayesian framework for the far sentinel field (G) and a midrange sentinel field (D), and by the increasingly poor fit of the model to emergence data as the collection field gets further away from the release point (figure 3). A positively correlated wind-based diffusion pattern and extended flight hours also point to this conclusion.

The layout of the fields was roughly perpendicular to the average wind direction, yet parasitoids reached high densities in further fields. As noted in the previous work [7,25], wind speed during the study was related to wind direction such that high wind speeds tended to point perpendicular to the further fields and lower wind speed tended to flow parallel. Therefore, a model that constrains parasitoids to only fly when wind speeds are relatively low allows parasitoids to remain in the cultivated area and disperse to far fields. However even with this behavioural mechanism included, the parasitoid densities predicted by the model are low compared with the data. We suggest two possible non-mutually exclusive explanations for the higher-than-predicted concentrations of parasitoids in far fields: (i) variability of wind-flow, both spatially (across the landscape and as a function of height) and temporally or (ii) parasitoids' ability to direct landing towards cultivated land.

First, on spatial scales on the order of a kilometre or more, it is likely important to resolve the spatial and smaller scale temporal variability of the wind. In order to obtain an analytical solution, our model assumed uniform wind speed and direction in space at any given point in time. Wind speed does vary as a function of height, and parasitoids that are advected upward on, for example, thermal updrafts may be carried significantly farther than those flying near the ground [39]. Previous work suggests that other tiny insects may suspend active flight once airborne, potentially extending flight times and distances travelled [40]. Furthermore, heterogeneous landscapes and vegetation will create spatially varying winds [41]. Complex structures such as rolling eddies, convection cells and other instabilities generated by density stratifications may also be present and greatly alter flight trajectories [42]. Note that characterization of these complex structures requires additional spatial and temporal resolution of wind, and we were not able to capture these effects in this study given the limited wind data. Gusts may also play a role in longer than predicted dispersal distances. Given that the parasitoids strongly prefer to fly on calm days, we do not think that gusts alone can explain the differences we see between actual and predicted long-distance dispersal. Overall, our results suggest that it is important to resolve the spatially and temporally varying air flow above the landscape in order to correctly predict the dispersal of passively moving, wind-borne organisms.

Second, while very small insects in the wind-stream are generally regarded as ‘passive’ dispersers, and cannot control their direction of flight, they can potentially control entry to and exit from the wind-stream. There is some evidence that visual flight-arrest triggers may be used by small insects to land in habitat that is more likely to provide them with needed food and reproduction resources, thus reducing the risk that wind-borne dispersal holds of transporting individuals to resource-poor areas. In this study, the agricultural area formed a thin strip (approx. 3 km wide) along a creek, and the wider landscape was predominantly livestock pasture. If the parasitoids were triggered to land by the greener irrigated crop fields compared to the browner surrounding landscape, and particularly if the parasitoids are capable of detecting the oncoming colour difference that signalled the edge of the agricultural area, then this may have reduced the proportion of parasitoids leaving the agricultural area. Most phytophagous insects are responsive to plant-related light wavelengths as triggers to land [43,44]. They show an attraction to plant-related colours and are influenced by colour contrasts that occur at field edges or between crop types [45,46]. In related *Eretmocerus eremicus*, females in particular have been demonstrated to respond to a plant cue in a wind tunnel experiment [36], and UV-absorbing plastic sheeting impairs the ability of *Eretmocerus mundus* to locate plants containing hosts [47]. This raises the possibility that parasitoids in the wind-stream may respond to visual stimuli in order to reduce the risk of being carried past potentially resource-rich areas [48], which would explain the higher concentrations of parasitoids indicated by the data compared to that predicted by the model.

## 5. Conclusion

Matches and mismatches between data and model predictions allow us to refine our hypotheses about the mechanisms of movement. In this study, a Bayesian framework was used to rigorously connect a mechanistic, mathematical simulation model to sparse proxy data in order to verify a stratified dispersal mechanism. It was capable of identifying two separate diffusion kernels associated with local and wind-borne dispersal modes, respectively, and the exogenous condition (wind speed) that likely triggers the switch between dispersal modes. The model also indicated that, while conditional wind-borne dispersal provides some of the explanation for how the parasitoid successfully dispersed to other host-containing fields on the landscape scale, it could not explain how the parasitoid did so in such high numbers. Therefore two possible explanations—variability of the windflow across the landscape scale, or the use of visual host-habitat cues as triggers for landing—are proposed. The Bayesian framework provided here can be used to test either of these hypotheses once the appropriate data is collected.

## Data accessibility

All software and data needed to reproduce the results in this study is maintained at https://github.com/mountaindust/Parasitoids and provided open source and free of charge under the terms of the GNU GPLv3 license agreement. For other licensing arrangements, please contact the lead author. A snapshot of the software and data at time of acceptance is available with (doi:10.5281/zenodo.556462) as release version 1.0.0 [33].

## Authors' contributions

C.S. designed the model, authored all code, interpreted the results, and drafted the manuscript and figures. N.K. provided the data, ecological motivation for the model and drafted the manuscript. L.M. conceived of the study, coordinated the study and edited the manuscript.

## Competing interests

We declare we have no competing interests.

## Funding

This material was based upon work partially supported by the National Science Foundation under grant no. DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute and CBET-1511427 and DMS-1151478 to L.A.M.

## Disclaimer

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

## Acknowledgments

The authors wish to acknowledge Richard Smith for his extremely helpful guidance in formulating the Bayesian framework used to fit model parameters to data. We would also like to thank our four anonymous reviewers for their insightful and detailed comments that greatly helped improve our manuscript.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3773294.

- Received January 2, 2017.
- Accepted April 26, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.