## Abstract

Tracking the movement of individual cells or animals can provide important information about their motile behaviour, with key examples including migrating birds, foraging mammals and bacterial chemotaxis. In many experimental protocols, observations are recorded with a fixed sampling interval and the continuous underlying motion is approximated as a series of discrete steps. The size of the sampling interval significantly affects the tracking measurements, the statistics computed from observed trajectories, and the inferences drawn. Despite the widespread use of tracking data to investigate motile behaviour, many open questions remain about these effects. We use a correlated random walk model to study the variation with sampling interval of two key quantities of interest: apparent speed and angle change. Two variants of the model are considered, in which reorientations occur instantaneously and with a stationary pause, respectively. We employ stochastic simulations to study the effect of sampling on the distributions of apparent speeds and angle changes, and present novel mathematical analysis in the case of rapid sampling. Our investigation elucidates the complex nature of sampling effects for sampling intervals ranging over many orders of magnitude. Results show that inclusion of a stationary phase significantly alters the observed distributions of both quantities.

## 1. Introduction

Tracking is a widely used experimental method to probe the movement of living organisms over a huge range of spatial scales, from bacteria swimming over distances of tens of micrometres [1], to birds migrating thousands of kilometres [2]. While the details of the experimental protocol vary between different studies, in the majority of cases the data obtained are similar, namely an ordered list of position vectors (observations) sampled at discrete time points that approximates the continuous-time underlying motion. The time intervals between observations may have a profound impact on the movement patterns observed and the conclusions drawn [3–5]. It is therefore crucial to characterize these effects. We restrict our attention to cases where the sampling interval between successive observations is constant, as is the case in many tracking systems, including global positioning systems [6] and microscopy [1].

Tracking generates large quantities of information about the motion of individuals and, where sufficiently many tracks are available, populations. These data may be interrogated in order to extract statistics and test hypotheses specific to the organism being studied. Examples include the homing behaviour of rockfish [7] and oscillations in the dive depths of sharks [8]. To ensure the wide applicability of the present work, we consider the effect of the sampling rate on two quantities that are of fundamental interest in many such studies: the apparent speed of an individual between two consecutive observations and the apparent angle change (AAC) between three consecutive observations. These quantities have been used to elucidate the chemotactic response of the bacterium *Rhodobacter sphaeroides* [9] and the movement of beetles close to the boundaries of habitats [10]. Our work also applies to apparent displacement, since for a given sampling interval this quantity is linearly proportional to the apparent speed.

Any theoretical study of the tracking process requires the specification of a model of motion. Here, we consider individuals undergoing a correlated random walk (CRW) [11]. Various species move in a manner well described by a CRW, in the sense that their motility takes the form of approximately straight line movements interspersed by stochastic reorientations. This includes the ‘run-and-tumble’ pattern of motility exhibited by many species of planktonic bacteria [12,13], ovipositing butterflies [14], foraging bumblebees [15] and the clonal growth of the plant species *Solidago altissima* [16]. Correlation is introduced because turning angles are specified relative to the previous direction of motion. Turning angles are also commonly assumed to be independent and identically distributed (iid), hence the correlation exists between successive positions, rather than successive angle changes. In many studies, it is assumed that reorientations occur instantaneously. Here, we consider both this model and a variant, proposed by Othmer *et al.* [13], in which reorientations are concomitant with a stationary ‘resting’ phase. This is a good description of the motion of bacteria such as *R. sphaeroides* that stop swimming to reorient [9], and elk that pause at foraging sites to feed [17].

The effect of sampling rate on observed tracks is a complex problem that has, to date, received relatively little research attention. An important study performed by Kareiva & Shigesada [14] considers a CRW with variable speed, deriving an exact expression for the mean squared displacement (MSD) after a specified number of steps, in terms of the mean cosine of angle changes, mean step length and mean squared step length. This work was extended by McCulloch & Cain [18], who derive an approximate expression for the mean displacement. Bovet & Benhamou [15] study a CRW with fixed step lengths between observations. In order to make tracks compatible with this description a spatial rediscretization is required, leading to variable time intervals between observation points. The authors obtain an empirical expression for the variation of angular standard deviation with step length.

While the above studies consider the effect of number of steps or step length (spatial rediscretization), in the present work we are concerned with the effect of sampling rate (temporal rediscretization). In the first study to consider this problem, Hill & Häder [19] rediscretize tracks with a variable sampling interval and make the ad hoc assumption that the mean and standard deviation of apparent speeds vary linearly with the size of this interval. More recently, Codling & Hill [12] have analysed the effect of the sampling rate on a CRW, repeating the analysis of Bovet & Benhamou [15] with a temporal rediscretization of tracks. The authors use simulated tracks to find empirical expressions for the variation of the standard deviation of AACs, and the mean and standard deviation of apparent speeds. They demonstrate that Hill & Häder's assumption of a linear relationship between these properties and the sampling interval is only valid for a limited range of sampling intervals.

### 1.1. Aims and outline

In this study, we address several of the open problems in this field. In §3, we obtain the stationary distributions of the apparent speeds and AACs for the run-only model in the limit of large sampling intervals. In §4, we extend the work of Codling & Hill [12], by computing numerically the full distribution of these quantities, rather than the first two moments, thus fully characterizing their complex dependence on the sampling interval. We also consider a modified model, in which reorientations are accompanied by a stationary phase. This has, to our knowledge, not previously been investigated. In §5, we present a novel analytic approach to describe the apparent speed and AAC distributions for the run-only model in the limit of rapid sampling.

## 2. Models and methods

A summary of all mathematical notation used in this work is given in table 1. In this study, we investigate the effect of the fixed sampling interval, denoted by *τ*, on the observations made from a tracking experiment of an individual moving with a constant speed, *c*_{const}, in a two-dimensional unbiased CRW. Reorientation events occur as a Poisson process with constant mean inter-arrival time *τ*_{R}. Following Codling & Hill [12], we consider the non-dimensional ratio *τ*/*τ*_{R}, which represents the mean number of reorientations in a sampling interval. We examine two variants of the model, proposed by Othmer *et al.* [13]: (i) the ‘run-only’ model, in which reorientation events occur instantaneously and (ii) the ‘run-and-stop’ model, in which reorientations take a finite time, during which the individual is stationary. In the run-and-stop model, the individual also leaves the stationary phase as a Poisson process, with mean inter-arrival time *τ*_{S}. Figure 1 shows a simulated run-only track (see §2.1 for details). Between consecutive reorientation events, an individual covers a random straight-line distance *L*, which is distributed exponentially with mean *c*_{const}*τ*_{R}. During a reorientation event, an individual turns through an angle *Φ*, a random variable with a probability density function (pdf) given by *f*_{Φ}(*ϕ*). We assume that the angle changes are iid and independent of the running distance, *L*. In the run-and-stop model, we also assume that angle changes occur independently of the duration of the reorientation phase.

In §§4 and 5, we assume the underlying angle changes *Φ* are drawn from the Von Mises (VM) distribution, a commonly used distribution in directional statistics [12,14,15]. It is a good approximation for the normal distribution on a circle and mathematically more convenient, as the pdf does not involve an infinite summation [20]. We choose the VM distribution with pdf

$$f_{\Phi}(\phi) = \frac{\exp(\kappa\cos\phi)}{2\pi I_{0}(\kappa)}, \qquad -\pi \le \phi < \pi, \tag{2.1}$$

where *I*_{0} denotes the modified Bessel function of the first kind with order zero and the parameter *κ* ≥ 0 controls the ‘peakedness’ of the distribution. This distribution is symmetric about zero, hence has zero mean, since our motion model is an unbiased CRW. When dealing with circular statistics, alternative summary statistics are needed from those used in linear statistics. We shall use the angular deviation [20], given by

$$\sigma = \sqrt{2(1-\rho)}, \tag{2.2}$$

where *ρ* denotes the mean cosine of the underlying angle change, *Φ*. For the VM distribution (2.1), we have [20]

$$\rho = \frac{I_{1}(\kappa)}{I_{0}(\kappa)}, \tag{2.3}$$

where *I*_{1} denotes the modified Bessel function of the first kind with order one. We may also use equation (2.2) to compute the angular deviation of the AACs, *σ*_{θ}, in which case *ρ* denotes the apparent mean cosine. For consistency, we therefore specify the angular deviation of the underlying VM distribution, *σ*_{δ}, which we convert to the VM parameter *κ* using equations (2.2) and (2.3). The resulting equation for *κ* has a unique solution as *I*_{1}(*κ*)/*I*_{0}(*κ*) is monotonic in *κ*; we compute this using a numerical optimization routine based on the trust-region algorithm [21].
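The conversion from *σ*_{δ} to *κ* can be illustrated with a short Python sketch. It assumes the angular deviation is *σ* = √(2(1 − *ρ*)) with *ρ* = *I*_{1}(*κ*)/*I*_{0}(*κ*), and it substitutes simple bisection for the trust-region routine used in this study. The function names, the power-series evaluation of the Bessel functions and the search bracket [0, 500] are illustrative assumptions, not part of the original Matlab code.

```python
import math

def bessel_i(order, x):
    """Modified Bessel function of the first kind, I_order(x), from its power series."""
    term = (x / 2.0) ** order / math.factorial(order)
    total, k = term, 1
    while term > 1e-17 * total:
        term *= (x / 2.0) ** 2 / (k * (k + order))
        total += term
        k += 1
    return total

def mean_cosine(kappa):
    """Mean cosine of a zero-mean von Mises distribution, I_1(kappa) / I_0(kappa)."""
    return bessel_i(1, kappa) / bessel_i(0, kappa)

def kappa_from_angular_deviation(sigma, lo=0.0, hi=500.0, tol=1e-10):
    """Solve sqrt(2 (1 - I_1/I_0)) = sigma for kappa by bisection (assumed bracket)."""
    target = 1.0 - sigma ** 2 / 2.0  # required mean cosine
    if not 0.0 <= target < mean_cosine(hi):
        raise ValueError("sigma lies outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # mean_cosine is monotonically increasing in kappa
        if mean_cosine(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is adequate here because the mean cosine is monotonic in *κ*, so the root is unique within the bracket; the bracket limits the sketch to angular deviations above roughly 0.05.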

The sampling process is modelled as a uniform temporal discretization of the underlying track, with consecutive observations separated by the sampling interval *τ*. Since reorientations occur stochastically in continuous time, observations do not in general coincide with reorientation events. The observed track takes the form of a series of transition vectors, denoted **d**_{i}, where the index *i* gives the ordering of the transitions. The quantities of interest are the relative apparent speed (RAS) between consecutive observations, *R*_{i}/*c*_{const} = ||**d**_{i}||/(*c*_{const}*τ*), and the AAC between three consecutive observations, *Θ*_{i} = arccos(**d**_{i} · **d**_{i+1}/(||**d**_{i}|| ||**d**_{i+1}||)) (figure 1). The RAS is a non-dimensional measure of the proportion of the true speed attained by the observed individual across a sampling interval; use of this quantity ensures that our results are independent of *c*_{const}. We record all observed quantities in order to approximate the pdf by computing a histogram. We also compute the mean RAS, the standard deviation of RASs, *σ*_{c}, and the angular deviation of AACs, *σ*_{θ}.
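As an illustration of these definitions, the following Python sketch computes RASs and AACs from a list of positions sampled at a fixed interval. The function name `apparent_quantities` and its arguments are hypothetical and are not taken from the code used in this study. Angles adjacent to a zero-length transition are skipped, since the arccosine expression is undefined there.

```python
import math

def apparent_quantities(positions, tau, c_const=1.0):
    """Return (ras, aac) for a list of (x, y) positions sampled every tau."""
    # Transition vectors d_i between consecutive observations
    steps = [(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(positions, positions[1:])]
    lengths = [math.hypot(dx, dy) for dx, dy in steps]
    # Relative apparent speed: ||d_i|| / (c_const * tau)
    ras = [length / (c_const * tau) for length in lengths]
    aac = []
    for (ax, ay), (bx, by), la, lb in zip(steps, steps[1:], lengths, lengths[1:]):
        if la == 0.0 or lb == 0.0:
            continue  # angle undefined when either transition has zero length
        cos_angle = (ax * bx + ay * by) / (la * lb)
        # Clamp to [-1, 1] to guard against rounding before taking arccos
        aac.append(math.acos(max(-1.0, min(1.0, cos_angle))))
    return ras, aac
```

Because the definition uses the arccosine, the angles returned lie in [0, π] and carry no sign.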

When computing summary statistics for the run-and-stop CRW, we discard AACs and speeds arising from static intervals in which the simulated individual has zero displacement. Static intervals in tracks are readily discernible, and provide a direct means to calculate the underlying reorientation angle. Where the tracking data permit this type of direct analysis, for example, when reorientations occur at landing sites [14], other analysis methods are more appropriate than computing all apparent quantities.

For simplicity, we neglect the effects of measurement noise; while the effects of noise and sampling frequency are related [4], many open questions remain about the role of sampling frequency alone. We discuss this in further detail in §6.

### 2.1. Stochastic simulation algorithm

Throughout this study, we use the following stochastic simulation algorithm to generate realizations of the unbiased two-dimensional CRW. In the run-only model, waiting times between reorientation events are drawn from the exponential distribution with mean time *τ*_{R}. In the run-and-stop model, the duration of each stationary reorientation event is also exponentially distributed with mean *τ*_{S}. Reorientation events are simulated by drawing an angle change from the VM distribution with concentration parameter *κ* using the algorithm of Best & Fisher [22]. Between reorientations, the individual moves in the chosen direction with constant speed, *c*_{const} = 1 m s^{−1}. The stochastic trajectory of a single individual is simulated in this way until the total time exceeds a predefined value, at which point the simulation is terminated. The position of each simulated individual is finally calculated at regularly spaced times, separated by the sampling interval *τ*.
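The algorithm described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Matlab implementation used in this study: it draws turning angles with the standard library's `random.vonmisesvariate` rather than a hand-written Best & Fisher sampler, it starts every track in the running state with a uniformly random heading, and the function name `simulate_track` and its arguments are hypothetical. Passing `tau_s=None` gives the run-only model; a positive `tau_s` gives the run-and-stop model.

```python
import math
import random

def simulate_track(total_time, tau, tau_r, kappa, tau_s=None, c_const=1.0, rng=None):
    """Simulate a 2D CRW and return positions sampled every tau."""
    rng = rng or random.Random()
    t, x, y = 0.0, 0.0, 0.0
    heading = rng.uniform(-math.pi, math.pi)
    knots = [(t, x, y)]  # times and positions at which the motion changes
    while t < total_time:
        run = rng.expovariate(1.0 / tau_r)  # exponential run duration, mean tau_r
        t += run
        x += c_const * run * math.cos(heading)
        y += c_const * run * math.sin(heading)
        knots.append((t, x, y))
        if tau_s is not None:
            t += rng.expovariate(1.0 / tau_s)  # stationary phase, mean tau_s
            knots.append((t, x, y))
        # Turning angle relative to the previous heading (von Mises, mean zero)
        heading += rng.vonmisesvariate(0.0, kappa)
    samples, j = [], 0
    for i in range(int(total_time / tau) + 1):
        ts = i * tau
        # Advance to the piecewise-linear segment that contains the sampling time
        while j + 2 < len(knots) and knots[j + 1][0] < ts:
            j += 1
        (t0, x0, y0), (t1, x1, y1) = knots[j], knots[j + 1]
        w = (ts - t0) / (t1 - t0) if t1 > t0 else 0.0
        samples.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return samples
```

Storing only the change points and interpolating afterwards keeps the underlying track independent of *τ*, so the same realization can be resampled at several intervals.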

All simulations are implemented in Matlab. The code used to simulate tracks is available in the electronic supplementary material, in addition to a movie illustrating the simulated tracks.

## 3. Stationary sampling distributions

When a run-only CRW is sampled at low frequency, many reorientation events occur between consecutive observations and hence persistence in the observed motion is lost. The observed process is diffusive, provided that the mean cosine of the underlying angle change distribution satisfies *ρ* < 1 [11,23]. We now describe the asymptotic behaviour of the CRW in the diffusive limit. Consider an individual undergoing two-dimensional Brownian motion, observed at regular time intervals, *τ*. Between each observation, the displacements in each dimension, *d*_{x} and *d*_{y}, are iid and normally distributed, with zero mean and variance equal to 2*Dτ*, where *D* is the macroscopic diffusion coefficient. Othmer *et al.* [13] have previously shown that, for a constant-speed two-dimensional run-only CRW,

$$D = \frac{c_{\text{const}}^{2}\,\tau_{R}}{2(1-\rho)}.$$

The RAS between consecutive sample points, $\sqrt{d_{x}^{2}+d_{y}^{2}}/(c_{\text{const}}\tau)$, is therefore distributed according to the scaled *χ* distribution with two degrees of freedom and scale parameter $\sqrt{2D\tau}/(c_{\text{const}}\tau)$. The mean of this distribution is given by

$$\operatorname{E}\!\left[\frac{R}{c_{\text{const}}}\right] = \sqrt{\frac{\pi D}{c_{\text{const}}^{2}\tau}} = \sqrt{\frac{\pi\,\tau_{R}}{2(1-\rho)\,\tau}} \tag{3.1}$$

and its standard deviation is given by

$$\sigma_{c} = \sqrt{\frac{(4-\pi) D}{c_{\text{const}}^{2}\tau}} = \sqrt{\frac{(4-\pi)\,\tau_{R}}{2(1-\rho)\,\tau}}. \tag{3.2}$$

Expressions (3.1) and (3.2) are plotted in figure 2 overlaid with stochastic simulation results for a range of values of *σ*_{δ}. As *σ*_{δ} increases, the time scale of relaxation to the diffusion limit decreases. This result is intuitive; trajectories with high angular deviation lose persistence rapidly, as reorientations are often drastic, hence the time scale at which they appear diffusive is short compared to a trajectory with low angular deviation. Regardless of the value of *σ*_{δ}, the standard deviation of RASs initially increases with sampling interval, before eventually decreasing at sampling intervals near to the diffusive limit. We return to this observation in §4.1.
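For reference, the diffusive-limit predictions can be evaluated with the short Python sketch below. It assumes that the RAS follows a scaled *χ* distribution with two degrees of freedom (a Rayleigh distribution) whose scale is √(*τ*_{R}/((1 − *ρ*)*τ*)), with *ρ* = 1 − *σ*_{δ}²/2. The function name `diffusive_ras_stats` is hypothetical.

```python
import math

def diffusive_ras_stats(tau_over_tau_r, sigma_delta):
    """Mean and standard deviation of the RAS in the diffusive limit."""
    rho = 1.0 - sigma_delta ** 2 / 2.0  # mean cosine from the angular deviation
    # Rayleigh scale parameter of the RAS, sqrt(2 D tau) / (c_const tau)
    scale = math.sqrt(1.0 / ((1.0 - rho) * tau_over_tau_r))
    mean = scale * math.sqrt(math.pi / 2.0)
    std = scale * math.sqrt((4.0 - math.pi) / 2.0)
    return mean, std
```

These values are meaningful only for large *τ*/*τ*_{R}; at short sampling intervals the formula exceeds one, whereas the RAS of a constant-speed walker cannot.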

For a diffusing individual, the AACs between consecutive pairs of observations are distributed uniformly on a circle [23]. The wrapped uniform distribution has a mean cosine of zero, therefore from equation (2.2) the apparent angular deviation is *σ*_{θ} = √2. This result is plotted in figure 3. As for the mean RAS, the greater the true angular deviation, the more quickly the observed process relaxes to the diffusion limit.

The results presented above may be extended to include a stationary reorientation phase in a straightforward manner, by noting that the mean proportion of time spent in a running state is given by *τ*_{R}/(*τ*_{R} + *τ*_{S}). Multiplying the macroscopic diffusion coefficient *D* by this quantity gives the effective diffusion coefficient for a run-and-stop CRW.

## 4. Simulation study of dynamic sampling distributions

For many applications, the apparent motile behaviour of individuals at shorter sampling intervals is of greater practical interest than in the diffusive limit. Having derived results for the asymptotic properties of the run-only CRW when *τ*/*τ*_{R} is large, we now consider the effect of sampling on time scales corresponding to *τ*/*τ*_{R} ≥ 1. Here, persistence in the motion is perceivable and the observed process is not purely diffusive. We shall use stochastic simulations of the run-only and run-and-stop CRW models to compute the pdf of RASs and AACs for a range of values of *τ*/*τ*_{R}.

### 4.1. Run-only model

The distribution of RASs for different sampling intervals is shown in figure 4*a* for two different values of *σ*_{δ}. There is a greater spread in the case *σ*_{δ} = 0.4 compared with the case *σ*_{δ} = 0.1, because each reorientation event tends to result in a larger change of direction. As *τ*/*τ*_{R} is increased, the RAS distribution becomes broader because each sampling interval has a greater probability of containing one or more reorientation events. When a reorientation occurs in an interval, the RAS is reduced.

The results in figure 4*a* are in agreement with the findings of Codling & Hill [12]. As we increase *τ*/*τ*_{R}, the distribution of RASs is skewed towards lower speeds. This decreases the mean RAS. We further investigate this phenomenon in figure 5, in which we show the variation of the first percentile RAS with *τ*/*τ*_{R}. When the underlying angular deviation is high, the first percentile RAS is less than 0.5 for even the highest sampling rate considered. Conversely, when *σ*_{δ} = 0.1, the first percentile RAS remains close to one until the sampling interval is relatively large. This is a key example of the use of alternative summary statistics to demonstrate the effects of sampling rate. The insight gained from considering the full distribution of RASs now allows us to rationalize the variation of the standard deviation of RASs, plotted in figure 2*b*. The RASs are tightly clustered about the true value of *c* for small *τ*/*τ*_{R}, becoming increasingly skewed towards lower RASs as *τ*/*τ*_{R} increases, leading to the initial increase in *σ*_{c} with *τ*/*τ*_{R}. As the sampling interval becomes large, RASs are observed in the vicinity of zero with increasing frequency, which reduces *σ*_{c}, leading to the observed asymptotic behaviour.

As we sample more frequently, the distribution of RASs converges to the true underlying distribution, as individuals undergo fewer reorientations within each sampling interval. For the constant-speed process considered here, the distribution of RASs therefore tends towards a delta function at the true underlying speed as we decrease *τ*/*τ*_{R}. This result is not restricted to a constant-speed process; when there is an underlying distribution of speeds, the observed speed distribution converges to the true underlying distribution (data not shown).

Figure 4*b*,*c* show the distribution of AACs for several values of *τ*/*τ*_{R} and *σ*_{δ}. The VM distribution used for reorientations in the underlying CRW is also shown. Note that this distribution is constant in each figure; the change in height is because of a rescaling of the *y*-axis. The broadening of the distribution of AACs results in an increase in *σ*_{θ}. In the rapid sampling limit, this distribution tends towards a delta function at the origin, because most sampling intervals contain no reorientations and hence measure zero angle change. Such intervals are described as ‘artificial zero turns’ by Codling & Hill [12]. In contrast with RASs, where the true underlying distribution is obtained in the limit *τ*/*τ*_{R} → 0, at no sampling interval does the distribution of AACs match the true underlying angle change distribution. This is because, in making observations at regular intervals, we sample both across and between reorientations.

### 4.2. Run-and-stop model

We now perform a similar analysis for the run-and-stop model, in which the additional parameter *τ*_{S} specifies the mean duration of a stopping phase. The distributions of RASs for two values of *σ*_{δ} and *τ*_{S} are shown in figure 6. Compared with the run-only model (figure 4), two peaks are visible at *c* = 0 and 1, corresponding to individuals that are stationary for the duration of an interval and to individuals that undergo no reorientations in an interval, respectively. Both decay as *τ*/*τ*_{R} increases. The density at intermediate RASs spans the full range of permissible values, in contrast to the run-only case, in which the range of observed values is more limited. When *τ*_{S} = *τ*_{R}, the densities in the two outer peaks are equal, as individuals populate both the running and reorientating states equally, on average. Reducing *τ*_{S} causes the distribution to be more negatively skewed for all values of *τ*/*τ*_{R} considered, as individuals spend more time running than reorienting.

In figure 7, we plot the mean and standard deviation (*σ*_{c}) of RASs for different values of *τ*_{S}. As discussed earlier, we exclude static intervals in the calculation of these quantities. Comparison of figures 2*a* and 7*a* indicates that the time scale on which the process appears diffusive is longer in the run-and-stop CRW than the run-only CRW. In all cases, the mean RAS decreases monotonically for all sampling intervals considered, with the initial rate of decrease becoming more marked for larger *τ*_{S}.

We find that *σ*_{c} displays a non-monotonic dependence on the sampling rate for *τ*_{S} = 0.2, 0.5 and 1 s. For small sampling intervals, the RAS pdf is dominated by the two peaks at 0 and 1, with resulting large standard deviation. As the sampling interval increases, the density between these peaks initially increases, hence *σ*_{c} decreases. When log(*τ*/*τ*_{R}) ≈ 2, the outer peaks are no longer apparent, the density in the central region continues to broaden with sampling interval, and hence *σ*_{c} increases with *τ*/*τ*_{R}. At yet longer sampling intervals, RASs are increasingly observed in the vicinity of 0, therefore *σ*_{c} again decreases with *τ*/*τ*_{R}, as predicted in §3 and discussed in the case of the run-only model.

The distribution of AACs is shown for *σ*_{δ} = 0.4 and *τ*_{S} = 1 s in figure 8*a*. The distributions shown do not include angle changes arising from static intervals. Comparing with figure 4, we see that the inclusion of a stopping phase causes the distribution of observed angle changes to become more concentrated about *θ* = 0. The shape of the distribution is, however, qualitatively similar in both cases. Figure 8*b* shows the dependence of *σ*_{θ} on sampling frequency. As for the run-only CRW, an asymptote is reached at *σ*_{θ} = √2, though the time scale of relaxation to this diffusive limit increases with *τ*_{S}. Furthermore, as *τ*_{S} increases, the angular deviation decreases for a given value of *τ*/*τ*_{R}. A larger value of *τ*_{S} means that, on average, fewer reorientations occur over a sample interval. As a result, the distribution of AACs has greater density at low values, and the apparent angular deviation is lower.

## 5. Analytic study of dynamic sampling distributions

While stochastic simulations are useful for assessing the effect of sampling rate on the observed CRW, they are time-consuming, as many realizations are required to obtain sufficiently smooth distributions. Furthermore, a new simulation is required for each new set of parameters, making it hard to draw general conclusions about the effect of varying the reorientation angle pdf. We therefore seek an analytic description of the underlying CRW, for application to the sampling rate problem. As before, we consider an unbiased, run-only, constant-speed, two-dimensional CRW. We include a general underlying pdf for reorientations, *f*_{Φ}(*ϕ*). In order to make analytic progress, we assume that the relative sampling interval, *τ*/*τ*_{R}, is sufficiently small that the probability of two or more reorientations occurring between consecutive sample points is negligible. Using the fact that reorientations in the underlying CRW occur as a Poisson process, this probability is given by

$$P(N \ge 2) = 1 - \left(1 + \frac{\tau}{\tau_{R}}\right) e^{-\tau/\tau_{R}}, \tag{5.1}$$

where *N* denotes the number of reorientations in a time interval of duration *τ*. We also assume that no two consecutive sampling intervals both contain reorientation events; the probability that two given consecutive intervals each contain at least one reorientation is

$$\left(1 - e^{-\tau/\tau_{R}}\right)^{2}. \tag{5.2}$$

Expression (5.2) is always greater than (5.1). For example, choosing *τ*/*τ*_{R} = 0.05, (5.1) evaluates to 1.2 × 10^{−3}, and (5.2) gives a value of 2.4 × 10^{−3}, so one of the two assumptions is broken approximately once every 280 intervals, on average. Discounting these events, all intervals are assumed to contain at most one reorientation event, with no consecutive events. A key example of an experimental protocol that is compatible with these assumptions is bacterial motility, which is commonly probed using video microscopy to perform tracking [1]; in these studies, the frame capture rate of the digital microscope camera is typically significantly higher than the rate at which bacteria reorient [1,24,25].
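The quoted values can be checked with a few lines of Python. The sketch assumes the Poisson forms 1 − (1 + *x*)e^{−*x*} for two or more reorientations in one interval and (1 − e^{−*x*})² for reorientations in two consecutive intervals, where *x* = *τ*/*τ*_{R}. The function names are hypothetical.

```python
import math

def p_two_or_more(x):
    """Probability of two or more Poisson events in an interval, with mean x events."""
    return 1.0 - (1.0 + x) * math.exp(-x)

def p_consecutive(x):
    """Probability that two consecutive intervals each contain at least one event."""
    return (1.0 - math.exp(-x)) ** 2

x = 0.05
p1, p2 = p_two_or_more(x), p_consecutive(x)
# Approximate mean number of intervals between violations of either assumption
intervals_between_violations = 1.0 / (p1 + p2)
print(f"{p1:.2e} {p2:.2e} {intervals_between_violations:.0f}")
```

Adding the two probabilities treats the violations as mutually exclusive, which slightly overestimates their combined rate but is adequate at this order of magnitude.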

Figure 9 shows two portions of a simulated trajectory corresponding to *N* = 0 and 1. The former case always returns AACs equal to 0 and step lengths equal to *c*_{const}*τ*. In the case of a single reorientation event, the true underlying trajectory consists of two straight line sections separated by an instantaneous reorientation of angle *Φ*, given relative to the previous direction of travel. In contrast, the observed path appears to contain two reorientation events, with angles *Θ*_{1} and *Θ*_{2} = *Φ* − *Θ*_{1}. The statistical properties of the unbiased CRW are time-reversal invariant, meaning that the distributions of *Θ*_{1} and *Θ*_{2} must be identical. We therefore drop the subscript in all further notation. The apparent displacement is denoted by *R*. As before, the difference between the apparent and true paths is because of the discrete sampling of the continuous underlying trajectory. Let *L* denote the distance between the last sampling point and the reorientation event. For a constant-speed process, the distance travelled by an individual over the course of a sampling interval is given by *c*_{const}*τ*. The distance between the reorientation event and the next sampling point is therefore *c*_{const}*τ* − *L*. We reiterate that the CRW model used in the present study assumes *Φ* is independent of *L*.

We now derive an expression for the joint pdf of the apparent step lengths and angle changes, *R* and *Θ*, denoted *f*_{R,Θ}(*r*, *θ*), assuming that the functional form of the angular pdf *f*_{Φ}(*ϕ*) and the value of *c*_{const} are both known. Knowledge of this pdf permits us to approximate the pdf of RASs and AACs when *τ*/*τ*_{R} ≪ 1. We use stochastic simulations to study the range of values of *τ*/*τ*_{R} for which our approximation holds.

We first define non-dimensional parameters by rescaling the total path length between sampling points, *c*_{const}*τ*, as follows:
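
$$\tilde{r} = \frac{r}{c_{\text{const}}\tau}, \qquad \tilde{l} = \frac{l}{c_{\text{const}}\tau}. \tag{5.3}$$

Note that $\tilde{r}$ is equivalent to the RAS. The desired joint pdf, denoted $f_{\tilde{R},\Theta}(\tilde{r},\theta)$ following rescaling, is then given by a sum of conditional joint pdfs:

$$f_{\tilde{R},\Theta}(\tilde{r},\theta) = \sum_{n=0}^{1} P(N=n)\, f_{\tilde{R},\Theta\mid N}(\tilde{r},\theta\mid n), \tag{5.4}$$

where $f_{\tilde{R},\Theta\mid N}$ is the joint pdf of $\tilde{R}$ and *Θ*, conditional on the number of reorientation events, $N \in \{0, 1\}$. Since *N* = 0 corresponds to no change in *r* and *θ*, we have

$$f_{\tilde{R},\Theta\mid N}(\tilde{r},\theta\mid 0) = \delta(\tilde{r}-1)\,\delta(\theta), \tag{5.5}$$

where *δ* denotes the Dirac delta function. This result alone is unremarkable; it is the inclusion of the *N* = 1 case that explains the observed broadening of the distributions of speeds and angle changes with small *τ*/*τ*_{R}. We now use an analytic description of the bijective mapping between the $(\tilde{l},\phi)$ and $(\tilde{r},\theta)$ representations of the *N* = 1 trajectory to compute $f_{\tilde{R},\Theta\mid N}(\tilde{r},\theta\mid 1)$:

$$\tilde{l} = \frac{1-\tilde{r}^{2}}{2(1-\tilde{r}\cos\theta)} \tag{5.6}$$

and

$$\phi = \operatorname{sgn}(\theta)\,\arccos\!\left[\frac{\tilde{r}^{2}-\tilde{l}^{2}-(1-\tilde{l})^{2}}{2\tilde{l}(1-\tilde{l})}\right]. \tag{5.7}$$

The joint pdf of $\tilde{R}$ and *Θ*, conditional upon *N* = 1, is then

$$f_{\tilde{R},\Theta\mid N}(\tilde{r},\theta\mid 1) = \left|\frac{\partial(\tilde{l},\phi)}{\partial(\tilde{r},\theta)}\right| f_{\tilde{L}}(\tilde{l})\, f_{\Phi}(\phi), \tag{5.8}$$

where the first term is obtained by evaluating the determinant of the Jacobian and we have made use of the independence of *L* and *Φ*. For a Poisson process, the distribution of events within an interval, conditional on the number of events, is uniform [26], and hence $f_{\tilde{L}}(\tilde{l}) = 1$ for $0 \le \tilde{l} \le 1$. Substituting (5.6) and (5.7) into (5.8) yields

$$f_{\tilde{R},\Theta\mid N}(\tilde{r},\theta\mid 1) = \frac{\tilde{r}}{1-\tilde{r}\cos\theta}\, f_{\Phi}\big(\phi(\tilde{r},\theta)\big), \qquad 0 \le \tilde{r} \le 1, \tag{5.9}$$

where $\phi(\tilde{r},\theta)$ is given by (5.6) and (5.7).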
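The single-reorientation density can be explored numerically with the Python sketch below. It rests on assumptions that should be checked against the derivation: the rescaled run length before the turn is (1 − r̃²)/(2(1 − r̃ cos *θ*)), the Jacobian factor is r̃/(1 − r̃ cos *θ*), and the underlying turn angle is recovered with `atan2` from the triangle formed by the two run segments and the apparent displacement. The marginal is evaluated with a midpoint rule rather than the Gaussian quadrature used in this study, and all function names are hypothetical.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I_0(x), summed from its power series."""
    term, total, k = 1.0, 1.0, 1
    while term > 1e-17 * total:
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
        k += 1
    return total

def von_mises_pdf(phi, kappa):
    """Zero-mean von Mises density with concentration kappa."""
    return math.exp(kappa * math.cos(phi)) / (2.0 * math.pi * bessel_i0(kappa))

def underlying_from_apparent(r, theta):
    """Map the rescaled apparent pair (r, theta) to the underlying pair (l, phi)."""
    l = (1.0 - r * r) / (2.0 * (1.0 - r * math.cos(theta)))
    # Angle between the apparent displacement and the second run segment
    psi = math.atan2((1.0 - r * r) * math.sin(theta),
                     2.0 * r - (1.0 + r * r) * math.cos(theta))
    return l, theta + psi

def joint_pdf_one_turn(r, theta, kappa):
    """Joint density of (r, theta), conditional on exactly one reorientation."""
    _, phi = underlying_from_apparent(r, theta)
    return r / (1.0 - r * math.cos(theta)) * von_mises_pdf(phi, kappa)

def marginal_ras_pdf(r, kappa, n=2000):
    """Marginal density of r, by midpoint-rule integration over theta."""
    h = 2.0 * math.pi / n
    return h * sum(joint_pdf_one_turn(r, -math.pi + (k + 0.5) * h, kappa)
                   for k in range(n))
```

A convenient consistency check is the uniform case *κ* = 0, for which the marginal density of r̃ reduces to r̃/√(1 − r̃²) and integrates to one on [0, 1].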

We now integrate the joint pdf (5.4) to obtain the marginal pdfs for $\tilde{R}$ and *Θ*, respectively, denoted by $f_{\tilde{R}}(\tilde{r})$ and *f*_{Θ}(*θ*). In order to compare our results with those from a stochastic simulation, we must specify *f*_{Φ}(*ϕ*). We again use the VM distribution with zero mean and angular deviation *σ*_{δ} = 0.4. The remaining parameters are *τ*_{R} = 1 s, *τ* = 0.2 s and *c*_{const} = 1 m s^{−1}. Using the functional form of the VM distribution (2.1) in equation (5.9), it is not possible to obtain analytic forms for the marginal pdfs. We therefore use Gaussian quadrature to evaluate the pdf numerically.

Figure 10 shows the observed pdfs of RASs and AACs, computed with the stochastic simulation algorithm for *τ*/*τ*_{R} = 0.05 and *τ*/*τ*_{R} = 2, overlaid with the result of performing numerical integration of equation (5.9). The stochastic data are filtered to remove transitions in which no reorientations occur (the *N* = 0 case), as these result in a large number of artificial zero turns, and RASs equal to 1. This is achieved by discarding transitions with an AAC whose magnitude is less than a defined numerical tolerance, which we take to be 1 × 10^{−6}. Changing this value by two orders of magnitude makes no significant difference to the results (data not shown). The remaining data therefore correspond to one or more reorientation events. The agreement is good when *τ*/*τ*_{R} = 0.05 as this relative sampling interval is sufficiently small that our earlier assumptions are justified. Conversely, the agreement is poor when *τ*/*τ*_{R} = 2, with the stochastic simulation generating a broader distribution in both cases. This discrepancy for large relative sampling interval is caused by frequent violation of the underlying assumptions of our analysis.

In order to investigate the range of values of *τ*/*τ*_{R} for which equation (5.9) is valid, we compute the Kullback–Leibler (KL) divergence between the analytic and stochastic results [27]. The KL divergence quantifies the information lost when a discrete pdf, *Q*, is used to approximate a known discrete pdf, *P*, and is given by

$$D_{\mathrm{KL}}(P\,\|\,Q) = \sum_{x} P(x)\,\ln\frac{P(x)}{Q(x)}, \tag{5.10}$$

where *x* denotes the set of all permissible values of the observed data for which *Q*(*x*) ≠ 0. We let *P* equal the empirical pdf obtained from a histogram of the stochastic results and evaluate the numerical integrals of equation (5.9) over the same grid spacing to obtain *Q*. Figure 11 shows the variation of the KL divergence with *τ*/*τ*_{R} for a CRW with *σ*_{δ} = 0.4. The divergence increases more rapidly with *τ*/*τ*_{R} over the range considered for RASs than for AACs. In both cases, the observed agreement is poor for *τ*/*τ*_{R} ≥ 2 (compare with figure 10). The small residual KL divergence observed when *τ*/*τ*_{R} is very small is because of stochastic fluctuations in *P*, and decreases as the number of simulation iterations is increased (data not shown).
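A minimal Python sketch of this comparison is given below, assuming that both pdfs are supplied as lists of bin probabilities on the same grid and that the natural logarithm is used. Bins where *Q* is zero are excluded from the sum, and bins where *P* is zero contribute nothing. The function names are hypothetical.

```python
import math

def normalise(counts):
    """Convert histogram counts into a discrete pdf that sums to one."""
    total = float(sum(counts))
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) between two discrete pdfs."""
    return sum(pi * math.log(pi / qi)
               for pi, qi in zip(p, q)
               if pi > 0.0 and qi > 0.0)
```

For example, `kl_divergence(normalise(sim_counts), normalise(model_values))` would compare a simulated histogram with the analytic prediction evaluated on the same bins, where `sim_counts` and `model_values` are placeholder names for the two sets of bin values.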

## 6. Conclusions

In this paper, we have considered the effect of the sampling frequency on observed quantities of interest arising from an unbiased, constant-speed CRW in two dimensions. We studied two variants of the CRW, denoted the run-only and run-and-stop models. The CRW is a very versatile framework for modelling the motion of organisms. For example, it has previously been used to incorporate biased motion owing to the influence of gravity and light [19], stationary phases [13] and prokaryotic biochemical signalling pathways [28]. Sampling frequency is an important factor, as it has a strong effect on many of the quantities that researchers are commonly interested in extracting from tracking data [4,12]. For this study, we focused on the two most widely applicable quantities: apparent speeds and angle changes between successive observations. We initially derived results that are valid at long sampling intervals, based on the assumption of a diffusive process. These provided a useful consistency check for the ensuing work. We next analysed intermediate sampling interval regimes using simulated tracks, before using mathematical methods to describe the run-only model when the sampling frequency is high.

In our simulation study, we have extended the work of Codling & Hill [12] by investigating the effect that sampling frequency has on the observed pdf, rather than on common summary statistics. This represents a significant advance because summary statistics, while undoubtedly useful, may hide the complex nature of the sampling effect. For example, figure 4*a* shows that the distribution of RASs is skewed significantly towards lower speeds, a result not evident by considering the mean and standard deviation alone. Furthermore, several statistical tests based on the empirical cumulative distribution function exist to quantify the compatibility of observed data with a reference distribution [29]; this is not possible with knowledge of summary statistics alone. For example, Kareiva & Shigesada [14] develop a method to test whether the observed motion in their tracking data fits a CRW model based on the MSD. The authors assess the compatibility of the data with the CRW by direct comparison of theoretically predicted and empirically observed MSDs, yielding no information relating to the statistical significance of the match. In more recent tracking studies, sufficient data have been available to perform a more in-depth comparison, using the full distribution of speeds and angle changes [30]. We therefore propose that the present study enables a robust test of the statistical compatibility of these observables with the CRW model, taking sampling effects into account. Further work is required to test this conjecture on real tracking data.

The effect of incorporating a stationary reorientation phase into the CRW has not been considered before. This model of motion is applicable to many species that are amenable to study using tracking methods. Examples include bacteria such as *R. sphaeroides* that reorient by stopping the rotation of their flagellum [31], flying insects that pause on landing sites [14] and ruminants that pause at foraging sites [4,17]. We demonstrated that the inclusion of stationary phases in the underlying motion leads to complex variation in the observed standard deviation of RASs, and suggested an explanation based on consideration of the evolution of the RAS pdf. This again highlights the utility of analysing the full pdf of observed quantities.

An open question arising from the simulation study was how to infer the true distribution of angle changes. Since sampling intervals do not in general coincide with reorientation events, we cannot simply ‘read off’ the result from a histogram of framewise angle changes. This provided the motivation for our analytic study in §5, in which we derived a mathematical description of the sampling process and its effect on the observed data in the limit of rapid sampling. In previous studies, the analytic forms for the first, second and fourth apparent displacement moments after a specified number of reorientation events have been determined [14,18]. While these results are related to the problem considered here, they are not directly applicable to a study of the effect of sampling frequency, as the independent variable being considered is the number of reorientation events, not the sampling interval. Furthermore, to our knowledge no studies have derived an analytic description of the pdf of AACs. In contrast, our method permits a description of the variation of the full pdfs of both RASs and AACs with the sampling interval for any given pdf of underlying angle changes, provided it is possible to integrate equation (5.9) numerically. Additional investigation showed that our analytic approximation agrees well with simulated data for sampling intervals corresponding to . This is a wider range of values than expected based on consideration of the probability of more than one reorientation event occurring in a given sampling interval. This suggests that our result for the pdf of RASs and AACs conditional on a single reorientation event is similar to the pdfs conditional on a greater number of reorientations, which we discounted for the purposes of this study. As a result, many tracking experiments are sampled sufficiently rapidly that our results are applicable [1,4,9]. 
Our approach gives important new mathematical insight, and lays the groundwork for describing more complicated motion models.
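As a rough guide to the range that might have been expected from the single-reorientation argument, suppose (an assumption made here for illustration only) that reorientation events follow a Poisson process with mean run duration *τ*_{R}, so that the number *N* of events in a sampling interval *τ* is Poisson-distributed with mean *τ*/*τ*_{R}. The probability that more than one event falls within a single interval is then

```latex
\Pr(N \geq 2) = 1 - e^{-\tau/\tau_R}\left(1 + \frac{\tau}{\tau_R}\right)
```

Under this assumption, the probability is approximately 0.005 for *τ*/*τ*_{R} = 0.1, 0.26 for *τ*/*τ*_{R} = 1 and 0.59 for *τ*/*τ*_{R} = 2, so multiple reorientations per interval are far from rare well before the sampling interval reaches twice the mean run duration.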

The CRW model considered in this study is two-dimensional; further work is needed to investigate whether the findings are significantly different in the case of a three-dimensional CRW. This may be a more realistic model for some types of motile behaviour, such as the motion of air- and water-borne organisms. The tracking in these cases may still be two-dimensional owing to technical constraints [19], in which case it is necessary to determine the combined effects of sampling interval and projection of the observed motion onto a plane.

An important next step in this work is to consider the role of noise, which is present in all tracks obtained experimentally. The sources of noise depend on the application and include, for example, measurement error in acquiring position fixes [4,32] and errors in computing cell centroids in microscope tracking studies [33,34]. Further work is required to determine the complex interplay between experimental noise and the sampling frequency, which is beyond the scope of this study. Several studies [4,34–36] have explicitly considered the joint effects of noise and sampling rate on observed tracks; however, to our knowledge, no studies exist in which these joint effects have been systematically analysed using a CRW movement model. The simulation approach presented in §4 may be extended in a straightforward manner to include noise, for example by modelling measurement error as a Gaussian process [4]; this represents ongoing work.
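A toy example gives a sense of how noise and sampling interval might interact. The Python sketch below is not part of the simulation study: it assumes a straight, constant-speed track with no reorientations, and it models measurement error as an independent zero-mean Gaussian perturbation of each coordinate of each position fix, which is only one simple reading of the suggestion above. The function `mean_apparent_speed` and its parameters are hypothetical.

```python
import math
import random


def mean_apparent_speed(tau, sigma_noise, n_obs=1000, speed=1.0, rng=None):
    """Mean apparent speed of a noisy, straight, constant-speed track."""
    rng = rng or random.Random()
    # ASSUMPTION: independent Gaussian error on each coordinate of each fix.
    track = [(speed * k * tau + rng.gauss(0.0, sigma_noise),
              rng.gauss(0.0, sigma_noise))
             for k in range(n_obs)]
    steps = [math.hypot(x1 - x0, y1 - y0) / tau
             for (x0, y0), (x1, y1) in zip(track, track[1:])]
    return sum(steps) / len(steps)


rng = random.Random(2)
noisy_fast = mean_apparent_speed(tau=0.01, sigma_noise=0.05, rng=rng)
noisy_slow = mean_apparent_speed(tau=1.0, sigma_noise=0.05, rng=rng)
clean_fast = mean_apparent_speed(tau=0.01, sigma_noise=0.0, rng=rng)
print(noisy_fast, noisy_slow, clean_fast)
```

In this toy setting the true displacement per interval scales with the sampling interval while the positional error does not, so the same noise level inflates the apparent speed far more at the short interval than at the long one. Whether the same pattern carries over to a CRW with reorientations is exactly the kind of question the proposed extension would need to address.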

Further research is also required on more complex CRW models of motility. For example, our assumption that organisms move with a constant speed does not hold for many species of bacteria [37], mammals [4] and fish [38]. More work is needed to elucidate the effects of sampling rate when organisms move with varying speeds. It remains to be seen whether it is possible to extend the analytic approach developed in the current study to incorporate this effect.

## Acknowledgements

G.R. is supported by an EPSRC-funded Life Sciences Interface Doctoral Training Centre Studentship (EP/ES0160S/1). G.R. and A.G.F. are funded by EPSRC (EP/I017909/1) and Microsoft Research Cambridge.

- Received March 26, 2013.
- Accepted May 16, 2013.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.