## Abstract

Zebrafish are gaining momentum as a laboratory animal species for the investigation of several functional and dysfunctional biological processes. Mathematical models of zebrafish behaviour are expected to considerably aid in the design of hypothesis-driven studies by enabling preliminary *in silico* tests that can be used to infer possible experimental outcomes without the use of zebrafish. This study is motivated by observations of sudden, drastic changes in zebrafish locomotion in the form of large deviations in turn rate. We demonstrate that such deviations can be captured through a stochastic mean reverting jump diffusion model, a process that is commonly used in financial engineering to describe large changes in the price of an asset. The jump process-based model is validated on trajectory data of adult subjects swimming in a shallow circular tank obtained from an overhead camera. Through statistical comparison of the empirical distribution of the turn rate against theoretical predictions, we demonstrate the feasibility of describing zebrafish as a jump persistent turning walker. The critical role of the jump term is assessed through comparison with a simplified mean reversion diffusion model, which does not allow for describing the heavy-tailed distributions observed in the fish turn rate.

## 1. Introduction

Zebrafish (*Danio rerio*) are rapidly emerging as an experimental species for the investigation of functional and dysfunctional biological processes, owing to their sequenced genome, high reproduction rate, short intergeneration time, prominent shoaling tendency and elevated stocking density compared with laboratory mammals [1–4]. Furthermore, with the aim of generating high-throughput behavioural data [5–7], considerable research is being performed to integrate computer-animated images and robotic replicas of conspecifics, heterospecifics and predators [8–12]. Zebrafish have been used to study the fundamental mechanisms governing the exhibition of emotional patterns [13,14], individual response to alcohol and drugs of abuse [15–17], and higher order brain functions, such as memory and learning [18]. Zebrafish behaviour is typically measured from locomotory patterns [19,20], which are detected in the form of changes in speed and turn rate [5,6]; these patterns are annotated via human observation [21] or classified on the basis of sequences of trajectory data obtained using vision-based tracking [22,23].

While building of a comprehensive ethogram of zebrafish behaviour from video tracking data is well on its way [5], progress in the opposite direction, that is, generating realistic trajectory data through mathematical modelling of a given behaviour, is far from being realized. The ability to simulate such complex behaviours may inform experimental protocols and hypotheses by enabling *in silico* experiments that can be used to predict possible experimental outcomes without animal use. Within the three R's principle (replacement, reduction and refinement) [24], such virtual experiments can be used to perform *a priori* power analysis on the number of subjects, maximize the biological output of experimental observations and reduce the effect of biological confounds. All of these factors can contribute to a reduction in animal use and suffering. Moreover, models of individual fish motion form an important building block in modelling and analysis of fish collective behaviour [25–27].

Several models for the locomotion of an individual animal have incorporated uncorrelated and unbiased random walks, where increments to an animal's position are assumed to be independent [28,29]. Recent works have also addressed animal movement at the individual level using correlated random walks (CRWs) [26,30–33]. In these studies, successive steps taken by the animal are assumed to be correlated, as in the persistent random walker (PRW) model proposed in [30] to study ant displacement and the CRW model proposed in [31] to study caribou movement. These types of processes have recently been adapted to describe fish turn rate in the form of a persistent turning walker (PTW) [33]. In particular, the tendency of a barred flagtail fish (*Kuhlia mugil*) to turn both ways—left and right—was captured by modelling the turn rate as a stochastic mean reverting process with a drift toward zero [33].

In this paper, we develop a stochastic mean reverting jump diffusion model to study zebrafish locomotion. To better represent the swimming movements of zebrafish, we extend the PTW model [33] to allow for the possibility of sudden jumps in the fish turn rate (figure 1). We refer to this dynamic model as the jump persistent turning walker (JPTW). The jumps are modelled through a stochastic mean reverting jump diffusion process, which captures large deviations in turn rate in an otherwise well-behaved random process. Jump diffusion processes are commonly used to model sudden changes in stock prices and interest rates in finance [35,36], considerable deviations in particle kinematics in statistical physics [37–39], and the dispersal of microorganisms and foraging of animals in biology [40–42]. We evaluate the JPTW model on an experimental dataset [34] consisting of trajectories of single zebrafish in a shallow water tank. We identify the parameters of our model and of the PTW model from the observed trajectories, and statistically compare the goodness of fit of the models, revealing the importance of incorporating large-deviation jumps in zebrafish locomotion.

## 2. Results

### 2.1. Zebrafish exhibits a complex behavioural response during experimental observations

Visual inspection of 5-min video recordings revealed typical zebrafish behavioural responses. Specifically, in figure 2, traces of these fish trajectories show that zebrafish direction of motion is highly variable, while the speed displays more modest variations (see also the electronic supplementary material). We identify segments of a fish trajectory in which the direction of motion changes abruptly with black circles. These circles are placed where the absolute value of the instantaneous turn rate was found to be at least three times larger than the turn rate standard deviation; the prevalence of such circles emphasizes the rate at which large deviations in the turn rate can occur.

The zebrafish turn rate has an average value of −0.062 ± 0.098 rad s^{−1} for the entire group, suggesting that, on average, fish have no preference to turn in either direction. In figure 3, we present a representative zebrafish turn rate time series with a corresponding histogram of the increments to this process. Consistent with the circles in figure 2, the instantaneous turn rate can exceed three times the turn rate standard deviation (figure 3*a*). Meanwhile, the distribution of turn rate increments exhibits heavy tails and a peakedness that cannot be described by a Gaussian distribution (figure 3*b*). In particular, the heavy tails observed in the histogram are due to the presence of numerous spikes in the turn rate time series.

Figure 4 shows that the increments to the turn rate are correlated with their history over the duration of a few seconds. While indicative of an inertia in the fish turn rate, the exponential decay seen in the autocorrelations also justifies the representation of the turn rate as an autoregressive (AR) process [43]. Assuming a first-order AR(1) model, for a given time step Δ*t* corresponding to the video acquisition sampling time, the one-step correlation coefficient [44] of turn rate is approximately e ^{−θ}^{Δt}, where the parameter *θ* is defined such that 1/*θ* is the autocorrelation time. This parameter is included in the proposed turn rate model and has an average value 1.451 ± 0.245 s^{−1} across the whole group.

### 2.2. The JPTW turn rate distribution fits the empirical zebrafish turn rate

A comparison on the basis of a quantile–quantile (Q–Q) plot [45] of the empirical data against the predictions by the JPTW model shows a linear agreement (figure 5). In particular, the JPTW model presents a better agreement for all experimental datasets (figure 5*a*) compared with the PTW model (figure 5*b*). Closer analysis of the upper and lower quantiles shows that the tails of the probability distribution remain aligned with the theoretical tails in the JPTW model, but do not match for the PTW model. Specifically, the PTW model assumes a turn rate where random increments are Gaussian, and, as a consequence, the upper and lower tails of the experimental data, corresponding to large changes in direction, are not matched by the theoretical quantiles of the Gaussian distribution. The PTW model, shown in green in figure 5*c*, does not match the observed distribution of turn rate, both in the tails of the distribution and for small turn rates, owing to the nature of the observed data, which show a distinct peakedness and heavy tails.

### 2.3. Including jumps in the PTW model improves the statistical fit on empirical data

The proposed JPTW model introduces two parameters, in addition to the relaxation rate *θ* and turn rate variability *σ* defined in the PTW model, for describing the large, occasional changes in direction. A likelihood ratio test (LRT) comparing the two models (table 1), PTW and JPTW, shows that the incorporation of jumps in the former improves the statistical fit (*p* < 0.001, LRT > *χ*^{2}(2 d.f.)). One-way ANOVA comparisons on model parameters common to the PTW and JPTW models indicate that neither the relaxation rate nor the turn rate variability is differently estimated by the two models (*p* = 0.077, *F*_{1,12} = 3.65 relaxation rate; *p* = 0.400, *F*_{1,12} = 0.75, turn rate variability).

## 3. Discussion

In behavioural studies of zebrafish, methods that rely on human observation comprise an important tool-set used in analysis [5,6]. Coupled with recently introduced tracking-based tools [23,46], these methods have led to considerable advancements in our understanding of zebrafish behaviour. Besides serving to facilitate the behavioural annotation process, the trajectory data made available through these tools can be further exploited to formulate and calibrate mathematical models of zebrafish behaviour. Such models can, in turn, be used to perform highly tailorable exploratory *in silico* experiments before animal testing. Computational trials can benefit both the biological output of animal tests, by maximizing the information content of trials, and the welfare of subjects, by improving the reliability of power analysis and reducing the role of biological confounds.

In this work, we present a model of fish turn rate temporal evolution in the form of a stochastic mean reverting jump diffusion process. In addition to offering a more robust model of zebrafish locomotion, the inclusion of jumps is intended to capture sudden and abrupt changes in direction [5,23]. Similar to previous studies on individual animal motion [32,33], zebrafish turn rate is found to have correlated time increments. Unlike the smooth trajectories seen in representative traces of larger fish, such as the barred flagtail fish in [33], the direction of motion of zebrafish is found to change abruptly. These occasional large deviations are responsible for the heavy tails observed in the probability distribution of the turn rate, and they suggest the use of a turn rate model that allows for such extreme events.

The fact that zebrafish locomotion is better described through the inclusion of jumps highlights well-known effects of morphology on swimming [47,48]. In particular, fish must accommodate the constraints imposed by viscous and inertial forces of the water. Zebrafish are thought to transition from a viscous to an inertial flow regime as they develop from larvae to adults, and its burst-and-coast swimming style may be a consequence of such a transition [49]. In adults, the burst phase of motion is often powered by a single tail flick, which changes the direction of the fish, and subsequent bursts can then occur before the resulting coasting phase is complete [49]. It is therefore tenable that zebrafish turn rate is better described through jumps. Larger fish, on the other hand, which spend more time in the inertial flow regime, tend to employ a more continuous swimming style [50]. In [26,33], for instance, a model of the turn rate of *K. mugil* without jumps, yields good predictions of its locomotion. However, *K. mugil* is at least six times larger than zebrafish and has a larger mass, which affects the interplay of viscous and inertial forces on the locomotion of the fish.

The parameters common to both the PTW and JPTW model include the relaxation rate, which measures the rate at which a turning fish ceases to turn, and the variability of the turn rate. Both parameters are not significantly different between models, suggesting that the inclusion of jumps in the PTW may not affect its baseline parameters, especially the turn rate variability. The parameters unique to the JPTW model, namely, the frequency of large turns and their magnitude, show that while jumps occur infrequently (approx. one large turn every 3 s), such turns are quick (2.69 ± 0.59 rad s^{−1}) compared with the average turn rate (−0.062 ± 0.098 rad s^{−1}). These large infrequent jumps are probably indicative of the high degree of manoeuvring control, coupled with an increased rate of axial undulations by the tail and fins by zebrafish adults [51]. Such manoeuvres, which are well below the magnitude of a typical escape response [5], are commonly exhibited by zebrafish in their natural habitat as part of an ontogenic shift in their foraging behaviour to avoid inter- and intra-specific competition [51].

Overall, these findings suggest that the inclusion of sudden jumps in a stochastic mean reverting model can aid in the quantification of zebrafish locomotion. The resulting JPTW model enables a robust interpretation of the components of fish turn rate, while improving the fit of the experimental data to the model with respect to existing models of fish locomotion, where jumps are discarded. In addition to informing the design of zebrafish behavioural studies, the new model is fully customizable as a baseline for more complex studies. In particular, similar to [26,33], the effects of stimuli (e.g. a predator or conspecifics), environmental factors (e.g. wall or obstacle, light/dark) and possible variations of speed [52] can be incorporated in the model as control parameters. With respect to the latter aspect, we have recently been collaborating to assess the possibility of extending the PTW model to account for speed variations [53]. Finally, under the framework proposed here, collective behaviour can be studied using the individual kinematics of the JPTW model alongside defined information sharing rules [26].

## 4. Material and methods

### 4.1. Experimental apparatus and procedure and data collection

The data used in this paper are derived from the trajectory dataset published in [34]. The experimental apparatus consisted of a large 120 × 120 cm square tank with a 90 cm diameter ring to partition off the experimental region. The water was maintained at a constant depth of 10 cm. A high-resolution wide-angle camera (Hero 3, GoPro, San Mateo, CA, USA) was mounted 150 cm above the water surface to observe the fish. The tank was lit from above with four 25 W fluorescent lamps mounted approximately 100 cm above the water surface. The set-up was isolated with dark curtains on each side of the tank.

Zebrafish were procured from an online aquaria (Liveaquaria.com Rhinelander, WI, USA) and were kept in the housing facility for at least 12 days prior to the experiment. The water temperature and pH in the housing tanks was maintained at 25 ± 1°C and 7.2, respectively. The stocking density was maintained at no more than 1 fish per litre and the lighting was controlled according to a 12 L : 12 D circadian rhythm [54].

The experimental procedure consisted of introducing a single zebrafish in a large tank and recording videos of its swimming movements. Ten experimentally naive fish were observed for 5 min at 24 frames per second, so that Δ*t* = 1/24 s. Further details of the experimental procedure are provided in [34].

### 4.2. Data processing

Fish motion was tracked using an automated tracking algorithm [34] developed in MATLAB that used a Kalman filter to estimate the two-dimensional position. Briefly, the Kalman filter used the fish blob centroid, after background subtraction, as measurement and a constant velocity assumption between successive frames to optimize the position and velocity estimates in a minimum mean-square sense. Fish trajectories were further verified and repaired, if needed, on a graphical user interface (GUI) to ensure the availability of full-length tracks for each experiment. Similar to [33], oscillations due to the beating of the fish tail, which occur at approximately 2 Hz on average for zebrafish [55], were removed from the data using a Daubechies wavelet filter [56] implemented with the *wden* routine in MATLAB.

Trajectory data were analysed to exclude trials where the fish demonstrated abnormal freezing or wall-following. Freezing was quantified on the basis of [57], where the fish was considered freezing if it remained within a circle of 2 cm radius for more than 2 seconds. A threshold of 10% of time freezing was used to dismiss trials for abnormal stress levels [11,21]. To avoid numerical confounds associated with the interaction with the wall, fish spending more than 35% of the experimental time within two body lengths of the circular partition were also dismissed. Future work will seek to enrich the proposed JPTW with a model of wall interactions, following a similar line of arguments as [53], to predict both orientation and position of zebrafish in tanks of varying size and geometry. We anticipate that such interactions will be more relevant as the tank size is decreased and corners are present.

Following the methodology of [33], trajectory data were mapped to the intrinsic coordinates (*S*(*k*), *ψ*(*k*)), where *S*(*k*) is the path length at discrete time *k*, and *ψ*(*k*) is the direction of motion. The latter is defined such that *ψ*(*k*) − *ψ*(*k* − 1) is the angle between the velocity vectors of the fish at the video frames *k* and *k* − 1, for *k* = 2, *…*, *T*, where *T* is the total number of frames. Fish instantaneous speed *v*(*k*) (cm s^{−}^{1}) and turn rate *ω*(*k*) (rad s^{−}^{1}) were initially computed with central difference approximations; notably, no filtering was performed on the turn rate, except for treating a few false readings (with values greater in magnitude than 20 rad s^{−1}), for which the turn rate at the previous time step was then used. The intrinsic coordinates (*S*(*k*), *ψ*(*k*)) were then obtained using finite differences. We comment that the procedure to construct *ψ*(*k*) does not wrap the fish direction of motion around *±π*, leaving the variable *ψ*(*k*) free of artificially induced jumps.

### 4.3. Fish model

We extend the work in [33] to model zebrafish turn rate in the form of a mean reverting stochastic differential equation (SDE) with jumps, which are intended to capture sudden changes in the turn rate. Our model consists of three components in a continuous-time formulation. The first component drives the turn rate to its mean value, which is chosen to be zero. We refer to this process as mean reversion. The second component is standard Brownian motion, which represents continuous but well-behaved variation in the turn rate, and is modelled using a Wiener process [58]. The final component is a jump process whose mechanism is derived in what follows. In brief, the jump term will occasionally increment the turn rate by a potentially larger amount. More formally, the turn rate *ω _{t}* is described by the following SDE in time

*t*4.1where d

*t*(s) is an infinitesimal time increment;

*θ*(s

^{−}^{1}) denotes the mean reversion coefficient, also known as the relaxation rate [32] that characterizes the tendency of a turning fish to return to straight swimming (its inverse

*τ*= 1/

*θ*(s) is the relaxation time [26]);

*σ*(rad s

^{−}^{1}) captures the variability of the turn rate increments;

*B*is a standard Wiener process [58] whose increments d

_{t}*B*are mutually independent and sampled from a normal distribution ;

_{t}*J*is the jump process [59]; and d

_{t}*J*is an infinitesimal jump increment. Here and in what follows, we use a subscript

_{t}*t*to identify continuous-time stochastic processes.

The jump diffusion term *J _{t}* in equation (4.1) is modelled as a compounded Poisson process [35], that is, , where

*Y*, for

_{j}*j*= 1, 2,…, are independent and identically distributed (i.i.d.) Gaussian random variables with variance

*γ*

^{2}. The value of the jump process at time

*t*depends on the number of terms

*Y*that are included in the summation, which is determined by

_{j}*ν*. The number of jumps

_{t}*ν*is a counting process [60], defined so that for any

_{t}*r*and

*t*such that

*r*≤

*t*,

*ν*−

_{t}*ν*is a Poisson random variable with parameter

_{r}*λ*(

*t*−

*r*). Hence,

*ν*controls the number of discrete changes to the fish turning rate through

_{t}*J*, and the rate of the occurrence of these jumps is

_{t}*λ*. When a change in the fish turn rate due to a jump occurs, the value of the jump is normally distributed with variance

*γ*

^{2}. Based on this approach, the model (4.1) is observed to capture two distinct types of dynamics, one based on the Wiener process, and another that additionally considers larger changes in turn rate occurring at a rate

*λ*.

Using Itô's integral formula [58,61], for any *r* and *t* such that *r* ≤ *t*, a solution to equation (4.1) is
4.2where and are the conditional mean and variance of the stochastic process *ω _{t}* without the jump diffusion term and given a previous turn rate

*ω*. They are given by the following analytical expressions: 4.3aand 4.3bFor small time increments Δ

_{r}*t*, a discrete-time approximation [62,63] of equation (4.2) is 4.4where

*ω*(

*k*),

*k*= 1,

*…*,

*T*, is a discrete-time approximation that converges weakly to the continuous time process

*ω*[64];

_{t}*ɛ*(

*k*) and

*ζ*(

*k*), for

*k*= 1,

*…*,

*T*, are normally distributed random variables with zero mean and unit variance; and

*ν*(

*k*Δ

*t*) −

*ν*((

*k*− 1)Δ

*t*) is a Poisson random variable with intensity

*λ*Δ

*t*.

We assume that at most one jump occurs in a small discrete time interval Δ*t*. Therefore, the Poisson random variable *ν*(*k*Δ*t*) − *ν*((*k* − 1)Δ*t*) can be approximated by a Bernoulli-distributed random variable with parameter *λ*Δ*t* [65]. Thus, with probability *λ*Δ*t*, an increment to the turn rate includes the jump term, the Brownian motion, and the mean reversion; with probability 1 − *λ*Δ*t*, the turn rate is affected by only the Brownian motion and mean reversion. Accordingly, we obtain the following conditional likelihood probability density function (pdf) for the turn rate in equation (4.4)
4.5where is the Gaussian pdf whose arguments are the mean *m* and variance *v* defined in equations (4.3*a*) and(4.3*b*), respectively, and (*θ*, *σ*, *γ*, *λ*) denotes the set of unknown parameters of the JPTW model.

### 4.4. Statistical analysis

The autocorrelation lag coefficients were computed in MATLAB according to the approach in [44]. Specifically, the autocorrelation for a corresponding time lag *k* is
4.6where is the average value of the turn rate time series *ω*(*k*), *k* = 1,*…*, *T*. Using the expression of *ω*(*k*) in equation (4.4), the one-step correlation coefficient of the turn rate realizations is for a given time increment Δ*t*.

Model parameters were estimated using a maximum-likelihood estimation (MLE) method [66]. Specifically, we numerically maximize the log-likelihood for the PTW modelwith the respect to the parameters *θ* and *σ*, where *T* is the number of time steps. The initial value for the PTW model parameters are set to the values obtained using a simple linear regression, see for example [63]. We also numerically maximize the log-likelihood JPTW modelwith respect to the parameters *θ*, *σ*, *λ* and *γ*. To estimate the parameters of the JPTW model, the initial values of the parameters *θ* and *σ* are set to the MLE estimates [63] of the PTW model without jumps, the initial value for the jump deviation is *γ* = 1, and the initial value of the jump rate *λ* is set to the fraction of video frames in which the fish turn rate exceeds three times the standard deviation of the experimental turn rate. The optimization algorithm was implemented in MATLAB using the global search and multistart solvers available in the global optimization toolbox (details on the robustness of the identification can also be found in the electronic supplementary material).

To compare the goodness of fit of either the JPTW model or the PTW model to fish data, we used the LRT [67]. The null hypothesis is rejected if the value of was greater than the 5% quantile of a *χ*^{2} distribution with 2 d.f. One-way ANOVA tests were used to analyse zebrafish turn rate and estimated model parameters for individual fish and their variations. The significance level was set to *p* < 0.05.

### 4.5. Model validation

Model validation was performed using a Q–Q plot [68]. When sample data arise from the modelled probability distribution, the Q–Q plot depicts a straight line [45]. The simulated turn rate time series with jumps was obtained using the discrete-time approximation in equation (4.4) with the MLE parameters estimated from experimental data, while simulated turn rates for the PTW model were obtained from equation (4.4) with the jump frequency *λ* set to 0.

## Ethics statement

The experimental procedure in [34] is approved under the protocol number AWOC-2013–103 by the Animal Welfare Oversight Committee of the Polytechnic Institute of New York University (now New York University Polytechnic School of Engineering).

## Funding statement

This work was supported by the National Science Foundation under grant nos. CMMI-0745753, CMMI-1129820, and DGE-0741714.

## Acknowledgements

The authors would like to thank Dr Simone Macrì for useful discussions.

- Received August 8, 2014.
- Accepted October 17, 2014.

- © 2014 The Author(s) Published by the Royal Society. All rights reserved.