## Abstract

Most bacterial habitats are topographically complex in the micro scale. Important examples include the gastrointestinal and tracheal tracts, and the soil. Although there are myriad theoretical studies that explore the role of spatial structures on antagonistic interactions (predation, competition) among animals, there are many fewer experimental studies that have explored, validated and quantified their predictions. In this study, we experimentally monitored the temporal dynamic of the predatory bacterium *Bdellovibrio bacteriovorus*, and its prey, the bacterium *Burkholderia stabilis* in a structured habitat consisting of sand under various regimes of wetness. We constructed a dynamic model, and estimated its parameters by further developing the direct integral method, a novel estimation procedure that exploits the separability of the states and parameters in the model. We also verified that one of our parameter estimates was consistent with its known, directly measured value from the literature. The ability of the model to fit the data combined with realistic parameter estimates indicate that bacterial predation in the sand can be described by a relatively simple model, and stress the importance of prey refuge on predation dynamics in heterogeneous environments.

## 1. Introduction

The topography of most bacterial habitats is complex in the micro scale. The gastrointestinal, tracheal tracts and the soil are important examples. Previous studies showed that heterogeneous habitat may alter the interaction between bacteria and their prey and promote coexistence, mainly due to prey refuge [1–3]. Prey refuge may occur due to the existence of bacterial post-predation debris [1], the existence of habitat fixed structures (soil particles, [2]), or self-organized ones, such as variable prey and predator densities and biofilms [2–4]). Thus, exploring the role of spatial structures in the interaction between sympatric microorganisms such as bacteria and their predators/parasites can be of fundamental importance to microbial ecology, epidemiology and agronomy [5,6].

Although there are a myriad of theoretical (mathematical modelling) works that explore the role of complex spatial structures on antagonistic interaction among animals (e.g. competition, predation and parasitism), there are many fewer experimental studies that explore, validate and quantify their predictions [7–9]. For example, most evidence of the effect of prey refuge on the temporal dynamics of predator–prey systems are based on theoretical works and simulations [1,2,7,8], while only few experimental set-ups have found evidence that refuge indeed promotes coexistence [3,4]. One of the main reasons for this mismatch is the practical difficulty in terms of resources and time needed to manipulate natural habitats into experimental set-ups, which enable quantification and hypothesis testing. In this context, microbial habitats can be much easier to control compared with macro ecological systems due to their smaller size and the shorter generation time of microorganisms [10].

In this study, we quantify the dynamics of the predatory bacterium *Bdellovibrio bacteriovorus* and its prey, *Burkholderia stabilis* st.2 in soil. *Bdellovibrio bacteriovorus* belongs to the *Bdellovibrio* and like organisms (BALOs), a group of bacterial predators that prey upon Gram-negative bacteria and can be found in various environments like soil, fresh and marine water, and animal guts [11]. The life cycle of BALOs includes a highly motile free-living phase that searches for an appropriate prey. Upon encounter, the predator penetrates the prey periplasm and converts it into a bdelloplast that provides nutrient and shelter to the predator during its growth and division to progeny cells, that finally burst out to start a new cycle. Predation by BALO has a lot in common with bacteriophage parasitism: in both cases, infectious particles invade a host cell, which finally dies while emitting more infectious particles into the environment. However, unlike BALOs, which typically kill and consume their victims, phages can either engage in a lytic cycle or stay inactive for a long period.

During the past decades, several mathematical methodologies have proposed ways to incorporate spatial heterogeneities into population dynamics models. The most important ones include the use of partial differential equations (e.g. reaction–diffusion models), coupled map lattices, spatial moments and metapopulation models [7,12–15]. These models are usually difficult to analyse and parametrize, and it is difficult to draw general conclusions. Consequently, attempts have been made to formulate simpler frameworks, also known as ‘strategic models’ [16] which use low dimensional ordinary differential equations (ODEs) that have fewer parameters, and are thus easier to analyse and interpret. These models capture some features regarding the complex role of space in population dynamics by using implicit spatial parameters [17]. In addition, the limited number of parameters in strategic models facilitates their estimation even in cases of a limited amount of data.

Fitting models to time-series data has become standard practice in epidemiology [18–26], while in ecology it is much less used, due mainly to the lack of sufficient data. Nevertheless, some important theoretical and practical contributions include [27–30]. Fitting a differential equation model to empirical data is usually done by maximum-likelihood, nonlinear least-squares or Bayesian methods. Although maximum-likelihood estimation has desirable statistical properties, currently, there is no method (either numerical or analytical) which can assure optimal parameter estimates. The likelihood function is often multivariable, and may have complicated surfaces with several local minima, maxima and saddle points, all of which (provided that they are interior points) may lead to local convergence by the optimization algorithms. Finding the maximum-likelihood estimators can therefore be highly dependent on the chosen initial values used in the optimization algorithm, thus making the search for acceptable optimum computationally demanding and complex.

To quantify the dynamics of the predatory bacteria *B. bacteriovorus* and its prey, *Bu. stabilis* st.2 in soil, we constructed a ‘strategic’ dynamic model, fitted it to data in order to study its dynamical behaviour in different soil humidity and estimated its parameters. The model describes the predation between a predator and a bacterial prey strain, and more generally, explores the role of complex spatial structures on predator–prey interaction in a microbial system. The model is relatively simple, and captures the effect of spatial structures implicitly via a special refuge parameter (see model description for further information). The model was fitted to the experimental data by further developing and optimizing a novel statistical procedure; specifically, we used a direct integral approach that exploits the separability of state equations and parameters, thus overcoming the difficulty of exploring complex likelihood parameter space. Model validation was done by fitting the model to data and contrasting some of our estimated parameters with their known values from the literature, thus providing additional confidence to both model representative and statistical methodology abilities.

## 2. Data and experimental design

Soil microcosm experiments: 10 g of fine sand was used to construct microcosms in flasks. Each flask was inoculated with a suspension of the *B. bacteriovorus* 109 J predator (2 × 10^{6} plaque forming units, PFU × ml^{−1}) and of the *Bu. stabilis* st.2 prey (1 × 10^{8} colony forming unit, CFU × ml^{−1}) and water to create microcosms with different water contents (WC, w/w) ranging from 100% (fully saturated) to 20%. Treatments were in triplicate. The vials were sealed and incubated at 28°C. At selected time points (8, 12, 24, 36, 48, 96 and 168 h), the total volume of water in each flask was adjusted with HEPES buffer by weight, to reach 100% WC, mixed and the liquid was removed from the flask to a sterile test-tube. In total, 50 µl from this was taken for dilution plating to measure the concentration of the predator and of the prey in each sample.

## 3. Model description

As noted earlier, the interaction between the BALO and its prey resembles the dynamics of phage–bacteria. We therefore adopted a designated modelling framework developed previously for this type of system [31–34], and further developed it to fit the above experimental set-up. In this framework, ODEs are used to model the dynamics of the system which includes free predators, P, prey, N, and predator–prey complexes (the bdelloplast), C. In ours and similar systems (like phage–bacteria), the time it takes the predator to handle its prey (i.e. searching and invading it to form bdelloplast) is of the same order as the time it takes for the consumed prey items to be converted into new predators (i.e. reproduction). To account for this dynamics, we explicitly modelled them in a separate compartment.

The separate complex compartment therefore models the delay between the predator invasion and the burst of the bdelloplast (which releases the predator progeny) as was proposed in some earlier models [31,33,34]. The dynamics of these compartments are described by
3.1where the notation *P*′(*t*) stands for d*P*/d*t*, and the same for *C* and *N*. In equation (3.1), the predators are produced at the rate *ksC* (first equation in (3.1)), where *k* is the number of predatory bacteria emitted by one bdelloplast (multiplication factor) and *s* is the bdelloplast decay rate. We assume for simplicity that the predator has a density-independent death rate, *d*, and the complex formation rate can be approximated using the classical mean field assumption, i.e. it is equal to *aNP* [35]. To account for the prey refuge dynamics, we add an additional new parameter, *r*, denoting refuge prey density, which is not available to the predator. Prey refuge may occur in our set-up due to habitat spatial heterogeneity consisting of soil grains [1,2]. The formation rate of bdelloplast complexes is therefore updated here to be *a*(*N* − *r*)*P*, where *a* is the interaction (contact) rate. Because in our system the prey is not supplied with nutritional substrate, it neither reproduces nor dies (its death rate is too low to be accounted for in our experimental timescale). Thus, the only prey loss in the model is due to predation, i.e. *a*(*N* – *r*)*P*.

In order to assess whether the model can capture the predator–prey dynamics in a heterogeneous environment and to estimate the model parameters, we further develop the *direct integral* approach [36,37]. Indeed, testing the model both qualitatively and quantitatively (i.e. estimating the parameters based on experimental data) is of interest to several disciplines, including microbial ecology, theoretical ecology (modelling) and statistics. Furthermore, in our case, the conversion ratio *k*, has been previously measured directly and independently in designated experiments. Our estimations can therefore be contrasted with a ‘gold standard’ value, thus enabling the validation of the model and its proper interpretation (i.e. that the model parameters indeed represent what is commonly believed).

## 4. Estimating model parameters using the direct integral method

### 4.1. Background

We denote a system of ODEs by
4.1where *x*(*t*) takes values in in and in Given known values of *ξ* and *θ* the solution of (4.1) is denoted by

The common statistical model assumes that measurements are collected at a series of time points *t*_{1}, … ,*t _{n}*, each of them includes a signal and an additive error term,
4.2where the random variables are independent measurement errors (not necessarily Gaussian) with zero mean and finite variance. As is the case with the experimental study considered in this work, we allow the system to be only partially measured and thus

*r*≤

*d*. Based on the observation

*Y*(

_{j}*t*),

_{i}*i*= 1, … ,

*n*,

*j*= 1, … ,

*r*one could estimate the parameters

*θ*. In cases where the initial values

*ξ*are not known, one would also like to estimate them. However, here the initial values are known and therefore are not discussed further in this paper.

As in most biological models, the ODEs (4.1) are nonlinear and therefore numerical integration techniques are required in the estimation process. For instance, the nonlinear least-squares estimator of *θ* is defined as a minimizer of the least-squares criterion function
Here, is a numerical solution (e.g. using Runge–Kutta) of the ODEs equation (4.1) for a given parameter and initial values. Thus, estimation methods such as nonlinear least-squares or maximum-likelihood require the system to be solved numerically for a large set of potential parameters values, and then choosing an optimal parameter using some nonlinear optimization technique. However, the combination of sparse and noisy data, nonlinear optimization and the need for numerical integration makes the parameter estimation a complex task (even for systems of low dimensions, e.g. [38]), and in many instances requires heavy computation. In recent years, the inverse problem of parameter estimation for ODEs has received growing attention in the statistical literature. In particular, much focus has been given to developing estimation methods that bypass the need for numerical integration (see [39] and the discussion therein; [40–44], and more recently [36,37,45,46]).

### 4.2. The direct integral approach

Below, we further develop the direct integral approach. The method is an extension of ‘two-step’ approaches [47,48] that include step (i): bypassing numerical integration by using non-parametric smoothing of the data, and step (ii): estimating the parameters by fitting the ODEs model to the estimated functions, as explained below. Let , and stand for non-parametric estimators (e.g. smoothing the data using splines or local polynomials) of the solution *x* of the ODEs equation (4.1), and its derivative *x*′, respectively. The criterion function of the two-step approach for a fully observed system of ODEs takes the form
4.3where *w* is an appropriate weight function and denotes the standard Euclidean norm. The estimator of the parameter will be the minimizer of the criterion function (4.3), with respect to *θ*. The direct integral approach takes advantage of linear feature of the ODEs system as explained now. Consider the Lotka–Volterra system of ODEs [49], a classical population dynamics model that describes evolution over time of the populations of two species, predator and its prey. The system takes the form
4.4Here, *x*_{1} and *x*_{2} represent the prey and predator population, respectively. One can view (4.4) as a regression where the ‘covariates’ variables are the solutions of the ODEs on the right-hand side of the equations, while the ‘response’ variables are the derivatives *x*′(*t*) on the left-hand side. We refer to such systems as ‘linear in the parameter’ (as in a linear regression model), where stands for the matrix transpose. Thus, when using the criterion equation (4.3), the parameter *θ* can be estimated in an ordinary least-squares fashion where nonlinear optimization is not required. More generally, consider the case where the model is linear in the parameters, namely, that
4.5where the measurable function maps the *d*-dimensional column vector *x* into a *d* × *p* matrix (typically *d* ≤ *p*). For instance, model (4.4) has *d* = 2 equations and *p* = 4 parameters and the 2 × 4 matrix *g*(*x*(*t*)) from (4.5) is given by

Further, note that by integration, (4.1) and (4.5) yield the system of integral equations 4.6

Here, is the true solution of the ODE. Let . Motivated by (4.6), one can consider minimizing with respect to *θ* the criterion function
4.7

Let . Minimizing the criterion function (4.7) with respect to *θ* yields the direct integral estimator
4.8

Necessary and sufficient conditions for -consistency of the direct integral estimator (4.8) are provided in [36]. Furthermore, the extensive simulation study in the aforementioned paper has demonstrated that using integrals as in (4.8) instead of derivatives as in (4.3) yields more accurate estimates. Indeed, it is well known (see [50] and [51]) that estimating derivatives from noisy and sparse data may be rather inaccurate. An application of the direct integral method to a variety of synthetic and real data was shown to yield accurate and stable results in [45] and [46].

The methodology discussed so far is aimed at cases where all the equations of the ODEs can be measured. Recall that in our experiment *x* = (*P*, *C*, *N*), and one variable (the complex *C*) is unobserved. The direct integral methodology for dealing with partially observed systems linear in the parameters was proposed in [37] and its consistency was proven in [52]. However, in the case of equation (3.1) there are two main challenges (i) the system is partially unobserved, and (ii) equation (3.1) is not fully linear in all the parameters. Thus, below we further develop the direct integral method to enable parameter estimation in such a case. Note that our notation in the sequel resembles the specific case where only one equation of the system is unmeasured; however, the method can be used for more general cases. We first describe the model we have in mind
4.9where . Here, *θ*_{NL} stands for the ‘nonlinear’ parameters that cannot be separated from the equations, while *θ*_{L} are the ‘linear’ parameters, as above in equation (4.5). More details regarding the above abstract form equation (4.9) are given in the electronic supplementary material. Now, denote the measured equations of *x* by *m* and the unmeasured ones by *u*. Using the observations we have for *m*, we first generate an estimator using non-parametric smoothing, denoted by . In order to deal with the unmeasured states, we continue as follows. Define for some function the quantity (the vector *x* is assumed to be arranged such that its first equation is the measured one *m*). Let
Motivated by equation (4.7), we define the criterion function
4.10Minimizing with respect to *θ*_{L} yields
where
Plugging back into equation (4.10) results with
Finally, let be some appropriate space of functions on [0,*T*] and define
Then the estimator for *θ* is given by
4.11

### 4.3. Smoothing

In order to estimate the solutions of the ODEs model (3.1), we use cubic B-splines. In particular, it is assumed that the ODE solutions can be approximated for any by a linear combination of cubic *B*-spline functions denoted by *ϕ _{k}*(

*t*),

*k*= 1, … ,

*K*, , namely,

_{ℓ}*The choice of number of bases*. As the regularity of the solutions of the ODEs model (3.1) might be different, we allow the number of bases

*K*

_{1},

*K*

_{2},

*K*

_{3}to be (potentially) different. Choosing the number of cubic spline bases is crucial [53] and is known to affect the accuracy of the final estimates. Let

*K*= {

*K*

_{1},

*K*

_{2},

*K*

_{3}} stand for the number of bases used for the first, second and third equations of the system (3.1), respectively. Denote by the vector of parameter estimators calculated using the direct integral method (4.11) for a given number of bases

*K*= {

*K*

_{1},

*K*

_{2},

*K*

_{3}}. For each , we solve the system of ODEs (using numerical integration) to obtain , and . We then choose the combination that minimizes the squared distance between the observations (recall that

*C*is unobserved), denoted by , and the solutions of the system 4.12

The final estimator is given by . Here, we consider variety of combinations of triplets *K* = {*K*_{1}, *K*_{2}, *K*_{3}}. Extensive simulation studies suggest (see §5, electronic supplementary material) that the regularization (4.12) leads to accurate and stable estimation results.

In the following section, we study some finite sample properties of the methodology described above.

## 5. Finite sample properties of the estimation method

In order to test the finite sample performance of the direct integral method developed above, we conducted a large Monte Carlo experiment. We solve model (3.1) with initial conditions given by , similar to the experimental design, and the ‘true’ parameters given by (for values that are in the vicinity of those estimated from the real data, see table 2). Recall that the statistical model we consider is additive as in equation (4.2). Here, we add Gaussian measurement errors to the deterministic system equations so that the measurements of the predator will be given by , and those of the prey are given by , *i* = 1, … ,*n*, where *P*(*t _{i}*;

*θ*,

*ξ*) and

*N*(

*t*;

_{i}*θ*,

*ξ*) are the deterministic solutions of (3.1) at point

*t*with respect to initial values

_{i}*ξ*and parameter

*θ*. The measurement error follows a Gaussian distribution with zero expectation and a standard deviation that is proportional to both

*P*and

*N*: , and where , and are the means of , and over the time interval of the experiment, respectively.

In the Monte Carlo experiment, 400 different random samples were generated. We consider two experimental set-ups, in the first we sample the model using the exact sampling times as done in the experimental set-up [0, 8, 16, 24, 36, 48, 96, 168], hence, *n* = 8. In the second set-up, we consider *n* = 16 time points [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 72, 96, 168]. Next, we estimate the model parameters using the direct integral method. Table 1 summarizes the estimation from the simulations with the additive noise, note the ability of the direct integral method to obtain good estimates (especially given a small sample of size *n* = 8).

In section 1.5 of the electronic supplementary material, a comparison between the direct integral method and nonlinear least-squares was conducted for various noise levels. The findings suggest that for small number of points, the performance of the direct integral method as applied here, in terms of residual sum of squares, is comparable to that of the nonlinear least-squares.

## 6. Estimating parameters of the predator–prey–bdelloplast system

We fitted model (3.1) to five different environmental conditions (various different water contents). Thus, we estimated the five parameters of each configuration using its corresponding data (table 2).

Figure 1 demonstrates the ability of our simple ‘strategic’ model to capture both qualitatively and quantitatively the dynamics of predation in a heterogeneous environment.

To the best of our knowledge, in previous studies only the progeny size (parameter *k*) was directly measured. The estimated *k*'s in our case fall within the range of 1–9, known values from the literature, see, e.g. fig. 1*a* in [54]. The estimation results of the other four parameters seem to be stable over the water content gradient.

In order to provide further information regarding the estimation process, and to show the strength of the direct integral method, we present two additional plots. The spline estimators used in the estimation procedure are displayed in electronic supplementary material, figure S1. One can see that the splines are able to capture the dynamics of the predator–prey interactions and specifically, to recover the unmeasured complex *C*. The loss function given in equation (4.7) of electronic supplementary material, used to choose the optimal *s* (i.e. the bdelloplast decay rate) is displayed in electronic supplementary material, figure S2 for the five experimental set-ups. One can see there a clear minimum, which is an indication of the ability of the method to estimate the parameter *s*.

## 7. Discussion

To the best of our knowledge, this is the first attempt to fit a BALO–prey model to experimental data using a statistical procedure. The model we have developed can both qualitatively and quantitatively describe the temporal dynamics of a predator–prey system in the heterogeneous soil environment. The fact that the model can be fitted to the experimental data indicates that although the soil is a spatially complex environment at the microbial scale, predation dynamics can be captured by a relatively simple dynamical system of ODEs with prey refuge and separated complex compartment denoting the time delay between predator invasion and progeny emergence. This is a relatively rare and encouraging result; mean field approximation of interacting species in a spatially complex environment usually gives poor results [55]. To obtain reasonable results under such circumstances, dynamical models usually include spatial structures by various techniques (e.g. metapopulation, spatial moment equations, coupled map lattice, etc.) [55].

According to our model, not all the prey is available to the predator (i.e. some of it has a refuge); the prey population density decreases due to predation, and approaches *r*, the refuge parameter, asymptotically. When the refuge parameter was absent from the model equations (i.e. *r* = 0), fitting results were poor (data not shown) and the prey gradually approached zero (i.e. become extinct) and not a positive value. In our experimental set-up, the prey cannot coexist with its predator, because it does not reproduce due to the lack of nutritional substrate, yet, our model indicates the role of the complex spatial structure in protecting the prey and thus may contribute to coexistence, as indicated by previous theoretical and laboratory studies [1–4,56].

In this paper, we demonstrated the application of the direct integral approach to a challenging experimental set-up. The observations were collected for only eight time points, and only for part of the system, namely, for predator and prey; the ‘complex’ was not observed. By exploiting linear features of the dynamics system and using non-parametric smoothing, the direct integral approach enabled us to reduce the complex nonlinear optimization to a simple search over a grid of values of a single parameter *s* (see details in electronic supplementary material, and also figure S2 there). That resulted in a reasonable parameter estimates as well as important insights. For instance, we note that information about the parameter *r* can be recovered mostly from the tail of the experiment, where we have only two observations (this can be easily seen from model (3.1), where *N*′ = 0 when *N* = *r*). This fact made the estimation task a very challenging one. Thus, one concludes that measuring the process in its ‘tail’ is also important, a fact that should be taken into account in future experimental design. Our analysis was broken into steps (see the detailed analysis in the electronic supplementary material), an approach that resulted in improved stability of the parameter estimation (see, e.g. [38]). Finally, our analysis is based on a small number of observations, so the conclusions should be viewed as encouragement for further research using more data points, which will allow better inference. Indeed, in order to move from point to interval estimation, future work should combine improved data collection and further theoretical studies, which will enable the calculation of reliable confidence intervals (see the electronic supplementary material for details).

## Data accessibility

The Matlab code used to generate the results of this paper will be provided by the first author upon request.

## Authors' contributions

I.D. developed the statistical methodology, carried out the simulations, analysed the data and drafted the manuscript; E.M. developed the mathematical model, participated in data analysis and helped draft the manuscript; M.P. carried out the laboratory work; D.E.K. is DARPA programme's principle investigator and project coordinator; E.J. designed the laboratory work, and helped draft the manuscript; A.H. conceived the study, coordinated the study and helped draft the manuscript. All authors gave final approval for publication.

## Competing interests

We declare we have no competing interests.

## Funding

This research was sponsored by the U.S. Army Research Office and the Defense Advanced Research Projects Agency (DARPA) was accomplished under Cooperative Agreement Number W911NF-15-2-0036. In addition, this research was supported by the Israeli Science Foundation grants nos. 1583/12 and 387/15, and by a Grant from the GIF, the German-Israeli Foundation for Scientific Research and Development number I-2390-304.6/2015.

## Disclaimer

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office, DARPA, or the US Government. The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

## Acknowledgements

We thank the editor and two anonymous reviewers for their constructive comments, which helped us to improve the manuscript.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3593693.

- Received July 1, 2016.
- Accepted November 28, 2016.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.