Abstract
Ecological monitoring aims to provide estimates of pest species abundance—this information being then used for making decisions about means of control. For invertebrate species, population size estimates are often based on trap counts which provide the value of the population density at the traps' location. However, the use of traps in large numbers is problematic as it is costly and may also be disruptive to agricultural procedures. Therefore, the challenge is to obtain a reliable population size estimate from sparse spatial data. The approach we develop in this paper is based on the ideas of numerical integration on a coarse grid. We investigate several methods of numerical integration in order to understand how badly the lack of spatial data can affect the accuracy of results. We first test our approach on simulation data mimicking spatial population distributions of different complexity. We show that, rather counterintuitively, a robust estimate of the population size can be obtained from just a few traps, even when the population distribution has a highly complicated spatial structure. We obtain an estimate of the minimum number of traps required to calculate the population size with good accuracy. We then apply our approach to field data to confirm that the number of trap/sampling locations can be much fewer than has been used in many monitoring programmes. We also show that the accuracy of our approach is greater that that of the statistical method commonly used in field studies. Finally, we discuss the implications of our findings for ecological monitoring practice and show that the use of trap numbers ‘smaller than minimum’ may still be possible but it would result in a paradigm shift: the population size estimates should be treated probabilistically and the arising uncertainty may introduce additional risk in decisionmaking.
1. Introduction
Ecosystems are under pressure owing to anthropogenic impact, which has increased considerably over the last few decades [1]. This pressure can significantly affect the structure of ecological communities, often enhancing population outbreaks of harmful species (e.g. as a result of a collapsed topdown control, cf. [2]). In particular, biological invasions have been identified as one of the main reasons of ecosystem degradation and biodiversity loss around the world [3]. The situation is exacerbated by the ongoing environmental/climatic changes that may affect areals as well as abundances. A comprehensive ecological monitoring is therefore necessary in order to provide sufficiently detailed and timely information about species that can potentially cause problems, such as, for instance, alien species that may become dangerous pests. Wellknown examples of the latter are given by invasion of the gypsy moth in the USA [4], invasion of Japanese knotweed in the UK [5,6] and invasion of Mnemiopsis leidyi in the Black and Caspian Seas [7,8].
In terrestrial ecosystems, especially in agriecosystems, ecological monitoring is usually a part of the integrated pest management programme [9,10], which recommends application of pesticides once the pest abundance exceeds a certain threshold [11]. Use of chemical pesticides has its obvious drawbacks in additional costs, including wheeling damage to the crop and operator costs (sometimes substantial) added to the agricultural product, and in potential damage caused to the environment [12]. Although it does not seem possible to fully avoid the use of pesticides at the current state of agricultural science and technology, it is certainly possible to further optimize their use. One way to achieve this goal is to provide robust information about the abundance of a given target species. In a somewhat different ecological context, accurate information about population abundance is required to trace the spread of harmful invading species: control measures are much more effective when applied at the right time and in the right place.
There are several methods to collect information about species abundance depending on their biological traits. In this paper, we mostly focus on the problems associated with invertebrates. For invertebrates such as insects or planarians, a common method to collect field data regarding the species' abundance is trapping. A number of traps are installed across the monitored area (figure 1), e.g. in a field or grassland, they are emptied on a regular basis (e.g. daily or weekly), their content is analysed, different species identified and counted. The trap counts are then used to estimate the population density of the problematic species at the position of traps, e.g. by dividing the trap counts by the effective catchment area (cf. [13,14]).
Exhaustive information about species' presence in a given area is given by its population size, i.e. by the total number of its individuals. In order to get reliable estimates of the population size, the information about population abundance obtained at some particular locations must somehow be ‘extended’ over the whole monitored area. This is routinely performed by statistical analysis of the samples [15]. Indeed, application of statistical methods in ecology has a long and successful history [16–18]. Estimation of the population size is usually based on the arithmetic mean: let u_{1}, u_{2}, … , u_{N} be the population densities at N different positions in space (i.e. position of the traps), then the average density is estimated as [16] 1.1where M is the population size and A is the total monitored area.
There are, however, a few problems in implementing of equation (1.1). The probability theory predicts that equation (1.1) gives the exact value of the population size when N becomes infinitely large. In practical applications, N is of course finite, which means that the theoretical results obtained in the largeN limit may not be valid. While in a particular scientific study, the number N of traps per given area can be quite large, e.g. of the order of hundreds, in routine pest monitoring programmes N rarely exceeds 20 (cf. [19]) and, in some cases, it can be as small as one or a few traps per field [20]. There are several reasons why the number of traps cannot be larger, for instance:

— handling a large number of traps is labour and resourceconsuming and hence is often not possible, especially in nationwide or regional ecological monitoring programmes where traps may be operated simultaneously in many fields;

— traps introduce a disturbance into the field (cf. figure 1). Installing a large number of them can damage the corresponding agricultural product considerably, thus making the subsequent protective measures rather senseless;

— large number of traps may also have a disruptive effect on the behaviour of the monitored animals and hence result in a biased estimate of the population size.
Application of equation (1.1) in the case of small N increases the level of error of the population estimate and hence remains disputable. In particular, the question arises as to what can be the minimum number of traps required to provide a robust estimate of the population size. While equation (1.1) may work well for a small N when the population spatial distribution is random or homogeneous, its application is problematic when the population distribution exhibits a considerable degree of aggregation. However, the latter is a much more common case in ecology than the former [21], e.g. either owing to the heterogeneity of environment [22] or owing to the selforganized animal grouping and pattern formation [23,24]. Several methods have therefore been developed to estimate the required number of traps for a pest sampling programme where the trap catches display aggregated spatial distribution. Most often these are based on the negative binomial distribution or Taylor's power law, which provide indexes of aggregation [25]. However, these techniques still require an initial intensive sampling phase to determine the expected parameters of the frequency distribution.
The second and even more important problem is that equation (1.1) and the existing statistical methods for estimating the required number of traps do not take space into account. That is, they do not use the spatial information available to its fullest extent (cf. [26,27]). In particular, they do not contain any information about the distance between the traps. Meanwhile, the area associated with each trap as well as the way how this area is accounted for (e.g. by ascribing a larger ‘weight’ to the trap counts collected from a bigger area) are likely to be of crucial importance, especially with attractive traps such as colour traps, light traps, pheromonebaited traps, etc., which are most commonly used in pest monitoring programmes as they provide a selective catch.
In practice, the conclusion about the optimum number of traps is often compromised by cost and agronomic considerations, and can be based solely on the intuition of the grower or crop adviser. Although the importance of intuition is widely recognized, so that ‘one should not underestimate the processing power of the human brain’ [28], yet it can hardly be accepted as an adequate substitute for a rigorous quantitative analysis. One of the continuing problems of pest management is getting scientific principles of ecological monitoring implemented at commercial production. Correspondingly, there is a strong need for conceptually simple and yet scientifically sound and effective methods to extract higher quality information from sparse spatial data.
In this paper, we develop a new method that is based on ideas different from statistical analysis and hence may provide an alternative to the existing approaches. Moreover, we will show that, in many cases, it actually appears to be significantly more accurate and/or more effective than the commonly used statistical methods, in particular than those based on equation (1.1).
It has been recently shown that the situation when the population density is only known at certain locations can be interpreted as a numerical integration problem [29,30]. The conventional problem of numerical integration is well known and well studied, and many efficient methods have been developed to compute the integral of a given function with a required accuracy (e.g. [31]).
However, while we can benefit from using the numerical integration techniques, their application in ecological monitoring is not at all straightforward. Contrary to the standard problem of numerical integration where a computational grid can be refined to ensure required accuracy of the integration, the grid of traps in a routine monitoring procedure cannot be refined for the reasons listed above. Calculation of the population size therefore requires numerical integration of the population density defined on a very coarse grid. Integration on a coarse grid is a challenging mathematical problem that has been poorly studied in the literature. Most of the results of the numerical integration theory, such as, for instance, estimates of the convergency rate which make it possible to compare the accuracy of different methods, simply do not apply to the coarse grid integration. Since the grid cannot be refined, the whole concept of convergency appears to be irrelevant.
In this paper, we consider the problem of integration of sparse data in the context of calculation of the population size as required by the goals of ecological monitoring. In §2, we revisit a few of the most commonly used methods of numerical integration. We then apply these methods to different types of spatial population distribution either obtained from a population dynamics model or generated straightforwardly and show that a reasonably accurate population size estimate (with integration error consistently less than 25%) can be obtained on a coarse grid of 25 nodes (e.g. 5 × 5 traps in a square domain). Moreover, we show that a reasonable estimate of the population size can often be obtained even on a coarser grid of just nine nodes (i.e. 3 × 3 traps). We then apply the methods of numerical integration to some available field data and show that, apart from some cases of extreme spatial aggregation, an estimate of the population size with accuracy less or about 20–30% can indeed be obtained on a grid of nine traps. Finally, we discuss the implications of our findings for ecological monitoring practices and show that they may lead to a paradigm shift and a new concept of ecological monitoring.
2. Methods of numerical integration
2.1. The problem statement
Let a function f(x,y) be the density of a pest population in a given twodimensional domain . We assume that the information about the pest population density is collected through trapping where traps are placed at the nodes of a rectangular grid. From a computational viewpoint, the trapping procedure means that the function f(x,y) is not available at an arbitrary point . Instead, we have to deal with a discrete function f_{ij} ≡ f(x_{i},y_{j}), i = 1, … , N_{x}, j = 1, … , N_{y} defined on a Cartesian grid consisting of the total number N^{2} = N_{x} × N_{y} nodes.
For the sake of simplicity, in this paper, we restrict our study by considering a rectangular domain . For convenience, we transform (using a linear mapping) domain into the unit square D = [0,1] × [0,1], where a Cartesian grid can easily be generated. Namely, let us consider a set of points x_{i}, i = 1, … , N_{x} at the interval [0,1], where we require that x_{1} = 0, x_{i+1} = x_{i} + h_{x}, i = 1, … , N_{x} − 1 and the grid step size h_{x} is defined as h_{x} = 1/(N_{x} − 1). Consider now a set of points y_{j}, j = 1, … , N_{y} in the domain [0,1] and generate a onedimensional grid in the ydirection as y_{1} = 0, y_{j+1} = y_{j} + h_{y}, j = 1, … , N_{y} − 1, h_{y} = 1/(N_{y} − 1). The grid node position is then given as (x_{i}, y_{j}).
Once a computational grid has been generated and the data are available at grid nodes, there are several methods of numerical integration that can be applied in an ecological monitoring problem. We briefly revisit them below.
2.2. Composite rules of numerical integration
A conventional approach to integration over a Cartesian grid is to consider a composite rule of integration where the integral over the unit square D is replaced by the sum of integrals evaluated at each grid cell (e.g. [31]). Consider the grid cell c_{ij} defined as c_{ij} = [x_{i}, x_{i+1}] × [y_{j}, y_{j+1}], where i = 1, … , N_{x} − 1, j = 1, … , N_{y} − 1. A composite rule of integration therefore reads as 2.1where 2.2
Hence, the integration problem is reduced to the integral evaluation at each rectangular subdomain . Different integration rules use different ‘local’ approximation of the integrand at each grid cell .
2.2.1. The midpoint rule
The simplest evaluation of the integral (2.2) can be performed under the assumption that the function is approximated by a constant at each grid cell. Such approximation results in the following formula: 2.3where A_{ij} = h_{x} h_{y} for the interior nodes, i.e. for i = 2, … , N_{x} − 1, j = 2, … , N_{y} − 1. At boundary points, where one of the conditions i = 1, i = N_{x}, j = 1 or j = N_{y} holds, we have A_{ij} = (1/2)h_{x}h_{y}, and A_{ij} = (1/4)h_{x}h_{y} at the four corners of the domain.
2.2.2. The trapezoidal rule
The trapezoidal rule of integration implies the approximation of f(x,y) by a linear function in each grid cell c_{ij}. Correspondingly, the integral of I_{ij} is evaluated as 2.4
2.2.3. The Simpson rule
The Simpson rule of integration is another ‘local’ integration rule, where the integrand f(x,y) is approximated by a quadratic polynomial at each subdomain. The application of this rule in the cell c_{ij} requires that the data f(x,y) are available at points (x_{i+q}, y_{j+r}), where q = 0,1/2,1 and r = 0,1/2,1. The notations q = 1/2 and r = 1/2 are used for the midpoints in the x and ydirections, that is x_{i+1/2} = 0.5(x_{i} + x_{i+1}) and y_{i+1/2} = 0.5(y_{i} + y_{i+1}). The function f(x,y) is then integrated in the cell c_{ij} by the Simpson rule as 2.5
Note that the straightforward application of the Simpson rule given by equation (2.5) corresponds to the case when the grid can be refined by introducing the new ‘halfinteger’ nodes. In the case we consider in this paper, the number of nodes cannot be increased. Therefore, in order to apply the Simpson rule, we have to increase the size of integration cells by considering h_{x} → 2h_{x}, h_{y} → 2h_{y}; we can then use the evennumbered nodes instead of the halfinteger ones. As an immediate consequence, it means that the number of grid nodes in each direction x,y must be odd.
2.3. Leastsquares polynomial approximation
One alternative to the composite rules discussed above is to consider a ‘global’ polynomial approximation, where the function f(x,y) is replaced by a single polynomial over the entire domain of integration. Consider a polynomial p_{K}(x,y) of degree K, 2.6where ϕ_{m} (x,y) = (x − x_{0} )^{m1} (y − y_{0} )^{m2} are polynomial basis functions, m_{1} + m_{2} = 0, 1, … , K. We then require that 2.7where the basis functions ϕ_{m}(x,y), m = 1, … , M are now linked to the central point of the unit square, x_{0} = 1/2, y_{0} = 1/2. Hence, integral (2.1) can be computed as 2.8where the partial integrals are easily available in closed form for polynomial functions ϕ_{m}(x,y).
The polynomial coefficients in expansion (2.6) can be defined by interpolation or by leastsquares approximation. In our work, we use a leastsquares method to define the coefficients a_{m}. The leastsquares procedure is explained in appendix A.
2.4. A coarse grid problem
A conventional approach to the choice of a numerical integration method is to evaluate the error that appears when the exact integral is approximated by a numerical rule. Provided the grid step size h = max(h_{x}, h_{y}) is sufficiently small, the integration error e usually depends on h as e = Ch^{k} , where C is a constant and k ≥ 1 [31]. If two methods of numerical integration are compared, then the most suitable candidate for the problem will be the method that has a smaller integration error. In other words, if the integration method A has the error e_{A} = O(h^{k1}) and method B has the error e_{B} = O(h^{k2}), where k_{1} > k_{2}, then the first method should give a more accurate result on a grid with a fixed number of grid nodes. Thus, an integration method with faster convergence rate (i.e. a method that has a greater value of k in the error estimate) is thought of as a more accurate method that should be recommended for numerical integration.
Meanwhile, a crucial feature of the numerical integration problem we deal with is that the above conclusion is not necessarily true for coarse grids, i.e. for grids with a small number N^{2} of nodes. It has been shown in Petrovskaya & Venturino [30] that the asymptotic error estimates in the form e = Ch^{k} cannot be implemented when sparse data are integrated, because we cannot provide an accurate estimate of the constant C on coarse grids. The estimate of C depends on the integrand function f(x,y), while discrete data f_{ij} available on coarse grids may not have the same properties as their continuous counterpart f(x,y). In other words, the method that has slower convergence rate can nevertheless have a smaller integration error on coarse grids, where the integrand function is not well resolved. Hence, our next task is to compare the integration error for all the methods above and to conclude if any of them can provide a user with acceptable accuracy on coarse grids.
3. Numerical test cases I: ecological model
The ultimate goal of our study is to apply the methods of numerical integration to trap count data of pest monitoring with the purpose to estimate, with a reasonable accuracy, the population size in a given area. We aim to understand the relative efficiency of different integration methods and also show how the number of traps can be minimized without any essential loss of information about pest abundance.
However, the currently available field data appropriate for our purposes are rare, in particular, because the standard methods of integration imply certain restrictions on the traps number and position. Indeed, the methods discussed above can be applied only when the traps are installed at the nodes of a rectangular grid but not to the case of their arbitrary position. Besides, the Simpson method requires that the number of traps in each direction must be odd.
Therefore, in order to evaluate the effectiveness and accuracy of numerical integration on a coarse grid, we first consider simulation data obtained from an ecologically sound mathematical model of population dynamics. Specifically, we consider the following spatially explicit predator–prey model with the Allee effect (e.g. [28,32]): 3.1and 3.2
Here U and V in equations (3.1) and (3.2) are the densities of prey and predator, respectively, at time T (T > 0) and position (X,Y). Note that, depending on a particular application, either of the two species can be the pest. We consider population dynamics in a squareshaped domain so that 0 < X < L and 0 < Y < L, where L can be interpreted as the typical size of a given agricultural field. At the domain boundaries, the zeroflux conditions are used. Coefficients D_{1} and D_{2} describe species diffusivity owing to the movement of the individuals, parameter K is the prey carrying capacity, U_{0} is the Allee threshold density (0 < U_{0} < K), ν is the maximum prey per capita growth rate (cf. [33]), A is the predator attack rate, B is the halfsaturation prey density, κ is the food assimilation efficiency coefficient and M is the predator mortality. For the sake of simplicity, below we assume that D_{1} = D_{2} = D. Note that this assumption is not ecologically unrealistic: available estimates of the range of diffusivity values often overlap between different species, even for species from different taxonomic groups ([34], §3.7).
Here we want to mention that the equations (3.1) and (3.2) are, in an appropriate parameter range, in a good qualitative and sometimes quantitative agreement with experimental data ([24], §11.3) and therefore can be used to simulate ecologically realistic spatial density distributions.
In order to obtain the population distributions, equations (3.1) and (3.2) are solved numerically. For convenience, we first introduce dimensionless variables. The domain length L makes a convenient scale for the coordinates, x = X/L and y = Y/L, where the new variables x and y are dimensionless, 0 < x < 1, and 0 < y < 1. In a similar way, the carrying capacity K makes a convenient scale for the prey population density, u = U/K. By introducing dimensionless predator density as v = V/(κK) and dimensionless time as t = aT, where a = A κ K/B, equations (3.1) and (3.2) take the following form: 3.3and 3.4where Λ = K/B, b = U_{0}/K, β = 4νBK/(Aκ(K − U_{0} )^{2}), m = M/a, d_{1} = D_{1}/(aL^{2}) and d_{2} = D_{2}/(aL^{2}) are dimensionless parameters.
For the parameter range where the local prey–predator dynamics is oscillatory, equations (3.3) and (3.4) are known to generate a rich variety of spatiotemporal patterns [24]. The left column in figure 2 shows three different cases (the corresponding parameter values, the initial conditions and other technical details are given in appendix B). Figure 2a shows a smooth distribution of the prey density over the area, it reaches its maximum value (shown in red) near the xaxis and drops to zero in the righthand side of the domain. Figure 2b shows the almost homogeneous prey distribution inside the round area (shown in yellow) separated from the rest of the domain, where the population is absent, by a steep gradient in the population density. This case may correspond to the species spread from the place of its initial introduction in the centre of the domain. Figure 2c also shows the species spread but following a different dynamical regime when the population distribution in the wake of the population front forms a distinct patchy structure (cf. [35]).
Now, we are going to check whether the method of numerical integration on a coarse grid can be applied to population distributions shown in figure 2. Note that, in order to estimate the integration error, we need the exact value of the integral (i.e. the population size), which is not known because equations (3.3) and (3.4) cannot be solved analytically. To overcome this difficulty, we first solve equations (3.3) and (3.4) numerically on a very fine grid G_{f} that has N_{x} = N_{y} ≡ N_{f} = 2^{10} + 1 grid nodes at each direction, and consider the result as the exact solution to the problem. Consequently, the corresponding value of the solution integral is considered as the exact integral that will be compared with the value I_{N} of the integral on a coarse grid. Coarse grids G_{c} are generated as grids where the number of nodes at each direction is N_{x} = N_{y} ≡ N = 2^{s} + 1, 1 ≤ s ≤ 7, and a projection of the ‘exact’ solution obtained on G_{f} onto each coarse grid is considered. Once the function u(x,y) is available at nodes of a coarse grid, the numerical integration is performed and the integration error is computed as 3.5
An advantage of using simulation data instead of real data to test the integration methods on a coarse grid is that we can easily make the grid as refined as we want by adding more grid nodes, which is difficult in empirical field studies. Correspondingly, we can consider how the integration accuracy changes depending on the number of grid nodes. Error (3.5) as a function of the number N of grid nodes at each direction, starting from the minimum number of N = 3, is shown in the right column of figure 2. The dashed red horizontal line shows the hypothetical value of desirable error of 25 per cent. It is readily seen from figure 2 that, for the coarse grid with N = 5, the Simpson rule and the leastsquares method provide good accuracy of less than 25 per cent consistently over all three different types of population distribution. Even on a very coarse grid with N = 3 (so that the total number of grid nodes is just N^{2} = 9), methods of numerical integration provide a very good accuracy (less than 10%) for the density distribution shown in figure 2a and a reasonable accuracy^{1} less or about 50 per cent in the case of figure 2c.
The population distributions shown in figure 2 correspond to three qualitatively different cases but they, of course, do not exhaust all the possibilities. An interesting and ecologically important pattern is given by the ‘patchy invasion’ [38–40] when, contrary to what is shown in figure 2e, the strongly heterogeneous patchy spatial distribution of a spreading population is not preceded by the propagation of any continuous front. The left column of figure 3 shows the snapshots of the prey population density at three different stages of the patchy invasion, i.e. (a) early, (b) intermediate, and (c) late (the parameters and other details are given in appendix B). The right column shows the corresponding accuracy of numerical integration. Rather counterintuitively, except for the early stage of the patchy spread, even a very coarse grid provides a reasonable accuracy which is consistently less than 50 per cent for N = 3 and less than 30 per cent for N = 5.
4. Numerical test cases II: extreme aggregation
In §3, we showed that numerical integration on coarse grids provides good accuracy in the case where the population is distributed more or less over the whole area, no matter whether this distribution is simple (e.g. like the one shown in figure 2a) or may have a complex structure (figure 3e). However, the accuracy appears to be much lower when the population is aggregated within one or a few small areas which may fall in between the sampling grid nodes (cf. figure 3a).
It should be noted that, even on a very coarse grid, such a narrow peak in the spatial population distribution is not necessarily undetectable. It depends on the peak location with respect to the position of the traps (i.e. position of the grid nodes) if its contribution to the population size is properly accounted. The actual problem is that the location of the peak is not known a priori. Therefore, it is impossible to distinguish in advance between the cases when the numerical integration on a coarse grid may still provide a reasonable estimate and when it fails. However, it is possible to assess the frequency of different outcomes if we fix the grid and treat the location of the peak as random.
In order to make a quantitative insight into this matter, we now reduce the grid size to a minimum sensible case, e.g. N^{2} = 9 or N^{2} = 25 nodes, and consider another series of numerical tests. We do not need any population dynamics model now as we can generate the desirable spatial distributions directly. We start with the case when the population consists of a single peak distribution (figure 4a). We describe the shape of the peak by a normal distribution: 4.1where U_{0} is a scaling factor, the variance σ^{2} is fixed at a small hypothetical value and and are random numbers distributed uniformly between 0 and 1, so that the position of the peak is chosen randomly. We consider 10 different realizations (numbered sequentially in the first column of table 1) and calculate the population size and the corresponding error for each of them. We use the methods of numerical integration described in §2 as well as the statistical method given by equation (1.1). The results are given in tables 1 and 2. Expectedly, we observe that the typical accuracy of the integration on a very coarse grid with N^{2} = 9 nodes is low. In fact, only in one or two cases (shown in bold in table 1) out of 10, the accuracy is within the desirable 25 per cent; see also table 2a. However, we point out that the number of cases, where the accuracy is within 90 per cent (which may still be acceptable for some applications, see footnote 1), is five out of 10 for four integration rules (the leastsquares method appears to be somewhat less effective).
In order to further reveal the tendency, we repeat this procedure in the case where the population distribution consists of a few peaks: 4.2where and (j = 1, … , J) are random numbers and J is the total number of peaks. Table 2b,c shows respectively, the results obtained for J = 3 and J = 7 (figure 4b,c). Interestingly, the frequency of the ‘good’ cases does not change much when the number of peaks increases from one to three. However, it increases significantly when the number of peaks increases from three to seven. Note that even in the latter case, the population density is negligibly small in more than 80 per cent of the area, yet the midpoint and trapezoidal rules ensure the acceptable accuracy within 90 per cent in all the cases considered.
We also mention here that, in all three cases considered above, i.e. for J = 1, J = 3 and J = 7, there is no significant difference in accuracy between the statistical method (1.1) and the numerical integration rules.
The situation, however, changes remarkably if the integration is considered on a slightly less coarse grid, i.e. when the number of nodes increases from N^{2} = 9 to N^{2} = 25 (table 2d). Now, even in the extreme situation of the single peak distribution (J = 1), the Simpson rule ensures the desirable accuracy (within 25%) in 10 out of 10 cases. Almost the same good accuracy is provided by the leastsquares method. The efficiency of other integration rules is comparable with the Simpson rule in the low accuracy range of ɛ ≤ 90% but appears to be much lower in the high accuracy range of ɛ ≤ 25%.
5. Application to field data
In the previous sections, we demonstrated that numerical integration on a grid as coarse as N^{2} = 9 nodes can provide a reliable estimate of the population size even when the population spatial distribution has a highly complicated structure. Now, we are going to further validate our approach by applying it to field data of ecological monitoring. The experimental data we use have been collected for a New Zealand flatworm population (Arthurdendyus triangulatus) [41]. This flatworm is an alien species originally brought from New Zealand and eventually having established itself over a large part of the British Isles [42] as well as having potential to colonize Europe [43]. The New Zealand flatworm is regarded as a potentially dangerous pest because of its predatory impact on earthworms [44], which may have a severe disruptive effect on the corresponding ecosystem, in particular, causing a decline in the soil fauna and strongly affecting earthworm predators like moles and badgers [45,46].
For this reason, flatworms have been a subject of controversy, intensive study and monitoring, particularly, in Northern Ireland [42,47]. Here, we use the results obtained in a field study performed in January–March of 2002. The goal of the study was to reveal the features of flatworms' spatial distribution in grassland, including a population assessment. An area of a flatworminfested grassland field at Newforge Lane, Belfast, Northern Ireland, was used for the investigation (World Geodetic System 54°33′24′′ N, 5°56′47′′ W, Irish Grid J330698). The data on flatworms abundance at different locations were collected by means of trapping (figure 1). Shelter traps were heavy duty black polythene sacks, filled with 5 kg of gravel/sand mixed in a ratio of 4 : 1 and measuring 50 × 30 cm. Arthurdendyus triangulatus typically shelters on the soil surface under stones, wood and discarded plastic, so shelter traps such as these provide a relatively cheap and convenient method of sampling and have been widely used [48–50]. The traps were positioned in the nodes of 12 × 12 uniform grid with 2 m spacing, they were examined every week and the numbers of flatworms caught were counted. The features of the experimental design (in particular, trap spacing) are consistent with other similar studies on flatworms (e.g. [49]) and hence can be regarded as typical for this species.
The six distributions of trap counts obtained in the field study are shown in figure 5. Over the sampling period, the 12 × 12 grid traps caught 465–748 flatworms per week. The initial two adjacent sampling weeks, gave counts of 624 and 544 flatworms, equating to a total of 1168 individual flatworms within the grid area [41]. For our purposes, from the trap data collected in the field, we extract the subgrid with N_{x} = N_{y} ≡ N = 11 nodes at each direction. That has been carried out because the Simpson rule requires an odd number N of grid nodes. An example of the corresponding trap data is given in table 3.
Dividing the trap counts at each location by the area of the grid cell (i.e. by 4 m^{2}), the trap counts are linked to the local population density [13,14]. Having then integrated the population density over the fine grid of N^{2} = 121 nodes, we reproduce the total number of collected flatworms (shown in the first row of table 4 for each of distributions from figure 5a–f). This number is considered as the exact value of the population size.
We then recalculate the population size using a very coarse grid of N^{2} = 9 nodes by the midpoint rule (the row marked as I_{MR} in table 4), by the trapezoidal rule (the row I_{TR}), by the Simpson rule (the row I_{SR}), by the leastsquares approximation (the row I_{LS}) and by the baseline statistical approach given by equation (1.1) (the row I_{stat}). For each of these rules, we compute the integration error (3.5) (the rows marked as e_{MR}, e_{TR}, e_{SR}, e_{LS} and e_{stat}, table 4).
It is readily seen from table 4 that, apart from case (a), integration on the grid of N^{2} = 9 nodes gives an estimate of the total population with error consistently less or about 30 per cent. Note that, even in the ‘unfortunate’ case (a) where the information is largely lost because the patch of a high flatworm population density falls in between the grid nodes, the error of the population size estimate still remains within 55 per cent. Such accuracy can still be considered as acceptable for largescale monitoring programmes (cf. [20]).
The value of the error averaged over the six cases appears to be just 22 per cent for all four numerical integration rules and 30 per cent for the statistical rule (1.1). It shows that a robust information about the population size of flatworm population can be obtained using much less traps per unit area (or much larger intertrap distance) than it is usually used. It also demonstrates a higher accuracy of the numerical integration rules compared with the standard statistical approach.
6. Discussion
Ecological monitoring is an essential part of the integrated pest management and control. Its main goal is to provide information about the abundance of problem species, which is performed through collecting information in the field. This information is then used as a basis for decisionmaking, for instance, about application of pesticides. The reliability of the information on pest abundance is therefore an issue of high importance. On the other hand, collecting field data on a large scale, as required by national or regional pest surveys, is costly. A challenging problem is therefore to obtain sufficiently accurate estimates of pest abundance keeping the resources spent on collecting field data collection at a minimum possible value.
For invertebrate species, the information about population abundance is usually obtained from trap counts. The trap counts are then recalculated into the population density at the position of the traps (e.g. [13,14]). However, as the number of traps cannot be made large, it gives the value of population density only at several locations. The statistical methods to estimate the total population size based on this sparse information remain difficult, in particular, owing to the majority of pests displaying some form of nonrandom aggregated distribution [21]. The commonly used statistical approaches [25,18] are not fully satisfactory as they are based on restrictive assumptions about the properties of the population spatial distribution or neglecting the spatial aspect at all. The determination of the optimum number of traps within a sampling programme often relies on properties of the frequency distribution of counts rather than the actual pattern [27].
In our previous work [29], we considered a hypothetical onedimensional case and showed that this problem can, in principle, be solved by applying methods of numerical integration. The population size is obtained by integrating the population density on a coarse grid, where the grid nodes are given by the traps' position. In this paper, we applied this approach to a realistic twodimensional case using both numerical data (in particular, simulation data from a mathematical model of population dynamics) and field data on some invertebrate species (New Zealand flatworms). We used several different methods of numerical integration along with a baseline statistical method and showed that a reliable estimate of the population size (with a good accuracy within 25%) can be obtained on a grid consisting of just 5 × 5 = 25 nodes, even in the cases when the population density exhibits a highly complex spatial structure (e.g. figure 3c) or shows extreme spatial aggregation (e.g. figure 4a). In fact, for some of the numerical tests, as well as for the field data, even a smaller grid of 3 × 3 = 9 nodes is sufficient to provide a good accuracy, although the results become more sensitive to the details of the density distribution. This is in good agreement with the results of other field studies. In particular, Boag et al. [50] analysed the results of trap counts on a spatially aggregated flatworm population and suggested that a minimum trap number of 15 was necessary to achieve a good population estimate, although they cautioned that this is a compromise between statistical accuracy and fieldwork practicalities.
We also mention here that in almost all cases, the statistical approach based on the arithmetic mean, see equation (1.1), appears to be less accurate than (some of) the integration rules; in general, the Simpson rule seems to be the most accurate one.
6.1. Explicit space
The simulation results shown in figures 2–4 are obtained in the dimensionless space. It evokes questions as to what spatial scale the shown population distributions may correspond in reality and how we can relate their scale to the number of grid nodes required for an accurate numerical integration. Recalling that, in simulations, the distances are measured as fractions of the domain size (see the definition of the dimensionless variables above equations (3.3) and (3.4)), for a hump with a characteristic dimensionless size δ_{hump}, we obtain that its real size Δ_{hump} is calculated as 6.1
Obviously, for a given size L of the domain, a smaller δ_{hump} corresponds to a smaller Δ_{hump}. However, let us note that L is a parameter which, generally speaking, can be made different in different simulations, e.g. by considering fields of different size. Therefore, we can address this situation by assuming that the size Δ_{hump} of the hump is the same but the size of the domain is different. Indeed, there is significant theoretical and empirical evidence (cf. [23,51–53]) that, for a given species, the typical size of animals' spatial aggregates can be regarded as a species' trait as it remains approximately the same for different habitats.
We therefore assume that the distribution shown in figure 2a corresponds to the ‘microscale’ of a single hump in a small field, whereas the multihump distribution shown in figure 3e corresponds to a ‘macroscopic’ spatial scale of the same population but in a much larger field. Having estimated the number of humps at each direction, we conclude that the spatial scale of the distribution shown in figure 3e is about 20 times larger than that of figure 2a. Interestingly, the accuracy of the numerical integration on the same coarse grid does not at all drop down 20 times; in fact, using the midpoint integration rule, the accuracy of the population size estimate on the grid of 5 × 5 nodes appears to be even higher for figure 3e than for figure 2a. Therefore, contrary to what has been observed before in the onedimensional case (cf. [29]), in order to achieve reasonable accuracy, the numerical grid must not necessarily resolve all the solution's heterogeneities. (A heuristic explanation for this can be that, owing to the high irregularity of the complex ‘multipatch’ spatial pattern, the overshoots and undershoots at different grid nodes may balance each other.) Having in mind the practical applications of our approach, it leads to a highly counterintuitive conclusion that the number of traps required to obtain a reliable estimate of the population size does not directly depend on the size of the monitored area, and the same minimum number of traps (e.g. nine, as required by the Simpson rule) may be sufficient for a small agricultural field as well as for a large one.
In order to relate the simulation results to the real scale of metres or kilometres, we consider the characteristic scale of the pattern, e.g. an average linear size of a single hump. Its value is known to be speciesspecific. For instance, it is estimated to be between 30 and 50 m for many common agricultural insect pests [51,52] but it can be much smaller for soil invertebrates; e.g. the results shown in figure 5 suggest the value of the order of several metres. Therefore, the pattern shown in figure 3e may correspond to an insect population distribution over a field of roughly 1 × 1 km. Thus, we conclude that a grid of 5 × 5 traps should be sufficient to provide robust information about the population size over the area of up to 100 ha for insect species and over the area of about 4 ha for flatworms. A conceptual possibility of using smaller grids of traps is discussed below.
6.2. Ultracoarse grids: a paradigm shift?
A traditional approach to numerical integration assumes that the integral can be calculated with a given accuracy. Indeed, in computational mathematics, any prescribed accuracy can be reached by refining the numerical grid (i.e. by increasing the number of nodes per unit area) as the theory predicts that the calculated value converges to the exact value when the grid spacing tends to zero. In the case that the ideas of numerical integration are applied to ecological monitoring, the situation becomes qualitatively different: we have to deal with sparse data (e.g. population density) obtained at certain given locations (traps position). The numerical grid is therefore fixed and cannot be refined.
Convergency of the calculated integral to the exact value implies that, as a general tendency, the integration error should be the smaller the more refined the grid is, i.e. the larger the number of nodes is. That, in principle, makes it possible to give a robust estimate of the population size with a reasonable accuracy even on a fixed grid if the number of nodes is not too small. Ideally, the number of nodes should be sufficiently large in order to resolve the inhomogeneities in the population spatial distribution [29]. This requires some a priori information about the distribution features, which is often unknown. Moreover, careful resolution of small size heterogeneities such as narrow peaks would require a number of nodes per unit area much larger than it is normally available in ecological monitoring programmes [29]. In practical terms, however, we have observed that, using the Simpson integration rule, a good accuracy of less than 25 per cent can be reached^{2} on a grid consisting of just 25 nodes, even that this grid apparently does not resolve the single peak distributions well enough (figure 4a).
A grid of 25 traps is still larger than that used in practice (cf. [20]) so there is a need for further optimization. A question is whether any useful information can be obtained on a smaller grid. Obviously, for an ‘ultracoarse’ grid, the above approach is not valid any more: indeed, a smaller grid of nine nodes largely fails when applied to highly aggregated population distributions consisting of one or a few narrow peaks (see §4). The problem is that the position of the peak(s) is not known in advance. Hence, it is not possible to adjust the position of the grid nodes (traps) to the position of the peaks. A narrow peak can still be ‘integrated’ properly but only if it is located close to one of the grid nodes. This is a typical example of a problem with uncertainty, which can have more than one outcome, e.g. depending on relative position of the peaks and traps. A standard way to deal with this type of problem (e.g. [54]) is by introducing probabilities of different outcomes (table 2). That leads to a paradigm shift: instead of considering the integration result per se, we now have to interpret it probabilistically. For instance, whatever is the estimate obtained, there is a probability p_{1} that the integration error is within 25 per cent, a probability p_{2} that it is within 50 per cent, etc. The results of the monitoring should be interpreted accordingly by considering the risks associated with each of the probability ranges. Although this approach is unlikely to be informative if the monitoring is limited to one particular field or area, it may be effective for nationwide programmes where the corresponding probabilities can be obtained from comparison of the results between different fields, with the results of case studies on the given species, and with simulations.
6.3. Concluding remarks
The main conclusions from our study are as follows:

— numerical integration rules (in particular, Simpson rule) can provide a significantly more accurate estimate of the population size from sparse trap data than the standard statistical approach. Application of integration rules is not based on any assumption about the population distribution and remains effective even when the distribution exhibits a complex spatial structure;

— the minimum number of traps required to provide a robust estimate of the population size is found to be 25 (on a rectangular grid). Integration on a grid of 5 × 5 traps keeps the error consistently within 25 per cent. This result shows only slight dependence on the spatial scale and hence is applicable to large areas as well as to small ones; and

— the use of a smaller grid of traps is possible but it may result in a paradigm shift. Integration on an ultracoarse grid cannot provide an estimate of the population size with any prescribed accuracy because of the insufficient information (uncertainty) about the population distribution properties. Instead, the results of the integration can be treated probabilistically by considering the integration error as a random variable. Results of a largescale ecological monitoring should be interpreted accordingly by considering additional risks associated with different accuracy ranges.
Our study leaves a number of open questions. On the theoretical side of our approach, for the methods considered in this paper, it is required that the values of the population density are known at the nodes of a rectangular grid, so that the traps' position must be chosen accordingly. It remains unclear how the accuracy of the integration may change if the traps are installed at arbitrary locations. Obviously, the power of our approach would increase considerably should we be able to place the traps flexibly. However, the latter requires application of more advanced rules of numerical integration. A similar observation can be made with regards to the domain's geometry. Application of the methods considered here is restricted to a rectangular domain. In order to ensure a broad practical application of our approach, the integration rules should be generalized onto the case of an arbitrary field shape.
A related question is to what extent the integration accuracy may be affected if the traps are not laid exactly at the grid points but at some neighbouring position, e.g. as a result of environmental heterogeneity. Our preliminary analysis indicates that relatively small deviations (up to approx. 25% of the intertrap distance) from a uniform grid do not seriously affect the integration error. A more detailed consideration of this problem will become a focus of a separate study.
On a more practical side, there can be some possible limitations that we do not take into account in this study but those can affect the application of our approach. In particular, spatial and temporal variations in trapping results may reflect not only variations in densities, but also errors connected to the trapping methods used (e.g. how efficient are the traps). The effect of such factors may call for a larger trapping effort than our theoretical study indicates. Future research should develop a more quantitative approach to address these issues.
Acknowledgements
The authors thankfully acknowledge instructive criticism from four anonymous reviewers. This study was partially supported by The Leverhulme Trust through grant F/00568/X (to S.P.). S.P. also acknowledges support from the U.K. East Midlands Development Agency through Innovation Fellowship ‘QuantiFly’.
Appendix A. the leastsquares approximation
Consider points P_{i} = (x_{i}, y_{i}), i = 1, …, N and let f_{1}, f_{2}, … , f_{N} be the values of the function f(x, y) at points P_{1}, P_{2}, … , P_{N}, respectively. We have to fit the data f_{i}, i = 1, … , N to the function A 1where ϕ_{m}(x, y), m = 1, … , M, are basis functions, and the expansion coefficients (u_{1}, u_{2}, … , u_{M}) are considered as fitting parameters. The polynomial basis functions ϕ_{m}(x, y) are given by A 2where k_{1} + k_{2} = 0,1, … , K, and the K defines the highest polynomial degree of the expansion (A 1). For any given K, the number of basis functions is determined as M = (K + 1)(K + 2)/2. The point P_{0} = (x_{0}, y_{0}) is a fixed point in the domain D. We take P_{0} = (0.5, 0.5) when the function f(x, y) is reconstructed in the unit square D = [0, 1] × [0,1].
The leastsquares approach considers the vector u = (u_{1}, u_{2}, … , u_{M}) as the best fit to a given dataset, if u minimizes the following merit function F^{2}: A 3where σ_{i} are the weights of the method. Thus, the parameters u_{m} can be found from the M conditions which are called the normal equations of the method. Taking into account the definition (A 3), we obtain the normal equations in the following form A 4
Introducing the weighted data b and the design matrix [D] as the normal equations can be written as [A] u = r, where the matrix [A] = [D]^{T} [D], and the righthand side r = [D]^{T}b. They are to be solved for the vector of parameters u = (u_{1}, … , u_{M} ), A 5
Appendix B. simulation parameters and the initial conditions
The population density snapshot shown in figure 2a is obtained at t = 50 for parameters β = 3, b = 0.28, Λ = 0.5, m = 0.48, D = 10^{−6} and the initial conditions when both populations are distributed over the domain: B 1and B 2where ɛ_{ux} = 0.007, ɛ_{uy} = 0.008, ɛ_{vx} = 0.008 and ɛ_{vy} =−0.007.
Figures 2c,e and 3 are obtained for the initial conditions when both populations are concentrated around the place of the original introduction, e.g. around the domain centre: B 3and B 4where u_{0} and v_{0} are the initial prey and predator densities, respectively, and x_{ij}, y_{ij} (where i = 1, 2 and j = 1, 2) are parameters with obvious meaning.
The snapshot shown in figure 2c is obtained at t = 800 for m = 0.5 (other parameters are the same as in figure 2a) and the initial conditions (B 3 and B 4) with x_{11} = 0.42, x_{12} = 0.53, y_{11} = 0.45, y_{12} = 0.55, x_{21} = 0.42, x_{22} = 0.48, y_{21} = 0.45 and y_{22} = 0.51.
The snapshot shown in figure 2e is obtained at t = 1200 for m = 0.45 (other parameters are the same as in figure 2a,c) and the same initial conditions as in figure 2c.
Figure 3 is obtained for m = 0.414 (other parameters are the same as in figure 2) and the same initial conditions as figure 2c,e. The snapshots are shown for (a) t = 450, (c) t = 1500 and (e) t = 3500.
Footnotes
 Received June 16, 2011.
 Accepted July 19, 2011.
 This journal is © 2011 The Royal Society