The hypothesis that the optimal search strategy is a Lévy walk (LW) or Lévy flight, originally suggested in 1995, has generated an explosion of interest and controversy. Long-standing empirical evidence supporting the LW hypothesis has been overturned, while new models and data are constantly being published. Statistical methods have been criticized and new methods put forward. In parallel with the empirical studies, theoretical search models have been developed. Some theories have been disproved while others remain. Here, we gather together the current state of the art on the role of LWs in optimal foraging theory. We examine the body of theory underpinning the subject. Then we present new results showing that deviations from the idealized one-dimensional search model greatly reduce or remove the advantage of LWs. The search strategy of an LW with exponent μ = 2 is therefore not as robust as is widely thought. We also review the available techniques, and their potential pitfalls, for analysing field data. It is becoming increasingly recognized that there is a wide range of mechanisms that can lead to the apparent observation of power-law patterns. The consequence of this is that the detection of such patterns in field data implies neither that the foragers in question are performing an LW, nor that they have evolved to do so. We conclude that LWs are neither a universal optimal search strategy, nor are they as widespread in nature as was once thought.
A random walk is a stochastic process in which the location X(t) of the random walker varies with time t according to a defined set of probabilistic rules. Random walks have been used for many years as a model for animal movement (e.g. [1–5]). Codling et al.  reviewed many of these models (but did not consider Lévy walks (LWs)). Shlesinger et al.  were among the first to introduce LWs as a class of random walk in which the distance travelled between reorientation events (often referred to as the step length) is drawn from a probability distribution that is heavy-tailed, meaning that it does not have finite variance. The most commonly used such distribution is the power-law or Pareto distribution, in which the probability density function of a step length l is given by

$$p(l) = C\, l^{-\mu}, \qquad l \ge l_{\min}, \tag{1.1}$$

where the exponent μ satisfies 1 < μ ≤ 3. Exponents μ ≤ 1 do not correspond to a well-defined probability distribution, while exponents μ > 3 correspond to distributions with finite variance, leading to a non-LW. Once the two parameters μ and lmin have been determined, the normalization constant C is determined by $C = (\mu - 1)\, l_{\min}^{\mu - 1}$ (e.g. ).
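As an illustration of the power-law step-length distribution, the following minimal sketch draws step lengths by inverse-transform sampling. It assumes the density has the form C l^(−μ) for l ≥ lmin, which is the form implied by the normalization constant quoted above; the function names and parameter values are hypothetical and chosen only for this example.

```python
import random


def normalization_constant(mu, l_min):
    """Return C = (mu - 1) * l_min**(mu - 1) for the density C * l**(-mu)."""
    return (mu - 1.0) * l_min ** (mu - 1.0)


def sample_step_length(mu, l_min, rng):
    """Draw one step length from the power law on [l_min, infinity).

    The cumulative distribution is F(l) = 1 - (l / l_min)**(-(mu - 1)),
    so inverting F at a uniform random number gives the sample.
    """
    if mu <= 1.0:
        raise ValueError("mu must exceed 1 for a normalizable distribution")
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))


if __name__ == "__main__":
    rng = random.Random(0)
    steps = [sample_step_length(2.0, 1.0, rng) for _ in range(10000)]
    print("shortest step:", min(steps))
    print("longest step:", max(steps))
```

For μ = 2 and lmin = 1 the theoretical median step length is 2, while the longest sampled step is typically orders of magnitude larger, which is the heavy tail in practice.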
In an LW, the time taken to complete a given step is related to the length of that step . In contrast, the term Lévy flight (LF) refers to a process in which the random walker jumps between successive locations instantaneously (or at equally spaced time intervals), with step lengths given by equation (1.1). LFs are sometimes used in cases where the distances between successive reorientation locations are known, but the corresponding times are unknown. However, it should be noted that the terms LF and LW are often used interchangeably in the foraging literature (following Viswanathan et al. ).
Heavy-tailed distributions do not conform to the conditions of the central limit theorem (which requires finite variance) and, as a consequence, standard results about the long-term limit of random walks (e.g. ) do not apply for LWs. Instead, LWs are superdiffusive, meaning that the long-term mean-squared displacement of the walker is proportional to $t^{\alpha}$, where t is the time from the start of the walk and α > 1 . One advantage of LWs is that they allow for a continuous transition from diffusive (Brownian) random walks (μ > 3), through superdiffusion (1 < μ ≤ 3), to ballistic (straight-line) motion, which occurs in the limit μ → 1. LW thus provides an important conceptual link between these modes of movement. Another important feature of LWs is that they are scale-free, meaning that they do not have any characteristic spatial scale, and exhibit the same patterns regardless of the range over which they are viewed. Figure 1 shows simulations of various two-dimensional random walks. Each walk covers the same total distance of 1000 units (the final step is truncated to ensure this). LWs with a low exponent cover the required distance in a small number of steps; a higher exponent gives a step-length distribution with more shorter steps, resulting in a walk that stays much closer to the starting point (note the different scales in figure 1).
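A walk of fixed total length, of the kind described for figure 1, might be generated along the following lines. This is a sketch only: the uniformly random turning angle, the value lmin = 1 and the function names are assumptions made for illustration, not details taken from the figure.

```python
import math
import random


def simulate_walk_2d(mu, l_min, total_distance, rng):
    """Return the turning points of a 2D power-law walk.

    Step lengths follow a power law with exponent mu and minimum l_min.
    Directions are uniform on the circle (an assumption of this sketch).
    The final step is truncated so the path length equals total_distance.
    """
    x, y = 0.0, 0.0
    path = [(x, y)]
    remaining = total_distance
    while remaining > 0.0:
        step = l_min * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0))
        step = min(step, remaining)  # truncate the last step
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += step * math.cos(angle)
        y += step * math.sin(angle)
        path.append((x, y))
        remaining -= step
    return path


def path_length(path):
    """Sum of the straight-line distances between successive turning points."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    for mu in (1.5, 2.0, 2.5):
        walk = simulate_walk_2d(mu, 1.0, 1000.0, random.Random(1))
        print(mu, "steps:", len(walk) - 1)
```

Running the loop for several exponents shows the pattern noted in the text: a lower exponent usually covers the same distance in far fewer steps.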
It should be mentioned that the processes discussed here are highly simplified representations of real foragers. The models assume that the forager is memoryless, i.e. each step taken is independent of previous movements and the forager has no knowledge of the environment outside its immediate perceptive range. While these assumptions are necessary to allow progress (particularly analytical) in modelling, they are not wholly realistic . There is strong evidence that foraging in many species involves complex behaviour that violates these simple assumptions . Furthermore, real foragers have to make complex trade-offs involving a wide range of factors, such as risk of predation, energy storage and expenditure, and intraspecific and interspecific competition. It should also be remembered that evolutionary selection pressure does not always have the effect of maximizing individual mean fitness, and risk sensitivity may be an important factor . The recent book by Stephens et al.  gives an excellent review of observed behaviours and their links to different types of models. The random walk models that are the focus of this paper are intended to conceptualize some of the general principles underlying foragers' decisions about where to search for food, and are necessarily a huge simplification of reality.
The subject of LW in foraging was initiated by empirical papers that demonstrated the presence of a heavy-tailed distribution in data describing the movements of fruitflies  and wandering albatrosses . These were complemented by a theoretical study of the efficiency of a forager carrying out a random walk search with a power-law distribution of step lengths in an environment designed to model patchily distributed search targets [16,17]. Defining search efficiency as the mean number of targets located per unit distance travelled, Viswanathan et al.  showed that: (i) LWs (1 < μ ≤ 3) are more efficient than non-Lévy walks (μ > 3) and (ii) the optimal Lévy exponent is approximately 2. Diffusive (i.e. Brownian) movement (μ > 3) involves much backtracking, which can be advantageous in keeping the forager in a food patch, but can also entail repeatedly searching empty space when not in a patch. Ballistic movement (μ → 1) avoids repeatedly searching the same space, but is less suited to exploiting the patchy nature of the food environment. The hypothesis of Viswanathan et al.  was that an LW with μ ≈ 2 represents an optimal compromise between the Brownian and the ballistic search modes.
The studies of Viswanathan et al. [9,16] sparked an explosion of interest in LW and the twin streams of empirical and theoretical research continued in subsequent years. A large number of empirical papers advanced evidence of LW in the observed movements of a wide range of species, including reindeer , spider monkeys , grey seals , fruitflies , bees , moths  and marine predators [24,25]. Evidence was also demonstrated of LW in the movements of fishing trawlers  and human hunter–gatherers . A number of theoretical papers generalized the LW hypothesis, considering for example cases with moving targets , targets that regenerate a certain period of time after being located by the forager [29,30] and cases where the perceptive capability of the forager depends on the step length .
However, a re-analysis by Edwards et al.  of the original albatross, bumble-bee and deer studies of Viswanathan et al. [9,16] demonstrated flaws in both the interpretation of the data and the statistical methods used to analyse them. One regression-based method was recommended over the others , though maximum likelihood was then shown to estimate μ accurately [8,34] and avoid the bias of all regression-based methods. Other problems were the lack of proper testing of alternative hypotheses and of goodness-of-fit. A recent re-analysis of previously published statistical studies overwhelmingly rejected the original Lévy model for 16 of the 17 datasets tested , including for some of the foragers mentioned above.
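To make the maximum-likelihood approach concrete, the sketch below uses the standard closed-form estimator for an unbounded power law with a known lower bound, μ̂ = 1 + n / Σ ln(l_i / lmin). Whether this matches the exact procedure used in the studies cited above is an assumption; the function name and the synthetic data are hypothetical.

```python
import math
import random


def mle_exponent(step_lengths, l_min):
    """Maximum-likelihood estimate of mu for p(l) proportional to l**(-mu), l >= l_min.

    Observations below l_min are discarded. The estimator is
    mu_hat = 1 + n / sum(log(l_i / l_min)).
    """
    data = [l for l in step_lengths if l >= l_min]
    log_sum = sum(math.log(l / l_min) for l in data)
    if log_sum <= 0.0:
        raise ValueError("need at least one observation above l_min")
    return 1.0 + len(data) / log_sum


if __name__ == "__main__":
    rng = random.Random(3)
    true_mu, l_min = 2.0, 1.0
    # Synthetic step lengths drawn by inverse-transform sampling.
    sample = [l_min * (1.0 - rng.random()) ** (-1.0 / (true_mu - 1.0))
              for _ in range(20000)]
    print("estimated mu:", mle_exponent(sample, l_min))
```

Unlike regression on a binned histogram, this estimator works directly on the raw step lengths, so it involves no choice of bin width.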
Further theoretical work showed that, even if an animal is performing an LW, some of the more common field techniques employed will not necessarily give heavy-tailed data ; conversely, heavy-tailed patterns in field data can arise from non-LWs . It was also shown that alternative search strategies can outperform LW [36,38]. The conclusions of the theoretical work concerning moving prey  were shown to be misleading . Simulations involving LW were shown to have hidden pitfalls and numerical inaccuracies that can lead to biased results . Together, these developments have led to a greater appreciation of the need to draw a distinction between pattern and process [36,37], and a reconsideration of the conditions under which an LW is the optimal search strategy [41,42].
In §2, we consider the theoretical work on random searches in detail. We pay specific attention to the assumptions underlying the conclusion that LWs are an optimal strategy and we present new results showing the effects of relaxing these assumptions. In §3, we review the techniques available for analysing movement data and consider what can and what cannot be inferred from these about search mechanisms. In §4, we summarize the theoretical and empirical findings and conclude that, in contrast to what was once thought, LWs are not the universal optimal search strategy.
2. Theoretical models of search efficiency
2.1. Lévy walks as optimal search strategies
The results of Viswanathan et al. [16,17] concerning optimal search strategies motivated much of the ensuing work on LWs, so we first review their results in detail. They proposed a simple, one-dimensional model of an individual searching for food. In each search, the individual starts at location x = x0 on the line and food items are situated at x = 0 and x = 2λ (the quantity λ is referred to as the mean free path and corresponds to the mean straight-line distance to a target in a randomly chosen direction; figure 2). The forager searches for food by picking a direction at random (left or right with equal probability) and then moving at a constant speed for a distance chosen from the step-length distribution. If during this step the forager moves within a distance rv of a food item then the step is truncated and the forager moves directly to the item (rv is termed the perceptive range). If the forager does not find a food item during the step, it chooses another step length and direction, independent of previous steps. The forager's efficiency η is defined as the reciprocal of the mean distance L travelled to find a food item:

$$\eta = \frac{1}{L}. \tag{2.1}$$
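The one-dimensional search just described can be sketched as a short Monte Carlo simulation. This is an illustrative reading of the model, not the original authors' code: the function names and parameter values are hypothetical, and the efficiency is estimated as the number of searches divided by the total distance travelled, i.e. the reciprocal of the sample mean distance.

```python
import random


def search_distance(mu, x0, lam, r_v, l_min, rng):
    """Distance travelled in one search with targets at x = 0 and x = 2 * lam.

    The forager starts at x0 and takes power-law steps in a random
    direction. A step that brings it within r_v of a target is truncated
    and the forager moves directly to that target.
    """
    x = x0
    travelled = 0.0
    right_target = 2.0 * lam
    while True:
        step = l_min * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0))
        if rng.random() < 0.5:  # move left
            if x - step <= r_v:
                return travelled + x
            x -= step
        else:  # move right
            if x + step >= right_target - r_v:
                return travelled + (right_target - x)
            x += step
        travelled += step


def efficiency(mu, x0, lam, r_v, l_min, n_searches, rng):
    """Estimate eta as the reciprocal of the mean distance per search."""
    total = sum(search_distance(mu, x0, lam, r_v, l_min, rng)
                for _ in range(n_searches))
    return n_searches / total


if __name__ == "__main__":
    lam, r_v = 50.0, 1.0
    for mu in (1.1, 2.0, 3.5):
        eta = efficiency(mu, r_v, lam, r_v, r_v, 2000, random.Random(1))
        print("mu =", mu, "efficiency =", eta)
```

Setting x0 = lam corresponds to the destructive scenario and x0 = r_v to the non-destructive one, so the same function can be used to explore intermediate starting positions.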
Two scenarios were considered: destructive foraging, in which food items cannot be visited more than once; and non-destructive foraging, in which food items can be revisited an unlimited number of times. In the destructive scenario, the forager's initial position x0 is assumed to be equal to λ. This represents the situation where, having located and consumed a food item, the forager begins the next search positioned equidistantly between two food items (figure 2a). In order to maintain a constant target density, the food items are always positioned the same 2λ distance apart. In the non-destructive scenario, x0 is assumed to be equal to rv (which is assumed to be much smaller than λ). This represents the situation where, having located a food item, the same item remains available for future searches. So that the forager cannot simply remain at the same food item indefinitely, it is assumed that the forager must first move (just) outside perceptive range of the food item (i.e. to x0 = rv) before beginning the next search (figure 2b). As it is also assumed that it has no knowledge or memory of anything outside its perceptive range, the forager cannot remember the direction in which the food item lies.
Viswanathan et al.  explored the class of strategies where the step-length distribution is a power law defined by equation (1.1). The advantage of this approach is that strategies that were hitherto considered as being of separate types could now be represented as opposite ends of a continuum of strategies, characterized by the power-law exponent μ. As μ → 1, the step-length distribution becomes increasingly dominated by long steps and the strategy becomes more and more similar to ballistic motion (figure 1). For values of μ > 3, the step-length distribution is not heavy-tailed and the random walk is therefore not an LW. Although a power-law walk with μ > 3 is diffusive (rather than superdiffusive) in the long-term limit, and is frequently referred to as a Brownian walk (e.g. ), it is important to remember that it is only one example of a non-LW and there are a wide variety of walks that are diffusive and are not based on a power law. Indeed, a power-law walk where μ is only just greater than 3 will still have some very large step lengths, which means that the walk will only appear diffusive over very large time scales. For intermediate values of μ between 1 and 3, the strategy is an LW. As steps are truncated when a food item is found, the distribution of actual distances travelled has a finite upper bound and therefore has finite mean and variance. Strictly, this is a truncated LW, but is commonly referred to simply as an LW to distinguish it from the case where the distribution of step lengths defined by equation (1.1) is truncated, either by a fixed upper bound or an exponential cut-off. To be unambiguous, we shall refer to the strategy where steps are truncated at food items as a target-truncated LW.
It should be noted that the assumption of Viswanathan et al.  that step lengths are independent and identically distributed (IID) according to this particular class of distribution is a significant limitation. A wide class of other potential strategies, such as strategies involving some memory or intelligence and intermittent (composite) strategies, are excluded. In the rest of this section, it should be remembered that the term ‘optimal strategy’ actually means ‘most efficient strategy of those with IID step lengths drawn from a power-law distribution’. Foragers that use memory to find food will usually have a fitness advantage over those that use only random searches (Lévy or otherwise).
In the destructive case, Viswanathan et al.  showed that the optimal strategy is to move ballistically (μ → 1). This is intuitively sensible as the forager begins at a location equidistant between the two food items, so there is never any advantage to a strategy that involves backtracking. Any strategy with μ > 1 involves backtracking, but the closer μ is to 1, the higher the probability of selecting a step length that is long enough to reach the target.
In the non-destructive case, where the forager begins each bout very close to one of the food items, Viswanathan et al.  used analytical approximations to show that choosing an exponent

$$\mu_{\mathrm{opt}} = 2 - \frac{1}{[\ln(\lambda/r_v)]^{2}} \tag{2.2}$$

is the optimal strategy. In cases where the ratio λ/rv of mean free path to perceptive range is large, the second term in this expression for μopt is small (e.g. if λ/rv = 100 then μopt = 1.95). Hence, if λ/rv is large but its exact value is not known to the forager, the optimal strategy is to use an exponent μ ≈ 2.
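The dependence of the optimal exponent on λ/rv can be tabulated with a few lines of code. The sketch assumes the expression μopt = 2 − [ln(λ/rv)]^(−2), which reproduces the value of 1.95 quoted above for λ/rv = 100; the function name is hypothetical.

```python
import math


def optimal_exponent(lam_over_rv):
    """Approximate optimal exponent, assuming mu_opt = 2 - 1 / ln(lam / r_v)**2."""
    if lam_over_rv <= 1.0:
        raise ValueError("the approximation requires lam / r_v > 1")
    return 2.0 - 1.0 / math.log(lam_over_rv) ** 2


if __name__ == "__main__":
    for ratio in (10.0, 100.0, 1000.0, 10000.0):
        print(ratio, round(optimal_exponent(ratio), 3))
```

The correction term shrinks only logarithmically, so μopt approaches 2 slowly as the targets become sparser.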
Under the assumptions made by Viswanathan et al.  that each search begins with one of the targets on the very edge of the forager's perceptive range (i.e. x0 = rv) and with the second food item a large distance away (x0 ≪ λ), the advantage conferred by the exponent μ ≈ 2 is substantial. For example, with a mean free path of $\lambda = 10^{4} r_v$ and x0 = rv, a target-truncated LW with μ ≈ 2 is approximately eight times more efficient than either ballistic searching (μ → 1) or Brownian searching (μ > 3). The relative advantage of the Lévy strategy decreases as the mean free path decreases. Numerical simulations showed that an equivalent two-dimensional model with randomly placed food items, where every search starts very close to a food item, gives a similar optimal exponent. However, the advantage conferred is smaller than in the one-dimensional model: for similar parameter values, the optimal strategy (μ ≈ 2) is approximately 30 per cent more efficient than ballistic motion and only 13 per cent more efficient than Brownian motion . In all the above theory and simulations, the minimum step length of the forager lmin is equal to the forager's perceptive range rv, but changing this has very little effect on the results.
The key feature of the non-destructive scenario is that every search starts close to a food item in one direction. This property can be interpreted as genuine non-destructive foraging, where a food item still exists after the forager has eaten (but the forager has moved just outside the perceptive range of the food and cannot remember the direction in which it lies). Alternatively, the non-destructive model can be interpreted as a proxy for a sparse (because rv ≪ λ) and patchy (because x0 ≪ λ) food distribution [16,43]. Finding one food item implies that there are others nearby and hence each search always begins near to a food item. If the forager chooses the wrong direction, it will have to travel a long distance to the next food item, so some backtracking is beneficial. Note that this scenario ignores any memory on the part of the animal.
The effect of relaxing the assumption that each search begins just outside the perceptive range of a food item (x0 = rv ≪ λ) has not yet been systematically explored. We now present new results showing that the starting position x0 is in fact a very important parameter. Figure 3 shows the effect of changing x0 in the model of Viswanathan et al.  on both the optimal exponent and the optimal efficiency relative to the efficiency of ballistic movement. Here, the mean free path is $\lambda = 10^{3} r_v$ (increasing this gives a small increase in the relative efficiency of the optimal strategy); the starting position x0 is given as a fraction of the mean free path λ. As the starting position moves further from the target at x = 0, the optimal strategy becomes closer to the ballistic limit and the efficiency gained by following the optimal strategy relative to a ballistic strategy decreases. For example, if the starting position is 10 per cent of the mean free path away from the nearer food item (x0 = 0.1λ = 100rv), then the optimal exponent is approximately 1.2 and this Lévy search strategy is less than 5 per cent more efficient than a ballistic search.
In a similar way to that in which LW provides a continuum between Brownian and ballistic search modes, the parameter x0 provides a continuum between the destructive and non-destructive extremes. In the destructive case (x0 = λ), ballistic movement (μ → 1) is optimal; in the non-destructive limiting case of x0 → 0, a target-truncated LW with μ = 2 is optimal; intermediate values of x0 lead to intermediate values of μopt. As one possible interpretation of the non-destructive case is that of patchily distributed targets, increasing x0 can be thought of as a proxy for decreasing the patchiness of the distribution of targets.
To conclude, the one-dimensional model of Viswanathan et al.  provides a useful method of moving between Brownian and ballistic search modes, by varying the exponent μ of the power-law walk, and also for moving continuously between destructive and non-destructive foraging, by varying the initial starting point x0. Provided one stays very close to the non-destructive case, there is an optimal search strategy close to μ = 2. However, the optimal exponent is very sensitive to changes in x0, moving closer to the ballistic limit of μ = 1 as x0 increases. Furthermore, when the model is extended to higher dimensions, the improvement in efficiency at the optimal exponent is far smaller.
2.2. Extensions to the basic search model
Further work generalized the conditions under which an LW with μ ≈ 2 is the optimal search strategy, though always with the assumption that the forager starts close to a target. It was shown that neither the inclusion of energy considerations that constrained the allowed exponent values , nor the addition of an absorbing boundary [45,46] alters the optimal exponent of μ ≈ 2. Raposo et al.  and Santos et al.  explored the concept of regenerating targets: food items become temporarily unavailable for a fixed time τ after being located by the forager. As τ approaches 0, the optimal exponent approaches its non-destructive limit of approximately 2. As τ approaches infinity, the destructive limit is recovered and the optimal exponent is therefore 1 (ballistic motion). In fact, as τ increases from 0, the optimal exponent decreases and eventually reaches 1 for a finite value of τ. In the one-dimensional model, this occurs for a delay time of approximately 14 per cent of the typical search time ($\tau = e^{-2}\lambda/v$, where v is the speed of movement); however, in two dimensions, delay times of less than 1 per cent of the typical search time mean that ballistic motion is optimal . As with the starting position x0, the delay time provides a continuum between the destructive and the non-destructive scenarios. The respective optimal exponents are recovered in these limiting cases, but a relatively small departure from the non-destructive limit (a short delay time for target regeneration) means that LWs are no longer advantageous.
In Santos et al. , the two-dimensional scenario was restricted to a lattice, allowing some analytical results (not usually possible in more than one dimension). In this case, μ ≈ 2 is only optimal for very low target densities; at higher food densities a ballistic search is the optimal strategy. Santos et al.  and Raposo et al.  showed that the presence of defects in the lattice further decreases the advantage of any given strategy over others. Bartumeus et al.  compared the efficiency of LWs with correlated random walks (CRWs)—random walks with a non-heavy-tailed step-length distribution and with some correlation in the directions of successive steps, termed directional persistence . They considered the non-destructive scenario, i.e. where every search starts close to the target, and found that μ ≈ 2 is the optimal Lévy exponent and that this Lévy strategy outperforms CRW with varying degrees of directional persistence.
Reynolds  developed a model for central-place foragers, with a search strategy based on moving in a series of loops from the origin. This is an appropriate model for foragers, e.g. desert ants and honeybees, that know that there is a target in the vicinity of a certain point, as opposed to the freely roaming model of Viswanathan et al. , which assumes no prior knowledge. The random looping strategy assumes that the forager does not have sufficiently accurate navigation mechanisms to execute a deterministic, systematic search of the relevant area reliably. It was shown that, assuming a power-law distribution of loop lengths truncated at a fixed upper limit lc, the optimal exponent is μ ≈ 2, provided the maximum loop length lc is large relative to the typical distance of the target from the origin. The optimal exponent decreases towards 1 as the target is moved further from the origin (i.e. as the forager's knowledge of the target location becomes less precise). Again the key feature of the model is that there exists a target very close to the forager's central place; as in figure 3, relaxation of this assumption shifts the optimal strategy towards ballistic motion.
Reynolds & Bartumeus  showed that if the destructive search model of Viswanathan et al.  is extended to account for the gradual depletion of targets, a target-truncated LW with 1 < μ ≤ 2 outperforms ballistic motion. This result relies on the assumption that the forager cannot remember its direction of movement from one search to the next, otherwise ballistic motion is again optimal. The benefit of an LW is due, in large part, to the restriction to one dimension: initially the targets are equally spaced along a line; after a period of time, an interval of the line will become devoid of targets and the forager will be situated relatively close to one end of the empty interval. The situation thus gradually grows to resemble the non-destructive scenario in which target-truncated LW outperforms ballistic motion. In higher dimensions, LWs are less efficient than ballistic motion, though only marginally so for small exponents (μ < 1.5) and at low target density. However, an LW is slightly more efficient if targets are not always captured when detected, so steps are frequently truncated and new searches commenced in close proximity to an available target . Again, the probability of capture pc can be thought of as providing a continuum between the destructive (pc = 1) case where ballistic motion is optimal and the non-destructive (pc → 0) case where a target-truncated LW with μ ≈ 2 is optimal.
James et al.  showed that, for a non-destructive forager with no knowledge of or ability to react to its environment, the search efficiency is completely independent of both the distribution of food items (for a given food density) and the search strategy. The model of Sims et al.  falls into this category and numerical simulations confirm that, provided a sufficiently long timeframe is considered, the mean search efficiency is always the same (figure 4), regardless of the step-length distribution or the prey distribution (see appendix A for details). Hence, in contrast to the findings of Sims et al. , neither an LW search strategy, nor a power-law distribution of prey confers any increase in search efficiency.
The model of Viswanathan et al.  and subsequent models based on similar principles do not fall into the category considered by James et al.  because, in a non-destructive search, truncating steps when food items are detected ensures that every search starts very close to a target. An efficient strategy will exploit this knowledge.
To summarize, a number of models have been developed wherein the forager starts every search very close to a target. These models can represent either revisitable (non-destructive foraging) or patchily distributed targets. In this class of model, the most efficient strategy, of those with purely a power-law step-length distribution, is a target-truncated LW with an exponent between 1 and 2 . The closer the situation to the sparse, non-destructive limit, the closer the optimal exponent to 2. This theory applies only to a forager with restricted perceptive capabilities and with no memory or prior knowledge of the environment, a set of conditions that is likely to be rare in real foragers. For destructive foraging, ballistic motion tends to be the optimal strategy. Between the non-destructive and destructive limits, there is a continuum of cases, which can be characterized by any one of several model parameters, such as target regeneration time , target patchiness and probability of target detection . Importantly, relatively small deviations from the idealized non-destructive scenario owing to factors such as these rapidly attenuate, or even completely remove, the advantage of the LW strategy.
2.3. Searching for moving targets
Another strand of theoretical work considers the case of moving targets. Bartumeus et al.  and Viswanathan et al.  used a periodic, one-dimensional model, where the predator and the food item (in this case a mobile prey) move on a domain equivalent to the perimeter of a circle. Predator and prey each follow a movement strategy with step lengths chosen from a power-law distribution, with a random direction for each step and with constant speed. Two power-law movement strategies were considered: μ = 2, referred to by Bartumeus et al.  as the Lévy strategy; and μ = 3, referred to as the Brownian strategy (although it should be noted that μ = 3 actually corresponds to an LW as the step-length distribution has infinite variance for this value of μ). Each simulation begins with the predator and the prey placed randomly on the circle, i.e. there is no assumption that the prey is initially close to the predator (foraging is destructive). Simulations were used to calculate the predator's search efficiency, defined by equation (2.1), for different relative velocities and perceptive ranges of predator and prey. In almost every case, the ‘Lévy’ predator outperforms the ‘Brownian’ predator. In the most advantageous case of a large, fast-moving predator searching for a small, slow-moving prey, the Lévy predator is almost four times more efficient than the Brownian one. In the worst-case scenario (small, slow predator and large, fast prey), the Lévy and Brownian predators have the same efficiency.
The work of Bartumeus et al.  was extended by James et al.  to compare the whole range of power-law exponents, as originally considered by Viswanathan et al. , rather than just the special cases of μ = 2 and μ = 3. It was found that, regardless of the prey movement strategy, the most efficient predator strategy is ballistic motion (μ → 1). In almost every scenario (fast/slow, large/small prey), both strategies considered by Bartumeus et al.  are either equalled or outperformed by the ballistic strategy. The only exception to this is the case where both predator and prey follow a ballistic strategy with the same speed. In this case the periodic nature of the model means that, if they both choose the same movement direction, they will simply move around the circle indefinitely without ever meeting. This is clearly an unrealistic, degenerate case, and when this scenario is converted to the more realistic, non-periodic model that it is designed to represent, the ballistic strategy once again outperforms all other strategies . These results are consistent with those of Bartumeus et al. , who also carried out simulations in two and three dimensions and generalized to an environment with many targets moving independently.
2.4. Composite and intermittent strategies
One of the most useful features of a random walk with power-law-distributed step lengths is that variation of the exponent μ allows a continuous change between different types of walk as described above. However, despite this range of walks that can be described by a power-law distribution, there are many types of search strategy that cannot. Intermittent behaviour, where the forager's movements consist of a mixture of strategies, has been frequently observed in animal movement paths [55–59]. Composite random walks, where the forager has two or more different modes of movement, have been suggested as a model of this behaviour. Benhamou  carried out simulations of a composite random walk model in the same one-dimensional scenario as Viswanathan et al. . Again the forager always starts close to a food item but, instead of taking step lengths from a power-law distribution, the forager first undertakes a local intensive search consisting of short steps. If it has not located a food item after some time τ (called the giving-up time), it switches to an extensive ballistic search, i.e. the predator chooses a direction at random and moves in a straight line until the next food item is found. Plank & James  used stochastic differential equations to gain analytical results for a similar model. Both sets of results show that if the correct giving-up time is chosen, this simple composite strategy can outperform a target-truncated LW.
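The composite strategy described above can be illustrated with a deliberately simplified simulation. It is a sketch under stated assumptions and does not reproduce either of the cited models: the intensive phase uses fixed-length steps in a random direction, the forager moves at unit speed so that the giving-up time is expressed as a distance, and all names and parameter values are hypothetical.

```python
import random


def composite_search_distance(x0, lam, r_v, short_step, giving_up_distance, rng):
    """Distance travelled in one composite search with targets at 0 and 2 * lam.

    Intensive phase: short steps of fixed length in a random direction
    until the distance travelled reaches giving_up_distance.
    Extensive phase: pick a direction at random and move in a straight
    line to the target in that direction.
    """
    x = x0
    travelled = 0.0
    right_target = 2.0 * lam
    while travelled < giving_up_distance:
        if rng.random() < 0.5:  # short step to the left
            if x - short_step <= r_v:
                return travelled + x
            x -= short_step
        else:  # short step to the right
            if x + short_step >= right_target - r_v:
                return travelled + (right_target - x)
            x += short_step
        travelled += short_step
    # Giving-up point reached: switch to a ballistic search.
    if rng.random() < 0.5:
        return travelled + x
    return travelled + (right_target - x)


def mean_distance(x0, lam, r_v, short_step, giving_up_distance, n, rng):
    """Sample mean of the distance travelled over n independent searches."""
    return sum(composite_search_distance(x0, lam, r_v, short_step,
                                         giving_up_distance, rng)
               for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(7)
    for g in (0.0, 5.0, 20.0, 80.0):
        print("giving-up distance", g, "mean distance",
              mean_distance(1.0, 50.0, 1.0, 1.0, g, 5000, rng))
```

A giving-up distance of zero reduces the strategy to a pure ballistic search, which gives a baseline against which other giving-up distances can be compared.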
Reynolds  subsequently noted that the composite Brownian/ballistic strategy of Benhamou  could be interpreted as a special case of a composite LW (or adaptive LW in the terminology of Reynolds ), with the exponent μ switching between 3 and 1. Reynolds  considered a more general adaptive LW, in which the extensive phase exponent μ is not fixed at 1 (ballistic motion) but can take any value. It was shown that the optimal extensive phase exponent in general lies between 1 and 2 and moves closer to 2 as the target density decreases. This is under the assumption of a fixed giving-up time (Plank & James  optimized over all giving-up times) and that the forager begins each bout close to one food item (x0 ≪ λ). Increasing x0 will shift the optimal exponent towards the limiting ballistic value of 1 (as in figure 3).
Bartumeus et al.  and Bartumeus & Levin  considered a ‘Lévy modulated’ CRW, where random reorientations, which break the short-term directional persistence of the CRW, occur after periods of time drawn from a power-law distribution. A classical CRW is recovered in the limit μ → 1 and it was shown that, for non-destructive foraging where every search starts close to a target, the efficiency is increased by choosing a reorientation exponent of μ ≈ 2.
The concept of a dual-mode searching regime was also considered by Bénichou et al. . A composite (or intermittent in the terminology of ) model was proposed wherein the searcher switches between an intensive (searching) phase, consisting of Brownian motion, and an extensive (relocation) phase, in which the forager is incapable of detecting the target. The duration of each phase is assumed to be exponentially distributed. The model corresponds most closely to the destructive foraging regime, as the initial location of the forager is uniformly distributed throughout the domain. Bénichou et al.  showed that if the targets are sparse and can only be detected during the Brownian search phases, the optimal strategy is an intermittent Brownian/ballistic search, with a power-law relationship between the times spent in the two phases. Bénichou et al.  extended the model to two dimensions and showed that the intermittent strategy is more efficient than a simple, target-truncated LW. Thus, although the extensive phase relocations are wasteful in the short term as the target cannot be detected, they improve long-term search efficiency by reducing the oversampling associated with Brownian motion. This result was further generalized by Lomholt et al.  to show that the efficiency of the intermittent search can be increased if the extensive phase duration (i.e. relocation step length) is chosen from a power-law distribution, rather than an exponential distribution. The optimal exponent for the relocation step length decreases from 3 towards 2 as the target density decreases. The efficiency of this Brownian/Lévy intermittent strategy is also less sensitive to changes in target density than the Brownian/ballistic strategy. 
It should be noted that following either a Brownian/ballistic or a Brownian/Lévy intermittent strategy is only advantageous under the assumption that the forager is completely incapable of detecting targets in the relocation phases, and can only detect them during periods of Brownian motion. Bénichou et al.  showed that, if the targets are readily detectable in any movement phase, the optimal strategy is simply ballistic motion, in agreement with the original destructive foraging results of Viswanathan et al. .
Reynolds  proposed a model of intermittency in which the forager follows the original target-truncated LW strategy of Viswanathan et al. , but is incapable of detecting a food item during any step whose length l is greater than some threshold l0. In this case, the optimal strategy is a target-truncated LW with μ ≈ 2 in both the destructive and non-destructive cases. The scaling between the durations of intensive (l < l0) and extensive (l > l0) is consistent with that found by Bénichou et al. .
In summary, the original model of Viswanathan et al.  showed that the optimal strategy for destructive foraging in a non-patchy environment (i.e. where each search does not necessarily begin close to a food item) is ballistic motion. This finding rests on the assumption that this type of motion does not degrade the forager's perceptive ability. If there is a loss of perceptive ability associated with long, straight movements, intermittent strategies with Brownian searching interspersed with Lévy relocation steps (with exponent μ between 2 and 3) become advantageous.
3. Observations of foraging movements
3.1. Fitting power-law distributions to data
There is a large body of empirical research that has purportedly found evidence of LW in movement data (see §1 for examples). Such evidence originally arose from comparing straight lines with data plotted on log–log axes [9,16]. Viswanathan et al.  plotted their data using the geometric midpoints of bins whose widths progressively doubled in size, and drew a line corresponding to a power-law distribution with μ = 2. This line appeared to give a good fit to the data (though a linear regression actually gives μ = 1.89 ± 0.07 (s.e.)). They also used other techniques to reveal long-range correlations in the data.
The realization that the data of Viswanathan et al.  had been misinterpreted led to the original conclusions being overturned by Edwards et al. . However, a wider issue is the demonstration of problems with the statistical techniques, many of which were subsequently adopted by other researchers. For instance, Viswanathan et al.  used bins of equal width on a linear scale (rather than logarithmic, as for ). Sims et al.  showed that this method leads to inaccurate estimates of μ, and recommended the method of Viswanathan et al.  that involved the doubling of bin widths. White et al.  and Edwards  then showed that this regression-based method gives biased estimates of μ. Moreover, when applied to a dataset on grey seal movement, this method estimated μ = 0.8 (see fig. 4c of Sims et al. ). However, exponents of μ ≤ 1, by definition, do not correspond to a properly defined probability distribution.
A further problem with regression-based methods is that they adjust two parameters—the slope and the intercept of the line on log–log axes—to obtain the best fit to the data. The slope is used to estimate the exponent μ of the power-law distribution and, because of the requirement that the total area under the fitted distribution must be 1, the intercept then implicitly determines the value of lmin (see equation (1.1)). However, there is no guarantee that this value of lmin will be consistent with the data: some or even all of the data points may actually be smaller than lmin (see appendix B for details and an example where the fitted lmin is more than double the largest data point). These problems with regression-based methods do not seem to have been previously acknowledged. The problems clearly demonstrate that these methods are inherently flawed because they can give results that are inconsistent with the power-law hypothesis that they purport to be testing.
White et al.  and Edwards  instead advocated use of the maximum-likelihood method , which was shown to provide an accurate estimate of μ (this realization concerning power laws has previously occurred in other fields; see references in White et al. ). Furthermore, as the maximum-likelihood method is fundamentally based on the distribution for which it is being used to estimate parameters, it is possible neither to inadvertently fit values of lmin that are incompatible with the data nor to obtain values of μ ≤ 1.
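As a minimal sketch of the maximum-likelihood approach, the Python snippet below uses the standard closed-form estimator for an unbounded power law, μ̂ = 1 + n / Σ ln(l_i / lmin), which is not written out in the text above. The function names, the synthetic sample and the choice of taking lmin as the smallest observation are illustrative assumptions rather than the procedure of any particular study.

```python
import math
import random


def sample_power_law(mu, l_min, n, rng):
    """Draw n step lengths from p(l) ~ l^(-mu), l >= l_min (inverse-transform sampling)."""
    # 1 - rng.random() lies in (0, 1], so the power below is always defined.
    return [l_min * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0)) for _ in range(n)]


def mle_exponent(steps, l_min):
    """Maximum-likelihood estimate of mu for an unbounded power law with given l_min."""
    # Every term is >= 0 when l_min <= min(steps), so the estimate exceeds 1.
    # (If all steps equal l_min the sum is zero and the estimate is undefined.)
    log_sum = sum(math.log(l / l_min) for l in steps)
    return 1.0 + len(steps) / log_sum


rng = random.Random(1)  # fixed seed so the sketch is reproducible
steps = sample_power_law(mu=2.0, l_min=1.0, n=20000, rng=rng)

# Assumption: l_min is taken as the smallest observed step length.
mu_hat = mle_exponent(steps, l_min=min(steps))
print(f"estimated exponent: {mu_hat:.3f}")
```

Because the sum of logarithms is positive whenever lmin does not exceed the smallest observation, this estimator cannot return μ ≤ 1, which is the property noted above.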
It is possible to use statistical techniques to fit a power-law distribution to any dataset; this does not mean that the power-law distribution is a suitable description of the data. Thus, Edwards et al.  used the Akaike information criterion (e.g. ) to compare the weight of evidence for alternative competing models, including both bounded and unbounded power laws, and used goodness-of-fit tests to test whether the best model was indeed suitable for the data. Elliott et al.  used this approach to show that there is no evidence for a power-law search pattern in Arctic seabirds. Edwards  applied this approach to 17 datasets for which unbounded power laws were previously concluded and found that such power laws were not supported by the data (and were overwhelmingly rejected for 16 of the datasets). Edwards  also tested the bounded power-law distribution, and found it to be consistent with the data for only one of 17 datasets. This calls into question much of the empirical evidence for LW and LF.
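The relative comparison of candidate models can be sketched as follows. This is a simplified illustration, assuming only two candidates (an unbounded power law and an exponential distribution, both on l ≥ lmin), a known lmin and one fitted parameter per model; the function names and the synthetic exponential data are our own. It covers only the Akaike-weight step and does not include the goodness-of-fit tests or the bounded power-law models mentioned above.

```python
import math
import random


def loglik_power_law(data, l_min):
    """Maximised log-likelihood of p(l) = (mu-1)/l_min * (l/l_min)^(-mu), l >= l_min."""
    n = len(data)
    log_sum = sum(math.log(l / l_min) for l in data)
    mu = 1.0 + n / log_sum  # maximum-likelihood exponent
    return n * math.log((mu - 1.0) / l_min) - mu * log_sum


def loglik_exponential(data, l_min):
    """Maximised log-likelihood of p(l) = lam * exp(-lam * (l - l_min)), l >= l_min."""
    n = len(data)
    lam = n / sum(l - l_min for l in data)  # maximum-likelihood rate
    return n * math.log(lam) - n


def akaike_weights(logliks, n_params):
    """Convert maximised log-likelihoods into Akaike weights (AIC = 2k - 2 ln L)."""
    aic = {name: 2 * n_params[name] - 2 * ll for name, ll in logliks.items()}
    best = min(aic.values())
    rel = {name: math.exp(-0.5 * (value - best)) for name, value in aic.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Synthetic data that are exponential by construction (hypothetical example).
rng = random.Random(2)
l_min = 1.0
data = [l_min + rng.expovariate(1.0) for _ in range(2000)]

logliks = {
    "power law": loglik_power_law(data, l_min),
    "exponential": loglik_exponential(data, l_min),
}
weights = akaike_weights(logliks, {"power law": 1, "exponential": 1})
print(weights)
```

A high weight for one model says only that it is the better of the candidates considered; an absolute goodness-of-fit test is still needed to decide whether that model is an adequate description of the data.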
3.2. Heavy-tailed characteristics of non-Lévy walks
It is becoming increasingly recognized that certain key properties of non-LWs can have distributions that are heavy-tailed. It is well known that the first passage times tf of a one-dimensional Brownian walk with a single absorbing barrier have an inverse Gaussian distribution with a power-law tail $t_f^{-\alpha}$ with exponent α = 3/2 (e.g. , §13.4). The first passage time is equivalent to the search time (total time taken to locate a food item) in the model of Viswanathan et al.  with only one food item at x = 0 (i.e. no second food item at x = 2λ). More recently, Reynolds  showed analytically that in a one-dimensional, continuous-time CRW, the distances between changes in direction (i.e. the step lengths) have a (truncated) power-law tail with exponent 4/3. It should be noted that these properties do not necessarily extend to higher dimensions.
Reynolds  considered a population in which each individual follows a Brownian random walk with mean step length 1/λ, and where the parameter λ varies across the population with some probability density function f(λ). If $f(\lambda) \sim \lambda^{v}$ as λ → 0, then the distribution of step lengths over the whole population has a power-law tail: $p(l) \sim l^{-(2+v)}$ for large l. The same result applies in the long term if the heterogeneity arises from a single individual switching between different values of λ according to the distribution f(λ) . Although the condition that $f(\lambda) \sim \lambda^{v}$ as λ → 0 encompasses a range of distributions, including the exponential and gamma distributions, it is important to realize that it is equivalent to assuming that the distribution of mean step lengths (1/λ) has a power-law tail with exponent 2 + v. It is not surprising that this assumption leads to an overall distribution of step lengths that also has a power-law tail, though it does offer an alternative explanation for the observation of heavy-tailed distributions. Petrovskii & Morozov , Hapca et al.  and Gurarie et al.  obtained similar results showing that the positional distribution of a heterogeneous, diffusive population can be heavy-tailed, using both theoretical arguments and empirical data.
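The origin of the population-level power-law tail can be seen from a short calculation. As a sketch, assume (this is a simplifying assumption made here, not a detail stated above) that an individual with parameter λ draws exponentially distributed step lengths with mean 1/λ, and that f(λ) ≈ Aλ^v near λ = 0 for some constant A and v > −1.

```latex
\begin{align*}
p(l) &= \int_0^\infty \lambda e^{-\lambda l} f(\lambda)\,\mathrm{d}\lambda \\
     &\approx A \int_0^\infty \lambda^{1+v} e^{-\lambda l}\,\mathrm{d}\lambda
        \qquad (l \to \infty) \\
     &= A\,\Gamma(2+v)\, l^{-(2+v)} .
\end{align*}
```

For large l, the factor e^{−λl} suppresses all but the smallest values of λ, which is why only the behaviour of f(λ) near zero matters. Changing variable to the mean step length m = 1/λ gives a density g(m) = f(1/m)/m² ∼ m^{−(2+v)} for large m, which is the equivalence with a power-law distribution of mean step lengths.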
3.3. Observed distributions
While the theoretical models all define a Lévy movement strategy as one where the distances between changes in direction are drawn from a power-law distribution, these distances are difficult to observe in practice without continuous positional data. The published datasets usually relate to more readily observable quantities, for example the assumed times between finding food items [9,58,68], referred to as search times, or the location of the forager at given time intervals [18–20,74].
In the case where the forager always moves with constant speed, search time is effectively equivalent to the total distance travelled before locating a food item. It should be noted that this total distance is not a step length but the sum of several step lengths (with the last step truncated by prey detection). Therefore, the distribution of total distances (or search times) is a priori unrelated to the step-length distribution of the underlying random walk.
As observed in §3.2, search times for one-dimensional searches with a single target have a power-law tail with exponent α = 3/2. We now present new results analysing the search time data from simulations of the non-destructive model of Viswanathan et al.  with two targets, separated by distance 2λ (see §2.1 and figure 2). We assume that the forager always moves with constant speed 1, so that the search time is equivalent to the total distance travelled. The forager is assumed to move with a power-law distribution of step lengths with exponent μwalk and to have perceptive range rv = 0.001λ, starting position x0 = rv and minimum step length lmin = rv. For each simulation, the search time was recorded; each simulation was repeated $10^4$ times. A power-law distribution (with minimum value lmin) was fitted to the sample of $10^4$ search times and the best-fit exponent α was calculated using maximum likelihood [32,66]. This process was repeated for a range of values of μwalk. In all cases tested, including the non-Lévy case (μwalk > 3), a power law provides a better fit (higher likelihood) to the search time data than an exponential distribution. The best-fit exponent α for the search times is shown in figure 5 for a range of random walk exponents μwalk. This shows that if a forager's step lengths are chosen from a power-law distribution, the resulting search time data have an exponent of approximately 3/2, which can be significantly smaller than the underlying random walk exponent. Even in cases where the forager's step-length distribution is not heavy-tailed (μwalk > 3), the search time data (or equivalently the distance travelled to find a food item) are heavy-tailed (α ≤ 3), despite the fact that the forager is not moving according to an LW.
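The procedure described above can be sketched in a few lines of Python. This is a simplified reconstruction under stated assumptions, not the code used to produce figure 5: lengths are measured in units of λ, the two targets are represented by absorbing points at 0 and 2λ, the forager starts a distance rv from the nearer one, and only 1000 searches per exponent are run to keep the example fast. All function and variable names are hypothetical.

```python
import math
import random


def search_time(mu_walk, l_min, x0, span, rng):
    """Distance travelled (speed 1) until the walker first reaches 0 or `span`."""
    x, travelled = x0, 0.0
    while True:
        # Power-law step length by inverse-transform sampling.
        step = l_min * (1.0 - rng.random()) ** (-1.0 / (mu_walk - 1.0))
        if rng.random() < 0.5:  # move towards the target at 0
            if step >= x:
                return travelled + x  # step truncated on detection
            x -= step
        else:  # move towards the target at `span`
            if step >= span - x:
                return travelled + (span - x)
            x += step
        travelled += step


def fit_alpha(times, t_min):
    """Maximum-likelihood power-law exponent for the search times."""
    return 1.0 + len(times) / sum(math.log(t / t_min) for t in times)


lam = 1.0          # assumed unit of length
r_v = 0.001 * lam  # perceptive range; also used as x0 and l_min
rng = random.Random(0)

alphas = {}
for mu_walk in (2.0, 3.5):  # one Levy and one non-Levy step-length exponent
    times = [search_time(mu_walk, r_v, r_v, 2.0 * lam, rng) for _ in range(1000)]
    alphas[mu_walk] = fit_alpha(times, t_min=r_v)

print(alphas)
```

The point of the sketch is qualitative: the fitted search-time exponent need not resemble the step-length exponent that generated the walk, so the two should not be conflated when interpreting data.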
The results shown in figure 5 highlight the fact that a heavy-tailed distribution of search times can arise from a non-LW. Therefore, the observation of heavy tails in search time data (or equivalently, under the assumption of constant speed, search distances) should not be used to infer that the forager is undergoing an LW. It should be noted that these results are for a one-dimensional search. Although mean search efficiency has been calculated for an equivalent two-dimensional model , the distributional properties of search times for a two-dimensional search have not yet been well characterized. Search times in two dimensions may not be heavy-tailed either for LWs or non-LWs (this is partly a consequence of the fact that any random search strategy is much less likely to find the food item that is initially nearby in two dimensions than in one). Hence, empirical data on search times in two dimensions (e.g. ) cannot currently be used to distinguish between Lévy and non-Lévy models.
Reynolds  claimed that LWs are robust to subsampling, in the sense that the distances between the locations of a Lévy walker at discrete time intervals (e.g. ) have a distribution that is approximately linear on log–log axes. However, no comparison was made with an alternative distribution (and see §3.1 for problems with regression-based methods on log–log axes). Plank & Codling  showed that a power-law distribution gives a better fit (higher maximum likelihood) than an exponential distribution to the sub-sampled data if the underlying LW has an exponent μwalk < 2, but that an exponential distribution gives a better fit if μwalk > 2. Plank & Codling  also showed that a power law can, depending on the sampling rate, give a better fit than an exponential distribution to data from a composite CRW (i.e. a CRW comprising two distinct phases of movement), if there is sufficient heterogeneity between the two behavioural modes. As shown by Viswanathan et al. , it is necessary to observe a movement path for a sufficient period of time, much longer than the inherent persistence time of the random walk, to be able to distinguish a genuinely superdiffusive pattern from a movement that is ultimately Brownian.
Some empirical studies have used positional data to designate reorientation events as points where the direction of motion changes by more than some specified threshold angle [21,22,77]. The distances between successive reorientation events can then be fitted to a power law and other candidate distributions. Codling & Plank  showed that the best-fit distribution can be sensitive to the choice of threshold angle. These findings underline the need for absolute goodness-of-fit tests, as well as a comparison of the relative fit of candidate models . It is quite possible for all candidate distributions to have a poor fit, in which case alternative models are needed before any meaningful conclusion can be drawn.
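To make the threshold-angle procedure concrete, the following Python sketch splits a track of positional fixes into moves wherever the turning angle exceeds a chosen threshold. It is an illustrative implementation only: the function name, the convention of treating the first and last fixes as move endpoints, and the toy track are assumptions made here, and stationary fixes (zero displacement) are not handled specially.

```python
import math


def move_lengths(fixes, threshold_deg):
    """Split a track of (x, y) fixes into moves at turns larger than threshold_deg."""
    threshold = math.radians(threshold_deg)
    # Heading of each segment between consecutive fixes.
    headings = [math.atan2(y1 - y0, x1 - x0)
                for (x0, y0), (x1, y1) in zip(fixes, fixes[1:])]
    anchors = [fixes[0]]
    for i in range(1, len(headings)):
        # Turning angle at fix i, wrapped into [-pi, pi).
        turn = (headings[i] - headings[i - 1] + math.pi) % (2 * math.pi) - math.pi
        if abs(turn) > threshold:
            anchors.append(fixes[i])  # designate a reorientation event
    anchors.append(fixes[-1])
    # Move length = straight-line distance between successive reorientation events.
    return [math.dist(p, q) for p, q in zip(anchors, anchors[1:])]


# Toy track with two right-angle turns.
track = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (3, 2)]

moves_45 = move_lengths(track, threshold_deg=45)    # both turns count as reorientations
moves_100 = move_lengths(track, threshold_deg=100)  # neither turn counts
print(moves_45, moves_100)
```

Even on this toy track the set of move lengths changes completely with the threshold, which illustrates why the best-fit distribution can be sensitive to that choice.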
3.4. Simulation studies
A substantial amount of work has analysed the move length distributions resulting from model simulations in which foragers follow simple rules. These studies show that biologically motivated movement mechanisms can produce heavy-tailed observations. For example, Reynolds  considered a model of olfactory-driven foraging in bumble-bees. Reynolds  explored a model of individuals that avoid odour trails left by conspecifics (termed a self-avoiding walk), as occurs, for example, in carabid beetles . Reynolds  modelled a forager using chemotaxis (movement up a gradient of concentration of some chemical produced by search targets) to search for food. Reynolds  modelled bumble-bees foraging destructively on a two-dimensional lattice. In both the empirical and simulation studies by Reynolds et al., linear regression on log–log data was used to find the best-fit values for quantities such as the exponent μ of the observed move length distribution, the fractal dimension D of the movement path and the exponent α of the scaling of the root mean square fluctuation with time. Brownian motion would give μ > 3, D ≈ 2 and α ≈ 0.5, whereas Reynolds found values of 1 < μ ≤ 3, 1 < D < 2 and 0.5 < α < 1, indicative of scale-free characteristics in the observed data . Reynolds  showed that these characteristics are a necessary but not sufficient condition for the presence of an LW, as other types of fractal movement strategy (e.g. fractional Brownian motion ) will lead to values in these ranges (although these are less efficient than an LW in the model of Viswanathan et al. ). Codling & Plank  further showed that a non-fractal strategy (a composite CRW or a heterogeneous population of individuals following a CRW) can give rise to statistics in the scale-free ranges, depending on the rate at which the forager's location is sampled, and the threshold angle used to designate reorientation events.
The implication of simulation studies such as these is that there is a wide range of search strategies not based on LW that can give rise to heavy-tailed observational data. These issues highlight the fact that there can be a major discrepancy between the pattern that is obtained by observation of a forager's movements and the underlying process (such as a Brownian walk, LW or other set of movement rules) governing the behaviour of the forager. The observed pattern is the result of applying imperfect sampling methods to the actual movement path, which itself arises from the interaction of the underlying process with the environment (in particular, the spatial distribution of food items). Any of these three factors (underlying process, distribution of food, sampling methods) can contribute to heavy-tailed distributions in the observational data [37,41,42]. More sophisticated state-space models are emerging that attempt to identify switching between different behavioural modes, such as transiting and foraging [57,59]. Identifying distinct behavioural modes in this way offers a possible route to understanding the cues that foragers use to switch between them. This is potentially more informative than simply fitting a single probability distribution to the entire dataset.
One of the key questions in the field of optimal foraging is: ‘under what circumstances is it advantageous for a forager to follow a movement strategy based on an LW?’ Crucially, the answer to this question depends on what alternative strategies are realistically available to the forager. Theoretical models typically assume that the forager has no memory of anything prior to its most recent detection of a food item, and that its choice of strategy therefore corresponds to choosing a probability distribution for its move lengths. These steps are truncated whenever the forager detects a food item; a new move length is then drawn from the distribution and a new direction is chosen at random. Under this assumption, a (target-truncated) LW is the optimal strategy when searching for targets that are not destroyed on consumption, but remain available as future targets and are just outside the forager's perceptive range at the beginning of the subsequent search. This scenario can be thought of as a proxy for destructive foraging in a sparse, patchy environment, with the key assumption being that each new search begins with a target just outside the perceptive range. It is worth stressing that in the two-dimensional version of the original model of Viswanathan et al. , the LW strategy is only 13 per cent more efficient (in terms of the mean distance travelled to find a food item) than the Brownian motion.
The new results presented in this paper show that relatively small deviations from the idealized model of Viswanathan et al.  can destroy the μ = 2 LW optimum, and greatly reduce the advantage of LW search strategies in general. For example, if there is a small increase in the initial distance between forager and target (figure 3), or a short period of time following detection for which a target is unavailable for future searches , the optimal Lévy exponent decreases from μ = 2 towards the ballistic limit of μ = 1. Furthermore, the efficiency of an LW relative to that of a ballistic search is greatly reduced. Therefore, the theoretical optimum of a μ = 2 LW is not as robust as is widely thought.
If the restriction on the forager's memory is removed, allowing even a modest level of cognitive ability, a wide range of strategies becomes available. For instance, the forager can modify its behaviour depending on the amount of time since the last food item was detected [36,38,60] or maintain some directional persistence from one step to the next [37,50,61,87], a strategy that may help the forager to move into the centre of a patch or to avoid excessive backtracking. Such strategies have more of a mechanistic basis in the behavioural biology of the forager than a pure LW. It is also clear from the wider literature that real foragers exhibit behaviours driven by a range of motivations , many of which are exceedingly difficult to include in such simplified models as those discussed here. For example, bees have been observed to exhibit highly complex memory and associative learning characteristics  and it is thought that hummingbirds and hummingbird flowers have co-adapted to allow for more efficient nectar harvesting .
It is becoming increasingly apparent that a wide range of movement strategies not based on an LW can lead to the observation of heavy-tailed patterns. These include an apparent power-law distribution of observed move lengths, superdiffusive movement, long-term correlations in the reorientation data and fractal movement paths. These scale-free characteristics can be generated via the interaction of the forager's behaviour with the distribution of food in the environment , via the sampling methods used to observe the movement  or via demographic or temporal heterogeneity in individual movement strategies [41,71]. As noted by Benhamou , the fact that a heavy-tailed, heterogeneous population of Brownian walkers (or an individual switching between different Brownian strategies) can produce heavy-tailed observations is quite different from saying that animals have evolved to spontaneously perform LW as an optimal search strategy. While it is certainly valuable to recognize that scale-free patterns can arise from a wide variety of natural mechanisms, it is essential to remember that observing such patterns does not imply that the forager is ‘doing an LW’. In the absence of additional information, little more can be inferred from the observation of scale-free characteristics than that the forager is not undergoing pure Brownian motion, nor is it moving in an environment in which food items are uniformly (rather than patchily) distributed and revisitable. Furthermore, many inferences of LWs from data have not held up to closer scrutiny. In light of this, the hypothesis that foragers have evolved to follow an optimal LW strategy (e.g. [24,25,90]) has little supporting evidence in the way of observational data.
We would like to thank Simon Benhamou, Richard Brown, Edward Codling and Jonathan Pitchford for helpful discussions. We also thank three anonymous referees for their very thoughtful and knowledgeable comments that have helped to improve this work.
Appendix A. Numerical simulations
We constructed a two-dimensional simulation based on the model described in electronic supplementary methods and results 3 of Sims et al. . This model is analytically intractable because of its complexity. It contains features present in the work of Viswanathan et al. , for example a patchy environment and a Lévy search strategy, but the forager does not truncate its steps when a food item is found. Sims et al.  found a significant increase in the forager's efficiency (measured as total amount of biomass found in a simulation run) when the forager performed an LW rather than a non-Lévy random walk. It was also found that a forager in a patchy environment outperforms a forager in a non-patchy environment (where the food is distributed uniformly).
We carried out the simulations on a two-dimensional grid consisting of 5000 cells horizontally by 2500 cells vertically. Each cell contains an amount of prey biomass. The forager starts at the top-left corner of the grid (0,0) and at each step moves in a straight line to a point that is one grid cell to the right and n grid cells down, where n is drawn from a specified distribution. In each simulation, the forager takes 5000 such steps (i.e. eventually traverses the width of the landscape). If a step takes the forager beyond the lower boundary of the landscape, it re-enters the landscape at the upper boundary (i.e. periodic boundary conditions are applied on the upper and lower boundaries). The total amount of biomass consumed is the sum of the biomass contained in all the cells the forager passes through. Efficiency is defined as the total amount of biomass consumed divided by the number of cells travelled through (i.e. the area searched), divided by the total amount of prey biomass available in the entire landscape.
The distribution of prey biomass across the landscape is generated as a series of patches. Each patch is constructed using a simple, unbiased random walk . Patches are pasted into the landscape either with an exponential or a power-law distribution for the distance between successive patch centres. (Following Sims et al. , these two cases for the distribution of prey are termed ‘random’ and ‘Lévy’, respectively.) In each case, a sufficient number of patches is pasted into the landscape so that the total amount of available biomass is approximately $10^6$ units.
The forager's vertical move lengths follow either a uniform distribution on the integers between 1 and 10, or a truncated power-law distribution on the integers with exponent μ = 2, minimum step length 1 and maximum step length 2500 (corresponding to the total height of the domain). (Again these two cases for the forager's strategy are termed ‘random’ and ‘Lévy’, respectively.) The upper limit of 10 for the uniform distribution is chosen to ensure that the two distributions have similar means so that, on average, the forager visits approximately the same number of cells in each simulation, regardless of the move length strategy.
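The claim that the two move length distributions have similar means can be checked directly. The sketch below assumes that the truncated power law on the integers has probability mass proportional to n^(−μ) for n = 1, …, 2500; the exact discrete form is not specified above, so this is an assumption, as are the function and variable names.

```python
import random


def truncated_power_law_pmf(mu, n_min, n_max):
    """Probability mass proportional to n^(-mu) on the integers n_min..n_max."""
    weights = [n ** -mu for n in range(n_min, n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


support = list(range(1, 2501))  # maximum step equals the height of the domain
pmf = truncated_power_law_pmf(mu=2.0, n_min=1, n_max=2500)

levy_mean = sum(n * p for n, p in zip(support, pmf))
uniform_mean = sum(range(1, 11)) / 10  # uniform on the integers 1..10

# Drawing the forager's 5000 vertical moves for one simulation run.
rng = random.Random(3)
levy_steps = rng.choices(support, weights=pmf, k=5000)

print(f"Levy mean: {levy_mean:.2f}, uniform mean: {uniform_mean:.2f}")
```

Under this assumed form, the truncated power law has a mean of roughly 5.1, compared with 5.5 for the uniform case, so the two strategies visit a comparable number of cells on average.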
We carried out extensive numerical explorations of this model using different patch generation details and different move length distributions. The results show that, when scaled to account for the total area searched (i.e. the total number of cells visited) and the total amount of prey biomass available, the expected amount of biomass obtained per simulation is always the same, provided enough simulations are performed (figure 4). In cases where an LW is involved (either for the prey distribution or the forager's movements), the convergence is (i) extremely slow, meaning that very many simulations need to be carried out to obtain a reliable answer; and (ii) biased, meaning that if insufficient simulations are carried out, the efficiency always appears to be greater than the true value. These findings are consistent with the results of James et al. .
Appendix B. Regression-based methods and minimum step size
When linear regression is used to fit a straight line to data on log–log axes, the resulting probability density function is of the form ln p(l) = a ln l + b, or equivalently $p(l) = C l^{-\mu}$ with $C = e^{b}$ and μ = −a. Thus, the slope a of the straight line determines the exponent μ of the power-law distribution, and the intercept b determines the normalization constant C. In order for this to be a well-defined probability distribution on l ∈ [lmin,∞), not only must μ be greater than 1, but the normalization constant must also satisfy
$$\int_{l_{\min}}^{\infty} C\, l^{-\mu}\, \mathrm{d}l = 1.$$
This requires that
$$C = (\mu - 1)\, l_{\min}^{\mu - 1}.$$
Substituting $C = e^{b}$ and μ = −a and rearranging for lmin shows that
$$l_{\min} = \left( \frac{e^{b}}{-a - 1} \right)^{-1/(a+1)}. \tag{B 1}$$
Hence lmin is implicitly determined by the regression parameters a and b, and there is no guarantee that the value of lmin given by equation (B 1) will provide the best fit for (or even be consistent with) the data.
For example, Austin et al.  fitted power-law distributions to the movement lengths of grey seals using linear regression. In one case (see fig. 3 of Austin et al.  and table 1), the regression parameters are a = −1.26 and b = −0.474. According to equation (B 1), the fitted distribution needs to have a value of lmin of 31 km. It is clearly nonsensical to use such a distribution to describe data that are in the range 2–15 km.
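The implied minimum step length can be computed directly from the regression parameters. The sketch below assumes natural logarithms on both axes (as in ln p(l) = a ln l + b) and uses the normalization condition C = (μ − 1) lmin^(μ−1) with C = e^b and μ = −a; the function name is hypothetical and the parameter values are the rounded ones quoted above.

```python
import math


def implied_l_min(a, b):
    """Minimum step length implied by a log-log regression ln p = a ln l + b."""
    mu = -a
    if mu <= 1.0:
        # No normalisable power law on [l_min, infinity) exists in this case.
        raise ValueError("slope must be steeper than -1")
    c = math.exp(b)  # normalization constant fixed by the intercept
    return (c / (mu - 1.0)) ** (1.0 / (mu - 1.0))


# Rounded regression parameters quoted in the text (data assumed to be in km).
l_min_km = implied_l_min(a=-1.26, b=-0.474)
print(f"implied l_min: {l_min_km:.1f} km")
```

Because the exponent 1/(μ − 1) is large when μ is close to 1, the result is sensitive to rounding of a and b, so a value computed from the rounded parameters need not reproduce the quoted figure exactly. It nevertheless lies well above the 2–15 km range of the data, which is the inconsistency being illustrated.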
- Received April 4, 2011.
- Accepted May 5, 2011.
- This journal is © 2011 The Royal Society