## Abstract

On studying strategy update rules in the framework of evolutionary game theory, one can differentiate between imitation processes and aspiration-driven dynamics. In the former case, individuals imitate the strategy of a more successful peer. In the latter case, individuals adjust their strategies based on a comparison of their pay-offs from the evolutionary game to a value they aspire to, called the level of aspiration. Unlike imitation processes of pairwise comparison, aspiration-driven updates do not require additional information about the strategic environment and can thus be interpreted as being more spontaneous. Recent work has mainly focused on understanding how aspiration dynamics alter the evolutionary outcome in structured populations. However, the baseline case for understanding strategy selection is the well-mixed population, which remains insufficiently understood. We explore how aspiration-driven strategy-update dynamics under imperfect rationality influence the average abundance of a strategy in multi-player evolutionary games with two strategies. We analytically derive a condition under which a strategy is more abundant than the other in the weak selection limit. This approach has a long-standing history in evolutionary games and is mostly applied for its mathematical tractability. Hence, we also explore strong selection numerically, which shows that our weak selection condition is a robust predictor of the average abundance of a strategy. The condition turns out to differ from that of a wide class of imitation dynamics, as long as the game is not dyadic. Therefore, a strategy favoured under imitation dynamics can be disfavoured under aspiration dynamics. This does not require any population structure, and thus highlights the intrinsic difference between imitation and aspiration dynamics.

## 1. Introduction

In the study of population dynamics, it turns out to be very useful to classify individual interactions in terms of evolutionary games [1]. Early mathematical theories of strategic interactions were based on the assumption of rational choice [2,3]: an agent's optimal action depends on its expectations of the actions of others, and each of the other agents' actions depend on their expectations about the focal agent. In evolutionary game theory, successful strategies spread by reproduction or imitation in a population [4–8].

Evolutionary game theory not only provides a platform for explaining biological problems of frequency-dependent fitness and complex individual interactions, such as cooperation and coordination [9,10]. In finite populations, it also links the neutral process of evolution [11] to frequency dependence by introducing an intensity of selection [12–15]. Evolutionary game theory can also be used to study cultural dynamics, including human strategic behaviour and updating [16–18]. One of the most interesting open questions is how individuals update their strategies based on their knowledge and perception of others and of themselves.

Two fundamentally different mechanisms can be used to classify strategy updating and population dynamics based on individuals' knowledge about their strategic environment or themselves: imitation of others and self-learning based on one's own aspiration. In imitation dynamics, players update their strategies after a comparison between their own and another individual's success in the evolutionary game [19–21]. For aspiration-driven updating, players switch strategies if an aspiration level is not met, where the level of aspiration is an intrinsic property of the focal individual [22–25]. In both dynamics, novel strategies cannot emerge without additional mechanisms, for example spontaneous exploration of strategy space (similar to mutation) [19,26–30]. The major difference is that the latter does not require any knowledge about the pay-offs of others. Thus aspiration-level-based dynamics, a form of self-learning, require less information about an individual's strategic environment than do imitation dynamics.

Aspiration-driven strategy-update dynamics are commonly observed in studies of animal and human behavioural ecology. For example, fish ignore social information when they have relevant personal information [31], and experienced ants hunt for food based on their own previous chemical trails rather than imitating others [32]. Furthermore, a form of aspiration-level-driven dynamics plays a key role in the individual behaviours of rat populations [33]. These examples clearly show that the idea behind aspiration dynamics, i.e. self-evaluation, is present in the animal world. In behavioural sciences, such aspiration-driven strategy adjustments generally operate on the behavioural level. However, one can speculate that self-learning processes might exert a downward influence on regulatory, and thus genetic, levels of the brain and nervous system; this, in turn, could be seen as a mechanism that alters the rate of genetic change [34]. Whereas such wide-reaching systemic alterations are more speculative, it is clear that aspiration levels play a role in human strategy updating [23].

We study the statistical mechanics of a simple case of aspiration-driven self-learning dynamics in well-mixed populations of finite size. Deterministic and stochastic models of imitation dynamics have been well studied in both well-mixed and structured populations [6,19,24,26,35,36]. For aspiration dynamics, numerous works have emerged studying population dynamics on graphs, but its impact in well-mixed populations—a basic reference case, one would think—is far less well understood. Although deterministic aspiration dynamics, i.e. a kind of win-stay-lose-shift dynamics, in which individuals are perfectly rational have been analysed [37], it is not clear how processes with imperfect rationality influence the evolutionary outcome. Here, we ask whether a strategy favoured under pairwise comparison-driven imitation dynamics can become disfavoured under aspiration-driven self-learning dynamics. To this end, in our analytical analysis, we limit ourselves to the weak selection, or weak rationality approximation, where pay-offs via the game play little role in the decision-making [35]. It has been shown that under weak selection, the favoured strategy is invariant for a wide class of imitation processes [21,27,38]. We show that for pairwise games, the aspiration and imitation dynamics always share the same favoured strategies. For multi-player games, however, the weak selection criterion under aspiration dynamics that determines whether a strategy is more abundant than the other differs from the criterion under imitation dynamics. This paves the way to construct multi-player games, for which aspiration dynamics favour one strategy, whereas imitation dynamics favour another. Furthermore, in contrast to deterministic aspiration dynamics, if the favoured strategy is determined by a global aspiration level, the average abundance of a strategy in the stochastic aspiration dynamics is invariant with respect to the aspiration level, provided selection is weak. 
We also extrapolate our results to stronger selection cases through numerical simulation.

## 2. Mathematical model

### 2.1. Evolutionary games

We consider evolutionary game dynamics with two strategies and *d* players. From these, the more widely studied 2 × 2 games emerge as a special case [36]. In individual encounters, players obtain their pay-offs from simultaneous actions. A focal player can be of type *A* or *B*, and encounter a group containing *k* other players of type *A*, to receive the pay-off *a*_{k} or *b*_{k}, respectively. For example, a *B* player which encounters *d* − 1 individuals of type *A* obtains pay-off *b*_{d−1}. An *A* player in a group with one other *A* player, and thus *d* − 2 *B* players, obtains pay-off *a*_{1}. All possible pay-offs of a focal individual are uniquely defined by the number of *A* players in the group, such that the pay-off matrix reads

$$
\begin{array}{c|cccc}
 & d-1 & d-2 & \cdots & 0 \\ \hline
A & a_{d-1} & a_{d-2} & \cdots & a_{0} \\
B & b_{d-1} & b_{d-2} & \cdots & b_{0}
\end{array} \tag{2.1}
$$

where the columns are labelled by the number of *A* players among the *d* − 1 co-players. For any group engaging in a one-shot game, we can obtain each member's pay-off according to this matrix.

In a finite well-mixed population of size *N*, groups of size *d* are assembled randomly, such that the probability of choosing a group that consists of another *k* players of type *A*, and of *d* − 1 − *k* players of type *B*, is given by a hypergeometric distribution [39]. For example, the probability that an *A* player is in a group of *k* other *A*s is given by

$$\mathrm{prob}_A(k) = \frac{\binom{i-1}{k}\binom{N-i}{d-1-k}}{\binom{N-1}{d-1}},$$

where *i* is the number of *A* players in the population (group sampling requires *N* ≥ *d*), and $\binom{n}{k}$ is the binomial coefficient.

The expected pay-offs for any *A* or *B* in a population of size *N*, with *i* players of type *A* and *N* − *i* players of type *B*, are given by

$$\pi_A(i) = \sum_{k=0}^{d-1} \frac{\binom{i-1}{k}\binom{N-i}{d-1-k}}{\binom{N-1}{d-1}}\, a_k \tag{2.2}$$

and

$$\pi_B(i) = \sum_{k=0}^{d-1} \frac{\binom{i}{k}\binom{N-i-1}{d-1-k}}{\binom{N-1}{d-1}}\, b_k. \tag{2.3}$$
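The hypergeometric averaging above can be sketched in a few lines of Python. This is an illustrative helper, not code from the paper; the function name `expected_payoffs` and its signature are our own choices.

```python
from math import comb

def expected_payoffs(i, N, d, a, b):
    """Expected pay-offs of an A and a B player in a well-mixed population.

    i: number of A players (1 <= i <= N - 1), N: population size,
    d: group size; a[k] (b[k]) is the pay-off to an A (B) focal player
    whose group contains k other A players, k = 0, ..., d - 1.
    Group composition is hypergeometrically distributed.
    """
    denom = comb(N - 1, d - 1)  # ways to draw the d - 1 co-players
    pi_A = sum(comb(i - 1, k) * comb(N - i, d - 1 - k) * a[k]
               for k in range(d)) / denom
    pi_B = sum(comb(i, k) * comb(N - i - 1, d - 1 - k) * b[k]
               for k in range(d)) / denom
    return pi_A, pi_B
```

Note that `math.comb` returns zero for impossible draws (when `k` exceeds the pool size), so the sums automatically skip unattainable group compositions.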

In summary, we define a *d*-player stage game [7], shown in equation (2.1), from which the evolutionary game emerges such that each individual obtains an expected pay-off based on the current composition of the well-mixed population. In the following, we introduce an update rule based on a global level of aspiration. This allows us to define a Markov chain describing the inherently stochastic dynamics in a finite population: probabilistic change of the composition of the population is driven by the fact that each individual compares its actual pay-off to an imaginary value that it aspires. Note here that we are only interested in the simplest way to model such a complex problem and do not address any learning process that may adjust such an aspiration level as the system evolves. For a sketch of the aspiration-driven evolutionary game, see figure 1.

### 2.2. Aspiration-level-driven stochastic dynamics

In addition to the inherent stochasticity in finite populations, there is randomness in the process of individual assessments of one's own pay-off as compared to a random sample of the rest of the population; even if an individual knew exactly what to do, that individual might still fail to switch to an optimal strategy, e.g. owing to a trembling hand [40,41].

Here, we examine the simplest case of an entire population sharing a global level of aspiration. Players need not observe any pay-offs but their own, which they compare to an aspired value. This level of aspiration, *α*, is a variable that influences the stochastic strategy updating. Strategy switching is essentially random when individuals' pay-offs are close to their level of aspiration, reflecting the basic degree of uncertainty in the population. When pay-offs exceed the aspiration, strategy switching is unlikely. When the aspiration is high compared to pay-offs, switching probabilities are high.

The level of aspiration provides a global benchmark of tolerance or dissatisfaction in the population. In addition, when modelling human strategy updating, one typically introduces another global parameter that provides a measure for how important individuals deem the impact of the actual game played on their update, the intensity of selection, *ω*. Irrespective of the aspiration level and the frequency-dependent pay-off distribution, vanishing values of *ω* refer to nearly random strategy updating. For large values of *ω*, individuals' deviations from their aspiration level have a strong impact on the dynamics.

Note that although the level of aspiration is a global variable and does not differ individually, owing to pay-off inhomogeneity there can always be a part of the population that seeks to switch more often owing to dissatisfaction with the pay-off distribution.

In our microscopic update process, we randomly choose an individual, *x*, from the population and assume that the pay-off of the focal individual is *π*_{x}. To model stochastic self-learning of aspiration-driven switching, we can use the following probability function:

$$g\big(\omega(\alpha - \pi_x)\big) = \frac{1}{1 + \mathrm{e}^{-\omega(\alpha - \pi_x)}}, \tag{2.4}$$

which is similar to the Fermi rule [22,42] but replaces a randomly drawn opponent's pay-off with one's own aspiration. The wider the positive gap between aspiration and pay-off, the higher the switching probability. Conversely, if pay-offs exceed the level of aspiration, individuals become less active with increasing pay-offs. The aspiration level, *α*, provides the benchmark used to evaluate how ‘greedy’ an individual is. Higher aspiration levels mean that individuals aspire to higher pay-offs. If pay-offs exactly meet the aspiration, individuals remain random in their updates. If pay-offs are below aspiration, switching occurs with probability larger than one-half; if they are above aspiration, switching occurs with probability lower than one-half. The selection intensity governs how strict individuals are in this respect. For *ω* = 0, strategy switching is entirely random (neutral). Low values of *ω* lead to switching only slightly different from random, but following the influence of *α*. For increasing *ω*, the difference between pay-offs and the aspiration becomes more important. In the case of *ω → ∞*, individuals are strict in the sense that they either switch strategies with probability one if they are not satisfied or stay with their current strategy if their aspiration level is met or exceeded.

The spread of successful strategies is modelled by a birth–death process in discrete time. In one time step, three events are possible: the abundance of *A*, *i*, can increase by one with probability $T_i^+$, decrease by one with probability $T_i^-$, or stay the same with probability $T_i^0$. All other transitions occur with probability zero. The transition probabilities are given by

$$T_i^+ = \frac{N-i}{N}\,\frac{1}{1+\mathrm{e}^{-\omega(\alpha-\pi_B(i))}}, \tag{2.5}$$

$$T_i^- = \frac{i}{N}\,\frac{1}{1+\mathrm{e}^{-\omega(\alpha-\pi_A(i))}} \tag{2.6}$$

and

$$T_i^0 = 1 - T_i^+ - T_i^-. \tag{2.7}$$
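As a concrete illustration of the update rule just described, the following Python sketch implements the Fermi-like switching probability and the three transition probabilities. The helper names are our own; the expected pay-offs are passed in as plain numbers.

```python
from math import exp

def switch_prob(payoff, alpha, omega):
    """Self-learning rule: probability that a player with the given
    pay-off switches strategy, for aspiration level alpha and
    selection intensity omega.  Equals 1/2 when payoff == alpha."""
    return 1.0 / (1.0 + exp(-omega * (alpha - payoff)))

def transitions(i, N, pi_A, pi_B, alpha, omega):
    """Transition probabilities of the birth-death chain.

    i: current number of A players; pi_A, pi_B: expected pay-offs of
    an A and a B player at state i (pi_A is irrelevant when i == 0,
    pi_B when i == N).  Returns (T_plus, T_minus, T_stay).
    """
    T_plus = (N - i) / N * switch_prob(pi_B, alpha, omega)   # a B switches to A
    T_minus = i / N * switch_prob(pi_A, alpha, omega)        # an A switches to B
    return T_plus, T_minus, 1.0 - T_plus - T_minus
```

For `omega = 0` the switching probability is exactly one-half for every player, which recovers the neutral, purely random updating described in the text.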

In each time step, a randomly chosen individual evaluates its success in the evolutionary game, given by equations (2.2) or (2.3), compares it to the level of aspiration and then changes strategy with probability lower than 1/2 if its pay-off exceeds the aspiration. Otherwise, it switches with probability greater than 1/2, except when the aspiration level is exactly met, in which case it switches randomly (note that this is very unlikely to ever be the case).

Compared to imitation (pairwise comparison) dynamics, our self-learning process, which is essentially an Ehrenfest-like Markov chain, has some different characteristics. Without the introduction of mutation or random strategy exploration, there exists a stationary distribution for the aspiration-driven dynamics. Even in a homogeneous population, there is a positive probability that an individual can switch to another strategy owing to the dissatisfaction resulting from pay-off–aspiration difference. This facilitates the escape from the states that are absorbing in the pairwise comparison process and other Moran-like evolutionary dynamics. Hence, there exists a non-trivial stationary distribution of the Markov chain satisfying detailed balance. Specifically, for the case of *ω* = 0 (neutral selection), the dynamics defined by equations (2.5)–(2.7) are characterized by linear rates, while these rates are quadratic for the neutral imitation dynamics and Moran process.

In the following analysis and discussion, we are interested in the limit of weak selection, and its ability to aptly predict the success of cooperation in commonly used evolutionary *d*-player games. The limit of weak selection, which has a long-standing history in population genetics and molecular evolution [11], also plays a role in social learning and cultural evolution. Recent experimental results suggest that the intensity with which human subjects adjust their strategies might be low [18]. Although it has been unclear to what degree and in what way human strategy updating deviates from random [43,44], the weak selection limit is of importance to quantitatively characterize the evolutionary dynamics. In the limiting case of weak selection, we are able to analytically classify strategies with respect to the neutral benchmark, *ω →* 0 [19,21,35,45,46]. We note that a strategy is favoured by selection if its average equilibrium frequency under weak selection is greater than one half. In order to come to such a quantitative observation, we need to calculate the stationary distribution over the abundance of strategy *A*.

### 2.3. Stationary distribution

The Markov chain given by equations (2.5)–(2.7) is a one-dimensional birth–death process with reflecting boundaries. It satisfies the detailed balance condition $\psi_j\, T_j^{+} = \psi_{j+1}\, T_{j+1}^{-}$, where $\psi_j$ is the stationary distribution over the abundance of *A* in equilibrium [47,48]. Considering the normalization $\sum_{j=0}^{N}\psi_j = 1$, we find the exact solution by recursion, given by

$$\psi_j = \frac{\prod_{k=1}^{j}\big(T_{k-1}^{+}/T_{k}^{-}\big)}{\sum_{l=0}^{N}\prod_{k=1}^{l}\big(T_{k-1}^{+}/T_{k}^{-}\big)}, \tag{2.8}$$

where the product $\prod_{k=1}^{j}(T_{k-1}^{+}/T_{k}^{-})$ accounts for the successive transitions from 0 to *j* (an empty product equals one). The analytical solution, equation (2.8), allows us to find the exact value of the average abundance of strategy *A*,

$$\langle x_A \rangle = \sum_{j=0}^{N} \frac{j}{N}\,\psi_j, \tag{2.9}$$

for any strength of selection.
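The detailed-balance recursion and the resulting average abundance can be evaluated exactly for small populations. Below is an illustrative, self-contained Python sketch (function and variable names are our own choices, not from the paper).

```python
from math import comb, exp

def average_abundance(N, d, a, b, alpha, omega):
    """Exact stationary distribution of the self-learning chain via the
    detailed-balance recursion, and the average abundance of A.

    The chain has reflecting boundaries, so psi_j is proportional to
    prod_{k=1..j} T^+_{k-1} / T^-_k; normalising and averaging j/N
    gives the equilibrium abundance of strategy A.
    """
    denom = comb(N - 1, d - 1)

    def pi_A(i):  # expected pay-off of an A player at state i (i >= 1)
        return sum(comb(i - 1, k) * comb(N - i, d - 1 - k) * a[k]
                   for k in range(d)) / denom

    def pi_B(i):  # expected pay-off of a B player at state i (i <= N - 1)
        return sum(comb(i, k) * comb(N - i - 1, d - 1 - k) * b[k]
                   for k in range(d)) / denom

    def g(payoff):  # Fermi-like switching probability
        return 1.0 / (1.0 + exp(-omega * (alpha - payoff)))

    weights = [1.0]                                # psi_j relative to psi_0
    for k in range(1, N + 1):
        T_plus = (N - k + 1) / N * g(pi_B(k - 1))  # T^+ at state k - 1
        T_minus = k / N * g(pi_A(k))               # T^- at state k
        weights.append(weights[-1] * T_plus / T_minus)
    Z = sum(weights)
    return sum(j / N * w / Z for j, w in enumerate(weights))
```

At neutrality (`omega = 0`) the recursion reproduces the binomial stationary distribution of an Ehrenfest-type chain, so the average abundance is exactly one-half for any game.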

## 3. Results and discussion

It has been shown that imitation processes are similar to each other under weak selection [21,27,38]. Thus, in order to compare the essential differences between imitation and aspiration processes, we consider this selection limit. To better understand the effects of selection intensity, aspiration level and pay-off matrix on the average abundance of strategy *A*, we further analyse which strategy is more abundant based on equation (2.8). For a fixed population size, under weak selection, i.e. *ω →* 0, the stationary distribution *ψ*_{j}(*ω*) can be expressed approximately as

$$\psi_j(\omega) \approx \psi_j(0) + \omega \left.\frac{\mathrm{d}\psi_j(\omega)}{\mathrm{d}\omega}\right|_{\omega=0}, \tag{3.1}$$

where the neutral stationary distribution is simply given by $\psi_j(0) = \binom{N}{j}\,2^{-N}$ and the first-order term of this Taylor expansion amounts to

$$\left.\frac{\mathrm{d}\psi_j(\omega)}{\mathrm{d}\omega}\right|_{\omega=0} = \frac{\psi_j(0)}{2}\left(Q_j - \sum_{l=0}^{N}\psi_l(0)\,Q_l\right), \qquad Q_j = \sum_{k=1}^{j}\big[\pi_A(k) - \pi_B(k-1)\big]. \tag{3.2}$$

Interestingly, in the limiting case of weak selection, the first-order approximation of the stationary distribution of *A* does not depend on the aspiration level, as *α* cancels in the pay-off comparisons entering *Q*_{j}. For higher order terms of selection intensity, however, *ψ*_{j}(*ω*) does depend on the aspiration level.

In the following, we discuss the condition under which a strategy is favoured and compare the predictions for stationary strategy abundance under self-learning and imitation dynamics. Thereafter, we consider three prominent examples of games with multiple players through analytical, numerical and simulation methods, the results of which are detailed in figures 2–4 and appendix B. All three examples are social dilemmas in the sense that the Nash equilibrium of the one-shot game is not the social optimum. First, the widely studied public goods game represents the class of games where there is only one pure Nash equilibrium [49]. Next, the public goods game with a threshold, a simplified version of the collective-risk dilemma [50–52], represents the class of coordination games with multiple pure Nash equilibria, depending on the threshold. Last, we consider the *d*-player volunteer's dilemma, or snowdrift game, which has a mixed Nash equilibrium [53,54].

### 3.1. Average abundance of strategy *A*

Based on approximation (3.1), for any symmetric multi-player game with two strategies of normal form (2.1), we can now calculate a weak selection condition such that in equilibrium *A* is more abundant than *B*. As $\langle x_A \rangle = 1/2$ holds for neutrality, it is sufficient to consider positivity of the sum of $(j/N)\,\mathrm{d}\psi_j/\mathrm{d}\omega|_{\omega=0}$ over all *j* = 0, *…* , *N*. Under weak selection, strategy *A* is favoured by selection, i.e. $\langle x_A \rangle > 1/2$, if

$$\sum_{k=0}^{d-1}\binom{d-1}{k}\,(a_k - b_k) > 0, \tag{3.3}$$

which holds for any *d*-player game with two strategies in a population with more than two individuals. For a detailed derivation of our main analytical result, see appendix A. Note that for a two-player game, *d* = 2, the above condition simplifies to *a*_{1} + *a*_{0} > *b*_{1} + *b*_{0}, which is similar to the concept of risk-dominance translated to finite populations [35].

The left-hand side expression of inequality (3.3) can also be compared to a similar condition under the class of pairwise comparison processes [19,22], where two randomly selected individuals compare their pay-offs and switch with a probability based on the observed pay-off difference. Typically, weak selection results for pairwise comparison processes lead to the result that strategy *A* is favoured by selection if [35,55,56]

$$\sum_{k=0}^{d-1}(a_k - b_k) > 0, \tag{3.4}$$

which applies both to evaluate whether fixation of *A* is more likely than fixation of *B*, and whether the average abundance of *A* is greater than one-half under weak mutation and weak selection, which can be shown using properties of the embedded Markov chain [57]. The sums on the left-hand sides of (3.3) and (3.4) can thus be compared with each other in order to reveal the nature of our self-learning process driven by a global aspiration level.
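A minimal numerical check of the two weak-selection sums discussed above can be written as follows; the function name is ours, and the pay-off entries are supplied as lists indexed by the number of *A* co-players.

```python
from math import comb

def weak_selection_sums(d, a, b):
    """Left-hand sides of the two weak-selection criteria:
    aspiration dynamics weight the pay-off differences binomially,
    imitation dynamics weight them uniformly.  A is favoured when
    the corresponding sum is positive."""
    aspiration = sum(comb(d - 1, k) * (a[k] - b[k]) for k in range(d))
    imitation = sum(a[k] - b[k] for k in range(d))
    return aspiration, imitation
```

For *d* = 2 the binomial weights are all one, so the two sums coincide; for *d* > 2 they can have opposite signs, which is exactly the possibility exploited later for the snowdrift game.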

Our main result, equation (3.3), holds for a variety of self-learning dynamics, not only for the probability function given in equation (2.4). Consider a general self-learning function *g*[*ω*(*α* − *π*_{x})] with *g*(0) *≠* 0, where *g*(*x*) is strictly increasing in *x*. Denoting *u* = *ω*(*α* − *π*_{x}), we have $g(u) \approx g(0) + g'(0)\,u$ for *ω →* 0, and equation (3.2) can be rewritten in the more general form

$$\left.\frac{\mathrm{d}\psi_j(\omega)}{\mathrm{d}\omega}\right|_{\omega=0} = \frac{g'(0)}{g(0)}\,\psi_j(0)\left(Q_j - \sum_{l=0}^{N}\psi_l(0)\,Q_l\right), \qquad Q_j = \sum_{k=1}^{j}\big[\pi_A(k) - \pi_B(k-1)\big]. \tag{3.5}$$

As $g'(0)/g(0)$ is a positive constant, equation (3.3) is still valid for any such probability function *g*(*x*); see appendix A.

### 3.2. Linear public goods game

Public goods games emerge when groups of players engage in the sustenance of common goods. Cooperators *A* pay an individual cost in the form of a contribution *c* that is pooled into the common pot. Defectors *B* do not contribute. The pot is then multiplied by a characteristic multiplication factor *r* and shared equally among all individuals in the group, irrespective of contribution. If the multiplication factor is smaller than the size of the group *d*, each cooperator recovers only a fraction of the initial investment. Switching to defection would always be beneficial in a pairwise comparison of the two strategies. The pay-off matrix thus reads

$$a_k = \frac{r\,c\,(k+1)}{d} - c, \qquad b_k = \frac{r\,c\,k}{d}, \tag{3.6}$$

where 1 < *r* < *d* is typically assumed. As $a_k - b_k = c\,(r/d - 1)$ is a negative constant for any number of cooperators in the group, we find that

$$\sum_{k=0}^{d-1}\binom{d-1}{k}(a_k - b_k) = 2^{d-1}\,c\left(\frac{r}{d} - 1\right) \tag{3.7}$$

is always negative. Cooperation cannot be the more abundant strategy in the well-mixed population (figure 2). However, if the self-learning dynamics are driven by a sufficiently high aspiration level, then individuals are constantly dissatisfied and switch strategy frequently, even as defectors, such that cooperation can break even, i.e. the average abundance of cooperators stays close to one-half, for all values of *ω*. On the other hand, if the aspiration level is low, then cooperators switch more often than defectors, such that the average abundance of *A* assumes a value closer to the evolutionarily stable state of full defection, which depends on *ω*. In the extreme case of very low *α* and strong selection, defectors fully dominate, and thus the stationary distribution concentrates on the all-defection state.
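The collapse of the binomially weighted sum for the linear public goods game can be verified directly. This is an illustrative sketch with our own function name, assuming the pay-off structure described above.

```python
from math import comb

def pgg_aspiration_sum(d, r, c):
    """Aspiration weak-selection sum for the linear public goods game.

    With a_k = r*c*(k+1)/d - c and b_k = r*c*k/d, every difference
    a_k - b_k equals c*(r/d - 1), so the binomially weighted sum
    collapses to 2**(d-1) * c * (r/d - 1), negative whenever r < d.
    """
    a = [r * c * (k + 1) / d - c for k in range(d)]
    b = [r * c * k / d for k in range(d)]
    return sum(comb(d - 1, k) * (a[k] - b[k]) for k in range(d))
```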

### 3.3. Threshold public goods game

Here, we consider a public goods game with a threshold, in the sense that the good becomes strictly unavailable when the number of cooperators in a group is below a critical threshold, *m*. This threshold introduces a new strategic variable. Here, *c* is an initial endowment given to each player, which is invested in full by cooperators. Whatever the cooperators manage to invest is multiplied by *r* and redistributed among all players in the group irrespective of strategy, provided the threshold investment *mc* is met. Defectors do not make any investment and thus have an additional pay-off of *c*, as long as the threshold is met. Once the number of cooperators is below *m*, all pay-offs are zero, which corresponds to the highest risk possible (loss of endowment and investment with certainty) in what is called the collective-risk dilemma [50,52]. The pay-off matrix for the two strategies, cooperation *A* and defection *B*, reads

$$a_k = \begin{cases}\dfrac{r\,c\,(k+1)}{d} & \text{if } k \geq m-1,\\[4pt] 0 & \text{otherwise},\end{cases}\qquad b_k = \begin{cases}\dfrac{r\,c\,k}{d} + c & \text{if } k \geq m,\\[4pt] 0 & \text{otherwise}.\end{cases} \tag{3.8}$$

We can examine when the self-learning process favours cooperation. We can also ask whether cooperation performs better under self-learning dynamics than under pairwise comparison processes. For self-learning dynamics, condition (3.3) yields

$$\binom{d-1}{m-1}\frac{r\,c\,m}{d} + \sum_{k=m}^{d-1}\binom{d-1}{k}\,c\left(\frac{r}{d}-1\right) > 0,$$

while the equivalent statement for pairwise comparison processes based on the same pay-off matrix would be

$$\frac{r\,c\,m}{d} + \sum_{k=m}^{d-1} c\left(\frac{r}{d}-1\right) > 0.$$

Thus, writing $\Lambda = \sum_{k=m}^{d-1}\binom{d-1}{k}$, the criterion of self-learning dynamics can be written as

$$r > \frac{d\,\Lambda}{m\binom{d-1}{m-1}+\Lambda},$$

whereas the pairwise comparison criterion simply leads to *r* > *d* − *m*. Comparing the two critical values, we find

$$(d-m) - \frac{d\,\Lambda}{m\binom{d-1}{m-1}+\Lambda} = \frac{m}{m\binom{d-1}{m-1}+\Lambda}\left[(d-m)\binom{d-1}{m-1} - \Lambda\right]. \tag{3.9}$$

As the first factor on the right-hand side of equation (3.9) is always positive, the factor

$$(d-m)\binom{d-1}{m-1} - \sum_{k=m}^{d-1}\binom{d-1}{k} \tag{3.10}$$

determines the relationship between self-learning dynamics and pairwise comparison processes: for sufficiently large threshold *m*, expression (3.10) is positive. In conclusion, the aspiration-level-driven self-learning dynamics can afford to be less strict than the pairwise comparison process: they require less reward for cooperators' contribution to the common pool (lower values of *r*) in order to promote the cooperative strategy. The abundance of the cooperative strategy depends on the threshold: higher thresholds support cooperation, even for lower multiplication factors *r* (figure 3). For fixed *r*, our self-learning dynamics are more likely to promote cooperation in a threshold public goods game if the threshold for the number of cooperators needed to support the public good is large enough, i.e. not too different from the total size of the group. For small thresholds, and thus higher temptation to defect in groups with fewer cooperators, we approach the regular public goods game, and the conclusion may be reversed: for such small *m*, imitation-driven (pairwise comparison) dynamics are more likely to lead to cooperation than aspiration dynamics.
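The two critical multiplication factors for the threshold game can be tabulated with a short sketch. This assumes the pay-off structure described above; the function name and return convention are our own.

```python
from math import comb

def critical_r(d, m):
    """Critical multiplication factors for the threshold public goods
    game: cooperation is favoured when r exceeds the returned value.

    Aspiration threshold: r > d*L / (m*C + L), with C = C(d-1, m-1)
    and L = sum_{k=m}^{d-1} C(d-1, k); imitation threshold: r > d - m.
    """
    C = comb(d - 1, m - 1)
    L = sum(comb(d - 1, k) for k in range(m, d))
    return d * L / (m * C + L), d - m
```

For example, with group size `d = 5` and a high threshold `m = 4`, aspiration dynamics already favour cooperation for a much smaller `r` than imitation dynamics do, while a low threshold `m = 1` reverses the ordering.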

### 3.4. *d*-player snowdrift game

Evolutionary games between two strategies can have mixed evolutionarily stable states [6,36]. Strategy *A* can invade *B* and *B* can invade *A*; a stable coexistence of the two strategies typically evolves. In the replicator dynamics of the snowdrift game, cooperators can be invaded by defectors, as the temptation to defect is larger than the reward of mutual cooperation [54,58]. In contrast to the public goods game, cooperation with a group of defectors now yields a pay-off greater than exclusive defection. The act of cooperation provides a benefit *b* to all members of the group, and the cost of cooperation *c* is shared equally among the cooperators [59]. Hence, the pay-off matrix reads

$$a_k = b - \frac{c}{k+1}, \qquad b_k = \begin{cases} b & \text{if } k \geq 1,\\ 0 & \text{if } k = 0. \end{cases} \tag{3.11}$$

If cooperators can maintain a minimal positive pay-off from the cooperative act, then cooperation and defection can coexist. The snowdrift game is a social dilemma, as selection does not favour the social optimum of exclusive cooperation. The level of coexistence depends on the amount of cost that a particular cooperator has to contribute in a certain group. Evaluating weak selection condition (3.3) in the case of the *d*-player snowdrift game leads to the condition

$$b - c > c\sum_{k=1}^{d-1}\binom{d-1}{k}\frac{1}{k+1}, \quad \text{or equivalently} \quad \frac{b}{c} > \frac{2^d - 1}{d}, \tag{3.12}$$

in order to observe $\langle x_A \rangle > 1/2$ in aspiration dynamics under weak selection. For imitation processes, on the other hand, we find $b - c > c\sum_{k=1}^{d-1} 1/(k+1)$, i.e. $b/c > \sum_{k=1}^{d} 1/k$. Note that, except for *a*_{0} − *b*_{0} = *b* − *c* > 0, we have $a_k - b_k = -c/(k+1) < 0$ for any other *k*. Because of this, the different nature of the two conditions, namely the positive binomial weights placed on every term with *d* > *k* > 0 in (3.3), reveals that self-learning dynamics narrow down the parameter range for which cooperation can be favoured by selection. In the snowdrift game, self-learning dynamics are less likely to favour cooperation than pairwise comparison processes. Larger group size *d* hinders cooperation: the larger the group, the higher the benefit of cooperation, *b*, has to be in order to support cooperation (figure 4).
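The two critical benefit-to-cost ratios for the snowdrift game can be computed directly. This is a sketch under the pay-off structure described above (benefit shared by all, cost split among cooperators); the closed forms follow from evaluating the two weak-selection sums, and the function name is ours.

```python
def critical_benefit_to_cost(d):
    """Critical b/c ratios in the d-player snowdrift game above which
    cooperation is favoured under weak selection.

    Aspiration: b/c > (2**d - 1) / d (binomially weighted sum);
    imitation:  b/c > H_d, the d-th harmonic number.
    """
    aspiration = (2 ** d - 1) / d
    imitation = sum(1.0 / k for k in range(1, d + 1))
    return aspiration, imitation
```

For the pairwise game, `d = 2`, both thresholds equal 3/2, while for any larger group the aspiration threshold exceeds the imitation one, reflecting that self-learning is the stricter dynamics here.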

## 4. Summary and conclusion

Previous studies of self-learning mechanisms have typically investigated dynamics on graphs via simulations, which often use stochastic aspiration-driven update rules [23,25,60–62]. Although results based on mean field approximations are insightful [24,25], further analytical insights have so far been lacking.

Thus, it is constructive to introduce and discuss a reference case of stochastic aspiration-driven self-learning dynamics in well-mixed populations, which is what we have done here. Our weak selection analysis is based on a simplified scenario that implements a non-adaptive self-learning process with a global aspiration level.

Probabilistic evolutionary game dynamics driven by aspiration are inherently innovative and do not have absorbing boundaries even in the absence of mutation or random strategy exploration. We study the equilibrium strategy distribution in a finite population and make a weak selection approximation for the average strategy abundance for any multi-player game with two strategies, which turns out to be independent of the level of aspiration. This is different from the aspiration dynamics in infinitely large populations, where the evolutionary outcome crucially depends on the aspiration level [37]. Thus, it highlights the intrinsic differences arising from finite stochastic dynamics of multi-player games between two strategies. Based on this we derive a condition for one strategy to be favoured over the other. This condition then allows a comparison of a strategy's performance to other prominent game dynamics based on pairwise comparison between two strategies.

Most of the complex strategic interactions in natural populations, ranging from competition and cooperation in microbial communities to social dilemmas in humans, take place in groups rather than pairs. Thus multi-player games have attracted increasing interest in different areas [36,63–68]. The most straightforward form of multi-player games makes use of a generalization of the pay-off matrix concept [63]. Such multi-player games are more complex and show intrinsic differences from 2 × 2 games. Hence, as examples we have studied some of the most widely studied multi-player games: the linear public goods game [64]; a simplified version of a threshold public goods game, which requires a group of players to coordinate contributions to a public good [17,50–52,69,70]; and a multi-player version of the snowdrift game [66], where coexistence is possible. Our analytical finding allows a characterization of evolutionary success under the stochastic aspiration-driven update rules introduced here, as well as a comparison to the well-known results of pairwise comparison processes. While in coordination games, such as the threshold public goods game, the self-learning dynamics support cooperation on a larger set in parameter space, the opposite is true for coexistence games, where the condition for cooperation to be more abundant becomes stricter.

It will be interesting to derive analytical results that either hold for any intensity of selection or at least for the limiting case of strong selection [13,71] in finite populations. On the other hand, the update rule presented here does not seem to allow a proper continuous limit in the transition to infinitely large populations [20], which might give rise to interesting rescaling requirements of the demographic noise in the continuous approximation [72] in self-learning dynamics.

Our simple model illustrates that aspiration-driven self-learning dynamics in well-mixed populations alone may be sufficient to alter the expected strategy abundance. In previous studies of such processes in structured populations [25,60–62], this effect might have been overshadowed by the properties of the network dynamics studied *in silico*. Our analytical results hold for weak selection, which might be a useful framework in the study of human interactions [18], where it is still unclear with which role models individuals compare their pay-offs and with what strength players update their strategies [18,30,44]. Although weak selection approximations are widely applied in the study of frequency-dependent selection [27,29,35,45], it is not clear whether the successful spread of behavioural traits operates in this parameter regime. Thus, by numerical evaluation and simulations, we show that our weak selection predictions also hold for strong selection. Models similar to the one presented here may be used in attempts to predict human strategic dynamics [73,74]. Such predictions, likely to be falsified in their simplicity [75], are essential to our fundamental understanding of complex economic and social behaviour and may guide statistical insights into the effective functioning of the human mind.

## Funding statement

This work is supported by the National Natural Science Foundation of China (NSFC) under grant no. 61020106005 and no. 61375120. B.W. gratefully acknowledges generous sponsorship from the Max Planck Society. P.M.A. gratefully acknowledges support from the Deutsche Akademie der Naturforscher Leopoldina, grant no. LPDS 2012-12.

## Acknowledgement

We thank four anonymous referees for their constructive and insightful comments.

## Appendix A

In this appendix, we detail the derivation of the criterion $\langle x_A \rangle > 1/2$ for a general *d*-player game. We consider the first-order approximation of the stationary distribution, *ψ*_{j}(*ω*), and obtain the criterion (shown in §3) as follows:

$$\langle x_A \rangle = \sum_{j=0}^{N}\frac{j}{N}\,\psi_j(\omega) > \frac{1}{2}. \tag{A 1}$$

Inserting equation (2.8), we have

A 2

Introducing suitable shorthand notation, the above equation can be simplified as

(A 3)

where

A 4

and

A 5

We have
(A 6)

and

(A 7)

As *ω →* 0,

(A 8)
(A 9)
(A 10)
(A 11)
(A 12)

and

(A 13)

Then, inserting equations (A 10)–(A 13) into equation (A 4),

A 14

Similarly, we can get

A 15

In addition,

(A 16)
(A 17)

Therefore, inserting equations (A 14)–(A 17) into equation (A 3), we obtain

(A 18)

Combined with equation (A 1), the criterion is rewritten as

$$\sum_{i=1}^{N} i\binom{N}{i}\big[\pi_A(i) - \pi_B(i-1)\big] > 0, \tag{A 19}$$

where *π*_{A}(*i*) and *π*_{B}(*i* − 1) refer to equations (2.2) and (2.3). Hence, inserting these expressions yields (A 20), and therefore the criterion can be written as (A 21).

We can prove that the above inequality leads to a general criterion as follows:

$$\sum_{k=0}^{d-1}\binom{d-1}{k}\,(a_k - b_k) > 0. \tag{A 22}$$

This is the result we want to show. For this, we only need to demonstrate (A 23), which is equivalent to (A 24). As such an equation should hold for any choice of the differences (*a*_{k} − *b*_{k}), we obtain (A 25). Using a binomial identity, we can simplify the equivalent condition to (A 26), which can be proved through mathematical induction.

Thus, we obtain the criterion $\langle x_A \rangle > 1/2$ for general multi-player games, equation (A 22), which we rewrite as (A 27).

## Appendix B

In tables 1–3, we demonstrate how selection intensity *ω* and the population size *N* influence the evolutionary results (the average fraction of cooperators) through simulation.

For the examples discussed, namely the linear public goods game, the threshold collective-risk dilemma and a multi-player snowdrift game, we find that our weak selection result remains a reliable predictor over a wide range of parameters (higher values of *ω*, small and large populations).
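The tables themselves come from the paper; a simulation of the kind used there can be sketched as follows. This is an illustrative re-implementation under our assumptions (seeded standard-library RNG, time average of the fraction of *A* players), not the authors' code.

```python
import random
from math import comb, exp

def simulate_abundance(N, d, a, b, alpha, omega, steps, seed=1):
    """Monte Carlo realisation of the aspiration-driven process:
    each step, a randomly chosen player compares its expected pay-off
    with the aspiration level and switches with the Fermi-like
    probability.  Returns the time-averaged fraction of A players."""
    rng = random.Random(seed)
    denom = comb(N - 1, d - 1)

    def payoff(i, focal_is_A):
        if focal_is_A:  # expected pay-off of an A player, state i >= 1
            return sum(comb(i - 1, k) * comb(N - i, d - 1 - k) * a[k]
                       for k in range(d)) / denom
        return sum(comb(i, k) * comb(N - i - 1, d - 1 - k) * b[k]
                   for k in range(d)) / denom

    i, total = N // 2, 0
    for _ in range(steps):
        focal_is_A = rng.random() < i / N   # pick a random individual
        pi = payoff(i, focal_is_A)
        if rng.random() < 1.0 / (1.0 + exp(-omega * (alpha - pi))):
            i += -1 if focal_is_A else 1    # dissatisfied: switch
        total += i
    return total / (steps * N)
```

At neutrality the time average should settle near one-half for any game, which provides a quick consistency check against the exact stationary distribution.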

- Received January 22, 2014.
- Accepted February 13, 2014.

- © 2014 The Author(s) Published by the Royal Society. All rights reserved.