## Abstract

The last decade has seen much work on quantitative understanding of human behaviour, with online social interaction offering the possibility of more precise measurement of behavioural phenomena than was previously possible. A parsimonious model is proposed that incorporates several observed features of behavioural contagion not seen in existing epidemic model schemes, leading to metastable behavioural dynamics.

## 1. Introduction

There has been much recent interest in modelling the spread of behaviours in society, particularly health behaviours with respect to infectious disease [1]. At the same time, recent empirical work highlights that the complex nature of social contagion makes it very different from ‘simple’ microparasite contagion [2]. Modelling techniques have so far typically involved either explicit stochastic simulation [3–6], or else application of mathematical models originally developed for other applications, such as the Susceptible-Infectious-Susceptible (SIS) epidemic model considered by Kiss *et al.* [7] and Funk *et al.* [8]. An alternative is to use a discrete-time formalism [9,10], next-generation arguments [5] or methods from statistical physics [11,12] to obtain results about asymptotic behaviour of socially motivated models, although typically calculating transient features of system dynamics requires Monte Carlo simulation.

While existing dynamical models have done much to clarify thinking about behavioural spread, and have motivated important empirical work, they often lack mathematical transparency or are not specifically tailored to social contagion. In this paper, a mathematical model with a small number of easily interpretable parameters is proposed, which reproduces several key features of empirical work and empirically motivated simulation.

## 2. Methods

### 2.1. General model

The general model framework is described as follows. Consider a large, closed population, with a proportion *B*(*t*) of that population engaging in a behaviour at a given time *t*. At a given time, each individual is canvassing the opinions of *n* other individuals in the population in such a way that the proportion of individuals in the population canvassing *m* individuals engaging in the behaviour in question is *D*_{m} (which depends on *B*(*t*) in addition to other static parameters). We assume that individuals with *m* canvassed neighbours who are engaging in the behaviour commence at a rate *τ*_{m} or cease at a rate *γ*_{m} as appropriate for their current behaviour state. The dynamical system for behaviour prevalence in the population at time *t* is then
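$$
\frac{\mathrm{d}B}{\mathrm{d}t} = (1 - B)\sum_{m=0}^{n} \tau_m D_m \;-\; B\sum_{m=0}^{n} \gamma_m D_m. \tag{2.1}
$$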

To specify an integrable system, it is then necessary to define a form for the dynamical parameters *τ*_{m}, *γ*_{m} and a process for the generation of the proportion *D*_{m}.

### 2.2. Dynamical parameters

We now choose a form for the vectors (*τ*_{m}), (*γ*_{m}). It is worth noting that the general form above can be specialized to incorporate several other dynamical forms. For example, if *γ*_{m} = *γ* and *τ*_{m} = *mτ*, we recover the SIS dynamics of Funk *et al.* [1] and Kiss *et al.* [7]. As another example, the approach of Salathé & Bonhoeffer [6] takes *τ*_{m} = *m/n*, *γ*_{m} = (*n* − *m*)/*n*. Importantly, both of these schemes depend only on the mean of *D*_{m} and so are unaffected by the different distributions proposed later. Dodds & Watts [9,10] consider generalized ‘dose–response’ behaviour, which in the simplest case is a discrete-time version of the simplest continuous-time model considered here. The more sophisticated models analysed by Dodds & Watts make use of the discrete-time framework to consider agents with memory while preserving independent sampling of the population, whereas here dynamics remain Markovian but the population samples are potentially dependent.

For opinion dynamics, motivated by a comprehensive review of the literature and compelling empirical evidence [2,4], we expect an S-shaped curve for the response of behavioural transmission probability to the number of encounters with a behaviour. For simplicity, the limiting case of such a curve (a step function at an integer threshold *a*) is taken so that

$$
\tau_m = \begin{cases} \tau & \text{if } m \geq a, \\ 0 & \text{otherwise.} \end{cases} \tag{2.2}
$$

This complex form for transmission has not yet been included in other dynamical systems models of behaviour spread, and is the main benefit of the modelling approach considered here. We assume for simplicity that cessation of behaviour happens over time at a rate independent of *m*, *γ*_{m} = *γ*, and for convenience work in units of time in which *γ* = 1. Where *a* is close to *n*/2, there will be similarities between these transmission dynamics and majority vote models (e.g. [11,12]), although behaviour cessation will be qualitatively different.

### 2.3. Canvassing method

To complete our model description, we need a form for the proportion *D*_{m}. The simplest assumption is that there are *n* independent trials with each trial having probability *B*(*t*), meaning that
$$
D_m = \mathrm{Bin}(m \mid n, B(t)), \tag{2.3}
$$

where Bin(·) is a binomial probability mass function as defined in appendix A. This is interpreted as each individual canvassing the opinion of *n* individuals, chosen at random from the whole population. We now consider two different generalizations of the binomial distribution through different models of canvassing.
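As a minimal sketch of how the pieces introduced so far fit together, the following Python evaluates the rate of change of prevalence under independent canvassing. It assumes that the general model takes the form d*B*/d*t* = (1 − *B*) ∑ *τ*_{m} *D*_{m} − *B* ∑ *γ*_{m} *D*_{m}, that transmission is a step function (*τ*_{m} = *τ* for *m* ≥ *a*, zero otherwise) and that *γ*_{m} = *γ*. The function names are illustrative and do not come from the paper.

```python
from math import comb


def binomial_pmf(m, n, p):
    """Bin(m | n, p): probability of m successes in n independent trials."""
    return comb(n, m) * p**m * (1.0 - p) ** (n - m)


def threshold_rates(n, a, tau):
    """Assumed step-function commencement rates: tau if m >= a, else 0."""
    return [tau if m >= a else 0.0 for m in range(n + 1)]


def behaviour_rhs(B, n, a, tau, gamma=1.0):
    """dB/dt under binomial canvassing and m-independent cessation (assumed form)."""
    D = [binomial_pmf(m, n, B) for m in range(n + 1)]
    tau_m = threshold_rates(n, a, tau)
    start = sum(t * d for t, d in zip(tau_m, D))  # mean commencement rate
    stop = gamma * sum(D)  # sum(D) is 1; kept to mirror the general form
    return (1.0 - B) * start - B * stop
```

For *n* = 2 and *a* = 2 this reduces to (1 − *B*)*τB*² − *γB*, which gives a quick hand check of the implementation.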

#### 2.3.1. Clustering

To introduce clustering to the trials, we consider the method of Klotz [13] (and the parametrization of Lindqvist [14]) for generation of *D*_{m}. In this construction, the *n* individuals canvassed have states {*X*_{i}}_{i=1, … , n}, which are stochastic variables taking the value 1 for individuals engaging in the behaviour and 0 otherwise. These are chosen sequentially with
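$$
\Pr[X_1 = 1] = B(t), \qquad \Pr[X_i = 1 \mid X_{i-1} = 1] = B(t) + c\,(1 - B(t)), \qquad \Pr[X_i = 1 \mid X_{i-1} = 0] = (1 - c)\,B(t). \tag{2.4}
$$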

This introduces one static parameter, the clustering *c* ∈ [0,1]. The full distribution *D*_{m} for each possible value *m* of ∑_{i} *X*_{i} that follows from this construction is not reproduced here because of its complexity, but can be found in equation (3.1) of Klotz [13].
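Although the closed form is lengthy, the distribution of ∑_{i} *X*_{i} can be computed numerically by stepping along the chain. The sketch below assumes a stationary two-state Markov chain with marginal probability *B* and lag-one correlation *c*, that is Pr[*X*_{i} = 1 | *X*_{i−1} = 1] = *B* + *c*(1 − *B*) and Pr[*X*_{i} = 1 | *X*_{i−1} = 0] = (1 − *c*)*B*. This parametrization is an assumption made for illustration, and `clustered_pmf` is a hypothetical name.

```python
def clustered_pmf(n, B, c):
    """Distribution of sum(X_i) over n >= 1 sequential draws from an assumed
    stationary two-state Markov chain with marginal B and correlation c."""
    p11 = B + c * (1.0 - B)  # Pr[X_i = 1 | X_{i-1} = 1]
    p01 = B * (1.0 - c)      # Pr[X_i = 1 | X_{i-1} = 0]
    # state[s][m]: probability that the latest draw equals s and that
    # m of the draws made so far equal 1
    state = [[0.0] * (n + 1) for _ in range(2)]
    state[0][0] = 1.0 - B
    state[1][1] = B
    for _ in range(n - 1):
        new = [[0.0] * (n + 1) for _ in range(2)]
        for m in range(n + 1):
            # next draw is 0: the count of ones is unchanged
            new[0][m] += state[0][m] * (1.0 - p01) + state[1][m] * (1.0 - p11)
            # next draw is 1: the count of ones increases by one
            if m + 1 <= n:
                new[1][m + 1] += state[0][m] * p01 + state[1][m] * p11
        state = new
    return [state[0][m] + state[1][m] for m in range(n + 1)]
```

Under these assumptions *c* = 0 recovers the binomial distribution, *c* = 1 places all weight on *m* = 0 and *m* = *n*, and the mean remains *nB* for every *c*.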

#### 2.3.2. Homophily

Homophily is the social process of ‘associating with like people’, and could be modelled in the framework presented here by stratifying the population as standard epidemic models represent risk groups [15]. This would increase the dimensionality of the dynamical system, and remove much of its attractive simplicity. An alternative is to model homophily as a partition of the population into self-loving groups. This means that each individual canvasses without replacement from a finite group of size *N*. A homophily parameter *h* can then be defined through
$$
h = \frac{n}{N}, \tag{2.5}
$$

so that as *N* → ∞, individuals canvass the whole population, leading to the minimum homophily value of *h* = 0, and where *N* = *n*, individuals canvass all of their homophily group, leading to the maximum *h* = 1. Where *M* is the largest integer less than *N* *B*(*t*), a well-behaved distribution is then
$$
D_m = \left(M + 1 - N B(t)\right)\mathrm{Hyp}(m \mid N, M, n) + \left(N B(t) - M\right)\mathrm{Hyp}(m \mid N, M + 1, n), \tag{2.6}
$$

where Hyp(*m*|*N*, *M*, *n*) is the hypergeometric distribution, representing the probability of *m* successful trials out of *n*, drawing without replacement from a population of size *N* with *M* individuals in the positive state. Equation (2.6) assumes that homophily groups are as representative as possible of the prevalence of belief in the population. This assumption therefore represents a limiting case of the process that generates finite groups. In practice, these groups are also likely to be heterogeneous with respect to behaviour prevalence. Such heterogeneity is similar to the clustering introduced above, and so once we have determined the impact of clustering, it makes sense to consider homophily at minimal values of clustering to deliver an unambiguous dynamical signature.
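One way to obtain a distribution that varies continuously with *B*(*t*) and keeps the mean at *nB*(*t*) is to interpolate linearly between the hypergeometric distributions for *M* and *M* + 1 positive individuals. The sketch below assumes that interpolated form. It takes *M* as the floor of *NB*, capped at *N* − 1, which gives the same distribution as "largest integer less than *NB*" when *NB* is itself an integer. The names `hypergeom_pmf` and `homophily_pmf` are illustrative.

```python
from math import comb, floor


def hypergeom_pmf(m, N, M, n):
    """Hyp(m | N, M, n): m positives in n draws without replacement
    from a group of size N containing M positives."""
    if m < 0 or m > n or m > M or n - m > N - M:
        return 0.0
    return comb(M, m) * comb(N - M, n - m) / comb(N, n)


def homophily_pmf(n, N, B):
    """Assumed interpolation between Hyp(. | N, M, n) and Hyp(. | N, M + 1, n)."""
    x = N * B
    M = min(floor(x), N - 1)  # cap so that M + 1 never exceeds N
    w = x - M                 # weight on the M + 1 distribution, in [0, 1]
    return [
        (1.0 - w) * hypergeom_pmf(m, N, M, n) + w * hypergeom_pmf(m, N, M + 1, n)
        for m in range(n + 1)
    ]
```

With *N* = *n* every individual canvasses the whole group, so at integer *NB* all probability sits on a single value of *m*.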

## 3. Results and discussion

Having defined an Ansatz for a model of behavioural contagion, equation (2.1) becomes a closed system with one dynamical variable *B*(*t*), a real transmission parameter *τ* and an integer threshold for adoption of behaviour, *a*. We also defined two methods for canvassing of opinion that introduce a neighbourhood size *n*, and either clustering *c* or homophily *h*. Having an ODE-based dynamical system as a model means that critical behaviour, in particular, the ability of a behaviour to become established in a sizeable proportion of the population, can be evaluated exactly (meaning at machine precision) and numerical integration is not computationally intensive. At the same time, this model includes the feature of complex contagion as defined by equation (2.2), meaning that it can capture behaviour not present in, for example, the SIS model.

While general analytical results for this model are not obvious, if we consider the case where *n* = 2, *a* = 2, then there are three fixed points of the system with complex contagion:
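$$
B_0^{*} = 0, \qquad B_1^{*} = \frac{1}{2}\left(1 - \sqrt{1 - \frac{4\gamma}{\tau}}\right), \qquad B_2^{*} = \frac{1}{2}\left(1 + \sqrt{1 - \frac{4\gamma}{\tau}}\right). \tag{3.1}
$$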

When *τ* < 4*γ*, only the behaviour-free steady state *B*_{0}^{*} exists, and is stable. When *τ* > 4*γ*, *B*_{0}^{*} and *B*_{2}^{*} are stable steady states, with *B*_{1}^{*} being an unstable fixed point above which the system evolves towards *B*_{2}^{*} and below which the system evolves towards *B*_{0}^{*}. This is in contrast to SIS dynamics with *n* = 2, where there are only two fixed points: *B*^{*} = 0, which is stable when 2*τ* < *γ*, and *B*^{*} = 1 − *γ*/(2*τ*), which is stable when 2*τ* > *γ*.
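These statements can be checked directly. For *n* = 2, *a* = 2 with binomial canvassing, and assuming step-function transmission, the dynamics reduce to d*B*/d*t* = (1 − *B*)*τB*² − *γB*. The nonzero fixed points then solve *B*² − *B* + *γ*/*τ* = 0. The short sketch below (function names are illustrative) returns the fixed points and classifies each by the sign of the derivative of the right-hand side.

```python
from math import sqrt


def rhs(B, tau, gamma=1.0):
    """dB/dt for n = 2, a = 2 under the assumed step-function transmission."""
    return (1.0 - B) * tau * B**2 - gamma * B


def complex_fixed_points(tau, gamma=1.0):
    """Fixed points of rhs: B = 0 always, plus two roots when tau >= 4 * gamma."""
    if tau < 4.0 * gamma:
        return [0.0]
    root = sqrt(1.0 - 4.0 * gamma / tau)
    return [0.0, 0.5 * (1.0 - root), 0.5 * (1.0 + root)]


def is_stable(B, tau, gamma=1.0):
    """Linear stability: d(rhs)/dB = tau * (2B - 3B^2) - gamma must be negative."""
    return tau * (2.0 * B - 3.0 * B**2) - gamma < 0.0
```

For example, `complex_fixed_points(5.0)` returns three values, of which `is_stable` accepts the first and last and rejects the middle one.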

Figure 1 shows some results from numerical integration of the model. Figure 1*a* shows three of the distributions considered: binomial, clustered and homophilous. In figure 1*b*, we see one of the main features of this model that is qualitatively different from SIS dynamics: complex contagions are metastable, with both the ‘behaviour-free’ and ‘established behaviour’ steady states being absorbing. As Centola & Macy [3] argued, this is a necessary feature for explaining how initially unpopular norms can become established and maintained through social pressure.
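The coexistence of two stable states can be illustrated with a few lines of numerical integration. The following sketch is not the code used to produce figure 1. It assumes binomial canvassing and step-function transmission, and uses a fixed-step fourth-order Runge-Kutta scheme chosen here for self-containment. Function names and default step sizes are illustrative.

```python
from math import comb


def rhs(B, n, a, tau, gamma=1.0):
    """dB/dt with binomial canvassing and an assumed adoption threshold a."""
    adopt = sum(comb(n, m) * B**m * (1.0 - B) ** (n - m) for m in range(a, n + 1))
    return (1.0 - B) * tau * adopt - gamma * B


def integrate(B0, n, a, tau, gamma=1.0, t_end=50.0, dt=0.01):
    """Integrate from B(0) = B0 to t_end with classical RK4; returns B(t_end)."""
    B = B0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(B, n, a, tau, gamma)
        k2 = rhs(B + 0.5 * dt * k1, n, a, tau, gamma)
        k3 = rhs(B + 0.5 * dt * k2, n, a, tau, gamma)
        k4 = rhs(B + dt * k3, n, a, tau, gamma)
        B += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return B


# Two starting prevalences either side of the unstable fixed point
# (about 0.276 for n = 2, a = 2, tau = 5, gamma = 1) end in different states.
low = integrate(0.20, n=2, a=2, tau=5.0)
high = integrate(0.35, n=2, a=2, tau=5.0)
```

Here `low` decays towards the behaviour-free state while `high` settles at the established-behaviour state, so the long-term outcome depends on the initial prevalence.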

Also in figure 1, the impact of clustering (figure 1*c*) and homophily (figure 1*d*) on behavioural dynamics is shown. This provides a mathematical explanation for the results seen in empirical work and simulation [2,4], namely that clustering enhances behavioural transmission, while homophily (as defined here, subject to caveats about interpretation) reduces behavioural transmission. The non-monotonicity seen in figure 1*d* is just an artefact of the discretization equation (2.6). These effects are not seen for simple transmission, which only depends on the mean of the distribution *D*_{m} and so is unaffected by changes in clustering *c* or homophily *h*.

In summary, the mathematical model introduced here complements and develops the existing work in three main ways. Firstly, it incorporates many of the advantages of simple transmission models like the SIS model, in that the threshold behaviour, fixed points, transient behaviour and parameter sensitivity can be calculated numerically at machine precision. Secondly, the rates and processes defined implicitly in equation (2.1) can be used to define a natural stochastic model using the methods of Dangerfield *et al.* [16]. As there are relatively few parameters, this opens up the possibility of rigorous statistical fitting of model parameters, although finding a robust method for inference and sufficiently high-quality data is likely to pose a significant challenge. Finally, the mathematical transparency of the model acts as a guide to intuition, meaning that the exact causes of effects seen in more sophisticated simulations and empirical work can be better interpreted.

## Acknowledgements

Work supported by the UK Engineering and Physical Sciences Research Council (grant number EP/H016139/1). The author would like to thank Martine Barons and Matt Keeling for helpful comments relating to this work.

## Appendix A: statistical notation

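The binomial coefficients are given by

$$
\binom{n}{m} = \frac{n!}{m!\,(n - m)!}. \tag{A 1}
$$

The binomial probability mass function is, for integer *m* ∈ {0, … , *n*},

$$
\mathrm{Bin}(m \mid n, p) = \binom{n}{m}\, p^{m} (1 - p)^{n - m}. \tag{A 2}
$$

The hypergeometric probability mass function is, for integer *m* ∈ {0, … , *n*},

$$
\mathrm{Hyp}(m \mid N, M, n) = \frac{\binom{M}{m}\binom{N - M}{n - m}}{\binom{N}{n}}. \tag{A 3}
$$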

- Received January 13, 2011.
- Accepted January 28, 2011.

- This Journal is © 2011 The Royal Society