Royal Society Publishing

Characterizing an outbreak of vancomycin-resistant enterococci using hidden Markov models

E.S McBryde, A.N Pettitt, B.S Cooper, D.L.S McElwain


Background. Antibiotic-resistant nosocomial pathogens can arise in epidemic clusters or sporadically. Genotyping is commonly used to distinguish epidemic from sporadic vancomycin-resistant enterococci (VRE). We compare this to a statistical method to determine the transmission characteristics of VRE.

Methods and findings. A structured continuous-time hidden Markov model (HMM) was developed. The hidden states were the number of VRE-colonized patients (both detected and undetected). The input for this study was weekly point-prevalence data; 157 weeks of VRE prevalence. We estimated two parameters: one to quantify the cross-transmission of VRE and the other to quantify the level of VRE colonization from sporadic sources. We compared the results to those obtained by concomitant genotyping and phenotyping.

We estimated that 89% of transmissions were due to ward cross-transmission while 11% were sporadic. Genotyping found that 90% had identical glycopeptide resistance genes and 84% were identical or nearly identical on pulsed-field gel electrophoresis (PFGE).

There was some evidence, based on model selection criteria, that the cross-transmission parameter changed throughout the study period. The model that allowed for a change in transmission just prior to the outbreak and again at the peak of the outbreak was superior to other models. This model estimated that cross-transmission increased at week 120 and declined after week 135, coinciding with environmental decontamination.

Significance. We found that HMMs can be applied to serial prevalence data to estimate the characteristics of acquisition of nosocomial pathogens and distinguish between epidemic and sporadic acquisition. This model was able to estimate transmission parameters despite imperfect detection of the organism. The results of this model were validated against PFGE and glycopeptide resistance genotype data and produced very similar results. Additionally, HMMs can provide information about unobserved events such as undetected colonization.


1. Introduction

There has been an alarming worldwide increase in the rate of infection from vancomycin-resistant enterococci (VRE) in the last 15 years (Murray 2006). Enterococci are part of the normal gastrointestinal flora and VRE colonization often is asymptomatic and undetected. However, in patients with compromised immune systems and breached integument, enterococci can become pathogenic, causing, for example, urinary tract infection, bacteraemia and endocarditis. Large teaching hospitals and intensive care units (ICUs) have the highest rate of infection with VRE (Weinstein 2005). Infection with enterococci harbouring a vancomycin resistance gene is associated with higher mortality (Lodise et al. 2002) and many strains of VRE are resistant to all known antibiotics.

Acquisitions of VRE colonization can be grouped broadly into those that come from cross-transmission within the ward which we call transmitted, and VRE that comes from other sources which we call sporadic. Ward transmission of multi-resistant organisms is believed to be predominantly from patient to patient via the transiently contaminated hands of health care workers (Boyce 2001). The sources of sporadic VRE include patients' gastrointestinal tract, prior colonization with VRE and transmission from outside the ward. The presence of VRE on admission is often initially not detected owing to infrequent swabbing, poor sensitivity of swabs or undetectable quantities of organism. VRE may exist in subdetectable numbers in human gut so that exposure of patients to antibiotics which facilitate VRE growth (Donskey et al. 2002) may lead to an apparently new case of VRE. VRE is also known to spread from other hospital wards via patient and staff movements (Trick et al. 1999).

To select the most appropriate infection control interventions, one needs to be able to estimate how much of the new acquisition is transmitted and how much is sporadic. Restricting antibiotic exposure is thought to control sporadic VRE, by reducing selection pressure in patients' endogenous flora, while hand hygiene, cohorting, patient isolation and limiting admission of colonized patients are thought to impact on transmitted VRE.

Outbreak investigation often involves time intensive methods to characterize the mode of VRE acquisition. Genotyping techniques such as pulsed-field gel electrophoresis (PFGE), distinguish clonal outbreaks, which are presumed to be due to transmitted VRE, from multiple new strain introductions, which are presumed to be due to sporadic VRE. There are occasions when this technique breaks down, when horizontal transfer of the resistance gene, vanA or vanB, can lead to several different genotypes being detected when in fact a single transposon is being transmitted (Suppola et al. 1999; Bradley et al. 2002; Weinstein 2005).

Attempts have been made to distinguish between these two processes of colonization based on statistical analysis of surveillance data. Pelupessy et al. (2002) used a Markov model, without hidden states, to estimate transmission parameters; finding estimates were similar to those using full event data and genotyping (PFGE). Cooper & Lipsitch (2004) used structured and unstructured hidden Markov models (HMMs) to describe infection incidence time-series data and to estimate transmission parameters. Collinearity between parameter estimates, failure of convergence and computational difficulties were identified as potential problems using HMMs for sparse data such as is typically found in time-series infection control data. Forrester & Pettitt (2005) compared background rates with cross-transmission rates of methicillin-resistance Staphylococcus aureus, finding background rates were larger than cross-transmission rates.

Estimating transmission coefficients using hospital infection control data has a number of challenges. There are unobserved processes occurring; the time of new acquisition of colonization is not observed. Additionally, when relying on routine swabs to determine the number of colonized patients, the sensitivity of swabs is less than 100%.

This study uses an epidemic model structure to characterize transmission of VRE during an outbreak at an 800 bed Australian teaching hospital. The current paper extends the work by Pelupessy et al. (2002) by estimating epidemiological parameters in the presence of suboptimal swab sensitivities. It also allows for the fact that new colonizations are not immediately detectable. We use an HMM structure to estimate transmission in the face of incomplete datasets and unobserved events. This framework distinguishes between rates of transmitted and sporadic VRE acquisitions. This study also considers that the transmission rates may change over time. Section 2.1 describes the data used to estimate VRE epidemic determinants. Section 2.4 describes the model of VRE transmission, while §2.5 describes the HMM and the methodology behind it. Section 3 gives the results of the parameter estimates, comparison of model estimates and genotyping data and model selection.

2. Methods

2.1 Description of outbreak and infection control interventions

VRE was first isolated at the Princess Alexandra Hospital, Brisbane, Australia in October 1996 and a VRE screening programme commenced in January 1997, the beginning of the data collection period for this study. Data used in this study are VRE colonization data from the ICU, renal and infectious diseases units. VRE colonized patients and were identified by clinical isolates, weekly routine screening and contact tracing swabs. Infection control interventions introduced from the start of the study period were restriction of vancomycin and third-generation cephalosporin use and isolation of colonized patients. From week 125 of this study, infection control teams were aware of an increased prevalence of VRE and further measures were taken. Dedicated equipment was used in patient rooms and patients were cohorted. VRE patients requiring haemodialysis used a dialysis facility within the infection control unit. Medical and nursing staff wore disposable aprons and latex gloves for patient contacts. An environmental audit was performed in August 1999, approximately week 135 of the study period and an aggressive cleaning programme was instituted (Bartley et al. 2001).

2.2 Serial surveillance data used for statistical analysis

Input data for the statistical model in this study were:

  1. Weekly prevalence data for VRE colonization.

  2. Mean length of stay of colonized patients: 15 days. This was calculated as the time from first identification of colonization to discharge.

  3. The total number of beds in the wards, N=68.

The data were collected from 1 January 1997 to 31 December 1999. The weekly prevalence data are shown in figure 1.

Figure 1

Prevalence data for VRE over 157 weeks. Arrows show times in which changes in transmission rates may have taken place.

2.3 Data used for cluster analysis

Microbiological and clinical data were collected, including admission dates and discharge dates of VRE colonized patients, as well as the date of first positive isolate. Additionally, we had information on the colonization status on admission of three of the patients transferred from other hospitals. Genotype data, both PFGE and glycopeptide resistance genotyping, were compared with the results of the statistical analysis as part of the study validation. Presumptive VRE colonies were identified using standard techniques. Speciation (distinguishing Enterococcus faecium and Enterococcus faecalis) was initially achieved by carbohydrate fermentation reactions of arabinose, mannose and raffinose then confirmed by a multiplex PCR assay based on specific detection of genes encoding d-alanine: d-alanine ligases (Bartley et al. 2001). VRE phenotype was identified based on vancomycin and teichoplanin mean inhibitory concentrations using the E-test method. This presumptively distinguishes vanA VRE, resistant to both vancomycin and teichoplanin, from vanB VRE, resistant to vancomycin but sensitive to teichoplanin. This phenotype result was confirmed by glycopeptide resistance genotyping, achieved through a modified multiplex PCR assay, described in detail by Bartley et al. (2001).

In the study on this outbreak by Bartley et al. (2001), isolates were also characterized using PFGE. Electrophoretic band patterns were analysed according to the criteria established by Tenover et al. (1995). Computer comparison using Gel Compar v. 4.1 (Applied Maths Kortrijk, Belgium) was based on the algorithm of the unweighted pair group method for arithmetic averages and using the Dice coefficient with 1.5% band tolerance (Bartley et al. 2001). This information was used to estimate the proportion of isolates that were from the same strain.

2.4 Model of transmission

We based our ward transmission model on the Susceptible-infected model with migration, described by Bailey (1975). Modified versions of this model have been used previously to analyse nosocomial transmission data (Pelupessy et al. 2002; Cooper & Lipsitch 2004; Forrester & Pettitt 2005).

A schematic of the model is shown in figure 2. The rate of cross-transmission of VRE colonization (per colonized per susceptible patient per day) is denoted by β. It is assumed that the ward is of fixed size, N, hence the number of uncolonized patients is NC. Colonized patients are assumed to remain colonized for their entire hospital stay, therefore, transition from colonized to uncolonized occurs via discharge of a colonized patient and replacement with an uncolonized patient, which occurs at a rate μ. Duration of stay of colonized patients was available from the dataset. Acquisition of VRE, that is transmitted, is described by the mass-action term, βC(NC). VRE acquisition, that is sporadic, can arise through ward admission of a colonized patient or any other process that is not related to the number of colonized patients, and occurs at a rate, ν(NC). Each of the processes that lead to sporadic acquisition (for example, prior colonization or colonization from out-of-ward sources, endogenous gastrointestinal colonization) can reasonably be assumed to be independent of the number of colonized patients in the ward.

Figure 2

The transmission of bacterial pathogens in the hospital ward.

The probability of a change in the number of colonized patients, C, in a short time period, h, is given byEmbedded Image(2.1)The number of colonized patients in the ward, C(t), forms a Markov process on state space 0, …, N, where N is the number of patients on the ward. Reflecting boundaries occur at states i=0 and i=N, provided ν>0, otherwise 0 is an absorbing state, and provided μ>0, otherwise N is an absorbing state.

2.5 Hidden Markov model

We aim to estimate parameters associated with sporadic colonization, ν, and the colonization caused by ward transmission, β, using the structured HMM illustrated in figure 3.

Figure 3

Hidden Markov model. Here, C represents the number of colonized patients in the ward (detected or undetected), Y represents the number of patients detected at each time point. The horizontal arrows represent the transition from one state to the next, and the vertical arrows represent the relationship between the hidden state and the corresponding observation.

Our HMM consists of: observations, Y, the number of patients detected at each time point; underlying hidden states, C, the number of colonized patients in the ward; a transition model linking each hidden state with its adjacent states, represented by horizontal lines in figure 3; an observation model linking the data with the hidden state, represented by the vertical lines in figure 3. There is one hidden state for each observation, denoted C1, C2, …, Cn.

The full conditional probability of any node depends only on neighbouring nodes to which it is connected directly. The observation component of the HMM, denoted by Y, consists of 157 data inputs of weekly VRE prevalence taken over 3 years and the vector of time points, t=t1, …, tn, corresponding to each observation time. The vector C consists of the n=157 hidden states. The transition probability matrix, giving the relationship between the hidden states, is described in appendix A. The observation model, giving the relationship between the observed and hidden states, is described in §2.6.

The parameters used in the model are given in table 1.

View this table:
Table 1

Parameters used in the model. Fitted values are discussed in §3.

2.5.1 Model assumptions

The model makes the following assumptions

  1. The ward is of fixed size, N.

  2. The model parameters are time invariant (this assumption is relaxed later in the study).

  3. Each colonized patient remains colonized for the remainder of their stay.

  4. Each observation of patients not known to be colonized is conditionally independent given the corresponding hidden state.

  5. The hidden states follow a first order time homogenous Markov process, that isEmbedded Image

  6. Homogenous mixing of patients takes place.

  7. Uncolonized patients are identical with respect to susceptibility to colonization.

  8. Colonized patients are identical with respect to transmission of VRE.

  9. Time from colonization to discharge is exponentially distributed. Review of patient histories confirms that this is approximately the case.

These assumptions are discussed in §4.

2.6 Observation model

The probability of being known to be colonized (and therefore being included in the prevalence data) given that a patient is colonized, d, was unknown. Literature sources regarding the sensitivity of rectal swabs in detecting VRE were used to develop an expression for the uncertainty in this parameter. Estimates of the sensitivity of a rectal swab for VRE range from 0.58 (D'Agata et al. 2002) to 0.97 (Reisner et al. 2000) with values in between (Lemmen et al. 2001; Trick et al. 2004). We allowed for the uncertainty regarding the detection by assigning a uniform [0.58, 0.97] prior distribution to d. The probability of detection at a given prevalence check, d, used in this study was patient related rather than simply swab related. If a patient was known from previous swabs to be colonized, the patient was automatically detected, thus d would be expected to be at least as high as the sensitivity of a single swab. The observation model assumed that each week's observed prevalence is independent of the previous week's observed prevalence, given the underlying true prevalence. This is an approximation as the true detection is the known colonized patients from the previous week and the new colonizations from the current week.

The probability relationship between the states and the data is described by the binomial distribution Yk∼Bin(Ck, d), where Yk is the kth observed colonization prevalence and Ck is the actual number of colonized patients, the hidden state, at time tk. This assumes that the probability, d, remains constant over the study period (for each iteration) and the probability of detection of each colonized patient is independent of the number of other colonized patients.

Alternative observation models with greater dispersion could have been used. For example, the Poisson or negative binomial distribution could have been chosen, had we been dealing with incidence rather than prevalence data. We chose the binomial distribution because it has a sound probabilistic basis (assuming fixed detection) and, unlike the Poisson, ensures that the hidden state (number colonized) is always larger than the observation (number detected), a necessary result when using prevalence data.

2.7 Bayesian framework

The parameters for transmitted VRE, β and sporadic VRE, ν were estimated using a Bayesian framework. Let θp={β, ν, d} be the vector of model parameters. Baum et al.'s (1970) recursion formula, summarized in appendix B, was used to determine the likelihood of the data, l(Y|θp). Uniform U[0, 0.1] priors were assigned to β and ν, because little was known about these parameters other than that negative values or values higher than 0.1 were completely implausible. The posterior probability distributionsEmbedded Image(2.2)were estimated using a Monte-Carlo Markov chain algorithm, described in appendix C.

The Bayesian framework can provide estimates (and full posterior probability density) of any function of model parameters including functions which depend upon knowledge of hidden states. Let θh be the vector of n inferred hidden states C1, …, Cn and let θ={θp, θh}. The expected number of within-ward transmissions for the week, following week k is βCk(NCk), while the total number of transmissions is βCk(NCk)+ν(NCk). The expected proportion of VRE acquisitions due to ward transmission over the time of the study, f(θ), is approximated byEmbedded Image(2.3)

We evaluate the expectation, E[f(θ)|Y], by drawing samples θk, k=1, …, m from p(θ|Y) and using the approximation of Gilks et al. (1996, ch. 1)Embedded Image(2.4)

The algorithm for this Monte-Carlo integration is given in appendix C.

2.8 Comparison of cluster analysis results using genotyping with statistical analysis

A genotyping study was performed on the VRE isolates by Bartley et al. (2001). Of the 49 isolates available for analysis, 44 were found to be E. faecium vanA using glycopeptide resistance genotyping. The estimated number of isolates having identical or closely related patterns on PFGE using the criteria of Tenover et al. (1995) was 41 of 49.

2.8.1 Cluster analysis based on genotypic relatedness

We compared the proportion of ‘identical isolates’ (presumed to be part of a cluster) with the estimated proportion of transmitted VRE derived from the HMM and prevalence data. The posterior probability distribution of the proportion of VRE cases that are identical by genotype can readily be derived using a Bayesian framework and conjugate prior distribution (Gelman et al. 2004). Denote the parameter of interest, the proportion of VRE acquisitions that are identical, by p. Assume the form Beta(1, 1) for the prior distribution for the proportion; this is the same as the uniform[0, 1] prior. The probability of the data is given by the binomial Bin(a; (a+b), p), where a is the number of identical isolates and b is the number of non-identical isolates, as detected by the laboratory methods. The posterior probability density of p is Beta(1+a, 1+b).

3. Results

3.1 Transmission parameter estimation

The estimated value for the transmission coefficient, β was 10×10−4 (CI95 7.9×10−4, 13×10−4) and the sporadic acquisition rate ν was 2.0×10−4 (CI95 0.85×10−4, 3.8×10−4). The coefficient of correlation between β and ν was estimated to be −0.24. These results were obtained using a Markov chain Monte-Carlo algorithm with a burn-in period of 50 000 as described in appendix C.

The basic reproduction ratio, R0, is ‘the average number of persons directly infected by an infectious case during its entire infectious period, after entering a totally susceptible population’ (Giesecke 1994). In this model it can be shown to be R0=βN/μ. This formula for R0 is an approximation as there is a finite population in this setting. The basic reproduction ratio is estimated to be 1.07 (CI95 0.78–1.34).

The mean value for the estimated detection rate, d, was 0.75 with a 95% credible interval of 0.59–0.93.

3.2 Comparison of statistical model and genotyping data

The proportion of VRE acquisitions due to transmission, was estimated to be 89% (CI95=78–95%), using Bayesian inference applied to the HMM structure. This compares with 84% (41/49) of isolates observed to be identical or nearly identical using PFGE genotyping and 90% (44/49) using glycopeptide resistance genotyping. The posterior distribution of the estimated proportion of colonizations due to ward transmission compared with those found to be identical by glycopeptide resistance genotype and PFGE methods are displayed in figure 4.

Figure 4

Posterior distribution of proportion of VRE acquisitions that are due to ward transmission. The histogram gives the posterior distribution from the Bayesian analysis of the HMM, the solid curve gives the posterior distribution based on the observed proportion of identical strains using PFGE genotype data and the broken line gives the posterior distribution based on observed proportion of identical strains using glycopeptide resistance phenotype and genotype data (Bartley et al. 2001).

3.3 Sensitivity analysis

The length of stay following colonization could be greater than the estimated 15 days because acquisition could have preceded detection. Conversely, the length of stay could have been less than 15 days because undetected colonized patients are likely to have shorter stays than detected colonized patients. We therefore performed a sensitivity analysis on the discharge rate parameter, μ. We took upper and lower values for μ which we believed at the extreme ends of plausibility. We then repeated the estimation of the proportion of VRE acquisitions due to within-ward transmission. Results are given in table 2.

View this table:
Table 2

Analysis of sensitivity of the model outcome to changes in the discharge rate, μ.

Table 2 shows there is a small change in the estimate for large changes in the discharge parameter, μ.

3.4 Model selection

The values of the deviance information criterion (DIC) were used to assess the optimum model to fit the data (Gelman et al. 2004). Results are given in table 3.

View this table:
Table 3

Comparison of different models using the deviance information criterion. Pd: effective number of parameters.

Several models were explored. Setting either β or ν to zero led to much higher values for the DIC, giving substantial statistical support to a mixed model, in which VRE colonization arose both from cross-transmission in the ward and sporadically. The model in which β changed after week 120 was a superior fit to the model with time-invariant parameters. Allowing for a further change in β after week 135 provided the best fit of those models investigated. The effective number of parameters in a latent variable model depends on the collinearity of the parameters and the influence of the latent variables.

4. Discussion

The aim of this study was to characterize transmission of VRE using statistical methods and simple serial surveillance data. We included a term for sporadic colonization because we believe that new acquisitions of VRE could occur through means other than within-ward patient to patient cross-transmission. Sources of sporadic colonization have been labelled in the past as endogenous, spontaneous (Pelupessy et al. 2002) or background (Forrester & Pettitt 2005). Our statistical methods were designed to quantify the rates of sporadic and cross-transmitted VRE. Previous attempts have encountered difficulties especially with identifiability of variables (Cooper & Lipsitch 2004).

Full patient histories, PFGE and glycopeptide resistance genotype data were used for validation but were not included in the statistical analysis in this study. Estimates of the proportion of VRE resulting from cross-transmission based on statistical methods (HMMs) in this study were very similar to those based on vancomycin resistance genotype data.

The proportion of clustered isolates based on PFGE analysis was lower than both the vancomycin resistance genotype data and the statistical analysis. This could be due to horizontal transfer of resistance gene to new strains of enterococci, which has been reported previously (Suppola et al. 1999; Weinstein 2005). If horizontal transfer of resistance genes occurs during an outbreak, cross-transmitted strains have identical glycopeptide resistance genotypes but different PFGE patterns, hence PFGE underestimates clustering.

Using a structured HMM, one can estimate the hidden states behind the data, the number of patients colonized on the ward (both detected and undetected). We estimated the basic reproduction ratio to be close to unity, the threshold value that could lead to endemic VRE. We were able to make estimates of transmission in the face of imperfect datasets in which transmission times and patient histories were unknown and swab sensitivity was considerably less than 100%. This approach is similar to that of Cooper & Lipsitch (2004), who observed monthly infection incidence and assumed a Poisson relationship with the number colonized, the hidden state. The current study avoids the ambiguity of the relationship between the observations and the hidden state using prevalence (observed number detected at time points) which relates directly to the hidden state, the number colonized, through a binomial relationship.

For simplicity, this study assumed homogenous mixing of staff and patients. Future studies could extend this model to include ward coupling, however, dividing the data to incorporate ward structure would lead to reduced precision in parameter estimates and increased model complexity. We incorporated this uncertainty into our parameter estimates and model conclusions were robust to changes in its value.

The time to discharge was estimated by taking the mean time from first identification of colonized patients to discharge (15 days). The discharge rate was taken as the reciprocal of the mean time to colonization. This assumes that the time to discharge was exponentially distributed which was indeed approximately the case for those known to be colonized. The true time to discharge of colonized patients could have been longer than estimated in this study if patients were colonized for significant time-intervals prior to detection or they could have been shorter if a substantial number of the undetected colonized patients had shorter durations of stay. We performed a sensitivity analysis on the discharge rate parameter, μ, and found large changes in μ (±33%) resulted in small changes in the estimates of proportion of patients colonized within the ward and is therefore unlikely to have influenced the conclusions of this study.

The model presented in this study postulated that VRE acquisition arose from both cross-transmission and sporadic sources. Model comparison techniques found this model to be a far superior fit to the data compared with models which relied on either cross-transmission or sporadic sources of VRE acquisition alone, strongly supporting that both modes of acquisition were taking place.

We investigated changes in transmission over time using a structured epidemic model. Model comparison showed that there was evidence supporting the conclusion that there was an increase in cross-transmission just prior to the outbreak. There was limited evidence that the cross-transmission rate reduced after the epidemic peak at week 135, coinciding with the environmental cleaning intervention. Future studies using larger surveillance datasets could extend the methodology presented to consider more models in which parameters are time-dependent. One approach to this would be to use the reversible jump Monte-Carlo Markov chain method (Green 1995) or the birth–death Markov process model (Stephens 2000).

Inaccuracies in PFGE cluster analysis can arise from the horizontal transfer of resistance genes. Glycopeptide resistance genotype analyses are not subject to inaccuracies due to gene transfer but cannot distinguish different strains that might all be of the same resistance genotype. Statistical methods are not subject to these problems and have the additional advantage that they are not resource intensive. It is interesting to speculate whether they also have the potential to be used in real time, within a control-chart outbreak alert system.

The model presented here can be used to model the transmission of other bacterial pathogens in small scale settings of healthcare institutions, such as methicillin-resistant Staphylococcus aureus, extended spectrum beta-lactamase producing and other multi-resistant Gram-negative pathogens.


This work was partially supported by a grant under the Australian Research Council Linkage Scheme (LP0347112) and NHMRC scholarship number 290541. The authors would like to thank Dr Mike Whitby for providing data and Dr Paul Bartley for helpful comments. The authors would like to thank the anonymous reviewers for their constructive comments.


    • Received January 17, 2007.
    • Accepted February 9, 2007.


View Abstract