The epidermal growth factor receptor (EGFR) is a major target for drugs in treating lung carcinoma. Mutations in the tyrosine kinase domain of EGFR commonly arise in human cancers, which can cause drug sensitivity or resistance by influencing the relative strengths of drug and ATP-binding. In this study, we investigate the binding affinities of two tyrosine kinase inhibitors—AEE788 and Gefitinib—to EGFR using molecular dynamics simulation. The interactions between these inhibitors and the EGFR kinase domain are analysed using multiple short (ensemble) simulations and the molecular mechanics/Poisson–Boltzmann solvent area (MM/PBSA) method. Here, we show that ensemble simulations correctly rank the binding affinities for these systems: we report the successful ranking of each drug binding to a variety of EGFR sequences and of the two drugs binding to a given sequence, using petascale computing resources, within a few days.
Lung carcinoma, like all other forms of cancer, has two main characteristics: uncontrolled growth of cells and the ability of these cells to spread to distant sites. Lung cancer is the most common cause of cancer death in Western countries . Our understanding of cancer has been increasing with advances in molecular, cellular and tissue biology. To fully apprehend the natural phenomenon of cancer, an integrated approach is needed that spans all length and timescales, from molecular to cellular and tissue levels, from nanoseconds to years, and from nanometres to metres.
Like most cancers, lung cancer is usually treated by a combination of chemotherapy alongside other treatments, such as radiation therapy or surgery . Such treatments often have a range of side effects; some are so serious that they may even lead to death. ‘Targeted therapy’ is a new approach to cancer treatment, which uses drugs to block the growth and the spread of cancer . These drugs are designed to identify and attack a specific molecular target, usually a protein that plays a critical role in tumour growth. As only specific targets are interfered with, targeted therapies are more effective and less harmful than most current combination treatments.
The epidermal growth factor receptor (EGFR) is an especially important enzyme target in lung cancer because it is over-expressed in most (approx. 62%) non-small cell lung carcinoma (NSCLC) tumours and mutates frequently in a subset of NSCLC patients with adenocarcinoma, women, East Asians and non-smokers . EGFR is a membrane-spanning cell surface protein. It is characterized by immunoglobulin-like extracellular domains that have a ligand binding site, a transmembrane segment and an intracellular domain containing tyrosine kinase (TK; figure 1a) plus autophosphorylation regions. The extracellular domain of EGFR binds epidermal growth factor (EGF) and EGF-like ligands with high affinity; the binding induces an EGFR dimerization that stimulates its intrinsic kinase activity. As a result, autophosphorylation of several tyrosine residues occurs at the C-terminal domain of EGFR. This autophosphorylation initiates several signal transduction cascades, leading to DNA synthesis and cell proliferation . Blockage of ligand binding and inhibition of kinase activation are frequently used to suppress its functions: monoclonal antibodies are directed against the extracellular EGFR domain, whereas tyrosine kinase inhibitors (TKIs) target the intracellular TK domain . Some antibodies and TKIs have been approved for use in cancer therapy, while several others are currently in various stages of clinical trials .
The majority of TKIs are ATP-competitive inhibitors that bind in the ATP binding site located between the N-terminal kinase (N-lobe) and C-terminal kinase (C-lobe) of the TK domain (figure 1a). The ATP binding site of TKs is highly conserved. Hence, the overall characteristics of these kinase inhibitors remain the same : a heterocyclic core scaffold (figure 1b,c) makes many of the same contacts as the purine ring of ATP (figure 1d), and side chains on the scaffold allow for the generation of inhibitors that target a diverse set of kinases. Among the many reported heterocyclic cores, the 4-anilinoquinazoline plays a particularly important role in demonstrating the potential for selectivity. All three clinically approved TKIs for EGFR—Gefitinib (figure 1b), Erlotinib and Lapatinib—are based on the 4-anilinoquinazoline scaffold. Selective 4-anilinoquinazoline inhibitors have also been developed to target a number of other kinases. Indeed, there are a range of heterocyclic core scaffolds used in TKI design and synthesis. AEE788  (figure 1c), obtained by optimization of the pyrrolopyrimidine scaffold, potently inhibits the kinase activity associated with EGFR and vascular endothelial growth factor receptor (VEGFR). VEGFR is another important kinase receptor involved in the regulation of vasculogenesis and angiogenesis. The former is the formation of de novo blood vessels, which usually occurs during a limited period early in embryonic development; the latter is the growth of blood vessels from pre-existing ones and represents a critical step in tumour growth and development. AEE788 exhibits antiproliferative activity against a range of EGFR- and VEGFR-overexpressing cell lines . AEE788 is currently in phase I/II clinical trials .
EGFR mutations usually occur in its tyrosine kinase domain and are clustered around the ATP-binding pocket of the enzyme (figure 1a). Clinical studies manifest a strong correlation between the presence of mutations and patient response to TKIs. The two most common mutations are exon 19 deletions and L858R substitution in exon 21 (the entire kinase domain is encoded by exons 18–24) . They have been associated with dramatic responses to Gefitinib and Erlotinib. Other rare mutations have been reported within the EGFR TK domain, including exon 20 insertions, exon 18 point mutations and exon 20 point mutations. These mutations are also associated with sensitivity or resistance to TKIs . They can distort the ATP-binding cleft and introduce potential new interactions between EGFR and inhibitor/ATP , which in turn influence the relative strengths of inhibitor- and ATP-binding. A better understanding of the reasons for the success or failure of a therapeutic intervention will help in the selection of subgroups of patients who are most likely to respond to specific drugs and paves the way for individualized treatment, building on similar approaches already developed to handle infectious disease treatment .
Personalized drug ranking has been included in a prototype clinical decision support system for HIV treatment . An automated workflow supplements existing inductive (‘Baconian’) decision support systems with deductive (‘Popperian’) predictive modelling and drug ranking . To perform such ranking with optimal efficiency, we have designed a highly automated molecular simulation-based free energy calculation workflow tool, the binding affinity calculator (BAC) . Our approach employs large-scale molecular dynamics (MD) techniques principally exploiting petascale resources on US TeraGrid  in order to study the interactions between inhibitors and viral proteins in atomistic detail, together with free energy methods to predict the effect of mutations on drug binding affinities. We have integrated the application hosting environment (AHE)  with BAC, so that applications can be launched automatically on numerous high-performance computing (HPC) resources on geographically distributed grids and federations of grids; these include, in addition to the TeraGrid, the UK National Grid Service (NGS) and the EU Distributed European Infrastructure for Supercomputing Applications (DEISA). AHE is a lightweight hosting environment for running unmodified applications on these resources and transferring data around one or more grids inter alia. AHE provides a high level of abstraction for MD simulations and analyses of molecular level inhibitor–protease interactions, and functions as a ‘gateway’ from local (campus based) to the largest supercomputing resources.
MD simulation has been used to study structural and energetic properties for ligands complexed with wild-type (WT) and mutant kinases [16–20]. In these studies, binding free energies were calculated by the molecular mechanics/generalized Born solvent area (MM/GBSA or MM/PBSA) method  using snapshots accumulated from a single MD trajectory. As simulation of complex biomolecular systems often samples only a very few specific conformational states within the timescale typically accessible today, achieving accurate free energies is challenging from a single trajectory. Various methods have been proposed to enhance the sampling of conformational space, and hence the convergence of observable properties . Ensemble simulation  is expected to be more efficient to explore conformational space than one single long MD simulation. We have previously reported a series of ensemble simulations for a range of HIV-1 protease inhibitors and associated mutations [24–26] using BAC , with ensembles of 50 individual simulations and relatively short simulation time (up to 4 ns for each separate simulation). A remarkable level of correlation is obtained between computed binding free energies and experimental data [24–26]. A comparison study [14,25] of ensemble simulations and single long simulations has demonstrated the effectiveness of the ensemble approach, which exhibits significantly enhanced thermodynamic sampling. In the present paper, we present ensemble simulations of TKIs binding to WT and mutant EGFRs.
2.1. Molecular mechanics/Poisson–Boltzmann solvent area method
Large-scale MD techniques are invoked to study the interactions between TKIs and EGFRs in atomistic detail, specifically employing MM/PBSA methodology [21,27] to predict the effect of mutations on drug binding affinities. The inhibitor binding free energy can be evaluated, following the MM/PBSA approach, as 2.1where 2.2Here, 〈 … 〉 denotes an average for a set of structures from MD simulations, while EMM is the molecular mechanics energy of species i (the index i corresponding to either complex (com), inhibitor (inh) or protein (pro)) in the gas phase, being the sum of bonded energy (comprising bond, angle and dihedral terms), van der Waals (EvdW) and electrostatic (Eelec) interactions. Gsolv is the solvation free energy of the ith moiety; it is estimated as the sum of the electrostatic solvation free energy (GPB) calculated using the Poisson–Boltzmann equation and the non-polar solvation free energy (GSA) calculated from the solvent-accessible surface area (SASA). −TSconf is the contribution of configurational entropy Sconf at temperature T. Because AEE788 and Gefitinib are ‘reversible’ inhibitors that bind non-covalently to the receptor, the bonded terms cancel exactly in the binding free energy calculation (equation (2.1)). Hence, the binding free energy is 2.3Here, ΔGcalc denotes the calculated free energy difference upon binding, which can be decomposed into ΔEMM and ΔGsolv, and further into ΔEvdW, ΔEelec, ΔGPB and ΔGSA. The −TΔSconf term denotes the configurational entropy change ΔSconf at temperature T, which includes translational (ΔStrans), rotational (ΔSrot) and vibrational (ΔSvib) components (see electronic supplementary material for more details).
The MM/PBSA method has been applied to rank affinities for inhibitors bound to WT and resistant variants of HIV-1 protease [24–26], and TKIs to kinases [16–20]; here ensemble simulations are performed to calculate the binding free energies of the inhibitors AEE788 and Gefitinib to WT and four mutant EGFRs in a similar manner to that described by Sadiq et al. .
2.2. Initial preparation of models
The X-ray structure (protein data bank (PDB) id: 2J6M) of WT EGFR TK domain  (amino acid 697-984) bound to the inhibitor AEE788 was used as the initial structure for the simulation protocol if not specified. Gefitinib–EGFR complex was constructed by replacing inhibitor AEE788 in 2J6M by Gefitinib taken from within a Gefitinib–EGFR X-ray structure (PDB id: 2ITY) through alignment of the two receptors' common mainchain atoms; energy minimization and constrained MD simulations were then performed from this structure, using available Gefitinib–EGFR X-ray coordinates (PDB id: 2ITY) as the reference structure; the final constructed structure reproduced the Gefitinib–EGFR X-ray structure with completed activation (A-loop) residues. The G719S, L858R, T790M and T790M/L858R mutant EGFRs were obtained by modification of corresponding amino acid(s) within the WT structures to the desired residue(s). Most of the reported AEE788/Gefitinib–EGFR X-ray structures  have approximately nine residues missing within the A-loop (figure 1a). There is considerable ambiguity in modelling these missing residues. Hence, the complete X-ray structure of AEE788–EGFR was used here as a template on which all systems were built. To test the effects of the initial structures on the binding free energy calculation, two additional systems were built: one for the AEE788–G719S system constructed from its own complete X-ray structure (PDB id: 2ITP), termed 2ITP-G719S hereafter (figure 2a), another for the AEE788–L858R complex using its X-ray structure (PDB id: 2ITT) combined with a modelled A-loop, termed 2ITT-L858R hereafter (figure 2b). The same two AEE788–EGFR complexes modelled from the WT X-ray structure (PDB id: 2J6M) were labelled as 2J6M-G719S and 2J6M-L858R, respectively (figure 2). The 2ITT-L858R combined coordinates were obtained by overlapping the incomplete structure with the complete 2J6M X-ray structure, cutting the loop from the complete structure, combining it with the incomplete X-ray structure, and finally optimizing the loop conformation so as to fit within the incomplete structure (figure 2b).
Initial preparation of all systems was implemented using BAC , including parameterization of the inhibitors, solvation of the complexes and electrostatic neutralization of the systems by adding chloride counterions. Gaussian 03  was used to perform geometric optimization and to determine electrostatic potentials of all inhibitors at the Hartree–Fock level with 6-31G** basis functions. The restrained electrostatic potential (RESP) module in the AMBER package  was used to calculate the partial atomic charges for the inhibitors. The protonation state of a molecule is the subject of uncertainty , and it may vary with molecular conformation . While in aqueous solvent, the tails of the inhibitors are more likely to be protonated because of their pKa values (around 8.4 for piperazine and 7.4 for morpholino), although their state may change when they interact with the receptor. The inhibitors were not protonated in this study, as in other simulation studies of these systems [16,18,32]. These papers provide MD simulation data that are in good quantitative agreement with experimental data using this unprotonated state. The force field parameters for the inhibitors were assigned by the general AMBER force field (GAFF) . The AMBER ff03 all-atom force field  was used to describe the protein parameters, the Leap module in the AMBER package for solvation and charge neutralization. This resulted in the final molecular models in which the inhibitor–EGFR complexes were solvated in a cuboid water box with a minimum 14 Å buffering distance in all three orthogonal dimensions. The systems consisted of approximately 50 000 atoms in rectangular boxes of dimensions ca 72 × 93 × 79 Å.
The MD package NAMD  was used throughout the production simulations. Periodic boundary conditions were imposed in all three spatial dimensions. An ensemble simulation was performed for each system, in which 50 identical replicas of the solvated molecular model were used. For each replica, energy minimizations were first performed with heavy protein atoms restrained at their X-ray positions. Then a series of short simulations was conducted while the restraints on heavy atoms were gradually reduced. The systems were all maintained at a temperature of 300 K and a pressure of 1 bar (NPT ensemble). Finally, single 4 ns production simulations were run for all replicas. The total MD simulation time for this study was in excess of 3 µs, and the amount of data accumulated was more than 2 TB. The simulations showed structural and energetic stability on the timescale of the production runs (see below). Longer simulation may display drifts for measured properties as a function of time owing to the occurrence of rare events , for example the transition between active and inactive EGFRs, which we are also currently investigating by ensemble simulations with much longer runs (on a timescale of tens to hundreds of nanoseconds) for each replica. Although all replicas in one ensemble simulation started from the same initial structure, they produced different trajectories because of the randomly assigned initial velocities from a Maxwell–Boltzmann distribution in each run. Production runs were performed on 64 or 128 cores of HECToR (Cray XT4), a 11 328 core UK supercomputer based in Edinburgh, and Huygens (IBM pSeries 575), a 3328 core Dutch national supercomputer at the SARA Computing and Network Center in Amsterdam, both part of EU DEISA; and on Ranger, a 62 976 core TeraGrid supercomputer at the Texas Advanced Computing Center (TACC, USA). Each individual MD simulation took approximately 4 h ns−1 on 64 cores of Ranger. With the vast amount of processing resources available, all replicas within a single ensemble simulation can be deployed to run concurrently and completed in 1 day.
2.4. Post-production analysis
Root-mean-squared deviation (RMSD) behaviour was determined using VMD . The MM/PBSA module of AMBER  was used to determine the free energy contribution ΔGcalc (equation (2.3)). The molecular mechanics energy differences (ΔEvdW and ΔEelec) were calculated using the SANDER module in the AMBER package. The polar solvation free energy (ΔGPB) was calculated with the finite-difference PB equation solver in the AMBER suite. The PB calculation employed a grid spacing of 0.5Å, with dielectric constants of 1 and 80 for the inhibitor–EGFR and the surrounding solvent, respectively. The non-polar solvation free energy (ΔGSA) was calculated from the SASA: with the surface tension γ being set to 0.00542 kcal (mol Å2)−1 and the offset to 0.92 kcal mol−1. The changes in configurational entropy upon ligand binding (ΔSconf) were estimated by the AMBER normal-mode (NMODE) module. Energy minimizations were performed for the complex, inhibitor and receptor before the NMODE calculations, with a distance-dependent dielectric constant ε(r) = 4r and a convergence tolerance of 10−4 kcal (mol Å)−1. Four hundred snapshots were used for MM/PBSA, and 20 for NMODE, in each single simulation. The analyses were carried out using the Leeds node of the UK National Grid Service (http://www.ngs.ac.uk) and the Mavrino Cluster at the Center for Computational Science at UCL. One MM/PBSA calculation with 400 snapshots required 8 h on one Opteron CPU; the entropy computation with NMODE was expensive, one snapshot requiring 8 h on a single Opteron CPU.
3.1. Simulation stability
To assess the structural stability of the complexes in the ensemble MD simulations, RMSD of the backbone atoms from their X-ray coordinates and inhibitor-EGFR binding free energies are examined. Figure 3 shows the distribution of the RMSDs and free energies of AEE788–EGFR complexes, calculated from all 50 replicas in one ensemble simulation. The averaged RMSDs, energies and entropies from concurrent time points of all replicas are provided in the electronic supplementary material, figure S1. The coordinates of backbone atoms from the crystal structure of WT EGFR (PDB id 2J6M) were used for all RMSD calculations. The RMSDs and energies for the Gefitinib–EGFR complexes exhibit similar behaviour to those for AEE788–EGFR (data not shown).
As demonstrated in figure 3, simulations show reasonable distributions for both structural deviations (RMSDs) and binding free energies (ΔGcalc). The backbone atoms of the different ligand–EGFR complexes show similar deviations from the X-ray structure. The distribution of the RMSD for backbone atoms is approximately lognormal (figure 3). A lognormal distribution has been reported previously in describing the structural differences between an nuclear magnetic resonance ensemble and crystal structures of the same protein . MD simulations sample a diverse conformational ensemble present in solution, leading to a broad distribution peaked near 1.5 Å but extending beyond 3 Å owing to the presence of variability in loops and other flexible regions of the protein (figure 4). The inhibitors reveal a broader RMSD distribution than the protein's backbone atoms. In all of the simulations, the heterocyclic core remains in the same orientation, in alignment with the purine ring of ATP (figure 1d) in the binding cleft. Examination of the trajectories shows that the central scaffolds of the inhibitors—the quinazoline of Gefitinib and the pyrrolopyrimidine of AEE788 (figure 1b,c)—remain anchored in the binding pocket and the structures of the binding sites are unchanged (figure 5). The large deviations come from the flexible propylmorpholino and ethylpiperazine tails (figure 5) that interact with solvent and fluctuate significantly . The distribution of the free energy exhibits an approximately normal distribution for all systems, indicating that converged sampling is reached with respect to the energy minima .
As shown in figure 4 for the ensemble and the single simulation, the helix core in the C-lobe is very stable, while the N-lobe has a larger fluctuation owing to its overall twist relative to the C-lobe. The peripheral region around the helix core, especially the A-loop, experiences relatively large movement. The ensemble simulations sample more conformational space than a single run (figure 4). The sampling of ligand orientations is consistent with the experimentally observed conformations (figure 5). The solvent-exposed tails visually exhibit greater movement than the scaffolds, which is in agreement with the B-factors associated with the X-ray structures. Figures 4 and 5 demonstrate that the overall structure as well as the inhibitors and their surrounding key residues are well-sampled during the ensemble simulations.
3.2. Correlation with experimental data
In figure 6, the calculated binding free energies ΔGcalc (equation 2.3) are compared with the experimental data [28,39] for Gefitinib. The calculation is based on snapshots extracted from trajectories of ensemble simulations. The absolute binding free energies from MM/PBSA calculations overestimate the binding affinities (ΔGcalc versus ΔGexp), which presumably come from the exclusion of the configurational entropy contribution . Although inclusion of the configurational entropy brings the calculated absolute free energies closer to the experimental data, it does not help in achieving a better ranking of the drug affinities in this study (see electronic supplementary material, table S1). A similar finding was previously reported  for binding free energies of inhibitors with dihydrofolate reductase using MM/PBSA and MM/GBSA methods, in which inclusion of configurational entropy degraded the agreement of rankings between calculation and experiment. Excluding the configurational entropy contribution, our calculated results are strongly correlated with the experimental values  in all cases except the L858R mutation, as shown in figure 6a. The simulations correctly predict the trend of binding affinities for Gefitinib to WT, G719S, T790M and T790M/L858R mutant EGFRs. In another experimental dataset , the L858R mutant is well behaved and lies between WT and G719S mutant EGFRs (figure 6b). Although the binding affinities for Gefitinib–L858R are similar in two experimental datasets [28,39], it is more reasonable to compare the relative binding affinities instead of the absolute values as the experiments were performed at different conditions. It should be noted that, while a good correlation is presented for the four systems, the dataset used in figure 6 is very small and hence the correlation coefficient and coefficient of determination are not optimal. Although experimentally determined binding affinities are frequently reported to have very small variances between measurements performed by a single group, they can vary wildly when measurements are made under different protocols and/or different conditions. The binding affinity of AEE788 to WT EGFR differs by 0.43 kcal mol−1 according to published work [10,28], even though the measurements were made by the same group using the same method.
When combining calculated binding free energies of AEE788 and Gefitinib and comparing with experimental data [10,28], a reasonable correlation is obtained by excluding three data points (figure 7a), i.e. Gefitinib–L858R, AEE788–T790M and AEE788–T790M/L858R. A cross-drug correlation makes it possible to identify subgroups of patients who have a specific EGFR variant and are most likely to respond well to a particular drug treatment.
The excluded data points in the linear fitting are Gefitinib with L858R, and AEE788 with T790M and T790M/L858R mutant EGFRs. As discussed above, the calculated Gefitinib–L858R affinity is in good agreement with the ranking order reported experimentally by Fabian et al. . The other two data points are AEE788 with T790M-related mutations, which are the most negative in our calculations but the least negative according to published experiments [10,28]. For every EGFR sequence, the calculated binding free energies are more favourable for AEE788 than those for Gefitinib (figure 8). This finding is in line with a recent publication  in which the vast majority of EGFR mutants with AEE788 consistently show greater binding affinities when compared with mutant EGFRs with Gefitinib. The differences in the chemical structures of Gefitinib and AEE788 (figure 1b,c) are assumed to determine the distinct responses observed in the experiment . In the same experiment , T790M is completely resistant to both inhibitors, which makes it impossible to compare its relative sensitivities; and there is also no half-maximal inhibitory concentration (IC50) data in this experiment for the important concomitant T790M and L858R mutations. The only available data for T790M and T790M/L858R mutants are from Yun et al. [10,28]; these are the only two mutations that bind more tightly to Gefitinib than AEE788 (figure 8). Were these two data points in line with the others, i.e. ΔΔG < 0, in figure 8, the experimental binding affinities for AEE788–T790M and AEE788–T790M/L858R would increase and the two relative data points in figure 7 would then move towards the fitted lines (ΔGexp decreases). Because of the uncertainties of the calculations and the spread of various experimental measurements (figure 8), more experimental studies are needed to better evaluate our theoretical predictions.
3.3. Decomposition of molecular mechanics/Poisson–Boltzmann solvent area energies
To determine the dominant factors governing inhibitor binding, the various energy components (equation 2.3) are analysed (table 1) to elucidate the binding strength and specificity. The calculated binding free energies can be clustered into two groups: (i) AEE788 and Gefitinib with WT, G719S and L858R mutant EGFRs; (ii) AEE788 and Gefitinib with T790M and T790M/L858R mutant EGFRs. The members in the second group evidently have lower calculated binding free energies than those in the first group (table 1). In the group of T790M-related mutations, although the calculated binding free energies for the inhibitor AEE788 do not fit within the overall trend of published experimental binding free energies (figure 7a), our calculations nevertheless correctly predict the relative binding affinities between T790M and T790M/L858R mutations (ΔΔG = ΔGT790M − ΔGT790M/L858R) for each inhibitor, i.e. ΔΔG = 0.33 kcal mol−1 for Gefitinib and −0.20 kcal mol−1 for AEE788, compared with 0.51 and −0.23 kcal mol−1 from experiment [10,28].
A chain of conserved hydrophobic residues in the tyrosine kinases, a ‘hydrophobic spine’, links the N- and C-lobes and stabilizes the active conformation . Residue 790 sits on the top of the hydrophobic spine; its T790M mutation extends the hydrophobic spine and hence strengthens the active kinase conformation. The gatekeeper mutation T790M is known to significantly increase the binding affinity for ATP [10,18], explaining resistance to Gefitinib in lung cancer treatment on the basis of its being less competitive for EGFR binding than ATP within cells. The hydrophobic region is partially exposed to solvent when the binding site is empty, which introduces higher solvation free energies for apo-EGFR with T790M and T790M/L858R mutations than other EGFR variants. As a result, the calculated solvation free energy ΔGPB upon binding is smaller for T790M-related mutations than those in the first group (table 1).
Gefitinib and AEE788 combine with the ATP-binding site of EGFR in a reversible manner. The quinazoline core of Gefitinib and the pyrrolopyrimidine core of AEE788 position themselves along the hinge region in a very similar fashion (figure 1b,c). This orientation allows the quinazoline core to form one hydrogen bond and the pyrrolopyrimidine core two hydrogen bonds with the kinase hinge region. This is the main reason why AEE788 has a more negative electrostatic component, ΔEelec, and hence more negative molecular mechanics energy, ΔEMM (equation 2.3), in its calculated binding free energy than Gefitinib (table 1).
The electrostatic ΔEelec and van der Waals ΔEvdw components, hence the molecular mechanics component ΔEMM, of the binding free energy are negative, conferring net favourable contributions towards drug–EGFR binding. ΔGPB has a positive sign, reflecting an unfavourable electrostatic solvation upon binding. As the favourable electrostatic interaction between the inhibitors and the receptor is partially cancelled by the unfavourable solvation energy, the van der Waals interaction is the most favourable component of the binding free energy. Moreover, the net loss in non-polar solvation free energy ΔGSA is favourable for binding, although the loss of configurational entropy of the inhibitor–receptor complex does not favour binding (see the electronic supplementary material, table S1). As noted earlier, adding the entropic component to ΔGcalc would bring the calculated binding free energy more in line with that from experiment , although here it does not help in achieving an enhanced correlation between calculation and experiment . Comparing the correlations between the experimental binding free energies and each of the components (ΔEvdW, ΔEelec, ΔEMM, ΔGPB and ΔGSA) shows that none of them correlates well with the experimental data, i.e. no individual component dominates the binding specificity of an inhibitor to EGFR sequences (table 1). This finding contradicts the claims of a previous single trajectory MD simulation  on the same inhibitor–EGFR complexes. Analyses of the non-polar (ΔEvdW + ΔGSA) and the total electrostatic (ΔEelec + ΔGPB) contributions show that the electrostatic contribution correlates well with the experimental binding free energies (figure 7b) while the non-polar contribution does not show any correlation. Thus, although the non-polar energy is the most favourable component of the binding free energy, it is the total electrostatic contribution, which is largely responsible for the binding preferences of inhibitors to WT or mutant EGFRs.
3.4. Per-residue decomposition of molecular mechanics/Poisson–Boltzmann solvent area energies
To investigate which residues contribute significantly to inhibitor binding, pairwise decomposition of the interaction energy is examined between the inhibitors and every residue of the receptor (figure 9). Both inhibitors have a similar pattern of interactions, on a residue-by-residue basis, because of their common structures. The most favourable interaction comes from the ‘hinge’ residue M793, which links through hydrogen bonds with the inhibitors. The pyrrolopyrimidine core of AEE788 forms two hydrogen bonds with M793, introducing more favourable interactions for AEE788 than for Gefitinib of which the quinazoline core enjoys only one H-bond with M793.
The interaction footprints for WT and L858R mutant EGFRs are almost identical, with differences no larger than 0.27 kcal mol−1 for AEE788 and 0.43 kcal mol−1 for Gefitinib on a per-residue basis. Residue 858 is exposed to solvent and does not have significant interactions with either inhibitor (|ΔE| < 0.24 kcal mol−1). Indeed, the R858 residue itself introduces slightly more unfavourable interactions with the inhibitors than does L858 (0.08 versus 0.03 kcal mol−1 for AEE788, and 0.24 versus 0.09 kcal mol−1 for Gefitinib). The L858R mutation brings in unfavourable overall molecular mechanics interactions (ΔEMM in table 1) for both inhibitors with the receptor. It is the electrostatic solvation free energy (see ΔGPB in table 1) that contributes to the favourable binding of both inhibitors to the L858R mutant EGFR.
The G719S mutation itself introduces a slightly more favourable interaction with the inhibitors than WT EGFR does (figure 9). This mutation is believed to stabilize the phosphate-binding loop (P-loop) conformation (figure 1a) in the active state . The energy profile shows little change around the P-loop (residues 719, 723 and 726) in the N-lobe and residues bordering the catalytic cleft in the C-lobe (residues 797, 799, 800 and 803).
Just like the WT and the L858R mutant, the T790M and T790M/L858R mutations show no overall differences in total binding free energy (table 1), per-residue energy components exhibiting negligible variances at all positions (figure 9). The energetic changes of both T790M and T790M/L858R mutations from WT stem principally from the hydrophobic pocket residues (745, 762, 766, 777, 790, 854 and 855). The point mutation T790M does not sterically hinder binding of reversible inhibitors to the resistant EGFR mutant; instead, the mutation introduces a more favourable interaction with both inhibitors than WT does.
The largest difference between AEE788 and Gefitinib comes from D855. In Gefitinib–EGFR, the propylmorpholino tail of Gefitinib undergoes conformational changes, moving closer to residue D855 (figure 5). This makes Gefitinib interact favourably with D855, not only through its aniline group as shown in X-ray structures  but also via its propylmorpholino ‘tail’ (figure 1). The tail is flexible, as evidenced by its weak electron density in X-ray experiments  and large fluctuations in MD simulations [16,44]. A three-dimensional quantitative structure–activity relationship (3D-QSAR) study also reveals that D855 is likely to be very important for inhibitor–receptor interactions .
3.5. Reproducibility of ensemble simulations
As standard MD simulations are insufficient for searching exhaustively the conformational space of biomolecular systems, the choice of initial structure remains a critical issue in many simulation studies. It is, therefore, of importance to examine whether utilization of different initial structures has any significant effect on the calculated results. Our simulations of AEE788/Gefitinib–EGFR complexes employed the same set of protein coordinates (PDB id: 2J6M)  as the initial structure. To validate the MM/PBSA method on the inhibitor–EGFR systems, two more ensemble simulations were performed with different initial structures (see §2.2). Each ensemble also contained 50 replicas with 4 ns of production dynamics for every individual simulation.
The binding free energies from different initial structures are calculated to be −36.43 ± 0.21 (2J6M-G719S) and −35.93 ± 0.27 kcal mol−1 (2ITP-G719S) for AEE788-G719S, and −35.96 ± 0.24 (2J6M-L858R) and −36.03 ± 0.31 kcal mol−1 (2ITT-L858R) for the AEE788–L858R complex, respectively (table 2). The energetic results are thus very similar. These calculations are hence not sensitive to the choice of initial structures, thanks to the ensemble simulation approach that renders conformational sampling considerably more extensive . However, if just a single simulation is performed, the calculated properties are unlikely to be reproducible even if one and the same initial structure is used (see below).
3.6. Reproducibility of single trajectory derived binding affinities
Most published MD studies use data accumulated from a single production run from which structural, dynamic and energetic properties are obtained. In our current study, the averaged MM/PBSA free energies from separate members in one ensemble simulation differ by up to 11.57 kcal mol−1 (figure 10), which falls far outside the standard deviation (approx. 2 kcal mol−1) for a single simulation. As many published studies use a single simulation to extract properties, the question arises as to what extent such results are at all reproducible. This question can be answered based on the data accumulated during our ensemble simulation study.
To study the reproducibility of a single run, one replica is chosen randomly from each ensemble simulation. The chosen replicas are combined to form a complete set for the four, five, seven or 10 EGFR mutants complexed with Gefitinib and AEE788 (table 3; see also figures 6 and 7). There are 504, 505, 507 and 5010 (6.3 × 106, 3.1 × 108, 7.8 × 1011 and 9.8 × 1016) possible sets of single trajectory simulations from our ensemble studies. For each set, the computed binding free energies are compared with experimental data, just as for the ensemble results of figure 6. The correlation coefficients (r) and the coefficients of determination (r2) are then calculated for all possible sets (table 3). (The correlation coefficient measures the degree to which two variables are linearly related, while the coefficient of determination measures the quality of the regression). A perfect linear relationship (r = 1) is obtained when the points of a scatterplot fall on a straight line with a positive slope, and a negative perfect linear relationship (r = −1) with a negative slope. In these cases, all the observations are explained by the regression line fits (r2 = 1). A strong linear relationship (r close to 1 or −1) means more points in a scatterplot tend to fall along a straight line, while no linear relationship (r = 0) means no linear trend can be discerned. The corresponding coefficients of determination give the proportion of observations being explained by the regression fit: r2 is small if only some of the observations are explained, r2 = 0 if none is explained. Although the calculated binding free energies from a set of single simulations have limited likelihood of manifesting a perfect explanation (r2 = 1) by the experimental data, there is a possibility that they might exhibit a negative linear relationship (r = −1). For the strong (r = 0.98) and moderate (r = 0.68) correlations (four and seven point sets in table 3), there are less than 5 and 8 per cent probabilities respectively that a set of single simulations could report better than our ensemble results. For the weaker correlations (r = 0.50 for the five point set, and r = 0.13 for the 10 point set), one set of single simulations is possibly (approx. 30%) not worse than the ensemble. This result is not surprising when the ensemble results do not yield a strong correlation with experimental data. When there is no correlation at all in an ensemble simulation, the single simulation method is likely to achieve a 50 per cent positive correlation. However, ensemble simulation is still better than the single simulation method as it produces results that are reproducible (see §3.5). These findings serve to confirm that a single simulation can only explore a restricted subset of conformational space, while ensemble simulation substantially enhances conformational sampling. The principal conclusion is that the use of ensemble simulation is essential to obtain consistent and reproducible structural and thermodynamic properties.
The promising progress made in drug development for targeted cancer therapy  invites the prospect of developing a drug ranking system to predict drug sensitivity and resistance at the genotypic level. MD simulations performed on a federation of high-performance computational grids featuring petascale supercomputers have been used in the present study to correctly rank the binding affinities of TKIs to WT and mutant EGFRs. We have performed ensemble simulations on TKI–EGFR complexes, in which 50 replicas of each system are simulated. The binding affinities are calculated by the MM/PBSA method.
The free-energy analyses reveal that the binding specificity of EGFR inhibitors is largely determined by the total electrostatic interaction, and the binding strength by the van der Waals interactions. Some mutations do not introduce significant changes in terms of per-residue interactions and the overall binding affinities; others may result in evident changes in interactions as determined by free energy decomposition, which provides a clear explanation for their drug efficacy. The analyses should be able to offer insight into structure–affinity relationships at the molecular level. The method is able to rank binding affinities of one drug to multiple EGFR mutants, as well as the efficacy of drugs with respect to a single EGFR sequence. In recent research on breast cancer patients with poor prognosis , it was suggested that functional genomics analyses would be used to identify the drug-specific predictor for paclitaxel response. Optimal treatment of cancer patients depends on selection of an effective treatment regimen. Our simulations reveal that mutations in the tyrosine kinase domain may enhance or lessen the inhibitory effects of TKIs; they may thus be important for evaluating the cause of treatment success/failure, the selection of targeted drugs and the identification of more responsive patients. Therefore, it seems that it would probably be beneficial for cancer patients to have genotypic assays performed prior to treatment in a similar way to that which is routine for HIV/AIDS patients today .
The issue of reproducibility, crucial for the validity of the entire approach, is tied to the convergence of such calculations. A single MD simulation is unlikely to be reproducible. However, ensemble MD simulation provides greatly enhanced conformational sampling and far superior reproducibility. By contrast, extreme caution must be exercised when reported properties are extracted from single trajectory calculations.
Ensemble MD simulations explore conformational space more effectively than single long time-scale simulations. Given the vast number of cores on petascale supercomputers (Ranger, for example, has 62 976 processing cores), in principle, all replicas within a single ensemble simulation can easily be run concurrently in one day, with post-processing analysis completed within a further 24 h [14,25]. Our studies are facilitated by the use of BAC  and AHE , which permit rapid and automated simulations and analyses across multiple grid-based petascale computing resources. The ensemble method, therefore, has the potential to accurately rank drug binding affinities on clinically relevant timescales (2–3 days), opening the way for its potential use in a clinical setting to match proposed drug treatment to individual patients' genetic profiles .
This research was funded by the EU FP7 ContraCancrum project (ICT-2007.5.30) (http://www.contracancrum.eu/). Our work was partially supported by the National Science Foundation under NRAC grant MCA04N014, which provided access to Ranger at the Texas Advanced Computing Center (TACC), and by the DEISA Consortium (co-funded by the EU, FP7 project 222919) via the Virtual Physiological Human Virtual Community managed by the VPH Network of Excellence (IST-223920: www.vph-noe.eu), which gave access to HECToR, the UK's national supercomputer based in Edinburgh, and to Huygens, the Dutch national supercomputer at the SARA Computing and Network Center in Amsterdam. We also wish to acknowledge the UK NGS (http://www.ngs.ac.uk) for providing access to their resources and their support for this project.
- Received November 2, 2010.
- Accepted December 15, 2010.
- This Journal is © 2011 The Royal Society