Homology Model of SARS-CoV Mpro Protease

Ashley Wiley and Ghislain Deslongchamps*

Department of Chemistry, University of New Brunswick, Fredericton, N.B. CANADA    E3B 5A3

May 30, 2003


On May 1, 2003, the sequence of the Toronto strain (TOR2) of SARS coronavirus (SARS-CoV) was reported by Marra et al. (1) That same day, Rota et al. (2) at the CDC reported the sequence of the Urbani strain of SARS-CoV, which is essentially identical to that of the TOR2 strain. On May 12, Hilgenfeld et al. (4) deposited the x-ray structures of two non-SARS proteases in the PDB (3): Mpro from human coronavirus strain 229E and Mpro from porcine coronavirus strain TGEV (the latter complexed with a peptidomimetic chloromethyl ketone inhibitor). The following day, Hilgenfeld et al. published these x-ray structures (4) and used them as templates to derive a homology model of SARS-CoV Mpro.

Unaware of the Hilgenfeld publication, we independently developed a homology model of SARS-CoV Mpro based on the sequence information published by Marra et al. (TOR2 strain). The Hilgenfeld x-ray structures (PDB ID: 1P9S and 1P9U) were found via BLAST and used as templates in the alignment and modeling of SARS-CoV Mpro. After completing our homology modeling exercise, we encountered the Hilgenfeld et al. publication and noted that it included its own homology model, which had been deposited in the PDB. Our homology model of SARS-CoV Mpro, which differs somewhat from the Hilgenfeld structure, is reported here along with the corresponding atomic coordinates.


To extract the protein sequence of SARS-CoV Mpro from the SARS genome (Marra et al.), NCBI's online BLAST engine (http://www.ncbi.nlm.nih.gov/BLAST) was searched for protein sequences similar to the monomeric unit of TGEV Mpro (input in FASTA format). The reported SARS replicase 1ab (TOR2) sequence scored highly in the search (43% sequence identity, 61% positive sequence similarity, 4 residue gaps). To find sequences homologous to the SARS sequence with known Cartesian coordinates (for use as templates in homology modeling), the matching SARS TOR2 sequence was extracted from the BLAST report and submitted to the BLAST engine as a new query. Again, several results appeared, but only the TGEV Mpro sequence had a resolved 3D structure (perhaps the coronavirus 229E Mpro structure had not yet been linked to the BLAST sequence library).
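As a rough illustration of how figures such as "43% sequence identity" and "4 residue gaps" are derived from a pairwise alignment, the following minimal Python sketch counts identities and gap positions in two pre-aligned sequences. The function name and the toy sequences are hypothetical (they are not Mpro fragments), and the sketch does not reproduce BLAST's "positives" score, which requires a substitution matrix.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Summarize two pre-aligned, equal-length sequences ('-' marks a gap)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identities = 0
    gaps = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1          # a gap in either sequence counts once
        elif q == s:
            identities += 1    # identical residues at this column
    length = len(query)
    # Identity is reported over the full alignment length, gaps included.
    percent = 100.0 * identities / length if length else 0.0
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": percent,
    }


# Toy, made-up sequences for illustration only.
stats = alignment_stats("ACDEFG-HIK", "ACDQFGLHLK")
print(stats)
```

Here the percentage is taken over the full alignment length, which is an assumption about the convention behind the reported values.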

Protein monomers from the TGEV and 229E proteases were used to create a homology model for SARS TOR2 Mpro. All homology modeling was carried out with the Molecular Operating Environment (MOE) (5). The sequences were first aligned in MOE, and the model was then calculated with chain B of TGEV (1P9U.B) as the primary template. Twenty-five intermediate models were created, each coarsely energy-minimized to relieve steric clashes. The best intermediate was then finely energy-minimized using the MMFF94s force field.

Visual comparison of the lowest-energy homology model with chains 1P9U.B (TGEV monomer) and 1P9U.G (substrate-analog hexapeptidyl chloromethyl ketone inhibitor) showed some steric incompatibility between the inhibitor and those residues of the homology model that differ from chain 1P9U.B, particularly Pro66, Met63, and Gln82. Thus, a short molecular dynamics simulation was carried out on the homology model in MOE to relax its structure, limited to a 12-angstrom radius from the ligand (1P9U.G).
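The restriction of the simulation to a 12-angstrom radius around the ligand can be pictured with a minimal Python sketch. It is not the MOE procedure: it assumes that a residue is selected when any of its atoms lies within the cutoff of any ligand atom, and the function name, residue labels, and coordinates are all hypothetical.

```python
import math


def residues_within(protein_atoms, ligand_atoms, cutoff=12.0):
    """Return sorted ids of residues with any atom within `cutoff` angstroms
    of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z))
    ligand_atoms:  iterable of (x, y, z)
    """
    ligand_atoms = list(ligand_atoms)
    selected = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in selected:
            continue  # residue already selected via another atom
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            selected.add(residue_id)
    return sorted(selected)


# Made-up coordinates, for illustration only.
protein = [
    ("res1", (3.0, 0.0, 0.0)),
    ("res2", (0.0, 11.5, 0.0)),
    ("res3", (0.0, 0.0, 12.5)),
    ("res4", (30.0, 0.0, 0.0)),
]
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

flexible = residues_within(protein, ligand, cutoff=12.0)
print(flexible)
```

In such a scheme, residues outside the selection would be held fixed while the selected ones are allowed to move.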


The original homology modeling procedure produced 25 intermediate structures. Most of each structure remained highly conserved with respect to the homology templates, but there were three regions of substantial variability between intermediates (Fig. 1). Comparison of the final SARS-CoV Mpro homology model (after fine-grained MM optimization) with the primary template revealed striking similarities in the backbone structures (Fig. 2) and some variation in residue side-chains near the active site.

Fig. 1 - Backbone representation of homology model intermediates. Most of the structure is conserved, but there are three noticeable areas of variability.
Fig. 2 - Backbone representation displaying the difference between the final homology model (green) and the primary template (red; TGEV, PDB ID: 1P9U, chain B).

Residues Pro66, Met63, and Gln82, in particular, were found to clash with the TGEV inhibitor (Fig. 3), suggesting some minor incompatibility between the homology model and the inhibitor pose found in TGEV (Fig. 4).

Fig. 3 - Stick representation displaying those residues that differ between the homology model and the primary template (red: TGEV Mpro; yellow: SARS-CoV Mpro homology model; green: TGEV Mpro inhibitor).
Fig. 4 - Analytic Connolly surface representation of the TGEV Mpro active site with bound peptidomimetic chloromethyl ketone inhibitor.

Thus, a molecular dynamics simulation was carried out on all residues within 12 Å of the inhibitor (Fig. 5). This resulted in a shift of the backbone away from the inhibitor near Pro66, and in rotation of the side-chains of Met63 and Gln82 to accommodate the inhibitor (Fig. 6, 7). This not only reduced the steric clash between the active site and the substrate analog, but also ensured an energetically favorable protein conformation of the homology model.

Fig. 5 - Backbone line representation of the difference between the homology model (purple) and the MD-minimized model (cyan). In yellow is the TGEV protease inhibitor.
Fig. 6 - Analytic Connolly surface representation of the unminimized SARS-CoV Mpro homology model active site with bound peptidomimetic chloromethyl ketone inhibitor.
Fig. 7 - Analytic Connolly surface representation of the MD-minimized SARS-CoV Mpro homology model active site with bound peptidomimetic chloromethyl ketone inhibitor.

Superposition of the final MD-optimized structure with the Hilgenfeld et al. homology model revealed a strong similarity between the two (Fig. 8). The superposition RMSD was calculated to be 2.57 Å.

Fig. 8 - Backbone line representation of the difference between the MD-minimized homology model (red) and Hilgenfeld's model (green). TGEV inhibitor in blue.
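For readers unfamiliar with the RMSD figure quoted for the superposition, the quantity is

RMSD = sqrt( (1/N) * sum_i |a_i - b_i|^2 )

over N paired atoms. The minimal Python sketch below evaluates it for two coordinate sets that are assumed to be already superposed and matched atom for atom; the text does not state which atoms were used, and the function name and coordinates here are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Assumes the structures are already superposed and that the i-th
    atom of coords_a corresponds to the i-th atom of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        squared_sum += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Made-up coordinates: every atom is displaced by 2 angstroms along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
value = rmsd(model_a, model_b)
print(value)
```

The sketch performs no fitting; the optimal rigid-body superposition has to be computed beforehand by the modeling software.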

Here are the atomic coordinates for the aforementioned homology models (PDB format):

compression | homology model | MD-relaxed homology model (with TGEV inhibitor)
none        | X              | X
ZIPped      | X              | X


A homology model of SARS-CoV Mpro was derived using two crystal structures recently deposited in the Protein Data Bank. Comparison with the homology model presented by Hilgenfeld et al. revealed a strong similarity between the two, but obvious differences are visible in the active site region. The MD-relaxed homology model reported herein displays a wider groove, similar to that of the TGEV Mpro-inhibitor complex, allowing for a better fit of the inhibitor. It would be interesting to compare these homology models with the experimental structure of SARS-CoV Mpro and/or its inhibitor complexes should crystallographic data become available.

* Author to whom correspondence should be addressed, ghislain@unb.ca.


1. www.sciencexpress.org / 1 May 2003 / 10.1126/science.1085953

2. www.sciencexpress.org / 1 May 2003 / 10.1126/science.1085952

3. rcsb.pdb.org, PDB ID: 1P9S and 1P9U.

4. www.sciencexpress.org / 13 May 2003 / 10.1126/science.1085658

5. Molecular Operating Environment 2003.02, The Chemical Computing Group Inc., 2003.