This is the SARS-CoV-2 main protease (MPro). SARS-CoV-2 is the novel coronavirus responsible for causing the pandemic disease Covid-19. This virus needs this protease in order to make functional proteins, so if scientists can develop drugs to stop the protease, they can stop the virus from doing damage. Today I want to tell you about what this structure shows us and how scientists around the world are using it (and others) to try to custom design MPro inhibitors to serve as treatments for Covid-19.

The structure in the first figure was the first structure of a SARS-CoV-2 protein to be published (it was released February 5 by Zihe Rao and Haitao Yang’s research team at ShanghaiTech University). I remember the first (or second?) time I saw an image of it. It was back in February – on Valentine’s Day – and it was during a presentation. In an auditorium. With PEOPLE! Seems like forever ago… Less long ago, I went to the PDB (Protein Data Bank), the online repository for protein structural data, to find this structure – and I was surprised to find not just 1 structure of it, but 80+!

One of the weird silver linings of the very very very dark cloud that is the coronavirus pandemic is that it’s shone light on one of my favorite fields of science, structural biology. You’ve likely been seeing a lot of “pictures” of viral proteins, and it’s structural biologists’ job to take those molecular “pictures” (using methods like x-ray crystallography and cryo-electron microscopy (cryoEM)) and study what those “pictures” reveal about how the molecules’ form (structure) relates to the molecules’ functions. You can learn a lot about how a protein works by seeing what it looks like (imagine seeing a picture of an open Swiss army knife). And you can also learn about how a protein might “not work” if you can tie up or hide certain parts with another molecule, like a pharmaceutical drug.

Going back to our Swiss army knife analogy, it’s like if you see a corkscrew and you want to prevent people from de-corking bottles – you could design a “cap” that covers up the pointy tip of the corkscrew. Similarly, if you can see what a protein’s “active site” looks like, the part where the protein “does stuff” (e.g. the place in the protease where it grabs onto and cuts the polypeptide), you can better design a drug that binds there and blocks it. 

Instead of designing from scratch, scientists often start by screening pre-existing drugs (some of which are already approved for treating other diseases) through a compound library screen, or by screening pieces of drugs (a fragment screen). As I will tell you about in more detail later on, the first group used the first way – they tested pre-existing drugs and found a couple of promising leads. And an international group followed up with a massive x-ray crystallography-based fragment screen and a crowd-sourced chemical design effort called the COVID Moonshot.

As you can see in the figures, MPro doesn’t look like a corkscrew. Instead it looks like a heart. At the heart of the protease’s heart shape is its “primary structure” – which is just a fancy way of saying its sequence of protein letters (amino acids). All proteins are made up of the same 20 (common) amino acids, each of which has a generic backbone part that allows it to link to 2 other amino acids to form chains as well as a unique part called a side chain or “R group” that sticks out (kinda like a charm on a charm bracelet). We can refer to amino acids by their full names (e.g. Alanine), their 3-letter nicknames (e.g. Ala), or their initial (e.g. A).
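If it helps to see those three naming schemes side by side, here’s a minimal Python sketch of a lookup table for the 20 common amino acids. The names `AMINO_ACIDS`, `THREE_TO_ONE`, and `three_to_one` are just things I made up for illustration. One thing it makes obvious: the 1-letter code isn’t always the first letter of the name (e.g. Trp is W and Lys is K), because several amino acids start with the same letter.

```python
# Full name, 3-letter code, and 1-letter code for the 20 common amino acids.
AMINO_ACIDS = [
    ("Alanine", "Ala", "A"), ("Arginine", "Arg", "R"),
    ("Asparagine", "Asn", "N"), ("Aspartate", "Asp", "D"),
    ("Cysteine", "Cys", "C"), ("Glutamine", "Gln", "Q"),
    ("Glutamate", "Glu", "E"), ("Glycine", "Gly", "G"),
    ("Histidine", "His", "H"), ("Isoleucine", "Ile", "I"),
    ("Leucine", "Leu", "L"), ("Lysine", "Lys", "K"),
    ("Methionine", "Met", "M"), ("Phenylalanine", "Phe", "F"),
    ("Proline", "Pro", "P"), ("Serine", "Ser", "S"),
    ("Threonine", "Thr", "T"), ("Tryptophan", "Trp", "W"),
    ("Tyrosine", "Tyr", "Y"), ("Valine", "Val", "V"),
]

# Map each 3-letter code to its 1-letter code.
THREE_TO_ONE = {three: one for _, three, one in AMINO_ACIDS}


def three_to_one(residues):
    """Turn a list of 3-letter codes (any capitalization) into a 1-letter string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)
```

So `three_to_one(["Ala", "Val", "Leu", "Gln"])` gives `"AVLQ"`.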

Proteins fold up in a way that makes all of their amino acids happy (e.g. put – charged ones next to + charged ones, let the water-loving hydrophilic ones hang out near the surface and hide the water-avoiding hydrophobic ones in the center). Since different proteins have different numbers and combos of amino acids, their amino acid “happy places” will be different, so proteins have unique 3D shapes. And, just like a spoon, a fork, and a knife have shapes that suit their purposes, proteins do too. So having these different shapes enables them to do different things like cut up other proteins!

And this is the task given to SARS-CoV-2’s Main PROtease (Mpro) (aka 3CLpro). The virus uses a lot of our cells’ proteins, and, by providing the instructions in its RNA, it gets our cells to make the proteins it needs but we don’t have. It makes some of these (the nonstructural proteins, aka Nsps) as a long polyprotein (pp1a or pp1ab depending on where the protein-making complex (ribosome) stops reading). This polyprotein, as the name suggests, has many (poly) proteins (around a dozen) that are all connected because the ribosome makes them as one continuous chain. So it’s up to the virus’ proteases (protein “scissors”) to recognize where one protein ends and another starts and cut them apart into individual proteins. The virus needs to get us to make these proteases too – as part of the polyprotein, in fact, but they’re able to cut themselves out. There are 2 viral proteases: the main protease Mpro (nsp5) and a papain-like protease (PLpro), which is responsible for the first few cuts (separating nsp1/2, 2/3, and 3/4), with Mpro handling the rest.
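To make the “find the boundary, then snip” idea concrete, here’s a toy Python sketch. Big assumption: I’m using a simplified recognition rule (cut after Leu-Gln when the next amino acid is Ser, Ala, or Gly). That’s a commonly described MPro preference, but it isn’t something this post goes into, and the real protease “reads” more than 3 letters. The sequence is made up and `cleave` is a hypothetical helper, so treat this as a cartoon, not a cleavage-site predictor.

```python
def cleave(polyprotein):
    """Split a 1-letter amino acid string at every (assumed) LQ|S, LQ|A, or LQ|G site.

    The cut happens right after the Q, so "LQ" stays on the upstream piece.
    """
    pieces = []
    start = 0
    for i in range(2, len(polyprotein)):
        # Two letters before position i are "LQ" and the letter at i is S, A, or G.
        if polyprotein[i - 2:i] == "LQ" and polyprotein[i] in "SAG":
            pieces.append(polyprotein[start:i])
            start = i
    pieces.append(polyprotein[start:])
    return pieces


# A made-up mini "polyprotein" containing two cut sites.
toy_polyprotein = "MKTAVLQSGFRKAATLQAEEW"
print(cleave(toy_polyprotein))
```

With that toy sequence, you get 3 pieces: `MKTAVLQ`, `SGFRKAATLQ`, and `AEEW`.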

Our cells don’t make these proteases, or even any very similar ones, so they represent good potential drug targets: if scientists can design some compound that binds and inhibits these proteases, it could be used to hurt the virus without hurting us. This strategy isn’t new. In fact, some of the components in the treatment mixtures for HIV are protease inhibitors. Earlier on in the pandemic, scientists tested a couple of these HIV inhibitors, lopinavir/ritonavir (trademark Kaletra), to see if they would also inhibit SARS-CoV-2’s proteases, but no luck.

The failure of those HIV protease inhibitors highlights the importance of finding out more about the structure of the SARS-CoV-2 MPro. So let’s look closer at the heart, which actually comes in 2 parts – it’s a dimer. Specifically, it’s a homodimer, which means that it has 2 copies of the same protein chain. On their own, these chains are called protomers – and they have to come together for the protease to be fully functional, because they stabilize each other’s active sites (one copy’s Ser1 interacts with the other copy’s Phe140 and Glu166, thereby stabilizing the S1 subsite, in case you were wondering). More about what these “subsites” mean in a second.

Each protomer has 3 main “sections” aka “domains.” First (N-terminal-most) is Domain I (residues 8–101), followed by Domain II (residues 102–184). Both of these have an antiparallel β-barrel structure, kinda like if you took a long strip of paper, folded it zig-zag (↑↓↑↓) and then brought the ends near each other to form a tube. Then comes a flexible linker leading to Domain III (residues 201–303), which, instead of strands, has spring-shaped α-helices – 5 of them, mostly antiparallel too.

The active sites (where these protein scissors’ “blades” are) are nestled in between domains I & II, with the cutting occurring thanks to the teamwork of a couple of the amino acids whose side chains stick out into the active site: a Cys-His catalytic dyad (Cys145 & His41). Each protomer has one of these active sites, so the protease has 2 sites open for binding substrate (thing to cut) or inhibitor. These substrate-binding pockets are talked about as a series of subsites (e.g. S1, S2…) which bind consecutive amino acid residues in the substrate (in this case the polyprotein), or different chemical groups of inhibitors. If an inhibitor could get there first, it could prevent the polyprotein from binding – and – if an inhibitor could permanently stick onto that catalytic Cys, it could permanently blunt the protease’s scissors. Such “permanent sticking” is called covalent bonding, and I’ll tell you more about it later on. But first let’s talk about how scientists are looking for inhibitors.
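For anyone who likes to poke at this with code, here’s a minimal Python sketch that uses the domain boundaries given above to look up which domain a residue number falls in. The names `DOMAINS` and `domain_of` are hypothetical. Residues outside the 3 listed ranges (the very start of the chain, the linker, the very end) just come back as `None`.

```python
# Domain boundaries (inclusive residue numbers) as given in the text.
DOMAINS = {
    "Domain I": (8, 101),
    "Domain II": (102, 184),
    "Domain III": (201, 303),
}


def domain_of(residue_number):
    """Return the name of the domain containing this residue, or None."""
    for name, (first, last) in DOMAINS.items():
        if first <= residue_number <= last:
            return name
    return None


# The two members of the catalytic dyad.
for label, number in [("His41", 41), ("Cys145", 145)]:
    print(label, "is in", domain_of(number))
```

His41 lands in Domain I and Cys145 in Domain II, which fits with the active site sitting in the cleft between those 2 domains.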

Diamond Light Source, a synchrotron facility in the UK, did one of those fragment screens using x-ray crystallography, that cool technique I was telling you about yesterday: 

They did what they call an XChem fragment screen. They took a huge library of chemical pieces (not full drugs but instead just parts of them), got MPro to crystallize, and soaked the MPro crystals in those pieces. Then they collected x-ray diffraction data from the crystals, worked out their structures, put them online, and sent out a call for help, launching a crowdsourced initiative to use the structures of the fragment-bound crystals to try to design therapeutic drugs that could inhibit the protease. Which pieces bound the active site? Which pieces could you combine? How could you modify the bound pieces to make them bind better?

On March 6, they released data from their first 600 crystals, and then on March 17, they released the rest – 1500 in total. Of course, not all of them had fragments bound to “useful” regions – but they identified 68 “hits of high interest” including 44 covalently bound to the active site (stuck through strong “irreversible” bonds) and 22 compounds that were non-covalently bound (stuck through weaker bonds). 

Chemists around the world submitted their ideas to a website and experts then looked through them and picked out what they thought were the most promising. In addition to considering binding optimality, these “judges” took into account things like ease of synthesis (it might look great on screen, but you have to be able to actually make it!) and potential toxicity (although you can’t know how toxic something is until you test it, some toxic effects can be predicted based on certain chemical groups the molecules have and similarities to other known drugs). 

They got over 3500 submissions & the top candidates were then synthesized by a custom-synthesis firm and then put through another screen to test for binding activity. If the compounds bind well, they might then move on to seeing if they can stop protease activity in a dish, if they’re toxic in animals, and if they help animals with the disease.

Once they have an initial “hit” they can further modify it as needed, adding little pieces here or there to try to give it better solubility, lower toxicity, etc., all the while using that structural information to know which parts they can’t change without messing up the binding ability. More on the project here: The project is actively working (and seeking funding) so if you are able/interested, you can donate to the Covid Moonshot project’s GoFundMe site: 

Now I want to tell you more about that first MPro structure, which is actually the structure of MPro bound to an inhibitor, N3 – and it’s going to get really geeky and I will try my best to explain, but sincere apologies if you get lost! That group at ShanghaiTech University was able to get their work out so quickly because they were used to working on coronavirus proteases. They even had a compound that they’d previously designed when looking for drugs to inhibit other coronavirus Mpros. They decided to test this compound, N3, to see if it could inhibit Mpro – and it did (at least in a test tube). 

They knew this because they developed a FRET-based reporter peptide (chain of amino acids) that mimics one of the protease’s natural cut sites (in this case the N-terminal “auto-cleavage site” that the protease uses to cut itself out of the polypeptide chain). On one end of the peptide is a fluorophore (a part that can let off light) and near the other end is a quencher that can prevent light from being given off – but only if they’re close together. This is because of a phenomenon called Förster Resonance Energy Transfer and it’s really cool but too complicated to go into detail here so check out this post if you’re interested: 

For now, just know that if this peptide gets cut by the protease, the fluorophore and the quencher will be separated, so the quencher can no longer block the fluorophore and you will see light – and this tells you that the protease is active. If you add an inhibitor that works, the peptide stays intact and the signal stays dark.
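In practice, the readout from an assay like this is how fast the fluorescence signal changes over time, with and without a candidate inhibitor. Here’s a minimal Python sketch of turning that into a “percent inhibition” number. The readings are invented, the function names `slope` and `percent_inhibition` are hypothetical, and this is a generic calculation, not necessarily the exact analysis the ShanghaiTech team used.

```python
def slope(times, readings):
    """Least-squares slope of fluorescence readings versus time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(readings) / n
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, readings))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator


def percent_inhibition(times, no_inhibitor, with_inhibitor):
    """Compare how fast the signal changes with vs. without the inhibitor.

    0 means the inhibitor did nothing; 100 means the protease was fully stopped.
    """
    control_rate = slope(times, no_inhibitor)
    inhibited_rate = slope(times, with_inhibitor)
    return 100.0 * (1.0 - inhibited_rate / control_rate)


# Invented readings (arbitrary fluorescence units) at 0, 1, 2, and 3 minutes.
minutes = [0, 1, 2, 3]
protease_alone = [100, 110, 120, 130]
protease_plus_drug = [100, 102, 104, 106]
print(round(percent_inhibition(minutes, protease_alone, protease_plus_drug), 1))
```

I used the ratio of the 2 rates so that the units of the plate reader cancel out and you’re left with a simple percentage you can compare across compounds.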

N3 passed this test and they were able to solve the structure of the SARS-CoV-2 Mpro in complex with N3. This structure (6LU7 in the PDB) showed that N3 binds the active site covalently through something called Michael addition of the vinyl group at the S1 subsite. I’m not going to try to explain that here, sorry, just wanted to put that out there in case people were wondering.

They then did a “virtual” aka “in silico” screen to find potential other drugs that would fit into the protease’s active site based on whether their structures matched well (could they “dock” a drug into the substrate-binding pocket?). They got some hits this way and they also did a high-throughput physical screen of ~10,000 compounds (approved drugs, clinical trial drug candidates, natural products) with their FRET assay, and then followed up on their hits. A couple that they found most interesting were PX-12, carmofur, & ebselen.

They recently (May 7) published another paper with the results of their follow-up on carmofur, including a crystal structure of it bound to MPro.

Carmofur would be great because it is already an approved treatment for some cancers. In those cases, it’s thought to work by targeting thymidylate synthase (involved in DNA-letter-making) and/or acid ceramidase, which is important for lipid-y stuff.  

Carmofur (1-hexylcarbamoyl-5-fluorouracil) is a derivative of 5-fluorouracil (5-FU), which is actually a nucleotide analog (it looks like a DNA letter) that can be used as a mutagen (introduces DNA mutations). But in its original form, carmofur isn’t gonna fool any DNA copiers (polymerases) until it gets metabolized – it clearly is not a DNA letter – it has a long hydrocarbon chain sticking off of it. In between the DNA-base-look-alike part and this chain is an amide group (a carbonyl (carbon double bonded to an oxygen) next to a nitrogen).

That carbonyl carbon is really electrophilic – atoms (like those individual carbons and oxygens) join together to form molecules by sharing pairs of negatively-charged subatomic particles called electrons, but some don’t share fairly. Oxygen is a major electron hog (it’s highly electronegative) so, in the carbonyl, it draws the shared electrons closer to it, making the oxygen partly negative and the carbon partly positive. The carbon thus “wants” more electrons so it’s called electrophilic.

The active site’s catalytic cysteine has a sulfur that wants to share electrons, and that sulfur wants it more than the 5-FU part, so the sulfur attacks that site, kicking off the DNA-base-look-alike part, with the chain part staying stuck. Hydrocarbon chains like this are hydrophobic – water doesn’t like to hang out with them – so they end up hanging out with other hydrophobic things. And carmofur’s fatty acid chain finds a nice hydrophobic pocket in the MPro S2 subsite. Note: This is different from how N3 gets stuck on – same cysteine, and covalent bonds both times, but different attack strategies.

The group showed that carmofur could inhibit SARS-CoV-2, but it’s not “great” as an inhibitor – thankfully, the crystal structure offers hints as to where medicinal chemists might be able to make changes to the molecule to make it more effective, similarly to how we were talking about with the crystal screen. 

Hope that wasn’t too geeky – but I think it’s so fascinating! If you want to play around with the molecular structures, you can search those 4-character PDB codes (6LU7, etc.) – even if you just type that into Google, it should take you to the Protein Data Bank & then you can play around (I played around & made the figures in a software called ChimeraX, but there are built-in 3D viewers on the PDB websites). Have fun!
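And if you’d rather play around with code than click through a viewer, here’s a minimal Python sketch for pulling a structure file by its PDB code and checking which amino acid sits at a given position. A few assumptions to flag: the download address follows the `files.rcsb.org/download/<code>.pdb` pattern used by the RCSB site, the parsing relies on the fixed-width column layout of the legacy PDB file format, and the function names are all hypothetical. It only uses Python’s standard library, so it’s a bare-bones sketch, not a substitute for a proper structure-handling package.

```python
from urllib.request import urlopen


def pdb_url(pdb_id):
    """Build the (assumed) RCSB download address for a 4-character PDB code."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_pdb_text(pdb_id):
    """Download the PDB-format file as text (needs an internet connection)."""
    with urlopen(pdb_url(pdb_id)) as response:
        return response.read().decode("ascii")


def parse_atom_line(line):
    """Parse one fixed-width ATOM/HETATM record; return None for other lines."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "atom": line[12:16].strip(),      # atom name, e.g. SG
        "residue": line[17:20].strip(),   # 3-letter residue name, e.g. CYS
        "chain": line[21],                # chain identifier, e.g. A
        "number": int(line[22:26]),       # residue number, e.g. 145
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def residue_names(pdb_text, chain):
    """Map residue number -> 3-letter residue name for one chain."""
    names = {}
    for line in pdb_text.splitlines():
        atom = parse_atom_line(line)
        if atom is not None and atom["chain"] == chain:
            names[atom["number"]] = atom["residue"]
    return names
```

For example, `residue_names(fetch_pdb_text("6LU7"), "A")` should give you a dictionary you can check at positions 41 and 145 (assuming, as I am here, that the protease is chain A in that file).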

