As epidemiologists look at the large scale, closely watching for when coronavirus cases spike, structural biologists look at the really really tiny scale, sights set (and zoomed way in) on the coronavirus’ Spike protein. Even if you don’t know Spike by name (he often goes by his nickname S) I’m sure you’ve seen him around. He’s that protein that studs the virus’ oily lipid membrane, jutting out like a crown or halo, hence the viral family’s name “CORONAviridae.” He’s getting so much attention because he’s responsible for binding to cellular receptors and helping the virus get inside. So let’s take a (much, much) closer look.

The Spike protein is kinda like a molecular Transformer. It latches onto the cell’s angiotensin-converting enzyme 2 (ACE2) receptor, gets cleaved by cellular proteases (protein cutters) and then undergoes a major shape-shift (aka conformational change). This transformation from the “pre-fusion conformation” to the “post-fusion conformation” is kinda like snipping the string tying something up – it releases structural constraints so S can extend previously hidden parts of itself called fusion peptides, which “attack” the cellular membrane and pull it towards the viral one, getting them to fuse & allowing the virus to spill out its guts inside of our cells and hijack our cellular machinery to do its bidding. 

Just learning more about this process from a “basic science” perspective would be fascinating in and of itself, but there’s also “translational science” value (putting the knowledge to direct use). The best way to prevent a person from getting the disease Covid-19 is to prevent them from coming into contact with SARS-CoV-2, the novel coronavirus that causes it (this is why we’re doing all this social distancing stuff). But if the virus gets into someone’s body, it still has to get inside their cells. If you could block the Spike protein from binding and/or shape-shifting (such as with neutralizing antibodies) you could prevent the virus from getting in. For this reason, S has been of great interest for the development of vaccines (get someone’s body to produce their own such antibodies prior to real exposure) and therapeutic antibodies (give infected people pre-made antibodies to help prevent the virus from infecting more cells). 

So let’s take a look at this awesome protein at the structural level and what its structure, along with other experiments done on it, reveal. We can do this deeper look thanks to several structures of the Spike protein – alone or bound to ACE2 or antibodies – solved using x-ray crystallography or cryo-electron microscopy (cryoEM). 

We often talk about protein structure at different “levels.” Proteins are “polypeptides” – they’re long chains of many (poly) protein letters called amino acids connected through peptide bonds. Each amino acid has a generic backbone part that it can use to link together into these chains as well as a unique “side chain” (aka R group) that sticks off like a charm on a charm bracelet. These chains fold up into 3D structures that best accommodate all the different side chains (e.g. stick the + charged ones next to – charged ones and hide the ones water doesn’t like (hydrophobic ones) in the center of the protein where the watery solvent (the liquid surrounding the protein) doesn’t have to see them). These shapes are held together by interactions between just the amino acid backbones (secondary structure) as well as interactions that involve the side chains themselves (tertiary structure). 
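If it helps to see the “charm bracelet” idea spelled out, here’s a minimal Python sketch that labels each letter in a chain by its rough side chain type. The groupings are a common textbook simplification (they’re my assumption, not something taken from the structures discussed here), and the example chain is made up.

```python
# Rough textbook grouping of the 20 amino acids (one-letter codes) by side chain.
# Assumption: this is a simplification, e.g. histidine (H) is only sometimes charged.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("KR", "positive"),
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("AVLIMFWP", "hydrophobic"),
    **dict.fromkeys("STNQYCGH", "polar/other"),
}


def classify(sequence):
    """Return (letter, side chain class) pairs for a one-letter-code sequence."""
    return [(aa, SIDE_CHAIN_CLASS[aa]) for aa in sequence.upper()]


def hydrophobic_fraction(sequence):
    """Fraction of residues that fall in the hydrophobic group."""
    labels = [label for _, label in classify(sequence)]
    return labels.count("hydrophobic") / len(labels)


toy_chain = "MKDLVE"  # made-up sequence, for illustration only
for letter, label in classify(toy_chain):
    print(letter, label)
print("hydrophobic fraction:", hydrophobic_fraction(toy_chain))
```

Real folding depends on a lot more than a lookup table like this, but it gives a feel for the “which charms go where” problem the chain has to solve.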

Sometimes, multiple chains work together, as is the case with S. When you have such inter-chain interactions, that’s referred to as quaternary structure. S works as a homotrimer (3 copies of the same chain stuck together) that looks a bit like 3 heads of broccoli rubber banded together. The “crowns” are the S1 domains and the “stalks” are the S2 domains. Part of the S2 domain is embedded in the viral membrane. The part of the protein that sticks outside the virus (all of S1 & part of S2) is called the “ectodomain” (ecto- meaning outside, and domain meaning a specific region of a protein). 

Speaking of domains, an important one with the spike protein is the Receptor Binding Domain (RBD) which is located in the S1’s (there are 3 per trimer). The RBDs look a bit like “flaps” – they can hang down or stick up – and it’s this up position that exposes their binding site for the ACE2 receptor (this region of the RBD is called the receptor-binding motif (RBM)). Kinda like if you have your hands on your head & you want to do a basketball dunk you have to put your hands up before you jump, the spike protein has to put its RBDs up before it binds. And this has the added effects of destabilizing S1 and exposing its S2’ cleavage site, making it easier to cut off from S2, which is an important part of viral fusion (the process whereby the virus merges its own membrane with the cellular membrane so it can kinda “inside-out” itself into the general cellular interior (cytoplasm)). 

You can see the differences between the RBD “up” position (ready to bind!) and “down” position (not yet!) in the first structure of the SARS-CoV-2 S protein to be solved. Back in February, Jason McLellan’s lab at University of Texas at Austin used cryoEM to solve the structure of the ectodomain (the non-membrane-embedded part). Their speediness was in part because they’d already solved the structures of a couple of other coronaviruses’ spike proteins, including the one for the MERS virus. So they knew what they were doing and they knew a “trick” for getting the protein to better cooperate for picture-taking. They introduced a couple of mutations into the protein that they were expressing recombinantly (i.e. they put the gene for it into cells in a dish/flask and had the cells make the proteins which they could purify out and, in this case, they altered the DNA letters in the gene to alter the amino acid letters in the protein). These specific letter swaps stabilized the protein in the “pre-fusion” conformation – the shape of the protein before it docks onto our cells. 

note: for those of you who want the nitty gritty details, they mutated 2 residues to proline in the C-terminal S2 fusion machinery – proline’s a weird amino acid because its side chain interacts with its backbone making it “stiffer” than the other residues, so these proline substitutions make the pre-fusion conformation more stable. If you want even more nitty gritty details, check out their paper 
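For the code-inclined, here’s a small sketch of what “mutating 2 residues to proline” means at the sequence level. The `apply_substitution` helper, the toy sequence and the positions are all hypothetical placeholders for illustration; they are not the real Spike sequence or the real residue numbers.

```python
def apply_substitution(sequence, mutation):
    """Apply a point mutation written like 'K3P':
    old letter, 1-based position, new letter."""
    old, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1
    if sequence[index] != old:
        raise ValueError(
            f"expected {old} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new + sequence[index + 1:]


toy = "LDKVEA"  # made-up stretch of amino acids, NOT the real Spike sequence
stabilized = toy
for mutation in ("K3P", "V4P"):  # hypothetical positions, 2 swaps to proline (P)
    stabilized = apply_substitution(stabilized, mutation)

print(toy, "->", stabilized)
```

In the lab, the swap is made in the DNA recipe rather than in the protein itself, but the bookkeeping is the same idea: same chain, 2 letters changed.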

In this structure (deposited in the Protein Data Bank (PDB) with the ID 6VSB), 2 of the protomers (individual chains) in each trimer have their RBD in the down position (hands on head) and 1 of them is in the up position (hands up & ready to bind). Speaking of binding, the same group found that SARS-CoV-2’s S RBD binds ACE2 over 10X as tightly as the one from SARS-CoV, & Fang Li and his lab at the University of Minnesota in St. Paul used x-ray crystallography to help show why this might be. (Though it was later found that the full-length S protein of SARS-CoV-2 binds about as tightly as the full-length S of SARS-CoV, likely because its RBD is in the flap-down mode a lot of the time – more so than in the SARS-CoV spike protein – so the tighter binding when it gets the chance to bind is a sort of compensatory mechanism for not getting the chance as often.)

Li’s group solved the structure of the spike protein’s RBD bound to the ACE2 receptor. They used some crystallography “tricks” to help make the protein “stiffer” so it would crystallize and give clear data (which you can’t get if the protein’s flopping around, because crystallography requires lots and lots and lots of copies of the protein (or groups of proteins) to freeze in the exact same pose with the exact same spacing so you can beam them with x-rays and have those x-rays scatter the same way). Their “trick” involved using a piece of the SARS-CoV (original one) spike protein “below” the SARS-CoV-2 RBM (the part that actually binds the receptor), making a “chimeric RBD.”

When they compared this structure to the structure of plain SARS-CoV RBD + ACE2, they found that SARS-CoV-2 took a more compact shape that allowed it to make more contacts with the ACE2 receptor, helping the virus get a more secure grip.

Those were the first couple of structures, but there have also been several more, including a non-chimera SARS-CoV-2 spike RBD bound to ACE2 (solved by Jun Lan et al.), which you can see in the figures (PDB 6M0J). Note that in these structures, it isn’t the full ACE2 – ACE2 is a membrane-bound protein and they only used the N-terminal domain (NTD) – the “beginning end” of the ACE2 protein chain, which is the part that sticks out of the membrane and interacts with Spike’s RBD. 

Other structures show Spike bound to other things – including neutralizing antibodies. Antibodies are little proteins that specifically bind viral parts, such as viral proteins, like the spike protein. When someone’s infected, their body makes antibodies against the infector by mixing & matching constant & variable regions to find ones that specifically bind parts of the invader. This allows them to call in for backup from other immune system components when they find the virus. And neutralizing antibodies have the added bonus that they bind to the virus in such a way that the virus can’t get into cells at all – thus “neutralizing” the threat. Such neutralizing antibodies are therefore highly valued (whether person-made, lab made, or llama-made) and you can learn more about them here: 

Neutralizing antibodies often work by binding and thereby blocking the part of a virus that normally binds to the cell in an “I got here first!” fashion. For the spike protein, this means binding the Spike protein’s RBD. Some antibodies bind directly to the RBM (quick reminder since there’s been a lot of acronyms, the RBM is the receptor binding motif, the actual part of the protein that directly binds to the ACE2 receptor). It’s easy to see how these could block entry. But other neutralizing antibodies are sneakier – they bind the S protein elsewhere but keep the protein from shape-shifting. 

One challenge is that the virus doesn’t want your body to find it, so it uses tricks including glycosylation. When your immune system “sees” the spike protein, it doesn’t just see the amino acids. Instead, the surface of the protein has a bunch of sugar chains sticking off of it. This sugary shield helps protect it from our immune system – which, instead of seeing a foreign protein, sees the sugars our body uses all the time. This “glycosylation” is a form of “post-translational modification.” Post-translational just means it happens after translation, the process by which protein-making complexes called ribosomes use the virus’ genetic info as a recipe for sticking together the amino acids into chains. After getting chained together, some of S’s amino acids get glycosylated, which means that they have sugar chains latched onto them. 

Only a few amino acids have side chains that allow for this type of modification – for example, asparagine (Asn, N) has a nitrogen that sugar can be attached to through N-linked glycosylation. This can be added by our cells’ sugar-adders when you have the sequence Asn-something-Ser/Thr. And speaking of serine (Ser) & threonine (Thr), those 2 amino acids have -OH groups whose oxygen atoms can get glycosylated in O-linked glycosylation, which is less common.
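That Asn-something-Ser/Thr pattern is simple enough to hunt for with a few lines of Python. Here’s a minimal sketch, using a made-up sequence. One assumption beyond what I said above: many references also rule out proline as the middle “something,” so the hypothetical `find_sequons` helper has a switch for that. Keep in mind that finding the pattern only flags a *potential* site; it doesn’t tell you whether a sugar is actually attached.

```python
import re


def find_sequons(sequence, exclude_proline=True):
    """Return (1-based position, 3-letter site) for each N-X-S/T match.

    The lookahead makes the match zero-width so overlapping sites are found.
    exclude_proline=True applies the common extra rule that X is not proline
    (an assumption added here, not stated in the text above).
    """
    middle = "[^P]" if exclude_proline else "."
    pattern = re.compile(f"(?=N{middle}[ST])")
    return [
        (m.start() + 1, sequence[m.start():m.start() + 3])
        for m in pattern.finditer(sequence)
    ]


toy = "MNGTANPSLNNSTK"  # made-up sequence, for illustration only
print(find_sequons(toy))
print(find_sequons(toy, exclude_proline=False))
```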

Some of these sugars are too floppy to be seen in some of the structures, so to figure out where they’re located, scientists turned to a different technique – mass spectrometry. It’s a lot more complicated than this, but basically mass spec works by cutting something like a protein up into a bunch of pieces, “ionizing” them to charge them, then separating the charged parts by size (mass) and figuring out what they are based on their mass/charge ratio. If a sugar is on a protein part, that protein part will be bigger than expected and you can use this sort of thing to figure out what types of sugars are where. 
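Here’s a rough sketch of the “bigger than expected” logic in Python. It assumes the simplest case, where each charge comes from an added proton, and it uses standard monoisotopic masses for 2 common sugar building blocks. The peptide mass is a made-up number, and real glycoproteomics software does far more than this.

```python
PROTON_MASS = 1.007276   # Da, mass of each proton added during ionization
HEXNAC_MASS = 203.0794   # Da added per N-acetylhexosamine unit (e.g. GlcNAc)
HEXOSE_MASS = 162.0528   # Da added per hexose unit (e.g. mannose)


def mz(neutral_mass, charge):
    """Mass/charge ratio of a piece carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def glycan_mass(n_hexnac, n_hexose):
    """Mass added by a sugar chain made of the given numbers of units."""
    return n_hexnac * HEXNAC_MASS + n_hexose * HEXOSE_MASS


bare_peptide_mass = 1500.0  # hypothetical peptide mass in Da
sugar = glycan_mass(n_hexnac=2, n_hexose=5)  # an example chain: 2 HexNAc + 5 hexose

for charge in (1, 2, 3):
    print(
        f"charge {charge}: bare m/z {mz(bare_peptide_mass, charge):.3f}, "
        f"with sugar m/z {mz(bare_peptide_mass + sugar, charge):.3f}"
    )
```

The difference between the 2 columns is the clue: working backwards from that extra mass tells you roughly which sugar units are hanging off the peptide.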

Each S protomer has 22 potential N-glycosylation sites & 4 O-ones. And, although the glycosylation states found in a few studies vary a little, most, if not all, of the potential N-glycosylation sites are indeed glycosylated (though the composition of the different sugar chains can differ) – so with 3 protomers per spike you’re looking at around 66 N-linked sugar chains (22 x 3) per spike. This glycosylation, in addition to providing that camouflaging glycan shield, also earns the S protein the fancy title of “glycoprotein.” 

Another “title” you may have seen given to the spike protein is “class I fusion protein.” Other viruses use similar strategies for getting into cells – for example, the flu’s hemagglutinin (HA) protein is also in this class. I didn’t know much about these fusion proteins before, but they’re really quite amazing. Here’s a simplified overview. And here’s a link to an article with more details: 

A region called the fusion peptide is at their heart – literally! The fusion peptide is hydrophobic – water doesn’t want to hang out with it – so in the pre-fusion state it’s hidden deep inside the protein. But when you cut the top of the protein off and loosen things up, that lipid-loving peptide faces a watery environment, “sees” that nearby cellular lipid membrane and shoots out towards it, latching on, now connected on both ends. This “panic and grab” leaves it in an awkward position, so next up is getting comfy. To do this, it refolds into a newer, more compact & energetically-favorable, shape, pulling the cell membrane with it as it does. When 2 neighboring S proteins do this, they merge. Kinda like this = -> )( -> – . This merging dumps out the viral contents into the cell, where the virus can go to work getting our cells to go to work for it – making new copies of the virus and shipping them out. You can see the pre & post fusion states of SARS-CoV-2 in these cryoEM structures from Yongfei Cai & Bing Chen’s labs at Harvard, but as of writing this the structures haven’t yet been deposited in the PDB so we can’t play around with them yet: 

Since the Spike protein is the virus’ key to cellular entry, it’s been getting a lot of interest from a genomic and evolutionary level. You know how I said that S gets cleaved to trigger the major shape-shift where the fusion peptides shoot out? With the original SARS virus, this cleavage is carried out either by a plasma membrane-bound protease called TMPRSS2, which can do the cleavage early on, while the virus is still at the cell surface, or by a protease called cathepsin L, which can cleave it later, after the virus is “swallowed” into a membrane-bound pouch called an endosome. Such “endocytosis” is kinda like the cell pinching in the part of the plasma membrane containing the receptor-bound spike protein, giving you a little membrane-bound “quarantine” bubble inside the cell that the virus now has to escape from. Cathepsin L, which is activated by the low pH (acidic-ness) of the endosome, aids the escape by cleaving the Spike protein to enable fusion. Both of those proteases cleave a site called S2’, which all coronaviruses have, and which is crucial to allowing for fusion. 

That’s one of 2 cleavage sites – and the other one’s getting more attention. This “other site” is called S1/S2 because it’s at the border of the S1/S2 domains. And one of the first findings that scientists made about SARS-CoV-2 after sequencing its genome (genetic blueprint) is that it has “extra” letters at this second site, and those letters are ones a protease called furin likes. 

The S1/S2 site, which sits near the S2’ site, is a “polybasic cleavage site.” Basically, polybasic (aka multibasic) means that there are a lot of usually-positively-charged protein letters (amino acids) next to each other – in the case of SARS-CoV-2, there’s the sequence RRAR. And this type of sequence can serve as a cleavage site for a protease called furin. Scientists thought this could be important because the furin protease is much more “ubiquitous,” meaning that it’s expressed in a wider variety of cells, therefore making it easier for SARS-CoV-2 to infect various cells. 
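To make “polybasic” a bit more concrete, here’s a minimal Python sketch that counts basic letters and scans a sequence for furin-style patterns. Two assumptions that go beyond the text: I’m using R-X-X-R as the “minimal” furin pattern and R-X-(K or R)-R as the commonly quoted “preferred” one, and the G letters flanking RRAR in the toy sequence are made-up placeholders, not the real neighbors. The helper names are hypothetical too.

```python
import re

BASIC = set("KR")  # lysine & arginine; histidine left out since it's only sometimes charged


def basic_count(segment):
    """Number of basic (usually positively charged) residues in a segment."""
    return sum(aa in BASIC for aa in segment)


def find_minimal_furin_motifs(sequence):
    """Return (1-based position, 4-letter match) for every R-X-X-R stretch.
    Assumption: R-X-X-R is treated as the minimal furin pattern."""
    return [
        (m.start() + 1, m.group(1))
        for m in re.finditer(r"(?=(R..R))", sequence)
    ]


def is_preferred_furin_motif(segment):
    """True if a 4-letter segment fits R-X-[K/R]-R.
    Assumption: this is the commonly quoted 'preferred' pattern."""
    return re.fullmatch(r"R.[KR]R", segment) is not None


toy = "GGRRARGG"  # RRAR is from the text; the flanking G's are placeholders
for position, motif in find_minimal_furin_motifs(toy):
    print(position, motif, "basic letters:", basic_count(motif),
          "preferred pattern:", is_preferred_furin_motif(motif))
```

Under those assumptions, RRAR fits the minimal pattern but not the preferred one, since it has an A where a K or R would be expected.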

Scientists have now done a series of experiments showing that this furin site is important, but it’s not a major advantage in many cases. Basically, the furin cleavage at S1/S2 occurs while a new spike protein is being made. And it kinda loosens things up to make it easier for S2’ to be cleaved, which can make a difference in cells that don’t have a lot of TMPRSS2, but it doesn’t make a big difference if the cells do have a bunch of TMPRSS2.

Some people have claimed that this polybasic cleavage site is evidence of viral man-made-ness. But scientists were quick to point out that 1) this exact sequence had never been seen before and 2) it isn’t “ideal” – so “makers” 1) couldn’t have just “plagiarized” it from another virus and 2) wouldn’t have made up this sequence when they could have made up something better.

Speaking of better, I hope you now have a better understanding of Spike!

