Proteases! The coronavirus has made proteases “popular” in the media, but these “protein scissors” have been popular in biochemistry labs for a lot longer! To save space, the coronavirus, as well as other viruses like the Tobacco Etch Virus (TEV), whose protease I purified a couple days ago, makes its proteins as a big chain then uses a protease to cut them apart (kinda like Pillsbury making a giant cinnamon roll tube thing then slicing it). The protease has to know precisely where to cut, so it has a specific sequence it recognizes and cuts at. If we stick this sequence somewhere else, it will still cut there.

So we can stick it between a fusion tag and our protein to remove the fusion tag. Fusion tags can help with solubility & expression (things like SUMO, GST, & MBP basically trick the cells by having them start out making something they like to make, and then your protein sneaks in once the “foot’s in the door”) and affinity purification (things like His & strep tags bind to matching groups attached to chromatography resin), but they can get in the way later on. You can buy these proteases, but if you use them a lot and are in a protein biochemistry lab, you can purify your own. And I use a lot of TEV, so I purify my own. And I’m happy because I verified that this latest batch is active.

First, some quick terminology notes: Proteases & peptidases cut apart protein letters (amino acids). Sometimes they’re referred to as proteases other times peptidases and most of the time the terms are basically used interchangeably. Proteins are just long chains of amino acid “letters” linked together through peptide bonds – many peptide bonds – thus polypeptides… that fold up & do cool stuff in your cells. So even if you start with something you’d consider a protein, chew it up enough and the little pieces you’re left with are just “peptides.” So “peptidase” is a broader term that can apply to any cutter of peptide bonds – whether in short peptides or full-on proteins.

Within the broad class of proteases/peptidases, you can further classify them into exopeptidases (protein chewers) and endoproteases (protein scissors). “Endo” means it DOES NOT cut ends. It cuts in the “middle” of a sequence – you can remember this by thinking of an endoscopy. If you have an endoscopy your doctor sticks a tube with a camera *inside* you to look at your insides. If you use an endo-something-ase, you cut that something *inside* of its sequence. On the other side of things, when scientists look for exoplanets, they’re looking for planets *outside* our solar system. When scientists use exo-something-ases they’re using enzymes (biochemical reaction mediators/speed-uppers) to chew that something from the outside ends.

They all have different uses inside of your body, and many have features that make them useful in the lab – though the versions we use in the lab are usually bacterial or viral versions. Companies make and purify a lot of them and then sell them to you. One of the awesome things about being a protein biochemist is that, for enzymes you use a lot (and that other people usually view as “reagents” to buy), you can save a lot of money by expressing & purifying them yourself and cutting out the significantly-swipey middleman (more on this in a sec).

You might be more familiar with nucleases – nucleases cut nucleic acids, which is the umbrella term for RNA (RiboNucleic Acid) and DNA (DeoxyriboNucleic Acid) – nucleases cut apart their nucleotide “letters”. “Restriction enzymes” are endonucleases that recognize specific sequences of DNA & cleave them – they’re useful for things like molecular cloning (moving pieces of DNA from one place to another) or mystery-solving – you can use analytical digests for things like seeing if those sequences are present (maybe in one suspect but not another, or in patients with a disease but not those without) – restriction fragment length polymorphism (RFLP) analysis can be used to cut (or try to cut) DNA & see what pieces you end up with. more on that here: 

Endoproteases are like restriction enzymes but for proteins not DNA – well, the site-specific ones at least. Endoproteases like TEV & HRV3C proteases recognize specific sequences of amino acids (protein letters) and cut them – useful for things like cleaving off affinity tags we attach to proteins to help us purify them.

Different ones have a range of specificities – those 2 are super specific – which is what you need for things like cleaving off those affinity tags (most of the time with protein purification you’re trying to avoid proteases – you’re spending all your time trying to isolate and protect a precious protein so if you introduce scissors in there they better be really careful cutters!)

Some of the ones we work hard to avoid during protein purifications are the peptide “chewers” – exopeptidases which chew off the ends one amino acid at a time. The analogous enzymes for nucleic acids are called exonucleases. Proteins have “directionality” – a front end & an end end and because these 2 ends are different, they require different chewers. The front end of proteins is the N terminus. It gets this moniker because it has a free amino group, characterized by a Nitrogen (N) linked to hydrogens. At the end end is the C terminus, which has a Carboxyl group (C=O)-OH. Aminopeptidases chew from the N end and carboxypeptidases chew from the C end. And we say proteins go “from N to C” (and nucleic acids go from 5’ to 3’ in case you were interested…) 
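If it helps to see the “chewing from different ends” idea spelled out, here’s a tiny Python sketch. It just treats a peptide as a string written N to C and nibbles letters off one end. The sequence is made up, and real exopeptidases are pickier about which residues they’ll remove, so take it as a cartoon of the directionality and not a model of any particular enzyme.

```python
def chew(sequence: str, n_cycles: int, end: str) -> str:
    """Remove up to n_cycles letters, one at a time, from the N or C terminus."""
    for _ in range(n_cycles):
        if not sequence:
            break  # nothing left to chew
        if end == "N":    # aminopeptidase-like: chew from the front end
            sequence = sequence[1:]
        elif end == "C":  # carboxypeptidase-like: chew from the end end
            sequence = sequence[:-1]
        else:
            raise ValueError("end must be 'N' or 'C'")
    return sequence


peptide = "MKTAYIAK"  # made-up peptide, written N to C
print(chew(peptide, 3, "N"))  # AYIAK
print(chew(peptide, 3, "C"))  # MKTAY
```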

When we lyse (break open) cells to purify out the protein we’ve had them make for us, we add protease inhibitors to protect our protein. But protease inhibitors can also be used as antiviral drugs because lots of viruses, like the novel coronavirus, SARS-CoV-2, make “polyproteins” – they make their proteins as a long chain of connected proteins with cleavage sites in between them & they make the specific endoprotease that recognizes those cleavage sites. Block that protease & the virus can’t separate its proteins.

So that’s why proteases have been in the news a lot lately – scientists are working to design protease inhibitors that specifically block the coronavirus’ proteases (it uses 2 such proteases – the main protease, MPro, & the papain-like protease, PLPro). For example, the COVID Moonshot project is a crowdsourced effort using information gotten through fragment-based compound screening to design compounds that connect and/or modify “drug parts” to make “full drugs.” More here: 

But endoproteases have “always” been the talk of the town in protein biochemistry. Because we can *use* (different) viral proteases for other purposes – stick the cleavage site it recognizes somewhere you *want* to get cut, then add that matching protease to cut there instead. This is how we get things like TEV (TEV stands for Tobacco Etch Virus) protease & HRV3C protease (HRV stands for Human RhinoVirus (common cold)). note: PreScission protease is just a brand name for HRV3C protease attached to GST and His. (The GST tag on the protease lets you remove the tag & the protease from the cleaved protein at the same time using a glutathione column)

There’s no punctuation or spaces in amino acid sequences – so instead you have long chains of letters like

thisissomeprotein

and since the ends are different, you can think of it like

N-thisissomeprotein-C
different endoproteases can recognize specific words & cut after or in them – like “this” or “some” – so the sequence you get will depend on the endoprotease you use:

N-this-C + N-issomeprotein-C 


N-thisissome-C + N-protein-C

Because endoproteases recognize a specific sequence, you can put that sequence anywhere* and it will still be recognized & cut (this principle might remind you of how we can stick affinity tags or antibody tags on anything…)

sometimes, endoproteases cut *within* their recognition site (like cutting after the “o” in some) – N-thisisso-C + N-meprotein-C whereas others cut *after* the letters they recognize – N-thisissome-C + N-protein-C
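Here’s the word game above as a few lines of Python, in case seeing it as code helps. The function `cut` and its `letters_kept` argument are names I made up for illustration: `letters_kept` says how many letters of the recognized word stay on the N-terminal piece, so 4 means “cut after some” and 2 means “cut after the o in some.”

```python
def cut(sequence: str, word: str, letters_kept: int) -> tuple[str, str]:
    """Cut at the first occurrence of `word`.

    letters_kept = how many letters of `word` stay on the N-terminal piece.
    """
    start = sequence.find(word)
    if start == -1:
        raise ValueError(f"{word!r} not found, so nothing gets cut")
    position = start + letters_kept
    return sequence[:position], sequence[position:]


seq = "thisissomeprotein"
print(cut(seq, "some", 4))  # ('thisissome', 'protein')
print(cut(seq, "some", 2))  # ('thisisso', 'meprotein')
print(cut(seq, "this", 4))  # ('this', 'issomeprotein')
```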

This can be important to keep in mind when designing a cleavage site to cleave an affinity or fusion tag off your protein – if you use a cleavage site for HRV3C, you’ll have a couple extra letters left over because it cuts *inside* its recognition sequence (LeuGluValLeuPheGln ↓ GlyPro, with the arrow showing the cleavage site). But if you use a cleavage site for TEV, which cuts *after* its recognition site, you can get a “scarless” cut (note: TEV recognizes the sequence ENLYFQ(G/S) and cuts between the Q & G (or S), but the last letter is pretty flexible so it doesn’t have to be a G or an S as long as the beginning part is right).
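To see what each protease leaves behind on your protein, here’s a rough sketch using the two recognition sequences just mentioned, written in one-letter code (LEVLFQ↓GP for HRV3C, ENLYFQ↓G/S for TEV). The tag “TAGTAG” and the target “SMALLPROTEIN” are placeholders I invented, and the TEV pattern only encodes the strict G/S version of the site, not the flexibility in that last position.

```python
import re

# Recognition sites as regexes. The lookahead checks the residues after the
# cut without consuming them, so they stay on the C-terminal piece.
SITES = {
    "TEV": re.compile(r"ENLYFQ(?=[GS])"),   # ENLYFQ | G or S
    "HRV3C": re.compile(r"LEVLFQ(?=GP)"),   # LEVLFQ | GP
}


def cleave(sequence: str, protease: str) -> list[str]:
    """Cut once at the first matching site; return the piece(s)."""
    match = SITES[protease].search(sequence)
    if match is None:
        return [sequence]  # no site, no cut
    return [sequence[:match.end()], sequence[match.end():]]


tag = "TAGTAG"           # placeholder tag
target = "SMALLPROTEIN"  # placeholder protein that happens to start with S

# TEV: the target's own first letter fills the last position of the site
print(cleave(tag + "ENLYFQ" + target, "TEV"))
# ['TAGTAGENLYFQ', 'SMALLPROTEIN']

# HRV3C: the G and P of the site stay stuck on the front of the target
print(cleave(tag + "LEVLFQGP" + target, "HRV3C"))
# ['TAGTAGLEVLFQ', 'GPSMALLPROTEIN']
```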

So, say you have 2 endoproteases that recognize “some” – you can design the gene to make the protein that has the sequence

thisissomeMYPROTEIN
if you use something that cuts after the e in “some” -> thisissome + MYPROTEIN

but if you use something that cuts after the o in “some” you’re left with some extra -> thisisso + meMYPROTEIN

One of the great things about being a protein biochemist is you can express and purify your own stocks of the ones you use a lot. For example, most of the protein constructs I use are designed in the format: StrepTag-SUMO-TEV cleavage site -my protein

The StrepTag is an affinity tag that I can use to bind to a streptactin column to help me purify it. more here:

SUMO is a fusion partner that helps with solubility & promotes expression through a “foot in the door” technique that tricks the cells into making something they like first. 

The tag & fusion partner are useful, but only during the expression & purification parts, then they can interfere with things. The TEV site lets me cleave off the Strep-SUMO part leaving me with “natural” (unnaturally expressed) protein after the Strep & SUMO have played their roles.

Earlier this week I purified some TEV protease, and this morning I tested it to see if it’s active. 

This TEV construct is super cool because it’s self-cleaving – it expresses with an MBP fusion tag, but there’s a recognition site in between it & the tag. A single copy can’t reach around to cut off its own tag (it’d be like trying to lick your own elbow), so another copy of it does the cutting. The MBP is on the opposite end from the His tag, so the protein doesn’t get “de-affinitied” and will still stick to the column. The MBP part *shouldn’t* stick, but because you’re overexpressing the construct there’s so much of it that some of it sticks around. But, since it’s a different size than the TEV, you can remove it via size exclusion chromatography (SEC) (aka gel filtration). SEC has the added benefit of removing all the imidazole you used to push the TEV off the column.

Compared to past preps, it wasn’t the greatest yield, but I got 14 mg of pure TEV protein from 2 L of bacterial cells I’d stuck the TEV-making instructions into.

But of course, having a bunch of protein isn’t really helpful if the protein isn’t active – so this morning I tested it to make sure it is. I did a serial dilution of the TEV and then mixed it with my tagged protein then took samples at different timepoints. 
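If you want to plan out a serial dilution like this, the arithmetic is just repeated division. Here’s a minimal sketch; the starting concentration, the 2-fold step, and the number of tubes are all hypothetical numbers for illustration, not the ones from my experiment.

```python
def serial_dilution(start_conc: float, fold: float, n_tubes: int) -> list[float]:
    """Concentration in each tube, starting with the undiluted stock."""
    return [start_conc / fold**step for step in range(n_tubes)]


# Hypothetical: 1 mg/mL TEV stock, 2-fold dilutions, 5 tubes
for conc in serial_dilution(1.0, 2, 5):
    print(f"{conc:.4f} mg/mL")
# 1.0000 mg/mL
# 0.5000 mg/mL
# 0.2500 mg/mL
# 0.1250 mg/mL
# 0.0625 mg/mL
```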

I can tell if it works by running SDS-PAGE gels and looking for a shift in my protein size. The tagged protein’s longer, so it gets tangled more in the SDS-PAGE gel’s mesh & travels more slowly so its band will be higher up when I turn off the power. When it gets cut, the band will get tangled less, so it will travel quicker and show up lower. At the same time, you’ll see the appearance of an additional band – that corresponding to the part you cut off. And usually it’ll be even smaller so it’ll travel lower. 
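To get a feel for where the bands should land, you can ballpark sizes from the number of amino acids using the common rule of thumb of roughly 110 Da per residue. That’s an approximation I’m bringing in here (real masses depend on the actual sequence), and the tag and protein lengths below are hypothetical.

```python
AVG_RESIDUE_DA = 110  # rule-of-thumb average mass per amino acid residue


def approx_kda(n_residues: int) -> float:
    """Very rough molecular weight estimate, in kDa."""
    return n_residues * AVG_RESIDUE_DA / 1000


tag_len, protein_len = 120, 300  # hypothetical lengths, in amino acids

bands = {
    "tagged protein (uncut)": tag_len + protein_len,
    "protein (after cut)": protein_len,
    "tag (after cut)": tag_len,
}

# Bigger things run slower, so list from the top of the gel to the bottom
for name, length in sorted(bands.items(), key=lambda item: -item[1]):
    print(f"{name}: ~{approx_kda(length):.1f} kDa")
# tagged protein (uncut): ~46.2 kDa
# protein (after cut): ~33.0 kDa
# tag (after cut): ~13.2 kDa
```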

And in my gels I can see that the band representing the tagged protein disappears and lower bands representing the now separated protein & tag appear, showing me it worked (and helping me figure out the optimal TEV/protein ratio)!

note: I said that the sequence can be recognized “anywhere” – but, as usual, restrictions apply (no pun intended this time – remember restriction enzymes work on DNA not proteins anyway!). If a protein is folded in a way that “hides” the site, the enzyme can’t get to it and it won’t get cut.

Sometimes this can be annoying because it can prevent you from easily de-tagging your protein (if this is the case you might want to try switching the tag to the other end of the protein where it might be more accessible). Because accessibility varies from protein to protein, you might have to experiment to find out how much protease you have to add to the protein you’re working with to get full tag cleavage.

But in other cases, this “can’t cut if hidden” property can be useful because it tells us what’s hidden. In a technique called limited proteolysis, you add a small amount of endoprotease to a protein and see where it gets cut – you take samples over time to see which areas are the most vulnerable (and therefore likely most accessible). 

There’s this really cool example of this from the lab I work in (that of Leemor Joshua-Tor at CSHL). They were studying an RNA-binding protein called Argonaute (Ago) that’s involved in RNA interference (RNAi) – it binds small RNAs (~22nt long) & uses them as guides to bind to & silence mRNA targets to regulate protein expression. more here:

When they did limited proteolysis on the RNA-free Ago, the protein got cut way more than when they added guide RNA, & this showed them that binding guide RNA causes Ago to “snap into shape” in a way that stabilizes the protein in a form in which cut sites are less accessible.

When you’re doing limited proteolysis, you don’t want to use a super specific cutter like TEV or HRV3C. Instead you want to use a more generic one that, instead of recognizing “words” recognizes single letters. You don’t want one that recognizes really common letters like “s” or “e” or else you’d get way too many pieces. Instead you want to recognize something common but not too common, like “p.” In mass spectrometry you use more generic proteases than you use with limited proteolysis – mass spec works by breaking a protein up into peptide fragments & measuring how big they are, so you want lots of fragments (but still not too many!)

A few common ones are ones that cut next to the amino acids lysine (Lys, K) & arginine (Arg, R) in the case of the endoprotease trypsin, and next to aromatic amino acids like Tyrosine (Tyr, Y), Phenylalanine (Phe, F), and Tryptophan (Trp, W) in the case of the endoprotease chymotrypsin. Pepsin’s less predictable, but prefers to cleave next to aromatic amino acids & leucine. note: aromatic refers to having a special electron-stabilized ring. no room/time to go into it here but if you’re interested:
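For the trypsin & chymotrypsin rules above, here’s a rough “in silico digest” sketch. It assumes both cut on the C-terminal side of the letters they recognize, and it adds one commonly used rule of thumb that I didn’t cover above: no cut if the next letter is proline (P). The peptide is made up, and real digests are messier (missed cleavages, secondary preferences), so treat this as a cartoon.

```python
import re


def digest(sequence: str, residues: str, skip_before_proline: bool = True) -> list[str]:
    """Cut after any letter in `residues`, written N to C."""
    pattern = f"(?<=[{residues}])"  # zero-width: the spot right after a match
    if skip_before_proline:
        pattern += "(?!P)"          # rule-of-thumb assumption: no cut before P
    # Splitting on zero-width matches needs Python 3.7 or newer
    return [piece for piece in re.split(pattern, sequence) if piece]


peptide = "MAKWLRPEFGYK"  # made-up sequence

print(digest(peptide, "KR"))   # trypsin-like
# ['MAK', 'WLRPEFGYK']
print(digest(peptide, "FYW"))  # chymotrypsin-like
# ['MAKW', 'LRPEF', 'GY', 'K']
```

Notice the trypsin-like digest skips the R in “RP” because of the proline rule; pass skip_before_proline=False to see what happens without it.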

more on topics mentioned (& others) #365DaysOfScience All (with topics listed) 👉 
