Endo- & exo- something-ases can leave you scratching your head for days-es. Protein or DNA or RNA, inside or out? What cuts what? And where? When it comes to DNA/RNA chewers (nucleases) and protein chewers (proteases/peptidases), I know from firsthand (embarrassing) experience that it’s easy to mix up the terminology. So here’s a quick guide to exonucleases (DNA/RNA chewers), endonucleases (DNA/RNA scissors), exopeptidases (protein chewers) and endoproteases (protein scissors), including what makes different ones special and how we use them in the lab.

First off, “endo” vs. “exo”: “endo” DOES NOT mean cutting the ends. Endo-enzymes cut in the “middle” of a sequence. You can remember this by thinking of an endoscopy: if you have an endoscopy, your doctor sticks a tube with a camera *inside* you to look at your insides. If you use an endo-something-ase, you cut that something *inside* its sequence. On the other side of things, when scientists look for exoplanets, they’re looking for planets *outside* our solar system, out on the “outskirts.” When scientists use exo-something-ases, they’re using enzymes (biochemical reaction mediators/speed-uppers) to chew that something from the ends.

They all have different uses inside your body, and many have features that make them useful in the lab for different things. The versions we use in the lab are usually bacterial or viral ones – companies make and purify lots of them and then sell them to you (although if you’re in a protein biochemistry lab you can purify your own stocks of the ones you use a lot and save a lot of $!)

So now let’s tackle “nuclease” vs. “protease/peptidase.” Nucleases cut nucleic acids, which is the umbrella term for RNA (RiboNucleic Acid) and DNA (DeoxyriboNucleic Acid); nucleases cut apart their nucleotide “letters.” Proteases/peptidases, meanwhile, cut apart protein letters (amino acids). Sometimes they’re called proteases, other times peptidases, and most of the time the terms are used basically interchangeably. Proteins are just long chains of amino acid “letters” linked together through peptide bonds (many peptide bonds, thus “polypeptides”) that fold up & do cool stuff in your cells. So even if you start with something you’d consider a protein, chew it up enough and the little pieces you’re left with are just “peptides.” That makes “peptidase” the broader term: it can apply to any cutter of peptide bonds, whether in short peptides or full-on proteins.

You might have heard of “restriction enzymes” – those are endonucleases that recognize specific sequences of DNA & cleave them. They’re useful for things like molecular cloning (moving pieces of DNA from one place to another) and mystery-solving: you can use analytical digests to see whether those sequences are present (maybe in one suspect but not another, or in patients with a disease but not in those without). In restriction fragment length polymorphism (RFLP) analysis, you cut (or try to cut) DNA & see what sizes of pieces you end up with. More on that here: http://bit.ly/30Npa8o
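Here’s a minimal sketch of what an analytical digest predicts on paper, assuming EcoRI’s well-known GAATTC site (it cuts between the G and the first A). The two “alleles” are made-up sequences, and the function names are just for illustration; a real RFLP experiment runs the digested DNA on a gel, and this only predicts the fragment sizes you’d hope to see there. Because GAATTC reads the same on both strands, searching one strand is enough.

```python
# Toy analytical digest: where would EcoRI cut, and what sizes would the pieces be?
# The sequences below are invented for illustration.

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (palindromic: same on both strands)
ECORI_CUT = 1          # EcoRI cuts G^AATTC, i.e. 1 letter into the site (top strand)


def cut_positions(dna: str, site: str, cut_offset: int) -> list[int]:
    """Return top-strand cut positions for every occurrence of `site`."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = dna.find(site, start + 1)  # keep looking after this hit
    return positions


def fragment_lengths(dna: str, site: str, cut_offset: int) -> list[int]:
    """Lengths of the pieces left after cutting at every site."""
    edges = [0] + cut_positions(dna, site, cut_offset) + [len(dna)]
    return [end - start for start, end in zip(edges, edges[1:])]


allele_1 = "ATGCCGAATTCTTAGGCATGCA"  # contains one GAATTC
allele_2 = "ATGCCGAGTTCTTAGGCATGCA"  # one-letter change destroys the site

print(fragment_lengths(allele_1, ECORI_SITE, ECORI_CUT))  # two pieces
print(fragment_lengths(allele_2, ECORI_SITE, ECORI_CUT))  # one uncut piece
```

Two pieces for one “suspect” and a single uncut piece for the other is exactly the kind of difference an RFLP gel shows.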

note: New England Biolabs (NEB) has basically cornered this market, and they sell tons and tons of restriction enzymes – I picked up this giant chart of them at a conference. There are tons of different ones that recognize & cut various sequences, and you can use free software called “WatCut” (unrelated to NEB as far as I know) to find restriction enzyme cut sites in the sequences you’re interested in. What’s really cool is that the software will suggest “silent mutations” you can make to the DNA to introduce cut sites without changing the resulting protein.

You can do that because there’s some redundancy in the genetic code. Basically, if a cell wants to make a protein, it first makes (and edits) a messenger RNA (mRNA) copy of the DNA gene containing the instructions for that protein. These instructions tell the protein-makers (ribosomes) in what order to link up amino acids, and they’re read as 3-nucleotide “words” called codons, which “spell” the different amino acids (e.g. CAG spells the amino acid glutamine and AAA spells the amino acid lysine). When the ribosome makes a protein, it moves along the mRNA and helps link together the amino acids that transfer RNAs (tRNAs) bring it. On one end, each tRNA has a 3-letter anticodon that complements the codon “spelling” the amino acid that the tRNA carries.
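To make the codon idea concrete, here’s a small sketch that reads a sequence 3 letters at a time using the standard genetic code. It works in DNA letters (T) and simply swaps U for T if you hand it mRNA; `translate` and its `frame` parameter are names made up for this example, and “*” marks a stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG
# (first letter changes slowest). "*" = stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Read `seq` as 3-letter codons, starting `frame` letters in (0, 1 or 2)."""
    dna = seq.upper().replace("U", "T")  # accept mRNA letters too
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


print(translate("CAGAAA"))           # glutamine, lysine
print(translate("ATGCAGAAATAA"))     # same letters, read from the start
print(translate("ATGCAGAAATAA", 1))  # same letters, read one letter later
```

Shifting the start by a single letter gives a completely different protein, which is the computational version of the reading-frame warning below.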

The “redundancy” refers to the fact that it’s not only “AAA” that spells lysine – “AAG” does too. Some amino acids have multiple tRNAs and/or tRNAs with a little “flexibility,” so they can recognize more than one codon. So if the DNA says “AAA,” you can change it to “AAG” without changing the amino acid that gets brought. But this does change the DNA sequence that a restriction enzyme would see, which can be a way to introduce or remove cut sites. (note: make sure you’re in the right “reading frame” when reading codons, because where you start reading matters – thi sis not tha t vs. t his isn ott hat vs. th isi sno tth at)
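Here’s a hypothetical sketch of the kind of search described above: try every synonymous codon swap, one codon at a time, and keep the swaps that create a new restriction site while leaving the protein unchanged. This illustrates the idea only, not how WatCut actually works; `silent_site_options` is a made-up name, and it checks just one strand, one reading frame (starting at the first letter), and single-codon changes. HindIII’s AAGCTT site is the target here because the AAA to AAG swap from the text can create it.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons by the amino acid they spell, e.g. "K" -> ["AAA", "AAG"]
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds: str) -> str:
    """Translate in frame 0 (assumes `cds` starts with its first codon)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of `site`, allowing overlaps."""
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def silent_site_options(cds: str, site: str) -> list[tuple[int, str, str]]:
    """Single-codon synonymous swaps that add a copy of `site`.

    Returns (codon_index, old_codon, new_codon) tuples.
    """
    protein = translate(cds)
    options = []
    for idx in range(len(cds) // 3):
        old = cds[3 * idx:3 * idx + 3]
        for new in SYNONYMS[CODON_TABLE[old]]:
            if new == old:
                continue
            mutant = cds[:3 * idx] + new + cds[3 * idx + 3:]
            if count_sites(mutant, site) > count_sites(cds, site):
                # synonymous swap, so the protein should be unchanged
                assert translate(mutant) == protein
                options.append((idx, old, new))
    return options


# Met-Lys-Leu-stop; swapping AAA -> AAG creates AAGCTT (HindIII)
print(silent_site_options("ATGAAACTTTAA", "AAGCTT"))
```

A real design tool would also need to check the other strand, the other reading frames, and whether the new codon is one your expression host uses comfortably.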

Endoproteases are like restriction enzymes but for proteins instead of DNA – well, the site-specific ones at least. Endoproteases like TEV & HRV3C proteases recognize specific sequences of amino acids (protein letters) and cut them, which is useful for things like cleaving off the affinity tags we attach to proteins to help us purify them.

Different ones have a range of specificities. Those 2 are super specific, which is what you need for things like cleaving off those affinity tags (most of the time in protein purification you’re trying to *avoid* proteases – you’re spending all your time trying to isolate and protect a precious protein, so if you introduce scissors in there they’d better be really careful cutters!)

Some of the ones we work hard to avoid during protein purifications are the peptide “chewers” – exopeptidases, which chew off the ends one amino acid at a time. The analogous enzymes for nucleic acids are called exonucleases. Both nucleic acids and proteins have “directionality” (a front end & an end end), and because these 2 ends are different, they require different chewers.

DNA & RNA letters (nucleotides) have a generic sugar-phosphate backbone (the sugar is ribose in RNA & deoxyribose in DNA) and unique “nitrogenous bases” (often just called “bases”): A, C, G, & T/U (T in DNA and U in RNA). A complements T/U and C complements G, so that’s how you get specific base-pairing between strands. But no matter what the sequence, you gotta start and end somewhere! The “front end” of DNA & RNA is called the 5’ end because it has a free 5’ phosphate or phosphates (a phosphorus surrounded by oxygens). The ’ is pronounced “prime,” and it just refers to the “address” on the sugar ring that the group is hooked up to. The “end end” is the 3’ end, which has a free 3’ OH (hydroxyl). Well, usually – when endonucleases cut DNA or RNA, sometimes they cut so that the phosphate ends up on the 3’ end of one piece & the hydroxyl on the 5’ end of the other.

Proteins also have ends. The front end of a protein is the N terminus. It gets this moniker because it has a free amino group, characterized by a Nitrogen (N) linked to hydrogens. At the end end is the C terminus, which has a free Carboxyl group, (C=O)-OH. Aminopeptidases chew from the N end and carboxypeptidases chew from the C end. And we say proteins go “from N to C” and nucleic acids go “from 5’ to 3’.”
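A toy model of directionality, assuming sequences are written the conventional way (N to C for proteins, 5’ to 3’ for nucleic acids), so the “front end” is on the left. `chew` is a hypothetical helper that removes one letter at a time from whichever end you name; real exopeptidases and exonucleases have their own sequence preferences and can stall, which this ignores.

```python
# Convention: written left to right as N->C (proteins) or 5'->3' (DNA/RNA),
# so the "front end" is index 0 and the "end end" is the last letter.
FRONT_ENDS = {"N", "5'"}
BACK_ENDS = {"C", "3'"}


def chew(seq: str, end: str, n: int) -> tuple[str, list[str]]:
    """Remove up to `n` letters, one at a time, from `end`.

    Returns (what's left, letters released in the order they came off).
    """
    if end not in FRONT_ENDS | BACK_ENDS:
        raise ValueError(f"unknown end: {end!r}")
    remaining = seq
    released = []
    for _ in range(min(n, len(seq))):
        if end in FRONT_ENDS:      # aminopeptidase / 5'->3' exonuclease
            released.append(remaining[0])
            remaining = remaining[1:]
        else:                      # carboxypeptidase / 3'->5' exonuclease
            released.append(remaining[-1])
            remaining = remaining[:-1]
    return remaining, released


print(chew("MKLV", "N", 2))  # aminopeptidase-style
print(chew("MKLV", "C", 2))  # carboxypeptidase-style
print(chew("ATGC", "3'", 1))  # 3'->5' exonuclease-style
```

Same sequence, different chewer, different leftovers, which is why the two ends need different enzymes.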

When we lyse (break open) cells to purify the protein we’ve had them make for us, we add protease inhibitors to protect our protein. But protease inhibitors can also be used as antiviral drugs, because lots of viruses, like HIV, make “polyproteins”: they make their proteins as one long chain of connected proteins with cleavage sites in between, and they make the specific endoprotease that recognizes those cleavage sites. Block that protease & the virus can’t separate its proteins.

Or… *use* that protease for other purposes – stick the cleavage site it recognizes somewhere you *want* a cut, then add the matching protease to cut there. This is how we get things like TEV protease (TEV stands for Tobacco Etch Virus) & HRV3C protease (HRV stands for Human RhinoVirus, a common cold virus). note: PreScission protease is just a brand name for HRV3C protease attached to GST and His. (The GST tag on the protease lets you remove the tag & the protease from the cleaved protein at the same time using a glutathione column.)

There’s no punctuation or spaces in amino acid sequences, so instead you have long chains of letters like

thisissomeprotein

and since the ends are different, you can think of it like

N-thisissomeprotein-C

Different endoproteases can recognize specific words & cut after or in them, like “this” or “some,” so the pieces you get depend on the endoprotease you use:

cut after “this”: N-this-C + N-issomeprotein-C

cut after “some”: N-thisissome-C + N-protein-C

Because endoproteases recognize a specific sequence, you can put that sequence anywhere* and it will still be recognized & cut (this principle might remind you of how we can stick affinity tags or antibody tags on anything…)

Sometimes endoproteases cut *within* their recognition site (like cutting after the “o” in “some”): N-thisisso-C + N-meprotein-C, whereas others cut *after* the letters they recognize: N-thisissome-C + N-protein-C
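The word-cutting analogy can be sketched with plain string matching: find each copy of the recognition “word” and cut a fixed number of letters into it. `cleave` and `cut_offset` are illustrative names; setting `cut_offset` to the length of the word means “cut after it,” and a smaller number means “cut inside it.”

```python
def cleave(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut `seq` at every occurrence of `site`, `cut_offset` letters into the site.

    cut_offset == len(site) -> cut right after the site
    cut_offset <  len(site) -> cut inside the site
    """
    pieces = []
    start = 0
    hit = seq.find(site)
    while hit != -1:
        cut = hit + cut_offset
        pieces.append(seq[start:cut])
        start = cut
        hit = seq.find(site, hit + len(site))  # look for the next copy
    pieces.append(seq[start:])
    return pieces


protein = "thisissomeprotein"
print(cleave(protein, "this", 4))  # cut after "this"
print(cleave(protein, "some", 4))  # cut after "some"
print(cleave(protein, "some", 2))  # cut inside "some", after the "o"
```

Note that the recognition “word” can sit anywhere in the sequence; the code only cares that the letters match, which mirrors the “put that sequence anywhere” idea (minus the real-world accessibility caveats).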

This can be important to keep in mind when designing a cleavage site to cleave an affinity or fusion tag off your protein. HRV3C cuts *inside* its recognition sequence (LEVLFQ/GP, cutting between the Q and the G), so you’ll have a couple of extra letters (GP) left over on your protein. TEV cuts right after the Q at the end of its ENLYFQ recognition sequence, so if the first letter of your protein fills the spot right after the cut (ideally a G or S), you can get a “scarless” cut.
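A rough sketch of why the choice of site matters for scars, using the commonly cited consensus sites LEVLFQ/GP for HRV3C and ENLYFQ/(G or S) for TEV, where “/” marks the cut. `TAG` and `MY_PROTEIN` are placeholder sequences invented for this example, and exact string matching is a big simplification: real proteases tolerate some variation and still need the site to be accessible.

```python
def cleave_once(seq: str, site: str, cut_offset: int) -> tuple[str, str]:
    """Cut at the first copy of `site`, `cut_offset` letters into it."""
    hit = seq.find(site)
    if hit == -1:
        raise ValueError(f"site {site!r} not found")
    cut = hit + cut_offset
    return seq[:cut], seq[cut:]


TAG = "HHHHHH"         # placeholder for whatever tag you use
MY_PROTEIN = "SAMPLE"  # hypothetical protein that happens to start with S

# HRV3C: LEVLFQ/GP -> the GP stays attached to the protein
hrv3c_construct = TAG + "LEVLFQGP" + MY_PROTEIN
_, hrv3c_product = cleave_once(hrv3c_construct, "LEVLFQGP", 6)

# TEV: ENLYFQ/S -> here the S *is* the protein's own first letter,
# so nothing extra is left behind
tev_construct = TAG + "ENLYFQ" + MY_PROTEIN
_, tev_product = cleave_once(tev_construct, "ENLYFQS", 6)

print(hrv3c_product)  # protein with a 2-letter scar
print(tev_product)    # protein with no scar
```

The design trick is that for TEV the letter right after the cut counts as part of the site, so letting your protein supply it leaves no extra letters behind.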

So, say you have 2 endoproteases that recognize “some.” You can design the gene to make a protein that has the sequence

thisissomeMYPROTEIN

If you use one that cuts after the e in “some” -> thisissome + MYPROTEIN

but if you use one that cuts after the o in “some,” you’re left with some extra -> thisisso + meMYPROTEIN

One of the great things about being a protein biochemist is that you can express and purify your own stocks of the proteases you use a lot. TEV is one I use a lot, since most of the protein constructs I use are designed in the format: StrepTag-SUMO-TEV cleavage site-my protein

The StrepTag is an affinity tag that lets me bind the protein to a Strep-Tactin column to help purify it. More here: http://bit.ly/2XBu9eI

SUMO is a fusion partner that helps with solubility & promotes expression through a “foot in the door” technique that tricks the cells into making something they like first. It’s the guy I told you about yesterday, if you’ve been following along: http://bit.ly/2SDBlUn

The tag & fusion partner are useful, but only during the expression & purification steps; after that they can interfere with things. The TEV site lets me cleave off the Strep-SUMO part, leaving me with “natural” (unnaturally expressed) protein after the Strep & SUMO have played their roles.

I said that the sequence can be recognized “anywhere” – but, as usual, restrictions apply (no pun intended this time – remember restriction enzymes work on DNA not proteins anyway!). If a protein is folded in a way that “hides” the site, the enzyme can’t get to it and it won’t get cut.

Sometimes this can be annoying because it can prevent you from easily de-tagging your protein (if this is the case, you might want to try switching the tag to the other end of the protein, where it might be more accessible). Because accessibility varies from protein to protein, you might have to experiment to find out how much protease you need to add to the protein you’re working with to get full tag cleavage.

But in other cases, this “can’t cut if hidden” property can be useful because it tells us what’s hidden. In a technique called limited proteolysis, you add a small amount of endoprotease to a protein and see where it gets cut – you take samples over time to see which areas are the most vulnerable (and therefore likely most accessible). 

There’s a really cool example of this from the lab I work in (that of Leemor Joshua-Tor at CSHL). They were studying an RNA-binding protein called Argonaute (Ago) that’s involved in RNA interference (RNAi) – it binds small RNAs (~22 nt long) & uses them as guides to bind to & silence mRNA targets to regulate protein expression. More here: http://bit.ly/2R5soTe

When they did limited proteolysis on RNA-free Ago, the protein got cut way more than when they added guide RNA. This showed them that binding guide RNA causes Ago to “snap into shape,” stabilizing the protein in a form where the cut sites are less accessible.

When you’re doing limited proteolysis, you don’t want to use a super specific cutter like TEV or HRV3C. Instead you want a more generic one that recognizes single letters instead of “words.” But you don’t want one that recognizes really common letters like “s” or “e,” or else you’d get way too many pieces; you want one that recognizes something common but not too common, like “p.” In mass spectrometry you use even more generic proteases than in limited proteolysis: mass spec works by breaking a protein up into peptide fragments & measuring how big they are, so you want lots of fragments (but still not too many!)
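Why does a longer recognition site make a more careful cutter? A back-of-the-envelope sketch, assuming a random sequence where all 20 amino acids are equally likely (real proteins aren’t like that; leucine is common and tryptophan is rare): a specific k-letter site is expected to show up about (L - k + 1)/20^k times in a protein of length L. `expected_sites` is a name made up for this example.

```python
def expected_sites(protein_length: int, site_length: int, alphabet_size: int = 20) -> float:
    """Expected exact matches to one specific `site_length`-letter site.

    Assumes every position is an independent, equally likely pick from
    `alphabet_size` letters (a simplification for real proteins).
    """
    positions = protein_length - site_length + 1  # places the site could start
    if positions <= 0:
        return 0.0
    return positions / alphabet_size ** site_length


# hypothetical 300-residue protein
for k in (1, 2, 7):
    print(k, expected_sites(300, k))
```

For that hypothetical 300-residue protein, a single-letter cutter expects about 15 sites, while a 7-letter site like TEV’s (ENLYFQ plus the G/S after it) expects far less than one, which is why TEV basically only cuts where you put its site. An enzyme that accepts either of two letters (like trypsin with K or R) would roughly double the single-letter number.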

A few common ones are trypsin, which cuts right after (on the C-terminal side of) the amino acids lysine (Lys, K) & arginine (Arg, R), and chymotrypsin, which cuts right after aromatic amino acids like tyrosine (Tyr, Y), phenylalanine (Phe, F), and tryptophan (Trp, W). Pepsin is less predictable, but prefers to cleave next to aromatic amino acids & leucine. note: aromatic refers to having a special electron-stabilized ring. No room/time to go into it here, but if you’re interested: http://bit.ly/2qzMRFi
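An in silico digest using the simplified textbook rules above: trypsin cuts right after K or R, chymotrypsin right after F, Y or W. The “not before P” exception is a common rule of thumb for trypsin; applying it to chymotrypsin too is a simplification, and the peptide is made up. Pepsin is left out because its preferences are too fuzzy for a one-line rule; `digest` is a hypothetical helper name.

```python
def digest(seq: str, cut_after: str, not_before: str = "P") -> list[str]:
    """Cut after any residue in `cut_after`, unless the next residue is in `not_before`."""
    fragments = []
    start = 0
    for i, residue in enumerate(seq[:-1]):  # the last residue has nothing after it
        if residue in cut_after and seq[i + 1] not in not_before:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


TRYPSIN = "KR"        # lysine, arginine
CHYMOTRYPSIN = "FYW"  # aromatic: phenylalanine, tyrosine, tryptophan

peptide = "MAKWVRPLFGKYE"  # made-up sequence
print(digest(peptide, TRYPSIN))       # R before P is skipped
print(digest(peptide, CHYMOTRYPSIN))
```

The same sequence gives different fragment sets with different enzymes, which is why the choice of protease shapes what you see in limited proteolysis or mass spec.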

You can quibble over the whole protease/peptidase naming, but regardless of what you call ’em, there’s really only one “type” of “product” written in amino acids. But when it comes to nucleic acids, there ARE 2 distinct kinds: DNA & RNA. And the difference in their names really does reflect key differences in their make-up. They’re both nucleic acids (NA), and they hook together with the same type of bonds, so some nucleases can cut either. But it turns out their “different part” is different enough that some nucleases will only cut one or the other. The differences: RNA has Ribose as its sugar and DNA has Deoxyribose (which has 1 fewer oxygen), and RNA has the letter U instead of T.

Instead of recognizing specific sequences, some nucleases recognize things like structural features and “you don’t belong here”s. For example, RNase H recognizes RNA:DNA hybrids (1 strand of DNA bound to 1 strand of RNA) & cuts the RNA (but NOT the DNA). If you want to cut single-stranded regions around hybrids, go for S1 nuclease, which can cut up single-stranded DNA or RNA (though it prefers DNA) while leaving double-stranded regions alone – great for seeing where probes bind and stuff.

If you want to cut DNA but not RNA, go for DNase I, which cuts DNA “non-specifically.” But if you want to protect the RNA, make sure that the DNase I you use is “RNase-free.” Typically the RNases you’re most worried about are RNase I, which cuts single-stranded RNA “non-specifically,” and RNase A, which cuts single-stranded RNA 3’ of C’s & U’s (and leaves a tell-tale signature of a 3’ phosphate & 5’ OH).
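A toy RNase A digest, assuming the simple rule above (cut on the 3’ side of every C and U) and treating the whole RNA as single-stranded, so it ignores base-pairing that would protect double-stranded stretches. `rnase_a_digest` is a made-up name, and the “contains T” check is a crude guard invented here to flag DNA input.

```python
PYRIMIDINES = "CU"


def rnase_a_digest(rna: str) -> list[str]:
    """Toy RNase A: cut on the 3' side of every C and U.

    Each cut would leave a 3' phosphate on the upstream piece and a
    5' OH on the downstream piece; here we just return the sequences.
    """
    letters = rna.upper()
    if "T" in letters:
        raise ValueError("contains T, looks like DNA; RNase A cuts RNA")
    fragments, start = [], 0
    for i, base in enumerate(letters):
        if base in PYRIMIDINES and i < len(rna) - 1:  # nothing to cut off after the last base
            fragments.append(rna[start:i + 1])
            start = i + 1
    fragments.append(rna[start:])
    return fragments


print(rnase_a_digest("AUGCAAGU"))
print(rnase_a_digest("GGAAA"))  # no C or U, so no cuts
```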

In the lab, non-specific endonucleases can be useful for removing nucleic acids during protein purification, since nucleic acids can really gunk things up. And just as protein folding can hide cut sites in proteins, proteins bound to nucleic acids can hide cut sites in those nucleic acids. In “footprinting,” you add cutters & chewers that cut & chew the exposed nucleic acid but get stuck when they reach the protein. Then you get rid of all the chewed-up stuff, release the protein, & sequence the “protected” bits to see where the protein was bound.
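A toy footprinting sketch, assuming complete digestion of everything exposed and perfect protection of everything a protein covers. The bound positions are hypothetical inputs given as half-open (start, end) intervals; in a real experiment they’re exactly what you’re trying to find out, and real footprint edges are fuzzier than this. `footprint` is a made-up name.

```python
def footprint(dna: str, bound: list[tuple[int, int]]) -> list[str]:
    """Return the stretches of `dna` protected by bound proteins, in order.

    `bound` holds hypothetical half-open (start, end) intervals.
    Everything not covered is assumed to be fully digested.
    """
    covered = [False] * len(dna)
    for start, end in bound:
        for i in range(max(start, 0), min(end, len(dna))):
            covered[i] = True
    protected, current = [], []
    for base, is_covered in zip(dna, covered):
        if is_covered:
            current.append(base)
        elif current:  # just walked off the edge of a protected stretch
            protected.append("".join(current))
            current = []
    if current:
        protected.append("".join(current))
    return protected


dna = "ACGTACGTTTGACCATGG"                 # made-up sequence
print(footprint(dna, [(4, 10)]))           # one protein bound
print(footprint(dna, [(0, 3), (12, 16)]))  # two proteins bound
```

Sequencing what survives and lining it back up against the full sequence is what tells you where the protein sat.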

Previously, we looked at how this is used in ribosome footprinting to see where the protein makers (ribosomes) are and what they’re trying to make: http://bit.ly/2DTndP8

more on topics mentioned (& others) #365DaysOfScience All (with topics listed) 👉 http://bit.ly/2OllAB0 
