I hope this protein expressed well and solubly! Thankfully, that’s more likely since I used a fusion partner! Fusion partners like SUMO, GST, or MBP are little proteins that cells love to make that you tag on to the end of the protein you’re trying to get the cells to make, a bit like a “foot in the door” marketing campaign for recombinant protein expression! Fusion partners can help your protein of interest express itself, protect it from harm, & stick with it as it tries to find its “best self” (fold properly). They’re one, but not the only, way to try to improve the expression of soluble protein, so today I thought I’d go over some background and practical advice for things you can do if you ask a cell to make a protein for you and it refuses (or your protein ends up misfolded and clumped up in insoluble inclusion bodies). The post has an emphasis on fusion partners, but I also go into other tips and tricks to try.
My last protein purification was part of epic protein purification week (5 days, 6 protein purifications, >60L worth of insect cells, not enough time for coffee…). Today I’m just purifying a single protein, 4L, but it’s a truncation construct I’ve never tried before so it’s more nerve-wracking! Especially when not much protein came off the column… Seems it didn’t express very well (though it wasn’t in the pellet – yay!) Hopefully it will be enough! (especially after all the columns…) Let me back up a bit before I lose anyone…
Proteins are chains of amino acids that fold up into functional (and beautiful) 3D shapes. The sequence of amino acids in the chain is determined by the gene. The gene (DNA form) is copied into an RNA version (mRNA) (transcription), then that mRNA is used as instructions to link together the right amino acids (translation). When we express proteins recombinantly, we stick the DNA instructions for a protein into an easy-to-work-with circular piece of DNA called a plasmid vector (this is the recombining part) then stick that into cells (often harmless bacteria or insect cells) to make the protein for us (this is the expression part). http://bit.ly/proteincleaning
We’re in control of the DNA we put in there, which means we’re in control of the sequence of amino acid protein letters in the protein that gets made (we can even add on extras like affinity tags and fusion partners). But the expression cells are in control of whether that protein actually gets made – and made correctly (though we have ways to try to coax them to).
When you go to purify the protein (assuming it’s a soluble cytoplasmic protein, i.e. not secreted and not a membrane protein), you break open the cells (lyse them), spin that lysate super fast in a centrifuge to pellet out the insoluble stuff (membrane bits, etc.), and then purify the protein of interest from the liquid part (supernatant) containing the soluble things. If you’re trying to purify a protein and you “can’t find it” there are a few possible culprits:
- the protein wasn’t expressed (the mRNA didn’t get translated)
- the protein expressed but not “properly” so it’s hidden with the insoluble membrane gunk you pelleted out
- the protein was expressed but then degraded
- your ID method is flawed (problem with the purification if that’s how you’re checking or antibodies if you’re checking via western blot)
Later in the post I will get into how to check for these various things (and hopefully prevent or correct the problems). But for now let’s talk mostly about the solubility issue.
During translation, the amino acids are linked (w/help of ribosomes) 1 at a time to the end of a growing chain, going from N-terminus to C-terminus. The protein chain starts folding as it emerges from the ribosome’s “chimney,” sometimes with the help of proteins called chaperones. *Correct* folding of the protein is really important for solubility! Solubility is whether each molecule of something is fully coated in water. Properly folded proteins should be soluble (or embedded in a lipid membrane if they’re a membrane protein). Proteins are able to be soluble because they fold so that parts of the protein that like to interact with water (are hydrophilic) are on the outside & parts that don’t like water (are hydrophobic) are sequestered away in the center. More on solubility (something we crystallographers think about a lot): http://bit.ly/2SJrdGR
If a protein doesn’t fold properly, those water-hating hydrophobic parts get exposed to water & panic – so they stick to the hydrophobic parts of other misfolded proteins (because that’s better than being next to water) -> they clump up & aggregate. This is a helpful way to think about it, although technically, it’s more like the water molecules don’t like them so the water molecules do whatever they can to link to each other instead of your protein, which forces the hydrophobic protein parts together – this is the hydrophobic exclusion effect and you can learn more about it here: http://bit.ly/hydrophobesarenotafraid
Sometimes, especially when you try expressing a non-bacterial protein in bacteria, your protein folds incorrectly and it ends up in blobs of insoluble gunk called inclusion bodies. These inclusion bodies consist mainly of aggregated (clumped-together) misfolded protein. This tends to happen especially when the cells make too much of the protein too quickly, so you might be able to avoid them forming altogether by using less “aggressive” expression techniques, as I’ll get into later in the post (e.g. lower the temp, express for a shorter period, induce with less IPTG).
If your protein ends up in inclusion bodies, all may not be lost. Some proteins can be rescued from them and refolded, but not all proteins are cool with that. Here’s a link to some more information https://www.sigmaaldrich.com/technical-documents/protocols/biology/affinity-chromatography-tagged-proteins/handling-inclusion-bodies.html
Another way to promote proper folding is by changing the DNA instructions to add a fusion partner onto the end.
Since we’re putting in the DNA instructions for our protein, we can add extra DNA letters to add extra amino acids to the end(s) of the protein. When it comes to adding things onto the ends of proteins, you might be more familiar with affinity tags like His or Strep tags. Affinity tags give your protein something un-naturally unique so that you can get it to bind something that none of the other stuff will. In affinity chromatography you take a mix of proteins and other cellular content and flow it through a resin (little beads) coated with something the affinity tag has specific affinity for (e.g. streptavidin for strep-tag or glutathione for GST). Tagged proteins stick, untagged don’t, then you add a competitor to push your protein off. http://bit.ly/streptag
properties of a good affinity tag
✔️ binds something that can be attached to little beads (resin)
✔️ that binding’s reversible
✔️ that binding’s specific (only tagged proteins bind it)
✔️ that binding’s high-affinity (your protein sticks & stays stuck until you’re ready to remove it after washing the other stuff off)
To make sure the tag doesn’t interfere with our protein, we can tether it with a flexible linker and we can design the linker to have a sequence that a site-specific endoprotease (protein scissors) will recognize and cut at so we can cut the tag off after it’s served its purpose. https://bit.ly/proteasess
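Before settling on a protease site for the linker, it can be worth checking that the same recognition sequence doesn’t also show up inside your protein of interest. Here’s a minimal Python sketch of that check. The motif table is an assumption for illustration (commonly quoted consensus sequences; confirm against the datasheet for the protease you actually use), and `find_internal_sites` and the example sequence are made up for this post.

```python
import re

# ASSUMPTION: commonly quoted consensus motifs, written as regular expressions
# over one-letter amino acid codes. Real proteases can tolerate variations,
# so treat a "no hits" result as a first-pass check only.
PROTEASE_SITES = {
    "TEV": r"ENLYFQ[GS]",
    "thrombin": r"LVPRGS",
    "enterokinase": r"DDDDK",
}


def find_internal_sites(protein_seq, protease):
    """Return 1-based start positions of each motif match in protein_seq."""
    pattern = PROTEASE_SITES[protease]
    # Wrapping the motif in a lookahead reports overlapping matches as well.
    return [m.start() + 1
            for m in re.finditer(f"(?={pattern})", protein_seq.upper())]


# Hypothetical sequence with a TEV-like motif starting at residue 3
print(find_internal_sites("MKENLYFQGAAA", "TEV"))
```

An empty list means the motif wasn’t found, so cutting off the tag shouldn’t also chop your protein at that sequence.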
That’s all great, BUT you can’t purify protein if you don’t have any, and sometimes your protein needs some help being made properly. Fusion partners are small proteins you tack onto the end of your protein to help your protein get made in the first place and stay happy. It’s kinda like a “foot in the door” marketing technique – bribe cells to start making something they like making, and then trick them into making your protein that you’ve attached to that thing they like. They fold really quickly & help your protein fold properly & stay soluble & safe.
properties of a good fusion partner:
✔️ expresses easily
✔️ folds quickly
✔️ stays soluble
Some fusion partners include maltose-binding protein (MBP), glutathione S-transferase (GST), thioredoxin (TRX), NusA, and ubiquitin (UB). But the one I use the most is SUMO. GST is used a lot and it’s cool in that it can also serve as an affinity tag (more here: http://bit.ly/2LoRzOg ) but I don’t like to use it because it also has affinity for itself – it pairs up to form dimers. So instead I use SUMO. SUMO stands for Small Ubiquitin-like MOdifier. SUMO is ~100 amino acids (protein building blocks) long (~11 kDa) but for some reason it runs as if it were ~20 kDa in an SDS-PAGE gel (a way to separate proteins by size & visualize them; more here: http://bit.ly/sdspageruler ). It has a compact, globular core with flexible ends – it comes with its own linker! There’s a bonus note on SUMO at the end (no pun intended that time!). But how do SUMO and other fusion partners “work”?
Fusion tags help the proteins fold properly through a few different mechanisms:
🔹 because they’re soluble, & they’re made first, they help keep the protein soluble as it folds (the SUMO wants to stay surrounded with water & it drags your protein with it – harder for your protein to aggregate so it decides to try hiding its hydrophobic parts by refolding instead)
🔹 chaperoning – they bind to aggregation-prone folding intermediates (earlier-translated parts still waiting for their partners to emerge from the ribosome) and prevent their self-association – it also might recruit other chaperones present in the cells to help out
🔹 repelling other molecules &/or “hiding your protein” giving your protein more personal space to fold properly without distractions
Fusion tags can also protect against degradation, such as by taking your protein to “safer” parts of the cell like the nucleus, where there are fewer proteases (protein-cutting enzymes) to worry about. And they can also enhance expression, potentially because the fusion part is so efficiently translated – it gives your protein a sort of translational “boost.”
If you’re having trouble with expression or solubility and fusion partners don’t do the trick, you might need to try some other tricks I’ll mention. But before you jump into trying all these different strategies, you want to make sure you’re optimizing the right aspect, so you want to know if your protein: isn’t being expressed at all; is being expressed partway; is being expressed full-way but misfolded; or is being expressed but degraded. Sometimes, of course, you have a combination of these things, especially because proteins that are expressed “badly” often get degraded.
Sometimes, the first sign of trouble is when nothing comes off of your affinity chromatography column (that tube of beads you used to stick specifically to your protein). In this case, what I normally do is run an SDS-PAGE gel of the lysate (broken-open cell gunk), the pellet from centrifugation, the supernatant (liquid part) from the centrifugation which was your column input, the flow-through from the column (before you started washing), the stuff that came off (eluted) from the column during the washes, and the stuff that came off during the elution step (when you added the competitor to push your protein off). If your protein expressed really strongly you may be able to see a band for it (or at least something similar in size). To really know if it’s your protein, you’ll need to do a western blot using specific antibodies to probe for it (this should work even if you had low expression) or cut out that band and send it for mass spectrometry (mass spec) to see what’s there. http://bit.ly/westernblotworkflow
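To judge whether a band is “similar in size” to what you expect, it helps to have a rough number in mind before you look at the gel. Here’s a small Python sketch using the common rule of thumb of ~110 Da per amino acid. That average, the function names, and the example lengths are all assumptions for illustration; a proper calculation uses the actual sequence, and some proteins run at a different apparent size on a gel than their true mass.

```python
# ASSUMPTION: rule-of-thumb average mass per residue; real values depend on
# the amino acid composition of your protein.
AVG_RESIDUE_MASS_DA = 110


def approx_mass_kda(n_residues):
    """Rough mass in kDa for a chain of n_residues amino acids."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000


def expected_bands_kda(protein_len, partner_len=0):
    """Rough sizes of the species you might see on a gel."""
    return {
        "fusion (uncleaved)": approx_mass_kda(protein_len + partner_len),
        "protein alone": approx_mass_kda(protein_len),
        "partner alone": approx_mass_kda(partner_len),
    }


# Hypothetical construct: 300-residue protein with a 120-residue partner
for name, kda in expected_bands_kda(300, 120).items():
    print(f"{name}: ~{kda:.1f} kDa")
```

Knowing the “partner alone” size is handy when you’re looking for truncation products, since a band at that size suggests the cells made the fusion partner but not much else.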
If the protein was expressed, it will at least be in the lysate fraction. If the protein was in one of the non-pellet fractions, your protein got expressed and was soluble, so the problem’s something with your purification. If it’s in the pellet fraction, there’s a problem with solubility. If the protein isn’t even in the lysate there’s a problem with expression and/or degradation. If the protein’s getting degraded, antibodies might not recognize it, or they might mark lower-size bands than you’re expecting. Those bands could represent degradation or truncation products (the ribosome not making it all the way through the protein, which could also impact solubility). If you have antibodies against both your protein and a tag or fusion partner attached to it, you can do a western blot and probe for each to see if truncated versions are being made, such as just the fusion partner alone. Degradation can occur in the cell, or after lysis. When I lyse cells, I include a protease inhibitor to prevent post-lysis degradation. http://bit.ly/proteaseinhibitors
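If it helps to see that decision logic laid out, here’s one way to sketch it in Python. This is just my paraphrase of the rules above, not a validated tool: the fraction names and the `diagnose` function are made up for illustration, and real gels are often messier (for example, protein split between pellet and supernatant, which this sketch reports as both problems).

```python
# Hypothetical names for the fractions you ran on the gel or blot
SOLUBLE_FRACTIONS = {"supernatant", "flow-through", "wash", "elution"}


def diagnose(fractions_with_band):
    """Map the fractions where your protein shows up to likely problem areas."""
    found = set(fractions_with_band)
    if not found:
        # Not even in the lysate
        return ["expression and/or degradation"]
    problems = []
    if found & SOLUBLE_FRACTIONS:
        # Made and soluble, so look at the purification itself
        problems.append("purification")
    if "pellet" in found:
        problems.append("solubility")
    if not problems:
        # Seen only in the lysate: check the other lanes before concluding
        problems.append("inconclusive")
    return problems


print(diagnose({"lysate", "pellet"}))
print(diagnose({"lysate", "supernatant", "flow-through"}))
print(diagnose(set()))
```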
If you’re troubleshooting a lot of (non-purification problem related) things, you don’t want to have to go through the whole purification thing in order to see if your protein got made well. Thankfully you don’t have to!
If you’re testing for expression (soluble or not), what you can do is take ~1mL of your expression mixture (the cells in media) (pre- and post-induction for comparison if applicable), pellet the cells out by centrifuging it at top speed for ~10min and pouring off the liquid media. If you’re testing for solubility problems, you need to lyse your cells to separate soluble from insoluble and then take a little glob of the insoluble. You can’t run gunk through an SDS-PAGE (it’ll just get stuck in the wells). So, to these, what I normally do is:
- add 50uL 1X SDS loading buffer (this has a detergent so it will help small-scale lyse things but it will also solubilize things so you won’t be able to tell if your protein was soluble)
- boil 10-15min
- centrifuge briefly
- load supernatant (~5uL)
So, hopefully you now have a better idea of what the problem is. Now what can you do to prevent or fix it? Some things you can experiment with that might help are:
- temperature (e.g. for bacteria, try lowering the temperature to 16°C before inducing expression, and then let it express overnight instead of a shorter period at higher temp)
- media (the cell food) – for bacteria, maybe try TB instead of LB http://bit.ly/bacterialmedia
- expression time – how long are you letting the cells grow after telling them to make your protein?
- expression “strength” – if you’re inducing expression, such as with IPTG, you may want to try lowering the IPTG concentration so you don’t overwhelm the cells http://bit.ly/bacoverexpression
- codon usage – each amino acid is spelled by one or more 3-letter RNA words called codons. These codons have corresponding transfer RNAs (tRNAs) which have a complementary “anticodon” and bring the matching amino acid to the ribosome. Some amino acids have multiple codons and multiple tRNAs. Sometimes, certain codons are “more popular” in certain organisms (kinda like how in the US people usually write “gray” but in the UK they spell it “grey”). Therefore, organisms have different ratios of tRNAs in stock and if you change your spellings to match the cells’ preferences, you might have better luck with expression. We call this “codon optimization” and you can learn more about it here: http://bit.ly/codonbias
- the vector (e.g. plasmid), especially which promoter is used (that’s the sequence that helps determine when and how much mRNA gets made)
- the cell line – different cells have different machinery and you might need to experiment, even within a certain “expression system.” For example, there are lots of different strains of bacteria you can use for expression and some have bells and whistles that might help in various circumstances, such as if your protein is toxic to certain cells
- cutting off regions of the protein that are predicted to be disordered (such as long loosey-goosey tails). Such “truncations” can help with expression, but remember that you’re removing part of the protein and even though it’s disordered, it might be important for what you want to test – so chop cautiously and, if you do make a truncation, you should, if at all possible, show that your truncated version acts “identically” to the full-length version in experiments (but of course, to do this, you need to have at least some full-length to compare to! so you might just need to show that the protein is “functional”)
- if you’re not using a fusion partner, what tag are you using, and where are you putting it? (N or C terminus?) You may need to try playing around
- also, check your construct for hidden, unwanted mutations. These can sometimes arise during cloning so sequence the DNA you put in to make sure there aren’t any typos http://bit.ly/sequenceclones
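For the codon usage point in the list above, a quick first look is to count how many codons in your coding sequence are ones your expression host rarely uses. Here’s a minimal Python sketch. The rare-codon set is an assumption for illustration (codons often described as rare in E. coli); look up a codon usage table for your actual host before acting on it. The function names and the example sequence are hypothetical too.

```python
from collections import Counter

# ASSUMPTION: codons often described as rare in E. coli (DNA alphabet).
# Swap in the set that matches your own expression host.
ASSUMED_RARE_CODONS = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC"}


def split_codons(cds):
    """Split an in-frame coding sequence into 3-letter codons."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def rare_codon_report(cds, rare=ASSUMED_RARE_CODONS):
    """Return (counts of each rare codon, fraction of codons that are rare)."""
    codons = split_codons(cds)
    counts = Counter(c for c in codons if c in rare)
    fraction = sum(counts.values()) / len(codons) if codons else 0.0
    return counts, fraction


# Hypothetical mini-gene: ATG AGG AGA CTG TAA
counts, fraction = rare_codon_report("ATGAGGAGACTGTAA")
print(dict(counts), f"{fraction:.0%} rare")
```

A high fraction, or several rare codons bunched together, is a hint that codon optimization (or a strain that carries extra tRNAs) might be worth trying.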
If none of that works you may need to consider using a different expression system: http://bit.ly/2G5N5tY
Bacterial expression is cheapest & easiest (if it works) but bacteria don’t have all the same protein-processing machinery as our cells. If you have a protein that’s more complex (needs special chaperones, gets modified post-translationally with things like glycosylation (sugar addition), etc.) you might need to use more “complex cells.” A step up complexity-wise is yeast cells. I’ve never used them, but I’ve heard they’re really hard to lyse (break open to get your protein out). What I use the most for expression are insect cells (e.g. Sf9, Sf21, S2, High-5). http://bit.ly/bevsinsect Sometimes even those won’t do the trick, so you have to turn to mammalian cell lines like CHO, HEK, or COS. The technology for these is getting better, so it’s easier and more high-yield-y than it used to be, but it’s still usually really expensive and you often don’t get much protein (but if it’s properly-made it’s way more useful than a ton of poorly-made!).
GOOD LUCK EVERYONE!
Now, as promised, SUMOre on SUMO. SUMO’s actually a natural thing – Your cells use SUMO attachment (SUMOylation) as a way to “flag” proteins for things like transport to different parts of the cell – In eukaryotic cells (most things except bacteria), SUMO can be added naturally as a POST-TRANSLATIONAL modification – this means that the instructions for it are NOT in the gene for the protein. It’s added AFTER the protein is made (translated). (a more familiar form of post-translational modification is phosphorylation, which we discussed here: http://bit.ly/threoninetale)
When it’s added post-translationally, it can be at various locations throughout the protein (it likes certain lysines) and it’s not on *all* the protein. We want to make sure ALL of our protein gets the SUMO tag where we want it, so we attach the sequence for SUMO to the sequence for our protein in the plasmid – now SUMO is added DURING translation so whenever our protein’s made, it’s on there. And since it’s made first, it’s more like every time SUMO’s made, our protein’s on there – which is great since the cells like making sum o that SUMO!
Another great thing about SUMO is that it will leave if you ask it to – there are SUMO-specific proteases that you can add to cleave it off. And instead of just recognizing an amino acid sequence (which could, if you’re unlucky, also occur in your protein), the protease also recognizes the shape of SUMO so it will only cut where it’s supposed to. And it doesn’t leave anything behind.
#365DaysOfScience All (with topics listed) 👉 http://bit.ly/2OllAB0