Today I thought I’d go over some background and practical advice for things you can do if you ask a cell to make a protein for you and it refuses (or your protein ends up misfolded and clumped up in insoluble inclusion bodies).
note: I mashed up some old post stuff for the text so apologies for formatting and jumping around
another note: today’s post is pretty in-the-weeds, so if you have any confusion about recombinant protein expression, especially in bacteria, see yesterday’s post: http://bit.ly/bacterialproteinoverexpression
Proteins are chains of amino acids that fold up into functional (and beautiful) 3D shapes. The sequence of amino acids in the chain is determined by the gene. The gene (DNA form) is copied into an RNA version (mRNA) (transcription) then that mRNA is used as instructions to link together the right amino acids. When we express proteins recombinantly, we stick the DNA instructions for a protein into an easy-to-work-w/circular piece of DNA called a plasmid vector (this is the recombining part) then stick that into cells (often harmless bacteria or insect cells) to make the protein for us (this is the expression part). http://bit.ly/proteincleaning
We’re in control of the DNA we put in there, which means we’re in control of the sequence of amino acid protein letters in the protein that gets made (we can even add on extras like affinity tags and fusion partners). But the expression cells are in control of whether that protein actually gets made – and made correctly (though we have ways to try to coax them to).
When you go to purify the protein (assuming it’s a soluble cytoplasmic protein, i.e. not secreted and not a membrane protein), you break open the cells (lyse them), spin that lysate super fast in a centrifuge to pellet out the insoluble stuff (membrane bits, etc.), and then purify the protein of interest from the liquid part (supernatant) containing the soluble things. If you’re trying to purify a protein and you “can’t find it,” there are a few possible culprits:
- the protein wasn’t expressed (the mRNA didn’t get translated)
- the protein expressed but not “properly” so it’s hidden with the insoluble membrane gunk you pelleted out
- the protein was expressed but then degraded
- your ID method is flawed (problem with the purification if that’s how you’re checking or antibodies if you’re checking via western blot)
Sometimes, of course, you have a combination of these things, especially because proteins that are expressed “badly” often get degraded.
The solubility issue is a big thing. During translation, the amino acids are linked (w/help of ribosomes) 1 at a time to the end of a growing chain, going from N-terminus to C-terminus. The protein chain starts folding as it emerges from the ribosome’s “chimney,” sometimes with the help of proteins called chaperones. *Correct* folding of the protein is really important for solubility! Solubility is whether each molecule of something is fully coated in water. Properly folded proteins should be soluble (or embedded in a lipid membrane if they’re a membrane protein). Proteins are able to be soluble because they fold so that parts of the protein that like to interact with water (are hydrophilic) are on the outside & parts that water doesn’t like (are hydrophobic) are sequestered away in the center. (Solubility is something we crystallographers think about a lot.)
If a protein doesn’t fold properly, those water-hating hydrophobic parts get exposed to water & panic – so they stick to the hydrophobic parts of other misfolded proteins (because that’s better than being next to water) -> they clump up & aggregate. This is a helpful way to think about it, although technically, it’s more like the water molecules don’t like them, so the water molecules do whatever they can to link to each other instead of your protein, which forces the hydrophobic protein parts together. This is the hydrophobic exclusion effect and you can learn more about it here: http://bit.ly/hydrophobesarenotafraid
Sometimes, especially when you try expressing a non-bacterial protein in bacteria, your protein folds incorrectly and it ends up in blobs of insoluble gunk called inclusion bodies. These inclusion bodies consist mainly of aggregated (clumped-together) misfolded protein. This especially tends to happen when the cells make too much of the protein too quickly, so you might be able to avoid them forming altogether by using less “aggressive” expression techniques, as I’ll get into later in the post (e.g. lower the temp, express for a shorter period, induce with less IPTG).
I will go into some tips and tricks for things you can do to try to fix what’s wrong, but first it’s important to know what’s wrong so you know what needs fixing!
Sometimes, the first sign of trouble is when nothing comes off of your affinity chromatography column (that tube of beads you used to stick specifically to your protein). In this case, what I normally do is run an SDS-PAGE gel of the lysate (broken-open cell gunk), the pellet from centrifugation, the supernatant (liquid part) from the centrifugation which was your column input, the flow-through from the column (before you started washing), the stuff that came off (eluted) from the column during the washes, and the stuff that came off during the elution step (when you added the competitor to push your protein off). If your protein expressed really strongly you may be able to see a band for it (or at least something similar in size). To really know if it’s your protein, you’ll need to do a western blot using specific antibodies to probe for it (this should work even if you had low expression) or cut out that band and send it for mass spectrometry (mass spec) to see what’s there. http://bit.ly/westernblotworkflow
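To know roughly where on the gel to look for that band, you can estimate the expected size from the number of amino acids in your construct. Here's a minimal sketch, assuming the common rule of thumb of ~110 Da per amino acid (an average, not an exact mass, and some proteins run at a different apparent size on a gel). The function name and the example lengths are made up for illustration.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average mass per amino acid (an assumption)

def estimate_mass_kda(*segment_lengths):
    """Rough expected mass (kDa) of a construct, given the amino acid
    counts of its parts (e.g. tag, fusion partner, protein of interest)."""
    if any(n < 0 for n in segment_lengths):
        raise ValueError("lengths must be non-negative")
    total_residues = sum(segment_lengths)
    return total_residues * AVG_RESIDUE_MASS_DA / 1000.0

# hypothetical example: a 100-residue fusion partner on a 300-residue protein
print(f"{estimate_mass_kda(100, 300):.0f} kDa")
```

If you cleave the tag off later, run the estimate again without it so you know where the band should shift to.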
If the protein was expressed, it will at least be in the lysate fraction. If the protein was in one of the non-pellet fractions, your protein got expressed and was soluble, so the problem’s something with your purification. If it’s in the pellet fraction, there’s a problem with solubility. If the protein isn’t even in the lysate there’s a problem with expression and/or degradation. If the protein’s getting degraded, antibodies might not recognize it, or they might mark lower-size bands than you’re expecting. Those bands could represent degradation or truncation products (the ribosome not making it all the way through the protein, which could also impact solubility). If you have antibodies against both your protein and a tag or fusion partner attached to it, you can try doing a western blot and probing for both to see if truncated versions are being made, such as just the fusion partner alone. Degradation can occur in the cell, or after lysis. When I lyse cells, I include a protease inhibitor to prevent post-lysis degradation. http://bit.ly/proteaseinhibitors
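The "which lane is it in?" logic above can be written out as a tiny decision function. This is just a sketch of the reasoning in this post, not a validated tool: the lane names and return labels are hypothetical, and the "inconclusive" case (protein seen only in the lysate lane) is my own addition for results the rules above don't cover.

```python
# Lanes that contain only soluble material (hypothetical labels)
SOLUBLE_LANES = {"supernatant", "flow_through", "wash", "elution"}
ALL_LANES = SOLUBLE_LANES | {"lysate", "pellet"}

def diagnose(lanes_with_protein):
    """Guess the likely problem from the lanes where the protein was detected."""
    found = set(lanes_with_protein)
    unknown = found - ALL_LANES
    if unknown:
        raise ValueError(f"unknown lane name(s): {sorted(unknown)}")
    if found & SOLUBLE_LANES:
        # expressed and at least partly soluble -> look at the purification
        return "purification"
    if "pellet" in found:
        # expressed, but only in the insoluble fraction
        return "solubility"
    if "lysate" in found:
        # seen in lysate but nowhere downstream: rerun before concluding anything
        return "inconclusive"
    # not even in the lysate
    return "expression_or_degradation"

print(diagnose({"lysate", "pellet"}))
```

Note that a protein showing up in both the pellet and the supernatant is only partly soluble, so you may have a solubility problem on top of a purification one.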
If you’re troubleshooting a lot of (non-purification problem related) things, you don’t want to have to go through the whole purification thing in order to see if your protein got made well. Thankfully you don’t have to!
If you’re testing for expression (soluble or not), what you can do is take ~1mL of your expression mixture (the cells in media) (pre- and post-induction for comparison if applicable), pellet the cells out by centrifuging it at top speed for ~10min and pouring off the liquid media. If you’re testing for solubility problems, you need to lyse your cells to separate soluble from insoluble and then take a little glob of the insoluble. You can’t run gunk through an SDS-PAGE (it’ll just get stuck in the wells). So, for these samples, what I normally do is:
1) add 50uL 1X SDS loading buffer (this has a detergent so it will help small-scale lyse things but it will also solubilize things so you won’t be able to tell if your protein was soluble)
2) boil 10-15min
3) centrifuge briefly
4) load supernatant (~5uL)
So, hopefully you now have a better idea of what the problem is. Now what can you do to prevent or fix it? Some things you can experiment with that might help are:
- temperature (e.g. for bacteria, try lowering the temperature to 16°C before inducing expression, and then let it express overnight instead of a shorter period at higher temp)
- media (the cell food) – for bacteria, maybe try TB instead of LB http://bit.ly/bacterialmedia
- expression time – how long are you letting the cells grow after telling them to make your protein?
- expression “strength” – if you’re inducing expression, such as with IPTG, you may want to try lowering the IPTG concentration so you don’t overwhelm the cells http://bit.ly/bacoverexpression
- codon usage – each amino acid is spelled by one or more 3-letter RNA words called codons. These codons have corresponding transfer RNAs (tRNAs) which have a complementary “anticodon” and bring the matching amino acid to the ribosome. Some amino acids have multiple codons and multiple tRNAs. Sometimes, certain codons are “more popular” in certain organisms (kinda like how in the US people usually write “gray” but in the UK they spell it “grey”). Therefore, organisms have different ratios of tRNAs in stock and if you change your spellings to match the cells’ preferences, you might have better luck with expression. We call this “codon optimization” and you can learn more about it here: http://bit.ly/codonbias
- the vector (e.g. plasmid), especially which promoter is used (that’s the sequence that helps determine when and how much mRNA gets made)
- the cell line – different cells have different machinery and you might need to experiment, even within a certain “expression system.” For example, there are lots of different strains of bacteria you can use for expression and some have bells and whistles that might help in various circumstances, such as if your protein is toxic to certain cells
- cutting off regions of the protein that are predicted to be disordered (such as long loosey-goosey tails). Such “truncations” can help with expression, but remember that you’re removing part of the protein and even though it’s disordered, it might be important for what you want to test – so chop cautiously and, if you do make a truncation, you should, if at all possible, show that your truncated version acts “identically” to the full-length version in experiments (but of course, to do this, you need to have at least some full-length to compare to! so you might just need to show that the protein is “functional”)
- what tag are you using, and where are you putting it? (N or C terminus?) You may need to try playing around
- also, check your construct for hidden, unwanted mutations. These can sometimes arise during cloning so sequence the DNA you put in to make sure there aren’t any typos http://bit.ly/sequenceclones
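For the codon usage point in the list above, a quick first check is to count how many codons in your coding sequence are ones the host rarely uses. Here's a minimal sketch, assuming E. coli as the host and a short, commonly cited list of rare codons (written in DNA letters). The right list depends on your strain, so treat RARE_CODONS as a placeholder to swap for a real codon usage table. The function name and the toy sequence are made up.

```python
# Commonly cited rare codons in E. coli (assumption; replace with your host's table)
RARE_CODONS = {"AGG", "AGA", "ATA", "CTA", "CCC", "GGA"}

def rare_codon_report(cds, rare=RARE_CODONS):
    """Return (hits, fraction): hits is a list of (codon_number, codon) for
    each rare codon, fraction is rare codons / total codons."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    hits = [(i + 1, c) for i, c in enumerate(codons) if c in rare]
    fraction = len(hits) / len(codons) if codons else 0.0
    return hits, fraction

# toy sequence, not a real gene
hits, fraction = rare_codon_report("ATGAGGCTAAAATAA")
print(hits, fraction)
```

Lots of hits, or several in a row, might be a hint to try codon optimization or a strain that carries extra copies of the rarer tRNAs.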
If your protein ends up in inclusion bodies, all may not be lost. Some proteins can be rescued from them and refolded, but not all proteins are cool with that. Here’s a link to some more information https://www.sigmaaldrich.com/technical-documents/protocols/biology/affinity-chromatography-tagged-proteins/handling-inclusion-bodies.html
Another way to promote proper folding is to change the DNA instructions to add a fusion partner onto your protein. Fusion partners are small proteins you tack onto the end of your protein to help your protein get made in the first place and stay happy. It’s kinda like a “foot in the door” marketing technique – bribe cells to start making something they like making, and then trick them into making your protein that you’ve attached to that thing they like. They fold really quickly & help your protein fold properly & stay soluble & safe.
properties of a good fusion partner:
✔️ expresses easily
✔️ folds quickly
✔️ stays soluble
Some fusion partners include maltose-binding protein (MBP), glutathione S-transferase (GST), thioredoxin (TRX), NusA, and ubiquitin (UB). But the one I use the most is SUMO. GST is used a lot and it’s cool in that it can also serve as an affinity tag (more here: http://bit.ly/2LoRzOg ) but I don’t like to use it because it also has affinity for itself – it pairs up to form dimers. So instead I use SUMO. SUMO stands for Small Ubiquitin-like MOdifier. SUMO is ~100 amino acids (protein building blocks) long (~11-12 kDa) but for some reason it runs as if it were ~20 kDa in an SDS-PAGE gel (a way to separate proteins by size & visualize them; more here: http://bit.ly/sdspageruler ). It has a compact, globular core with flexible ends – it comes with its own linker! (There’s a bonus note on SUMO at the end, no pun intended that time!) But how do SUMO and other fusion partners “work”?
Fusion tags help the proteins fold properly through a few different mechanisms:
- because they’re soluble, & they’re made first, they help keep the protein soluble as it folds (the SUMO wants to stay surrounded with water & it drags your protein with it – harder for your protein to aggregate so it decides to try hiding its hydrophobic parts by refolding instead)
- chaperoning – they bind to aggregation-prone folding intermediates (earlier-translated parts still waiting for their partners to emerge from the ribosome) and prevent their self-association – it also might recruit other chaperones present in the cells to help out
- repelling other molecules &/or “hiding your protein” giving your protein more personal space to fold properly without distractions
Fusion tags can also protect against degradation, such as by taking your protein to “safer” parts of the cell like the nucleus, where there are fewer proteases (protein-cutting enzymes) to worry about. And they can also enhance expression, potentially because the fusion part is so efficiently translated that it gives your protein a sort of translational “boost.”
If none of that works you may need to consider using a different expression system: http://bit.ly/2G5N5tY
Bacterial expression is cheapest & easiest (if it works) but bacteria don’t have all the same protein-processing machinery as our cells. If you have a protein that’s more complex (needs special chaperones, gets modified post-translationally with things like glycosylation (sugar addition), etc.) you might need to use more “complex cells.” A step up complexity-wise is yeast cells. I’ve never used them, but I’ve heard they’re really hard to lyse (break open to get your protein out). What I’ve used the most for expression are insect cells (e.g. Sf9, Sf21, S2, High-5). http://bit.ly/bevsinsect Sometimes even those won’t do the trick, so you have to turn to mammalian cell lines like CHO, HEK, or COS. The technology for these is getting better, so it’s easier and more high-yield-y than it used to be, but it’s still usually really expensive and you often don’t get much protein (but a little properly-made protein is way more useful than a ton of poorly-made!).