In recombinant protein engineering and expression in bacteria, where you’re getting bacterial cells to make a protein you’re interested in, you edit circular pieces of DNA called plasmid vectors to contain the genetic instructions for the protein you want to study, then you stick that plasmid into bacteria in a process called transformation. But what type of bacteria? We can transform different types of cells depending on what our goals are. Broadly speaking, there are “cloning cells” and “expression cells” – we use “cloning cells” like DH5α when we want to get lots of copies of the DNA (not the protein) and “expression cells” like BL21(DE3) when we care more about getting protein than DNA. So let’s discuss the DH5-alpha-bet soup!

*note: this post might seem random but I think it hopefully answers a question someone asked me

*note 2: technically speaking, the genetic instructions we’re putting in as the insert aren’t the “gene” – instead, they’re the cDNA (complementary DNA), which is the DNA version of the messenger RNA (mRNA) version of a gene. Basically, it’s an edited version of the gene with the introns and regulatory regions removed, but it’s easier to just talk in terms of the “gene.”

There are 2 main stages to recombinant protein making. In the first stage, molecular cloning, you’re sticking that “gene” into a plasmid, using bacteria to make lots of copies of it, and checking that you got it right. In the second stage, expression, you’re sticking that plasmid that you made into different bacteria and asking them to make lots of the protein for you. You often want to use different bacterial strains for these different stages. 

molecular cloning stage: 

There are different molecular cloning techniques, including cutting the parent vector open with restriction enzymes and pasting your gene in with DNA ligase, or using PCR-based methods like SLIC (sequence- and ligation-independent cloning).

At this stage, you want the bacteria to 1) fill in any gaps that are left over (if you’re using SLIC cloning) and 2) make more copies of the plasmid *without changing it at all!* You already made the changes you want and you don’t want the bacteria adding changes you don’t want. Your key concern is stability, so you want cells that have disabled “changing machinery.” And, ideally, we’d like them to be “easy to get into.” Turns out DH5α fits the bill!

DH5α is a type of E. coli. In their bacterial lineage, the first ancestor we have records of for these guys is “K-12,” which came from the “wild” – it was isolated from a diphtheria patient’s poop in 1922 at Stanford. We call such naturally-occurring strains “wild-type.”

Leave them alone and, over time, bacteria develop mutations randomly, but scientists can “speed things up” by blasting them with radiation or adding chemicals called mutagens that do things like make breaks in the DNA and/or keep the bacteria’s fixing machinery from working.

And scientists have done this to K-12 lots because, even though it came from a patient, it lacks virulence genes so it “can’t” hurt us – it’s considered nonpathogenic.

So you take K-12, then mutate, mutate, mutate, and you get to important descendants like MG1655 (the K-12 strain often used as a reference), DH5α, and its cousin DH10B (aka TOP10 if you want to get all commercial).

The “DH” comes from the name of its isolator, Douglas Hanahan. I’m guessing the 5’s cuz he isolated a bunch of different strains and this happened to be the “5th” in some experiment. And the alpha’s cuz it has the alpha complementation allele φ80lacZΔM15, which allows for blue-white screening.

There are lots of strains with many names – remembering them all can be a pain. Especially since the name’s not really what we care about, we care about what they can do for us. And what they can do for us depends on what their “genotype” is – what genes does it have? are they functional? We can use a sort of “shorthand notation” to describe the key genetic features of a particular strain. 

Genes usually have 3-letter abbreviations, often followed by a letter and/or an allele number. We use a delta (Δ) if some gene’s been deleted, a +/- if it has or doesn’t have some trait, and an “r” for resistance (e.g. to some antibiotic).

The mutations that make DH5α great for us are:

  • recA1 – it has a mutation in the gene for the protein recA which is needed for some types of homologous recombination, which would change up the DNA sequences which we don’t want. (Thankfully there are other recombination mechanisms that will finish our SLICs but won’t mess up other stuff)
  • endA1 – it has a mutation in the gene for endonuclease I, a nonspecific DNA chewer, so the plasmid DNA is more stable
    • endA1 strains come thanks to Hoffmann-Berling, who took a bunch of bacteria and added a chemical mutagen to speed up mutations, then tested the mutated cells’ nucleic-acid chewing activity and isolated some strains that had lost their generic DNA chewingness
  • relA1 – alters the membrane composition so it’s easier to get into & “relaxes” the cell by removing its ability to “stringently respond” to low amino acid (protein building block) levels by halting RNA production -> relA1 allows the cell to keep transcribing RNA even if it can’t make protein from it yet
  • φ80lacZΔM15 – provides the omega (ω) fragment of β-galactosidase that complements the alpha peptide in blue/white screening (if your plasmid encodes the alpha peptide, the two fragments can come together to make a functional β-galactosidase enzyme that can turn a colorless lactose mimic, X-gal, blue)

There are other mutations too, including hsdR17(rK–, mK+), which allows for EcoKI methylation but not EcoKI restriction (degradation of unmethylated DNA); gyrA96, which gives it nalidixic acid resistance; thi-1, which makes it a thiamine auxotroph (can’t make its own thiamine); Δ(lacZYA-argF)U169, a deletion that leads to increased hydrogen peroxide resistance; and others.
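To make the “shorthand notation” idea concrete, here’s a minimal sketch of a genotype lookup. The dictionary and the function name `describe_genotype` are made up for illustration, and the descriptions are just paraphrased from this post (not from a supplier’s datasheet), so treat it as a note-taking aid rather than a reference.

```python
# Hypothetical lookup table: marker -> short meaning, paraphrased from this post.
DH5ALPHA_MARKERS = {
    "recA1": "reduced homologous recombination, so the insert stays stable",
    "endA1": "no endonuclease I activity, so plasmid DNA is more stable",
    "relA1": "relaxed phenotype: RNA synthesis continues when amino acids are low",
    "φ80lacZΔM15": "supplies the lacZ fragment needed for blue/white screening",
    "hsdR17(rK-, mK+)": "EcoKI methylation without EcoKI restriction",
    "gyrA96": "nalidixic acid resistance",
    "thi-1": "thiamine auxotroph",
    "Δ(lacZYA-argF)U169": "deletion linked to increased hydrogen peroxide resistance",
}


def describe_genotype(markers):
    """Return one annotated line per marker, flagging deletions (Δ) and unknowns."""
    lines = []
    for marker in markers:
        meaning = DH5ALPHA_MARKERS.get(marker, "not in this lookup table")
        # A delta anywhere in the marker name signals that something was deleted.
        tag = "deletion" if "Δ" in marker else "allele"
        lines.append(f"{marker} [{tag}]: {meaning}")
    return lines


for line in describe_genotype(["recA1", "endA1", "Δ(lacZYA-argF)U169"]):
    print(line)
```

A real strain sheet lists more markers than this, so anything missing from the table is reported as unknown instead of being guessed.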

It’s not that we need *all* of these mutations, but as long as they don’t get in the way, we’ll let them stay!

And, after all the mutagenesis these cells have gone through, there are tons more mutations that just haven’t been shown to be “interesting” yet. A lot of mutations are only discovered under conditions when the affected protein’s needed – like, you wouldn’t know that there’s a mutation in the thiamine metabolism gene if you always grew the cells with plenty of thiamine. So a lot of the “new” mutations scientists discover have likely been there (or something close) for quite a while.

All this is housed in a single, circular chromosome with ~4.6 million DNA letters & ~4,500 genes.

When you stick your plasmid into the bacteria, copies of the plasmid will be housed extra-chromosomally – they will stay separate from the bacterial chromosome but get copied and passed along when the bacteria divide. How many copies? That depends on the plasmid. Plasmids have their own origin of replication (ORI), the sequence in the DNA that tells the replication machinery where to unzip the double-stranded DNA & copy each strand to give you an identical copy. Having their own ORI lets plasmids replicate independently of the host chromosome (the host only replicates its chromosome right before dividing, and the plasmids don’t want to wait). So host cells can hold lots of copies of the plasmid, and the average # of plasmid copies per cell (the copy number) depends on the plasmid, and especially on its ORI. Some produce lots of copies, others just a few because their ORIs are regulated differently -> “relaxed” will give you lots of copies, “stringent” just a few.

It comes down to a balance between + regulatory factors (make more copies!) and – regulatory factors (make fewer copies!), which depends on the sequence of the ORI & what factors like to bind it. And those regulatory factors can be really picky! Just 2 mutations in the pMB1 ORI give you the pUC ORI, which makes ~700 copies/cell as opposed to ~20.
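To get a feel for what that copy number difference means at the bench, here’s a rough back-of-the-envelope sketch. The plasmid size (3,000 bp) and the cell count (1e9 cells) are assumptions I picked for illustration, not numbers from a real prep, and ~650 g/mol per base pair is the usual approximation for double-stranded DNA.

```python
AVOGADRO = 6.022e23          # molecules per mole
GRAMS_PER_MOL_PER_BP = 650   # approximate mass of one double-stranded base pair


def plasmid_mass_ng(copies_per_cell, plasmid_bp, cells):
    """Approximate total plasmid DNA mass (in ng) across a population of cells."""
    molecules = copies_per_cell * cells
    grams = molecules * plasmid_bp * GRAMS_PER_MOL_PER_BP / AVOGADRO
    return grams * 1e9  # convert grams to nanograms


PLASMID_BP = 3000  # hypothetical plasmid length
CELLS = 1e9        # hypothetical number of cells harvested

low_copy = plasmid_mass_ng(20, PLASMID_BP, CELLS)    # pMB1-like, ~20 copies/cell
high_copy = plasmid_mass_ng(700, PLASMID_BP, CELLS)  # pUC-like, ~700 copies/cell

print(f"low copy:  {low_copy:.0f} ng")
print(f"high copy: {high_copy:.0f} ng")
print(f"fold difference: {high_copy / low_copy:.0f}x")
```

The absolute numbers depend entirely on the assumed plasmid size and cell count, but the fold difference just follows the copy number ratio (700/20 = 35).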

But let’s get back to the cells. After you’ve done your cloning, you stick the plasmid into cells like DH5α for them to make lots of copies of it. You might be more familiar with using PCR (polymerase chain reaction) to make copies, but, unlike PCR which is great for copying short stretches of linear DNA, bacteria are really good at making copies of long, sealed circles of DNA. Which is good because you need to make a lot of the plasmid so that you can check it (e.g. with sequencing, colony PCR, and/or analytical restriction digest) and then, if all’s okay, stick it into the expression cells. 

Before you stick it into those other cells, you need to purify it so that 1) you don’t just stick one bacterium’s genome into another and 2) you stick in LOTS of plasmid DNA so you have the best chance of some of it actually getting in.

Purification is frequently done with “miniprep” kits which use alkaline lysis to break open the cells and separate the plasmid from the larger genomic DNA.

So now let’s briefly discuss expression cells, where the key is to make lots of protein. Often, for bacterial expression we use inducible expression with the T7 RNA polymerase system. Basically, T7 is a bacteriophage (phage), a virus that infects bacteria. It makes its own RNA polymerase (T7 Pol) that recognizes its own special promoter sequence and makes lots of mRNA copies of the genes following that sequence. And then the bacterial ribosomes make lots of copies of the corresponding protein. Because T7 Pol is so active & exclusive, it can easily swamp out the bacterial mRNA. So, if we put a T7 PROMOTER before our gene, a T7 TERMINATOR after it & give it some T7 Pol, we can get bacteria to OVEREXPRESS our protein. With bacterial overexpression, you get the bacteria to devote almost all their resources to expressing your gene – after just a few hours over ½ of all protein in the cell could be yours.

BUT because the bacterial cells are devoting themselves to making our protein, they’re neglecting their own needs, including reproduction. One reason bacteria are so useful in the lab is that their population booms rapidly, because it doesn’t take them long to copy all their DNA (DNA replication) and then split in half, giving each new cell a copy. That takes a lot of energy and resources, which the bacteria don’t have if they’re devoting themselves to T7-driven protein-making. T7 doesn’t care about this, but we *do*, because we need to be able to grow the cells to get enough cells to express lots of our protein.

One way to keep the cells growing is to just not give them T7 Pol – that special polymerase that makes the RNA copies of the T7 genes (which ribosomes use to make T7 proteins) or of anything that “looks” like a T7 gene because it’s under the control of a T7 promoter (like the gene we want to express). And in fact, if you look at a pET vector, you’ll see it does NOT have the T7 Pol gene. So how does our protein get made? Wasn’t the whole point of using the T7 promoter to make a lot of it?! Don’t worry – we still have the T7 Pol gene – we just keep it separate so we can activate it “on command.”

We rely on the bacterial host DNA and NOT the plasmid DNA to provide T7 Pol. Bacteria don’t normally have this gene (it’s from a virus that wants to sabotage it, remember), but specific strains of bacteria have been designed so they DO. If we’re still in the cloning phase & only want to make more copies of the plasmid ⭕️ -> ⭕️⭕️⭕️… we can stick the plasmid in bacteria that don’t have the T7 Pol gene (strains like DH5α or TOP10). And then, when we want to express it, we stick it into bacteria that DO have it (like BL21(DE3)). 

BUT we still want more control – we want to be able to control when those bacteria that *have* the T7 Pol gene actually *make* T7 Pol. So we steal from another clever biological setup, the lac operon, to be able to control *when* we express the protein by adding IPTG (a lactose mimic) to “derepress” the operon.

This works because, in the bacterial genome, the T7 Pol gene is under the control of a lac promoter (the lacUV5 variant) in a sequence that belongs to the DE3 prophage. Basically, it’s a bit of bacterial-genome-inserted viral information that tells the cells to make T7 Pol when you want it. And then that T7 Pol will make mRNA of your gene and the ribosomes will make protein from it (there’s no nucleus in bacteria, so this is all happening in the same place, with transcription (mRNA making) and translation (protein making) happening simultaneously).
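If it helps to see the “who needs what” logic laid out, here’s a toy sketch of the on/off decision. The `Strain` class and `target_expressed` function are hypothetical names for illustration, and the model is deliberately idealized: real cells show some “leaky” expression even without IPTG, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class Strain:
    name: str
    has_t7_pol_gene: bool  # True for DE3 lysogens such as BL21(DE3)


def target_expressed(strain, plasmid_has_t7_promoter, iptg_added):
    """Idealized logic: T7 Pol is only made if the gene is present AND induced."""
    t7_pol_made = strain.has_t7_pol_gene and iptg_added
    # The target gene is only transcribed if T7 Pol exists and can find its promoter.
    return t7_pol_made and plasmid_has_t7_promoter


dh5a = Strain("DH5α", has_t7_pol_gene=False)
bl21_de3 = Strain("BL21(DE3)", has_t7_pol_gene=True)

print(target_expressed(dh5a, True, True))       # cloning strain: no T7 Pol gene
print(target_expressed(bl21_de3, True, False))  # expression strain, not induced
print(target_expressed(bl21_de3, True, True))   # expression strain, induced
```

The point of the sketch is that all three conditions have to line up: a T7 promoter on the plasmid, a T7 Pol gene in the host, and IPTG to switch that gene on.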

Sometimes, you may need to try “fancier” strains if your protein isn’t expressing well. These strains often have additional “helper” plasmids with extra genes to promote expression of your gene and/or protect the cell from any toxic effects the gene might have.

For example, one reason some proteins don’t express well in bacteria is that bacteria have different codon usage than we do. Basically, mRNA contains the instructions for making proteins, and 3 consecutive RNA letters spell one amino acid (protein letter). These 3-letter words are called codons, and when the protein-making machinery (the ribosome) encounters a codon, a transfer RNA (tRNA) with the complementary 3-letter “anticodon” brings the corresponding amino acid.
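Here’s a tiny sketch of the codon/anticodon pairing idea. The function name `anticodon` is just for illustration, and it assumes perfect Watson-Crick pairing; real tRNAs also use “wobble” pairing and modified bases, which this ignores.

```python
# RNA base-pairing partners (Watson-Crick only; wobble pairing is ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon):
    """Return the perfectly complementary anticodon, written 5'->3'."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in COMPLEMENT for base in codon):
        raise ValueError("a codon must be exactly 3 RNA letters (A, U, G, C)")
    # The strands pair antiparallel, so reverse the codon before complementing.
    return "".join(COMPLEMENT[base] for base in reversed(codon))


print(anticodon("CUG"))  # CAG
print(anticodon("AUG"))  # CAU
```

The reversal matters because the two RNA strands run in opposite directions when they pair, so the anticodon’s first letter sits across from the codon’s last letter.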

Multiple codons can spell the same amino acid and some amino acids therefore have multiple tRNAs that can be used to make them. Different organisms prefer different spellings (they use certain codons more frequently) and thus they “stock up” on the corresponding tRNA. If your gene has a bunch of “rare” codons, the bacteria might get held up waiting for tRNA. 

In codon optimization, you go through and edit your sequence to use the spelling the expressing cells like. For example, there are 6 codons that spell Leucine (Leu, L) & E. coli have 4 Leu tRNAs. The tRNA that recognizes CUG is very abundant, whereas the one for CUA is rare, so swapping CUA for CUG can lead to the recruitment of a more common tRNA and less holdup.
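As a minimal sketch of that swap, here’s a function that walks through an mRNA codon by codon and replaces “rare” spellings with preferred ones. The function name `swap_codons` is made up, and the default table only contains the one CUA -> CUG example from above; a real codon optimization tool uses a full usage table and considers much more than this.

```python
def swap_codons(mrna, preferred=None):
    """Replace in-frame codons using a {rare_codon: preferred_codon} table."""
    if preferred is None:
        # Only the Leu example from the text; both codons spell leucine.
        preferred = {"CUA": "CUG"}
    mrna = mrna.upper()
    if len(mrna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3 (whole codons)")
    # Split into in-frame codons so out-of-frame matches are left alone.
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    return "".join(preferred.get(codon, codon) for codon in codons)


print(swap_codons("AUGCUACUAGGC"))  # AUGCUGCUGGGC
print(swap_codons("ACUAGG"))        # ACUAGG (the "CUA" here is out of frame)
```

Splitting into codons first is the important design choice: a plain find-and-replace on the string would also hit “CUA” runs that straddle two codons and change the protein.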

But a (cheaper) alternative to codon optimization is expression of alternative tRNAs – basically, instead of changing your sequence to match tRNA availability, you change tRNA availability to match your sequence, getting the cells to make more of the tRNAs your mRNA calls for. For example, Rosetta™️ E. coli strains contain a “pRARE” plasmid carrying genes for tRNAs that read codons more common in humans than in bacteria (for example, it adds copies of the CUA-readers as well as tRNAs that recognize AGG, AGA, AUA, CCC & GGA), so human proteins can be expressed in E. coli with fewer translational holdups.
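To decide whether a strain like that is even worth trying, you could first count how many of those rare codons your sequence has. Here’s a rough sketch; the function name `rare_codon_report` is hypothetical and the codon set is just the six codons named above, so it says nothing about any other rare codons.

```python
from collections import Counter

# The six codons named in the text as covered by pRARE (written as RNA).
PRARE_CODONS = {"CUA", "AGG", "AGA", "AUA", "CCC", "GGA"}


def rare_codon_report(mrna):
    """Return (per-codon counts, fraction of all codons) for the pRARE codons."""
    mrna = mrna.upper()
    # Read in frame from the first letter; a trailing partial codon is ignored.
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - len(mrna) % 3, 3)]
    counts = Counter(codon for codon in codons if codon in PRARE_CODONS)
    fraction = sum(counts.values()) / len(codons) if codons else 0.0
    return counts, fraction


counts, fraction = rare_codon_report("AUGAGGAGACUAGGCCCC")
print(dict(counts))
print(f"{fraction:.0%} of codons are on the rare list")
```

A high fraction (or several rare codons in a row) would be a hint that tRNA supply could be a bottleneck, though how much is “too much” is a judgment call this sketch doesn’t make.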

This can be great because it can help out for expressing a whole host of different proteins – but your host might not think it’s so great… Cells have evolved to work best at their natural tRNA #s/proportions, so if you skew those you can anger them and this can have effects including reduced cell growth.

That’s just a couple of examples of commonly-lab-used bacterial cells – biased by what our lab happens to use. But I hope it gives you a general idea of some of the qualities to look for with whatever you’re using!

If you want more information, I highly recommend Addgene’s blog.

