Making RNA is all the rage and if you want to make a lot a lot of copies of a specific sequence, you’re probably gonna take a page out of the book of the T7 phage! (or at least take its DNA-dependent RNA Polymerase (RNAP)!) You know all those mRNA vaccines? Part of the reason it’ll take so long to get them to everyone is that you have to make them. Which involves a lot of in-vitro transcription. Transcription is the process of making RNA copies of DNA. And “in vitro” just means that we’re doing it basically “in a test tube” (or a vat or something, just not in an actual organism). So, “in vitro transcription” is a way to make lots of RNA from a DNA template, often using the RNAP of a bacteriophage (bacteria-infecting virus) called T7, without even harming bacteria in the process! And that’s far from all that T7 RNAP is good for – we can use it to help us express proteins too. So today I want to promote T7 RNAP and the T7 promoter!
The idea with mRNA vaccines is that you get your body to make a part of the virus which on its own is harmless so that your body learns to recognize it as foreign and make antibodies against it. Basically, you take messenger RNAs (mRNAs), which are copies of the recipe for making a protein (in the case of the SARS-CoV-2 coronavirus, the recipe is for the Spike protein) and get those mRNAs into cells (often snuck in by encapsulation in a lipid coat). These mRNAs are then used by the cell’s ribosomes (protein-making complexes) to make the corresponding protein, Spike, which then gets displayed to the immune system. If you want to learn more about them, I’ve covered them in the past: https://bit.ly/modernamrnavaccine
They’ve been getting lots of attention. As have potential timelines for manufacturing. But what’s been getting less attention is how that manufacturing actually happens. So, I thought it would be cool to do a post on in-vitro transcription, and you know I like to know the history of techniques, so I went looking. And I found something really disturbing. The original paper describing the discovery and purification of T7 RNA Polymerase had only been cited 133 times! And it’s a really beautiful piece of biochemistry. So I thought I’d walk you through it so it will at least be “sighted” more even if it isn’t “cited” more!
I was so surprised it hadn’t been cited more because scientists use T7 RNAP *all the time*! In addition to in-vitro transcription, we use it for recombinant over-expression of proteins in bacteria – basically we “recombine” a gene of interest with a plasmid vector, which is a circular piece of DNA we can stick into bacteria. And we get the bacteria to make mRNA from it and then protein from that mRNA. Often, we use T7 RNAP to do the mRNA making. As I’ll get into, this lets us control expression and get lots of protein. Basically we can hijack the virus that hijacks the bacteria so that the bacteria start devoting themselves to making a protein that we want when we want it.
A couple terminology notes: “gene” is basically just a stretch of DNA that gets transcribed, kinda like a chapter in a book. Some, but not all, genes have “recipes” for making proteins. The gene is transcribed by RNAP into mRNA and that is translated by ribosomes into protein. In animal cells, these processes are separated, but in bacteria, transcription is coupled to translation so that proteins are made on the mRNA as it’s made.
And if you want to learn more about that process, check out: https://bit.ly/translationtimestwo
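If it helps to see the gene → mRNA → protein flow written out, here’s a minimal Python sketch. The mini “gene” and the tiny codon table are made up for illustration (the table only covers the codons in this example, not the full genetic code), and the function names are my own, not from any library.

```python
# Toy codon table: only the codons used in this example, NOT the full genetic code.
CODON_TABLE = {
    "AUG": "M",  # start codon (methionine)
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "UAA": "*",  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA matching the coding strand (same letters, T swapped for U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon.

    Codons missing from the toy table are treated as a stop, to keep the sketch simple.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "*")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "CCATGTTTGGCTAAGG"  # hypothetical mini "gene" (coding strand)
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)
print(protein)
```

In a real cell, RNAP reads the template strand and builds the complementary RNA; swapping T for U on the coding strand is just a shortcut that gives the same sequence.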
There are a couple reasons why T7 RNAP specifically is so useful…
One is that it’s highly specific. Like other RNAPs it “recognizes” (by selectively binding to) specific stretches of DNA called promoter sequences which are located upstream of the start of the gene that they copy. note: for those people like me who always have to think twice about upstream vs. downstream, promoters are in front!
Another reason is that it’s only a single subunit. Proteins are basically long chains of amino acid letters that fold up into 3D shapes that are well-suited for doing different things. Unlike most RNAPs which are made up of multiple protein chains, T7 RNAP is a single piece. Which makes things way easier in terms of expressing and purifying it etc. (no needing to worry that you’ve lost one!)
And a final reason is that it’s really fast! (5-10 times faster than E. coli’s RNAP.) So fast that it helps pull the phage DNA into the bacterial cell, as we’ll see.
I hope I’ve convinced you (at least partly) that T7 RNAP is a useful tool (just ask Moderna!). But the only reason we’re able to use it as a tool is that it, and its usefulness, was discovered! Like many now-ubiquitous molecular biology tools (such as CRISPR), the discovery of T7 RNAP was the result of basic research. The scientists weren’t looking to find a molecule they could use to make lots of copies of RNA on demand. Instead, they were just trying to figure out what the heck was going on in bacteria infected with T7. So let’s talk more about T7…
T7 is a bacteriophage (“phage” for short), which is a virus that infects bacteria – T7’s technical “species name” is Escherichia virus T7. And, as that hints at, it infects Escherichia coli (E. coli) and related bacteria. T7’s a “lytic phage” which means that it keeps its DNA separate from the bacteria’s (doesn’t integrate) and, after making lots of copies of itself inside the bacterium (replicating), it breaks the cell open to get all those new phage particles out (no gentle budding like coronaviruses do). Another difference between T7 and SARS-CoV-2 is that T7 is a double-stranded DNA virus (as opposed to a single-stranded RNA virus). This has the result that, once T7 gets inside a cell, it has to get the cell’s RNAP to make messenger RNA (mRNA) copies of those genes which the cell’s ribosomes can then use as recipes to make phage proteins. By controlling when mRNAs get made of which genes, it can control when its various proteins get made.
T7 was named by Demerec and Fano in 1945 (it was the 7th of 7 phage types they were describing in their study), but it had been studied since the 30’s under different names (like δ). If you want to know more than you ever thought you’d want to know about phages, here’s a cool article: https://bit.ly/3461YpR
One of the things researchers found out early on is that, early in the infection cycle, T7 phage would make a few RNA transcripts (and proteins from them), and then later it would stop making those and switch to expressing (transcribing/translating) different genes, the “late genes.” They didn’t know why, but it seemed like a good way to learn about gene regulation.
Scientists suspected that the phage might have a gene for its own σ subunit for E. coli RNAP. You know how I told you most RNAPs have multiple subunits? Well, in addition to several subunits that make up the core RNAP, which does the actual letter-connecting, E. coli RNAP has a sigma (σ) subunit that’s important for recognizing and binding the promoter. Different σ subunits like different sequences, and by swapping out σ, E. coli can transcribe different genes at different times. So, the theory was, maybe T7 got E. coli RNAP to swap in a T7 σ – but instead, as I will tell you more about, scientists in 1970 found that T7 actually had the bacteria make its own whole T7 RNAP!
I want to tell you an overview of what we know now so it can help you interpret what they saw, and then I will tell you what they saw and walk you through their findings in the figures. Hope that works out okay and sorry for the spoilers!
One of the coolest things I learned about T7 is that it doesn’t even wait until it’s all inside to put the infected bacterium to work. (did anyone else just have J.C. Penny’s commercial flashbacks?) T7 has this sort of protein shell called a capsid and this capsid docks onto the outer surface of bacteria kinda like a moon lander. And then it goes drilling. It bores a protein pore through the bacterial membrane and injects its DNA.
mRNA starts getting made from the front end of the DNA once it gets inside (using the bacterial RNAP). The process of transcription helps physically pull the DNA in partway, but E. coli RNAP is relatively slow (compared to T7 RNAP) so this buys time for the “early genes,” which are located in the “front end” of the DNA to get made.
The early genes include the gene for T7’s very own RNAP, as well as genes for proteins which shut down bacterial transcription. A major E. coli transcription shut-downer is gene 0.7 (I’m assuming they discovered it after they thought they’d found the 1st gene…). Gene 0.7 codes for a kinase that phosphorylates bacterial RNAP (adds a negatively-charged phosphate group to it), somewhat crippling it. It also does other things, as do other early genes, including 0.3, whose protein product inhibits the bacteria’s restriction enzymes (which normally cut foreign DNA). The end result is that the bacterium stops transcribing its own RNA and has its defenses dampened.
But don’t worry, that’s okay (for T7) because you now have T7 RNAP, and the late genes are under control of the T7 promoter. So T7 transcribes those genes, and without competition from the bacterial mRNAs, it’s able to hog the ribosomes and make lots and lots of protein.
We can take advantage of this system in the lab by sticking a T7 promoter in front of a gene of interest in a plasmid in E. coli. But since T7 RNAP is so powerful, you don’t want to have it active until you want it active, or else you won’t have enough bacteria to make your protein. So you want to control when T7 RNAP gets expressed. We commonly do this with a trick borrowed from E. coli’s own lac operon: the “lac repressor” protein binds DNA in front of the genes it controls and stops their transcription unless lactose (or the lactose mimic IPTG) is present. (A different phage, λ, comes into it too: in common expression strains such as BL21(DE3), the lac-controlled T7 RNAP gene is carried on a λ-derived prophage.) When IPTG binds the repressor, the repressor falls off the DNA, so the gene gets transcribed and protein gets made.
So, into the E. coli in which we want to express our protein, we also put an inducible T7 RNAP gene (on a second plasmid or, in many common lab strains, built into the bacterial chromosome). When we add the inducer (as mentioned, we often use the lac operon system, so we add IPTG, which derepresses the promoter in front of the T7 RNAP gene), T7 RNAP gets made, and it makes mRNAs of our gene, which the cell then makes protein from. It’s really great for making a lot of protein, and the inducible-ness makes it good for expressing “toxic proteins” which would otherwise prevent the E. coli from growing and thriving enough to be able to make the protein. More here: http://bit.ly/bacoverexpression
So, now we have 2 things T7 RNAP is really useful for in the lab –
- in-vitro transcription (make lots of RNA from a DNA template, which can be double-stranded DNA (dsDNA) such as a linearized plasmid, or single-stranded DNA (ssDNA) with a double-stranded promoter region) and
- recombinant protein expression (make lots of protein (because you made lots of mRNA)).
BUT, in order for these schemes to work (for us and for T7), T7 RNAP has to be incredibly picky so that it doesn’t transcribe the bacterial genes, just its own genes. And this pickiness is achieved by having a highly specific promoter sequence. As long as we stick this promoter sequence in front of a gene (or any bit of DNA we want to copy), inside a cell or out, we can get T7 RNAP to make copies of it.
note: the promoter sequence of T7 is kinda weird in that instead of sitting in front of but at a slight distance from the start site, it sits “on it” – so the last letter or letters in the promoter sequence get copied into RNA (so they become the first letter(s) of your RNA sequence). T7 RNAP’s “favorite sequence” (its so-called consensus sequence) is 5’-TAATACGACTCACTATAGG-3’. Turns out you can fiddle around a bit with the middle part. But you can’t touch the end G (and you get much better results if you keep the G before that). So when using T7 RNAP, your RNA will start with GG or G, which isn’t a problem for protein-making since the actual protein instructions are still further downstream (the promoter only overlaps with the 5’ UnTranslated Region (5’-UTR)). But it could be a problem for in-vitro transcription if you don’t want a G.
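To make the “promoter sits on the start site” idea concrete, here’s a rough Python sketch that looks for the consensus promoter in a linear DNA template and predicts the “run-off” transcript. A few things here are my assumptions, not something from the paper: the function name is hypothetical, the sequence you pass in is the coding (non-template) strand, transcription starts at the first G of the trailing GG (i.e. the first 17 letters of the consensus sit upstream of the start), and the polymerase just runs off the end of the DNA.

```python
from typing import Optional

T7_PROMOTER = "TAATACGACTCACTATAGG"  # consensus sequence quoted in the text
UPSTREAM_LEN = 17  # assumption: first 17 letters are upstream of the start site


def runoff_transcript(coding_strand: str) -> Optional[str]:
    """Predict the RNA made from a linear template carrying a T7 promoter.

    Returns None if the exact consensus promoter is not found.
    Assumes transcription starts at the first G of the promoter's final "GG"
    and continues to the end of the template (run-off transcription).
    """
    seq = coding_strand.upper()
    promoter_start = seq.find(T7_PROMOTER)
    if promoter_start == -1:
        return None
    transcription_start = promoter_start + UPSTREAM_LEN
    return seq[transcription_start:].replace("T", "U")


# Hypothetical template: some junk, the promoter, then the bit we want copied.
template = "CCC" + T7_PROMOTER + "ACGT"
print(runoff_transcript(template))
```

Note that the predicted RNA starts with GG whether you want it to or not, and this sketch only matches the exact consensus, whereas the real enzyme tolerates some variation.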
Speaking of G, one of the early findings about T7 RNAP was that it really liked to make G-rich transcripts (which means it likes C-rich templates), which is a pretty lame segue into those findings but, it’s what I’ve got, so let’s go!
The paper I want to tell you about is titled “New RNA Polymerase from Escherichia coli infected with Bacteriophage T7.” It was published in the journal Nature in October 1970 and the scientists involved (or at least credited in the paper) were Michael Chamberlin, Janet McGrath, and Lucy Waskell, at UC Berkeley. It’s open access, so anyone can (and should) read it! But here’s a summary. https://www.nature.com/articles/228227a0
So, they wanted to figure out how the T7 phage was able to switch its transcript-making from early genes to late genes. As I mentioned before, one of the leading hypotheses was that the T7 phage made a separate σ subunit for E. coli’s RNAP to use, swapping out the E. coli RNAP’s specificity. Could they find that subunit? They went looking in T7-infected E. coli.
They used a variety of protein purification strategies, including ion exchange chromatography, where you have charged resin (little beads) filling columns and flow a solution through it. Proteins in the solution (which are each uniquely-charged because of their different amino acid combinations) separate based on the proteins’ charge. The more oppositely charged the protein is to the resin (e.g. positively-charged protein and negatively-charged resin) the tighter it will bind and the more salt it will take to compete it off. By gradually adding more salt and collecting what comes off of the column in fractions (e.g. first mL to come out, 2nd mL, 3rd mL, etc.) you end up (hopefully) with different proteins in different fractions. And then you can test those fractions for some activity you’re looking for and then investigate the “hits” to find the culprits.
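Here’s a toy Python sketch of that salt-gradient idea. All of the protein names and salt numbers are made up purely for illustration (they are not from the paper), and real elution is much messier than a simple threshold, but it shows how different proteins end up in different fractions.

```python
def collect_fractions(release_salt_mM, gradient_mM):
    """Assign each protein to the first gradient step salty enough to release it.

    release_salt_mM: dict of protein name -> salt concentration (mM) needed to
                     compete it off the resin (hypothetical values).
    gradient_mM:     list of salt concentrations in increasing order, one per fraction.
    Proteins needing more salt than the final step stay stuck on the column.
    """
    fractions = {step: [] for step in gradient_mM}
    for name, needed in sorted(release_salt_mM.items()):
        for step in gradient_mM:
            if step >= needed:
                fractions[step].append(name)
                break
    return fractions


# Made-up example: three proteins that bind the resin with different strengths.
proteins = {"protein_A": 150, "protein_B": 320, "protein_C": 480}
gradient = [100, 200, 300, 400, 500]
fractions = collect_fractions(proteins, gradient)

for salt, names in fractions.items():
    print(f"{salt} mM fraction: {names}")
```

You would then test each fraction for the activity you care about (here, making RNA) to figure out which ones to chase.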
In this case, they were looking for RNAP activity. So they’d add various DNA templates and radioactively-labeled RNA letters and then measure if and how much radioactive RNA got made. If they saw it get made, it indicated that an RNAP was present in that fraction.
When they did this they found that there were 2 separated fractions (5A & 5B) that both showed RNAP activity. One of the first hints that they contained different RNAPs was that they were losing a lot of the RNAP activity during the purification, even though they knew that E. coli RNAP was stable in those purification conditions. This suggested that there was an additional RNAP that was losing activity during the purification (later they were able to optimize the purification conditions to keep it happier).
5B behaved like normal E. coli RNAP – it readily transcribed a dAT template (a stretch of DNA with lots of A’s and T’s) and preferred that over T7 DNA template. But 5A barely had any activity towards the dAT but LOVED T7 DNA, copying it 29X better than the dAT. So that was weird…
When they ran an SDS-PAGE gel to separate a little sample of the proteins by size, they found that the fractions contained different-sized proteins, and 5A contained only 1 protein! Whereas the E. coli RNAP gave multiple bands (corresponding to its multiple subunits), the T7 RNAP fraction gave a single band. But with complete activity! They’d just discovered the first single-subunit RNAP! They also did a zone sedimentation, which separates proteins by their size/shape based on how far they sink in a sugar gradient. It differs from SDS-PAGE in many ways, but one is that here they’re able to look at the protein’s “shape/size” under native conditions (where proteins are folded, as opposed to SDS-PAGE, where the proteins are unfolded and the subunits are separated, hence the multiple bands for E. coli’s RNAP). With this, combined with the SDS-PAGE showing it was a single protein, they were able to estimate it to have a size of ~100,000 Da (100 kDa), which is pretty damn close to the 98 kDa we now know it to be.
So they found it, they have an idea about its size, and now they wanted to know more about how it worked…
Speaking of working, it could work even in the presence of E. coli RNAP inhibitors (including streptolydigin & rifamycin). This made it easier to purify and study: they could add those inhibitors while tracking it through their purifications, so they could find it without getting led on by E. coli RNAP. And they could add them to their assays (experiments where you measure things, such as RNAP activity) so they wouldn’t have to worry about activity from any contaminating E. coli RNAP muddying their results.
They then did things to better characterize what type of DNA it likes by trying their assay using various templates – dAT (what E. coli RNAP likes) as well as the DNA from a couple of other phages. And they found that T7 RNAP really liked its own (T7) DNA best. And if you gave it a duplex of a G strand and a C strand where G was radioactive or C was radioactive (separate experiments), it would make a G strand, but “not” (i.e. only really minimal amounts of) a C strand. And this made sense because it was known that T7 DNA had C-rich regions, whereas E. coli and those other phages didn’t. So, good for exclusively transcribing T7 genes.
But where was the T7 gene? They started by checking when in the infection phase it gets made. So they infected cells, took time samples, and then checked for T7 RNAP activity. And they found it in the “early phase.” So it seemed to be an “early gene,” and at that time only 3 were known. And only 1 of them (gene 1) was of a size which would correspond to a ~100 kDa protein. So they were suspicious that was their guy, but they decided to try “amber mutants” of all 3 – where the stop codon called amber is in either gene 1, gene 2, or gene 3. And only the gene 1 mutant showed a lack of T7 RNAP activity. And, really hammering in the proof, they had a temperature-sensitive gene 1 mutant. With this one, they found T7 RNAP activity at the “permissive temperature,” where the mutation doesn’t affect T7, but not at the “non-permissive temperature,” where that mutation is known to affect T7 growth.
Scientists would later clone out this gene and confirm it.
And now we can use it!