Even knowing what DNA looks like, it’s far from clear how DNA “spells” proteins… They’re in completely different biochemical languages! And some of my favorite classic experiments involve figuring out how they’re linked. The “Nirenberg experiments” of the 1960s “cracked the genetic code,” by showing which RNA words (codons) present in “recipes” copied from DNA genes spell which protein letters, providing a “Rosetta Stone” linking the DNA & RNA language of nucleotide letters to the protein language of amino acid letters.
Yesterday we looked at how Rosalind Franklin and grad student Raymond Gosling used a technique called X-ray fiber diffraction (related to but distinct from X-ray crystallography) to capture a blurry X “image” of DNA that led to the solving of the double helical structure of DNA. http://bit.ly/39JMgmj
But that didn’t explain how DNA related to proteins. There’s more cracking needed!
Science isn’t a “happens in a bubble” type of thing. And at the height of the “molecular revolution” findings were coming in from all around the world that provided clues to various aspects of the DNA-protein relationship.
For example, the 1952 Hershey-Chase “blender experiment” showed that hereditary information (the blueprints for beings that are passed down) is written in DNA – NOT proteins. They figured this out by using bacteria-infecting viruses called bacteriophages, or “phages.” They knew that phages infect bacteria by docking on the surface of the bacteria and injecting instructions for making more copies of themselves. They just didn’t know what those instructions were made up of.
By radioactively labeling either the phage’s protein or its DNA and letting these labeled phages infect bacteria, they could look at whether protein or DNA got injected by using a blender to shear off any docked phage “shells” so that only injected stuff stayed in the bacteria and then checking the bacteria for radioactivity. If they radioactively-labeled the phage protein, no radioactivity stayed in the bacteria. But when they radioactively-labeled the phage DNA, the bacteria became radioactive, indicating that DNA had been injected http://bit.ly/2m87CoL
This told them that the DNA contained genetic blueprints (it wasn’t quite this cut-and-dried but was rather a key experiment of many important experiments that led to this conclusion, which many were reluctant to accept because DNA seemed too “simple” to contain the wealth of information required).
But just because DNA has the blueprints doesn’t mean proteins aren’t important! In fact, key components of those blueprints are instructions for making proteins. But proteins are made up of building blocks called amino acids, whereas DNA is made up of building blocks called nucleotides. And, although scientists knew that there must be a way to get from one to the other, they didn’t know what that way was.
It was kinda like DNA was written in a “genetic code” that cells could decrypt but scientists couldn’t – couldn’t yet that is… All they needed was a “Rosetta stone,” and in the 1960s, a biochemist named Marshall W. Nirenberg, running a lab at the National Institutes of Health, along with colleagues including postdoc Heinrich Matthaei, successfully cracked it! They found out which RNA words (codons) spell which protein letters (amino acids).
RNA? I thought we were talking about DNA?! Turns out that when a cell wants to make a protein it doesn’t make it directly from the DNA (that would be too dangerous (hands off the original) and it’d be inefficient since you’d only have one recipe available). Instead, it first makes RNA copies, and those “messenger RNA” (mRNA) copies are what get read by the protein-making machinery & used to make proteins.
Note: RNA’s really similar to DNA – both have a “generic” sugar-phosphate backbone that lets them link letters together into chains. And a unique nitrogenous base (“base”) that allows for specific “base pairings” between chains. RNA & DNA differ in that RNA has one extra oxygen in its sugar (ribose vs deoxyribose) and it has the base U instead of T (but U still pairs with A). So DNA has A, T, C & G, with A::T & C:::G pairing and RNA has A, U, C, & G, with A::U & C:::G pairing. So you can use one strand of DNA for making a complementary strand of DNA or RNA. More here: http://bit.ly/2FqasfN
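Those pairing rules are simple enough to write down as a lookup table. Here’s a minimal Python sketch of copying a DNA strand into complementary DNA or RNA. The names (PAIR_DNA, PAIR_RNA, complementary_strand) and the example template are made up for illustration, and the sketch ignores strand direction, which this post doesn’t get into.

```python
# Pairing partners when the new strand is DNA (A::T, C:::G).
PAIR_DNA = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Pairing partners when the new strand is RNA (U takes T's place, so A pairs with U).
PAIR_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def complementary_strand(dna_template, pairing):
    """Pair each template letter with its partner, position by position."""
    return "".join(pairing[base] for base in dna_template)


template = "TACAAAGGG"  # hypothetical stretch of DNA, not from any real gene
print("DNA copy:", complementary_strand(template, PAIR_DNA))
print("RNA copy:", complementary_strand(template, PAIR_RNA))
```

The only difference between the two tables is what pairs with A, which mirrors the T-versus-U difference described above.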
How do we know there’s an RNA intermediary? There were a number of experiments pointing to this, including some from Hershey – our blender man. Hershey didn’t just blend stuff. When he infected E. coli with T2 phage, he found that there was some RNA being made and degraded. And other scientists – Volkin & Astrachan – found that that temporary RNA was complementary to the phage’s DNA, not the E. coli’s.
It looked like the phage was infecting bacteria by docking onto bacteria and injecting their phage DNA. Then, in addition to the DNA copies of this DNA being made to make more full-on phage particles, RNA copies of parts of it were being made. And this was happening at the same time the bacteria were making the proteins needed to build the phage coat to surround the DNA copies of phage so that, once the bacteria broke open (lysed) they could survive outside the bacterial cell & infect new cells. This evidence was part of a growing collection of evidence suggesting that RNA copies of parts of DNA acted as an “intermediary” between DNA & proteins.
So scientists strongly suspected that “template RNA” was being made and somehow that was being read & “understood” by the protein-making machinery. But they didn’t have the translation dictionary.
They started off with some “brute force password-guessing” – but it wasn’t as simple as guessing a single password. They had this puzzle where they knew there were 20 amino acids and 4 nucleotide letters – A, C, G, & T/U. If you had 2 letter words, that would only give you 4²=16 code words (codons) which isn’t enough. But 3-letter combos would give you 4³=64 possible codons. That’s more than the # of amino acids, so if the 3-letter thing were true some of the codons might not spell anything, and some amino acids might be spelled by multiple codons (like how grAy and grEy mean the same thing) – we call this “degeneracy.”
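The counting argument is easy to check by brute force. This is just a sketch of the arithmetic above (the function names are mine, not anything from the original papers): list every possible word of a given length and find the shortest length that gives at least 20 words.

```python
from itertools import product

BASES = "ACGU"       # the 4 RNA letters
N_AMINO_ACIDS = 20   # how many amino acids need to be spelled


def code_words(length):
    """All possible words of the given length over the 4 bases."""
    return ["".join(letters) for letters in product(BASES, repeat=length)]


def shortest_workable_length():
    """Smallest word length with at least one word per amino acid."""
    length = 1
    while len(BASES) ** length < N_AMINO_ACIDS:
        length += 1
    return length


for n in (1, 2, 3):
    print(n, "letter words:", len(code_words(n)))
print("shortest workable length:", shortest_workable_length())
```

With 64 words for 20 amino acids there are spares left over, which is where the degeneracy and the “words that don’t spell an amino acid” come from.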
Nirenberg suspected, as George Gamow had proposed (and as his group & other groups would confirm in various ways), that different amino acids are specified by 3-letter “words.” But scientists didn’t know which codon (RNA word) spelled which amino acid (protein letter). So, although we talk about “cracking the code” it wasn’t a one-hack-and-you’re-in type of thing – instead the situation’s like 64 different passwords to crack. It was a tall task, but scientists did have a couple of things on their side. You know how they say to make your password long & complicated so it’s hard for hackers to randomly guess? Thankfully the genetic code is basically the opposite of that.
It only uses 4 letters (DNA & RNA both have A, C, & G and DNA has T but RNA has U instead), and the passwords are only 3 letters long. So, if scientists had a way to introduce different strings of letters to protein-makers & get them to make protein from it, scientists could “guess” AAA, UUU, CCC, GGG, etc. and see when they “broke in.” A twist is that each of those “passwords” really does mean something – it just unlocks something the scientists couldn’t detect unless they’re looking for the right thing.
It’s kinda like in that episode of Friends (which I’m missing already!) where Monica’s trying to figure out what a light-switch is controlling, but it’s controlling the TV in another apartment. So Monica can’t tell that anything’s happening. Each time they stuck in a string of RNA letters it was like they were switching on a light-switch and turning on a light somewhere – but where? If they didn’t label any of the amino acids, they could detect if peptides (strings of amino acids) were being made (they knew a light was going on), but not what amino acids were in the string (they don’t know which room).
They knew they could radioactively-label amino acids so that they could track them. more here: http://bit.ly/radiolabeling
So if they radiolabeled an amino acid it was like sticking the camera in that amino acid “room,” then introducing RNA to “flip the switch” and seeing if you see light – if the RNA spells that letter, the protein-makers will start making strings of radioactive amino acids that the scientists could detect (we often refer to radioactive things as “hot”). But if the RNA spelled a different letter, the peptides that were made wouldn’t be seen (the light would come on in another room).
They radiolabeled 1 letter at a time & mixed it with “cold” (unlabeled) versions of the others. This “put them in the hot one’s room” – if that RNA sequence spelled that amino acid, a chain of that amino acid would be strung together. And since that amino acid was hot (radioactive), the chain would be hot too. So if they checked if the string was hot and it was, that would mean that the string had the instructions for making that amino acid they’d labeled. The RNA “passwords” would “always” spell something – but the scientists wouldn’t detect that something unless it spelled the labeled thing. A couple of passwords didn’t seem to spell anything – instead, they had discovered the STOP codons (RNA words that tell the protein-makers to stop making protein).
One problem Nirenberg had to overcome was how to make proteins outside of cells – “cell-free synthesis.” Cells are stuffed with stuff, and the stuffed stuff varies between cells and even in the same cells over time, and scientists didn’t know what of that stuff was needed for protein-making. Nirenberg had begun his work on cell-free synthesis with other aims in mind – he was working on making a protein called penicillinase and he wasn’t the only scientist working on such cell-free systems. Getting protein to be made outside of cells was a difficult task on its own – which Nirenberg successfully did – and showed it required ribosomes (protein/RNA complexes that mediate the amino acid linking), template RNA (not just transfer RNA (tRNA)), GTP, & a couple of transfer enzymes – but getting cell-free systems to make specific proteins was even harder.
In addition to the protein-making stuff, the extract contained DNA from the cells it came from. And the DNA to RNA copy-makers (RNA polymerases). So RNA templates from the cellular DNA were still being made. And thus their corresponding proteins were being made. And this was tying up the protein-making machinery & it could incorporate radiolabeled amino acids too – so this would make it harder both to get the ribosomes to make the thing at all and to tell what the introduced RNA was spelling versus what was already in the library (endogenous RNA).
So they added DNase. This DNA-chewer degraded the endogenous DNA, so endogenous RNA stopped being made. And then they could add in a bunch of “exogenous” (aka introduced) template RNA. Importantly, they showed that transfer RNA couldn’t replace template RNA – the peptides being made were templated, not just randomly strung together.
In his Nobel Prize lecture, Nirenberg describes the genetic code cracking as a ~6-year endeavor with 2 main “phases” & confirming the tRNA role would be crucial for the second phase of code-cracking… But 1st phase first – & the 1st phase was that brute-force password guessing. They stuck different strings of RNA letters into cell-free extracts and saw what got made.
When Nirenberg and Heinrich Matthaei stuck a string of U’s in there, they got a “hit” in the radiolabeled-phenylalanine sample. This now-famous “poly-U” experiment showed that strings of uracils coded for phenylalanine, and was a “proof of concept” for their whole experimental setup.
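To make the “one hot amino acid per tube” logic concrete, here’s a toy Python simulation. It’s a sketch of the reasoning, not of the actual lab protocol: HIDDEN_CODE stands in for what the extract “knows” (only the poly-U result is filled in), and peptide_made and hot_tubes are made-up helper names.

```python
# The 20 standard amino acids (three-letter abbreviations).
AMINO_ACIDS = ["Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
               "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val"]

# The "hidden answer" the extract knows but the scientists don't yet.
# Only the poly-U result described in the text is included here.
HIDDEN_CODE = {"UUU": "Phe"}


def peptide_made(template):
    """Toy extract: read the template 3 letters at a time and chain amino acids."""
    words = [template[i:i + 3] for i in range(0, len(template) - 2, 3)]
    return [HIDDEN_CODE[word] for word in words if word in HIDDEN_CODE]


def hot_tubes(template):
    """One tube per amino acid, each with only that amino acid radiolabeled.

    A tube reads as "hot" only if its labeled amino acid ends up in the peptide.
    """
    peptide = peptide_made(template)
    return [labeled for labeled in AMINO_ACIDS if labeled in peptide]


print(hot_tubes("U" * 30))  # which of the 20 tubes light up for poly-U
```

Every tube gets the same RNA and makes the same peptide; the label just decides which tube lets you see it.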
That “poly-U” nucleotide string was unambiguous – no matter where in a chain of U’s you start (what “reading frame”) the 3-letter words are the same
UUU UUU UUU
U UUU UUU UU
UU UUU UUU U
And even if the words weren’t three letters – no matter how long the word was all of its letters would be Us. So they “quickly” found that poly-U made phenylalanine. Similarly, they found poly-C made proline. They had trouble when they looked at poly-G because G-rich sequences fold up weird (strong secondary structure) making them hard-to-access templates. And poly-A made lysine, which they found out a different way (and which their competitors found out).
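Here’s a quick Python check of why poly-U was such a clean first test: whatever word length or reading frame you assume, every word comes out all-U. The helper name words is just for illustration.

```python
def words(template, word_length, frame):
    """Complete words of a given length, starting `frame` letters into the template."""
    return [template[i:i + word_length]
            for i in range(frame, len(template) - word_length + 1, word_length)]


poly_u = "U" * 12
for word_length in (2, 3, 4):
    for frame in range(word_length):
        distinct = set(words(poly_u, word_length, frame))
        print("word length", word_length, "frame", frame, "->", distinct)
```

That’s also why the single-letter strings couldn’t settle how long a word is: the result looks the same for any length.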
Those single-letter ones didn’t answer the # needed question either. But later we’ll see how they address that too. For now, just think about 3-letter words.
If you have a string with more than one letter, you start having multiple options. If the letters are randomly ordered (e.g. the string only has As & Us but they can be in any order, AUUAUAAUAUUUA etc.) there are a lot of different combinations of codons in there (AAA, AAU, AUU, AUA, UUU, UUA, UAA, UAU)…
And the situation gets even more complicated if you add in a third unique letter. With 1 kind of base you have 1 kind of triplet but with randomly-ordered polynucleotides with 2 kinds of bases you get 2³=8 triplets, 3 kinds of bases gives you 3³=27 triplet combos, & when you add in the 4th kind of base you’re up to a whopping 4³=64 possible codons.
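The triplet counts can be verified by listing them out. A small Python sketch (function names are mine) that enumerates the possible triplets for 1 to 4 kinds of bases, and also pulls out the triplets that actually show up in the example A/U string from above:

```python
from itertools import product


def possible_triplets(bases):
    """Every 3-letter word that can be built from the given kinds of bases."""
    return {"".join(letters) for letters in product(bases, repeat=3)}


def triplets_seen(template):
    """Every 3-letter window, i.e. the triplets across all three reading frames."""
    return {template[i:i + 3] for i in range(len(template) - 2)}


for bases in ("U", "AU", "AUC", "AUCG"):
    print(len(bases), "kinds of bases:", len(possible_triplets(bases)), "triplets")

example = "AUUAUAAUAUUUA"  # the example string from the text
print(sorted(triplets_seen(example)))
```

A short random string won’t necessarily contain every possible triplet, but a big pool of long random polymers will, which is why a “hit” from such a pool is hard to pin on one codon.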
Even if you only have 2 letters and they’re precisely ordered, there are still options.
UAU AUA UAU
U AUA UAU AU
UA UAU AUA U
So if you “get a hit” with this string (it makes a string containing the amino acid you have labeled) you don’t know – is it coming from the UAU or the AUA?
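The same ambiguity shows up if you split an alternating string into codons in each of the three reading frames. A minimal Python sketch (codons is a made-up helper name):

```python
def codons(template, frame):
    """Complete 3-letter words, starting `frame` letters into the template."""
    return [template[i:i + 3] for i in range(frame, len(template) - 2, 3)]


poly_ua = "UA" * 6  # UAUAUAUAUAUA
for frame in range(3):
    print("frame", frame, codons(poly_ua, frame))
```

Every frame gives a mix of UAU and AUA, so shifting the start point doesn’t help you separate them.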
Enter Phase 2: One way you can tease them apart is by using more combos to help unambiguize it. But a more elegant way was provided by a binding experiment testing whether tRNAs charged with different amino acids would bind to a ribosome holding the corresponding template RNA.
We now know that peptides are put together amino acid by amino acid with the help of molecular machinery called ribosomes, protein/RNA complexes which rely on servants called transfer RNAs (tRNAs) to bring them the right amino acid to add. tRNA is a type of “functional RNA” meaning that, unlike the messenger RNA (mRNA) intermediary which is just an RNA copy of the DNA gene, tRNA never gets made into protein – but it does help make other proteins by TRANSFERing free-floating amino acids to a growing protein chain! One part of tRNA binds a specific amino acid (gets “charged” with that amino acid) and the other end contains a 3-nucleotide anticodon that is complementary to the matching 3-letter codon on the mRNA. Different tRNAs have different anticodons & carry different amino acids. More on this “translation” process here: http://bit.ly/31IwofL
We know a lot about translation *now*, but at the time, the role of tRNA was still inconclusive. In one of many nice collaborations in the competitive race, Lipmann & Nathans gave the Nirenberg group purified “transfer enzymes” and they showed that Phe-tRNA is required for polyphenylalanine synthesis – and transfer enzymes & GTP were also needed.
So they wanted to see whether tRNAs charged with different amino acids would bind to a ribosome holding the corresponding template RNA. They got the idea from some other scientists’ experiments – Arlinghaus, Favelukes and Schweet and Kaji and Kaji had found that poly-U causes tRNA loaded with phenylalanine (Phe-tRNA) to attach to ribosomes before a peptide bond was actually formed (i.e. before the amino acid got linked to the growing protein chain). This got Nirenberg & his colleague Philip Leder thinking – what if they introduced trinucleotides or hexanucleotides of known sequence – would the corresponding amino-acid-loaded tRNA bind it?
If yes, this would tell them that that sequence order spelled that amino acid. But they’d need a way to separate the ribosomal-bound from the unbound amino-acid-tRNA. They used a filter-based membrane-binding technique. The ribosomal intermediate got trapped on a disc of cellulose nitrate but unbound tRNA didn’t stick. So if the tRNA was labeled & the membrane was hot, the tRNA must have bound to the ribosome.
When they added the trinucleotide AAA, Lys-tRNA bound to the ribosomes (and also if they added AAAA or AAAAA). But when they just added a doublet, it didn’t -> at least 3 sequential bases were needed (I told you they’d show this!).
Now they had a way to distinguish between different letter orders – so, for example, they could separately test GUU, UGU, & UUG -> which they found caused Val-tRNA, Cys-tRNA, & Leu-tRNA, respectively, to bind the ribosomes. And this told them that GUU spells valine, UGU spells cysteine, & UUG spells leucine.
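To see why letter order matters here, this Python sketch lists the distinct orderings of two U’s and a G and looks each one up in the binding results just described. BINDING_RESULTS only holds the three results mentioned in this post; the names are mine.

```python
from itertools import permutations

# Triplet -> amino acid on the tRNA that bound, as described in the text.
BINDING_RESULTS = {"GUU": "Val", "UGU": "Cys", "UUG": "Leu"}


def orderings(letters):
    """All distinct ways to order the given letters."""
    return sorted({"".join(p) for p in permutations(letters)})


for triplet in orderings("GUU"):
    print(triplet, "->", BINDING_RESULTS.get(triplet, "not covered here"))
```

Same letters, three different orders, three different amino acids – something a randomly-ordered poly-(U,G) template could never tell you by itself.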
They were able to get those triplets with just 2 letters by digesting poly-(U,G) sequences and fractionating the products – using their slightly different chemical properties to separate them. It would make more sense to just write the strings they wanted, but the technology for synthesizing RNA was just developing. It wasn’t like now where you can just order custom RNA of any sequence you want and a company will make it for you in a couple weeks. At the time, custom RNA making was still a new, cutting-edge thing, so Nirenberg and friends got help from some other great biochemists.
The project of figuring out what order of letters spelled what was largely carried out by a research associate in Nirenberg’s lab at NIH, Philip Leder. He looked into enzymatic methods for synthesizing trinucleotides – ways to utilize proteins that knew how to do it well. Speaking of knowing how to do things well, he also got help from human experts – Leon Heppel & Maxine Singer – who continued to advise them throughout the course of their experiments.
Leder ordered all the diribonucleotides (each of the 16 2-letter nucleotide combos) from a German company he saw advertising them in a journal and used an enzyme called polynucleotide phosphorylase to add a third letter onto the end to get triplets of defined sequence. This allowed him to make all 64 triplet combos. Side note – That polynucleotide phosphorylase was discovered by Grunberg-Manago, Ortiz, & Ochoa (Nobel laureate Severo Ochoa & his big lab at NYU were one of their main competitors).
Using these methods, Nirenberg’s group was able to get the code mostly cracked by 1966 and, in 1968, Nirenberg was awarded the Nobel Prize in physiology or medicine – sharing the honor with Robert W. Holley & Har Gobind Khorana. You can (and I recommend it) read Nirenberg’s Nobel Lecture (it should pop up quickly if you Google it). And the National Library of Medicine has the “Marshall W. Nirenberg Papers” which include lots of his laboratory notebooks, etc. and you can see a lot of it online.
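The bookkeeping behind that strategy is easy to sketch in Python: start from the 16 doublets, tack each of the 4 letters onto the end, and check that you land on all 64 triplets with no repeats. This only models the counting, not the enzymology, and the names are made up.

```python
from itertools import product

BASES = "ACGU"

# The 16 two-letter starting points.
doublets = ["".join(pair) for pair in product(BASES, repeat=2)]


def extend(doublet):
    """Add each possible third letter onto the end of a doublet."""
    return [doublet + base for base in BASES]


triplets = [triplet for doublet in doublets for triplet in extend(doublet)]

print(len(doublets), "doublets ->", len(triplets), "triplets,",
      len(set(triplets)), "of them distinct")
```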
In case it’s confusing you (it confused me) that they were able to get the extracts to make peptides without a start codon… Nirenberg got “lucky” – he didn’t know it at the time (the whole thing he was trying to figure out was what spelled what!) but there’s 1 RNA word that spells “START” – it tells the protein-making machinery to latch on & start making protein. His synthetic RNAs didn’t have this start signal – but his solutions did have a LOT of magnesium (Mg²⁺).
One of the reasons that the start codon is normally needed is that it helps recruit additional machinery to “clamp” the ribosome and tRNA on. You see, in addition to those unique bases that allow for the letter-to-letter base pairing between strands, RNA letters have a generic sugar(ribose)-phosphate backbone. And the phosphate in that backbone is negatively charged. And negative charges repel. So even though the base-pairing is favorable, sticking a bunch of negative charge together isn’t.
This is especially an issue when writing RNA & DNA because the letters come in in their triphosphate form (3 of those negatively-charged phosphates linked together like a smooshed-up spring). When the letters link up, 2 of the phosphates split off – it’s kinda like un-smooshing the spring – energy is released and this helps pay the cost of the decrease in entropy of linking (the molecules are more restricted in their movement when bound together and that’s entropically unfavorable).
Magnesium hangs out as a divalent cation (2+ charged particle) Mg²⁺. And those plusses are attracted to phosphates’ minuses. So they hang out and this “distracts” the phosphate so it doesn’t mind hanging out near other phosphates as much. So the RNA letters can get close and stay close – long enough to get translation started. And once it’s started you increase the base pairing so there’s more reason to stay. So, by having so much magnesium in there, Nirenberg had bypassed the start codon requirement.
more on Hershey experiment: http://bit.ly/hershey_chase
more on radiation & radiolabeling: http://bit.ly/radiolabeling
more on amino acids: http://bit.ly/2KBFRiS
more on translation: http://bit.ly/2XwGdKO