Last week we looked at how the Hershey-Chase “blender experiment” showed that hereditary information (the blueprints for beings that are passed down) is written in DNA – NOT proteins. But that doesn’t mean proteins aren’t important! In fact, key components of those blueprints are instructions for making proteins.
But proteins are made up of building blocks called amino acids, whereas DNA is made up of building blocks called nucleotides. And, although scientists knew there must be a way to get from one to the other, they didn’t know what that way was.
It was kinda like DNA was written in a “genetic code” that cells could decrypt but scientists couldn’t – couldn’t yet that is… All they needed was a “Rosetta stone,” and in the 1960s, a biochemist named Marshall W. Nirenberg, running a lab at the National Institutes of Health, along with colleagues including postdoc Heinrich Matthaei, successfully cracked it! They found out what RNA words (codons) spell what protein letters (amino acids).
RNA? I thought we were talking about DNA?! Turns out that when a cell wants to make a protein it doesn’t make it directly from the DNA – it first makes an RNA copy, and that “messenger RNA” (mRNA) copy is what gets read by the protein-making machinery & used to make proteins.
Note: RNA’s really similar to DNA except it has one extra oxygen in its sugar and it has a U instead of T but it still pairs with A. So DNA has A, T, C & G, with A::T & C:::G pairing and RNA has A, U, C, & G, with A::U & C:::G pairing. More here: http://bit.ly/2FqasfN
How do we know? There were a number of experiments pointing to this, including some from Hershey – our blender man. Hershey didn’t just blend stuff. When he infected E. coli with T2 phage, he found that there was some RNA being made and degraded. And other scientists – Volkin & Astrachan – found that that temporary RNA was complementary to the phage’s DNA, not the E. coli’s.
It looked like the phage was infecting bacteria by docking onto bacteria and injecting their phage DNA. Then, in addition to the DNA copies of this DNA being made to make more full-on phage particles, RNA copies of parts of it were being made. And this was happening at the same time the bacteria were making the proteins needed to build the phage coat to surround the DNA copies of phage so that, once the bacteria broke open (lysed) they could survive outside the bacterial cell & infect new cells. This evidence was part of a growing collection of evidence suggesting that RNA copies of parts of DNA acted as an “intermediary” between DNA & proteins.
So scientists strongly suspected that “template RNA” is being made and somehow this is being read & “understood” by the protein-making machinery. But they didn’t have the translation dictionary.
They started off with some “brute force password-guessing” – but it wasn’t as simple as guessing a single password. They had this puzzle where they knew there were 20 amino acids and 4 nucleotide letters – A, C, G, & T/U. If you had 2 letter words, that would only give you 4^2 = 16 code words (codons) which isn’t enough. But 3-letter combos would give you 64 possible codons. That’s more than the # of amino acids, so if the 3-letter thing were true some of the codons might not spell anything, and some amino acids might be spelled by multiple codons (like how grAy and grEy mean the same thing) – we call this “degeneracy.”
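If you want to check that counting argument yourself, here is a minimal sketch. The function names are made up for illustration; the only inputs are the numbers from the paragraph above (4 letters, 20 amino acids).

```python
N_BASES = 4         # A, C, G, U
N_AMINO_ACIDS = 20  # the letters the code needs to be able to spell


def n_codons(word_length: int) -> int:
    """How many distinct words of this length a 4-letter alphabet can write."""
    return N_BASES ** word_length


def shortest_word_length(n_targets: int = N_AMINO_ACIDS) -> int:
    """Smallest word length that gives at least n_targets distinct words."""
    length = 1
    while n_codons(length) < n_targets:
        length += 1
    return length


for length in (1, 2, 3):
    print(length, "letter words:", n_codons(length))
print("shortest workable word length:", shortest_word_length())
```

Two-letter words top out at 16, so 3 is the shortest length that can cover all 20 amino acids, with 44 codons to spare for degeneracy.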
Nirenberg suspected (as George Gamow had proposed, and as they, as we’ll see, & other groups confirmed in various ways) that different amino acids are specified by 3-letter “words.” But scientists didn’t know which codon (RNA word) spelled which amino acid (protein letter).
So, although we talk about “cracking the code” it wasn’t a one-hack-and-you’re-in type of thing – instead the situation’s like 64 different passwords to crack. It was a tall task, but scientists did have a couple of things on their side. You know how they say to make your password long & complicated so it’s hard for hackers to randomly guess? Thankfully the genetic code is basically the opposite of that.
It only uses 4 letters (DNA & RNA both have A, C, & G and DNA has T but RNA has U instead), and the passwords are only 3 letters long. So, if scientists had a way to introduce different strings of letters to protein-makers & get them to make protein from it, scientists could “guess” AAA, UUU, CCC, GGG, etc. and see when they “broke in.”
A twist is that each of those “passwords” really does mean something – it just unlocks something the scientists couldn’t detect unless they’re looking for the right thing. It’s kinda like in that episode of Friends where Monica’s trying to figure out what a light-switch is controlling but it’s controlling the TV in another apartment. So Monica can’t tell that anything’s happening.
Each time they stuck in a string of RNA letters it was like they were switching on a light-switch and turning on a light somewhere – but where? If they didn’t label any of the amino acids, they could detect if peptides (strings of amino acids) were being made (they knew a light was going on), but not what amino acids were in the string (they don’t know which room).
They knew they could radioactively-label amino acids so that they could track them (more in the radiolabeling link at the end of this post).
So if they radiolabeled an amino acid it was like sticking the camera in that amino acid “room,” then introducing RNA to “flip the switch” and seeing if you see light – if the RNA spells that letter, the protein-makers will start making strings of radioactive amino acids that the scientists could detect.
But if the RNA spelled a different letter, the peptides that were made wouldn’t be seen (the light would come on in another room).
They radiolabeled 1 of the amino acids at a time & mixed it with “cold” (unlabeled) versions of the others. This “put them in the hot one’s room” – if that RNA sequence spelled that amino acid, a chain of that amino acid would be strung together. And since that amino acid was hot, the chain would be hot. So if they checked if the string was hot and it was, that would mean that the string had the instructions for making that amino acid. The RNA “passwords” would “always” spell something – but the scientists wouldn’t detect that something unless it spelled the labeled thing. A couple of passwords didn’t seem to spell anything – instead, they had discovered the STOP codons (RNA words that tell the protein-makers to stop making protein).
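Here is a toy model of that “one hot amino acid at a time” logic. It is only a sketch: `HIDDEN_CODE` stands in for what the extract “knows” and holds just the three assignments named in this post, the amino acid list is a small subset picked for illustration, and all the names are invented.

```python
# What the cell-free extract "knows" but the scientist does not.
# Only the three homopolymer assignments named in this post are included.
HIDDEN_CODE = {"UUU": "Phe", "CCC": "Pro", "AAA": "Lys"}

# Small illustrative subset of the 20 amino acids.
AMINO_ACIDS = ["Phe", "Pro", "Lys", "Val", "Cys", "Leu"]


def peptide_from(rna: str) -> list[str]:
    """Read the RNA 3 letters at a time; skip words this toy code lacks."""
    words = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    return [HIDDEN_CODE[w] for w in words if w in HIDDEN_CODE]


def is_hot(rna: str, labeled: str) -> bool:
    """The peptide is only detectable if it contains the labeled amino acid."""
    return labeled in peptide_from(rna)


def screen(rna: str) -> list[str]:
    """Run one tube per labeled amino acid and report which tubes light up."""
    return [aa for aa in AMINO_ACIDS if is_hot(rna, aa)]


print(screen("U" * 30))  # only the Phe tube is hot
print(screen("C" * 30))  # only the Pro tube is hot
```

The point of the model: a peptide gets made in every tube, but you only see it in the tube where the label matches.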
One problem Nirenberg had to overcome was how to make proteins outside of cells – “cell-free synthesis.” Cells are stuffed with stuff, and the stuff varies between cells and even in the same cells over time, and scientists didn’t know what of that stuff was needed for protein-making.
Nirenberg had begun his work on cell-free synthesis with other aims in mind – he was working on making a protein called penicillinase and he wasn’t the only scientist working on such cell-free systems.
Getting protein to be made outside of cells was a difficult task on its own – which Nirenberg successfully did and showed it required ribosomes, template RNA (not just tRNA) & GTP & a couple transfer enzymes – but getting it to make specific proteins was even harder.
In addition to the protein-making stuff, the extract contained DNA from the cells it came from, along with the DNA-to-RNA copy-makers (RNA polymerases), so RNA templates from the cellular DNA were still being made. And thus their corresponding proteins were being made. This was tying up the protein-making machinery & it could incorporate radiolabeled amino acids too – so this would make it harder both to get the ribosomes to make the thing at all and to tell what the introduced RNA was spelling versus what was already in the library (endogenous RNA).
So they added DNase. This DNA-chewer degraded the endogenous DNA, so endogenous RNA stopped being made. And then they could add in a bunch of template RNA. Importantly, they showed that transfer RNA couldn’t replace template RNA – the peptides being made were templated, not just randomly strung together.
In his Nobel Prize lecture, Nirenberg describes the genetic code cracking as a ~6-year endeavor with 2 main “phases” & confirming the tRNA role was crucial for the second phase of code-cracking…
The 1st phase was that brute-force password guessing. They stuck different strings of RNA letters into cell-free extracts and saw what got made.
When Nirenberg and Heinrich Matthaei stuck a string of U’s in there, they got a “hit” in the radiolabeled-phenylalanine sample. This now-famous “poly-U” experiment showed that strings of uracils coded for phenylalanine, and was a “proof of concept” for their whole experimental setup
This nucleotide string was unambiguous – no matter where in a chain of U’s you start (what “reading frame”) the 3-letter words are the same
UUU UUU UUU
U UUU UUU UU
UU UUU UUU U
And even if the words weren’t three letters – no matter how long the word was all of its letters would be Us. So they “quickly” found that poly-U made phenylalanine. Similarly, they found poly-C made proline. They had trouble when they looked at poly-G because G-rich sequences fold up weird (strong secondary structure) making them hard-to-access templates. And poly-A made lysine, which they found out a different way (and which their competitors found out)
Those single-letter ones didn’t answer the question of how many letters are in a word either. But later we’ll see how they addressed that too. For now, just think about 3-letter words.
If you have a string with more than one letter, you start having multiple options. If the letters are randomly ordered (e.g. the string only has As & Us but they can be in any order, AUUAUAAUAUUUA etc.), there are a lot of different combinations of codons in there (AAA, AAU, AUU, AUA, UUU, UUA, UAA, UAU)…
And the situation gets even more complicated if you add in a third unique letter. With 1 kind of base you have 1 kind of triplet but with randomly-ordered polynucleotides with 2 kinds of bases you get 2^3=8 triplets, 3 kinds of bases gives you 3^3=27 triplet combos, & when you add in the 4th kind of base you’re up to a whopping 4^3=64 possible codons.
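A quick way to see those numbers is to let a computer list the triplets. This is just a sketch using Python’s standard `itertools.product`; the helper name `possible_triplets` is made up for this post.

```python
from itertools import product


def possible_triplets(bases: str) -> list[str]:
    """Every 3-letter word you can write using only the given bases."""
    alphabet = sorted(set(bases))
    return ["".join(letters) for letters in product(alphabet, repeat=3)]


for bases in ("U", "AU", "ACU", "ACGU"):
    triplets = possible_triplets(bases)
    print(f"{len(bases)} kind(s) of base -> {len(triplets)} triplets")

print(possible_triplets("AU"))
```

With only A & U, the last line prints the same 8 codons listed above for the random A/U string.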
Even if you only have 2 letters and they’re precisely ordered, there are still options.
UAU AUA UAU
U AUA UAU AU
UA UAU AUA U
So if you “get a hit” with this string (it makes a string containing the amino acid you have labeled), is the hit coming from the UAU or the AUA?
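The reading-frame bookkeeping for both poly-U and the alternating string can be sketched in a few lines. The function names are invented for illustration, and the short 9-letter strings stand in for much longer polymers.

```python
def codons_in_frame(rna: str, frame: int, word_length: int = 3) -> list[str]:
    """Skip `frame` letters, then chop the rest into complete words."""
    last_start = len(rna) - word_length
    return [rna[i:i + word_length]
            for i in range(frame, last_start + 1, word_length)]


def distinct_codons(rna: str, word_length: int = 3) -> set[str]:
    """Every word that shows up in any reading frame."""
    return {codon
            for frame in range(word_length)
            for codon in codons_in_frame(rna, frame, word_length)}


poly_u = "UUUUUUUUU"
alternating = "UAUAUAUAU"

for frame in range(3):
    print(frame, codons_in_frame(poly_u, frame), codons_in_frame(alternating, frame))

print(distinct_codons(poly_u))               # one word, whatever the frame
print(sorted(distinct_codons(alternating)))  # two words, so a hit is ambiguous
```

Poly-U gives a single word in every frame (and for any word length), which is why it was such a clean first test. The alternating string always gives both UAU and AUA.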
Enter Phase 2: One way you can tease them apart is by using more combos to help disambiguate. But a more elegant way was provided by a binding experiment testing whether tRNAs charged with different amino acids would bind to a ribosome holding the corresponding template RNA.
We now know that peptides are put together amino acid by amino acid with the help of molecular machinery called RIBOSOMES, which rely on servants called transfer RNAs (tRNAs) to bring them the right amino acid charm to add.
tRNA is a type of “functional RNA” meaning that, unlike the messenger RNA (mRNA) intermediary that’s just an RNA copy of the DNA gene, tRNA never gets made into protein – but it does help make other proteins by TRANSFERing free-floating amino acids to a growing protein chain!
One part of tRNA binds a specific amino acid and the other end contains a 3-nucleotide ANTICODON that is complementary to the matching 3-letter CODON on the mRNA. Different tRNAs have different ANTICODONS & carry different amino acids.
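To make “complementary” concrete, here is a small sketch that works out the strictly complementary partner of a codon using the A::U & C:::G pairing rules. Big caveat, and an assumption of this sketch: real tRNAs often use modified bases and looser “wobble” pairing at one position, so the actual anticodon in a cell may differ from the strict partner computed here. The function name is made up.

```python
# Strict RNA base pairing only (A::U and C:::G), ignoring wobble.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def strict_anticodon(codon: str) -> str:
    """Perfectly complementary anticodon, written 5'->3'.

    The two strands pair antiparallel, so the partner sequence is the
    complement of the codon read backwards.
    """
    return "".join(RNA_PAIR[base] for base in reversed(codon))


for codon in ("UUU", "GUU", "UGU", "UUG"):
    print(codon, "pairs with", strict_anticodon(codon))
```

Applying the function twice gets you back to the original codon, which is a handy sanity check that the pairing table and the reversal are consistent.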
But at the time, the role of tRNA was still inconclusive. In one of many nice collaborations in the competitive race, Lipmann & Nathans gave the Nirenberg group purified “transfer enzymes” and they showed that Phe-tRNA is required for polyphenylalanine synthesis – and transfer enzymes & GTP were also needed.
So they wanted to see whether tRNAs charged with different amino acids would bind to a ribosome holding the corresponding template RNA. They got the idea from some other scientists’ experiments – Arlinghaus, Favelukes and Schweet and Kaji and Kaji had found that poly-U causes tRNA loaded with phenylalanine (Phe-tRNA) to attach to ribosomes before a peptide bond was actually formed. This got Nirenberg & his colleague Philip Leder thinking – what if they introduced trinucleotides or hexanucleotides of known sequence – would the corresponding amino-acid-loaded tRNA bind it?
If yes, this would tell them that that sequence order spelled that amino acid. But they’d need a way to separate the ribosomal-bound from the unbound amino-acid-tRNA.
They used a filter-based membrane-binding technique. The ribosomal intermediate got trapped on a disc of cellulose nitrate but unbound tRNA didn’t stick. So if the tRNA was labeled & the membrane was hot, the tRNA must have bound to the ribosome.
When they added the trinucleotide AAA, Lys-tRNA bound to the ribosomes (and also if they added AAAA or AAAAA). But when they just added a doublet, it didn’t -> at least 3 sequential bases were needed (I told you they’d show this!).
Now they had a way to distinguish between different orders – so, for example, they could separately test GUU, UGU, & UUG -> which they found caused Val-tRNA, Cys-tRNA, & Leu-tRNA, respectively, to bind the ribosomes. And this told them that GUU spells valine, UGU spells cysteine, & UUG spells leucine.
They were able to get those triplets with just 2 letters by digesting poly-(U,G) sequences and fractionating the products – using their slightly different chemical properties to separate them. It would make more sense to just write the strings they wanted, but the technology for synthesizing RNA was just developing.
It wasn’t like now where you can just order custom RNA of any sequence you want and a company will make it for you in a couple weeks (hoping a couple I ordered are coming soon…). At the time, custom RNA making was still a new, cutting-edge thing so Nirenberg and friends got help from some other great biochemists.
The project of figuring out what order of letters spelled what was largely carried out by a research associate in Nirenberg’s lab at NIH, Philip Leder. He looked into enzymatic methods for synthesizing trinucleotides – got help from Leon Heppel & Maxine Singer – who continued to advise them throughout the course of their experiments. He ordered diribonucleotides (each of the 16 2-letter nucleotide combos) from a German company he saw advertising them in a journal and used an enzyme called polynucleotide phosphorylase to add a third letter onto the end to get triplets of defined sequence. This allowed him to make all 64 triplet combos. That polynucleotide phosphorylase was discovered by Grunberg-Manago, Ortiz, & Ochoa (Nobel laureate Severo Ochoa & his big lab at NYU was one of their main competitors).
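The arithmetic of that build-up (16 two-letter starters, each extended by 1 of 4 letters) can be sketched like this. It only models the counting, not the chemistry, and the variable and function names are made up.

```python
from itertools import product

BASES = "ACGU"

# The 16 two-letter starters (the diribonucleotides that were ordered).
dinucleotides = ["".join(pair) for pair in product(BASES, repeat=2)]


def extend(dinucleotide: str) -> list[str]:
    """Add each possible third letter onto the end of a 2-letter starter."""
    return [dinucleotide + base for base in BASES]


triplets = [t for d in dinucleotides for t in extend(d)]

print(len(dinucleotides), "starters")
print(len(set(triplets)), "distinct triplets")
print(extend("GU"))  # includes GUU, the valine triplet from the binding assay
```

Every triplet comes from exactly one starter, so there are no duplicates and nothing missing from the full set of 64.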
Using these methods, Nirenberg’s group was able to get the code mostly cracked by 1966 & in 1968, Nirenberg was awarded the Nobel Prize in physiology or medicine – sharing the honor with Robert W. Holley & Har Gobind Khorana. You can (and I recommend it) read Nirenberg’s Nobel Lecture (it should pop up quickly if you Google it). And the National Library of Medicine has the “Marshall W. Nirenberg Papers” which include lots of his laboratory notebooks, etc. and you can see a lot of it online.
In case it’s confusing you – it did me – that they were able to get the extracts to make peptides without a start codon… Nirenberg got “lucky” – he didn’t know it at the time (the whole thing he was trying to figure out was what spelled what!) but there’s 1 RNA word that spells “START” – it tells the protein-making machinery to latch on & start making protein. His synthetic RNAs didn’t have this start signal – but his solutions did have a LOT of magnesium (Mg2+)
One of the reasons that the start codon is normally needed is that it helps recruit additional machinery to “clamp” the ribosome & tRNA on. You see, in addition to those unique bases that allow for the letter-to-letter base pairing between strands, RNA letters have a generic sugar(ribose)-phosphate backbone. And the phosphate in that backbone is negatively charged. And negative charges repel. So even though the base-pairing is favorable, sticking a bunch of negative charge together isn’t
This is especially an issue when writing RNA & DNA because the letters come in in their triphosphate form (3 of those negatively-charged phosphates linked together like a smooshed-up spring). When the letters link up, 2 of the phosphates split off – it’s kinda like un-smooshing the spring – energy is released and this helps pay the cost of the decrease in entropy of linking (the molecules are more restricted in their movement when bound together and that’s entropically unfavorable)
Magnesium hangs out as a divalent cation (2+-charged particle) Mg2+. And those plusses are attracted to phosphates’ minuses. So they hang out and this “distracts” the phosphate so it doesn’t mind hanging out near other phosphates as much. So the RNA letters can get close and stay close – long enough to get translation started. And once it’s started you increase the base pairing so there’s more reason to stay. So, by having so much magnesium in there, Nirenberg had bypassed the start codon requirement.
more on Hershey experiment: http://bit.ly/hershey_chase
more on radiation & radiolabeling: http://bit.ly/radiolabeling
more on amino acids: http://bit.ly/2KBFRiS
more on translation: http://bit.ly/2XwGdKO