Learning to read the genetic code? You might want to get your reading frames checked! But do your prescriptions overlap? A few days ago we saw how the genetic code was “cracked” – Nirenberg and friends figured out which RNA word spells which protein letter. But even if you know what the words mean, you need to read them correctly, which means you need to know: can words share letters? Today let’s look at some of the classic experiments, by Tsugita & Fraenkel-Conrat, Wittmann, & Brenner, that set out to answer the question: Is the genetic code overlapping????
The instructions for making proteins are written in DNA form in genes. An RNA copy of the gene, called messenger RNA (mRNA), is made. DNA & RNA are both forms of nucleic acid, so we call this process transcription (no language-changing). Then that RNA copy is read by protein-making machinery (ribosomes) and used to link together protein letters (amino acids) to form a protein (this IS “language-changing” so we call this process translation).
The ribosome knows what amino acid to add because the mRNA has 3-letter “words” called CODONS that “spell” different amino acids. Ribosomes travel along the mRNA adding letters that are brought to them by their transfer RNA (tRNA) “servants.” One end of a tRNA binds a specific amino acid and the other end contains a 3-nucleotide ANTICODON that is complementary to the matching 3-letter CODON on the mRNA. Different tRNAs have different ANTICODONS & carry different amino acids.
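To make the codon/anticodon pairing concrete, here’s a minimal Python sketch that works out which anticodon would pair with a given codon. It assumes plain Watson-Crick pairing, ignores “wobble” pairing at the codon’s 3rd position, and writes both sequences 5'->3'. That last point is why the complement also gets reversed: the two strands pair antiparallel. The name `anticodon` is just an illustrative choice.

```python
# Watson-Crick pairing partners for RNA bases
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the anticodon (written 5'->3') that pairs with an mRNA codon (written 5'->3').

    Codon and anticodon pair antiparallel, so each base is complemented and the
    order is reversed. Wobble pairing is deliberately ignored in this sketch.
    """
    return "".join(PAIR[base] for base in reversed(codon))


print(anticodon("AUG"))  # CAU
print(anticodon("GCU"))  # AGC
```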
Nirenberg and colleagues “cracked the genetic code” and determined which codon spells which amino acid. There are 4 nucleotide letters – A, C, G, & T/U – so 4^3 = 64 possible codons. BUT there are only 20 (common) amino acids. 3 of the codons don’t spell an amino acid. Instead they spell STOP & signal the end of the protein. But that still leaves 61 codons for 20 amino acids. So some amino acids have multiple codons (DEGENERACY (redundancy)). BUT any 1 codon will only ever spell 1 amino acid (NOT ambiguous).
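You can check that codon arithmetic with a short Python sketch that builds the standard codon table and counts how many codons spell each amino acid. The compact 64-letter string is just one way of writing the standard table. Codons are ordered by 1st, then 2nd, then 3rd base, each in U, C, A, G order, and `*` stands for STOP. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code: one letter per codon, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Each codon maps to exactly one amino acid (or STOP), so the code is not ambiguous
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}

# How many codons spell each amino acid: the degeneracy
codons_per_aa = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))                         # 64
print(codons_per_aa["*"])                       # 3 stop codons
print(len(codons_per_aa) - 1)                   # 20 amino acids (not counting "*")
print(codons_per_aa["L"], codons_per_aa["M"])   # 6 1
```

Leucine gets 6 codons while methionine gets only 1, which is the degeneracy described above.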
But a big question was: are they overlapping? We know that 3 sequential RNA letters spell 1 amino acid letter. But can words share letters? Can one word’s 2nd letter also be the next word’s 1st letter, and its 3rd letter the 1st letter of the word after that?
Nucleic acids are like cursive – the letters are all connected. But unlike cursive, there are nospacesbetweenwords. So how can protein-makers (ribosomes) comprehend it? Thankfully, unlike most languages, all the “words” in RNA are the same length: 3 letters. So it’s like catdogfox.
But those letters are part of much longer strings, and where you start reading determines what words you read. We call this the READING FRAME, & we now know that the reading frame is set by the start codon. The word AUG, when it’s at the beginning, tells the ribosome to start adding amino acids, starting with Met. If AUG is in the middle, it just tells the ribosome to add Met.
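Here’s a simplified Python sketch of how a start codon fixes the frame. It finds the first AUG, then chops the message into 3-letter words from there until it hits an in-frame stop. It assumes the first AUG is the real start, although real cells are pickier than that. The example mRNAs are made up.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def read_from_start(mrna: str) -> list[str]:
    """Split an mRNA into codons, starting at the first AUG and ending at the first in-frame stop."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing gets read
    codons = []
    # Step through the message 3 letters at a time, in the frame set by the AUG
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons


# Hypothetical mRNA: the leading "GC" is skipped because the AUG sets the frame
print(read_from_start("GCAUGGCUUUCUAAGG"))  # ['AUG', 'GCU', 'UUC', 'UAA']
```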
If the code is NOT overlapping and you’re in the right frame, you only have 1 option for reading this: cat dog fox. But if the code IS overlapping, you could have words like cat atd tdo dog (and so on) if you overlap by 2, and cat tdo ogf fox if you overlap by 1.
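The cat/dog/fox readings can be written as a tiny Python function, where `overlap` is how many letters neighboring words share. This just illustrates the idea. It isn’t a model of any real ribosome.

```python
def read_words(text: str, overlap: int, word_len: int = 3) -> list[str]:
    """Read fixed-length words, letting neighboring words share `overlap` letters."""
    step = word_len - overlap  # how far to slide before starting the next word
    return [text[i:i + word_len] for i in range(0, len(text) - word_len + 1, step)]


for overlap in (0, 1, 2):
    print(overlap, read_words("catdogfox", overlap))
# 0 ['cat', 'dog', 'fox']
# 1 ['cat', 'tdo', 'ogf', 'fox']
# 2 ['cat', 'atd', 'tdo', 'dog', 'ogf', 'gfo', 'fox']
```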
Overlapping would save space, and space is at a premium when it comes to genetic info, cuz we have to keep a lot of it. So some scientists thought that the genetic code would be overlapping. Several groups of scientists set out to test this.
A few groups, including A. Tsugita & H. Fraenkel-Conrat at UC Berkeley and Wittmann at Max Planck in Germany, went about this the mutagenesis way -> make mutations to the genetic info and see what happens to the protein. Other scientists (Sydney Brenner) approached it more in a data-mining way -> look at naturally existing proteins and check whether the dipeptide combos present are codon-wise possible if overlapping is allowed.
Let’s start with the mutagenesis stuff. The basic premise was -> If the code IS overlapping (by 2 letters), changing 1 letter could change 3 different words, because that letter could be the 1st letter in 1 word, the 2nd letter in another word, and the 3rd letter in a third word. And, since the words spell amino acids, you could end up with up to 3 different amino acids in the resulting protein from a single mutation. If you only allow an overlap of 1 letter, you could change 2 words, not 3.
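Counting how many words a single-letter change touches under each amount of overlap takes only a few lines of Python. The 12-letter message length and the mutated position are arbitrary choices for illustration.

```python
def words_containing(position: int, length: int, overlap: int, word_len: int = 3) -> list[int]:
    """Return start indices of the words (read with the given overlap) that include `position`."""
    step = word_len - overlap
    starts = range(0, length - word_len + 1, step)
    return [s for s in starts if s <= position < s + word_len]


# Hypothetical 12-letter message; change the letter at index 6
for overlap in (0, 1, 2):
    print(overlap, len(words_containing(6, 12, overlap)))
# 0 1
# 1 2
# 2 3
```

With no overlap, one changed letter hits 1 word. With an overlap of 1 it hits up to 2, and with an overlap of 2 it hits 3.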
It’s important to note that not all mutations cause changes to the protein. Because of the degeneracy of the genetic code, different words can spell the same amino acid (kinda like how grAy and grEy still mean a color between white & black). So, for example, ACU & ACC both spell threonine (Thr), and a mutation from ACU to ACC wouldn’t change the protein. We call this a synonymous substitution, and it’s “silent.”
BUT other mutations DO cause changes to the protein. If instead of grey to gray you go gray to green, a difference in the protein will be seen! NONSYNONYMOUS substitutions change the amino acid that the codon spells.
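Here’s a minimal Python sketch of the synonymous vs nonsynonymous distinction. It reuses a compact encoding of the standard codon table. The names `CODON_TABLE` and `classify` are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered by 1st/2nd/3rd base in U, C, A, G order; "*" = STOP
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}


def classify(old_codon: str, new_codon: str) -> str:
    """Label a codon change as synonymous (same amino acid) or nonsynonymous (different one)."""
    if CODON_TABLE[old_codon] == CODON_TABLE[new_codon]:
        return "synonymous"
    return "nonsynonymous"


print(classify("ACU", "ACC"))  # synonymous    (Thr -> Thr)
print(classify("ACU", "GCU"))  # nonsynonymous (Thr -> Ala)
```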
I use mutagenesis all the time to look at specific parts of proteins, but they used mutagenesis to study the RNA, with the protein just being the “readout.” Another difference is that when I use mutagenesis, I do it “site-directed.” Thanks to technology they didn’t have at the time, I can introduce specific mutations into the genetic instructions I put into cells to make protein for me. But scientists at the time didn’t have that luxury, so they were stuck with randomness. They could add chemical mutagens, things that can cause genetic mutations. Different types of mutagens have different “preferences” for the types & locations of mutations they cause, but other than that, the mutations they make are random.
A mutagen called nitrous acid tends to induce the conversions C->U & A->G (well, it actually changes A to hypoxanthine, which is copied as if it were G…). And a different mutagen, 5-fluorouracil (5-FU), tends to convert U->C.
In their classic paper, “The amino acid composition and C-terminal sequence of a chemically evoked mutant of TMV” PNAS, 1960, Tsugita & Fraenkel-Conrat used nitrous acid to introduce mutations in tobacco mosaic virus (TMV). It’s an RNA virus so they were mutating the RNA directly, not DNA. Then they studied the resultant proteins. They measured the amounts of each amino acid letter in the mutants and compared that to the normal version. And they saw differences. https://www.pnas.org/content/46/5/636
They analyzed the viral protein by hydrolyzing it with acetic acid, which splits all the letters apart. They then separated the letters based on their chemical differences and measured how much of each letter there was. They focused on one mutant in particular. It had a proline, an aspartate, and a threonine swapped for a leucine, an alanine, & a serine. Because they were only measuring the # of each letter, not the order of the letters, they couldn’t tell the actual sequence and didn’t know what had swapped for what, or where. Not conclusively, at least: they *were* able to figure out one of them with a pretty cool experiment.
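Why can’t composition alone tell you the order? Here’s a minimal Python sketch that compares only letter counts, the way a hydrolysis-based analysis does. The sequences are hypothetical toy peptides in one-letter codes. They were chosen to mirror the Pro/Asp/Thr to Leu/Ala/Ser swap and are not the actual TMV protein.

```python
from collections import Counter


def composition_change(normal: str, mutant: str) -> tuple[Counter, Counter]:
    """Compare amino acid counts only (order is thrown away, as after hydrolysis)."""
    normal_counts, mutant_counts = Counter(normal), Counter(mutant)
    lost = normal_counts - mutant_counts    # letters the mutant has fewer of
    gained = mutant_counts - normal_counts  # letters the mutant has more of
    return lost, gained


# Hypothetical toy sequences (one-letter codes), NOT the real TMV protein
normal = "GPDTKA"
mutant_1 = "GLASKA"
mutant_2 = "GSALKA"  # different order, same composition

lost, gained = composition_change(normal, mutant_1)
print(sorted(lost.elements()), sorted(gained.elements()))  # ['D', 'P', 'T'] ['A', 'L', 'S']
# The two mutants look identical by composition, so you can't tell what swapped for what
print(composition_change(normal, mutant_1) == composition_change(normal, mutant_2))  # True
```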
But before we get to that: this, in and of itself, is pretty damn awesome if you think about it. They were able to make changes in the RNA and detect changes in the protein! Nowadays I take this for granted all the time. So, back to that cool part where they pinned down 1 swap. Scientists knew that you could add carboxypeptidase to cut off the C-terminal (tail end) amino acid of proteins. And you’d think that you could just keep doing this, and you can, usually… But carboxypeptidase gets stuck when it hits proline (which has a weird side chain that loops back on itself and attaches to the backbone) and its neighboring residues.
And scientists had found that if they added carboxypeptidase to natural strains of TMV, only a threonine got cut off. The carboxypeptidase was getting stuck. This ended up making sense, because scientists figured out that the C-terminal end of TMV is Pro-Ala-Thr.
But when these scientists added carboxypeptidase to their mutant TMV, a whole bunch of amino acids got cut off. Depending on how long they let it cut, they got at least 15 residues cut off (they could tell this because they detected Phe, which is the 15th-to-last amino acid in normal TMV). This indicated that the proline block had been removed. And they knew from their amino acid composition comparison that proline had been swapped for something, but they didn’t know if that something was a Leu, an Ala, or a Ser.
So they compared the composition of the residues released from the mutant’s end with the composition the normal TMV end was known to have, and found one extra Leu. That let them deduce that a change from Pro to Leu (Pro->Leu) had happened at the 3rd-to-last residue.
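The carboxypeptidase logic can be sketched in Python like this. The stall rule is a simplifying assumption: the enzyme stops if the C-terminal residue is Pro or sits right after a Pro. Only the final Pro-Ala-Thr (`PAT`) comes from the text. The upstream residues in both tails are hypothetical.

```python
from collections import Counter


def carboxypeptidase(protein: str, max_steps: int) -> list[str]:
    """Release C-terminal residues (one-letter codes) one at a time, in the order released.

    Simplified stall rule (an assumption): stop if the current C-terminal residue
    is Pro, or if the residue attached to it is Pro.
    """
    released, remaining = [], protein
    for _ in range(max_steps):  # max_steps stands in for "how long they let it cut"
        if not remaining or remaining[-1] == "P" or (len(remaining) > 1 and remaining[-2] == "P"):
            break
        released.append(remaining[-1])
        remaining = remaining[:-1]
    return released


# Hypothetical tails: only the final "PAT" (Pro-Ala-Thr) is from the text
normal_tail = "KSFEGVPAT"
mutant_tail = "KSFEGVLAT"  # Pro -> Leu at the 3rd-to-last position

print(carboxypeptidase(normal_tail, max_steps=20))  # ['T'], stuck at Pro-Ala

released = carboxypeptidase(mutant_tail, max_steps=20)
# Compare what the mutant released with the known composition of the same stretch of the normal end
print(Counter(released) - Counter(normal_tail[-len(released):]))  # Counter({'L': 1})
```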
In the Nirenberg post, we talked about how RNA synthesis technology was in its infancy. Well, protein sequencing technology was too. Around that time, scientists were beginning to be able to determine the amino acid sequence of small proteins, but it wasn’t easy. A lot of it involved cutting proteins into shorter pieces, figuring out what was in those pieces, and then trying to piece the pieces together.
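The “piece the pieces together” step can be illustrated with a simplified Python sketch. It joins two fragments when the end of one matches the start of the other. The fragments are hypothetical, and the longest-overlap-first rule is an assumption made for illustration. It isn’t the exact procedure any of these groups used.

```python
from typing import Optional


def merge(left: str, right: str, min_overlap: int = 2) -> Optional[str]:
    """Join two fragments if the end of `left` matches the start of `right`.

    Tries the longest possible overlap first. Returns None if no overlap of at
    least `min_overlap` letters matches.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Hypothetical fragments from two different ways of cutting the same protein
print(merge("MAGKV", "GKVLD"))  # MAGKVLD
print(merge("MAGK", "VLDR"))    # None
```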
Tsugita & Fraenkel-Conrat did a lot of that, but they weren’t doing it in this paper – yet. They hint that they’re working on it, and later that year they reported the complete protein sequence of a strain of TMV. But Wittmann & Wittmann-Liebold, in their experiments (first reported in German, I think), as summed up in Protein chemical studies of two RNA viruses and their mutants (1966), went that additional step. Before hydrolyzing the protein (splitting it into individual letters), Wittmann digested it with trypsin. Acetic acid causes chemical hydrolysis that’s not picky: all the peptide bonds share the same backbone, and the acid just makes it easier for water to attack & split them (the hydro in hydrolysis). https://pubmed.ncbi.nlm.nih.gov/5237188/
But trypsin’s a protein enzyme. It has a shape, charge, etc. that has to accommodate the side chains of the peptides it cuts, & those side chains have a lot of different shapes and properties. Trypsin can’t accommodate them all, so instead it evolved to only cut after Lys & Arg (these are long and positively charged, so other amino acids don’t fit nicely).
Because it only cuts after certain amino acids, this “pre-digestion” gave Wittmann longer peptides, which he separated and then hydrolyzed. He then compared the amino acid compositions of those long (but shorter than the whole protein) peptides from mutant and normal, instead of just comparing the whole proteins. This way, if he detected a difference (indicating a mutation), it would be located on a shorter piece that’s easier to read. It’s like instead of “Where in the World is Carmen Sandiego?” it’s “Where on this street is Carmen Sandiego?”
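The trypsin pre-digestion idea can be sketched in Python. The sketch cuts after every K (Lys) or R (Arg), then compares peptide compositions pair by pair to see which piece carries the change. Both sequences are hypothetical. The sketch assumes the mutation doesn’t add or remove a K/R, because otherwise the peptides wouldn’t line up. It also ignores the fact that trypsin generally doesn’t cut when the next residue is Pro.

```python
from collections import Counter


def trypsin_digest(protein: str) -> list[str]:
    """Cut after every K (Lys) or R (Arg). Simplified: ignores the 'not before Pro' exception."""
    peptides, current = [], ""
    for aa in protein:
        current += aa
        if aa in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)  # leftover C-terminal piece with no K/R at its end
    return peptides


def changed_peptides(normal: str, mutant: str) -> list[int]:
    """Indices of tryptic peptides whose amino acid composition differs (order ignored)."""
    pairs = zip(trypsin_digest(normal), trypsin_digest(mutant))
    return [i for i, (a, b) in enumerate(pairs) if Counter(a) != Counter(b)]


# Hypothetical toy sequences, not a real viral protein
normal = "MAGKVLDRSTYK"
mutant = "MAGKVLNRSTYK"
print(trypsin_digest(normal))            # ['MAGK', 'VLDR', 'STYK']
print(changed_peptides(normal, mutant))  # [1], so the change sits in 'VLDR'
```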
He was only looking at the protein level physically. But when he consulted the codon table he now had thanks to Nirenberg and friends, he saw that all the amino acid exchanges in the 36 mutants they tested could be explained by a single base change in a codon.
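Checking whether an amino acid swap could come from a single base change is a small search over the codon table. Here’s a minimal Python sketch. The function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered by 1st/2nd/3rd base in U, C, A, G order; "*" = STOP
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}


def codons_for(aa: str) -> list[str]:
    """All codons that spell a given amino acid (one-letter code)."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def single_base_paths(aa_from: str, aa_to: str) -> list[tuple[str, str]]:
    """Codon pairs (from, to) for the two amino acids that differ at exactly one position."""
    return [
        (a, b)
        for a in codons_for(aa_from)
        for b in codons_for(aa_to)
        if sum(x != y for x, y in zip(a, b)) == 1
    ]


print(single_base_paths("P", "L"))  # [('CCU', 'CUU'), ('CCC', 'CUC'), ('CCA', 'CUA'), ('CCG', 'CUG')]
print(single_base_paths("P", "D"))  # [], because every Pro -> Asp path needs 2+ changes
```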
And when he looked at mutations caused by different types of mutagens, he found that the amino acid changes were consistent with the expected type of RNA changes. Remember how nitrous acid tends to cause C->U & A->G, & 5-FU causes U->C? Well, the predicted RNA changes were consistent with these, except for one case, which could have been a spontaneous mutation that happened while they were trying to induce mutations.
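Going one step further, the same search can be restricted to the base changes a given mutagen favors. In this Python sketch, the sets `NITROUS_ACID` and `FLUOROURACIL` simply encode the tendencies described above (C->U & A->G for nitrous acid, U->C for 5-FU). Real mutagens are messier than that, and all the names here are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered by 1st/2nd/3rd base in U, C, A, G order; "*" = STOP
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

# (old base, new base) changes each mutagen tends to cause, as described in the text
NITROUS_ACID = {("C", "U"), ("A", "G")}
FLUOROURACIL = {("U", "C")}


def consistent_with(aa_from: str, aa_to: str, allowed: set[tuple[str, str]]) -> bool:
    """True if some codon for aa_from becomes a codon for aa_to via ONE allowed base change."""
    for codon, aa in CODON_TABLE.items():
        if aa != aa_from:
            continue
        for i, base in enumerate(codon):
            for old, new in allowed:
                if base == old:
                    mutated = codon[:i] + new + codon[i + 1:]
                    if CODON_TABLE[mutated] == aa_to:
                        return True
    return False


print(consistent_with("P", "L", NITROUS_ACID))  # True: e.g. CCU -> CUU is a C->U change
print(consistent_with("P", "L", FLUOROURACIL))  # False: the only U in a Pro codon gives CCC, still Pro
```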
So those groups started by changing the RNA and looking at the protein. Brenner instead started with the protein & then computed info about the RNA. You see, he recognized a problem with the overlapping-codon theory. If you had a 1-base overlap, each codon would have to start with the RNA letter that the one before it ended with. So if one codon ended with a G, the next one would have to start with a G. And only 16 of the 64 possible codons do.
Say your first codon is AUG (methionine, Met). After it you’d only be able to have valine (Val), alanine (Ala), aspartate (Asp), glutamate (Glu), or glycine (Gly). You’ve lost the opportunity to have Met followed by any of the other 15 amino acids. So you’d pay a significant cost in diversity for that space-saving.
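You can check the Met example against the codon table with a few lines of Python. The names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered by 1st/2nd/3rd base in U, C, A, G order; "*" = STOP
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

previous = "AUG"  # Met
# With a 1-letter overlap, the next codon must begin with the previous codon's last letter
next_codons = [codon for codon in CODON_TABLE if codon[0] == previous[-1]]
possible_next = {CODON_TABLE[codon] for codon in next_codons}

print(len(next_codons))         # 16
print(sorted(possible_next))    # ['A', 'D', 'E', 'G', 'V']
print(20 - len(possible_next))  # 15 amino acids that could never follow Met
```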
In a beautifully named paper, On the impossibility of all overlapping triplet codes in information transfer from nucleic acid to proteins (PNAS, 1957), Brenner looked at dipeptide combos from 7 proteins of known sequence. He didn’t find every possibility (there are 20^2 = 400 possibilities, and remember, at this time only a few small protein sequences were known). But he found enough to show, with accompanying math, that codons couldn’t overlap (this paper is considered by some to be the first bioinformatics paper). https://www.pnas.org/content/43/8/687
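Brenner’s argument was a counting one. Below is a simplified Python sketch of that kind of counting, which isn’t necessarily his exact calculation. It assumes a fully overlapping triplet code, where each codon shares 2 letters with the next, so any triplet has only 4 possible successors and 4 possible predecessors. An amino acid seen next to k different neighbors would then need at least ceil(k/4) triplets. If the total needed for all amino acids exceeds 64, that code is impossible. The function name is illustrative.

```python
from collections import defaultdict
from itertools import product
from math import ceil


def min_triplets_needed(dipeptides: list[tuple[str, str]]) -> int:
    """Lower bound on triplets a fully overlapping code would need to produce these dipeptides.

    Assumption: consecutive codons share 2 letters, so each triplet has at most 4
    successors and 4 predecessors. Each codon spells only one amino acid, so the
    per-amino-acid bounds add up.
    """
    followers, predecessors = defaultdict(set), defaultdict(set)
    for first, second in dipeptides:
        followers[first].add(second)
        predecessors[second].add(first)
    amino_acids = set(followers) | set(predecessors)
    return sum(
        max(ceil(len(followers[aa]) / 4), ceil(len(predecessors[aa]) / 4), 1)
        for aa in amino_acids
    )


# If every one of the 20 x 20 = 400 dipeptides turned up:
ALL_AA = "ACDEFGHIKLMNPQRSTVWY"
needed = min_triplets_needed(list(product(ALL_AA, repeat=2)))
print(needed, needed > 64)  # 100 True
```

Each amino acid would need at least 5 triplets (20 neighbors / 4), so 20 amino acids would need at least 100 triplets, well over the 64 that exist. Brenner didn’t have all 400 dipeptides, but the same kind of counting can rule a code out with far fewer.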
But that doesn’t mean the work of the other scientists wasn’t important – it certainly was because it showed direct links between single changes in RNA and changes in proteins – along with confirming other aspects of molecular biology dogma.
more on cracking the genetic code: http://bit.ly/nirenbergcodecracking
more on amino acids: http://bit.ly/aminoacidalphabet
more on translation: http://bit.ly/proteintranslation