When scientists sequenced the human genome (our full genetic “blueprint”), they were surprised at how few actual genes (individual genetic “recipes” for things like proteins) we had (many a bet was lost!). To understand what was going on, I need to introduce you to the intron!
We keep the original version of our genome in DNA. But when we want to make a protein, our cells first have to make an RNA copy of that DNA through a process called transcription. The RNA produced is called messenger RNA (mRNA) because it serves as a molecular messenger, taking that protein recipe to the “chefs” – protein-making complexes called ribosomes. At first glance, this may seem like a waste of time, energy, and resources, but there are several benefits. I will go into a bit more detail, but here’s an overview to tide you over:
- DNA protection: that original DNA version is precious – mess with it in one cell and you can get mutations that are carried onto all further cells born from that cell – so it is kept safe in a membrane-bound compartment inside the cell called the nucleus. But the ribosomes are in the cytoplasm (the general cellular interior) – so making mRNA copies that get exported out of the nucleus and into the cytoplasm allows the DNA to stay put but protein to still be made
- amplification: you only have 2 copies of a gene (you get one from each parent, except for the sex-chromosomally-located ones), but you can make “as many” mRNA copies as you want. And there are a LOT of ribosomal “chefs” ready and willing to make protein based off of their instructions
- regulation: RNA is really really similar to DNA (which is why our cells can use DNA as a template for making those RNA copies) – but RNA is less stable, so it’s easier to degrade when you’re done with it. That means you can shut down protein production when your cells have had enough (we looked at some of the mechanisms by which they do this yesterday: https://bit.ly/mrnalife)
- alternative splicing: genes, the DNA versions of protein recipes, contain the instructions for what amino acids (protein letters) to link up in what order to make the protein of interest (in a process called translation) – but they also contain additional information. The “put this amino acid then that amino acid” parts of the genes are called exons because they are EXpressed. In between these exons are stretches of DNA called introns, which contain regulatory information that’s important for transcription (DNA to RNA copying) and other nuclear stuff. But these INterrupting introns would just confuse the ribosomes: they don’t contain amino-acid-adding instructions, and the ribosome can’t tell the difference, so if it were to translate them, the proteins would be gibberish. To prevent production of gibberish proteins, those introns are cut out during mRNA-making in a step called SPLICING. And this splicing can be done in alternative ways to give you different “versions” of the same recipe. It’s like being able to make a chocolate chip cookie from a chocolate chip and walnut cookie recipe. Even though you have to do that splicing, and it may seem like you’re wasting a lot of DNA, you’re actually saving DNA because you don’t have to have 2 almost identical genes. But speaking of almost identical genes, another benefit of the exon/intron setup is…
- exon shuffling: sometimes, in the course of evolution, a gene gets duplicated. This is a key evolutionary “technique” because you now have a “backup copy” for natural selection to play with without having to worry about messing up the other version (note: I make it seem like evolution has a plan, but it doesn’t – it’s all just random – random mutations get made, and if they’re harmful they get selected against, if they’re neutral nothing changes, and if they’re beneficial they get selected for). As if that weren’t cool enough, sometimes exons from different genes can get joined to give you new proteins. Like taking the frosting instructions from a cake recipe and sticking them on your cookie recipe to get a frosted cookie.
By now, I’ve hopefully convinced you that this whole setup of DNA with extra regulatory info (introns) -> mRNA copies with introns removed to leave only exons (and maybe not all of them) -> “custom” proteins is cool and worth it. So now let’s go a bit more in depth. Because I’m the bumbling biochemist, and that’s what I do…
note: some of the rest of this will be a repeat of what I just said because most of the rest of this is a repost of a post I posted in November
Proteins serve as “molecular workers” – they can do everything from helping build up or break down other molecules to serving as scaffolding to hold things together. Different proteins are specialized for different jobs, which they’re able to carry out thanks to their unique sequence of amino acid letters (specified by their genetic instructions) – these letters have different properties (e.g. size, charge, water-liking/avoiding-ness (hydrophilicity/hydrophobicity)) that get the proteins to fold up differently and act differently so they can do different things.
And our cells have a lot of different things to do! So our cells have to make A LOT of different proteins. So, when scientists started sequencing DNA they thought human DNA was going to have WAYYYYY more protein-coding genes than we actually do – but turns out you don’t need a separate recipe for each protein – through ALTERNATIVE SPLICING you can save space in your chromosomal cookbooks by editing copies of the same recipe to give you different proteins (like cutting out the “Add raisins” step if you want oatmeal cookies instead of oatmeal raisin ones). Instead of making proteins directly from the DNA gene, protein-making complexes called ribosomes work from temporary RNA copies of genes called messenger RNA (mRNA). The editing happens to these mRNA copies, not the originals, so you don’t lose the ability to make oatmeal raisin ones when the craving arises.
DNA is written in nucleotide letters (so’s RNA, but the nucleotides in RNA have a Ribose sugar instead of a Deoxyribose sugar). In addition to generic sugar-phosphate linker parts, nucleotides have unique nitrogenous bases that stick off (A, G, C, & T in DNA and A, G, C, & U in RNA) which allow for complementary base-pairing (A:T(or U) & G:C) so DNA usually exists double-stranded – and can easily be copied into DNA or RNA.
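If it helps to see those pairing rules as a lookup table, here’s a minimal Python sketch. The function names and the example strand are made up for illustration, and it deliberately ignores strand direction (it just pairs letters position by position).

```python
# Base-pairing rules from the paragraph above (A:T or A:U, and G:C).
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
# When DNA is copied into RNA, the partner for A is U instead of T.
DNA_TO_RNA_PAIRS = {"A": "U", "T": "A", "G": "C", "C": "G"}


def complement_dna(strand: str) -> str:
    """Return the DNA letters that pair with each letter of a DNA strand."""
    return "".join(DNA_PAIRS[base] for base in strand)


def copy_to_rna(template: str) -> str:
    """Return the RNA letters that pair with each letter of a DNA template."""
    return "".join(DNA_TO_RNA_PAIRS[base] for base in template)


example = "ATGC"  # made-up toy strand
partner = complement_dna(example)
rna_copy = copy_to_rna(partner)
print(partner, rna_copy)
```

Notice that copying the partner strand into RNA gives you back the original letters, just with U in place of T. That’s the sense in which RNA “holds the same info” as the DNA.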
Since DNA’s usually double-stranded, we usually speak of length in terms of “base pairs” (bp) – and one early lesson genome sequencers learned is that length isn’t everything. “Letter count”-wise, the marbled lungfish Protopterus aethiopicus has >40x more DNA than we do (~132.8 billion base pairs vs our “measly” ~3 billion). I got these stats here (https://go.nature.com/2pWN0Cm) and they’re for a haploid copy. You have 2 copies of each chromosomal recipe cookbook and haploid refers to just one of each – so in terms of total DNA, you have ~6 billion base pairs stuffed in each of your cells, except for the germ cells (sperm or eggs), and not including mitochondrial DNA.
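Here’s that arithmetic spelled out as a quick Python check. It only uses the approximate figures quoted above; the variable names are mine.

```python
# Approximate haploid genome sizes quoted in the text, in base pairs.
LUNGFISH_HAPLOID_BP = 132.8e9
HUMAN_HAPLOID_BP = 3e9

# How many times more DNA the lungfish has than we do.
fold_difference = LUNGFISH_HAPLOID_BP / HUMAN_HAPLOID_BP

# Most of our cells carry 2 copies of each chromosome (diploid).
human_diploid_bp = 2 * HUMAN_HAPLOID_BP

print(f"lungfish vs human: ~{fold_difference:.0f}x")
print(f"human diploid total: ~{human_diploid_bp / 1e9:.0f} billion bp")
```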
P. aethiopicus holds the record for longest cookbooks, but not the record for the most recipes – that prize is held by the frequent urogenital tract infector Trichomonas vaginalis – a parasite with ~60,000 protein-coding genes (I specify protein-coding because not all genes have instructions for proteins – some genes make “functional RNAs” – regulatory RNAs like microRNA (miRNA) – one of my favorite things 🙂).
How does this compare to humans? T. vaginalis has about 3X as many separate protein recipes as we do (~60,000 vs our ~20,000)!!!!! It must have a ton of DNA right? Wrong! It only has ~160 million base pairs!!!!!!! WTF is going on in there?! Instead of “alternative facts” the molecular “culprit” is alternative splicing! This process allows us to make an estimated 500,000 (1/2 a million!) proteins from “only” 20,000 distinct genes. Let’s take a closer look at what’s inside a genetic cookbook!
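To make the comparison concrete, here’s a rough back-of-the-envelope sketch in Python using only the approximate numbers from this post. “Base pairs per gene” here just means total genome size divided by gene count (it is not a measurement of how long the genes actually are), and the variable names are mine.

```python
# Approximate figures quoted in the text.
T_VAGINALIS_BP = 160_000_000
T_VAGINALIS_GENES = 60_000
HUMAN_HAPLOID_BP = 3_000_000_000
HUMAN_GENES = 20_000
HUMAN_PROTEINS_ESTIMATE = 500_000

# Genome "space" available per protein-coding gene.
t_vag_bp_per_gene = T_VAGINALIS_BP / T_VAGINALIS_GENES
human_bp_per_gene = HUMAN_HAPLOID_BP / HUMAN_GENES

# Average number of distinct proteins per human gene implied by the estimate.
proteins_per_gene = HUMAN_PROTEINS_ESTIMATE / HUMAN_GENES

print(f"T. vaginalis: ~{t_vag_bp_per_gene:,.0f} bp of genome per gene")
print(f"human: ~{human_bp_per_gene:,.0f} bp of genome per gene")
print(f"human: ~{proteins_per_gene:.0f} proteins per gene on average")
```

So on these numbers, each human gene would have to account for roughly 25 different proteins on average, which is where alternative splicing comes in.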
Like I was talking about above, the “original recipes” are called genes, and they’re written in DNA. A bunch of genes are hooked up back-to-back with some “spacer content” in “cookbook volumes” called chromosomes. Humans have 23 different chromosomes & we get 1 copy of each from each parent (so 46 in total). 22 of those pairs are mostly identical except for slight variations in the genes (allelic variation) that give us diversity. The 23rd (the “sex chromosome”) has an X & a Y version, which are more different from each other: you get an X from your mother & either an X or a Y from your father.
Your whole collection of chromosomes is called your GENOME and in *our* cells (and the cells of other eukaryotes, which is basically most things that aren’t bacteria or archaea) it’s housed in a membrane-bound compartment of the cell called the NUCLEUS (don’t confuse this with the atomic nucleus, which is the central hub of atoms where protons and neutrons hang out).
Instead of telling you to add vanilla & sugar, genetic recipes tell you to add valine (V) & serine (S) (as well as 18 other AMINO ACIDS that serve as protein “building blocks”) to make chains of amino acids that fold up into proteins. The recipes specify how much of each & in what order to add them & this “baking” process is called TRANSLATION.
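For a feel of what “reading the recipe” looks like, here’s a toy Python sketch. One thing I haven’t covered in this post: the ribosome reads mRNA three letters at a time (in “codons”). The table below holds only 4 of the 64 real codons, and the example mRNA and function name are made up for illustration.

```python
# A tiny slice of the standard genetic code (4 of 64 codons).
# None marks a "stop" codon, which tells the ribosome the recipe is over.
CODON_TABLE = {
    "AUG": "M",   # methionine (also the usual "start" signal)
    "GUU": "V",   # valine
    "UCU": "S",   # serine
    "UAA": None,  # stop
}


def translate(mrna: str) -> str:
    """Read an mRNA three letters at a time and build the amino acid chain.

    Raises KeyError for codons missing from this deliberately tiny table.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # hit a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_mrna = "AUGGUUUCUUAA"  # made-up example: M, V, S, then stop
toy_protein = translate(toy_mrna)
print(toy_protein)
```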
The nucleus serves as a kind of “reference section” of the cellular library – it has a ton of important info, but you can’t “check it out” from the nucleus. Instead, if you want to use it, you have to make a copy (a process called TRANSCRIPTION) & take it out of the nucleus into the “kitchen” of the CYTOPLASM (the main interior part of the cell) where the “chefs” (ribosomes) are
The copy machine (RNA polymerase) is in the nucleus & makes a copy of the gene in RNA (RiboNucleic Acid) instead of DNA. RNA’s really similar to DNA & holds the same “info” but it’s less stable so it’s kinda like making a copy w/a shorter-lasting ink. 1st, RNA pol copies the gene “word for word” to make pre-messenger RNA (pre-mRNA). But this pre-mRNA copy has more info than the chefs need.
In addition to telling you what to add where (what the chefs need), genetic recipes contain “margin notes” (INTRONS) providing info about things like *when* to make copies & “suggested pairings” (if you’re making this you might also want to make…). These notes are REGULATORY information & they’re important, but they’re “upstream” of the chefs who are just following orders from upper management. The chefs don’t need this info, so it gets cut out of the recipe copy before it’s given to them. The process of RNA SPLICING cuts out the regulatory info to turn pre-mRNA into mature mRNA.
Those “margin notes” getting cut out are called INTRONS because they INTerrupt the EXpressed “add this” steps (called EXONS) & provide unique opportunities for making variations of the same basic recipe.
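As a toy picture of what splicing does, here’s a minimal Python sketch. It assumes we already know which stretches are exons and which are introns (the real spliceosome has to work that out from signals in the RNA), and the sequences are made up.

```python
# A made-up pre-mRNA, written as labeled segments in order.
pre_mrna = [
    ("exon", "AUGGUU"),
    ("intron", "GUAAGUCCUAG"),
    ("exon", "UCUUAA"),
]


def splice(segments):
    """Drop the introns and join the exons, in their original order."""
    return "".join(seq for kind, seq in segments if kind == "exon")


mature_mrna = splice(pre_mrna)
print(mature_mrna)
```

The “margin notes” are gone from the copy, but the original list (like the original gene) hasn’t been touched.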
Say you have a recipe for a 3-layer cake 🎂 with a layer of chocolate cake 🍫, then a layer of strawberry cake 🍓, topped off w/a layer of vanilla cake 🍨 (🍫🍓🍨🍰). If you split up the “make chocolate layer,” “make strawberry layer” & “make vanilla layer” steps you can cut 1 or 2 out of the recipe before handing it to the chef. So the same basic recipe can be altered to make a chocolate/strawberry cake 🍫🍓, a chocolate/vanilla cake 🍫🍨, a strawberry/vanilla cake 🍓🍨, a chocolate cake 🍫, a strawberry cake 🍓, or a vanilla cake 🍨 🤯
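If you want to count the cakes, here’s a short Python sketch that lists every way of keeping at least 1 layer while preserving the layer order. It’s an idealized count: it assumes any layer can be skipped independently, which real genes don’t necessarily allow. The function name is mine.

```python
from itertools import combinations

layers = ["chocolate", "strawberry", "vanilla"]


def cake_variants(exons):
    """List every non-empty selection of exons, keeping the original order."""
    variants = []
    for how_many in range(1, len(exons) + 1):
        variants.extend(combinations(exons, how_many))
    return variants


all_cakes = cake_variants(layers)
for cake in all_cakes:
    print("/".join(cake))
print(len(all_cakes), "cakes from 1 recipe")
```

That’s the 6 trimmed-down cakes listed above plus the original 3-layer one.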
INTRONS allow you to mix n’ match to make different proteins from the same instructions. This decreases the amount of DNA we need which is good because (although it’s not as much as the lungfish) we already have a TON of it – so much so that we have to wind & wind & wind it up to get it to fit inside the nucleus (which offers an additional opportunity for regulation since you have to unwind the parts you want to use – “epigenetic” regulation often involves modifying (e.g. through methylation and/or acetylation) the histone proteins the DNA is wound around to make genes more or less available for copying)
But back to the intron/exon system – because it’s not done wowing us yet. In addition to space-saving, the intron/exon system opens up the potential for evolution
The ALTERNATIVE SPLICING we looked at above only changes the RNA copy – NOT the gene itself – so you don’t have to worry about messing up your original recipe, but your options are limited. BUT if a gene gets duplicated so you have 2 copies of the recipe in your cookbook, evolution can play around w/the 2nd copy & make permanent changes in the GENE itself without messing up the 1st. So you can do things like duplicate 1 of the exons (get a 2nd chocolate layer 🍫🍫🍓🍨) or even mix n’ match with other genes (maybe add a layer of frosting by adding on an exon from a different gene). We call this exon shuffling & it can lead to genes with new functions.
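Sticking with the cake, here’s a toy Python sketch of those 2 kinds of change, treating a gene as a plain list of exon names. The function names and the “frosting gene” are made up, and of course in real life nothing is choosing which exon to copy; it happens randomly.

```python
cake_gene = ["chocolate", "strawberry", "vanilla"]
frosting_gene = ["sponge", "frosting"]  # hypothetical 2nd gene


def duplicate_exon(gene, index):
    """Return a new gene with the exon at `index` repeated right after itself."""
    return gene[:index + 1] + [gene[index]] + gene[index + 1:]


def add_exon_from(gene, donor_gene, donor_index):
    """Return a new gene with one exon from another gene tacked on the end."""
    return gene + [donor_gene[donor_index]]


# Changes are made to a duplicate, so the 1st copy stays as it was.
double_chocolate = duplicate_exon(cake_gene, 0)
frosted = add_exon_from(cake_gene, frosting_gene, 1)

print(double_chocolate)
print(frosted)
print(cake_gene)
```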
It’s important to remember that all of those changes happen RANDOMLY – evolution doesn’t have a motive, but changes will “only” stick around if they’re useful (or at least not harmful on balance) – most genetic changes will be neutral or harmful but changes that are harmful will lead to lower survival – thus natural selection will “weed them out”
Speaking of evolution, our cells have evolved to have pre-mRNA editors called SPLICEOSOMES that do the splicing, but bacterial cells haven’t (they have the copiers (RNA polymerase) & chefs (ribosomes) but no editors (spliceosomes)). So if we want bacteria to make a protein for us (such “recombinant protein expression” is a common method for getting a bunch of protein to study), we need to edit the recipe ahead of time. Instead of putting in the “full recipe” (the genomic DNA (gDNA)), we need to give them the edited version (but in DNA form) – the COMPLEMENTARY DNA (cDNA), which is complementary to the spliced mRNA. You want the complementary strand because it still has to get copied to give you mRNA, and when you copy you get the complementary strand (see pics if this is confusing). So now, when the bacteria make an RNA copy, it’s already edited & ready to go.
Since we’re adding cDNA not gDNA, we only get 1 variation of the recipe (which is actually usually a good thing for us because it decreases the number of variables we have to worry about) & we can choose which version we want. Then we can stick that version of the recipe into bacterial cells to have them make more copies of the recipe and/or the protein product.
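Here’s the “complement of a complement” idea as a toy Python sketch. The mRNA is a made-up example, the function names are mine, and (to keep it simple) it pairs letters position by position without worrying about strand direction.

```python
# Pairing rules for going RNA -> DNA and DNA -> RNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def make_cdna(spliced_mrna: str) -> str:
    """Build the DNA strand that pairs with an already-spliced mRNA."""
    return "".join(RNA_TO_DNA[base] for base in spliced_mrna)


def transcribe(template_dna: str) -> str:
    """Copy a DNA template into the RNA that pairs with it."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


spliced_mrna = "AUGGUUUCUUAA"  # made-up, intron-free message
cdna = make_cdna(spliced_mrna)
bacterial_copy = transcribe(cdna)

print(cdna)
print(bacterial_copy == spliced_mrna)
```

Because the cDNA is the complement of the spliced mRNA, copying it gives the bacteria that same intron-free message, no spliceosome needed.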
📝Not all genes are recipes for proteins 👉 some are recipes for FUNCTIONAL RNAs 👉 types of RNA that are more than “just messengers” 👉 they can do things like bind to DNA or mRNA to regulate the DNA->RNA copying (transcription) or RNA->protein “baking” (translation)
some slides on this: http://bit.ly/2Tpsdlh