Genes are stretches of DNA that serve as genetic “recipes” instructing your cells how to make things like proteins, which serve as “molecular workers,” doing everything from helping build up or break down other molecules, to serving as scaffolding to hold things together. As scientists raced to sequence an entire human genome (a full genetic “blueprint”), bets were on as to how many genes they’d find amongst those roughly 3 billion nucleotides (DNA letters). Millions? At least hundreds of thousands, right? Wrong. To their surprise (and the detriment of some bank accounts) they found that humans only have ~20,000 distinct protein-coding genes (way less than some microbes). But that doesn’t mean we can only make 20,000 different proteins… To understand what was going on, I need to introduce you to the intron! 

We keep the original version of our genome in DNA. But when we want to make a protein, our cells first have to make an RNA copy of that DNA through a process called transcription. The RNA produced is called messenger RNA (mRNA) because it serves as a molecular messenger, taking that protein recipe to the “chefs” – protein-making complexes called ribosomes. At first glance, this may seem like a waste of time, energy, and resources, but there are several benefits. I will go into more detail (so if some of these terms don’t make sense yet please hang in there!), but here’s an overview of benefits to tide you over:

  • DNA protection: that original DNA version is precious – mess with it in one cell and you can get mutations that are carried on to all further cells born from that cell. Therefore, for its protection, it is kept safe in a membrane-bound compartment inside the cell called the nucleus. But the ribosomes are in the cytoplasm (the general cellular interior) – so making mRNA copies that get exported out of the nucleus and into the cytoplasm allows the DNA to stay put while allowing protein to still be made⠀
  • amplification: you only have 2 copies of each gene (you get one from each biological parent, except for the sex-chromosomally-located ones), but you can make “as many” mRNA copies as you want. And there are a LOT of ribosomal “chefs” ready and willing to make protein based off of their instructions, so making more mRNA copies allows you to make more of the corresponding protein without getting bottlenecked. ⠀
  • regulation: RNA is really really similar to DNA (which is why our cells can use DNA as a template for making those RNA copies) – but RNA is less stable – so it’s easier to degrade when you’re done with it. This “easy-to-destroy-ness” makes it so your cells can shut down protein production when they have enough. More about some of the mechanisms by which they do this:⠀

now we get to the fun stuff…

  • alternative splicing: genes, the DNA versions of protein recipes, contain the instructions for what amino acids (protein letters) to link up in what order to make the protein of interest (in a process called translation) – but genes also contain additional information. The “put this amino acid then that amino acid” parts of the genes are called “exons” because they are EXpressed. But in between these exons are stretches of DNA that contain regulatory information that’s important for transcription (DNA to RNA copying) and other nuclear stuff. These INterrupting “introns” are needed in those “upstream” steps, but they would confuse the ribosomes, which can’t tell which parts don’t contain amino-acid-adding instructions and thus would try to translate the introns, resulting in gibberish proteins. To prevent this problem, introns are cut out during a part of the mRNA-making process called SPLICING. And this splicing can be done in alternative ways to give you different “versions” of the same recipe. It’s like being able to make a chocolate chip cookie from a chocolate chip and walnut cookie recipe. Even though you have to do that splicing, and it may seem like you’re wasting a lot of DNA, you’re actually saving DNA because you don’t have to have 2 almost identical genes. But speaking of almost identical genes, another benefit of the exon/intron setup is…
  • exon shuffling: sometimes, in the course of evolution, a gene gets duplicated. This is a key evolutionary “technique” because you now have a “backup copy” for natural selection to play with without having to worry about messing up the other version (note: I make it seem like evolution has a plan, but it doesn’t – it’s all just random – random mutations get made, and if they’re harmful they get selected against, if they’re neutral nothing changes, and if they’re beneficial they get selected for). As if that weren’t cool enough, sometimes exons from different genes can get joined to give you new proteins. Like taking the frosting instructions from a cake recipe and sticking them on your cookie recipe to get a frosted cookie. ⠀

By now, I’ve hopefully convinced you that this whole setup of: 

DNA with extra regulatory info (introns) -> mRNA copies with introns removed to leave only exons (and maybe not all of them) -> “custom” proteins 

is cool and worth it. So now let’s go a bit more in depth. Because I’m the bumbling biochemist, and that’s what I do…⠀

note: some of the rest of this repeats some of what I just said, just in more depth, so apologies for formatting weirdness

Different proteins are specialized for different jobs, which they’re able to carry out thanks to their unique sequence of amino acid letters. There are 20 (common) amino acids, specified by genetic instructions, which have different properties (e.g. size, charge, water-liking/avoiding-ness (hydrophilicity/hydrophobicity)). These properties cause different proteins to fold up differently and act differently so they can do different things.

And our cells have a lot of different things to do! So our cells have to make A LOT of different proteins. So, when scientists started sequencing DNA they thought human DNA was going to have WAYYYYY more protein-coding genes than we actually do – but turns out you don’t need a separate recipe for each protein – through alternative splicing you can save space in your chromosomal cookbooks by editing copies of the same recipe to give you different proteins (like cutting out the “Add raisins” step if you want oatmeal cookies instead of oatmeal raisin ones).

Cells are able to do this without harming the original recipe because, instead of making proteins directly from the DNA gene, protein-making complexes called ribosomes work from temporary RNA copies of genes called messenger RNA (mRNA). The editing happens to these mRNA copies, not the originals, so you don’t lose the ability to make oatmeal raisin ones when the craving arises. ⠀

DNA is written in nucleotide letters (so is RNA, but the nucleotides in RNA have a Ribose sugar instead of a Deoxyribose sugar). DNA and RNA are “nucleic acids” and they’re characterized by having generic sugar-phosphate linker parts as well as unique nitrogenous bases that stick off (A, G, C, & T in DNA and A, G, C, & U in RNA). Those “bases” allow for letter-to-letter specific complementary base-pairing (A:T(or U) & G:C) between letters on different strands of DNA (or RNA). Therefore, DNA usually exists double-stranded – and can easily be copied into DNA or RNA. ⠀
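If it helps to see that letter-to-letter pairing spelled out, here’s a tiny Python sketch. The function names and the 6-letter example strand are made up by me for illustration, and I’m ignoring the fact that paired strands actually run in opposite directions, so treat it as a toy rather than a real bioinformatics tool.

```python
# Toy sketch of complementary base-pairing (names and sequence are made up for illustration).
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}      # DNA letter -> DNA partner
DNA_TO_RNA_PAIRS = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA letter -> RNA partner (U instead of T)


def pair_with_dna(strand):
    """Return the DNA letter that pairs with each letter, in the same left-to-right order."""
    return "".join(DNA_PAIRS[base] for base in strand)


def pair_with_rna(strand):
    """Return the RNA letter that pairs with each DNA letter (A pairs with U in RNA)."""
    return "".join(DNA_TO_RNA_PAIRS[base] for base in strand)


template = "TACGGT"
print(pair_with_dna(template))  # ATGCCA
print(pair_with_rna(template))  # AUGCCA
```

The only difference between the two outputs is that the RNA copy has a U wherever the DNA copy would have a T.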

Since DNA’s usually double-stranded, we commonly speak of length in terms of “base pairs” (bp) – and one early lesson genome sequencers learned is length isn’t everything. “Letter count”-wise, the marbled lungfish Protopterus aethiopicus has >40x more DNA than we do (~132.8 billion base pairs vs our “measly” ~3 billion). These stats are for a haploid copy: you have 2 copies of each chromosomal recipe cookbook and haploid refers to just one of each. So in terms of total DNA, you have ~6 billion base pairs stuffed in each of your cells except for the germ cells (sperm or eggs). That’s not including mitochondrial DNA, which I don’t have time to get into here, but basically inside your cells are membrane-bound “rooms” called mitochondria where energy production occurs and they have some of their own DNA.

P. aethiopicus holds the record for longest cookbooks (largest genome), but not the record for the most recipes – that prize is held by the frequent urogenital tract infector Trichomonas vaginalis – a parasite with ~60,000 protein-coding genes. (I specify protein-coding because not all genes have instructions for proteins – some genes make “functional RNAs” – regulatory RNAs like microRNA (miRNA) – one of my favorite things 🙂.)

How does this compare to humans? T. vaginalis has about 3X as many separate protein recipes as we do!!!!! It must have a ton of DNA right? Wrong! It only has ~160 million base pairs!!!!!!! WTF is going on in there?! Instead of “alternative facts” the molecular “culprit” is alternative splicing! This process allows us to make an estimated 500,000 (1/2 a million!) proteins from “only” 20,000 distinct genes. Let’s take a closer look at what’s inside a genetic cookbook!
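To keep all those numbers straight, here’s the back-of-the-envelope arithmetic as a quick Python sketch. It only uses the rough figures quoted in this post (they’re approximations, and the variable names are just my own labels).

```python
# Rough figures quoted in this post (approximate; variable names are my own labels).
lungfish_bp = 132.8e9     # marbled lungfish haploid genome, base pairs
human_bp = 3e9            # human haploid genome, base pairs
trich_bp = 160e6          # T. vaginalis genome, base pairs
human_genes = 20_000      # human protein-coding genes
trich_genes = 60_000      # T. vaginalis protein-coding genes
human_proteins = 500_000  # estimated distinct human proteins

genome_ratio = lungfish_bp / human_bp            # how much more DNA the lungfish has
gene_ratio = trich_genes / human_genes           # how many more "recipes" T. vaginalis has
dna_ratio = human_bp / trich_bp                  # how much more DNA we have than T. vaginalis
proteins_per_gene = human_proteins / human_genes # average protein versions per human gene

print(round(genome_ratio, 1))  # 44.3
print(gene_ratio)              # 3.0
print(round(dna_ratio, 2))     # 18.75
print(proteins_per_gene)       # 25.0
```

So with these estimates, each human gene would average out to about 25 protein versions, which is the gap alternative splicing helps explain.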

Like I was talking about above, the “original recipes” are called genes, and they’re written in DNA. A bunch of genes are hooked up back-to-back with some “spacer content” in “cookbook volumes” called chromosomes. Humans have 23 pairs of chromosomes & we get 1 copy of each from each parent. For 22 of them, the 2 copies are mostly identical except for slight variations in the genes (allelic variation) that give us diversity. The 23rd chromosome (the “sex chromosome”) has an X & a Y version which are more different from one another: you get an X from your biological mother & either an X or a Y from your biological father.

Your whole collection of chromosomes is called your genome and in *our* cells (and the cells of other eukaryotes (basically most things that aren’t bacteria)) it’s housed in a membrane-bound compartment of the cell called the nucleus (don’t confuse this with the atomic nucleus, which is the central hub of atoms where protons & neutrons hang out).

Instead of telling you to add vanilla & sugar, genetic recipes tell you to add valine (V) & serine (S) (as well as 18 other amino acids that serve as protein “building blocks”) to make chains of amino acids that fold up into proteins. The recipes specify how much of each & in what order to add them & this “baking” process is called translation.
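Here’s a minimal Python sketch of that “baking” step. One thing I haven’t covered in this post: ribosomes read mRNA 3 letters at a time (each triplet is called a codon). The 4 codons below are real entries from the standard genetic code, but the table is deliberately tiny and the example message is made up, so this is a toy, not a full translator.

```python
# A deliberately tiny slice of the standard genetic code (the real table has 64 codons).
# None marks a "stop" codon, which tells the ribosome the recipe is finished.
CODON_TABLE = {"AUG": "M", "GUU": "V", "UCU": "S", "UAA": None}


def translate(mrna):
    """Read the message 3 letters at a time and add the matching amino acid until a stop."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid is None:  # stop codon: the chef puts down the spoon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGUUUCUUAA"))  # MVS
```

That made-up message reads as methionine (M), valine (V), serine (S), then stop.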

The nucleus serves as a kind of “reference section” of the cellular library –  it has a ton of important info, but you can’t “check it out” from the nucleus. Instead, if you want to use it, you have to make a copy (a process called transcription) & take it out of the nucleus into the “kitchen” of the cytoplasm (the main interior part of the cell) where the “chefs” (ribosomes) are.⠀

The copy machine (RNA polymerase) is in the nucleus & makes a copy of the gene in RNA (RiboNucleic Acid) instead of DNA. RNA’s really similar to DNA & holds the same “info” but it’s less stable so it’s kinda like making a copy w/a shorter-lasting ink. First, RNA pol copies the gene “word for word” to make pre-messenger RNA (pre-mRNA). But this pre-mRNA copy has more info than the chefs need.

In addition to telling you what to add where (what the chefs need), genetic recipes contain “margin notes” (introns)  providing info about things like *when* to make copies & “suggested pairings” (if you’re making this you might also want to make…). These notes are regulatory information & they’re important, but they’re “upstream” of the chefs who are just following orders from upper management. The chefs don’t need this info, so it gets cut out of the recipe copy before it’s given to them. The process of RNA splicing cuts out the regulatory info to turn pre-mRNA into mature mRNA. 
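As a rough sketch of what that editing step does, here’s splicing in a few lines of Python. I’m assuming the pre-mRNA has already been labeled as exon or intron pieces (in a real cell the spliceosome has to recognize those boundaries itself), and the sequences are made up for illustration.

```python
def splice(pre_mrna_pieces):
    """Cut out the intron pieces and stitch the exon pieces together, keeping their order."""
    return "".join(seq for kind, seq in pre_mrna_pieces if kind == "exon")


# Hypothetical pre-mRNA: two exons with one intron interrupting them.
pre_mrna = [
    ("exon", "AUGGUU"),
    ("intron", "GUAAGUCCAG"),
    ("exon", "UCUUAA"),
]

mature_mrna = splice(pre_mrna)
print(mature_mrna)  # AUGGUUUCUUAA
```

The mature mRNA only has the “add this” steps left, so the chefs never see the margin notes.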

Those “margin notes” getting cut out are called introns because they INTerrupt the EXpressed “add this” steps (called EXONS) & provide unique opportunities for making variations of the same basic recipe.⠀

Say you have a recipe for a 3-layer cake with a layer of chocolate cake 🍫, then a layer of strawberry cake 🍓, topped off with a layer of vanilla cake 🍨 (🍫🍓🍨). If you split up the “make chocolate layer,” “make strawberry layer” & “make vanilla layer” steps you can cut 1 or 2 out of the recipe before handing it to the chef. So the same basic recipe can be altered to make a chocolate/strawberry cake 🍫🍓, a chocolate/vanilla cake 🍫🍨, a strawberry/vanilla cake 🍓🍨, a chocolate cake 🍫, a strawberry cake 🍓, or a vanilla cake 🍨!
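You can count those cake options with a few lines of Python. This is just a toy, assuming every layer can be kept or skipped independently while the order stays the same; real cells regulate which splice variants get made, so not every combination necessarily shows up for a real gene.

```python
from itertools import combinations


def cake_variants(layers):
    """List every way to keep at least 1 layer while preserving the original layer order."""
    variants = []
    for size in range(len(layers), 0, -1):  # biggest cakes first
        for kept in combinations(layers, size):
            variants.append(list(kept))
    return variants


layers = ["chocolate", "strawberry", "vanilla"]
variants = cake_variants(layers)
for variant in variants:
    print("/".join(variant))
print(len(variants))  # 7 (the full 3-layer cake plus the 6 trimmed versions)
```

With 3 exon “layers” you get 7 possible cakes, and the count roughly doubles with every extra layer, which is how a modest number of genes can give a big menu of proteins.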

Introns allow you to mix n’ match to make different proteins from the same instructions. This decreases the amount of DNA we need which is good because (although we don’t have as much as the lungfish) we already have a TON of it – so much so that we have to wind & wind & wind it up to get it to fit inside the nucleus. 

sidenote: that wound-up-ness offers an additional opportunity for regulation since you have to unwind the parts you want to use – “epigenetic” regulation often involves modifying (e.g. through methylation and/or acetylation) the histone proteins the DNA is wound around to make genes more or less available for copying

But back to the intron/exon system – because it’s not done wowing us yet. In addition to space-saving, the intron/exon system opens up new possibilities for evolution.

The alternative splicing we looked at above only changes the RNA copy – NOT the gene itself – so you don’t have to worry about messing up your original recipe, but your options are limited. BUT if a gene gets duplicated so you have 2 copies of the recipe in your cookbook, evolution can play around w/the 2nd copy & make permanent changes in the gene itself without messing up the 1st. So you can do things like duplicate 1 of the exons (get a 2nd chocolate layer 🍫🍫🍓🍨) or even mix n’ match with other genes (maybe add a layer of frosting by adding on an exon from a different gene). We call this exon shuffling & it can lead to genes with new functions.
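Here’s the cake version of gene duplication & exon shuffling as a small Python sketch. The gene names, the “other gene” and the helper functions are all hypothetical, and in real life these changes happen by chance rather than by someone calling a function.

```python
def duplicate_exon(gene, index):
    """Return a new gene with the exon at `index` repeated right after itself."""
    return gene[:index + 1] + [gene[index]] + gene[index + 1:]


def add_exon_from(gene, donor_gene, donor_index):
    """Return a new gene with one exon borrowed from a different gene tacked on the end."""
    return gene + [donor_gene[donor_index]]


cake_gene = ["chocolate", "strawberry", "vanilla"]  # the original recipe
other_gene = ["sponge", "frosting"]                 # a hypothetical unrelated gene

backup_copy = list(cake_gene)  # gene duplication: a 2nd copy that is free to change

double_chocolate = duplicate_exon(backup_copy, 0)
frosted = add_exon_from(backup_copy, other_gene, 1)

print(double_chocolate)  # ['chocolate', 'chocolate', 'strawberry', 'vanilla']
print(frosted)           # ['chocolate', 'strawberry', 'vanilla', 'frosting']
print(cake_gene)         # ['chocolate', 'strawberry', 'vanilla'] (original untouched)
```

The point of the `backup_copy` line is the same as in the text: the new versions get made without messing up the 1st recipe.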

It’s important to remember that all of those changes happen RANDOMLY – evolution doesn’t have a motive, but changes will “only” stick around if they’re useful (or at least not harmful on balance) – most genetic changes will be neutral or harmful but changes that are harmful will lead to lower survival – thus natural selection will “weed them out”⠀

Speaking of evolution, our cells have evolved to have pre-mRNA editors called spliceosomes that do the splicing, but bacterial cells haven’t (they have the copiers (RNA polymerase) & chefs (ribosomes) but no editors (spliceosomes)). So if we want bacteria to make a protein for us (such “recombinant protein expression” is a common method for getting a bunch of protein to study), we need to edit the recipe ahead of time. Instead of putting in the “full recipe” (the genomic DNA (gDNA)) we need to give them the edited version (but in DNA form) – the complementary DNA (cDNA), which is complementary to the spliced mRNA (you want the complementary strand because it still has to get copied to give you mRNA, and when you copy you get the complementary strand (see pics if this is confusing)) – so now, when the bacteria make an RNA copy, it’s already edited & ready to go.
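If the “complementary of the complementary” part is confusing, here’s a minimal Python sketch of the round trip: spliced mRNA -> cDNA -> mRNA again. The function names and the short message are made up for illustration, and unlike a simple letter swap this version also flips the reading direction, since paired strands run in opposite directions.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def make_cdna(mrna):
    """Build the DNA strand complementary to an mRNA (reversed, since the strands are antiparallel)."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def copy_into_rna(template_dna):
    """Copy a DNA template strand into its complementary RNA (again reversed)."""
    return "".join(DNA_TO_RNA[base] for base in reversed(template_dna))


spliced_mrna = "AUGGUUUCUUAA"  # made-up, already-edited message
cdna = make_cdna(spliced_mrna)

print(cdna)                                 # TTAAGAAACCAT
print(copy_into_rna(cdna) == spliced_mrna)  # True
```

Copying the cDNA gets you right back to the already-spliced message, which is why the bacteria don’t need editors of their own.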

Since we’re adding cDNA not gDNA, we only get 1 variation of the recipe (which is actually usually a good thing for us because it decreases the number of variables we have to worry about) & we can choose which version we want. Then we can stick that version of the recipe into bacterial cells to have them make more copies of the recipe and/or the protein product. ⠀

note: Not all genes are recipes for proteins; some are recipes for functional RNAs. Functional RNAs are types of RNA that are more than “just messengers” – they can do things like bind to DNA or mRNA to regulate the DNA->RNA copying (transcription) or RNA->protein “baking” (translation)⠀

note: Mutations at or around splice sites of genes can cause genes to be mis-spliced, leading to disease. For example, some of the cystic-fibrosis mutations in the CFTR gene affect its splicing. Researchers are looking into using short pieces of chemically-stabilized DNA called “antisense oligonucleotides” (ASOs) to “hide” the mutations so that proper splicing occurs. 

some slides on this:

more on topics mentioned (& others) #365DaysOfScience All (with topics listed) 👉⠀
