Do you cDNA in here? I hope you don’t see INTRONS! And 🤞 you’ll *complement* my mRNA! I talk about genes as being cellular recipes (written in DNA) for making proteins (written in amino acids) – since the DNA language is “universal” I can stick it into other types of cells (like bacteria) and get them to make protein for me to study – but instead of sticking in the actual gene for the protein I want to study, I stick in an edited version called complementary DNA (cDNA). This might sound like it’ll be another one of those super-technical posts – but it’s actually really cool – and relevant for way more than “just” such recombinant protein expression! Because, unlike things like CRISPR, this editing process happens all the time in our cells and it is responsible for generating tons of different proteins from a “small” (relatively-speaking!) amount of DNA. There’s lots to love about ALTERNATIVE SPLICING!

Proteins serve as “molecular workers” – they can do everything from helping build up or break down other molecules to serving as scaffolding that holds things together. Different proteins are specialized for different jobs, which they’re able to carry out thanks to their unique sequence of amino acid letters (specified by their genetic instructions). These letters have different properties (e.g. size, charge, water-liking/avoiding-ness (hydrophilicity/hydrophobicity)) that get the proteins to fold up differently and act differently so they can do different things.

And our cells have a lot of different things to do! So our cells have to make A LOT of different proteins. When scientists started sequencing DNA, they thought human DNA was going to have WAYYYYY more protein-coding genes than we actually do – but it turns out you don’t need a separate recipe for each protein. Through ALTERNATIVE SPLICING you can save space in your chromosomal cookbooks by editing copies of the same recipe to give you different proteins (like cutting out the “Add raisins” step if you want oatmeal cookies instead of oatmeal raisin ones). Instead of making proteins directly from the DNA gene, protein-making complexes called ribosomes work from temporary RNA copies of genes called messenger RNA (mRNA). The editing happens to these mRNA copies, not the originals, so you don’t lose the ability to make oatmeal raisin ones when the craving arises.

DNA is written in nucleotide letters (so’s RNA, but the nucleotides in RNA have a Ribose sugar instead of a Deoxyribose sugar). In addition to generic sugar-phosphate linker parts, nucleotides have unique nitrogenous bases that stick off (A, G, C, & T in DNA and A, G, C, & U in RNA) which allow for complementary base-pairing (A:T(or U) & G:C) so DNA usually exists double-stranded – and can easily be copied into DNA or RNA. 
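If it helps to see the pairing rules spelled out, here’s a minimal Python sketch. The function names and the 6-letter example sequence are made up for illustration (they’re not from any library). It builds the partner strand of a piece of DNA and then copies that partner into RNA using the A:T(or U) & G:C rules.

```python
# Base-pairing rules: A pairs with T (or U in RNA), G pairs with C.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA template base -> RNA base


def complement_dna(strand: str) -> str:
    """Return the base-paired partner DNA strand.

    The two strands run in opposite directions, so the input is walked
    backwards to keep the result written in the usual 5'->3' direction.
    """
    return "".join(DNA_PAIR[base] for base in reversed(strand))


def transcribe(template: str) -> str:
    """Copy a DNA template strand into its base-paired RNA (5'->3')."""
    return "".join(RNA_PAIR[base] for base in reversed(template))


coding = "ATGGTC"  # hypothetical example sequence
template = complement_dna(coding)
print(template)              # GACCAT
print(transcribe(template))  # AUGGUC
```

Notice that copying the partner strand into RNA gives back the original letters, just with U in place of T.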

Since DNA’s usually double-stranded, we usually speak of length in terms of “base pairs” (bp) – and one early lesson genome sequencers learned is that length isn’t everything. “Letter count”-wise, the marbled lungfish Protopterus aethiopicus has >40x more DNA than we do (~132.8 billion base pairs vs our “measly” ~3 billion). These stats are for a haploid copy: you have 2 copies of each chromosomal recipe cookbook, and haploid refers to just one of each. So in terms of total DNA, you have ~6 billion base pairs stuffed in each of your cells except for the germ cells (sperm or eggs), which only have one copy (and that’s not including mitochondrial DNA).
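The arithmetic behind those comparisons is quick to check. This is just a back-of-the-envelope sketch using the rounded numbers quoted above; the variable names are mine.

```python
# Haploid genome sizes in base pairs (rounded figures quoted in the text).
lungfish_bp = 132.8e9
human_bp = 3e9

# How many times more DNA the lungfish has than we do.
fold = lungfish_bp / human_bp

# Most of our cells are diploid: 2 copies of each chromosome.
diploid_human_bp = 2 * human_bp

print(f"lungfish vs human: ~{fold:.0f}x")                          # ~44x
print(f"human diploid: ~{diploid_human_bp / 1e9:.0f} billion bp")  # ~6 billion bp
```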

P. aethiopicus holds the record for longest cookbooks, but not the record for the most recipes – that prize is held by the frequent urogenital tract infector Trichomonas vaginalis, a parasite with ~60,000 protein-coding genes. (I specify protein-coding because not all genes have instructions for proteins – some genes make “functional RNAs” – regulatory RNAs like microRNA (miRNA) – one of my favorite things 🙂 .)

How does this compare to humans? T. vaginalis has about 3X as many separate protein recipes as we do!!!!! It must have a ton of DNA, right? Wrong! It only has ~160 million base pairs!!!!!!! WTF is going on in there?! And how do we get by with so few recipes? Instead of “alternative facts” the molecular “culprit” is alternative splicing! This process allows us to make an estimated 500,000 (1/2 a million!) proteins from “only” ~20,000 distinct genes. Let’s take a closer look at what’s inside a genetic cookbook!
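To put rough numbers on that, here’s a small sketch using only the approximate figures quoted in this post. “bp of genome per gene” here is just total genome size divided by gene count, so it’s an average spacing and not the length of an actual gene. The dictionary layout and names are my own.

```python
# Approximate figures quoted in the text (haploid genome sizes).
genomes = {
    "human": {"genes": 20_000, "base_pairs": 3e9},
    "T. vaginalis": {"genes": 60_000, "base_pairs": 160e6},
}


def base_pairs_per_gene(genome: dict) -> float:
    """Average amount of genome per protein-coding gene (not gene length)."""
    return genome["base_pairs"] / genome["genes"]


for name, genome in genomes.items():
    print(f"{name}: ~{base_pairs_per_gene(genome):,.0f} bp of genome per gene")
# human: ~150,000 bp of genome per gene
# T. vaginalis: ~2,667 bp of genome per gene

# Estimated distinct proteins per human gene, thanks to alternative splicing.
proteins_per_gene = 500_000 / genomes["human"]["genes"]
print(f"human: ~{proteins_per_gene:.0f} proteins per gene on average")  # ~25
```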

Each of the goopy little white dots on these plates I just took out of the incubator is a little cluster of bacteria. We call these clusters colonies, and the bacteria within a single colony are all (assuming they don’t acquire mutations) genetically identical – but the bacteria in different colonies can be different. Here, they *should* all be identical and have a little circular piece of DNA called a plasmid that I’ve shoved in (through a process called transformation that often involves heat shock – taking weakened cells & heating them up to open up pores in the bacterial membranes to let the plasmid sneak in).

The plasmid contains an antibiotic resistance gene – and these bacterial food plates (nutrient agar) are spiked with the corresponding antibiotic – so that only cells that have taken in the plasmid can grow on them. But it’s not the antibiotic resistance gene that I’m interested in – that just serves as a selection marker – I want to know if the instructions for making a protein I want to study are in that plasmid too! And, while we often talk about sticking a “gene” for a protein into a plasmid, what we’re actually sticking in is an edited version of the gene called complementary DNA (cDNA), which has the regulatory info (introns) cut out. Sometimes the actual recipe instruction parts (exons) can be edited in different ways to give you different protein products too. In our cells these processes happen naturally through mRNA splicing and alternative splicing. But bacteria don’t splice – and even if they did, which version would they choose? So it’s up to us to tell them which to use!

Say I learn about a really cool protein & I want to study it from a biochemical/structural biological/biophysical viewpoint (cuz that’s the bumbling biochemist’s M.O.) What do I do? I’m going to need a LOT of it  & I’m going to need to get it super PURE. So 1st I need to get my hands on a copy of its “recipe” – like I was talking about above, the “original recipes” are called genes, and they’re written in DNA. A bunch of genes are hooked up back-to-back with some “spacer content” in “cookbook volumes” called chromosomes. 

Humans have 23 pairs of chromosomes: we get 1 copy of each from each parent. 22 of the pairs are mostly identical except for slight variations in the genes (allelic variation) that give us diversity. The 23rd pair (the “sex chromosomes”) comes in X & Y versions, which are more different: you get an X from your mother and either an X or a Y from your father.

Your whole collection of chromosomes is called your GENOME and in *our* cells (and the cells of other eukaryotes (basically most things that aren’t bacteria)) it’s housed in a membrane-bound compartment of the cell called the NUCLEUS (don’t confuse this with the atomic nucleus, which is the central hub of atoms where protons & neutrons hang out)

Instead of telling you to add vanilla & sugar, genetic recipes tell you to add valine (V) & serine (S) (as well as 18 other AMINO ACIDS that serve as protein “building blocks”) to make chains of amino acids that fold up into proteins. The recipes specify how much of each & in what order to add them & this “baking” process is called TRANSLATION
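Here’s a toy version of that “baking” step in Python. The recipe is read 3 RNA letters (one “codon”) at a time. The codon assignments below are real entries from the standard genetic code, but the table is deliberately partial (just methionine, valine, the 4 UC_ serine codons & the stop signals), and the example mRNA and function name are made up for illustration.

```python
# A small slice of the standard genetic code (RNA codon -> 1-letter amino acid).
CODON_TABLE = {
    "AUG": "M",                                      # methionine (usual start)
    "GUU": "V", "GUC": "V", "GUA": "V", "GUG": "V",  # valine
    "UCU": "S", "UCC": "S", "UCA": "S", "UCG": "S",  # serine
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop signals
}


def translate(mrna: str) -> str:
    """Read an mRNA 3 letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the chef is done
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGUUUCUUAA"))  # MVS
```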

The nucleus serves as a kind of “reference section” of the cellular library –  it has a ton of important info, but you can’t “check it out” from the nucleus. Instead, if you want to use it, you have to make a copy (a process called TRANSCRIPTION) & take it out of the nucleus into the “kitchen” of the CYTOPLASM (the main interior part of the cell) where the “chefs” (ribosomes) are

The copy machine (RNA polymerase) is in the nucleus & makes a copy of the gene in RNA (RiboNucleic Acid) instead of DNA. RNA’s really similar to DNA & holds the same “info” but it’s less stable so it’s kinda like making a copy w/a shorter-lasting ink. 1st, RNA pol copies the gene “word for word” to make pre-messenger RNA (pre-mRNA). But this pre-mRNA copy has more info than the chefs need

In addition to telling you what to add where (what the chefs need), genetic recipes contain “margin notes” (INTRONS) providing info about things like *when* to make copies & “suggested pairings” (if you’re making this you might also want to make…). These notes are REGULATORY information & they’re important, but they’re “upstream” of the chefs who are just following orders from upper management. The chefs don’t need this info, so it gets cut out of the recipe copy before it’s given to them. The process of RNA SPLICING cuts out the introns to turn pre-mRNA into mature mRNA
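As a rough sketch of what splicing does, you can picture the pre-mRNA as a list of labeled chunks and the mature mRNA as what’s left after dropping the intron chunks. The sequences and the list-of-tuples layout are invented for this example. Real spliceosomes find introns from sequence signals rather than labels, but the before/after is the same idea.

```python
# Hypothetical pre-mRNA, broken into labeled segments.
pre_mrna = [
    ("exon", "AUGGUU"),
    ("intron", "GUAAGUCCUAG"),  # made-up intron sequence
    ("exon", "UCUUAA"),
]


def splice(segments: list[tuple[str, str]]) -> str:
    """Drop the introns and join the exons, in order, into mature mRNA."""
    return "".join(seq for kind, seq in segments if kind == "exon")


unspliced = "".join(seq for _, seq in pre_mrna)
print(len(unspliced))    # 23
print(splice(pre_mrna))  # AUGGUUUCUUAA
```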

Those “margin notes” getting cut out are called INTRONS because they INTerrupt the EXpressed “add this” steps (called EXONS) & provide unique opportunities for making variations of the same basic recipe

Say you have a recipe for a 3-layer cake 🎂 with a layer of chocolate cake 🍫, then a layer of strawberry cake 🍓, topped off w/a layer of vanilla cake 🍨 (🍫🍓🍨🍰). If you split up the “make chocolate layer,” “make strawberry layer” & “make vanilla layer” steps you can cut 1 or 2 out of the recipe before handing it to the chef. So the same basic recipe can be altered to make a chocolate/strawberry cake 🍫🍓, a chocolate/vanilla cake 🍫🍨, a strawberry/vanilla cake 🍓🍨, a chocolate cake 🍫, a strawberry cake 🍓, or a vanilla cake 🍨 🤯
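The cake math can be counted out with a few lines of Python. This sketch just enumerates every way of keeping 1, 2, or all 3 layers in their original order, using `itertools.combinations` from the standard library. It’s the cake analogy only: real cells don’t necessarily make every possible combination of a gene’s exons.

```python
from itertools import combinations

layers = ["chocolate", "strawberry", "vanilla"]


def cake_variants(exons: list[str]) -> list[tuple[str, ...]]:
    """List every non-empty selection of layers, keeping the original order."""
    variants = []
    for size in range(len(exons), 0, -1):       # 3 layers, then 2, then 1
        variants.extend(combinations(exons, size))
    return variants


variants = cake_variants(layers)
for cake in variants:
    print("/".join(cake))
print(len(variants))  # 7 (the full cake plus the 6 variations)
```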

INTRONS allow you to mix n’ match to make different proteins from the same instructions. This decreases the amount of DNA we need which is good because (although it’s not as much as the lungfish) we already have a TON of it – so much so that we have to wind & wind & wind it up to get it to fit inside the nucleus (which offers an additional opportunity for regulation since you have to unwind the parts you want to use – “epigenetic” regulation often involves modifying (e.g. through methylation and/or acetylation) the histone proteins the DNA is wound around to make genes more or less available for copying)

But back to the intron/exon system – because it’s not done wowing us yet. In addition to space-saving, the intron/exon system opens up the potential for evolution

The ALTERNATIVE SPLICING we looked at above only changes the RNA copy – NOT the gene itself – so you don’t have to worry about messing up your original recipe, but your options are limited. BUT if a gene gets duplicated so you have 2 copies of the recipe in your cookbook, evolution can play around w/the 2nd copy & make permanent changes in the GENE itself without messing up the 1st. So you can do things like duplicate 1 of the exons (get a 2nd chocolate layer 🍫🍫🍓🍨) or even mix n’ match with other genes (maybe add a layer of frosting by adding on an exon from a different gene). We call this exon shuffling & it can lead to genes with new functions.
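In list form, those two kinds of permanent edits look something like this. It’s a toy sketch with function names I made up. The key point is that each function returns a NEW list, so the original recipe stays untouched (like the duplicated gene copy taking the changes).

```python
def duplicate_exon(gene: list[str], index: int) -> list[str]:
    """Return a new gene with the exon at `index` repeated right after itself."""
    return gene[: index + 1] + [gene[index]] + gene[index + 1:]


def add_exon(gene: list[str], exon: str, position: int) -> list[str]:
    """Return a new gene with an exon from elsewhere inserted at `position`."""
    return gene[:position] + [exon] + gene[position:]


original = ["chocolate", "strawberry", "vanilla"]
double_chocolate = duplicate_exon(original, 0)
frosted = add_exon(original, "frosting", len(original))

print(double_chocolate)  # ['chocolate', 'chocolate', 'strawberry', 'vanilla']
print(frosted)           # ['chocolate', 'strawberry', 'vanilla', 'frosting']
print(original)          # ['chocolate', 'strawberry', 'vanilla']
```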

It’s important to remember that all of those changes happen RANDOMLY – evolution doesn’t have a motive, but changes will “only” stick around if they’re useful (or at least not harmful on balance) – most genetic changes will be neutral or harmful but changes that are harmful will lead to lower survival – thus natural selection will “weed them out”

Speaking of evolution, our cells have evolved to have pre-mRNA editors called SPLICEOSOMES that do the splicing, but bacterial cells haven’t (they have the copiers (RNA polymerase) & chefs (ribosomes) but no editors (spliceosomes)). So if we want bacteria to make a protein for us, we need to edit the recipe ahead of time. Instead of putting in the “full recipe” (the genomic DNA (gDNA)) we need to give them the edited version (but in DNA form) – the COMPLEMENTARY DNA (cDNA), which is complementary to the spliced mRNA. You want the complementary strand because it still has to get copied to give you mRNA, and when you copy you get the complementary strand (see pics if this is confusing). So now, when the bacteria make an RNA copy, it’s already edited & ready to go
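If the “complementary to the complementary” part is confusing, this little round trip might help. It’s a sketch with made-up function names and a made-up mRNA: it makes the DNA strand that pairs with a spliced mRNA, then copies that DNA back into RNA to show you end up with the same message. (In practice the cDNA that goes into a plasmid is double-stranded; this only tracks the one strand that pairs with the mRNA.)

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}  # mRNA base -> paired DNA base
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA base -> paired RNA base


def reverse_transcribe(mrna: str) -> str:
    """Make the DNA strand complementary to an mRNA (written 5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def transcribe(template: str) -> str:
    """Copy a DNA template strand into its complementary RNA (written 5'->3')."""
    return "".join(DNA_TO_RNA[base] for base in reversed(template))


spliced_mrna = "AUGGUUUCUUAA"  # hypothetical, already-spliced message
cdna = reverse_transcribe(spliced_mrna)

print(cdna)                              # TTAAGAAACCAT
print(transcribe(cdna) == spliced_mrna)  # True
```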

Since we’re adding cDNA not gDNA, we only get 1 variation of the recipe (which is actually usually a good thing for us because it decreases the number of variables we have to worry about) & we can choose which version we want. Then we can stick that version of the recipe into bacterial cells to have them make more copies of the recipe and/or the protein product. 

📝Not all genes are recipes for proteins 👉 some are recipes for FUNCTIONAL RNAs 👉 types of RNA that are more than “just messengers” 👉 they can do things like bind to DNA or mRNA to regulate the DNA->RNA copying (transcription) or RNA->protein “baking” (translation)

