After 9 months, I finally got to see (and even hug!) my mom today! So I thought it’d be good to review DNA! How we develop as individuals is a combination of “nurture” (which she gave(s) immeasurably) and “nature” (which she gave about half of hers). This nature part refers to the genetic information we inherit from our biological parents – our genetic information is stored in pairs of long coiled-up DNA strands called chromosomes and we inherit one copy of each chromosome from each of our biological parents. These chromosomes are like cookbooks with “recipes” called genes for making proteins (and some functional RNAs) and slight differences in these recipes (nucleotide polymorphisms) can lead to small differences in the proteins we make (either the proteins themselves or when or how much of them we make). Proteins are the major workers & structure-makers in our cells; therefore, while it is far from deterministic, the “cookbook collection” of DNA we inherit (referred to as our genome) does provide the foundation on which nurture can act. So let’s take a closer look at the letters in these cookbooks!
DNA stands for DeoxyRiboNucleic Acid. “Nucleic acid” is a category that includes DNA and the closely related molecule RNA (RiboNucleic Acid) and the nucleic acid “alphabet” is made up of “letters” called nucleotides (nt) (technically deoxynucleotides (dNT) in the case of DNA). These letters have generic sugar-phosphate backbones for easy linking as well as unique nitrogenous bases which can form specific base pairs with each other (A to T (or U in RNA) and C to G). This way you can use one strand of double-stranded DNA as a template for making the other one (which, if you used that as a template, would give you back the 1st strand…). DNA & RNA are so closely related that molecules in our cells can use a sequence of DNA letters (deoxynucleotides) as a template for making the complementary sequence of RNA letters (nucleotides). This is important because your cells have to do that each time they want to make a protein, a type of molecule which is written in a different language – the language of amino acids.
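If it helps to see the template idea spelled out, here is a minimal Python sketch. The function name `complement_strand` and the example sequence are made up for illustration. It pairs each base with its partner and reverses the result, because the two strands of a double helix run in opposite directions. Run it twice and you get the 1st strand back.

```python
# Base-pairing rules for DNA: A pairs with T, C pairs with G.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the partner strand, written in the same reading direction."""
    # Pair each base, walking backwards because the strands are antiparallel.
    return "".join(DNA_PAIRS[base] for base in reversed(strand))


first = "ATGGCAAAG"  # made-up example sequence
second = complement_strand(first)
print(second)                              # CTTTGCCAT
print(complement_strand(second) == first)  # True
```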
Proteins are long folded-up chains of molecular letters called amino acids. There are 20 different (common) ones, each with unique properties, and they can link up in different orders to give you different proteins. But in order for the right amino acids to be added in the right order to give you the right protein, you need to have a really clear recipe. Nucleic acids are really good for this. There are only 4 DNA letters & 4 RNA letters, but by using 3-letter “RNA words” called codons to spell 1 amino acid letter, you can write out the recipe.
So, for example,
the RNA sequence: AUGGCAAAGGAACCCGAAGCUUGCGAA
spells the protein sequence: Methionine(M)-Alanine(A)-Lysine(K)-Glutamate(E)-Proline(P)-Glutamate(E)-Alanine(A)-Cysteine(C)-Glutamate(E)
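You can check that spelling yourself with a short Python sketch. The names `CODON_TABLE` and `translate` are just illustrative choices. The table is the standard genetic code packed into one string, and the function assumes the sequence starts right at the first codon. It reads 3 letters at a time and stops at a stop codon.

```python
BASES = "UCAG"
# Standard genetic code as one-letter amino acid codes, listed in UCAG order
# for the 1st, 2nd, and 3rd codon letter. "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build all 64 codon -> amino acid entries from the packed string.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate an mRNA sequence codon by codon, stopping at a stop codon."""
    protein = []
    # Step through in 3-letter words, ignoring any incomplete codon at the end.
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCAAAGGAACCCGAAGCUUGCGAA"))  # MAKEPEACE
```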
But, back up a sec, why are we talking about RNA? Didn’t I say genes are in DNA?! Yes – BUT – RNA copies of the genes act as go-betweens!
Your genome is written in DNA. You have one copy of it in each of your cells. And it’s precious – protecting it is one of your body’s main priorities, so your cells restrict access to it, by locking it up in a membrane-bound compartment in your cells called the nucleus. The only time it gets out (in healthy cells) is when cells divide – the genome gets copied each time a cell divides so you never lose it.
So it’s locked up most of the time, but you also have to be able to access it or else it’s pretty useless… Enter the messenger RNA!
You can think of the nucleus as being sort of like the reference section of a library (where they keep encyclopedias and maps and stuff). When you want to make a protein, you don’t work with the original recipe; instead you “Xerox” it and give the “chefs” (protein-making machinery called the ribosome) copies to work with. The original recipe (the one that is kept in the “reference section”) is written in DNA & the “Xeroxes” are made in RNA.
In more technical terms, if you need to make a protein, you have to find the recipe for it. Then you have to TRANSCRIBE a copy of it into the related RNA language. Then you can take this MESSENGER RNA (mRNA) copy out of the nucleus & into the CYTOPLASM (general cellular interior) where RIBOSOMES (protein/RNA molecular machines) TRANSLATE the mRNA into the PROTEIN language of AMINO ACIDS. As we saw above, 3 RNA letter combos (codons) spell out 1 protein letter – ribosomes travel along the mRNA in 3-letter steps, pausing at each codon and waiting for a different type of RNA, transfer RNA (tRNA) to bring it the corresponding amino acid. http://bit.ly/proteintranslation
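The TRANSCRIBE step can be sketched in a few lines of Python. This is a simplified illustration, not how the cell's machinery is organized. The function name `transcribe` is made up, and the template strand here was written out by hand to match the example message from earlier. Each DNA letter is paired with its RNA partner (U instead of T), reading the template backwards since the new strand runs in the opposite direction.

```python
# DNA template letter -> complementary RNA letter (RNA uses U in place of T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_strand: str) -> str:
    """Build the mRNA that is complementary to a DNA template strand."""
    # The RNA is made antiparallel to the template, so walk it in reverse.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_strand))


template = "TTCGCAAGCTTCGGGTTCCTTTGCCAT"  # hand-written template for the example
print(transcribe(template))  # AUGGCAAAGGAACCCGAAGCUUGCGAA
```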
Because there’s a larger number of amino acids, with a broader range of properties than you get with nucleotides, you can use the “simpler” nucleic acid language to give you a huge diversity of proteins. http://bit.ly/aminoacidalphabet
But just because nucleic acids are “simpler” doesn’t mean they aren’t amazing in their own right. So here are a few things about them to highlight!
First, some terminology. The combination of ALL the DNA in one of your cells is called your GENOME. You can think of it like a super long cookbook where the recipes for making proteins, functional RNAs, etc. are regions of DNA called GENES. The genome’s split up into “volumes” called CHROMOSOMES & you have 46 of them. 46? Then why’s it called “23 and me?” 23 *pairs* – you have 2 copies of each volume – 1 you inherited from each of your biological parents.
The copies are mostly the same (except for the pair of sex chromosomes which can be different) – they have recipes for making the same things BUT small differences in the recipes (POLYMORPHISMS) lead to slight differences in the final products &/or when they’re made. This variety is good (polymorphisms are the spice of genetic life) because the slightly different versions might be slightly better in certain situations. BUT you might also have “typos” (harmful mutations) in 1 copy of a recipe, but thankfully you have a backup! (🤞 your backup’s ok)
Just like proteins have layers of structure (primary structure’s sequence of amino acids, secondary structure’s things like helices & strands made through backbone-backbone interactions, tertiary involves side chains, & quaternary involves more than one polypeptide chain), nucleic acids have structure too. The sequence of nucleotide “letters” in a nucleic acid (DNA or RNA) defines its PRIMARY STRUCTURE & base-pairing between those nucleotides gives it SECONDARY STRUCTURE.
And just like with proteins, the primary structure is of primary importance, so let’s take a closer look at these nucleic acid letters!
If you hear the word “polymer” you might think of “plastic” – “plastic” is actually a pretty generic term used to encompass lots of types of synthetic or semi-synthetic organic (carbon skeleton-based) polymers. And we try to avoid getting those in our body – but there are also polymers that *do* belong in our body – in fact polymers produce people! Because polymer (poly (many) + meros (parts)) is just a word for chains of similar units called monomers (mono (single) + meros (part)).
So just like the polyethylene in plastic bottles is made up of lots and lots of stuck together “ethylene” (aka ethene) monomers (CH₂=CH₂) and the polypropylene in your microwave container gets its strength from lots and lots of individual propylene (CH₂=CHCH₃) monomers teaming up, our bodies: store energy in the form of polymers called glycogen, which has glucose (blood sugar) monomers; carry out work with the help of polymers of amino acids (yup, proteins are biological macro-polymers); and encode genetic information in polymers called nucleic acids, which are “just” polymers made up of nucleotide monomers, which I want to talk some more about today.
These nucleotide monomers are themselves made up of key components. NUCLEOTIDES contain
🔹 a NITROGENOUS (nitrogen-containing) BASE (aka NUCLEOBASE)
🔹 a PENTOSE (5-carbon SUGAR) – either ribose (in RNA) or deoxyribose (in DNA)
🔹 1 or more PHOSPHATES (negatively-charged)
building nucleic acids up from their parts…
BASE + SUGAR = nucleoSIDE
BASE + SUGAR + PHOSPHATE = nucleoTIDE
NUCLEOTIDE + NUCLEOTIDE + NUCLEOTIDE . . . = NUCLEIC ACID
There are 5 common nitrogenous bases (nucleobases, often just called “bases”). 3 are found in *both* RNA & DNA (but attached to different sugars) – and these 3 shared ones are adenine (A), guanine (G), & cytosine (C). Then you have thymine (T), which is only found in DNA & uracil (U), which is only found in RNA. They can be classified into “purines” (A & G) or pyrimidines (C, T, U). To help you remember which is which, you can use the mnemonic PURe As Gold to recall that the purines are adenine and guanine (so the other 3 have to be pyrimidines).
What’s the diff? Purines have 2 rings (1 6-membered, 1 5-membered) (I remember this by thinking of purines as “pure” – having it all) as opposed to pyrimidines which only have 1 ring (6-membered). As we’ll see later, purines bind to pyrimidines and vice versa so each rung of the DNA double-helix ladder has 3 rings and our ladder has constant width (if that didn’t make sense, stick with me and I will detail more below).
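The ring arithmetic is easy to check in a tiny Python sketch (the names `RING_COUNT` and `rings_in_pair` are made up for illustration). A purine paired with a pyrimidine always totals 3 rings, while 2 purines would be too wide and 2 pyrimidines too narrow.

```python
# Number of rings in each base: purines have 2, pyrimidines have 1.
RING_COUNT = {"A": 2, "G": 2, "C": 1, "T": 1, "U": 1}


def rings_in_pair(base_1: str, base_2: str) -> int:
    """Total rings across one rung of the ladder."""
    return RING_COUNT[base_1] + RING_COUNT[base_2]


print(rings_in_pair("A", "T"))  # 3 (purine + pyrimidine)
print(rings_in_pair("G", "C"))  # 3 (purine + pyrimidine)
print(rings_in_pair("A", "G"))  # 4 (2 purines: too wide)
print(rings_in_pair("C", "T"))  # 2 (2 pyrimidines: too narrow)
```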
But before talking about interactions *between* nucleotides, we need to finish making our nucleotides. We need to hook up those bases to their generic backbone. When those bases hook up to a Sugar (through a glycosidic bond) they become nucleoSides. We call them ribonucleosides if they bind to ribose & deoxyribonucleosides if they bind to deoxyribose (like ribose but w/1 less oxygen).
The “weddings” of nucleobase to sugar also come with name changes
🔹 purines -> end in “-sine” (e.g. adenine becomes adenosine)
🔹 pyrimiDines -> end in “-Dine” (e.g. cytosine becomes cytidine)
Almost there… we’ve gone from nucleobase + sugar to nucleoSide and now it’s time to make our monomer, the nucleoTide. And we do this by hooking them up with phosphate groups.
🔹 +1 phosphate gives you a nucleoside MONOphosphate (NMP (if the nucleoside has ribose as the sugar) or dNMP (if the sugar’s deoxyribose)) (e.g. AMP)
🔹 +2 phosphates gives you a nucleoside Diphosphate (NDP or dNDP) (e.g. ADP)
🔹 +3 phosphates gives you a nucleoside TRIphosphate (NTP or dNTP) (e.g. ATP – the energy money we keep talking about)
It may look confusing that the nucleoTides have nucleoSide in their name as written above, but that’s because it’s written as its parts – nucleoSide + phosphate (nucleoTide has the phosphate so if we wrote nucleoTide monophosphate, etc. that would be “redundant”).
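The naming scheme is regular enough to write as a little Python sketch. The function `nucleotide_name` and its parameters are made up for illustration. It just glues together the pieces described above: an optional “d” for deoxyribose, the base letter, and MP/DP/TP for the number of phosphates.

```python
# Nucleoside names after the base "weds" a sugar.
NUCLEOSIDE_NAMES = {
    "A": "adenosine", "G": "guanosine",  # purines
    "C": "cytidine", "U": "uridine", "T": "thymidine",  # pyrimidines
}
PHOSPHATE_PREFIX = {1: "mono", 2: "di", 3: "tri"}


def nucleotide_name(base: str, phosphates: int, deoxy: bool = False):
    """Return (abbreviation, full name) for a nucleoside mono/di/triphosphate."""
    prefix = PHOSPHATE_PREFIX[phosphates]
    abbreviation = ("d" if deoxy else "") + base + prefix[0].upper() + "P"
    # Note: "thymidine" conventionally already implies deoxyribose;
    # this sketch does not special-case that.
    full_name = ("deoxy" if deoxy else "") + NUCLEOSIDE_NAMES[base] + " " + prefix + "phosphate"
    return abbreviation, full_name


print(nucleotide_name("A", 3))              # ('ATP', 'adenosine triphosphate')
print(nucleotide_name("A", 1, deoxy=True))  # ('dAMP', 'deoxyadenosine monophosphate')
```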
Now we have our monomers (nucleotides) – time to stick them together to form a polymer (nucleic acids). Nucleotides join together through phosphodiester bonds, which are strong covalent (electron-sharing) linkages that leave a backbone of nucleic acids consisting of alternating phosphate & pentose residues w/nitrogenous bases as “side groups” joined to the backbone at regular intervals. Nucleotides come to be added (by enzymes called DNA or RNA polymerases) in their triphosphate form, but 2 phosphates are lost as energy payment, so you end up with a single phosphate in between each letter in the chain.
Even though you only have 1 phosphate per letter, you have lots of letters, so you have to accommodate a lot of phosphates, which are negatively-charged. But how best to accommodate it all? The sugar-phosphate backbone is charged & HYDROPHILIC (water-loving) but nucleobases are HYDROPHOBIC (water-excluded). So nucleic acids try to form secondary structures that minimize exposure of bases to water & maximize exposure of the backbone to water http://bit.ly/hydrophobesarenotafraid
DNA usually does this through forming double-stranded DNA (dsDNA) consisting of a DOUBLE HELIX made up of 2 DNA strands running in opposite directions (ANTIPARALLEL). In this helix, bases pair in the center like rungs in a ladder – the bases are planar (flat), so their rings “stack” like pancakes & help “glue” the strands together. They don’t have charges to offer, but, kinda like how if you alternate the pages of 2 phone books you can get them stuck together really strongly, these BASE STACKING interactions add up to keep your DNA zipped up.
The width of the helix is constant because 1 PURINE (2-ringed base) & one PYRIMIDINE (1-ringed base) always join together: C to G & A to T. They pair in this way because they have complementary binding opportunities on their “edges.” Unlike the strong phosphodiester bonds linking letters within a chain through their backbone, which are covalent bonds (involve neighboring atoms electron-sharing), the bonds connecting the bases in different strands are weaker hydrogen bonds (H-bonds) which are a special type of partial charge attractions.
In a covalent bond, neighboring atoms share a pair of electrons (e⁻). But sometimes 1 of the atoms holds on to the e⁻ more tightly (it’s more ELECTRONEGATIVE) so it’s partly negative & the other atom is partly positive. We call this a POLAR covalent bond. Opposites attract, so the negative part seeks out something positive & the positive part seeks out something negative.
“Hydrogen bond” (H-bond) is just a special name we give to these attractions when the partly positive part’s a hydrogen attached to something electronegative (e⁻ hogging) like nitrogen or oxygen and the partly negative part is an electronegative atom (like N or O) with a lone pair of electrons (electrons not being used for pairing). http://bit.ly/waterawesomeness
And we can find lots of these H-bond donors & H-bond acceptors in the nucleobase part of nucleotides (the purine or pyrimidine). And your cells “know” how to copy DNA based on which bases have matching donor & acceptor sites. A & T (or U) only form 2 H-bonds w/each other BUT G & C can form 3 H-bonds when they “Velcro together” so G-C pairs are stronger.
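To see how those 2s and 3s add up, here is a minimal Python sketch (the function name `count_h_bonds` and the example sequences are made up). It counts only the base-pair H-bonds for a strand paired with its perfect complement and ignores everything else that holds the helix together, like base stacking.

```python
# H-bonds each base forms with its partner: A-T pairs have 2, G-C pairs have 3.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def count_h_bonds(strand: str) -> int:
    """Total base-pair H-bonds when the strand is fully paired with its complement."""
    return sum(H_BONDS[base] for base in strand)


print(count_h_bonds("ATATAT"))  # 12
print(count_h_bonds("GCGCGC"))  # 18
```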
Back to the helix – See a DNA double helix, put your right thumb up. If the twist follows your fingers your graphic designer is in luck! The first thing I do when I see a graphic of DNA is give it a thumbs up. Not *just* because I’m happy to see it (though it is *groovy*) but because I’m checking to make sure it’s RIGHT-HANDED! In order to complement the base pairing we just discussed, the strands adopt a right-handed helical form (to see that it’s right-handed, give a thumbs up w/your right hand – the direction your fingers are curling is the direction DNA curls).
The most common double helix in DNA is “B-form.” It has 2 grooves – a larger major groove & a narrower minor groove. These grooves can act like windows for other molecules to “read” the sequence without having to unzip the strands. RNA can also form double helixes (though usually a more compact “A-form”) & it also forms a variety of structures w/in single strands.
In the MAJOR groove you can distinguish all 4 base pairs, but in the MINOR groove you can only tell GC or CG vs AT or TA. I still remember the mnemonic I came up with when taking molecular bio in undergrad… AHA! ADAM ATe A TAMADA! (see pic)
Thanks to DNA’s grooviness, you can read the DNA without disrupting the helix, but if you want to make a copy, you have to open up that region of the helix (MELT it) (don’t worry – it will ANNEAL back together afterwards). It’s easier to pull the strands apart in regions that have lots of A-T pairs because each of those base pairs only has 2 hydrogen bonds versus the 3 of G-C pairs, so regions that have to be opened often are often A-T rich. In cells, proteins called HELICASES help with the separation, but in the lab we can *literally* melt the strands by heating them up – this is what we do in PCR http://bit.ly/pcrtrain
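As a rough illustration of “A-T rich means easier to open,” here is a Python sketch that slides a window along a sequence and reports the stretch with the highest fraction of A & T. The function names, the window width, and the example sequence are all made up, and real melting behavior depends on more than base composition, so treat it as a toy.

```python
def at_fraction(seq: str) -> float:
    """Fraction of letters in seq that are A or T."""
    return (seq.count("A") + seq.count("T")) / len(seq)


def most_at_rich_window(seq: str, width: int):
    """Return (start index, window) for the most A-T rich stretch of a given width."""
    starts = range(len(seq) - width + 1)
    # max() keeps the first window it sees if several tie for the top score.
    best = max(starts, key=lambda i: at_fraction(seq[i:i + width]))
    return best, seq[best:best + width]


sequence = "GCGCGGTATAAATGCGCCG"  # made-up example
print(most_at_rich_window(sequence, 6))  # (6, 'TATAAA')
```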
But the helix is not where the structure stops! – your DNA packs up tight w/help of HISTONES because we have so much DNA it’d never fit in our cells if it weren’t all coiled up (if you stretched it out it would be ~2m (~6.5ft) long). To help it coil & stay coiled ➿ it wraps around “hair rollers” (the histone proteins) to form NUCLEOSOMES, which are like beads on a string ➰
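Where does that ~2m come from? Here is the back-of-the-envelope math as a Python sketch. The numbers are approximations I’m assuming for illustration: about 3.2 billion base pairs per copy of the human genome, 2 copies per cell, and about 0.34 nanometers of helix length per base pair in B-form DNA.

```python
BASE_PAIRS_PER_GENOME_COPY = 3.2e9  # approximate size of 1 copy of the human genome
COPIES_PER_CELL = 2                 # 1 copy from each biological parent
RISE_PER_BASE_PAIR_NM = 0.34        # approximate B-form helix length per base pair


def dna_length_m(base_pairs: float, rise_nm: float = RISE_PER_BASE_PAIR_NM) -> float:
    """Stretched-out length in meters for a given number of base pairs."""
    return base_pairs * rise_nm * 1e-9  # convert nanometers to meters


total_length_m = dna_length_m(BASE_PAIRS_PER_GENOME_COPY * COPIES_PER_CELL)
print(round(total_length_m, 1))  # 2.2
```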
This saves space (& prevents things you don’t want read from being read), but when you *do* want a region read &/or transcribed, that region must be “opened up.” It’s kinda like when you open a mobile version of a website & it has all the different sections collapsed to save room ▶️ & you have to click on them to expand them if you want to actually read them 🔽
Much of EPIGENETICS involves special proteins adding modifications to the DNA or its “curlers” that help the sections expand if you want to read them 🔽 & collapse if you don’t want them read 🔼. These modifications are often put on histones by post-translationally modifying lysine amino acids in the tails of the histone proteins, as we talked about here: http://bit.ly/lysineanalysis
How do you know what’s worth reading when? Other proteins called transcription factors are able to read “section headers” – they recognize specific DNA sequence motifs (like words) that are present in front of functionally-related genes. So, instead of reading gene-specific headers, it’s more like reading “key words” or indexing terms: a DNA-binding protein can search for 1 search term & get “hits” on multiple regions or genes which it can act on “simultaneously” – this allows for coordinated activation or deactivation of related genes.
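The “search term” idea maps nicely onto a string search, so here is a toy Python sketch. The gene names, the “front of gene” sequences, and the motif are all made up for illustration. Real transcription factors tolerate some variation in their motifs and can bind either strand, which this exact-match version ignores.

```python
def find_motif(seq: str, motif: str) -> list:
    """Return every start position where motif appears in seq (overlaps allowed)."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


# Hypothetical regions in front of 3 hypothetical genes.
upstream_regions = {
    "geneA": "GGCTGACGTCATTC",
    "geneB": "ATTTGACGTCAGGC",
    "geneC": "GGGCCCATATGGCC",
}
motif = "TGACGTCA"  # example "key word"

# 1 search term, hits on multiple genes: these are the ones regulated together.
targets = {}
for gene, region in upstream_regions.items():
    hits = find_motif(region, motif)
    if hits:
        targets[gene] = hits

print(targets)  # {'geneA': [3], 'geneB': [3]}
```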