The last few weeks have been all about amino acids (protein letters) (and a brief segue into candle combustion chemistry…). But for the right amino acids to be added in the right order to give you the right protein, you need a really clear recipe. You need to be able to copy those recipes each time a cell divides so you never lose them, and it can’t be like a game of telephone, where the message changes slightly each time it’s passed along. Instead, these recipes have to be faithfully replicated and protected between copy-making. So you keep them locked up safe, like encyclopedias in the reference section of a library, and when you want to make a protein, you don’t work with the original recipe. Instead, you “xerox” it and give the “chefs” (protein-making machinery called ribosomes) copies to work with.
How to achieve all of this? NUCLEIC ACIDS! “Nucleic acid” is a category that includes RNA (RiboNucleic Acid) and DNA (DeoxyRiboNucleic Acid). The original recipe (the one that is kept in the “reference section”) is written in DNA & the “Xeroxes” are made in RNA. The nucleic acid “alphabet” is made up of “letters” called nucleotides, similarly to how the protein “alphabet” is made up of letters called amino acids – but the nucleic acid alphabet is a lot smaller – just 4 DNA letters & 4 RNA letters – as opposed to the 20 (common) amino acids.
The letters of both alphabets have generic backbones for easy linking (a sugar-phosphate backbone for nucleic acids and an amino group + alpha carbon + carboxylic acid backbone for amino acids), but they also have unique parts that give them different properties. For nucleotides, the unique parts are nitrogenous bases, and they can form specific base pairs with each other: A with T (or U in RNA) and C with G. This way you can use one strand of double-stranded DNA as a template for making the other one (and if you used that new strand as a template, you’d get back the 1st strand…).
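Because the pairing rule is really just a lookup table, it’s easy to play with in a few lines of Python. This is a minimal sketch, assuming a plain string of DNA letters and ignoring which direction each strand runs; `complement` and `DNA_PAIRS` are names made up for this example.

```python
# A minimal sketch: the base-pairing rules (A-T, C-G) as a lookup table.
# Assumes the input is a string of DNA letters; strand direction is ignored here.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base pairing partner of each letter in a DNA strand."""
    return "".join(DNA_PAIRS[base] for base in strand.upper())


first_strand = "ATGCGT"
new_strand = complement(first_strand)
print(new_strand)              # TACGCA
print(complement(new_strand))  # ATGCGT: templating the copy gives back the 1st strand
```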
For amino acids, the unique parts are the “side chains” or “R groups,” which range from small and flexible to large and bulky, and from negative to positive to neutral. These different properties help proteins fold up into intricate shapes and carry out various functions. But to get those functional proteins, you have to make sure you link up the right letters, and this is where DNA comes in.
The instructions for what order to link the amino acids in are written in DNA as genes. Because those instructions are so important, this DNA is kept “locked up” safe and sound in a membrane-bound “room” in your cells called the nucleus. It’s kinda like a “reference section” of a library – you can’t check out books, but you can read them & even make copies. If you need to make a protein, you have to find the recipe for it. Then you have to TRANSCRIBE a copy of it into the related RNA language. Then you can take this MESSENGER RNA (mRNA) copy out of the nucleus & into the CYTOPLASM (general cellular interior) where RIBOSOMES (protein/RNA molecular machines) TRANSLATE the mRNA into the PROTEIN language of AMINO ACIDS.
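The TRANSCRIBE step can be sketched the same way: read the DNA template and write down the RNA letter that pairs with each base, with U taking T’s place. This is a simplified sketch, assuming the template strand is written 3'→5' so the mRNA comes out in the usual 5'→3' reading order; `transcribe` is a hypothetical helper, not how any real tool names it.

```python
# A minimal sketch of TRANSCRIPTION: for each base on the DNA template strand,
# write the RNA base that pairs with it. In RNA, U replaces T, so a template A
# pairs with U. Assumes the template is written 3'->5', so the mRNA comes out
# in the conventional 5'->3' reading order.
DNA_TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') complementary to a DNA template strand (3'->5')."""
    return "".join(DNA_TEMPLATE_TO_RNA[base] for base in template_3_to_5.upper())


print(transcribe("TACGGT"))  # AUGCCA
```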
The combination of ALL the DNA in one of your cells is called your GENOME. You can think of it like a super long cookbook where the recipes for making proteins, functional RNAs, etc. are regions of DNA called GENES. The genome’s split up into “volumes” called CHROMOSOMES & you have 46 of them. 46? Then why’s it called “23 and me?” 23 *pairs* – you have 2 copies of each volume – 1 you inherited from each of your biological parents.
The copies are mostly the same (except for the pair of sex chromosomes, which can be different): they have recipes for making the same things, BUT small differences in the recipes (POLYMORPHISMS) lead to slight differences in the final products &/or when they’re made. This variety is good (polymorphisms are the spice of genetic life) because the slightly different versions might be slightly better in certain situations. You might also have “typos” (harmful mutations) in 1 copy of a recipe, but thankfully you have a backup! (🤞 your backup’s ok)
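To make the “two copies of each recipe” idea concrete, here’s a toy sketch that lines up two inherited copies of the same stretch of DNA and reports where they differ. The sequences are hypothetical, and the sketch assumes both copies are the same length (real variants can also be insertions or deletions, which this ignores).

```python
# A toy sketch of spotting POLYMORPHISMS: compare the two inherited copies of
# the same (hypothetical) stretch of DNA, letter by letter, and report where
# they differ. Only handles equal-length copies (no insertions/deletions).
def differing_positions(copy_a: str, copy_b: str) -> list[int]:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(copy_a) != len(copy_b):
        raise ValueError("this sketch only compares equal-length copies")
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]


from_parent_1 = "ATGGCTACGTTA"  # hypothetical sequence
from_parent_2 = "ATGGCTATGTTA"  # same stretch with one letter changed
print(differing_positions(from_parent_1, from_parent_2))  # [7]
```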
Just like proteins have layers of structure (primary structure is the sequence of amino acids, secondary structure is things like helices & strands made through backbone-backbone interactions, tertiary involves side chains, & quaternary involves more than one polypeptide chain), nucleic acids have structure too. The sequence of nucleotide “letters” in a nucleic acid (DNA or RNA) defines its PRIMARY STRUCTURE, & base-pairing between those nucleotides gives it SECONDARY STRUCTURE.
And just like with proteins, the primary structure is of primary importance, so let’s take a closer look at these nucleic acid letters!
If you hear the word “polymer” you might think of “plastic.” But “plastic” is actually a pretty generic term that encompasses lots of types of synthetic or semi-synthetic organic (carbon skeleton-based) polymers. We try to avoid getting those in our bodies, but there are also polymers that *do* belong in our bodies; in fact, polymers produce people! That’s because polymer (poly (many) + meros (part)) is just a word for a chain of similar units called monomers (mono (single) + meros (part)).
So just like the polyethylene in plastic bottles is made up of lots and lots of stuck together “ethylene” (aka ethene) monomers (CH₂=CH₂) and the polypropylene in your microwave container gets its strength from lots and lots of individual propylene (CH₂=CHCH₃) monomers teaming up, our bodies: store energy in the form of polymers called glycogen, which has glucose (blood sugar) monomers; carry out work with the help of polymers of amino acids (yup, proteins are biological macro-polymers); and encode genetic information in polymers called nucleic acids, which are “just” polymers made up of nucleotide monomers, which I want to talk some more about today.
These nucleotide monomers are themselves made up of key components. NUCLEOTIDES contain
- a NITROGENOUS (nitrogen-containing) BASE (aka NUCLEOBASE)
- a PENTOSE (5-carbon SUGAR) – either ribose (in RNA) or deoxyribose (in DNA)
- 1 or more PHOSPHATES (negatively-charged)
building nucleic acids up from their parts…
BASE + SUGAR = nucleoSIDE
BASE + SUGAR + PHOSPHATE = nucleoTIDE
NUCLEOTIDE + NUCLEOTIDE + NUCLEOTIDE . . . = NUCLEIC ACID
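The build-up above maps neatly onto nested data types. This is an illustrative sketch only; the class names and fields are invented for the example and just mirror “base + sugar = nucleoside,” “+ phosphate = nucleotide,” “chain of nucleotides = nucleic acid.”

```python
from dataclasses import dataclass


# An illustrative sketch of building nucleic acids up from their parts.
# Class names and fields are hypothetical, chosen to mirror the text.
@dataclass(frozen=True)
class Nucleoside:
    base: str   # e.g. "adenine"
    sugar: str  # "ribose" (RNA) or "deoxyribose" (DNA)


@dataclass(frozen=True)
class Nucleotide:
    nucleoside: Nucleoside
    phosphates: int  # 1 or more


def nucleic_acid(*nucleotides: Nucleotide) -> list[Nucleotide]:
    """Model a nucleic acid as an ordered chain of nucleotides."""
    return list(nucleotides)


adenosine = Nucleoside(base="adenine", sugar="ribose")  # base + sugar
amp = Nucleotide(adenosine, phosphates=1)                # + phosphate
ump = Nucleotide(Nucleoside("uracil", "ribose"), phosphates=1)
chain = nucleic_acid(amp, amp, ump)                      # nucleotide + nucleotide + ...
print(len(chain))  # 3
```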
There are 5 common nitrogenous bases (nucleobases, often just called “bases”). 3 are found in *both* RNA & DNA (but attached to different sugars) – and these 3 shared ones are adenine (A), guanine (G), & cytosine (C). Then you have thymine (T), which is only found in DNA & uracil (U), which is only found in RNA.
We saw a *ton* of different ways to classify amino acids because their unique parts were all so unique, but there are only two main classifications we give to nucleobases: they can be a purine (A & G) or a pyrimidine (C, T, U). To help you remember which is which, you can use the mnemonic PURe As Gold to recall that the purines are adenine and guanine (so the other 3 have to be pyrimidines).
What’s the diff? Purines have 2 fused rings (one 6-membered & one 5-membered) (I remember this by thinking of purines as “pure,” having it all), as opposed to pyrimidines, which only have 1 ring (6-membered).
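The two-way classification is simple enough to write as a tiny function. A minimal sketch, assuming only the 5 common bases (written as their letters); `classify` is a made-up name.

```python
# A minimal sketch of the purine/pyrimidine split using "PURe As Gold":
# A and G are purines (2 rings); C, T and U are pyrimidines (1 ring).
PURINES = {"A", "G"}
COMMON_BASES = {"A", "C", "G", "T", "U"}


def classify(base: str) -> tuple[str, int]:
    """Return (class, number of rings) for one of the 5 common bases."""
    base = base.upper()
    if base not in COMMON_BASES:
        raise ValueError(f"not one of the 5 common bases: {base!r}")
    if base in PURINES:
        return "purine", 2    # a 6-membered ring fused to a 5-membered ring
    return "pyrimidine", 1    # a single 6-membered ring


for b in "AGCTU":
    print(b, *classify(b))
```

Running the loop prints `purine 2` for A and G and `pyrimidine 1` for C, T and U.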
When those bases hook up to a Sugar (through a glycosidic bond) they become nucleoSides. We call them ribonucleosides if they bind to ribose & deoxyribonucleosides if they bind to deoxyribose (like ribose but w/1 less oxygen).
The “weddings” of nucleobase to sugar also come with name changes
🔹 purines -> end in “-sine” (e.g. adenine becomes adenosine)
🔹 pyrimiDines -> end in “-Dine” (e.g. cytosine becomes cytidine)
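The renaming rule is really a fixed lookup, since there are only 5 common bases. A minimal sketch, assuming we just stick “deoxy” on the front when the sugar is deoxyribose (in everyday use, “thymidine” alone usually already implies the deoxy form, since thymine lives in DNA); `nucleoside_name` is hypothetical.

```python
# A minimal sketch of the base -> nucleoside name changes.
# Purine names end in "-sine"; pyrimidine names end in "-dine".
NUCLEOSIDE_NAMES = {
    "adenine": "adenosine",   # purine
    "guanine": "guanosine",   # purine
    "cytosine": "cytidine",   # pyrimidine
    "thymine": "thymidine",   # pyrimidine
    "uracil": "uridine",      # pyrimidine
}


def nucleoside_name(base: str, sugar: str = "ribose") -> str:
    """Name the nucleoside formed when `base` attaches to `sugar`."""
    name = NUCLEOSIDE_NAMES[base.lower()]
    return "deoxy" + name if sugar == "deoxyribose" else name


print(nucleoside_name("adenine"))                  # adenosine
print(nucleoside_name("cytosine", "deoxyribose"))  # deoxycytidine
```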
Almost there… we’ve gone from nucleobase + sugar to nucleoSide and now it’s time to make our monomer, the nucleoTide. And we do this by hooking them up with phosphate groups.
🔹 +1 phosphate gives you a nucleoside MONOphosphate (NMP (if the nucleoside has ribose as the sugar) or dNMP (if the sugar’s deoxyribose)) (e.g. AMP)
🔹+2 phosphates gives you a nucleoside Diphosphate (NDP or dNDP) (e.g. ADP)
🔹+3 phosphates gives you a nucleoside TRIphosphate (NTP or dNTP) (e.g. ATP – the energy money we keep talking about)
It may look confusing that the nucleoTides have nucleoSide in their names as written above, but that’s because each name is built from its parts: nucleoSide + phosphate(s). A nucleoTide already includes the phosphate, so writing “nucleoTide monophosphate,” etc. would be redundant.
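The shorthand follows a regular pattern (optional “d” for deoxy, the base letter, M/D/T for the phosphate count, then P), so it can be generated. A minimal sketch; `abbreviation` is a hypothetical helper and only covers 1 to 3 phosphates.

```python
# A minimal sketch of the NMP / NDP / NTP shorthand, with a leading "d"
# when the sugar is deoxyribose. Only 1-3 phosphates are covered.
PHOSPHATE_CODES = {1: "M", 2: "D", 3: "T"}  # mono-, di-, tri-


def abbreviation(base_letter: str, phosphates: int, deoxy: bool = False) -> str:
    """Build e.g. 'ATP' or 'dGMP' from a base letter and a phosphate count."""
    prefix = "d" if deoxy else ""
    return f"{prefix}{base_letter.upper()}{PHOSPHATE_CODES[phosphates]}P"


print(abbreviation("A", 3))              # ATP
print(abbreviation("G", 1, deoxy=True))  # dGMP
```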
Now we have our monomers (nucleotides), so it’s time to stick them together to form a polymer (nucleic acid). Nucleotides join together through phosphodiester bonds, strong covalent (electron-sharing) linkages that give nucleic acids a backbone of alternating phosphate & pentose residues, with the nitrogenous bases joined to the backbone at regular intervals as “side groups.” Nucleotides are added (by enzymes called DNA or RNA polymerases) in their triphosphate form, but 2 phosphates are released as energy payment, so you end up with a single phosphate between each pair of neighboring letters in the chain.
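The phosphate “accounting” in that last sentence can be checked with simple arithmetic. A bookkeeping sketch only, assuming every added letter arrives as a triphosphate and ignoring the special chemistry at the very ends of a chain; `polymerize` is a made-up name.

```python
# A bookkeeping sketch of chain growth: each incoming nucleotide arrives as a
# triphosphate, 2 phosphates are released as "payment", and 1 stays in the
# backbone. Chain ends are ignored.
def polymerize(n_incoming: int) -> dict[str, int]:
    """Count phosphates for n nucleotides added (as NTPs or dNTPs)."""
    brought_in = 3 * n_incoming
    released = 2 * n_incoming
    kept_in_backbone = brought_in - released
    return {"brought_in": brought_in, "released": released,
            "kept_in_backbone": kept_in_backbone}


print(polymerize(10))  # {'brought_in': 30, 'released': 20, 'kept_in_backbone': 10}
```

The last number is the point: 1 backbone phosphate per letter, which is why a long chain carries so much negative charge.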
Even though you only have 1 phosphate per letter, you have lots of letters, so you have to accommodate a lot of phosphates, which are negatively charged. How best to accommodate it all? The sugar-phosphate backbone is charged & HYDROPHILIC (water-loving), but the nucleobases are HYDROPHOBIC (water-avoiding). So nucleic acids try to form secondary structures that minimize the bases’ exposure to water & maximize the backbone’s exposure to water.
DNA usually does this by forming double-stranded DNA (dsDNA): a DOUBLE HELIX made up of 2 DNA strands running in opposite directions (ANTIPARALLEL). In this helix, bases pair in the center like the rungs of a ladder. The bases are planar (flat), so their rings “stack” like pancakes & help “glue” the strands together. They don’t have charges to offer, but, kinda like how interleaving the pages of 2 phone books can stick them together really strongly, these base-stacking interactions add up to keep your DNA zipped up.
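ANTIPARALLEL has a practical consequence when you write sequences down: by convention, both strands are read 5'→3', so the partner strand is the complement read backwards (the “reverse complement”). A minimal sketch, assuming the input strand is written 5'→3'; `reverse_complement` is the common informal name for this, not a specific library function.

```python
# A minimal sketch of what ANTIPARALLEL means for reading sequences: the
# partner strand runs the opposite way, so to read it in the same 5'->3'
# direction you complement every base AND reverse the order.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5'->3'."""
    return "".join(DNA_PAIRS[base] for base in reversed(strand_5_to_3.upper()))


print(reverse_complement("ATGCGT"))  # ACGCAT
```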
The width of the helix is constant because 1 PURINE (2-ringed base) & 1 PYRIMIDINE (1-ringed base) always pair up: C with G & A with T. They pair this way because they have complementary binding opportunities on their “edges.” The phosphodiester bonds linking letters within a chain through their backbone are strong covalent bonds (neighboring atoms share electrons), but the bonds connecting bases in opposite strands are weaker hydrogen bonds (H-bonds), a special type of attraction between partial charges.
In a covalent bond, neighboring atoms share a pair of electrons (e⁻). But sometimes 1 of the atoms holds on to the e⁻ more tightly (it’s more ELECTRONEGATIVE), so it’s partly negative & the other atom is partly positive. We call this a POLAR covalent bond. Opposites attract, so the negative part seeks out something positive & the positive part seeks out something negative.
A hydrogen bond (H-bond) is just a special name we give to these attractions when the partly positive part is a hydrogen attached to something electronegative (e⁻-hogging) like nitrogen or oxygen, and the partly negative part is an electronegative atom (like N or O) with a lone pair of electrons (electrons not being used for bonding).
And we can find lots of these H-bond donors & H-bond acceptors in the nucleobase part of nucleotides (the purine or pyrimidine). Your cells “know” how to copy DNA based on which bases have matching donor & acceptor sites. A & T (or U) only form 2 H-bonds with each other, BUT G & C can form 3 H-bonds when they “Velcro together,” so G-C pairs are stronger.
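Counting those H-bonds across a whole stretch shows why G-C-rich regions hold together more tightly. A minimal sketch, assuming a perfectly paired DNA duplex so one strand’s sequence is enough (and counting only base-pair H-bonds, not stacking); `total_h_bonds` is a made-up name.

```python
# A minimal sketch of the H-bond count: A-T pairs share 2 hydrogen bonds and
# G-C pairs share 3. Assumes a perfectly paired duplex, so one strand's
# sequence determines every pair.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def total_h_bonds(strand: str) -> int:
    """Total base-pair H-bonds holding a fully paired duplex together."""
    return sum(H_BONDS[base] for base in strand.upper())


print(total_h_bonds("ATATAT"))  # 12
print(total_h_bonds("GCGCGC"))  # 18
```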
Back to the helix! The first thing I do when I see a graphic of DNA is give it a thumbs up. Not *just* because I’m happy to see it (though it is *groovy*), but because I’m checking to make sure it’s RIGHT-HANDED! Alongside the base pairing we just discussed, the strands adopt a right-handed helical form. To check the handedness, give a thumbs up with your right hand: the direction your fingers curl is the direction DNA curls. If the twist in the graphic follows your fingers, your graphic designer is in luck!
The most common double helix in DNA is “B-form.” It has 2 grooves: a wider major groove & a narrower minor groove. These grooves act like windows that let other molecules “read” the sequence without having to unzip the helix. RNA can also form double helices (though usually the more compact “A-form”), & it also forms a variety of structures within single strands.
In the MAJOR groove you can distinguish all 4 base pairs, but in the MINOR groove you can only tell GC or CG vs AT or TA. I still remember the mnemonic I came up with when taking molecular bio in undergrad… AHA! ADAM ATe A TAMADA! (see pic)
Thanks to DNA’s grooviness, you can read the DNA without disrupting the helix, but if you want to make a copy, you have to open up that region of the helix (MELT it) (don’t worry, it will ANNEAL back together afterwards). It’s easier to pull the strands apart in regions that have lots of A-T pairs because each of those base pairs only has 2 hydrogen bonds versus the 3 of G-C pairs, so regions that have to be opened frequently tend to be A-T rich. In cells, proteins called HELICASES help with the separation, but in the lab we can *literally* melt the strands by heating them up.
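A common way to compare how “easy to open” two stretches are is their GC content (the fraction of letters that are G or C). For short lab oligonucleotides there’s also a well-known rough rule of thumb, the Wallace rule, that estimates melting temperature as 2 °C per A/T plus 4 °C per G/C. This sketch assumes short, fully paired DNA; real melting temperatures also depend on salt, concentration and base stacking, so treat the numbers as ballpark. The sequences are hypothetical.

```python
# A rough sketch: GC content and the Wallace rule-of-thumb melting temperature
# (Tm ~ 2*(A+T) + 4*(G+C), in degrees C). The Wallace rule is only meant for
# short oligonucleotides and ignores salt, concentration and stacking effects.
def gc_content(strand: str) -> float:
    """Fraction of bases in the strand that are G or C."""
    strand = strand.upper()
    return sum(base in "GC" for base in strand) / len(strand)


def wallace_tm(strand: str) -> int:
    """Rule-of-thumb melting temperature (deg C) for a short DNA oligo."""
    strand = strand.upper()
    at = strand.count("A") + strand.count("T")
    gc = strand.count("G") + strand.count("C")
    return 2 * at + 4 * gc


at_rich = "ATTATAATATTAAT"  # hypothetical 14-mer, all A/T
gc_rich = "GCGGCCGCGGCCGC"  # hypothetical 14-mer, all G/C
print(gc_content(at_rich), wallace_tm(at_rich))  # 0.0 28
print(gc_content(gc_rich), wallace_tm(gc_rich))  # 1.0 56
```

Same length, but the all-G/C stretch needs roughly twice the heat to melt by this estimate, matching the “A-T rich regions open more easily” idea.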
But the helix is not where the structure stops! We have so much DNA that it’d never fit in our cells if it weren’t all coiled up (if you stretched out the DNA from one cell, it would be ~2 m (~6.5 ft) long). To help it coil & stay coiled ➿, it wraps around “hair rollers” (proteins called HISTONES) to form NUCLEOSOMES, which look like beads on a string ➰.
This saves space (& prevents things you don’t want read from being read), but when you *do* want a region read &/or transcribed, that region must be “opened up.” It’s kinda like when you open a mobile version of a website & it has all the different sections collapsed to save room ▶️ & you have to click on them to expand them if you want to actually read them 🔽
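The “~2 m per cell” figure is easy to sanity-check with back-of-the-envelope arithmetic. The inputs here are assumptions not stated above: roughly 3.1 billion base pairs per set of 23 chromosomes, 2 sets per cell, and about 0.34 nm of helix per base pair (typical for B-form DNA).

```python
# Back-of-the-envelope check of the "~2 m of DNA per cell" figure.
# Assumed inputs: ~3.1 billion bp per set of 23 chromosomes, 2 sets per cell,
# and ~0.34 nm of helix length per base pair (typical for B-form DNA).
BP_PER_HAPLOID_SET = 3.1e9
SETS_PER_CELL = 2
RISE_PER_BP_M = 0.34e-9  # metres per base pair
METRES_PER_FOOT = 0.3048

length_m = BP_PER_HAPLOID_SET * SETS_PER_CELL * RISE_PER_BP_M
length_ft = length_m / METRES_PER_FOOT
print(f"{length_m:.1f} m (~{length_ft:.1f} ft)")  # 2.1 m (~6.9 ft)
```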
Much of EPIGENETICS involves special proteins adding modifications to the DNA or its “curlers” that help the sections expand if you want to read them 🔽 & collapse if you don’t want them read 🔼. These modifications are often put on histones by post-translationally modifying lysine amino acids in the tails of the histone proteins, as we talked about here: http://bit.ly/lysineanalysis
How do you know what’s worth reading when? Other proteins are able to read “section headers”: they recognize specific DNA sequence motifs (like words) that are present in front of functionally related genes. So instead of reading gene-specific headers, it’s more like searching for “key words” or index terms: a DNA-binding protein can search for 1 term & get “hits” at multiple regions or genes, which it can act on “simultaneously.” This allows for coordinated activation or deactivation of related genes.
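The “1 search term, many hits” idea works like a text search across several genes’ upstream regions. This is a toy sketch: the gene names, upstream sequences and motif string are hypothetical, and real DNA-binding proteins tolerate some variation in their motifs, which an exact string search ignores.

```python
import re

# A toy sketch of "search for 1 key word, get hits at several genes".
# Gene names, upstream sequences and the motif are hypothetical examples.
upstream_regions = {
    "geneA": "TTGACGTCAGGCTATAAAG",
    "geneB": "CCGTAGGATCCATTGCA",
    "geneC": "ATGACGTCATTTAGGC",
}


def find_motif_hits(motif: str, regions: dict[str, str]) -> dict[str, list[int]]:
    """Return {gene: [0-based start positions]} for each region containing the motif."""
    # Zero-width lookahead so overlapping occurrences are all counted.
    pattern = re.compile(f"(?={re.escape(motif)})")
    hits = {}
    for gene, seq in regions.items():
        positions = [m.start() for m in pattern.finditer(seq)]
        if positions:
            hits[gene] = positions
    return hits


print(find_motif_hits("TGACGTCA", upstream_regions))  # {'geneA': [1], 'geneC': [1]}
```

A single search flags geneA and geneC together, which is the kind of grouping that lets related genes be switched on or off as a set.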