Step aside 007 – there’s a new (actually a really ancient) bond in town – the PHOSPHODIESTER BOND. It “cursivises” your DNA letters to write your genetic “script.” But if you get your hands on a copy of such a script, how do you read it? What actually happens to DNA when we “send it for sequencing”? SANGER SEQUENCING (aka CHAIN TERMINATION METHOD) is a way to “read” the sequence of nucleotides (DNA letters) in pieces of DNA. We care about this sequence of letters because it will determine the sequence of amino acids (protein letters) in the proteins it codes for, which will determine how that protein folds and what it can do.
Yesterday we looked at another bond – the peptide bond – which links together protein letters (amino acids). I’m a protein person, and if we want to study a protein, we can use molecular cloning to stick the gene for that protein into a circular piece of DNA called a vector plasmid that we can stick into cells to express it for us. We want to make sure that the sequence got into the plasmid ok, without any typos in the DNA sequence (which could cause typos in the resultant protein or even prevent it from being made altogether).
So, before we try to get cells to express the protein, we put the plasmid we engineered into bacteria to make lots of copies of it, which we then purify out using alkaline lysis (“minipreps”; more here: http://bit.ly/2Ty5HVS).
After that, you’re left with pure plasmid. And you want to check the sequence of the part you put in, which we do using SANGER SEQUENCING. We don’t do the sequencing ourselves; we send samples to facilities that specialize in doing it fast and on the cheap. And we don’t often stop to think about what really happens when it gets to the facility. We just wrap up tiny tubes with bubblewrap, stick them in a big tube, stick that in an envelope (too many “your tubes arrived damaged” emails…) and drop them in the outgoing mail box (too many crushed tubes…). To understand what happens when it gets to the sequencing people, let’s first review what it is we’re trying to read – what is DNA?
DNA stands for DeoxyriboNucleic Acid & it’s made up of long chains of “letters” called NUCLEOTIDES (nt), which usually pair with another strand to form double-stranded DNA (dsDNA). There are 4 DNA nt -> A, T, C, & G & they’re made up of 3 main parts – a deoxyribose sugar & phosphate(s) form the generic “backbone” part & then each letter has a unique “nitrogenous base” (“base”) which has 1 ring (the pyrimidines C & T) or 2 rings (the purines A & G). The different bases pair with specific other bases on the other strand – A:T and G::C. So if you know the sequence of one strand you know the sequence of the other.
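To make the “know one strand, know the other” idea concrete, here’s a minimal Python sketch of the pairing rules. The names (PAIRS, complement) are just ones I made up for illustration.

```python
# Base-pairing rules: A pairs with T, G pairs with C
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the letter that pairs with each letter of the strand, position by position."""
    return "".join(PAIRS[base] for base in strand)

print(complement("ATGC"))  # TACG
```

Running complement on its own output gives you back the strand you started with, which is exactly why reading one strand is enough.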
I like to picture them as tiny little cartoons where the sugar’s 5-sided ring forms the core body & various groups stick off of its arms & legs. The “right arm” (as in the right of your screen/paper) is the “1’” position (the ‘ is pronounced “prime”) & this is where the base attaches. The “left arm” (5’ position) is where the phosphate(s) link on. The 5’ position is actually more like an elbow because there’s a “linker” from the 4’ “shoulder” & the “left leg” (3’ position) has a hydroxyl (-OH) group.
Nucleotides link together left arm (5’ phosphate) to left leg (3’ OH) through PHOSPHODIESTER BONDS. You can link up as many as you want to get a chain, one end of which will have a free 5’ phosphate (the 5’ end) & the other end of which will have a free 3’ hydroxyl (the 3’ end).
DNA & RNA (RiboNucleic Acid) are different in that RNA has a right leg (2’ -OH) and a left leg (3’ -OH) but DNA only has a left leg. (They actually both have 2 right legs and 2 left legs, but if the leg is just a “stub” (hydrogen) it doesn’t really do anything but take up a little space and satisfy the electrons carbon needs, so we don’t usually draw it.) The other difference between DNA & RNA is that RNA has a “U” instead of a T.
sidenote: So if you see a carbon with fewer than 4 bonds, you just assume that there are hydrogens there. Also, if you see a “corner” without an element letter, you assume that there’s a carbon there. Carbons (with hydrogens as sorts of “placeholders”) form the skeleton of organic molecules (organic as in carbon-based, not “all-natural”), but it’s often the things they’re bonded to (functional groups) that do the exciting reacting stuff so we want to make them stand out more. So we’ll often draw the chemical structures of organic molecules with implied carbons and hydrogens, and just write in the C’s or H’s in places where they’re actually involved in what we’re interested in. This shorthand is really helpful, but it can also be confusing to people unfamiliar with it, so I hope this helps make biochem a bit more accessible.
So, back to the sequencing story -> it’s ok that DNA doesn’t have that right leg because it doesn’t need it to link to another letter (polymerize). But this linking DOES need the left leg – that’s where the incoming letter will latch on.
An enzyme called DNA Polymerase (DNA Pol) facilitates this linkage. It acts like a train that can only travel on double-stranded track. So to travel on single-stranded track it first has to add the complementary nucleotide (the one that base pairs with it). For example, to travel past an A on the template strand it has to add a T to the growing strand. So the product that’s being made is the complement to the template strand (but if you know one you know the other).
Because it can only travel on double-stranded track, you also have to provide a primer (short complementary sequence) for it to start from. In Polymerase Chain Reaction (PCR), you use 2 primers to define the “start” and “stop” of a region you want to make copies of. With SANGER SEQUENCING, you only use 1 primer – you just give it the “start” station and then you let it stop wherever it adds one of the defective tracks you give it and see how far it goes.
These “defective tracks” are DIdeoxynucleotides (ddNTPs) which don’t have a left leg to latch onto (they have a 3’ H instead of an -OH). So these defective NTs act as CHAIN TERMINATORS.
The basic premise of SANGER SEQUENCING is -> you give it mostly normal NTs (dNTPs) mixed in with some “defective” NTs (ddNTPs). DNA Pol will add normal NTs normally, but when a terminator gets incorporated, nothing else can be added. So, depending on how many normal ones got added before the terminator, you’ll get pieces of different sizes.
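Here’s a toy Python simulation of that premise. It’s a sketch under big simplifying assumptions: the template is read left to right starting right after the primer, every added letter has the same chance (dd_fraction, a number I picked) of being a terminator, and the function name extend_once is made up.

```python
import random

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_once(template, dd_fraction, rng):
    """Copy the template one letter at a time until a terminator is incorporated.

    Returns the newly made strand; a trailing '*' means it ended with a terminator.
    """
    product = []
    for base in template:
        product.append(PAIRS[base])      # add the complementary letter
        if rng.random() < dd_fraction:   # this letter happened to be a ddNTP
            product.append("*")
            break                        # no 3' OH, so nothing else can be added
    return "".join(product)

rng = random.Random(0)   # fixed seed so the run is repeatable
template = "TACGGATC"    # made-up template
fragments = [extend_once(template, 0.2, rng) for _ in range(1000)]

# Collect the distinct lengths of the terminated pieces (not counting the '*')
terminated_lengths = sorted({len(f) - 1 for f in fragments if f.endswith("*")})
print(terminated_lengths)
```

With lots of molecules copying the same template, you end up with terminated pieces of every possible length, which is what makes the next step (sorting them by size) informative.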
You can run this on a urea-PAGE gel, which separates the pieces by size by using the DNA’s negative charge to drive it through a gel towards a positive charge, with the gel mesh slowing bigger things down more along the way. Compared to agarose gels, urea-PAGE offers much higher resolution because you can make a tighter gel mesh (more here: http://bit.ly/2XsNzQg) -> it can detect single-NT length differences – so you can tell XXX apart from XXXX, BUT you can’t tell what letters those X’s are (e.g. AAA and TTT look the same). So, originally, you had to do 4 separate reactions, with each reaction only having terminator versions of a single letter.
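Here’s a small Python sketch of how the 4-reaction setup lets you read the letters off a gel. It’s idealized: the product sequence is made up, and I’m assuming every possible terminated piece shows up as a clean band. The function names are mine.

```python
def lane_lengths(product, letter):
    """Lengths of the pieces in the reaction whose terminator is `letter`.

    A piece of length n shows up in this lane only if the n-th letter is `letter`.
    """
    return [i + 1 for i, base in enumerate(product) if base == letter]

def read_gel(lanes):
    """Go from the shortest band to the longest, noting which lane each band is in."""
    bands = [(length, letter) for letter, lengths in lanes.items() for length in lengths]
    return "".join(letter for length, letter in sorted(bands))

product = "ATGCCTAG"  # made-up strand being synthesized
lanes = {letter: lane_lengths(product, letter) for letter in "ACGT"}
print(lanes)            # {'A': [1, 7], 'C': [4, 5], 'G': [3, 8], 'T': [2, 6]}
print(read_gel(lanes))  # ATGCCTAG
```

Each lane only tells you “there’s an A at positions 1 and 7” (and so on), but put the 4 lanes side by side and the band order spells out the whole sequence.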
You don’t want all of the letter to be terminator-y because then you’d never be able to get past the 1st instance of it, so you include ~100X less of the ddNTP than the dNTP (e.g. in the “A” reaction, for every 100 A’s you have in the mix, ~99 will be dATP (normal) and ~1 will be ddATP (terminating)) (and you’ll also have all normal dGTP, dTTP, & dCTP in there).
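To see why the ratio matters, here’s a quick back-of-the-envelope calculation in Python. It assumes (as a simplification) that DNA Pol grabs a terminator purely in proportion to how much of it is in the mix; real polymerases can be pickier than that. The function name is made up.

```python
def fraction_stopping_at(k, dd_fraction):
    """Fraction of chains that end at the k-th occurrence of the letter.

    The chain has to pick the normal version (k - 1) times, then the terminator once.
    """
    return dd_fraction * (1 - dd_fraction) ** (k - 1)

# ~1 in 100 terminators: pieces end at early AND late occurrences
for k in (1, 10, 100):
    print(k, round(fraction_stopping_at(k, 0.01), 4))

# All terminators: everything stops at the 1st occurrence, nothing reaches the 2nd
print(fraction_stopping_at(1, 1.0), fraction_stopping_at(2, 1.0))
```

Under this assumption, about 1% of chains stop at the 1st A and a bit under 0.4% still make it to (and stop at) the 100th, so you get signal across a long stretch instead of only at the start.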
Then, technology advanced, allowing for DYE-TERMINATOR SEQUENCING -> scientists began using fluorescently-labeled nucleotides. Fluorescent molecules absorb light at one wavelength (excitation wavelength) and release it at a different wavelength (emission wavelength). Different wavelengths have different colors, so if you use fluorophores that have different emission wavelengths, you can tell them apart.
You can label the different terminators (ddATP, ddTTP, ddGTP, ddCTP) with different fluorophores and add all 4 at once. The fluorophores are added to the base, in a position that doesn’t interfere with the base pairing. To make things even easier, you can use CAPILLARY GEL ELECTROPHORESIS. Instead of running it through a “slab gel”, you run it through a vertical tube of gel. And as it runs through it gets “scanned” by a laser.
The light from the laser is at the fluorophore’s excitation wavelength so it excites the fluorophore, which then emits light at a different wavelength (its emission wavelength), which gets recorded by a detector as peaks of fluorescence intensity at each wavelength, drawn on a CHROMATOGRAM. Because the different ddNTPs have different fluorophores and give off light with different wavelengths, the detector can tell them apart.
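As a rough illustration of turning those peaks into letters, here’s a tiny Python sketch. The intensity numbers are invented, the trace format (one dict per peak) is my own assumption, and real base-calling software does a lot more (finding the peaks, scoring how trustworthy each call is, etc.).

```python
def call_bases(trace):
    """For each peak position, pick the letter whose dye gave the strongest signal.

    trace: list of dicts mapping letter -> fluorescence intensity at that peak.
    """
    return "".join(max(peak, key=peak.get) for peak in trace)

# Made-up intensities for three peaks, shortest piece first
trace = [
    {"A": 950, "C": 30, "G": 20, "T": 40},
    {"A": 25, "C": 15, "G": 35, "T": 880},
    {"A": 40, "C": 20, "G": 910, "T": 30},
]
print(call_bases(trace))  # ATG
```

Because the pieces pass the laser in size order, the order of the peaks is the order of the letters.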
Sanger sequencing is kinda like the “gold standard” in terms of accuracy (which is really important in our case), but it’s expensive per letter (relatively speaking). It’s really cheap if you only have like one reaction (~$5), but if you wanted to sequence an entire genome (which you’d first have to break up into lots of shorter pieces you’d later “stitch together” computationally) it’d be really expensive. So for big projects, things have switched to “next-gen sequencing”, which is highly parallel – lots of reactions happening at the same time, usually on a chip, with really tiny volumes.
more on DNA polymerization: http://bit.ly/2TFdQN9
more on PCR: http://bit.ly/2FiBXsl
more on peptide bonds: http://bit.ly/2lVQsuJ