Was there a break-in to the restricted section? Did you successfully sneak your protein recipe into the plasmid cookbook library? You can use restriction enzymes as DNA detectives to collect and analyze “DNA fingerprints” in an analytical digest – and speaking of detectives, actual detectives have historically used similar techniques to ID criminals, prove paternity, and chase down genetic mutations causing disease. Let’s take a look at RFLP (Restriction Fragment Length Polymorphism) analysis – the original “DNA fingerprinting” – first in its wider use, then how we use a form of it in molecular cloning confirmation.
Bacteria have DNA-specific “scissors” called restriction enzymes (aka restriction endonucleases, or REases) that recognize & cut specific “code words” (restriction sites aka recognition sequences) written in DNA that act as “dotted lines.” Bacteria use them as a defense against invaders like bacteria-infecting viruses (phages) and biochemists use them to cut and paste pieces of DNA together in MOLECULAR CLONING and to check if that cloning worked (diagnostic digest). And geneticists and CSI folks can use them to compare DNA sequences to track inheritance patterns for genetic diseases or rule suspects in or out.
The enzymes can only cut if the dotted line’s there (the recognition sequence is present). People naturally have genetic variations (polymorphisms) that alter how many restriction sites are available – so if you try to cut their DNA with those restriction enzymes, different people will have different #s & sizes of products. You can separate those products by size using agarose gel electrophoresis (use the DNA’s natural negative charge to send it through a gel that acts as a molecular sieve cuz the pieces get tangled up in it and the longer the pieces, the harder it is to get through). And then you can make some or all of them visible (more on this later).
People only have different #s of pieces if they (or one of their ancestors) happen to have a mutation that affects that cut site. Mutations between sites that just swap one letter for another are “invisible.” But if there are mutations in between cut sites that change the length of the in-between part (letters added or removed), you’ll get different sized pieces. And although I say “mutations” it’s important to remember that mutations aren’t all bad – and instead of calling them mutations (sounds scary) we can call them “polymorphisms” (less scary-sounding and conveys the accurate impression that these genetic differences are just ways we’re unique not things to make you freak!).
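Here’s a minimal sketch of that cut-site logic, assuming a toy linear piece of DNA and the EcoRI “code word” GAATTC (cut after the first G). The sequences and the helper name `digest_linear` are made up for illustration, and real digests happen on both strands of much longer DNA.

```python
def digest_linear(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting linear DNA at every copy of `site`.

    `cut_offset` is how many letters into the site the cut falls
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    start = 0
    while True:
        i = seq.find(site, start)
        if i == -1:
            break
        cuts.append(i + cut_offset)
        start = i + 1
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]


# Toy sequences (hypothetical), each with up to two EcoRI sites
person_a = "ATATAT" + "GAATTC" + "C" * 10 + "GAATTC" + "TTTT"      # reference
person_b = "ATATAT" + "GAATTC" + "C" * 10 + "GAGTTC" + "TTTT"      # 2nd site destroyed
person_c = "ATATAT" + "GAATTC" + "CCCCCGCCCC" + "GAATTC" + "TTTT"  # letter swap between sites
person_d = "ATATAT" + "GAATTC" + "C" * 14 + "GAATTC" + "TTTT"      # insertion between sites

for name, dna in [("a", person_a), ("b", person_b), ("c", person_c), ("d", person_d)]:
    print(name, digest_linear(dna, "GAATTC", 1))
```

Person b loses a piece (one fewer cut), person c looks identical to person a (the swap is “invisible”), and person d gets a longer middle piece.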
One source of polymorphisms is different #s of repeating sequences called tandem-repeats. The “repeat” is a repeating core sequence of 10-15 DNA letters (nucleotides) that people can have different numbers of copies of in a row (in tandem). Why? DNA processing machinery can “slip up” and slip off and end up adding extra or not enough of the repeats so people end up with different numbers of them. Usually these repeats are in “noncoding” regions (not part of protein instructions) and don’t have much of a biological impact, but they do change the size of the pieces. And, since DNA is inherited, you can compare piece sizes to analyze paternity or try to figure out where a disease gene is.
Humans have 23 chromosomes, but we have 2 copies of each – we inherit one from biological mom & 1 from dad so we have 2 potential “versions” (alleles) of each genetic location (I say potential because the sequence might be the same in both copies you get). The way this works is that (in a process called meiosis), mom & dad create special cells (egg and sperm) with only 1 copy of each chromosome so that when these cells combine you have 2 copies – one from mom & 1 from dad.
So ~1/2 the polymorphic fragments will come from the biological father and any fragments that aren’t present in the mother should be present in the father (since the kid had to get it from somewhere). I say “should” not “must” because random spontaneous mutations can arise, leaving the kid with a polymorphism not present in either parent. So if you break open cells from a person, extract out the DNA, add restriction enzymes to cut it up into pieces and then look at the SIZE of the pieces that encompass those regions, you can get a “DNA fingerprint.” Such DNA fingerprinting was discovered in 1984 by a British scientist named Alec Jeffreys, who was interested in genetic disease detectiving.
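The “kid had to get it from somewhere” logic can be sketched like this, assuming each person’s fingerprint is just a set of band sizes. The numbers and the function name `unexplained_bands` are invented for illustration, and real band sizes need a measurement tolerance that this sketch skips.

```python
def unexplained_bands(child, mother, candidate_father):
    """Child bands that show up in neither the mother nor the candidate father."""
    paternal_candidates = set(child) - set(mother)
    return sorted(paternal_candidates - set(candidate_father))


# Made-up band sizes in base pairs, purely for illustration
mother = {4100, 6300, 9800}
child = {4100, 7200, 9800, 11500}
man_1 = {5000, 7200, 11500}
man_2 = {3300, 7200, 8800}

print(unexplained_bands(child, mother, man_1))  # []
print(unexplained_bands(child, mother, man_2))  # [11500]
```

An empty list is consistent with paternity. A leftover band points away from it, but (per the “should” not “must” caveat) a spontaneous mutation could also explain a stray band.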
You know how I said you can visualize some or all of the DNA? When we run your traditional agarose gel in the lab these days we’re usually just doing things like looking at a few small bands we’ve created in a tube or relatively small plasmids with just a couple cut sites. So we stain *all of it* with a fluorescent dye that binds DNA nonspecifically.
But Jeffreys’ situation was a lot trickier because he was looking at entire human genomes – which are huge – with lots of cutting sites – and lots of fragments of similar sizes that could come from anywhere. If he labeled all of the bands it’d be a smeary mess and you couldn’t tell what came from where – he needed a way to only highlight some of the bands, those that came from specific locations.
You can use something called Southern blotting to “highlight” specific sequences – after running the gel you transfer the DNA to a membrane. Then you add labeled (usually radioactively-labeled) oligonucleotide probes – these are relatively short stretches of DNA (oligonucleotides) that stick to specific sequences in the DNA because they’re complementary to them. You can use different sequences as probes to look for different things.
But what to look for? He wanted to probe bands that are likely to be highly polymorphic – enter the minisatellite (aka variable number tandem repeat, VNTR) – and later the microsatellite. These are like genetic “cliches” – oft-repeated sequences that show up all over the place. They typically come from “transposable elements” – so-called “jumping genes” that often came from ancient viruses. Because these transposable elements can make copies of themselves, and insert those new copies of themselves in new places, you end up with that sequence located at multiple places in the genome – so if you probe for one of those core repeated sequences you’ll highlight multiple bands.
But because people have different RFLPs and different #s of tandem repeats, the probes will stick to different size fragments in different people, so if you probe for the core sequence you’ll find it in different size bands in different people. And this has found a variety of uses, from resolving paternity disputes to solving murder cases to finding genetic disease culprits.
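To see why a probe lights up different size bands in different people, here’s a toy sketch. It assumes a made-up 10-letter core repeat, EcoRI sites flanking the repeat region, and a “probe” that sticks wherever the core sequence appears. Real probes pair with the complementary strand and tolerate some mismatch, so the plain text match here is a simplification, and all names and sequences are hypothetical.

```python
CORE = "GGTGGGCAGG"  # invented core repeat, for illustration only


def make_dna(n_repeats: int) -> str:
    """Toy locus: n copies of CORE sitting between two EcoRI sites."""
    return "CATG" + "GAATTC" + "ACAC" + CORE * n_repeats + "TGTG" + "GAATTC" + "CATG"


def cut_fragments(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut linear DNA at every copy of `site` and return the pieces as strings."""
    pieces, last, start = [], 0, 0
    while (i := seq.find(site, start)) != -1:
        pieces.append(seq[last:i + cut_offset])
        last = i + cut_offset
        start = i + 1
    pieces.append(seq[last:])
    return pieces


def probed_band_sizes(seq: str, site: str, cut_offset: int, probe: str) -> list[int]:
    """Sizes of only those fragments the probe would highlight."""
    return sorted(len(f) for f in cut_fragments(seq, site, cut_offset) if probe in f)


print(probed_band_sizes(make_dna(3), "GAATTC", 1, CORE))  # [44]
print(probed_band_sizes(make_dna(5), "GAATTC", 1, CORE))  # [64]
```

Both people get 3 fragments from the digest, but only the one carrying the repeats gets highlighted, and its size tracks the number of repeats.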
It was first used in court in an immigration case, where a Ghanaian British citizen was denied re-entry into the UK because he had a forged passport – the family’s lawyer enlisted Jeffreys’ help. The father’s DNA wasn’t available but he was able to compare DNA from the mom, the children not in question and the child in question. And he showed that they may have faked the passport, but they weren’t faking the relatedness! Success!
Then in 1986 it was used in a criminal case for the 1st time – Richard Buckland was found innocent of rape & murder of 2 young school girls when the semen found didn’t match his – even though he’d confessed to the second murder (but not the first) – so the first DNA exoneration was also a warning about forced confessions! The police collected & tested blood from men in the area, but the real killer escaped detection for a while by getting a friend to give blood for him – but then he bragged about his trick – someone overheard this brag, his real blood got tested, it matched, and he was convicted.
RFLP analysis also has a rich history in hereditary disease research. The chromosome copies you get aren’t “all grandpa” or “all grandma” because when those mommy & daddy cells split so that they only had 1 copy of each (a process called meiosis), they swapped pieces in a process called homologous recombination – this is great for diversity because it means you get some of grandma & some of grandpa in each chromosome. And the process of physically crossing over helps hold copies together until it’s splitting time so the daughter cells get 1 of each.
As a result, chromosomes are “patchwork-y” (like AAAAAAAAAAAA + aaaaaaaaaaaa → aaaaAAAaaaaa + AAAAaaaAAAAA).
And regions of DNA close to one another tend to get passed down together whereas farther away regions might not. So if a disease mutation arises near a polymorphic region the disease mutation will tend to get co-inherited with those polymorphisms.
So when scientists were trying to find the causative mutations for genetic diseases like cystic fibrosis they could compare the DNA fingerprints of family members with and without the disease to see whether people with the disease tended to have a band size that people without the disease didn’t. This could hint that the disease gene might be somewhere in that piece and then scientists could further investigate.
a couple technical notes:
In Jeffreys’ original work, he probed for minisatellites – these contain a repeated “core sequence” of 6-100 letters that can be repeated from 2 to hundreds of times in a row (hence the “tandem”) per minisatellite location. So he was getting fragments that were tens of thousands of bases long, which made it hard to distinguish the copy number variants from one another (try detecting a difference of 6 letters in a sea of thousands). So, later scientists turned to microsatellites, aka short tandem repeats (STRs). Microsatellites – as the name suggests – are smaller: their repeating part is only 1-7 letters long & is only repeated 5-100 times at each microsatellite location. Here’s a link to the paper in the pic: https://www.nature.com/articles/316076a0.pdf
Another much greater “simplification” came when more specific probes were made – Jeffreys’ 1st experiments used multi-locus probes (MLPs) that detect 15-20 variable fragments per individual, from 3.5-20 kb in size. But later (once they could make probes for variable regions that didn’t target the “generic” part of the repeat) he and others switched to single-locus profiling – a single hypervariable locus detected by a specific single-locus probe (SLP).
You usually use multiple probes so you can look at multiple locations – but not “all of them” and this is the distinction between DNA profiling & DNA fingerprinting – in profiling you only look at a limited # of repeated regions. A DNA profile’s a set of #s that says how many repeat units a person has at each of a standard set of locations (the U.S. Combined DNA Index System (CODIS) database uses 20). But RFLP-based methods aren’t usually used to find out that # these days…
A problem with RFLP-based methods is that they require a lot of DNA because you’re not making copies of anything – in fact, you’re making things smaller! (And since shorter DNA strands offer less opportunity for fluorescent dyes to stick to them, the same # of small pieces will look a lot duller on a gel than that # of big pieces)
So scientists have largely turned to PCR-based methods, with RFLP-based methods largely falling out of favor starting in the early 90s. PCR stands for Polymerase Chain Reaction and it’s a way to make lots of copies of specific parts of DNA – you specify the “start” and “stop” points using short DNA pieces called primers that bind to those places to create the double-stranded “train stations” DNA Polymerase (DNA Pol) needs to start laying down track – linking up nucleotides that complement the other strand. So, instead of restriction enzymes, you can use primers to bookend a region & see how big it is. This is commonly referred to as STR (short tandem repeat) analysis.
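Since the primers bookend the region, the size of the PCR product is just the fixed flanking bit plus (repeat length × number of repeats). Here’s a rough sketch of turning a measured product size back into a repeat count. The 120-letter flank, the 4-letter repeat, and the function name `repeat_count` are all assumptions for illustration, not values from any real locus.

```python
def repeat_count(amplicon_len: int, flank_len: int, unit_len: int) -> int:
    """Number of whole repeat units in a PCR product.

    flank_len = total non-repeat length between (and including) the primers.
    Raises ValueError if the leftover length isn't a whole number of repeats
    (real loci can have partial repeats, which this sketch doesn't handle).
    """
    extra = amplicon_len - flank_len
    if extra < 0 or extra % unit_len != 0:
        raise ValueError("length does not fit a whole number of repeats")
    return extra // unit_len


# Hypothetical locus: 120 letters of flank, 4-letter repeat unit
print(repeat_count(152, 120, 4))  # 8
print(repeat_count(164, 120, 4))  # 11
```

Someone with products of 152 and 164 letters at this pretend locus would be written down as “8, 11”, one number per chromosome copy.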
And instead of running a slab agarose gel you can use capillary electrophoresis. Same bigger-moves-slower concept, but you use fluorescently-labeled primers so the fragments are “pre-labeled” and you run it through a gel in a long thin tube and detect it when it runs through (similar to DNA sequencing). More here: http://bit.ly/DNAsequencing
But “RFLPs” on a much tinier scale are still really useful in the lab to do a quick check of our cloning before we send it for more definitive sequencing. This check has a lot in common with what we’ve just learned about, but we’re checking to see if a gene we think we put into a circular piece of DNA called a plasmid vector actually got in there (we can also do a similar PCR-based method called colony PCR: http://bit.ly/2HzpfZ3 ).
In an analytical restriction digest (aka diagnostic digest) we use those DNA “scissors” – restriction enzymes – to cut our plasmid. Remember that different scissors recognize different “code words” (restriction sites) – so you can take the plasmid DNA you want to check for insert ⏩ add restriction enzyme(s) (and a buffer containing salts, pH stabilizers, Mg2+, etc. to keep the enzyme happy) ⏩ incubate it warm (often 37°C) so the enzymes can work & give it time to cut ⏩ then you run an agarose gel like before ⏩ check how many fragments you see & how big they are
🔹 NUMBER of pieces you get depends on how many restriction sites there are for the enzyme(s) you use
🔹 SIZE of pieces depends on how much stuff’s in between the sites
If your gene contains a restriction site that the vector backbone doesn’t, presence of the gene will lead to 1 more cut, so you get an extra product (or, if this is the only site present, you’ll get 1 linear product instead of the circular plasmid). Circular DNA runs kinda unpredictably so it’s nicest if you have 1 site in backbone & 1 site in insert – you can use multiple restriction enzymes to make this happen
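Because a plasmid is a circle, 1 cut gives 1 linear piece and each extra cut adds 1 more piece. A small sketch of predicting the band sizes you’d expect, assuming you already know where the cut positions are. The plasmid sizes, positions, and the name `circular_fragments` are hypothetical.

```python
def circular_fragments(plasmid_len: int, cut_positions: list[int]) -> list[int]:
    """Expected fragment sizes (largest first) from cutting a circular plasmid."""
    cuts = sorted(set(p % plasmid_len for p in cut_positions))
    if not cuts:
        return []  # uncut circle: no linear fragments to predict
    # Gaps between neighboring cuts...
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # ...plus the piece that wraps around the "end" of the circle
    sizes.append(plasmid_len - cuts[-1] + cuts[0])
    return sorted(sizes, reverse=True)


# Hypothetical 5000-letter backbone with 1 site at position 1000
print(circular_fragments(5000, [1000]))        # [5000]

# Same backbone after a 1200-letter insert goes in at position 2000;
# the insert carries a 2nd site 300 letters in (position 2300)
print(circular_fragments(6200, [1000, 2300]))  # [4900, 1300]
```

So in this pretend setup, empty vector gives a single 5000 band, while a successful clone gives 4900 + 1300.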
Sometimes your gene doesn’t have any unique restriction sites (or at least none you have matching enzyme for) BUT all hope’s not lost! You have a few options
1️⃣ buy another restriction enzyme 👉 end up w/racks & racks of various ones you’ll probably never need again…
2️⃣ check if you can introduce a “silent mutation” that changes the DNA sequence but NOT the protein sequence it codes for (like “grAy” vs “grEy” 👉 different spellings, same meaning BUT different restriction enzymes will only recognize 1 ✅ & not other ❌) If your restriction enzyme is pickier than your protein-makers, you can take advantage of these differences
🔹 To make a protein from DNA, DNA first gets copied (transcribed) into mRNA which then gets turned into protein (translated). It takes 3 DNA (or RNA) letters (the same bases except the T’s become U’s) (more here: http://bit.ly/2yCisGq ) to spell a single protein letter (amino acid). We call these 3 nucleotide “code words” CODONS
There are 4 different nucleotides & 20 (common) amino acids – do the math (4 bases & 3 spaces so 4^3 = 64) & you have more codons than amino acids. How to reconcile this? Multiple codons spell the same word (degeneracy) – the protein translation machinery (ribosomes) knows “gray” & “grey” mean same thing BUT a restriction enzyme might be more of a “purist” & refuse to get near anything that doesn’t spell it the way it considers “proper.” So you can mutate the DNA’s sequence before you stick it in, changing spelling so it’ll get cut by restriction enzyme BUT the protein product won’t be affected
A helpful tool for checking for opportunities to do this is WatCut: http://watcut.uwaterloo.ca/template.php
3️⃣ use restriction enzymes that only recognize the vector – you’ll get the SAME NUMBER of products regardless of whether your insert is in there, BUT the SIZE of the products will be DIFFERENT – 1 will be a lot bigger if your gene’s inside (similar to the logic behind colony PCR w/ vector-specific primers)
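For option 2️⃣, the hunt for a “grAy” vs “grEy” spelling change can be sketched as below, assuming the standard genetic code and an in-frame coding sequence. The example gene snippet and the helper names are made up, and this only tries swapping 1 codon at a time (a tool like WatCut does a much more thorough job).

```python
from itertools import product

# Standard genetic code, DNA letters, in TCAG order ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Spell out the protein letters for an in-frame coding sequence."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def silent_site_options(cds: str, site: str) -> list[tuple[int, str, str]]:
    """Single-codon synonym swaps that create `site` without changing the protein.

    Returns (codon number, original codon, new codon) for each option.
    Assumes `site` is not already present in `cds`.
    """
    options = []
    for i in range(0, len(cds), 3):
        original = cds[i:i + 3]
        for codon, aa in CODON_TABLE.items():
            if codon != original and aa == CODON_TABLE[original]:
                mutated = cds[:i] + codon + cds[i + 3:]
                if site in mutated:
                    options.append((i // 3 + 1, original, codon))
    return options


cds = "ATGGAGTTCAAA"  # hypothetical snippet spelling M-E-F-K
print(len(CODON_TABLE))                    # 64
print(silent_site_options(cds, "GAATTC"))  # [(2, 'GAG', 'GAA')]
```

Swapping codon 2 from GAG to GAA still spells E, but now the DNA reads ATGGAATTCAAA, which contains the EcoRI site GAATTC.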
📝 A couple practical notes
🔹To help you compare, run some controls 👉 negative control: don’t add enzyme – shows you what uncut looks like. also run plasmid-only (you know it doesn’t have insert) and insert-only (you know it doesn’t have plasmid)
🔹Unlike colony PCR, you have to purify the DNA first – & you don’t want to waste all your DNA (if you do decide to “hire it” you’ll need the rest) – so you typically just test a tiny bit in ~20μL total (1 μL (microliter) is 1 millionth of a liter)
🔹You want to check that there aren’t a lot of sites for that enzyme on your plasmid or else you’ll get lots of little pieces that are hard to analyze
🔹 The enzymes are “numbered” not “lettered” (e.g. EcoRV isn’t an all-electric RV model, it’s EcoR FIVE (learned this the embarrassing way). It tells you it was the 5th restriction enzyme found in the “RY13” strain of E. coli)
more on restriction enzymes: http://bit.ly/reasesvsmtases ⠀
more on clone-checking: http://bit.ly/colonychecking