A practical guide for checking your clones to make sure your inserted DNA got safely into its new home! We can use molecular cloning to put genes into circular pieces of DNA that we put in bacteria. But how do we know if that gene you think you put inside is *really* inside? With COLONY PCR it cannot* hide! (*well, it can… and it can also hide from another clone-checking method, ANALYTICAL RESTRICTION DIGEST (aka diagnostic digest), but that’s where the SEQUENCING comes in later!)

note: apologies in advance for formatting and/or repetitiveness problems. I mish-mashed stuff from a couple past posts so all of this would be in one place to refer people to. And now have my own research work to do!

We’ve talked a lot lately about the process of MOLECULAR CLONING, where I take an (edited) gene for a protein I want to study from one template and put that gene INSERT into a circular piece of DNA called a PLASMID VECTOR that has the “bells and whistles” I want, like “tags” to help with purification and start signals for turning the gene into protein. Then I stick this RECOMBINANT plasmid into bacterial cells so the bacteria will make more of the DNA and/or protein. http://bit.ly/molecularcloningguide 

But how do I know if the bacteria *really* have my gene in them? The plasmid vector has a selection marker – often an antibiotic resistance gene – so that if you grow the bacteria that should have it on food containing that antibiotic, only the bacteria that have the plasmid (and hence the resistance gene) are able to grow. These bacteria grow and replicate to form individual “colonies” on a bacterial plate. Each colony has lots of cells but they all have the same genetic makeup.

BUT this only tells you if the *plasmid* is inside the bacteria not if your gene is inside that plasmid. If you want to know if the gene is *probably* in there, a few of the main techniques you can turn to are colony PCR, analytical digest, and blue-white screening. If you want to make super sure that the gene is in there (and that the sequence didn’t get mutated) you have to actually read that sequence with DNA sequencing. 

Here’s a basic overview of the different methods. I’m going to focus most on colony PCR in this post, but I’ll provide links to more in-depth posts on the other topics. If the terms don’t make sense yet, bear with me.

  • colony PCR: make copies of regions of the plasmid DNA that should have your insert in there. then look to see if you get pieces and/or how big those pieces are (big enough to include the insert?)
  • analytical digest: use restriction enzymes (restriction endonucleases) to cut the plasmid into pieces and then see how big those pieces are

For both of those methods, you get your readout by separating the pieces by size using agarose gel electrophoresis, a technique in which you use electricity to get pieces of DNA to swim through a gel mesh made from a sugar called agarose. The longer pieces get tangled more, so when you turn off the electricity they won’t have traveled as far. Then you stain for DNA and compare the size of the products to what you’d expect to see if the insert was or wasn’t there.

The next methods have different readouts.

  • blue-white screening: have your insertion spot in the middle of a gene in the plasmid that’s needed to produce a blue-colored product. If your insert got inserted it’ll mess up that gene, so the bacteria can’t make the blue-maker so they can’t make the blue product and the colonies with the insert will be white while colonies without the insert will look blue (assuming you feed them the right food)
  • note: this method has the advantage that you don’t have to do extra work – you get a readout at the stage where you’re just starting your check with the other methods. but it has the disadvantage that you don’t get any information about the size of the insert so there’s less confirmation that the right thing got in
  • I’m not going to talk more about this in today’s post, but check out: http://bit.ly/2TVwNZe 

So, you get more information with the first couple methods than with blue-white screening, but you really don’t know for sure that all’s okay until you do

  • DNA sequencing – this is basically just what it sounds like – you send a sample of your plasmid to a sequencing company along with short DNA pieces called primers that tell them where to start sequencing, and they’ll send you back the DNA sequence in that region. The benefit’s that it’s the only way to really know if the insert is okay, especially when you’re doing site-directed mutagenesis and checking to see if you made single DNA letter swaps, which don’t change the length of the insert. Downside’s that you can’t do it yourself so you have to wait (and pay, though it’s typically fairly cheap for these)

So, more details…


Polymerase Chain Reaction (PCR) is a way to amplify (make lots of copies of) short stretches of DNA from longer pieces of double-stranded (ds) DNA we call the TEMPLATE. We choose what region to copy by designing short pieces of DNA called PRIMERS to bookend the start & stop of this region (1 per strand) so that a protein called DNA POLYMERASE (DNA Pol) can copy each strand. 

I often use PCR to make copies of the insert as part of the actual molecular cloning process, but now with colony PCR we’re past cloning and it’s still useful! We can tell if the insert *probably* got in okay by using PCR with cleverly designed primers. You have a few options, and the presence/size of the copied products (which you can tell by agarose gel electrophoresis) can tell you different things:

INSERT-SPECIFIC PRIMERS: both primers are in the INSERT (the gene you put in). This is a YES/NO for whether your insert’s present. If your gene’s not there, there will be nothing for the primers to bind to -> no product. But if your gene is there, the primers will latch on & Pol will copy between them -> product (note that by product I mean a defined, specific product, not “nonspecific products” that can come from primers binding incorrectly (mispriming))

  • tells you if your gene is present BUT NOT if your gene is where you want it…
  • advantage is that you can use this same set of primers to test for your insert in different plasmids

VECTOR-SPECIFIC PRIMERS: both primers are in the VECTOR, straddling the insertion site. As long as the plasmid’s present, you should get some sort of product, but it’s the SIZE of the product that gives you your answer (not a simple yes/no like above) – if your insert’s not in the vector the product will be really short but if your insert’s in there, the product should be bigger (that short length PLUS the length of your insert)

  • tells you if your gene (or something of that same size) is present IN YOUR VECTOR
  • useful because you can use the same pair of primers to test different constructs since the primers are specific for the vector not the insert
  • does NOT tell you whether your insert is inserted in the correct direction. for that you can use

ORIENTATION-SPECIFIC PRIMERS: one primer is in the insert & the other is in the vector

  • you’ll only get a defined product if your gene is facing the right way (not put in backwards so that the “start making protein here” message on the plasmid is next to the “stop making protein here” message on the DNA)
  • tells you 1) is your plasmid present 2) is your gene present 3) is your gene in your plasmid and 4) is your gene “backwards”
  • downside is you have to design a specific primer
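To see how the three primer strategies give different answers, here’s a minimal toy sketch of “in silico” colony PCR. Everything in it is made up for illustration: the tiny vector and insert sequences, the 10-letter primers, and the helper names `revcomp` and `pcr_product_size` are all hypothetical, and the matching rules are much simpler than real life (exact matches only, linear template, no mispriming).

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def pcr_product_size(template, primer_a, primer_b):
    """Length of the product the two primers would give, or None if no product.

    Toy rules (assumptions): exact matches only, linear template, first match
    wins. The primers must bind opposite strands and point toward each other.
    """
    for strand in (template, revcomp(template)):
        start = strand.find(primer_a)
        if start == -1:
            continue
        end = strand.find(revcomp(primer_b), start + len(primer_a))
        if end != -1:
            return end + len(primer_b) - start
    return None


# Hypothetical toy sequences (real vectors and genes are far longer)
VEC_UP = "TTGACGGCTACCATTCAGGA"            # vector just before the insertion site
VEC_DOWN = "GTCCTAACGTAGGCATTGCC"          # vector just after the insertion site
INSERT = "ATGAAAGGTGCACTTCGAGGAGCTTTAA"    # the "gene"

templates = {
    "empty vector": VEC_UP + VEC_DOWN,
    "insert forward": VEC_UP + INSERT + VEC_DOWN,
    "insert backwards": VEC_UP + revcomp(INSERT) + VEC_DOWN,
}

vec_fwd = VEC_UP[:10]               # binds the vector upstream of the site
vec_rev = revcomp(VEC_DOWN[-10:])   # binds the vector downstream of the site
ins_fwd = INSERT[:10]               # binds the start of the gene
ins_rev = revcomp(INSERT[-10:])     # binds the end of the gene

primer_pairs = {
    "insert-specific": (ins_fwd, ins_rev),
    "vector-specific": (vec_fwd, vec_rev),
    "orientation-specific": (vec_fwd, ins_rev),
}

for pair_name, (p1, p2) in primer_pairs.items():
    for template_name, seq in templates.items():
        size = pcr_product_size(seq, p1, p2)
        print(f"{pair_name:22s} {template_name:18s} -> {size}")
```

In this toy setup the insert-specific pair gives a product for either orientation, the vector-specific pair gives a short product for the empty vector and a longer one for either orientation, and only the orientation-specific pair tells forward from backwards.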

So we can use PCR as a secondary “screen” when cloning, but we still haven’t answered the question of how we get the DNA to screen. You can purify plasmid DNA out of bacteria, often using “mini prep kits.” They’re easy to use, but if you have lots of colonies to test, you don’t want to waste time purifying something “useless.” So you can skip the purification (for now) and add a teeny bit of the whole bacterial cells into your PCR mix.

Just barely touch the colony with a sterile toothpick or pipet tip & swirl it around a bit in your PCR mix. (Alternatively, you can resuspend a bit of it (pipet it up and down in some water) and add some of this to the PCR mix.)

When the reaction heats up to MELT the DNA (separate the strands) it also LYSES the cells (breaks them open) so that the DNA “spills out” and DNA Pol can latch on.

If you get a positive result, you can then go ahead and grow up more of that colony and purify it. 


Another “quick check” is an analytical restriction digest – lots more here: http://bit.ly/RFLPanalysis 

But the basic idea is that you use restriction enzymes, which are sequence-specific DNA cutters that come from bacteria. Different restriction enzymes recognize & cut different sequences (restriction sites), so you find enzymes whose sites are present in your plasmid and/or insert. And then you cut out, within, etc., the part of your plasmid that should contain your gene. Then you see how many pieces you get & how big those pieces are (with agarose gel electrophoresis). If your gene is there the piece will be much bigger than if it’s not there and/or depending on where your cut sites are you will get more pieces. And while you can’t tell exactly how many DNA letters are there, you get an idea whether you’re in the right ballpark.

Take the PLASMID DNA you want to check for insert ⏩ add restriction enzyme(s) (and a buffer containing salts, pH stabilizers, Mg2+, etc. to keep the enzyme happy) ⏩ incubate it at the temperature the enzyme(s) work best at (often 37°C) & give it time to cut ⏩ then you run an agarose gel like before ⏩ check how many fragments you see & how big they are

  • NUMBER of pieces you get depends on how many restriction sites there are for the enzyme(s) you use
  • SIZE of pieces depends on how much stuff’s in between the sites

If your gene contains a restriction site that the vector backbone doesn’t, presence of the gene will lead to 1 more cut, so you get an extra product (or, if this is the only site present, you’ll get 1 linear product instead of the circular plasmid). Circular DNA runs kinda unpredictably so it’s nicest if you have 1 site in backbone & 1 site in insert – you can use multiple restriction enzymes to make this happen
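The number-and-size logic can be sketched with a little arithmetic on a circular plasmid. This is only a toy: the plasmid length, insert length, and cut positions below are made-up numbers, and `digest_circular` is a hypothetical helper, not part of any real tool.

```python
def digest_circular(plasmid_len, cut_positions):
    """Fragment sizes (largest first) after cutting a circular plasmid.

    No cuts -> empty list (the plasmid stays an uncut circle).
    One cut -> a single linear piece the full length of the plasmid.
    """
    cuts = sorted(set(pos % plasmid_len for pos in cut_positions))
    fragments = []
    for i, pos in enumerate(cuts):
        next_pos = cuts[(i + 1) % len(cuts)]
        # distance to the next cut, wrapping around the circle
        fragments.append((next_pos - pos) % plasmid_len or plasmid_len)
    return sorted(fragments, reverse=True)


# Hypothetical numbers: 5000 bp empty vector, 1200 bp insert going in at position 1200
VECTOR_LEN = 5000
INSERT_LEN = 1200

# Case 1: two sites in the vector flanking the insertion spot (at 1000 and 1500).
# Same NUMBER of pieces either way, but one piece grows by the insert length.
print(digest_circular(VECTOR_LEN, [1000, 1500]))
print(digest_circular(VECTOR_LEN + INSERT_LEN, [1000, 1500 + INSERT_LEN]))

# Case 2: one site in the backbone (at 1000) plus one site 300 bp into the insert.
# Without the insert you just linearize; with it you get an extra piece.
print(digest_circular(VECTOR_LEN, [1000]))
print(digest_circular(VECTOR_LEN + INSERT_LEN, [1000, 1200 + 300]))
```

With these made-up numbers, case 1 goes from pieces of 4500 and 500 to pieces of 4500 and 1700, and case 2 goes from one 5000 piece to pieces of 5700 and 500.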

Sometimes your gene doesn’t have any unique restriction sites (or at least none you have matching enzyme for) BUT all hope’s not lost! You have a few options 

  1.  buy another restriction enzyme & likely end up w/racks & racks of various ones you’ll probably never need again… 
  2. check if you can introduce a “silent mutation” that changes the DNA sequence but NOT the protein sequence it codes for (like “grAy” vs “grEy” – different spellings, same meaning BUT different restriction enzymes will only recognize 1 & not other) If your restriction enzyme is pickier than your protein-makers, you can take advantage of these differences

To make a protein from DNA, DNA first gets copied (transcribed) into mRNA which then gets turned into protein (translated). It takes 3 DNA (or RNA) letters (the same bases except the T’s become U’s) (more here: http://bit.ly/2yCisGq ) to spell a single protein letter (amino acid). We call these 3 nucleotide “code words” CODONS

There are 4 different nucleotides & 20 (common) amino acids – do the math (4 bases & 3 spaces so 4^3 = 64) & you have more codons than amino acids. How to reconcile this? Multiple codons spell the same word (degeneracy) – the protein translation machinery (ribosomes) knows “gray” & “grey” mean the same thing BUT a restriction enzyme might be more of a “purist” & refuse to get near anything that doesn’t spell it the way it considers “proper.” So you can mutate the DNA’s sequence before you stick it in, changing the spelling so it’ll get cut by the restriction enzyme BUT the protein product won’t be affected

A helpful tool for checking for opportunities to do this is WatCut: http://watcut.uwaterloo.ca/template.php

  3. use restriction enzymes that only recognize the vector – you’ll get the SAME NUMBER of products regardless of whether your insert is in there, BUT the SIZE of the products will be DIFFERENT – 1 will be a lot bigger if your gene’s inside (similar to the logic behind colony PCR w/ vector-specific primers)
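To make the “grAy vs grEy” silent-mutation idea from option 2 concrete, here’s a small sketch. The 12-letter gene snippet is made up for illustration; the codon table is the standard genetic code and GGATCC is the BamHI recognition site. Swapping one synonymous glycine codon (GGT to GGA) creates the site without changing the protein.

```python
from itertools import product

# Standard genetic code, built in T, C, A, G order ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence (length a multiple of 3) into protein letters."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


BAMHI_SITE = "GGATCC"

original = "ATGGGTTCCAAA"   # hypothetical snippet: Met-Gly-Ser-Lys, no BamHI site
mutant = "ATGGGATCCAAA"     # GGT -> GGA (both spell Gly), now contains GGATCC

print(len(CODON_TABLE), "codons for", len(set(AMINO_ACIDS) - {"*"}), "amino acids")
print(translate(original), translate(mutant))
print(BAMHI_SITE in original, BAMHI_SITE in mutant)
```

The ribosome reads both spellings as the same protein, but only the mutant spelling gets cut by the enzyme.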

📝 A couple practical notes

  • To help you compare, run some controls: negative control: don’t add enzyme – shows you what uncut looks like. also run plasmid-only (you know it doesn’t have insert) and insert-only (you know it doesn’t have plasmid)
  • Unlike colony PCR, you have to purify the DNA first – & you don’t want to waste all your DNA (if you do decide to “hire it” you’ll need the rest) – so you typically just test a tiny bit in ~20 μL total (1 μL (microliter) is 1 millionth of a liter)
  • You want to check that there aren’t a lot of sites for that enzyme on your plasmid or else you’ll get lots of little pieces that are hard to analyze 
  • The enzymes are “numbered” not “lettered” (e.g. EcoRV isn’t an all-electric RV model, it’s EcoR FIVE (learned this the embarrassing way). It tells you it was the 5th restriction enzyme found in the “RY13” strain of E. coli)

more on restriction enzymes: http://bit.ly/reasesvsmtases 

BUT – with either of these methods, you still don’t know if there are any typos! (is the sequence correct?) Both restriction enzymes and colony PCR primers only require that the short stretches of DNA they recognize are there & typo-free but that’s like seeing that one word in a document is spelled correctly and then taking that as proof you didn’t make any typos anywhere else in the document. 


For definitive evidence, you turn to DNA SEQUENCING (note: I don’t usually do the colony PCR or digest step unless I’m having problems (often not worth it)). But unlike the type of sequencing that sequences “all” your DNA, we’re only interested in sequencing the specific region with our gene.

Using sequencing primers is similar in setup and concept to vector-specific colony PCR – use 1 primer that matches a sequence upstream of your gene and one downstream. But, unlike in colony PCR, where you have both primers in the same reaction, for sequencing you run a separate reaction for each primer. And instead of focusing on making tons of copies, you focus on reading carefully – you read out the sequence as each base gets added. So instead of making double-stranded (ds) copies of a defined region of DNA, you start making a copy of a single strand and you “stalk” Pol as it works

You put in fluorescently-labeled nucleotides (DNA letters) so you can “watch them be added” – most methods use special dye-terminator nucleotides where the fluorescently-labeled letters are “defective” – they’re dideoxynucleotides (ddNTPs) (as opposed to the “normal” singly-oxygen-deficient deoxynucleotides (dNTPs)). DNA has 1 less oxygen than RNA (at the 2’ position (“right leg” of the sugar)) – and ddNTPs are also missing the 3’ OH oxygen (“left leg”) so there’s nowhere for more nucleotides to be added after it – it thus acts as a chain terminator – and if it’s fluorescently-labeled (with different colors for the different letters) you can see what letter it ended in

You put in a mix of unlabeled, normal letters and labeled defective letters so you get pieces that all start at the same place (primer binding site) but stop at different letters after traveling different distances – you can run those pieces through capillary electrophoresis to separate them by size (like a really long, thin version of the agarose slab gels we often run) and you shine a laser at them as they travel so you can “read out” what letter each piece ends in and, from that, read out the sequence. more here: http://bit.ly/DNAsequencingmethods 
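Here’s a toy sketch of that chain-termination logic. It’s heavily idealized: it assumes exactly one fragment for every possible stopping point, and the primer and sequence are made up (`terminated_fragments` and `read_by_size` are hypothetical helper names). The point is just that sorting pieces by size and reading the last (labeled) letter of each gives you back the sequence.

```python
import random


def terminated_fragments(primer, new_strand):
    """One (length, last_letter) pair for each place a labeled terminator could land."""
    return [
        (len(primer) + i, new_strand[i - 1])
        for i in range(1, len(new_strand) + 1)
    ]


def read_by_size(fragments):
    """Shortest piece first, like capillary electrophoresis; read each end letter."""
    return "".join(letter for _, letter in sorted(fragments))


primer = "GCTAGT"               # hypothetical sequencing primer
new_strand = "ATGGCTAAAGGT"     # hypothetical stretch Pol copies after the primer

fragments = terminated_fragments(primer, new_strand)
random.Random(0).shuffle(fragments)   # in the tube, the pieces are all mixed together

print(read_by_size(fragments))
```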

To make gene-in-plasmid checking easier, plasmids often contain “standard” sequencing primer sites flanking the gene insertion site. These match standard primers that the sequencing companies will often provide for free – just ship them your plasmid, tell them what to use & they’ll send you the sequences.

But you can also use your own sequencing primers. I have primers that match the regions of the plasmids I commonly use right before and after where the gene goes in. They work no matter what gene’s in there and they’re designed so that their orientation sends Pol traveling into the insert. When you’re checking cloning products it’s especially important that you get good coverage of the insertion sites because that’s where errors are most likely to occur.

If you have a short gene you might be able to read it all with just end primers (if it’s really short only one may suffice) – but if it’s longer you might need to add additional primers that start in the gene itself (you’ll definitely have to custom-design those since they’re gene-specific not vector-specific) 

I usually send DNA from 3-5 colonies of each construct for sequencing – I add the colony to liquid media to let it grow lots overnight, then do a mini prep (alkaline lysis) to purify out the plasmid DNA. Then I use the NanoDrop to figure out its concentration based on its UV absorbance (light-stealing)(the more DNA the higher the absorbance & you can use Beer’s Law to convert between the 2). I want to know the concentration because sequencing companies want a certain quantity of DNA so I need to know how much to add. more here: http://bit.ly/2N4nzXE
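As a rough sketch of that concentration math: for double-stranded DNA, an absorbance of 1.0 at 260 nm (1 cm path length) corresponds to about 50 ng/μL. The requested amounts below (500 ng DNA, 1 μL primer, 15 μL total) are hypothetical placeholders, so check what your sequencing company actually asks for. The function names are made up too.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # standard conversion for dsDNA at 1 cm path length


def dsdna_concentration(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration (ng/uL) from A260 (normalized to 1 cm path)."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def sequencing_mix(conc_ng_per_ul, target_ng, primer_ul, final_ul):
    """Return (uL of DNA, uL of water) needed to hit the requested mass and volume."""
    dna_ul = target_ng / conc_ng_per_ul
    water_ul = final_ul - dna_ul - primer_ul
    if water_ul < 0:
        raise ValueError("DNA is too dilute to fit in the requested final volume")
    return dna_ul, water_ul


conc = dsdna_concentration(a260=2.0)   # hypothetical NanoDrop reading
dna_ul, water_ul = sequencing_mix(conc, target_ng=500, primer_ul=1.0, final_ul=15.0)
print(f"{conc:.0f} ng/uL -> {dna_ul:.1f} uL DNA + 1.0 uL primer + {water_ul:.1f} uL water")
```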

I calculate so I’m in the recommended range, add one of the primers, & add water to the desired final volume. I do this once per primer. And then I wrap it in bubble wrap, stick it in a 50mL Falcon tube, and send it off! A couple days later I get sequencing results as a chromatogram with a peak of a “different color” for each letter (the different colors are just the company’s way of overlaying the different fluorescence channels, so it’s not like “T” is actually red & “C” actually blue). In addition to this raw chromatogram data you get the corresponding sequence (at least how the computer reads it – sometimes if the traces are “messy” or you have a long string of the same letter it can miss one, call the wrong one, etc.) – so be sure to look at the traces and not just the letters – and be suspicious of the beginning and end of the data, where the signal’s weak and the base calling’s not too reliable.
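Once you have the letters back, a first-pass “typo check” against the sequence you expected might look something like this. It’s a deliberately simple sketch with made-up sequences: it assumes the read already lines up with the start of the expected sequence and has no missing or extra letters (real reads need a proper alignment), and the `trim` value for ignoring the unreliable ends is an arbitrary choice. It doesn’t replace looking at the traces.

```python
def find_mismatches(expected, read, trim=0):
    """List (position, expected_letter, read_letter) where the read disagrees.

    Assumes the read starts at expected[0] and has no insertions/deletions.
    Ignores `trim` letters at each end of the compared stretch.
    """
    n = min(len(expected), len(read))
    return [
        (i, expected[i], read[i])
        for i in range(trim, n - trim)
        if expected[i] != read[i]
    ]


expected = "ATGGCTAAAGGT"   # hypothetical sequence you meant to clone
read = "CTGGCTAAGGGT"       # hypothetical base calls: messy first letter + 1 real change

print(find_mismatches(expected, read))          # includes the messy first letter
print(find_mismatches(expected, read, trim=1))  # ends ignored, only the internal change
```

Any mismatch that survives trimming is worth checking against the trace itself (and ideally against the read from the other primer) before you decide it’s a real mutation.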

more on topics mentioned (& others) #365DaysOfScience All (with topics listed) 👉 http://bit.ly/2OllAB0
