I know you’ve been hearing A LOT about COVID-19 (the disease caused by the novel coronavirus SARS-CoV-2, aka 2019-nCoV). I have been avoiding posting anything about it because I didn’t want to just add to the flood of information that can feel super overwhelming, and I know we can all use a breather. I am really grateful for public health researchers and their outreach, but this isn’t going to be another one of those posts telling you to wash your hands (which you should do). I’m not going to get into any of the politics – and I’m not going to make any predictions – so please don’t ask me about that. But I thought one relatively under-covered area I might be able to help explain is the science behind how the diagnostic tests actually work.
Most of the tests, including the US’s official CDC tests, use a method called Reverse-Transcriptase Polymerase Chain Reaction (RT-PCR) to detect SARS-CoV-2’s genetic information. SARS-CoV-2 is an RNA virus – instead of storing its genetic blueprint (genome) in DNA like we do, it keeps it in RNA, and as a single strand.
Within this RNA are instructions for making proteins that the virus needs. A virus really only “cares” about one thing – making more copies of itself and infecting more cells, and so the genes you find in its RNA reflect what it needs to do this. A few examples:
- It has to make copies of its RNA, so it needs an RNA-dependent RNA polymerase (RdRP) that can travel along the RNA strand and use it as a template for making a complementary RNA strand. That complementary strand can then be used as a template to make a copy of the original strand, which can be used to make another complementary strand, which… you get the point. Thanks to the complementary base-pairing nature of RNA & DNA (the letter A binds to U (or T in DNA) and G binds to C), the genome is “copyable,” and since the virus encodes its own RNA-dependent RNA polymerase, it’s able to copy it.
- Then it has to coat those RNA copies in a protein “shell” called a nucleocapsid for protection on its journey, so it needs a gene for the nucleocapsid protein (this is called the N gene and it is what the CDC tests look for).
- Next it has to be able to bud out of the cell it’s currently in so that it can go find and get into a new cell where it can do it all again, so it needs an envelope protein for this (the WHO’s tests look for this “E gene”).
- But, before it can get into a new cell, the virus needs to “stick” to that cell’s surface and it does this using Spike (S) proteins that jut out from the viral envelope like a crown (hence the name corona) and bind to receptors on the host-cell-to-be’s surface.
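Here’s a tiny Python sketch of the “copy of a copy” idea from the RdRP example. It’s a toy under some big assumptions: the sequence is made up (not real SARS-CoV-2 sequence), and it only models the pairing rules (A with U, G with C) plus the fact that the new strand runs in the opposite direction, not the enzyme itself.

```python
# Base-pairing rules for RNA: A pairs with U, G pairs with C.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_rna(template: str) -> str:
    """Return the strand an RNA-dependent RNA polymerase would build on `template`.

    The two strands pair antiparallel, so writing the new strand in the
    usual 5'->3' direction means complementing the template back to front.
    """
    return "".join(RNA_PAIRS[base] for base in reversed(template))


genome = "AUGGCGUUAC"               # made-up stretch, for illustration only
copy_1 = complementary_rna(genome)  # the complementary strand
copy_2 = complementary_rna(copy_1)  # a copy of the complementary strand

print(copy_1)            # GUAACGCCAU
print(copy_2 == genome)  # True
```

Copying the complement gives you back the original, which is why the virus can keep bouncing back and forth between the two strands.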
Humans don’t need those things for ourselves, so our genome doesn’t contain instructions for them. That means if scientists find those RNAs in a person, it indicates that the virus is in that person.
But how do you go about finding that RNA? There are a couple of challenges. The first is that, even if there’s a bunch of virus, the total amount of RNA is still pretty tiny. And, to make things even worse, RNA is pretty unstable – so, in the process of isolating it (RNA extraction), you lose some of what you started with.
So you need really sensitive methods to detect it amongst all the other RNA & DNA (collectively called “nucleic acids”) present. And one way to do this is to amplify the signal while not changing the “noise” – make a bunch of copies specifically of the viral genetic info that’s present.
Thankfully, scientists have known since the 1980s how to get a DNA Polymerase to make lots of copies of specific stretches of DNA in vitro (in a tube) using a method called Polymerase Chain Reaction (PCR). The stretches to be copied (amplicons) are specified by short pieces of DNA called “primers” that are designed to bind where you want the copying to start and stop (one primer per strand). PCR is carried out in a series of cycles where you ANNEAL – bind the primers -> ELONGATE – let the DNA Pol copy the stretch to give you double-stranded DNA -> MELT – raise the temperature so the strands “unzip” and you can do it all again.
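To see how a primer pair “specifies” the amplicon, here’s a toy Python sketch. Assumptions: the sequences are made up, primers only bind where they match perfectly, and temperatures and enzymes are ignored entirely; the function names are just for illustration.

```python
from typing import Optional

# Base-pairing rules for DNA: A pairs with T, G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5'->3'."""
    return "".join(DNA_PAIRS[base] for base in reversed(seq))


def amplicon(template: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Return the stretch a primer pair would have PCR copy, or None.

    One primer per strand: the forward primer matches the top strand as
    written, while the reverse primer binds the bottom strand, so on the
    top strand we look for its reverse complement.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None  # forward primer has nowhere to bind
    stop_site = template.find(reverse_complement(rev_primer), start + len(fwd_primer))
    if stop_site == -1:
        return None  # reverse primer has nowhere to bind downstream
    return template[start:stop_site + len(rev_primer)]


# Made-up template: junk, forward-primer site, middle, reverse-primer site, junk
template = "TTTT" + "ACGTAC" + "GGGCCCAAATTT" + "GATCCA" + "TTTT"
print(amplicon(template, "ACGTAC", "TGGATC"))  # ACGTACGGGCCCAAATTTGATCCA
print(amplicon(template, "CCCCCC", "TGGATC"))  # None
```

In a real reaction, the sequence outside the primer sites gets left behind as the cycles go on, so it’s the stretch between the primers that piles up.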
But the DNA Pols we use are DNA-dependent DNA polymerases – a mouthful of a phrase that just means that they make DNA copies from DNA templates. Which brings us to problem number 2: SARS-CoV-2 is an RNA virus?!
Thankfully, we have a solution to this too, which is where the “RT” part of “RT-PCR” comes in. RT stands for Reverse Transcription, and it’s the process of making a DNA copy of an RNA template. It’s called reverse transcription because the process of going from DNA to RNA is called transcription, which is what our cells do to make RNA copies of our DNA genes to use as protein-making instructions.
So, before the actual PCR part, we can use a “Reverse Transcriptase” enzyme to make complementary DNA (cDNA) copies of the viral RNA, and now we have DNA that DNA Pol will happily copy if we provide primers.
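Here’s a toy Python sketch of what reverse transcription does to the letters. It’s a simplification under stated assumptions: the RNA is made up, and it leaves out the primer the enzyme needs to get started; it just shows that an RNA template (with U’s) becomes a DNA strand (with T’s) that pairs with it.

```python
# Pairing rules for building DNA on an RNA template:
# RNA A -> DNA T, RNA U -> DNA A, RNA G -> DNA C, RNA C -> DNA G.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA (cDNA) strand, written 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


viral_rna = "AUGGCGUUAC"  # made-up stretch, not real viral sequence
cdna = reverse_transcribe(viral_rna)
print(cdna)  # GTAACGCCAT
```

Once the output is made of A, C, G, and T only, it’s a plain DNA template as far as the DNA Pol is concerned.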
But we still need a way to detect the copies that get made. This is done using fluorescent probes, which are kinda like primers in that they’re short DNA pieces that pair specifically with the region you’re interested in. But instead of binding the ends, they bind somewhere in the middle of the copied region and allow you to see that a copy’s been made. How it works is pretty cool, so if you stick around, I’ll give some more details later.
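As a toy illustration of “binds somewhere in the middle,” this Python sketch checks whether a probe sequence lands between the two primer sites of an amplicon. Assumptions: made-up sequences, a probe that matches the top strand exactly (real probes can be designed against either strand), and a hypothetical helper name.

```python
def probe_binds_inside(amplicon: str, fwd_len: int, rev_len: int, probe: str) -> bool:
    """True if `probe` matches the amplicon between the two primer sites.

    Simplification: the probe is assumed to match the top strand exactly.
    """
    middle = amplicon[fwd_len:len(amplicon) - rev_len]
    return probe in middle


# Made-up amplicon: 6-letter primer site, 12-letter middle, 6-letter primer site
amp = "ACGTAC" + "GGGCCCAAATTT" + "GATCCA"
print(probe_binds_inside(amp, 6, 6, "GCCCAAA"))  # True: sits in the middle
print(probe_binds_inside(amp, 6, 6, "ACGTACG"))  # False: overlaps a primer site
```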
The primer/probe combos allow us to look for those little stretches of letters in the viral genome that this virus has but we don’t (such as parts of the N, E, or RdRP genes). But in order to find those little stretches on the viral RNA, we first need to find the viral RNA. Where do we look for that? Tests are normally performed on swabs from your nose or throat (officially referred to as nasopharyngeal (NP) or oropharyngeal (OP) specimens, respectively). You might also be asked to cough up some sputum to provide a “lower respiratory” specimen as well.
update: now a lot of the tests are done with shorter swabs that just swirl around in the front section of your nostril
With all the news reporting on COVID-19, it’s easy to forget that influenza (flu) and just your everyday common cold are still raging. And part of the reason that there’s a huge public health benefit to spreading out, and even just delaying, COVID-19 infections is that healthcare facilities are still full of flu patients and don’t have the capacity to take on a rush of people needing ventilators and other equipment (and doctors, nurses, beds, etc.). So I highly encourage you to check out the “flatten the curve” initiative. You can find more information on it here: https://www.npr.org/sections/health-shots/2020/03/13/815502262/flattening-a-pandemics-curve-why-staying-home-now-can-save-lives
The reason I’m bringing up these still-more-likely-in-most-cases causes of your sore throat and cough is that it’s important to rule them out and, even if it is not COVID-19, you (and epidemiologists) still want to know what is making you sick. So doctors often test for these first and/or in addition to SARS-CoV-2, which you can do if you also include primer/probe combos that target those other viruses and/or if you add on different tests. For example, you might have heard of the “BioFire” test – this is a chip-based test that can basically run tons of different PCR reactions in different wells of a single chip, so it can test for a large number of common respiratory pathogens at once. The 21 pathogens it tests for don’t yet include SARS-CoV-2, but the company that makes it, bioMérieux, is working to add it. update: they did!
But for now, let’s focus on the tests we currently have – the CDC tests use “TaqMan” probes. These “dual-labeled hydrolysis probes” work using something called FRET. Don’t fret if you don’t know what that means – let me explain. FRET stands for Förster Resonance Energy Transfer, and it’s this cool phenomenon that allows you to tell if 2 molecules are near each other. One molecule, the “fluorophore,” is able to give off light, but only if the other molecule, the “quencher,” isn’t nearby.
In the probes, they *are* nearby (at least in the beginning…). And speaking of beginning in a different sense, the fluorophore (a chemical group called FAM (6-carboxyfluorescein) in the CDC probes) is at the beginning of the probe (what we call the 5’ end) and the quencher (BHQ (Black Hole Quencher) in the CDC probes) is at the other end of the probe (the 3’ end). The probes are only about 20 letters long, so the quencher is near enough to the fluorophore to keep it from shining (they’re just a few nm apart, which is roughly 10,000 times smaller than the width of a hair).
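Here’s the back-of-the-envelope arithmetic behind “a few nm apart” as a short Python sketch. The numbers are assumptions, not measurements: ~0.34 nm per base pair (the rise of a standard B-form double helix, i.e. the probe once it’s bound to its target) and a hair about 70 micrometers wide.

```python
RISE_PER_BP_NM = 0.34   # assumed: rise per base pair in B-form double-stranded DNA
PROBE_LENGTH_NT = 20    # "about 20 letters long"
HAIR_WIDTH_NM = 70_000  # assumed: ~70 micrometers, a typical human hair

# Rough end-to-end length of a bound probe, i.e. fluorophore-to-quencher distance
end_to_end_nm = RISE_PER_BP_NM * PROBE_LENGTH_NT
ratio = HAIR_WIDTH_NM / end_to_end_nm

print(f"fluorophore-quencher distance ~ {end_to_end_nm:.1f} nm")  # ~ 6.8 nm
print(f"a hair is ~ {ratio:,.0f} times wider")                    # ~ 10,294 times
```

Hair widths vary a lot from person to person, so treat the ratio as an order of magnitude.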
Light is a form of energy, and different wavelengths of light have different energies. If you have a fluorophore and you shine light of a wavelength that the fluorophore “likes,” the fluorophore absorbs the light and enters an “excited state” – but it’s hard to stay excited for long, so it comes down from the high & releases most of the energy it absorbed as light (of a slightly longer wavelength) that you can detect.
But light isn’t the only way energy can be transferred – another way is through FRET. If the quencher likes the amount of energy the fluorophore would normally give off *and* that quencher and fluorophore are close enough, the quencher can absorb the energy that the fluorophore would normally give off as light – it “quenches” the fluorescence http://bit.ly/2m5hpfh
But in order for that quenching to happen, the fluorophore & quencher have to be close together. When they get separated, the fluorophore (also called the “reporter”) is free to shine.
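Why does separating them matter so much? The standard Förster relation says the fraction of energy handed to the quencher is E = 1 / (1 + (r/R0)^6), where r is the distance between the two and R0 is the distance at which half the energy gets transferred. Here’s a small Python sketch of that formula. The R0 of 5 nm is an assumed, typical-order value for illustration, not a measured number for the FAM/BHQ pair.

```python
def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Fraction of the fluorophore's excitation energy transferred to the quencher.

    Förster relation: E = 1 / (1 + (r / R0)**6). The default R0 = 5 nm is an
    assumption for illustration.
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


for r in (2, 5, 10, 50):
    e = fret_efficiency(r)
    print(f"{r:>3} nm apart: {e:.1%} transferred to quencher, ~{1 - e:.1%} left to shine")
```

That sixth power is the key: doubling the distance past R0 drops the transfer from 50% to under 2%, so once Taq chews the probe apart, the fluorophore goes from mostly dark to mostly bright.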
And they get separated when the DNA they’re bound to gets copied, because the DNA Pol used to do the copying, Taq polymerase, has 5’ nuclease activity. So, if it goes to copy some DNA and there’s a probe “road block,” Taq can “chew up” the probe when it runs into it – this allows Taq to displace the probe and carry out its copying – and it separates the fluorophore from the quencher, allowing us to see the light.
This light serves as a signal that a copy got made since Taq can “only” run into the probe if it’s copying the DNA the probe’s bound to (I mean theoretically they could just bump into each other anywhere but it’s really unlikely unless they’re brought together by the copying process). And a copy getting made means that the target sequence (such as the viral gene) was present to get copied, indicating infection.
But, since you have so few copies in the beginning, the amount of fluorescence you originally see is really low – indistinguishable from the background “noise” – so we need to amplify it.
Between copying cycles, there’s a melt step where the strands come apart, and this allows unchewed (quenched) probe to bind during the next anneal step (the primers bind then too). Then, in the elongation phase, where the DNA gets copied, Taq runs into and chews up more probe, so more dye gets “un-quenched” and you see more fluorescence.
Since each strand is used to make 1 copy each cycle, you go from 2 -> 4 -> 8 -> 16 -> 32, etc. (exponential growth), so you see an exponential increase in fluorescence until something runs out (probe, primer, nucleotides, etc.). The more copies you start with, the faster the fluorescence will climb, so you can designate a “reference threshold” level of fluorescence and then see “how fast” different sample/primer/probe combos reach that reference. By “how fast,” we usually mean how many amplification cycles it takes.
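The doubling can be sketched in a few lines of Python. This is an idealized model under assumptions: every strand gets copied every cycle, and “something runs out” is crudely modeled as a hard cap whose value is made up.

```python
def copies_after(cycles: int, start_copies: int, max_copies: int = 10**12) -> int:
    """Idealized PCR: every strand is copied once per cycle, so the count
    doubles each cycle until a reagent runs out (crudely modeled here as a
    hard cap of `max_copies`, an assumed number)."""
    return min(start_copies * 2 ** cycles, max_copies)


print([copies_after(n, 2) for n in range(5)])  # [2, 4, 8, 16, 32]

# Same number of cycles, different starting amounts:
for start in (10, 1_000):
    print(start, copies_after(20, start))  # 10 10485760, then 1000 1048576000
```

Starting with 100 times more material keeps you 100 times ahead at every cycle, which is why the bigger sample reaches any fixed threshold sooner.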
If you plot cycle # vs fluorescence, you get a sideways-candy-cane shaped curve, which there isn’t really a good symbol combo for, but kinda ,- ish (just look at the pic…)
What you’re looking for is a value called the Ct (cycle threshold) value, which is the # of cycles it takes to pass a “threshold line” set just above the background fluorescence level (noise) – when you cross the threshold, it means you’re “above background,” so you know you have real signal and not just noise. It’s hard to tell in the first cycles because you have so little being copied, but the more copies there are in your initial “little amount,” the sooner (in fewer cycles) you’ll cross the threshold.
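Here’s a toy Python sketch of reading a Ct value off an idealized curve. Every constant (background level, signal per copy, threshold, plateau, run length) is an assumption picked to make the numbers readable, not something from a real instrument or protocol.

```python
from typing import Optional

BACKGROUND = 100.0      # assumed baseline fluorescence (arbitrary units)
SIGNAL_PER_COPY = 1e-6  # assumed fluorescence added per amplified copy
MAX_COPIES = 1e12       # assumed plateau, when reagents run out
THRESHOLD = 1_000.0     # assumed threshold line, set above the background
MAX_CYCLES = 45         # assumed length of the run


def fluorescence(cycle: int, start_copies: float) -> float:
    """Idealized reading after `cycle` rounds of perfect doubling."""
    copies = min(start_copies * 2 ** cycle, MAX_COPIES)
    return BACKGROUND + SIGNAL_PER_COPY * copies


def ct_value(start_copies: float) -> Optional[int]:
    """First whole cycle at which the curve crosses the threshold, or None."""
    for cycle in range(MAX_CYCLES + 1):
        if fluorescence(cycle, start_copies) >= THRESHOLD:
            return cycle
    return None


for start in (100, 1_000, 10_000, 0):
    print(start, ct_value(start))  # 24, 20, 17, then None (never crosses)
```

Real instruments report fractional Ct values by interpolating between cycles; this sketch just rounds up to whole cycles. In the ideal doubling model, each 10-fold increase in starting copies shifts Ct earlier by about log2(10) ≈ 3.3 cycles, which is roughly what the 24 -> 20 -> 17 pattern shows.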
So, RT-PCR is commonly used to look at how many copies are originally present (not a “yes/no” but more of a “how much”). But with these tests, the # of copies originally present doesn’t tell you that much, because it depends on things like how much RNA was in the sample and whether some got degraded. Instead, you need a yes/no answer, so a threshold is identified, and if a sample/target combo makes it above the threshold (within the allowed number of cycles), that target is positive – but whether the test is considered positive overall depends on how the other targets went.
The tests usually look for at least 2 SARS-CoV-2-specific gene parts to be super sure. If both get amplified, the test is positive; if one gets amplified, the test is inconclusive; and if neither gets amplified, the test is negative (although there is always a chance of a “false negative” especially if the samples got degraded – remember how fragile RNA is!)
Of course, this is all assuming that the negative control (a sample you know contains no SARS-CoV-2 RNA) came back negative and the positive control (a sample you know contains SARS-CoV-2 RNA) came back positive – if they didn’t, there’s something wrong with the test and the results are invalid.
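The decision rules above (controls first, then count how many viral targets came up) can be sketched as a small Python function. Assumptions: a Ct cutoff of 40 cycles picked for illustration (real protocols set their own), `None` meaning “never crossed the threshold,” and hypothetical function names.

```python
from typing import Optional, Sequence

CT_CUTOFF = 40.0  # assumed: a target counts as detected if it crosses before this cycle


def detected(ct: Optional[float], cutoff: float = CT_CUTOFF) -> bool:
    """A target is positive if its curve crossed the threshold early enough.
    `None` means the curve never crossed the threshold at all."""
    return ct is not None and ct < cutoff


def interpret(viral_cts: Sequence[Optional[float]],
              positive_control_ct: Optional[float],
              negative_control_ct: Optional[float]) -> str:
    if not viral_cts:
        raise ValueError("need at least one viral target")
    # Controls first: if they misbehave, nothing else can be trusted.
    if not detected(positive_control_ct) or detected(negative_control_ct):
        return "invalid"
    hits = sum(detected(ct) for ct in viral_cts)
    if hits == len(viral_cts):
        return "positive"
    if hits == 0:
        return "negative"
    return "inconclusive"


print(interpret([25.1, 26.3], 28.0, None))  # positive
print(interpret([25.1, None], 28.0, None))  # inconclusive
print(interpret([None, 41.5], 28.0, None))  # negative (41.5 is past the cutoff)
print(interpret([25.1, 26.3], 28.0, 36.2))  # invalid: negative control lit up
```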
And this can be a real problem, as the US found out the tragically hard way. RT-PCR is *really* sensitive – because each original RNA copy that’s there gets exponentially amplified. So a tiny amount of contamination can cause a negative sample to test positive (a so-called “false positive”). And this might have been what was happening with some of the CDC’s early test kits – some of the primer/probe combos were sometimes giving false positives. update: more on that here: https://bit.ly/cdctestproblems
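To see why a speck of contamination is such a big deal, here’s a quick Python calculation with the idealized doubling model. Both constants are assumptions for illustration: the number of copies needed before the signal clears background, and the length of the run.

```python
import math

DETECTABLE_COPIES = 1e10  # assumed: copies needed before signal clears background
RUN_CYCLES = 40           # assumed: cycles in a typical run


def cycles_to_detect(stray_copies: int) -> int:
    """Whole cycles of perfect doubling needed to turn a few stray copies
    into a detectable amount."""
    return math.ceil(math.log2(DETECTABLE_COPIES / stray_copies))


for stray in (1, 10, 100):
    n = cycles_to_detect(stray)
    print(f"{stray:>3} stray copies -> detectable after {n} cycles "
          f"(within a {RUN_CYCLES}-cycle run: {n <= RUN_CYCLES})")
```

Under these assumptions, even a single contaminating copy gets there in about 34 cycles, well inside the run, which is how a negative sample can end up looking positive.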
Until recently, other labs were restricted from producing their own tests for the virus, so it has taken a long time for testing to be conducted in the U.S. and there is still a great need to ramp it up. There are also efforts being made to semi-automate the process so that more tests can be done more quickly – the CDC protocol uses 96-well PCR plates (about the size of a big iPhone, with the wells about hole-punch-sized) that scientists have to manually pipet into. So, even though the PCR part only takes a few hours or less, the setup takes a while. Automated and semi-automated methods help speed this up but require more expensive equipment.
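A quick bit of plate arithmetic shows why manual setup adds up. Every number except the 96 wells is an assumption for illustration: three reactions per sample (say, two viral targets plus one extra check), two control samples per plate, and two pipetting additions per well (reaction mix, then sample).

```python
PLATE_WELLS = 96
REACTIONS_PER_SAMPLE = 3  # assumed: e.g. two viral targets plus one extra check
CONTROL_SAMPLES = 2       # assumed: one positive and one negative control
ADDITIONS_PER_WELL = 2    # assumed: reaction mix, then sample RNA

sample_slots = PLATE_WELLS // REACTIONS_PER_SAMPLE
patient_samples = sample_slots - CONTROL_SAMPLES
pipetting_steps = PLATE_WELLS * ADDITIONS_PER_WELL

print(f"{patient_samples} patient samples per plate")       # 30
print(f"{pipetting_steps} pipetting steps to fill a plate")  # 192
```

So, under these assumptions, each plate means roughly 190 careful hand-pipetting steps for only about 30 patients, which is exactly the kind of repetitive work automation helps with.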
But even if the PCR part is greatly sped up, RNA extraction is still a more complicated step. One potential holdup is that RNA extraction is typically done using commercially available “kits” like a DSP Viral RNA Mini kit. The kits make things easier, but they aren’t necessary – RNA was being extracted way before Qiagen started making kits for it – and here’s some info on the “old-school” method of RNA extraction, which uses phenol-chloroform to isolate RNA from other stuff based on their relative solubilities http://bit.ly/2Xj4Zyc
The RT-PCR tests are just one way to test for the virus – and they only detect it when people are still acutely infected and the virus is still making all that RNA to make all the proteins it needs to make more of itself and infect more cells.
Once the virus is “conquered” by a person’s immune system, that viral RNA isn’t there anymore; however, evidence of the proteins made from it remains – the immune response that allowed the body to fight off the virus involved making proteins called antibodies that recognize specific pieces of the viral proteins as “foreign” and trigger an immune response.
After the initial infection, it takes a while for the body to develop antibodies against the virus – the process involves the viral proteins getting chopped up and their pieces placed “on display,” held by proteins jutting out from immune cells. Your body goes through a random “trial and error” approach to making antibodies that recognize (bind to) those viral protein pieces and then makes more of the matching antibodies. More here: http://bit.ly/antibodytypesuses
Some of these antibodies stick around after the infection’s over to “keep watch” so that, if that same virus tries again, the immune system doesn’t have to go through the trial and error phase of finding an appropriate antibody. So, tests that look for antibodies can see if someone previously had the virus, even after they’ve recovered, and this can be used to trace cases back to see the line of transmission even if the transmitters are no longer symptomatic and don’t have the RNA that the RT-PCR tests could detect.
The antibody tests are quicker, and they’re typically done on blood samples, but a downside is that, since the antibodies they look for only show up once the immune response starts gaining ground on the virus, they can’t detect an infection as early as RT-PCR can.
Vaccines work by using non-infectious virus or virus parts to activate the immune system and get your body to go through that trial-and-error process of generating antibodies that recognize the virus and put it on the “watch list,” without you having to first have the disease. Scientists are working to create safe, effective vaccines against SARS-CoV-2, but it’s going to take quite some time because they have to be developed and then tested for safety and efficacy. update: looking good!
I hope this post helps people understand what’s going on, and, speaking of help, I wanted to be super duper sure I was only giving the most accurate information possible, so I got some “peer review” from friends and I’d like to give them a huge shout-out. Thank you to Katie Meze, Dr. Alexandra Newton, Dr. Justin Kinney, Dr. Charles Murtaugh, and Dr. Elisa Zhang.
more on topics mentioned (& others) #365DaysOfScience All (with topics listed) 👉 http://bit.ly/2OllAB0
update: After originally posting this, with support from the IUBMB, volunteers from around the world pitched in to translate the figures into close to 30 languages, and you can find them and subsequent posts I did on additional types of coronavirus tests (such as antigen tests and some of the rapid tests) here: https://bit.ly/covid19bbresources
updates added 11/22/20, but I wanted to leave the original wording because it seems kinda historical-ish right now and represents the frame of mind I was in when all of this was just getting started