In tests we trust? Too much? We (desperately) need Covid-19 tests, so a lot of companies are starting to make and/or sell their own versions. BUT a bad test can be worse than no test, so what makes a test better than all the rest? In test development, it’s a tug-of-war between sensitivity (find all those infected) and specificity (avoid false alarms). Those, combined with who gets tested, determine what the test’s predictive values are (how much can you really trust your result?). And, as we’ll see, beware of the term “accuracy”!

Covid-19 tests are on many people’s minds – both the diagnostic tests, which look for evidence of the virus’ genetic info to see if a person is currently infected, and the antibody tests, which look for evidence that a person was infected with the virus previously and their immune system fought it off (or that someone is in the later stage of the infection). A lot of attention has been given to issues surrounding developing tests, shipping them out, and making them accessible to patients. But less attention has been given to interpreting the results of the tests. When it comes to Covid-19 test results, there are no easy answers about how “good” a test is, but there are some key things to know. Note: I’m going to talk mostly in terms of diagnostic tests, which look to see if a person is infected. The same terminology applies to the antibody tests, but with those you aren’t looking for *current* infection.

You might hear a term called “accuracy” – when used in the technical sense, this tells you what percentage of test results are correct. So if 100 people get tested and 99 of them get the right result, the test has 99% accuracy. Sounds like a useful figure, right? Not so fast… As I show in the figures, a test can theoretically be useless at detecting disease and still have 100% accuracy. This is because accuracy depends on who’s getting tested. Say you have a worthless test that CANNOT detect disease and only ever gives negative results, BUT no infected people take the test… everyone who does take the test will get the right answer (no infection) – so 100% accuracy! But if everyone who takes the test IS infected, then they’ll all get the wrong result – so 0% accuracy! Nothing about the test itself changed – the only thing that changed was who got tested – so accuracy depends on who’s getting tested.
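To make the “worthless test” thought experiment concrete, here’s a minimal Python sketch. The function names and the two 100-person groups are hypothetical, made up for illustration; the “test” ignores the person entirely and always reports negative.

```python
def always_negative_test(is_infected: bool) -> bool:
    """A worthless 'test': it ignores the person and always reports negative (False)."""
    return False


def accuracy(results: list[bool], truth: list[bool]) -> float:
    """Fraction of test results that match each person's true infection status."""
    correct = sum(result == actual for result, actual in zip(results, truth))
    return correct / len(truth)


# Two hypothetical groups of 100 people (True = infected)
groups = {
    "nobody infected": [False] * 100,
    "everybody infected": [True] * 100,
}

for name, truth in groups.items():
    results = [always_negative_test(person) for person in truth]
    print(f"{name}: accuracy = {accuracy(results, truth):.0%}")
```

The test function is identical in both runs; only the group changes, and accuracy swings from 100% to 0%.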

But there are 2 key measures of a test’s usefulness that DO NOT depend on who gets tested: SENSITIVITY and SPECIFICITY. SENSITIVITY is how good a test is at finding all the true infections: the percentage of infected people who test positive. And SPECIFICITY is how good a test is at avoiding false positives (saying someone is infected when they aren’t): the percentage of uninfected people who test negative.

You can think of a test as sorting a bunch of people into a positive group and a negative group. That’s what’s “visible” in the results – but within each of those groups are “subgroups” that you can’t see. Within the positives you have the True Positives (TPs), people who are infected and test positive, and the False Positives (FPs), people who are NOT infected but test positive. And within the negatives you have the True Negatives (TNs), people who are NOT infected and test negative, and the False Negatives (FNs), people who ARE infected but test negative. We can’t tell which individuals’ results are true & which are false – but, as we’ll see when we look at predictive values, we can estimate how many people are in each subgroup and therefore figure out the probability that any individual’s result is right.
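Here’s a minimal sketch of how those hidden subgroups add up, assuming a hypothetical test with 90% sensitivity and 80% specificity (made-up numbers, not any real test). It estimates the expected size of each subgroup for 1,000 people at two different infection rates, then recomputes sensitivity, specificity, and accuracy from those counts.

```python
def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected number of people in each hidden subgroup when n people are tested."""
    infected = n * prevalence
    uninfected = n - infected
    tp = infected * sensitivity      # infected and test positive
    fn = infected - tp               # infected but test negative
    tn = uninfected * specificity    # not infected and test negative
    fp = uninfected - tn             # not infected but test positive
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


def summarize(c):
    """Sensitivity, specificity, and accuracy computed from subgroup counts."""
    sens = c["TP"] / (c["TP"] + c["FN"])
    spec = c["TN"] / (c["TN"] + c["FP"])
    acc = (c["TP"] + c["TN"]) / sum(c.values())
    return sens, spec, acc


for prevalence in (0.50, 0.05):
    counts = expected_counts(1000, prevalence, sensitivity=0.90, specificity=0.80)
    sens, spec, acc = summarize(counts)
    rounded = {k: round(v) for k, v in counts.items()}
    print(f"{prevalence:.0%} infected: {rounded} -> "
          f"sensitivity {sens:.0%}, specificity {spec:.0%}, accuracy {acc:.1%}")
```

Sensitivity and specificity come out the same in both runs, but accuracy shifts (85% vs. 80.5%) purely because the mix of people tested changed.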

This all sounds super technical, but I hope you’ll bear with me, because there are important consequences to test reliability. It’s important to avoid false negatives, because this can let infected people “slip through the cracks” and potentially go infect more people. So you want a test to have a high sensitivity. A test with perfect sensitivity would never give a false negative.

But, push sensitivity too high and you usually start getting a bunch of false positives, and you want to avoid those because you don’t want to unnecessarily scare someone and/or isolate them, track down their contacts, quarantine them, etc. So you also want your test to have high specificity. A test with perfect specificity would never give a false positive.
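Many tests boil down to a measured signal plus a cutoff for calling a sample positive. This sketch uses made-up signal values (hypothetical, not from any real assay) to show the tug-of-war: lowering the cutoff catches more infections (higher sensitivity) but also flags more uninfected people (lower specificity).

```python
# Made-up test signals (higher = stronger evidence of infection); not from any real assay
infected_scores = [2.1, 3.5, 4.0, 4.8, 5.5, 6.2, 7.0, 8.1]
uninfected_scores = [0.5, 1.0, 1.4, 2.0, 2.6, 3.1, 3.9, 4.5]


def sens_spec_at(cutoff):
    """Call a sample positive when its signal is at or above the cutoff."""
    caught = sum(score >= cutoff for score in infected_scores)     # true positives
    cleared = sum(score < cutoff for score in uninfected_scores)   # true negatives
    return caught / len(infected_scores), cleared / len(uninfected_scores)


for cutoff in (1.5, 3.0, 4.5, 6.0):
    sens, spec = sens_spec_at(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```

In this toy data, sensitivity falls from 100% to 37.5% as the cutoff rises, while specificity climbs from 37.5% to 100%.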

Test-makers balance these things, so tests often end up working best for the most people when about half the people who take the test are infected. But if you take a test and get some result, you don’t really care about how well the test worked for others – you want to know how well the test worked for you! Enter the predictive value… It can’t tell you whether your specific result is right or wrong, but it can tell you, statistically speaking, how likely your result is to be right.

Predictive values ask: of all the people in your “result group” (negatives or positives), what proportion got the correct result? For Positive Predictive Value (PPV) (How much can I trust my positive result?), this amounts to all the true positives divided by all the people who tested positive (TP/(TP + FP)). Similarly, Negative Predictive Value (NPV) (How much can I trust my negative result?) is calculated by dividing all the true negatives by all the people who tested negative (TN/(TN + FN)).
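A minimal sketch of both predictive values, using hypothetical counts for 1,000 people (100 infected, tested with a made-up test that has 90% sensitivity and 95% specificity). Sensitivity is printed too, to show that it answers a different question than PPV.

```python
def positive_predictive_value(tp, fp):
    """Of everyone who tested positive, the fraction who are really infected."""
    return tp / (tp + fp)


def negative_predictive_value(tn, fn):
    """Of everyone who tested negative, the fraction who are really uninfected."""
    return tn / (tn + fn)


# Hypothetical counts: 100 infected (90 TP, 10 FN), 900 uninfected (855 TN, 45 FP)
tp, fn, tn, fp = 90, 10, 855, 45

print(f"sensitivity = {tp / (tp + fn):.1%}")   # of the infected, fraction caught
print(f"PPV = {positive_predictive_value(tp, fp):.1%}")
print(f"NPV = {negative_predictive_value(tn, fn):.1%}")
```

Here 90% of infected people are caught, but only two-thirds of the positive results are true positives, because the 45 false positives come from the much larger uninfected group.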

No test is perfect – so there are going to be some false positives and false negatives. The more true positives, the more those false positives get “diluted out,” and similarly, if you have a lot of true negatives, the false negatives will only make up a small proportion of the total negatives. It’s easier to see (hopefully) in the figures with some actual numbers, but the more people in the “actual group” (i.e. infected or uninfected) corresponding to your result group, the more likely your result is to be correct. 

So, if a lot of infected people get tested and you get a positive result, it’s more likely to be a true positive than if few infected people get tested and you get a positive result. But, if a lot of infected people get tested and you get a negative result, it’s more likely to be a false negative than if few infected people get tested and you get a negative result. This is one of the reasons why, if you have the symptoms of Covid-19 and test negative, you can still have a high likelihood of actually having the disease, so you should act as if you do (plus, even if you don’t have Covid-19, you still likely have something contagious!)
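To see the effect of who gets tested, this sketch holds a hypothetical test fixed (90% sensitivity, 95% specificity, made-up numbers) and varies the fraction of tested people who are infected. It works with proportions rather than head counts, which gives the same ratios as counting people.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test with fixed sensitivity/specificity.

    prevalence is the fraction of people *being tested* who are infected.
    """
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


for prevalence in (0.01, 0.10, 0.50, 0.80):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"{prevalence:4.0%} infected: PPV {ppv:5.1%}, NPV {npv:5.1%}")
```

As the infected share rises, PPV climbs (from roughly 15% to nearly 99%) while NPV falls (from about 99.9% to about 70%), even though the test never changes.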

The situation in which a large proportion of the people tested are actually infected is common for diagnostic tests – it’s not a random sample of the population getting tested – instead, people are “self-selecting” – getting the test BECAUSE they have symptoms. So, even if there are few cases in the community, the proportion of people getting tested that has the infection is high. But the opposite situation (most people tested are negative) is common for “screening” and “surveillance” tests. And this is where there can be some real danger with antibody tests used for “sero-prevalence surveys” which try to test a whole population (or at least a large representative sample of it) to see how widespread the disease was.

If you go testing a bunch of people for antibodies against SARS-CoV-2 (the virus that causes Covid-19) in a community where there weren’t many infections, most of the positive results you’ll get will be false positives. But if you test people for antibodies in a hard-hit region, those positive results will have a higher predictive value. The “better” the test, the less you have to worry about those problems, but many of the tests being used haven’t been thoroughly vetted to make sure that they work:

There’s only 1 FDA-authorized antibody test as of April 18, 2020 – it’s manufactured by Cellex and they report 93.8% sensitivity & 96.4% specificity. 
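Here’s a rough sketch of what those reported Cellex numbers imply for an antibody survey of 10,000 people. The two infection rates (1% for a lightly hit community, 25% for a hard-hit one) are hypothetical, and the sketch assumes the reported sensitivity and specificity hold outside the lab, which may be optimistic.

```python
SENSITIVITY = 0.938   # as reported by Cellex
SPECIFICITY = 0.964   # as reported by Cellex


def survey(n, prevalence, sens=SENSITIVITY, spec=SPECIFICITY):
    """Expected true positives, false positives, and false negatives in a survey of n people."""
    infected = n * prevalence
    uninfected = n - infected
    true_pos = infected * sens
    false_neg = infected - true_pos
    false_pos = uninfected * (1 - spec)
    return true_pos, false_pos, false_neg


for label, prevalence in (("lightly hit (1%)", 0.01), ("hard hit (25%)", 0.25)):
    tp, fp, fn = survey(10_000, prevalence)
    ppv = tp / (tp + fp)
    print(f"{label}: {tp:.0f} true positives, {fp:.0f} false positives, "
          f"{fn:.0f} false negatives, PPV {ppv:.0%}")
```

In the lightly hit community, roughly 4 out of 5 positive results would be false positives, even though the test itself is identical in both places.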

So, test a lot of people and you’re gonna get a lot of false negatives and false positives. And this is the FDA-authorized one – people can also administer non-FDA-approved tests as long as they’re not used for diagnostic purposes and they include a disclaimer. The reliability of these tests isn’t really known yet and, importantly, we don’t even know how well they tell you if you’re protected from re-infection.

Speaking of reliability, it’s important to recognize that all that stuff about changes in predictive value is just statistics – the test itself isn’t changing, the only thing that changes is how much you can trust the result you get. Even if you’re in the group that’s statistically most likely to be correct, if the test is biochemically bad, the result is still not likely to be useful. 

So it’s important to know specificity and sensitivity, and the test-makers report these. Note: they usually tell you how it performs in their lab, where they’re doing things under ideal circumstances – and often with “contrived samples” – so, for the diagnostic tests they’re usually searching for (safe) lab-made RNA pieces corresponding to the pieces of the viral genome the test’s looking for, instead of actual patient samples. This allows them to precisely quantify things, but you can expect “real world results” to be a little worse.

For approval (or at least emergency use authorization) the US FDA requires that diagnostic test makers test at least 30 positive & 30 negative samples and achieve at least 95% sensitivity & 100% specificity. This means the tests can’t have any false positives in the lab setting – in the real world, there will probably be some, but they’ll be really rare. So if you get a positive result, you can be pretty sure you’re infected. But if you get a negative result and have symptoms that make it likely you have the disease, you might still be infected.
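With only 30 samples on each side, those thresholds leave very little room. This sketch simply takes the criteria as described in this post (30 positive and 30 negative samples, at least 95% sensitivity, 100% specificity) and checks which results would pass; the function name is hypothetical.

```python
N_POSITIVE = 30          # known-positive samples
N_NEGATIVE = 30          # known-negative samples
MIN_SENSITIVITY = 0.95
MIN_SPECIFICITY = 1.00


def meets_criteria(detected_positives, correct_negatives):
    """True if the observed results meet both thresholds."""
    sensitivity = detected_positives / N_POSITIVE
    specificity = correct_negatives / N_NEGATIVE
    return sensitivity >= MIN_SENSITIVITY and specificity >= MIN_SPECIFICITY


# Largest number of missed positives that still clears the sensitivity bar
max_misses = max(m for m in range(N_POSITIVE + 1)
                 if meets_criteria(N_POSITIVE - m, N_NEGATIVE))
print(f"missed positives allowed: {max_misses} of {N_POSITIVE}")
print(f"one false positive allowed? {meets_criteria(N_POSITIVE, N_NEGATIVE - 1)}")
```

So one missed positive (29/30, about 96.7%) passes, two (28/30, about 93.3%) don’t, and a single false positive fails.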

So what determines the specificity & sensitivity? How can you get false negatives & false positives?

Let’s start with false negatives – missed infections. Part of it is sample prep problems but, even with a perfect sample, you need to have enough of the thing you’re looking for to reach the test’s limit of detection (LOD), which varies from test to test. The LOD is the smallest amount of the thing the test looks for that needs to be there in order for it to be detected. Imagine trying to determine how many pennies need to be present before a metal detector realizes they’re there: you take a bunch of pennies and keep taking pennies away until the metal detector stops beeping. Similarly, for diagnostic tests the LOD is how many copies of the viral genome need to be present, and for antibody tests it’s how dilute a blood sample can be and still have its antibodies detected.
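The penny experiment is essentially a dilution series. This minimal sketch uses a made-up assay (`hypothetical_test`, which “detects” anything at or above 100 copies) and halves the amount each step until detection fails. Real LOD studies are more careful, typically running many replicates at each concentration since detection near the limit is hit-or-miss; this deterministic version only shows the basic idea.

```python
def hypothetical_test(copies, threshold=100):
    """Made-up stand-in for an assay: detects the target when enough copies are present."""
    return copies >= threshold


def find_lod(start_copies, test, dilution_factor=2):
    """Dilute step by step until the test stops detecting.

    Returns the smallest amount that was still detected, or None if even the
    starting amount was missed.
    """
    copies = start_copies
    last_detected = None
    while copies > 0 and test(copies):
        last_detected = copies
        copies /= dilution_factor
    return last_detected


print(find_lod(10_000, hypothetical_test))
```

Starting from 10,000 copies, the last detected level is 156.25 copies, so the measured LOD is only as fine-grained as the dilution steps (the made-up assay’s true cutoff is 100).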

False negatives can be caused by: 

for diagnostic tests: these tests usually involve a nasal swab, which is apparently really uncomfortable because the doctors have to push the swab further up a person’s nose than seems possible – but if they don’t get a good sample, there might not be enough viral particles in it to detect. This is one of the reasons there’s concern about some proposed at-home tests: people likely won’t stick the swab far enough up… And the samples have to be handled and the genetic info isolated carefully so it doesn’t get degraded. Even if a sample is properly collected and handled, there still might not be a high enough “viral load” (enough copies of the virus) to reach the test’s LOD. And this might be a problem with the ID NOW “rapid tests.”

for antibody tests: antibodies are little proteins that an infected person’s immune system makes to specifically bind parts of the virus and recruit helpers to destroy it. Some antibodies stick around after the fact to keep watch, which is why antibody tests can detect past infections. But even people who really were infected can test negative with these antibody tests. One reason is that the tests usually look for antibodies against one part of the virus (like the nucleocapsid protein that surrounds the viral RNA) but someone might have antibodies against another part of the virus. Or they might not have enough antibodies to reach the LOD. 

Those problems involve the sample, but there can also be problems with the test itself, which is why some sort of control is usually included, like a sample you know is positive and one you know is negative. 
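Here’s a hypothetical sketch of how controls gate a batch of results: if the known-positive control doesn’t come back positive, or the known-negative control doesn’t come back negative, none of the patient results in that run can be trusted. The function names and the “repeat” message are made up for illustration.

```python
def run_is_valid(positive_control, negative_control):
    """A run is trusted only if both controls give their expected results."""
    return positive_control and not negative_control


def report(patient_results, positive_control, negative_control):
    """Translate raw True/False results into labels, unless the controls failed."""
    if not run_is_valid(positive_control, negative_control):
        return "invalid run: repeat the test"
    return ["positive" if result else "negative" for result in patient_results]


print(report([True, False, False], positive_control=True, negative_control=False))
print(report([True, False, False], positive_control=True, negative_control=True))
```

The second run is thrown out because its negative control came back positive, which could point to something like contamination.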

False positives can be caused by:

for diagnostic tests:  contamination from another sample but, as mentioned above, false positives are really rare for most diagnostic tests (at least now that the CDC worked out their testing problems)

for antibody tests: past infection with similar viruses (like the original SARS virus) that led a person to produce antibodies that cross-react (they can also bind the SARS-CoV-2 virus parts that the test is using as probes).

I’m not trying to rain on anybody’s parade. I think tests are REALLY IMPORTANT – even with their flaws. But we need to make sure we don’t rely entirely on what they say when deciding how to let people go about their day!

