Wednesday, May 13, 2020

Base-Rate Neglect in the News

https://www.nytimes.com/2020/05/13/opinion/antibody-test-accuracy.html


It has been a while since I have thought about the fallacy of base-rate neglect. I did not even think about it when I was recently talking to someone about the reliability (or lack thereof) of tests for SARS-CoV-2 antibodies. The piece by Todd Haugh and Suneal Bedi published in the New York Times today (linked above) is a useful reminder.

But it seems to me that Haugh and Bedi do not state their example clearly enough (perhaps because of editorial pruning). I would state it this way: Suppose that a test for SARS-CoV-2 antibodies has a sensitivity of 90%. This means that it gives a positive result to 90% of subjects taking it who actually have the antibodies. Suppose also that it has a specificity of 90%. This means that it gives a negative result to 90% of subjects taking it who don't have the antibodies. Suppose also that the prevalence of antibodies to the virus in the population is (as the writers estimate it to be) 5 percent. And suppose, finally, that 2,000 randomly selected people take the test. If our sample is perfectly representative, 100 of the people taking the test will have the antibodies; of these, 90 will get a positive result and the remaining 10 a (false) negative result. Of the other 1,900 people taking the test, the ones who don't have the antibodies, 1,710 will get a negative result. But this also means, and this is the most significant part, that the remaining 190 will get a (false) positive result.
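The same count can be written out as a small calculation. Here is a minimal sketch in Python; the function name and the figures (90% sensitivity, 90% specificity, 5% prevalence, 2,000 test-takers) are simply the assumptions of the example above, not properties of any actual test:

def confusion_counts(n, prevalence, sensitivity, specificity):
    # Split the n test-takers into those with and without antibodies,
    # then apply sensitivity and specificity to each group.
    with_antibodies = n * prevalence
    without_antibodies = n - with_antibodies
    true_positives = with_antibodies * sensitivity          # correctly flagged
    false_negatives = with_antibodies - true_positives      # missed
    true_negatives = without_antibodies * specificity       # correctly cleared
    false_positives = without_antibodies - true_negatives   # wrongly flagged
    return true_positives, false_negatives, true_negatives, false_positives

# The assumed figures from the example above.
tp, fn, tn, fp = confusion_counts(2000, 0.05, 0.90, 0.90)
print(tp, fn, tn, fp)   # 90.0 10.0 1710.0 190.0
print(tp / (tp + fp))   # about 0.32: fewer than a third of positives are true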

This is significant because it means that, of the 280 people in total who get a positive test result, more than two thirds will not have antibodies. In more general terms, the lower the base rate of what you are testing for, the higher the ratio of false positives to true positives, and therefore the less reliable a positive test result is. If the base rate is 10%, then out of our representative sample of 2,000 people, 200 will have the antibodies and 180 of them will test positive, while 180 of the 1,800 who lack the antibodies will also test positive: 180 true positives and 180 false positives. So, assuming 90% sensitivity and specificity (which, as I gather, is far better than what any test now available can offer), the base rate has to be above 10% for a positive test result to be more reliable than a coin toss. (A coin toss as to whether a given positive result is true or false, that is, not a coin toss as to whether a given person has antibodies.)
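The coin-toss threshold can be checked the same way. With sensitivity and specificity both at 90%, the share of positive results that are true (the positive predictive value) crosses 50% exactly at a 10% base rate; below that, false positives outnumber true ones. Another small sketch under the same assumed figures:

def positive_predictive_value(prevalence, sensitivity=0.90, specificity=0.90):
    # Probability that a positive result is a true positive (Bayes' rule).
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)

for p in (0.01, 0.05, 0.10, 0.20, 0.50):
    print(f"base rate {p:.0%}: PPV {positive_predictive_value(p):.0%}")
# base rate 1%: PPV 8%
# base rate 5%: PPV 32%
# base rate 10%: PPV 50%   (the coin-toss point)
# base rate 20%: PPV 69%
# base rate 50%: PPV 90%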

These figures, of course, only describe a mathematical model. One assumption of the model is that the people taking the test are representative of the whole population with regard to the prevalence of antibodies among them. This is not necessarily the case. In fact, it is not even probable: people who have had symptoms that they attribute to COVID-19 are more likely to take the test than people who have not. Consequently, the prevalence of antibodies to the virus will be higher among those taking the test than the base rate of the total population. How much higher? I don't know, and I don't know how one would go about estimating such a thing.