In 1995, the UK Committee on Safety of Medicines issued an urgent warning: third-generation oral contraceptives doubled the risk of blood clots compared to older pills. Women panicked. Doctors stopped prescribing. Abortions and unplanned pregnancies surged. What the warning did not clearly state was that the risk had increased from roughly 1 in 7,000 to 2 in 7,000. "Doubled" was technically accurate. Clinically, the absolute increase was 1 additional case per 7,000 women. The panic caused far more harm than the pills.

This is not an outlier. It is the rule. Every statistic you encounter in a headline, a press release, or a political speech has been selected, framed, and presented by someone with an interest in how you interpret it. Here is how to stop getting played.

## Correlation Is Not Causation

This is the most violated rule in data literacy. Two things occurring together does not mean one causes the other.

Ice cream sales and drowning deaths rise together every summer. Ice cream does not cause drowning; heat causes both. Countries with higher chocolate consumption produce more Nobel laureates. Chocolate does not make you a genius; national wealth correlates with both. From 1999 to 2009, the number of people who drowned by falling into pools correlated strongly (r = 0.666) with the number of films starring Nicolas Cage. Cage did not cause drownings.

The trap is seductive because the human brain is wired to find causal explanations. When a study says "people who do X are more likely to have Y," the default assumption is causation. It is almost always more complicated.

## Survivorship Bias

During World War II, the Allied military wanted to add armor to bomber planes. They examined returning aircraft and noted where bullet holes were concentrated -- in the wings and fuselage. The obvious answer: reinforce those areas.

Statistician Abraham Wald pointed out the fatal flaw. The planes they were examining had survived. The bullet holes they could see were in areas a plane could absorb and still fly home. The planes hit in the engine or cockpit never came back. The correct answer was to reinforce the areas with no bullet holes on the survivors.

Survivorship bias distorts everything from business advice (studying only successful companies and ignoring the thousands that failed doing the same things) to health claims (looking only at people who lived and ignoring those who died under the same conditions).

## Relative Risk vs. Absolute Risk

This is the pharmaceutical industry's favorite trick. Consider a drug that reduces the risk of a disease from 2% to 1%. There are two ways to describe this:

- **Relative risk reduction:** "The drug reduces risk by 50%!" -- sounds revolutionary
- **Absolute risk reduction:** "The drug reduces risk by 1 percentage point" -- sounds marginal

Both are accurate. One is designed to mislead. When you see a headline about a drug or treatment producing a large percentage improvement, ask: percentage of what?

The NNT -- Number Needed to Treat -- tells you how many people must take a drug for one person to benefit. For many widely prescribed medications, the NNT is between 50 and 200. The mirror image is NNH -- Number Needed to Harm. If a drug helps 1 in 100 but harms 1 in 200, the calculus changes dramatically. Pharmaceutical marketing highlights the first number and buries the second. The arithmetic behind all of this is simple enough to check yourself; two short sketches follow.
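First, a minimal Python sketch of the risk arithmetic. The 2%-to-1% drug and the 1995 pill figures come from the text above; the 0.5% side-effect rate is an assumed number, chosen only to reproduce the helps-1-in-100, harms-1-in-200 comparison:

```python
# The same trial result, described four ways (hypothetical 2% -> 1% drug).
control_risk = 0.02   # disease risk without the drug
treated_risk = 0.01   # disease risk with the drug

arr = control_risk - treated_risk   # absolute risk reduction
rrr = arr / control_risk            # relative risk reduction
nnt = 1 / arr                       # number needed to treat

print(f"Relative risk reduction: {rrr:.0%}")  # 50% -- the headline number
print(f"Absolute risk reduction: {arr:.1%}")  # 1.0% -- the honest number
print(f"Number needed to treat:  {nnt:.0f}")  # 100 treated per person helped

# Mirror image: suppose the drug also causes a serious side effect in an
# assumed 0.5% of patients. NNH is the number treated per person harmed.
nnh = 1 / 0.005
print(f"Number needed to harm:   {nnh:.0f}")  # 200 -- helps 1 in 100, harms 1 in 200

# The 1995 pill scare, same arithmetic: 1 in 7,000 became 2 in 7,000.
print(f"Pill scare, relative: {(2/7000 - 1/7000) / (1/7000):.0%}")  # 100% -- "doubled"
print(f"Pill scare, absolute: {2/7000 - 1/7000:.3%}")               # 0.014% -- 1 extra case per 7,000
```

Two lines of arithmetic turn any relative-risk headline back into counts of actual people, which is the only unit that matters clinically.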
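Second, the correlation trap from earlier. This toy simulation (synthetic data, illustrative parameters, Python 3.10+ for `statistics.correlation`) shows a confounder manufacturing a strong correlation between ice cream sales and drownings, and shows it collapsing once temperature is held roughly constant:

```python
import random
import statistics

random.seed(42)

# Heat drives both series; neither causes the other.
temps = [random.uniform(5, 35) for _ in range(365)]              # daily temp, C
ice_cream = [50 + 10 * t + random.gauss(0, 40) for t in temps]   # sales track heat
drownings = [0.2 * t + random.gauss(0, 1.5) for t in temps]      # incidents track heat

r = statistics.correlation(ice_cream, drownings)
print(f"ice cream vs drownings: r = {r:.2f}")   # strong, with no causal link

# Condition on the confounder: within a narrow temperature band,
# the apparent relationship disappears.
band = [(i, d) for t, i, d in zip(temps, ice_cream, drownings) if 29 <= t <= 31]
r_band = statistics.correlation([i for i, _ in band], [d for _, d in band])
print(f"within 29-31 C only:    r = {r_band:.2f}")   # near zero
```

Holding the confounder fixed is the simulation equivalent of the question a careful reader should ask of any correlation: what third factor could be driving both?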
## Sampling Bias

The most famous example occurred in 1936. The Literary Digest conducted the largest pre-election poll in history, mailing 10 million surveys and receiving 2.4 million responses. It predicted Alf Landon would defeat Franklin Roosevelt in a landslide. Roosevelt won 46 of 48 states.

The problem: the Digest drew its mailing list from telephone directories, automobile registrations, and magazine subscriptions -- all of which skewed heavily toward wealthier Americans during the Great Depression. The sample was enormous but systematically biased. George Gallup, using a properly sampled group of just 50,000, correctly predicted Roosevelt's victory.

Sampling bias persists today. Online polls capture only people who are online and motivated to respond. Customer satisfaction surveys over-represent people with extreme opinions. Health studies that exclude women, minorities, or the elderly produce findings that do not generalize.

## How Crime Stats Get Massaged

Crime statistics are particularly vulnerable to manipulation, not because the data is fabricated, but because the categories are political:

- **Reclassification:** Downgrading aggravated assault to simple assault makes violent crime appear to decrease. The FBI's Uniform Crime Reporting program has documented this pattern in multiple jurisdictions.
- **Under-reporting:** Changes in how agencies classify or accept reports can produce apparent declines that reflect procedural changes, not actual crime trends.
- **Cherry-picking time windows:** Comparing this month to last month instead of year-over-year can obscure long-term trends.

In 2010, the Milwaukee Journal Sentinel found that the Milwaukee Police Department had misclassified over 5,000 violent assaults as lesser offenses over a three-year period, making the city appear significantly safer than it was.

## The Questions to Ask

Every time you encounter a statistic, run it through this checklist:

- **Compared to what?** A number without context is meaningless. "X% increase" -- from what baseline?
- **Who was studied?** Does the sample represent the population the claim is about?
- **What is the absolute number?** Translate relative claims into absolute terms.
- **What is being left out?** What would the counter-statistic look like?
- **Who benefits?** Who funded the study, and what do they want you to conclude?

Statistics are not lies. But they are tools, and like all tools, they can be used to build or to deceive. The difference depends on whether the person reading them knows which end is sharp.

They didn't ask if we wanted to know how easily numbers can be weaponized. Now you do.

_- The Department_