You’re Invited to a Colonoscopy!

Dynomight

Colonoscopies are the first-line method for preventing colorectal cancer in America (and almost nowhere else). But do they work? We finally have a comprehensive trial, but it’s left gastroenterologists with more questions than answers.

Colorectal cancer is the second-most deadly cancer, killing over 1 million people per year around the world — 1.7% of all deaths. In the United States, where colorectal cancer causes 50,000 deaths per year, the foundation of the fight against it is the colonoscopy. Getting one periodically is recommended for everyone over the age of 45.

Colonoscopies are rarely used for screening elsewhere but have been standard in the U.S. for decades. There are many reasons to think that they should work. But they are also expensive, invasive, unpleasant, and rarely — but not that rarely — have serious side effects. Are they worth it?

Until recently we didn’t have any randomized controlled trials that directly tested how well colonoscopies work. We finally just got one and the results were — how can I describe them? Confusing? Ambiguous? Frenzy-inducing?

Let’s try to understand what to make of this trial, and why American gastroenterologists were so quick to criticize it.

Reminders About Tubes

After you swallow food, your body uses rhythmic waves of contractions to send it on a 4-meter (13-foot) journey through your esophagus, stomach, and small intestine. These extract most of the food’s nutrients and render it into a pulpy acidic fluid called chyme. The chyme then travels through your colon, a 1.5-meter (5-foot) tube that reabsorbs water and electrolytes, creating a solid mass that is then moved to your rectum for storage and eventual disposal. Yay!

The inner surface of your colon is lined with a single layer of epithelial cells whose job is to let the good stuff through and keep the bad stuff out. Stem cells deeper inside the colon constantly divide to make new epithelial cells, which climb to the surface and live for four or five days before committing “suicide.”

Colonoscopies rest on the adenoma-carcinoma hypothesis. The idea is that errors can arise in the DNA, resulting in epithelial cells that don’t die on schedule. If they do anything too weird, your T-cells will kill them. But some mutations fly under the radar, causing little clumps of cells to grow on the surface of the colon. These clumps, or “polyps,” are usually not cancer — they grow slowly, and won’t (yet) spread to neighboring tissues. But if these persist for many years, they can acquire additional mutations that make them start spreading.

To prepare for a colonoscopy, you must empty your colon. This is achieved by drinking some chemicals and enduring some spectacular biological functions. Then a doctor threads a 1.5-meter (5-foot) flexible tube with a light and camera to look at the entire colon and remove or sample any polyps. The idea is not just to detect cancer but, by removing precancerous polyps, prevent it.

The primary alternatives to colonoscopies for colorectal cancer screening are “occult blood tests” that look for spooky hidden blood in the stool. The oldest of these use an extract of the guaiacum tree and have RCTs showing they reduce colorectal cancer mortality by 9%-22% when used for screening. Newer tests look for antibodies and/or genetic mutations. These are more sensitive, though we don’t yet have RCTs estimating how much they help with mortality.

Another alternative is an older procedure called a sigmoidoscopy, which is basically a “mini” colonoscopy with a 0.6-meter (2-foot) tube. Compared to colonoscopy, it is quicker, safer, less painful, and cheaper, but it can only look at the lower (“sigmoid”) colon. Still, randomized trials have shown that screening sigmoidoscopies reduce colorectal cancer deaths by 26%-30%.

In principle, colonoscopies should be better than either of these tests. Unlike blood tests, colonoscopies try to remove polyps before they become cancer. And unlike sigmoidoscopy, colonoscopies can examine the whole colon.

But how much does it actually help to remove precancerous polyps? Gastrointestinal doctors often point to the National Polyp Study, but this is not a true randomized comparison — the study did colonoscopies on all subjects and concluded, based on comparisons to base rates in other “similar” populations, that removing polyps helped. And how much does it help to screen the whole colon? Cross et al. compared sigmoidoscopy to colonoscopy in English patients with suspected colorectal cancer and found that sigmoidoscopy was sufficient to detect 80% of cancers.

Because of the cost, the lack of direct evidence for efficacy, and the fact that it’s hard to convince people to do colonoscopies, they are rarely used for cancer screening outside the United States and some parts of German-speaking Europe. So it would be really useful to have an RCT that tested how well screening colonoscopies work.

The Trial

That brings us to the star of our show. The Nordic-European Initiative on Colorectal Cancer (NordICC) is a huge randomized trial aimed at rigorously measuring how much colonoscopies reduce cancer and death.¹

Here’s what the researchers did: Between 2009 and 2014, they identified 85,179² subjects mostly in Poland (64.1%), Norway (31.2%), and Sweden (4.3%), drawn at random from population registries of people between 55 and 64 years old.³ They invited one-third of them to a one-time screening colonoscopy. Of those contacted, 42% accepted the invitation and underwent a colonoscopy, while 58% refused the invitation. The other two-thirds of people were not contacted and seemingly never knew they were in the trial. The researchers then followed everyone (invited or not, colonoscopy or not) for a median of 10 years and checked government records to see who had been diagnosed with colorectal cancer, died from colorectal cancer, or died from any cause.

This was an “intention-to-screen” analysis. That means that it compared the control group to the whole invited group, including both the 42% of people who agreed to a colonoscopy and the 58% who refused. (If that seems strange, keep reading.)

These were the main results, comparing the invited group with the control group:

Colorectal cancer incidence: 18% lower (statistically significant)

Colorectal cancer mortality: 10% lower (not statistically significant)

Overall mortality: 1% lower (not statistically significant)

So the reductions — they are small. This was a surprise.

The study had a huge sample and simple, reliable statistics. The authors seemed to expect a stronger showing for colonoscopies. When that didn’t happen, they made no excuses — they just followed their preregistered statistical plan and published the results. We want research to be reproducible, right? Well, then this is what we want people to do.

The Debate

This paper was greeted with gastroenterological bedlam. One response was that it shows colonoscopies aren’t cost-effective, since there was only a small effect on deaths from colorectal cancer, not much better than in previous trials of less invasive screening methods. Others took the more extreme stance that the study was further proof that American medicine is rotten, subjecting us to horrible expensive procedures for little benefit.

Overwhelmingly, American gastroenterologists did not agree. A primary concern expressed on Twitter (now known as X) seemed to be that the study might cause patients to do the worst thing a patient can ever do: Read a paper and ask questions about it.

For a little slice of the debate, take CNN’s piece, “New Study Examines the Effectiveness of Colonoscopies.” It describes the study and results, gives a few quotes from the authors and other experts about how the results are surprising, cautions that it isn’t conclusive and further research is needed, and notes that “other studies have estimated larger benefits for colonoscopies, reporting that these procedures could reduce the risk of dying of colorectal cancer by as much as 68%.”

The American College of Gastroenterology disagreed with CNN so strongly that it released a rebuttal, claiming:

1. The way the study was set up, colonoscopies couldn’t possibly succeed because only 42% of people agreed to screening, 10 years isn’t long enough to see a benefit, and European doctors are bad at colonoscopies.

2. There is other evidence that removing polyps and doing colonoscopies save lives.

3. It’s irresponsible to call colonoscopies “invasive” (as CNN did) since that might make people think they are unpleasant and not do them.

4. Actually, the study shows colonoscopies are great.

Similar points were raised elsewhere, such as in a letter doctors sent to the New England Journal of Medicine. Are these points correct? Why were American gastroenterologists so eager to dismiss this trial?

Did the Trial Actually Show That Colonoscopies Are Great?

By far the biggest controversy with the trial comes from the fact that only a minority of people agreed to screening. Colonoscopies can’t do anything if they don’t happen.

Many doctors suggested that if you look at the numbers in the right way, then this trial shows colonoscopies are great. For example, Hanley argues that while only 42% of people agreed to screening, among the people who did, colon cancer incidence was reduced by 31% and colon cancer mortality was reduced by 50%.

Many, many other doctors repeated the claim that this trial showed that getting a colonoscopy reduced the risk of dying of colorectal cancer by 50%. A common meme was to reframe the 10% reduction in colorectal cancer mortality in the main intention-to-screen analysis by rewriting the title in dismissive ways like “Effect of Mailing Letters to People on Risks of Colorectal Cancer and Related Death.”

In my opinion, that 50% number is at best unreliable and probably just wrong. Looking into why gives a great lesson on why statistics is hard and how that hardness conspires with social dynamics to confuse people.

In this study, there were three types of people:

Controls, who were never contacted

Refusers, who were invited to get a colonoscopy but declined

Acceptors, who were invited to get a colonoscopy and chose to proceed

The main analysis only compares the controls to the combination of the acceptors and refusers. One reason to do so is that all screening programs see some proportion of people refuse assessment; what the analysis provides is a direct estimate of what an actual program would do.

But if you’re deciding if you should get a colonoscopy personally, then you’d like to know how much it would help. Naively, you might try to find out by looking at results for the acceptors versus controls, ignoring refusers. Unfortunately, there’s a deep problem: Acceptors and refusers are different. They might vary in age, sex, education, ethnicity, family health history, or comfort with their own tubes. Even if the colonoscopies had never happened, they would have had different outcomes from the controls. This is not hypothetical! At the end of the study period, 1.2% of the controls were diagnosed with colorectal cancer, compared with 1.05% of refusers and 0.89% of acceptors.⁴

The refusers have less colorectal cancer than controls, even though neither had colonoscopies. The likely explanation is that people have some idea of their risk, either because of their family history, their lifestyle, or how their tubes have been feeling recently. Those at lower risk are more likely to refuse. (It’s also possible that refusers just hate going to the doctor and so had fewer diagnoses, but never mind.)
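As a quick check on that gap, here is a minimal Python sketch that takes the approximate end-of-study diagnosis rates quoted above and expresses each invited subgroup relative to the controls. The rates were read off published figures (see the footnote), and the function name is just illustrative.

```python
# End-of-study colorectal cancer diagnosis rates (percent), as quoted in the text.
# They were read off published figures, so treat them as approximate.
RATES = {"controls": 1.20, "refusers": 1.05, "acceptors": 0.89}


def relative_difference(group_rate: float, control_rate: float) -> float:
    """Fractional reduction of a group's rate relative to the controls."""
    return 1 - group_rate / control_rate


if __name__ == "__main__":
    base = RATES["controls"]
    for name in ("refusers", "acceptors"):
        diff = relative_difference(RATES[name], base)
        print(f"{name}: {diff:.1%} lower than controls")
```

Run as-is, it reports refusers about 12.5% below controls and acceptors about 26% below, even though nobody in the first comparison had a colonoscopy. That is why the acceptor figure can’t be read directly as the effect of the procedure.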

So you can’t directly compare acceptors to controls or to refusers. They are different populations. Sadly, there’s no good way to solve this problem. But there are many bad ones!

The main results in the paper come from the intention-to-screen analysis that compares the controls to the whole invited group. This understates the benefits of the colonoscopies themselves, because the 58% of invited people who refused never had one but still count as part of the invited group. But this analysis is ultrarobust, because it compares two perfectly randomized groups. It found an 18% reduction in colorectal cancer diagnoses and a 10% reduction in colorectal cancer mortality.

An intuitive way to fix that bias would be to do a naive correction and pretend that the 42% of people who agreed to colonoscopies were random. If that were true, then the benefits if everyone agreed would have been 1/0.42 = 2.38 times as large, meaning that if 100% of people agreed instead of 42%, there would have been a 43% reduction in colorectal cancer diagnoses and a 24% reduction in colorectal cancer mortality.
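The naive correction is a single division. A minimal Python sketch, assuming (as the paragraph does, for the sake of argument) that acceptors were a random subset of those invited:

```python
def naive_correction(itt_reduction: float, uptake: float) -> float:
    """Scale an intention-to-screen reduction up to 100% uptake.

    Assumes the people who accepted screening were a random subset of
    those invited, which the text argues is not true.
    """
    if not 0 < uptake <= 1:
        raise ValueError("uptake must be in (0, 1]")
    return itt_reduction / uptake


if __name__ == "__main__":
    uptake = 0.42
    print(f"scale factor: {1 / uptake:.2f}")
    print(f"diagnoses:    {naive_correction(0.18, uptake):.0%}")
    print(f"CRC deaths:   {naive_correction(0.10, uptake):.0%}")
```

With the trial’s numbers this gives a scale factor of 2.38, a 43% reduction in diagnoses, and a 24% reduction in colorectal cancer deaths, matching the figures above.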

I call this “naive” because people did not agree at random. But what’s the bias? Remember, people at higher risk were more likely to agree. That means that the colonoscopies that were done were allocated more efficiently than a random allocation. Thus, the naive correction is likely an overestimate. We’d expect somewhat less than a 24% reduction in colorectal cancer mortality if everyone accepted.

One thing you can do is estimate what would have happened to the acceptors if they hadn’t gotten colonoscopies. Imagine a world where the trial never happened. Outcomes for the control group and the refusers would stay the same — the only people with a different result would be the ones who would have accepted colonoscopies had the study occurred. On the whole, though, the would-be invited group (that is, the would-be refusers and the would-be acceptors) would be just like the controls, since no one would have received any screening. With this in mind, we can work out what would have happened to the would-be acceptors in that world. You can see this in the dotted line in the graphs below:

Source: Dynomight, data from NordICC

The right graph compares those results to the observed outcomes for acceptors in our branch of the multiverse, where they did have colonoscopies. At the end of the trial, there’s a reduction of 37%. Unfortunately, data aren’t available to do this for mortality, but since 37% is below the 43% reduction in diagnoses that the naive correction gives, this supports the idea that the true effects are probably a bit lower than the naive correction suggests.
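The dotted-line calculation can be sketched in a few lines of Python. This is a reconstruction from the description above, not the author’s actual code: it uses only the end-of-study rates rather than the full curves, and like the text it assumes refusers’ outcomes were unaffected by being invited.

```python
def counterfactual_acceptor_rate(control: float, refuser: float, uptake: float) -> float:
    """Rate the acceptors would have had without colonoscopies.

    Without the trial, the whole invited group should look like the controls:
        uptake * acceptors_without + (1 - uptake) * refuser = control
    Solve that for acceptors_without.
    """
    return (control - (1 - uptake) * refuser) / uptake


if __name__ == "__main__":
    # Diagnosis rates in percent, read off published figures (approximate).
    control, refuser, acceptor_observed, uptake = 1.20, 1.05, 0.89, 0.42
    without = counterfactual_acceptor_rate(control, refuser, uptake)
    reduction = 1 - acceptor_observed / without
    print(f"acceptors without colonoscopy: {without:.2f}%")
    print(f"reduction among acceptors:     {reduction:.0%}")
```

It puts the acceptors’ no-colonoscopy rate at about 1.41% and the reduction at about 37%.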

So where does this “50% reduction in colorectal cancer mortality” everyone is quoting come from? This is a complex per-protocol analysis that tries to use math to “make” the acceptors be like the controls. The details are a little fuzzy because the paper does the common academic thing where it says the details are in another paper that itself cites other papers in an infinite regress. But, basically, the analysis takes data from all the acceptors and controls and fits a big equation to predict someone’s odds of getting diagnosed with or dying from colorectal cancer based on their age, sex, country, and group (acceptor or control). Then they see how much those odds change when you flip the group assignment. This gives the much quoted decrease of 31% in colorectal cancer diagnoses and 50% in colorectal cancer mortality.

Because that kind of analysis is known to be unreliable, they also included a sensitivity analysis as a sanity check. This gave a 34% decrease in colorectal cancer diagnoses and a 28% reduction in colorectal cancer mortality.

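To summarize, here are the estimated reductions from each approach:

Intention-to-screen (whole invited group vs. controls): 18% fewer diagnoses, 10% fewer colorectal cancer deaths

Naive correction (assumes acceptors were random): 43% fewer diagnoses, 24% fewer colorectal cancer deaths

Acceptors vs. their estimated outcomes without colonoscopy: 37% fewer diagnoses (no mortality data)

Per-protocol analysis: 31% fewer diagnoses, 50% fewer colorectal cancer deaths

Sensitivity analysis: 34% fewer diagnoses, 28% fewer colorectal cancer deaths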

The last three analyses all give something like a 33% reduction in colorectal cancer diagnoses. I think we can be reasonably confident that’s right.

But should we believe that 50% number from the per-protocol analysis for colorectal cancer mortality?

I mean … no?

Let’s start with common sense. Certainly, if everyone invited had colonoscopies, the observed 10% reduction in colorectal cancer mortality would have been higher. But would it have been five times larger, even though almost half of people already got colonoscopies? Doubtful. The naive correction gives a reduction of only 24% in colorectal cancer mortality, and we have strong reasons to think this is an overestimate.

The 50% number comes from a fiddly statistical technique that only works under strong assumptions. One of these is that all the ways controls and acceptors vary are included in the dataset. But the dataset used was very limited — it didn’t have income, education, family history, or how people’s tubes were feeling when they got the invitation. If any of those things influenced people’s choices to accept, then this analysis wouldn’t have been able to correct for the biases between the groups. And there weren’t that many colorectal cancer deaths during the trial (229 total), so the analysis was done on sparse data.
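To see why missing variables matter, here is a toy Python calculation with entirely hypothetical numbers. Suppose an unrecorded trait (say, general health-consciousness) makes people both more likely to accept and less likely to die of colorectal cancer, and suppose colonoscopy truly cuts that risk by 30%. A comparison of acceptors to controls that can’t see the trait credits the colonoscopy with part of the trait’s effect; comparing within each level of the trait recovers the true 30%.

```python
# Hypothetical numbers, chosen only to illustrate the mechanism.
P_TRAIT = 0.5                        # share of people with an unrecorded trait
ACCEPT = {True: 0.60, False: 0.25}   # chance of accepting an invitation, by trait
RISK = {True: 0.002, False: 0.004}   # baseline risk of dying of CRC, by trait
TRUE_EFFECT = 0.30                   # colonoscopy cuts everyone's risk by 30%


def trait_share_among_acceptors() -> float:
    """P(trait | accepted), by Bayes' rule."""
    yes = P_TRAIT * ACCEPT[True]
    no = (1 - P_TRAIT) * ACCEPT[False]
    return yes / (yes + no)


def apparent_effect() -> float:
    """Reduction estimated by comparing acceptors to controls while
    ignoring the trait (because it isn't in the dataset)."""
    control_rate = P_TRAIT * RISK[True] + (1 - P_TRAIT) * RISK[False]
    share = trait_share_among_acceptors()
    acceptor_baseline = share * RISK[True] + (1 - share) * RISK[False]
    acceptor_rate = acceptor_baseline * (1 - TRUE_EFFECT)
    return 1 - acceptor_rate / control_rate


def stratified_effect(trait: bool) -> float:
    """Reduction within one level of the trait: acceptors vs. controls."""
    control_rate = RISK[trait]
    acceptor_rate = RISK[trait] * (1 - TRUE_EFFECT)
    return 1 - acceptor_rate / control_rate


if __name__ == "__main__":
    print(f"true effect:         {TRUE_EFFECT:.0%}")
    print(f"apparent effect:     {apparent_effect():.1%}")
    print(f"within trait=True:   {stratified_effect(True):.0%}")
    print(f"within trait=False:  {stratified_effect(False):.0%}")
```

With these made-up inputs the unadjusted comparison shows roughly a 40% reduction. Which way the bias points depends on the trait: swap the two `ACCEPT` values so the higher-risk group is keener, and the apparent effect falls to about 20%.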

And the sensitivity analysis — the point of this was to check how sensitive the per-protocol analysis is to its rather tenuous assumptions. It worked! It showed that the “50% reduction in colorectal cancer mortality” calculation is exquisitely sensitive to those assumptions! But then everyone just sort of ignored it.

So what happened? How did we end up with so many gastroenterologists quoting this 50% number?

I wouldn’t blame the study authors. The paper largely just carries out the preregistered analysis without much comment. But in a letter to the journal, the authors emphasize that the per-protocol analysis is “prone to bias” and “not as trustworthy” and “caution against overinterpretation … because the number of deaths to date was small.”

But the further removed reports get from the original study, the more those cautions disappear. Other doctors writing about the study didn’t pay much attention to those concerns. By the time you get to the doctors on social media, they’re basically gone.

It’s hard not to think that there’s some motivated reasoning at work here. Around 15 million colonoscopies are done in the U.S. each year, at an average cost of something like $3,000 for each. I don’t think doctors are intentionally distorting the truth. But they have been trained that colonoscopies are extremely effective, and most of them are busy and — no offense — have only a superficial understanding of statistics. So when a study comes out with various numbers, it’s human nature to look much harder for flaws in the surprising numbers than the ones that reinforce your worldview.

Or, to put it another way: Imagine a world where the intention-to-screen analysis showed strong effects, but the per-protocol analysis showed weak ones. Which one do you think everyone would be quoting?

Intermission

There are no good colonoscopy jokes. I asked everyone and found none that were funny or tasteful. Here’s the least-terrible one, a joint effort of myself and a large language model.

Q. Why do Bayesians love colonoscopy trials?

A. Because they use posteriors to inform priors.

I know, I know. Anyway.

Was the Trial Too Short?

Many doctors also objected that the trial didn’t give colonoscopies a fair chance. One theory is that 10 years isn’t long enough (we'll know more when the trial ends and 15-year mortality is released). It takes years for polyps to turn into cancer and even longer for that cancer to kill you. Maybe the colonoscopies removed lots of polyps that would have eventually turned into cancer and/or deaths, but wouldn’t have done so yet.

The best way to check this theory would be to look at how previous trials evolved after 10 years. We don’t have any such trials for colonoscopies, but we do for sigmoidoscopies. NORCCAP ran in Norway, where 61% of men agreed to screening. UKFSST ran in the U.K., where 71% agreed to screening. Here is the reduction in colorectal cancer diagnoses and mortality in the invited group compared to the control group in all three trials over time:⁵

Source: Dynomight, data from NordICC, UKFSST, and NORCCAP

For diagnoses, the “reduction” is negative at first because screening catches a lot of cancer that would otherwise be missed. In NordICC, the reduction is smaller than in the sigmoidoscopy trials, probably just because fewer people in NordICC agreed to screening. But, overall, the NordICC curve looks similar to the other trials, suggesting modest gains over time.

For mortality, things just look weird. The other trials seemed to reach their maximum benefit by five to seven years and then sort of bounce around randomly. But NordICC actually had higher colorectal cancer mortality in the invited group until almost the end of the trial. I assume this was just bad luck. It’s hard to say what that green line will do in future years.

Do European Doctors Suck?

While the equipment is standardized, colonoscopies are effectively artisanal: The doctor decides what looks bad and what should be sampled or removed. Another theory for the small effect in the NordICC trial is that since colonoscopies are less common in Europe, the doctors had less experience than American doctors.

The skill of a doctor doing a colonoscopy is often measured by their adenoma detection rate — how often they find a precancerous polyp. American doctors find them 40% of the time. In the NordICC trial, doctors found them only 31% of the time. Many suggested this meant the doctors weren’t great.

Cappell et al. counter that perhaps it wasn’t the doctors but that there were more adenomas to find in the American colons because of “higher prevalence of obesity (30% vs. 12%), a higher rate of ever having smoked cigarettes, higher fat and cholesterol consumption, and poorer physical fitness.”

Dominitz and Robertson reply that lifestyles notwithstanding, the U.S. has substantially lower rates of colorectal cancer. I was surprised by this and looked up the numbers. Here are the rates per 100,000 people aged 55-64.⁶

The obvious interpretation is that Americans have fewer adenomas than Europeans, meaning the doctors in the trial were worse than American doctors at detecting them. But be cautious: Diagnoses aren’t a great measure of incidence, since they depend on how aggressively you look. And deaths also aren’t great, since they depend on treatments. It’s also conceivable that Americans do have more adenomas, but because colonoscopies are already common in the U.S., many get removed before they become cancer.

Also, I’m not sure I trust that the diagnosis rates in this data reflect cancer rates: Do we really believe that Norway has 45% more colorectal cancer than Sweden, despite the fact that these countries have similar diets, life expectancies, and mortality rates, and that Norwegians drink and smoke a bit less? (The mysteriously high colorectal cancer diagnosis rates in Norway have been noted for decades.⁷)

Another oddity is that there were zero perforations in the NordICC trial across all 12,000 colonoscopies performed whereas previous studies report them at rates between 1 in 100 and 1 in 20,000. (Perforation of the colon is fatal without emergency surgery.) One interpretation of this is that European doctors are awesome. Alternatively, Powell and Prasad give this heart-warming speculation:

It has been hypothesized that the absence of perforation in NordICC was due in part by the fact that the vast majority of procedures were unsedated, preventing the operator from using more force—a contributing factor to perforation.

If doing colonoscopies without anesthesia reduces perforations, that’s great. But one doctor I talked to suggested that without anesthesia it’s impossible to be as thorough, so this might reduce the benefits of colonoscopies.

Overall, I don’t think we have a clear answer here. I think there is weak evidence for the idea that the average American colonoscopy is better than those in this trial. But it’s very uncertain.

Why Was There No Impact on Overall Mortality?

Among nondoctors, many people were surprised that overall mortality rates — from any cause — were the same in the control and invited groups. Isn’t the point of colonoscopies to prevent death? Does this mean colonoscopies are worthless?

No. I ran millions of simulated trials assuming “magic colonoscopies” that reduced the odds of dying from colorectal cancer to zero. The results ranged from the invited group having 5.6% lower overall mortality to 2.5% higher.⁸ So we can’t conclude anything from this. There just aren’t enough people to detect a tiny signal (change in colorectal cancer mortality) that’s drowned out by tons of noise (mortality from other factors).
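A rough version of that simulation can be sketched in Python using the parameters from the footnote. This is not the author’s code: the group sizes (about one-third of the 84,585 analyzed people invited) are an assumption, it runs 100,000 trials rather than millions, and it draws death counts from a normal approximation to the binomial to keep it fast.

```python
import math
import random
import statistics

# Parameters from the footnote: 10-year risks in the control group.
P_CRC = 0.0031      # dying of colorectal cancer
P_OTHER = 0.1073    # dying of anything else
UPTAKE = 0.42       # share of invited people who get a colonoscopy

# Assumption: roughly one-third invited, two-thirds controls.
N_INVITED = 28_195
N_CONTROL = 84_585 - N_INVITED


def binomial_approx(rng: random.Random, n: int, p: float) -> float:
    """Normal approximation to a Binomial(n, p) draw (fine for large n)."""
    return rng.gauss(n * p, math.sqrt(n * p * (1 - p)))


def one_trial(rng: random.Random) -> float:
    """Relative difference in overall mortality, invited vs. control."""
    n_acc = round(N_INVITED * UPTAKE)
    n_ref = N_INVITED - n_acc
    # "Magic" colonoscopies: acceptors can no longer die of colorectal cancer.
    deaths_invited = (binomial_approx(rng, n_acc, P_OTHER)
                      + binomial_approx(rng, n_ref, P_CRC + P_OTHER))
    deaths_control = binomial_approx(rng, N_CONTROL, P_CRC + P_OTHER)
    return (deaths_invited / N_INVITED) / (deaths_control / N_CONTROL) - 1


def simulate(n_sims: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Return the 2.5th and 97.5th percentiles of the simulated differences."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(n_sims)]
    cuts = statistics.quantiles(results, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    lo, hi = simulate()
    print(f"95% of simulated trials: {lo:+.1%} to {hi:+.1%}")
```

The resulting interval won’t exactly match the footnote’s, since the details differ, but it comes out a few percent on either side of zero. Even perfect colonoscopies would be hard to see in overall mortality at this sample size.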

Discussion

Are colonoscopies better than other screening methods? I don’t know. Maybe!

But on the margin, surely this trial gives some evidence that colonoscopies might not be as effective as we hoped. So why were American gastroenterologists so keen to dismiss it? There seemed to be no similar frenzy of doctors in other countries trying to discredit the trial.

Or, to back up — why has America settled on colonoscopies when most of the world has not? Regulators in Europe basically give two reasons that they continue to recommend fecal blood tests:

1. It’s hard to get Europeans to do any colorectal cancer screening, and exceedingly hard to get them to agree to colonoscopies.

2. Colonoscopies have not yet been proven to be cost-effective.

In America, neither of these applies. When setting recommendations, the U.S. Preventive Services Task Force explicitly does not consider costs. And since we expect colonoscopies to be effective, it would be too risky not to do them. Americans are probably more open to colonoscopies simply because doctors have told them for decades how important they are. (Note: None of these things are obviously bad!)

So that’s my theory. Europe wants stronger evidence for cost-effectiveness. America is more aggressive and more willing to accept high costs. I don’t think this trial changes that. Colonoscopies are still expensive, and still possibly but not conclusively better.

One thing is clear: Screening works. If you’re of the appropriate age, please get screened. If your tubes are acting funny, please get screened without delay. The best method and the level of benefit are debatable, but we know it helps. Use a stool test if you want (multitarget DNA test if you can), or a colonoscopy, or a sigmoidoscopy, or a “virtual” CT colonoscopy, or a crazy edible camera. Do one of them. Statistics show colorectal cancer is highly curable when caught early, and now that we have feisty checkpoint inhibitor immunotherapies, it’s probably even better. Just do it. Your tubes will thank you.

An earlier version of this article referred to the NordICC trial in the past tense. It has been updated to reflect that the trial is still ongoing.

This piece is based on the ten-year preliminary analysis. We'll know more when the final results are published for the 15-year trial, probably in 2027. ↩
They originally identified 94,958 individuals, including 9,780 participants from the Netherlands. These were not included in the final analysis because a Dutch law based on EU data protection regulations was passed which made follow-up impossible. ↩
They excluded 221 people for having a previous diagnosis of colorectal cancer and 373 for being — umm — dead, leaving them with 84,585. ↩
Since no data were available, I extracted the numbers for refusers and acceptors from images in the published papers. This was tons of fun and probably introduced a bit of noise. Unfortunately, no data are available on the breakdown of these three groups in terms of colorectal cancer mortality or overall mortality. I checked with the authors, who said they aren’t releasing the numbers because there aren’t enough data available yet. ↩
Again, for this and all the following plots, I had to extract the data from images in the published papers, which probably introduced some noise. ↩
These are all age-standardized numbers for 2020. The mortality number for the U.S. comes from the CDC’s WONDER system, combining deaths from malignant neoplasm of the colon and rectum. All the other numbers come from the International Agency for Research on Cancer. (Mortality numbers aren’t available for Sweden.) ↩
See, e.g., N. Malila and T. Hakulinen, “Epidemiological Trends of Colorectal Cancer in the Nordic Countries,” Scandinavian Journal of Surgery 92, no. 1 (2003): 5-9. ↩
Suppose that over 10 years, someone has a 0.31% chance of dying of colorectal cancer and a 10.73% chance of dying of something else (the observed rates in the control group in the trial). And suppose that 42% of people invited for colonoscopies agree, and if they do, their risk of dying from colorectal cancer drops to zero. I ran millions of simulations with the same numbers of people as in this trial. If you take the inner 95% range of outcomes, these ranged from the invited group having 5.6% lower to 2.5% higher. It’s a weird coincidence that there was no change in this trial. ↩

Dynomight writes about science and dispenses life advice at dynomight.net.

Published October 2023

Have something to say? Email us at letters@asteriskmag.com.
