Why Most Clinical AI Studies Disappear: A Successful Colorectal Surgery Prediction Model Author: Bruno Pignataro
Author: Bruno Pignataro

TL;DR
- A Danish AI model predicting 1-year mortality after colorectal cancer surgery was successfully deployed in clinical practice — rare for clinical AI.
- Patients receiving AI-guided care had 36% fewer postoperative complications compared to standard care.
- Cost savings reached $2,847 per patient in the first year.
- Success relied on a high-quality national dataset, surgeon involvement, and prediction linked directly to treatment.
Most clinical AI models stop at performance metrics. Here’s a Danish colorectal surgery model that made it into practice, and what we can learn from it.
AI-based tools for personalized clinical decisions are still largely limited to research settings. We face substantial complexity in both building and implementing these models. Studies often develop a model, report an area under the curve (AUC), and then simply disappear.
This week, I came across an interesting article. Andreas Weinberger Rosen and his Danish co-authors developed a model to predict 1-year mortality after colorectal cancer (CRC) surgery. This was a rare case where the model didn’t stop at development and actually made it into clinical practice. You can read the article here. As the authors conclude:
“We demonstrate the clinical utility of AI technologies and offer an adaptable framework for further scalability to other health care fields.”
This model actually made it into real-world practice. What made it work?
The authors used a nationwide dataset of 18,403patients for model development and internal validation. They then used a retrospective cohort of 806 patients for external validation, testing the model on real-world data, this cohort served as the control group. After that, they implemented the model prospectively in clinical practice, applying personalized perioperative treatment. This became the intervention group, later compared with the control group.
The final model included 58 variables out of an initial 8,694. It achieved an AUROC of 0.82 (95% CI, 0.81–0.84) in the development set, 0.77 (95% CI, 0.74–0.80) in internal validation, and 0.79 (95%CI, 0.71–0.87) in external validation. The model maintained solid performance from development to real-world testing.
The authors defined four risk groups based on predicted 1-year mortality: A ≤1%, B 1–5%, C 5–15%, and D >15%. More intensive perioperative intervention bundles were then applied according to risk level in the intervention group.
Postoperative medical complications occurred in 23.7% of the intervention group versus 37.3% in the control group, a roughly 36% relative reduction (OR 0.53; 95% CI, 0.36–0.76;P<0.001). This also translated into cost savings, an estimated US$2,847 per patient during the first year after surgery.
The authors suggest a few reasons this worked: ahigh-quality dataset prospectively collected and widely used for clinical care and research in Denmark; close involvement of colorectal surgeons, which kept the model focused on a real clinical problem; validation on local data, where the model would actually be used; and linking prediction to predefined perioperative intervention bundles.
Clinical Al succeeds when it starts with a real problem, uses high-quality data, and is built to work in practice.
However, some limitations remain. The study cannot establish a causal relationship between personalized treatment and improved outcomes, nor identify which specific component is driving the benefit. The interventions were not fully individualized, and the prospective cohort wass mall. The authors also suggest that future studies could use more data-efficient evaluation methods. I also think performance bias is hard to ignore here, since the prospective cohort was managed by a team aware of the study (classic Hawthorne effect).
So yes, they developed a clinically useful AI tool in this study. Building useful models in medicine is not like predicting mechanical failure from machinery data. We need a real clinical problem, a high-quality dataset, and a way to actually make it work in practice. You don’t often see all three come together. I’d like to see more studies like this: clinical AI tools that don’t just perform well, but actually make it into practice.
Frequented Asked Questions — F.A.Q.
What did the AI model actually predict?
The model predicted 1-year mortality risk after colorectal cancer surgery, classifying patients into four risk groups (≤1%, 1–5%, 5–15%, and >15%). Each group then received a matched perioperative care bundle based on their risk level.
How is this different from other clinical AI studies?
Most clinical AI studies stop at reporting performance metrics and never reach patients. This one went through development, internal validation, external validation on local data, and prospective deployment in real clinical practice — a rare full journey from model to bedside.
Can these results be trusted?
The findings are promising but come with caveats. The study cannot prove causation, the prospective cohort was small, and the care team knew they were part of a study — introducing potential performance bias (Hawthorne effect). Larger, blinded trials would be needed to confirm the magnitude of the benefit.
Reference
Rosen AW, Sørensen MS, Achiam MP, Gögenur I, Rasmussen LS, Achiam MP, et al. AI-guided personalized perioperative treatment in colorectal cancer surgery. Nature Medicine [Internet]. 2025 [cited 2026 May 20]. Available from: https://www.nature.com/articles/s41591-025-03942-x
Originally published at https://www.segmed.ai.
Comments
Post a Comment