American Surgical Association

Development, Validation, and Prospective Testing of a Novel Open Architecture AI Model for Automated Surgical Coding
*Mohamad El Moheb1, *Kristin Putman1, *Olivia Sears1, *Jianjie Ma1, *Yangfeng Ji2, Melina Kibbe1, Craig Kent1, Allan Tsung1
1Department of Surgery, University of Virginia, Charlottesville, VA; 2Department of Computer Science, University of Virginia, Charlottesville, VA

Objective: Operative note coding is time-intensive, with errors leading to significant financial consequences due to lost revenue or overbilling. While artificial intelligence (AI) could improve the efficiency and accuracy of this process, adoption has been limited because current solutions are closed-source proprietary systems that lack transparency and standardized validation. We sought to develop, validate, and prospectively test a novel, transparent, open-architecture, transformer-based AI model designed to extract procedure codes from free-text operative notes, using breast surgery as a case example.
Methods: We developed our model using breast surgery operative notes from July 2017 to December 2023. Notes lacking surgeon-provided Current Procedural Terminology (CPT) codes were excluded. Expert medical coders manually reviewed and validated all CPT codes, establishing a reference standard for evaluation. Surgeon codes were compared against this reference standard to determine rates of overcoding (extraneous codes) and undercoding (missed codes). Data were split into 80% training and 20% validation sets. We then fine-tuned a large-scale clinical transformer model, pretrained on medical corpora, on our dataset, optimizing it for CPT code prediction. Model and surgeon performance were evaluated against the reference standard using area under the curve (AUC), precision (proportion of correct predictions), recall (proportion of codes captured), and F1 score (precision-recall balance). The final model was prospectively tested on notes from May to October 2024.
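The abstract does not name the specific pretrained checkpoint or training framework, so the following is only a minimal sketch of one plausible instantiation of this setup: a Hugging Face clinical BERT checkpoint (Bio_ClinicalBERT) fine-tuned as a multi-label classifier over the CPT code vocabulary. The CPT code list here is an illustrative subset, not the study's label set.

```python
# Sketch: fine-tuning a clinical transformer for multi-label CPT prediction.
# Assumptions (not from the abstract): Bio_ClinicalBERT checkpoint, Hugging
# Face transformers API, and a toy CPT vocabulary.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CPT_CODES = ["19301", "19302", "38525", "38900"]  # illustrative subset only

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "emilyalsentzer/Bio_ClinicalBERT",
    num_labels=len(CPT_CODES),
    problem_type="multi_label_classification",  # sigmoid + BCE loss per code
)

def encode(note: str, codes: list[str]) -> dict:
    """Tokenize one operative note and build its multi-hot label vector."""
    enc = tokenizer(note, truncation=True, max_length=512, return_tensors="pt")
    labels = torch.zeros(len(CPT_CODES))
    for c in codes:
        labels[CPT_CODES.index(c)] = 1.0
    enc["labels"] = labels.unsqueeze(0)
    return enc

# One training example: a note billed with two codes.
batch = encode(
    "Procedure: left partial mastectomy with sentinel lymph node biopsy...",
    ["19301", "38525"],
)
loss = model(**batch).loss  # BCEWithLogitsLoss; backpropagate during training
```

Framing the task as multi-label classification lets a single forward pass score every candidate code, matching the one-note-to-many-codes structure of operative billing.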
Results: Our dataset included 3,259 operative notes with a total of 8,041 CPT codes. We first compared surgeon-provided codes against expert-validated codes across all notes, revealing discrepancies in 12% (n=380) of notes, with 8% (n=255) containing extraneous codes and 10% (n=309) missing codes. The AI model's performance was then evaluated against the reference standard in the validation set and compared to surgeons' coding. The model demonstrated strong performance, with an F1 score of 0.969, AUC of 0.998, precision of 98.3%, and recall of 95.5%, surpassing surgeon performance on all metrics except recall, where surgeons achieved a marginally higher rate (95.8% vs. 95.5%). In prospective testing on 268 notes, the model maintained robust performance, outperforming surgeons across all metrics (Table). A user-friendly interface was developed for model interaction (Figure).
Conclusions: We developed, validated, and prospectively tested a novel AI system that demonstrated robust performance in automatically extracting CPT codes from free-text breast surgery operative notes, consistently outperforming surgeons. As a user-friendly and transparent solution with publicly available architecture, our model has the potential to streamline billing, reduce errors, and alleviate administrative burdens while removing cost barriers to enable widespread adoption for enhanced surgical coding nationwide.
Comparative Performance of AI Model and Surgeons in Predicting CPT Codes.
Metric       Surgeons' performance   Transformer-based AI model
                                     Validation      Testing
F1 Score     0.959                   0.969           0.969
AUC          0.974                   0.998           0.996
Precision    96.9%                   97.8%           97.9%
Recall       95.8%                   95.5%           96.4%

Metrics were computed using weighted averaging.
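As a concrete reading of that footnote, the sketch below computes the same weighted-average metrics with scikit-learn on multi-hot prediction arrays. The arrays and threshold are toy values for illustration, not the study's data.

```python
# Sketch: weighted-average metrics over multi-label CPT predictions.
# y_true and y_pred are multi-hot arrays (notes x codes); y_prob holds the
# model's per-code probabilities. All values below are toy data.
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_true = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.6], [0.95, 0.6, 0.1]])
y_pred = (y_prob >= 0.5).astype(int)  # apply the probability threshold

# average="weighted" scores each CPT code separately, then weights by its
# support, so frequent codes contribute more to the reported number.
print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("Recall:   ", recall_score(y_true, y_pred, average="weighted"))
print("F1 score: ", f1_score(y_true, y_pred, average="weighted"))
print("AUC:      ", roc_auc_score(y_true, y_prob, average="weighted"))
```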
The User-Friendly Interface of the AI-Driven Model for Automated CPT Code Prediction. Users enter an operative note and set a probability threshold, which controls the algorithm's sensitivity (a threshold of 0.5 is typically used).
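The thresholding behavior the caption describes can be sketched as a post-processing step on the model's sigmoid outputs. This assumes the fine-tuned model, tokenizer, and hypothetical CPT_CODES list from the training sketch above.

```python
# Sketch: threshold-controlled inference, as exposed by the interface.
# Assumes `model`, `tokenizer`, and `CPT_CODES` from the training sketch.
import torch

def predict_codes(note: str, threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (CPT code, probability) pairs whose probability clears the threshold."""
    enc = tokenizer(note, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        probs = torch.sigmoid(model(**enc).logits).squeeze(0)
    # Lowering the threshold raises sensitivity (more codes suggested);
    # raising it favors precision, as the caption's slider implies.
    return [(code, p.item()) for code, p in zip(CPT_CODES, probs) if p >= threshold]
```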