The United States and Canadian Academy of Pathology (USCAP) Annual Meeting is the largest pathology conference in North America, and over the last several years its digital pathology and computational pathology programming has grown substantially relative to the overall meeting size. The 2026 meeting reflected a field that has moved past the proof-of-concept phase — the conversation has shifted from "can AI analyze pathology images?" to "how do we validate it rigorously, integrate it into clinical workflows, and navigate the regulatory pathway to clinical deployment?"
This is a synthesis of the themes that stood out from the oral presentations, educational sessions, and short course content focused on computational IHC and digital pathology. It is not a verbatim session summary, and individual presentation conclusions are reported in general terms without speaker attribution.
Validation Methodology Is the Central Conversation
More sessions this year than in previous years directly addressed what good validation looks like for AI pathology tools: not just which performance metrics to report, but how study design choices affect what those metrics mean. Several educational talks covered how to interpret the intraclass correlation coefficient (ICC) and kappa statistics, questions we have addressed in earlier posts on this blog, which itself signals that the community is grappling with interpretation variability in the literature.
A recurring theme was the inadequacy of single-site validation studies. Concordance data generated at the institution that developed or primarily trained an algorithm cannot be taken as generalizable evidence of clinical performance. The inter-site variation in staining platforms, scanner models, fixation protocols, and pathologist scoring norms is large enough that a model performing at ICC 0.94 at its home institution may perform at ICC 0.85 or lower when deployed at a different center with different infrastructure. Multi-site, prospective validation with pre-specified primary endpoints and blinded reference-standard reads is increasingly the expectation for tools being positioned for clinical use — and several poster presentations at the meeting demonstrated exactly this study design, including for IHC scoring tasks.
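To make the per-site comparison concrete, here is a minimal sketch of computing ICC separately for each site rather than pooling all cases. It assumes the ICC(2,1) form (two-way random effects, absolute agreement, single measurement), which is one common choice and not necessarily the form used in any presented study. The function name `icc_2_1`, the site labels, and all scores are hypothetical illustration data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: list of rows, one per case; each row holds one score per rater
    (here: [algorithm_score, reference_score]).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 cases and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: cases (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical (algorithm, reference) percent-positive scores per site.
site_scores = {
    "home_site": [[5, 6], [20, 21], [45, 44], [70, 72], [90, 89]],
    "external_site": [[5, 12], [20, 30], [45, 50], [70, 85], [90, 95]],
}

per_site_icc = {site: icc_2_1(rows) for site, rows in site_scores.items()}
for site, value in per_site_icc.items():
    print(f"{site}: ICC(2,1) = {value:.3f}")
```

Because this form measures absolute agreement, a systematic offset between algorithm and reference at one site lowers that site's ICC even when the rank order of cases is preserved, which is the kind of shift a pooled single-site figure can hide.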
There was also substantive discussion of cohort composition requirements. The consensus position that emerged from multiple sessions was consistent with what statistical power analysis suggests: you need case stratification, not just an adequate total sample size. A study of 300 cases where 240 are unambiguous 0 or 3+ cases leaves only 60 cases outside those categories, so it provides less useful information than a study of 200 cases with 80 equivocal or boundary cases. The equivocal band is where the algorithm's clinical value is most relevant, and studies that are not powered to demonstrate performance in that band leave the most important clinical question unanswered.
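The stratification point can be illustrated with a quick precision calculation. The sketch below compares the width of a 95% Wilson score interval for algorithm-versus-reference agreement in the equivocal band under the two cohort designs described above. The 85% observed agreement rate is an assumed figure chosen for illustration, and `wilson_interval` is a helper written for this example, not a function from any particular library.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Equivocal-band case counts from the two designs in the text:
# 300 total with 240 unambiguous -> 60 equivocal; 200 total with 80 equivocal.
designs = {"300 cases, 60 equivocal": 60, "200 cases, 80 equivocal": 80}
assumed_agreement = 0.85  # illustrative assumption, not a reported result

widths = {}
for label, n_equivocal in designs.items():
    agree = round(assumed_agreement * n_equivocal)
    low, high = wilson_interval(agree, n_equivocal)
    widths[label] = high - low
    print(f"{label}: 95% CI for agreement = ({low:.3f}, {high:.3f})")
```

Under these assumptions the smaller study yields the narrower interval in the band that matters, because interval width depends on the number of equivocal cases rather than on the total case count.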
SaMD Regulatory Discussions Were Substantive and Practical
The regulatory programming this year was better attended and more technically detailed than in previous years, reflecting the number of groups in the field who are actively planning or pursuing 510(k) or De Novo submissions. A workshop on SaMD classification and the Clinical Decision Support (CDS) provisions of the 21st Century Cures Act was standing-room only, a tangible sign that the regulatory question is no longer abstract for many development teams.
The CDS exclusion question generated significant discussion. The four-part test for CDS software that is not a device — including the requirement that a clinician can "independently review the basis for the recommendations" — is interpreted by FDA as requiring that the algorithm's underlying data is surfaced to the reviewer, not just its output. Several speakers emphasized that a black-box score without supporting data is less likely to qualify for CDS exclusion than a score accompanied by the spatial heatmap, cell-count data, and confidence information that allows the pathologist to actually evaluate whether the score is reasonable. This design principle aligns with both regulatory positioning and clinical trust-building.
The predetermined change control plan (PCCP) framework received dedicated coverage. A PCCP is FDA's mechanism for allowing ML-based SaMD to undergo post-market algorithm modifications without a new 510(k) submission, provided the modifications stay within a specification that FDA authorized in advance. For tools using models that may be updated as additional training data becomes available, the PCCP is an important operational consideration that should be part of the regulatory strategy from the outset. Building the PCCP documentation structure before the initial submission is substantially more tractable than retrofitting it afterward.
IHC Biomarker Scoring: Where the Clinical Interest Is Concentrated
Within the computational pathology space, IHC quantification for predictive biomarkers — HER2, PD-L1, and proliferation markers — remained the area of most active clinical interest. The motivation is straightforward: these markers have established companion diagnostic or prognostic roles, their clinical significance is high, and the inter-observer variability problem in manual scoring is well-documented. These factors combine to make IHC quantification the highest-value target for algorithmic assistance in the near term.
PD-L1 scoring complexity received sustained attention across multiple sessions. The multi-assay, multi-indication landscape — 22C3 for multiple pembrolizumab indications, 28-8 for nivolumab, SP142 for atezolizumab — continues to create practical challenges for algorithmic tools. Several presentations highlighted concordance data between different PD-L1 clones on the same tissue and the implications for cross-clone normalization approaches. The consistent message was that clone-specific models validated against companion diagnostic-equivalent reference reads are required for clinical-grade scoring — general-purpose PD-L1 models without clone specificity are not adequate for the companion diagnostic context.
Ki-67 scoring methodology also generated discussion, particularly around the global versus hotspot debate for breast carcinoma and the International Ki-67 in Breast Cancer Working Group recommendations. The IKWG's systematic work to standardize Ki-67 scoring methodology has begun to influence algorithmic tool design, and several presented studies used the IKWG global scoring protocol as the reference standard — which is the right choice for reproducibility-focused validation in that indication.
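The difference between the two approaches is easy to see in a small sketch. The version below is deliberately simplified: it treats the global index as positive nuclei pooled across all counted fields and the hotspot index as the single highest-scoring field. The actual IKWG global protocol specifies field selection and weighting in more detail, so this is an illustration of the concept rather than an implementation of the protocol. The function name and field counts are hypothetical.

```python
def ki67_indices(fields):
    """Return (global_index, hotspot_index) in percent.

    fields: list of (positive_nuclei, total_nuclei) tuples, one per counted field.
    Simplifying assumption: 'global' pools counts across all fields, and
    'hotspot' is the field with the highest positive fraction.
    """
    if not fields or any(total <= 0 for _, total in fields):
        raise ValueError("each field needs a positive total nucleus count")
    pooled_positive = sum(pos for pos, _ in fields)
    pooled_total = sum(total for _, total in fields)
    global_index = 100 * pooled_positive / pooled_total
    hotspot_index = max(100 * pos / total for pos, total in fields)
    return global_index, hotspot_index


# Hypothetical counts from four fields of a heterogeneous tumor.
example_fields = [(10, 100), (20, 100), (60, 100), (10, 100)]
global_index, hotspot_index = ki67_indices(example_fields)
print(f"global: {global_index:.1f}%  hotspot: {hotspot_index:.1f}%")
```

In a heterogeneous tumor the two numbers can diverge widely, which is why the choice of reference method has to be fixed before an algorithm is validated against it.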
Digital Pathology Infrastructure: The Integration Gap
A short course on digital pathology implementation at academic health systems addressed a gap that practitioners encounter frequently: the science of whole-slide imaging and AI analysis is ahead of the infrastructure needed to integrate it into clinical workflows. DICOM-WSI storage, DICOMweb access patterns for AI analysis versus interactive viewing, HL7 result routing from algorithmic tools to laboratory information systems (LIS), and structured result display in the pathology sign-out interface are all solvable problems, but each requires specific engineering work that most clinical sites are not equipped to do independently.
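As one illustration of how the access patterns differ, the sketch below builds DICOMweb (WADO-RS) frame retrieval URLs for a tiled whole-slide image instance. An interactive viewer typically asks for the few frames covering the current viewport, while an analysis job walks every frame in batches. The URL path layout follows the WADO-RS frame retrieval resource; the base URL, the UIDs, the batch size, and both function names are hypothetical placeholders.

```python
def frames_url(base_url, study_uid, series_uid, instance_uid, frame_numbers):
    """Build a WADO-RS frame retrieval URL (frame numbers are 1-based)."""
    if not frame_numbers or any(f < 1 for f in frame_numbers):
        raise ValueError("frame_numbers must be a non-empty list of integers >= 1")
    frame_list = ",".join(str(f) for f in frame_numbers)
    return (
        f"{base_url.rstrip('/')}/studies/{study_uid}/series/{series_uid}"
        f"/instances/{instance_uid}/frames/{frame_list}"
    )


def batched_frame_urls(base_url, study_uid, series_uid, instance_uid,
                       total_frames, batch_size):
    """Analysis-style access: cover every frame, batch_size frames per request."""
    urls = []
    for start in range(1, total_frames + 1, batch_size):
        stop = min(start + batch_size, total_frames + 1)
        urls.append(frames_url(base_url, study_uid, series_uid, instance_uid,
                               list(range(start, stop))))
    return urls


# Hypothetical endpoint and UIDs, for illustration only.
BASE = "https://pacs.example.org/dicomweb"
STUDY, SERIES, INSTANCE = "1.2.3", "1.2.3.4", "1.2.3.4.5"

# Viewer-style access: only the frames under the current viewport.
viewer_url = frames_url(BASE, STUDY, SERIES, INSTANCE, [17, 18])

# Analysis-style access: every frame of a (tiny, hypothetical) 5-frame level.
analysis_urls = batched_frame_urls(BASE, STUDY, SERIES, INSTANCE,
                                   total_frames=5, batch_size=2)
print(viewer_url)
print(len(analysis_urls), "batch requests")
```

The two patterns stress an archive differently: the viewer needs low latency on small requests, while the analysis job needs sustained throughput, which is part of why a site has to plan for both.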
The practical implication is that AI pathology tools with well-developed integration packages — documented HL7 interfaces, DICOM conformance statements, LIS-specific integration playbooks — have a meaningful competitive advantage over tools with better algorithms but weaker integration support. A pathology department evaluating tools on algorithm performance alone will encounter the integration gap at deployment time; a department that evaluates integration support in the selection process is more likely to achieve timely deployment.
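To give a sense of what a documented HL7 interface has to pin down, here is a minimal sketch of formatting a single numeric result as an HL7 v2 OBX segment, the kind of line an algorithmic tool would send to an LIS inside a result message. The observation code, its label, and the use of `L` as a local coding-system identifier are placeholder assumptions. A real interface would use the codes and message profile agreed with the receiving system, and would also need to escape HL7 delimiter characters in values, which this sketch does not do.

```python
def build_obx(set_id, code, label, value, units, status="F"):
    """Format one numeric observation as an HL7 v2 OBX segment.

    Field positions: OBX-1 set ID, OBX-2 value type (NM = numeric),
    OBX-3 observation identifier, OBX-4 sub-ID (empty), OBX-5 value,
    OBX-6 units, OBX-7..OBX-10 left empty, OBX-11 result status (F = final).
    """
    fields = [
        "OBX",
        str(set_id),
        "NM",
        f"{code}^{label}^L",  # 'L' marks a local code here (assumption)
        "",
        str(value),
        units,
        "", "", "", "",
        status,
    ]
    return "|".join(fields)


# Hypothetical local code for an algorithm-derived Ki-67 index.
segment = build_obx(1, "KI67_GLOBAL", "Ki-67 global index", 25.0, "%")
print(segment)
```

Even for one number, the sending and receiving sides have to agree on the identifier, units, and status semantics, and that agreement is what an integration playbook captures.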
Workforce and Training Themes
A theme that appeared in multiple sessions — not always explicitly in the computational pathology programming — was the changing nature of pathologist training in the digital pathology era. The perceptual skills developed by trainees on glass slides are not identical to the skills needed for digital review, and departments have variable approaches to integrating digital workflow training into residency and fellowship programs.
For AI-assisted workflows specifically, the question of how to train pathologists to evaluate algorithmic outputs critically — rather than accepting them uncritically or rejecting them reflexively — is emerging as a curriculum gap. A pathologist who understands what ICC means, how to evaluate a spatial heatmap for plausibility, and when to override an algorithmic score with appropriate documentation is a different kind of user than one who treats the score as either ground truth or noise. Developing that evaluative competency is not something that happens automatically from deploying a tool; it requires intentional training.
The overall tone of USCAP 2026 was measured optimism — a field that has seen enough early hype and enough disappointing deployments to be appropriately careful, but also a field with enough real progress in validation methodology, regulatory clarity, and integration infrastructure to believe that clinical-grade AI pathology tools are achievable in the near to medium term. The challenge is executing on the science and the regulatory pathway in parallel, and doing the validation work rigorously enough that the results mean something when they land on a pathologist's review screen.