Ki-67 is not one scoring method. Depending on institutional convention, clinical indication, and the specific scoring guideline in use, a pathologist may be asked to report a global proliferation index (percentage of Ki-67-positive tumor cells across the entire invasive carcinoma), a hotspot score (the Ki-67 index in the area of highest positivity density), or both. These two numbers tell different biological stories, and for certain tumor types and clinical contexts, they are not interchangeable.
For algorithmic Ki-67 scoring, getting the number right in absolute terms is only part of the problem. The harder question is whether the algorithm is computing the biologically meaningful number — and for hotspot scoring, that requires spatial reasoning about where on the slide to look, not just how to count what is there.
Global Score vs. Hotspot Score: Why the Distinction Matters
In breast carcinoma, the St. Gallen International Expert Consensus has consistently identified Ki-67 as a marker of tumor proliferative activity with prognostic implications, particularly for luminal B versus luminal A classification. But the consensus has also consistently noted that the optimal scoring method is not universally agreed upon. The International Ki-67 in Breast Cancer working group (IKWG), which has produced the most rigorous methodological guidance, recommends a global average score from at least four representative tumor regions — acknowledging that local hotspot scoring amplifies the highest Ki-67 density at the expense of whole-tumor representativeness.
For neuroendocrine tumors (NETs), the WHO 2022 classification uses the Ki-67 proliferation index as a primary grading determinant (G1: <3%, G2: 3–20%, G3: >20%). Here the guidance explicitly recommends hotspot scoring: the pathologist should identify the area of highest proliferative activity and count 500 to 2,000 cells in that region. A global average would systematically underestimate the proliferative rate of the most aggressive region, potentially misclassifying a G2 NET as G1.
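The misclassification risk can be made concrete with a minimal sketch. The function names and the cell counts are hypothetical, chosen only for illustration, and the sketch covers the Ki-67 bands quoted above only, not the rest of the grading criteria.

```python
def ki67_index(positive_nuclei: int, total_nuclei: int) -> float:
    """Ki-67 index as a percentage of counted tumor nuclei."""
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= positive_nuclei <= total_nuclei:
        raise ValueError("positive_nuclei must lie between 0 and total_nuclei")
    return 100.0 * positive_nuclei / total_nuclei


def net_grade_from_ki67(ki67_percent: float) -> str:
    """Apply the Ki-67 bands quoted above: G1 <3%, G2 3-20%, G3 >20%."""
    if not 0.0 <= ki67_percent <= 100.0:
        raise ValueError("Ki-67 index must be between 0 and 100 percent")
    if ki67_percent < 3.0:
        return "G1"
    if ki67_percent <= 20.0:
        return "G2"
    return "G3"


# Hypothetical counts for a single NET slide (illustrative, not real data).
hotspot_pct = ki67_index(40, 800)      # 5.0% in the most proliferative region
global_pct = ki67_index(200, 10_000)   # 2.0% averaged over all tumor tissue

print(net_grade_from_ki67(hotspot_pct))  # grade from the hotspot count
print(net_grade_from_ki67(global_pct))   # grade a global average would give
```

With these illustrative counts the hotspot index falls in the G2 band while the global average falls in the G1 band, which is exactly the downgrade the hotspot recommendation is meant to prevent.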
These two contexts, breast carcinoma and neuroendocrine tumors, encode opposite scoring philosophies in their clinical guidelines, both for legitimate biological reasons. An algorithm that computes only a global score and applies it to a NET is not just methodologically incorrect; it is clinically dangerous. The algorithm must know which scoring protocol is required for the tumor type and clinical indication before it reports a number.
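One way to enforce this is to make the protocol an explicit, mandatory lookup rather than a default. The sketch below is an assumption about how such a guard could look; the registry keys and names are hypothetical, and a real deployment would populate the table from the institution's own guideline choices.

```python
from enum import Enum


class ScoringProtocol(Enum):
    GLOBAL = "global"
    HOTSPOT = "hotspot"


# Hypothetical registry reflecting the two guideline contexts discussed above.
PROTOCOL_BY_TUMOR_TYPE = {
    "breast_carcinoma": ScoringProtocol.GLOBAL,
    "neuroendocrine_tumor": ScoringProtocol.HOTSPOT,
}


def required_protocol(tumor_type: str) -> ScoringProtocol:
    """Return the configured protocol, refusing to guess for unknown tumor types."""
    try:
        return PROTOCOL_BY_TUMOR_TYPE[tumor_type]
    except KeyError:
        raise ValueError(
            f"No Ki-67 scoring protocol configured for {tumor_type!r}"
        ) from None
```

Failing loudly on an unconfigured tumor type is the design choice here: a missing number is safer than a number computed under the wrong protocol.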
How Hotspot Localization Works Computationally
The hotspot problem is fundamentally a spatial density estimation problem. On a whole-slide image of an invasive carcinoma, the task is to identify a contiguous region of the tumor parenchyma where the density of Ki-67-positive nuclei is highest, and then compute the proliferation index within that region. This requires three sequential operations:
First, tissue segmentation: distinguishing invasive tumor from stroma, in situ components, normal epithelium, necrosis, and artifact. Ki-67-positive fibroblasts and endothelial cells are not relevant to the tumor proliferation index, and including them inflates the score. A densely stromal tumor (e.g., desmoplastic infiltrating lobular carcinoma) can have a high density of Ki-67-positive stromal nuclei that has nothing to do with tumor cell proliferative activity. Tissue segmentation that is insufficiently specific to invasive carcinoma cells will produce a score that is not what the guidelines are asking for.
Second, nucleus detection and classification: identifying individual Ki-67-positive and Ki-67-negative nuclei within the segmented tumor tissue, operating at high magnification (typically 20x or 40x equivalents). This is where cell-level deep learning models operate: each detected nucleus in the IHC-stained slide is classified as positive (DAB-brown nuclear staining) or negative (hematoxylin-blue counterstain only). The accuracy of this step determines the reliability of both the numerator (positive tumor nuclei) and the denominator (all tumor nuclei) of the index.
Third, hotspot localization: computing a spatial density map of positive cell fraction across the tumor tissue, and identifying the region of peak density. The definition of "hotspot" involves choices: what is the minimum region size (to avoid a single high-magnification field of 10 cells)? What spatial smoothing kernel is applied to avoid over-fitting to small local fluctuations? Should the hotspot be a fixed-area window (e.g., 1 mm², roughly the area of several high-power fields in manual counting) or an algorithmically determined contiguous high-density region?
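The three steps can be sketched on toy data. This is a minimal illustration, not a production pipeline: it assumes the segmentation and nucleus-classification models have already run and attached two flags to each nucleus. The `Nucleus` record, the tile size, and the minimum-cell rule are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Nucleus(NamedTuple):
    x_um: float      # position on the slide, micrometers
    y_um: float
    in_tumor: bool   # step 1: output of tissue segmentation (assumed given)
    positive: bool   # step 2: output of the nucleus classifier (assumed given)


def tile_density_map(
    nuclei: Iterable[Nucleus], tile_um: float = 100.0, min_cells: int = 50
) -> Dict[Tuple[int, int], float]:
    """Step 3 input: percent Ki-67-positive tumor nuclei per square tile."""
    counts = defaultdict(lambda: [0, 0])  # tile -> [positive, total]
    for n in nuclei:
        if not n.in_tumor:
            continue  # stromal and other non-tumor nuclei never enter the index
        tile = (int(n.x_um // tile_um), int(n.y_um // tile_um))
        counts[tile][1] += 1
        if n.positive:
            counts[tile][0] += 1
    # Drop sparsely populated tiles so a handful of cells cannot define a hotspot.
    return {
        tile: 100.0 * pos / total
        for tile, (pos, total) in counts.items()
        if total >= min_cells
    }


# Toy slide (illustrative only).
nuclei = (
    [Nucleus(10.0, 10.0, True, i < 30) for i in range(60)]   # 60 tumor nuclei, 30 positive
    + [Nucleus(20.0, 20.0, False, True) for _ in range(40)]  # 40 positive stromal nuclei
    + [Nucleus(150.0, 10.0, True, True) for _ in range(10)]  # tile with too few cells
)
density = tile_density_map(nuclei)
```

In this toy case the 40 positive stromal nuclei are excluded, and the ten-cell tile is dropped even though all of its nuclei are positive, leaving a single scored tile at 50%.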
The Fixed-Window vs. Adaptive Hotspot Debate
Manual hotspot scoring follows a two-step routine: the pathologist scans the slide at low-to-medium magnification, identifies a candidate hot area, then switches to high power to count. The result is a fixed-area count within an area selected by visual judgment. Two pathologists looking at the same slide will often select overlapping but not identical hotspot regions, which is a primary driver of the wide ICC range (0.59–0.92) observed in published Ki-67 inter-observer studies.
An algorithm can take two approaches to formalizing this. The fixed-window approach: slide the scoring window across the tumor at a fixed size (say, 1 mm²), compute the Ki-67 index in each window position, and report the maximum. This is geometrically clean and analogous to the manual approach. It is also sensitive to the window size selection: a 0.5 mm² window will generally produce a higher hotspot score than a 2 mm² window on the same case, because the smaller window isolates the densest region while the larger one dilutes it with surrounding lower-density tissue.
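A minimal sketch of the fixed-window approach, assuming per-tile positive and total tumor-nucleus counts are already available on a regular grid. The grid values are invented for illustration, and summed-area tables are one convenient implementation choice rather than a requirement.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]


def summed_area(grid: Grid) -> Grid:
    """Summed-area table with a zero border, so any window sum costs O(1)."""
    rows, cols = len(grid), len(grid[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
    return sat


def window_sum(sat: Grid, r: int, c: int, size: int) -> int:
    return sat[r + size][c + size] - sat[r][c + size] - sat[r + size][c] + sat[r][c]


def fixed_window_hotspot(
    pos: Grid, tot: Grid, size: int, min_cells: int = 1
) -> Optional[Tuple[float, int, int]]:
    """Return (index_percent, row, col) of the best size-by-size tile window."""
    pos_sat, tot_sat = summed_area(pos), summed_area(tot)
    best = None
    for r in range(len(pos) - size + 1):
        for c in range(len(pos[0]) - size + 1):
            n = window_sum(tot_sat, r, c, size)
            if n < min_cells:
                continue  # too few tumor nuclei to score this position
            index = 100.0 * window_sum(pos_sat, r, c, size) / n
            if best is None or index > best[0]:
                best = (index, r, c)
    return best


# Toy 4x4 grid, 100 tumor nuclei per tile (illustrative counts only).
tot = [[100] * 4 for _ in range(4)]
pos = [
    [10, 10, 10, 10],
    [10, 60, 40, 10],
    [10, 40, 30, 10],
    [10, 10, 10, 10],
]
for size in (1, 2, 4):
    print(size, fixed_window_hotspot(pos, tot, size))
```

On this toy grid the reported hotspot index drops from 60% with a one-tile window to 42.5% with a two-by-two window and 18.125% when the window covers the whole grid, which is the window-size sensitivity described above.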
The adaptive approach: use the full spatial density map to identify the contiguous region of highest mean positivity above a density threshold, regardless of absolute area. This better captures biological hot regions that may not fit cleanly into a fixed geometric window, but it requires a more complex algorithmic definition and produces results that are harder to compare across cases.
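One possible formalization of the adaptive approach is sketched below: threshold the per-tile index, group the surviving tiles into 4-connected regions, and report the region with the highest pooled index that still contains enough cells. The threshold, the connectivity rule, and the minimum cell count are all assumptions for illustration; other definitions are equally defensible.

```python
from collections import deque
from typing import List, Optional, Tuple

Grid = List[List[int]]


def adaptive_hotspot(
    pos: Grid, tot: Grid, threshold_pct: float, min_cells: int = 300
) -> Optional[Tuple[float, List[Tuple[int, int]]]]:
    """Return (pooled_index_percent, tiles) for the best contiguous hot region."""
    rows, cols = len(pos), len(pos[0])

    def is_hot(r: int, c: int) -> bool:
        return tot[r][c] > 0 and 100.0 * pos[r][c] / tot[r][c] >= threshold_pct

    seen = set()
    best = None
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not is_hot(r, c):
                continue
            # Breadth-first search collects one 4-connected region of hot tiles.
            queue, tiles = deque([(r, c)]), []
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                tiles.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen and is_hot(nr, nc)):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            n_tot = sum(tot[a][b] for a, b in tiles)
            if n_tot < min_cells:
                continue  # region too small to give a stable index
            index = 100.0 * sum(pos[a][b] for a, b in tiles) / n_tot
            if best is None or index > best[0]:
                best = (index, sorted(tiles))
    return best


# Toy 4x4 grid, 100 tumor nuclei per tile (illustrative counts only).
tot = [[100] * 4 for _ in range(4)]
pos = [
    [10, 10, 10, 10],
    [10, 60, 40, 10],
    [10, 40, 30, 10],
    [10, 10, 10, 10],
]
region = adaptive_hotspot(pos, tot, threshold_pct=30.0)
```

Note that the region's area is an output of the method rather than an input, which is why scores from this approach are harder to compare across cases: two slides can yield hotspots of very different sizes.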
We are not saying one approach is categorically superior. The fixed-window approach is closer to how pathologist counting is defined in guidelines and is therefore more appropriate when the algorithm's output will be directly compared to guideline-defined pathologist scores. The adaptive approach may be more sensitive as a biological discovery tool. In a clinical scoring context, alignment with the applicable guideline's definition of hotspot is the constraint, not algorithmic elegance.
Reporting Both Scores: The Practical Case
For breast carcinoma, where the IKWG global scoring recommendation exists alongside widespread clinical use of hotspot scoring in many institutions, the most informative algorithmic output reports both the global mean Ki-67 index across the tumor parenchyma and the hotspot Ki-67 index in the highest-density region, together with the spatial map showing where the hotspot is located.
This dual reporting has an immediate practical benefit. When a global score is 18% and the hotspot score is 35%, the spatial heatmap reveals a tumor with highly heterogeneous proliferative activity — a pattern that may have different implications for prognosis than a homogeneously high-proliferative tumor with both scores converging near 30%. The pathologist reviewing the algorithmic output has more information, not less, and can apply clinical judgment to how the score interacts with the specific clinical context.
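A dual report can be represented as a small record that carries both scores and the hotspot location together. The class and field names below are hypothetical, and the gap between the two scores is offered only as a simple descriptive summary, not as a validated heterogeneity metric.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ki67Report:
    global_pct: float                       # index over all tumor parenchyma
    hotspot_pct: float                      # index in the highest-density region
    hotspot_center_mm: Tuple[float, float]  # where on the slide the hotspot sits

    @property
    def hotspot_gap_points(self) -> float:
        """Hotspot minus global index, in percentage points."""
        return self.hotspot_pct - self.global_pct


# The two cases from the text; hotspot coordinates are made up for illustration.
heterogeneous = Ki67Report(18.0, 35.0, (4.2, 7.9))
homogeneous = Ki67Report(30.0, 30.0, (3.1, 5.5))
print(heterogeneous.hotspot_gap_points, homogeneous.hotspot_gap_points)
```

A large gap flags a case where the heatmap deserves a closer look; a gap near zero indicates that the two scoring methods agree for that tumor.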
Consider a needle biopsy of an 8 mm invasive ductal carcinoma submitted to an academic cancer center's breast pathology unit, where the tumor tissue is limited and fragmented across multiple cores in the cassette. Here the global versus hotspot distinction becomes even more important. Limited tissue means the global score may be dominated by the specific core fragments sampled, which may not be representative. An algorithm that identifies the hotspot on the most cellular core fragment while also reporting the aggregate global score gives the pathologist a more complete picture of what the biopsy can and cannot tell them about the whole tumor's proliferative activity.
Calibration Against Clinical Thresholds
For breast carcinoma, clinical decision thresholds for Ki-67 vary by guideline and institutional convention. Cutoffs of 14%, 20%, and 30% for luminal A/B stratification have all been proposed in different guidelines, and the optimal threshold remains an active area of research. An algorithm reporting Ki-67 percentage should be calibrated against the scoring convention used by its reference pathologist panel: a score of 20% in Synthia's output should mean the same thing as 20% assigned by those pathologists using the same method. It should not be assumed to match 20% obtained by a different method on a different scanner.
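The practical consequence of competing cutoffs is easy to show. In the sketch below, the function name and the "at or above the cutoff counts as high" rule are assumptions; guidelines differ on whether the boundary value itself is included.

```python
from typing import Dict, Iterable

# Cutoffs mentioned above, in percent.
PROPOSED_CUTOFFS = (14.0, 20.0, 30.0)


def calls_by_cutoff(ki67_percent: float, cutoffs: Iterable[float]) -> Dict[float, str]:
    """Label one score as 'high' or 'low' under each candidate cutoff.

    Assumption: a score equal to the cutoff is called 'high'.
    """
    return {c: "high" if ki67_percent >= c else "low" for c in cutoffs}


# The same illustrative score of 18% lands on different sides of different cutoffs.
calls = calls_by_cutoff(18.0, PROPOSED_CUTOFFS)
print(calls)
```

Because a single score can change category with the cutoff in use, a systematic offset of a few percentage points between the algorithm and its reference panel is enough to flip calls near any of these boundaries.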
This is why scanner-specific validation matters. A model trained and validated on Aperio GT 450 images may produce systematically different absolute Ki-67 percentages when applied to Hamamatsu NanoZoomer images of the same tissue, because the DAB color rendering, hematoxylin blue intensity, and contrast characteristics differ between scanners. Scanner-agnostic claims require cross-scanner validation data, and any honest capability disclosure should specify which scanner platforms have been validated and what the inter-scanner concordance looks like.
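Inter-scanner concordance can be summarized with the mean bias and Lin's concordance correlation coefficient, which penalizes systematic offsets that Pearson correlation ignores. This is one reasonable choice of metrics, not a prescribed validation protocol, and the paired scores below are invented for illustration.

```python
from statistics import fmean
from typing import Sequence


def mean_bias(a: Sequence[float], b: Sequence[float]) -> float:
    """Average of (b - a) over paired scores, in percentage points."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equally long score lists")
    return fmean([y - x for x, y in zip(a, b)])


def lins_ccc(a: Sequence[float], b: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient using population moments."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired scores")
    mean_a, mean_b = fmean(a), fmean(b)
    var_a = fmean([(x - mean_a) ** 2 for x in a])
    var_b = fmean([(y - mean_b) ** 2 for y in b])
    cov = fmean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])
    denom = var_a + var_b + (mean_a - mean_b) ** 2
    if denom == 0:
        raise ValueError("scores are constant and identical; CCC is undefined")
    return 2 * cov / denom


# Hypothetical Ki-67 percentages for the same four cases on two scanners.
scanner_a = [10.0, 20.0, 30.0, 40.0]
scanner_b = [15.0, 25.0, 35.0, 45.0]  # same ranking, constant +5 point offset

print(mean_bias(scanner_a, scanner_b))
print(lins_ccc(scanner_a, scanner_b))
```

In this example the two scanners rank the cases identically, so Pearson correlation would be 1, yet the concordance coefficient comes out at about 0.91 because of the constant five-point offset. That difference is what a cross-scanner validation needs to surface.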