[263] An Interobserver Concordance Study Reporting Estrogen Receptor (ER) in Invasive Breast Tumors Reveals a High Prevalence (4.9%) of Discordant Positive Versus Negative Results, Despite Excellent Overall Concordance

Emily S Reisenbichler, Amy Ly, Susan C Lester, Jane E Brock. Brigham and Women's Hospital, Boston, MA

Background: The 2010 ASCO/CAP guidelines for reporting ER set a 1% threshold for a positive result, based on a 1990s study showing improved outcome with hormone therapy with 1% weak staining (Allred score 3). In that study, one observer reviewed 1,982 cases and a second observer reviewed 11% of them, for a concordance score of 0.87 using the 6F11 ER antibody. Here, a group of six dedicated breast pathologists reports the concordance in ER scoring by Allred and H-score using two widely used ER antibodies and reveals a hitherto unappreciated discordance in reporting cases around the 1% threshold.
Design: Routinely processed consecutive cases of invasive breast carcinoma were stained with ER-SP1 and ER-1D5 and slides were scored for Allred (0-8) and H-score (0-300) by 3 observers. Cases with positive versus negative discordance between the original 3 observers were reviewed by 3 additional observers.
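The scoring and dichotomization described above can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure: it assumes the conventional H-score definition (percentage of weakly, moderately and strongly staining cells weighted by 1, 2 and 3), and the function names are hypothetical. The positive/negative cutoffs are the ones stated in this abstract (Allred >2; H-score ≥1).

```python
def h_score(pct_weak: float, pct_moderate: float, pct_strong: float) -> float:
    """Conventional H-score: 1*%weak + 2*%moderate + 3*%strong (range 0-300).

    Assumption: this is the standard definition; the abstract only states
    the 0-300 range, not the formula used.
    """
    percentages = (pct_weak, pct_moderate, pct_strong)
    if any(p < 0 for p in percentages) or sum(percentages) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def er_positive_by_h_score(score: float) -> bool:
    """Positive if H-score >= 1 (cutoff taken from the abstract)."""
    return score >= 1


def er_positive_by_allred(score: int) -> bool:
    """Positive if Allred score > 2 (cutoff taken from the abstract)."""
    if not 0 <= score <= 8:
        raise ValueError("Allred score must be between 0 and 8")
    return score > 2


if __name__ == "__main__":
    # Hypothetical borderline tumor: 1% of cells staining weakly.
    borderline = h_score(pct_weak=1, pct_moderate=0, pct_strong=0)
    print(borderline, er_positive_by_h_score(borderline))
    print(er_positive_by_allred(2), er_positive_by_allred(3))
```

Under these cutoffs, a small change in the estimated percentage of weakly staining cells is enough to move a case across the positive/negative boundary.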
Results: For 264 cases evaluated, the two antibodies resulted in a difference in Allred score of ≤1 point in 254 (96%), 250 (94%) and 252 (95%) cases for each respective observer and a <50-point difference between H-scores in 232 (88%), 242 (92%) and 232 (88%) cases. Pairwise kappa agreement between observers ranged from 0.863 to 0.924 with SP1 and 0.892 to 0.943 with 1D5 when dividing cases as either positive (Allred >2; H-score ≥1) or negative (Allred ≤2; H-score <1). Thirteen cases (4.9%) showed discrepant positive/negative results between the original 3 observers with one or both antibodies and were evaluated by 3 additional observers. Within these 13 cases, the inter-observer H-scores were discrepant by up to 49 points with SP1 and 50.5 points with 1D5. Allred scores were discrepant by up to 4 and 5 points with SP1 and 1D5, respectively. All 6 observers found a higher rate of positive cases with SP1 than with 1D5. No statistically significant intra- or inter-observer difference was seen between the 2 antibodies in either Allred score or H-score across all tumors reviewed.
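For readers unfamiliar with pairwise kappa, the calculation can be sketched as below. This assumes Cohen's kappa on dichotomized (positive/negative) calls, which the abstract does not state explicitly; the observer labels and calls are made-up illustrative data, not study data, and the function names are hypothetical.

```python
from itertools import combinations


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(calls_a)
    # Observed agreement: fraction of cases where both raters make the same call.
    p_observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of each rater's marginal rate, summed over labels.
    labels = set(calls_a) | set(calls_b)
    p_expected = sum(
        (calls_a.count(label) / n) * (calls_b.count(label) / n) for label in labels
    )
    if p_expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def pairwise_kappa(calls_by_observer):
    """Kappa for every pair of observers, keyed by (observer, observer)."""
    return {
        (first, second): cohen_kappa(calls_by_observer[first], calls_by_observer[second])
        for first, second in combinations(sorted(calls_by_observer), 2)
    }


if __name__ == "__main__":
    # Hypothetical calls (1 = ER positive, 0 = ER negative) for four cases.
    example_calls = {
        "A": [1, 1, 0, 0],
        "B": [1, 1, 0, 1],
        "C": [1, 1, 0, 0],
    }
    for pair, kappa in pairwise_kappa(example_calls).items():
        print(pair, round(kappa, 3))
```

Because kappa summarizes agreement over all cases, it can remain high even when a small subset of cases near the threshold is classified inconsistently.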
Conclusions: Despite excellent inter-observer concordance in evaluating ER expression levels, we find that almost 5% of routine cases evaluated have a low level of ER expression that is difficult to classify as ER positive or negative by manual counting. Prior to this multi-observer study, borderline ER expression cases were considered rare, representing <1% of routine cases. The surprisingly high prevalence brought to light in this study is clinically significant, given the implications for patient treatment and possibly survival.
