Concordance of Tumor Grade, ER and Her2+ER- Status with Gene-Expression-Based Profile Studies: Boosted Classification
Les W Dalton. South Austin Hospital, Austin, TX
Background: The development of sophisticated data mining algorithms has paralleled advances in molecular methodology, although in the medical community the latter garners most of the attention. Data mining can discover patterns of association among variables that are not obvious from traditional statistical measures. We wished to determine whether this is the case in a widely referenced patient data set (Concordance of Gene-Expression-Based Predictors for Breast Cancer. NEJM 2006 355:560-9). In particular, we wished to examine boosting classification (BC), an algorithm that is well known in the data mining community but not in pathology.
Design: Data on each of 291 patients were obtained from the supplementary online material of the NEJM paper. The patient population was relatively young, with an age range of 26-53 and a median of 45. Tumor grade (TG), ER status, and Her2+/ER- status (HR) were set as predictor variables, with death of disease (DOD) as the target variable. Via BC (Statistica Data Miner, StatSoft, Tulsa, OK), these three predictors were combined into a binary "boosted grade" (BG) of high vs low risk. BG was then compared with high vs low risk recurrence score (RS), activated vs quiescent wound response profile (WR), and good vs poor seventy gene profile (SG).
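To illustrate how boosting can combine a few coarse predictors into one binary risk call, the following is a minimal sketch only. It assumes AdaBoost with one-variable decision stumps as a stand-in, since the abstract does not state which boosting variant Statistica Data Miner applied. The function names, the variable coding, and the ten toy patients are all hypothetical and are not taken from the NEJM data set.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """One-variable rule: vote `polarity` if the feature exceeds the threshold."""
    return polarity if row[feature] > threshold else -polarity

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            for pol in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, f, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best

def adaboost(X, y, rounds=10):
    """Fit AdaBoost on labels coded +1 (event) / -1 (no event)."""
    n = len(X)
    w = [1.0 / n] * n                          # start with equal case weights
    model = []
    for _ in range(rounds):
        err, f, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, pol))
        # Up-weight misclassified cases, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, t, pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    """Binary call: +1 = high risk, -1 = low risk."""
    score = sum(a * stump_predict(row, f, t, pol) for a, f, t, pol in model)
    return 1 if score > 0 else -1

# Hypothetical toy patients: [tumor grade 1-3, ER negative (0/1), Her2+/ER- (0/1)]
X_toy = [[1, 0, 0], [1, 0, 0], [2, 0, 0], [2, 1, 0], [2, 0, 0],
         [3, 0, 0], [3, 1, 1], [3, 1, 0], [2, 1, 1], [1, 1, 0]]
y_toy = [-1, -1, -1, 1, -1, 1, 1, 1, 1, -1]    # +1 = died of disease (made up)

toy_model = adaboost(X_toy, y_toy, rounds=10)
print([predict(toy_model, row) for row in X_toy])
```

Each round fits the single best stump to the reweighted cases, so the final call is a weighted vote over simple rules on grade, ER, and Her2+/ER- status. In practice a held-out or cross-validated estimate would be needed before comparing such a score with other predictors.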
Results: Boosted grade was concordant with the gene assays, including in the subset of node-negative tumors. The NEJM paper stressed the Cramer's V (V) statistic for comparing predictors in two-way tables, with V >.36 regarded as substantial agreement and >.5 as strong. Using the R Project statistical software, V calculated for predictor pairs was: BG/RS .49; BG/SG .52; BG/WR .22; RS/SG .58; RS/WR .43; SG/WR .37. All pairs had chi-square p <.05. V of predictors paired with DOD was: BG/DOD .38; SG/DOD .37; RS/DOD .32; WR/DOD .24. Further comparison of predictors with DOD showed likelihood ratios and diagnostic odds ratios, respectively, of: BG/DOD 2.3 & 6.5; RS/DOD 1.6 & 7.6; SG/DOD 9.5 & 1.8; WR/DOD 1.3 & 5.8. All predictors showed p <.05 via chi-square and Pearson.
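For readers unfamiliar with these agreement measures, the sketch below shows how Cramer's V, a likelihood ratio, and a diagnostic odds ratio can be computed from a two-way table. It rests on several assumptions: the chi-square statistic is computed without a continuity correction, "likelihood ratio" is taken to mean the positive likelihood ratio, and the counts are hypothetical rather than drawn from the study. The abstract does not state which conventions were used in R.

```python
import math

def cramers_v(table):
    """Cramer's V for an r x c table of counts (no continuity correction)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1     # equals 1 for a 2 x 2 table
    return math.sqrt(chi2 / (n * k))

def lr_pos_and_dor(tp, fp, fn, tn):
    """Positive likelihood ratio and diagnostic odds ratio from 2 x 2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    dor = (tp * tn) / (fp * fn)
    return lr_pos, dor

# Hypothetical counts: rows = predictor high / low risk, columns = DOD yes / no
counts = [[40, 10],
          [20, 30]]
v = cramers_v(counts)
lr_pos, dor = lr_pos_and_dor(tp=40, fp=10, fn=20, tn=30)
print(round(v, 2), round(lr_pos, 2), round(dor, 2))
```

For a 2 x 2 table, V reduces to the absolute value of the phi coefficient, which is why it can be read on the same 0 to 1 scale for every predictor pair.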
Conclusions: In this data set of younger patients, gene assays have yet to be proven superior to TG, ER, and HR when these latter variables have been "boosted." TG, ER, and HR are already required on CAP tumor checklists, whereas the addition of gene profiling studies entails considerable added cost. It may very well be that the math is more important than the molecules, and that a novel data mining algorithm can add as much value as a novel tumor marker, or more, in stratifying patient risk. Study of BG with other data sets is recommended.
Tuesday, March 20, 2012 11:00 AM
Platform Session: Section B, Tuesday Morning