Computational Image Analysis Identifies New Morphologic Features That Predict Breast Cancer Outcome.
Andrew H Beck, Robert B West, Marc van de Vijver, Daphne Koller. Stanford University, CA; Netherlands Cancer Institute, Amsterdam, Netherlands
Background: Tumor morphology encodes abundant biological and clinical information; however, the molecular basis of clinically significant morphological features is poorly understood. The goal of this project is to develop an image analysis and machine learning pipeline to quantify morphologic features in breast cancer, to build quantitative image-based models predictive of patient outcome, and to identify genes driving the clinically significant morphologic phenotypes.
Design: Microscopic images and expression profiling data were obtained from H&E-stained breast cancer tissue microarrays (TMAs) from the Netherlands Cancer Institute. A total of 670 images of TMA cores from 248 patients were used. We used image analysis techniques to identify and segment nuclei and to characterize their morphological features, such as shape, texture, heterogeneity, and relationships to neighboring nuclei. We quantified 139 morphologic features from each nucleus and 20 global features from each image. For each patient, features were summarized by mean and standard deviation. Survival predictions were made using 5-fold cross-validation.
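The per-patient summarization step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the input layout (one feature vector per segmented nucleus, tagged with a patient identifier), and the use of the population standard deviation are all assumptions made here, since the abstract does not specify them.

```python
from collections import defaultdict
from statistics import mean, pstdev

def summarize_per_patient(nuclei):
    """Collapse nucleus-level features into one vector per patient.

    nuclei: iterable of (patient_id, feature_vector) pairs, one per nucleus
            (hypothetical layout; feature vectors must share one length).
    Returns {patient_id: [mean_1, ..., mean_k, sd_1, ..., sd_k]}.
    """
    by_patient = defaultdict(list)
    for patient_id, features in nuclei:
        by_patient[patient_id].append(features)

    summary = {}
    for patient_id, rows in by_patient.items():
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [mean(col) for col in columns]
        # Population SD (assumption) so a patient with one nucleus yields 0.0
        sds = [pstdev(col) for col in columns]
        summary[patient_id] = means + sds
    return summary
```

With 139 nuclear features, a summary of this kind would give 278 values per patient before the image-level features are appended.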
Results: At each fold, Principal Component Analysis (PCA) was performed on a reduced data matrix consisting of the training cases and the top image features associated with survival in the training cases. On the held-out cross-validation data, the second principal component (PC2) was highly associated with survival (p = 0.002). In a multivariate model with grade, lymph node status, ER, size, and the 70-gene prognosis signature, the significant predictors of survival were the 70-gene prognosis signature (p = 0.006), PC2 (p = 0.02), and ER (p = 0.04). Grade, lymph node status, and size did not make a statistically significant contribution to survival prediction in this model (all p > 0.05). PC2 contains features that characterize nuclear chromatin heterogeneity and nuclear pleomorphism. The set of annotated genes most predictive of this morphologic phenotype was enriched for proteins expressed at mutagenesis sites, proteins involved in regulation of metabolic processes, and proteins expressed in the nucleus and involved in DNA repair.
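The fold-wise procedure described above (select features and fit PCA on the training cases only, then score the held-out cases on PC2) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, absolute Pearson correlation with a numeric outcome stands in for the survival association test (which the abstract does not specify), and PCA is computed by power iteration only to keep the sketch free of dependencies.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kfold_indices(n, k=5):
    """Return k (train, test) index lists; case i is assigned to fold i % k."""
    folds = [list(range(f, n, k)) for f in range(k)]
    return [(sorted(i for g in folds if g is not fold for i in g), fold)
            for fold in folds]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / math.sqrt(sxx * syy)

def top_features(X, y, n_top):
    """Rank columns by |Pearson r| with y (stand-in for survival association)."""
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)),
                    key=lambda j: abs(pearson(cols[j], y)), reverse=True)
    return sorted(ranked[:n_top])

def orthonormalize(v, basis):
    """Remove projections of v onto basis, then scale to unit length."""
    for c in basis:
        d = dot(v, c)
        v = [a - d * b for a, b in zip(v, c)]
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 1e-12 else None

def principal_components(X, n_components=2, n_iter=500):
    """Return (column means, components) from the sample covariance of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    cov = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(p)]
           for a in range(p)]
    comps = []
    for _ in range(n_components):
        # Try a generic start vector first, then unit vectors as fallbacks
        starts = [[float(j + 1) for j in range(p)]]
        starts += [[float(i == j) for j in range(p)] for i in range(p)]
        v = next(u for u in (orthonormalize(s, comps) for s in starts)
                 if u is not None)
        for _ in range(n_iter):
            nxt = orthonormalize([dot(row, v) for row in cov], comps)
            if nxt is None:  # no variance left outside earlier components
                break
            v = nxt
        comps.append(v)
    return means, comps

def crossval_pc2_scores(X, y, k=5, n_top=3):
    """Held-out PC2 score per case; selection and PCA use training cases only."""
    scores = [None] * len(X)
    for train, test in kfold_indices(len(X), k):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        keep = top_features(X_tr, y_tr, n_top)
        means, comps = principal_components([[r[j] for j in keep] for r in X_tr])
        pc2 = comps[1]
        for i in test:
            scores[i] = sum((X[i][j] - m) * c
                            for j, m, c in zip(keep, means, pc2))
    return scores
```

The point of the design is that held-out cases never influence which features are kept or how the components are defined. Note that the sign of a principal component is arbitrary in each fold, so scores pooled across folds would need a sign convention before being tested against outcome; the abstract does not say how this was handled.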
Conclusions: We have developed an image analysis and machine learning system to extract quantitative morphologic data from breast cancer microscopic images, build prognostic models from image features, and predict genes that regulate the morphologic phenotypes. We have characterized a novel quantitative nuclear phenotype associated with patient outcome, and we have identified a set of genes predictive of this phenotype.
Tuesday, March 1, 2011 2:30 PM
Platform Session: Section B, Tuesday Afternoon