Brief description
This computational model predicts breast cancer prognosis from microarray gene expression data, using prior knowledge in the form of pre-specified gene sets from the Molecular Signatures Database (MSigDB). We compare features derived from the gene sets with features based on individual genes, with respect to the following criteria: discrimination (the ability to predict metastasis within 5 years, in terms of both its average and its variance); stability of the ranks of individual features within datasets; concordance between the weights and ranks of features from different datasets; and the underlying biological process pointed to by the features.
Notes
The purpose of the set statistic is to reduce the set's expression matrix to a single vector, which is then used as a feature for classification. The intention is for the set statistic to summarise the expression levels of the set in a way that is useful for prediction. The set statistics used in this work are all unsupervised, in the sense that they do not take the metastatic class into account. They are: set centroid, set median, set medoid, set t-statistic, U-statistic p-value, and 1st principal component of the set. Classification uses the centroid classifier.
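As an illustration of how a set statistic reduces a set's expression matrix to a single feature vector, the following is a minimal sketch in Python (the model itself requires R, and this is not its code). It assumes the matrix is stored as a list of genes, each a list of per-sample expression values. The function names and the toy data are hypothetical. The medoid is assumed to be the gene with the smallest total Euclidean distance to the other genes in the set, and the set t-statistic is assumed to be the per-sample mean divided by its standard error; the record does not give exact definitions, so the definitions used by the model may differ.

```python
import math
import statistics


def set_centroid(X):
    """Per-sample mean over the genes in the set."""
    return [statistics.fmean(col) for col in zip(*X)]


def set_median(X):
    """Per-sample median over the genes in the set."""
    return [statistics.median(col) for col in zip(*X)]


def set_medoid(X):
    """Expression vector of the gene closest to all others.

    Assumption: 'closest' means smallest total Euclidean distance.
    """
    def total_distance(gene):
        return sum(math.dist(gene, other) for other in X)

    return list(min(X, key=total_distance))


def set_t_statistic(X):
    """Per-sample one-sample t-statistic: mean / (sd / sqrt(n_genes)).

    Assumption: samples with zero spread across genes get 0.0.
    """
    n_genes = len(X)
    out = []
    for col in zip(*X):
        sd = statistics.stdev(col)
        mean = statistics.fmean(col)
        out.append(mean / (sd / math.sqrt(n_genes)) if sd > 0 else 0.0)
    return out


# Hypothetical toy set: 3 genes (rows) measured in 4 samples (columns).
X = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
    [3.0, 5.0, 1.0, 0.0],
]

# Each statistic yields one value per sample, i.e. one feature per gene set.
features = {
    "centroid": set_centroid(X),
    "median": set_median(X),
    "medoid": set_medoid(X),
    "t_statistic": set_t_statistic(X),
}
for name, values in features.items():
    print(name, values)
```

None of these functions uses the class labels, which is what makes the statistics unsupervised in the sense described above.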
To measure the concordance between datasets, we perform internal and external validation. For internal validation, we use repeated random subsampling to estimate the classifier's generalisation error within each dataset, as measured by AUC (area under the receiver operating characteristic curve); the subsamples are also used to form a bagged classifier for each dataset. For external validation, the bagged classifier from each dataset is used to predict the metastatic class of patients from another dataset.
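The validation scheme can be sketched as follows, again in Python for illustration only. This sketch rests on several assumptions not stated in the record: the number of splits, the training fraction, a simplified centroid classifier whose weights are the difference between the class means, and bagging by averaging the weight vectors across subsamples. All function names and the synthetic data are hypothetical.

```python
import random
import statistics


def centroid_weights(X, y):
    """Simplified centroid classifier: positive-class mean minus negative-class mean."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    mu_pos = [statistics.fmean(col) for col in zip(*pos)]
    mu_neg = [statistics.fmean(col) for col in zip(*neg)]
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def scores(w, X):
    """Linear score per sample; higher means more likely positive."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in X]


def auc(s, y):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [si for si, yi in zip(s, y) if yi == 1]
    neg = [si for si, yi in zip(s, y) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def internal_validation(X, y, n_splits=25, train_frac=2 / 3, seed=0):
    """Repeated random subsampling: returns the test AUCs and a bagged weight vector."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    aucs, weights = [], []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        # Skip splits in which either part is missing one of the classes.
        if len({y[i] for i in train}) < 2 or len({y[i] for i in test}) < 2:
            continue
        w = centroid_weights([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc(scores(w, [X[i] for i in test]), [y[i] for i in test]))
        weights.append(w)
    # Assumption: the bagged classifier averages the per-split weight vectors.
    bagged = [statistics.fmean(col) for col in zip(*weights)]
    return aucs, bagged


def make_dataset(seed, n=40):
    """Synthetic data: feature 0 is shifted for the positive class, feature 1 is noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(4.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]
    return X, y


X_a, y_a = make_dataset(seed=1)
X_b, y_b = make_dataset(seed=2)

# Internal validation on dataset A, then external validation on dataset B.
aucs_a, bagged_a = internal_validation(X_a, y_a)
external_auc = auc(scores(bagged_a, X_b), y_b)
print("mean internal AUC:", statistics.fmean(aucs_a))
print("external AUC:", external_auc)
```

Only the ranking of the scores matters for AUC, so the sketch omits the classifier's offset term.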
Requirements: UNIX-like operating system; R programming language.
How to cite: Abraham, G; Kowalczyk, A; Loi, S; Haviv, I; Zobel, J. (2011) Computational Model for Gene Set Analysis to predict breast cancer prognosis based on microarray gene expression data. Computer Science and Software Engineering, The University of Melbourne. doi:10.4225/02/4E9F69C011BC8
- Local : www.nicta.com.au/service-1
- DOI : doi:10.4225/02/4E9F69C011BC8
