Dataset

Computational Model for Gene Set Analysis to predict breast cancer prognosis based on microarray gene expression data

NICTA
A/Prof Izhak Haviv (Associated with) Dr Adam Kowalczyk (Associated with) Dr Sherene Loi (Associated with) Gad Abraham (Managed by) Prof Justin Zobel (Associated with)
Related identifier: doi:10.1186/1471-2105-11-277

GPLv3 License. There are no restrictions for use by non-academics.


Brief description

This computational model predicts breast cancer prognosis from microarray gene expression data, using prior knowledge in the form of pre-specified gene sets from the Molecular Signatures Database (MSigDB). We compare features derived from the gene sets with features based on individual genes, with respect to the following criteria: discrimination, that is, the ability to predict metastasis within 5 years, assessed both on average and by its variance; stability of the ranks of individual features within datasets; concordance between the weights and ranks of features from different datasets; and the underlying biological processes pointed to by the features.

Notes

The purpose of a set statistic is to reduce the set's expression matrix to a single vector, which is then used as a feature for classification. The set statistic is intended to be a useful representative of the expression levels of the genes in the set. All of the set statistics used in this work are unsupervised, in the sense that they do not take the metastatic class into account. They are: the set centroid and set median, the set medoid, the set t-statistic, the U-statistic p-value, and the first principal component of the set. Classification uses the centroid classifier.
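
As a rough illustration of how a set statistic collapses a gene set's expression matrix into one value per sample, the following minimal sketch computes three of the statistics named above. It is written in Python for illustration only (the released software requires R), and it makes several assumptions not stated in this record: the matrix layout (genes as rows, samples as columns), the function names, the toy data, and the definition of the medoid as the gene whose profile lies nearest the centroid in Euclidean distance.

```python
from math import dist
from statistics import mean, median

def set_centroid(expr):
    """Per-sample mean over all genes in the set."""
    return [mean(col) for col in zip(*expr)]

def set_median(expr):
    """Per-sample median over all genes in the set."""
    return [median(col) for col in zip(*expr)]

def set_medoid(expr):
    """Profile of the gene closest to the centroid (assumed definition)."""
    centroid = set_centroid(expr)
    return min(expr, key=lambda gene: dist(gene, centroid))

# Hypothetical toy gene set: 3 genes (rows) x 4 samples (columns).
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
    [3.0, 5.0, 1.0, 0.0],
]

print(set_centroid(expr))  # [2.0, 3.0, 2.0, 2.0]
print(set_median(expr))    # [2.0, 2.0, 2.0, 2.0]
print(set_medoid(expr))    # [2.0, 2.0, 2.0, 2.0]
```

Each function returns a vector with one entry per sample, so a gene set of any size contributes a single feature to the classifier. None of the functions sees the class labels, which is what makes these statistics unsupervised.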

To measure the concordance between datasets, we perform internal and external validation. For internal validation, we use repeated random subsampling to estimate the classifier's generalisation error inside each dataset, as measured by AUC (area under the receiver operating characteristic curve); the subsamples are also used to form a bagged classifier for each dataset. For external validation, the bagged classifier from each dataset is used to predict the metastatic class of patients from another dataset.
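
The validation scheme described above can be sketched roughly as follows. This is an illustrative Python sketch, not the released R code, and it rests on assumptions that this record does not specify: the number of subsamples, the 75% training fraction, stratifying each subsample by class, bagging by averaging the scores of the per-subsample models, and all function names and toy data. The AUC is computed as the fraction of (metastatic, non-metastatic) pairs that the score ranks correctly, with ties counted as one half.

```python
import random
from statistics import mean

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_centroids(X, y):
    """Centroid classifier: per-feature mean of each class."""
    return {c: [mean(col) for col in zip(*[x for x, yi in zip(X, y) if yi == c])]
            for c in (0, 1)}

def score(model, x):
    """Positive when x is closer to the class-1 centroid than to class 0."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, model[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, model[1]))
    return d0 - d1

def internal_validation(X, y, n_splits=20, train_frac=0.75, seed=0):
    """Repeated random subsampling, stratified by class (an assumption)."""
    rng = random.Random(seed)
    groups = [[i for i, yi in enumerate(y) if yi == c] for c in (0, 1)]
    aucs, models = [], []
    for _ in range(n_splits):
        train, test = [], []
        for group in groups:
            shuffled = rng.sample(group, len(group))
            cut = int(train_frac * len(group))
            train += shuffled[:cut]
            test += shuffled[cut:]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        models.append(model)
        aucs.append(auc([score(model, X[i]) for i in test],
                        [y[i] for i in test]))
    return mean(aucs), models

def bagged_score(models, x):
    """Bagged classifier: average score over the subsample models (assumed)."""
    return mean(score(m, x) for m in models)

# Hypothetical, well-separated toy datasets: 2 features, 6 patients per class.
X_a = [[0.1, 0.2], [0.3, 0.9], [0.8, 0.4], [0.5, 0.5], [0.9, 0.1], [0.2, 0.7],
       [5.1, 5.2], [5.3, 5.9], [5.8, 5.4], [5.5, 5.5], [5.9, 5.1], [5.2, 5.7]]
y_a = [0] * 6 + [1] * 6
X_b = [[0.4, 0.6], [0.7, 0.3], [5.6, 5.4], [5.3, 5.8]]
y_b = [0, 0, 1, 1]

internal_auc, models = internal_validation(X_a, y_a)
external_auc = auc([bagged_score(models, x) for x in X_b], y_b)
print(internal_auc, external_auc)
```

On real expression data the two AUC values would generally differ, and comparing them across pairs of datasets is what indicates how well a signature learned in one cohort carries over to another.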

Requirements: UNIX-like operating system; R programming language.

How to cite: Abraham, G; Kowalczyk, A; Loi, S; Haviv, I; Zobel, J. (2011) Computational Model for Gene Set Analysis to predict breast cancer prognosis based on microarray gene expression data. Computer Science and Software Engineering, The University of Melbourne. doi:10.4225/02/4E9F69C011BC8

Subjects

Bioinformatics Software; Information and Computing Sciences; Computer Software

Identifiers