Gene Assessment and Sample Classification for Gene Expression Data Using a Genetic Algorithm / k-nearest Neighbor Method
Authors: Li L.; Darden T.A.; Weinberg C.R.; Levine A.J.; Pedersen L.G.
Source: Combinatorial Chemistry & High Throughput Screening, Volume 4, Number 8, December 2001, pp. 727-739 (15)
Publisher: Bentham Science Publishers
Abstract:
Recent tools that analyze microarray expression data have exploited correlation-based approaches such as clustering analysis. We describe a new method for assessing the importance of genes for sample classification based on expression data. Our approach combines a genetic algorithm (GA) and the k-nearest neighbor (KNN) method to identify genes that jointly can discriminate between two types of samples (e.g. normal vs. tumor). First, many such subsets of differentially expressed genes are obtained independently using the GA. Then, the overall frequency with which genes were selected is used to deduce the relative importance of genes for sample classification. Sample heterogeneity is accommodated; that is, the method should be robust against the existence of distinct subtypes. We applied GA/KNN to expression data from normal versus tumor tissue from human colon. Two distinct clusters were observed when the 50 most frequently selected genes were used to classify all of the samples in the data sets studied, and the majority of samples were classified correctly. Identification of a set of differentially expressed genes could aid in tumor diagnosis and could also serve to identify disease subtypes that may benefit from distinct clinical approaches to treatment.
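The abstract describes the GA/KNN procedure only at a high level. The following is a minimal, hypothetical Python sketch of the idea on synthetic data: each GA individual is a fixed-size subset of gene indices, its fitness is the leave-one-out KNN accuracy using only those genes, a run stops once some subset classifies every training sample correctly, and genes are then ranked by how often they appear in the successful subsets of many independent runs. The function names, subset size, population size, truncation selection, crossover and mutation operators, the value of k, and the toy data are all assumptions chosen for illustration, not the authors' published settings.

```python
import math
import random
from collections import Counter


def knn_loo_accuracy(X, y, genes, k=3):
    """Leave-one-out k-NN accuracy using only the selected gene indices."""
    correct = 0
    for i, xi in enumerate(X):
        a = [xi[g] for g in genes]
        # (Euclidean distance, label) for every other sample, nearest first
        neighbours = sorted(
            (math.dist(a, [xj[g] for g in genes]), y[j])
            for j, xj in enumerate(X) if j != i
        )
        vote = Counter(label for _, label in neighbours[:k]).most_common(1)[0][0]
        correct += vote == y[i]
    return correct / len(X)


def ga_select(X, y, subset_size, rng, pop_size=30, generations=50,
              target=1.0, k=3, mutation_rate=0.3):
    """One GA run; returns a gene subset reaching `target` fitness, or None."""
    n_genes = len(X[0])
    population = [rng.sample(range(n_genes), subset_size) for _ in range(pop_size)]
    for _ in range(generations):
        scored = []
        for subset in population:
            fitness = knn_loo_accuracy(X, y, subset, k)
            if fitness >= target:
                return subset  # stop as soon as one subset meets the target
            scored.append((fitness, subset))
        scored.sort(key=lambda t: t[0], reverse=True)
        parents = [s for _, s in scored[: pop_size // 2]]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # Crossover (assumed operator): draw genes from the union of both parents
            child = rng.sample(sorted(set(p1) | set(p2)), subset_size)
            if rng.random() < mutation_rate:
                # Mutation: swap one gene for a gene not already in the subset
                outside = [g for g in range(n_genes) if g not in child]
                child[rng.randrange(subset_size)] = rng.choice(outside)
            children.append(child)
        population = parents + children
    return None  # no subset reached the target in this run


def rank_genes(X, y, n_runs=100, subset_size=5, seed=0):
    """Rank genes by how often they appear in successful, independent GA runs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        subset = ga_select(X, y, subset_size, rng)
        if subset is not None:
            counts.update(subset)
    return counts.most_common()


def make_toy_data(n_per_class=10, n_genes=30, informative=(0, 1, 2),
                  shift=10.0, seed=1):
    """Hypothetical data: class 1 is shifted upward only on the informative genes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in informative:
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(rank_genes(X, y)[:5])  # (gene index, selection count) pairs
```

On this toy data the three planted genes (0, 1, 2) should rise to the top of the frequency ranking. Because the planted signal is very strong, a run usually ends in its first generation; on real expression data, where no single gene separates the classes cleanly, the selection, crossover and mutation steps would do more of the work.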
Keywords: Gene Expression; Genetic Algorithm (GA); K-nearest neighbor (KNN); Pattern recognition; Gene selection; High-dimensional; Microarray
Language: English
Document Type: Review article
DOI: http://dx.doi.org/10.2174/1386207013330733
Publication date: 2001-12-01
- Combinatorial Chemistry & High Throughput Screening publishes full length original research articles and reviews describing various topics in combinatorial chemistry (e.g. small molecules, peptide, nucleic acid or phage display libraries) and/or high throughput screening (e.g. developmental, practical or theoretical). Ancillary subjects of key importance, such as robotics and informatics, will also be covered by the journal. In these respective subject areas, Combinatorial Chemistry & High Throughput Screening is intended to function as the most comprehensive and up-to-date medium available. The journal should be of value to individuals engaged in the process of drug discovery and development, in the settings of industry, academia or government.