Wilks' Λ Dissimilarity Measures for Gene Clustering: An Approach Based on the Identification of Transcription Modules
Summary
Clustering methods are widely used in the analysis of microarray data for their ability to uncover coordinated expression profiles. One important goal of clustering is to discover coregulated genes because it has been postulated that genes targeted
by the same transcription factors tend to show similar expression patterns. We focus on agglomerative hierarchical clustering and consider the problem of choosing a dissimilarity measure on the basis of its ability to identify functional modules consisting of a transcription factor and the
associated target genes. We first propose two criteria that constitute a theoretical framework for assessing the adequacy of dissimilarity measures and for comparing them. We show that the proposed criteria allow one to gain insight into the behavior of dissimilarity measures and lead to a ranking
of some of the most commonly used dissimilarity measures. Next, we introduce two dissimilarity measures based on Wilks' Λ statistic and show that, according to the above criteria, they perform better than the other measures considered. The theoretical results are supported
by an applied analysis of both simulated and real data.
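
To make the setting concrete, the following minimal Python sketch runs agglomerative hierarchical clustering on a pluggable dissimilarity measure. It is an illustration only, not the method proposed here. The dissimilarity used is the common 1 − Pearson correlation rather than a Wilks' Λ measure, average linkage is an assumed choice, the expression profiles are made-up toy values, and the function names (`pearson`, `correlation_dissimilarity`, `average_linkage`) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_dissimilarity(x, y):
    """A commonly used dissimilarity: 1 - r (0 for perfectly correlated profiles)."""
    return 1.0 - pearson(x, y)


def average_linkage(profiles, dissimilarity, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Starts from singleton clusters and repeatedly merges the two clusters
    with the smallest mean pairwise dissimilarity until n_clusters remain.
    """
    clusters = [[i] for i in range(len(profiles))]

    def between(a, b):
        # Mean dissimilarity over all cross-cluster pairs of genes.
        total = sum(dissimilarity(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (p < q).
        _, p, q = min(
            (between(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    return [sorted(c) for c in clusters]


# Toy expression profiles (made-up values): genes 0-2 rise, genes 3-4 fall.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.1, 3.9, 6.2, 8.0],
    [0.5, 1.1, 1.4, 2.1],
    [4.0, 3.0, 2.0, 1.0],
    [8.1, 5.9, 4.2, 2.0],
]

clusters = average_linkage(profiles, correlation_dissimilarity, n_clusters=2)
print(sorted(clusters))
```

Because the dissimilarity is passed in as a function, a different measure can be substituted without changing the clustering routine, which is the kind of comparison the criteria above are meant to inform.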