The expression level of a gene is a measure of the amount of RNA transcribed from that gene; this RNA is then translated into protein. Expression levels are influenced by many factors, such as tissue type, disease state, signals from other cells, stage of the cell cycle, and drug concentration. As a result, much can be learned by measuring the expression levels of genes under various experimental conditions. This can be done efficiently using cDNA or oligonucleotide microarrays, which measure the expression levels of thousands of genes simultaneously. Considerable biological insight can be gained by identifying genes that are expressed similarly across experiments, or experiments that have similar expression profiles across genes.
This talk presents a method for clustering gene expression microarray data using independent component analysis. In this method, data are grouped into clusters, each of which is modeled as a mixture of statistically independent components coming from a number of sources. The hypothesis is that such a method will be useful for analyzing gene expression data because at least some of the differences in gene expression may be due to a small number of essentially independent mechanisms. The method also allows the inherent dimensionality of each cluster (i.e., its number of independent components) to be identified automatically. To make the method computationally tractable, a variational Bayesian method is used to approximate the posterior probability density functions of the model parameters.
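
As a rough illustration of the idea of grouping data into clusters and then decomposing each cluster into independent components, the Python sketch below may help. It is not the variational Bayesian method described above. It substitutes hard k-means assignment for the probabilistic mixture and a plain symmetric FastICA for Bayesian inference, and it fixes the number of components per cluster by hand instead of identifying it automatically. The data are synthetic, and all function and variable names (kmeans, fast_ica, sym_decorrelate, n_sources, and so on) are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Hard-assign rows of X to k clusters, using farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W

def fast_ica(X, n_components, n_iter=200, seed=0):
    """Symmetric FastICA with a tanh nonlinearity.

    Returns estimated sources with shape (n_samples, n_components).
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whiten by projecting onto the leading principal directions.
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(vals)[::-1][:n_components]
    Z = Xc @ (vecs[:, order] / np.sqrt(vals[order]))
    W = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        Y = Z @ W.T
        g = np.tanh(Y)
        g_prime = 1.0 - g ** 2
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w for each row w.
        W_new = (g.T @ Z) / len(Z) - np.diag(g_prime.mean(axis=0)) @ W
        W = sym_decorrelate(W_new)
    return Z @ W.T

# Synthetic stand-in for expression data: rows are genes, columns are conditions.
rng = np.random.default_rng(0)
n_genes, n_conditions, n_sources = 200, 4, 2
blocks = []
for mean in (-10.0, 10.0):
    S = rng.laplace(size=(n_genes, n_sources))         # independent sources
    A = rng.standard_normal((n_sources, n_conditions))  # cluster-specific mixing
    noise = 0.1 * rng.standard_normal((n_genes, n_conditions))
    blocks.append(S @ A + mean + noise)
X = np.vstack(blocks)

labels = kmeans(X, k=2)
components = {j: fast_ica(X[labels == j], n_sources) for j in range(2)}
for j, S_hat in components.items():
    print(f"cluster {j}: {S_hat.shape[0]} genes, {S_hat.shape[1]} components")
```

In this simplified version each gene belongs to exactly one cluster and the per-cluster sources are point estimates. A Bayesian treatment of the kind the talk describes would instead carry uncertainty over cluster membership, mixing matrices, and the number of components.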