Identify Coexpressed ESTs with LNP

Statistical tools for analyzing patterns of gene expression from expressed sequence tag (EST) or other counts-based data.
    by Morgan Price & Eleanor Rieffel

Download the source code

We've developed an improved technique for finding coexpressed genes in large EST datasets (Price & Rieffel, to appear). On human dbEST, our measure finds significantly more coexpression between functionally related pairs of genes, relative to controls, than previous measures based on the Fisher exact test (Walker et al. 1999) and the Pearson correlation coefficient (Ewing et al. 1999). Our measure also shows slightly but significantly more coexpression between interacting pairs of human proteins than between controls. In simulations, our measure reports reasonable significance values, whereas the other measures report p-values as low as 10^-30 for 1 in 1,000 uncorrelated pairs. Our measure is also more sensitive than the other measures.

We use a log-normal prior for the variation in each gene's levels. By doing this, we can interpret the number of times a gene appears in each EST library rigorously, giving a posterior distribution of the gene's frequency. In contrast, the other techniques use the counts directly, leading to problems when the library sizes vary widely, as is the case for dbEST. For example, two moderately expressed genes will tend to show up together in large libraries and be absent together from small libraries, leading to apparent coexpression without any underlying correlation in expression levels.
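The library-size artifact described above can be reproduced with a small simulation. The sketch below is illustrative only and is not part of the distributed source code: it assumes Poisson sampling of counts, library sizes spread log-uniformly between 100 and 100,000 ESTs, and two independent genes that each have the same constant frequency in every library. All function names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson variate with Knuth's method (fine for moderate rates)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_raw_count_correlation(n_libraries=200, freq_a=1e-3,
                                   freq_b=1e-3, seed=0):
    """Correlate raw counts of two genes whose expression never varies.

    Each gene has a constant frequency in every library, and the two genes
    are sampled independently, so any correlation in the counts comes from
    library size alone.
    """
    rng = random.Random(seed)
    # Assumed size distribution: log-uniform over three orders of magnitude.
    sizes = [int(10 ** rng.uniform(2, 5)) for _ in range(n_libraries)]
    counts_a = [sample_poisson(rng, freq_a * n) for n in sizes]
    counts_b = [sample_poisson(rng, freq_b * n) for n in sizes]
    return pearson(counts_a, counts_b)


if __name__ == "__main__":
    print(simulate_raw_count_correlation())
```

Under these assumptions the raw counts come out strongly correlated even though the underlying expression levels are constant and independent, which is the failure mode that working with frequencies rather than counts is meant to avoid.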

For each gene, we first find the maximum likelihood mean and standard deviation (in log-space) for the variation in that gene's levels, given the observed counts. Then, for any pair of genes, we compute the likelihood of a given correlation (again in log-space) between the two genes by integrating over all possible expression levels for both genes in each library, using the posterior distribution on frequencies. Finally, we find the maximum likelihood correlation numerically and use a maximum likelihood ratio test to assess statistical significance.
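As a rough illustration of the first and last steps, here is a minimal sketch. It is not the distributed source code. It assumes counts are Poisson-distributed given a gene's frequency and the library size, integrates over log-frequency with a crude trapezoid rule, replaces a proper optimizer with a coarse grid search, and assumes the likelihood ratio statistic is compared against a chi-square distribution with one degree of freedom. The pairwise integration over both genes is omitted, and all function names are hypothetical.

```python
import math


def log_poisson(count, rate):
    """Log of the Poisson probability of `count` given expected value `rate`."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def library_likelihood(count, size, mu, sigma, n_grid=201, width=6.0):
    """Marginal likelihood of one library's count for one gene.

    Integrates Poisson(count | size * exp(x)) against Normal(x | mu, sigma),
    where x is the log frequency, using the trapezoid rule on
    mu +/- width * sigma. The grid is crude and only suits small counts.
    """
    lo = mu - width * sigma
    step = 2.0 * width * sigma / (n_grid - 1)
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n_grid):
        x = lo + i * step
        log_prior = -0.5 * ((x - mu) / sigma) ** 2 - log_norm
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * math.exp(log_poisson(count, size * math.exp(x))
                                   + log_prior)
    return total * step


def gene_log_likelihood(counts, sizes, mu, sigma):
    """Sum of per-library log marginal likelihoods (floored to avoid log(0))."""
    return sum(math.log(max(library_likelihood(c, n, mu, sigma), 1e-300))
               for c, n in zip(counts, sizes))


def fit_gene(counts, sizes, mus, sigmas):
    """Coarse grid search; returns (log-likelihood, mu, sigma) at the maximum."""
    return max((gene_log_likelihood(counts, sizes, m, s), m, s)
               for m in mus for s in sigmas)


def likelihood_ratio_p_value(loglik_alt, loglik_null):
    """P-value assuming a chi-square reference with one degree of freedom."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # Survival function of chi-square(1) at `stat` equals erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    counts = [1, 0, 2, 10, 0, 3]                 # made-up example counts
    sizes = [1000, 1000, 2000, 5000, 500, 1500]  # made-up library sizes
    mus = [-9.0 + 0.5 * i for i in range(13)]
    sigmas = [0.25, 0.5, 1.0, 1.5, 2.0]
    print(fit_gene(counts, sizes, mus, sigmas))
```

In a pairwise analysis, `loglik_alt` would be the log-likelihood at the maximum likelihood correlation and `loglik_null` the log-likelihood at zero correlation; computing those requires a two-dimensional integral per library that this sketch does not attempt.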

Supplementary information for the Bioinformatics paper