Pathway analysis tool using population data

One of the most fundamental downstream analyses in cancer genomics is determining whether a group of genes, e.g. a metabolic pathway, harbors more somatic mutations than expected by chance, which would suggest some mechanistic involvement. This testing has traditionally been done using elementary "tally" tests, e.g. based on the simple sum of all events over all samples. However, those methods discard important information, such as the effect of gene size (large genes are more likely to be mutated under the null hypothesis) and the distribution of mutations among the samples. Ignoring these factors generally leads to artificially low P-values, i.e. an elevated false-positive rate. The PathScan tool performs a more sophisticated statistical analysis that accounts for these variables using Fisher-Lancaster theory, thus furnishing more accurate P-values. It can be applied to any biologically relevant grouping of genes, i.e. not only established pathways, but also putative gene networks from de novo methods.
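The idea of accounting for gene size and for the per-sample distribution of mutations can be illustrated with a minimal Python sketch. This is not PathScan's actual implementation: it assumes a uniform background mutation rate per base, computes a per-sample upper-tail P-value for the number of mutated genes in the group, and then combines the samples with plain Fisher's method, omitting Lancaster's correction for discrete P-values. All function names, gene lengths, and rates below are hypothetical.

```python
import math

def gene_hit_probability(length_bp, rate_per_bp):
    """Null probability that a gene of this length carries at least one mutation."""
    return 1.0 - (1.0 - rate_per_bp) ** length_bp

def mutated_gene_count_pmf(hit_probs):
    """Distribution of the number of mutated genes, built one gene at a time."""
    pmf = [1.0]  # with zero genes, the count is 0 with probability 1
    for p in hit_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this gene is not mutated
            nxt[k + 1] += mass * p       # this gene is mutated
        pmf = nxt
    return pmf

def sample_p_value(hit_probs, observed_mutated):
    """Upper-tail probability of seeing at least this many mutated genes in one sample."""
    pmf = mutated_gene_count_pmf(hit_probs)
    return min(1.0, sum(pmf[observed_mutated:]))

def fisher_combine(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2n degrees of freedom."""
    n = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # half of the test statistic
    # Closed-form chi-square survival function for an even number of degrees of freedom.
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= half_stat / j
        total += term
    return min(1.0, math.exp(-half_stat) * total)

# Hypothetical gene group: coding lengths in bp and an assumed background rate.
gene_lengths = [1500, 30000, 4000]
background_rate = 1e-6
hit_probs = [gene_hit_probability(length, background_rate) for length in gene_lengths]

# Hypothetical data: number of mutated genes in the group, one entry per sample.
mutated_per_sample = [1, 0, 2]
per_sample_p = [sample_p_value(hit_probs, k) for k in mutated_per_sample]
combined_p = fisher_combine(per_sample_p)
print(per_sample_p, combined_p)
```

Unlike a tally test, this sketch gives a large gene a higher null probability of being hit, and it treats two mutated genes in one sample differently from the same two events spread over two samples.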

PathScan is available through MuSiC, or can be invoked as a stand-alone tool. It has been used in many significant cancer genomics studies by Genome Institute collaborators, including studies of acute myeloid leukemia (AML; Timothy Ley), breast cancer (Matthew Ellis), and lung cancer (Ramaswamy Govindan).


PathScan is written in Perl. It is freely distributed and is supported by the Genome Institute's bioinformatics staff. Documentation and examples are widely available, e.g. on CPAN.