runHyperGO {EMA}    R Documentation

Run Gene Ontology analysis based on hypergeometric test from a probeset list

Description

Run Gene Ontology analysis based on hypergeometric test from a probeset list

Usage

runHyperGO(list, pack.annot, categorySize = 1, verbose = TRUE,
name = "hyperGO", htmlreport = TRUE, txtreport = TRUE,
tabResult = FALSE, pvalue = 0.05)

Arguments

list

character vector of probeset names

pack.annot

annotation package to use

categorySize

integer, minimum size of a category, by default 1

verbose

logical, if TRUE, results are displayed, by default TRUE

name

character, name for output files, by default "hyperGO"

htmlreport

logical, if TRUE, an HTML report is created, by default TRUE

txtreport

logical, if TRUE, a text report is created, by default TRUE

tabResult

logical, if TRUE, a list with the results is created, by default FALSE

pvalue

numeric, cutoff for the hypergeometric test p-value, by default 0.05

Details

The choice of the gene universe can have a significant impact on the results; this is discussed in detail in the vignette of the GOstats package. Here, we apply a non-specific filtering procedure that differs from the one proposed by Falcon and Gentleman. Since not all genes are expressed under all conditions in the data, the universe could be defined either from the expressed genes only or from all the genes on the array. In practice, we cannot distinguish genes that are biologically non-expressed from probesets of low quality. We therefore consider that non-expressed probesets, as well as those with little variation across samples, may be biologically relevant, and we first define the universe as all the genes on the array. We then remove only the probesets that have no Entrez Gene identifier in the annotation data or no GO annotation. Finally, the hypergeometric test is performed on the unique Entrez Gene identifiers of the gene list against the unique Entrez Gene identifiers of the universe.

The p-values in the output are not corrected for multiple testing. Because of the dependence structure among genes and GO terms, any multiple testing correction is difficult to apply. Moreover, the most interesting gene sets are not necessarily those with the smallest p-values: interesting nodes are typically those with a reasonable number of genes (10 or more) and a small p-value.
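To make the test concrete, the following is a minimal sketch of an over-representation p-value for a single GO term, computed with base R's phyper. It is not the code used inside runHyperGO (which relies on the GOstats machinery); all counts and variable names are hypothetical and chosen only for illustration.

```r
# Toy counts for one GO term (hypothetical values, not from any real data set)
universe_size <- 10000  # unique Entrez IDs in the universe (with GO annotation)
category_size <- 200    # universe genes annotated to the GO term
list_size     <- 50     # unique Entrez IDs in the submitted gene list
overlap       <- 5      # genes of the list annotated to the GO term

# Expected number of list genes in the category under the null hypothesis
expected <- list_size * category_size / universe_size

# Over-representation p-value: P(X >= overlap) for a hypergeometric X.
# phyper(q, m, n, k, lower.tail = FALSE) returns P(X > q), hence overlap - 1.
p_over <- phyper(overlap - 1,
                 m = category_size,
                 n = universe_size - category_size,
                 k = list_size,
                 lower.tail = FALSE)

print(expected)
print(p_over)
```

Changing universe_size while keeping the other counts fixed shows how strongly the p-value depends on the definition of the universe.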

Value

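The results as R objects (when tabResult = TRUE) and/or as text and HTML reports (when txtreport and htmlreport are TRUE). The R results contain: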

BP

Data frame with the results for Biological Process: GO ID, p-value, odds ratio, expected count, size and GO term

MF

Same as BP, for Molecular Function

CC

Same as BP, for Cellular Component
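As suggested in Details, it can be useful to keep only nodes with a reasonable number of genes and a small p-value. The sketch below applies such a filter to a mock data frame. The column names GOID, Pvalue and Size, the identifiers and the values are assumptions made for this illustration; check the actual column names of the returned data frames (for example with colnames) before adapting it.

```r
# Mock result table standing in for one returned data frame (hypothetical
# column names and values; the real output may be laid out differently)
bp <- data.frame(
  GOID   = c("GO:A", "GO:B", "GO:C", "GO:D"),
  Pvalue = c(0.001, 0.0005, 0.03, 0.2),
  Size   = c(120, 3, 15, 400),
  stringsAsFactors = FALSE
)

# Keep nodes with at least 10 genes and an uncorrected p-value below 0.05
keep <- bp[bp$Size >= 10 & bp$Pvalue < 0.05, ]

# Sort the remaining nodes by increasing p-value
keep <- keep[order(keep$Pvalue), ]

print(keep)
```

Here the node with the smallest p-value (GO:B) is dropped because it contains only 3 genes, which illustrates why the smallest p-values are not always the most informative.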

Author(s)

EMA group

See Also

hyperGTest, runHyperKEGG

Examples

## Not run: 
require(hgu133plus2.db)
data(marty)

## Probe list
probeList <- rownames(marty)[1:50]

## Hypergeometric test for GO terms
res <- runHyperGO(probeList, htmlreport = FALSE, txtreport = FALSE,
    tabResult = TRUE, pack.annot = "hgu133plus2.db")

## End(Not run)

[Package EMA version 1.3.2 Index]