aPEAR is designed to help you notice the most important biological themes in your enrichment analysis results. It analyses the gene lists of the pathways and detects clusters of redundant, overlapping gene sets.
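Two pathways are redundant when their gene lists largely coincide, so that enriching one almost guarantees enriching the other. A minimal base-R illustration with two made-up gene sets (the pathways and their contents are hypothetical):

```r
# Two hypothetical pathways that share most of their genes
pathA <- c("CDK1", "CCNB1", "BUB1", "AURKB", "PLK1")
pathB <- c("CDK1", "CCNB1", "BUB1", "AURKB", "KIF11")

# Genes present in both pathways
shared <- intersect(pathA, pathB)

# Fraction of pathA's genes that also appear in pathB
frac_shared <- length(shared) / length(pathA)
frac_shared  # 0.8
```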
Let’s begin by performing a simple gene set enrichment analysis with clusterProfiler:
# Load all the packages:
library(data.table)
library(ggplot2)
library(dplyr)
library(stringr)
library(clusterProfiler)
library(DOSE)
library(org.Hs.eg.db)
library(aPEAR)
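# Example data: a ranked gene list shipped with DOSE (named by Entrez gene ID)
data(geneList, package = 'DOSE')
# gseGO relies on random sampling for its p-values, so fix the seed for reproducibility
set.seed(42)
# Gene set enrichment analysis on GO Cellular Component ('CC') terms
enrich <- gseGO(geneList, OrgDb = org.Hs.eg.db, ont = 'CC')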
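gseGO works on a ranked gene list rather than a plain set of genes: DOSE's geneList is a named numeric vector sorted in decreasing order. A toy vector with the same shape can be built in base R; the gene names and scores below are made up, and in a real analysis the names would be Entrez gene IDs.

```r
# Made-up scores for six hypothetical genes
scores <- c(g1 = 0.8, g2 = -2.1, g3 = 3.4, g4 = -0.5, g5 = 1.9, g6 = 0.0)

# GSEA-style functions expect the vector sorted in decreasing order
ranked <- sort(scores, decreasing = TRUE)
names(ranked)  # "g3" "g5" "g1" "g6" "g4" "g2"
```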
enrichmentNetwork()
enrichmentNetwork is the most important function exported by aPEAR. It detects clusters of similar pathways and generates a ggplot2 visualization. The only thing it asks you to provide is your enrichment result:
enrichmentNetwork(enrich@result)
Internally, enrichmentNetwork calls two functions, findPathClusters and plotPathClusters, which are described in more detail below.
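aPEAR currently recognizes input from clusterProfiler and gProfileR. However, if you have custom enrichment input, do not worry! aPEAR accepts any kind of enrichment input as long as it is formatted correctly; the only requirement is that the gene list of each pathway is known. You should format your data so that:
- it is a data.frame with one row per pathway;
- one column holds the pathway names (Description in the example below);
- one column holds each pathway's genes as a single string, separated by "/" (pathwayGenes below);
- the column used to color the nodes is passed via colorBy;
- the column used to size the nodes is passed via nodeSize.
For example, you might format your data like this: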
enrichmentData[ 1:5 ]
#> Description pathwayGenes NES Size
#> <char> <char> <num> <int>
#> 1: condensed chromosome, centromeric region 991/1062/10403/55... 2.692489 148
#> 2: chromosome, centromeric region 55143/991/1062/10... 2.670847 199
#> 3: kinetochore 991/1062/10403/55... 2.652515 138
#> 4: nuclear chromosome 8318/55388/7153/2... 2.548816 181
#> 5: chromosomal region 55143/991/1062/10... 2.541716 319
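If you are assembling such a table by hand, a minimal base-R sketch follows. The pathway names, gene lists, and scores are made up, and check_enrichment is a hypothetical helper that only checks the columns used in this example; it is not aPEAR's own validation.

```r
# Hypothetical custom enrichment results: one row per pathway,
# genes stored as one "/"-separated string per pathway
myEnrichment <- data.frame(
  Description  = c("pathway A", "pathway B", "pathway C"),
  pathwayGenes = c("991/1062/10403", "991/1062/55143", "8318/55388/7153"),
  NES          = c(2.1, 1.8, -1.5),
  Size         = c(3L, 3L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of aPEAR): checks the columns this example relies on
check_enrichment <- function(df, colorBy, nodeSize) {
  stopifnot(is.data.frame(df))
  needed <- c("Description", "pathwayGenes", colorBy, nodeSize)
  missing <- setdiff(needed, names(df))
  if (length(missing) > 0) stop("Missing columns: ", paste(missing, collapse = ", "))
  stopifnot(is.numeric(df[[colorBy]]), is.numeric(df[[nodeSize]]))
  invisible(TRUE)
}

check_enrichment(myEnrichment, colorBy = "NES", nodeSize = "Size")

# Splitting the strings back gives one character vector of genes per pathway
genes <- strsplit(myEnrichment$pathwayGenes, "/", fixed = TRUE)
lengths(genes)  # 3 3 3
```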
Then tell enrichmentNetwork which columns to use for node color and node size:
p <- enrichmentNetwork(enrichmentData, colorBy = 'NES', nodeSize = 'Size', verbose = TRUE)
#> Validating parameters...
#> Validating enrichment data...
#> Detected enrichment type custom
#> Calculating pathway similarity using method jaccard
#> Using Markov Cluster Algorithm to detect pathway clusters...
#> Clustering done
#> Using Pagerank algorithm to assign cluster titles...
#> Pagerank scores calculated
#> Creating the enrichment network visualization...
#> Validating theme parameters...
#> Preparing enrichment data for plotting...
#> Detected enrichment type custom
#> Creating the enrichment graph...
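The log above shows the default pipeline: Jaccard similarity between pathways, then the Markov Cluster Algorithm (MCL). MCL alternates expansion (a matrix power that spreads random-walk flow) and inflation (an element-wise power followed by column normalisation, which strengthens strong flows). Below is a textbook sketch in base R; mcl_sketch, its defaults (expansion = inflation = 2), and the rule of reading clusters as connected components of the surviving flow are illustrative assumptions, not aPEAR's implementation.

```r
# Hypothetical helper: textbook MCL iteration on a symmetric similarity matrix
mcl_sketch <- function(sim, expansion = 2, inflation = 2,
                       max_iter = 100, tol = 1e-8, threshold = 1e-4) {
  m <- sim
  diag(m) <- 1                                  # add self-loops
  m <- sweep(m, 2, colSums(m), "/")             # column-stochastic start
  for (iter in seq_len(max_iter)) {
    prev <- m
    e <- m
    for (k in seq_len(expansion - 1)) e <- e %*% m  # expansion: matrix power
    m <- e^inflation                            # inflation: element-wise power...
    m <- sweep(m, 2, colSums(m), "/")           # ...then renormalise columns
    if (max(abs(m - prev)) < tol) break
  }
  # Clusters = connected components of the non-negligible remaining flow
  adj <- (m > threshold) | t(m > threshold)
  n <- nrow(adj)
  cluster <- rep(NA_integer_, n)
  id <- 0L
  for (start in seq_len(n)) {
    if (!is.na(cluster[start])) next
    id <- id + 1L
    queue <- start
    while (length(queue) > 0) {                 # breadth-first search
      v <- queue[1]
      queue <- queue[-1]
      if (!is.na(cluster[v])) next
      cluster[v] <- id
      queue <- c(queue, which(adj[v, ] & is.na(cluster)))
    }
  }
  setNames(cluster, rownames(sim))
}

# Two groups of three mutually similar, made-up pathways
paths <- paste0("p", 1:6)
sim <- matrix(0, 6, 6, dimnames = list(paths, paths))
sim[1:3, 1:3] <- 0.8
sim[4:6, 4:6] <- 0.8
cl <- mcl_sketch(sim)
cl
```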
findPathClusters()
If your goal is only to obtain the clusters of redundant pathways, the function findPathClusters is the way to go. It accepts a data.frame with the enrichment results and returns a list containing the pathway clusters and the similarity matrix:
clusters <- findPathClusters(enrich@result, cluster = 'hier', minClusterSize = 6)
clusters$clusters[ 1:5 ]
#> Pathway Cluster
#> <char> <char>
#> 1: cytoplasmic region ciliary plasm
#> 2: plasma membrane bounded cell projection cytoplasm ciliary plasm
#> 3: axoneme ciliary plasm
#> 4: ciliary plasm ciliary plasm
#> 5: cilium ciliary plasm
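Each cluster above is labelled with one of its member pathways ("ciliary plasm"), and the enrichmentNetwork log earlier reports that cluster titles are assigned with PageRank. One plausible reading, sketched here under that assumption, is to run PageRank on the similarity graph of a cluster's pathways and use the top-scoring pathway as the title. pagerank_sketch and the toy matrix are hypothetical; aPEAR's internals may differ.

```r
# Hypothetical helper: power-iteration PageRank on a weighted, undirected similarity graph
pagerank_sketch <- function(sim, damping = 0.85, max_iter = 100, tol = 1e-10) {
  w <- sim
  diag(w) <- 0                          # ignore self-similarity
  n <- nrow(w)
  cs <- colSums(w)
  cs[cs == 0] <- 1                      # avoid division by zero for isolated nodes
  P <- sweep(w, 2, cs, "/")             # column-stochastic transition matrix
  r <- rep(1 / n, n)
  for (iter in seq_len(max_iter)) {
    r_new <- (1 - damping) / n + damping * as.vector(P %*% r)
    converged <- sum(abs(r_new - r)) < tol
    r <- r_new
    if (converged) break
  }
  setNames(r, rownames(sim))
}

# Made-up cluster: "hub" is strongly similar to every other member
paths <- c("hub", "a", "b", "c")
sim <- matrix(0.1, 4, 4, dimnames = list(paths, paths))
sim["hub", ] <- 0.9
sim[, "hub"] <- 0.9
diag(sim) <- 1
scores <- pagerank_sketch(sim)
names(which.max(scores))  # "hub" would become the cluster title
```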
pathways <- clusters$clusters[ 1:5, Pathway ]
clusters$similarity[ pathways, pathways ]
#> cytoplasmic region
#> cytoplasmic region 1.0000000
#> plasma membrane bounded cell projection cytoplasm 0.7758621
#> axoneme 0.5172414
#> ciliary plasm 0.5344828
#> cilium 0.2500000
#> plasma membrane bounded cell projection cytoplasm
#> cytoplasmic region 0.7758621
#> plasma membrane bounded cell projection cytoplasm 1.0000000
#> axoneme 0.6666667
#> ciliary plasm 0.6888889
#> cilium 0.2462687
#> axoneme ciliary plasm
#> cytoplasmic region 0.5172414 0.5344828
#> plasma membrane bounded cell projection cytoplasm 0.6666667 0.6888889
#> axoneme 1.0000000 0.9677419
#> ciliary plasm 0.9677419 1.0000000
#> cilium 0.2459016 0.2540984
#> cilium
#> cytoplasmic region 0.2500000
#> plasma membrane bounded cell projection cytoplasm 0.2462687
#> axoneme 0.2459016
#> ciliary plasm 0.2540984
#> cilium 1.0000000
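The matrix above holds pairwise similarities between pathway gene lists. The enrichmentNetwork log earlier reports Jaccard similarity, i.e. the number of shared genes divided by the size of the union of the two lists. A base-R sketch on made-up gene sets; jaccard_matrix is a hypothetical helper, and aPEAR may differ in details such as which genes of each pathway it compares.

```r
# Hypothetical helper: pairwise Jaccard similarity between "/"-separated gene lists
jaccard_matrix <- function(geneStrings, pathwayNames) {
  sets <- strsplit(geneStrings, "/", fixed = TRUE)
  n <- length(sets)
  sim <- matrix(0, n, n, dimnames = list(pathwayNames, pathwayNames))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      shared <- length(intersect(sets[[i]], sets[[j]]))
      total  <- length(union(sets[[i]], sets[[j]]))
      sim[i, j] <- if (total == 0) 0 else shared / total
    }
  }
  sim
}

sim <- jaccard_matrix(
  c("A/B/C/D", "A/B/C/E", "F/G"),
  c("pathway 1", "pathway 2", "pathway 3")
)
sim["pathway 1", "pathway 2"]  # 3 shared / 5 in union = 0.6
```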
For more information about available similarity metrics, clustering methods, cluster naming conventions, and other available parameters, see ?aPEAR.methods.
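The example above called findPathClusters with cluster = 'hier' and minClusterSize = 6. As a rough picture of hierarchical clustering on pathway similarities, here is a base-R sketch with stats::hclust. The helper hier_clusters, average linkage, the cut height h = 0.5, and treating minClusterSize as "drop smaller clusters" are all assumptions for illustration, not aPEAR's exact procedure.

```r
# Hypothetical helper: hierarchical clustering of pathways from a similarity matrix
hier_clusters <- function(sim, h = 0.5, minClusterSize = 2) {
  d <- as.dist(1 - sim)                           # similarity -> distance
  tree <- hclust(d, method = "average")           # average-linkage tree
  groups <- cutree(tree, h = h)                   # cut the tree at distance h
  sizes <- table(groups)
  keep <- names(sizes)[sizes >= minClusterSize]   # drop clusters that are too small
  groups[as.character(groups) %in% keep]
}

# Made-up similarities: p1-p3 form one tight group, p4-p5 another
paths <- paste0("p", 1:5)
sim <- matrix(0.05, 5, 5, dimnames = list(paths, paths))
sim[1:3, 1:3] <- 0.8
sim[4:5, 4:5] <- 0.7
diag(sim) <- 1
cl <- hier_clusters(sim, h = 0.5, minClusterSize = 3)
cl  # only the three-member cluster survives
```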
plotPathClusters()
To visualize clustering results obtained with findPathClusters, use the function plotPathClusters:
set.seed(238923)
plotPathClusters(
enrichment = enrich@result,
sim = clusters$similarity,
clusters = clusters$clusters,
fontSize = 4,
outerCutoff = 0.01, # Lower the similarity cutoff for edges between clusters so some inter-cluster connections are drawn
drawEllipses = TRUE
)
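The outerCutoff comment implies that edges between pathways in different clusters are drawn only above a similarity threshold, so lowering it reveals weaker links between clusters. A sketch of such a filtering rule, assuming a separate threshold for pairs inside the same cluster; filter_edges and its innerCutoff argument are hypothetical names for illustration, not how plotPathClusters is implemented.

```r
# Hypothetical helper: keep an edge if its similarity passes the cutoff that
# applies to the pair (same cluster vs. different clusters)
filter_edges <- function(sim, clusters, innerCutoff, outerCutoff) {
  idx <- which(upper.tri(sim), arr.ind = TRUE)   # each unordered pair once
  edges <- data.frame(
    from = rownames(sim)[idx[, 1]],
    to   = colnames(sim)[idx[, 2]],
    similarity = sim[idx],
    stringsAsFactors = FALSE
  )
  same <- clusters[edges$from] == clusters[edges$to]
  cutoff <- ifelse(same, innerCutoff, outerCutoff)
  edges[edges$similarity >= cutoff, , drop = FALSE]
}

# Made-up similarities: p1 and p2 share cluster "A", p3 is in cluster "B"
paths <- c("p1", "p2", "p3")
sim <- matrix(c(1,    0.3,  0.05,
                0.3,  1,    0.02,
                0.05, 0.02, 1), 3, 3, byrow = TRUE, dimnames = list(paths, paths))
clusters <- c(p1 = "A", p2 = "A", p3 = "B")

strict <- filter_edges(sim, clusters, innerCutoff = 0.1, outerCutoff = 0.5)
loose  <- filter_edges(sim, clusters, innerCutoff = 0.1, outerCutoff = 0.01)
nrow(strict)  # 1: only the within-cluster edge
nrow(loose)   # 3: weak between-cluster edges now appear too
```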
For more parameter options, see ?aPEAR.theme.