A system-level understanding of
the regulation and coordination mechanisms of gene expression is
essential to understanding the complexity of biological processes in
health and disease. With the rapid development of single-cell RNA
sequencing technologies, it is now possible to investigate gene
interactions in a cell-type-specific manner. Here we introduce the
scLink package, which uses statistical network modeling to
understand the co-expression relationships among genes and to construct
sparse gene co-expression networks from single-cell gene expression
data.
Here we demonstrate the functionality of scLink using
the example data stored in the package.
## Loading required package: parallel
## Loading required package: glasso
count = readRDS(system.file("extdata", "example.rds", package = "scLink"))
genes = readRDS(system.file("extdata", "genes.rds", package = "scLink"))The example raw count matrix count has 793
rows representing different cells and 23,341
columns representing different genes.
genes is a character vector of 500 genes of interest.
sclink_normWe use the function sclink_norm to process single cell
read count for application of the sclink method. The code below will
normalize the read count matrix with a library size of \(10^6\) and only keep the 500 genes in
genes for downstream analysis. Note that the normalized
count matrix count.norm is on the \(\log_{10}\) scale.
If users do not have a particular gene list for network inference,
they can set filter.genes=TRUE to filter for the top \(n\) genes with largest average expression
values. For example:
sclink_netAfter the pre-processing step, we use the function
sclink_net to calculate the robust correlation matrix and
identifed sparse co-expression network of scLink. expr is
the normalized count matrix output by sclink_norm or
supplied by the users. lda is the candidate regularization
parameters used in scLink’s graphical model. The users can set
ncores to take advantage of parallel computation.
sclink_net returns a list of results. The scLink’s
robust correlation matrix can be retrieved from the cor
element:
## Rn45s Eef1a1 Malat1
## Rn45s 1.00000000 -0.27604002 -0.08561265
## Eef1a1 -0.27604002 1.00000000 -0.05138179
## Malat1 -0.08561265 -0.05138179 1.00000000
The gene co-expression networks and summary statistics can be
retrieved from the summary element, which is a list with
the same length as lda: each element corresponds to one
regularization parameter.
## [1] "adj" "Sigma" "nedge" "bic" "lambda"
## Rn45s Eef1a1 Malat1
## Rn45s 1 0 0
## Eef1a1 0 1 0
## Malat1 0 0 1
## Rn45s Eef1a1 Malat1
## Rn45s 1.696584 0.000000 0.000000
## Eef1a1 0.000000 1.665809 0.000000
## Malat1 0.000000 0.000000 1.578271
## [1] 1255450
## [1] 127
## [1] 0.5
sclink_corSince it is very difficult to infer co-expression relationships for
lowly expressed genes in single-cell data, we suggest the filtering step
as used in sclink_norm to select genes. This also reduces
the computational burden. However, if the users would like to infer gene
networks for a large gene list (e.g., \(>
5000\) genes), we suggest that the users first use
sclink_cor to investigate the correlation structures among
these genes.
If the correlation matrix suggests obvious gene modules, then the
users can apply sclink_net separately on these modules to
reduce computation time and increase overall accuracy.