Title: | Multi-Resolution Scanning for Cross-Sample Differences |
---|---|
Description: | An implementation of the MRS algorithm for comparison across distributions, as described in Jacopo Soriano, Li Ma (2017) <doi:10.1111/rssb.12180>. The model is based on a nonparametric process taking the form of a Markov model that transitions between a "null" and an "alternative" state on a multi-resolution partition tree of the sample space. MRS effectively detects and characterizes a variety of underlying differences. These differences can be visualized using several plotting functions. |
Authors: | Jacopo Soriano and Li Ma |
Maintainer: | Li Ma <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.2.6 |
Built: | 2025-02-03 07:19:35 UTC |
Source: | https://github.com/cranhaven/cranhaven.r-universe.dev |
This function executes the Multi Resolution Scanning algorithm to detect differences across the distributions of multiple groups having multiple replicates.
andova(X, G, H, n_groups = length(unique(G)), n_subgroups = NULL, Omega = "default", K = 6, init_state = c(0.8, 0.2, 0), beta = 1, gamma = 0.07, delta = 0.4, eta = 0, alpha = 0.5, nu_vec = 10^(seq(-1, 4)), return_global_null = TRUE, return_tree = TRUE)
X |
Matrix of the data. Each row represents an observation. |
G |
Numeric vector of the group label of each observation. Labels are integers starting from 1. |
H |
Numeric vector of the replicate label of each observation. Labels are integers starting from 1. |
n_groups |
Number of groups. |
n_subgroups |
Vector indicating the number of replicates for each group. |
Omega |
Matrix defining the vertices of the sample space.
The |
K |
Depth of the tree. Default is 6. |
init_state |
Initial state of the hidden Markov process. The three states are null, alternative and prune, respectively. |
beta |
Spatial clustering parameter of the transition probability matrix. Default is 1. |
gamma |
Parameter of the transition probability matrix. Default is 0.07. |
delta |
Parameter of the transition probability matrix. Default is 0.4. |
eta |
Parameter of the transition probability matrix. Default is 0. |
alpha |
Pseudo-counts of the Beta random probability assignments. |
nu_vec |
The support of the discrete uniform prior on nu. |
return_global_null |
Boolean indicating whether to return the marginal posterior probability of the global null. |
return_tree |
Boolean indicating whether to return the posterior representative tree. |
An mrs object.
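The arguments G and H expect integer labels starting from 1, and n_subgroups expects one replicate count per group. A minimal base R sketch of preparing such inputs from arbitrary labels follows. The vectors raw_group and raw_replicate are hypothetical, and numbering replicates within each group is an assumption rather than something the package documentation states.

```r
# Hypothetical raw labels: illustrative only, not part of the package.
raw_group     <- c("ctrl", "ctrl", "trt", "trt", "trt", "ctrl")
raw_replicate <- c("a", "b", "x", "y", "x", "a")

# Recode group labels as integers 1, 2, ...
G <- as.integer(factor(raw_group))

# Recode replicate labels as integers starting from 1 within each group
# (assumption: replicates are numbered separately per group).
H <- ave(seq_along(raw_replicate), G,
         FUN = function(i) as.integer(factor(raw_replicate[i])))

# Number of distinct replicates in each group (one entry per group).
n_subgroups <- as.vector(tapply(H, G, function(h) length(unique(h))))
```

The resulting G, H and n_subgroups could then be passed to andova together with the data matrix X.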
Ma L. and Soriano J. (2018). Analysis of distributional variation through multi-scale Beta-Binomial modeling. Journal of Computational and Graphical Statistics. Vol. 27, No. 3, 529-541. doi:10.1080/10618600.2017.1402774
set.seed(1)
n = 1000
M = 5
# Two groups drawn from mixtures with opposite class weights
class_1 = sample(M, n, prob = 1:5, replace = TRUE)
class_2 = sample(M, n, prob = 5:1, replace = TRUE)
Y_1 = rnorm(n, mean = class_1, sd = .2)
Y_2 = rnorm(n, mean = class_2, sd = .2)
X = matrix(c(Y_1, Y_2), ncol = 1)
G = c(rep(1, n), rep(2, n))
# Assign each observation to one of three replicates at random
H = sample(3, 2*n, replace = TRUE)
ans = andova(X, G, H)
ans$PostGlobNull
plot1D(ans)
This function executes the Multi Resolution Scanning algorithm to detect differences across multiple distributions.
mrs(X, G, n_groups = length(unique(G)), Omega = "default", K = 6, init_state = NULL, beta = 1, gamma = 0.3, delta = NULL, eta = 0.3, alpha = 0.5, return_global_null = TRUE, return_tree = TRUE, min_n_node = 0)
X |
Matrix of the data. Each row represents an observation. |
G |
Numeric vector of the group label of each observation. Labels are integers starting from 1. |
n_groups |
Number of groups. |
Omega |
Matrix defining the vertices of the sample space.
The |
K |
Depth of the tree. Default is 6. |
init_state |
Initial state of the hidden Markov process. The three states are null, alternative and prune, respectively. |
beta |
Spatial clustering parameter of the transition probability matrix. Default is 1. |
gamma |
Parameter of the transition probability matrix. Default is 0.3. |
delta |
Optional parameter of the transition probability matrix. Default is NULL. |
eta |
Parameter of the transition probability matrix. Default is 0.3. |
alpha |
Pseudo-counts of the Beta random probability assignments. Default is 0.5. |
return_global_null |
Boolean indicating whether to return the posterior probability of the global null hypothesis. |
return_tree |
Boolean indicating whether to return the posterior representative tree. |
min_n_node |
Node in the tree is returned if there are more than |
An mrs object.
Soriano J. and Ma L. (2017). Probabilistic multi-resolution scanning for two-sample differences. Journal of the Royal Statistical Society: Series B (Statistical Methodology). doi:10.1111/rssb.12180
set.seed(1)
n = 20
p = 2
# First half of the values is uniform, second half is Beta(1, 4)
X = matrix(c(runif(p*n/2), rbeta(p*n/2, 1, 4)), nrow = n, byrow = TRUE)
G = c(rep(1, n/2), rep(2, n/2))
ans = mrs(X = X, G = G)
This function visualizes the regions of the representative tree of the output of the mrs function.
For each region the posterior probability of difference (PMAP) or the effect size is plotted.
plot1D(ans, type = "prob", group = 1, dim = 1, regions = rep(1, length(ans$RepresentativeTree$Levels)), legend = FALSE, main = "default", abs = TRUE)
ans |
An |
type |
What is represented at each node.
The options are |
group |
If |
dim |
If the data are multivariate, |
regions |
Binary vector indicating the regions to plot. The default is to plot all regions. |
legend |
Color legend for type. Default is FALSE. |
main |
Overall title for the plot. |
abs |
If |
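The regions argument is a binary vector with one entry per region of the representative tree, as its default rep(1, length(ans$RepresentativeTree$Levels)) suggests. A minimal sketch of building such a vector to keep only shallow regions follows. The vector tree_levels is a made-up stand-in for ans$RepresentativeTree$Levels, and the depth cutoff is arbitrary.

```r
# Stand-in for ans$RepresentativeTree$Levels; the values are made up.
tree_levels <- c(0, 1, 1, 2, 2, 2, 3)

# Mark regions at depth 2 or shallower with 1, all others with 0.
regions <- as.integer(tree_levels <= 2)
regions
```

With a fitted object, a vector built this way from the real levels could be passed as plot1D(ans, regions = regions).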
Soriano J. and Ma L. (2017). Probabilistic multi-resolution scanning for two-sample differences. Journal of the Royal Statistical Society: Series B (Statistical Methodology). doi:10.1111/rssb.12180
Ma L. and Soriano J. (2018). Analysis of distributional variation through multi-scale Beta-Binomial modeling. Journal of Computational and Graphical Statistics. Vol. 27, No. 3, 529-541. doi:10.1080/10618600.2017.1402774
set.seed(1)
p = 1
n1 = 200
n2 = 200
mu1 = matrix(c(0, 10), nrow = 2, byrow = TRUE)
# The second mixture component is shifted slightly in group 2
mu2 = mu1; mu2[2] = mu1[2] + .01
sigma = c(1, .1)
Z1 = sample(2, n1, replace = TRUE, prob = c(0.9, 0.1))
Z2 = sample(2, n2, replace = TRUE, prob = c(0.9, 0.1))
X1 = mu1[Z1] + matrix(rnorm(n1*p), ncol = p)*sigma[Z1]
# Scale group 2 by its own component labels Z2
X2 = mu2[Z2] + matrix(rnorm(n2*p), ncol = p)*sigma[Z2]
X = rbind(X1, X2)
G = c(rep(1, n1), rep(2, n2))
ans = mrs(X, G, K = 10)
plot1D(ans, type = "prob")
plot1D(ans, type = "eff")
This function visualizes the regions of the representative tree of the output of the mrs function.
plot2D(ans, type = "prob", data.points = "all", background = "none", group = 1, dim = c(1, 2), levels = sort(unique(ans$RepresentativeTree$Levels)), regions = rep(1, length(ans$RepresentativeTree$Levels)), legend = FALSE, main = "default", abs = TRUE)
ans |
An |
type |
Different options on how to visualize the rectangular regions.
The options are |
data.points |
Different options on how to plot the data points.
The options are |
background |
Different options on the background.
The options are |
group |
If |
dim |
If the data are multivariate,
|
levels |
Vector with the level of the regions to plot. The default is to plot regions at all levels. |
regions |
Binary vector indicating the regions to plot. The default is to plot all regions. |
legend |
Color legend for type. Default is FALSE. |
main |
Overall title for the legend. |
abs |
If |
Soriano J. and Ma L. (2017). Probabilistic multi-resolution scanning for two-sample differences. Journal of the Royal Statistical Society: Series B (Statistical Methodology). doi:10.1111/rssb.12180
Ma L. and Soriano J. (2018). Analysis of distributional variation through multi-scale Beta-Binomial modeling. Journal of Computational and Graphical Statistics. Vol. 27, No. 3, 529-541. doi:10.1080/10618600.2017.1402774
set.seed(1)
p = 2
n1 = 200
n2 = 200
# Five bivariate cluster centers; the second is shifted in group 2
mu1 = matrix(c(9,9,0,4,-2,-10,3,6,6,-10), nrow = 5, byrow = TRUE)
mu2 = mu1; mu2[2,] = mu1[2,] + 1
Z1 = sample(5, n1, replace = TRUE)
Z2 = sample(5, n2, replace = TRUE)
X1 = mu1[Z1,] + matrix(rnorm(n1*p), ncol = p)
X2 = mu2[Z2,] + matrix(rnorm(n2*p), ncol = p)
X = rbind(X1, X2)
colnames(X) = c(1, 2)
G = c(rep(1, n1), rep(2, n2))
ans = mrs(X, G, K = 8)
plot2D(ans, type = "prob", legend = TRUE)
plot2D(ans, type = "empty", data.points = "differential", background = "none")
plot2D(ans, type = "none", data.points = "differential", background = "smeared", levels = 4)
This function visualizes the representative tree of the output of the mrs function.
For each node of the representative tree, the posterior probability of difference (PMAP) or the effect size is plotted.
Each node in the tree is associated with a region of the sample space.
All non-terminal nodes have two child nodes obtained by partitioning the parent region with a dyadic cut along a given direction.
The numbers under the vertices represent the cutting direction.
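To make the dyadic cut concrete, the sketch below halves a rectangular region at the midpoint of one dimension. The helper dyadic_cut is hypothetical and not part of the MRS package. Representing a region as a matrix with one row per dimension and columns for the lower and upper limits is an assumption made for illustration.

```r
# Hypothetical helper (not part of the MRS package): split a rectangular
# region at the midpoint of one dimension.
# `region` has one row per dimension and two columns: lower and upper limits.
dyadic_cut <- function(region, direction) {
  mid <- mean(region[direction, ])
  left <- region
  right <- region
  left[direction, 2] <- mid   # left child keeps the lower half
  right[direction, 1] <- mid  # right child keeps the upper half
  list(left = left, right = right)
}

root <- rbind(c(0, 1), c(0, 4))  # the region [0, 1] x [0, 4]
children <- dyadic_cut(root, direction = 2)
children$left   # rows: [0, 1] and [0, 2]
children$right  # rows: [0, 1] and [2, 4]
```

Applying the cut recursively to each child, with a possibly different direction at each node, yields a partition tree of the kind that plotTree displays.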
plotTree(ans, type = "prob", group = 1, legend = FALSE, main = "", node.size = 5, abs = TRUE)
ans |
A |
type |
What is represented at each node.
The options are |
group |
If |
legend |
Color legend for type. Default is FALSE. |
main |
Main title. Default is "". |
node.size |
Size of the nodes. Default is 5. |
abs |
If |
The package igraph is required.
Soriano J. and Ma L. (2017). Probabilistic multi-resolution scanning for two-sample differences. Journal of the Royal Statistical Society: Series B (Statistical Methodology). doi:10.1111/rssb.12180
Ma L. and Soriano J. (2018). Analysis of distributional variation through multi-scale Beta-Binomial modeling. Journal of Computational and Graphical Statistics. Vol. 27, No. 3, 529-541. doi:10.1080/10618600.2017.1402774
set.seed(1)
p = 2
n1 = 200
n2 = 200
# Five bivariate cluster centers; the second is shifted in group 2
mu1 = matrix(c(9,9,0,4,-2,-10,3,6,6,-10), nrow = 5, byrow = TRUE)
mu2 = mu1; mu2[2,] = mu1[2,] + 1
Z1 = sample(5, n1, replace = TRUE)
Z2 = sample(5, n2, replace = TRUE)
X1 = mu1[Z1,] + matrix(rnorm(n1*p), ncol = p)
X2 = mu2[Z2,] + matrix(rnorm(n2*p), ncol = p)
X = rbind(X1, X2)
colnames(X) = c(1, 2)
G = c(rep(1, n1), rep(2, n2))
ans = mrs(X, G, K = 8)
plotTree(ans, type = "prob", legend = TRUE)
This function prints the summary of the output of the mrs function.
It provides the marginal prior and posterior of the null and the top regions of the representative tree.
## S3 method for class 'summary.mrs'
print(x, ...)
x |
A |
... |
Additional print parameters. |
Soriano J. and Ma L. (2017). Probabilistic multi-resolution scanning for two-sample differences. Journal of the Royal Statistical Society: Series B (Statistical Methodology). doi:10.1111/rssb.12180
Ma L. and Soriano J. (2018). Analysis of distributional variation through multi-scale Beta-Binomial modeling. Journal of Computational and Graphical Statistics. Vol. 27, No. 3, 529-541. doi:10.1080/10618600.2017.1402774
set.seed(1)
n = 100
p = 2
X = matrix(c(runif(p*n/2), rbeta(p*n/2, 1, 4)), nrow = n, byrow = TRUE)
G = c(rep(1, n/2), rep(2, n/2))
x = mrs(X = X, G = G)
# Summarize with strict thresholds, then print the summary
fit = summary(x, rho = 0.95, abs_eff = 1)
print(fit)
This function summarizes the output of the mrs function.
It provides the marginal prior and posterior of the null and the top regions of the representative tree.
## S3 method for class 'mrs'
summary(object, rho = 0.5, abs_eff = 0, sort_by = "eff", ...)
object |
A |
rho |
Threshold for the posterior alternative probability.
All regions with posterior alternative probability larger
than |
abs_eff |
Threshold for the effect size. All regions with
effect size larger than |
sort_by |
Define in which order the regions are reported.
The options are |
... |
Additional summary parameters. |
A list with information about the top regions.
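The thresholding described for rho and abs_eff can be illustrated on a toy table. This is a sketch of the idea only, not the package's internal code. The data frame, its column names, the comparison on the absolute effect size, and the ordering rule are all assumptions.

```r
# Made-up per-region results; the column names are illustrative only.
regions <- data.frame(
  id       = 1:5,
  alt_prob = c(0.95, 0.40, 0.80, 0.99, 0.60),
  eff      = c(0.30, 1.20, -0.90, 0.05, 0.50)
)

rho     <- 0.5
abs_eff <- 0.1

# Keep regions exceeding both thresholds
# (assumption: the effect size is compared in absolute value).
keep <- regions$alt_prob > rho & abs(regions$eff) > abs_eff
top  <- regions[keep, ]

# Order the retained regions by decreasing absolute effect size.
top <- top[order(abs(top$eff), decreasing = TRUE), ]
top$id
```

Raising rho or abs_eff shrinks the reported set, which is why the print example above, with rho = 0.95 and abs_eff = 1, is far stricter than the defaults.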
Soriano J. and Ma L. (2017). Probabilistic multi-resolution scanning for two-sample differences. Journal of the Royal Statistical Society: Series B (Statistical Methodology). doi:10.1111/rssb.12180
Ma L. and Soriano J. (2018). Analysis of distributional variation through multi-scale Beta-Binomial modeling. Journal of Computational and Graphical Statistics. Vol. 27, No. 3, 529-541. doi:10.1080/10618600.2017.1402774
set.seed(1)
n = 100
p = 2
X = matrix(c(runif(p*n/2), rbeta(p*n/2, 1, 4)), nrow = n, byrow = TRUE)
G = c(rep(1, n/2), rep(2, n/2))
object = mrs(X = X, G = G)
# Report regions above both thresholds
fit = summary(object, rho = 0.5, abs_eff = 0.1)