iClusterVB performs fast integrative clustering and feature selection for high-dimensional data.
It uses a variational Bayes approach and has three key features: clustering of mixed-type data, automated determination of the number of clusters, and feature selection in high-dimensional settings. Together these address limitations of traditional clustering methods and offer an alternative to MCMC algorithms that is potentially faster.
The package includes a simulated dataset, stored as a list, which we use here to illustrate iClusterVB.
library(iClusterVB)
# sim_data comes with the iClusterVB package.
dat1 <- list(
  gauss_1 = sim_data$continuous1_data[c(1:20, 61:80, 121:140, 181:200), 1:75],
  gauss_2 = sim_data$continuous2_data[c(1:20, 61:80, 121:140, 181:200), 1:75],
  poisson_1 = sim_data$count_data[c(1:20, 61:80, 121:140, 181:200), 1:75],
  multinomial_1 = sim_data$binary_data[c(1:20, 61:80, 121:140, 181:200), 1:75]
)
# We re-code `0`s to `2`s
dat1$multinomial_1[dat1$multinomial_1 == 0] <- 2
dist <- c(
  "gaussian", "gaussian",
  "poisson", "multinomial"
)
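fit_iClusterVB <- iClusterVB(
  mydata = dat1,                # list of data views
  dist = dist,                  # one distribution per view, in the same order
  K = 4,                        # maximum number of clusters
  initial_method = "VarSelLCM", # method used to initialize the cluster allocation
  VS_method = 1,                # 1 = cluster with variable selection
  max_iter = 50                 # maximum number of iterations
)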
#> ------------------------------------------------------------
#> Pre-processing and initializing the model
#> ------------------------------------------------------------
#>
#> ------------------------------------------------------------
#> Running the CAVI algorithm
#> ------------------------------------------------------------
#> iteration = 10 elbo = -1987757.206577
#> iteration = 20 elbo = -1937584.919018
#> iteration = 30 elbo = -1885755.424734
#> iteration = 40 elbo = -1852764.123269
#> iteration = 50 elbo = -1837257.997981
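
The printed ELBO values can be used for a quick check of how much the fit was still improving when `max_iter` was reached. The following is a minimal sketch in base R that copies the five values from the output above by hand; it does not read anything from `fit_iClusterVB`, and the names `elbo`, `elbo_gain`, and `rel_gain` are introduced here only for illustration.

```r
# ELBO values copied by hand from the CAVI output above (iterations 10 to 50).
elbo <- c(
  `10` = -1987757.206577,
  `20` = -1937584.919018,
  `30` = -1885755.424734,
  `40` = -1852764.123269,
  `50` = -1837257.997981
)

# Gain in ELBO over each block of 10 iterations.
elbo_gain <- diff(elbo)

# Gain relative to the magnitude of the previous ELBO value.
rel_gain <- elbo_gain / abs(head(elbo, -1))

print(round(elbo_gain, 2))
print(signif(rel_gain, 3))
```

In this run every gain is positive, so the ELBO was still rising at iteration 50. If that matters for your analysis, a larger `max_iter` may be worth trying.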
# We can obtain a summary using summary()
summary(fit_iClusterVB)
#> Total number of individuals:
#> [1] 80
#>
#> User-inputted maximum number of clusters: 4
#> Number of clusters determined by algorithm: 4
#>
#> Cluster Membership:
#>  1  2  3  4
#> 20 20 20 20
#>
#> # of variables above the posterior inclusion probability of 0.5 for View 1 - gaussian
#> [1] "50 out of a total of 75"
#>
#> # of variables above the posterior inclusion probability of 0.5 for View 2 - gaussian
#> [1] "51 out of a total of 75"
#>
#> # of variables above the posterior inclusion probability of 0.5 for View 3 - poisson
#> [1] "52 out of a total of 75"
#>
#> # of variables above the posterior inclusion probability of 0.5 for View 4 - multinomial
#> [1] "51 out of a total of 75"
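
The summary reports, for each view, how many variables have a posterior inclusion probability above 0.5. The thresholding step itself can be sketched in base R as below. The vector `pip` and its values are hypothetical and are not taken from the fitted model, and the sketch assumes "above" means strictly greater than the threshold. How the package stores these probabilities is not shown here.

```r
# Hypothetical posterior inclusion probabilities for six features in one view.
# These values are made up for illustration; they are not model output.
pip <- c(0.97, 0.81, 0.52, 0.50, 0.33, 0.04)

# Assumption: a feature counts as selected when its probability is strictly
# greater than the threshold.
threshold <- 0.5
selected <- which(pip > threshold)

# Report the count in the same style as the summary output.
msg <- sprintf("%d out of a total of %d", length(selected), length(pip))
print(msg)
```

Here the feature with a probability of exactly 0.50 is not counted, so three of the six hypothetical features are selected.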