Title: | Calculates the Density-Based Clustering Validation (DBCV) Index |
---|---|
Description: | A metric called the 'Density-Based Clustering Validation' (DBCV) index, used to evaluate clustering results, following the 'Python' implementation at <https://github.com/FelSiq/DBCV> by Felipe Alves Siqueira. Original 'DBCV' index article: Moulavi, D., Jaskowiak, P. A., Campello, R. J., Zimek, A., & Sander, J. (2014, April). "Density-based clustering validation", Proceedings of SDM 2014 -- the 2014 SIAM International Conference on Data Mining (pp. 839-847), <doi:10.1137/1.9781611973440.96>. |
Authors: | Davide Chicco [aut, cre] |
Maintainer: | Davide Chicco <[email protected]> |
License: | GPL-3 |
Version: | 1.1 |
Built: | 2025-01-09 23:22:47 UTC |
Source: | https://github.com/cranhaven/cranhaven.r-universe.dev |
Function to compute pairwise distances and ensure matrix format
compute_pair_to_pair_dists(data, metric = "euclidean")
data | input data to be clustered (a numeric matrix with one sample per row, such as `X` in the example) |
metric | distance metric, "euclidean" by default |
a matrix of pairwise distances
n <- 300
noise <- 0.05
seed <- 1782
set.seed(seed)  # make the random noise reproducible
# upper and lower half-circles, n / 2 points each
theta <- seq(0, pi, length.out = n / 2)
x1 <- cos(theta) + rnorm(n / 2, sd = noise)
y1 <- sin(theta) + rnorm(n / 2, sd = noise)
x2 <- cos(theta + pi) + rnorm(n / 2, sd = noise)
y2 <- sin(theta + pi) + rnorm(n / 2, sd = noise)
X <- rbind(cbind(x1, y1), cbind(x2, y2))
dist_matrix <- compute_pair_to_pair_dists(X)
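To make the idea of a pairwise distance matrix in "matrix format" concrete, the following is a minimal base-R sketch. It is not the package's own code: `pairwise_dists_sketch` is a hypothetical helper, and it assumes that the `metric` names match those accepted by `stats::dist()`, which the package may or may not do.

```r
# Hypothetical stand-in for illustration only; not the package's implementation.
pairwise_dists_sketch <- function(data, metric = "euclidean") {
  # stats::dist() returns a compact "dist" object (lower triangle only);
  # as.matrix() expands it into a full symmetric n x n matrix.
  as.matrix(stats::dist(data, method = metric))
}

# Three points on a line through the origin (3-4-5 triangles)
X_small <- rbind(c(0, 0), c(3, 4), c(6, 8))
D <- pairwise_dists_sketch(X_small)
print(D)
```

The result has one row and one column per sample, zeros on the diagonal, and `D[i, j]` equal to `D[j, i]`.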
Function that calculates the Density-Based Clustering Validation index (DBCV) of clustering results
dbcv(data, labels, metric = "euclidean", noise_id = -1)
data | input data that was clustered (a numeric matrix with one sample per row, such as `X` in the example) |
labels | cluster labels, one per sample |
metric | distance metric, "euclidean" by default |
noise_id | label value that marks noise points, -1 by default |
a real value containing the DBCV index
n <- 300
noise <- 0.05
seed <- 1782
set.seed(seed)  # make the random noise reproducible
# upper and lower half-circles, n / 2 points each
theta <- seq(0, pi, length.out = n / 2)
x1 <- cos(theta) + rnorm(n / 2, sd = noise)
y1 <- sin(theta) + rnorm(n / 2, sd = noise)
x2 <- cos(theta + pi) + rnorm(n / 2, sd = noise)
y2 <- sin(theta + pi) + rnorm(n / 2, sd = noise)
X <- rbind(cbind(x1, y1), cbind(x2, y2))
# one label per point: 0 for the first half-circle, 1 for the second
y <- c(rep(0, n / 2), rep(1, n / 2))
cat("dbcv(X, y) = ", dbcv(X, y), "\n", sep = "")
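The `noise_id` argument is not exercised by the example above, so here is a hedged sketch of how a label vector containing noise points might be passed. It assumes the package is installed under the name `DBCV` (the call is skipped otherwise), and the choice of which points to mark as noise is purely illustrative.

```r
set.seed(1782)
n <- 300
noise_sd <- 0.05
theta <- seq(0, pi, length.out = n / 2)

# Same two half-circles as in the example
X <- rbind(
  cbind(cos(theta) + rnorm(n / 2, sd = noise_sd),
        sin(theta) + rnorm(n / 2, sd = noise_sd)),
  cbind(cos(theta + pi) + rnorm(n / 2, sd = noise_sd),
        sin(theta + pi) + rnorm(n / 2, sd = noise_sd))
)
y <- c(rep(0, n / 2), rep(1, n / 2))

# Mark every 30th point as noise (an arbitrary choice for illustration)
y_noisy <- y
y_noisy[seq(30, n, by = 30)] <- -1
print(table(y_noisy))

# Assumption: the package namespace is "DBCV"; skip the call if it is absent
if (requireNamespace("DBCV", quietly = TRUE)) {
  score <- DBCV::dbcv(X, y_noisy, metric = "euclidean", noise_id = -1)
  cat("dbcv(X, y_noisy) = ", score, "\n", sep = "")
}
```

If a clustering algorithm encodes noise with a different value (for example 0), that value would be passed as `noise_id` instead.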
Function to remove duplicate samples from the input data
remove_duplicates(data, labels)
data | input data that was clustered (a numeric matrix with one sample per row, such as `X` in the example) |
labels | cluster labels, one per sample |
a list containing the data and the labels with duplicate samples removed
n <- 300
noise <- 0.05
seed <- 1782
set.seed(seed)  # make the random noise reproducible
# upper and lower half-circles, n / 2 points each
theta <- seq(0, pi, length.out = n / 2)
x1 <- cos(theta) + rnorm(n / 2, sd = noise)
y1 <- sin(theta) + rnorm(n / 2, sd = noise)
x2 <- cos(theta + pi) + rnorm(n / 2, sd = noise)
y2 <- sin(theta + pi) + rnorm(n / 2, sd = noise)
X <- rbind(cbind(x1, y1), cbind(x2, y2))
# one label per point: 0 for the first half-circle, 1 for the second
y <- c(rep(0, n / 2), rep(1, n / 2))
cat("remove_duplicates(X, y) = ")
print(remove_duplicates(X, y))
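Because the random data in the example is unlikely to contain exact duplicates, a tiny hand-made input shows the effect more clearly. The sketch below uses base R only. `remove_duplicates_sketch` is a hypothetical helper, not the package's implementation, and the list element names `data` and `labels` are assumptions.

```r
# Hypothetical stand-in for illustration only; not the package's implementation.
remove_duplicates_sketch <- function(data, labels) {
  data <- as.matrix(data)
  stopifnot(nrow(data) == length(labels))
  # duplicated() on a matrix compares whole rows; the first occurrence is kept
  keep <- !as.vector(duplicated(data))
  list(data = data[keep, , drop = FALSE], labels = labels[keep])
}

# Row 3 repeats row 1, so it is dropped together with its label
X_dup <- rbind(c(1, 2), c(3, 4), c(1, 2), c(5, 6))
y_dup <- c(0, 1, 0, 1)
res <- remove_duplicates_sketch(X_dup, y_dup)
print(res)
```

Dropping a row and its label together keeps the two aligned, which matters when the result is passed on to a function that expects one label per sample.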