Title: | Testing for Spatial Independence of Qualitative Data in Cross Section |
---|---|
Description: | Testing for Spatial Dependence of Qualitative Data in Cross Section. The list of functions includes join-count tests, Q test, spatial scan test, similarity test and spatial runs test. The methodology of these models can be found in <doi:10.1007/s10109-009-0100-1> and <doi:10.1080/13658816.2011.586327>. |
Authors: | Fernando Lopez [aut, cre] , Roman Minguez [aut] , Antonio Paez [aut] , Manuel Ruiz [aut] |
Maintainer: | Fernando Lopez <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.3.3 |
Built: | 2024-10-24 09:18:16 UTC |
Source: | https://github.com/cranhaven/cranhaven.r-universe.dev |
spqdep offers the user a collection of functions to testing for spatial independence of categorical cross-section datasets.
Some functionalities that have been included in spqdep package are:
Testing for spatial dependence with the Q test based on the symbolic entropy. See references
Two statistics for the spatial analysis of qualitative data are obtained that are based on the symbolic entropy of the maps. See references
Testing for spatial dependence with the global test based on runs. See references
Obtain the local test based on runs. See references
Testing for spatial dependence with the Scan test. See references
Testing for spatial dependence with the classical Join-count test. See references
Testing for spatial dependence with the Similarity test. See references
(example provinces_spain, FastFood.sf)
spqdep includes two different datasets: provinces_spain and FastFood.sf. These
sets are used to illustrate the capabilities of different functions.
Briefly, their main characteristics are the following
The FastFood.sf A sf object with points of the localization of Fastfood restaurants in Toronto.
The provinces_spain An object of the class sf with the geometry of spanish provinces and several data sets.
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz M, López FA and A Páez (2011). Comparison of Thematic Maps Using Symbolic Entropy. International Journal of Geographical Information Science, 26, 413-439.
Ruiz, M., López, F., and Páez, A. (2010). Testing for spatial association of qualitative data using symbolic dynamics. Journal of Geographical Systems, 12(3), 281-309.0.
Kulldorff M, Nagarwalla N. (1995). Spatial disease clusters: Detection and Inference. Statistics in Medicine. 14:799-810
Jung I, Kulldorff M, Richard OJ (2010). A spatial scan statistic for multinomial data. Statistics in Medicine. 29(18), 1910-1918
Páez, A, López-Hernández, FA, Ortega-García, JA, & Ruiz, M. (2016). Clustering and co-occurrence of cancer types: A comparison of techniques with an application to pediatric cancer in Murcia, Spain. Spatial Analysis in Health Geography, 69-90.
Useful links:
A simple features object square regular lattice 16x16. from Fig. 3.3 in Upton and Fingleton (1985). In this figure, the cells coded black/white correspond to quadrats where the perennial shrub Atriplex hymenelytra is present/absent in a sample area in Death Valley, California.
data(Boots.sf)
data(Boots.sf)
A simple features dataframe with 256 rows and 1 column:
A factor with two levels: Black and White
Boots, B. (2003) Developing local measures of spatial association for categorical data, Journal of Geographical Systems, 5(2), 139-160. doi:10.1007/s10109-003-0110-3
Boots, B. (2003). Developing local measures of spatial association for categorical data. Journal of Geographical Systems, 5(2), 139-160. doi:10.1007/s10109-003-0110-3
Upton G., Fingleton B. (1985) Spatial data analysis by example. Volume 1: Point pattern and quantitative data. John Wiley & Sons, Chichester
data(Boots.sf) summary(Boots.sf) plot(Boots.sf)
data(Boots.sf) summary(Boots.sf) plot(Boots.sf)
This function obtains the set of symbols to get the Q-statistic
cr_symb(k = k, m = m)
cr_symb(k = k, m = m)
k |
number of categories |
m |
length of the m-surrounding |
...
A list with two types of symbols. Permutation and Combinations-totals
p_symb |
Matrix with symbols (permutations) |
c_symb |
Matrix with symbols (combinations) |
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz M, López FA, A Páez. (2010). Testing for spatial association of qualitative data using symbolic dynamics. Journal of Geographical Systems. 12 (3) 281-309
# Example 1: Obtain symbols for k=2 classes and m-surroundings of size 5 symb25 <- cr_symb(2,5) symb25$p_symb # Permutations symbols symb25$c_symb # Combinations-totals symbols
# Example 1: Obtain symbols for k=2 classes and m-surroundings of size 5 symb25 <- cr_symb(2,5) symb25$p_symb # Permutations symbols symb25$c_symb # Combinations-totals symbols
The purpose of the function dgp.spq
is to generate a random dataset
with the dimensions and spatial structure decided by the user. This function may be
useful in pure simulation experiments or with the aim of showing specific properties
and characteristics of a spatial qualitative dataset.
dgp.spq(listw = listw, p = p, rho = rho, control = list())
dgp.spq(listw = listw, p = p, rho = rho, control = list())
listw |
A |
p |
a vector with the percentage of elements of each categories. The lengths must be the number of categories. The sum of the elements of vector must be 1. |
rho |
the level of spatial dependence (values between -1 y 1) |
control |
List of additional control arguments. See control argument section. |
In order to obtain categorical random variables with controlled degrees
of spatial dependence, we have designed a two- stage data generating process:
Firstly, we simulate autocorrelated data using the following model:
where I is the
identity matrix,
is a parameter
of spatial dependence, and W is a connectivity matrix that
determines the set of spatial relationships among points.
In the second step of the data generation process, the continuous spatially
autocorrelated variable Y is used to define a discrete spatial process
as follows. Let be defined by:
Let and define the discrete spatial process as:
a factor of length N (listw
is a matrix of order (NxN)) with
levels the first capital letters: "A", "B", ....
The description of the DGP is available in Páez et al. 2010 (pag 291) and in details section.
seed to generate the data sets
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz M, López FA, A Páez. (2010). Testing for spatial association of qualitative data using symbolic dynamics. Journal of Geographical Systems. 12 (3) 281-309
Q.test
, Q.map.test
, sp.runs.test
, scan.test
# N <- 100 cx <- runif(N) cy <- runif(N) coor <- cbind(cx,cy) p <- c(1/6,3/6,2/6) rho = 0.5 listw <- spdep::nb2listw(spdep::knn2nb(spdep::knearneigh(cbind(cx,cy), k = 4))) xf <- dgp.spq(list = listw, p = p, rho = rho) data(provinces_spain) listw <- spdep::poly2nb(provinces_spain, queen = FALSE) p <- c(1/6,3/6,2/6) rho = 0.9 xf <- dgp.spq(p = p, listw = listw, rho = rho) provinces_spain$xf <- xf plot(provinces_spain["xf"])
# N <- 100 cx <- runif(N) cy <- runif(N) coor <- cbind(cx,cy) p <- c(1/6,3/6,2/6) rho = 0.5 listw <- spdep::nb2listw(spdep::knn2nb(spdep::knearneigh(cbind(cx,cy), k = 4))) xf <- dgp.spq(list = listw, p = p, rho = rho) data(provinces_spain) listw <- spdep::poly2nb(provinces_spain, queen = FALSE) p <- c(1/6,3/6,2/6) rho = 0.9 xf <- dgp.spq(p = p, listw = listw, rho = rho) provinces_spain$xf <- xf plot(provinces_spain["xf"])
A simple feature (sf) dataframe containing the locations of a selection of fast food restaurants in the city of Toronto, Canada (data are from 2008). The data are projected using EPSG: 26917 (WGS 84/UTM Zone 17N).
data(FastFood.sf)
data(FastFood.sf)
A simple features object with 614 rows and 3 variables:
Unique identifier of record.
Factor with 3 types of fast food restaurants: [P]izza, [S]andwich, and [H]amburger
Geometry of simple features.
Ruiz et al. (2010) https://link.springer.com/article/10.1007/s10109-009-0100-1
Ruiz M, López FA, A Páez. (2010). Testing for spatial association of qualitative data using symbolic dynamics. Journal of Geographical Systems. 12 (3) 281-309
A function to compute joincount
tests for spatial qualitative data. This function
is a wrapper of joincount.multi
and joincount.test
in spdep package.
jc.test(formula = NULL, data = NULL, fx = NULL, listw = NULL, na.action, zero.policy = NULL, distr = "asymptotic", alternative = "greater", control =list())
jc.test(formula = NULL, data = NULL, fx = NULL, listw = NULL, na.action, zero.policy = NULL, distr = "asymptotic", alternative = "greater", control =list())
formula |
a symbolic description of the factor(s). |
data |
an (optional) data frame or a sf object with points/multipolygons geometry containing the variable(s) to be tested. |
fx |
a factor or a matrix of factors in columns |
listw |
A |
na.action |
A function (default |
zero.policy |
Similar to the corresponding parameter of
|
distr |
character. Distribution type "asymptotic" (default) or "mc". |
alternative |
character string specifying the alternative hypothesis, must be one of "greater" (default), or "less". |
control |
list of additional arguments. |
An spjctest
object. This type of object
is a list of htest
objects.
The length of the list is the number of
factor variables included in the
formula
or the number of columns
in fx
.
Each element of the list can be a
jclist
object, for binomial factors,
or a jcmulti
object for multinomial
factors. See joincount.test
or joincount.multi
for additional details.
nsim |
number of permutations used in
the Monte Carlo distribution.
Default nsim = 999. |
seedinit |
seed to select the initial
element during the simulations.
Default seedinit = 1111. |
adjust.n |
default *TRUE*,
if *FALSE* the number of observations
is not adjusted for no-neighbour
observations, if *TRUE*, the number of
observations is adjusted consistently.
See joincount.multi
for aditional information. |
spChk |
should the data vector
names be checked against the spatial
objects for identity integrity,
*TRUE*, or *FALSE*, default *NULL* to
use *get.spChkOption()*.
See joincount.multi
for aditional information. |
sampling |
default *nonfree*,
may be *free*.
See joincount.test for
additional information. |
queen |
default *TRUE*. Defines the neighborhood criteria for *sf* objects. |
style |
defines the style for
listw . Default = *B* (binary). |
knn |
chooses the number of
neighboors when this criteria is used.
Default knn = 5.
|
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Cliff, A. D., Ord, J. K. 1981 Spatial processes, Pion, pp. 19-20.
Upton, G., Fingleton, B. 1985 Spatial data analysis by example: point pattern and qualitative data, Wiley, pp. 158–170.
print.summary.spqtest
,
joincount.test
,
joincount.multi
## Case 1 ## Multinomial + Binomial using a sf multipolygon data("provinces_spain") # sf::sf_use_s2(FALSE) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf)) levels(provinces_spain$Older) = c("low","middle","high") f1 <- ~ Older + Mal2Fml jc1 <- jc.test(formula = f1, data = provinces_spain, distr = "mc", alternative = "greater", zero.policy = TRUE) summary(jc1) provinces_spain$Coast <- factor(provinces_spain$Coast) levels(provinces_spain$Coast) = c("no","yes") f2 <- ~ Mal2Fml + Coast jc2 <- jc.test(formula = f2, data = provinces_spain, distr = "mc", zero.policy = TRUE) summary(jc2) # Case 2: ## Multinomial using a sf multipoint data("FastFood.sf") # sf::sf_use_s2(FALSE) f1 <- ~ Type jc3 <- jc.test(formula = f1, data = FastFood.sf, distr = "asymptotic", control = list(knn = 6)) summary(jc3) # Examples function joincount.test data(oldcol, package = "spdep") HICRIME <- cut(COL.OLD$CRIME, breaks = c(0,35,80), labels = c("low","high")) names(HICRIME) <- rownames(COL.OLD) jc4 <- jc.test(fx = HICRIME, listw = spdep::nb2listw(COL.nb, style="B")) summary(jc4) jc5 <- jc.test(fx = HICRIME, listw = spdep::nb2listw(COL.nb, style="B"), distr = "mc") summary(jc5) HICRIME <- cut(COL.OLD$CRIME, breaks = c(0, 35, 80), labels = c("low","high")) names(HICRIME) <- rownames(COL.OLD) jc6 <- jc.test(fx = HICRIME, listw = spdep::nb2listw(COL.nb, style = "B")) summary(jc6)
## Case 1 ## Multinomial + Binomial using a sf multipolygon data("provinces_spain") # sf::sf_use_s2(FALSE) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf)) levels(provinces_spain$Older) = c("low","middle","high") f1 <- ~ Older + Mal2Fml jc1 <- jc.test(formula = f1, data = provinces_spain, distr = "mc", alternative = "greater", zero.policy = TRUE) summary(jc1) provinces_spain$Coast <- factor(provinces_spain$Coast) levels(provinces_spain$Coast) = c("no","yes") f2 <- ~ Mal2Fml + Coast jc2 <- jc.test(formula = f2, data = provinces_spain, distr = "mc", zero.policy = TRUE) summary(jc2) # Case 2: ## Multinomial using a sf multipoint data("FastFood.sf") # sf::sf_use_s2(FALSE) f1 <- ~ Type jc3 <- jc.test(formula = f1, data = FastFood.sf, distr = "asymptotic", control = list(knn = 6)) summary(jc3) # Examples function joincount.test data(oldcol, package = "spdep") HICRIME <- cut(COL.OLD$CRIME, breaks = c(0,35,80), labels = c("low","high")) names(HICRIME) <- rownames(COL.OLD) jc4 <- jc.test(fx = HICRIME, listw = spdep::nb2listw(COL.nb, style="B")) summary(jc4) jc5 <- jc.test(fx = HICRIME, listw = spdep::nb2listw(COL.nb, style="B"), distr = "mc") summary(jc5) HICRIME <- cut(COL.OLD$CRIME, breaks = c(0, 35, 80), labels = c("low","high")) names(HICRIME) <- rownames(COL.OLD) jc6 <- jc.test(fx = HICRIME, listw = spdep::nb2listw(COL.nb, style = "B")) summary(jc6)
This function calculates the local Join Count tests.This function
is a wrapper of local_joincount
in rgeoda package.
local.jc.test(formula = NULL, data = NULL, fx = NULL, case = NULL, coor = NULL, listw = listw, nsim = 999, control = list())
local.jc.test(formula = NULL, data = NULL, fx = NULL, case = NULL, coor = NULL, listw = listw, nsim = 999, control = list())
formula |
An (optional) formula with the factor included in |
data |
An (optional) data frame or a sf object containing the variable to testing for. |
fx |
An (optional) factor of observations with the same length as the neighbours list in |
case |
level of the factor used to obtain the local Join Count test. See details. |
coor |
(optional) a 2xN vector with spatial coordinates. Used when *data* is not a spatial object. |
listw |
A neighbourhood list (an object type knn, listw or nb). |
nsim |
The number of permutation to obtain the local Join Count tests. Default value 999. |
control |
Optional argument. See Control argument section. |
This test develop by Anselin and Li (2019) can be apply only for binary variables
(usually named B and W), and count the number of joins of type B-B where the proportion of
observations with B is lower than half. The interest lies in identifying co-occurrences of
uncommon events, i.e., situations where observations that take on the value of B constitute
much less than half of the sample.
The statistic is defined as:
only meaningful for those observations where , since for
the result will always equal zero.
The object listw
can be the class:
knn
: Objects of the class knn that consider the neighbours in
order of proximity.
nb
: If the neighbours are obtained from an sf object, the code internally
will call the function nb2nb_order
it will order them in order
of proximity of the centroids.
matrix
: If a object of matrix class based in the inverse of
the distance in introduced as argument, the function nb2nb_order
will
also be called internally to transform the object the class matrix to a matrix of the
class nb with ordered neighbours.
The output is an object of the class localjc local.SRQ
A matrix with
nn |
number of neighbourhood in the localization 'i'. |
ljc |
local Join Count statistics. |
pseudo.value |
p-value of local jc tests. |
seedinit |
Numerical value for the seed in boot version. Default value seedinit = 123 |
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Anselin, L., and Li, X. (2019). Operational local join Count statistics for cluster detection. Journal of Geographical Systems, 21(2), 189-210.
# Case 1: Local spatial runs test based on knn N <- 100 cx <- runif(N) cy <- runif(N) coor <- cbind(cx,cy) listw <- spdep::knearneigh(coor, k = 4) p <- c(1/2,1/2) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) ljc <- local.jc.test(fx = fx, coor = coor, case= "A", listw = listw) print(ljc) plot(ljc, coor = coor, sig = 0.1) # Case 2:Fastfood example. sf (points) data("FastFood.sf") # sf::sf_use_s2(FALSE) x <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) listw <- spdep::knearneigh(x, k = 6) formula <- ~ Type ljc <- local.jc.test(formula = formula, data = FastFood.sf, case = "H",listw = listw) print(ljc) plot(ljc, sf = FastFood.sf, sig = 0.05) # Case 3: With a sf object (poligons) fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) listw <- spdep::poly2nb(as(nc,"Spatial"), queen = FALSE) p <- c(1/3,2/3) rho = 0.5 nc$fx <- dgp.spq(p = p, listw = listw, rho = rho) plot(nc["fx"]) formula <- ~ fx ljc <- local.jc.test(formula = formula, data = nc, case = "A", listw = listw) ljc plot(ljc, sf = nc) # Case 4: With isolated areas data(provinces_spain) # sf::sf_use_s2(FALSE) listw <- spdep::poly2nb(as(provinces_spain,"Spatial"), queen = FALSE) provinces_spain$Mal2Fml<- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") formula <- ~ Mal2Fml ljc <- local.jc.test(formula = formula, data = provinces_spain, listw = listw) print(ljc) plot(ljc, sf = provinces_spain, sig = 0.1) # Case 5: Regular lattice data(Boots.sf) listw <- spdep::poly2nb(as(Boots.sf,"Spatial"), queen = TRUE) formula <- ~ BW ljc <- local.jc.test(formula = formula, data = Boots.sf, case="B", listw = listw) print(ljc) plot(ljc, sf = Boots.sf, sig = 0.05) lsr <- local.sp.runs.test(formula = formula, data = Boots.sf, distr = "bootstrap", nsim= 99, listw = listw) print(lsr) plot(lsr, sf = Boots.sf, sig = 0.1) scan <- scan.test(formula = formula, data = Boots.sf, nsim = 299, dist = "bernoulli", nv = 60 , case = "B", windows = "elliptic") plot(scan, sf = Boots.sf)
# Case 1: Local spatial runs test based on knn N <- 100 cx <- runif(N) cy <- runif(N) coor <- cbind(cx,cy) listw <- spdep::knearneigh(coor, k = 4) p <- c(1/2,1/2) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) ljc <- local.jc.test(fx = fx, coor = coor, case= "A", listw = listw) print(ljc) plot(ljc, coor = coor, sig = 0.1) # Case 2:Fastfood example. sf (points) data("FastFood.sf") # sf::sf_use_s2(FALSE) x <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) listw <- spdep::knearneigh(x, k = 6) formula <- ~ Type ljc <- local.jc.test(formula = formula, data = FastFood.sf, case = "H",listw = listw) print(ljc) plot(ljc, sf = FastFood.sf, sig = 0.05) # Case 3: With a sf object (poligons) fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) listw <- spdep::poly2nb(as(nc,"Spatial"), queen = FALSE) p <- c(1/3,2/3) rho = 0.5 nc$fx <- dgp.spq(p = p, listw = listw, rho = rho) plot(nc["fx"]) formula <- ~ fx ljc <- local.jc.test(formula = formula, data = nc, case = "A", listw = listw) ljc plot(ljc, sf = nc) # Case 4: With isolated areas data(provinces_spain) # sf::sf_use_s2(FALSE) listw <- spdep::poly2nb(as(provinces_spain,"Spatial"), queen = FALSE) provinces_spain$Mal2Fml<- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") formula <- ~ Mal2Fml ljc <- local.jc.test(formula = formula, data = provinces_spain, listw = listw) print(ljc) plot(ljc, sf = provinces_spain, sig = 0.1) # Case 5: Regular lattice data(Boots.sf) listw <- spdep::poly2nb(as(Boots.sf,"Spatial"), queen = TRUE) formula <- ~ BW ljc <- local.jc.test(formula = formula, data = Boots.sf, case="B", listw = listw) print(ljc) plot(ljc, sf = Boots.sf, sig = 0.05) lsr <- local.sp.runs.test(formula = formula, data = Boots.sf, distr = "bootstrap", nsim= 99, listw = listw) print(lsr) plot(lsr, sf = Boots.sf, sig = 0.1) scan <- scan.test(formula = formula, data = Boots.sf, nsim = 299, dist = "bernoulli", nv = 60 , case = "B", windows = "elliptic") plot(scan, sf = Boots.sf)
This function calculates the local spatial runs tests for all localizations.
local.sp.runs.test(formula = NULL, data = NULL, fx = NULL, distr = "asymptotic", listw = listw, alternative = "two.sided" , nsim = NULL, control = list())
local.sp.runs.test(formula = NULL, data = NULL, fx = NULL, distr = "asymptotic", listw = listw, alternative = "two.sided" , nsim = NULL, control = list())
formula |
An (optional) formula with the factor included in |
data |
An (optional) data frame or a sf object containing the variable to testing for. |
fx |
An (optional) factor of observations with the same length as the neighbors list in |
distr |
a character string specifying the distribution "asymptotic" (default) or "bootstrap" |
listw |
A neighbourhood list (an object type knn or nb) or a W matrix that indicates the order of the elements in each $m_i-environment$ (for example of inverse distance). To calculate the number of runs in each $m_i-environment$, an order must be established, for example from the nearest neighbour to the furthest one. |
alternative |
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". |
nsim |
Default value is NULL to obtain the asymptotic version of the local test. For the bootstrap version nsim is the number of permutations to obtain the pseudo-value. |
control |
Optional argument. See Control Argument section. |
The object listw
can be the class:
knn
: Objects of the class knn that consider the neighbours in
order of proximity.
nb
: If the neighbours are obtained from an sf object, the code internally
will call the function nb2nb_order
it will order them in order
of proximity of the centroids.
matrix
: If a object of matrix class based in the inverse of
the distance in introduced as argument, the function nb2nb_order
will
also be called internally to transform the object the class matrix to a matrix of the
class nb with ordered neighbours.
The output is an object of the class localsrq local.SRQ
A matrix with
runs.i |
number of runs in the localization 'i'. |
E.i |
expectation of local runs statistic in the localization 'i'. |
Sd.i |
standard deviate of local runs statistic in the localization 'i'. |
z.value |
standard value of local runs statistic (only for asymptotic version). |
p.value |
p-value of local local runs statistic (only for asymptotic version). |
zseudo.value |
standard value of local runs statistic (only for boots version). |
pseudo.value |
p-value of local runs statistic (only for boots version). |
MeanNeig
Mean of run.i MaxNeig
Maximum of run.i listw
the object listw
alternative
a character string describing the alternative hypothesis
seedinit |
Numerical value for the seed in boot version. Default value seedinit = 123 |
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
@references
Ruiz, M., López, F., and Páez, A. (2021). A test for global and local homogeneity of categorical data based on spatial runs. Working paper.
# Case 1: Local spatial runs test based on knn N <- 100 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) listw <- spdep::knearneigh(cbind(cx,cy), k = 10) p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) # Asymtotic version lsrq <- local.sp.runs.test(fx = fx, listw = listw, alternative = "less") print(lsrq) plot(lsrq, sig = 0.05) # Asymtotic version lsrq <- local.sp.runs.test(fx = fx, listw = listw, alternative = "two.sided", distr ="bootstrap", nsim = 399) print(lsrq) plot(lsrq, sig = 0.1) # Case 2:Fastfood example. sf (points) data("FastFood.sf") # sf::sf_use_s2(FALSE) x <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) listw <- spdep::knearneigh(x, k = 10) formula <- ~ Type lsrq <- local.sp.runs.test(formula = formula, data = FastFood.sf, listw = listw) print(lsrq) plot(lsrq, sf = FastFood.sf, sig = 0.05) # Case 3: With a sf object (poligons) fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) listw <- spdep::poly2nb(as(nc,"Spatial"), queen = FALSE) p <- c(1/6,3/6,2/6) rho = 0.5 nc$fx <- dgp.spq(p = p, listw = listw, rho = rho) plot(nc["fx"]) formula <- ~ fx lsrq <- local.sp.runs.test(formula = formula, data = nc, listw = listw) print(lsrq) plot(lsrq, sf = nc) # Version boot lsrq <- local.sp.runs.test(formula = formula, data = nc, listw = listw, distr ="bootstrap", nsim = 399) print(lsrq) plot(lsrq, sf = nc) # Case 4: With isolated areas data(provinces_spain) listw <- spdep::poly2nb(as(provinces_spain,"Spatial"), queen = FALSE) provinces_spain$Mal2Fml<- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") plot(provinces_spain["Mal2Fml"]) formula <- ~ Mal2Fml lsrq <- local.sp.runs.test(formula = formula, data = provinces_spain, listw = listw) print(lsrq) plot(lsrq, sf = provinces_spain, sig = 0.1) # Boots Version lsrq <- local.sp.runs.test(formula = formula, data = provinces_spain, listw = listw, distr ="bootstrap", nsim = 199) print(lsrq) plot(lsrq, sf = provinces_spain, sig = 0.10) # Case 5: SRQ test based on a distance matrix (inverse distance) N <- 100 cx <- runif(N) cy <- runif(N) coor <- as.data.frame(cbind(cx,cy)) coor <- sf::st_as_sf(coor,coords = c("cx","cy")) n = dim(coor)[1] dis <- 1/matrix(as.numeric(sf::st_distance(coor,coor)), ncol = n, nrow = n) diag(dis) <- 0 dis <- (dis < quantile(dis,.10))*dis p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(p = p, listw = dis, rho = rho) lsrq <- local.sp.runs.test(fx = fx, listw = dis) print(lsrq) plot(lsrq, coor = cbind(cx,cy), sig = 0.05) lsrq <- local.sp.runs.test(fx = fx, listw = dis, data = ) print(lsrq) plot(lsrq, sf = coor) # Version boots lsrq <- local.sp.runs.test(fx = fx, listw = dis, data = coor, distr ="bootstrap", nsim = 299) print(lsrq) plot(lsrq, sf = coor) # SRQ test based on inverse distance data("FastFood.sf") # sf::sf_use_s2(FALSE) n = dim(FastFood.sf)[1] dis <- 1000000/matrix(as.numeric( sf::st_distance(FastFood.sf, FastFood.sf)), ncol = n, nrow = n) diag(dis) <- 0 dis <- (dis < quantile(dis,.01))*dis formula <- ~ Type lsrq <- local.sp.runs.test(formula = formula, data = FastFood.sf, listw = dis) print(lsrq) # plot(lsrq, sf = FastFood.sf)
# Case 1: Local spatial runs test based on knn N <- 100 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) listw <- spdep::knearneigh(cbind(cx,cy), k = 10) p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) # Asymtotic version lsrq <- local.sp.runs.test(fx = fx, listw = listw, alternative = "less") print(lsrq) plot(lsrq, sig = 0.05) # Asymtotic version lsrq <- local.sp.runs.test(fx = fx, listw = listw, alternative = "two.sided", distr ="bootstrap", nsim = 399) print(lsrq) plot(lsrq, sig = 0.1) # Case 2:Fastfood example. sf (points) data("FastFood.sf") # sf::sf_use_s2(FALSE) x <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) listw <- spdep::knearneigh(x, k = 10) formula <- ~ Type lsrq <- local.sp.runs.test(formula = formula, data = FastFood.sf, listw = listw) print(lsrq) plot(lsrq, sf = FastFood.sf, sig = 0.05) # Case 3: With a sf object (poligons) fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) listw <- spdep::poly2nb(as(nc,"Spatial"), queen = FALSE) p <- c(1/6,3/6,2/6) rho = 0.5 nc$fx <- dgp.spq(p = p, listw = listw, rho = rho) plot(nc["fx"]) formula <- ~ fx lsrq <- local.sp.runs.test(formula = formula, data = nc, listw = listw) print(lsrq) plot(lsrq, sf = nc) # Version boot lsrq <- local.sp.runs.test(formula = formula, data = nc, listw = listw, distr ="bootstrap", nsim = 399) print(lsrq) plot(lsrq, sf = nc) # Case 4: With isolated areas data(provinces_spain) listw <- spdep::poly2nb(as(provinces_spain,"Spatial"), queen = FALSE) provinces_spain$Mal2Fml<- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") plot(provinces_spain["Mal2Fml"]) formula <- ~ Mal2Fml lsrq <- local.sp.runs.test(formula = formula, data = provinces_spain, listw = listw) print(lsrq) plot(lsrq, sf = provinces_spain, sig = 0.1) # Boots Version lsrq <- local.sp.runs.test(formula = formula, data = provinces_spain, listw = listw, distr ="bootstrap", nsim = 199) print(lsrq) plot(lsrq, sf = provinces_spain, sig = 0.10) # Case 5: SRQ test based on a distance matrix (inverse distance) N <- 100 cx <- runif(N) cy <- runif(N) coor <- as.data.frame(cbind(cx,cy)) coor <- sf::st_as_sf(coor,coords = c("cx","cy")) n = dim(coor)[1] dis <- 1/matrix(as.numeric(sf::st_distance(coor,coor)), ncol = n, nrow = n) diag(dis) <- 0 dis <- (dis < quantile(dis,.10))*dis p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(p = p, listw = dis, rho = rho) lsrq <- local.sp.runs.test(fx = fx, listw = dis) print(lsrq) plot(lsrq, coor = cbind(cx,cy), sig = 0.05) lsrq <- local.sp.runs.test(fx = fx, listw = dis, data = ) print(lsrq) plot(lsrq, sf = coor) # Version boots lsrq <- local.sp.runs.test(fx = fx, listw = dis, data = coor, distr ="bootstrap", nsim = 299) print(lsrq) plot(lsrq, sf = coor) # SRQ test based on inverse distance data("FastFood.sf") # sf::sf_use_s2(FALSE) n = dim(FastFood.sf)[1] dis <- 1000000/matrix(as.numeric( sf::st_distance(FastFood.sf, FastFood.sf)), ncol = n, nrow = n) diag(dis) <- 0 dis <- (dis < quantile(dis,.01))*dis formula <- ~ Type lsrq <- local.sp.runs.test(formula = formula, data = FastFood.sf, listw = dis) print(lsrq) # plot(lsrq, sf = FastFood.sf)
This function obtains the m-surroundings by selecting the *m-1* nearest neighbors of each observation, allowing for a degree of overlap of *r*.
m.surround(x, m, r = 1, distance = "Euclidean", control = list())
m.surround(x, m, r = 1, distance = "Euclidean", control = list())
x |
input sf object with points/multipolygons geometry or matrix of spatial coordinates |
m |
dimension of m-surrounding |
r |
maximum overlapping between any two m-surroundings. |
distance |
character. For Cartesian coordinates only: one of Euclidean,
Hausdorff or Frechet; for geodetic coordinates, Great Circle distances are
computed. (see |
control |
List of additional control arguments. |
A list of class list
and m_surr
containing the following components:
ms |
a matrix. Each row is a m-surrounding. |
R |
total number of observations. |
rowexcl |
index of rows excluded. |
mdtms |
distances between the observations of each m-surrounding. |
N |
total number of symbolized observations. |
m |
length of the m-surroundings. |
r |
overlapping degree. |
initobs |
element to start the generation of m-surroundings. |
distance |
type of distance. |
m |
length of the m-surroundings. |
x |
the input "x" as sf-object. |
initobs
: Initial observation to begin the
m-surrounding process. Default = 1.
dtmaxabs
: Threshold of distance (in absolute
value) to prune the m-surroundings. Any m-surrounding
exceeding the threshold is excluded.
If dtmaxabs
= 0 there is no exclusion of m-surroundings.
Default = 0.
dtmaxpc
: Threshold of distance (as a percentage
of the maximum distance between observations) to prune the m-surroundings. Any m-surrounding
exceeding the threshold is excluded.
Example if dtmaxpc
= 0.1 the m-surrounding exceeding
the 10
If dtmaxpc
= 0 there is no exclusion of m-surroundings.
Default = 0.
dtmaxknn
: Eliminate m-surroundings where some
of the elements are not among the closest
knn (k-nearest-neighbors).
Example, if dtmaxknn
= 4 exclude m-surroundings
where some of the elements are not between the 4 closest.
Default dtmaxknn
= 0 (no exclusion)
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz M, López FA, A Páez. (2010). Testing for spatial association of qualitative data using symbolic dynamics. Journal of Geographical Systems. 12 (3) 281-309
# Example 1: Obtain m-surroundings with degree of overlapping r N <- 100 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) m <- 3 r <- 1 msurr_points <- m.surround(x = x, m = m, r = r) plot(msurr_points, type = 1) plot(msurr_points, type = 2) summary(msurr_points) msurr_points <- m.surround(x = x, m = m, r = r, control = list(dtmaxpc = 0.1)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) summary(msurr_points) msurr_points <- m.surround(x = x, m = m, r = r, control = list(dtmaxknn = 20)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) summary(msurr_points) # Example 2: data("FastFood.sf") m <- 3 r <- 1 msurr_points <- m.surround(x = FastFood.sf, m = m, r = r, distance = "Euclidean", control = list(dtmaxpc = .001)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) print(msurr_points) summary(msurr_points) msurr_points <- m.surround(x = FastFood.sf, m = m, r = r, distance = "Euclidean", control = list(dtmaxknn = 20)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) summary(msurr_points) # Example 3: With isolated areas data(provinces_spain) # sf::sf_use_s2(FALSE) plot(sf::st_geometry(provinces_spain)) m <- 3 r <- 1 msurr_points <- m.surround(x = provinces_spain, m = m, r = r, distance = "Euclidean", control = list(dtmaxknn = 8)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) # Example 4: Examples with multipolygons fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) plot(sf::st_geometry(nc)) m <- 3 r <- 1 msurr_polygonsf <- m.surround(x = nc, m = m, r = r, distance = "Great Circle", control=list(dtmaxpc = 0.20)) plot(msurr_polygonsf, type = 1) plot(msurr_polygonsf, type = 2) # Example 5: With regular lattice sfc = sf::st_sfc(sf::st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))))) hexs <- sf::st_make_grid(sfc, cellsize = 0.1, square = FALSE) hexs.sf <- sf::st_sf(hexs) listw <- spdep::poly2nb(as(hexs.sf, "Spatial"), queen = FALSE) m <- 3 r <- 1 msurr_polygonsf <- m.surround(x = hexs.sf, m = m, r = r) plot(msurr_polygonsf, type = 1) plot(msurr_polygonsf, type = 2) summary(msurr_polygonsf)
# Example 1: Obtain m-surroundings with degree of overlapping r N <- 100 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) m <- 3 r <- 1 msurr_points <- m.surround(x = x, m = m, r = r) plot(msurr_points, type = 1) plot(msurr_points, type = 2) summary(msurr_points) msurr_points <- m.surround(x = x, m = m, r = r, control = list(dtmaxpc = 0.1)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) summary(msurr_points) msurr_points <- m.surround(x = x, m = m, r = r, control = list(dtmaxknn = 20)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) summary(msurr_points) # Example 2: data("FastFood.sf") m <- 3 r <- 1 msurr_points <- m.surround(x = FastFood.sf, m = m, r = r, distance = "Euclidean", control = list(dtmaxpc = .001)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) print(msurr_points) summary(msurr_points) msurr_points <- m.surround(x = FastFood.sf, m = m, r = r, distance = "Euclidean", control = list(dtmaxknn = 20)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) summary(msurr_points) # Example 3: With isolated areas data(provinces_spain) # sf::sf_use_s2(FALSE) plot(sf::st_geometry(provinces_spain)) m <- 3 r <- 1 msurr_points <- m.surround(x = provinces_spain, m = m, r = r, distance = "Euclidean", control = list(dtmaxknn = 8)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) # Example 4: Examples with multipolygons fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) plot(sf::st_geometry(nc)) m <- 3 r <- 1 msurr_polygonsf <- m.surround(x = nc, m = m, r = r, distance = "Great Circle", control=list(dtmaxpc = 0.20)) plot(msurr_polygonsf, type = 1) plot(msurr_polygonsf, type = 2) # Example 5: With regular lattice sfc = sf::st_sfc(sf::st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))))) hexs <- sf::st_make_grid(sfc, cellsize = 0.1, square = FALSE) hexs.sf <- sf::st_sf(hexs) listw <- spdep::poly2nb(as(hexs.sf, "Spatial"), queen = FALSE) m <- 3 r <- 1 msurr_polygonsf <- m.surround(x = hexs.sf, m = m, r = r) plot(msurr_polygonsf, type = 1) plot(msurr_polygonsf, type = 2) summary(msurr_polygonsf)
The plot()
function allows the user to plot significant observations.
The print()
function is used to print the number of runs in each localization. Additional information of
expected values and standard deviation, z-value ans p-value is prited for each observation.
## S3 method for class 'localjc' print(x, ...) ## S3 method for class 'localjc' plot(x, ..., sf = NULL, coor = NULL, sig = 0.05)
## S3 method for class 'localjc' print(x, ...) ## S3 method for class 'localjc' plot(x, ..., sf = NULL, coor = NULL, sig = 0.05)
x |
a |
... |
further arguments passed to or from other methods. |
sf |
optional argument for |
coor |
optional argument for |
sig |
significant level for each observation in |
No return value, called for side effects
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz, M., López, F., and Páez, A. (2021). A test for global and local homogeneity of categorical data based on spatial runs. working paper.
# Example 1: Local spatial runs test based on knn N <- 100 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) listw <- spdep::knearneigh(cbind(cx,cy), k = 10) p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) # Asymtotic version lsrq <- local.sp.runs.test(fx = fx, listw = listw, alternative = "less") print(lsrq) plot(lsrq, sig = 0.05)
# Example 1: Local spatial runs test based on knn N <- 100 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) listw <- spdep::knearneigh(cbind(cx,cy), k = 10) p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) # Asymtotic version lsrq <- local.sp.runs.test(fx = fx, listw = listw, alternative = "less") print(lsrq) plot(lsrq, sig = 0.05)
The plot()
function allows the user to plot significant observations.
The print()
function is used to print the number of runs in each localization. Additional information of
expected values and standard deviation, z-value ans p-value is prited for each observation.
## S3 method for class 'localsrq' print(x, ...) ## S3 method for class 'localsrq' plot(x, ..., sf = NULL, coor = NULL, sig = 0.05)
## S3 method for class 'localsrq' print(x, ...) ## S3 method for class 'localsrq' plot(x, ..., sf = NULL, coor = NULL, sig = 0.05)
x |
a |
... |
further arguments passed to or from other methods. |
sf |
optional argument for |
coor |
optional argument for |
sig |
significant level for each observation in |
No return value, called for side effects
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz, M., López, F., and Páez, A. (2021). A test for global and local homogeneity of categorical data based on spatial runs. working paper.
# Example 1: Local spatial runs test based on knn N <- 100 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) listw <- spdep::knearneigh(cbind(cx,cy), k = 10) p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) # Asymtotic version lsrq <- local.sp.runs.test(fx = fx, listw = listw, alternative = "less") print(lsrq) plot(lsrq, sig = 0.05)
# Example 1: Local spatial runs test based on knn N <- 100 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) listw <- spdep::knearneigh(cbind(cx,cy), k = 10) p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) # Asymtotic version lsrq <- local.sp.runs.test(fx = fx, listw = listw, alternative = "less") print(lsrq) plot(lsrq, sig = 0.05)
A function to plots the m-surrounds give an object of the
class m_surr obtain with the code m.surround
.
The plot()
function allows the user view the configuration of the m-surroundings.
The argument type
select the type o visualization.
The print()
print the matrix of the m-surrounding.
.
The summary
give information about the characteristics of the m-surroundings.
.
## S3 method for class 'm_surr' summary(object, ...) ## S3 method for class 'm_surr' plot(x, ..., type = 1) ## S3 method for class 'm_surr' print(x, ...)
## S3 method for class 'm_surr' summary(object, ...) ## S3 method for class 'm_surr' plot(x, ..., type = 1) ## S3 method for class 'm_surr' print(x, ...)
object |
object of class m_surr. 2 plot W matrix with network |
... |
further arguments passed to or from other methods. |
x |
object of class m_surr |
type |
numeric. 1 (default) to get the plot with igraph. |
No return value, called for side effects
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz, M., López, F., and Páez, A. (2021). A test for global and local homogeneity of categorical data based on spatial runs. Working paper.
# Example 1: Obtain m-surroundings with degree of overlapping r N <- 100 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) m = 4 r = 2 msurr_points <- m.surround(x = x, m = m, r = r,control = list(dtmaxabs = 0.5)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) print(msurr_points) # Example 2: data("FastFood.sf") m = 6 r = 1 msurr_points <- m.surround(x = FastFood.sf, m = m, r = r, distance = "Euclidean", control = list(dtmaxpc = .2)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) print(msurr_points)
# Example 1: Obtain m-surroundings with degree of overlapping r N <- 100 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) m = 4 r = 2 msurr_points <- m.surround(x = x, m = m, r = r,control = list(dtmaxabs = 0.5)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) print(msurr_points) # Example 2: data("FastFood.sf") m = 6 r = 1 msurr_points <- m.surround(x = FastFood.sf, m = m, r = r, distance = "Euclidean", control = list(dtmaxpc = .2)) plot(msurr_points, type = 1) plot(msurr_points, type = 2) print(msurr_points)
A function to plot the difference in frequencies of symbols of each map.
The plot()
function to obtain the plot.
The argument ci
select the confidence level.
x |
object of class qmap |
ci |
confidence level for the difference of probabilities of symbols in 'plot' method.
Default |
... |
further arguments passed to or from other methods. |
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz M, López FA and A Páez (2011). Comparison of Thematic Maps Using Symbolic Entropy. International Journal of Geographical Information Science, 26, 413-439.
Ruiz, M., López, F., and Páez, A. (2021). A test for global and local homogeneity of categorical data based on spatial runs. working paper.
# Example 1: N <- 100 cx <- runif(N) cy <- runif(N) coor <- cbind(cx,cy) p <- c(1/6,3/6,2/6) rho = 0.5 listw <- spdep::nb2listw(spdep::knn2nb(spdep::knearneigh(cbind(cx,cy), k = 4))) fx <- dgp.spq(list = listw, p = p, rho = rho) q.test <- Q.test(fx = fx, coor = coor, m = 3, r = 1) plot(q.test)
# Example 1: N <- 100 cx <- runif(N) cy <- runif(N) coor <- cbind(cx,cy) p <- c(1/6,3/6,2/6) rho = 0.5 listw <- spdep::nb2listw(spdep::knn2nb(spdep::knearneigh(cbind(cx,cy), k = 4))) fx <- dgp.spq(list = listw, p = p, rho = rho) q.test <- Q.test(fx = fx, coor = coor, m = 3, r = 1) plot(q.test)
The plot()
function allows the user plot the significant cluster(s).summary
list information about the most likelihood cluster.
## S3 method for class 'scantest' plot(x, ..., sf = NULL, coor = NULL) ## S3 method for class 'scantest' summary(object, ...)
## S3 method for class 'scantest' plot(x, ..., sf = NULL, coor = NULL) ## S3 method for class 'scantest' summary(object, ...)
x |
is a |
... |
further arguments passed to or from other methods. |
sf |
optional argument for |
coor |
optional argument for |
object |
a |
No return value, called for side effects
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz, M., López, F., and Páez, A. (2021). A test for global and local homogeneity of categorical data based on spatial runs. Working paper.
The plot()
function allows the user obtain the plot of relative frequency
of each symbol (standard and equivalent) with the confidence interval.
The print()
function is used to get a list with the values of Q test
for standard symbolization based on permutations and for equivalent symbolization
based on combinations. summary()
print a table with the output of the Q test.
## S3 method for class 'spqtest' plot(x, ..., ci = 0.95)
## S3 method for class 'spqtest' plot(x, ..., ci = 0.95)
x |
a |
... |
further arguments passed to or from other methods |
ci |
confidence level for the intervals in |
This functions does not return any value
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz, M., López, F., and Páez, A. (2021). A test for global and local homogeneity of categorical data based on spatial runs. working paper.
An auxiliary function. In the case of obtaining the list of neighbors of class nb or poly2nb, it is necessary to reorder the elements based on distance and/or angle.
nb2nb_order(listw = listw, sf = NULL)
nb2nb_order(listw = listw, sf = NULL)
listw |
an object of the nb class. |
sf |
the sf object used to get the |
Sort the elements of a list nb. First by distance and
An object of the nb class with the elements in order.
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
@references
Ruiz, M., López, F., and Páez, A. (2021). A test for global and local homogeneity of categorical data based on spatial runs. Working paper.
dgp.spq
, sp.runs.test
, local.sp.runs.test
# With a sf object (irregular lattice) library(sf) fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) listw <- spdep::poly2nb(as(nc,"Spatial"), queen = FALSE) listw.order <- nb2nb_order(listw = listw, sf = nc) # With a sf object (regular lattice: hexagons) sfc = sf::st_sfc(sf::st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))))) hexs <- sf::st_make_grid(sfc, cellsize = 0.1, square = FALSE) hexs.sf <- sf::st_sf(hexs) listw <- spdep::poly2nb(as(hexs.sf, "Spatial"), queen = FALSE) listw.order <- nb2nb_order(listw = listw, sf = hexs.sf)
# With a sf object (irregular lattice) library(sf) fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) listw <- spdep::poly2nb(as(nc,"Spatial"), queen = FALSE) listw.order <- nb2nb_order(listw = listw, sf = nc) # With a sf object (regular lattice: hexagons) sfc = sf::st_sfc(sf::st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))))) hexs <- sf::st_make_grid(sfc, cellsize = 0.1, square = FALSE) hexs.sf <- sf::st_sf(hexs) listw <- spdep::poly2nb(as(hexs.sf, "Spatial"), queen = FALSE) listw.order <- nb2nb_order(listw = listw, sf = hexs.sf)
A simple features object with geocoded information about respondents in the 1880 US Census with selected demographic information coded as dummy variables. The data are projected using EPSG: 32618 (WGS 84/UTM Zone 18N). The coordinates have been and jiggled to create unique coordinates for each observation.
data(Newark.sf)
data(Newark.sf)
A simple features dataframe with 21,520 rows and 8 columns:
Unique identifier of record.
Dummy variable for ethnicity of respondent: 1 if Yankee, 0 otherwise.
Dummy variable for ethnicity of respondent: 1 if Irish, 0 otherwise.
Dummy variable for ethnicity of respondent: 1 if German, 0 otherwise.
Dummy variable for age of respondent: 1 if younger than 30 years old, 0 otherwise.
Dummy variable for marital status of respondent: 1 if married, 0 otherwise.
Dummy variable for place of birth of respondent: 1 if born in the US, 0 otherwise.
geometry of the simple features object
Páez et al. (2012) doi:10.1080/00045608.2011.620502
Paez, A., Ruiz, M., Lopez, F. & Logan, J. (2012). Measuring Ethnic Clustering and Exposure with the Q Statistic: An Exploratory Analysis of Irish, Germans, and Yankees in 1880 Newark.. Annals of the Association of American Geographers.
data(Newark.sf) summary(Newark.sf)
data(Newark.sf) summary(Newark.sf)
Plot the empirical distribution of runs
## S3 method for class 'sprunstest' plot(x, ...)
## S3 method for class 'sprunstest' plot(x, ...)
x |
A object of class sprunstest. |
... |
further arguments passed to or from other methods. |
Plot the histogram with the empirical distribution of the runs
No return value, called for side effects
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
# Example 1: Fastfood example. sf (points) data("FastFood.sf") x <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) listw <- spdep::knearneigh(x, k = 2) formula <- ~ Type srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = listw, nsim = 299) plot(srq) # Example 2: Spain example (poligons with 0 neinghbourhood) data("provinces_spain") sf::sf_use_s2(FALSE) listw <- spdep::poly2nb(as(provinces_spain,"Spatial"), queen = FALSE) provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf)) levels(provinces_spain$Older) = c("low","middle","high") formula <- ~ Older srq <- sp.runs.test(formula = formula, data = provinces_spain, listw = listw, nsim = 299) plot(srq) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") formula <- ~ Mal2Fml srq <- sp.runs.test(formula = formula, data = provinces_spain, listw = listw, nsim = 299) plot(srq)
# Example 1: Fastfood example. sf (points) data("FastFood.sf") x <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) listw <- spdep::knearneigh(x, k = 2) formula <- ~ Type srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = listw, nsim = 299) plot(srq) # Example 2: Spain example (poligons with 0 neinghbourhood) data("provinces_spain") sf::sf_use_s2(FALSE) listw <- spdep::poly2nb(as(provinces_spain,"Spatial"), queen = FALSE) provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf)) levels(provinces_spain$Older) = c("low","middle","high") formula <- ~ Older srq <- sp.runs.test(formula = formula, data = provinces_spain, listw = listw, nsim = 299) plot(srq) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") formula <- ~ Mal2Fml srq <- sp.runs.test(formula = formula, data = provinces_spain, listw = listw, nsim = 299) plot(srq)
Print method for objects of class summary.spjctest.
## S3 method for class 'summary.spjctest' print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'summary.spjctest' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
object of class summary.spjctest. |
digits |
number of digits to show in printed tables. Default: max(3L, getOption("digits") - 3L). |
... |
further arguments passed to or from other methods. |
No return value, called for side effects
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Print method for objects of class summary.spqtest.
## S3 method for class 'summary.spqtest' print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'summary.spqtest' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
object of class summary.spqtest. |
digits |
number of digits to show in printed tables. Default: max(3L, getOption("digits") - 3L). |
... |
further arguments passed to or from other methods. |
No return value, called for side effects
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
A simple features object with the provinces in Spain and selected demographic and economic information.
data(provinces_spain)
data(provinces_spain)
A simple features dataframe with 50 rows and 15 columns:
Names of provinces in Spain as factor
Names of Autonomous Communities in Spain as factor
National Institute of Statistics unique identifier of the provinces
Population in the province in 2020
Population density in the province in persons/km^2
Percentage of population 65 and older in the provice in 2020
Median age of population in the province in 2020
Ratio of male to female population in the province in 2020
GDP per capita in the province in 2016
Dummy variable for mass transit system in province; 1: YES
Area of the province
Altitude of the province
A dummy variable that indicates whether the province is in the coast; 1: YES
Identifier of meteorological station representative of the province used to retrieve climatic variables
geometry of the simple features object
Instituto Nacional de Estadistica http://www.ine.es/
Climatic data: Agencia Estatal de Meteorologia http://www.aemet.es/
Páez et al. (2020)
Paez, A., Lopez, F.A., Menezes, T., Cavalcanti, R., & Pitta, M. (2020). A Spatio‐Temporal Analysis of the Environmental Correlates of COVID‐19 Incidence in Spain. Geographical Analysis. 53(3) 397-421
data(provinces_spain) summary(provinces_spain)
data(provinces_spain) summary(provinces_spain)
This function compute the QE and QI tests for maps comparison based on symbolic entropy.
Q.map.test(formula = formula, data = data, coor = NULL, m = m, r = 1, type = "combinations", control = list())
Q.map.test(formula = formula, data = data, coor = NULL, m = m, r = 1, type = "combinations", control = list())
formula |
a symbolic description of the two factors. |
data |
(optional) a data frame or a sf object containing the variables to testing for. |
coor |
(optional) a 2xN vector with coordinates. |
m |
length of m-surrounding. |
r |
maximum overlapping between any two m-surroundings (default = 1). |
type |
Type of symbols: "permutations" or "combinations". Default "combinations" |
control |
Optional argument. See Control Argument section. |
If data
is not a sf object the coor
argument with the coordinates
of each observation must be included.
A list with two objects of the class htest
. The first one
is the QE test of Equivalence between maps and the second one is the QI test
of independence between maps. the elements of each test are:
method |
a character string giving description of the method. |
data.name |
a character string giving the name(s) of the data. |
statistic |
the value of the statistic QE or/and QI. |
alternative |
a character string describing the alternative hypothesis. |
p.value |
p-value for QE or QI. |
parameter |
free degree of the statistic for QE or QI. |
symb |
A matrix with the symbols. |
mh |
m-surrounding of th map. |
Tm |
number of maps (ONLY 2). |
sample.size |
number of symbolized observations. |
nsk |
a matrix Tm x symbols with the frequency of the number of symbols of each map. |
Several parameters to construct the m-surrounding
Delete degenerate surrounding based on the absolute distance between observations.
A value between 0 and 1. Delete degenerate surrounding based on the distance. Delete m-surrounding when the maximum distance between observation is upper than k percentage of maximum distance between anywhere observation.
A integer value 'k'. Delete degenerate surrounding based on the near neighborhood criteria. Delete m-surrounding is a element of the m-surrounding is not include in the set of k near neighborhood of the first element
seed to select the initial element to star the algorithm to compute the m-surroundings.
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz M, López FA and A Páez (2011). Comparison of Thematic Maps Using Symbolic Entropy. International Journal of Geographical Information Science, 26, 413-439.
Ruiz, M., López, FA, and Páez, A. (2010). Testing for spatial association of qualitative data using symbolic dynamics. Journal of Geographical Systems, 12(3), 281-309.0.
# Case 1: N <- 200 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) listw <- spdep::nb2listw(spdep::knn2nb( spdep::knearneigh(cbind(cx,cy), k = 4))) p <- c(1/6, 3/6, 2/6) rho = 0.5 QY1 <- dgp.spq(p = p, listw = listw, rho = rho) rho = 0.8 QY2 <- dgp.spq(p = p, listw = listw, rho = rho) dt = data.frame(QY1,QY2) m = 3 r = 1 formula <- ~ QY1 + QY2 control <- list(dtmaxknn = 10) qmap <- Q.map.test(formula = formula, data = dt, coor = x, m = m, r = r, type ="combinations", control = control) print(qmap) plot(qmap) plot(qmap, ci=.6) plot(qmap[[1]]$mh) summary(qmap[[1]]$mh) control <- list(dtmaxknn = 20) qmap <- Q.map.test(formula = formula, data = dt, coor = x, m = m, r = r, type ="permutations", control = control) print(qmap) plot(qmap) plot(qmap[[1]]$mh) qmap <- Q.map.test(formula = formula, data = dt, coor = x, m = m, r = r, type ="combinations") print(qmap) plot(qmap) control <- list(dtmaxknn = 10) qmap <- Q.map.test(formula = formula, data = dt, coor = x, m = m, r = r, type ="combinations", control = control) print(qmap) plot(qmap) # Case 2: data(provinces_spain) # sf::sf_use_s2(FALSE) m = 3 r = 1 provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") provinces_spain$Coast <- factor(provinces_spain$Coast) levels(provinces_spain$Coast) = c("no","yes") formula <- ~ Coast + Mal2Fml qmap <- Q.map.test(formula = formula, data = provinces_spain, m = m, r = r, type ="combinations") print(qmap) plot(qmap) plot(qmap[[1]]$mh) control <- list(dtmaxknn = 6) qmap <- Q.map.test(formula = formula, data = provinces_spain, m = m, r = r, type ="combinations", control = control) print(qmap) plot(qmap[[1]]$mh) summary(qmap[[1]]$mh)
# Case 1: N <- 200 cx <- runif(N) cy <- runif(N) x <- cbind(cx,cy) listw <- spdep::nb2listw(spdep::knn2nb( spdep::knearneigh(cbind(cx,cy), k = 4))) p <- c(1/6, 3/6, 2/6) rho = 0.5 QY1 <- dgp.spq(p = p, listw = listw, rho = rho) rho = 0.8 QY2 <- dgp.spq(p = p, listw = listw, rho = rho) dt = data.frame(QY1,QY2) m = 3 r = 1 formula <- ~ QY1 + QY2 control <- list(dtmaxknn = 10) qmap <- Q.map.test(formula = formula, data = dt, coor = x, m = m, r = r, type ="combinations", control = control) print(qmap) plot(qmap) plot(qmap, ci=.6) plot(qmap[[1]]$mh) summary(qmap[[1]]$mh) control <- list(dtmaxknn = 20) qmap <- Q.map.test(formula = formula, data = dt, coor = x, m = m, r = r, type ="permutations", control = control) print(qmap) plot(qmap) plot(qmap[[1]]$mh) qmap <- Q.map.test(formula = formula, data = dt, coor = x, m = m, r = r, type ="combinations") print(qmap) plot(qmap) control <- list(dtmaxknn = 10) qmap <- Q.map.test(formula = formula, data = dt, coor = x, m = m, r = r, type ="combinations", control = control) print(qmap) plot(qmap) # Case 2: data(provinces_spain) # sf::sf_use_s2(FALSE) m = 3 r = 1 provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") provinces_spain$Coast <- factor(provinces_spain$Coast) levels(provinces_spain$Coast) = c("no","yes") formula <- ~ Coast + Mal2Fml qmap <- Q.map.test(formula = formula, data = provinces_spain, m = m, r = r, type ="combinations") print(qmap) plot(qmap) plot(qmap[[1]]$mh) control <- list(dtmaxknn = 6) qmap <- Q.map.test(formula = formula, data = provinces_spain, m = m, r = r, type ="combinations", control = control) print(qmap) plot(qmap[[1]]$mh) summary(qmap[[1]]$mh)
A function to compute Q test for spatial qualitative data.
Q.test(formula = NULL, data = NULL, na.action, fx = NULL, coor = NULL, m = 3, r = 1, distr = "asymptotic", control = list())
Q.test(formula = NULL, data = NULL, na.action, fx = NULL, coor = NULL, m = 3, r = 1, distr = "asymptotic", control = list())
formula |
a symbolic description of the factor(s). |
data |
an (optional) data frame or a sf object with points/multipolygons geometry containing the variable(s) to be tested. |
na.action |
action with NA values |
fx |
a factor or a matrix of factors in columns |
coor |
(optional) a 2xN vector with spatial coordinates. Used when *data* is not a spatial object |
m |
length of m-surrounding (default = 3). |
r |
only for asimtotic distribution. Maximum overlapping between any two m-surroundings (default = 1). |
distr |
character. Distribution type "asymptotic" (default) or "mc". |
control |
Optional argument. See Control Argument section. |
The Q-test is a simple, consistent, and powerful statistic for qualitative spatial
independence that we develop using concepts from symbolic dynamics and symbolic
entropy. The Q test can be used to detect, given a spatial distribution of events,
patterns of spatial association of qualitative variables in a wide variety of
settings.
The Q(m) statistic was introduced by Ruiz et al. (2010) as a tool to explore geographical
co-location/co-occurrence of qualitative data. Consider a spatial variable X which is the
result of a qualitative process with a set number of categorical outcomes (j=1,...,k).
The spatial variable is observed at a set of fixed locations indexed by their coordinates
(i=1,..., N), so that at each location si where an event is observed,
takes one of the possible values
.
Since the observations are georeferenced, a spatial embedding protocol can be devised
to assess the spatial property of co-location. Let us define, for an observation at
a specified location, say , a surrounding of size m, called an m-surrounding.
The m-surrounding is the set of m-1 nearest neighbours from the perspective
of location . In the case of distance ties, a secondary criterion can be
invoked based on direction.
Once that an embedding protocol is adopted and the elements of the m-surrounding
for location have been determined, a string can be obtained that collects
the elements of the local neighborhood (the m-1 nearest neighbors) of the observation
at
. The m-surrounding can then be represented in the following way:
Since each observation Xs takes one of k possible values, and there are m observations in
the m-surrounding, there are exactly k possible unique ways in which those values can
co-locate. This is the number of permutations with replacement.
For instance, if k=2
(e.g. the possible outcomes are a1=0 and a2=1) and m=3, the following eight unique
patterns of co-location are possible (the number of symbols is =8): (0,0,0), (1,0,0),
(0,1,0), (0,0,1), (1,1,0), (1,0,1), (0,1,1), and (1,1,1). Each unique co-locationtype
can be denoted in a convenient way by means of a symbol
. It follows
that each site can be uniquely associated with a specific symbol, in a process termed
symbolization. In this way, we say that a location s is of type
if and only if
.
Equivalent symbols (see Páez, et al. 2012) can be obtained by counting the number of
occurrences of each category within an m-surrounding. This surrenders some
topological information (ordering within the m-surrounding is lost) in favor of a more
compact set of symbols, since the number of combinations with replacement.
Definition of Q(m) statistic
Let be a discrete spatial process and m be a fixed embedding
dimension. The statistic Q testing the null hypothesis:
is spatially independent, against any other alternative.
For a fixed , the relative frequency of symbols can be used to define the symbolic
entropy of the spatial process as the Shanon entropy of the distinct symbols:
where
with is simply the
number of times that the symbol
is observed and R the number of
symbolized locations.
The entropy function is bounded between
.
The Q statistic is essentially a likelihood ratio test between the symbolic entropy of the observed pattern and the entropy of the system under the null hypothesis of a random spatial sequence:
with . The statistic is asymptotically
distributed
with degrees of freedom equal to the number of symbols minus one.
An list of two object of the class htest
. Each element of the list return the:
data.name |
a character string giving the names of the data. |
statistic |
Value of the Q test |
N |
total number of observations. |
R |
total number of symbolized observations. |
m |
length m-surrounding. |
r |
degree of overlapping. |
df |
degree of freedom. |
distr |
type of distribution used to get the significance of the Q test. |
type |
type of symbols. |
character to select the type of distance. Default = "Euclidean" for Cartesian coordinates only: one of Euclidean, Hausdorff or Frechet; for geodetic coordinates, great circle distances are computed (see sf::st_distance())
Delete degenerate surrounding based on the absolute distance between observations.
A value between 0 and 1. Delete degenerate surrounding based on the distance. Delete m-surrounding when the maximum distance between observation is upper than k percentage of maximum distance between anywhere observation.
A integer value 'k'. Delete degenerate surrounding based on the near neighbourhood criteria. Delete m-surrounding is a element of the m-surrounding is not include in the set of k near neighbourhood of the first element
number of simulations for get the Monte Carlo distribution. Default = 999
seed to select the initial element to star the algorithm to get compute the m-surroundings or to start the simulation
The symbolization protocol proposed by Ruiz et al. (2010) - call these
Standard-Permutation Symbols — contains a large amount of topological information
regarding the units of analysis, including proximity and direction.
In this sense, the protocol is fairly general. On the other hand, it is easy to see
that the combinatorial possibilities can very quickly become unmanageable.
For a process with k = 3 outcomes and m = 5, the number of symbols becomes
; for k = 6 and m = 4 it is
. Depending on the number
of observations N, the explosion in the number of symbols can very rapidly consume
degrees of freedom for hypothesis testing, because as a rule of thumb
it is recommended that the number of symbolized locations be at least five times
the number of symbols used (e.g.,
), and R will usually be a fraction of N.
As an alternative, we propose a symbolization protocol that sacrifices
some amount of topological detail for conciseness. The alternative is based
on the standard scheme; however, instead of retaining proximity and
direction relationships, it maintains only the total number of occurrences
of each outcome in an m-surrounding. We call these Equivalent-Combination Symbols.
Because order in the sequence is not considered in this protocol, instead of a
permutation with repetition, the number of symbols reflects a combination with
repetition.
To select S locations for the analysis, coordinates are selected such that
for any two coordinates ,
the number of overlapping nearest
neighbours of
and
are at most r. The set S, which is a subset of all the
observations N, is defined recursively as follows. First choose a location
at random and fix an integer r
with
. The integer r is the degree of overlap, the maximum number of observations that contiguous
m-surroundings are allowed to have in common.
Let be the set of nearest neighbours
to
, where the
are ordered by distance to
, or angle in the case of ties.
Let us call
and define
. Take the set of
nearest neighbours to
, namely,
in the
set of locations
and define
. Nor for i>1 we define
where
is in the ser of nearest neighbors to
,
,of the set
.
Continue this process while there are locations to symbolize.
The bootstrapped-based testing can provide an advantage since overlapping between
m-surroundings is not a consideration, and the full sample can be used.
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz M, López FA, A Páez. (2010). Testing for spatial association of qualitative data using symbolic dynamics. Journal of Geographical Systems. 12 (3) 281-309
López, FA, and A Páez. (2012). Distribution-free inference for Q(m) based on permutational bootstrapping: an application to the spatial co-location pattern of firms in Madrid Estadística Española, 177, 135-156.
# Case 1: With coordinates N <- 200 cx <- runif(N) cy <- runif(N) coor <- cbind(cx,cy) p <- c(1/6,3/6,2/6) rho = 0.3 listw <- spdep::nb2listw(spdep::knn2nb(spdep::knearneigh(cbind(cx,cy), k = 4))) fx <- dgp.spq(list = listw, p = p, rho = rho) q.test <- Q.test(fx = fx, coor = coor, m = 3, r = 1) summary(q.test) plot(q.test) print(q.test) q.test.mc <- Q.test(fx = fx, coor = coor, m = 3, distr = "mc", control = list(nsim = 999)) summary(q.test.mc) plot(q.test.mc) print(q.test.mc) # Case 2: With a sf object data("FastFood.sf") f1 <- ~ Type q.test <- Q.test(formula = f1, data = FastFood.sf, m = c(3, 4), r = c(1, 2, 3), control = list(distance ="Euclidean")) summary(q.test) plot(q.test) print(q.test) # Case 3: With a sf object with isolated areas data("provinces_spain") sf::sf_use_s2(FALSE) provinces_spain$Mal2Fml<- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf)) levels(provinces_spain$Older) = c("low","middle","high") f1 <- ~ Older + Mal2Fml q.test <- Q.test(formula = f1, data = provinces_spain, m = 3, r = 1, control = list(seedinit = 1111)) summary(q.test) print(q.test) plot(q.test) q.test.mc <- Q.test(formula = f1, data = provinces_spain, m = 4, r = 3, distr = "mc", control = list(seedinit = 1111)) summary(q.test.mc) print(q.test.mc) plot(q.test.mc) # Case 4: Examples with multipolygons library(sf) fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) qb79 <- quantile(nc$BIR79) nc$QBIR79 <- (nc$BIR79 > qb79[2]) + (nc$BIR79 > qb79[3]) + (nc$BIR79 >= qb79[4]) + 1 nc$QBIR79 <- as.factor(nc$QBIR79) plot(nc["QBIR79"], pal = c("#FFFEDE","#FFDFA2", "#FFA93F", "#D5610D"), main = "BIR79 (Quartiles)") sid79 <- quantile(nc$SID79) nc$QSID79 <- (nc$SID79 > sid79[2]) + (nc$SID79 > sid79[3]) + (nc$SID79 >= sid79[4]) + 1 nc$QSID79 <- as.factor(nc$QSID79) plot(nc["QSID79"], pal = c("#FFFEDE","#FFDFA2", "#FFA93F", "#D5610D"), main = "SID79 (Quartiles)") f1 <- ~ QSID79 + QBIR79 lq1nc <- Q.test(formula = f1, data = nc, m = 5, r = 2, control = list(seedinit = 1111, dtmaxpc = 0.5, distance = "Euclidean") ) print(lq1nc) lq2nc <- Q.test(formula = f1, data = nc, m = 5, r = 2, control = list(dtmaxpc = 0.2) ) print(lq2nc) lq3nc <- Q.test(formula = f1, data = nc, m = 5, r = 2, control = list(dtmaxknn = 5) ) print(lq3nc) # Case 5: Examples with points and matrix of variables fx <- matrix(c(nc$QBIR79, nc$QSID79), ncol = 2, byrow = TRUE) mctr <- suppressWarnings(sf::st_centroid(st_geometry(nc))) mcoor <- st_coordinates(mctr)[,c("X","Y")] q.test <- Q.test(fx = fx, coor = mcoor, m = 5, r = 2, control = list(seedinit = 1111, dtmaxpc = 0.5)) print(q.test) plot(q.test)
# Case 1: With coordinates N <- 200 cx <- runif(N) cy <- runif(N) coor <- cbind(cx,cy) p <- c(1/6,3/6,2/6) rho = 0.3 listw <- spdep::nb2listw(spdep::knn2nb(spdep::knearneigh(cbind(cx,cy), k = 4))) fx <- dgp.spq(list = listw, p = p, rho = rho) q.test <- Q.test(fx = fx, coor = coor, m = 3, r = 1) summary(q.test) plot(q.test) print(q.test) q.test.mc <- Q.test(fx = fx, coor = coor, m = 3, distr = "mc", control = list(nsim = 999)) summary(q.test.mc) plot(q.test.mc) print(q.test.mc) # Case 2: With a sf object data("FastFood.sf") f1 <- ~ Type q.test <- Q.test(formula = f1, data = FastFood.sf, m = c(3, 4), r = c(1, 2, 3), control = list(distance ="Euclidean")) summary(q.test) plot(q.test) print(q.test) # Case 3: With a sf object with isolated areas data("provinces_spain") sf::sf_use_s2(FALSE) provinces_spain$Mal2Fml<- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf)) levels(provinces_spain$Older) = c("low","middle","high") f1 <- ~ Older + Mal2Fml q.test <- Q.test(formula = f1, data = provinces_spain, m = 3, r = 1, control = list(seedinit = 1111)) summary(q.test) print(q.test) plot(q.test) q.test.mc <- Q.test(formula = f1, data = provinces_spain, m = 4, r = 3, distr = "mc", control = list(seedinit = 1111)) summary(q.test.mc) print(q.test.mc) plot(q.test.mc) # Case 4: Examples with multipolygons library(sf) fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) qb79 <- quantile(nc$BIR79) nc$QBIR79 <- (nc$BIR79 > qb79[2]) + (nc$BIR79 > qb79[3]) + (nc$BIR79 >= qb79[4]) + 1 nc$QBIR79 <- as.factor(nc$QBIR79) plot(nc["QBIR79"], pal = c("#FFFEDE","#FFDFA2", "#FFA93F", "#D5610D"), main = "BIR79 (Quartiles)") sid79 <- quantile(nc$SID79) nc$QSID79 <- (nc$SID79 > sid79[2]) + (nc$SID79 > sid79[3]) + (nc$SID79 >= sid79[4]) + 1 nc$QSID79 <- as.factor(nc$QSID79) plot(nc["QSID79"], pal = c("#FFFEDE","#FFDFA2", "#FFA93F", "#D5610D"), main = "SID79 (Quartiles)") f1 <- ~ QSID79 + QBIR79 lq1nc <- Q.test(formula = f1, data = nc, m = 5, r = 2, control = list(seedinit = 1111, dtmaxpc = 0.5, distance = "Euclidean") ) print(lq1nc) lq2nc <- Q.test(formula = f1, data = nc, m = 5, r = 2, control = list(dtmaxpc = 0.2) ) print(lq2nc) lq3nc <- Q.test(formula = f1, data = nc, m = 5, r = 2, control = list(dtmaxknn = 5) ) print(lq3nc) # Case 5: Examples with points and matrix of variables fx <- matrix(c(nc$QBIR79, nc$QSID79), ncol = 2, byrow = TRUE) mctr <- suppressWarnings(sf::st_centroid(st_geometry(nc))) mcoor <- st_coordinates(mctr)[,c("X","Y")] q.test <- Q.test(fx = fx, coor = mcoor, m = 5, r = 2, control = list(seedinit = 1111, dtmaxpc = 0.5)) print(q.test) plot(q.test)
This function compute the spatial scan test for Bernoulli and Multinomial categorical spatial process, and detect spatial clusters
scan.test(formula = NULL, data = NULL, fx = NULL, coor = NULL, case = NULL, nv = NULL, nsim = NULL, distr = NULL, windows = "circular", listw = NULL, alternative = "High", minsize = 1, control = list())
scan.test(formula = NULL, data = NULL, fx = NULL, coor = NULL, case = NULL, nv = NULL, nsim = NULL, distr = NULL, windows = "circular", listw = NULL, alternative = "High", minsize = 1, control = list())
formula |
a symbolic description of the factor (optional). |
data |
an (optional) data frame or a sf object containing the variable to testing for. |
fx |
a factor (optional). |
coor |
(optional) coordinates of observations. |
case |
Only for Bernoulli distribution. A element of factor, there are cases and non-cases for testing for cases versus non-cases |
nv |
Maximum windows size, default nv = N/2. The algorithm scan for clusters of geographic size between 1 and the upper limit (nv) defined by the user. |
nsim |
Number of permutations. |
distr |
distribution of the spatial process: "bernoulli" for two levels or "multinomial" for three or more levels. |
windows |
a string to select the type of cluster "circular" (default) of "elliptic". |
listw |
only for flexible windows. A neighbours list (an object of the class listw, nb or knn frop spdep) or an adjacency matrix. |
alternative |
Only for Bernoulli spatial process. A character string specifying the type of cluster, must be one of "High" (default), "Both" or "Low". |
minsize |
Minimum number of observations inside of Most Likely Cluster and secondary clusters. |
control |
List of additional control arguments. |
Two alternative sets of arguments can be included in this function to compute the scan test:
Option 1
: A factor (fx) and coordinates (coor).
Option 2
: A sf object (data) and the formula to specify the factor.
The function consider the coordinates of the centroids of the elements of th sf object.
The spatial scan statistics are widely used in epidemiology, criminology or ecology.
Their purpose is to analyse the spatial distribution of points or geographical
regions by testing the hypothesis of spatial randomness distribution on the
basis of different distributions (e.g. Bernoulli, Poisson or Normal distributions).
The scan.test
function obtain the scan statistic for two relevant distributions related
with categorical variables: the Bernoulli and Multinomial distribution.
The spatial scan statistic is based on the likelihood ratio test statistic and
is formulated as follows:
where Z represents the collection of scanning windows constructed on the study region,
is an alternative hypothesis,
is a null hypothesis, and
is the likelihood function with parameter
given window Z
.
The null hypothesis says that there is no spatial clustering on the study region, and
the alternative hypothesis is that there is a certain area with high (or low) rates of
outcome variables. The null and alternative hypotheses and the likelihood function may
be expressed in different ways depending on the probability model under consideration.
To test independence in a spatial process, under the null, the type of windows is
irrelevant but under the alternative the elliptic
windows can to identify with more precision the cluster.
For big data sets (N >>) the windows = "elliptic" can be so slowly
Bernoulli version
When we have dichotomous outcome variables, such as cases and noncases of certain
diseases, the Bernoulli model is used. The null hypothesis is written as
and the alternative hypothesis is
where p and q are the outcome probabilities (e.g., the probability of being a case)
inside and outside scanning window Z, respectively. Given window Z, the test statistic is:
where cz and nz are the numbers of cases and observations (cases and noncases) within z,
respectively, and C and N are the total numbers of cases and observations in the whole
study region, respectively.
Multinomial version of the scan test
The multinomial version of the spatial scan statistic is useful to investigate clustering
when a discrete spatial variable can take one and only one of k possible outcomes
that lack intrinsic order information. If the region defined by the moving window
is denoted by Z, the null hypothesis for the statistic can be stated as follows:
where is the probability of being of event type j inside the window Z,
and
is the probability of being of event type j outside the window.
The alternative hypothesis is that for at least one type event the probability
of being of that type is different inside and outside of the window.
The statistic is built as a likelihood ratio, and takes the following form after transformation using the natural logarithm:
where S is the total number of events in the study area and is the total number
of events of type j. The superscript Z denotes the same but for the sub-region
defined by the moving window.
The theoretical distribution of the statistic under the null hypothesis is not known,
and therefore significance is evaluated numerically by simulating neutral landscapes
(obtained using a random spatial process) and contrasting the empirically calculated
statistic against the frequency of values obtained from the neutral landscapes.
The results of the likelihood ratio serve to identify the most likely cluster,
which is followed by secondary clusters by the expedient of sorting them according to
the magnitude of the test. As usual, significance is assigned by the analyst, and the
cutoff value for significance reflects the confidence of the analyst, or tolerance for error.
When implementing the statistic, the analyst must decide the shape of the
window and the maximum number of cases that any given window can cover.
Currently, analysis can be done using circular or elliptical windows.
Elliptical windows are more time consuming to evaluate but provide greater
flexibility to contrast the distribution of events inside and outside the window,
and are our selected shape in the analyses to follow. Furthermore, it is recommended
that the maximum number of cases entering any given window does not exceed 50% of
all available cases.
A object of the htest and scantest class
method |
The type of test applied (). |
fx |
Factor included as input to get the scan test. |
MLC |
Observations included into the Most Likelihood Cluster (MLC). |
statistic |
Value of the scan test (maximum Log-likelihood ratio). |
N |
Total number of observations. |
nn |
Windows used to get the cluster. |
nv |
Maximum number of observations into the cluster. |
data.name |
A character string giving the name of the factor. |
coor |
coordinates. |
alternative |
Only for bernoulli spatial process. A character string describing the alternative hypothesis select by the user. |
p.value |
p-value of the scan test. |
cases.expect |
Expected cases into the MLC. |
cases.observ |
Observed cases into the MLC. |
nsim |
Number of permutations. |
scan.mc |
a (nsim x 1) vector with the loglik values under bootstrap permutation. |
secondary.clusters |
a list with the observations included into the secondary clusters. |
loglik.second |
a vector with the value of the secondary scan tests (maximum Log-likelihood ratio). |
p.value.secondary |
a vector with the p-value of the secondary scan tests. |
Alternative.MLC |
A vector with the observations included in another cluster with the same loglik than MLC. |
seedinit |
Numerical value for the seed (only for boot version). Default value seedinit=123 |
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Kulldorff M, Nagarwalla N. (1995). Spatial disease clusters: Detection and Inference. Statistics in Medicine. 14:799-810
Jung I, Kulldorff M, Richard OJ (2010). A spatial scan statistic for multinomial data. Statistics in Medicine. 29(18), 1910-1918
Páez, A., López-Hernández, F.A., Ortega-García, J.A., Ruiz, M. (2016). Clustering and co-occurrence of cancer types: A comparison of techniques with an application to pediatric cancer in Murcia, Spain. Spatial Analysis in Health Geography, 69-90.
Tango T., Takahashi K. (2005). A flexibly shaped spatial scan statistic for detecting clusters, International Journal of Health Geographics 4:11.
local.sp.runs.test
, dgp.spq
, Q.test
,
# Case 1: Scan test bernoulli data(provinces_spain) sf::sf_use_s2(FALSE) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") formula <- ~ Mal2Fml scan <- scan.test(formula = formula, data = provinces_spain, case="men", nsim = 99, distr = "bernoulli") print(scan) summary(scan) plot(scan, sf = provinces_spain) ## With maximum number of neighborhood scan <- scan.test(formula = formula, data = provinces_spain, case = "woman", nsim = 99, distr = "bernoulli") print(scan) plot(scan, sf = provinces_spain) ## With elliptic windows scan <- scan.test(formula = formula, data = provinces_spain, case = "men", nv = 25, nsim = 99, distr = "bernoulli", windows ="elliptic") print(scan) scan <- scan.test(formula = formula, data = provinces_spain, case = "men", nv = 15, nsim = 99, distr = "bernoulli", windows ="elliptic", alternative = "Low") print(scan) plot(scan, sf = provinces_spain) # Case 2: scan test multinomial data(provinces_spain) provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf)) levels(provinces_spain$Older) = c("low","middle","high") formula <- ~ Older scan <- scan.test(formula = formula, data = provinces_spain, nsim = 99, distr = "multinomial") print(scan) plot(scan, sf = provinces_spain) # Case 3: scan test multinomial data(FastFood.sf) sf::sf_use_s2(FALSE) formula <- ~ Type scan <- scan.test(formula = formula, data = FastFood.sf, nsim = 99, distr = "multinomial", windows="elliptic", nv = 50) print(scan) summary(scan) plot(scan, sf = FastFood.sf) # Case 4: DGP two categories N <- 150 cx <- runif(N) cy <- runif(N) listw <- spdep::knearneigh(cbind(cx,cy), k = 10) p <- c(1/2,1/2) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) scan <- scan.test(fx = fx, nsim = 99, case = "A", nv = 50, coor = cbind(cx,cy), distr = "bernoulli",windows="circular") print(scan) plot(scan) # Case 5: DGP three categories N <- 200 cx <- runif(N) cy <- runif(N) listw <- spdep::knearneigh(cbind(cx,cy), k = 10) p <- c(1/3,1/3,1/3) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) scan <- scan.test(fx = fx, nsim = 19, coor = cbind(cx,cy), nv = 30, distr = "multinomial", windows = "elliptic") print(scan) plot(scan) # Case 6: Flexible windows data(provinces_spain) sf::sf_use_s2(FALSE) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") formula <- ~ Mal2Fml listw <- spdep::poly2nb(provinces_spain, queen = FALSE) scan <- scan.test(formula = formula, data = provinces_spain, case="men", listw = listw, nv = 6, nsim = 99, distr = "bernoulli", windows = "flexible") print(scan) summary(scan) plot(scan, sf = provinces_spain)
# Case 1: Scan test bernoulli data(provinces_spain) sf::sf_use_s2(FALSE) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") formula <- ~ Mal2Fml scan <- scan.test(formula = formula, data = provinces_spain, case="men", nsim = 99, distr = "bernoulli") print(scan) summary(scan) plot(scan, sf = provinces_spain) ## With maximum number of neighborhood scan <- scan.test(formula = formula, data = provinces_spain, case = "woman", nsim = 99, distr = "bernoulli") print(scan) plot(scan, sf = provinces_spain) ## With elliptic windows scan <- scan.test(formula = formula, data = provinces_spain, case = "men", nv = 25, nsim = 99, distr = "bernoulli", windows ="elliptic") print(scan) scan <- scan.test(formula = formula, data = provinces_spain, case = "men", nv = 15, nsim = 99, distr = "bernoulli", windows ="elliptic", alternative = "Low") print(scan) plot(scan, sf = provinces_spain) # Case 2: scan test multinomial data(provinces_spain) provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf)) levels(provinces_spain$Older) = c("low","middle","high") formula <- ~ Older scan <- scan.test(formula = formula, data = provinces_spain, nsim = 99, distr = "multinomial") print(scan) plot(scan, sf = provinces_spain) # Case 3: scan test multinomial data(FastFood.sf) sf::sf_use_s2(FALSE) formula <- ~ Type scan <- scan.test(formula = formula, data = FastFood.sf, nsim = 99, distr = "multinomial", windows="elliptic", nv = 50) print(scan) summary(scan) plot(scan, sf = FastFood.sf) # Case 4: DGP two categories N <- 150 cx <- runif(N) cy <- runif(N) listw <- spdep::knearneigh(cbind(cx,cy), k = 10) p <- c(1/2,1/2) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) scan <- scan.test(fx = fx, nsim = 99, case = "A", nv = 50, coor = cbind(cx,cy), distr = "bernoulli",windows="circular") print(scan) plot(scan) # Case 5: DGP three categories N <- 200 cx <- runif(N) cy <- runif(N) listw <- spdep::knearneigh(cbind(cx,cy), k = 10) p <- c(1/3,1/3,1/3) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) scan <- scan.test(fx = fx, nsim = 19, coor = cbind(cx,cy), nv = 30, distr = "multinomial", windows = "elliptic") print(scan) plot(scan) # Case 6: Flexible windows data(provinces_spain) sf::sf_use_s2(FALSE) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") formula <- ~ Mal2Fml listw <- spdep::poly2nb(provinces_spain, queen = FALSE) scan <- scan.test(formula = formula, data = provinces_spain, case="men", listw = listw, nv = 6, nsim = 99, distr = "bernoulli", windows = "flexible") print(scan) summary(scan) plot(scan, sf = provinces_spain)
This function compute the nonparametric test for spatial independence using symbolic analysis for categorical/qualitative spatial process.
similarity.test(formula = NULL, data = NULL, fx = NULL, listw = listw, alternative = "two.sided", distr = "asymptotic", nsim = NULL, control = list())
similarity.test(formula = NULL, data = NULL, fx = NULL, listw = listw, alternative = "two.sided", distr = "asymptotic", nsim = NULL, control = list())
formula |
a symbolic description of the factor (optional). |
data |
an (optional) data frame or a sf object containing the variable to testing for. |
fx |
a factor (optional). |
listw |
a listw object |
alternative |
a character string specifying the type of cluster, must be one of "High" (default), "Both" or "Low". |
distr |
A string. Distribution of the test "asymptotic" (default) or "bootstrap" |
nsim |
Number of permutations. |
control |
List of additional control arguments. |
This testing approach for spatial independence that extends some of the properties
of the join count statistic. The premise of the tests is similar to the join count
statistic in that they use the concept of similarity between neighbouring spatial
entities (Dacey 1968; Cliff and Ord 1973, 1981). However, it differs by taking
into consideration the number of joins belonging locally to each spatial unit,
rather than the total number of joins in the entire spatial system. The approach
proposed here is applicable to spatial and network contiguity structures, and
the number of neighbors belonging to observations need not be constant.
We define an equivalence relation in the set of locations S. An equivalence
relation satisfies the following properties:
Reflexive: for all
,
Symmetric: If , then
for all
and
Transitive: If and
, then
for all
We call the relation a similarity relation. Then, the null hypothesis that
we are interested in is
Assume that, under the null hypothesis, for all
.
Define
Then, for a fixed degree d and for all location si with degree d, the variable d
gives the number of nearest neighbours to si that are similar to si.
Therefore, under the null hypothesis, ,
follows a binomial
distribution
.
A object of the htest
data.name |
a character string giving the names of the data. |
statistic |
Value of the similarity test |
N |
total number of observations. |
Zmlc |
Elements in the Most Likelihood Cluster. |
alternative |
a character string describing the alternative hypothesis. |
p.value |
p-value of the similarity test |
similiarity.mc |
values of the similarity test in each permutation. |
seedinit |
Numerical value for the seed (only for boot version). Default value seedinit=123 |
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Farber, S., Marin, M. R., & Paez, A. (2015). Testing for spatial independence using similarity relations. Geographical Analysis. 47(2), 97-120.
sp.runs.test
, dgp.spq
, Q.test
, , scan.test
# Case 1: N <- 100 cx <- runif(N) cy <- runif(N) listw <- spdep::knearneigh(cbind(cx,cy), k = 3) p <- c(1/4,1/4,1/4,1/4) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) W <- (spdep::nb2mat(spdep::knn2nb(listw)) >0)*1 similarity <- similarity.test(fx = fx, data = FastFood.sf, listw = listw) print(similarity) # Case 2: test with formula, a sf object (points) and knn data("FastFood.sf") coor <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) listw <- spdep::knearneigh(coor, k = 4) formula <- ~ Type similarity <- similarity.test(formula = formula, data = FastFood.sf, listw = listw) print(similarity) # Case 3: data(provinces_spain) listw <- spdep::poly2nb(as(provinces_spain,"Spatial"), queen = FALSE) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") formula <- ~ Mal2Fml similarity <- similarity.test(formula = formula, data = provinces_spain, listw = listw) print(similarity)
# Case 1: N <- 100 cx <- runif(N) cy <- runif(N) listw <- spdep::knearneigh(cbind(cx,cy), k = 3) p <- c(1/4,1/4,1/4,1/4) rho <- 0.5 fx <- dgp.spq(p = p, listw = listw, rho = rho) W <- (spdep::nb2mat(spdep::knn2nb(listw)) >0)*1 similarity <- similarity.test(fx = fx, data = FastFood.sf, listw = listw) print(similarity) # Case 2: test with formula, a sf object (points) and knn data("FastFood.sf") coor <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) listw <- spdep::knearneigh(coor, k = 4) formula <- ~ Type similarity <- similarity.test(formula = formula, data = FastFood.sf, listw = listw) print(similarity) # Case 3: data(provinces_spain) listw <- spdep::poly2nb(as(provinces_spain,"Spatial"), queen = FALSE) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") formula <- ~ Mal2Fml similarity <- similarity.test(formula = formula, data = provinces_spain, listw = listw) print(similarity)
This function compute the global spatial runs test for spatial independence of a categorical spatial data set.
sp.runs.test(formula = NULL, data = NULL, fx = NULL, listw = listw, alternative = "two.sided" , distr = "asymptotic", nsim = NULL,control = list())
sp.runs.test(formula = NULL, data = NULL, fx = NULL, listw = listw, alternative = "two.sided" , distr = "asymptotic", nsim = NULL,control = list())
formula |
a symbolic description of the factor (optional). |
data |
an (optional) data frame or a sf object containing the variable to testing for. |
fx |
a factor (optional). |
listw |
A neighbourhood list (type knn or nb) or a W matrix that indicates the order of the elements in each |
alternative |
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". |
distr |
A string. Distribution of the test "asymptotic" (default) or "bootstrap". |
nsim |
Number of permutations to obtain pseudo-value and confidence intervals (CI). Default value is NULL to don't get CI of number of runs. |
control |
List of additional control arguments. |
The order of the neighbourhoods () is critical to obtain the test.
To obtain the number of runs observed in each , each element must be associated
with a set of neighbours ordered by proximity.
Three kinds of lists can be included to identify
:
knn
: Objects of the class knn that consider the neighbours in order of proximity.
nb
: If the neighbours are obtained from an sf object, the code internally
will call the function nb2nb_order
it will order them in order
of proximity of the centroids.
matrix
: If a object of matrix class based in the inverse of the distance in introduced
as argument, the function nb2nb_order
will also be called internally
to transform the object the class matrix to a matrix of the class nb with ordered neighbours.
Two alternative sets of arguments can be included in this function to compute the spatial runs test:
Option 1 |
A factor (fx) and a list of neighborhood (listw ) of the class knn. |
Option 2 |
A sf object (data) and formula to specify the factor. A list of neighbourhood (listw) |
A object of the htest and sprunstest class
data.name |
a character string giving the names of the data. |
method |
the type of test applied (). |
SR |
total number of runs |
dnr |
empirical distribution of the number of runs |
statistic |
Value of the homogeneity runs statistic. Negative sign indicates global homogeneity |
alternative |
a character string describing the alternative hypothesis. |
p.value |
p-value of the SRQ |
pseudo.value |
the pseudo p-value of the SRQ test if nsim is not NULL |
MeanNeig |
Mean of the Maximum number of neighborhood |
MaxNeig |
Maximum number of neighborhood |
listw |
The list of neighborhood |
nsim |
number of boots (only for boots version) |
SRGP |
nsim simulated values of statistic. |
SRLP |
matrix with the number of runs for eacl localization. |
In this section define the concepts of spatial encoding and runs, and construct the main statistics necessary
for testing spatial homogeneity of categorical variables. In order to develop a general theoretical setting,
let us consider to be the categorical spatial process of interest with Q different
categories, where S is a set of coordinates.
Spatial encoding:
For a location denote by
the set of neighbours according
to the interaction scheme W, which are ordered from lesser to higher Euclidean distance with respect to location s.
The sequence as its elements have the same value (or are identified by the same class)
is called a spatial run at location s of length r.
The total number of runs is defined as:
where for
Following result by the Central Limit Theorem, the asymtotical distribution of is:
In the one-tailed case, we must distinguish the lower-tailed test and the upper-tailed, which are associated
with homogeneity and heterogeneity respectively. In the case of the lower-tailed test,
the following hypotheses are used:
is i.i.d.
: The spatial distribution of the values of the categorical variable is more homogeneous than under the null hypothesis (according to the fixed association scheme).
In the upper-tailed test, the following hypotheses are used:
is i.i.d.
: The spatial distribution of the values of the categorical variable is more
heterogeneous than under the null hypothesis (according to the fixed association scheme).
These hypotheses provide a decision rule regarding the degree of homogeneity in the spatial distribution
of the values of the spatial categorical random variable.
seedinit |
Numerical value for the seed (only for boot version). Default value seedinit=123 |
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
Ruiz, M., López, F., and Páez, A. (2021). A test for global and local homogeneity of categorical data based on spatial runs. Working paper.
local.sp.runs.test
, dgp.spq
, Q.test
,
# Case 1: SRQ test based on factor and knn n <- 100 cx <- runif(n) cy <- runif(n) x <- cbind(cx,cy) listw <- spdep::knearneigh(cbind(cx,cy), k=3) p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(listw = listw, p = p, rho = rho) srq <- sp.runs.test(fx = fx, listw = listw) print(srq) plot(srq) # Boots Version control <- list(seedinit = 1255) srq <- sp.runs.test(fx = fx, listw = listw, distr = "bootstrap" , nsim = 299, control = control) print(srq) plot(srq) # Case 2: SRQ test with formula, a sf object (points) and knn data("FastFood.sf") x <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) listw <- spdep::knearneigh(x, k=4) formula <- ~ Type srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = listw) print(srq) plot(srq) # Version boots srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = listw, distr = "bootstrap", nsim = 199) print(srq) plot(srq) # Case 3: SRQ test (permutation) using formula with a sf object (polygons) and nb library(sf) fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) listw <- spdep::poly2nb(as(nc,"Spatial"), queen = FALSE) p <- c(1/6,3/6,2/6) rho = 0.5 co <- sf::st_coordinates(sf::st_centroid(nc)) nc$fx <- dgp.spq(listw = listw, p = p, rho = rho) plot(nc["fx"]) formula <- ~ fx srq <- sp.runs.test(formula = formula, data = nc, listw = listw, distr = "bootstrap", nsim = 399) print(srq) plot(srq) # Case 4: SRQ test (Asymptotic) using formula with a sf object (polygons) and nb data(provinces_spain) # sf::sf_use_s2(FALSE) listw <- spdep::poly2nb(provinces_spain, queen = FALSE) provinces_spain$Coast <- factor(provinces_spain$Coast) levels(provinces_spain$Coast) = c("no","yes") plot(provinces_spain["Coast"]) formula <- ~ Coast srq <- sp.runs.test(formula = formula, data = provinces_spain, listw = listw) print(srq) plot(srq) # Boots version srq <- sp.runs.test(formula = formula, data = provinces_spain, listw = listw, distr = "bootstrap", nsim = 299) print(srq) plot(srq) # Case 5: SRQ test based on a distance matrix (inverse distance) N <- 100 cx <- runif(N) cy <- runif(N) data <- as.data.frame(cbind(cx,cy)) data <- sf::st_as_sf(data,coords = c("cx","cy")) n = dim(data)[1] dis <- 1/matrix(as.numeric(sf::st_distance(data,data)),ncol=n,nrow=n) diag(dis) <- 0 dis <- (dis < quantile(dis,.10))*dis p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(listw = dis , p = p, rho = rho) srq <- sp.runs.test(fx = fx, listw = dis) print(srq) plot(srq) srq <- sp.runs.test(fx = fx, listw = dis, data = data) print(srq) plot(srq) # Boots version srq <- sp.runs.test(fx = fx, listw = dis, data = data, distr = "bootstrap", nsim = 299) print(srq) plot(srq) # Case 6: SRQ test based on a distance matrix (inverse distance) data("FastFood.sf") # sf::sf_use_s2(FALSE) n = dim(FastFood.sf)[1] dis <- 1000000/matrix(as.numeric(sf::st_distance(FastFood.sf,FastFood.sf)), ncol = n, nrow = n) diag(dis) <- 0 dis <- (dis < quantile(dis,.005))*dis p <- c(1/6,3/6,2/6) rho = 0.5 co <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) FastFood.sf$fx <- dgp.spq(p = p, listw = dis, rho = rho) plot(FastFood.sf["fx"]) formula <- ~ fx # Boots version srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = dis, distr = "bootstrap", nsim = 299) print(srq) plot(srq)
# Case 1: SRQ test based on factor and knn n <- 100 cx <- runif(n) cy <- runif(n) x <- cbind(cx,cy) listw <- spdep::knearneigh(cbind(cx,cy), k=3) p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(listw = listw, p = p, rho = rho) srq <- sp.runs.test(fx = fx, listw = listw) print(srq) plot(srq) # Boots Version control <- list(seedinit = 1255) srq <- sp.runs.test(fx = fx, listw = listw, distr = "bootstrap" , nsim = 299, control = control) print(srq) plot(srq) # Case 2: SRQ test with formula, a sf object (points) and knn data("FastFood.sf") x <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) listw <- spdep::knearneigh(x, k=4) formula <- ~ Type srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = listw) print(srq) plot(srq) # Version boots srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = listw, distr = "bootstrap", nsim = 199) print(srq) plot(srq) # Case 3: SRQ test (permutation) using formula with a sf object (polygons) and nb library(sf) fname <- system.file("shape/nc.shp", package="sf") nc <- sf::st_read(fname) listw <- spdep::poly2nb(as(nc,"Spatial"), queen = FALSE) p <- c(1/6,3/6,2/6) rho = 0.5 co <- sf::st_coordinates(sf::st_centroid(nc)) nc$fx <- dgp.spq(listw = listw, p = p, rho = rho) plot(nc["fx"]) formula <- ~ fx srq <- sp.runs.test(formula = formula, data = nc, listw = listw, distr = "bootstrap", nsim = 399) print(srq) plot(srq) # Case 4: SRQ test (Asymptotic) using formula with a sf object (polygons) and nb data(provinces_spain) # sf::sf_use_s2(FALSE) listw <- spdep::poly2nb(provinces_spain, queen = FALSE) provinces_spain$Coast <- factor(provinces_spain$Coast) levels(provinces_spain$Coast) = c("no","yes") plot(provinces_spain["Coast"]) formula <- ~ Coast srq <- sp.runs.test(formula = formula, data = provinces_spain, listw = listw) print(srq) plot(srq) # Boots version srq <- sp.runs.test(formula = formula, data = provinces_spain, listw = listw, distr = "bootstrap", nsim = 299) print(srq) plot(srq) # Case 5: SRQ test based on a distance matrix (inverse distance) N <- 100 cx <- runif(N) cy <- runif(N) data <- as.data.frame(cbind(cx,cy)) data <- sf::st_as_sf(data,coords = c("cx","cy")) n = dim(data)[1] dis <- 1/matrix(as.numeric(sf::st_distance(data,data)),ncol=n,nrow=n) diag(dis) <- 0 dis <- (dis < quantile(dis,.10))*dis p <- c(1/6,3/6,2/6) rho <- 0.5 fx <- dgp.spq(listw = dis , p = p, rho = rho) srq <- sp.runs.test(fx = fx, listw = dis) print(srq) plot(srq) srq <- sp.runs.test(fx = fx, listw = dis, data = data) print(srq) plot(srq) # Boots version srq <- sp.runs.test(fx = fx, listw = dis, data = data, distr = "bootstrap", nsim = 299) print(srq) plot(srq) # Case 6: SRQ test based on a distance matrix (inverse distance) data("FastFood.sf") # sf::sf_use_s2(FALSE) n = dim(FastFood.sf)[1] dis <- 1000000/matrix(as.numeric(sf::st_distance(FastFood.sf,FastFood.sf)), ncol = n, nrow = n) diag(dis) <- 0 dis <- (dis < quantile(dis,.005))*dis p <- c(1/6,3/6,2/6) rho = 0.5 co <- sf::st_coordinates(sf::st_centroid(FastFood.sf)) FastFood.sf$fx <- dgp.spq(p = p, listw = dis, rho = rho) plot(FastFood.sf["fx"]) formula <- ~ fx # Boots version srq <- sp.runs.test(formula = formula, data = FastFood.sf, listw = dis, distr = "bootstrap", nsim = 299) print(srq) plot(srq)
This function summarizes estimated spjctest objects. The tables in the output include basic information for each test. blablabla...
## S3 method for class 'spjctest' summary(object, ...)
## S3 method for class 'spjctest' summary(object, ...)
object |
An spjctest object including a list of htest. |
... |
further arguments passed to or from other methods. |
An object of class summary.spjctest
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
print.summary.spqtest
,
joincount.test
,
joincount.multi
## Multinomial + Binomial using a sf multipolygon data("provinces_spain") sf::sf_use_s2(FALSE) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf)) levels(provinces_spain$Older) = c("low","middle","high") f1 <- ~ Older + Mal2Fml jc1 <- jc.test(formula = f1, data = provinces_spain, distr = "mc", alternative = "greater", zero.policy = TRUE) summary(jc1)
## Multinomial + Binomial using a sf multipolygon data("provinces_spain") sf::sf_use_s2(FALSE) provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100) levels(provinces_spain$Mal2Fml) = c("men","woman") provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf)) levels(provinces_spain$Older) = c("low","middle","high") f1 <- ~ Older + Mal2Fml jc1 <- jc.test(formula = f1, data = provinces_spain, distr = "mc", alternative = "greater", zero.policy = TRUE) summary(jc1)
This function summarizes estimated spqtest objects. The tables in the output include basic information for each test. blablabla...
## S3 method for class 'spqtest' summary(object, ...)
## S3 method for class 'spqtest' summary(object, ...)
object |
An spqtest object including a list of htest. |
... |
further arguments passed to or from other methods. |
An object of class summary.spqtest
Fernando López | [email protected] |
Román Mínguez | [email protected] |
Antonio Páez | [email protected] |
Manuel Ruiz | [email protected] |
# Example 1: With coordinates N <- 100 cx <- runif(N) cy <- runif(N) coor <- cbind(cx,cy) p <- c(1/6,3/6,2/6) rho = 0.5 listw <- spdep::nb2listw(spdep::knn2nb( spdep::knearneigh(cbind(cx, cy), k = 4))) fx <- dgp.spq(list = listw, p = p, rho = rho) q.test <- Q.test(fx = fx, coor = coor, m = 3, r = 1) summary(q.test) plot(q.test)
# Example 1: With coordinates N <- 100 cx <- runif(N) cy <- runif(N) coor <- cbind(cx,cy) p <- c(1/6,3/6,2/6) rho = 0.5 listw <- spdep::nb2listw(spdep::knn2nb( spdep::knearneigh(cbind(cx, cy), k = 4))) fx <- dgp.spq(list = listw, p = p, rho = rho) q.test <- Q.test(fx = fx, coor = coor, m = 3, r = 1) summary(q.test) plot(q.test)