Temporal Network Kernel Density Estimate
Description
Calculate the Temporal Network Kernel Density Estimate (TNKDE) based on a network of lines,
sampling points in space and time, and events in space and time.
Usage
tnkde(
lines,
events,
time_field,
w,
samples_loc,
samples_time,
kernel_name,
bw_net,
bw_time,
adaptive = FALSE,
adaptive_separate = TRUE,
trim_bw_net = NULL,
trim_bw_time = NULL,
method,
div = "bw",
diggle_correction = FALSE,
study_area = NULL,
max_depth = 15,
digits = 5,
tol = 0.1,
agg = NULL,
sparse = TRUE,
grid_shape = c(1, 1),
verbose = TRUE,
check = TRUE
)
Arguments
lines |
A feature collection of linestrings representing the underlying network. The
geometries must be simple LineStrings, not MultiLineStrings (the function may
crash if some geometries are invalid).
|
events |
A feature collection of points representing the events on the
network. The points will be snapped on the network to their closest line.
|
time_field |
The name of the field in events indicating when the events
occurred. It must be a numeric field.
|
w |
A vector representing the weight of each event
|
samples_loc |
A feature collection of points representing the locations for
which the densities will be estimated.
|
samples_time |
A numeric vector indicating when the densities will be sampled
|
kernel_name |
The name of the kernel to use. Must be one of triangle,
gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.
|
bw_net |
The network kernel bandwidth (using the scale of the lines),
can be a single float or a numeric vector if a different bandwidth must be
used for each event.
|
bw_time |
The time kernel bandwidth, can be a single float or a numeric
vector if a different bandwidth must be used for each event.
|
adaptive |
A Boolean indicating if an adaptive bandwidth must be used.
If TRUE, both the network and the time bandwidths are adapted (see adaptive_separate).
|
adaptive_separate |
A Boolean indicating if the adaptive bandwidths
for the time and the network dimensions must be calculated separately (TRUE) or in
interaction (FALSE).
|
trim_bw_net |
A float, indicating the maximum value for the adaptive
network bandwidth
|
trim_bw_time |
A float, indicating the maximum value for the adaptive
time bandwidth
|
method |
The method to use when calculating the NKDE, must be one of
simple / discontinuous / continuous (see nkde details for more information)
|
div |
The divisor to use for the kernel. Must be "n" (the number of
events within the radius around each sampling point), "bw" (the bandwidth),
or "none" (the simple sum).
|
diggle_correction |
A Boolean indicating if the correction factor
for edge effect must be used.
|
study_area |
A feature collection of polygons
representing the limits of the study area.
|
max_depth |
When using the continuous and discontinuous methods, the
calculation time and memory use can grow very quickly if the network has many
small edges (areas with many intersections and many events). To
avoid this, a maximum depth can be set here. Considering that the
kernel is divided at intersections, a value of 10 should yield good
estimates in most cases. A larger value can be used without a problem for the
discontinuous method. For the continuous method, a larger value will
strongly impact calculation speed.
|
digits |
The number of digits to retain from the spatial coordinates. It
ensures that the topology is good when building the network. Default is 5. Too high a
precision (a high number of digits) might break some connections.
|
tol |
A float indicating the minimum distance between the events and the
lines' extremities when adding the point to the network. When points are
closer, they are added at the extremity of the lines.
|
agg |
A double giving the distance within which the events must be aggregated.
If NULL, the events are aggregated only by rounding the
coordinates.
|
sparse |
A Boolean indicating if sparse or regular matrices should be
used by the Rcpp functions. These matrices are used to store edge indices
between two nodes in a graph. Regular matrices are faster, but require more
memory, in particular with multiprocessing. Sparse matrices are slower (a
bit), but require much less memory.
|
grid_shape |
A vector of two values indicating how the study area
must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could
reduce memory usage and increase speed when a large dataset is used. When using
multiprocessing, the work in each grid cell is dispatched between the workers.
|
verbose |
A Boolean, indicating if the function should print messages
about the process.
|
check |
A Boolean indicating if the geometry checks must be run before
the operation. This might take some time, but it will ensure that the CRS
of the provided objects are valid and identical, and that geometries are valid.
|
Details
Temporal Network Kernel Density Estimate
The TNKDE is an extension of the NKDE that considers the location of events both on the network and
in time. Thus, density estimation (density sampling) can be done along the lines of the network and
at different times. It can be used with the three NKDE methods (simple, discontinuous and continuous).
Density in time and space
Two bandwidths must be provided, one for the network distance and one for the
time distance. They are both used to calculate the contribution of each event
to each sampling point. Let us consider one event E and a sample S. dnet(E,S)
is the contribution to network density of E at S location and dtime(E,S) is
the contribution to time density of E at S time. The total contribution is
thus dnet(E,S) * dtime(E,S). If one of the two densities is 0, then the total
density is 0 because the sampling point is outside the area covered by the
event in time or in the network space.
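The product rule above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: quartic_kernel and event_contribution are hypothetical helpers, not functions of the package, each kernel is scaled by its own bandwidth, and the splitting of the kernel at intersections (discontinuous and continuous methods) is ignored.

```r
# Quartic (biweight) kernel scaled by the bandwidth; zero beyond the bandwidth.
quartic_kernel <- function(d, bw) {
  u <- d / bw
  ifelse(abs(u) < 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

# Hypothetical helper: contribution of one event E to one sample S,
# computed as dnet(E,S) * dtime(E,S).
event_contribution <- function(d_net, d_time, bw_net, bw_time) {
  quartic_kernel(d_net, bw_net) * quartic_kernel(d_time, bw_time)
}

# Event 200 units away on the network and 10 days away in time
inside <- event_contribution(200, 10, bw_net = 700, bw_time = 60)

# Same network distance, but 90 days away: outside the time bandwidth
outside <- event_contribution(200, 90, bw_net = 700, bw_time = 60)
```

Here inside is strictly positive, while outside is exactly 0 because the time kernel vanishes beyond bw_time.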
Adaptive bandwidth
It is possible to use an adaptive bandwidth both on the network and in time.
Adaptive bandwidths are calculated using Abramson's smoothing regimen
(Abramson 1982). To do so, the original fixed
bandwidths must be specified (bw_net and bw_time parameters).
The maximum size of the two local bandwidths can be limited with
the parameters trim_bw_net and trim_bw_time.
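As a rough illustration of Abramson's regimen, the sketch below derives local bandwidths from a fixed bandwidth and a pilot density estimated at each event, then applies a trim value. The helper abramson_bw and the pilot densities are hypothetical and only show the general form of the rule (local bandwidth proportional to the inverse square root of the pilot density); the package's internal code may differ in its details.

```r
# Hypothetical helper: local bandwidths from a fixed bandwidth (bw0) and
# pilot densities evaluated at each event.
abramson_bw <- function(bw0, pilot_density, trim = NULL) {
  inv_sqrt <- 1 / sqrt(pilot_density)
  # Geometric mean of the inverse square roots, so that the local
  # bandwidths are centred on bw0
  gamma <- exp(mean(log(inv_sqrt)))
  bw <- bw0 * inv_sqrt / gamma
  # Optional upper limit, playing the role of trim_bw_net or trim_bw_time
  if (!is.null(trim)) {
    bw <- pmin(bw, trim)
  }
  bw
}

# Made-up pilot densities for four events
pilot <- c(0.002, 0.0005, 0.004, 0.001)
local_bw <- abramson_bw(700, pilot, trim = 900)
```

Events located in dense areas (high pilot density) receive a smaller bandwidth, events in sparse areas a larger one, and no local bandwidth exceeds the trim value.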
Diggle correction factor
A set of events can be limited in both space (limits of the study
area) and time (beginning and end of the data collection period). These
limits induce lower densities at the border of the set of events, because
they are not sampled outside the limits. It is possible to apply the Diggle
correction factor (Diggle 1985) in both the
network and time spaces to minimize this effect.
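For the time dimension alone, the idea behind the correction can be sketched as follows: each event is weighted by the inverse of the share of its kernel mass that falls inside the data collection period. The helper diggle_time_factor is hypothetical, a quartic kernel is assumed, and the network side of the correction is not covered here.

```r
# Quartic (biweight) kernel scaled by the bandwidth; zero beyond the bandwidth.
quartic_kernel <- function(d, bw) {
  u <- d / bw
  ifelse(abs(u) < 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

# Hypothetical helper: inverse of the kernel mass of an event that lies
# inside the collection period [t_start, t_end].
diggle_time_factor <- function(t_event, bw, t_start, t_end) {
  lower <- max(t_start, t_event - bw)
  upper <- min(t_end, t_event + bw)
  mass_inside <- integrate(
    function(t) quartic_kernel(t - t_event, bw),
    lower, upper
  )$value
  1 / mass_inside
}

# Event in the middle of the period: the whole kernel is inside
centre_factor <- diggle_time_factor(180, bw = 60, t_start = 0, t_end = 365)

# Event on the first day: half of the kernel falls before the period starts
border_factor <- diggle_time_factor(0, bw = 60, t_start = 0, t_end = 365)
```

With these values, centre_factor is close to 1 and border_factor is close to 2, so the event at the border is given more weight to compensate for the part of its kernel that is not observed.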
Separated or simultaneous adaptive bandwidth
When the parameter adaptive is TRUE, one can choose between using separated
calculation of network and temporal bandwidths, and calculating them
simultaneously. In the first case (default), the network bandwidths are
determined for each event by considering only their locations and the time
bandwidths are determined by considering only their time stamps. In the second
case, for each event, the spatio-temporal density at its location on the
network and in time is estimated and used to determine both the network and
temporal bandwidths. This second approach must be preferred if the events are
characterized by a high level of spatio-temporal autocorrelation.
Value
A matrix with the estimated density for each sample point (rows) at
each timestamp (columns). If adaptive = TRUE, the function returns a list
with two slots: k (the matrix with the density values) and events (a
feature collection of points with the local bandwidths).
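Because the return type depends on adaptive, downstream code may want to handle both cases. The sketch below uses a hypothetical helper, get_density_matrix, and mock objects standing in for the output of tnkde; it only relies on the structure described above (a matrix, or a list with a slot k).

```r
# Hypothetical helper: return the density matrix whatever the value of adaptive
get_density_matrix <- function(result) {
  if (is.matrix(result)) {
    result
  } else {
    result$k
  }
}

# Mock results standing in for tnkde() output:
# 3 sample points (rows) at 2 timestamps (columns)
fixed_result <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 3, ncol = 2)
adaptive_result <- list(
  k = fixed_result,
  events = "placeholder for the feature collection of points"
)

dens_fixed <- get_density_matrix(fixed_result)
dens_adaptive <- get_density_matrix(adaptive_result)

# Densities of all sample points at the second timestamp
second_timestamp <- dens_adaptive[, 2]
```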
Examples
# loading the data
data(mtl_network)
data(bike_accidents)
# converting the Date field to a numeric field (counting days)
bike_accidents$Time <- as.POSIXct(bike_accidents$Date, format = "%Y/%m/%d")
start <- as.POSIXct("2016/01/01", format = "%Y/%m/%d")
bike_accidents$Time <- difftime(bike_accidents$Time, start, units = "days")
bike_accidents$Time <- as.numeric(bike_accidents$Time)
# creating sample points
lixels <- lixelize_lines(mtl_network, 50)
sample_points <- lines_center(lixels)
# choosing sample in times (every 10 days)
sample_time <- seq(0, max(bike_accidents$Time), 10)
# calculating the densities
tnkde_densities <- tnkde(lines = mtl_network,
events = bike_accidents, time_field = "Time",
w = rep(1, nrow(bike_accidents)),
samples_loc = sample_points,
samples_time = sample_time,
kernel_name = "quartic",
bw_net = 700, bw_time = 60, adaptive = TRUE,
trim_bw_net = 900, trim_bw_time = 80,
method = "discontinuous", div = "bw",
max_depth = 10, digits = 2, tol = 0.01,
agg = 15, grid_shape = c(1,1),
verbose = FALSE)