relgam is a package that fits reluctant
generalized additive models (RGAM), a new method for
fitting sparse generalized additive models (GAM). RGAM is
computationally scalable and works with continuous, binary, count and
survival data.
We introduce some notation that we will use throughout this vignette. Let there be \(n\) observations, each with feature vector \(x_i \in \mathbb{R}^p\) and response \(y_i\). Let \(\mathbf{X} \in \mathbb{R}^{n \times p}\) denote the overall feature matrix, and let \(y \in \mathbb{R}^n\) denote the vector of responses. Let \(X_j \in \mathbb{R}^n\) denote the \(j\)th column of \(\mathbf{X}\).
Assume that the columns of \(\mathbf{X}\), i.e. \(X_1, \dots, X_p\), are standardized to have
sample standard deviation \(1\). Assume
that we have specified a scaling hyperparmeter \(\gamma \in [0,1]\), a degrees of freedom
hyperparameter \(d\), and a path of
tuning parameters \(\lambda_1 > \dots >
\lambda_m \geq 0\). RGAM’s model-fitting algorithm, implemented
in the function rgam(), consists of 3 steps:
Fit the lasso of \(y\) on \(\mathbf{X}\) to get coefficients \(\hat{\beta}\). Compute the residuals \(r = y - \mathbf{X} \hat{\beta}\), using the \(\lambda\) hyperparameter selected by cross-validation.
For each \(j \in \{ 1, \dots, p \}\), fit a \(d\)-degree smoothing spline of \(r\) on \(X_j\) which we denote by \(\hat{f}_j\). Rescale \(\hat{f}_j\) so that \(\overline{\text{sd}}(\hat{f}_j) = \gamma\). Let \(\mathbf{F} \in \mathbb{R}^{n \times p}\) denote the matrix whose columns are the \(\hat{f}_j(X_j)\)’s.
Fit the lasso of \(y\) on \(\mathbf{X}\) and \(\mathbf{F}\) for the path of tuning parameters \(\lambda_1 > \dots > \lambda_m\).
Steps 1 and 3 are implemented using glmnet::glmnet()
while Step 2 is implemented using stats::smooth.spline().
(Note that the path of tuning parameters \(\lambda_1 > \dots > \lambda_m\) are
only used in Step 3; for Step 1 we use glmnet::glmnet()’s
default lambda path.) We will refer to these 3 steps by their numbers
(e.g. “Step 1”) throughout the rest of the vignette.
The rgam() function fits this model and returns an
object with class “rgam”. The relgam package includes
methods for prediction and plotting for “rgam” objects, as well as a
function which performs \(k\)-fold
cross-validation for rgam().
For more details, please see our paper on arXiv.
The simplest way to obtain relgam is to install it
directly from CRAN. Type the following command in R console:
This command downloads the R package and installs it to the default
directories. Users may change add a repos option to the
function call to specify which repository to download from, depending on
their locations and preferences.
Alternatively, users can download the package source at CRAN and type Unix commands to install it to the desired location.
rgam() functionThe purpose of this section is to give users a general sense of the
rgam() function, which is the workhorse of this package.
First, we load the relgam package:
Let’s generate some data:
set.seed(1)
n <- 100; p <- 12
x = matrix(rnorm((n) * p), ncol = p)
f4 = 2 * x[,4]^2 + 4 * x[,4] - 2
f5 = -2 * x[, 5]^2 + 2
f6 = 0.5 * x[, 6]^3
mu = rowSums(x[, 1:3]) + f4 + f5 + f6
y = mu + sqrt(var(mu) / 4) * rnorm(n)We fit the model using the most basic call to
rgam():
fit <- rgam(x, y, verbose = FALSE)
#> init_nz not specified: setting to default (all features)
#> using default value of gamma for RGAM: 0.6(If verbose = TRUE (default), model-fitting is tracked
with a progress bar in the console. For the purposes of this vignette,
we will be setting verbose = FALSE.)
rgam() has an init_nz option which
(partially) determines which columns in the \(\mathbf{X}\) matrix will have non-linear
features computed in Step 2 of the RGAM algorithm. \(X_j\) will have a non-linear feature
computed for it if (i) it was one of the features selected by
cross-validation in Step 1, or (ii) it is one of the indices specified
in init_nz. By default, init_nz = 1:ncol(x),
i.e. non-linear features are computed for all the original features.
Another sensible default is init_nz = c(), i.e. non-linear
features computed for just those selected in Step 1. (This version of
the RGAM algorithm is denoted by RGAM_SEL in our paper.)
Below, we fit the model with a different init_nz
value:
You might have noticed that in both cases above, we did not specify a
value for the gamma hyperparameter: rgam()
chose one for us and informed us of the choice. The default value for
gamma is 0.6 if init_nz = c(), and is 0.8 in
all other cases.
The degrees of freedom hyperparameter for Step 2 of the RGAM
algorithm can be set through the df option, the default
value is 4. Here is an example of fitting the RGAM model with different
hyperparameters:
fit <- rgam(x, y, gamma = 0.6, df = 8, verbose = FALSE)
#> init_nz not specified: setting to default (all features)The function rgam() fits a RGAM for a path of lambda
values and returns a rgam object. Typical usage is to have
rgam() specify the lambda sequence on its own. The returned
rgam object contains some useful information on the fitted
model. For a given value of the \(\lambda\) hyperparameter, RGAM gives the
predictions of the form
\(\hat{y} = \sum_{j=1}^p \hat{\alpha}_j X_j + \sum_{j = 1}^p \hat{\beta}_j \hat{f}_j(X_j),\)
where \(\hat{f}_j(X_j)\) are the
non-linear features constructed in Step 2, while the \(\hat{\alpha}_j\) and \(\hat{\beta}_j\) are the fitted coefficients
from Step 3. The returned RGAM object has nzero_feat,
nzero_lin and nzero_nonlin keys which tell us
how many features, linear components and non-linear components were
included in the model respectively. (In mathematical notation,
nzero_lin and nzero_nonlin count the number of
non-zero \(\hat{\alpha}_j\) and \(\hat{\beta}_j\) respectively, while
nzero_feat counts the number of \(j\) such that at least one of \(\hat{\alpha}_j\) and \(\hat{\beta}_j\) is non-zero).
# no. of features/linear components/non-linear components for 20th lambda value
fit$nzero_feat[20]
#> [1] 7
fit$nzero_lin[20]
#> s19
#> 5
fit$nzero_nonlin[20]
#> s19
#> 5While the nzero_feat, nzero_lin and
nzero_nonlin keys tell us the number of features, linear
components and non-linear components included for each lambda value, the
feat, linfeat and nonlinfeat tell
us the indices of these features or components.
# features included in the model for 20th lambda value
fit$feat[[20]]
#> [1] 2 3 4 5 6 10 11
# features which have a linear component in the model for 20th lambda value
fit$linfeat[[20]]
#> [1] 2 3 4 5 6
# features which have a non-linear component in the model for 20th lambda value
fit$nonlinfeat[[20]]
#> [1] 4 5 6 10 11In general, we have
fit$nzero_feat[[i]] == length(fit$feat[[i]]),
fit$nzero_lin[[i]] == length(fit$linfeat[[i]]) and
fit$nzero_nonlin[[i]] == length(fit$nonlinfeat[[i]]).
Predictions from this model can be obtained by using the
predict method of the rgam() function output:
each column gives the predictions for a value of
lambda.
# get predictions for 20th model for first 5 observations
predict(fit, x[1:5, ])[, 20]
#> [1] 1.405611 -3.099824 7.363352 -1.607129 6.731922The getf() function is a convenience function that gives
the component of the prediction due to one input variable. That is, if
RGAM gives predictions
\(\hat{y} = \sum_{j=1}^p \hat{\alpha}_j X_j + \sum_{j = 1}^p \hat{\beta}_j \hat{f}_j(X_j),\)
then calling getf() on \(X_j\) returns \(\hat{\alpha}_j X_j + \hat{\beta}_j
\hat{f}_j(X_j)\). For example, the code below gives the component
of the response due to variable 5 at the 20th lambda value:
f5 <- getf(fit, x, j = 5, index = 20)
as.vector(f5)
#> [1] -0.13304318 -2.00863694 1.96464340 1.93378169 1.94991340
#> [6] 1.92377692 1.91777013 1.86878106 -0.65036784 1.87681233
#> [11] 1.93201613 0.17651878 -0.19184804 1.76238092 1.98654429
#> [16] 0.33405232 -0.07528638 1.83451868 -0.89898936 -1.29849231
#> [21] -0.63786652 1.85989858 1.60313054 -2.98869574 0.26926377
#> [26] -1.02562549 -2.21607047 1.74898375 1.93699335 1.89668756
#> [31] -4.76831809 0.36883370 -1.53923748 1.52890093 0.24007968
#> [36] 1.78664185 0.01557101 1.53029703 1.93123727 0.18243144
#> [41] 1.96018571 -1.51099570 1.51568391 0.03770175 0.99948882
#> [46] -12.46015985 -0.51799335 1.95463545 0.92024554 -6.95816165
#> [51] -0.02908100 1.96380259 1.79544727 -1.69202543 -1.04634397
#> [56] 1.35848484 0.35573360 -5.20968279 1.17469668 0.31613621
#> [61] 1.96999456 -3.44212299 -0.34692268 0.23707779 1.46897875
#> [66] 0.81554290 1.86592129 0.20086542 -0.72468916 -0.14309208
#> [71] 1.68469115 -2.09839818 1.67593630 0.46982723 -0.49566369
#> [76] -0.24891515 0.62060428 0.92848912 -0.27962342 1.53455401
#> [81] 1.91151048 0.61176526 1.98735560 1.91475172 -4.51952120
#> [86] -5.04972249 1.75240633 1.03472085 -4.36137765 0.34758804
#> [91] 0.85192467 -4.14011827 1.52296935 1.23187712 -10.48430946
#> [96] 0.77602901 0.74828002 0.78448785 0.65690671 1.55307241We can use this to make a plot showing the effect of variable 5 on the response:
(The “Plots and summaries” section shows how to get such a plot more easily.)
Let’s fit the basic rgam model again:
fit <- rgam(x, y, verbose = FALSE)
#> init_nz not specified: setting to default (all features)
#> using default value of gamma for RGAM: 0.6fit is a class “rgam” object which comes with a
plot method. The plot method shows us the
relationship our predicted response has with each input feature, i.e. it
plots \(\hat{\alpha}_j X_j + \hat{\beta}_j
\hat{f}_j(X_j)\) vs. \(X_j\) for
each \(j\). Besides passing
fit to the plot() call, the user must also
pass an input matrix x: this is used to determine the
coordinate limits for the plot. It is recommended that the user simply
pass in the same input matrix that the RGAM model was fit on.
By default, plot() gives the fitted functions for the
last value of the lambda key in fit, and gives
just the plots for the first 4 features:
par(mfrow = c(1, 4))
par(mar = c(4, 2, 2, 2))
plot(fit, x)
#> Warning in plot.rgam(fit, x): Plotting first 4 variables by defaultThe user can specify the index of the lambda value and which feature
plots to show using the index and which
options respectively:
# show fitted functions for x2, x5 and x8 at the model for the 15th lambda value
par(mfrow = c(1, 3))
plot(fit, x, index = 15, which = c(2, 5, 8))Linear functions are colored green, non-linear functions are colored red, while zero functions are colored blue.
Class “rgam” objects also have a summary method which
allows the user to see the coefficient profiles of the linear and
non-linear features. On each plot (one for linear features and one for
non-linear features), the x-axis is the \(\lambda\) value going from large to small
and the y-axis is the coefficient of the feature.
By default the coefficient profiles are plotted for all the
variables. We can plot for just a subset of the features by specifying
the which option. We can also include annotations to show
which profile belongs to which feature by specifying
label = TRUE.
We can perform \(k\)-fold
cross-validation (CV) for RGAM with cv.rgam(). It does
10-fold cross-validation by default:
We can change the number of folds using the nfolds
option:
If we want to specify which observation belongs to which fold, we can
do that by specifying the foldid option, which is a vector
of length \(n\), with the \(i\)th element being the fold number for
observation \(i\).
set.seed(3)
foldid <- sample(rep(seq(10), length = n))
cvfit <- cv.rgam(x, y, init_nz = 1:ncol(x), gamma = 0.6, foldid = foldid, verbose = FALSE)A cv.rgam() call returns a cv.rgam object.
We can plot this object to get the CV curve with error bars (one
standard error in each direction). The left vertical dotted line
represents lambda.min, the lambda value which
attains minimum CV error, while the right vertical dotted line
represents lambda.1se, the largest lambda
value with CV error within one standard error of the minimum CV
error.
The numbers at the top represent the number of features in our original input matrix that are included in the model (i.e. the number of \(j\) such that at least one of \(\hat{\alpha}_j\) and \(\hat{\beta}_j\) is non-zero).
The two special lambda values can be extracted directly
from the cv.rgam object as well:
Predictions can be made from the fitted cv.rgam object.
By default, predictions are given for lambda being equal to
lambda.1se. To get predictions are lambda.min,
set s = "lambda.min".
In the examples above, y was a quantitative variable
(i.e. takes values along the real number line). As such, using the
default family = "gaussian" for rgam() was
appropriate. The RGAM algorithm, however, is very flexible and can be
used when y is not a quantitative variable.
rgam() is currently implemented for three other family
values: "binomial", "poisson" and
"cox". The rgam() and cv.rgam()
functions, as well as their methods, can be used as above but with the
family option specified appropriately. In the sections
below we point out some details that are particular to each family.
In this setting, the response y should be a numeric
vector containing just 0s and 1s. When doing prediction, note that by
default predict() gives just the value of the linear
predictors, i.e.
\(\log [\hat{p} / (1 - \hat{p})] = \hat{y} = \sum_{j=1}^p \hat{\alpha}_j X_j + \sum_{j = 1}^p \hat{\beta}_j \hat{f}_j(X_j),\)
where \(\hat{p}\) is the predicted
probability. To get the predicted probability, the user has to pass
type = "response" to the predict() call.
# fit binary model
bin_y <- ifelse(y > 0, 1, 0)
binfit <- rgam(x, bin_y, family = "binomial", init_nz = c(), gamma = 0.9,
verbose = FALSE)
# linear predictors for first 5 observations at 10th model
predict(binfit, x[1:5, ])[, 10]
#> [1] 0.1955678 -0.5761350 0.6273256 -0.2976725 0.5075271
# predicted probabilities for first 5 observations at 10th model
predict(binfit, x[1:5, ], type = "response")[, 10]
#> [1] 0.5487367 0.3598224 0.6518828 0.4261265 0.6242266For Poisson regression, the response y should be a
vector of count data. While rgam() does not expect each
element to be an integer, it will throw an error if any of the elements
are negative.
As with logistic regression, by default predict() gives
just the value of the linear predictors, i.e.
\(\log \hat{\mu} = \hat{y} = \sum_{j=1}^p \hat{\alpha}_j X_j + \sum_{j = 1}^p \hat{\beta}_j \hat{f}_j(X_j),\)
where \(\hat{\mu}\) is the predicted
rate. To get the predicted rate, the user has to pass
type = "response" to the predict() call.
With Poisson data, it is common to allow the user to pass in an ,
which is a vector having the same length as the number of observations.
rgam() allows the user to do this as well:
# generate data
set.seed(5)
offset <- rnorm(n)
poi_y <- rpois(n, abs(mu) * exp(offset))
poifit <- rgam(x, poi_y, family = "poisson", offset = offset, verbose = FALSE)
#> init_nz not specified: setting to default (all features)
#> using default value of gamma for RGAM: 0.6Note that if offset is supplied to rgam(),
then an offset vector must also be supplied to predict()
when making predictions:
For Cox regression, the response y must be a two-column
matrix with columns named time and status. The
status column is a binary variable, with 1 indicating death
and 0 indicating right censored. We note that one way to produce such a
matrix is using the Surv() function in the
survival package.
As with logistic and Poisson regression, by default
predict() gives just the value of the linear predictors.
Passing type = "response" to the predict()
call will return the predicted relative-risk.