Skip to content

gratia 0.9.0

Compare
Choose a tag to compare
@gavinsimpson gavinsimpson released this 28 Mar 08:18
2be5c4f

Breaking changes

  • Many functions now return objects with different named variables. In order to
    avoid clashes with variable names used in user's models or data, a period
    (.) is now being used as a prefix for generated variable names. The
    functions whose names have changed are: smooth_estimates(),
    fitted_values(), fitted_samples(), posterior_samples(), derivatives(),
    partial_derivatives(), and derivative_samples(). In addition,
    add_confint() also adds newly-named variables.

    1. `est` is now `.estimate`,
    2. `lower` and `upper` are now `.lower_ci` and `.upper_ci`,
    3. `draw` and `row` and now `.draw` and `.row` respectively,
    4. `fitted`, `se`, `crit` are now `.fitted`, `.se`, `.crit`, respectively
    5. `smooth`, `by`, and `type` in `smooth_estimates()` are now `.smooth`,
       `.by`, `.type`, respectively.
    
  • derivatives() and partial_derivatives() now work more like
    smooth_estimates(); in place of the var and data columns, gratia now
    stores the data variables at which the derivatives were evaluated as columns
    in the object with their actual variable names.

  • The way spline-on-the-sphere (SOS) smooths (bs = "sos") are plotted has
    changed to use ggplot2::coord_sf() instead of the previously-used
    ggplot2::coord_map(). This changed has been made as a result of
    coord_map() being soft-deprecated ("superseded") for a few minor versions of
    ggplot2 by now already, and changes to the guides system in version 3.5.0 of
    ggplot2.

    The axes on plots created with coord_map() never really worked
    correctly and changing the angle of the tick labels never worked. As
    coord_map() is superseded, it didn't receive the updates to the guides
    system and a side effect of these changes, the code that plotted SOS smooths
    was producing a warning with the release of ggplot2 version 3.5.0.

    The projection settings used to draw SOS smooths was previously controlled via
    arguments projection and orientation. These arguments do not affect
    ggplot2::coord_sf(), Instead the projection used is controlled through new
    argument crs, which takes a PROJ string detailing the projection to use or
    an integer that refers to a known coordinate reference system (CRS). The
    default projection used is +proj=ortho +lat_0=20 +lon_0=XX where XX is the
    mean of the longitude coordinates of the data points.

Defunct and deprecated functions and arguments

Defunct

  • evaluate_smooth() was deprecated in gratia version 0.7.0. This function and
    all it's methods have been removed from the package. Use smooth_estimates()
    instead.

Deprecated functions

The following functions were deprecated in version 0.9.0 of gratia. They will
eventually be removed from the package as part of a clean up ahead of an
eventual 1.0.0 release. These functions will become defunct by version 0.11.0 or
1.0.0, whichever is released soonest.

  • evaluate_parametric_term() has been deprecated. Use parametric_effects()
    instead.

  • datagen() has been deprecated. It never really did what it was originally
    designed to do, and has been replaced by data_slice().

Deprecated arguments

To make functions in the package more consistent, the arguments select,
term, and smooth are all used for the same thing and hence the latter two
have been deprecated in favour of select. If a deprecated argument is used, a
warning will be issued but the value assigned to the argument will be assigned
to select and the function will continue.

User visible changes

  • smooth_samples() now uses a single call to the RNG to generate draws from
    the posterior of smooths. Previous to version 0.9.0, smooth_samples() would
    do a separate call to mvnfast::rmvn() for each smooth. As a result, the
    result of a call to smooth_samples() on a model with multiple smooths will
    now produce different results to those generated previously. To regain the
    old behaviour, add rng_per_smooth = TRUE to the smooth_samples() call.

    Note, however, that using per-smooth RNG calls with method = "mh" will be
    very inefficient as, with that method, posterior draws for all coefficients
    in the model are sampled at once. So, only use rng_per_smooth = TRUE with
    method = "gaussian".

  • The output of smooth_estimates() and its draw() method have changed for
    tensor product smooths that involve one or more 2D marginal smooths. Now,
    if no covariate values are supplied via the data argument,
    smooth_estimates() identifies if one of the marginals is a 2d surface and
    allows the covariates involved in that surface to vary fastest, ahead of terms
    in other marginals. This change has been made as it provides a better default
    when nothing is provided to data.

    This also affects draw.gam().

  • fitted_values() now has some level of support for location, scale, shape
    families. Supported families are mgcv::gaulss(), mgcv::gammals(),
    mgcv::gumbls(), mgcv::gevlss(), mgcv::shash(), mgcv::twlss(), and
    mgcv::ziplss().

  • gratia now requires dplyr versions >= 1.1.0 and tidyselect >= 1.2.0.

  • A new vignette Posterior Simulation is available, which describes how to
    do posterior simulation from fitted GAMs using {gratia}.

New features

  • Soap film smooths using basis bs = "so" are now handled by draw(),
    smooth_estimates() etc. #8

  • response_derivatives() is a new function for computing derivatives of the
    response with respect to a (continuous) focal variable. First or second
    order derivatives can be computed using forward, backward, or central
    finite differences. The uncertainty in the estimated derivative is determined
    using posterior sampling via fitted_samples(), and hence can be derived
    from a Gaussian approximation to the posterior or using a Metropolis Hastings
    sampler (see below.)

  • derivative_samples() is the work horse function behind
    response_derivatives(), which computes and returns posterior draws of the
    derivatives of any additive combination of model terms. Requested by
    @jonathanmellor #237

  • data_sim() can now simulate response data from gamma, Tweedie and ordered
    categorical distributions.

  • data_sim() gains two new example models "gwf2", simulating data only from
    Gu & Wabha's f2 function, and "lwf6", example function 6 from Luo & Wabha
    (1997 JASA 92(437), 107-116).

  • data_sim() can also simulate data for use with GAMs fitted using
    family = gfam() for grouped families where different types of data in
    the response are handled. #266 and part of #265

  • fitted_samples() and smooth_samples() can now use the Metropolis Hastings
    sampler from mgcv::gam.mh(), instead of a Gaussian approximation, to sample
    from the posterior distribution of the model or specific smooths
    respectively.

  • posterior_samples() is a new function in the family of fitted_samples()
    and smooth_samples(). posterior_samples() returns draws from the
    posterior distribution of the response, combining the uncertainty in the
    estimated expected value of the response and the dispersion of the response
    distribution. The difference between posterior_samples() and
    predicted_samples() is that the latter only includes variation due to
    drawing samples from the conditional distribution of the response (the
    uncertainty in the expected values is ignored), while the former includes
    both sources of uncertainty.

  • fitted_samples() can new use a matrix of user-supplied posterior draws.
    Related to #120

  • add_fitted_samples(), add_predicted_samples(), add_posterior_samples(),
    and add_smooth_samples() are new utility functions that add the respective
    draws from the posterior distribution to an existing data object for the
    covariate values in that object: obj |> add_posterior_draws(model). #50

  • basis_size() is a new function to extract the basis dimension (number of
    basis functions) for smooths. Methods are available for objects that inherit
    from classes "gam", "gamm", and "mgcv.smooth" (for individual smooths).

  • data_slice() gains a method for data frames and tibbles.

  • typical_values() gains a method for data frames and tibbles.

  • fitted_values() now works with models fitted using the mgcv::ocat()
    family. The predicted probability for each category is returned, alongside a
    Wald interval created using the standard error (SE) of the estimated
    probability. The SE and estimated probabilities are transformed to the logit
    (linear predictor) scale, a Wald credible interval is formed, which is then
    back-transformed to the response (probability) scale.

  • fitted_values() now works for GAMMs fitted using mgcv::gamm(). Fitted
    (predicted) values only use the GAM part of the model, and thus exclude the
    random effects.

  • link() and inv_link() work for models fitted using the cnorm() family.

  • A worm plot can now be drawn in place of the QQ plot with appraise() via
    new argument use_worm = TRUE. #62

  • smooths() now works for models fitted with mgcv::gamm().

  • overview() now returns the basis dimension for each smooth and gains an
    argument stars which if TRUE add significance stars to the output plus a
    legend is printed in the tibble footer. Part of wish of @noamross #214

  • New add_constant() and transform_fun() methods for smooth_samples().

  • evenly() gains arguments lower and upper to modify the lower and / or
    upper bound of the interval over which evenly spaced values will be generated.

  • add_sizer() is a new function to add information on whether the derivative
    of a smooth is significantly changing (where the credible interval excludes
    0). Currently, methods for derivatives() and smooth_estimates() objects
    are implemented. Part of request of @asanders11 #117

  • draw.derivatives() gains arguments add_change and change_type to allow
    derivatives of smooths to be plotted with indicators where the credible
    interval on the derivative excludes 0. Options allow for periods of decrease
    or increase to be differentiated via change_type = "sizer" instead of the
    default change_type = "change", which emphasises either type of change in
    the same way. Part of wish of @asanders11 #117

  • draw.gam() can now group factor by smooths for a given factor into a single
    panel, rather than plotting the smooths for each level in separate panels.
    This is achieved via new argument grouped_by. Requested by @RPanczak #89

    draw.smooth_estimates() can now also group factor by smooths for a given
    factor into a single panel.

  • The underlying plotting code used by draw_smooth_estimates() for most
    univariate smooths can now add change indicators to the plots of smooths if
    those change indicators are added to the object created by
    smooth_estimates() using add_sizer(). See the example in
    ?draw.smooth_estimates.

  • smooth_estimates() can, when evaluating a 3D or 4D tensor product smooth,
    identify if one or more 2D smooths is a marginal of the tensor product. If
    users do not provide covariate values at which to evaluate the smooths,
    smooth_estimates() will focus on the 2D marginal smooth (or the first if
    more than one is involved in the tensor product), instead of following the
    ordering of the terms in the definition of the tensor product. #191

    For example, in te(z, x, y, bs = c(cr, ds), d = c(1, 2)), the second
    marginal smooth is a 2D Duchon spline of covariates x and y. Previously,
    smooth_estimates() would have generated n values each for z and x and
    n_3d values for y, and then evaluated the tensor product at all
    combinations of those generated values. This would ignore the structure
    implicit in the tensor product, where we are likely to want to know how the
    surface estimated by the Duchon spline of x and y smoothly varies with
    z. Previously smooth_estimates() would generate surfaces of z and x,
    varying by y. Now, smooth_estimates() correctly identifies that one of the
    marginal smooths of the tensor product is a 2D surface and will focus on that
    surface varying with the other terms in the tensor product.

    This improved behaviour is needed because in some bam() models it is not
    always possible to do the obvious thing and reorder the smooths when defining
    the tensor product to be te(x, y, z, bs = c(ds, cr), d = c(2, 1)). When
    discrete = TRUE is used with bam() the terms in the tensor product may
    get rearranged during model setup for maximum efficiency (See Details in
    ?mgcv::bam).

    Additionally, draw.gam() now also works the same way.

  • New function null_deviance() that extracts the null deviance of a fitted
    model.

  • draw(), smooth_estimates(), fitted_values(), data_slice(), and
    smooth_samples() now all work for models fitted with scam::scam().
    Where it matters, current support extends only to univariate smooths.

  • generate_draws() is a new low-level function for generating posterior draws
    from fitted model coefficients. generate_daws() is an S3 generic function so
    is extensible by users. Currently provides a simple interface to a simple
    Gaussian approximation sampler (gaussian_draws()) and the simple Metropolis
    Hasting sample (mh_draws()) available via mgcv::gam.mh(). #211

  • smooth_label() is a new function for extracting the labels 'mgcv' creates for
    smooths from the smooth object itself.

  • penalty() has a default method that works with s(), te(), t2(), and
    ti(), which create a smooth specification.

  • transform_fun() gains argument constant to allow for the addition of a
    constant value to objects (e.g. the estimate and confidence interval). This
    enables a single obj |> transform_fun(fun = exp, constant = 5) instead of
    separate calls to add_constant() and then transform_fun(). Part of the
    discussion of #79

  • model_constant() is a new function that simply extracts the first
    coefficient from the estimated model.

Bug fixes

  • link(), inv_link(), and related family functions for the ocat() weren't
    correctly identifying the family name and as a result would throw an error
    even when passed an object of the correct family.

    link() and inv_link() now work correctly for the betar() family in a
    fitted GAM.

  • The print() method for lp_matrix() now converts the matrix to a data frame
    before conversion to a tibble. This makes more sense as it results in more
    typical behaviour as the columns of the printed object are doubles.

  • Constrained factor smooths (bs = "sz") where the factor is not the first
    variable mentioned in the smooth (i.e. s(x, f, bs = "sz") for continuous
    x and factor f) are now plotable with draw(). #208

  • parametric_effects() was unable to handle special parametric terms like
    poly(x) or log(x) in formulas. Reported by @fhui28 #212

  • parametric_effects() now works better for location, scale, shape models.
    Reported by @pboesu #45

  • parametric_effects now works when there are missing values in one or more
    variables used in a fitted GAM. #219

  • response_derivatives() was incorrectly using .data with tidyselect
    selectors.

  • typical_values() could not handle logical variables in a GAM fit as mgcv
    stores these as numerics in the var.summary. This affected evenly() and
    data_slice(). #222

  • parametric_effects() would fail when two or more ordered factors were in
    the model. Reported by @dsmi31 #221

  • Continuous by smooths were being evaluated with the median value of the by
    variable instead of a value of 1. #224

  • fitted_samples() (and hence posterior_samples()) now handles models with
    offset terms in the formula. Offset terms supplied via the offset argument
    are ignored by mgcv:::predict.gam() and hence are ignored also by gratia.
    Reported by @jonathonmellor #231 #233

  • smooth_estimates() would fail on a "fs" smooth when a multivariate base
    smoother was used and the factor was not the last variable specified in the
    definition of the smooth: s(x1, x2, f, bs = "fs", xt = list(bs = "ds"))
    would work, but s(f, x1, x2, bs = "fs", xt = list(bs = "ds")) (or any
    ordering of variables that places the factor not last) would emit an obscure
    error. The ordering of the terms involved in the smooth now doesn't matter.
    Reported by @chrisaak #249.

  • draw.gam() would fail when plotting a multivariate base smoother used in an
    "sz" smooth. Now, this use case is identified and a message printed
    indicating that (currently) gratia doesn't know how to plot such a smooth.
    Reported by @chrisaak #249.

  • draw.gam() would fail when plotting a multivariate base smoother used in an
    "fs" smooth. Now, this use case is identified and a message printed
    indicating that (currently) gratia doesn't know how to plot such a smooth.
    Reported by @chrisaak #249.

  • derivative_samples() would fail with order = 2 and was only computing
    forward finite differences, regardless of type for order = 1. Partly
    reported by @samlipworth #251.

  • The draw() method for penalty() was normalizing the penalty to the range
    0--1, not the claimed and documented -1--1 with argument normalize = TRUE.
    This is now fixed.

  • smooth_samples() was failing when data was supplied that contained more
    variables than were used in the smooth that was being sampled. Hence this
    generally fail unless a single smooth was being sampled from or the model
    contained only a single smooth. The function never intended to retain all the
    variables in data but was written in such a way that it would fail when
    relocating the data columns to the end of the posterior sampling object. #255

  • draw.gam() and draw.smooth_estimates() would fail when plotting a
    univariate tensor product smooth (e.g. te(x), ti(x), or t2()). Reported
    by @wStockhausen #260

  • plot.smooth() was not printing the factor level in subtitles for ordered
    factor by smooths.