Fits a hierarchical Bayesian model for Small Area Estimation (SAE) using
the brms package (Stan back-end). The function supports fixed
effects, random effects, spatial random effects (CAR/SAR), user-defined
priors, and three strategies for handling missing data in auxiliary
(predictor) variables.
Usage
hbm(
formula,
hb_sampling = "gaussian",
hb_link = "identity",
link_phi = "log",
re = NULL,
spatial_var = NULL,
spatial_model = NULL,
car_type = NULL,
sar_type = NULL,
M = NULL,
data,
prior = NULL,
fixed_params = NULL,
sampling_variance = NULL,
prior_type = "default",
hs_df = 1,
hs_df_global = 1,
hs_df_slab = 4,
hs_scale_global = NULL,
hs_scale_slab = 2,
hs_par_ratio = NULL,
hs_autoscale = TRUE,
r2d2_mean_R2 = 0.5,
r2d2_prec_R2 = 2,
r2d2_cons_D2 = NULL,
r2d2_autoscale = TRUE,
nonlinear = NULL,
nonlinear_type = "spline",
spline_k = -1L,
spline_bs = "tp",
gp_k = NA_integer_,
gp_cov = "exp_quad",
gp_c = NULL,
gp_scale = NULL,
handle_missing = NULL,
m = 5L,
mice_args = list(),
measurement_error = NULL,
control = list(),
chains = 4L,
iter = 4000L,
warmup = floor(iter/2),
cores = 1L,
sample_prior = "no",
sre = NULL,
sre_type = NULL,
stanvars = NULL,
...
)Arguments
- formula
A
brmsformulaor standardformulaobject specifying the model structure. For multi-response or imputation sub-models usebf. Examples:formula(y ~ x1 + x2),bf(y ~ x1 + x2), orbf(y | mi() ~ mi(x1) + x2) + bf(x1 | mi() ~ x2).- hb_sampling
Character string naming the distribution family of the response variable (default:
"gaussian"). Any family supported bybrmsfamilyis accepted.- hb_link
Character string specifying the link function (default:
"identity").- link_phi
Character string specifying the link function for the precision/phi parameter (default:
"log"). Only used for Beta and related families.- re
An optional one-sided
formulaspecifying group-level (random) effects, e.g.~(1|area). Must follow standardlme4-style syntax:~ (1|group1) + (1|group2). IfNULL(default) andspatial_modelis alsoNULL, the function emits a warning recommending an area-level random effect, since a Fay-Herriot SAE model with neither IID nor spatial area effects degenerates to fixed-effects regression and does not borrow strength across areas. The warning can be silenced withsuppressWarnings()if a fixed-effects-only baseline is intentional.- spatial_var
Character. Name of the column in
datathat identifies the spatial areas (e.g."regency"or"province"). Must be supplied together withspatial_modelandM; providing only one of them is an error. Distinct fromre:reis a formula for IID random effects, whereasspatial_varis a column name (string) for the spatially-structured random effect.- spatial_model
Character. Type of spatial model:
"car"(Conditional Autoregressive) or"sar"(Simultaneous Autoregressive). Must be supplied together withspatial_varandM; providing only one of them is an error.- car_type
Character. CAR subtype passed to
brms: one of"escar"(exact sparse CAR),"esicar"(exact sparse intrinsic CAR),"icar"(intrinsic CAR), or"bym2". Defaults to"icar"whenspatial_model = "car".- sar_type
Character. SAR subtype:
"lag"(SAR of the response values) or"error"(SAR of the residuals). Defaults to"lag"whenspatial_model = "sar".- M
Spatial matrix supplied as
data2tobrms. For CAR this must be a binary adjacency matrix; for SAR a spatial weight matrix. Row names must match the levels ofspatial_var.- data
A data frame containing all variables referenced in
formula.- prior
Priors specified via
prioror a list thereof. IfNULL(default),brmsdefault priors are used.- fixed_params
Named list pinning distributional parameters to known values instead of sampling them. Each entry maps a parameter name to one of:
- A column name (character)
The named column in
datais used as the fixed values.- A scalar (numeric, length 1)
Broadcast to all rows.
- A vector (numeric, length
nrow(data)) Used directly.
- A one-sided formula (e.g.
~ I(n / deff - 1)) Evaluated against
datato produce the vector of fixed values.
Internally, each pinned parameter
<par>is attached to the data as a column.hbsaems_<par>_fixedand added to the brms formula as<par> ~ 0 + offset(.hbsaems_<par>_fixed). Usingfixed_paramson a parameter for which the user also supplies an explicit prior is an error. Typical use cases include pinningphifor Beta regression from survey design effect (phi = ~ I(n / deff - 1)) and pinningsigmafor Fay–Herriot-style models with known sampling variance.- sampling_variance
Optional character. Name of a column in
datacontaining the known sampling variance \(D_i\) for each area (the Fay-Herriot sugar). When supplied, \(\sigma_i = \sqrt{D_i}\) is pinned via offset. This is the canonical way to fit a Gaussian Fay-Herriot model: without it, the residual \(\sigma\) and the area-RE \(\sigma_u\) compete to explain the same variance and the model is unidentified, typically producing divergent transitions. Equivalent tofixed_params = list(sigma = sqrt(data[[<col>]])).Family compatibility.
sampling_variancerequires a continuous family whose response distribution has a residual scale parameter namedsigma. Supported:gaussian(the canonical Fay-Herriot case),lognormal(D must be on the log scale; see alsohbm_lnln),student,skew_normal,exgaussian,asym_laplace. A helpful error is thrown when an incompatible family is supplied.For non-Gaussian SAE families the analogous pinning mechanism is family-specific:
Beta: pin the precision
phivia the survey design effect, e.g.\fixed_params = list(phi = ~ I(n/deff - 1))(Liu 2009). Seehbm_betalogitnorm.Binomial: sampling variability enters through the
trialsaddition term, not through a separatesigma. Seehbm_binlogitnorm.Poisson, Gamma, Weibull: variance is tied algebraically to the mean – no separate scale parameter to pin.
- prior_type
Character. Global-local shrinkage prior applied to all regression coefficients (
class = "b"). One of:"default"No shrinkage prior is added; the
priorargument governs everything (default)."horseshoe"Regularised horseshoe prior (Piironen & Vehtari 2017). Encourages exact sparsity while allowing large signals through. Controlled by
hs_df,hs_df_global,hs_df_slab,hs_scale_global,hs_scale_slab,hs_par_ratio, andhs_autoscale."r2d2"R2D2 prior (Zhang et al. 2022). Places a prior directly on the model \(R^2\) and distributes explained variance across predictors via a Dirichlet decomposition. Controlled by
r2d2_mean_R2,r2d2_prec_R2,r2d2_cons_D2, andr2d2_autoscale.
If
prioralready contains a globalclass = "b"entry,prior_typeis ignored and a warning is issued.Cascading to smooth and GP terms. When the formula contains
s()orgp()terms, the shrinkage prior is automatically extended to the corresponding parameter classes"sds"(spline SDs) and/or"sdgp"(GP SDs) using the brms-canonicalmain = TRUEpattern. The resulting prior regularises ALL components jointly – linear coefficients, nonlinear smooth wiggliness, and GP marginal variance – which is the principled approach to global-local shrinkage in models that combine parametric and nonparametric components.- hs_df
Numeric \(> 0\). Local half-\(t\) degrees of freedom for the horseshoe prior (default
1= half-Cauchy).- hs_df_global
Numeric \(> 0\). Global half-\(t\) degrees of freedom (default
1).- hs_df_slab
Numeric \(> 0\). Slab half-\(t\) degrees of freedom (default
4).- hs_scale_global
Numeric \(> 0\) or
NULL. Scale for the global half-\(t\) prior.NULL(default) letsbrmscompute it automatically from the number of predictors viahs_autoscale.- hs_scale_slab
Numeric \(> 0\). Scale for the slab component (default
2).- hs_par_ratio
Numeric \(> 0\) or
NULL. Expected ratio of non-zero to total coefficients.NULL(default) treats all coefficients as potentially non-zero.- hs_autoscale
Logical. Whether
brmsshould auto-scale the horseshoe prior using the residual SD \(\sigma\). DefaultTRUE; set toFALSEfor non-continuous responses (binomial, Poisson, ...) where \(\sigma\) is not defined.- r2d2_mean_R2
Numeric in \((0, 1)\). Prior mean of the model \(R^2\) (default
0.5).- r2d2_prec_R2
Numeric \(> 0\). Prior precision of \(R^2\) (default
2). Higher values concentrate mass aroundr2d2_mean_R2.- r2d2_cons_D2
Numeric \(> 0\) or
NULL. Dirichlet concentration for the D2 component.NULL(default) uses thebrmsdefault0.5.- r2d2_autoscale
Logical. Whether
brmsshould auto-scale the R2D2 prior using \(\sigma\). DefaultTRUE; set toFALSEfor non-continuous responses.- nonlinear
Character vector or
NULL. Names of predictor variables to model with a smooth nonlinear term. Each listed variable is replaced in the formula RHS withs(var)(spline) orgp(var)(Gaussian process). Variables not listed remain linear. Do not also writes(x)in the formula when using this argument – the modification is applied automatically. DefaultNULL(all predictors remain linear).- nonlinear_type
Character. Smooth term family to use. One of
"spline"(default, penalised regression spline viamgcv::s()) or"gp"(Gaussian process viabrms::gp()).- spline_k
Integer. Spline basis dimension (number of knots) passed to
mgcv::s(..., k = ...).-1L(default) lets mgcv choose automatically. For SAE typicallyk = 8to15. Ignored whennonlinear_type = "gp".- spline_bs
Character. Spline basis type passed to
mgcv::s(..., bs = ...). Default"tp"(thin-plate regression spline, the mgcv default). Common alternatives:"cr"(cubic regression spline; often more stable for SAE with correlated auxiliary variables),"cs"(cubic with shrinkage; allows variable selection),"ps"(P-splines). Ignored whennonlinear_type = "gp".- gp_k
Integer or
NA. Number of basis functions for the Hilbert-space approximate GP (Riutort-Mayol et al.\ 2020), passed tobrms::gp(..., k = ...).NA(default) = exact GP which scales \(O(n^3)\) and is not recommended for \(n > 100\) areas; an immediate warning is emitted in that case pointing to this argument. Integer values10–25are typical for SAE and dramatically improve convergence and runtime. Ignored whennonlinear_type = "spline".- gp_cov
Character. GP covariance function passed to
brms::gp(..., cov = ...):"exp_quad"(squared exponential / RBF, default),"matern15"(Matern 3/2),"matern25"(Matern 5/2; often more numerically stable for SAE than RBF), or"exponential".- gp_c
Numeric \(> 0\) or
NULL. Hilbert-space GP boundary-scale factor passed tobrms::gp(..., c = ...). Default brms value is \(5/4\) (= 1.25); increase if the GP appears truncated at the domain boundaries. Only relevant whengp_kis supplied.- gp_scale
Deprecated. Use
gp_cinstead. The old name suggested a length-scale interpretation but actually mapped to the HSGP boundary-scale factor. Will be removed in v2.0.0.- handle_missing
Character or
NULL. Strategy for missing data. One of"deleted","multiple", or"model"(see Details). IfNULL(default) and missing values exist in the data, an informative error is raised.- m
Integer. Number of imputations when
handle_missing = "multiple"(default:5). Ignored for other strategies.- mice_args
A named list of additional arguments forwarded to
mice, for examplelist(method = "pmm", seed = 42). Only used whenhandle_missing = "multiple".- measurement_error
Optional named list specifying which auxiliary variables are measured with error and where to find their standard errors. The list maps variable names to columns in
datacontaining the SE, e.g.measurement_error = list(x1 = "se_x1", x2 = "se_x2"). Listed variables are wrapped on-the-fly withmi(var, se_col)in the brmsformula so that brms treats them as latent variables with a Gaussian measurement-error structure, following Ybarra and Lohr (2008). Standard errors must be non-negative and have no missing values;measurement_errorvariables must be a subset of the model's auxiliary (linear) predictors. When the user has already writtenmi(...)explicitly in the formula, the corresponding entries inmeasurement_errorare detected and not duplicated. Note that ME inflates the parameter space (each x_i becomes latent), which slows sampling and increases the risk of divergent transitions, especially when combined with smooth terms (nonlinear). See the "Measurement error" section of the SAE vignette.- control
A named list of sampler control parameters (default:
list()). Passed directly tobrm.- chains
Integer. Number of MCMC chains (default:
4).- iter
Integer. Total iterations per chain (default:
4000).- warmup
Integer. Warm-up iterations per chain (default:
floor(iter / 2)).- cores
Integer. Number of CPU cores for parallel sampling (default:
1).- sample_prior
Character. Whether to draw from the prior distribution. One of
"no"(default),"yes", or"only".- sre
Deprecated. Use
spatial_varinstead. Kept for backward compatibility; will be removed in v2.0.0.- sre_type
Deprecated. Use
spatial_modelinstead. Kept for backward compatibility; will be removed in v2.0.0.- stanvars
An optional
stanvarobject (or a list of such objects) supplying additional Stan code, data, or parameters. Passed directly tobrmandbrm_multiple. Intended for use by wrapper functions such ashbm_betalogitnormthat require custom Stan blocks; end users typically do not need to set this argument. Default:NULL.- ...
Additional arguments forwarded to
brmorbrm_multiple.
Value
An object of class hbmfit, which is a named list containing:
- model
The fitted
brmsfitobject (orbrmsfit_multiplewhenhandle_missing = "multiple"with missing predictors).- handle_missing
The missing-data strategy used (
NULLif the data were complete).- data
The original data frame passed to
hbm()before any row deletion or imputation. This is intentional:hbsaeneeds all rows – including those with missing \(Y\) – to generate predictions for every small area.
Details
Hierarchical Bayesian Model for Small Area Estimation
**Spatial Small Area Estimation Models**
For spatially correlated areas, hbsaems extends the standard area-level SAE model (Fay-Herriot 1979) by adding a spatially structured random effect: $$y_i = x_i^\top \boldsymbol{\beta} + u_i + e_i,$$ where \(u_i\) is the spatial random effect for area \(i\) and \(e_i\) the sampling error. Two families of spatial structures are supported.
- CAR (Conditional Autoregressive; Besag 1974)
Specified by
spatial_model = "car". The joint distribution of the spatial effects is $$u \sim \mathcal{N}\bigl(0,\, \sigma_u^2 (D - \rho W)^{-1}\bigr),$$ where \(W\) is a binary adjacency matrix (1 if neighbour, 0 otherwise) and \(D = \mathrm{diag}(W \mathbf{1})\). Sub-types viacar_type:"icar"(intrinsic, \(\rho = 1\); Besag 1991);"escar","esicar"(exact sparse formulations of Morris et al.\ 2019);"bym2"(BYM2 reparameterisation of Riebler et al.\ 2016, recommended for disconnected graphs).- SAR (Simultaneous Autoregressive; Whittle 1954, Anselin 1988)
Specified by
spatial_model = "sar". The model is $$u = \rho W u + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2 I),$$ where \(W\) is row-standardised so that \(\rho \in (-1, 1)\) carries an interpretable correlation meaning. Sub-types viasar_type:"lag"(spatial lag of the response, \(y = \rho W y + X\boldsymbol{\beta} + \varepsilon\));"error"(spatial error model).
Use check_spatial_weight to verify that \(M\) satisfies
the theoretical requirements (square, zero diagonal, symmetry for CAR,
style-appropriate for the model class). Use build_spatial_weight
to construct \(M\) from a shapefile.
**Missing Data Strategies**
The three strategies differ in scope and statistical assumptions:
"deleted"Complete-case analysis. Only rows where all response variable(s) are observed are used for model fitting. Auxiliary variables must be complete; otherwise an informative error is raised. Appropriate under MCAR (Missing Completely At Random).
"multiple"Multiple imputation via
micefor auxiliary (predictor) variables only. The response variable \(Y\) is never imputed. In a Bayesian model, missing outcomes are naturally marginalised through the posterior predictive distribution: $$p(\theta \mid Y_{\text{obs}}, X) = \int p(\theta \mid Y_{\text{obs}}, Y_{\text{mis}}, X)\, p(Y_{\text{mis}} \mid Y_{\text{obs}}, X, \theta)\, \mathrm{d}Y_{\text{mis}}.$$ Imputing \(Y\) before fitting would replace this integral with a single point substitute, deflate posterior uncertainty, and potentially bias the estimates if the imputation model is misspecified. If \(Y\) has missing values, those rows are excluded from model fitting (awarningis issued) but are retained in the returned object for subsequent prediction viahbsae. Appropriate under MAR (Missing At Random)."model"Model-based imputation using
brms::mi(). Missing values in auxiliary variables are jointly estimated with the model parameters. The user must specify imputation sub-models explicitly in theformulaargument, e.g.:bf(y | mi() ~ mi(x1) + x2) + bf(x1 | mi() ~ x2). Only applicable to continuous distributions. Appropriate under MAR.
If data are Missing Not At Random (MNAR), none of the above strategies applies directly; sensitivity analyses and explicit missingness models are recommended.
How to pin distributional parameters per family
hbsaems exposes three layers for pinning distributional parameters to known values, in increasing order of generality:
Family-specific sugar (least typing, most readable). Each wrapper exposes a convenience argument that maps to a well-defined Fay-Herriot-style transformation of survey design quantities:
Wrapper Sugar argument Pinned dpar hbm(..., hb_sampling = "gaussian")sampling_variance = "D"\(\sigma_i = \sqrt{D_i}\) hbm_lnln()sampling_variance = "psi"\(\sigma_i = \sqrt{\psi_i}\) (on log scale) hbm_betalogitnorm()n = "n", deff = "deff"\(\phi_i = n_i / \mathrm{deff}_i - 1\) (Liu 2009) hbm_binlogitnorm()trials = "n"(sampling variance built into the family) Universal
fixed_params(works everywhere). A named list pinning any distributional parameter – accepts a column name (character), a scalar (numeric of length 1), a vector of lengthnrow(data), or a one-sided formula (evaluated indata's environment). Examples:fixed_params = list(sigma = "D")– pin sigma to a column.fixed_params = list(phi = ~ I(n / deff - 1))– pin phi via formula.fixed_params = list(nu = 4)– pin Student-t df to a scalar.
Raw
stanvars(for power users authoring custom Stan code). Direct injection of Stan code blocks – seestanvar.
Sugar arguments are simply thin wrappers around layer 2: they
validate the survey-design inputs (no NAs, strictly positive) and
translate to fixed_params before delegating to the universal
machinery. Hence the conflict policy below applies uniformly to
both sugar and explicit fixed_params.
Conflict resolution between prior, prior_type, fixed_params, and stanvars
hbm() provides four orthogonal mechanisms to influence the
prior / parameter specification of the underlying brms model:
prior– explicitbrmspriorobject(s).prior_type– global-local shrinkage prior on the regression coefficients (cascades to"sds"/"sdgp"when splines or GPs are present).fixed_params– pin distributional parameters to known values via the offset trick.stanvars– inject custom Stan code blocks.
Plus two family-specific sugar arguments that translate to
fixed_params internally:
sampling_variance(continuous families): pins \(\sigma_i = \sqrt{D_i}\), equivalent tofixed_params$sigma = sqrt(data$D).n + deff(hbm_betalogitnorm): pins \(\phi_i = n_i / \mathrm{deff}_i - 1\), equivalent tofixed_params$phi = n / deff - 1.
Combining these without rules in mind can produce unidentified models
or compile-time errors. hbm() therefore enforces the
following conflict matrix, where each cell describes what
happens when the row and column inputs both target the same
distributional parameter (e.g.\ both pin sigma):
| fixed_params | prior | prior_type | stanvars | |
| sampling_variance | error | error (transitive) | no overlap | error (transitive) |
| n + deff | error | error (transitive) | no overlap | error (transitive) |
| fixed_params | – | error (10b.i) | no overlap | error (10b.ii) |
| prior | error (10b.i) | – | warning, user wins | no check needed |
| prior_type | no overlap | warning, user wins | – | no check needed |
| stanvars | error (10b.ii) | no check needed | no check needed | – |
Resolution semantics in detail:
priorvsprior_type. If the user supplies a global (nocoef =) prior onclass = "b","sds", or"sdgp",prior_typeis silently dropped for that class and a warning is emitted. Coefficient-specific user priors (coef = "x1") are kept alongside the shrinkage prior without warning.fixed_paramsvsprior. A pinned parameter is removed from the sampler; supplying a prior on that same parameter therefore has no effect and is treated as a user error – an informativestop()is issued.fixed_paramsvsstanvars. Same logic as above: a sampling statement instanvarsthat targets a pinned parameter would fail at Stan compile time;hbm()catches this and stops with a clear message.Sugar vs
fixed_paramson the same parameter. The sugar translators (.translate_sampling_variance(),.translate_n_deff_to_phi()) error out if the user has also pre-populatedfixed_paramsfor the target parameter – there should never be two pin sources for the same dpar.Sugar vs
prior/stanvarstransitively. After the sugar ->fixed_paramstranslation, the downstreamfixed_params-vs-prior / -stanvars checks fire automatically. E.g.\sampling_variance = "D"plusprior = set_prior("normal(0, 1)", class = "sigma")errors via thefixed_paramsvspriorrule.
The intent is to fail fast and explicitly rather than silently producing an unidentified or mis-specified model.
Convergence advice for SAE practitioners
Common convergence pathologies in hierarchical Bayesian SAE models
and how to address them. Run convergence_check() after
fitting to inspect \(\hat R\), effective sample size (ESS), and
divergent transitions.
1. Default sampler settings (recommended starting point).
chains = 4,iter = 4000,warmup = 2000control = list(adapt_delta = 0.95, max_treedepth = 12)cores = parallel::detectCores() - 1
2. Divergent transitions. Most common cause is the funnel geometry of hierarchical variance parameters.
First-line: increase
adapt_deltato0.99.If still diverging, increase
warmupand consider a tighter prior on the area-level standard deviation (sdclass), e.g.\set_prior("normal(0, 0.5)", class = "sd").For Beta/Binomial logit-normal models, prior on the random intercept SD should also be on the logit scale.
Gaussian (Fay-Herriot) only. Always supply the known sampling variance via
sampling_variance = "<col>". Without this, the residual \(\sigma\) and the area-RE \(\sigma_u\) compete to explain the same variance component, producing weak identifiability and divergent transitions almost regardless ofadapt_delta. This is the single most common cause of divergences in Fay-Herriot fits and should be checked first before any sampler-tuning.
3. Low effective sample size (ESS < 1000).
Increase
iter(e.g.\ to6000); this is the single most reliable fix.Centre and scale the auxiliary variables before fitting.
Check for prior–data conflict via priorsense; see
?prior_check.
4. Gaussian processes.
Exact GP scales \(O(n^3)\). For \(n > 100\) areas, set
gp_kto use the Hilbert-space approximate GP (Riutort-Mayol et al.\ 2020). A heuristic isgp_k = ceiling(min(n / 5, 25)).Try
gp_cov = "matern25"(Matern 5/2) if the default squared-exponential covariance is numerically unstable.The boundary-scale factor
gp_c(brms default 1.25) may need increasing if the posterior GP is truncated at the domain edges.
5. Splines.
Start with
spline_k = -1(auto). Increase only if the residual diagnostics suggest under-smoothing.For strongly correlated auxiliary variables, try
spline_bs = "cr"(cubic regression spline) for better numerical stability than the default thin-plate.For variable selection, use
spline_bs = "cs"(cubic with shrinkage); coefficients on irrelevant smooths shrink toward zero.
6. Spatial models.
For CAR,
car_type = "bym2"(Riebler et al.\ 2016) is the modern recommendation; it stabilises the spatial/IID decomposition via a single mixing parameter.Verify the weight matrix with
check_spatial_weight(); isolated areas or multiple disconnected components cause non-identifiability.
7. Prior predictive check first. Always call
prior_check() (sample_prior = "only") before
the full posterior run. Implausible prior predictives are the
single most common cause of slow / divergent sampling.
References
Rao, J. N. K., & Molina, I. (2015). Small Area Estimation. John Wiley & Sons.
Burkner, P. C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28.
Riutort-Mayol, G., Burkner, P.-C., Andersen, M. R., Solin, A., & Vehtari, A. (2023). Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Statistics and Computing, 33, 17. doi:10.1007/s11222-022-10167-2
Riebler, A., Sorbye, S. H., Simpson, D., & Rue, H. (2016). An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Statistical Methods in Medical Research, 25(4), 1145–1165.
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67.
Examples
# \donttest{
library(hbsaems)
library(brms)
data("data_fhnorm")
data <- data_fhnorm
# Standard brms-default MCMC settings used throughout these
# examples (chains = 4, iter = 2000, warmup = 1000). For tougher
# posteriors (funnel geometry, weakly identified priors), bump to
# iter = 4000, warmup = 2000, control = list(adapt_delta = 0.99).
FAST <- list(chains = 4, iter = 2000, warmup = 1000, cores = 1,
seed = 123, refresh = 0)
# -- Basic model --------------------------------------------------------------
model <- do.call(hbm, c(
list(formula = bf(y ~ x1 + x2 + x3),
hb_sampling = "gaussian",
hb_link = "identity",
re = ~(1 | regency),
data = data),
FAST
))
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
summary(model)
#> Error: object 'model' not found
# -- Horseshoe prior (sparse coefficients) ------------------------------------
model_hs <- do.call(hbm, c(
list(formula = bf(y ~ x1 + x2 + x3),
re = ~(1 | regency),
data = data,
prior_type = "horseshoe",
hs_df = 1),
FAST
))
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
summary(model_hs)
#> Error: object 'model_hs' not found
# -- R2D2 prior (prior on model R-squared) -------------------------------------
model_r2 <- do.call(hbm, c(
list(formula = bf(y ~ x1 + x2 + x3),
re = ~(1 | regency),
data = data,
prior_type = "r2d2",
r2d2_mean_R2 = 0.5,
r2d2_prec_R2 = 2),
FAST
))
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
summary(model_r2)
#> Error: object 'model_r2' not found
# -- Spline smooth for x1 (nonlinear) -----------------------------------------
# x1 is modelled with s(x1); x2 and x3 remain linear.
model_spline <- do.call(hbm, c(
list(formula = bf(y ~ x1 + x2 + x3),
re = ~(1 | regency),
data = data,
nonlinear = "x1",
nonlinear_type = "spline"),
FAST
))
#> Warning: Variable(s) x1 appear in both 'auxiliary' (linear) and 'nonlinear'. They will be modelled nonlinearly ONLY. Remove them from 'auxiliary' to suppress this warning.
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
summary(model_spline)
#> Error: object 'model_spline' not found
# -- Gaussian process for x2 (nonlinear) --------------------------------------
model_gp <- do.call(hbm, c(
list(formula = bf(y ~ x1 + x2 + x3),
re = ~(1 | regency),
data = data,
nonlinear = "x2",
nonlinear_type = "gp"),
FAST
))
#> Warning: Variable(s) x2 appear in both 'auxiliary' (linear) and 'nonlinear'. They will be modelled nonlinearly ONLY. Remove them from 'auxiliary' to suppress this warning.
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
summary(model_gp)
#> Error: object 'model_gp' not found
# -- Missing data: deletion (Y missing, X complete) ---------------------------
data_miss_y <- data
data_miss_y$y[3:5] <- NA
model_deleted <- do.call(hbm, c(
list(formula = bf(y ~ x1 + x2 + x3),
re = ~(1 | regency),
data = data_miss_y,
handle_missing = "deleted"),
FAST
))
#> handle_missing = 'deleted': 3 row(s) with missing response variable removed from model fitting.
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
summary(model_deleted)
#> Error: object 'model_deleted' not found
# -- Missing data: multiple imputation (X only -- Y is never imputed) ----------
data_miss_x <- data
data_miss_x$x1[6:8] <- NA
model_multiple <- do.call(hbm, c(
list(formula = bf(y ~ x1 + x2 + x3),
re = ~(1 | regency),
data = data_miss_x,
handle_missing = "multiple",
m = 5),
FAST
))
#> Missing predictor variable(s): x1. Applying multiple imputation (mice) with m = 5 imputations.
#> Warning: Number of logged events: 2
#> Compiling the C++ model
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
summary(model_multiple)
#> Error: object 'model_multiple' not found
# -- Missing data: model-based imputation with mi() ---------------------------
data_miss_x2 <- data
data_miss_x2$x1[6:7] <- NA
model_model <- do.call(hbm, c(
list(formula = bf(y | mi() ~ mi(x1) + x2 + x3) +
bf(x1 | mi() ~ x2 + x3),
re = ~(1 | regency),
data = data_miss_x2,
handle_missing = "model"),
FAST
))
#> handle_missing = 'model': using mi() specification for joint model-based imputation.
#> Setting 'rescor' to FALSE by default for this model
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
summary(model_model)
#> Error: object 'model_model' not found
# -- Spatial: CAR (Conditional Autoregressive) --------------------------------
data("adjacency_matrix_car")
model_car <- do.call(hbm, c(
list(formula = bf(y ~ x1 + x2 + x3),
data = data,
spatial_var = "province",
spatial_model = "car",
M = adjacency_matrix_car),
FAST
))
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
summary(model_car)
#> Error: object 'model_car' not found
# -- Spatial: SAR (Simultaneous Autoregressive) -------------------------------
# spatial_weight_sar is a 100x100 row-standardised matrix with row-
# names regency_001..regency_100, so it pairs with the fine-grained
# "regency" column (100 levels) -- NOT with "province" (5 levels).
data("spatial_weight_sar")
model_sar <- do.call(hbm, c(
list(formula = bf(y ~ x1 + x2 + x3),
data = data,
spatial_var = "regency",
spatial_model = "sar",
M = spatial_weight_sar),
FAST
))
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
summary(model_sar)
#> Error: object 'model_sar' not found
# }