Overview
hbsaems is an R package for area-level Hierarchical Bayesian Small Area Estimation (HBSAE). The methodological foundation follows the standard SAE literature – primarily Rao and Molina (2015) – while the computational implementation is adapted to the parameterisation and prior-specification conventions of the brms package (Bürkner, 2017), which targets the Stan back-end.
The package is designed to support the principled Bayesian workflow (Gelman and others, 2020) so that SAE modelling becomes systematic: prior predictive checks, MCMC convergence diagnostics, posterior predictive checks, leave-one-out cross-validation, Bayesian model comparison and averaging, prior sensitivity analysis, and design- consistent benchmarking are all part of the standard pipeline.
Scope. This package focuses on area-level SAE models. Unit- level SAE models (e.g. the nested error model of Battese, Harter & Fuller, 1988) are not the focus.
Key features
- Area-level SAE models: Fay-Herriot normal, lognormal-lognormal, beta logit-normal, and binomial logit-normal – each adapted from Rao and Molina (2015) to the brms parameterisation.
-
SAE-idiomatic argument naming (v1.0.0): arguments named after SAE concepts (
auxiliary,area_var,spatial_var,spatial_model,sampling_variance,n,deff). -
Survey-design-informed fixed parameters:
- Beta precision: pinned via offset.
- Lognormal: pinned via offset (Fay-Herriot lognormal).
-
Generic
fixed_paramsmechanism for power users and custom distributions. - Spatial random effects: CAR (ICAR / proper / BYM2) and SAR (lag / error).
- Custom brms distributions: built-in Loglogistic and Shifted Loglogistic, plus a registration framework for user-defined families.
-
Missing data: listwise deletion, multiple imputation (
mice), or joint Bayesian imputation (brms::mi()) with auto-selection. -
Comprehensive Bayesian workflow: prior predictive checks, MCMC convergence diagnostics, posterior predictive checks, LOO-CV, Bayesian model comparison and averaging, prior sensitivity analysis via
priorsense, design-consistent benchmarking. -
Shrinkage priors: Horseshoe and R2D2 selectable via
prior_type. - Nonlinear smooth terms: thin-plate splines and Gaussian processes.
-
Interactive Shiny application: bilingual (English/Indonesian) GUI via
run_sae_app(). - Ten comprehensive vignettes covering every supported model and advanced topic.
Installation
You can install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("madsyair/hbsaems")Or with vignettes:
devtools::install_github("madsyair/hbsaems", build_vignettes = TRUE)Quick start
library(hbsaems)
# Load example data
data("data_fhnorm")
str(data_fhnorm[, c("regency", "province", "y", "D", "x1", "x2")])
# Fit a basic Fay-Herriot Normal model with an IID area effect
model <- hbm(
formula = bf(y ~ x1 + x2 + x3),
hb_sampling = "gaussian",
hb_link = "identity",
re = ~(1 | regency),
data = data_fhnorm,
chains = 4, iter = 4000, warmup = 2000, cores = 2, seed = 1
)
# Check the model summary
summary(model)The three-layer function family
hbm()
Universal: any brms family, full customisation
▲
|
hbm_flex()
Family registry + auxiliary + fixed_params
▲
|
hbm_lnln() / hbm_betalogitnorm() / hbm_binlogitnorm()
SAE-friendly wrappers: response, auxiliary, area_var, ...Most users start with a wrapper. Step up to hbm_flex() when you need a custom family or the generic fixed_params interface; step up to hbm() when you need full brms control.
Core functions
Main modeling functions
hbm() – universal modeling function
model <- hbm(
formula = bf(y ~ x1 + x2 + x3), # brms formula
hb_sampling = "gaussian", # likelihood family
hb_link = "identity", # link function
re = ~(1 | regency), # IID area RE (formula)
spatial_var = "province", # spatial RE column
spatial_model = "car", # "car" or "sar"
M = adjacency_matrix_car, # spatial weight matrix
data = data_fhnorm
)Distribution-specific wrappers
Beta logit-normal for proportions in :
fit_beta <- hbm_betalogitnorm(
response = "y",
auxiliary = c("x1", "x2", "x3"),
n = "n", # sample size column
deff = "deff", # design effect column
area_var = "regency", # area random effect
data = data_betalogitnorm,
chains = 4, iter = 4000, warmup = 2000, cores = 2, seed = 1
)Binomial logit-normal for successes out of trials:
fit_bin <- hbm_binlogitnorm(
response = "y",
trials = "n",
auxiliary = c("x1", "x2", "x3"),
area_var = "district", # 100 districts in this dataset
data = data_binlogitnorm,
chains = 4, iter = 4000, warmup = 2000, cores = 2, seed = 1
)Lognormal-lognormal (Fay-Herriot variant) for positive, right-skewed responses:
Workflow functions
| Function | Purpose | Replaces (legacy) |
|---|---|---|
convergence_check() |
MCMC convergence diagnostics: , ESS, Geweke, Heidelberger, Raftery | hbcc() |
model_compare() |
LOO/WAIC/Bayes-factor model comparison and posterior predictive checks | hbmc() |
model_compare_all() |
Multi-model ranking analogous to loo_compare
|
– |
model_average() |
Bayesian model averaging (manual, stacking, or pseudo-BMA+ weights via loo) |
– |
prior_check() |
Prior predictive checks | hbpc() |
prior_sensitivity() |
Power-scale prior sensitivity diagnostics (priorsense) |
– |
sae_predict() |
In-sample and out-of-sample SAE prediction | hbsae() |
sae_benchmark() |
Design-consistent benchmarking | – |
Auxiliary functions
| Function | Purpose |
|---|---|
check_data() |
Data integrity checks (missingness, duplicates, sample size) |
check_spatial_weight() |
Spatial weight matrix theoretical-compatibility checks |
build_spatial_weight() |
Build adjacency / row-standardised weight matrices |
is_converged() |
Quick yes/no convergence check |
posterior_interval() |
Credible intervals for posterior draws |
posterior_draws() |
Extract posterior draws as a long data frame |
hbm_info() |
Inspect model spec, priors, sampler settings |
hbm_warnings() |
Surface model fit warnings |
Custom distributions
hbsaems ships with two custom brms families for positive, right- skewed data:
-
Loglogistic – canonical Fisk parameterisation following
flexsurv::dllogisand Wikipedia. See?loglogisticand?brms_custom_loglogistic. -
Shifted Loglogistic – GEV-style parameterisation of Hosking & Wallis (1997). See
?shifted_loglogisticand?brms_custom_shifted_loglogistic.
Both are auto-registered with brms at package load via register_hbsae_brms_custom().
Interactive Shiny application
The package ships a bilingual (English / Indonesian) Shiny application:
The application provides a guided workflow: data upload → exploration → spatial setup → modelling → diagnostics → results download.
Vignettes
The package includes ten vignettes that progressively walk through the modelling workflow:
browseVignettes("hbsaems")| Vignette | Topic |
|---|---|
complete-workflow |
End-to-end Fay-Herriot example with full Bayesian workflow |
hbsaems-modelling |
Modelling concepts and the three-layer API |
hbsaems-betalogitnorm-model |
Beta logit-normal for proportions |
hbsaems-binlogitnorm-model |
Binomial logit-normal for counts |
hbsaems-lnln-model |
Lognormal-lognormal Fay-Herriot variant |
hbsaems-spatial |
CAR / SAR / BYM2 spatial models |
hbsaems-handle-missing |
Three missing-data strategies |
advanced-features |
Shrinkage priors, splines, custom families |
hbsaems-run_sae_app |
Using the Shiny application |
migration-guide |
Migrating from v0.x to v1.0.0 |
Migrating from v0.x
If you are coming from v0.x of hbsaems, see the migration-guide vignette. The most visible changes:
| Old (deprecated) | New (canonical) |
|---|---|
hbcc() |
convergence_check() |
hbmc() |
model_compare() |
hbpc() |
prior_check() |
hbsae() |
sae_predict() |
group = |
area_var = |
sre = |
spatial_var = |
sre_type = |
spatial_model = |
predictors = |
auxiliary = |
sampling_var = |
sampling_variance = |
The old names continue to work in v1.0.0 with a deprecation warning and will be removed in v2.0.0.
References
- Battese, G. E., Harter, R. M., & Fuller, W. A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83(401), 28-36.
- Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1-28.
- Gelman, A., Vehtari, A., Simpson, D., Margossian, C. C., Carpenter, B., Yao, Y., Kennedy, L., Gabry, J., Bürkner, P.-C., & Modrák, M. (2020). Bayesian workflow. arXiv:2011.01808.
- Rao, J. N. K., & Molina, I. (2015). Small Area Estimation (2nd edition). Wiley.
Authors
- Achmad Syahrul Choir (maintainer, author) – madsyair@stis.ac.id, Politeknik Statistika STIS, Jakarta
- Saniyyah Sri Nurhayati (author)
- Sofi Zamzanah (author)
- Arsyka Laila Oktalia Siregar (author)
