A simulated dataset for 100 regencies under the Fay-Herriot Normal
small-area model. Used as the running example throughout the package
documentation and vignettes. The simulation is engineered so that the
canonical Fay-Herriot fit
(hbm(..., sampling_variance = "D")) converges with default
brms / Stan settings – no divergent transitions, no manual
tuning required.
Format
A data frame with 100 rows and 9 variables:
y: Direct (survey) estimator of the area mean.
D: Sampling variance of the direct estimator (known from the survey design; treat as input, not as a parameter).
x1, x2, x3: Auxiliary covariates at the area level, simulated from \(\mathcal{N}(0, 1)\).
theta_true: True area-level latent value \(\theta_i\).
u: True area-level random effect \(u_i\).
regency: Regency identifier ("regency_001" through "regency_100") used as the IID random-effect grouping variable. Use with re = ~ (1 | regency) or area_var = "regency".
province: Province identifier ("province_01" through "province_05"), with 20 regencies per province. Used as the spatial random-effect grouping variable for CAR / SAR / BYM2 examples; also serves as the higher level in the hierarchical-area example area_var = c("province", "regency").
Details
Generative model. For each regency \(i = 1, \ldots, 100\),
$$ y_i = \theta_i + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, D_i) $$
$$ \theta_i = 10 + 0.8 \, x_{1i} - 0.5 \, x_{2i} + 0.3 \, x_{3i} + u_i, \quad u_i \sim \mathcal{N}(0, \sigma_u^2) $$
with auxiliary covariates \(x_{ji} \sim \mathcal{N}(0, 1)\) (already standardised), area RE SD \(\sigma_u = 1\), and known sampling variances \(D_i \sim \mathrm{Gamma}(\mathrm{shape} = 4, \mathrm{rate} = 4)\). This gives a realistic spread (\(\approx [0.2, 3.0]\)) that mirrors varying sample sizes across regencies.
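The generative model can be sketched in base R as follows. This is an illustration only, not the script that produced the shipped dataset: the seed, the object name sim, and the assignment of consecutive blocks of 20 regencies to each province are assumptions made for the example, so the simulated values will differ from those in the package.

```r
# Minimal sketch of the generative model; not the package's own build script.
set.seed(1)  # arbitrary seed (assumption), not the one used for the dataset
n <- 100     # number of regencies

# Auxiliary covariates x1, x2, x3 ~ N(0, 1)
X <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))

# Area-level random effect u_i ~ N(0, sigma_u^2) with sigma_u = 1
sigma_u <- 1
u <- rnorm(n, mean = 0, sd = sigma_u)

# Latent area means theta_i
theta_true <- 10 + 0.8 * X[, "x1"] - 0.5 * X[, "x2"] + 0.3 * X[, "x3"] + u

# Known sampling variances D_i ~ Gamma(shape = 4, rate = 4)
D <- rgamma(n, shape = 4, rate = 4)

# Direct estimates y_i = theta_i + eps_i, eps_i ~ N(0, D_i); rnorm() takes an SD
y <- theta_true + rnorm(n, mean = 0, sd = sqrt(D))

sim <- data.frame(
  y, D, X, theta_true, u,
  regency  = sprintf("regency_%03d", seq_len(n)),
  # Assumed layout: regencies 1-20 in province 1, 21-40 in province 2, ...
  province = sprintf("province_%02d", rep(1:5, each = 20))
)
```

Calling str(sim) should show the same 100 rows and 9 columns described under Format, with values that depend on the chosen seed.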
Important: pass D as the sampling variance. In any
fit on this dataset, supply sampling_variance = "D"; otherwise
the residual \(\sigma\) and the area-RE \(\sigma_u\) compete to
explain the same variance, producing weak identifiability and
divergent transitions.
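A short marginal-model sketch shows where the identifiability problem comes from. Here \(\mu_i\) is shorthand (introduced for this sketch) for the fixed part \(10 + 0.8 \, x_{1i} - 0.5 \, x_{2i} + 0.3 \, x_{3i}\), and \(\sigma\) is the free residual SD assumed to be estimated when D is not supplied. With D passed as the sampling variance, integrating out \(u_i\) gives
$$ y_i \sim \mathcal{N}(\mu_i, \, \sigma_u^2 + D_i) $$
so \(\sigma_u^2\) is the only unknown variance component. Without D, each regency contributes a single observation and the marginal model becomes
$$ y_i \sim \mathcal{N}(\mu_i, \, \sigma_u^2 + \sigma^2) $$
in which only the sum \(\sigma_u^2 + \sigma^2\) is informed by the data. For example, \((\sigma_u, \sigma) = (1, 0.5)\) and \((\sigma_u, \sigma) = (0.5, 1)\) both give a marginal variance of 1.25 and are indistinguishable from the likelihood alone.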