What this covers

We show how to construct a recipe (covariates → treatment → event time → censoring), and then simulate datasets with simulate_from_recipe(). We also show batch generation (generate_recipe_sets()) and how to read compact metadata back (load_recipe_sets()), without any platform-specific tricks.

The simulator takes a plain list as the recipe. No YAML is required.

Recipe skeleton

list(
  n = 300,
  covariates = list(defs = list()),  # see Covariates
  treatment  = list(),               # see Treatment
  event_time = list(),               # see Event-time engines
  censoring  = list(),               # see Censoring
  seed = 42
)

Covariates

Each covariate has a name, type ("continuous" or "categorical"), a dist with params, and optional transform steps applied after generation ("center(a)", "scale(b)").

Available distributions

| Type | dist (string) | Parameters |
|---|---|---|
| continuous | normal | mean, sd |
| continuous | lognormal | meanlog, sdlog |
| continuous | gamma | shape, scale |
| continuous | weibull | shape, scale |
| continuous | uniform | min, max |
| continuous | beta | shape1, shape2 |
| continuous | t | df |
| categorical | bernoulli | p (probability of 1) |
| categorical | categorical | prob = c(...), labels = c(...) (optional) |
| categorical | ordinal | prob = c(...), labels = c(...) (optional, ordered) |

Example: define covariates

covs <- list(
  list(name="age",   type="continuous",  dist="normal",     params=list(mean=62, sd=10),
       transform=c("center(60)","scale(10)")),
  list(name="sex",   type="categorical", dist="bernoulli",  params=list(p=0.45)),
  list(name="stage", type="categorical", dist="ordinal",
       params=list(prob=c(0.3,0.5,0.2), labels=c("I","II","III"))),
  list(name="x",     type="continuous",  dist="lognormal",  params=list(meanlog=0, sdlog=0.6))
)
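The transform steps in the age definition can be read as plain arithmetic. The base-R sketch below assumes that "center(a)" subtracts a and "scale(b)" divides by b, applied in the order listed; apply_transforms() is a hypothetical helper written for illustration, not a function of the package.

```r
# Hypothetical helper: apply "center(a)" / "scale(b)" steps in order.
# Assumption: center(a) subtracts a, scale(b) divides by b.
apply_transforms <- function(x, steps) {
  for (s in steps) {
    op  <- sub("\\(.*$", "", s)                              # "center" or "scale"
    arg <- as.numeric(sub("^[a-z]+\\((.*)\\)$", "\\1", s))   # numeric argument
    x <- switch(op,
                center = x - arg,
                scale  = x / arg,
                stop("unknown transform: ", s))
  }
  x
}

set.seed(1)
age_raw <- rnorm(5, mean = 62, sd = 10)
age     <- apply_transforms(age_raw, c("center(60)", "scale(10)"))
all.equal(age, (age_raw - 60) / 10)
```

Under this reading, the transformed age is in units of decades away from 60, which matches the magnitude of the age column printed in the worked examples.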

Treatment

Choose one assignment:

| Assignment | Key fields | Meaning |
|---|---|---|
| "randomization" | allocation = "a:b" | Bernoulli with probability $p_1 = a/(a+b)$. |
| "stratified" | allocation, stratify_by = c("...") | Same allocation within each stratum defined by the listed categorical covariates. |
| "logistic_ps" | ps_model = list(formula = "~ ...", beta = c(...)) | Treatment probability is $\mathrm{logit}^{-1}(\eta)$ from the user-supplied model. Provide an explicit beta to avoid parsing edge cases. |

Examples

Randomization:

tr_rand <- list(assignment="randomization", allocation="1:1")

Stratified by "stage":

tr_strat <- list(assignment="stratified", allocation="2:1", stratify_by=c("stage"))
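To make the allocation string concrete, here is a small base-R sketch of the rule $p_1 = a/(a+b)$ and of applying it within strata. alloc_prob() is a hypothetical helper, and the stratified part assumes independent Bernoulli draws inside each stratum; the package may balance arms differently.

```r
# Hypothetical helper: turn an "a:b" allocation string into P(arm = 1).
alloc_prob <- function(allocation) {
  ab <- as.numeric(strsplit(allocation, ":", fixed = TRUE)[[1]])
  stopifnot(length(ab) == 2, all(ab > 0))
  ab[1] / sum(ab)
}

alloc_prob("1:1")   # 0.5
alloc_prob("2:1")   # 2/3

# Stratified version (assumption: Bernoulli draws within each stratum).
set.seed(1)
stage <- factor(sample(c("I", "II", "III"), 300, replace = TRUE))
arm   <- ave(rep(0, length(stage)), stage,
             FUN = function(z) rbinom(length(z), 1, alloc_prob("2:1")))
table(stage, arm)
```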

Logistic propensity:

tr_ps <- list(
  assignment = "logistic_ps",
  ps_model  = list(
    formula = "~ 1 + x + sex",
    beta    = c(-0.3, 1.2, -0.6)  # (Intercept), x, sex
  )
)
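The mapping from formula and beta to a treatment probability can be sketched in base R. This is an illustration of the inverse-logit rule only, on stand-in covariates, and does not reproduce how the package parses the formula internally.

```r
set.seed(1)
# Stand-in covariates with the same names as in the recipe.
df <- data.frame(x   = rlnorm(6, meanlog = 0, sdlog = 0.6),
                 sex = rbinom(6, 1, 0.45))

beta <- c(-0.3, 1.2, -0.6)                  # (Intercept), x, sex
X    <- model.matrix(~ 1 + x + sex, data = df)
stopifnot(ncol(X) == length(beta))          # beta must match the design columns

eta <- drop(X %*% beta)                     # linear predictor
ps  <- plogis(eta)                          # inverse logit
arm <- rbinom(nrow(df), 1, ps)              # treatment draw
cbind(df, ps = round(ps, 3), arm)
```

Writing beta in the same order as the design-matrix columns is what makes the explicit vector unambiguous.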

Event-time engines

Let $Z$ be treatment (0/1), $X$ be covariates, and $\eta$ be the linear predictor (defined in effects, below). Supported engines and baseline parameterizations:

| Model (user-facing) | model value | Baseline parameters | Notes |
|---|---|---|---|
| AFT Lognormal | "aft_lognormal" | mu, sigma | $\log T = \mu + \eta + \sigma \varepsilon$, $\varepsilon \sim \mathcal{N}(0,1)$. |
| AFT Weibull | "aft_weibull" | shape, scale | $S_0(t) = \exp(-(t/\lambda)^k)$; AFT shift via $\eta$. |
| AFT Log-Logistic | "aft_loglogistic" | shape, scale | $T = \lambda \exp(\eta) (U/(1-U))^{1/k}$. |
| PH Exponential | "ph_pwexp" | rates = c(λ), cuts = numeric(0) | Piecewise exponential with a single segment is exponential. |
| PH Weibull | "ph_weibull" | shape, scale | Proportional hazards with Weibull baseline. |
| PH Gompertz | "ph_gompertz" | rate, gamma | Hazard $h(t) = a \exp(bt)$. |
| PH Piecewise Exponential | "ph_pwexp" | rates = c(r1,r2,...), cuts = c(c1,c2,...) | Rate in segment $s$ is $r_s \exp(\eta)$. |
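Two of the AFT formulas in the table translate directly into base R. The sketch below assumes that $k$ corresponds to shape and $\lambda$ to scale; the numeric values are illustrative, and the code is not taken from the package.

```r
set.seed(1)
n   <- 5
eta <- rep(-0.25, n)                  # e.g. a treatment effect on the log-time scale

# AFT lognormal: log T = mu + eta + sigma * eps, eps ~ N(0, 1)
mu <- 3.0; sigma <- 0.6
t_lognormal <- exp(mu + eta + sigma * rnorm(n))

# AFT log-logistic: T = lambda * exp(eta) * (U / (1 - U))^(1 / k), U ~ Unif(0, 1)
# Assumption: k is the shape and lambda the scale.
k <- 1.3; lambda <- 12
u <- runif(n)
t_loglogistic <- lambda * exp(eta) * (u / (1 - u))^(1 / k)

round(cbind(t_lognormal, t_loglogistic), 2)
```

In both cases a negative $\eta$ multiplies every time by $\exp(\eta) < 1$, so it shortens survival on the AFT scale.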

Effects and linear predictor

Specify effects on the appropriate scale (AFT: log-time; PH: log-hazard):

effects = list(
  intercept  = 0,                      # default is 0
  treatment  = -0.25,
  covariates = list(age = 0.01, sex = -0.2)  # NOTE: named LIST
  # or: formula="~ age + sex", beta=c(0.01, -0.2)
)

effects$covariates must be a named list of numerics (e.g., list(age=0.01)), not a named vector created with c().
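As a rough illustration of how such an effects list maps to $\eta$, the base-R sketch below adds the intercept, the treatment term, and one term per named covariate. The data frame is made up, and the column name arm follows the simulated output shown in the worked examples.

```r
effects <- list(intercept = 0, treatment = -0.25,
                covariates = list(age = 0.01, sex = -0.2))

# Made-up rows with the columns the effects refer to.
df <- data.frame(arm = c(0, 1, 1), age = c(-0.4, 0.2, 1.4), sex = c(0, 1, 0))

# eta = intercept + treatment * Z + sum_j beta_j * X_j
eta <- effects$intercept + effects$treatment * df$arm
for (nm in names(effects$covariates)) {
  eta <- eta + effects$covariates[[nm]] * df[[nm]]
}
eta

# A check for the named-list requirement.
is.list(effects$covariates) &&
  all(vapply(effects$covariates, is.numeric, logical(1)))
is.list(c(age = 0.01))   # FALSE: a named vector is not a list
```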


Censoring

Two modes are supported:

| Mode | Fields | Semantics |
|---|---|---|
| "target_overall" | target, admin_time | Solver finds an exponential random-censoring rate $\lambda_c$ so that the overall censoring fraction $\approx$ target, subject to any administrative floor at admin_time. |
| "explicit" | Any of: administrative = list(time=...); random = list(dist="exponential", params=list(rate=...)); dependent = list(formula="~ ...", base=..., beta=c(...)) | Compose administrative, random, and covariate-dependent censoring directly. |

Examples

Target overall censoring:

cz_target <- list(mode="target_overall", target=0.25, admin_time=36)
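One way to picture the "target_overall" solver is a one-dimensional root search over the censoring rate. The sketch below is an assumption about the general idea, not the package's algorithm: it uses stand-in Weibull event times, fixes the random draws so the achieved fraction is monotone in the rate, and calls uniroot().

```r
set.seed(1)
n       <- 2000
t_event <- rweibull(n, shape = 1.3, scale = 12)   # stand-in event times
admin   <- 36
target  <- 0.25
e       <- rexp(n)                                # unit-rate draws, rescaled by 1 / rate

# Fraction censored when random censoring has rate `rate` plus the admin cut-off.
cens_frac <- function(rate) {
  c_time <- pmin(e / rate, admin)
  mean(c_time < t_event)
}

# Root of (achieved - target) on a wide bracket of candidate rates.
sol <- uniroot(function(r) cens_frac(r) - target, interval = c(1e-6, 10))
sol$root
cens_frac(sol$root)
```

Because the achieved fraction is computed on a finite sample, it lands near the target rather than exactly on it, which is consistent with the realized values reported in the worked examples.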

Explicit mix (admin + random):

cz_explicit <- list(
  mode = "explicit",
  administrative = list(time = 36),
  random = list(dist = "exponential", params = list(rate = 0.02))
)
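Composing administrative and random censoring amounts to taking the earliest of the competing times. A minimal base-R sketch with stand-in event times (not the package's code):

```r
set.seed(1)
n       <- 8
t_event <- rexp(n, rate = 0.05)           # stand-in event times
c_admin <- rep(36, n)                     # administrative = list(time = 36)
c_rand  <- rexp(n, rate = 0.02)           # random exponential censoring

c_time <- pmin(c_admin, c_rand)           # earliest censoring mechanism wins
time   <- pmin(t_event, c_time)           # observed follow-up
status <- as.integer(t_event <= c_time)   # 1 = event, 0 = censored

data.frame(time = round(time, 2), status)
mean(status == 0)                         # realized censoring fraction
```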

Worked examples

We now build full recipes and call simulate_from_recipe(). We report realized censoring via attr(dat, "achieved_censoring").

Example 1 — AFT Lognormal

covs1 <- list(
  list(name="age",   type="continuous",  dist="normal",     params=list(mean=62, sd=10),
       transform=c("center(60)","scale(10)")),
  list(name="sex",   type="categorical", dist="bernoulli",  params=list(p=0.45)),
  list(name="stage", type="categorical", dist="ordinal",
       params=list(prob=c(0.3,0.5,0.2), labels=c("I","II","III"))),
  list(name="x",     type="continuous",  dist="lognormal",  params=list(meanlog=0, sdlog=0.6))
)

rec1 <- list(
  n = 300,
  covariates = list(defs = covs1),
  treatment  = list(assignment="randomization", allocation="1:1"),
  event_time = list(model="aft_lognormal",
                    baseline=list(mu=3.0, sigma=0.6),
                    effects=list(intercept=0, treatment=-0.25,
                                 covariates=list(age=0.01, sex=-0.2, x=0.05))),
  censoring  = list(mode="target_overall", target=0.25, admin_time=36),
  seed = 11
)

dat1 <- simulate_from_recipe(validate_recipe(rec1))
head(dat1)
       time status arm        age sex stage         x
1 17.256803      1   0 -0.3910311   0    II 1.2192376
2 19.531621      1   1  0.2265944   1    II 0.6122083
3 11.902048      1   0 -1.3165531   0   III 1.2334828
4 18.770511      1   0 -1.1626533   0    II 0.5784534
5 16.584294      1   1  1.3784892   0    II 0.2619596
6  9.377759      1   1 -0.7341513   0    II 1.1455235
attr(dat1, "achieved_censoring")
[1] 0.23

Example 2 — AFT Weibull

rec2 <- rec1
rec2$event_time <- list(model="aft_weibull",
                        baseline=list(shape=1.3, scale=12),
                        effects=list(intercept=0, treatment=-0.20,
                                     covariates=list(age=0.008, x=0.04)))
dat2 <- simulate_from_recipe(validate_recipe(rec2), seed=12)
summary(dat2$time)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2221  3.4534  7.2184  8.8069 12.1700 36.0000 
attr(dat2, "achieved_censoring")
[1] 0.2466667

Example 3 — PH piecewise exponential (single segment)

rec3 <- list(
  n = 400,
  covariates = list(defs = covs1),
  treatment  = list(assignment="randomization", allocation="1:1"),
  event_time = list(model="ph_pwexp",
                    baseline=list(rates=c(0.05), cuts=numeric(0)),
                    effects=list(intercept=0, treatment=-0.3,
                                 covariates=list(age=0.01, x=0.03))),
  censoring  = list(mode="target_overall", target=0.20, admin_time=30),
  seed = 13
)
dat3 <- simulate_from_recipe(validate_recipe(rec3))

Example 3 summary

| Metric | Value |
|---|---|
| n | 400 |
| Events | 297 |
| Censoring rate | 0.258 |
| Mean time | 16.440 |
| Median time | 15.130 |
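Summary metrics of this kind can be derived from the time and status columns alone. The helper below is hypothetical (summarise_sim() is not a package function) and runs on a toy data frame; the last line checks the arithmetic linking the reported event count to the censoring rate.

```r
# Hypothetical helper: basic summaries from a simulated dataset.
summarise_sim <- function(dat) {
  c(n              = nrow(dat),
    events         = sum(dat$status == 1),
    censoring_rate = mean(dat$status == 0),
    mean_time      = mean(dat$time),
    median_time    = median(dat$time))
}

toy <- data.frame(time = c(5.2, 30, 12.1, 8.4), status = c(1, 0, 1, 1))
summarise_sim(toy)

1 - 297 / 400   # 0.2575, which rounds to the 0.258 reported above
```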

Example 4 — PH piecewise exponential (multi-segment)

rec4 <- list(
  n = 500,
  covariates = list(defs = list(
    list(name="age", type="continuous",  dist="normal",    params=list(mean=60, sd=8)),
    list(name="sex", type="categorical", dist="bernoulli", params=list(p=0.5)),
    list(name="x",   type="continuous",  dist="lognormal", params=list(meanlog=0, sdlog=0.5))
  )),
  treatment  = list(assignment="randomization", allocation="1:1"),
  event_time = list(model="ph_pwexp",
                    baseline=list(rates=c(0.10, 0.06, 0.03), cuts=c(6, 18)),
                    effects=list(intercept=0, treatment=-0.4,
                                 covariates=list(age=0.01, x=0.03))),
  censoring  = list(mode="target_overall", target=0.25, admin_time=30)
)
dat4 <- simulate_from_recipe(validate_recipe(rec4), seed=123)

Example 4 summary

| Metric | Value |
|---|---|
| n | 500 |
| Events | 367 |
| Censoring rate | 0.266 |
| Mean time | 5.640 |
| Median time | 3.560 |
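For intuition on the multi-segment engine, a time with piecewise-constant hazard $r_s \exp(\eta)$ can be drawn by inverting the cumulative hazard. The function rpwexp_one() below is a hypothetical base-R sketch of that standard technique, not the package's sampler, and it ignores censoring.

```r
# Draw one time from a piecewise-constant hazard by inverting the cumulative hazard.
# Assumes all rates are strictly positive.
rpwexp_one <- function(rates, cuts, eta = 0) {
  h      <- rates * exp(eta)                 # segment hazards r_s * exp(eta)
  starts <- c(0, cuts)                       # left edge of each segment
  widths <- c(diff(starts), Inf)             # last segment is open-ended
  target <- rexp(1)                          # cumulative hazard to reach, ~ Exp(1)
  for (s in seq_along(h)) {
    seg_haz <- h[s] * widths[s]              # hazard available in this segment
    if (target <= seg_haz) return(starts[s] + target / h[s])
    target <- target - seg_haz
  }
}

set.seed(1)
t_ctrl <- replicate(2000, rpwexp_one(c(0.10, 0.06, 0.03), c(6, 18), eta = 0))
t_trt  <- replicate(2000, rpwexp_one(c(0.10, 0.06, 0.03), c(6, 18), eta = -0.4))
c(control = median(t_ctrl), treated = median(t_trt))
```

With cuts = numeric(0) the loop has a single open-ended segment, so the draw reduces to an ordinary exponential time.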

Batch generation with metadata

For simulation studies, write multiple scenarios and formats together. The writer creates a manifest.rds with a list-column meta describing each dataset. The loader reattaches attributes when reading back.

base <- validate_recipe(rec2)

out_dir <- file.path(tempdir(), "rmstss-manifest-demo")
unlink(out_dir, recursive = TRUE, force = TRUE)

man <- generate_recipe_sets(
  base_recipe = base,
  vary = list(n = c(200, 400),
              "event_time.effects.treatment" = c(-0.15, -0.25)),
  out_dir  = out_dir,
  formats  = c("rds","csv"),
  n_reps   = 1,
  seed_base = 2025
)
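The vary argument pairs dotted paths with candidate values. The base-R sketch below shows one plausible reading: a full factorial grid, with each path written into a copy of the base recipe. set_path() is a hypothetical helper, and the package may enumerate scenarios differently.

```r
# Hypothetical helper: set a value in a nested list via a dotted path.
set_path <- function(x, path, value) {
  keys <- strsplit(path, ".", fixed = TRUE)[[1]]
  if (length(keys) == 1) {
    x[[keys]] <- value
  } else {
    x[[keys[1]]] <- set_path(x[[keys[1]]], paste(keys[-1], collapse = "."), value)
  }
  x
}

# Trimmed-down stand-in for a validated recipe.
base_rec <- list(n = 300,
                 event_time = list(model = "aft_weibull",
                                   effects = list(intercept = 0, treatment = -0.20)))
vary <- list(n = c(200, 400),
             "event_time.effects.treatment" = c(-0.15, -0.25))

grid <- expand.grid(vary, stringsAsFactors = FALSE)   # one row per scenario
scenarios <- lapply(seq_len(nrow(grid)), function(i) {
  rec <- base_rec
  for (p in names(grid)) rec <- set_path(rec, p, grid[[p]][i])
  rec
})
length(scenarios)   # 4
```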

# Inspect the first row's compact metadata (fields only; no file paths)
m <- readRDS(file.path(out_dir, "manifest.rds"))
names(m)
 [1] "scenario_id"                     "rep"                            
 [3] "seed"                            "achieved_censoring"             
 [5] "n"                               "file_txt"                       
 [7] "file_csv"                        "file_rds"                       
 [9] "file_rdata"                      "p__n"                           
[11] "p__event_time.effects.treatment" "meta"                           
if ("meta" %in% names(m) && length(m$meta[[1]]) > 0) {
  list(model = m$meta[[1]]$model,
       baseline = m$meta[[1]]$baseline,
       effects = m$meta[[1]]$effects,
       achieved_censoring = m$meta[[1]]$achieved_censoring,
       n = m$meta[[1]]$n)
} else {
  "Manifest is minimal (older run); use rebuild_manifest() to enrich."
}
$model
[1] "aft_weibull"

$baseline
$baseline$shape
[1] 1.3

$baseline$scale
[1] 12


$effects
$effects$intercept
[1] 0

$effects$treatment
[1] -0.15

$effects$covariates
$effects$covariates$age
[1] 0.008

$effects$covariates$x
[1] 0.04



$achieved_censoring
[1] 0.28

$n
[1] 200

# Load datasets back
sets <- load_recipe_sets(file.path(out_dir, "manifest.rds"))
attr(sets[[1]]$data, "achieved_censoring")
[1] 0.28
str(sets[[1]]$meta)
List of 19
 $ dataset_id        : chr "sc001_r01"
 $ scenario_id       : int 1
 $ rep               : int 1
 $ seed_used         : int 3026
 $ n                 : int 200
 $ n_treat           : int 110
 $ n_control         : int 90
 $ event_rate        : num 0.72
 $ achieved_censoring: num 0.28
 $ model             : chr "aft_weibull"
 $ baseline          :List of 2
  ..$ shape: num 1.3
  ..$ scale: num 12
 $ effects           :List of 3
  ..$ intercept : num 0
  ..$ treatment : num -0.15
  ..$ covariates:List of 2
  .. ..$ age: num 0.008
  .. ..$ x  : num 0.04
 $ treatment         :List of 2
  ..$ assignment: chr "randomization"
  ..$ allocation: chr "1:1"
 $ censoring         :List of 3
  ..$ mode      : chr "target_overall"
  ..$ target    : num 0.25
  ..$ admin_time: num 36
 $ covariates        :List of 4
  ..$ :List of 4
  .. ..$ name  : chr "age"
  .. ..$ type  : chr "continuous"
  .. ..$ dist  : chr "normal"
  .. ..$ params:List of 2
  .. .. ..$ mean: num 62
  .. .. ..$ sd  : num 10
  ..$ :List of 4
  .. ..$ name  : chr "sex"
  .. ..$ type  : chr "categorical"
  .. ..$ dist  : chr "bernoulli"
  .. ..$ params:List of 1
  .. .. ..$ p: num 0.45
  ..$ :List of 4
  .. ..$ name  : chr "stage"
  .. ..$ type  : chr "categorical"
  .. ..$ dist  : chr "ordinal"
  .. ..$ params:List of 2
  .. .. ..$ prob  : num [1:3] 0.3 0.5 0.2
  .. .. ..$ labels: chr [1:3] "I" "II" "III"
  ..$ :List of 4
  .. ..$ name  : chr "x"
  .. ..$ type  : chr "continuous"
  .. ..$ dist  : chr "lognormal"
  .. ..$ params:List of 2
  .. .. ..$ meanlog: num 0
  .. .. ..$ sdlog  : num 0.6
 $ allocation        : chr "1:1"
 $ params            :List of 2
  ..$ n                           : num 200
  ..$ event_time.effects.treatment: num -0.15
 $ files             :List of 4
  ..$ txt  : chr NA
  ..$ csv  : chr "/tmp/Rtmp2Ikrna/rmstss-manifest-demo/sc1_r1.csv"
  ..$ rds  : chr "/tmp/Rtmp2Ikrna/rmstss-manifest-demo/sc1_r1.rds"
  ..$ rdata: chr NA
 $ created_at        : chr "2025-09-06 20:33:01.183117"
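Reattaching attributes matters mostly for text formats, since a CSV file cannot carry R attributes. The sketch below illustrates the idea with base R on a toy dataset; the meta list here is a stand-in, and this is not the implementation of load_recipe_sets().

```r
dat  <- data.frame(time = c(5.2, 30, 12.1), status = c(1, 0, 1))
meta <- list(achieved_censoring = mean(dat$status == 0), model = "aft_weibull")

f <- tempfile(fileext = ".csv")
write.csv(dat, f, row.names = FALSE)

back <- read.csv(f)
attr(back, "achieved_censoring")                       # NULL: CSV carries no attributes
attr(back, "achieved_censoring") <- meta$achieved_censoring
attr(back, "achieved_censoring")
```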

Reproducibility tips

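Seeds appear at every level in the examples above: as the seed field of a recipe (rec1, rec3), as the seed argument of simulate_from_recipe() (Examples 2 and 4), and as seed_base in generate_recipe_sets(). The manifest stores the seed for each dataset (the seed column, and seed_used in meta), so a run can be traced back to the value that produced it.

That's it. You now have the moving parts to define covariates, choose an event-time engine, specify censoring, simulate data, and (optionally) batch-create scenarios with compact metadata for downstream analysis.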