What this covers
We show how to construct a recipe (covariates → treatment → event time → censoring), and then simulate datasets with simulate_from_recipe(). We also show batch generation (generate_recipe_sets()) and how to read compact metadata back (load_recipe_sets()), without any platform-specific tricks.
The simulator takes a plain list as the recipe. No YAML is required.
Covariates
Each covariate has a name, type ("continuous" or "categorical"), a dist with params, and optional transform steps applied after generation ("center(a)", "scale(b)").
Available distributions
Type | dist (string) | Parameters |
---|---|---|
continuous | normal | mean, sd |
continuous | lognormal | meanlog, sdlog |
continuous | gamma | shape, scale |
continuous | weibull | shape, scale |
continuous | uniform | min, max |
continuous | beta | shape1, shape2 |
continuous | t | df |
categorical | bernoulli | p (probability of 1) |
categorical | categorical | prob = c(...), labels = c(...) (optional) |
categorical | ordinal | prob = c(...), labels = c(...) (optional, ordered) |
Example: define covariates
covs <- list(
list(name="age", type="continuous", dist="normal", params=list(mean=62, sd=10),
transform=c("center(60)","scale(10)")),
list(name="sex", type="categorical", dist="bernoulli", params=list(p=0.45)),
list(name="stage", type="categorical", dist="ordinal",
params=list(prob=c(0.3,0.5,0.2), labels=c("I","II","III"))),
list(name="x", type="continuous", dist="lognormal", params=list(meanlog=0, sdlog=0.6))
)
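As a rough illustration of what these definitions describe, the base-R sketch below draws age and stage by hand. It assumes that center(a) subtracts a and scale(b) divides by b, applied in that order, and that ordinal covariates become ordered factors. This is our reading of the recipe fields, not the package's internal code.

```r
set.seed(1)
n <- 1000

# age: normal(mean = 62, sd = 10), then center(60) and scale(10)
# (assumed meaning: subtract 60, then divide by 10)
age_raw <- rnorm(n, mean = 62, sd = 10)
age <- (age_raw - 60) / 10

# stage: ordinal with three labelled levels, stored as an ordered factor
stage <- factor(
  sample(c("I", "II", "III"), size = n, replace = TRUE, prob = c(0.3, 0.5, 0.2)),
  levels = c("I", "II", "III"),
  ordered = TRUE
)

round(c(mean = mean(age), sd = sd(age)), 2)
prop.table(table(stage))
```

Under these assumptions the transformed age has mean near 0.2 and standard deviation near 1, which is in line with the age values shown in the worked examples.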
Treatment
Choose one assignment:
Assignment | Key fields | Meaning |
---|---|---|
"randomization" | allocation = "a:b" | Bernoulli with the probability implied by the a:b allocation ratio. |
"stratified" | allocation, stratify_by = c("...") | Same allocation within each stratum defined by the listed categorical covariates. |
"logistic_ps" | ps_model = list(formula = "~ ...", beta = c(...)) | Treatment probability is $\operatorname{logit}^{-1}(X\beta)$ from the user model. Provide explicit beta to avoid parsing edge-cases. |
Examples
Randomization:
tr_rand <- list(assignment="randomization", allocation="1:1")
Stratified by "stage":
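A minimal sketch of a stratified specification, built only from the key fields listed in the table above. The object name tr_strat and the 1:1 allocation are our assumptions.

```r
# Stratified assignment: same allocation within each level of "stage".
# Field names follow the table above; the values are illustrative.
tr_strat <- list(
  assignment  = "stratified",
  allocation  = "1:1",
  stratify_by = c("stage")
)
str(tr_strat)
```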
Logistic propensity:
tr_ps <- list(
assignment = "logistic_ps",
ps_model = list(
formula = "~ 1 + x + sex",
beta = c(-0.3, 1.2, -0.6) # (Intercept), x, sex
)
)
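To see why the order of beta matters, the sketch below evaluates a logistic propensity model by hand in base R. It assumes that beta is matched positionally to the columns of model.matrix() for the formula; the data frame d and the variable names are hypothetical, and the package may build its design matrix differently.

```r
set.seed(2)
n <- 500

# Hypothetical covariates matching the formula "~ 1 + x + sex"
d <- data.frame(
  x   = rlnorm(n, meanlog = 0, sdlog = 0.6),
  sex = rbinom(n, size = 1, prob = 0.45)
)

X    <- model.matrix(~ 1 + x + sex, data = d)
beta <- c(-0.3, 1.2, -0.6)             # (Intercept), x, sex
stopifnot(ncol(X) == length(beta))     # one coefficient per design column

ps  <- plogis(drop(X %*% beta))        # inverse logit of the linear score
arm <- rbinom(n, size = 1, prob = ps)  # Bernoulli treatment draw

colnames(X)
summary(ps)
```

Printing colnames(X) before fixing beta is a cheap way to confirm which coefficient lines up with which term.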
Event-time engines
Let $A$ be treatment (0/1), $X$ be covariates, and $\eta$ be the linear predictor (defined in effects, below). Supported engines and baseline parameterizations:
Model (user-facing) | model value | Baseline parameters | Notes |
---|---|---|---|
AFT Lognormal | "aft_lognormal" | mu, sigma | $\log T = \mu + \eta + \sigma\varepsilon$, $\varepsilon \sim N(0, 1)$. |
AFT Weibull | "aft_weibull" | shape, scale | Weibull baseline; AFT shift via $\eta$ on the log-time scale. |
AFT Log-Logistic | "aft_loglogistic" | shape, scale | Log-logistic baseline; AFT shift via $\eta$ on the log-time scale. |
PH Exponential | "ph_pwexp" | rates = c(λ), cuts = numeric(0) | Piecewise-Exp with a single segment is exponential. |
PH Weibull | "ph_weibull" | shape, scale | Proportional hazards with Weibull baseline. |
PH Gompertz | "ph_gompertz" | rate, gamma | Hazard $h_0(t) = \text{rate} \cdot \exp(\gamma t)$. |
PH Piecewise Exponential | "ph_pwexp" | rates = c(r1,r2,...), cuts = c(c1,c2,...) | Rate in segment $j$ is $r_j$. |
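For intuition about the piecewise-exponential engine, the sketch below draws event times by inverting the baseline cumulative hazard. The helper names inv_cumhaz() and rpwexp_sketch() are hypothetical, and we assume the proportional-hazards form in which the hazard is multiplied by exp(eta). It is a textbook construction, not necessarily how the package draws times.

```r
# Invert the baseline cumulative hazard H0 of a piecewise-constant hazard.
# rates[j] applies on the j-th interval defined by c(0, cuts, Inf).
inv_cumhaz <- function(target, rates, cuts = numeric(0)) {
  stopifnot(length(rates) == length(cuts) + 1)
  starts  <- c(0, cuts)
  # H0 evaluated at the start of each segment
  H_start <- c(0, cumsum(rates[-length(rates)] * diff(starts)))
  seg     <- findInterval(target, H_start)
  starts[seg] + (target - H_start[seg]) / rates[seg]
}

# Under PH, H0(T) * exp(eta) is unit exponential, so T = H0^{-1}(E / exp(eta)).
rpwexp_sketch <- function(n, rates, cuts = numeric(0), eta = 0) {
  inv_cumhaz(rexp(n) / exp(rep_len(eta, n)), rates, cuts)
}

set.seed(4)
t_single <- rpwexp_sketch(20000, rates = 0.05)                 # plain exponential
t_multi  <- rpwexp_sketch(20000, rates = c(0.10, 0.06, 0.03),
                          cuts = c(6, 18), eta = -0.4)
c(mean_single = mean(t_single), median_multi = median(t_multi))
```

With a single rate and no cuts the draw reduces to an exponential with mean 1 / rate, which matches the "PH Exponential" row of the table.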
Effects and linear predictor
Specify effects on the appropriate scale (AFT: log-time; PH: log-hazard):
effects = list(
intercept = 0, # default is 0
treatment = -0.25,
covariates = list(age = 0.01, sex = -0.2) # NOTE: named LIST
# or: formula="~ age + sex", beta=c(0.01, -0.2)
)
effects$covariates must be a named list of numerics (e.g., list(age=0.01)), not a named vector created with c().
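The sketch below shows how such an effects list could be turned into a linear predictor by hand. We assume the predictor is the intercept plus the treatment effect times the arm indicator plus the sum of covariate effects times covariate values; the small data frame d is made up for illustration.

```r
effects <- list(
  intercept  = 0,
  treatment  = -0.25,
  covariates = list(age = 0.01, sex = -0.2)   # named LIST, as required
)

# Hypothetical subjects: arm is the 0/1 treatment indicator
d <- data.frame(arm = c(0, 1, 1), age = c(-0.4, 0.2, 1.4), sex = c(0, 1, 0))

# Assumed form: eta = intercept + treatment * arm + sum_k beta_k * x_k
eta <- effects$intercept + effects$treatment * d$arm
for (nm in names(effects$covariates)) {
  eta <- eta + effects$covariates[[nm]] * d[[nm]]
}
eta
```

For an AFT engine this value shifts log-time; for a PH engine it shifts the log-hazard.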
Censoring
Two modes are supported:
Mode | Fields | Semantics |
---|---|---|
"target_overall" | target, admin_time | Solver finds an exponential random-censoring rate so that the overall censoring fraction ≈ target, subject to any administrative floor at admin_time. |
"explicit" | Any of: administrative = list(time=...); random = list(dist="exponential", params=list(rate=...)); dependent = list(formula="~ ...", base=..., beta=c(...)) | Compose administrative, random, and covariate-dependent censoring directly. |
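To make the "target_overall" idea concrete, the sketch below searches for an exponential censoring rate with uniroot() so that the empirical censoring fraction lands near the target. The event-time distribution, the helper censor_frac(), and the search interval are our assumptions; the package's solver may work differently.

```r
set.seed(1)
n          <- 2000
target     <- 0.25
admin_time <- 36

# Hypothetical event times; any engine's draws could be used here
event_time <- rweibull(n, shape = 1.3, scale = 12)

# Fixed uniforms make the censoring fraction a deterministic function of rate
u <- runif(n)
censor_frac <- function(rate) {
  cens_time <- pmin(-log(u) / rate, admin_time)  # exponential draw, capped at admin_time
  mean(cens_time < event_time)
}

# Solve censor_frac(rate) = target on a bracketing interval
sol      <- uniroot(function(r) censor_frac(r) - target,
                    interval = c(1e-6, 5), tol = 1e-8)
rate_hat <- sol$root
achieved <- censor_frac(rate_hat)
c(rate = rate_hat, achieved = achieved)
```

Administrative censoring alone already censors some subjects, so a target below that floor cannot be reached by lowering the random rate.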
Examples
Target overall censoring:
cz_target <- list(mode="target_overall", target=0.25, admin_time=36)
Explicit mix (admin + random):
cz_explicit <- list(
mode = "explicit",
administrative = list(time = 36),
random = list(dist = "exponential", params = list(rate = 0.02))
)
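The explicit mix above can be mimicked in base R as follows. This is a sketch assuming that the observed time is the minimum of the event time, the random censoring time, and the administrative time; the event-time rate of 0.05 is hypothetical.

```r
set.seed(3)
n <- 1000

event_time  <- rexp(n, rate = 0.05)      # hypothetical event times
random_cens <- rexp(n, rate = 0.02)      # random = exponential(rate = 0.02)
cens_time   <- pmin(random_cens, 36)     # administrative = list(time = 36)

time   <- pmin(event_time, cens_time)            # observed follow-up time
status <- as.integer(event_time <= cens_time)    # 1 = event, 0 = censored

achieved_censoring <- mean(status == 0)
achieved_censoring
```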
Worked examples
We now build full recipes and call simulate_from_recipe(). We report realized censoring via attr(dat, "achieved_censoring").
Example 1 — AFT Lognormal
covs1 <- list(
list(name="age", type="continuous", dist="normal", params=list(mean=62, sd=10),
transform=c("center(60)","scale(10)")),
list(name="sex", type="categorical", dist="bernoulli", params=list(p=0.45)),
list(name="stage", type="categorical", dist="ordinal",
params=list(prob=c(0.3,0.5,0.2), labels=c("I","II","III"))),
list(name="x", type="continuous", dist="lognormal", params=list(meanlog=0, sdlog=0.6))
)
rec1 <- list(
n = 300,
covariates = list(defs = covs1),
treatment = list(assignment="randomization", allocation="1:1"),
event_time = list(model="aft_lognormal",
baseline=list(mu=3.0, sigma=0.6),
effects=list(intercept=0, treatment=-0.25,
covariates=list(age=0.01, sex=-0.2, x=0.05))),
censoring = list(mode="target_overall", target=0.25, admin_time=36),
seed = 11
)
dat1 <- simulate_from_recipe(validate_recipe(rec1))
head(dat1)
time status arm age sex stage x
1 17.256803 1 0 -0.3910311 0 II 1.2192376
2 19.531621 1 1 0.2265944 1 II 0.6122083
3 11.902048 1 0 -1.3165531 0 III 1.2334828
4 18.770511 1 0 -1.1626533 0 II 0.5784534
5 16.584294 1 1 1.3784892 0 II 0.2619596
6 9.377759 1 1 -0.7341513 0 II 1.1455235
attr(dat1, "achieved_censoring")
[1] 0.23
Example 2 — AFT Weibull
rec2 <- rec1
rec2$event_time <- list(model="aft_weibull",
baseline=list(shape=1.3, scale=12),
effects=list(intercept=0, treatment=-0.20,
covariates=list(age=0.008, x=0.04)))
dat2 <- simulate_from_recipe(validate_recipe(rec2), seed=12)
summary(dat2$time)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2221 3.4534 7.2184 8.8069 12.1700 36.0000
attr(dat2, "achieved_censoring")
[1] 0.2466667
Example 3 — PH piecewise exponential (single segment)
rec3 <- list(
n = 400,
covariates = list(defs = covs1),
treatment = list(assignment="randomization", allocation="1:1"),
event_time = list(model="ph_pwexp",
baseline=list(rates=c(0.05), cuts=numeric(0)),
effects=list(intercept=0, treatment=-0.3,
covariates=list(age=0.01, x=0.03))),
censoring = list(mode="target_overall", target=0.20, admin_time=30),
seed = 13
)
dat3 <- simulate_from_recipe(validate_recipe(rec3))
Metric | Value |
---|---|
n | 400.000 |
Events | 297.000 |
Censoring rate | 0.258 |
Mean time | 16.440 |
Median time | 15.130 |
Example 4 — PH piecewise exponential (multi-segment)
rec4 <- list(
n = 500,
covariates = list(defs = list(
list(name="age", type="continuous", dist="normal", params=list(mean=60, sd=8)),
list(name="sex", type="categorical", dist="bernoulli", params=list(p=0.5)),
list(name="x", type="continuous", dist="lognormal", params=list(meanlog=0, sdlog=0.5))
)),
treatment = list(assignment="randomization", allocation="1:1"),
event_time = list(model="ph_pwexp",
baseline=list(rates=c(0.10, 0.06, 0.03), cuts=c(6, 18)),
effects=list(intercept=0, treatment=-0.4,
covariates=list(age=0.01, x=0.03))),
censoring = list(mode="target_overall", target=0.25, admin_time=30)
)
dat4 <- simulate_from_recipe(validate_recipe(rec4), seed=123)
Metric | Value |
---|---|
n | 500.000 |
Events | 367.000 |
Censoring rate | 0.266 |
Mean time | 5.640 |
Median time | 3.560 |
Batch generation with metadata
For simulation studies, write multiple scenarios and formats together. The writer creates a manifest.rds with a list-column meta describing each dataset. The loader reattaches attributes when reading back.
base <- validate_recipe(rec2)
out_dir <- file.path(tempdir(), "rmstss-manifest-demo")
unlink(out_dir, recursive = TRUE, force = TRUE)
man <- generate_recipe_sets(
base_recipe = base,
vary = list(n = c(200, 400),
"event_time.effects.treatment" = c(-0.15, -0.25)),
out_dir = out_dir,
formats = c("rds","csv"),
n_reps = 1,
seed_base = 2025
)
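The vary argument addresses nested recipe fields with dotted names. The sketch below shows one way such a grid could be expanded into per-scenario recipes. The helper set_path(), the stripped-down base_sketch recipe, and the full-factorial expansion are assumptions for illustration, not the package's documented behaviour.

```r
# Stripped-down stand-in for a recipe (hypothetical)
base_sketch <- list(n = 300,
                    event_time = list(effects = list(treatment = -0.20)))
vary <- list(n = c(200, 400),
             "event_time.effects.treatment" = c(-0.15, -0.25))

# Recursively set a nested list element addressed by a vector of keys
set_path <- function(x, keys, value) {
  if (length(keys) == 1) {
    x[[keys]] <- value
    return(x)
  }
  x[[keys[1]]] <- set_path(x[[keys[1]]], keys[-1], value)
  x
}

# Assumed: every combination of the varied values is one scenario
grid <- expand.grid(vary, stringsAsFactors = FALSE)

recipes <- lapply(seq_len(nrow(grid)), function(i) {
  rec <- base_sketch
  for (path in names(grid)) {
    keys <- strsplit(path, ".", fixed = TRUE)[[1]]
    rec  <- set_path(rec, keys, grid[[path]][i])
  }
  rec
})
length(recipes)
```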
# Inspect the first row's compact metadata (fields only; no file paths)
m <- readRDS(file.path(out_dir, "manifest.rds"))
names(m)
[1] "scenario_id" "rep"
[3] "seed" "achieved_censoring"
[5] "n" "file_txt"
[7] "file_csv" "file_rds"
[9] "file_rdata" "p__n"
[11] "p__event_time.effects.treatment" "meta"
if ("meta" %in% names(m) && length(m$meta[[1]]) > 0) {
list(model = m$meta[[1]]$model,
baseline = m$meta[[1]]$baseline,
effects = m$meta[[1]]$effects,
achieved_censoring = m$meta[[1]]$achieved_censoring,
n = m$meta[[1]]$n)
} else {
"Manifest is minimal (older run); use rebuild_manifest() to enrich."
}
$model
[1] "aft_weibull"
$baseline
$baseline$shape
[1] 1.3
$baseline$scale
[1] 12
$effects
$effects$intercept
[1] 0
$effects$treatment
[1] -0.15
$effects$covariates
$effects$covariates$age
[1] 0.008
$effects$covariates$x
[1] 0.04
$achieved_censoring
[1] 0.28
$n
[1] 200
# Load datasets back
sets <- load_recipe_sets(file.path(out_dir, "manifest.rds"))
attr(sets[[1]]$data, "achieved_censoring")
[1] 0.28
str(sets[[1]]$meta)
List of 19
$ dataset_id : chr "sc001_r01"
$ scenario_id : int 1
$ rep : int 1
$ seed_used : int 3026
$ n : int 200
$ n_treat : int 110
$ n_control : int 90
$ event_rate : num 0.72
$ achieved_censoring: num 0.28
$ model : chr "aft_weibull"
$ baseline :List of 2
..$ shape: num 1.3
..$ scale: num 12
$ effects :List of 3
..$ intercept : num 0
..$ treatment : num -0.15
..$ covariates:List of 2
.. ..$ age: num 0.008
.. ..$ x : num 0.04
$ treatment :List of 2
..$ assignment: chr "randomization"
..$ allocation: chr "1:1"
$ censoring :List of 3
..$ mode : chr "target_overall"
..$ target : num 0.25
..$ admin_time: num 36
$ covariates :List of 4
..$ :List of 4
.. ..$ name : chr "age"
.. ..$ type : chr "continuous"
.. ..$ dist : chr "normal"
.. ..$ params:List of 2
.. .. ..$ mean: num 62
.. .. ..$ sd : num 10
..$ :List of 4
.. ..$ name : chr "sex"
.. ..$ type : chr "categorical"
.. ..$ dist : chr "bernoulli"
.. ..$ params:List of 1
.. .. ..$ p: num 0.45
..$ :List of 4
.. ..$ name : chr "stage"
.. ..$ type : chr "categorical"
.. ..$ dist : chr "ordinal"
.. ..$ params:List of 2
.. .. ..$ prob : num [1:3] 0.3 0.5 0.2
.. .. ..$ labels: chr [1:3] "I" "II" "III"
..$ :List of 4
.. ..$ name : chr "x"
.. ..$ type : chr "continuous"
.. ..$ dist : chr "lognormal"
.. ..$ params:List of 2
.. .. ..$ meanlog: num 0
.. .. ..$ sdlog : num 0.6
$ allocation : chr "1:1"
$ params :List of 2
..$ n : num 200
..$ event_time.effects.treatment: num -0.15
$ files :List of 4
..$ txt : chr NA
..$ csv : chr "/tmp/Rtmp2Ikrna/rmstss-manifest-demo/sc1_r1.csv"
..$ rds : chr "/tmp/Rtmp2Ikrna/rmstss-manifest-demo/sc1_r1.rds"
..$ rdata: chr NA
$ created_at : chr "2025-09-06 20:33:01.183117"
Reproducibility tips
- Set seed in the recipe or pass seed= to simulate_from_recipe().
- For grids, fix a deterministic scheme like seed_base + scenario_id*1000 + rep (this is what generate_recipe_sets() does).
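The seed scheme is easy to reproduce by hand. The sketch below tabulates it for a small grid; the grid size is arbitrary, and seed_base = 2025 matches the batch example above.

```r
seed_base <- 2025

# Hypothetical grid: 4 scenarios, 2 replicates each
seeds <- expand.grid(scenario_id = 1:4, rep = 1:2)
seeds$seed <- seed_base + seeds$scenario_id * 1000 + seeds$rep

seeds[order(seeds$scenario_id, seeds$rep), ]
```

For scenario 1, replicate 1 this gives 3026, the seed_used value recorded in the metadata above. Seeds stay distinct as long as there are fewer than 1000 replicates per scenario.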
That's it: you now have the moving parts to define covariates, choose an event-time engine, specify censoring, simulate data, and (optionally) batch-create scenarios with compact metadata for downstream analysis.