Ensemble simulations
ensemble.Rmd
After having built a stock-and-flow model, you may want to explore how different parameter values affect the model’s behaviour. Running multiple simulations with varying parameters is also called an ensemble, which provides insight into the range of possible outcomes and uncertainty associated with your model. In this vignette, we will explore how to set up and run ensemble simulations using the sdbuildR package.
Setting up the model
For this example, we will use Crielaard et al.’s (2022) model of eating behaviour, including the stocks hunger, eating, and compensatory behaviour (i.e. disordered eating behaviour such as purging and overexercising). For more details, see Crielaard et al. (2022). We can load this example from the model library and look what is inside:
sfm <- xmile("Crielaard2022")
summary(sfm)
#> Your model contains:
#> * 3 Stocks: Food_intake, Hunger, Compensatory_behaviour
#> * 8 Flows: Losing_energy_by_compensatory_behavior, Feeling_hunger, Satiety, Food_intake_reduces_hunger, Compensating_for_having_eaten, Satisfaction_with_hungry_feeling, Effect_of_eating_triggers, Effect_of_compensatory_behavior
#> * 3 Constants: a0, a1, a2
#> * 0 Auxiliaries
#> * 0 Graphical Functions
#> * 0 Custom model units
#> * 0 Macros
#>
#> Simulation time: 0.0 to 100.0 days (dt = 0.01)
#> Simulation settings: solver euler in R
Ensemble simulations are only supported in Julia, so we need to load the Julia environment and change the simulation language to Julia:
use_julia()
#> Starting Julia ...
#> Connecting to Julia TCP server at localhost:11980 ...
#> Setting up Julia environment for sdbuildR...
sfm <- sim_specs(sfm, language = "Julia")
Without changing the parameters, we can run a single simulation to see how the model behaves:
As the model has random initial conditions, another run will be different:
To explore this more systematically, we can run an ensemble
simulation using the ensemble()
function.
Running ensemble simulations
Ensemble simulations create multiple runs of the model, which only makes sense if the model either has some random elements or if parameters are being varied. Our model already has random initial conditions, but if it had not, we could create these:
With random initial conditions, multiple runs of the same model will be different. As running ensemble simulations can be quite memory intensive, it is highly recommended to reduce the size of the returned timeseries. This will save memory and speed up the simulation. For example, we can only save the timeseries every 1 time units, and only save from time 10:
sfm <- sim_specs(sfm, save_at = 1, save_from = 10)
The model is now ready for running ensemble simulations. We complete
100 runs using the ensemble()
function:
sims <- ensemble(sfm, n = 100)
#> Running a total of 100 simulations
#> Simulation took 4.6642 seconds
plot(sims)
The plot shows the mean and confidence interval of the stocks (mean
with 95% confidence interval). We can also plot the individual runs, for
which we first have to rerun the simulation and set
return_sims = TRUE
:
sims <- ensemble(sfm, n = 30, return_sims = TRUE)
#> Running a total of 30 simulations
#> Simulation took 0.641 seconds
plot(sims, type = "sims")
This automatically only plots the first ten simulations, as plotting
a large number of simulations can be quite slow. We can change which
simulations we plot by specifying the i
argument:
plot(sims, type = "sims", i = 15:30)
By default, only the stocks are saved. To save all variables, set
only_stocks = FALSE
:
sims <- ensemble(sfm, n = 100, only_stocks = FALSE)
#> Running a total of 100 simulations
#> Simulation took 2.1264 seconds
plot(sims)
Parallel simulations
To run simulations in parallel, set the number of threads you would like to use:
use_threads(n = 4)
#> Warning in use_threads(n = 4): n is set to 4, which is higher than the number
#> of available cores minus 1. Setting it to 3.
To stop using parallel simulations, run
use_threads(stop = TRUE)
.
Specifying ranges
Instead of generating an ensemble with random initial conditions, we can also specify ensembles with exact parameter values. For example, we could vary the parameter, which determines how strongly having eaten increases compensatory behaviour.
sims <- ensemble(sfm,
n = 100,
range = list("a2" = c(0.2, 0.4, 0.6, 0.8))
)
#> Running a total of 400 simulations for 4 conditions (100 simulations per condition)
#> Simulation took 2.2523 seconds
plot(sims)
We can also vary multiple parameters at once. For example, we can
vary both
and
,
where the latter influences how strongly food intake leads to more food
intake. n
now specifies the number of simulations per
condition. By default, cross = TRUE
, which means that all
possible combinations of parameters are simulated.
sims <- ensemble(sfm,
range = list(
"a2" = c(0.2, 0.8),
"a1" = c(1.3, 1.5)
),
n = 100
)
#> Running a total of 400 simulations for 4 conditions (100 simulations per condition)
#> Simulation took 0.9372 seconds
plot(sims)
The plot shows similarity within columns but differences between
columns. As
differs between columns, it appears that
has a larger effect than
.
To view the parameter combination corresponding to each condition
,
view conditions
in sims
:
sims$conditions
#> j a1 a2
#> [1,] 1 1.3 0.2
#> [2,] 2 1.5 0.2
#> [3,] 3 1.3 0.8
#> [4,] 4 1.5 0.8
To generate a non-crossed designed, set cross = FALSE
.
In this case, the length of each range needs to be the same.
sims <- ensemble(sfm,
range = list(
"a2" = c(0.4, 0.5, 0.6),
"a1" = c(1.3, 1.4, 1.5)
),
n = 100, cross = FALSE, return_sims = TRUE
)
#> Running a total of 300 simulations for 3 conditions (100 simulations per condition)
#> Simulation took 0.7381 seconds
plot(sims, nrows = 1)
We can select specific conditions to compare, where here we plot the first 5 simulations of the first two conditions:
plot(sims, i = 1:15, j = 1:2, type = "sims", nrows = 1)
Accessing simulation results
The results of the ensemble simulation are stored in the
sims
object. You can access the summary statistics per
condition
and per timepoint, such as the mean and confidence intervals, using:
head(sims$summary)
#> j time variable mean median variance
#> 1 1 10 Compensatory_behaviour 0.49752342 0.4925661167 0.0050657405
#> 2 1 10 Food_intake 0.01127093 0.0006806108 0.0009954441
#> 3 1 10 Hunger 0.88038502 0.8753712340 0.0027987819
#> 4 1 11 Compensatory_behaviour 0.47184687 0.4676526175 0.0030850559
#> 5 1 11 Food_intake 0.01032006 0.0004710780 0.0008999066
#> 6 1 11 Hunger 0.89699247 0.8917484667 0.0018608474
#> missing_count q025 q975
#> 1 0 3.969197e-01 0.65575622
#> 2 0 8.343609e-06 0.09740325
#> 3 0 8.022650e-01 0.98394813
#> 4 0 3.956835e-01 0.58339579
#> 5 0 6.436691e-06 0.09190464
#> 6 0 8.340815e-01 0.98500971
If you have set have set return_sims = TRUE
, you can
find the individual simulation runs in sims$df
. These are
in long format, containing the value of each variable for each timepoint
in each simulation
per condition
.
head(sims$df)
#> j i time variable value
#> 1 1 1 10 Compensatory_behaviour 0.484402052
#> 2 1 1 10 Food_intake 0.010087221
#> 3 1 1 10 Hunger 0.851484951
#> 4 1 1 11 Compensatory_behaviour 0.467650293
#> 5 1 1 11 Food_intake 0.007535658
#> 6 1 1 11 Hunger 0.870943886
To view the initial values of each simulation per condition , run:
head(sims$init$df)
#> j i variable value
#> 1 1 1 Compensatory_behaviour 0.30478256
#> 2 1 1 Food_intake 0.55435115
#> 3 1 1 Hunger 0.83209859
#> 4 1 2 Compensatory_behaviour 0.01156185
#> 5 1 2 Food_intake 0.89676911
#> 6 1 2 Hunger 0.19775285
To view their summary statistics, run:
head(sims$init$summary)
#> j variable mean median variance missing_count
#> 1 1 Compensatory_behaviour 0.4790821 0.4860786 0.08572806 0
#> 2 1 Food_intake 0.5213652 0.5095480 0.07865034 0
#> 3 1 Hunger 0.5321854 0.5321732 0.08626045 0
#> 4 2 Compensatory_behaviour 0.4854778 0.4664117 0.08341138 0
#> 5 2 Food_intake 0.5067007 0.5286892 0.08246959 0
#> 6 2 Hunger 0.4842567 0.4493411 0.07354595 0
#> q025 q975
#> 1 0.02679849 0.9649624
#> 2 0.02461372 0.9739386
#> 3 0.02029469 0.9871011
#> 4 0.03805743 0.9849021
#> 5 0.04430578 0.9625319
#> 6 0.06109075 0.9834187
Finally, to access the parameters (i.e. constants) of each simulation per condition , run:
head(sims$constants$df)
#> j i variable value
#> 1 1 1 a0 1.31
#> 2 1 1 a1 1.30
#> 3 1 1 a2 0.40
#> 4 1 2 a0 1.31
#> 5 1 2 a1 1.30
#> 6 1 2 a2 0.40
To view their summary statistics, run:
head(sims$constants$summary)
#> j variable mean median variance missing_count q025 q975
#> 1 1 a0 1.31 1.31 0.000000e+00 0 1.31 1.31
#> 2 1 a1 1.30 1.30 4.980182e-32 0 1.30 1.30
#> 3 1 a2 0.40 0.40 0.000000e+00 0 0.40 0.40
#> 4 2 a0 1.31 1.31 0.000000e+00 0 1.31 1.31
#> 5 2 a1 1.40 1.40 0.000000e+00 0 1.40 1.40
#> 6 2 a2 0.50 0.50 0.000000e+00 0 0.50 0.50