frogger

Project Status: Concept. Minimal or no implementation has been done yet, or the repository is only intended to be a limited example, demo, or proof-of-concept.

Leapfrog is a multistate population projection model for demographic and HIV epidemic estimation.

The name leapfrog is in honor of Professor Basia Zaba.

Installation

You can install the development version of frogger from GitHub with:

# install.packages("remotes")
remotes::install_github("mrc-ide/frogger")

Simulation model

The simulation model is implemented in a header-only C++ library located in inst/include/frogger.hpp. This location allows other R packages to import the C++ code by specifying LinkingTo: frogger in their DESCRIPTION file.

The simulation model is callable in R via a wrapper function run_model() created with Rcpp.

You can control how the simulation model is run with the following arguments:

  • hiv_age_stratification which must be “coarse” or “full”. “coarse” runs the model with 5-year age groups and “full” with single-year ages.
  • run_child_model which is FALSE by default. Set to TRUE to run the paediatric portion of the model.

Example

The file pjnz/bwa_aim-adult-art-no-special-elig_v6.13_2022-04-18.PJNZ contains an example Spectrum file constructed from default country data for Botswana with Spectrum (April 2022).

Prepare model inputs.

library(frogger)

pjnz <- system.file("pjnz/bwa_aim-adult-art-no-special-elig_v6.13_2022-04-18.PJNZ", 
                    package = "frogger", mustWork = TRUE)

demp <- prepare_leapfrog_demp(pjnz)
hivp <- prepare_leapfrog_projp(pjnz)

Simulate adult ‘full’ age group (single-year age) and ‘coarse’ age group (collapsed age groups) models from 1970 to 2030 with 10 HIV time steps per year.

lsimF <- run_model(demp, hivp, 1970:2030, 10L, 
                   hiv_age_stratification = "full", run_child_model = FALSE)
lsimC <- run_model(demp, hivp, 1970:2030, 10L, 
                   hiv_age_stratification = "coarse", run_child_model = FALSE)

Compare HIV prevalence at ages 15-49 years and AIDS deaths at ages 50+ years. Deaths at 50+ years are included because they show a noticeable divergence between the "full" and "coarse" age group simulations.

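# Rows index single-year ages with age 0 in row 1, so rows 16:50 are ages 15-49
# and rows 51:81 are ages 50 and over.
# colSums(x, dims = 2) sums over the first two dimensions, giving one value per year.
prevF <- colSums(lsimF$p_hiv_pop[16:50, , ], dims = 2) / colSums(lsimF$p_total_pop[16:50, , ], dims = 2)
prevC <- colSums(lsimC$p_hiv_pop[16:50, , ], dims = 2) / colSums(lsimC$p_total_pop[16:50, , ], dims = 2)

deathsF <- colSums(lsimF$p_hiv_deaths[51:81, , ], dims = 2)
deathsC <- colSums(lsimC$p_hiv_deaths[51:81, , ], dims = 2)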

plot(1970:2030, prevF, type = "l", main = "Prevalence 15-49")
lines(1970:2030, prevC, col = 2)

plot(1970:2030, deathsF, type = "l", main = "AIDS Deaths 50+ years")
lines(1970:2030, deathsC, col = 2)

Benchmarking

Install the package, then run the benchmarking script ./scripts/benchmark.

Lint

Lint R code with lintr

lintr::lint_package()

Lint C++ code with cpplint

cpplint inst/include/*

Code design

Simulation model

The simulation model is implemented as templated C++ code in inst/include/frogger.hpp. This is so the simulation model may be developed as a standalone C++ library that can be called by other software without requiring R-specific code features. The code uses header-only open source libraries to maximize portability.
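To illustrate what templating on the numeric type looks like in practice, here is a minimal sketch that uses only the standard library. The function age_one_year is hypothetical and is not taken from frogger.hpp; it assumes a population stored as a vector of single-year age groups whose last element is an open-ended age group. Only the name real_type comes from this README.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the frogger implementation.
// Advance a population by one year of age. The last element is treated as an
// open-ended age group, so it keeps its members and receives the group below.
// Templating on real_type lets the same code run with double, float, or an
// automatic differentiation type.
template <typename real_type>
void age_one_year(std::vector<real_type>& pop) {
  assert(pop.size() >= 2 && "need at least two age groups");
  const std::size_t last = pop.size() - 1;
  pop[last] += pop[last - 1];
  // Shift the remaining groups up by one year, working from oldest to youngest
  // so that no value is overwritten before it has been moved.
  for (std::size_t a = last - 1; a > 0; --a) {
    pop[a] = pop[a - 1];
  }
  pop[0] = real_type(0);  // births would be added by a separate step
}
```

Because nothing here depends on R or Rcpp, a function written this way can be called from any C++ program that includes the header.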

R functions

The file src/frogger.cpp contains R wrapper functions for the model simulation via Rcpp and RcppEigen.

Development notes

Simulation model

  • The model was implemented using Eigen::Tensor containers. These were preferred for several reasons:
    • Benchmarking found they were slightly more efficient than boost::multi_array.
    • They use column-major indexing, in the same order as R.
    • Other statistical packages (e.g. TMB, Stan) rely heavily on Eigen, so using Eigen containers slims the dependencies.
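The column-major point can be made concrete with a small sketch. The helper below is hypothetical and does not use Eigen; it only shows how a three-dimensional index maps to a flat memory offset when the first index varies fastest, which is the layout R uses for its arrays. The dimension sizes in any call are up to the caller and are not taken from the model.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only.
// Flat offset of element (i, j, k) in a column-major 3D array.
// The first index varies fastest, which matches how R stores arrays.
inline std::size_t col_major_offset(const std::array<std::size_t, 3>& dims,
                                    std::size_t i, std::size_t j,
                                    std::size_t k) {
  assert(i < dims[0] && j < dims[1] && k < dims[2]);
  return i + dims[0] * (j + dims[1] * k);
}
```

With this layout, the zero-based element (i, j, k) in C++ is the element x[i + 1, j + 1, k + 1] of the corresponding R array, so data can be passed between the two without reordering.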

TODO

  • Restructuring the model code to identify more common code
    • Several parts of the code, such as the general demographic projection and the HIV population demographic projection, run similar processes (ageing, non-HIV mortality, migration). We should be able to write a function for, e.g., ageing that we can run on each of our population matrices. For the HIV- and ART-stratified populations, we can add overloaded functions that work with higher-dimensional data.
  • Add a test that checks that no doubles are used in the inst/include directory. We should be using the templated real_type for TMB.
  • Add a broad level overview of the algorithm - is there a diagram available?
  • Convert string flags passed to the model into enums if we need to switch on them in several places. This should make the C++ code easier to reason about and isolate the string checking to a single place.
  • Previously hiv_negative_pop had a fixed size, with dimensions specified by template parameters. How much does this speed up the code? Is there a better way to do this?
  • Tidy up confusing looping see #7 (comment)
  • Add R casting helpers which return better errors than Rcpp see #7 (comment)
  • Add a helper to do 0 to base 1 conversion and check upper bounds see #7 (comment)
  • Review what we pass as parameters - can some of these be computed in the struct ctor? e.g. #7 (comment)
  • Refactor OutputState to take a struct of state-space dimensions instead of unpacking the subset of parameters we need. See #12 (comment)
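
As a starting point for two of the items above (string flags to enums, and the index conversion helper), here is a rough sketch using only the standard library. All names in it are hypothetical. The two enum values mirror the documented values of hiv_age_stratification, and the index helper assumes the intended direction is from R's 1-based indices to C++'s 0-based indices, which the TODO item does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical enum mirroring the two values accepted by hiv_age_stratification.
enum class HivAgeStratification { Coarse, Full };

// Parse the string flag once, at the boundary. Downstream code can then
// switch on the enum instead of comparing strings in several places.
inline HivAgeStratification parse_hiv_age_stratification(
    const std::string& flag) {
  if (flag == "coarse") return HivAgeStratification::Coarse;
  if (flag == "full") return HivAgeStratification::Full;
  throw std::invalid_argument(
      "hiv_age_stratification must be \"coarse\" or \"full\", got \"" + flag +
      "\"");
}

// Convert a 1-based index (assumed to come from R) to a 0-based index,
// rejecting values outside the range 1..size.
inline std::size_t to_zero_based(long index_base1, std::size_t size) {
  assert(size > 0 && "dimension size must be positive");
  if (index_base1 < 1 || static_cast<std::size_t>(index_base1) > size) {
    throw std::out_of_range("index " + std::to_string(index_base1) +
                            " is outside 1.." + std::to_string(size));
  }
  return static_cast<std::size_t>(index_base1 - 1);
}
```

Both helpers throw standard exceptions with a message that names the offending value, which a wrapper layer could translate into an R error.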

License

MIT © Imperial College of Science, Technology and Medicine