BIRS Recap: Machine learning for model parameterization

Written by Sara Hamis, David Hormuth, John Metzcar, Maria Monzon Ronda, Bernadette Stolz - April 03, 2025

This blog is one in a series resulting from the Winter 2025 BIRS workshop on Mechanistic Learning as a Combination of Machine Learning and Modeling in Mathematical Oncology.

**Figure 1:** From left to right: David Hormuth, Maria Monzon Ronda, Bernadette Stolz, Sara Hamis, John Metzcar.

In early January 2025, our group of mathematical and computational oncologists (pictured in the best room at BIRS) convened to explore opportunities for machine learning in model parameterization within mathematical and computational oncology. The week began with a brainstorming session aimed at defining the scope of this topic, identifying how our diverse expertise could contribute, and determining initial steps to integrate our knowledge effectively. As expected, our varied backgrounds in mechanistic modeling and data-driven approaches led to different interpretations of the topic. Through discussion, we refined our focus to two primary areas: (1) hierarchical modeling and (2) hybrid modeling. (Interested readers can consult [1] or [2] for a more detailed discussion of machine learning in mathematical oncology.)

(1) Hierarchical modeling is a statistical framework that incorporates multiple levels of uncertainty or priors to account for variability across patients, tumor types, or experimental conditions. For instance, when applied to a Gompertz tumor growth model, hierarchical priors can be assigned at the individual level, where each patient's parameters (e.g., growth rate, carrying capacity) are drawn from a higher-level population distribution. Machine learning plays a crucial role in refining these parameters and priors by integrating complex, multi-modal datasets such as caliper measurements, tumor volume assessments, imaging data, clinical pathology features, and RNA expression profiles. The ultimate goal is to use ML-enhanced hierarchical modeling to derive realistic parameter distributions that reduce uncertainty and enhance model robustness. One example of this approach is described by Boulet et al. [3], who propose a Bayesian framework for integrating multi-source data from preclinical studies to improve human dose predictions. Their four-step approach involves sequential parameter estimation, extrapolation to humans, commensurability checks between posterior distributions, and merging of information to enhance estimation precision. Applied to oncology drug development, this method reduces uncertainty in predictions, potentially leading to more efficient dose selection.
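To make the two-level structure concrete, here is a minimal sketch of the generative side of a hierarchical Gompertz model. It only draws patient-level parameters from a population distribution and evaluates the closed-form Gompertz solution; it does not perform Bayesian inference or use any machine learning. The log-normal population distributions, the hyperparameter values in `POPULATION`, and all function names are assumptions made for illustration and are not taken from the workshop project or from [3].

```python
import math
import random

# Hypothetical population-level hyperparameters (illustrative values only).
POPULATION = {
    "log_rate_mean": math.log(0.1),        # median growth rate of 0.1 per day
    "log_rate_sd": 0.3,
    "log_capacity_mean": math.log(2000.0), # median carrying capacity of 2000 mm^3
    "log_capacity_sd": 0.3,
}


def gompertz_volume(t, v0, growth_rate, carrying_capacity):
    """Closed-form solution of dV/dt = a * V * ln(K / V) with V(0) = v0."""
    return carrying_capacity * math.exp(
        math.log(v0 / carrying_capacity) * math.exp(-growth_rate * t)
    )


def sample_patient_parameters(rng, population):
    """Draw one patient's (growth_rate, carrying_capacity) from the population level."""
    growth_rate = rng.lognormvariate(
        population["log_rate_mean"], population["log_rate_sd"]
    )
    carrying_capacity = rng.lognormvariate(
        population["log_capacity_mean"], population["log_capacity_sd"]
    )
    return growth_rate, carrying_capacity


def simulate_cohort(n_patients, times, v0=50.0, seed=0):
    """Simulate tumor volumes for a cohort whose parameters share one population prior."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n_patients):
        growth_rate, carrying_capacity = sample_patient_parameters(rng, POPULATION)
        volumes = [
            gompertz_volume(t, v0, growth_rate, carrying_capacity) for t in times
        ]
        cohort.append(
            {
                "growth_rate": growth_rate,
                "carrying_capacity": carrying_capacity,
                "volumes": volumes,
            }
        )
    return cohort
```

Calling `simulate_cohort(5, [0.0, 10.0, 20.0])` returns five patients whose growth curves differ, but whose parameters are tied together through the shared population level. In a full hierarchical analysis, the hyperparameters themselves would be given priors and estimated from data rather than fixed as they are here.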

(2) Hybrid modeling combines mechanistic models (e.g., physics-based or biological models) with data-driven approaches (e.g., machine learning) to improve the accuracy and flexibility of tumor growth predictions. This approach could be particularly useful in oncology, where certain biological processes driving tumor growth are poorly understood, difficult to measure, or highly variable across patients. When applied to the same Gompertz tumor growth model, hybrid modeling could be used to replace or augment specific terms within the model, allowing for improved predictions while retaining interpretability and biological realism. For example, instead of calibrating a temporally static proliferation rate through least-squares-based approaches, the proliferation rate could be replaced with a neural network (or other machine learning framework) that yields a rate specific to a given set of multi-modality data (e.g., imaging, clinical-pathology features, genetic markers). The same idea could be applied to other terms, including those describing treatment efficacy. For example, in work by Podina, Ghodsi, and Kohandel [5], previously unknown tumor-drug interaction terms in quantitative systems pharmacology models are approximated with universal physics-informed neural networks [6]. Hybrid modeling could also be used to correct model residuals via closure modeling. When applied to the same Gompertz tumor growth model, an additional term reflecting the model-data misfit can be trained to “close”, or minimize, the error between the model predictions and the observed tumor growth. This ML-based term can be a function of additional patient or tumor data and allow for improved predictions with reduced uncertainty.
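The two hybrid ideas above (a learned proliferation rate and a closure term) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of [5] or [6]: the tiny one-hidden-layer network, the untrained weights in `WEIGHTS`, the two-element feature vector `FEATURES`, the forward Euler integrator, and all function names are hypothetical. In practice the weights would be trained against tumor growth data.

```python
import math

# Hypothetical, untrained network weights and patient features (illustration only).
WEIGHTS = {
    "w_hidden": [[0.5, -0.3], [0.2, 0.4]],
    "b_hidden": [0.0, 0.1],
    "w_out": [0.6, -0.4],
    "b_out": -1.5,
}
FEATURES = [1.0, 0.5]  # e.g. two normalized imaging or pathology features


def softplus(z):
    """Smooth positive function, used to keep the proliferation rate above zero."""
    return math.log1p(math.exp(z))


def proliferation_rate(features, weights):
    """One-hidden-layer network mapping patient features to a positive rate."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(weights["w_hidden"], weights["b_hidden"])
    ]
    z = sum(w * h for w, h in zip(weights["w_out"], hidden)) + weights["b_out"]
    return softplus(z)


def simulate_hybrid_gompertz(v0, carrying_capacity, features, weights,
                             t_end, dt=0.01, closure=None):
    """Forward Euler integration of dV/dt = a(x) * V * ln(K / V) + closure(V, x).

    a(x) comes from the network; closure is an optional data-driven
    correction term (zero when not supplied).
    """
    rate = proliferation_rate(features, weights)
    volume = v0
    for _ in range(int(round(t_end / dt))):
        growth = rate * volume * math.log(carrying_capacity / volume)
        correction = closure(volume, features) if closure is not None else 0.0
        volume += dt * (growth + correction)
    return volume
```

For example, `simulate_hybrid_gompertz(50.0, 1000.0, FEATURES, WEIGHTS, t_end=10.0)` integrates the purely mechanistic structure with a feature-dependent rate, while passing `closure=lambda volume, features: -0.01 * volume` adds a stand-in correction term. The mechanistic Gompertz form is kept intact in both cases, which is what preserves interpretability.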

Based on this initial discussion, we pursued a project employing hierarchical modeling to better assign model parameters using multiparametric magnetic resonance imaging data, topological data analysis, and previously calibrated model parameters describing tumor growth and angiogenesis in a murine glioma model. But no spoilers here: check out our work at the Annual SMB meeting this summer.

In summary, hybrid and hierarchical modeling approaches offer powerful ways to integrate machine learning with mechanistic tumor growth models. However, they are not universally applicable to all data or models. Their success depends on pairing the mechanistic model with a suitable machine learning technique, so that the structure of the model aligns with the strengths of the method. An example of over-interpretation and misuse of machine learning in cancer research is the use of decision tree algorithms for locally advanced rectal cancer, where complex models were applied without adequate consideration of the underlying data characteristics, leading to potential misinterpretation of results [4]. This highlights the importance of selecting models that match the complexity and nature of the data being analyzed. Poorly chosen pairings can lead to overfitting, instability, or biologically implausible predictions. Data complexity, quantity, and quality also play a crucial role: noisy, sparse, or high-dimensional data may introduce biases that make it difficult to extract meaningful patterns for parameter estimation or model refinement. Managing these challenges requires careful data preprocessing, feature selection, and uncertainty quantification. Conceptually, hybrid and hierarchical modeling sit in the top right corner of the broader mechanistic learning landscape (Figure 2), representing models that maintain mechanistic interpretability while leveraging machine learning for enhanced predictive power. Their placement also highlights the inherent trade-off: while these methods provide flexibility and robustness, they demand rigorous validation to ensure they remain grounded in biological reality.

**Figure 2:** Mechanistic learning landscape, spanning from difficult-to-model areas with little knowledge and data to paradigms such as hybrid and hierarchical modeling, which make use of both physical and biological knowledge and data. From [1].

References

1. Metzcar J, Jutzeler CR, Macklin P, Köhn-Luque A, Brüningk SC. A review of mechanistic learning in mathematical oncology. Front Immunol. 2024 Mar 12;15:1363144. doi: 10.3389/fimmu.2024.1363144. PMID: 38533513; PMCID: PMC10963621.
2. Lorenzo G, Ahmed SR, Hormuth DA, Vaughn B, Kalpathy-Cramer J, Solorio L, Yankeelov TE, Gomez H. Patient-Specific, Mechanistic Models of Tumor Growth Incorporating Artificial Intelligence and Big Data. Annu Rev Biomed Eng. 2024 Jul;26(1):529-560. doi: 10.1146/annurev-bioeng-081623-025834. Epub 2024 Jun 20. PMID: 38594947.
3. Boulet S, Ursino M, Michelet R, Aulin LB, Kloft C, Comets E, Zohar S. Bayesian framework for multi-source data integration-Application to human extrapolation from preclinical studies. Stat Methods Med Res. 2024 Apr;33(4):574-588. doi: 10.1177/09622802241231493. Epub 2024 Mar 6. PMID: 38446999.
4. De Felice F, Crocetti D, Parisi M, Maiuri ., Moscarelli E, Caiazzo R, Bulzonetti N, Musio D, Tombolini V. Decision tree algorithm in locally advanced rectal cancer: an example of over-interpretation and misuse of a machine learning approach. J Cancer Res Clin Oncol. 2020;146(3):761-765. doi: 10.1007/s00432-019-03102-y.
5. Podina L, Ghodsi A, Kohandel M. Learning Chemotherapy Drug Action via Universal Physics-Informed Neural Networks. arXiv preprint arXiv:2404.08019v1. 2024. doi: 10.48550/arXiv.2404.08019.
6. Podina L, Eastman B, Kohandel M. A PINN approach to symbolic differential operator discovery with sparse data. arXiv preprint. 2022. doi: 10.48550/arXiv.2212.04630.
