Bayesian statistical concepts with examples from rodent toxicology studies

Abstract

The theory and practice of statistics comprises two main schools of thought: frequentist statistics and Bayesian statistics. Frequentist methods are most commonly used to analyze animal-based laboratory data, while Bayesian statistical methods have been implemented less widely and may be relatively unfamiliar to practitioners in experimental science. This paper provides a high-level overview of Bayesian statistics and how they compare with frequentist methods. Using examples in rodent toxicity research, we argue that Bayesian methods have much to offer laboratory animal researchers. We advocate for increased attention to and adoption of Bayesian methods in laboratory animal research. Bayesian statistical theory, methods, software, and education have advanced significantly in the last 30 years, making these tools more accessible than ever.

Keywords

Bayesian statistics, prior distributions, statistical software, hierarchical modeling, rodent toxicology

Introduction and definitions

There are two main approaches to statistical theory and practice: frequentist statistics and Bayesian statistics. Statistical inference tasks such as hypothesis testing or parameter estimation can be conducted with either approach. Many common statistical analyses such as t-tests, analysis of variance, chi-square tests, dose–response trend analyses, and many analyses based on regression models are traditionally conducted using a frequentist approach. Statistics for rodent toxicology consists primarily of frequentist statistical analyses. While frequentist approaches are suitable in many situations, there are a growing number of applications where a Bayesian approach is preferred.

Parameter estimation is one aspect of statistics where frequentist and Bayesian approaches differ. Suppose we are investigating the relationship between body weights and brain weights at necropsy for a set of control animals, using the simple linear regression model

$y = α + β x + ϵ$

where $y$ is brain weight, $x$ is body weight, $β$ is the strength of the linear relationship between body weight and brain weight, $ϵ$ represents measurement error or biological variability, and the intercept term $α$ shifts the line $y = β x$ vertically to improve the model fit. The frequentist practitioner estimates the unknown value of $β$ by calculating a single estimate (point estimate) of the value of $β$ given the data. Using a result known as Bayes’ Theorem, the Bayesian researcher estimates the unknown value of $β$ by calculating a probability distribution that represents relative weightings for all possible values of $β$ given the data. To do so, the Bayesian first expresses any pre-experimental beliefs about the likely values of $β$ by specifying a prior distribution $p(β)$ (the “prior”). For example, the Bayesian may characterize a lack of pre-experimental knowledge of $β$ by choosing a prior with a large variance, or the Bayesian can choose a more informative prior distribution that reflects their subjective belief on the likely values of $β$. The Bayesian then updates their prior knowledge on $β$ by integrating it with the observed body and brain weight data $\{x, y\}$, resulting in a posterior distribution $p(β \mid x, y)$ (the “posterior”). Integration is either done through tractable mathematical calculations or approximated using Markov Chain Monte Carlo algorithms, which allows the researcher to draw random samples from the posterior distribution. With a sufficiently large number of random samples from the posterior, one can make many assertions about the distribution of $β$ that take into account the information about $β$ provided by the observed data $\{x, y\}$, conditional on the chosen model and priors. Thus, estimating the posterior distribution of a parameter is typically the primary goal of Bayesian inference. For a more complete introduction to Bayes’ Theorem see Van de Schoot et al.¹
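As a rough illustration of the prior-to-posterior update described above, the following Python sketch uses the tractable (conjugate) case rather than Markov Chain Monte Carlo. It makes several simplifying assumptions that are not part of the original example: the data are simulated, the noise standard deviation `sigma` is treated as known, body weight is centered, and a normal prior with large variances is placed on the intercept and slope. All numeric values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical simulated data: body weight x (g) and brain weight y (g).
n = 20
x = rng.normal(loc=300.0, scale=30.0, size=n)
true_alpha, true_beta, sigma = 1.5, 0.002, 0.05  # illustrative values only
y = true_alpha + true_beta * x + rng.normal(0.0, sigma, size=n)

# Design matrix with an intercept column and centered body weight.
X = np.column_stack([np.ones(n), x - x.mean()])

# Prior: (alpha, beta) ~ Normal(m0, S0). Large variances encode weak prior knowledge.
m0 = np.zeros(2)
S0 = np.diag([10.0**2, 1.0**2])

# Conjugate update, with the noise standard deviation sigma treated as known.
S0_inv = np.linalg.inv(S0)
Sn = np.linalg.inv(S0_inv + X.T @ X / sigma**2)  # posterior covariance
mn = Sn @ (S0_inv @ m0 + X.T @ y / sigma**2)     # posterior mean

# Draw random samples from the posterior distribution of the slope beta.
beta_draws = rng.normal(loc=mn[1], scale=np.sqrt(Sn[1, 1]), size=10_000)
print("posterior mean of beta:", mn[1])
print("posterior sd of beta:  ", np.sqrt(Sn[1, 1]))
```

In a realistic analysis the noise variance would also be given a prior, and the posterior would typically be sampled with software such as Stan or PyMC; the sketch only shows how a prior and data combine into a posterior from which samples can be drawn.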

Selecting an approach

Interpretability of the results

Results from Bayesian analyses are typically more interpretable than those from frequentist methods, particularly in terms of probabilistic statements about parameters of interest. A frequentist approach to the body and brain weight linear regression problem would produce a single $β$ estimate and a 95% confidence interval. The frequentist researcher asserts that over repeated experiments, 95% of confidence intervals contain the true $β$ value, a concept challenging to interpret practically. Conversely, Bayesian analysis allows for assertions about probabilities. Using $β$ values sampled from the posterior distribution, a researcher asserts that there is a 95% probability that $β$ falls within that particular 95% Bayesian credible interval, conditional on the chosen model, priors, and observed data. In fact, with the posterior distribution available, the researcher can easily calculate nearly any specific probability of interest, for example the probability that $β$ is greater or less than some value, or the probability that $β$ lies in a range of values. These assertions cannot be made with the same interpretable language via a frequentist analysis.²
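The probability statements described above reduce to simple summaries of posterior samples. The sketch below assumes the samples are already available; here they are simulated from a normal distribution purely as a stand-in, and the thresholds are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Stand-in for posterior draws of beta; in practice these come from the fitted model.
beta_draws = rng.normal(loc=0.002, scale=0.0004, size=20_000)

# 95% equal-tailed credible interval: the 2.5th and 97.5th percentiles of the draws.
lower, upper = np.percentile(beta_draws, [2.5, 97.5])

# Any probability of interest is the fraction of draws satisfying the condition.
p_positive = np.mean(beta_draws > 0.0)
p_in_range = np.mean((beta_draws > 0.0015) & (beta_draws < 0.0025))

print(f"95% credible interval: ({lower:.5f}, {upper:.5f})")
print(f"P(beta > 0 | data) = {p_positive:.3f}")
print(f"P(0.0015 < beta < 0.0025 | data) = {p_in_range:.3f}")
```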

Dependence between endpoints within individual animals

In a typical rodent toxicity study, many endpoints are measured on each animal and each endpoint often undergoes statistical analysis separately. For example, a researcher looking at incidence of multiple tumor types may conduct separate trend tests for dose–response, one for each tumor type. Performing multiple independent analyses implicitly assumes, typically incorrectly, that the endpoints are uncorrelated, that is, that toxicity occurring in one organ or system of the body does not affect the chances that toxicity will occur in other organs or systems. When endpoints are correlated, multiple independent hypothesis tests will suffer from high false positive rates. Although frequentist methods can bring the inflated error rates back to expected levels by adjusting p-values using a multiple testing correction, Bayesian methods like hierarchical models allow simultaneous inference on multiple endpoints with no need for multiple testing correction.³ We encourage researchers to collaborate with statisticians and consider what inferences are possible under a Bayesian framework.

Bayesian approaches for modeling dependence between endpoints in animal studies have been demonstrated in the literature. Dunson et al. provide a general approach for simultaneous analysis of endpoints.⁴ Dunson and Herring use data from a mouse bioassay study to present a method for simultaneous analysis of outcomes such as time to first tumor, weekly increases in number of tumors, and presence of tumors at time of death.⁵ Kim and Hwang employ a Bayesian approach to a developmental toxicity study of diethylhexyl phthalate, simultaneously analyzing pup malformations and fetal weight.⁶ Hwang presents a Bayesian joint model for developmental toxicity studies on continuous data (e.g. fetal weight) and zero-inflated count data (e.g. birth defects or rare tumors).⁷

Although certain applications of Bayesian modeling of dependence across multiple endpoints have been addressed in the literature, the methods have not been widely implemented. The detailed paper by Dunson et al. on a general approach to modeling endpoint dependence has been widely cited, but few citations relate to laboratory animal research.⁴ In a 2014 review of statistics for toxicological bioassays,⁸ Bayesian methods are mentioned but only two citations are included on Bayesian methods for joint modeling in toxicology.⁸–¹⁰ There is therefore a substantial opportunity for more inclusion of Bayesian approaches in laboratory animal science.

Littermates (nesting among animals) and small sample sizes

Animals from the same litter represent a second type of nested data common in rodent studies: a single endpoint may be more correlated among littermates than it is between animals from different litters. Moreover, small sample sizes can occur in these contexts as some rodent toxicology experiments select only two or four animals per litter for analysis. There are other examples: historical control data is nested by laboratory, rodent strain, and sex; pup or body weight data are nested within individual animals whose endpoints are measured repeatedly over time; and histopathology endpoints can be considered nested by tissue or organ.

Whether the nesting structure is within- or between-animals (or both), the dependence that comes with it would ideally be accounted for in the statistical method. Mixed models that include parameters representing variability within and between litters can be fitted with a frequentist approach. However, these models can perform poorly when sample sizes are small. Fitting the model with a Bayesian approach can mitigate the issues related to small sample sizes (assuming sensible prior distributions are used).¹¹ For example, suppose we are using a Bayesian hierarchical model to estimate the variances $σ_{j}^{2}$ of body weights in eight litters of mice, indexed by $j \in \{1, 2, \dots, 8\}$, where some litters have only 2–3 pups and some have 9–10. The Bayesian would assume the eight variance parameters are themselves values from some common prior distribution. By connecting the eight litters via a single prior distribution, the model uses information from litters with more pups to improve estimates for litters with fewer pups. (Generally speaking, a mixed model can be fitted with either a frequentist or a Bayesian approach, and there are interesting connections between the two approaches. For example, the frequentist “BLUP” method for predicting random effects in mixed models is essentially equivalent to using a particular prior distribution in a Bayesian approach.)
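The way a common prior lets small litters borrow information can be sketched in a deliberately simplified setting. The example in the text concerns litter variances; the sketch below swaps in litter means instead, because the normal-normal case has a closed-form answer. It also assumes the overall mean `mu`, the between-litter standard deviation `tau`, and the within-litter standard deviation `sigma` are known, whereas a full hierarchical model would estimate them. All values are hypothetical and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical setup: 8 litters, some with 2-3 pups and some with 9-10.
litter_sizes = np.array([2, 3, 2, 9, 10, 9, 3, 10])
mu, tau, sigma = 25.0, 1.5, 2.0  # assumed known: overall mean, between- and within-litter sd

# Simulate litter-level true means and pup body weights (g).
true_means = rng.normal(mu, tau, size=litter_sizes.size)
weights = [rng.normal(m, sigma, size=k) for m, k in zip(true_means, litter_sizes)]
raw_means = np.array([w.mean() for w in weights])

# Posterior mean of each litter mean under the common Normal(mu, tau^2) prior:
# a precision-weighted average of the litter's own mean and the overall mean.
data_precision = litter_sizes / sigma**2
prior_precision = 1.0 / tau**2
shrink_weight = data_precision / (data_precision + prior_precision)
pooled_means = shrink_weight * raw_means + (1.0 - shrink_weight) * mu

for k, w, raw, pooled in zip(litter_sizes, shrink_weight, raw_means, pooled_means):
    print(f"pups={k:2d}  weight on own data={w:.2f}  raw={raw:.2f}  pooled={pooled:.2f}")
```

Litters with few pups receive a smaller weight on their own data and are pulled more strongly toward the overall mean, which is the borrowing of information described above.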

When sample sizes are large, Bayesian and frequentist methods often produce similar results. Unfortunately, the efficacy of many frequentist statistical methods relies on sufficiently large sample sizes. On the other hand, the 4 Rs of animal research (reduction, replacement, refinement, and responsibility) encourage smaller sample size designs, which can threaten the reliability of a frequentist statistical analysis.¹² Unlike their frequentist counterparts, Bayesian methods are adept at handling small sample sizes.

Other reasons to consider Bayesian methods

Another reason to consider a Bayesian approach is that the Bayesian framework for inference obviates the need for p-value based significance thresholds or ad hoc p-value corrections when considering multiple hypotheses, since the posterior distribution gives the distribution of all model parameters of interest. With a posterior distribution estimated, the researcher can readily answer questions about multiple parameters, for example “what is the probability that $θ_{1}$ and $θ_{2}$ are both less than 0?” The frequentist researcher might address this question via independent hypothesis tests and a multiple comparison adjustment.
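A question about two parameters at once, such as the one quoted above, can be answered directly from joint posterior samples. In the sketch below the samples are simulated from a correlated bivariate normal distribution as a stand-in for a real posterior; the means and covariance are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

# Stand-in for joint posterior draws of (theta_1, theta_2); correlated on purpose.
mean = np.array([-0.5, -0.2])
cov = np.array([[0.25, 0.10],
                [0.10, 0.16]])
draws = rng.multivariate_normal(mean, cov, size=50_000)
theta1, theta2 = draws[:, 0], draws[:, 1]

# Joint probability: fraction of draws in which both conditions hold at once.
p_both_negative = np.mean((theta1 < 0) & (theta2 < 0))

# Marginal probabilities, for comparison with the joint statement.
p1 = np.mean(theta1 < 0)
p2 = np.mean(theta2 < 0)

print(f"P(theta_1 < 0) = {p1:.3f}")
print(f"P(theta_2 < 0) = {p2:.3f}")
print(f"P(theta_1 < 0 and theta_2 < 0) = {p_both_negative:.3f}")
print(f"product of marginals = {p1 * p2:.3f}")
```

Because the draws are joint, the dependence between the two parameters is reflected automatically; the joint probability generally differs from the product of the two marginal probabilities.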

Besides being a required part of a Bayesian analysis, prior distributions are valuable tools. Westfall and Soper discuss the use of prior distributions to alleviate the multiple comparison problem in carcinogenicity tests.¹³ Priors can be used to incorporate information from similar past experiments, for example using historical control data. For an excellent expository example, see Bayesian data analysis, 3rd ed., p.102.³
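One simple way historical control data might inform a prior is sketched below for a control-group tumor rate, using the conjugate beta-binomial update. This is only an assumed construction for illustration: the historical counts, the current study counts, and the effective prior sample size `prior_n` are all hypothetical, and a hierarchical model would be a more complete treatment.

```python
import numpy as np

# Hypothetical historical control data: tumors / animals in five past studies.
hist_tumors = np.array([2, 1, 3, 0, 2])
hist_animals = np.array([50, 50, 50, 50, 50])

# One simple (assumed) way to turn history into a prior: Beta(a, b) with the
# historical tumor rate as its mean, down-weighted to count as 'prior_n' animals.
hist_rate = hist_tumors.sum() / hist_animals.sum()
prior_n = 40.0  # hypothetical effective prior sample size
a, b = hist_rate * prior_n, (1.0 - hist_rate) * prior_n

# Current study controls: 4 tumors in 20 animals (hypothetical).
y, n = 4, 20

# Conjugate beta-binomial update.
a_post, b_post = a + y, b + (n - y)
post_mean = a_post / (a_post + b_post)

# Posterior draws give an interval for the control tumor rate.
rng = np.random.default_rng(seed=5)
draws = rng.beta(a_post, b_post, size=20_000)
lower, upper = np.percentile(draws, [2.5, 97.5])

print(f"historical rate = {hist_rate:.3f}, observed rate = {y / n:.3f}")
print(f"posterior mean  = {post_mean:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

The posterior mean falls between the historical rate and the rate observed in the current study, with the balance controlled by how many animals the prior is taken to be worth.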

A word of caution

The specification of prior distributions in a Bayesian analysis should be overseen by a statistician experienced in Bayesian methods. There is no one-size-fits-all recipe for specifying prior distributions. The exact form of the prior distributions can influence the results, sometimes substantially; re-running the analysis under different priors is a necessary step to understand how the priors affect the results. Moreover, using a so-called “default prior” or one that is ostensibly “non-informative” can be a poor choice in certain cases. For further discussion on this critical aspect of Bayesian analysis, see Wheeler,¹⁴ Depaoli et al.,¹⁵ and Seaman III et al.¹⁶
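Re-running an analysis under different priors can be illustrated with a small conjugate example. The sketch below assumes a hypothetical study with 1 tumor among 10 animals and compares three Beta priors for the tumor rate; the specific priors are illustrative choices, not recommendations.

```python
import numpy as np

# Hypothetical small study: 1 tumor among 10 animals.
y, n = 1, 10

# Candidate Beta(a, b) priors, including two common "default" choices.
priors = {
    "uniform Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "informative Beta(2, 38)": (2.0, 38.0),
}

def posterior_summary(a, b, y, n):
    """Posterior mean and sd of a binomial rate under a Beta(a, b) prior."""
    a_post, b_post = a + y, b + n - y
    total = a_post + b_post
    mean = a_post / total
    sd = np.sqrt(a_post * b_post / (total**2 * (total + 1.0)))
    return mean, sd

# Same data, different priors: compare the resulting posterior summaries.
results = {name: posterior_summary(a, b, y, n) for name, (a, b) in priors.items()}
for name, (mean, sd) in results.items():
    print(f"{name:26s} posterior mean={mean:.3f}  sd={sd:.3f}")
```

With only ten animals the posterior mean changes noticeably across the three priors, which is the kind of sensitivity the re-analysis step is meant to reveal.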

Finally, we emphasize that a full Bayesian analysis often involves advanced algorithms with nuances that can be masked by the user-friendliness of the software. Readers should take care when interpreting the results of a Bayesian analysis, and involve a statistician to assist with interpreting and communicating results.

Accessibility

Identifying and implementing appropriate Bayesian methods is still a challenge for the modern data analyst. Computation for complex Bayesian models can be intensive, although software, hardware, and user interfaces are always improving. For readers interested in considering Bayesian methods in their work, we recommend a four-pronged approach consisting of (1) educational/background materials, (2) applied journal articles, (3) software or book resources for fitting Bayesian models, and (4) consulting with a statistician with experience in Bayesian methods. To assist with this four-pronged approach, we refer the reader to the included table of suggested materials (Table 1). It is more common today than in past decades for statisticians and programmers to publish papers containing full computer code available to fit the models and examples from the paper. The code is often released as part of an open-source software package with documentation, thereby expanding the set of software packages available for Bayesian model-fitting algorithms, results processing, and even visualization.

Table 1.

Educational/background materials on Bayesian concepts	Suitable for:
Bayesian statistics and modeling. Van de Schoot et al. (2021). Nature Reviews.¹	Detailed yet accessible introduction to the Bayesian statistics and modeling workflow
Bayesian data analysis for animal scientists: The basics. Blasco (2017).¹⁷	Introductory to advanced study and reference
Data analysis using regression and multilevel/hierarchical models. Gelman and Hill (2006).¹⁸	Intermediate to advanced applications (www.stat.columbia.edu/~gelman/arm)
Bayesian and frequentist regression methods. Wakefield (2013).¹¹	Introductory to advanced study and reference; comparing frequentist and Bayesian approaches to many problems

Resources for locating applied journal articles	Suitable for:
https://www.researchrabbit.ai/	Searching for/within journal articles; finding related articles
https://www.elicit.com/	Summarizing papers and their conclusions
Journal of Statistical Software	Statistical software articles, guides; includes source code and code snippets (https://www.jstatsoft.org/index)

References for fitting Bayesian models in R	Suitable for:
The Stan software platform and documentation.¹⁹	Introductory to advanced study (https://mc-stan.org/)
Data analysis using regression and multilevel/hierarchical models. Gelman and Hill (2006).¹⁸	Intermediate to advanced study (www.stat.columbia.edu/~gelman/arm)
R’s CRAN Task View: Bayesian inference.	Identifying R packages for Bayesian inference by the specific application area (https://cran.r-project.org/web/views)
R package ToxicR.²⁰	Frequentist and Bayesian approaches to dose–response modeling (https://github.com/NIEHS/ToxicR)

References for fitting Bayesian models in Python	Suitable for:
Python package PyMC for Bayesian inference.²¹	Python users, intermediate level and up (https://www.pymc.io)
Python package ArviZ for visualizing Bayesian inference.²²	Python users, intermediate level and up (https://python.arviz.org)

Conclusion

Despite the increased accessibility and relative user-friendliness of many Bayesian approaches, we recommend collaboration with statisticians when applying Bayesian methods to ensure that the analysis is properly executed, including setting prior distributions, defining/building the statistical model, writing or editing existing code to fit the model, checking model fit and convergence, and guiding other researchers in identifying and interpreting the results of the analysis.

Footnotes

Acknowledgments

The authors would like to thank Dr. Guanhua Xie, Dr. Matt Wheeler, and Dr. Helen Cunny for their helpful review on this manuscript.

Data availability

There are no experimental data associated with this manuscript.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Intramural Research Program of the National Institutes of Health, National Institute of Environmental Health Sciences (NIEHS) and contract GS-00F-173CA/75N96022F00055 to Social and Scientific Systems, Inc., a DLH Holdings Corp Company.

Research ethics

Our study did not require an ethical board approval because it did not contain human or animal trials.

ORCID iD

Gary J Larson

References

1. Van de Schoot, Depaoli, King, et al. Bayesian statistics and modelling. Nat Rev Methods Primer 2021; 1: 1.
2. Hespanhol, Vallio, Costa, et al. Understanding and interpreting confidence and credible intervals around effect estimates. Braz J Phys Ther 2019; 23: 290–301.
3. Gelman, Carlin, Stern, et al. Bayesian data analysis. 3rd ed. New York: Taylor & Francis, 2013.
4. Dunson, Chen, Harry. A Bayesian approach for joint modeling of cluster size and subunit-specific outcomes. Biometrics 2003; 59: 521–530.
5. Dunson, Herring. Bayesian latent variable models for mixed discrete outcomes. Biostatistics 2005; 6: 11–25.
6. Kim, Hwang BS. Joint analysis of binary and continuous data using skewed logit model in developmental toxicity studies. Korean J Appl Stat 2020; 33: 123–136.
7. Hwang BS. A Bayesian joint model for continuous and zero-inflated count data in developmental toxicity studies. Commun Stat Appl Methods 2022; 29: 239–250.
8. Hothorn LA. Statistical evaluation of toxicological bioassays – a review. Toxicol Res 2014; 3: 418–432.
9. Fronczyk, Kottas. A Bayesian nonparametric modeling framework for developmental toxicity studies. J Am Stat Assoc 2014; 109: 873–888.
10. Hwang, Pennell ML. Semiparametric Bayesian joint modeling of a binary and continuous outcome with applications in toxicological risk assessment. Stat Med 2014; 33: 1162–1175.
11. Wakefield. Bayesian and frequentist regression methods. New York: Springer, 2013.
12. Lee, Kang BC. The ‘R’ principles in laboratory animal experiments. Lab Anim Res 2020; 36: 45.
13. Westfall, Soper KA. Using priors to improve multiple animal carcinogenicity tests. J Am Stat Assoc 2001; 96: 827–834.
14. Wheeler MW. An investigation of non-informative priors for Bayesian dose–response modeling. Regul Toxicol Pharmacol 2023; 141: 105389.
15. Depaoli, Winter, Visser. The importance of prior sensitivity analysis in Bayesian statistics: Demonstrations using an interactive Shiny App. Front Psychol 2020; 11: 608045.
16. Seaman III, Seaman Jr, Stamey JD. Hidden dangers of specifying noninformative priors. Am Stat 2012; 66: 77–84.
17. Blasco. Bayesian data analysis for animal scientists: The basics. Cham: Springer International Publishing. Epub ahead of print 2017. DOI: 10.1007/978-3-319-54274-4.
18. Gelman, Hill. Data analysis using regression and multilevel/hierarchical models. Cambridge, UK: Cambridge University Press, 2006.
19. Carpenter, Gelman, Hoffman, et al. Stan: A probabilistic programming language. J Stat Softw 2017; 76: 1–32.
20. Wheeler, Lim, House, et al. ToxicR: A computational platform in R for computational toxicology and dose–response analyses. Comput Toxicol 2023; 25: 100259.
21. Abril-Pla, Andreani, Carroll, et al. PyMC: A modern, and comprehensive probabilistic programming framework in Python. PeerJ Comput Sci 2023; 9: e1516.
22. Kumar, Carroll, Hartikainen, et al. ArviZ a unified library for exploratory analysis of Bayesian models in Python. J Open Source Softw 2019; 4: 1143.