Bayesian Indirect Estimation of Historical Fertility in Europe and US Using Online Genealogical Data

Abstract

A growing number of social scientists use online genealogical data as an alternative digital census of historical populations to study past demographic dynamics. However, the non-representativeness of this data source requires the development of bias-adjusting methods to obtain accurate demographic estimates. We address this challenge by proposing an indirect estimation framework to investigate fertility trends in seven European countries and the United States of America for the historical period 1751–1910, integrating data from the big genealogical database FamiLinx with more conventional data sources. The proposed methods allow for the indirect estimation of the total fertility rate using the number of women aged 15–49 and children under age 5, while accounting for child mortality, age-specific fertility patterns, and biases. Our methodological approaches demonstrate that, when combined with reliable demographic data, online genealogical data can be fruitfully used to examine fertility patterns in countries and periods lacking well-functioning national civil registration systems.

Keywords

Bayesian methods digital data online genealogies historical demography computational social science fertility

Introduction

The increasing availability of non-traditional data sources driven by the so-called “Data Revolution” (Alburez-Gutierrez et al. 2019; Cesare et al. 2018; Kashyap 2021) presents new opportunities to better understand demographic phenomena. Among these novel data sources, online genealogical data have attracted significant attention due to their unprecedented wealth of historical information about human societies (Alburez-Gutierrez et al. 2022; Colasurdo and Omenti 2024). Online genealogical data are scraped from websites where users upload their own genealogical trees and insert individual-level demographic information about their ancestors, such as gender, birth and death dates, and countries. Online genealogies offer an unprecedented opportunity to examine population dynamics in historical periods and countries for which we lack ground-truth population data (Colasurdo and Omenti 2024; Stelter and Alburez-Gutierrez 2022). Nonetheless, as these data sources are not primarily designed for population studies, they are affected by several biases that hamper their usability for demographic research (Colasurdo and Omenti 2024). First, the bottom-up construction of the digital family trees amplifies the likelihood of omitting more distant ancestors. Second, the under-representation of various population subgroups, including women and children who died at an early age, and the over-representation of male individuals with higher socioeconomic status and better demographic conditions lead to a significant selection bias (Calderón-Bernal et al. 2025; Colasurdo and Omenti 2024; Hollingsworth and Hollingsworth 1976; Minardi et al. 2024, 2026; Stelter and Alburez-Gutierrez 2022). Third, an individual’s inclusion in the genealogy may be affected by so-called “selective remembering,” as the genealogist is more likely to include ancestors with a prominent role in their family history (Chong et al. 2022) and to omit relatives who dishonored the family (Zhao 2001).
Fourth, the high percentage of missing values in common demographic variables, specifically birth and death dates and countries, makes only a small share of the profiles from online genealogies usable for demographic research (Colasurdo and Omenti 2024; Minardi et al. 2024). Additionally, online genealogies are built through a bottom-up approach as users begin the construction from the bottom of their family trees and trace their lineages backward, leading to the underestimation of childless individuals and the omission of more historically distant ancestors.

Due to these limitations, we cannot rely on online genealogical data alone to produce accurate population-level demographic indicators. As emphasized by Stelter and Alburez-Gutierrez (2022) and Colasurdo and Omenti (2024), the development of statistical methods that account for these biases is essential to enhance the reliability of demographic indicators derived from online genealogical data. Despite these limitations, the extensive historical coverage of such data can be leveraged in combination with more reliable but sparser historical data sources, such as censuses.

In this article, we address the challenge of estimating total fertility rates (TFR) from online genealogical data by extending an existing indirect estimation framework developed by Schmertmann and Hauer (2019) and by Hauer and Schmertmann (2020). Using the large-scale genealogical database FamiLinx as a case study, we adapt this framework, which permits both deterministic estimation through traditional indirect techniques and probabilistic estimation via Bayesian modeling, to the context of online genealogies. The original method relies on the number of children under age 5, the number of women aged 15–49, and child mortality, under the assumption that the ratio of children under age 5 to women of reproductive age (15–49) is accurate. As a consequence, this approach is not well-suited for TFR estimation in the context of online genealogies, where the age–sex structure of the online genealogical populations deviates substantially from that of the general population (Colasurdo and Omenti 2024). To address this issue, we refine this methodological framework to handle TFR estimation in situations where the available population data are not representative and need to be combined with more reliable but sparser demographic sources.

Specifically, we produce annual TFR estimates for seven European countries and the United States of America (US) during the historical period 1751–1910 by integrating online genealogical data from FamiLinx with more reliable demographic data sources when available.

The proposed Bayesian model combines information on the number of children under age 5 and the number of women classified by maternal age group from two main data sources: a biased source with complete temporal and spatial coverage (e.g., online genealogical data) and a reliable source with more limited temporal availability (e.g., population censuses). The model also incorporates data on child mortality and standard age-specific fertility schedules, which do not need to coincide with the fertility schedules of the countries under analysis.

A Bayesian modeling framework is particularly well-suited in this setting, because it allows us to combine multiple data sources while accounting for their uncertainty (Bryant and Graham 2013). Additionally, in Bayesian modeling, the outcome of the analysis is a probability distribution around the demographic indicator, allowing us to quantify uncertainty and make probabilistic statements about its range of plausible values (Bijak and Bryant 2016).

Broadly speaking, the use of Bayesian methods for demographic estimation has gained traction in data-sparse contexts. Bayesian methods in demography have been developed to measure migration by combining social media data with more traditional sources (Alexander et al. 2020; Hsiao et al. 2024; Rampazzo et al. 2021), to generate subnational population estimates in contexts with limited data (Alexander and Alkema 2022), to reconstruct historical populations (Voutilainen et al. 2020; Wheldon et al. 2013), to obtain small-area population estimates from satellite imagery (Leasure et al. 2020; Linard et al. 2012), and to estimate mortality in data-sparse contexts (Alexander and Alkema 2018; Alexander et al. 2017; Chong et al. 2022).

Most of the previous studies (Blanc 2024a, 2024b; Corti and Minardi 2026; Corti et al. 2024; Cozzani et al. 2023; Gay et al. 2025; Hsu et al. 2021; Minardi et al. 2024, 2026; Pojman et al. 2023), which relied on online genealogical data to study demographic outcomes in Europe and North America, often assumed that these data were representative of the general population or focused on highly selected sub-populations, such as male individuals who survived beyond the age of 30 (Colasurdo and Omenti 2024). The work by Chong et al. (2022) represented the first attempt to develop a Bayesian modeling framework to calibrate mortality rates from online genealogical data with estimates from more reliable data sources such as the Human Mortality Database. To the best of our knowledge, this study is the first to propose a method for correcting TFR estimates based on online genealogical data, thereby introducing a novel approach to fertility estimation in settings where reliable demographic data are limited and online genealogical data serve as a complementary source.

The remainder of the article is structured as follows. First, we provide a description of the big genealogical database FamiLinx. Second, we provide a more detailed explanation behind our methodological choices. Third, we describe the proposed extensions to the existing indirect estimation framework by Schmertmann and Hauer (2019) and Hauer and Schmertmann (2020). Fourth, we apply the proposed methods to examine the historical fertility patterns in seven European countries and the United States (US) during the historical period 1751–1910. Finally, the strengths and limitations of our study are discussed, along with potential directions for future research.

Data

The FamiLinx Database

This article relies primarily on FamiLinx, a database derived from publicly available online genealogies on the website geni.com. The database was curated by Kaplanis et al. (2018), who scraped over 86 million profiles from the digital family trees hosted on the site. It contains individual-level records for these 86 million individuals and information about kinship ties for approximately 43 million of them. Specifically, the data incorporate micro-level records containing information about essential demographic variables, namely birth and death dates and countries. Thus, this data source generates a sizable population of individuals with life courses unfolding across multiple centuries and countries.

Despite its massive size, this data set is subject to several limitations that hamper its usability for demographic research. Besides the biases reported in the introduction, this data source significantly over-represents individuals experiencing vital events, i.e., births and deaths, in western countries (Colasurdo and Omenti 2024), and displays a large amount of missing values in key demographic variables, i.e., birth and death dates and locations (see table A1 in the online supplement).

Furthermore, these data exhibit various reporting errors, which may include improbable ages at death or unreasonable years of birth and death. Hence, before conducting any demographic analysis, a careful sample selection must be performed (Colasurdo and Omenti 2024). In addition, this data set is affected by passive registration. While in active registration systems, the data collection authority knows the status of the individual at all times, in passive registration systems, such as FamiLinx, only births and deaths are recorded (Colasurdo and Omenti 2024). Consequently, this data source lacks precise information about other essential life course events, including marriages and migrations.¹ Our analysis focuses on seven European countries (Denmark, England & Wales, Finland, France, Norway, Sweden, and the Netherlands) and the United States (US).² We have opted to select these countries for two major reasons. First, they are among the 20 countries with the highest number of births and deaths recorded in FamiLinx. Secondly, some of these countries are characterized by a long-standing tradition of well-functioning national civil registration systems that can be leveraged to inform the bias of demographic estimates calculated from online genealogical populations.

In order to create the country-specific analytical samples of online genealogical populations, we apply the following criteria:

The variables sex, birth and death years and countries must not be missing.

The profile must have at least one parent or one child. This ensures that each individual belongs to a family network of size strictly greater than one.

The birth year must not be greater than 1910.

The earliest death year cannot be less than 1751.

The age at death must fall between 0 and 110.

The countries of birth and death of the profile must be the same.
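The six criteria above can be read as a single filter applied to each profile. The following is a minimal sketch under stated assumptions: the record layout and field names (`birth_year`, `parent_ids`, and so on) are hypothetical and do not reflect the actual FamiLinx schema, and age at death is approximated from calendar years only.

```python
# Hypothetical field names; the real FamiLinx columns may differ.
REQUIRED_FIELDS = ("sex", "birth_year", "death_year", "birth_country", "death_country")


def keep_profile(profile):
    """Return True if a profile satisfies all six selection criteria."""
    # 1. Sex, birth/death years and birth/death countries must be present.
    if any(profile.get(field) is None for field in REQUIRED_FIELDS):
        return False
    # 2. At least one recorded parent or one recorded child.
    if not (profile.get("parent_ids") or profile.get("child_ids")):
        return False
    # 3-4. Born no later than 1910 and died no earlier than 1751.
    if profile["birth_year"] > 1910 or profile["death_year"] < 1751:
        return False
    # 5. Age at death between 0 and 110 (approximated from calendar years).
    age_at_death = profile["death_year"] - profile["birth_year"]
    if not 0 <= age_at_death <= 110:
        return False
    # 6. Same country of birth and death (migrants are excluded).
    return profile["birth_country"] == profile["death_country"]


# Illustrative records, not real data.
profiles = [
    {"sex": "F", "birth_year": 1820, "death_year": 1885,
     "birth_country": "Sweden", "death_country": "Sweden",
     "parent_ids": [1, 2], "child_ids": []},
    {"sex": "M", "birth_year": 1830, "death_year": 1890,
     "birth_country": "Norway", "death_country": "United States",
     "parent_ids": [3], "child_ids": [4]},
    {"sex": "F", "birth_year": 1915, "death_year": 1990,
     "birth_country": "France", "death_country": "France",
     "parent_ids": [5], "child_ids": []},
    {"sex": None, "birth_year": 1800, "death_year": 1850,
     "birth_country": "Finland", "death_country": "Finland",
     "parent_ids": [6], "child_ids": []},
]

sample = [p for p in profiles if keep_profile(p)]
```

In this toy input only the first record survives: the second is a migrant, the third was born after 1910, and the fourth has a missing sex.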

We do not consider individuals born after 1910, since Kaplanis et al. (2018) omitted profiles that were still alive as of 2015 due to privacy-related concerns. Hence, we can only include individuals from birth cohorts that were almost surely extinct in 2015. The earliest year of analysis is 1751 since it represents the year in which accurate demographic data started to become available for at least one country, i.e., Sweden, among those included in the analysis.³ Additionally, the age restriction enables us to omit individuals with biologically implausible ages at death. We exclude migrants, defined in this study as individuals with distinct birth and death countries, due to the lack of information on the exact time of migration.

Figure 1 displays the age–sex distributions of the genealogy-based populations in selected years and countries using population pyramids. The thick black lines indicate the true age–sex distribution when accurate population estimates are available. Overall, Figure 1 does not show a clear convergence in the extent to which the age–sex distributions of online genealogical populations resemble the ones obtained from more reliable data sources.⁴ The genealogy-based age–sex distributions for the US and Sweden seem to become more representative toward the end of the 19th century. In contrast, in England & Wales the age–sex structure of the genealogical population appears to mirror the true one more closely at the beginning of the study period and to worsen in the 19th century. Nonetheless, a general tendency of genealogical data to under-represent women and children, albeit with distinct magnitudes, is evident across the entire historical period under examination for all countries.

Figure 1.

Genealogy-based and expected population counts by age and sex for selected calendar years in Sweden, England & Wales and the United States of America. The solid lines refer to the true age–sex distribution based on more reliable data sources. Bars on the left represent females, whereas bars on the right represent males.

Methodological Motivation and Background

Online genealogical data from FamiLinx pose significant challenges to the traditional calculation of the TFR. Traditionally, TFR is derived by summing age-specific fertility rates (ASFRs), where each ASFR is calculated as the number of births to women in a given age group divided by the number of women at risk of childbearing in that same age group. However, FamiLinx contains a high proportion of incomplete family trees and a substantial percentage of individuals with missing demographic information, including birth and death years and countries (Colasurdo and Omenti 2024). As a result, many individuals either lack a recorded mother or have a recorded mother with a missing birth year, making it difficult to determine the corresponding age at childbearing.
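To make the traditional calculation concrete, the sketch below computes the TFR from births and women at risk by maternal age group. It assumes 5-year age groups (15–19 through 45–49), so each ASFR is weighted by the group width of 5; the counts are invented for illustration and are not drawn from FamiLinx or any other source.

```python
def total_fertility_rate(births, women, width=5):
    """Sum age-specific fertility rates, each weighted by the age-group width."""
    if len(births) != len(women):
        raise ValueError("births and women must cover the same age groups")
    asfr = [b / w for b, w in zip(births, women)]  # births per woman-year
    return width * sum(asfr)


# Illustrative counts for the age groups 15-19, 20-24, ..., 45-49.
births = [30, 180, 240, 190, 120, 50, 5]
women = [1000, 1000, 1000, 1000, 1000, 1000, 1000]

tfr = total_fertility_rate(births, women)
```

The calculation needs every birth to be assigned to a maternal age group, which is exactly the information that is often missing in genealogical records.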

As shown in Figure 2, during the historical period 1751–1910, a substantial share of children under age 5 in the country-specific analytical samples from FamiLinx lacks information on maternal age.⁵ Although there is some variation across countries, these percentages consistently exceed 50% throughout the entire historical period. Consequently, calculating the TFR using the standard demographic method becomes unfeasible. In addition, these issues result in severely biased TFR estimates that exhibit no discernible patterns over time, as displayed in Figure A4 in the online supplement. This underscores the challenge of employing traditional demographic methods for the TFR computation from online genealogical populations.

Figure 2.

Percentage of children aged 0–4 in the selected FamiLinx sample who lack maternal age at birth. This includes both children without a recorded mother and those with a recorded mother whose age at childbirth is unavailable. In agreement with our initial assumption, only children who were born and died in the same country were included.

To overcome this issue, we build on the indirect estimation framework by Schmertmann and Hauer (2019) and Hauer and Schmertmann (2020), both of which enable the estimation of the TFR without requiring the birth counts classified by maternal ages. Specifically, these methods rely on the number of children under age 5 and the number of women aged 15–49 as an indirect means to obtain TFR estimates. A key assumption of these approaches is that the population pyramid derived from the data source of interest accurately mirrors the true population structure. In particular, the ratio of children under age 5 to women aged 15–49 should closely match the one observed in the general population of interest. Nevertheless, online genealogical data, such as FamiLinx, do not meet this assumption due to numerous types of biases, as previously displayed in Figure 1. To correct for these distortions, we integrate the number of children under age 5 and women aged 15–49 from FamiLinx with estimates from more trustworthy data sources.

An additional challenge is the limited availability of reliable historical demographic data, including TFR estimates and population counts by age and sex, across the entire historical period for most countries, with the exception of Sweden and England & Wales. To address this gap, we incorporate a bias-adjustment factor, which corrects the number of children under age 5 per woman aged 15–49 derived from FamiLinx based on more reliable sources when available. When these data are not available, biases are informed by patterns observed either in other countries with reliable population data of broader temporal coverage or in periods within the same country where reliable population data are available.

Overall, by combining the complete temporal coverage of online genealogical populations with more reliable but sparser historical demographic data, our proposed method provides annual TFR estimates for the historical period 1751–1910, while also extending the indirect estimation framework by Schmertmann and Hauer (2019) and Hauer and Schmertmann (2020) to contexts with limited availability of accurate population data.

Indirect Estimation and Extension to Adjust for Biases

In this section, we describe our proposed extension to the indirect method for the TFR estimation by Hauer and Schmertmann (2020). Our objective is to modify the original set of indirect TFR indicators by Hauer and Schmertmann (2020) to account for the fact that population pyramids extracted from online genealogical data are biased.

Background

Hauer and Schmertmann (2020) proposed to decompose the TFR as a product of three major factors.

$${iTFR}_{a, t}^{+} = \frac{1}{p_{a, t}} \cdot \frac{1}{s_{a, t}} \cdot \frac{C_{a, t}^{(gen)}}{W_{a, t}^{(gen)}}$$

(1)

The previous equation states that the TFR calculated for a country $a$ in year $t$ can be factorized into three major components:

the ratio of children aged 0–4 ( $C_{a, t}^{(gen)}$ ) to the number of women aged 15–49 ( $W_{a, t}^{(gen)}$ ) or child–woman ratio (CWR)

a multiplier for the child survival $(1 / s_{a, t})$

a multiplier for the age distribution of mothers at childbearing $(1 / p_{a, t})$

In this framework, $(1 / s_{a, t})$ and $(1 / p_{a, t})$ are treated as numerical constants derived from different data sources, while in the Bayesian modeling framework, they will be generated from statistical models and probability distributions.

Hauer and Schmertmann (2020) have set $(\frac{1}{s_{a, t}})$ to be equal to $(\frac{1}{1 - 0.75 \cdot q_{0 - 4, a, t}})$ , where $q_{0 - 4, a, t}$ denotes the probability of dying under age 5 in country a and year t. By changing the approximation for $(\frac{1}{p_{a, t}})$ , Hauer and Schmertmann (2020) have identified two major classes of indicators. The first set of indicators is called implied total fertility rate ( ${iTFR}^{+}$ ) with $(\frac{1}{p_{a, t}} \approx 7)$ and assumes that fertility levels are constant across the maternal age groups. The second measure is named extended total fertility rate $({xTFR}^{+})$ and allows $(\frac{1}{p_{a, t}})$ to be different from $7$ and to depend on the proportion of women aged 25–34 ( $π_{25 - 34, a, t}$ ) in the age pyramid, specifically $(\frac{1}{p_{a, t}})$ is approximated by $10.65 - 12.55 π_{25 - 34, a, t}$ .

In general, the foundational assumption underlying Equation 1 is that the number of children under age 5, as observed in an age pyramid for country $a$ in year $t$ , following adjustments for mortality $(\frac{1}{s_{a, t}})$ and fertility age patterns, can serve as a reliable proxy for recent births to women aged 15–49 within the same population pyramid. Additionally, this indirect estimation method relies on the assumption that the CWR calculated from the available data accurately reflects the ratio observed in the general population.
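As a worked illustration of the two indicators, the sketch below evaluates the implied and extended variants using the constants quoted in the text (the survival multiplier $1 / (1 - 0.75 \cdot q_{0-4})$, the factor 7, and the approximation $10.65 - 12.55 π_{25-34}$). The function names and input values are hypothetical, and the share of women aged 25–34 is taken here among women aged 15–49, which is an assumption about how the proportion is defined.

```python
def survival_multiplier(q5):
    """1 / s: inflates surviving children to births, given under-5 mortality q5."""
    return 1.0 / (1.0 - 0.75 * q5)


def itfr_plus(children, women, q5):
    """Implied TFR: age-pattern multiplier 1 / p fixed at 7."""
    return 7.0 * survival_multiplier(q5) * children / women


def xtfr_plus(children, women, q5, pi_25_34):
    """Extended TFR: 1 / p depends on the share of women aged 25-34."""
    age_multiplier = 10.65 - 12.55 * pi_25_34
    return age_multiplier * survival_multiplier(q5) * children / women


# Illustrative inputs: 600 children aged 0-4 per 1,000 women aged 15-49.
itfr = itfr_plus(children=600, women=1000, q5=0.30)
xtfr = xtfr_plus(children=600, women=1000, q5=0.30, pi_25_34=0.30)
```

With these inputs, the extended multiplier (6.885) is slightly below 7, so the extended indicator comes out marginally lower than the implied one.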

Indirect Bias-Adjustment Multiplier

Online genealogies suffer from multiple biases that give rise to distorted population pyramids. To address this issue, we propose an extension to the decomposition in Equation 1 by introducing a bias-adjustment multiplier ( $r_{a, t}$ ), which corrects for these discrepancies between genealogy-based and true CWRs.

$${TFR}_{a, t} = \frac{1}{r_{a, t}} \cdot \frac{1}{p_{a, t}} \cdot \frac{1}{s_{a, t}} \cdot \frac{C_{a, t}^{(gen)}}{W_{a, t}^{(gen)}}$$

(2)

The bias-adjustment multiplier $r_{a, t}$ measures the extent to which the CWR in online genealogical populations deviates from the CWRs calculated from more reliable data sources. Values of $r_{a, t}$ greater than 1 indicate an overestimation of the CWR in online genealogies, while values less than 1 indicate an underestimation of the genealogy-based CWR.

For countries with reliable historical population counts by age and sex available from a certain year $t^{*}$ onward, $r_{a, t}$ is calculated as the ratio of the genealogy-based CWR to the true CWR, namely:

$$r_{a, t} = \frac{C_{a, t}^{(gen)} / W_{a, t}^{(gen)}}{C_{a, t}^{(true)} / W_{a, t}^{(true)}} \quad \text{if } t \geq t^{*}$$

(3)

For years where reliable population counts by age and sex are unavailable, we predict the values of $r_{a, t}$ using Generalized Additive Models (GAM), which allow for the flexible data-driven estimation of temporal trends while incorporating country-specific effects. Specifically, we fit the following regression models separately for each country with partial availability of accurate population counts by age and sex (i.e., Denmark, Finland, France, the Netherlands, Norway, and the United States):

$$\log (r_{a, t}) = β_{0} + f (t) + ν_{a} + ϵ_{a, t}$$

(4)

where $β_{0}$ is the intercept, $f (t)$ is a smooth function of time $t$, which uses penalized B-splines as a basis, $ν_{a}$ accounts for country-specific random effects and $ϵ_{a, t}$ is the residual error.

For each country with only partial availability of accurate population data, we fit the previous model using: (1) data from Sweden, the only country with complete and reliable time series of full age- and sex-specific population counts covering the entire study period, from the earliest available year (i.e., 1751) up to year $t^{*} + 10$ and (2) the first ten years of accurate population data available for the country of interest (i.e., from $t^{*}$ to $t^{*} + 10$ ). By restricting the time window to the period from 1751 to $t^{*} + 10$ , we ensure that the model’s predictions for the bias-adjustment multipliers remain consistent with the patterns observed in the initial years of reliable data availability, thus reducing the risk of implausible extrapolations for earlier periods.

The predicted values of $r_{a, t}$ for the years prior to $t^{*}$ are given by:

$${\hat{r}}_{a, t} = \exp ({\hat{β}}_{0} + \hat{f} (t) + {\hat{ν}}_{a}) \quad \text{if } t < t^{*}$$

(5)

By applying the bias-adjustment multiplier $r_{a, t}$ to the decomposition proposed by Hauer and Schmertmann (2020), we define two new bias-adjusted TFR indicators: bias-adjusted implied TFRs ( ${iTFR}^{*}$ ) and bias-adjusted extended TFRs ( ${xTFR}^{*}$ ):

$${iTFR}^{*} = \frac{1}{r_{a, t}} \cdot 7 \cdot (\frac{1}{1 - 0.75 \cdot q_{0 - 4, a, t}}) \cdot \frac{C_{a, t}^{(gen)}}{W_{a, t}^{(gen)}}$$

(6)

$${xTFR}^{*} = \frac{1}{r_{a, t}} \cdot (10.65 - 12.55 π_{25 - 34, a, t}) \cdot (\frac{1}{1 - 0.75 \cdot q_{0 - 4, a, t}}) \cdot \frac{C_{a, t}^{(gen)}}{W_{a, t}^{(gen)}}$$

(7)
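The bias-adjusted indicators can be sketched in a few lines. The example below computes the multiplier of Equation 3 from a genealogy-based and a reference child–woman ratio and then applies Equations 6 and 7. All function names and counts are hypothetical. For years before $t^{*}$, the multiplier would instead come from the fitted GAM, which is not reproduced here.

```python
def child_woman_ratio(children, women):
    """Children aged 0-4 per woman aged 15-49."""
    return children / women


def bias_multiplier(c_gen, w_gen, c_true, w_true):
    """Equation 3: genealogy-based CWR divided by the reference CWR."""
    return child_woman_ratio(c_gen, w_gen) / child_woman_ratio(c_true, w_true)


def bias_adjusted_tfr(c_gen, w_gen, q5, r, pi_25_34=None):
    """Equation 6 when pi_25_34 is None, Equation 7 otherwise."""
    age_multiplier = 7.0 if pi_25_34 is None else 10.65 - 12.55 * pi_25_34
    survival_multiplier = 1.0 / (1.0 - 0.75 * q5)
    cwr_gen = child_woman_ratio(c_gen, w_gen)
    return (1.0 / r) * age_multiplier * survival_multiplier * cwr_gen


# Illustrative counts: the genealogy records half as many children per woman
# as the reference population, so r = 0.5.
r = bias_multiplier(c_gen=120, w_gen=400, c_true=60_000, w_true=100_000)
itfr_star = bias_adjusted_tfr(120, 400, q5=0.30, r=r)
xtfr_star = bias_adjusted_tfr(120, 400, q5=0.30, r=r, pi_25_34=0.30)
```

When the multiplier is computed directly from Equation 3, dividing by it restores the reference child–woman ratio, so the adjusted indicator equals the unadjusted indicator evaluated on the reference counts.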

Bayesian Model

In this section, we describe the proposed Bayesian model, which extends the Bayesian modeling framework originally developed by Schmertmann and Hauer (2019) to allow for the indirect estimation of the TFR from non-representative populations. In comparison to the previous indirect estimation method, our proposed Bayesian model allows for uncertainty assessment around the TFR, which is essential in contexts with imperfect data, and for the coherent integration of multiple data sources.

Data Model

Drawing inspiration from the modeling framework by Schmertmann and Hauer (2019), we propose a Bayesian hierarchical model to measure the TFR using the number of children under age 5 and the number of women aged 15–49, while accounting for child mortality, for standard age-specific fertility schedules and for the non-representativeness of online genealogical populations.

Specifically, we model simultaneously the number of children aged 0–4 ( $C_{a, t}^{(gen)}$ ) in the genealogical population of country $a$ in year $t$ and the true number of children ( $C_{a, t}^{(true)}$ ) based on the true age–sex distribution for country $a$ and year $t$ . Data on the number of children aged 0–4 in the genealogical populations ( $C_{a, t}^{(gen)}$ ) are available for all countries and across the entire historical period. In contrast, data on the true number of children aged 0–4 ( $C_{a, t}^{(true)}$ ) are only partially available, as accurate annual age- and sex-specific population counts are lacking for most of the countries analyzed during the entire period 1751–1910.

We assume that both variables are generated from a Poisson distribution:

$$C_{a, t}^{(gen)} \mid K_{x, a, t}, θ_{a, t} \sim \text{Pois} \left( \sum_{x = 15}^{45} K_{x, a, t} \cdot W_{x, a, t}^{(gen)} \cdot θ_{a, t} \cdot e^{X_{a, t} \cdot β^{(shock)}} \right), \quad t \in T_{a}^{(gen)}, \; a \in A$$

(8)

$$C_{a, t}^{(true)} \mid K_{x, a, t} \sim \text{Pois} \left( \sum_{x = 15}^{45} K_{x, a, t} \cdot W_{x, a, t}^{(true)} \cdot e^{X_{a, t} \cdot β^{(shock)}} \right), \quad t \in T_{a}^{(true)}, \; a \in A$$

(9)

where the term $W_{x, a, t}^{(gen)}$ is the number of women in age group $x$ in country $a$ and year $t$ based on the genealogical sample, while the term $W_{x, a, t}^{(true)}$ denotes the true number of women in the maternal ages based on the true age–sex distribution of country $a$ in year $t$ . Both of the previous terms are treated as offsets. The term $K_{x, a, t}$ is the expected number of children aged 0–4 per woman in the maternal age group $x$ during year $t$ . In this article, we assume that fertility outside the age interval $[15, 50)$ is zero and that maternal ages are split into 5-year age groups. The additional term $θ_{a, t}$ indicates the extent to which the expected number of children under age 5 per woman aged 15–49 is biased in comparison to the true expected number of children under age 5 per woman aged 15–49. The term $e^{X_{a, t} \cdot β^{(shock)}}$ captures the impact of the American Civil War (1861–1865) on the expected number of children under age 5 per woman aged 15–49. The term $β^{(shock)}$ denotes the effect of the American Civil War on fertility, while the term $X_{a, t}$ is an indicator variable, which is equal to 1 for the US only during the period of the Civil War and 0 otherwise. This adjustment is introduced due to the lack of annual accurate US population counts by age and sex during the decade 1860–1870.

The term $A$ denotes the set of countries considered for the analysis, while the terms $T_{a}^{(gen)}$ and $T_{a}^{(true)}$ indicate the set of calendar years for which genealogy-based and true population counts by age and sex are available for country $a$ . The term $T_{a}^{(gen)}$ spans over the temporal period 1751–1910 for all the countries under analysis. The term $T_{a}^{(true)}$ does not cover the entire period and varies across countries as official population estimates by age and sex for most countries are not available for the entire historical period.
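To show how the two likelihood terms in Equations 8 and 9 fit together, the sketch below evaluates the Poisson rates and log-probabilities for a single country-year. It is only an illustration of the data model, not the estimation procedure: the function names, the values of $K_{x, a, t}$, the women counts, and the value of $θ_{a, t}$ are all invented.

```python
import math


def expected_children(K, W, theta=1.0, beta_shock=0.0, shock_indicator=0):
    """Poisson rate: sum_x K_x * W_x, scaled by the bias and shock terms."""
    base = sum(k * w for k, w in zip(K, W))
    return base * theta * math.exp(shock_indicator * beta_shock)


def poisson_logpmf(count, rate):
    """Log-probability of observing `count` under a Poisson with mean `rate`."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


# Hypothetical values for the age groups 15-19, ..., 45-49 in one country-year.
K = [0.05, 0.40, 0.70, 0.65, 0.45, 0.20, 0.03]   # expected children 0-4 per woman
W_gen = [50, 60, 70, 60, 50, 40, 30]             # women in the genealogical sample
W_true = [9000, 8500, 8000, 7500, 7000, 6500, 6000]  # women in the reference counts

# Equation 8: the genealogical rate carries the bias term theta.
rate_gen = expected_children(K, W_gen, theta=0.5)
# Equation 9: the reference rate has no bias term.
rate_true = expected_children(K, W_true)

# Joint log-likelihood contribution of one country-year with both sources.
log_lik = poisson_logpmf(70, rate_gen) + poisson_logpmf(20000, rate_true)
```

Because both counts share the same $K_{x, a, t}$, years in which reference counts exist are the ones that let the bias term be separated from the fertility signal.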

Figure 3 displays a graphical summary of the proposed Bayesian modeling framework. A detailed description of the model parameters is provided in the following subsections.

Figure 3.

Graphical representation of the Bayesian modeling framework. Primitive parameters (circles) denote the fundamental parameters in a model that are directly assigned a prior probability distribution. Derived parameters (rhombi) are functions of primitive parameters and do not have a prior probability distribution directly assigned to them. Input data are represented with rectangles.

Construction of $K_{x, a, t}$

The term $K_{x, a, t}$ follows from a simple rearrangement of the Leslie matrix formulas (Wachter 2014). However, unlike in standard cohort-component projection methods, this term defines the age of the mother as being attained at the end of the age interval rather than at the beginning.

Following the parametrization of Schmertmann and Hauer (2019), the component $K_{x, a, t}$ is defined as follows.

$$K_{x, a, t} = {TFR}_{a, t} \cdot \frac{L_{0, a, t}}{5} \cdot \frac{1}{2} \cdot \left[ \frac{L_{x - 5, a, t}}{L_{x, a, t}} \cdot ϕ_{x - 5, a, t} + ϕ_{x, a, t} \right]$$

(10)

where the term ${TFR}_{a, t}$ denotes the TFR in country $a$ during year $t$ , $L_{x, a, t}$ indicates the person-years lived by women in the age group $x$ during year $t$ in country $a$ , and $ϕ_{x, a, t}$ denotes the fraction of fertility experienced by women in age group $x$ during year $t$ in country $a$ . Mathematically, the parameter $ϕ_{x, a, t}$ is equal to the term $5 \cdot \frac{F_{x, a, t}}{{TFR}_{a, t}}$ , where the parameter $F_{x, a, t}$ is the fertility rate in age group $x$ , year $t$ and country $a$ . The latter quantity is assumed to be zero outside the age interval $[15, 50)$ . The implementation of a Bayesian modeling framework allows for the specification of statistical models and prior probability distributions to estimate the previous parameters.
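Equation 10 can be evaluated directly once the TFR, the person-years, and the fertility shares are given. The sketch below is a minimal illustration with invented inputs: the function name and the dictionary layout keyed by the starting age of each 5-year group are assumptions, and the fertility share of the age group 10–14 is set to zero, in line with the assumption that fertility is zero outside $[15, 50)$.

```python
AGES = [15, 20, 25, 30, 35, 40, 45]  # starting ages of the maternal age groups


def expected_children_per_woman(tfr, L0, L, phi):
    """Equation 10 for every maternal age group.

    tfr: total fertility rate
    L0:  person-years lived in ages 0-4 (radix 1, so at most 5)
    L:   dict {age: person-years lived by women}, ages 10 through 45
    phi: dict {age: share of lifetime fertility}, ages 15 through 45
    """
    K = {}
    for x in AGES:
        phi_prev = phi.get(x - 5, 0.0)  # zero fertility below age 15
        K[x] = tfr * (L0 / 5.0) * 0.5 * (L[x - 5] / L[x] * phi_prev + phi[x])
    return K


# Hypothetical fertility shares and person-years for one country-year.
phi = {15: 0.05, 20: 0.20, 25: 0.27, 30: 0.23, 35: 0.15, 40: 0.08, 45: 0.02}
L = {10: 3.60, 15: 3.50, 20: 3.40, 25: 3.30, 30: 3.20, 35: 3.10, 40: 3.00, 45: 2.90}

K = expected_children_per_woman(tfr=5.0, L0=3.8, L=L, phi=phi)
```

A useful sanity check is the mortality-free case (`L0 = 5` and equal person-years at all ages), where the values of `K` sum to the TFR times one minus half of the oldest group's fertility share.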

Model for Age-Specific Fertility

Following Schmertmann and Hauer (2019), to incorporate knowledge about the age-specific fertility patterns, we model the ratio of the share of lifetime fertility in an age group $x$ to the share of lifetime fertility in the earliest reproductive age group $15 - 19$ on the log scale as

$$γ_{a, t} = m + y_{1} β_{1, a, t} + y_{2} β_{2, a, t}$$

(11)

where the parameter $γ_{x, a, t} = \log (\frac{ϕ_{x, a, t}}{ϕ_{15, a, t}})$ is an index defined as the log transformation of the ratio of the share of lifetime fertility in age group $x$ to the share of lifetime fertility in age group $15 - 19$ . The term $γ_{a, t}$ is a vector whose elements $γ_{x, a, t}$ are defined at the distinct reproductive age groups.⁶ The transformation of $ϕ_{x, a, t}$ ensures that the elements of the vector $γ_{a, t}$ on the left-hand side of the above equation can assume both positive and negative values.⁷

The terms $y_{1}$ and $y_{2}$ are components derived from a set of standard age-specific fertility curves. In particular, the term $m$ is a vector containing the age-specific means of the log-transformed fertility schedules ( $γ_{x, a, t}$ ), while the terms $y_{1}$ and $y_{2}$ are the first and second left-singular vectors, which are obtained via a singular value decomposition of the matrix $Y$ whose columns are log-transformed age-specific fertility schedules. For example, in our application to online genealogical data we employ all the national age-specific female fertility curves available from the Human Fertility Collection (Grigorieva et al. 2015) covering the historical period 1751–1910.

We place standard normal priors on the parameters $β_{1, a, t}$ and $β_{2, a, t}$:

$$β_{1, a, t}, β_{2, a, t} \sim N (0, 1) \tag{12}$$

From the parameter $γ_{x, a, t}$, we can easily derive the parameter $ϕ_{x, a, t}$, which can be interpreted as the proportion of fertility experienced by women in age group $x$ during year $t$ in country $a$:

$$ϕ_{x, a, t} = \frac{\exp (γ_{x, a, t})}{\sum_{x = 15}^{45} \exp (γ_{x, a, t})} \tag{13}$$
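The mapping from the two coefficients to the age-specific fertility shares in Equations 11 and 13 can be sketched as follows. This is only an illustration: the vectors `m`, `y1`, and `y2` below are hypothetical placeholders rather than quantities estimated from the Human Fertility Collection, and the function name is ours. The first element of each vector is set to zero because $γ_{15, a, t} = 0$ by definition.

```python
import math

# Start ages of the seven five-year reproductive age groups (15-19, ..., 45-49).
AGES = [15, 20, 25, 30, 35, 40, 45]

def fertility_shares(m, y1, y2, beta1, beta2):
    """Map (beta1, beta2) to the fertility shares phi of Eq. 13 via Eq. 11."""
    gamma = [m_x + y1_x * beta1 + y2_x * beta2
             for m_x, y1_x, y2_x in zip(m, y1, y2)]
    # Subtract the maximum before exponentiating for numerical stability;
    # this leaves the normalised shares unchanged.
    g_max = max(gamma)
    weights = [math.exp(g - g_max) for g in gamma]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical inputs (NOT the values used in the article).
m = [0.0, 1.6, 1.9, 1.8, 1.5, 0.9, -0.8]
y1 = [0.0, 0.1, 0.3, 0.5, 0.6, 0.7, 0.8]
y2 = [0.0, -0.4, -0.2, 0.1, 0.3, 0.4, 0.2]

phi = fertility_shares(m, y1, y2, beta1=0.5, beta2=-0.3)
```

By construction the shares are positive and sum to one, so multiplying them by a total fertility rate distributes it across the seven age groups.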

Prior and Model on Total Fertility Rates

We assign the parameters ${TFR}_{a, t}$ a normal distribution centered around the corresponding historical TFR estimate, as long as reliable historical TFR values ${TFR}_{a, t}^{(hist)}$ are available. When these historical values are not available, TFR parameters are back-projected using a first-order random walk.

$$\begin{cases} {TFR}_{a, t} \sim N ({TFR}_{a, t}^{(hist)}, σ_{{TFR}_{a}}^{2}) & t \geq t^{*} \\ {TFR}_{a, t} \sim N ({TFR}_{a, t + 1}, σ_{{TFR}_{a}}^{2}) & t < t^{*} \end{cases} \tag{14}$$

where $t^{*}$ denotes the earliest year in which a historical $TFR$ estimate is available for country $a$. The variance parameter $σ_{{TFR}_{a}}^{2}$ is assigned a weakly informative prior:

$$σ_{{TFR}_{a}}^{2} \sim \text{Inv-Gamma} (1, 1) \tag{15}$$

The practical implication of this choice is that the marginal posterior distribution of the TFR is mostly determined by the observed data rather than by the historical $TFR$ values⁸.
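To make the structure of the prior in Equation 14 concrete, the sketch below draws one TFR trajectory for a single country. It assumes, for illustration only, that a historical value exists for every year from $t^{*}$ onward. The function name, the example years, and the TFR values are hypothetical.

```python
import random

def draw_tfr_prior(hist, first_year, last_year, sigma, rng):
    """Draw one TFR trajectory from the prior in Eq. 14.

    hist maps year -> historical TFR. Its earliest key plays the role of t*,
    and it is assumed to cover every year from t* through last_year.
    """
    t_star = min(hist)
    tfr = {}
    # Years with a benchmark: centred on the historical value.
    for t in range(t_star, last_year + 1):
        tfr[t] = rng.gauss(hist[t], sigma)
    # Earlier years: first-order random walk running backwards in time,
    # each year centred on the value of the following year.
    for t in range(t_star - 1, first_year - 1, -1):
        tfr[t] = rng.gauss(tfr[t + 1], sigma)
    return tfr

# Hypothetical benchmarks for three years (not taken from the article).
hist = {1800: 4.5, 1801: 4.4, 1802: 4.6}
trajectory = draw_tfr_prior(hist, 1795, 1802, sigma=0.1, rng=random.Random(1))
```

Because each back-projected year depends only on its successor, the prior uncertainty accumulates the further one moves before $t^{*}$.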

Model and Priors for Age-Specific Mortality

The model for the term $K_{x, a, t}$ also requires the estimation of the life table person-years $L_{x, a, t}$. Hence, we model child and adult mortality using the two-dimensional log-quadratic mortality model by Wilmoth et al. (2012). In this model, the logarithm of the age-specific mortality risk ( $μ_{x, a, t}$ ) is a function of two main parameters, $q_{0 - 4, a, t}$ and $κ_{a, t}$ . The term $q_{0 - 4, a, t}$ indicates the probability of dying under age 5, while the term $κ_{a, t}$ is a parameter affecting the shape of the age pattern of mortality. This model can be written as follows.

$$\log (μ_{x, a, t}) = a_{x} + b_{x} \log (q_{0 - 4, a, t}) + c_{x} [\log (q_{0 - 4, a, t})]^{2} + d_{x} κ_{a, t} \tag{16}$$

The terms $a_{x}, b_{x}, c_{x}, d_{x}$ are age-specific fixed constants derived from various age-specific mortality schedules in the Human Mortality Database. The term $μ_{x, a, t}$ indicates the risk of dying in age group $x$ at time $t$ in country $a$.
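As a small illustration of Equation 16, the function below evaluates the mortality risk for a single age group. The coefficient values are hypothetical placeholders, not the published constants of the log-quadratic model, and the function name is ours.

```python
import math

def log_quadratic_mu(a_x, b_x, c_x, d_x, q_child, kappa):
    """Age-specific mortality risk implied by Eq. 16 for one age group."""
    h = math.log(q_child)  # log of the probability of dying under age 5
    return math.exp(a_x + b_x * h + c_x * h ** 2 + d_x * kappa)

# Hypothetical coefficients for one age group (NOT the published values).
mu_example = log_quadratic_mu(a_x=-3.0, b_x=0.8, c_x=0.05, d_x=0.3,
                              q_child=0.25, kappa=0.0)
```

With $κ_{a, t} = 0$ the age pattern depends on child mortality alone. Non-zero values of $κ_{a, t}$ shift the risk in proportion to $d_{x}$.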

The model parameters $q_{0 - 4, a, t}$ are assigned a Beta prior:

$$q_{0 - 4, a, t} \sim Beta (a ({\hat{q}}_{0 - 4, a, t}), b ({\hat{q}}_{0 - 4, a, t})) \tag{17}$$

where the terms $a ({\hat{q}}_{0 - 4, a, t})$ and $b ({\hat{q}}_{0 - 4, a, t})$ are chosen so that $P (0.9 \cdot {\hat{q}}_{0 - 4, a, t} \leq q_{0 - 4, a, t} \leq 1.1 \cdot {\hat{q}}_{0 - 4, a, t}) = 0.9$. This ensures that the probability of death for children under age $5$ in country $a$ and year $t$ lies within $10\%$ of its estimated value with $90\%$ probability.⁹
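One way to obtain Beta parameters satisfying this condition is sketched below. It rests on an assumption that is ours rather than the article's: the prior mean is fixed at the estimate $\hat{q}$, so that only the concentration $a + b$ has to be found. The search uses bisection, with the interval probability computed by numerical integration. The function names are hypothetical.

```python
import math

def beta_interval_mass(a, b, lo, hi, steps=2000):
    """P(lo <= q <= hi) for q ~ Beta(a, b), via composite Simpson integration."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(q):
        return math.exp(log_norm + (a - 1) * math.log(q) + (b - 1) * math.log(1 - q))

    width = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = pdf(lo) + pdf(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(lo + i * width)
    return total * width / 3

def calibrate_beta(q_hat, mass=0.9, rel_width=0.1):
    """Find (a, b) with mean q_hat and the requested mass within +/- rel_width."""
    lo, hi = (1 - rel_width) * q_hat, (1 + rel_width) * q_hat
    n_low, n_high = 2.0, 1e7  # assumed search range for the concentration a + b
    for _ in range(60):
        n_mid = math.sqrt(n_low * n_high)  # bisection on the log scale
        if beta_interval_mass(q_hat * n_mid, (1 - q_hat) * n_mid, lo, hi) < mass:
            n_low = n_mid
        else:
            n_high = n_mid
    n = math.sqrt(n_low * n_high)
    return q_hat * n, (1 - q_hat) * n

# Hypothetical estimate of the probability of dying under age 5.
a, b = calibrate_beta(0.25)
```

A larger concentration gives a tighter prior, so the interval probability grows with $a + b$ and bisection is sufficient. The sketch requires $1.1 \cdot \hat{q} < 1$.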

The parameters $κ_{a, t}$ are assigned a standard normal prior:

$$κ_{a, t} \sim N (0, 1) \tag{18}$$

In order to derive the age-specific life table person-years, we employ well-established demographic relationships. Specifically, we know that the survival column from an abridged life table, $l_{x, a, t}$, can be written as a function of the age-specific mortality risk $μ_{x, a, t}$:

$$l_{x, a, t} = \begin{cases} 1 & \text{if } x = 0 \\ e^{- μ_{0, a, t}} & \text{if } x = 1 \\ l_{1, a, t} \cdot e^{- 4 μ_{1, a, t}} & \text{if } x = 5 \\ l_{x - 5, a, t} \cdot e^{- 5 μ_{x - 5, a, t}} & \forall x > 5 \end{cases} \tag{19}$$

Similarly, the life table person-years can be determined directly from the age-specific survival rates:

$$L_{x, a, t} = \begin{cases} \frac{1}{2} \cdot (l_{0, a, t} + l_{1, a, t}) + \frac{4}{2} \cdot (l_{1, a, t} + l_{5, a, t}) & x = 0 \\ \frac{5}{2} \cdot (l_{x, a, t} + l_{x + 5, a, t}) & \forall x \geq 5 \end{cases} \tag{20}$$
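Equations 19 and 20 translate directly into a short routine. The sketch below is illustrative: the function name and the mortality risks in the example are hypothetical, and the age groups are assumed to start at 0, 1, 5, 10, and so on without gaps.

```python
import math

def life_table(mu):
    """Survivorship l_x and person-years L_x from age-specific risks.

    mu maps the start age of each group (0, 1, 5, 10, ...) to its mortality
    risk. Returns two dicts keyed by age, following Eqs. 19 and 20.
    """
    ages = sorted(mu)
    l = {0: 1.0}
    for x in ages:
        # Group widths of the abridged life table: 0, 1-4, then 5-year groups.
        width = 1 if x == 0 else 4 if x == 1 else 5
        l[x + width] = l[x] * math.exp(-width * mu[x])
    # Person-years lived under age 5 combine the groups 0 and 1-4 (Eq. 20).
    L = {0: 0.5 * (l[0] + l[1]) + 2.0 * (l[1] + l[5])}
    for x in ages:
        if x >= 5:
            L[x] = 2.5 * (l[x] + l[x + 5])
    return l, L

# Hypothetical mortality risks (not estimates from the article).
mu = {0: 0.15, 1: 0.03, 5: 0.005, 10: 0.003, 15: 0.005}
l, L = life_table(mu)
```

With zero mortality every group contributes its full width in person-years, which gives a quick sanity check of the implementation.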

Model for the Bias-Adjustment Multiplier

The modeling framework by Schmertmann and Hauer (2019) relies on the assumption that the number of children under age 5 per woman aged 15–49, that is, the child–woman ratio (CWR), observed in the sample closely reflects that of the general population. This assumption is violated in online genealogical populations, where deviations from the age–sex distribution of the true population (Colasurdo and Omenti 2024) lead to implausible values for this ratio and, consequently, to biased $TFR$ estimates.

To address this issue, we introduce a country- and year-specific multiplier $θ_{a, t}$ , which captures the extent to which the CWRs from FamiLinx under- or overestimate those of the true population. This multiplier, positive by construction, is modeled on the log scale with a penalized spline regression model (Currie and Durban 2002; Eilers and Marx 1996):

$$\log (θ_{a, t}) = \sum_{p = 1}^{P} B_{a, p} (t) \cdot α_{a, p} \tag{21}$$

$B_{a, p} (t)$ refers to the p-th B-spline in country $a$ evaluated at time $t$ and $α_{a, p}$ indicates the p-th spline coefficient for country $a$ . Each country $a$ has $P$ knot points defined by $t_{1} < t_{2} < \dots < t_{P}$ . $P$ is the number of B-splines needed to cover the historical period from 1751 to 1910. To ensure consistency across countries, the spline knot spacing was fixed at 10-year intervals.¹⁰

Similarly to Alexander and Alkema (2018), spline coefficients $α_{a, p}$ are modeled as a linear combination of a country-specific intercept, $λ_{a}$ , and $P - 1$ first-order differences between adjacent spline coefficients:

$$Δ_{a} = (α_{a, 2} - α_{a, 1}, α_{a, 3} - α_{a, 2}, \dots, α_{a, P} - α_{a, P - 1}) \tag{22}$$

The country-specific intercept is assigned a normal distribution centered around $0$ and accounts for time-invariant country-specific factors that may influence the divergence of the genealogy-based CWR from the true CWR in a certain country:

$$λ_{a} \sim N (0, σ_{λ}^{2}) \tag{23}$$

The terms $Δ_{a, p}$ account for deviations from the country-specific intercept $λ_{a}$ and are modeled as follows:

$$Δ_{a, p} \sim N (0, σ_{Δ_{a}}^{2}) \tag{24}$$

The variance term $σ_{Δ_{a}}^{2}$ can be interpreted as a country-specific smoothing parameter and is modeled hierarchically on the log scale:

$$\log (σ_{Δ_{a}}^{2}) \sim N (χ, ψ^{2}) \tag{25}$$

Here, $e^{χ}$ is a global smoothing parameter and $ψ^{2}$ represents across-country variability. The hierarchical structure allows information on the amount of smoothing to be shared across countries. Hence, trends in the multiplier for years and countries lacking accurate population data are informed by trends observed in other countries with reliable population data for these same years.
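The relationship between the spline coefficients, the intercept, and the first-order differences can be illustrated with a short sketch. It assumes one common parameterization, in which the intercept is the mean level of the coefficients. Whether the present model uses exactly this convention is not stated in this section, so the code should be read as an illustration only. The function name and numbers are hypothetical.

```python
def spline_coefficients(intercept, deltas):
    """Rebuild P spline coefficients from an intercept and P - 1 differences.

    Assumption: the intercept equals the mean of the coefficients, so the
    differences only describe fluctuations around that level.
    """
    cumulative = [0.0]
    for d in deltas:
        cumulative.append(cumulative[-1] + d)
    centre = sum(cumulative) / len(cumulative)
    # Centre the cumulative sums at zero, then shift them to the intercept.
    return [intercept + c - centre for c in cumulative]

# Hypothetical intercept and differences for a country with P = 4 splines.
alpha = spline_coefficients(intercept=-0.2, deltas=[0.05, -0.10, 0.02])
```

Shrinking the differences toward zero, as Equation 24 does through $σ_{Δ_{a}}^{2}$, pulls adjacent coefficients together and therefore smooths the multiplier over time.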

Effect of American Civil War

To account for the impact of temporal shocks on the expected number of children under age 5 per woman aged 15–49 due to the American Civil War (1861–1865), the mean of Model 8 also includes the term $e^{X_{a, t} \cdot β^{(shock)}}$ . This term can be interpreted as a measure of relative risk. Specifically, if its final estimate is less than 1, it implies that the Civil War is associated with a reduction in the expected number of children under age 5 per woman aged 15–49. Conversely, if the estimate is greater than 1, it indicates that the Civil War is associated with an increase in the expected number of children under age 5 per woman in the age group 15–49.

Here, the term $X_{a, t}$ is a dummy variable that is set to 1 only for the US during the years affected by the American Civil War, namely 1862–1866. The effect of the Civil War on fertility is postponed by one year from 1861–1865 to 1862–1866 to account for the gestation period. $β^{(shock)}$ denotes the effect of the American Civil War on the expected number of children under age 5 per woman aged 15–49 during the years $1862 - 1866$ and is assumed to follow a normal distribution centered around 0.

$$β^{(shock)} \sim N (0, 1) \tag{26}$$

This component is included in the model because population counts by age and sex in the US are only available from decennial censuses. Since the closest censuses were conducted in 1860 and 1870, the linear interpolation of age- and sex-specific populations does not account for the temporary decline in births due to the American Civil War.¹¹ For the other countries, $X_{a, t}$ is always equal to 0, and hence the term $e^{X_{a, t} \cdot β^{(shock)}}$ is always equal to 1.¹²
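The shock term reduces to a simple indicator-based multiplier, sketched below. The function name, the country codes, and the value of the coefficient are illustrative assumptions. In the model the coefficient is estimated rather than fixed.

```python
import math

def civil_war_multiplier(country, year, beta_shock):
    """Return exp(X * beta_shock), where X flags the US in 1862-1866."""
    x = 1 if (country == "USA" and 1862 <= year <= 1866) else 0
    return math.exp(x * beta_shock)

# Hypothetical coefficient: a negative value lowers the expected ratio.
us_1863 = civil_war_multiplier("USA", 1863, beta_shock=-0.2)
sweden_1863 = civil_war_multiplier("SWE", 1863, beta_shock=-0.2)
```

A multiplier below 1 lowers the expected number of children under age 5 per woman aged 15–49 in the affected years and leaves all other country-years unchanged.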

Model Implementation

The model was implemented using the R statistical package nimble (de Valpine et al. 2017), which allows us to specify the main structure of the model in R (R Core Team 2021), while compiling it in C++ for efficient execution. To obtain samples from the posterior distribution of the model’s parameters, the Hamiltonian Monte Carlo (HMC) No-U-Turn Sampler (NUTS) was employed, as provided by the nimbleHMC package available in R (Turek et al. 2024).

Since our ultimate objective is the estimation of the $TFR$ , we focused on the posterior distribution of the parameter $TFR$ and used its median as the best estimate. To quantify the uncertainty around this estimate, we built $95\%$ credible intervals by computing the $2.5\%$ and $97.5\%$ quantiles of the posterior distribution of the parameter $TFR$ . For the sake of simplicity, we denote the median of the posterior distribution of the $TFR$ parameter calculated from our proposed model by $bTFR^{*}$ . Convergence was assessed numerically via the potential scale reduction factor $\hat{R}$ (Gelman et al. 1995). The value of $\hat{R}$ was less than 1.1 for all estimated parameters.
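The posterior summaries described here can be sketched in a few lines. This is not the article's code, and it does not use nimble. It computes the median, the $2.5\%$ and $97.5\%$ quantiles, and the classic (non-split) potential scale reduction factor from a list of equal-length chains. The function names and the example draws are hypothetical.

```python
import statistics

def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction

def summarise(chains):
    """Posterior median, 95% interval, and classic R-hat for one parameter.

    chains is a list of at least two equal-length lists of posterior draws.
    """
    pooled = sorted(v for chain in chains for v in chain)
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_plus = (n - 1) / n * within + between / n
    return {
        "median": statistics.median(pooled),
        "lower": quantile(pooled, 0.025),
        "upper": quantile(pooled, 0.975),
        "rhat": (var_plus / within) ** 0.5,
    }

# Hypothetical draws of one TFR parameter from two chains.
summary = summarise([[4.0, 4.2, 4.4, 4.6], [4.1, 4.3, 4.5, 4.7]])
```

Values of $\hat{R}$ close to 1 indicate that the between-chain variance is small relative to the within-chain variance. Chains stuck in different regions produce much larger values.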

Results

We present key findings on the evolution of the bias-adjustment multiplier over the years across countries and the TFR estimates based on the proposed methods. Furthermore, we compare the accuracy of the TFR estimates derived from our proposed methods with those obtained using the original modeling framework by Schmertmann and Hauer (2019), which does not incorporate any bias adjustment.

TFR Estimates Across Countries

Figure 4 illustrates trends in the Bayesian TFR estimates $bTFR^{*}$ (solid lines) and the indirect TFR estimates $iTFR^{*}$ (two-dash lines) during the historical period 1751–1910.¹³ When compared against reliable historical TFR estimates (star-shaped points), our bias-adjusted estimates align quite closely with the historical ones. An inspection of country-specific TFR trends reveals that France was the first nation to reach fertility levels below four children per woman aged 15–49, reinforcing its role as a pioneer in the fertility transition (Weir 1984; Wrigley 1985). In agreement with Jaadla et al. (2020), England & Wales experienced an increase in the TFR at the beginning of the 18th century and did not start their secular decline until the 1880s, whereas the United States of America entered its fertility transition after the Civil War (1861–65) (Hacker and Roberts 2019). The other countries in the study did not experience any decline in fertility before the 1880s, with Finland notably maintaining the highest fertility levels by the end of the study period.

Figure 4.

Model-based (solid lines) and indirect (dotted lines) TFR estimates against historical TFR estimates (star-shaped dots) for eight countries during the period 1751–1910. $bTFR^{*}$ indicates the median posterior estimates from the proposed Bayesian model, while $iTFR^{*}$ is the indirect TFR indicator derived from the extended indirect estimation method. Shaded areas denote $95\%$ credible intervals.

The trends in the $bTFR^{*}$ estimates demonstrate the ability of our proposed method to capture exogenous shocks, such as famines and wars, that caused temporary declines in overall fertility levels. For instance, temporary declines affected Sweden and Finland in 1868–1870 due to the last major Northern European famine, while the Netherlands suffered from the so-called ‘‘Potato Blight’’ in 1846–1847, which caused the loss of roughly $70\%$ of the potato crop (Bergman 1967) and a temporary reduction in overall fertility. Similarly, by incorporating a temporal dummy into the mean of the data model, the time series of US TFR estimates derived from our proposed model captures the temporary fertility decline during the Civil War, as well as the subsequent post-war rebound.

Furthermore, at the beginning of the study period, although none of the countries had yet undergone their fertility transition, they exhibited different fertility levels. In the European countries, the average number of children ranged between 4 and 6, whereas in the US, it exceeded 7. This finding aligns with previous studies, which attributed the higher fertility levels of the US to factors such as the greater land availability and earlier ages at marriage in comparison to European countries (Lee 2002).

The trends in the bias-adjusted indirect estimate $iTFR^{*}$ are displayed for the historical period 1751–1910 across the eight selected countries and resemble the historical TFR estimates fairly closely, especially in Denmark and Sweden. By contrast, in the Netherlands and Norway, the indirect TFR estimates are slightly lower than the historical TFR estimates.

Bias-Adjustment Estimates

Figure 5 illustrates the evolution of the Bayesian bias-adjustment multiplier $θ_{a, t}$ and the indirect bias-adjustment multiplier $r_{a, t}$ across countries during the historical period 1751–1910 on the logarithmic scale. The solid lines indicate the median estimates from the corresponding posterior samples, while the shaded regions represent the $95\%$ credible intervals. On this scale, a positive estimate for the bias-adjustment multiplier suggests that genealogy-based CWRs are overestimated, whereas a negative estimate implies underestimation.

Figure 5.

Median posterior estimates for the bias-adjustment multiplier $θ_{a, t}$ (solid lines) and indirect bias-adjustment multiplier estimates $r_{a, t}$ (two-dash lines). Estimates are displayed on the natural log scale. Shaded regions indicate $95\%$ credible intervals.

Figure 5 reveals no consistent patterns in the evolution of the bias-adjustment multiplier during the period 1751–1910 across countries. For instance, in England & Wales we observe a U-shaped trend, with positive values at the beginning and at the end of the period, but negative values in between. Similar patterns are observed in France and Denmark, although their median estimates for the bias-adjustment multiplier remain negative throughout the historical period. This implies that at the beginning of the study period the fertility of these countries is either overestimated (England & Wales) or slightly underestimated (Denmark and France).

In the Netherlands, we observe a decline in the multiplier during the entire period, implying that the underestimation of the genealogy-based expected number of children under age 5 per woman aged 15–49 worsens over time. In Sweden and the US, genealogy-based CWRs are severely underestimated until the mid-19th century. However, this underestimation becomes progressively less severe as the 20th century unfolds, though a renewed underestimation appears in Sweden in the early 1900s. Norway and Finland exhibit a consistent underestimation of genealogy-based CWRs throughout the historical period. Additionally, countries display wider credible intervals in periods where reliable data on population structure are lacking. As more reliable data become available for a country, its credible intervals narrow. Broadly speaking, these findings highlight that the non-representativeness of online genealogical data is not only time-dependent but also varies considerably across countries. Similar patterns are observed if we consider the predicted and observed indirect bias-adjustment multiplier estimates $r_{a, t}$ . Specifically, when reliable data on population structure are available, the observed indirect bias-adjustment multiplier estimates closely resemble those derived from the proposed Bayesian model. However, in the absence of accurate population counts by age and sex, the predicted estimates may exhibit slight differences due to the different estimation procedures.

Bayesian Model Validation and Accuracy of TFR Estimates

The performance of the Bayesian model was evaluated through a series of out-of-sample validation exercises. We constructed test datasets consisting of the most recent “true” counts of children under age 5 from the final 20 and 30 years of the considered historical period. Accordingly, the training datasets included all the “true” counts of children under age 5 up to the years 1890 and 1880, respectively. In these validation exercises, only the “true” counts of children under age 5 from the final 20 and 30 years are withheld from the model. All other sources of information were retained for the entire historical period, including the historical values of the TFR and child mortality probabilities, which are used as inputs to the TFR prior distribution and the log-quadratic mortality submodel, as well as the information derived from FamiLinx on the number of children under age 5 and the number of women classified by maternal age group.

The models fitted to each training set were then used to predict the true number of children under age 5 in the corresponding test set. Rather than comparing the predicted and observed true number of children under age 5, we opted to conduct the comparison in terms of the CWR. This transformation constrains the outcome to a narrower range, facilitating more stable and interpretable comparisons than those based on raw counts.

For the left-out observations, we calculated the absolute relative error defined by

$$e_{a, t} = \frac{| {CW}_{a, t}^{(true)} - {\hat{CW}}_{a, t}^{(true)} |}{{\hat{CW}}_{a, t}^{(true)}} \cdot 100 \tag{27}$$

where the terms ${CW}_{a, t}^{(true)}$ and ${\hat{CW}}_{a, t}^{(true)}$ denote the observed and predicted CWR, respectively. The coverage was computed as:

$$\frac{\sum_{i = 1}^{N} \mathbb{1} ({CW}_{a, t [i]}^{(true)} \geq l_{a, t [i]}) \cdot \mathbb{1} ({CW}_{a, t [i]}^{(true)} \leq u_{a, t [i]})}{N} \cdot 100 \tag{28}$$

where the constant $N$ is the total number of left-out observations and the terms $l_{a, t [i]}$ and $u_{a, t [i]}$ are the lower and upper bounds of the prediction interval for the i-th observation. For each test set, we computed both the mean absolute relative error (MARE) and the coverage for various nominal levels.
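The two validation measures in Equations 27 and 28 can be computed as in the sketch below. The function names and the example numbers are hypothetical. Note that, following Equation 27, the absolute error is divided by the predicted rather than the observed value.

```python
def mare(observed, predicted):
    """Mean absolute relative error in percent (average of Eq. 27)."""
    errors = [abs(o - p) / p * 100 for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)

def coverage(observed, lower, upper):
    """Percentage of observations inside their prediction interval (Eq. 28)."""
    hits = sum(1 for o, l, u in zip(observed, lower, upper) if l <= o <= u)
    return hits / len(observed) * 100

# Hypothetical child-woman ratios for four left-out country-years.
observed = [0.50, 0.60, 0.55, 0.40]
predicted = [0.50, 0.50, 0.55, 0.50]
lower = [0.45, 0.45, 0.50, 0.45]
upper = [0.55, 0.55, 0.60, 0.55]

example_mare = mare(observed, predicted)
example_coverage = coverage(observed, lower, upper)
```

A well-calibrated model yields empirical coverage close to the nominal level of the interval, for instance about 90 out of 100 left-out observations inside their $90\%$ prediction intervals.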

Panel A of Table 1 shows that the MAREs are around $3\%$ , highlighting the model’s ability to provide accurate point predictions. The coverage of the prediction intervals is close to the nominal values, which indicates that our proposed model is reasonably calibrated.

Table 1.

Model validation results and accuracy of TFR estimates.

(a) Validation measures for left-out data
Validation set	MARE (%)	80% coverage (%)	90% coverage (%)	95% coverage (%)
30-Year	2.976	87.083	90.417	93.750
20-Year	3.070	85.625	89.375	92.500

(b) Accuracy of TFR estimates by method and country, measured by MARE (%)
Country	$bTFR^{*}$	$bTFR^{†}$	$iTFR^{*}$	$xTFR^{*}$
DEN	3.317	4.167	2.679	2.810
ENG	0.128	2.705	5.451	6.918
FIN	1.540	3.451	4.698	5.944
FRA	1.517	4.194	4.756	5.109
NLD	1.583	4.448	7.263	8.895
NOR	0.169	2.886	6.019	7.011
SWE	0.339	3.079	5.106	5.171
USA	3.365	6.927	5.845	5.434

In addition, we assessed how accurately the posterior TFR estimates from our proposed Bayesian model reproduce reliable historical TFR values. We compared its performance with that of the proposed indirect estimation indicators and with the posterior estimates from an alternative Bayesian model that relies exclusively on accurate population data and does not incorporate FamiLinx. This alternative model is structurally similar to the proposed Bayesian model but excludes the bias-adjustment component. The median posterior estimate from this restricted specification is denoted by $bTFR^{†}$ . To conduct the comparison, we again computed the MARE. Here it is defined as the average absolute difference between the TFR estimates from a given method and the corresponding historical TFR benchmarks, expressed relative to those benchmarks and calculated over the years for which reliable historical data are available.

Panel B of Table 1 reports the MAREs for the different TFR estimation methods by country. Overall, the proposed Bayesian model combining FamiLinx data with reliable population data achieves the lowest MARE. In particular, the TFR estimates from the proposed model provide the highest accuracy for most of the countries, apart from Denmark, where the TFR estimates from the indirect estimation method align more closely with the historical benchmarks. In addition, across all countries examined, the proposed Bayesian framework, which integrates online genealogical data with more reliable conventional sources, produces more accurate TFR estimates than the corresponding model that excludes the genealogical data from FamiLinx.
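The comparisons in this paragraph can be checked directly against panel (b) of Table 1. The sketch below copies the MARE values from the table and summarizes them. Taking an unweighted mean across countries as the "overall" measure is our assumption, and the ASCII method labels (for example `bTFR_dagger` for $bTFR^{†}$) are ours.

```python
# MARE values (in percent) copied from panel (b) of Table 1.
METHODS = ["bTFR_star", "bTFR_dagger", "iTFR_star", "xTFR_star"]
MARE = {
    "DEN": [3.317, 4.167, 2.679, 2.810],
    "ENG": [0.128, 2.705, 5.451, 6.918],
    "FIN": [1.540, 3.451, 4.698, 5.944],
    "FRA": [1.517, 4.194, 4.756, 5.109],
    "NLD": [1.583, 4.448, 7.263, 8.895],
    "NOR": [0.169, 2.886, 6.019, 7.011],
    "SWE": [0.339, 3.079, 5.106, 5.171],
    "USA": [3.365, 6.927, 5.845, 5.434],
}

# Unweighted mean MARE per method across the eight countries (an assumption
# about how an overall comparison could be made).
mean_by_method = {
    method: sum(row[j] for row in MARE.values()) / len(MARE)
    for j, method in enumerate(METHODS)
}
best_overall = min(mean_by_method, key=mean_by_method.get)

# Countries where the proposed Bayesian model is not the most accurate method.
exceptions = [c for c, row in MARE.items() if row.index(min(row)) != 0]

# Countries where adding FamiLinx improves on the model without it.
improved = [c for c, row in MARE.items() if row[0] < row[1]]
```

Under this summary the proposed model has the lowest average MARE, Denmark is the only country where another method performs better, and the model with FamiLinx outperforms the one without it in all eight countries.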

We further assessed the robustness of the proposed model through a series of sensitivity analyses, reported in the online supplement. These include prior-to-posterior comparisons for the TFR and the bias-adjustment multiplier, posterior predictive checks for the CWR, sensitivity to the exclusion of countries with reliable population data throughout the full historical period, sensitivity to alternative age-specific fertility schedules used in the singular value decomposition, sensitivity to the choice of the prior distributions for the variance parameters, and an examination of posterior correlations between the TFR and bias-adjustment parameters.

Conclusion

Limitations

Our study is not free from caveats. First, the proposed methods are limited to estimating the TFR and do not allow for the calculation of other fertility measures such as the cohort fertility rate (CFR) and the mean age at childbearing (MAB). While the TFR is widely used due to its availability across countries and time periods, multiple studies (Bongaarts and Feeney 2000, 2010; Sobotka and Lutz 2010) have highlighted the susceptibility of this indicator to tempo distortions. Temporary declines in the TFR may not reflect an actual decrease in fertility but rather delays in childbearing in response to external shocks, such as wars, economic crises, or pandemics. Hence, the availability of additional fertility indicators would certainly provide a richer understanding of the fertility patterns experienced by the countries included in the analysis during the historical period 1751–1910.

Second, our study assumes constant child mortality in cases where reliable historical estimates are unavailable. Nonetheless, pre-transitional populations displayed high and fluctuating child mortality rates due to the widespread prevalence of infectious diseases, poor sanitary conditions, and the recurrent occurrence of pandemics and famines (Omran 1998; Pozzi and Fariñas Ramiro 2015).

Third, the absence of socio-economic variables limits the depth of our analysis. Crucial factors such as education, wealth, occupation, and social class, which are known to have influenced fertility behaviors in historical populations in Europe and the US (Dribe et al. 2014), are not available in FamiLinx. As a consequence, we are unable to examine how fertility transitions varied across different social classes or to investigate mechanisms driving the diffusion of birth control methods.

Fourth, migration data are not available in FamiLinx, preventing us from examining fertility differentials between native and migrant women. In the US context in particular, previous research (Dribe et al. 2014; Hacker 2003) has shown distinctive fertility trends between migrants and natives during the US fertility transition.

Fifth, the anonymity of the records in FamiLinx poses another constraint, as it prevents us from linking these data with other micro-level sources such as censuses and parish records. The employment of statistical matching techniques, which could enhance the richness of FamiLinx by integrating additional micro-level socio-economic information, is not feasible.

Sixth, the proposed indirect estimation framework does not allow us to estimate the TFRs when reliable historical benchmarks are unavailable. The incorporation of the bias-adjustment multiplier requires at least some reliable demographic data over part of the historical period in order to examine long-term fertility trends. Without these benchmarks, online genealogical data alone are insufficient to produce reliable annual TFR estimates.

Despite these limitations, we believe that our study represents an important step in advancing statistical methods for the estimation of demographic indicators in data-sparse settings, as well as in advancing fertility estimation in historical populations using a novel non-traditional data source in combination with more reliable data.

Discussion of the Results

The spread of technology has furnished population scientists with an unprecedented wealth of data sources, significantly enriching the demographic research landscape (Kashyap 2021). This surge has led to a growing body of literature in demographic research relying on digital data. Although the majority of studies in digital demography have focused on contemporary populations, we believe that historical demography stands in a unique position to benefit from these novel data streams. The digitization of traditional data sources such as parish records and censuses, alongside the development of platforms where users from different parts of the world can share their family history, offers much promise for the examination of demographic processes in the past at a more global scale. In this article, our focus is on data derived from digital genealogical trees, generated by a transnational network of genealogy enthusiasts dedicated to reconstructing their family history. As they contain demographic information about individuals whose life courses unfolded in the last 400 years across different countries, these repositories provide an unprecedented opportunity to study population dynamics in the past. Nonetheless, as pointed out by Colasurdo and Omenti (2024), this data source presents several pitfalls, which hamper its employment in demographic research, including the under-representation of various subgroups such as women and children as well as issues related to the accuracy of the reported demographic information.

In response to these challenges, this article proposes a methodological framework to obtain accurate TFR estimates from data that are inherently defective. Our proposed methods combine data from online genealogical populations with more traditional data sources in order to obtain TFR estimates in various historical populations. Specifically, we added a bias-adjustment multiplier to the modeling framework developed by Schmertmann and Hauer (2019) and the indirect TFR estimation method by Hauer and Schmertmann (2020) to incorporate information about the extent to which the number of children under age 5 per woman aged 15–49 is under- or over-estimated in online genealogical data.

The results suggest that the incorporation of a bias-adjustment parameter into the Bayesian modeling framework by Schmertmann and Hauer (2019) produces plausible annual TFR estimates that cover historical periods when most of the countries in our analysis lacked national well-functioning civil registration systems. The inclusion of a bias-adjustment factor into the TFR decomposition by Hauer and Schmertmann (2020) has also led to indirect TFR estimates, which closely align with historical TFR estimates. However, the proposed indirect estimation method under-performs in comparison to the Bayesian model in most countries. While the proposed indirect TFR estimation approach is straightforward and does not require a complex modeling procedure, unlike the proposed Bayesian model, it lacks the capability of accounting for uncertainties arising from different data sources and does not provide credible intervals. In contexts where accurate population data are only partially available, probabilistic demographic estimates, which usually come with a range of plausible values, certainly provide researchers with a more realistic and informative picture about fertility patterns.

Broadly speaking, online genealogical data are inherently biased and noisy. Consequently, relying solely on large genealogical databases such as FamiLinx to estimate overall fertility indicators for historical populations would lead to misleading results. By explicitly modeling the bias in populations derived from online genealogies, we are able to reconstruct annual time series of TFR estimates for multiple European countries and the United States over the period 1751–1910. The proposed model, which integrates online genealogical population data with more reliable historical sources, yields TFR estimates with higher accuracy and finer temporal resolution than a model that relies exclusively on reliable historical data.

While the article uses online genealogies to analyze fertility dynamics in historical populations, we argue that the proposed methodological framework has considerable potential for application in other historical and contemporary data-sparse settings. In particular, it is well suited to contexts where civil registration systems are incomplete, large-scale population surveys are available but infrequent, and non-traditional data sources can be leveraged to improve the temporal and spatial resolution of the TFR estimates.

In contemporary settings, potential applications include low- and middle-income countries in which fertility measurement relies primarily on infrequent household surveys. For instance, in Sub-Saharan Africa, genealogical reconstructions available through platforms, such as FamilySearch, could be combined with large-scale population surveys, including the Demographic and Health Surveys and Multiple Indicator Cluster Surveys, to improve the temporal resolution of fertility estimates between survey waves. In historical contexts, this framework could be applied to reconstruct TFR estimates at finer geographical scales by combining population reconstructions from local parish or genealogical records with higher-quality sources, such as historical population censuses. In both settings, the proposed methodological framework enables the integration of rich but biased information with sparse reliable data, yielding a more granular and informative time series of TFR estimates.

From a methodological viewpoint, conditional on the availability of data on the number of children under age 5 classified by maternal age group, an interesting extension of the proposed model would involve jointly modeling these counts from both a biased data source and a more reliable one. This approach could enable the estimation of fertility indicators beyond the TFR.

From a broad perspective, we have proposed a novel extension to the existing indirect estimation framework by Schmertmann and Hauer (2019) and by Hauer and Schmertmann (2020) to allow for the reconstruction of fertility patterns in eight countries during the historical period 1751–1910, when the availability of reliable population data is limited for most countries. The proposed extensions allow us to combine online genealogical data with other historical data sources, whose information was included either through the incorporation of a bias-adjustment statistical model within the Bayesian modeling framework or through the incorporation of a bias-adjustment factor within the indirect estimation method. To conclude, this research article sheds new light on the potential of Bayesian and indirect estimation methods to investigate historical fertility patterns in countries and historical periods with imperfect demographic data.

Supplemental Material

Supplemental material, sj-pdf-1-smr-10.1177_00491241261463559 for Bayesian Indirect Estimation of Historical Fertility in Europe and US Using Online Genealogical Data by Riccardo Omenti, Monica Alexander and Nicola Barban in Sociological Methods & Research

Supplemental material, sj-pdf-2-smr-10.1177_00491241261463559 for Bayesian Indirect Estimation of Historical Fertility in Europe and US Using Online Genealogical Data by Riccardo Omenti, Monica Alexander and Nicola Barban in Sociological Methods & Research

Footnotes

Acknowledgments

An earlier version of this article was presented at the 2024 Population Association of America Conference, the 2023 British Society for Population Studies Conference, the 15th Conference of Young Demographers, the Formal Demography Working Group Seminar Series, the Max Planck Institute for Demographic Research Seminar Series, and the Quetelet 2023 Seminar. The authors thank the participants at these events for their valuable comments and feedback. They also gratefully acknowledge Jakub Bijak, Ridhi Kashyap, and Rebecca Johnson for their helpful suggestions on an earlier version of the paper. The authors used the large language model GPT-5.5 to assist with grammatical and typographical editing of the manuscript. All edits generated by ChatGPT were carefully reviewed by the authors.

ORCID iDs

Riccardo Omenti

Monica Alexander

Nicola Barban

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Riccardo Omenti and Nicola Barban received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program (Grant Agreement 865356).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Preregistration Statement

This study was not preregistered because it relies on publicly available data that already existed before the study began.

Code Availability Statement

The code needed to reproduce the figures and tables of the paper is stored in the permanent repository available at this link .

Data Availability Statement

All the data needed for the study are publicly available and deposited in the permanent repository available at this link https://zenodo.org/records/20423970 (Omenti, 2026).

Material Availability Statement

All the materials used in this study are available in the permanent repository at this link .

Supplemental Material

Supplemental material for this article is available online.

Author Biographies

Riccardo Omenti is a research fellow in demography at the University of Bologna. His research interests include statistical demography and the study of novel big data sources for demographic analysis.

Monica Alexander is an associate professor in statistics and sociology at the University of Toronto. Her research interests include improving methods of demographic estimates in data-sparse contexts.

Nicola Barban is a professor of demography at the University of Bologna. His research interests include the study of socio-historical determinants of fertility and family behavior using longitudinal and genealogical data.

References

Albert. 2018. “Package LearnBayes.” R package version 2.7.3. http://rsync5.jp.gentoo.org/pub/CRAN/web/packages/LearnBayes/LearnBayes.pdf.

Alburez-Gutierrez, Diego, Samin Aref, and Santiago Gil-Clavel. 2019. “Demography in the Digital Era: New Data Sources for Population Research.” Pp. 22–33 in Book of Short Papers SIS 2019, edited by Giuseppe Arbia, Stefano Peluso, Andrea Pinna, et al. Pearson. doi:10.31235/osf.io/24jp7.

Alburez-Gutierrez, Diego, Nicola Barban, Hal Caswell, Martin Kolk, Rachel Margolis, Emily Smith-Greenaway, Song, Ashton M. Verdery, and Emilio Zagheni. 2022. “Kinship, Demography, and Inequality: Review and Key Areas for Future Development.” Working Paper in SocArXiv. doi:10.31235/osf.io/fk7x9.

Alexander, Monica, and Leontine Alkema. 2018. “Global Estimation of Neonatal Mortality Using a Bayesian Hierarchical Splines Regression Model.” Demographic Research 38: 335–372. doi:10.4054/DemRes.2018.38.15.

Alexander, Monica, and Leontine Alkema. 2022. “A Bayesian Cohort Component Projection Model to Estimate Women of Reproductive Age At the Subnational Level in Data-Sparse Settings.” Demography 59(5): 1713–1737. doi:10.1215/00703370-10216406.

Alexander, Monica, Kivan Polimis, and Emilio Zagheni. 2020. “Combining Social Media and Survey Data to Nowcast Migrant Stocks in the United States.” Population Research and Policy Review 41: 1–28. doi:10.1007/s11113-020-09599-3.

Alexander, Monica, Emilio Zagheni, and Magali Barbieri. 2017. “A Flexible Bayesian Model for Estimating Subnational Mortality.” Demography 54(6): 2025–2041. doi:10.1007/s13524-017-0618-7.

Bergman. 1967. “The Potato Blight in the Netherlands and Its Social Consequences (1845–1847).” International Review of Social History 12(3): 390–431.

Bijak, Jakub, and John Bryant. 2016. “Bayesian Demography 250 Years After Bayes.” Population Studies 70(1): 1–19. doi:10.1080/00324728.2015.1122826.

Blanc, Guillaume. 2024a. “The Cultural Origins of the Demographic Transition in France.” Working paper, Manchester, UK: University of Manchester. doi:10.2139/ssrn.3702670.

Blanc, Guillaume. 2024b. “Demographic Transitions, Rural Flight, and Intergenerational Persistence: Evidence From Crowdsourced Genealogies.” Working paper, Manchester, UK: University of Manchester. https://hal.science/hal-02922398v3.

Bongaarts, John, and Griffith Feeney. 2000. “On the Quantum and Tempo of Fertility: Reply.” Population and Development Review 26(3): 560–564. doi:10.1111/j.1728-4457.2000.00560.x.

Bongaarts, John, and Griffith Feeney. 2010. “When is a Tempo Effect a Tempo Distortion?” Genus 66(2): 1–15.

Bryant, John R., and Patrick J. Graham. 2013. “Bayesian Demographic Accounts: Subnational Population Estimation Using Multiple Data Sources.” Bayesian Analysis 8(3): 591–622. doi:10.1214/13-BA820.

Calderón-Bernal, Liliana P., Diego Alburez-Gutierrez, and Emilio Zagheni. 2025. “Analysing Biases in Genealogies Using Demographic Microsimulation.” European Journal of Population 41(1): 34. doi:10.1007/s10680-025-09756-4.

Cesare, Nina, Hedwig Lee, Tyler McCormick, Emma Spiro, and Emilio Zagheni. 2018. “Promises and Pitfalls of Using Digital Traces for Demographic Research.” Demography 55(5): 1979–1999. doi:10.1007/s13524-018-0715-2.

Chong, Micheal, Diego Alburez-Gutierrez, Monica Alexander, and Emilio Zagheni. 2022. “Identifying and Correcting Bias in Big Crowd-Sourced Online Genealogies.” MPIDR Working Paper Series WP-2022-005, Max Planck Institute for Demographic Research, Rostock. doi:10.4054/MPIDR-WP-2022-005.

Colasurdo, Andrea, and Riccardo Omenti. 2024. “Using Online Genealogical Data for Demographic Research: An Empirical Examination of the FamiLinx Database.” Demographic Research 51(41): 1299–1350. doi:10.4054/DemRes.2024.51.41.

Corti, Giulia, and Saverio Minardi. 2026. “Parental Loss in Early Years and Adult Family Formation: Evidence From U.S. Cohorts Born 1850–1910.” Journal of Marriage and Family. Published electronically on February 5, 2026. doi:10.1111/jomf.70062.

Corti, Giulia, Saverio Minardi, and Nicola Barban. 2024. “Trends in Assortative Mating in the United States, 1700–1910. Evidence From FamiLinx Data.” The History of the Family 29(4): 461–481. doi:10.1080/1081602X.2024.2352539.

Cozzani, Marco, Saverio Minardi, Giulia Corti, and Nicola Barban. 2023. “Birth Month and Adult Lifespan: A Within-Family, Cohort, and Spatial Examination Using FamiLinx Data in the United States (1700–1899).” Demographic Research 49(9): 201–218. doi:10.4054/DemRes.2023.49.9.

Currie, Iain D., and Maria Durban. 2002. “Flexible Smoothing With P-Splines: A Unified Approach.” Statistical Modelling 2(4): 333–349. doi:10.1191/1471082x02st039ob.

de Valpine, Perry, Daniel Turek, Christopher J. Paciorek, Clifford Anderson-Bergman, Duncan Temple Lang, and Rastislav Bodik. 2017. “Programming With Models: Writing Statistical Algorithms for General Model Structures With NIMBLE.” Journal of Computational and Graphical Statistics 26(2): 403–413. doi:10.1080/10618600.2016.1172487.

Dribe, Martin, J. David Hacker, and Francesco Scalone. 2014. “The Impact of Socio-Economic Status on Net Fertility During the Historical Fertility Decline: A Comparative Analysis of Canada, Iceland, Sweden, Norway, and the USA.” Population Studies 68(2): 135–149. doi:10.1080/00324728.2014.889741.

Eilers, Paul H. C., and Brian D. Marx. 1996. “Flexible Smoothing With B-Splines and Penalties.” Statistical Science 11(2): 89–121. doi:10.1214/ss/1038425655.

Gay, Gobbi, and Goñi. 2025. “Revolutionary Transition: Inheritance Change and Fertility Decline.” Journal of Political Economy 134(6): 1666–1713. doi:10.1086/739821.

Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 1995. Bayesian Data Analysis. Boca Raton, FL: Chapman and Hall/CRC.

Grigorieva, Jasilioniene, D. A. Jdanov, Grigoriev, Sobotka, Zeman, and V. M. Shkolnikov. 2015. “Methods Protocol for the Human Fertility Collection.” http://www.fertilitydata.org/docs/methods.pdf.

Hacker, J. David. 2003. “Rethinking the ‘Early’ Decline of Marital Fertility in the United States.” Demography 40: 605–620. doi:10.2307/1515199.

Hacker, J. David. 2016. “Ready, Willing, and Able? Impediments to the Onset of Marital Fertility Decline in the United States.” Demography 53: 1657–1692. doi:10.1007/s13524-016-0513-7.

Hacker, J. David, and Evan Roberts. 2019. “Fertility Decline in the United States, 1850–1930: New Evidence From Complete-Count Datasets.” Annales de Démographie Historique 138(2): 143–177. doi:10.3917/adh.138.0143.

Hauer, Mathew E., and Carl P. Schmertmann. 2020. “Population Pyramids Yield Accurate Estimates of Total Fertility Rates.” Demography 57(1): 221–241. doi:10.1007/s13524-019-00842-x.

Hollingsworth, Thomas H., and T-T. Hollingsworth. 1976. “Genealogy and Historical Demography.” Annales de Démographie Historique 1(1): 167–170.

Hsiao, Yuan, Lee Fiorio, Jonathan Wakefield, and Emilio Zagheni. 2024. “Modeling the Bias of Digital Data: An Approach to Combining Digital with Official Statistics to Estimate and Predict Migration Trends.” Sociological Methods & Research 53(4): 1905–1943. doi:10.1177/00491241221140144.

Hsu, Chen-Hao, Oliver Posegga, Kai Fischbach, and Henriette Engelhardt. 2021. “Examining the Trade-Offs Between Human Fertility and Longevity Over Three Centuries Using Crowdsourced Genealogy Data.” PLoS One 16(8): e0255528. doi:10.1371/journal.pone.0255528.

Jaadla, Hannaliis, Alice Reid, Eilidh Garrett, Kevin Schürer, and Joseph Day. 2020. “Revisiting the Fertility Transition in England and Wales: The Role of Social Class and Migration.” Demography 57(4): 1543–1569. doi:10.1007/s13524-020-00895-3.

Kaplanis, Joanna, Assaf Gordon, Tal Shor, Omer Weissbrod, Dan Geiger, Mary Wahl, Michael Gershovits, Barak Markus, Mona Sheikh, Melissa Gymrek, et al. 2018. “Quantitative Analysis of Population-Scale Family Trees With Millions of Relatives.” Science 360(6385): 171–175. doi:10.1126/science.aam9309.

Kashyap, Ridhi. 2021. “Has Demography Witnessed a Data Revolution? Promises and Pitfalls of a Changing Data Ecosystem.” Population Studies 75(Suppl. 1): 47–75. doi:10.1080/00324728.2021.1969031.

Leasure, Douglas R., C. Warren Jochem, M. Eric Weber, Vincent Seaman, and Andrew J. Tatem. 2020. “National Population Mapping From Sparse Survey Data: A Hierarchical Bayesian Modeling Framework to Account for Uncertainty.” Proceedings of the National Academy of Sciences 117(39): 24173–24179. doi:10.1073/pnas.1913050117.

Lee, Ronald. 2002. “The Demographic Transition: Three Centuries of Fundamental Change.” Journal of Economic Perspectives 17(4): 167–190. doi:10.1257/089533003772034943.

Linard, Catherine, Marius Gilbert, Robert W. Snow, Abdisalan M. Noor, and Andrew J. Tatem. 2012. “Population Distribution, Settlement Patterns and Accessibility Across Africa in 2010.” PLoS One 7(2): e31743. doi:10.1371/journal.pone.0031743.

McPherson, James M. 2003. Battle Cry of Freedom: The Civil War Era. New York, NY: Oxford University Press. doi:10.2307/1908702.

Minardi, Saverio, Giulia Corti, and Nicola Barban. 2024. “Historical Patterns in the Intergenerational Transmission of Lifespan and Longevity: A Research Note on US Cohorts Born Between 1700 and 1900.” Demography 61(4): 979–994. doi:10.1215/00703370-11458359.

Minardi, Saverio, Paul Puschmann, and Nicola Barban. 2026. “Within Families, Across Borders: Lifespans of US Immigrants Born 1850–1890 Compared to Origins, Destination, and Siblings.” Demography. Forthcoming.

Omenti, Riccardo. 2026. Scripts, data, and replication materials for “Bayesian Indirect Estimation of Historical Fertility in Europe and US Using Online Genealogical Data.” Zenodo. doi:10.5281/zenodo.20423970. Deposited on May 28, 2026.

Omran, Abdel R. 1998. “The Epidemiologic Transition Theory Revisited Thirty Years Later.” World Health Statistics Quarterly 51(2–4): 99–119.

Pojman, Emily, Donald E. Mwedzi, Olivia O. Bucaro, Shuo Zhang, Maria Chong, Maxwell Alexander, and Diego Alburez-Gutierrez. 2023. “Leaving for Life: Using Online Crowd-Sourced Genealogies to Estimate the Migrant Mortality Advantage for the United Kingdom and Ireland During the 18th and 19th Centuries.” MPIDR Working Paper Series WP-2023-050, Max Planck Institute for Demographic Research, Rostock. doi:10.4054/MPIDR-WP-2023-050.

Pozzi, Lucia, and Diego Fariñas Ramiro. 2015. “Infant and Child Mortality in the Past.” Annales de Démographie Historique 129(1): 55–75.

Rampazzo, Francesco, Jakub Bijak, Agnese Vitali, Ingmar Weber, and Emilio Zagheni. 2021. “A Framework for Estimating Migrant Stocks Using Digital Traces and Survey Data: An Application in the United Kingdom.” Demography 58(6): 2193–2218. doi:10.1215/00703370-9578562.

R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Schmertmann, Carl P., and Mathew E. Hauer. 2019. “Bayesian Estimation of Total Fertility From a Population’s Age–Sex Structure.” Statistical Modelling 19(3): 225–247. doi:10.1177/1471082X18801450.

Sobotka, Tomáš, and Wolfgang Lutz. 2010. “Misleading Policy Messages Derived From the Period TFR: Should We Stop Using It?” Comparative Population Studies 35(3): 637–664. doi:10.12765/CPoS-2010-15.

Stelter, Robert, and Diego Alburez-Gutierrez. 2022. “Representativeness is Crucial for Inferring Demographic Processes From Online Genealogies: Evidence From Lifespan Dynamics.” Proceedings of the National Academy of Sciences 119(10): e2120455119. doi:10.1073/pnas.2120455119.

Turek, Daniel, Perry de Valpine, and Christopher J. Paciorek. 2024. “nimbleHMC: An R Package for Hamiltonian Monte Carlo Sampling in Nimble.” Journal of Open Source Software 9(99): 6745. doi:10.21105/joss.06745.

Voutilainen, Miikka, Jouni Helske, and Harri Högmander. 2020. “A Bayesian Reconstruction of a Historical Population in Finland, 1647–1850.” Demography 57(3): 1171–1192. doi:10.1007/s13524-020-00889-1.

Wachter, Kenneth W. 2014. Essential Demographic Methods. Cambridge, MA: Harvard University Press.

Weir, David R. 1984. “Fertility Transition in Rural France, 1740–1829.” The Journal of Economic History 44(2): 612–614.

Wheldon, Mark C., Adrian E. Raftery, Samuel J. Clark, and Patrick Gerland. 2013. “Reconstructing Past Populations With Uncertainty From Fragmentary Data.” Journal of the American Statistical Association 108(501): 96–110. doi:10.1080/01621459.2012.737729.

Wilmoth, John, Sarah Zureick, Vladimir Canudas-Romo, Mie Inoue, and Cheryl Sawyer. 2012. “A Flexible Two-Dimensional Mortality Model for Use in Indirect Estimation.” Population Studies 66(1): 1–28. doi:10.1080/00324728.2011.611411.

Wrigley, Edward A. 1985. “The Fall of Marital Fertility in Nineteenth-Century France: Exemplar or Exception?” European Journal of Population 1(1): 31–60.

Wrigley, Edward Anthony, Ros S. Davies, James E. Oeppen, and Roger S. Schofield. 1997. English Population History From Family Reconstitution 1580–1837. Cambridge University Press. doi:10.1017/CBO9780511660344.

Zhao, Zhongwei. 2001. “Chinese Genealogies as a Source for Demographic Research: A Further Assessment of Their Reliability and Biases.” Population Studies 55(2): 181–193. doi:10.1080/00324720127690.
