Abstract
We aim to evaluate the marginal effects of covariates on time-to-disability in the elderly under the semi-competing risks framework, as death dependently censors disability, not vice versa. It becomes particularly challenging when time-to-disability is subject to interval censoring due to intermittent assessments. A left truncation issue arises when the age time scale is applied. We develop a flexible two-parameter copula-based semiparametric transformation model for semi-competing risks data subject to interval censoring and left truncation. The two-parameter copula quantifies both upper and lower tail dependence between two margins. The semiparametric transformation models incorporate proportional hazards and proportional odds models in both margins. We propose a two-step sieve maximum likelihood estimation procedure and study the sieve estimators’ asymptotic properties. Simulations show that the proposed method corrects biases in the marginal method. We demonstrate the proposed method in a large-scale Chinese Longitudinal Healthy Longevity Study and provide new insights into preventing disability in the elderly. The proposed method could be applied to the general semi-competing risks data with intermittently assessed disease status.
Keywords
Introduction
Disability in the elderly is a long-term impairment largely attributable to disease conditions such as dementia, cardiovascular diseases, respiratory diseases, cancer, arthritis, and diabetes.1–3According to World Health Organization, more than
Motivated by the Chinese Longitudinal Healthy Longevity Study (CLHLS), with 12,969 elderly being recruited in 2002, this work aims to quantify the dependence strength between disability and death, estimate covariate effects on disability, as well as to characterize the cumulative incidence function of disability. For example, although the elderly are at risk of developing disability and progressing to death, 7 the explicit dependence strength between the time to disability and the time to death is unknown. Also, despite the wealth of evidence that sex plays an important role in the risk of disability, 8 its precise effect on time-to-disability among the Chinese elderly is unavailable in the literature. Moreover, the cumulative incidence function is desired to depict the absolute risk profile of disability, which is more appropriate than the survival function for decision making. 9
The disability data belong to the category of semi-competing risks data, 10 where one individual can experience both the non-terminal event of interest (i.e. disease) and the terminal event (i.e. death). The non-terminal event is dependently censored by the terminal event, but not vice versa. Semi-competing risks data arise in a broad range of applications including cancer, 11 cardiovascular diseases, 9 dementia, 12 and diabetes. 13 The semi-competing risks data are usually handled with two approaches: the illness-death model14–16 and the copula model.17–19 The illness-death model focuses on the transition intensities between states (i.e. health, disease, and death). A frailty effect can be introduced into the illness-death model to account for the dependence among transitions.20,12 The copula model constructs the joint distribution of disease and death by connecting marginal distributions via the copula function. The dependence strength can be explicitly estimated by the copula parameter. Because the marginal distribution in the copula model only involves time and covariates, the covariate effect on the disease can be interpreted on the marginal level. We choose the copula model in our motivating example because of its straightforward interpretation of dependence strength and covariate effects.
Modeling the semi-competing risks data of disability in CLHLS must address two important data attributes. First, the time scale by age is more preferred than by study entry in ageing studies like CLHLS. 21 However, the age scale induces the left truncation issue since CLHLS only recruits individuals aged 60 years or above. Second, since the recruited individuals were intermittently assessed every 3–4 years, the time-to-disability is interval-censored. Ignoring these data attributes may lead to biased estimation results. 22 Few works on disability have fully considered semi-competing (i.e. dependent-censoring), left-truncation, and interval-censoring. In this work, we aim to develop a copula model for semi-competing risks data subject to left truncation and interval censoring and apply it to the CLHLS data.
Despite the wealthy literature of copula models for semi-competing risks data subject to right censoring, there are few copula methods under interval censoring. Several works use copula to model bivariate interval-censored data.23–25 However, these earlier works could not deal with interval-censored semi-competing risks data. Ma et al. 26 develop a copula model for dependent current status data where the non-terminal event is only examined once at either the terminal event time or the study stopping time. The method does not apply to the general interval censoring situation, where each individual can be potentially examined multiple times before the terminal event or the study stopping time.
We propose a broad class of copula-based semiparametric regression models. Specifically, we use a two-parameter copula function that can flexibly account for dependence between the non-terminal and terminal events in both upper and lower tails. The dependence strength is explicitly expressed by Kendall’s
The remaining of this article is organized as follows. Section 2 introduces the model and the joint likelihood functions. Section 3 presents the sieve maximum likelihood estimation procedure, the estimation of the cumulative incidence function, and the asymptotic properties. Section 4 illustrates extensive simulation studies for the estimation performance of the proposed method. We analyze the CLHLS data and present the findings in Section 5. Finally, we discuss and conclude in Section 6. Additional results, the regularity conditions, and proofs are provided in the Supplemental Appendix.
Notation and likelihood
Copula model for semi-competing risks data
Assume that there are
According to the Sklar’s theorem,
27
the joint survival function of
In many longitudinal studies like CLHLS, the non-terminal event is interval-censored due to intermittent assessments. Denote
Next, we will describe four censoring scenarios in CLHLS as illustrated in Figure 1 and construct the likelihood function under each scenario.

Graphical representation of four censoring scenarios for disability (
Scenario 1 contains two different sub-scenarios: both Scenarios 1a and 1b represent the situation where neither disability nor death occurs by the study end time
Disability is observed between two adjacent assessment times
Scenario 3 involves three different sub-scenarios. Scenarios 3a and 3b represent situations where disability is not observed at the assessment time
Scenario 4 includes two different sub-scenarios. Scenario 4a represents the situation where disability is observed between two adjacent assessment times
Suppose a subject
We consider the semiparametric transformation model for the marginal survival function of
Sieve likelihood with Bernstein polynomials
In the likelihood function (4), we are interested in estimating the unknown parameter
In the log-likelihood function
In practice, we choose
In the next section, we describe an estimation procedure to maximize
Estimation procedure for sieve maximum likelihood estimators
We develop a two-step sieve maximum likelihood estimation procedure that ensures computational efficiency in semi-competing risks data. The procedure generally applies to any choice of Archimedean copulas and marginal models. In principle, one may obtain the sieve estimators by directly maximizing the log-likelihood function in one step. However, due to the complex structure of the likelihood function, directly maximizing the log-likelihood in a one-step procedure using randomly selected initial values will cause a high convergence failure rate. Therefore, by following previous works on copula model estimation,34,24 we propose a two-step estimation procedure by introducing a separate first step. The goal of the first step is to obtain good initial values for the optimization of the complex likelihood in the second step, which will greatly guarantee the convergence performance of the sieve estimation. All the parameters are updated through the full likelihood in the second step. Therefore, the coverage probabilities and variance estimates are all correct, as illustrated in simulations in Section 4.
Specifically, we first estimate Obtain initial estimates of Simultaneously maximize the joint sieve log-likelihood to get final estimates:
The two-step estimation procedure guarantees almost
Estimation of cumulative incidence function
In the semi-competing risks data analysis, instead of using the marginal survival function, it is more reasonable to use the cumulative incidence function to characterize the risk profile of the non-terminal event of interest.9,13 Given that a subject is alive and free of disability by age
We study the asymptotic properties of the sieve maximum likelihood estimators
(Convergence rate). Assume that Conditions 1-2 and 4-5 in the Supplemental Appendix hold. Let
(Asymptotic normality and efficiency). Suppose Conditions 1–5 in the Supplemental Appendix hold. Define
Theorem 3.1 states that the sieve estimator
We conduct simulations to evaluate the performance of the proposed method, including parameter estimation and cumulative incidence function estimation.
Data generation
The true event times are generated from various copula models (i.e. Clayton, Gumbel, AMH, Frank, and Gaussian) with Weibull (PH model, scale
Parameter estimation under correct copula specification
We evaluate the estimation performance of the proposed method. The true model is Clayton copula with Weibull (PH) or Loglogistic (PO) margins under Kendall’s
Estimation results for interval-censored and left-truncated semi-competing risks data under three models: the true Clayton copula with parametric margins (Weibull for proportional hazards and Loglogistic for proportional odds; denoted as Clayton-PM), the proposed Copula2 with semiparametric margins (Copula2-S), and the marginal semiparametric model (Marginal-S) for the non-terminal event. Kendall’s
is set as 0.6.
Estimation results for interval-censored and left-truncated semi-competing risks data under three models: the true Clayton copula with parametric margins (Weibull for proportional hazards and Loglogistic for proportional odds; denoted as Clayton-PM), the proposed Copula2 with semiparametric margins (Copula2-S), and the marginal semiparametric model (Marginal-S) for the non-terminal event. Kendall’s
SE: standard deviation of the point estimate; SEE: mean of the standard error estimates; CP:
Moreover, we evaluate the three models’ performance in two special cases: (1) no left truncation; (2) dependent current status data where the non-terminal event is only censored by the study stopping time or the terminal event time. The results are summarized in Tables S3 and S4 in the Supplemental Appendix. Overall, the performance of our proposed and the true models are close, whereas the marginal model leads to biased estimation and incorrect coverage probabilities.
We evaluate the estimation performance of the proposed model on data generated from copula models that do not belong to the Copula2 family, such as AMH copula with
Estimation results for interval-censored and left-truncated semi-competing risks data fitted with the proposed Copula2-S model (misspecified copula) where the true data are generated from AMH, Frank, and Gaussian copulas.
Estimation results for interval-censored and left-truncated semi-competing risks data fitted with the proposed Copula2-S model (misspecified copula) where the true data are generated from AMH, Frank, and Gaussian copulas.
SE: standard deviation of the point estimate; SEE: mean of the standard error estimates; CP:
We evaluate the estimation accuracy of the cumulative incidence function by the proposed method. Data are from the Clayton copula with Weibull margins. We fit the true parametric model, the proposed method, and the marginal model. For the true and proposed methods, we estimate the cumulative incidence functions

Simulation results for cumulative incidence function estimation when Kendall’s
One major objective of CLHLS is to investigate the development of disability related to the activities of daily life in the elderly. A total of 12,969 participants free of disability over 60 years of age were enrolled in 2002 and underwent five examinations in 2005, 2008, 2011, 2014, and 2018. Six tasks essential to independent living were measured to evaluate disability status at each examination, including eating, toileting, bathing, dressing, indoor activity, and continence. The outcome of interest is time-to-disability, where disability is defined as the inability to perform any task independently. 42 The time-to-disability is subject to dependent censoring by death, leading to semi-competing risks data. Due to the recruitment and intermittent assessment design, the time-to-disability is left-truncated and interval-censored. To the best of our knowledge, most existing CLHLS studies do not properly account for these complexities simultaneously.
We calculate time-to-disability from age 60. The truncation time is the enrollment age minus 60 years. We consider several covariates available at the baseline, including sex, education, and marital status. We exclude 180 subjects with missing disability status and 107 subjects with missing covariate values. The final analysis cohort contains 12,682 subjects. The mean follow-up time is 3.8 years with SD
Characteristics of subjects in the cohort of CLHLS.
Characteristics of subjects in the cohort of CLHLS.
CLHLS: Chinese Longitudinal Healthy Longevity Study.
We analyze the CLHLS data with the proposed method. We check various combinations of transformation functions (PH and PO models) and Bernstein polynomial degrees (3 to 6) and choose the model that produces the smallest AIC, which corresponds to the PH models with Bernstein polynomial degrees 3 for both margins. The AIC results are summarized in Table S6 in the Supplemental Appendix. The proposed method estimates the dependence parameters as

The Contour plot of the estimated Copula2 density function (i.e.
For the covariate effects on disability as shown in Table 4, by using the proposed method, sex (hazard ratio (HR) 0.814, confidence interval (CI) 0.779–0.850), education (HR 0.914, CI 0.862–0.969), and marital status (HR 0.815, CI 0.777–0.854) are significantly associated with reduced risks of developing disability. Specifically, females have a lower risk than males; people with higher education have a lower risk than people without; married people have a lower risk than unmarried people. These findings provide important insights into the prevention and management of disability. We also report results from a marginal semiparametric model on disability in Table 4. Although the estimated HRs in the marginal model are all smaller than 1, indicating the protective effects of these covariates, the coefficient values are dramatically different from that in the copula model. Specifically, the HRs are
Covariate effects on disability and death in CLHLS using the proposed Copula2-S model and the Marginal-S model.
CLHLS: Chinese Longitudinal Healthy Longevity Study; HR: hazard ratio; CI: confidence interval.
For the modeling of death, the findings are consistent between the copula and marginal methods, as shown in Table 4. Specifically, both methods report significant protective effects for sex, education, and marital status. The
We estimate the cumulative incidence functions of disability for females and males under both Copula2-S and Marginal-S models. The rest covariates are set as median values. For Copula2-S, the cumulative incidence function is

Estimation of cumulative incidence functions for disability in females and males by the proposed Copula2-S and the Marginal-S methods.
Semi-competing risks data frequently arise in clinical and epidemiological studies. With a high mortality risk for a disease, there could be incorrect estimation and inference results using a marginal method that ignores the dependence between disease and death. The situation becomes further complicated when the time of disease is subject to interval censoring and left truncation, which is commonly seen in many studies with the disease being intermittently assessed. We propose a flexible two-parameter copula-based semiparametric transformation model for semi-competing risks data subject to interval censoring and left truncation. The proposed method handles all possible censoring scenarios as depicted in Figure 1. By quantifying the dependence between disease and death, the proposed method provides correct estimations and highlights cumulative incidence profiles of disease progression associated with different baseline covariates. We develop a two-step sieve maximum likelihood estimation procedure and prove the sieve estimators’ asymptotic properties. All the methods have been built into an R package {CopulaCenR}, which is available on CRAN at https://cran.r-project.org/package=CopulaCenR.
Several copula selection methods are developed for semi-competing risks data under right censoring.45,46 However, there is a lack of goodness-of-fit tests under interval censoring. In CLHLS, we use AIC to guide the model selection for simplicity. A formal goodness-of-fit test is highly desirable.
We use the semi-parametric transformation model for the marginal distributions instead of parametric models (e.g. Weibull). When the parametric assumption is satisfied, the copula model with correct parametric margins produces correct estimation and inference results. However, when the parametric assumption is unsatisfied, the copula model with misspecified parametric margins will lead to biased estimation and inference results, even when the copula structure is correctly specified.47,34 We choose the semiparametric transformation model because of two advantages. First, the semiparametric model does not require a parametric assumption, which renders great flexibility in practice. In particular, its estimators of the finite-dimensional parameters (i.e. copula parameter(s) and regression coefficients) are proved to be as efficient as the estimators under the copula model with parametric margins. Second, the transformation model flexibly incorporates different survival assumptions (i.e. PH and PO). In contrast, the parametric model is usually restricted to one specific assumption (e.g. Weibull is PH, Loglogistic is PO). Because of its advantages, the semiparametric transformation model has been widely used in the literature on survival analysis.32,48,24
It is well recognized that the PH/PO assumptions may not hold in practice. The semiparametric transformation model incorporates various model assumptions through the transformation function, which is usually selected by AIC or BIC (i.e. Bayesian information criterion). For example, Li et al.
49
find that the transformation function
There are other possible approaches along the direction of the illness-death model, which can account for dependence between transitions by introducing a frailty effect. As discussed in several existing works,50,51,24 the marginal survival functions are modeled differently between copula and frailty models. Specifically, the marginal function under the copula model only involves the time and covariate effects, whereas the marginal function under the frailty model includes time, covariate effects, and the frailty parameter. As a result, the model parameters (i.e. covariate effects) are usually not directly comparable between the two methods. In this article, the objectives of our real study lead us to pursue the copula model instead of the frailty-based illness-death model because the copula model offers a straightforward interpretation of covariate effects and dependence strength. In addition, the copula model accounts for the potential differential mortality between diseased and non-diseased groups through covariate effects and the dependence between
We apply the proposed method to CLHLS, a large-scale longitudinal study of disability in the elderly, and successfully identify significant covariates associated with disability development, some of which are not detected in the marginal model. We find that the dependence between the time to disability and the time to death is extremely strong with Kendall’s
Our work is the first research on disability that adopts a solid statistical model that appropriately handles interval-censored and left-truncated semi-competing risks data. Our findings provide new insights and tools for the prevention and management of disability in the elderly. The proposed method can be applied to the general interval-censored and possibly left-truncated semi-competing risks data in clinical and epidemiological studies with intermittently assessed disease status.
Supplemental Material
sj-pdf-1-smm-10.1177_09622802221133552 - Supplemental material for Semiparametric copula method for semi-competing risks data subject to interval censoring and left truncation: Application to disability in elderly
Supplemental material, sj-pdf-1-smm-10.1177_09622802221133552 for Semiparametric copula method for semi-competing risks data subject to interval censoring and left truncation: Application to disability in elderly by Tao Sun, Yunlong Li, Zhengyan Xiao, Ying Ding and Xiaojun Wang in Statistical Methods in Medical Research
Footnotes
Acknowledgements
The authors thank the editor(s) and referees for their insightful comments and suggestions that have led to a significant improvement of this paper. The authors gratefully thank Peking University Open Access Research Database and National Archive of Computerized Data on Aging sponsored by National Institute of Aging (NIA/NIH) for providing the CLHLS data.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors acknowledge the support (to XJW and TS) from the National Natural Science Foundation of China (grant nos. 72101261, 72141306), the Ministry of Education of China (grant no. 20JZD023), National Bureau of Statistics of China (grant no. 2021LZ18), and the Fund for Building World-class Universities (Disciplines) of Renmin University of China.
Supplemental material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
