Abstract
Maximum likelihood estimates of the density function and regression coefficients in the proportional odds regression model are proposed and studied based on event-time data that are either completely or partly interval-censored. A smooth estimate of the survival function is then obtained. Theoretical results indicate that the proposed method enjoys an almost parametric
Keywords
Introduction
A limited number of parametric families of distributions are available for survival analysis. These parametric models are restrictive because they have only a small number of parameters. Nonparametric and semiparametric models such as the accelerated failure time (AFT) model,1–4 the proportional hazards (PH) regression model,5 and the proportional odds (PO) model6–9 are much more flexible and thus are preferred.
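To make the PO model concrete, the following is a minimal sketch in Python. It assumes one common parameterization, in which the odds of failure by time t are multiplied by exp(x'β) relative to the baseline; the sign convention, the function names, and the exponential baseline are illustrative assumptions and are not taken from this paper.

```python
import math


def po_survival(s0, x, beta):
    """Conditional survival S(t | x) under a proportional odds model.

    Assumed convention (an illustration, not the paper's definition):
    the odds of failure by time t equal exp(x . beta) times the baseline
    odds, so S(t | x) = S0(t) / (S0(t) + exp(x . beta) * (1 - S0(t))).
    """
    theta = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    return s0 / (s0 + theta * (1.0 - s0))


def failure_odds(s):
    """Odds of having failed, (1 - S) / S, for a survival probability S."""
    return (1.0 - s) / s


# Hypothetical setting: exponential baseline with rate 1, one binary covariate.
beta = [0.7]
for t in (0.5, 1.0, 2.0):
    s0 = math.exp(-t)
    s1 = po_survival(s0, [1.0], beta)
    ratio = failure_odds(s1) / failure_odds(s0)
    print(f"t={t}: S0={s0:.4f}, S(t|x=1)={s1:.4f}, odds ratio={ratio:.4f}")
```

Under this convention the printed odds ratio is the same at every time point, which is the defining property of the model.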
Let
When data are interval-censored,13–15 the time
The traditional treatment of the nonparametric portion of a non- or semi-parametric model is to approximate the unknown underlying continuous distribution function by a step function with jumps at the observations and to parameterize it by the unknown jump sizes.17–19 When data are not censored, this method results in efficient estimation. However, especially for small samples, the estimated distribution and density functions are unsatisfactory due to their roughness. They are even more unsatisfactory for interval-censored data. Turnbull18 developed a self-consistency algorithm, which can be viewed as an instance of the EM algorithm, for estimating a distribution function based on arbitrarily grouped, censored, and truncated data. This approach does not work for the AFT regression model with interval censoring. Although the estimation of the regression coefficients is efficient,16 the distributional estimation based on interval-censored data is typically
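The self-consistency idea can be illustrated with a simplified sketch. The code below is not Turnbull's full algorithm: it assumes the candidate support points are supplied by the user rather than derived from the innermost intervals, it ignores truncation, and all names are hypothetical. It only shows the EM-type update of the jump sizes of a step-function estimate.

```python
def self_consistency(intervals, support, n_iter=500, tol=1e-10):
    """Self-consistency (EM) iterations for probability masses on `support`.

    Each observation is an interval (left, right]; a support point s is
    compatible with the observation when left < s <= right.
    """
    n, k = len(intervals), len(support)
    # alpha[i][j] = 1 if support point j lies in the interval of observation i.
    alpha = [[1.0 if l < s <= r else 0.0 for s in support] for l, r in intervals]
    if any(sum(row) == 0.0 for row in alpha):
        raise ValueError("every interval must contain at least one support point")
    p = [1.0 / k] * k  # start from equal masses
    for _ in range(n_iter):
        new_p = [0.0] * k
        for row in alpha:
            # Probability currently assigned to this observation's interval.
            denom = sum(a * pj for a, pj in zip(row, p))
            for j in range(k):
                # Expected share of this observation falling at support point j.
                new_p[j] += row[j] * p[j] / denom / n
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


# Toy data: two observations in (0, 1], one in (1, 2], one in (0, 2].
masses = self_consistency([(0, 1), (0, 1), (1, 2), (0, 2)], support=[0.5, 1.5])
print(masses)
```

For this toy dataset the likelihood is proportional to p1^2 (1 - p1), so the iterations approach masses of 2/3 and 1/3. The resulting estimate is a step function, which is the roughness the paragraph above refers to.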
Guan26 proposed to approximate the unknown underlying continuous distribution function defined on a closed interval by a Bernstein polynomial model and to use the coefficients as parameters. This method has an advantage over other smoothing techniques such as kernel and spline methods: the Bernstein polynomial model is an actual probability model, namely a mixture of beta distributions. This makes it possible and easy to find maximum likelihood estimates based on all kinds of interval-censored data and many other incompletely observed and contaminated data.27,28 This model was used to successfully obtain maximum likelihood estimates in the PH and AFT models based on interval-censored data.4,29 The distributional estimation has much better than
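As an illustration of why a mixture of beta distributions makes interval-censored likelihoods easy to write down, here is a minimal sketch on the unit interval. It assumes the standard form in which the degree-m model mixes Beta(i + 1, m - i + 1) densities with nonnegative weights summing to one; the function names, weights, and intervals are hypothetical, and no regression part is included.

```python
import math


def bernstein_density(t, p):
    """Mixture density sum_i p[i] * beta_{mi}(t) on [0, 1], where beta_{mi}
    is the Beta(i + 1, m - i + 1) density and m = len(p) - 1."""
    m = len(p) - 1
    return sum(
        pi * (m + 1) * math.comb(m, i) * t**i * (1.0 - t) ** (m - i)
        for i, pi in enumerate(p)
    )


def bernstein_cdf(t, p):
    """Mixture distribution function. For integer parameters the
    Beta(i + 1, m - i + 1) cdf at t equals P(Binomial(m + 1, t) >= i + 1)."""
    m = len(p) - 1

    def beta_cdf(i):
        return sum(
            math.comb(m + 1, j) * t**j * (1.0 - t) ** (m + 1 - j)
            for j in range(i + 1, m + 2)
        )

    return sum(pi * beta_cdf(i) for i, pi in enumerate(p))


def interval_loglik(intervals, p):
    """Log-likelihood of interval-censored observations (left, right]:
    each observation contributes log(F(right) - F(left))."""
    return sum(
        math.log(bernstein_cdf(r, p) - bernstein_cdf(l, p)) for l, r in intervals
    )


# Hypothetical mixture weights (degree m = 3) and censoring intervals.
weights = [0.1, 0.2, 0.4, 0.3]
obs = [(0.0, 0.4), (0.2, 0.7), (0.5, 1.0)]
print(bernstein_density(0.5, weights), interval_loglik(obs, weights))
```

Because the distribution function is available in closed form, every kind of censoring interval contributes a smooth function of the weights, so the likelihood can be maximized directly over the probability simplex.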
The rest of the paper is organized as follows. Section 2 describes the proposed method and the algorithm to find maximum likelihood estimates. Some asymptotic results are given in Section 3. Simulation results and a real data application are presented in Sections 4 and 5. The proofs of the theoretical results are given in the Supplemental Material.
The likelihood
Let
Consider an individual interval-censored data
The first and second derivatives of
The full likelihood is
Let
Without specifying
We assume that
Then
If
For a fixed
Let
We can estimate
By (8) the derivatives of
For each
Introducing Lagrange function
Assuming
The equation
Therefore, starting with an initial guess
For an optimal degree Let Find the maximizer Choose Repeat Steps 1 and 2 until convergence. The final
The use of the working baseline
With a
We need some assumptions and conditions to state the asymptotic results.
The support
For each
The conditions on the examination times are similar to those used by other authors, for example, Huang and Wellner.16
Assumption 2 is fulfilled with
Define statistical distances
Suppose that Assumptions 1 and 2 are satisfied. About the relations of these distances we have the following results.
Define
The sets
Suppose that Assumptions 1 and 2 are satisfied with degree
If
Theorem 3.3 implies that the proposed estimators possess an almost parametric
Suppose that Assumptions 1 and 2 are fulfilled, and
From the proof of this theorem we have approximation
The asymptotic normality of
In this simulation we shall use exponential and gamma distributions as the baseline
Simulation results with exponential baseline, for a given sample size and censoring case: (i) the bias and, in parentheses, the standard deviation of the parametric estimator, the proposed estimator, and the semiparametric estimator; (ii) the root mean integrated squared errors of the parametric estimators, the proposed estimators, and the semiparametric estimator; and (iii) the convergence percentage C(%). For case 1 censoring, left-censoring (lc) rates are 25%, 50%, and 75%.
Simulation results with exponential baseline, sample size
Simulation results with gamma baseline, sample size
We shall compare the proposed method with the parametric method according to the known simulation models and the semiparametric method based on Turnbull’s method. We used
The event times are subject to Case
For a given covariate
For the exponential baseline with rate
For the gamma baseline with shape
An illustrative comparison of the proposed method with the parametric and semiparametric methods is given in Figure 1.

Estimated survival (left) and density (right) curves based on a simulated current status dataset using model II with gamma baseline as shown in Table 2 and 75% left-censoring rate.
The simulation shows that (i) the proposed method performs much better than the semiparametric one; (ii) although the parametric method outperforms the proposed one, the proposed method is closer to the parametric method than to the semiparametric one in most cases; (iii) in some cases with current status data, the proposed method even outperforms the parametric method, possibly due to the inflexibility of the latter; (iv) current status data with 50% left-censoring produce better estimates for the specific examination time generated in this simulation; and (v) as the sample size increases, the percentage of convergent Monte Carlo runs increases.
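For readers who want to reproduce the flavor of such an experiment, the following sketch generates current status (case 1 interval-censored) data from a PO model with an exponential baseline. It is not the simulation design used above: the single binary covariate, the common fixed examination time, the sign convention of the PO model, and all names and parameter values are assumptions made only for illustration.

```python
import math
import random


def simulate_current_status(n, beta, rate, exam_time, seed=0):
    """Return a list of (x, exam_time, delta), where delta = 1 marks a
    left-censored event (the event occurred by the examination time).

    Assumed PO convention: failure odds are multiplied by exp(beta * x),
    so S(t | x) = S0 / (S0 + theta * (1 - S0)) with theta = exp(beta * x).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = float(rng.random() < 0.5)      # hypothetical binary covariate
        theta = math.exp(beta * x)
        u = 1.0 - rng.random()             # uniform on (0, 1], avoids log(0)
        # Solve S(t | x) = u for the baseline survival S0(t).
        s0 = u * theta / (1.0 - u + u * theta)
        t = -math.log(s0) / rate           # invert S0(t) = exp(-rate * t)
        delta = 1 if t <= exam_time else 0
        data.append((x, exam_time, delta))
    return data


def left_censoring_rate(data):
    """Fraction of observations whose event precedes the examination time."""
    return sum(d for _, _, d in data) / len(data)


# With beta = 0, examining at the baseline median targets about 50% left-censoring.
sample = simulate_current_status(5000, beta=0.0, rate=1.0, exam_time=math.log(2.0))
print(left_censoring_rate(sample))
```

Moving the examination time to other baseline quantiles changes the target left-censoring rate, for example toward 25% or 75%.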
Several authors37–39 studied the times to HIV infection for homosexual men from two cities in Denmark. In this study, 65 of the 297 men who were tested at least once for HIV-antibody positivity on six different dates tested positive. The dataset is contained in R package

Estimated survival curves for HIV infection time data with
Using the proposed method the
As pointed out by a referee, the Bernstein polynomial model, being a global approximation, may have difficulty capturing features such as sudden changes. The degree of approximation of the generalized Bernstein polynomial for any continuous function depends on its smoothness. Much larger model degree
Footnotes
Acknowledgments
The author is grateful to the three anonymous reviewers for their careful reviews and insightful and useful comments.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interest
The author has no conflicting interests to declare with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
