Regression Analysis of Panel Count Data with Multiple Modes of Recurrence

Abstract

Panel count data arise when recurrent events experienced by the study subjects are recorded only at discrete observation times. In this article, we focus on the regression analysis of panel count data with multiple modes of recurrence. We introduce a proportional mean model to assess the effects of covariates on the underlying counting processes corresponding to different recurrence modes. The methodology includes the joint estimation of baseline cumulative cause-specific mean functions and regression coefficients. We also establish the asymptotic properties of the proposed estimators. A Monte Carlo simulation study is conducted to examine the performance of the proposed estimators in finite samples. We illustrate the practical applicability of the procedures using two real-life data sets.

AMS Subject Classification: 62P10

Keywords

Counting process; panel count data; proportional mean model; pseudo-likelihood; recurrent events

1. Introduction

In many longitudinal studies on recurrent events, instead of observing the exact time to the occurrence of an event, we may only observe the number of recurrences experienced by a subject in a given time interval. If each subject can be observed at more than one time point, the number of recurrences between two successive observation times is available. The data collected in this form are referred to as panel count data.^[1] Panel count data frequently arise in many fields, such as clinical trials, epidemiological studies and engineering, when continuous follow-up to obtain the exact event times of each subject is infeasible or too costly.^[2] Some authors refer to panel count data as interval count data or interval censored recurrent event data.^{[3, 4]} Sun and Zhao^[5] provide a comprehensive review of various methodologies for modelling panel count data. In certain situations, subjects can be observed at only a single monitoring time, and only the information on whether the event of interest has occurred or not by that time is available. This type of data is referred to as current status data in the literature. For a detailed study on current status data, one may refer to Sun.^[6]

The analysis of panel count data can be approached by modelling either the mean function or the rate function of the underlying recurrent event process. Sun and Kalbfleisch^[7] developed an estimator for the mean function based on isotonic regression theory. Likelihood-based non-parametric estimation methods for the mean function are discussed by Wellner and Zhang,^[8] who proposed a non-parametric maximum likelihood estimator (NPMLE) as well as a non-parametric maximum pseudo-likelihood estimator (NPMPLE) of the mean function. The asymptotic properties of both the NPMPLE and the NPMLE are studied in detail by Wellner and Zhang.^[8] The analysis of panel count data using rate functions is studied by Lawless and Zhan^[3] and Thall and Lachin.^[9]

In most of the studies involving panel count data, we observe a covariate vector Z for each subject that affects the underlying counting process of the recurrent events. Two important approaches employed for analyzing regression models for panel count data are the maximum likelihood method and the generalized estimating equation approach. Regression analysis of panel count data using the likelihood approach is studied extensively by Zhang^[10] and Wellner and Zhang,^[11] while Hu et al.^[12] explored the generalized estimating equation approach for the same. The regression analysis of panel count data with informative censoring is addressed by Sun and Wei,^[13] Huang et al.^[14] and Zhao and Tong.^[15] Notably, Zhao et al.^[16] proposed a non-parametric regression model for panel count data to accommodate the potential non-linear covariate effect. Regression analysis of multivariate panel count data is considered by He et al.^[17] and Zhang et al.^[18] Various semiparametric regression models for panel count data using the R programming language are reviewed by Chiou et al.^[2]

Consider a study where each of the study subjects is exposed to the recurrent events due to more than one mode of recurrence. In this situation, the number of recurrences due to each distinct mode (cause) at different observation times is available. Consequently, we obtain panel count data with multiple modes of recurrence. It is important to distinguish this data type from multivariate panel count data, where multiple related events are observed exclusively at discrete time points. While panel count data have attracted research interest for the past two decades, the literature addressing cases with multiple recurrence modes remains limited. The non-parametric estimation of cause-specific mean functions is addressed by Sreedevi and Sankaran^[19] while Paduthol et al.^[20] developed non-parametric estimators for the cause-specific rate functions. Both these works considered panel count data without covariates. Regression analysis of panel count data with multiple modes of recurrence has not been studied yet. Motivated by this, in this article, we propose a proportional mean model to assess the covariate effect on the cumulative mean functions for different modes of recurrence.

The rest of the article is organized as follows. In Section 2, we propose a new proportional mean model to estimate the baseline cumulative mean functions and regression parameters due to each mode of recurrence simultaneously, and derive a simple iterative algorithm for the estimation procedure. The asymptotic properties of the proposed estimators are established in the Appendix. In Section 3, the finite sample behaviour of the proposed estimators is examined through a Monte Carlo simulation study. The proposed procedures are illustrated using two real data sets in Section 4. Finally, Section 5 gives concluding remarks.

2. The Proportional Mean Model

Consider a study on n individuals exposed to recurrent events due to k distinct modes, indexed by j ∈ {1, 2, …, k}. Assume that the event process can be observed only at a sequence of random monitoring times. As a result, only the counts of event recurrences for each mode of recurrence, between observation times, are available, while the exact recurrence times remain unknown.

Define the counting process N_j = {N_j(t); t ≥ 0}, where N_j(t) denotes the number of recurrences of the event due to cause j up to time t. Then E(N_j(t)) = Λ_j(t), j = 1, 2, …, k, denotes the expected cumulative number of events due to mode (cause) j up to time t. The function Λ_j(t) is the mean function of the counting process N_j(t) and is termed the cause-specific mean function.^[19] Assume that, corresponding to each subject, we observe a d × 1 vector of covariates denoted by Z. Our interest is in E(N_j(t)|Z) = Λ_j(t|Z), j = 1, 2, …, k, the expected cumulative number of events due to cause j up to time t conditional on the covariate vector Z. To assess the effect of the covariate vector Z on the cause-specific mean functions, we propose the proportional mean model given by

Λ_{j} (t | Z) = Λ_{0 j} (t) \exp (β_{j}^{'} Z) j = 1, 2, \dots, k,

(2.1)

where $Λ_{0 j} (\cdot)$ is the completely unspecified baseline cumulative cause-specific mean function and β_j is the d × 1 vector of regression parameters corresponding to cause j. Under Eq. (2.1), the ratio of the cause-specific mean functions of two subjects depends only on their covariates and not on t. When k = 1, the model in Eq. (2.1) reduces to the proportional mean model for panel count data with a single mode of recurrence, studied by Zhang,^[10] Wellner and Zhang^[11] and Sun and Wei.^[13]
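The proportionality in Eq. (2.1) can be illustrated with a short Python sketch. The function name and the numerical values below are hypothetical and serve only to show that the ratio of two conditional mean functions is free of the baseline.

```python
import math

def cause_specific_mean(baseline_value, beta, z):
    """Lambda_j(t | Z) = Lambda_0j(t) * exp(beta_j' Z), as in Eq. (2.1)."""
    linear_predictor = sum(b * x for b, x in zip(beta, z))
    return baseline_value * math.exp(linear_predictor)

# Hypothetical values: Lambda_0j(t) = 2.0 at the time of interest, beta_j = (0.5, 1.0).
treated = cause_specific_mean(2.0, [0.5, 1.0], [1.0, 0.0])
control = cause_specific_mean(2.0, [0.5, 1.0], [0.0, 0.0])

# The ratio equals exp(0.5) whatever the baseline value is.
mean_ratio = treated / control
```

Changing the baseline value from 2.0 to any other positive number leaves `mean_ratio` unchanged, which is the sense in which exp(β_j'Z) acts as a mean ratio.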

We now discuss the structure of panel count data exposed to multiple modes of recurrence in the presence of covariates. Let M be an integer-valued random variable denoting the number of observation times, which may be different for each individual, and $\underline{T} = \{T_{M, p}, p = 1, 2, \dots, M; M = 1, 2, \dots\}$ be the set of observation times. Now $T_{M, p - 1} \leq T_{M, p}$ for p = 1, 2, …, M and for all possible values of M. Assume that N_j(t) and $(M, \underline{T})$ are independent. Let $N_{M, p}^{j}$ denote the number of recurrences of the event due to cause j up to the p-th monitoring time $T_{M, p}$ for p = 1, 2, …, M and j = 1, 2, …, k, that is, $N_{M, p}^{j} = N_{j} (T_{M, p})$.

For each subject, we also observe a d × 1 vector of covariates Z. Now, we observe n independent and identically distributed (i.i.d.) copies of $\{M, T_{M, p}, N_{M, p}^{1}, \dots, N_{M, p}^{k}, Z\}, p = 1, 2, \dots, M$. Consequently, the observed data will be of the form $\{m_{i}, t_{m_{i}, p}, n_{m_{i}, p}^{1}, \dots, n_{m_{i}, p}^{k}, z_{i}\}$, where p = 1, 2, …, m_i and i = 1, 2, …, n. It can be noted that the exact recurrence times remain unobserved throughout the study.

The regression analysis of panel count data with a single recurrence mode based on maximum likelihood methods is explored by Zhang^[10] and Wellner and Zhang.^[11] They constructed a pseudo-likelihood of the observed data using the marginal distributions of the counting process N(t|Z), which denotes the number of recurrences up to time t, conditional on Z. Here N(t|Z) is assumed to be a non-homogeneous Poisson process with

P (N (t) = m | Z) = \frac{Λ {(t | Z)}^{m}}{m!} \exp (- Λ (t | Z)), \quad m = 0, 1, 2, \dots,

where m is the number of recurrences up to time t. In the construction of the pseudo-likelihood, they only consider the factors relevant in estimating the regression parameters β and the baseline cumulative mean function Λ₀(t), while ignoring the dependence between event counts within a subject. Notably, Wellner and Zhang^[11] pointed out that the procedures based on pseudo-likelihood considerably reduce the complexity of the estimation procedures, compared to those based on the complete likelihood.

In the presence of multiple modes of recurrence, Sreedevi and Sankaran^[19] developed a pseudo-likelihood function for the observed data and derived an isotonic regression estimator (IRE) for cause-specific mean functions. They constructed a pseudo-likelihood function for the observed data (which does not involve covariates) by extending methods by Wellner and Zhang.^[11]

We now derive a pseudo-likelihood function in the presence of multiple recurrence modes and covariates, to estimate both Λ₀ _j (.) and β_j simultaneously. The estimates are obtained as the values that maximize the pseudo-likelihood. Under the assumption that the underlying counting process N_j(t) is a non-homogeneous Poisson process with conditional mean function given in Eq. (2.1), for j = 1, 2, …, k, we obtain

P (N_{j} (t) = m^{'} | Z) = \frac{{(Λ_{0 j} (t) \exp (β_{j}^{'} Z))}^{m^{'}} \exp (- Λ_{0 j} (t) \exp (β_{j}^{'} Z))}{m^{'}!}, \quad m^{'} = 0, 1, 2, \dots,

(2.2)

where $m^{'}$ is the number of recurrences due to cause j up to time t. When the k > 1 modes of recurrence are independent, the log pseudo-likelihood function of the observed data can be written as

l_{n} (β_{j}, {\underline{Λ}}_{0 j}, \underline{X}) = \sum_{j = 1}^{k} l_{n j} (β_{j}, Λ_{0 j}, \underline{X})

(2.3)

where $\underline{X}$ is the observed data given by $\underline{X} = \{M, T_{M, p}, N_{M, p}^{1}, \dots, N_{M, p}^{k}, Z\}$ and $l_{n j} (β_{j}, Λ_{0 j}, \underline{X})$ is the log pseudo-likelihood corresponding to the j-th cause. By extending the results of Zhang^[10] for panel count data with a single mode of recurrence, and after dropping the terms that do not involve the β_j's and $Λ_{0 j}$'s, the log pseudo-likelihood for the j-th mode of recurrence, $l_{n j} (β_{j}, Λ_{0 j}, \underline{X})$, is given by

l_{n j} (β_{j}, Λ_{0 j}, \underline{X}) = \sum_{i = 1}^{n} \sum_{p = 1}^{M_{i}} [N_{M_{i}, p}^{j} \log Λ_{0 j} (T_{M_{i}, p}) + N_{M_{i}, p}^{j} (β_{j}^{'} Z_{i}) - Λ_{0 j} (T_{M_{i}, p}) \exp (β_{j}^{'} Z_{i})] j = 1, 2, \dots, k,

(2.4)

where M_i is the number of observation times for the i-th individual, $T_{M_{i}, p}$, p = 1, 2, …, M_i, are the corresponding observation times and $N_{M_{i}, p}^{j}$, p = 1, 2, …, M_i, j = 1, 2, …, k, is the number of recurrences of the event due to cause j for the i-th individual up to $T_{M_{i}, p}$. We assume that, given the covariate vector Z, the distributions of $\underline{T}$ and M do not depend on β_j and $Λ_{0 j}$. We maximize the log pseudo-likelihood given in Eq. (2.4) to obtain the estimators of β_j and $Λ_{0 j}$.
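As a minimal sketch of how Eq. (2.4) can be evaluated for a single cause j, the Python function below sums the three terms over all (subject, visit) pairs. The flattened data layout, the function name and the toy numbers are assumptions made for illustration; they are not taken from the original implementation, which was written in R.

```python
import math

def log_pseudo_likelihood(beta, baseline, counts, covariates):
    """Eq. (2.4) for a single cause j.

    Each index r stands for one (subject i, visit p) pair:
    baseline[r]   = Lambda_0j(T_{M_i,p}), assumed strictly positive,
    counts[r]     = N^j_{M_i,p},
    covariates[r] = Z_i (repeated for every visit of subject i).
    """
    total = 0.0
    for lam, n, z in zip(baseline, counts, covariates):
        eta = sum(b * x for b, x in zip(beta, z))   # beta_j' Z_i
        total += n * math.log(lam) + n * eta - lam * math.exp(eta)
    return total

# Hypothetical toy data: subject 1 (Z = 0) seen twice, subject 2 (Z = 1) seen once.
baseline = [1.0, 2.0, 1.5]
counts = [1, 3, 2]
covariates = [[0.0], [0.0], [1.0]]
value_at_zero = log_pseudo_likelihood([0.0], baseline, counts, covariates)
```

With β_j = 0 the covariate terms vanish, so the value reduces to the sum of N log Λ₀ minus Λ₀ over the three visits, which gives a quick hand check of the implementation.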

We now discuss the computational procedures. Based on the observed data $\underline{X}$ discussed above, we define the following terms. Let I(A) be the indicator function of the set A and s₁ < s₂ < … < s_r be the distinct ordered observation time points in the set {$T_{M_{i}, p}$, p = 1, 2, …, M_i, i = 1, 2, …, n}. For q ∈ {1, 2, …, r} and for any particular cause of recurrence J = j, define

b_{q j} = \sum_{i = 1}^{n} \sum_{p = 1}^{M_{i}} I [T_{M_{i}, p} = s_{q}; J = j],

(2.5)

the number of observations made at s_q due to cause j and

{\bar{n}}_{q j} = \frac{1}{b_{q j}} \sum_{i = 1}^{n} \sum_{p = 1}^{M_{i}} N_{M_{i}, p}^{j} I [T_{M_{i}, p} = s_{q}; J = j]

(2.6)

as the mean value of the recurrences made at s_q due to cause j for j = 1, 2, …, k. Also define

V_{q j} (β_{j}, Z) = \frac{1}{b_{q j}} \sum_{i = 1}^{n} \sum_{p = 1}^{M_{i}} \exp (β_{j}^{'} Z_{i}) I [T_{M_{i}, p} = s_{q}; J = j]

(2.7)

and

W_{q j} (β_{j}, Z, N^{j}) = \frac{1}{b_{q j}} \sum_{i = 1}^{n} \sum_{p = 1}^{M_{i}} N_{M_{i}, p}^{j} (β_{j}^{'} Z_{i}) I [T_{M_{i}, p} = s_{q}; J = j] .

(2.8)

Now we can rewrite the log pseudo-likelihood for the jth mode of recurrence given in Eq. (2.4) as

l_{n j} (β_{j}, Λ_{0 j} | \underline{X}) = \sum_{q = 1}^{r} b_{q j} [{\bar{n}}_{q j} \log Λ_{0 j} (s_{q}) - V_{q j} (β_{j}, Z) Λ_{0 j} (s_{q}) + W_{q j} (β_{j}, Z, N^{j})] .

(2.9)
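The quantities in Eqs. (2.5) to (2.8) and the aggregated form in Eq. (2.9) can be sketched in Python as follows. We read the indicator J = j as selecting the counts of cause j, so that b_qj is simply the number of observations made at s_q; this reading, the function names and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict

def aggregate_by_time(times, counts, covariates, beta):
    """Return rows (s_q, b_qj, nbar_qj, V_qj, W_qj) of Eqs. (2.5)-(2.8) for one cause j."""
    groups = defaultdict(list)
    for t, n, z in zip(times, counts, covariates):
        eta = sum(b * x for b, x in zip(beta, z))   # beta_j' Z_i
        groups[t].append((n, eta))
    rows = []
    for s in sorted(groups):                         # distinct ordered times s_1 < ... < s_r
        obs = groups[s]
        b = len(obs)
        nbar = sum(n for n, _ in obs) / b
        v = sum(math.exp(eta) for _, eta in obs) / b
        w = sum(n * eta for n, eta in obs) / b
        rows.append((s, b, nbar, v, w))
    return rows

def aggregated_log_pseudo_likelihood(rows, baseline_at):
    """Eq. (2.9); baseline_at maps each s_q to Lambda_0j(s_q) > 0."""
    return sum(b * (nbar * math.log(baseline_at[s]) - v * baseline_at[s] + w)
               for s, b, nbar, v, w in rows)

# Hypothetical toy data with a scalar covariate; two visits share the time 1.0.
times = [1.0, 2.0, 1.0, 3.0]
counts = [1, 2, 0, 4]
covariates = [[0.0], [0.0], [1.0], [1.0]]
beta = [0.3]
baseline_at = {1.0: 0.8, 2.0: 1.7, 3.0: 2.9}

rows = aggregate_by_time(times, counts, covariates, beta)
aggregated = aggregated_log_pseudo_likelihood(rows, baseline_at)

# Direct evaluation of Eq. (2.4) on the same data, for comparison.
direct = sum(n * math.log(baseline_at[t]) + n * beta[0] * z[0]
             - baseline_at[t] * math.exp(beta[0] * z[0])
             for t, n, z in zip(times, counts, covariates))
```

The two values `aggregated` and `direct` agree up to rounding, which reflects that Eq. (2.9) is only a regrouping of the terms of Eq. (2.4) by distinct observation time.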

We maximize Eq. (2.9) to obtain the estimates of β_j and $Λ_{0 j} (\cdot)$ for j = 1, 2, …, k. The semiparametric maximum pseudo-likelihood estimators are the parameter values that maximize Eq. (2.9) over the set R^d × Ω⁺, where R is the set of real numbers and Ω⁺ = {(y₁, y₂, …, y_r) ∈ R^r : y₁ ≤ y₂ ≤ … ≤ y_r}. Under the assumption that Λ_j(0) = 0, and noting that r depends on the sample size n, the estimators can be obtained as

({\hat{β}}_{j}, {\hat{Λ}}_{0 j}) = \underset{(β_{j}, Λ_{0 j}) \in R^{d} \times Ω^{+}}{\arg\max} \, l_{n j} (β_{j}, Λ_{0 j} | \underline{X}) .

(2.10)

To solve the optimization problem numerically, we first choose an initial value of β_j, say $β_{j}^{0}$. For fixed β_j = $β_{j}^{0}$, the estimator of $Λ_{0 j}$ can be obtained as ${\hat{Λ}}_{0 j} (β_{j}^{0}) = \underset{Λ_{0 j} \in Ω^{+}}{\arg\max} \, l_{n j}^{*} (Λ_{0 j} | β_{j}^{0}, \underline{X})$, where

l_{n j}^{*} (Λ_{0 j} | β_{j}, \underline{X}) = \sum_{q = 1}^{r} b_{q j} [{\bar{n}}_{q j} \log Λ_{0 j} (s_{q}) - V_{q j} (β_{j}, Z) Λ_{0 j} (s_{q})] .

(2.11)
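For fixed β_j, the objective in Eq. (2.11) is a Poisson-type criterion. A standard result from isotonic regression theory, which we assume applies here, is that its maximizer over non-decreasing vectors is the weighted isotonic regression of the ratios n̄_qj / V_qj with weights b_qj V_qj. The Python sketch below computes it with the pool-adjacent-violators algorithm; the function names and the toy inputs are hypothetical.

```python
def weighted_isotonic_regression(values, weights):
    """Pool-adjacent-violators: non-decreasing weighted least-squares fit."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            w_new = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w_new, w_new, c1 + c2])
    fitted = []
    for mean, _, size in blocks:
        fitted.extend([mean] * size)
    return fitted

def baseline_step(b, nbar, v):
    """Maximize Eq. (2.11) over non-decreasing (Lambda_0j(s_1), ..., Lambda_0j(s_r))."""
    ratios = [n / vq for n, vq in zip(nbar, v)]       # nbar_qj / V_qj
    weights = [bq * vq for bq, vq in zip(b, v)]       # b_qj * V_qj
    return weighted_isotonic_regression(ratios, weights)

# Hypothetical aggregated data at three distinct observation times.
b = [2, 1, 3]
nbar = [1.5, 1.0, 3.0]
v = [1.0, 1.0, 1.0]
baseline_hat = baseline_step(b, nbar, v)
```

In this toy input the first two ratios violate monotonicity, so they are pooled into their weighted average, while the third value is left unchanged.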

Let $Λ_{0 j}^{0}$ be the maximizer of Eq. (2.11). Using $Λ_{0 j}^{0}$, we can find the updated estimate of β_j as ${\hat{β}}_{j} (Λ_{0 j}^{0}) = \underset{β_{j} \in R^{d}}{\arg\max} \, l_{n j}^{* *} (β_{j} | Λ_{0 j}^{0}, \underline{X})$, where

l_{n j}^{* *} (β_{j} | Λ_{0 j}, \underline{X}) = \sum_{q = 1}^{r} b_{q j} [W_{q j} (β_{j}, Z, N^{j}) - V_{q j} (β_{j}, Z) Λ_{0 j} (s_{q})] .

(2.12)

The two maximization steps are alternated until the estimators converge. The convergence criterion can be chosen as

|\frac{l_{n j}^{(h + 1)} - l_{n j}^{(h)}}{l_{n j}^{(h)}}| \leq ϵ,

(2.13)

where $l_{n j}^{(h)} = l_{n j} (β_{j}^{(h)}, Λ_{0 j}^{(h)} | \underline{X})$ for h = 0, 1, 2, … and ϵ > 0 is a prescribed tolerance.

To estimate (β_j, Λ₀ _j ) for j = 1, 2, …, k, the computational algorithm can be summarized as follows:

Step 1. Choose an initial value of β_j, say $β_{j}^{(0)}$, and set h = 0.

Step 2. For the given $β_{j}^{(h)}$, compute $Λ_{0 j}^{(h)}$ as the maximizer of Eq. (2.11), given by

Λ_{0 j}^{(h)} = {\hat{Λ}}_{0 j} (β_{j}^{(h)}) = \underset{Λ_{0 j} \in Ω^{+}}{\arg\max} \, l_{n j}^{*} (Λ_{0 j} | β_{j}^{(h)}, \underline{X}) .

Step 3. Update the estimate of β_j, using $Λ_{0 j}^{(h)}$ obtained in Step 2, as the maximizer of Eq. (2.12), given by

β_{j}^{(h + 1)} = {\hat{β}}_{j} (Λ_{0 j}^{(h)}) = \underset{β_{j} \in R^{d}}{\arg\max} \, l_{n j}^{* *} (β_{j} | Λ_{0 j}^{(h)}, \underline{X}) .

Step 4. Set h = h + 1 and repeat Steps 2 and 3 until the convergence criterion in Eq. (2.13) is satisfied.
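The four steps can be sketched in Python for a single cause j. To keep the example short we make several simplifying assumptions that are not part of the original procedure: the covariate is scalar (d = 1), the baseline update uses weighted isotonic regression (pool-adjacent-violators), and the β update uses Newton-Raphson iterations. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def pava(values, weights):
    """Weighted pool-adjacent-violators fit (non-decreasing)."""
    blocks = []  # each block holds [weighted mean, total weight, size]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return [m for m, _, c in blocks for _ in range(c)]

def log_pl(beta, lam_at, times, counts, z):
    """Eq. (2.4) for one cause with a scalar covariate."""
    total = 0.0
    for t, n, zi in zip(times, counts, z):
        lam = lam_at[t]
        if n > 0:                       # the fitted baseline is positive wherever n > 0
            total += n * math.log(lam)
        total += n * beta * zi - lam * math.exp(beta * zi)
    return total

def fit_one_cause(times, counts, z, beta0=0.0, eps=1e-5, max_iter=500):
    beta, previous = beta0, None        # Step 1: initial value
    for _ in range(max_iter):
        # Step 2: baseline update for the current beta.
        sums = defaultdict(lambda: [0.0, 0.0])   # s_q -> [sum of counts, sum of exp(beta z)]
        for t, n, zi in zip(times, counts, z):
            sums[t][0] += n
            sums[t][1] += math.exp(beta * zi)
        grid = sorted(sums)
        ratios = [sums[s][0] / sums[s][1] for s in grid]   # nbar_q / V_q
        weights = [sums[s][1] for s in grid]               # b_q * V_q
        lam_at = dict(zip(grid, pava(ratios, weights)))
        # Step 3: Newton-Raphson update of beta for the current baseline.
        for _ in range(50):
            grad = hess = 0.0
            for t, n, zi in zip(times, counts, z):
                mu = lam_at[t] * math.exp(beta * zi)
                grad += zi * (n - mu)
                hess -= zi * zi * mu
            if hess == 0.0:
                break
            step = grad / hess
            beta -= step
            if abs(step) < 1e-10:
                break
        # Step 4: relative-change stopping rule of Eq. (2.13).
        current = log_pl(beta, lam_at, times, counts, z)
        if previous is not None and abs(current - previous) <= eps * abs(previous):
            break
        previous = current
    return beta, lam_at

# Hypothetical data: four subjects, each observed at times 1 and 2 (cumulative counts).
times = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
counts = [1, 2, 3, 6, 4, 8, 8, 16]
z = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
beta_hat, baseline_hat = fit_one_cause(times, counts, z)
```

In this toy data the subjects with z = 1 have three times as many events on average as those with z = 0, so the fitted coefficient should be close to log 3 and the fitted baseline close to 2 and 4 at the two time points. The alternating scheme converges linearly, so a tighter ϵ gives a closer agreement at the cost of more iterations.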

We establish strong consistency of the proposed estimators in the Appendix. The rate of convergence of the estimators in an L₂ metric related to the observation scheme is also derived. We also establish the asymptotic normality of ${\hat{β}}_{j}$, j = 1, 2, …, k, under some mild regularity conditions.

3. Simulation Study

We carry out a Monte Carlo simulation study to assess the performance of the proposed estimation procedure in finite samples. We consider the situation with two competing modes of recurrence. Based on the assumption that the event counts from each recurrence mode follow a non-homogeneous Poisson process, we model the event counts for both modes of recurrence simultaneously using a bivariate Poisson process. A similar approach was previously employed by Balakrishnan and Zhao^[21] in a non-competing risks framework. We generate panel count data of the form $\{M_{i}, T_{M_{i}, p}, N_{M_{i}, p}^{1}, N_{M_{i}, p}^{2}, Z_{i}\}$ for $p = 1, 2, \dots, M_{i}, i = 1, 2, \dots, n$. We consider $Z_{i} = {(Z_{i 1}, Z_{i 2})}^{'}$ as the covariate vector with two mutually independent components. For each subject, $Z_{i 1}$ is generated from a Bernoulli distribution with a probability of success of 0.5, and $Z_{i 2}$ is generated from a normal distribution with a mean of 0 and a standard deviation of 0.5. The number of observation times M_i for each individual is generated from a discrete uniform distribution U(1, 5), that is, on {1, 2, …, 5}, for i = 1, 2, …, n. Thus, the maximum number of observations for each individual is restricted to 5. We then generate the gap times between successive observations from a uniform distribution U(1, 5). The observation time points $T_{M_{i}, p}$ for p = 1, 2, …, M_i and i = 1, 2, …, n are obtained by accumulating these gap times. Once the observation times are generated, the numbers of recurrences $\{N_{M_{i}, p}^{1}, N_{M_{i}, p}^{2}\}$ are generated from a bivariate Poisson process given by

(Δ N_{M_{i}, p}^{1}, Δ N_{M_{i}, p}^{2}) \sim BivPo (Λ_{01} (Δ T_{M_{i}, p}) \exp (β_{1}^{'} Z_{i}), Λ_{02} (Δ T_{M_{i}, p}) \exp (β_{2}^{'} Z_{i}), ρ),

(3.1)

where $Δ N_{M_{i}, p}^{j} = N_{M_{i}, p}^{j} - N_{M_{i}, p - 1}^{j}$ for j = 1, 2, $Δ T_{M_{i}, p} = T_{M_{i}, p} - T_{M_{i}, p - 1}$, $Λ_{01} (t)$ and $Λ_{02} (t)$ are the true baseline cumulative mean functions due to modes 1 and 2, β₁ and β₂ are the values of the regression parameters due to modes 1 and 2 and ρ is the covariance between the numbers of recurrences due to modes 1 and 2, which is treated as a fixed parameter in our study.
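The data-generating scheme can be sketched in Python as follows. The text does not specify how the bivariate Poisson increments are constructed, so the sketch assumes the common trivariate-reduction construction (a shared Poisson component with mean ρ, truncated so that it never exceeds either marginal mean), assumes continuous uniform gap times, and follows Eq. (3.1) literally by evaluating the baseline functions at the gap ΔT. The function names and the seed are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_subject(rng, beta1, beta2, base1, base2, rho):
    """Return (Z_i, [(T_p, N1_p, N2_p), ...]) for one subject."""
    z = [1.0 if rng.random() < 0.5 else 0.0, rng.gauss(0.0, 0.5)]
    m = rng.randint(1, 5)                        # number of observation times M_i
    mult1 = math.exp(sum(b * x for b, x in zip(beta1, z)))
    mult2 = math.exp(sum(b * x for b, x in zip(beta2, z)))
    t, n1, n2 = 0.0, 0, 0
    visits = []
    for _ in range(m):
        gap = rng.uniform(1.0, 5.0)              # gap time between observations
        mean1, mean2 = base1(gap) * mult1, base2(gap) * mult2
        shared_mean = min(rho, mean1, mean2)     # covariance cannot exceed either mean
        shared = poisson(rng, shared_mean)
        n1 += shared + poisson(rng, mean1 - shared_mean)
        n2 += shared + poisson(rng, mean2 - shared_mean)
        t += gap
        visits.append((t, n1, n2))
    return z, visits

rng = random.Random(2024)
data = [simulate_subject(rng, [0.5, 1.0], [1.0, -0.5],
                         lambda t: t, lambda t: t * t, 0.5)
        for _ in range(50)]
```

Each element of `data` holds the covariate vector of one subject and the list of observation times with the cumulative counts for the two modes, which is the structure required by the estimation procedure of Section 2.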

We consider various forms of Λ₀₁(t) and Λ₀₂(t), specified as t, 2t and t², to generate panel count data. The sample size n is chosen to have three different values, that is, n = 50, 100 and 200. The process is repeated 10,000 times to assess the efficiency of the estimators. The absolute bias and mean square error (MSE) of the estimates of β₁ = {β₁₁, β₁₂} and β₂ = {β₂₁, β₂₂} are obtained. Various parameter values of β₁ and β₂ are considered. Since the results are similar, we present them only for three different combinations of β₁ and β₂ in Table 1. We choose ϵ = 10⁻⁵ as the tolerance in the convergence criterion of Eq. (2.13). The covariance ρ is set to 0.5 in our studies. The simulations are carried out using the R programming language.
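The summary measures reported in Table 1 can be computed from the replicated estimates as in the Python sketch below. We assume that the absolute bias is the absolute difference between the average estimate and the true value; the function name and the toy estimates are hypothetical.

```python
def absolute_bias_and_mse(estimates, true_value):
    """Absolute bias and mean square error of replicated estimates of one coefficient."""
    mean_estimate = sum(estimates) / len(estimates)
    bias = abs(mean_estimate - true_value)
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return bias, mse

# Hypothetical estimates of a coefficient whose true value is 0.5.
bias, mse = absolute_bias_and_mse([0.4, 0.6, 0.5, 0.7], 0.5)
```

In the simulation study, the list of estimates would contain one value per Monte Carlo replication for each of the four coefficients.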

Table 1.

Absolute Bias and MSE of the Estimators of Regression Coefficients for Various Parameter Combinations of (β₁₁, β₁₂) and (β₂₁, β₂₂).

True Baseline Functions	n	Bias(β₁₁)	MSE(β₁₁)	Bias(β₁₂)	MSE(β₁₂)	Bias(β₂₁)	MSE(β₂₁)	Bias(β₂₂)	MSE(β₂₂)
(β₁₁, β₁₂) = (0.5, 1), (β₂₁, β₂₂) = (1, −0.5)
Λ₀₁(t) = t, Λ₀₂(t) = t²	50	0.0271	0.0212	0.0401	0.0121	0.0245	0.1081	0.0231	0.0243
Λ₀₁(t) = t, Λ₀₂(t) = t²	100	0.0138	0.0109	0.0218	0.0099	0.0928	0.0856	0.0187	0.0208
Λ₀₁(t) = t, Λ₀₂(t) = t²	200	0.0098	0.0056	0.0102	0.0071	0.0723	0.0089	0.0077	0.0112
Λ₀₁(t) = t², Λ₀₂(t) = 2t	50	0.0172	0.0145	0.0454	0.0113	0.0211	0.1097	0.0278	0.0218
Λ₀₁(t) = t², Λ₀₂(t) = 2t	100	0.0104	0.0098	0.0152	0.0074	0.0141	0.0954	0.0124	0.0122
Λ₀₁(t) = t², Λ₀₂(t) = 2t	200	0.0076	0.0027	0.0098	0.0059	0.0769	0.0086	0.0076	0.0061
Λ₀₁(t) = t, Λ₀₂(t) = 2t	50	0.0121	0.0110	0.0278	0.0279	0.0515	0.0352	0.0182	0.0114
Λ₀₁(t) = t, Λ₀₂(t) = 2t	100	0.0091	0.0065	0.0131	0.0138	0.0061	0.0102	0.0112	0.0098
Λ₀₁(t) = t, Λ₀₂(t) = 2t	200	0.0052	0.0015	0.0076	0.0020	0.0057	0.0076	0.0069	0.0055
(β₁₁, β₁₂) = (1, 0.5), (β₂₁, β₂₂) = (0.5, 1)
Λ₀₁(t) = t, Λ₀₂(t) = t²	50	0.1218	0.0189	0.0372	0.0176	0.0322	0.1108	0.0275	0.0218
Λ₀₁(t) = t, Λ₀₂(t) = t²	100	0.1018	0.0105	0.0146	0.0088	0.0245	0.0815	0.0178	0.0111
Λ₀₁(t) = t, Λ₀₂(t) = t²	200	0.0923	0.0094	0.0131	0.0055	0.0116	0.0312	0.0093	0.0085
Λ₀₁(t) = t², Λ₀₂(t) = 2t	50	0.0218	0.0118	0.0271	0.0162	0.0215	0.0541	0.0354	0.0221
Λ₀₁(t) = t², Λ₀₂(t) = 2t	100	0.0119	0.0092	0.0198	0.0118	0.0161	0.0254	0.0219	0.0131
Λ₀₁(t) = t², Λ₀₂(t) = 2t	200	0.0103	0.0058	0.0119	0.0074	0.0093	0.0126	0.0141	0.0091
Λ₀₁(t) = t, Λ₀₂(t) = 2t	50	0.0548	0.0149	0.0213	0.0232	0.0154	0.0201	0.0234	0.0237
Λ₀₁(t) = t, Λ₀₂(t) = 2t	100	0.0393	0.0108	0.0104	0.0117	0.0084	0.0125	0.0142	0.0133
Λ₀₁(t) = t, Λ₀₂(t) = 2t	200	0.0712	0.0087	0.0091	0.0054	0.0047	0.0081	0.0078	0.0071
(β₁₁, β₁₂) = (1, −2), (β₂₁, β₂₂) = (−1, 2)
Λ₀₁(t) = t, Λ₀₂(t) = t²	50	0.0343	0.0221	0.0438	0.0156	0.0487	0.0265	0.0329	0.0307
Λ₀₁(t) = t, Λ₀₂(t) = t²	100	0.0287	0.0109	0.0213	0.0105	0.0354	0.0189	0.0188	0.0298
Λ₀₁(t) = t, Λ₀₂(t) = t²	200	0.0121	0.0089	0.0141	0.0081	0.0231	0.0098	0.0073	0.0222
Λ₀₁(t) = t², Λ₀₂(t) = 2t	50	0.0287	0.0261	0.0472	0.0198	0.0278	0.0438	0.0216	0.0212
Λ₀₁(t) = t², Λ₀₂(t) = 2t	100	0.0226	0.0191	0.0313	0.0131	0.0171	0.0327	0.0189	0.0165
Λ₀₁(t) = t², Λ₀₂(t) = 2t	200	0.0148	0.0111	0.0135	0.0093	0.0126	0.0146	0.0109	0.0098
Λ₀₁(t) = t, Λ₀₂(t) = 2t	50	0.0213	0.0181	0.0298	0.0255	0.0279	0.0401	0.0201	0.0171
Λ₀₁(t) = t, Λ₀₂(t) = 2t	100	0.0116	0.0108	0.0177	0.0117	0.0162	0.0284	0.0148	0.0132
Λ₀₁(t) = t, Λ₀₂(t) = 2t	200	0.0098	0.0074	0.0132	0.0078	0.0119	0.0172	0.0098	0.0105

From the simulation studies, we observe that the absolute bias and MSE of the estimators of the regression coefficients generally decrease as the sample size increases. This suggests that the proposed estimators perform well in finite samples, with small bias and variance. To corroborate the results further, we plot the true and the estimated baseline mean functions for all the parameter combinations considered in the study. The plots for the parameter combination (β₁₁, β₁₂) = (0.5, 1) and (β₂₁, β₂₂) = (1, −0.5), for both modes of recurrence, for n = 50, 100 and 200 and for all the choices of $Λ_{0 j} (t)$, are given in Figure 1. From Figure 1, we can see that the estimated baseline mean functions align with the assumed form when the true baseline function is t or 2t. A slight deviation is observed for large values of t when the true baseline function is t². For the other parameter combinations of (β₁₁, β₁₂) and (β₂₁, β₂₂), we obtain similar results, for which the plots are not included. The plots in Figure 1 support the conclusions drawn from Table 1.

Figure 1.

Plots of Baseline Mean Functions.

4. Data Analysis

In this section, we demonstrate the practical utility of the proposed inference procedures using two real-life data sets.

Table 2.

Estimates of the Regression Parameters with Corresponding Standard Error for Males.

Cause	Covariate	Coefficient	SE
BCC	DFMO	−0.3715	0.2331
	No of prior tumours	0.0685	0.0103
SCC	DFMO	−0.2408	0.0460
	No of prior tumours	0.1013	0.0292

Table 3.

Estimates of the Regression Parameters with Corresponding Standard Error for Females.

Cause	Covariate	Coefficient	SE
BCC	DFMO	−0.1671	0.0347
	No of prior tumours	0.0666	0.0456
SCC	DFMO	0.9557	0.0964
	No of prior tumours	0.1053	0.0458

4.1. Skin Cancer Data

The proposed estimation procedure is applied to real data on a skin cancer chemoprevention trial given by Sun and Zhao^[5] for illustration. The primary objective of this study was to evaluate the effectiveness of the drug difluoromethylornithine (DFMO) in reducing new skin cancers in a population with a history of non-melanoma skin cancers, basal cell carcinoma (BCC) and squamous cell carcinoma (SCC). The patients were randomly assigned into two groups, that is, a treatment group with oral DFMO at a daily dose of 0.5 g and a placebo group with a matching dosage. The data consist of the details of 290 patients with a history of non-melanoma skin cancers who were supposed to be assessed or observed every 6 months. However, the real observation and follow-up times differ from patient to patient. The data include the number of recurrences of two types of recurrent events, that is, BCC and SCC. We treat these two types of cancers as two modes of recurrence following Sreedevi and Sankaran.^[19]

In the data set, the number of observations on an individual varies from 1 to 17 and the time of observation varies from 12 to 1,766 days. For each individual, information on age, gender, DFMO status and the number of prior tumours is recorded. We consider all 290 patients in our analysis, comprising 174 male and 116 female patients. To obtain more explicit conclusions, we analyze the data on males and females separately, taking DFMO status and the number of prior tumours as covariates. Out of 290 patients, 147 were assigned to the placebo group and the remaining 143 were treated with oral DFMO. The number of prior tumours varies from 1 to 35. The estimates of the regression parameters with corresponding standard errors (SEs) for males are given in Table 2. The baseline cumulative cause-specific mean functions for males are plotted in Figure 2. The estimates of the regression parameters for females are given in Table 3, and the baseline cumulative cause-specific mean functions are plotted in Figure 3. In Figures 2 and 3, the solid line represents the baseline cumulative mean function for BCC and the dotted line represents the baseline cumulative mean function for SCC.

From Tables 2 and 3, we can see that the two modes of cancer recurrence, BCC and SCC, affect males and females in different ways. The estimates of the regression coefficients for the number of prior tumours are greater than zero for both males and females and for both modes, BCC and SCC. This implies that the recurrence rate increases with the number of prior tumours. Regarding the covariate effect of the drug DFMO, the mean ratio is less than unity for BCC in both males and females and for SCC in males, so we can say that DFMO decreases the recurrence rate in these cases; for SCC in females, the estimated coefficient is positive. We can note that a similar inference was made by Sun and Zhao.^[5] From Figure 2, we can see that the recurrence rate of BCC is higher in males than the recurrence rate of SCC up to 1,800 days (approximately) and from that point the recurrence rate of SCC crosses that of BCC, while Figure 3 shows that for females, the recurrence rate of SCC is always lower than that of BCC. The plots also show the difference in the recurrence patterns of the events due to BCC and SCC for males and females.
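The mean ratios referred to above are obtained by exponentiating the DFMO coefficients in Tables 2 and 3, as implied by Eq. (2.1). The short Python sketch below carries out this calculation; the dictionary layout and variable names are ours.

```python
import math

# DFMO coefficients copied from Tables 2 and 3.
dfmo_coefficients = {
    ("male", "BCC"): -0.3715,
    ("male", "SCC"): -0.2408,
    ("female", "BCC"): -0.1671,
    ("female", "SCC"): 0.9557,
}

# Mean ratio of the DFMO group relative to placebo, exp(coefficient).
mean_ratios = {key: math.exp(value) for key, value in dfmo_coefficients.items()}
```

For example, the ratio for BCC in males is approximately 0.69, whereas the ratio for SCC in females is approximately 2.60, the only one of the four that exceeds unity.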

Figure 2.

Baseline Cumulative Cause-Specific Mean Functions for Males.

4.2. Severity Score Data

We also consider the data from a clinical trial study given by Crowder^[22] to illustrate the procedures. The data consist of the treatment specification and medical conditions of 29 patients in a clinical study comparing two different treatment regimes. The specifics of the treatment and the medical condition are concealed. The severity scores of patients are listed on an increasing scale from 0 to 5. The recurrence times and the severity levels for each patient are available. We define the first mode of recurrence as recurrent episodes of severity 1 (which occurs most frequently) and the second mode of recurrence as recurrent episodes of severity 2 or above. Out of a total of 77 recurrences, 48 are due to recurrence mode 1 and the remaining 29 are due to recurrence mode 2. The longest follow-up time reported in the study is 28 days. To generate panel count data, we arbitrarily select the observation time points as 7, 14, 21 and 28 days, keeping the patient's original end-of-study time unchanged, and the number of recurrences between the observation times is noted. We consider the treatment regime (Z = 0 for treatment 1 and Z = 1 for treatment 2) and the patient's age as covariates in our analysis. The estimates of the regression parameters with corresponding SEs are given in Table 4.
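The conversion of exact recurrence times into panel counts can be sketched in Python as follows. The text leaves some details open, so the sketch assumes that each patient is observed at the grid points falling before the end of follow-up and once more at the patient's own end-of-study time; the function name and the example patient are hypothetical. The same function would be applied separately to the recurrence times of each mode.

```python
def to_panel_counts(event_times, end_of_study, grid=(7, 14, 21, 28)):
    """Cumulative event counts at the observation times of one patient."""
    observation_times = [g for g in grid if g < end_of_study] + [end_of_study]
    return [(t, sum(1 for e in event_times if e <= t)) for t in observation_times]

# Hypothetical patient followed for 17 days with events on days 3, 9, 10 and 16.
panel = to_panel_counts([3, 9, 10, 16], end_of_study=17)
```

For this patient the observation times are days 7, 14 and 17, with cumulative counts 1, 3 and 4; the exact event days are discarded, which mimics the panel count structure.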

Figure 3.

Baseline Cumulative Cause-Specific Mean Functions for Females.

Table 4.

Estimates of the Regression Parameters with Corresponding SE for Severity Score Data.

Cause	Covariate	Coefficient	SE
Mode 1	Treatment	0.0285	0.1307
	Age	−0.0096	0.0002
Mode 2	Treatment	0.4311	0.2956
	Age	−0.0001	0.0003

From the estimates of the regression coefficients, we can conclude that treatment 1 reduces the number of recurrences for both modes of recurrence, and more markedly for recurrences due to mode 2. Moreover, the estimated coefficients of age are close to zero for both modes, suggesting that age has little effect on the recurrence pattern. From the plots of the baseline cumulative cause-specific mean functions in Figure 4, we note that the recurrence rate due to mode 1 is greater than that due to mode 2 up to 9 days (approximately), and after that, the recurrence rate due to mode 2 dominates.

Figure 4.

Baseline Cumulative Cause-Specific Mean Functions for 29 Patients.

5. Concluding Remarks

Panel count data with multiple modes of recurrence often arise in periodic follow-up studies. In this article, we proposed a new proportional mean model to assess the covariate effect on the various modes of recurrence for such data. We note that panel count data with multiple modes of recurrence are different from multivariate panel count data, where recurrences of several related events are studied. We derived the estimators for the regression parameters and baseline cumulative mean functions due to each recurrence mode. A simple iterative procedure was developed for the estimation of parameters. The finite sample performance of the estimators in terms of bias and MSE was assessed through a Monte Carlo simulation study. The proposed procedures were applied to two real-life data sets.

In this article, we follow the assumption that the modes of recurrence are independent. In many real-life scenarios, we observe the modes of recurrence to be dependent on each other. Regression models for panel count data with multiple dependent recurrence modes are an area of future research. The estimation procedure we developed in this article considered the pseudo-likelihood function of panel count data. Maximum likelihood estimators in this context can be developed by extending the results of Wellner and Zhang,^[11] which involves a more complex iterative procedure. The regression analysis of panel count data with multiple recurrence modes can also be approached using generalized estimating equations. In many situations, rate functions of the underlying recurrent event process are more important than mean functions. Cause-specific rate functions developed by Sankaran et al.^[20] can be used to study panel count data when the study subjects are exposed to multiple recurrence modes. Empirical likelihood methods that combine the flexibility of nonparametric methods with the strength of likelihood-based inference can also be employed in analyzing panel count data with multiple recurrence modes.

Footnotes

Acknowledgements

The authors would like to express their gratitude to the associate editor and the unknown referees for their valuable comments and constructive suggestions, which have significantly enhanced the quality of this research article.

Data Availability

The source of the data set used for illustrative purposes is mentioned in the manuscript.

Declaration of Conflicting Interest

The authors declare no conflict of financial or non-financial interests that are directly or indirectly related to this research work.

Ethical Statement

The submitted work is original and has not been published elsewhere.

Funding

The authors received no financial support for the research, authorship and/or publication of this article.

ORCID iD

Sreedevi E. P.

APPENDIX

The asymptotic properties of the proposed estimators can be derived using results from empirical process theory. Notably, Zhang^[10] proved some results about the asymptotic behaviour of the semiparametric pseudo maximum likelihood estimators when only a single mode of recurrence is observed and later Wellner and Zhang^[11] modified the results. When recurrence due to multiple modes is observed, Sreedevi and Sankaran^[19] studied the asymptotic properties of cause-specific mean functions. We extend the results of Wellner and Zhang^[8] into a multiple-mode scenario and generalize the results discussed by Sreedevi and Sankaran^[19] to incorporate covariates. We establish the asymptotic normality and strong consistency of the proposed estimators.

As discussed earlier, we estimate $β_{j}$ and $Λ_{0j}$, for j = 1, 2, …, k, as the maximizers of the pseudo-likelihood function given in Eq. (2.4). We assume that the estimators as well as the true values of the parameters lie in the parameter space $R \times F_{j}$, where $R \subset \mathbb{R}^{d}$ is a bounded convex set and $F_{j}$ is the class of functions defined as

$F_{j} \equiv \{Λ_{j}(\cdot) : [0, \infty) \to [0, \infty) \mid Λ_{j}(\cdot)$ is monotone non-decreasing with $Λ_{j}(0) = 0\}$, $j = 1, 2, \dots, k$.
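Eq. (2.4) is not restated in this appendix. As a rough illustration only, and assuming that the pseudo-likelihood has the Poisson working form used by Zhang^[10] (the cumulative counts at the observation times are treated as independent Poisson variables with mean $Λ_{0j}(t) \exp(β_{j}' Z)$), the log pseudo-likelihood for a single mode j could be evaluated as in the following Python sketch. The function name, the data layout and the handling of zero means are hypothetical choices made for this example and are not part of the proposed procedure.

```python
import math


def log_pseudo_likelihood(beta, baseline, subjects):
    """Poisson working log pseudo-likelihood for one recurrence mode.

    beta     : list of regression coefficients (assumed layout)
    baseline : callable t -> baseline cumulative mean; should be
               non-decreasing with baseline(0) == 0, as in the class F_j
    subjects : list of (z, times, counts), where counts[p] is the cumulative
               number of recurrences of this mode observed by times[p]
    The additive constant -log(count!) is dropped because it does not
    depend on the parameters.
    """
    total = 0.0
    for z, times, counts in subjects:
        linear_predictor = sum(b * x for b, x in zip(beta, z))
        for t, n in zip(times, counts):
            mean = baseline(t) * math.exp(linear_predictor)
            if n > 0:
                if mean <= 0.0:
                    # A positive count is impossible under a zero mean.
                    return -math.inf
                total += n * math.log(mean)
            total -= mean
    return total


# Hypothetical toy data: one subject, one covariate, one observation time.
toy_subjects = [([0.0], [1.0], [2])]
toy_value = log_pseudo_likelihood([0.0], lambda t: 2.0 * t, toy_subjects)
```

In this toy example the working mean at the observation time equals the observed count, so the value is larger than under, say, the baseline `lambda t: t`. A full estimation routine would additionally have to maximize over the monotone class $F_{j}$, which this sketch does not attempt.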

To prove the asymptotic properties of the estimators, we define the following. Let $\mathcal{B}_{d}$ and $\mathcal{B}$ denote the collections of Borel sets in $\mathbb{R}^{d}$ and $\mathbb{R}$, respectively. Let $H(\cdot)$ be the distribution of the covariate vector $Z$, let $τ = \max(t)$, and define $\mathcal{B}_{1}[0, τ] = \{B \cap [0, τ] : B \in \mathcal{B}\}$. Now we define the measures $ψ_{j}$, $η_{j}$ and $θ_{j}$ as follows. For $B, B_{1} \in \mathcal{B}_{1}[0, τ]$ and $C \in \mathcal{B}_{d}$, define

$η_{j}(B \times C) = \int_{C} \sum_{m=1}^{\infty} P(M = m, J = j \mid Z = z) \times \sum_{p=1}^{m} P(T_{m,p} \in B \mid M = m, Z = z) \, dH(z).$

A similar measure was defined by Schick and Yu^[23] to study the consistency of the likelihood estimators for mixed case interval-censored data. Define the $L_{2}$ metric $d_{1}$ on the parameter space $R \times F_{j}$ as

$d_{1}((β_{j1}, Λ_{0j1}), (β_{j2}, Λ_{0j2})) = \left\{ |β_{j1} - β_{j2}|^{2} + ‖Λ_{0j1} - Λ_{0j2}‖_{L_{2}(ψ_{j})}^{2} \right\}^{1/2},$

where $(β_{j1}, Λ_{0j1})$ and $(β_{j2}, Λ_{0j2})$ are elements of the parameter space $R \times F_{j}$ and $ψ_{j}(B) = η_{j}(B \times \mathbb{R}^{d})$.
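To make the metric concrete, the following Python sketch computes an empirical analogue of $d_{1}$. It rests on two assumptions that are not in the text: $ψ_{j}$ is replaced by the empirical measure that puts mass 1/n on every observation time of each of the n subjects, and the mode indicator J appearing in $η_{j}$ is ignored. The function name and argument layout are hypothetical.

```python
import math


def empirical_d1(beta1, baseline1, beta2, baseline2, subject_times):
    """Empirical analogue of the L2 metric d_1 between two parameter pairs.

    beta1, beta2         : lists of regression coefficients of equal length
    baseline1, baseline2 : callables t -> baseline cumulative mean
    subject_times        : one list of observation times per subject
    The L2(psi_j) norm is approximated by summing the squared difference of
    the baselines over all observation times and dividing by the number of
    subjects (an assumption made for this illustration).
    """
    n_subjects = len(subject_times)
    squared_beta = sum((a - b) ** 2 for a, b in zip(beta1, beta2))
    squared_baseline = sum(
        (baseline1(t) - baseline2(t)) ** 2
        for times in subject_times
        for t in times
    ) / n_subjects
    return math.sqrt(squared_beta + squared_baseline)


# Hypothetical example: two subjects observed at {1} and {1, 2}.
example_distance = empirical_d1(
    [1.0, 0.0], lambda t: t,
    [0.0, 0.0], lambda t: 2.0 * t,
    [[1.0], [1.0, 2.0]],
)  # 2.0
```

The division by the number of subjects rather than by the number of observation times reflects that $ψ_{j}$ accumulates mass over all observation times of a subject and is therefore not a probability measure in general.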

The asymptotic results are established under the following regularity conditions.

C1: The true values of $β_{j}$ and $Λ_{0j}$ lie in $R^{0} \times F_{j}$, where $R^{0}$ is the interior of $R$.

C2: The observation times $T_{M,p}$ are random variables taking values in the bounded interval $[0, τ]$ for some $τ \in (0, \infty)$, for all p = 1, 2, …, M and M = 1, 2, …. Also, the measure $ψ_{j} \times H$ on $([0, τ] \times \mathbb{R}^{d}, \mathcal{B}_{1}[0, τ] \times \mathcal{B}_{d})$ is absolutely continuous with respect to $η_{j}$ for j = 1, 2, …, k, and $E(M) < \infty$.

C3: For each true baseline cumulative cause-specific mean function $Λ_{0j}$, j = 1, 2, …, k, there exists an $I_{j} \in (0, \infty)$ such that $Λ_{0j}(τ) \le I_{j}$.

C4: The function $I_{0j}$, defined as $I_{0j}(X) \equiv \sum_{p=1}^{M} N_{M,p} \log(N_{M,p})$, satisfies $P I_{0j}(X) < \infty$.

C5: The support of $H$, the distribution of the covariate vector $Z$, is a bounded set in $\mathbb{R}^{d}$.

C6: For all $a \in \mathbb{R}^{d}$ with $a \neq 0$ and all $c \in \mathbb{R}$, $P(a' Z \neq c) > 0$.
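Condition C6 states that the covariate distribution is not concentrated on any hyperplane. Since $a' Z = c$ holds almost surely for some c exactly when $\mathrm{Var}(a' Z) = 0$, and the bounded support in C5 guarantees finite second moments, C6 is equivalent to the covariance matrix of Z being positive definite. The Python sketch below is a simple sample-based diagnostic of this condition and plays no role in the proofs: it checks whether the centred covariate matrix has full column rank. The function name and the tolerance are hypothetical choices.

```python
def sample_satisfies_c6(rows, tol=1e-10):
    """Return True if the sampled covariate vectors do not all lie in one hyperplane.

    rows : list of covariate vectors (lists of equal length d)
    The check computes the rank of the centred data matrix by Gaussian
    elimination with partial pivoting; rank d means no a != 0 and c satisfy
    a'z = c for every sampled z.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    m = [[r[k] - means[k] for k in range(d)] for r in rows]
    rank = 0
    for col in range(d):
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n), key=lambda i: abs(m[i][col]), default=None)
        if pivot is None or abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n):
            factor = m[i][col] / m[rank][col]
            for k in range(col, d):
                m[i][k] -= factor * m[rank][k]
        rank += 1
    return rank == d


# Hypothetical samples: the first spans the plane, the second lies on z2 = 2 * z1.
spread_sample = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
collinear_sample = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
```

With these samples, `sample_satisfies_c6(spread_sample)` returns `True` and `sample_satisfies_c6(collinear_sample)` returns `False`. A failed check in real data would typically point to redundant covariates, such as a full set of dummy variables for a categorical factor.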

References

1. Kalbfleisch and Lawless JF. The analysis of panel data under a Markov assumption. J Am Stat Assoc 1985; 80: 863–871.

2. Chiou, Huang, Xu, et al. Semiparametric regression analysis of panel count data: A practical review. Int Stat Rev 2019; 87: 24–43.

3. Lawless and Zhan. Analysis of interval-grouped recurrent-event data using piecewise constant rate functions. Can J Stat 1998; 26: 549–565.

4. Thall PF. Mixed Poisson likelihood regression models for longitudinal interval count data. Biometrics 1988; 44: 197–209.

5. Sun and Zhao. Statistical analysis of panel count data. New York: Springer, 2013.

6. Sun. The statistical analysis of interval-censored failure time data. New York: Springer, 2006.

7. Sun and Kalbfleisch. Estimation of the mean function of point processes based on panel count data. Stat Sin 1995; 5: 279–289.

8. Wellner and Zhang. Two estimators of the mean of a counting process with panel count data. Ann Stat 2000; 28: 779–814.

9. Thall and Lachin JM. Analysis of recurrent events: Nonparametric methods for random-interval count data. J Am Stat Assoc 1988; 83: 339–347.

10. Zhang. A semiparametric pseudolikelihood estimation method for panel count data. Biometrika 2002; 89: 39–48.

11. Wellner and Zhang. Two likelihood-based semiparametric estimation methods for panel count data with covariates. Ann Stat 2007; 35: 2106–2142.

12. , Sun and Wei LJ. Regression parameter estimation from panel counts. Scand J Stat 2003; 30: 25–43.

13. Sun and Wei. Regression analysis of panel count data with covariate-dependent observation and censoring times. J R Stat Soc Series B Stat Methodol 2000; 62: 293–302.

14. Huang, Wang MC and Zhang Y. Analysing panel count data with informative observation times. Biometrika 2006; 93: 763–775.

15. Zhao and Tong. Semiparametric regression analysis of panel count data with informative observation times. Comput Stat Data Anal 2011; 55: 291–300.

16. Zhao, Zhang, Zhao, et al. A nonparametric regression model for panel count data analysis. Stat Sin 2019; 29: 809–826.

17. , Tong, Sun, et al. Regression analysis of multivariate panel count data. Biostatistics 2008; 9: 234–248.

18. Zhang, Zhao, Sun, et al. Regression analysis of multivariate panel count data with an informative observation process. J Multivar Anal 2013; 119: 71–80.

19. Sreedevi and Sankaran PG. Nonparametric inference for panel count data with competing risks. J Appl Stat 2021; 48: 3102–3115.

20. Sankaran, Ashlin and Sreedevi EP. Cause specific rate functions for panel count data with multiple modes of recurrence. J Indian Stat Assoc 2021; 58: 175–194.

21. Balakrishnan and Zhao. A class of multi-sample nonparametric tests for panel count data. Ann Inst Stat Math 2011; 63: 135–156.

22. Crowder MJ. Multivariate survival analysis and competing risks. Boca Raton, FL: CRC Press, 2012.

23. Schick and Yu. Consistency of the GMLE with mixed case interval-censored data. Scand J Stat 2000; 27: 45–55.

24. Huang. Efficient estimation for the proportional hazards model with interval censoring. Ann Stat 1996; 24: 540–568.

25. Sreedevi, Sankaran and Dewan. A semi-parametric regression model for current status competing risks data. J Indian Stat Assoc 2017; 55: 35–61.
