A permutation test based on the restricted mean survival time for comparison of net survival distributions in non-proportional excess hazard settings

Abstract

Net survival is used in epidemiological studies to assess excess mortality due to a given disease when causes of death are unreliable. By correcting for the general population mortality, it allows comparisons between regions or periods and thus evaluation of health policies. The Pohar-Perme non-parametric estimator of net survival has been recently proposed, soon followed by an appropriate log-rank-type test. However, log-rank tests are known to be suboptimal in non-proportional settings (e.g. crossing of the hazard functions). In classical survival analysis, one solution is to compare the restricted mean survival times. A difference in restricted mean survival time represents a life benefit or loss over the studied period. In the present article, the restricted mean net survival time was used to derive a specific test statistic to compare net survivals in proportional and non-proportional hazards settings. The new test was generalized to more than two groups and to stratified analysis. The test performance was assessed in a simulation study and compared to that of the log-rank-type test, and its use was illustrated on data from a population-based colorectal cancer registry. The new test for net survival comparisons proved robust to non-proportionality and performed well in proportional hazards situations. Furthermore, it is also suited to the classical survival framework.

Keywords

Survival analysis, net survival, restricted mean survival time, excess hazard, non-proportional hazards, hypothesis testing, cancer, life benefit

1 Introduction

Cancer registries collect population-based data that are critical for evaluating and improving the quality of cancer care. In the analysis of registry data, one key indicator is survival; it provides useful information on cancer control at the scale of a whole healthcare system. As cancer patients can die from causes other than cancer itself, which is becoming increasingly frequent due to progress in therapy, several approaches have been proposed to adjust survival data for competing causes of death. This led to the concept of net survival, the survival that would be observed if only deaths from cancer were considered.¹ However, in cancer registries, the exact cause of death is often unavailable or unreliable; thus, particular methods have been developed to estimate cancer net survival.² These methods rely on the assumption that, at an individual level, the observed instantaneous hazard of death ( $λ_{O}$ ) is the sum of a population hazard that results from causes other than the cancer under study ( $λ_{P}$ ) and an excess hazard specific to this cancer ( $λ_{E}$ ). Such a decomposition of the observed hazard is valid when, conditionally on a known set of covariates, the latent time to death due to cancer and the latent time to death due to any other possible cause are independent. The population hazard can be obtained from mortality tables of the general population that include socio-demographic covariates identical to those of cancer patients under study (usually age at diagnosis, sex, year of diagnosis, region or district), assuming that cancer accounts for a negligible fraction of deaths.

Due to its intrinsic properties, net survival allows comparisons of cancer burden between subpopulations of patients or between periods while correcting for differences in baseline population mortalities. In 2012, Pohar-Perme et al.³ proposed a non-parametric and consistent estimator of net survival. Its use has been advocated to estimate net survival in large-scale studies with significant non-homogeneity in population hazards of death resulting, for instance, from different age structures [see references 4–6]. More recently, Grafféo et al.⁷ introduced a specific log-rank-type test to compare two or more net survival distributions, with a stratified version to control for categorical covariates that have different distributions in the groups to be compared and are known to affect survival. This test is currently implemented in the relsurv R package for relative survival analysis.⁸ However, it is known that although the usual log-rank test is optimal under the assumption of proportional hazards (PH), its power is poor when this assumption does not hold.⁹ Indeed, by construction of the log-rank statistic, early positive differences between groups can be negated by later negative differences, leading to a non-significant result. The same limitation is expected with the test adapted to net survival by Grafféo et al.,⁷ even though non-proportional effects of some covariates have been uncovered in population-based cancer studies [see e.g. Giorgi et al.¹⁰ and Remontet et al.¹¹]. Several statistics have been proposed as alternatives to the usual log-rank test for such a setting in the field of classical survival [see references 9 and 12–14 for examples]. However, their direct implementation in the field of net survival is not straightforward because registries usually do not have the distribution of the times to death specifically due to a given cancer. To address this problem, the concept of restricted mean survival time (RMST; i.e. the average life expectancy over a given period) seems much more appealing because: (i) the RMST can be directly obtained from the survival curve by integration; (ii) RMST-based tests have already been proposed to compare survival distributions and found robust in non-PH settings;¹⁵,¹⁶ (iii) in net survival, the restricted mean net-survival time (RMNST) and its meaningful complement, the restricted mean net time loss (RMNTL; i.e. life years lost due to cancer over a given period of time), are attractive measures at population level, which provide an excellent alternative to the hazard ratio to compare net survival distributions in settings with non-proportional excess hazards.

The present article proposes a permutation test to compare distributions of net survival that is robust to non-proportionality of the excess hazards of death from cancer or another disease. This test is based on a statistic derived from the RMNST. In this article, Sections 2 and 3 present, respectively, the Pohar-Perme estimator of net survival and the RMNST, and discuss their value and interpretation. Section 4 presents the absolute difference RMNST statistic (AD-RMNST) between two groups and introduces the AD-RMNST test based on permutations, the extension of the test to more than two groups, and its stratified version. Section 5 assesses the performance of the test through an extensive simulation study. Section 6 provides an application to colorectal cancer data. Section 7 is a brief discussion of the achievements and future developments of the newly proposed AD-RMNST permutation test.

2 Estimation of net survival

Consider registry data on n patients, with $T_{i}$ $(i = 1, \dots, n)$ denoting the follow-up time of patient i and $δ_{i}$ the vital status indicator ( $δ_{i} = 1$ if patient i dies from any cause, cancer included, at the end of follow-up, and $δ_{i} = 0$ otherwise). For each patient, the at-risk indicator at any time t is $Y_{i} (t)$ ( $Y_{i} (t) = 1$ if $T_{i} \geq t$ and 0 otherwise) and the event counting process is $N_{i} (t)$ ( $N_{i} (t) = 1$ if $T_{i} \leq t$ and $δ_{i} = 1$ , and 0 otherwise).

Let $T_{E, i}$ denote the latent hypothetical time to death of patient i due to the cancer under study; the individual excess hazard $λ_{E, i} (t)$ is then

λ_{E, i} (t) = \lim_{Δ t \to 0} \frac{P (t < T_{E, i} \leq t + Δ t | T_{E, i} > t)}{Δ t}

and is noted $Λ_{E, i} (t) = \int_{0}^{t} λ_{E, i} (u) d u$ when cumulated over period $[0, t]$ . The individual net survival associated with this excess hazard is then $S_{E, i} (t) = \exp (- Λ_{E, i} (t))$ . At a cohort level, net survival may be defined as $S_{E} (t) = E (S_{E, i} (t))$ , which may also be written as a function of a cohort cumulated excess hazard:

S_{E} (t) = \exp (- Λ_{E} (t))

The cumulative excess hazard over time $Λ_{E} (t)$ may be estimated using the Pohar-Perme estimator.³ For this estimation, the authors used inverse probability weighting to correct the Ederer II estimator¹⁷ for informative censoring due to covariates that affect both deaths from cancer and deaths from other causes. The weights are the inverses of the individual probabilities of not having died from causes other than the studied cancer (i.e. the expected population survival); they are calculated from the individual population hazard $λ_{P, i} (t)$ , obtained from population mortality tables according to individual demographic data (classically sex, age, year of birth, and region). Finally, the Pohar-Perme estimator is defined as

\hat{Λ}_{E} (t) = \int_{0}^{t} \frac{\sum_{i = 1}^{n} d N_{i}^{w} (u)}{\sum_{i = 1}^{n} Y_{i}^{w} (u)} - \int_{0}^{t} \frac{\sum_{i = 1}^{n} Y_{i}^{w} (u) d Λ_{P, i} (u)}{\sum_{i = 1}^{n} Y_{i}^{w} (u)}

where $d N_{i}^{w} (t) = d N_{i} (t) / S_{P, i} (t)$ and $Y_{i}^{w} (t) = Y_{i} (t) / S_{P, i} (t)$ are, respectively, the increment of the event counting process and the at-risk indicator of individual i, weighted by the inverse of their expected population survival $S_{P, i} (t) = \exp (- Λ_{P, i} (t))$ , with $Λ_{P, i} (t) = \int_{0}^{t} λ_{P, i} (u) d u$ .
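The structure of this estimator can be illustrated with a minimal Python sketch. It rests on a strong simplifying assumption that is not made in the article: each patient is given a constant population hazard `lam_p[i]`, so that the expected population survival is exp(-lam_p[i] t), whereas real mortality tables vary with age and calendar year. The function name, its arguments, and the midpoint-rule grid are hypothetical choices made for illustration only.

```python
import math

def pohar_perme_cumhaz(times, events, lam_p, t, n_grid=2000):
    """Approximate the Pohar-Perme cumulative excess hazard at time t.

    times  : follow-up times T_i
    events : 1 if death from any cause, 0 if censored
    lam_p  : constant population hazard per patient (illustrative assumption)
    """
    n = len(times)

    def s_pop(i, u):
        # Expected population survival of patient i at time u.
        return math.exp(-lam_p[i] * u)

    # First term: weighted jumps at the distinct observed death times u <= t.
    death_times = sorted({times[i] for i in range(n)
                          if events[i] == 1 and times[i] <= t})
    observed = 0.0
    for u in death_times:
        d_w = sum(1.0 / s_pop(i, u) for i in range(n)
                  if events[i] == 1 and times[i] == u)
        y_w = sum(1.0 / s_pop(i, u) for i in range(n) if times[i] >= u)
        observed += d_w / y_w

    # Second term: weighted mean population hazard among patients at risk,
    # integrated over [0, t] with a midpoint rule.
    expected = 0.0
    h = t / n_grid
    for g in range(n_grid):
        u = (g + 0.5) * h
        at_risk = [i for i in range(n) if times[i] >= u]
        if not at_risk:
            break
        y_w = sum(1.0 / s_pop(i, u) for i in at_risk)
        expected += h * sum(lam_p[i] / s_pop(i, u) for i in at_risk) / y_w

    return observed - expected
```

When all population hazards are zero, the weights equal 1 and the sketch reduces to the Nelson-Aalen estimator of the cumulative hazard, which gives a simple sanity check.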

3 The RMNST

In survival analysis, one parameter of interest is the mean survival time. It is defined as the mathematical expectation of $T_{O}$ , the observed survival time, and may be written as the integral of the observed survival function over time: $E (T_{O}) = \int_{0}^{\infty} S_{O} (u) d u$ . However, due to limited follow-up and censoring, the tail of the survival time distribution is most often poorly determined. In this case, the restricted mean survival time $RMST (τ)$ may be used to estimate the mean survival over a finite period $[0, τ]$ where τ may be, for instance, the maximum administrative follow-up time of patients in a registry or any other clinically relevant time point. $RMST (τ)$ is defined as the mathematical expectation of the observed survival time $T_{O}$ restricted to period $[0, τ]$ , with $τ > 0$ . It corresponds to the integral of the survival function over time interval $[0, τ]$ ; thus, $RMST (τ) = E (min (T_{O}, τ)) = \int_{0}^{τ} S_{O} (u) d u$ (the latter equality is proven by integration by parts). $RMST (τ)$ may be easily visualized as the area under the observed survival curve limited to time τ and can be interpreted as the life expectancy over period $[0, τ]$ . A complementary measure is the restricted mean time loss $RMTL (τ) = \int_{0}^{τ} (1 - S_{O} (u)) d u = τ - RMST (τ)$ , corresponding to the area above the survival curve. It is interpreted as the average number of life years lost over the studied period with respect to a hypothetical immortal cohort.

Extending these concepts to the net survival setting implies using $T_{E}$ , the time to death from cancer, instead of $T_{O}$ . The restricted mean net survival time over $[0, τ]$ , $RMNST (τ)$ , then reads

RMNST (τ) = E (min (T_{E}, τ)) = \int_{0}^{τ} S_{E} (u) d u

The reader should keep in mind that $T_{E}$ is latent (not directly accessible) while $S_{E}$ can be estimated using, for instance, the Pohar-Perme estimator; hence, the integral expression is preferred for evaluation of $RMNST (τ)$ . Similarly to $RMST (τ)$ , $RMNST (τ)$ may be readily visualized as the area under the curve that plots the net survival probability as a function of time (the net-survival curve) truncated at time τ (Figure 1(a)). $RMNST (τ)$ corresponds to a hypothetical life expectancy if patients could die from cancer only (all other causes of death having been eliminated). Likewise, the restricted mean net time loss $RMNTL (τ) = \int_{0}^{τ} (1 - S_{E} (u)) d u = τ - RMNST (τ)$ is viewed as the area above the net survival curve and corresponds to the average number of years of life lost due to cancer in comparison with an immortal cohort over the period of interest (Figure 1(a)).
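Because non-parametric survival estimates are step functions, the integral reduces to a sum of rectangle areas. The following Python sketch shows this computation; it assumes the estimated curve is supplied as a list of jump times with the survival value reached at each jump (a hypothetical input format), and that the curve equals 1 before the first jump.

```python
def restricted_mean(step_times, step_surv, tau):
    """Area under a right-continuous step survival curve on [0, tau].

    step_times : increasing jump times of the estimated curve
    step_surv  : survival value taken from each jump time onwards
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(step_times, step_surv):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to the next jump
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)  # last rectangle, truncated at tau

def restricted_mean_loss(step_times, step_surv, tau):
    """Complementary area above the curve: tau minus the restricted mean."""
    return tau - restricted_mean(step_times, step_surv, tau)
```

For a curve that drops to 0.5 at time 1 and to 0.25 at time 2, `restricted_mean([1, 2], [0.5, 0.25], 3)` sums the rectangles 1 + 0.5 + 0.25 and returns 1.75.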

Figure 1.

(a) Illustration of a net survival function of time with the respective restricted mean net survival time (RMNST, shaded area under the curve) and restricted mean net time loss (RMNTL, non-shaded area above the curve). (b) Net survival functions for two groups with the corresponding absolute-difference restricted mean net survival time (AD-RMNST, shaded area). (c) Net survival functions for three groups with the corresponding AD-RMNST (shaded area).

Finally, when comparing the net survival between two groups, the difference in RMNSTs, $Δ RMNST (τ) = RMNST_{1} (τ) - RMNST_{2} (τ) = RMNTL_{2} (τ) - RMNTL_{1} (τ)$ , is a convenient global indicator that reflects the advantage of belonging to one group versus the other in terms of life years lost due to cancer.

4 The absolute difference RMNST test

4.1 The absolute difference RMNST statistic

$Δ RMNST (τ)$ is an epidemiologically relevant indicator that allows a straightforward comparison of cancer burden between two groups on a time scale; however, it does not always provide an appropriate representation of survival time differences. Indeed, two groups with distinct net survival time distributions may have identical expected net survival times; i.e. no difference in $RMNST (τ)$ . Because $Δ RMNST (τ)$ corresponds to the algebraic area between two net survival curves and because these curves may sometimes cross each other (the underlying excess hazards also cross), positive differences before the intersection point may be offset by negative differences after the intersection point. In such a case, testing the hypothesis of no difference in net survival distribution between two patient groups by testing for a difference in $RMNST (τ)$ is not possible. The restricted mean net-survival absolute difference was then conceived to account for all differences between net survival distributions as an absolute value

AD-RMNST (τ) = \int_{0}^{τ} | S_{E, 2} (t) - S_{E, 1} (t) | d t

It goes without saying that this formulation conceals the epidemiological significance of the statistic in case of survival curve crossing. This significance is kept when there is no crossing, even when the underlying hazards are not proportional (which is the most frequent setting). On a time versus survival plot, the $AD - RMNST (τ)$ statistic is the geometric area between the net survival curves (Figure 1(b)).

This statistic may be extended to any number $K \geq 2$ of groups by considering the area of the envelope formed by the corresponding K net survival curves (Figure 1(c))

AD-RMNST (τ) = \int_{0}^{τ} \left[ \max_{1 \leq k \leq K} S_{E, k} (t) - \min_{1 \leq k \leq K} S_{E, k} (t) \right] d t
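The envelope area can be computed exactly when the K curves are step functions, since the gap between the upper and lower envelopes is constant between consecutive jump times. The Python sketch below illustrates this; the input format (one pair of jump times and survival values per group) and the function names are assumptions made for the example.

```python
def step_value(step_times, step_surv, t):
    """Value at time t of a right-continuous step curve that starts at 1."""
    value = 1.0
    for u, s in zip(step_times, step_surv):
        if u > t:
            break
        value = s
    return value

def ad_rmnst(curves, tau):
    """Area between the upper and lower envelopes of K step curves on [0, tau].

    curves : list of (step_times, step_surv) pairs, one per group
    """
    # Pool all jump times inside (0, tau); every curve is flat between them.
    knots = sorted({0.0, tau} | {u for times, _ in curves
                                 for u in times if 0.0 < u < tau})
    area = 0.0
    for left, right in zip(knots, knots[1:]):
        values = [step_value(times, surv, left) for times, surv in curves]
        area += (max(values) - min(values)) * (right - left)
    return area
```

With two crossing curves, one dropping to 0.4 at time 1 and the other to 0.2 at time 2, the signed difference over [0, 4] is 0.6 - 0.4 = 0.2, whereas `ad_rmnst` accumulates both areas and gives approximately 1.0.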

Testing for the equality of all K net survival time distributions over a follow-up period limited to time τ amounts to testing for the following null hypothesis

(H_{0}) : \forall t \in [0, τ], S_{E, 1} (t) = \dots = S_{E, K} (t)

which is equivalent to

(H_{0}) : \forall t \in [0, τ], \max_{1 \leq k \leq K} S_{E, k} (t) = \min_{1 \leq k \leq K} S_{E, k} (t)

and, using the newly proposed test statistic, to

(H_{0}) : AD-RMNST (τ) = 0

4.2 A permutation test for the absolute difference RMNST statistic

Testing whether the calculated $AD-RMNST (τ)$ is significantly different from $0$ requires knowing its distribution under the null hypothesis of no group effect on net survival. A simple alternative to the analytical derivation of such a distribution may be based on a numerical permutation approach.¹⁸,¹⁹

Because, under the null hypothesis, group indices k are exchangeable (there is supposedly no group effect on survival), computing the $AD-RMNST (τ)$ statistic under all permutations of group indices k would provide the distribution of $AD-RMNST (τ)$ under the null. However, this approach becomes computationally infeasible as the number of patients increases. It is then proposed to randomly permute the indices a ‘large’ number of times in order to approximate the null distribution of the statistic of interest.¹⁸,¹⁹

Let B be the number of permutations, drawn at random (with replacement) among all possible permutations of group indices k; let $AD-RMNST_{b} (τ)$ be the value of the test statistic for the $b^{th}$ permuted dataset, $AD-RMNST_{obs} (τ)$ the value of the test statistic for the real dataset, and M the number of permuted datasets where $AD-RMNST_{b} (τ) \geq AD-RMNST_{obs} (τ)$ . Then, according to Phipson and Smyth,²⁰ the empirical p value of the permutation test is estimated at

p = (M + 1) / (B + 1) .
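The permutation scheme itself is independent of the statistic, as the following Python sketch suggests. The function `permutation_p_value` accepts any statistic that grows with the separation between groups; `abs_mean_gap` is a toy stand-in used only to make the example runnable and is not the AD-RMNST statistic. All names and default values here are hypothetical.

```python
import random

def permutation_p_value(values, labels, statistic, n_perm=999, seed=0):
    """Monte Carlo permutation p value, p = (M + 1) / (B + 1).

    statistic(values, labels) must return a number that grows with the
    separation between groups (for instance an AD-RMNST-type area).
    """
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    exceed = 0  # M: permuted statistics at least as large as the observed one
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reassignment of the group labels
        if statistic(values, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

def abs_mean_gap(values, labels):
    """Toy stand-in statistic (not the AD-RMNST): |mean(group 0) - mean(group 1)|."""
    g0 = [v for v, g in zip(values, labels) if g == 0]
    g1 = [v for v, g in zip(values, labels) if g == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))
```

Adding 1 to both numerator and denominator counts the observed dataset as one of the permutations, so the estimated p value can never be exactly zero; its smallest possible value is 1 / (B + 1).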

4.3 Stratified version of the AD-RMNST permutation test

A stratified version of the AD-RMNST permutation test is proposed for adjusting for the effect of categorical covariates which present different distributions in the groups to be compared and which are known to affect the survival. A partition of the covariate space is defined by $(I_{1}, \dots, I_{S})$ , where $(I_{s})_{1 \leq s \leq S}$ corresponds to the stratum of one or more covariates. The issue is to test

(H_{0}) : \forall s \in \{1, \dots, S\}, \forall t \in [0, τ], S_{E, 1, s} (t) = \dots = S_{E, K, s} (t)

\Leftrightarrow \forall s \in \{1, \dots, S\}, AD-RMNST (τ)_{s} = 0

The stratified version of the statistic consists in a weighted sum over all strata of $AD-RMNST (τ)_{s}$ with weights corresponding to the proportions $n_{s} / n$ of patients initially present in each stratum. The statistic reads

AD-RMNST (τ) = \frac{1}{n} \sum_{s = 1}^{S} n_{s} AD-RMNST (τ)_{s}

Its distribution under the null hypothesis may be also approached by permutations, as in the non-stratified case, the only difference being that the permutations of group indices k are carried out within each stratum $I_{s}$ and never across strata. The empirical p value is again estimated as $p = (M + 1) / (B + 1)$ (see section 4.2).
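The two ingredients of the stratified procedure, the within-stratum permutation and the weighted sum, can be sketched in Python as follows. Function names and the input layout (one group label and one stratum label per patient) are assumptions made for illustration.

```python
import random

def permute_within_strata(labels, strata, rng):
    """Return group labels shuffled inside each stratum, never across strata."""
    permuted = list(labels)
    positions = {}
    for idx, s in enumerate(strata):
        positions.setdefault(s, []).append(idx)
    for idx_list in positions.values():
        block = [labels[i] for i in idx_list]
        rng.shuffle(block)  # permutation restricted to this stratum
        for i, g in zip(idx_list, block):
            permuted[i] = g
    return permuted

def stratified_statistic(per_stratum_stats, stratum_sizes):
    """Weighted sum of stratum-specific statistics with weights n_s / n."""
    n = sum(stratum_sizes)
    return sum(n_s * stat
               for n_s, stat in zip(stratum_sizes, per_stratum_stats)) / n
```

For example, stratum statistics of 0.2 and 0.4 in strata of 300 and 700 patients combine to (300 × 0.2 + 700 × 0.4) / 1000 = 0.34.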

The AD-RMNST permutation test and its stratified version are proposed to compare $K \geq 2$ distributions of net survival as estimated by the Pohar-Perme estimator. They have been implemented in an R program (available on request).

5 Simulations

A simulation study was conducted to assess the performance of the proposed $AD-RMNST$ permutation test. The first part of the study was composed of five scenarios comparing three groups of patients, each scenario featuring a different degree of non-proportionality of the excess hazards between the three groups. The second part of the study consisted of the same five scenarios but with an additional effect of patients' sex on net survival that required the use of a stratified test to detect a difference. The criteria used for the assessment of the AD-RMNST permutation test were the type I error and the statistical power. These results were compared with those provided by the log-rank-type test.

5.1 Data generation and simulation design

For each patient i, a group covariate $K_{i}$ was generated with three levels ( $K_{i} = 0, 1$ or $2$ ), corresponding to the groups to compare. The distribution of $K_{i}$ was set so as to study balanced cases (i.e. $P (K_{i} = 0) = P (K_{i} = 1) = P (K_{i} = 2) = 1 / 3$ ). A vector $D_{i}$ of independent demographic covariates (sex, age at diagnosis, and year of diagnosis) was also generated. A first series of simulations aimed at testing the unstratified AD-RMNST permutation test, sex being generated from a binomial distribution with probabilities $P (man) = P (woman) = 1 / 2$ . A second series of simulations aimed at testing the stratified version of the AD-RMNST permutation test by considering sex heterogeneity between groups: $P (man | K_{i} = 0) = 0.6$ , $P (man | K_{i} = 1) = 0.5$ , and $P (man | K_{i} = 2) = 0.3$ . Age at diagnosis was generated from uniform distributions to obtain the following age categories: $[30, 65)$ , $[65, 75)$ , and $[75, 80]$ years with probabilities 0.25, 0.35, and 0.40, respectively, so as to mimic colon cancer data. For convenience, the year of diagnosis was set to 1984 for all patients.

Survival times were generated using a multivariable additive excess hazard model; this provided a consistent parametric estimator of net survival after adjustment for demographic covariates.⁴ This model assumes that the instantaneous observed hazard $λ_{O, i}$ relative to the individual time to death $T_{i}$ is defined as the sum of an instantaneous population hazard $(λ_{P, i})$ plus an excess hazard $(λ_{E, i})$ . Two times to death were generated for each patient i: (i) the time to death due to the population hazard $(T_{P, i})$ obtained from the 1984–1994 American life tables stratified by age, sex, and year of diagnosis (survexp.us table available in R package survival); (ii) the time to death due to the cancer under study $(T_{E, i})$ derived from a flexible power-generalized Weibull distribution with excess hazard $λ_{E, k, s, i} : (t_{i}; σ_{k, s}, ν_{k, s}, γ_{k, s}) \mapsto \frac{ν_{k, s}}{γ_{k, s} σ_{k, s}^{ν_{k, s}}} t_{i}^{ν_{k, s} - 1} \left(1 + \left(\frac{t_{i}}{σ_{k, s}}\right)^{ν_{k, s}}\right)^{1 / γ_{k, s} - 1}$ , using the inverse transformation method.
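The inverse transformation step can be sketched in Python for the power-generalized Weibull distribution. The sketch uses the survival function associated with this hazard, S(t) = exp{1 - (1 + (t/σ)^ν)^(1/γ)}, and solves S(T) = U for a uniform draw U, which gives T = σ((1 - ln U)^γ - 1)^(1/ν). The function names are hypothetical and the parameter values used in any call are illustrative, not those of the simulation study.

```python
import math
import random

def pgw_survival(t, sigma, nu, gamma):
    """Survival function of the power-generalized Weibull distribution."""
    return math.exp(1.0 - (1.0 + (t / sigma) ** nu) ** (1.0 / gamma))

def pgw_sample(sigma, nu, gamma, rng):
    """Draw one latent time by solving S(T) = U, with U uniform on (0, 1]."""
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u avoids log(0)
    return sigma * ((1.0 - math.log(u)) ** gamma - 1.0) ** (1.0 / nu)

def pgw_sample_many(n, sigma, nu, gamma, seed=0):
    """Draw n independent latent times with a reproducible generator."""
    rng = random.Random(seed)
    return [pgw_sample(sigma, nu, gamma, rng) for _ in range(n)]
```

With ν = γ = 1 the survival function reduces to exp(-t/σ), the exponential case, which offers a quick check of the formula.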

In each series of simulations, five scenarios were considered to assess the performance of the test. In each scenario, the set of parameters $(σ_{k, s}, ν_{k, s}, γ_{k, s})_{k \in \{0, 1, 2\}, s \in \{man, woman\}}$ was chosen so as to offer different shapes of excess hazard for each group k and sex stratum s. In the non-stratified series of simulations, sex had no effect on excess hazard. In the stratified series, the group effect was reversed in the women stratum with respect to the men stratum (for instance, if men in groups 0, 1, and 2 presented low, moderate, and high excess hazards, respectively, women in groups 0, 1, and 2 would in turn be attributed high, moderate, and low excess hazards, respectively). Figure 2 shows the excess hazard functions and the corresponding net survival curves relative to all five simulation scenarios. Scenario 1 was designed to assess type I error: the excess hazards were identical for all three levels of covariate K and so were the net survival curves $S_{E, 0} (t) = S_{E, 1} (t) = S_{E, 2} (t)$ . Scenario 2 aimed at assessing the test performance in proportional excess hazard settings: the excess hazard ratios (EHR) of groups 1 and 2 versus group 0 were $EHR_{1 / 0} = 1.1$ and $EHR_{2 / 0} = 1.25$ . Scenarios 3, 4, and 5 aimed at assessing the test performance in various settings of non-proportional excess hazards. Specifically, in Scenario 3 the departure from proportionality was moderate with $EHR_{1 / 0}$ and $EHR_{2 / 0}$ starting at 1 at time $t = 0$ and increasing gradually up to $EHR_{1 / 0} (t = 10) = 1.1$ and $EHR_{2 / 0} (t = 10) = 1.25$ . Scenario 4 investigated settings with large non-proportional excess hazards: $EHR_{1 / 0}$ increasing from 0 at $t = 0$ to 1.2 at $t = 10$ and $EHR_{2 / 0}$ increasing from 0 to 1.5. The crossing, at 5 years, in the excess hazards of groups 1 and 2 resulted in a crossing of the net survival curves at the end of follow-up. In Scenario 5, the crossing in the excess hazards occurred earlier, at 2 years, resulting in a crossing of the net survival curves at mid-follow-up.

Figure 2.

Simulation scenarios for the unstratified series: excess hazards for each group modelled using a flexible power-generalized Weibull function (upper panels), with the resulting net survival functions (lower panels).

A censoring time $C_{i}$ was also generated from a uniform distribution so as to obtain 30% of censored times and administrative censoring was set at $C =$ 10 years for all individuals. Then, for each individual, the observable time to death was $T_{i} = min (T_{Pi}, T_{Ei})$ , the observed time was $T_{O, i} = min (T_{i}, C_{i}, C)$ , and the corresponding vital status $δ_{i}$ was $δ_{i} = 1$ when $T_{i} = T_{O, i}$ , 0 otherwise. The set of data ${(T_{O, i}, δ_{i}, D_{i}, K_{i})}_{i \in [1, n]}$ , usually collected in cancer registries, will hereafter be referred to as “observed data”.

As in Grafféo et al.,⁷ if cancer were the only cause of death, an individual's hypothetical follow-up time would be $T_{i}^{hypo} = min (T_{Ei}, C_{i}, C)$ with the corresponding vital status $δ_{i}^{hypo}$ ( $δ_{i}^{hypo} = 1$ when $T_{i}^{hypo} = T_{Ei}$ , 0 otherwise). The dataset ${(T_{i}^{hypo}, δ_{i}^{hypo}, D_{i}, K_{i})}_{i \in [1, n]}$ , available only in a simulation framework, is hereafter referred to as “hypothetical data”. Such data allow estimating and comparing net survivals between groups while bypassing the Pohar-Perme estimator, by applying (1) the usual log-rank test and (2) the AD-RMNST permutation test on Kaplan–Meier estimates. The objective of this approach is to compare the log-rank and AD-RMNST permutation test mechanisms independently of the net survival estimator and to demonstrate that the new test is also useful in classical survival analysis because any survival estimator can be plugged in.
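The construction of the two datasets from the latent times can be summarised in a short Python sketch. The function and argument names are hypothetical; the 10-year administrative censoring is taken from the simulation design, and the numeric inputs in any call are illustrative.

```python
def observe(t_pop, t_exc, c_random, c_admin=10.0):
    """Observed follow-up time and vital status for one simulated patient.

    t_pop    : latent time to death from other causes (T_P)
    t_exc    : latent time to death from the cancer (T_E)
    c_random : random censoring time (C_i)
    c_admin  : administrative censoring time (C)
    """
    t_death = min(t_pop, t_exc)               # time to death from any cause
    t_obs = min(t_death, c_random, c_admin)   # observed follow-up time
    delta = 1 if t_obs == t_death else 0      # 1 = death observed
    return t_obs, delta

def hypothetical(t_exc, c_random, c_admin=10.0):
    """Follow-up time and status if cancer were the only cause of death."""
    t_hypo = min(t_exc, c_random, c_admin)
    delta_hypo = 1 if t_hypo == t_exc else 0
    return t_hypo, delta_hypo
```

A patient with latent times 12 years (other causes) and 4 years (cancer) and a random censoring time of 8 years is thus observed to die at 4 years in both datasets, whereas a patient with latent times 3 and 6 years dies at 3 years in the observed data but at 6 years in the hypothetical data.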

For each scenario, a simulation run consisted of 5000 datasets (2000 for the stratified simulations) of 1000 individuals generated independently. Permutation tests were based on B = 5000 permutation samples. This value was retained according to results obtained in an exploratory analysis based on independent datasets (not used later) that showed that the standard deviation of the p value of the permutations was around 0.005. In each scenario, the probability of rejecting the null hypothesis was defined as the proportion of simulations with a p value < 0.05 in the AD-RMNST permutation test, which corresponds to the type I error rate for Scenario 1 and to the test power for Scenarios 2 to 5. The same type I error rate and power were calculated for the log-rank-type test, as well as for the usual log-rank and AD-RMNST permutation tests applied to “hypothetical data”.
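As an illustration of how a rejection probability and its confidence interval may be derived from the simulated p values, the Python sketch below uses a normal-approximation (Wald) interval. This choice of interval is an assumption: the article does not state how its 95% confidence intervals were computed. The function name and defaults are hypothetical.

```python
import math

def rejection_rate(p_values, alpha=0.05, z=1.96):
    """Proportion of simulated datasets with p < alpha, with a Wald interval.

    Returns (rate, lower bound, upper bound).
    """
    n = len(p_values)
    rate = sum(1 for p in p_values if p < alpha) / n
    half_width = z * math.sqrt(rate * (1.0 - rate) / n)
    return rate, rate - half_width, rate + half_width
```

Under Scenario 1 the returned rate estimates the type I error; under Scenarios 2 to 5 it estimates the power.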

5.2 Results

Table 1 displays the results of the unstratified series of simulations for each scenario, in terms of probability of rejection of the null hypothesis by both the AD-RMNST permutation test and the log-rank-type test applied to “observed data” ${(T_{O, i}, δ_{i}, D_{i}, K_{i})}_{i \in [1, n]}$ . As expected from the construction of the AD-RMNST permutation test, the type I error rate in Scenario 1 was satisfactory, close to the nominal value of 0.05.

Table 1.

Percentage (with 95% confidence interval) of rejection of the null hypothesis H₀ by the AD-RMNST permutation test and the log-rank-type test applied to “observed data” with all five simulation scenarios, for a level of significance of 0.05.

Scenario^a	AD-RMNST	Log-rank-type
1. Type I error	5.1 (4.4–5.7)	5.3 (4.7–5.9)
2. Power with excess hazards proportionality	80.1 (79.0–81.2)	79.4 (78.3–80.6)
3. Power with moderate non-proportionality	76.0 (74.8–77.2)	76.9 (75.7–78.0)
4. Power with notable non-proportionality	79.2 (78.1–80.3)	28.5 (27.3–29.8)
5. Power with notable non-proportionality	72.1 (70.9–73.3)	10.4 (9.53–11.2)

AD-RMNST: absolute difference restricted mean net-survival time statistic.

^a See Figure 2 for a representation of the excess hazard functions for each scenario.

In Scenario 2 (with proportional hazards), the AD-RMNST permutation test and the log-rank-type test showed similar powers: 80.1% versus 79.4%, each with a 95% confidence interval of about $\pm 1$ percentage point.

In Scenarios 3, 4, and 5, the power of the log-rank-type test decreased with the degree of non-proportionality, whereas the AD-RMNST permutation test kept a convincing discrimination power.

Table 2 complements Table 1 by providing the percentage of rejection of the null hypothesis by the AD-RMNST permutation and the log-rank tests applied to “hypothetical data”. It confirms the results in terms of type I error rate, power, and superiority of the AD-RMNST permutation test versus the log-rank test in non-proportional hazard settings, independently of the net survival estimator.

Table 2.

Percentage (with 95% confidence interval) of rejection of the null hypothesis H₀ by the AD-RMNST permutation test and the log-rank test applied to “hypothetical data” and all five simulation scenarios, for a level of significance of 0.05.

Scenario^a	AD-RMNST	Usual log-rank
1. Type I error	5.0 (4.4–5.6)	5.1 (4.5–5.7)
2. Power with excess hazards proportionality	96.0 (95.5–96.6)	97.1 (96.6–97.5)
3. Power with moderate non-proportionality	94.3 (93.6–94.9)	96.4 (95.9–96.9)
4. Power with notable non-proportionality	98.8 (98.5–99.1)	61.3 (60.0–62.7)
5. Power with notable non-proportionality	99.1 (98.8–99.3)	18.7 (17.6–19.8)

AD-RMNST: absolute difference restricted mean net-survival time statistic.

^a See Figure 2 for a representation of the excess hazard functions for each scenario.

Finally, the results of the second series of simulations showed that the AD-RMNST permutation test also has good power when used in its stratified form in Scenarios 2 to 5, whereas the unstratified test had little power in these settings (Table 3).

Table 3.

Percentage (with 95% confidence interval) of rejection of the null hypothesis H₀ by the stratified AD-RMNST permutation test compared to the unstratified AD-RMNST permutation test, applied to “observed data” and “hypothetical data” for all five stratified simulation scenarios, for a level of significance of 0.05.

	Test on “observed data”		Test on “hypothetical data”
Scenario^a	Stratified AD-RMNST	Unstratified AD-RMNST	Stratified AD-RMNST	Unstratified AD-RMNST
1. Type I error	5.7 (4.6–6.7)	5.7 (4.6–6.6)	5.1 (4.1–6.1)	5.0 (4.0–6.0)
2. Power with excess hazards proportionality	66.3 (64.2–68.4)	5.0 (4.0–5.9)	89.1 (87.7–90.4)	5.9 (4.8–6.9)
3. Power with moderate non-proportionality	66.1 (64.1–68.2)	10.6 (9.3–11.9)	90.3 (89.1–91.6)	14.9 (13.4–16.5)
4. Power with notable non-proportionality	66.2 (64.1–68.3)	9.4 (8.2–10.7)	94.8 (93.9–95.8)	13.5 (12.0–15.0)
5. Power with notable non-proportionality	50.1 (47.9–52.3)	5.8 (4.7–6.8)	90.2 (88.9–91.5)	5.4 (4.4–6.4)

AD-RMNST: absolute difference restricted mean net-survival time statistic.

^a See Figure 2 for a representation of the excess hazard functions for each scenario.

6 Application

The application used population-based survival data on 1967 incident cases of colorectal cancer (Registry of Digestive Cancers in Burgundy, France) diagnosed between 1976 and 1990 and followed up until 31 December 1994.¹⁰ In this application, the patients were censored when they were still alive at the end of follow-up or 10 years after diagnosis. The variables used were sex, age (in three categories), year of diagnosis (in five three-year periods), TNM stage at diagnosis, and primary tumour site (left-sided or right-sided colon cancer). The prognostic effect of each covariate was tested over the first 10 years of follow-up, first in terms of observed survival by using the log-rank test to compare covariate subgroups, then in terms of net cancer survival using the log-rank-type test and the newly proposed AD-RMNST permutation test. Table 4 displays the distributions of the covariates and the results of the log-rank, log-rank-type, and AD-RMNST permutation tests in terms of p value (the last two tests used a French national mortality table stratified on sex, age, and year of diagnosis available in R package survexp.fr).

Table 4.

Distribution of the colon cancer prognostic factors, and p values of the differences in observed survival (usual log-rank) and in net survival distributions (log-rank-type test and AD-RMNST permutation test).

		Observed survival	Net survival
Prognostic factors	N (%)	Log-rank	Log-rank-type	AD-RMNST
Sex		0.680	<0.001	<0.001
Men	974 (49.5%)
Women	993 (50.5%)
Age		<0.001	0.033	0.025
[0–64]	541 (27.5%)
[64–74]	602 (30.6%)
[74–100]	824 (41.9%)
Year of diagnosis		<0.001	<0.001	0.001
1976–1978	429 (21.8%)
1979–1981	229 (11.6%)
1982–1984	406 (20.6%)
1985–1987	436 (22.2%)
1988–1990	467 (23.7%)
TNM stage		<0.001	<0.001	<0.001
I–II	1011 (51.4%)
III–IV	956 (48.6%)
Primary location		0.001	0.241	0.039
Left colon	1178 (59.9%)
Right colon	789 (40.1%)

AD-RMNST: absolute difference restricted mean net-survival time statistic.

Age, year of diagnosis, and TNM stage at diagnosis had an indisputable influence on both observed survival and net survival over the first 10 years of follow-up: all tests detected significant differences between subgroups regarding these factors (Table 4). Sex did not seem to affect observed survival (log-rank p value: 0.68) but did affect net survival (log-rank-type and AD-RMNST p values < 0.001): data not shown here indicated that women had poorer net survival to colorectal cancer than men despite lower mortality rates in the general population. This would explain the presence of differences detected by tests on net survival and the absence of differences in observed survival.

In this illustration, primary tumour location appeared to be a prognostic factor for observed survival (log-rank p value: 0.001, Table 4), but its influence on net survival was unclear: the non-significant result of the log-rank-type test contradicted the significant result of the AD-RMNST permutation test. Figure 3(a) shows the observed survival of Burgundy registry patients according to tumour location. Patients diagnosed with left-sided colon cancer had higher survival probabilities than patients diagnosed with right-sided cancer. The maximum difference between survival probabilities, 10.8%, was seen 1.3 years after diagnosis (63.3% versus 52.5%); the difference then decreased to 4.8% at 10 years (survival probabilities of 26.8% and 22.0% for left- and right-sided locations, respectively). The RMST over the 10-year period was 4.4 years for left-sided and 3.7 years for right-sided colon cancer. In terms of RMTL, the group diagnosed with right-sided colon cancer lost on average about 8 months more of life during the first decade after diagnosis.
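To make the arithmetic behind these figures explicit, recall that the RMST is the area under the survival curve up to the chosen horizon, and that the RMTL is the horizon minus the RMST. The minimal Python sketch below is not the authors' R code: the function names `kaplan_meier` and `rmst` and the toy data are hypothetical, and the curve is assumed to stay flat after the last observed time if that time precedes the horizon. The last line converts the 0.7-year RMST difference reported above into months.

```python
import numpy as np

def kaplan_meier(time, event):
    """Return distinct event times and the Kaplan-Meier survival just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(time >= t)            # subjects still under observation at t
        deaths = np.sum((time == t) & event)   # deaths occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv[i] = s
    return event_times, surv

def rmst(time, event, tau):
    """Area under the Kaplan-Meier step curve between 0 and tau."""
    t, s = kaplan_meier(time, event)
    keep = t < tau
    knots = np.concatenate(([0.0], t[keep], [tau]))   # interval boundaries
    heights = np.concatenate(([1.0], s[keep]))        # curve height on each interval
    return float(np.sum(np.diff(knots) * heights))

# Toy data (hypothetical): times in years, 1 = death, 0 = censored.
toy_rmst = rmst([2, 4, 6, 10], [1, 1, 0, 0], tau=10.0)
toy_rmtl = 10.0 - toy_rmst   # restricted mean time lost over the same horizon

# Difference between the two RMST values reported in the text, in months.
gap_months = (4.4 - 3.7) * 12   # about 8.4, reported as 8 months
```

With a net survival estimate in place of the Kaplan-Meier curve, the same area computation would give the RMNST.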

Figure 3.

(a) Observed survival of patients with left and right-sided colon tumours (Kaplan-Meier curves with 95% CI, p value of the log-rank test). (b) Net survival of patients with left and right-sided colon tumours (Pohar-Perme estimation with 95% CI, p values of the log-rank type and AD-RMNST permutation tests). AD-RMNST: absolute difference restricted mean net-survival time statistic.

Figure 3(b) shows a similar difference in net survival: patients diagnosed with left-sided colorectal cancer had higher net survival over the first 9 years of follow-up than those diagnosed with right-sided cancer (maximum difference of 10.5% at 16 months, with net survivals of 66.8% versus 56.2%). The curves crossed after 9 years: at 10 years, the net survival of patients with right-sided tumours was slightly higher (by 3.8%, although the confidence interval of the difference included 0). The RMNST over 10 years of follow-up was 5.5 and 5.0 years for left- and right-sided cancer, respectively. These observations indicate that the underlying excess hazards did not remain proportional over the 10-year follow-up. Therefore, the inability of the log-rank-type test to detect a significant difference in net survival (p value: 0.24) is not surprising. In contrast, the AD-RMNST permutation test detected a significant difference in net survival between left-sided and right-sided cancer patients (p value: 0.039).

A similar analysis stratified on age categories and TNM stage groups was performed to adjust for factors known to influence survival that might not be evenly distributed between left- and right-sided cancer patients. The results shown in Table 5 largely corroborated those of the unstratified analysis: with stratification on age, alone or together with TNM stage, the AD-RMNST permutation test detected a difference in net survival between proximal and distal colon cancers that the log-rank-type test did not detect. With stratification on TNM stage alone, neither test reached significance (AD-RMNST p value: 0.355).

Table 5.

p values for log-rank-type test and AD-RMNST permutation test comparing net survival to right- and left-sided colon cancer with and without stratification.

Primary location	Log-rank-type	AD-RMNST
No stratification	0.241	0.039
Stratification on
Age	0.387	0.030
TNM stage groups	0.571	0.355
Age and TNM stage groups	0.812	0.048

AD-RMNST: absolute difference restricted mean net-survival time statistic.
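For stratified comparisons such as those in Table 5, one common permutation scheme reshuffles group labels only within each stratum, so that the composition of every stratum is preserved under the null hypothesis. The short Python sketch below illustrates this idea only. It is an assumption, not a description of the authors' stratified procedure, and the function name `permute_within_strata` and the toy arrays are hypothetical.

```python
import numpy as np

def permute_within_strata(group, strata, rng):
    """Shuffle group labels separately inside each stratum."""
    group = np.asarray(group)
    strata = np.asarray(strata)
    shuffled = group.copy()
    for level in np.unique(strata):
        idx = np.flatnonzero(strata == level)      # positions of this stratum
        shuffled[idx] = rng.permutation(group[idx])
    return shuffled

# Toy example (hypothetical): two strata with unequal group sizes.
group = np.array([0, 0, 1, 0, 1, 1, 1, 0])
strata = np.array(["a", "a", "a", "b", "b", "b", "b", "b"])
rng = np.random.default_rng(42)
shuffled = permute_within_strata(group, strata, rng)
```

Each permuted label vector keeps the number of subjects per group unchanged within every stratum, so a stratified statistic recomputed on it is not driven by imbalance of the stratification factor between groups.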

7 Discussion

The proposed AD-RMNST permutation test is a statistical tool that allows net survival comparisons even in settings where the PH hypothesis does not hold. Until now, the only non-parametric means of comparison was the log-rank-type test developed by Grafféo et al.⁷ The present study shows that the log-rank-type test performs well under the PH assumption and keeps some power under moderate non-proportionality, but is not suited to strongly non-proportional settings. The new AD-RMNST permutation test uses the consistent non-parametric Pohar-Perme estimator of net survival and derives from the RMNST, an indicator that can be interpreted on a time scale (for more details on these indicators, see the literature²¹,²²). The proposed test statistic can also be used in the classical survival framework when an observed survival estimator, e.g. the Kaplan-Meier estimator, is plugged in.

The AD-RMNST statistic corresponds to the geometric area between the lower and upper limits defined by the net survival curves; thus, it is not affected by the crossing of hazard functions. Furthermore, it can be extended to more than two groups and is suited to stratified analysis. The distribution of the AD-RMNST statistic under the null hypothesis is obtained by numerical permutations. A series of simulations designed to assess the Type I error rate and the power showed good performance of the AD-RMNST permutation test in proportional as well as non-proportional hazards settings: the new test was equivalent to the log-rank-type test in the former and outperformed it in the latter. This is noteworthy because alternatives to the log-rank test designed for non-proportional hazards settings commonly perform worse than the log-rank test under proportional hazards. Similar results were observed when the proposed test was used in the classical survival framework and compared with the usual log-rank test. The slight improvement in statistical power relative to the net survival framework can be explained by the use of the Kaplan-Meier estimator, which has a lower variance than the Pohar-Perme estimator. The stratified version of the AD-RMNST test statistic was also validated by simulation.
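The permutation principle described above can be sketched for the classical survival framework with a Kaplan-Meier plug-in. This Python sketch is not the authors' implementation. It assumes that the statistic is the integral of the absolute difference between the two survival curves up to a horizon `tau`, that group labels (coded 0 and 1) are permuted at random, and that the p value is computed as (b + 1)/(m + 1), where b is the number of permuted statistics at least as large as the observed one among m permutations. All function names and the toy data are hypothetical.

```python
import numpy as np

def km_step(time, event):
    """Distinct event times and Kaplan-Meier survival just after each of them."""
    event_times = np.unique(time[event])
    surv = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        s *= 1.0 - np.sum((time == t) & event) / np.sum(time >= t)
        surv[i] = s
    return event_times, surv

def step_values(event_times, surv, grid):
    """Evaluate the right-continuous step curve at the points in grid."""
    idx = np.searchsorted(event_times, grid, side="right")  # number of drops <= grid
    return np.concatenate(([1.0], surv))[idx]

def abs_area(time, event, group, tau):
    """Area between the two Kaplan-Meier curves on [0, tau]."""
    g0 = group == 0
    t0, s0 = km_step(time[g0], event[g0])
    t1, s1 = km_step(time[~g0], event[~g0])
    # Both curves are constant between consecutive cut points.
    cuts = np.unique(np.concatenate(([0.0], t0[t0 < tau], t1[t1 < tau], [tau])))
    left = cuts[:-1]
    gap = np.abs(step_values(t0, s0, left) - step_values(t1, s1, left))
    return float(np.sum(np.diff(cuts) * gap))

def permutation_test(time, event, group, tau, n_perm=999, seed=0):
    """Observed statistic and permutation p value (b + 1) / (m + 1)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group)
    rng = np.random.default_rng(seed)
    observed = abs_area(time, event, group, tau)
    exceed = 0
    for _ in range(n_perm):
        if abs_area(time, event, rng.permutation(group), tau) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Toy data (hypothetical): 20 early deaths in group 0, 20 late deaths in group 1.
time = np.arange(1, 41, dtype=float)
event = np.ones(40, dtype=bool)
group = np.repeat([0, 1], 20)
stat, p_value = permutation_test(time, event, group, tau=40.0, n_perm=199, seed=1)
```

Because the integrand is an absolute difference, segments where the curves cross add to the statistic instead of cancelling out, which is the property the discussion relies on. A net survival version would replace `km_step` with a Pohar-Perme estimate.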

A practical application on data from a colorectal cancer registry illustrated the use of the AD-RMNST permutation test for net survival analysis. The new test detected a difference in net survival between patients with left- and right-sided colon tumours, whereas the log-rank-type test deemed the difference non-significant (because of the non-proportionality of the underlying excess hazards). Physiological evidence supports this difference (see, for example, Ulivi et al.²³). Firstly, these locations may be considered as two distinct neoplasms because of different embryologic origins, microenvironments, and blood supplies; therefore, treatments designed for left-sided tumours might be less effective on right-sided tumours. Secondly, symptoms appear at a more advanced stage in right-sided than in left-sided tumours, which compromises their prognosis.

Technically, the AD-RMNST permutation test was implemented in R with parallel processing of the permutations, which reduces the computation time to a few minutes (on two Intel® Xeon® E5-2620 v4 2.10 GHz 8-core CPUs). The code is available on request for net survival and classical survival comparisons with plug-in Pohar-Perme and Kaplan-Meier estimators, respectively.

Most net survival analyses on cancer data involve modelling. Flexible regression models may be used to obtain unbiased estimates of net survival. They allow for non-proportionality of the excess hazards⁴,¹⁰,¹¹ and, as recently demonstrated, are also well suited to comparisons.²⁴ The new non-parametric test is a simple and easy-to-use tool for preliminary analyses that requires no modelling strategy. In the same perspective, the adaptation of other test statistics designed for non-proportional hazards settings in the classical survival framework, such as the Lin and Wang squared log-rank statistic²⁵ or the PP-plot-based test,²⁶ is ongoing work. It could also be of interest to detect the time point at which the RMNST difference is the most significant.²⁷ Along the same line, other options might be investigated to address the non-proportionality issue, such as applying the same test at multiple time points and using a non-parametric combination (NPC) approach to obtain a single output statistic and p value.²⁸

Finally, another prospect of this work would be the derivation of the mathematical form of the distribution under the null hypothesis to propose a fully analytical non-parametric comparison test for both classical and net survival comparisons in any PH or non-PH setting.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Estève J, Benhamou E, Raymond L. Statistical methods in cancer research. Volume IV: Descriptive epidemiology. Lyon: IARC Scientific Publication, 1994, p.302.

2. Giorgi and CENSUR working survival group. Challenges in the estimation of Net SURvival: The CENSUR working survival group. Rev Epidemiol Sante Publique 2016; 64: 367–371.

3. Pohar-Perme, Stare, Estève. On estimation in relative survival. Biometrics 2012; 68: 113–120.

4. Danieli, Remontet, Bossard, et al. Estimating net survival: the importance of allowing for informative censoring. Stat Med 2012; 31: 775–786.

5. Allemani, Weir, Carreira, et al. CONCORD Working Group. Global surveillance of cancer survival 1995–2009: analysis of individual data for 25,676,887 patients from 279 population-based registries in 67 countries (CONCORD-2). Lancet 2015; 385: 977–1010.

6. Cowppli-Bony, Uhry, Remontet, et al. French Network of Cancer Registries (FRANCIM). Survival of solid cancer patients in France, 1989–2013: a population-based study. Eur J Cancer Prev 2017; 26: 461–468.

7. Grafféo, Castell, Belot, et al. A log-rank-type test to compare net survival distributions. Biometrics 2016; 72: 760–769.

8. Pohar-Perme M. relsurv: Relative Survival. R package, version 2.1-2, https://CRAN.R-project.org/package=relsurv (2005, accessed 11 December 2018).

9. Qiu, Sheng. A two-stage procedure for comparing hazard rate functions. J R Stat Soc Series B Stat Methodol 2008; 70: 191–208.

10. Giorgi, Abrahamowicz, Quantin, et al. A relative survival regression model using B-spline functions to model non-proportional hazards. Stat Med 2003; 22: 2767–2784.

11. Remontet, Bossard, Belot, et al. An overall strategy based on regression models to estimate relative survival and model the effects of prognostic factors in cancer survival studies. Stat Med 2007; 26: 2214–2228.

12. Fleming, O'Fallon, O'Brien, et al. Modified Kolmogorov-Smirnov test procedures with application to arbitrarily right-censored data. Biometrics 1980; 36: 607–625.

13. Breslow, Edler, Berger. A two-sample censored-data rank test for acceleration. Biometrics 1984; 40: 1049–1062.

14. Mantel, Stablein. The crossing hazard function problem. Statistician 1988; 37: 59–64.

15. Royston, Parmar. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med Res Methodol 2013; 13: 152.

16. Horiguchi, et al. A flexible and coherent test/estimation procedure based on restricted mean survival times for censored time-to-event data in randomized clinical trials. Stat Med 2018; 37: 2307–2320.

17. Ederer, Axtell, Cutler. The relative survival rate: a statistical methodology. Natl Cancer Inst Monogr 1961; 6: 101–121.

18. Dwass. Modified randomization tests for nonparametric hypotheses. Ann Math Stat 1957; 28: 181–187.

19. Good P. Permutation, parametric, and bootstrap tests of hypotheses. 3rd ed. New York: Springer-Verlag, 2005, p.315.

20. Phipson, Smyth. Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat Appl Genet Mol Biol 2010; 9: Article 39.

21. Andersen. Decomposition of number of life years lost according to causes of death. Stat Med 2013; 32: 5278–5285.

22. Antero-Jacquemin, Pohar-Perme, Rey, et al. The heart of the matter: years-saved from cardiovascular and cancer deaths in an elite athlete cohort with over a century of follow-up. Eur J Epidemiol 2018; 33: 531–543.

23. Ulivi, Scarpi, Chiadini, et al. Right- vs. left-sided metastatic colorectal cancer: differences in tumor biology and bevacizumab efficacy. Int J Mol Sci 2017; 18: E1240.

24. Pavlic, Pohar Perme. On comparison of net survival curves. BMC Med Res Methodol 2017; 17: 79.

25. Lin, Wang. A new testing approach for comparing the overall homogeneity of survival curves. Biom J 2004; 46: 489–496.

26. Cox. Testing the equivalence of survival distributions using PP-and PPP-plots. Int J Stat Med Res 2014; 3: 161–173.

27. Zhao, Claggett, Tian, et al. On the restricted mean survival time curve in survival analysis. Biometrics 2016; 72: 215–221.

28. Arboretti, Fontana, Pesarin, et al. Nonparametric combination tests for comparing two survival curves with informative and non-informative censoring. Stat Med 2018; 27: 3739–3769.