Asymptotic optimality of Hodges-Lehmann inverse rank likelihood estimators

Abstract

Hodges and Lehmann proposed using rank test statistics evaluated at inversely transformed data to construct estimating equations. This paper reframes the resulting estimates in terms of a rank likelihood. In the context of a model of the form Y = h(e; z, β), where Y is a response, z is a vector of predictors and e is a random error, this approach corresponds to computing the inverse e = g(Y, z; β) by solving Y = h(e; z, β) for e and using the distribution of the ranks of independent e’s as a likelihood. The properties of the resulting estimators have been developed in many important contexts. This paper reviews and extends asymptotic optimality properties of Hodges-Lehmann estimators in semiparametric models. In particular, it establishes semiparametric optimality of the estimate obtained from a Gaussian linear model. Moreover, it shows that the Hodges-Lehmann estimate obtained from the exponential likelihood is asymptotically minimax for the semiparametric accelerated failure time model with increasing hazard rates, and that a uniform (Wilcoxon) score estimate applied to log Y_i, 1 ≤ i ≤ n, is asymptotically minimax for an accelerated failure time model with increasing logit rate. References to recently developed software in R are provided.

Keywords

asymptotic efficiency; Hodges-Lehmann estimators; likelihood; ranks

1 Introduction

One of the goals of statistical research is to develop methods that make efficient use of available data. One very efficient method is the normal scores inverse rank (NSIR) estimator, which is based on the ideas in a paper by Hodges and Lehmann (1963). In a regression framework based on a linear model with general error distribution, this method is asymptotically at least as efficient as the least squares (LS) estimator for every error distribution, with equality only in the Gaussian case, and the LS estimator in turn is best linear unbiased by the Gauss-Markov Theorem. Moreover, Monte Carlo simulations have shown that the superiority of the NSIR estimator also holds for finite sample sizes. Nevertheless, the NSIR estimator is hardly mainstream in the statistical literature. One possible reason is that it does not appear to have the prestige that comes with being based on a likelihood. Moreover, there is a lack of asymptotic semiparametric optimality results. Another reason may be that computational software has not been readily available until now.

This paper frames Hodges-Lehmann inverse rank estimators as likelihood based, reviews some of the properties and develops asymptotic minimax properties of inverse rank estimates based on normal, exponential and uniform scores.

In this paper, → will denote convergence in distribution. The asymptotic distribution results used in this paper are from Adichie (1967), Jurečková (1971), Koul (1971), Jaeckel (1972) and Kraft and van Eeden (1972), among others. Books that cover Hodges-Lehmann estimates include Lehmann (1975), Bickel and Doksum (1977), Hettmansperger (1984) and Hettmansperger and McKean (2011). Software in R for inverse rank estimates, including the NSIR estimator, has been developed by Kloke and McKean (2012); see the Appendix.

2 Rank and inverse rank likelihoods

Let X = (Y, Z), Y ∈ R, Z ∈ R^d, denote data with probability distribution P, and let p(x; θ) denote a parametric density for P with θ = (β, Λ), where β ∈ R^p and Λ is a function or a vector of functions. The profile likelihood is

L_{P} (β) = \sup_{Λ} {\prod_{i = 1}^{n} p (x_{i}; β, Λ)},

where the sup is over step functions Λ that jump at the ordered y’s (e.g., Murphy and van der Vaart, 2000). If λ(y; β|z) is the conditional hazard rate, then the Cox partial likelihood is

L_{C} (β) = \prod_{i = 1}^{n} \frac{λ (y_{i}; β | z_{i})}{\sum_{j : y_{j} \geq y_{i}} λ (y_{i}; β | z_{j})} .

Let R_i = Rank(Y_i) and R = (R₁, …, R_n), let θ₀ denote an H₀ (null hypothesis) parameter value, and let V⁽¹⁾ < … < V⁽ⁿ⁾ denote the order statistics of a sample from p(υ; θ₀|z). Then the Hoeffding likelihood is

L_{H} (β; r) = n! P (R = r) = E_{H_{0}} (\prod_{i = 1}^{n} \frac{p (V^{(r_{i})}; θ | z_{i})}{p (V^{(r_{i})}; θ_{0} | z_{i})}),

where r₁, …, r_n denote the observed ranks and r = (r₁, …, r_n).

The estimate ${\hat{β}}_{K P}$ = arg max L_H(β; r), which was developed by Kalbfleisch and Prentice (1973, 1980), is clearly a function of the ranks r₁, …, r_n. So are ${\hat{β}}_{C}$ = arg max L_C(β) and ${\hat{β}}_{P}$ = arg max L_P(β) for the models where they have been applied.

For example, consider the linear model

Y_{i} = z_{i}^{T} β + ε_{i}, \quad ε_{i} \text{ i.i.d. } F .

(2.1)

The KP approach has H₀ : β = 0 and ${\hat{β}}_{K P}$ solves

\nabla_{β} L_{H} (β; r) = 0.

In contrast, in this linear model, the Hodges-Lehmann estimate is the solution for β* of

\nabla_{β} L_{H} (β; r (y - z^{T} β *)) |_{β = 0} = 0.

(2.2)

Thus the Hodges-Lehmann estimate is likelihood based, but rather than ranking Y_i, it ranks $ε_{i}^{*} = Y_{i} - z_{i}^{T} β *$ , the ‘inverse’ in the equation y = z^T β + ε (see McKean and Vidmar, 1994). Note that β = 0 implies the i.i.d. case. Thus the idea in (2.2) is to find β* such that $ε_{1}^{*}, \dots, ε_{n}^{*}$ look i.i.d. to the gradient of the Hoeffding likelihood.

For a familiar example, consider the logistic two-sample shift model F₂(x) = F₁(x – β). In this case, ∇_β L_H|_{β=0} is proportional to the Wilcoxon statistic and

{\hat{β}}_{H L} = \operatorname{med} \{X_{2 j} - X_{1 i} : i, j \in \{1, \dots, n\}\} .
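The median-of-differences formula can be computed directly. The following is a minimal base R sketch; the function name hl_shift and the simulated samples are illustrative assumptions and do not come from the paper.

```r
# Hodges-Lehmann two-sample shift estimate: the median of all
# pairwise differences X_2j - X_1i (hl_shift is a hypothetical helper name).
hl_shift <- function(x1, x2) {
  median(outer(x2, x1, "-"))
}

set.seed(1)                      # illustrative simulated data
x1 <- rlogis(50)                 # first sample, logistic errors
x2 <- rlogis(50) + 2             # second sample, shifted by beta = 2
hl_shift(x1, x2)                 # estimate of the shift beta
```

Because the estimate depends on the data only through pairwise differences, adding a constant to the second sample shifts the estimate by the same constant.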

In general, the Hodges-Lehmann inverse rank estimate can be defined as follows: consider a model where, for some function h,

Y = h (ε; z, β) .

Let ε*(y; z, β*) be the solution (inverse) for ε of h(ε; z, β*) = y. Then ${\hat{β}}_{H L}$ is the value of β* that solves

\nabla_{β} L_{H} (β; r (ε^{*} (y; z, β *))) |_{β = 0} = 0.

3 Basic properties of the Hodges-Lehmann inverse rank estimate

Consider again the linear model

Y_{i} = z_{i}^{T} β + ε_{i}, \quad ε_{i} \text{ i.i.d. } F .

We use L_H with H₀ : β = β₀ = 0. In this case, the components of $\nabla_{β} L_{H} (β; r (y - z^{T} β *)) |_{β = 0}$ are p linear rank statistics of the form

T_{n j} (β *) = \sum_{i = 1}^{n} (z_{i j} - {\bar{z}}_{j}) a_{n} (R (Y_{i} - z_{i}^{T} β *)), j = 1, \dots, p,

(3.1)

where R denotes rank and $a_{n} (r) = E_{F} [- \frac{f^{'}}{f} (V^{(r)}) + \frac{f^{'}}{f} (V)]$ is assumed to exist. Usually a_n(r) is approximated by $c_{n} (r) = - \frac{f^{'}}{f} (F^{- 1} (\frac{r}{n + 1})) + (E \frac{f^{'}}{f} (V)) .$ Here c_n(r) is proportional to the uniform $[\frac{r}{n + 1}] - \frac{1}{2}$ and normal $Φ^{- 1} (\frac{r}{n + 1})$ scores when F is logistic and normal, respectively, where Φ is the N(0, 1) distribution function. In the $Φ^{- 1} (\frac{r}{n + 1})$ case, ${\hat{β}}_{H L}$ is the NSIR estimate. Solutions to T_nj(β*) = 0, j = 1, …, p, exist.
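The two proportionality claims can be checked directly from the definition of c_n(r). The following worked step is a sketch for the standard logistic and N(0, 1) densities; it uses the fact that E[f′(V)/f(V)] = ∫ f′(x) dx = 0 for these densities, so only the first term of c_n(r) matters.

```latex
% Logistic: F(x) = 1 / (1 + e^{-x}), f = F (1 - F), so f'/f = 1 - 2F.
- \frac{f'}{f} (F^{-1} (u)) = 2u - 1 = 2 (u - \frac{1}{2}), \qquad u = \frac{r}{n + 1}.

% Normal: f = Φ', f'(x) = -x f(x), so -f'/f (x) = x.
- \frac{f'}{f} (Φ^{-1} (u)) = Φ^{-1} (u), \qquad u = \frac{r}{n + 1}.
```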

Theorem 3.1. (Jaeckel, 1972)

If a_n(1) < … < a_n(n) and $\sum_{i = 1}^{n} a_{n} (i) = 0,$ the Hodges-Lehmann estimate can be expressed as any value that minimizes

S (β) \equiv \sum_{i = 1}^{n} a_{n} (R (Y_{i} - z_{i}^{T} β)) (Y_{i} - z_{i}^{T} β) .

S(β) is a nonnegative, continuous and convex function of β.

If (z_ij)_n×p is of full rank, S(β) reaches its minimum.
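Theorem 3.1 suggests a direct way to compute the estimate: minimize S(β) numerically. The following base R sketch is only an illustration under stated assumptions: the function name jaeckel_dispersion, the simulated data and the use of the Nelder-Mead method in optim are choices made here, not the algorithm of any particular package. The scores are taken as a_n(i) = phi(i/(n + 1)), which are increasing and sum to zero for both choices of phi shown.

```r
# Jaeckel's dispersion S(beta) for scores a_n(i) = phi(i / (n + 1)).
# The default phi gives uniform (Wilcoxon) scores; phi = qnorm gives normal scores.
jaeckel_dispersion <- function(beta, y, z, phi = function(u) u - 0.5) {
  e <- as.vector(y - z %*% beta)          # residuals Y_i - z_i^T beta
  a <- phi(rank(e) / (length(e) + 1))     # scores evaluated at the residual ranks
  sum(a * e)
}

set.seed(2)                               # illustrative simulated data
n <- 100
z <- cbind(rnorm(n), rnorm(n))            # design matrix without an intercept column
y <- as.vector(z %*% c(1, -2)) + rt(n, df = 3)

# S is convex but only piecewise linear, so a derivative-free optimizer is used here.
fit <- optim(c(0, 0), jaeckel_dispersion, y = y, z = z, phi = qnorm)
fit$par                                   # normal scores (NSIR) estimate of beta
```

No intercept column is included because S(β) is unchanged when a constant is added to all residuals, so the intercept is not identified by the dispersion function.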

In model (2.1), let ${\hat{β}}_{N S}$ denote the NSIR estimate, Ψ = Φ′, $Z = {(z_{i j} - {\bar{z}}_{j})}_{n \times p},$ and let Σ = lim_{n→∞} n^–1 Z^T Z.

Theorem 3.2. In the linear model (2.1), if ∑^–1 and the Fisher information I(f) exist,

\sqrt{n} ({\hat{β}}_{N S} - β) \to N (0, σ_{N S}^{2} (F)),

where

σ_{N S}^{2} (F) = C^{- 2} (F) Σ^{- 1}

and

C^{2} (F) = {\{E_{F} [\frac{f (X)}{Ψ (Φ^{- 1} (F (X)))}]\}}^{2} .

The corresponding result for the LS estimate when ∑^–1 and σ²(F) = var(ε) exist is

\sqrt{n} ({\hat{β}}_{L S} - β) \to N (0, σ_{L S}^{2} (F)),

where

σ_{L S}^{2} (F) = σ^{2} (F) Σ^{- 1} .

It follows that

σ_{L S}^{2} (F) = C^{2} (F) σ^{2} (F) σ_{N S}^{2} (F) .

Chernoff and Savage (1958) showed that σ²(F)C²(F) ≥ 1 with equality iff F is Gaussian. A simple proof based on the Jensen and Cauchy-Schwarz inequalities was given by Gastwirth and Wolff (1968). This yields the following result.

Theorem 3.3. Under the conditions of Theorem 3.2, each entry in the matrix $σ_{L S}^{2} (F)$ is larger than the corresponding entry in $σ_{N S}^{2} (F)$ , except when F is Gaussian, in which case the entries are equal.
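As a check on the equality case, take F = N(0, σ²). The following worked step is only an illustration of how the quantities in Theorem 3.2 reduce in that case.

```latex
f (x) = σ^{-1} Ψ (x / σ), \qquad Φ^{-1} (F (x)) = x / σ,

\frac{f (x)}{Ψ (Φ^{-1} (F (x)))} = \frac{σ^{-1} Ψ (x / σ)}{Ψ (x / σ)} = σ^{-1},

C^{2} (F) = σ^{-2}, \qquad σ^{2} (F) C^{2} (F) = 1, \qquad σ_{N S}^{2} (F) = σ^{2} Σ^{-1} = σ_{L S}^{2} (F) .
```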

4 Naive asymptotic semiparametric minimax theory

There was a time when most mathematics students and faculty read the book ‘Naive Set Theory’ by P. R. Halmos (1960). It presented the main concepts and results of set theory without detailed justifications. With apologies to Halmos, the naive asymptotic semiparametric minimax approach amounts to finding a decision rule δ₀ and a function Λ₀ such that δ₀ has maximum asymptotic risk at Λ₀, and δ₀ has asymptotic risk at Λ₀ no larger than that of any other δ. Then, if R(δ, Λ, β) is the asymptotic risk,

\max_{Λ} R (δ_{0}, Λ, β) = R (δ_{0}, Λ_{0}, β) \leq R (δ, Λ_{0}, β) \leq \max_{Λ} R (δ, Λ, β) .

That is, (δ₀, Λ₀) is a saddle point and δ₀ is minimax for R.

For example, in the proportional hazards model, where the hazard rate satisfies λ(y; β|z) = λ(y)exp(z^Tβ) for some baseline hazard λ(⋅), the Cox partial likelihood estimate ${\hat{β}}_{C}$ has a risk which is constant in the baseline hazard rate λ. This is because ${\hat{β}}_{C}$ depends only on the ranks, and Rank(Y_i) = Rank(Λ(Y_i)), where $Λ (y) = \int_{0}^{y} λ (t) d t$ . In particular, ${\hat{β}}_{C}$ attains its maximum risk at Λ(y) = y. But Λ(y) = y corresponds to the exponential model and ${\hat{β}}_{C}$ has the same asymptotic distribution as the MLE ${\hat{β}}_{M}$ for the exponential model. Because ${\hat{β}}_{M}$ has the smallest possible asymptotic mean squared error for the exponential distribution, ${\hat{β}}_{C}$ is a naive asymptotic minimax estimate. This argument, although intuitively clear, is ‘naive’ because it has left out all the conditions needed to make precise the statement ‘in the exponential model, the MLE has the smallest possible asymptotic mean squared error’. This statement is not true unless we introduce a framework that rules out Hodges’ locally superefficient estimates.

This naive approach makes it straightforward to show semiparametric asymptotic optimality of Hodges-Lehmann inverse rank estimators.

Theorem 4.1. Consider the class of models (2.1), where F has finite Fisher information and σ²(F) ≤ M for some M > 0. Then the normal scores estimate ${\hat{β}}_{N S}$ is asymptotically minimax in the sense that if $\hat{β}$ is any other estimate with $\sqrt{n} (\hat{β} - β) \to N (0, τ^{2} (F))$ for some τ²(F) > 0, then $\sup_{F} τ^{2} (F) \geq \sup_{F} σ_{N S}^{2} (F)$ , where ≥ means component-wise inequality of matrices.

Proof. By the Chernoff-Savage inequality, $σ_{N S}^{2} (F)$ is maximized by taking F = N(0, M). But ${\hat{β}}_{N S}$ has the same asymptotic distribution as ${\hat{β}}_{L S}$ for this F and ${\hat{β}}_{L S}$ equals the MLE for F = N(0, M) and thus is asymptotically optimal. It follows that ${\hat{β}}_{N S}$ is a naive asymptotic semiparametric minimax estimate.

Next consider the accelerated failure time model, where Y₁, …, Y_n are independent with Y|z distribution defined by

Y = ε \exp (z^{T} β), \quad ε \sim F .

(4.1)

For this model, ${\hat{β}}_{H L}$ solves

\nabla_{β} L_{H} (β; r (y / \exp [z^{T} β *])) |_{β = 0} = 0,

which is equivalent to solving

\sum_{i = 1}^{n} (z_{i j} - {\bar{z}}_{j}) a_{n} (R (Y_{i} / \exp [β^{T} z_{i}])) = 0, j = 1, \dots, p,

(4.2)

where a_n(i) = E_K(V⁽ⁱ⁾) – 1 with K(t) = 1 – exp(–t). Here a_n(i) is often approximated by $- \log (1 - \frac{i}{n + 1}) - 1.$ The solution ${\hat{β}}_{E S}$ to (4.2) is called the exponential scores inverse rank (ESIR) estimate.
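The exact scores and their approximation are easy to compare numerically. The sketch below, in base R, uses the standard formula E_K(V⁽ⁱ⁾) = Σ_{k=1}^{i} 1/(n – k + 1) for the expected order statistics of a standard exponential sample; the sample size n = 20 and the variable names are illustrative assumptions.

```r
# Exact exponential scores a_n(i) = E_K(V^(i)) - 1 versus the usual approximation.
n <- 20                                   # illustrative sample size
i <- seq_len(n)
# E_K(V^(i)) = sum_{k=1}^{i} 1 / (n - k + 1) for standard exponential order statistics
a_exact  <- cumsum(1 / (n - i + 1)) - 1
a_approx <- -log(1 - i / (n + 1)) - 1
round(cbind(i, a_exact, a_approx), 3)     # side-by-side comparison
```

The exact scores are increasing and sum to zero, which are the two conditions required of the scores in Theorem 3.1.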

Theorem 4.2. Assume model (4.1). If ∑^–1 exists and

0 < E_{F} {{(ε \frac{f^{'} (ε)}{f (ε)})}^{2}} < \infty,

then $\sqrt{n} ({\hat{β}}_{E S} - β) \to N (0, σ_{E S}^{2} (F))$ , where

σ_{E S}^{2} (F) = B^{- 2} (F) Σ^{- 1}

with

B (F) = \int_{0}^{\infty} \frac{t f (t)}{1 - F (t)} d F (t) .

Proof. This result can be obtained from the literature on the asymptotic distribution of Hodges-Lehmann inverse rank estimates by noting that log Y follows a linear model with the error having the extreme value density exp(t – e^t) (see Hettmansperger and McKean, 2011, p. 233).

Theorem 4.3. Assume the accelerated failure time model with F having nondecreasing hazard rate. Also assume the conditions of Theorem 4.2. Then the exponential scores estimate ${\hat{β}}_{E S}$ is asymptotically minimax in the sense that $\sup_{F} τ^{2} (F) \geq \sup_{F} σ_{E S}^{2} (F),$ where τ²(F) is the asymptotic covariance matrix of any $\hat{β}$ satisfying $\sqrt{n} (\hat{β} - β) \to N (0, τ^{2} (F)) .$

Proof. Let IFR stand for ‘increasing failure rate’, that is, nondecreasing hazard rate. We show that B(F) ≥ 1 when F is IFR. Because λ(t) = f(t)/[1 – F(t)] is nondecreasing,

Λ (t) = \int_{0}^{t} λ (v) d v = - \log [1 - F (t)]

is convex, which implies that t^–1Λ(t) is nondecreasing. Thus

\frac{d}{d t} [t^{- 1} Λ (t)] = t^{- 2} \log [1 - F (t)] + \frac{f (t)}{t [1 - F (t)]} \geq 0,

(4.3)

which implies

\frac{t f (t)}{1 - F (t)} \geq - \log [1 - F (t)] .

(4.4)

Thus

B (F) = \int_{0}^{\infty} \frac{t f (t)}{1 - F (t)} d F (t) \geq \int_{0}^{\infty} - \log [1 - F (t)] d F (t) = 1.

(4.5)

Here equality holds iff F is exponential. Thus $σ_{E S}^{2} (F)$ is maximized at the exponential distribution. Moreover, the MLE ${\hat{β}}_{M}$ in the exponential model satisfies

\sqrt{n} ({\hat{β}}_{M} - β) \to N (0, Σ^{- 1}) .

Thus ${\hat{β}}_{E S}$ is asymptotically optimal for the exponential model, and thus asymptotically minimax for the semiparametric IFR model.
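For a concrete illustration of the bound B(F) ≥ 1, consider F(t) = 1 – exp(–t^k) with shape k > 0. The Weibull family is an assumption made here for the example; it is not singled out in the argument above.

```latex
λ (t) = k t^{k - 1}, \qquad \frac{t f (t)}{1 - F (t)} = k t^{k},

% ε^{k} has the standard exponential distribution when ε has this F, so E_{F} (ε^{k}) = 1
B (F) = \int_{0}^{\infty} k t^{k} d F (t) = k E_{F} (ε^{k}) = k,

σ_{E S}^{2} (F) = k^{- 2} Σ^{- 1} .
```

The hazard rate is nondecreasing exactly when k ≥ 1, in which case B(F) = k ≥ 1 and the asymptotic covariance is at most Σ^–1, with equality in the exponential case k = 1.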

This result is for uncensored data. Tsiatis (1990) has extended inverse rank estimates to censored data. It is conjectured that Theorem 4.3 holds for censored data.

A logit survival model

Doksum and Gasko (1990) and others have examined the correspondence between models in binary regression analysis and survival analysis. For instance, the Cox proportional hazards model corresponds to the complementary log-log link function in the binary regression model. We next consider a survival model related to the logit binary regression model.

Let Y₀ be a baseline survival time, let F₀ denote its distribution function and define

Δ (y) = logit [F_{0} (y)] .

Our model is the accelerated failure time model where Y₁, …, Y_n are independent with Y|z distributed as

Y = Y_{0} \exp (z^{T} β) .

Previously we assumed that the baseline integrated hazard rate Λ(y) = –log(1 – F₀(y)) was convex, that is, the rate λ(y) is nondecreasing. Now we assume that the logit rate, defined by

δ (y) = \frac{d}{d y} logit [F_{0} (y)]

is nondecreasing. We then say that F₀ has an increasing logit rate (ILR). Intuitively, objects that wear over time will have ILR failure time distributions.

To show a minimax result for the ILR model, we need to ‘guess’ the minimax rule. To this end we introduce the inverse rank estimate ${\hat{β}}_{U S}$ for the linear model with F logistic. Here ${\hat{β}}_{U S}$ is based on the uniform scores $a_{n} (i) = [i / (n + 1)] - \frac{1}{2} .$

Theorem 4.4. In the linear model (2.1) assume that I(f) and ∑^–1 exist. Let ${\hat{β}}_{U S}$ be the uniform scores estimate. Then

\sqrt{n} ({\hat{β}}_{U S} - β) \to N (0, \frac{1}{12 {(\int f (x) d F (x))}^{2}} Σ^{- 1}) .

It turns out that the asymptotically minimax estimate for the accelerated failure time model with increasing logit rate can be obtained from Theorem 4.4.

Theorem 4.5. Let ${\hat{β}}_{U S L}$ denote the uniform scores estimate applied to Y_i^′ = log Y_i, 1 ≤ i ≤ n. Assume that ∑^–1 exists, that F₀ is an ILR distribution, and

0 < E_{F_{0}} {{(Y_{0} \frac{f^{'}_{0} (Y_{0})}{f_{0} (Y_{0})})}^{2}} < \infty .

Then ${\hat{β}}_{U S L}$ is asymptotically minimax in the sense that if $\hat{β}$ is any estimate which satisfies

\sqrt{n} (\hat{β} - β) \to N (0, τ^{2} (F_{0}))

for some τ²(F₀), then

\sup_{F_{0}} τ^{2} (F_{0}) \geq \sup_{F_{0}} σ_{U S L}^{2} (F_{0}),

(4.6)

where the sup is over F₀ ILR.

Proof. Note that

\log Y = β^{T} z + ε,

where ε = log Y₀. Let F(t) = P(ε ≤ t) = F₀(e^t). Then logit[F(t)] = logit[F₀(e^t)] is convex. It follows that logit[F(t)] – t is nondecreasing. Thus

(logit [F (t)] - t)^{'} = \frac{f (t)}{F (t) [1 - F (t)]} - 1 \geq 0,

which implies

f^{2} (t) \geq F (t) [1 - F (t)] f (t)

and

\int_{- \infty}^{\infty} f^{2} (t) d t \geq \int_{- \infty}^{\infty} F (t) [1 - F (t)] d F (t) = \int_{0}^{1} u (1 - u) d u = \frac{1}{6} .

This and Theorem 4.4 show that $σ_{U S L}^{2} (F_{0})$ is maximized when $\int f^{2} (t) d t = \frac{1}{6},$ which occurs when f is the logistic density. In model (2.1) with F logistic, the MLE has the same asymptotic distribution as ${\hat{β}}_{U S L}$ . Thus ${\hat{β}}_{U S L}$ is asymptotically optimal in the case where it attains its maximum risk, and hence it is minimax.
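The equality case can be verified directly for the standard logistic distribution. The following worked check uses the known value I(f) = 1/3 of the Fisher information for location in the standard logistic density.

```latex
F (t) = \frac{1}{1 + e^{- t}}, \qquad f (t) = F (t) [1 - F (t)],

\int_{- \infty}^{\infty} f^{2} (t) d t = \int_{0}^{1} u (1 - u) d u = \frac{1}{6},

\frac{1}{12 {(\int f (x) d F (x))}^{2}} = \frac{1}{12 \cdot (1 / 6)^{2}} = 3 = \frac{1}{I (f)}, \qquad I (f) = \frac{1}{3} .
```

So the uniform scores estimate attains the Fisher information bound 3 Σ^–1 at the logistic distribution, which is the bound used in the proof.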

It is clear from this proof that the result holds if we widen the class of ILR distributions to the class of models where logit[F₀(e^x)] is convex.

Proposition 4.1. The optimality of ${\hat{β}}_{U S L}$ expressed by (4.6) holds if we take the sup over the class of distributions with logit [F₀(e^x)] convex.

Appendix: Rfit code

To indicate how easy it is now to compute inverse rank estimators, we give the few lines of Rfit code required. Assuming that the vector y and the matrix x contain the responses and the design matrix, respectively, the Rfit commands to obtain the Wilcoxon, normal scores and log-rank fits are

library(Rfit)

fitw <- rfit(y ~ x)

fitns <- rfit(y ~ x, scores = nscores)

lrs <- new("scores", phi = function(u) {-1 - log(1 - u)},

Dphi = function(u) {1/(1 - u)})

fitlr <- rfit(y ~ x, scores = lrs)

The uniform Wilcoxon scores are default in Rfit, the normal scores are in Rfit, but the log-rank scores need to be defined.

References

Adichie (1967) Estimate of regression parameters based on rank tests. Annals of Mathematical Statistics, 38, 894–904.

Bickel and Doksum (1977) Mathematical statistics: Basic ideas and selected topics. San Francisco: Holden-Day.

Chernoff and Savage (1958) Asymptotic normality and efficiency of certain nonparametric test statistics. Annals of Mathematical Statistics, 29, 972–94.

Doksum and Gasko (1990) On a correspondence between models in binary regression analysis and survival analysis. International Statistical Review, 58, 243–52.

Gastwirth and Wolff (1968) An elementary method for obtaining lower bounds on the asymptotic power of rank tests. Annals of Mathematical Statistics, 39, 2128–30.

Halmos (1960) Naive set theory. Chicago: van Nostrand.

Hettmansperger (1984) Statistical inference based on ranks. New York: John Wiley and Sons.

Hettmansperger and McKean (2011) Robust nonparametric statistical methods. New York: CRC Press.

Hodges JL Jr and Lehmann (1963) Estimates of location based on rank tests. Annals of Mathematical Statistics, 34, 598–611.

Jaeckel (1972) Estimating regression coefficients by minimizing the dispersion of the residuals. Annals of Mathematical Statistics, 43, 1449–58.

Jurečková (1971) Nonparametric estimate of regression coefficients. Annals of Mathematical Statistics, 42, 1328–38.

Kalbfleisch and Prentice (1973) Marginal likelihood based on Cox’s regression and life model. Biometrika, 60, 267–78.

Kalbfleisch and Prentice (1980) The statistical analysis of failure time data. New York: John Wiley and Sons.

Kloke and McKean (2012) Rfit: Rank-based estimation for linear models. The R Journal, 4, 57–64.

Koul (1971) Asymptotic behavior of a class of confidence regions based on ranks in regression. Annals of Mathematical Statistics, 42, 466–76.

Kraft and van Eeden (1972) Linearized rank estimates and signed-rank estimates for the general linear hypothesis. Annals of Mathematical Statistics, 43, 42–57.

Lehmann (1975) Nonparametrics: Statistical methods based on ranks. San Francisco: Holden-Day.

McKean and Vidmar (1994) A comparison of two rank based methods for the analysis of linear models. The American Statistician, 48, 220–29.

Murphy and van der Vaart (2000) On profile likelihood. Journal of the American Statistical Association, 95, 449–65.

Tsiatis (1990) Estimating regression parameters using linear rank tests for censored data. Annals of Statistics, 18, 354–72.