Birnbaum–Saunders statistical modelling: a new approach

Abstract

Modelling based on the Birnbaum–Saunders distribution has received considerable attention in recent years. In this article, we introduce a new approach for Birnbaum–Saunders regression models, which allows us to analyze data in their original scale and to model non-constant variance. In addition, we propose four types of residuals for these models and conduct a simulation study to establish which of them performs best. Moreover, we develop methods of local influence by calculating the normal curvatures under different perturbation schemes. Finally, we perform a statistical analysis with real data by using the approach proposed in the article. This analysis shows the potential of our proposal.

Keywords

Birnbaum–Saunders distribution; data analysis; influence diagnostics; Monte Carlo methods; reparameterization; residuals

1 Introduction

Birnbaum and Saunders (1969a, 1969b) proposed a statistical model for fatigue life of structures under cyclic stress. Based on a setting that shows that the failures originate from the development of a dominant crack produced by cumulative stress, they derived the Birnbaum–Saunders (BS) distribution.

The BS distribution has received considerable attention in recent years, due to its theoretical arguments associated with cumulative damage processes, its properties and its relation with the normal distribution. Specifically, the amount of cumulative damage that allows the BS distribution to be generated is assumed to follow a normal distribution. This model corresponds to a unimodal, positively skewed, two-parameter distribution and with positive support. Over the past decades, theoretical, methodological and practical aspects of the BS model have been largely studied; see, e.g., Johnson et al. (1995, pp. 651–63) for a general scope, and Owen and Padgett (2000), Guiraud et al. (2009) and Ho (2012) for applications in engineering. Novel practical applications of this model have been recently considered in areas different from engineering, which include business, environment and medicine; see, e.g., Leiva et al. (2007, 2008, 2009, 2010, 2011, 2012), Barros et al. (2008), Bhatti (2010), Vilca et al. (2010, 2011), Paula et al. (2012), Ferreira et al. (2012) and Marchant et al. (2013a, 2013b). Some extensions and generalizations of the BS distribution are attributed to Díaz-García and Leiva (2005), Vilca and Leiva (2006), Gómez et al. (2009), Kotz et al. (2010), Balakrishnan et al. (2011), Caro-Lopera et al. (2012) and Fierro et al. (2013).

A random variable Y follows a BS distribution with shape parameter α > 0 and scale parameter ϱ > 0, denoted by Y ∼ BS(α, ϱ), if its probability density function (PDF) takes the form

f (y; α, ϱ) = \frac{\exp (α^{- 2})}{2 α \sqrt{2 π ϱ}} y^{- \frac{3}{2}} [y + ϱ] \exp (- \frac{1}{2 α^{2}} [\frac{y}{ϱ} + \frac{ϱ}{y}]), y > 0.

In addition, ϱ is the median of the distribution of Y, 1/Y ∼ BS(α,1/ϱ) and bY ∼ BS (α,bϱ), if b > 0. The mean of Y is E[Y] = ϱ[1 + α²/2] and its variance is Var[Y] = [αϱ]² [1 + 5α²/4].

Statistical modelling based on BS distributions has considerably attracted the attention of a number of researchers. Rieck and Nedelman (1991) were the pioneers in this line. They defined that if Y ∼ BS(α, ϱ), then V = log(Y) follows a log-BS distribution with shape parameter α and location parameter γ = log(ϱ) ∈ ℝ, denoted by V ∼ log-BS(α, γ). They proposed log-linear regression models based on the log-BS distribution and applied them to fatigue data, whereas Galea et al. (2004) and Xie and Wei (2007) developed several diagnostic tools for this model. Leiva et al. (2007) formulated BS log-linear regression models and their diagnostics, and applied them to survival data of patients with blood cell cancer. Barros et al. (2008) assumed that the cumulative damage follows a Student-t distribution and then introduced BS-t log-linear regression models and their diagnostics, and applied them to survival data of patients with lung cancer. They showed that the maximum likelihood (ML) estimates for the parameters of this model are robust against outliers, which was conjectured by deviance and martingale residuals. This conjecture was mathematically formalized by Paula et al. (2012), who in this work applied BS-t log-linear models to insurance data. Lemonte and Cordeiro (2009) proposed BS non-linear regression models, generalizing the proposal of Rieck and Nedelman (1991). Lemonte and Patriota (2011) and Vanegas et al. (2012) performed diagnostic procedures for these non-linear models. For all of these regression models, the original response must be transformed to a logarithmic scale, which could provoke a reduction of the power of the study and difficulties of interpretation; see Huang and Qu (2006). In addition, although in this scale one is modelling the mean, say γ = log(ϱ), in the original scale one is modelling ϱ = exp(γ), which, in the BS case, as aforementioned, corresponds to the median. 
Modelling the median could make sense when lifetime data are analyzed, but for data from other areas, such as business, it makes more statistical sense to model the mean. Recently, Santos-Neto et al. (2012) proposed new parameterizations for the BS distribution. One of them indexes the BS distribution by its mean, in the manner of Ferrari and Cribari-Neto (2004), which is an idea similar to that of generalized linear models (GLM), but based on the BS distribution, which does not belong to the exponential family.

The objective of this article is to develop a new approach based on BS regression models following the line of GLM. In this context, the mean response is related to the linear predictor through a link function. This linear predictor encompasses regressors and unknown parameters. Unlike the existing BS regression models at present, the new approach that we propose models the data in their original scale, because it is based on the population mean.

The remainder of the article unfolds as follows. In Section 2, we introduce a reparameterization of the BS distribution that allows us to postulate a new approach. In Section 3, we define new BS regression models, calculate their corresponding score function and develop an iterative process to estimate their parameters. In Section 4, we perform a diagnostic analysis based on normal curvatures useful for a local influence analysis, whereas in Section 5, we analyze the concept of generalized leverage (GL) for the new approach. In Section 6, we propose four types of residuals for the new BS regression models. In Section 7, we carry out Monte Carlo simulations to study the behaviour of these residuals. In Section 8, we illustrate the potential of the new approach through an application with real data. Finally, in Section 9, we sketch some conclusions obtained throughout this work.

2 A BS distribution parameterized by its mean

One new parameterization of the BS distribution proposed by Santos-Neto et al. (2012) is indexed by the parameters µ and δ, where µ > 0 is a scale parameter and the mean of the distribution, whereas δ > 0 is a shape and precision parameter. Based on this parameterization, the PDF of Y is given by

f (y; μ, δ) = \frac{\exp (δ / 2) \sqrt{δ + 1}}{4 y^{3 / 2} \sqrt{π μ}} [y + \frac{δ μ}{δ + 1}] \exp (- \frac{δ}{4} [\frac{y [δ + 1]}{δ μ} + \frac{δ μ}{y [δ + 1]}]), y > 0.

(2.1)

In this case, we use the notation Y ∼ BS(µ, δ). The mean and variance of Y are given by E[Y] = µ and Var[Y] = µ²/φ, respectively, where φ = [δ + 1]²/[2δ + 5]. Hence, as mentioned, δ can be interpreted as a precision parameter: for a fixed value of µ, the variance of Y tends to zero as δ → ∞, whereas Var[Y] → 5µ² as δ → 0. Note that Var[Y] = µ²/φ is similar to the variance function of the gamma distribution, for which the variance is also quadratic in the mean. It is also possible to show that bY ∼ BS(bµ, δ), with b > 0, and 1/Y ∼ BS(µ*, δ), where µ* = [δ + 1]²/[δ²µ], which follows from 1/Y ∼ BS(α, 1/ϱ) in the original parameterization.
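The moments stated above can be checked numerically. The following is a minimal sketch, not part of the authors' routines: it evaluates the density (2.1) and draws from BS(µ, δ) through the usual normal representation of the BS distribution, assuming the correspondence α = √(2/δ) and ϱ = δµ/[δ + 1] obtained by matching (2.1) with the classical density. The function names, the seed and the sample size are illustrative choices.

```python
import math
import random


def bs_pdf(y, mu, delta):
    """Density (2.1) of Y ~ BS(mu, delta), evaluated at y > 0."""
    rho = delta * mu / (delta + 1.0)  # scale (median) of the classical form
    const = math.exp(delta / 2.0) * math.sqrt(delta + 1.0)
    const /= 4.0 * y ** 1.5 * math.sqrt(math.pi * mu)
    return const * (y + rho) * math.exp(-(delta / 4.0) * (y / rho + rho / y))


def bs_variance(mu, delta):
    """Var[Y] = mu^2 / phi, with phi = (delta + 1)^2 / (2 delta + 5)."""
    phi = (delta + 1.0) ** 2 / (2.0 * delta + 5.0)
    return mu ** 2 / phi


def bs_sample(mu, delta, rng):
    """One draw from BS(mu, delta) built from a standard normal variate."""
    alpha = math.sqrt(2.0 / delta)        # assumed link to the classical shape
    rho = delta * mu / (delta + 1.0)      # assumed link to the classical scale
    w = alpha * rng.gauss(0.0, 1.0) / 2.0
    return rho * (w + math.sqrt(w * w + 1.0)) ** 2


rng = random.Random(2014)  # arbitrary seed, for reproducibility only
draws = [bs_sample(2.0, 2.0, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(sample_mean, sample_var, bs_variance(2.0, 2.0))
```

With µ = 2 and δ = 2, the empirical mean and variance should be close to µ = 2 and µ²/φ = 4, up to Monte Carlo error.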

Figure 1

Source: Authors’ own.

Figure 1 displays some shapes of the PDF of Y ∼ BS(µ, δ) given in (2.1). From Figure 1(a), note that the parameter δ controls the skewness and kurtosis of the distribution. It is also possible to note that, as δ increases, the corresponding PDF is more concentrated around the mean and, therefore, the variability decreases. From Figure 1(b), note that the parameter µ alters the scale of the distribution and, as it increases, the variability increases too. Finally, from Figure 1(c), note that the variance tends to 20, when µ = 2 and δ → 0, whereas it tends to zero, as δ → ∞.

3 Modelling and estimation

Let Y₁, …, Y_n be independent random variables, where Y_i ∼ BS(µ_i, δ), for i = 1, …, n, and y = (y₁, …, y_n)^T their corresponding observations. Then, we define a statistical model based on (2.1) by the systematic component

h (μ_{i}) = η_{i} = x_{i}^{⊤} β, i = 1, ..., n,

(3.1)

where β = (β₁, …, β_p)^T, for p < n, is a vector of unknown parameters to be estimated, and $x_{i}^{⊤} = (1, x_{i 2}, ..., x_{i p})$ contains the values of the p regressors, such that $μ_{i} = h^{- 1} (x_{i}^{⊤} β)$, with h⁻¹ being the inverse function of h. In the model given in (3.1), the link function h: ℝ → ℝ⁺ is strictly monotone, positive and at least twice differentiable; for example, h(µ) = log(µ) or $h (μ) = \sqrt{μ}$.

We have that Var[Y_i] is a function of µ_i and, consequently, of the regressors x_i. Then, because we are modelling the mean based on a particular structure, we are also modelling the variance, since $Var [Y_{i}] = μ_{i}^{2} / ϕ.$ Therefore, situations where a non-constant variance is present can be analyzed by using the model given in (3.1).

The log-likelihood function of the model given in (3.1) for θ = (β^T, δ)^T is $ℓ (θ; y) = ℓ (θ) = \sum_{i = 1}^{n} ℓ_{i} (μ_{i}, δ; y_{i}),$ where

\begin{array}{l} ℓ_{i} (μ_{i}, δ; y_{i}) = ℓ_{i} (μ_{i}, δ) = \frac{δ}{2} - \frac{\log (16 π)}{2} - \frac{1}{2} \log (\frac{[δ + 1] y_{i}^{3} μ_{i}}{{[δ y_{i} + y_{i} + δ μ_{i}]}^{2}}) \\ - \frac{y_{i} [δ + 1]}{4 μ_{i}} - \frac{δ^{2} μ_{i}}{4 [δ + 1] y_{i}} . \end{array}

(3.2)
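As a consistency check, the contribution (3.2) should coincide with the logarithm of the density (2.1). The sketch below compares the two and evaluates the full log-likelihood under a log link; the regressor values and responses are hypothetical and serve only as an illustration.

```python
import math


def log_pdf(y, mu, delta):
    """Logarithm of the density (2.1), taken factor by factor."""
    rho = delta * mu / (delta + 1.0)
    return (delta / 2.0 + 0.5 * math.log(delta + 1.0) - math.log(4.0)
            - 1.5 * math.log(y) - 0.5 * math.log(math.pi * mu)
            + math.log(y + rho) - (delta / 4.0) * (y / rho + rho / y))


def loglik_i(y, mu, delta):
    """Contribution (3.2) of a single observation."""
    s = delta * y + y + delta * mu
    return (delta / 2.0 - math.log(16.0 * math.pi) / 2.0
            - 0.5 * math.log((delta + 1.0) * y ** 3 * mu / s ** 2)
            - y * (delta + 1.0) / (4.0 * mu)
            - delta ** 2 * mu / (4.0 * (delta + 1.0) * y))


def loglik(beta, delta, X, y):
    """Log-likelihood of model (3.1) under the log link, mu_i = exp(x_i' beta)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        mu_i = math.exp(sum(x * b for x, b in zip(x_i, beta)))
        total += loglik_i(y_i, mu_i, delta)
    return total


X = [[1.0, 0.1], [1.0, 0.4], [1.0, 0.9]]  # hypothetical regressor values
y = [1.3, 0.9, 2.2]                       # hypothetical responses
print(loglik([0.2, 0.5], 2.0, X, y))
```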

The score functions for β_j, with j = 1, …, p, and δ are, respectively, given by

{\dot{ℓ}}_{β_{j}} = \frac{\partial ℓ (θ)}{\partial β_{j}} = \sum_{i = 1}^{n} \underbrace{[- \frac{1}{2 μ_{i}} + \frac{δ}{[δ y_{i} + y_{i} + δ μ_{i}]} + \frac{y_{i} [δ + 1]}{4 μ_{i}^{2}} - \frac{δ^{2}}{4 y_{i} [δ + 1]}]}_{z_{i}} \underbrace{\frac{1}{h^{'} (μ_{i})}}_{a_{i}} x_{i j} = \sum_{i = 1}^{n} z_{i} a_{i} x_{i j}

and

{\dot{ℓ}}_{δ} = \frac{\partial ℓ (θ)}{\partial δ} = \sum_{i = 1}^{n} \underbrace{[\frac{1}{2} - \frac{1}{2 [δ + 1]} + \frac{[y_{i} + μ_{i}]}{[δ y_{i} + y_{i} + δ μ_{i}]} - \frac{y_{i}}{4 μ_{i}} - \frac{δ [δ + 2] μ_{i}}{4 {[δ + 1]}^{2} y_{i}}]}_{b_{i}} = \sum_{i = 1}^{n} b_{i},

where h′ is the derivative of h. Then, in matrix form, we have

{\dot{ℓ}}_{β} = ({\dot{ℓ}}_{β_{j}}) = X^{⊤} D (a) z \quad \text{and} \quad {\dot{ℓ}}_{δ} = tr (D (b)),

(3.3)

where X = (x₁, …, x_n)^T is the n × p model matrix, with x_i given in (3.1), for i = 1, …, n, z = (z₁, …, z_n)^T and D denotes the diagonalization operator of a vector, such that D(a) = diag(a) and D(b) = diag(b) are n × n matrices, with a = (a₁, …, a_n)^T and b = (b₁, …, b_n)^T. Thus, the score vector is ${\dot{ℓ}}_{θ} = {({\dot{ℓ}}_{β}^{⊤}, {\dot{ℓ}}_{δ})}^{⊤}.$
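The quantities z_i and b_i in (3.3) are the derivatives of (3.2) with respect to µ_i and δ, which can be verified by finite differences. The sketch below does so and assembles the score vector for the log link, for which a_i = 1/h′(µ_i) = µ_i. Function names and the evaluation point are illustrative assumptions.

```python
import math


def loglik_i(y, mu, delta):
    """Contribution (3.2) of a single observation."""
    s = delta * y + y + delta * mu
    return (delta / 2.0 - math.log(16.0 * math.pi) / 2.0
            - 0.5 * math.log((delta + 1.0) * y ** 3 * mu / s ** 2)
            - y * (delta + 1.0) / (4.0 * mu)
            - delta ** 2 * mu / (4.0 * (delta + 1.0) * y))


def z_score(y, mu, delta):
    """z_i in (3.3): derivative of (3.2) with respect to mu_i."""
    s = delta * y + y + delta * mu
    return (-1.0 / (2.0 * mu) + delta / s
            + y * (delta + 1.0) / (4.0 * mu ** 2)
            - delta ** 2 / (4.0 * y * (delta + 1.0)))


def b_score(y, mu, delta):
    """b_i in (3.3): derivative of (3.2) with respect to delta."""
    s = delta * y + y + delta * mu
    return (0.5 - 1.0 / (2.0 * (delta + 1.0)) + (y + mu) / s
            - y / (4.0 * mu)
            - delta * (delta + 2.0) * mu / (4.0 * (delta + 1.0) ** 2 * y))


def score(beta, delta, X, y):
    """Score vector (X' D(a) z, tr D(b)) under the log link, where a_i = mu_i."""
    p = len(beta)
    u_beta = [0.0] * p
    u_delta = 0.0
    for x_i, y_i in zip(X, y):
        mu_i = math.exp(sum(x * b for x, b in zip(x_i, beta)))
        z_i = z_score(y_i, mu_i, delta)
        for j in range(p):
            u_beta[j] += z_i * mu_i * x_i[j]
        u_delta += b_score(y_i, mu_i, delta)
    return u_beta + [u_delta]


# Central finite differences at an arbitrary point (y, mu, delta).
y0, mu0, d0, h = 1.5, 2.0, 3.0, 1e-6
num_z = (loglik_i(y0, mu0 + h, d0) - loglik_i(y0, mu0 - h, d0)) / (2.0 * h)
num_b = (loglik_i(y0, mu0, d0 + h) - loglik_i(y0, mu0, d0 - h)) / (2.0 * h)
print(num_z - z_score(y0, mu0, d0), num_b - b_score(y0, mu0, d0))
```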

In order to estimate the model parameters by the ML method, we solve the equation $\dot{ℓ_{θ}} = 0 .$ However, no closed-form expressions for the ML estimates are available. Then, an iterative method for non-linear optimization is needed, such as the BHHH, Fisher scoring, Newton or quasi-Newton (BFGS) algorithms. In this work, we use the Fisher scoring method.
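To illustrate what solving the score equation numerically involves, consider the simplest special case: an intercept-only model with log link, log(µ) = β₁, and δ treated as known. The sketch below is not the Fisher scoring scheme adopted in this work (its information matrix is given in the Appendix); it swaps in plain bisection on the score, which is enough for a one-dimensional problem. The data values are hypothetical.

```python
import math


def z_score(y, mu, delta):
    """z_i in (3.3): derivative of (3.2) with respect to mu_i."""
    s = delta * y + y + delta * mu
    return (-1.0 / (2.0 * mu) + delta / s
            + y * (delta + 1.0) / (4.0 * mu ** 2)
            - delta ** 2 / (4.0 * y * (delta + 1.0)))


def score_beta(beta1, y, delta):
    """Score for the intercept under the log link (a_i = mu, x_i = 1)."""
    mu = math.exp(beta1)
    return sum(z_score(y_i, mu, delta) * mu for y_i in y)


def solve_intercept(y, delta, lo=-10.0, hi=10.0, tol=1e-12):
    """Root of the score by bisection; requires a sign change on [lo, hi]."""
    if not (score_beta(lo, y, delta) > 0.0 > score_beta(hi, y, delta)):
        raise ValueError("score does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score_beta(mid, y, delta) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


y_obs = [0.8, 1.1, 1.9, 2.4, 3.0, 1.3]  # hypothetical responses
delta_known = 2.0                        # precision treated as known
beta_hat = solve_intercept(y_obs, delta_known)
mu_hat = math.exp(beta_hat)
print(beta_hat, mu_hat, score_beta(beta_hat, y_obs, delta_known))
```

Note that the resulting estimate of µ need not equal the sample mean, since the BS distribution does not belong to the exponential family.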

The algorithm for estimating θ is given by

θ^{(m + 1)} = θ^{(m)} + {(K_{θ θ}^{- 1})}^{(m)} {\dot{ℓ}}_{θ}^{(m)}, m = 0, 1, 2, ...,

(3.4)

where K_θθ is the expected Fisher information matrix given in (A6) (see Appendix). We can rewrite K_θθ as K_θθ $= {\tilde{X}}^{⊤} \tilde{W} \tilde{X},$ where

\tilde{X} = [\begin{matrix} X & 0 \\ 0 & 1 \end{matrix}] \quad \text{and} \quad \tilde{W} = [\begin{matrix} D (v) & D (a) s \\ s^{⊤} D (a) & tr (D (u)) \end{matrix}],

with a and X as given in (3.3) and s, u and v as in (A6) (see Appendix).

Therefore, the inverse of K_θθ can be expressed by $K_{θ θ}^{- 1} = {({\tilde{X}}^{⊤} \tilde{W} \tilde{X})}^{- 1},$ so that (3.4) now can be written as

θ^{(m + 1)} = θ^{(m)} + {({\tilde{X}}^{⊤} {\tilde{W}}^{(m)} \tilde{X})}^{- 1} [\begin{matrix} X^{⊤} D {(a)}^{(m)} z^{(m)} \\ tr {(D (b))}^{(m)} \end{matrix}]

= θ^{(m)} + {({\tilde{X}}^{⊤} {\tilde{W}}^{(m)} \tilde{X})}^{- 1} {\tilde{X}}^{⊤} [\begin{matrix} D {(a)}^{(m)} & 0 \\ 0 & tr {(D (b))}^{(m)} \end{matrix}] [\begin{matrix} z^{(m)} \\ 1 \end{matrix}]

= {({\tilde{X}}^{⊤} {\tilde{W}}^{(m)} \tilde{X})}^{- 1} {\tilde{X}}^{⊤} {\tilde{W}}^{(m)} z^{* (m)}, m = 0, 1, 2, ...,

(3.5)

where b is as given in (3.3) and

z^{* (m)} = \tilde{X} θ^{(m)} + {({\tilde{W}}^{(m)})}^{- 1} [\begin{matrix} D {(a)}^{(m)} & 0 \\ 0 & tr {(D (b))}^{(m)} \end{matrix}] [\begin{matrix} z^{(m)} \\ 1 \end{matrix}] .

Note that θ^{(m + 1)} given in (3.5) takes the form of a weighted least squares estimate, where z* is a modified response that is updated at each step.

Under some regularity conditions (see Cox and Hinkley, 1974), the asymptotic distribution of $\hat{θ}$ is given by $\hat{θ} \dot{\sim} N_{p + 1} (θ, Σ_{θ}),$ where $\dot{\sim}$ means ‘approximately distributed’ and $Σ_{θ}$ is the asymptotic variance–covariance matrix of $\hat{θ},$ which can be obtained from the inverse expected Fisher information matrix $K_{θ θ}^{- 1}$ and estimated by replacing θ by $\hat{θ}.$ Thus, an approximate 100 × [1 – ξ]% confidence region for θ is given by ${[\hat{θ} - θ]}^{⊤} {\hat{Σ}}_{θ}^{- 1} [\hat{θ} - θ] \leq χ_{1 - ξ}^{2} (p + 1),$ for θ ∈ ℝ^{p + 1}, where $χ_{1 - ξ}^{2} (p + 1)$ is the (1 – ξ)th quantile of the chi-squared distribution with p + 1 degrees of freedom and ${\hat{Σ}}_{θ}$ is a consistent estimator of Σ_θ. Furthermore, it is possible to construct asymptotic 100 × [1 – ξ]% confidence bands for the mean response $μ (x_{pred}) = h^{- 1} (x_{pred}^{⊤} β),$ for all x_pred ∈ ℝ^p, where x_pred is an arbitrary p × 1 vector. We have that the asymptotic distribution of $\hat{β}$ is given by $\hat{β} \dot{\sim} N_{p} (β, Σ_{β}),$ where Σ_β is the asymptotic variance–covariance matrix of $\hat{β},$ which can be obtained appropriately from Σ_θ. Then, an approximate 100 × [1 – ξ]% confidence band for µ(x_pred) is

[h^{- 1} (x_{pred}^{⊤} \hat{β} - \sqrt{χ_{1 - ξ}^{2} (p)} {[x_{pred}^{⊤} {\hat{Σ}}_{β} x_{pred}]}^{\frac{1}{2}}), h^{- 1} (x_{pred}^{⊤} \hat{β} + \sqrt{χ_{1 - ξ}^{2} (p)} {[x_{pred}^{⊤} {\hat{Σ}}_{β} x_{pred}]}^{\frac{1}{2}})],

(3.6)

where ${\hat{Σ}}_{β} = {(X^{⊤} \hat{V} X)}^{- 1},$ with $\hat{V} = D (\hat{v}) - D (\hat{a}) \hat{s} {[tr (D (\hat{u}))]}^{- 1} {\hat{s}}^{⊤} D (\hat{a}),$ x_pred ∈ ℝ^p and 0 < ξ < 1, with a as given in (3.3) and s, u and v as in (A6) (see Appendix). For more details about confidence bands for regression models, see, e.g., Liu et al. (2008).
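The band in (3.6) is straightforward to evaluate once the estimate of β and its estimated covariance matrix are available. The sketch below assumes the log link, so that h⁻¹ = exp, and restricts itself to p = 2, for which the chi-squared quantile has the closed form −2 log(ξ). The estimates and the covariance matrix are hypothetical numbers, not results from the article.

```python
import math


def mean_band(x_pred, beta_hat, cov_beta, xi=0.05):
    """Band (3.6) for mu(x_pred) under the log link, for p = 2 only."""
    p = len(beta_hat)
    if p != 2:
        raise ValueError("closed-form chi-squared quantile used here needs p = 2")
    eta = sum(x * b for x, b in zip(x_pred, beta_hat))
    quad = sum(x_pred[r] * cov_beta[r][c] * x_pred[c]
               for r in range(p) for c in range(p))
    chi2_quantile = -2.0 * math.log(xi)  # exact (1 - xi) quantile with 2 df
    half_width = math.sqrt(chi2_quantile) * math.sqrt(quad)
    return math.exp(eta - half_width), math.exp(eta + half_width)


beta_hat = [0.2, 0.5]                        # hypothetical estimates
cov_beta = [[0.04, -0.03], [-0.03, 0.09]]    # hypothetical covariance matrix
lower, upper = mean_band([1.0, 0.5], beta_hat, cov_beta)
print(lower, upper)
```

Because exp is increasing, the band is obtained by transforming the endpoints of the band for the linear predictor; for a decreasing inverse link the two endpoints would swap.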

4 Local influence

The likelihood displacement is defined as $LD (ω) = 2 [ℓ (\hat{θ}) - ℓ ({\hat{θ}}_{ω})],$ where ${\hat{θ}}_{ω}$ is the ML estimate of θ for a perturbed model and ω = (ω₁, …, ω_n)^T is a perturbation vector. Cook (1986) proposed to study the local behaviour of LD(ω) around ω₀, which represents the non-perturbed vector, such that LD(ω₀) = 0. The normal curvature for $\hat{θ}$ in the arbitrary direction l, with $‖ l ‖ = 1$, is given by $C_{l} (\hat{θ}) = 2 | l^{⊤} Δ^{⊤} {\ddot{ℓ}}_{\hat{θ} \hat{θ}}^{- 1} Δ l |,$ where ${\ddot{ℓ}}_{\hat{θ} \hat{θ}}$ is the Hessian matrix of ℓ(θ) evaluated at $\hat{θ}$ and Δ is a (p + 1) × n perturbation matrix, with elements

Δ_{i j} = \left. \frac{\partial^{2} ℓ_{ω} (θ)}{\partial θ_{i} \partial ω_{j}} \right|_{θ = \hat{θ}, ω = ω_{0}}, i = 1, ..., p + 1, j = 1, ..., n,

and _ω^( θ ) being the log-likelihood function corresponding to the model perturbed by ω .

For the model given in (3.1), the elements of ${\ddot{ℓ}}_{θ θ}$ are ${\ddot{ℓ}}_{β β} = X^{⊤} D (c) X,$ ${\ddot{ℓ}}_{β δ} = {\ddot{ℓ}}_{δ β}^{⊤} = X^{⊤} D (a) m$ and ${\ddot{ℓ}}_{δ δ} = tr (D (d)),$ with c and m as given in (A1) and (A2), respectively (see Appendix). We consider the direction l_max as the eigenvector corresponding to the largest eigenvalue of the matrix

\ddot{F} = - Δ^{⊤} {\ddot{ℓ}}_{\hat{θ} \hat{θ}}^{- 1} Δ;

(4.1)

see Cook (1986). The index plot of l _max may detect those observations that are potentially influential on $\hat{θ} .$

If our interest lies only in the vector $\hat{β},$ the normal curvature in the direction l is expressed by $C_{l} (\hat{β}) = 2 | l^{⊤} Δ^{⊤} [{\ddot{ℓ}}_{\hat{θ} \hat{θ}}^{- 1} - {\ddot{ℓ}}_{1}] Δ l |,$ where the (p + 1) × (p + 1) matrix ${\ddot{ℓ}}_{1}$ is given by

{\ddot{ℓ}}_{1} = [\begin{matrix} 0 & 0 \\ 0 & {\ddot{ℓ}}_{\hat{δ} \hat{δ}}^{- 1} \end{matrix}] .

If our interest lies in studying the local influence on $\hat{δ}$, the normal curvature in the direction of the vector l is given by $C_{l} (\hat{δ}) = 2 | l^{⊤} Δ^{⊤} [{\ddot{ℓ}}_{\hat{θ} \hat{θ}}^{- 1} - {\ddot{ℓ}}_{2}] Δ l |,$ where the (p + 1) × (p + 1) matrix ${\ddot{ℓ}}_{2}$ is given by

{\ddot{ℓ}}_{2} = [\begin{matrix} {\ddot{ℓ}}_{\hat{β} \hat{β}}^{- 1} & 0 \\ 0 & 0 \end{matrix}] .

Another important direction is l = e_in, where e_in is an n × 1 vector of zeros, with one at the ith position. In that case, the normal curvature, called total local influence of the ith observation, is given by $C_{i} = 2 | e_{i n}^{⊤} \ddot{F} e_{i n} | = 2 | {\ddot{F}}_{i i} |,$ where ${\ddot{F}}_{i i}$ is the ith diagonal element of $\ddot{F}$ given in (4.1). Lesaffre and Verbeke (1998) suggested paying special attention to those observations with $C_{i} > 2 \bar{C}$, where $\bar{C} = \sum_{i = 1}^{n} C_{i} / n.$
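The total local influence and the benchmark of Lesaffre and Verbeke (1998) can be sketched as follows for the smallest possible case, an intercept-only model with θ = (β₁, δ)^T, so that the Hessian is 2 × 2 and can be inverted in closed form. The perturbation matrix and Hessian below are hypothetical numbers chosen for illustration; in practice they would come from the expressions of this section evaluated at the ML estimates.

```python
def total_local_influence(delta_mat, hessian):
    """C_i = 2 |F_ii| with F = -Delta' H^{-1} Delta, for a 2 x n Delta."""
    (h11, h12), (h21, h22) = hessian
    det = h11 * h22 - h12 * h21
    inv = [[h22 / det, -h12 / det], [-h21 / det, h11 / det]]
    n = len(delta_mat[0])
    curvatures = []
    for i in range(n):
        d1, d2 = delta_mat[0][i], delta_mat[1][i]
        quad = (d1 * (inv[0][0] * d1 + inv[0][1] * d2)
                + d2 * (inv[1][0] * d1 + inv[1][1] * d2))
        curvatures.append(2.0 * abs(-quad))
    return curvatures


def flag_influential(curvatures):
    """Indices (0-based) of observations with C_i > 2 * mean(C)."""
    c_bar = sum(curvatures) / len(curvatures)
    return [i for i, c in enumerate(curvatures) if c > 2.0 * c_bar]


# Hypothetical inputs: rows of Delta correspond to beta_1 and delta.
delta_mat = [[0.3, -0.2, 0.1, 2.5, -0.4],
             [0.1, 0.05, -0.1, -1.2, 0.2]]
hessian = [[-12.0, 1.5], [1.5, -3.0]]  # negative definite, as at a maximum

c_values = total_local_influence(delta_mat, hessian)
print(c_values, flag_influential(c_values))
```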

4.1 Case–weights perturbation

Let ω = (ω₁, …, ω_n)^T be a weight vector. In this case, the perturbed log-likelihood function is given by $ℓ_{ω} (θ) = \sum_{i = 1}^{n} ω_{i} ℓ_{i} (μ_{i}, δ),$ where ℓ_i(µ_i, δ) is defined in (3.2), with 0 ≤ ω_i ≤ 1, for i = 1, …, n, and ω₀ = (1, …, 1)^T. Hence, the perturbation matrix is expressed as

Δ = [\begin{matrix} X^{⊤} D (\hat{a}) D (\hat{z}) \\ {\hat{b}}^{⊤} \end{matrix}],

where a , b and z are given in (3.3).

4.2 Response perturbation

Consider now an additive perturbation on the ith response by making y_i(ω_i) = y_i + ω_i s(y_i), where $s (y_{i}) = \sqrt{{\hat{μ}}_{i}^{2} / \hat{ϕ}}$ and ω_i ∈ ℝ, for i = 1, …, n. Then, under the scheme of response perturbation, the log-likelihood function is given by $ℓ_{ω} (θ) = \sum_{i = 1}^{n} ℓ_{ω_{i}} (μ_{i}, δ),$ where

ℓ_{ω_{i}} (μ_{i}, δ) = \frac{δ}{2} - \frac{1}{2} \log (δ + 1) - \frac{1}{2} \log (16 π) - \frac{1}{2} \log (μ_{i})

- \frac{3}{2} \log (y_{i} (ω_{i})) + \log (δ y_{i} (ω_{i}) + y_{i} (ω_{i}) + δ μ_{i}) - \frac{y_{i} (ω_{i}) [δ + 1]}{4 μ_{i}} - \frac{δ^{2} μ_{i}}{4 y_{i} (ω_{i}) [δ + 1]}

and ω ₀ = (0, …, 0)^T. Hence, the perturbation matrix here takes the form

Δ = [\begin{matrix} X^{⊤} D (\hat{a}) D (\hat{ψ}) D (\hat{ϑ}) \\ {\hat{τ}}^{⊤} D (\hat{ϑ}) \end{matrix}],

where $\hat{ϑ} = {({\hat{ϑ}}_{1}, ..., {\hat{ϑ}}_{n})}^{⊤},$ $\hat{ψ} = {({\hat{ψ}}_{1}, ..., {\hat{ψ}}_{n})}^{⊤},$ and $\hat{τ} = {({\hat{τ}}_{1}, ..., {\hat{τ}}_{n})}^{⊤},$ with ${\hat{ϑ}}_{i} = s (y_{i}),$

{\hat{ψ}}_{i} = - \frac{\hat{δ} [\hat{δ} + 1]}{{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]}^{2}} + \frac{[\hat{δ} + 1]}{4 {\hat{μ}}_{i}^{2}} + \frac{{\hat{δ}}^{2}}{4 [\hat{δ} + 1] y_{i}^{2}},

and

{\hat{τ}}_{i} = - \frac{{\hat{μ}}_{i}}{{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]}^{2}} - \frac{1}{4 {\hat{μ}}_{i}} + \frac{\hat{δ} [\hat{δ} + 2] {\hat{μ}}_{i}}{4 y_{i}^{2} {[\hat{δ} + 1]}^{2}}, i = 1, ..., n .
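Since the perturbed response enters the log-likelihood only through y_i(ω_i), the quantities ψ_i and τ_i are the derivatives of z_i and b_i in (3.3) with respect to y_i, and the factor ϑ_i = s(y_i) comes from the chain rule. The sketch below checks both expressions by central finite differences at an arbitrary point; names and numerical values are illustrative.

```python
def z_score(y, mu, delta):
    """z_i in (3.3): derivative of (3.2) with respect to mu_i."""
    s = delta * y + y + delta * mu
    return (-1.0 / (2.0 * mu) + delta / s
            + y * (delta + 1.0) / (4.0 * mu ** 2)
            - delta ** 2 / (4.0 * y * (delta + 1.0)))


def b_score(y, mu, delta):
    """b_i in (3.3): derivative of (3.2) with respect to delta."""
    s = delta * y + y + delta * mu
    return (0.5 - 1.0 / (2.0 * (delta + 1.0)) + (y + mu) / s
            - y / (4.0 * mu)
            - delta * (delta + 2.0) * mu / (4.0 * (delta + 1.0) ** 2 * y))


def psi(y, mu, delta):
    """psi_i of Section 4.2, i.e. the derivative of z_i with respect to y_i."""
    s = delta * y + y + delta * mu
    return (-delta * (delta + 1.0) / s ** 2
            + (delta + 1.0) / (4.0 * mu ** 2)
            + delta ** 2 / (4.0 * (delta + 1.0) * y ** 2))


def tau(y, mu, delta):
    """tau_i of Section 4.2, i.e. the derivative of b_i with respect to y_i."""
    s = delta * y + y + delta * mu
    return (-mu / s ** 2 - 1.0 / (4.0 * mu)
            + delta * (delta + 2.0) * mu / (4.0 * y ** 2 * (delta + 1.0) ** 2))


y0, mu0, d0, h = 1.5, 2.0, 3.0, 1e-6  # arbitrary evaluation point
num_psi = (z_score(y0 + h, mu0, d0) - z_score(y0 - h, mu0, d0)) / (2.0 * h)
num_tau = (b_score(y0 + h, mu0, d0) - b_score(y0 - h, mu0, d0)) / (2.0 * h)
print(num_psi - psi(y0, mu0, d0), num_tau - tau(y0, mu0, d0))
```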

4.3 Regressor perturbation

Consider now an additive perturbation on a particular continuous regressor, namely x_t , for t = 1, …, p, by making x_it (ω_i) = x_it + ω_i s_x, where s_x is a scale factor, which can be the estimated standard deviation (SD) of x_t , and ω_i ∈ ℝ, for i = 1, …, n. Then, under the scheme of regressor perturbation, the log-likelihood function is given by $ℓ_{ω} (θ) = \sum_{i = 1}^{n} ℓ_{ω_{i}} (μ_{i}, δ),$ where

ℓ_{ω_{i}} (μ_{i}, δ) = \frac{δ}{2} - \frac{1}{2} \log (16 π) - \frac{1}{2} \log (\frac{[δ + 1] y_{i}^{3} μ_{i} (ω_{i})}{{[δ y_{i} + y_{i} + δ μ_{i} (ω_{i})]}^{2}}) - \frac{y_{i} [δ + 1]}{4 μ_{i} (ω_{i})} - \frac{δ^{2} μ_{i} (ω_{i})}{4 [δ + 1] y_{i}},

and ω₀ = (0, …, 0)^T, with $μ_{i} (ω_{i}) = h^{- 1} (x_{i}^{⊤} (ω_{i}) β)$ and $x_{i} (ω_{i}) = {(1, x_{i 2}, ..., x_{i t} (ω_{i}), ..., x_{i p})}^{⊤}.$ Hence, the perturbation matrix assumes the form

Δ = [\begin{matrix} Δ_{β} \\ Δ_{δ} \end{matrix}],

where Δ_β = (Δ_βij) is a p × n matrix with elements, when j ≠ t, given by

Δ_{β_{i j}} = s_{x} {\hat{β}}_{t} {\hat{a}}^{'}_{i} x_{i j} {\hat{q}}_{i} + s_{x} {\hat{β}}_{t} x_{i j} {\hat{a}}_{i}^{2} [\frac{1}{2 {\hat{μ}}_{i}^{2}} - \frac{{\hat{δ}}^{2}}{{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]}^{2}} - \frac{y_{i} [\hat{δ} + 1]}{2 {\hat{μ}}_{i}^{3}}],

with

{\hat{q}}_{i} = - \frac{1}{2 {\hat{μ}}_{i}} + \frac{\hat{δ}}{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]} + \frac{y_{i} [\hat{δ} + 1]}{4 {\hat{μ}}_{i}^{2}} - \frac{{\hat{δ}}^{2}}{4 y_{i} [\hat{δ} + 1]},

whereas, when j = t, it is given by

Δ_{β_{i t}} = s_{x} {\hat{a}}_{i} {\hat{q}}_{i} + s_{x} {\hat{β}}_{t} {\hat{a}}^{'}_{i} x_{i t} {\hat{q}}_{i} + s_{x} {\hat{β}}_{t} x_{i t} {\hat{a}}_{i}^{2} [\frac{1}{2 {\hat{μ}}_{i}^{2}} - \frac{{\hat{δ}}^{2}}{{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]}^{2}} - \frac{y_{i} [\hat{δ} + 1]}{2 {\hat{μ}}_{i}^{3}}],

and $Δ_{δ} = ({\hat{ζ}}_{1}, ..., {\hat{ζ}}_{n}),$ with

{\hat{ζ}}_{i} = s_{x} {\hat{β}}_{t} {\hat{a}}_{i} [\frac{y_{i}}{{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]}^{2}} + \frac{y_{i}}{4 {\hat{μ}}_{i}^{2}} - \frac{\hat{δ} [\hat{δ} + 2]}{4 y_{i} {[\hat{δ} + 1]}^{2}}],

where a_i is as given in (3.3) and ${a^{'}}_{i}$ is its derivative. In addition, ${\hat{a}}_{i},$ ${\hat{a}}^{'}_{i}$ are a_i, ${a^{'}}_{i}$ evaluated at $\hat{μ}$ and ω = ω ₀.

4.4 Perturbation on the precision parameter

We now modify the precision of the model given in (3.1) as δ_i = δ/ω_i, with ω_i > 0, for i = 1, …, n, so that the precision of the perturbed model is non-constant across observations. In this case, the perturbed log-likelihood function is given by $ℓ_{ω} (θ) = \sum_{i = 1}^{n} ℓ_{ω_{i}} (μ_{i}, δ_{i}),$ where

ℓ_{ω_{i}} (μ_{i}, δ_{i}) = \frac{δ}{2 ω_{i}} - \frac{\log (16 π)}{2} - \frac{\log (δ + ω_{i})}{2} - \frac{\log (ω_{i})}{2} - \frac{\log (μ_{i})}{2}

- \frac{3 \log (y_{i})}{2} + \log (δ y_{i} + y_{i} ω_{i} + δ μ_{i}) - \frac{y_{i} [δ + ω_{i}]}{4 μ_{i} ω_{i}} - \frac{δ^{2} μ_{i}}{4 y_{i} ω_{i} [δ + ω_{i}]}

and ω ₀ = (1, …, 1)^T. Hence, the perturbation matrix is here expressed as

Δ = [\begin{matrix} X^{⊤} D (\hat{a}) D (\hat{ϖ}) \\ {\hat{φ}}^{⊤} \end{matrix}],

where $\hat{ϖ} = {({\hat{ϖ}}_{1}, ..., {\hat{ϖ}}_{n})}^{⊤}$ and $\hat{φ} = {({\hat{φ}}_{1}, ..., {\hat{φ}}_{n})}^{⊤},$ with

{\hat{ϖ}}_{i} = - \frac{\hat{δ} y_{i}}{{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]}^{2}} - \frac{\hat{δ} y_{i}}{4 {\hat{μ}}_{i}^{2}} + \frac{{\hat{δ}}^{2} [\hat{δ} + 2]}{4 y_{i} {[\hat{δ} + 1]}^{2}}

and

{\hat{φ}}_{i} = - \frac{1}{2} + \frac{1}{2 {[\hat{δ} + 1]}^{2}} - \frac{y_{i} [y_{i} + {\hat{μ}}_{i}]}{{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]}^{2}} + \frac{y_{i}}{4 {\hat{μ}}_{i}} + \frac{{\hat{δ}}^{2} {\hat{μ}}_{i} [\hat{δ} + 3]}{4 y_{i} {[\hat{δ} + 1]}^{3}} + \frac{\hat{δ} {\hat{μ}}_{i}}{y_{i} {[\hat{δ} + 1]}^{3}}, i = 1, ..., n .
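The expressions for ϖ_i and φ_i are the mixed second derivatives of the perturbed log-likelihood with respect to (µ_i, ω_i) and (δ, ω_i), evaluated at ω_i = 1. The sketch below codes the perturbed contribution stated in this subsection and compares both closed forms with a mixed central finite difference; the evaluation point and step size are arbitrary choices.

```python
import math


def loglik_w(y, mu, delta, w):
    """Perturbed contribution of Section 4.4, with delta_i = delta / w."""
    return (delta / (2.0 * w) - math.log(16.0 * math.pi) / 2.0
            - math.log(delta + w) / 2.0 - math.log(w) / 2.0
            - math.log(mu) / 2.0 - 3.0 * math.log(y) / 2.0
            + math.log(delta * y + y * w + delta * mu)
            - y * (delta + w) / (4.0 * mu * w)
            - delta ** 2 * mu / (4.0 * y * w * (delta + w)))


def varpi_hat(y, mu, delta):
    """Closed form of the mixed derivative in (mu_i, omega_i) at omega_i = 1."""
    s = delta * y + y + delta * mu
    return (-delta * y / s ** 2 - delta * y / (4.0 * mu ** 2)
            + delta ** 2 * (delta + 2.0) / (4.0 * y * (delta + 1.0) ** 2))


def varphi_hat(y, mu, delta):
    """Closed form of the mixed derivative in (delta, omega_i) at omega_i = 1."""
    s = delta * y + y + delta * mu
    return (-0.5 + 1.0 / (2.0 * (delta + 1.0) ** 2)
            - y * (y + mu) / s ** 2 + y / (4.0 * mu)
            + delta ** 2 * mu * (delta + 3.0) / (4.0 * y * (delta + 1.0) ** 3)
            + delta * mu / (y * (delta + 1.0) ** 3))


def mixed_partial(f, a, b, h=1e-4):
    """Central finite-difference approximation of d^2 f / (da db)."""
    return (f(a + h, b + h) - f(a + h, b - h)
            - f(a - h, b + h) + f(a - h, b - h)) / (4.0 * h * h)


y0, mu0, d0 = 1.5, 2.0, 3.0  # arbitrary evaluation point
num_varpi = mixed_partial(lambda m, w: loglik_w(y0, m, d0, w), mu0, 1.0)
num_varphi = mixed_partial(lambda d, w: loglik_w(y0, mu0, d, w), d0, 1.0)
print(num_varpi - varpi_hat(y0, mu0, d0), num_varphi - varphi_hat(y0, mu0, d0))
```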

5 Generalized leverage

The study of leverage points aims to evaluate the influence of the observed value of the response y_i on its own predicted value ${\hat{y}}_{i};$ see Wei et al. (1998). In the case of the BS regression models given in (3.1), the GL can be obtained similarly as in GLM by

\frac{\partial \hat{y}}{\partial y} = GL_{θ} = H_{θ} {(- {\ddot{ℓ}}_{θ θ})}^{- 1} {\ddot{ℓ}}_{θ y} = \frac{D (\hat{a}) X \hat{A} \hat{B} D (\hat{a}) D (\hat{ν})}{\hat{E}} - {\hat{GL}}_{β},

where y = (y₁, …, y_n)^T, H_θ = ∂µ/∂θ^T, with µ = (µ₁, …, µ_n)^T, ${\ddot{ℓ}}_{θ y} = \partial^{2} ℓ (θ) / \partial θ \partial y^{⊤},$ $\hat{B} = {\hat{λ}}^{⊤} {(D (\hat{a}) D (\hat{ν}))}^{- 1} - {\hat{A}}^{⊤},$ with $\hat{λ} = {({\hat{λ}}_{1}, ..., {\hat{λ}}_{n})}^{⊤},$

{\hat{λ}}_{i} = - \frac{{\hat{μ}}_{i}}{{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]}^{2}} + \frac{1}{4 {\hat{μ}}_{i}^{2}} + \frac{\hat{δ} [\hat{δ} + 2] {\hat{μ}}_{i}}{4 {[\hat{δ} + 1]}^{2} y_{i}^{2}}, i = 1, ..., n,

and A, E defined in (A4) (see Appendix). In addition,

{\hat{GL}}_{β} = D (\hat{a}) X {\hat{M}}^{- 1} X^{⊤} D (\hat{a}) D (\hat{ν})

is the GL for the vector of parameters β , where M is defined in (A4) (see Appendix) and $\hat{ν} = {({\hat{ν}}_{1}, ..., {\hat{ν}}_{n})}^{⊤},$

{\hat{ν}}_{i} = - \frac{\hat{δ} [\hat{δ} + 1]}{{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]}^{2}} + \frac{[\hat{δ} + 1]}{4 {\hat{μ}}_{i}^{2}} + \frac{{\hat{δ}}^{2}}{4 [\hat{δ} + 1] y_{i}^{2}} .

6 Residual analysis

We propose four types of residuals for the BS regression models defined in Section 3.

6.1 First type of residual

This residual is based on the difference $y_{i} - {\hat{μ}}_{i} .$ Then, we have the standardized residual

r_{i}^{(1)} = \frac{y_{i} - {\hat{μ}}_{i}}{\sqrt{\widehat{Var} [Y_{i}]}} = \frac{{\hat{ϕ}}^{1 / 2} [y_{i} - {\hat{μ}}_{i}]}{{\hat{μ}}_{i}}, i = 1, ..., n,

(6.1)

where φ and µ_i are as given in (2.1) and (3.1), respectively.

6.2 Second type of residual

This second residual is based on the work proposed by Jørgensen (1984). Then, we have

r_{i}^{(2)} = {[J_{i} ({\hat{μ}}_{i})]}^{- 1 / 2} k_{i} ({\hat{μ}}_{i}), i = 1, ..., n,

(6.2)

where

k_{i} ({\hat{μ}}_{i}) = - \frac{1}{2 {\hat{μ}}_{i}} + \frac{\hat{δ}}{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]} + \frac{y_{i} [\hat{δ} + 1]}{4 {\hat{μ}}_{i}^{2}} - \frac{{\hat{δ}}^{2}}{4 y_{i} [\hat{δ} + 1]}

and

J_{i} ({\hat{μ}}_{i}) = - \frac{1}{2 {\hat{μ}}_{i}^{2}} + \frac{{\hat{δ}}^{2}}{{[\hat{δ} y_{i} + y_{i} + \hat{δ} {\hat{μ}}_{i}]}^{2}} + \frac{[\hat{δ} + 1] y_{i}}{2 {\hat{μ}}_{i}^{3}},

with $k_{i} ({\hat{μ}}_{i})$ being the ith element of the vector k(µ) = ∂ℓ(θ)/∂µ and $J_{i} ({\hat{μ}}_{i})$ the ith diagonal element of J(µ) = –∂²ℓ(θ)/∂µ∂µ^T, both evaluated at $\hat{θ}.$
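The first two residuals only require the fitted mean, the estimated precision and the response. A minimal sketch of (6.1) and (6.2) is given below, assuming that the fitted values are already available; the function names and the numerical inputs are hypothetical.

```python
import math


def residual_1(y, mu_hat, delta_hat):
    """Standardized residual (6.1): sqrt(phi) * (y - mu) / mu."""
    phi_hat = (delta_hat + 1.0) ** 2 / (2.0 * delta_hat + 5.0)
    return math.sqrt(phi_hat) * (y - mu_hat) / mu_hat


def residual_2(y, mu_hat, delta_hat):
    """Residual (6.2): k_i / sqrt(J_i), both evaluated at the fitted mean."""
    s = delta_hat * y + y + delta_hat * mu_hat
    k = (-1.0 / (2.0 * mu_hat) + delta_hat / s
         + y * (delta_hat + 1.0) / (4.0 * mu_hat ** 2)
         - delta_hat ** 2 / (4.0 * y * (delta_hat + 1.0)))
    j = (-1.0 / (2.0 * mu_hat ** 2) + delta_hat ** 2 / s ** 2
         + (delta_hat + 1.0) * y / (2.0 * mu_hat ** 3))
    if j <= 0.0:
        raise ValueError("J_i must be positive to take its square root")
    return k / math.sqrt(j)


# Hypothetical fitted values for one observation.
r1 = residual_1(3.0, 2.0, 2.0)
r2 = residual_2(3.0, 2.0, 2.0)
print(r1, r2)
```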

6.3 Third type of residual

This third residual is based on the iterative process used in the estimation of the proposed model parameters. In the iterative process defined in (3.5), considering δ as known, we can write the expression of the estimate β at step m as

β^{(m + 1)} = {(X^{⊤} D {(v)}^{(m)} X)}^{- 1} X^{⊤} D {(v)}^{(m)} z_{1}^{(m)},

(6.3)

where $z_{1}^{(m)} = X β^{(m)} + {(D {(v)}^{(m)})}^{- 1} D {(a)}^{(m)} z^{(m)}$ is an n × 1 vector. From the convergence of the iterative process given in (6.3), we obtain

\hat{β} = {(X^{⊤} D (\hat{v}) X)}^{- 1} X^{⊤} D (\hat{v}) {\hat{z}}_{2},

(6.4)

where ${\hat{z}}_{2} = \hat{η} + D {(\hat{v})}^{- 1} D (\hat{a}) \hat{z},$ with $\hat{η} = X \hat{β} = {({\hat{η}}_{1}, ..., {\hat{η}}_{n})}^{⊤}.$ Thus, $\hat{β}$ can be interpreted as the ordinary least squares solution of the linear regression of $D {(\hat{v})}^{1 / 2} {\hat{z}}_{2}$ on the columns of $D {(\hat{v})}^{1 / 2} X.$ Then, the standardized residual based on this weighted linear regression of ${\hat{z}}_{2}$ on X, given by $r^{*} = D {(\hat{v})}^{1 / 2} [{\hat{z}}_{2} - \hat{η}] = D {(\hat{v})}^{- 1 / 2} D (\hat{a}) \hat{z},$ turns out to be

r_{i}^{(3)} = \frac{{\hat{z}}_{i}}{\sqrt{{\hat{v}}_{i}^{*}}}, i = 1, ..., n,

(6.5)

where ${\hat{v}}_{i}^{*} = \hat{δ} / [2 {\hat{μ}}_{i}^{2}] + [{\hat{δ}}^{2} / {[\hat{δ} + 1]}^{2}] ℐ (\hat{θ}),$ with ℐ(θ) being defined in (A5); see Appendix.

6.4 Fourth type of residual

We have proposed three residuals for BS regression models. However, the most used residual in GLM is defined from the components of the deviance function. Assuming δ as fixed or known, we propose the deviance component (DC) residual for the BS regression model given in (3.1) as

r_{i}^{(4)} = sign (y_{i} - {\hat{μ}}_{i}) \sqrt{2} [\log (2) - \frac{δ}{2} + \frac{[δ + 1] y_{i}}{4 {\hat{μ}}_{i}} + \frac{δ^{2} {\hat{μ}}_{i}}{4 [δ + 1] y_{i}}

+ \frac{1}{2} \log (\frac{δ [δ + 1] y_{i} {\hat{μ}}_{i}}{{[δ y_{i} + y_{i} + δ {\hat{μ}}_{i}]}^{2}})]^{\frac{1}{2}},

(6.6)

where ${\hat{μ}}_{i}$ is the ML estimate of µ_i and $sign (y_{i} - {\hat{μ}}_{i})$ returns the sign of the difference $y_{i} - {\hat{μ}}_{i}.$ Observe in (6.6) that the DC residual for the BS model given in (3.1) is well defined for all µ_i and δ, whereas in the log-BS model, the DC residual is restricted to α < 2; see Leiva et al. (2007).

7 Simulation study

Based on the approach presented in this article, we develop a set of computational routines in R computer language; see http://www.R-project.org and Barros et al. (2009). These routines contain the functions bsreg, bsreg.fit, diagnostics.bs, envelope.bs, residuals and summary.bs, which are available to interested users upon request from the authors or from the website of the journal; see http://smj.sagepub.com.

We perform a simulation study to examine the distributions of the residuals r⁽¹⁾, r⁽²⁾, r⁽³⁾ and r⁽⁴⁾ defined in (6.1), (6.2), (6.5) and (6.6), respectively. We use a BS regression model with logarithmic link function

log(µ_i) = β₁ + β₂x_i, i = 1, …, 20,

(7.1)

where the true values of the parameters are taken as β₁ = 0.2 and β₂ = 0.5, with δ = 2 or δ = 25. In this study, we assume that the values of the regressor (x_i) are obtained from a uniform distribution in the interval (0, 1). The number of Monte Carlo replications is 5000. Using the relation µ_i = exp(β₁ + β₂x_i), we obtain the values of µ_i. In each of the 5000 replications, we obtain the observations y = (y₁, …, y₂₀)^T from the BS distribution with parameters µ_i and δ, for i = 1, …, 20. Then, we fit the model given in (7.1) through the command bsreg(), and generate the residuals by the command residuals().
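The data-generating step of this design can be sketched as follows. The sketch covers only the simulation of the responses, not the fitting step performed with bsreg(); it draws from BS(µ_i, δ) through the normal representation of the BS distribution, assuming α = √(2/δ) and ϱ = δµ/[δ + 1], and it uses 500 replications instead of 5000 only to keep the illustration fast. The seed is arbitrary.

```python
import math
import random


def bs_sample(mu, delta, rng):
    """One draw from BS(mu, delta) built from a standard normal variate."""
    alpha = math.sqrt(2.0 / delta)
    rho = delta * mu / (delta + 1.0)
    w = alpha * rng.gauss(0.0, 1.0) / 2.0
    return rho * (w + math.sqrt(w * w + 1.0)) ** 2


def simulate_replicate(mu, delta, rng):
    """One simulated response vector (y_1, ..., y_n) for given means."""
    return [bs_sample(mu_i, delta, rng) for mu_i in mu]


rng = random.Random(7)                       # arbitrary seed
n, beta1, beta2 = 20, 0.2, 0.5               # design of the simulation study
x = [rng.random() for _ in range(n)]         # regressor values on (0, 1)
mu = [math.exp(beta1 + beta2 * x_i) for x_i in x]

n_rep = 500                                  # the article uses 5000
replicates = {delta: [simulate_replicate(mu, delta, rng) for _ in range(n_rep)]
              for delta in (2.0, 25.0)}
print(len(replicates[2.0]), len(replicates[2.0][0]))
```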

Table 1 displays descriptive statistics based on the mean ( $\bar{y}$ ), SD and coefficients of skewness (CS) and kurtosis (CK) for the four types of residuals. From this table, note that r⁽¹⁾, r⁽³⁾ and r⁽⁴⁾ have mean approximately equal to zero, as expected, while residual r⁽²⁾ presents a negative value for its mean. The results of the simulations show that all the residuals have SDs close to one. Note also that the empirical distributions of residuals r⁽¹⁾ and r⁽²⁾ have positive and negative asymmetry, respectively. In addition, note that the residuals r⁽³⁾ and r⁽⁴⁾ present asymmetry close to zero. Furthermore, all the residuals show values of the CK close to three. Note that the residuals r⁽³⁾ and r⁽⁴⁾ have a similar behaviour. For δ = 2, residuals r⁽¹⁾ and r⁽²⁾ are more symmetrical and these have a higher level of kurtosis than the case δ = 25.

In order to graphically view the aspects detected in Table 1, we perform a comparison between the empirical (sample) distribution of the residuals and the standard normal distribution, by using a quantile against quantile (QQ) plot with simulated envelope; for more details about the QQ plot with envelope, see Atkinson (1985). Figure 2 presents eight QQ plots with simulated envelope, one for each 5000 residuals r⁽¹⁾, r⁽²⁾, r⁽³⁾, and r⁽⁴⁾ and for each value of δ = 2 and δ = 25. Note that the QQ plots with simulated envelope of the residuals r⁽¹⁾ and r⁽²⁾ are further away from the diagonal line and are outside the envelope. However, note that the residuals r⁽³⁾ and r⁽⁴⁾ present an approximately linear behaviour and have a good agreement with the normal distribution.

Table 1

Descriptive summary of the indicated residual

Statistic	Residual	i = 1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
δ = 2
$\bar{y}$	r⁽¹⁾	0.00	–0.02	–0.02	–0.03	0.03	–0.01	0.02	–0.02	0.02	–0.03	0.00	0.00	–0.01	0.02	0.01	–0.01	0.03	0.02	–0.03	0.00
$\bar{y}$	r⁽²⁾	–0.36	–0.32	–0.32	–0.35	–0.37	–0.37	–0.39	–0.35	–0.39	–0.35	–0.33	–0.36	–0.40	–0.39	–0.38	–0.39	–0.38	–0.37	–0.37	–0.37
$\bar{y}$	r⁽³⁾	0.00	0.01	0.01	–0.01	0.02	–0.01	0.00	0.00	0.00	–0.01	0.02	0.00	–0.02	0.00	0.00	–0.02	0.01	0.01	–0.01	0.00
$\bar{y}$	r⁽⁴⁾	–0.04	–0.04	–0.04	–0.05	–0.03	–0.06	–0.05	–0.05	–0.05	–0.05	–0.03	–0.05	–0.07	–0.04	–0.05	–0.06	–0.04	–0.05	–0.07	–0.05
SD	r⁽¹⁾	0.94	0.89	0.88	0.91	1.04	0.94	1.02	0.92	1.02	0.89	0.92	0.97	0.94	1.01	1.00	0.97	1.03	1.00	0.92	0.99
SD	r⁽²⁾	1.23	1.14	1.15	1.17	1.25	1.21	1.28	1.15	1.28	1.15	1.18	1.19	1.27	1.28	1.25	1.27	1.26	1.24	1.17	1.21
SD	r⁽³⁾	0.96	0.89	0.89	0.91	1.02	0.95	1.02	0.91	1.02	0.90	0.93	0.95	0.98	1.01	0.99	0.99	1.02	0.99	0.92	0.97
SD	r⁽⁴⁾	0.91	0.86	0.86	0.88	0.94	0.91	0.94	0.88	0.94	0.87	0.89	0.90	0.92	0.94	0.93	0.92	0.94	0.92	0.88	0.91
CS	r⁽¹⁾	1.72	1.59	1.72	1.75	1.99	1.70	1.92	1.68	1.98	1.64	1.60	1.79	1.78	1.80	1.80	1.88	1.94	1.91	1.77	1.89
CS	r⁽²⁾	–1.61	–1.45	–1.58	–1.46	–1.55	–1.45	–1.59	–1.41	–1.62	–1.41	–1.54	–1.42	–1.58	–1.61	–1.56	–1.57	–1.56	–1.59	–1.45	–1.47
CS	r⁽³⁾	–0.07	–0.02	–0.04	0.04	0.18	0.01	0.09	0.06	0.08	0.01	–0.07	0.13	–0.09	0.04	0.07	–0.01	0.16	0.06	0.07	0.16
CS	r⁽⁴⁾	0.10	0.14	0.12	0.17	0.21	0.17	0.18	0.21	0.18	0.18	0.12	0.22	0.11	0.18	0.20	0.16	0.19	0.18	0.21	0.23
CK	r⁽¹⁾	6.52	5.86	6.77	6.71	7.92	6.31	7.49	6.13	8.04	5.90	5.78	6.71	6.98	6.68	6.60	7.42	7.77	7.48	6.65	7.26
CK	r⁽²⁾	6.09	5.31	5.96	5.43	5.86	5.41	6.23	5.33	6.21	5.21	5.78	5.23	5.95	6.25	6.05	5.89	5.99	6.29	5.50	5.53
CK	r⁽³⁾	4.04	3.74	4.04	3.92	4.41	3.78	4.26	3.77	4.49	3.72	3.83	3.88	4.14	4.12	4.06	4.20	4.34	4.40	3.92	4.09
CK	r⁽⁴⁾	2.68	2.58	2.72	2.67	2.79	2.61	2.77	2.62	2.82	2.61	2.63	2.66	2.70	2.74	2.73	2.77	2.75	2.76	2.70	2.74
δ = 25
$\bar{y}$	r⁽¹⁾	0.00	0.00	0.00	–0.01	0.02	0.00	0.01	–0.01	0.00	–0.01	0.01	0.00	–0.02	0.01	0.01	–0.01	0.02	0.01	–0.02	0.00
$\bar{y}$	r⁽²⁾	–0.13	–0.11	–0.11	–0.13	–0.13	–0.14	–0.14	–0.13	–0.15	–0.13	–0.11	–0.13	–0.16	–0.14	–0.14	–0.15	–0.13	–0.14	–0.14	–0.13
$\bar{y}$	r⁽³⁾	0.00	0.01	0.01	0.00	0.01	0.00	0.00	0.00	–0.01	0.00	0.02	0.00	–0.02	0.00	0.00	–0.01	0.01	0.00	–0.02	0.00
$\bar{y}$	r⁽⁴⁾	0.00	0.01	0.00	–0.01	0.01	–0.01	–0.01	–0.01	–0.01	–0.01	0.01	–0.01	–0.02	–0.01	–0.01	–0.02	0.00	0.00	–0.02	–0.01
SD	r⁽¹⁾	0.99	0.94	0.94	0.97	1.04	1.00	1.04	0.96	1.04	0.95	0.97	1.00	0.99	1.03	1.02	1.01	1.04	1.01	0.97	1.01
SD	r⁽²⁾	1.07	1.00	1.01	1.02	1.08	1.06	1.09	1.02	1.09	1.01	1.03	1.04	1.08	1.09	1.07	1.08	1.08	1.07	1.02	1.04
SD	r⁽³⁾	1.00	0.95	0.95	0.97	1.03	1.00	1.04	0.97	1.03	0.96	0.98	0.99	1.01	1.03	1.02	1.02	1.03	1.01	0.97	1.00
SD	r⁽⁴⁾	0.99	0.94	0.95	0.96	1.02	0.99	1.03	0.96	1.02	0.95	0.97	0.99	1.00	1.02	1.01	1.01	1.02	1.00	0.97	0.99
CS	r⁽¹⁾	0.62	0.59	0.59	0.68	0.78	0.64	0.71	0.65	0.76	0.63	0.56	0.70	0.61	0.69	0.71	0.65	0.74	0.70	0.69	0.74
CS	r⁽²⁾	–0.78	–0.70	–0.76	–0.67	–0.67	–0.68	–0.69	–0.63	–0.71	–0.66	–0.71	–0.59	–0.75	–0.67	–0.65	–0.78	–0.64	–0.74	–0.61	–0.61
CS	r⁽³⁾	–0.07	–0.03	–0.07	0.02	0.07	0.00	0.02	0.03	0.05	0.00	–0.05	0.07	–0.05	0.03	0.05	–0.04	0.06	0.02	0.05	0.08
CS	r⁽⁴⁾	–0.04	0.00	–0.05	0.03	0.08	0.02	0.04	0.05	0.07	0.02	–0.03	0.08	–0.02	0.06	0.07	–0.01	0.07	0.04	0.06	0.10
CK	r⁽¹⁾	3.31	3.14	3.35	3.46	3.69	3.21	3.44	3.23	3.62	3.23	3.04	3.31	3.23	3.27	3.34	3.35	3.56	3.41	3.37	3.51
CK	r⁽²⁾	3.71	3.63	3.64	3.48	3.50	3.47	3.51	3.38	3.64	3.47	3.50	3.20	3.64	3.49	3.46	3.88	3.36	3.87	3.32	3.38
CK	r⁽³⁾	2.86	2.77	2.90	2.85	2.90	2.75	2.85	2.76	2.91	2.78	2.74	2.73	2.80	2.80	2.81	2.90	2.83	2.86	2.80	2.84
CK	r⁽⁴⁾	2.72	2.64	2.76	2.72	2.74	2.63	2.71	2.64	2.75	2.66	2.62	2.62	2.66	2.68	2.69	2.75	2.68	2.71	2.68	2.70

Source: Authors’ own.

Figure 2

Source: Authors’ own.

8 Empirical application

In the following, we analyze a real data set obtained by means of an R package called faraway, corresponding to the projected sales (regressor, x_i, in M$) and actual sales (response variable, Y_i, in M$) of 20 consumer products; for more details, see Whitmore (1986). These data can be accessed from the faraway package through the command data(cpd) and these are displayed in Table 2.

8.1 Exploratory data analysis

Table 3 provides a descriptive summary of the actual sales (in M$) that includes the median (MD), mean ( $\bar{y}$ ), SD, coefficient of variation (CV), CS, CK, and minimum (y₍₁₎) and maximum (y₍ₙ₎) values. Figure 3 shows a scatterplot of projected and actual sales, the histogram, the usual boxplot and the adjusted boxplot of the actual sales. Adjusted boxplots for asymmetric data can be constructed by the command adjbox() of an R package called robustbase.

Table 2

Projected (X) and actual (Y) sales of 20 consumer products

Product	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
X	5959	3534	2641	1965	1738	1182	667	613	610	549	527	353	331	290	253	193	156	133	122	114
Y	5673	3659	2565	2182	1839	1236	918	902	756	500	487	463	225	257	311	212	166	123	198	99

Source: Whitmore (1986).

Table 3

Descriptive statistics for the actual sales (in M$)

MD	$\bar{y}$	SD	CV	CS	CK	y₍₁₎	y₍ₙ₎	n
493.50	1138.55	1399.55	1.23	1.93	6.20	99	5673	20

Source: Authors’ own.
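The summary in Table 3 can be recomputed from the actual sales listed in Table 2, as in the sketch below. It assumes that the SD, CS and CK are moment-based with divisor n (not n − 1), a convention inferred here because it appears to match the tabulated values; the function names are hypothetical.

```python
import statistics

# Actual sales (Y, in M$) copied from Table 2.
sales = [5673, 3659, 2565, 2182, 1839, 1236, 918, 902, 756, 500,
         487, 463, 225, 257, 311, 212, 166, 123, 198, 99]


def central_moment(data, order):
    """Central sample moment of the given order, with divisor n."""
    mean = sum(data) / len(data)
    return sum((v - mean) ** order for v in data) / len(data)


def describe(data):
    """Median, mean, SD, CV, skewness (CS), kurtosis (CK), min, max, n."""
    mean = sum(data) / len(data)
    m2 = central_moment(data, 2)
    sd = m2 ** 0.5
    return {
        "MD": statistics.median(data),
        "mean": mean,
        "SD": sd,
        "CV": sd / mean,
        "CS": central_moment(data, 3) / m2 ** 1.5,
        "CK": central_moment(data, 4) / m2 ** 2,
        "min": min(data),
        "max": max(data),
        "n": len(data),
    }


summary = describe(sales)
```

A CS well above zero and a CK well above three are what motivate a positively skewed model, such as the BS distribution, for these data.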

Based on Table 3 and Figure 3, we conduct an exploratory data analysis (EDA). First, from Figure 3(a), we see that the actual sale for the ith product (y_i) is linearly related to the ith projected sale (x_i). In addition, this figure shows evidence that the line goes through the origin. Such an aspect is evaluated in Subsection 8.2. Also, from Figure 3(a), we observe that the variability of the sales tends to increase as the sale values increase. This observation, which could be an indication of a non-constant variance in the data, is evaluated in Subsection 8.2. Second, from the histogram displayed in Figure 3(b), we note that the values of actual sales have an empirical distribution that is unimodal and positively skewed, with values concentrated primarily in the range [0, 1000]. Third, from Figure 3(c), the usual boxplot flags two observations as atypical, but the adjusted boxplot for asymmetric data does not show atypical data. Fourth, from Table 3 and Figure 3, we detect that the range of the data suggests a high variability, which is corroborated by the high value of their CV (123%). Consequently, the BS regression model given in (3.1) may be suitable for describing the mean of the data, as well as the non-constant variance and asymmetry detected in these data. It is worth highlighting that none of the BS regression models postulated until now can simultaneously describe all these aspects.

Figure 3

Source: Authors’ own.

8.2 Estimation and verification of assumptions

Based on the EDA performed in Subsection 8.1, we assume the response Y_i ∼ BS(µ_i, δ) for the sale data. Then, the systematic component of the regression model on the mean, by using the identity link function, is expressed as

μ_{i} = β_{1} + β_{2} x_{i}, \quad i = 1, \ldots, 20,

(8.1)

with β₁ and β₂ being the regression coefficients and x_i the value of the regressor X. We fit the BS model by using the command bsreg.fit(). The ML estimates of the parameters of the model given in (8.1), with approximate estimated standard errors (SEs) in parentheses, are: ${\hat{β}}_{1}$ = 2.8714(18.1208), ${\hat{β}}_{2}$ = 1.0802(0.0731) and $\hat{δ}$ = 49.6477(15.7000). Note that only β₁ is not significant at 5%, as expected from the EDA that we performed. Thus, the selected model is

μ_{i} = β_{2} x_{i}, \quad i = 1, \ldots, 20.

(8.2)

The ML estimates of the parameters of the model given in (8.2), with approximate estimated SEs in parentheses, are: ${\hat{β}}_{2}$ = 1.0891(0.0493) and $\hat{δ}$ = 49.2284(15.5674). This leads to the prediction model

\hat{μ}_{\mathrm{pred}} = 1.0891 \, x_{\mathrm{pred}} .

(8.3)
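For a model through the origin with identity link, the ML fit can be cross-checked without a general optimizer. The sketch below is not the authors' bsreg.fit() routine. It assumes the link α = √(2/δ), β = δµ/(δ + 1) between BS(µ, δ) and the classical shape and scale parameters, under which the ratios t_i = y_i/x_i are independent and identically distributed classical BS variates, and it then solves the classical ML equations of Birnbaum and Saunders (1969b) by bisection. The function name is hypothetical.

```python
# Projected (x) and actual (y) sales copied from Table 2.
x = [5959, 3534, 2641, 1965, 1738, 1182, 667, 613, 610, 549,
     527, 353, 331, 290, 253, 193, 156, 133, 122, 114]
y = [5673, 3659, 2565, 2182, 1839, 1236, 918, 902, 756, 500,
     487, 463, 225, 257, 311, 212, 166, 123, 198, 99]


def fit_bs_through_origin(x, y, tol=1e-12):
    """ML estimates (beta2, delta) for the model mu_i = beta2 * x_i.

    Assumption of this sketch: alpha = sqrt(2/delta) and
    beta = delta*mu/(delta + 1), so t_i = y_i/x_i are i.i.d. BS(alpha, beta).
    """
    t = [yi / xi for xi, yi in zip(x, y)]
    n = len(t)
    s = sum(t) / n                       # arithmetic mean of the ratios
    r = n / sum(1.0 / ti for ti in t)    # harmonic mean of the ratios

    def g(b):
        # Classical ML equation for the scale parameter; its root lies in (r, s).
        k = n / sum(1.0 / (b + ti) for ti in t)
        return b * b - b * (2.0 * r + k) + r * (s + k)

    lo, hi = r, s                        # g(lo) > 0 > g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    alpha2 = s / b + b / r - 2.0         # ML estimate of alpha squared
    delta = 2.0 / alpha2
    beta2 = b * (1.0 + alpha2 / 2.0)     # mean of the ratio distribution
    return beta2, delta


beta2_hat, delta_hat = fit_bs_through_origin(x, y)
```

With the Table 2 data, this should give estimates near those reported for the model in (8.2); small differences can arise from the numerical optimizer and from rounding. The shortcut relies on the absence of an intercept, so it does not apply to the model in (8.1).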

The assumptions of the model given in (8.2) are verified by a residual analysis based on the sale data. The normal probability plot with envelope for the DC residual (r⁽⁴⁾) provided in Figure 4(a) is used to verify the distributional assumption of the model given in (8.2). This figure does not show unusual features, so the assumption that the response variable follows a BS distribution seems reasonable. In addition, the independence assumption is also verified by the normal probability plot with envelope and by the plot of residuals displayed in Figure 4(b), from which outlying observations are not detected. In Section 3, we mentioned that one of the assumptions of the proposed model is that the relation between the variance and mean is described by $\mathrm{Var}[Y_{i}] = μ_{i}^{2} / ϕ$, which seems to correct the non-constant variance problem detected in Subsection 8.1; see Figure 4(b).

We also verify whether the identity link function used in the model given in (8.2) is correct. The quantity $\hat{z}_{2}$ defined in (6.4) is given by $\hat{z}_{2} = \hat{η} + \hat{v}^{*} ⊙ \hat{z}$, where $\hat{v}^{*} = (1 / \hat{v}_{1}^{*}, \ldots, 1 / \hat{v}_{n}^{*})^{⊤}$, with $\hat{v}_{i}^{*}$ being defined in (6.5) and ‘⊙’ representing the Hadamard product. The plot of $\hat{z}_{2i}$ against $\hat{η}_{i}$ is used to verify the adequacy of the link function, where a linear tendency is needed. From Figure 4(c), note that the identity link function seems to be appropriate for our model.

Figure 4

Source: Authors’ own.

8.3 Diagnostic analysis

Influence diagnostics for the BS regression model given in (3.1) are presented in Figure 5. We also construct a GL plot (omitted here) based on the results of Section 5, which shows that observation #13 is a leverage point. Figure 5 displays index plots of C_i, from which observations #13 and #19 are detected as potentially influential. We also analyze index plots of |l_max|, but the results are similar to those presented for the index plots of C_i, so we omit them here. Figure 5(g)–(h) shows plots of C_i against x_i. These plots indicate that small values of the regressor have a moderate influence on the estimates; see, e.g., observation #19.

We now investigate the impact on the model inference when the cases detected as potentially influential in the diagnostic analysis are removed. Then, we again estimate the model parameters after removing the sets of observations {13}, {19} and {13, 19}. Table 4 provides the relative changes (RCs) in the parameter estimates and in their corresponding estimated SEs, by using the sale data. These changes are calculated from

RC_{θ_{j(i)}} = \left| \frac{\hat{θ}_{j} - \hat{θ}_{j(i)}}{\hat{θ}_{j}} \right| \times 100\% \quad \text{and} \quad RC_{SE(\hat{θ}_{j})_{(i)}} = \left| \frac{\widehat{SE}(\hat{θ}_{j}) - \widehat{SE}(\hat{θ}_{j})_{(i)}}{\widehat{SE}(\hat{θ}_{j})} \right| \times 100\%,

where $\hat{θ}_{j(i)}$ and $\widehat{SE}(\hat{θ}_{j})_{(i)}$ denote the ML estimates of θ_j and of the SE of the corresponding estimator, respectively, obtained after removing the ith observation, for j = 1, 2 and i = 1, …, 20, with θ₁ = β₂ and θ₂ = δ. From Table 4, note that the most important RCs are detected for the estimates of δ, where the largest values are associated with the joint removal of observations #13 and #19. However, no inferential changes are found. The results presented in this table show that the diagnostic measures derived in this article identify potentially influential points, but these do not affect the inference of the model. In summary, these diagnostic analyses based on the local influence approach and residuals confirm that the BS regression model presented in (8.2) is stable to the atypical points detected and quite suitable for modelling the sale data.
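The RC formula amounts to a one-line computation, sketched below. The function name is hypothetical and the numbers in the usage line are made up for illustration; they are not taken from Table 4.

```python
def relative_change(full, reduced):
    """Relative change (in %) between the estimate from the full data
    and the estimate obtained after removing one or more cases."""
    return abs((full - reduced) / full) * 100.0


# Hypothetical values: an estimate of 2.0 that drops to 1.5 after case removal.
rc = relative_change(2.0, 1.5)
```

The same function applies to the estimated SEs, by passing the SE from the full fit and the SE from the reduced fit.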

8.4 The sale prediction model

Figure 6(a) shows a scatterplot of projected sales against actual sales, with the estimated model given in (8.2). Note that the fitted line (proposed model) has a good agreement with the real data. We may interpret the estimated coefficients of the final model given in (8.3) as follows. Suppose that the projected sale value increases by p × 100%, so that the projected value is given by x_p = [1 + p]x, for 0 < p < 1. Thus, the predicted mean sale value is given by µ(x_p) = β₂[1 + p]x, and the mean ratio is

\frac{μ (x_{p})}{μ (x)} = 1 + p .

Therefore, if the projected sale increases by p × 100%, then the mean sale also increases by p × 100%. Based on (3.6) and considering the linear predictor µ(x_pred) = β₂ x_pred, Figure 6(b) provides approximate 95% confidence bands for the mean actual sale (µ(x_pred)), where x_pred ∈ (114, 5959) is the projected sale, both in M$.

Figure 5

Source: Authors’ own.

Table 4

RCs (in %) in ML estimates and in the corresponding estimated SEs for the indicated removed case(s), and respective p-values using sale data and the model given in (3.1)

Removed cases	Quantity	β₂ (projected sale)	δ (precision)
None	RC($\hat{θ}$)	–	–
None	RC($\widehat{SE}$)	–	–
None	p-value	< 0.01	< 0.01
{13}	RC($\hat{θ}$)	1.98	30.04
{13}	RC($\widehat{SE}$)	8.34	33.41
{13}	p-value	< 0.01	< 0.01
{19}	RC($\hat{θ}$)	2.58	23.59
{19}	RC($\widehat{SE}$)	10.17	26.80
{19}	p-value	< 0.01	< 0.01
{13, 19}	RC($\hat{θ}$)	0.65	72.65
{13, 19}	RC($\widehat{SE}$)	20.44	81.99
{13, 19}	p-value	< 0.01	< 0.01

Source: Authors’ own.

Figure 6

Source: Authors’ own.

9 Conclusions

The new BS regression models proposed in this article have characteristics that are unavailable in the models of this type existing in the literature. Specifically, the new models allow us to describe the mean of the data in their original scale, unlike the existing models, which employ a logarithmic transformation of the data, with a possible reduction of the power of the study and difficulties of interpretation. In addition, the new models enable us to describe data with non-constant variance. None of the existing BS models can simultaneously describe all these aspects. Furthermore, the new models are very flexible, because they permit us to use different non-negative link functions to relate the mean with the regressors. We have proposed four types of residuals for the new models and conducted a simulation study to establish their empirical properties in order to evaluate their performance. From this study, we have detected that the deviance component residual, usually employed in generalized linear models, is the most appropriate, which is coherent because the new models have been proposed using an idea similar to that of generalized linear models. Moreover, we have developed methods of local influence to assess the potential influence of some observations on the model by using several perturbation schemes. Finally, we have performed a statistical analysis with real data by using the new approach proposed in the article, which has shown the importance of our proposal. The methodology introduced in this article has been implemented in the R software and is available to interested users.

Acknowledgements

The authors thank the Editor, Professor Jeffrey Simonoff, an anonymous Associate Editor and two anonymous referees for their constructive comments on an earlier version of this manuscript, which resulted in this improved version. The authors gratefully acknowledge financial support from CAPES, CNPq and FACEPE, Brazil, and from FONDECYT 1120879 grant, Chile.

Hessian matrix

For the BS regression model given in (3.1), the second derivative of ℓ(θ) with respect:

to β_j and β_l, for j, l = 1, …, p, is

\frac{\partial^{2} ℓ(θ)}{\partial β_{j} \partial β_{l}} = \sum_{i=1}^{n} \underbrace{\left\{ \frac{\partial^{2} ℓ_{i}(μ_{i}, δ)}{\partial μ_{i}^{2}} \left[ \frac{d μ_{i}}{d η_{i}} \right]^{2} + \frac{\partial ℓ_{i}(μ_{i}, δ)}{\partial μ_{i}} \left[ \frac{\partial}{\partial μ_{i}} \frac{d μ_{i}}{d η_{i}} \right] \frac{d μ_{i}}{d η_{i}} \right\}}_{c_{i}} x_{ij} x_{il};

(A1)

we can group the values obtained in matrix form as $\ddot{ℓ}_{ββ} = X^{⊤} D(c) X$, where c = (c₁, …, c_n)^T;

to β_j and δ, for j = 1, …, p, is

\frac{\partial^{2} ℓ(θ)}{\partial β_{j} \partial δ} = \sum_{i=1}^{n} \underbrace{\left\{ \frac{y_{i}}{[δ y_{i} + y_{i} + δ μ_{i}]^{2}} + \frac{y_{i}}{4 μ_{i}^{2}} - \frac{δ [δ + 2]}{4 [δ + 1]^{2} y_{i}} \right\}}_{m_{i}} a_{i} x_{ij},

(A2)

where a_i is the ith element of a given in (3.3); the expression provided in (A2) can also be represented in matrix form as $\ddot{ℓ}_{βδ} = X^{⊤} D(a) m$, where m = (m₁, …, m_n)^T;

to δ is

\frac{\partial^{2} ℓ(θ)}{\partial δ^{2}} = \sum_{i=1}^{n} \underbrace{\left\{ \frac{1}{2 [δ + 1]^{2}} - \frac{[y_{i} + μ_{i}]^{2}}{[δ y_{i} + y_{i} + δ μ_{i}]^{2}} - \frac{μ_{i}}{2 [δ + 1]^{3} y_{i}} \right\}}_{d_{i}};

(A3)

the expression given in (A3) can be represented in matrix form as $\ddot{ℓ}_{δδ} = \mathrm{tr}(D(d))$, where d = (d₁, …, d_n)^T.

Then, the Hessian matrix can be expressed as

\ddot{ℓ}_{θθ} = \begin{bmatrix} \ddot{ℓ}_{ββ} & \ddot{ℓ}_{βδ} \\ \ddot{ℓ}_{δβ} & \ddot{ℓ}_{δδ} \end{bmatrix} = \begin{bmatrix} X^{⊤} D(c) X & X^{⊤} D(a) m \\ m^{⊤} D(a) X & \mathrm{tr}(D(d)) \end{bmatrix} .
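The braced terms m_i and d_i can be checked numerically against finite differences of the log-likelihood contribution of a single observation. The sketch below assumes that the BS(µ, δ) log-density has the form coded in `loglik_i`, which is the form implied by the classical BS density with α = √(2/δ) and β = δµ/(δ + 1); it also takes m_i to be the mixed derivative with respect to µ_i and δ, as the chain rule suggests. All function names are hypothetical.

```python
import math


def loglik_i(y, mu, delta):
    """Log-density of one observation (assumed form of the BS(mu, delta) model)."""
    return (delta / 2.0
            - 0.5 * math.log(16.0 * math.pi)
            - 0.5 * math.log(mu)
            - 1.5 * math.log(y)
            - 0.5 * math.log(delta + 1.0)
            + math.log(delta * y + y + delta * mu)
            - y * (delta + 1.0) / (4.0 * mu)
            - delta ** 2 * mu / (4.0 * y * (delta + 1.0)))


def m_i(y, mu, delta):
    """Braced term of (A2)."""
    return (y / (delta * y + y + delta * mu) ** 2
            + y / (4.0 * mu ** 2)
            - delta * (delta + 2.0) / (4.0 * (delta + 1.0) ** 2 * y))


def d_i(y, mu, delta):
    """Braced term of (A3)."""
    return (1.0 / (2.0 * (delta + 1.0) ** 2)
            - (y + mu) ** 2 / (delta * y + y + delta * mu) ** 2
            - mu / (2.0 * (delta + 1.0) ** 3 * y))


def fd_delta_delta(y, mu, delta, h=1e-3):
    """Central finite difference for the second derivative in delta."""
    return (loglik_i(y, mu, delta + h) - 2.0 * loglik_i(y, mu, delta)
            + loglik_i(y, mu, delta - h)) / (h * h)


def fd_mu_delta(y, mu, delta, h=1e-3):
    """Central finite difference for the mixed derivative in mu and delta."""
    return (loglik_i(y, mu + h, delta + h) - loglik_i(y, mu + h, delta - h)
            - loglik_i(y, mu - h, delta + h) + loglik_i(y, mu - h, delta - h)) / (4.0 * h * h)
```

Agreement between `d_i` and `fd_delta_delta`, and between `m_i` and `fd_mu_delta`, at a few arbitrary points is a quick sanity check on the transcription of (A2) and (A3).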

References

Atkinson (1985) Plots, transformations and regression. Oxford: Oxford University.

Balakrishnan, Gupta, Kundu, Leiva and Sanhueza (2011) On some mixture models based on the Birnbaum–Saunders distribution and associated inference. Journal of Statistical Planning and Inference, 141, 2175–90.

Bhatti (2010) The Birnbaum–Saunders autoregressive conditional duration model. Mathematics and Computers in Simulation, 80, 2062–78.

Barros, Paula and Leiva (2008) A new class of survival regression models with heavy-tailed errors: robustness and diagnostics. Lifetime Data Analysis, 14, 316–32.

Barros, Paula and Leiva (2009) An R implementation for generalized Birnbaum–Saunders distributions. Computational Statistics and Data Analysis, 53, 1511–28.

Birnbaum and Saunders (1969a) A new family of life distributions. Journal of Applied Probability, 6, 319–27.

Birnbaum and Saunders (1969b) Estimation for a family of life distributions with applications to fatigue. Journal of Applied Probability, 6, 328–47.

Caro-Lopera, Leiva and Balakrishnan (2012) Connection between the Hadamard and matrix products with an application to a matrix–variate Birnbaum–Saunders distribution. Journal of Multivariate Analysis, 104, 126–39.

Cook (1986) Assessment of local influence. Journal of the Royal Statistical Society B, 48, 133–69.

Cox and Hinkley (1974) Theoretical statistics. London: Chapman & Hall.

Díaz-García and Leiva (2005) A new family of life distributions based on the elliptically contoured distributions. Journal of Statistical Planning and Inference, 128, 445–57.

Ferrari SLP and Cribari-Neto (2004) Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31, 799–815.

Ferreira, Gomes and Leiva (2012) On an extreme value version of the Birnbaum–Saunders distribution. Revstat–Statistical Journal, 10, 181–210.

Fierro, Leiva, Ruggeri and Sanhueza (2013) On a Birnbaum–Saunders distribution arising from a non–homogeneous Poisson process. Statistics and Probability Letters, 83, 1233–39.

Galea, Paula and Leiva (2004) Influence diagnostics in log-Birnbaum–Saunders regression models. Journal of Applied Statistics, 31, 1049–64.

Gómez, Olivares-Pacheco and Bolfarine (2009) An extension of the generalized Birnbaum–Saunders distribution. Statistics and Probability Letters, 79, 331–38.

Guiraud, Leiva and Fierro (2009) A non-central version of the Birnbaum–Saunders distribution for reliability analysis. IEEE Transactions on Reliability, 58, 152–60.

(2012) Parameter estimation for the Birnbaum–Saunders distribution under an accelerated degradation test. European Industrial Engineering, 6, 644–65.

Huang (2006) The loss in power when the test of differential expression is performed under a wrong scale. Journal of Computational Biology, 13, 786–97.

Johnson, Kotz and Balakrishnan (1995) Continuous univariate distributions. Vol. 2. New York: John Wiley & Sons.

Jørgensen (1984) The delta algorithm and GLIM. International Statistical Review, 52, 283–300.

Kotz, Leiva and Sanhueza (2010) Two new mixture models related to the inverse Gaussian distribution. Methodology and Computing in Applied Probability, 12, 199–212.

Leiva, Barros, Paula and Galea (2007) Influence diagnostics in log-Birnbaum–Saunders regression models with censored data. Computational Statistics and Data Analysis, 51, 5694–707.

Leiva, Barros, Paula and Sanhueza (2008) Generalized Birnbaum–Saunders distributions applied to air pollutant concentration. Environmetrics, 19, 235–49.

Leiva, Sanhueza and Angulo (2009) A length-biased version of the Birnbaum–Saunders distribution with application in water quality. Stochastic Environmental Research and Risk Assessment, 23, 299–307.

Leiva, Vilca, Balakrishnan and Sanhueza (2010) A skewed sinh-normal distribution and its properties and application to air pollution. Communications in Statistics – Theory and Methods, 39, 426–43.

Leiva, Athayde, Azevedo and Marchant (2011) Modeling wind energy flux by a Birnbaum–Saunders distribution with unknown shift parameter. Journal of Applied Statistics, 38, 2819–38.

Leiva, Ponce, Marchant and Bustos (2012) Fatigue statistical distributions useful for modeling diameter and mortality of trees. Colombian Journal of Statistics, 35, 349–67.

Lemonte and Cordeiro (2009) Birnbaum–Saunders nonlinear regression models. Computational Statistics and Data Analysis, 53, 4441–52.

Lemonte and Patriota (2011) Influence diagnostics in Birnbaum–Saunders non-linear regression models. Journal of Applied Statistics, 38, 871–84.

Lesaffre and Verbeke (1998) Local influence in linear mixed models. Biometrics, 38, 963–74.

Liu, Lin and Piegorsch (2008) Construction of exact simultaneous confidence bands for a simple linear regression model. International Statistical Review, 76, 39–57.

Marchant, Bertin, Leiva and Saulo (2013a) Generalized Birnbaum–Saunders kernel density estimators and an analysis of financial data. Computational Statistics and Data Analysis, 63, 1–15.

Marchant, Leiva, Cavieres and Sanhueza (2013b) Air contaminant statistical distributions with application to PM10 in Santiago, Chile. Reviews of Environmental Contamination and Toxicology, 223, 1–31.

Owen and Padgett (2000) A Birnbaum–Saunders accelerated life model. IEEE Transactions on Reliability, 49, 224–29.

Paula, Leiva, Barros and Liu (2012) Robust statistical modeling using the Birnbaum–Saunders t-distribution applied to insurance. Applied Stochastic Models in Business and Industry, 28, 16–34.

Rieck and Nedelman (1991) A log-linear model for the Birnbaum–Saunders distribution. Technometrics, 33, 51–60.

Santos-Neto, Cysneiros FJA, Leiva and Ahmed (2012) On new parametrizations of the Birnbaum–Saunders distribution. Pakistan Journal of Statistics, 1, 1–26.

Vanegas, Rondon and Cysneiros FJA (2012) Diagnostic procedures in Birnbaum–Saunders non-linear regression models. Computational Statistics and Data Analysis, 56, 1662–80.

Vilca and Leiva (2006) A new fatigue life model based on the family of skew-elliptic distributions. Communications in Statistics – Theory and Methods, 35, 229–44.

Vilca, Sanhueza, Leiva and Christakos (2010) An extended Birnbaum–Saunders model and its application in the study of environmental quality in Santiago, Chile. Stochastic Environmental Research and Risk Assessment, 24, 771–82.

Vilca, Santana, Leiva and Balakrishnan (2011) Estimation of extreme percentiles in Birnbaum–Saunders distributions. Computational Statistics and Data Analysis, 55, 1665–78.

Xie and Wei (2007) Diagnostics analysis for log-Birnbaum–Saunders regression models. Computational Statistics and Data Analysis, 51, 4692–706.

Wei and Fung (1998) Generalized leverage and its applications. Scandinavian Journal of Statistics, 25, 25–37.

Whitmore (1986) Inverse Gaussian ratio estimation. Journal of the Royal Statistical Society C, 35, 8–15.