Abstract
Spatio-temporal models are becoming increasingly popular in recent regression research. However, they usually rely on the assumption of a specific parametric distribution for the response and/or homoscedastic error terms. In this article, we propose to apply semiparametric expectile regression to model spatio-temporal effects beyond the mean. Besides the removal of the assumption of a specific distribution and homoscedasticity, with expectile regression the whole distribution of the response can be estimated. For the use of expectiles, we interpret them as weighted means and estimate them by established tools of (penalized) least squares regression. The spatio-temporal effect is set up as an interaction between time and space either based on trivariate tensor product P-splines or the tensor product of a Gaussian Markov random field and a univariate P-spline. Importantly, the model can easily be split up into main effects and interactions to facilitate interpretation. The method is presented along the analysis of spatio-temporal variation of temperatures in Germany from 1980 to 2014.
Keywords
Introduction
In many areas of applied sciences, longitudinal data are recorded at several locations and multiple time points. Simple statistical models then estimate the temporal and the spatial effects as additive components (see Fahrmeir et al., 2004). Hence, the impact of time and space is assumed to be independent of each other while, for example, variations of the spatial effect over time are not considered. To allow for such interactions, statisticians developed several approaches to incorporate both time and space jointly in interaction, the so-called spatio-temporal models. Since these models are rather complex and computationally demanding, Cressie and Wikle (2011) called them the ‘next frontier’. In their book, they explain ideas how to estimate spatio-temporal Kriging models. An alternative that deals with interactions of smooth effects was introduced by Gu (2002), where tensor products of smoothing splines are discussed. However in Gu (2002), the penalty term remains the integral of the second derivative, such that the estimation bases on more complex techniques like reproducing kernel Hilbert spaces. To simplify the estimation in comparison with smoothing splines, P-splines were introduced by Eilers and Marx (1996). P-splines are easier to calculate due to their approximation of the penalty via differences of the basis coefficients. A first step towards spatio-temporal models using P-splines was the introduction of bivariate P-splines as tensor products by Eilers and Marx (2003). The extension of the bivariate case to the trivariate case, the spatio-temporal model, is then conceptually straightforward. Wood (2006) was among the first to discuss trivariate splines in detail and developed an alternative penalization which is based on a theoretical understanding of smoothness in larger dimensions. The separation into main effects and interaction effects is important for the practical use and interpretation of spatio-temporal models. Therefore, Wood (2006), Lee and Durbán (2009), Lee and Durbán (2011) and Wood et al. (2013) developed several approaches mostly relying on the representation of P-splines as mixed models (see Fahrmeir et al., 2004). In this article, we will make use of a representation of spatio-temporal models which is achieved without using the mixed model decomposition. This representation was introduced by Wood (2017), and based on the corresponding
In most of the aforementioned articles, a Gaussian distribution for the error terms is assumed and only some refer to other distributions usually from the exponential family. Even if the error distribution was specified correctly, the assumption of homoscedasticity of the errors might still not be fulfilled. If we suppose that the reasons for heteroscedasticity are measured by some covariates, generalized additive models for location, scale and shape (GAMLSS; Rigby and Stasinopoulos, 2005; Stasinopoulos et al., 2017) can be applied. These models have separate regression predictors assigned to each parameter of the distribution. Umlauf et al. (2016) model the spatio-temporal distribution of rainfall in Austria with the help of the Bayesian version of GAMLSS (Klein et al., 2015). Therefore, besides the mean they also model the variance parameters of the normal distribution with some spatio-temporal trend. This model can be used to show that the variance increases for specific regions or times. Alternatively, models for extreme values can also be used to discuss effects beyond the means and to detect specific structures for extremal events. Umlauf and Kneib (2018) and Kneib et al. (2017), for example, build complex covariate structures for the generalized Pareto distribution to study, for example, 100-year return levels of rainfall.
In the GAMLSS framework, a large variety of distributions is available and therefore the given data can be modelled quite flexibly. However, the model always depends on the correct choice of the distribution and the link functions. Due to the complex design of the predictors, these choices are non-trivial (Rigby et al., 2013). An alternative method for distributional regression that also deals with heteroscedasticity is expectile regression as introduced by Newey and Powell (1987). With expectiles we do not assume a specific distribution and account for heteroscedasticity by putting more or less emphasis on specific parts of the distribution. Therefore, this model is very flexible and omits the specification of a parametric distribution. Basically, expectile regression is a weighted least squares regression, where the weights depend on the observations and the fitted values (for details see Section 3). Based on the similarity with ordinary mean regression, smooth effects can easily be incorporated in expectile regression. Therefore, semiparametric expectile regression has been introduced by Schnabel and Eilers (2009) and Sobotka and Kneib (2012). More details on inference in semiparametric expectile regression can also be found in Sobotka et al., 2013. Quantile regression (Koenker and Bassett, 1978) is an alternative to model effects beyond the mean without distributional assumption. Since quantiles are defined as generalization of the median, while expectiles are a generalization of the mean, they are easier to interpret, but harder to estimate, in particular in settings including smooth terms.
Thus, we will use expectile regression to analyse the spatio-temporal trend of temperature in Germany. Figure 2 displays the characteristics of the dataset. We estimate the distribution of temperatures depending on time and location. Based on expectile regression, we further determine where especially cold winters occur and which areas have relatively hot summers. In addition to the detection of increased variance in some regions, we may also specify the direction of the divergence from the mean.
In the remainder of this article, we recapture the ideas of semiparametric models, including spatio-temporal models, in Section 2. Afterwards, we continue with a brief introduction to expectile regression in Section 3 and discuss smoothing parameter selection in spatio-temporal expectile regression. In Section 4, we summarize a small simulation study on the smoothing parameter selection in semiparametric expectile regression with interactions. As an example, the spatio-temporal analysis of temperatures in Germany is displayed in Section 5. Finally, we conclude with a discussion in Section 6.
Semiparametric regression models
Basis function approaches
In most classical regression models, the effect for each covariate is assumed to be linear, or of a simple polynomial form. However, this is often not sufficiently flexible to cover the true underlying effect, which may result in biased estimates. Hastie and Tibshirani (1986) therefore introduced the class of generalized additive models (GAM) where the effect per covariate is defined as a smooth curve. One possibility to specify the smooth curve
We index each effect, each covariate and all coefficients with the number of the covariate to be consistent with the notation for the later following tensor products. The vector of function evaluations can then be written in matrix notation with
Additionally, to the additive model of univariate smooth effects, the smooth interaction between two covariates
where
where
Penalization
In basic spline regression, as defined in Section 2.1, the number and location of the basis functions need to be optimized. This is challenging for univariate splines and nearly impossible in higher dimensions. Consequently, Eilers and Marx (1996) introduced a technique called P-splines for univariate smooth functions where they start with a moderately large number of basis functions but restrict the curves to be smooth. Therefore, they penalize the wiggliness of the curves by adding a penalty term to the least squares argument such that not only the optimal model fit is a criterion but also the smoothness of the function. Additionally, this method has the advantage that the locations and the number of basis functions do not need to be optimized anymore. In general, smoothness of a function can be quantified based on the integrated squared second derivative of the function. Estimating the second derivative of the unknown function can be challenging, but for B-splines with equidistant knots Eilers and Marx (1996) showed that the penalty can be approximated by the sum of the coefficients second order differences
which can be minimized analytically for fixed smoothing parameter
Each of the smooth effects needs a penalty for controlling the wiggliness. Thus, the penalized least squares criterion of the additive model is given as
where
Based on the univariate penalties, penalties for bivariate splines can also be constructed. Eilers and Marx (2003) suggest to use the sum of the squared differences of the coefficients in each covariate direction to obtain a valid penalty. In detail, they propose to apply the joint penalty
to penalize the surfaces wiggliness. Therefore,
Spatio-temporal models
In analogy to the interaction between two covariates, we can use the aforementioned strategy to build interactions in any dimensions (see Wood, 2006). An application of a trivariate interaction is the temporal variation of a spatial effect (or, vice versa, the spatial variation of a temporal effect) in a spatio-temporal model. Therefore, we build the trivariate smooth surface based on the univariate basis functions as
where lon and lat represent the longitudinal and the latitudinal coordinate of an observation, respectively. Furthermore,
with penalty matrices
The identifiability of semiparametric regression has, in this article, not been discussed so far. Applying several basic univariate P-splines in one model is not identified since each smooth function could be shifted on the y-axis and the shift would be absorbed by the intercept or another smooth function. Therefore, the identifiability constraint
An alternative approach to achieve identifiability is to transform the basis functions and the penalty matrices with a QR-decomposition of the vector of column means of
where
where
The separation as mixed models and the separation based on the QR decomposition results in the same predictions if appropriate smoothing parameters are chosen. However, we focus here on the separation based on the QR decomposi- tion since this is included in the
So far, we estimated the spatial trend based on the exact longitudinal and latitudinal coordinates of the location. In many datasets, however, the location is only measured in regional grids such as ZIP-code areas or states. A simple model would use the regional information as a factor variable and estimate independent effects for each region. However, this results in rather wiggly estimates comparing neighbouring regions. Therefore, our goal is to estimate a smooth surface of the spatial effect also with regional data. Generally, the smoothed regional effects approach is motivated by Gaussian Markov random fields (GMRF; Rue and Held, 2005). The spatial surfaces based on the categorical covariates can be estimated as smooth effects
For estimation, the design matrix of the regional covariate
To get a smooth effect, differences between neighbouring regions are penalized with a penalty matrix
where
with
where
Expectile regression
The classical linear model is based on the assumption of homoscedasticity. If this assumption is violated, the standard estimators for the mean effects are still valid while the estimation of their uncertainty is not valid anymore. Moreover, we are then interested what the drivers for the heteroscedasticity are and how they influence the response. There are several models, that allow for heteroscedastic error terms and analyse the drivers, including GAMLSS (Rigby and Stasinopoulos, 2005) and quantile regression (Koenker and Bassett, 1978). As discussed in the introduction, we will apply expectile regression as introduced by Newey and Powell (1987) in this article. For the introduction of expectile regression, we start with only linear effects in Section 3.1, before we introduce semiparametric expectile regression in Section 3.2.
Classical expectile regression
In ordinary least squares models, the sum of squared residuals should be minimized with respect to the regression coefficients. Taking the derivative of the least squares equation with respect to these coefficients and setting it to zero yields the least squares estimate. Furthermore, rearranging the derivative with respect to the intercept shows that the sum of the residuals must be 0. Thus, the solution will be the predictor, where the sum of the residuals above and below are equal which corresponds to the centre, of gravity in a physical interpretation. In expectile regression, the emphasis is now put on outer parts of the distribution to detect variation of the effects beyond the mean. Therefore, in the least squares equation, a weight
where the weights are defined as
Thereby the predictor
The solution for Equation (3.1) can be written in matrix notation as
where
In Section 3.1, we defined expectile regression for linear predictors. In order to gain more flexibility, semiparametric predictors are useful. Therefore, Schnabel and Eilers (2009) and Sobotka and Kneib (2012) introduced semiparametric expectile regression with the penalized LAWS criterion
To simplify notation,
In our application, we use the power (both with respect to memory requirement and computational speed) of the
proved to have good properties in Schnabel and Eilers (2009) as a criterion for the optimization. Moreover, it is straightforward to the classical optimization of smoothing parameters in the model based on homoscedastic Gaussian distributed errors (see Wood, 2017). In the earlier formula,
Due to the possibility to write P-splines as mixed models, the Schall algorithm (Schall, 1991) for selecting smoothing parameters was introduced to semiparametric expectile regression by Schnabel and Eilers (2009). However, the Schall algorithm only allows for one smoothing parameter per smooth term, since it needs a standard form in the penalty matrix (see Wood and Fasiolo 2017, p. 3 or Rodríguez-Álvarez et al.2015, p. 943). Thus, the Schall algorithm is not applicable in spatio-temporal models, where we have multiple smoothing parameters per smooth term. Alternatively, the generalized Fellner–Schall algorithm of Wood and Fasiolo (2017) can be adopted to semiparametric expectile regression by interpreting expectile regression as weighted linear regression. Moreover, the generalized Fellner–Schall algorithm allows for the estimation of smoothing parameters in interaction settings. Other approaches such as Lee et al.(2013), Rodríguez-Álvarez et al. (2015) and Rodríguez-Álvarez et al. (2018) also provide powerful extensions to the standard approach of Schall (1991) for interaction settings. These approaches could also be used in expectile regression due to the equivalence to weighted least squares. However, the Fellner–Schall algorithm for expectile regression can be implemented based on the memory saving algorithms of the
In the generalized Fellner–Schall algorithm, the smoothing parameters are optimized iteratively with the model fit. So the model is estimated given some smoothing parameters λ. Thus,
where
where
Alternatively to expectile regression, quantile regression (Koenker and Bassett, 1978; Koenker, 2005) can be used to estimate models beyond the mean. Therefore, in Equation (3.1), the
Applying the GCV to select smoothing parameters in semiparametric expectile regression with interactions is straightforward. Nevertheless the generalized Fellner–Schall algorithm has, to our knowledge, never been used in expectile regression before. Therefore, we provide in this section a small simulation study to compare the goodness of fit of both approaches.
The covariates
Here
Overall, 100 independent replications of expectile models with
with 15 B-spline basis functions in each direction. Since the smoothing parameters in the Fellner–Schall algorithm have to be sized neither too small nor too large, we restrict them to be larger than
For the central asymmetries, similar outputs for the optimization via GCV and generalized Fellner–Schall occur. Comparing more extreme asymmetries results in differences between the optimization methods. The Fellner–Schall algorithm has benefits in the interaction setting (
PMWSE of the simulation study
PMWSE of the simulation study
The distribution of temperatures in Germany motivated us for a spatio-temporal estimation beyond the mean. Therefore, we use data from the German weather service (DWD, 2017). The response variable is the daily mean temperature in
Location of the observation stations and marginal density of daily mean temperature
Location of the observation stations and marginal density of daily mean temperature
To model the spatial and temporal variation of the temperatures (in
for each asymmetry parameter
As discussed in Section 2.3, the effects
For the estimation, we apply penalized cubic B-splines with 15 basis functions for the spline of the year. Moreover, the spatio-temporal effect has 15 basis functions for the daily effect and 6 respectively 9 basis functions for the univariate spatial marginals. The spatial effect has only few basis functions in each direction to obtain reasonable computational times. Furthermore, with a higher number of basis functions, we obtain instabilities at the borders due to the small number of observation stations in certain regions close to the German border. Lee et al. (2013) proposes to use different number of basis function in nested designs. However, then we would assume that in the main effects more information is contained than in the interaction, which we do not assume in our model. For the estimation of the elevation effect, only 7 basis functions are applied to avoid arbitrary results due to gaps in the parameter space. The smoothing parameters are optimized via GCV (see Section 3.2). The results for optimizing the smoothing parameters via the generalized Fellner–Schall algorithm are similar and available on demand.
In Figure 3, the estimated main effects for elevation and day are displayed. We see some variation between the mean effects and the effects at the outer parts of the distribution. While for the low areas the curves are parallel, which indicates homo- scedasticity, the outer expectile curves diverge from the mean for higher altitudes. The difference looks rather small, but we are talking about
Main effects for elevation, day and year (including intercept)
For the main effect of the day in the year, as displayed in Figure 3 in the middle picture, we can detect that the lower expectiles have a wider range than the upper expectiles. This means, in winter the lower temperatures are often lower than expected and in summer the low temperatures are higher than expected. Moreover, the difference between the lower expectile and the upper expectile is in winter higher than in the summer. Thus, the expected variance of temperatures in winter is higher than in summer. Furthermore, the high temperatures in summer are not as high as they had to be if the underlying process was a homoscedastic Gaussian distribution.
By including the parameter year in the model, we control for varying effects in specific years. Additionally, we can check if we find some impact of the climate change in this rather small example. The estimated curve for the trend per year is also plotted in Figure 3 on the right. There we detect some small general increase in the temperatures, beyond the natural fluctuation.
The spatio-temporal result for
Prediction of the spatial effect for 31 January and 31 July, excluding elevation and intercept
To get a better impression on how the spatial effect varies with time, we plotted the temporal effect of four German cities in the supplementary material. Their locations are indicated in the spatial effect maps. Out of this figure, we conclude that in Cologne the winters are a lot warmer than in Munich or Berlin, while the summers are colder in Hamburg and warmer in Berlin. This differentiation is valid for all parts of the distribution, but the amplitude of the temperatures is smaller for the upper part of the distribution than for the lower part.
Additional to the application of the exact location of the observation stations, we apply the same data but with locations on a regional level. Therefore, we assign to each location its ‘Raumordnungsregion’ (BBSR, 2017). The ‘Raumordnungsregion’ is a special German classification of regions between the NUTS 2 and NUTS 3 level. The advantage of this grid is that the 96 regions have similar size and we have for each region at least one observation station (with region 506, Dortmund, as the single exception without an observation station which is therefore merged with region 509, Emscher-Lippe). To model the spatio-temporal trend based on grid data, we then use the model of Section 2.4, while the rest of the covariate information remained unchanged. Thus, the model is built as
As expected, the effects for year and elevation as well as the seasonal trend are similar in this grid approach, compared to the original model. Also the spatial patterns as displayed in Figure 5 show a large similarity to the spatial effect based on the exact locations of Figure 4. We still observe an east–west trend in winter and a north–south trend in summer. Furthermore, the variation for colder winter is higher than for warmer winters.
Prediction of the regional effect for 31 January and 31 July, excluding elevation and intercept
All code and data to reproduce the analysis of this article are included in the online supplementary material.
In this article, we present spatio-temporal effects for expectile regression. Spatio-temporal modelling with interaction terms of P-splines is nowadays a well established method. However, usually the data are analysed with some pre-specified parametric distribution and the errors are assumed to be homoscedastic. Contrarily, in expectile regression no assumption on the distribution of the data is applied. Moreover, expectile regression is able to take heteroscedasticity of the data into account. Based on the idea of weighted least squares, spatio-temporal expectile regression can be applied with the help of tools from standard least squares regression. So it is a natural extension of the standard approaches. Furthermore, expectile regression can be used to check whether the homoscedasticity assumption is valid. Therefore, we check if all effects are equal for a grid of asymmetry parameters. This is similar to the test of Newey and Powell (1987). Our analysis showed that the assumption of homoscedasticity is not necessarily fulfilled for the temperature and the application of expectile regression is necessary. Thus, the effect of elevation varies for the different parts of the distribution and there are some regions where the spatial effect of the outer expectiles varies from the mean effect.
In our example, we do not have crossing expectiles. However, due to numerical issues they might happen, in particular when considering a dense grid of asymmetries. Thus, there exist several ideas to prevent crossing expectiles (see Schulze Waltrup et al., 2015 and citations therein). These ideas could also be transferred to spatio-temporal expectile regression, but are beyond the scope of this article. Alternatively to the analysis of temperatures in Germany, other meteorological parameters such as the amount of rain or the duration of sunshine could also be analysed and are likely to have effects beyond the mean. However, several of these parameters feature a spike in zero. If we would like to analyse these parameters with expectile regression, we would need to build a new type of model. First, the zeros need to be considered with, for example, a hurdle model (Mullahy, 1986). Second, in the following expectile regression the target set needs to be considered to prevent predicted negative expectiles. This could be achieved, for example, by including a link function around the classical expectile model. However, a fixed link function would impose a distributional assumption which is undesired in expectile regression. Thus, the link function should be estimated jointly with the covariate effects. GAM with flexible response function have, for example, been considered in Spiegel et al. (2019), where we modified the approach of Muggeo and Ferrara (2008) to also include smooth effects in the predictor. Extending this to semiparametric expectile regression is straightforwardly possible, but beyond the scope of this article and left for further research. Alternative approaches to deal with zero-inflation in spatio-temporal models (Umlauf et al., 2016 for example) usually deal with distributional assumptions which we avoid in expectile regression.
Supplementary materials
Supplementary materials for this article including all code and further graphics are available from
Acknowledgements
The comments of the editor and the referees are thankfully acknowledged due to their substantial improvement of the article. Furthermore, we thank Nikolaus Umlauf for providing code and material of his articles.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article:We also acknowledge financial support by the German Research Foundation (DFG), grant KN 922/4-2.
