Abstract
Testing for collinearity continues to be a controversial issue in the literature. Multicollinearity detection criteria, such as the variance inflation factor, often fail to detect the true extent of multicollinearity. In this article, we propose utilizing the Bayesian approach as an attractive alternative. Under the Bayesian approach, we recommend comparing the marginal posterior of regression parameters under two different priors. If the difference in the posterior under these two priors is pronounced, one can surmise that collinearity is harmful. The Kolmogorov–Smirnov test can also be used as further evidence to confirm whether the posterior difference is significant.
Introduction
Multicollinearity continues to be a major issue in the hospitality and tourism literature (Assaf et al., 2019; Dogru et al., 2017; Morley, 2009; Olya & Mehran, 2017). Recent studies have shown that multicollinearity detection criteria such as the variance inflation factor (VIF) or condition index (commonly used in the tourism literature) cannot be blindly trusted and often fail to detect the extent of multicollinearity in the data (Kalnins, 2018). Unfortunately, with traditional estimation methods such as ordinary least squares (OLS) and maximum likelihood, researchers do not have a variety of other options and must rely on such detection criteria.
In this article, we propose a Bayesian approach to testing for collinearity. In tourism and other related literatures, the Bayesian approach has grown in popularity in recent years (Assaf et al., 2020; Assaf & Tsionas, 2019, 2020; Tsionas & Assaf, 2020). Not only does the Bayesian approach provide more robust tools for hypothesis testing and rich diagnostics for models and parameters, but it also enables one to examine the marginal posterior density of parameters given the data at hand. These advantages allow us to examine the extent to which collinearity is harmful to the estimation of regression models. We elaborate on this issue using real and simulated data. We also suggest the Kolmogorov–Smirnov test to assess the difference in posterior densities under harmful and nonharmful collinearity.
Testing for Collinearity
Here, we do not intend to give an overview of Bayesian analysis, as this has been discussed in detail in some recent papers in the field (e.g., Assaf et al., 2020; Assaf & Tsionas, 2020). Basically, for a model with parameters
The denominator is also
One can build on this specific advantage of the Bayesian approach. Whenever multicollinearity is suspected, we recommend the following two-step approach. First, compare the marginal posterior of each regression parameter under two different priors, and check whether it changes markedly from one prior to the other. If it does, multicollinearity may be harmful. Second, after visual inspection of the marginal posteriors, use the Kolmogorov–Smirnov test to check whether the corresponding univariate marginal posteriors are the same. If the null hypothesis of equality is rejected, this provides further evidence that the marginal posteriors differ under the two priors and that multicollinearity is harmful.
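The two-step approach can be sketched in code. The following is a minimal illustration, not the specification used in this article: it assumes a linear model with known error variance, a flat prior versus a ridge-type prior beta ~ N(0, tau2 * I), and a hypothetical common-factor design for the regressors. The names `simulate_design`, `posterior_draws`, and `ks_pvalues`, and the values of `rho` and `tau2`, are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp


def simulate_design(n, k, rho, rng):
    """Hypothetical design: every regressor loads on one common factor."""
    factor = rng.standard_normal((n, 1))
    noise = rng.standard_normal((n, k))
    return rho * factor + np.sqrt(1.0 - rho**2) * noise


def posterior_draws(X, y, tau2=None, sigma2=1.0, n_draws=5000, seed=0):
    """Posterior draws of beta for y = X beta + e, e ~ N(0, sigma2 * I).

    tau2=None gives a flat prior (posterior centred on the OLS estimate);
    a finite tau2 gives the ridge-type prior beta ~ N(0, tau2 * I).
    sigma2 is treated as known, purely to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    shrink = 0.0 if tau2 is None else sigma2 / tau2
    precision = X.T @ X + shrink * np.eye(k)
    mean = np.linalg.solve(precision, X.T @ y)
    cov = sigma2 * np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # enforce exact symmetry before sampling
    return rng.multivariate_normal(mean, cov, size=n_draws)


def ks_pvalues(X, y, tau2=10.0):
    """Step 2: one KS test per coefficient, flat-prior vs ridge-prior draws."""
    flat = posterior_draws(X, y, tau2=None, seed=1)
    ridge = posterior_draws(X, y, tau2=tau2, seed=2)
    return np.array(
        [ks_2samp(flat[:, j], ridge[:, j]).pvalue for j in range(X.shape[1])]
    )


rng = np.random.default_rng(42)
beta_true = np.ones(3)
p_values = {}
for label, rho in [("extreme", 0.9999), ("none", 0.0)]:
    X = simulate_design(n=200, k=3, rho=rho, rng=rng)
    y = X @ beta_true + rng.standard_normal(200)
    p_values[label] = ks_pvalues(X, y)
    print(label, "collinearity, minimum KS p value:", p_values[label].min())
```

In this sketch the minimum p value across coefficients serves as the single summary. Step 1, the visual comparison, would amount to plotting histograms or kernel density estimates of the two sets of draws for each coefficient.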
Illustration
Simulation
To illustrate, we first provide an example from simulated data. Our data-generating process is as follows:
where the common factor
Moreover,
where
We consider two priors for
where
where
where
From these expressions it is clear that the informative prior (Equation 4) acts pretty much like ridge regression, while the flat prior (Equation 3) acts like OLS. As mentioned, we study the difference between the marginal posteriors under extreme collinearity (
Certain marginal posteriors are reported in Figure 1. Evidently, the marginal posteriors are quite different under extreme collinearity (Panel a), but the differences disappear in Panel (b) in which we used

Figure 1. Marginal Posterior Densities
The marginal posterior densities of the differences between the parameters under the two priors are reported in Panels (a), (c), and (e) of Figure 2. In Panels (b), (d), and (f), we report the median absolute differences between the two posterior means across all Markov chain Monte Carlo draws. We used a Gibbs sampler with 15,000 passes and discarded the first 5,000 to remove the impact of possible start-up effects.

Figure 2. Marginal Posterior Densities of Differences of Parameters
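A sampling scheme of this kind (15,000 passes, with the first 5,000 discarded) can be sketched as follows. This is only a sketch under stated assumptions: it uses a standard two-block Gibbs sampler for a linear regression with prior p(sigma2) proportional to 1/sigma2 and either a flat or a N(0, tau2 * I) prior on beta, which may differ from the priors in Equations 3 and 4. The function `gibbs_linear`, the simulated data, and the way differences between draws are summarized are all hypothetical.

```python
import numpy as np


def gibbs_linear(X, y, tau2=None, n_iter=15_000, burn_in=5_000, seed=0):
    """Gibbs sampler for y = X beta + e, e ~ N(0, sigma2 * I).

    Assumed priors: p(sigma2) proportional to 1/sigma2, and either a flat
    prior on beta (tau2=None) or beta ~ N(0, tau2 * I).
    Returns the beta draws retained after burn-in.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    prior_prec = np.zeros((k, k)) if tau2 is None else np.eye(k) / tau2
    sigma2 = 1.0
    draws = np.empty((n_iter, k))
    for it in range(n_iter):
        # beta | sigma2, y is normal with precision X'X / sigma2 + prior precision
        prec = XtX / sigma2 + prior_prec
        mean = np.linalg.solve(prec, Xty / sigma2)
        chol = np.linalg.cholesky(prec)  # prec = L L'
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(k))
        # sigma2 | beta, y is inverse gamma, drawn as SSR / chi-square(n)
        resid = y - X @ beta
        sigma2 = (resid @ resid) / rng.chisquare(n)
        draws[it] = beta
    return draws[burn_in:]


# Hypothetical data with near-perfect collinearity through a common factor.
rng = np.random.default_rng(7)
n, k, rho = 200, 3, 0.9999
X = rho * rng.standard_normal((n, 1)) + np.sqrt(1 - rho**2) * rng.standard_normal((n, k))
y = X @ np.ones(k) + rng.standard_normal(n)

flat = gibbs_linear(X, y, tau2=None, seed=1)
ridge = gibbs_linear(X, y, tau2=10.0, seed=2)
diff = flat - ridge  # draw-by-draw differences between the two chains
print("retained draws:", flat.shape[0])
print("median |difference| per coefficient:", np.median(np.abs(diff), axis=0))
```

Pairing draws from two independent chains is one possible reading of how differences between the posteriors could be summarized; other pairings or summaries are equally defensible.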
An important question is whether, besides visual inspection of the posterior distributions, a single measure can inform the user about the possibility of harmful collinearity. For Panel (a) of Figure 1, we applied the Kolmogorov–Smirnov test of whether the univariate marginal posteriors corresponding to Equations 3 and 4 are the same. The p value of the test was zero, indicating rejection of the null that the univariate marginal posteriors are the same. For Panel (b), the Kolmogorov–Smirnov test does not reject the null, and the minimum p value (which can be employed more generally as a harmful collinearity test) was .0153. Finally, when
The differences between parameters corresponding to the flat and informative priors in Figure 2 are more pronounced under extreme collinearity (Panels a and b) but tend to become smaller in moderate and absent collinearity (Panels c-f).
This is especially true of Panels (b), (d), and (f), which can be particularly useful for empirical researchers. So, the practical outcome is that harmful collinearity can be diagnosed by formal Bayesian analysis and histograms of median differences (as in Panels b, d, and f). More details and further results are provided in the appendix.
Hospitality Application
We also illustrate our proposed approach with an application to U.S. hotels. The regression relates room revenue to the following covariates: room expenses, food and beverage expenses, utility expenses, marketing expenses, property and maintenance expenses, and number of rooms. We obtained all data from Smith Travel Research. The sample consists of 78 U.S. hotels observed over the years 2012–2016, for a total of 390 observations. The correlation matrix for all variables included in the model (Table 1) clearly indicates a high degree of collinearity. We therefore expect the marginal posteriors under the flat (OLS-type) prior and the ridge-type prior to show important differences.
Table 1. Correlation Matrix
The results for the regressors in our model are presented in Figure 3. As expected, the differences between the marginal posteriors under the flat and informative priors are pronounced, indicating multicollinearity problems. The two priors even produce different signs for some of the coefficients. We also applied the Kolmogorov–Smirnov test of whether the univariate marginal posteriors are the same. The p value of the test was less than .01 for all coefficients, indicating rejection of the null that the univariate marginal posteriors are the same.

Figure 3. Marginal Posterior Densities of Parameters Under Informative and Flat Priors
Concluding Remarks
The Bayesian approach continues to grow in popularity in tourism research. In this article, we exploited the advantages this approach offers in testing for collinearity. As one can directly observe the marginal posterior of regression parameters under the Bayesian approach, we recommend comparing this marginal posterior under two different priors. We showed that the two posteriors converge under nonharmful collinearity but differ substantially under harmful collinearity. Along with direct visual inspection, we also recommend the Kolmogorov–Smirnov test to confirm whether the difference between the marginal posteriors is significant.
Appendix
Finally, we report, in Figure A2, the spectral measure $\lambda_{\min}/\lambda_{\max}$, namely, the ratio of the minimum to the maximum eigenvalue of the cross-products matrix $X'X$.
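The eigenvalue ratio described here is straightforward to compute. The sketch below is illustrative only: the function name `eigenvalue_ratio` and the common-factor design are assumptions, and no rescaling of the regressors is applied, although the ratio does depend on their scale.

```python
import numpy as np


def eigenvalue_ratio(X):
    """Ratio of the smallest to the largest eigenvalue of X'X.

    Values near 1 indicate nearly orthogonal regressors of similar scale;
    values near 0 indicate a nearly singular cross-products matrix.
    """
    eigvals = np.linalg.eigvalsh(X.T @ X)  # returned in ascending order
    return eigvals[0] / eigvals[-1]


# Hypothetical common-factor design with increasing collinearity.
rng = np.random.default_rng(0)
factor = rng.standard_normal((500, 1))
noise = rng.standard_normal((500, 3))
ratios = {}
for rho in (0.0, 0.9, 0.9999):
    X = rho * factor + np.sqrt(1 - rho**2) * noise
    ratios[rho] = eigenvalue_ratio(X)
    print(f"rho = {rho}: eigenvalue ratio = {ratios[rho]:.6f}")
```

The ratio shrinks toward zero as the regressors load more heavily on the common factor, which is the behavior such a spectral measure is meant to capture.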
