Bayesian semiparametric density ratio modelling with applications to medical malpractice reform

Abstract

This study examines the efficacy of tort reforms instituted throughout the country during the last decade, improving upon existing semiparametric density ratio estimation (DRE) methodologies in the process. DRE is a well-known semiparametric modelling technique that has been used for well over two decades. Although the approach has been demonstrated to be extremely useful in statistical modelling, it has suffered from one main limitation—the methodology has thus far not been capable of modelling individual-level heterogeneity. We address this issue by presenting a novel adaptation of DRE to model individual level heterogeneity. We do so by marginalizing the associated empirical likelihood function involving density ratios to provide an overall distribution of the entire population despite having extremely limited initial information about each individual in the dataset. We apply this approach to medical malpractice loss data from the previous decade to quantify the probability of changes in tort losses. Our results demonstrate the success of a number of recently implemented malpractice reforms. Comparisons to existing DRE methods, as well as standard regression methods, illustrate the efficacy of our approach.

Keywords

Bayesian computation; density ratio estimation; medical malpractice reform; semiparametric modelling; tort reform

1 Introduction

1.1 Tort reform

In fields ranging from obstetrics to surgery to ophthalmology, medical doctors all across the United States face the risk of being unnecessarily sued. These risks often raise health care costs by forcing doctors to engage in defensive medicine (Crain et al., 2009; Kessler and McClellan, 1996; Studdert et al., 2005). These concerns have made reforming the civil justice system as it applies to doctors a top priority among many policy-makers on both the left and right sides of the aisle (Jones, 2010, March 5).

Such reforms to the civil justice system are known as medical malpractice reforms. Belonging to a larger class of reforms known as tort reform, medical malpractice reform seeks to fundamentally transform important aspects of the civil justice system to preserve its integrity. Some approaches to tort reform include imposition of monetary caps, which limit the amount of money that a jury may award a plaintiff. These caps may apply to punitive damages, monetary damage awards, non-economic damages or appeal bonds. Other approaches have been to reform certain aspects of the civil justice system such as statutes of limitations and class action lawsuits among others.

In this study, we estimate the impact of malpractice reforms over the course of the last decade by computing probabilities regarding tort losses. This question matters because policy-makers often wonder about the efficacy of reforms that have been implemented. Estimation of the associated probabilities, however, is not elementary, as public policy research has clearly illustrated that different states exhibit different degrees of litigiousness (American Tort Reform Association, 2014; McQuillan and Abramyan, 2008a; McQuillan and Abramyan, 2010). Texas and Alaska have been demonstrated, for instance, to be substantially less litigious than other states such as California and New York (McQuillan and Abramyan, 2010). Additionally, different areas within states (such as New York City versus western upstate New York) may also exhibit a certain degree of heterogeneity. For example, the American Tort Reform Association has recently published a study titled ‘Judicial Hellholes 2014–2015’ discussing localities ‘where judges in civil cases systematically apply laws and court procedures in an unfair and unbalanced manner’. They argue that although ‘entire states are occasionally cited as Hellholes, specific counties or courts in a given state more typically warrant such citations’ (American Tort Reform Association, 2014). Although one study quantitatively looks at the impact of tort reforms via linear regression methods (Crain et al., 2009), it ignores this heterogeneity. We improve semiparametric density ratio estimation (DRE) methodologies to model this heterogeneity and compare this new approach to existing semiparametric density ratio methods as well as to standard regression methods.

1.2 Bayesian parametric methods

Given the dramatic improvements in statistical computing power over the course of the last few decades, incorporating heterogeneity in parametric models has become increasingly common in statistical modelling. Statisticians can now choose from a variety of methodologies ranging from Bayesian methods to finite mixture methods to combinations of the two (Kamakura and Russell, 1989; Kyung et al., 2011; Lenk and DeSarbo, 2000; Rossi and Allenby, 2003).

Modelling individual-level heterogeneity is important in understanding real-world phenomena as there is often no a priori reason to believe that all individuals (or observations) in a dataset behave in the same manner. However, with data sets often possessing limited information about each individual, it is often impossible to estimate models incorporating heterogeneity from a frequentist perspective. The Bayesian approach enables the researcher to assume that individual-level parameters follow a lower dimensional probability distribution from which statistical inferences can be made (Gelfand, 1996; Morris, 1983).

In this study, we apply an empirical Bayesian approach to a semiparametric methodology used thus far only for frequentist statistical modelling. Our approach allows these models to accommodate individual-level heterogeneity, thus enabling the researcher to make direct statistical inference about the overall population. We discuss this semiparametric methodology in the following section.

1.3 Density ratio estimation

DRE methods have been around for decades. In 1997, following Prentice and Pyke (1979) and others, Qin and Zhang (1997) proposed the idea of making assumptions about the ratios of probability densities (referred to as tilts) based on subsamples within the data sets. Such density ratios have been looked at extensively in machine learning research. Some approaches have utilized kernel-based methods among other methods to estimate the actual ratio, known as the importance, between the two densities. These methods are useful in covariate shift adaptation as well as outlier detection (Kanamori et al., 2009; Sugiyama et al., 2008).

DRE itself has had myriads of applications in statistical research (Owen, 1988; Prentice and Pyke, 1979; Qin and Zhang, 1997). In 1999, Gilbert et al. improved on DRE's methodologies and applied these improvements to understand the efficacy of human immunodeficiency virus (HIV) vaccine trials. In 2005, Qin and Zhang posited that the tilt functions have an exponential functional form (Qin and Zhang, 2005). Three years later, Kedem et al. (2008) applied the methodology to time series analysis (Gilbert et al., 1999; Kedem et al., 2008). A variety of studies in recent research have used DRE to understand the distributional properties of data. For example, in 2009, Kedem et al. utilized DRE to understand risk factors regarding cancer. They compared case and control groups to estimate the probabilities of particular risk factors influencing the incidence of cancer. Voulgaraki et al. (2012) extended this research to understand the distributional properties of height, weight and age on cancer patients. They utilized these distributions to estimate conditional expectations of weight, given height and age of patients. Subsequently, in 2014, Kedem et al. utilized DRE to understand food contamination. Having data about food contamination, the authors examined the probabilities of these contaminants exceeding critical thresholds to determine the risk of contracting particular illnesses. In addition to these studies, there have been many other studies that have utilized the semiparametric benefits of the DRE approach (Fokianos, 2004; Fokianos et al., 2001; Kedem et al., 2009; Phue et al., 2007; Prentice and Pyke, 1979).

Fokianos and Qin (2008) applied importance sampling in conjunction with DRE, which necessitated the generation of artificial data (Fokianos and Qin, 2008). Prior to the Fokianos and Qin study, however, all research utilizing DRE was based on within-sample data. With data divided into smaller subsets, comparisons would be made between these sets. A number of recent studies proposed an innovative adaptation of DRE known as ‘out of sample fusion’. Using the idea of having a single primary dataset as a reference, and an additional artificial (possibly simulated) dataset, the studies have shown that more accurate inferences can be made regarding the primary dataset by applying DRE to both the samples (Katzoff et al., 2014; Kedem et al., 2014; Zhou, 2012). Dayaratna (2014) found that this methodology can be particularly useful for analyzing Bayesian posterior samples (Dayaratna, 2014).

Multi-task learning is another field related to density ratio modelling. The primary idea behind multi-task learning is to understand multiple tasks collectively with the goal of improving classification accuracy or the performance of an existing task. Bickel et al. (2008), for example, used multi-task learning to forecast the outcome of therapy attempts for HIV patients with certain genetic properties.

Although a few studies have utilized Bayesian methods to look at these types of problems, no research, to our knowledge, has merged the semiparametric DRE method with Bayesian methods for modelling individual-level heterogeneity to understand overall distributional properties of the population (Lazar, 2003; Mengersen et al., 2013; Schennach, 2005; Yang et al., 2012). We address this limitation in this study. Specifically, we adapt a Bayesian approach to the semiparametric DRE method to model individual-level heterogeneity. We apply this methodology to a dataset used in a tort reform study performed by the Pacific Research Institute to understand changes in per capita tort losses throughout the country (Crain et al., 2009). Our focus is, therefore, on understanding the overall distributional properties of per capita tort losses, rather than making individual-level predictions as done in Bickel et al. (2008). As a result, we direct our attention away from developing techniques to estimate the actual density ratio (as described in many papers; see, for example, Sugiyama et al., 2008) and instead towards understanding overall distributional properties. We compare our approach to existing density ratio methods that ignore heterogeneity as well as standard regression methods.

2 Problem formulation

Suppose that we are given a dataset consisting of $i = 1, \dots, I$ samples, providing us with the following $P$-dimensional vectors $x_{i, j} = (x_{i, j, 1}, x_{i, j, 2}, \dots, x_{i, j, P})$ with $j = 1, \dots, n_{i}$ observations within the $i^{th}$ sample and $\sum_{i = 1}^{I} n_{i} = N$. Let $M = I + 1$ and define probability density functions $g_{i}$ such that

x_{i, j} \sim g_{i} .

(2.1)

Furthermore, we can also define $g_{M} \equiv g$ as our reference probability density function, describing another sample of size $n_{M}$, assuming that the densities $g_{i}$ satisfy the following relationship with respect to this reference regarding their ratios:

\frac{g_{i} (x)}{g (x)} = w (θ_{i}, x),

(2.2)

where $θ_{i}$ is a vector to be estimated. Assume exponential tilt functions, defining $w (θ_{i}, x_{i, j}) \equiv e^{α_{i} + β_{i}' h (x_{i, j})}$, where we are currently assuming $h : R^{P} \to R^{P}$. Additionally, let $α_{i} \sim N (μ_{α}, 1)$ and $β_{i} \sim N (μ_{β}, Σ_{β})$, where $β_{i}' = (β_{i, 1}, \dots, β_{i, P})$, be our heterogeneity distributions.¹

For example, one potential choice for $h$ is $h (x) = x$ as in Voulgaraki et al. (2012).

By permitting the model's coefficients to vary for every sample, the model has the ability to capture ‘sample-level’ (or individual-level if each sample represents an individual) heterogeneity.²

We assume the above parameterization for $α_{i}$ (with a constant variance) to ensure statistical identifiability of the model after marginalization.
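To make the tilt in (2.2) concrete, the following worked case may help. It assumes, purely for illustration, a scalar observation ($P = 1$), the choice $h (x) = x$ and a standard normal reference density $g$; none of these choices is required by the model. Under those assumptions an exponential tilt simply shifts the mean of the reference:

```latex
% Illustrative assumptions: P = 1, h(x) = x, g = N(0, 1).
g_{i} (x) = e^{α_{i} + β_{i} x} \, g (x)
  = \frac{1}{\sqrt{2 π}} \exp \left( α_{i} + β_{i} x - \frac{x^{2}}{2} \right)
  = \exp \left( α_{i} + \frac{β_{i}^{2}}{2} \right)
    \frac{1}{\sqrt{2 π}} \exp \left( - \frac{(x - β_{i})^{2}}{2} \right)
```

In this special case $g_{i}$ integrates to one exactly when $α_{i} = - β_{i}^{2} / 2$, and $g_{i}$ is then the $N (β_{i}, 1)$ density.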

We begin by assuming that $Σ_{β}$ is a diagonal matrix and, hence, that the random variables $β_{i, p}$ are statistically independent of each other. This assumption is quite reasonable for our particular application, which we discuss further in Section 3.2. Additionally, we make the assumption that $n_{i} = 1 \; \forall i = 1, \dots, I$ and $n_{M} = N$.³

We make this assumption so that our two samples are identical in size, although it is not necessarily a requirement that they be equal. If they are not equal, then the estimation becomes more dependent on the sample of greater size.

2.1 Bayesian density ratio estimation

Let $G (x) = G_{I + 1} (x)$ be our reference CDF (cumulative distribution function) and let $p_{ij} = dG (x_{i, j}) = {dG}_{I + 1} (x_{i, j})$ . We can utilize the method of constrained empirical likelihood and estimate $g_{i}$ and $θ_{i}$ as follows. The empirical likelihood function, based on our pooled data $x_{ij}$ is

L (θ, G_{M}) = \prod_{i = 1}^{M} \prod_{j = 1}^{n_{i}} p_{ij} \prod_{i = 1}^{I} \prod_{j = 1}^{n_{i}} e^{α_{i} + β_{i}' h (x_{i, j})},

(2.3)

where $θ = (α_{1}, \dots, α_{I}, β_{1, 1}, \dots, β_{I, P})$. As stated above, we make the assumption that $n_{i} = 1 \; \forall i = 1, \dots, I$ and $n_{M} = N = I$. We can marginalize the empirical likelihood function by integrating the empirical likelihood against the heterogeneity distributions as follows:

\begin{matrix}
ML (μ_{α}, μ_{β}, Σ_{β}, G_{M}) & = & \int_{- \infty}^{+ \infty} \dots \int_{- \infty}^{+ \infty} L (θ, G_{M}) \prod_{i = 1}^{I} \frac{1}{\sqrt{2 π}} e^{- \frac{(α_{i} - μ_{α})^{2}}{2}} \\
 & & \cdot \prod_{p = 1}^{P} \frac{1}{\sqrt{2 π σ_{β_{p}}^{2}}} e^{- \frac{(β_{i, p} - μ_{β_{p}})^{2}}{2 σ_{β_{p}}^{2}}} \, d α_{i} \, d β_{i, p} \\
 & = & \int_{- \infty}^{+ \infty} \dots \int_{- \infty}^{+ \infty} \prod_{i = 1}^{M} \prod_{j = 1}^{n_{i}} p_{ij} \prod_{i = 1}^{I} \prod_{j = 1}^{n_{i}} e^{α_{i} + β_{i}' h (x_{i, j})} \frac{1}{\sqrt{2 π}} e^{- \frac{(α_{i} - μ_{α})^{2}}{2}} \\
 & & \cdot \prod_{p = 1}^{P} \frac{1}{\sqrt{2 π σ_{β_{p}}^{2}}} e^{- \frac{(β_{i, p} - μ_{β_{p}})^{2}}{2 σ_{β_{p}}^{2}}} \, d α_{i} \, d β_{i, p} \\
 & = & \prod_{i = 1}^{M} \prod_{j = 1}^{n_{i}} p_{ij} \prod_{i = 1}^{I} \prod_{j = 1}^{n_{i}} \int_{- \infty}^{+ \infty} \dots \int_{- \infty}^{+ \infty} e^{α_{i} + β_{i}' h (x_{i, j})} \frac{1}{\sqrt{2 π}} e^{- \frac{(α_{i} - μ_{α})^{2}}{2}} \\
 & & \cdot \prod_{p = 1}^{P} \frac{1}{\sqrt{2 π σ_{β_{p}}^{2}}} e^{- \frac{(β_{i, p} - μ_{β_{p}})^{2}}{2 σ_{β_{p}}^{2}}} \, d α_{i} \, d β_{i, p} \\
ML (μ_{α}, μ_{β}, Σ_{β}, G_{M}) & = & \prod_{i = 1}^{M} \prod_{j = 1}^{n_{i}} p_{ij} \prod_{i = 1}^{I} \prod_{j = 1}^{n_{i}} e^{μ_{α} + \frac{1}{2}} e^{{μ_{β}}^{'} h (x_{i, j}) + \frac{1}{2} {h (x_{i, j})}^{'} Σ_{β} h (x_{i, j})} .
\end{matrix}

(2.4)
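The last equality in (2.4) rests on the moment generating function of the normal distribution. As a sketch for a single coefficient, writing $h_{p}$ for the $p$-th component of $h (x_{i, j})$ (shorthand introduced here, not in the original notation) and completing the square in the exponent:

```latex
\int_{- \infty}^{+ \infty} e^{α_{i}} \frac{1}{\sqrt{2 π}} e^{- \frac{(α_{i} - μ_{α})^{2}}{2}} \, d α_{i}
  = e^{μ_{α} + \frac{1}{2}} \int_{- \infty}^{+ \infty} \frac{1}{\sqrt{2 π}} e^{- \frac{(α_{i} - μ_{α} - 1)^{2}}{2}} \, d α_{i}
  = e^{μ_{α} + \frac{1}{2}} ,
\\
\int_{- \infty}^{+ \infty} e^{β_{i, p} h_{p}} \frac{1}{\sqrt{2 π σ_{β_{p}}^{2}}} e^{- \frac{(β_{i, p} - μ_{β_{p}})^{2}}{2 σ_{β_{p}}^{2}}} \, d β_{i, p}
  = e^{μ_{β_{p}} h_{p} + \frac{1}{2} σ_{β_{p}}^{2} h_{p}^{2}} .
```

Multiplying the $P$ factors for $β_{i, 1}, \dots, β_{i, P}$ gives $e^{{μ_{β}}^{'} h + \frac{1}{2} h^{'} Σ_{β} h}$ when $Σ_{β}$ is diagonal, which is the final line of (2.4).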

Equation (2.4) gives the following result:

Result 2.1 (Marginalized empirical likelihood for the density ratio model under normal heterogeneity distributions) The marginalized log-likelihood function of Equation (2.3), assuming normal heterogeneity distributions, is provided by

\begin{matrix}
LL (μ_{α}, μ_{β}, Σ_{β}, G_{M}) & = & \log ML (μ_{α}, μ_{β}, Σ_{β}, G_{M}) \\
 & = & \sum_{i = 1}^{M} \sum_{j = 1}^{n_{i}} \log p_{ij} + \sum_{i = 1}^{I} \sum_{j = 1}^{n_{i}} \left( μ_{α} + \frac{1}{2} + {μ_{β}}^{'} h (x_{i, j}) + \frac{1}{2} {h (x_{i, j})}^{'} Σ_{β} h (x_{i, j}) \right) .
\end{matrix}

This result is simply due to taking the logarithm of Equation (2.4).
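The closed form in Result 2.1 can be checked numerically. The sketch below, in plain Python, compares the marginalized tilt with a Monte Carlo average over simulated heterogeneity draws for a diagonal $Σ_{β}$. The function names and the parameter values are hypothetical and chosen only to exercise the identity.

```python
import math
import random


def marginal_log_tilt(h, mu_alpha, mu_beta, sigma2_beta):
    """Log of the marginalized tilt in (2.4) for a diagonal Sigma_beta.

    h, mu_beta and sigma2_beta are equal-length lists (one entry per
    component of h(x)); sigma2_beta holds the diagonal of Sigma_beta.
    """
    linear = sum(m * hp for m, hp in zip(mu_beta, h))
    quadratic = 0.5 * sum(s2 * hp * hp for s2, hp in zip(sigma2_beta, h))
    return mu_alpha + 0.5 + linear + quadratic


def monte_carlo_tilt(h, mu_alpha, mu_beta, sigma2_beta, draws=200_000, seed=0):
    """Average exp(alpha + beta'h) over simulated heterogeneity draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        exponent = rng.gauss(mu_alpha, 1.0)  # alpha_i ~ N(mu_alpha, 1)
        for m, s2, hp in zip(mu_beta, sigma2_beta, h):
            # beta_{i,p} ~ N(mu_beta_p, sigma2_beta_p), independent across p
            exponent += rng.gauss(m, math.sqrt(s2)) * hp
        total += math.exp(exponent)
    return total / draws


# Hypothetical values, not estimates from the paper.
h_x = [0.5, -1.0]
closed_form = math.exp(marginal_log_tilt(h_x, -1.0, [0.2, 0.1], [0.3, 0.2]))
simulated = monte_carlo_tilt(h_x, -1.0, [0.2, 0.1], [0.3, 0.2])
```

The two quantities should agree up to simulation error, which shrinks as `draws` grows.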

We can maximize the above marginalized likelihood subject to constraints analogous to those used in Voulgaraki et al. (2012):

p_{ij} \geq 0, \quad \sum_{i = 1}^{M} \sum_{j = 1}^{n_{i}} p_{ij} = 1 \quad \text{and} \quad \sum_{i = 1}^{M} \sum_{j = 1}^{n_{i}} p_{ij} e^{μ_{α} + \frac{1}{2} + {μ_{β}}^{'} h (x_{i, j}) + \frac{1}{2} {h (x_{i, j})}^{'} Σ_{β} h (x_{i, j})} = 1 .^{4}

(2.5)

⁴ This constraint follows from integrating both sides of the constraint imposed in Voulgaraki et al. (2012), $\sum_{i = 1}^{M} \sum_{j = 1}^{n_{i}} p_{ij} e^{α_{k} + β_{k}' h (x_{i, j})} = 1$, over the heterogeneity distributions $F$, which gives $\int \sum_{i = 1}^{M} \sum_{j = 1}^{n_{i}} p_{ij} e^{α_{k} + β_{k}' h (x_{i, j})} \, dF = \int 1 \, dF \; \forall k = 1, \dots, I$.

The optimization of the empirical likelihood function can be performed numerically to estimate $μ_{α}$, $μ_{β}$ and $Σ_{β}$.
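The text does not say which optimizer was used. One common route for two-sample density ratio models with equal sample sizes, assumed here rather than taken from the source, is to profile out the $p_{ij}$, which leaves a concave objective of logistic-regression form in the tilt coefficients. The sketch below does this for scalar data with $h (x) = x$, so the log of the marginalized tilt is $a + b x + c x^{2}$ with $a = μ_{α} + \frac{1}{2}$, $b = μ_{β}$ and $c = σ_{β}^{2} / 2$. The function names, step size, iteration count and simulated data are all illustrative.

```python
import math
import random


def features(x):
    # Log of the marginalized tilt for scalar x with h(x) = x:
    # a + b*x + c*x^2, where a = mu_alpha + 1/2, b = mu_beta, c = sigma2_beta / 2.
    return (1.0, x, x * x)


def linear_predictor(theta, x):
    return sum(t * f for t, f in zip(theta, features(x)))


def profile_loglik(theta, sample, reference):
    """Profile log empirical likelihood, up to a constant, for equal sample sizes."""
    value = sum(linear_predictor(theta, x) for x in sample)
    for x in list(sample) + list(reference):
        value -= math.log1p(math.exp(linear_predictor(theta, x)))
    return value


def fit(sample, reference, step=0.05, iters=2000):
    """Projected gradient ascent; the projection keeps sigma2_beta >= 0."""
    labelled = [(x, 1.0) for x in sample] + [(x, 0.0) for x in reference]
    theta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for x, label in labelled:
            resid = label - 1.0 / (1.0 + math.exp(-linear_predictor(theta, x)))
            for k, f in enumerate(features(x)):
                grad[k] += resid * f / len(labelled)
        theta = [t + step * g for t, g in zip(theta, grad)]
        theta[2] = max(theta[2], 0.0)  # a variance cannot be negative
    a, b, c = theta
    return {"mu_alpha": a - 0.5, "mu_beta": b, "sigma2_beta": 2.0 * c}, theta


# Simulated illustration only: a standard normal reference sample and a
# hypothetical shifted sample of the same size.
rng = random.Random(1)
reference = [rng.gauss(0.0, 1.0) for _ in range(100)]
sample = [rng.gauss(1.0, 1.0) for _ in range(100)]
estimates, theta_hat = fit(sample, reference)
```

Plain gradient ascent is used only to keep the sketch dependency-free; a Newton or quasi-Newton routine would normally converge in far fewer iterations.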

2.2 Advantages of marginalization

The marginalization presented above has a few important purposes. In data sets having small $n_{i} \; \forall i$, each observation contains very limited information regarding each $α_{i}$ and $β_{i}$. To overcome this problem, integration across this high-dimensional parameter space enables us to significantly reduce the dimensionality of the problem to an empirical likelihood function involving just $μ_{α}$, $μ_{β}$ and $Σ_{β}$. Furthermore, the marginalization essentially transforms our model from one density ratio model into another, significantly reduced density ratio model. In particular, after starting with $I$ different samples, each of size 1, using a sample of size $N = I$ as a reference and integrating over the parameter space, the density ratio becomes another exponential with a slightly different functional form. As a result, this new density ratio compares two different distributions, each of sample size $N$. The resulting marginalized distribution represents an overall distribution of the first $I$ distributions, each of sample size 1. Theoretical properties of these estimators, such as statistical unbiasedness, consistency and asymptotic normality, have been discussed in past research (Dayaratna, 2014; Fokianos, 2004; Lu, 2007; Owen, 2001).
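To give a sense of the reduction: $θ$ in (2.3) has $I + I P$ entries, whereas the marginalized likelihood depends only on $μ_{α}$, the $P$ entries of $μ_{β}$ and, under the diagonal assumption, the $P$ variances in $Σ_{β}$. For the application in Section 3 ($I = 50$, $P = 1$) the count works out as:

```latex
\underbrace{I + I P}_{\text{individual-level}} = 50 + 50 = 100
\qquad \longrightarrow \qquad
\underbrace{1 + P + P}_{\text{marginalized}} = 3 .
```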

2.3 Derivation of distributions

Our optimization of the empirical likelihood can be used to derive the empirical distributions. Defining $γ \equiv λ / 2 N$, where $λ$ is a Lagrange multiplier, we can replace $γ$, $μ_{α}$, $μ_{β}$ and $Σ_{β}$ by their estimators. As a result, following the derivations in Voulgaraki (2011), estimators of ${\hat{p}}_{ij}$ and $\hat{G} (x)$ are provided by

{\hat{p}}_{ij} = \frac{1}{2 N} \frac{1}{1 + \hat{γ} [e^{{\hat{μ}}_{α} + \frac{1}{2}} e^{{\hat{μ}}_{β}^{'} h (x_{i, j}) + \frac{1}{2} {h (x_{i, j})}^{'} {\hat{Σ}}_{β} h (x_{i, j})} - 1]}

(2.6)

and

\begin{matrix}
\hat{G} (x) & = & \sum_{i = 1}^{M} \sum_{j = 1}^{n_{i}} {\hat{p}}_{ij} I (x_{ij} \leq x) \\
 & = & \frac{1}{2 N} \sum_{i = 1}^{M} \sum_{j = 1}^{n_{i}} \frac{I (x_{ij} \leq x)}{1 + \hat{γ} [e^{{\hat{μ}}_{α} + \frac{1}{2}} e^{{\hat{μ}}_{β}^{'} h (x_{i, j}) + \frac{1}{2} {h (x_{i, j})}^{'} {\hat{Σ}}_{β} h (x_{i, j})} - 1]} .
\end{matrix}

(2.7)

Furthermore, as our ‘marginalized distribution’, which we will hereafter refer to as $\hat{H}$ , we have

\begin{matrix}
\hat{H} (x) & = & \sum_{i = 1}^{M} \sum_{j = 1}^{n_{i}} {\hat{p}}_{ij} e^{{\hat{μ}}_{α} + \frac{1}{2} + {\hat{μ}}_{β}^{'} h (x_{i, j}) + \frac{1}{2} {h (x_{i, j})}^{'} {\hat{Σ}}_{β} h (x_{i, j})} I (x_{ij} \leq x) \\
 & = & \frac{1}{2 N} \sum_{i = 1}^{M} \sum_{j = 1}^{n_{i}} \frac{e^{{\hat{μ}}_{α} + \frac{1}{2} + {\hat{μ}}_{β}^{'} h (x_{i, j}) + \frac{1}{2} {h (x_{i, j})}^{'} {\hat{Σ}}_{β} h (x_{i, j})} I (x_{ij} \leq x)}{1 + \hat{γ} [e^{{\hat{μ}}_{α} + \frac{1}{2}} e^{{\hat{μ}}_{β}^{'} h (x_{i, j}) + \frac{1}{2} {h (x_{i, j})}^{'} {\hat{Σ}}_{β} h (x_{i, j})} - 1]} .
\end{matrix}

(2.8)

Despite the fact that each of the first $I$ distributions had sample size 1, the marginalized distribution in Equation (2.8) provides an overall distribution of these $I$ samples. As we illustrate in the following section, this marginalized distribution can be extremely useful for statistical inference.⁵

A series of numerical simulations illustrating the efficacy of this approach is discussed in detail in Dayaratna (2014). Researchers interested in estimating the probability density function of the sample can use kernel density estimators. Optimal bandwidth selection for kernel density estimation is discussed in detail in Voulgaraki et al. (2012).
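Equations (2.6) to (2.8) translate fairly directly into code once parameter estimates are available. The following is a minimal sketch for scalar data with $h (x) = x$; the function names are hypothetical, and the parameter values would come from whatever optimizer is used.

```python
import math


def marginal_tilt(x, mu_alpha, mu_beta, sigma2_beta):
    # Scalar case with h(x) = x:
    # exp(mu_alpha + 1/2 + mu_beta * x + sigma2_beta * x^2 / 2).
    return math.exp(mu_alpha + 0.5 + mu_beta * x + 0.5 * sigma2_beta * x * x)


def weights(pooled, gamma, mu_alpha, mu_beta, sigma2_beta):
    """p-hat of (2.6) for every point in the pooled data (length 2N)."""
    two_n = len(pooled)
    return [
        1.0 / (two_n * (1.0 + gamma * (marginal_tilt(x, mu_alpha, mu_beta, sigma2_beta) - 1.0)))
        for x in pooled
    ]


def g_hat(x, pooled, p):
    """Reference CDF estimate (2.7)."""
    return sum(pj for xj, pj in zip(pooled, p) if xj <= x)


def h_hat(x, pooled, p, mu_alpha, mu_beta, sigma2_beta):
    """Marginalized CDF estimate (2.8)."""
    return sum(
        pj * marginal_tilt(xj, mu_alpha, mu_beta, sigma2_beta)
        for xj, pj in zip(pooled, p)
        if xj <= x
    )
```

Evaluating `h_hat` at a threshold gives the estimated probability that an observation from the marginalized distribution falls at or below that threshold. When the tilt is identically one, both estimators reduce to the empirical CDF of the pooled data.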

3 An application: Tort reform

In this section, we analyze tort loss data from the 50 states belonging to the United States of America. We utilize the Bayesian DRE approach presented in this study to quantify the probability of the difference in tort losses between 2004 and 2006 being below a particular amount (Crain et al., 2009). These probabilities provide information regarding the efficacy of recently instituted tort reforms. After computing these probabilities, we compare this approach to existing DRE methods as well as standard regression methods.

3.1 Data

Our dataset was identical to that used in the Crain et al. (2009) study and was provided to us by two of the paper's authors. We examined per capita tort losses, defined by Crain et al. (2009) to be the ‘payments by defendants (or their insurance companies) for judgements, settlements, attorney fees, and administrative expenses in tort lawsuits...’, in thousands of (real 2006) dollars per capita, for each of the 50 US states in 2004 and 2006 (Crain et al., 2009). For this analysis, we specifically examined medical malpractice tort losses, although analysis of other aspects of the civil justice system is a useful direction for future research.

Our data is denominated in per capita losses to foster equivalent analysis across the different states. The dataset consisted of a single observation for each of the 50 states (for a total sample size of 50 for each year). Table 1 contains a summary of our data on a year by year basis.

Table 1

Summary table of data of per capita tort losses

Statistic	2004	2006
Mean	$22.31	$15.23
Standard deviation	$14.00	$12.65
Minimum	-$2.58	$0.95
Maximum	$80.80	$65.64

The states with the lowest per capita tort losses were Wyoming in 2004 and Vermont in 2006. New York had the highest per capita tort losses in both 2004 and 2006.

Many medical malpractice reforms were instituted in the early 2000s. For example, Arkansas, Colorado, Florida, Idaho, Mississippi, Missouri, Ohio, Oklahoma, South Carolina, West Virginia and Wisconsin instituted laws placing caps on damage awards in medical malpractice lawsuits. Arizona, Georgia, Montana, North Dakota, South Carolina and Virginia passed laws regarding the conditions on the use of expert witnesses in medical malpractice lawsuits. Nevada and Washington passed laws regarding statutes of limitations in medical malpractice lawsuits. Florida and Nevada passed laws limiting attorney fees. New Hampshire, South Carolina, Washington and Wyoming instituted laws regarding pre-trial screening and arbitration for medical malpractice cases (McQuillan and Abramyan, 2008b). We used 2004 and 2006 data in our analysis because the above reforms were implemented during the first half of the last decade, so the distributional differences in per capita tort losses between the two years serve as a reasonable indicator of their efficacy at reducing the probability of extreme tort losses. As discussed earlier, since states may differ in how litigious they are, it is of utmost importance to incorporate this heterogeneity in our modelling. Our semiparametric Bayesian approach thus enables us to reduce the dimensionality of the problem and generate an overall marginalized distribution of differences in per capita tort losses all across the country.

3.2 Estimation

It is informative to understand the distributions of the difference in per capita tort losses between 2004 and 2006 as these distributions will shed light on the efficacy of tort reforms that had been recently instituted around that time period (Crain et al., 2009). Treating each state-based observation within each year as an independent single-unit sample with its own unique parametrization, we examined the difference in losses between the two years. We did so by utilizing the following model specification:

\frac{g_{i} (x)}{g (x)} = e^{α_{i} + β_{i} x}; i = 1, \dots, 50

(3.1)

where $x$ represents the per capita difference in tort losses between 2004 and 2006 (specifically, the 2006 losses minus the 2004 losses). We let $α_{i} \sim N (μ_{α}, 1)$ and $β_{i} \sim N (μ_{β}, σ_{β}^{2})$ as we did in Section 2. By allowing our model's coefficients to vary across the states, we enable our model to capture state-level heterogeneity. As each state's tort laws do not impact other states, we are able to simplify our problem by assuming statistical independence amongst $α_{i}$ and $β_{i}$. We estimated the marginalized distribution using simulated data from a standard normal distribution as a reference, in a manner similar to that of the out of sample fusion techniques used in recent research (Dayaratna, 2014; Katzoff et al., 2014; Kedem et al., 2014; Zhou, 2012). The marginalized distribution represents the overall distribution of all 50 states regarding differences in tort losses between 2004 and 2006. From this distribution, we were able to compute the probabilities of reductions in tort losses, which better enable us to assess the efficacy of tort reforms instituted during the time period.

We also utilized a bootstrap approach to estimate confidence intervals around these point estimates. In particular, we resampled our dataset (with replacement) 1000 times and re-estimated our probabilities for each sample and used the resulting set to determine our interval estimates. The results are depicted in Tables 2 and 3.
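The resampling scheme described above can be sketched as a generic percentile bootstrap. In the sketch below the statistic is a simple empirical proportion, used only as a stand-in: the analysis in the text re-estimates the Bayesian DRE model on every resample. The function names and the data values are hypothetical.

```python
import random


def percentile_bootstrap(data, statistic, replicates=1000, level=0.95, seed=0):
    """Percentile interval from resampling `data` with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(replicates)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (replicates - 1))]
    upper = estimates[int(round((1.0 - tail) * (replicates - 1)))]
    return lower, upper


def prob_below(threshold):
    # Stand-in statistic: the empirical proportion below `threshold`.
    # A faithful version would refit the density ratio model here.
    return lambda sample: sum(1 for x in sample if x < threshold) / len(sample)


# Hypothetical differences (2006 minus 2004), not the data used in the study.
diffs = [-12.0, -9.5, -7.0, -4.2, -3.1, -1.0, 0.4, 1.8, 2.5, -6.3]
low, high = percentile_bootstrap(diffs, prob_below(0.0))
```

Swapping `prob_below` for a function that refits the model and evaluates $\hat{H}$ at the threshold would give intervals of the kind reported in Table 3.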

Table 2

Coefficient estimates, using Bayesian DRE approach

Coefficient	Estimate	Lower 95% CI	Upper 95% CI
$μ_{α}$	-2.3333	-3.4056	-1.8233
$μ_{β}$	-0.1664	-0.4536	0.7196
$σ_{β}^{2}$	0.8195	0.6701	0.8826
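Substituting the point estimates from Table 2 into the exponent of the marginalized tilt in (2.4), with $h (x) = x$ as in (3.1), gives the fitted log density ratio below. This is a plug-in illustration of the formulas above rather than a result reported in the source:

```latex
\hat{μ}_{α} + \frac{1}{2} + \hat{μ}_{β} x + \frac{1}{2} \hat{σ}_{β}^{2} x^{2}
  = -2.3333 + 0.5 - 0.1664 x + \frac{0.8195}{2} x^{2}
  = -1.8333 - 0.1664 x + 0.40975 x^{2} .
```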

Table 3

Analysis of Tort loss data, using Bayesian DRE approach

Probability	Estimate	Lower 95% limit	Upper 95% limit
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < 0)$	0.782	0.667	0.889
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < -500)$	0.751	0.628	0.860
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < -1000)$	0.722	0.590	0.840
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < -1500)$	0.706	0.568	0.824
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < -2000)$	0.696	0.556	0.815
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < -5000)$	0.660	0.520	0.780

Additionally, to assess the goodness of fit of our model (hereafter, the Bayesian DRE approach) relative to existing DRE methods that ignore heterogeneity (the standard DRE approach), we used a statistic suggested in Voulgaraki et al. (2012):

R_{α, k}^{2} = 1 - exp [- {(\frac{x_{α}}{m - x_{α}})}^{k}] .

(3.2)

In (3.2), $x_{α}$ is the number of sample points at which the estimated cumulative distribution function falls inside the estimated $1 - α$ confidence interval obtained from the corresponding empirical cumulative distribution function, and $m$ is the total number of sample points at which both functions are evaluated. $k$ is a constant specified a priori. We estimated this statistic for $k = 1$ and $k = 2$ and compared it to the existing DRE method that makes no assumptions regarding heterogeneity. These results are outlined in Table 4. Figures 1 and 2 present plots comparing the estimated distribution $\hat{H}$ from (2.8) of per capita $\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004}$ against the corresponding empirical cumulative distribution function. The plots indicate that the Bayesian DRE approach fits substantially better than the standard approach.

Table 4

Goodness of fit diagnostics

Method	$R_{0.95, 1}^{2}$	$R_{0.95, 2}^{2}$
Existing DRE	0.337	0.156
Bayesian DRE	0.999	1.000
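The statistic in (3.2) is simple to compute once the counts are known. The sketch below assumes that $x_{α}$ counts the sample points at which the fitted CDF lies inside the confidence band and that $m$ is the total number of sample points, which is the reading under which the statistic increases with the quality of fit. The function name and the handling of the boundary case are illustrative choices.

```python
import math


def r_squared(x_alpha, m, k):
    """Goodness-of-fit statistic of (3.2).

    x_alpha: number of sample points where the fitted CDF lies inside the
             confidence band (assumed reading); m: total number of sample
             points; k: constant fixed in advance.
    """
    if not 0 <= x_alpha <= m:
        raise ValueError("x_alpha must lie between 0 and m")
    if x_alpha == m:
        return 1.0  # limit of the formula as the ratio grows without bound
    ratio = x_alpha / (m - x_alpha)
    return 1.0 - math.exp(-(ratio ** k))
```

Because both columns of Table 4 share the same ratio $x_{α} / (m - x_{α})$, they can be checked against each other: a value of 0.337 for $k = 1$ implies a ratio of about 0.411, and squaring that ratio gives about 0.155 for $k = 2$, close to the reported 0.156.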

Figure 1

Plot of $\hat{H}$ vs. $\tilde{H}$ ‐ DRE, Difference in Per Capita Tort Loss Data between 2006 and 2004

Figure 2

Plot of $\hat{H}$ vs. $\tilde{H}$ ‐ Bayesian DRE, Difference in Per Capita Tort Loss Data between 2006 and 2004

Our analysis illustrates that the Bayesian DRE approach substantially improved model fit, indicating the presence of unobserved heterogeneity across the country during that period that the standard DRE approach was unable to properly model. Furthermore, our results illustrate a substantial reduction in per capita tort losses. For example, Table 3 suggests a 0.66 probability of a reduction greater than 5000 dollars per capita, with the interval estimates for these probabilities being well above zero. These results, alongside the results from Crain et al. (2009), demonstrate the success of state-based medical malpractice reforms, many of which had been instituted around this time period (Crain et al., 2009). These reforms included imposition of strict statutes of limitations for filing lawsuits, standards regarding expert witnesses, economic damage caps, attorney fee limitations, and requirements for pre-trial screening. In addition, existing tort laws, not necessarily recently instituted, may have become more stringently enforced during this time period.

3.3 Comparison to standard approaches

We also compared our modeling approach to that of a standard analysis of variance approach modeling per capita tort losses as a function of location. In particular, we specified the model:

\begin{matrix} y_{i, t} & = & μ + α_{i} + β_{t} + ε_{i, t} \end{matrix}

(3.3)

for state $i$ and time period $t$ (1996, 2004, 2006), allowing $ε_{i, t} \sim N (0, σ^{2})$. We computed probabilities analogous to those computed in Table 3 by estimating, for per capita dollar values $c$, $P (\hat{\bar{y}}_{\cdot, 2006} - \hat{\bar{y}}_{\cdot, 2004} < c)$. Our results are presented in Table 5.
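For a balanced table of states by years, the additive model in (3.3) has a closed-form least-squares fit. The sketch below uses sum-to-zero constraints on the effects, which is an illustrative choice. The text does not spell out how the probabilities are evaluated, so the normal approximation for the difference in year means is an assumption, and the function names are hypothetical.

```python
import math


def additive_fit(y):
    """Least-squares fit of y[i][t] = mu + alpha_i + beta_t + error.

    `y` is a balanced table: one row per state, one column per year.
    Effects are reported under sum-to-zero constraints.
    """
    n_rows, n_cols = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (n_rows * n_cols)
    alpha = [sum(row) / n_cols - grand for row in y]
    beta = [sum(row[t] for row in y) / n_rows - grand for t in range(n_cols)]
    rss = sum(
        (y[i][t] - grand - alpha[i] - beta[t]) ** 2
        for i in range(n_rows)
        for t in range(n_cols)
    )
    sigma2 = rss / ((n_rows - 1) * (n_cols - 1))  # residual variance estimate
    return grand, alpha, beta, sigma2


def prob_mean_diff_below(y, t_new, t_old, c):
    """Normal-approximation P(difference in year means < c); an assumption."""
    _, _, beta, sigma2 = additive_fit(y)
    diff = beta[t_new] - beta[t_old]
    se = math.sqrt(2.0 * sigma2 / len(y))  # standard error of the difference
    if se == 0.0:
        return 1.0 if diff < c else 0.0
    z = (c - diff) / se
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Here `t_new` and `t_old` are column indices for the two years being compared, for example the columns holding 2006 and 2004.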

Table 5

Probabilities of the change in per capita tort losses falling below given thresholds, ANOVA approach

Event	Probability
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < 0)$	0.799
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < -500)$	0.782
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < -1000)$	0.765
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < -1500)$	0.746
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < -2000)$	0.727
$P (\text{Tort Losses}_{2006} - \text{Tort Losses}_{2004} < -5000)$	0.597

The ANOVA model has a coefficient of determination of 0.77, compared to goodness of fit diagnostics of virtually 1 from our Bayesian DRE approach. Additionally, the results in Table 5 suggest the same phenomenon as suggested by Table 3 from our Bayesian DRE analysis: the success of recently instituted tort reforms in reducing tort losses. These results thus further substantiate our findings and also illustrate that our Bayesian DRE approach fits our application substantially better than existing methods.

3.4 Policy implications

The Congressional Budget Office has found that state-based tort reform reduced the number of lawsuits, reduced the quantity of damage awards, and brought down insurance claims (Congressional Budget Office, 2004). Our analysis also demonstrates that the risks associated with the civil justice system declined between 2004 and 2006. These reductions were the result of state-based tort reforms implemented throughout the country, including economic caps as well as standards regarding expert witnesses among others (Crain et al., 2009). Although these results illustrate the efficacy of certain malpractice reforms instituted around this time period, considerably more can be done (McQuillan, 2007; Crain et al., 2009). In fact, recent research has discussed the benefits of malpractice reform, noting that such reforms have the capacity to improve the practicing environment for physicians, which can tremendously benefit patients (Cornyn and Meese, 2010; Rubin and Shepherd, 2007; Levy, 2005; Rivlin, 2012; Kessler et al., 2005; Kessler and McClellan, 1996; Studdert et al., 2005).

Table 6

Coefficient estimates

	Estimate	Std Error	t-value	p-value
(Intercept)	8.182	4.964	1.648	0.102
Year = 2004	6.268	1.686	3.717	0.000
Year = 2006	-0.810	1.686	-0.480	0.632
Alaska	-1.010	6.883	-0.147	0.884
Arizona	15.208	6.883	2.209	0.029
Arkansas	5.741	6.883	0.834	0.406
California	-0.472	6.883	-0.069	0.945
Colorado	5.742	6.883	0.834	0.406
Connecticut	25.831	6.883	3.753	0.000
Delaware	32.056	6.883	4.657	0.000
Florida	16.352	6.883	2.376	0.019
Georgia	8.898	6.883	1.293	0.199
Hawaii	7.131	6.883	1.036	0.303
Idaho	-0.879	6.883	-0.128	0.899
Illinois	23.176	6.883	3.367	0.001
Indiana	-2.118	6.883	-0.308	0.759
Iowa	1.898	6.883	0.276	0.783
Kansas	-2.856	6.883	-0.415	0.679
Kentucky	10.040	6.883	1.459	0.148
Louisiana	-1.721	6.883	-0.250	0.803
Maine	9.975	6.883	1.449	0.150
Maryland	21.142	6.883	3.071	0.003

Table 7

Coefficient estimates

	Estimate	Std Error	t-value	p-value
Massachusetts	9.980	6.883	1.450	0.150
Michigan	0.063	6.883	0.009	0.993
Minnesota	0.981	6.883	0.142	0.887
Mississippi	5.116	6.883	0.743	0.459
Missouri	10.044	6.883	1.459	0.148
Montana	17.345	6.883	2.520	0.013
Nebraska	-2.705	6.883	-0.393	0.695
Nevada	7.656	6.883	1.112	0.269
New Hampshire	8.228	6.883	1.195	0.235
New Jersey	31.919	6.883	4.637	0.000
New Mexico	3.672	6.883	0.533	0.595
New York	44.331	6.883	6.440	0.000
North Carolina	5.118	6.883	0.744	0.459
North Dakota	-1.991	6.883	-0.289	0.773
Ohio	8.300	6.883	1.206	0.231
Oklahoma	-1.030	6.883	-0.150	0.881
Oregon	4.537	6.883	0.659	0.511
Pennsylvania	12.408	6.883	1.803	0.075
Rhode Island	12.259	6.883	1.781	0.078
South Carolina	-3.837	6.883	-0.557	0.578
South Dakota	7.665	6.883	1.114	0.268
Tennessee	17.144	6.883	2.491	0.014
Texas	1.116	6.883	0.162	0.872
Utah	3.336	6.883	0.485	0.629
Vermont	4.066	6.883	0.591	0.556
Virginia	2.914	6.883	0.423	0.673
Washington	7.842	6.883	1.139	0.257
West Virginia	6.187	6.883	0.899	0.371
Wisconsin	-3.571	6.883	-0.519	0.605
Wyoming	-0.408	6.883	-0.059	0.953

4 Conclusions and future research

This study demonstrates the success of medical malpractice reform in reducing the probabilities of extreme tort losses. Medical malpractice reform, in addition to other reforms such as instilling competition in health care markets, can be important in reducing the costs and improving the quality of care (Dayaratna, 2012; Dayaratna, 2013; Capretta and Dayaratna, 2013; Parente et al., 2011). From a methodological perspective, this study applies empirical Bayesian methods to semiparametric density ratio modeling, allowing statisticians to incorporate individual-level heterogeneity in such models. Our marginalization provides a closed-form empirical likelihood, allowing us to make direct inferences regarding the population without the computationally intensive approaches typically associated with Bayesian methods (Gelfand and Smith, 1990).

Although our marginalization is useful, we looked at tort losses only on a per-capita basis. Future research could look at tort losses with respect to the number of overall claims, and could compare the approach presented in this study to existing difference-in-differences methods. Future research could also consider different definitions of treatment and control groups; for example, states that instituted only caps on damages, or states with historically above-average payments before 2004. Additionally, although our focus in this study is on medical malpractice reform, this approach can be extended to many other settings where modeling individual-level heterogeneity is important. From biostatistics to economics to professional sports, the Bayesian approach presented here is a useful addition to the applied statistician's toolbox (Voulgaraki et al., 2012; Greenspan, 2013; Miller, 2007; Dayaratna and Miller, 2013).

5 Appendix

As discussed in Section 3.3, we compared our model to a standard analysis of variance approach that modeled per capita tort losses as a function of state and time period. Specifically, we modeled:

$$y_{i,t} = \mu + \alpha_{i} + \beta_{t} + \varepsilon_{i,t} \tag{5.1}$$

where $i = 1, \dots, 50$ indexes the state and $t$ indexes the time period (1996, 2004, 2006). We assume $\varepsilon_{i,t} \sim N(0, \sigma^{2})$. Tables 6 and 7 report the coefficient estimates for the standard ANOVA model described in Section 3.3.
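As an illustration only, a two-way additive model of the form (5.1) can be fitted by ordinary least squares on a treatment-coded design matrix. The sketch below uses synthetic data, since the tort loss data are not reproduced here; the function names `design_matrix` and `fit_ols`, the simulated coefficients, and the noise level are all hypothetical, and the choice of the first state and first period as baseline levels is an assumption about the coding rather than something stated in the text.

```python
import numpy as np

def design_matrix(n_states=50, n_periods=3):
    """Treatment-coded design for y_{i,t} = mu + alpha_i + beta_t + eps_{i,t}.

    Columns: intercept, (n_periods - 1) period dummies, (n_states - 1) state
    dummies. The first state and first period are taken as baseline levels
    (an assumption made for this sketch).
    """
    states = np.repeat(np.arange(n_states), n_periods)
    periods = np.tile(np.arange(n_periods), n_states)
    X = np.zeros((n_states * n_periods, 1 + (n_periods - 1) + (n_states - 1)))
    X[:, 0] = 1.0
    for t in range(1, n_periods):
        X[:, t] = (periods == t)
    for i in range(1, n_states):
        X[:, n_periods - 1 + i] = (states == i)
    return X

def fit_ols(X, y):
    """Least-squares fit returning coefficients, standard errors and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]          # residual degrees of freedom
    sigma2 = resid @ resid / dof           # unbiased estimate of sigma^2
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, se, r2

# Synthetic data only: coefficients and noise level are invented.
rng = np.random.default_rng(0)
X = design_matrix()
true_beta = rng.normal(0.0, 10.0, size=X.shape[1])
y = X @ true_beta + rng.normal(0.0, 8.0, size=X.shape[0])

beta_hat, se, r2 = fit_ols(X, y)
print("SE(period) / SE(state):", se[1] / se[3])
print("R^2:", r2)
```

In a balanced layout with 50 states and 3 periods, every state coefficient has the same standard error, and the ratio of a period standard error to a state standard error is $\sqrt{3/50} \approx 0.245$ whatever the data. The standard errors in Tables 6 and 7 show the same pattern (1.686 / 6.883 ≈ 0.245), which is consistent with such a balanced design.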

As discussed earlier, this model had an $R^{2}$ value of 0.77; the pertinent probabilities are reported in Table 5.
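The text does not spell out how probabilities of the kind shown in Table 5 follow from model (5.1). One plausible route, offered here as a sketch and not as the authors' stated computation, assumes the errors are independent across time periods and ignores estimation uncertainty in the coefficients. Under those assumptions the state effect cancels in the difference:

$$y_{i,2006} - y_{i,2004} = (\beta_{2006} - \beta_{2004}) + (\varepsilon_{i,2006} - \varepsilon_{i,2004}) \sim N\left(\beta_{2006} - \beta_{2004},\; 2\sigma^{2}\right)$$

so that, for a threshold $c$ and with $\Phi$ denoting the standard normal distribution function,

$$P\left(y_{i,2006} - y_{i,2004} < c\right) = \Phi\left(\frac{c - (\beta_{2006} - \beta_{2004})}{\sqrt{2}\,\sigma}\right).$$

A negative value of $\beta_{2006} - \beta_{2004}$ pushes this probability above one half at $c = 0$, and the probability decreases as $c$ moves further below zero, which is the qualitative pattern seen in Table 5.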


Acknowledgments

The authors would like to thank Mark Crain and Hovannes Abramyan for providing the data used in this project. This article is based on a component of the author's doctoral dissertation (Dayaratna, 2014). We would like to thank Sashi Dayaratna, Hovannes Abramyan, Sandra Dayaratna, colleagues in The Heritage Foundation's Center for Data Analysis, John Malcolm, and conference participants at the 30th International Workshop on Statistical Modelling in Linz, Austria for constructive comments. An abridged version of this manuscript was published as part of the Proceedings of the 30th International Workshop on Statistical Modelling in Linz, Austria (Dayaratna and Kedem, 2015). Please note that nothing written here is to be construed as necessarily reflecting the views of The Heritage Foundation or as an attempt to aid or hinder the passage of any bill before Congress.

References

American Tort Reform Association (2014) Judicial Hellholes 2014-2015. American Tort Reform Association.

Bickel, Bogojeska, Lengauer and Scheffer (2008) Multi-task learning for HIV therapy screening. In Proceedings of the 25th International Conference on Machine Learning, pages 56–63. ACM.

Capretta and Dayaratna (2013) Compelling evidence makes the case for a market-driven health care system. Heritage Foundation Backgrounder, (2867). The Heritage Foundation.

Congressional Budget Office (2004, June) The effects of tort reform: Evidence from the states. A CBO Paper. Congressional Budget Office, Washington DC.

Cornyn and Meese (2010) Health care and medical malpractice reform: The necessity of reform in the current debate. Heritage Foundation Lecture, 1142. The Heritage Foundation.

Crain, McQuillan and Abramyan (2009) Tort law tally: How state tort reforms affect tort losses and tort insurance premiums. Pacific Research Institute, San Francisco, CA.

Dayaratna and Kedem (2015) A probabilistic examination of the efficacy of tort reform via semiparametric density ratio modeling. Proceedings of the 30th International Workshop on Statistical Modelling, pages 75–78. Linz, Austria.

Dayaratna (2012) Studies show: Medicaid patients have worse access and outcomes than the privately insured. Heritage Foundation Backgrounder, (2740). The Heritage Foundation.

Dayaratna (2013) Competitive markets in health care: The next revolution. Heritage Foundation Backgrounder, (2833). The Heritage Foundation.

Dayaratna (2014) Contributions to Bayesian Statistical Modeling in Public Policy Research. PhD thesis, University of Maryland.

Dayaratna and Miller (2013) The Pythagorean won-loss formula and hockey: A statistical justification for using the classic baseball formula as an evaluative tool in hockey. The Hockey Research Journal: A Publication of the Society for International Hockey Research, pages 193–209.

Fokianos (2004) Merging information for semiparametric density estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 941–958.

Fokianos and Qin (2008) A note on Monte Carlo maximization by the density ratio model. Journal of Statistical Theory and Practice, 2, 355–367.

Fokianos, Kedem, Qin and Short (2001) A semiparametric approach to the one-way layout. Technometrics, 43, 56–65.

Gelfand (1996) Model determination using sampling-based methods. In Gilks WR, Richardson S, and Spiegelhalter D, eds, Markov chain Monte Carlo in practice, pages 145–161. Boca Raton, Florida: Springer.

Gelfand and Smith (1990) Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gilbert, Lele and Vardi (1999) Maximum likelihood estimation in semiparametric selection bias models with application to AIDS vaccine trials. Biometrika, 86, 27–43.

Greenspan (2013) The Map and the Territory: Risk, Human Nature, and the Future of Forecasting. New York, NY: Penguin.

Jones (2010, March 5) Should tort reform be part of health care reform? A liberal thinks so. The Wall Street Journal. Retrieved from http://on.wsj.com/1612Fef

Kamakura and Russell (1989) A probabilistic choice model for market segmentation and elasticity structure. Journal of Marketing Research, 26, 379–390.

Kanamori, Hido and Sugiyama (2009) A least-squares approach to direct importance estimation. The Journal of Machine Learning Research, 10, 1391–1445.

Katzoff, Zhou, Khan and Kedem (2014) Out of sample fusion in risk prediction. Journal of Statistical Theory and Practice, 8, 444–459.

Kedem, Wei and Williams (2008) Forecasting mortality rates via density ratio modeling. Canadian Journal of Statistics, 36, 193–206.

Kedem, Kim E-y, Voulgaraki and Graubard (2009) Two-dimensional semiparametric density ratio modeling of testicular germ cell data. Statistics in Medicine, 28, 2147–2159.

Kedem, Pan and Zhou (2014) Out of sample fusion: A Monte Carlo method. Proceedings of the 2014 IEEE Conference on Computational Science and Computational Intelligence, pages 364–367. Las Vegas, Nevada.

Kessler and McClellan (1996) Do doctors practice defensive medicine? The Quarterly Journal of Economics, 111, 353–390.

Kessler, Sage and Becker (2005) Impact of malpractice reforms on the supply of physician services. Journal of the American Medical Association, 293, 2618–2625.

Kyung, Gill and Casella (2011) New findings from terrorism data: Dirichlet process random-effects models for latent groups. Journal of the Royal Statistical Society: Series C (Applied Statistics), 60, 701–721.

Lazar (2003) Bayesian empirical likelihood. Biometrika, 90, 319–326.

Lenk and DeSarbo (2000) Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika, 65, 93–119.

Levy (2005) Do's and dont's of tort reform. Cato Institute Commentary. The Cato Institute.

(2007) Asymptotic theory for multiple-sample semiparametric density ratio models and its application to mortality forecasting. PhD thesis, University of Maryland.

McQuillan (2007) Jackpot justice: The true cost of America's tort system. Pacific Research Institute, San Francisco, CA.

McQuillan and Abramyan (2008a) U.S. tort liability index: 2008 report. Pacific Research Institute, San Francisco, CA.

McQuillan and Abramyan (2008b) 2008 U.S. tort liability Index Master Database [Data File].

McQuillan and Abramyan (2010) U.S. tort liability index: 2010 report. Pacific Research Institute, San Francisco, CA.

Mengersen, Pudlo and Robert (2013) Bayesian computation via empirical likelihood. Proceedings of the National Academy of Sciences, 110, 1321–1326.

Miller (2007) A derivation of the Pythagorean Won-Loss Formula in baseball. Chance, 20, 40–48.

Morris (1983) Parametric empirical Bayes inference: theory and applications. Journal of the American Statistical Association, 78, 47–55.

Owen (1988) Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237–249.

Owen (2001) Empirical likelihood. Boca Raton, Florida: CRC press.

Parente, Feldman and Abraham (2011) Consumer response to a national marketplace for individual health insurance. Journal of Risk and Insurance, 78, 389–411.

Phue J-N, Kedem, Jaluria and Shiloach (2007) Evaluating microarrays using a semiparametric approach: Application to the central carbon metabolism of Escherichia coli BL21 and JM109. Genomics, 89, 300–305.

Prentice and Pyke (1979) Logistic disease incidence models and case-control studies. Biometrika, 66, 403–411.

Qin and Zhang (1997) A goodness-of-fit test for logistic regression models based on case-control data. Biometrika, 84, 609–618.

Qin and Zhang (2005) Density estimation under a two-sample semiparametric model. Nonparametric Statistics, 17, 665–683.

Rivlin (2012) Curing health care: the next president should complete, not abandon, Obama's reform. Brookings Institution Campaign 2012 Policy Brief. The Brookings Institution.

Rossi and Allenby (2003) Bayesian Statistics and Marketing. Marketing Science, 22, 304–328.

Rubin and Shepherd (2007) Tort reform and accidental deaths. Journal of Law and Economics, 50, 221–238.

Schennach (2005) Bayesian exponentially tilted empirical likelihood. Biometrika, 92, 31–46.

Studdert, Mello, Sage, DesRoches, Peugh, Zapert and Brennan (2005) Defensive medicine among high-risk specialist physicians in a volatile malpractice environment. Journal of the American Medical Association, 293, 2609–2617.

Sugiyama, Nakajima, Kashima, Buenau and Kawanabe (2008) Direct importance estimation with model selection and its application to covariate shift adaptation. Proceedings of the 20th Annual Conference on Neural Information Processing Systems, pages 1433–1440. Vancouver, British Columbia.

Voulgaraki (2011) Semiparametric Regression and Mortality Rate Prediction. PhD thesis, University of Maryland.

Voulgaraki, Kedem and Graubard (2012) Semiparametric regression in testicular germ cell data. The Annals of Applied Statistics, 6, 1185–1208.

Yang (2012) Bayesian empirical likelihood for quantile regression. The Annals of Statistics, 40, 1102–1131.

Zhou (2012) Out of Sample Fusion. PhD thesis, University of Maryland.
