Abstract
We consider the distribution planning problem in the motion picture industry. This problem involves forecasting theater‐level box office revenues for a given movie and using these forecasts to choose the best locations to screen a movie. We first develop a method that predicts theater‐level box office revenues over time for a given movie as a function of movie attributes and theater characteristics. These estimates are then used by the distributor to choose where to screen the movie. The distributor's location selection problem is modeled as an integer programming‐based optimization model that chooses the location of theaters in order to optimize profits. We test our methods on realistic box office data and show that they have the potential to significantly improve the distributor's profits. We also develop insights into why our methods outperform existing practice; these insights are crucial to their successful practical implementation.
1. Introduction
The motion picture industry is an important sector of the US economy. Movie releases at theaters generated approximately US$9 billion in revenue in 2005, representing a nearly 20% increase since the beginning of the decade. In spite of this increase, the Motion Picture Association of America (MPAA) reports that only nine of the 549 movies released in 2005 generated profit higher than US$50 million (
Distribution of a motion picture is handled by its distributor, who forms an important link in the motion picture industry supply chain (Figure 1). Examples of major distributors in the motion picture industry include Buena Vista (Walt Disney's distribution arm), Columbia (Sony's distribution arm), Universal, Paramount, Fox, and Warner Brothers. The motion picture distribution industry is highly concentrated, with these six distributors accounting for 70% of box office sales in the United States. The distributor secures rights from the producer, undertakes marketing (including advertising in television, local, and national media), and has to choose at which theaters to screen the movie. The theaters are owned by exhibitors such as AMC, Regal, General Cinema, and Mann. Historically, distributors have been more dominant and profitable than exhibitors in the motion picture industry. To achieve a balance of power between these entities, federal regulations and individual state statutes have been passed. The aim of these statutes was to ensure that distributors picked exhibitors on a movie‐by‐movie basis using a competitive bidding process. A detailed overview of the history of anti‐trust legislation in the motion picture industry can be found in De Vany and Eckert (1991). However, with increasing overcapacity of screens and changes in technology that enable different channels of distribution such as DVDs, satellite, and the Internet, the balance of power is again shifting in favor of the distributors. Scott (2005) provides a comprehensive discussion of these issues.

Distribution planning for a new movie is done far in advance of the actual release date. The studio (i.e., producer) announces the distributor several months before actual theatrical release of the movie. Then, the distributor solicits bids to exhibit the movie. In response, exhibitors contact the distributor with bids to screen the movie. These bids include a proposed set of theaters and their coverage territory, initial duration of play, and revenue‐sharing agreements. The distributor now needs to choose which exhibitors and theaters to use for the initial screening of the movie. This decision has to be made at least 3 months before the scheduled theatrical release of the movie. These decisions are then conveyed to the exhibitor. If the distributor picks a particular theater, this movie is scheduled to be shown there for the agreed period. If not, the exhibitor has the option to try other movies, as this planning takes place far in advance of when the movie is actually released in the theater. As the initial duration of play for the selected theaters expires (which is agreed by the contract), the distributor is free to add additional theaters as offers to show new movies are made by the exhibitors (McGrath 2004, Rothenberg 2003).
The distribution planning problem involves forecasting theater‐level box office revenues for a given movie and using these forecasts to choose the best locations to screen a movie. Distribution planning in the motion picture industry is made difficult by a complex environment. Forecasting box office revenues for a given duration is challenging as movies are experiential products (Hirschman and Holbrook 1982) and, consequently, it is difficult to forecast their audience appeal until they are already available in the theaters. This is because audience appeal varies widely depending on movie attributes such as genre, star presence, special effects, MPAA ratings, critical reviews, etc. Even experienced analysts in the industry have not been very successful in forecasting box office revenues (Table 1). While there has been some academic research on aggregate box office forecasting, disaggregate theater‐level forecasting is even harder due to dissimilarity in location‐specific characteristics of theaters, such as amenities and demographics (i.e., median income, age, population density, etc.). This causes significant differences in revenue for the same movie in different markets (Table 2). All these factors make theater‐level box office revenue forecasting in the motion picture industry extremely challenging. Even after the forecasts are generated, picking the optimal location of theaters to show a particular movie is difficult due to the large number of movies released each year by multiple distributors and the abundance of potential theater locations. For example, more than 500 movies were released in 2005, and there were more than 7000 possible theater locations in the United States and Canada to show these movies.
Table 1 notes: NRG estimate is the National Research Group's estimate of the opening weekend box office. Relative percentage error = (NRG estimate − Actual BO)/Actual BO. Source: IMDB and Variety Magazine.
Table 2 notes: Movies are shown in the same number of theaters in each market, and revenues reported are Friday‐to‐Sunday averages over the duration of play. Data source: Nielsen EDI's box office sample.
Despite the practical relevance and complexity of this problem, we have found nothing in the academic or managerial literature that describes how to conduct effective distribution planning in the motion picture industry. This paper presents a method for addressing these issues and thereby contributes to entertainment operations management, an underrepresented area in the growing stream of research in service operations management (Apte et al. 2008, Spohrer and Maglio 2008). Specifically, we have developed an empirical technique that provides a box office revenue estimate over time for a new motion picture at a selected theater. We use these estimates to model the distributor's location selection problem (DLSP) as an integer programming model. This model chooses the location of theaters to screen the movie in order to optimize the distributor's profits. We have tested our methods on actual industry data and show that our approach offers the potential to significantly improve box office profits for new movies.
This paper is organized as follows. In section 2, we review the relevant literature. In section 3, we develop an empirical method to estimate the theater‐level box office revenue for a given movie. In section 4, we formulate the DLSP. We develop heuristics to solve this problem and also construct upper bounds to evaluate the quality of these heuristics. We present computational results in section 5. We use these results to assess the performance of our forecasting method and of the heuristics to solve the DLSP. These results also provide insight into what affects theater‐level box office revenue and the impact of these revenue estimates on the location choice decision. In section 6, we test our methods on realistic box office data and show that they have the potential to significantly improve current industry practice. In the concluding section, we summarize our work and present future research directions.
2. Literature Review
There is extensive literature on motion pictures in the popular press and in the film and television areas (Bart 2000, Hayes and Bing 2004, Vogel 2001); however, much of this is descriptive in nature and relies heavily on anecdotal industry knowledge. There is a limited, but emerging, stream of academic research focused on the motion picture industry. Eliashberg et al. (2006) provide a comprehensive overview on the critical issues in practice, current research, and future research directions in the motion picture industry. Areas of research include product diffusion (Elberse and Eliashberg 2003, Neelamegham and Chintagunta 1999), seasonal release patterns (Einav 2002, Krider and Weinberg 1998), ancillary markets (Lehmann and Weinberg 2000), and contract design and competition (De Vany and Walls 1996). In the broader context of the entertainment industry, there has been work on scheduling commercial videotapes (Bollapragada et al. 2004), managing on‐air advertisement inventory (Bollapragada and Mallik 2008), media revenue management (Araman and Popescu 2009), and theme park flow management (Rajaram and Ahmadi 2003).
There has been a significant stream of research on aggregate box office forecasting of new motion pictures. Many previous studies have attempted to explain aggregate box office success as a function of movie attributes such as budget, star power, MPAA rating, release timing, and Academy Award nominations and winners (Litman and Ahn 1998, Wyatt 1994). Other work focuses on the influence of a major distributor (Sochay 1994) and of advertising and critical reviews on box office success (Eliashberg and Shugan 1997, Zuckerman and Kim 2003, Zufryden 1996). Sawhney and Eliashberg (1996) develop a parsimonious aggregate forecasting model and test it on realistic data. However, none of this work considers disaggregated theater‐level box office revenue forecasts for a new movie. There have been several papers on scheduling of screens with multiple movies (Eliashberg et al. 2009, Dawande et al. 2010, Swami et al. 1999). This is an important tactical problem once the location of the theater to screen a movie is chosen, but these papers do not consider the broader strategic question of which theaters should show the movie, as addressed in the DLSP.
This paper makes the following contributions. First, we develop a method to calculate detailed disaggregated theater‐level box office sales forecasts, based on both movie attributes and theater characteristics. Second, unlike the work discussed above, we directly consider the DLSP. Correct selection of theater location is essential to the ultimate box office success of a movie and we develop an optimization model to make this choice. Third, we test this model extensively on realistic data and show that it has the potential to significantly improve existing industry practice.
3. Forecasting Theater‐Level Box Office Revenues
Consider a distributor who has to make box office revenue forecasts at each of the possible theaters where it could release a new movie. To provide a precise statement of this problem, we consider $n$ possible theaters and let $j \in N = (1, \ldots, n)$ index the set of theaters. Let $\pi_{ijt}$ define the expected box office revenue forecast when movie $i$ is shown at theater $j$ in week $t$, where $i \in M = (1, \ldots, m)$ indexes the set of movies and $t \in Q = (1, \ldots, q)$ indexes the set of time periods. Estimating $\pi_{ijt}$ is critical for several reasons. First, when we tabulate total box office revenue across theaters over time (i.e., $\sum_{j \in N} \pi_{ijt}$, $\forall t$), we obtain the adoption pattern for movie $i$. This pattern provides crucial guidance for various strategic decisions made by the distributor such as determining the marketing budget and eventually the distribution strategy (i.e., platforming, wide, or saturation release) for the movie. This, in turn, is used in determining the maximum number of theaters across regions, the minimum play length at any theater, and, finally, in negotiating the revenue‐sharing contract during and after this minimum play length. Second, $\pi_{ijt}$ is a key parameter in determining at which theaters the movie will be screened and, ultimately, its box office success.
However, estimating $\pi_{ijt}$ is challenging for several reasons. First, it is difficult to understand which movie attributes and theater characteristics will affect theater‐level box office revenue and how they do so. Second, it is challenging to estimate how this complex relationship between theater characteristics, movie attributes, and box office revenues changes over time. Finally, this estimation is difficult, as forecasting box office revenues requires an understanding of the individual moviegoer's decision process to see a given movie and incorporating this process into the estimation method.
Typical industry practice when forecasting box office revenues is to compare a new movie with recently released movies that are similar in one movie attribute and use multiple, separate comparisons to study the effect of different movie attributes on box office revenues (Rothenberg 2003, Tannenbaum 2001). In addition, some distributors also use multiple regression analyses on a set of comparable attributes. While these procedures are simple and provide flexibility to incorporate subjective expertise, they are not very accurate. This is because they do not explicitly consider any of the aspects discussed above that make estimating $\pi_{ijt}$ an immensely challenging problem. There are box office estimation models in the academic literature that incorporate some of these aspects. Most of these models run multiple regressions directly on box office revenue as a function of certain sets of movie attributes (Litman and Ahn 1998). A major limitation of this method is that it only provides point estimates for box office revenues by assuming an unrestricted horizon for exhibiting the movie. In addition, this approach does not consider significant variations in box office revenues across time periods and differences in adoption patterns across movies. Sawhney and Eliashberg (1996) developed a parsimonious model to forecast a movie's box office success as a function of time. They use an innovative method that incorporates an individual moviegoer's decision process to adopt (or see) a given movie and also consider the impact of different movie adoption patterns. However, the objective of this model is to estimate box office revenue at the national or aggregate level, and this approach cannot be used to provide local or disaggregate location‐specific, theater‐level estimates.
To overcome the described challenges inherent in estimating $\pi_{ijt}$, we develop a four‐step method. These steps are outlined in Figure 2. In Step 1, we extend the Sawhney and Eliashberg (1996) model to include location‐specific, theater‐level characteristics. We estimate the parameters of this model using nonlinear regression on a historical database of realized box office revenues for selected movies. In Step 2, we use multiple regression models to link the estimated model parameters to movie attributes and theater characteristics corresponding to this historical database; then, we also consider any potential interactions between these variables. This step provides us with a function that estimates model parameters given a set of movie attributes and theater‐specific characteristics. In Step 3, we use this function to estimate the model parameters for a new movie given its attributes and the location‐specific characteristics of the theater under consideration. In Step 4, we use these estimates of the model parameters of Step 1 to estimate the box office revenue for the new movie at a given theater across time. Below, we describe each of these steps in detail.

Step 1: Incorporation of theater‐level characteristics
In this step, we extend the Sawhney and Eliashberg (1996) model to include location‐specific, theater‐level characteristics. Here, we assume that an individual movie patron's decision process to choose whether to watch a movie depends on two independent sub‐processes: (1) the decision to see a movie $p$ in theater $j$, followed by (2) the decision to visit theater $j$. These processes are modeled as stochastic processes with stationary parameters $\lambda_{pj}$, representing the time‐to‐decide parameter, and $\gamma_{pj}$, representing the time‐to‐act parameter. Then, the expected time to decide becomes $1/\lambda_{pj}$ and the expected time to act becomes $1/\gamma_{pj}$. Although it is plausible that the time‐to‐decide process is mainly influenced by movie attributes, we believe that the availability of the chosen movie in a theater that is acceptable to the patron also affects the time‐to‐decide parameter. Once the individual has decided to watch a movie, the next decision is where to watch it, which is again influenced by theater characteristics.
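The two-stage decision process can be illustrated with a short sketch. It assumes, in the spirit of Sawhney and Eliashberg (1996), that the time to decide and the time to act are independent exponential random variables, so that the time to adopt is their sum. The function names and the market size argument are illustrative choices rather than notation taken from the paper.

```python
import math


def adoption_fraction(tau, lam, gam):
    """Probability that a patron has adopted by time tau.

    Assumption: time-to-decide ~ Exponential(lam) and time-to-act ~
    Exponential(gam), independent, so adoption time is their sum.
    """
    if tau <= 0:
        return 0.0
    if math.isclose(lam, gam, rel_tol=1e-9):
        # Equal rates: the sum follows an Erlang-2 distribution.
        return 1.0 - math.exp(-lam * tau) * (1.0 + lam * tau)
    # Unequal rates: CDF of the sum of two exponentials.
    return 1.0 - (lam * math.exp(-gam * tau)
                  - gam * math.exp(-lam * tau)) / (lam - gam)


def expected_adopters(tau, market_size, lam, gam):
    """Expected cumulative adopters by time tau for a given market size."""
    return market_size * adoption_fraction(tau, lam, gam)
```

Under these assumptions, a very large time-to-decide rate makes the curve approach a plain exponential pattern driven by the time to act, while equal rates give the Erlang-2 shape.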
Following the approach of Sawhney and Eliashberg (1996), the expected cumulative number of adopters of movie p at theater j by time τ can then be expressed as
Step 2: Calibration of regression model parameters from historical box office information
The second step in the estimation process is to connect the estimated values of the parameters $(N_{pj}, \lambda_{pj}, \gamma_{pj})$, $\forall j \in N$, $\forall p \in H$, from Step 1 to movie attributes, theater characteristics, and their possible interactions. To do this, we rely on multiple regressions based on the historical box office information $\forall p \in H$ used in Step 1.
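Equations (2) through (4) regress each of these three parameters on the movie attributes, the theater characteristics, and selected interactions between them. Here, $\alpha_N$, $\alpha_\lambda$, and $\alpha_\gamma$ denote the population intercepts. For each of the three regressions, the coefficients are indexed by movie attribute $a \in \Omega = (1, \ldots, A)$ and by theater characteristic $c \in \Lambda = (1, \ldots, C)$, and the interaction terms are restricted to $a' \in \Omega' \subseteq \Omega$ and $c' \in \Lambda' \subseteq \Lambda$. Finally, $\varepsilon_N$, $\varepsilon_\lambda$, and $\varepsilon_\gamma$ are independent and identically distributed random error terms.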
Step 3: Estimation of model parameters for a new movie
Let $(N_{sj}, \lambda_{sj}, \gamma_{sj})$ represent the model parameters for a new movie $s$ in theater $j$. Once the regression coefficients of Equations (2) through (4) are calculated, we use the known attributes $Z_s = (Z_{1s}, \ldots, Z_{As})$ of new movie $s$ and the individual theater characteristics at theater $j$ to estimate parameters $N_{sj}$, $\lambda_{sj}$, and $\gamma_{sj}$ through Equations (5) to (7).
Step 4: Estimation of theater‐level box office revenues for a new movie
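In this step, we use the estimates of $N_{sj}$, $\lambda_{sj}$, and $\gamma_{sj}$ from (5) to (7) in (1) to estimate the expected cumulative number of adopters for new movie $s$ at theater $j$ until time $\tau$. Let $\varphi_j$ be the ticket price at theater $j$ and $t = \tau_2 - \tau_1$ be the time interval under consideration. Then, we calculate $\pi_{sjt}$ from the ticket price and the expected cumulative number of adopters at $\tau_1$ and $\tau_2$.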
In section 5, we conduct computational experiments to test the accuracy of this method.
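As a rough illustration of Step 4, the sketch below converts a cumulative adoption curve into revenue per interval by multiplying the ticket price by the expected number of new adopters between two points in time. The adoption curve assumes independent exponential time-to-decide and time-to-act with unequal rates, the market size is assumed to be measured in adopters, and all numeric inputs are hypothetical.

```python
import math


def cumulative_adopters(tau, market_size, lam, gam):
    """Expected cumulative adopters by time tau (assumes lam != gam)."""
    if tau <= 0:
        return 0.0
    share = 1.0 - (lam * math.exp(-gam * tau)
                   - gam * math.exp(-lam * tau)) / (lam - gam)
    return market_size * share


def interval_revenue(tau1, tau2, ticket_price, market_size, lam, gam):
    """Ticket price times expected new adopters between tau1 and tau2."""
    new_adopters = (cumulative_adopters(tau2, market_size, lam, gam)
                    - cumulative_adopters(tau1, market_size, lam, gam))
    return ticket_price * new_adopters


# Hypothetical inputs: $8 tickets, 5000 potential adopters, six weeks.
weekly = [interval_revenue(w - 1, w, 8.0, 5000.0, 2.0, 0.5)
          for w in range(1, 7)]
```

Summing the weekly figures recovers the ticket price times the cumulative adopters at the end of the horizon, which is a convenient sanity check on any implementation.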
4. The DLSP
Once we forecast theater‐level box office revenues, the distributor needs to use this forecast to choose at which theaters to show a new movie, in order to maximize profits. To address this problem, we consider $n$ possible theaters and let $j, j' \in N = (1, \ldots, n)$ index the set of theaters. These movie theaters are located in $u$ regions indexed by $r \in P = (1, \ldots, u)$.
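Define the variables
$W_j = 1$ if theater $j$ is selected to screen the movie, and $0$ otherwise.
We are given
$K^{(MAX)}$: the maximum number of theaters in which the movie can be screened;
$K_r^{(MIN)}$: the minimum number of theaters to be selected in region $r$;
$L_{jr} = 1$ if theater $j$ is located in region $r$, and $0$ otherwise;
$b_{jj'} = 1$ if theaters $j$ and $j'$ are competing theaters, and $0$ otherwise.
Here, parameter $K^{(MAX)}$ is defined by the type of distribution strategy (i.e., limited, platforming, wide, or saturation release) chosen for the particular movie. The distribution strategy is chosen to be consistent with the marketing budget for a given movie. Next, the regions chosen are often major metropolitan areas. The minimum number of theaters per region is often specified by past experience. Finally, parameters $b_{jj'}$ are derived from the coverage territory specified by the exhibitor in each territory. The coverage territory is defined by a set of competing theaters that need to be excluded in each territory, which in turn specifies $b_{jj'}$.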
Recall that $\pi_{ijt}$ defines the expected box office revenue forecast when movie $i$ is shown at theater $j$ in week $t$, where $i \in M = (1, \ldots, m)$ indexes the set of movies and $t \in Q = (1, \ldots, q)$ indexes the set of time periods. Let $t_{ij} \in Q$ denote the duration chosen by the exhibitor to show movie $i$ at theater $j$. Note that $t_{ij}$ can be optimally chosen using the models described in Swami et al. (1999, 2001) or Somlo (2005) and is affected by the minimum play length $P^{(MIN)}$ fixed by a contractual agreement between the distributor and exhibitor. Then, the distributor's expected box office profit during this period is given by $\theta^D_{ij} = \sum_{t=1}^{t_{ij}} s^D_{ijt}\,\pi_{ijt} - c^D_{ij}$, where $c^D_{ij}$ is the distribution cost of movie $i$ to theater $j$, and constant $s^D_{ijt}$ represents the portion of revenues allocated to the distributor for movie $i$ at theater $j$ during week $t$. This factor depends on the duration chosen by the exhibitor and the nature of the contractual agreement between the exhibitor and distributor.
The DLSP can be represented by the following binary integer program:
Objective function (10) is chosen to maximize the distributor's total expected box office profits for movie i by the appropriate choice of theaters. Constraint (11) ensures that the total number of theaters selected to screen a movie does not exceed K (MAX). Constraints (12) guarantee that a set minimum number of theaters are picked for each region. Constraints (13) ensure that the distributor does not pick competing theaters. This is important because a distributor often chooses multiple exhibitors to show a movie. Consequently, to prevent dilution of sales at the selected theater, exhibitors require that competing theaters within the vicinity are not picked. Finally, 0–1 integrality of the variables is imposed by constraints (14).
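To make the structure of the model concrete, the sketch below enumerates all theater subsets of a tiny hypothetical instance and keeps the most profitable one that respects the cap on theaters, the regional minimums, and the exclusion of competing pairs. It assumes that constraint (13) takes the form of pairwise exclusion between competing theaters; the theater names, profits, and function name are invented for illustration, and brute-force enumeration is only viable for very small instances.

```python
from itertools import combinations


def solve_dlsp_brute_force(profit, region, conflicts, k_max, k_min):
    """Exhaustively solve a tiny single-movie DLSP instance.

    profit:    dict theater -> expected distributor profit
    region:    dict theater -> region label
    conflicts: set of frozensets, each a pair of competing theaters
    k_max:     maximum number of theaters, as in constraint (11)
    k_min:     dict region -> minimum theaters, as in constraint (12)
    """
    theaters = sorted(profit)
    best_value, best_set = None, None
    for size in range(k_max + 1):  # never exceed k_max theaters
        for subset in combinations(theaters, size):
            chosen = set(subset)
            # Regional minimums must be met.
            if any(sum(1 for j in chosen if region[j] == r) < need
                   for r, need in k_min.items()):
                continue
            # Assumed form of (13): no competing pair is chosen together.
            if any(pair <= chosen for pair in conflicts):
                continue
            value = sum(profit[j] for j in chosen)
            if best_value is None or value > best_value:
                best_value, best_set = value, chosen
    return best_value, best_set


# Hypothetical instance: A and B compete in the east region.
profit = {"A": 10, "B": 8, "C": 6, "D": 5}
region = {"A": "east", "B": "east", "C": "west", "D": "west"}
conflicts = {frozenset(("A", "B"))}
value, chosen = solve_dlsp_brute_force(profit, region, conflicts,
                                       k_max=3, k_min={"east": 1, "west": 1})
```

In this instance the two most profitable theaters cannot be chosen together, so the best selection pairs theater A with both western theaters.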
The DLSP can be used by the distributor to determine the optimal set of theaters for the initial screening of a movie. As the minimum play length commitment for the theaters expires, the distributor is free to add additional theaters as new offers to show the movie are made by exhibitors. In this case, the distributor would need to consider the set of theaters in which the movie still has to be shown, reduce $K_r^{(MIN)}$ at the appropriate regions, reduce $K^{(MAX)}$ by the total number of theaters where the movie is still being shown, remove the additional theaters for which constraint (13) will not be feasible, and re‐solve the DLSP. Note that this approach also can be used to include preferred theaters in the beginning or in later iterations of the DLSP. Here, the preferred theaters are akin to the theaters where the movie still has to be shown. Such preferred theaters may be necessary to maintain a long‐term relationship with the exhibitor. Also note that this model can be extended to multiple movies. Here, we would first need to index the parameters and variables of the DLSP by movie index $i$. Then, the objective function for the DLSP for multiple movies would be $\sum_{i \in M} \sum_{j \in N} \theta^D_{ij} W_{ij}$, and we would have constraints (11) through (14) for each movie $i$. Since this problem is decomposable by movie index $i$, we could apply the single‐movie solution procedure to each movie separately.
Proposition 1. The DLSP is NP‐hard.
In light of Proposition 1, it is unlikely that we could solve large, realistic problems to optimality. In particular, we found in our computational analysis that we could not find solutions using leading commercial software tools such as the XPRESS and CPLEX solvers in GAMS (Brooke et al. 1992) when the number of theaters is large (over 1800 theaters) and when each theater has many competing theaters (averaging over seven per theater). Consequently, we elected to develop heuristics to solve such instances of this problem. We also present upper bounds to evaluate the quality of these heuristics.
4.1. Upper Bounds
To develop upper bounds on the DLSP, one could relax one or more of constraints (11) through (13) by introducing Lagrange multipliers and solving the resulting sub‐problem optimally. Then, this sub‐problem can be optimized over the multipliers to provide a tight upper bound. However, the upper bound from any such relaxation would be no smaller than a simple linear programming relaxation of the DLSP, in which we relax constraint (14) by allowing $W_j \in [0, 1]$, $\forall j$. This is because relaxations of the DLSP involving constraints (11) through (13) have the integrality property (Geoffrion 1974), as established by the following proposition.
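Proposition 2. Any relaxation of the DLSP involving constraints (11) through (13) has the integrality property.
Proof. Write the DLSP in the form $\max\{fx : Ax \le b,\ Cx \le d,\ x \in X\}$, where $A$, $b$, $C$, $d$, and $f$ are the appropriate matrices, $Ax \le b$ represents the set of constraints we keep, $Cx \le d$ represents the set of constraints we relax, and $x \in X$ represents the integrality constraints. Let $Co\{x \in X \mid Cx \le d\}$ represent the convex hull formed by the constraints we relax. Since in the DLSP $K_r^{(MIN)}, K^{(MAX)} \in \mathbb{N}^{+}$, $\forall r$, and $L_{jr}, b_{jj'} \in \{0, 1\}$, $\forall r, j, j'$, note that $Co\{x \in X \mid Cx \le d\} = \{x \mid Cx \le d\}$ for any relaxation involving constraints (11), (12), and (13). It follows from Geoffrion (1974) that any relaxation of the DLSP involving these constraints has the integrality property. ▪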
In light of Proposition 2, we generate an upper bound for the DLSP by solving its linear programming relaxation with $W_j \in [0, 1]$, $\forall j$.
4.2. Heuristics
In general, the solution provided by the upper bounds may not be feasible for the DLSP due to the violation of the integrality constraints (14). To achieve feasibility, we develop the following heuristics.
Heuristic 1
Step 1: In each region, select $K_r^{(MIN)}$ theaters in descending order of $\theta^D_{ij}$. This satisfies constraints (12). Remove the selected theaters from consideration. Sort all of the remaining theaters in descending order of $\theta^D_{ij}$ and pick an additional $K^{(MAX)} - \sum_{r \in P} K_r^{(MIN)}$ theaters. Thus, constraint (11) is binding.
Step 2: We consider the theaters selected by Step 1 and look for those theaters that violate constraints (13). We first remove the violating theater with the lowest $\theta^D_{ij}$. We then find a replacement theater that has the highest possible $\theta^D_{ij}$ without violating constraints (13). This procedure is repeated until all violating theaters across all regions are eliminated.
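The greedy select-then-repair procedure above can be sketched as follows. Two details are assumptions of this sketch rather than statements from the text: a theater that has been removed is never reconsidered, and when a removal leaves a region below its minimum, the replacement is drawn from that same region. The instance data and the function name are hypothetical.

```python
def heuristic_one(profit, region, conflicts, k_max, k_min):
    """Greedy selection by profit, then repair of competing-theater clashes."""

    def clashes(j, chosen):
        return any(frozenset((j, k)) in conflicts for k in chosen if k != j)

    by_profit = sorted(profit, key=profit.get, reverse=True)
    chosen = []
    # Step 1a: fill each region's minimum with its most profitable theaters.
    for r, need in k_min.items():
        chosen += [j for j in by_profit if region[j] == r][:need]
    # Step 1b: top up to k_max with the best remaining theaters.
    extra = max(k_max - len(chosen), 0)
    chosen += [j for j in by_profit if j not in chosen][:extra]
    # Step 2: repeatedly drop the least profitable violator and replace it.
    dropped = set()
    while True:
        violating = [j for j in chosen if clashes(j, chosen)]
        if not violating:
            return set(chosen)
        worst = min(violating, key=profit.get)
        chosen.remove(worst)
        dropped.add(worst)
        home = region[worst]
        short = sum(region[j] == home for j in chosen) < k_min.get(home, 0)
        for j in by_profit:
            if j in chosen or j in dropped or clashes(j, chosen):
                continue
            if short and region[j] != home:
                continue  # assumption: restore the regional minimum first
            chosen.append(j)
            break


# Hypothetical instance: A and B compete in the east region.
profit = {"A": 10, "B": 8, "C": 6, "D": 5, "E": 4}
region = {"A": "east", "B": "east", "C": "west", "D": "west", "E": "east"}
conflicts = {frozenset(("A", "B"))}
selected = heuristic_one(profit, region, conflicts,
                         k_max=3, k_min={"east": 1, "west": 1})
```

If no admissible replacement exists, this sketch simply continues with fewer theaters, so the returned set may contain fewer than `k_max` theaters.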
Heuristic 2
Step 1: Calculate the ratio $R_{ij}$ for each theater by multiplying $\theta^D_{ij}$ by a scale factor. $R_{ij}$ can be regarded as the scaled expected box office profit for the distributor from movie $i$ at theater $j$. The scale factor decreases as $\sum_{j' \in N} b_{jj'}$, the number of competing theaters for theater $j$, increases.
Step 2: For each region, select the theater with highest ratio and in the case of a tie select the theater with the lowest number of competing theaters. Note that by picking the theaters in this ratio, we implicitly reduce the number of competing theaters. This, in turn, ensures that it is more feasible to pick subsequent theaters without drastic reduction in revenues. Once this theater is selected, remove all the competing theaters from consideration. If, at this point, constraint (12) is satisfied, go to next region. If constraint (12) is not satisfied, pick the ratio with the next highest value and repeat this procedure. Continue until constraint (12) is satisfied. If this still does not lead to a feasible solution, restart this procedure with the theater with the next highest ratio and continue until this constraint is satisfied. Repeat this step for every region.
Step 3: Remove all theaters selected in Step 2 from consideration.
Step 4: Consider the theaters that have not been removed in Step 2 or 3 and choose the remaining $K^{(MAX)} - \sum_{r \in P} K_r^{(MIN)}$ theaters in decreasing order of $R_{ij}$. After each selection, remove the competing theaters associated with the chosen theater.
Note that this heuristic attempts to satisfy constraints (12) and (13) in Step 2, and constraints (11) and (14) in Step 4. In the next section, we test the performance of our method to estimate $\pi_{ijt}$ and also the effectiveness of both these heuristics and the upper bound across a variety of data sets.
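A compact sketch of the ratio-based heuristic follows. Because the exact scale factor is not reproduced here, the sketch assumes the ratio is the expected profit divided by one plus the number of competing theaters, which decreases as competitors increase. It also omits the restart logic of Step 2 for brevity. All names and data are hypothetical.

```python
def heuristic_two(profit, region, conflicts, k_max, k_min):
    """Ratio-based greedy selection that removes rivals after each pick."""
    rivals = {j: {k for k in profit if frozenset((j, k)) in conflicts}
              for j in profit}
    # Assumed scale factor: 1 / (1 + number of competing theaters).
    ratio = {j: profit[j] / (1 + len(rivals[j])) for j in profit}
    # Highest ratio first; ties go to the theater with fewer rivals.
    ranked = sorted(profit, key=lambda j: (-ratio[j], len(rivals[j])))
    available = set(profit)
    chosen = []

    def pick(j):
        chosen.append(j)
        available.discard(j)
        available.difference_update(rivals[j])  # drop competing theaters

    # Step 2: satisfy each region's minimum.
    for r, need in k_min.items():
        count = 0
        for j in ranked:
            if count >= need:
                break
            if j in available and region[j] == r:
                pick(j)
                count += 1
    # Step 4: fill the remaining slots by ratio, up to k_max in total.
    for j in ranked:
        if len(chosen) >= k_max:
            break
        if j in available:
            pick(j)
    return set(chosen)


# Hypothetical instance: A and B compete in the east region.
profit = {"A": 10, "B": 8, "C": 6, "D": 5, "E": 4}
region = {"A": "east", "B": "east", "C": "west", "D": "west", "E": "east"}
conflicts = {frozenset(("A", "B"))}
selected = heuristic_two(profit, region, conflicts,
                         k_max=3, k_min={"east": 1, "west": 1})
```

Removing rivals immediately after each pick means the selection never needs a repair pass, at the cost of possibly discarding a high-profit theater early.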
5. Computational Study
The financial box office data required for the computational study were purchased at the time of this research from Nielsen Entertainment Data Incorporated (EDI) located in Beverly Hills, California. We selected theaters located within the continental United States and purchased data of weekly box office revenues for all movies played at a given theater between May 22, 2000 and May 25, 2001.
The time period was chosen to completely cover an entire major release period during summer. The sample consisted of 149 movies and 1218 theaters. This sample of movies accounted for 98.42% of box office revenues during this period. In addition to purchasing financial data, we built two separate databases to collect information regarding movie attributes and theater characteristics corresponding to the movies and theaters in this period. These were created using Microsoft Access 2000. Data for the movie attributes were gathered from the Internet Movie Database (
5.1. Results of Estimation Method
We summarize our results corresponding to the sequence of steps in the estimation method outlined in section 3.
Step 1: Incorporation of theater‐level characteristics
We use nonlinear regression to approximate the model parameters: the maximum potential market size ($N_j$), time‐to‐decide ($\lambda_j$), and time‐to‐act ($\gamma_j$) for a given movie. To execute this regression, we used the Levenberg–Marquardt algorithm (Bates and Watts 1988) of the NLIN procedure of SAS, a commercially available statistical software package (SAS 2001). To ensure convergence during these runs, it was critical to specify good starting values for the nonlinear regression. Therefore, we used a grid search to obtain good starting values for the parameters. Results of the parameter approximation for selected pairs of movies and theaters are presented in Table 3.
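The role of the grid search can be illustrated with a small sketch that scores every combination of candidate parameter values by its sum of squared errors against observed cumulative adoptions and returns the best one as a starting point. It is written in Python rather than SAS and uses an assumed adoption curve (independent exponential time-to-decide and time-to-act); the grids and data are synthetic, and the nonlinear least-squares refinement itself is not shown.

```python
import math
from itertools import product


def adoption_fraction(tau, lam, gam):
    """Assumed adoption curve: CDF of the sum of two exponential times."""
    if math.isclose(lam, gam, rel_tol=1e-9):
        return 1.0 - math.exp(-lam * tau) * (1.0 + lam * tau)
    return 1.0 - (lam * math.exp(-gam * tau)
                  - gam * math.exp(-lam * tau)) / (lam - gam)


def grid_search_start(weeks, cumulative, n_grid, lam_grid, gam_grid):
    """Return the (N, lam, gam) grid point with the smallest squared error."""

    def sse(params):
        n, lam, gam = params
        return sum((n * adoption_fraction(t, lam, gam) - y) ** 2
                   for t, y in zip(weeks, cumulative))

    return min(product(n_grid, lam_grid, gam_grid), key=sse)


# Synthetic data generated from N=100, lam=2.0, gam=0.5.
weeks = [1, 2, 3, 4, 5, 6]
observed = [100 * adoption_fraction(t, 2.0, 0.5) for t in weeks]
# The curve is symmetric in lam and gam, so the two grids are kept disjoint.
start = grid_search_start(weeks, observed,
                          n_grid=[50, 100, 150],
                          lam_grid=[1.0, 2.0, 4.0],
                          gam_grid=[0.25, 0.5, 0.75])
```

The returned grid point would then be passed as the initial guess to a nonlinear least-squares routine.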
From Table 3, we make three important observations. First, note that the magnitude of estimated box office revenue for the same movie can change significantly across theaters. For example, the estimated box office revenue for the movie WOMEN varied from US$31,400 to US$171,535. Thus, including location‐specific theater characteristics in the box office estimation procedure is an important aspect in optimizing box office profits. Second, different movies lead to vastly dissimilar revenues at the same theater. For example, consider the theater FENW in Table 3. The revenues for the three movies shown for the same duration vary from around US$22,640 to US$255,337. This confirms the intuition that movie attributes have a significant influence on theater‐level box office revenues. Third, even when estimated box office revenues were similar, these could have been derived from very dissimilar parameter estimates and adoption patterns. For instance, the estimated box office revenue for movie LIESBTH at theater WYNN was US$22,607 and the estimate for the movie VELN at theater FENW was US$22,691; however, the estimate for LIESBTH at theater WYNN was based on N=22.994, λ=26.168, and γ=0.341, while the box office estimate for VELN at theater FENW was based on N=23.049, λ=2.43, and γ=2.043. This meant that LIESBTH followed an exponential‐shaped adoption pattern, whereas VELN's adoption pattern was consistent with the shape of the Erlang‐2 distribution. These differences in adoption patterns have very different implications for the distribution strategy of these movies and, ultimately, for parameter K (MAX).
Since the market size, time‐to‐act, and time‐to‐decide parameters vary significantly across movies and theaters and lead to different adoption patterns, it is critical that they are estimated by incorporating the impact of both movie attributes and theater characteristics; however, we found that we could not develop generalizations or simple rules to determine how these aspects affected these parameters. To overcome this, we resorted to multiple regressions, described in the next step.
Step 2: Calibration of regression model parameters from historical box office information
To run the multiple regression connecting the parameters to movie attributes and theater characteristics, we collected 1218 parameter triplets $(N_{pj}, \lambda_{pj}, \gamma_{pj})$, $\forall p \in H$, from the nonlinear regression of Step 1. Note that each triplet corresponds to a theater across a range of movies. We divided these theater‐triplets into two sets, so that each set had approximately the same number of movies. The first set is the calibration sample, with 609 theater‐triplets and 75 movies, used to calibrate the multiple regression coefficients. The second is the holdout sample, with the remaining 609 theater‐triplets and 74 movies, which was used to test the validity of the regression results in Steps 3 and 4. We ran the regressions using the IML procedure in SAS. The regression results showed that homoscedasticity (equal variance) was violated. To correct this problem, we transformed the dependent variables to their natural logarithms. The regression results are summarized in Table 4.
Statistically significant at 1%,
Statistically significant at 5%,
Statistically significant at 10%.
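The log transformation used in Step 2 can be sketched as follows. This is a minimal illustration with a single predictor and synthetic data, not the multiple regression reported in Table 4; the variable names and numbers are hypothetical.

```python
import math


def fit_log_linear(x, y):
    """Ordinary least squares of ln(y) on a single predictor x.

    Returns (intercept, slope) of ln(y) = intercept + slope * x.
    """
    ln_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ln_y = sum(ln_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ln_y) for xi, li in zip(x, ln_y))
    slope = sxy / sxx
    return mean_ln_y - slope * mean_x, slope


def predict(intercept, slope, x_new):
    """Back-transform the fitted log value to the original scale."""
    return math.exp(intercept + slope * x_new)


# Synthetic, purely illustrative data: production budget vs. market size N.
budget = [10, 20, 40, 60, 80]
market_size = [8.0 * math.exp(0.02 * b) for b in budget]

intercept, slope = fit_log_linear(budget, market_size)
print(round(intercept, 3), round(slope, 3))
print(round(predict(intercept, slope, 50), 2))
```

Fitting on the log scale keeps the error variance roughly constant when the spread of the dependent variable grows with its level, which is the violation reported above.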
The regression results show that movie attributes and theater characteristics are good predictors of maximum theater‐level box office revenues (R² = 0.53 for N). As a comparison, we also ran a multiple regression model using only movie attributes. We found that the predictive power of the regression fell significantly (R² = 0.25 for N). This provides strong support for including theater characteristics in our forecasting model. In addition, the regression in general possesses more predictive power for the time‐to‐act parameter (γ, R² = 0.27) than for the time‐to‐decide parameter (λ, R² = 0.15). These results also provide interesting insight into which movie attributes and theater characteristics affect N, λ, and γ.
Movie attributes that significantly affect box office revenues include production budget, critics' reviews, genre, and release date. As expected, higher budgets and positive critics' reviews add to box office success. Certain genres, specifically animation and fantasy, influenced box office revenues because fantasy and sci‐fi movies usually cater to specialized crowds. These results were similar to those of Litman and Ahn (1998). We also found that spring release dates adversely impacted box office revenues, possibly because of springtime travel and the restart of outdoor activities. Significant theater characteristics affecting box office revenues included the presence of competing theaters, amount of discount on the ticket price, median age, population density, and geographical location specified in one of seven broadly classified regions in the United States. We found that the presence of competing theaters positively impacted box office revenues at a particular theater. While this result seems counter‐intuitive, it could be due to a clustering effect (Chisholm and Norman 2002, Pinkse and Slade 1998, Tannenbaum 2001) in which the collective presence of stores offering the same or similar services allows better tapping into higher demand in urban areas. As expected, the size of the ticket discount was negatively correlated with box office revenues, which explains why distributors often request that the number of discounted tickets be limited. We also found that median age was negatively correlated with box office revenue, while increased population density positively influenced box office revenue. Finally, the geographical location of a theater had a significant impact on box office revenue.
The time‐to‐decide parameter, λ, was influenced by movie attributes and theater characteristics. Several movie‐related attributes, such as fantasy and animation genres, star presence, and movies heavy in special effects, were positively correlated with λ. This is because these attributes pull audiences into theaters earlier, reducing the expected time to decide. On the other hand, restrictive MPAA ratings and winter release timing were negatively correlated with λ, because these attributes dampen interest and, thus, increase the expected time to decide. Significant theater characteristics included adult prices and number of screens. As anticipated, the magnitude of the discount on ticket prices was positively correlated with λ, as this reduced the time to decide. In addition, the number of screens was negatively correlated with λ. More screens typically increased the time to decide, presumably because moviegoers perceive that, when there are more screens, movies will run longer and are less likely to be sold out in subsequent weeks.
Movie attributes such as wider MPAA ratings and winter or spring opening dates were negatively correlated with γ. This is because these attributes increase the expected time to act. Conversely, genre (animation and fantasy) and sequels were positively correlated with γ, as they typically catered to special audiences whose expected time to act is smaller. A highly significant theater characteristic was the number of competing theaters within a 5‐mile radius. As expected, this was positively correlated with γ, because the previously described clustering effect could reduce the expected time to act.
Finally, we list the results on the interaction of theater‐related variables with movie‐related ones. None of the interaction terms proved to be significant predictors of box office revenue in our sample, but could be significant in other samples.
Steps 3 and 4: Estimation of model parameters and theater‐level box office revenues for a new movie
We define a new movie as a movie shown in theaters in the holdout part of our sample. To validate the multiple regression results, we estimated the model parameters (N, λ, and γ), calculated box office revenue estimates for the holdout sample, and compared those estimates with the actual box office revenue. To better assess the performance of the box office estimation procedure, we also developed a benchmark model. This was based on a multiple regression model directly regressing box office revenue on the complete set of movie attributes that was used for our model, without including any theater characteristics. This benchmark model itself was an enhancement on industry practice, which was based on choosing one movie attribute per simple regression run and using multiple separate regressions to study the effects of different movie attributes on box office results (Rothenberg 2003, Tannenbaum 2001). We observed heteroscedasticity (unequal variance) in the error terms; therefore, we transformed the actual box office measure to its logarithm. The predictive power of the benchmark model was significantly weaker than that of our model. In addition, we observed that box office revenue estimates were positively correlated with critics' ratings and negatively correlated with spring release dates.
Across the entire holdout sample consisting of 609 theaters and 74 movies, the average forecast error of our method was 15%, while the average forecast error for the benchmark model was 60%. Thus, our method reduces average forecast errors from the benchmark model by 75%. Table 5 summarizes actual box office sales across all theaters, aggregate forecasts and forecast error (expressed as a percentage of actual box office sales) for a select sample of movies for our method and the benchmark model. We next use the estimates of box office revenues from our method to test the DLSP.
Relative percentage forecast error=(Method estimate−Actual BO)/Actual BO.
Relative percentage forecast error=(Benchmark model estimate−Actual BO)/Actual BO.
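The error measure in the table notes and the 75% reduction quoted above can be checked with a few lines. The function names and the sample estimate are hypothetical; only the 15% and 60% averages come from the text.

```python
def relative_pct_error(estimate, actual_bo):
    """Relative percentage forecast error = (estimate - actual BO) / actual BO."""
    return 100.0 * (estimate - actual_bo) / actual_bo


def error_reduction_pct(method_error, benchmark_error):
    """Share of the benchmark's average error removed by the proposed method."""
    return 100.0 * (benchmark_error - method_error) / benchmark_error


# Hypothetical single forecast: an estimate of 1150 against actual sales of 1000.
print(relative_pct_error(1150.0, 1000.0))

# Average forecast errors reported for the holdout sample (15% vs. 60%).
print(error_reduction_pct(15.0, 60.0))
```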
5.2. Results of the DLSP
The parameters required for the DLSP, such as the portion of revenues allocated to the distributor, distribution costs, maximum number of theaters, and minimum number of theaters sought per region, were set based on specific movie‐level information. In addition, the key inputs to the DLSP were expected theater‐level box office sales (i.e., π ijt ) and the duration of play (i.e., t ij ). We estimated π ijt using the procedure outlined in section 3, while t ij was calculated by the model described in Somlo (2005).
We used XPRESS, a mixed‐integer programming solver in GAMS, to solve the DLSP. This generated optimal solutions for instances of the DLSP in which each theater had, on average, fewer than seven competing theaters. The average time for each run was around 53 seconds. However, we found that when each theater had more than seven competing theaters on average, GAMS could not solve the DLSP. This provided the motivation for developing the heuristics (i.e., lower bounds) to address this problem and upper bounds to evaluate the quality of the heuristics. A specialized Microsoft Visual Studio .NET program was written to calculate the lower bounds using the heuristics in section 4.1. To derive the upper bound, we solved the linear programming relaxation of the DLSP using XPRESS, in which W j is relaxed to be continuous between 0 and 1.
To examine how our models perform on larger problems, we used the reference data to construct larger problems with n theaters and m movies, where n=3000 and 6000 and m=150 and 300. We first analyzed the historical box office data and defined the probability distributions for the significant variables in the multiple regressions defined by (2) through (4). We then ran Monte Carlo simulations on these variables to generate expected theater‐level revenues for these larger problems for the required choice of n and m; however, we observed that some of these simulations generated several gigabytes of data without providing additional insight on the performance of the forecasting technique and the DLSP. Therefore, after careful consideration, we elected to analyze the 3000‐theater problem for the DLSP across 150 movies.
Table 6 summarizes some of our salient results from our computational tests. In this table, a row represents the solution technique used. These included the optimal solutions generated by GAMS, the upper bound generated by the linear programming relaxation in GAMS, and the lower bounds based on the myopic and greedy heuristics. Columns in the table represent the problem size of the DLSP represented by the number of theaters, movies, and the competition density, which is the average number of competing theaters per theater. On the basis of our discussion with several industry experts (McGrath 2004, Rothenberg 2003, Tannenbaum 2001) and to cover a broad range of scenarios, we picked the competition density to be 6, 11, and 15. The numbers in the body of the table describe the percentage gap of the given technique from a reference point, if that technique was successful in generating a solution for the given problem. As the DLSP problems were solved to optimality at a competition density level of six competitors per theater, this was used as a reference point. However, GAMS was unable to generate optimal solutions for problems with higher competition densities. Consequently, the upper bound was used as this reference for the remaining problems.
Our computational results have been quite encouraging. From Table 6, observe that when we use the myopic heuristic for a competition density level of 15 competitors per theater, the average gap from the upper bound was 10.3%, while the corresponding gap with the greedy heuristic was 11.2%. As the competition density level decreased, the performance of both heuristics improved. For instance, the average gaps for the six‐competitors‐per‐theater problem reduced to 1.8% and 1.3% for the myopic and greedy heuristics, respectively. We also wanted to better understand the circumstances under which the percentage gaps change. This could provide us with insights into how to improve the upper bound and the heuristics. We observe from our analysis that these gaps were uniformly higher when the number of available theaters for selection was higher. Conversely, the gaps were significantly lower when the number of available theaters was lower. It is important to note that these gaps were reduced because the upper bound became tighter. These results show that the heuristics perform well across a range of data and there is scope to improve the upper bounds.
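The percentage gap reported in Table 6 can be illustrated on a toy instance. The sketch below is not the myopic or greedy heuristic of section 4.1; it is a generic greedy rule applied to a hypothetical five-theater, single-movie problem with a regional minimum, a maximum number of theaters, and competition constraints. Because the instance is tiny, the reference point is found by exhaustive search rather than by a linear programming relaxation.

```python
from itertools import combinations

# Hypothetical instance: theater -> (region, expected distributor profit).
THEATERS = {
    "A": ("East", 50), "B": ("East", 45), "C": ("East", 40),
    "D": ("West", 30), "E": ("West", 10),
}
COMPETING = {frozenset("AB"), frozenset("AC")}  # pairs that cannot both be chosen
MAX_THEATERS = 3
MIN_PER_REGION = 1
REGIONS = sorted({region for region, _ in THEATERS.values()})


def profit(selection):
    return sum(THEATERS[t][1] for t in selection)


def conflicts(theater, selection):
    return any(frozenset((theater, s)) in COMPETING for s in selection)


def feasible(selection):
    if len(selection) > MAX_THEATERS:
        return False
    if any(conflicts(t, selection) for t in selection):
        return False
    return all(
        sum(THEATERS[t][0] == region for t in selection) >= MIN_PER_REGION
        for region in REGIONS
    )


def optimal():
    """Exhaustive search over all subsets; only viable for tiny instances."""
    candidates = (
        combo
        for k in range(MAX_THEATERS + 1)
        for combo in combinations(sorted(THEATERS), k)
        if feasible(combo)
    )
    return max(candidates, key=profit)


def greedy():
    """Meet each regional minimum first, then fill by descending profit."""
    by_profit = sorted(THEATERS, key=lambda t: (-THEATERS[t][1], t))
    chosen = []
    for region in REGIONS:
        needed = MIN_PER_REGION
        for t in by_profit:
            if needed == 0:
                break
            if THEATERS[t][0] == region and t not in chosen and not conflicts(t, chosen):
                chosen.append(t)
                needed -= 1
    for t in by_profit:
        if len(chosen) >= MAX_THEATERS:
            break
        if t not in chosen and not conflicts(t, chosen):
            chosen.append(t)
    return chosen


def gap_pct(value, reference):
    """Percentage gap of a heuristic value from the reference point."""
    return 100.0 * (reference - value) / reference


best, heuristic = optimal(), greedy()
print(sorted(best), profit(best))
print(sorted(heuristic), profit(heuristic))
print(round(gap_pct(profit(heuristic), profit(best)), 1))
```

In this toy case the greedy rule commits to the single most profitable theater and thereby blocks two competing theaters that are jointly worth more, which is one way such gaps can widen as competition density grows.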
To test how sensitive the value of the heuristics was to estimates of theater‐level box office revenues, we scaled π ijt by (1−x) and (1+x), where x=0.1, 0.2, and 0.3. Note that our scaling procedure resulted in six additional data sets for the 3000‐theater, 150‐movie problem set at a density level of 15 competitors per theater. Table 7 summarizes the average gaps for the myopic and greedy heuristics. These results show that average gaps for the myopic heuristic ranged from 10.3% to 10.95%, while the corresponding gaps for the greedy heuristic ranged from 11.2% to 12.7%. These results show that the heuristics were not significantly sensitive to estimation errors in π ijt and, thus, provide a reliable basis to address this problem. To see how sensitive the solutions of the heuristics were to the estimates of π ijt , we compared the theater locations selected by the heuristics for the scaled values of π ijt . We found that, while the total number of locations and the composition of those locations in terms of theater types were stable, the actual locations proposed by the different solutions varied with differences of scale in π ijt .
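The scaling experiment can be mimicked on a toy example. The sketch below is only an illustration under stated assumptions: four hypothetical theaters, an assumed distributor share of revenue, a fixed distribution cost per theater, and a simple rule that picks the most profitable theaters up to a cap. It is not the DLSP or either heuristic.

```python
# Hypothetical data: theater -> (baseline revenue forecast, fixed distribution cost).
THEATERS = {"A": (100, 30), "B": (90, 20), "C": (60, 7), "D": (50, 24)}
DISTRIBUTOR_SHARE = 0.5  # assumed share of box office revenue kept by the distributor
MAX_THEATERS = 2


def select(scale):
    """Pick up to MAX_THEATERS profitable theaters after scaling every forecast."""
    profit = {
        t: DISTRIBUTOR_SHARE * revenue * scale - cost
        for t, (revenue, cost) in THEATERS.items()
    }
    ranked = sorted((t for t in profit if profit[t] > 0), key=lambda t: (-profit[t], t))
    return set(ranked[:MAX_THEATERS])


# Baseline plus the six scaled data sets (1 - x) and (1 + x) for x = 0.1, 0.2, 0.3.
scales = [1.0] + [1 + sign * x for x in (0.1, 0.2, 0.3) for sign in (-1, 1)]
for scale in sorted(scales):
    chosen = select(scale)
    print(f"scale={scale:.1f}  count={len(chosen)}  theaters={sorted(chosen)}")
```

Because costs do not scale with the forecasts, the ranking of theaters by profit can change with the scale even when the number of theaters selected stays the same.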
The stability in the value of the heuristics and the total number of locations can be reassuring, but can also be misleading to the movie distributor. It is reassuring because the distributor can select the number of theaters prescribed by its general distribution plan using reasonably accurate theater‐level revenue forecasts over time. On the other hand, the stability might mislead the distributor into thinking that one has to only consider the same set of theaters across different movies. However, due to changes in movie attributes, the set of theaters may vary widely across movies. For instance, across the 150 movies, we found that on average only 43% of the theaters were common. This result reinforces the importance of including theater characteristics and movie attributes when determining the distribution plan.
We wanted to better understand the effects of minimum play length P (MIN) and consequently t ij on the solution of the DLSP. Distributors consider this parameter vital toward achieving the desired exposure, which, in turn, will affect the financial potential of a movie. Consequently, they go to great lengths to ensure that the agreed‐upon screen is allocated to the particular movie for the requested period of time. We tested our procedures with P (MIN) set to 3 weeks as the base case, and 2 and 4 weeks as alternative settings for the 3000‐theater, 150‐movie, 15‐competitors‐per‐theater problem. The sensitivity analysis of this parameter provided several interesting insights.
We compared the solutions with the alternative values of P (MIN) to the base case and found that a major shift occurs in the optimal theater locations when the minimum play length requirement is changed. At first glance, the change in the number of theaters and the actual selection were minimal, but a more detailed analysis of the new set of theaters revealed a significant difference in the type of theaters to target. The extent of this change in the types of theaters is surprising and goes unrecognized by distributors. For example, the total number of theaters selected by the DLSP showed a very modest increase from 740 to 750 theaters when we changed P (MIN) from 3 to 2 weeks. However, more than 55% of the theaters recommended for the base case were replaced for the shortened commitment period. The new solution selected more mini‐type theaters in neighborhoods with moderate income, in contrast with the original solution's heavy dominance by multi‐ and mega‐theaters in high‐income areas. When we changed P (MIN) from 3 to 4 weeks, the total number of theaters decreased from 740 to 670, as the movie had to be shown for a longer duration, but, here again, more than 41% of the chosen theaters differed from the base case and were of different theater types. Here mega‐ and multi‐theaters were preferred over mini‐theaters. This shows that the minimum play length requirement strongly influences the types of theaters that need to be chosen, and distributors should carefully examine the effect of changing P (MIN) on theater choice before agreeing to change it on a movie‐by‐movie basis. The DLSP provides a structured and robust basis for conducting this assessment.
6. Application
We have compared the methods detailed in this paper to the theater selection decisions made by motion picture distributors on realistic data for the 3000‐theater, 150‐movie problem with the number of average competing theaters set to the highest level of 15 competitors per theater. We then ran the myopic and greedy heuristic to solve the DLSP for each movie and picked the best solution.
Next, we constructed a distribution plan for a given movie replicating the procedure that distributors would have used in practice, based on extensive discussions with several leading distributors (McGrath 2004, Molter 2004, Rothenberg 2003). In this procedure, distributors first ranked theaters in decreasing order of historical revenues across all movies. If necessary, the distributors modified the initial ranking by weighting the sales rank along with a key attribute rank. For instance, consider the case in which a movie is targeted toward a certain ethnicity (e.g., African‐American viewers). In this case, theaters in a region would be ranked in decreasing order of the proportion of that ethnicity in the vicinity of the theater's location. A final ranking would then be developed by assigning weights to the sales and ethnicity rankings. Next, they picked the highest ranked theaters in each region while ensuring that the minimum number of theaters and competition constraints were met in each region. Finally, they also made sure that the total number of theaters across regions did not exceed the maximum number of theaters required, which was set based on the distribution strategy derived from the marketing budget for each movie.
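The ranking procedure described above can be sketched as follows. The data, the weight, and the function names are hypothetical, and the way the two ranks are combined (a weighted sum of rank positions) is an assumption; the distributors' actual weighting scheme is not specified here.

```python
def rank_positions(values):
    """Map each theater to its rank position (1 = highest value)."""
    ordered = sorted(values, key=lambda t: (-values[t], t))
    return {t: i + 1 for i, t in enumerate(ordered)}


def distributor_plan(theaters, competing, w_sales, min_per_region, max_total):
    """Weighted-rank selection with regional minimums and competition checks.

    Assumes the regional minimums can be met without exceeding max_total.
    """
    sales_rank = rank_positions({t: d["sales"] for t, d in theaters.items()})
    attr_rank = rank_positions({t: d["attr"] for t, d in theaters.items()})
    score = {t: w_sales * sales_rank[t] + (1 - w_sales) * attr_rank[t] for t in theaters}
    order = sorted(theaters, key=lambda t: (score[t], t))  # lower score = better rank
    chosen = []

    def allowed(t):
        return t not in chosen and all(frozenset((t, c)) not in competing for c in chosen)

    # First meet the minimum number of theaters in each region.
    for region in sorted({d["region"] for d in theaters.values()}):
        picked = 0
        for t in order:
            if picked == min_per_region:
                break
            if theaters[t]["region"] == region and allowed(t):
                chosen.append(t)
                picked += 1
    # Then fill with the best remaining theaters up to the overall maximum.
    for t in order:
        if len(chosen) >= max_total:
            break
        if allowed(t):
            chosen.append(t)
    return chosen


# Hypothetical theaters: historical sales and share of the target audience nearby.
theaters = {
    "T1": {"region": "North", "sales": 900, "attr": 0.10},
    "T2": {"region": "North", "sales": 800, "attr": 0.40},
    "T3": {"region": "North", "sales": 300, "attr": 0.60},
    "T4": {"region": "South", "sales": 700, "attr": 0.20},
    "T5": {"region": "South", "sales": 200, "attr": 0.70},
}
competing = {frozenset(("T1", "T2"))}

print(distributor_plan(theaters, competing, 1.0, 1, 3))  # sales ranking only
print(distributor_plan(theaters, competing, 0.5, 1, 3))  # equal weight on the attribute
```

Changing the weight on the attribute rank changes which theaters are chosen, but the procedure still relies on one or two rankings rather than on a forecast that matches all movie attributes to theater characteristics.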
We wanted to compare our method with the distributor's procedure. To ensure that the quality of the theater‐level box office forecast and the duration of play did not affect this comparison, theater rankings were developed using theater‐level revenue forecasts from our estimation procedure, while the optimal duration of play for a given movie at each theater was determined by the model in Somlo (2005). Comparing our method with the distributor's procedure, we found that, on average, 51% of the theaters chosen by our method differed from those selected by the distributor. In addition, had our method been implemented, it would have increased the average box office profit by US$2950 per theater, or by an average of US$1.8 million per movie. This translates to a 12% increase in the distributor's expected box office profit. In addition, absolute revenue and individual percentage improvements for some movies were as high as US$5 million and 33%, respectively.
It is important to note that these numbers underestimate true gains. In practice, the distributor's method would have performed worse than these results indicate without the advantage of the optimal duration of play for each movie determined using the model in Somlo (2005). We believe that our method outperforms the distributor's procedure because its allocation of theaters is based not just on a ranking of historical sales volumes across all movies or a ranking based on weighting of sales along with certain attributes, but on matching all the key attributes of a given new movie with the characteristics of the theater under consideration. For instance, consider the movie “What Women Want.” The distributor's approach chose theaters with historically high sales in areas where the ethnicity was predominantly white in each of the regions. In contrast, our approach chose smaller theaters in urban areas with a higher percentage of singles and with higher population densities. This led to an average increase in box office profit of US$1000 per theater, or around US$1.3 million in total, possibly due to the appeal of this movie to singles, who typically lived in more densely populated urban areas. As an additional example, consider the movie “Meet the Parents.” The distributors again chose theaters with historically high sales in predominantly white neighborhoods. In addition to considering ethnicity, our approach allocated more theaters in the Midwest and the South Central region and in higher income neighborhoods. This led to an average increase in box office profits per theater of US$1100, or US$1.5 million across all theaters, perhaps due to the movie's appeal to conservative and affluent audiences. These examples provide further evidence that including theater‐level characteristics is crucial to effective distribution planning in the motion picture industry.
7. Conclusions
Our goal in this paper is to expose the reader to an intellectually engaging problem context laden with opportunities for research that can have a high impact on profits in the motion picture industry. The following conclusions can be drawn from this research.
There are significant differences in revenue across the same movie in different theaters due to dissimilarity in location‐specific characteristics at the theater such as amenities and demographics (e.g., median income, age, and population density). Therefore, it is important to develop detailed, disaggregate theater‐level box office forecasts and to use these forecasts to determine the distribution plan that decides on which theaters to show a given movie.
Forecasting theater‐level box office revenues is challenging as it requires an understanding of which movie attributes and theater characteristics will affect revenues, how they do so, and how this changes over time. In addition, one needs to also understand how the decision process of individual moviegoers to see a given movie affects theater‐level forecasts and how this can be integrated into the forecasting process. The estimation procedure developed in this paper incorporates these aspects into determining detailed, theater‐level revenue forecasts. This procedure reduces average forecast error by over 75% compared with benchmark models based on industry practice.
Given theater‐level revenue forecasts over time, the distributor faces the problem of determining at which theaters to show a given movie in order to optimize profits. This problem is complicated because a minimum number of theaters has to be selected in each region and because the distributor needs to ensure that competing theaters are not selected. The DLSP provides an effective basis to approach this problem. We also found that this model was robust to variations in the theater‐level revenue forecast and provides a basis for understanding the impact of changes in minimum play length on theater choice. In addition, the DLSP outperformed the method used by the distributors to select theaters and has the potential to increase average distributor profits by 12%, or around US$1.8 million per movie.
This paper provides several avenues for future research. First, refinements could be developed to further improve the accuracy of the theater‐level box office revenue forecasting procedure. Second, improvements could be made on the heuristics to increase the profits from the DLSP. Finally, the approach developed in this paper in which we determine the best locations to show a movie by estimating profit as a function of movie (or product) attributes and theater (or location) characteristics can be applied in a variety of service industry settings. For instance, one could use this idea to choose the best locations for concerts in the music industry, to determine the optimal location of specialty boutiques in the retail industry, and to pick the locations of resorts and restaurants in the hospitality industry. The modifications required to apply our model in these contexts could be a promising area for new research.
In conclusion, we believe that the methods presented in this paper offer a useful way to forecast theater‐level box office revenues and to use these forecasts to choose the best locations to screen a movie so as to optimize the distributor's profit in the motion picture industry.
Acknowledgments
We would like to thank Thomas McGrath, Executive Vice President, Viacom Entertainment Group; Don Tannenbaum, Executive Vice President, Distribution, Warner Brothers; Thomas Molter, Vice President, International Distribution, Warner Brothers; and Steve Rothenberg, President of Distribution, Artisan Entertainment, who all provided valuable industry‐specific information for this research. We would also like to thank Professors Anand Bodapati, Charles Corbett, Donald Morrison, Rakesh Sarin, and Charles Weinberg for several helpful comments. Financial assistance for this research was provided by the Center for International Business Education and Research and the Entertainment and Media Management Institute at UCLA. We would also like to thank Theodore Treantafelles at the UCLA Anderson School who helped us with various aspects of this work.
Footnotes
1. To check whether the sample represents a typical year of movie releases, we tested the difference between the proportions of movies released over time and across genres in the previous and following years. We could accept the null hypothesis at the 95% confidence level that the two sample population proportions are equal in each class.
