Abstract
Premised on the widespread belief that school leadership is central to school improvement (e.g., Datnow & Castellano, 2001), the research and policy communities in many countries have been engaged in urgent searches for the leadership practices that matter most. These searches have been “theory free” in some cases and “theory driven” in others. In the U.S. context, Marzano, Waters, and McNulty’s (2005) meta-analysis of 70 studies is likely the best known of the recent theory-free approaches, identifying a list of 21 specific leadership “responsibilities” associated with student achievement. Theory-driven approaches have typically centered on a handful of leadership “models” (e.g., constructivist leadership, learning leadership), but often without much effort to empirically test their validity. 1 Some of these models are intended to be used specifically in school contexts, and models of “instructional leadership” clearly dominate this field (Hallinger, 2003).
Other models considered promising for school leaders (e.g., servant leadership, strategic leadership) trace their origins to research conducted largely in nonschool contexts. These models have been welcomed under the school leadership tent with varying, but typically limited, degrees of testing and adaptation to the unique circumstances in which school leaders work. The most widely adopted and tested of these models is “transformational leadership.” 2 It was initially conceptualized by Burns (1978) and further developed by Bass (e.g., 1985) for use in a wide array of organizational contexts, and Leithwood and his colleagues modified and adapted a version of transformational leadership first proposed by Podsakoff, MacKenzie, Moorman, and Fetter (1990) to better suit the leadership demands found in schools. 3
Transformational leadership theory claims that a relatively small number of leadership behaviors or practices are capable of increasing the commitment and effort of organizational members toward the achievement of organizational goals. The values and aspirations of both leader and follower are enhanced by these practices. Unlike traditional models of leadership that are “transactional” in nature, transformational leadership theory argues that, given adequate support, organizational members become highly engaged and motivated by goals that are inspirational because those goals are associated with values in which they strongly believe, or are persuaded to strongly believe. Transformational leadership theory, then, identifies which internal states of organizational members are critical to their performance and specifies a set of leadership practices most likely to have a positive influence on those internal states.
Transformational leadership theory does not, however, predict the behaviors of organizational members resulting from the influence of transformational leadership practices, much less the consequences of those behaviors for more distal organizational outcomes. In an educational policy environment with a laser-like focus on improving student achievement, transformational leadership theoretically offers only a partial “solution” to the leadership “problem.” Although improving student achievement would seem to require what transformational leadership delivers, it also requires predictable changes in the performance of organizational members. Teacher practices, for example, must often change in specified ways if student achievement is to improve. This has given rise to recent descriptions of school leadership that combine practices associated with both transformational and instructional leadership models (Leithwood, Louis, Anderson, & Wahlstrom, 2004; H. M. Marks & Printy, 2003; Robinson, Hohepa, & Lloyd, 2009). We consider this recent development further in the conclusion to this article.
The purposes for this meta-analytic review are the following:
To identify the practices associated with transformational school leadership (TSL)
To describe the internal states and behaviors that TSL practices are assumed to influence and the extent to which such influence is manifest
To identify the school conditions influenced by TSL and the extent of such influence
To assess the influence of TSL on student achievement 4
We chose to pursue these goals using unpublished research as our primary source of evidence for reasons explained below.
Review Method
Analysis of Evidence
There have been at least five reviews of TSL research reported over the past 15 years. 5 Three of them have been limited only to published research, and three have used traditional “vote-counting” methods. Vote-counting reviews advance narrative review methods by counting the numbers of studies providing evidence about the same phenomenon. But they still retain some of the weaknesses of traditional narrative reviews, in particular difficulty in explaining conflicting results and assessing the magnitude of relationships. Although limited to reviews of quantitative research, meta-analysis significantly advances the precision of vote-counting reviews by applying statistical tools to the analysis of results reported by original studies. Of the five previous reviews of TSL research, only two used meta-analytic review methods. One of these (Chin, 2007) included only studies based on Bass’s conception and measure of transformational leadership, whereas the second (Robinson et al., 2009) encompassed multiple leadership models but included only five studies of TSL. 6
Although this review used meta-analytic techniques to accomplish three of its purposes, narrative review methods were used to identify and synthesize TSL practices.
Data Sources
The primary evidence for this review is unpublished theses and dissertations. We chose this source of evidence for four reasons. First, it is a source of data largely ignored by previous reviews of TSL and so might provide insights as yet unreported in the published literature. Second, among the numerous ways in which bias can creep into a review of literature, publication bias is among the most likely and arguably the most overlooked. Published research, for example, rarely reports nonsignificant findings “even when they are replications of earlier studies reporting significant results” (Kraemer & Andrews, 1982, p. 405). The most alarming view of this problem is that “journals are filled with the 5% of the studies that show Type 1 errors while the file drawers back at the lab are filled with the 95% of the studies that show non-significant (e.g., p > .05) results” (Rosenthal, 1979, p. 638). Slavin (1995) and Wolf (1986) have both recommended including in reviews evidence reported in books, dissertations, and unpublished papers presented at professional meetings.
A third justification for our sources of evidence is that theses make up a substantial proportion of the whole population of studies inquiring into any given hypothesis. Some are eventually published in journals, although many are not; unpublished dissertations are the original sources of much of the evidence on which more public reports are based. Finally, some have argued that many unpublished studies are better designed than many published studies. For example, Slavin (1995) argues that it “may sometimes be easier to get a poorly designed study into a low quality journal than to get it past a dissertation committee” (p. 14).
This review aimed to include all unpublished theses about TSL that met three selection criteria. To be included in the review, a thesis had to
Report quantitative data
Investigate the relationship between transformational leadership and at least one variable concerning school conditions, teachers’ internal states and practices, or student outcomes
Conduct at least one of the following types of statistical analysis: correlation, regression, ANOVA, or t test
The abstract and method sections of each thesis were read with these criteria as screens, resulting in the selection of 79 theses for the review.
Effect Size Adjustments
A correlation coefficient is subject to three sources of error that can be eliminated at the level of meta-analysis: sampling error, error of measurement, and range variation (Hunter, Schmidt, & Jackson, 1982). This study corrected the variance for sampling error. The correction for attenuation (error of measurement) was attempted but ultimately not applied because, in a trial examination of the effect size distribution concerning TSL impact on teacher commitment, the adjustment for unreliability neither improved the analytical results nor reduced the variance as expected. As to the third source of error, range variation, the samples of the studies included in this review were not restricted to, or deliberately selected from, a particular subpopulation of school leaders whose range of values differed greatly from that of the population. Therefore, there was no need to do a range correction.
Reducing Sampling Error: Weighted Effect Size Means
Calculating weighted means is the best way to reduce sampling error (Hunter et al., 1982). Different studies have different sample sizes. From a statistical perspective, effect size values based on larger samples are more precise estimates of the corresponding population value than those based on smaller samples (Lipsey & Wilson, 2001). Therefore, the effect size of each study should carry a different “weight” in the sample of research findings to be meta-analyzed; when calculating the mean of the effect sizes, this needs to be taken into account. Although the optimum weights are based on the standard error of the effect size, in practice the standard error for a given statistic is estimated from sample values using a formula derived from statistical theory (Lipsey & Wilson, 2001). This study followed this rule and used the inverse variance weight w (the inverse of the squared standard error value) to adjust the value of each effect size ESi. That is, we used
$$\overline{ES} = \frac{\sum_i w_i \, ES_i}{\sum_i w_i}, \qquad w_i = \frac{1}{SE_i^2}$$
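The weighting rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard inverse-variance weighted mean described by Lipsey and Wilson (2001); the function names and the example values are hypothetical and are not taken from the studies reviewed.

```python
def inverse_variance_weight(standard_error):
    """Return w = 1 / SE^2 for one effect size."""
    return 1.0 / standard_error ** 2


def weighted_mean_effect_size(effect_sizes, standard_errors):
    """Return sum(w_i * ES_i) / sum(w_i) using inverse-variance weights."""
    weights = [inverse_variance_weight(se) for se in standard_errors]
    weighted_sum = sum(w * es for w, es in zip(weights, effect_sizes))
    return weighted_sum / sum(weights)


# Hypothetical effect sizes and standard errors, for illustration only.
example_es = [0.30, 0.50, 0.40]
example_se = [0.10, 0.20, 0.10]
example_mean = weighted_mean_effect_size(example_es, example_se)
print(round(example_mean, 3))
```

In this made-up example the least precise study (standard error of .20) reports the largest effect, so the weighted mean falls below the simple unweighted mean of .40.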
Fisher z Transformation of r
Another concern about r is that as the population value of r gets further from zero, the distribution of rs sampled from that population becomes more skewed. This complicates the comparison and combination of rs (Rosenthal, 1994). Fisher (1928, cited in Rosenthal, 1994) devised the transformation (Zr), which is distributed nearly normally, to address this complication. Some scholars agree with Rosenthal; others do not. Among those who do not, Hunter and Schmidt (2004), for example, suggest averaging the effect size r directly because the Fisher z transformation adds further bias to the estimates. In practice, the two treatments usually yield very similar mean effect sizes: r is bounded between −1 and 1, and for small-to-moderate values of r the logarithmic Fisher z transformation is close to linear, so the mean of the transformed values differs little from the mean of r averaged directly. This study used the first treatment: we transformed the effect sizes (ES) to obtain the adjusted effect sizes (ES’) and then averaged them to obtain the mean effect size, which is the more common practice. The following formula was used to adjust ES (ri) and then average them (Zi) to get the mean of the ES.
$$Z_i = \frac{1}{2}\ln\!\left(\frac{1 + r_i}{1 - r_i}\right), \qquad \overline{Z} = \frac{\sum_i w_i Z_i}{\sum_i w_i}$$
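The two treatments discussed above can be compared with a short sketch. It assumes the usual Fisher z weight of n − 3 (the inverse of the sampling variance 1/(n − 3)) and, for the direct approach, a sample-size-weighted mean of r in the spirit of Hunter and Schmidt; the function names and the example correlations and sample sizes are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher z transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


def inverse_fisher_z(z):
    """Back-transform a Fisher z value to the correlation metric."""
    return math.tanh(z)


def mean_r_via_fisher_z(rs, ns):
    """Average correlations in the z metric (weights n - 3), then back-transform."""
    weights = [n - 3 for n in ns]
    z_bar = sum(w * fisher_z(r) for w, r in zip(weights, rs)) / sum(weights)
    return inverse_fisher_z(z_bar)


def mean_r_direct(rs, ns):
    """Average correlations directly, weighting each by its sample size."""
    return sum(n * r for n, r in zip(ns, rs)) / sum(ns)


# Hypothetical correlations and sample sizes, for illustration only.
example_rs = [0.20, 0.40, 0.60]
example_ns = [50, 100, 150]
via_z = mean_r_via_fisher_z(example_rs, example_ns)
direct = mean_r_direct(example_rs, example_ns)
print(round(via_z, 3), round(direct, 3))
```

For these made-up values the two means differ only in the second decimal place, which is consistent with the point that the choice of treatment rarely changes the substantive conclusion.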
Reliabilities Adjustment: Eliminating Measurement Error
Hunter et al. (1982) provided methods for adjusting for the unreliability of measures (correction for attenuation). Their objective was to permit the meta-analyst to come as close as possible to estimating the magnitude of the relationship represented in an effect size as it would appear under ideal research circumstances. They provided procedures to adjust for the unreliability of the variables, along with other adjustments. Rosenthal (1994) argued against these procedures. In his view, the goal of a meta-analysis is to teach us better what is, and estimating the effect size we might expect to find in the best of all possible worlds is not a proper goal. Moreover, correction for unreliability alone can yield corrected effect size correlations greater than 1.00 (Rosenthal, 1991). He recommended looking for correlates of effect sizes (referred to as moderating effects in this study) in lieu of these procedures. In addition, the information required to apply most of these adjustments is often unavailable for all of or even for a majority of the research studies coded for a meta-analysis. Lipsey and Wilson (2001) therefore suggested that it is up to the meta-analyst to decide whether it is better to adjust some effect sizes while not adjusting others or to leave them all unadjusted under the rationale that they are more comparable that way, even if they are less accurate.
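The concern that a corrected correlation can exceed 1.00 is easy to see numerically. The sketch below assumes the classical correction for attenuation (the observed correlation divided by the square root of the product of the two reliabilities); the function name and the input values are hypothetical.

```python
import math


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Classical correction: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(reliability_x * reliability_y)


# Hypothetical observed correlations and reliabilities, for illustration only.
plausible = correct_for_attenuation(0.60, 0.80, 0.70)
impossible = correct_for_attenuation(0.70, 0.60, 0.70)
print(round(plausible, 3), round(impossible, 3))
```

With modest reliabilities and a fairly strong observed correlation, the second corrected value lands above 1.00, which is not a possible correlation.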
Based on the initial testing and the fact that the majority of the studies included in our research did not report reliabilities, we did not apply the measurement reliability adjustment procedure to the meta-analyses involved. Instead, following Rosenthal’s (1991) suggestion, we tested whether the measurements themselves had moderating effects (called moderators in this study). The moderating effects of the measures of the independent variables were tested (i.e., leadership measures since the way that various conceptualizations or models of leadership moderate the leadership effects themselves is the interest of this study). If their reliabilities were adjusted, the pattern of difference would be lost. This test of the moderating effects of leadership instruments or models enhanced the validity of the study.
Outlier Analysis
Extreme values may cause significant within-group heterogeneity of individual effect sizes that may not exist in reality (Hunter & Schmidt, 2004). Furthermore, the weights given to large-sample studies may cause the overall effect size to be influenced by relatively few studies. Thus, extreme values of effect sizes were checked before computing the weighted means of effect sizes involved in each meta-analysis. Fences based on quartiles were used to identify the extreme values in the tails of each effect size distribution. If the lower quartile is Q1 and the upper quartile is Q3 (defined as the 25th and 75th percentiles, respectively), then the difference (Q3 – Q1) is called the interquartile range, or IQ. The fences are then defined as follows (NIST/SEMATECH, 2009):
Lower inner fence: Q1 − 1.5*IQ
Upper inner fence: Q3 + 1.5*IQ
Lower outer fence: Q1 − 3*IQ
Upper outer fence: Q3 + 3*IQ
Only the extreme values were removed from the analysis (i.e., the values outside of the upper and lower outer fences), whereas the moderately extreme values (those between the inner and outer fences) were retained, following Hunter and Schmidt’s (2004) suggestion that these values may occur simply because of large sampling errors, which have been previously corrected. Under this criterion, no original data points were deleted from our analyses.
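The fence rule above can be sketched as follows. Quartiles can be computed under several conventions; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`, which may differ slightly from the convention used in the original analysis. The function names and the example values are hypothetical.

```python
from statistics import quantiles


def outlier_fences(values):
    """Return inner and outer fences built from Q1, Q3, and IQ = Q3 - Q1."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iq = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iq,
        "lower_inner": q1 - 1.5 * iq,
        "upper_inner": q3 + 1.5 * iq,
        "upper_outer": q3 + 3.0 * iq,
    }


def classify(value, fences):
    """Label a value 'extreme' (beyond an outer fence), 'mild', or 'none'."""
    if value < fences["lower_outer"] or value > fences["upper_outer"]:
        return "extreme"
    if value < fences["lower_inner"] or value > fences["upper_inner"]:
        return "mild"
    return "none"


# Hypothetical effect sizes, for illustration only.
example_values = [0.10, 0.20, 0.30, 0.40, 0.50]
example_fences = outlier_fences(example_values)
for candidate in (0.30, 0.75, 1.20):
    print(candidate, classify(candidate, example_fences))
```

Under the rule described in the text, only values labeled "extreme" would be dropped; values labeled "mild" would stay in the analysis.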
Calculating the Confidence Intervals Around the Mean Effect Size
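“Confidence intervals indicate the range within which the population mean is likely to be, given the observed data” and “this is useful in indicating the degree of precision of the estimate of the mean effect size” (Lipsey & Wilson, 2001, p. 114). This study used the following formulae to calculate the standard error of the mean and the confidence interval around it,
$$SE_{\overline{ES}} = \sqrt{\frac{1}{\sum_i w_i}}, \qquad \overline{ES} \pm z_{(1-\alpha)} \, SE_{\overline{ES}}$$
where $w_i$ is the inverse variance weight of effect size $i$ and $z_{(1-\alpha)}$ is the critical value of the z distribution (1.96 for α = .05).
If the confidence interval does not include zero, the mean effect size is statistically significant at p ≤ α. A direct test of the significance of the mean effect size is obtained by computing a z test as
$$z = \frac{|\overline{ES}|}{SE_{\overline{ES}}}$$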
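A minimal sketch of these calculations follows, assuming the standard fixed-effects formulas given by Lipsey and Wilson (2001): the standard error of the mean is the square root of the reciprocal of the summed weights, the interval is the mean plus or minus the critical z value times that standard error, and the z test divides the mean by its standard error. The function name and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def mean_effect_size_inference(effect_sizes, weights, alpha=0.05):
    """Return the weighted mean, its standard error, CI, z, and two-tailed p."""
    sum_w = sum(weights)
    mean_es = sum(w * es for w, es in zip(weights, effect_sizes)) / sum_w
    se_mean = math.sqrt(1.0 / sum_w)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z = mean_es / se_mean
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "mean": mean_es,
        "se": se_mean,
        "ci_lower": mean_es - z_critical * se_mean,
        "ci_upper": mean_es + z_critical * se_mean,
        "z": z,
        "p": p_value,
    }


# Hypothetical effect sizes and inverse-variance weights, for illustration only.
example_result = mean_effect_size_inference([0.30, 0.50, 0.40], [100, 25, 100])
print({name: round(value, 4) for name, value in example_result.items()})
```

In this made-up example the interval excludes zero and the z test agrees, so the two routes to a significance decision give the same answer, as the text describes.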
Homogeneity and Heterogeneity Analysis
To examine whether various effect sizes that are averaged into a mean value all estimate the same population effect size, we need to know whether the effect sizes form a homogeneous distribution. “In a homogeneous distribution, the dispersion of the effect sizes around their mean is no greater than that expected from sampling error alone” and “[a] statistical test that rejects the null hypothesis of homogeneity indicates that the variability of the effect sizes is larger than would be expected from sampling error. Therefore, each effect size does not estimate a common population mean” (Lipsey & Wilson, 2001, p. 115) and vice versa. The homogeneity test is based on the Q statistic, which is distributed as a chi-square with k – 1 degrees of freedom where k is the number of effect sizes (Hedges & Olkin, 1985, cited in Lipsey & Wilson, 2001). This study used an algebraically equivalent formula that is computationally simpler to implement to calculate the Q,
$$Q = \sum_i w_i \, ES_i^2 - \frac{\left(\sum_i w_i \, ES_i\right)^2}{\sum_i w_i}$$
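The homogeneity statistic can be sketched in both its definitional and its computational form. The code assumes the standard expressions from Lipsey and Wilson (2001); the function names and example inputs are hypothetical, and the critical value is supplied by hand (5.991 is the chi-square critical value for 2 degrees of freedom at α = .05) rather than computed.

```python
def q_definitional(effect_sizes, weights):
    """Q as the weighted sum of squared deviations from the weighted mean."""
    mean_es = sum(w * es for w, es in zip(weights, effect_sizes)) / sum(weights)
    return sum(w * (es - mean_es) ** 2 for w, es in zip(weights, effect_sizes))


def q_computational(effect_sizes, weights):
    """Algebraically equivalent form: sum(w*ES^2) - (sum(w*ES))^2 / sum(w)."""
    sum_w = sum(weights)
    sum_w_es = sum(w * es for w, es in zip(weights, effect_sizes))
    sum_w_es2 = sum(w * es * es for w, es in zip(weights, effect_sizes))
    return sum_w_es2 - sum_w_es ** 2 / sum_w


# Hypothetical effect sizes and inverse-variance weights, for illustration only.
example_es = [0.30, 0.50, 0.40]
example_w = [100, 25, 100]
example_q = q_computational(example_es, example_w)
degrees_of_freedom = len(example_es) - 1
critical_value = 5.991  # chi-square, df = 2, alpha = .05
print(round(example_q, 3), degrees_of_freedom, example_q > critical_value)
```

Here Q falls well below the critical value, so the hypothesis of homogeneity would not be rejected for these made-up effect sizes.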
If Q was not significant (indicating homogeneity), we interpreted the results. “If Q exceeds the critical value for a chi-square with k – 1 degrees of freedom, then the null hypothesis of homogeneity is rejected” (Lipsey & Wilson, 2001, p. 116). In cases where Q was significant (indicating heterogeneity), we went to the next stage, the analysis of heterogeneous distribution of ES. There are three ways to understand and respond to this heterogeneity (Lipsey & Wilson, 2001):
Assume that the variability beyond subject-level sampling error is random among studies whose sources cannot be identified. In this case, the analyst adopts a random effects model (REM).
Assume that variability beyond subject-level sampling error is systematic and is derived from identifiable differences between studies. In this case the analyst adopts a fixed effects model (FEM).
Assume that the variance beyond subject-level sampling error is derived partly from systematic factors that can be identified and partly from random sources that cannot be identified. This requires a mixed effects model (MEM).
This study used REM, FEM, and MEM in turn to decide which model was most suitable for each analysis. FEM has more statistical power for detecting a moderator relationship with effect size (Lipsey & Wilson, 2001). For each meta-analysis, we started with the FEM because identifying systematic between-study differences, and the moderators that affect study results regarding leadership effects on various school, teacher, and student outcomes, is the interest of this study. If Q was significant, further analyses were conducted to identify the contrasts whose means were significantly different, a procedure similar to post hoc comparisons in one-way ANOVA.
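A fixed-effects moderator test of this kind is often run as an analog to the one-way ANOVA, in which the total Q is partitioned into a between-groups and a within-groups part. The sketch below assumes that standard partition; the function names, group labels, and data are hypothetical, and the critical values (3.841 for 1 degree of freedom, 9.488 for 4, both at α = .05) are entered by hand.

```python
def q_statistic(effect_sizes, weights):
    """Homogeneity Q: sum(w*ES^2) - (sum(w*ES))^2 / sum(w)."""
    sum_w = sum(weights)
    sum_w_es = sum(w * es for w, es in zip(weights, effect_sizes))
    sum_w_es2 = sum(w * es * es for w, es in zip(weights, effect_sizes))
    return sum_w_es2 - sum_w_es ** 2 / sum_w


def partition_q(groups):
    """Split total Q into between- and within-group parts.

    `groups` maps a moderator category to (effect_sizes, weights).
    """
    all_es, all_w, q_within = [], [], 0.0
    for effect_sizes, weights in groups.values():
        q_within += q_statistic(effect_sizes, weights)
        all_es.extend(effect_sizes)
        all_w.extend(weights)
    q_total = q_statistic(all_es, all_w)
    return {
        "q_total": q_total,
        "q_between": q_total - q_within,
        "q_within": q_within,
        "df_between": len(groups) - 1,
        "df_within": len(all_es) - len(groups),
    }


# Hypothetical effect sizes grouped by a two-category moderator.
example_groups = {
    "model_a": ([0.55, 0.60, 0.50], [100, 50, 100]),
    "model_b": ([0.30, 0.35, 0.25], [100, 50, 100]),
}
example_partition = partition_q(example_groups)
print({name: round(value, 4) for name, value in example_partition.items()})
```

In this made-up example the between-groups Q exceeds 3.841 while the within-groups Q stays below 9.488, the pattern one would read as a moderator that accounts for the heterogeneity.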
Wilson (2009) suggested using REMs when
Total Q is significant and you assume that the excess variability across effect sizes derives from random differences across studies
The within-studies Q from an analog to the ANOVA (homogeneity analysis appropriate for categorical variables, which looks for systematic differences between groups of responses within a variable) is significant (FEM)
Thus, when the above conditions appeared, REM was applied after the FEM was used. Also, if the random component was very large (relative to sampling error), MEM was used thereafter since a large random component leaves open the possibility that the differences between studies are, in fact, systemic (Lipsey & Wilson, 2001). Even when this situation did not appear, FEM and REM were conducted for each meta-analysis to compare the results and search for further explanations. As Lipsey and Wilson (2001) have said, “A sensitivity analysis comparing the results from fixed and mixed effects models is usually advisable” (p. 125). As well, this comparison served as additional research that is needed to sort out the conditions under which the various models, fixed, random, and mixed, are most appropriate (Overton, 1998). So we examined which model was the best at explaining variances and interpreted results accordingly.
Macros for SPSS written by Wilson (2009; Lipsey & Wilson, 2001) were used to analyze effect size distributions. Fixed, random, and mixed effects models were used to compute weighted means, test moderating effects, and calculate and compare group means. In the case of MEM, there are three methods for estimating mixed effects: namely, method-of-moments random effects, full-information maximum likelihood, and restricted-information maximum likelihood. All three methods were used in each of the meta-analyses in this study. Only the findings that resulted from the maximum likelihood method were reported, mainly because the confidence intervals yielded by this method are often more precise (i.e., narrower) than those yielded by the other two methods. We performed by hand the computations that Wilson’s macros do not cover, such as converting different types of effect sizes into Pearson correlation coefficients or calculating and combining effect sizes for each.
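Of the three estimation methods named above, the method-of-moments estimator is the simplest to show in code. The sketch below is an illustration only, not the maximum likelihood estimation whose results were reported, and it does not reproduce Wilson's SPSS macros. It assumes the common moment estimator of the random variance component, (Q − (k − 1)) divided by (Σw − Σw²/Σw) and truncated at zero; the function names and data are hypothetical.

```python
import math


def method_of_moments_tau2(effect_sizes, variances):
    """Moment estimate of the between-study (random) variance component."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    mean_fixed = sum(w * es for w, es in zip(weights, effect_sizes)) / sum_w
    q = sum(w * (es - mean_fixed) ** 2 for w, es in zip(weights, effect_sizes))
    c = sum_w - sum(w * w for w in weights) / sum_w
    return max(0.0, (q - (len(effect_sizes) - 1)) / c)


def random_effects_mean(effect_sizes, variances):
    """Return (mean, standard error, tau^2) under a random effects model."""
    tau2 = method_of_moments_tau2(effect_sizes, variances)
    # Each study's variance is its sampling variance plus the random component.
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    mean_random = sum(w * es for w, es in zip(weights, effect_sizes)) / sum_w
    return mean_random, math.sqrt(1.0 / sum_w), tau2


# Hypothetical heterogeneous effect sizes with equal sampling variances.
example_es = [0.10, 0.50, 0.90]
example_var = [0.01, 0.01, 0.01]
re_mean, re_se, tau2 = random_effects_mean(example_es, example_var)
fe_se = math.sqrt(1.0 / sum(1.0 / v for v in example_var))
print(round(re_mean, 3), round(re_se, 3), round(tau2, 3), round(fe_se, 3))
```

Adding the random component to every study's variance widens the standard error of the mean considerably in this made-up example, which illustrates why fixed effects models have more statistical power and why the text recommends comparing models.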
Reliability Issues
Reliability issues in meta-analysis are mainly concerned with consistency in the location of studies, comprehensiveness of the collection of studies, coding of the features and results of the studies to be included in the analysis, and calculation and recording of the effect size estimates and significance levels. The degree of reliability in these areas was enhanced in this study by a thorough search of online databases for dissertations and theses, appropriate inclusion criteria that strike a balance between an exhaustive examination of this body of literature and the exclusion of studies that were not suitable to be included for this review, systematic and consistent coding of studies, and the application of standard meta-analytic techniques in the calculation and conversion of effect sizes.
In terms of the reliability of the coding of the features of the studies, interrater reliability (as the coding was initially done by two coders) was enhanced by developing and pilot testing coding forms before coding characteristics for the meta-analysis; developing a detailed, explicit coding scheme and procedures for coding; and involving the coders in discussions and decisions concerning coding rules. Initially, two researchers coded the same studies independently. Then their work was compared and discussed and the consistency between them was enhanced. At the stage of data analysis, we reviewed and corrected all coding when necessary. For effect size coding, there was only one coder.
All these procedures help ensure a high consistency in coding, and hence high internal reliability was achieved. Furthermore, sampling error for the collection of studies reviewed was reduced by calculating weighted means of effect sizes. Adjustment for the unreliability of measures was initially prepared but later discarded for a number of reasons.
Validity Issues
External and construct validity of a meta-analysis are both related to the “apples” and “oranges” problem of trying to determine which studies should be aggregated (Wolf, 1986). External validity was enhanced in this study by systematic narrative review and the synthesis of outcome variable measures, the conducting of separate meta-analyses on different outcome variables, the exploration and testing of moderating effects, and the testing of the homogeneity of the results. Internal validity in meta-analysis is concerned with the extent to which variations in design quality influence the outcomes of the meta-analysis (Wolf, 1986). Internal validity was enhanced in this study by the selection of a body of research (dissertations or theses) containing studies of a similar high quality using appropriate inclusion criteria and by minimizing publication bias by including only unpublished research.
The following sections of the article summarize results of our analysis about the nature of TSL and its effects on school conditions, teachers’ internal states and behaviors, and student achievement. Following guidelines provided by Cohen (1988) and Hattie (2009), 7 an effect size of .20 is interpreted in the text as small, .40 as moderate, and .60 as large.
The Nature of Transformational School Leadership
Although sharing most of the same underlying goals and assumptions, six different approaches to or models of transformational leadership were included in the evidence reviewed for this study. 8
Bass and Avolio’s (e.g., 1995, 2000) two-factor model (transformational and transactional leadership conceptualized as two ends of an approach to leadership based on dramatically different theories of human motivation) with the Multifactor Leadership Questionnaire (MLQ) as its primary measuring instrument
Leithwood, Aitken, and Jantzi’s (2001) TSL model measured using the Nature of School Leadership survey (NSL)
Kouzes and Posner’s (1995) model measured with the Leadership Practices Inventory
M. Sashkin’s (1990) visionary leadership model measured with the Leadership Behavior Questionnaire (LBQ)
A model developed by Chong-Hee No (1994 in Ham, 1999) and measured with the Principal’s Transformational Leadership Questionnaire
A transformational leadership model and measure developed by Wiley (1998)
After eliminating nonsubstantive distinctions in wording, these six models of TSL, as a whole, include the 11 leadership practices listed in the left column of Table 1.
Table 1. Transformational School Leadership Practices Measured by Instruments Used in the Research
Multifactor Leadership Questionnaire (e.g., Bass & Avolio, 1995).
Early versions of the Nature of School Leadership survey (Leithwood, Aitken, & Jantzi, 2001).
Leadership Practices Inventory (Kouzes & Posner, 1995).
Leadership Behavior Questionnaire (M. Sashkin, 1990).
Principal’s Transformational Leadership Questionnaire (Chong-Hee No, 1994, in Ham, 1999).
Author-constructed transformational leadership instrument (Wiley, 1998).
This column also organizes these 11 practices into four categories or dimensions after a conception of TSL originally proposed in Leithwood et al. (2001) and extended in recent reviews of published school leadership research (e.g., Leithwood & Jantzi, 2005; Leithwood & Riehl, 2005). Several approaches to TSL (Bass & Avolio, 2000) also include the two sets of related practices (contingent reward and management by exception) listed at the bottom of Table 1.
The six right-hand columns of Table 1 indicate (X) which practices are included in and measured by each of the six approaches to TSL. Some leadership practices are common to all or the majority of these models. For example, all models and their corresponding measures include “developing a widely shared vision/goals” and “providing individualized support.” The meanings typically ascribed to each of the leadership practices are described below.
Setting Directions
1. Develop a shared vision and build goal consensus
Leaders enacting this practice identify, develop, and articulate a shared vision or broad purpose for their schools that is appealing and inspiring to staff. They also build consensus among staff about the importance of common purpose and more specific goals, motivate staff with these challenging, but achievable goals, and communicate optimism about achieving these goals. These leaders also monitor progress in achieving shared goals and keep these goals at the forefront of staff decision making.
2. Hold high performance expectations
Leaders expect a high standard of professionalism from staff, expect their teaching colleagues to hold high expectations for students, and expect staff to be effective innovators.
Developing People
3. Provide individualized support
Involved in the various definitions of providing individualized support are leaders listening and attending to individuals’ opinions and needs, acting as mentors or coaches to staff members, treating staff as individuals with unique needs and capacities, and supporting their professional development.
4. Provide intellectual stimulation
Leaders enacting this set of practices challenge the staff’s assumptions, stimulate and encourage their creativity, and provide information to staff members to help them evaluate their practices, refine them, and carry out their tasks more effectively.
5. Model valued behaviors, beliefs, and values
Modeling includes “walking the talk,” providing a model of high ethical behavior, instilling pride, respecting and trusting in the staff, symbolizing success, and demonstrating a willingness to change one’s own practices as a result of new understandings and circumstances.
Redesigning the Organization
6. Strengthen school culture
Leaders enacting this set of practices promote an atmosphere of caring and trust among staff, build a cohesive school culture around a common set of values, and promote beliefs that reflect the school vision.
7. Build structures to enable collaboration
Leaders ensure that staff participate in decisions about programs and instruction, establish working conditions that facilitate staff collaboration for planning and professional growth, and distribute leadership broadly among staff.
8. Engage parents and the wider community
Leaders demonstrate sensitivity to parent and wider community aspirations and requests, reflect community characteristics and values in the school, and actively encourage parents and guardians to become involved in their children’s education at home and in school.
Improving the Instructional Program
9. Focus on instructional development
The development and inclusion of this set of leadership practices represents the most substantial difference between models of transformational leadership developed for school and nonschool contexts. These practices are typically associated with models of “instructional leadership” but are included in NSL’s conception and measure of TSL as a result of work over many years to create and test a “purpose-built” model of transformational leadership appropriate for school contexts. More recent versions of NSL expand the number of practices in this dimension, but these additional practices (e.g., staffing the instructional program) were not measured by studies meeting the inclusion criteria for this review.
Related Practices
The two practices in this category are traditional approaches to leadership in their own right. The first (contingent reward) reflects traditional models of motivation and is a key feature of what Bass (1985) called “transactional leadership,” whereas the second (management by exception) often manifests itself as nonleadership.
10. Contingent reward
The leader rewards staff members for completing agreed-on work.
11. Management by exception
The leader monitors the performance of staff members and interacts with them when their behavior deviates from expectations.
Avolio and Bass include in their transformational leadership model “laissez-faire leadership.” Laissez-faire leaders avoid their own supervisory responsibilities and avoid trying to influence their subordinates (Bass, 1990). This essentially nonleadership practice was excluded from our analysis since it is not a dimension of transformational leadership. Most of the studies in our review that included it reported nonsignificant or negative effects, as have previous reviews of transformational leadership (e.g., Judge & Piccolo, 2004).
TSL Effects on School Conditions
A total of 46 analyses reported in 32 studies examined the effects of TSL, as a whole, on 17 school conditions. Table 2 summarizes the results of these analyses. Based on 249 effect sizes, overall effects on aggregate school conditions are moderate, significant, and positive (weighted mean r = .44). Sufficient evidence was available for six of these conditions to permit meta-analyses. TSL had large or close to large effects on shared goals (.67), working environment (.56), and improved instruction (.55). TSL had moderate effects on organizational culture (.44) and shared decision making (.36).
Table 2. Transformational School Leadership Aggregate Effects on School Conditions
*p < .05. **p < .01. ***p < .001.
For conditions examined in only one study, the effects of TSL ranged from large (on school coherence and coordination and improvement in developing people) to small (on level of technology, organizational effectiveness). TSL had its largest effects on improved direction setting, achieving shared goals, and peer cohesiveness within schools.
In sum, this evidence demonstrates a significant contribution by TSL to a wide range of school conditions, many of which have been identified in previous research as enabling teaching and learning. Non-school-sector evidence also demonstrates large effects of transformational leadership on organizational conditions during the management of complex organizational change (Underdue, 2005).
The moderating effects of school level, school type (public vs. religious, or private), and leadership model were tested on the relationship between TSL and school conditions variously labeled organizational culture, school climate, and organizational learning for which sufficient data were provided to permit moderation testing. The moderate effects of TSL on school culture did not significantly differ between elementary and secondary schools or among school types. The leadership model, however, was a significant moderator. TSL behaviors or practices as measured by the NSL correlated significantly higher with school culture (r = .57) than did TSL practices as measured by MLQ (r = .33). The most powerful leadership practices influencing school culture were those related to the categories setting directions (e.g., developing a shared vision) and developing people (e.g., providing intellectual stimulation).
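A subgroup moderator test of the kind described above can be sketched as follows. This is a minimal illustration, not the procedure used in this review: it assumes a Fisher z transformation, fixed-effect inverse-variance weights (n − 3), and a between-groups Q statistic with 1 degree of freedom. The function names and the (r, n) pairs are hypothetical and invented purely for illustration.

```python
import math

def pooled_fisher_z(studies):
    """Fixed-effect pooled Fisher z and its total weight for (r, n) pairs."""
    weights = [n - 3 for _, n in studies]  # inverse variance of Fisher z
    total = sum(weights)
    z_bar = sum(w * math.atanh(r) for w, (r, _) in zip(weights, studies)) / total
    return z_bar, total

def compare_two_subgroups(group_a, group_b):
    """Q-between test (1 df) for a difference between two pooled correlations."""
    z_a, w_a = pooled_fisher_z(group_a)
    z_b, w_b = pooled_fisher_z(group_b)
    z_all = (w_a * z_a + w_b * z_b) / (w_a + w_b)
    q_between = w_a * (z_a - z_all) ** 2 + w_b * (z_b - z_all) ** 2
    # Upper tail of a chi-square with 1 df: P(X > q) = erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(q_between / 2))
    return {
        "r_a": math.tanh(z_a),  # pooled r, back-transformed from Fisher z
        "r_b": math.tanh(z_b),
        "q_between": q_between,
        "p_value": p_value,
    }

# Hypothetical (r, n) pairs; these are NOT data from the reviewed studies.
nsl_studies = [(0.60, 120), (0.55, 90)]
mlq_studies = [(0.30, 150), (0.35, 100)]
result = compare_two_subgroups(nsl_studies, mlq_studies)
print(result)
```

A small p value indicates that the pooled correlations of the two subgroups (here, studies grouped by leadership instrument) differ by more than sampling error alone would suggest.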
We also used our meta-analyses to test the magnitude of the effects of individual TSL practices on school conditions (Table 3). With one exception, all of the leadership practices had at least moderate effects on school conditions (.34 to .47). The exception was management by exception, which was not effective in influencing any of the conditions measured by the studies.
Table 3. Effects of Individual Transformational School Leadership (TSL) Practices on Aggregate School Conditions
*p < .05. **p < .01. ***p < .001.
These findings suggest, in sum, that each TSL leadership practice adds to the status of consequential school conditions. Each condition is complex, and improvement requires leaders to enact a wide range of practices. A narrow set of leadership practices seems unlikely to work. Leaders influence school conditions through their achievement of a shared vision and agreed-on goals for the organization, their high expectations and support of organizational members, and practices that strengthen school culture and foster collaboration within the organization.
TSL Effects on Teachers’ Internal States and Behaviors
A total of 88 analyses provided by 46 studies examined the effects of TSL on 21 teacher states and behaviors. Sufficient evidence was available for 9 of these states and behaviors to permit meta-analyses. Based on 183 effect sizes, Table 4 indicates that TSL’s overall effects on teachers are in the high to moderate range (weighted mean r = .57). TSL’s influence, as a whole, is strongest on individual teachers’ internal states (.61), followed by its influence on their behaviors (.47) and collective internal states (.23).
Table 4. Transformational School Leadership Aggregate Effects on Teachers’ Internal States and Behaviors
*p < .05. **p < .01. ***p < .001.
Among individual teacher internal states, TSL is especially strongly related to perception of leaders’ effectiveness (.82), job satisfaction (.76), and teacher commitment (.70). TSL is also strongly related to three teacher practices or behaviors: disciplinary practices (.73), use of knowledge (.69), and perceptions of their school’s effectiveness (.63).
One previous review (Chin, 2007) also reported large effects of TSL on two teacher-related outcomes—teacher job satisfaction (.71) and teacher-perceived school effectiveness (.70) as measured by the MLQ.
The moderating effects of school level, school type, and leadership model were tested on the relationship between TSL and both teacher commitment and job satisfaction when data permitted doing so. The effects of TSL on teacher commitment (FEM) were large and did not differ across school levels. TSL, as measured by the MLQ, correlated significantly higher with teacher commitment than when TSL was measured using the NSL. But when an REM was applied, this difference was no longer significant. The strong effects of TSL on teacher job satisfaction did not differ significantly between elementary and secondary schools.
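The contrast between fixed-effects (FEM) and random-effects (REM) results noted above can be illustrated with a short sketch. It assumes Fisher z transformed correlations and the DerSimonian-Laird estimator of between-study variance, which is one common choice and not necessarily the estimator used in this review. The function name and the (r, n) pairs are hypothetical. When studies are heterogeneous, the REM adds between-study variance to every study's sampling variance, so its standard errors are wider and a difference that is significant under the FEM can become nonsignificant.

```python
import math

def pool_fixed_and_random(studies):
    """Pool (r, n) pairs under fixed- and random-effects models (DerSimonian-Laird)."""
    z = [math.atanh(r) for r, _ in studies]
    v = [1.0 / (n - 3) for _, n in studies]  # sampling variance of Fisher z
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)

    # Fixed-effects estimate and Cochran's Q heterogeneity statistic.
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum_w
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    df = len(studies) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each study's sampling variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    sum_w_star = sum(w_star)
    z_random = sum(wi * zi for wi, zi in zip(w_star, z)) / sum_w_star

    return {
        "r_fixed": math.tanh(z_fixed),
        "se_fixed": math.sqrt(1.0 / sum_w),        # on the Fisher z scale
        "r_random": math.tanh(z_random),
        "se_random": math.sqrt(1.0 / sum_w_star),  # on the Fisher z scale
        "tau2": tau2,
    }

# Hypothetical, deliberately heterogeneous studies (not data from this review).
heterogeneous = pool_fixed_and_random([(0.70, 100), (0.30, 100), (0.50, 100)])
print(heterogeneous)
```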
Specific leadership practices with the greatest influence on both teacher commitment and teachers’ job satisfaction were those related to building relationships, developing people (i.e., modeling and providing intellectual stimulation and individualized support), and developing a shared vision (a direction-setting practice).
A total of 10 TSL practices were examined for their impact on teachers’ internal states and behaviors when data allowed meta-analyses to be calculated or when effect sizes were reported in the original studies. Table 5 summarizes meta-analyses assessing the effects of individual TSL practices on teachers’ internal states and behaviors. These results indicate that leaders influence teachers mainly through people-developing practices, namely, modeling behaviors (.54), providing individualized support (.52) and intellectual stimulation (.50), and achieving a shared vision and agreed-on goals for the organization (.50), a direction-setting practice. Contingent reward also had a large effect on teachers (.51). Holding high expectations and organizational redesigning practices such as strengthening school culture, building collaborative structures, and providing a community focus had small but significant influences on teachers (.21, .25). Management by exception had a significant, negative effect on teachers’ internal states or practices (–.31).
Table 5. Effects of Individual Transformational School Leadership (TSL) Practices on Aggregate Teacher Internal States and Behaviors
p < .001.
TSL Effects on Student Achievement
To analyze the effects of TSL on student achievement, the effect sizes reported by the original studies were grouped based on whether they examined direct or indirect effects. Studies using direct effects designs examined the relationship between TSL and student achievement only, whereas indirect effects designs also included either mediating or moderating variables. A total of 93 analyses reported in 33 studies examined TSL effects on six types of student outcomes (achievement, attendance, college-going rate, drop-out rate, graduation rate, and percentage of time removed from regular classes). However, student achievement, typically measured with statewide achievement tests, was the most frequently used dependent measure (31 studies and 82 analyses) and the only one considered in this article.
Among the 31 studies examining TSL achievement effects, 24 examined the direct effects, whereas 23 also examined indirect effects. These analyses assessed the total and independent effects of TSL, in combination with moderators, mediators, or both. Previous evidence suggested that indirect analyses are more likely to detect significant effects (Hallinger & Heck, 1998). Indirect analyses also are consistent with recently tested theoretical accounts of how leadership influences student learning (e.g., Hallinger & Heck, 2009; Leithwood, Patten, & Jantzi, 2010). Nevertheless, the results of our meta-analysis are based on direct effect designs because the studies reporting indirect effects were not combinable.
Table 6 reports the effects of individual TSL practices on student achievement. TSL had small but significant, positive effects on student achievement: the weighted mean r was .09, with a 95% confidence interval around the mean effect size ranging from .04 to .14. Separate analyses of TSL’s impacts on achievement in reading (.15) and math (.18) yielded significant and slightly larger positive effects. Results in Table 6 indicate that two dimensions—building collaborative structures (weighted mean r = .17) and providing individualized support (weighted mean r = .15)—have significant direct effects on student achievement. These statistics are based on the FEM. The effects of TSL on student learning did not differ across school levels or with the use of MLQ versus NSL for measuring leadership.
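A weighted mean r and its 95% confidence interval, of the kind reported above, can be computed along the following lines. This is a sketch under stated assumptions: a fixed-effects model, Fisher z transformation, and weights of n − 3. The review does not specify these computational details here, so the approach, the function name, and the sample values are illustrative only.

```python
import math

def weighted_mean_r_with_ci(studies, z_crit=1.96):
    """Fixed-effects weighted mean correlation and 95% CI for (r, n) pairs."""
    weights = [n - 3 for _, n in studies]  # inverse variance of Fisher z
    total = sum(weights)
    z_bar = sum(w * math.atanh(r) for w, (r, _) in zip(weights, studies)) / total
    se = math.sqrt(1.0 / total)            # standard error on the z scale
    lower, upper = z_bar - z_crit * se, z_bar + z_crit * se
    # Back-transform the mean and both interval limits to the r scale.
    return math.tanh(z_bar), math.tanh(lower), math.tanh(upper)

# Hypothetical single study (r = .09, n = 403), not data from this review.
mean_r, ci_low, ci_high = weighted_mean_r_with_ci([(0.09, 403)])
print(round(mean_r, 3), round(ci_low, 3), round(ci_high, 3))
```

With a single small study the interval includes zero. Pooling more studies increases the total weight and narrows the interval, which is how a small mean effect such as r = .09 can still be statistically significant.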
Table 6. Transformational School Leadership (TSL) Effects on Student Achievement
*p < .05. **p < .01.
Conclusion
Aggregate TSL as well as each of the individual TSL practices had moderate effects on teacher internal states and behaviors, as a whole, as well as on school conditions as a whole. These results provide considerable support for the central claims of transformational leadership theory as we described them in the introduction to this article. One reasonable interpretation of these two sets of results is that TSL has direct effects on teachers’ internal states and behaviors and these, in turn, influence school conditions. This interpretation is consistent with recent efforts to conceptualize the indirect influence of leaders on students (Leithwood, Patten, & Jantzi, 2010; Silins, Mulford, & Zarins, 2003).
Comparing Unpublished and Published Research Results
Results of our meta-analyses indicated a significant but small effect of TSL on student achievement (weighted mean r = .09), although effect sizes increased when achievement in math (r = .18) was examined separately from achievement in reading (r = .15). These results are consistent with earlier claims based on the results of published TSL research (Leithwood & Jantzi, 2005). These results also support general claims about significant school leader contributions to student learning reported in other meta-analyses concerned with both TSL and other approaches to school leadership. For example,
Witziers, Bosker, and Krüger (2003) reported small but significant achievement effects (Zr = .04) of practices reflecting Hallinger’s (2003) model of instructional leadership (IL)
Chin’s (2007) review of TSL’s effects found in unpublished studies reported a weighted mean effect of r = .49
Brown’s (2001) comparison of results from studies of IL and TSL reported an effect size of .74 for IL and .62 for TSL
Marzano et al.’s (2005) review of eclectically conceptualized, mostly unpublished leadership research reported an r of .25
As these examples begin to illustrate, however, although there may be little debate about the significant effects of well-exercised leadership on schools, teachers, and students, there is growing interest in which approaches or models of leadership in particular make the greatest contribution to student learning. The small number of original studies and reviews of evidence exploring this question directly (Hallinger, 2003; H. M. Marks & Printy, 2003; Robinson, Hohepa, & Lloyd, 2009) have so far restricted their comparisons to the effects of transformational leadership and IL.
Hallinger’s (2003) review, among other things, complicates the question of which leadership approach matters most by concluding that, at the level of what is actually measured, both IL and TSL approaches have much in common. Both approaches, as he points out (slightly paraphrased), have leaders,
Creating a shared sense of purpose
Developing a climate of high expectations and a culture focused on the improvement of teaching and learning
Shaping the reward structure to reflect the goals set for staff and students
Providing a wide range of activities aimed at intellectual stimulation and development of staff
Visibly modeling the values that are being fostered in the school
The two approaches to leadership differ, according to Hallinger, with respect to,
The target of change (i.e., first-order or second-order effects)
The extent to which the leader adopts a coordination and control (as is typical of IL approaches) versus an empowerment strategy (as with TSL)
The degree to which leadership is located in an individual (IL) or is shared (TSL)
The last of these distinctions emerged as a key result of large-scale empirical studies reported by H. M. Marks and Printy (2003) and by Louis and Wahlstrom (2010). Marks and Printy’s evidence suggested that the practices associated with either approach to leadership (TSL, IL) alone were not as powerful as a combination of such practices. They argued for “integrated leadership—transformational leadership coupled with shared instructional leadership” (p. 392). Louis and Wahlstrom’s evidence prompted the conclusion that leadership practices targeted directly at improving instruction had significant direct effects on teachers’ working relationships and indirect effects on student achievement. However, when leadership was shared between teachers and principals, teachers’ working relationships were stronger and student achievement was higher. Leadership effects on student achievement, according to Louis and Wahlstrom, occurred largely because effective leadership strengthened the professional community, encouraging teachers to work together to improve their practice and to improve student learning. Professional community, in turn, was a strong predictor of instructional practices that were strongly associated with student achievement. A school climate that encouraged levels of student effort above and beyond the levels encouraged in individual classrooms was the link between professional community and student achievement in this study. These conclusions amount to a call for what H. M. Marks and Printy (2003) referred to as “integrated leadership.”
Based on this line of thought, we suggest that future efforts to conceptualize leadership reflect the practices that seem important across most organizational sectors (primarily transformational leadership practices) as well as practices that are uniquely designed to improve the “technical core” of the organization. In schools, the technical core is instruction, and this has led us to propose a series of practices included in a dimension we call “improving instruction” (Leithwood & Jantzi, 2005; Leithwood, Louis, Anderson, & Wahlstrom, 2004; Leithwood & Riehl, 2005). 9 Other organizations have different “technical cores.”
Finally, we consider the results of Robinson, Hohepa, and Lloyd’s (2009) “best evidence” meta-analysis of leadership effects on students. Within the educational leadership field, this has been a much discussed review, especially the claims it makes about two issues. The first issue concerned the relative size of IL and TSL effects on students; IL effects were reported to be about 3 times larger than the effects of TSL (p. 90). It is difficult, however, to compare these results directly to ours (or most others) because the mean effect sizes reported in their study are likely different in nature from those used in our study. First, their effect sizes are not correlation coefficients (r) or Cohen’s d values, the commonly used effect sizes adopted in our study and in the above-mentioned works of others. Second, the majority of the original studies included in their review employed statistical modeling such as structural equation modeling (SEM) or regression and subsequently reported path coefficients from SEM or regression coefficients. In addition, Robinson and her colleagues do not actually describe the meta-analytic techniques used for their review and do not indicate the nature or types of effect sizes they report.
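The comparability problem can be made concrete. Correlations and Cohen’s d values can be converted into one another with standard formulas, so reviews reporting either metric can be placed on a common scale. The sketch below shows those conversions, assuming two groups of equal size; the function names are illustrative. No equally simple conversion applies to path or regression coefficients, because their size depends on which other variables each model controls.

```python
import math

def r_to_d(r):
    """Convert a correlation to Cohen's d (assumes equal group sizes)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_to_r(d):
    """Convert Cohen's d to a correlation (assumes equal group sizes)."""
    return d / math.sqrt(d ** 2 + 4)

# The weighted mean r = .09 reported above, expressed as a d value.
d_equivalent = r_to_d(0.09)
print(round(d_equivalent, 2))
```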
Furthermore, in the reports of the Robinson et al. (e.g., 2009) study,
The significance testing and homogeneity analysis of the effect sizes were not presented
Academic and nonacademic student outcomes were combined as their outcome measure
Their meta-analysis was based on a very small sample of studies (5 studies of TSL and 12 studies of IL)
Effect sizes from studies that used different conceptual models and controlled for different variables in their statistical modeling were combined
Effect sizes indicating direct or indirect effects were not discernible in their study, and the combination of effect sizes resulting from direct and indirect analyses is not appropriate
All of these factors threaten the validity of their study and its claims.
A second much-discussed claim emerging from the Robinson et al. (2008) review concerned the especially strong average effects on student outcomes of five leadership practices: (a) establishing goals and expectations (ES = .42); (b) resourcing strategically (ES = .31); (c) planning, coordinating, and evaluating teaching and the curriculum (ES = .42); (d) promoting and participating in teacher learning and development (ES = .84); and (e) ensuring an orderly and supportive environment (ES = .27).
Robinson and her colleagues arrive at the two sets of claims we highlight here from different analytic paths, and this may explain why they seem contradictory. Among the five especially powerful sets of leadership practices, all but the third (direct focus on teaching and curriculum) are included among the practices Hallinger’s (2003) analysis indicated were shared by both TSL and IL models. Clearly, the current policy environment for school leaders makes this third practice a requirement of their work. But this seems to argue, once again, for an “integrated model,” one that is premised on the hard-to-refute claim that improving student learning entails improving both the classroom conditions directly experienced by students (the reputed focus of IL) as well as the wider organizational conditions that enable those classroom conditions (the reputed focus of TSL). In fact, this claim is acknowledged by most IL models that include attention to such organizational conditions as culture and climate as well as more recent extensions of TSL that include a focus on the improvement of curriculum and instruction.
Implications for Future Research
Four implications for future research from this review deserve special emphasis. First, future research aimed at assessing the extent to which school leadership influences students should eschew the exclusive use of whole leadership models and test the more specific practices that have emerged as consequential from recent research and reviews of research. Our evidence, as well as the results of other studies we have cited, demonstrates substantially different effect sizes among the leadership practices included within most leadership models.
Second, future research inquiring about how leadership influences student learning should also be “practice specific.” It is likely that the influence of different leadership practices travels different routes to improve student outcomes. Theoretically, for example, goal-setting practices should have indirect effects on students through the direct effects they have on teacher motivation, building collaborative cultures should have indirect effects on students through their direct effects on teacher collective capacity, and providing individualized support should have indirect effects on students through its direct effects on individual teacher capacities and commitments.
Third, as we have argued elsewhere (Leithwood, Patten, & Jantzi, 2010), research aimed at assessing indirect leadership effects on students requires very complex research designs (e.g., long chains of connected variables), exceptionally sensitive data-collection processes, and statistical models that sometimes take on a life of their own. Furthermore, research of this sort fails to adequately acknowledge and build on the substantial bodies of evidence that are already available about variables having direct effects on students. We think more is to be gained by less complex designs aimed at determining those practices with significant effects on school and classroom variables already known with a high degree of certainty to have important direct effects on students. 10 All but the largest and most ambitious of future studies, in other words, should use “deeper” measures of fewer variables so as to produce more robust evidence about a smaller number of associations than is possible with the more complex designs required for indirect effects leadership studies.
Finally, a major implication of our review for policy makers and practitioners is to appreciate that when they invoke the term instructional leadership to convey what they believe is the preferred form of leadership to drive their improvement efforts forward, they have not said anything very meaningful about the leadership practices they value. Comparing our evidence about TSL practices with well-developed models of IL revealed many more similarities than differences. And so the claim that IL has much greater effects on students than TSL is more confusing than enlightening. Leadership policy and practice will be improved by acknowledging the need for leaders to pay close attention to both the classroom conditions that students experience directly and the wider organizational conditions that enable, stimulate, and support those conditions.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
