Abstract
New Focus on Collective Bargaining
Increasingly, policy makers aiming to raise student achievement have turned their attention to issues of teacher quality. The focus on teachers—and in particular on the variation in effectiveness of the teacher workforce—is driven by a growing body of research that shows teacher quality to be the most important schooling factor in students’ academic success (Darling-Hammond, 2000; Rivkin, Hanushek, & Kain, 2005; Rockoff, 2004). Although teacher quality is prominent in policy debates, less empirical attention has been paid to the governing mechanisms that may influence the quality and distribution of teachers within school districts. 1 Chief among these mechanisms are collective bargaining agreements (CBAs).
It is surprising that few empirical studies have focused on CBAs given that it is quite common for policy makers and pundits alike to point to CBAs, and some CBA provisions in particular (e.g., seniority-based job protections), as key inhibitors to effective school district operation and student achievement (Cohen, Walsh, & Biddle, 2008; Hess & Loup, 2008). A focus on CBAs is also timely given that the federal government’s Race to the Top grant competition incents states to make dramatic changes in teacher policies, many of which must be negotiated as part of the collective bargaining process.
In this article, we introduce a unique new data set derived from
We find generally high correlations between restrictiveness estimates calculated from different subsets of data. Importantly, restrictiveness estimates calculated using only high-profile provisions are highly correlated with restrictiveness estimates based on all provisions, suggesting that researchers can still draw important conclusions by applying the PIIR method to readily available data on teacher CBAs. However, estimates from certain subsections of the contract—grievance and layoff—do not correlate highly with estimates calculated from the full data. And work relying only on layoff provisions may in fact lead to conclusions opposite to those of research informed by all provisions.
Background
A large literature on bargaining in the private sector suggests that competition between firms in a given industry limits private-sector unions from demanding inflated benefits and wages (Clark, Delaney, & Frost, 2002). But public-sector unions’ viability depends on members’ ability to persuade the public and elected officials that contracts and bargaining demands are instrumental to positive policy outcomes and not exclusively devoted to members’ more narrow economic concerns (Klingner, 1994). 2 Scholars have recently begun to explore the connections between collective bargaining and teacher workforce outcomes in education (Koski & Horng, 2007; Levin, Mulhern, & Schunck, 2005; Moe, 2005, 2009; Strunk, 2011).
Detailed studies of bargaining in the education context focus on the provisions driving union “strength” or “power” and the influence of collective bargaining on outcomes like wages and student achievement. Most of these studies rely on simple indicators from one section of CBAs to capture a union’s strength in the bargaining process. For example, studies by Moe (2005) and Koski and Horng (2007) rely on measures of seniority-based transfer rights to assess the relationship between union strength and important teacher workforce outcomes. 3 Moe’s work on the relationship between union power and student achievement relies on a similarly one-dimensional measure (Moe, 2009). Carefully chosen CBA provisions can inform our understanding of how these provisions influence specified outcomes. However, in highlighting particular cherry-picked provisions, this work may overlook important trade-offs in the negotiation process and, in doing so, provide a misleading picture of union strength and the relationship between union demands and other important outcomes (e.g., student achievement). 4
We argue that most existing studies of the influence of collective bargaining on teacher distribution and student outcomes do not go far enough in addressing sustained critiques of the bargaining literature. In a 1975 study, Kochan and Wheeler argued that, to successfully advance collective bargaining theories that utilize outcomes as the dependent variable, (1) outcomes should be conceptualized in a way that includes all (or a representative sample) of the relevant items of interest that form the content of negotiations; (2) a concept of union power should be developed that reflects the underlying complexity of forces affecting a bargaining relationship and that is susceptible to measurement; (3) the model should be tested empirically in order to assess its validity; and (4) the test should take place at the level at which bargaining actually takes place (Kochan & Wheeler, 1975).
Existing studies that focus on particular subsections or provisions of CBAs to the exclusion of others (Koski & Horng, 2007; Moe, 2005, 2009) may ignore relevant items of interest (Criterion 1) and therefore may not capture the complexity of forces driving contract negotiations (Criterion 2). For instance, Koski and Horng (2007) and Moe (2005, 2009) focused on seniority-based transfer rights without regard to other potentially important contract provisions.
Recent work by Strunk and Reardon (2010), however, seeks to quantify the latent restrictiveness of a teacher contract using a data set of CBAs from a large, representative sample of California school districts that includes the full range of provisions mentioned in contracts across California. 5 Specifically, they cleverly adapt Reardon and Raudenbush’s (2006) PIIR model to teacher CBAs by coding provisions in each CBA as “responses” to a conditionally structured survey that addresses nearly every provision that could appear in a CBA. Their data set and methods of analysis address Kochan and Wheeler’s first two concerns, and research utilizing this measure of contract restrictiveness (Strunk, 2011; Strunk & Grissom, 2010) can therefore draw more robust conclusions because the measure is a function of all bargained provisions.
Strunk and colleagues have done further research to investigate the external validity of the PIIR restrictiveness measure. For example, Strunk and Grissom (2010) compared PIIR restrictiveness measures with a statewide survey of school board members in California and found that contracts in districts with stronger unions (measured by school board members’ evaluations of union power and union support of school board members in recent elections) allow school district administrators less flexibility than do contracts in districts with weaker, less active unions. This begins to address Kochan and Wheeler’s third criterion (that all measures must be tested empirically), and we contribute to this effort in two important ways. First, we report the results of applying the PIIR methodology to our data set of CBAs in Washington state. To our knowledge, this is only the second data set analyzed in this manner, and the first that includes every CBA in a state. Second, we assess the internal validity of the PIIR measure by estimating restrictiveness using various subsets of provisions: an objectively derived “restricted” subset of provisions (Strunk & Reardon, 2010), a subjectively derived subset of high-profile provisions, and subsets of data corresponding to eight categories of provisions. The results of this analysis should be of interest to researchers who are drawn to the PIIR approach but do not have access to comprehensive data on all provisions in teacher CBAs.
Data Collection, Coding, and Categories of Restrictiveness
CBAs from Washington state inform our analysis. Washington has 295 school districts, but 25 of these districts are not governed by a CBA. We collected the active CBA for each of the 270 districts that had a CBA operating in the 2010-2011 school year. 6
CBAs are legal documents, and the length and detail of these documents preclude simple evaluation and comparison. To understand how CBAs and the provisions they contain relate to one another and other outcomes of interest, it is necessary to encapsulate each agreement’s contents in a concise, logical, and consistent manner. To do this, we follow a rubric adapted from that developed by Strunk (Strunk & Reardon, 2010). 7 Strunk’s rubric attempts to address all of the provisions that could appear in a CBA so that resulting data, like the CBAs themselves, capture information on the host of provisions included in the following subsections: association rights, evaluation, grievance procedures, layoffs, hiring procedures and transfers, benefits and leaves, and workload.
CBA Coding
Undergraduate students at the University of Washington coded the CBAs. Working in pairs, students first coded each CBA independently and then met with their partner to resolve coding discrepancies and produce a consolidated, agreed-upon record for each CBA. We use this consolidated coding in subsequent analyses. 8
Our primary goal is to explore the extent to which different measures of contract restrictiveness agree with each other in providing a similar picture of the overall CBA. In calculating these measures of contract restrictiveness (which also may be judged as a measure of union power), we seek to capture key issues driving the outcome of management-union negotiations in each district. Moreover, we want the measure to reflect the underlying complexity of CBAs. Following Strunk and Reardon (2010), we code CBAs in a manner that treats each provision in a CBA as a “response” to a survey that includes all contract provisions covered in CBAs.
Designing a measure of restrictiveness that adequately captures the complexity of contracts is not trivial. For instance, many important provisions in CBAs—such as the length of the school day, the negotiated class size in each grade, and the number of leave days teachers receive—require a numerical response. Others—such as “Does this CBA include a no-strike clause?” and “Are tenured teachers evaluated differently than nontenured teachers?”—invite dichotomous categorization. And many “responses” in a CBA are conditional on responses earlier in the CBA—for example, the response to “is seniority the only factor in selecting a teacher to voluntarily transfer?” is conditional on the response to “does seniority play any role in selecting a teacher to voluntarily transfer?”
Strunk and Reardon (2010) utilized a PIIR model to overcome these data challenges and obtain a measure of CBA “restrictiveness.” PIIR models require a dichotomous response to each provision. Binary responses can be used to account for the conditional structure of response provision data. We use actual data from three districts in our sample to illustrate how our initial observed data, like Strunk and Reardon’s, are transformed into a binary, conditional structure in preparation for PIIR analysis.
For each district contract, CBA coders “respond” to a series of questions regarding the important provisions noted above. The CBA provision rubric “asks” two types of questions: gateway questions (GQs) and subquestions (SQs). 9
GQs (
The SQs noted above illustrate one of the challenges posed by observed data. To get the most information out of each provision and each contract, coders may initially record a qualitative or numerical response to particular questions such as the length of a school day, the number of students in a class, the timelines used to file grievances, and so on. Table 1 gives two example questions from the grievance section of the coding rubric that we will use to illustrate the coding process.
Example Questions From Coding Rubric.
Note: CBA = collective bargaining agreement.
Responses to these questions appear in what we term an observed response matrix. The observed response matrix for the entire data set is 270 districts by 766 individual provision items. Table 2 gives an example of an observed response matrix for the two example questions and three districts in our sample.
Example Observed Response Matrix.
Because the PIIR model requires a binary response, once all contracts are coded and combined, we analyze the distribution of numerical responses with an eye for cutoff points that preserve variance in information while allowing a binary structure. Each numerical question is recoded as a series of increasingly restrictive questions that lend themselves to binary response. 11
Resulting response categories might be thought of as “bins.” Each bin contains information from a minimum of 10 CBAs (any question with fewer than 20 responses
Recoding Questions to Force a Binary Response.
Note: CBA = collective bargaining agreement.
Example Binary Response Matrix.
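The recoding of numerical responses into binary “bins” can be sketched as follows. The question, the cutoff values, and the direction of restrictiveness here are hypothetical illustrations, not the rubric’s actual thresholds, which were chosen from the observed distribution of responses.

```python
# Sketch of recoding one numerical CBA provision into cumulative binary items.
# Cutoffs are hypothetical; in practice they are chosen from the observed
# distribution so that each resulting "bin" covers a minimum number of CBAs.

def recode_numeric(value, cutoffs):
    """Turn a numerical response into a series of increasingly
    restrictive yes/no items (1 = at least this restrictive)."""
    return [1 if value >= c else 0 for c in cutoffs]

# e.g., negotiated leave days per year, with hypothetical cutoffs:
# "at least 10 days?", "at least 12?", "at least 15?"
print(recode_numeric(14, [10, 12, 15]))  # [1, 1, 0]
```

Each CBA’s single numerical answer thus becomes several binary “responses,” preserving variance in the data while satisfying the PIIR model’s requirement of dichotomous items.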
The PIIR Model
We now redirect attention to the PIIR model. As noted above, the PIIR model treats each provision in a CBA as a binary “response” to a survey that includes all contract provisions covered in CBAs. And because many “responses” in a CBA are conditional on responses earlier in the CBA—for example, the response to “is seniority the only factor in selecting a teacher to voluntarily transfer?” is conditional on the response to “does seniority play any role in selecting a teacher to voluntarily transfer?”—the PIIR model uses as the dependent variable the conditional probability that a provision appears in a CBA.
In Model 1, the conditional probability of provision
The dependent variable in Equation 1 is conditional on each item being in the “risk set” for a particular CBA. Provision
Example Gate Matrix.
We use this “gate matrix” to form a “risk matrix” that indicates whether a provision is in the risk set for each CBA. A CBA that responded affirmatively to Question 1 above could have responded affirmatively to Question 1b, whether or not it actually did; Question 1b is therefore in the risk set for that CBA. SQs are not in the risk set of any CBA that has a zero for any of its gate questions. Table 6 gives an example risk matrix for our example questions and districts.
Example Risk Matrix.
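The gate-to-risk logic can be sketched as follows. The district names, item labels, and responses are hypothetical stand-ins for the rubric’s actual questions.

```python
# Sketch of deriving the risk matrix from the gate matrix. A subquestion is
# "at risk" for a CBA only if all of its gateway questions were answered
# affirmatively; gateway questions themselves are always at risk.

# gates maps each item to the gateway items it depends on (hypothetical)
gates = {"Q1": [], "Q1b": ["Q1"]}

# binary responses for three hypothetical districts
responses = {
    "District A": {"Q1": 1, "Q1b": 1},
    "District B": {"Q1": 1, "Q1b": 0},
    "District C": {"Q1": 0, "Q1b": 0},
}

def risk_matrix(responses, gates):
    """1 = item is in the risk set for that CBA, 0 = not applicable."""
    return {
        district: {
            item: int(all(resp[g] == 1 for g in gates[item]))
            for item in gates
        }
        for district, resp in responses.items()
    }

rm = risk_matrix(responses, gates)
# District C did not answer Q1 affirmatively, so Q1b is not in its risk set
print(rm["District C"])  # {'Q1': 1, 'Q1b': 0}
```

Note that District B remains at risk for Q1b even though it answered it negatively: having passed the gateway, it could have included the provision but chose not to, and that choice is informative.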
Once we have this “risk matrix,” we can limit the binary response matrix to only those observations that correspond to items in the risk set for a particular CBA. The resulting matrix is called the CBA-Item matrix and is a record of actual responses to each item considered in a particular CBA. Table 7 contains the example CBA-Item matrix (note that the “response” column comes directly from the binary response matrix in Table 4). In the CBA-Item matrix, each response
Example CBA-Item Matrix.
Note: CBA = collective bargaining agreement.
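Assembling the CBA-Item matrix from the binary response and risk matrices can be sketched as follows; the district names, items, and values are hypothetical. Each retained row is one “response” to an item the CBA could have included, and these stacked rows are the input to the PIIR model’s fixed-effects logit.

```python
# Sketch of building the long-format CBA-Item matrix: one row per
# (CBA, item) pair, keeping only items in that CBA's risk set.

binary = {  # binary response matrix (hypothetical)
    "District A": {"Q1": 1, "Q1b": 1},
    "District B": {"Q1": 1, "Q1b": 0},
    "District C": {"Q1": 0, "Q1b": 0},
}
risk = {  # risk matrix (hypothetical); Q1b is gated out for District C
    "District A": {"Q1": 1, "Q1b": 1},
    "District B": {"Q1": 1, "Q1b": 1},
    "District C": {"Q1": 1, "Q1b": 0},
}

cba_item = [
    (district, item, resp)
    for district, items in binary.items()
    for item, resp in items.items()
    if risk[district][item] == 1
]

print(len(cba_item))  # 5 rows: District C's Q1b is excluded
```

Restricting the data this way keeps the model from treating a gated-out provision (one the CBA never had the opportunity to include) as a negative response.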
The PIIR approach described above allows us to consider each CBA as a comprehensive document rather than subjectively pulling out specific CBA provisions that we (or others) may believe should have more or less influence on student and teacher outcomes. Each CBA can then be compared with every other CBA in the state, and by rubric design, the most restrictive district in the state should give management the least flexibility. However, two contracts by this measure may be considered equally restrictive if they have the same number of provisions (0s and 1s) even if they are “restrictive” in very different ways. 13 And it is quite likely that union and district representatives “trade” restrictiveness in one area of the contract for “leniency” in another. Therefore, in addition to obtaining an objective measure of CBA restrictiveness informed by all provisions within the “risk set” for each CBA, we also perform similar analyses on different subsets and categories of provisions.
Restricted Subset of Provisions
The measure of contract “restrictiveness” based on all provisions is objective and detailed, but a measure relying on 633 contract provisions is not portable or easily replicated. Moreover, we use these restrictiveness estimates as the dependent variable in future analyses so we want to reduce the noise in this measure as much as possible. Therefore, like Strunk and Reardon, we assess the 633 contract items used in our full model to ensure that they are all contributing to the measurement of the underlying “restrictiveness” trait. Identifying any misfitting items allows those items adding more noise than signal to our measure of restrictiveness to be removed from our scale. The resulting scale should be more reliable and user-friendly (as it is composed of fewer items). 14 We begin with a relatively high .67 contract reliability (compared with Strunk and Reardon’s .572).
Like Strunk and Reardon (2010), we base our item reduction on objective statistical methods used in test construction. We run an exploratory Cronbach’s alpha analysis on all 633 items included in our initial model and examine the item-total correlation produced for each item. A low item-total correlation for a specific item tells us that the item fails to measure the concept captured by the other items. Following a generally accepted standard used by test makers and by Strunk and Reardon, we discard items with item-total correlations lower than .25 (Strunk & Reardon, 2010). After an initial round of item reduction, we reassess our data and remove any further items with item-total correlations below .25 on the new, shorter scale. After three iterations of this process, no items below this threshold remain, leaving an instrument of 218 items that span the breadth of the contract. The reliability of this measure increased slightly to .72, which indicates that the 415 discarded items were in fact capturing more noise than the underlying trait. 15 Unfortunately, this “reduced” set is still not nearly as “user-friendly” as Strunk and Reardon’s 39-item set, which suggests that, unlike in California, in Washington one must consider a larger number of provisions to gauge the restrictiveness of a particular CBA.
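The iterative item-reduction procedure can be sketched as follows, with a small synthetic data set standing in for the 633 actual items; the .25 cutoff is the threshold used in the text, and the corrected item-total correlation (each item against the sum of the remaining items) is one standard way to compute the statistic.

```python
import numpy as np

def corrected_item_total(X):
    """Correlation of each item (column) with the total of the other items."""
    total = X.sum(axis=1)
    return np.array([
        np.corrcoef(X[:, j], total - X[:, j])[0, 1]
        for j in range(X.shape[1])
    ])

def reduce_items(X, threshold=0.25):
    """Iteratively drop items whose corrected item-total correlation falls
    below the threshold, recomputing on the reduced scale after each pass."""
    keep = np.arange(X.shape[1])
    while True:
        r = corrected_item_total(X[:, keep])
        ok = r >= threshold
        if ok.all():
            return keep
        keep = keep[ok]

# synthetic data: three items driven by a common latent trait, one noise item
t = np.arange(10)                      # stand-in for latent restrictiveness
X = np.column_stack([
    (t >= 3).astype(int),
    (t >= 5).astype(int),
    (t >= 7).astype(int),
    np.tile([1, 0], 5),                # unrelated to the trait
])
print(reduce_items(X))                 # the noise item (column 3) is dropped
```

Items that survive all passes are those that cohere with the rest of the scale; the procedure stops only when every remaining item clears the threshold, mirroring the three iterations described above.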
Categories of Provisions
CBAs often follow a similar layout or formula. Association rights, evaluation, grievance procedures, layoffs, hiring procedures and transfers, benefits and leaves, and workload are discussed in specified contract subsections. The Strunk coding rubric used to create the data used in these analyses also categorizes provisions in this manner. And previous work has focused on particular provisions that may fall under the umbrella of one of these subcategories (workload, layoffs, hiring, and transfers; Koski & Horng, 2007; Moe, 2005, 2009; Moe & Anzia, 2011). Discussions with teachers and district administrators lead us to believe that unions and district managers may bargain “trade-offs” between categories to come to a final, mutually beneficial agreement. Therefore, in addition to running PIIR analysis on our full and restricted data sets to obtain district restrictiveness estimates, we also run PIIR analyses of the categories reported in Table 8 to determine whether districts that are “highly restrictive” in one category appear to be more or less restrictive in related categories. When we run the PIIR model on a category of provisions, we
Subcategories for Analysis.
High-Profile Provisions
Our final data set comprises high-profile provisions: those talked about in the popular press and cited in prior subjectively focused academic research (Koski & Horng, 2007; Moe, 2005, 2009). Table 9 lists the “cherry-picked” provisions included in our final analyses. These provisions should adequately capture a district’s “visible” restrictiveness.
Cherry-Picked Provisions.
Note: CBA = collective bargaining agreement.
One of the strengths of the PIIR method is that it is highly objective; that is, researchers do not determine a priori which provisions should receive the most weight in the analysis. Thus, it may seem counterintuitive to apply the PIIR method to a subjectively chosen subset of provisions. However, many researchers only have access to data on high-profile provisions—for example, the National Council on Teacher Quality (2009) maintained a publicly available database of high-profile provisions for 150 large districts across the country—and the PIIR method can still generate an objective measure of CBA restrictiveness given the subset of provisions considered. The question we investigate, then, is whether high-profile provisions contribute to the same latent restrictiveness as the full range of provisions.
Restrictiveness Estimates and Internal Validity Assessment
We have described our data and a method of analysis (PIIR) that yields a measure of restrictiveness based on all provisions. This measure should capture the content of negotiations and reflect the underlying complexity of forces affecting a bargaining relationship. How restrictive are the 270 teacher contracts in the state of Washington, and does the PIIR estimate produced from
We use our item response data to obtain a “restrictiveness” measure for each contract and each provision in all 270 of Washington’s CBAs. Restrictiveness estimates obtained via fixed effects logit PIIR are presented in Column 2 of Table 10. All results have been standardized to have mean 0 and standard deviation 1 within each model. 16 Therefore, the magnitude of each coefficient should be interpreted in standard deviations of restrictiveness; for example, the CBA in Aberdeen School District is 0.24 standard deviations less restrictive than the average CBA in the state when we use the full range of provisions in our data set (Column 2 of Table 10). Column 3 of Table 10 displays each district’s restrictiveness estimate based on the objectively reduced data set described above. Column 4 of Table 10 provides district restrictiveness estimates based on the “cherry-picked” set of provisions identified in Table 9. Columns 5 to 12 of Table 10 present results by subsection of the CBA (the categories corresponding to each column are listed at the end of Table 10).
PIIR Model Contract Restrictiveness Estimates.
Note: PIIR = Partial Independence Item Response.
Table 11 displays the correlations between the PIIR estimates calculated from each subset of data. This presentation should be considered a first attempt at assessing the internal validity of the PIIR measure. Comparisons highlight similarities and key differences between estimates based on different subsets of data. The correlations are generally high, suggesting that latent restrictiveness in one category is predictive of latent restrictiveness in another category or in the contract as a whole.
Measure Similarity: Pearson Correlations.
Note: + p<.1, * p<.05, ** p<.01, *** p<.001.
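The internal-validity comparison amounts to correlating district restrictiveness estimates computed from different provision subsets. A minimal sketch with hypothetical standardized estimates for five districts (the values are illustrative, not our actual estimates):

```python
import numpy as np

# hypothetical standardized restrictiveness estimates for five districts
full_contract = np.array([-1.2, -0.3,  0.1,  0.5,  0.9])   # all provisions
high_profile  = np.array([-1.0, -0.4,  0.3,  0.4,  0.7])   # cherry-picked subset
layoff_only   = np.array([ 1.1,  0.2, -0.1, -0.6, -0.6])   # layoff subsection

# Pearson correlations between subset-based and full-contract estimates
r_hp = np.corrcoef(full_contract, high_profile)[0, 1]   # strongly positive
r_lo = np.corrcoef(full_contract, layoff_only)[0, 1]    # negative
print(round(r_hp, 2), round(r_lo, 2))
```

A high positive correlation (as with the high-profile subset here) suggests the subset taps the same latent restrictiveness as the full contract; a negative one (as with the layoff subset) suggests the provisions capture a different dimension of bargaining.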
The exceptions are restrictiveness in grievance policies (which is only weakly correlated with other subsets) and layoff policies (which is negatively correlated with estimates from other categories). Researchers who rely on grievance and layoff policy as a proxy for “union power” should take note as these results suggest that provisions from these contract subsections may capture another dimension of bargaining and lead to misleading results. That said, note that the restrictiveness estimates using only hiring and transfer policies are somewhat highly correlated (
Also of particular interest is the moderately high correlation between the restrictiveness estimates using the cherry-picked provisions and using the entire contract (.75). This suggests that—although our item reduction demonstrates that a large number of provisions are necessary to make conclusive inferences about contract restrictiveness—it is still possible to infer a great deal about the restrictiveness of a contract from a small subset of subjectively chosen provisions. Thus, future research relying on highly contested provisions across contract subsections may yield results similar to research relying on exhaustive, detailed coding of a near-complete universe of provisions.
Conclusion
Our results suggest that while the PIIR method is an important development in the analysis of collective bargaining outcomes, researchers do not necessarily need to code
Further research investigating the external validity of PIIR measures informed by various data subsets will add confidence to these findings. In future research, we plan to explore potential determinants of contract provisions—districts’ demographic, social, political, and economic characteristics, and the corresponding characteristics of proximate districts. The findings reported here will be bolstered if similar factors correlate with contract restrictiveness regardless of the category or subset of data considered. We also plan to investigate the relationship between contract restrictiveness and the quality and distribution of the teacher workforce, and if our results are robust to measures that utilize only high-profile provisions, this will lend additional support to our findings that these high-profile provisions contribute to the same latent restrictiveness as the entire contract.
