Are states and districts ducking teacher evaluation reform?

The current school year is the first in which many states have begun to use the teacher evaluation systems they promised as part of their Race to the Top applications. Predictably, debate about the merits of these systems has never been more active.

In a blog post for Education Next, Carol Chung describes her research on the implementation of teacher evaluation systems in 17 states and Washington, DC. The primary finding she highlights is that school districts have varied in the scores they give teachers. As an example, she describes three neighboring Florida counties with similar academic performance levels whose teachers had wildly different scores. Hillsborough County gave 38 percent of its teachers a “Highly Effective” rating, but nearby Pasco County gave this rating (the highest in Florida’s system) to 5 percent of its teachers. Pasco gave 94 percent of its teachers the next highest rating, “Effective,” compared to 43 percent of teachers in Manatee County.

Chung’s research is critically important. Most states are using systems of evaluation, rating, and consequences similar to Florida’s. For such systems to have meaning to the public, district scoring systems need to align with one another. If Manatee and Pasco Counties define “Effective” teaching differently, the public loses the ability to compare quality across districts. Chung is also right to assume that score clustering suggests some districts are implementing this system faster than others, unless we believe that fully 94 percent of all teachers should fall into one category, as they do in Pasco.

Less compelling is the contention that the early challenges mean that state and local officials are not trying to implement the new evaluation systems in good faith. Before assuming that local and state officials are not trying, one should consider the monumental challenge these reforms pose. Districts may struggle to get teacher evaluation right not because they are unwilling to implement the reform but because they are struggling with a new system that represents a sea change from past practice.

Prior to today, in most districts the results of teacher evaluations had neither been made widely available nor carried high stakes, so these districts would have had no idea how their evaluations compared to those of other districts. Likewise, state officials have almost no real experience helping schools calibrate their evaluations with each other. 

Such calibration turns out to be incredibly difficult. For example, one of the qualities we might hope to observe in a “highly effective” high school literature teacher is the ability to foster classroom discussion on the assigned reading. We can train evaluators in frameworks that offer the clearest possible definition of the behavior we hope to see. Even if we do that training perfectly, two excellent evaluators can observe the same teacher at the same time teaching the same kids and not agree on whether that teacher meets “highly effective” criteria. After all, exactly how much discussion constitutes “a lot” of discussion? How sophisticated an understanding of Hamlet’s graveyard scene must student comments indicate before we judge the teacher “highly effective”?

Calibration of this sort is difficult enough within a school district, even if a superintendent or some other leader has the authority to enforce a strong set of standards. No fair observer should expect calibration across all the districts in a state to happen overnight. 

Elizabeth Sobka