Are teacher evaluations telling us what we need to know?
Just over a decade ago, improving teacher quality meant stacking up paper credentials. President George W. Bush's signature No Child Left Behind program, for example, created incentives for schools to hire more teachers with post-secondary education or certification in subject areas.
In the Obama years, the tide has shifted, notes a new study released by the Brookings Institution. Teachers are increasingly being paid, measured and promoted on the basis of classroom performance rather than credentials.
And yet, the report argues, far too little is known about whether these teacher evaluation systems are telling us what we really need to know.
In "Evaluating Teachers with Classroom Observations," released Tuesday, researchers at the Brown Center on Education Policy at the Brookings Institution drew their conclusions from multiple years of data from four major urban school districts.
Overall, the Brookings report offered hard evidence that merit-based teaching evaluations, though imperfect, are statistically valid and much more valuable than older measures using paper credentials. But the authors did argue that much needs to be done to make the evaluations more fair and consistent — and to help teachers improve.
"We are asking administrators to do something that is so central to kids' learning but they really haven't been asked to do before," said Sandi Jacobs, vice president of the National Council on Teacher Quality, speaking as part of an online panel hosted by Brookings to discuss the new report.
In the past, Jacobs said, principals would visit a classroom and "just say, no alarm bells are going off, so I'll just quietly step out."
In the new paradigm, Jacobs said, principals will need to have searching, rigorous conversations with teachers to help them improve, the kind of conversations that used to be reserved for when there was a "big, big problem."
"Now we are asking administrators to sit down with teachers and say, yes, it was a good lesson, but it wasn't a great lesson," Jacobs said. "And here are some things you can do to hit it out of the park."
Among the key observations in the Brookings report is that student test scores play a relatively small role in teacher evaluations.
When students are tested at the end of one school year and again at the end of the next, the gain between the two scores is called a "value added measure."
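The year-over-year gain described above can be sketched in a few lines of Python. This is only an illustration of the simple difference the article describes, not the method any district or the Brookings authors use; value-added models used in practice typically apply further statistical adjustments. The function names and the scores are hypothetical.

```python
from statistics import mean


def simple_gain(prior_score, current_score):
    """Gain between two consecutive end-of-year test scores."""
    return current_score - prior_score


def classroom_average_gain(score_pairs):
    """Average gain across a classroom's (prior, current) score pairs."""
    return mean(simple_gain(prior, current) for prior, current in score_pairs)


# Hypothetical scores for one classroom: (end of last year, end of this year).
classroom = [(200, 212), (185, 191), (230, 242)]
average_gain = classroom_average_gain(classroom)
print(average_gain)
```

Averaging the gains over a teacher's students gives one number per classroom, which is the kind of figure that can then feed into an evaluation.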
But as study co-author Grover Whitehurst noted, students are not tested until third grade, and by sixth grade they have multiple teachers. As a result, the study found that only 22 percent of teachers are evaluated even partially on the basis of test scores.
And even for those teachers, he added, test scores often account for less than half of the evaluation.
"We find unions drawing the line on value added, we find litigation over it, and you'd think that was all that was going on," Whitehurst said. In fact, he says, only a minority of teachers are evaluated based on student tests. And of those, only a fraction of their evaluations is derived from student tests.
The bulk of the evaluations, Whitehurst said, are based on classroom observations, usually conducted by the principal or other administrators. These observations account for at least 40 percent and often as much as 70 percent of a teacher's evaluation, even when student test scores are available.
This is despite the widespread misperception, the authors note, that teacher performance is, for better or worse, now being heavily quantified on the basis of test scores.
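The weighting Whitehurst describes can be illustrated with a small sketch. The 60/40 split, the 1-to-4 rating scale and all names below are assumptions chosen to fall within the ranges quoted above; the report does not prescribe a particular formula.

```python
def composite_score(component_scores, weights):
    """Weighted average of evaluation components; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(component_scores[name] * weights[name] for name in weights)


# Hypothetical weights: observations carry most of the evaluation,
# test scores less than half.
weights = {"observation": 0.6, "test_scores": 0.4}

# Hypothetical ratings on an assumed 1-to-4 scale.
teacher = {"observation": 3.0, "test_scores": 2.0}

overall = composite_score(teacher, weights)
print(round(overall, 2))
```

With observations weighted this heavily, a one-point change in the observation rating moves the overall score more than the same change in the test-score rating.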
In an online panel hosted by Brookings, two of the report's co-authors and two outside experts discussed how to improve teacher evaluations, with an emphasis on the overlooked elephant in the room: the classroom observation component.
School districts and states are scrambling to figure out how to get more student testing into the teacher evaluation mix, Jacobs said.
But Jacobs agreed that "no matter how big we get that percentage, classroom observation is going to remain terribly important, because it's where the actionable feedback comes from." That is, she argued, improvement in the classroom is much more likely to result from high-quality observation than from opaque scores.
One key finding in the Brookings study, given the heavy weight placed on classroom evaluations, was that the structure of classroom observations does not make allowance for the challenges teachers face in different classrooms.
The researchers found evidence that teachers who were tackling more challenging classrooms got weaker classroom evaluations.
Left uncorrected, this skew against teachers in tough classrooms will discourage good teachers from going where they are most needed, argued study co-author Dan Chingos, a Brookings fellow.
"We think it's an important source of bias, and a matter of fairness," Chingos said, adding that teachers may be discouraged from taking on tougher challenges.
Dan Goldhaber, director of the Seattle-based Center for Education Data & Research, agreed with Chingos in the panel discussion.
"There is not actually a lot of research that looks at adjusting the classroom observations for student background," Goldhaber said. Still, the Brookings call for change on this point is consistent with other research showing that student demographics can drag down teaching evaluations.
"I think this will come as a surprise to a lot of practitioners," Goldhaber added. "People are used to being observed, and the view is that they are capturing what the teacher is doing independent of students. But in fact what you are observing is the dynamic between the teacher and student and there are a lot of reasons to believe that dynamic will be influenced by the kind of students you have."
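One way to picture an adjustment for student background is to compare each teacher's observation score with the score predicted from the prior achievement of the students in the room. The sketch below is an assumption made for illustration, not the procedure in the Brookings report: it fits a straight line by ordinary least squares and treats the leftover (the residual) as the adjusted score. The data and function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def adjusted_scores(prior_achievement, observation_scores):
    """Observation score minus the score predicted from student background."""
    slope, intercept = fit_line(prior_achievement, observation_scores)
    return [
        score - (intercept + slope * prior)
        for prior, score in zip(prior_achievement, observation_scores)
    ]


# Hypothetical data: average prior achievement of each teacher's students
# (as a percentile) and that teacher's raw observation score.
prior = [40, 50, 60, 70]
raw = [2.0, 2.6, 2.8, 3.4]

adjusted = adjusted_scores(prior, raw)
for p, r, a in zip(prior, raw, adjusted):
    print(p, r, round(a, 2))
```

In this made-up data, the teacher with a raw score of 2.6 ends up ahead of the teacher with 2.8 once the difference in their students' prior achievement is taken into account.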
Fairness and feedback
The overarching theme of the report was that teacher evaluations need to be more fair and consistent, but they should also give "actionable feedback" that allows teachers to improve, not merely be judged.
The Brookings study also found, not surprisingly, that classroom evaluations conducted by neutral outside observers are more valid than those conducted by administrators within the school, who will often carry positive or negative biases toward the teacher into the classroom they are evaluating.
The Brookings study was particularly harsh on a widespread practice of evaluating individual teachers based on the test score performance, or value added measures, of the school as a whole.
"You read stories about the gym teacher being judged by the math value added for all the kids in the school," Chingos said. But even for those who do teach math or English, Chingos said, this measurement system hurts you for "being a good teacher in a bad school."
"Just like with the failure to correct on observation scores, school-wide value added measures can create a penalty for teaching in challenging environments," Chingos said.