AERA Sets a Low Bar for Teacher Evaluation
The recent public statement by the American Educational Research Association (AERA) on the use of value-added measures of performance for evaluating educators and programs has a lot of scary language about “scientific and technical limitations” and “adverse consequences of faulty evaluations.” The statement also includes specific requirements that AERA says value-added models should meet before they are used in evaluations. But do these requirements represent the “very high technical bar” AERA claims it is setting? I don’t think so.
Most value-added models in use easily meet AERA’s requirements. The requirements boil down to using high-quality assessments, using multiple years of data, reporting the imprecision in the results, always combining value added with other measures of educator effectiveness, and monitoring the evaluation system continuously for unintended consequences. I agree these are important requirements for evaluation measures.
For example, in our work with the DC Public Schools to design the IMPACT evaluation system, we recognized each of these as important cornerstones for a successful evaluation system. Teachers are evaluated using multiple years of data that combine information from multiple components. The value-added estimates used in IMPACT account for their statistical margin of error, and we have continued to work with DC Public Schools to refine and improve IMPACT over time. Now, IMPACT may not be for everyone. But states and districts can use value-added measures in many different ways while still meeting AERA’s requirements.
The technical bar should be high for all evaluation components, not only value added. Experts are asking AERA why it is singling out value added when it gives a free pass to other measures used to evaluate educators. Every component of a teacher’s evaluation should meet a set of minimum requirements, yet AERA points to a set of “promising alternatives” for which the actual evidence is not all that promising. Even worse, classroom observations would fail AERA’s test outright because they are often used as the sole factor in evaluations and imprecision in these scores is almost always ignored. Observation scores are less accurate when they rest on just one lesson out of the hundreds a teacher gives over the course of the year, or on the judgment of a single observer. Yet teachers are often evaluated based on observations conducted only by the school’s principal, whose judgment may be affected by the background of the students in the class, or even the teacher’s race or gender.
The stakes for teachers and students are high, so we should be setting a high bar for educator evaluations. But the AERA requirements are at best a low bar because meeting them does not guarantee accuracy. For example, AERA doesn’t require that the measures predict future performance, a key goal of evaluations in any discipline. No single measure is perfect, so it makes sense to combine information from multiple measures, and doing so works. But AERA’s statement misleads us by saying its bar is high when it is not, and misdirects us by focusing negative attention on value added when our bar should be equally high for all evaluation components.