That old sorcerer has vanished
And for once has gone away!
Spirits called by him, now banished,
My commands shall soon obey.
In Goethe's classic, the apprentice uses a sorcerer's spell to ease his daily chores. Chanting the master's words, he brings a broomstick to life and tells it to fetch water to clean the workshop. The broomstick obeys only too well. It races to the well and back until the workshop begins to flood. Although the apprentice had enough knowledge to set magic in motion, he could not think ahead to what he did not know.
I worry about a similar flood of unintended consequences if the Los Angeles Times moves forward with its plans to publish a database that places 6,000 Los Angeles third- to fifth-grade teachers on a spectrum from "least effective" to "most effective." The Times believes that the data will be a powerful tool to force better teaching, but it cannot anticipate all of the consequences. For example, consider that capable prospective teachers might avoid a profession in which they risk public embarrassment based on an undeveloped science. Consider the well-documented estimates that 25% of the value-added assessments are likely to be in error.
Publishing the database might easily undermine parent and teacher morale and make it more difficult for principals to advance school improvement. Being told that their child's teacher is "ineffective," or even marginally less effective than a teacher across the hall, may lead some parents to pressure the principal to place their child with a "high-scoring" teacher. Pitting parents against one another or against their principal is not a recipe for school improvement.
The Times' teacher effectiveness rankings are based on an elaborate statistical model created by Richard Buddin, a senior economist and education researcher at the Rand Corporation. (Significantly, Buddin did not attach teachers' names to his analysis; that was done by the Times.)
Buddin is one of many researchers across the country exploring so-called value-added approaches to assessing teacher quality. The assessments measure gains that students make on standardized tests from one year to the next. For example, researchers compare test scores of fourth graders with their scores as third graders to determine the "value added" by the fourth-grade teacher. Proponents believe that the "value added" reliably distinguishes between more and less effective teachers. And they think that school officials would use such comparisons to target support to struggling teachers and motivate them to do better.
But value-added measures have significant limitations:
- First, student assignments to schools and classrooms are rarely random. As a consequence, it is not possible to definitively determine whether higher or lower student test scores result from teacher effectiveness or are an artifact of how students are distributed.
- Second, it is difficult to compare growth of struggling students with the growth of high performers. In technical terms, standardized tests do not form equal interval scales. Enabling students to move from the 20th percentile to the 30th is not the same as helping students move from the 80th to the 90th percentile. These test score numbers are not like inches along a tape measure that have the same value regardless of where they occur.
- Third, estimates of teacher effectiveness can range widely from year to year. In recent studies, 10-15% of teachers in the lowest category of effectiveness one year moved to the highest category the following year while 10-15% of teachers in the highest category fell to the lowest tier.
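The second point, about unequal intervals, can be made concrete with a small calculation. The sketch below is only an illustration and is not part of Buddin's model or the Times' analysis: it assumes, hypothetically, that test scores follow a normal distribution, and the function name `percentile_gain_in_sd` is invented for this example. It asks how large a score gain, measured in standard deviations, is needed to move a student across two different ten-point percentile spans.

```python
from statistics import NormalDist


def percentile_gain_in_sd(start_pct: float, end_pct: float) -> float:
    """Score gain, in standard deviations, needed to move between two percentiles.

    Assumes normally distributed scores (an illustrative assumption,
    not a claim about any real test).
    """
    dist = NormalDist()  # standard normal: mean 0, standard deviation 1
    return dist.inv_cdf(end_pct / 100) - dist.inv_cdf(start_pct / 100)


low = percentile_gain_in_sd(20, 30)   # struggling students
high = percentile_gain_in_sd(80, 90)  # high performers

print(f"20th -> 30th percentile: {low:.2f} SD")   # 0.32 SD
print(f"80th -> 90th percentile: {high:.2f} SD")  # 0.44 SD
```

Under this assumption, the same ten-point percentile move corresponds to different underlying score gains depending on where a student starts, which is one way to see why growth at the bottom and growth at the top are hard to compare directly.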
The National Academy of Sciences concluded that value-added analysis "should not be used as the sole or primary basis for making operational decisions because the extent to which the measures reflect the contribution of teachers themselves, rather than other factors, is not understood."
And yet, the Los Angeles Times is about to publish a database with the teacher effectiveness rankings of 6,000 elementary school teachers. The Times argues that its role is to provide "parents and the public ... information that would otherwise be withheld" about the "performance of public employees." The Times should not believe in the magic of this data and should realize that it cannot foresee or control all of the consequences.