Beware of the VAM: Value-Added Measures for Teacher Accountability

In this article written for Colorín Colorado, Dr. Wayne Wright shares his concerns about using value-added measures to evaluate teachers of ELLs and discusses some of the problems that may arise with this approach.

See excerpts from Dr. Wright's book Foundations for Teaching English Language Learners: Research, Theory, Policy, and Practice (Caslon, 2010) in our policy history section.

Teachers are already very familiar with the unfairness of current school accountability programs based on snapshots of achievement as measured by a single test given at the end of the school year. Teachers of English language learners (ELLs) have been especially frustrated with such a system, which requires ELLs to meet the same passing criteria as English-proficient students, without any consideration of how long students have been in the country, their English proficiency level, or their opportunity to learn the tested material.

Thus, there is excitement over the promise of a new system of accountability based on students' growth over time. These growth models are called value-added measures (VAM). While such models are statistically complex, essentially they are calculated by testing students at the beginning of the school year and again at the end. The difference in the scores is purportedly a measure of the value added to student achievement (Bracey, 2010). Thus, according to Douglas N. Harris, a proponent of the responsible use of VAM as just one of several measures to determine teacher effectiveness, "in theory" VAM "provides an accurate estimate of what each individual teacher contributes to student learning" (2011, p. 2). At first glance this seems wonderful. A teacher of ELLs, for example, could be recognized for their students' great progress in learning English and academic content over the course of a year, rather than being punished for failing to get those students to pass the state test.
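
To make the basic arithmetic concrete, here is a deliberately simplified sketch in Python of the "post minus pre" idea behind a growth score. The student names and scale scores are invented, and real VAM systems layer complex statistical adjustments on top of this calculation; the sketch only shows where the "value added" number ultimately comes from.

```python
# A deliberately simplified sketch of the arithmetic behind a growth model.
# Real VAM systems add statistical adjustments (student demographics, prior
# test history, shrinkage estimates); this only illustrates "post minus pre."

from statistics import mean

# Hypothetical scale scores for one teacher's students (names are invented).
fall_scores   = {"Ana": 410, "Binh": 395, "Carlos": 430, "Dara": 402}
spring_scores = {"Ana": 455, "Binh": 440, "Carlos": 452, "Dara": 431}

# Each student's gain is simply the spring score minus the fall score.
gains = {s: spring_scores[s] - fall_scores[s] for s in fall_scores}

# The teacher's "value added" is then summarized as the average gain.
teacher_value_added = mean(gains.values())

print(gains)                # {'Ana': 45, 'Binh': 45, 'Carlos': 22, 'Dara': 29}
print(teacher_value_added)  # 35.25
```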

This appears to be a vast improvement over our current single-snapshot model; however, testing experts have urged that we must be extremely cautious about VAM. Unfortunately, these pleas have been ignored and we have already witnessed extreme abuses. In August 2010, the Los Angeles Times released teachers' value-added scores, publicly ranking Los Angeles Unified School District teachers by name from best to worst. The New York Times sued to gain access to teachers' value-added data so that it could publish a similar list of New York City teachers.

The idea that schools can be improved by publicly "shaming" individual teachers is beyond disturbing and has been criticized by educational leaders and testing experts alike. Even Bill Gates criticized this approach in an op-ed published in the New York Times under the headline "Shame Is Not the Solution." Yet U.S. Secretary of Education Arne Duncan praised the Los Angeles Times for providing a "public service." Indeed, this approach is consistent with the Obama Administration's efforts to tie teacher evaluations to student test scores.

What are the problems with VAM?

The accuracy myth

The data and lists are inherently unfair because VAM scores are far too inaccurate to support reasonable judgments about teacher effectiveness. Harris notes that a major problem with VAM is that it grades teachers on a bell curve; thus, "no matter how good the entire pool of teachers is, someone will always be at the bottom and half, by definition, will always be below average" (p. 2). Harris also notes that VAM only captures teachers' contributions to standardized test scores.
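
A small illustration (with invented average-gain figures) shows why this kind of relative ranking guarantees a bottom half, no matter how well every class actually did:

```python
# A sketch of why relative ranking guarantees "below average" teachers.
# The gain figures are invented; the point is that even uniformly strong
# gains still produce a bottom half once teachers are ranked against each other.

from statistics import median

# Suppose every teacher's class made substantial average gains (hypothetical numbers).
avg_gains = {"Teacher A": 38, "Teacher B": 41, "Teacher C": 35,
             "Teacher D": 44, "Teacher E": 40, "Teacher F": 37}

cutoff = median(avg_gains.values())
below_median = [t for t, g in avg_gains.items() if g < cutoff]

# Half the teachers land "below average" even though all gains are strong.
print(cutoff)        # 39.0
print(below_median)  # ['Teacher A', 'Teacher C', 'Teacher F']
```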

Bracey (2010) argues that VAM thus sets up a circular argument — effective teachers are those who raise test scores and test score gains are used to identify effective teachers. Of course effective teaching entails much more than raising test scores. Sparking children's curiosity and imagination, helping them develop a love of reading, developing their creativity, building their self-esteem, helping them see themselves as lifelong learners, preparing them to be good citizens, and so on, are all part of the magic of teaching that can never be measured by a standardized test.

Implications for ELLs: When it comes to ELLs, effective teaching is something that can never be captured by a single test score. VAM also assumes that these test scores are valid for ELLs; however, even a report commissioned by the U.S. Department of Education acknowledged that we still don't know how to include ELLs in large-scale, high-stakes standardized tests in a valid and reliable manner.

The single teacher myth

A further problem is that VAM is based on the assumption that all student learning across the year can be attributed to a single teacher. As Bracey notes, this ignores the true dynamics of schools where students learn from many educators — team teachers, librarians, specialists, paraprofessionals, etc.

Implications for ELLs: For ELL students, VAM cannot accurately account for collaborations between classroom and ESL teachers. Nor can it account for the learning students engage in with their families, peers, community members, and through watching TV and surfing the Internet.

The random assignment myth

VAM falsely assumes that students are randomly assigned to classrooms. Without random assignment, there is no basis for accurate judgment of teachers against one another.

Implications for ELLs: VAM can't account for the fact that ELL students are often placed in specialized language programs such as bilingual and sheltered English immersion classrooms. How much growth — as measured by a high-stakes test of questionable validity for ELLs — could one reasonably expect to see for beginning-level ELLs in a transitional bilingual education classroom versus intermediate and advanced ELLs in a dual language program?

The curriculum control myth

VAM assumes that teachers have full control over the curriculum. Yet how many teachers are forced to use curricular programs or instructional methods they have little faith in?

Implications for ELLs: ELL educators often work in a climate where instructional models and state restrictions have been created as a result of political will rather than sound educational research. For example, many teachers are forced to provide English-only instruction for ELLs because of state restrictions on bilingual education.

The similar test myth

VAM assumes that previous and current year tests are essentially equivalent in terms of content and level of difficulty. But as Bracey (2010) points out, "what if mathematics in one year is mostly about fractions and decimals, and the next year mostly about geometry and statistics? Does subtracting year one's scores from year two's make sense?" (p. 1).

Implications for ELLs: In several states, ELLs may be tested in their native language one year and in English the next. How can VAM account for this, especially for students who do well on native language tests but struggle a bit the first time they take the tests in English? A false picture of student regression may be created, with blame placed on the teacher for making the student dumber!
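
A quick, invented example shows how naively subtracting scores from two different tests can manufacture a "decline" that says nothing about the student or the teacher:

```python
# A sketch of the false-regression problem when the test language changes.
# All numbers are invented; the two scores come from different tests that
# are not on a comparable scale, so subtracting them is meaningless.

year1_spanish_test = 520   # strong performance on a native-language test
year2_english_test = 470   # first attempt at the English-language test

naive_growth = year2_english_test - year1_spanish_test
print(naive_growth)  # -50: looks like the student "lost ground,"
                     # and the teacher's VAM score absorbs the blame.
```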

Closing thoughts

A briefing paper published by the Economic Policy Institute, authored by ten of the nation's foremost education and assessment experts, declared:

There is broad agreement among statisticians, psychometricians, and economists that student test scores alone are not sufficiently reliable and valid indicators of teacher effectiveness to be used in high-stakes personnel decisions, even when the most sophisticated statistical applications such as value-added modeling are employed. (p. 2)

The paper described VAM studies showing such instability that teachers identified as highly effective one year were found to be highly ineffective in subsequent years, and vice versa.

Due to this instability and the wide range of variables outside of classroom teacher control, organizations such as the Board on Testing and Assessment of the National Research Council of the National Academy of Sciences, the Educational Testing Service's Policy Information Center, and the RAND Corporation have all issued statements declaring that VAM results should not be used as the sole or principal basis for making high-stakes decisions about individual teachers or schools. Educators of ELLs should be especially wary of VAM results, given the additional variables that cannot be accounted for when teaching culturally and linguistically diverse students who are in the process of attaining proficiency in English. It is important that educators work with their local district leaders, union leaders, and policy makers to ensure that VAM results are not used in ways that are misleading and harmful to ELL students, their teachers, and their schools.

About the Author

Wayne E. Wright is an Associate Professor in the Department of Bicultural-Bilingual Studies in the College of Education and Human Development at the University of Texas at San Antonio, where he provides training for future and current educators in the areas of ESL teaching methods, literacy, assessment, technology, and research. He has a Ph.D. in Educational Leadership and Policy Studies from Arizona State University, and a Master's degree in Language, Literacy, and Learning from California State University, Long Beach. He was an ESL and bilingual teacher in the Long Beach Unified School District in California working with English language learners in grades K-12 in bilingual, English as a second language, sheltered English immersion, heritage language, and mainstream classrooms.

Wright is the author of numerous research articles related to language minority education, and currently serves as the founding editor of the Journal of Southeast Asian American Education and Advancement. He has presented his research and provided training for language teachers throughout the world. In 2009 Wright was a Fulbright Scholar and Visiting Lecturer at the Royal University of Phnom Penh in Cambodia, where he provided training and assistance to the university and students in the Master's of Education Program. He and his wife Phal are parents of three amazing children.

Acknowledgements

Our ELLs and Policy section is made possible by a generous grant from the Carnegie Corporation. The statements and views expressed are solely the responsibility of the authors.

References

Bracey, Gerald. "What's the Value of Growth Measures?" FairTest.org. Retrieved 2/20/2012 from http://fairtest.org/whats-value-growth-measures

Economic Policy Institute. Problems with the Use of Student Test Scores to Evaluate Teachers. Briefing Paper #278. August 29, 2010.

Francis, D. and Rivera, M. Practical Guidelines for the Education of English Language Learners: Research-Based Recommendations for the Use of Accommodations in Large-Scale Assessments. Center on Instruction, University of Houston. 2006.

Gates, B. "Shame Is Not the Solution." The New York Times. Published 2/22/2012. Retrieved 2/20/2012 from http://www.nytimes.com/2012/02/23/opinion/for-teachers-shame-is-no-solution.html?_r=3&ref=opinion

Harris, D. Value-Added Measures in Education: What Every Educator Needs to Know. Cambridge, MA: Harvard Education Press. 2011.

Phillips, A. "City to Release Teacher Ratings After Union Loses Suit." The New York Times. Published 2/14/2012. Retrieved 2/20/2012 from http://www.nytimes.com/schoolbook/2012/02/14/city-to-release-teacher-ratings-after-union-loses-suit/

Wright, D. "Fight Over LA Times Posting Teacher Ratings." ABC News 7 Report. Published 9/02/2010. Retrieved 2/20/2012 from http://abcnews.go.com/WNT/video/fight-la-times-posting-teacher-ratings-los-angeles-public-rank-school-children-11547754
