Assessment for Young ELLs: Strengths and Limitations in Current Practices

Young English Language Learners: Current Research and Emerging Directions for Practice and Policy

In "Strengths and Limitations of Current ELL Assessment Measures and Assessment Strategies: Reliability, Validity, and Utility," Linda Espinosa explores the current trends and implications in assessment for young ELLs. This excerpt was originally published in Young English Language Learners: Current Research and Emerging Directions for Practice and Policy (Teachers College Press, 2010).

In order for ELL children's outcomes to be fairly and appropriately assessed, it is important to understand some of the major limitations to many of the currently available measures for young children. A more detailed review of the technical characteristics of common ELL assessment measures (e.g., how the measure was developed, nature of normative sample, intended use of measure, validity, reliability, predictive ability, relationship to other more in-depth assessments, prior use with similar populations, etc.) can be found in a recently compiled compendium of ELL assessment measures (Barrueco, López, Ong, & Lozano, 2007).

Similarly, it is critical that users pay particular attention to the background information on standardization procedures contained in the respective assessment manuals for each measure. This is important both to guide the initial selection of the most appropriate assessment measure or measures, and to better understand any key concerns or limitations related to the interpretation of the results derived from individual assessments.

Discussion Points

Not intended to be an exhaustive summary of considerations or limitations, the following discussion highlights some of the issues and general limitations across assessment measures and procedures for ELLs:

  1. Despite the tremendous recent growth in the population of young ELL children, the corresponding development of a range of different types of appropriate measures for ELL children has lagged far behind. The limitations relate both to the overall number of available measures, and to the domains of skills and abilities covered by such measures.
  2. Many of the currently available measures for ELL children have been developed essentially as basic translations or adaptations of existing English language versions of measures, with varying levels of attention to ensuring comparability in the conceptual, linguistic or semantic content and/or level of difficulty of the translated items across languages. As such, the content validity and construct validity may not be the same for the Spanish as for the English version of the same measure.
  3. Unless a test has been specifically normed with a sample of similar ELL children, it remains less accurate and valid even for dual language learners who have been judged sufficiently fluent in English to be assessed in English. The reliability coefficients for many standardized tests are often lower for ELL children, even those who have been designated as fully proficient in English (Abedi, Leon, & Mirocha, 2001), meaning that the score for an individual child may not be an accurate measure of that specific child's abilities. The concurrent validity of some assessments has also been shown to be substantially lower for ELL students, again making the results less dependable.
  4. The actual developmental construct that is being assessed by a measure may vary from one language to the next. On several common measures assessing different aspects of phonemic awareness, a child may be asked either to add to or take away parts of words to form new words. On English versions of such tasks, compound words are often used. For example, a child may be asked to say a word such as mailbox and then say it without mail (box), or blend the words mail and box together to form a new word (mailbox). However, since compound words occur much less frequently in Spanish, this particular type of task is much more complex for Spanish-speaking children to understand and less engaging for them. Thus, unless items for a given task have been simultaneously developed in both English and the other language (or the measurement equivalence has been examined with the Spanish version), as is done with some measures like the Preschool Language Scale-4, there is a much greater risk that the translation process may result in an unintended change in the content, meaning, or linguistic complexity of the desired skill or ability that is being assessed.
  5. Many standardized assessment measures (both in English and other languages) contain a very small pool of test items to assess a given skill or ability of interest. Since many existing assessments are designed to assess a number of different skills and abilities, the developers often choose to keep the number of items for any given task small, so as not to end up with an assessment that will be too lengthy and/or frustrating for the shorter attention span of many preschool-age children. However, given the above-noted variability in young children's performance on many such assessments, the inclusion of a greater number of items would be one way to help offset this inherent variability and improve the precision of the measures.
  6. It is not uncommon to see the inclusion of a fairly small number of young children in the normative samples used to develop the standardized assessment measures. Given the expected, normal level of variability in performance for these preschool-age children, one might expect to see the inclusion of larger numbers of children at the younger age levels, even as compared with the number of children included at the older ages for the normative sample. This is especially true with ELLs considering the great within-group variability.
  7. Many normative samples have a smaller-than-expected representation of low-income and culturally or linguistically diverse population subgroups, as compared with the composition of the total population of young children. However, information on the specific demographic composition of the normative sample used to develop a given measure may not always be readily available in the published assessment manuals. If the normative sample for a given measure does not match the demographic characteristics of those children who are being assessed, then the resulting norms may not be appropriate for use with such a different group of children.
  8. For assessments targeted toward ELL populations, there also is the consideration as to whether the normative samples used were monolingual Spanish-speaking or bilingual children, or some combination of the two. The desirability of different types of normative samples depends upon the nature of the question the user is interested in examining. On the one hand, some users may be most interested in examining a child's performance on a Spanish measure against the performance of monolingual Spanish speakers to assess the child's development against a normative group of children who primarily speak one language, Spanish. However, other users may be interested in examining how a child being raised in a bilingual environment performs in comparison to other, similar bilingual children.
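
The point in item 5, that adding items can improve precision, can be illustrated with the Spearman-Brown formula from classical test theory, which is not part of the original excerpt. As a rough sketch, assume a task with reliability ρ is lengthened by a factor k using items comparable to the existing ones (an idealization); the values ρ = 0.60 and k = 2 are purely illustrative and do not describe any particular measure.

```latex
\rho_k = \frac{k\,\rho}{1 + (k - 1)\,\rho},
\qquad
\rho_2 = \frac{2(0.60)}{1 + (2 - 1)(0.60)} = \frac{1.20}{1.60} = 0.75
```

Under these assumptions, doubling the number of items would raise reliability from 0.60 to 0.75, although any such gain has to be weighed against the attention-span concerns raised in the same item.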


In summary, these are just a few of the considerations that users should understand in order to be better informed when deciding which assessment measure or measures to select from among the available options, and for what purpose. While some assessments contain adequate descriptions of how the measure was developed, the composition of the normative sample, and detailed information on the psychometric properties of the measure (e.g., reliability and validity), others provide much less information and, in some cases, contain misleading psychometric information.

For example, some assessment manuals present psychometric data, including information on the validity and reliability of the test, for the English version of the measure, but do not present similar psychometric information on the specific non-English version. Potential users should exercise caution if any assessment does not provide detailed information on all aspects of the test. Despite the many limitations and concerns noted above, there are currently some available assessment measures that can be carefully used to gain a better understanding of the language development of ELL children. Nevertheless, there is still a need for the continued development of newer and better ELL assessment measures and measurement strategies.


Our policy section is made possible by a generous grant from the Carnegie Corporation. The statements and views expressed are solely the responsibility of the authors.


Espinosa, L. Young English Language Learners: Current Research and Emerging Directions for Practice and Policy (Eds. García, E. and Frede, E.). Excerpts from Chapter 7, "Assessment for Young English Language Learners." Pp. 123-126. New York: Teachers College Press, Columbia University. 2010. Reprinted with permission.


Abedi, J., Leon, S., & Mirocha, J. (2001). Examining ELL and non-ELL student performance differences and their relationship to background factors: Continued analyses of extant data. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Barrueco, S., López, M., Ong, C., & Lozano, P. (2007). A compendium of measures for the assessment of young English language learners. Washington, DC: Pew Task Force on Early Childhood Accountability.


For any reprint requests, please contact the author or publisher listed.
