In the previous chapter, you saw that a standardized test is an assessment instrument for which there are uniform procedures for administration, design, and scoring, and is also a procedure that, through repeated administration and ongoing research, demonstrates criterion and construct validity. But a third, and perhaps the most important, element of standardized testing is the presupposition of an accepted set of standards on which to base the procedure.
A history of standardized testing in the United States reveals that during most of the decades in the middle of the twentieth century, standardized tests enjoyed a popularity and growth that was almost unchallenged. Toward the end of the twentieth century, the claims made for such tests began to be challenged on all fronts (see Medina & Neill, 1990; Kohn, 2000), and at the vanguard of those challenges were the teachers of the millions of children who took them. Teachers saw not only possible inequity in such tests but also a disparity between the content and tasks of the tests and what they were teaching in their classes. Were those tests accurate measures of achievement and success in the specified domains? Were they efficient, well-researched instruments based on carefully framed, comprehensive, validated standards of achievement? For the most part, they were not.
As educators became aware of this weakness, we saw the advent of a movement to establish standards on which students of all ages and subject-matter areas might be assessed. Appropriately, the last 20 years have seen a mushrooming of efforts on the part of educational leaders to base the plethora of school-administered standardized tests on clearly specified criteria within each content area being measured. For example, most departments of education at the state level in the United States have now specified (or are in the process of specifying) the appropriate standards (that is, criteria or objectives) for each grade level (kindergarten to grade 12) and each content area (math, language, sciences, arts). Construction of such standards makes possible a concordance between standardized test specifications and the goals and objectives of educational programs. And so, in the broad domain of language arts, teachers and educational administrators began the painstaking process of carefully examining existing curricular goals, conducting needs assessments among students, and designing appropriate assessments of those standards.


In creating such "benchmarks for accountability" (O'Malley & Valdez Pierce, 1996), there is a tremendous responsibility to carry out a comprehensive study of a number of domains:

  • literally thousands of categories of language ranging from phonology at one end of a continuum to discourse, pragmatics, functional, and sociolinguistic elements at the other end;
  • specification of what ELD students' needs are, at thirteen different grade levels, for succeeding in their academic and social development;
  • a consideration of what is a realistic number and scope of standards to be included within a given curriculum;
  • a separate set of standards (qualifications, expertise, training) for teachers to teach ELD students successfully in their classrooms; and
  • a thorough analysis of the means available to assess student attainment of those standards.

Standards setting is a global challenge. In many non-English-speaking countries, English is now a required subject starting as early as the first grade in some countries and by the seventh grade in virtually every country worldwide. In Japan and Korea, for example, a "communicative" curriculum in English is required from third grade onward. California, with one of the largest populations of second language learners in the United States, was one of the first states to generate standards. Other states follow similar sets of standards.
Students must be prepared to use English effectively in social and academic settings. Listening and speaking skills provide one of the most important building blocks for the foundation of second language acquisition. These skills are essential for developing reading and writing skills in English; however, to ensure that ELLs acquire proficiency in English listening, speaking, reading, and writing, it is important that students receive reading and writing instruction in English while they are developing fluency in oral English. To ensure that ELLs develop the skills and concepts needed to demonstrate proficiency on the English-Language Arts (ELA) listening and speaking standards, teachers must concurrently use both the ELD and the ELA standards. ELLs achieving at the Advanced ELD proficiency level should demonstrate proficiency on the ELA standards for their own and all prior grade levels. This means that all prerequisite skills needed to achieve the ELA standards must be learned by the Early Advanced ELD proficiency level. ELLs must develop both fluency in English and proficiency on the ELA standards.


The development of standards obviously implies the responsibility for correctly assessing their attainment. As standards-based education became more accepted in the 1990s, many school systems across the United States found that the standardized tests of past decades were not in line with newly developed standards. Thus began the interactive process not only of developing standards but also of creating standards-based assessments.
The process of administering a comprehensive, valid, and fair assessment of ELD students continues to be perfected. Stringent budgets within departments of education worldwide predispose many in decision-making positions to rely on traditional standardized tests for ELD assessment, but rays of hope lie in the exploration of more student-centered approaches to learner assessment. Stack, Stack, and Fern (2002), for example, reported on a portfolio assessment system in the San Francisco Unified School District called the Language and Literacy Assessment Rubric (LALAR), in which multiple forms of evidence of students' work are collected. Teachers observe students year-round and record their observations on scannable forms. The use of the LALAR system provides useful data on students' performance at all grade levels for oral production and for reading and writing performance in elementary and middle school grades (1-8). Further research is ongoing for high school levels (grades 9-12).


At the higher levels of education (colleges, community colleges, adult schools, language schools, and workplace settings), standards-based assessment systems have also had an enormous impact. The Comprehensive Adult Student Assessment System (CASAS), for example, is a program designed to provide broadly based assessments of ESL curricula across the United States. The system includes more than 80 standardized assessment instruments used to place learners in programs, diagnose learners' needs, monitor progress, and certify mastery of functional basic skills. A similar effort, the Secretary's Commission on Achieving Necessary Skills (SCANS), outlines competencies necessary for language in the workplace. The competencies cover language functions in terms of

  • resources (allocating time, materials, staff, etc.),
  • interpersonal skills (teamwork, customer service, etc.),
  • information (processing and evaluating data, organizing files, etc.),
  • systems (e.g., understanding social and organizational systems), and
  • technology use and application.

These five competencies are acquired and maintained through training in the basic skills (reading, writing, listening, speaking); thinking skills, such as reasoning and creative problem solving; and personal qualities, such as self-esteem and sociability.


Kuhlman (2001) emphasized the importance of teacher standards in three domains:
1. linguistics and language development
2. culture and the interrelationship between language and culture
3. planning and managing instruction
The standards committee of the international association of Teachers of English to Speakers of Other Languages (TESOL) advocates performance-based assessment of teachers for the following reasons:

  • Teachers can demonstrate the standards in their teaching.
  • Teaching can be assessed through what teachers do with their learners in their classrooms or virtual classrooms (their performance).
  • This performance can be detailed in what are called "indicators": examples of evidence that the teacher can meet a part of a standard.
  • The processes used to assess teachers need to draw on complex evidence of performance. In other words, indicators are more than simple "how to" statements.
  • Performance-based assessment of the standards is an integrated system. It is neither a checklist nor a series of discrete assessments.
  • Each assessment within the system has performance criteria against which the performance can be measured.
  • Performance criteria identify to what extent the teacher meets the standard.
  • Student learning is at the heart of the teacher's performance.

The standards-based approach to teaching and assessment presents the profession with many challenges. However thorny those issues are, the social consequence of this movement cannot be ignored, especially in terms of student assessment.


One of those stories, as told by Russell Webster (personal communication), illustrates the high-stakes nature of this globally marketed standardized test. A ring of enterprising "business" persons organized a group of pretend test-takers to take the TOEFL in an early time zone on a given day. (In those days, the tests were administered everywhere on the same day across a number of time zones. So TOEFL administrations ended in some East Asian countries as much as 8 to 14 hours before they began in the United States.)
The task of each test-taking "spy" was not to pass the TOEFL, but to memorize a subset of items, including the stimulus and all of the multiple-choice options, and immediately upon leaving the exam to telephone those items to the central organizers. As the memorized subsections were called in, a complete form of the TOEFL was quickly reconstructed. The organizers had employed expert consultants to generate the correct response for each item, thereby re-creating the test items and their correct answers! For an outrageous price of many thousands of dollars, prearranged buyers of the results were given copies of the test items and correct responses with a few hours to spare before entering a test administration in the Western Hemisphere. The story of how this underhanded group of entrepreneurs was caught and brought to justice is a long tale of blockbuster spy-novel proportions involving the FBI and, eventually, international investigators. But the story shows the huge gate-keeping role of tests like the TOEFL and the high price that some were willing to pay to gain access to a university in the United States and the visa that accompanied it.

Consider the fact that correlations between TOEFL scores and academic performance in the first year of college are impressively high (Henning & Cascallar, 1992). Are tests that lack a high level of content validity appropriate assessments of ability? A good deal of research says yes to this question as well. A study of the correlation of TOEFL results with oral and written production, for example, showed that years before TOEFL's current use of an essay and oral production section, significant positive correlations were obtained between all subsections of the TOEFL and independent direct measures of oral and written production (Henning & Cascallar, 1992). Test promoters commonly use such findings to support their claims for the efficacy of their tests. But several nagging, persistent issues emerge from the arguments about the consequences of standardized testing. Consider the following interrelated questions:

  1. Should the educational and business world be satisfied with high but not perfect probabilities of accurately assessing test-takers on standardized instruments? In other words, what about the small minority who are not fairly assessed?
  2. Regardless of construct validation studies and correlation statistics, should further types of performance be elicited in order to get a more comprehensive picture of the test-taker?
  3. Does the proliferation of standardized tests throughout a young person's life give rise to test-driven curricula, diverting the attention of students from creative or personal interests and in-depth pursuits?
  4. Is the standardized test industry in effect promoting a cultural, social, and political agenda that maintains existing power structures by assuring opportunity to an elite (wealthy) class of people?

Test Bias

It is no secret that standardized tests involve a number of types of test bias. That bias comes in many forms: language, culture, race, gender, and learning styles (Medina & Neill, 1990). The National Center for Fair and Open Testing, in its bimonthly newsletter Fair Test, every year offers dozens of instances of claims of test bias from teachers, parents, students, and legal consultants (see their website). For example, reading selections in standardized tests may use a passage from a literary piece that reflects a middle-class, white, Anglo-Saxon norm. Consider the following prompt for an essay in "general writing ability" on the IELTS:
You rent a house through an agency. The heating system has stopped working. You
phoned the agency a week ago, but it has still not been mended. Write a letter to
the agency. Explain the situation and tell them what you want them to do about it.

In an era when we seek to recognize the multiple intelligences present within every student (Gardner, 1983, 1999), is it not likely that standardized tests promote logical-mathematical and verbal-linguistic intelligences to the virtual exclusion of the other contextualized, integrative intelligences? Only very recently have traditionally receptive tests begun to include written and oral production in their test battery, a positive sign.

Test-Driven Learning and Teaching

Yet another consequence of standardized testing is the danger of test-driven learning and teaching. When students and other test-takers know that one single measure of performance will determine their lives, they are less likely to take a positive attitude toward learning. The motives in such a context are almost exclusively extrinsic, with little likelihood of stirring intrinsic interests. Test-driven learning is a worldwide issue. In Japan, Korea, and Taiwan, to name just a few countries, students approaching their last year of secondary school focus obsessively on passing the year-end college entrance examination, a major section of which is English (Kuba, 2002). Little attention is given to any topic or task that does not directly contribute to passing that one exam. In the United States, high school seniors are forced to give almost as much attention to SAT scores. Teachers also get caught up in the wave of test-driven systems. In Florida, elementary school teachers were recently promised cash bonuses of $100 per student as a reward for their schools' high performance on the state-mandated grade-level test, the Florida Comprehensive Achievement Exam (Fair Test, 2000). The effect of this policy was undue pressure on teachers to make sure their students excelled in the exam, possibly at the risk of ignoring other objectives in their curricula. But a further, ultimately more serious effect was to punish schools in lower-socioeconomic neighborhoods. A teacher in such a school might actually be a superb teacher, and that teacher's students might make excellent progress through the school year, but because of the test-driven policy, the teacher would receive no reward at all.

One of the by-products of a rapidly growing testing industry is the danger of an abuse of power. Shohamy (1997) and others (such as Spolsky, 1997; Hamp-Lyons, 2001) see the ethics of testing as an extension of what educators call critical pedagogy, or more precisely in this case, critical language testing (see TBP, Chapter 23, for some comments on critical language pedagogy in general). Proponents of a critical approach to language testing claim that large-scale standardized testing is not an unbiased process, but rather is the "agent of cultural, social, political, educational, and ideological agendas that shape the lives of individual participants, teachers, and learners" (Shohamy, 1997, p. 3). The issues of critical language testing are numerous:

  • Psychometric traditions are challenged by interpretive, individualized procedures for predicting success and evaluating ability.
  • Test designers have a responsibility to offer multiple modes of performance to account for varying styles and abilities among test-takers.
  • Tests are deeply embedded in culture and ideology.
  • Test-takers are political subjects in a political context.

These issues are not new. More than a century ago, British educator F. Y. Edgeworth (1888) challenged the potential inaccuracy of contemporary qualifying examinations for university entrance. In recent years, the debate has heated up. In 1997, an entire issue of the journal Language Testing was devoted to questions about ethics in language testing. One of the problems highlighted by the push for critical language testing is the widespread conviction, already alluded to above, that carefully constructed standardized tests designed by reputable test manufacturers are infallible in their predictive validity. One standardized test is deemed to be sufficient; follow-up measures are considered to be too costly.
Tests promote the notion that real-world problems have unambiguous right and wrong answers with no shades of gray. A corollary to the latter is that tests presume to reflect an appropriate core of common knowledge, such as the competencies reflected in the standards discussed earlier in this chapter. Logic would therefore dictate that the test-taker must buy in to such a system of beliefs in order to make the cut.

