ASSESSING GRAMMAR
Differing notions of ‘grammar’ for assessment
Introduction
The study of grammar has had a long and important role in the history of second language and foreign language teaching. For centuries, to learn another language, or what I will refer to generically as an L2, meant to know the grammatical structures of that language and to cite prescriptions for its use. Grammar was used to mean the analysis of a language system, and the study of grammar was not just considered an essential feature of language learning, but was thought to be sufficient for learners to actually acquire another language (Rutherford, 1988). Grammar in and of itself was deemed to be worthy of study – to the extent that in the Middle Ages in Europe, it was thought to be the foundation of all knowledge and the gateway to sacred and secular understanding (Hillocks and Smith, 1991). Thus, the central role of grammar in language teaching remained relatively uncontested until the late twentieth century. Even a few decades ago, it would have been hard to imagine language instruction without immediately thinking of grammar.
What is meant by ‘grammar’ in theories of language?
Grammar and linguistics
When most language teachers, second language acquisition (SLA) researchers and language testers think of ‘grammar’, they call to mind one of the many paradigms (e.g., ‘traditional grammar’ or ‘universal grammar’) available for the study and analysis of language. Such linguistic grammars are typically derived from data taken from native speakers and minimally constructed to describe well-formed utterances within an individual framework. These grammars strive for internal consistency and are mainly accessible to those who have been trained in that particular paradigm.
Since the 1950s, there have been many such linguistic theories – too numerous to list here – that have been proposed to explain language phenomena. Many of these theories have helped shape how L2 educators currently define grammar in educational contexts. Although it is beyond the purview of this book to provide a comprehensive review of these theories, it is, nonetheless, helpful to mention a few, considering both the impact they have had on L2 education and the role they play in helping define grammar for assessment purposes.
These two views of linguistic analysis have been instrumental in determining how grammar has been conceptualized in L2 classrooms in recent years. They have also influenced definitions of L2 grammar for assessment purposes. I will now provide a brief overview of some of the more influential linguistic theories that typify the syntactocentric and communicative views of language.
Form-based perspectives of language
One of the oldest theories to describe the structure of language is traditional grammar. Originally based on the study of Latin and Greek, traditional grammar drew on data from literary texts to provide rich and lengthy descriptions of linguistic form. Unlike some other syntactocentric theories, traditional grammar also revealed the linguistic meanings of these forms and provided information on their usage in a sentence (Celce-Murcia and Larsen-Freeman, 1999). Traditional grammar supplied an extensive set of prescriptive rules along with the exceptions. A typical rule in a traditional English grammar might be:
The first-person singular of the present tense verb ‘to be’ is ‘I am’. ‘Am’ is used with ‘I’ in all cases, except in first-person singular negative tag and yes/no questions, which are contracted. In this case, the verb ‘are’ is used instead of ‘am’. For example, ‘I’m in a real bind, aren’t I?’ or ‘Aren’t I trying my best?’
Form- and use-based perspectives of language
The three theories of linguistic analysis described thus far have provided insights to L2 educators on several grammatical forms. These insights provide information to explain what structures are theoretically possible in a language. Other linguistic theories, however, are better equipped to examine how speakers and writers actually exploit linguistic forms during language use. For example, if we wish to explain how seemingly similar structures like I like to read and I like reading connote different meanings, we might turn to those theories that study grammatical form and use interfaces. This would address questions such as: Why does a language need two or more structures that are similar in meaning? Are similar forms used to convey different specialized meanings? To what degree are similar forms a function of written versus spoken language, or to what degree are these forms characteristic of a particular social group or a specific situation? It is important for us to discuss these questions briefly if we ultimately wish to test grammatical forms along with their meanings and uses in context.
One approach to linguistic analysis that has contributed greatly to our understanding of the grammatical forms found in language use, as well as the contextual factors that influence the variability of these forms, is corpus linguistics. I will briefly describe corpus linguistics along with how findings from this approach can be useful for assessing grammar.
The common practice of compiling linguistic corpora, or large and principled collections of natural spoken and written texts, in order to analyze patterns of language use by computer in large databases of authentic texts has led to a relatively new field known as ‘corpus linguistics’. Not a theory of language per se, corpus linguistics embodies a suite of tools and methods designed to provide a source of evidence so that linguistic data can be analyzed distributionally – that is, to show how often and where a linguistic form occurs in spoken or written text. According to Biber, Conrad and Reppen (1998), these analyses typically focus on two concerns. One type of study examines the use of one linguistic feature (i.e., a lexical item or grammatical structure) in comparison with another. For example, corpus-based studies might examine the different uses of would. These studies might also compare the word wish with that-clauses and to-infinitives, or they might examine a linguistic feature with a non-linguistic feature, such as gender, dialect or setting.
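To make the idea of a distributional analysis concrete, the following is a minimal sketch in Python of the two questions mentioned above: how often a form occurs and where it occurs. The two-register toy corpus, its invented sentences and the function names are all assumptions made for illustration; they are not taken from Biber, Conrad and Reppen (1998) or from any actual corpus tool.

```python
import re
from collections import Counter

# Hypothetical toy corpus: two registers with invented sentences.
TOY_CORPUS = {
    "spoken": "I wish that I could go . I would go if I could . would you come ?",
    "written": "The committee would prefer a delay . Members wish to comment .",
}

def tokenize(text):
    """Lowercase the text and keep word tokens only (punctuation is dropped)."""
    return re.findall(r"[a-z']+", text.lower())

def distribution(corpus, form):
    """Return (raw count, rate per 1,000 tokens) of `form` for each register."""
    result = {}
    for register, text in corpus.items():
        tokens = tokenize(text)
        count = Counter(tokens)[form]
        rate = 1000 * count / len(tokens) if tokens else 0.0
        result[register] = (count, rate)
    return result

def concordance(corpus, form, width=2):
    """List (register, left context, form, right context) for every occurrence."""
    lines = []
    for register, text in corpus.items():
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == form:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((register, left, token, right))
    return lines

if __name__ == "__main__":
    print(distribution(TOY_CORPUS, "would"))  # how often, by register
    for line in concordance(TOY_CORPUS, "wish"):
        print(line)                           # where, with surrounding words
```

In this sketch, the rate per 1,000 tokens allows registers of different sizes to be compared, and the concordance lines show whether wish is followed by a that-clause or a to-infinitive. A real study would, of course, draw on a large and principled corpus rather than a handful of invented sentences.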
Communication-based perspectives of language
Other theories have provided grammatical insights from a communication-based perspective. Such a perspective expresses the notion that language involves more than linguistic form. It moves beyond the view of language as patterns of morphosyntax observed within relatively decontextualized sentences or sentences found within naturally occurring corpora. Rather, a communication-based perspective views grammar as a set of linguistic norms, preferences and expectations that an individual invokes to convey a host of pragmatic meanings that are appropriate, acceptable and natural depending on the situation. The assumption here is that linguistic form has no absolute, fixed meaning in language use (as seen in sentences 1.5 and 1.7 above), but is mutable and open to interpretation by those who use it in a given circumstance. Grammar in this context is often co-terminous with language itself, and stands not only for form, but also for meaningfulness and pragmatic appropriacy, acceptability or naturalness – a topic I will return to later since I believe that a blurring of these concepts is misleading and potentially problematic for language educators.
What is pedagogical grammar?
A pedagogical grammar represents an eclectic, but principled description of the target-language forms, created for the express purpose of helping teachers understand the linguistic resources of communication. These grammars provide information about how language is organized and offer relatively accessible ways of describing complex, linguistic phenomena for pedagogical purposes.
Research on L2 grammar teaching, learning and assessment
Research on L2 teaching and learning
Over the years, several of the questions mentioned above have intrigued language teachers, inspiring them to experiment with different methods, approaches and techniques in the teaching of grammar. To determine if students had actually learned under the different conditions, teachers have used diverse forms of assessment and drawn their own conclusions about their students. In so doing, these teachers have acquired a considerable amount of anecdotal evidence on the strengths and weaknesses of using different practices to implement L2 grammar instruction. These experiences have led most teachers nowadays to subscribe to an eclectic approach to grammar instruction, whereby they draw upon a variety of different instructional techniques, depending on the individual needs, goals and learning styles of their students.
Comparative methods studies
The comparative methods studies sought to compare the effects of different language-teaching methods on the acquisition of an L2. These studies occurred principally in the 1960s and 1970s, and stemmed from a reaction to the grammar-translation method, which had dominated language instruction during the first half of the twentieth century. More generally, these studies were in reaction to form-focused instruction (referred to as ‘focus on forms’ by Long, 1991), which used a traditional structural syllabus of grammatical forms as the organizing principle for L2 instruction. According to Ellis (1997), form-focused instruction contrasts with meaning-focused instruction in that meaning-focused instruction emphasizes the communication of messages (i.e., the act of making a suggestion and the content of such a suggestion) while form-focused instruction stresses the learning of linguistic forms. These can be further contrasted with form-and-meaning focused instruction (referred to by Long (1991) as ‘focus-on-form’), where grammar instruction occurs in a meaning-based environment and where learners strive to communicate meaning while paying attention to form. (Note that Long’s version of ‘focus-on-form’ stresses a meaning orientation with an incidental focus on forms.) These comparative methods studies all shared the theoretical premise that grammar has a central place in the curriculum, and that successful learning depends on the teaching method and the degree to which that promotes grammar processing.
Non-interventionist studies
While some language educators were examining different methods of teaching grammar in the 1960s, others were feeling a growing sense of dissatisfaction with the central role of grammar in the L2 curriculum. As a result, questions regarding the centrality of grammar were again raised by a small group of L2 teachers and syllabus designers who felt that the teaching of grammar in any form simply did not produce the desired classroom results. Newmark (1966), in fact, asserted that grammatical analysis and the systematic practice of grammatical forms were actually interfering with the process of L2 learning, rather than promoting it, and that, if left uninterrupted, second language acquisition, similar to first language acquisition, would proceed naturally.
At the same time, the role of grammar in the L2 curriculum was also being questioned by some SLA researchers (e.g., Dulay and Burt, 1973; Bailey, Madden and Krashen, 1974) who had been studying L2 learning in instructed and naturalistic settings. In their attempts to characterize the L2 learner’s interlanguage at one or more points along the path toward target-like proficiency, several researchers came to similar conclusions about L2 development. They found that instead of making incremental leaps in grammatical ability through an accumulation of grammatical forms, as presented in a traditional grammar syllabus, learners in both instructed and naturalistic settings acquired the target structures in a relatively fixed order (Ellis, 1994) regardless of when they were introduced. For example, Krashen (1977) claimed that, in general, ESL learners first acquire the -ing affix, plural markings and the copula (stage 1), and then the auxiliary and the articles (stage 2). This is followed by the irregular past verb forms (stage 3) and finally, the regular past, the third-person singular affix and the possessive -s affix (stage 4). While this information is interesting, research findings involve only a skeletal list of the possible grammar points that any typical curriculum would encompass. As a result, we might wonder how this order will change if other grammar points are investigated at the same time. Also, we have no idea how this order would hold for many other languages.
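The four stages reported above can be written down as a small lookup table, which also makes the ‘skeletal’ nature of the list easy to see. The sketch below, in Python, is only an illustration: the stage groupings follow the description of Krashen (1977) given in this section, while the variable and function names are hypothetical.

```python
# Stage groupings as described in the text for Krashen (1977).
# The names ACQUISITION_STAGES, stage_of and features_up_to are hypothetical.
ACQUISITION_STAGES = {
    1: ["-ing affix", "plural markings", "copula"],
    2: ["auxiliary", "articles"],
    3: ["irregular past"],
    4: ["regular past", "third-person singular affix", "possessive -s affix"],
}

def stage_of(feature):
    """Return the reported stage of a feature, or None if the order is silent on it."""
    for stage, features in ACQUISITION_STAGES.items():
        if feature in features:
            return stage
    return None

def features_up_to(stage):
    """List every feature reported at or before the given stage."""
    return [
        feature
        for s in sorted(ACQUISITION_STAGES)
        if s <= stage
        for feature in ACQUISITION_STAGES[s]
    ]

if __name__ == "__main__":
    print(stage_of("articles"))      # a feature covered by the reported order
    print(stage_of("conditionals"))  # a common curriculum point that is not covered
    print(features_up_to(2))
```

A lookup for a grammar point such as the conditionals returns no stage at all, since the reported order covers only nine features and says nothing about most of what a typical curriculum would include.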
Empirical studies in support of non-intervention
The non-interventionist position was examined empirically by Prabhu (1987) in a project known as the Communicational Teaching Project (CTP) in southern India. This study sought to demonstrate that the development of grammatical ability could be achieved through a task-based, rather than a form-focused, approach to language teaching, provided that the tasks required learners to engage in meaningful communication. In the CTP, Prabhu (1987) argued against the notion that the development of grammatical ability depended on a systematic presentation of grammar followed by planned practice. However, in an effort to evaluate the CTP program, Beretta and Davies (1985) compared classes involved in the CTP with classes outside the project taught with a structural-oral-situational method. They administered a battery of tests to the students, and found that the CTP learners outperformed the control group on a task-based test, whereas the non-CTP learners did better on a traditional structure test. These results lent partial support to the non-interventionist position by showing that task-based classrooms based on meaningful communication can also be effective in promoting SLA. However, these results also showed that again students do best when they are taught and tested in similar ways.
Possible implications of fixed developmental order to language assessment
The notion that structures appear to be acquired in a fixed developmental order and in a fixed developmental sequence might conceivably have some relevance to the assessment of grammatical ability. First of all, these findings could give language testers an empirical basis for constructing grammar tests that would account for the variability inherent in a learner’s interlanguage. In other words, information on the acquisitional order of grammatical items could conceivably serve as a basis for selecting grammatical content for tests that aim to measure different levels of developmental progression, such as Chang (2002, 2004) did in examining the underlying structure of a test that attempted to measure knowledge of the relative clauses. These findings also suggest a substantive approach to defining test tasks according to developmental order and sequence on the basis of how grammatical features are acquired over time (Ellis, 2001b). In other words, one task could potentially tap into developmental level one, while another taps into developmental level two, and so forth.
To illustrate, grammar tests targeting beginning English-language learners often include questions on the articles and the third-person singular -s affix, two features considered to be ‘very challenging’ from an acquisitional perspective. Since, according to these findings, no beginning learner would be expected to have target-like control of these particular grammatical items, the inclusion of these grammatical features in a beginning classroom achievement test might be questionable. However, the inclusion of these items in a placement test would be highly appropriate since the goal of placement assessment is to identify a wide range of ability levels so that developmentally homogeneous groups can be formed.
Problems with the use of development sequences as a basis for assessment
Although developmental sequence research offers an intuitively appealing complement to accuracy-based assessments in terms of interpreting test scores, I believe this method is fraught with a number of serious problems, and language educators should use extreme caution in applying this method to language testing. This is because our understanding of natural acquisitional sequences is incomplete and at too early a stage of research to be the basis for concrete assessment recommendations (Lightbown, 1985; Hudson, 1993). First, the number of grammatical sequences that show a fixed order of acquisition is very limited, far too limited for all but the most restricted types of grammar tests. For example, what is the order for acquiring the modals, the conditionals, or the infinitive or gerund complements? Second, much of the research on acquisitional sequences is based on data from naturalistic settings, where students are provided with considerable exposure to the language. We have yet to learn about how these sequences hold for students whose only exposure to a language is an L2 classroom. Furthermore, acquisitional sequences make reference only to linguistic forms; no reference is made to how these forms interact with the conveyance of literal and implied meanings associated with a specific context. Third, as the rate (not the route) of acquisition appears to be influenced by the learner’s first language and by exposure to other languages, we need to understand how these factors might impact on development rates and how we would reconcile this if we wished to test heterogeneous groups of language learners. Finally, as the developmental levels represent an ordering of grammatical rules during acquisition, this may or may not be on the same measurement scale as accuracy scores.
Thus, until further research demonstrates the precise relationship between these scales, we should be careful about comparisons between proficiency levels based on accuracy scales and levels of interlanguage development. In the end, it is premature to apply the findings from acquisitional sequences research to language assessment given our current level of understanding of developmental sequences.
Interventionist studies
Not all L2 educators are in agreement with the non-interventionist position to grammar instruction. In fact, several (e.g., Schmidt, 1983; Swain, 1991) have maintained that although some L2 learners are successful in acquiring selected linguistic features without explicit grammar instruction, the majority fail to do so. Testimony to this is the large number of non-native speakers who emigrate to countries around the world, live there all their lives and fail to learn the target language, or fail to learn it well enough to realize their personal, social and long-term career goals. In these situations, language teachers affirm that formal grammar instruction of some sort can be of benefit. Furthermore, most language teachers would contend that explicit grammar instruction, including systematic error correction and other instructional techniques, contributes immensely to their students’ linguistic development. Finally, despite the non-interventionist recommendations toward grammar teaching, I believe grammar still plays an important role in most L2 classrooms around the world.
Empirical studies in support of intervention
Aside from anecdotal evidence, the non-interventionist position has come under intense attack on both theoretical and empirical grounds with several SLA researchers affirming that efforts to teach L2 grammar typically result in the development of L2 grammatical ability. Hulstijn (1989) and Alanen (1995) investigated the effectiveness of L2 grammar instruction on SLA in comparison with no formal instruction. They found that when coupled with meaning-focused instruction, the formal instruction of grammar appears to be more effective than exposure to meaning or form alone. Long (1991) also argued for a focus on both meaning and form in classrooms that are organized around meaningful and sustained communicative interaction.
Research on instructional techniques and their effects on acquisition
Much of the recent research on teaching grammar has focused on four types of instructional techniques and their effects on acquisition. Although a complete discussion of teaching interventions is outside the purview of this book (see Ellis, 1997; Doughty and Williams, 1998), these techniques include form- or rule-based techniques, input-based techniques, feedback-based techniques and practice-based techniques (Norris and Ortega, 2000).
Grammar processing and second language development
In the grammar-learning process, explicit grammatical knowledge refers to a conscious knowledge of grammatical forms and their meanings. Explicit knowledge is usually accessed slowly, even when it is almost fully automatized (Ellis, 2001b). DeKeyser (1995) characterizes grammatical instruction as ‘explicit’ when it involves the explanation of a rule or the request to focus on a grammatical feature. Instruction can be explicitly deductive, where learners are given rules and asked to apply them, or explicitly inductive, where they are given samples of language from which to generate rules and make generalizations. Similarly, many types of language test tasks (e.g., gap-filling tasks) seem to measure explicit grammatical knowledge.
Implicit grammatical knowledge refers to ‘the knowledge of a language that is typically manifest in some form of naturally occurring language behavior such as conversation’ (Ellis, 2001b, p. 252). In terms of processing time, it is unconscious and is accessed quickly. DeKeyser (1995) classifies grammatical instruction as implicit when it does not involve rule presentation or a request to focus on form in the input; rather, implicit grammatical instruction involves semantic processing of the input with any degree of awareness of grammatical form. The hope, of course, is that learners will ‘notice’ the grammatical forms and identify form–meaning relationships so that the forms are recognized in the input and eventually incorporated into the interlanguage. This type of instruction occurs when learners are asked to listen to a passage containing a specific grammatical feature. They are then asked to answer comprehension questions, but not asked to attend to the feature. Similarly, language test tasks that require examinees to engage in interactive talk might also be said to measure implicit grammatical knowledge.
Implications for assessing grammar
The studies investigating the effects of teaching and learning on grammatical performance present a number of challenges for language assessment. First of all, the notion that grammatical knowledge structures can be differentiated according to whether they are fully automatized (i.e., implicit) or not (i.e., explicit) raises important questions for the testing of grammatical ability (Ellis, 2001b). Given the many purposes of assessment, we might wish to test explicit knowledge of grammar, implicit knowledge of grammar or both. For example, in certain classroom contexts, we might want to assess the learners’ explicit knowledge of one or more grammatical forms, and could, therefore, ask learners to answer multiple-choice or short-answer questions related to these forms.
The role of grammar in models of communicative language ability
The role of grammar in models of communicative competence
Every language educator who has ever attempted to measure a student’s communicative language ability has wondered: ‘What exactly does a student need to “know” in terms of grammar to be able to use it well enough for some real-world purpose?’ In other words, they have been faced with the challenge of defining grammar for communicative purposes. To complicate matters further, linguistic notions of grammar have changed over time, as we have seen, and this has significantly increased the number of components that could be called ‘grammar’. In short, definitions of grammar and grammatical knowledge have changed over time and across context, and I expect this will be no different in the future.
Rea-Dickins’ definition of grammar
In discussing more specifically how grammatical knowledge might be tested within a communicative framework, Rea-Dickins (1991) defined ‘grammar’ as the single embodiment of syntax, semantics and pragmatics. She argued against Canale and Swain’s (1980) and Bachman’s (1990b) multi-componential view of communicative competence on the grounds that componential representations overlook the interdependence and interaction between and among the various components. She further stated that in Canale and Swain’s (1980) model, the notion of grammatical competence was limited since it defined grammar as ‘structure’ on the one hand and as ‘structure and semantics’ on the other, but ignored the notion of ‘structure as pragmatics’. Similarly, she added that in Bachman’s (1990b) model, grammar was defined as structure at the sentence level and as cohesion at the suprasentential level, but this model failed to account for the pragmatic dimension of communicative grammar.
Larsen-Freeman’s definition of grammar
Another conceptualization of grammar that merits attention is Larsen-Freeman’s (1991, 1997) framework for the teaching of grammar in communicative language teaching contexts. Drawing on several linguistic theories and influenced by language teaching pedagogy, she has also characterized grammatical knowledge along three dimensions: linguistic form, semantic meaning and pragmatic use. Form is defined as both morphology, or how words are formed, and syntactic patterns, or how words are strung together. This dimension is primarily concerned with linguistic accuracy. The meaning dimension describes the inherent or literal message conveyed by a lexical item or a lexico-grammatical feature. This dimension is mainly concerned with the meaningfulness of an utterance. The use dimension refers to the lexico-grammatical choices a learner makes to communicate appropriately within a specific context. Pragmatic use describes when and why one linguistic feature is used in a given context instead of another, especially when the two choices convey a similar literal meaning. In this respect, pragmatic use is said to embody presuppositions about situational context, linguistic context, discourse context, and sociocultural context. This dimension is mainly concerned with making the right choice of forms in order to convey an appropriate message for the context.
What is meant by ‘grammar’ for assessment purposes?
Regardless of the assessment purpose, if we wish to make inferences about grammatical ability on the basis of a grammar test or some other form of assessment, it is important to know what we mean by ‘grammar’ when attempting to specify components of grammatical knowledge for measurement purposes. With this goal in mind, we need a definition of grammatical knowledge that is broad enough to provide a theoretical basis for the construction and validation of tests in a number of contexts. At the same time, we need our definition to be precise enough to distinguish it from other areas of language ability.
From a theoretical perspective, the main goal of language use is communication, whether it be used to transmit information, to perform transactions, to establish and maintain social relations, to construct one’s identity or to communicate one’s intentions, attitudes or hypotheses. Being the primary resource for communication, language knowledge consists of grammatical knowledge and pragmatic knowledge. Therefore, I propose a theoretical definition of language knowledge that consists of two distinct, but related, components.
Towards a definition of grammatical ability
Defining grammatical constructs
Although our basic underlying model of grammar will remain the same in all testing situations (i.e., grammatical form and meaning), what it means to ‘know’ grammar for different contexts will most likely change (see Chapelle, 1998). In other words, the type, range and scope of grammatical features required to communicate accurately and meaningfully will vary from one situation to another. For example, the type of grammatical knowledge needed to write a formal academic essay would be very different from that needed to make a train reservation. Given the many possible ways of interpreting what it means to ‘know’ grammar, it is important that we define what we mean by ‘grammatical knowledge’ for any given testing situation. A clear definition of what we believe it means to ‘know’ grammar for a particular testing context will then allow us to construct tests that measure grammatical ability.
One of the first steps in designing a test, aside from identifying the need for a test, its purpose and audience, is to provide a clear theoretical definition of the construct(s) to be measured. If we have a theoretically sound, as well as a clear and precise, definition of grammatical knowledge, we can then design tasks to elicit performance samples of grammatical ability. By having the test-takers complete grammar tasks, we can observe – and score – their answers with relation to specific grammatical criteria for correctness. If these performance samples reflect the underlying grammatical constructs – an empirical question – we can then use the test results to make inferences about the test-takers’ grammatical ability. These inferences, in turn, may be used to make decisions about the test-takers (e.g., pass the course). However, we need first to provide evidence that the tasks on a test have measured the grammatical constructs we have designed them to measure (Messick, 1993). The process of providing arguments in support of this evidence is called validation, and this begins with a clear definition of the constructs.
What is ‘grammatical ability’ for assessment purposes?
The approach to the assessment of grammatical ability in this book is based on several specific definitions. First, grammar encompasses grammatical form and meaning, whereas pragmatics is a separate, but related, component of language. A second is that grammatical knowledge, along with strategic competence, constitutes grammatical ability. A third is that grammatical ability involves the capacity to realize grammatical knowledge accurately and meaningfully in test-taking or other language-use contexts. The capacity to access grammatical knowledge to understand and convey meaning is related to a person’s strategic competence. It is this interaction that enables examinees to implement their grammatical ability in language use. Next, in tests and other language-use contexts, grammatical ability may interact with pragmatic ability (i.e., pragmatic knowledge and strategic competence) on the one hand, and with a host of non-linguistic factors such as the test-taker’s topical knowledge, personal attributes, affective schemata and the characteristics of the task on the other. Finally, in cases where grammatical ability is assessed by means of an interactive test task involving two or more interlocutors, the way grammatical ability is realized will be significantly impacted by both the contextual and the interpretative demands of the interaction.
The components of grammatical knowledge
Knowledge of phonological or graphological form and meaning
Knowledge of phonological/graphological form enables us to understand and produce features of the sound or writing system, with the exception of meaning-based orthographies such as Chinese characters, as they are used to convey meaning in testing or language-use situations.
Knowledge of lexical form and meaning
Knowledge of lexical form enables us to understand and produce those features of words that encode grammar rather than those that reveal meaning. This includes words that mark gender (e.g., waitress), countability (e.g., people) or part of speech (e.g., relate, relation). For example, when the word think in English is followed by the preposition about before a noun, this is considered the grammatical dimension of lexis, representing a co-occurrence restriction with prepositions. One area of lexical form that poses a challenge to learners of some languages is word formation. This includes compounding in English with a noun + noun or a verb + particle pattern (e.g., fire escape; breakup) or derivational affixation in Italian (e.g., ragazzino ‘little kid’, ragazzone ‘big kid’). For example, a student who says ‘a teacher of chemistry’ instead of ‘chemistry teacher’ or ‘*this people’ would need further instruction in lexical form.
Knowledge of morphosyntactic form and meaning
Knowledge of morphosyntactic form permits us to understand and produce both the morphological and syntactic forms of the language. This includes the articles, prepositions, pronouns, affixes (e.g., -est), syntactic structures, word order, simple, compound and complex sentences, mood, voice and modality. A learner who knows the morphosyntactic form of the English conditionals would know that: (1) an if-clause sets up a condition and a result clause expresses the outcome; (2) both clauses can be in the sentence-initial position in English; (3) if can be deleted under certain conditions as long as the subject and operator are inverted; and (4) certain tense restrictions are imposed on if and result clauses.
Knowledge of cohesive form and meaning
Knowledge of cohesive form enables us to use the phonological, lexical and morphosyntactic features of the language in order to interpret and express cohesion on both the sentence and the discourse levels. Cohesive form is directly related to cohesive meaning through cohesive devices (e.g., she, this, here) which create links between cohesive forms and their referential meanings within the linguistic environment or the surrounding co-text. Halliday and Hasan (1976, 1989) list a number of grammatical forms for displaying cohesive meaning.
Knowledge of information management form and meaning
Knowledge of information management form allows us to use linguistic forms as a resource for interpreting and expressing the information structure of discourse. Some resources that help manage the presentation of information include, for example, prosody, word order, tense-aspect and parallel structures. These forms are used to create information management meaning.
Knowledge of interactional form and meaning
Knowledge of interactional form enables us to understand and use linguistic forms as a resource for understanding and managing talk-in-interaction. These forms include discourse markers and communication management strategies. Discourse markers consist of a set of adverbs, conjunctions and lexicalized expressions used to signal certain language functions.
Designing test tasks to measure L2 grammatical ability
How does test development begin?
Every grammar-test development project begins with a desire to obtain (and often provide) information about how well a student knows grammar in order to convey meaning in some situation where the target language is used. The information obtained from this assessment then forms the basis for decision-making. Those situations in which we use the target language to communicate in real life or in which we use it for instruction or testing are referred to as the target language use (TLU) situations (Bachman and Palmer, 1996). Within these situations, the tasks or activities requiring language to achieve a communicative goal are called the target language use tasks. A TLU task is one of many language-use tasks that test-takers might encounter in the target language use domain. It is to this domain that language testers would like to make inferences about language ability, or more specifically, about grammatical ability.
What do we mean by ‘task’?
The notion of ‘task’ in language-learning contexts has been conceptualized in many different ways over the years. Traditionally, ‘task’ has referred to any activity that requires students to do something for the intent purpose of learning the target language. A task then is any activity (e.g., short answers, role-plays) as long as it involves a linguistic or non-linguistic (e.g., circling the answer) response to input. Traditional learning or teaching tasks are characterized as having an intended pedagogical purpose – which may or may not be made explicit; they have a set of instructions that control the kind of activity to be performed; they contain input (e.g., questions); and they elicit a response. More recently, learning tasks have been characterized more in terms of their communicative goals, their success in eliciting interaction and negotiation of meaning, and their ability to engage learners in complex meaning-focused activities (Nunan, 1989, 1993; Berwick, 1993; Skehan, 1998).
What are the characteristics of grammatical test tasks?
As the goal of grammar assessment is to provide as useful a measurement as possible of our students’ grammatical ability, we need to design test tasks in which the variability of our students’ scores is attributed to the differences in their grammatical ability, and not to uncontrolled or irrelevant variability resulting from the types of tasks or the quality of the tasks that we have put on our tests. As all language teachers know, the kinds of tasks we use in tests and their quality can greatly influence how students will perform. Therefore, given the role that the effects of task characteristics play on performance, we need to strive to manage (or at least understand) the effects of task characteristics so that they will function the way we designed them to – as measures of the constructs we want to measure (Douglas, 2000). In other words, specifically designed tasks will work to produce the types of variability in test scores that can be attributed to the underlying constructs given the contexts in which they were measured (Tarone, 1998). To understand the characteristics of test tasks better, we turn to Bachman and Palmer’s (1996) framework for analyzing target language use tasks and test tasks.
The Bachman and Palmer framework
Bachman and Palmer’s (1996) framework of task characteristics represents the most recent thinking in language assessment of the potential relationships between task characteristics and test performance. In this framework, they outline five general aspects of tasks, each of which is characterized by a set of distinctive features. These five aspects describe characteristics of (1) the setting, (2) the test rubrics, (3) the input, (4) the expected response and (5) the relationship between the input and response.
Describing grammar test tasks
When language teachers consider tasks for grammar tests, they call to mind a large repertoire of task types that have been commonly used in teaching and testing contexts. We now know that these holistic task types constitute collections of task characteristics for eliciting performance and that these holistic task types can vary on a number of dimensions. We also need to remember that the tasks we include on tests should strive to match the types of language-use tasks found in real-life or language instructional domains.
In designing grammar tests, we need to be familiar with a wide range of activities to elicit grammatical performance. In the rest of the chapter, I will describe several tasks in light of how they can be used to measure grammatical knowledge. I will use the Bachman and Palmer framework as a guide for task specification in this discussion.
Selected-response task types
Selected-response tasks present input in the form of an item, and test-takers are expected to select the response. Other than that, all other task characteristics can vary. For example, the form of the input can be language, non-language or both, and the length of the input can vary from a word to larger pieces of discourse. In terms of the response, selected-response tasks are intended to measure recognition or recall of grammatical form and/or meaning. They are usually scored right/wrong, based on one criterion for correctness; however, in some instances, partial-credit scoring may be useful, depending on how the construct is defined. Finally, selected-response tasks can vary in terms of reactivity, scope and directness.
Limited-production task types
Limited-production tasks present input in the form of an item with language and/or non-language information that can vary in length or topic. Different from selected-response tasks, limited-production tasks elicit a response embodying a limited amount of language production. The length of this response can be anywhere from a word to a sentence. All task characteristics in limited-production tasks can vary with the exception of two: the type of input (always an ‘item’) and the type of expected response (always ‘limited-production’).
Limited-production tasks are intended to assess one or more areas of grammatical knowledge depending on the construct definition. Unlike selected-response items, which usually have only one possible answer, the range of possible answers for limited-production tasks can, at times, be large – even when the response involves a single word.
Developing tests to measure L2 grammatical ability
What makes a grammar test ‘useful’?
Score-based inferences from grammar tests can be used to make a variety of decisions. For example, classroom teachers use these scores as a basis for making inferences about learning or achievement. These inferences can then serve to provide feedback for learning and instruction, assign grades, promote students to the next level, or even award a certificate. They can also be used to help teachers or administrators make decisions about instruction or the curriculum.
The information derived from language tests, of which grammar tests are a subset, can be used to provide test-takers and other test-users with formative and summative evaluations. Formative evaluation relating to grammar assessment supplies information during a course of instruction or learning on how test-takers might increase their knowledge of grammar, or how they might improve their ability to use grammar in communicative contexts. It also provides teachers with information on how they might modify future instruction or fine-tune the curriculum. For example, feedback on an essay telling a student to review the passive voice would be formative in nature. Summative evaluation provides test stakeholders with an overall assessment of test-taker performance related to grammatical ability, typically at the end of a program of instruction. This is usually presented as a profile of one or more scores or as a single grade.
Score-based inferences from grammar tests can also be used to make, or contribute to, decisions about program placement. This information provides a basis for deciding how students might be placed into a level of a language program that best matches their knowledge base, or it might determine whether or not a student is eligible to be exempted from further L2 study. Finally, inferences about grammatical ability can make or contribute to other high-stakes decisions about an individual’s readiness for learning or promotion, their admission to a program of study, or their selection for a job.
Given the goals and uses of tests in general, and grammar tests in particular, it is fitting to ask how we might actually know if a test is, indeed, able to elicit scorable behaviors from which to make trustworthy and meaningful inferences about an individual’s ability. In other words, how do we know if a grammar test is ‘good’ or ‘useful’ for our particular context?
Many language testers (e.g., Harris, 1969; Lado, 1961) have addressed this question over the years. Most recently, Bachman and Palmer (1996) have proposed a framework of test usefulness by which all tests and test tasks can be judged, and which can inform test design, development and analysis. They consider a test ‘useful’ for any particular testing situation to the extent that it possesses a balance of the following six complementary qualities: reliability, construct validity, authenticity, interactiveness, impact and practicality. They further maintain that for a test to be ‘useful’, it needs to be developed with a specific purpose in mind, for a specific audience, and with reference to a specific target language use (TLU) domain.
Overview of grammar-test construction
Bachman and Palmer (1996) organize test development into three stages: design, operationalization and administration. I will discuss each of these stages in the process of describing grammar-test development.
Stage 1: Design
The design stage of test development involves the accumulation of information and making initial decisions about the entire test process. In tests involving one class, this may be a relatively informal process; however, in tests involving wider audiences, such as a joint final exam or a placement test, the decisions about test development must be discussed and negotiated with several stakeholders. The outcome of the design stage is a design statement. According to Bachman and Palmer (1996, p. 88), this document should contain the following components:
- a description of the purpose(s) of the test,
- a description of the TLU domains and task types,
- a description of the test-takers,
- a definition of the construct(s) to be measured,
- a plan for evaluating test usefulness, and
- a plan for dealing with resources.
Stage 2: Operationalization
The operationalization stage of grammar-test development describes how an entire test involving several grammar tasks is assembled, and how the individual tasks are specified, written and scored.
- Specifying the scoring method
- Scoring selected-response tasks
- Scoring extended-production tasks
- Using scoring rubrics
- Grading
Stage 3: Test administration and analysis
The final stage in the process of developing grammar tests involves the administration of the test to individual students or small groups, and then to a large group of examinees on a trial basis.
Illustrative tests of grammatical ability
The First Certificate in English Language Test (FCE)
Given the assessment purposes and the intended uses of the FCE, the FCE grammar assessments privilege construct validity, authenticity, interactiveness and impact. This is done by the way the construct of grammatical ability is defined. This is also done by the ways in which these abilities are tapped into, and the ways in which the task characteristics are likely to engage the examinee in using grammatical knowledge and other components of language ability in processing input to formulate responses. Finally, this is done by the way in which Cambridge ESOL has promoted public understanding of the FCE, its purpose and procedures, and has made available certain kinds of information on the test. These qualities may, however, have been stressed at the expense of reliability.
The Comprehensive English Language Test (CELT)
In terms of the purposes and intended uses of the CELT, the authors explicitly stated, ‘the CELT is designed to provide a series of reliable and easy-to-administer tests for measuring English language ability of nonnative speakers’ (Harris and Palmer, 1970b, p. 1). As a result, concerns for high reliability and ease of administration led the authors to make choices privileging reliability and practicality over other qualities of test usefulness. To maximize consistency of measurement, the authors used only selected-response task types throughout the test, allowing for minimal fluctuations in the scores due to characteristics of the test method. This allowed them to adopt ‘easy-to-administer’ and ‘easy-to-score’ procedures for maximum practicality and reliability. Reliability was also enhanced by pre-testing items with the goal of improving their psychometric characteristics.
Reliability might have been emphasized at the expense of other important test qualities, such as construct validity, authenticity, interactiveness and impact. For example, construct validity was severely compromised by the mismatch among the purpose of the test, the way the construct was defined and the types of tasks used to operationalize the constructs. In short, scores from discrete-point grammar tasks were used to make inferences about speaking ability rather than make interpretations about the test-takers’ explicit grammatical knowledge.
Finally, authenticity in the CELT was low due to the exclusive use of multiple-choice tasks and the lack of correspondence between these tasks and those one might encounter in the target language use domain. Interactiveness was also low due to the test’s inability to fully involve the test-takers’ grammatical ability in performing the tests. The impact of the CELT on stakeholders is not documented in the published manual.
In all fairness, the CELT was a product of its time, when emphasis was on discrete-point testing and reliability, and when language testers were not yet discussing qualities of test usefulness in terms of authenticity, interactiveness and impact.
The Community English Program (CEP) Placement Test
Given the purposes and the intended uses of the CEP Placement Test, the grammar section privileges authenticity, construct validity, reliability and practicality. Similar to tasks in the instruction, the theme-based test tasks all support the same overarching theme presented from different perspectives. Then, the construct of grammatical knowledge is defined in terms of the grammar used to express the theme. Given the multiple-choice format and the piloting of items, reliability is an important concern. Finally, the multiple-choice format is used over a limited-production format to maximize practicality. This compromise is certainly emphasized at the expense of construct validity and authenticity (of task).
Nonetheless, grammatical ability is also measured in the writing and speaking parts of the CEP Placement Test. These sections privilege construct validity, reliability, authenticity and interactiveness. In these tasks, students are asked to use grammatical resources to write about and discuss the theme they have been learning about during the test. In both the writing and speaking sections, grammatical ability is a separately scored part of the scoring rubric, and definitions of grammatical knowledge are derived from theory and from an examination of benchmark samples. Reliability is addressed by scoring all writing and speaking performance samples ‘blind’ by two raters. In terms of authenticity and interactiveness, these test sections seek to establish a strong correspondence between the test tasks and the type of tasks encountered in theme-based language instruction – that is, examinees listen to texts in which the theme is presented, they learn new grammar and use it to express ideas related to the theme, and they then read, write and speak about the theme. The writing and speaking sections require examinees to engage both language and topical knowledge to complete the tasks. In both cases, grammatical control and topical control are scored separately. Finally, while these test sections prioritize construct validity, reliability, authenticity and interactiveness, it is certainly at the expense of practicality and impact.
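The blind double rating mentioned above can be illustrated with a short Python sketch. The text does not say how agreement between the two raters is checked, so the exact-agreement and adjacent-agreement rates used here are only one common choice; the function name, the rating scale and the sample ratings are all hypothetical.

```python
def agreement_rates(rater_a, rater_b):
    """Return (exact, adjacent) agreement rates for two raters' scores.

    exact    = proportion of samples given the same score by both raters
    adjacent = proportion of samples whose scores differ by at most one band
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of samples")
    n = len(rater_a)
    exact = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / n
    return exact, adjacent


# Hypothetical grammatical-control ratings for five writing samples,
# each scored independently ('blind') by two raters.
rater_a = [4, 3, 5, 2, 4]
rater_b = [4, 2, 5, 4, 4]

exact, adjacent = agreement_rates(rater_a, rater_b)
print(f"exact agreement: {exact:.2f}, adjacent agreement: {adjacent:.2f}")
```

Samples on which the two raters differ by more than one band (the fourth sample here) are the ones a program might send to a third rater, although that policy is also an assumption rather than something stated in the text.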
Learning-Oriented Assessments of Grammatical Ability
What is learning-oriented assessment of grammar?
Alternative assessment emphasizes an alternative to and rejection of selected-response, timed and one-shot approaches to assessment, whether they occur in large-scale or classroom assessment contexts. Alternative assessment encourages assessments in which students are asked to perform, create, produce or do meaningful tasks that both tap into higher-level thinking (e.g., problem-solving) and have real-world implications (Herman et al., 1992). Alternative assessments are scored by humans, not machines.
Similar to alternative assessment, authentic assessment stresses measurement practices which engage students’ knowledge and skills in ways similar to those one can observe while performing some real-life or ‘authentic’ task (O’Malley and Valdez-Pierce, 1996). It also encourages tasks that require students to perform some complex, extended-production activity, and emphasizes the need for assessment to be strictly aligned with classroom goals, curricula and instruction. Self-assessment is considered a key component of this approach.
Performance assessment refers to the evaluation of outcomes relevant to a domain of interest (e.g., grammatical ability), which are derived from the observation of students performing complex tasks that invoke real-world applications (Norris et al., 1998). As with most performance data, assessments are scored by human judges (Stiggins, 1987; Herman et al., 1992; Brown, 1998) according to a scoring rubric that describes what test-takers need to do in order to demonstrate knowledge or ability at a given performance level. Bachman (2002) characterized language performance assessment as typically: (1) involving more complex constructs than those measured in selected-response tasks; (2) utilizing more complex and authentic tasks; and (3) fostering greater interactions between the characteristics of the test-takers and the characteristics of the assessment tasks than in other types of assessments. Performance assessment encourages self-assessment by making explicit the performance criteria in a scoring rubric. In this way, students can then use the criteria to evaluate their performance and contribute proactively to their own learning.
Challenges and new directions in assessing grammatical ability
Challenge 1: Defining grammatical ability
One major challenge revolves around how grammatical ability has been defined both theoretically and operationally in language testing. As we saw in Chapters 3 and 4, in the 1960s and 1970s language teaching and language testing maintained a strong syntactocentric view of language rooted largely in linguistic structuralism. Moreover, models of language ability, such as those proposed by Lado (1961) and Carroll (1961), had a clear linguistic focus, and assessment concentrated on measuring language elements – defined in terms of morphosyntactic forms on the sentence level – while performing language skills. Grammatical knowledge was determined solely in terms of linguistic accuracy. This approach to testing led to examinations such as the CELT (Harris and Palmer, 1970a) and the English Proficiency Test battery (Davies, 1964).
Challenge 2: Scoring grammatical ability
A second challenge relates to scoring, as the specification of both form and meaning is likely to influence the ways in which grammar assessments are scored. As we discussed in Chapter 6, responses with multiple criteria for correctness may necessitate different scoring procedures. For example, the use of dichotomous scoring, even with certain selected-response items, might need to give way to partial-credit scoring, since some wrong answers may reflect partial development either in form or meaning. As a result, language educators might need to adapt their scoring procedures to reflect the two dimensions of grammatical knowledge. This might, in turn, require the use of measurement models that can accommodate both dichotomous and partial-credit data in calculating and analyzing test scores. Then, in scoring extended-production tasks for both form and meaning, descriptors on scoring rubrics might need to be adapted to reflect graded performance in the two dimensions of grammatical knowledge more clearly. It should also be noted that more complex scoring procedures will impact the resources it takes to mark responses or to program machine-scoring devices. It will also require a closer examination (and hopefully ongoing research) of how a wrong answer may be a reflection of interlanguage development. However, successfully meeting these challenges could provide a more valid assessment of the test-takers’ underlying grammatical ability.
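The contrast between dichotomous and partial-credit scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure taken from the book: the class and function names, the one-point-per-dimension weighting and the sample judgments are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """A rater's judgment of one response (hypothetical structure)."""
    form_ok: bool      # is the grammatical form accurate?
    meaning_ok: bool   # is the intended meaning conveyed?


def dichotomous_score(j: Judgment) -> int:
    """Right/wrong scoring: credit only if both form and meaning are correct."""
    return 1 if (j.form_ok and j.meaning_ok) else 0


def partial_credit_score(j: Judgment) -> int:
    """Partial-credit scoring: one point per dimension (assumed equal weights)."""
    return int(j.form_ok) + int(j.meaning_ok)


# Three hypothetical responses to the same set of items.
responses = [
    Judgment(form_ok=True, meaning_ok=True),    # fully correct
    Judgment(form_ok=False, meaning_ok=True),   # meaning conveyed, form inaccurate
    Judgment(form_ok=False, meaning_ok=False),  # neither dimension correct
]

dichotomous_total = sum(dichotomous_score(j) for j in responses)
partial_total = sum(partial_credit_score(j) for j in responses)

print(dichotomous_total, "out of", len(responses))
print(partial_total, "out of", 2 * len(responses))
```

Under right/wrong scoring the second response is indistinguishable from the third, whereas the partial-credit total records that its meaning was conveyed. That difference is the kind of information about partial development that dichotomous scoring discards.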
Challenge 3: Assessing meanings
The third challenge revolves around ‘meaning’ and how ‘meaning’ in a model of communicative language ability can be defined and assessed. The ‘communicative’ in communicative language teaching, communicative language testing, communicative language ability, or communicative competence refers to the conveyance of ideas, information, feelings, attitudes and other intangible meanings (e.g., social status) through language. Therefore, while the grammatical resources used to communicate these meanings precisely are important, the notion of meaning conveyance in the communicative curriculum is critical. Consequently, in order to test something as intangible as meaning in second or foreign language use, we need to define what it is we are testing.
Challenge 4: Reconsidering grammar-test tasks
The fourth challenge relates to the design of test tasks that are capable of both measuring grammatical ability and providing authentic and engaging measures of grammatical performance. Since the early 1960s, language educators have associated grammar tests with discrete-point, multiple-choice tests of grammatical form. These and other ‘traditional’ test tasks (e.g., grammaticality judgments) have been severely criticized for lacking in authenticity, for not engaging test-takers in language use, and for promoting behaviors that are not readily consistent with communicative language teaching. Discrete-point testing methods may have even led some teachers to have reservations about testing grammar or to have uncertainties about how to test it communicatively.
Challenge 5: Assessing the development of grammatical ability
The fifth challenge revolves around the argument, made by some researchers, that grammatical assessments should be constructed, scored and interpreted with developmental proficiency levels in mind. This notion stems from the work of several SLA researchers (e.g., Clahsen, 1985; Pienemann and Johnson, 1987; Ellis, 2001b) who maintain that the principal finding from years of SLA research is that structures appear to be acquired in a fixed order and a fixed developmental sequence. Furthermore, instruction on forms in non-contiguous stages appears to be ineffective. As a result, the acquisitional development of learners, they argue, should be a major consideration in L2 grammar testing.
Sources:
Purpura, James. 2004. Assessing Grammar. United Kingdom: Cambridge University Press.
ASSESSING VOCABULARY
The place of vocabulary in language assessment
Recent trends in language testing
However, scholars in the field of language testing have a rather different perspective on vocabulary-test items of the conventional kind. Such items fit neatly into what language testers call the discrete-point approach to testing. This involves designing tests to assess whether learners have knowledge of particular structural elements of the language: word meanings, word forms, sentence patterns, sound contrasts and so on. In the last thirty years of the twentieth century, language testers progressively moved away from this approach, to the extent that such tests are now quite out of step with current thinking about how to design language tests, especially for proficiency assessment.
A number of criticisms can be made of discrete-point vocabulary tests.
- It is difficult to make any general statement about a learner's vocabulary on the basis of scores in such a test.
- Being proficient in a second language is not just a matter of knowing a lot of words – or grammar rules, for that matter – but being able to exploit that knowledge effectively for various communicative purposes. Learners can build up an impressive knowledge of vocabulary (as reflected in high test scores) and yet be incapable of understanding a radio news broadcast or asking for assistance at an enquiry counter.
- Learners need to show that they can use words appropriately in their own speech and writing, rather than just demonstrating that they understand what a word can mean.
- In normal language use, words do not occur by themselves or in isolated sentences but as integrated elements of whole texts and discourse. They belong in specific conversations, jokes, stories, letters, textbooks, legal proceedings, newspaper advertisements and so on. And the way that we interpret a word is significantly influenced by the context in which it occurs.
- In communication situations, it is quite possible to compensate for lack of knowledge of particular words. We all know learners who are remarkably adept at getting their message across by making the best use of limited lexical resources. Readers do not have to understand every word in order to extract meaning from a text satisfactorily. Some words can be ignored, while the meaning of others can be guessed by using contextual clues, background knowledge of the subject matter and so on. Listeners can use similar strategies, as well as seeking clarification, asking for a repetition and checking that they have interpreted the message correctly.
Three dimensions of vocabulary assessment
Up to this point, I have outlined two contrasting perspectives on the role of vocabulary in language assessment. One point of view is that it is perfectly sensible to write tests that measure whether learners know the meaning and usage of a set of words, taken as independent semantic units. The other view is that vocabulary must always be assessed in the context of a language-use task, where it interacts in a natural way with other components of language knowledge. To some extent, the two views are complementary in that they relate to different purposes of assessment. Conventional vocabulary tests are most likely to be used by classroom teachers for assessing progress in vocabulary learning and diagnosing areas of weakness. Other users of these tests are researchers in second language acquisition with a special interest in how learners develop their knowledge of, and ability to use, target-language words. On the other hand, researchers in language testing and those who undertake large testing projects tend to be more concerned with the design of tests that assess learners' achievement or proficiency on a broader scale. For such purposes, vocabulary knowledge has a lower profile, except to the extent that it contributes to, or detracts from, the performance of communicative tasks.
Discrete — embedded
The first dimension focuses on the construct which underlies the assessment instrument. In language testing, the term construct refers to the mental attribute or ability that a test is designed to measure. A discrete test takes vocabulary knowledge as a distinct construct, separated from other components of language competence. Most existing vocabulary tests are designed on the assumption that it is meaningful to treat vocabulary knowledge as an independent construct for assessment purposes, and they can thus be classified as discrete measures in the sense that I am defining it here. In contrast, an embedded vocabulary measure is one that contributes to the assessment of a larger construct. I have already given an example of such a measure.
Selective — comprehensive
The second dimension concerns the range of vocabulary to be included in the assessment. A conventional vocabulary test is based on a set of target words selected by the test-writer, and the test-takers are assessed according to how well they demonstrate their knowledge of the meaning or use of those words. This is what I call a selective vocabulary measure.
Context-independent — context-dependent
The role of context, which is an old issue in vocabulary testing, is the basis for the third dimension. Traditionally, contextualization has meant that a word is presented to test-takers in a sentence rather than as an isolated element. From a contemporary perspective, it is necessary to broaden the notion of context to include whole texts and, more generally, discourse.
Vocabulary tests: four case studies
The Vocabulary Levels Test
The Vocabulary Levels Test was devised by Paul Nation at Victoria University of Wellington in New Zealand in the early 1980s as a simple instrument for classroom use by teachers in order to help them develop a suitable vocabulary teaching and learning programme for their students. He has distributed copies freely and made it available in two publications (Nation, 1983; 1990), and it has been widely used in New Zealand and many other countries. It has proved to be a useful tool for diagnostic vocabulary testing of migrant or international students when they first arrive at a secondary school in an English-speaking country. Moreover, in the absence of any more sophisticated measure, it has been used by researchers who needed an estimate of the vocabulary size of their non-native-speaking subjects. Meara calls it the 'nearest thing we have to a standard test in vocabulary' (1996a: 38). Thus, it is certainly a test that deserves attention in a book on vocabulary assessment.
The Eurocentres Vocabulary Size Test
Like the Vocabulary Levels Test, the Eurocentres Vocabulary Size Test (EVST) makes an estimate of a learner's vocabulary size using a graded sample of words covering numerous frequency levels. However, there are several differences in the way that the two tests are designed and so it is worthwhile to look at the EVST in some detail as well. As I noted in Chapter 4, the EVST is a checklist test which presents learners with a series of words and simply requires them to indicate whether they know each one or not. It includes a substantial proportion of non-words to provide a basis for adjusting the test-takers' scores if they appear to be overstating their vocabulary knowledge. Another distinctive feature of the EVST is that it is administered by computer rather than as a pen-and-paper test. Let us now look at the test from two perspectives: first as a placement instrument and then as a measure of vocabulary size.
The EVST as a measure of vocabulary size
If the Eurocentres test is to have a wider application than just as a placement tool for language schools, we also need to consider its validity as a measure of vocabulary size, and for this we should look into various aspects of its design:
- The format of the test and, in particular, the role of the non-words;
- The selection of the words to be tested; and
- The scoring of the test.
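The role of the non-words in scoring can be illustrated with a minimal sketch. It assumes one simple correction, subtracting the proportion of non-words claimed as known from the proportion of real words claimed as known. This is an illustrative assumption, not necessarily the formula the EVST itself applies, and the function and variable names are hypothetical.

```python
def adjusted_yes_no_score(said_yes, is_real_word):
    """Return a guessing-adjusted score for a checklist (yes/no) test.

    said_yes:     list of bools, True if the learner claimed to know the item
    is_real_word: list of bools, True if the item is a real word,
                  False if it is a non-word

    Assumed correction (illustrative only): hit rate minus false-alarm rate.
    """
    if len(said_yes) != len(is_real_word):
        raise ValueError("response and item lists must have the same length")

    n_real = sum(1 for real in is_real_word if real)
    n_non = len(is_real_word) - n_real
    if n_real == 0 or n_non == 0:
        raise ValueError("test needs at least one real word and one non-word")

    # Real words the learner claimed to know
    hits = sum(1 for yes, real in zip(said_yes, is_real_word) if yes and real)
    # Non-words the learner claimed to know (evidence of overstating)
    false_alarms = sum(
        1 for yes, real in zip(said_yes, is_real_word) if yes and not real
    )

    hit_rate = hits / n_real
    false_alarm_rate = false_alarms / n_non

    # Do not let the adjusted score fall below zero
    return max(0.0, hit_rate - false_alarm_rate)


# Four real words and two non-words: the learner says "yes" to three
# real words and to one non-word.
responses = [True, True, True, False, True, False]
real_flags = [True, True, True, True, False, False]
score = adjusted_yes_no_score(responses, real_flags)
```

In this example the learner's raw claim of 75 per cent of the real words is reduced because half of the non-words were also claimed as known. A learner who rejects every non-word keeps the raw score unchanged.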
The Vocabulary Knowledge Scale
The Vocabulary Knowledge Scale (VKS) was developed by Paribakht and Wesche for use in their research on incidental vocabulary acquisition. The instrument is of interest not only as a test in its own right but also as a way of exploring some issues that arise in any attempt to measure quality of vocabulary knowledge in a practical manner.
The Test of English as a Foreign Language
Our fourth case study involves one of the major language tests in the world today. The Test of English as a Foreign Language, or TOEFL, is administered in 180 countries and territories to more than 900,000 candidates. As one might expect of a test with such impressive vital statistics, this is an American invention, one of a whole range of tests, covering many spheres of education and employment in the United States, that are administered by the Educational Testing Service (ETS) of Princeton, New Jersey. Like other ETS tests, TOEFL relies on sophisticated statistical analyses and testing technology in order to ensure its quality as a measuring instrument and its efficient administration to such large numbers of test-takers. The whole edifice, though, has been built on a simple building block: the multiple-choice item. Until recently, all the items in the basic TOEFL test have been of this type. The exclusive use of the multiple-choice format has been one source of criticism of the test by language teachers, because it has limited the aspects of language proficiency that could be assessed by the test. Consequently, it has been seen as having a very negative washback effect, in the sense that learners preparing to take it have often focused narrowly on test-taking skills at the expense of developing a wider range of academic study skills.
The primary purpose of TOEFL is to assess whether foreign students planning to study in a tertiary institution where English is the medium of instruction have a sufficient level of proficiency in the language to be able to undertake their academic studies without being hampered by language-related difficulties. Thus, students from non-English-speaking countries applying for admission to North American colleges and universities normally take the test in their own country some time in advance, and their scores help to determine whether they will be admitted and whether they will be required to take further ESL courses once they arrive on campus. Apart from university admissions officers, certain employers and professional bodies also use TOEFL scores as a basis for deciding whether foreign-trained professionals, such as doctors, are proficient enough in the language to practice their skills in an English-speaking environment. This means that the test has an important gate-keeping role, in that it can influence a person's future prospects for education and employment, and therefore intending candidates take the test very seriously. A whole industry for TOEFL preparation has grown up in many countries to provide candidates with a wide range of practice materials and intensive coaching in test-taking techniques.
From the viewpoint of vocabulary assessment, the history of the TOEFL programme represents a fascinating case study of how approaches to testing have changed in the latter part of the twentieth century. In particular, vocabulary testing has become progressively more embedded and context-dependent as a result of successive revisions of the test battery during that period. Thus, we need to trace the development of the test from the early 1960s to the present to see how and why the changes occurred.
Sources:
Read, John. 2000. Assessing Vocabulary. Cambridge: Cambridge University Press.