Language Assessment Assignment

Thursday, 14 May 2020

SUMMARY: ASSESSING GRAMMAR AND VOCABULARY

ASSESSING GRAMMAR

Differing notions of ‘grammar’ for assessment

Introduction

The study of grammar has had a long and important role in the history of second language and foreign language teaching. For centuries, to learn another language, or what I will refer to generically as an L2, meant to know the grammatical structures of that language and to cite prescriptions for its use. Grammar was used to mean the analysis of a language system, and the study of grammar was not just considered an essential feature of language learning, but was thought to be sufficient for learners to actually acquire another language (Rutherford, 1988). Grammar in and of itself was deemed to be worthy of study, to the extent that in the Middle Ages in Europe it was thought to be the foundation of all knowledge and the gateway to sacred and secular understanding (Hillocks and Smith, 1991). Thus, the central role of grammar in language teaching remained relatively uncontested until the late twentieth century. Even a few decades ago, it would have been hard to imagine language instruction without immediately thinking of grammar.

What is meant by ‘grammar’ in theories of language?

Grammar and linguistics

When most language teachers, second language acquisition (SLA) researchers and language testers think of ‘grammar’, they call to mind one of the many paradigms (e.g., ‘traditional grammar’ or ‘universal grammar’) available for the study and analysis of language. Such linguistic grammars are typically derived from data taken from native speakers and minimally constructed to describe well-formed utterances within an individual framework. These grammars strive for internal consistency and are mainly accessible to those who have been trained in that particular paradigm.

Since the 1950s, there have been many such linguistic theories – too numerous to list here – that have been proposed to explain language phenomena. Many of these theories have helped shape how L2 educators currently define grammar in educational contexts. Although it is beyond the purview of this book to provide a comprehensive review of these theories, it is, nonetheless, helpful to mention a few, considering both the impact they have had on L2 education and the role they play in helping define grammar for assessment purposes.

Two broad views of linguistic analysis, the syntactocentric view and the communicative view, have been instrumental in determining how grammar has been conceptualized in L2 classrooms in recent years. They have also influenced definitions of L2 grammar for assessment purposes. I will now provide a brief overview of some of the more influential linguistic theories that typify the syntactocentric and communicative views of language.

Form-based perspectives of language

One of the oldest theories to describe the structure of language is traditional grammar. Originally based on the study of Latin and Greek, traditional grammar drew on data from literary texts to provide rich and lengthy descriptions of linguistic form. Unlike some other syntactocentric theories, traditional grammar also revealed the linguistic meanings of these forms and provided information on their usage in a sentence (Celce-Murcia and Larsen-Freeman, 1999). Traditional grammar supplied an extensive set of prescriptive rules along with their exceptions. A typical rule in a traditional English grammar might be:

The first-person singular of the present tense verb ‘to be’ is ‘I am’. ‘Am’ is used with ‘I’ in all cases, except in first-person singular negative tag and yes/no questions, which are contracted. In this case, the verb ‘are’ is used instead of ‘am’. For example, ‘I’m in a real bind, aren’t I?’ or ‘Aren’t I trying my best?’

Form- and use-based perspectives of language

The form-based theories of linguistic analysis, such as traditional grammar, have provided L2 educators with insights into several grammatical forms. These insights explain which structures are theoretically possible in a language. Other linguistic theories, however, are better equipped to examine how speakers and writers actually exploit linguistic forms during language use. For example, if we wish to explain how seemingly similar structures like I like to read and I like reading connote different meanings, we might turn to theories that study the interface between grammatical form and use. Such theories address questions such as: Why does a language need two or more structures that are similar in meaning? Are similar forms used to convey different specialized meanings? To what degree are similar forms a function of written versus spoken language, and to what degree are they characteristic of a particular social group or a specific situation? It is important for us to discuss these questions briefly if we ultimately wish to test grammatical forms along with their meanings and uses in context.

One approach to linguistic analysis that has contributed greatly to our understanding of the grammatical forms found in language use, as well as the contextual factors that influence the variability of these forms, is corpus linguistics. I will briefly describe corpus linguistics along with how findings from this approach can be useful for assessing grammar.

The common practice of compiling linguistic corpora (large, principled collections of natural spoken and written texts) and analyzing them by computer for patterns of language use has led to a relatively new field known as ‘corpus linguistics’. Not a theory of language per se, corpus linguistics embodies a suite of tools and methods designed to provide a source of evidence so that linguistic data can be analyzed distributionally, that is, to show how often and where a linguistic form occurs in spoken or written text. According to Biber, Conrad and Reppen (1998), these analyses typically focus on two concerns. One type of study examines the use of one linguistic feature (i.e., a lexical item or grammatical structure) in comparison with another. For example, corpus-based studies might examine the different uses of would, or compare the use of wish with that-clauses and with to-infinitives. Other studies might examine a linguistic feature in relation to a non-linguistic feature, such as gender, dialect or setting.
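To make the idea of distributional analysis a little more concrete, here is a minimal sketch in Python that counts which complement follows wish in a handful of sentences. It is an illustration only, not a real corpus tool: the five sample sentences and the function name classify_wish_complements are hypothetical, and the rule (the word immediately after wish decides the category) is a deliberately crude heuristic that would, for instance, misclassify a demonstrative that as in ‘I wish that book were mine’.

```python
import re
from collections import Counter

# Hypothetical mini-corpus for illustration only; not drawn from any real corpus.
SAMPLE_CORPUS = [
    "I wish that I had more time to read.",
    "They wish to speak to the manager.",
    "We wish you had told us earlier.",
    "I wish to thank everyone for coming.",
    "She wishes that the rain would stop.",
]

# Matches 'wish', 'wishes' or 'wished' and captures the word that follows it.
WISH_PATTERN = re.compile(r"\bwish(?:es|ed)?\s+(\w+)", re.IGNORECASE)


def classify_wish_complements(texts):
    """Count a rough complement type after each form of 'wish'.

    Heuristic (an assumption for this sketch): the next word decides
    the category; 'that' -> that-clause, 'to' -> to-infinitive,
    anything else -> other (e.g. a clause without 'that').
    """
    counts = Counter()
    for text in texts:
        for match in WISH_PATTERN.finditer(text):
            next_word = match.group(1).lower()
            if next_word == "that":
                counts["that-clause"] += 1
            elif next_word == "to":
                counts["to-infinitive"] += 1
            else:
                counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Print each complement type with its frequency, most frequent first.
    for label, n in classify_wish_complements(SAMPLE_CORPUS).most_common():
        print(f"{label}: {n}")
```

Real corpus studies work with far larger, principled corpora and usually rely on part-of-speech tagging rather than a next-word rule; attaching metadata such as speaker gender or setting to each text is what would let the same counting idea relate a form to non-linguistic features.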

Communication-based perspectives of language

Other theories have provided grammatical insights from a communication-based perspective. Such a perspective expresses the notion that language involves more than linguistic form. It moves beyond the view of language as patterns of morphosyntax observed within relatively decontextualized sentences or sentences found within naturally occurring corpora. Rather, a communication-based perspective views grammar as a set of linguistic norms, preferences and expectations that an individual invokes to convey a host of pragmatic meanings that are appropriate, acceptable and natural depending on the situation. The assumption here is that linguistic form has no absolute, fixed meaning in language use (as seen in sentences 1.5 and 1.7 above), but is mutable and open to interpretation by those who use it in a given circumstance. Grammar in this context is often co-terminous with language itself, and stands not only for form, but also for meaningfulness and pragmatic appropriacy, acceptability or naturalness. I will return to this topic later, since I believe that blurring these concepts is misleading and potentially problematic for language educators.

What is pedagogical grammar?

A pedagogical grammar represents an eclectic but principled description of the target-language forms, created for the express purpose of helping teachers understand the linguistic resources of communication. These grammars provide information about how language is organized and offer relatively accessible ways of describing complex linguistic phenomena for pedagogical purposes.

Research on L2 grammar teaching, learning and assessment

Research on L2 teaching and learning

Over the years, questions about how grammar is best taught and learned have intrigued language teachers, inspiring them to experiment with different methods, approaches and techniques in the teaching of grammar. To determine whether students had actually learned under the different conditions, teachers have used diverse forms of assessment and drawn their own conclusions about their students. In so doing, these teachers have acquired a considerable amount of anecdotal evidence on the strengths and weaknesses of different practices for implementing L2 grammar instruction. These experiences have led most teachers nowadays to subscribe to an eclectic approach to grammar instruction, whereby they draw upon a variety of instructional techniques, depending on the individual needs, goals and learning styles of their students.

Comparative methods studies

The comparative methods studies sought to compare the effects of different language-teaching methods on the acquisition of an L2. These studies occurred principally in the 1960s and 1970s, and stemmed from a reaction to the grammar-translation method, which had dominated language instruction during the first half of the twentieth century. More generally, these studies were in reaction to form-focused instruction (referred to as ‘focus on forms’ by Long, 1991), which used a traditional structural syllabus of grammatical forms as the organizing principle for L2 instruction. According to Ellis (1997), form-focused instruction contrasts with meaning-focused instruction in that meaning-focused instruction emphasizes the communication of messages (e.g., the act of making a suggestion and the content of such a suggestion) while form-focused instruction stresses the learning of linguistic forms. These can be further contrasted with form-and-meaning focused instruction (referred to by Long (1991) as ‘focus-on-form’), where grammar instruction occurs in a meaning-based environment and where learners strive to communicate meaning while paying attention to form. (Note that Long’s version of ‘focus-on-form’ stresses a meaning orientation with an incidental focus on forms.) These comparative methods studies all shared the theoretical premise that grammar has a central place in the curriculum, and that successful learning depends on the teaching method and the degree to which it promotes grammar processing.

Non-interventionist studies

While some language educators were examining different methods of teaching grammar in the 1960s, others were feeling a growing sense of dissatisfaction with the central role of grammar in the L2 curriculum. As a result, questions regarding the centrality of grammar were again raised by a small group of L2 teachers and syllabus designers who felt that the teaching of grammar in any form simply did not produce the desired classroom results. Newmark (1966), in fact, asserted that grammatical analysis and the systematic practice of grammatical forms were actually interfering with the process of L2 learning rather than promoting it, and that, if left uninterrupted, second language acquisition, like first language acquisition, would proceed naturally.

At the same time, the role of grammar in the L2 curriculum was also being questioned by some SLA researchers (e.g., Dulay and Burt, 1973; Bailey, Madden and Krashen, 1974) who had been studying L2 learning in instructed and naturalistic settings. In their attempts to characterize the L2 learner’s interlanguage at one or more points along the path toward target-like proficiency, several researchers came to similar conclusions about L2 development. They found that instead of making incremental leaps in grammatical ability through an accumulation of grammatical forms, as presented in a traditional grammar syllabus, learners in both instructed and naturalistic settings acquired the target structures in a relatively fixed order (Ellis, 1994) regardless of when they were introduced. For example, Krashen (1977) claimed that, in general, ESL learners first acquire the -ing affix, plural markings and the copula (stage 1), and then the auxiliary and the articles (stage 2). This is followed by the irregular past verb forms (stage 3) and finally, the regular past, the third-person singular affix and the possessive -s affix (stage 4). While this information is interesting, these findings cover only a skeletal list of the grammar points that any typical curriculum would encompass. As a result, we might wonder how this order would change if other grammar points were investigated at the same time. We also do not know whether this order holds for many other languages.

Empirical studies in support of non-intervention

The non-interventionist position was examined empirically by Prabhu (1987) in a project known as the Communicational Teaching Project (CTP) in southern India. This study sought to demonstrate that the development of grammatical ability could be achieved through a task-based, rather than a form-focused, approach to language teaching, provided that the tasks required learners to engage in meaningful communication. In the CTP, Prabhu (1987) argued against the notion that the development of grammatical ability depended on a systematic presentation of grammar followed by planned practice. To evaluate the CTP, Beretta and Davies (1985) compared classes involved in the project with classes outside it that were taught with a structural-oral-situational method. They administered a battery of tests to the students, and found that the CTP learners outperformed the control group on a task-based test, whereas the non-CTP learners did better on a traditional structure test. These results lent partial support to the non-interventionist position by showing that task-based classrooms based on meaningful communication can also be effective in promoting SLA. However, they also showed, once again, that students do best when they are taught and tested in similar ways.

Possible implications of fixed developmental order to language assessment

The notion that structures appear to be acquired in a fixed developmental order and in a fixed developmental sequence might conceivably have some relevance to the assessment of grammatical ability. First of all, these findings could give language testers an empirical basis for constructing grammar tests that would account for the variability inherent in a learner’s interlanguage. In other words, information on the acquisitional order of grammatical items could conceivably serve as a basis for selecting grammatical content for tests that aim to measure different levels of developmental progression, as Chang (2002, 2004) did in examining the underlying structure of a test that attempted to measure knowledge of relative clauses. These findings also suggest a substantive approach to defining test tasks according to developmental order and sequence on the basis of how grammatical features are acquired over time (Ellis, 2001b). In other words, one task could potentially tap into developmental level one, while another taps into developmental level two, and so forth.

To illustrate, grammar tests targeting beginning English-language learners often include questions on the articles and the third-person singular -s affix, two features considered to be ‘very challenging’ from an acquisitional perspective. Since, according to these findings, no beginning learner would be expected to have target-like control of these particular grammatical items, the inclusion of these grammatical features in a beginning classroom achievement test might be questionable. However, the inclusion of these items in a placement test would be highly appropriate, since the goal of placement assessment is to identify a wide range of ability levels so that developmentally homogeneous groups can be formed.

Problems with the use of developmental sequences as a basis for assessment

Although developmental sequence research offers an intuitively appealing complement to accuracy-based assessments in terms of interpreting test scores, I believe this method is fraught with a number of serious problems, and language educators should use extreme caution in applying it to language testing. This is because our understanding of natural acquisitional sequences is incomplete and at too early a stage of research to be the basis for concrete assessment recommendations (Lightbown, 1985; Hudson, 1993). First, the number of grammatical sequences that show a fixed order of acquisition is very limited, far too limited for all but the most restricted types of grammar tests. For example, what is the order for acquiring the modals, the conditionals, or infinitive and gerund complements? Second, much of the research on acquisitional sequences is based on data from naturalistic settings, where students are provided with considerable exposure to the language. We have yet to learn how these sequences hold for students whose only exposure to a language is an L2 classroom. Furthermore, acquisitional sequences make reference only to linguistic forms; no reference is made to how these forms interact with the conveyance of literal and implied meanings associated with a specific context. Third, as the rate (not the route) of acquisition appears to be influenced by the learner’s first language and by exposure to other languages, we need to understand how these factors might affect development rates and how we would reconcile this if we wished to test heterogeneous groups of language learners. Finally, as the developmental levels represent an ordering of grammatical rules during acquisition, they may or may not be on the same measurement scale as accuracy scores.
Thus, until further research demonstrates the precise relationship between these scales, we should be careful about comparisons between proficiency levels based on accuracy scales and levels of interlanguage development. In the end, it is premature to apply the findings from acquisitional sequences research to language assessment given our current level of understanding of developmental sequences.

Interventionist studies

Not all L2 educators agree with the non-interventionist position on grammar instruction. In fact, several (e.g., Schmidt, 1983; Swain, 1991) have maintained that although some L2 learners are successful in acquiring selected linguistic features without explicit grammar instruction, the majority fail to do so. Testimony to this is the large number of non-native speakers who emigrate to countries around the world, live there all their lives and fail to learn the target language, or fail to learn it well enough to realize their personal, social and long-term career goals. In these situations, language teachers affirm that formal grammar instruction of some sort can be of benefit. Furthermore, most language teachers would contend that explicit grammar instruction, including systematic error correction and other instructional techniques, contributes immensely to their students’ linguistic development. Finally, despite the non-interventionist recommendations on grammar teaching, I believe grammar still plays an important role in most L2 classrooms around the world.

Empirical studies in support of intervention

Aside from anecdotal evidence, the non-interventionist position has come under intense attack on both theoretical and empirical grounds, with several SLA researchers affirming that efforts to teach L2 grammar typically result in the development of L2 grammatical ability. Hulstijn (1989) and Alanen (1995) investigated the effectiveness of L2 grammar instruction on SLA in comparison with no formal instruction. They found that when coupled with meaning-focused instruction, the formal instruction of grammar appears to be more effective than exposure to meaning or form alone. Long (1991) also argued for a focus on both meaning and form in classrooms that are organized around meaningful and sustained communicative interaction.

Research on instructional techniques and their effects on acquisition

Much of the recent research on teaching grammar has focused on four types of instructional techniques and their effects on acquisition. Although a complete discussion of teaching interventions is outside the purview of this book (see Ellis, 1997; Doughty and Williams, 1998), these techniques include form- or rule-based techniques, input-based techniques, feedback-based techniques and practice-based techniques (Norris and Ortega, 2000).

Grammar processing and second language development

In the grammar-learning process, explicit grammatical knowledge refers to a conscious knowledge of grammatical forms and their meanings. Explicit knowledge is usually accessed slowly, even when it is almost fully automatized (Ellis, 2001b). DeKeyser (1995) characterizes grammatical instruction as ‘explicit’ when it involves the explanation of a rule or the request to focus on a grammatical feature. Instruction can be explicitly deductive, where learners are given rules and asked to apply them, or explicitly inductive, where they are given samples of language from which to generate rules and make generalizations. Similarly, many types of language test tasks (e.g., gap-filling tasks) seem to measure explicit grammatical knowledge.

Implicit grammatical knowledge refers to ‘the knowledge of a language that is typically manifest in some form of naturally occurring language behavior such as conversation’ (Ellis, 2001b, p. 252). It is unconscious and, in terms of processing time, is accessed quickly. DeKeyser (1995) classifies grammatical instruction as implicit when it does not involve rule presentation or a request to focus on form in the input; rather, implicit grammatical instruction involves semantic processing of the input with any degree of awareness of grammatical form. The hope, of course, is that learners will ‘notice’ the grammatical forms and identify form–meaning relationships so that the forms are recognized in the input and eventually incorporated into the interlanguage. An example of this type of instruction is when learners are asked to listen to a passage containing a specific grammatical feature and then to answer comprehension questions, without being asked to attend to the feature. Similarly, language test tasks that require examinees to engage in interactive talk might also be said to measure implicit grammatical knowledge.

Implications for assessing grammar

The studies investigating the effects of teaching and learning on grammatical performance present a number of challenges for language assessment. First of all, the notion that grammatical knowledge structures can be differentiated according to whether they are fully automatized (i.e., implicit) or not (i.e., explicit) raises important questions for the testing of grammatical ability (Ellis, 2001b). Given the many purposes of assessment, we might wish to test explicit knowledge of grammar, implicit knowledge of grammar or both. For example, in certain classroom contexts, we might want to assess the learners’ explicit knowledge of one or more grammatical forms, and could, therefore, ask learners to answer multiple-choice or short-answer questions related to these forms.

The role of grammar in models of communicative language ability

The role of grammar in models of communicative competence

Every language educator who has ever attempted to measure a student’s communicative language ability has wondered: ‘What exactly does a student need to “know” in terms of grammar to be able to use it well enough for some real-world purpose?’ In other words, they have been faced with the challenge of defining grammar for communicative purposes. To complicate matters further, linguistic notions of grammar have changed over time, as we have seen, and this has significantly increased the number of components that could be called ‘grammar’. In short, definitions of grammar and grammatical knowledge have changed over time and across contexts, and I expect this will be no different in the future.

Rea-Dickins’ definition of grammar

In discussing more specifically how grammatical knowledge might be tested within a communicative framework, Rea-Dickins (1991) defined ‘grammar’ as the single embodiment of syntax, semantics and pragmatics. She argued against Canale and Swain’s (1980) and Bachman’s (1990b) multi-componential view of communicative competence on the grounds that componential representations overlook the interdependence and interaction between and among the various components. She further stated that in Canale and Swain’s (1980) model, the notion of grammatical competence was limited since it defined grammar as ‘structure’ on the one hand and as ‘structure and semantics’ on the other, but ignored the notion of ‘structure as pragmatics’. Similarly, she added that in Bachman’s (1990b) model, grammar was defined as structure at the sentence level and as cohesion at the suprasentential level, but this model failed to account for the pragmatic dimension of communicative grammar.

Larsen-Freeman’s definition of grammar

Another conceptualization of grammar that merits attention is Larsen-Freeman’s (1991, 1997) framework for the teaching of grammar in communicative language teaching contexts. Drawing on several linguistic theories and influenced by language teaching pedagogy, she has also characterized grammatical knowledge along three dimensions: linguistic form, semantic meaning and pragmatic use. Form is defined as both morphology, or how words are formed, and syntactic patterns, or how words are strung together. This dimension is primarily concerned with linguistic accuracy. The meaning dimension describes the inherent or literal message conveyed by a lexical item or a lexico-grammatical feature. This dimension is mainly concerned with the meaningfulness of an utterance. The use dimension refers to the lexico-grammatical choices a learner makes to communicate appropriately within a specific context. Pragmatic use describes when and why one linguistic feature is used in a given context instead of another, especially when the two choices convey a similar literal meaning. In this respect, pragmatic use is said to embody presuppositions about situational context, linguistic context, discourse context and sociocultural context. This dimension is mainly concerned with making the right choice of forms in order to convey an appropriate message for the context.

What is meant by ‘grammar’ for assessment purposes?

Regardless of the assessment purpose, if we wish to make inferences about grammatical ability on the basis of a grammar test or some other form of assessment, it is important to know what we mean by ‘grammar’ when attempting to specify components of grammatical knowledge for measurement purposes. With this goal in mind, we need a definition of grammatical knowledge that is broad enough to provide a theoretical basis for the construction and validation of tests in a number of contexts. At the same time, we need our definition to be precise enough to distinguish it from other areas of language ability.

From a theoretical perspective, the main goal of language use is communication, whether it be used to transmit information, to perform transactions, to establish and maintain social relations, to construct one’s identity or to communicate one’s intentions, attitudes or hypotheses. Language knowledge, the primary resource for communication, consists of grammatical knowledge and pragmatic knowledge. Therefore, I propose a theoretical definition of language knowledge that consists of two distinct, but related, components.

Towards a definition of grammatical ability

Defining grammatical constructs

Although our basic underlying model of grammar will remain the same in all testing situations (i.e., grammatical form and meaning), what it means to ‘know’ grammar for different contexts will most likely change (see Chapelle, 1998). In other words, the type, range and scope of grammatical features required to communicate accurately and meaningfully will vary from one situation to another. For example, the type of grammatical knowledge needed to write a formal academic essay would be very different from that needed to make a train reservation. Given the many possible ways of interpreting what it means to ‘know’ grammar, it is important that we define what we mean by ‘grammatical knowledge’ for any given testing situation. A clear definition of what we believe it means to ‘know’ grammar for a particular testing context will then allow us to construct tests that measure grammatical ability.

One of the first steps in designing a test, aside from identifying the need for a test, its purpose and audience, is to provide a clear theoretical definition of the construct(s) to be measured. If we have a theoretically sound, as well as a clear and precise, definition of grammatical knowledge, we can then design tasks to elicit performance samples of grammatical ability. By having the test-takers complete grammar tasks, we can observe and score their answers with relation to specific grammatical criteria for correctness. If these performance samples reflect the underlying grammatical constructs (an empirical question), we can then use the test results to make inferences about the test-takers’ grammatical ability. These inferences, in turn, may be used to make decisions about the test-takers (e.g., whether they pass the course). However, we first need to provide evidence that the tasks on a test have measured the grammatical constructs we have designed them to measure (Messick, 1993). The process of providing arguments in support of this evidence is called validation, and it begins with a clear definition of the constructs.

What is ‘grammatical ability’ for assessment purposes?

The approach to the assessment of grammatical ability in this book is based on several specific definitions. First, grammar encompasses grammatical form and meaning, whereas pragmatics is a separate, but related, component of language. A second is that grammatical knowledge, along with strategic competence, constitutes grammatical ability. A third is that grammatical ability involves the capacity to realize grammatical knowledge accurately and meaningfully in test-taking or other language-use contexts. The capacity to access grammatical knowledge to understand and convey meaning is related to a person’s strategic competence. It is this interaction that enables examinees to implement their grammatical ability in language use. Next, in tests and other language-use contexts, grammatical ability may interact with pragmatic ability (i.e., pragmatic knowledge and strategic competence) on the one hand, and with a host of non-linguistic factors such as the test-taker’s topical knowledge, personal attributes, affective schemata and the characteristics of the task on the other. Finally, in cases where grammatical ability is assessed by means of an interactive test task involving two or more interlocutors, the way grammatical ability is realized will be significantly impacted by both the contextual and the interpretative demands of the interaction.

The components of grammatical knowledge

Knowledge of phonological or graphological form and meaning

Knowledge of phonological/graphological form enables us to understand and produce features of the sound or writing system (with the exception of meaning-based orthographies such as Chinese characters) as they are used to convey meaning in testing or language-use situations.

Knowledge of lexical form and meaning

Knowledge of lexical form enables us to understand and produce those features of words that encode grammar rather than those that reveal meaning. This includes words that mark gender (e.g., waitress), countability (e.g., people) or part of speech (e.g., relate, relation). For example, when the word think in English is followed by the preposition about before a noun, this is considered the grammatical dimension of lexis, representing a co-occurrence restriction with prepositions. One area of lexical form that poses a challenge to learners of some languages is word formation. This includes compounding in English with a noun + noun or a verb + particle pattern (e.g., fire escape; breakup) or derivational affixation in Italian (e.g., ragazzino ‘little kid’, ragazzone ‘big kid’). Thus, a student who says ‘a teacher of chemistry’ instead of ‘chemistry teacher’, or ‘*this people’, would need further instruction in lexical form.

Knowledge of morphosyntactic form and meaning

Knowledge of morphosyntactic form permits us to understand and produce both the morphological and syntactic forms of the language. This includes the articles, prepositions, pronouns, affixes (e.g., -est), syntactic structures, word order, simple, compound and complex sentences, mood, voice and modality. A learner who knows the morphosyntactic form of the English conditionals would know that: (1) an if-clause sets up a condition and a result clause expresses the outcome; (2) either clause can appear in sentence-initial position in English; (3) if can be deleted under certain conditions as long as the subject and operator are inverted; and (4) certain tense restrictions are imposed on if and result clauses.

Knowledge of cohesive form and meaning

Knowledge of cohesive form enables us to use the phonological, lexical and morphosyntactic features of the language in order to interpret and express cohesion on both the sentence and the discourse levels. Cohesive form is directly related to cohesive meaning through cohesive devices (e.g., she, this, here) which create links between cohesive forms and their referential meanings within the linguistic environment or the surrounding co-text. Halliday and Hasan (1976, 1989) list a number of grammatical forms for displaying cohesive meaning.

Knowledge of information management form and meaning

Knowledge of information management form allows us to use linguistic forms as a resource for interpreting and expressing the information structure of discourse. Some resources that help manage the presentation of information include, for example, prosody, word order, tense-aspect and parallel structures. These forms are used to create information management meaning.

Knowledge of interactional form and meaning

Knowledge of interactional form enables us to understand and use linguistic forms as a resource for understanding and managing talk-in-interaction. These forms include discourse markers and communication management strategies. Discourse markers consist of a set of adverbs, conjunctions and lexicalized expressions used to signal certain language functions.

Designing test tasks to measure L2 grammatical ability

How does test development begin?

Every grammar-test development project begins with a desire to obtain (and often provide) information about how well a student knows grammar in order to convey meaning in some situation where the target language is used. The information obtained from this assessment then forms the basis for decision-making. Those situations in which we use the target language to communicate in real life or in which we use it for instruction or testing are referred to as the target language use (TLU) situations (Bachman and Palmer, 1996). Within these situations, the tasks or activities requiring language to achieve a communicative goal are called the target language use tasks. A TLU task is one of many language-use tasks that test-takers might encounter in the target language use domain. It is to this domain that language testers would like to make inferences about language ability, or more specifically, about grammatical ability.

What do we mean by ‘task’?

The notion of ‘task’ in language-learning contexts has been conceptualized in many different ways over the years. Traditionally, ‘task’ has referred to any activity that requires students to do something for the express purpose of learning the target language. A task, then, is any activity (e.g., short answers, role-plays) as long as it involves a linguistic or non-linguistic (e.g., circling the answer) response to input. Traditional learning or teaching tasks are characterized as having an intended pedagogical purpose, which may or may not be made explicit; they have a set of instructions that control the kind of activity to be performed; they contain input (e.g., questions); and they elicit a response. More recently, learning tasks have been characterized more in terms of their communicative goals, their success in eliciting interaction and negotiation of meaning, and their ability to engage learners in complex meaning-focused activities (Nunan, 1989, 1993; Berwick, 1993; Skehan, 1998).

What are the characteristics of grammatical test tasks?

As the goal of grammar assessment is to provide as useful a measurement as possible of our students’ grammatical ability, we need to design test tasks in which the variability of our students’ scores is attributable to differences in their grammatical ability, and not to uncontrolled or irrelevant variability resulting from the types of tasks or the quality of the tasks that we have put on our tests. As all language teachers know, the kinds of tasks we use in tests and their quality can greatly influence how students will perform. Therefore, given the effect that task characteristics have on performance, we need to strive to manage (or at least understand) these effects so that tasks function the way we designed them to: as measures of the constructs we want to measure (Douglas, 2000). In other words, tasks designed specifically for this purpose will produce the kind of score variability that can be attributed to the underlying constructs, given the contexts in which they were measured (Tarone, 1998). To understand the characteristics of test tasks better, we turn to Bachman and Palmer’s (1996) framework for analyzing target language use tasks and test tasks.
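The idea of attributing score variability to the construct rather than to the task can be written as a simple variance decomposition. This is an illustration borrowed from general measurement theory, not a formula given by Purpura or by Bachman and Palmer, and it assumes the three components are uncorrelated. Here σ²_X is the variance of observed scores, σ²_C the variance due to grammatical ability (the construct), σ²_I systematic construct-irrelevant variance (for example, effects of a poorly designed task format), and σ²_E random measurement error.

```latex
\sigma^2_X = \sigma^2_C + \sigma^2_I + \sigma^2_E,
\qquad
\text{share attributable to the construct} = \frac{\sigma^2_C}{\sigma^2_X}
```

Under these assumptions, careful task design tries to keep σ²_I and σ²_E small, so that most of the score variance reflects real differences in grammatical ability.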

The Bachman and Palmer framework

Bachman and Palmer’s (1996) framework of task characteristics represents the most recent thinking in language assessment of the potential relationships between task characteristics and test performance. In this framework, they outline five general aspects of tasks, each of which is characterized by a set of distinctive features. These five aspects describe characteristics of (1) the setting, (2) the test rubrics, (3) the input, (4) the expected response and (5) the relationship between the input and response.

Describing grammar test tasks

When language teachers consider tasks for grammar tests, they call to mind a large repertoire of task types that have been commonly used in teaching and testing contexts. We now know that these holistic task types constitute collections of task characteristics for eliciting performance and that these holistic task types can vary on a number of dimensions. We also need to remember that the tasks we include on tests should strive to match the types of language-use tasks found in real-life or language instructional domains.

In designing grammar tests, we need to be familiar with a wide range of activities to elicit grammatical performance. In the rest of the chapter, I will describe several tasks in light of how they can be used to measure grammatical knowledge. I will use the Bachman and Palmer framework as a guide for task specification in this discussion.

Selected-response task types

Selected-response tasks present input in the form of an item, and test-takers are expected to select the response. Beyond that, all other task characteristics can vary. For example, the form of the input can be language, non-language or both, and the length of the input can vary from a word to larger pieces of discourse. In terms of the response, selected-response tasks are intended to measure recognition or recall of grammatical form and/or meaning. They are usually scored right/wrong, based on one criterion for correctness; however, in some instances, partial-credit scoring may be useful, depending on how the construct is defined. Finally, selected-response tasks can vary in terms of reactivity, scope and directness.

Limited-production task types

Limited-production tasks present input in the form of an item with language and/or non-language information that can vary in length or topic. Unlike selected-response tasks, limited-production tasks elicit a response embodying a limited amount of language production. The length of this response can be anywhere from a word to a sentence. All task characteristics in limited-production tasks can vary with the exception of two: the type of input (always an ‘item’) and the type of expected response (always ‘limited-production’).

Limited-production tasks are intended to assess one or more areas of grammatical knowledge depending on the construct definition. Unlike selected-response items, which usually have only one possible answer, the range of possible answers for limited-production tasks can, at times, be large – even when the response involves a single word.
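Because a limited-production item can have several acceptable answers, its scoring key is better thought of as a set of answers than as a single correct option. The Python sketch below is a hypothetical illustration, not a procedure from the book. It normalizes case and spacing and awards one point if the response matches any answer in the key. The function names and the example item are invented for illustration.

```python
def normalize(answer: str) -> str:
    """Lower-case the answer and collapse runs of whitespace to single spaces."""
    return " ".join(answer.lower().split())


def score_limited_production(response: str, acceptable: list[str]) -> int:
    """Return 1 if the response matches any acceptable answer in the key, else 0."""
    key = {normalize(a) for a in acceptable}
    return int(normalize(response) in key)
```

For a hypothetical gap-fill item such as ‘I ___ go to the party if I finish my work’, a key of `["will", "might", "may", "can"]` would accept ‘ Might ’ and reject ‘would’. In practice, such a key would be built and revised from piloted responses.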

Developing tests to measure L2 grammatical ability

What makes a grammar test ‘useful’?

Score-based inferences from grammar tests can be used to make a variety of decisions. For example, classroom teachers use these scores as a basis for making inferences about learning or achievement. These inferences can then serve to provide feedback for learning and instruction, assign grades, promote students to the next level, or even award a certificate. They can also be used to help teachers or administrators make decisions about instruction or the curriculum.

The information derived from language tests, of which grammar tests are a subset, can be used to provide test-takers and other test-users with formative and summative evaluations. Formative evaluation relating to grammar assessment supplies information during a course of instruction or learning on how test-takers might increase their knowledge of grammar, or how they might improve their ability to use grammar in communicative contexts. It also provides teachers with information on how they might modify future instruction or fine-tune the curriculum. For example, feedback on an essay telling a student to review the passive voice would be formative in nature. Summative evaluation provides test stakeholders with an overall assessment of test-taker performance related to grammatical ability, typically at the end of a program of instruction. This is usually presented as a profile of one or more scores or as a single grade.

Score-based inferences from grammar tests can also be used to make, or contribute to, decisions about program placement. This information provides a basis for deciding how students might be placed into a level of a language program that best matches their knowledge base, or it might determine whether or not a student is eligible to be exempted from further L2 study. Finally, inferences about grammatical ability can make or contribute to other high-stakes decisions about an individual’s readiness for learning or promotion, their admission to a program of study, or their selection for a job.

Given the goals and uses of tests in general, and grammar tests in particular, it is fitting to ask how we might actually know if a test is, indeed, able to elicit scorable behaviors from which to make trustworthy and meaningful inferences about an individual’s ability. In other words, how do we know if a grammar test is ‘good’ or ‘useful’ for our particular context?

Many language testers (e.g., Harris, 1969; Lado, 1961) have addressed this question over the years. Most recently, Bachman and Palmer (1996) have proposed a framework of test usefulness by which all tests and test tasks can be judged, and which can inform test design, development and analysis. They consider a test ‘useful’ for any particular testing situation to the extent that it possesses a balance of the following six complementary qualities: reliability, construct validity, authenticity, interactiveness, impact and practicality. They further maintain that for a test to be ‘useful’, it needs to be developed with a specific purpose in mind, for a specific audience, and with reference to a specific target language use (TLU) domain.

Overview of grammar-test construction

Bachman and Palmer (1996) organize test development into three stages: design, operationalization and administration. I will discuss each of these stages in the process of describing grammar-test development.

Stage 1: Design

The design stage of test development involves the accumulation of information and making initial decisions about the entire test process. In tests involving one class, this may be a relatively informal process; however, in tests involving wider audiences, such as a joint final exam or a placement test, the decisions about test development must be discussed and negotiated with several stakeholders. The outcome of the design stage is a design statement. According to Bachman and Palmer (1996, p. 88), this document should contain the following components:

a description of the purpose(s) of the test,
a description of the TLU domains and task types,
a description of the test-takers,
a definition of the construct(s) to be measured,
a plan for evaluating test usefulness, and
a plan for dealing with resources.

Stage 2: Operationalization

The operationalization stage of grammar-test development describes how an entire test involving several grammar tasks is assembled, and how the individual tasks are specified, written and scored.

Specifying the scoring method
Scoring selected-response tasks
Scoring extended-production tasks
Using scoring rubrics
Grading

Stage 3: Test administration and analysis

The final stage in the process of developing grammar tests involves the administration of the test to individual students or small groups, and then to a large group of examinees on a trial basis.

Illustrative tests of grammatical ability

The First Certificate in English Language Test (FCE)

Given the assessment purposes and the intended uses of the FCE, the FCE grammar assessments privilege construct validity, authenticity, interactiveness and impact. This is done by the way the construct of grammatical ability is defined. This is also done by the ways in which these abilities are tapped into, and the ways in which the task characteristics are likely to engage the examinee in using grammatical knowledge and other components of language ability in processing input to formulate responses. Finally, this is done by the way in which Cambridge ESOL has promoted public understanding of the FCE, its purpose and procedures, and has made available certain kinds of information on the test. These qualities may, however, have been stressed at the expense of reliability.

The Comprehensive English Language Test (CELT)

In terms of the purposes and intended uses of the CELT, the authors explicitly stated, ‘the CELT is designed to provide a series of reliable and easy-to-administer tests for measuring English language ability of nonnative speakers’ (Harris and Palmer, 1970b, p. 1). As a result, concerns for high reliability and ease of administration led the authors to make choices privileging reliability and practicality over other qualities of test usefulness. To maximize consistency of measurement, the authors used only selected-response task types throughout the test, allowing for minimal fluctuations in the scores due to characteristics of the test method. This allowed them to adopt ‘easy-to-administer’ and ‘easy-to-score’ procedures for maximum practicality and reliability. Reliability was also enhanced by pre-testing items with the goal of improving their psychometric characteristics.
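Pre-testing items to improve their psychometric characteristics usually involves classical item analysis. The source does not say which statistics the CELT authors used, so the Python sketch below is an assumption. It computes item facility (the proportion of test-takers who answer an item correctly) and an upper-lower discrimination index, using the conventional top and bottom 27% of test-takers ranked by total score. Responses are scored 0/1, with one row per test-taker.

```python
def item_facility(responses: list[list[int]], item: int) -> float:
    """Proportion of test-takers who answered `item` correctly (0/1 scoring)."""
    if not responses:
        raise ValueError("need at least one test-taker")
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses: list[list[int]], item: int,
                        group_fraction: float = 0.27) -> float:
    """Upper-lower discrimination index for one item.

    Test-takers are ranked by total score. The index is the item's facility
    in the top group minus its facility in the bottom group, so values near
    0 (or negative) flag items that do not separate stronger from weaker
    test-takers.
    """
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    n = max(1, round(len(ranked) * group_fraction))    # size of each group
    upper, lower = ranked[:n], ranked[-n:]
    return item_facility(upper, item) - item_facility(lower, item)
```

An item that almost everyone answers correctly (facility close to 1), or that weaker test-takers answer correctly as often as stronger ones (discrimination near 0), would be a candidate for revision before the test is used operationally.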

Reliability might have been emphasized at the expense of other important test qualities, such as construct validity, authenticity, interactiveness and impact. For example, construct validity was severely compromised by the mismatch among the purpose of the test, the way the construct was defined and the types of tasks used to operationalize the constructs. In short, scores from discrete-point grammar tasks were used to make inferences about speaking ability rather than to make interpretations about the test-takers’ explicit grammatical knowledge.

Finally, authenticity in the CELT was low due to the exclusive use of multiple-choice tasks and the lack of correspondence between these tasks and those one might encounter in the target language use domain. Interactiveness was also low due to the test’s inability to fully involve the test-takers’ grammatical ability in performing the tests. The impact of the CELT on stakeholders is not documented in the published manual.

In all fairness, the CELT was a product of its time, when emphasis was on discrete-point testing and reliability, and when language testers were not yet discussing qualities of test usefulness in terms of authenticity, interactiveness and impact.

The Community English Program (CEP) Placement Test

Given the purposes and the intended uses of the CEP Placement Test, the grammar section privileges authenticity, construct validity, reliability and practicality. As in instruction, the theme-based test tasks all support the same overarching theme, presented from different perspectives. The construct of grammatical knowledge is then defined in terms of the grammar used to express the theme. Given the multiple-choice format and the piloting of items, reliability is an important concern. Finally, the multiple-choice format is used instead of a limited-production format to maximize practicality. This compromise certainly comes at the expense of construct validity and task authenticity.

Nonetheless, grammatical ability is also measured in the writing and speaking parts of the CEP Placement Test. These sections privilege construct validity, reliability, authenticity and interactiveness. In these tasks, students are asked to use grammatical resources to write about and discuss the theme they have been learning about during the test. In both the writing and speaking sections, grammatical ability is a separately scored part of the scoring rubric, and definitions of grammatical knowledge are derived from theory and from an examination of benchmark samples. Reliability is addressed by scoring all writing and speaking performance samples ‘blind’ by two raters. In terms of authenticity and interactiveness, these test sections seek to establish a strong correspondence between the test tasks and the type of tasks encountered in theme-based language instruction: examinees listen to texts in which the theme is presented, they learn new grammar and use it to express ideas related to the theme, and they then read, write and speak about the theme. The writing and speaking sections require examinees to engage both language and topical knowledge to complete the tasks. In both cases, grammatical control and topical control are scored separately. Finally, while these test sections prioritize construct validity, reliability, authenticity and interactiveness, this certainly comes at the expense of practicality and impact.
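Blind double rating only supports reliability if the two sets of ratings are actually compared. The source does not describe how the CEP does this, so the Python sketch below is an assumption throughout. It takes two raters' scores for the same samples on a hypothetical 1 to 5 rubric band and reports exact and adjacent agreement and the correlation between raters. It also flags samples where the raters are more than one band apart, a hypothetical rule for sending a sample to a third rater, and averages the two ratings. Because grammatical control and topical control are scored separately, the same checks would be run once for each dimension.

```python
from statistics import mean


def agreement(r1: list[int], r2: list[int]) -> tuple[float, float]:
    """Exact and adjacent (within one band) agreement between two raters."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("both raters must score the same, non-empty set of samples")
    pairs = list(zip(r1, r2))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent


def pearson(r1: list[int], r2: list[int]) -> float:
    """Pearson correlation between the raters' scores.

    Undefined (division by zero) if either rater gives every sample the same score.
    """
    m1, m2 = mean(r1), mean(r2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(r1, r2))
    var1 = sum((a - m1) ** 2 for a in r1)
    var2 = sum((b - m2) ** 2 for b in r2)
    return cov / (var1 * var2) ** 0.5


def needs_third_rating(r1: list[int], r2: list[int], max_gap: int = 1) -> list[int]:
    """Indices of samples where the raters differ by more than max_gap bands."""
    return [i for i, (a, b) in enumerate(zip(r1, r2)) if abs(a - b) > max_gap]


def final_scores(r1: list[int], r2: list[int]) -> list[float]:
    """Average of the two independent ratings for each sample."""
    return [(a + b) / 2 for a, b in zip(r1, r2)]
```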

Learning-Oriented Assessments of Grammatical Ability

What is learning-oriented assessment of grammar?

Alternative assessment emphasizes an alternative to and rejection of selected-response, timed and one-shot approaches to assessment, whether they occur in large-scale or classroom assessment contexts. Alternative assessment encourages assessments in which students are asked to perform, create, produce or do meaningful tasks that both tap into higher-level thinking (e.g., problem-solving) and have real-world implications (Herman et al., 1992). Alternative assessments are scored by humans, not machines.

Similar to alternative assessment, authentic assessment stresses measurement practices which engage students’ knowledge and skills in ways similar to those one can observe while performing some real-life or ‘authentic’ task (O’Malley and Valdez-Pierce, 1996). It also encourages tasks that require students to perform some complex, extended-production activity, and emphasizes the need for assessment to be strictly aligned with classroom goals, curricula and instruction. Self-assessment is considered a key component of this approach.

Performance assessment refers to the evaluation of outcomes relevant to a domain of interest (e.g., grammatical ability), which are derived from the observation of students performing complex tasks that invoke real-world applications (Norris et al., 1998). As with most performance data, assessments are scored by human judges (Stiggins, 1987; Herman et al., 1992; Brown, 1998) according to a scoring rubric that describes what test-takers need to do in order to demonstrate knowledge or ability at a given performance level. Bachman (2002) characterized language performance assessment as typically: (1) involving more complex constructs than those measured in selected-response tasks; (2) utilizing more complex and authentic tasks; and (3) fostering greater interactions between the characteristics of the test-takers and the characteristics of the assessment tasks than in other types of assessments. Performance assessment encourages self-assessment by making explicit the performance criteria in a scoring rubric. In this way, students can then use the criteria to evaluate their performance and contribute proactively to their own learning.

Challenges and new directions in assessing grammatical ability

Challenge 1: Defining grammatical ability

One major challenge revolves around how grammatical ability has been defined both theoretically and operationally in language testing. As we saw in Chapters 3 and 4, in the 1960s and 1970s language teaching and language testing maintained a strong syntactocentric view of language rooted largely in linguistic structuralism. Moreover, models of language ability, such as those proposed by Lado (1961) and Carroll (1961), had a clear linguistic focus, and assessment concentrated on measuring language elements, defined in terms of morphosyntactic forms on the sentence level, while performing language skills. Grammatical knowledge was determined solely in terms of linguistic accuracy. This approach to testing led to examinations such as the CELT (Harris and Palmer, 1970a) and the English Proficiency Test battery (Davies, 1964).

Challenge 2: Scoring grammatical ability

A second challenge relates to scoring, as the specification of both form and meaning is likely to influence the ways in which grammar assessments are scored. As we discussed in Chapter 6, responses with multiple criteria for correctness may necessitate different scoring procedures. For example, the use of dichotomous scoring, even with certain selected-response items, might need to give way to partial-credit scoring, since some wrong answers may reflect partial development either in form or meaning. As a result, language educators might need to adapt their scoring procedures to reflect the two dimensions of grammatical knowledge. This might, in turn, require the use of measurement models that can accommodate both dichotomous and partial-credit data in calculating and analyzing test scores. Then, in scoring extended-production tasks for both form and meaning, descriptors on scoring rubrics might need to be adapted to reflect graded performance in the two dimensions of grammatical knowledge more clearly. It should also be noted that more complex scoring procedures will affect the resources it takes to mark responses or to program machine-scoring devices. They will also require a closer examination (and, hopefully, ongoing research) of how a wrong answer may be a reflection of interlanguage development. However, successfully meeting these challenges could provide a more valid assessment of test-takers’ underlying grammatical ability.
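The difference between dichotomous and partial-credit scoring is easiest to see on a handful of responses. The Python sketch below assumes a hypothetical scheme in which each response is judged separately for form and for meaning and, under partial credit, earns one point for each. The chapter discusses scoring along these two dimensions, but the point values, class and function names here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """A judgement of one response on the two dimensions of grammatical knowledge."""
    form_ok: bool     # the grammatical form is accurate
    meaning_ok: bool  # the intended meaning is conveyed


def dichotomous(j: Judgement) -> int:
    """Right/wrong scoring: 1 only if both form and meaning are correct."""
    return int(j.form_ok and j.meaning_ok)


def partial_credit(j: Judgement) -> int:
    """0-2 scoring: one point for form plus one point for meaning."""
    return int(j.form_ok) + int(j.meaning_ok)


def score_profile(judgements: list[Judgement]) -> dict[str, int]:
    """Totals under both schemes, plus separate form and meaning subscores."""
    return {
        "dichotomous": sum(dichotomous(j) for j in judgements),
        "partial_credit": sum(partial_credit(j) for j in judgements),
        "form": sum(int(j.form_ok) for j in judgements),
        "meaning": sum(int(j.meaning_ok) for j in judgements),
    }
```

On four responses judged (right, right), (right, wrong), (wrong, right) and (wrong, wrong), right/wrong scoring gives 1 point out of 4. Partial credit gives 4 out of 8 and shows that the two half-right answers reflect partial development in different dimensions.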

Challenge 3: Assessing meanings

The third challenge revolves around ‘meaning’ and how ‘meaning’ in a model of communicative language ability can be defined and assessed. The ‘communicative’ in communicative language teaching, communicative language testing, communicative language ability, or communicative competence refers to the conveyance of ideas, information, feelings, attitudes and other intangible meanings (e.g., social status) through language. Therefore, while the grammatical resources used to communicate these meanings precisely are important, the notion of meaning conveyance in the communicative curriculum is critical. Thus, in order to test something as intangible as meaning in second or foreign language use, we need to define what it is we are testing.

Challenge 4: Reconsidering grammar-test tasks

The fourth challenge relates to the design of test tasks that are capable of both measuring grammatical ability and providing authentic and engaging measures of grammatical performance. Since the early 1960s, language educators have associated grammar tests with discrete-point, multiple-choice tests of grammatical form. These and other ‘traditional’ test tasks (e.g., grammaticality judgments) have been severely criticized for lacking in authenticity, for not engaging test-takers in language use, and for promoting behaviors that are not readily consistent with communicative language teaching. Discrete-point testing methods may have even led some teachers to have reservations about testing grammar or to have uncertainties about how to test it communicatively.

Challenge 5: Assessing the development of grammatical ability

The fifth challenge revolves around the argument, made by some researchers, that grammatical assessments should be constructed, scored and interpreted with developmental proficiency levels in mind. This notion stems from the work of several SLA researchers (e.g., Clahsen, 1985; Pienemann and Johnson, 1987; Ellis, 2001b) who maintain that the principal finding from years of SLA research is that structures appear to be acquired in a fixed order and a fixed developmental sequence. Furthermore, instruction on forms in non-contiguous stages appears to be ineffective. As a result, they argue, the acquisitional development of learners should be a major consideration in L2 grammar testing.

Source:

Purpura, James. 2004. Assessing Grammar. Cambridge: Cambridge University Press.

ASSESSING VOCABULARY

The place of vocabulary in language assessment

Recent trends in language testing

Scholars in the field of language testing have a rather different perspective on vocabulary-test items of the conventional kind. Such items fit neatly into what language testers call the discrete-point approach to testing. This involves designing tests to assess whether learners have knowledge of particular structural elements of the language: word meanings, word forms, sentence patterns, sound contrasts and so on. In the last thirty years of the twentieth century, language testers progressively moved away from this approach, to the extent that such tests are now quite out of step with current thinking about how to design language tests, especially for proficiency assessment.

A number of criticisms can be made of discrete-point vocabulary tests.

It is difficult to make any general statement about a learner's vocabulary on the basis of scores in such a test.
Being proficient in a second language is not just a matter of knowing a lot of words (or grammar rules, for that matter) but of being able to exploit that knowledge effectively for various communicative purposes. Learners can build up an impressive knowledge of vocabulary (as reflected in high test scores) and yet be incapable of understanding a radio news broadcast or asking for assistance at an enquiry counter.
Learners need to show that they can use words appropriately in their own speech and writing, rather than just demonstrating that they understand what a word can mean.
In normal language use, words do not occur by themselves or in isolated sentences but as integrated elements of whole texts and discourse. They belong in specific conversations, jokes, stories, letters, textbooks, legal proceedings, newspaper advertisements and so on. And the way that we interpret a word is significantly influenced by the context in which it occurs.
In communication situations, it is quite possible to compensate for lack of knowledge of particular words. We all know learners who are remarkably adept at getting their message across by making the best use of limited lexical resources. Readers do not have to understand every word in order to extract meaning from a text satisfactorily. Some words can be ignored, while the meaning of others can be guessed by using contextual clues, background knowledge of the subject matter and so on. Listeners can use similar strategies, as well as seeking clarification, asking for a repetition and checking that they have interpreted the message correctly.

Three dimensions of vocabulary assessment

Up to this point, I have outlined two contrasting perspectives on the role of vocabulary in language assessment. One point of view is that it is perfectly sensible to write tests that measure whether learners know the meaning and usage of a set of words, taken as independent semantic units. The other view is that vocabulary must always be assessed in the context of a language-use task, where it interacts in a natural way with other components of language knowledge. To some extent, the two views are complementary in that they relate to different purposes of assessment. Conventional vocabulary tests are most likely to be used by classroom teachers for assessing progress in vocabulary learning and diagnosing areas of weakness. Other users of these tests are researchers in second language acquisition with a special interest in how learners develop their knowledge of, and ability to use, target-language words. On the other hand, researchers in language testing and those who undertake large testing projects tend to be more concerned with the design of tests that assess learners' achievement or proficiency on a broader scale. For such purposes, vocabulary knowledge has a lower profile, except to the extent that it contributes to, or detracts from, the performance of communicative tasks.

Discrete — embedded

The first dimension focuses on the construct which underlies the assessment instrument. In language testing, the term construct refers to the mental attribute or ability that a test is designed to measure. A discrete test takes vocabulary knowledge as a distinct construct, separated from other components of language competence. Most existing vocabulary tests are designed on the assumption that it is meaningful to treat vocabulary knowledge as an independent construct for assessment purposes, and they can thus be classified as discrete measures in the sense that I am defining it here. In contrast, an embedded vocabulary measure is one that contributes to the assessment of a larger construct. I have already given an example of such a measure.

Selective — comprehensive

The second dimension concerns the range of vocabulary to be included in the assessment. A conventional vocabulary test is based on a set of target words selected by the test-writer, and the test-takers are assessed according to how well they demonstrate their knowledge of the meaning or use of those words. This is what I call a selective vocabulary measure.

Context-independent — context-dependent

The role of context, which is an old issue in vocabulary testing, is the basis for the third dimension. Traditionally, contextualization has meant that a word is presented to test-takers in a sentence rather than as an isolated element. From a contemporary perspective, it is necessary to broaden the notion of context to include whole texts and, more generally, discourse.

Vocabulary tests: four case studies

The Vocabulary Levels Test

The Vocabulary Levels Test was devised by Paul Nation at Victoria University of Wellington in New Zealand in the early 1980s as a simple instrument for classroom use by teachers, to help them develop a suitable vocabulary teaching and learning programme for their students. He has distributed copies freely and made it available in two publications (Nation, 1983; 1990), and it has been widely used in New Zealand and many other countries. It has proved to be a useful tool for diagnostic vocabulary testing of migrant or international students when they first arrive at a secondary school in an English-speaking country. Moreover, in the absence of any more sophisticated measure, it has been used by researchers who needed an estimate of the vocabulary size of their non-native-speaking subjects. Meara calls it the ‘nearest thing we have to a standard test in vocabulary’ (1996a: 38). Thus, it is certainly a test that deserves attention in a book on vocabulary assessment.

The Eurocentres Vocabulary Size Test

Like the Vocabulary Levels Test, the Eurocentres Vocabulary Size Test (EVST) makes an estimate of a learner's vocabulary size using a graded sample of words covering numerous frequency levels. However, there are several differences in the way that the two tests are designed and so it is worthwhile to look at the EVST in some detail as well. As I noted in Chapter 4, the EVST is a checklist test which presents learners with a series of words and simply requires them to indicate whether they know each one or not. It includes a substantial proportion of non-words to provide a basis for adjusting the test-takers' scores if they appear to be overstating their vocabulary knowledge. Another distinctive feature of the EVST is that it is administered by computer rather than as a pen-and-paper test. Let us now look at the test from two perspectives: first as a placement instrument and then as a measure of vocabulary size.

The EVST as a measure of vocabulary size

If the Eurocentres test is to have a wider application than just as a placement tool for language schools, we also need to consider its validity as a measure of vocabulary size, and for this we should look into various aspects of its design:

The format of the test and, in particular, the role of the non-words;
The selection of the words to be tested; and
The scoring of the test.
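The summary does not give the EVST's actual scoring formula, so the following is only a sketch under stated assumptions. It applies one correction that has been used in yes/no checklist research, (h - f) / (1 - f), where h is the proportion of real words a test-taker claims to know and f is the proportion of non-words claimed. The corrected proportion is then scaled up to a hypothetical word pool to give a rough size estimate. The function names and the pool size are illustrative and are not part of the EVST itself.

```python
def adjusted_score(real_checked: int, real_total: int,
                   nonwords_checked: int, nonwords_total: int) -> float:
    """Correct the proportion of real words 'known' for overclaiming.

    Assumption: uses the correction (h - f) / (1 - f); this is NOT
    presented as the documented EVST formula.
    """
    h = real_checked / real_total          # hit rate on real words
    f = nonwords_checked / nonwords_total  # false-alarm rate on non-words
    if f >= 1.0:
        return 0.0  # every non-word was claimed: no usable evidence
    # Clip at zero when the test-taker claims more non-words than real words.
    return max(0.0, (h - f) / (1.0 - f))


def estimate_vocab_size(adjusted: float, pool_size: int) -> int:
    """Extrapolate from the sampled words to the (hypothetical) pool they represent."""
    return round(adjusted * pool_size)
```

For instance, a test-taker who checks 48 of 60 real words (h = 0.8) and 8 of 40 non-words (f = 0.2) gets an adjusted proportion of (0.8 - 0.2) / 0.8 = 0.75, which against a hypothetical 10,000-word pool would suggest a vocabulary of about 7,500 words. Without the non-words, the same test-taker's raw claim of 80% would go unchecked.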

The Vocabulary Knowledge Scale

Paribakht and Wesche developed the Vocabulary Knowledge Scale (VKS) for use in their research on incidental vocabulary acquisition. The instrument is of interest not only as a test in its own right but also as a way of exploring some issues that arise in any attempt to measure quality of vocabulary knowledge in a practical manner.

The Test of English as a Foreign Language

Our fourth case study involves one of the major language tests in the world today. The Test of English as a Foreign Language, or TOEFL, is administered in 180 countries and territories to more than 900,000 candidates. As one might expect of a test with such impressive vital statistics, this is an American invention, one of a whole range of tests covering many spheres of education and employment in the United States that are administered by the Educational Testing Service (ETS) of Princeton, New Jersey. Like other ETS tests, TOEFL relies on sophisticated statistical analyses and testing technology in order to ensure its quality as a measuring instrument and its efficient administration to such large numbers of test-takers. The whole edifice, though, has been built on a simple building block: the multiple-choice item. Until recently, all the items in the basic TOEFL test have been of this type. The exclusive use of the multiple-choice format has been one source of criticism of the test by language teachers, because it has limited the aspects of language proficiency that could be assessed by the test. Consequently, it has been seen as having a very negative washback effect, in the sense that learners preparing to take it have often focused narrowly on test-taking skills at the expense of developing a wider range of academic study skills.

The primary purpose of TOEFL is to assess whether foreign students planning to study in a tertiary institution where English is the medium of instruction have a sufficient level of proficiency in the language to be able to undertake their academic studies without being hampered by language-related difficulties. Thus, students from non-English-speaking countries applying for admission to North American colleges and universities normally take the test in their own country some time in advance, and their scores help to determine whether they will be admitted and whether they will be required to take further ESL courses once they arrive on campus. Apart from university admissions officers, certain employers and professional bodies also use TOEFL scores as a basis for deciding whether foreign-trained professionals, such as doctors, are proficient enough in the language to practise their skills in an English-speaking environment. This means that the test has an important gate-keeping role, in that it can influence a person's future prospects for education and employment, and therefore intending candidates take the test very seriously. A whole industry for TOEFL preparation has grown up in many countries to provide candidates with a wide range of practice materials and intensive coaching in test-taking techniques.

From the viewpoint of vocabulary assessment, the history of the TOEFL programme represents a fascinating case study of how approaches to testing changed in the latter part of the twentieth century. In particular, vocabulary testing has become progressively more embedded and context-dependent as a result of successive revisions of the test battery during that period. Thus, we need to trace the development of the test from the early 1960s to the present to see how and why the changes occurred.

Sources:

Read, John. 2000. Assessing Vocabulary. Cambridge, United Kingdom: Cambridge University Press.

Friday, 08 May 2020

SUMMARY ASSESSING READING AND WRITING

ASSESSING READING

In foreign language learning, reading is likewise a skill that teachers simply expect learners to acquire. Basic, beginning-level textbooks in a foreign language presuppose a student's reading ability, if only because a book is the medium. Most formal tests use the written word as a stimulus for test-taker response; even oral interviews may require reading performance for certain tasks. For learners of English, two primary hurdles must be cleared in order to become efficient readers. First, they need to be able to master fundamental bottom-up strategies for processing separate letters, words, and phrases, as well as top-down, conceptually driven strategies for comprehension. Second, as part of that top-down approach, second language readers must develop appropriate content and formal schemata (background information and cultural experience) to carry out those interpretations effectively.

The assessment of reading ability does not end with the measurement of comprehension. Strategic pathways to full understanding are often important factors to include in assessing learners, especially in the case of most classroom assessments that are formative in nature. As we consider a number of different types or genres of written texts, the components of reading ability, and specific tasks that are commonly used in the assessment of reading, let's not forget the unobservable nature of reading. Like listening, one cannot see the process of reading, nor can one observe a specific product of reading.

TYPES (GENRES) OF READING

Each type or genre of written text has its own set of governing rules and conventions. A reader must be able to anticipate those conventions in order to process meaning efficiently. Efficient readers also have to know what their purpose is in reading a text, the strategies for accomplishing that purpose, and how to retain the information. The content validity of an assessment procedure is largely established through the genre of a text. For example, if learners in a program of English for tourism have been learning how to deal with customers needing to arrange bus tours, then assessments of their ability should include guidebooks, maps, transportation schedules, calendars, and other relevant texts.

MICROSKILLS, MACROSKILLS, AND STRATEGIES FOR READING

The micro- and macroskills below represent the spectrum of possibilities for objectives in the assessment of reading comprehension.

Micro- and macroskills for reading comprehension

Microskills

Discriminate among the distinctive graphemes and orthographic patterns of English.
Retain chunks of language of different lengths in short-term memory.
Process writing at an efficient rate of speed to suit the purpose.
Recognize a core of words, and interpret word order patterns and their significance.
Recognize grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), patterns, rules, and elliptical forms.
Recognize that a particular meaning may be expressed in different grammatical forms.
Recognize cohesive devices in written discourse and their role in signaling the relationship between and among clauses.

Macroskills

Recognize the rhetorical forms of written discourse and their significance for interpretation.
Recognize the communicative functions of written texts, according to form and purpose.
Infer context that is not explicit by using background knowledge.
From described events, ideas, etc., infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
Distinguish between literal and implied meanings.
Detect culturally specific references and interpret them in a context of the appropriate cultural schemata.
Develop and use a battery of reading strategies, such as scanning and skimming, detecting discourse markers, guessing the meaning of words from context, and activating schemata for the interpretation of texts.

TYPES OF READING

For the purpose of considering assessment procedures, several types of reading performance are typically identified, and these will serve as organizers of various assessment tasks. In keeping with the set of categories specified for listening comprehension, similar specifications are offered here, except with some differing terminology to capture the uniqueness of reading.

Perceptive. Perceptive reading tasks involve attending to the components of larger stretches of discourse: letters, words, punctuation, and other graphemic symbols.
Selective. This category is largely an artifact of assessment formats. In order to ascertain one's recognition of lexical, grammatical, or discourse features of language within a very short stretch of language, typical tasks such as picture-cued items, matching, and multiple-choice are used.
Interactive. Included among interactive reading types are stretches of language of several paragraphs to one page or more in which the reader must, in a psycholinguistic sense, interact with the text.
Extensive. Extensive reading, as discussed in this book, applies to texts of more than a page, up to and including professional articles, essays, technical reports, short stories, and books. (It should be noted that reading research commonly uses "extensive reading" to refer to longer stretches of discourse, such as long articles and books that are usually read outside the classroom.)

DESIGNING ASSESSMENT TASKS: PERCEPTIVE READING

Reading Aloud

The test-taker sees separate letters, words, and/or short sentences and reads them aloud, one by one, in the presence of an administrator. Since the assessment is of reading comprehension, any recognizable oral approximation of the target response is considered correct.

Written Response

The same stimuli are presented, and the test-taker's task is to reproduce the probe in writing. Because of the transfer across different skills here, evaluation of the test-taker's response must be carefully treated. If an error occurs, make sure you determine its source; what might be assumed to be a writing error, for example, may actually be a reading error, and vice versa.

Multiple-Choice

Multiple-choice responses are not only a matter of choosing one of four or five possible answers. Other formats, some of which are especially useful at the low levels of reading, include same/different, circle the answer, true/false, choose the letter, and matching.

Picture-Cued Items

Test-takers are shown a picture, such as the one on the next page, along with a written text and are given one of a number of possible tasks to perform.

DESIGNING ASSESSMENT TASKS: SELECTIVE READING

Multiple-Choice (for Form-Focused Criteria)

By far the most popular method of testing a reading knowledge of vocabulary and grammar is the multiple-choice format, mainly for reasons of practicality: it is easy to administer and can be scored quickly.

Matching Tasks

The most frequently appearing criterion in matching procedures is vocabulary. Matching tasks have the advantage of offering an alternative to traditional multiple-choice or fill-in-the-blank formats and are sometimes easier to construct than multiple-choice items, as long as the test designer has chosen the matches carefully. Some disadvantages do come with this framework, however. Matching tasks can become more of a puzzle-solving process than a genuine test of comprehension as test-takers struggle with the search for a match, possibly among 10 or 20 different items. Like other tasks in this section, they are also contrived exercises, endemic to academia, that will seldom be found in the real world.

Editing Tasks

Editing for grammatical or rhetorical errors is a widely used test method for assessing linguistic competence in reading. The TOEFL® and many other tests employ this technique with the argument that it not only focuses on grammar but also introduces a simulation of the authentic task of editing, or discerning errors in written passages. Its authenticity may be supported if you consider proofreading as a real-world skill that is being tested.

Picture-Cued Tasks

Several types of picture-cued methods are commonly used.

Test-takers read a sentence or passage and choose one of four pictures that is being described. The sentence (or sentences) at this level is more complex.
Test-takers read a series of sentences or definitions, each describing a labeled part of a picture or diagram. Their task is to identify each labeled item.

Gap-Filling Tasks

The obvious disadvantage of this type of task is its questionable assessment of reading ability. The task requires both reading and writing performance, thereby rendering it of low validity in isolating reading as the sole criterion. Another drawback is scoring the variety of creative responses that are likely to appear. You will have to make a number of judgment calls on what comprises a correct response.

DESIGNING ASSESSMENT TASKS: INTERACTIVE READING

Cloze Tasks

One of the most popular types of reading assessment task is the cloze procedure.

The word cloze was coined by educational psychologists to capture the Gestalt psychological concept of "closure," that is, the ability to fill in gaps in an incomplete image (visual, auditory, or cognitive) and supply omitted details from background schemata. Cloze tests are usually a minimum of two paragraphs in length in order to account for discourse expectancies. They can be constructed relatively easily as long as the specifications for choosing deletions and for scoring are clearly defined. Typically every seventh word (plus or minus two) is deleted (known as fixed-ratio deletion), but many cloze test designers instead use a rational deletion procedure, choosing deletions according to the grammatical or discourse functions of the words. Rational deletion also allows the designer to avoid deleting words that would be difficult to predict from the context.
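Fixed-ratio deletion is mechanical enough to sketch in a few lines. The Python below is a minimal illustration, not a standard tool: it blanks out every nth word (n = 7 by default, in line with the "every seventh word, plus or minus two" guideline), keeps any trailing punctuation in the passage, and returns an answer key for exact word scoring. It does not attempt rational deletion, which depends on the designer's judgment about each word's grammatical or discourse function. The function name make_cloze is hypothetical.

```python
import string


def make_cloze(text: str, n: int = 7) -> tuple[str, list[str]]:
    """Fixed-ratio deletion: replace every nth word with a blank.

    Returns the gapped passage and the deleted words (the answer key),
    in order. Trailing punctuation stays in the passage.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    words = text.split()
    answers = []
    # Indices n-1, 2n-1, 3n-1, ... are the nth, 2nth, 3nth words.
    for i in range(n - 1, len(words), n):
        core = words[i].rstrip(string.punctuation)
        trailing = words[i][len(core):]
        answers.append(core)
        words[i] = "_____" + trailing
    return " ".join(words), answers
```

Changing n to 5 or 9 gives the denser or sparser deletion ratios the guideline allows; a denser ratio makes the passage harder because fewer context words surround each gap.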

Two approaches to the scoring of cloze tests are commonly used. The exact word method gives credit to test-takers only if they insert the exact word that was originally deleted. The second method, appropriate word scoring, credits the test-taker for supplying any word that is grammatically correct and that makes good sense in the context. For example, in a sentence describing a "gorgeous sunset" from which the adjective has been deleted, test-takers would get credit for supplying beautiful, amazing, or spectacular. The choice between the two methods of scoring is one of practicality/reliability vs. face validity.
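The difference between the two scoring methods can be made concrete with a small sketch. Because appropriate word scoring rests on a scorer's judgment, the code assumes that judgment has already been written down as a list of acceptable words for each gap; those lists and the function names are hypothetical, and matching is case-insensitive by assumption.

```python
def score_exact(responses: list[str], key: list[str]) -> int:
    """Exact word method: credit only the word that was originally deleted."""
    if len(responses) != len(key):
        raise ValueError("one response per gap is required")
    # Each True counts as 1 when summed.
    return sum(
        r.strip().lower() == k.strip().lower()
        for r, k in zip(responses, key)
    )


def score_appropriate(responses: list[str], acceptable: list[set[str]]) -> int:
    """Appropriate word method: credit any word on that gap's acceptable list.

    The acceptable lists stand in for the scorer's judgment and must be
    prepared in advance by the teacher (hypothetical input).
    """
    if len(responses) != len(acceptable):
        raise ValueError("one response per gap is required")
    return sum(
        r.strip().lower() in {w.lower() for w in ok}
        for r, ok in zip(responses, acceptable)
    )
```

With a key of ["beautiful", "sky"], acceptable lists of {"beautiful", "amazing", "spectacular", "gorgeous"} and {"sky"}, and responses ["amazing", "sky"], exact word scoring gives 1 while appropriate word scoring gives 2. The exact method needs no judgment at all, which is where its practicality and reliability come from.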

Impromptu Reading Plus Comprehension Questions

A TOEFL-style set of questions based on a 250-word passage might cover the comprehension of these features:

main idea (topic)
expressions/idioms/phrases in context
inference (implied detail)
grammatical features
detail (scanning for a specifically stated detail)
excluding facts not written (unstated details)
supporting idea(s)
vocabulary in context

To construct your own assessments that involve short reading passages followed by questions, you can begin with TOEFL-like specs as a basis. Your focus in your own classroom will determine which of these (and possibly other) specifications you will include in your assessment procedure, how you will frame questions, and how much weight you will give each item in scoring.

Short-Answer Tasks

A reading passage is presented, and the test-taker reads questions that must be answered in a sentence or two. Questions might cover the same specifications indicated above for the TOEFL reading, but be worded in question form. For example, in a passage on the future of airline travel, the following questions might appear:

Open-ended reading comprehension questions

What do you think the main idea of this passage is?
What would you infer from the passage about the future of air travel?
In line 6 the word sensation is used. From the context, what do you think this word means?
What two ideas did the writer suggest for increasing airline business?
Why do you think the airlines have recently experienced a decline?

Editing (Longer Texts)

Several advantages are gained in the longer format. First, authenticity is increased. The likelihood that students in English classrooms will read connected prose of a page or two is greater than the likelihood of their encountering the contrived format of unconnected sentences. Second, the task simulates proofreading one's own essay, where it is imperative to find and correct errors. And third, if the test is connected to a specific curriculum (such as placement into one of several writing courses), the test designer can draw up specifications for a number of grammatical and rhetorical categories that match the content of the courses. Content validity is thereby supported, and along with it the face validity of a task in which students are willing to invest. Imao (2001, p. 185) was able to offer teachers a computer-generated breakdown of performance in the following categories:

Sentence structure
Verb tense
Noun/article features
Modal auxiliaries
Verb complements
Noun clauses
Adverb clauses
Conditionals
Logical connectors
Adjective clauses (including relative clauses)
Passives

Scanning

Scanning is a strategy used by all readers to find relevant information in a text. Assessment of scanning is carried out by presenting test-takers with a text (prose or something in a chart or graph format) and requiring rapid identification of relevant bits of information. Possible stimuli include

a one- to two-page news article,
an essay,
a chapter in a textbook,
a technical report,
a table or chart depicting some research findings,
a menu, and
an application form.

Among the variety of scanning objectives (for each of the genres named above), the test-taker must locate

a date, name, or place in an article;
the setting for a narrative or story;
the principal divisions of a chapter;
the principal research finding in a technical report;
a result reported in a specified cell in a table;
the cost of an item on a menu; and
specified data needed to fill out an application.

Ordering Tasks

Students always enjoy the activity of receiving little strips of paper, each with a sentence on it, and assembling them into a story, sometimes called the "strip story" technique. Variations on this can serve as an assessment of overall global understanding of a story and of the cohesive devices that signal the order of events or ideas. Alderson et al. (1995, p. 53) warn, however, against assuming that there is only one logical order.

Information Transfer: Reading Charts, Maps, Graphs, Diagrams

Every educated person must be able to comprehend charts, maps, graphs, calendars, diagrams, and the like. Converting such nonverbal input into comprehensible intake requires not only an understanding of the graphic and verbal conventions of the medium but also a linguistic ability to interpret that information to someone else. Reading a map implies understanding the conventions of map graphics, but it is often accompanied by telling someone where to turn, how far to go, etc. Scanning a menu requires an ability to understand the structure of most menus as well as the capacity to give an order when the time comes. Interpreting the numbers on a stock market report involves the interaction of understanding the numbers and of conveying that understanding to others. To comprehend information in this medium (hereafter referred to simply as "graphics"), learners must be able to

comprehend specific conventions of the various types of graphics;
comprehend labels, headings, numbers, and symbols;
comprehend the possible relationships among elements of the graphic; and
make inferences that are not presented overtly.

DESIGNING ASSESSMENT TASKS: EXTENSIVE READING

Extensive reading involves somewhat longer texts than we have been dealing with up to this point. Journal articles, technical reports, longer essays, short stories, and books fall into this category. The reason for placing such reading into a separate category is that reading of this type of discourse almost always involves a focus on meaning using mostly top-down processing, with only occasional use of a targeted bottom-up strategy. Before examining a few tasks that have proved to be useful in assessing extensive reading, it is essential to note that a number of the tasks described in previous categories can apply here. Among them are

impromptu reading plus comprehension questions,
short-answer tasks,
editing,
scanning,
ordering,
information transfer, and
interpretation (discussed under graphics).

Skimming Tasks

Skimming is the process of rapid coverage of reading matter to determine its gist or main idea. It is a prediction strategy used to give a reader a sense of the topic and purpose of a text, the organization of the text, the perspective or point of view of the writer, its ease or difficulty, and/or its usefulness to the reader.

Summarizing and Responding

Evaluating summaries is difficult. Imao (2001) used four criteria for the evaluation of a summary:

Criteria for assessing a summary (Imao, 2001, p. 184)

Expresses accurately the main idea and supporting ideas.
Is written in the student's own words; occasional vocabulary from the original text is acceptable.
Is logically organized.
Displays facility in the use of language to clearly express ideas in the text.

As you can readily see, a strict adherence to the criterion of assessing reading, and reading only, implies consideration of only the first factor; the other three pertain to writing performance. The first criterion is nevertheless a crucial factor; otherwise the reader-writer could pass all three of the other criteria with virtually no understanding of the text itself.

Note-Taking and Outlining

Finally, a reader's comprehension of extensive texts may be assessed through an evaluation of a process of note-taking and/or outlining. Because of the difficulty of controlling the conditions and time frame for both these techniques, they rest firmly in the category of informal assessment. Their utility is in the strategic training that learners gain in retaining information through marginal notes that highlight key information or organizational outlines that put supporting ideas into a visually manageable framework. A teacher, perhaps in one-on-one conferences with students, can use student notes/outlines as indicators of the presence or absence of effective reading strategies, and thereby point the learners in positive directions.

In his introduction to Alderson's (2000, p. xx) book on assessing reading, Lyle Bachman observed: "Reading, through which we can access worlds of ideas and feelings, as well as the knowledge of the ages and visions of the future, is at once the most extensively researched and the most enigmatic of the so-called language skills." It's the almost mysterious "psycholinguistic guessing game" (Goodman, 1970) of reading that poses the enigma. We still have much to learn about how people learn to read, and especially about how the brain accesses, stores, and recalls visually represented language. This chapter has illustrated a number of possibilities for assessment of reading across the continuum of skills, from basic letter/word recognition to the retention of meaning extracted from vast quantities of linguistic symbols. I hope it will spur you to go beyond the confines of these suggestions and create your own methods of assessing reading.

ASSESSING WRITING

In the field of second language teaching, only a half-century ago experts were saying that writing was primarily a convention for recording speech and for reinforcing grammatical and lexical features of language. Now we understand the uniqueness of writing as a skill with its own features and conventions. We also fully understand the difficulty of learning to write well in any language, even in our own native language. Every educated child in developed countries learns the rudiments of writing in his or her native language, but very few learn to express themselves clearly with logical, well-developed organization that accomplishes an intended purpose. And yet we expect second language learners to write coherent essays with artfully chosen rhetorical and discourse devices.

TYPES OF WRITING PERFORMANCE

Four categories of written performance that capture the range of written production are considered here. Each category resembles the categories defined for the other three skills, but these categories, as always, reflect the uniqueness of the skill area.

Imitative. To produce written language, the learner must attain skills in the fundamental, basic tasks of writing letters, words, punctuation, and very brief sentences.
Intensive (controlled). Beyond the fundamentals of imitative writing are skills in producing appropriate vocabulary within a context, collocations and idioms, and correct grammatical features up to the length of a sentence.
Responsive. Here, assessment tasks require learners to perform at a limited discourse level, connecting sentences into a paragraph and creating a logically connected sequence of two or three paragraphs.
Extensive. Extensive writing implies successful management of all the processes and strategies of writing for all purposes, up to the length of an essay, a term paper, a major research project report, or even a thesis.

MICRO- AND MACROSKILLS OF WRITING

Micro- and macroskills of writing

Microskills

Produce graphemes and orthographic patterns of English.
Produce writing at an efficient rate of speed to suit the purpose.
Produce an acceptable core of words and use appropriate word order patterns.
Use acceptable grammatical systems (e.g., tense, agreement, pluralization), patterns, and rules.
Express a particular meaning in different grammatical forms.
Use cohesive devices in written discourse.

Macroskills

Use the rhetorical forms and conventions of written discourse.
Appropriately accomplish the communicative functions of written texts according to form and purpose.
Convey links and connections between events, and communicate such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
Distinguish between literal and implied meanings when writing.
Correctly convey culturally specific references in the context of the written text.
Develop and use a battery of writing strategies, such as accurately assessing the audience's interpretation, using prewriting devices, writing with fluency in the first drafts, using paraphrases and synonyms, soliciting peer and instructor feedback, and using feedback for revising and editing.

DESIGNING ASSESSMENT TASKS: IMITATIVE WRITING

Tasks in [Hand] Writing Letters, Words, and Punctuation

First, a comment should be made on the increasing use of personal and laptop computers and handheld instruments for creating written symbols. A limited variety of types of tasks are commonly used to assess a person's ability to produce written letters and symbols. A few of the more common types are described here:

Copying. There is nothing innovative or modern about directing a test-taker to copy letters or words.
Listening cloze selection tasks. These tasks combine dictation with a written script that has a relatively frequent deletion ratio (every fourth or fifth word, perhaps).
Picture-cued tasks. Familiar pictures are displayed, and test-takers are told to write the word that the picture represents. Assuming no ambiguity in identifying the picture (cat, hat, chair, table, etc.), no reliance is made on aural comprehension for successful completion of the task.
Form completion tasks. A variation on pictures is the use of a simple form (registration, application, etc.) that asks for name, address, phone number, and other data. Assuming, of course, that prior classroom instruction has focused on filling out such forms, this task becomes an appropriate assessment of simple tasks such as writing one's name and address.
Converting numbers and abbreviations to words. Some tests have a section on which numbers are written (for example, hours of the day, dates, or schedules) and test-takers are directed to write out the numbers. This task can serve as a reasonably reliable method to stimulate handwritten English.

Spelling Tasks and Detecting Phoneme-Grapheme Correspondences

A number of task types are in popular use to assess the ability to spell words correctly and to process phoneme-grapheme correspondences:

Spelling tests.
Picture-cued tasks.
Multiple-choice techniques.
Matching phonetic symbols.

DESIGNING ASSESSMENT TASKS: INTENSIVE (CONTROLLED) WRITING

Dictation and Dicto-Comp

Dictation has already been described as an assessment of the integration of listening and writing, in which the primary skill being assessed is clearly listening. Because of its response mode, however, it deserves a second mention in this chapter. Dictation is simply the rendition in writing of what one hears aurally, so it could be classified as an imitative type of writing, especially since a proportion of the test-taker's performance centers on correct spelling. A form of controlled writing related to dictation is a dicto-comp.

Grammatical Transformation Tasks

In the heyday of structural paradigms of language teaching, with slot-filler techniques and slot substitution drills, the practice of making grammatical transformations, orally or in writing, was very popular. To this day, language teachers have also used this technique as an assessment task, ostensibly to measure grammatical competence. Numerous versions of the task are possible:

Change the tenses in a paragraph.
Change full forms of verbs to reduced forms (contractions).
Change statements to yes/no or wh-questions.
Change questions into statements.
Combine two sentences into one using a relative pronoun.
Change direct speech to indirect speech.
Change from active to passive voice.

Picture-Cued Tasks

The main advantage in this technique is in detaching the almost ubiquitous reading and writing connection and offering instead a nonverbal means to stimulate written responses.

Short sentences.
Picture description.
Picture sequence description.

Vocabulary Assessment Tasks

The major techniques used to assess vocabulary are (a) defining and (b) using a word in a sentence. The latter is the more authentic, but even that task is constrained by a contrived situation in which the test-taker, usually in a matter of seconds, has to come up with an appropriate sentence, which may not indicate that the test-taker "knows" the word. Vocabulary assessment is clearly form-focused in the above tasks, but the procedures are creatively linked by means of the target word, its collocations, and its morphological variants.

Ordering Tasks

One task at the sentence level may appeal to those who are fond of word games and puzzles: ordering (or reordering) a scrambled set of words into a correct sentence. Here is the way the item format appears.

Test-takers read:

Put the words below into the correct order to make a sentence:

1. cold / winter / is / weather / the / in / the

2. studying / what / you / are

3. next / clock / the / the / is / picture / to

Short-Answer and Sentence Completion Tasks

Some types of short-answer tasks were discussed in Chapter 8 because of the heavy participation of reading performance in their completion. Such items range from simple and predictable to somewhat more elaborate responses.

ISSUES IN ASSESSING RESPONSIVE AND EXTENSIVE WRITING

Responsive writing creates the opportunity for test-takers to offer an array of possible creative responses within a pedagogical or assessment framework: test-takers are "responding" to a prompt or assignment. Freed from the strict control of intensive writing, learners can exercise a number of options in choosing vocabulary, grammar, and discourse, but with some constraints and conditions. The learner is responsible for accomplishing a purpose in writing, for developing a sequence of connected ideas, and for empathizing with an audience. The genres of text that are typically addressed here are

short reports (with structured formats and conventions);
responses to the reading of an article or story;
summaries of articles or stories;
brief narratives or descriptions; and
interpretations of graphs, tables, and charts.

Both responsive and extensive writing tasks are the subject of some classic, widely debated assessment issues that take on a distinctly different flavor from those at the lower-end production of writing.

Authenticity. Authenticity is a trait that is given special attention: if test-takers are being asked to perform a task, its face and content validity need to be assured in order to bring out the best in the writer.
Scoring. Scoring is the thorniest issue at these final two stages of writing.
Time. Yet another assessment issue surrounds the unique nature of writing: it is the only skill in which the language producer is not necessarily constrained by time, which implies the freedom to process multiple drafts before the text becomes a finished product.

DESIGNING ASSESSMENT TASKS: RESPONSIVE AND EXTENSIVE WRITING

Paraphrasing

One of the more difficult concepts for second language learners to grasp is paraphrasing. The initial step in teaching paraphrasing is to ensure that learners understand why it matters: to say something in one's own words, to avoid plagiarizing, and to offer some variety in expression. With those possible motivations and purposes in mind, the test designer needs to elicit a paraphrase of a sentence or paragraph, usually nothing longer.

Guided Question and Answer

A variation on using guided questions is to prompt the test-taker to write from an outline. The outline may be self-created from earlier reading and/or discussion or, less desirably, be provided by the teacher or test administrator. The outline helps to guide the learner through a presumably logical development of ideas that have been given some forethought.

Paragraph Construction Tasks

The participation of reading performance is inevitable in writing effective paragraphs. To a great extent, writing is the art of emulating what one reads. You read an effective paragraph; you analyze the ingredients of its success; you emulate it. Assessment of paragraph development takes on a number of different forms:

1. Topic sentence writing. Assessment thereof consists of

specifying the writing of a topic sentence,
scoring points for its presence or absence, and
scoring and/or commenting on its effectiveness in stating the topic.

2. Topic development within a paragraph

Four criteria are commonly applied to assess the quality of a paragraph:

the clarity of expression of ideas
the logic of the sequence and connections
the cohesiveness or unity of the paragraph
the overall effectiveness or impact of the paragraph as a whole

3. Development of main and supporting ideas across paragraphs

These elements can be considered in evaluating a multi-paragraph essay:

addressing the topic, main idea, or principal purpose
organizing and developing supporting ideas
using appropriate details to undergird supporting ideas
showing facility and fluency in the use of language
demonstrating syntactic variety

Strategic Options

Developing main and supporting ideas is the goal for the writer attempting to create an effective text, whether a short one- to two-paragraph one or an extensive one of several pages. A number of strategies are commonly taught to second language writers to accomplish their purposes. Aside from strategies of free writing, outlining, drafting, and revising, writers need to be aware of the task that has been demanded and to focus on the genre of writing and the expectations of that genre.

Attending to task. In responsive writing, the context is seldom completely open-ended: a task has been defined by the teacher or test administrator, and the writer must fulfill the criterion of the task. Depending on the genre of the text, one or more task types will be needed to achieve the writer's purpose. If students are asked, for example, to "agree or disagree with the author's statement," a likely strategy would be to cite pros and cons and then take a stand.

Attending to genre. The genres of writing that were listed at the beginning of this chapter provide some sense of the many varieties of text that may be produced by a second language learner in a writing curriculum. Assessment of the more common genres may include the following criteria, along with chosen factors from the list in item #3 (main and supporting ideas) above:

Reports (Lab Reports, Project Summaries, Article/Book Reports, etc.)
Summaries of Readings/Lectures/Videos
Responses to Readings/Lectures/Videos
Narration, Description, Persuasion/Argument, and Exposition
Interpreting Statistical, Graphic, or Tabular Data
Library Research Paper

TEST OF WRITTEN ENGLISH (TWE®)

The Test of Written English (TWE) was established in 1986 and has gained a reputation as a well-respected measure of written English; a number of research articles support its validity (Frase et al., 1999; Hale et al., 1996; Longford, 1996; Myford et al., 1996). In 1998, a computer-delivered version of the TWE was incorporated into the standard computer-based TOEFL and simply labeled the "writing" section of the TOEFL. The TWE is still offered as a separate test, especially where only the paper-based TOEFL is available.

Test preparation manuals such as Deborah Phillips's Longman Introductory Course for the TOEFL Test (2001) advise TWE test-takers to follow six steps to maximize success on the test:

Carefully identify the topic.
Plan your supporting ideas.
In the introductory paragraph, restate the topic and state the organizational plan of the essay.
Write effective supporting paragraphs (show transitions, include a topic sentence, specify details).
Restate your position and summarize in the concluding paragraph.
Edit sentence structure and rhetorical expression.

It is important to put tests like the TWE in perspective. Timed impromptu tests have obvious limitations if you are looking for an authentic sample of performance in a real-world context.

How, then, does the Educational Testing Service justify the TWE as an indicator of writing ability?

Research by Hale et al. (1996) showed that the prompts used in the TWE approximate writing tasks assigned in 162 graduate and undergraduate courses across several disciplines in eight universities. Another study (Golub-Smith et al., 1993) ascertained the reliabilities across several types of prompts (e.g., compare/contrast vs. chart-graph interpretation). Both Myford et al. (1996) and Longford (1996) studied the reliabilities of judges' ratings. The question of whether a mere 30-minute time period is enough to elicit an adequate sample of a test-taker's writing was addressed by Hale (1992), and Henning and Cascallar (1992) conducted a large-scale study to assess the extent to which TWE performance taps into the communicative competence of the test-taker. The upshot of this research, which is updated regularly, is that the TWE (which adheres to a high standard of excellence in standardized testing) is, within acceptable standard error ranges, a remarkably accurate indicator of writing ability.

The convenience of the TWE should not lull administrators into believing that TWEs, TOEFLs, and the like are the only measures that should be applied to students. It behooves admissions and placement officers worldwide to offer secondary measures of writing ability to those test-takers who

are on the threshold of a minimum score,
may be disabled by highly time-constrained or anxiety-producing situations,
could be culturally disadvantaged by a topic or situation, and/or
(in the case of computer-based writing) have had few opportunities to compose on a computer.

SCORING METHODS FOR RESPONSIVE AND EXTENSIVE WRITING

Holistic Scoring

The TWE scoring scale is a prime example of holistic scoring, and in Chapter 7 a rubric for scoring oral production holistically was presented. Advantages of holistic scoring include

fast evaluation,
relatively high inter-rater reliability,
the fact that scores represent "standards" that are easily interpreted by lay persons,
the fact that scores tend to emphasize the writer's strengths (Cohen, 1994, p. 315), and
applicability to writing across many different disciplines.

Its disadvantages must also be weighed into a decision on whether to use holistic scoring:
One score masks differences across the subskills within each score.
No diagnostic information is available (no washback potential).
The scale may not apply equally well to all genres of writing.
Raters need to be extensively trained to use the scale accurately.

Primary Trait Scoring

A second method of scoring, primary trait, focuses on "how well students can write within a narrowly defined range of discourse" (Weigle, 2002, p. 110). This type of scoring emphasizes the task at hand and assigns a score based on how effectively the text achieves that one goal. For example, if the purpose or function of an essay is to persuade the reader to do something, the score for the writing would rise or fall on the accomplishment of that function. If a learner is asked to exploit the imaginative function of language by expressing personal feelings, then the response would be evaluated on that feature alone. The advantage of this method is that it allows both writer and evaluator to focus on function. In summary, a primary trait score would assess

the accuracy of the account of the original (summary),
the clarity of the steps of the procedure and the final result (lab report),
the description of the main features of the graph (graph description), and
the expression of the writer's opinion (response to an article).

Analytic Scoring

For classroom instruction, holistic scoring provides little washback into the writer's further stages of learning. Analytic scoring may be more appropriately called analytic assessment, to capture its closer association with classroom language instruction than with formal testing. On the analytic scale, the order in which the five categories (organization, logical development of ideas, grammar, punctuation/spelling/mechanics, and style and quality of expression) are listed may bias the evaluator toward treating organization and logical development as more important than punctuation and style. Mathematically, however, the 100-point scale gives equal weight (a maximum of 20 points) to each of the five major categories.

Analytic scoring of compositions offers writers a little more washback than a single holistic or primary trait score. Scores in five or six major elements will help to call the writers' attention to areas of needed improvement. Practicality is lowered in that more time is required for teachers to attend to details within each of the categories in order to render a final score or grade, but ultimately students receive more information about their writing. Numerical scores alone, however, are still not sufficient for enabling students to become proficient writers, as we shall see in the next section.
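To make the arithmetic of the 100-point analytic scale concrete, here is a minimal Python sketch. It assumes only what the text states: five categories, each worth a maximum of 20 points. The function names (`analytic_total`, `weakest_categories`) and the sample essay scores are hypothetical, introduced for illustration, and are not part of Brown's scale. The second function shows how per-category scores can point a writer toward an area of needed improvement, which is the washback a single holistic score cannot give.

```python
# A minimal sketch of the 100-point analytic scale described above:
# five categories, each capped at 20 points (equal weight).
# Category names follow the text; function names and sample scores are hypothetical.

MAX_PER_CATEGORY = 20

CATEGORIES = (
    "organization",
    "logical development of ideas",
    "grammar",
    "punctuation/spelling/mechanics",
    "style and quality of expression",
)


def analytic_total(scores: dict[str, int]) -> int:
    """Sum the five category scores after checking each one is within 0-20."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for category in CATEGORIES:
        value = scores[category]
        if not 0 <= value <= MAX_PER_CATEGORY:
            raise ValueError(f"{category}: {value} is outside 0-{MAX_PER_CATEGORY}")
    return sum(scores[c] for c in CATEGORIES)


def weakest_categories(scores: dict[str, int]) -> list[str]:
    """Return the lowest-scoring categories, i.e., where the writer should focus next."""
    lowest = min(scores[c] for c in CATEGORIES)
    return [c for c in CATEGORIES if scores[c] == lowest]


if __name__ == "__main__":
    # Hypothetical scores for one essay.
    essay = {
        "organization": 17,
        "logical development of ideas": 15,
        "grammar": 12,
        "punctuation/spelling/mechanics": 16,
        "style and quality of expression": 14,
    }
    print(analytic_total(essay))      # 74
    print(weakest_categories(essay))  # ['grammar']
```

Because every category has the same ceiling, a two-point drop in grammar costs exactly as much as a two-point drop in organization. A teacher who wanted organization to count for more would have to change the caps, a design choice the text leaves open.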

BEYOND SCORING: RESPONDING TO EXTENSIVE WRITING

Formal testing carries with it the burden of designing a practical and reliable instrument that assesses its intended criterion accurately. To accomplish that mission, designers of writing tests are charged with the task of providing as "objective" a scoring procedure as possible, and one that in many cases can be easily interpreted by agents beyond the learner. Most writing specialists agree that the best way to teach writing is a hands-on approach that stimulates student output and then generates a series of self-assessments, peer editing and revision, and teacher response and conferencing (Raimes, 1991, 1998; Reid, 1993; Seow, 2002).

Assessment of initial stages in composing

Focus your efforts primarily on meaning, main idea, and organization.
Comment on the introductory paragraph.
Make general comments about the clarity of the main idea and logic or appropriateness of the organization.
As a rule of thumb, ignore minor (local) grammatical and lexical errors.
Indicate what appear to be major (global) errors (e.g., by underlining the text in question), but allow the writer to make corrections.
Do not rewrite questionable, ungrammatical, or awkward sentences; rather, probe with a question about meaning.
Comment on features that appear to be irrelevant to the topic.

Assessing Later Stages of the Process of Composing

Once the writer has determined and clarified his or her purpose and plan, and has completed at least one or perhaps two drafts, the focus shifts toward "fine tuning" the expression with a view toward a final revision. Editing and responding assume an appropriately different character now, with these guidelines:

Assessment of later stages in composing

Comment on the specific clarity and strength of all main ideas and supporting ideas, and on argument and logic.
Call attention to minor ("local") grammatical and mechanical (spelling, punctuation) errors, but direct the writer to self-correct.
Comment on any further word choices and expressions that may not be awkward but are not as clear or direct as they could be.
Point out any problems with cohesive devices within and across paragraphs.
If appropriate, comment on documentation, citation of sources, evidence, and other support.
Comment on the adequacy and strength of the conclusion.

Through all these stages it is assumed that peers and teacher are both responding to the writer through conferencing in person, electronic communication, or, at the very least, an exchange of papers. The impromptu timed tests and the methods of scoring discussed earlier may appear to be only distantly related to such an individualized process of creating a written text, but are they, in reality? All those developmental stages may be the preparation that learners need both to function in creative real-world writing tasks and to successfully demonstrate their competence on a timed impromptu test. And those holistic scores are, after all, generalizations of the various components of effective writing. If the hard work of successfully progressing through a semester or two of a challenging course in academic writing ultimately means that writers are ready to function in their real-world contexts, and to get a 5 or 6 on the TWE, then all the effort was worthwhile.

This chapter completes the cycle of considering the assessment of all four skills: listening, speaking, reading, and writing. As you contemplate using some of the assessment techniques that have been suggested, I think you can now fully appreciate two significant overarching guidelines for designing an effective assessment procedure:

It is virtually impossible to isolate any one of the four skills without the involvement of at least one other mode of performance. Don't underestimate the power of the integration of skills in assessments designed to target a single skill area.
The variety of assessment techniques, item types, and tasks is virtually infinite in that there is always some possibility for creating a unique variation. Explore those alternatives, but with some caution lest your overzealous urge to be innovative distract you from a central focus on achieving the intended purpose and rendering an appropriate evaluation of performance.

Source:

Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.