Tuesday, May 5, 2020

SUMMARY ASSESSING LISTENING AND SPEAKING

 ASSESSING LISTENING 
In earlier chapters, a number of foundational principles of language assessment were introduced. Concepts like practicality, reliability, validity, authenticity, washback, direct and indirect testing, and formative and summative assessment are by now part of your vocabulary. Now our focus will shift away from the standardized testing juggernaut to the level at which you will usually work: the day-to-day classroom assessment of listening, speaking, reading, and writing. Since this is the level at which you will most frequently have the opportunity to apply principles of assessment, the next four chapters of this book will provide guidelines and hands-on practice in testing within a curriculum of English as a second or foreign language.

But first, two important caveats. First, the fact that the four language skills are discussed in four separate chapters should in no way predispose you to think that those skills are or should be assessed in isolation. Every TESOL professional (see TBP, Chapter 15) will tell you that the integration of skills is of paramount importance in language learning. Likewise, assessment is more authentic and provides more washback when skills are integrated. Nevertheless, the skills are treated independently here in order to identify principles, test types, tasks, and issues associated with each one. Second, you may already have scanned through this book to look for a chapter on assessing grammar and vocabulary, or something in the way of a focus on form in assessment. The treatment of form-focused assessment is not relegated to a separate chapter here for a very distinct reason: there is no such thing as a test of grammar or vocabulary that does not invoke one or more of the separate skills of listening, speaking, reading, or writing! It's not uncommon to find little "grammar tests" and "vocabulary tests" in textbooks, and these may be perfectly useful instruments. But responses on these quizzes are usually written, with multiple-choice selection or fill-in-the-blank items.

OBSERVING THE PERFORMANCE OF THE FOUR SKILLS

Before focusing on listening itself, think about the two interacting concepts of performance and observation. All language users perform the acts of listening, speaking, reading, and writing. They of course rely on their underlying competence in order to accomplish these performances. So, one important principle for assessing a learner's competence is to consider the fallibility of the results of a single performance, such as that produced in a test. As with any attempt at measurement, it is your obligation as a teacher to triangulate your measurements: consider at least two (or more) performances and/or contexts before drawing a conclusion. That could take the form of one or more of the following designs:

  • several tests that are combined to form an assessment
  • a single test with multiple test tasks to account for learning styles and performance variables
  • in-class and extra-class graded work
  • alternative forms of assessment (e.g., journal, portfolio, conference, observation, self-assessment, peer assessment).

Multiple measures will always give you a more reliable and valid assessment than a single measure. A second principle is one that we teachers often forget. We must rely as much as possible on observable performance in our assessments of students. Observable means being able to see or hear the performance of the learner (the senses of touch, taste, and smell don't apply very often to language testing!).
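
As a concrete illustration of triangulation, here is a minimal Python sketch that combines several measures into one weighted composite. The measure names, weights, and 0-100 scale are hypothetical choices for this example, not values prescribed by the source.

    # A minimal sketch: triangulating multiple measures into one composite score.
    # Measure names, weights, and the 0-100 scale are hypothetical assumptions.

    def composite_score(measures, weights):
        """Combine several assessment scores (each 0-100) into a weighted composite."""
        assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
        return sum(weights[name] * score for name, score in measures.items())

    # Example: two tests, graded in-class work, and a portfolio (alternative assessment).
    measures = {"test_1": 78, "test_2": 85, "in_class_work": 90, "portfolio": 82}
    weights = {"test_1": 0.25, "test_2": 0.25, "in_class_work": 0.25, "portfolio": 0.25}
    print(round(composite_score(measures, weights), 2))  # 83.75

Weighting the four measures equally is only one defensible choice; the point is that no single measure decides the result.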

THE IMPORTANCE OF LISTENING

In the standardized testing industry, a number of separate oral production tests are available (the Test of Spoken English, the Oral Proficiency Interview, and PhonePass, to name several that are described in Chapter 7 of this book), but it is rare to find just a listening test. One reason for this emphasis is that listening is often implied as a component of speaking. How could you speak a language without also listening? In addition, the overtly observable nature of speaking renders it more empirically measurable than listening. But perhaps a deeper cause lies in universal biases toward speaking. A good speaker is often (unwisely) valued more highly than a good listener.
Every teacher of language knows that one's oral production ability (other than monologues, speeches, reading aloud, and the like) is only as good as one's listening comprehension ability. But of even further impact is the likelihood that input in the aural-oral mode accounts for a large proportion of successful language acquisition. In a typical day, we do measurably more listening than speaking (with the exception of one or two of your friends who may be nonstop chatterboxes!). Whether in workplace, educational, or home contexts, aural comprehension far outstrips oral production in quantifiable terms of time, number of words, effort, and attention. We therefore need to pay close attention to listening as a mode of performance for assessment in the classroom. In this chapter, we will begin with basic principles and types of listening, then move to a survey of tasks that can be used to assess listening.

BASIC TYPES OF LISTENING

As with all effective tests, designing appropriate assessment tasks in listening begins with the specification of objectives, or criteria. Those objectives may be classified in terms of several types of listening performance. Think about what you do when you listen. Literally in nanoseconds, the following processes flash through your brain:
  • You recognize speech sounds and hold a temporary "imprint" of them in short-term memory.

  • You simultaneously determine the type of speech event (monologue, interpersonal dialogue, transactional dialogue) that is being processed and attend to its context (who the speaker is, location, purpose) and the content of the message.
  • You use (bottom-up) linguistic decoding skills and/or (top-down) background schemata to bring a plausible interpretation to the message, and assign a literal and intended meaning to the utterance. 
  • In most cases (except for repetition tasks, which involve short-term memory only), you delete the exact linguistic form in which the message was originally received in favor of conceptually retaining important or relevant information in long-term memory.

Each of these stages represents a potential assessment objective:

  • comprehending surface structure elements such as phonemes, words, intonation, or a grammatical category
  • understanding of pragmatic context
  • determining meaning of auditory input
  • developing the gist, a global or comprehensive understanding

From these stages we can derive four commonly identified types of listening performance, each of which comprises a category within which to consider assessment tasks and procedures.

  • Intensive. Listening for perception of the components (phonemes, words, intonation, discourse markers, etc.) of a larger stretch of language.
  • Responsive. Listening to a relatively short stretch of language (a greeting, question, command, comprehension check, etc.) in order to make an equally short response.
  • Selective. Processing stretches of discourse such as short monologues for several minutes in order to scan for certain information. 
  • Extensive. Listening to develop a top-down, global understanding of spoken language.

For full comprehension at the extensive level, test-takers may need to invoke interactive skills (perhaps note-taking, questioning, or discussion): listening that includes all four of the above types as test-takers actively participate in discussions, debates, conversations, role plays, and pair and group work. Their listening performance must be intricately integrated with speaking (and perhaps other skills) in the authentic give-and-take of communicative interchange.

MICRO- AND MACROSKILLS OF LISTENING

A useful way of synthesizing the above two lists is to consider a finite number of micro- and macroskills implied in the performance of listening comprehension. Richards' (1983) list of microskills has proven useful in the domain of specifying objectives for learning and may be even more useful in forcing test makers to carefully identify specific assessment objectives.

Micro- and macroskills of listening (adapted from Richards, 1983)

Microskills

  • Discriminate among the distinctive sounds of English.
  • Retain chunks of language of different lengths in short-term memory.
  • Recognize English stress patterns, words in stressed and unstressed positions, rhythmic structure, intonation contours, and their role in signaling information.
  • Recognize reduced forms of words.
  • Distinguish word boundaries, recognize a core of words, and interpret word order patterns and their significance.
  • Process speech at different rates of delivery.
  • Process speech containing pauses, errors, corrections, and other performance variables.
  • Recognize grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement/pluralization), patterns, rules, and elliptical forms.
  • Detect sentence constituents and distinguish between major and minor constituents.
  • Recognize that a particular meaning may be expressed in different grammatical forms.
  • Recognize cohesive devices in spoken discourse.

Macroskills
  • Recognize the communicative functions of utterances, according to situations, participants, and goals.
  • Infer situations, participants, goals using real-world knowledge.
  • From events, ideas, and so on, described, predict outcomes, infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
  • Distinguish between literal and implied meanings.
  • Use facial, kinesic, body language, and other nonverbal clues to decipher meanings.
  • Develop and use a battery of listening strategies, such as detecting key words, guessing the meaning of words from context, appealing for help, and signaling comprehension or lack thereof.


Developing a sense of which aspects of listening performance are predictably difficult will help you to challenge your students appropriately and to assign weights to items. Consider the following list of what makes listening difficult (adapted from Richards, 1983; Ur, 1984; Dunkel, 1991):

1. Clustering
2. Redundancy
3. Reduced forms
4. Performance variables
5. Colloquial language
6. Rate of delivery
7. Stress, rhythm, and intonation
8. Interaction

DESIGNING ASSESSMENT TASKS: INTENSIVE LISTENING

Once you have determined objectives, your next step is to design the tasks,
including making decisions about how you will elicit performance and how you will expect the test-taker to respond. We will look at tasks that range from intensive listening performance, such as minimal phonemic pair recognition, to extensive comprehension of language in communicative contexts. The focus in this section is on the microskills of intensive listening.


Recognizing Phonological and Morphological Elements

A typical form of intensive listening at this level is the assessment of recognition of phonological and morphological elements of language.

Paraphrase Recognition

The next step up on the scale of listening comprehension microskills is words, phrases, and sentences, which are frequently assessed by providing a stimulus sentence and asking the test-taker to choose the correct paraphrase from a number of choices.

DESIGNING ASSESSMENT TASKS: RESPONSIVE LISTENING

A question-and-answer format can provide some interactivity in these lower-end listening tasks. The test-taker's response is the appropriate answer to a question.

DESIGNING ASSESSMENT TASKS: SELECTIVE LISTENING

A third type of listening performance is selective listening, in which the test-taker listens to a limited quantity of aural input and must discern within it some specific information. A number of techniques have been used that require selective listening.

Listening Cloze

Listening cloze tasks (sometimes called cloze dictations or partial dictations) require the test-taker to listen to a story, monologue, or conversation and simultaneously read the written text in which selected words or phrases have been deleted. One potential weakness of listening cloze techniques is that they may simply become reading comprehension tasks. Test-takers who are asked to listen to a story with periodic deletions in the written version may not need to listen at all, yet may still be able to respond with the appropriate word or phrase.
Listening cloze tasks should normally use an exact-word method of scoring, in which you accept as a correct response only the actual word or phrase that was spoken and consider other appropriate words as incorrect. (See Chapter 8 for further discussion of scoring methods.) Such stringency is warranted; your objective is, after all, to test listening comprehension, not grammatical or lexical expectancies.
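
To make the exact-word method concrete, here is a minimal Python sketch of scoring a listening cloze against the words actually spoken. The key and responses are invented; note that a plausible synonym is deliberately scored as incorrect.

    # A minimal sketch of exact-word scoring for a listening cloze.
    # Only the word actually spoken counts as correct (case-insensitive).

    def score_cloze(key, responses):
        return sum(1 for k, r in zip(key, responses)
                   if r.strip().lower() == k.strip().lower())

    key = ["ticket", "window", "platform", "nine"]          # words that were spoken
    responses = ["ticket", "counter", "platform", "nine"]   # "counter" is a plausible
                                                            # synonym, but scored wrong
    print(score_cloze(key, responses))  # 3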

Information Transfer

Selective listening can also be assessed through an information transfer technique in which aurally processed information must be transferred to a visual representation, such as labeling a diagram, identifying an element in a picture, completing a form, or showing routes on a map.
The objective of this task is to test prepositions and prepositional phrases of location (at the bottom, on top of, around, along with larger, smaller), so other words and phrases such as back yard, yesterday, last few seeds, and scare away are supplied only as context and need not be tested.
Information transfer tasks may reflect greater authenticity by using charts, maps, grids, timetables, and other artifacts of daily life. For example, test-takers might hear a student's daily schedule and fill in a partially completed weekly calendar.

Sentence Repetition

The task of simply repeating a sentence or a partial sentence, or sentence repetition, is also used as an assessment of listening comprehension. As in a dictation (discussed below), the test-taker must retain a stretch of language long enough to reproduce it and then must respond with an oral repetition of that stimulus. Incorrect listening comprehension, whether at the phonemic or discourse level, may be manifested in the correctness of the repetition. A miscue in repetition is scored as a miscue in listening. In the case of somewhat longer sentences, one could argue that the ability to recognize and retain chunks of language as well as threads of meaning might be assessed through repetition. In Chapter 7, we will look closely at PhonePass, a commercially produced test that relies largely on sentence repetition to assess both oral production and listening comprehension. Sentence repetition is far from a flawless listening assessment task.

DESIGNING ASSESSMENT TASKS: EXTENSIVE LISTENING

Drawing a clear distinction between any two of the categories of listening referred to here is problematic, but perhaps the fuzziest division is between selective and extensive listening. Some important questions about designing assessments at this level emerge.

1. Can listening performance be distinguished from cognitive processing factors such as memory, associations, storage, and recall?
2. As assessment procedures become more communicative, does the task take into account test-takers' ability to use grammatical expectancies, lexical collocations, semantic interpretations, and pragmatic competence?
3. Are test tasks themselves correspondingly content valid and authentic-that is, do they mirror real-world language and context?
4. As assessment tasks become more and more open-ended, they more closely resemble pedagogical tasks, which leads one to ask what the difference is between assessment and teaching tasks. The answer is scoring: the former imply specified scoring procedures, while the latter do not.

Dictation

Dictation is a widely researched genre of assessing listening comprehension. In a dictation, test-takers hear a passage, typically of 50 to 100 words, recited three times: first, at normal speed; then, with long pauses between phrases or natural word groups, during which time test-takers write down what they have just heard; and finally, at normal speed once more so they can check their work and proofread. Dictations have been used as assessment tools for decades. Some readers still cringe at the thought of having to render a correctly spelled, verbatim version of a paragraph or story recited by the teacher. The difficulty of a dictation task can be easily manipulated by the length of the word groups (or bursts, as they are technically called), the length of the pauses, the speed at which the text is read, and the complexity of the discourse, grammar, and vocabulary used in the passage.
Scoring is another matter. Depending on your context and purpose in administering a dictation, you will need to decide on scoring criteria for several possible kinds of errors:

  • spelling error only, but the word appears to have been heard correctly
  • spelling and/or obvious misrepresentation of a word, illegible word
  • grammatical error (for example, the test-taker hears I can't do it, writes I can do it)
  • skipped word or phrase
  • permutation of words
  • additional words not in the original
  • replacement of a word with an appropriate synonym

Determining the weight of each of these errors is a highly idiosyncratic choice; specialists disagree almost more than they agree on the importance of the above categories. They do agree (Buck, 2001) that a dictation is not a spelling test, and that the first item in the list above should not be considered an error. Dictation seems to provide a reasonably valid method for integrating listening and writing skills and for tapping into the cohesive elements of language implied in short passages. Despite the scoring complications, the practicality of administering dictations, a moderate degree of reliability in a well-established scoring system, and a strong correspondence to other language abilities speak well for the inclusion of dictation among the possibilities for assessing extensive (or quasi-extensive) listening comprehension.
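
As one concrete possibility, the sketch below turns the error categories above into a weighted percentage score. The weights are placeholders, since, as noted, the relative gravity of each category is an idiosyncratic choice; only the zero weight for a pure spelling error follows directly from the text.

    # A minimal sketch of weighted dictation scoring. The categories follow the
    # list above; the weights are placeholder assumptions, except that a pure
    # spelling error costs nothing (a dictation is not a spelling test).

    ERROR_WEIGHTS = {
        "spelling_only": 0.0,
        "misrepresentation": 1.0,  # wrong or illegible word
        "grammatical": 1.0,        # e.g., hears "I can't do it," writes "I can do it"
        "skipped": 1.0,
        "permutation": 0.5,
        "addition": 0.5,
        "synonym": 0.5,            # right meaning, wrong word
    }

    def dictation_score(total_words, errors):
        """Return a percentage score after deducting weighted error points."""
        penalty = sum(ERROR_WEIGHTS[cat] * n for cat, n in errors.items())
        return max(0.0, 100.0 * (total_words - penalty) / total_words)

    errors = {"spelling_only": 2, "grammatical": 1, "skipped": 1, "synonym": 2}
    print(round(dictation_score(80, errors), 2))  # 96.25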

Communicative Stimulus-Response Tasks

Another, and more authentic, example of extensive listening is found in a popular genre of assessment task in which the test-taker is presented with a stimulus monologue or conversation and then is asked to respond to a set of comprehension questions. Does this meet the criterion of authenticity? If you want to be painfully fussy, you might object that it is rare in the real world to eavesdrop on someone else's doctor-patient conversation. Nevertheless, the conversation itself is relatively authentic; we all have doctor-patient exchanges like this. Equally authentic, if you add a grain of salt, are monologues, lectures, and news stories, all of which are commonly utilized as listening stimuli to be followed by comprehension questions aimed at assessing certain objectives that are built into the stimulus.

Authentic Listening Tasks

Ideally, the language assessment field would have a stockpile of listening test types that are cognitively demanding, communicative, and authentic, not to mention interactive by means of an integration with speaking. However, the nature of a test as a sample of performance and a set of tasks with limited time frames implies an equally limited capacity to mirror all the real-world contexts of listening performance. "There is no such thing as a communicative language ability," stated Buck (2001, p. 92). "Every test requires some components of communicative language ability, and no test covers them all." Here are some possibilities:

1. Note-taking.
2. Editing.
3. Interpretive tasks. One of the intensive listening tasks described above was paraphrasing a story or conversation. An interpretive task extends the stimulus material to a longer stretch of discourse and forces the test-taker to infer a response. Potential stimuli include

  • song lyrics,
  • [recited] poetry,
  • radio/television news reports, and
  • an oral account of an experience.
4. Retelling.






ASSESSING SPEAKING

From a pragmatic view of language performance, listening and speaking are almost always closely interrelated. While it is possible to isolate some listening performance types (see Chapter 6), it is very difficult to isolate oral production tasks that do not directly involve the interaction of aural comprehension. Only in limited contexts of speaking (monologues, speeches, or telling a story and reading aloud) can we assess oral language without the aural participation of an interlocutor. While speaking is a productive skill that can be directly and empirically observed, those observations are invariably colored by the accuracy and effectiveness of a test-taker's listening skill, which necessarily compromises the reliability and validity of an oral production test.
Speaking is the product of creative construction of linguistic strings; the speaker makes choices of lexicon, structure, and discourse. If your goal is to have test-takers demonstrate certain spoken grammatical categories, for example, the stimulus you design must elicit those grammatical categories in ways that prohibit the test-taker from avoiding or paraphrasing and thereby dodging production of the target form. As tasks become more and more open-ended, the freedom of choice given to test-takers creates a challenge in scoring procedures. In receptive performance, the elicitation stimulus can be structured to anticipate predetermined responses and only those responses. In productive performance, the oral or written stimulus must be specific enough to elicit output within an expected range of performance such that scoring or rating procedures apply appropriately.

BASIC TYPES OF SPEAKING

In Chapter 6, we cited four categories of listening performance assessment tasks. A similar taxonomy emerges for oral production:

  • Imitative.
  • Intensive.
  • Responsive.
  • Interactive.
  • Extensive (monologue).



MICRO- AND MACROSKILLS OF SPEAKING

In Chapter 6, a list of listening micro- and macroskills enumerated the various components of listening that make up criteria for assessment. A similar list of speaking skills can be drawn up for the same purpose: to serve as a taxonomy of skills from which you will select one or several that will become the objective(s) of an assessment task. The microskills refer to producing the smaller chunks of language such as phonemes, morphemes, words, collocations, and phrasal units. The macroskills imply the speaker's focus on the larger elements: fluency, discourse, function, style, cohesion, nonverbal communication, and strategic options.

Micro- and macroskills of oral production
Microskills

  • Produce differences among English phonemes and allophonic variants.
  • Produce chunks of language of different lengths.
  • Produce English stress patterns, words in stressed and unstressed positions, rhythmic structure, and intonation contours.
  • Produce reduced forms of words and phrases.
  • Use an adequate number of lexical units (words) to accomplish pragmatic purposes.
  • Produce fluent speech at different rates of delivery.
  • Monitor one's own oral production and use various strategic devices (pauses, fillers, self-corrections, backtracking) to enhance the clarity of the message.
  • Use grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), word order, patterns, rules, and elliptical forms.
  • Produce speech in natural constituents: in appropriate phrases, pause groups, breath groups, and sentence constituents.
  • Express a particular meaning in different grammatical forms.
  • Use cohesive devices in spoken discourse.

Macroskills

  • Appropriately accomplish communicative functions according to situations, participants, and goals.
  • Use appropriate styles, registers, implicature, redundancies, pragmatic conventions, conversation rules, floor-keeping and -yielding, interrupting, and other sociolinguistic features in face-to-face conversations.
  • Convey links and connections between events and communicate such relations as focal and peripheral ideas, events and feelings, new information and given information, generalization and exemplification.
  • Convey facial features, kinesics, body language, and other nonverbal cues along with verbal language.
  • Develop and use a battery of speaking strategies, such as emphasizing key words, rephrasing, providing a context for interpreting the meaning of words, appealing for help, and accurately assessing how well your interlocutor is understanding you.


There is such an array of oral production tasks that a complete treatment is almost impossible within the confines of one chapter in this book. Below is a consideration of the most common techniques with brief allusions to related tasks. As already noted in the introduction to this chapter, consider three important issues as you set out to design tasks:

  • No speaking task is capable of isolating the single skill of oral production. Concurrent involvement of the additional performance of aural comprehension, and possibly reading, is usually necessary.
  • Eliciting the specific criterion you have designated for a task can be tricky because, beyond the word level, spoken language offers a number of productive options to test-takers. Make sure your elicitation prompt achieves its aims as closely as possible.
  • Because of the above two characteristics of oral production assessment, it is important to carefully specify scoring procedures for a response so that ultimately you achieve as high a reliability index as possible.


DESIGNING ASSESSMENT TASKS: IMITATIVE SPEAKING

You may be surprised to see the inclusion of simple phonological imitation in a consideration of assessment of oral production. After all, endless repeating of words, phrases, and sentences was the province of the long-since-discarded Audiolingual Method, and in an era of communicative language teaching, many believe that nonmeaningful imitation of sounds is fruitless. Such opinions have faded in recent years as we discovered that an overemphasis on fluency can sometimes lead to the decline of accuracy in speech. An occasional phonologically focused repetition task is warranted as long as repetition tasks are not allowed to occupy a dominant role in an overall oral production assessment, and as long as you artfully avoid a negative washback effect. Such tasks range from word level to sentence level, usually with each item focusing on a specific phonological criterion.

PHONEPASS® TEST

The PhonePass test elicits computer-assisted oral production over a telephone. Test-takers read aloud, repeat sentences, say words, and answer questions. With a downloadable test sheet as a reference, test-takers are directed to telephone a designated number and listen for directions. The PhonePass findings could signal an increase in the future use of repetition and read-aloud procedures for the assessment of oral production. Because a test-taker's output is completely controlled, scoring using speech-recognition technology becomes achievable and practical. As researchers uncover the constructs underlying both repetition/read-aloud tasks and oral production in all its complexities, we will have access to more comprehensive explanations of why such simple tasks appear to be reliable and valid indicators of very complex oral production proficiency.

DESIGNING ASSESSMENT TASKS: INTENSIVE SPEAKING

At the intensive level, test-takers are prompted to produce short stretches of discourse (no more than a sentence) through which they demonstrate linguistic ability at a specified level of language. Many tasks are "cued" tasks in that they lead the test taker into a narrow band of possibilities.

Directed Response Tasks

In this type of task, the test administrator elicits a particular grammatical form or a transformation of a sentence. Such tasks are clearly mechanical and not communicative, but they do require minimal processing of meaning in order to produce the correct grammatical output.

Directed response
Test-takers hear: 
Tell me he went home.
Tell me that you like rock music.
Tell me that you aren't interested in tennis.
Tell him to come to my office at noon.
Remind him what time it is.

Read-Aloud Tasks

Intensive reading-aloud tasks include reading beyond the sentence level up to a paragraph or two. This technique is easily administered by selecting a passage that incorporates test specs and by recording the test-taker's output; the scoring is relatively easy because all of the test-taker's oral production is controlled. Prator's (1972) Manual of American English Pronunciation included a "diagnostic passage" of about 150 words that students could read aloud into a tape recorder. Teachers listening to the recording would then rate students on a number of phonological factors (vowels, diphthongs, consonants, consonant clusters, stress, and intonation) by completing a two-page diagnostic checklist on which all errors or questionable items were noted. These checklists ostensibly offered direction to the teacher for emphases in the course to come.

Underhill (1987, pp. 77-78) suggested some variations on the task of simply reading a short passage:

  • reading a scripted dialogue, with someone else reading the other part
  • reading sentences containing minimal pairs, for example: Try not to heat/hit the pan too much. The doctor gave me a bill/pill.
  • reading information from a table or chart 

Sentence/Dialogue Completion Tasks and Oral Questionnaires

In this technique, test-takers read a dialogue in which one speaker's lines have been omitted. Test-takers are first given time to read through the dialogue to get its gist and to think about appropriate lines to fill in. An advantage of this technique lies in its moderate control of the output of the test-taker. While individual variations in responses are accepted, the technique taps into a learner's ability to discern expectancies in a conversation and to produce sociolinguistically correct language. One disadvantage of this technique is its reliance on literacy and an ability to transfer easily from written to spoken English.
Underhill (1987) describes yet another technique that is useful for controlling the test-taker's output: form-filling, or what I might rename "oral questionnaire." Here the test-taker sees a questionnaire that asks for certain categories of information (personal data, academic information, job experience, etc.) and supplies the information orally.

Picture-Cued Tasks

One of the more popular ways to elicit oral language performance at both intensive and extensive levels is a picture-cued stimulus that requires a description from the test-taker. Pictures may be very simple, designed to elicit a word or a phrase; somewhat more elaborate and "busy"; or composed of a series that tells a story or incident.

Opinions about paintings, persuasive monologues, and directions on a map create a more complicated problem for scoring. More demand is placed on the test administrator to make calculated judgments, in which case a modified form of a scale such as the one suggested for evaluating interviews (below) could be used (a scoring sketch follows the list):

  • grammar
  • vocabulary
  • comprehension
  • fluency
  • pronunciation
  • task (accomplishing the objective of the elicited task)
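
Here is a minimal Python sketch of such a scale. The 0-5 band per category and the equal weighting are assumptions for illustration; a real scale would define descriptors for each band.

    # A minimal sketch: averaging ratings on the six categories above.
    # The 0-5 band and equal weighting are illustrative assumptions.

    CATEGORIES = ["grammar", "vocabulary", "comprehension",
                  "fluency", "pronunciation", "task"]

    def rate_performance(ratings):
        assert set(ratings) == set(CATEGORIES), "rate every category"
        assert all(0 <= r <= 5 for r in ratings.values())
        return sum(ratings.values()) / len(ratings)

    ratings = {"grammar": 3, "vocabulary": 4, "comprehension": 4,
               "fluency": 3, "pronunciation": 3, "task": 5}
    print(round(rate_performance(ratings), 2))  # 3.67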

Translation (of Limited Stretches of Discourse)

Translation is a part of our tradition in language teaching that we tend to discount or disdain, if only because our current pedagogical stance plays down its importance. Translation methods of teaching are certainly passé in an era of direct approaches to creating communicative classrooms. But we should remember that translation remains a common communicative device in countries where English is not the native or prevailing language. Under certain constraints, then, it is not far-fetched to suggest translation as a device to check oral production. Instead of offering pictures or written stimuli, the test-taker is given a native language word, phrase, or sentence and is asked to translate it.

DESIGNING ASSESSMENT TASKS: RESPONSIVE SPEAKING

Assessment of responsive tasks involves brief interactions with an interlocutor, differing from intensive tasks in the increased creativity given to the test-taker and from interactive tasks by the somewhat limited length of utterances.

Question and Answer
Question-and-answer tasks can consist of one or two questions from an interviewer, or they can make up a portion of a whole battery of questions and prompts in an oral interview. They can vary from simple questions like "What is this called in English?" to complex questions like "What are the steps governments should take, if any, to stem the rate of deforestation in tropical countries?" The first question is intensive in its purpose; it is a display question intended to elicit a predetermined correct response. We have already looked at some of these types of questions in the previous section. Questions at the responsive level tend to be genuine referential questions in which the test-taker is given more opportunity to produce meaningful language in response.
In designing such questions for test-takers, it's important to make sure that you know why you are asking the question. Are you simply trying to elicit strings of language output to gain a general sense of the test-taker's discourse competence? Are you combining discourse and grammatical competence in the same question? Is each question just one in a whole set of related questions? Responsive questions may take the following forms:

Questions eliciting open-ended responses
Test-takers hear:
1. What do you think about the weather today?
2. What do you like about the English language?
3. Why did you choose your academic major?
4. What kind of strategies have you used to help you learn English?
5. a. Have you ever been to the United States before?
b. What other countries have you visited?
c. Why did you go there? What did you like best about it?
d. If you could go back, what would you like to do or see?
e. What country would you like to visit next, and why?
Test-takers respond with a few sentences at most.

Giving Instructions and Directions

Using such a stimulus in an assessment context provides an opportunity for the test-taker to engage in a relatively extended stretch of discourse, to be very clear and specific, and to use appropriate discourse markers and connectors. The technique is simple: the administrator poses the problem, and the test-taker responds. Scoring is based primarily on comprehensibility and secondarily on other specified grammatical or discourse categories. Some pointers for creating such tasks: the test administrator needs to guard against test-takers knowing and preparing for such items in advance, lest they simply parrot back a memorized set of sentences. An impromptu delivery of instructions is warranted here, or at most a minute or so of preparation time.
Paraphrasing

Another type of assessment task that can be categorized as responsive asks the test-taker to read or hear a limited number of sentences (perhaps two to five) and produce a paraphrase. For example:

Paraphrasing a story

Test-takers hear: Paraphrase the following little story in your own words.
My weekend in the mountains was fabulous. The first day we backpacked into the mountains and climbed about 2,000 feet. The hike was strenuous but exhilarating. By sunset we found these beautiful alpine lakes and made camp there. The sunset was amazingly beautiful. The next two days we just kicked back and did little day hikes, some rock climbing, bird watching, swimming, and fishing. The hike out on the next day was really easy, all downhill, and the scenery was incredible.
Test-takers respond with two or three sentences.

The advantages of such tasks are that they elicit short stretches of output and perhaps tap into test-takers' ability to practice the conversational art of conciseness by reducing the output/input ratio.

TEST OF SPOKEN ENGLISH (TSE®)

The tasks on the TSE are designed to elicit oral production in various discourse categories rather than in selected phonological, grammatical, or lexical targets. The following content specifications for the TSE represent the discourse and pragmatic contexts assessed in each administration:

1. Describe something physical.
2. Narrate from presented material.
3. Summarize information of the speaker's own choice.
4. Give directions based on visual materials.
5. Give instructions.
6. Give an opinion.
7. Support an opinion.
8. Compare/contrast.
9. Hypothesize.
10. Function "interactively."
11. Define.


Using these specifications, Lazaraton and Wagner (1996) examined 15 different specific tasks in collecting background data from native and non-native speakers of English:

1. giving a personal description
2. describing a daily routine
3. suggesting a gift and supporting one's choice
4. recommending a place to visit and supporting one's choice
5. giving directions
6. describing a favorite movie and supporting one's choice
7. telling a story from pictures
8. hypothesizing about future action
9. hypothesizing about a preventative action
10. making a telephone call to the dry cleaner
11. describing an important news event
12. giving an opinion about animals in the zoo
13. defining a technical term
14. describing information in a graph and speculating about its implications
15. giving details about a trip schedule


DESIGNING ASSESSMENT TASKS: INTERACTIVE SPEAKING

Interview
Interviews can vary in length from perhaps five to forty-five minutes, depending on their purpose and context. Placement interviews, designed to get a quick spoken sample from a student in order to verify placement into a course, may need only five minutes if the interviewer is trained to evaluate the output accurately. Longer comprehensive interviews such as the OPI (see the next section) are designed to cover predetermined oral production contexts and may require the better part of an hour. Every effective interview contains a number of mandatory stages. Two decades ago, Michael Canale (1984) proposed a framework for oral proficiency testing that has withstood the test of time. He suggested that test-takers will perform at their best if they are led through four stages:

  • Warm-up.
  • Level check.
  • Probe.
  • Wind-down.

The success of an oral interview will depend on 

  • clearly specifying administrative procedures of the assessment (practicality),
  • focusing the questions and probes on the purpose of the assessment (validity),
  • appropriately eliciting an optimal amount and quality of oral production from the test taker (biased for best performance), and
  • creating a consistent, workable scoring system (reliability).

This last issue is the thorniest. In oral production tasks that are open-ended and that involve a significant level of interaction, the interviewer is forced to make judgments that are susceptible to some unreliability. Through experience, training, and careful attention to the linguistic criteria being assessed, you will acquire the ability to make such judgments accurately.
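
One practical way to monitor that consistency is to compare two raters' scores on the same set of interviews. Here is a minimal Python sketch with invented ratings (statistics.correlation requires Python 3.10 or later).

    # A minimal sketch: checking inter-rater consistency with a Pearson correlation.
    # The two raters' scores for eight test-takers are invented data.

    from statistics import correlation  # Python 3.10+

    rater_a = [4, 3, 5, 2, 4, 3, 5, 4]
    rater_b = [4, 3, 4, 2, 5, 3, 5, 4]

    r = correlation(rater_a, rater_b)
    print(f"inter-rater correlation: {r:.2f}")  # 0.87 here; higher is more consistent

A correlation alone does not catch a rater who is consistently harsher or more lenient, so it is worth inspecting each rater's mean score as well.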

Role Play

Role playing is a popular pedagogical activity in communicative language-teaching classes. Within the constraints set forth by the guidelines, it frees students to be somewhat creative in their linguistic output. As an assessment device, role play opens some windows of opportunity for test-takers to use discourse that might otherwise be difficult to elicit. With prompts such as "Pretend that you're a tourist asking me for directions" or "You're buying a necklace from me in a flea market, and you want to get a lower price," certain personal, strategic, and linguistic factors come into the foreground of the test-taker's oral abilities.

Discussions and Conversations

As formal assessment devices, discussions and conversations with and among students are difficult to specify and even more difficult to score. But as informal techniques to assess learners, they offer a level of authenticity and spontaneity that other assessment techniques may not provide. Discussions may be especially appropriate tasks through which to elicit and observe such abilities as

  • topic nomination, maintenance, and termination;
  • attention getting, interrupting, floor holding, control;
  • clarifying, questioning, paraphrasing;
  • comprehension signals (nodding, "uh-huh," "hmm," etc.);
  • negotiating meaning;
  • intonation patterns for pragmatic effect;
  • kinesics, eye contact, proxemics, body language; and
  • politeness, formality, and other sociolinguistic factors.

Games

Among informal assessment devices are a variety of games that directly involve language production. As assessments, the key is to specify a set of criteria and a reasonably practical and reliable scoring method. The benefit of such an informal assessment may not be as much in a summative evaluation as in its formative nature, with washback for the students.

ORAL PROFICIENCY INTERVIEW (OPI)

The best-known oral interview format is one that has gone through a considerable metamorphosis over the last half-century: the Oral Proficiency Interview (OPI). Originally known as the Foreign Service Institute (FSI) test, the OPI is the result of a historical progression of revisions under the auspices of several agencies, including the Educational Testing Service and the American Council on Teaching Foreign Languages (ACTFL). Specifications for the OPI approximate those delineated above in the discussion of oral interviews in general. In a series of structured tasks, the OPI is carefully designed to elicit pronunciation, fluency and integrative ability, sociolinguistic and cultural knowledge, grammar, and vocabulary.

From a Vygotskyan perspective, the OPI forces test-takers into a closed system where, because the interviewer is endowed with full social control, they are unable to negotiate a social world. For example, they cannot nominate topics for discussion, they cannot switch formality levels, and they cannot display a full range of stylistic maneuvers. The total control the OPI interviewers possess is reflected by the parlance of the test methodology.... In short, the OPI can only inform us of how learners can deal with an artificial social imposition rather than enabling us to predict how they would be likely to manage authentic linguistic interactions with target-language native speakers.

DESIGNING ASSESSMENTS: EXTENSIVE SPEAKING

Oral Presentations

A summary of oral assessment techniques would therefore be incomplete without some consideration of extensive speaking tasks. Once again the rules for effective assessment must be invoked: (a) specify the criterion, (b) set appropriate tasks, (c) elicit optimal output, and (d) establish practical, reliable scoring procedures. And once again, scoring is the key assessment challenge.
For oral presentations, a checklist or grid is a common means of scoring or evaluation. Holistic scores are tempting to use for their apparent practicality, but they may obscure the variability of performance across several subcategories, especially the two major components of content and delivery.

Picture-Cued Story-Telling

One of the most common techniques for eliciting oral production is through visual representations: pictures, photographs, diagrams, and charts.

Retelling a Story, News Event

In this type of task, test-takers hear or read a story or news event that they are asked to retell. This differs from the paraphrasing task discussed above in that it involves a longer stretch of discourse and a different genre. The objectives in assigning such a task vary from listening comprehension of the original to production of a number of oral discourse features (communicating sequences and relationships of events, stress and emphasis patterns, "expression" in the case of a dramatic story), fluency, and interaction with the hearer. Scoring should of course meet the intended criteria.

Translation (of Extended Prose)
Translation of words, phrases, or short sentences was mentioned under the category of intensive speaking. The advantage of translation is in the control of the content, vocabulary, and, to some extent, the grammatical and discourse features. The disadvantage is that translation of longer texts is a highly specialized skill for which some individuals obtain post-baccalaureate degrees! To judge a nonspecialist's oral language ability on such a skill may be completely invalid, especially if the test-taker has not engaged in translation at this level. Criteria for scoring should therefore take into account not only the purpose in stimulating a translation but also the possibility of errors that are unrelated to oral production ability.

One consequence of our being articulate mammals is an extraordinarily complex system of vocal communication that has evolved over the millennia of human existence. This chapter has offered a relatively sweeping overview of some of the ways we have learned to assess our wonderful ability to produce sounds, words, and sentences, and to string them together to make meaningful texts. This chapter's limited number of assessment techniques may encourage your imagination to explore a potentially limitless number of possibilities for assessing oral production.

Source:
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.

