Monday, 27 April 2020

SUMMARY: BEYOND TESTS: ALTERNATIVES IN ASSESSMENT


  In the public eye, tests have acquired an aura of infallibility in our culture of mass producing everything, including the education of school children. Everyone wants a test for everything, especially if the test is cheap, quickly administered, and scored instantaneously. But we saw in Chapter 4 that while the standardized test industry has become a powerful juggernaut of influence on decisions about people's lives, it has also come under severe criticism from the public (Kohn, 2000). A more balanced viewpoint is offered by Bailey (1998, p. 204): "One of the disturbing things about tests is the extent to which many people accept the results uncritically, while others believe that all testing is invidious. But tests are simply measurement tools: It is the use to which we put their results that can be appropriate or inappropriate." It is clear by now that tests are one of a number of possible types of assessment.
Assessment connotes a much broader concept in that most of the time when teachers are teaching, they are also assessing. Assessment includes all occasions from informal impromptu observations and comments up to and including tests. Early in the 1990s, in a culture of rebellion against the notion that all people and all skills could be measured by traditional tests, a novel concept emerged that began to be labeled "alternative" assessment. As teachers and students became aware of the shortcomings of standardized tests, "an alternative to standardized testing and all the problems found with such testing" (Huerta-Macias, 1995, p. 8) was proposed. That proposal was to assemble additional measures of students (portfolios, journals, observations, self-assessments, peer-assessments, and the like) in an effort to triangulate data about students. For some, such alternatives held "ethical potential" (Lynch, 2001, p. 228) in their promotion of fairness and the balance of power relationships in the classroom. The defining characteristics of the various alternatives in assessment that have been commonly used across the profession were aptly summed up by Brown and Hudson (1998, pp. 654-655). Alternatives in assessment:

1. require students to perform, create, produce, or do something;
2. use real-world contexts or simulations;
3. are nonintrusive in that they extend the day-to-day classroom activities;
4. allow students to be assessed on what they normally do in class every day;
5. use tasks that represent meaningful instructional activities;
6. focus on processes as well as products;
7. tap into higher-level thinking and problem-solving skills;
8. provide information about both the strengths and weaknesses of students;
9. are multiculturally sensitive when properly administered;
10. ensure that people, not machines, do the scoring, using human judgment;
11. encourage open disclosure of standards and rating criteria; and
12. call upon teachers to perform new instructional and assessment roles.

THE DILEMMA OF MAXIMIZING BOTH PRACTICALITY AND WASHBACK

The principal purpose of this chapter is to examine some of the alternatives in assessment that are markedly different from formal tests. Tests, especially large-scale standardized tests, tend to be one-shot performances that are timed, multiple-choice, decontextualized, and norm-referenced, and that foster extrinsic motivation. On the other hand, tasks like portfolios, journals, and self-assessment are


  •  open-ended in their time orientation and format,
  •  contextualized to a curriculum,
  •  referenced to the criteria (objectives) of that curriculum, and
  •  likely to build intrinsic motivation.


One way of looking at this contrast poses a challenge to you as a teacher and test designer. Formal standardized tests are almost by definition highly practical, reliable instruments. They are designed to minimize time and money on the part of test designer and test-taker, and to be painstakingly accurate in their scoring. Alternatives such as portfolios, conferencing with students on drafts of written work, or observations of learners over time all require considerable time and effort on the part of the teacher and the student. Even more time must be spent if the teacher hopes to offer a reliable evaluation within students across time, as well as across students (taking care not to favor one student or group of students). But the alternative techniques also offer markedly greater washback, are superior formative measures, and, because of their authenticity, usually carry greater face validity.

PERFORMANCE-BASED ASSESSMENT

Before proceeding to a direct consideration of types of alternatives in assessment, a word about performance-based assessment is in order. There has been a great deal of press in recent years about performance-based assessment, sometimes merely called performance assessment (Shohamy, 1995; Norris et al., 1998). Performance-based assessment implies productive, observable skills, such as speaking and writing, of content-valid tasks. Such performance usually, but not always, brings with it an air of authenticity: real-world tasks that students have had time to develop. It often implies an integration of language skills, perhaps all four skills in the case of project work. Because the tasks that students perform are consistent with course goals and curriculum, students and teachers are likely to be more motivated to perform them, as opposed to a set of multiple-choice questions about facts and figures regarding the solar system. O'Malley and Valdez Pierce (1996) considered performance-based assessment to be a subset of authentic assessment. In other words, not all authentic assessment is performance-based. One could infer that reading, listening, and thinking have many authentic manifestations, but since they are not directly observable in and of themselves, they are not performance-based. According to O'Malley and Valdez Pierce (p. 5), the following are characteristics of performance assessment:

1. Students make a constructed response.
2. They engage in higher-order thinking, with open-ended tasks.
3. Tasks are meaningful, engaging, and authentic.
4. Tasks call for the integration of language skills.
5. Both process and product are assessed.
6. Depth of a student's mastery is emphasized over breadth.

Performance-based assessment needs to be approached with caution. It is tempting for teachers to assume that if a student is doing something, then the process has fulfilled its own goal and the evaluator needs only to make a mark in the grade book that says "accomplished" next to a particular competency. In reality, performances as assessment procedures need to be treated with the same rigor as traditional tests. This implies that teachers should


  •  state the overall goal of the performance,
  •  specify the objectives (criteria) of the performance in detail,
  •  prepare students for performance in stepwise progressions,
  •  use a reliable evaluation form, checklist, or rating sheet,
  •  treat performances as opportunities for giving feedback and provide that feedback systematically, and
  •  if possible, utilize self- and peer-assessments judiciously.


PORTFOLIOS

One of the most popular alternatives in assessment, especially within a framework of communicative language teaching, is portfolio development. According to Genesee and Upshur (1996), a portfolio is "a purposeful collection of students' work that demonstrates their efforts, progress, and achievements in given areas" (p. 99). Portfolios include materials such as:


  •  essays and compositions in draft and final forms;
  •  reports, project outlines;
  •  poetry and creative prose;
  •  artwork, photos, newspaper or magazine clippings;
  •  audio and/or video recordings of presentations, demonstrations, etc.;
  •  journals, diaries, and other personal reflections;
  •  tests, test scores, and written homework exercises;
  •  notes on lectures; and
  •  self- and peer-assessment comments, evaluations, and checklists.


Gottlieb (1995) suggested a developmental scheme for considering the nature and purpose of portfolios, using the acronym CRADLE to designate six possible attributes of a portfolio:

Collecting
Reflecting
Assessing
Documenting
Linking
Evaluating

The advantages of engaging students in portfolio development have been extolled in a number of sources (Genesee & Upshur, 1996; O'Malley & Valdez Pierce, 1996; Brown & Hudson, 1998; Weigle, 2002). A synthesis of those characteristics gives us a number of potential benefits. Portfolios


  •  foster intrinsic motivation, responsibility, and ownership,
  •  promote student-teacher interaction with the teacher as facilitator,
  •  individualize learning and celebrate the uniqueness of each student,
  •  provide tangible evidence of a student's work,
  •  facilitate critical thinking, self-assessment, and revision processes,
  •  offer opportunities for collaborative work with peers, and
  •  permit assessment of multiple dimensions of language learning.


At the same time, care must be taken lest portfolios become a haphazard pile of "junk" whose purpose is a mystery to both teacher and student. Portfolios can fail if objectives are not clear, if guidelines are not given to students, if systematic periodic review and feedback are not present, and so on. Sometimes the thought of asking students to develop a portfolio is a daunting challenge, especially for new teachers and for those who have never created a portfolio on their own. Successful portfolio development will depend on following a number of steps and guidelines.


  1. State objectives clearly. Pick one or more of the CRADLE attributes named above and specify them as objectives of developing a portfolio.
  2. Give guidelines on what materials to include. Once the objectives have been determined, name the types of work that should be included.
  3. Communicate assessment criteria to students. This is both the most important aspect of portfolio development and the most complex.
  4. Designate time within the curriculum for portfolio development. If students feel rushed to gather materials and reflect on them, the effectiveness of the portfolio process is diminished. Make sure that students have time set aside for portfolio work (including in-class time) and that your own opportunities for conferencing are not compromised.
  5. Establish periodic schedules for review and conferencing. By doing so, you will prevent students from throwing everything together at the end of a term.
  6. Designate an accessible place to keep portfolios. It is inconvenient for students to carry collections of papers and artwork. If you have a self-contained classroom or a place in a reading room or library to keep the materials, that may provide a good option. At the university level, designating a storage place on the campus may involve impossible logistics. In that case, encourage students to create their own accessible location and to bring to class only the materials they need.
  7. Provide positive washback-giving final assessments. When a portfolio has been completed and the end of a term has arrived, a final summation is in order.


JOURNALS

  A journal is a log (or "account") of one's thoughts, feelings, reactions, assessments, ideas, or progress toward goals, usually written with little attention to structure, form, or correctness. Learners can articulate their thoughts without the threat of those thoughts being judged later (usually by the teacher). Sometimes journals are rambling sets of verbiage that represent a stream of consciousness with no particular point, purpose, or audience. Fortunately, models of journal use in educational practice have sought to tighten up this style of journal in order to give it some focus (Staton et al., 1987). The result is the emergence of a number of overlapping categories or purposes in journal writing, such as the following:


  •  language-learning logs
  •  grammar journals
  •  responses to readings
  •  strategies-based learning logs
  •  self-assessment reflections
  •  diaries of attitudes, feelings, and other affective factors
  •  acculturation logs


Most classroom-oriented journals are what have now come to be known as dialogue journals. They imply an interaction between a reader (the teacher) and the student through dialogues or responses. For the best results, those responses should be dispersed across a course at regular intervals, perhaps weekly or biweekly. One of the principal objectives in a student's dialogue journal is to carry on a conversation with the teacher. Through dialogue journals, teachers can become better acquainted with their students, in terms of both their learning progress and their affective states, and thus become better equipped to meet students' individual needs. It is important to turn the advantages and potential drawbacks of journals into positive general steps and guidelines for using journals as assessment instruments. The following steps are not coincidentally parallel to those cited above for portfolio development:

1. Sensitively introduce students to the concept of journal writing. For many students, especially those from educational systems that play down the notion of teacher-student dialogue and collaboration, journal writing will be difficult at first. University-level students, who have passed through a dozen years of product writing, will have particular difficulty with the concept of writing without fear of a teacher's scrutinizing every grammatical or spelling error. With modeling, assurance, and purpose, however, students can make a remarkable transition into the potentially liberating process of journal writing. Students who are shown examples of journal entries and are given specific topics and schedules for writing will become comfortable with the process.
2. State the objective(s) of the journal. Integrate journal writing into the objectives of the curriculum in some way, especially if journal entries become topics of class discussion. The list of types of journals at the beginning of this section suggests some of the purposes a journal may serve.


3. Give guidelines on what kinds of topics to include. Once the purpose or type of journal is clear, students will benefit from models or suggestions on what kinds of topics to incorporate into their journals.
4. Carefully specify the criteria for assessing or grading journals. Students need to understand the freewriting involved in journals, but at the same time, they need to know the assessment criteria. Once you have clarified that journals will not be evaluated for grammatical correctness and rhetorical conventions, state how they will be evaluated. Usually the purpose of the journal will dictate the major assessment criterion. Effort as exhibited in the thoroughness of students' entries will no doubt be important. Also, the extent to which entries reflect the processing of course content might be considered. Maintain reliability by adhering conscientiously to the criteria that you have set up.
5. Provide optimal feedback in your responses. McNamara (1998, p. 39) recommended three different kinds of feedback to journals:


  • cheerleading feedback, in which you celebrate successes with the students or encourage them to persevere through difficulties,
  • instructional feedback, in which you suggest strategies or materials, suggest ways to fine-tune strategy use, or instruct students in their writing, and
  • reality-check feedback, in which you help the students set more realistic expectations for their language abilities.

6. Designate appropriate time frames and schedules for review. Journals, like portfolios, need to be esteemed by students as integral parts of a course. Therefore, it is essential to budget enough time within a curriculum both for writing journals and for your written responses. Set schedules for submitting journal entries periodically; return them in short order.
7. Provide formative, washback-giving final comments. Journals, perhaps even more than portfolios, are the most formative of all the alternatives in assessment. They are day-by-day (or at least weekly) chronicles of progress whose purpose is to provide a thread of continuous assessment and reassessment, to recognize mid-stream direction changes, and/or to refocus on goals. Should you reduce a final assessment of such a procedure to a grade or a score? Some say yes, some say no (Peyton & Reed, 1990), but it appears to be in keeping with the formative nature of journals not to do so.

CONFERENCES AND INTERVIEWS

Reference was made earlier to conferencing as a standard part of the process approach to teaching writing, in which the teacher, in a conversation about a draft, facilitates the improvement of the written work. Such interaction has the advantage of one-on-one interaction between teacher and student, and of the teacher's being able to direct feedback toward a student's specific needs. Conferences are not limited to drafts of written work. Including the portfolios and journals discussed above, the list of possible functions and subject matter for conferencing is substantial:


  •  commenting on drafts of essays and reports
  •  reviewing portfolios
  •  responding to journals
  •  advising on a student's plan for an oral presentation
  •  assessing a proposal for a project
  •  giving feedback on the results of performance on a test
  •  clarifying understanding of a reading
  •  exploring strategies-based options for enhancement or compensation
  •  focusing on aspects of oral production
  •  checking a student's self-assessment of a performance
  •  setting personal goals for the near future
  •  assessing general progress in a course


Genesee and Upshur (1996, p. 110) offered a number of generic kinds of questions that may be useful to pose in a conference:


  •  What did you like about this work?
  •  What do you think you did well?
  •  How does it show improvement from previous work? Can you show me the improvement?
  •  Are there things about this work you do not like? Are there things you would like to improve?
  •  Did you have any difficulties with this piece of work? If so, where, and what did you do [will you do] to overcome them?
  •  What strategies did you use to figure out the meaning of words you could not understand?
  •  What did you do when you did not know a word that you wanted to write?

Because interviews have multiple objectives, as noted above, it is difficult to generalize principles for conducting them, but the following guidelines may help to frame the questions efficiently:
1. Offer an initial atmosphere of warmth and anxiety-lowering (warm-up).
2. Begin with relatively simple questions.
3. Continue with level check and probe questions, but adapt to the interviewee
as needed.
4. Frame questions simply and directly.
5. Focus on only one factor for each question. Do not combine several objectives
in the same question.
6. Be prepared to repeat or reframe questions that are not understood.
7. Wind down with friendly and reassuring closing comments.

How do conferences and interviews score in terms of principles of assessment? Their practicality, as is true for many of the alternatives in assessment, is low because they are time-consuming. Reliability will vary between conferences and interviews. In the case of conferences, it may not be important to have rater reliability because the whole purpose is to offer individualized attention, which will vary greatly from student to student. For interviews, a relatively high level of reliability should be maintained with careful attention to objectives and procedures. Face validity for both can be maintained at a high level due to their individualized nature. As long as the subject matter of the conference/interview is clearly focused on the course and course objectives, content validity should also be upheld.

OBSERVATIONS
All teachers, whether they are aware of it or not, observe their students in the classroom almost constantly. Virtually every question, every response, and almost every nonverbal behavior is, at some level of perception, noticed. All those intuitive perceptions are stored as little bits and pieces of information about students that can form a composite impression of a student's ability. Without ever administering a test or a quiz, teachers know a lot about their students. In fact, experienced teachers are so good at this almost subliminal process of assessment that their estimates of a student's competence are often highly correlated with actual independently administered test scores. One of the objectives of such observation is to assess students without their awareness (and possible consequent anxiety) of the observation so that the naturalness of their linguistic performance is maximized.

In order to carry out classroom observation, it is of course important to take the following steps:
1. Determine the specific objectives of the observation.
2. Decide how many students will be observed at one time.
3. Set up the logistics for making unnoticed observations.
4. Design a system for recording observed performances.
5. Do not overestimate the number of different elements you can observe at one
time-keep them very limited.
6. Plan how many observations you will make.
7. Determine specifically how you will use the results.

Designing a system for observing is no simple task. Recording your observations can take the form of anecdotal records, checklists, or rating scales. Anecdotal records should be as specific as possible in focusing on the objective of the observation, but they are so varied in form that to suggest a format here would be counterproductive. Their very purpose is more note-taking than record-keeping. The key is to devise a system that maintains the principle of reliability as closely as possible. Checklists are a viable alternative for recording observation results. Some checklists of student classroom performance, such as the COLT observation scheme devised by Spada and Frohlich (1995), are elaborate grids referring to such variables as


  •  whole-class, group, and individual participation,
  •  content of the topic,
  •  linguistic competence (form, function, discourse, sociolinguistic),
  •  materials being used, and
  •  skill (listening, speaking, reading, writing).


If you scrutinize observations under the microscope of the principles of assessment, you will probably find moderate practicality and reliability in this type of procedure, especially if the objectives are kept very simple. Face validity and content validity are likely to get high marks since observations are likely to be integrated into the ongoing process of a course. Washback is only moderate if you do little follow-up on observing. Some observations for research purposes may yield no washback whatever if the researcher simply disappears with the information and never communicates anything back to the student. But a subsequent conference with a student can then yield very high washback as the student is made aware of empirical data on targeted performance. Authenticity is high because, if an observation goes relatively unnoticed by the student, there is little likelihood of contrived contexts or play-acting.

SELF- AND PEER ASSESSMENTS

A conventional view of language assessment might consider the notion of self- and peer-assessment as an absurd reversal of politically correct power relationships. After all, how could learners who are still in the process of acquisition, especially its early stages, be capable of rendering an accurate assessment of their own performance? Nevertheless, a closer look at the acquisition of any skill reveals the importance, if not the necessity, of self-assessment and the benefit of peer-assessment. What successful learner has not developed the ability to monitor his or her own performance and to use the data gathered for adjustments and corrections? Most successful learners extend the learning process well beyond the classroom and the presence of a teacher or tutor, autonomously mastering the art of self-assessment.
Self-assessment derives its theoretical justification from a number of well-established principles of second language acquisition. The principle of autonomy stands out as one of the primary foundation stones of successful learning. The ability to set one's own goals both within and beyond the structure of a classroom curriculum, to pursue them without the presence of an external prod, and to independently monitor that pursuit are all keys to success. Developing the intrinsic motivation that comes from a self-propelled desire to excel is at the top of the list of keys to successful acquisition of any set of skills.
Researchers (such as Brown & Hudson, 1998) agree that the above theoretical underpinnings of self- and peer-assessment offer certain benefits: direct involvement of students in their own destiny, the encouragement of autonomy, and increased motivation because of their self-involvement. Of course, some noteworthy drawbacks must also be taken into account. Subjectivity is a primary obstacle to overcome. Students may be either too harsh on themselves or too self-flattering, or they may not have the necessary tools to make an accurate assessment. Also, especially in the case of direct assessments of performance (see below), they may not be able to discern their own errors. In contrast, Bailey (1998) conducted a study in which learners showed moderately high correlations (between .58 and .64) between self-rated oral production ability and scores on the OPI, which suggests that in the assessment of general competence, learners' self-assessments may be more accurate than one might suppose.

Types of Self- and Peer-Assessment

1. Assessment of [a specific] performance.

In this category, a student typically monitors him- or herself, in either oral or written production, and renders some kind of evaluation of performance. The evaluation takes place immediately or very soon after the performance. Thus, having made an oral presentation, the student (or a peer) fills out a checklist that rates performance on a defined scale. Or perhaps the student views a video-recorded lecture and completes a self-corrected comprehension quiz. A journal may serve as a tool for such self-assessment. The availability of media opens up a number of possibilities for self- and peer-assessment beyond the classroom. Internet sites such as Dave's ESL Café (http://www.eslcafe.com/) offer many self-correcting quizzes and tests. On this and other similar sites, a learner may access a grammar or vocabulary quiz on the Internet and then self-score the result, which may be followed by comparing the results with a partner. Television and film media also offer convenient resources for self- and peer-assessment. Gardner (1996) recommended that students in non-English-speaking countries access bilingual news, films, and television programs and then self-assess their comprehension ability. He also noted that video versions of movies with subtitles can be viewed first without the subtitles, then with them, as another form of self- and/or peer-assessment.


 2. Indirect assessment of [general] competence.

Indirect self- or peer-assessment targets larger slices of time with a view to rendering an evaluation of general ability, as opposed to one specific, relatively time-constrained performance. The distinction between direct and indirect assessments is the classic competence-performance distinction. Self- and peer-assessments of performance are limited in time and focus to a relatively short performance.
Of course, indirect self- and peer-assessment is not confined to scored rating sheets and questionnaires. An ideal genre for self-assessment is the journal, in which students engage in more open-ended assessment and/or make their own further comments on the results of completed checklists.


3. Metacognitive assessment [for setting goals].

Some kinds of evaluation are more strategic in nature, with the purpose not just of reviewing past performance or competence but of setting goals and maintaining an eye on the process of their pursuit. Personal goal-setting has the advantage of fostering intrinsic motivation and of providing learners with that extra-special impetus from having set and accomplished one's own goals. Strategic planning and self-monitoring can take the form of journal entries, choices from a list of possibilities, questionnaires, or cooperative (oral) pair or group planning. A simple illustration of goal-setting self-assessment was offered by Smolen, Newman, Wathen, and Lee (1995). In response to the assignment of making "goal cards," a middle-school student wrote:

1. my goal for this week is to stop during reading and predict what is going to happened next in the story
2. my goal for this week is to finish writing my superman story

4. Socioaffective assessment.

Yet another type of self- and peer-assessment comes in the form of methods of examining affective factors in learning. Such assessment is quite different from looking at and planning linguistic aspects of acquisition. It requires looking at oneself through a psychological lens and may not differ greatly from self-assessment across a number of subject-matter areas or for any set of personal skills. When learners resolve to assess and improve motivation, to gauge and lower their own anxiety, and to find mental or emotional obstacles to learning and then plan to overcome those barriers, an all-important socioaffective domain is invoked.


5. Student-generated tests.

A final type of assessment that is not usually classified strictly as self- or peer-assessment is the technique of engaging students in the process of constructing tests themselves. The traditional view of what a test is would never allow students to engage in test construction, but student-generated tests can be productive, intrinsically motivating, autonomy-building processes. Gorsuch (1998) found that student-generated quiz items transformed routine weekly quizzes into a collaborative and fulfilling experience. Students in small groups were directed to create content questions on their reading passages and to collectively choose six vocabulary items for inclusion on the quiz. The process of creating questions and choosing lexical items served as a more powerful reinforcement of the reading than any teacher-designed quiz could ever be.
To add further interest, Gorsuch directed students to keep records of their own scores to plot their progress through the term. Murphey (1995), another champion of self- and peer-generated tests, successfully employed the technique of directing students to generate their own lists of words, grammatical concepts, and content that they think are important over the course of a unit. Murphey synthesizes these into a review list, and all items on the test come from that list. Students thereby have a voice in determining the content of tests. On other occasions, Murphey has used what he calls "interactive pair tests," in which students assess each other using a set of quiz items. One student's response aptly summarized the impact of this technique:

We had a test today. But it was not a test, because we would study for it beforehand. I have someone question to my partner and my partner gave me some question . and we students decided what grade we should get. I hate tests, but I like this kind of test, so please dont give us a surprise test. I think, that kind of test that we did today is more useful for me than a surprise test because I study for it.

Guidelines for Self- and Peer-Assessment
Self- and peer-assessment are among the best possible formative types of assessment and possibly the most rewarding, but they must be carefully designed and administered for them to reach their potential. Four guidelines will help teachers bring this intrinsically motivating task into the classroom successfully.


  1. Tell students the purpose of the assessment.
  2. Define the task(s) clearly.
  3. Encourage impartial evaluation of performance or ability.
  4. Ensure beneficial washback through follow-up tasks.


A Taxonomy of Self- and Peer-Assessment Tasks

To sum up the possibilities for self- and peer-assessment, it is helpful to consider a variety of tasks within each of the four skills.

Self- and peer-assessment tasks

Listening Tasks
listening to TV or radio broadcasts and checking comprehension with a partner
listening to bilingual versions of a broadcast and checking comprehension
asking when you don't understand something in pair or group work
listening to an academic lecture and checking yourself on a "quiz" of the content
setting goals for creating/increasing opportunities for listening
Speaking Tasks
filling out student self-checklists and questionnaires
using peer checklists and questionnaires
rating someone's oral presentation (holistically)
detecting pronunciation or grammar errors on a self-recording
asking others for confirmation checks in conversational settings
setting goals for creating/increasing opportunities for speaking
Reading Tasks
reading passages with self-check comprehension questions following
reading and checking comprehension with a partner
taking vocabulary quizzes
taking grammar and vocabulary quizzes on the Internet
conducting self-assessment of reading habits
setting goals for creating/increasing opportunities for reading
Writing Tasks
revising written work on your own
revising written work with a peer (peer editing)
proofreading
using journal writing for reflection, assessment, and goal-setting
setting goals for creating/increasing opportunities for writing

An evaluation of self- and peer-assessment according to our classic principles of assessment yields a pattern that is quite consistent with the other alternatives in assessment that have been analyzed in this chapter. Practicality can achieve a moderate level with such procedures as checklists and questionnaires, while reliability risks remaining at a low level, given the variation within and across learners. Once students accept the notion that they can legitimately assess themselves, face validity can be raised from what might otherwise be a low level. Adherence to course objectives will maintain a high degree of content validity. Authenticity and washback both have very high potential because students are centering on their own linguistic needs and are receiving useful feedback.


Source:

Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.

Monday, 06 April 2020

SUMMARY STANDARDS-BASED ASSESSMENT

In the previous chapter, you saw that a standardized test is an assessment instrument for which there are uniform procedures for administration, design, scoring, and reporting. It is also a procedure that, through repeated administration and ongoing research, demonstrates criterion and construct validity. But a third, and perhaps the most important, element of standardized testing is the presupposition of an accepted set of standards on which to base the procedure.
A history of standardized testing in the United States reveals that during most of the decades in the middle of the twentieth century, standardized tests enjoyed a popularity and growth that was almost unchallenged. Toward the end of the twentieth century, such claims began to be challenged on all fronts (see Medina & Neill, 1990; Kohn, 2000), and at the vanguard of those challenges were the teachers of those millions of children. Teachers saw not only possible inequity in such tests but also a disparity between the content and tasks of the tests and what they were teaching in their classes. Were those tests accurate measures of achievement and success in the specified domains? Were they efficient, well-researched instruments based on carefully framed, comprehensive, validated standards of achievement? For the most part, they were not.
As educators became aware of this weakness, we saw the advent of a movement to establish standards on which students of all ages and subject-matter areas might be assessed. Appropriately, the last 20 years have seen a mushrooming of efforts on the part of educational leaders to base the plethora of school-administered standardized tests on clearly specified criteria within each content area being measured. For example, most departments of education at the state level in the United States have now specified (or are in the process of specifying) the appropriate standards (that is, criteria or objectives) for each grade level (kindergarten to grade 12) and each content area (math, language, sciences, arts). Construction of such standards makes possible a concordance between standardized test specifications and the goals and objectives of educational programs. And so, in the broad domain of language arts, teachers and educational administrators began the painstaking process of carefully examining existing curricular goals, conducting needs assessments among students, and designing appropriate assessments of those standards.

ELD STANDARDS

In creating such "benchmarks for accountability" (O'Malley & Valdez Pierce, 1996), there is a tremendous responsibility to carry out a comprehensive study of a number of domains:

  • literally thousands of categories of language ranging from phonology at one end of a continuum to discourse, pragmatics, functional, and sociolinguistic elements at the other end;
  • specification of what ELD students' needs are, at thirteen different grade levels, for succeeding in their academic and social development;
  • a consideration of what is a realistic number and scope of standards to be included within a given curriculum;
  • a separate set of standards (qualifications, expertise, training) for teachers to teach ELD students successfully in their classrooms; and
  • a thorough analysis of the means available to assess student attainment of those standards.


Standards setting is a global challenge. In many non-English-speaking countries, English is now a required subject starting as early as the first grade in some countries and by the seventh grade in virtually every country worldwide. In Japan and Korea, for example, a "communicative" curriculum in English is required from third grade onward. California, with one of the largest populations of second language learners in the United States, was one of the first states to generate standards, and other states have followed with similar sets of standards. Students must be prepared to use English effectively in social and academic settings. Listening and speaking skills provide one of the most important building blocks for the foundation of second language acquisition. These skills are essential for developing reading and writing skills in English; however, to ensure that ELLs acquire proficiency in English listening, speaking, reading, and writing, it is important that students receive reading and writing instruction in English while they are developing fluency in oral English. To ensure that ELLs develop the skills and concepts needed to demonstrate proficiency on the English-Language Arts (ELA) Listening and Speaking standards, teachers must concurrently use both the ELD and the ELA standards. ELLs achieving at the Advanced ELD proficiency level should demonstrate proficiency on the ELA standards for their own and all prior grade levels. This means that all prerequisite skills needed to achieve the ELA standards must be learned by the Early Advanced ELD proficiency level. ELLs must develop both fluency in English and proficiency on the ELA standards.

ELD ASSESSMENT

The development of standards obviously implies the responsibility for correctly assessing their attainment. As standards-based education became more accepted in the 1990s, many school systems across the United States found that the standardized tests of past decades were not in line with newly developed standards. Thus began the interactive process not only of developing standards but also of creating standards-based assessments.
The process of administering a comprehensive, valid, and fair assessment of ELD students continues to be perfected. Stringent budgets within departments of education worldwide predispose many in decision-making positions to rely on traditional standardized tests for ELD assessment, but rays of hope lie in the exploration of more student-centered approaches to learner assessment. Stack, Stack, and Fern (2002), for example, reported on a portfolio assessment system in the San Francisco Unified School District called the Language and Literacy Assessment Rubric (LALAR), in which multiple forms of evidence of students' work are collected. Teachers observe students year-round and record their observations on scannable forms. The LALAR system provides useful data on students' performance at all grade levels for oral production and for reading and writing performance in the elementary and middle school grades (1-8). Further research is ongoing for the high school levels (grades 9-12).

CASAS AND SCANS

At the higher levels of education (colleges, community colleges, adult schools, language schools, and workplace settings), standards-based assessment systems have also had an enormous impact. The Comprehensive Adult Student Assessment System (CASAS), for example, is a program designed to provide broadly based assessments of ESL curricula across the United States. The system includes more than 80 standardized assessment instruments used to place learners in programs, diagnose learners' needs, monitor progress, and certify mastery of functional basic skills. The Secretary's Commission on Achieving Necessary Skills (SCANS) outlines competencies necessary for language in the workplace. The competencies cover language functions in terms of


  • resources (allocating time, materials, staff, etc.),
  • interpersonal skills, teamwork, customer service, etc.,
  • information processing, evaluating data, organizing files, etc.,
  • systems (e.g., understanding social and organizational systems), and
  • technology use and application.

These five competencies are acquired and maintained through training in the basic skills (reading, writing, listening, speaking); thinking skills, such as reasoning and creative problem solving; and personal qualities, such as self-esteem and sociability.

TEACHER STANDARDS

Kuhlman (2001) emphasized the importance of teacher standards in three domains:
1. linguistics and language development
2. culture and the interrelationship between language and culture
3. planning and managing instruction
Such standards have also been addressed by the international association Teachers of English to Speakers of Other Languages (TESOL). TESOL's standards committee advocates performance-based assessment of teachers for the following reasons:


  • Teachers can demonstrate the standards in their teaching.
  • Teaching can be assessed through what teachers do with their learners in their classrooms or virtual classrooms (their performance).
  • This performance can be detailed in what are called "indicators": examples of evidence that the teacher can meet a part of a standard.
  • The processes used to assess teachers need to draw on complex evidence of performance. In other words, indicators are more than simple "how to" statements.
  • Performance-based assessment of the standards is an integrated system. It is neither a checklist nor a series of discrete assessments.
  • Each assessment within the system has performance criteria against which the performance can be measured.
  • Performance criteria identify to what extent the teacher meets the standard.
  • Student learning is at the heart of the teacher's performance.


The standards-based approach to teaching and assessment presents the profession with many challenges. However thorny those issues are, the social consequences of this movement cannot be ignored, especially in terms of student assessment.

THE CONSEQUENCES OF STANDARDS-BASED
AND STANDARDIZED TESTING

One story, as told by Russell Webster (personal communication), illustrates the high-stakes nature of a globally marketed standardized test, the TOEFL. A ring of enterprising "business" persons organized a group of pretend test-takers to take the TOEFL in an early time zone on a given day. (In those days, the tests were administered everywhere on the same day across a number of time zones, so TOEFL administrations ended in some East Asian countries as much as 8 to 14 hours before they began in the United States.)
The task of each test-taking "spy" was not to pass the TOEFL but to memorize a subset of items, including the stimulus and all of the multiple-choice options, and immediately upon leaving the exam to telephone those items to the central organizers. As the memorized subsections were called in, a complete form of the TOEFL was quickly reconstructed. The organizers had employed expert consultants to generate the correct response for each item, thereby re-creating the test items and their correct answers! For an outrageous price of many thousands of dollars, prearranged buyers of the results were given copies of the test items and correct responses with a few hours to spare before entering a test administration in the Western Hemisphere. The story of how this underhanded group of entrepreneurs was caught and brought to justice is a long tale of blockbuster spy-novel proportions involving the FBI and, eventually, international investigators. But the story shows the huge gate-keeping role of tests like the TOEFL and the high price that some were willing to pay to gain access to a university in the United States and the visa that accompanied it.

Consider the fact that correlations between TOEFL scores and academic performance in the first year of college are impressively high (Henning & Cascallar, 1992). Are tests that lack a high level of content validity appropriate assessments of ability? A good deal of research says yes to this question as well. A study of the correlation of TOEFL results with oral and written production, for example, showed that years before TOEFL's current use of an essay and oral production section, significant positive correlations were obtained between all subsections of the TOEFL and independent direct measures of oral and written production (Henning & Cascallar, 1992). Test promoters commonly use such findings to support their claims for the efficacy of their tests. But several nagging, persistent issues emerge from the arguments about the consequences of standardized testing. Consider the following interrelated questions:

  1. Should the educational and business world be satisfied with high but not perfect probabilities of accurately assessing test-takers on standardized instruments? In other words, what about the small minority who are not fairly assessed?
  2. Regardless of construct validation studies and correlation statistics, should further types of performance be elicited in order to get a more comprehensive picture of the test-taker?
  3. Does the proliferation of standardized tests throughout a young person's life give rise to test-driven curricula, diverting the attention of students from creative or personal interests and in-depth pursuits?
  4. Is the standardized test industry in effect promoting a cultural, social, and political agenda that maintains existing power structures by assuring opportunity to an elite (wealthy) class of people?


Test Bias

It is no secret that standardized tests involve a number of types of test bias. That bias comes in many forms: language, culture, race, gender, and learning styles (Medina & Neill, 1990). The National Center for Fair and Open Testing, in its bimonthly newsletter FairTest, offers dozens of instances every year of claims of test bias from teachers, parents, students, and legal consultants (see their website: www.fairtest.org). For example, reading selections in standardized tests may use a passage from a literary piece that reflects a middle-class, white, Anglo-Saxon norm. Consider the following prompt for an essay on "general writing ability" on the IELTS:
You rent a house through an agency. The heating system has stopped working. You phoned the agency a week ago, but it has still not been mended. Write a letter to the agency. Explain the situation and tell them what you want them to do about it.

In an era when we seek to recognize the multiple intelligences present within every student (Gardner, 1983, 1999), is it not likely that standardized tests promote logical-mathematical and verbal-linguistic intelligences to the virtual exclusion of the other contextualized, integrative intelligences? Only very recently have traditionally receptive tests begun to include written and oral production in their test batteries, a positive sign.

Test-Driven Learning and Teaching

Yet another consequence of standardized testing is the danger of test-driven learning and teaching. When students and other test-takers know that one single measure of performance will determine their lives, they are less likely to take a positive attitude toward learning. The motives in such a context are almost exclusively extrinsic, with little likelihood of stirring intrinsic interest. Test-driven learning is a worldwide issue. In Japan, Korea, and Taiwan, to name just a few countries, students approaching their last year of secondary school focus obsessively on passing the year-end college entrance examination, a major section of which is English (Kuba, 2002). Little attention is given to any topic or task that does not directly contribute to passing that one exam. In the United States, high school seniors are forced to give almost as much attention to SAT scores. Teachers also get caught up in the wave of test-driven systems. In Florida, elementary school teachers were recently promised cash bonuses of $100 per student as a reward for their schools' high performance on the state-mandated grade-level test, the Florida Comprehensive Achievement Exam (FairTest, 2000). The effect of this policy was undue pressure on teachers to make sure their students excelled on the exam, possibly at the risk of ignoring other objectives in their curricula. But a further, ultimately more serious effect was to punish schools in lower-socioeconomic neighborhoods. A teacher in such a school might actually be a superb teacher, and that teacher's students might make excellent progress through the school year, but because of the test-driven policy, the teacher would receive no reward at all.

ETHICAL ISSUES: CRITICAL LANGUAGE TESTING
One of the by-products of a rapidly growing testing industry is the danger of an abuse of power. Shohamy (1997) and others (such as Spolsky, 1997; Hamp-Lyons, 2001) see the ethics of testing as an extension of what educators call critical pedagogy, or more precisely in this case, critical language testing (see TBP, Chapter 23, for some comments on critical language pedagogy in general). Proponents of a critical approach to language testing claim that large-scale standardized testing is not an unbiased process, but rather is the "agent of cultural, social, political, educational, and ideological agendas that shape the lives of individual participants, teachers, and learners" (Shohamy, 1997, p. 3). The issues of critical language testing are numerous:


  • Psychometric traditions are challenged by interpretive, individualized procedures for predicting success and evaluating ability.
  • Test designers have a responsibility to offer multiple modes of performance to account for varying styles and abilities among test-takers.
  • Tests are deeply embedded in culture and ideology.
  • Test-takers are political subjects in a political context.

These issues are not new. More than a century ago, British educator F. Y. Edgeworth (1888) challenged the potential inaccuracy of contemporary qualifying examinations for university entrance. In recent years, the debate has heated up. In 1997, an entire issue of the journal Language Testing was devoted to questions about ethics in language testing. One of the problems highlighted by the push for critical language testing is the widespread conviction, already alluded to above, that carefully constructed standardized tests designed by reputable test manufacturers are infallible in their predictive validity. One standardized test is deemed to be sufficient; follow-up measures are considered to be too costly.
Tests promote the notion that answers to real-world problems have unambiguous right and wrong answers with no shades of gray. A corollary to the latter is that tests presume to reflect an appropriate core of common knowledge, such as the competencies reflected in the standards discussed earlier in this chapter. Logic would therefore dictate that the test-taker must buy in to such a system of beliefs in order to make the cut.

Source:
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.

Thursday, 02 April 2020

ASSIGNMENT 6

Figure 3.1 depicts various modes of elicitation and response. Are there other modes of elicitation that could be included in such a chart? Justify your additions with an example of each.

Answer :

Elicitation modes and responses:

a. Student speaking
Example: students convey their ideas while learning, use pronouns with good grammar, and tell stories in front of the class.

Responses: students are able to understand and convey their ideas well, and to use pronouns correctly.

b. Student writing           

Example: students write an introductory essay using good and correct grammar rules, and also write other texts with standard grammar.

Responses: students are able to write essays and other texts well and to use good grammar.
