Thursday, March 26, 2020

SUMMARY DESIGNING CLASSROOM LANGUAGE TESTS

TEST TYPES

The first task you will face in designing a test for your students is to determine the purpose for the test. Defining your purpose will help you choose the right kind of test, and it will also help you to focus on the specific objectives of the test.

 Language Aptitude Tests

A language aptitude test is designed to measure a person's capacity or general ability to learn a foreign language and ultimate success in that undertaking.

Tasks in the Modern Language Aptitude Test

Number learning: Examinees must learn a set of numbers through aural input and then discriminate different combinations of those numbers.
Phonetic script: Examinees must learn a set of correspondences between speech sounds and phonetic symbols.
Spelling clues: Examinees must read words that are spelled somewhat phonetically, and then select from a list the one word whose meaning is closest to the "disguised" word.
Words in sentences: Examinees are given a key word in a sentence and are then asked to select a word in a second sentence that performs the same grammatical function as the key word.
Paired associates: Examinees must quickly learn a set of vocabulary words from another language and memorize their English meanings.

Any test that claims to predict success in learning a language is undoubtedly flawed because we now know that with appropriate self-knowledge, active strategic involvement in learning, and/or strategies-based instruction, virtually everyone can succeed eventually.

Proficiency Tests

A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension. A typical example of a standardized proficiency test is the Test of English as a Foreign Language (TOEFL), produced by the Educational Testing Service. The TOEFL is used by more than a thousand institutions of higher education in the United States as an indicator of a prospective student's ability to undertake academic work in an English-speaking milieu.
A key issue in testing proficiency is how the constructs of language ability are specified. The tasks that test-takers are required to perform must be legitimate samples of English language use in a defined context. Creating these tasks and validating them with research is a time-consuming and costly process.

Placement Tests

Certain proficiency tests can act in the role of placement tests, the purpose of which is to place a student into a particular level or section of a language curriculum or school. A placement test usually, but not always, includes a sampling of the material to be covered in the various courses in a curriculum; a student's performance on the test should indicate the point at which the student will find material neither too easy nor too difficult but appropriately challenging.
Placement tests come in many varieties: assessing comprehension and production, responding through written and oral performance, open-ended and limited responses, selection (e.g., multiple-choice) and gap-filling formats, depending on the nature of a program and its needs. Some programs simply use existing standardized proficiency tests because of their obvious advantages in practicality: cost, speed in scoring, and efficient reporting of results.
In a recent one-month special summer program in English conversation and writing at San Francisco State University, 30 students were to be placed into one of two sections. The ultimate objective of the placement test (consisting of a five-minute oral interview and an essay-writing task) was to find a performance-based means to divide the students evenly into two sections.

Diagnostic Tests

A diagnostic test is designed to diagnose specified aspects of a language. A test in pronunciation, for example, might diagnose the phonological features of English that are difficult for learners. Usually, such tests offer a checklist of features for the administrator (often the teacher) to use in pinpointing difficulties. A writing diagnostic would elicit a writing sample from students that would allow the teacher to identify those rhetorical and linguistic features on which the course needed to focus special attention. Achievement tests analyze the extent to which students have acquired language features that have already been taught; diagnostic tests should elicit information on what students need to work on in the future. Therefore, a diagnostic test will typically offer more detailed subcategorized information on the learner.

Achievement Tests

An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are (or should be) limited to particular material addressed in a curriculum within a particular time frame and are offered after a course has focused on the objectives in question. Achievement tests can also serve the diagnostic role of indicating what a student needs to continue to work on in the future, but the primary role of an achievement test is to determine whether course objectives have been met, and appropriate knowledge and skills acquired, by the end of a period of instruction.

The specifications for an achievement test should be determined by
  • the objectives of the lesson, unit, or course being assessed,
  • the relative importance (or weight) assigned to each objective,
  • the tasks employed in classroom lessons during the unit of time,
  • practicality issues, such as the time frame for the test and turnaround time, and
  • the extent to which the test structure lends itself to formative washback.

SOME PRACTICAL STEPS TO TEST CONSTRUCTION

You may think that every test you devise must be a wonderfully innovative instrument that will garner the accolades of your colleagues and the admiration of your students. Not so. First, new and innovative testing formats take a lot of effort to design and a long time to refine through trial and error. Second, traditional testing techniques can, with a little creativity, conform to the spirit of an interactive, communicative language curriculum. Your best tack as a new teacher is to work within the guidelines of accepted, known, traditional testing techniques. Below are some practical steps in constructing classroom tests.


Assessing Clear, Unambiguous Objectives

In addition to knowing the purpose of the test you're creating, you need to know as specifically as possible what it is you want to test. If you're lucky, someone will have already stated those objectives clearly in performance terms. If you're a little less fortunate, you may have to go back through a unit and formulate them yourself. Let's say you have been teaching a unit in a low-intermediate integrated-skills class with an emphasis on social conversation, and involving some reading and writing, that includes the objectives outlined below, either stated already or as you have reframed them. Notice that each objective is stated in terms of the performance elicited and the target linguistic domain.

Drawing Up Test Specifications

Test specifications for classroom use can be a simple and practical outline of your test. These informal, classroom-oriented specifications give you an indication of:
  • the topics (objectives) you will cover,
  • the implied elicitation and response formats for items,
  • the number of items in each section, and
  • the time to be allocated for each.
Notice that three of the six possible speaking objectives are not directly tested. This decision may be based on the time you devoted to these objectives, but more likely on the feasibility of testing that objective or simply on the finite number of minutes available to administer the test.

Devising Test Tasks

Your oral interview comes first, and so you draft questions to conform to the accepted pattern of oral interviews. You begin and end with nonscored items (warm-up and wind-down) designed to set students at ease, and then sandwich between them items intended to test the objective (level check) and a little beyond (probe). Ideally, you would try out all your tests on students not in your class before actually administering the tests. But in our daily classroom teaching, the tryout phase is almost impossible. Alternatively, you could enlist the aid of a colleague to look over your test. And so you must do what you can to bring to your students an instrument that is, to the best of your ability, practical and reliable.
In the final revision of your test, imagine that you are a student taking the test. Go through each set of directions and all items slowly and deliberately. Time yourself. (Often we underestimate the time students will need to complete a test.) If the test should be shortened or lengthened, make the necessary adjustments. Make sure your test is neat and uncluttered on the page, reflecting all the care and precision you have put into its construction. If there is an audio component, as there is in our hypothetical test, make sure that the script is clear, that your voice and any other voices are clear, and that the audio equipment is in working order before starting the test.

Designing Multiple-Choice Test Items

In the sample achievement test above, two of the five components (both of the listening sections) specified a multiple-choice format for items. This was a bold step to take. Multiple-choice items, which may appear to be the simplest kind of item to construct, are extremely difficult to design correctly. Hughes (2003, pp. 76-78) cautions against a number of weaknesses of multiple-choice items:

  • The technique tests only recognition knowledge.
  • Guessing may have a considerable effect on test scores.
  • The technique severely restricts what can be tested.
  • It is very difficult to write successful items.
  • Washback may be harmful.
  • Cheating may be facilitated.



The following guidelines can help in designing multiple-choice items:

  1. Design each item to measure a specific objective.
  2. State both stem and options as simply and directly as possible.
  3. Make certain that the intended answer is clearly the only correct one.
  4. Use item indices to accept, discard, or revise items.


  • Item facility (or IF) is the extent to which an item is easy or difficult for the proposed group of test-takers.
  • Item discrimination (ID) is the extent to which an item differentiates between high- and low-ability test-takers.
  • Distractor efficiency is one more important measure of a multiple-choice item's value in a test, and one that is related to item discrimination.
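These indices come down to simple proportions, and a short sketch can make them concrete. The function names, the sample data, and the equal-sized high/low grouping below are invented for illustration; they are not from the book:

```python
from collections import Counter

def item_facility(responses):
    """IF = proportion of test-takers who answered the item correctly.
    `responses` is a list of booleans (True = correct)."""
    return sum(responses) / len(responses)

def item_discrimination(high_group, low_group):
    """ID contrasts performance on one item by the highest- and
    lowest-scoring test-takers on the whole test:
    (correct in high group - correct in low group) / group size."""
    n = len(high_group)  # assumes equal-sized groups
    return (sum(high_group) - sum(low_group)) / n

# Example: 10 test-takers, 7 of whom answered the item correctly
responses = [True, True, True, False, True, False, True, True, False, True]
print(item_facility(responses))          # 0.7

# Top 5 scorers vs. bottom 5 scorers on the same item
high = [True, True, True, True, False]   # 4 correct
low  = [True, False, False, True, False] # 2 correct
print(item_discrimination(high, low))    # 0.4

# Distractor efficiency: how often each wrong option attracted takers
choices = ["A", "C", "A", "B", "A", "D", "A", "A", "C", "A"]  # key = "A"
print(Counter(c for c in choices if c != "A"))
```

An item with IF near 0 or 1 tells you little about differences among students, and a distractor that nobody chooses is doing no work; both are candidates for revision.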


SCORING, GRADING, AND GIVING FEEDBACK
Scoring

The integrated-skills class that we have been using as an example focuses on listening and speaking skills with some attention to reading and writing. Three of your nine objectives target reading and writing skills. Because oral production is a driving force in your overall objectives, you decide to place more weight on the speaking (oral interview) section than on the other three sections: five minutes is actually a long time to spend in a one-on-one situation with a student, and some significant information can be extracted from such a session.
Your next task is to assign scoring for each item. This may take a little numerical common sense, but it doesn't require a degree in math. To make matters simple, you decide to have a 100-point test in which


  • the listening and reading items are each worth 2 points.
  • the oral interview will yield four scores ranging from 5 to 1, reflecting fluency, prosodic features, accuracy of the target grammatical objectives, and discourse appropriateness. To weight these scores appropriately, you will double each individual score and then add them together for a possible total score of 40.
  • the writing sample has two scores: one for grammar/mechanics (including the correct use of so and because) and one for overall effectiveness of the message, each ranging from 5 to 1. Again, to achieve the correct weight for writing, you will double each score and add them, so the possible total is 20 points.
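The weighting scheme above can be checked with a few lines of arithmetic. In this sketch the function name and the assumption of 20 listening/reading items (to account for the remaining 40 of the 100 points) are illustrative, not from the book:

```python
def total_score(listening_correct, reading_correct, oral_ratings, writing_ratings):
    """Combine the four sections into a 100-point total.
    - listening/reading: 2 points per correct item (20 items in all -> 40 points)
    - oral_ratings: four 1-5 ratings, each doubled (max 40)
    - writing_ratings: two 1-5 ratings, each doubled (max 20)
    """
    objective = 2 * (listening_correct + reading_correct)
    oral = 2 * sum(oral_ratings)        # fluency, prosody, accuracy, discourse
    writing = 2 * sum(writing_ratings)  # grammar/mechanics, overall effectiveness
    return objective + oral + writing

# A student with 9/10 listening, 8/10 reading, oral ratings 4, 4, 3, 5,
# and writing ratings 4, 4:
print(total_score(9, 8, [4, 4, 3, 5], [4, 4]))  # 34 + 32 + 16 = 82
```

Doubling the interview and essay ratings is what gives the productive skills 60 of the 100 points, matching the decision to weight oral production most heavily.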


Grading

Your first thought might be that assigning grades to student performance on this test would be easy: just give an "A" for 90-100 percent, a "B" for 80-89 percent, and so on. Not so fast.
How you assign letter grades to this test is a product of:

  • the country, culture, and context of this English classroom,
  • institutional expectations (most of them unwritten),
  • explicit and implicit definitions of grades that you have set forth,
  • the relationship you have established with this class, and
  • student expectations that have been engendered in previous tests and quizzes in this class.


Giving Feedback

A section on scoring and grading would not be complete without some consideration of the forms in which you will offer feedback to your students, feedback that you want to become beneficial washback. You might choose to return the test to the student with one of, or a combination of, any of the possibilities below:
1. a letter grade
2. a total score
3. four subscores (speaking, listening, reading, writing)
4. for the listening and reading sections

  •  an indication of correct/incorrect responses
  •  marginal comments

5. for the oral interview

  • scores for each element being rated
  • a checklist of areas needing work
  • oral feedback after the interview
  • a post-interview conference to go over the results.

6. on the essay

  • scores for each element being rated
  • a checklist of areas needing work
  • marginal and end-of-essay comments, suggestions.
  • a post-test conference to go over work
  • a self-assessment.

7. on all or selected parts of the test, peer checking of results
8. a whole-class discussion of results of the test
9. individual conferences with each student to review the whole test.


Source:
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.

Thursday, March 19, 2020

SUMMARY OF PRACTICALITY, RELIABILITY, AND VALIDITY

1. PRACTICALITY

An effective test is practical. This means that it is not excessively expensive, stays within appropriate time constraints, is relatively easy to administer, and has a scoring/evaluation procedure that is specific and time-efficient. A test of language proficiency that takes a student five hours to complete is impractical: it consumes more time (and money) than necessary to accomplish its objective. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. In classroom-based testing, time is almost always a crucial practicality factor for busy teachers with too few hours in the day.

2. RELIABILITY

A reliable test is consistent and dependable. If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The reliability of a test may best be addressed by considering a number of factors that may contribute to the unreliability of a test. A reliable test is consistent in its conditions across two or more administrations; gives clear directions for scoring/evaluation; has uniform rubrics for scoring/evaluation; lends itself to consistent application of those rubrics by the scorer; and contains items/tasks that are unambiguous to the test-taker.

3. VALIDITY

A valid test of reading ability actually measures reading ability, not 20/20 vision, nor previous knowledge of a subject, nor some other variable of questionable relevance. How is the validity of a test established? There is no final, absolute measure of validity, but several different kinds of evidence may be invoked in support. In some cases, it may be appropriate to examine the extent to which a test calls for performance that matches that of the course or unit of study being tested. Another way of understanding content validity is to consider the difference between direct and indirect testing. Direct testing involves the test-taker in actually performing the target task. Validity is a complex concept, yet it is indispensable to the teacher's understanding of what makes a good test. If in your language teaching you can attend to the practicality, reliability, and validity of tests of language, whether those tests are classroom tests related to a part of a lesson, final exams, or proficiency tests, then you are well on the way to making accurate judgments about the competence of the learners with whom you are working.

4. AUTHENTICITY

Bachman and Palmer (1996, p. 23) define authenticity as "the degree of correspondence of the characteristics of a given language test task to the features of a target language task," and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items.

In a test, authenticity may be present in the following ways:
  • The language in the test is as natural as possible.
  • Items are contextualized rather than isolated.
  • Topics are meaningful (relevant, interesting) for the learner.
  • Some thematic organization to items is provided, such as through a story line or episode.
  • Tasks represent, or closely approximate, real-world tasks.

5.WASHBACK

In large-scale assessment, washback generally refers to the effects tests have on instruction in terms of how students prepare for the test. The challenge to teachers is to create classroom tests that serve as learning devices through which washback is achieved. Washback also implies that students have ready access to you to discuss the feedback and evaluation you have given. While you almost certainly have known teachers with whom you wouldn't dare argue about a grade, an interactive, cooperative, collaborative classroom can nevertheless promote an atmosphere of dialogue between students and teachers regarding evaluative judgments. For learning to continue, students need to have a chance to give feedback on your feedback, to seek clarification of any issues that are fuzzy, and to set new and appropriate goals for themselves for the days and weeks ahead.





Source:
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.

             

Assignment Three

Practicality, reliability, and validity analysis of the Indonesian national examination 2018/2019.

1. PRACTICALITY

Tests are practical if they are easy to take and easy to administer. In my opinion, the national examination is very practical because students are given a lot of freedom to do the easy questions first, and the papers are easily checked by the teachers because questions like these already have an answer key.

  2. RELIABILITY

The test must be reliable; that is, the questions must be consistent and accurate, whether in one UN (national examination) subject or compared across subjects such as mathematics, chemistry, and others. However, because facilities differ between cities and villages even under the same standard, it is almost certain that scores in the cities will be high, while the villages, which lack facilities, will score lower.

3. VALIDITY

These UN questions are valid because the UN was made by a fairly good special team; the test can recognize its target and shows what should be measured.

SUMMARY ASSESSING GRAMMAR AND VOCABULARY

ASSESSING GRAMMAR Differing notions of ‘grammar’ for assessment Introduction The study of grammar has had a long and important role in the...