Testing in Language Programs (Chapter 8), by James Dean Brown.
Validity, for example, is certainly the most significant cardinal principle of assessment evaluation. One way to begin untangling the lexical conundrum created by distinguishing among tests, assessment, and teaching is to distinguish between informal and formal assessment. Assessment is an integral part of the teaching-learning cycle.
In (a), the modal should is ambiguous and the expected performance is not stated. In (b), everyone can fulfill the act of "practicing"; no standards are stated or implied. For obvious reasons, (c) cannot be assessed. And (d) is really just a teacher's note on the type of activity to be used.
By specifying acceptable and unacceptable levels of performance, the goal can be tested. Are lesson objectives represented in the form of test specifications?
Don't let this word scare you. It simply means that a test should have a structure that follows logically from the lesson or unit you are testing. Some tests, of course, do not lend themselves to this kind of structure. We will return to the concept of test specs in the next chapter.
The content validity of an existing classroom test should be apparent in how the objectives of the unit being tested are represented in the form of the content of items, clusters of items, and item types. Do you clearly perceive the performance of test-takers as reflective of the classroom objectives? If so, and you can argue this, content validity has probably been achieved. In evaluating a classroom test, also consider the extent to which the following before-, during-, and after-test strategies are in place.
Test-taking strategies

Before the Test
1. Give students all the information you can about the test: Which topics will be the most important? What kind of items will be on it? How long will it be?
2. Encourage students to do a systematic review of material.
3. Give them practice tests or exercises, if available.
4. Facilitate formation of a study group, if possible.
5. Caution students to get a good night's rest before the test.
6. Remind students to get to the classroom early.

During the Test
1. After the test is distributed, tell students to look over the whole test quickly in order to get a good grasp of its different parts.
2. Remind them to mentally figure out how much time they will need for each part.
3. Advise them to concentrate as carefully as possible.
4. Warn students a few minutes before the end of the class period so that they can finish on time, proofread their answers, and catch careless errors.

After the Test
1. When you return the test, include feedback on specific things the student did well, what he or she did not do well, and, if possible, the reasons for your comments.
2. Advise students to pay careful attention in class to whatever you say about the test results.
3. Encourage questions from students.
4. Advise students to pay special attention in the future to points on which they are weak.
Keep in mind that what comes before and after the test also contributes to its face validity. Good class preparation will give students a comfort level with the test, and good feedback (washback) will allow them to learn from it.
Are the test tasks as authentic as possible? Evaluate the extent to which a test is authentic by asking the following questions:

Multiple-choice tasks, contextualized: "Going To"
1. "What ___ this weekend?" "I'm not sure."
   a. are you going to do
   b. you are going to do
   c. ...
2. "My friend Melissa and I ___ a party. Would you like to come?" "I'd love to!"
3. "___ going to be?" "It's going to be at Ruth's house."
   a. What's it
   b. Who's
   c. Where's it

Multiple-choice tasks, decontextualized
1. There are three countries I would like to visit. One is Italy. ___
   a. The other is New Zealand and other is Nepal.
   b. The others are New Zealand and Nepal.
   c. Others are New Zealand and Nepal.
2. When I was twelve years old, I used ___ every day.
3. When Mr. ...
4. Since the beginning of the year, I ___ at Millennium Industries.
5. When Mona broke her leg, she asked her husband ___ her to work.

The conversation in the first excerpt is one that might occur in the real world, even if with a little less formality. The sequence of items in the decontextualized tasks takes the test-taker into five different topic areas with no context for any. Each sentence is likely to be written or spoken in the real world, but not in that sequence.
Given the constraints of a multiple-choice format, on a measure of authenticity I would say the first excerpt is "good" and the second excerpt is only "fair." The design of an effective test should point the way to beneficial washback. A test that achieves content validity demonstrates relevance to the curriculum in question and thereby sets the stage for washback.
Other evidence of washback may be less visible from an examination of the test itself. Here again, what happens before and after the test is critical. Preparation time before the test can contribute to washback since the learner is reviewing and focusing in a potentially broader way on the objectives in question. By spending classroom time after the test reviewing the content, students discover their areas of strength and weakness. The key is to play down the "Whew, I'm glad that's over" feeling that students are likely to have, and play up the learning that can now take place from their knowledge of the results.
This can be particularly effective for writing performance. Peer discussion of the test results may also be an alternative to simply listening to the teacher tell everyone what they got right and wrong and why. Journal writing may offer students a specific place to record their feelings, what they learned, and their resolutions for future effort.
Validity is of course always the final arbiter. And remember, too, that these principles, important as they are, are not the only considerations in evaluating or making an effective test.
Leave some space for other factors to enter in. In the next chapter, the focus is on how to design a test. (I) Individual work; (G) Group or pair work; (C) Whole-class discussion. (G) A checklist for gauging practicality is provided on an earlier page. In your group, construct a similar checklist for either face validity, authenticity, or washback, as assigned to your group. Present your lists to the class and, in the case of multiple groups, synthesize findings into one checklist for each principle.
In a group, discuss the connection between washback and the above-named general principles of language learning and teaching. Come up with some specific examples for each. Use your best intuition to supply these evaluations, even though you don't have complete information on each context. Scenario 1: Standardized multiple-choice proficiency test, no oral or written production.
S receives a report form listing a total score and part scores for listening, grammar, proofreading, and reading comprehension. Scenario 2: Timed impromptu test of written English (TWE). S receives a report form listing one holistic score ranging between 0 and 6. Scenario 3: One-on-one oral interview to assess overall oral production ability.
S receives one holistic score ranging between 0 and 5. Scenario 4: Multiple-choice listening quiz provided by a textbook, with taped prompts, covering the content of a three-week module of a course. S receives a total score from T with no indication of which items were correct/incorrect. Scenario 5: S is given a sheet with 10 vocabulary items and directed to write 10 sentences using each word. T marks each item as acceptable or unacceptable.
Scenario 6: S reads a passage of three paragraphs and responds to six multiple-choice general comprehension items. Scenario 7: S gives a 5-minute prepared oral presentation in class. T evaluates by filling in a rating sheet indicating S's success in delivery. Scenario 8: S listens to a 15-minute video lecture and takes notes.
T makes individual comments on each S's notes. Scenario 9: S writes a take-home overnight one-page essay on an assigned topic. T reads the paper, comments on organization and content only, and returns the essay to S for a subsequent draft. Scenario 10: S assembles a portfolio of materials over a semester-long course.
T conferences with S on the portfolio at the end of the semester. Scenario 11: S writes a dialogue journal over the course of a semester. T comments on entries every two weeks. In pairs, write two or three other potential lesson objectives (addressing a proficiency level and skill area as assigned to your pair) that you think are effective. Present them to the rest of the class for analysis and evaluation. Conduct (a) a brief interview with the teacher before the test, (b) an observation, if possible, of the actual administration of the assessment, and (c) a short interview with the teacher after the fact to form your data.
Language testing and assessment (Part 1). Language Teaching, 34. Language testing and assessment (Part 2). Language Teaching, 35. Part 1 covers such issues as ethics and politics in language testing, standards-based assessment, computer-based testing, self-assessment, and other alternatives in testing.
Part 2 focuses on assessment of the skills of reading, listening, speaking, and writing, along with grammar and vocabulary. Hughes, Arthur. Testing for language teachers. Second edition. Cambridge University Press. A widely used training manual for teachers, Hughes's book contains useful information on basic principles and techniques for classroom testing.
You now have a sense of where tests belong in the larger domain of assessment. You have sorted through differences between formal and informal tests, formative and summative tests, and norm- and criterion-referenced tests. You have traced some of the historical lines of thought in the field of language assessment.
And you should now possess a few tools with which you can evaluate the effectiveness of a classroom test.
In this chapter, you will draw on those foundations and tools to begin the process of designing tests or revising existing tests. To start that process, you need to ask some critical questions: What is the purpose of the test? Why am I creating this test, or why was it created by someone else? For an evaluation of overall proficiency? To place students into a course? To measure achievement within a course? Once you have established the major purpose of a test, you can determine its objectives.
What are the objectives of the test? What specifically am I trying to find out? Included here are decisions about what language abilities are to be assessed. How will the test tasks be selected and the separate items arranged?
They should also achieve content validity by presenting tasks that mirror those of the course or segment thereof being assessed. Further, they should be able to be evaluated reliably by the teacher or scorer. The tasks themselves should strive for authenticity, and the progression of tasks ought to be biased for best performance.
Tests vary in the form and function of feedback, depending on their purpose. For every test, the way results are reported is an important consideration. Under some circumstances a letter grade or a holistic score may be appropriate; other circumstances may require that a teacher offer substantive washback to the learner. These five questions should form the basis of your approach to designing tests for your classroom. Defining your purpose will help you choose the right kind of test, and it will also help you to focus on the specific objectives of the test.
Language Aptitude Tests

Language aptitude tests are ostensibly designed to apply to the classroom learning of any language. The MLAT, for example, consists of five different tasks.

Tasks in the Modern Language Aptitude Test
1. Number learning: Examinees must learn a set of numbers through aural input and then discriminate different combinations of those numbers.
2. Phonetic script: Examinees must learn a set of correspondences between speech sounds and phonetic symbols.
3. Spelling clues: Examinees must read words that are spelled somewhat phonetically, and then select from a list the one word whose meaning is closest to the "disguised" word.
4. Words in sentences: Examinees are given a key word in a sentence and are then asked to select a word in a second sentence that performs the same grammatical function as the key word.
5. Paired associates: Examinees must quickly learn a set of vocabulary words from another language and memorize their English meanings.
Those correlations, however, presuppose a foreign language course in which success is measured by similar processes of mimicry, memorization, and puzzle-solving.
There is no research to show unequivocally that those kinds of tasks predict communicative success in a language, especially untutored acquisition of the language. Because of this limitation, standardized aptitude tests are seldom used today.
A further discussion of language aptitude can be found in PLLT, Chapter 4.

Proficiency Tests

If your aim is to test global competence in a language, then you are, in conventional terminology, testing proficiency. A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability.
Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension. Sometimes a sample of writing is added, and more recent tests also include oral production performance.
As noted in the previous chapter, such tests often have content validity weaknesses, but several decades of construct validation research have brought us much closer to constructing successful communicative proficiency tests.
Proficiency tests are almost always summative and norm-referenced. A typical example is the Test of English as a Foreign Language (TOEFL), which consists of sections on listening comprehension, structure (or grammatical accuracy), reading comprehension, and written expression. With the exception of its writing section, the TOEFL (as well as many other large-scale proficiency tests) is machine-scorable for rapid turnaround and cost-effectiveness (that is, for reasons of practicality). Research is in progress (Bernstein et al.). A key issue in testing proficiency is how the constructs of language ability are specified.
Creating these tasks and validating them with research is a time-consuming and costly process. Language teachers would be wise not to create an overall proficiency test on their own. In Part I of the ESLPT, students read a short article and then write a summary essay. In Part II, students write a composition in response to an article.
Part III is multiple-choice. The maximum time allowed for the test is three hours. Most of the ESL courses at San Francisco State involve a combination of reading and writing, with a heavy emphasis on writing. Finally, proofreading drafts of essays is a useful academic skill, and the exercise in error detection simulates the proofreading process.
Teachers and administrators in the ESL program at SFSU are satisfied with this test's capacity to discriminate appropriately, and they feel that it is a more authentic test than its multiple-choice, discrete-point, grammar-vocabulary predecessor. Reliability problems are also present but are mitigated by conscientious training of all evaluators of the test.
What is lost in practicality and reliability is gained in the diagnostic information that the ESLPT provides. Placement tests come in many varieties. Some programs simply use existing standardized proficiency tests because of their obvious advantages in practicality: cost, speed in scoring, and efficient reporting of results. The ultimate objective of a placement test is, of course, to correctly place a student into a course or level appropriate to his or her ability.
In a recent one-month special summer program in English conversation and writing at San Francisco State University, 30 students were to be placed into one of two sections. The ultimate objective of the placement test (consisting of a five-minute oral interview and an essay-writing task) was to find a performance-based means to divide the students evenly into two sections.
This objective might have been achieved easily by administering a simple grid-scorable multiple-choice grammar-vocabulary test.

Diagnostic Tests

A diagnostic test is designed to diagnose specified aspects of a language. A test in pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum.
Usually, such tests offer a checklist of features for the administrator (often the teacher) to use in pinpointing difficulties. A writing diagnostic would elicit a writing sample from students that would allow the teacher to identify those rhetorical and linguistic features on which the course needed to focus special attention. Any placement test that offers information beyond simply designating a course level may also serve diagnostic purposes.
There is also a fine line of difference between a diagnostic test and a general achievement test. Achievement tests analyze the extent to which students have acquired language features that have already been taught; diagnostic tests should elicit information on what students need to work on in the future.
A typical diagnostic test of oral production was created by Clifford Prator to accompany a manual of English pronunciation. Test-takers are directed to read a 150-word passage while they are tape-recorded. The test administrator then refers to an inventory of phonological items for analyzing a learner's production, whose main categories include stress and rhythm, among others. This information can help teachers make decisions about aspects of English phonology on which to focus.
This same information can help a student become aware of errors and encourage the adoption of appropriate compensatory strategies. Achievement Tests An achievement test is related directly to classroom lessons, units, or even a total curriculum.
Achievement tests are (or should be) limited to particular material addressed in a curriculum within a particular time frame and are offered after a course has focused on the objectives in question.
Achievement tests are often summative because they are administered at the end of a unit or term of study. They also play an important formative role. This washback contributes to the formative nature of such tests.
Here is the outline for a midterm examination offered at the high-intermediate level of an intensive English program in the United States. The course focus is on academic reading and writing; the structure of the course and its objectives may be implied from the sections of the test.

Midterm examination outline, high-intermediate
Section A. Vocabulary (Part 1: 5 items)
Section B. Grammar (10 sentences)
Section C. Reading comprehension (2 one-paragraph passages)

It is unlikely that you would be asked to design an aptitude test or a proficiency test, but for the purposes of interpreting those tests, it is important that you understand their nature.
However, your opportunities to design placement, diagnostic, and achievement tests (especially the latter) will be plentiful. In the remainder of this chapter, we will explore the four remaining questions posed at the outset, and the focus will be on equipping you with the tools you need to create such classroom-oriented tests. You may think that every test you devise must be a wonderfully innovative instrument that will garner the accolades of your colleagues and the admiration of your students.
Not so. First, new and innovative testing formats take a lot of effort to design and a long time to refine through trial and error. Your best tack as a new teacher is to work within the guidelines of accepted, known, traditional testing techniques.
In that spirit, then, let us consider some practical steps in constructing classroom tests.

Assessing Clear, Unambiguous Objectives

In addition to knowing the purpose of the test you're creating, you need to know as specifically as possible what it is you want to test. This is no way to approach a test.
Thus, a vaguely stated objective ("Students will ...") is not testable: you don't know whether students should be able to understand the forms in spoken or written language, or whether they should be able to produce them orally or in writing. Your first task in designing a test, then, is to determine appropriate objectives.
If you're a little less fortunate, you may have to go back through a unit and formulate the objectives yourself. Notice that each objective is stated in terms of the performance elicited and the target linguistic domain.
Reading skills (simple essay or story): Students will ...
Writing skills (simple essay or story): Students will ...
You may find, in reviewing the objectives of a unit or a course, that you cannot possibly test each one.

Drawing Up Test Specifications

Test specifications for classroom use can be a simple and practical outline of your test. For large-scale standardized tests (see Chapter 4) that are intended to be widely distributed and therefore are broadly generalized, test specifications are much more formal and detailed.
In the unit discussed above, your specifications will simply comprise (a) a broad outline of the test, (b) what skills you will test, and (c) what the items will look like. Let's look at the first two in relation to the midterm unit assessment already referred to above. Because of the constraints of your curriculum, your unit test must take no more than 30 minutes.
Since you have the luxury of teaching a small class (only 12 students!), you can test oral production objectives directly at that time. You determine that the 30-minute test will be divided equally in time among listening, reading, and writing. The next and potentially more complex choices involve the item types and tasks to use in this test.
There is a surprisingly limited number of modes of eliciting responses (that is, of prompting) and of responding on tests of any kind. Consider the options: the prompt can be oral or written, and the test-taker can respond orally or in writing. It's that simple. But some complexity is added when you realize that the types of prompts in each case vary widely, and within each response mode, of course, there are a number of options, all of which are depicted in Figure 3.

Figure 3. Elicitation and response modes in test construction (elicitation mode: oral or written; response mode: oral or written).

For example, it is unlikely that directions would be read aloud, nor would spelling a word be matched with a monologue. A modicum of intuition will eliminate such infeasible combinations. Armed with a number of elicitation and response formats, you have decided to design your specs as follows, based on the objectives stated earlier (for the listening section, T makes an audiotape in advance, with one other voice on it). Notice that only three of the six objectives are given tasks. This decision may be based on the time you devoted to these objectives, but more likely on the feasibility of testing that objective or simply on the finite number of minutes available to administer the test.
Notice, too, that objectives 4 and 8 are not assessed. Finally, notice that this unit was mainly focused on listening and speaking, yet 20 minutes of the 30-minute test is devoted to reading and writing tasks. Is this an appropriate decision? One more test spec that needs to be included is a plan for scoring and for assigning relative weight to each section and to each item within it. This issue will be addressed later in this chapter when we look at scoring, grading, and feedback.
Devising Test Tasks

Your oral interview comes first, and so you draft questions to conform to the accepted pattern of oral interviews (see Chapter 7 for information on constructing oral interviews).

Oral interview format
A. Level-check questions (objectives 3, 5, and 6)
1. Tell me about what you did last weekend.
2. Tell me about an interesting trip you took in the last year.
3. How did you like the TV show we saw this week?
B. Probe (objectives 5 and 6)
1. What is your opinion about ...?
2. How do you feel about ...?

The sitcom depicted a loud, noisy party with lots of small talk. Let's say your first draft of items produces the following possibilities within each section:

Test items, first draft
Listening, part a.
Listen to the sentence [on the tape]. Choose the sentence on your test page that is closest in meaning to the sentence you heard. "They sure made a mess at that party, didn't they?"
a. They didn't make a mess, did they?
b. They did make a mess, didn't they?
Listening, part b.
Choose the sentence on your test page that is the best answer to the question. "Where did George go after the party last night?"
a. Yes, he did.
b. Because he was tired.
c. To Elaine's place for another party.
d. He went home around eleven o'clock.
Reading sample items
Directions: ... "And then right away lightning strike right outside their house!"
Writing
Directions: Write a paragraph about what you liked or didn't like about one of the characters at the party in the TV sitcom we saw.
As you can see, these items are quite traditional. You might self-critically admit that the format of some of the items is contrived, thus lowering the level of authenticity.
All four skills are represented, and the tasks are varied within the 30 minutes of the test. Are the directions to each section absolutely clear? Is there an example item for each section? Does each item measure a specified objective? Is each item stated in clear, simple language? See below for a primer on creating effective distractors. Is the difficulty of each item appropriate for your students? Is the language of each item sufficiently authentic?
Do the sum of the items and the test as a whole adequately reflect the learning objectives? In the current example that we have been analyzing, your revising process is likely to result in at least four changes or additions: In both interview and writing sections, you recognize that a scoring rubric will be essential. In the listening section, part b, you intend choice "c" as the correct answer, but you realize that choice "d" is also acceptable.
You shorten it to "d. Around eleven o'clock." In the writing prompt, you can see how some students would not use the words so or because, which were in your objectives, so you reword the prompt: "Then, use the word so at least once and the word because at least once to tell why you liked or didn't like that person."
But in our daily classroom teaching, the tryout phase is almost impossible. Alternatively, you could enlist the aid of a colleague to look over your test.
Go through each set of directions and all items slowly and deliberately. Often we underestimate the time students will need to complete a test. If the test should be shortened or lengthened, make the necessary adjustments. Make sure your test is neat and uncluttered on the page, reflecting all the care and precision you have put into its construction.
If there is an audio component, as there is in our hypothetical test, make sure that the script is clear and that your voice and any other voices on the tape are distinctly audible. Multiple-choice items, which may appear to be the simplest kind of item to construct, are extremely difficult to design correctly (Hughes, pp. ...). But is the preparation phase worth the effort?
Sometimes it is, but you might spend even more time designing such items than you save in grading the test. First, a primer on terminology. Other receptive item types include true-false questions and matching lists. In the discussion here, the guidelines apply primarily to multiple-choice item types and not necessarily to other receptive types (Brown, pp. ...). Design each item to measure a specific objective. Consider this item, introduced and then revised in the sample test above:
Multiple-choice item, revised
Voice: Where did George go after the party last night?
The specific objective being tested here is comprehension of wh-questions. Distractors (b) and (d), as well as the key item (c), test comprehension of the meaning of where as opposed to why and when.
The objective has been directly addressed. Multiple-choice item, flawed Excuse me, do you know? But what does distractor c actually measure?
(Brown, p. ...) Can you think of a better distractor for (c) that would focus more clearly on the objective? State both stem and options as simply and directly as possible. We are sometimes tempted to make multiple-choice items too wordy. A good rule of thumb is to get directly to the point. Here's an example.
Multiple-choice cloze item, flawed
My eyesight has really been deteriorating lately. I wonder if I need glasses.
But if you simply want a student to identify the type of medical professional who deals with eyesight issues, those sentences are superfluous. Moreover, by lengthening the stem, you have introduced potentially confounding material. Another rule of succinctness is to remove needless redundancy from your options.
In the following item, which were is repeated in all three options. It should be placed in the stem to keep the item as succinct as possible. Make certain that the intended answer is clearly the only correct one. In the proposed unit test described earlier, the following item appeared in the original draft:
Multiple-choice item, flawed
Voice: Where did George go after the party last night?
A quick consideration of distractor (d) reveals that it is a plausible answer, along with the intended key, (c).
Eliminating unintended possible answers is often the most difficult problem of designing multiple-choice items. Use item indices to accept, discard, or revise items. Although measuring these factors on classroom tests would be useful, you probably will have neither the time nor the expertise to do this for every classroom test you create, especially one-time tests. Item facility (or IF) is the extent to which an item is easy or difficult for the proposed group of test-takers.
The answer is that an item that is too easy (say, 99 percent of respondents get it right) or too difficult (99 percent get it wrong) really does nothing to separate high-ability and low-ability test-takers. The formula looks like this:

IF = (number of students answering the item correctly) / (number of students responding to the item)

There is no absolute IF value that must be met to determine whether an item should be included in the test as is, modified, or thrown out, but appropriate test items will generally have IFs that range between .15 and .85.
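As a quick sketch, the IF calculation and a screening check against the .15 to .85 range suggested above can be written in a few lines of Python; the response data here are hypothetical:

```python
# A minimal sketch of item facility (IF). Responses are booleans
# (True = correct); the .15-.85 acceptance range follows the text.

def item_facility(responses):
    """Proportion of test-takers who answered the item correctly."""
    return sum(responses) / len(responses)

def appraise(responses, low=0.15, high=0.85):
    """Label an item 'too difficult', 'too easy', or 'acceptable'."""
    if_value = item_facility(responses)
    if if_value < low:
        return "too difficult"
    if if_value > high:
        return "too easy"
    return "acceptable"

# Hypothetical item: 13 of 20 students answered correctly
item = [True] * 13 + [False] * 7
print(item_facility(item))  # 0.65
print(appraise(item))       # acceptable
```

Items flagged at either extreme need not be discarded automatically; as the next paragraphs note, very easy and very difficult items can each serve a purpose.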
Two good reasons exist for occasionally including a very easy item, and very difficult items can provide a challenge to the highest-ability students. Item discrimination (ID) is the extent to which an item differentiates between high- and low-ability test-takers. An item on which high-ability students (who did well on the test) and low-ability students (who didn't) score equally well would have poor ID, because it does not discriminate between the two groups.
Suppose your class of 30 students has taken a test. Once you have calculated final scores for all 30 students, divide them roughly into thirds; that is, create three rank-ordered ability groups including the top 10 scores, the middle 10, and the lowest 10. One clear, practical use for ID indices is to select items from a test bank that includes more items than you need.
You might decide to discard or improve some items with lower ID because you know they won't be as powerful an indicator of success on your test. Your best calculated hunches may provide sufficient support for retaining, revising, and discarding proposed items. But if you are constructing a large-scale test, or one that will be administered multiple times, these indices are important factors.
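The procedure described above (rank students by total score, split off the top and bottom thirds, then compare the two groups on a single item) can be sketched in Python. The scores and item responses below are invented for illustration, and the ID formula used, (high correct - low correct) / group size, is one common classroom version:

```python
# A sketch of item discrimination (ID) for a class of 30, using
# hypothetical total scores and per-item response data.

def ability_groups(scores, group_size):
    """Rank student ids by total score; return (top, bottom) id lists."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:group_size], ranked[-group_size:]

def item_discrimination(item_correct, top, bottom):
    """item_correct maps student id -> True/False for one item."""
    high = sum(item_correct[s] for s in top)
    low = sum(item_correct[s] for s in bottom)
    return (high - low) / len(top)

# 30 hypothetical students with total scores 41..70
scores = {f"s{i}": 40 + i for i in range(1, 31)}
top, bottom = ability_groups(scores, 10)

# Suppose 7 of the top 10 but only 2 of the bottom 10 got item 23 right
item23 = {s: (s in top and scores[s] >= 64) or (s in bottom and scores[s] >= 49)
          for s in scores}
print(item_discrimination(item23, top, bottom))  # 0.5
```

An ID of 1.0 would mean the item perfectly separates the two groups; an ID near 0 means it tells you nothing about ability.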
For more information on IRT, see Bachman. Distractor efficiency: consider the following. The same item (23) used above is a multiple-choice item with five choices, and responses across upper- and lower-ability students are distributed as follows (C is the correct response).
No mathematical formula is needed to tell you that this item successfully attracts seven of the ten high-ability students toward the correct response, while only two of the low-ability students get this one right. As shown above, its ID is .50. No one picked distractor E, and therefore it probably serves no purpose. Why are good students choosing this one? Your scoring plan reflects the relative weight that you place on each section and on the items in each section.
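A distractor analysis simply tabulates, for each option, how many high-group and low-group students chose it. A sketch with invented response data:

```python
def distractor_counts(high, low, options):
    """For one multiple-choice item, count how many students in the
    high-ability and low-ability groups chose each option."""
    return {opt: (high.count(opt), low.count(opt)) for opt in options}

# Invented data for a five-option item whose key is C.
high = ["C"] * 7 + ["A"] * 2 + ["D"]              # top 10 students
low = ["C"] * 2 + ["A"] * 4 + ["B"] * 3 + ["D"]   # bottom 10 students
table = distractor_counts(high, low, "ABCDE")
print(table)
# The key C attracts 7 high vs. 2 low scorers (good discrimination);
# option E attracts no one, so it is doing no work as a distractor.
```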
The integrated-skills class that we have been using as an example focuses on listening and speaking skills, with some attention to reading and writing. Three of your nine objectives target reading and writing skills. How do you assign scoring to the various components of this test? Because oral production is central to your objectives, you weight the speaking and listening sections most heavily. That leaves 20 percent for the writing section. Again, to achieve the correct weight for writing, you will double each score and add them, so the possible total is 20 points.
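The weighting arithmetic can be sketched as follows; the section names, raw scores, and percentages here are hypothetical, not the book's actual scheme:

```python
def weighted_total(section_scores, weights):
    """Combine raw section scores into a 100-point total using percentage
    weights. `section_scores` maps section -> (raw, max_raw);
    `weights` maps section -> percent of the total."""
    total = 0.0
    for name, (raw, max_raw) in section_scores.items():
        total += weights[name] * raw / max_raw
    return total

# Hypothetical weighting for an integrated-skills test:
scores = {"listening": (16, 20), "speaking": (25, 30),
          "reading": (8, 10), "writing": (7, 10)}
weights = {"listening": 30, "speaking": 30, "reading": 20, "writing": 20}
print(weighted_total(scores, weights))  # 24 + 25 + 16 + 14 = 79.0
```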
Chapters 4 and 9 will deal in depth with scoring and assessing writing performance. After administering the test once, you may decide to shift some of these weights or to make other changes. You will then have valuable information about how easy or difficult the test was, about whether the time limit was reasonable, about your students' affective reaction to it, and about their general performance. Finally, you will have an intuitive judgment about whether this test correctly assessed your students.
Take note of these impressions, however nonempirical they may be, and use them to revise the test in another term. Grading: your first thought might be that assigning grades to student performance on this test would be easy. Not so fast! Grading is such a thorny issue that all of Chapter 11 is devoted to the topic. For the time being, then, we will set aside issues that deal with grading this test in particular, in favor of the comprehensive treatment of grading in Chapter 11. You might choose simply to return the test to students with a letter grade or total score. Those numbers alone offer the student only a modest sense of where that student stands and a vague idea of overall performance, but the feedback they present does not become washback.
Washback is achieved when students can use the feedback from a test to identify their strengths and weaknesses. Of course, time and the logistics of large classes may not permit options 5d and 6d, which for many teachers may be going above and beyond expectations for a test like this. Options 6 and 7, however, are clearly viable possibilities that solve some of the practicality issues that are so important in teachers' busy schedules.
This five-part template can serve as a pattern as you design classroom tests. You will also assess the pros and cons of what we've been calling standards-based assessment, including its social and political consequences. You will consider an array of possibilities of what has come to be called "alternative" assessment (Chapter 10), only because portfolios, conferences, journals, and self- and peer-assessments are not always comfortably categorized among more traditional forms of assessment.
And finally (Chapter 11), you will take a long, hard look at the dilemmas of grading students. Aptitude tests propose to predict one's performance in a language course. Review the rationale supporting such testing, and then summarize the controversy surrounding aptitude tests. What can you say about the validity and the ethics of aptitude testing? What are the test criteria? What kinds of items should be used?
How would you sample among a number of possible objectives? (G) Look again at the discussion of objectives earlier in the chapter. In a small group, discuss the following scenario: a teacher is faced with more objectives than it is possible to sample in a test. Draw up a set of guidelines for choosing which objectives to include on the test and which ones to exclude.
You might start by considering the issues involved. Are there other modes of elicitation that could be included in such a chart? Justify your additions with an example of each. (G) Select a language class in your immediate environment for the following project: in small groups, design an achievement test for a segment of the course, preferably a unit for which there is no current test or for which the present test is inadequate.
When it is completed, present your assessment project to the rest of the class. Calculate the item facility (IF) and item discrimination (ID) indices for selected items. If there are no data for an existing test, select some items on the test and analyze the structure of those items in a distractor analysis to determine whether they have (a) any bad distractors, (b) any bad stems, or (c) more than one potentially correct answer. Review the practicality of each and determine the extent to which practicality (principally, more time expended) is justifiably sacrificed in order to offer better washback to learners.
Cognitive abilities in foreign language aptitude: Then and now. In Thomas S. Parry & Charles W. Stansfield (Eds.). Englewood Cliffs, NJ: Prentice Hall Regents.
Testing in language programs. Upper Saddle River, NJ.
Assessment of student achievement (6th ed.). Allyn and Bacon. This widely used general measurement text is a helpful reference; in particular, Chapters 3, 4, 5, and 6 describe detailed steps for designing tests and writing multiple-choice, true-false, and short-answer items.
For almost a century, schools, universities, businesses, and governments have looked to standardized measures for economical, reliable, and valid assessments of those who would enter, continue in, or exit their institutions.
Proponents of these large-scale instruments make strong claims for their usefulness when great numbers of people must be measured quickly and effectively. But the rush to carry out standardized testing in every walk of life has not gone unchecked. Some psychometricians have stood up in recent years to caution the public against reading too much into tests that may require only a narrow band of specialized intelligence (Sternberg; Gardner; Kohn). So it is important for you to understand what standardized tests can and cannot do.
We can learn a great deal about many learners and their competencies through standardized forms of assessment. But some of those learners and some of those objectives may not be adequately measured by a sit-down, timed, multiple-choice format that is likely to be decontextualized. This chapter has two goals: to examine what standardized tests are and to walk through how they are developed. A standardized test presupposes certain standard objectives, or criteria, that are held constant from one form of the test to another.
The criteria in large-scale standardized tests are designed to apply to a broad band of competencies that are usually not exclusive to one particular curriculum.
A good standardized test is the product of a thorough process of empirical research and development. It dictates standard procedures for administration and scoring. And finally, it is typical of a norm-referenced test, the goal of which is to place test-takers on a continuum across a range of scores and to differentiate them by their relative ranking.
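Norm-referenced placement on such a continuum can be illustrated with a percentile-rank computation; the norming scores below are invented:

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norming sample scoring at or below `score` --
    the kind of relative ranking a norm-referenced test reports."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)

norms = [48, 52, 55, 58, 60, 63, 65, 67, 70, 74]  # invented norming sample
print(percentile_rank(63, norms))  # 60.0
```

A percentile rank says nothing about mastery of any criterion; it only locates a test-taker relative to the norming group, which is exactly the norm-referenced interpretation described above.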
Most elementary and secondary schools in the United States have standardized achievement tests to measure children's mastery of the standards or competencies that have been prescribed for specified grade levels.
While it is true that many standardized tests conform to a multiple-choice format, by no means is multiple-choice a prerequisite characteristic.
It so happens that a multiple-choice format provides the test producer with an "objective" means for determining correct and incorrect responses, and it is therefore the preferred mode for large-scale tests. Administration to large groups can be accomplished within reasonable time limits.
And, for better or for worse, there is often an air of face validity to such authoritative-looking instruments. Disadvantages center largely on the dangers of indirect testing: an instrument can have the appearance and face validity of a good test when in reality it has little content validity. For example, for many years the TOEFL included neither a written nor an oral production section, yet statistics showed a reasonably strong correspondence between performance on the TOEFL and a student's written and, to a lesser extent, oral production.
Those who use standardized tests need to acknowledge both the advantages and the limitations of indirect testing. In the TOEFL example above, correlational statistics supported the indirect assessment of production skills. Yet the construct validation statistics that offer that support never offer a 100 percent probability of the relationship, leaving room for some possibility that the indirect test is not valid for its targeted use.
A more serious issue lies in the assumption, alluded to above, that standardized tests assess all learners equally well. Here is a non-language example: a written driver's-license test assumes that knowledge of the rules of the road predicts safe driving. What about those few who do not fit the model? That small minority of drivers could endanger the lives of the majority; is that a risk worth taking? Motor vehicle registration departments in the United States seem to think so, and thus avoid the high cost of behind-the-wheel driving tests.
Are you willing to rely on a standardized test result in the case of all the learners in your class? Of an applicant to your institution, or of a potential degree candidate exiting your program? These questions will be addressed more fully in Chapter 5, but for the moment, think carefully about what has come to be known as the gate-keeping role of standardized tests.
The widespread acceptance, and sometimes misuse, of this gate-keeping role of the testing industry has created a political, educational, and moral maelstrom. How are standardized tests developed? Where do test tasks and items come from?
How are they evaluated? Who selects items and their arrangement in a test? How do such items and tests achieve consequential validity? Who sets norms and cut-off scores? Are security and confidentiality an issue? Are cultural and racial biases an issue in test development? All these questions typify those that you might pose in an attempt to understand the process of test development. The second goal of this chapter is to walk through that process. As we look at the steps, one by one, you will see patterns that are consistent with those outlined in the previous two chapters.
Determine the purpose and objectives of the test.
Most standardized tests are expected to provide high practicality in administration and scoring without unduly compromising validity. The initial outlay of time and money for such a test is significant, but the test will be used repeatedly.
Let's look at the three tests. (A) The TOEFL is designed to help institutions of higher learning make "valid decisions concerning English language proficiency in terms of [their] own requirements." Various cut-off scores apply, but most institutions require minimum scores on either the paper-based or the computer-based version in order to consider students for admission. (B) The ESLPT, referred to in Chapter 3, is designed to place already-admitted students at San Francisco State University in an appropriate course in academic writing, with the secondary goal of placing students into courses in oral production and grammar-editing.
While the test's primary purpose is to make placements, another desirable objective is to provide teachers with some diagnostic information about their students on the first day or two of class. (C) The GET, another test designed at SFSU, is given to prospective graduate students, both native and non-native speakers, in all disciplines to determine whether their writing ability is sufficient to permit them to enter graduate-level courses in their programs.
It is offered at the beginning of each term. Students who fail or marginally pass the GET are technically ineligible to take graduate courses in their field. Instead, they may elect to take a course in graduate-level writing of research papers. A pass in that course is equivalent to passing the GET.