Guidelines for text selection
- How is the text structured? Is it logical and clear?
- What is the language level of the text (vocabulary, grammar)?
- Will this text interest the students?
- Does it fall within their realm of experience?
- Is there a balance of old and new information?
- Is there a balance of abstract and concrete information? Is it appropriate for the level of students?
- What audience was the text written for? Is it similar to the students?
- What is the intended function of the text (argumentative, instructional etc)? Is it appropriate for the students?
- Are the charts, diagrams or pictures helpful or distracting? Are there too many or too few of them?
- Is the layout or typeface easy to read?
- Is the text long enough or too long?
Guidelines for item design (test format selection)
1. It is better to select tests that can be marked objectively.
2. Use a variety of item types (not only true/false and multiple choice).
3. Short answer questions should be used if the range of possible answers is small and can be specified exhaustively in the key.
4. It must not be possible to complete any item without reading the text.
5. Items should be independent.
6. If the students have to list items, they must be told how many items they are expected to list.
7. Deleted words should be those which carry significant meaning rather than simple syntactic meaning. It should be possible to reconstruct them from the context.
8. Summary completion items should require the students to understand the text rather than simply spot key words.
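Guidelines 1 and 3 can be illustrated with a minimal sketch of objective marking against an exhaustive key. The function names, the item numbers and the sample answers are hypothetical and serve only as an illustration; ignoring case, extra spaces and final punctuation is an assumption about what a marker would accept, not a fixed rule.

```python
def normalise(answer):
    """Lower-case the answer, collapse spaces and drop outer punctuation."""
    return " ".join(answer.lower().split()).strip(".,;:!?")


def mark_short_answers(key, responses):
    """Give one point for each response that matches an answer listed in the key.

    key:       item number -> list of every acceptable answer (the exhaustive key)
    responses: item number -> the student's answer
    """
    score = 0
    for item, accepted in key.items():
        given = normalise(responses.get(item, ""))
        if given in {normalise(a) for a in accepted}:
            score += 1
    return score


# Hypothetical key and student paper.
key = {1: ["in 1990", "1990"], 2: ["by train"]}
responses = {1: " 1990. ", 2: "by bus"}
print(mark_short_answers(key, responses))
```

If the list of acceptable answers for an item cannot be written out in this way, the item is probably not suitable as a short answer question.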
Types of reading comprehension questions.
· Main idea questions
· Factual questions
· Inference questions
· Analogy questions
· Written expression questions (What does “they” in line 7 refer to?)
· Organization questions
· Follow on questions
· Viewpoint questions
Possible task formats
- True/false
- Multiple choice
- Matching
- Information transfer (trace the route, complete the table, draw a diagram etc)
- Labeling
- Sequencing (texts, pictures)
- Short answer questions
- Problem-solving (from the following information work out people’s names)
- Identify topic (match the title with the text)
- Linking
- Identify linking words in the text
- Cloze (together with vocabulary skills)
- C-tests
- Summary (together with writing skills)
The advantages and disadvantages of each technique will be analysed in your practical lessons.
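Two of the formats listed above, the cloze and the C-test, are mechanical enough to be sketched in code. The sketch below assumes one common variant of each: a fixed-ratio cloze in which every n-th word after an intact lead-in is removed, and a C-test in which the second half of every second word is removed (the larger part when the word has an odd number of letters). The function names and the default values are hypothetical, and a rational cloze, where the teacher chooses the words to delete, would need a hand-made list instead.

```python
PUNCTUATION = ".,;:!?"


def make_cloze(text, n=7, lead_in=10):
    """Replace every n-th word after an intact lead-in with a numbered gap."""
    gapped, key = [], []
    for i, word in enumerate(text.split()):
        core = word.rstrip(PUNCTUATION)
        tail = word[len(core):]
        position = i - lead_in + 1  # 1 for the first word after the lead-in
        if core and position > 0 and position % n == 0:
            key.append(core)
            gapped.append(f"({len(key)}) ______{tail}")
        else:
            gapped.append(word)
    return " ".join(gapped), key


def make_c_test(text):
    """Delete the second half of every second word; the first half is kept."""
    out, key = [], []
    count = 0
    for word in text.split():
        core = word.rstrip(PUNCTUATION)
        tail = word[len(core):]
        # One-letter words and tokens with digits or symbols are left alone.
        if len(core) < 2 or not core.isalpha():
            out.append(word)
            continue
        count += 1
        if count % 2 == 0:
            keep = len(core) // 2  # odd length: the larger part is deleted
            key.append(core)
            out.append(core[:keep] + "_" * (len(core) - keep) + tail)
        else:
            out.append(word)
    return " ".join(out), key


print(make_c_test("The cat sat on the mat."))
print(make_cloze("one two three four five six", n=2, lead_in=2))
```

In a real C-test the first sentence is normally left intact, so only the text after it would be passed to make_c_test.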
LECTURE 4
1. Testing writing.
2. Testing speaking.
3. Independent external testing.
Testing writing.
When testing productive skills, the test-developer or the teacher who chooses test formats must bear in mind the advantages and limitations of each format, for example:
· the degree of control or freedom;
· the kind and amount of context/stimulus provided;
· whether the task reflects a real-life purpose.
Writing is a skill and it should be tested as such, not just as grammar or vocabulary tasks in written form. So we shall not speak about pre-writing (sentence combining, sentence expansion, sentence reduction, training specialized skills: spelling, punctuation, capitalization, building paragraphs from outlines, supplying support for topic sentences, supplying illustrative examples etc). We shall discuss the testing of guided writing and free writing.
Students’ ability to handle guided writing tasks is tested with the help of different formats, which differ in the form and amount of input provided by the teacher or test-developer. The input may be provided in visual form (pictures, photos, diagrams etc), in verbal form (separate words and collocations, notes, a text for reading), or in visual and verbal form combined (photos with notes etc.). Input can create a reason for communication. It gives help, which is especially important for lower-level students. Besides, guided writing is easier to mark than free writing. The danger lies in the amount of input: on the one hand, it should be sufficient to provide help; on the other hand, it should not be so detailed as to deprive students of creativity, making them just copy the stimulus.
In integrative testing, e.g. Continue a story (the first few sentences or the first paragraph are given) or Read the letter and write a reply, the task is realistic and very good for developing writing skills, but the written work is hard to mark because it is difficult to distinguish between problems in reading and problems in writing. Another example of integrative testing is dictation, but it often checks just listening and spelling, and a very clear marking scheme has to be established. The same is true of writing a summary (Read the text and summarise it in 20 lines): it is realistic but difficult to mark. Besides, what is important in the text can be subjective.
Free writing, e.g. writing essays (Write about a day when everything went wrong), is very easy to set and is better suited to higher levels. However, it is very difficult and time-consuming to mark, and it demands well-developed marking schemes (holistic or based on different bands). It lacks objectivity and consistency. Essays often test imagination and general knowledge. The task should take into account the test-takers’ age, level of proficiency and realm of life experience.
Holistic marking schemes include descriptions of the different levels (marks). For example, the TOEFL Writing test is marked from 6 to 0. The descriptors may be as follows.
6 An essay at this level:
- effectively addresses the writing task,
- is well-organised and well-developed,
- uses appropriate details to support a thesis or illustrate ideas,
- displays consistent facility in use of language,
- demonstrates syntactic variety and appropriate word choice, though it may have occasional errors.
3 An essay at this level may reveal one or more of the following weaknesses:
- inadequate organization or development;
- inappropriate or insufficient details to support or illustrate generalizations;
- a noticeably inappropriate choice of words or word forms;
- an accumulation of errors in sentence structure and/or usage.
0 A paper is rated 0 if it contains no response, merely copies the topic, is off-topic, or is written in a foreign language.
When different bands are used, each parameter is assessed separately, so there are 6 bands each for content, text organisation, accuracy, range of vocabulary, variety of structures etc.
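Marking by separate bands can be sketched as a simple score sheet. The sketch assumes that the bands are numbered from 1 to 6 and that all parameters carry equal weight; neither point is fixed by the description above, and the criterion names, the function name and the sample marks are hypothetical.

```python
CRITERIA = (
    "content",
    "text organisation",
    "accuracy",
    "range of vocabulary",
    "variety of structures",
)
MIN_BAND, MAX_BAND = 1, 6  # assumption: six bands numbered 1 to 6


def band_profile(bands):
    """Check one band per criterion and return the total and the mean band."""
    for criterion in CRITERIA:
        if criterion not in bands:
            raise ValueError(f"no band given for {criterion}")
        if not MIN_BAND <= bands[criterion] <= MAX_BAND:
            raise ValueError(f"band for {criterion} is out of range")
    total = sum(bands[c] for c in CRITERIA)
    return total, total / len(CRITERIA)  # equal weighting is an assumption


# Hypothetical marks for one essay.
essay = {
    "content": 4,
    "text organisation": 4,
    "accuracy": 3,
    "range of vocabulary": 4,
    "variety of structures": 4,
}
print(band_profile(essay))
```

Unlike a single holistic mark, the profile shows where the essay is weak (here, accuracy) as well as how good it is overall.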
Testing speaking.
When analyzing different test tasks one should consider the following points:
- number of test-takers involved in the speaking test;
- what it tests: dialogue skills (interaction) or monologue speech;
- role of the teacher (as a participant, as an assessor, or combining both roles);
- recorded or not recorded;
- ease or difficulty of administration;
- criteria of assessment.
Possible formats:
Picture description (interpretation): you may use photos or drawings. It gives the tester time to listen and gives students something concrete to speak about. But it is an artificial task and there is no interaction.
Giving instructions/explanations, e.g. how to make a dish. Realistic and concrete, but there is no interaction.
Oral presentations: students prepare and give short talks. The task is realistic and gives the teacher time to assess performance. But a prepared monologue can turn into the retelling of a written speech, with the student trying to recall the text rather than to speak.
Information transfer: an information gap is created through notes or pictures (jigsaw reading). The task is realistic because it creates a need for communication. It tests key interactive strategies (questioning, clarifying, eliciting information). There can be problems when one speaker is a lot weaker than the other.
Role play: students assume roles (with or without cued information). It is excellent for testing interaction and is a commonly used task in most materials. But it can test the ability to act: much depends on the psychological characteristics of the students, not on their language abilities.
Free interviews: chat to students in groups or as individuals. It is realistic and, if properly organised (procedure, teacher’s behaviour etc), can reduce stress for students. But it is very difficult to rate performance. Again, personality factors may be at work (shy/outgoing, laconic/talkative etc). It is difficult for a teacher to maintain the conversation and rate the students’ performance at the same time. In internationally recognized tests raters and interlocutors are usually two different people.
Types of spoken tests
The most commonly used spoken test types are these:
· Interviews – these are relatively easy to set up, especially if there is a room apart from the classroom where learners can be interviewed. The class can be set some writing or reading task (or even the written component of the examination) while individuals are called out, one by one, for their interview. Such interviews are not without their problems though. The rather formal nature of interviews … means that the situation is hardly conducive to testing more informal, conversational speaking styles. Not surprisingly, students often underperform in interview-type conditions. It is also difficult to eliminate the effects of the interviewer – his or her questioning style, for example – on the interviewee’s performance. Finally, if the interviewer is also the assessor, it may be difficult to maintain the flow of the talk while at the same time making objective judgments about the interviewee’s speaking ability. Nevertheless, there are ways of circumventing some of these problems. A casual chat at the beginning can help put candidates at their ease. The use of pictures or a pre-selected topic as a focus for the interview can help, especially if candidates are given one or two minutes to prepare themselves in advance. If the questions are the same for each interview, the interviewer effect is at least the same for all candidates. And having a third party present to co-assess the candidate can help ensure a degree of objectivity.
· Live monologues – the candidates prepare and present a short talk on a pre-selected topic. This eliminates the interviewer effect and provides evidence of the candidates’ ability to handle an extended turn, which is not always possible in interviews. If other students take the role of the audience, a question-and-answer stage can be included, which will provide some evidence of the speaker’s ability to speak interactively and spontaneously. But giving a talk or presentation is only really a valid test if these are skills that learners are likely to need, e.g. if their purpose for learning English is business, law, or education.
· Recorded monologues – these are perhaps less stressful than a more public performance and, for informal testing, they are also more practicable in a way that live monologues are not. Learners can take turns to record themselves talking about a favourite sport or pastime, for example, in a room adjacent to the classroom, with minimal disruption to the lesson. The advantage of recorded tests is that the assessment can be done after the event, and results can be ‘triangulated’ – that is, other examiners can rate the recording and their ratings can be compared to ensure standardization.
· Role-plays – most students will be used to doing at least simple role-plays in class, so the same format can be used for testing. The other ‘role’ can be played either by the tester or another student, but again, the influence of the interlocutor is hard to control. The role-play should not require sophisticated performance skills or a lot of imagination. Situations grounded in everyday reality are best. They might involve using data that has been provided in advance. For example, students could use the information in a travel brochure to make a booking at a travel agency. This kind of test is particularly valid if it closely matches the learners’ needs. One problem, though, with basing the test around written data is that it then becomes a partial test of reading skills as well.
· Collaborative tasks and discussions – these are similar to role-plays except that the learners are not required to assume a role but simply to be themselves. For example, two candidates might be set the task of choosing between a selection of job applicants on the basis of their CVs. Or the learners simply respond with their own opinions to a set of statements relevant to a theme. Of course, as with role-plays, the performance of one candidate is likely to affect that of the others, but at least the learners’ interactive skills can be observed in circumstances that closely approximate real-life language use.
Speaking is judged according to the following criteria: problem-solving or relevance to the topic; fluency; interactive communication (how active the student is in reacting, keeping the conversation going, showing initiative etc); discourse management (making relevant and logical contributions to the conversation, linking one’s own words together and with what the other candidates and the examiner say); and accuracy of pronunciation, grammar and vocabulary.
Assessment criteria
Having obtained a sample of the learner’s speaking ability, how does one go about assessing it? There are two main ways: either giving it a single score on the basis of an overall impression (called holistic scoring) or giving a separate score for different aspects of the task (analytic scoring). Holistic scoring (e.g. giving an overall mark out of, say, 20) has the advantage of being quicker, and is probably adequate for informal testing of progress. Ideally, though, more than one scorer should be enlisted, and any significant differences in scoring should be discussed and a joint score negotiated.
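The double-marking procedure just described can be sketched as follows. Two raters each give a holistic mark out of 20; where the marks are close the sketch takes their average, and where they differ significantly the candidate is flagged for discussion so that a joint score can be negotiated. The tolerance of 2 marks, the averaging of close marks, the function name and the sample marks are all assumptions made for the illustration.

```python
def reconcile(rater_a, rater_b, tolerance=2):
    """Compare two raters' holistic marks, candidate by candidate.

    Returns the averaged marks for candidates on whom the raters roughly
    agree, and a list of candidates whose marks need to be discussed.
    """
    agreed, to_discuss = {}, []
    for candidate, mark_a in rater_a.items():
        mark_b = rater_b[candidate]
        if abs(mark_a - mark_b) <= tolerance:
            agreed[candidate] = (mark_a + mark_b) / 2  # assumption: average
        else:
            to_discuss.append(candidate)  # joint score to be negotiated
    return agreed, to_discuss


# Hypothetical marks out of 20 from two raters.
first_rater = {"Anna": 15, "Ben": 10}
second_rater = {"Anna": 16, "Ben": 15}
print(reconcile(first_rater, second_rater))
```

The same comparison can be applied to recorded monologues, where the second rater marks the recording after the event.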
Analytic scoring takes longer, but compels testers to take a variety of factors into account and, if these factors are well chosen, is probably both fairer and more reliable. One disadvantage is that the scorer may be distracted by all the categories and lose sight of the overall picture – a woods-and-trees situation. Four or five categories seems to be the maximum that even trained scorers can handle at one time.
For the CELS Test of Speaking there are four categories: ‘Grammar and Vocabulary’, ‘Discourse Management’, ‘Pronunciation’, and ‘Interactive Communication’. They are described in the following terms:
Grammar and Vocabulary – on this scale, candidates are awarded marks for the accurate and appropriate use of syntactic forms and vocabulary in order to meet the task requirements at each level. The range and appropriate use of vocabulary are also assessed here.
Discourse Management – on this scale, examiners are looking for evidence of the candidate’s ability to express ideas and opinions in coherent, connected speech. The CELS tasks require candidates to construct sentences and produce utterances (extended as appropriate) in order to convey information and to express or justify opinions. The candidate’s ability to maintain a coherent flow of language with an appropriate range of linguistic resources over several utterances is assessed here.
Pronunciation – this refers to the candidate’s ability to produce comprehensible utterances to fulfil the task requirements, i.e. it refers to the production of individual sounds, the appropriate linking of words, and the use of stress and intonation to convey the intended meaning. L1 accents are acceptable provided communication is not impeded.
Interactive Communication – this refers to the candidate’s ability to interact with the interlocutor and the other candidate by initiating and responding appropriately and at the required speed and rhythm to fulfil the task requirements. It includes the ability to use functional language and strategies to maintain or repair interaction, e.g. in conversational turn-taking, and a willingness to develop the conversation and move the task towards a conclusion. Candidates should be able to maintain the coherence of the discussion and may, if necessary, ask the interlocutor or the other candidate for clarification.
It is worth emphasizing that grammatical accuracy is only one of several factors, and teachers need to remind themselves when assessing speaking that even native speakers produce non-grammatical forms in fast, unmonitored speech. It would be unfair, therefore, to expect a higher degree of precision in learners than native speakers are capable of.