AIOU Solved Assignments 1& 2 Code 8602 Spring 2020


Education, Assessment and Evaluation (8602)
B.ed 1.5 Years
Spring, 2020


Q.1   Highlight the important characteristics of classroom assessment. Also define the terms measurement, evaluation and test.

Assessment is important because of all the decisions you will make about children when teaching and caring for them. The decisions facing our three teachers at the beginning of this chapter all involve how best to educate children. Like them, you will be called upon every day to make decisions before, during, and after your teaching. Whereas some of these decisions will seem small and inconsequential, others will be “high stakes,” influencing the life course of children. All of your assessment decisions taken as a whole will direct and alter children’s learning outcomes. The points below outline some purposes of assessment and show how assessment can enhance your teaching and student learning. All of these purposes are important; if you use assessment procedures appropriately, you will help all children learn well.

The following general principles should guide both policies and practices for the assessment of young children:

  • Assessment should bring about benefits for children. Gathering accurate information from young children is difficult and potentially stressful. Assessments must have a clear benefit—either in direct services to the child or in improved quality of educational programs.
  • Assessment should be tailored to a specific purpose and should be reliable, valid, and fair for that purpose. Assessments designed for one purpose are not necessarily valid if used for other purposes. In the past, many of the abuses of testing with young children have occurred because of misuse.
  • Assessment policies should be designed recognizing that reliability and validity of assessments increase with children’s age. The younger the child, the more difficult it is to obtain reliable and valid assessment data. It is particularly difficult to assess children’s cognitive abilities accurately before age six. Because of problems with reliability and validity, some types of assessment should be postponed until children are older, while other types of assessment can be pursued, but only with necessary safeguards.
  • Assessment should be age appropriate in both content and the method of data collection. Assessments of young children should address the full range of early learning and development, including physical well-being and motor development; social and emotional development; approaches toward learning; language development; and cognition and general knowledge. Methods of assessment should recognize that children need familiar contexts to be able to demonstrate their abilities. Abstract paper-and-pencil tasks may make it especially difficult for young children to show what they know.
  • Assessment should be linguistically appropriate, recognizing that to some extent all assessments are measures of language. Regardless of whether an assessment is intended to measure early reading skills, knowledge of color names, or learning potential, assessment results are easily confounded by language proficiency, especially for children who come from home backgrounds with limited exposure to English, for whom the assessment would essentially be an assessment of their English proficiency. Each child’s first- and second-language development should be taken into account when determining appropriate assessment methods and in interpreting the meaning of assessment results.
  • Parents should be a valued source of assessment information, as well as an audience for assessment. Because of the fallibility of direct measures of young children, assessments should include multiple sources of evidence, especially reports from parents and teachers. Assessment results should be shared with parents as part of an ongoing process that involves parents in their child’s education.

Classroom assessment is the process, usually conducted by teachers, of designing, collecting, interpreting, and applying information about student learning and attainment to make educational decisions. There are four interrelated steps to the classroom assessment process. The first step is to define the purposes for the information. During this period, the teacher considers how the information will be used and how the assessment fits in the students’ educational program. The teacher must consider if the primary purpose of the assessment is diagnostic, formative, or summative. Gathering information to detect student learning impediments, difficulties, or prerequisite skills is an example of diagnostic assessment. Information collected on a frequent basis to provide student feedback and guide either student learning or instruction serves formative purposes, and collecting information to gauge student attainment at some point in time, such as at the end of the school year or grading period, is summative assessment.

Measuring Student Behaviour and Attitude

The next step in the assessment process is to measure student learning or attainment. Measurement involves using tests, surveys, observation, or interviews to produce either numeric or verbal descriptions of the degree to which a student has achieved academic goals. The third step is to evaluate the measurement data, which entails making judgments about the information. During this stage, the teacher interprets the measurement data to determine if students have certain strengths or limitations or whether the student has sufficiently attained the learning goals. In the last stage, the teacher applies the interpretations to fulfill the aims of assessment that were defined in the first stage. The teacher uses the data to guide instruction, render grades, or help students with any particular learning deficiencies or barriers.


In Ralph Tyler’s model of testing, after the instructional objectives are formulated, educational experiences can be developed that encompass the teaching materials and instructional opportunities that will be provided to students. Also during this planning stage, teachers must consider how they will determine if students have attained the instructional objectives. Indeed, good objectives are those that clearly define the type of activity the students will accomplish to indicate the degree to which the students have attained the objective. After students experience the learning opportunities provided by the teacher and after assessment has occurred, the teacher’s task is to examine the assessment results and decide whether students have sufficiently reached the objectives. If they have not, the teacher can revise the educational experiences until attainment has occurred. Thus, Tyler’s model of testing emphasized the formative role of classroom assessment.


The aim of theory and practice in educational measurement is typically to measure abilities and levels of attainment by students in areas such as reading, writing, mathematics, science and so forth. Traditionally, attention focuses on whether assessments are reliable and valid. In practice, educational measurement is largely concerned with the analysis of data from educational assessments or tests. Typically, this means using total scores on assessments, whether they are multiple choice or open-ended and marked using marking rubrics or guides.

In technical terms, the pattern of scores by individual students to individual items is used to infer so-called scale locations of students, the “measurements”. This process is one form of scaling. Essentially, higher total scores give higher scale locations, consistent with the traditional and everyday use of total scores. If certain theory is used, though, there is not a strict correspondence between the ordering of total scores and the ordering of scale locations. The Rasch model provides a strict correspondence provided all students attempt the same test items, or their performances are marked using the same marking rubrics. In terms of the broad body of purely mathematical theory drawn on, there is substantial overlap between educational measurement and psychometrics. However, certain approaches considered to be a part of psychometrics, including Classical test theory, Item Response Theory and the Rasch model, were originally developed more specifically for the analysis of data from educational assessments.
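The Rasch correspondence described above can be sketched as a short calculation. This is a minimal illustration of the logistic form of the model; the ability and difficulty values below are hypothetical, chosen only for the example.

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch model: probability of a correct response as a logistic
    function of the gap between student ability and item difficulty
    (both expressed on the same logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical items and a hypothetical student half a logit above average.
item_difficulties = [-1.0, 0.0, 1.0]
student_ability = 0.5
for b in item_difficulties:
    print(b, round(rasch_probability(student_ability, b), 3))

# The expected total score is the sum of these probabilities, so a higher
# ability always implies a higher expected total -- the strict correspondence
# between total-score order and scale-location order noted above.
```

Because the probability depends only on the difference (ability − difficulty), every student who attempts the same items is ordered on the scale exactly as their total scores order them.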

One of the aims of applying theory and techniques in educational measurement is to try to place the results of different tests administered to different groups of students on a single or common scale through processes known as test equating. The rationale is that because different assessments usually have different difficulties, the total scores cannot be directly compared. The aim of trying to place results on a common scale is to allow comparison of the scale locations inferred from the totals via scaling processes.
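The mean-sigma approach is one simple form of linear equating consistent with the rationale above: it maps scores on one form onto the scale of another so that the two score distributions share a mean and standard deviation. The two score distributions below are hypothetical.

```python
import statistics

def mean_sigma_equate(score, form_x_scores, form_y_scores):
    """Linearly map a score on form X onto form Y's scale so the two
    forms have matching means and standard deviations (mean-sigma equating)."""
    mx, sx = statistics.mean(form_x_scores), statistics.pstdev(form_x_scores)
    my, sy = statistics.mean(form_y_scores), statistics.pstdev(form_y_scores)
    return my + (sy / sx) * (score - mx)

# Hypothetical distributions: form Y is harder, so its scores run 10 points lower.
form_x = [40, 50, 60, 70, 80]   # mean 60
form_y = [30, 40, 50, 60, 70]   # mean 50
print(mean_sigma_equate(60, form_x, form_y))  # 50.0 -- an average X score maps to an average Y score
```

This is only a sketch: operational equating designs (common items, equivalent groups) involve more safeguards, but the underlying idea is the same comparison of scale locations rather than raw totals.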


Q.2   a) Explain the cognitive domain of Bloom’s Taxonomy of education objective.

A widely accepted and commonly used method to systematize measures of proficiency and competence is Bloom’s taxonomy. The taxonomy is named after educational psychologist Dr. Benjamin Bloom. Bloom’s taxonomy, which was created by a committee that Bloom headed in 1956, was an attempt to make assessment more systematic and promote higher levels of thinking in education, such as analysis and evaluation rather than just restating facts.

A sound knowledge of Bloom’s taxonomy is important for trainers and instructional designers. First, goals must be developed, and then objectives must be written that enable those goals to be accomplished. Quality objectives support effective training, and those objectives should always align with the language and methods in Bloom’s taxonomy.

Three Domains of Bloom’s Taxonomy

Individuals have knowledge and skills of widely varying types and levels. Bloom’s taxonomy provides a way to classify their knowledge and skills. It also serves as a reference so that learning objectives can be properly written.

Consider the following skills:

  • Performing surgery
  • Dancing with a ballet troupe
  • Editing an article for a professional journal
  • Developing a detailed outline for a training activity
  • Convincing a group to buy a product or service
  • Mediating a dispute between coworkers

Not all skills are as difficult or complex as those listed above, but these skills require knowledge and abilities that are so varied that they need to be classified in a unique manner.

Bloom’s committee identified three domains of learning activity: cognitive, which addresses knowledge and thinking skills such as paraphrasing what has been learned; psychomotor, which has to do with physical skills such as manipulating a tool or an instrument; and affective, which is concerned with subjective areas such as emotional development and conflict resolution.

Beginning in 1948, a group of educators undertook the task of classifying education goals and objectives.  The intent was to develop a classification system for three domains: the cognitive, the affective, and the psychomotor.  Work on the cognitive domain was completed in the 1950s and is commonly referred to as Bloom’s Taxonomy of the Cognitive Domain.  Others have developed taxonomies for the affective and psychomotor domains.

The major idea of the taxonomy is that what educators want students to know (encompassed in statements of educational objectives) can be arranged in a hierarchy from less to more complex.  The levels are understood to be successive, so that one level must be mastered before the next level can be reached.

The original levels by Bloom et al. (1956) were ordered as follows:  Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation.  The taxonomy is presented below with sample verbs and a sample behavior statement for each level.

  • KNOWLEDGE – Student recalls or recognizes information, ideas, and principles in the approximate form in which they were learned. Sample behavior: The student will define the six levels of Bloom’s taxonomy of the cognitive domain.
  • COMPREHENSION – Student translates, comprehends, or interprets information based on prior learning. Sample behavior: The student will explain the purpose of Bloom’s taxonomy of the cognitive domain.
  • APPLICATION – Student selects, transfers, and uses data and principles to complete a problem or task with a minimum of direction. Sample behavior: The student will write an instructional objective for each level of Bloom’s taxonomy.
  • ANALYSIS – Student distinguishes, classifies, and relates the assumptions, hypotheses, evidence, or structure of a statement or question. Sample behavior: The student will compare and contrast the cognitive and affective domains.
  • SYNTHESIS – Student originates, integrates, and combines ideas into a product, plan, or proposal that is new to him or her. Sample behavior: The student will design a classification scheme for writing educational objectives that combines the cognitive, affective, and psychomotor domains.
  • EVALUATION – Student appraises, assesses, or critiques on the basis of specific standards and criteria. Sample behavior: The student will judge the effectiveness of writing objectives using Bloom’s taxonomy.

Anderson and Krathwohl (2001) revised Bloom’s taxonomy to fit the more outcome-focused modern education objectives, including switching the names of the levels from nouns to active verbs, and reversing the order of the highest two levels (see Krathwohl, 2002 for an overview).  The lowest-order level (Knowledge) became Remembering, in which the student is asked to recall or remember information.  Comprehension became Understanding, in which the student would explain or describe concepts.  Application became Applying, or using the information in some new way, such as choosing, writing, or interpreting.  Analysis was revised to become Analyzing, requiring the student to differentiate between different components or relationships, demonstrating the ability to compare and contrast.  These four levels remain the same as Bloom et al.’s (1956) original hierarchy.  In general, research over the last 40 years has confirmed these levels as a hierarchy.  In addition to revising the taxonomy, Anderson and Krathwohl added a conceptualization of knowledge dimensions within which these processing levels are used (factual, conceptual, procedural, and metacognitive).

  • Factual Knowledge (terminology; specific elements and components) – Remember: label a map; list names. Understand: interpret a paragraph; summarize a book. Apply: use a math algorithm. Analyze: categorize words. Evaluate: critique an article. Create: create a short story.
  • Conceptual Knowledge (categories) – Remember: define the levels of the cognitive taxonomy. Understand: describe the taxonomy in one’s own words. Apply: write objectives using the taxonomy. Analyze: differentiate the levels of the cognitive taxonomy. Evaluate: critique written objectives. Create: create a new classification system.
  • Procedural Knowledge (specific skills and techniques; criteria for their use) – Remember: list the steps in problem solving. Understand: paraphrase the problem-solving process in one’s own words. Apply: use the problem-solving process for an assigned task. Analyze: compare convergent and divergent techniques. Evaluate: critique the appropriateness of techniques used in a case analysis. Create: develop an original approach to problem solving.
  • Metacognitive Knowledge (general knowledge; self-knowledge) – Remember: list elements of one’s personal learning style. Understand: describe the implications of that learning style. Apply: develop study skills appropriate to the learning style. Analyze: compare elements of dimensions in the learning style. Evaluate: critique the appropriateness of a particular learning-style theory for one’s own learning. Create: create an original learning-style theory.

The Center for Excellence in Learning and Teaching at Iowa State University (2011) provides an excellent graphic representation on how these two taxonomies can be used together to generate lesson objectives.

The two highest, most complex levels of Synthesis and Evaluation were reversed in the revised model, and were renamed Evaluating and Creating.  As the authors did not provide empirical evidence for this reversal, it is my belief that these two highest levels are essentially equal in level of complexity.  Both depend on analysis as a foundational process.  However, synthesis or creating requires rearranging the parts in a new, original way whereas evaluation or evaluating requires a comparison to a standard with a judgment as to good, better or best.  This is similar to the distinction between creative thinking and critical thinking.  Both are valuable while neither is superior.  In fact, when either is omitted during the problem solving process, effectiveness declines (Huitt, 1992).

The original levels paired with their revised names, listed from highest to lowest in the revised order (the revision moves Creating above Evaluating):

  • Synthesis / Create
  • Evaluation / Evaluate
  • Analysis / Analyze
  • Application / Apply
  • Comprehension / Understand
  • Knowledge / Remember

In any case it is clear that students can “know” about a topic or subject in different ways and at different levels.  While most teacher-made tests still test at the lower levels of the taxonomy, research has shown that students remember more when they have learned to handle the topic at the higher levels of the taxonomy.  This is because more elaboration is required, a principle of learning based on findings from the information processing approach to learning.

b)     Describe the importance of cognitive domain in the development of achievement test.

Assessment is one of education’s new four-letter words, but it shouldn’t be, because it’s not assessment’s fault that some adults misuse it. Assessment is supposed to guide learning. It creates a dynamic where teachers and students can work together to deepen their understanding of a subject or topic. Assessment should be about authentic growth.

Testing in the U.S. is very different from assessment. I know that sounds absurd, but tests have more finality here. When it comes to testing, we have a love affair with multiple choice and true/false. We test whether students know the right answer…or not. Lots of tests are made of hard questions and easy ones. How deeply they know the answer doesn’t matter, just as long as they know it. State tests focus less on what students know, and more on what teachers supposedly taught.

When it comes to assessing student learning, most educators know about Bloom’s Taxonomy. They use it in their practices, and feel as though they have a good handle on how to use it in their instructional practices and assessment of student learning. In our educational conversations we bring up Bloom’s Taxonomy, and debate whether students have knowledge of a topic, and if they can apply it to their daily life.

Interestingly enough, Bloom himself has been quoted as saying that his handbook is “one of the most widely cited yet least read books in American education”. We are guilty of doing that from time to time. It’s human nature to tout a philosophy that we may only have surface-level knowledge of, which is kind of ironic when we’re talking about Bloom’s Taxonomy.

For a more in depth understanding of Bloom’s, the Center for Teaching at Vanderbilt University website says, “Here are the authors’ brief explanations of these main categories from the appendix of Taxonomy of Educational Objectives (Handbook One, pp. 201-207):

  • Knowledge – “involves the recall of specifics and universals, the recall of methods and processes, or the recall of a pattern, structure, or setting.”
  • Comprehension – “refers to a type of understanding or apprehension such that the individual knows what is being communicated and can make use of the material or idea being communicated without necessarily relating it to other material or seeing its fullest implications.”
  • Application – refers to the “use of abstractions in particular and concrete situations.”
  • Analysis – represents the “breakdown of a communication into its constituent elements or parts such that the relative hierarchy of ideas is made clear and/or the relations between ideas expressed are made explicit.”
  • Synthesis – involves the “putting together of elements and parts so as to form a whole.”
  • Evaluation –  engenders “judgments about the value of material and methods for given purposes.”

According to the @LeadingLearner Blog, “it (Bloom’s) was revised in 2000. In Bloom’s original work the knowledge dimensions consisted of factual, conceptual and procedural knowledge.  Later the metacognitive knowledge dimension was added and the nouns changed to verbs with the last two cognitive processes switched in the order.

  • Remember
  • Understand
  • Apply
  • Analyse
  • Evaluate
  • Create”

The criticism of Bloom’s is that it seems to focus on regurgitating information, and that anything goes: a student can provide a surface-level answer to a difficult question, or a deep answer to a surface-level question. It may show a student has an answer, but does it allow teachers and students to go deeper with their learning, or do they just move on?

According to Pam Hook, “There is no necessary progression in the manner of teaching or learning in the Bloom’s taxonomy.” If we want students to take control over their own learning, can they use Bloom’s Taxonomy, or is there a better method to help them understand where to go next?

Going SOLO

A much less known taxonomy of assessing student learning is SOLO, which was created by John Biggs and Kevin Collis in 1982. According to Biggs, “SOLO, which stands for the Structure of the Observed Learning Outcome, is a means of classifying learning outcomes in terms of their complexity, enabling us to assess students’ work in terms of its quality not of how many bits of this and of that they got right.”

According to Biggs and Collis (1982), there are five stages of “ascending structural complexity.” Those five stages are:

  • Prestructural – incompetence (they miss the point)
  • Unistructural – one relevant aspect
  • Multistructural – several relevant and independent aspects
  • Relational – integrated into a structure
  • Extended Abstract – generalized to a new domain

If we are going to spend so much time in the learning process, we need to do more than accept that students “get” something at “their level” and move on. Using SOLO taxonomy really presents teachers and students with the opportunity to go deeper into learning whatever topic or subject they are involved in, and assess learning as they travel through that learning experience.

Through reading blogs and research, one of the positive sides of SOLO is that it makes it easier for teachers to identify the levels, and therefore helps guide students through the learning process. And for my unschooling friends, this has implications for all students, whether they are within the four walls of a school or outside of them.

John Hattie (Peter is a Visible Learning Trainer) is a proponent of SOLO Taxonomy and has broken it down into terms that are easier for students to understand, giving them the ability to assess their own learning. Hattie suggests that teachers can use:

  • No Idea – equivalent to the prestructural level.
  • One Idea – equivalent to the unistructural level
  • Many Ideas – equivalent to the multistructural level
  • Relate – equivalent to the relational level
  • Extend – equivalent to the extended abstract

Lastly, Hook goes on to say, “there are some real advantages to SOLO Taxonomy.

  • These advantages concern not only item construction and scoring, but incorporate features of the process of evaluation that pay attention to how students learn, and how teachers devise instructional procedures to help students use progressively more complex cognitive processes.
  • Both teachers and students often progress from more surface to deeper constructs and this is mirrored in the four levels of the SOLO taxonomy.
  • The levels can be interpreted relative to the proficiency of the students. Six year old students can be taught to derive general principles and suggest hypotheses, though obviously to a different level of abstraction and detail than their older peers. Using the SOLO method, it is relatively easy to construct items to assess such abstractions.
  • Unlike the experience of some with the Bloom taxonomy it is relatively easy to identify and categorise the SOLO levels.
  • Similarly, teachers could be encouraged to use the ‘plus one’ principle when choosing appropriate learning material for students. That is, the teacher can aim to move the student one level higher in the taxonomy by appropriate choice of learning material and instructional sequencing.”


Q.3   Compare and contrast the characteristics of Criterion and Norm referenced Tests. Also highlight its utilization in teaching learning process.

What’s the best way to score tests? In this lesson, we’ll look at two major types of tests that are scored differently from each other: norm-referenced and criterion-referenced tests.


Ricki is an educational psychologist. She wants to do a study examining whether or not a certain curriculum will help students learn math skills. In order to figure that out, she has to put together a math test that the students will take after they’ve been exposed to the curriculum.

Psychological measurement is the process of evaluating psychological traits, including cognitive skills, like math, and other traits, like depression or altruism.

Measurement is the cornerstone of psychological studies. Without measurement, there is no way to gather data in a study. Without data, there is no way to know if your hypothesis is correct. For example, Ricki might think that the curriculum will help students learn math, but unless she measures their math skills, she won’t have the data to show whether it actually does help or not.

There are many tools used in psychological measurement. When looking at cognitive, or thinking, skills, tests are usually used to measure outcomes. IQ tests are examples of psychological tests that measure cognitive skills. So is Ricki’s math test.

Let’s look at two different ways of scoring tests: norm-referenced and criterion-referenced.


Ricki wants to know if her curriculum will help students learn math skills, and she’s written a math test for the students to take. But, how should she determine what passing means?

A norm-referenced test scores a test by comparing a person’s performance to others who are similar. You can remember norm-referenced by thinking of the word ‘normal.’ The object of a norm-referenced test is to compare a person’s performance to what is normal for other people like him or her.

Think of it kind of like a race. If a runner comes in third in a race, that doesn’t tell us anything objectively about what the runner did. We don’t know if she finished in 30 seconds or 30 minutes; we only know that she finished after two other runners and ahead of everyone else.

So, if Ricki decides to make her test norm-referenced, she would compare students to what is normal for that age, grade, or class. Examples of norm-referenced tests include the SAT, IQ tests, and tests that are graded on a curve. Anytime a test offers a percentile rank, it is a norm-referenced test. If you score at the 80th percentile, that means that you scored better than 80% of people in your group.

Norm-referenced tests are a good way to compensate for any mistakes that might be made in designing the measurement tool. For example, what if Ricki’s math test is too easy, and everybody aces it? If it is a norm-referenced test, that’s OK because you’re not looking at the actual scores of the students but how well they did in relation to students in the same age group, grade, or class.
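The point about an overly easy test can be illustrated with standard (z) scores: even when every student “aces” the test, each raw score can still be expressed relative to the group mean, so relative standing survives. The scores below are hypothetical.

```python
import statistics

def z_scores(raw_scores):
    """Express each raw score as standard deviations from the group mean."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    return [(s - mean) / sd for s in raw_scores]

# Hypothetical results on a test that was too easy: everyone scored high,
# yet relative standing within the group is still recoverable.
easy_test = [92, 95, 96, 98, 99]
print([round(z, 2) for z in z_scores(easy_test)])
```

A grade curve built on these z-scores depends only on each student’s position in the group, not on whether the test itself was pitched too easy or too hard.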


But, norm-referenced tests aren’t perfect. They aren’t completely objective and make it hard to know anything other than how someone did in comparison to others. But, what if we want to know about a person’s performance without comparing them to others?

A criterion-referenced test is scored on an absolute scale, with no comparisons made. It is interested in one thing only: did you meet the standards?

Let’s go back to our race scenario. Saying that a runner came in third place is norm-referenced because we are comparing her to the other runners in the race. But, if we look at her time in the race, that’s criterion-referenced. Saying she finished the race in 58:42 is an objective measure that is not a comparison to others.

Assessment results allow educators to make important decisions about students’ knowledge, abilities and future educational potential. There are multiple ways to summarize and interpret assessment results. This lesson will discuss ways to summarize norm-referenced assessments and criterion-referenced assessments.

Using Assessments

Teacher: Thank you for coming in today to meet with me regarding your child’s progress in school. I want to provide you information on the multiple types of assessments we take in the classroom and explain how we score and use the results for various purposes.

We take multiple types of assessments in our class, and there are many ways I summarize the results of these assessments. These summaries provide feedback regarding your child’s level of mastery and understanding. These assessments also give me a way to address any areas of weakness for individual students or in the class as a whole.

Raw Scores

The most basic way to summarize an assessment is through a raw score. A raw score is the score based solely on the number of correctly answered items on an assessment.

For example, this is your child’s most recent math test. His raw score was a 96 because he got 96 items correct on the assessment. Raw scores are often used in teacher-constructed assessments.

The potential drawback to the use of raw scores is that they may be difficult to interpret without knowledge of how one raw score compares to a norm group, which is a reference group used to compare one test taker’s score to similar other test takers. We’ll talk about using norm-referenced scores in a moment. Raw scores may also be difficult to understand without comparing them to specific criteria, which we will discuss now.

Criterion-Referenced Scores

I want to discuss another method of scoring: criterion-referenced scoring. This refers to a score on an assessment that specifically indicates what a student is capable of or what knowledge they possess.

Criterion-referenced scores are most appropriate when an educator wants to assess the specific concepts or skills a student has learned through classroom instruction. Most criterion-referenced assessments have a cut score, which determines success or failure based on an established percentage correct.

For example, in my class, in order for a student to successfully demonstrate their knowledge of the math concepts we discuss, they must answer at least 80% of the test questions correctly. Your child earned an 85% on his last fractions test; therefore, he demonstrated knowledge of the subject area and passed.
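The pass/fail decision described above reduces to a percentage comparison against the cut score. A minimal sketch, using the 80% cut from the example and a hypothetical 20-item test:

```python
def meets_cut_score(correct, total, cut_percent=80):
    """Criterion-referenced decision: pass if and only if the percentage
    correct reaches the established cut score (80% in the example above)."""
    return (correct / total) * 100 >= cut_percent

# Hypothetical 20-item fractions test: 17 correct is 85%, which clears the cut.
print(meets_cut_score(17, 20))  # True
print(meets_cut_score(15, 20))  # False -- 75% falls short of the 80% cut
```

Note that no other student’s score appears anywhere in the decision; that absence of comparison is what makes the score criterion-referenced.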

It’s important to remember that criterion-referenced scores tell us how well a student performs against an objective or standard, as opposed to against another student. For example, a learning objective in my class is ‘students should be able to correctly divide fractions.’ The criterion-referenced score tells me if that student meets the objective successfully.

The potential drawback for criterion-referenced scores is that the assessment of complex skills is difficult to determine through the use of one score on an assessment.

Norm-Referenced Scores

Now let’s discuss the type of score that compares one student’s performance on an assessment with the average performance of other peers. This is referred to as norm-referenced scores.

Norm-referenced scores are useful when educators want to make comparisons across large numbers of students or when making decisions on student placement (in K-12 schools or college) and grade advancement. Some familiar examples of norm-referenced assessments are the SAT, ACT and GRE.

Age/Grade Equivalent, Percentile, Standard

There are three types of norm-referenced scores. The first is age or grade equivalent. These scores compare students by age or grade. Breaking this type down, we can see that age equivalent scores indicate the approximate age level of students to whom an individual student’s performance is most similar, and grade equivalent scores indicate the approximate grade level of students to whom an individual student’s performance is most similar.

These scores are useful when explaining assessment results to parents or people unfamiliar with standard scores. For example, let’s look at your child’s raw score on a recent math standardized assessment. Looking at the chart, we see that your child’s raw score of 56 places him at an 8th grade level and an approximate age of 13.

The potential disadvantage of using age or grade equivalent scores is that parents and some educators misinterpret the scores, especially when scores indicate the student is below expected age or grade level.

The second type of norm-referenced scoring is percentile rank. These scores indicate the percentage of peers in the norm group with raw scores less than or equal to a specific student’s raw score.
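The percentile-rank definition above (percentage of the norm group at or below a given raw score) can be sketched directly; the norm-group scores below are invented for illustration:

```python
def percentile_rank(score: float, norm_group: list) -> float:
    """Percentage of norm-group scores less than or equal to the given score."""
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100 * at_or_below / len(norm_group)

# Hypothetical norm group of ten raw scores.
norm = [42, 48, 51, 56, 56, 60, 63, 70, 75, 81]

print(percentile_rank(56, norm))  # 50.0 -- half the norm group scored 56 or lower
print(percentile_rank(81, norm))  # 100.0 -- the top score in this group
```

With real norm groups of thousands of test takers, the same calculation is simply applied to a much larger list.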

Percentile rank scores can sometimes overestimate differences of students with scores that fall near the mean of the normed group and underestimate differences of students with scores that fall in the extreme lower or upper range of the scores.


Q.4   Elaborate the different techniques for the measurement of aptitude of the learners by providing examples. Why aptitude measurement is important for the teachers in teaching learning process?

Although online learning has grown alongside the progress of digital technology over the last 15 years, the reasons why students become absorbed in, practise, and achieve a variety of tasks and exercises, or why they avoid others, remain of interest to those who deliver and evaluate the learning process.

By establishing the characteristics of distance and online learners (how they become motivated, how they feel about learning online), we can gather useful information to strengthen teaching practices and thus ultimately enhance student retention and achievement.

A review of some of the available literature reveals research already undertaken in various areas of learning online, such as ‘training effectiveness and user attitudes’. Torkzadeh et al. suggest, “to achieve successful training we need to be cognizant of the user’s attitudes towards computers.” Further investigation revealed other factors that should be taken into consideration; Miltiadou (1999) suggests that ‘it is important to identify motivational characteristics of online students’. Investigating and defining their motivation would lead to an understanding of ‘self-efficacy beliefs about their own abilities to engage, persist and accomplish specific tasks’.

Perhaps the most straightforward way of finding out about someone’s attitudes would be to ask them. However, attitudes are related to self-image and social acceptance (i.e. attitude functions).

In order to preserve a positive self-image, people’s responses may be affected by social desirability. They may not report their true attitudes, but instead answer in a way they feel is socially acceptable.

Given this problem, various methods of measuring attitudes have been developed.  However, all of them have limitations.  In particular the different measures focus on different components of attitudes – cognitive, affective and behavioral – and as we know, these components do not necessarily coincide.

Attitude measurement can be divided into two basic categories

  • Direct Measurement (Likert scale and semantic differential)
  • Indirect Measurement (projective techniques)

Semantic Differential

The semantic differential technique of Osgood et al. (1957) asks a person to rate an issue or topic on a standard set of bipolar adjectives (i.e. with opposite meanings), each representing a seven point scale.

To prepare a semantic differential scale, you must first think of a number of words with opposite meanings that are applicable to describing the subject of the test.

For example, participants are given a word, for example ‘car’, and presented with a variety of adjectives to describe it.  Respondents tick to indicate how they feel about what is being measured.
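As a minimal sketch, a group's responses on such bipolar 7-point scales can be summarized by averaging each scale. The adjective pairs come from the text; the respondents and their ratings are invented for illustration:

```python
# Hypothetical ratings: four respondents rate 'car' on 7-point bipolar scales.
ratings = {
    "ugly-beautiful": [5, 6, 4, 5],  # evaluation dimension
    "weak-strong":    [6, 6, 5, 7],  # potency dimension
    "passive-active": [4, 5, 5, 4],  # activity dimension
}

# The group profile is the mean rating on each scale.
profile = {pair: sum(vals) / len(vals) for pair, vals in ratings.items()}
for pair, mean in profile.items():
    print(f"{pair}: {mean:.2f}")
```

Plotting these means for each scale produces the kind of attitude profile Osgood used to compare groups.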

Osgood’s original study mapped people’s ratings for the word ‘polite’ on ten such scales, plotting the average responses of two groups of 20 people.

The semantic differential technique reveals information on three basic dimensions of attitudes: evaluation, potency (i.e. strength) and activity.

Evaluation is concerned with whether a person thinks positively or negatively about the attitude topic (e.g. dirty – clean, and ugly – beautiful).

Potency is concerned with how powerful the topic is for the person (e.g. cruel – kind, and strong – weak).

Activity is concerned with whether the topic is seen as active or passive (e.g. active – passive).

Using this information we can see whether a person’s feeling (evaluation) towards an object is consistent with their behavior.  For example, a person might like the taste of chocolate (evaluative) but not eat it often (activity).  The evaluation dimension has been most used by social psychologists as a measure of a person’s attitude, because this dimension reflects the affective aspect of an attitude.

Evaluation of Direct Methods

An attitude scale is designed to provide a valid, or accurate, measure of an individual’s social attitude.  However, as anyone who has ever “faked” an attitude scale knows, there are shortcomings in these self-report scales of attitudes.  Various problems affect the validity of attitude scales, but the most common is social desirability.

Social desirability refers to the tendency for people to give “socially desirable” responses to questionnaire items.  People are often motivated to give replies that make them appear “well adjusted”, unprejudiced, open minded and democratic.  Self-report scales that measure attitudes towards race, religion, sex, etc. are heavily affected by social desirability bias.

Respondents who harbor a negative attitude towards a particular group may not wish to admit to the experimenter (or to themselves) that they have these feelings.  Consequently, responses on attitude scales are not always 100% valid.

Projective Techniques

To avoid the problem of social desirability, various indirect measures of attitudes have been used.  Either people are unaware of what is being measured (which has ethical problems) or they are unable consciously to affect what is being measured.

Indirect methods typically involve the use of a projective test.  A projective test involves presenting a person with an ambiguous (i.e. unclear) or incomplete stimulus (e.g. a picture or words). The stimulus requires interpretation from the person.  Therefore, the person’s attitude is inferred from their interpretation of the ambiguous or incomplete stimulus.

The assumption behind these measures of attitudes is that the person will “project” his or her views, opinions or attitudes into the ambiguous situation, thus revealing the attitudes the person holds.  However, indirect methods only provide general information and do not offer a precise measurement of attitude strength, since they are qualitative rather than quantitative. A major criticism of this method of attitude measurement is that it is neither objective nor scientific.

Examples of projective techniques include:

• Rorschach Inkblot Test

• Thematic Apperception Test (or TAT)

• Draw a Person Task

Here a person is presented with an ambiguous picture which they have to interpret.

The thematic apperception test (TAT) taps into a person’s unconscious mind to reveal the repressed aspects of their personality.

Although the picture, illustration, drawing or cartoon that is used must be interesting enough to encourage discussion, it should be vague enough not to immediately give away what the project is about.

TAT can be used in a variety of ways, from eliciting qualities associated with different products to perceptions about the kind of people that might use certain products or services.

The person must look at the picture(s) and tell a story. For example:

o What has led up to the event shown
o What is happening at the moment
o What the characters are thinking and feeling, and
o What the outcome of the story was

Draw a Person Test

Figure drawings are projective diagnostic techniques in which an individual is instructed to draw a person, an object, or a situation so that cognitive, interpersonal, or psychological functioning can be assessed.  The test can be used to evaluate children and adolescents for a variety of purposes (e.g. self-image, family relationships, cognitive ability and personality).

A projective test is one in which a test taker responds to or provides ambiguous, abstract, or unstructured stimuli, often in the form of pictures or drawings.

While other projective tests, such as the Rorschach Technique and Thematic Apperception Test, ask the test taker to interpret existing pictures, figure drawing tests require the test taker to create the pictures themselves. In most cases, figure drawing tests are given to children.  This is because it is a simple, manageable task that children can relate to and enjoy.

Some figure drawing tests are primarily measures of cognitive abilities or cognitive development. In these tests, there is a consideration of how well a child draws and the content of a child’s drawing.  In some tests, the child’s self-image is considered through the use of the drawings.

In other figure drawing tests, interpersonal relationships are assessed by having the child draw a family or some other situation in which more than one person is present. Some tests are used for the evaluation of child abuse.  Other tests involve personality interpretation through drawings of objects, such as a tree or a house, as well as people.

Finally, some figure drawing tests are used as part of the diagnostic procedure for specific types of psychological or neuropsychological impairment, such as central nervous system dysfunction or mental retardation.

Despite the flexibility in administration and interpretation of figure drawings, these tests require skilled and trained administrators familiar with both the theory behind the tests and the structure of the tests themselves.  Interpretations should be made with caution and the limitations of projective tests should be considered.

It is generally a good idea to use projective tests as part of an overall test battery. There is little professional support for the use of figure drawing, so the examples that follow should be interpreted with caution.

The House-Tree-Person (HTP) test, created by Buck in 1948, provides a measure of self-perception and attitudes by requiring the test taker to draw a house, a tree, and a person.

  • The picture of the house is supposed to conjure the child’s feelings toward his or her family.
  • The picture of the tree is supposed to elicit feelings of strength or weakness. The picture of the person, as with other figure drawing tests, elicits information regarding the child’s self-concept.

The HTP, though mostly given to children and adolescents, is appropriate for anyone over the age of three.

Evaluation of Indirect Methods

The major criticism of indirect methods is their lack of objectivity. Such methods are unscientific and do not objectively measure attitudes in the same way as a Likert scale.

There is also the ethical problem of deception as often the person does not know that their attitude is actually being studied when using indirect methods.

The advantages of such indirect techniques of attitude measurement are that they are less likely to produce socially desirable responses, the person is unlikely to guess what is being measured and behavior should be natural and reliable.


Q.5   Explain the advantages and disadvantages of objective type test items. Also highlight the importance and significance of the objective type tests.

Although essay questions are one of the most commonly used methods for assessing student learning, many are poorly designed and ineffectively used. Writing effective essay questions requires training and practice. There are subtle characteristics of effective essay questions that are often difficult to discern for those without adequate training. This workbook was developed to provide training and practice in discerning the often difficult to see characteristics of effective essay questions and to support educators in the development and use of essay questions.

Effective essay questions

It supports teaching assistants who work with educators and often have exam development and grading responsibilities. This workbook is the first in a series of three workbooks designed to improve the development and use of effective essay questions. It focuses on the writing and use of essay questions. The second booklet in the series focuses on scoring student responses to essay questions. The third workbook focuses on preparing students to respond to essay questions and can be used with both educators and students.

When to Use Essay or Objective Tests

Essay tests are especially appropriate when:

  • the group to be tested is small and the test is not to be reused.
  • you wish to encourage and reward the development of student skill in writing.
  • you are more interested in exploring the student’s attitudes than in measuring his/her achievement.
  • you are more confident of your ability as a critical and fair reader than as an imaginative writer of good objective test items.

Objective tests are especially appropriate when:

  • the group to be tested is large and the test may be reused.
  • highly reliable test scores must be obtained as efficiently as possible.
  • impartiality of evaluation, absolute fairness, and freedom from possible test scoring influences (e.g., fatigue, lack of anonymity) are essential.
  • you are more confident of your ability to express objective test items clearly than of your ability to judge essay test answers correctly.
  • there is more pressure for speedy reporting of scores than for speedy test preparation.

Either essay or objective tests can be used to:

  • measure almost any important educational achievement a written test can measure.
  • test understanding and ability to apply principles.
  • test ability to think critically.
  • test ability to solve problems.
  • test ability to select relevant facts and principles and to integrate them toward the solution of complex problems.

In addition to the preceding suggestions, it is important to realize that certain item types are better suited than others for measuring particular learning objectives. For example, learning objectives requiring the student to demonstrate or to show, may be better measured by performance test items, whereas objectives requiring the student to explain or to describe may be better measured by essay test items. The matching of learning objective expectations with certain item types can help you select an appropriate kind of test item for your classroom exam as well as provide a higher degree of test validity (i.e., testing what is supposed to be tested). To further illustrate, several sample learning objectives and appropriate test items are provided on the following page.

Disadvantages of essay questions

Despite the advantages associated with essay questions, there are also disadvantages. Have you ever labored over the wording of an essay question in an effort to make it clear and precise so that the students know exactly what you expect of them? Or have you ever felt the frustration of trying to develop reliable and fair scoring criteria for grading students’ responses to essay questions only to discover that you were as unsure of what was asked for in the essay question as the students? These are some of the difficulties of essay questions. This workbook addresses the advantages and disadvantages of essay questions and illustrates ways of improving the use of essay questions.

Two major purposes for using essay questions

There are two major purposes for using essay questions. One purpose is to assess students’ understanding of and ability to think with subject matter content. The other purpose is to assess students’ writing abilities. These two purposes are so different in nature that it is best to treat them separately. This workbook will focus on essay questions that assess students’ thinking skills. When going through this workbook it is important to keep this focus in mind and to understand that some of the rules and principles discussed may even contradict rules and principles that apply for essay questions that assess students’ writing skills.

Item formats

Multiple-choice questions, matching exercises, and true-false items are all examples of selected response test items because they require students to choose an answer from a list of possibilities, whereas essay questions require students to compose their own answer. However, requiring students to compose a response is not the only characteristic of an effective essay question. There are assessment items other than essay questions that require students to construct responses (e.g., short answer, fill in the blank). Essay questions are different from these other constructed response items because they require more systematic and in-depth thinking. An effective essay question will align with each of the four criteria given in Stalnaker’s definition and provide students with an indication of the types of thinking and content to use in responding to the essay question.

Suggest criteria for marking of essay type test

If you teach, you need to know how to calculate test scores. You may not be a math teacher, but you still need to do the math when calculating test scores for your students. When an essay is involved, grading isn’t so cut and dried. You may simply be a parent or a student who wants to work out how a certain score was reached; you can figure it out. Whether the assignment was graded on a curve will dictate where the cutoffs fall.

Suggestions of marking criteria

  • Take a calculator and type in the number of questions on the test. The questions should already be numbered for you; if they are not, count them yourself.
  • Divide 100, the highest score a student can receive, by the number of questions. The result is how much each question is worth.
  • Multiply the value of each question by the number of wrong answers.
  • Subtract that amount from 100. The result is the true score on the test.
  • To calculate a test score for an essay properly, score different aspects of the essay separately. These aspects can include grammar, proper sentence structure, spelling and punctuation.
  • Give 10 points for each aspect when an essay is involved. If this is done, the remaining questions will need to be re-valued. For example, suppose you have 10 questions and an essay worth 30 points in total (10 points each for grammar, sentence structure, and spelling and punctuation); then divide the remaining 70 points by the 10 questions to find the true worth of each question (7 points each).

Take the value of each question, multiply it by the number of wrong answers, and subtract the result from the objective portion of the test (70 points in the example above); then add any essay points earned to find the test score. After you have decided to use an objective exam, an essay exam, or both, the next step is to select the kind(s) of objective or essay item that you wish to include on the exam. To help you make such a choice, the different kinds of objective and essay items are presented in the following section of this booklet. The various kinds of items are briefly described and compared to one another in terms of their advantages and limitations for use. Also presented is a set of general suggestions for the construction of each item variation.

Objective type test

Basically, scoring objective test items is easy: It only requires one to follow the scoring rules. However, constructing good objective test items requires much more skill and effort. The first step is to develop a set of test specifications that can serve to guide the selection of test items. A table of specifications (or test blueprint) is a useful tool for this purpose. This tool is usually a two-way grid that describes content areas to be covered by the test as the row headings and skills and abilities to be developed (i.e., instructional objectives) as the column headings (Figure 2). After specifying the content and ability covered by the test using the table of specifications, the appropriate test item format is selected for each item. At this point, not only objective test items but also other types of test items—essay test or performance assessment—should be considered, depending on the learning outcomes to be measured.
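A table of specifications, as described above, is a two-way grid of content areas by instructional objectives. A minimal sketch of such a blueprint follows; the content areas, objective labels, and item counts are all invented for illustration:

```python
# Rows are content areas; columns are instructional objectives;
# each cell holds the number of planned test items.
blueprint = {
    "Fractions":   {"Knowledge": 4, "Application": 3, "Analysis": 1},
    "Decimals":    {"Knowledge": 3, "Application": 2, "Analysis": 1},
    "Percentages": {"Knowledge": 2, "Application": 3, "Analysis": 1},
}

# Summing the grid gives the planned length of the test.
total_items = sum(sum(row.values()) for row in blueprint.values())
print(total_items)  # 20 planned items across the grid
```

Row and column totals make it easy to check that the test's emphasis matches the emphasis given in instruction before any items are written.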

The next step is to create specific test items. Typically, it is particularly important for objective test items to be written in clear and unambiguous language to allow examinees to demonstrate their attainment of the learning objectives. If complex wording is used, the item simply reflects reading comprehension ability. It is also important for each objective test item to focus on an important aspect of the content area rather than trivial details. Asking trivial details not only makes the test items unnecessarily difficult, it also obscures what the test constructor really wants to measure. Similarly, relatively novel material should be used when creating items that measure understanding or the ability to apply principles. Items created by copying sentences verbatim from a textbook only reflect rote memory, rather than higher-order cognitive skills.

Many other specific rules exist for constructing objective test items. Test constructors must be very careful that examinees with little or no content knowledge cannot arrive at the correct answer by utilizing the characteristics of the test format that are independent of specific content knowledge. Jason Millman and his colleagues called this skill of the examinees “test-wiseness.” For example, in multiple-choice test items, all options should be grammatically correct with respect to the stem (questions or incomplete statements preceding options), and key words from a stem, or their synonyms, should not be repeated in the correct option. Any violation of these rules would obviously provide an advantage for testwise examinees. Test composers should also equalize the length of the options of an item and avoid using specific determiners such as all, always, and never because some testwise examinees know that the correct option is frequently long and without such specific determiners. Robert Thorndike and Anthony Nitko have provided more comprehensive guidelines, with detailed explanations for constructing objective test items.


One common criticism of objective test items is that students are encouraged toward rote learning and other surface-processing strategies. Another related criticism is that objective tests, if used to evaluate the educational attainment of schools, encourage teachers to place undue emphasis on factual knowledge and disregard the understanding of students in the classrooms. Some evidence suggests that both are the case.

Kou Murayama, in a series of studies, investigated the effects of objective test items on the use of learning strategies. In one study, junior high school students participated in a history class for five days and took either an essay or short-answer test at the end of each day. Results showed that on the last day, those who took the short-answer tests used more rote learning strategies and fewer deep-processing strategies than those who took the essay tests. George Madaus reviewed much literature about the effects of standardized testing on what is taught at schools and found that teachers pay particular attention to the form of the questions and adjust their instruction accordingly, suggesting that objective tests could narrow instruction to the detriment of higher-order skills. Madaus argued that high-stakes tests, tests that are used to make important decisions such as the ranking of schools, have much more influence on teaching.

However, educators should be reminded that objective test items are not limited to testing for specific factual knowledge. Well written items may not have such negative effects on students’ use of learning strategies or teachers’ teaching styles. Thus, it is not the objective test items per se that should be changed. What is important is to change the stereotypical beliefs that objective test items require only rote learning of factual knowledge and avoid poorly constructed objective test items.


A variety of different types of objective test formats can be classified into two categories: a selected response format, in which examinees select the response from a given number of alternatives, including true/false, multiple choice, and matching test items; and a constructed response format, in which examinees are required to produce an entire response, including short answer test items. This distinction is sometimes captured in terms of recognition and recall. These two general categories are further divided into basic types of objective tests, illustrated in the following examples (Figure 1).

The true/false test is the simplest form of selected response formats. True/false tests are those that ask examinees to select one of the two choices given as possible responses to a test question. The choice is between true and false, yes and no, right and wrong, and so on. A major advantage of the true/false test is its efficiency, as it yields many independent responses per unit of testing time. Therefore, teachers can cover course material comprehensively in a single test. However, one apparent limitation of the true/false test is its susceptibility to guessing. It should be noted, however, that test givers can attenuate the effects of guessing by increasing the number of items in a test. In addition, some guessing might reflect partial knowledge, which would provide a valid indication of achievement.

Another selected response format type is the multiple-choice test, which has long been the most widely used among the objective test formats. Multiple-choice test items require the examinee to select one or more responses from a set of options (in most cases, 3–7). The correct alternative in each item is called the answer (or the key), and the remaining alternatives are called distracters. Examinees have less chance of guessing the correct answer to a multiple-choice test question compared to a true/false test question. In addition, the distracter an examinee selects may provide useful diagnostic information.
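The difference in susceptibility to guessing can be illustrated with a simple expected-value calculation: blind guessing yields 1 in 2 correct on a true/false item but only 1 in k on a k-option multiple-choice item. The item counts below are invented for illustration:

```python
def expected_guess_score(num_items: int, options_per_item: int) -> float:
    """Expected number of correct answers from pure guessing."""
    return num_items / options_per_item

print(expected_guess_score(50, 2))  # 25.0 expected correct on a 50-item true/false test
print(expected_guess_score(50, 4))  # 12.5 expected correct on a 50-item four-option test
```

This is why lengthening a true/false test attenuates guessing: the expected chance score stays at 50%, but the variability around it shrinks relative to the total, making it easier to distinguish knowledge from luck.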
