Dealing with Poor Assessment Methods

Considerable effort is required to produce quality assessments for medical education. Expertise is required, not only in subject content, but also in exam construction and delivery. For multiple reasons, assessments are not always the best that they could be.

Assessments can be reliable or valid or both. Reliability describes reproducibility: for example, do examination questions yield similar results each time they are administered and to different groups of students? Validity describes appropriateness: do examination questions measure knowledge and reasoning ability, thereby providing a measure of meaningful achievement?
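
As a rough illustration of how reliability can be put into numbers, the sketch below computes the Kuder-Richardson Formula 20 (KR-20), one commonly used internal-consistency coefficient for items scored right or wrong. The text above does not name a specific statistic, so the choice of KR-20, the function name kr20, the use of population variance, and the toy response matrix are all assumptions made for illustration only.

```python
from statistics import pvariance


def kr20(responses):
    """Estimate KR-20 for a matrix of 0/1 item scores.

    responses: list of rows, one row per student, one 0/1 entry per item.
    Assumption: population variance of total scores is used.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")

    # Total score per student and the variance of those totals.
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("all students have the same total score")

    # Sum of p*(1-p) over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_students
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


# Hypothetical data: five students, four items.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    print(round(kr20(toy_scores), 3))
```

For this toy matrix the coefficient comes out at approximately 0.8. A coefficient of this kind speaks only to reliability; it says nothing about whether the items measure meaningful achievement, which is the question of validity.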

There are two major categories for flaws in examinations, particularly examinations in a multiple choice format:

  • Construct irrelevant variance

  • Construct underrepresentation

Construct Irrelevant Variance

Construct irrelevant variance (CIV) is the introduction of extraneous, uncontrolled variables that affect assessment outcomes. As a result, examination results become less meaningful and less accurate, decisions based on those results become less legitimate, and validity is reduced. Sources of CIV include:

  • Poorly constructed examination questions

  • Testwiseness

  • Guessing

  • Item bias

  • Indefensible passing score

  • Testing irregularities

When examinations contain flawed items, 'noise' is introduced in the form of badly worded, misleading, or confusing questions that make it more difficult for the student to answer correctly, even if the student has mastered the content domain of the question. Flawed items are more likely to produce 'false negatives', that is, students who fail the examination but should not have failed. The testwise student can use flaws in the structure of questions to arrive at the correct answer without knowing anything about the content upon which the question is based. Flawed questions also lead to guessing, and because flaws make guessing less random, a correct answer may be less likely to result from chance alone. Item bias is detected by differential item functioning analysis, which seeks to identify flawed test items that favor one group of students over another. Setting the passing standard requires judgment regarding the level of achievement required and should not be arbitrary. Academic institutions take care to create testing environments that preclude irregularities such as cheating.
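
The effect of guessing and testwiseness on multiple choice scores can be sketched with simple binomial arithmetic. The example below assumes every item is answered by blind guessing among equally likely options and that items are independent. Both are simplifying assumptions, and the function names and numbers are hypothetical. Lowering the number of plausible options per item mimics a testwise student who can rule out distractors by using item flaws.

```python
from math import comb


def expected_chance_score(n_items, n_options):
    """Expected number correct when every item is a blind guess."""
    return n_items / n_options


def prob_pass_by_guessing(n_items, n_options, pass_mark):
    """Probability of scoring at least pass_mark items by blind guessing.

    Assumes independent items, each with n_options equally likely choices.
    """
    p = 1.0 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(pass_mark, n_items + 1)
    )


if __name__ == "__main__":
    # Hypothetical 20-item quiz with a pass mark of 12 correct.
    # Four plausible options per item (no usable flaws).
    print(prob_pass_by_guessing(20, 4, 12))
    # Flaws let a testwise student discard two options on every item.
    print(prob_pass_by_guessing(20, 2, 12))
```

Under these assumptions the chance of passing rises sharply once flaws reduce the number of plausible options, which is one way in which scores come to reflect something other than mastery of the content.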

Construct Underrepresentation

Construct underrepresentation (CU) occurs when the examination lacks validity because the examination content is not reflective of relevant knowledge. Examples of construct underrepresentation include:

  • Trivial content

  • Rote memorization for factual recall

  • Few examination items

  • Maldistribution of examination items

  • Teaching to the test

Trivial content is unimportant for future learning or care of patients. Examination items at a low cognitive level require only rote memorization to recall isolated facts, which may not reflect the integrated knowledge needed to support clinical reasoning and problem solving in the care of patients with real medical problems. Maldistribution of examination items leads to oversampling of some content areas and undersampling of others. Too few examination items fail to adequately sample the learning content in the desired achievement domain, and the reliability of the examination suffers as well. An examination of sufficient length will be a fairer, more accurate, and more reliable sample of important knowledge. Teaching to the test leads to scores that are an inaccurate reflection of the knowledge domain. In summary, CU can be overcome when a sufficient number of examination items require higher-order cognition to solve clinically relevant problems.
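
The link between examination length and reliability can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that the text above does not cite by name. The sketch assumes that any added items are of the same quality and measure the same content as the existing ones; the function names and the example values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the added (or removed) items are parallel to the existing ones.
    """
    if not 0 < reliability < 1:
        raise ValueError("reliability must be between 0 and 1 (exclusive)")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """Factor by which the test must be lengthened to reach target reliability."""
    if not (0 < reliability < 1 and 0 < target < 1):
        raise ValueError("both values must be between 0 and 1 (exclusive)")
    return (target * (1 - reliability)) / (reliability * (1 - target))


if __name__ == "__main__":
    # Hypothetical short quiz with reliability 0.5, doubled in length.
    print(round(spearman_brown(0.5, 2), 3))
    # How much longer must it be to reach a reliability of 0.8?
    print(round(length_factor_needed(0.5, 0.8), 2))
```

The formula speaks only to reliability. Adding more items of trivial or maldistributed content would lengthen the examination without correcting construct underrepresentation.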


The students most likely to be affected by CIV and CU are those whose performance is marginal, near the lowest passing level. Hence, improving performance above this level will help prevent the 'false negative' effect of poor examinations.

Students should exhibit the professional behavior of seeking appropriate knowledge content for use at higher cognitive levels in clinical problem solving. Even if the test is not reflective of the student's true ability, the student moves on to higher levels where the value of such learning can be demonstrated.


