Department of Statistical Science
Duke University

presents:

Eric Bradlow
ebradlow@hops.wharton.upenn.edu
University of Pennsylvania

Professor Bradlow will present a four-part mini-seminar.

"Some Interesting Statistical Problems in Educational Testing"

Abstract 1:

"Some Statistical and Logical Considerations When Rescoring Tests"

When tests or portions of tests are scored subjectively by raters, a rescoring will change the ratings of some examinees. In a high-stakes test with a fixed passing score, a rescoring will change some pass/fail decisions. The number of changes depends on three things: (1) the reliability of the rating system, (2) the proportion of examinees expected to pass, and (3) the policy used to incorporate the rescore into the pass/fail decision. In this study, we provide a model that facilitates the evaluation of various rescoring strategies. We illustrate the use of this model for rescoring schemes that concatenate all ratings, old and new. Rescoring effects are estimated by direct simulation. The sensitivity of the results to distributional assumptions is also described. A further generalization of the basic model is considered in which a test is composed of a mixture of objectively and subjectively scored items.

Keywords: normal linear model, constructed responses, rescoring, variance components models, rater reliability
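
As a rough illustration of the kind of direct simulation described above, the following Python sketch counts how often a pass/fail decision changes when a second rating is averaged with the first. It is not the speaker's model. The normal true-score and error distributions, the averaging policy, and the names simulate_flip_rate, reliability, and pass_rate are all assumptions made for this example.

```python
import random
from statistics import NormalDist


def simulate_flip_rate(reliability, pass_rate, n_examinees=20000, seed=0):
    """Fraction of examinees whose pass/fail decision changes after a rescore.

    Assumed setup (illustrative only): true scores are N(0, 1), each rating
    is the true score plus independent normal error, and the rescoring policy
    averages the old and new ratings against the same fixed passing score.
    """
    rng = random.Random(seed)
    # reliability = var(true) / (var(true) + var(error)), with var(true) = 1
    error_var = (1.0 - reliability) / reliability
    error_sd = error_var ** 0.5
    # Fixed passing score chosen so that pass_rate of first ratings pass.
    first_rating_dist = NormalDist(0.0, (1.0 + error_var) ** 0.5)
    cut = first_rating_dist.inv_cdf(1.0 - pass_rate)

    flips = 0
    for _ in range(n_examinees):
        true_score = rng.gauss(0.0, 1.0)
        first = true_score + rng.gauss(0.0, error_sd)
        second = true_score + rng.gauss(0.0, error_sd)
        passed_before = first >= cut
        passed_after = (first + second) / 2.0 >= cut
        if passed_before != passed_after:
            flips += 1
    return flips / n_examinees


if __name__ == "__main__":
    for rho in (0.5, 0.8, 0.95):
        print(rho, simulate_flip_rate(rho, pass_rate=0.7))
```

Varying reliability, pass_rate, or the line that combines the two ratings corresponds to the three factors listed in the abstract.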

Abstract 2:

"Negative Information and the Three-Parameter Logistic Model"

The three-parameter logistic (3-PL) model is commonly used to describe the relationship among an unobserved latent trait (ability), unobserved item properties, and an observed binary outcome. We show that for certain values of the item properties and latent ability, the observed information about ability contained in the binary response is negative. This result has implications for maximization procedures such as Newton-Raphson and for approximate sampling methods such as the Metropolis-Hastings algorithm. We show further that the expected information is always non-negative, and that observed negative information does not occur in the limiting case with no guessing (2-PL model). The probability of negative information is expressed by a simple formula. This research extends the work of Samejima (1973) and Yen et al. (1991).

Keywords: maximization routines, approximate sampling methods
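
The sign behavior described above can be checked numerically. The Python sketch below assumes the usual 3-PL parameterization, P(theta) = c + (1 - c) / (1 + exp(-a (theta - b))), with discrimination a, difficulty b, and guessing parameter c. The function names are chosen for this illustration and are not taken from the paper.

```python
import math


def _p_and_derivatives(theta, a, b, c):
    """Return P, dP/dtheta and d2P/dtheta2 under the assumed 3-PL form."""
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    p = c + (1.0 - c) * logistic
    dp = (1.0 - c) * a * logistic * (1.0 - logistic)
    d2p = dp * a * (1.0 - 2.0 * logistic)
    return p, dp, d2p


def observed_information(theta, y, a, b, c):
    """Negative second derivative in theta of the log-likelihood of one response y."""
    p, dp, d2p = _p_and_derivatives(theta, a, b, c)
    if y == 1:
        return (dp * dp - d2p * p) / (p * p)
    return (dp * dp + d2p * (1.0 - p)) / ((1.0 - p) ** 2)


def expected_information(theta, a, b, c):
    """Observed information averaged over y; equals (dP)^2 / (P (1 - P))."""
    p, _, _ = _p_and_derivatives(theta, a, b, c)
    return (p * observed_information(theta, 1, a, b, c)
            + (1.0 - p) * observed_information(theta, 0, a, b, c))


if __name__ == "__main__":
    a, b, c = 1.0, 0.0, 0.25  # hypothetical item parameters
    for theta in (-2.0, 0.0, 2.0):
        print(theta,
              observed_information(theta, 1, a, b, c),
              observed_information(theta, 0, a, b, c),
              expected_information(theta, a, b, c))
```

Under this parameterization, our own algebra (which may differ in form from the result presented in the talk) suggests that the observed information is negative only for a correct response from an examinee with exp(a (theta - b)) < sqrt(c). With c = 0 that condition can never hold, consistent with the 2-PL statement in the abstract.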

Abstract 3:

"Item Response Theory Models Applied to Data Allowing Examinee Choice"

Examinations that permit students to choose a subset of the items are popular, despite the possibility that students will take examinations of varying difficulty as a result of their choices. We provide a set of conditions for the validity of inference for IRT models applied to data collected from choice-based examinations. Valid likelihood and Bayesian inference using standard estimation methods require (except in extraordinary circumstances) that there is no residual dependence between the examinees' choices and the following: (1) their (potential but unobserved) responses to omitted items, and (2) their latent abilities conditional on the observed item responses. These independence assumptions are typical of those required in much more general settings. IRT models, though potentially useful tools for educational data, offer no special advantage with choice-based data.

Keywords: examinee choice, missing data, item response theory

Abstract 4:

"A Bayesian Random Effects Model for Testlets"

Standard item response theory (IRT) models fit to examination responses ignore the fact that sets of items (testlets) often come from a single common stimulus (e.g., a reading comprehension passage). In this setting, all items given to an examinee are unlikely to be conditionally independent (given examinee ability). Models that assume conditional independence will overestimate the precision with which examinee ability is measured. Overstatement of precision may lead to inaccurate inferences, as well as to prematurely ending an examination in which the stopping rule is based on the estimated variance of examinee ability (e.g., a computer adaptive test). To model examinations that may be a mixture of independent items and testlets, we modify standard IRT models to include an additional random effect for items nested within the same testlet. A Bayesian framework is introduced that facilitates posterior inference via a Data Augmented Gibbs Sampler (DAGS; Tanner and Wong 1987). The modified and standard IRT models are both applied to SAT data. We also provide simulation results which indicate that the degree of precision bias is a function of the variability of the testlet effects, as well as the testlet design.

Keywords: Gibbs sampler, data augmentation, testlets
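
To illustrate why a shared stimulus breaks conditional independence, the Python sketch below simulates three items for examinees who all have the same ability. Items 1 and 2 share a testlet and item 3 belongs to a different one. It is a toy example, not the model or sampler from the talk: the logistic link, the additive normal testlet effect, and the names simulate_item_covariances and testlet_sd are assumptions made here.

```python
import math
import random


def _covariance(xs, ys):
    """Sample covariance (population form) of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y


def simulate_item_covariances(testlet_sd, n_examinees=20000, seed=0):
    """Return (cov within testlet, cov across testlets) at fixed ability.

    Assumed setup: every examinee has ability 0, every item has unit
    discrimination and difficulty 0, and each examinee draws one normal
    random effect per testlet that shifts all items in that testlet.
    """
    rng = random.Random(seed)
    item1, item2, item3 = [], [], []
    for _ in range(n_examinees):
        gamma_a = rng.gauss(0.0, testlet_sd)  # effect for testlet A
        gamma_b = rng.gauss(0.0, testlet_sd)  # effect for testlet B
        p_a = 1.0 / (1.0 + math.exp(gamma_a))  # ability 0, difficulty 0
        p_b = 1.0 / (1.0 + math.exp(gamma_b))
        item1.append(1 if rng.random() < p_a else 0)
        item2.append(1 if rng.random() < p_a else 0)
        item3.append(1 if rng.random() < p_b else 0)
    return _covariance(item1, item2), _covariance(item1, item3)


if __name__ == "__main__":
    for sd in (0.0, 0.5, 1.0):
        print(sd, simulate_item_covariances(sd))
```

Because ability is held fixed, any covariance between items 1 and 2 comes from the shared testlet effect alone. A model that treats those two items as independent given ability would count them as two full pieces of evidence about the examinee.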

March 7, 1997

4:00 pm - 5:00 pm

116 Old Chem Building

Any questions concerning the seminar may be addressed to Cheryl McGhee at (919) 684-8029 or by e-mail at cheryl@stat.duke.edu. Please contact the author(s) directly for reprints, etc.