JEROME P. REITER
Mrs. Alexander Hehmeyer Associate Professor of Statistical Science
Department of Statistical Science
Duke University
Summary of dissertation
Title: Estimation in the Presence of Constraints that Prohibit Explicit Data Pooling
Advisor: Donald B. Rubin, Dept. of Statistics, Harvard University
Summary:
When using regression models where units can be classified into
distinct groups, similar parameters in each group can
be estimated by combining data across groups, as in hierarchical
Bayesian models. Such explicit data pooling typically increases
estimation accuracy relative to separate regressions in each group.
Sometimes, however, explicit data pooling is prohibited because of
legal or political constraints. For example, under the design of the
2000 U.S. census that includes the Integrated Coverage Measurement
Survey--which was the proposed design of the census until the Supreme
Court ruled sampling for apportionment illegal--the Census Bureau
avoids pooling data across states when estimating population counts
because the law may not allow data from one state to affect population
counts in another state. Similar constraints may apply when sampling is
used to audit, assess, or compare the performances of several groups: a
group may not want the data from other groups to affect its performance
estimate, particularly when poor performance is penalized.
In my dissertation, I develop techniques for such constrained
estimation settings. The general approach is to utilize information
from multiple groups to specify the model in each group, but ultimately
estimate the parameters in each group's model using only that group's
data. This may satisfy the constraints because data pooling is used
just for model specification and not for parameter estimation, and it
may increase estimation accuracy relative to regressing separately in
each group because it takes advantage of across-group information. The
techniques can be conceptualized as existing on a continuum ordered by
how directly each relies on data pooling to make estimates; those
techniques that look more like explicit data pooling are typically more
accurate yet less likely to be acceptable. Using simulation studies, I
show that some of these techniques have great potential to increase
estimation accuracy. I also present a procedure that predicts the
payoff of each technique from observable characteristics of the data.
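As a rough illustration of the general approach described above, the sketch below uses pooled data only to choose which predictors enter the model, then fits each group's regression with that group's data alone. It is not one of the dissertation's techniques: the correlation-based screening rule, the function names select_predictors_pooled and fit_within_group, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def select_predictors_pooled(X_by_group, y_by_group, k):
    """Model specification step (hypothetical rule): rank predictors by
    absolute correlation with the response in the pooled data, keep k."""
    X = np.vstack(X_by_group)
    y = np.concatenate(y_by_group)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    top_k = np.argsort(corr)[::-1][:k]
    return np.sort(top_k)


def fit_within_group(X, y, cols):
    """Estimation step: ordinary least squares using one group's data only.
    Returns the intercept followed by the slopes for the chosen columns."""
    design = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Simulated data (an assumption for illustration): 5 groups, 6 candidate
# predictors, of which only columns 0 and 2 affect the response.
rng = np.random.default_rng(0)
n_groups, n, p = 5, 30, 6
true_cols = [0, 2]
X_by_group, y_by_group = [], []
for _ in range(n_groups):
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[true_cols] = rng.normal(loc=2.0, scale=0.3, size=len(true_cols))
    y = 1.0 + X @ beta + rng.normal(size=n)
    X_by_group.append(X)
    y_by_group.append(y)

# Pooling is used once, to specify the model shared by all groups.
cols = select_predictors_pooled(X_by_group, y_by_group, k=2)

# Each group's coefficients are then estimated from its own data only.
estimates = [fit_within_group(X, y, cols) for X, y in zip(X_by_group, y_by_group)]
for g, coef in enumerate(estimates):
    print(f"group {g}: columns {cols.tolist()}, coefficients {np.round(coef, 2)}")
```

In this sketch other groups' data still influence which predictors are chosen, so whether such a procedure would be acceptable under a given constraint depends on where it sits on the continuum described above.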
Portions of this research have appeared in:
Reiter, J. (2000) "Borrowing Strength When Explicit Data Pooling Is Prohibited." Journal of Official Statistics, 16, pp. 295-319.
Reiter, J. (2001) "Borrowing Strength Without Explicit Data Pooling." In Monographs of Official Statistics: Bayesian Methods With Applications to Science, Policy, and Official Statistics. Edited by E. George and N. Photis. Eurostat, pp. 439-448.