JEROME P. REITER

Mrs. Alexander Hehmeyer Associate Professor of Statistical Science
Department of Statistical Science

Duke University

Summary of dissertation

 
Title: Estimation in the Presence of Constraints that Prohibit Explicit Data Pooling
 
Advisor: Donald B. Rubin, Dept. of Statistics, Harvard University
 
Summary:
 
In regression models where units can be classified into distinct groups, similar parameters in each group can be estimated by combining data across groups, as in hierarchical Bayesian models. Such explicit data pooling typically increases estimation accuracy relative to separate regressions in each group. Sometimes, however, explicit data pooling is prohibited because of legal or political constraints. For example, under the design of the 2000 U.S. census that includes the Integrated Coverage Measurement Survey (the proposed design of the census until the Supreme Court ruled sampling for apportionment illegal), the Census Bureau avoids pooling data across states when estimating population counts because the law may not allow data from one state to affect population counts in another state. Similar constraints may apply when sampling is used to audit, assess, or compare the performance of several groups: a group may not want the data from other groups to affect its performance estimate, particularly when poor performance is penalized.
 
In my dissertation, I develop techniques for such constrained estimation settings. The general approach is to use information from multiple groups to specify the model in each group, but ultimately to estimate the parameters in each group's model using only that group's data. This may satisfy the constraints because data pooling is used just for model specification and not for parameter estimation, and it may increase estimation accuracy relative to regressing separately in each group because it takes advantage of across-group information. The techniques can be conceptualized as existing on a continuum ordered by how directly each relies on data pooling to make estimates; those techniques that look more like explicit data pooling are typically more accurate yet less likely to be acceptable. Using simulation studies, I show that some of these techniques have great potential to increase estimation accuracy. I also present a procedure that predicts the payoff of each technique from observable characteristics of the data.
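
To make the two-stage idea concrete, here is a minimal sketch in Python of one way such an approach could look: pooled data are used only to decide which predictors enter the model, and each group's coefficients are then estimated from that group's data alone. This is an illustration under stated assumptions, not the specific techniques developed in the dissertation. The simulated data, the t-statistic selection rule, the threshold of 2.0, and the function names select_predictors_pooled and fit_group are all hypothetical.

```python
import numpy as np

def select_predictors_pooled(X_groups, y_groups, t_threshold=2.0):
    """Model specification step (hypothetical rule): pick predictors
    using an OLS fit to the data pooled across all groups."""
    X = np.vstack(X_groups)
    y = np.concatenate(y_groups)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)           # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)      # OLS coefficient covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    keep = np.abs(t_stats) > t_threshold
    keep[0] = True                             # always keep the intercept column
    return np.flatnonzero(keep)

def fit_group(X_g, y_g, selected):
    """Estimation step: fit the chosen model with this group's data only."""
    beta, *_ = np.linalg.lstsq(X_g[:, selected], y_g, rcond=None)
    return beta

# Simulated example (assumed setup): 5 groups, 30 units each, 6 candidate
# predictors of which only the first two matter in every group.
rng = np.random.default_rng(0)
n_groups, n_per_group, n_candidates = 5, 30, 6
true_beta = np.array([1.0, 2.0, -1.5, 0.0, 0.0, 0.0, 0.0])  # intercept + slopes

X_groups, y_groups = [], []
for _ in range(n_groups):
    X_g = np.column_stack(
        [np.ones(n_per_group), rng.normal(size=(n_per_group, n_candidates))]
    )
    # Similar, but not identical, nonzero coefficients in each group.
    jitter = rng.normal(scale=0.2, size=true_beta.size)
    beta_g = true_beta + np.where(true_beta != 0, jitter, 0.0)
    y_g = X_g @ beta_g + rng.normal(scale=1.0, size=n_per_group)
    X_groups.append(X_g)
    y_groups.append(y_g)

selected = select_predictors_pooled(X_groups, y_groups)
estimates = [fit_group(X_g, y_g, selected) for X_g, y_g in zip(X_groups, y_groups)]
```

In this sketch, other groups' data influence a group's estimate only through the choice of predictors, so the sketch sits toward the less direct end of the continuum described above. Whether that indirect influence is acceptable under a given legal or political constraint is a separate question that the code does not address.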
 
Portions of this research have appeared in:

Reiter, J. (2000) "Borrowing Strength When Explicit Data Pooling Is Prohibited." Journal of Official Statistics, 16, pp. 295-319.

Reiter, J. (2001) "Borrowing Strength Without Explicit Data Pooling." In Monographs of Official Statistics: Bayesian Methods With Applications to Science, Policy, and Official Statistics. Edited by E. George and N. Photis. Eurostat, pp. 439-448.