Statistics 101
 Data Analysis and Statistical Inference

Extra problems on study design



1.  Survey, randomized experiment, or observational study?

In each of the following hypothetical scenarios, tell whether the study is a survey, a randomized experiment, or an observational study.  By our definition, an observational study is a causal study in which the treatments are not randomized to the subjects.  Note that the book includes surveys in the term observational studies, but we distinguish between surveys and causal studies.

i)  An economist reviews the employment figures of fast food restaurants to examine the trends in employment. 

ii) A biologist examines fish in a river to determine the proportion that show signs of problems due to pollutants poured in the river upstream.

iii) A study monitors the occurrence of heart disease over a 5 year period in men assigned at random to eat high fiber or low fiber diets.

iv) Annual return rates of three different types of mutual funds (small company growth funds, mid-size company growth funds, and large company growth funds) are compared to see which type yields the highest return.  The average rate of return for each type is estimated from random samples of funds in each type.

v)  A study from hospital records found that women who had low weight gains during pregnancy were more likely to have low birth weight babies than women who had high weight gains during pregnancy.


2.  Keep me single and kid-free!!

In 1976, the advice columnist Ann Landers asked readers of her column to respond to the following question:  "If you had to do it over again, would you have children?" She received over 10,000 responses, 80% from women.   About 70% of the respondents said no.

Problem:

Based on Ann Landers' survey, do you believe that roughly 70% of people in America in 1976 wished they hadn't had children?  Why or why not?

Reference:
Lohr, S.  Sampling: Design and Analysis. Pacific Grove, CA: Duxbury, 2000, p. 19.
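
A small simulation can help when thinking about write-in polls like this one. The sketch below is illustrative only: the function name is hypothetical, and the population share answering "no" (30%) and the two response probabilities (10% for "no" parents, 1% for "yes" parents) are invented assumptions, not estimates for 1976. It shows how the share of "no" answers among people who choose to write in can differ from the share in the whole population when the two groups respond at different rates.

```python
import random

def simulate_write_in_poll(pop_size=200_000, true_no_rate=0.30,
                           p_respond_no=0.10, p_respond_yes=0.01, seed=0):
    """Return (share of 'no' in the population, share of 'no' among respondents).

    All rates are made-up assumptions chosen only to illustrate self-selection.
    """
    rng = random.Random(seed)
    no_total = 0        # people in the population who would answer "no"
    responders = 0      # people who choose to write in
    no_responders = 0   # responders who answer "no"
    for _ in range(pop_size):
        says_no = rng.random() < true_no_rate
        no_total += says_no
        # Assumed: people answering "no" are more motivated to respond.
        p_respond = p_respond_no if says_no else p_respond_yes
        if rng.random() < p_respond:
            responders += 1
            no_responders += says_no
    return no_total / pop_size, no_responders / responders

true_share, poll_share = simulate_write_in_poll()
print(f"population share answering no: {true_share:.2f}")
print(f"write-in poll share answering no: {poll_share:.2f}")
```

Under these assumed rates, the expected share of "no" among respondents is 0.03 / (0.03 + 0.007), or about 0.81, even though only 30% of the simulated population would answer "no". Changing the two response probabilities changes the gap, which is the point of the exercise.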
 

3.  Give me a new computer.

In the June 1994 issue of PC World, a magazine about personal computers (PCs), one of the conclusions from their report on reliability and service support for PCs was that "25% of new PCs have problems."  This also was the top headline in the newspaper USA Today on May 23, 1994.  The report was compiled as follows.  Each monthly issue of PC World from October 1993 to March 1994 included a survey form asking questions about users' hardware troubles.  Readers were asked to complete and mail in their forms.  Those who did so were entered in a drawing to win a new PC.  Over 45,000 people mailed in their forms.  The 25% figure is the percentage of these respondents who indicated their PC had troubles.

Problem:

Based on the report of PC World, do you believe that roughly 25% of PCs have problems?  Why or why not?

Reference:
Lohr, S.  Sampling: Design and Analysis. Pacific Grove, CA: Duxbury, 2000, p. 19.

4.  Potential problems with surveys.

Identify potential biases in these hypothetical surveys.

Problems:

i)    To determine a community's opinions on neighborhood policing, the local government asks local police to knock on households' doors to interview their residents.

ii)   When surveying a large forest for species diversity, a forester examines a random sample of areas that are reasonably close to  the research station.

iii)  When surveying Duke students about proposed changes to the Duke curriculum, representatives of student government set up a table outside of the Bryan Center and ask opinions of those whom they can convince to stop by the table.

iv)  To gauge local viewers' opinions on President Bush's plans for Social Security, a local TV news station does a piece on Social Security and asks people to call in to express whether they agree or disagree with the President's plans.

v)  When trying to estimate the percentage of people in America who might buy a new version of a product, a company randomly calls people from their list of current customers and asks them if they would buy the product.

5.  Identifying study flaws

For each problem below, you are given a method for collecting data and a corresponding conclusion.  If the conclusions are justifiable from the method of data collection,  explain why you think the method of data collection is effective.  If the conclusions are not justifiable from the method of data collection, say why you think the method of data collection is ineffective.   Don't assume that all the study designs below are ineffective; some may be perfectly reasonable.

For the survey designs, assume there is no ambiguity in questions.  For the causal study designs, assume the definition of any treatments and response is specific.   (In other words, don't worry about the method of stating the question, just focus on the data collection.)   Assume sample sizes are sufficiently large.

i)  An emergency room institutes a new screening procedure to identify people suffering from life threatening heart problems so that treatment can be initiated quickly.  In the first year after its initiation, there is a lower death rate due to heart failure compared to the previous year.   Based on these data, hospital administrators conclude the procedure reduces death rates for people with life-threatening heart problems in their emergency room.

ii)   In a study of people's ability to catch errors when proofreading, a document is prepared in which the first part has a high error rate (1 error every 5 lines of text) and the second part a low error rate (1 error every 20 lines of text).   Eighty randomly selected people proofread the entire document.   For each of these eighty people, researchers record the difference in the percentage of errors missed on the first part and the percentage of errors missed on the second part.  After all data were collected, the researchers found that most of these differences were close to zero, indicating that these eighty people caught similar percentages of errors in the first and second parts.  Based on these data,  the researchers conclude people's ability to catch errors when proofreading is not affected by the error rate in the document.

iii)  The Longitudinal Study on Aging (LSOA) surveyed people across the nation aged 70 and over in 1984 and reinterviewed them every two years through 1990.  In each interview, the sampled people were asked if they were in a nursing home during the year.  By the final 1990 LSOA interview, over 30% of those interviewed in 1984 had died.  Out of all the remaining people who completed the 1990 LSOA interview, 20% were in nursing homes.  Based on these data, the researchers estimate that 20% of people over age 70 were in nursing homes in 1990.

iv)   Early studies on Alzheimer's  disease (AD), which sampled patients treated by medical specialists in Neurology, found that most people with AD had high education levels.   However, later studies based on community surveys found that people with high education levels were less likely to have AD  than people with low education levels.  Explain how these studies could show different relationships between education and AD.

Reference:
Tamhane, A. and Dunlop, D.  Statistics and Data Analysis.  Upper Saddle River, NJ:  Prentice Hall, 2000.
 

6.   Clean Experimental Designs

This problem involves a hypothetical scenario.

An experiment to test a new laundry detergent, Sparkle Clean, is being conducted by Consumers Union, a consumer advocate group.  They would like to compare its performance to a laboratory standard detergent that they have used in previous experiments.  Their resources permit them to stain a maximum of 16 pieces of cloth with 2 teaspoons of a common staining compound. They use a well-calibrated scanner to detect the amount of stain left after washing in their brand new, well-functioning washing machine with some detergent.

Several suggestions for the experimental design have been made by the Union's research team.

a) Squeaky Pete's design:  "Since data are already available for the laboratory standard detergent from previous experiments, wash only with Sparkle Clean on all 16 pieces of cloth, and compare results to the previous data on the standard detergent."

b) Dirty Harry's design:  "To save money spent on washes, run one wash with 8 pieces using the standard detergent, then run one wash with 8 pieces using Sparkle Clean."

c) Clean Kristine's design:  "Use both detergents with 8 separate runs per detergent, but to save time, use only a 10 second wash time with very hot water for each run."

d) Larry Laundry's design:  "Rather than run the experiment, use data from the company that makes Sparkle Clean, and compare them with past data from the standard detergent."

e) Sarah Fabric-Softener's (she's French, okay) design:  "To ease bookkeeping, run successively 8 separate runs with one piece each using the standard detergent, then run successively 8 separate runs with one piece each using Sparkle Clean."

Problem:

Comment on potential problems with these designs.
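
For contrast with the proposals above, one common alternative is a completely randomized design in which each piece of cloth gets its own wash and the detergent for each run is chosen by chance. The sketch below is a minimal illustration under stated assumptions, not an answer key: the function name is hypothetical, and it assumes 16 single-piece runs split evenly between the two detergents.

```python
import random

def randomized_run_order(n_runs=16,
                         detergents=("standard", "Sparkle Clean"),
                         seed=2024):
    """Return a list of (run_number, detergent) with detergents in random order.

    Assumes one piece of cloth per run and an equal number of runs
    per detergent; the seed only makes the plan reproducible.
    """
    rng = random.Random(seed)
    runs_per_detergent = n_runs // len(detergents)
    # Build 8 labels for each detergent, then shuffle them across run positions.
    labels = [d for d in detergents for _ in range(runs_per_detergent)]
    rng.shuffle(labels)
    return list(enumerate(labels, start=1))

plan = randomized_run_order()
for run_number, detergent in plan:
    print(f"run {run_number:2d}: {detergent}")
```

Because the run order is shuffled, anything that drifts over the course of the experiment (water temperature, machine wear, scanner calibration) is not systematically tied to one detergent.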

7. More questions on study design

For each proposed design below, discuss whether you think the proposed study will allow you to collect data that addresses the question of interest. If the study is effective, say why you think it is effective. If the study is flawed, say why you think it is flawed. Don't assume that all the studies below are flawed. Some may be perfectly reasonable.  For the causal study designs, assume the definition of any treatments and response is specific. In other words, don't worry about the method of stating the question, just focus on the method of collecting the data.

a) Bristol-Myers, manufacturer of a dental device called Ipana, once claimed that "twice as many dentists use Ipana as any other (similar dental device)...." They based this claim on results of a survey of 10,000 dentists randomly sampled from a list of 66,000 subscribers to two dental magazines. Of the 10,000 sampled dentists, 1,983 replied to the survey.

(This survey was done by Bristol-Myers and contested in a lawsuit against them. If only their executives had taken Stat 103....)

b) In a news story in September 2001, a group of researchers reported that chewers of regular gum are just as likely to quit smoking as chewers of nicotine gum. In the study, the researchers randomly assigned some smokers to chew nicotine gum and other smokers to chew regular gum. Study participants did not know which gum they received.

c) The Chronicle prints a story which claims that the percentage of American students at Duke who know the name of the current head of the United Nations is smaller than the percentage of international students at Duke who know the name of the current head of the United Nations. In the story, the newspaper says that it collected data by "randomly sampling 100 international students at a social event organized by the International House and 100 American students exiting large classes."

d) A business school researcher wants to know what factors affect the survival and success of small businesses. She selects a sample of 150 eating-and-drinking establishments from those listed in the telephone directory Yellow Pages for a large city. From these 150 establishments, she finds that most are family-owned. She concludes that being family-owned is vital to the success of small businesses.

e) A member of Congress wants to know whether her constituents support proposed legislation on health care. Her staff reports that 228 letters have been received on the subject, of which 193 oppose the legislation. Therefore, they conclude that approximately 85% of her constituents are opposed to the legislation.

f) The Guglhupf bakery (one of the best bakeries in Durham) wants to know what fraction of Durham households bakes some or all of their own bread. They hire a consultant to select and interview a random sample of 500 residential addresses in Durham. The interviewers visit the households only during regular working hours on weekdays. They find that 20% of those contacted bake their own bread, so they conclude that about 20% of households in Durham bake their own bread.

g) A news story claims that pregnant mothers can increase their chances of having healthy babies by eating lobsters. That claim is based on a study showing that babies born to lobster-eating mothers have fewer health problems than babies born to mothers who don't eat lobster.

h) Volunteers are recruited through a newspaper article to participate in a study of a new vaccine for hepatitis C, which is transmitted by contact with infected blood and is a particular problem of intravenous drug users. The article specifically requests both intravenous drug users and medical workers to volunteer, assuring them confidentiality and offering them free medical care for the life of the study. Volunteers are then randomly assigned to receive either a placebo or the vaccine, and a year later they are tested to see if they have the disease.
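
Some of the figures quoted in designs (a) and (e) above can be checked with a few lines of arithmetic. The sketch below computes the response rate in (a), the share of opposing letters in (e), and worst-case bounds on a sample proportion when nothing is known about nonrespondents. The function names are hypothetical, and the 50% figure passed to the bounds function is an invented placeholder, since the problem does not report what share of responding dentists used Ipana.

```python
def response_rate(n_responded, n_sampled):
    """Fraction of the sampled units that actually replied."""
    return n_responded / n_sampled

def worst_case_bounds(p_respondents, rate):
    """Bounds on the proportion in the full sample, given only respondents.

    lower: every nonrespondent would have answered "no"
    upper: every nonrespondent would have answered "yes"
    """
    lower = p_respondents * rate
    upper = p_respondents * rate + (1 - rate)
    return lower, upper

# Design (a): 1,983 replies out of 10,000 sampled dentists.
dentist_rate = response_rate(1983, 10_000)

# Design (e): 193 of the 228 letters received oppose the legislation.
letters_opposed = 193 / 228

# Hypothetical: suppose 50% of responding dentists reported using the product.
low, high = worst_case_bounds(0.50, dentist_rate)

print(f"dentist response rate: {dentist_rate:.1%}")
print(f"share of letters opposed: {letters_opposed:.1%}")
print(f"worst-case range for the full sample: {low:.1%} to {high:.1%}")
```

The 85% in (e) is correct as a description of the letters received; whether it describes the constituents is the question the problem asks. The wide worst-case range in (a) shows how little a low response rate pins down without further assumptions about the nonrespondents.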