Statistics 101
 Data Analysis and Statistical Inference

Answers to extra problems on study design
 


1. Survey, randomized experiment, or observational study?

i)  It is a survey. The economist seeks to describe the target population of all fast food stores.

ii)  It is a survey.  The biologist seeks to describe the target population of all fish in the river.

iii) This is a randomized experiment. It is a causal study because eating high/low fiber is a treatment: one can either follow or not follow a high fiber diet. The response is the occurrence of heart disease. Treatments are randomized, i.e., the patients are randomly assigned to the low or high fiber treatment.

iv)  This is a survey.  We seek to describe three types of mutual funds, and use these descriptions to compare them.  This is not a causal study, because there are no treatments.   Each fund's type is a characteristic of the fund, like each person's age is a characteristic of the person.    Fund type, like age, is not potentially manipulable.

The target population in this survey is all mutual funds in the small company growth fund, mid-size company growth fund, and large company growth fund categories. To get an appropriate sampling frame, one can compile a list of all possible mutual funds' names from a source like Morningstar Investment Magazine or from the Mutual Fund section of the Wall Street Journal. To take a random sample, give each fund a number and then randomly pick a set of numbers using Minitab or some other software capable of generating random numbers. The random sample contains the mutual funds whose numbers were selected.
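The numbering-and-drawing step described above can be sketched in Python instead of Minitab. This is only an illustration: the fund names, the frame size of 40, the sample size of 5, and the helper name `draw_simple_random_sample` are all made up for the example.

```python
import random

def draw_simple_random_sample(frame, n, seed=None):
    """Return n units drawn without replacement; every unit is equally likely."""
    rng = random.Random(seed)
    # Give each fund a number 1..N, as described in the text.
    numbered = dict(enumerate(frame, start=1))
    # Randomly pick n distinct numbers.
    chosen_numbers = rng.sample(sorted(numbered), n)
    # The sample is the set of funds whose numbers were picked.
    return [numbered[k] for k in chosen_numbers]

# Hypothetical sampling frame; in practice this list would be compiled
# from a published listing of funds in the three categories.
frame = [f"Fund {i:03d}" for i in range(1, 41)]
sample = draw_simple_random_sample(frame, n=5, seed=101)
print(sample)
```

Fixing the seed makes the draw reproducible, which is useful when the sample has to be documented.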

v)  This is an observational study. The treatments are high weight gain in pregnancy and low weight gain in pregnancy. These are treatments because one could conceptually (if not realistically) manipulate the amount of weight gain. The response is the weight of the newborn baby. The researchers did not (and ethically could not) randomly assign the women to a high or low weight gain program. Since there is no random assignment, but there are treatments, this is an observational study.

To get valid estimates of the causal effect of high/low weight gain on babies' birth weight, we would try to construct a group of high weight gain mothers that looks as similar as possible to the group of low weight gain mothers on all background characteristics. Then, we would compare the average birth weights of the babies in the two groups. If we cannot get adequate balance of the background characteristics in the two groups, we have to throw up our hands and admit that we cannot assess this causal question.

2.  Keep me single and kid-free!!

We don't believe that roughly 70% of people in America in 1976 wished they hadn't had children.  The sample is not a random sample from the entire U.S. population and likely fails to reflect the characteristics of the U.S. population.  Evidence of this includes:

(1)  The percentage of women in the respondents (80%) is very high compared to the percentage of women in the U.S. population (around 50%).  Men are likely to have different opinions than women on this issue, since men do not give birth and are in many families not primarily responsible for raising the children.

(2)   People responded voluntarily.  Perhaps they did so because they are passionate about this issue.  Such people are likely to feel the need to express opinions that run counter to prevailing wisdom, which in this case is that "children are a blessing."   This is a classic example of the problems with voluntary response sampling.

(3)  Only people who read Ann Landers's column could possibly answer the survey.  Ann Landers's readers are not necessarily representative of the U.S. population.  The opinions of people who do not read her column are just as important as those who do.  This is an example of frame coverage bias.
 

3.  Give me a new computer.

We should not rely on the results of the survey.

This is again a case of voluntary response sampling, in which readers respond because they feel strongly about the issue. People with computer problems are more likely to complain about these problems, and the PC World survey provides a perfect outlet to complain. People who don't have any problems are less likely to feel the need to tell people they don't have problems. This potential bias is compounded by the incentive offered by PC World: people who respond are entered into a contest to win a new PC. If you're having problems with your PC, you want a new one and so will respond to the survey!

There is also frame coverage bias.  The study has a target population of all PC owners.  However, the study is based only on the opinions of PC owners who subscribe to PC World.   This may not be the same population as all PC owners.
 

4.  Potential problems with surveys.

i)  When collecting responses, the interviewer should not dramatically affect the responses. Keeping this in mind, local police are not the best people to interview others about local policing. Respondents are unlikely to tell a police officer that they don't like the police!!  Honest opinions about this issue are hard to obtain with this design.

ii)  The forester selected a random sample of areas close to the station.  Areas close to the station are likely to be affected by humans and hence may not be representative of the entire forest.  For example, perhaps fewer animals are willing to live in proximity to humans, so that the forester would underestimate the amount of species diversity.  Although it is a lot more work, the forester should take a random sample from the entire forest.

iii)  This is a mess!!  There is convenience sampling and judgment sampling by the student government, mixed in with voluntary response by potential respondents. People who go to the Bryan Center are likely to be different than people who do not go there. For example, they may be more active in extracurriculars than those who do not go there. Thus, sampling only people who go to the Bryan Center may lead to unrepresentative samples. Further, interviewers sitting at tables may try to convince people who look "friendly" to stop by and complete the questionnaire. Such people may hold similar opinions, and these opinions may differ from those of people who don't look as friendly. Finally, only people who have strong opinions on this issue are likely to stop and fill out the questionnaire, thus leading to further bias.

iv)  This is voluntary response sampling, where people respond to the survey because they have strong opinions on Social Security. Also, the nature of the TV program on Social Security could influence responses. For example, if the program portrays Bush's plan in a very positive light, more respondents are likely to support the plan (and vice-versa).

v)  This design suffers from frame coverage bias, because the company wants to estimate the proportion of potential buyers in the entire American population but is sampling from their current customers.  Given that current customers have already decided to buy an old version of the product, these people may be more likely to buy the new version than those who had never bought the product before.

5.  Identifying study flaws

i).  The implied treatments in this problem are "use the new screening procedure" and "use the old screening procedure".   The response is the number of patients' deaths in the emergency room.

The main problem with this design is that there is no group of patients who received the old screening procedure during the same year in which the new procedure was introduced. Instead, the comparison group is patients from a previous year. We have no assurance that these two groups of people look similar on background characteristics. For example, perhaps last year the patients were sicker than the patients this year (e.g., a bad economy last year caused intense stress which led to more life-ending heart attacks). Or, perhaps some change in the emergency room other than the screening procedure (e.g., new doctors or new medical technology) is reducing death rates. Unless we know that the two treatment groups have similar background characteristics, we cannot be sure that the reduction in death rates is caused mainly by the new procedure.

ii).  The treatments are high and low error rates in passages.   The response is the number of errors recognized by a person when proofreading these passages.

This is a flawed design because the ordering of the passages is confounded with the effects of the different error rates. For example, say readers get tired of proofreading by the time they get to the second passage, so that they are prone to miss more errors when reading the second passage. Since every person reads the high error rate passage first and the low error rate passage second, this tiredness effect would make people's performance with the low error rate look worse than it really is.

To fix this design, two random groups can be created, so that one group reads the high error rate passage and one reads the low error rate passage.  Then, the average scores in the two groups are compared.  Or, one could randomize the order that people read the passages, so that each treatment has an equal chance of going first.
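The second fix, randomizing the order in which each person reads the passages, might look like the following sketch. The reader labels and the function name `randomize_order` are hypothetical.

```python
import random

PASSAGES = ("high error rate", "low error rate")

def randomize_order(participants, seed=None):
    """Give each participant the two passages in an independently random order."""
    rng = random.Random(seed)
    orders = {}
    for person in participants:
        order = list(PASSAGES)
        rng.shuffle(order)  # each passage is equally likely to come first
        orders[person] = tuple(order)
    return orders

participants = [f"reader_{i}" for i in range(1, 9)]  # hypothetical labels
orders = randomize_order(participants, seed=7)
for person, order in orders.items():
    print(person, "->", " then ".join(order))
```

Because each order is decided independently, the number of people starting with each passage may be unequal in a small study. If exact balance is wanted, one could instead shuffle the list of participants and give the first half one order and the second half the other.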

Another potential issue with this design is that results on this particular passage may not be generalizable to other reading materials.  Whether this is a problem would depend on the nature of the passage and the specific causal question of interest.

iii) The target population is all old people.  By using only the people who responded to all four interviews, the sampled population in 1990 no longer represents this target population.  Specifically, the sample does not represent people who would not answer all four interviews!  Such people may have nursing home admission rates that differ from those of people who did respond to all four interviews.  For example, the people who survived all four years are likely to represent the most healthy segment of old people, and this segment does not enter nursing homes as much as the less healthy segment does.   Thus, by using only people who answered all four years, we likely are underestimating the true percentage of  people who enter nursing homes.
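A small numerical illustration of this underestimate follows. Every number in it is hypothetical, and it assumes that, within each health group, people who complete all four interviews enter nursing homes at the same rate as those who do not.

```python
# All numbers are hypothetical and chosen only to illustrate the bias.
groups = {
    "healthier":    {"size": 600, "admit_rate": 0.10, "complete_rate": 0.90},
    "less healthy": {"size": 400, "admit_rate": 0.40, "complete_rate": 0.50},
}

def admission_rate(groups, completers_only):
    """Overall nursing home admission rate, optionally among completers only."""
    admitted = 0.0
    total = 0.0
    for g in groups.values():
        # Count everyone, or only those who answered all four interviews.
        n = g["size"] * (g["complete_rate"] if completers_only else 1.0)
        total += n
        admitted += n * g["admit_rate"]
    return admitted / total

true_rate = admission_rate(groups, completers_only=False)
completer_rate = admission_rate(groups, completers_only=True)
print(f"all people:      {true_rate:.3f}")
print(f"completers only: {completer_rate:.3f}")
```

With these made-up inputs the rate among all people is 0.220, while the rate among completers is about 0.181, because the less healthy group is underrepresented among those who answered all four interviews.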

iv)  These data come from two completely different target populations. Those who went to see the neurologist are probably wealthier and actively sought out the doctor. Such people tend to have more education, which may explain why the studies using neurology data find a strong link between education and AD. Those people in the community survey are likely to have a wider range of wealth and education. There may be an entirely different relationship between education and AD in the community population. Thus, it is not surprising that the studies seem to contradict each other with regard to the link between education and AD.

6.   Clean Experimental Designs

Recall the essential principles of design of experiments:

(i) The units should be randomly assigned to the treatments, so that the groups to be compared look similar in characteristics that may affect the response, apart from the difference in treatment.
(ii)  The study should have realistic conditions and be generalizable to populations other than the study population.
(iii)  When possible, studies should be double blind.
(iv)  There should be no noncompliance, no interference between units, and no order effects.

Now the solutions:

(a)  The conditions used to obtain the data on standard detergent are likely to be different than the conditions used in the current washes.  For example, perhaps the laboratory results were obtained using a machine that is not as efficient as the new machine.  Following Squeeky Pete's design, we would have no clue whether the washes with Sparkle Clean are better (or worse) due to the detergent or due to the difference in machines.   We should strive to compare the brands under identical conditions. Hence this is not a good design.

(b)  This design is almost OK, but it has one problem. By washing all pieces for each detergent together, we only have one washing for each detergent type. Effectively, the response variable is the amount of stain removed during one wash on all eight pieces. Having only one observation per treatment group does not give as much information as running eight separate washes under each detergent. In other words, this design wastes resources.

(c) The experiment is carried out under completely unrealistic conditions. The effects of the detergents may change completely under the normal and moderate conditions of regular washing. Hence we should not accept this design.

(d) This is even worse than the first suggestion.   The detergents may not be compared under identical conditions.   Plus, it would be unwise to use data from the company that makes Sparkle Clean when we want to evaluate Sparkle Clean independently!!

(e) At first sight this may look like a reasonable design. However, the consecutive washes could introduce an order effect into the comparisons of the treatments. For example, perhaps the machine works less efficiently after several washes. This would make Sparkle Clean look less effective, which is unfair since it is the machine that is failing and not Sparkle Clean. Hence this design is also not good!
 

A reasonable design is to line up all 16 pieces of cloth, then randomly assign eight to get Sparkle Clean and eight to get the standard detergent. Then, wash them separately under normal conditions in the order that they were lined up originally. This randomizes the order of applying the detergents, so that any order effect does not systematically favor one detergent.
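The random assignment in this design can be sketched as follows. The function name `assign_detergents` and the seed are hypothetical; the 16 pieces and the eight-and-eight split come from the design described above.

```python
import random

def assign_detergents(n_pieces=16, seed=None):
    """Randomly assign half of the lined-up pieces to each detergent."""
    rng = random.Random(seed)
    half = n_pieces // 2
    labels = ["Sparkle Clean"] * half + ["standard detergent"] * half
    rng.shuffle(labels)  # random assignment, exactly eight pieces per detergent
    # Position i in the line-up receives labels[i - 1]; the pieces are then
    # washed one at a time in line-up order.
    return list(enumerate(labels, start=1))

schedule = assign_detergents(seed=2024)
for position, detergent in schedule:
    print(f"wash {position:2d}: {detergent}")
```

Shuffling a list that already contains eight labels of each kind guarantees equal group sizes, which independent coin flips for each piece would not.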
 

7. More questions on study design

a) There are two potential problems with this survey.  First, the sampling frame is the 66,000 dentists who subscribe to the magazines.  These dentists may not be representative of all dentists. For example, by subscribing to the magazines, they may be more interested in using dental products than dentists who do not subscribe. More importantly, not everyone responded to the survey.  It is likely that only dentists with strong feelings about Ipana responded.  For example, perhaps those who really like Ipana and have some brand loyalty to it were more likely to respond.

b) Assuming compliance with the study design, this appears to be a valid study. Random assignment of two treatments and blinding were used.

c) This survey could give misleading results because the samples may not be representative. For example, the students at the International House social event might be more up to speed with international affairs than those students who do not wish to interact with international students at such parties. Or, the students exiting large classes are more likely to be first year students or sophomores, who may be more or less knowledgeable about international affairs than others.

d) The problem with this study is that only surviving businesses are examined. To determine the differences between those that survive and those that fail, one needs to examine the businesses that fail as well. For example, if family-owned businesses make up the same share of the businesses that fail as of those that survive (say, 80%), then there is no relationship between family ownership and survival.
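The 80% example can be made concrete with a made-up table of counts. All of the counts below are hypothetical and were chosen so that family-owned businesses are 80% of both survivors and failures.

```python
# Hypothetical counts of businesses by ownership type and outcome.
counts = {
    ("family", "survived"): 240,
    ("family", "failed"): 560,
    ("other", "survived"): 60,
    ("other", "failed"): 140,
}

def share_family(counts, outcome):
    """Proportion of businesses with the given outcome that are family owned."""
    family = counts[("family", outcome)]
    other = counts[("other", outcome)]
    return family / (family + other)

def survival_rate(counts, ownership):
    """Proportion of businesses with the given ownership that survived."""
    survived = counts[(ownership, "survived")]
    failed = counts[(ownership, "failed")]
    return survived / (survived + failed)

print("family share among survivors:", share_family(counts, "survived"))
print("family share among failures: ", share_family(counts, "failed"))
print("survival rate, family owned: ", survival_rate(counts, "family"))
print("survival rate, other:        ", survival_rate(counts, "other"))
```

Looking only at survivors shows a large family-owned share, yet in this table both ownership types survive at the same rate, so ownership tells us nothing about survival. The comparison is only possible because the failed businesses are included.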

e) Only people with strong opinions are likely to send a letter to their Congressperson. Hence, these people are not representative of all constituents.

f) By visiting only at home and during the week, the surveyors miss households in which no one is home during those times. These households may have less time to bake bread than households where someone is home, so the estimated percentage of households that bake bread is likely too high.

g) Eating lobster is associated with having high income, and high income is associated with better health outcomes. Hence, eating lobster may not cause better birth outcomes; rather, it may simply be a marker for families with higher income.

h) This study is well-designed for the population of volunteers in the study. Conclusions made from the experiment should be valid for them because of the randomization and blinding. However, the researchers should be wary about extending these conclusions to broader populations. People who are addicts differ in health from non-addicts, and the vaccination may have a different effect for non-addicts.