Statistics 101
Data Analysis and Statistical
Inference
Instructions for lab 3
Lab Objective
To verify the benefits of random assignment and to read about some
genuine causal studies.
Lab Procedures
Unit 1: The benefits of random assignment of treatments in
causal studies
What are the characteristics of youth doing time? The 1987 Survey
of Youth in Custody sampled juveniles and young adults in long-term,
state-operated juvenile institutions. Residents of 206 facilities
at the end of 1987 were interviewed about family background, previous
criminal history, and drug and alcohol use.
Open the data set syc2.jmp
by clicking on the link. The data set contains 28 variables
for 2621 youths. The variables are described in the code book found at
the end of this lab. For example, here are the definitions of five
of the variables:
1) crimtype : most serious crime in current
offense.
2) numarr : number of times arrested
3) agefirst : age at first arrest
4) alcuse : Did the youth drink alcohol at all during the year
before being sent to the institution?
5) everdrug : Did the youth ever use illegal drugs?
The variables have missing data, filled in with 99s and 9s.
Since the purpose of this lab is to see how well random assignment
to treatments works, we'll be naive and treat the 99s and 9s as if
they were real values. This is not good practice;
contact a statistician for help when you encounter missing data in your
research.
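To see why treating the missing-value codes as real values is not good practice, here is a minimal Python sketch. The arrest counts are made up for illustration and are not taken from syc2.jmp; the only thing borrowed from the data set is the convention that 99 marks a missing value.

```python
from statistics import mean, median

# Hypothetical arrest counts for ten youths; 99 is the code for "missing".
# These numbers are invented for illustration, not drawn from syc2.jmp.
numarr = [2, 5, 1, 99, 3, 8, 99, 4, 2, 6]

MISSING = 99

# Naive summary: the 99s are treated as if they were real arrest counts.
naive_mean = mean(numarr)
naive_median = median(numarr)

# Summary after dropping the missing-value codes.
observed = [x for x in numarr if x != MISSING]
clean_mean = mean(observed)
clean_median = median(observed)

print(f"naive: mean={naive_mean}, median={naive_median}")
print(f"clean: mean={clean_mean}, median={clean_median}")
```

In this made-up example the two 99s pull the mean from 3.875 up to 22.9, while the median only moves from 3.5 to 4.5. Keep that in mind when you compare averages and medians in the questions below.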
Questions:
1) Write down the following numbers.
a) Percentage of youths in institutions who committed each
of the six crime types (include missing as a type)
b) Average and Median number of times youths in
institutions were arrested.
c) Average and Median age of youths at first arrest.
d) Percentage of youths in institutions who drank alcohol
during the year before being sent to the institution.
e) Percentage of youths who ever used illegal drugs.
Use Analyze - Distribution, and enter these variable names
into the Y, Columns box. You will get the appropriate
summaries. You can enter all five variables in the Y, Columns box to
get their summaries simultaneously.
2) Can you conclude based on these data that using alcohol
increases the chance that youths will go to institutions? Explain
your answer in three or fewer sentences.
Now let's randomly assign half the youths to one group, and half to
another group. There is an odd number of youths, so one group will
have an extra person. Here's our method of random assignment:
i) create a column with numbers from 1 - 2621; ii) shuffle the
numbers in that column; iii) let one group get the youths numbered from
1 - 1311, and the other group get the youths from 1312-2621. This
general process can be used to do random assignment in other
experiments.
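For readers who want to see the same three steps outside JMP, here is a minimal Python sketch. The function name random_assignment and the seed argument are our own choices for illustration; rows are indexed from 0 here, whereas JMP counts rows from 1.

```python
import random

def random_assignment(n_units, seed=None):
    """Split rows 0..n_units-1 into two groups at random.

    Mirrors the three steps in the text: number the units, shuffle the
    numbers, then cut the shuffled numbers at the halfway point.
    """
    rng = random.Random(seed)
    # Step i: a column with the numbers 1..n_units, one per youth (row).
    row_number = list(range(1, n_units + 1))
    # Step ii: shuffle that column; the youths' other data stay in place.
    rng.shuffle(row_number)
    # Step iii: shuffled numbers 1..cutoff go to group 1, the rest to group 2.
    cutoff = (n_units + 1) // 2  # 1311 when n_units is 2621
    group1 = [row for row, number in enumerate(row_number) if number <= cutoff]
    group2 = [row for row, number in enumerate(row_number) if number > cutoff]
    return group1, group2

# The seed value is arbitrary; it only makes the run repeatable.
group1, group2 = random_assignment(2621, seed=101)
print(len(group1), len(group2))  # prints "1311 1310"
```

Every youth lands in exactly one group, and which group depends only on the shuffled number, not on anything about the youth.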
Step 1: Create a column with row numbers
Create a new column in JMP at the end of the data set by clicking Cols-New
Column. Name the column "Row number". Click on the New
Property button, and select Formula. In the resulting
dialog box, select Numeric-Count. Click on where it says
"from", and enter the number 1. Click on where it says "to", and
add the number 2621. Click on where it says "steps", and add the
number 2621. Click OK. Now, double click on the box
for "Row number", highlight Formula, and click Remove
Property. The result is a column of row numbers.
Step 2: Shuffle the row numbers
Double click on the box for "Row number", click on the New Property
button, and select Formula. Then, hit Edit Formula. In
the resulting dialog box, select Random - Col Shuffle. Click OK.
You now have shuffled the row numbers randomly, without
changing the order of any of the other variables.
Step 3: Select the two groups
Go to Rows - Row Selection - Select Where... Click on "Row
number", then select "less than", then enter 1312 in the blank box.
Hit OK. This highlights all the rows with "Row
number" less than 1312. Next, make sure that no columns are
highlighted, then go to Tables - Subset. Name this subset
"Group 1". Select "Selected Rows" as the option, and click OK.
This creates group 1.
To create group 2, go to Rows - Row Selection - Invert Row
Selection. Now you have highlighted all the youths with row
numbers greater than 1311. Make sure that no columns are highlighted,
then go to Tables - Subset. Name this subset "Group 2".
Select "Selected Rows" as the option, and click OK.
This creates group 2.
Questions:
3) Why was it necessary to shuffle the row numbers without changing
the order of the column variables?
4) Write down the percentages or means of crimtype, numarr, agefirst, alcuse, and everdrug in
Group 1 and in Group 2.
Notice that the percentages and means for these variables are very
similar in the two groups. By assigning the youths to groups at
random, we are able to get close balance on all these variables.
It's amazing!!
5) Compare the percentages or means in the two groups for three other
variables of your choice. Report the percentages or means, and
state whether or not the random assignment produced two groups that are
similar on these variables.
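The balance you see in questions 4 and 5 is not special to this data set. Here is a minimal Python sketch that repeats the idea on simulated data. The two variables are stand-ins generated at random (the age range and the 60% drinking rate are assumptions made up for the example, not values from syc2.jmp), and the sketch shuffles the row indices directly, which is equivalent to shuffling a column of row numbers.

```python
import random
from statistics import mean

rng = random.Random(3)  # arbitrary seed so the run is repeatable

N = 2621
# Simulated stand-ins for two variables; these are NOT the syc2.jmp values.
agefirst = [rng.randint(8, 17) for _ in range(N)]             # age at first arrest
alcuse = [1 if rng.random() < 0.6 else 0 for _ in range(N)]   # 1 = drank alcohol

# Random assignment: shuffle the row indices and split them in half.
rows = list(range(N))
rng.shuffle(rows)
group1, group2 = rows[:1311], rows[1311:]

def group_mean(values, rows_in_group):
    """Mean of a variable over the rows in one group."""
    return mean(values[r] for r in rows_in_group)

for name, values in [("agefirst", agefirst), ("alcuse", alcuse)]:
    m1 = group_mean(values, group1)
    m2 = group_mean(values, group2)
    print(f"{name}: group 1 = {m1:.3f}, group 2 = {m2:.3f}, diff = {m1 - m2:+.3f}")
```

With roughly 1,300 units per group, the group means should differ only by a small amount that reflects chance. Changing the seed changes the exact differences, but not the overall picture.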
Unit 2: Reading journal articles about causal
studies
In the next part of the lab, you'll be asked questions about two
journal articles describing causal studies. The objective of this part
of the lab is to give you some guidelines for what to look for when
reading about study designs in journals and the media. You won't
understand all the statistical methods used in the study; we haven't
learned them yet. By the end of the semester, you will understand
those methods. For now, we focus on the study designs.
You should read the articles and complete the questions before labs.
Type your answers.
Article 1: Is St. John's wort effective for treating
depression?
St. John's wort is an herb that is reputed to elevate moods.
In the early 1990s, anecdotal evidence suggested that St. John's
wort can effectively treat depression. However, the anecdotal
evidence was shaky--like all anecdotal evidence--because it did not
control for aspects of patients' background characteristics.
That is, the evidence was not collected from studies that
compared people who took St. John's wort and similar people who did not.
In 1993, Congress established the National Center for
Complementary and Alternative Medicine (NCCAM) within the National Institutes
of Health (NIH) for the purpose of
supporting clinical trials to evaluate the effectiveness of alternative
medicine. Their first major, multicenter study investigated
the effectiveness of St. John's wort in treating moderately severe
cases of depression. The study cost $6 million to run. In an October
1, 1997, NIH
news release announcing this study, the director of the National
Institute of Mental Health stated:
"This study will give us definitive answers about whether St. John's
wort works for clinical depression. The study will be the first
rigorous clinical trial of the herb that will be large enough and long
enough to fully assess whether it produces a therapeutic effect."
The study and its conclusions are reported by Davidson, J. R. T., et
al. (2002) in the Journal of the American Medical Association,
one of the most prestigious journals in medical science. An
April 9, 2002, NIH news
release summarized the results of the study as follows:
"An extract of the herb St. John's wort was no more
effective for treating major depression of moderate severity than
placebo, according to research published in the April 10 issue of
the Journal of the American Medical Association."
Below is the reference to the article by Davidson et al. (2002),
as well as a link. Click on the link, then click on "pdf of this
article". If you have trouble opening the pdf file, you can click
on "full text." Read the article and answer the questions below.
Click for a direct
link to the article.
Here is the reference for the article.
Davidson, J. R. T. et al. (2002). "Effect of Hypericum
perforatum (St. John's wort) in major depressive disorder".
Journal of the American Medical Association, vol. 287, no. 14.
FOR ALL QUESTIONS, WRITE NO MORE THAN THREE SENTENCES. TAs
WILL NOT READ MORE THAN THE FIRST THREE SENTENCES WHEN GRADING.
Questions:
The article uses the technical names "sertraline" for the drug Zoloft
(which is manufactured by Pfizer and is a cousin of Prozac) and
"hypericum perforatum" for St. John's wort. We will replace
these by the popular names in discussing the results of the study.
1. a) What are the treatments? What are the dosages for
the treatments?
b) How long did the study last?
c) What are the main outcome measures?
2. Give three examples of people who were excluded from the study.
Why do you think the authors excluded these people from the study?
3. Write an explanation of how the subjects were assigned to
treatments for someone who hasn't read the article.
4. Based on Table 1, are the three groups reasonably
well-balanced on background characteristics before the study began?
If not, which variables are not balanced?
5. The authors write at great length to convince us that the
study is double-blind. Why is double-blinding important for this
study?
Article 2: What are the effects on employment of increasing
the minimum wage?
Classical economic theory predicts that increasing wages decreases
employment. This is one of the main arguments against raising the
minimum wage. The theory is informative, but it should not be
trusted in isolation. We need evidence from data to see whether
the theory is correct.
Card and Krueger (1994) assessed the effects of raising the minimum
wage by examining wages and employment practices in fast food
restaurants.
Here is a direct
link to their article. It is a pdf file, so you'll need Adobe
Acrobat to read it.
The reference for the article is:
Card, D. and Krueger, A. B. (1994). "Minimum wages and
employment: A case study of the fast-food industry in New Jersey and
Pennsylvania". The American Economic Review, 84.
pp. 772-793.
FOR ALL QUESTIONS, WRITE NO MORE THAN THREE SENTENCES. TAs WILL
NOT READ MORE THAN THE FIRST THREE SENTENCES WHEN GRADING.
Questions:
1. a) What are the treatments?
b) What are the units of study? Why do the authors
use these units for study?
c) What are the main outcome measures?
2. Is this an observational study or a randomized experiment?
Justify your answer in one sentence.
3. Based on Table 2, were the two groups balanced on background
characteristics before the minimum wage took effect? If not, which
variables are not balanced?
4. The authors worked hard to obtain information from restaurants
that did not respond to the first wave of the survey. Why might
ignoring those missing restaurants be an unwise decision when estimating
the effect of the minimum wage on employment?
5. In all observational studies, there could be other factors
that affect the treatment groups' responses and thereby explain apparent
causal effects. Did Card and Krueger examine any alternative
hypotheses? If so, describe their analysis of any one of these
alternatives.
Code book with variable names
age : age of resident (99 = missing)
race : 1=white, 2=black, 3=Asian/Pacific Islander, 4=American
Indian, Aleut, Eskimo, 5 = other, 9 = missing
ethnicity : 1=Hispanic, 2=not Hispanic, 9=missing
educ: highest grade attained before sent to correctional
institution: 00 = never attended school, 01 - 12 = highest grade
attended, 13 = General Equivalency Diploma, 14 = other, 99 = missing
sex: 1=male, 2=female, 9=missing
livewith : Who did you live with most of the time you were
growing up? 1 = mother only, 2 = father only, 3 = both mother and
father, 4 = grandparents, 5 = other relatives, 6=friends, 7=foster
home, 8= agency or institution, 9 = someone else, 99 = missing
famtime : Has anyone in your family, such as your mother,
father, brother, sister, ever served time in jail or prison? 1 =
yes, 2 = no, 7 = don't know, 9 = missing.
crimtype : most serious crime in current offense:
1 = violent (murder, rape, robbery, assault)
2 = property (burglary, larceny, arson, fraud, motor
vehicle theft)
3 = drug (drug possession or trafficking)
4 = public order (weapons violation, perjury, failure to
appear in court)
5 = juvenile status offense (truancy, running away,
incorrigible behavior)
9 = missing
everviol : Ever put on probation or sent to a correctional
institution for violent offense? 1 = yes, 0 = no.
numarr: number of times arrested (99 = missing)
probtn : number of times on probation (99 = missing)
corrinst : number of times previously committed to correctional
institution (99 = missing)
evertime : Prior to being sent here, did you ever serve time in a
correctional institution? (1 = yes, 2 = no, 9 = missing)
prviol : 1 = previously arrested for violent offense.
prprop : 1= previously arrested for property offense.
prdrug : 1 = previously arrested for drug offense
prpub: 1 = previously arrested for public-order offense.
prjuv : 1 = previously arrested for juvenile status offense.
agefirst : age first arrested (99 = missing)
useweapon : Did you use a weapon... for this incident? 1
= yes, 2 = no, 9 = missing
alcuse : Did you drink alcohol at all during the year before
being sent here at this time? (1 = yes, 2 = no, didn't drink the
year before, 3 = no, don't drink at all, 9 = missing)
everdrug : Ever used illegal drugs? 0 = no, 1 = yes, 9 =
missing.
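For this lab you are asked to leave the missing-value codes in place. Outside the lab, one common first step is to recode them so they cannot be mistaken for real values. Here is a minimal Python sketch, assuming each youth's data is held as a dictionary. The codes are copied from the code book above for the five variables used in Unit 1; the function name recode_missing and the example record are made up for illustration.

```python
# Missing-value codes copied from the code book for five of the variables.
MISSING_CODES = {
    "crimtype": 9,
    "numarr": 99,
    "agefirst": 99,
    "alcuse": 9,
    "everdrug": 9,
}

def recode_missing(record):
    """Return a copy of one youth's record with missing codes set to None."""
    cleaned = {}
    for name, value in record.items():
        if MISSING_CODES.get(name) == value:
            cleaned[name] = None  # flagged as missing rather than kept as 9 or 99
        else:
            cleaned[name] = value
    return cleaned

# A made-up record for illustration; not a row from syc2.jmp.
example = {"crimtype": 2, "numarr": 99, "agefirst": 14, "alcuse": 9, "everdrug": 1}
print(recode_missing(example))
```

Recoding only flags the missing values; deciding what to do about them is a separate question, and a statistician can help with that.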