Statistics 101
Data Analysis and Statistical Inference
 

Instructions for lab 3


Lab Objective

To verify the benefits of random assignment and to read about some genuine causal studies.

Lab Procedures

Unit 1: The benefits of random assignment of treatments in causal studies 

What are the characteristics of youth doing time?  The 1987 Survey of Youth in Custody sampled juveniles and young adults in long-term, state-operated juvenile institutions.  Residents of 206 facilities at the end of 1987 were interviewed about family background, previous criminal history, and drug and alcohol use.

Open the data set syc2.jmp by clicking on the link.  The data set contains 28 variables for 2621 youths. The variables are described in the code book found at the end of this lab.  For example, here are the definitions of five of the variables:

1) crimtype : most serious crime in current offense
2) numarr : number of times arrested
3) agefirst : age at first arrest
4) alcuse : Did the youth drink alcohol at all during the year before being sent to the institution?
5) everdrug : Did the youth ever use illegal drugs?

The variables have missing data, filled in with 99s and 9s.  Since the purpose of this lab is to see how well random assignment to treatments works, we'll be stupid and treat the 99s and 9s as if they are real values.  Again, this is not preferred practice; contact a statistician for help when you encounter missing data in your research.
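
To see why treating the missing codes as real values is not preferred practice, here is a minimal Python sketch.  It is not part of the JMP workflow, and the arrest counts are made up for illustration; they are not taken from syc2.jmp.  It compares the mean when the 99s are treated as real values with the mean of the observed values only.

```python
from statistics import mean

# Hypothetical arrest counts; 99 is the missing-data code, as in the code book.
numarr = [2, 5, 1, 99, 3, 99, 4, 6]

MISSING = 99

# Naive mean: treats each 99 as if that youth had been arrested 99 times.
naive_mean = mean(numarr)

# Mean over the observed values only (one simple option, not a general fix).
observed = [x for x in numarr if x != MISSING]
observed_mean = mean(observed)

print(f"naive mean:    {naive_mean:.2f}")
print(f"observed mean: {observed_mean:.2f}")
```

Dropping the missing values is only one simple option and can itself be misleading, which is why the advice to consult a statistician still applies.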

Questions:

1)  Write down the following numbers.
   a) Percentage of youths in institutions who committed each of the six crime types (include missing as a type)
   b) Average and Median number of times youths in institutions were arrested.
   c) Average and Median age of youths at first arrest.
   d) Percentage of youths in institutions who drank alcohol during the year before being sent to the institution.
   e) Percentage of youths who ever used illegal drugs.

Use Analyze - Distribution, and enter these variable names into the Y-columns box.  You will get the appropriate summaries.  You can enter all the variables in the Y-column to get the summaries for all five variables simultaneously.
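
For readers who want to see what these summaries compute, here is a minimal Python sketch.  It is only an illustration of the arithmetic, not a replacement for the JMP steps; the values are made up, and the helper name percentages is hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical values for illustration only; the real data are in syc2.jmp.
crimtype = [1, 2, 2, 3, 1, 9, 2, 5, 4, 2]
numarr = [2, 5, 1, 99, 3, 7, 4, 6, 2, 10]

def percentages(values):
    """Return {category: percent of rows} for a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {category: 100 * count / n for category, count in sorted(counts.items())}

# Categorical variable: percentage of rows in each category (9 = missing).
crime_pct = percentages(crimtype)
print(crime_pct)

# Numeric variable: mean and median (the 99 is treated as a real value here).
print("mean arrests:  ", mean(numarr))
print("median arrests:", median(numarr))
```

Notice that the single 99 pulls the mean well above the median in this toy example.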

2)  Can you conclude based on these data that using alcohol increases the chance that youths will go to institutions?  Explain your answer in three or fewer sentences.


Now let's randomly assign half the youths to one group, and half to another group.  There is an odd number of youths, so one group will have an extra person.  Here's our method of random assignment:  i) create a column with numbers from 1 - 2621; ii) shuffle the numbers in that column; iii) let one group get the youths numbered from 1 - 1311, and the other group get the youths from 1312-2621.  This general process can be used to do random assignment in other experiments.

Step 1:  Create a column with row numbers

Create a new column in JMP at the end of the data set by clicking Cols - New Column.  Name the column "Row number".  Click on the New Property button, and select Formula.  In the resulting dialog box, select Numeric - Count.  Click on where it says "from", and enter the number 1.  Click on where it says "to", and enter the number 2621.  Click on where it says "steps", and enter the number 2621.  Click OK.  Now, double click on the box for "Row number", highlight Formula, and click Remove Property.  The result is a column of row numbers.

Step 2.  Shuffle the row numbers.

Double click on the box for "Row number", click on the New Property button, and select Formula. Then, hit Edit Formula. In the resulting dialog box, select Random - Col Shuffle.  Click OK.  You now have shuffled the row numbers randomly, without changing the order of any of the other variables.

Step 3:  Select the two groups

Go to Rows - Row Selection - Select Where...  Click on "Row number", then select "less than", then enter 1312 in the blank box.  Hit OK.  This highlights all the rows with "Row number" less than 1312.  Next, make sure that no columns are highlighted, then go to Tables - Subset.  Name this subset "Group 1".  Select "Selected Rows" as the option, and click OK.  This creates group 1.

To create group 2, go to Rows - Row Selection - Invert Row Selection.  Now you have highlighted all the youths with row numbers greater than 1311. Make sure that no columns are highlighted, then go to Tables - Subset.  Name this subset "Group 2".  Select "Selected Rows" as the option, and click OK.  This creates group 2.    
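
The three steps above can also be sketched outside JMP.  The following minimal Python sketch mirrors the same logic; the variable names and the seed are assumptions made for illustration, and positions in the list stand in for the youths.

```python
import random

N_YOUTHS = 2621      # number of rows in the data set
CUTOFF = 1312        # shuffled numbers below this go to Group 1

rng = random.Random(101)  # arbitrary seed, chosen only so the sketch is repeatable

# Step 1: a column of row numbers 1..2621.
row_number = list(range(1, N_YOUTHS + 1))

# Step 2: shuffle that column only; position i still refers to youth i.
rng.shuffle(row_number)

# Step 3: split the youths according to their shuffled number.
group1 = [i for i, r in enumerate(row_number) if r < CUTOFF]
group2 = [i for i, r in enumerate(row_number) if r >= CUTOFF]

print(len(group1), len(group2))  # 1311 1310
```

Every youth lands in exactly one group, and the group sizes are fixed at 1311 and 1310 no matter how the shuffle comes out.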

Questions:

3) Why was it necessary to shuffle the row numbers without changing the order of the column variables?

4) Write down the percentages or means of crimtype, numarr, agefirst, alcuse, and everdrug in Group 1 and in Group 2.

Notice that the percentages and means for these variables are very similar in the two groups.  By assigning the youths to groups at random, we are able to get close balance on all these variables.  It's amazing!!
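
The same balancing effect can be seen in a small simulation.  This is a minimal Python sketch using a synthetic stand-in for agefirst (made-up ages, not the syc2 values) and an arbitrary seed; it randomly splits 2621 values into groups of 1311 and 1310 and compares the group means.

```python
import random
from statistics import mean

rng = random.Random(2621)  # arbitrary seed for repeatability

# Synthetic stand-in for agefirst: made-up ages between 8 and 17.
n = 2621
agefirst = [rng.randint(8, 17) for _ in range(n)]

# Random assignment: shuffle the row positions, then split 1311 / 1310.
positions = list(range(n))
rng.shuffle(positions)
group1 = [agefirst[i] for i in positions[:1311]]
group2 = [agefirst[i] for i in positions[1311:]]

# With groups this large, the two means should differ only slightly.
diff = mean(group1) - mean(group2)
print(f"Group 1 mean: {mean(group1):.2f}")
print(f"Group 2 mean: {mean(group2):.2f}")
print(f"difference:   {diff:.2f}")
```

Changing the seed gives a different split, but with groups this large the difference in means should stay small relative to the spread of the ages.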

5) Compare the percentages or means in the two groups for three other variables of your choice.  Report the percentages or means, and state whether or not the random assignment produced two groups that are similar on these variables.



Unit 2:  Reading journal articles about causal studies 

In the next part of the lab, you'll be asked questions about two journal articles describing causal studies. The objective of this part of the lab is to give you some guidelines for what to look for when reading about study designs in journals and the media.  You won't understand all the statistical methods used in the study; we haven't learned them yet.  By the end of the semester, you will understand those methods.  For now, we focus on the study designs.

You should read the articles and complete the questions before lab.  Type your answers.


Article 1:  Is St. John's wort effective for treating depression?

St. John's wort is an herb that is reputed to elevate moods.  In the early 1990s, anecdotal evidence suggested that St. John's wort can effectively treat depression.  However, the anecdotal evidence was shaky--like all anecdotal evidence--because it did not control for aspects of patients' background characteristics.  That is, the evidence was not collected from studies that compared people who took St. John's wort and similar people who did not.

In 1993, Congress established the National Center for Complementary and Alternative Medicine (NCCAM) within the National Institutes of Health (NIH) for the purpose of supporting clinical trials to evaluate the effectiveness of alternative medicine.  Its first major, multicenter study investigated the effectiveness of St. John's wort in treating moderately severe cases of depression. The study cost $6 million to run. In an October 1, 1997, NIH news release announcing this study, the director of the National Institute of Mental Health stated:


"This study will give us definitive answers about whether St. John's wort works for clinical depression. The study will be the first rigorous clinical trial of the herb that will be large enough and long enough to fully assess whether it produces a therapeutic effect."

The study and its conclusions are reported by Davidson, J. R.T., et al. (2002) in the Journal of the American Medical Association, one of the most prestigious journals in medical science.  An April 9, 2002, NIH news release summarized the results of the study as follows:

"An extract of the herb St. John's wort was no more effective for treating major depression of moderate severity than placebo, according to research published in the April 10 issue of the Journal of the American Medical Association."

Below is the reference to the article by Davidson et al. (2002), as well as a link.  Click on the link, then click on "pdf of this article".  If you have trouble opening the pdf file, you can click on "full text."  Read the article and answer the questions below.

Click for a direct link to the article.

Here is the reference for the article.

Davidson, J. R. T. et al. (2002).  "Effect of Hypericum perforatum (St. John's wort) in major depressive disorder".  Journal of the American Medical Association, vol 287, no 14.

FOR ALL QUESTIONS, WRITE NO MORE THAN THREE SENTENCES.  TAs WILL NOT READ MORE THAN THE FIRST THREE SENTENCES WHEN GRADING.

Questions:

The article uses the technical names "sertraline" for the drug Zoloft (which is manufactured by Pfizer and is a cousin of Prozac) and "Hypericum perforatum" for St. John's wort.  We will replace these with the popular names in discussing the results of the study.

1.  a) What are the treatments?  What are the dosages for the treatments?
     b) How long did the study last?
     c) What are the main outcome measures?

2.  Give three examples of people who are excluded from the study.  Why do you think the authors excluded these people from the study?

3.  Write an explanation of how the subjects were assigned to treatments for someone who hasn't read the article.

4.  Based on Table 1, are the three groups reasonably well-balanced on background characteristics before the study began?  If not, which variables are not balanced?

5.  The authors write at great length to convince us that the study is double-blind.  Why is double-blinding important for this study?


Article 2:  What are the effects on employment of increasing the minimum wage?

Classical economic theory predicts that increasing wages decreases employment.  This is one of the main arguments against raising the minimum wage.   The theory is informative, but it should not be trusted in isolation.  We need evidence from data to see whether the theory is correct.

Card and Krueger (1994) assessed the effects of raising the minimum wage by examining wages and employment practices in fast food restaurants.

Here is a direct link to their article.  It is a pdf file, so you'll need Adobe Acrobat to read it.

The reference for the article is:

Card, D. and Krueger, A. B. (1994).  "Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania".  The American Economic Review,  84.  pp. 772-793.

FOR ALL QUESTIONS, WRITE NO MORE THAN THREE SENTENCES.  TAs WILL NOT READ MORE THAN THE FIRST THREE SENTENCES WHEN GRADING.

Questions:

1. a) What are the treatments?
    b) What are the units of study?  Why do the authors use these units for study?
    c) What are the main outcome measures?

2.  Is this an observational study or a randomized experiment?  Justify your answer in one sentence.

3.  Based on Table 2, were the two groups balanced on background characteristics before the minimum wage took effect?  If not, which variables are not balanced?

4.  The authors worked hard to obtain information from restaurants that did not respond to the first wave of the survey.  Why might ignoring those missing restaurants be an unwise decision when estimating the effect of the minimum wage on employment?

5.  In all observational studies, there could be other factors that affect the treatment groups' responses and thereby explain apparent causal effects.  Did Card and Krueger examine any alternative hypotheses?   If so, describe their analysis of any one of these alternatives.



Code book with variable names

age : age  of resident (99 = missing)

race : 1=white, 2=black, 3=Asian/Pacific Islander, 4=American Indian, Aleut, Eskimo, 5 = other, 9 = missing

ethnicity : 1=Hispanic, 2=not Hispanic, 9=missing

educ: highest grade attained before sent to correctional institution:  00 = never attended school, 01 - 12 = highest grade attended, 13 = General Equivalency Diploma, 14 = other, 99 = missing

sex:  1=male, 2=female, 9=missing

livewith : Who did you live with most of the time you were growing up?  1 = mother only, 2 = father only, 3 = both mother and father, 4 = grandparents, 5 = other relatives, 6=friends, 7=foster home, 8= agency or institution, 9 = someone else, 99 = missing

famtime : Has anyone in your family, such as your mother, father, brother, sister, ever served time in jail or prison?  1 = yes, 2 = no, 7 = don't know, 9 = missing.

crimtype : most serious crime in current offense:  
                       1 = violent (murder, rape, robbery, assault)
                       2 = property (burglary, larceny, arson, fraud, motor vehicle theft)
                       3 = drug (drug possession or trafficking)
                       4 = public order (weapons violation, perjury, failure to appear in court)
                       5 = juvenile status offense (truancy, running away, incorrigible behavior)
                       9 = missing

everviol :  Ever put on probation or sent to a correctional institution for violent offense?  1 = yes, 0 = no.

numarr:  number of times arrested (99 = missing)

probtn : number of times on probation (99 = missing)

corrinst : number of times previously committed to correctional institution (99 = missing)

evertime : Prior to being sent here, did you ever serve time in a correctional institution?  (1 = yes, 2 = no, 9 = missing)

prviol :  1 = previously arrested for violent offense.

prprop : 1= previously arrested for property offense.

prdrug : 1 = previously arrested for drug offense

prpub: 1 = previously arrested for public-order offense.

prjuv : 1 = previously arrested for juvenile status offense.

agefirst  : age first arrested (99 = missing)

useweapon : Did you use a weapon... for this incident?  1 = yes, 2 = no, 9 = missing

alcuse : Did you drink alcohol at all during the year before being sent here at this time?  ( 1 = yes, 2 = no, didn't drink the year before,  3 = no, don't drink at all, 9 = missing)

everdrug : Ever used illegal drugs?  0 = no, 1 = yes, 9 = missing.