Statistics 101
Data Analysis and Statistical Inference
 

Instructions for lab 7


Lab Objective

The purpose of the lab is to use JMP to analyze qualitative (categorical) data.

Lab Procedures


What are the characteristics of youth doing time?   The 1987 Survey of Youth in Custody sampled juveniles and young adults in long-term, state-operated juvenile institutions.  Residents of facilities at the end of 1987 were interviewed about family background, previous criminal history, and drug and alcohol use.

Open the data set syc.jmp by clicking on the link. The data set consists of 28 variables for 2621 youths. The youths were sampled from 206 facilities using a complex sampling design involving stratification and clustering (see Chapters 19 and 20 in FPP for more on these topics). For simplicity, we'll assume the youths are a simple random sample.

The variables are described in the code book found at the end of this lab.  We'll look at relationships among the following variables:

1) crimtype : most serious crime in current offense
2) numarr : number of times arrested
3) agefirst : age at first arrest
4) alcuse : Did the youth drink alcohol at all during the year before being sent to the institution?
5) everdrug : Did the youth ever use illegal drugs?

These variables have missing data (coded 9 or 99; see the code book). We'll exclude youths with missing data from the analyses. Again, this is not preferred practice; contact a statistician for help when you encounter missing data in your research.
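To make "exclude youths with missing data" concrete, here is a minimal Python sketch of listwise deletion. It assumes each youth is stored as a dictionary keyed by the code-book variable names, and it uses the missing-value codes from the code book (9 or 99). The records are invented toy values, not rows from syc.jmp, and the helper name `drop_missing` is hypothetical; in the lab itself, JMP handles this for you.

```python
# Missing-value codes taken from the code book at the end of this lab.
MISSING_CODES = {
    "crimtype": 9,
    "numarr": 99,
    "agefirst": 99,
    "alcuse": 9,
    "everdrug": 9,
}


def drop_missing(records, variables):
    """Keep only records with no missing code in any of the given variables.

    This is listwise deletion: a youth is dropped if ANY listed variable
    is missing. (Hypothetical helper, for illustration only.)
    """
    kept = []
    for rec in records:
        if all(rec[var] != MISSING_CODES[var] for var in variables):
            kept.append(rec)
    return kept


# Toy records (invented values, NOT from the survey).
youths = [
    {"crimtype": 1, "alcuse": 1},
    {"crimtype": 9, "alcuse": 2},   # crime type missing
    {"crimtype": 3, "alcuse": 9},   # alcohol use missing
    {"crimtype": 2, "alcuse": 3},
]

complete = drop_missing(youths, ["crimtype", "alcuse"])
print(len(complete))  # 2
```

Passing only the variables used in a given analysis drops a youth only when one of those variables is missing; passing all five variables would drop every youth missing any of them, which can shrink the sample considerably.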

Questions:


1.  For each of these five variables, state whether you would treat it as a quantitative (i.e., continuous) variable or a qualitative (i.e., categorical) variable.  On your report, write the variable name, then either "quantitative" or "qualitative".  This is worth one point per variable.

Notice that all the variables are currently coded as continuous variables in JMP. We need to recode the qualitative variables so that JMP treats them correctly. This is easy: double-click the name at the top of the column, then change its Modeling Type to Nominal. Go ahead and change the qualitative variables from Question 1 to nominal.
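Making a column nominal tells JMP to treat its numbers as labels rather than as amounts. As a rough analogue outside JMP, the Python sketch below replaces the numeric codes of crimtype and alcuse with the labels from the code book, so the codes can no longer be averaged or otherwise treated as quantities. The names `CRIMTYPE_LABELS`, `ALCUSE_LABELS`, and `decode` are hypothetical; this is an illustration, not part of the JMP procedure.

```python
# Code-to-label maps copied from the code book at the end of this lab.
CRIMTYPE_LABELS = {
    1: "violent",
    2: "property",
    3: "drug",
    4: "public order",
    5: "juvenile status offense",
}
ALCUSE_LABELS = {
    1: "yes",
    2: "no, didn't drink the year before",
    3: "no, don't drink at all",
}


def decode(code, labels):
    """Return the category label for a numeric code (None if missing or unknown)."""
    return labels.get(code)


print(decode(2, CRIMTYPE_LABELS))  # property
print(decode(3, ALCUSE_LABELS))    # no, don't drink at all
print(decode(9, CRIMTYPE_LABELS))  # None (9 = missing)
```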

2) Examine the percentages of youths in each crime type for each of the three alcohol categories. Answer the questions below after carrying out the following JMP procedure.

To examine the relationship between "crimtype" and "alcuse", use Analyze - Fit Y by X, then enter "crimtype" as the Y variable and "alcuse" as the X variable. After you click OK, you get a contingency table with lots of entries; its rows correspond to the categories of "alcuse" and its columns to the categories of "crimtype". The top number in each box is the number of youths in that box. The second number in each box is the percentage of youths in that box out of all youths in the table. The third number in each box is the percentage of youths in that box out of all youths in that column (e.g., the third number in the upper left box is the percentage of youths who used alcohol, given that they committed a violent crime). The fourth number in each box is the percentage of youths in that box out of all youths in that row (e.g., the fourth number in the upper left box is the percentage of youths who committed a violent crime, given that they used alcohol).

   a) Out of all the youths who had a drug offense as their worst offense, what percentage never drank alcohol?
   b) Out of all the youths who drank alcohol in the past year, what percentage had a property offense as their worst offense?
   c) Out of all the youths in the table, what percentage committed a public-order offense as their worst offense?
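The four numbers in each box of the contingency table in Question 2 follow directly from the counts. The Python sketch below computes them for a tiny invented data set (not the survey data), with alcuse as the row variable and crimtype as the column variable, as in the upper-left-box examples in Question 2. The function name `cell_percentages` is hypothetical; JMP does this arithmetic for you.

```python
from collections import Counter


def cell_percentages(pairs):
    """Given (row, col) category pairs, return {(row, col): (count, total%, col%, row%)}."""
    cell = Counter(pairs)                     # count in each box
    row_tot = Counter(r for r, _ in pairs)    # row totals
    col_tot = Counter(c for _, c in pairs)    # column totals
    n = len(pairs)                            # all youths in the table
    table = {}
    for (r, c), k in cell.items():
        table[(r, c)] = (
            k,
            100 * k / n,            # % of all youths in the table
            100 * k / col_tot[c],   # % of the youths in that column
            100 * k / row_tot[r],   # % of the youths in that row
        )
    return table


# Invented (alcuse, crimtype) pairs, NOT survey data.
# alcuse: 1 = drank, 2 = didn't drink the year before; crimtype: 1 = violent, 2 = property.
pairs = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (2, 2), (1, 2), (1, 1)]
table = cell_percentages(pairs)
print(table[(1, 1)])  # (3, 37.5, 75.0, 60.0)
```

Reading the upper left box: 3 of the 8 youths (37.5%) drank and committed a violent crime; 75% of the violent offenders drank (column percentage); and 60% of the drinkers committed a violent crime (row percentage).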

3a) For each crime type, determine the percentage of youths who drank alcohol. Now compare these percentages across crime types. In your opinion, are they very different, a little different, or exactly the same? Choose one, and justify your answer by comparing the largest value to the smallest.

3b)  Here's a claim:  "When there is no relationship between the row variable and the column variable, the column percentages (e.g., the percentages in part 3a) should be very similar within any row in the table.  That is, the column percentages in row 1 should be similar;  the column percentages in row 2 should be similar; etc."  Do you think this claim is true or false?  Explain your answer.  Hint: Think about how you would interpret the relationship between the variables if the column percentages are all equal for some row.
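Part 3a asks for the percentage of drinkers (alcuse = 1) within each crime type, which is the set of column percentages in the alcuse = 1 row. The Python sketch below shows that calculation and the largest-versus-smallest comparison on invented records (not survey data); the function name `percent_within` is hypothetical.

```python
from collections import defaultdict


def percent_within(pairs, target):
    """For (group, value) pairs, return {group: % of that group whose value == target}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, value in pairs:
        totals[group] += 1
        if value == target:
            hits[group] += 1
    return {g: 100 * hits[g] / totals[g] for g in totals}


# Invented (crimtype, alcuse) pairs, NOT survey data.
pairs = [(1, 1), (1, 1), (1, 2), (1, 3),
         (2, 1), (2, 1), (2, 1), (2, 2),
         (3, 1), (3, 3)]
pct = percent_within(pairs, target=1)        # % who drank, per crime type
print(pct)                                    # {1: 50.0, 2: 75.0, 3: 50.0}
print(max(pct.values()) - min(pct.values()))  # 25.0
```

The largest and smallest values can be compared as a difference in percentage points (25 here) or as a ratio (75 / 50 = 1.5); whether that counts as "very different" is still a judgment call. Applying the same function to (crimtype, everdrug) pairs with `target=1` gives the percentages asked for in 4a.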

4a) For each crime type, determine the percentage of youths who used drugs. Are the percentages very different, a little different, or exactly the same?

4b) Based on these percentages, do you think there is a strong relationship between drug usage and crime type for these youths? Justify your answer with a sentence or two based on your answer to 4a.



Code book with variable names

age : age of resident (99 = missing)

race : 1=white, 2=black, 3=Asian/Pacific Islander, 4=American Indian, Aleut, Eskimo, 5 = other, 9 = missing

ethnicity : 1=Hispanic, 2=not Hispanic, 9=missing

educ: highest grade attained before sent to correctional institution:  00 = never attended school, 01 - 12 = highest grade attended, 13 = General Equivalency Diploma, 14 = other, 99 = missing

sex:  1=male, 2=female, 9=missing

livewith : Who did you live with most of the time you were growing up?  1 = mother only, 2 = father only, 3 = both mother and father, 4 = grandparents, 5 = other relatives, 6=friends, 7=foster home, 8= agency or institution, 9 = someone else, 99 = missing

famtime : Has anyone in your family, such as your mother, father, brother, sister, ever served time in jail or prison?  1 = yes, 2 = no, 7 = don't know, 9 = missing.

crimtype : most serious crime in current offense:  
                       1 = violent (murder, rape, robbery, assault)
                       2 = property (burglary, larceny, arson, fraud, motor vehicle theft)
                       3 = drug (drug possession or trafficking)
                       4 = public order (weapons violation, perjury, failure to appear in court)
                       5 = juvenile status offense (truancy, running away, incorrigible behavior)
                       9 = missing

everviol :  Ever put on probation or sent to a correctional institution for violent offense?  1 = yes, 0 = no.

numarr:  number of times arrested (99 = missing)

probtn : number of times on probation (99 = missing)

corrinst : number of times previously committed to correctional institution (99 = missing)

evertime : Prior to being sent here, did you ever serve time in a correctional institution? (1 = yes, 2 = no, 9 = missing)

prviol :  1 = previously arrested for violent offense.

prprop : 1= previously arrested for property offense.

prdrug : 1 = previously arrested for drug offense

prpub: 1 = previously arrested for public-order offense.

prjuv : 1 = previously arrested for juvenile status offense.

agefirst  : age first arrested (99 = missing)

useweapon : Did you use a weapon... for this incident?  1 = yes, 2 = no, 9 = missing

alcuse : Did you drink alcohol at all during the year before being sent here at this time? (1 = yes; 2 = no, didn't drink the year before; 3 = no, don't drink at all; 9 = missing)

everdrug : Ever used illegal drugs?  0 = no, 1 = yes, 9 = missing.