Statistics 101
Data Analysis and Statistical Inference
Instructions for lab 7
Lab Objective
The purpose of the lab is to use JMP to analyze qualitative
(categorical) data.
Lab Procedures
What are the characteristics of youth doing time? The 1987
Survey of Youth in Custody sampled juveniles and young adults in
long-term, state-operated juvenile institutions. Residents of
facilities at the end of 1987 were interviewed about family
background, previous criminal history, and drug and alcohol use.
Open the data set syc.jmp
by clicking on the link. The data set consists of 28 variables
for 2621 youths. The youths were sampled from 206 facilities
using a complex sampling design involving stratification and
clustering (see Ch. 19 and 20 in FPP for more information on these
topics). For simplicity, we'll assume the youths were selected by
simple random sampling.
The variables are described in the code book found at the end of this
lab. We'll look at relationships among the following variables:
1) crimtype : most serious crime in current
offense.
2) numarr : number of times arrested
3) agefirst : age at first arrest
4) alcuse : Did the youth drink alcohol at all during the year
before being sent to the institution?
5) everdrug : Did the youth ever use illegal drugs?
These variables have missing data. We'll exclude units
with missing data from analyses. Again, this is not preferred
practice; contact a statistician for help when you encounter missing
data in your research.
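For those curious how this exclusion looks outside JMP, here is a minimal Python sketch. It is only an illustration: the records are made up, the helper name drop_missing is hypothetical, and we assume missing values appear as the codes listed in the code book (9 or 99).

```python
# Missing-value codes for the five variables, taken from the code book.
MISSING_CODES = {
    "crimtype": 9,
    "numarr": 99,
    "agefirst": 99,
    "alcuse": 9,
    "everdrug": 9,
}


def drop_missing(records, variables):
    """Keep only the records with no missing code in any of `variables`."""
    return [
        rec for rec in records
        if all(rec[var] != MISSING_CODES[var] for var in variables)
    ]


# Made-up records for illustration only (not taken from syc.jmp).
toy_records = [
    {"crimtype": 1, "numarr": 3, "agefirst": 14, "alcuse": 1, "everdrug": 1},
    {"crimtype": 9, "numarr": 2, "agefirst": 15, "alcuse": 2, "everdrug": 0},
    {"crimtype": 2, "numarr": 99, "agefirst": 13, "alcuse": 9, "everdrug": 1},
    {"crimtype": 3, "numarr": 5, "agefirst": 12, "alcuse": 3, "everdrug": 1},
]

# Only the variables used in a given analysis need to be complete.
complete = drop_missing(toy_records, ["crimtype", "alcuse"])
print(len(complete))
```

Note that a record is dropped only if it is missing one of the variables in the analysis at hand, so different analyses may use different numbers of youths.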
Questions:
1) For each of these five variables, state whether you would
treat it as a quantitative (i.e., continuous) variable or a
qualitative (i.e., categorical) variable. On your report, write
the variable name, then either "quantitative" or "qualitative".
This is worth one point per variable.
Notice that all the variables are currently coded as continuous
variables in JMP. We need to recode the qualitative variables so
that JMP can recognize them correctly. You can do this quite
easily: simply double-click on the box with the name of the
column, then change Modeling Type to Nominal. Go
ahead and change the types of the qualitative variables from Question
1 to be nominal.
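To see why the type matters, consider this small Python sketch. It uses made-up values of "race" (a variable we do not analyze in this lab) with the codes from the code book; it is an illustration under those assumptions, not part of the JMP procedure.

```python
from collections import Counter
from statistics import mean

# Made-up race codes for illustration only (1 = white, 2 = black, ... per the code book).
race_codes = [1, 2, 2, 1, 4, 2, 1, 5]

# Treated as continuous, the codes yield an average with no sensible interpretation.
print(mean(race_codes))

# Treated as nominal, the codes are just labels, so we tabulate them instead.
freq = Counter(str(code) for code in race_codes)
print(sorted(freq.items()))
```

An "average race code" means nothing, while the frequency table is exactly the kind of summary we want for a qualitative variable.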
2) Examine the percentages of youths in each crime type for each of
the three alcohol categories. Answer the questions after doing
the following JMP procedures.
To examine the relationship between "crimtype" and "alcuse", use
Analyze > Fit Y by X, then enter "crimtype" as the Y variable
and "alcuse" as the X variable. After you click OK, you
get a contingency table with lots of entries. The top number in
each box is the actual number of youths in that box. The second
number in each box is the percentage of youths in that box out of all
youths in the data set. The third number in each box is the
percentage of youths in that box out of all youths in that column
(e.g., the third number in the upper left box is the percentage of
youths who used alcohol, given that they committed a violent crime).
The fourth number in each box is the percentage of youths in that box
out of all youths in that row (e.g., the fourth number in the upper
left box is the percentage of youths who committed a violent crime,
given that they used alcohol).
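The four numbers in each box can be reproduced by hand. The Python sketch below does so for a made-up set of (alcuse, crimtype) pairs, assuming, as in the examples above, that the alcohol categories form the rows and the crime types form the columns. The data and the helper name cell_summary are hypothetical.

```python
from collections import Counter

# Made-up (alcuse, crimtype) pairs for illustration only; not from syc.jmp.
pairs = [
    (1, 1), (1, 1), (1, 2), (1, 2), (1, 2), (1, 3),
    (2, 1), (2, 2), (3, 1), (3, 3),
]

counts = Counter(pairs)                      # cell counts
row_totals = Counter(a for a, _ in pairs)    # totals per alcuse category (rows)
col_totals = Counter(c for _, c in pairs)    # totals per crimtype category (columns)
n = len(pairs)                               # all youths in the table


def cell_summary(alcuse, crimtype):
    """Return (count, total %, column %, row %) for one box of the table."""
    count = counts[(alcuse, crimtype)]
    return (
        count,
        100 * count / n,
        100 * count / col_totals[crimtype],
        100 * count / row_totals[alcuse],
    )


# Upper left box: alcuse = 1 (row), crimtype = 1 (column).
count, total_pct, col_pct, row_pct = cell_summary(1, 1)
print(count, total_pct, col_pct, row_pct)
```

The only difference among the three percentages is the denominator: everyone in the table, everyone in the column, or everyone in the row.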
a) Out of all the youths who had a drug offense as their
worst offense, what percentage never drank alcohol?
b) Out of all the youths who drank alcohol in the past
year, what percentage had a property offense as their worst offense?
c) What percentage of these youths committed a
public-order offense as their worst offense?
3a) For each crime type, determine the percentage of youths who
drank alcohol. Now compare these percentages across crime types. In
your opinion, are they very different, a little different, or exactly
the same? Choose one, and justify your answer by comparing the
largest to the smallest value.
3b) Here's a claim: "When there is no relationship between
the row variable and the column variable, the column percentages
(e.g., the percentages in part 3a) should be very similar within any
row in the table. That is, the column percentages in row 1
should be similar; the column percentages in row 2 should be
similar; etc." Do you think this claim is true or false?
Explain your answer. Hint: Think about how you would
interpret the relationship between the variables if the column
percentages are all equal for some row.
4a) For each crime type, determine the percentage of youths who
have ever used illegal drugs (repeat the Fit Y by X procedure with
"everdrug" in place of "alcuse"). Are the percentages very different,
a little different, or exactly the same?
4b) Based on these percentages, do you think there is a strong
relationship between drug usage and crime type for these youths?
Justify your answer with a sentence or two based on your answer
to 4a.
Code book with variable names
age : age of resident (99 = missing)
race : 1=white, 2=black, 3=Asian/Pacific Islander, 4=American
Indian, Aleut, Eskimo, 5 = other, 9 = missing
ethnicity : 1=Hispanic, 2=not Hispanic, 9=missing
educ: highest grade attained before being sent to correctional
institution: 00 = never attended school, 01 - 12 = highest grade
attended, 13 = General Equivalency Diploma, 14 = other, 99 = missing
sex: 1=male, 2=female, 9=missing
livewith : Who did you live with most of the time you were
growing up? 1 = mother only, 2 = father only, 3 = both mother
and father, 4 = grandparents, 5 = other relatives, 6=friends, 7=foster
home, 8= agency or institution, 9 = someone else, 99 = missing
famtime : Has anyone in your family, such as your mother,
father, brother, sister, ever served time in jail or prison? 1 =
yes, 2 = no, 7 = don't know, 9 = missing.
crimtype : most serious crime in current offense:
1 = violent (murder, rape, robbery, assault)
2 = property (burglary, larceny, arson, fraud, motor
vehicle theft)
3 = drug (drug possession or trafficking)
4 = public order (weapons violation, perjury, failure to
appear in court)
5 = juvenile status offense (truancy, running away,
incorrigible behavior)
9 = missing
everviol : Ever put on probation or sent to a correctional
institution for violent offense? 1 = yes, 0 = no.
numarr: number of times arrested (99 = missing)
probtn : number of times on probation (99 = missing)
corrinst : number of times previously committed to correctional
institution (99 = missing)
evertime : Prior to being sent here, did you ever serve time in a
correctional institution? (1 = yes, 2 = no, 9 = missing)
prviol : 1 = previously arrested for violent offense.
prprop : 1= previously arrested for property offense.
prdrug : 1 = previously arrested for drug offense
prpub: 1 = previously arrested for public-order offense.
prjuv : 1 = previously arrested for juvenile status offense.
agefirst : age first arrested (99 = missing)
useweapon : Did you use a weapon... for this incident? 1
= yes, 2 = no, 9 = missing
alcuse : Did you drink alcohol at all during the year before
being sent here at this time? ( 1 = yes, 2 = no, didn't drink
the year before, 3 = no, don't drink at all, 9 = missing)
everdrug : Ever used illegal drugs? 0 = no, 1 = yes, 9 =
missing.
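If you ever work with these codes outside JMP, a lookup table keeps the output readable. The Python sketch below copies the labels for two variables from the code book; the names CRIMTYPE_LABELS, ALCUSE_LABELS, and label are hypothetical and chosen only for this illustration.

```python
# Labels copied from the code book above.
CRIMTYPE_LABELS = {
    1: "violent",
    2: "property",
    3: "drug",
    4: "public order",
    5: "juvenile status offense",
    9: "missing",
}
ALCUSE_LABELS = {
    1: "yes",
    2: "no, didn't drink the year before",
    3: "no, don't drink at all",
    9: "missing",
}


def label(code, labels):
    """Return the code book label for `code`, or flag an unexpected code."""
    return labels.get(code, f"unknown code {code}")


print(label(3, CRIMTYPE_LABELS))
print(label(2, ALCUSE_LABELS))
```

Flagging unexpected codes, rather than silently ignoring them, is a simple check against data entry errors.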