Statistics 101
Data Analysis and Statistical Inference
Instructions for Lab 7
Lab Objective
The purpose of the lab is to analyze a data set using what we have
learned so far in class.
Lab Procedures
In California, there are four levels of incarceration facilities
for prisoners, categorized as Level 1, Level 2, Level 3, and Level
4. The level increases as the amount of security at the facility
increases. Level 4 facilities are reserved for the most
dangerous prisoners, or prisoners who need protection from other
inmates. Each prisoner is assigned to a facility based on
his or her classification score, which is determined from the
length of the prison sentence and other variables, including age,
marital status and prior convictions. (Note: Some prisoners are
assigned to facilities based on criteria other than classification
scores, but this is a relatively small number.)
The highest security (Level 4) facilities are expensive to run.
Hence, the California Department of Corrections (CDC) is interested in
how well Level 4 facilities eliminate risks that dangerous inmates pose
to other inmates, staff, and themselves. In this lab, we address
this question using data collected by the CDC. A reference for
the study is:
Berk, R. A. and de Leeuw, J. (1999). "An Evaluation of California's
Inmate Classification System Using a Generalized Regression
Discontinuity Design." Journal of the American Statistical
Association, Vol. 94, pp. 1045-1052.
Description of the data
Beginning in January 1994, the CDC enrolled inmates in this
study. A total of 3,922 inmates are included (only 3,918 have
classification scores). The response variable indicates whether
the prisoner committed any misconduct violations. All incidents
of misconduct were recorded, including less serious violations
such as not standing for a count or not showing up for an assignment,
as well as more serious violations, like drug trafficking or assaulting
a corrections officer. A "Strike 2" inmate is a prisoner serving
time for a second felony who was sentenced under a California
law mandating sentence length enhancements. A "Strike 3" inmate
is a prisoner serving time for a third felony, in which case that same
law mandated a life sentence. Since such prisoners have little to
lose, they are usually assigned to the maximum (Level 4) security
prisons.
There are five variables in the data set.
Misconduct   Misconduct violation (1) or not (0)
Score        Classification score
Strike 2     Two-strikes inmate (1) or not (0)
Strike 3     Three-strikes inmate (1) or not (0)
Level 4      Classified to Level 4 (1) or not (0)
For these data, we consider assignment to Level 4 to be the treatment,
and assignment to any other level to be the control. Note
that assignments to treatment or control are based on the classification
scores, not randomization. The response variable is misconduct.
Open the data set incarceration.jmp. All the variables are
currently coded as continuous; change them to nominal as needed.
The data were downloaded from the UCLA Department of Statistics
web site.
Questions:
1. Compare the percentages of Strike 2, Strike 3, and other
prisoners for the group that committed misconducts and the group that
did not commit misconducts. Does the "Strike rate" appear to be
associated with misconduct rates? If so, describe the
association. Include the percentages in your answer.
JMP Hint: To answer this question most directly, we need a variable
with three levels: Strike 2, Strike 3, and other. You can create this
variable by making a new column whose Formula is "Strike2 + 2*Strike3".
This generates a 1 for Strike 2 prisoners, a 2 for Strike 3 prisoners,
and a 0 for other prisoners. Convert the new variable to nominal
before using it in analyses.
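The recoding in the hint and the percentages Question 1 asks for can be sketched outside JMP as a cross-check. The Python below is a minimal sketch, assuming each prisoner is a dictionary keyed by the variable names above and that no prisoner is flagged as both Strike 2 and Strike 3. It applies the same Strike 2 + 2*Strike 3 rule, then reports, within each misconduct group, the percentage of prisoners at each strike level. The function names are hypothetical, and the rows in the demo are made up.

```python
from collections import Counter
from typing import Dict, List

# Codes produced by the rule Strike 2 + 2*Strike 3.
STRIKE_LABELS = {0: "other", 1: "Strike 2", 2: "Strike 3"}


def strike_group(row: Dict[str, int]) -> int:
    """0 = other, 1 = Strike 2, 2 = Strike 3 (assumes both flags are never 1)."""
    return row["Strike 2"] + 2 * row["Strike 3"]


def strike_percentages(rows: List[Dict[str, int]]) -> Dict[int, Dict[int, float]]:
    """Map each misconduct value (0 or 1) to the percent of that group at each strike level."""
    result: Dict[int, Dict[int, float]] = {}
    for misconduct in (0, 1):
        group = [r for r in rows if r["Misconduct"] == misconduct]
        counts = Counter(strike_group(r) for r in group)
        total = len(group)
        result[misconduct] = {
            code: (100.0 * counts[code] / total if total else 0.0)
            for code in STRIKE_LABELS
        }
    return result


if __name__ == "__main__":
    # Made-up rows for illustration only; they are not the CDC data.
    demo = [
        {"Misconduct": 1, "Strike 2": 1, "Strike 3": 0},
        {"Misconduct": 1, "Strike 2": 0, "Strike 3": 0},
        {"Misconduct": 0, "Strike 2": 0, "Strike 3": 1},
        {"Misconduct": 0, "Strike 2": 0, "Strike 3": 0},
    ]
    for misconduct, pcts in strike_percentages(demo).items():
        shown = ", ".join(f"{STRIKE_LABELS[c]} {p:.1f}%" for c, p in pcts.items())
        print(f"Misconduct = {misconduct}: {shown}")
```

Within each misconduct group the three percentages add to 100, which is the comparison the question describes: the strike mix of prisoners who committed misconduct versus the strike mix of those who did not.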
2. Compare the distributions of classification scores for the
prisoners who commit misconducts and for those who do not commit
misconducts. Describe the main conclusions from your comparison.
Include a rough sketch of the graphical display you used for the
comparison.
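A common display for this comparison is a pair of side-by-side box plots. The Python sketch below computes the numbers behind such a display (count, mean, and five-number summary of Score for each misconduct group) so they can be checked against your JMP output. It assumes rows are dictionaries keyed by the variable names above, with Score set to None for the inmates who have no classification score. It uses the standard-library `statistics.quantiles` with `method="inclusive"`. Quartile conventions differ between packages, so these quartiles may not match JMP's exactly. The function names and demo rows are hypothetical.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Count, mean, and five-number summary (needs at least two values)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def score_summaries(rows: List[dict]) -> Dict[int, Dict[str, float]]:
    """Summaries of Score for Misconduct = 0 and Misconduct = 1, skipping missing scores."""
    out = {}
    for misconduct in (0, 1):
        scores = [
            r["Score"]
            for r in rows
            if r["Misconduct"] == misconduct and r["Score"] is not None
        ]
        out[misconduct] = summarize(scores)
    return out


if __name__ == "__main__":
    # Made-up scores for illustration only; they are not the CDC data.
    demo = [{"Misconduct": 0, "Score": s} for s in (10, 20, 30, 40, 50)]
    demo += [{"Misconduct": 1, "Score": s} for s in (20, 40, 60, 80, 100)]
    demo.append({"Misconduct": 1, "Score": None})  # a prisoner without a score
    for misconduct, summary in score_summaries(demo).items():
        print(misconduct, summary)
```

Comparing medians and quartiles shows shifts in center and spread, while min and max flag how far the tails reach. Those are the features a rough box-plot sketch should capture.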
3. Compare the percentage of prisoners in Level 4 facilities who
commit misconducts with the percentage of prisoners in Level 1-3
facilities who commit misconducts. What do these percentages
suggest about the effect of being in Level 4 on misconduct rates?
Include both percentages in your answer.
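The two percentages in Question 3, and their difference, are simple proportions, so they are easy to verify by hand or in a few lines of code. The Python sketch below assumes rows are dictionaries keyed by the variable names above, with Level 4 = 1 marking the treatment group and Level 4 = 0 the control group (Levels 1-3). The function names and demo rows are hypothetical.

```python
from typing import Dict, List


def misconduct_pct(rows: List[Dict[str, int]], level4: int) -> float:
    """Percentage with Misconduct = 1 among inmates with the given Level 4 value."""
    group = [r for r in rows if r["Level 4"] == level4]
    if not group:
        raise ValueError(f"no inmates with Level 4 = {level4}")
    return 100.0 * sum(r["Misconduct"] for r in group) / len(group)


def level4_comparison(rows: List[Dict[str, int]]) -> Dict[str, float]:
    """Misconduct percentages for Level 4 (treatment) and Levels 1-3 (control)."""
    treated = misconduct_pct(rows, 1)
    control = misconduct_pct(rows, 0)
    return {"level4": treated, "levels1to3": control, "difference": treated - control}


if __name__ == "__main__":
    # Made-up rows for illustration only; they are not the CDC data.
    demo = [{"Level 4": 1, "Misconduct": m} for m in (1, 0, 0, 0)]
    demo += [{"Level 4": 0, "Misconduct": m} for m in (1, 1, 0, 0)]
    print(level4_comparison(demo))
```

The difference is reported as Level 4 minus Levels 1-3, so a negative value means the Level 4 group had the lower observed misconduct percentage.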
4. Would you say that the estimated difference you found in
Question 3 is a reasonable estimate of the effect of being assigned to a
Level 4 facility on misconduct rates? Or, do you think the
difference is a biased estimate of the effect of this treatment?
For your answer to this part, provide evidence based only on the
variables in the data set.
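Because assignment to Level 4 is based on classification scores rather than randomization, one kind of evidence for Question 4 is a check of whether the treatment and control groups differ on the other recorded variables. The Python sketch below tabulates, for Level 4 and Levels 1-3 separately, the group size, the mean classification score, and the percentages of Strike 2 and Strike 3 inmates. It only produces the comparison; interpreting it, together with what Questions 1 and 2 showed about how these variables relate to misconduct, is the point of the question. As before, rows are assumed to be dictionaries keyed by the variable names above, with Score set to None when missing. The function name and demo rows are hypothetical.

```python
import statistics
from typing import Dict, List


def covariate_balance(rows: List[dict]) -> Dict[int, dict]:
    """For Level 4 = 0 and Level 4 = 1: size, mean Score, and percent Strike 2 / Strike 3."""
    out = {}
    for level4 in (0, 1):
        group = [r for r in rows if r["Level 4"] == level4]
        scores = [r["Score"] for r in group if r["Score"] is not None]
        n = len(group)
        out[level4] = {
            "n": n,
            "mean_score": statistics.mean(scores) if scores else None,
            "pct_strike2": 100.0 * sum(r["Strike 2"] for r in group) / n if n else None,
            "pct_strike3": 100.0 * sum(r["Strike 3"] for r in group) / n if n else None,
        }
    return out


if __name__ == "__main__":
    # Made-up rows for illustration only; they are not the CDC data.
    demo = [
        {"Level 4": 0, "Score": 10, "Strike 2": 1, "Strike 3": 0},
        {"Level 4": 0, "Score": 20, "Strike 2": 0, "Strike 3": 0},
        {"Level 4": 1, "Score": 50, "Strike 2": 0, "Strike 3": 1},
        {"Level 4": 1, "Score": None, "Strike 2": 0, "Strike 3": 1},
        {"Level 4": 1, "Score": 70, "Strike 2": 0, "Strike 3": 0},
    ]
    for level4, stats in covariate_balance(demo).items():
        print(f"Level 4 = {level4}: {stats}")
```

If the two groups look alike on these variables, the raw difference from Question 3 is more believable as a treatment effect. If they differ on variables that are themselves associated with misconduct, the raw difference mixes the effect of Level 4 with those pre-existing differences.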