Statistics 101
Data Analysis and Statistical Inference
 

Instructions for lab 7


Lab Objective

The purpose of the lab is to analyze a data set using what we have learned so far in class.

Lab Procedures


In California, there are four levels of incarceration facilities for prisoners, categorized as Level 1, Level 2, Level 3, and Level 4. The level increases with the amount of security at the facility. Level 4 facilities are reserved for the most dangerous prisoners, or for prisoners who need protection from other inmates. Each prisoner is assigned to a facility based on his or her classification score, which is determined from the length of the prison sentence and other variables, including age, marital status, and prior convictions. (Note: A relatively small number of prisoners are assigned to facilities based on criteria other than classification scores.)

The highest-security (Level 4) facilities are expensive to run. Hence, the California Department of Corrections (CDC) is interested in how well Level 4 facilities reduce the risks that dangerous inmates pose to other inmates, staff, and themselves. In this lab, we address this question using data collected by the CDC. A reference for the study is:

Berk, R. A. and de Leeuw, J. (1999). "An Evaluation of California's Inmate Classification System Using a Generalized Regression Discontinuity Design." Journal of the American Statistical Association, Vol. 94, pp. 1045-1052.

Description of the data


In January 1994, the CDC began enrolling inmates in this study. A total of 3,922 inmates are included (only 3,918 have classification scores). The response variable indicates whether the prisoner committed any misconduct violations. All incidents of misconduct were recorded, including less serious violations, such as not standing for a count or not showing up for an assignment, as well as more serious violations, such as drug trafficking or assaulting a corrections officer. A "Strike 2" inmate is a prisoner serving time for a second felony who was sentenced under a California law mandating sentence length enhancements. A "Strike 3" inmate is a prisoner serving time for a third felony, in which case that same law mandated a life sentence. Since such prisoners have little to lose, they are usually assigned to the maximum-security (Level 4) prisons.

There are five variables in the data set.
 
Misconduct ...... Misconduct violation (1) or not (0)
Score ........... Classification score
Strike 2 ........ Two strikes inmate (1) or not (0)
Strike 3 ........ Three strikes inmate (1) or not (0)
Level 4 ......... Classified to Level 4 (1) or not (0)

For these data, we consider assignment to Level 4 to be the treatment, and assignment to any other level to be the control. Note that assignment to treatment or control is based on the classification scores, not on randomization. The response variable is misconduct.

Open the data set incarceration.jmp by clicking on the link.    All the variables are currently coded as continuous  variables.  You should change them to nominal as needed.  The data were downloaded from the UCLA Department of Statistics web site.

Questions:

1.  Compare the percentages of Strike 2, Strike 3, and other prisoners in the group that committed misconduct violations and in the group that did not. Does strike status appear to be associated with misconduct rates? If so, describe the association. Include the percentages in your answer.

JMP Hint: We need a variable that has three levels (Strike 2, Strike 3, other) to answer this question most directly. You can create this variable by making a new column, using a Formula that is "Strike2 + 2*Strike3." This generates a 1 for Strike 2 prisoners, a 2 for Strike 3 prisoners, and a 0 for other prisoners. Convert the variable to a nominal variable for use in analyses.
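
For readers who want to check the coding outside JMP, the formula in the hint can be sketched in Python. This is only an illustration under stated assumptions: the four rows are made up (they are not from the CDC data), and the names strike_code, labels, and strike_status are hypothetical.

```python
# Toy rows only: hypothetical 0/1 indicators, not the CDC data.
strike2 = [0, 1, 0, 0]
strike3 = [0, 0, 1, 0]

def strike_code(s2, s3):
    """Combine the two indicators as in the hint: Strike2 + 2*Strike3."""
    return s2 + 2 * s3

# 0 = other, 1 = Strike 2, 2 = Strike 3
labels = {0: "Other", 1: "Strike 2", 2: "Strike 3"}

codes = [strike_code(a, b) for a, b in zip(strike2, strike3)]
strike_status = [labels[c] for c in codes]

print(codes)
print(strike_status)
```

The coding gives three distinct values only if no inmate is flagged as both Strike 2 and Strike 3; a row with both indicators equal to 1 would produce a 3.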

2.  Compare the distributions of classification scores for the prisoners who commit misconducts and for those who do not commit misconducts.   Describe the main conclusions from your comparison.  Include a rough sketch of the graphical display you used for the comparison.
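
As a rough numerical companion to the graphical comparison in Question 2, the sketch below summarizes scores separately for each misconduct group in Python rather than JMP. The values are toy numbers invented for illustration, and the helper summarize is a hypothetical name; it reports only the minimum, median, and maximum, so it does not replace a plot.

```python
from statistics import median

# Toy values only: hypothetical scores and misconduct flags, not the CDC data.
scores = [10, 25, 40, 55, 60, 80]
misconduct = [0, 0, 1, 0, 1, 1]

def summarize(values):
    """Return (minimum, median, maximum) for a list of numbers."""
    return (min(values), median(values), max(values))

# One summary per misconduct group (0 = no violation, 1 = violation).
by_group = {
    flag: summarize([s for s, m in zip(scores, misconduct) if m == flag])
    for flag in (0, 1)
}

for flag, (low, mid, high) in by_group.items():
    print(f"Misconduct = {flag}: min {low}, median {mid}, max {high}")
```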

3.  Compare the percentage of prisoners in Level 4 facilities who commit misconducts with the percentage of prisoners in Level 1-3 facilities who commit misconducts.  What do these percentages suggest about the effect of being in Level 4 on misconduct rates?  Include both percentages in your answer.
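
The arithmetic behind Question 3 is a difference between two group percentages. A minimal Python sketch, assuming the data are available as two lists of 0/1 values, is shown here; the eight rows are made up for illustration and the function name percent_with_misconduct is hypothetical.

```python
# Toy rows only: hypothetical 0/1 values, not the CDC data.
misconduct = [1, 0, 1, 0, 0, 0, 1, 0]
level4 = [1, 1, 1, 1, 0, 0, 0, 0]

def percent_with_misconduct(misconduct, level4, group):
    """Percentage of inmates in the given Level 4 group (1 or 0) with a violation."""
    values = [m for m, g in zip(misconduct, level4) if g == group]
    return 100.0 * sum(values) / len(values)

pct_level4 = percent_with_misconduct(misconduct, level4, 1)
pct_other = percent_with_misconduct(misconduct, level4, 0)
difference = pct_level4 - pct_other

print(f"Level 4: {pct_level4:.1f}%  Levels 1-3: {pct_other:.1f}%  "
      f"Difference: {difference:.1f} percentage points")
```

The difference is purely descriptive: it compares the two groups as they are and says nothing by itself about why they differ.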

4.  Would you say that the estimated difference you found in Question 3 is a reasonable estimate of the effect of being assigned to a Level 4 facility on misconduct rates?  Or, do you think the difference is a biased estimate of the effect of this treatment?  For your answer to this part, provide evidence based only on the variables in the data set.