Statistics 103
  Probability and Statistical Inference
 

Instructions for lab 7


Lab Objective

The purpose of the lab is to perform an exploratory data analysis.

Lab Procedures

What is the most important factor in determining the selling price of a house?  Is it the size of the house, the location, or the number of rooms?  In this lab, we examine real estate data from the Chicago metropolitan area to address these questions.  The data comprise 3044 sales of detached single-family homes in 1989 and 1990.  They were collected by the Department of Housing and Urban Development.  The variables in the data set are described in the code book appended to this lab.

Download the data set HousingData.JMP.

IMPORTANT CAVEAT:  The answers you get here are incomplete because we look only at bivariate relationships, one predictor at a time.  Multiple regression, which predicts sales price from several variables at once, is the preferred approach.  It is covered in econometrics and other advanced statistics courses.

Questions

You are approached by a client who asks you to predict selling prices for houses in the Chicago area.  The client is planning to build a new house and wants to be able to sell it for lots of money.  The only data you have are those in the 1989-1990 HUD data set.  The questions the client wants you to answer are listed below.

When appropriate, provide numerical summaries that justify your answers.  Include brief interpretations of those numbers (e.g., "the correlation between xxx and yyy is -0.87, which means the two variables have a very strong, inverse relationship").  You don't need to provide graphical displays, but you can refer to them when appropriate.  Keep your total writing to one page if possible.

1)  Relations of prices with housing characteristics
a)  Describe the relationship between price and number of rooms in the house.
b)  Is the ratio of number of bathrooms to number of rooms in the house a good predictor of sales price?
c)  Describe the relationship between price and living area of the house.
d)  Describe the relationship between price and lot size.
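
The lab itself is done in JMP, but the numbers asked for in question 1 can also be sketched in a few lines of Python.  The example below is only an illustration: the rows are made-up toy values (NOT the HUD data), the helper name pearson_r is hypothetical, and the column names follow the code book.  It computes the correlation of price with number of rooms and with the derived ratio NBATH/NROOMS.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Toy rows for illustration only, NOT the HUD data: (SPRICE, NROOMS, NBATH)
toy_rows = [
    (100000, 5, 1.0),
    (120000, 6, 1.5),
    (150000, 7, 2.0),
    (140000, 7, 1.0),
    (200000, 9, 2.5),
]

prices = [row[0] for row in toy_rows]
rooms = [row[1] for row in toy_rows]
# Derived variable for question 1b: bathrooms per room
bath_ratio = [row[2] / row[1] for row in toy_rows]

r_rooms = pearson_r(prices, rooms)
r_ratio = pearson_r(prices, bath_ratio)
print(f"r(SPRICE, NROOMS)       = {r_rooms:.2f}")
print(f"r(SPRICE, NBATH/NROOMS) = {r_ratio:.2f}")
```

A correlation near +1 or -1 indicates a strong linear relationship and a value near 0 a weak one.  The correlation only measures linear association, so it is still worth looking at a scatterplot before interpreting it.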

2)  Relations of prices with amenities
a)  What percentage of houses have some form of air-conditioning, and what percentage have none at all?  How does this differ by county?
b)  How much do houses with air-conditioning cost relative to houses without it?  How does this differ by county?
c)  What percentage of houses have some form of garage (i.e., a built-in or not built-in garage), and what percentage have no garage (i.e., any other option)?  How does this differ by county?
d)  How much do houses with garages (i.e., a built-in or not built-in garage) cost relative to houses without them?  How does this differ by county?
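
The percentages and price comparisons in question 2 amount to group summaries split by county.  The Python sketch below shows one way to compute them, again as an illustration rather than the lab's JMP procedure.  The rows are made-up toy values (NOT the HUD data), and the helper names percent_with, mean_price_by, and has_garage are hypothetical.  The garage coding (4 = built-in, 2 = not built-in) is taken from the code book.

```python
from collections import defaultdict

# Toy rows for illustration only, NOT the HUD data: (SPRICE, AIRCON, COOK)
toy_rows = [
    (100000, 0, 1),
    (130000, 1, 1),
    (150000, 1, 1),
    (110000, 0, 1),
    (180000, 1, 0),
    (160000, 0, 0),
    (200000, 1, 0),
    (190000, 1, 0),
]

def percent_with(rows, flag_index):
    """Percentage of rows whose 0/1 flag in column flag_index equals 1."""
    return 100.0 * sum(row[flag_index] for row in rows) / len(rows)

def mean_price_by(rows, flag_index):
    """Mean SPRICE (column 0) for each value of the flag column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[flag_index]].append(row[0])
    return {value: sum(p) / len(p) for value, p in groups.items()}

def has_garage(garage_code):
    """1 if GARAGE is built-in (4) or not built-in (2), else 0."""
    return 1 if garage_code in (2, 4) else 0

overall_pct_ac = percent_with(toy_rows, 1)

# Repeat the same summaries within each county (COOK = 1 or 0)
by_county = {}
for cook in (1, 0):
    subset = [row for row in toy_rows if row[2] == cook]
    by_county[cook] = {
        "pct_ac": percent_with(subset, 1),
        "mean_price": mean_price_by(subset, 1),
    }

print(f"Overall % with AC: {overall_pct_ac:.1f}")
for cook, summary in by_county.items():
    print(f"COOK={cook}: {summary}")
```

The same two helpers work for the garage questions once GARAGE has been recoded to a 0/1 flag with has_garage.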

3) Relation of prices with neighborhood characteristics
a)  Describe the relationship between prices and the median income of the census tract.
b)  Describe the relationship between prices and the amount spent per capita by municipal governments.
c)  Describe the relationship between prices and the amount spent per pupil by the school district.
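
Because the client ultimately wants predictions, a least-squares line can complement the correlation for questions like these.  The Python sketch below fits price on a single predictor and reports the slope, intercept, and R-squared.  It is an illustration only: the values are made-up toy numbers (NOT the HUD data), and the helper name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_resid / ss_total
    return slope, intercept, r_squared

# Toy values for illustration only, NOT the HUD data
medinc = [30000, 40000, 50000, 60000]     # stands in for MEDINC ($)
sprice = [90000, 130000, 150000, 190000]  # stands in for SPRICE ($)

slope, intercept, r_squared = fit_line(medinc, sprice)
print(f"slope = {slope:.2f} dollars of price per dollar of median income")
print(f"intercept = {intercept:.0f}, R-squared = {r_squared:.3f}")
```

The slope is read as the predicted change in sales price for a one-unit increase in the predictor, and R-squared as the share of the variation in price that the line accounts for.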


----------------------------------------------------------------------------------------------------
CODE BOOK

VARIABLE    DESCRIPTION

SPRICE      Contract sales price of the house ($)
NROOMS      Total number of habitable room enclosures
LVAREA      Total living area (square ft)
AGE         Age of the dwelling (years)
LOTSIZE     Total area of the lot (square ft)
AC TYPE     Central air-conditioning = 2, window or wall air-conditioning = 1, no air-conditioning = 0
CNTRL AC    Central air-conditioning = 1, otherwise = 0
WNDW AC     Window/wall air-conditioning = 1, otherwise = 0
NO AC       No air-conditioning = 1, otherwise = 0
AIRCON      Any air-conditioning = 1, no air-conditioning = 0
NBATH       Number of bathrooms in the dwelling unit
GARAGE      Built-in garage = 4, carport = 3, not built-in garage = 2, on-site parking = 1, none = 0
PTAXES      Property tax rate during the year of purchase in the city where the unit is located (%)
COOK        Dwelling in Cook County = 1, in DuPage County = 0
SSPEND      Operating expenses per pupil in the school district where the dwelling unit is located ($)
MSPEND      Expenditure per capita by the municipal government where the dwelling is located during 1987 ($)
PCTWHT      Percentage of white population in 1990 in the census tract in which the unit is located (%)
MEDINC      Median 1990 income of the census tract in which the dwelling unit is located ($)
DFCL        Distance from the Loop area in downtown Chicago (units of 0.1 mile)
DFNI        Distance from nearest highway entrance (units of 0.1 mile)
PMD         Distance from the nearest particulate matter (type of pollution) monitoring station (units of 0.1 mile)
SOD         Distance from the nearest sulfur (type of pollution) monitoring station (units of 0.1 mile)
PARTICLE    Annual mean particulate matter readings for years 1989 and 1990 (in mg/m3)
SULFUR      Annual mean sulfur readings for years 1989 and 1990 (parts per million)
RACE        Race of the purchaser: white = 1, otherwise = 0
CHILDREN    Number of dependent children in the household
INCOME      Total annual income of the purchaser ($)
OHARE       Housing unit within a five-mile radius of O'Hare Airport = 1, otherwise = 0
MARSTAT     Head of household married = 1, not married = 0
RATE        Mortgage interest rate published monthly. Each home is assigned a rate based on the month in which it was purchased.