Statistics 103
Probability and Statistical Inference
Instructions for Lab 7
Lab Objective
The purpose of the lab is to perform an exploratory data analysis.
Lab Procedures
What is the most important factor in determining the selling price of a house? Is it the size of the house, the location, or the number of rooms? In this lab, we examine real estate data from the Chicago metropolitan area to address these questions. The data comprise 3044 sales of detached single-family homes between 1989 and 1990, collected by the Department of Housing and Urban Development (HUD). The variables in the data set are described in the code book appended to this lab.
Download the data set HousingData.JMP.
IMPORTANT CAVEAT: The answers you get here are incomplete, in the sense that we are only looking at bivariate relationships, that is, at one predictor of sales price at a time. Sales price can be predicted from several variables at once using multiple regression, which is the preferred approach. Multiple regression is covered in econometrics and other advanced statistics courses.
Questions
You are approached by a client who asks you to predict selling prices for houses in the Chicago area. The client is planning to build a new house and wants to be able to sell it for as much money as possible. The only data you have are those in the 1989-1990 HUD data set. The questions the client wants you to answer are listed below.
When appropriate, provide numerical summaries that justify your answers. Include brief interpretations of those numbers (e.g., "the correlation between xxx and yyy is -0.87, which means the two variables have a very strong, inverse relationship"). You don't need to provide graphical displays, but you can refer to them when appropriate. Keep your total writing to one page if possible.
1) Relations of prices with housing characteristics
a) Describe the relationship between price and number of rooms in the house.
b) Is the ratio of number of bathrooms to number of rooms in the house a good predictor of sales price?
c) Describe the relationship between price and living area of the house.
d) Describe the relationship between price and lot size.
2) Relations of prices with amenities
a) What percentage of houses have some form of air-conditioning, and what percentage have no air-conditioning at all? How does this differ by county?
b) How much do houses with air-conditioning cost relative to houses without it? How does this differ by county?
c) What percentage of houses have some form of garage (i.e., a built-in or not built-in garage), and what percentage have no garage at all (i.e., any other option)? How does this differ by county?
d) How much do houses with garages (i.e., built-in or not built-in) cost relative to houses without them? How does this differ by county?
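The county comparisons in section 2 reduce to group percentages and group price summaries. The sketch below assumes the SPRICE, COOK, and indicator columns are equal-length Python lists; for air-conditioning the indicator can be the AIRCON column directly, and for garages it is derived from the GARAGE codes using the definition in question 2c. The names `has_garage` and `summarize_by_county` are hypothetical. Both means and medians are reported because house prices are often right-skewed, in which case the median is less affected by a few very expensive homes.

```python
from statistics import mean, median

# Question 2c counts built-in (GARAGE = 4) and not built-in (GARAGE = 2)
# garages as "some form of garage"; carport, on-site parking, and none do not.
GARAGE_CODES = {4, 2}


def has_garage(garage_code):
    """Map a GARAGE code to a 0/1 garage indicator."""
    return 1 if garage_code in GARAGE_CODES else 0


def _center(prices):
    """Mean and median of a list of prices, or (None, None) if it is empty."""
    return (mean(prices), median(prices)) if prices else (None, None)


def summarize_by_county(sprice, cook, indicator):
    """Percentage of houses with the feature (indicator == 1) and the mean and
    median SPRICE with and without it, for all houses and for each county
    (COOK = 1 is Cook County, COOK = 0 is DuPage County)."""
    summary = {}
    for name, codes in (("All", (0, 1)), ("Cook", (1,)), ("DuPage", (0,))):
        rows = [(p, f) for p, c, f in zip(sprice, cook, indicator) if c in codes]
        if not rows:
            continue
        with_f = [p for p, f in rows if f == 1]
        without_f = [p for p, f in rows if f == 0]
        mean_with, median_with = _center(with_f)
        mean_without, median_without = _center(without_f)
        summary[name] = {
            "pct_with": 100 * len(with_f) / len(rows),
            "mean_with": mean_with,
            "median_with": median_with,
            "mean_without": mean_without,
            "median_without": median_without,
        }
    return summary
```

For example, `summarize_by_county(sprice, cook, aircon)` addresses 2a and 2b, and `summarize_by_county(sprice, cook, [has_garage(g) for g in garage])` addresses 2c and 2d.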
3) Relations of prices with neighborhood characteristics
a) Describe the relationship between prices and the median income of the census tract.
b) Describe the relationship between prices and the amount spent per capita by municipal governments.
c) Describe the relationship between prices and the school district's operating expenses per pupil.
----------------------------------------------------------------------------------------------------
CODE BOOK

VARIABLE    DESCRIPTION
SPRICE      Contract sales price of the house ($)
NROOMS      Total number of habitable room enclosures
LVAREA      Total living area (square ft)
AGE         Age of the dwelling (years)
LOTSIZE     Total area of the lot (square ft)
AC TYPE     Central air-conditioning = 2, window or wall air-conditioning = 1, no air-conditioning = 0
CNTRL AC    Central air-conditioning = 1, otherwise = 0
WNDW AC     Window/wall air-conditioning = 1, otherwise = 0
NO AC       No air-conditioning = 1, otherwise = 0
AIRCON      Any air-conditioning = 1, no air-conditioning = 0
NBATH       Number of bathrooms in the dwelling unit
GARAGE      Built-in garage = 4, carport = 3, not built-in garage = 2, on-site parking = 1, none = 0
PTAXES      Property tax rate during the year of purchase in the city where the unit is located (%)
COOK        Dwelling in Cook County = 1, in DuPage County = 0
SSPEND      Operating expenses per pupil in the school district where the dwelling unit is located ($)
MSPEND      Per-capita expenditure by the municipal government where the dwelling is located, 1987 ($)
PCTWHT      Percentage of white population in 1990 in the census tract in which the unit is located (%)
MEDINC      Median 1990 income of the census tract in which the dwelling unit is located ($)
DFCL        Distance from the Loop area in downtown Chicago (units of 0.1 mile)
DFNI        Distance from the nearest highway entrance (units of 0.1 mile)
PMD         Distance from the nearest particulate matter (type of pollution) monitoring station (units of 0.1 mile)
SOD         Distance from the nearest sulfur (type of pollution) monitoring station (units of 0.1 mile)
PARTICLE    Annual mean particulate matter readings for 1989 and 1990 (mg/m³)
SULFUR      Annual mean sulfur readings for 1989 and 1990 (parts per million)
RACE        Race of the purchaser: white = 1, otherwise = 0
CHILDREN    Number of dependent children in the household
INCOME      Total annual income of the purchaser ($)
OHARE       Housing unit within a five-mile radius of O'Hare Airport = 1, otherwise = 0
MARSTAT     Head of household married = 1, not married = 0
RATE        Mortgage interest rate, published monthly; each home is assigned the rate for the month in which it was purchased
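Several air-conditioning variables in the code book are recodings of AC TYPE, so a quick consistency check is a reasonable first step of the exploratory analysis. A minimal Python sketch, assuming each house is a dict keyed by the code-book names exactly as written above (the column names in HousingData.JMP may be spelled differently, e.g. with underscores); the function names are hypothetical.

```python
def ac_dummies_from_type(ac_type):
    """Derive the AC indicator variables from AC TYPE, following the code book:
    2 = central, 1 = window/wall, 0 = none."""
    if ac_type not in (0, 1, 2):
        raise ValueError(f"unexpected AC TYPE code: {ac_type!r}")
    return {
        "CNTRL AC": int(ac_type == 2),
        "WNDW AC": int(ac_type == 1),
        "NO AC": int(ac_type == 0),
        "AIRCON": int(ac_type != 0),
    }


def inconsistent_rows(rows):
    """Return the indices of rows whose stored AC dummies disagree with AC TYPE."""
    bad = []
    for i, row in enumerate(rows):
        expected = ac_dummies_from_type(row["AC TYPE"])
        if any(row[name] != value for name, value in expected.items()):
            bad.append(i)
    return bad
```

An empty list from `inconsistent_rows` means the dummies agree with AC TYPE for every house, so AIRCON can be used directly in question 2.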