Read the article Modeling Home Prices Using Realtor Data.
Create a directory (folder) named HomePricesProject. Store all work for this project in this directory.
Create an Rmarkdown document named Project1.Rmd inside the HomePricesProject directory. Complete all subsequent directions in this document.
Read the data from http://ww2.amstat.org/publications/jse/datasets/homes76.dat.txt into an R object named HP.
Remove columns 1, 7, 10, 15, 16, 17, 18, and 19 from HP and store the result back in HP.
Name the columns in HP price, size, lot, bath, bed, year, age, garage, status, active, and elem, respectively.
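The column removal and renaming steps above can be sketched on a toy data frame. This is only an illustration: `toy` is a hypothetical stand-in with 19 columns, and the commented `read.table` call assumes the real file has a header row, which should be checked against the file itself.

```r
# Hypothetical stand-in with 19 columns. The real data might be read with:
# HP <- read.table("http://ww2.amstat.org/publications/jse/datasets/homes76.dat.txt",
#                  header = TRUE)  # header = TRUE is an assumption about the file
toy <- as.data.frame(matrix(seq_len(19 * 3), nrow = 3))

# Negative indices drop columns by position.
toy <- toy[, -c(1, 7, 10, 15:19)]

# Assign the eleven names in order.
names(toy) <- c("price", "size", "lot", "bath", "bed", "year",
                "age", "garage", "status", "active", "elem")
str(toy)
```

Dropping by position is fragile if the file layout changes, so it is worth inspecting `names(HP)` before and after the removal.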
Use the function datatable from the DT package to display the data from HP. Your data display should look similar to the one below.
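A minimal sketch of the `datatable` call, using a small hypothetical data frame (`toy`) in place of HP. The `requireNamespace` guard is only there so the snippet does not fail when the DT package is not installed.

```r
# Hypothetical two-row data frame standing in for HP.
toy <- data.frame(price = c(388, 450), size = c(2.18, 2.05))

# datatable() returns an HTML widget; in a knitted Rmarkdown document
# it renders as an interactive, sortable table.
if (requireNamespace("DT", quietly = TRUE)) {
  widget <- DT::datatable(toy)
  widget
}
```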
Explore the data for the variables size, lot, bath, bed, age, and garage that might help explain the price of a house. (Hint: use a matrix of correlations or a matrix of scatterplots.)
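The hint above might be followed along these lines. The data frame `toy` is simulated purely for illustration; with the real data, the same calls would be applied to the numeric columns of HP.

```r
set.seed(1)
n <- 40

# Simulated stand-in data; the values are hypothetical.
toy <- data.frame(
  size = runif(n, 1.4, 3.0),
  age  = rnorm(n),
  bed  = sample(2:5, n, replace = TRUE)
)
toy$price <- 150 + 60 * toy$size + rnorm(n, sd = 20)

# Matrix of pairwise correlations, rounded for readability.
cors <- round(cor(toy), 2)
cors["price", ]

# Matrix of scatterplots for the same variables.
pairs(toy)
```

`cor` needs numeric columns, so factor variables such as status or elem would have to be left out of the call.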
What are the units for price and size?
Create a model (mod1) that regresses price on all of the variables in HP with the exception of status and year. Produce a summary of mod1 and graph the residual plot. Based on your residual plot, what modification might you make to mod1? Report the adjusted \(R^2\) value for mod1.
Create a new model (mod2) by modifying mod1 using the modification you suggested in the previous question. Report the adjusted \(R^2\) value for mod2. Do you see any improvements to mod1? Justify your answer using adjusted \(R^2\) values and residual plots.
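As a rough sketch of the mechanics (not the assignment's answer), the following fits a linear model on simulated data, excludes two columns with the `. - name` formula syntax, extracts the adjusted \(R^2\), and draws a residuals-versus-fitted plot. The data frame `toy` and the object `fit` are hypothetical.

```r
set.seed(2)
n <- 50

# Simulated stand-in data; the values are hypothetical.
toy <- data.frame(
  size   = runif(n, 1.4, 3.0),
  age    = rnorm(n),
  year   = sample(1950:2005, n, replace = TRUE),
  status = factor(sample(c("act", "pen", "sld"), n, replace = TRUE))
)
toy$price <- 150 + 60 * toy$size - 5 * toy$age + rnorm(n, sd = 20)

# "." means every other column; "- status - year" removes those two.
fit <- lm(price ~ . - status - year, data = toy)
summary(fit)

# Adjusted R^2 is stored in the summary object.
adj_r2 <- summary(fit)$adj.r.squared

# Residuals against fitted values, with a reference line at zero.
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Curvature or a funnel shape in this plot is the kind of pattern that would suggest modifying the model.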
Create a new model (mod3) by adding two terms to mod1: an interaction between bath and bed, and age\(^2\). Report the adjusted \(R^2\) value for mod3. Do you see any improvements to mod1? Justify your answer using adjusted \(R^2\) values and residual plots.
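One way to add an interaction and a squared term to an existing fit is `update`, sketched here on simulated data. The names `toy`, `base_fit`, and `bigger_fit` are hypothetical; `bath:bed` adds only the interaction, and `I(age^2)` protects the exponent from being read as formula syntax.

```r
set.seed(3)
n <- 60

# Simulated stand-in data; the values are hypothetical.
toy <- data.frame(
  bath = sample(c(1, 1.5, 2, 2.5, 3), n, replace = TRUE),
  bed  = sample(2:5, n, replace = TRUE),
  age  = rnorm(n)
)
toy$price <- 200 + 10 * toy$bath + 5 * toy$bed + 3 * toy$age^2 + rnorm(n, sd = 10)

base_fit <- lm(price ~ bath + bed + age, data = toy)

# Keep the response and existing terms, then append the two new terms.
bigger_fit <- update(base_fit, . ~ . + bath:bed + I(age^2))

# Compare adjusted R^2 side by side.
c(base   = summary(base_fit)$adj.r.squared,
  bigger = summary(bigger_fit)$adj.r.squared)
```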
Create a new model (mod4) with all the terms in mod3 but using only edison and harris from the elem variable. Hint: When adding edison to the model, you can use I(elem == 'edison'). Your estimated coefficients should agree with those in the article. Report the adjusted \(R^2\) value for mod4.
Conduct an F-test (anova(mod4, mod3)) and perform a 4-step hypothesis test to check whether the full model (mod3) is better than the reduced model (mod4). Does your p-value agree with the one presented in the article?
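The indicator hint and the nested-model F-test can be illustrated together on simulated data. Everything here (`toy`, `full`, `reduced`, the school names other than edison and harris) is hypothetical; the point is only the shape of the calls.

```r
set.seed(4)
n <- 60

# Simulated stand-in data with a four-level factor; values are hypothetical.
toy <- data.frame(
  size = runif(n, 1.4, 3.0),
  elem = factor(sample(c("adams", "edison", "harris", "parker"), n, replace = TRUE))
)
toy$price <- 150 + 60 * toy$size +
  30 * (toy$elem == "edison") + 20 * (toy$elem == "harris") +
  rnorm(n, sd = 15)

# Full model: one coefficient per non-baseline level of elem.
full <- lm(price ~ size + elem, data = toy)

# Reduced model: indicators for two levels only, built with I().
reduced <- lm(price ~ size + I(elem == "edison") + I(elem == "harris"), data = toy)

# Nested-model F-test; list the reduced model first.
ftest <- anova(reduced, full)
ftest

# The p-value sits in the second row of the table.
p_value <- ftest[2, "Pr(>F)"]
```

The null hypothesis in this comparison is that the extra coefficients in the full model are all zero, so a large p-value favors the reduced model.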
Use mod4 to create a 95% prediction interval for a home with the following features: 1879 square feet, lot size category 4, two and a half baths, three bedrooms, built in 1975, two-car garage, and near Harris Elementary School. Interpret your results.
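A prediction interval for a single new observation can be obtained from `predict` with `interval = "prediction"`. The sketch below uses simulated data and a hypothetical new home; with the real model, the new data frame must contain every predictor in mod4, expressed in the same units and coding as the columns of HP.

```r
set.seed(5)
n <- 50

# Simulated stand-in data; the values are hypothetical.
toy <- data.frame(
  size = runif(n, 1.4, 3.0),
  bed  = sample(2:5, n, replace = TRUE)
)
toy$price <- 150 + 60 * toy$size + 5 * toy$bed + rnorm(n, sd = 20)

fit <- lm(price ~ size + bed, data = toy)

# One-row data frame describing the hypothetical new home.
new_home <- data.frame(size = 2.0, bed = 3)

# Returns a matrix with columns fit, lwr, and upr.
pred <- predict(fit, newdata = new_home, interval = "prediction", level = 0.95)
pred
```

A prediction interval covers an individual home's price, so it is wider than the confidence interval for the mean price (`interval = "confidence"`).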