Part 1: (30 points)

The HousePrices data set is a cross-sectional data set on house prices and other features (e.g., number of bedrooms) of houses in Windsor, Ontario. The data were gathered during the summer of 1987.

Use the HousePrices data to perform the following tests using Linear Regression settings:

i. Construct summary statistics for all the variables in the HousePrices data. (5 points)

ii. What percentage of houses in the data have a driveway, gas heat, and air conditioning, respectively? (Hint: create a dummy variable from each of the driveway, gasheat, and aircon variables, then take the mean of each dummy.) (5 points)

iii. Construct a linear regression model to test whether the number of bedrooms influences house prices. Provide a summary of the linear regression model using the summary() function. (10 points)

The online quiz (Q1 to Q4) will be related to the following concepts. You do not have to respond to the following questions in the R program:

a. How do you interpret the coefficient of Number of Bedrooms in the model?

b. What is the null hypothesis related to the model to test the effect of number of bedrooms on house price?

c. Draw a conclusion about the effect of the number of bedrooms on house price based on the p-value.

d. Comment on model accuracy using R-squared.

iv. Construct a multiple linear regression model that includes all other variables as predictors of house price (the response variable) and observe their effects on house price. Provide a summary of the regression model using the summary() function. (10 points)

Variable description of HousePrices data: A data frame containing 546 observations on 12 variables.

price: Sale price of a house.

lotsize: Lot size of a property in square feet.

bedrooms: Number of bedrooms.

bathrooms: Number of full bathrooms.

stories: Number of stories excluding basement.

driveway: Factor. Does the house have a driveway?

recreation: Factor. Does the house have a recreational room?

fullbase: Factor. Does the house have a full finished basement?

gasheat: Factor. Does the house use gas for hot water heating?

aircon: Factor. Is there central air conditioning?

garage: Number of garage places.

prefer: Factor. Is the house located in the preferred neighborhood of the city?
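The steps in Part 1 can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `toy` and its values are made up so the example runs on its own, it uses only a few of the twelve variables, and the object names (`toy`, `pct_driveway`, `fit_simple`, `fit_full`) are hypothetical. The same calls would be applied to the real HousePrices data once it is loaded.

```r
# Toy stand-in for HousePrices (hypothetical values, NOT the real data).
toy <- data.frame(
  price    = c(42000, 38500, 49500, 60500, 61000, 66000, 69000, 83800),
  bedrooms = c(3, 2, 3, 3, 2, 3, 4, 4),
  lotsize  = c(5850, 4000, 3060, 6650, 6360, 4160, 3880, 4800),
  driveway = factor(c("yes", "yes", "yes", "yes", "no", "yes", "yes", "no"),
                    levels = c("no", "yes")),
  aircon   = factor(c("no", "no", "no", "no", "yes", "yes", "no", "yes"),
                    levels = c("no", "yes"))
)

# (i) Summary statistics for every column.
summary(toy)

# (ii) A 0/1 dummy's mean is the share of rows coded 1.
driveway_dummy <- as.numeric(toy$driveway == "yes")
pct_driveway <- 100 * mean(driveway_dummy)  # 6 of 8 toy houses -> 75

# (iii) Simple linear regression of price on bedrooms.
fit_simple <- lm(price ~ bedrooms, data = toy)
summary(fit_simple)

# (iv) Multiple regression: "." means every other column in the data frame.
fit_full <- lm(price ~ ., data = toy)
summary(fit_full)
```

In the output of summary(), the coefficient table reports the estimate, standard error, t value, and p-value for each predictor, and the R-squared appears near the bottom.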

Part 2: (40 points)

Use the Credit data to perform the following tests using Linear Regression settings:

A. Perform the following steps: (20 points)

i. Attach the Credit data to the R environment. (5 points)

ii. Observe the number of rows in the Credit data. Observe the dimensions of the Credit data. (5 points)

iii. Provide summary statistics for the variables in the Credit data. (5 points)

iv. What percentage of the observations in the Credit data are students? What percentage are female? (5 points)
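The steps in section A can be sketched as follows. The data frame `toy_credit` is a made-up stand-in so the example runs on its own, and the column names and factor labels used here (`Student`, `Gender`, "Yes", "Female") are assumptions; in the real Credit data, check the labels with levels() first, since they may differ or carry stray whitespace.

```r
# Toy stand-in for the Credit data (hypothetical values, NOT the real data).
toy_credit <- data.frame(
  Balance = c(333, 903, 580, 964, 331, 1151, 203, 872),
  Age     = c(34, 82, 71, 36, 68, 77, 37, 87),
  Student = factor(c("No", "Yes", "No", "No", "No", "No", "No", "Yes")),
  Gender  = factor(c("Male", "Female", "Male", "Female",
                     "Male", "Male", "Female", "Female"))
)

# (i) attach() puts the columns on the search path so they can be used by name.
attach(toy_credit)
mean(Age)
detach(toy_credit)  # detach afterwards to avoid masking other objects

# (ii) Number of rows, then rows and columns together.
n_rows <- nrow(toy_credit)
dims   <- dim(toy_credit)

# (iii) Summary statistics for every column.
summary(toy_credit)

# (iv) The mean of a logical vector is the proportion of TRUE values.
levels(toy_credit$Student)  # inspect the labels before comparing
pct_student <- 100 * mean(toy_credit$Student == "Yes")    # 2 of 8 -> 25
pct_female  <- 100 * mean(toy_credit$Gender == "Female")  # 4 of 8 -> 50
```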

B. Construct a linear regression model as follows: (20 points)

Response variable: Credit Card Balance

Predictors: Credit Rating, Student, Credit Rating * Student (interaction terms)

Provide a summary of the model using the summary() function.

The online quiz (Q5 to Q8) will be related to the following concepts. You do not have to respond to the following questions in the R program:

i. What is the effect of Student on Credit Card Balance? Explain the coefficient in terms of how Student status affects Credit Card Balance. Explain the significance level of the coefficient of Student.

ii. What is the total effect of Credit Rating on Credit Card Balance for non-students? Explain the coefficient in terms of how changes in Credit Rating affect Credit Card Balance for non-students. Explain the significance level of the coefficient of Credit Rating.

iii. What is the total effect of Credit Rating on Credit Card Balance for students? Is the effect of Credit Rating on Credit Card Balance significantly different for students vs. non-students? Explain the results using the relevant test statistics.
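The interaction model in section B can be sketched as follows, again on made-up data. The data frame `toy_credit`, its values, and the names `fit_int`, `slope_non_student`, and `slope_student` are hypothetical. The sketch assumes `Student` is a factor with reference level "No", so that R names the dummy coefficient `StudentYes`.

```r
# Toy stand-in for the Credit data (hypothetical values, NOT the real data).
toy_credit <- data.frame(
  Balance = c(333, 903, 580, 964, 331, 1151, 203, 872),
  Rating  = c(283, 483, 514, 681, 357, 569, 259, 512),
  Student = factor(c("No", "Yes", "No", "No", "No", "Yes", "No", "Yes"),
                   levels = c("No", "Yes"))
)

# Rating * Student expands to Rating + Student + Rating:Student.
fit_int <- lm(Balance ~ Rating * Student, data = toy_credit)
summary(fit_int)

b <- coef(fit_int)

# For non-students the dummy is 0, so only the main Rating term applies.
slope_non_student <- b["Rating"]

# For students the dummy is 1, so the interaction coefficient is added.
slope_student <- b["Rating"] + b["Rating:StudentYes"]
```

The t test on the `Rating:StudentYes` row of the summary table is the test of whether the Rating slope differs between students and non-students.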

Part 3: (30 points)

Use the Credit data to perform the following tests using Linear Regression settings. The online quiz (Q9 to Q10) will be based on the results of the regressions performed below.

i. Test whether Age influences Credit Card Balance on the basis of simple linear regression.

(Provide a summary of the model using the summary() function.)

ii. Use Age and Credit Rating as predictors of Credit Card Balance (response variable) in a multiple linear regression setting. (Provide a summary of the model using the summary() function.) Interpret the effects of both predictors.

iii. Compare the effect of Age from parts (i) and (ii). Provide the answer to this question in the R code (script).
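One possible way to set up the comparison in part (iii) is to put the Age row of each model's coefficient table side by side. The sketch below uses made-up data; `toy_credit`, `fit_age`, `fit_age_rating`, and `comparison` are hypothetical names, and the numbers it produces say nothing about the real Credit data.

```r
# Toy stand-in for the Credit data (hypothetical values, NOT the real data).
toy_credit <- data.frame(
  Balance = c(333, 903, 580, 964, 331, 1151, 203, 872),
  Age     = c(34, 82, 71, 36, 68, 77, 37, 87),
  Rating  = c(283, 483, 514, 681, 357, 569, 259, 512)
)

# (i) Simple regression: Age only.
fit_age <- lm(Balance ~ Age, data = toy_credit)
summary(fit_age)

# (ii) Multiple regression: Age and Rating together.
fit_age_rating <- lm(Balance ~ Age + Rating, data = toy_credit)
summary(fit_age_rating)

# (iii) Stack the Age rows (estimate, std. error, t value, p-value).
comparison <- rbind(
  simple   = summary(fit_age)$coefficients["Age", ],
  multiple = summary(fit_age_rating)$coefficients["Age", ]
)
comparison

# Written interpretation goes in comments like this one, e.g. how the
# Age estimate and its p-value change once Rating is held fixed.
```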

Deliverables:

1. Please submit one R program (one file) containing all three parts of the assignment (mark/comment the program so that each part is clearly separated). The R code should include comments indicating which section of the assignment each block of code is intended for. Also indicate which team member contributed to which sections of the code.