1 Data description

After you graduated, you started working for one of the best firms in the country. You were hired because of your data analysis skills in R. During your first week, your manager comes to your office, gives you the following data set and asks you to “analyze the hell out of this data” (his words, not mine). Mainly, he wants you to build a linear model to predict executive salaries. But you know you can do much more! Analyze the given data and write a report as if your job depends on it.

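The data are stored as a .txt file under week 10. They consist of 11 variables: the response y and the ten predictors x1 to x10. The data frame also contains an id column, which is why str() in section 4 reports 12 variables.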

Variable  Description
y         Salary of executive
x1        Experience (in years)
x2        Education (in years)
x3        Gender (1 if male, 0 if female)
x4        Number of employees supervised
x5        Corporate assets (in millions of USD)
x6        Board member (1 if yes, 0 if no)
x7        Age (in years)
x8        Company profits (in millions of USD)
x9        Has international responsibility (1 if yes, 0 if no)
x10       Company’s total sales (in millions of USD)

2 Load data into R
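The loading code is not shown. Here is a minimal sketch, assuming the week 10 file is whitespace-delimited and has a header row naming the columns id, y and x1 to x10. The delimiter, the header row and the temporary file are assumptions; only the .txt format is stated above. To keep the sketch runnable, it writes three rows to a temporary file and reads them back with read.table(). The values are copied from the str() output in section 4, with salary rounded as displayed there.

```r
# Hypothetical stand-in for the week 10 .txt file: three rows whose values
# are copied from the str() output in section 4 (salary rounded as displayed).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "id y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10",
  "1 11.4 12 15 1 240 170 1 44 5 0 21",
  "2 11.8 25 14 1 510 160 1 53 9 0 28",
  "3 11.4 20 14 0 370 170 1 56 5 0 26"
), path)

# header = TRUE takes the column names from the first line; the default
# separator ("") splits fields on any run of whitespace.
ExeSal <- read.table(path, header = TRUE)

dim(ExeSal)    # 3 rows and 12 columns (id, y, x1 to x10)
names(ExeSal)
```

With the real file, only the path passed to read.table() changes. If the file turns out to be tab- or comma-delimited, set `sep` accordingly.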

3 Rename columns using the following names

Variable  New name    Description
y         salary      Salary of executive
x1        experience  Experience (in years)
x2        education   Education (in years)
x3        gender      Gender (1 if male, 0 if female)
x4        emps_sup    Number of employees supervised
x5        assets      Corporate assets (in millions of USD)
x6        board_mb    Board member (1 if yes, 0 if no)
x7        age         Age (in years)
x8        profit      Company profits (in millions of USD)
x9        int_res     Has international responsibility (1 if yes, 0 if no)
x10       sales       Company’s total sales (in millions of USD)
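One way to apply the renaming in the table above is sketched in base R so that it needs no packages; dplyr::rename(salary = y, experience = x1, ...) does the same job. The one-row data frame is a hypothetical stand-in for ExeSal. Because old names are matched to new ones by name rather than by position, the extra id column and the column order do not matter.

```r
# Old-to-new name map taken from the table above.
new_names <- c(y = "salary", x1 = "experience", x2 = "education",
               x3 = "gender", x4 = "emps_sup", x5 = "assets",
               x6 = "board_mb", x7 = "age", x8 = "profit",
               x9 = "int_res", x10 = "sales")

# Hypothetical one-row stand-in for ExeSal (same column layout as the file).
ExeSal <- data.frame(id = 1, y = 11.4, x1 = 12, x2 = 15, x3 = 1, x4 = 240,
                     x5 = 170, x6 = 1, x7 = 44, x8 = 5, x9 = 0, x10 = 21)

# Rename by name: columns not listed in new_names (here, id) keep their names.
hit <- names(ExeSal) %in% names(new_names)
names(ExeSal)[hit] <- unname(new_names[names(ExeSal)[hit]])

names(ExeSal)
```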

4 Convert data types

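The data types assigned when the file is read do not all suit the variables. The 0/1 indicators gender, board_mb and int_res are categorical, so we convert them to factors. All other variables are converted to numeric, as shown below.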

library(dplyr)

# The 0/1 indicators are categorical, so store them as factors;
# every other variable is treated as numeric.
ExeSal2 <- ExeSal %>%
  mutate(across(c(salary, experience, education, emps_sup, assets,
                  age, profit, sales), as.numeric),
         across(c(gender, board_mb, int_res), as.factor))

str(ExeSal2)
'data.frame':   100 obs. of  12 variables:
 $ id        : int  1 2 3 4 5 6 7 8 9 10 ...
 $ salary    : num  11.4 11.8 11.4 11.2 11.7 ...
 $ experience: num  12 25 20 3 19 14 18 2 14 4 ...
 $ education : num  15 14 14 19 12 13 18 17 13 16 ...
 $ gender    : Factor w/ 2 levels "0","1": 2 2 1 2 2 1 2 2 2 2 ...
 $ emps_sup  : num  240 510 370 170 520 420 290 200 560 230 ...
 $ assets    : num  170 160 170 170 150 160 170 180 180 160 ...
 $ board_mb  : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
 $ age       : num  44 53 56 26 43 53 43 31 43 36 ...
 $ profit    : num  5 9 5 9 7 9 7 10 7 10 ...
 $ int_res   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ sales     : num  21 28 26 24 27 27 22 26 23 25 ...

For this project, you need to build several multiple linear regression models. Be sure to conduct exploratory data analysis (EDA) and describe its findings for each model. For full credit, build at least five multiple linear regression models, including at least one model with an interaction between a quantitative and a qualitative term. It is important that you interpret every coefficient of each model. Finally, run model diagnostics for each model.
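Here is a minimal sketch of one such model, under two stated assumptions. First, the data frame below is synthetic: it is simulated with some of the variable names of ExeSal2 and contains none of the real executive data, so its estimates say nothing about executive salaries. Second, the experience × gender interaction is only an illustrative choice. The sketch shows a quick EDA, a quantitative × qualitative interaction in lm(), how each coefficient reads under R's default treatment coding, and a few numeric diagnostics that use common rule-of-thumb cutoffs.

```r
# Synthetic stand-in for ExeSal2, for illustration only: the variable names
# match section 4, but every value is simulated, not taken from the real data.
set.seed(1)
n <- 100
sim <- data.frame(
  experience = round(runif(n, 0, 30)),
  education  = round(runif(n, 12, 20)),
  gender     = factor(rbinom(n, 1, 0.5), levels = c(0, 1))
)
male <- as.numeric(sim$gender == "1")
sim$salary <- 10 + 0.03 * sim$experience + 0.05 * sim$education +
  0.2 * male + 0.01 * sim$experience * male + rnorm(n, sd = 0.2)

# Quick EDA: correlations among the quantitative variables and
# mean salary by gender.
cor(sim[, c("salary", "experience", "education")])
tapply(sim$salary, sim$gender, mean)

# Multiple linear regression with a quantitative x qualitative interaction.
# experience * gender expands to experience + gender + experience:gender.
fit <- lm(salary ~ experience * gender + education, data = sim)
summary(fit)

# Reading the coefficients under treatment coding (gender "0" is the baseline):
#   (Intercept)         expected salary for gender 0 when experience and
#                       education are both 0
#   experience          slope of experience for gender 0, education held fixed
#   gender1             intercept shift for gender 1 relative to gender 0
#   education           change in salary per extra year of education,
#                       the other terms held fixed
#   experience:gender1  how much the experience slope for gender 1 differs
#                       from the slope for gender 0
coef(fit)

# Numeric diagnostics with rule-of-thumb cutoffs.
res <- residuals(fit)
shapiro.test(res)                                  # normality of residuals
cor(fitted(fit), abs(res))                         # rough check for changing spread
which(cooks.distance(fit) > 4 / n)                 # influential observations
which(hatvalues(fit) > 2 * mean(hatvalues(fit)))   # high-leverage points (2p/n)
```

With the real data, replace `sim` with ExeSal2 and choose terms based on the EDA. Calling plot(fit) adds the standard Residuals vs Fitted, Normal Q-Q, Scale-Location and Residuals vs Leverage panels.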