1 Libraries and whatnot

Our goal here is to explore data visualization, descriptive statistics, and assumption checks. Here are the libraries and whatnot that we’ll be using:

library(reshape2)
library(tidyverse)
library(stats)
library(psych)
library(tibble)
library(mvoutlier)
library(gvlma)
jamie.theme <- theme_bw() + theme(axis.line = element_line(colour = "black"), 
    panel.grid.minor = element_blank(), panel.grid.major = element_blank(), 
    panel.border = element_blank(), panel.background = element_blank(), legend.title = element_blank())

2 Summary Statistics

Let’s generate some basic descriptive statistics from the mtcars dataset, which ships with base R. First, inspect the dataset:

head(mtcars, n = 15)  # Show the first 15 rows of the 'mtcars' dataset
##                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4          21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag      21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710         22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive     21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant            18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360         14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D          24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230           22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280           19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C          17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE         16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL         17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC        15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

2.1 Interquartile range

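summary() reports the minimum, 1st quartile, median, mean, 3rd quartile, and maximum for a particular variable. The interquartile range is the distance between the 1st and 3rd quartiles. Note that summary() does not report the SD; that comes from sd() in section 2.3.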

summary(mtcars$mpg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   15.43   19.20   20.09   22.80   33.90
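
summary() only implies the interquartile range, so here is a quick sketch that pulls it out directly with base R's quantile() and IQR(). The names q and iqr_mpg are just illustrative, not part of the original walkthrough.

```r
# 1st and 3rd quartiles of mpg (default quantile type 7, same as summary())
q <- quantile(mtcars$mpg, probs = c(0.25, 0.75))

# Interquartile range: 3rd quartile minus 1st quartile
iqr_mpg <- IQR(mtcars$mpg)

q
iqr_mpg
```

With the unrounded quartiles (15.425 and 22.8), the interquartile range works out to 7.375.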

2.2 Mean

mean(mtcars$mpg)  # Calculate the average mpg 
## [1] 20.09062

2.3 Standard deviation

sd(mtcars$mpg)
## [1] 6.026948
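
To see what sd() is doing under the hood, here is a minimal sketch that computes the sample standard deviation by hand, using the usual n - 1 denominator. The names x, n, and sd_by_hand are illustrative.

```r
x <- mtcars$mpg
n <- length(x)

# Sample SD: square root of the summed squared deviations over (n - 1)
sd_by_hand <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Should agree with the built-in sd()
all.equal(sd_by_hand, sd(x))
```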

2.4 Range

range(mtcars$mpg)
## [1] 10.4 33.9
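
Running these one variable at a time gets tedious. As a rough sketch, the same statistics can be collected for several columns at once with sapply(). The choice of columns (mpg, hp, wt) and the names vars and desc are assumptions made for illustration.

```r
# Columns to summarise (an arbitrary pick for this example)
vars <- c("mpg", "hp", "wt")

# One row per variable, one column per statistic
desc <- t(sapply(mtcars[vars], function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x), IQR = IQR(x))
}))

round(desc, 2)
```

The psych package loaded earlier offers describe() for a similar table; the base R version is shown here so it runs without extra packages.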

3 Coarse data visualization

Nothing fancy here. Just brute-force eyeballing of trends.

3.1 Scatterplot

Let’s say, for example, we are interested in MPG as a function of the total number of carburetors. We could treat the number of carburetors as a continuous variable and plot MPG against it.

plot(mtcars$mpg ~ mtcars$carb, col = "red")  # mpg on the y-axis, carb on the x-axis
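
The number of carburetors only takes a handful of distinct values, so an alternative is to treat it as a category. A minimal sketch, assuming we are happy with a base R boxplot (carb_means is an illustrative name):

```r
# How many cars fall into each carburetor count?
table(mtcars$carb)

# Mean mpg within each carburetor count
carb_means <- tapply(mtcars$mpg, mtcars$carb, mean)
carb_means

# One box per carburetor count
boxplot(mpg ~ factor(carb), data = mtcars, col = "red",
        xlab = "Number of carburetors", ylab = "Miles per gallon")
```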

3.2 Run a simple OLS regression, take beta weights and annotate scatterplot

Hmmm… maybe we want to run a quick regression and plot the slope and intercept (MPG as a linear function of HP).

maybe <- lm(mtcars$mpg ~ mtcars$hp)
summary(maybe)
## 
## Call:
## lm(formula = mtcars$mpg ~ mtcars$hp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## mtcars$hp   -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
plot(maybe)  # Draws the standard lm diagnostic plots (residuals vs. fitted, Q-Q, etc.)
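
Calling plot() on an lm object gives diagnostic plots, not the annotated scatterplot. One way to overlay the fitted line and its coefficients on the raw data is sketched below. The names fit, b, and label, the axis labels, and the legend placement are all illustrative choices.

```r
# Same model as above, written with a data argument
fit <- lm(mpg ~ hp, data = mtcars)
b <- coef(fit)  # b[1] is the intercept, b[2] is the slope for hp

# Raw data
plot(mpg ~ hp, data = mtcars, col = "red",
     xlab = "Horsepower", ylab = "Miles per gallon")

# Fitted regression line
abline(fit, lwd = 2)

# Annotate with the estimated intercept and slope
label <- sprintf("mpg = %.2f + (%.3f) * hp", b[["(Intercept)"]], b[["hp"]])
legend("topright", legend = label, bty = "n")
```

The intercept and slope used in the label are the same estimates reported by summary(maybe) above.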