This document creates two linear data sets by randomly sampling X values and applying a linear function to them. We then test the linearity of each data set with a rainbow test. Finally, we see how the graphs of the two data sets change based on Y axis scaling.

Create two samples by randomly pulling 1000 values from a linear relationship

We pull points from the function y = 0.4x. Below, X is sampled with replacement from the integers on the interval [1, 100].

library(dplyr)  # provides %>%, arrange(), distinct(), and bind_rows()

set.seed(200)
line100_sample <- data.frame(Type = "100", X = sample(1:100, 1000, replace = TRUE))
line100_sample$Y <- line100_sample$X * 0.4
# raintest() uses the row order of the data by default, so sort by X before testing
line100_sample <- line100_sample %>%
  dplyr::arrange(X)
head(line100_sample)
##   Type X   Y
## 1  100 1 0.4
## 2  100 1 0.4
## 3  100 1 0.4
## 4  100 1 0.4
## 5  100 1 0.4
## 6  100 1 0.4
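
The first six rows all share the same X value because 1000 draws are spread over only 100 possible integers, so each value is repeated about ten times on average. A minimal base R sketch of that check follows; the variable names (`x`, `counts`, `n_distinct`) are illustrative and not part of the original analysis, and the exact counts depend on the R version's sampling algorithm.

```r
# Re-draw the X values with the same seed used above.
set.seed(200)
x <- sample(1:100, 1000, replace = TRUE)

# How many different X values appear, and how often each one is repeated.
counts <- table(x)
n_distinct <- length(unique(x))

n_distinct      # at most 100, since only the integers 1..100 are possible
mean(counts)    # average number of rows sharing one X value
```

Because every repeated X maps to the same Y, the duplicates add no new points to the line; they only add repeated rows.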

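Here we sample from the same function with X drawn from the integers on [1, 1000]. Duplicate rows are then dropped with distinct(), so this sample ends up with fewer than 1000 rows.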

set.seed(300)
line1000_sample <- data.frame(Type = "1000", X = sample(1:1000, 1000, replace = TRUE))
line1000_sample$Y <- line1000_sample$X * 0.4
# raintest() uses the row order of the data by default, so sort by X before testing
line1000_sample <- line1000_sample %>%
  dplyr::arrange(X) %>%
  dplyr::distinct()  # drop repeated rows; fewer than 1000 rows remain
head(line1000_sample)
##   Type X   Y
## 1 1000 1 0.4
## 2 1000 2 0.8
## 3 1000 4 1.6
## 4 1000 5 2.0
## 5 1000 8 3.2
## 6 1000 9 3.6
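
Since distinct() drops repeated rows, it may help to estimate how many rows should survive. The sketch below, in base R, uses the standard expectation for the number of distinct values when drawing with replacement; the variable names are illustrative assumptions, and the observed count depends on the R version's sampling algorithm.

```r
n_values <- 1000   # size of the pool 1..1000
n_draws  <- 1000   # number of draws with replacement

# Expected number of distinct values: each value is missed by all draws
# with probability (1 - 1/n_values)^n_draws.
expected_distinct <- n_values * (1 - (1 - 1 / n_values)^n_draws)

# Observed number of distinct values for the seed used above.
set.seed(300)
observed_distinct <- length(unique(sample(1:n_values, n_draws, replace = TRUE)))

expected_distinct
observed_distinct
```

The expectation is a little over 632 rows. That is in line with the degrees of freedom reported by the rainbow test for this sample further down (324 and 322), which suggest that roughly 648 rows were kept.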

Investigate the linearity of both functions using a rainbow test

First, fit a linear model to the sample with X on [1, 100], then run a rainbow test on it

# refer to columns by name; the data argument supplies them
line100_lm <- lm(X ~ Y, data = line100_sample)
print(lmtest::raintest(line100_lm))
## 
##  Rainbow test
## 
## data:  line100_lm
## Rain = -0.76061, df1 = 500, df2 = 498, p-value = 1

Fit a linear model to the sample with X on [1, 1000], then run a rainbow test on it

# refer to columns by name; the data argument supplies them
line1000_lm <- lm(X ~ Y, data = line1000_sample)
print(lmtest::raintest(line1000_lm))
## 
##  Rainbow test
## 
## data:  line1000_lm
## Rain = 0.19857, df1 = 324, df2 = 322, p-value = 1

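The p-value of both rainbow tests is 1, well above the usual 0.05 significance level, so we fail to reject the null hypothesis that the relationship is linear. Note that Y is an exact multiple of X here, so the residuals are essentially zero and the reported Rain statistics (including the negative one) most likely reflect floating-point noise rather than a meaningful F ratio.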
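
To make the test less of a black box, here is a simplified base R sketch of the idea behind a rainbow test: fit the model on all rows and on the central half of the (sorted) rows, then compare the residual sums of squares with an F statistic. `rainbow_sketch` is a hypothetical helper written for illustration, not the implementation in lmtest, and it assumes the rows are already sorted by the regressor. The noisy example data are also made up so that the statistic is well defined.

```r
# Hypothetical helper: simplified rainbow test on rows sorted by the regressor.
rainbow_sketch <- function(formula, data, fraction = 0.5) {
  n <- nrow(data)
  m <- floor(fraction * n)                       # size of the central subsample
  from <- max(ceiling(n * (0.5 - fraction / 2)), 1)
  central <- data[from:(from + m - 1), ]

  full_fit    <- lm(formula, data = data)
  central_fit <- lm(formula, data = central)
  k <- length(coef(full_fit))                    # number of estimated coefficients

  rss_full    <- sum(resid(full_fit)^2)
  rss_central <- sum(resid(central_fit)^2)

  df1 <- n - m
  df2 <- m - k
  stat <- ((rss_full - rss_central) / df1) / (rss_central / df2)
  list(statistic = stat, df1 = df1, df2 = df2,
       p_value = pf(stat, df1, df2, lower.tail = FALSE))
}

# Made-up data: a noisy line and a noisy curve over the same sorted X values.
set.seed(1)
noisy <- data.frame(X = sort(runif(200, 0, 100)))
noisy$Y <- 0.4 * noisy$X + rnorm(200)
curved <- noisy
curved$Y <- 0.004 * curved$X^2 + rnorm(200)

linear_result <- rainbow_sketch(Y ~ X, noisy)
curved_result <- rainbow_sketch(Y ~ X, curved)
linear_result$p_value
curved_result$p_value
```

With a curved relationship the full-sample fit is much worse than the central fit, which drives the statistic up and the p-value down. With 200 rows the degrees of freedom are 100 and 98, mirroring the 500 and 498 reported above for 1000 rows.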

Compare graphs of both functions with fixed and free Y axis scaling in ggplot facet wrapping

Graph the two functions together with fixed Y axis scaling

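library(ggplot2)

both_line_df <- dplyr::bind_rows(line100_sample, line1000_sample)

# facet_wrap() defaults to scales = "fixed": both panels share the same X and Y axes
compare_plot_fixed <- ggplot(data = both_line_df, aes(x = X, y = Y)) +
  geom_point() +
  facet_wrap(~ Type)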
compare_plot_fixed

Graph the two functions together with free Y axis scaling

# scales = "free_y" gives each panel its own Y axis; the X axis stays shared
compare_plot_free <- ggplot(data = both_line_df, aes(x = X, y = Y)) +
  geom_point() +
  facet_wrap(~ Type, scales = "free_y")
compare_plot_free
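
The difference between the two plots comes down to how much of the Y axis each panel's points can occupy. The base R sketch below rebuilds the two samples with the same seeds and computes that share for a common Y axis; the names `both`, `y_max`, and `share_of_fixed_axis` are illustrative and not part of the original analysis.

```r
# Rebuild the two samples with base R only (same seeds as above).
set.seed(200)
x100 <- sample(1:100, 1000, replace = TRUE)
set.seed(300)
x1000 <- sample(1:1000, 1000, replace = TRUE)

both <- data.frame(
  Type = rep(c("100", "1000"), each = 1000),
  X = c(x100, x1000)
)
both$Y <- both$X * 0.4

# Largest Y value in each facet panel.
y_max <- tapply(both$Y, both$Type, max)

# Share of a common (fixed) Y axis that each panel's points can reach.
share_of_fixed_axis <- y_max / max(both$Y)
print(round(share_of_fixed_axis, 2))
```

With a fixed Y axis, the "100" panel can only reach a small fraction of the shared axis, so its line looks flattened near the bottom. With `scales = "free_y"`, each panel's Y axis is fitted to its own range, while the X axis is still shared between the panels.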