This markdown creates two linear data sets by randomly sampling points from a linear function. We then test the linearity of each sample with a rainbow test. Finally, we see how the graphs of the two samples change with Y-axis scaling.
We pull points from the function y = 0.4x. Below, X is sampled with replacement from the integers 1 to 100.
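library(dplyr)    # provides %>%, arrange(), distinct(), bind_rows()
library(ggplot2)  # provides ggplot(), geom_point(), facet_wrap()
set.seed(200)
line100_sample <- data.frame(Type = "100", X = sample(1:100, 1000, replace = TRUE))
line100_sample$Y <- line100_sample$X * 0.4
# raintest() assumes the observations are already ordered, so sort by X first
line100_sample <- line100_sample %>%
  dplyr::arrange(X)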
head(line100_sample)
## Type X Y
## 1 100 1 0.4
## 2 100 1 0.4
## 3 100 1 0.4
## 4 100 1 0.4
## 5 100 1 0.4
## 6 100 1 0.4
Here we sample from the same function with X drawn with replacement from the integers 1 to 1000. This time duplicate rows are dropped with `distinct()`.
set.seed(300)
line1000_sample <- data.frame(Type = "1000", X = sample(1:1000, 1000, replace = TRUE))
line1000_sample$Y <- line1000_sample$X * 0.4
# raintest() assumes the observations are already ordered, so sort by X first,
# then drop duplicate rows
line1000_sample <- line1000_sample %>%
  dplyr::arrange(X) %>%
  dplyr::distinct()
head(line1000_sample)
## Type X Y
## 1 1000 1 0.4
## 2 1000 2 0.8
## 3 1000 4 1.6
## 4 1000 5 2.0
## 5 1000 8 3.2
## 6 1000 9 3.6
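Because `distinct()` is applied, the 1000-range sample ends up with fewer than 1000 rows: drawing with replacement repeats some X values, and the duplicates are dropped. A minimal sketch of roughly how many rows to expect, using base R only. The variable names and the seed are illustrative assumptions, and the simulated count will vary with the seed and R version.

```r
# Expected number of distinct values when drawing n times, with replacement,
# from N equally likely integers: N * (1 - (1 - 1/N)^n).
N <- 1000
n <- 1000
expected_distinct <- N * (1 - (1 - 1 / N)^n)
expected_distinct  # roughly 632

# One simulated draw for comparison (illustrative seed, not the one used above).
set.seed(1)
draw <- sample(1:N, n, replace = TRUE)
simulated_distinct <- length(unique(draw))
simulated_distinct
```

This also explains why the two rainbow tests further down report different degrees of freedom: the 100-range sample keeps all 1000 rows, while the 1000-range sample keeps only its distinct rows.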
We first fit a linear model to the 100-range sample, then run a rainbow test on it.
line100_lm <- lm(X ~ Y, data = line100_sample)
print(lmtest::raintest(line100_lm))
##
## Rainbow test
##
## data: line100_lm
## Rain = -0.76061, df1 = 500, df2 = 498, p-value = 1
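Note that the model regresses X on Y, so for y = 0.4x the fitted slope should be 1/0.4 = 2.5 with an intercept of essentially zero. A small self-contained check, using a hypothetical noise-free `demo` data frame rather than the sample above:

```r
# Hypothetical noise-free data on the same line y = 0.4x.
demo <- data.frame(X = 1:100)
demo$Y <- demo$X * 0.4

# Regress X on Y, as the model in this section does.
demo_lm <- lm(X ~ Y, data = demo)
fit_coefs <- coef(demo_lm)
fit_coefs  # intercept near 0, slope near 2.5

# With an exact fit, the residuals are only floating-point noise.
max_abs_resid <- max(abs(residuals(demo_lm)))
max_abs_resid
```

Since the residuals sit at the level of rounding error, the rainbow statistic is a ratio of two near-zero sums of squares. That is a likely reason a negative value can be reported even though an F statistic cannot be negative in exact arithmetic.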
Next, fit a linear model to the 1000-range sample and run a rainbow test on it.
line1000_lm <- lm(X ~ Y, data = line1000_sample)
print(lmtest::raintest(line1000_lm))
##
## Rainbow test
##
## data: line1000_lm
## Rain = 0.19857, df1 = 324, df2 = 322, p-value = 1
The p-value of both rainbow tests is 1, far above the conventional 0.05 significance level, so we fail to reject the null hypothesis that the relationship is linear.
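For contrast, it can help to see what the test reports when the data are noisy or not linear at all. The sketch below uses hypothetical data (a noisy line and a noisy parabola, with an illustrative seed and noise level) and the `order.by` argument of `lmtest::raintest()`, which orders the observations inside the test so the data frame does not need to be sorted beforehand.

```r
library(lmtest)

set.seed(1)  # illustrative seed
x <- sample(1:100, 200, replace = TRUE)

# Hypothetical data: one noisy line and one noisy parabola.
noisy <- data.frame(
  x = x,
  y_linear = 0.4 * x + rnorm(200, sd = 2),
  y_curved = 0.01 * x^2 + rnorm(200, sd = 2)
)

# order.by = ~ x orders the observations by x inside the test,
# so the rows do not have to be pre-sorted.
rain_linear <- raintest(y_linear ~ x, order.by = ~ x, data = noisy)
rain_curved <- raintest(y_curved ~ x, order.by = ~ x, data = noisy)

rain_linear$p.value
rain_curved$p.value  # expected to be very small for the curved data
```

A small p-value for the curved data is evidence against linearity, whereas a large p-value only means the test found no evidence of non-linearity.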
Graph the two functions together with fixed Y-axis scaling (the `facet_wrap()` default)
both_line_df <- dplyr::bind_rows(line100_sample, line1000_sample)
compare_plot_fixed <- ggplot(data = both_line_df, aes(x = X, y = Y)) +
  geom_point() +
  facet_wrap(~ Type)
compare_plot_fixed
Graph the two functions together with free Y axis scaling
compare_plot_free <- ggplot(data = both_line_df, aes(x = X, y = Y)) +
  geom_point() +
  facet_wrap(~ Type, scales = "free_y")
compare_plot_free
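The difference between the two plots comes down to which Y range each panel uses. A rough base-R sketch of those ranges, using a hypothetical `demo_df` with evenly spaced X values in place of the random samples:

```r
# Hypothetical stand-in for both_line_df with evenly spaced X values.
demo_df <- rbind(
  data.frame(Type = "100",  X = 1:100),
  data.frame(Type = "1000", X = 1:1000)
)
demo_df$Y <- demo_df$X * 0.4

# Fixed scaling: every panel shares the overall Y range.
shared_range <- range(demo_df$Y)
shared_range

# Free Y scaling: each panel uses the Y range of its own Type.
panel_ranges <- lapply(split(demo_df$Y, demo_df$Type), range)
panel_ranges
```

In the "100" panel the points only reach Y = 40, so on a shared axis that runs up to 400 they occupy about the bottom tenth of the panel. With `scales = "free_y"` that panel gets its own range instead. The X axis stays shared in both plots; `scales = "free"` would free both axes.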