Why?

Sometimes the easiest way to share data is to read it straight from somewhere like GitHub rather than distributing it alongside a markdown file and wrestling with everyone’s idiosyncratic directory structures. We can post data to a public repository like:
https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/tree/main

The trick is getting the data from GitHub into R. Let’s go through the process of creating some data, saving it as an rda file, posting it to GitHub and then reading it back in. You will need to install the ‘RCurl’ library, although the code shown here calls only base R functions.

# Here are some fake data
set.seed(123)
myfakedata_test <- data.frame(cbind(gaussian = rnorm(10, 50, 5), uniform = runif(10)))  #using cbind
print(myfakedata_test)
##    gaussian   uniform
## 1  47.19762 0.8895393
## 2  48.84911 0.6928034
## 3  57.79354 0.6405068
## 4  50.35254 0.9942698
## 5  50.64644 0.6557058
## 6  58.57532 0.7085305
## 7  52.30458 0.5440660
## 8  43.67469 0.5941420
## 9  46.56574 0.2891597
## 10 47.77169 0.1471136

Saving data as rda and csv files

Let’s save that object as an rda file and as a CSV. But we will do something snazzy: marking each file with the system date so it has a distinct name. If you don’t add row.names = FALSE, the first column of your CSV will be a sequence of numbers, an artifact of the row names R assigns to the data frame.

currentDate <- Sys.Date()
# paste0() joins the pieces without inserting a separator before the extension
save(myfakedata_test, file = paste0("data/myfakedata-", currentDate, ".rda"))
write.csv(myfakedata_test, file = paste0("data/myfakedata-", currentDate, ".csv"), row.names = FALSE)
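
Before pushing anything, it can be worth confirming that the saved files read back correctly. The following is a minimal sketch rather than part of the workflow itself: it writes to a temporary directory (an assumption, so that it runs without a data/ folder), and the names outdir, stamp, check_env, rda_ok and csv_ok are made up for illustration.

```r
# Recreate the fake data from above
set.seed(123)
myfakedata_test <- data.frame(gaussian = rnorm(10, 50, 5), uniform = runif(10))

# Assumption: a temporary directory stands in for the data/ folder
outdir <- tempdir()
stamp <- Sys.Date()
rda_path <- file.path(outdir, paste0("myfakedata-", stamp, ".rda"))
csv_path <- file.path(outdir, paste0("myfakedata-", stamp, ".csv"))

save(myfakedata_test, file = rda_path)
write.csv(myfakedata_test, file = csv_path, row.names = FALSE)

# Load the rda into a fresh environment so it cannot overwrite anything
check_env <- new.env()
loaded_names <- load(rda_path, envir = check_env)
rda_ok <- identical(check_env$myfakedata_test, myfakedata_test)

# The CSV stores numbers as text, so compare with a tolerance
csv_back <- read.csv(csv_path)
csv_ok <- isTRUE(all.equal(csv_back, myfakedata_test))
```

If rda_ok and csv_ok are both TRUE, the files on disk hold the same values as the object in memory.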

Here’s the file in your data folder!


Let’s push that file to GitHub using GitHub Desktop. I’ll first change the file name to ‘myfakedata.rda’ to make it easier. I have already set up a public repository called reillylab_data for storing and distributing raw data. I have also cloned this repository on my own computer (it appears as a folder). I will drop ‘myfakedata.rda’ into that folder and sync with GitHub Desktop. Here it is ready for staging. You just commit it and push it to the origin.

Now if you look on GitHub, the file should automagically be there!

Okay, so in this next step we want to start with a clean slate and read myfakedata.rda into R from GitHub. The first step is to find its URL and write that to an object. You can find that URL by navigating to the file and copying its URL like this:

Now this is a key step! You have to add ?raw=true to the end of the URL for it to work.
This URL you just copied will not work!
https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/blob/main/myfakedata.rda

This URL with ‘?raw=true’ added to the end will work. No backslashes – nothing fancy
https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/blob/main/myfakedata.rda?raw=true
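
If you do this often, the ‘?raw=true’ step can be wrapped in a small function. This is only a sketch: github_raw_url() is a hypothetical helper written for this example, not a function from any package, and it does nothing more than the manual edit described above.

```r
# Hypothetical helper (not from any package): append raw=true to a GitHub file URL
github_raw_url <- function(page_url) {
  # Leave the URL alone if it already carries the raw flag
  if (grepl("raw=true", page_url, fixed = TRUE)) {
    return(page_url)
  }
  # Use & if the URL already has a query string, otherwise ?
  separator <- if (grepl("?", page_url, fixed = TRUE)) "&" else "?"
  paste0(page_url, separator, "raw=true")
}

page_url <- "https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/blob/main/myfakedata.rda"
raw_url <- github_raw_url(page_url)
```

Calling the helper on a URL that already ends in ‘?raw=true’ returns it unchanged, so it is safe to apply twice.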

myurl <- "https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/blob/main/myfakedata.rda?raw=true"
load(url(myurl))
print(myfakedata_test)  # load() restores the object under the name it was saved with, not the file name
##    gaussian   uniform
## 1  47.19762 0.8895393
## 2  48.84911 0.6928034
## 3  57.79354 0.6405068
## 4  50.35254 0.9942698
## 5  50.64644 0.6557058
## 6  58.57532 0.7085305
## 7  52.30458 0.5440660
## 8  43.67469 0.5941420
## 9  46.56574 0.2891597
## 10 47.77169 0.1471136
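
One thing to keep in mind is that load() restores objects under the names they had when save() was called, whatever the file is called, and it will silently overwrite anything in your workspace with the same name. A hedged sketch of one way around this is to load into a separate environment and look at what arrived. The helper load_as_list() is hypothetical, and the example uses a small local stand-in file so that it runs without internet access.

```r
# Hypothetical helper: load an rda file (a path or a connection) into its own
# environment and return the objects it contains as a named list
load_as_list <- function(file) {
  env <- new.env()
  object_names <- load(file, envir = env)  # load() returns the restored names
  mget(object_names, envir = env)
}

# Offline stand-in for the GitHub file: an object saved under one name
# in a file with a different name
myfakedata_test <- data.frame(gaussian = c(47.2, 48.8), uniform = c(0.89, 0.69))
tmp <- file.path(tempdir(), "myfakedata.rda")
save(myfakedata_test, file = tmp)

loaded <- load_as_list(tmp)
names(loaded)  # the object name comes from save(), not from the file name

# With the GitHub URL stored in myurl this would be (needs internet access):
# loaded <- load_as_list(url(myurl))
```

The data frame is then available as loaded$myfakedata_test, and nothing in the global workspace has been touched.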

Now when you want to distribute data to collaborators or have people download it for a class, you don’t have to send them the data. Just have them download (and load) it from GitHub.