Sometimes it’s easiest from a sharing standpoint just to read data
from somewhere like GitHub rather than distributing it with a markdown
file and messing with individual idiosyncratic directories. We can post data
to a public repository like:
https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/tree/main
The trick is getting the data from GitHub into R. Let’s go through the process of creating some data, saving it as an rda file, posting it to GitHub, and then reading it back in. You will need to install the ‘RCurl’ library.
# Here are some fake data
set.seed(123)
myfakedata_test <- data.frame(cbind(gaussian = rnorm(10, 50, 5), uniform = runif(10))) #using cbind
print(myfakedata_test)
## gaussian uniform
## 1 47.19762 0.8895393
## 2 48.84911 0.6928034
## 3 57.79354 0.6405068
## 4 50.35254 0.9942698
## 5 50.64644 0.6557058
## 6 58.57532 0.7085305
## 7 52.30458 0.5440660
## 8 43.67469 0.5941420
## 9 46.56574 0.2891597
## 10 47.77169 0.1471136
Let’s save that object as an rda file and as a CSV. But we will do something snazzy: marking each file with the system date so it has a distinct name. If you don’t add row.names = FALSE, the first column of your CSV will be a sequence of numbers, an artifact of the row names from R.
currentDate <- Sys.Date()
# paste0() joins the pieces with no separator, giving data/myfakedata-YYYY-MM-DD.rda and .csv
save(myfakedata_test, file = paste0("data/myfakedata-", currentDate, ".rda"))
write.csv(myfakedata_test, file = paste0("data/myfakedata-", currentDate, ".csv"), row.names = FALSE)
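As a quick check on the naming step, here is a minimal sketch that builds date-stamped file names, writes both files, and reads the CSV back. It writes to a temporary folder rather than data/ so that it runs anywhere. The names out_dir, stamp, rda_path, csv_path, and csv_back are illustrative and not part of the workflow above.

```r
# Recreate the fake data (same seed as above)
set.seed(123)
myfakedata_test <- data.frame(gaussian = rnorm(10, 50, 5), uniform = runif(10))

# Assumption: a throwaway folder stands in for the project's data/ folder
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Build names of the form myfakedata-YYYY-MM-DD.<ext>
stamp <- format(Sys.Date(), "%Y-%m-%d")
rda_path <- file.path(out_dir, paste0("myfakedata-", stamp, ".rda"))
csv_path <- file.path(out_dir, paste0("myfakedata-", stamp, ".csv"))

save(myfakedata_test, file = rda_path)
write.csv(myfakedata_test, file = csv_path, row.names = FALSE)

# Read the CSV back: with row.names = FALSE there is no extra index column
csv_back <- read.csv(csv_path)
print(names(csv_back))
```

If you drop row.names = FALSE and rerun, read.csv() should show an extra leading column (named X by default) holding the row numbers.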
Here’s the file in your data folder!
Let’s push that file to GitHub using GitHub Desktop. I’ll first
change the name to ‘myfakedata.rda’ to make it easier. I have already
set up a public repository called reillylab_data for storing and
distributing raw data. I have also cloned this repository on my own
computer (it appears as a folder). I will drop ‘myfakedata.rda’ into
that folder and sync with GitHub Desktop. Here it is ready for staging.
You will just commit and push it to the origin.
Now if you look on GitHub, the file should automagically be there!
Okay, so in this next step we want to start with a clean slate and
read myfakedata.rda into R from GitHub. The first step is to find its
URL and write that to an object. You can find that URL by navigating to
the file and copying its URL like this:
Now this is a key step! You have to add ?raw=true to the end of the
URL for it to work.
This URL you just copied will not work!
https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/blob/main/myfakedata.rda
This URL with ‘?raw=true’ added to the end will work. No backslashes
– nothing fancy
https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/blob/main/myfakedata.rda?raw=true
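The URL rule above can be sketched as a tiny helper. The function names to_raw_url and to_raw_host_url are hypothetical. The second form, which points at raw.githubusercontent.com, is an alternative address that GitHub commonly serves raw files from; treat it as an assumption to verify for your own repository.

```r
# The URL copied from the browser (a "blob" page, i.e. an HTML view of the file)
blob_url <- "https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/blob/main/myfakedata.rda"

# Hypothetical helper: append ?raw=true so GitHub returns the file itself
to_raw_url <- function(u) paste0(u, "?raw=true")

# Hypothetical helper: rewrite to the raw.githubusercontent.com form (assumed equivalent)
to_raw_host_url <- function(u) {
  u <- sub("https://github.com/", "https://raw.githubusercontent.com/", u, fixed = TRUE)
  sub("/blob/", "/", u, fixed = TRUE)
}

print(to_raw_url(blob_url))
print(to_raw_host_url(blob_url))
```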
myurl <- "https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/blob/main/myfakedata.rda?raw=true"
load(url(myurl))
print(myfakedata)
## gaussian uniform
## 1 47.19762 0.8895393
## 2 48.84911 0.6928034
## 3 57.79354 0.6405068
## 4 50.35254 0.9942698
## 5 50.64644 0.6557058
## 6 58.57532 0.7085305
## 7 52.30458 0.5440660
## 8 43.67469 0.5941420
## 9 46.56574 0.2891597
## 10 47.77169 0.1471136
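One thing worth knowing: load() restores objects under the names they had when they were saved, not under the file name, so renaming the file does not rename the object inside it. load() also returns those names, which is handy if you are not sure what a file contains. Here is a small sketch using a local temporary file so it runs offline; sandbox, loaded_names, and tmp are illustrative names, and the same pattern should apply to load(url(myurl)).

```r
# Save a small data frame under the name 'myfakedata_test'
tmp <- tempfile(fileext = ".rda")
myfakedata_test <- data.frame(gaussian = c(1, 2), uniform = c(0.1, 0.2))
save(myfakedata_test, file = tmp)
rm(myfakedata_test)

# Load into a separate environment so nothing in the workspace is overwritten
sandbox <- new.env()
loaded_names <- load(tmp, envir = sandbox)
print(loaded_names)  # [1] "myfakedata_test"

# Pull the object out under whatever name you prefer
myfakedata <- get(loaded_names[1], envir = sandbox)
print(myfakedata)
```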
Now when you want to distribute data to collaborators or have people download it for a class, you don’t have to send them the data. Just have them download (and load) it from GitHub.
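If you would rather hand collaborators a single function, one possible approach is to download the file to a temporary location first and then load it. This is a sketch, not the method used above: load_rda_from_url is a hypothetical helper, and the demo uses a local file:// URL (Unix-style path assumed) so that it runs without a network connection.

```r
# Hypothetical helper: download an .rda file and return its objects as a named list
load_rda_from_url <- function(u) {
  tmp <- tempfile(fileext = ".rda")
  on.exit(unlink(tmp))
  # mode = "wb" keeps the binary file intact, which matters on Windows
  download.file(u, destfile = tmp, mode = "wb", quiet = TRUE)
  env <- new.env()
  nms <- load(tmp, envir = env)
  mget(nms, envir = env)
}

# Offline demo: save a small object, then fetch it through a file:// URL
demo_df <- data.frame(gaussian = c(1, 2), uniform = c(0.1, 0.2))
local_file <- tempfile(fileext = ".rda")
save(demo_df, file = local_file)
res <- load_rda_from_url(paste0("file://", local_file))
print(names(res))

# With a network connection, the call would look like this:
# res <- load_rda_from_url("https://github.com/Reilly-ConceptsCognitionLab/reillylab_data/blob/main/myfakedata.rda?raw=true")
```

Returning a list keeps the loaded objects out of the caller's workspace until they decide what to name them.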