R Hero saves Backup City with archivist and GitHub
Have you ever suffered because you could not reproduce graphs, tables or analysis results in R? Have you ever been annoyed that you could not share R objects (e.g., plots or final analysis models) within your reports, posters or articles? Or maybe you simply have too many objects to store in a convenient and handy way? Now you can share partial results of an analysis, provide hooks to valuable R objects within articles, manage analysis results and restore objects’ pedigree with the archivist package and its extension archivist.github, all automatically through GitHub without closing RStudio. If you are tired of archiving results by yourself, then read how you can become an R Hero with the power of the archivist.github package.
R Hero archiving power
Recently I visited Backup City, a data analysis mecca in the middle of Reproducible Research RLand. That’s where I overheard a feverish discussion between R Hero and commissar O’Rdon. You can read the story of their meeting in the opening comic.
archivist.github: archivist and GitHub integration
archivist.github is an extension of the archivist package that provides tools for archiving, managing and sharing R objects via GitHub. You can install the package from CRAN.
I have prepared a workflow graph to visualize the functionalities of archivist.github, and I explain its core powers in this post.
After you’ve created a GitHub developer application (the process is described at archivist.github: 2.1 OAuth open authorization; set Homepage URL to http://github.com and Authorization callback URL to http://localhost:1410), you will be able to create repositories on GitHub automatically from the R console. Below is an example of how to authorise with the GitHub API (using your application’s Client ID and Client Secret), create a GitHub repository with an archivist-like Repository and automatically archive an R object on GitHub.
One can check that the artifact is really on GitHub and that the commit was performed (with the great help of the git2r package).
Each object (referred to as an artifact) is archived with its metadata and md5hash, in case someone would like to restore or search for archived objects within the Repository.
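To make the role of the md5hash more concrete, here is a minimal base-R sketch of the general idea of storing an object under the hash of its content and restoring it by that hash. It is only an illustration: `store_artifact()`, `restore_artifact()` and `repo_dir` are hypothetical names, and the archivist packages use their own Repository layout and hashing, not this code.

```r
# Hypothetical helpers (not archivist functions): each object is kept in a
# file named after the MD5 hash of its serialized form.
repo_dir <- file.path(tempdir(), "toy_repo")
dir.create(repo_dir, showWarnings = FALSE)

store_artifact <- function(x, dir = repo_dir) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  saveRDS(x, tmp, compress = FALSE)      # serialize the object to disk
  hash <- unname(tools::md5sum(tmp))     # 32-character hexadecimal MD5 hash
  file.copy(tmp, file.path(dir, paste0(hash, ".rds")), overwrite = TRUE)
  hash
}

restore_artifact <- function(hash, dir = repo_dir) {
  readRDS(file.path(dir, paste0(hash, ".rds")))
}

# PlantGrowth is a dataset shipped with R; it is used here only as an example.
group_means <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)
hash <- store_artifact(group_means)
restored <- restore_artifact(hash)
```

Anyone who knows the hash can get the object back with `restore_artifact(hash)`, which is the property that makes a hash usable as a hook in a report or article.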
Partial results archiving and objects’ pedigree restoration
We have prepared %a%, an extended version of the pipe operator %>%, so that every partial result of an analysis workflow can be archived. Below is an example of workflow archiving for RTCGA (about which I wrote here) RNASeq data (gene expression; a broader example can be found here) and the restoration of its pedigree.
|1||filter(substr(bcr_patient_barcode, 14, 15) == "01")||1da5a026aae19e0d0467ba3773679e28|
[[env]] is the object before transformations. We are working on using original names for objects in this issue.
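The idea of a pipe that remembers every step can be sketched in a few lines of base R. The operator `%rec%`, the helper `md5_of()` and the `pedigree` log below are hypothetical names invented for this illustration; this is not how `%a%` is implemented, only a toy version of "apply a step, hash the partial result, write both down".

```r
# Toy pedigree log: one row per step, holding the call and the result's hash.
pedigree <- new.env()
pedigree$log <- data.frame(call = character(), md5hash = character(),
                           stringsAsFactors = FALSE)

# MD5 hash of an object's serialized form (hypothetical helper).
md5_of <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  saveRDS(x, tmp, compress = FALSE)
  unname(tools::md5sum(tmp))
}

# Pipe-like operator: passes the left-hand side as the first argument of the
# call on the right-hand side and records the step in the pedigree log.
`%rec%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)
  stopifnot(is.call(rhs_call))           # the right-hand side must be a call
  full_call <- as.call(c(rhs_call[[1]], list(quote(.lhs)),
                         as.list(rhs_call)[-1]))
  env <- new.env(parent = parent.frame())
  env$.lhs <- lhs
  result <- eval(full_call, env)
  pedigree$log <- rbind(
    pedigree$log,
    data.frame(call = paste(deparse(rhs_call), collapse = " "),
               md5hash = md5_of(result), stringsAsFactors = FALSE)
  )
  result
}

# mtcars is a dataset shipped with R; it is used here only as an example.
res <- mtcars %rec% subset(cyl == 4) %rec% head(3)
pedigree$log   # one row per recorded step
```

Each row of `pedigree$log` pairs a call with the hash of the object it produced, which is the same kind of information as the call and md5hash columns in the table row shown above.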
This operation does not archive objects automatically on GitHub, as this is functionality from the base archivist package. Objects have to be uploaded to GitHub separately.
print() to use
After specifying global parameters (the aoptions() function sets the ‘user’, ‘repo’, and ‘password’ parameters globally for every archivist.github and archivist function), we don’t have to call the archive function after each call to provide hooks in rmarkdown reports. We can overload the print() function for specific classes so that printing an object also archives it.
Call: lm(formula = weight ~ group, data = pld)
Call: lm(formula = weight ~ group - 1, data = pld)
This is the GitHub equivalent of local archiving with addHooksToPrint.
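The mechanism behind overloading print() can be sketched with plain S3 dispatch in base R. In the sketch below `add_print_hook()`, `archive_stub()` and `archive_log` are hypothetical names, and the stub only records what it was asked to archive instead of uploading anything; the real hook functions in archivist and archivist.github may work differently.

```r
# Hypothetical stand-in for an archiving function: it only notes the class of
# each object it receives.
archive_log <- new.env()
archive_log$classes <- character()

archive_stub <- function(x) {
  archive_log$classes <- c(archive_log$classes, class(x)[1])
  invisible(x)
}

# Wrap the existing print method of a class so the hook runs before printing.
add_print_hook <- function(class_name, hook) {
  original <- utils::getS3method("print", class_name)
  wrapper <- function(x, ...) {
    hook(x)              # "archive" the object first
    original(x, ...)     # then print it exactly as before
  }
  # A method defined in the global environment is found by S3 dispatch
  # before the method registered by the package.
  assign(paste0("print.", class_name), wrapper, envir = globalenv())
  invisible(wrapper)
}

add_print_hook("lm", archive_stub)

# PlantGrowth is a dataset shipped with R; it is used here only as an example.
fit <- lm(weight ~ group, data = PlantGrowth)
print(fit)   # prints the model as usual and also calls archive_stub(fit)
```

After this, every printed lm object passes through the hook, so a report that prints its models gets them archived without any extra calls in the text.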
Feedback and Notes
If you have any comments or user requests, please see the Feedback and Notes section to learn about our future plans.
More examples can be found in the archivist.github Tutorial, or you can learn more during @pbiecek’s talk How to use the archivist package to boost reproducibility of your research at the useR2016 Conference.
If you’d like to meet more R Heroes, restore the message that was archived for commissar O’Rdon.
Paintings were made by pedzlenie.