Rocker: explanation and motivation for using Docker containers in application development


Marcin Kosiński

      

September 30, 2016

What can be called an (R) application?

An application program is a computer program designed to perform a group of coordinated functions, tasks, or activities for the benefit of the user.

a spreadsheet, a web browser, a media player…

or executable R code!

source('01_read_data.R')
source('02_data_everything.R')
source('03_send_data.R')

What is the development speed of the main R packages?

Speed of R development?

Source: Why should you backup your R objects? by pbiecek

Problems of using (R) applications?

R applications are used in various environments on different platforms (development/production).
Each environment might have a different

  • base version of R
  • versions of R packages
  • versions of dependent software (java/spark)
  • global system variables

or might lack some of them entirely

data.frame(
  value = Sys.getenv(
    c('JAVA_HOME', 'LANG', 'HADOOP_CONF_DIR')))
                                            value
JAVA_HOME       /usr/lib/jvm/java-7-openjdk-amd64
LANG                                  pl_PL.UTF-8
HADOOP_CONF_DIR                                  
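
The same comparison can be made from the shell before R even starts. The following is a minimal sketch, not part of the original slides: `report_env` is a hypothetical helper name, and it relies only on the standard `printenv` utility to show each variable as a child R process would inherit it.

```shell
# Hypothetical helper: print NAME=value for every exported variable named
# in the arguments, or NAME=<unset> when the variable is not in the environment.
report_env() {
  for name in "$@"; do
    # printenv exits with a non-zero status when the variable is not exported.
    if value="$(printenv "$name")"; then
      printf '%s=%s\n' "$name" "$value"
    else
      printf '%s=<unset>\n' "$name"
    fi
  done
}

# The same three variables as in the R example above.
report_env JAVA_HOME LANG HADOOP_CONF_DIR
```

Running this on the development and production machines and comparing the output is a quick way to spot a variable that is set on one platform and missing on the other.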

pandoc version 1.12.3 or higher is required and was not found (R shiny)

# pandoc version 1.12.3 or higher is required and was not found.

rocker/shiny

Sys.setenv(RSTUDIO_PANDOC="/opt/shiny-server/ext/pandoc")
Sys.getenv('RSTUDIO_PANDOC')
[1] "/usr/lib/rstudio/bin/pandoc"

rmarkdown::render freezes because pandoc freezes when LC_ALL and LANG are unset
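
The version requirement in the first error can be checked from the shell before rendering anything. This is a sketch under stated assumptions: `version_ge` is a hypothetical helper, it depends on `sort -V` (version sort, as provided by GNU coreutils), and it assumes the first line of `pandoc --version` has the form `pandoc <version>`.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# The smaller of the two versions sorts first, so $1 >= $2 exactly when
# the first line of the sorted pair is $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="1.12.3"   # minimum version quoted in the error message above

if command -v pandoc >/dev/null 2>&1; then
  # Assumption: the first line looks like "pandoc 1.19.2".
  installed="$(pandoc --version | head -n 1 | awk '{print $2}')"
  if version_ge "$installed" "$required"; then
    echo "pandoc $installed is recent enough"
  else
    echo "pandoc $installed is older than $required"
  fi
else
  echo "pandoc not found on PATH"
fi
```

The check only covers a pandoc found on `PATH`; a pandoc that is reachable solely through `RSTUDIO_PANDOC` would need that directory added to `PATH` first.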

Code example - Can’t gather tibble in R

# Session with dplyr 0.4.2
library(dplyr) # provides select() and %>%
library(tidyr)
iris %>%
  select(-Sepal.Width) %>%
  gather(Species) %>% head
  Species      Species value
1  setosa Sepal.Length   5.1
2  setosa Sepal.Length   4.9
3  setosa Sepal.Length   4.7
4  setosa Sepal.Length   4.6
5  setosa Sepal.Length   5.0
6  setosa Sepal.Length   5.4
# The same code in a session with dplyr 0.4.3
Error: Each variable must have a unique name.
Problem variables: 'Species'

Object example

An object created in one version of ggplot2 can’t be printed in another.

library(ggplot2)
library(archivist)
archivist::aread('pbiecek/archivist/scripts/packDev/923ec99f79cce099408d4973471dd30d')
Error in FUN(X[[i]], ...) : attempt to apply non-function
5. FUN(X[[i]], ...)
4. lapply(layers, function(y) y$layer_data(plot$data))
3. ggplot_build(x)
2. print.ggplot(x)
1. function (x, ...) UseMethod("print")(x)

A partial solution is to restore the libraries from the session in which the object was created.

Dependent software example

Can’t install the git2r or devtools R packages on CentOS 7.0 (64-bit)

configure: error: OpenSSL library required
See `config.log' for more details
ERROR: configuration failed for package ‘git2r’
* removing ‘/usr/lib64/R/library/git2r’
ERROR: dependency ‘git2r’ is not available for package ‘devtools’
* removing ‘/usr/lib64/R/library/devtools’

Lack of OpenSSL

yum install openssl-devel

Docker

What is Docker?

Package your application into a standardized unit for software development

Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries – anything that can be installed on a server. This guarantees that the software will always run the same, regardless of its environment.

Docker’s architecture

Source: What is Docker’s architecture?

Basic commands

docker build -t tag path   # build an image locally
docker push name[:tag]     # upload an image to a Docker registry
docker pull name[:tag]     # download an image from a registry
docker run -it / -d        # run a container (interactive / detached)
docker images              # list local images
docker ps                  # list running containers
docker rmi                 # remove an image
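
The commands above are typically chained into a build, run, publish cycle. The sketch below is illustrative only: the image name `myuser/myapp:0.1` is hypothetical, and `DRY_RUN` is a convention of this script (not a Docker feature) that prints each command instead of executing it, so the sequence can be inspected on a machine without Docker.

```shell
# Hypothetical image name and tag; replace with your own.
IMAGE="${IMAGE:-myuser/myapp:0.1}"

# Run a docker command, or only print it when DRY_RUN=1.
dk() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker $*"
  else
    docker "$@"
  fi
}

# One pass through the basic commands, in the order they are usually needed.
workflow() {
  dk build -t "$IMAGE" .   # build the image from ./Dockerfile
  dk images                # confirm the image is listed locally
  dk run -d "$IMAGE"       # start a detached container from it
  dk ps                    # confirm the container is running
  dk push "$IMAGE"         # publish it (requires a prior docker login)
}

# Usage: DRY_RUN=1 workflow    # print the five commands without running them
```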

What might Rocker be?

Using (useful) Rockers/Dockers

Because using is simpler than creating.

Docker containers for Bioconductor

docker run -ti bioconductor/devel_base R

Rocker - R configurations for Docker

docker run -d -p 8787:8787 rocker/rstudio
docker run -d -p 80:3838 \
    -v /srv/shinyapps/:/srv/shiny-server/ \
    -v /srv/shinylog/:/var/log/ \
    rocker/shiny

Creating Dockers/Rockers

rocker/hadleyverse/Dockerfile

rocker/rstudio

## Start with the official rocker image providing 'base R' 
FROM r-base:latest
...

rocker/r-base

FROM debian:testing

Bigger example

CzasDojazdu - Dockerfile

FROM rocker/hadleyverse:latest 
MAINTAINER Marcin Kosiński "m.p.kosinski@gmail.com"
RUN R -e "install.packages('shinydashboard', \
                repos='https://cran.rstudio.com/')"
...
RUN mkdir -p app/Rscripts app/dane app/dicts

ADD Rscripts /app/Rscripts
ADD dane /app/dane
ADD dicts /app/dicts
ADD 000_runme.R /app/

VOLUME /srv/shiny-server/CzasDojazdu/
WORKDIR /app
CMD R -f /app/000_runme.R

Overall benefits

Thanks

Get started with Docker

Thanks

More on R bloggers: R 3.3.0 is another motivation for Docker

Potential question - Docker vs Virtual Machine

How is Docker different from a normal virtual machine?

So, when do you use a Container or VM?

Docker vs VMs

Potential question - Docker on Windows

Docker for Windows

Getting Started with Docker for Windows