R / RStudio

Description#

At IFB Core, R and RStudio server share the same software base.

It is possible to access the same R deployment, with the same packages, via RStudio and via SSH, which makes it easy to switch from one interface to the other. The core of R is indeed provided by Conda packages.

Note that through the CLI (Command Line Interface), we can provide several R versions in parallel, but only one is available via RStudio Server.

RStudio#

RStudio is a web interface on top of R with some useful features. It can be installed either on a desktop or provided through a shared server.

RStudio Server on the IFB Core cluster is available at:

https://ondemand.cluster.france-bioinformatique.fr

R#

R can also be used in interactive mode through a terminal.

$ ssh login@core.cluster.france-bioinformatique.fr

$ # Through the CLI, we can access different R versions in parallel
$ module avail r/
------------------------------------------------------------------------------------------------------- /shared/software/modulefiles -------------------------------------------------------------------------------------------------------
r/3.5.1  r/3.6.3  r/4.0.0
$ module load r/3.6.3

$ # With srun --pty, we get an interactive session on one of the Slurm nodes
$ srun --pty R

R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
[...]

> 1+1
[1] 2

Rscript#

Rscript lets you run an R script as a stand-alone job, possibly in the background.

Pros:

  • If the job is rather long, it can be launched in the background using sbatch
  • For reproducibility, you keep all your R commands in a text file you can reuse in the future
  • It can take arguments, so the same script can be reused with different sets of data
  • It can be shared and published

Cons:

  • Since it is stand-alone, and therefore self-sufficient, it needs all its parameters and inputs in advance.

A sample script#

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
print("Hello, World!")
print(args[1])

In the terminal#

$ module load r/3.6.3

$ srun Rscript helloworld.r Sandy
[1] "Hello, World!"
[1] "Sandy"

Packages#

R Packages are installed:
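From within an R session, you can always check where packages are searched for and loaded from. A quick sketch (using the base package parallel purely as an example):

```r
# Library paths R searches for installed packages, in priority order
print(.libPaths())

# Where a given (here: base) package actually lives on disk
print(find.package("parallel"))
```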

Migrate from RStudio to R/Rscript#

Why migrate from RStudio to Rscript and bother with Linux, Rscript and Slurm?

Because you need power, and:

  • Our IFB Core RStudio server has limited resources: 56 CPUs and 512 GB of RAM.
  • These resources are shared between all users.
  • If a user consumes too many resources, the server becomes overloaded and every user session suffers a loss of performance.

The IFB cluster, on the other hand, offers 70 compute nodes shared between users. Once you have booked resources, they are reserved for you alone.

[RStudio] Save the RStudio script#

If you have already done some tests in RStudio, you can save your script in a .r text file.

[Rscript] Adapt the script to Rscript#

As was said before:

Since it is stand-alone, and therefore self-sufficient, it needs all its parameters and inputs in advance.

So we need to adapt, one by one, the commands we ran in RStudio to produce a stand-alone R script.

To edit and transfer your script, please look at this tutorial: quick-start#transfer

In this example, we will:

  • Pass a file as argument to the script
  • Plot the data in a pdf
  • Save the session in an RData file

The script: plot.r#

#!/usr/bin/env Rscript

# Deal with arguments
args = commandArgs(trailingOnly=TRUE)
if (length(args) < 1) {
  stop("Usage: Rscript plot.r <input file>")
}
myfile = args[1]

# Read the file (tab-separated, no header line)
mydata = read.table(myfile, header=FALSE, sep="\t")

print(head(mydata))

# Open a pdf file
pdf("rplot.pdf")
# Create a plot
plot(mydata)
# plot(mydata) # calling plot() again would add a second page to the pdf
# Close the pdf file
dev.off()

# Save the R session
save.image(file = "myplot.RData")

In the terminal#

$ # Easy!
$ cd /shared/project/myfancyproject

$ # Generation of a little dummy file:
$ paste <(seq 0 10) <(seq 0 10) > 10.tab

$ # Load the R environment
$ module load r/3.6.3

$ # Launch your job on the cluster with srun
$ srun Rscript plot.r 10.tab
  V1 V2
1  0  0
2  1  1
3  2  2
4  3  3
5  4  4
6  5  5
null device
        1
$ # Check that we get our expected outputs
$ ls -l rplot.pdf myplot.RData
-rw-rw-r-- 1 user group  218 Jul 29 15:25 myplot.RData
-rw-rw-r-- 1 user group 4598 Jul 29 15:25 rplot.pdf
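The myplot.RData file written above can be reloaded later, in R or RStudio, with load(). Here is a minimal self-contained sketch of the round trip (the file name session_demo.RData is only illustrative):

```r
# save.image() writes every object of the current session to a file
x <- 42
save.image(file = "session_demo.RData")

# Simulate a fresh session by deleting the object...
rm(x)

# ...then restore it from the saved file
load("session_demo.RData")
print(x)  # [1] 42
```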

[Slurm] [Rscript] MORE POWER! :rocket:#

More CPU, more Memory#

By default, without any argument, a job launched with srun is only allowed to use:

  • 2 GB of RAM
  • 1 CPU

To increase the resources available for the job:

srun --mem 20GB --cpus-per-task 4 Rscript plot.r 10.tab
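To actually benefit from extra CPUs, the R code itself has to use them. A hedged sketch with the parallel package (part of base R): when a job is launched with a CPU allocation, Slurm exposes it through the SLURM_CPUS_PER_TASK environment variable, so the script can size its worker pool accordingly (falling back to 1 outside a job):

```r
library(parallel)

# CPUs granted by Slurm (--cpus-per-task); fall back to 1 outside a job
n_cpus <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))

# Spread an example computation over the allocated CPUs
results <- mclapply(1:8, function(i) i^2, mc.cores = n_cpus)
print(unlist(results))  # squares of 1..8
```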

Use sbatch#

srun has some limits:

  • If the connection is lost, the job is stopped.
  • It can be useful to keep a record of your analyses, for example to rerun them later.

In order to use sbatch, you need a small bash script containing some parameters for it.

A sbatch script: rplot.sbatch#

#!/bin/bash
#
#SBATCH -o slurm.%N.%j.out
#SBATCH -e slurm.%N.%j.err

#SBATCH --partition fast
#SBATCH --mem 20GB
##SBATCH --cpus-per-task=4

module load r/3.6.3
Rscript plot.r 10.tab

In the terminal#

sbatch rplot.sbatch
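sbatch returns as soon as the job is queued. Standard Slurm commands (not IFB-specific) can then be used to follow the job; the job id 123456 below is illustrative:

```shell
$ sbatch rplot.sbatch
Submitted batch job 123456

$ # List your own queued and running jobs
$ squeue -u $USER

$ # Once the job is finished, the captured output is in the
$ # files declared with #SBATCH -o / -e
$ cat slurm.*.out
```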