
R / RStudio

Description

At the IFB, R and RStudio Server share the same software base.

It is possible to access the same R deployment, with the same packages, via both RStudio and SSH, which makes it easy to switch from one interface to the other. The core of R is indeed provided by Conda packages.

Note that through the CLI (Command Line Interface), we can provide several R versions in parallel, but only one is available via RStudio Server.
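As a quick check (the module version is an example and the exact path will differ), you can verify that the R reached through SSH is the Conda-provided deployment:

```shell
$ # Load one of the R versions provided by the cluster
$ module load r/3.6.3
$ # The reported path should point into the shared Conda-based software tree
$ which R
$ R --version
```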

RStudio

RStudio is a web interface on top of R with some useful features. It can be installed either on a desktop or provided through a shared server.

The IFB Core Cluster provides an RStudio Server on a machine with 56 CPUs and 512 GB of RAM.

https://rstudio.cluster.france-bioinformatique.fr/

R

R can also be accessed in interactive mode through a terminal.

$ ssh login@core.cluster.france-bioinformatique.fr

$ # Through the CLI, we can access different R versions in parallel
$ module avail r/
------------------------------------------------------------------------------------------------------- /shared/software/modulefiles -------------------------------------------------------------------------------------------------------
r/3.5.1  r/3.6.3  r/4.0.0
$ module load r/3.6.3

$ # With srun --pty, we run R on one of the Slurm nodes while keeping an interactive session with it
$ srun --pty R

R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
[...]

> 1+1
[1] 2

Rscript

Rscript enables launching a stand-alone job in the background.

Pros:
- If the job is rather long, it can be launched in the background (using sbatch)
- For reproducibility, you keep all your R commands in a text file you can reuse in the future
- It can manage arguments, so the same script can be used with different sets of data
- It can be shared and published

Cons:
- Since it's stand-alone and thus self-sufficient, it needs to have all the parameters and inputs in advance.

A sample script

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
print("Hello, World!")
print(args[1])

In the terminal

$ module load r/3.6.3

$ srun Rscript helloworld.r Sandy
[1] "Hello, World!"
[1] "Sandy"

Packages

R packages are installed:
- On demand via the IFB Community Support
- By yourself, using install.packages() for example
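A minimal sketch of a self-service install, assuming the compute node can reach a CRAN mirror (the package name and repository URL below are examples):

```r
# Install into your personal library; R offers to create it on first use,
# and it is the first entry of .libPaths()
install.packages("data.table", repos = "https://cloud.r-project.org")

# Then load the package as usual
library(data.table)
```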

Migrate from RStudio to R/Rscript

Why migrate from RStudio to Rscript and bother with Linux, Rscript and Slurm?

Because you need power, and:
- Our IFB RStudio Server has limited resources: 56 CPUs and 512 GB of RAM.
- These resources are shared between users.
- If a user consumes too many resources, the server gets overloaded and all user sessions suffer a loss of performance.

On the IFB Cluster, 70 compute nodes are shared between users. Once you have booked resources, they are reserved for you alone.

[RStudio] Save the RStudio script

If you have already done some tests in RStudio, you can save your script as a .r text file.

[Rscript] Adapt the script to Rscript

As was said before:

Since it's stand-alone and so self-sufficient, it needs to have all the parameters, and inputs in advance.

So we need to adapt, one by one, the commands we ran in RStudio to produce a stand-alone R script.

To edit and transfer your script, please look at this tutorial: quick-start/#transfer

In this example, we will:
- Pass a file as an argument to the script
- Plot the data in a PDF
- Save the session in an RData file

The script: plot.r

#!/usr/bin/env Rscript

# Deal with arguments
args = commandArgs(trailingOnly=TRUE)
myfile = args[1]

# Read the file
mydata = read.table(myfile, header=FALSE, sep="\t")

print(head(mydata))

# Open a pdf file
pdf("rplot.pdf")
# Create a plot
plot(mydata)
# plot(mydata) # calling plot() again would add a page to the PDF
# Close the pdf file
dev.off()

# Save the R Session
save.image(file = "myplot.RData")

In the terminal

$ # Easy!
$ cd /shared/project/myfancyproject

$ # Generation of a little dummy file:
$ paste <(seq 0 10) <(seq 0 10) > 10.tab

$ # Load the R environment
$ module load r/3.6.3

$ # Launch your job on the cluster with srun
$ srun Rscript plot.r 10.tab
V1 V2
1  0  0
2  1  1
3  2  2
4  3  3
5  4  4
6  5  5
null device
        1
$ # Check that we get our expected outputs
$ ls -l rplot.pdf myplot.RData
-rw-rw-r-- 1 user group  218 Jul 29 15:25 myplot.RData
-rw-rw-r-- 1 user group 4598 Jul 29 15:25 rplot.pdf

[Slurm][Rscript] MORE POWER! :rocket:

More CPU, more Memory

Without any argument, by default, srun will only be allowed to use:
- 2 GB of RAM
- 1 CPU

To increase the resources available for the job:

$ srun --mem 20GB --cpus-per-task 4 Rscript plot.r 10.tab
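Booking several CPUs only helps if R actually uses them. A sketch with base R's parallel package, reading the CPU count that Slurm exports to the job (SLURM_CPUS_PER_TASK; the fallback to 1 is an assumption for runs outside a job):

```r
library(parallel)

# Number of CPUs granted by Slurm; default to 1 outside a Slurm job
ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "1"))

# Spread an example workload over all booked cores
results <- mclapply(1:8, function(i) i^2, mc.cores = ncores)
print(unlist(results))
```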

Use sbatch

srun has some limits:
- If the connection is lost, the job is stopped.
- It can also be useful to keep a record of your analyses, to rerun them later for example.

In order to use sbatch, you need a small Bash script that carries its parameters.

A sbatch script: rplot.sbatch

#!/bin/bash
#
# Standard output and error files (%N: node name, %j: job ID)
#SBATCH -o slurm.%N.%j.out
#SBATCH -e slurm.%N.%j.err

#SBATCH --partition fast
#SBATCH --mem 20GB
# The double '#' below disables the directive; remove one '#' to book 4 CPUs
##SBATCH --cpus-per-task 4

module load r/3.6.3
Rscript plot.r 10.tab

In the terminal

$ sbatch rplot.sbatch
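Once submitted, the job runs detached from your terminal; you can follow it with standard Slurm commands (the job ID 123456 below is a placeholder for the ID printed by sbatch):

```shell
$ # List your pending and running jobs
$ squeue -u $USER
$ # Inspect a job once it has run (state, elapsed time, peak memory)
$ sacct -j 123456 --format=JobID,State,Elapsed,MaxRSS
$ # Read the output files declared in the sbatch script
$ cat slurm.*.123456.out
```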