R / RStudio
Description#
At IFB Core, R and RStudio Server share the same software base.
The same R deployment, with the same packages, is accessible both via RStudio and via SSH. This makes it easy to switch from one interface to the other. The core of R is indeed provided by Conda packages.
Note that through the CLI (Command Line Interface), several R versions are available in parallel, but only one is available via RStudio Server.
RStudio#
RStudio is a web interface on top of R with some useful features. It can be installed either on a desktop or provided through a shared server, as at IFB:
https://ondemand.cluster.france-bioinformatique.fr
R#
R can also be accessed in interactive mode through a terminal.
$ ssh login@core.cluster.france-bioinformatique.fr
$ # Through the CLI, we can access different R versions in parallel
$ module avail r/
------------------------------------------------------------------------------------------------------- /shared/software/modulefiles -------------------------------------------------------------------------------------------------------
r/3.5.1 r/3.6.3 r/4.0.0
$ module load r/3.6.3
$ # With srun --pty, we access one of the Slurm nodes and keep an interactive session on it
$ srun --pty R
R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
[...]
> 1+1
[1] 2
Rscript#
Rscript enables launching a stand-alone job in the background.
Pros:
- If the job is rather long, it can be launched in the background using sbatch
- For reproducibility, you keep all your R commands in a text file you can reuse in the future
- It can manage arguments, so it can be reused with different sets of data
- It can be shared and published
Cons:
- Since it's stand-alone and thus self-sufficient, it needs to have all the parameters and inputs in advance.
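Because all inputs must be known up front, it is worth validating arguments at the very top of the script. A minimal sketch, assuming a hypothetical `parse_args` helper of our own (the script name in the usage message is arbitrary, not part of the IFB setup):

```r
#!/usr/bin/env Rscript
# Hypothetical helper: check the arguments first, because a stand-alone
# script cannot ask for missing inputs once it is running on a node.
parse_args <- function(args) {
  if (length(args) < 1) {
    stop("Usage: myscript.r <input.tab>", call. = FALSE)
  }
  list(input = args[1])
}

# In a real script, the vector would come from commandArgs(trailingOnly = TRUE)
opts <- parse_args(c("10.tab"))
print(opts$input)
```

Failing early with a usage message is cheaper than discovering a missing argument halfway through a long batch job.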
A sample script: helloword.r#
#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
print("Hello, World!")
print(args[1])
In the terminal#
$ module load r/3.6.3
$ srun Rscript helloword.r Sandy
[1] "Hello, World!"
[1] "Sandy"
Packages#
R packages are installed:
- On demand via the community support website
- By yourself, using install.packages() for example
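Before installing anything yourself, it can help to check where R will put new packages. A short sketch of standard R behaviour (not IFB-specific paths): on a shared server, install.packages() falls back to a personal library under your home directory when the system library is not writable.

```r
# Where R looks for packages, in order; the first writable entry is where
# install.packages() will put new packages.
print(.libPaths())

# Per-user library location R would offer to create (may be empty if unset)
print(Sys.getenv("R_LIBS_USER"))

# install.packages("ggplot2")   # example only; the package name is arbitrary
```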
Migrate from RStudio to R/Rscript#
Why migrate from RStudio to Rscript and bother yourself with Linux, Rscript and Slurm? Because you need power, and:
- Our IFB Core RStudio server has limited resources: 56 CPUs and 512 GB of RAM.
- It is shared between users.
- If one user consumes too many resources, the server gets overloaded and all user sessions suffer a loss of performance.
On the IFB cluster, 70 compute nodes are shared between users, and once you have booked resources, they are yours alone.
[RStudio] Save the RStudio script#
If you already did some tests in RStudio, you can save your script as a .r text file.
[Rscript] Adapt the script to Rscript#
As was said before:
Since it's stand-alone and thus self-sufficient, it needs to have all the parameters and inputs in advance.
So we need to adapt, one by one, the commands we ran in RStudio to produce a stand-alone R script.
To edit and transfer your script, please look at this tutorial: quick-start#transfer
In this example, we will:
- Pass a file as argument to the script
- Plot the data in a pdf
- Save the session in a RData
The script: plot.r#
#!/usr/bin/env Rscript
# Deal with arguments
args = commandArgs(trailingOnly=TRUE)
myfile = args[1]
# Read the file
mydata = read.table(myfile, header=FALSE, sep="\t")
print(head(mydata))
# Open a pdf file
pdf("rplot.pdf")
# Create a plot
plot(mydata)
# plot(mydata) # calling plot() again would add another page to the pdf
# Close the pdf file
dev.off()
# Save the R Session
save.image(file = "myplot.RData")
In the terminal#
$ # Easy!
$ cd /shared/project/myfancyproject
$ # Generation of a little dummy file:
$ paste <(seq 0 10) <(seq 0 10) > 10.tab
$ # Load the R environment
$ module load r/3.6.3
$ # Launch your job on the cluster with srun
$ srun Rscript plot.r 10.tab
V1 V2
1 0 0
2 1 1
3 2 2
4 3 3
5 4 4
6 5 5
null device
1
$ # Check that we get our expected outputs
$ ls -l rplot.pdf myplot.RData
-rw-rw-r-- 1 user group 218 Jul 29 15:25 myplot.RData
-rw-rw-r-- 1 user group 4598 Jul 29 15:25 rplot.pdf
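The saved myplot.RData can later be reloaded, for example back in RStudio, to pick up the session where the script left it. A small round-trip sketch (it uses save() on one named object; plot.r uses save.image(), which saves every object in the session):

```r
# Hypothetical round trip illustrating save()/load()
mydata <- data.frame(V1 = 0:2, V2 = 0:2)
f <- file.path(tempdir(), "myplot.RData")
save(mydata, file = f)   # plot.r uses save.image() to save everything
rm(mydata)
load(f)                  # 'mydata' is back in the workspace
print(head(mydata))
```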
[Slurm] [Rscript] MORE POWER! :rocket:#
More CPU, more Memory#
Without any argument, by default, srun will only be allowed to use:
- 2 GB of RAM
- 1 CPU
To increase the resources available for the job:
srun --mem 20GB --cpus-per-task 4 Rscript plot.r 10.tab
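Booking CPUs does not by itself parallelise R; your code has to use them, for example with the base parallel package. A sketch, assuming we read the SLURM_CPUS_PER_TASK variable that Slurm sets inside a job (defaulting to 1 outside one is our own choice):

```r
library(parallel)

# Use as many workers as Slurm allocated, falling back to 1 elsewhere
ncpu <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))

# mclapply() forks one worker per core (Unix only) and collects the results
res <- mclapply(1:4, function(i) i^2, mc.cores = ncpu)
print(unlist(res))
```

Keeping the worker count tied to the allocation avoids oversubscribing the node when the job gets fewer CPUs than expected.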
Use sbatch#
srun has some limits:
- If the connection is lost, the job is stopped
- It can be interesting to keep track of your analyses, to rerun them later for example
In order to use sbatch, you need a little bash script with some parameters for it.
A sbatch script: rplot.sbatch#
#!/bin/bash
#
#SBATCH -o slurm.%N.%j.out
#SBATCH -e slurm.%N.%j.err
#SBATCH --partition fast
#SBATCH --mem 20GB
##SBATCH --cpus-per-task 4
module load r/3.6.3
Rscript plot.r 10.tab
In the terminal#
$ sbatch rplot.sbatch