Troubleshooting
💡 This troubleshooting guide can be complemented by consulting the IFB Community Forum
[SLURM] Invalid account or account/partition combination specified#
Complete message:
srun: error: Unable to allocate resources: Invalid account or account/partition combination specified
Explanation 1#
Your current default SLURM account is probably still the demo
one (you may have seen a red notice at login). You can check it using:
$ sacctmgr list user $USER
User Def Acct Admin
---------- ---------- ---------
cnorris demo None
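To see which SLURM accounts (projects) you are actually allowed to use, you can also list your associations (a minimal example; the format fields are just a suggestion):
# List the account/partition associations attached to your user
sacctmgr show associations where user=$USER format=account%30,user%15,partition%15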
Solution#
If you don't already have a project, you have to request one from the platform: https://my.cluster.france-bioinformatique.fr/manager2/project
Otherwise, if you already have a project/account, you can either:
- Specify your SLURM account for each job:
srun -A my_account command
#!/bin/bash
#SBATCH -A my_account
command
- Change your default account:
sacctmgr update user $USER set defaultaccount=my_account
☝️ How can I access a Terminal 📺 to run these commands?
- With Open On Demand: Shell Access
- With a SSH Client: Cluster/Log in
⚠️ status_bar is updated hourly, so it may still display demo as your default account. Don't worry, the change should have worked.
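Independently of the status_bar, you can confirm on the command line that the new default account is taken into account (a trivial test; the job just prints the node hostname):
# Check the recorded default account, then submit a test job without -A
sacctmgr list user $USER
srun hostname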
[RStudio] Timeout or does not start#
Try to clean session files and cache:
# Remove (rm) or move (mv) RStudio files
# mv ~/.rstudio ~/.rstudio.backup-2022-02-27
rm -rf ~/.rstudio
rm -rf ~/.local/share/rstudio
rm -f ~/.RData
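If you prefer to keep a backup instead of deleting these directories, a date-stamped rename (as hinted in the mv comment above) could look like this (a sketch; adapt the paths if needed):
# Back up RStudio session data instead of removing it
backup_date=$(date +%Y-%m-%d)
mv ~/.rstudio ~/.rstudio.backup-${backup_date}
mv ~/.local/share/rstudio ~/.local/share/rstudio.backup-${backup_date}
mv ~/.RData ~/.RData.backup-${backup_date}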
Retry.
If it doesn't work, try removing your configuration (settings will be lost):
rm -rf ~/.config/rstudio
☝️ How can I access a Terminal 📺 to run these commands?
- With Open On Demand: Shell Access
- With a SSH Client: Cluster/Log in
Retry.
If it doesn't work, contact support on the IFB Community Forum.
[JupyterHUB] Timeout or does not start#
Kill your job/session using the web interface (Menu "File" --> "Hub Control Panel" --> "Stop server") or from the command line:
# Cancel the running jupyter job
scancel -u $USER -n jupyter
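You can check that the job is actually gone before cleaning up (an optional, quick check):
# The jupyter job should no longer appear in the queue
squeue -u $USER -n jupyter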
Clean session files and cache:
# Remove (rm) or move (mv) JupyterHUB directories
# mv ~/.jupyter ~/.jupyter.backup-2022-02-27
rm -rf ~/.jupyter
rm -rf ~/.local/share/jupyter
☝️ How can I access a Terminal 📺 to run these commands?
- With Open On Demand: Shell Access
- With a SSH Client: Cluster/Log in
[SLURM][RStudio] /tmp No space left on device / Error: Fatal error: cannot create 'R_TempDir'#
Explanation 1#
The server on which the job ran probably has a full /tmp/ directory. Indeed, by default, R writes its temporary files in the /tmp/ directory of the server.
This local /tmp/ directory is limited in size and shared between users, so it is not good practice to let software write on the local disk.
Solution#
The solution is to change the default temporary directory, provided that the tool is well developed (i.e. /tmp is not hard-coded).
Please add the following lines at the beginning of your sbatch script.
#!/bin/bash
#SBATCH -p fast
TMPDIR="/shared/projects/my_interesting_project/tmp/"
TMP="${TMPDIR}"
TEMP="${TMPDIR}"
mkdir -p "${TMPDIR}"
export TMPDIR TMP TEMP
module load r/4.1.1
Rscript my_script.R
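To check that R actually picks up the new temporary directory, you can print tempdir() from the same environment (a small sketch reusing the example path above; the directory must already exist, e.g. created by the mkdir -p above):
# After 'module load r/4.1.1', print where R will write its temporary files
export TMPDIR="/shared/projects/my_interesting_project/tmp/"
Rscript -e 'cat("R tempdir:", tempdir(), "\n")'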
[GPU] How to check the availability of GPU nodes#
We can use the sinfo
command with the "Generic resources (gres)" information.
For example:
sinfo -N -O nodelist,partition:15,Gres:30,GresUsed:50 -p gpu
NODELIST PARTITION GRES GRES_USED
gpu-node-01 gpu gpu:1g.5gb:14 gpu:1g.5gb:0(IDX:N/A)
gpu-node-02 gpu gpu:3g.20gb:2,gpu:7g.40gb:1 gpu:3g.20gb:1(IDX:0),gpu:7g.40gb:0(IDX:N/A)
gpu-node-03 gpu gpu:7g.40gb:2 gpu:7g.40gb:2(IDX:0-1)
In other words:
- gpu-node-01: 14 profiles 1g.5gb, 0 used
- gpu-node-02: 2 profiles 3g.20gb, 1 used
- gpu-node-02: 1 profile 7g.40gb, 0 used
- gpu-node-03: 2 profiles 7g.40gb, 2 used
So we can see which GPU/profiles are immediately available.
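Once a free profile has been identified, it can be requested explicitly with --gres (an illustrative example; the profile name and count follow the sinfo output above):
# Request one 3g.20gb MIG profile on the gpu partition
srun -p gpu --gres=gpu:3g.20gb:1 nvidia-smi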
More information about this "profile" ("Multi-Instance GPU"):
- https://ifb-elixirfr.gitlab.io/cluster/doc/slurm/slurm_at/#gpu-nodes
- https://docs.nvidia.com/datacenter/tesla/mig-user-guide/