
SLURM GPU

SLURM partitions and nodes

The IFB Core HPC Cluster provides nodes equipped with GPU (Graphics Processing Unit) cards.

GPUs offer far more computational units than CPUs and are particularly suited to highly parallel workloads such as Deep Learning, data mining, image processing and pattern recognition. For example, tools that take advantage of GPUs have recently helped democratise Nanopore sequencing and supported the rise of epigenetics.

(Last update: 04/05/2022)

GPU nodes

| Nbr | CPU | RAM (GB) | GPU | Type | Disk /tmp |
| --: | --: | --: | --: | :-- | --: |
| 3 | 62 | 515 | 2 | NVIDIA Ampere A100 40GB | 4 TB |

GPU Cards have been partitioned into isolated GPU instances with Multi-Instance GPU (MIG).

GPU Instance Profiles

⚠️ The values below can change. To check the current situation:

sinfo -Ne -p gpu --format "%.15N %.4c %.7m %G"

| Profile Name | GPU Memory | GPU Compute Slice | Number of Instances Available |
| --: | --: | --: | --: |
| 1g.5gb | 5GB | 1 | 14 |
| 3g.20gb | 20GB | 3 | 2 |
| 7g.40gb | 40GB | 7 | 3 |

Usage

Pre-requisites

To access the GPU nodes, you need to be granted access to the gpu partition.

You can request this access via the community support website.

Parameters to control the job

#SBATCH --partition=gpu
#SBATCH --gres=gpu:3g.20gb:1
  • --partition=gpu : the partition that allows access to the GPU nodes
  • --gres=gpu:3g.20gb:1 :
    3g.20gb: a card profile (see above)
    :1: the number of cards in the reservation (see above)
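
For instance, a complete batch script using these parameters could look like the sketch below (the job name, CPU and memory values, and the my_gpu_tool command are placeholders to adapt to your own workload):

#!/bin/bash
#SBATCH --job-name=gpu_example      # placeholder job name
#SBATCH --partition=gpu             # partition giving access to the GPU nodes
#SBATCH --gres=gpu:3g.20gb:1        # one 3g.20gb MIG instance (see profiles above)
#SBATCH --cpus-per-task=4           # example value, adapt to your needs
#SBATCH --mem=16GB                  # example value, adapt to your needs

# my_gpu_tool stands for your own GPU-aware software
my_gpu_tool --input my_data

Submit it with sbatch, e.g. sbatch gpu_example.sh.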

$CUDA_VISIBLE_DEVICES

Note that you can use the variable $CUDA_VISIBLE_DEVICES on the command line to pass the device number(s) to your software (if it requires it).

# Here are the values of CUDA_VISIBLE_DEVICES for "interactive" srun jobs.
$ srun --pty -p gpu --gres=gpu:3g.20gb:1 env | grep CUDA
CUDA_VISIBLE_DEVICES=0

$ srun --pty -p gpu --gres=gpu:3g.20gb:2 env | grep CUDA
CUDA_VISIBLE_DEVICES=0,1
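
The same variable is set inside batch jobs. As a small sketch (assuming the job requested at least one GPU instance), you can derive the number of allocated instances from it:

# CUDA_VISIBLE_DEVICES is a comma-separated list; count its fields
NB_GPU=$(echo "$CUDA_VISIBLE_DEVICES" | awk -F "," '{print NF}')
echo "Number of allocated GPU instance(s): $NB_GPU"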

Examples

Hello world

The NVIDIA System Management Interface (nvidia-smi) is a command line utility, intended to aid in the management and monitoring of NVIDIA GPU devices.

$ srun -p gpu --gres=gpu:k80:2 nvidia-smi
srun: job 35429913 queued and waiting for resources
srun: job 35429913 has been allocated resources
Wed Feb  1 16:36:54 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   40C    P0    55W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   31C    P0    70W / 149W |      0MiB / 11441MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
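
If you only want to check which devices are visible to your allocation, nvidia-smi -L prints a compact list (one line per GPU or MIG instance); for example, with the smallest profile:

# List the devices visible to the job (names and UUIDs)
srun -p gpu --gres=gpu:1g.5gb:1 nvidia-smi -L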

Guppy basecaller

(This example needs ❤️)

Here is a minimal sbatch script for guppy_basecaller.

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=XX
#SBATCH --mem=XXGB

module load guppy/6.1.1-gpu
guppy_basecaller [...] --device "cuda:$CUDA_VISIBLE_DEVICES"
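
Assuming the script is saved as guppy.sbatch (the file name is arbitrary), submit it with:

sbatch guppy.sbatch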

💡 Tips

For one card, --device "cuda:$CUDA_VISIBLE_DEVICES" works, since it expands to --device "cuda:0".

But for 2 cards, guppy_basecaller expects something like --device "cuda:0 cuda:1". In that case, try something like this:

[...]
# Build a space-separated "cuda:N" list from the comma-separated CUDA_VISIBLE_DEVICES
DEVICES=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{for(i=1; i<=NF; i++) {  printf "cuda:"$i" " }}')
[...]
guppy_basecaller [...] --device "$DEVICES"
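
As a quick sanity check outside SLURM, you can simulate a two-card allocation (the value 0,1 is just an example) and inspect the resulting string:

$ CUDA_VISIBLE_DEVICES=0,1
$ echo $CUDA_VISIBLE_DEVICES | awk -F "," '{for(i=1; i<=NF; i++) {  printf "cuda:"$i" " }}'
cuda:0 cuda:1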

Optimization

Note that you can optimize the job by setting the following options (see the Guppy manual for more information); an example call is sketched after the list:

  • --gpu_runners_per_device: Number of runners per GPU device.
  • --cpu_threads_per_caller: Number of CPU worker threads per basecaller.
  • --num_callers: Number of parallel basecallers to create.
  • --num_alignment_threads: Number of worker threads to use for alignment.
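
For illustration only, a call combining some of these options could look like the following (the values are arbitrary examples to tune for your data and the resources you reserved, not recommendations):

guppy_basecaller [...] \
    --device "cuda:$CUDA_VISIBLE_DEVICES" \
    --num_callers 4 \
    --cpu_threads_per_caller 2 \
    --gpu_runners_per_device 8

With such values, --cpus-per-task=8 (roughly 4 callers × 2 threads each) would be a consistent CPU reservation.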

Alphafold2

Please have a look at the Software environment > Alphafold2 page!