Cluster description

Hardware

  • 16 compute nodes (128 cores, 2 TB RAM) HPE Apollo 2000 XL225n Gen10+, 2x AMD EPYC 7662 (3.1 GHz, 64 cores), 2 TB RAM
  • 68 compute nodes (28 cores, 256 GB RAM) DELL C6320, 2x Intel Xeon E5-2695v3 (2.3 GHz, 14 cores), 256 GB RAM
  • 1 "fat memory" compute node (64 cores, 3 TB RAM) DELL R930, 4x Intel Xeon E7-8860v3 (2.2 GHz, 16 cores), 3 TB RAM
  • 3 GPU compute nodes (32 cores, 512 GB RAM, 1 NVMe scratch volume each) DELL R7525, 2x AMD EPYC 7343 (3.2 GHz, 16 cores), 2x NVIDIA Ampere A100 40 GB GPUs ([NVIDIA A100](https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-us-nvidia-1758950-r4-web.pdf)), 512 GB RAM
  • 2 login nodes and 1 admin node
    DELL R630, 2x Intel Xeon E5-2620v4 (2.1 GHz, 8 cores), 128 GB RAM

Total: 103 nodes, 8434 cores, 56 TB of RAM

  • Storage based on DDN Lustre EXAScaler (2 PB)
    SFA400NVXE, ES400NVX and 2x SS9012 expansion enclosures, 11x 3.84 TB SSD, 180x 16 TB HDD
  • Local network: 10 Gbit/s Ethernet, DELL S6000 10/40 Gb Ethernet switch

  • Internet access: 1 Gbit/s

Rack layout: admin node, storage nodes (R730xd), compute nodes (cpu-node-01-48, cpu-node-49-68), fat node (cpu-node-69), and 2x 10/40 Gbit/s Ethernet switches.

Software layer

The cluster is managed by Slurm (version 20.11.8).
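
As a quick illustration, a job can check the resources Slurm granted it by reading the environment variables exported inside an allocation. The Python sketch below prints a few of them when submitted with `sbatch`; the resource options shown in the comment are placeholders, not the cluster's actual defaults.

```python
# slurm_env_check.py -- minimal sketch: print a few of the environment
# variables Slurm exports inside a job allocation. Submit it for example with
#   sbatch --ntasks=1 --cpus-per-task=4 --wrap "python slurm_env_check.py"
# (partition, account and resource options depend on the site setup).
import os

for var in ("SLURM_JOB_ID", "SLURM_JOB_NODELIST",
            "SLURM_NTASKS", "SLURM_CPUS_PER_TASK"):
    # Fall back to "n/a" when the script is run outside of a Slurm job.
    print(f"{var} = {os.environ.get(var, 'n/a')}")
```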

Scientific software and tools are available through Environment Modules and are mainly based on Conda packages or Singularity images.
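
For instance, a tool distributed as a Singularity image can be launched from a script. In this minimal Python sketch the image path and the wrapped command are hypothetical examples, not actual entries from the software catalogue.

```python
# run_in_container.py -- minimal sketch: run a command inside a Singularity
# image from Python. The image path below is hypothetical; real images are
# provided through the Environment Modules catalogue.
import subprocess

image = "/shared/containers/samtools.sif"  # hypothetical path
# `singularity exec <image> <command>` executes the command in the container.
subprocess.run(["singularity", "exec", image, "samtools", "--version"], check=True)
```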

Operating system: CentOS on the cluster, with Ubuntu on some machines.

Supporting tools for cluster management: Nagios Core, Netbox, Proxmox VE, VMware ESX.

Deployment and configuration are powered by Ansible and GitLab.