
Alphafold2

Alphafold2 (https://alphafold.ebi.ac.uk/) is available on the IFB cluster from the command line (by loading the alphafold/2.2.3 module), in Jupyter (using the Alphafold kernel) and on usegalaxy.fr.

Alphafold requires a GPU to run its prediction algorithm, so you need access to a GPU node.

The Alphafold databases are available on every GPU node in /shared/bank/alphafold2/current.
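Before launching a long job, you may want to check that the database files used in the commands below are actually present. The sketch below assumes the directory layout shown in those commands; `check_af_dbs` is a hypothetical helper, not something provided by the alphafold module. The pdb70, BFD and Uniclust30 entries are HH-suite databases passed as a path prefix, so the sketch only checks that some file starts with that prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the alphafold module): report which
# Alphafold2 database files are missing under a database root directory.
# The relative paths are the ones used in the run_alphafold.sh examples.
check_af_dbs() {
  local root="${1:-/shared/bank/alphafold2/current}"
  local status=0 rel

  # Plain files and directories passed directly to run_alphafold.sh
  for rel in uniref90/uniref90.fasta \
             mgnify/mgy_clusters_2018_12.fa \
             pdb_mmcif/mmcif_files \
             pdb_mmcif/obsolete.dat \
             pdb_seqres/pdb_seqres.txt \
             uniprot/uniprot.fasta; do
    if [ ! -e "$root/$rel" ]; then
      echo "missing: $root/$rel" >&2
      status=1
    fi
  done

  # HH-suite databases are given as a prefix: require at least one
  # file whose name starts with that prefix.
  for rel in pdb70/pdb70 \
             bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
             uniclust30/uniclust30_2018_08/uniclust30_2018_08; do
    if ! ls "$root/$rel"* >/dev/null 2>&1; then
      echo "missing: $root/$rel*" >&2
      status=1
    fi
  done

  return "$status"
}

# Usage on a GPU node:
# check_af_dbs /shared/bank/alphafold2/current && echo "all databases found"
```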

Usage

To run Alphafold2 v2.2.3 from the command line

We assume you have an amino acid sequence in a FASTA file called my.fasta.
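If you want to sanity-check the input first, the sketch below verifies that a file looks like protein FASTA: it must start with a `>` header, and every other non-empty line must contain letters only. This is a rough check under that simplifying assumption, not Alphafold's own parser; `validate_fasta` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough check that a file is protein FASTA.
# Prints the number of records on success, an error message on failure.
validate_fasta() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "error: $file is missing or empty" >&2
    return 1
  fi
  # The first character of the file must be the '>' of a header line
  if [ "$(head -c 1 "$file")" != ">" ]; then
    echo "error: $file does not start with a '>' header" >&2
    return 1
  fi
  # Every non-header line must contain letters only (empty lines are allowed)
  if ! awk 'BEGIN { bad = 0 } !/^>/ && /[^A-Za-z]/ { bad = 1 } END { exit bad }' "$file"; then
    echo "error: $file has non-letter characters in a sequence line" >&2
    return 1
  fi
  # Print the number of records (one per '>' header)
  grep -c '^>' "$file"
}
```

For a valid single-sequence file, `validate_fasta my.fasta` prints 1. Note that Windows line endings also fail the letters-only check, which is usually what you want to catch before submitting.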

Connect to the cluster login node through SSH:

ssh <login>@core.cluster.france-bioinformatique.fr

Load the Alphafold2 module:

module load alphafold/2.2.3

Display the full help, including all available options:

run_alphafold.sh --helpfull

Start an interactive session or run a batch job on a GPU node:

Interactive approach

Create an allocation for GPU resources:

salloc -p gpu --gres=gpu:3g.20gb:1 --cpus-per-task=10 --mem=50G

Run Alphafold in monomer mode:

srun run_alphafold.sh \
    --fasta_paths=my.fasta \
    --output_dir=/shared/projects/<myproject> \
    --model_preset=monomer \
    --db_preset=full_dbs \
    --data_dir=/shared/bank/alphafold2/current \
    --uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
    --mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
    --template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
    --max_template_date=2020-05-14 \
    --obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
    --bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniclust30_database_path=/shared/bank/alphafold2/current/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --use_gpu_relax=true
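The monomer and multimer commands share most of their arguments. To avoid retyping the long database paths, you can build them once in a bash array. This is a minimal sketch assuming the paths used in these examples; `af_args` and the `DB` variable are hypothetical names, not something provided by the module.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the run_alphafold.sh arguments (one per line)
# for a given model preset, reusing the database paths from the examples.
af_args() {
  local preset="$1" fasta="$2" outdir="$3"
  local DB=/shared/bank/alphafold2/current
  # Arguments common to the monomer and multimer examples
  local args=(
    --fasta_paths="$fasta"
    --output_dir="$outdir"
    --model_preset="$preset"
    --db_preset=full_dbs
    --data_dir="$DB"
    --uniref90_database_path="$DB/uniref90/uniref90.fasta"
    --mgnify_database_path="$DB/mgnify/mgy_clusters_2018_12.fa"
    --template_mmcif_dir="$DB/pdb_mmcif/mmcif_files"
    --max_template_date=2020-05-14
    --obsolete_pdbs_path="$DB/pdb_mmcif/obsolete.dat"
    --bfd_database_path="$DB/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
    --uniclust30_database_path="$DB/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    --use_gpu_relax=true
  )
  # Preset-specific databases: multimer uses pdb_seqres + uniprot,
  # the monomer presets use pdb70
  if [ "$preset" = multimer ]; then
    args+=(--pdb_seqres_database_path="$DB/pdb_seqres/pdb_seqres.txt"
           --uniprot_database_path="$DB/uniprot/uniprot.fasta")
  else
    args+=(--pdb70_database_path="$DB/pdb70/pdb70")
  fi
  printf '%s\n' "${args[@]}"
}

# Usage inside the allocation (mapfile reads one argument per line):
# mapfile -t ARGS < <(af_args monomer my.fasta /shared/projects/<myproject>)
# srun run_alphafold.sh "${ARGS[@]}"
```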

Run Alphafold in multimer mode:

srun run_alphafold.sh \
    --fasta_paths=my.fasta \
    --output_dir=/shared/projects/<myproject> \
    --model_preset=multimer \
    --db_preset=full_dbs \
    --data_dir=/shared/bank/alphafold2/current \
    --uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
    --mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2018_12.fa \
    --template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
    --max_template_date=2020-05-14 \
    --obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
    --bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
    --uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta \
    --uniclust30_database_path=/shared/bank/alphafold2/current/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --use_gpu_relax=true
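In multimer mode, my.fasta holds one record per chain (for a homomer, the same sequence is simply repeated). A minimal sketch of such a file written with a here-document; the headers and sequences below are placeholders, not a real complex.

```shell
#!/usr/bin/env bash
# Write a two-chain FASTA file for multimer mode.
# Headers and sequences are placeholders: replace them with your own chains.
cat > my.fasta <<'EOF'
>chain_A
MKTAYIAKQRQISFVKSHFSRQ
>chain_B
GSHMLEDPVDAFQLGQLVAPAS
EOF

# One '>' header per chain: this prints 2
grep -c '^>' my.fasta
```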

When you are done, do not forget to release the allocation:

exit

Batch job approach

Create a my_fold.sh script based on the following example:

#!/bin/bash

#SBATCH -p gpu
#SBATCH --gres=gpu:1g.5gb:1
#SBATCH --cpus-per-task=10
#SBATCH --mem=50G

module load alphafold/2.2.3

# Create a per-user scratch directory in /tmp
mkdir -p "/tmp/${USER}_alphafold"

srun run_alphafold.sh --fasta_paths=my.fasta \
    --output_dir=/shared/projects/<myproject> \
    --model_preset=monomer \
    --db_preset=full_dbs \
    --data_dir=/shared/bank/alphafold2/current \
    --uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
    --mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
    --template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
    --max_template_date=2020-05-14 \
    --obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
    --bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniclust30_database_path=/shared/bank/alphafold2/current/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --use_gpu_relax=true
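
  • model_preset controls which models are used. Choose between monomer, monomer_casp14, monomer_ptm and multimer (multimer needs a multi-sequence FASTA file and different database paths, as in the multimer example above).

  • db_preset controls the speed and quality of the MSA search. Choose between reduced_dbs (fastest) and full_dbs (best compromise). With reduced_dbs, run_alphafold.sh expects --small_bfd_database_path instead of --bfd_database_path and --uniclust30_database_path.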
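Since the choice between monomer and multimer follows from the number of sequences in the FASTA file, a batch script can derive --model_preset from the input. A minimal sketch; `choose_model_preset` is a hypothetical helper that only picks between monomer and multimer (monomer_casp14 and monomer_ptm must still be chosen by hand), and the database arguments still have to match the preset, as in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "monomer" for a single-sequence FASTA file and
# "multimer" for a file with several records (one per chain).
choose_model_preset() {
  local n
  # grep -c exits non-zero when there is no header or the file is missing
  n=$(grep -c '^>' "$1") || { echo "error: no FASTA header in $1" >&2; return 1; }
  if [ "$n" -eq 1 ]; then
    echo monomer
  else
    echo multimer
  fi
}

# Usage in my_fold.sh:
# PRESET=$(choose_model_preset my.fasta) || exit 1
# then pass --model_preset="$PRESET" to run_alphafold.sh
```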

Submit your batch job:

sbatch my_fold.sh
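To follow the job from a script, `sbatch --parsable` prints only the job ID (followed by `;cluster_name` on multi-cluster setups), which you can then pass to `squeue` or `sacct`. A minimal sketch; `job_id_from_parsable` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the job ID from `sbatch --parsable` output,
# which has the form "<jobid>" or "<jobid>;<cluster>".
job_id_from_parsable() {
  local id="${1%%;*}"   # drop an optional ";cluster" suffix
  if [[ ! "$id" =~ ^[0-9]+$ ]]; then
    echo "error: unexpected sbatch output: $1" >&2
    return 1
  fi
  echo "$id"
}

# Usage on the login node:
# JOBID=$(job_id_from_parsable "$(sbatch --parsable my_fold.sh)") || exit 1
# squeue -j "$JOBID"                              # while pending or running
# sacct -j "$JOBID" --format=JobID,State,Elapsed  # once finished
```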

To use Alphafold2 from Jupyter:

  1. Connect to https://jupyterhub.cluster.france-bioinformatique.fr
  2. Choose the GPU profile and click on "Start server"
  3. Use the sample notebook AlphaFold.ipynb with the Alphafold kernel


To use Alphafold2 from UseGalaxy.fr:

  1. Connect to https://usegalaxy.fr/
  2. Choose Alphafold 2 in the tool list
  3. Import your data, select your parameters and execute your job