
AlphaFold2

DeepMind's AlphaFold2 is available on the IFB cluster from the command line by loading either of these modules:
  • alphafold/2.3.2 for DeepMind's version
  • massivefold/1.0.0 for MassiveFold, an extended version of AlphaFold with more flags that allows extensive sampling and the use of all versions of the neural network models. More details here: https://github.com/GBLille/MassiveFold

AlphaFold can also be used on usegalaxy.fr.

It requires a GPU to run its prediction algorithm, so you need access to the GPU nodes.

The databases used by AlphaFold are available on every GPU node in /shared/bank/alphafold2. Several versions are installed there; the latest one is in /shared/bank/alphafold2/current, and it is the version to use with AlphaFold v2.3.2.
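Before a long run, it can be worth checking that the database paths passed in the job files on this page actually exist. The following is a minimal sketch: check_af_dbs is a hypothetical helper, not something provided by the module, and it assumes the directory layout under /shared/bank/alphafold2/current used in the examples. The bfd, pdb70 and uniref30 paths are file-name prefixes of HH-suite databases rather than single files, so the sketch only requires that some file starts with each path.

```shell
#!/bin/sh
# Hypothetical helper (not part of the alphafold module): checks that the
# database paths used in the examples on this page exist under a given root.
# Usage: check_af_dbs [db_root] [monomer|multimer]
check_af_dbs() {
    root=${1:-/shared/bank/alphafold2/current}
    preset=${2:-monomer}
    missing=0
    common="uniref90/uniref90.fasta
mgnify/mgy_clusters_2022_05.fa
pdb_mmcif/mmcif_files
pdb_mmcif/obsolete.dat
bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
uniref30/UniRef30_2021_03"
    if [ "$preset" = "multimer" ]; then
        extra="pdb_seqres/pdb_seqres.txt
uniprot/uniprot.fasta"
    else
        extra="pdb70/pdb70"
    fi
    for rel in $common $extra; do
        # ls -d succeeds if at least one entry starts with the path
        # (HH-suite databases are prefixes such as pdb70/pdb70_a3m.ffdata)
        if ! ls -d "$root/$rel"* >/dev/null 2>&1; then
            echo "missing: $root/$rel" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_af_dbs /shared/bank/alphafold2/current multimer prints nothing and returns 0 when every path needed by the multimer example is in place.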

To run AlphaFold v2.3.2

From the command line

We assume you have an amino acid sequence in a FASTA file called my.fasta (the examples below use test_monomer.fasta and test_multimer.fasta).
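AlphaFold's monomer presets expect exactly one sequence in the FASTA file, while the multimer preset expects one sequence per chain in the same file. As a sketch, the hypothetical af_model_preset function below (not part of the module) counts the ">" header lines of a FASTA file and prints the --model_preset value used in the matching example on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of the alphafold module): prints the
# --model_preset used on this page for a given FASTA file.
af_model_preset() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 1
    fi
    # grep -c prints 0 (and exits 1) when no header matches; "|| true" keeps the 0
    n=$(grep -c '^>' "$1" || true)
    if [ "$n" -eq 0 ]; then
        echo "no FASTA header found in $1" >&2
        return 1
    elif [ "$n" -eq 1 ]; then
        echo monomer_ptm   # one sequence: monomer example
    else
        echo multimer      # one sequence per chain: multimer example
    fi
}

# Example with a toy single-chain input (placeholder sequence):
# printf '>my_protein\nMKTAYIAKQR\n' > my.fasta
# af_model_preset my.fasta
```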

Connect to the cluster login node through SSH:

ssh <login>@core.cluster.france-bioinformatique.fr

Load the AlphaFold2 module:

module load alphafold/2.3.2

Get help:

To get the list of all the flags provided by AlphaFold2, just run:

run_alphafold.sh --helpfull
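The --helpfull output is long. A small filter, sketched below as a hypothetical af_flag_help function, prints only the lines around one flag. It assumes that the help text mentions each flag name on its own line (absl-style help shows boolean flags as --[no]name, so the name is matched without the leading dashes), and it uses grep -A, which GNU and BSD grep support although POSIX does not require it.

```shell
#!/bin/sh
# Hypothetical helper: show the help entry of one run_alphafold.sh flag.
# Requires the module to be loaded first (module load alphafold/2.3.2).
af_flag_help() {
    # Merge stdout and stderr in case the help text is written to either;
    # -A 3 keeps a few continuation lines (description, default value).
    run_alphafold.sh --helpfull 2>&1 | grep -A 3 -e "$1"
}

# Example: af_flag_help max_template_date
```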

Then either run a batch job on the GPU nodes or start an interactive session.

Batch job approach

Create a my_fold.sh script based on the following example:

  • For a monomer

Replace test_monomer with the name of your FASTA file. For more information about the profiles to set in --gres, see here.

#!/bin/bash

#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_monomer
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0

module load alphafold/2.3.2

run_alphafold.sh \
    --fasta_paths=test_monomer.fasta \
    --output_dir=/shared/projects/<myproject> \
    --data_dir=/shared/bank/alphafold2/current \
    --db_preset=full_dbs \
    --model_preset=monomer_ptm \
    --models_to_relax=best \
    --use_gpu_relax=true \
    --max_template_date=2023-07-11 \
    --use_precomputed_msas=false \
    --uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
    --mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
    --bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
    --uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 
  • For a multimer

Replace test_multimer with the name of your FASTA file.

#!/bin/bash

#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_multimer
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0

module load alphafold/2.3.2

run_alphafold.sh \
    --fasta_paths=test_multimer.fasta \
    --output_dir=/shared/projects/<myproject> \
    --data_dir=/shared/bank/alphafold2/current \
    --db_preset=full_dbs \
    --models_to_relax=best \
    --use_gpu_relax=true \
    --model_preset=multimer \
    --max_template_date=2023-07-11 \
    --num_multimer_predictions_per_model=5 \
    --use_precomputed_msas=false \
    --uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
    --mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
    --bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
    --uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 \
    --uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta 
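Editing the job name and --fasta_paths by hand for every target is error-prone. The sketch below shows one way to generate such a job file from a FASTA file name. write_fold_job is a hypothetical function, not part of the module; it simply reproduces the resources and flags of the two examples above, so adjust --gres, --mem and -t to your case.

```shell
#!/bin/sh
# Hypothetical generator: writes a job file equivalent to the examples above.
# Usage: write_fold_job <fasta> <output_dir> <monomer|multimer> [job_file]
write_fold_job() {
    fasta=$1; outdir=$2; mode=$3; job=${4:-my_fold.sh}
    db=/shared/bank/alphafold2/current
    # Job name = FASTA file name without its extension
    name=$(basename "$fasta"); name=${name%.*}
    case $mode in
        monomer)
            preset="--model_preset=monomer_ptm"
            extra="--pdb70_database_path=$db/pdb70/pdb70" ;;
        multimer)
            preset="--model_preset=multimer --num_multimer_predictions_per_model=5"
            extra="--pdb_seqres_database_path=$db/pdb_seqres/pdb_seqres.txt --uniprot_database_path=$db/uniprot/uniprot.fasta" ;;
        *) echo "mode must be monomer or multimer" >&2; return 1 ;;
    esac
    # Unquoted heredoc: $variables expand, "\\" becomes a line-continuation "\"
    cat > "$job" <<EOF
#!/bin/bash

#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=$name
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0

module load alphafold/2.3.2

run_alphafold.sh \\
    --fasta_paths=$fasta \\
    --output_dir=$outdir \\
    --data_dir=$db \\
    --db_preset=full_dbs \\
    $preset \\
    --models_to_relax=best \\
    --use_gpu_relax=true \\
    --max_template_date=2023-07-11 \\
    --use_precomputed_msas=false \\
    --uniref90_database_path=$db/uniref90/uniref90.fasta \\
    --mgnify_database_path=$db/mgnify/mgy_clusters_2022_05.fa \\
    --template_mmcif_dir=$db/pdb_mmcif/mmcif_files \\
    --obsolete_pdbs_path=$db/pdb_mmcif/obsolete.dat \\
    --bfd_database_path=$db/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \\
    --uniref30_database_path=$db/uniref30/UniRef30_2021_03 \\
    $extra
EOF
}
```

For example, write_fold_job test_multimer.fasta /shared/projects/myproject multimer writes my_fold.sh for the multimer case (replace myproject with your project name).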

Start your batch job:

sbatch my_fold.sh
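sbatch prints the job ID when it accepts the job, and with the "#SBATCH -o %x.o%j" line of the job files above, the log is written to <job name>.o<job ID> in the directory you submitted from. The hypothetical submit_fold wrapper below sketches how to capture both using sbatch --parsable; squeue -j <job ID> then shows the state of the job.

```shell
#!/bin/sh
# Hypothetical wrapper around sbatch: submit a job file and report its log name.
submit_fold() {
    job=${1:-my_fold.sh}
    # --parsable prints "<jobid>" or "<jobid>;<cluster>"
    out=$(sbatch --parsable "$job") || return 1
    jobid=${out%%;*}
    # Job name as set by "#SBATCH --job-name=..." in the job file
    name=$(sed -n 's/^#SBATCH --job-name=//p' "$job" | head -n 1)
    # "#SBATCH -o %x.o%j" expands to <job name>.o<job ID>
    echo "job $jobid submitted; log file: ${name}.o${jobid}"
}

# Example: submit_fold my_fold.sh, then squeue -j <job ID> or tail -f <log file>
```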

Interactive approach

Request a GPU allocation that matches your needs. For more information about the profiles to set in --gres, see here.

salloc -p gpu --gres=gpu:1g.5gb:1 --cpus-per-task=8 --mem=120G
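salloc opens a new shell once the resources are granted, and Slurm sets SLURM_JOB_ID in it. srun commands typed in that shell run inside the allocation, whereas srun outside it would request a separate job with default options. The hypothetical in_allocation check below is a minimal sketch of guarding the commands that follow; the nvidia-smi line assumes that tool is installed on the GPU nodes.

```shell
#!/bin/sh
# Hypothetical check: are we in the shell opened by salloc?
in_allocation() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "inside allocation $SLURM_JOB_ID"
    else
        echo "no Slurm allocation in this shell; run salloc first" >&2
        return 1
    fi
}

# Example: in_allocation && srun nvidia-smi -L   # lists the GPU (or MIG slice) granted
```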
  • Run AlphaFold2 in monomer mode:
srun run_alphafold.sh \
    --fasta_paths=test_monomer.fasta \
    --output_dir=/shared/projects/<myproject> \
    --data_dir=/shared/bank/alphafold2/current \
    --db_preset=full_dbs \
    --model_preset=monomer_ptm \
    --models_to_relax=best \
    --use_gpu_relax=true \
    --max_template_date=2023-07-11 \
    --use_precomputed_msas=false \
    --uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
    --mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
    --bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
    --uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 
  • Run AlphaFold2 in multimer mode:
srun run_alphafold.sh \
    --fasta_paths=test_multimer.fasta \
    --output_dir=/shared/projects/<myproject> \
    --data_dir=/shared/bank/alphafold2/current \
    --db_preset=full_dbs \
    --models_to_relax=best \
    --use_gpu_relax=true \
    --model_preset=multimer \
    --max_template_date=2023-07-11 \
    --num_multimer_predictions_per_model=5 \
    --use_precomputed_msas=false \
    --uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
    --mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
    --bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
    --uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 \
    --uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta 

Do not forget to relinquish the allocation:

exit

To use AlphaFold2 from usegalaxy.fr:

  1. Connect to https://usegalaxy.fr/
  2. Choose AlphaFold 2 in the tool list
  3. Import your data, select your parameters and run your job

To run MassiveFold v1.0.0

MassiveFold v1.0.0 uses the same databases as AlphaFold v2.3.2 and is run the same way, but it accepts additional flags, described here: https://github.com/GBLille/MassiveFold
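Since the invocation is the same, an existing AlphaFold job file can be reused by changing the module it loads. The sketch below does this with sed; af_to_mf_job is a hypothetical function, and it also appends _MF to the job name, following the naming of the examples below. It assumes that every flag in your AlphaFold job file is accepted by MassiveFold; note that the multimer example below sets --num_predictions_per_model rather than --num_multimer_predictions_per_model, so check the flag names with run_alphafold.sh --helpfull after loading massivefold/1.0.0.

```shell
#!/bin/sh
# Hypothetical helper: derive a MassiveFold job file from an AlphaFold one.
# Usage: af_to_mf_job <alphafold_job.sh> <massivefold_job.sh>
af_to_mf_job() {
    # 1) load massivefold/1.0.0 instead of alphafold/2.3.2
    # 2) add the _MF suffix to the job name (and thus to the log file name)
    sed -e 's|^module load alphafold/2\.3\.2$|module load massivefold/1.0.0|' \
        -e 's|^\(#SBATCH --job-name=.*\)$|\1_MF|' "$1" > "$2"
}
```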

Here is an example of a job file (batch job approach) for a monomer:

#!/bin/bash

#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_monomer_MF
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0

module load massivefold/1.0.0

run_alphafold.sh \
    --fasta_paths=test_monomer.fasta \
    --output_dir=/shared/projects/<myproject> \
    --data_dir=/shared/bank/alphafold2/current \
    --db_preset=full_dbs \
    --model_preset=monomer_ptm \
    --max_template_date=2023-07-11 \
    --use_precomputed_msas=false \
    --num_predictions_per_model=1 \
    --models_to_relax=best \
    --use_gpu_relax=true \
    --alignments_only=false \
    --dropout=false \
    --dropout_rates_filename= \
    --max_recycles=3 \
    --early_stop_tolerance=0.5 \
    --bfd_max_hits=100000 \
    --mgnify_max_hits=501 \
    --uniprot_max_hits=50000 \
    --uniref_max_hits=10000 \
    --models_to_use= \
    --start_prediction=1 \
    --no_templates=false \
    --uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
    --mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
    --bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
    --uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 

And here is one for a multimer:

#!/bin/bash

#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_multimer_MF
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0

module load massivefold/1.0.0

run_alphafold.sh \
    --fasta_paths=test_multimer.fasta \
    --output_dir=/shared/projects/<myproject> \
    --data_dir=/shared/bank/alphafold2/current \
    --db_preset=full_dbs \
    --model_preset=multimer \
    --max_template_date=2023-07-11 \
    --use_precomputed_msas=false \
    --num_predictions_per_model=5 \
    --models_to_relax=best \
    --use_gpu_relax=true \
    --alignments_only=false \
    --dropout=false \
    --dropout_rates_filename= \
    --max_recycles=20 \
    --early_stop_tolerance=0.5 \
    --bfd_max_hits=100000 \
    --mgnify_max_hits=501 \
    --uniprot_max_hits=50000 \
    --uniref_max_hits=10000 \
    --models_to_use= \
    --start_prediction=1 \
    --no_templates=false \
    --uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
    --mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
    --bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
    --uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 \
    --uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta 

Start your batch job:

sbatch my_fold.sh

To create plots

pLDDT, predicted aligned error (PAE) and sequence coverage plots can optionally be created with the scripts available at https://github.com/GBLille/MassiveFold/tree/main/MF_scripts/plots.

They were built from code extracted from DeepMind's AlphaFold and from ColabFold.

Run the MF_plots.py script after the structure prediction (for a batch job, at the end of the job file). Both Python files from that folder must be present in the directory you run it from. The parameters are described with:

  python MF_plots.py --help

In the plot names, "DM" stands for DeepMind-style plots and "CF" for ColabFold-style plots.

For example:

  python MF_plots.py --input_path ./jobname --plot_type one_for_all
-> group plots with values from the top 10 predictions (the default value of the top_n_predictions parameter)
  python MF_plots.py --input_path ./jobname --plot_type for_each --top_n_predictions 5
-> individual plots for each of the top 5 predictions
  python MF_plots.py --input_path ./jobname --plot_type for_each --top_n_predictions 5 --chosen_plots CF_PAEs
-> mixed plot types: the group PAE plot is added to the individual "for_each" plots
  python MF_plots.py --input_path ./jobname --chosen_plots coverage,CF_PAEs
-> regardless of the plot type, the alignment coverage plot and the group PAE plot for the top 10 predictions
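In these commands, ./jobname stands for the directory holding the predictions of one target. AlphaFold v2.3.2 creates it inside --output_dir and names it after the FASTA file without its extension (assumed here to be the same for MassiveFold). The hypothetical af_result_dir helper below builds that path, so the plotting step can be appended to the end of a job file.

```shell
#!/bin/sh
# Hypothetical helper: directory where run_alphafold.sh writes the results
# for one FASTA file, i.e. <output_dir>/<FASTA file name without extension>.
af_result_dir() {
    b=$(basename "$2")
    echo "${1%/}/${b%.*}"
}

# Example, at the end of a job file (both plot scripts in the current
# directory; replace myproject with your project name):
# python MF_plots.py --input_path "$(af_result_dir /shared/projects/myproject test_monomer.fasta)" --plot_type one_for_all
```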