AlphaFold2
DeepMind's AlphaFold2 is available on the IFB cluster from the command line by loading either:
- the alphafold/2.3.2 module for DeepMind's version
- the massivefold/1.0.0 module for an extended version of AlphaFold with more flags, which allows extensive sampling and the use of all versions of the neural network models. More details here: https://github.com/GBLille/MassiveFold.
AlphaFold can also be used on usegalaxy.fr.
It requires a GPU to run its prediction algorithm, which means you have to request access to GPU nodes.
Databases used by AlphaFold are available on every GPU node in /shared/bank/alphafold2. Several versions are installed there. The latest version of the databases is the one named current; it is the one to use with AlphaFold v2.3.2.
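To see which database versions are installed, the directory can be listed. The following is a minimal sketch: the function name list_db_versions is hypothetical, and it assumes that each version is a subdirectory (or a symbolic link to one) directly under /shared/bank/alphafold2.

```shell
#!/bin/bash
# Hypothetical helper: print each database version found under a root folder.
# Symbolic links (for example "current", if it is one) are shown with their target.
list_db_versions() {
    local root="${1:-/shared/bank/alphafold2}"
    local entry path
    for entry in "$root"/*/; do
        [ -d "$entry" ] || continue      # the glob matched nothing
        path="${entry%/}"                # drop the trailing slash
        if [ -L "$path" ]; then
            printf '%s -> %s\n' "$(basename "$path")" "$(readlink "$path")"
        else
            printf '%s\n' "$(basename "$path")"
        fi
    done
}
```

Calling list_db_versions without an argument lists the cluster path; another folder can be passed as the first argument.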
To run AlphaFold v2.3.2#
From the command line#
We assume you have an amino acid sequence in a FASTA file called my.fasta.
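Before choosing between the monomer and multimer examples below, it can help to check how many sequences the FASTA file contains. This is a minimal sketch with hypothetical function names; it assumes that a file with a single record is a monomer and a file with several records is a multimer.

```shell
#!/bin/bash
# Hypothetical helper: count the records in a FASTA file by counting
# header lines, i.e. lines that start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical helper: suggest a --model_preset value from the record count.
# Assumption: one record means a monomer, several records mean a multimer.
suggest_model_preset() {
    local n
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    n="$(count_fasta_records "$1")" || true   # grep exits 1 when the count is 0
    if [ "$n" -gt 1 ]; then
        echo "multimer"
    else
        echo "monomer_ptm"
    fi
}
```

For example, suggest_model_preset my.fasta prints the preset name to pass to --model_preset.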
Connect to the cluster login node through SSH#
ssh <login>@core.cluster.france-bioinformatique.fr
Load the AlphaFold2 module#
module load alphafold/2.3.2
Get Help#
To get the list of all the flags provided by AlphaFold2, just run:
run_alphafold.sh --helpfull
Run a batch job on the GPU node or start an interactive session#
Batch job approach
Create a my_fold.sh script based on the following example:
- For monomer
Replace test_monomer with the name of your FASTA file. For more information about the profiles to set in --gres, see here.
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_monomer
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0
module load alphafold/2.3.2
run_alphafold.sh \
--fasta_paths=test_monomer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--model_preset=monomer_ptm \
--models_to_relax=best \
--use_gpu_relax=true \
--max_template_date=2023-07-11 \
--use_precomputed_msas=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03
- For multimer
Replace test_multimer with the name of your FASTA file.
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_multimer
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0
module load alphafold/2.3.2
run_alphafold.sh \
--fasta_paths=test_multimer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--models_to_relax=best \
--use_gpu_relax=true \
--model_preset=multimer \
--max_template_date=2023-07-11 \
--num_multimer_predictions_per_model=5 \
--use_precomputed_msas=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 \
--uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta
Start your batch job:
sbatch my_fold.sh
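With the #SBATCH -o %x.o%j directive used in the scripts above, Slurm writes the job output to a file named after the job name (%x) and the job ID (%j). The sketch below, with hypothetical function names, shows how to recover the job ID from the sbatch message and build the matching log file name.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from the message that sbatch
# prints on submission ("Submitted batch job <id>"), read from stdin.
parse_job_id() {
    awk '/^Submitted batch job/ {print $4}'
}

# Hypothetical helper: build the log file name produced by "#SBATCH -o %x.o%j",
# where %x is the job name and %j is the job ID.
job_log_name() {
    local job_name="$1" job_id="$2"
    printf '%s.o%s\n' "$job_name" "$job_id"
}
```

For example, for a job named test_monomer with ID 12345, tail -f "$(job_log_name test_monomer 12345)" follows the log while the job runs.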
Interactive approach
Create an allocation for GPU resources according to your needs. For more information about the profiles to set in --gres, see here.
salloc -p gpu --gres=gpu:1g.5gb:1 --cpus-per-task=8 --mem=120G
- Run AlphaFold2 in monomer mode:
srun run_alphafold.sh \
--fasta_paths=test_monomer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--model_preset=monomer_ptm \
--models_to_relax=best \
--use_gpu_relax=true \
--max_template_date=2023-07-11 \
--use_precomputed_msas=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03
- Run AlphaFold2 in multimer mode:
srun run_alphafold.sh \
--fasta_paths=test_multimer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--models_to_relax=best \
--use_gpu_relax=true \
--model_preset=multimer \
--max_template_date=2023-07-11 \
--num_multimer_predictions_per_model=5 \
--use_precomputed_msas=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 \
--uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta
Do not forget to relinquish the allocation:
exit
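Slurm sets the SLURM_JOB_ID environment variable in the shell started by salloc, so it can be used to check whether an allocation is still held. A minimal sketch, with a hypothetical function name:

```shell
#!/bin/bash
# Hypothetical helper: report whether the current shell is inside a Slurm
# allocation, based on the SLURM_JOB_ID variable set by salloc.
allocation_status() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "inside allocation ${SLURM_JOB_ID}"
    else
        echo "no allocation"
    fi
}
```

Running allocation_status after exit should print "no allocation" once the resources have been released.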
To use AlphaFold2 from usegalaxy.fr#
- Connect to https://usegalaxy.fr/
- Choose AlphaFold 2 in the tool list
- Import your data, select your parameters and run your job
To run MassiveFold v1.0.0#
The same databases as AlphaFold v2.3.2 are required for MassiveFold v1.0.0. It can be run in the same way as AlphaFold v2.3.2, but additional flags can be used, as described here: https://github.com/GBLille/MassiveFold
Here is an example of a job file (batch job approach) for a monomer:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_monomer_MF
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0
module load massivefold/1.0.0
run_alphafold.sh \
--fasta_paths=test_monomer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--model_preset=monomer_ptm \
--max_template_date=2023-07-11 \
--use_precomputed_msas=false \
--num_predictions_per_model=1 \
--models_to_relax=best \
--use_gpu_relax=true \
--alignments_only=false \
--dropout=false \
--dropout_rates_filename= \
--max_recycles=3 \
--early_stop_tolerance=0.5 \
--bfd_max_hits=100000 \
--mgnify_max_hits=501 \
--uniprot_max_hits=50000 \
--uniref_max_hits=10000 \
--models_to_use= \
--start_prediction=1 \
--no_templates=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03
And here is an example for a multimer:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_multimer_MF
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0
module load massivefold/1.0.0
run_alphafold.sh \
--fasta_paths=test_multimer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--model_preset=multimer \
--max_template_date=2023-07-11 \
--use_precomputed_msas=false \
--num_predictions_per_model=5 \
--models_to_relax=best \
--use_gpu_relax=true \
--alignments_only=false \
--dropout=false \
--dropout_rates_filename= \
--max_recycles=20 \
--early_stop_tolerance=0.5 \
--bfd_max_hits=100000 \
--mgnify_max_hits=501 \
--uniprot_max_hits=50000 \
--uniref_max_hits=10000 \
--models_to_use= \
--start_prediction=1 \
--no_templates=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 \
--uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta
Start your batch job:
sbatch my_fold.sh
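The job scripts on this page repeat the same database prefix on every line, and the monomer and multimer variants differ only in a few database flags. As a sketch only, these flags could be derived from a single variable. The function name alphafold_db_flags and the variable DATA_DIR are hypothetical; the flags and paths are the ones used in the examples above.

```shell
#!/bin/bash
# Hypothetical helper: print the database flags for run_alphafold.sh,
# one per line, for a "monomer" or "multimer" run.
# DATA_DIR is an assumed variable; it defaults to the path used on this page.
alphafold_db_flags() {
    local mode="$1"
    local d="${DATA_DIR:-/shared/bank/alphafold2/current}"
    case "$mode" in
        monomer|multimer) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Flags shared by both modes.
    printf '%s\n' \
        "--data_dir=$d" \
        "--uniref90_database_path=$d/uniref90/uniref90.fasta" \
        "--mgnify_database_path=$d/mgnify/mgy_clusters_2022_05.fa" \
        "--template_mmcif_dir=$d/pdb_mmcif/mmcif_files" \
        "--obsolete_pdbs_path=$d/pdb_mmcif/obsolete.dat" \
        "--bfd_database_path=$d/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt" \
        "--uniref30_database_path=$d/uniref30/UniRef30_2021_03"
    # Mode-specific flags.
    if [ "$mode" = "monomer" ]; then
        printf '%s\n' "--pdb70_database_path=$d/pdb70/pdb70"
    else
        printf '%s\n' \
            "--pdb_seqres_database_path=$d/pdb_seqres/pdb_seqres.txt" \
            "--uniprot_database_path=$d/uniprot/uniprot.fasta"
    fi
}
```

In a job script, the output could be read into an array with mapfile -t db_flags < <(alphafold_db_flags monomer) and passed to run_alphafold.sh as "${db_flags[@]}" after the other options.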
To create plots#
pLDDT, predicted aligned error (PAE) and sequence coverage plots can optionally be created using these scripts: https://github.com/GBLille/MassiveFold/tree/main/MF_scripts/plots.
They were built from code extracted from DeepMind's and ColabFold's implementations.
Just run the MF_plots.py script after the structure prediction (at the end of the job file if not in interactive mode; both Python files have to be present in the running folder). Parameters are detailed with:
python MF_plots.py --help
"DM" stands for DeepMind's plots and "CF" for ColabFold's plots.
For example, run the commands:
python MF_plots.py --input_path ./jobname --plot_type one_for_all
-> group plots with values from all the top 10 predictions (default value of top_n_predictions parameter)
python MF_plots.py --input_path ./jobname --plot_type for_each --top_n_predictions 5
-> individual plots for each of the top 5 predictions
python MF_plots.py --input_path ./jobname --plot_type for_each --top_n_predictions 5 --chosen_plots CF_PAEs
-> mix plot_type by adding group PAE plot to the "for_each" individual ones
python MF_plots.py --input_path ./jobname --chosen_plots coverage,CF_PAEs
-> regardless of the plot type, plot alignment coverage and group PAE for top 10 predictions
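To plot several predictions in one go, the same command can be generated for each output folder. The sketch below only prints the commands (a dry run). The function name and the folder names are hypothetical, and the options are the ones shown above.

```shell
#!/bin/bash
# Hypothetical helper: print one MF_plots.py command per job folder.
# Nothing is executed here. Assumes folder names without spaces.
print_plot_commands() {
    local top_n="$1"
    shift
    local dir
    for dir in "$@"; do
        printf 'python MF_plots.py --input_path %s --plot_type for_each --top_n_predictions %s\n' \
            "$dir" "$top_n"
    done
}
```

For example, print_plot_commands 5 ./jobA ./jobB prints two commands, which can be reviewed and then piped to bash to run them.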