AlphaFold2
DeepMind's AlphaFold2 is available on the IFB cluster from the command line by loading either:
- the alphafold/2.3.2 module, for DeepMind's version
- the massivefold/1.0.0 module, for an extended version of AlphaFold with more flags, which allows extensive sampling and the use of all versions of the neural network models. More details here: https://github.com/GBLille/MassiveFold
AlphaFold can also be used on usegalaxy.fr.
AlphaFold needs a GPU to run its prediction algorithm, so you have to request access to GPU nodes.
The databases used by AlphaFold are available on every GPU node in /shared/bank/alphafold2. Several versions are installed there. The latest version of the databases is named current, and it is the one to use with AlphaFold v2.3.2.
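To see which database versions are installed and which one current refers to, a listing such as the sketch below may help. It assumes that current is a symbolic link to a versioned directory, which this page does not state; list_db_versions is a hypothetical helper name, and plain ls -l on the directory gives similar information.

```shell
# List the entries of a database directory; for symbolic links, also show the target.
# Assumption: "current" is a symbolic link pointing to one of the installed versions.
list_db_versions() {
    local bank_dir="${1:-/shared/bank/alphafold2}"
    local entry
    for entry in "$bank_dir"/*; do
        # Skip broken links and the unexpanded pattern of an empty directory
        [ -e "$entry" ] || continue
        if [ -L "$entry" ]; then
            echo "$(basename "$entry") -> $(readlink "$entry")"
        else
            basename "$entry"
        fi
    done
}
```

On a GPU node, calling list_db_versions without argument would list /shared/bank/alphafold2.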
To run AlphaFold v2.3.2
From the command line
We assume you have an amino acid sequence in a FASTA file called my.fasta.
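Before submitting a job, it can be useful to check how many sequences the FASTA file contains, since AlphaFold's monomer presets expect a single sequence per file while the multimer preset is meant for files with several chains. The sketch below counts the header lines; count_fasta_records and suggest_model_preset are hypothetical helper names, and the suggested preset is only a starting point.

```shell
# Count the sequences in a FASTA file: every record starts with a ">" header line.
count_fasta_records() {
    local fasta="$1"
    # grep -c prints the number of matching lines; "|| true" hides the
    # non-zero exit status grep returns when nothing matches.
    grep -c '^>' "$fasta" 2>/dev/null || true
}

# Suggest a --model_preset value from the number of sequences.
suggest_model_preset() {
    local n
    n=$(count_fasta_records "$1")
    n=${n:-0}   # an unreadable file yields an empty count
    if [ "$n" -eq 1 ]; then
        echo "monomer_ptm"
    elif [ "$n" -gt 1 ]; then
        echo "multimer"
    else
        echo "no sequence found in $1" >&2
        return 1
    fi
}
```

For instance, suggest_model_preset my.fasta prints monomer_ptm when my.fasta holds exactly one sequence.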
Connect to the cluster login node through SSH:
ssh <login>@core.cluster.france-bioinformatique.fr
Load the AlphaFold2 module:
module load alphafold/2.3.2
Get help:
To list all the flags provided by AlphaFold2, run:
run_alphafold.sh --helpfull
Run a batch job on a GPU node or start an interactive session
Batch job approach
Create a my_fold.sh script based on the following example:
- For monomer
Replace test_monomer with the name of your FASTA file. For more information about the profiles to set in --gres, see here.
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_monomer
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0
module load alphafold/2.3.2
run_alphafold.sh \
--fasta_paths=test_monomer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--model_preset=monomer_ptm \
--models_to_relax=best \
--use_gpu_relax=true \
--max_template_date=2023-07-11 \
--use_precomputed_msas=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03
- For multimer
Replace test_multimer with the name of your FASTA file.
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_multimer
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0
module load alphafold/2.3.2
run_alphafold.sh \
--fasta_paths=test_multimer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--models_to_relax=best \
--use_gpu_relax=true \
--model_preset=multimer \
--max_template_date=2023-07-11 \
--num_multimer_predictions_per_model=5 \
--use_precomputed_msas=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 \
--uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta
Start your batch job:
sbatch my_fold.sh
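With the #SBATCH -o %x.o%j line of the example job files, Slurm writes the job output to a file named after the job name (%x) and the job ID (%j). The sketch below rebuilds that file name so the log can be followed once the job is submitted; slurm_log_name is a hypothetical helper, and the commented commands only make sense on the cluster.

```shell
# Build the log file name produced by "#SBATCH -o %x.o%j".
# $1: job name (the --job-name value); $2: job ID reported by sbatch.
slurm_log_name() {
    local job_name="$1"
    local job_id="$2"
    echo "${job_name}.o${job_id}"
}

# Possible use on the cluster (not executed here):
#   job_id=$(sbatch --parsable my_fold.sh)   # --parsable prints the job ID only
#   job_id=${job_id%%;*}                     # drop a ";cluster" suffix if present
#   squeue -j "$job_id"                      # check the job state
#   tail -f "$(slurm_log_name test_monomer "$job_id")"
```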
Interactive approach
Create an allocation of GPU resources according to your needs. For more information about the profiles to set in --gres, see here.
salloc -p gpu --gres=gpu:1g.5gb:1 --cpus-per-task=8 --mem=120G
- Run AlphaFold2 in monomer mode:
srun run_alphafold.sh \
--fasta_paths=test_monomer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--model_preset=monomer_ptm \
--models_to_relax=best \
--use_gpu_relax=true \
--max_template_date=2023-07-11 \
--use_precomputed_msas=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03
- Run AlphaFold2 in multimer mode:
srun run_alphafold.sh \
--fasta_paths=test_multimer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--models_to_relax=best \
--use_gpu_relax=true \
--model_preset=multimer \
--max_template_date=2023-07-11 \
--num_multimer_predictions_per_model=5 \
--use_precomputed_msas=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 \
--uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta
Do not forget to relinquish the allocation:
exit
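It is easy to lose track of whether an allocation is still held, because the shell started by salloc looks like any other shell. Slurm sets the SLURM_JOB_ID environment variable in that shell, so a small check such as the sketch below can tell; in_slurm_allocation is a hypothetical helper name.

```shell
# Report whether the current shell runs inside a Slurm allocation.
# Slurm exports SLURM_JOB_ID in the shell started by salloc.
in_slurm_allocation() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "inside allocation ${SLURM_JOB_ID}"
    else
        echo "no active allocation in this shell"
        return 1
    fi
}
```

If the function still reports an allocation after the prediction has finished, run exit to release it.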
To use AlphaFold2 from usegalaxy.fr:
- Connect to https://usegalaxy.fr/
- Choose AlphaFold 2 in the tool list
- Import your data, select your parameters and run your job
To run MassiveFold v1.0.0
The same databases as AlphaFold v2.3.2 are required for MassiveFold v1.0.0. It can be run in the same way as AlphaFold v2.3.2, but additional flags can be used, as described here: https://github.com/GBLille/MassiveFold
Here is an example of a job file (batch job approach) for a monomer:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_monomer_MF
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0
module load massivefold/1.0.0
run_alphafold.sh \
--fasta_paths=test_monomer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--model_preset=monomer_ptm \
--max_template_date=2023-07-11 \
--use_precomputed_msas=false \
--num_predictions_per_model=1 \
--models_to_relax=best \
--use_gpu_relax=true \
--alignments_only=false \
--dropout=false \
--dropout_rates_filename= \
--max_recycles=3 \
--early_stop_tolerance=0.5 \
--bfd_max_hits=100000 \
--mgnify_max_hits=501 \
--uniprot_max_hits=50000 \
--uniref_max_hits=10000 \
--models_to_use= \
--start_prediction=1 \
--no_templates=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03
And here is an example for a multimer:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:3g.20gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
#SBATCH --job-name=test_multimer_MF
#SBATCH -o %x.o%j
#SBATCH -t 24:0:0
module load massivefold/1.0.0
run_alphafold.sh \
--fasta_paths=test_multimer.fasta \
--output_dir=/shared/projects/<myproject> \
--data_dir=/shared/bank/alphafold2/current \
--db_preset=full_dbs \
--model_preset=multimer \
--max_template_date=2023-07-11 \
--use_precomputed_msas=false \
--num_predictions_per_model=5 \
--models_to_relax=best \
--use_gpu_relax=true \
--alignments_only=false \
--dropout=false \
--dropout_rates_filename= \
--max_recycles=20 \
--early_stop_tolerance=0.5 \
--bfd_max_hits=100000 \
--mgnify_max_hits=501 \
--uniprot_max_hits=50000 \
--uniref_max_hits=10000 \
--models_to_use= \
--start_prediction=1 \
--no_templates=false \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
--uniref30_database_path=/shared/bank/alphafold2/current/uniref30/UniRef30_2021_03 \
--uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta
Start your batch job:
sbatch my_fold.sh
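The job files on this page repeat the same database paths, which differ only by the preset: the monomer examples use pdb70, the multimer examples use pdb_seqres and uniprot. One way to avoid the repetition is to generate the flags from the database root, as in the sketch below. alphafold_db_flags is a hypothetical helper that is not provided by the alphafold or massivefold modules; the paths are the ones used in the examples.

```shell
# Print the database flags used in the example job files, one per line.
# $1: value of --model_preset; $2: database root (defaults to the "current" version).
alphafold_db_flags() {
    local preset="$1"
    local db="${2:-/shared/bank/alphafold2/current}"
    # Flags common to the monomer and multimer examples
    printf '%s\n' "--data_dir=$db"
    printf '%s\n' "--uniref90_database_path=$db/uniref90/uniref90.fasta"
    printf '%s\n' "--mgnify_database_path=$db/mgnify/mgy_clusters_2022_05.fa"
    printf '%s\n' "--template_mmcif_dir=$db/pdb_mmcif/mmcif_files"
    printf '%s\n' "--obsolete_pdbs_path=$db/pdb_mmcif/obsolete.dat"
    printf '%s\n' "--bfd_database_path=$db/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
    printf '%s\n' "--uniref30_database_path=$db/uniref30/UniRef30_2021_03"
    if [ "$preset" = "multimer" ]; then
        # Databases only used in the multimer examples
        printf '%s\n' "--pdb_seqres_database_path=$db/pdb_seqres/pdb_seqres.txt"
        printf '%s\n' "--uniprot_database_path=$db/uniprot/uniprot.fasta"
    else
        # Template database only used in the monomer examples
        printf '%s\n' "--pdb70_database_path=$db/pdb70/pdb70"
    fi
}

# Possible use in a bash job file (not executed here):
#   mapfile -t DB_FLAGS < <(alphafold_db_flags monomer_ptm)
#   run_alphafold.sh --fasta_paths=test_monomer.fasta <other flags> "${DB_FLAGS[@]}"
```

Keeping the root in one place also makes it simpler to point a job at another installed database version by passing it as the second argument.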
To create plots
pLDDT, predicted aligned error (PAE) and sequence coverage plots can optionally be created using these scripts: https://github.com/GBLille/MassiveFold/tree/main/MF_scripts/plots.
They were built from code extracted from DeepMind's and ColabFold's repositories.
Run the MF_plots.py script after the structure prediction (at the end of the job file if you are not in interactive mode). Both Python files have to be present in the working directory. The parameters are listed with:
python MF_plots.py --help
In the plot names, "DM" denotes DeepMind's plots and "CF" denotes ColabFold's plots.
For example, run the commands:
python MF_plots.py --input_path ./jobname --plot_type one_for_all
python MF_plots.py --input_path ./jobname --plot_type for_each --top_n_predictions 5
python MF_plots.py --input_path ./jobname --plot_type for_each --top_n_predictions 5 --chosen_plots CF_PAEs
python MF_plots.py --input_path ./jobname --chosen_plots coverage,CF_PAEs