AlphaFold
DeepMind's AlphaFold is available on the IFB Core Cluster in several implementations:
- AlphaFold 3
- AlphaFold 2
- MassiveFold: an implementation that massively expands the sampling of structure predictions by improving the computing of AlphaFold-based predictions
AlphaFold can also be used on usegalaxy.fr.
AlphaFold 3#
AlphaFold 3 requires a GPU to run its prediction algorithm, which means you have to request access to GPU nodes.
The databases used by AlphaFold 3 are available on every GPU node in {{ no such element: dict object['path_bank_alphafold3'] }}.
# List the available versions
module avail alphafold/3*
# Load a specific version
module load alphafold/3.0.1
Example#
Input#
fold_input.json
{
"name": "2PV7",
"sequences": [
{
"protein": {
"id": ["A", "B"],
"sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
}
}
],
"modelSeeds": [1],
"dialect": "alphafold3",
"version": 1
}
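A malformed input file is typically only detected once the job has started, so it can help to check the JSON syntax before submitting. The sketch below is one way to do that. The `check_json` function name is hypothetical, and the sketch assumes `python3` is available where you run it. It checks JSON syntax only, not the AlphaFold 3 input schema.

```shell
#!/bin/bash
# Sketch: check that an AlphaFold 3 input file is syntactically valid JSON.
# check_json is a hypothetical helper; python3 is assumed to be installed.
check_json() {
    local json_file="$1"
    # json.tool exits with a non-zero status if the file is missing or invalid
    if python3 -m json.tool "$json_file" > /dev/null 2>&1; then
        echo "OK: $json_file is valid JSON"
    else
        echo "ERROR: $json_file is missing or not valid JSON" >&2
        return 1
    fi
}

# Example call, using the file name from the input above:
# check_json fold_input.json
```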
sbatch#
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:7g.40gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
DB_PATH={{ no such element: dict object['path_bank_alphafold3'] }}
INPUT_JSON=./fold_input.json
module load alphafold/3.0.1
run_alphafold.py \
--json_path=$INPUT_JSON \
--model_dir=$DB_PATH \
--db_dir=$DB_PATH \
--jackhmmer_n_cpu=$SLURM_CPUS_PER_TASK \
--output_dir=.
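Once the job file is saved, it is submitted with `sbatch`. The following is a minimal sketch of that step, assuming the script above was saved as `alphafold3.sbatch` (an assumed file name); `submit_job` is a hypothetical helper, not part of Slurm or AlphaFold.

```shell
#!/bin/bash
# Sketch: submit a job script and print its job ID.
# submit_job is a hypothetical helper written for this example.
submit_job() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "ERROR: job script '$script' not found" >&2
        return 1
    fi
    if ! command -v sbatch > /dev/null 2>&1; then
        echo "ERROR: sbatch not found; run this on a cluster login node" >&2
        return 2
    fi
    # --parsable makes sbatch print the job ID instead of a full sentence
    sbatch --parsable "$script"
}

# Example (alphafold3.sbatch is an assumed file name):
# job_id=$(submit_job alphafold3.sbatch) && squeue -j "$job_id"
```

The commented example then passes the job ID to `squeue -j` to check the state of the job in the queue.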
AlphaFold 2#
❌ Currently broken
AlphaFold 2 requires a GPU to run its prediction algorithm, which means you have to request access to GPU nodes.
The databases used by AlphaFold 2 are available on every GPU node in {{ no such element: dict object['path_bank_alphafold2'] }}.
From the command line#
We assume you have an amino acid sequence in a file called my.fasta.
☝️ How can I access a terminal 📺 to run the different commands?
- With Open On Demand: Shell Access
- With a SSH Client: Cluster/Log in
Load the AlphaFold2 module
# List the available versions
module avail alphafold/2*
# Load a specific version
module load alphafold/2.3.2
To get the list of all the flags provided by AlphaFold2, just run:
run_alphafold.sh --helpfull
For monomer#
Replace test_monomer with the name of your FASTA file. For more information about the profiles to set in --gres, see here.
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:7g.40gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
INPUT=test_monomer.fasta
BANK_PATH={{ no such element: dict object['path_bank_alphafold2'] }}
MAX_TEMPLATE_DATE=2023-07-11
module load alphafold/2.3.2
run_alphafold.sh \
--fasta_paths=$INPUT \
--db_preset=full_dbs \
--model_preset=monomer_ptm \
--models_to_relax=best \
--use_gpu_relax=true \
--use_precomputed_msas=false \
--max_template_date=$MAX_TEMPLATE_DATE \
--data_dir=$BANK_PATH \
--uniref90_database_path=$BANK_PATH/uniref90/uniref90.fasta \
--mgnify_database_path=$BANK_PATH/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=$BANK_PATH/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=$BANK_PATH/pdb_mmcif/obsolete.dat \
--bfd_database_path=$BANK_PATH/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb70_database_path=$BANK_PATH/pdb70/pdb70 \
--uniref30_database_path=$BANK_PATH/uniref30/UniRef30_2021_03 \
--output_dir=.
For multimer#
Replace test_multimer with the name of your FASTA file.
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:7g.40gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
INPUT=test_multimer.fasta
BANK_PATH={{ no such element: dict object['path_bank_alphafold2'] }}
MAX_TEMPLATE_DATE=2023-07-11
module load alphafold/2.3.2
run_alphafold.sh \
--fasta_paths=$INPUT \
--db_preset=full_dbs \
--models_to_relax=best \
--use_gpu_relax=true \
--model_preset=multimer \
--num_multimer_predictions_per_model=5 \
--use_precomputed_msas=false \
--max_template_date=$MAX_TEMPLATE_DATE \
--data_dir=$BANK_PATH \
--uniref90_database_path=$BANK_PATH/uniref90/uniref90.fasta \
--mgnify_database_path=$BANK_PATH/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=$BANK_PATH/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=$BANK_PATH/pdb_mmcif/obsolete.dat \
--bfd_database_path=$BANK_PATH/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb_seqres_database_path=$BANK_PATH/pdb_seqres/pdb_seqres.txt \
--uniref30_database_path=$BANK_PATH/uniref30/UniRef30_2021_03 \
--uniprot_database_path=$BANK_PATH/uniprot/uniprot.fasta \
--output_dir=.
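The monomer and multimer job files above differ mainly in `--model_preset` and in a few database flags. A multimer prediction expects a FASTA file with one record per chain, so counting the `>` header lines is a quick sanity check on which preset fits. The sketch below assumes that convention; `count_fasta_records` and `choose_model_preset` are hypothetical helpers, not part of AlphaFold.

```shell
#!/bin/bash
# Sketch: suggest a --model_preset value from the number of FASTA records.
# Both functions are hypothetical helpers written for this example.
count_fasta_records() {
    # grep -c prints the number of header lines; "|| true" keeps a
    # count of zero from being reported as a failure
    grep -c '^>' "$1" || true
}

choose_model_preset() {
    local fasta="$1"
    if [ ! -f "$fasta" ]; then
        echo "ERROR: FASTA file '$fasta' not found" >&2
        return 1
    fi
    local n
    n=$(count_fasta_records "$fasta")
    if [ "$n" -ge 2 ]; then
        echo "multimer"
    elif [ "$n" -eq 1 ]; then
        echo "monomer_ptm"
    else
        echo "ERROR: no FASTA records found in '$fasta'" >&2
        return 1
    fi
}

# Example call, using the file name from the job file above:
# choose_model_preset test_multimer.fasta
```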
To use AlphaFold2 from usegalaxy.fr#
- Connect to https://usegalaxy.fr/
- Choose AlphaFold 2 in the tool list
- Import your data, select your parameters and run your job
MassiveFold#
The same databases as AlphaFold 2 are required for MassiveFold. It can be run in the same way as AlphaFold 2, but additional flags can be used, as described here: https://github.com/GBLille/MassiveFold
Here is an example of a job file (batch job approach) for a monomer:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:7g.40gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
INPUT=test_monomer.fasta
BANK_PATH={{ no such element: dict object['path_bank_alphafold2'] }}
MAX_TEMPLATE_DATE=2023-07-11
module load massivefold/1.0.0
run_alphafold.sh \
--fasta_paths=$INPUT \
--db_preset=full_dbs \
--model_preset=monomer_ptm \
--use_precomputed_msas=false \
--num_predictions_per_model=1 \
--models_to_relax=best \
--use_gpu_relax=true \
--alignments_only=false \
--dropout=false \
--dropout_rates_filename= \
--max_recycles=3 \
--early_stop_tolerance=0.5 \
--bfd_max_hits=100000 \
--mgnify_max_hits=501 \
--uniprot_max_hits=50000 \
--uniref_max_hits=10000 \
--models_to_use= \
--start_prediction=1 \
--no_templates=false \
--max_template_date=$MAX_TEMPLATE_DATE \
--data_dir=$BANK_PATH \
--uniref90_database_path=$BANK_PATH/uniref90/uniref90.fasta \
--mgnify_database_path=$BANK_PATH/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=$BANK_PATH/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=$BANK_PATH/pdb_mmcif/obsolete.dat \
--bfd_database_path=$BANK_PATH/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb70_database_path=$BANK_PATH/pdb70/pdb70 \
--uniref30_database_path=$BANK_PATH/uniref30/UniRef30_2021_03 \
--output_dir=.
And here is an example for a multimer:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:7g.40gb:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G
INPUT=test_multimer.fasta
BANK_PATH={{ no such element: dict object['path_bank_alphafold2'] }}
MAX_TEMPLATE_DATE=2023-07-11
module load massivefold/1.0.0
run_alphafold.sh \
--fasta_paths=$INPUT \
--db_preset=full_dbs \
--model_preset=multimer \
--use_precomputed_msas=false \
--num_predictions_per_model=5 \
--models_to_relax=best \
--use_gpu_relax=true \
--alignments_only=false \
--dropout=false \
--dropout_rates_filename= \
--max_recycles=20 \
--early_stop_tolerance=0.5 \
--bfd_max_hits=100000 \
--mgnify_max_hits=501 \
--uniprot_max_hits=50000 \
--uniref_max_hits=10000 \
--models_to_use= \
--start_prediction=1 \
--no_templates=false \
--max_template_date=$MAX_TEMPLATE_DATE \
--data_dir=$BANK_PATH \
--uniref90_database_path=$BANK_PATH/uniref90/uniref90.fasta \
--mgnify_database_path=$BANK_PATH/mgnify/mgy_clusters_2022_05.fa \
--template_mmcif_dir=$BANK_PATH/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=$BANK_PATH/pdb_mmcif/obsolete.dat \
--bfd_database_path=$BANK_PATH/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb_seqres_database_path=$BANK_PATH/pdb_seqres/pdb_seqres.txt \
--uniref30_database_path=$BANK_PATH/uniref30/UniRef30_2021_03 \
--uniprot_database_path=$BANK_PATH/uniprot/uniprot.fasta \
--output_dir=.
To create plots#
pLDDT, predicted aligned error (PAE) and sequence coverage plots can optionally be created using the scripts at https://github.com/GBLille/MassiveFold/tree/main/MF_scripts/plots.
They were built by extracting code from DeepMind's and ColabFold's code bases.
Run the MF_plots.py script after the structure prediction (at the end of the job file if not in interactive mode); both Python files have to be present in the running folder. Parameters are detailed with:
python MF_plots.py --help
In plot names, the "DM" prefix stands for DeepMind's plots and the "CF" prefix for ColabFold's plots.
For example run the commands:
python MF_plots.py --input_path ./jobname --plot_type one_for_all
-> group plots with values from all the top 10 predictions (default value of top_n_predictions parameter)
python MF_plots.py --input_path ./jobname --plot_type for_each --top_n_predictions 5
-> individual plots for each of the top 5 predictions
python MF_plots.py --input_path ./jobname --plot_type for_each --top_n_predictions 5 --chosen_plots CF_PAEs
-> mix plot_type by adding group PAE plot to the "for_each" individual ones
python MF_plots.py --input_path ./jobname --chosen_plots coverage,CF_PAEs
-> regardless of the plot type, plot alignment coverage and group PAE for top 10 predictions
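When the plotting step is appended to a batch job file, a failed or misplaced prediction folder otherwise only shows up as a Python error in the job log. The sketch below guards the call with two checks first. `run_plots` and the `PYTHON` variable are hypothetical names introduced for this example; the `MF_plots.py` flags are the ones shown in the commands above.

```shell
#!/bin/bash
# Sketch: run MF_plots.py at the end of a job file, only if its inputs exist.
# run_plots and PYTHON are hypothetical names introduced for this example.
run_plots() {
    local input_path="$1"
    local python_bin="${PYTHON:-python}"
    # The plotting script must be present in the running folder
    if [ ! -f ./MF_plots.py ]; then
        echo "ERROR: MF_plots.py not found in $(pwd)" >&2
        return 1
    fi
    # The prediction output folder must exist before plotting
    if [ ! -d "$input_path" ]; then
        echo "ERROR: prediction folder '$input_path' not found" >&2
        return 1
    fi
    "$python_bin" MF_plots.py --input_path "$input_path" --plot_type one_for_all
}

# Example call, using the folder name from the commands above:
# run_plots ./jobname
```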