Alphafold2
Alphafold2 (https://alphafold.ebi.ac.uk/) is available on the IFB cluster from the command line by loading the alphafold/2.2.3 module, in Jupyter using the Alphafold kernel, or on usegalaxy.fr.
Alphafold requires a GPU to run its prediction algorithm, so you need access to a GPU node.
The Alphafold databases are available on every GPU node in /shared/bank/alphafold2/current.
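Before submitting a long job, it may be worth checking that the database files used in the commands below are visible from the node you are on. The sketch below assumes a hypothetical helper, check_af_dbs, which tests only the paths that name a single file or directory. The pdb70, bfd and uniclust30 paths are left out because they are HH-suite prefixes for a family of index files rather than one file.

```shell
#!/bin/bash
# Hypothetical helper: report which AlphaFold database files are missing
# under a data directory (defaults to the path used on this page).
# Returns 0 when everything checked is present, 1 otherwise.
check_af_dbs() {
  local root="${1:-/shared/bank/alphafold2/current}"
  local missing=0
  local rel
  # Single-file or directory paths taken from the monomer example
  for rel in \
    uniref90/uniref90.fasta \
    mgnify/mgy_clusters_2018_12.fa \
    pdb_mmcif/mmcif_files \
    pdb_mmcif/obsolete.dat; do
    if [ ! -e "$root/$rel" ]; then
      echo "missing: $root/$rel"
      missing=1
    fi
  done
  return "$missing"
}
```

On a GPU node, `check_af_dbs && echo "databases found"` would then print nothing but the confirmation when all checked paths exist.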
Usage
To run Alphafold2 v2.2.3 from the command line
We assume you have a FASTA file of amino acid sequences called my.fasta.
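If you still need to create my.fasta, the sketch below writes a one-sequence file with a here-document and runs a rough sanity check. Both function names are hypothetical, the sequence is an arbitrary placeholder rather than a protein of interest, and the check is deliberately simple: a header line first, then only the 20 standard amino-acid letters plus X.

```shell
#!/bin/bash
# Hypothetical helper: write a one-sequence FASTA file
# (placeholder sequence, for illustration only)
write_example_fasta() {
  cat > "$1" <<'EOF'
>query
MKTAYIAKQRQISFVKSHFSRQ
EOF
}

# Hypothetical sanity check: the file is non-empty, starts with a ">" header,
# and every other line uses only the 20 standard amino-acid letters (plus X)
check_fasta() {
  local f="$1"
  [ -s "$f" ] || { echo "empty or missing: $f"; return 1; }
  head -n 1 "$f" | grep -q '^>' || { echo "no header line: $f"; return 1; }
  if grep -v '^>' "$f" | grep -q '[^ACDEFGHIKLMNPQRSTVWYX]'; then
    echo "unexpected characters in: $f"
    return 1
  fi
  return 0
}
```

For example: `write_example_fasta my.fasta && check_fasta my.fasta`.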
Connect to the cluster login node through SSH:
ssh <login>@core.cluster.france-bioinformatique.fr
Load the Alphafold2 module:
module load alphafold/2.2.3
Get help:
run_alphafold.sh --helpfull
Then either start an interactive session or submit a batch job on a GPU node:
Interactive approach
Create an allocation for GPU resources:
salloc -p gpu --gres=gpu:3g.20gb:1 --cpus-per-task=10 --mem=50G
Run Alphafold in monomer mode:
srun run_alphafold.sh \
--fasta_paths=my.fasta \
--output_dir=/shared/projects/<myproject> \
--model_preset=monomer \
--db_preset=full_dbs \
--data_dir=/shared/bank/alphafold2/current \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--max_template_date=2020-05-14 \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path=/shared/bank/alphafold2/current/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--use_gpu_relax=true
Run Alphafold in multimer mode:
srun run_alphafold.sh \
--fasta_paths=my.fasta \
--output_dir=/shared/projects/<myproject> \
--model_preset=multimer \
--db_preset=full_dbs \
--data_dir=/shared/bank/alphafold2/current \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2018_12.fa \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--max_template_date=2020-05-14 \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--pdb_seqres_database_path=/shared/bank/alphafold2/current/pdb_seqres/pdb_seqres.txt \
--uniprot_database_path=/shared/bank/alphafold2/current/uniprot/uniprot.fasta \
--uniclust30_database_path=/shared/bank/alphafold2/current/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--use_gpu_relax=true
Do not forget to relinquish the allocation:
exit
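Most of the flags in the two commands above only repeat the database root. The sketch below, built around a hypothetical af_args function, factors that root into a variable and prints the arguments for either mode, one per line, following the flag sets shown on this page. It only prints arguments and runs nothing; other presets such as monomer_ptm are not handled.

```shell
#!/bin/bash
# Hypothetical helper: print run_alphafold.sh arguments, one per line,
# matching the monomer and multimer examples on this page.
# Usage: af_args MODE FASTA OUTPUT_DIR   (MODE: monomer or multimer)
af_args() {
  local mode="$1" fasta="$2" out="$3"
  local db=/shared/bank/alphafold2/current
  # Flags shared by both modes
  local args=(
    "--fasta_paths=$fasta"
    "--output_dir=$out"
    "--model_preset=$mode"
    --db_preset=full_dbs
    "--data_dir=$db"
    "--uniref90_database_path=$db/uniref90/uniref90.fasta"
    "--mgnify_database_path=$db/mgnify/mgy_clusters_2018_12.fa"
    "--template_mmcif_dir=$db/pdb_mmcif/mmcif_files"
    --max_template_date=2020-05-14
    "--obsolete_pdbs_path=$db/pdb_mmcif/obsolete.dat"
    "--bfd_database_path=$db/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
    "--uniclust30_database_path=$db/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    --use_gpu_relax=true
  )
  case "$mode" in
    monomer)
      # The monomer example uses pdb70 for template search
      args+=("--pdb70_database_path=$db/pdb70/pdb70") ;;
    multimer)
      # The multimer example uses pdb_seqres and UniProt instead of pdb70
      args+=("--pdb_seqres_database_path=$db/pdb_seqres/pdb_seqres.txt"
             "--uniprot_database_path=$db/uniprot/uniprot.fasta") ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1 ;;
  esac
  printf '%s\n' "${args[@]}"
}
```

Inside the allocation, the arguments could then be passed to srun as follows (mapfile requires bash 4 or later):
mapfile -t args < <(af_args monomer my.fasta /shared/projects/<myproject>)
srun run_alphafold.sh "${args[@]}"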
Batch job approach:
Create a my_fold.sh script based on the following example:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:1g.5gb:1
#SBATCH --cpus-per-task=10
#SBATCH --mem=50G
module load alphafold/2.2.3
# Per-user scratch directory (braces keep "_alphafold" out of the variable name)
mkdir -p "/tmp/${USER}_alphafold"
srun run_alphafold.sh --fasta_paths=my.fasta \
--output_dir=/shared/projects/<myproject> \
--model_preset=monomer \
--db_preset=full_dbs \
--data_dir=/shared/bank/alphafold2/current \
--uniref90_database_path=/shared/bank/alphafold2/current/uniref90/uniref90.fasta \
--mgnify_database_path=/shared/bank/alphafold2/current/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path=/shared/bank/alphafold2/current/pdb70/pdb70 \
--template_mmcif_dir=/shared/bank/alphafold2/current/pdb_mmcif/mmcif_files \
--max_template_date=2020-05-14 \
--obsolete_pdbs_path=/shared/bank/alphafold2/current/pdb_mmcif/obsolete.dat \
--bfd_database_path=/shared/bank/alphafold2/current/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path=/shared/bank/alphafold2/current/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--use_gpu_relax=true
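- model_preset controls which model is used. Choose between monomer, monomer_casp14, monomer_ptm and multimer. Multimer needs a multi-sequence FASTA file and different database flags: it replaces --pdb70_database_path with --pdb_seqres_database_path and --uniprot_database_path, as in the multimer example above.
- db_preset controls the speed and quality of the MSA (multiple sequence alignment) search. Choose between reduced_dbs (fastest) and full_dbs (uses all genetic databases; slower, and the preset used in the examples above).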
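Because multimer mode expects one FASTA entry per chain, counting the header lines (those starting with >) is a quick way to pick between the two most common presets. A minimal sketch, assuming a hypothetical pick_model_preset helper that only distinguishes monomer from multimer (monomer_casp14 and monomer_ptm are not considered):

```shell
#!/bin/bash
# Hypothetical helper: print "multimer" when the FASTA file holds more than
# one sequence, "monomer" otherwise.
pick_model_preset() {
  [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }
  local n
  # grep -c prints 0 and exits 1 when there is no match; keep going anyway
  n=$(grep -c '^>' "$1") || true
  if [ "$n" -gt 1 ]; then
    echo multimer
  else
    echo monomer
  fi
}
```

It could be used directly in the command line, for example `--model_preset=$(pick_model_preset my.fasta)`.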
Submit your batch job:
sbatch my_fold.sh
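When the job has finished, the upstream AlphaFold documentation describes the results as a sub-directory of --output_dir named after the FASTA file without its extension, in which ranked_0.pdb is the model with the highest confidence. Assuming that layout, the hypothetical list_ranked_models helper below prints the ranked models in rank order (a plain ls would sort ranked_10.pdb before ranked_2.pdb).

```shell
#!/bin/bash
# Hypothetical helper: print ranked_N.pdb paths in rank order for one target.
# Usage: list_ranked_models OUTPUT_DIR FASTA_FILE
list_ranked_models() {
  local target
  target=$(basename "$2")
  target="${target%.*}"            # my.fasta -> my
  local dir="$1/$target"
  local i=0
  # Walk ranked_0.pdb, ranked_1.pdb, ... until the next index is absent
  while [ -e "$dir/ranked_$i.pdb" ]; do
    echo "$dir/ranked_$i.pdb"
    i=$((i + 1))
  done
}
```

For the examples on this page: `list_ranked_models /shared/projects/<myproject> my.fasta`.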
To use Alphafold2 from Jupyter:
- Connect to https://jupyterhub.cluster.france-bioinformatique.fr
- Choose the GPU profile and click on "Start server"
- Use the sample notebook AlphaFold.ipynb with the Alphafold kernel
To use Alphafold2 from UseGalaxy.fr:
- Connect to https://usegalaxy.fr/
- Choose Alphafold 2 in the tool list
- Import your data, select your parameters and execute your job