Protein folding on Jean Zay
Several protein folding software packages are available at IDRIS.
AlphaFold and ColabFold work in two steps:
- Multiple sequence alignment
- Protein folding
The alignment step takes a long time and does not run on GPU. It is therefore recommended to run it outside of a GPU job so as not to waste GPU hours.
You can run it on the prepost partition and then reuse the results for the folding step.
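One possible way to chain the two steps is to submit the alignment job first and make the folding job depend on its success. The sketch below relies on the standard `sbatch` options `--parsable` and `--dependency=afterok`; the function name `submit_chain` is a hypothetical helper, not an IDRIS tool, and the script names passed to it are up to you.

```shell
#!/usr/bin/env bash
# Sketch: submit an alignment job, then a folding job that only starts
# if the alignment job finished successfully.
# submit_chain is a hypothetical helper, not an IDRIS tool.

submit_chain() {
    local align_script=$1
    local fold_script=$2
    local align_id fold_id

    # --parsable makes sbatch print only "jobid" (or "jobid;cluster").
    align_id=$(sbatch --parsable "$align_script") || return 1
    align_id=${align_id%%;*}

    # afterok: the folding job starts only once the alignment job
    # has completed with exit code 0.
    fold_id=$(sbatch --parsable --dependency="afterok:${align_id}" "$fold_script") || return 1
    fold_id=${fold_id%%;*}

    echo "alignment job: ${align_id}, folding job: ${fold_id}"
}
```

With the ColabFold scripts of this page, the call would be `submit_chain colab_align.slurm colab_fold.slurm`, run from a login node.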
Available versions
Version | On V100 | On A100 | On H100 |
alphafold/2.1.2 | ✅ | | |
alphafold/2.2.4 | ✅ | ✅ | |
alphafold/2.3.1 | ✅ | ✅ | |
alphafold/2.3.2 | ✅ | ✅ | |
alphafold/3.0.0 | ✅ | | |
alphafold/3.0.1 | ✅ | | |
Submission script example
For a monomer
- alphafold.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1                # Number of nodes
#SBATCH --ntasks-per-node=1      # Number of tasks per node
#SBATCH --cpus-per-task=10       # Number of OpenMP threads per task
#SBATCH --gpus-per-node=1        # Number of GPUs per node
#SBATCH --hint=nomultithread     # Disable hyperthreading
#SBATCH --job-name=alphafold     # Jobname
#SBATCH --output=%x.o%j          # Output file %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j           # Error file
#SBATCH --time=10:00:00          # Expected runtime HH:MM:SS (max 100h for V100, 20h for A100)
##
## Please, refer to comments below for
## more information about these 4 last options.
##SBATCH --account=<account>@gpu       # To specify gpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>       # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_gpu-dev             # Uncomment for job requiring less than 2 hours
##SBATCH --qos=qos_gpu-t4              # Uncomment for job requiring more than 20h (max 16 GPU, V100 only)

module purge
module load alphafold/2.2.4
export TMP=$JOBSCRATCH
export TMPDIR=$JOBSCRATCH

## In this example we do not let the structures relax with OpenMM
python3 $(which run_alphafold.py) \
    --output_dir=outputs \
    --uniref90_database_path=${DSDIR}/AlphaFold/uniref90/uniref90.fasta \
    --mgnify_database_path=${DSDIR}/AlphaFold/mgnify/mgy_clusters_2018_12.fa \
    --template_mmcif_dir=${DSDIR}/AlphaFold/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=${DSDIR}/AlphaFold/pdb_mmcif/obsolete.dat \
    --bfd_database_path=${DSDIR}/AlphaFold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniclust30_database_path=${DSDIR}/AlphaFold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --pdb70_database_path=${DSDIR}/AlphaFold/pdb70/pdb70 \
    --fasta_paths=test.fa \
    --max_template_date=2021-07-28 \
    --use_gpu_relax=False \
    --norun_relax \
    --data_dir=${DSDIR}/AlphaFold/model_parameters/2.2.4
For a multimer
Note: the FASTA file must contain the sequences of all the monomers that make up the multimer.
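As an illustration, a multimer input holds one FASTA record per chain. The sketch below writes a two-chain file and counts its records; the chain names and sequences are placeholders made up for this example, not real proteins. For a homo-multimer, the same sequence would simply be repeated once per copy.

```shell
#!/usr/bin/env bash
# Sketch: build a multimer FASTA file with one record per chain.
# Chain names and sequences are placeholders for illustration only.
fasta=test.fasta

cat > "$fasta" <<'EOF'
>chain_A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>chain_B
GSHMLEDPVAGTKLIWQ
EOF

# Each record starts with '>', so counting those lines gives the number of chains.
n_chains=$(grep -c '^>' "$fasta")
echo "${fasta} contains ${n_chains} chain(s)"
```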
- alphafold_multimer.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1                # Number of nodes
#SBATCH --ntasks-per-node=1      # Number of tasks per node
#SBATCH --cpus-per-task=10       # Number of OpenMP threads per task
#SBATCH --gpus-per-node=1        # Number of GPUs per node
#SBATCH --hint=nomultithread     # Disable hyperthreading
#SBATCH --job-name=alphafold     # Jobname
#SBATCH --output=%x.o%j          # Output file %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j           # Error file
#SBATCH --time=10:00:00          # Expected runtime HH:MM:SS (max 100h for V100, 20h for A100)
##
## Please, refer to comments below for
## more information about these 4 last options.
##SBATCH --account=<account>@gpu       # To specify gpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>       # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_gpu-dev             # Uncomment for job requiring less than 2 hours
##SBATCH --qos=qos_gpu-t4              # Uncomment for job requiring more than 20h (max 16 GPU, V100 only)

module purge
module load alphafold/2.2.4
export TMP=$JOBSCRATCH
export TMPDIR=$JOBSCRATCH

## In this example we let the structures relax with OpenMM
python3 $(which run_alphafold.py) \
    --output_dir=outputs \
    --uniref90_database_path=${DSDIR}/AlphaFold/uniref90/uniref90.fasta \
    --mgnify_database_path=${DSDIR}/AlphaFold/mgnify/mgy_clusters_2018_12.fa \
    --template_mmcif_dir=${DSDIR}/AlphaFold/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=${DSDIR}/AlphaFold/pdb_mmcif/obsolete.dat \
    --bfd_database_path=${DSDIR}/AlphaFold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb_seqres_database_path=${DSDIR}/AlphaFold/pdb_seqres/pdb_seqres.txt \
    --uniclust30_database_path=${DSDIR}/AlphaFold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --uniprot_database_path=${DSDIR}/AlphaFold/uniprot/uniprot.fasta \
    --use_gpu_relax \
    --model_preset=multimer \
    --fasta_paths=test.fasta \
    --max_template_date=2022-01-01 \
    --data_dir=${DSDIR}/AlphaFold/model_parameters/2.2.4
Advice for the alignment
ColabFold alignments are performed with MMseqs2. This software reads its database files in a way that is very inefficient on Spectrum Scale, the network file system of Jean Zay.
If you have a large number of sequences to align, you can copy the database into memory on a prepost node. This is not recommended for a small number of sequences, since the copy into memory can itself take a long time.
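Before copying, it may be worth checking that the database files fit in the memory-backed file system. The following sketch compares the approximate size of the files to copy with the space available at the destination. The helper names `needed_kb`, `avail_kb` and `fits_in` are hypothetical, and the sizes come from `du`, so they are estimates rather than exact byte counts.

```shell
#!/usr/bin/env bash
# Sketch: check that a set of files fits in a target file system
# (for example /dev/shm) before copying them there.
# The helper names are hypothetical.

# Total disk usage of the given files, in KiB.
# -L follows symbolic links, as a plain cp of the same files would.
needed_kb() {
    du -ckL "$@" | tail -n 1 | cut -f1
}

# Space available on the file system holding directory $1, in KiB.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# fits_in <target_dir> <files...>: succeeds if the files fit in target_dir.
fits_in() {
    local target=$1
    shift
    local need avail
    need=$(needed_kb "$@")
    avail=$(avail_kb "$target")
    echo "needed: ${need} KiB, available: ${avail} KiB"
    [ "$need" -lt "$avail" ]
}
```

In the colab_align.slurm script, a line such as `fits_in /dev/shm $DS/uniref30_2103_* $DS/colabfold_envdb_202108_* || exit 1` placed before the `cp` commands would stop the job early when the memory is insufficient.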
- colab_align.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1                # Number of nodes
#SBATCH --ntasks-per-node=1      # Number of tasks per node
#SBATCH --cpus-per-task=10       # Number of OpenMP threads per task
#SBATCH --hint=nomultithread     # Disable hyperthreading
#SBATCH --job-name=align_colabfold   # Jobname
#SBATCH --output=%x.o%j          # Output file %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j           # Error file
#SBATCH --time=10:00:00          # Expected runtime HH:MM:SS (max 20h)
#SBATCH --partition=prepost

DS=$DSDIR/ColabFold
DB=/dev/shm/ColabFold
input=test.fa

mkdir $DB
cp $DS/colabfold_envdb_202108_aln* \
   $DS/colabfold_envdb_202108_db.* \
   $DS/colabfold_envdb_202108_db_aln.* \
   $DS/colabfold_envdb_202108_db_h* \
   $DS/colabfold_envdb_202108_db_seq* \
   $DB
cp $DS/uniref30_2103_aln* \
   $DS/uniref30_2103_db.* \
   $DS/uniref30_2103_db_aln.* \
   $DS/uniref30_2103_db_h* \
   $DS/uniref30_2103_db_seq* \
   $DB
cp $DS/*.tsv $DB

module purge
module load colabfold/1.3.0
colabfold_search ${input} ${DB} results
Submission script example for folding
- colab_fold.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1                # Number of nodes
#SBATCH --ntasks-per-node=1      # Number of tasks per node
#SBATCH --cpus-per-task=10       # Number of OpenMP threads per task
#SBATCH --gpus-per-node=1        # Number of GPUs per node

module purge
module load colabfold/1.3.0
export TMP=$JOBSCRATCH
export TMPDIR=$JOBSCRATCH

## This script assumes that the results folder was generated with colabfold_search (see colab_align.slurm).
## We do not advise performing the alignment in the same job as the folding.
## The results of the folding will be stored in results_batch.
colabfold_batch --data=$DSDIR/ColabFold results results_batch
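Since the folding job consumes GPU hours, a cheap guard is to verify that the alignment step actually produced files before calling `colabfold_batch`. The sketch below assumes that the alignments are written as `.a3m` files directly in the `results` folder; this should be checked against the output of your `colabfold_search` version. `count_a3m` and `check_alignments` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: refuse to start the folding if the alignment folder is empty.
# Assumption: alignments are stored as *.a3m files directly in the folder.

# Print the number of .a3m files found in directory $1.
count_a3m() {
    find "$1" -maxdepth 1 -type f -name '*.a3m' | wc -l | tr -d ' '
}

# Succeed only if directory $1 exists and holds at least one alignment.
check_alignments() {
    local dir=$1
    local n
    [ -d "$dir" ] || { echo "missing folder: ${dir}" >&2; return 1; }
    n=$(count_a3m "$dir")
    [ "$n" -gt 0 ] || { echo "no .a3m file in ${dir}" >&2; return 1; }
    echo "${n} alignment(s) found in ${dir}"
}
```

In colab_fold.slurm, a line such as `check_alignments results || exit 1` could be placed just before the `colabfold_batch` command.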