Protein folding on Jean Zay
IDRIS provides several protein folding packages.
Advice
Running AlphaFold and ColabFold is a two-step process:
- multiple sequence alignment
- folding of the protein
The sequence alignment step is fairly long and is not ported to GPU. It is best run outside of the GPU reservation so as not to waste compute hours.
One option is to run it on the prepost partition, then reuse its results for the folding step.
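The two steps above can be chained with a Slurm job dependency, so that the GPU folding job only starts once the alignment has finished successfully. This is a minimal sketch: the script names `align.slurm` and `fold.slurm` are placeholders for your own files.

```shell
# Submit the alignment job (e.g. on the prepost partition) and capture
# the line printed by sbatch: "Submitted batch job <id>".
align_out=$(sbatch align.slurm)
align_id=${align_out##* }        # keep the last word: the job ID

# Submit the GPU folding job; it stays pending until the alignment
# job completes successfully (afterok).
sbatch --dependency=afterok:${align_id} fold.slurm
```

If the alignment job fails, the dependent job remains pending with reason DependencyNeverSatisfied and can be cancelled with `scancel`.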
AlphaFold
Useful links
Available versions
- 2.3.1
- 2.2.4
- 2.1.2
Example submission script
AlphaFold 2.3.1
Monomer A100
- alphafold-2.3.1-A100.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1                # Number of nodes
#SBATCH --ntasks-per-node=1     # Number of tasks per node
#SBATCH --cpus-per-task=8       # Number of OpenMP threads per task
#SBATCH --gpus-per-node=1       # Number of GPUs per node
#SBATCH -C a100                 # Use A100 partition
#SBATCH --hint=nomultithread    # Disable hyperthreading
#SBATCH --job-name=alphafold    # Jobname
#SBATCH --output=%x.o%j         # Output file: %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j          # Error file
#SBATCH --time=10:00:00         # Expected runtime HH:MM:SS (max 20h)
##
## Please, refer to comments below for
## more information about these 3 last options.
##SBATCH --account=<account>@a100   # To specify gpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>    # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_gpu-dev          # Uncomment for job requiring less than 2 hours

module purge
module load cpuarch/amd
module load alphafold/2.3.1

export TMP=$JOBSCRATCH
export TMPDIR=$JOBSCRATCH

fafile=test.fa

python3 $(which run_alphafold.py) \
    --output_dir=outputs_${fafile} \
    --uniref90_database_path=${ALPHAFOLDDB}/uniref90/uniref90.fasta \
    --mgnify_database_path=${ALPHAFOLDDB}/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=${ALPHAFOLDDB}/pdb_mmcif \
    --obsolete_pdbs_path=${ALPHAFOLDDB}/pdb_mmcif/obsolete.dat \
    --bfd_database_path=${ALPHAFOLDDB}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb70_database_path=${ALPHAFOLDDB}/pdb70/pdb70 \
    --uniref30_database_path=${ALPHAFOLDDB}/uniref30/UniRef30_2021_03 \
    --use_gpu_relax \
    --model_preset=monomer \
    --fasta_paths=${fafile} \
    --max_template_date=2022-01-01 \
    --data_dir=${ALPHAFOLDDB}/model_parameters/2.3.1
For a monomer
- alphafold.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1                # Number of nodes
#SBATCH --ntasks-per-node=1     # Number of tasks per node
#SBATCH --cpus-per-task=10      # Number of OpenMP threads per task
#SBATCH --gpus-per-node=1       # Number of GPUs per node
#SBATCH --hint=nomultithread    # Disable hyperthreading
#SBATCH --job-name=alphafold    # Jobname
#SBATCH --output=%x.o%j         # Output file: %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j          # Error file
#SBATCH --time=10:00:00         # Expected runtime HH:MM:SS (max 100h)
##
## Please, refer to comments below for
## more information about these 4 last options.
##SBATCH --account=<account>@v100   # To specify gpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>    # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_gpu-dev          # Uncomment for job requiring less than 2 hours
##SBATCH --qos=qos_gpu-t4           # Uncomment for job requiring more than 20h (max 16 GPUs)

module purge
module load alphafold/2.2.4

export TMP=$JOBSCRATCH
export TMPDIR=$JOBSCRATCH

## In this example we do not let the structures relax with OpenMM
python3 $(which run_alphafold.py) \
    --output_dir=outputs \
    --uniref90_database_path=${DSDIR}/AlphaFold/uniref90/uniref90.fasta \
    --mgnify_database_path=${DSDIR}/AlphaFold/mgnify/mgy_clusters_2018_12.fa \
    --template_mmcif_dir=${DSDIR}/AlphaFold/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=${DSDIR}/AlphaFold/pdb_mmcif/obsolete.dat \
    --bfd_database_path=${DSDIR}/AlphaFold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniclust30_database_path=${DSDIR}/AlphaFold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --pdb70_database_path=${DSDIR}/AlphaFold/pdb70/pdb70 \
    --fasta_paths=test.fa \
    --max_template_date=2021-07-28 \
    --use_gpu_relax=False \
    --norun_relax \
    --data_dir=${DSDIR}/AlphaFold/model_parameters/2.2.4
For a multimer
Warning: the fasta file must contain the sequences of all the monomers.
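As a minimal illustration, a multimer input file simply lists each monomer as a separate FASTA record. The chain names and sequences below are arbitrary placeholders, not real proteins:

```shell
# Write a hypothetical two-chain input file: one FASTA record per monomer.
cat > test.fasta << 'EOF'
>chain_A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>chain_B
MADEEKLPPGWEKRMSRSSGRVYYFNHITNASQ
EOF

# Each '>' header starts one monomer; here there are two.
grep -c '^>' test.fasta
```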
- alphafold_multimer.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1                # Number of nodes
#SBATCH --ntasks-per-node=1     # Number of tasks per node
#SBATCH --cpus-per-task=10      # Number of OpenMP threads per task
#SBATCH --gpus-per-node=1       # Number of GPUs per node
#SBATCH --hint=nomultithread    # Disable hyperthreading
#SBATCH --job-name=alphafold    # Jobname
#SBATCH --output=%x.o%j         # Output file: %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j          # Error file
#SBATCH --time=10:00:00         # Expected runtime HH:MM:SS (max 100h for V100, 20h for A100)
##
## Please, refer to comments below for
## more information about these 4 last options.
##SBATCH --account=<account>@v100   # To specify gpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>    # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_gpu-dev          # Uncomment for job requiring less than 2 hours
##SBATCH --qos=qos_gpu-t4           # Uncomment for job requiring more than 20h (max 16 GPUs, V100 only)

module purge
module load alphafold/2.2.4

export TMP=$JOBSCRATCH
export TMPDIR=$JOBSCRATCH

## In this example we let the structures relax with OpenMM
python3 $(which run_alphafold.py) \
    --output_dir=outputs \
    --uniref90_database_path=${DSDIR}/AlphaFold/uniref90/uniref90.fasta \
    --mgnify_database_path=${DSDIR}/AlphaFold/mgnify/mgy_clusters_2018_12.fa \
    --template_mmcif_dir=${DSDIR}/AlphaFold/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=${DSDIR}/AlphaFold/pdb_mmcif/obsolete.dat \
    --bfd_database_path=${DSDIR}/AlphaFold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb_seqres_database_path=${DSDIR}/AlphaFold/pdb_seqres/pdb_seqres.txt \
    --uniclust30_database_path=${DSDIR}/AlphaFold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --uniprot_database_path=${DSDIR}/AlphaFold/uniprot/uniprot.fasta \
    --use_gpu_relax \
    --model_preset=multimer \
    --fasta_paths=test.fasta \
    --max_template_date=2022-01-01 \
    --data_dir=${DSDIR}/AlphaFold/model_parameters/2.2.4
ColabFold
Useful links
Advice for the alignment step
The software used for the alignment step is MMseqs2. The way it reads the database files is very inefficient on the Spectrum Scale shared file system used at IDRIS.
If you have a large number of sequences to fold, you can copy the database into the RAM of a prepost node to speed up the computation. This is not worthwhile if you have fewer than 20 sequences.
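Before copying, it may be worth checking that the database actually fits in the node's RAM disk. This is a sketch only, assuming GNU coreutils and the `$DSDIR/ColabFold` layout used in the script below:

```shell
# Compare the total size of the ColabFold database with the free space
# in /dev/shm (both in bytes) before copying.
db_size=$(du -sb "$DSDIR/ColabFold" | cut -f1)
shm_free=$(df -B1 --output=avail /dev/shm | tail -n 1)

if [ "$db_size" -lt "$shm_free" ]; then
    echo "Database fits in /dev/shm"
else
    echo "Not enough room in /dev/shm" >&2
fi
```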
- colab_align.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1                # Number of nodes
#SBATCH --ntasks-per-node=1     # Number of tasks per node
#SBATCH --cpus-per-task=10      # Number of OpenMP threads per task
#SBATCH --hint=nomultithread    # Disable hyperthreading
#SBATCH --job-name=align_colabfold  # Jobname
#SBATCH --output=%x.o%j         # Output file: %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j          # Error file
#SBATCH --time=10:00:00         # Expected runtime HH:MM:SS (max 20h)
#SBATCH --partition=prepost

DS=$DSDIR/ColabFold
DB=/dev/shm/ColabFold
input=test.fa

mkdir $DB
cp $DS/colabfold_envdb_202108_aln* $DS/colabfold_envdb_202108_db.* $DS/colabfold_envdb_202108_db_aln.* $DS/colabfold_envdb_202108_db_h* $DS/colabfold_envdb_202108_db_seq* $DB
cp $DS/uniref30_2103_aln* $DS/uniref30_2103_db.* $DS/uniref30_2103_db_aln.* $DS/uniref30_2103_db_h* $DS/uniref30_2103_db_seq* $DB
cp $DS/*.tsv $DB

module purge
module load colabfold/1.3.0

colabfold_search ${input} ${DB} results
Example submission script for the folding step
- colab_fold.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1                # Number of nodes
#SBATCH --ntasks-per-node=1     # Number of tasks per node
#SBATCH --cpus-per-task=10      # Number of OpenMP threads per task
#SBATCH --gpus-per-node=1       # Number of GPUs per node

module purge
module load colabfold/1.3.0

export TMP=$JOBSCRATCH
export TMPDIR=$JOBSCRATCH

## This script works if you generated the results folder with colabfold_search
## We advise against performing the alignment in the same job as the folding.
## The results of the folding will be stored in results_batch.
colabfold_batch --data=$DSDIR/ColabFold results results_batch