srun: Job step creation temporarily disabled, retrying #36

Open
donthuanalyst opened this issue Aug 19, 2021 · 2 comments

Comments

@donthuanalyst

Hello,

I am running Satsuma2 in the following way. The genome assemblies I am trying to align are 225 Mb in size.

SatsumaSynteny2 -q asm.v2.FINAL.fasta -t target_genome.fna -o ../satsuma2_result

Satsuma has been running for more than 2 days.

The last message it wrote to stdout, 2 days ago, was:

Waiting for seed pre-filters...

I found the following messages in one of the log files that the Satsuma2 slave jobs generated:

srun: mem < mem-per-cpu - resizing mem to be equal to mem-per-cpu
srun: Job step creation temporarily disabled, retrying

I am not sure whether Satsuma2 has run into trouble.
It submitted several child jobs on the cluster, and I cannot tell whether they are stuck or still working.
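
One way to tell whether the child jobs are waiting rather than working is to look for the retry message in their logs. The sketch below is only an illustration and not part of Satsuma2: the function name `count_stalled_logs` and the `*.log` glob are assumptions. It counts the log files in a directory that contain the `srun` retry message quoted above.

```shell
# Hypothetical helper (not part of Satsuma2): count the log files in a
# directory that contain the srun retry message quoted above.
count_stalled_logs() {
    local dir="$1"
    local pattern='Job step creation temporarily disabled'
    local count=0
    local f
    for f in "$dir"/*.log; do
        [ -e "$f" ] || continue      # the glob matched nothing
        if grep -q "$pattern" "$f"; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

For example, `count_stalled_logs ../satsuma2_result` prints the number of affected logs. A nonzero count typically means the job steps are waiting for resources inside the allocation rather than computing.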

I would greatly appreciate any feedback on how to make Satsuma2 run efficiently.

Thank you very much for your help.

@jonwright99
Contributor

Hi,
I think you should kill the jobs and restart Satsuma2. Started with the command above, Satsuma2 defaults to running KMatch (to find the initial matches) with 100 GB of memory and then spawns a single slave process with 100 GB of memory. If you have the resources available, specify more slaves and more memory: set -km_mem to 300 (300 GB of memory for KMatch), -slaves to 5, -threads to 5 and -sl_mem to 300 (300 GB). This will spawn 5 slave processes, each using 5 threads and 300 GB of memory. It won't hurt to increase these values to whatever you have available.
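
As a rough sketch of the invocation described above, the snippet below assembles the command line and prints it without running anything. The function name `build_satsuma_cmd` is hypothetical; the file names come from the original command, and the flags are the ones named in this reply.

```shell
# Dry run: assemble the suggested invocation and print it without executing it.
# Memory values are in GB, as described in the reply above.
build_satsuma_cmd() {
    local km_mem="$1" slaves="$2" threads="$3" sl_mem="$4"
    echo "SatsumaSynteny2 -q asm.v2.FINAL.fasta -t target_genome.fna" \
         "-o ../satsuma2_result" \
         "-km_mem $km_mem -slaves $slaves -threads $threads -sl_mem $sl_mem"
}

build_satsuma_cmd 300 5 5 300
```

With these values the slaves alone would ask for 5 x 300 GB, so the numbers should be scaled to what the cluster can actually provide.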

@donthuanalyst
Author

donthuanalyst commented Aug 26, 2021

Thank you so much for your response.

I tried to run Satsuma2 as you suggested:

#!/bin/bash
#SBATCH --cpus-per-task 10
#SBATCH --mem-per-cpu 30000
#SBATCH --partition=mpi
#SBATCH --nodelist=node082
#SBATCH --mail-user [email protected]
#SBATCH --mail-type BEGIN,END,FAIL
#SBATCH -J satsuma2

cd $SLURM_SUBMIT_DIR

cd ../results/satsuma2_results

module load gcc/9.2.0
module load satsuma2/37c5f38

export SATSUMA2_PATH=/cm/shared/apps/satsuma2/37c5f38/bin

SatsumaSynteny2 -q asm.v2.FINAL.fasta \
-t ref_genome.fna \
-o ../satsuma2_results \
-slaves 3 \
-threads 3 \
-sl_mem 300

Even though I explicitly requested only 3 slaves, it ran 11 KMatch jobs and produced the error messages below.

less satsuma2_results/KM17.log
srun: error: Unable to create job step: More processors requested than permitted
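
For reference, the resources requested by the batch script above can be worked out from its `#SBATCH` lines and compared with what the Satsuma2 flags ask for. The sketch below only does that arithmetic. It assumes Slurm's default unit of megabytes for `--mem-per-cpu` and that `-sl_mem` is given in gigabytes per slave, as described in the earlier reply. Whether the jobs Satsuma2 submits are bound by this allocation is not shown in the thread.

```shell
# Values copied from the batch script and command line above.
cpus_per_task=10
mem_per_cpu_mb=30000
slaves=3
threads=3
sl_mem_gb=300

# Total memory of the allocation, and totals implied by the slave settings.
alloc_mem_mb=$((cpus_per_task * mem_per_cpu_mb))
slave_threads=$((slaves * threads))
slave_mem_gb=$((slaves * sl_mem_gb))

echo "allocation: ${cpus_per_task} CPUs, ${alloc_mem_mb} MB"
echo "slaves requested: ${slave_threads} threads, ${slave_mem_gb} GB in total"
```

This prints an allocation of 10 CPUs and 300000 MB against 9 threads and 900 GB in total for the slaves, so under these assumptions the memory requested for the slaves is about three times what the batch job itself reserves.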

ERROR message in the stdout file

/cm/local/apps/slurm/var/spool/job1802343/slurm_script: line 24: 58869 Segmentation fault      SatsumaSynteny2 -q asm.v2.FINAL.fasta -t ref_genome.fna -o ../satsuma2_results -slaves 3 -threads 3 -sl_mem 300
[rdonthu@boqueron src]$ ll ../results/satsuma2_results/
total 520040
-rw-r--r-- 1 rdonthu tgiray       767 Aug 25 14:23 KM11.log
-rw-r--r-- 1 rdonthu tgiray       781 Aug 25 14:37 KM13.log
-rw-r--r-- 1 rdonthu tgiray       784 Aug 25 14:23 KM15.log
-rw-r--r-- 1 rdonthu tgiray        81 Aug 25 14:22 KM17.log
-rw-r--r-- 1 rdonthu tgiray        81 Aug 25 14:22 KM19.log
-rw-r--r-- 1 rdonthu tgiray        81 Aug 25 14:22 KM21.log
-rw-r--r-- 1 rdonthu tgiray        81 Aug 25 14:22 KM23.log
-rw-r--r-- 1 rdonthu tgiray        81 Aug 25 14:22 KM25.log
-rw-r--r-- 1 rdonthu tgiray        81 Aug 25 14:22 KM27.log
-rw-r--r-- 1 rdonthu tgiray       787 Aug 25 14:25 KM29.log
-rw-r--r-- 1 rdonthu tgiray       787 Aug 25 14:27 KM31.log
-rw-r--r-- 1 rdonthu tgiray     65448 Aug 25 14:23 kmatch_results.k11
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:23 kmatch_results.k11.finished
-rw-r--r-- 1 rdonthu tgiray 136398024 Aug 25 14:37 kmatch_results.k13
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:37 kmatch_results.k13.finished
-rw-r--r-- 1 rdonthu tgiray 231339816 Aug 25 14:23 kmatch_results.k15
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:23 kmatch_results.k15.finished
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:22 kmatch_results.k17.finished
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:22 kmatch_results.k19.finished
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:22 kmatch_results.k21.finished
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:22 kmatch_results.k23.finished
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:22 kmatch_results.k25.finished
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:22 kmatch_results.k27.finished
-rw-r--r-- 1 rdonthu tgiray  83956392 Aug 25 14:25 kmatch_results.k29
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:25 kmatch_results.k29.finished
-rw-r--r-- 1 rdonthu tgiray  80697816 Aug 25 14:27 kmatch_results.k31
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:27 kmatch_results.k31.finished
-rw-r--r-- 1 rdonthu tgiray         0 Aug 25 14:22 satsuma.log
-rw-r--r-- 1 rdonthu tgiray       321 Aug 25 14:22 slurm_tmp.sh
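
In the listing above, k=17 through k=27 have a `.finished` marker but no `kmatch_results.kNN` file, and their logs are only 81 bytes long. Since the seed pre-filter command shown in the stdout log joins `KMatch` and `touch` with `;`, the marker is created even when `KMatch` fails. The sketch below lists such incomplete runs. The function name `list_incomplete_kmatch` is hypothetical and the check is an assumption about what counts as a usable result, not something Satsuma2 does itself.

```shell
# Hypothetical helper: print the k values whose ".finished" marker exists
# but whose results file is missing or empty.
list_incomplete_kmatch() {
    local dir="$1"
    local marker results
    for marker in "$dir"/kmatch_results.k*.finished; do
        [ -e "$marker" ] || continue          # the glob matched nothing
        results="${marker%.finished}"         # drop the marker suffix
        if [ ! -s "$results" ]; then
            echo "${results##*.k}"            # keep only the k value
        fi
    done
}
```

Run against the directory listed above, this would be expected to report 17, 19, 21, 23, 25 and 27. That may be related to the crash right after `loading results for k=19`, although the thread does not confirm the cause.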

Full stdout file

SATSUMA: Welcome to SatsumaSynteny! Current date and time: 2021/08/25 14:22:35
Path for Satsuma2: '/cm/shared/apps/satsuma2/37c5f38/bin'
Executing SatsumaSynteny2
Setting up grid.
Preparing...
select=0        chunks=73417
chunks: 73417
select=0        chunks=55594
chunks: 55594
size x=3060 y=2317 size=7090020
Initializing multimatches...
Done.
SATSUMA: Acquiring seeds, date and time: 2021/08/25 14:22:47
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 11 ../satsuma2_results/kmatch_results.k11 11 10 1; touch ../satsuma2_results/kmatch_results.k11.finished
Submitted batch job 1802348
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 13 ../satsuma2_results/kmatch_results.k13 13 12 1; touch ../satsuma2_results/kmatch_results.k13.finished
Submitted batch job 1802349
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 15 ../satsuma2_results/kmatch_results.k15 15 14 1; touch ../satsuma2_results/kmatch_results.k15.finished
Submitted batch job 1802350
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 17 ../satsuma2_results/kmatch_results.k17 17 16 1; touch ../satsuma2_results/kmatch_results.k17.finished
Submitted batch job 1802351
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 19 ../satsuma2_results/kmatch_results.k19 19 18 1; touch ../satsuma2_results/kmatch_results.k19.finished
Submitted batch job 1802352
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 21 ../satsuma2_results/kmatch_results.k21 21 20 1; touch ../satsuma2_results/kmatch_results.k21.finished
Submitted batch job 1802353
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 23 ../satsuma2_results/kmatch_results.k23 23 22 1; touch ../satsuma2_results/kmatch_results.k23.finished
Submitted batch job 1802354
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 25 ../satsuma2_results/kmatch_results.k25 25 24 1; touch ../satsuma2_results/kmatch_results.k25.finished
Submitted batch job 1802355
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 27 ../satsuma2_results/kmatch_results.k27 27 26 1; touch ../satsuma2_results/kmatch_results.k27.finished
Submitted batch job 1802356
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 29 ../satsuma2_results/kmatch_results.k29 29 28 1; touch ../satsuma2_results/kmatch_results.k29.finished
Submitted batch job 1802357
Running seed pre-filter:
  /cm/shared/apps/satsuma2/37c5f38/bin/KMatch asm.v2.FINAL.fasta ref_genome.fna 31 ../satsuma2_results/kmatch_results.k31 31 30 1; touch ../satsuma2_results/kmatch_results.k31.finished
Submitted batch job 1802358
Waiting for seed pre-filters...
loading results for k=19
/cm/local/apps/slurm/var/spool/job1802343/slurm_script: line 24: 58869 Segmentation fault      SatsumaSynteny2 -q asm.v2.FINAL.fasta -t ref_genome.fna -o ../satsuma2_results -slaves 3 -threads 3 -sl_mem 300
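
The stdout above shows one seed pre-filter job per k value from 11 to 31 in steps of 2, which accounts for the 11 KMatch jobs. In this log the count appears to be independent of `-slaves`. The sketch below pairs each k value with its Slurm job ID by parsing a saved copy of that stdout. The function name `summarize_prefilters` is hypothetical, and the field positions are taken from the log lines shown here rather than from any documented format.

```shell
# Hypothetical helper: print "k job_id" pairs from a saved SatsumaSynteny2
# stdout file, based on the line layout seen in the log above.
summarize_prefilters() {
    awk '
        # Command line: <path>/KMatch <query> <target> <k> ...
        $1 ~ /KMatch$/ { k = $4 }
        # Slurm confirmation that follows each command line.
        /^Submitted batch job/ { print k, $4 }
    ' "$1"
}
```

Each reported job ID can then be inspected with Slurm's own tools, for example `sacct -j 1802351`, to see how the k=17 job ended.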
