can't get anno block #10

Open
sidizhao opened this issue Feb 9, 2023 · 27 comments
Comments

@sidizhao

sidizhao commented Feb 9, 2023

Hi there,

I've been using isoCirc successfully for a while, but this week, after installing v1.0.6, it started failing towards the end of the run. I get the isocirc.bed output but not isocirc.out. Here's the full log:

Matplotlib created a temporary config/cache directory at /tmp/977627.tmpdir/matplotlib-z2gis1si because the default path (/home/s.zhao/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
== 06:19:32-Feb-09-2023 == [check_dependencies] Checking dependencies ...
== 06:19:32-Feb-09-2023 == [check_dependencies] Checking dependencies done!
== 06:19:33-Feb-09-2023 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF ...
== 06:19:33-Feb-09-2023 == [Tandem Repeats Finder] trf409.legacylinux64 /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/722_primary_pacbio_long_corrected.0.fa 2 7 7 80 10 100 2000 -h -ngs > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/trf.out
== 14:35:27-Feb-09-2023 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF done!
== 14:35:27-Feb-09-2023 == [Mapping] Mapping consensus sequence to genome ...
== 14:35:27-Feb-09-2023 == [Mapping] minimap2 -ax splice -ub --MD --eqx /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/hct116_pacbio/annotation/all-chrs.fa /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/cons.fa -t 1 > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/cons.fa.sam
[M::mm_idx_gen::135.083*0.90] collected minimizers
[M::mm_idx_gen::238.320*0.89] sorted minimizers
[M::main::238.427*0.89] loaded/built the index for 455 target sequence(s)
[M::mm_mapopt_update::243.515*0.89] mid_occ = 792
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 455
[M::mm_idx_stat::245.930*0.89] distinct minimizers: 167291034 (34.68% are singletons); average occurrences: 6.239; average spacing: 3.075
[M::worker_pipeline::1077.080*0.85] mapped 94741 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax splice -ub --MD --eqx -t 1 /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/hct116_pacbio/annotation/all-chrs.fa /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/cons.fa
[M::main] Real time: 1079.441 sec; CPU: 913.986 sec; Peak RSS: 18.981 GB
== 14:53:27-Feb-09-2023 == [Mapping] Mapping consensus sequence to genome done!
== 14:53:27-Feb-09-2023 == [Classifying] Classifying consensus alignment ...
== 14:53:27-Feb-09-2023 == [classify_bam_core] Processing /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/cons.fa.sam ...
== 14:53:31-Feb-09-2023 == [classify_bam_core] 100000 BAM records done ...
== 14:53:33-Feb-09-2023 == [classify_bam_core] Processing /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/cons.fa.sam done.
== 14:53:33-Feb-09-2023 == [Classifying] Classifying consensus alignment done!
== 14:53:35-Feb-09-2023 == [gtfToGenePred] gtfToGenePred -ignoreGroupsWithoutExons /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.gene_pred
== 14:55:19-Feb-09-2023 == [genePredToBed] genePredToBed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.gene_pred /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed
== 14:56:10-Feb-09-2023 == [get_transcript_from_gtf] Loading transcript from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf ...
== 14:56:31-Feb-09-2023 == [get_transcript_from_gtf] Loading transcript from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf done!
== 14:56:31-Feb-09-2023 == [get_splice_site_from_bed12] Loading splice site from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed ...
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/high.bam'
== 14:56:41-Feb-09-2023 == [get_splice_site_from_bed12] Loading splice site from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed done!
== 14:56:41-Feb-09-2023 == [get_splice_junction_from_bed12] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed ...
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/high.bam'
== 14:56:49-Feb-09-2023 == [get_splice_junction_from_bed12] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed done!
== 14:56:49-Feb-09-2023 == [get_exon_from_bed12] Loading exon from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed ...
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/high.bam'
== 14:56:57-Feb-09-2023 == [get_exon_from_bed12] Loading exon from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed done!
== 14:56:57-Feb-09-2023 == [get_back_splice_junction_from_bed] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/patient_722_short_read_annotation.bed ...
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/high.bam'
== 14:56:57-Feb-09-2023 == [get_back_splice_junction_from_bed] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/patient_722_short_read_annotation.bed done!
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/high.bam'
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/low.bam'
== 14:56:58-Feb-09-2023 == [read_wise_eval] Generating read-wise evaluation result ...
== 14:58:02-Feb-09-2023 == [read_wise_eval] Generating read-wise evaluation result done!
== 14:58:02-Feb-09-2023 == [filter_circRNA_read] Filtering back-splice-junctions ...
== 14:58:03-Feb-09-2023 == [filter_circRNA_read] Filtering back-splice-junctions done!
== 14:58:03-Feb-09-2023 == [rescue_reads] Rescuing reads using reliable back-splice-junctions ...
== 14:58:03-Feb-09-2023 == [rescue_reads] Rescuing reads using reliable back-splice-junctions done!
== 14:58:03-Feb-09-2023 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result ...
== 14:58:03-Feb-09-2023 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result done!
== 14:58:13-Feb-09-2023 == [bed2exonGtf] bed2exonGtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf
== 14:58:18-Feb-09-2023 == [exonGtf] awk -v OFS="\t" '($3=="exon"){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.exon.gtf
== 14:58:29-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="gene"){print $1,$4-1,$5}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.gene.bed
== 14:58:43-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="CDS"){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.cds.gtf
== 14:58:58-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="UTR" || $3=="five_prime_utr" || $3=="three_prime_utr"){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.utr.gtf
== 14:59:11-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "lincRNA"/ || $0 ~ /gene_type "lincRNA"/)){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.lincRNA.gtf
== 14:59:29-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "antisense"/ || $0 ~ /gene_type "antisense"/)){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.antisense.gtf
== 14:59:51-Feb-09-2023 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "rRNA"/ || $0 ~ /gene_type "rRNA"/)){print}' /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.rRNA.gtf
== 15:01:14-Feb-09-2023 == [bed2exonGtf] bed2exonGtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.five.site.bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.five.site.exon.gtf
== 15:01:23-Feb-09-2023 == [bed2exonGtf] bed2exonGtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.three.site.bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.three.site.exon.gtf
== 15:01:26-Feb-09-2023 == [bed2exonGtf] bed2exonGtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.five.site.bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.five.site.exon.gtf
== 15:01:47-Feb-09-2023 == [bed2exonGtf] bed2exonGtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.three.site.bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.three.site.exon.gtf
== 15:02:13-Feb-09-2023 == [itst_gtf_gtf] itst_gtf_gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.five.site.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.five.site.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.five.site.gene.out
== 15:02:18-Feb-09-2023 == [itst_gtf_gtf] itst_gtf_gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.three.site.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.three.site.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.three.site.gene.out
== 15:02:22-Feb-09-2023 == [gtf2gene] gtf2gene /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.ovlp.gene.out
== 15:02:46-Feb-09-2023 == [itst_gtf_bed] itst_gtf_bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.cds.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.CDS.out
== 15:02:52-Feb-09-2023 == [itst_gtf_bed] itst_gtf_bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.utr.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.UTR.out
== 15:02:53-Feb-09-2023 == [itst_gtf_bed] itst_gtf_bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.lincRNA.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.lincRNA.out
== 15:02:54-Feb-09-2023 == [itst_gtf_bed] itst_gtf_bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.antisense.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.antisense.out
== 15:02:55-Feb-09-2023 == [itst_gtf_bed] itst_gtf_bed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.rRNA.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.rRNA.out
== 15:02:56-Feb-09-2023 == [itst_intron] bedtools intersect -v -a /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf -b /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.bed -split > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.intron.out
== 15:02:58-Feb-09-2023 == [itst_intergenic] bedtools intersect -v -a /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf -b /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.gene.bed > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.intergenic.out
== 15:02:59-Feb-09-2023 == [itst_exon] bedtools intersect -a /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.gtf -b /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/hg38_with_maher_lab_lncrna.gtf.exon.gtf -wa -wb > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.out
== 15:03:08-Feb-09-2023 == [get_block_anno] No "exon_number" found in record.

I checked the logs from my previous successful runs and there was no [get_block_anno] step in them; they go straight from [itst_exon] to [output_isoform_eval]. My GTF file does have exon_number and exon_id for most of the transcripts. Do I need to clean up the GTF further? It used to work fine.

Here are a few lines:

chr1 ensGene exon 11869 12227 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; exon_id "ENST00000456328.1"; gene_name "ENSG00000223972";
chr1 ensGene exon 12613 12721 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; exon_id "ENST00000456328.2"; gene_name "ENSG00000223972";
chr1 ensGene exon 13221 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; exon_id "ENST00000456328.3"; gene_name "ENSG00000223972";

Also, I noticed that some of the [gtf2bed] steps match on "gene_type" or "gene_biotype". Should the GTF file include biotypes as well?

@yangao07
Collaborator

yangao07 commented Feb 9, 2023

Based on this log, isoCirc is trying to get "exon_number" from "/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_matched_patients/722_primary_pacbio/isocirc_output_short_read/output_0/isocirc.bed.exon.out", which is expected to look like this:

chr16	isocirc	exon	66625	66738	.	+	.	gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1";	chr16	havana	exon	66537	66738	.	-	.	gene_id "ENSG00000234769"; gene_version "4"; transcript_id "ENST00000326592"; transcript_version "9"; exon_number "6"; gene_name "WASH4P"; gene_source "havana"; gene_biotype "protein_coding"; transcript_name "WASH4P-001"; transcript_source "havana"; transcript_biotype "protein_coding"; havana_transcript "OTTHUMT00000133175"; havana_transcript_version "2"; exon_id "ENSE00001686309"; exon_version "1"; tag "basic";

As for gene_type, it is not required.
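
A minimal sketch for checking that file by hand (this is not isoCirc's own parsing code). It assumes the file is the tab-separated output of `bedtools intersect -wa -wb` on two 9-column GTF files, as in the [itst_exon] command in the log, so each line has 18 columns and column 18 holds the annotation attributes. The function names are hypothetical.

```python
import re
import sys

# Matches exon_number "6" as well as the unquoted form exon_number 6.
EXON_NUMBER = re.compile(r'exon_number "?(\d+)"?')


def annotation_attributes(line):
    """Return column 18 (attributes of the annotation record), or None.

    Assumption: two 9-column GTF records joined by tabs on one line.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[17] if len(fields) >= 18 else None


def lines_missing_exon_number(lines):
    """Yield (line_number, line) where the annotation half has no exon_number."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        attrs = annotation_attributes(line)
        if attrs is None or EXON_NUMBER.search(attrs) is None:
            yield number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_exon_out.py isocirc.bed.exon.out
    with open(sys.argv[1]) as handle:
        for number, line in lines_missing_exon_number(handle):
            print(f"{number}\t{line}")
```

Any line it prints is a candidate for the "No "exon_number" found in record" error, and the second column of the annotation half shows which source the record came from.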

@sidizhao
Author

sidizhao commented Feb 9, 2023

Here's that file:

output_0$ head isocirc.bed.exon.out
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198939 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000258229"; exon_number "20"; exon_id "ENST00000258229.20"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198939 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000462233"; exon_number "19"; exon_id "ENST00000462233.19"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198939 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000475463"; exon_number "8"; exon_id "ENST00000475463.8"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198939 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000488780"; exon_number "7"; exon_id "ENST00000488780.7"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198939 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000430153"; exon_number "7"; exon_id "ENST00000430153.7"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233198967 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000518351"; exon_number "1"; exon_id "ENST00000518351.1"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 ensGene exon 233199020 233199030 . - . gene_id "ENSG00000135749"; transcript_id "ENST00000517808"; exon_number "1"; exon_id "ENST00000517808.1"; gene_name "ENSG00000135749";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 knownGene exon 233198939 233199030 . - . gene_id "A6NKB5"; transcript_id "ENST00000258229.14"; exon_number "20"; exon_id "ENST00000258229.14.20"; gene_name "A6NKB5";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 knownGene exon 233198939 233199030 . - . gene_id "H0YB15"; transcript_id "ENST00000462233.5"; exon_number "19"; exon_id "ENST00000462233.5.19"; gene_name "H0YB15";
chr1 isocirc exon 233198939 233199030 . - . gene_id "isocirc0"; transcript_id "isocirc0"; exon_number "1"; exon_id "isocirc0.1"; chr1 knownGene exon 233198939 233199030 . - . gene_id "H0YBF4"; transcript_id "ENST00000475463.6"; exon_number "8"; exon_id "ENST00000475463.6.8"; gene_name "H0YBF4";

@yangao07
Collaborator

yangao07 commented Feb 9, 2023

Also, you mentioned this ran successfully until the recent update to v1.0.6.
That is odd, because nothing related to this part has changed.

@sidizhao
Author

sidizhao commented Feb 9, 2023

I did concatenate more custom entries to the GTF I used for this run, and I checked that those entries have exon_number and exon_id as well. I am quite confused too.

@yangao07
Collaborator

yangao07 commented Feb 9, 2023

I see, but there must be some lines without "exon_number" that are causing this error.
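
One way to find such lines, as a rough sketch: scan the annotation GTF and report every exon record whose attribute column has no exon_number. It assumes a standard tab-separated, 9-column GTF; the function name is hypothetical.

```python
import sys
from collections import Counter


def exons_missing_exon_number(lines):
    """Yield (line_number, source, attributes) for exon records lacking exon_number."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon records matter here; gene/transcript lines are skipped.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        if "exon_number" not in fields[8]:
            yield number, fields[1], fields[8]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python find_unnumbered_exons.py annotation.gtf
    per_source = Counter()
    with open(sys.argv[1]) as handle:
        for number, source, attrs in exons_missing_exon_number(handle):
            per_source[source] += 1
            print(f"line {number}\t{source}\t{attrs}")
    # Summary by GTF source column, e.g. to spot which merged set is affected.
    for source, count in per_source.items():
        print(f"{source}: {count} exon record(s) without exon_number", file=sys.stderr)
```

Grouping the hits by the source column makes it easy to see whether only the newly concatenated entries are affected.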

@sidizhao
Author

sidizhao commented Feb 9, 2023

I think I know where the problem is. I looked at the same circRNA, isocirc1, which was detected in both the old run and the new run. From the old isocirc.out:

isocirc1 chr10 34422057 34422259 NA NA NA 1 202 0 202 N NA False,False NA NA NA False False True +GT/AG True NNC FSM NA NA NA NA NA NA NA NA NA 1 m64043_220730_094118/30345339/ccs

In this case, it seems the old run didn't get a successful annotation but wrote the output file anyway. Is there a particular reason why the new run isn't doing the same? Looking at the intermediate files of the new run:

$ cat isocirc.bed.ovlp.gene.out
isocirc1 G009115 G009115 +

I think G009115 is one of the newer entries I added, which means the old run wasn't recognizing it. Searching the new annotation:

$ grep G009115 hg38_with_maher_lab_lncrna.gtf
chr10 mitranscriptome gene 34417023 34459184 . + . gene_id "G009115"; gene_name "Unknown"
chr10 mitranscriptome transcript 34417023 34436597 . + . transcript_id "T039819"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"
chr10 mitranscriptome exon 34417023 34417308 . + . transcript_id "T039819"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"
chr10 mitranscriptome transcript 34417023 34459184 . + . transcript_id "T039820"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"
chr10 mitranscriptome exon 34417023 34417308 . + . transcript_id "T039820"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"
chr10 mitranscriptome exon 34435248 34436597 . + . transcript_id "T039819"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"
chr10 mitranscriptome exon 34458756 34459184 . + . transcript_id "T039820"; gene_id "G009115"; transcript_name "Unknown"; gene_name "Unknown"

Do you think adding exon_number and exon_id to these transcripts will fix the problem?

@yangao07
Collaborator

yangao07 commented Feb 9, 2023

Yes, you should try that.
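
For anyone hitting the same error, here is a hedged sketch of one way to add the missing attributes. It is not part of isoCirc. Assumptions: exons are numbered in transcription order (ascending start on the + strand, descending on the - strand), exon_id is written as `<transcript_id>.<exon_number>` to mirror the ensGene lines shown earlier in this thread, and a transcript either has exon_number on all of its exons or on none. The function name is hypothetical.

```python
import re
import sys
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def add_exon_numbers(lines):
    """Return GTF lines with exon_number/exon_id appended to exons that lack them."""
    records = [line.rstrip("\n") for line in lines]
    exons = defaultdict(list)  # transcript_id -> [(start, strand, record index)]
    for index, record in enumerate(records):
        fields = record.split("\t")
        if record.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match and "exon_number" not in fields[8]:
            exons[match.group(1)].append((int(fields[3]), fields[6], index))
    for transcript_id, items in exons.items():
        # Assumption: number exons 5' to 3', so reverse the order on the minus strand.
        reverse = items[0][1] == "-"
        for rank, (_, _, index) in enumerate(sorted(items, reverse=reverse), start=1):
            fields = records[index].split("\t")
            attrs = fields[8].rstrip().rstrip(";")
            fields[8] = f'{attrs}; exon_number "{rank}"; exon_id "{transcript_id}.{rank}";'
            records[index] = "\t".join(fields)
    return records


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python add_exon_numbers.py annotation.gtf > annotation.numbered.gtf
    with open(sys.argv[1]) as handle:
        for record in add_exon_numbers(handle):
            print(record)
```

Records are grouped by transcript_id across the whole file, since exons of different transcripts can be interleaved, as they are for T039819 and T039820 above. Line order in the output is unchanged.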

@sidizhao
Author

Resolved. Thank you!
Is there a way to run the short-read correction step as a separate command? My compute cluster has a hard time running the entire process at once, so I typically end up splitting long_corrected.fa into smaller files and rerunning the isocirc command without the correction.
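
The splitting step described here could look roughly like the sketch below. It is a generic FASTA splitter, not an isoCirc feature. The output naming `<prefix>.<n>.fa` only mirrors the chunk file names visible in the logs of this thread, and the function names and chunk size are hypothetical.

```python
import sys


def read_fasta(handle):
    """Yield (header, sequence_lines) for each record in a FASTA file handle."""
    header, seq = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def split_fasta(path, prefix, records_per_chunk):
    """Write chunks named <prefix>.<n>.fa and return the list of paths written."""
    paths, out, count = [], None, 0
    with open(path) as handle:
        for header, seq in read_fasta(handle):
            # Start a new chunk file every records_per_chunk records.
            if count % records_per_chunk == 0:
                if out is not None:
                    out.close()
                chunk_path = f"{prefix}.{len(paths)}.fa"
                out = open(chunk_path, "w")
                paths.append(chunk_path)
            out.write(header + "\n")
            for seq_line in seq:
                out.write(seq_line + "\n")
            count += 1
    if out is not None:
        out.close()
    return paths


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python split_fasta.py long_corrected.fa long_corrected 100000
    for chunk in split_fasta(sys.argv[1], sys.argv[2], int(sys.argv[3])):
        print(chunk)
```

Splitting by record count keeps each read intact, so every chunk can be passed to a separate isocirc run.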

@yangao07
Collaborator

You can run LoRDEC (or any other long-read correction tool) separately if you have matched short-read data to correct the long reads, and then use the corrected long reads as input.

@sidizhao
Author

sidizhao commented Feb 10, 2023

Okay, I'll keep that in mind.

Actually, I just ran into a small problem. Since I split the FASTA file, some of the smaller files aren't finishing the job, whereas others finished and produced results. Here's one example; it seems to get cut off right after [read_wise_eval] starts. Is this normal?

Matplotlib created a temporary config/cache directory at /tmp/995513.tmpdir/matplotlib-gp59czes because the default path (/home/s.zhao/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
== 11:25:49-Feb-10-2023 == [check_dependencies] Checking dependencies ...
== 11:25:49-Feb-10-2023 == [check_dependencies] Checking dependencies done!
== 11:25:49-Feb-10-2023 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF ...
== 11:25:50-Feb-10-2023 == [Tandem Repeats Finder] trf409.legacylinux64 /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/hct116_FAR20705_nanopore_long_corrected.16.fa 2 7 7 80 10 100 2000 -h -ngs > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/trf.out
== 13:09:15-Feb-10-2023 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF done!
== 13:09:15-Feb-10-2023 == [Mapping] Mapping consensus sequence to genome ...
== 13:09:15-Feb-10-2023 == [Mapping] minimap2 -ax splice -ub --MD --eqx /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/hct116_pacbio/annotation/all-chrs.fa /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/cons.fa -t 1 > /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/cons.fa.sam
[M::mm_idx_gen::107.028*0.99] collected minimizers
[M::mm_idx_gen::188.941*0.99] sorted minimizers
[M::main::188.949*0.99] loaded/built the index for 455 target sequence(s)
[M::mm_mapopt_update::192.535*0.99] mid_occ = 792
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 455
[M::mm_idx_stat::194.858*0.99] distinct minimizers: 167291034 (34.68% are singletons); average occurrences: 6.239; average spacing: 3.075
[M::worker_pipeline::640.656*0.99] mapped 101445 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax splice -ub --MD --eqx -t 1 /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/hct116_pacbio/annotation/all-chrs.fa /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/cons.fa
[M::main] Real time: 642.341 sec; CPU: 636.824 sec; Peak RSS: 18.981 GB
== 13:19:57-Feb-10-2023 == [Mapping] Mapping consensus sequence to genome done!
== 13:19:57-Feb-10-2023 == [Classifying] Classifying consensus alignment ...
== 13:19:57-Feb-10-2023 == [classify_bam_core] Processing /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/cons.fa.sam ...
== 13:19:59-Feb-10-2023 == [classify_bam_core] 100000 BAM records done ...
== 13:20:01-Feb-10-2023 == [classify_bam_core] Processing /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/cons.fa.sam done.
== 13:20:01-Feb-10-2023 == [Classifying] Classifying consensus alignment done!
== 13:20:03-Feb-10-2023 == [gtfToGenePred] gtfToGenePred -ignoreGroupsWithoutExons /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.with.exon.id.gtf /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.gene_pred
== 13:20:27-Feb-10-2023 == [genePredToBed] genePredToBed /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.gene_pred /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed
== 13:20:30-Feb-10-2023 == [get_transcript_from_gtf] Loading transcript from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.with.exon.id.gtf ...
== 13:20:44-Feb-10-2023 == [get_transcript_from_gtf] Loading transcript from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/hg38_with_maher_lab_lncrna.with.exon.id.gtf done!
== 13:20:44-Feb-10-2023 == [get_splice_site_from_bed12] Loading splice site from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed ...
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/high.bam'
== 13:20:52-Feb-10-2023 == [get_splice_site_from_bed12] Loading splice site from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed done!
== 13:20:52-Feb-10-2023 == [get_splice_junction_from_bed12] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed ...
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/high.bam'
== 13:20:58-Feb-10-2023 == [get_splice_junction_from_bed12] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed done!
== 13:20:58-Feb-10-2023 == [get_exon_from_bed12] Loading exon from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed ...
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/high.bam'
== 13:21:06-Feb-10-2023 == [get_exon_from_bed12] Loading exon from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/hg38_with_maher_lab_lncrna.with.exon.id.gtf.bed done!
== 13:21:06-Feb-10-2023 == [get_back_splice_junction_from_bed] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/HCT116_short_read_annotation.bed ...
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/high.bam'
== 13:21:06-Feb-10-2023 == [get_back_splice_junction_from_bed] Loading splice junction from /storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/annotation/HCT116_short_read_annotation.bed done!
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/high.bam'
[E::idx_find_and_load] Could not retrieve index file for '/storage1/fs1/christophermaher/Active/maherlab/sidizhao/circ_rna/long_read/isocirc_rerun/crc_cell_lines/hct116_FAR20705_nanopore/isocirc_output_short_read/output_16/low.bam'
== 13:21:08-Feb-10-2023 == [read_wise_eval] Generating read-wise evaluation result ...
Traceback (most recent call last):
  File "/usr/local/bin/miniconda3/bin/isocirc", line 219, in <module>
    main()
  File "/usr/local/bin/miniconda3/bin/isocirc", line 216, in main
    isocirc_core(args)
  File "/usr/local/bin/miniconda3/bin/isocirc", line 132, in isocirc_core
    hf.hcBSJ_fullIso(high_bam, low_bam, long_len_fn, cons_info, cons_fa,
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/hcBSJ_fullIso.py", line 797, in hcBSJ_fullIso
    eval_core(processed_cnt, r, cons_info_dict, ref_fa, cons_fa, all_site, all_exon, all_sj, circ_sj, sj_xid, key_sj_xid, site_dis, end_dis, all_out)
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/hcBSJ_fullIso.py", line 742, in eval_core
    is_known_bsj, is_cano_bsj, dis_to_cano_bsj, bsj_motif, align_bsj = pg.is_known_cano_bsj(bsj, circ_sj, ref_seq, cons_fa[r.query_name][:].seq.upper(), int(eval_out['startCoor0based']), int(eval_out['endCoor']), r.is_reverse, r.cigartuples, int(eval_out['refMapLen']), int(eval_out['consMapLen']), int(eval_out['consLen']), end_dis, force_strand, bsj_dis_to_known_ss)
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/parse_gff.py", line 771, in is_known_cano_bsj
    score, alignBSJ1 = get_cano_bsj_align(up_dis1, down_dis1, strand1, ref_seq, read_seq, start, end, end_dis, is_reverse, cigartuples, ref_map_len, cons_len)
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/parse_gff.py", line 673, in get_cano_bsj_align
    return pb.pairwise_align(bsj_ref_seq, bsj_read_seq, 'g', True)
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/parse_bam.py", line 103, in pairwise_align
    return res.score, get_cigar_from_pairwise_res(r.format())
  File "/usr/local/bin/miniconda3/lib/python3.10/site-packages/isocirc/parse_bam.py", line 80, in get_cigar_from_pairwise_res
    cigartuples.append((cigar_op_dict[op], 1))
UnboundLocalError: local variable 'op' referenced before assignment

@yangao07
Collaborator

This actually looks very weird.
Can you upload your data here (both the long reads and the annotation file) so that I can try to track down this error?

@sidizhao
Author

sidizhao commented Feb 10, 2023

@sidizhao
Author

I tried again with the newest push (pip install isocirc==1.0.6a0) and the same error persists for this file.

@sidizhao
Author

Hi, just following up on this issue. Were you able to take a look at what could have triggered this error?

@yangao07
Collaborator

Which circRNA bed file did you use as input?

@sidizhao
Author

@yangao07
Collaborator

I don't see the error msg with this command:

isocirc /home/gaoy1/sdata/isocirc_debug/hct116_FAR20705_nanopore_long_corrected.14.fa /home/gaoy1/data/genome/hg38/hg38.fa /home/gaoy1/sdata/isocirc_debug/debug.gtf /home/gaoy1/sdata/isocirc_debug/HCT116_short_read_annotation.bed /home/gaoy1/sdata/isocirc_debug/output -t32

Seems very weird.

@sidizhao
Author

Yeah I don't quite understand why it would generate an error because other parts of the fasta file have successfully completed running. Do you have an inkling of why that specific "UnboundLocalError: local variable 'op' referenced before assignment" would happen? In the meantime I will also try to ask the IT people maintaining the cluster and see if it's on our end.
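In general, Python raises this error when a function reads a local name that no executed branch has assigned yet. The sketch below is not isoCirc's code. It is a hypothetical column classifier (the names `column_ops_fragile`, `column_ops_checked`, and `CIGAR_OP` are made up) that shows how input in an unexpected format can leave `op` unbound, and how an explicit `else` turns that into a clearer error.

```python
# Hypothetical mapping from an operation letter to a numeric code.
CIGAR_OP = {"M": 0, "I": 1, "D": 2}


def column_ops_fragile(target: str, query: str):
    """Classify alignment columns; `op` is only bound inside the branches."""
    ops = []
    for t, q in zip(target, query):
        if t in "ACGT" and q in "ACGT":
            op = "M"
        elif t == "-" and q in "ACGT":
            op = "I"
        elif q == "-" and t in "ACGT":
            op = "D"
        # No else branch: if the first column matches none of the cases,
        # `op` was never assigned and the next line raises UnboundLocalError.
        ops.append(CIGAR_OP[op])
    return ops


def column_ops_checked(target: str, query: str):
    """Same classification, but unexpected input is reported explicitly."""
    ops = []
    for t, q in zip(target, query):
        if t in "ACGT" and q in "ACGT":
            op = "M"
        elif t == "-" and q in "ACGT":
            op = "I"
        elif q == "-" and t in "ACGT":
            op = "D"
        else:
            raise ValueError(f"Unexpected alignment column: {t!r}/{q!r}")
        ops.append(CIGAR_OP[op])
    return ops
```

Under this reading, the error depends on the text being parsed rather than on the cluster, which would be consistent with some input chunks finishing while others fail.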

@yangao07
Collaborator

Can you try reinstalling isocirc from the latest source (not via pip install) and re-running it on this dataset? I added an error message related to this error.

@sidizhao
Author

Alright I'll get back to you. I've been running it on a docker image I built. Will change pip to git and try again.

@sidizhao
Author

sidizhao commented Feb 16, 2023

== 22:04:10-Feb-15-2023 == [read_wise_eval] Generating read-wise evaluation result ...
== 22:04:10-Feb-15-2023 == [get_cigar_from_pairwise_res] Unexpected alignment string: target TCATAAAACGTTACTTAAAA 0.

It now shows this.

I tried to look for "TCATAAAACGTTACTTAAAA" in any of the intermediate files and it's not showing up.

@yangao07
Collaborator

Can you try pip show biopython?
It seems like you are using an old version of biopython.

@sidizhao
Author

Name: biopython
Version: 1.78

@yangao07
Collaborator

The new version requires biopython >= 1.79. This is why the error comes up.
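A quick way to confirm this before launching a long job is to check the installed version from Python. The following is a minimal sketch using only the standard library. The helper names (`release_tuple`, `meets_minimum`, `installed_version`) are hypothetical, and the version comparison is deliberately simplistic: it only looks at the leading numeric components.

```python
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.79' into (1, 79); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum`."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("biopython")
    if found is None:
        print("biopython is not installed in this environment")
    elif meets_minimum(found, "1.79"):
        print(f"biopython {found} satisfies >= 1.79")
    else:
        print(f"biopython {found} is older than 1.79")
```

With the 1.78 install reported above, `meets_minimum("1.78", "1.79")` is false, so the script reports that the version is too old.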

@sidizhao
Author

Should I specify that when I build the docker? The only ones I had installed other than isocirc were bedtools and minimap2.

@yangao07
Collaborator

I am not familiar with docker.
Usually, there should be no problem since it is listed in the requirement.txt.
You can try reinstalling everything.

@sidizhao
Author

Yeah I think the docker image is still pulling the local 1.78 version for some reason. I'm working on fixing that. Hopefully this will fix everything.
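When an image seems to pick up an unexpected copy of a package, it can help to print both the version and the path that the interpreter would actually import. The sketch below uses only the standard library. `locate` is a hypothetical helper, and it assumes the distribution is named `biopython` while the importable module is `Bio`.

```python
from importlib import metadata, util
from typing import Optional, Tuple


def locate(dist_name: str, module_name: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (installed version, module file path); None where not found."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec resolves the module on sys.path without importing it.
    spec = util.find_spec(module_name)
    origin = spec.origin if spec is not None else None
    return version, origin


if __name__ == "__main__":
    version, origin = locate("biopython", "Bio")
    print(f"biopython version: {version}")
    print(f"Bio would be imported from: {origin}")
```

Running this inside the container, with the same interpreter that runs isocirc, shows whether the 1.78 copy comes from the expected site-packages directory or from somewhere else on the path.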
