Issue of not generating out file #5

Open
braveagle0 opened this issue Jul 1, 2021 · 12 comments

Comments

@braveagle0

I ran isocirc on the test data and it worked great. However, when I ran isocirc on my own data, it did not generate isocirc.out, isocirc_stats.out, or isocirc.bed. I downloaded the genome FASTA file from Ensembl (http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz) and the GTF file also from Ensembl (http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz). The circRNA BED file was downloaded from http://circatlas.biols.ac.cn/.

The output directory contains the following files:
cons.fa cons.fa.sam high.bam Homo_sapiens.GRCh38.104.gtf.gene_pred TotalRNAonly.fa.len
cons.fa.fai cons.info Homo_sapiens.GRCh38.104.gtf.bed low.bam trf.out

Thanks for help!

@yangao07
Collaborator

yangao07 commented Jul 2, 2021

Do you have the log information file?
That will help me find out why.

Yan

@braveagle0
Author

[M::mm_idx_gen::628.794*0.62] collected minimizers
[M::mm_idx_gen::703.737*0.70] sorted minimizers
[M::main::703.740*0.70] loaded/built the index for 194 target sequence(s)
[M::mm_mapopt_update::714.095*0.70] mid_occ = 765
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 194
[M::mm_idx_stat::716.133*0.70] distinct minimizers: 167225302 (35.46% are singletons); average occurrences: 6.030; average spacing: 3.074
[M::worker_pipeline::2221.021*4.31] mapped 188232 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax splice -ub --MD --eqx -t 8 /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa output2/cons.fa
[M::main] Real time: 2221.478 sec; CPU: 9580.517 sec; Peak RSS: 20.286 GB
[E::idx_find_and_load] Could not retrieve index file for 'output2/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output2/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output2/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output2/high.bam'
== 16:43:39-Jul-01-2021 == [check_dependencies] Checking dependencies ...
== 16:43:41-Jul-01-2021 == [check_dependencies] Checking dependencies done!
== 16:43:41-Jul-01-2021 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF ...
== 16:43:41-Jul-01-2021 == [fxtools] fxtools sx TotalRNAonly.fa 8 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/
== 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.2 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.2; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.2
== 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.1 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.1; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.1
== 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.3 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.3; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.3
== 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.4 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.4; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.4
== 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.5 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.5; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.5
== 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.6 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.6; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.6
== 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.7 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.7; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.7
== 16:44:30-Jul-01-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.8 2 7 7 80 10 100 2000 -h -ngs > output2/trf.out.8; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.8
== 17:10:51-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.1 >> output2/trf.out; rm output2/trf.out.1
== 17:10:59-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.2 >> output2/trf.out; rm output2/trf.out.2
== 17:11:06-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.3 >> output2/trf.out; rm output2/trf.out.3
== 17:11:10-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.4 >> output2/trf.out; rm output2/trf.out.4
== 17:11:15-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.5 >> output2/trf.out; rm output2/trf.out.5
== 17:11:18-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.6 >> output2/trf.out; rm output2/trf.out.6
== 17:11:21-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.7 >> output2/trf.out; rm output2/trf.out.7
== 17:11:28-Jul-01-2021 == [Tandem Repeats Finder] cat output2/trf.out.8 >> output2/trf.out; rm output2/trf.out.8
== 17:11:30-Jul-01-2021 == [fxtools] fxtools lp TotalRNAonly.fa > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/TotalRNAonly.fa.len 2> /dev/null
== 17:15:38-Jul-01-2021 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF done!
== 17:15:38-Jul-01-2021 == [Mapping] Mapping consensus sequence to genome ...
== 17:15:38-Jul-01-2021 == [Mapping] minimap2 -ax splice -ub --MD --eqx /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa output2/cons.fa -t 8 > output2/cons.fa.sam
== 17:52:42-Jul-01-2021 == [Mapping] Mapping consensus sequence to genome done!
== 17:52:42-Jul-01-2021 == [Classifying] Classifying consensus alignment ...
== 17:52:42-Jul-01-2021 == [classify_bam_core] Processing output2/cons.fa.sam ...
== 17:52:53-Jul-01-2021 == [classify_bam_core] 100000 BAM records done ...
== 17:53:05-Jul-01-2021 == [classify_bam_core] 200000 BAM records done ...
== 17:53:16-Jul-01-2021 == [classify_bam_core] 300000 BAM records done ...
== 17:53:30-Jul-01-2021 == [classify_bam_core] 400000 BAM records done ...
== 17:53:46-Jul-01-2021 == [classify_bam_core] 500000 BAM records done ...
== 17:53:53-Jul-01-2021 == [classify_bam_core] Processing output2/cons.fa.sam done.
== 17:53:53-Jul-01-2021 == [Classifying] Classifying consensus alignment done!
== 17:54:06-Jul-01-2021 == [gtfToGenePred] gtfToGenePred -genePredExt -ignoreGroupsWithoutExons /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.gene_pred
== 17:55:06-Jul-01-2021 == [genePredToBed] genePredToBed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.gene_pred /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed
== 17:55:08-Jul-01-2021 == [get_transcript_from_bed12] Loading transcript from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.gene_pred ...
== 17:55:24-Jul-01-2021 == [get_transcript_from_gene_pred] Loading transcript from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.gene_pred done!
== 17:55:24-Jul-01-2021 == [get_splice_site_from_bed12] Loading splice site from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed ...
== 17:55:37-Jul-01-2021 == [get_splice_site_from_bed12] Loading splice site from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed done!
== 17:55:37-Jul-01-2021 == [get_splice_junction_from_bed12] Loading splice junction from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed ...
== 17:55:46-Jul-01-2021 == [get_splice_junction_from_bed12] Loading splice junction from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed done!
== 17:55:46-Jul-01-2021 == [get_exon_from_bed12] Loading exon from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed ...
== 17:55:55-Jul-01-2021 == [get_exon_from_bed12] Loading exon from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output2/Homo_sapiens.GRCh38.104.gtf.bed done!
== 17:55:55-Jul-01-2021 == [get_back_splice_junction_from_bed] Loading splice junction from /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/human_circRNA_v2.0.bed ...
Traceback (most recent call last):
  File "/home/guans/bin/anaconda3/bin/isocirc", line 219, in <module>
    main()
  File "/home/guans/bin/anaconda3/bin/isocirc", line 216, in main
    isocirc_core(args)
  File "/home/guans/bin/anaconda3/bin/isocirc", line 135, in isocirc_core
    isoform_out, bed_out, stats_out)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 782, in hcBSJ_fullIso
    circ_sj.append(pg.get_back_splice_junction_from_bed(circ_anno_bed, high_bam))
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/parse_gff.py", line 291, in get_back_splice_junction_from_bed
    start = int(ele[bed_header['chromStart']])
ValueError: invalid literal for int() with base 10: 'Start'

@braveagle0
Author

Please see the log above and help! Thanks a lot!

@yangao07
Collaborator

yangao07 commented Jul 3, 2021

Based on the error message

ValueError: invalid literal for int() with base 10: 'Start'

Your BED file human_circRNA_v2.0.bed may have a header line; if so, it should be removed.
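
For anyone hitting the same ValueError, here is a minimal sketch of one way to drop leading header lines from a BED file before rerunning. It assumes a tab-delimited file whose second and third columns (chromStart, chromEnd) are non-negative integers in every data row. The helper names `is_bed_data_line` and `strip_bed_header` are hypothetical and are not part of isoCirc.

```python
def is_bed_data_line(line):
    """Return True if the line looks like a BED record (integer start/end columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return False
    return fields[1].isdigit() and fields[2].isdigit()


def strip_bed_header(in_path, out_path):
    """Copy in_path to out_path, skipping leading lines that are not BED records.

    Only lines before the first data row are dropped; everything after it
    is copied unchanged. Returns the number of dropped lines.
    """
    dropped = 0
    in_header = True
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if in_header and not is_bed_data_line(line):
                dropped += 1
                continue
            in_header = False
            dst.write(line)
    return dropped
```

Writing to a new file rather than editing in place keeps the original download intact, so the two can be compared if the result looks wrong.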

@braveagle0
Author

I removed the header and ran isocirc again. Here is the log:
"[M::mm_idx_gen::130.086*1.12] collected minimizers
[M::mm_idx_gen::140.759*1.61] sorted minimizers
[M::main::140.763*1.61] loaded/built the index for 194 target sequence(s)
[M::mm_mapopt_update::143.284*1.60] mid_occ = 765
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 194
[M::mm_idx_stat::145.268*1.59] distinct minimizers: 167225302 (35.46% are singletons); average occurrences: 6.030; average spacing: 3.074
[M::worker_pipeline::1236.486*7.06] mapped 188232 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax splice -ub --MD --eqx -t 8 /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa output_no_head/cons.fa
[M::main] Real time: 1236.718 sec; CPU: 8725.111 sec; Peak RSS: 22.318 GB
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/high.bam'
[E::idx_find_and_load] Could not retrieve index file for 'output_no_head/low.bam'
== 10:51:32-Jul-12-2021 == [check_dependencies] Checking dependencies ...
== 10:51:33-Jul-12-2021 == [check_dependencies] Checking dependencies done!
== 10:51:33-Jul-12-2021 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF ...
== 10:51:33-Jul-12-2021 == [fxtools] fxtools sx TotalRNAonly.fa 8 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/
== 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.1 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.1; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.1
== 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.2 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.2; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.2
== 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.3 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.3; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.3
== 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.4 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.4; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.4
== 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.5 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.5; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.5
== 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.6 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.6; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.6
== 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.7 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.7; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.7
== 10:52:03-Jul-12-2021 == [Tandem Repeats Finder] trf409.legacylinux64 /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.8 2 7 7 80 10 100 2000 -h -ngs > output_no_head/trf.out.8; rm /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.8
== 11:17:41-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.1 >> output_no_head/trf.out; rm output_no_head/trf.out.1
== 11:17:46-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.2 >> output_no_head/trf.out; rm output_no_head/trf.out.2
== 11:17:50-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.3 >> output_no_head/trf.out; rm output_no_head/trf.out.3
== 11:17:51-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.4 >> output_no_head/trf.out; rm output_no_head/trf.out.4
== 11:17:53-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.5 >> output_no_head/trf.out; rm output_no_head/trf.out.5
== 11:17:56-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.6 >> output_no_head/trf.out; rm output_no_head/trf.out.6
== 11:18:00-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.7 >> output_no_head/trf.out; rm output_no_head/trf.out.7
== 11:18:02-Jul-12-2021 == [Tandem Repeats Finder] cat output_no_head/trf.out.8 >> output_no_head/trf.out; rm output_no_head/trf.out.8
== 11:18:04-Jul-12-2021 == [fxtools] fxtools lp TotalRNAonly.fa > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/TotalRNAonly.fa.len 2> /dev/null
== 11:22:30-Jul-12-2021 == [Tandem-Repeats-Finder] Finding tandem repeats with TRF done!
== 11:22:30-Jul-12-2021 == [Mapping] Mapping consensus sequence to genome ...
== 11:22:30-Jul-12-2021 == [Mapping] minimap2 -ax splice -ub --MD --eqx /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa output_no_head/cons.fa -t 8 > output_no_head/cons.fa.sam
== 11:43:09-Jul-12-2021 == [Mapping] Mapping consensus sequence to genome done!
== 11:43:09-Jul-12-2021 == [Classifying] Classifying consensus alignment ...
== 11:43:09-Jul-12-2021 == [classify_bam_core] Processing output_no_head/cons.fa.sam ...
== 11:43:30-Jul-12-2021 == [classify_bam_core] 100000 BAM records done ...
== 11:43:51-Jul-12-2021 == [classify_bam_core] 200000 BAM records done ...
== 11:44:12-Jul-12-2021 == [classify_bam_core] 300000 BAM records done ...
== 11:44:27-Jul-12-2021 == [classify_bam_core] 400000 BAM records done ...
== 11:44:44-Jul-12-2021 == [classify_bam_core] 500000 BAM records done ...
== 11:44:51-Jul-12-2021 == [classify_bam_core] Processing output_no_head/cons.fa.sam done.
== 11:44:51-Jul-12-2021 == [Classifying] Classifying consensus alignment done!
== 11:45:03-Jul-12-2021 == [gtfToGenePred] gtfToGenePred -genePredExt -ignoreGroupsWithoutExons /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.gene_pred
== 11:46:12-Jul-12-2021 == [genePredToBed] genePredToBed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.gene_pred /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed
== 11:46:15-Jul-12-2021 == [get_transcript_from_bed12] Loading transcript from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.gene_pred ...
== 11:46:28-Jul-12-2021 == [get_transcript_from_gene_pred] Loading transcript from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.gene_pred done!
== 11:46:28-Jul-12-2021 == [get_splice_site_from_bed12] Loading splice site from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed ...
== 11:46:42-Jul-12-2021 == [get_splice_site_from_bed12] Loading splice site from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed done!
== 11:46:42-Jul-12-2021 == [get_splice_junction_from_bed12] Loading splice junction from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed ...
== 11:46:53-Jul-12-2021 == [get_splice_junction_from_bed12] Loading splice junction from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed done!
== 11:46:53-Jul-12-2021 == [get_exon_from_bed12] Loading exon from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed ...
== 11:47:03-Jul-12-2021 == [get_exon_from_bed12] Loading exon from /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.bed done!
== 11:47:03-Jul-12-2021 == [get_back_splice_junction_from_bed] Loading splice junction from /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/human_circRNA_v2.0.bed ...
== 11:47:11-Jul-12-2021 == [get_back_splice_junction_from_bed] Loading splice junction from /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/human_circRNA_v2.0.bed done!
== 11:47:13-Jul-12-2021 == [read_wise_eval] Generating read-wise evaluation result ...
== 11:58:03-Jul-12-2021 == [high_quality] 100000 high mapping quality BAM records have been processed ...
== 12:03:09-Jul-12-2021 == [read_wise_eval] Generating read-wise evaluation result done!
== 12:03:09-Jul-12-2021 == [filter_circRNA_read] Filtering back-splice-junctions ...
== 12:03:13-Jul-12-2021 == [filter_circRNA_read] Filtering back-splice-junctions done!
== 12:03:13-Jul-12-2021 == [rescue_reads] Rescuing reads using reliable back-splice-junctions ...
== 12:03:19-Jul-12-2021 == [rescue_reads] Rescuing reads using reliable back-splice-junctions done!
== 12:03:19-Jul-12-2021 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result ...
== 12:03:19-Jul-12-2021 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result done!
== 12:03:19-Jul-12-2021 == [bed2exonGtf] bed2exonGtf output_no_head/isocirc.bed output_no_head/isocirc.bed.exon.gtf
== 12:03:20-Jul-12-2021 == [exonGtf] awk -v OFS="\t" '($3=="exon"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.exon.gtf
== 12:03:46-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="gene"){print $1,$4-1,$5}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.gene.bed
== 12:04:04-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="CDS"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.cds.gtf
== 12:04:23-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="UTR" || $3=="five_prime_utr" || $3=="three_prime_utr"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.utr.gtf
== 12:04:40-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "lincRNA"/ || $0 ~ /gene_type "lincRNA"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.lincRNA.gtf
== 12:05:10-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "antisense"/ || $0 ~ /gene_type "antisense"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.antisense.gtf
== 12:05:34-Jul-12-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "rRNA"/ || $0 ~ /gene_type "rRNA"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.rRNA.gtf
== 12:07:43-Jul-12-2021 == [bed2exonGtf] bed2exonGtf output_no_head/isocirc.bed.five.site.bed output_no_head/isocirc.bed.five.site.exon.gtf
== 12:07:45-Jul-12-2021 == [bed2exonGtf] bed2exonGtf output_no_head/isocirc.bed.three.site.bed output_no_head/isocirc.bed.three.site.exon.gtf
== 12:07:46-Jul-12-2021 == [bed2exonGtf] bed2exonGtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.five.site.bed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.five.site.exon.gtf
== 12:07:52-Jul-12-2021 == [bed2exonGtf] bed2exonGtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.three.site.bed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.three.site.exon.gtf
== 12:07:56-Jul-12-2021 == [itst_gtf_gtf] itst_gtf_gtf output_no_head/isocirc.bed.five.site.exon.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.five.site.exon.gtf output_no_head/isocirc.bed.five.site.gene.out
== 12:08:03-Jul-12-2021 == [itst_gtf_gtf] itst_gtf_gtf output_no_head/isocirc.bed.three.site.exon.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_no_head/Homo_sapiens.GRCh38.104.gtf.three.site.exon.gtf output_no_head/isocirc.bed.three.site.gene.out
== 12:08:09-Jul-12-2021 == [gtf2gene] gtf2gene output_no_head/isocirc.bed.exon.gtf /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf output_no_head/isocirc.bed.ovlp.gene.out
Traceback (most recent call last):
  File "/home/guans/bin/anaconda3/bin/isocirc", line 219, in <module>
    main()
  File "/home/guans/bin/anaconda3/bin/isocirc", line 216, in main
    isocirc_core(args)
  File "/home/guans/bin/anaconda3/bin/isocirc", line 135, in isocirc_core
    isoform_out, bed_out, stats_out)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 826, in hcBSJ_fullIso
    itst_out_dict = intersect_with_bed(out_dir, circRNA_bed, all_anno, all_anno_bed, itst_anno_dict, flank_len, bedtools)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 414, in intersect_with_bed
    get_ovlp_gene_name_id(ovlp_gene_name_id, gene_id_dict, gene_name_dict, gene_strand_dict)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 214, in get_ovlp_gene_name_id
    strand_dict[ele[0]] = ele[3] if strand_dict[ele[0]] == 'NA' else strand_dict[ele[0]] + ',' + ele[3]
IndexError: list index out of range
"
Would you please help?

Thanks!

@yangao07
Collaborator

Can you show me a few lines of output_no_head/isocirc.bed.ovlp.gene.out?

@braveagle0
Author

Here are a few lines of the file:
"isocirc0 ENSG00000230021 -
isocirc10000 ENSG00000204390 HSPA1L -
isocirc10001 ENSG00000204371 EHMT2 -
isocirc10002 ENSG00000213676 ATF6B -
isocirc10003 ENSG00000213676 ATF6B -
isocirc10004 ENSG00000223501 VPS52 -
isocirc10005 ENSG00000124493 GRM4 -
isocirc10006 ENSG00000124493 GRM4 -
isocirc10007 ENSG00000124493 GRM4 -
isocirc10008 ENSG00000124493 GRM4 -
isocirc10009 ENSG00000270800 RPS10-NUDT3 -
isocirc10009 ENSG00000272325 NUDT3 -
isocirc1000 ENSG00000143473 KCNH1 -
isocirc1000 ENSG00000283952 -
isocirc1000 ENSG00000284299 -
isocirc10010 ENSG00000124507 PACSIN1 +"

@yangao07
Collaborator

It seems that some of the genes in your GTF file do not have a gene name.
Can you run grep ENSG00000230021 Homo_sapiens.GRCh38.104.gtf and paste the output here?
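
To check the whole annotation rather than a single gene, a short script along these lines could list every gene record that lacks a gene name. This is only a sketch: it assumes the standard nine-column, tab-delimited GTF layout with attributes written as `key "value";` in the last column, and `parse_attributes` and `genes_without_name` are hypothetical helpers, not isoCirc functions.

```python
import re

# Matches attribute pairs such as: gene_id "ENSG00000230021";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attr_field):
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(attr_field))


def genes_without_name(gtf_lines):
    """Return gene_id values of 'gene' records that carry no gene_name attribute."""
    missing = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only inspect well-formed gene records
        attrs = parse_attributes(fields[8])
        if "gene_name" not in attrs:
            missing.append(attrs.get("gene_id", "NA"))
    return missing
```

It accepts any iterable of lines, so an open file handle works: `with open("Homo_sapiens.GRCh38.104.gtf") as fh: print(genes_without_name(fh)[:10])`.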

@braveagle0
Author

1 havana transcript 720053 724564 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000447954"; transcript_version "2"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; transcript_support_level "2 (assigned to previous version 1)";
1 havana exon 724358 724564 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000447954"; transcript_version "2"; exon_number "1"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00001688006"; exon_version "2"; transcript_support_level "2 (assigned to previous version 1)";
1 havana exon 720053 720200 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000447954"; transcript_version "2"; exon_number "2"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00001675630"; exon_version "2"; transcript_support_level "2 (assigned to previous version 1)";
[guans@login-0-1 GRCh38]$ grep ENSG00000230021 Homo_sapiens.GRCh38.104.gtf
1 havana gene 586071 827796 . - . gene_id "ENSG00000230021"; gene_version "10"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene";
1 havana transcript 586071 612813 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000634833"; transcript_version "2"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; tag "basic"; transcript_support_level "5 (assigned to previous version 1)";
1 havana exon 612741 612813 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000634833"; transcript_version "2"; exon_number "1"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00003812707"; exon_version "1"; tag "basic"; transcript_support_level "5 (assigned to previous version 1)";
1 havana exon 607955 608056 . - . gene_id "ENSG00000230021"; gene_version "10"; transcript_id "ENST00000634833"; transcript_version "2"; exon_number "2"; gene_source "havana"; gene_biotype "transcribed_processed_pseudogene"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00001718533"; exon_version "1"; tag "basic"; transcript_support_level "5 (assigned to previous version 1)";

@yangao07
Collaborator

I see.
Some genes in your GTF file have no "gene_name" tag, which is why isoCirc hit an error.

I just updated the related script.
You can try the latest version of isoCirc (v1.0.4), it should work now.
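
For reference, the failure mode can be illustrated with a small sketch. It assumes each row of isocirc.bed.ovlp.gene.out holds four columns (isoform ID, gene ID, gene name, strand), as inferred from the sample rows and the traceback in this thread, and that rows are split on whitespace. `parse_ovlp_row` is a hypothetical helper written for illustration; it is not the actual change made in v1.0.4.

```python
def parse_ovlp_row(line):
    """Parse one overlap row into (isoform_id, gene_id, gene_name, strand).

    Assumed layout: isoform ID, gene ID, gene name, strand. When the gene
    name is missing, the row has only three fields and 'NA' is substituted.
    """
    ele = line.split()
    if len(ele) == 4:
        isoform_id, gene_id, gene_name, strand = ele
    elif len(ele) == 3:
        # Gene name absent: indexing ele[3] directly would raise IndexError here.
        isoform_id, gene_id, strand = ele
        gene_name = "NA"
    else:
        raise ValueError("unexpected number of fields: %r" % line)
    return isoform_id, gene_id, gene_name, strand
```

Checking the field count explicitly turns a silent column shift into either a well-defined placeholder or a clear error message.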

@braveagle0
Author

I tried v1.0.4 and still encountered an error.
"== 12:24:05-Jul-15-2021 == [read_wise_eval] Generating read-wise evaluation result ...
== 12:37:01-Jul-15-2021 == [high_quality] 100000 high mapping quality BAM records have been processed ...
== 12:43:43-Jul-15-2021 == [read_wise_eval] Generating read-wise evaluation result done!
== 12:43:43-Jul-15-2021 == [filter_circRNA_read] Filtering back-splice-junctions ...
== 12:43:47-Jul-15-2021 == [filter_circRNA_read] Filtering back-splice-junctions done!
== 12:43:47-Jul-15-2021 == [rescue_reads] Rescuing reads using reliable back-splice-junctions ...
== 12:43:52-Jul-15-2021 == [rescue_reads] Rescuing reads using reliable back-splice-junctions done!
== 12:43:52-Jul-15-2021 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result ...
== 12:43:53-Jul-15-2021 == [uniq_isoform_with_unsorted_coors] Generating isoform-wise evaluation result done!
== 12:43:53-Jul-15-2021 == [bed2exonGtf] bed2exonGtf output_104/isocirc.bed output_104/isocirc.bed.exon.gtf
== 12:43:56-Jul-15-2021 == [exonGtf] awk -v OFS="\t" '($3=="exon"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.exon.gtf
== 12:44:39-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="gene"){print $1,$4-1,$5}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.gene.bed
== 12:45:26-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="CDS"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.cds.gtf
== 12:46:09-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="UTR" || $3=="five_prime_utr" || $3=="three_prime_utr"){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.utr.gtf
== 12:46:53-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "lincRNA"/ || $0 ~ /gene_type "lincRNA"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.lincRNA.gtf
== 12:47:42-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "antisense"/ || $0 ~ /gene_type "antisense"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.antisense.gtf
== 12:48:30-Jul-15-2021 == [gtf2bed] awk -v OFS="\t" '($3=="exon" && ($0 ~ /gene_biotype "rRNA"/ || $0 ~ /gene_type "rRNA"/)){print}' /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf > /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.rRNA.gtf
== 12:55:15-Jul-15-2021 == [bed2exonGtf] bed2exonGtf output_104/isocirc.bed.five.site.bed output_104/isocirc.bed.five.site.exon.gtf
== 12:55:19-Jul-15-2021 == [bed2exonGtf] bed2exonGtf output_104/isocirc.bed.three.site.bed output_104/isocirc.bed.three.site.exon.gtf
== 12:55:22-Jul-15-2021 == [bed2exonGtf] bed2exonGtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.five.site.bed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.five.site.exon.gtf
== 12:55:35-Jul-15-2021 == [bed2exonGtf] bed2exonGtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.three.site.bed /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.three.site.exon.gtf
== 12:55:46-Jul-15-2021 == [itst_gtf_gtf] itst_gtf_gtf output_104/isocirc.bed.five.site.exon.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.five.site.exon.gtf output_104/isocirc.bed.five.site.gene.out
== 12:56:12-Jul-15-2021 == [itst_gtf_gtf] itst_gtf_gtf output_104/isocirc.bed.three.site.exon.gtf /mnt/home2/guans/circRNA/Nanopore/RCA_totalRNAonly/fastq_pass/output_104/Homo_sapiens.GRCh38.104.gtf.three.site.exon.gtf output_104/isocirc.bed.three.site.gene.out
== 12:56:38-Jul-15-2021 == [gtf2gene] gtf2gene output_104/isocirc.bed.exon.gtf /mnt/home2/guans/ref_genome/hg38/CircRNA_reference/GRCh38/Homo_sapiens.GRCh38.104.gtf output_104/isocirc.bed.ovlp.gene.out
Traceback (most recent call last):
  File "/home/guans/bin/anaconda3/bin/isocirc", line 219, in <module>
    main()
  File "/home/guans/bin/anaconda3/bin/isocirc", line 216, in main
    isocirc_core(args)
  File "/home/guans/bin/anaconda3/bin/isocirc", line 135, in isocirc_core
    isoform_out, bed_out, stats_out)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 826, in hcBSJ_fullIso
    itst_out_dict = intersect_with_bed(out_dir, circRNA_bed, all_anno, all_anno_bed, itst_anno_dict, flank_len, bedtools)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 414, in intersect_with_bed
    get_ovlp_gene_name_id(ovlp_gene_name_id, gene_id_dict, gene_name_dict, gene_strand_dict)
  File "/home/guans/bin/anaconda3/lib/python3.7/site-packages/isocirc/hcBSJ_fullIso.py", line 214, in get_ovlp_gene_name_id
    strand_dict[ele[0]] = ele[3] if strand_dict[ele[0]] == 'NA' else strand_dict[ele[0]] + ',' + ele[3]
IndexError: list index out of range
"

@braveagle0
Author

Do you mind sharing where you downloaded your .fa, .gtf and .bed files? Thanks!
