Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No intron retention detected in a manually made genome and annotation file #407

Open
circR opened this issue Jun 6, 2024 · 3 comments
Open

Comments

@circR
Copy link

circR commented Jun 6, 2024

Hi,
I tried to quantify the intron rentention level for a transposon gene and manually made a genome file and a gtf file.
IGV visualization of the bam files in the tmp folder shows clear difference of intron rentetion, but the output file from rmats shows 0. The intron there is 41nt.
image

Here's the script I used:
rmats.py --s1 S1.txt --s2 S2.txt
--gtf genome/Tc1/Tc1.test.gtf
--readLength 150
--bi genome/Tc1/STAR_Index
-t paired
--od test/
--tmp tmp/
--variable-read-length
--allow-clipping
--novelSS
--mil 30

The summary.txt shows this:
EventType TotalEventsJC TotalEventsJCEC SignificantEventsJC SigEventsJCSample1HigherInclusion SigEventsJCSample2HigherInclusion SignificantEventsJCEC SigEventsJCECSample1HigherInclusion SigEventsJCECSample2HigherInclusion
SE 0 0 0 0 0 0 0 0
A5SS 0 0 0 0 0 0 0 0
A3SS 0 0 0 0 0 0 0 0
MXE 0 0 0 0 0 0 0 0
RI 0 0 0 0 0 0 0 0

Here's the "read_outcomes_by_bam.txt"
tmp/2024-06-06-19_16_17_795004_bam1_1/Aligned.sortedByCoord.out.bam
USED: 965
NOT_PAIRED: 0
NOT_NH_1: 20
NOT_EXPECTED_CIGAR: 49
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0
CLIPPED: 0
TOTAL_FOR_BAM: 1034
tmp/2024-06-06-19_16_17_795004_bam2_1/Aligned.sortedByCoord.out.bam
USED: 1893
NOT_PAIRED: 0
NOT_NH_1: 16
NOT_EXPECTED_CIGAR: 110
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 3
CLIPPED: 0
TOTAL_FOR_BAM: 2022

Could you help with this?
Thanks a lot.

@EricKutschera
Copy link
Contributor

rMATS usually needs the gtf to have a transcript with the intron included and another transcript with the intron spliced out. This post has some details: #17

From the IGV screenshot it looks like the gtf might only have a transcript with the intron spliced out (if the top blue line is the entire gene and not a transcript)

@circR
Copy link
Author

circR commented Jun 10, 2024

Hi,
Thanks for the answer.
There was a transcript line for the full length gene in the gtf file.
After I checked the issue #17 , I changed the "transcript" to "exon" and now rmats detects that intron retention.
Does that mean I need to create an exon annotation for each possible intron rentention event?
For example:
image
If multiple exons in a gene are retained, how could I let rmats to detect all these events? Will a full-length transcript annotation plus every exon annotation be enough or I have to annotate every possible new exon produced by intron retention?
Thank you for your help.

@EricKutschera
Copy link
Contributor

I think you'll need to have an exon annotation for each retained intron event that you want rmats to report. The code needs the "exon" that is actually two exons with the intron retained. In that example it looks like there are 4 exons. You could use a gtf that has each of those 4 exons defined and also 3 additional exons which go from the start of one exon to the end of the next: exon1-exon2, exon2-exon3, exon3-exon4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants