Not detectable #387

Open
tebioinformatics opened this issue Mar 25, 2024 · 3 comments

Comments

@tebioinformatics

Hi!

I am doing RNA-Seq research at my university.
I was interested in rMATS and ran an analysis with it.

This is the command I used:
python rmats.py --b1 /mnt/k/Shortread_version111/Nakagawa/rmats/sample_list/control.txt \
--b2 /mnt/k/Shortread_version111/Nakagawa/rmats/sample_list/Expansion.txt \
--gtf /mnt/c/Ubuntu/reference_111_human/Homo_sapiens.GRCh38.111.gtf \
--nthread 32 \
-t paired \
--readLength 85 \
--allow-clipping \
--od /mnt/k/Shortread_version111/Nakagawa/rmats/after_rmats/Control_vs_Expansion \
--tmp /mnt/k/Shortread_version111/Nakagawa/rmats/rmats_tmp

gtf: 19.037578344345093
There are 63241 distinct gene ID in the gtf file
There are 252989 distinct transcript ID in the gtf file
There are 39483 one-transcript genes in the gtf file
There are 1650905 exons in the gtf file
There are 26151 one-exon transcripts in the gtf file
There are 23792 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 4.000395
Average number of exons per transcript is 6.525600
Average number of exons per transcript excluding one-exon tx is 7.162618
Average number of gene per geneGroup is 8.690959
statistic: 0.019989490509033203

read outcome totals across all BAMs
USED: 3375138
NOT_PAIRED: 725
NOT_NH_1: 1322994872
NOT_EXPECTED_CIGAR: 15539258
NOT_EXPECTED_READ_LENGTH: 767579045
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 688350
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 8003
CLIPPED: 0
total: 2110185391
outcomes by BAM written to: /mnt/k/Shortread_version111/Nakagawa/rmats/rmats_tmp/2024-03-25-14_47_51_615864_read_outcomes_by_bam.txt

novel: 594.8329858779907
The splicing graph and candidate read have been saved into /mnt/k/Shortread_version111/Nakagawa/rmats/rmats_tmp/2024-03-25-14_47_51_615864_*.rmats
save: 0.9123458862304688
loadsg: 0.06289386749267578

==========
Done processing each gene from dictionary to compile AS events
Found 58886 exon skipping events
Found 4675 exon MX events
Found 22200 alt SS events
There are 13447 alt 3 SS events and 8753 alt 5 SS events.
Found 9800 RI events

ase: 2.5728988647460938
count: 3.155134677886963
Processing count files.
Done processing count files.

I checked summary.txt, but it reports no significant events, as shown here:
| EventType | EventTypeDescription | TotalEventsJC | TotalEventsJCEC | SignificantEventsJC | SigEventsJCSample1HigherInclusion | SigEventsJCSample2HigherInclusion | SignificantEventsJCEC | SigEventsJCECSample1HigherInclusion | SigEventsJCECSample2HigherInclusion |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SE | skipped exon | 611 | 832 | 0 | 0 | 0 | 0 | 0 | 0 |
| A5SS | alternative 5' splice sites | 105 | 131 | 0 | 0 | 0 | 0 | 0 | 0 |
| A3SS | alternative 3' splice sites | 191 | 233 | 0 | 0 | 0 | 0 | 0 | 0 |
| MXE | mutually exclusive exons | 102 | 188 | 0 | 0 | 0 | 0 | 0 | 0 |
| RI | retained intron | 351 | 623 | 0 | 0 | 0 | 0 | 0 | 0 |
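
For anyone who wants to check a summary.txt programmatically, here is a minimal sketch. It assumes the file is tab-separated with the header shown above. The rows are copied from this post and rebuilt in memory, so nothing is read from disk; the variable names are illustrative only.

```python
import csv
import io

# Header and rows copied from the summary.txt in this post.
# Assumption: the real file is tab-separated, so the text is rebuilt with tabs.
header = [
    "EventType", "EventTypeDescription", "TotalEventsJC", "TotalEventsJCEC",
    "SignificantEventsJC", "SigEventsJCSample1HigherInclusion",
    "SigEventsJCSample2HigherInclusion", "SignificantEventsJCEC",
    "SigEventsJCECSample1HigherInclusion", "SigEventsJCECSample2HigherInclusion",
]
rows = [
    ["SE", "skipped exon", 611, 832, 0, 0, 0, 0, 0, 0],
    ["A5SS", "alternative 5' splice sites", 105, 131, 0, 0, 0, 0, 0, 0],
    ["A3SS", "alternative 3' splice sites", 191, 233, 0, 0, 0, 0, 0, 0],
    ["MXE", "mutually exclusive exons", 102, 188, 0, 0, 0, 0, 0, 0],
    ["RI", "retained intron", 351, 623, 0, 0, 0, 0, 0, 0],
]
summary_text = "\n".join("\t".join(str(v) for v in row) for row in [header] + rows)

# Map each event type to (events in the JC table, significant JC events).
jc_counts = {}
for record in csv.DictReader(io.StringIO(summary_text), delimiter="\t"):
    jc_counts[record["EventType"]] = (
        int(record["TotalEventsJC"]),
        int(record["SignificantEventsJC"]),
    )

for event_type, (total_events, significant) in jc_counts.items():
    print(f"{event_type}: {total_events} events, {significant} significant")

total_jc_events = sum(total_events for total_events, _ in jc_counts.values())
print("all event types:", total_jc_events)
```

To run this against a real file, replace `io.StringIO(summary_text)` with an opened file handle for summary.txt.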

I trim the reads with fastp, map them with STAR, and then run rMATS.
I set --readLength to the average read length (in bp) after fastp.
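
One way to see how much read lengths vary after trimming is to tabulate them from the trimmed FASTQ. The sketch below assumes the standard four-lines-per-record FASTQ layout and uses a tiny made-up example instead of a real file; `read_length_histogram` is a hypothetical helper, not part of fastp or rMATS.

```python
from collections import Counter


def read_length_histogram(fastq_lines):
    """Count read lengths from an iterable of FASTQ lines (4 lines per record)."""
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            lengths[len(line.strip())] += 1
    return lengths


# Made-up records standing in for a trimmed FASTQ file.
example = [
    "@r1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@r2", "ACGTAC", "+", "IIIIII",
    "@r3", "ACGTACGTAC", "+", "IIIIIIIIII",
]

hist = read_length_histogram(example)
n_reads = sum(hist.values())
mean_len = sum(length * n for length, n in hist.items()) / n_reads

for length in sorted(hist):
    print(length, hist[length])
print("mean length:", mean_len)
```

In this toy example the mean length is not equal to the length of any single read, which illustrates why an average can be a poor choice when a tool expects one exact read length.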

I don't know what more to do.

Can you help me?

Thank you!!

@EricKutschera
Contributor

From the read outcome section:

USED: 3375138
NOT_PAIRED: 725
NOT_NH_1: 1322994872
NOT_EXPECTED_CIGAR: 15539258
NOT_EXPECTED_READ_LENGTH: 767579045
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 688350
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 8003
CLIPPED: 0
total: 2110185391

Only about 0.16% of alignments were USED. About 36.37% were filtered out due to NOT_EXPECTED_READ_LENGTH. You mentioned that you used an average read length. In that case you can disable the read length filter with --variable-read-length.
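
The percentages above can be reproduced with a short script. This is a minimal sketch that parses the read outcome block exactly as printed in this thread; `parse_outcomes` is an illustrative helper, not part of rMATS.

```python
# Read outcome totals copied from the first run in this thread.
outcomes_text = """\
USED: 3375138
NOT_PAIRED: 725
NOT_NH_1: 1322994872
NOT_EXPECTED_CIGAR: 15539258
NOT_EXPECTED_READ_LENGTH: 767579045
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 688350
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 8003
CLIPPED: 0
total: 2110185391
"""


def parse_outcomes(text):
    """Turn 'NAME: count' lines into a dict of integer counts."""
    counts = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if value.strip():
            counts[name.strip()] = int(value)
    return counts


counts = parse_outcomes(outcomes_text)
total = counts.pop("total")

# Sanity check: the individual outcomes add up to the reported total.
assert sum(counts.values()) == total

percents = {name: 100 * n / total for name, n in counts.items()}
for name, pct in sorted(percents.items(), key=lambda item: -item[1]):
    print(f"{name}: {pct:.2f}%")
```

The same calculation shows that NOT_NH_1 is the largest category in this run, at about 62.70% of alignments.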

@tebioinformatics
Author

Thank you so much.

(base) tebio@DESKTOP-7LSS577:~/rmats_turbo_v4_2_0$ python rmats.py --b1 /mnt/k/Shortread_version111/Nakagawa/rmats/sample_list/control.txt \
--b2 /mnt/k/Shortread_version111/Nakagawa/rmats/sample_list/Expansion.txt \
--gtf /mnt/c/Ubuntu/reference_111_human/Homo_sapiens.GRCh38.111.gtf \
--nthread 32 \
-t paired \
--novelSS \
--readLength 100 \
--allow-clipping \
--od /mnt/k/Shortread_version111/Nakagawa/rmats/after_rmats/Control_vs_Expansion \
--tmp /mnt/k/Shortread_version111/Nakagawa/rmats/rmats_tmp \
--variable-read-length

gtf: 22.94182014465332
There are 63241 distinct gene ID in the gtf file
There are 252989 distinct transcript ID in the gtf file
There are 39483 one-transcript genes in the gtf file
There are 1650905 exons in the gtf file
There are 26151 one-exon transcripts in the gtf file
There are 23792 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 4.000395
Average number of exons per transcript is 6.525600
Average number of exons per transcript excluding one-exon tx is 7.162618
Average number of gene per geneGroup is 8.690959
statistic: 0.020065784454345703

read outcome totals across all BAMs
USED: 655137821
NOT_PAIRED: 725
NOT_NH_1: 1322994872
NOT_EXPECTED_CIGAR: 15539258
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 115344396
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 1168319
CLIPPED: 0
total: 2110185391
outcomes by BAM written to: /mnt/k/Shortread_version111/Nakagawa/rmats/rmats_tmp/2024-03-25-18_26_48_402930_read_outcomes_by_bam.txt

novel: 622.9113461971283
The splicing graph and candidate read have been saved into /mnt/k/Shortread_version111/Nakagawa/rmats/rmats_tmp/2024-03-25-18_26_48_402930_*.rmats
save: 106.44097113609314
loadsg: 0.4102354049682617

==========
Done processing each gene from dictionary to compile AS events
Found 136236 exon skipping events
Found 22873 exon MX events
Found 78181 alt SS events
There are 44637 alt 3 SS events and 33544 alt 5 SS events.
Found 19704 RI events

ase: 11.792859554290771
count: 47.57935094833374
Processing count files.
Done processing count files.

I ran it again. Does this output look correct?

Also, are novel splicing events included in the MATS files?

@EricKutschera
Contributor

That output seems fine. Yes, the novel events are in the MATS file. This post has some details about distinguishing the novel splicing events: #210 (comment)
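
As a rough cross-check of the two runs in this thread, the sketch below compares the read outcome categories that changed between them. The counts are copied from the logs above; the categories not listed (NOT_PAIRED, NOT_NH_1, NOT_EXPECTED_CIGAR, NOT_EXPECTED_STRAND, CLIPPED) and the total were identical in both runs. Note that the second run also added --novelSS and changed --readLength, so the differences in detected event counts may not come from --variable-read-length alone.

```python
# Read outcome counts copied from the two runs in this thread.
first_run = {
    "USED": 3375138,
    "NOT_EXPECTED_READ_LENGTH": 767579045,
    "EXON_NOT_MATCHED_TO_ANNOTATION": 688350,
    "JUNCTION_NOT_MATCHED_TO_ANNOTATION": 8003,
}
second_run = {
    "USED": 655137821,
    "NOT_EXPECTED_READ_LENGTH": 0,
    "EXON_NOT_MATCHED_TO_ANNOTATION": 115344396,
    "JUNCTION_NOT_MATCHED_TO_ANNOTATION": 1168319,
}
total = 2110185391  # same in both runs

# Per-category change between the runs.
change = {name: second_run[name] - first_run[name] for name in first_run}
for name, delta in change.items():
    print(f"{name}: {delta:+d}")

# The changes cancel out: alignments previously counted as
# NOT_EXPECTED_READ_LENGTH now appear under the other three categories.
assert sum(change.values()) == 0

used_pct_before = 100 * first_run["USED"] / total
used_pct_after = 100 * second_run["USED"] / total
print(f"USED: {used_pct_before:.2f}% -> {used_pct_after:.2f}%")
```

With these numbers the USED share rises from about 0.16% to about 31.05% of all alignments.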
