Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The value of the four columns "IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2" in the EVENT.MATS.JCEC.txt file is 0, while the value of the five columns "PValue FDR IncLevel1 IncLevel2 IncLevelDifference" five columns have a value of NA. #376

Open
happypiggyzjx opened this issue Mar 20, 2024 · 2 comments

Comments

@happypiggyzjx
Copy link

Hello Mr EricKutschera. I'm sorry to bother you again, but as you said, I added the --statoff command and I'm running the command as follows:

rmats.py --s1 /cold_data/zhaojiaxin/rmats/lab_sfiles/labsf7/lab25.sf7 --s2 /cold_data/zhaojiaxin/rmats/lab_sfiles/labsd6/lab25.sd6 --gtf /hot_warm_data/wangduo/reference/Homo_sapiens.GRCh38.109.gtf --bi /cold_data/zhaojiaxin/genomeIndex -t paired --readLength 100 --nthread 8 --libType fr-unstranded --od /cold_data/zhaojiaxin/rmatscorrect/lab25 --tmp /cold_data/zhaojiaxin/rmatscorrect/lab25 --variable-read-length --statoff

Although the result is no longer just the title, it appears as described in the title, ie: The value of the four columns "IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2" in the EVENT.MATS.JCEC.txt file is 0, while the value of the five columns "PValue FDR IncLevel1 IncLevel2 IncLevelDifference" five columns have a value of NA. In the meantime, I have attached the relevant image at the end of the article for you to check where the error is. The image shows the first 10 lines of the file in question, and the order of the files involved in the image is: SE.MATS.JCEC.txt, RI.MATS.JCEC.txt, MXE.MATS.JCEC.txt, A3SS.MATS.JCEC.txt, A5SS.MATS.JCEC.txt, summary.txt

  1. SE.MATS.JCEC.txt
    image
  2. RI.MATS.JCEC.txt
    image
  3. MXE.MATS.JCEC.txt
    image
  4. A3SS.MATS.JCEC.txt
    image
  5. A5SS.MATS.JCEC.txt
    image
  6. summary.txt
    image

For your information, the following is a brief report of the results given at the end of running the command:
There are 62802 distinct gene ID in the gtf file
There are 252890 distinct transcript ID in the gtf file
There are 38782 one-transcript genes in the gtf file
There are 1648370 exons in the gtf file
There are 26259 one-exon transcripts in the gtf file
There are 23459 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 4.026783
Average number of exons per transcript is 6.518130Average number of exons per transcript excluding one-exon tx is 7.157498Average number of gene per geneGroup is 8.658762
statistic: 0.027956485748291016
read outcome totals across all BAMsUSED: 2166NOT_PAIRED: 0NOT_NH_1: 2548
NOT_EXPECTED_CIGAR: 68
NOT_EXPECTED_READ_LENGTH: 0NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 24
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 5498
CLIPPED: 0
total: 10304
outcomes by BAM written to: /cold_data/zhaojiaxin/rmatscorrect/lab29/2024-03-19-14:13:47_806693_read_outcomes_by_bam.txt

novel: 0.05301499366760254The splicing graph and candidate read have been saved into /cold_data/zhaojiaxin/rmatscorrect/lab29/2024-03-19-14:13:47_806693_*.rmats
save: 0.004134654998779297
loadsg: 0.002422332763671875

==========
Done processing each gene from dictionary to compile AS events
Found 58422 exon skipping events
Found 4647 exon MX events
Found 22059 alt SS events
There are 13359 alt 3 SS events and 8700 alt 5 SS events.
Found 9739 RI events

ase: 1.7005910873413086
count: 0.40009641647338867
Processing count files.
Done processing count files.

Now my question is, is this result a normal result? In other words, is my code correct and if so how should I improve it? If it is correct is it possible to use this data for direct subsequent analysis? I look forward to your professional reply and send you my best regards.

@EricKutschera
Copy link
Contributor

It looks like the main issue is that there are only 10k alignments:

USED: 2166
NOT_PAIRED: 0
NOT_NH_1: 2548
NOT_EXPECTED_CIGAR: 68
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 24
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 5498
CLIPPED: 0
total: 10304

Millions of alignments are recommended to get useful results: https://groups.google.com/g/rmats-user-group/c/hgEOH_b5Pr0/m/KqyKUfevAAAJ

@xiaobearxiaobear
Copy link

i have similar question:ie: The value of the four columns "IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2" in the EVENT.MATS.JCEC.txt file is 0, while the value of the five columns "PValue FDR IncLevel1 IncLevel2 IncLevelDifference" five columns have a value of NA.
here is my report of the results given at the end of running the command
$ python3 /public/home/xiaoming/miniconda3/envs/myenvname/bin/rmats.py --b1 b1.txt --b2 b2.txt --gtf /public/home/xiaoming/hTSC_STB96_RNA_seq/hTSC_STB96_rMATs/hTSC_STB24/GCF_000001405_40_GRCh38_p14_genomic.gtf -t paired --libType fr-unstranded --readLength 100 --nthread 4 --od AS --tmp AS --variable-read-length --statoff
gtf: 110.33750867843628
There are 50006 distinct gene ID in the gtf file
There are 200905 distinct transcript ID in the gtf file
There are 24989 one-transcript genes in the gtf file
There are 2298165 exons in the gtf file
There are 10649 one-exon transcripts in the gtf file
There are 4969 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 4.017618
Average number of exons per transcript is 11.439063
Average number of exons per transcript excluding one-exon tx is 12.023358
Average number of gene per geneGroup is 8.155996
statistic: 0.07963252067565918

read outcome totals across all BAMs
USED: 575925120
NOT_PAIRED: 6175140
NOT_NH_1: 381853089
NOT_EXPECTED_CIGAR: 6831273
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 23086919
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 979511
CLIPPED: 68565642
total: 1063416694
outcomes by BAM written to: AS/2024-06-14-23_18_12_994584_read_outcomes_by_bam.txt

novel: 1732.8452179431915
The splicing graph and candidate read have been saved into AS/2024-06-14-23_18_12_994584_*.rmats
save: 11.968095302581787
loadsg: 0.5375621318817139

==========
Done processing each gene from dictionary to compile AS events
Found 89609 exon skipping events
Found 18301 exon MX events
Found 34574 alt SS events
There are 17338 alt 3 SS events and 17236 alt 5 SS events.
Found 6597 RI events

ase: 2.233708381652832
count: 21.929344415664673
Processing count files.
Done processing count files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants