Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the sequencing library is vague #412

Open
xiaobearxiaobear opened this issue Jun 19, 2024 · 1 comment
Open

the sequencing library is vague #412

xiaobearxiaobear opened this issue Jun 19, 2024 · 1 comment

Comments

@xiaobearxiaobear
Copy link

My RNA-seq data were downloaded from the Internet, and the authors did not elaborate on the type of library built. for --libtype parameter, i try two ways.
one: python3 /public/home/xiaoxiong/miniconda3/envs/libgsl/rMATS/rmats.py --b1 b1.txt --b2 b2.txt --gtf /public/home/xiaoxiong/hTSC_STB96_RNA_seq/hTSC_STB96_rMATs_Ensembol/STB72_STB96/genes.gtf -t paired --libType fr-firststrand --readLength 100 --nthread 4 --od AS_libgsl_firststrand --tmp AS_libgsl_firststrand --variable-read-length
gtf: 153.78015661239624
There are 36601 distinct gene ID in the gtf file
There are 199138 distinct transcript ID in the gtf file
There are 14315 one-transcript genes in the gtf file
There are 1305354 exons in the gtf file
There are 5603 one-exon transcripts in the gtf file
There are 3566 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 5.440780
Average number of exons per transcript is 6.555022
Average number of exons per transcript excluding one-exon tx is 6.715845
Average number of gene per geneGroup is 7.854318
statistic: 0.42818236351013184

read outcome totals across all BAMs
USED: 583925661
NOT_PAIRED: 5064400
NOT_NH_1: 270852260
NOT_EXPECTED_CIGAR: 6728411
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 45292289
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 1269436
CLIPPED: 65970785
total: 979103242
outcomes by BAM written to: AS_libgsl_firststrand/2024-06-19-13_12_09_367167_read_outcomes_by_bam.txt

novel: 2397.6510169506073
The splicing graph and candidate read have been saved into AS_libgsl_firststrand/2024-06-19-13_12_09_367167_*.rmats
save: 28.220320224761963
loadsg: 17.720765352249146

==========
Done processing each gene from dictionary to compile AS events
Found 89553 exon skipping events
Found 11671 exon MX events
Found 18472 alt SS events
There are 11225 alt 3 SS events and 7247 alt 5 SS events.
Found 6930 RI events

ase: 5.3955464363098145
count: 153.7568280696869
Processing count files.
Done processing count files.
4773f9cf07b10c90bd82196e3074a97
4

another python3 /public/home/xiaoxiong/miniconda3/envs/libgsl/rMATS/rmats.py --b1 b1.txt --b2 b2.txt --gtf /public/home/xiaoxiong/hTSC_STB96_RNA_seq/hTSC_STB96_rMATs_Ensembol/hTSC_STB72/genes.gtf -t paired --libType fr-secondstrand --readLength 100 --nthread 4 --od AS_libgsl_second --tmp AS_libgsl_second --variable-read-length
gtf: 1452.7515552043915
There are 36601 distinct gene ID in the gtf file
There are 199138 distinct transcript ID in the gtf file
There are 14315 one-transcript genes in the gtf file
There are 1305354 exons in the gtf file
There are 5603 one-exon transcripts in the gtf file
There are 3566 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 5.440780
Average number of exons per transcript is 6.555022
Average number of exons per transcript excluding one-exon tx is 6.715845
Average number of gene per geneGroup is 7.854318
statistic: 0.08252692222595215

read outcome totals across all BAMs
USED: 54658238
NOT_PAIRED: 5238074
NOT_NH_1: 266573095
NOT_EXPECTED_CIGAR: 6915951
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 443321716
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 131377399
CLIPPED: 64958367
total: 973042840
outcomes by BAM written to: AS_libgsl_second/2024-06-19-13_19_27_769310_read_outcomes_by_bam.txt

novel: 2378.783494949341
The splicing graph and candidate read have been saved into AS_libgsl_second/2024-06-19-13_19_27_769310_*.rmats
save: 3.096289873123169
loadsg: 1.7018141746520996

==========
Done processing each gene from dictionary to compile AS events
Found 47791 exon skipping events
Found 3302 exon MX events
Found 16193 alt SS events
There are 9849 alt 3 SS events and 6344 alt 5 SS events.
Found 6783 RI events

ase: -3.3512372970581055
count: 5.232464551925659
Processing count files.
Done processing count files.
12
3

so, I don't know if that is the right method, or if I should try it again -libType fr-unstranded.
do you have a good suggestion?

@EricKutschera
Copy link
Contributor

When you ran with --libType fr-firststrand 583 million alignments were used:

USED: 583925661
NOT_PAIRED: 5064400
NOT_NH_1: 270852260
NOT_EXPECTED_CIGAR: 6728411
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 45292289
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 1269436
CLIPPED: 65970785
total: 979103242

When you ran with --libType fr-secondstrand only 54 million alignments were used:

USED: 54658238
NOT_PAIRED: 5238074
NOT_NH_1: 266573095
NOT_EXPECTED_CIGAR: 6915951
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 443321716
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 131377399
CLIPPED: 64958367
total: 973042840

The main differences are about 400 million EXON_NOT_MATCHED_TO_ANNOTATION and 130 million JUNCTION_NOT_MATCHED_TO_ANNOTATION when run with --libType fr-secondstrand. If the library was actually unstranded then I would expect to see similar numbers of NOT_MATCHED_TO_ANNOTATION for fr-firststrand and fr-secondstrand. Based on the output it seems that your data should be run with --libType fr-firststrand

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants