readLength warning #392

Open
nbizzozero opened this issue Apr 12, 2024 · 5 comments

Comments

@nbizzozero

Since summary.txt was empty, I decided to change the read length, as suggested in a separate comment about --variable-read-length. Previously I had used 145, but I am not sure what the read length was in someone else's data, so I tried a different length of 50.

Now I got a different error:

WARNING: The post step should use the same read length as the prep step.
The prep step's read length: 145
The post step's read length: 50

This is my first time using this program. Could you let me know how to fix this?

Thanks, NPB

@EricKutschera
Contributor

Here's the line for that warning: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L3879

The main steps of rMATS are prep and post (which by default also includes stat). The steps can be selected with --task, which defaults to --task both to run prep and post.

In the prep step, rMATS processes the input reads and summarizes them in .rmats files (which are written to the --tmp directory). The prep step uses --readLength to filter out reads that don't match that length (but --variable-read-length disables that filter). Each .rmats file records the --readLength value that was used in the prep step.

In the post step, the .rmats files are loaded and the splicing events are detected and quantified. The post step uses --readLength as part of the PSI value (IncLevel) calculation. The warning is there because the length used to filter reads in the prep step doesn't match the length that will be used to calculate the PSI values in the post step.
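
To illustrate why the read length matters in the post step, here is a minimal sketch of a length-normalized inclusion level. It assumes the form described for the rMATS output columns, where the inclusion and skipping counts are normalized by IncFormLen and SkipFormLen. The function name `inc_level` and the example numbers are made up for illustration, and how rMATS derives the two effective lengths from --readLength is not shown here.

```python
def inc_level(inc_count, skip_count, inc_form_len, skip_form_len):
    """Length-normalized inclusion level (PSI) for one replicate.

    Hypothetical helper, not rMATS code. inc_form_len and skip_form_len
    stand in for the effective lengths, which depend on the read length.
    """
    inc_norm = inc_count / inc_form_len
    skip_norm = skip_count / skip_form_len
    if inc_norm + skip_norm == 0:
        return None  # no supporting reads, so PSI is undefined
    return inc_norm / (inc_norm + skip_norm)


# The same counts give different PSI values when the effective lengths differ.
print(inc_level(100, 50, 200, 100))            # 0.5
print(round(inc_level(100, 50, 100, 100), 3))  # 0.667
```

Under this assumption, counts filtered with one read length and normalized with lengths derived from another would not be comparable, which is the situation the warning points at.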

The warning could happen if you ran --task prep with one value of --readLength and later --task post with a different value. Another possibility is that you are using --task both with a --tmp directory that still has files from a previous run.

You can avoid the warning by running all tasks with the same read length and by using a new --tmp directory whenever you change the length or the dataset.
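
One way to keep the two steps consistent is to build both command lines from a single set of values. The sketch below only assembles and prints the commands. The `python rmats.py` entry point, the input file names, and the directory names are placeholders rather than paths from this issue, and an install inside a container would need its own wrapper command.

```python
import shlex


def rmats_command(task, read_length, tmp_dir, out_dir,
                  b1="b1.txt", b2="b2.txt", gtf="annotation.gtf"):
    """Build an rmats.py argument list for one task.

    Hypothetical helper. The file and directory names are placeholders.
    """
    return [
        "python", "rmats.py",
        "--b1", b1, "--b2", b2, "--gtf", gtf,
        "--readLength", str(read_length),
        "--od", out_dir, "--tmp", tmp_dir,
        "--task", task,
    ]


READ_LENGTH = 145             # one value, reused for every task
TMP_DIR = "rmats_tmp_len145"  # a new directory per read length or dataset
OUT_DIR = "rmats_out_len145"

for task in ("prep", "post"):
    print(shlex.join(rmats_command(task, READ_LENGTH, TMP_DIR, OUT_DIR)))
```

Putting the read length in the directory names is just one convention for making sure a changed length never reuses old .rmats files.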

@nbizzozero
Author

nbizzozero commented Apr 12, 2024 via email

@nbizzozero
Author

Hi Eric,

After I came back to work yesterday following a month off, I tried your suggestion of using --task both, or --task prep and --task post, but I was still unable to get any splicing results from 3 control and 3 KO samples. There was a problem with KO-10 that I need to fix (it was decompressed accidentally), and the b2.txt file did not list one of the samples. But I was able to run rmats as shown in the attached Word document. Do you think the problem I am encountering could be due to the specific .bam files I am using, or to the fact that rmats was installed in an apptainer?
New rmats script tests for June 3 and 4 to add processing tasks.docx

Do you have any further suggestions to troubleshoot how I run rmats-turbo in our HPC?

Thanks again, Nora

@EricKutschera
Contributor

I would recommend using directories that don't have anything else in them for --od and --tmp (for example, --od /users/nperrone/rmats_out --tmp /users/nperrone/rmats_tmp). Then, while you are working through any errors, you can delete those directories each time you run rmats until the errors are resolved.
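
The cleanup between runs can be scripted. This is a minimal sketch with a hypothetical helper named `reset_dir`. It deletes everything under the given path, so it should only ever be pointed at directories dedicated to rmats output.

```python
import shutil
from pathlib import Path


def reset_dir(path):
    """Delete path (if it exists) and recreate it as an empty directory."""
    path = Path(path)
    if path.exists():
        shutil.rmtree(path)  # removes stale files from earlier runs
    path.mkdir(parents=True)
    return path
```

Calling `reset_dir` on the --od and --tmp directories before each rmats run means every run starts without leftover .rmats files.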

From the ls output, it looks like you have .rmats files from 4 different runs in the directory used as --tmp, which would explain warnings like "WT-13-Neurite.bam found 4 times in .rmats files".

In the last run it looks like you were still getting:

Fail to open KO-1-Neurite
Fail to open KO-10-Neurite.bam

Did you resolve those issues?

From the output, rmats couldn't use any of the alignments from your bam files:

read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 0
NOT_NH_1: 69532725
NOT_EXPECTED_CIGAR: 2094101
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 248509502
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 17531986
CLIPPED: 124819036
total: 462487350

About half of the alignments are not used due to EXON_NOT_MATCHED_TO_ANNOTATION. It could be that the reference files used to create the bam files are not compatible with the --gtf given to rmats. See this post: #367 (comment)
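
To see how the rejected alignments are distributed, the totals above can be turned into percentages. This is a small sketch that parses the text exactly as it was pasted in this thread. The variable and function names are illustrative.

```python
REPORT = """\
read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 0
NOT_NH_1: 69532725
NOT_EXPECTED_CIGAR: 2094101
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 248509502
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 17531986
CLIPPED: 124819036
total: 462487350
"""


def parse_outcomes(text):
    """Return {outcome name: count} for every 'NAME: number' line."""
    counts = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counts[name.strip()] = int(value)
    return counts


counts = parse_outcomes(REPORT)
total = counts.pop("total")

# Print the outcomes from most to least frequent with their share of the total.
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:36s} {n:>12d} {100 * n / total:5.1f}%")
```

For these numbers, EXON_NOT_MATCHED_TO_ANNOTATION accounts for about 53.7% of the alignments, CLIPPED for about 27.0%, and NOT_NH_1 for about 15.0%.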

@nbizzozero
Author

nbizzozero commented Jun 5, 2024 via email
