NUG starts results in many canonical_extended ORFs #68

Open
bmmalone opened this issue Apr 25, 2017 · 3 comments
Labels
priority: low Low priority issue question

Comments

@bmmalone
Contributor

Comparing results from NUG starts against AUG-only starts, we see far more canonical_extended ORFs with NUG than expected. For example, there are usually more canonical_extended than canonical predictions.

(There is no strand bias, though.)

@bmmalone
Contributor Author

This seems to occur when there is a "close" upstream, in-frame NUG start. The attached image shows an example; such upstream, in-frame NUG starts appear to be much more common than upstream, in-frame AUGs.

[Image: nug-upstream-start]

@bmmalone
Contributor Author

This is a result of the "select the longest ORF for each stop codon" postprocessing step. Thus, there is not really a simple fix for the behavior. A few ideas are:

  • Incorporate the ORF type in the model, where "canonical" is more likely to be translated than others.

  • Run both AUG and NUG and subtract out the AUG canonical results from the NUG predictions.
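
To illustrate why the longest-ORF selection produces this behavior, here is a minimal sketch. The `Orf` record, the `longest_orf_per_stop` function, and the coordinates are hypothetical and are not taken from the pipeline's code; ORFs are treated as unspliced intervals for simplicity.

```python
from collections import namedtuple

# Hypothetical minimal ORF record (assumption, not the pipeline's data structure).
Orf = namedtuple("Orf", ["seqname", "strand", "start", "stop", "start_codon"])


def orf_length(orf):
    """Length of an unspliced ORF interval (simplifying assumption)."""
    return abs(orf.stop - orf.start)


def longest_orf_per_stop(orfs):
    """Keep only the longest candidate ORF for each (seqname, strand, stop)."""
    best = {}
    for orf in orfs:
        key = (orf.seqname, orf.strand, orf.stop)
        current = best.get(key)
        if current is None or orf_length(orf) > orf_length(current):
            best[key] = orf
    return list(best.values())


# An annotated AUG start at 100 and a "close" upstream, in-frame CUG start
# at 91 (9 nt upstream), both ending at the same stop codon.
candidates = [
    Orf("chr1", "+", 100, 400, "AUG"),
    Orf("chr1", "+", 91, 400, "CUG"),
]

selected = longest_orf_per_stop(candidates)
for orf in selected:
    print(orf)
```

With these inputs, only the CUG-initiated ORF survives, so the annotated AUG ORF would be reported as an extended version of the canonical one. With AUG-only starts, the upstream candidate would not exist and the canonical ORF would be kept.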

There is not an immediate plan to address this issue.
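
For reference, one possible reading of the second idea (running both AUG and NUG and subtracting the AUG canonical results) can be sketched as follows. The dictionary keys, the function name, and the example records are assumptions for illustration; ORFs are matched only by seqname, strand, and stop position.

```python
def stop_key(prediction):
    """Identify an ORF by its stop codon location (assumed matching rule)."""
    return (prediction["seqname"], prediction["strand"], prediction["stop"])


def subtract_aug_canonical(nug_predictions, aug_predictions):
    """Drop NUG predictions whose stop codon matches an AUG-run canonical ORF."""
    canonical_stops = {
        stop_key(p) for p in aug_predictions if p["orf_type"] == "canonical"
    }
    return [p for p in nug_predictions if stop_key(p) not in canonical_stops]


# Hypothetical predictions from the two runs.
aug_predictions = [
    {"seqname": "chr1", "strand": "+", "stop": 400, "orf_type": "canonical"},
]
nug_predictions = [
    # Same stop as the AUG canonical ORF: treated as already explained by it.
    {"seqname": "chr1", "strand": "+", "stop": 400, "orf_type": "canonical_extended"},
    # No AUG canonical counterpart: kept.
    {"seqname": "chr2", "strand": "-", "stop": 250, "orf_type": "canonical_extended"},
]

remaining = subtract_aug_canonical(nug_predictions, aug_predictions)
for prediction in remaining:
    print(prediction)
```

The AUG canonical set would then be reported alongside `remaining`. This discards any genuine extension at a stop codon that also has an AUG canonical prediction, which is the trade-off of this approach.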

@m-swirski

I see the same phenomenon, and in fact, looking at the ORF profiles, it seems to be correct because of leaky scanning: some fraction of translation starts at each potential start codon, which is what should actually be expected. The only problem is the lack of annotation for these leaky-scanning-derived isoforms. Sometimes as little as 0.1% of translation initiates at a particular alternative start codon, and the only isoform found in the final "filtered.prediction.orfs.bed" is the longest one. Could one use the bayes_factors to distinguish between possible starts?

@eboileau eboileau added the priority: low Low priority issue label Dec 21, 2022