Question: Is it possible to ignore certain characters? #15

Open
nrminor opened this issue Feb 4, 2024 · 2 comments

Comments

@nrminor
nrminor commented Feb 4, 2024

Hello,

I'm currently working on a tool that calls pairwise distances between large numbers of SARS-CoV-2 genome sequences. One of the issues with this kind of bioinformatics is that these sequences often contain many non-ATGC characters that represent an ambiguous base, e.g., "N" is a stand-in for any base. If my understanding is correct, the distance metrics in triple_accel would treat these characters as mismatches. Is there a way I could use triple_accel, as it's currently written, to ignore non-ATGC bases rather than counting them toward the total edit distance?

Thanks for your help and for the excellent crate!
--Nick
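
To make the concern concrete, here is a minimal sketch in plain Rust (standard library only, not triple_accel's API) of a Hamming-style count that skips any column containing a non-ACGT byte. The names `is_unambiguous` and `masked_hamming` are hypothetical, and the sketch assumes the two sequences are already aligned and of equal length; it does not handle indels the way an edit distance would.

```rust
/// Returns true for the four unambiguous DNA bases (case-insensitive).
fn is_unambiguous(b: u8) -> bool {
    matches!(b.to_ascii_uppercase(), b'A' | b'C' | b'G' | b'T')
}

/// Hamming-style distance over two equal-length, pre-aligned sequences.
/// Columns where either byte is not A, C, G, or T are skipped entirely.
/// Returns None when the lengths differ.
fn masked_hamming(a: &[u8], b: &[u8]) -> Option<usize> {
    if a.len() != b.len() {
        return None;
    }
    Some(
        a.iter()
            .zip(b.iter())
            // Keep only columns where both bases are unambiguous.
            .filter(|(x, y)| is_unambiguous(**x) && is_unambiguous(**y))
            // Count the remaining columns that differ.
            .filter(|(x, y)| !x.eq_ignore_ascii_case(*y))
            .count(),
    )
}
```

With this sketch, `masked_hamming(b"ACGTN", b"ACCTA")` counts only the G/C column, whereas an unmasked Hamming distance would also count the N/A column.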

@Daniel-Liu-c0deb0t
Owner

Hey Nick! It is not currently possible to ignore certain bytes (at least without modifying the library code).

I would suggest using Block Aligner, my newer library. Note that unlike triple_accel, Block Aligner uses scoring (+ for matches, - for mismatches and indels) instead of edit distance. It also supports custom substitution matrices so you can ignore Ns. Finally, it is faster in many cases.
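
The scoring idea can be illustrated with a small sketch in plain Rust. This is not Block Aligner's API; it is a textbook Needleman-Wunsch global alignment with a linear gap penalty, written only to show how a substitution score of 0 for N makes ambiguous bases neutral. The function names `score` and `global_score`, and the +1/-1/0 values, are assumptions chosen for illustration.

```rust
/// Hypothetical substitution score: +1 for a match, -1 for a mismatch,
/// and 0 whenever either base is N, so N neither helps nor hurts.
fn score(a: u8, b: u8) -> i32 {
    let (a, b) = (a.to_ascii_uppercase(), b.to_ascii_uppercase());
    if a == b'N' || b == b'N' {
        0
    } else if a == b {
        1
    } else {
        -1
    }
}

/// Global alignment score (Needleman-Wunsch) with a linear gap penalty.
/// `gap` is the score added per inserted or deleted base (typically negative).
fn global_score(q: &[u8], r: &[u8], gap: i32) -> i32 {
    // prev[j] is the best score aligning the processed prefix of q with r[..j].
    let mut prev: Vec<i32> = (0..=r.len() as i32).map(|j| j * gap).collect();
    for (i, &qc) in q.iter().enumerate() {
        // Column 0: the first i + 1 bases of q aligned against nothing.
        let mut curr = vec![(i as i32 + 1) * gap; r.len() + 1];
        for (j, &rc) in r.iter().enumerate() {
            let diag = prev[j] + score(qc, rc); // substitute or match
            let up = prev[j + 1] + gap; // gap in r
            let left = curr[j] + gap; // gap in q
            curr[j + 1] = diag.max(up).max(left);
        }
        prev = curr;
    }
    prev[r.len()]
}
```

Under these assumed scores, aligning `ACNT` against `ACGT` with a gap penalty of -1 loses only the one point that the N column fails to earn, while a true mismatch such as `ACTT` against `ACGT` is penalized. Note that higher is better here, the opposite of an edit distance.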

@nrminor
Author

nrminor commented Feb 5, 2024

Block Aligner looks great, Daniel! I will definitely give it a try in my tool. Thanks for the quick reply and for pointing me in that direction!
