Filtering out host (human) genome beforehand? #96
Comments
Hi,
Thanks for your response. It would be great to remove the PhiX genome in the next version of Hecatomb; I look forward to it. In addition, you may wish to add an option to search only the DNA viral catalogue, only the RNA viral catalogue, or both, so that the search better suits the type of dataset being investigated. I am curious to know your thoughts!
Yes, I agree 100%. This misclassification of host DNA as RNA viruses is very typical. I like the idea of switching off the search for RNA viruses; I'll have to think about the best way to implement it, as we want to do the same thing for phages.
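Until such an option exists, the DNA-only/RNA-only idea can be approximated after the fact by filtering the results table. The following is only a minimal sketch: it assumes the results are available as a tab-separated table with a genome-type column, and the column name `baltimoreType`, the function name `keep_genome_type`, and the example rows are all illustrative assumptions rather than a confirmed Hecatomb interface.

```python
import csv
import io

def keep_genome_type(tsv_text, wanted=("DNA",), column="baltimoreType"):
    """Return the rows whose genome-type column is one of `wanted`.

    `column` is an assumed column name; adjust it to match the real table.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    wanted_set = {w.upper() for w in wanted}
    # Rows with a missing or empty genome type are dropped.
    return [row for row in reader if (row.get(column) or "").upper() in wanted_set]

# Made-up example table (not real pipeline output).
example = (
    "seqID\tspecies\tbaltimoreType\n"
    "read1\tHuman immunodeficiency virus 1\tRNA\n"
    "read2\tEscherichia virus T4\tDNA\n"
)

dna_only = keep_genome_type(example)
print([row["seqID"] for row in dna_only])
```

Passing `wanted=("DNA", "RNA")` keeps both catalogues. Note that this filters hits after the search has run, so it does not reduce search time the way a built-in switch would.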
Thank you for this wonderful tool!
Should we filter out the host (human) genome before running the pipeline, e.g. using KneadData or fastp? Does the same apply to the PhiX genome?
I tried Hecatomb on one of my DNA shotgun metagenomics datasets and found a large difference in the output with and without prior host DNA removal. Specifically, the number (and diversity) of viral sequences retrieved was much higher when I did not remove the host (human) DNA before running Hecatomb. The other issue is that a large proportion of sequences was assigned to RNA viruses, including ones that I should not normally see in my dataset, such as Human immunodeficiency virus. These RNA viruses were found with or without prior host DNA removal; however, they were significantly more abundant when the host DNA had not been removed. This makes me think that host DNA is being mistakenly classified as viral in my dataset.
Also, I am not sure whether I should expect to find any RNA viruses when my dataset is mainly shotgun DNA metagenomics.
For more context, my dataset is a bulk shotgun metagenomics dataset (i.e. not virus-enriched).
Thank you in advance!
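For readers who want to reproduce the with/without host removal comparison described above, here is a rough sketch of the read-dropping step. It assumes the IDs of host-derived reads have already been obtained separately (for example, by aligning the reads to a human reference with a tool of your choice) and that the FASTQ file uses plain four-line records. The function name `drop_host_reads` and the example reads are hypothetical.

```python
import io

def drop_host_reads(fastq_lines, host_ids):
    """Yield 4-line FASTQ records whose read ID is not in `host_ids`.

    Assumes strict 4-line records (header, sequence, '+', qualities).
    """
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Read ID = header without the leading '@', up to the first whitespace.
            read_id = record[0][1:].split()[0]
            if read_id not in host_ids:
                yield tuple(record)
            record = []

# Made-up example: r1 is treated as host-derived and dropped.
fastq = io.StringIO("@r1 sample\nACGT\n+\nIIII\n@r2 sample\nTTGA\n+\nIIII\n")
kept = list(drop_host_reads(fastq, host_ids={"r1"}))
print([rec[0] for rec in kept])
```

Running the pipeline on both the filtered and unfiltered reads then shows how many of the viral assignments disappear once host reads are removed.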