Database processing update #46

Merged
merged 36 commits into from
Jul 18, 2024

@farchaab farchaab commented Jun 11, 2024

PR for version 0.9.20

Summary

  • Updated pre-processing for Silva 138.1 (with species)
  • Updated pre-processing for Greengenes2
  • Updated rdp_classifier version from 2.5 to 2.14
  • Optimized lineage2taxTrain.py
  • Added taxa formatting for Silva 138.1, Greengenes2, ezBiocloud and UNITE when pre-processing is skipped
    • Solved convergent taxa in Silva 138.1 and Greengenes2

Details

Additions

  • 5627a03 Added conda env and container for pandas
  • d1d3860 Added python script for derep_and_merge_taxonomy:
    • Fills empty taxa with placeholder names (e.g. Escherichia-shigella_s)
    • Replaces Incertae, unknown and endosymbionts with NaN (these caused convergent taxa for RDP)
    • For unclassified taxa, each taxon corresponding to a different cluster gets a unique index
  • cc8540a Added db_version parameter
  • 6f30c51 Added script to clean taxonomy files when no pre-processing is applied
  • d574275 Added list of classifiers for preparing databases (RDP, QIIME, and DADA2 by default)
  • 1619919 Added RDP memory as parameter
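
The first two clean-up rules listed for derep_and_merge_taxonomy can be sketched roughly as follows. This is a minimal illustration in plain Python, not the script from d1d3860: the rank list, the `AMBIGUOUS` pattern, the `clean_lineage` helper, the `Unclassified` fallback, and the `<last known taxon>_<rank initial>` placeholder convention are all assumptions inferred from the `Escherichia-shigella_s` example, and `None` stands in for NaN. The per-cluster unique index for unclassified taxa is not covered here.

```python
import re

# Hypothetical rank order; the actual script may use different rank names.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]

# Assumed pattern for ambiguous names; the real matching rules are not shown in the PR.
AMBIGUOUS = re.compile(r"incertae|unknown|endosymbiont", re.IGNORECASE)


def clean_lineage(lineage):
    """Blank out ambiguous names, then fill every gap with a placeholder.

    The placeholder is '<last known taxon>_<rank initial>', so a missing
    species under the genus 'Escherichia-shigella' becomes
    'Escherichia-shigella_s'.
    """
    cleaned = []
    last_known = None
    for rank, name in zip(RANKS, lineage):
        # Treat empty strings and ambiguous labels as missing (None stands in for NaN).
        if name is None or not name.strip() or AMBIGUOUS.search(name):
            name = None
        if name is None:
            # Assumed fallback when nothing above this rank is known.
            parent = last_known if last_known is not None else "Unclassified"
            cleaned.append(f"{parent}_{rank[0].lower()}")
        else:
            cleaned.append(name)
            last_known = name
    return cleaned


example = clean_lineage([
    "Bacteria", "Proteobacteria", "Gammaproteobacteria",
    "Enterobacterales", "Enterobacteriaceae", "Escherichia-shigella", "",
])
print(example[-1])
```

Deriving the placeholder from the nearest known ancestor keeps filled names distinct across lineages, which is one way to avoid the same label appearing under two different parents (the convergent-taxa problem mentioned above).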

Changes

  • 4a94081 R scripts linting
  • bdb8a0e Replaced the R script for derep and merge taxonomy with a Python script
  • c79db5a a467bc7 Updated python env and container to 3.12.3
  • 991605e Replaced numbers_species and numbers_genus by a dict {'Species': 2, 'Genus': 4}
  • 9920ea4 Use rdp_classifier instead of rdptools
  • 7c32765 Updated tax_formatting.py to work with Greengenes2

Fixes

@farchaab farchaab marked this pull request as ready for review June 14, 2024 07:48
@farchaab farchaab merged commit 6b4bbf4 into master Jul 18, 2024
@farchaab farchaab deleted the dev branch July 18, 2024 10:32
@farchaab farchaab changed the title from "Release 0.9.20" to "Database processing update" Jul 19, 2024