
Database processing with latest SILVA and greengenes versions #42

Open
farchaab opened this issue May 29, 2024 · 2 comments
Labels: bug (Something isn't working), enhancement (New feature or request)

Comments

farchaab (Contributor) commented May 29, 2024

When using SILVA v138.1 (wSpecies_train_set), I get an error in Derep_and_merge_taxonomy.

  • FASTA file (excerpt)
>1
AACTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAG
TCGAGCGGCAGCACGGGTACTTGTACCTGGTGGCGAGCGGCGGACGGGTGAGTAATGCCT
  • taxonomy (excerpt)
>Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;amygdali;
>Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Pectobacteriaceae;Dickeya;phage;
>Bacteria;Actinobacteriota;Actinobacteria;Actinomycetales;Actinomycetaceae;F0332;
>Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;equi;
>Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;porcinus;
>Bacteria;Actinobacteriota;Actinobacteria;Pseudonocardiales;Pseudonocardiaceae;Saccharomonospora;
>Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;
>Bacteria;Firmicutes;Clostridia;Peptostreptococcales-Tissierellales;Anaerovoracaceae;[Eubacterium] nodatum group;
>Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Xanthobacteraceae;Bradyrhizobium;
>Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Porticoccaceae;Porticoccus;hydrocarbonoclasticus;

The log shows:

[1] ‘1.0.2’

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

[1] ‘0.8.5’
[1] ‘0.20.41’
[1] ‘1.4.0’
[1] ‘0.5.0’
Error in `$<-.data.frame`(`*tmp*`, V2, value = character(0)) :
  replacement has 0 rows, data has 452064
Calls: $<- -> $<-.data.frame
Execution halted
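The message means that column `V2` was assigned a zero-length vector while the data frame has 452064 rows, so whatever step produced the new `V2` values matched nothing. Assuming the taxonomy parsing expects a fixed number of ranks per line (not verified against the script itself), a quick diagnostic is to count the ranks on each taxonomy line. The minimal Python sketch below does that; `rank_counts` is a hypothetical helper, and it assumes the `>`-prefixed, `;`-separated layout shown in the excerpt above.

```python
from collections import Counter

def rank_counts(lines):
    """Return a Counter mapping 'number of ranks' -> 'number of lines'.

    Assumes SILVA/DADA2-style lines: a leading '>' and ranks separated
    by ';', usually with a trailing ';'.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Drop the leading '>' and the trailing ';' before splitting
        ranks = line.lstrip(">").rstrip(";").split(";")
        counts[len(ranks)] += 1
    return counts

taxonomy = [
    ">Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;equi;",
    ">Bacteria;Actinobacteriota;Actinobacteria;Actinomycetales;Actinomycetaceae;F0332;",
    ">Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Xanthobacteraceae;Bradyrhizobium;",
]
print(sorted(rank_counts(taxonomy).items()))  # [(6, 2), (7, 1)]
```

More than one key in the result means the rank count varies between lines, which a parser that assumes a fixed layout would not handle.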
@farchaab farchaab self-assigned this May 29, 2024
@farchaab farchaab added enhancement New feature or request bug Something isn't working labels May 29, 2024
valscherz (Collaborator) commented Jun 2, 2024

You are probably already aware, but just in case: I think two differences from EzBioCloud explain the error:

  1. The genus is not repeated at the species level (to be verified, but EzBioCloud would report: >Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas amygdali;)
  2. The number of ranks varies (the species rank is sometimes missing).
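The two differences listed above can be handled with a small per-line normalization. This is a minimal Python sketch, assuming the taxonomy lines look like the excerpt in the first comment and that seven ranks (ending with species) are the target layout. The function name `normalize_silva` and the `NA` placeholder for missing ranks are hypothetical choices for illustration, not what the project's script does.

```python
N_RANKS = 7  # domain .. species, as in the EzBioCloud-style example

def normalize_silva(line, missing="NA"):
    """Convert one SILVA/DADA2-style taxonomy line into an EzBioCloud-like
    line with exactly seven ranks.

    - pads absent trailing ranks with `missing` (placeholder is an assumption)
    - repeats the genus in the species name ("Pseudomonas amygdali")
    """
    ranks = line.strip().lstrip(">").rstrip(";").split(";")
    if len(ranks) > N_RANKS:
        raise ValueError(f"unexpected number of ranks: {line!r}")
    # Pad missing trailing ranks so every line has the same layout
    ranks += [missing] * (N_RANKS - len(ranks))
    genus, species = ranks[5], ranks[6]
    # Prefix the genus only when both ranks are known and not already prefixed
    if genus != missing and species != missing and not species.startswith(genus + " "):
        ranks[6] = f"{genus} {species}"
    return ">" + ";".join(ranks) + ";"

print(normalize_silva(
    ">Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;"
    "Pseudomonadaceae;Pseudomonas;amygdali;"))
print(normalize_silva(
    ">Bacteria;Actinobacteriota;Actinobacteria;Actinomycetales;"
    "Actinomycetaceae;F0332;"))
```

Applying the function twice leaves a line unchanged, because the genus is only prefixed when the species name does not already start with it.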

farchaab (Contributor, Author) commented Jun 4, 2024

Hello @valscherz, I am aware of this and am updating the script to repeat the genus in the species name and to handle missing ranks.

@farchaab farchaab changed the title Database processing with latest SILVA version Database processing with latest SILVA and greengenes versions Jul 8, 2024
Projects
Status: To do