Suggestion for making GTDB database #91

Open
Xinpeng021001 opened this issue Jul 2, 2024 · 8 comments

Comments

@Xinpeng021001

Hi,

I followed the wiki to create the GTDB database and noticed that the final step might contain an error:

find OPERA-MS-DB/ -type f -name '*.fna.gz' > OPERA-MS-DB/genomes_list.tx

It produces an empty list file. I guess it should be:

find -L OPERA-MS-DB/ -type f -name '*.fna.gz' > OPERA-MS-DB/genomes_list.txt

Otherwise, the strain clustering step may raise an error and fail.
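
A likely reason the original command returns nothing: without -L, the -type f test in find looks at the link itself rather than its target, so symlinked genome files are skipped. This assumes the files under OPERA-MS-DB/ are symbolic links, which the thread does not state explicitly. A minimal Python sketch of the same distinction, using a throwaway directory and a made-up file name:

```python
import os
import tempfile


def list_genomes(root, follow_links):
    """Collect *.fna.gz paths under root, roughly like
    find [-L] root -type f -name '*.fna.gz'."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_links):
        for name in filenames:
            if not name.endswith(".fna.gz"):
                continue
            path = os.path.join(dirpath, name)
            # Without -L, a symlink has type 'l', not 'f', so it is skipped.
            if not follow_links and os.path.islink(path):
                continue
            if os.path.isfile(path):
                found.append(path)
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: real files in store/, symlinks in OPERA-MS-DB/.
    store = os.path.join(tmp, "store")
    db = os.path.join(tmp, "OPERA-MS-DB")
    os.makedirs(store)
    os.makedirs(db)
    real = os.path.join(store, "genome_A.fna.gz")
    open(real, "wb").close()
    os.symlink(real, os.path.join(db, "genome_A.fna.gz"))

    without_L = list_genomes(db, follow_links=False)
    with_L = list_genomes(db, follow_links=True)

print(len(without_L), len(with_L))
```

If the genome files were regular files rather than links, both calls would return the same list and -L would make no difference.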

Best,

@jsgounot
Contributor

jsgounot commented Jul 3, 2024

Hi Xinpeng,

Thanks for letting me know.

Regards,
JS

@jsgounot jsgounot closed this as completed Jul 3, 2024
@Xinpeng021001
Author

I also forgot to mention: the --threads option of the Python script raises an error when more than one thread is used. I fixed it manually and can send the fix later if needed.

Best Regards,
Xinpeng

@jsgounot
Contributor

jsgounot commented Jul 3, 2024

Oh, that's interesting; it works fine on my machine and on others. I'd like to see the error message (if you still have it) and the fix. Thanks.

@jsgounot jsgounot reopened this Jul 3, 2024
@Xinpeng021001
Author

If it works for you, I guess it might be an issue with my environment or Python version. Let me post it here:

python $WORK/final_course_project/glacier_algae/script/OPERA-MS/src_utils/make_operams_db_from_gtdb.py all_genomes.txt.gz all_taxonomy_r220.tsv.gz --outdir test --threads 16
Read taxonomy file
Check taxonomic information
Read genome file
Check concordance
Define genome size and seq numbers. Number of threads: 16
Traceback (most recent call last):
  File "/work/yinlab/xinpeng/final_course_project/glacier_algae/script/OPERA-MS/src_utils/make_operams_db_from_gtdb.py", line 148, in <module>
    main()
  File "/work/yinlab/xinpeng/final_course_project/glacier_algae/script/OPERA-MS/src_utils/make_operams_db_from_gtdb.py", line 145, in main
    process(args)
  File "/work/yinlab/xinpeng/final_course_project/glacier_algae/script/OPERA-MS/src_utils/make_operams_db_from_gtdb.py", line 60, in process
    seqinfos = multi_threads_seqinfos(fnames) if args.threads > 1 else single_thread_seqinfos(fnames)
  File "/work/yinlab/xinpeng/final_course_project/glacier_algae/script/OPERA-MS/src_utils/make_operams_db_from_gtdb.py", line 109, in multi_threads_seqinfos
    with concurrent.futures.ProcessPoolExecutor(max_workers=args.threads) as executor:
NameError: name 'args' is not defined
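
The traceback points to a scoping problem: args appears to be created inside main() and handed to process(args), so it is local to those functions, while multi_threads_seqinfos reads args as if it were a global. A stripped-down sketch of that pattern, with placeholder function bodies rather than the script's actual code:

```python
from types import SimpleNamespace


def helper(fnames):
    # 'args' is neither a parameter nor a module-level name here,
    # so this lookup fails when the function is called.
    return args.threads


def process(args):
    # 'args' is local to process(); helper() cannot see it.
    return helper(["a.fna.gz"])


def main():
    args = SimpleNamespace(threads=16)  # stand-in for the parsed CLI arguments
    return process(args)


try:
    main()
    error = None
except NameError as exc:
    error = str(exc)

print(error)
```

The single-thread path never touches the missing name, which would explain why the error only shows up with --threads greater than 1.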

@Xinpeng021001
Author

The old code:

def multi_threads_seqinfos(fnames):
    seqinfos = {}
    with concurrent.futures.ProcessPoolExecutor(max_workers=args.threads) as executor:
        if USED_TQDM:
            iterator = tqdm.tqdm(executor.map(fasta_info, fnames), total=len(fnames))
        else:
            iterator = executor.map(fasta_info, fnames)

        for seqres in iterator:
            seqinfos.update(seqres)

    return seqinfos


And the fixed code:

def multi_threads_seqinfos(fnames, threads):
    seqinfos = {}
    with concurrent.futures.ProcessPoolExecutor(max_workers=threads) as executor:
        if USED_TQDM:
            iterator = tqdm.tqdm(executor.map(fasta_info, fnames), total=len(fnames))
        else:
            iterator = executor.map(fasta_info, fnames)

        for seqres in iterator:
            seqinfos.update(seqres)

    return seqinfos
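
With the new signature, the call in process() (line 60 in the traceback) also has to pass the thread count. A minimal sketch of both pieces together, under a few assumptions: fasta_info and single_thread_seqinfos are hypothetical stand-ins, collect_seqinfos is a made-up wrapper for the conditional, and ThreadPoolExecutor replaces the script's ProcessPoolExecutor only so the snippet runs without a __main__ guard (both accept max_workers and provide map).

```python
import concurrent.futures


def fasta_info(fname):
    # Hypothetical stand-in: the real function reads a FASTA file.
    return {fname: len(fname)}


def single_thread_seqinfos(fnames):
    # Assumed shape of the single-thread path.
    seqinfos = {}
    for fname in fnames:
        seqinfos.update(fasta_info(fname))
    return seqinfos


def multi_threads_seqinfos(fnames, threads):
    seqinfos = {}
    # The worker count is an explicit parameter, not a global lookup.
    with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as executor:
        for seqres in executor.map(fasta_info, fnames):
            seqinfos.update(seqres)
    return seqinfos


def collect_seqinfos(fnames, threads):
    # Mirrors the conditional from the traceback, with threads passed through.
    if threads > 1:
        return multi_threads_seqinfos(fnames, threads)
    return single_thread_seqinfos(fnames)


fnames = ["g1.fna.gz", "genome2.fna.gz"]
result = collect_seqinfos(fnames, threads=4)
print(result)
```

Passing the value as an argument keeps the helper independent of how the command-line arguments are parsed.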

@Xinpeng021001
Author

By the way, you forgot a "t" in the find command :)

find -L OPERA-MS-DB/ -type f -name '*.fna.gz' > OPERA-MS-DB/genomes_list.tx

find -L OPERA-MS-DB/ -type f -name '*.fna.gz' > OPERA-MS-DB/genomes_list.txt

jsgounot added a commit that referenced this issue Jul 3, 2024
@jsgounot
Contributor

jsgounot commented Jul 3, 2024

This is weird, I should have caught this issue before. Thanks for letting me know.

@Xinpeng021001
Author

My pleasure :)
