conda version of hifiasm slower than manually built version #392

Open
gconcepcion opened this issue Jan 30, 2023 · 24 comments

@gconcepcion

Hi Haoyu,

We've noticed internally that running a version of hifiasm compiled outside of conda with gcc/11.1.0 or gcc/11.3.0 runs significantly faster in terms of CPU time than a version of hifiasm installed / compiled using conda.

On a small 10 Mb test dataset, the manually built version is roughly 3-4 times faster than the conda-built version.
hifiasm compiled w/ gcc/11.3.0 :
[M::main] Real time: 121.909 sec; CPU: 4144.494 sec; Peak RSS: 16.353 GB

hifiasm installed / compiled using conda:
[M::main] Real time: 393.322 sec; CPU: 15529.166 sec; Peak RSS: 16.353 GB
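
As a rough check on the two log lines above, the ratios can be computed directly from the `[M::main]` summaries. This is only an illustrative sketch: the helper names `parse_main_line` and `ratio` are made up here (they are not part of hifiasm), and it assumes the summary line keeps the "Real time: ... sec; CPU: ... sec" layout shown in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not part of hifiasm) for comparing two runs.

# Print "<real_sec> <cpu_sec>" from a hifiasm "[M::main]" summary line.
parse_main_line() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "time:") real = $(i + 1)   # value after "Real time:"
            if ($i == "CPU:")  cpu  = $(i + 1)   # value after "CPU:"
        }
        print real, cpu
    }'
}

# Print a / b rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

manual='[M::main] Real time: 121.909 sec; CPU: 4144.494 sec; Peak RSS: 16.353 GB'
conda='[M::main] Real time: 393.322 sec; CPU: 15529.166 sec; Peak RSS: 16.353 GB'

# Positional parameters: $1/$2 = manual real/CPU, $3/$4 = conda real/CPU.
set -- $(parse_main_line "$manual") $(parse_main_line "$conda")
echo "wall-clock slowdown of conda build: $(ratio "$3" "$1")x"   # 3.23x
echo "CPU-time slowdown of conda build:   $(ratio "$4" "$2")x"   # 3.75x
```

Both ratios fall inside the "roughly 3-4 times" range quoted above.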

The time differential is even more significant when running a full human sized dataset.

Can you think of anything about the conda installed version that would result in a binary that takes longer to compute the assembly graph? Could this have something to do with compiler flags?

Thanks!

@chhylp123
Owner

I have no idea about that... Could you please share the 10 Mb dataset with us? I will try to reproduce this issue.

@gconcepcion
Author

Just sent you an email with a link to a 10Mb region of HG005

@chhylp123
Owner

Thanks a lot!

@xzhoubayer

quay.io/biocontainers/hifiasm:0.18.5--h5b5514e_0 was 5-fold slower than the 0.15.2 version compiled outside the container when both were used to assemble a 2.5 Gb plant genome.

@chhylp123
Owner

@xzhoubayer Is it a public dataset? Can I reproduce this issue on my side?

@xzhoubayer

xzhoubayer commented Feb 4, 2023

It is not a public dataset, so I cannot share it. I think there are some public datasets of corn genomes.

@xzhoubayer

0.18.5 should not be slower than 0.15.2, right?

@chhylp123
Owner

Might be a conda issue, instead of hifiasm itself.

hkeward added a commit to PacificBiosciences/wdl-dockerfiles that referenced this issue Feb 9, 2023
The conda version of hifiasm is significantly slower than the manually
built; see chhylp123/hifiasm#392.
@chhylp123
Owner

Hi @gconcepcion, I'm wondering whether previous conda versions of hifiasm are also several times slower.

@williamrowell

I switched from building my own docker with hifiasm to using the conda hifiasm sometime between July and October 2020 (I think you added the bioconda recipe around this time), and I've been using the conda version regularly since then. Based on the ~500 human samples I've assembled with the conda build, my expectation has always been roughly 18 to 36 hours on 48 threads for 20-30x depth, and I haven't really seen that change.

@gconcepcion
Author

I personally never noticed the issue because I always build hifiasm manually and it's always very fast. @williamrowell and I discovered this recently: he has been telling me for the past two years how slow it was in his human-wgs pipeline, which was contrary to my experience. I finally decided to dig in and realized there is some unknown discrepancy between the conda build and a manually built version.

@hkeward

hkeward commented Feb 14, 2023

I ran a small test sample using the conda-based and a manually built version of hifiasm (version 0.15.5).
Using the conda environment:

User System Elapsed %CPU Maxresident memory
441484.13 1035.65 2:59:40 4104%CPU 29.50G

Using a manually built version of hifiasm:

User System Elapsed %CPU Maxresident memory
166349.91 1198.05 1:11:50 3887%CPU 29.22G

The manually built version ran ~2.5x faster.
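
The ~2.5x figure follows from the elapsed times in the two tables above. A minimal sketch of the arithmetic, using a hypothetical `to_seconds` helper and assuming the Elapsed column is formatted as h:mm:ss:

```shell
#!/bin/sh
# Hypothetical helper: convert an h:mm:ss elapsed time to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

conda_s=$(to_seconds 2:59:40)    # 10780
manual_s=$(to_seconds 1:11:50)   # 4310

# Elapsed-time ratio, then the user-CPU ratio from the same tables.
awk -v c="$conda_s" -v m="$manual_s" 'BEGIN { printf "elapsed:  %.2fx\n", c / m }'
awk 'BEGIN { printf "user CPU: %.2fx\n", 441484.13 / 166349.91 }'
```

The elapsed ratio comes out at about 2.5 and the user-CPU ratio slightly higher, at about 2.65.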

@chhylp123
Owner

I see. Thanks a lot. It looks like an issue with the conda recipe. I will fix it as soon as possible.

@bgruening

How do you all compile hifiasm? The build script doesn't look too fancy to me: https://github.com/bioconda/bioconda-recipes/blob/master/recipes/hifiasm/build.sh and I'm wondering what could go wrong here.

@williamrowell

We're seeing this across a lot of different build environments, but we're all just running something like the basic instructions in @chhylp123's repo under Getting Started:

git clone https://github.com/chhylp123/hifiasm
cd hifiasm && make

or something equivalent like the following from @hkeward's Dockerfile:

RUN wget https://github.com/chhylp123/hifiasm/archive/refs/tags/${HIFIASM_VERSION}.tar.gz && \
	tar -zxvf ${HIFIASM_VERSION}.tar.gz --directory /opt && \
	rm ${HIFIASM_VERSION}.tar.gz
RUN cd /opt/hifiasm-${HIFIASM_VERSION} && \
	make

@chhylp123
Owner

@bgruening The CXXFLAGS set in build.sh overrides the -O3 inside the Makefile, so I tried to modify the recipe in bioconda/bioconda-recipes@8b5b243. However, the bioconda build always reported the following (see: https://dev.azure.com/bioconda/bioconda-recipes/_build/results?buildId=28288&view=l[…]-55c4-b71d-229b239cfb2f&t=7df82132-b284-504b-53d6-7d3e63519572):

x86_64-conda-linux-gnu/bin/ld: cannot find -lz: No such file or directory

Do you have any idea about this issue?

@bgruening

This means zlib is not found and I guess this is because you are overwriting CXXFLAGS in your Makefile.

I guess you can make it work by using something like:

CXXFLAGS := $(CXXFLAGS) -g -O3 -msse4.2 -mpopcnt -fomit-frame-pointer -Wall
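
To illustrate the difference this suggestion makes, here is a small sketch using toy Makefiles (not hifiasm's real Makefile) and a made-up `-L/prefix/lib` flag. It assumes GNU make, and it assumes the build environment exports CXXFLAGS as an environment variable, which is an assumption about how the conda build passes its library paths. A plain assignment in the Makefile discards the inherited value, while the `:= $(CXXFLAGS) ...` form keeps it and appends the optimization flags.

```shell
#!/bin/sh
# Toy demonstration (not hifiasm's Makefile); requires GNU make.
if ! make --version 2>/dev/null | grep -q 'GNU Make'; then
    echo "GNU make not found; skipping demo"
    exit 0
fi

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# Plain assignment: replaces any CXXFLAGS inherited from the environment.
cat > "$demo_dir/plain.mk" <<'EOF'
CXXFLAGS = -g -O3
$(info $(CXXFLAGS))
.PHONY: show
show: ; @:
EOF

# Appending form: keeps the inherited value and adds the optimization flags.
cat > "$demo_dir/append.mk" <<'EOF'
CXXFLAGS := $(CXXFLAGS) -g -O3
$(info $(CXXFLAGS))
.PHONY: show
show: ; @:
EOF

# Run a toy Makefile with CXXFLAGS set in the environment (hypothetical path).
flags_with_env() {
    env CXXFLAGS="-L/prefix/lib" make -s -f "$demo_dir/$1"
}

echo "plain:  $(flags_with_env plain.mk)"    # -g -O3
echo "append: $(flags_with_env append.mk)"   # -L/prefix/lib -g -O3
```

With the plain form the library search path is lost, which would be consistent with the "cannot find -lz" linker error reported above.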

@chhylp123
Owner

chhylp123 commented Feb 17, 2023

Thanks a lot. If I understand correctly, writing the bioconda recipe this way will not override CXXFLAGS within the Makefile:
bioconda/bioconda-recipes@7db38ec. But it still does not pass the bioconda checks?

@bgruening

@chhylp123 have you seen this here: #402

Please feel free to open a bioconda PR, then I can mess around with it.

@bgruening

@gconcepcion @williamrowell can you maybe try version 0.18.8 from bioconda?

@gconcepcion
Author

gconcepcion commented Feb 23, 2023

Yes, I can confirm that 0.18.8 from bioconda now performs as expected:

binary/slurm-33956810.out:[M::main] Real time: 330.544 sec; CPU: 6533.171 sec; Peak RSS: 16.356 GB
conda/slurm-33956821.out:[M::main] Real time: 326.120 sec; CPU: 6409.821 sec; Peak RSS: 16.413 GB

Thanks for the fix everyone!

@bgruening

Very cool, thanks for testing!

@chhylp123
Owner

Thanks all for the great help! But I still don't understand why my commit didn't work. Could you please explain more (see: #402 (comment))? Thank you in advance.

@chhylp123
Owner

@bgruening My question is this: as weidai11/cryptopp#525 mentions, if users set CXXFLAGS on the command line, will GNU make override CXXFLAGS regardless of whether it is hardcoded in the Makefile? I guess GNU make will override CXXFLAGS either way?
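
On the question above: as the GNU make manual describes it, a variable given as a make command-line argument takes precedence over ordinary assignments in the Makefile, whereas a variable that only comes from the environment does not (unless make is run with -e). The `override` directive is the documented way for a Makefile to still modify a command-line variable. The sketch below uses toy Makefiles and a made-up `-L/prefix/lib` value; none of it is taken from hifiasm or the bioconda recipe.

```shell
#!/bin/sh
# Toy demonstration of GNU make variable precedence; requires GNU make.
if ! make --version 2>/dev/null | grep -q 'GNU Make'; then
    echo "GNU make not found; skipping demo"
    exit 0
fi

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# Write a toy Makefile named $1 whose first line is the assignment in $2.
write_mk() {
    {
        printf '%s\n' "$2"
        printf '%s\n' '$(info $(CXXFLAGS))' '.PHONY: show' 'show: ; @:'
    } > "$demo_dir/$1"
}

write_mk plain.mk    'CXXFLAGS = -O3'
write_mk append.mk   'CXXFLAGS := $(CXXFLAGS) -O3'
write_mk override.mk 'override CXXFLAGS += -O3'

# CXXFLAGS passed as a make command-line argument (hypothetical value).
flags_with_arg() {
    make -s -f "$demo_dir/$1" CXXFLAGS="-L/prefix/lib"
}

echo "plain:    $(flags_with_arg plain.mk)"      # -L/prefix/lib
echo "append:   $(flags_with_arg append.mk)"     # -L/prefix/lib
echo "override: $(flags_with_arg override.mk)"   # -L/prefix/lib -O3
```

So a command-line CXXFLAGS replaces both the hardcoded and the appending assignment; only the `override` form keeps -O3 in that case.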

williamrowell added a commit to PacificBiosciences/pb-human-wgs-workflow-snakemake that referenced this issue Mar 8, 2023
- older hifiasm versions _built by conda_ are slower than those same versions built outside of conda chhylp123/hifiasm#392
- this is a problem with the conda build process
- fixed recently chhylp123/hifiasm#392