Numpy 2 compatibility #101

Open
matbryan52 opened this issue Jun 21, 2024 · 2 comments

Comments

@matbryan52
Member

matbryan52 commented Jun 21, 2024

Looks like we are OK for numpy 2 here; only hdbscan is not yet compatible. Their new version 0.8.37, released a few days ago, pins numpy < 2, so new installs will work. Older installs that are upgraded will likely crash on the import of hdbscan with the ValueError about the dtype size change.

See scikit-learn-contrib/hdbscan#642
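For upgraded installs, one option could be to guard the import so the failure produces a readable message instead of a bare traceback. The following is only a minimal sketch: `import_optional` is a hypothetical helper, not something that exists in this project or in hdbscan, and it assumes the binary mismatch surfaces as a `ValueError` at import, as described above.

```python
import importlib


def import_optional(name):
    """Try to import `name`; return (module, None) or (None, reason).

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(name), None
    except ImportError as exc:
        # The package is simply missing (ModuleNotFoundError is a subclass).
        return None, f"{name} is not installed: {exc}"
    except ValueError as exc:
        # Assumption: an extension built against numpy 1.x raises ValueError
        # (dtype size change) when imported under numpy 2.
        return None, f"{name} failed to import, possibly a numpy binary mismatch: {exc}"


if __name__ == "__main__":
    hdbscan, reason = import_optional("hdbscan")
    if hdbscan is None:
        print(reason)
```

Code that needs clustering could then check for `None` and report the reason, rather than failing at import of the whole package.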

We might also want to check out fast_hdbscan, mentioned in that issue: https://github.com/TutteInstitute/fast_hdbscan

@sk1p
Member

sk1p commented Jun 25, 2024

> We might also want to checkout fast_hdbscan mentioned in that issue: https://github.com/TutteInstitute/fast_hdbscan

I haven't looked in detail yet, but the description sounds good: we already depend on numba, so installation (and maintenance on their side) should be a lot less painful. The better performance is then just the cherry on top.

@matbryan52
Member Author

> We might also want to checkout fast_hdbscan mentioned in that issue: https://github.com/TutteInstitute/fast_hdbscan
>
> I didn't look in detail yet, but the description sounds good - we already depend on numba, so installation (and maintenance on their side) should be a lot less painful. The better performance is then just the cherry on top.

In the end, not such a simple change yet.

Looks like fast_hdbscan is not quite numpy 2 compatible yet (it uses np.bool8, at least, which numpy 2 removed). Dropping to an older numpy does mean that tests pass with fast_hdbscan, which is a good sign, but the import time is huge (10+ seconds on ptycho), I think because they run a fit to warm up numba at import: https://github.com/TutteInstitute/fast_hdbscan/blob/main/fast_hdbscan/__init__.py
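To put a number on the import cost, something like the following could be run in a fresh interpreter. It is a rough sketch using only the standard library; `time_import` is a hypothetical helper, and the measured time will include everything fast_hdbscan pulls in (numba and so on), not only the warmup fit.

```python
import importlib
import sys
import time


def time_import(name):
    """Return the seconds taken to import `name` for the first time in this process."""
    if name in sys.modules:
        # A cached module would import instantly and give a misleading number.
        raise RuntimeError(f"{name} is already imported; run in a fresh interpreter")
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"fast_hdbscan import took {time_import('fast_hdbscan'):.1f} s")
```

For a per-module breakdown, `python -X importtime -c "import fast_hdbscan"` should show where the time goes.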

Might want to wait and see, for now.
