[BUG] NN Descent has low recall for large datasets #204

Open
jinsolp opened this issue Jun 27, 2024 · 0 comments
Labels
bug Something isn't working


jinsolp commented Jun 27, 2024

Description

NN Descent shows low recall (measured against brute-force kNN) on large datasets. This makes it difficult to scale up and out, and to use NN Descent to build kNN graphs for other algorithms such as UMAP.
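For readers unfamiliar with the metric, here is a minimal sketch of how recall against a brute-force kNN graph can be computed. It is illustrative only: the function names `brute_force_knn` and `recall` are hypothetical, the distance is assumed to be squared L2, each point is assumed to be excluded from its own neighbor list, and the raft test may define recall differently (for example, with a distance tolerance).

```python
def brute_force_knn(data, k):
    """Exact k nearest neighbors (squared L2) for every row.

    Assumption: a point is not counted as its own neighbor.
    """
    n = len(data)
    graph = []
    for i in range(n):
        dists = []
        for j in range(n):
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(data[i], data[j]))
            dists.append((d, j))
        dists.sort()  # ascending by distance, ties broken by index
        graph.append([j for _, j in dists[:k]])
    return graph


def recall(approx_graph, exact_graph):
    """Fraction of exact neighbors that also appear in the approximate graph."""
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx_graph, exact_graph):
        hits += len(set(approx_row) & set(exact_row))
        total += len(exact_row)
    return hits / total
```

A recall of 1.0 means the approximate graph found every true neighbor. The brute-force pass is quadratic in the number of points, so this sketch is only suitable for checking small samples.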

Result for a dataset with 1000 features
Recall stops improving after a certain point, even with a large number of iterations.
[Screenshot 2024-06-27 at 1 18 27 PM]

Result for a dataset with 100 features
Not as bad as the 1000-feature dataset above, but recall is still low with fewer features.
[Screenshot 2024-06-27 at 1 23 55 PM]

Reproducing the bug

The experiments above were run specifically on this commit on raft's branch-24.08.

  1. Change the test input in raft/cpp/test/neighbors/ann_nn_descent.cuh to cover larger datasets, as shown below:
    [Screenshot 2024-06-26 at 6 34 07 PM]
    index_params.max_iterations is set to 100 in the same file and can be changed to test a larger number of iterations.
    The test file randomly generates its data in the setup() function.

  2. Build the test

  3. Run the test
    ./cpp/build/gtests/NEIGHBORS_ANN_NN_DESCENT_TEST --gtest_filter=AnnNNDescentTest/AnnNNDescentTestF_U32*

@jinsolp jinsolp added the bug Something isn't working label Jun 27, 2024