NN Descent shows low recall (compared to using brute force knn) for large datasets. This makes it difficult to scale up and out and use NN Descent for building knn graphs for other algorithms such as UMAP.
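For reference, recall here means the fraction of true nearest neighbors (from brute force knn) that also appear in the approximate knn graph. The sketch below is a minimal pure-Python illustration of that metric, not the evaluation code used in the raft test. The helper names (brute_force_knn, recall), the squared L2 distance, the exclusion of each point from its own neighbor list, and the artificially degraded graph are all assumptions made for illustration.

```python
import random


def brute_force_knn(data, k):
    """Exact k nearest neighbors (squared L2) for every row, excluding the row itself."""
    neighbors = []
    for i, p in enumerate(data):
        dists = []
        for j, q in enumerate(data):
            if i == j:
                continue  # assumption: a point is not counted as its own neighbor
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            dists.append((d, j))
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def recall(approx, exact):
    """Fraction of exact neighbor ids that also appear in the approximate lists."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    total = sum(len(e) for e in exact)
    return hits / total


# Small random dataset (hypothetical sizes, far smaller than the ones in this issue).
rng = random.Random(0)
data = [[rng.random() for _ in range(8)] for _ in range(50)]
exact = brute_force_knn(data, k=5)

# Stand-in for an approximate graph: replace the last neighbor of every row with
# the row's own index, which can never be a correct neighbor here.
degraded = [row[:-1] + [i] for i, row in enumerate(exact)]

print(recall(exact, exact))     # a perfect graph scores 1.0
print(recall(degraded, exact))  # 4 of 5 neighbors correct per row
```

With a real run, the approximate lists would come from NN Descent and the exact lists from brute force knn on the same data and the same k.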
Result for dataset with 1000 features
Recall stops improving after a certain point, even with many more iterations.
Result for dataset with 100 features
The result is not as bad as for the dataset with 1000 features, but recall is still low with this smaller number of features.
Reproducing the bug
The experiments above were run on a specific commit on raft's branch-24.08.
Change the test inputs in the raft/cpp/test/neighbors/ann_nn_descent.cuh file to test larger datasets.
index_params.max_iterations is set to 100 in the same file and can be changed to test a larger number of iterations.
The test file randomly generates data in the setup() function.
Build the test.
Run the test:
./cpp/build/gtests/NEIGHBORS_ANN_NN_DESCENT_TEST --gtest_filter=AnnNNDescentTest/AnnNNDescentTestF_U32*