Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Memory Evaluation For different algorithm #1139

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

luyuncheng
Copy link
Collaborator

@luyuncheng luyuncheng commented Sep 15, 2023

Description

When we want to introduce a new algorithm or engine, we prefer to evaluate the performance, memory, disk size. in benchmark tests, we can evaluate the performance as a single node.

But we can not evaluate a engine/algorithm takes how much memory, because in benchmark jvm make it hard to evaluate the memory only in jni layer.

in #946 we try to assess memory usage with different algorithm, and when using benchmark/intergration tests, java heap memory and other memory usage make it hard to evaluate algorithm real memory usage.

so i write a memory tests, and only use faiss_wrapper/nmslib_wrapper. it can evaluate memory usage, file size.
i use http://corpus-texmex.irisa.fr/ vector file format. and add test_util::load_data to read sift.fvecs with SIFT1M datasets.

i do some tests like following report:

SIFT1M

Algotightm Index RES FileSize Query RES
HNSW32 1.8GB 634MB 769MB
NSG64 3.1GB 586MB 752MB
NSG32 3.1GB 577MB 639MB

GIST960

Algotightm Index RES FileSize Query RES
HNSW32 11.2GB 3.9GB 3944MB
NSG64 12GB 3.7GB 3923MB
NSG32 12GB 3.7GB 3806MB

Usage:

  1. go to http://corpus-texmex.irisa.fr/, and download SIFT1M dataset, and unzip into a directory like 'dataset/sift/sift_base.fvecs'

  2. and run different tests

./bin/jni_memory_test --gtest_filter=FaissNSGQueryMemoryTest.*

Issues Resolved

#946

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@luyuncheng luyuncheng changed the title Add Memory Tests For different algorightm Add Memory Evaluation For different algorightm Sep 15, 2023
jni/CMakeLists.txt Show resolved Hide resolved
jni/tests/faiss_memory_test.cpp Outdated Show resolved Hide resolved
jni/tests/faiss_memory_test.cpp Outdated Show resolved Hide resolved
jni/tests/faiss_memory_test.cpp Outdated Show resolved Hide resolved
jni/tests/faiss_memory_test.cpp Outdated Show resolved Hide resolved
jni/tests/faiss_memory_test.cpp Outdated Show resolved Hide resolved
jni/tests/faiss_memory_test.cpp Outdated Show resolved Hide resolved
jni/tests/faiss_memory_test.cpp Outdated Show resolved Hide resolved
jni/tests/faiss_memory_test.cpp Outdated Show resolved Hide resolved
jni/tests/faiss_memory_test.cpp Outdated Show resolved Hide resolved
Signed-off-by: luyuncheng <[email protected]>
Update and rename faiss_memory_test.cpp to memory_test.cpp

Signed-off-by: luyuncheng <[email protected]>
jni/CMakeLists.txt Outdated Show resolved Hide resolved
Update CMake file

Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
@navneet1v
Copy link
Collaborator

@luyuncheng can you add details around what is Index RES and Query RES?

using ::testing::Return;
#define GTEST_COUT std::cerr << "[ ] [ INFO ]"

TEST(FaissHNSWIndexMemoryTest, BasicAssertions) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add on all the tests what we are testing and what is the expectation as comments on top of all test function.

Also, I want to understand little bit here in terms of what are the failure scenario for these tests. may be you explained it in older comments, if yes can you point me there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navneet1v

These test just evaluate the memory only with EngineWrapper, not unit test.

When we want to introduce a new algorithm or engine, we prefer to evaluate the performance, memory, disk size. in benchmark tests, we can evaluate the performance as a single node.

But we can not evaluate a engine/algorithm takes how much memory, because in benchmark jvm make it hard to evaluate the memory only in jni layer.

so i added these code and want to evaluate different algorithm/engine in different param, at index time, query time memory usage, and time usage.

@luyuncheng
Copy link
Collaborator Author

@luyuncheng can you add details around what is Index RES and Query RES?

@navneet1v
Index Res: Build Graph Index takes a long time, i use a monitor to check the avg resident memory during the Build Index.
Query Res: Query 1000 vector sequential, used a monitor to check the avg resident memory during the Query Index.

@luyuncheng luyuncheng changed the title Add Memory Evaluation For different algorightm Add Memory Evaluation For different algorithm Sep 21, 2023
@jmazanec15
Copy link
Member

jmazanec15 commented Sep 26, 2023

In general, I really like having the ability to use the jni_wrapper in order to test our code with real data sets (not just random data). This has a lot of potential to help us debug memory problems as well as performance problems.

That being said, I think that the memory monitoring should be done outside of the tests. Adding memory monitoring inside the test may make them difficult to work across platforms. I see the tests themselves more like JNI integration tests or end to end tests or microbenchmarks. We should remove all calls to get specific memory information from faiss and change the name from memory_test to integ_test or e2e_test or microbenchmarks. Instead, to check memory, I think that they can be run with an external monitor. For instance, I believe you used gperftools at some point.

@@ -150,6 +150,16 @@ namespace test_util {

float RandomFloat(float min, float max);

// Read vector file formats
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment about the format the data is expected to be in

// Read vector file formats
void load_data(char* filename, float*& data, unsigned& num, unsigned& dim);

// asign data into vector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: asign -> assign. Also, can we add more detail about how this function should be used in the comment?

@navneet1v
Copy link
Collaborator

@luyuncheng are you still working on this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants