-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add binary format support with HNSW method in Faiss Engine #1781
Add binary format support with HNSW method in Faiss Engine #1781
Conversation
bd60a2c
to
3898397
Compare
66419ea
to
d12ac25
Compare
3898397
to
4befe2b
Compare
95e0fbe
to
6840c3e
Compare
dafd79b
to
a913082
Compare
f310113
to
b6c9c6e
Compare
protected float computeScore() throws IOException { | ||
final BytesRef value = binaryDocValues.binaryValue(); | ||
final ByteArrayInputStream byteStream = new ByteArrayInputStream(value.bytes, value.offset, value.length); | ||
final byte[] vector = byteStream.readAllBytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be behind the VectorSerializer interfaces otherwise there is no common way to read the binary vectors/float vectors.
import java.util.List; | ||
|
||
/** | ||
* Vector transfer for float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better java doc needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is child class of VectorTransfer
. Wouldn't the java doc in VectorTransfer
be enough?
src/main/java/org/opensearch/knn/index/codec/transfer/VectorTransfer.java
Show resolved
Hide resolved
abstract public void init(final long totalLiveDocs); | ||
|
||
/** | ||
* Transfer a single vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the documentation is incorrect, we are not transferring 1 vector here. We are transferring more than 1 vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to say this method is called for a single vector.
Yes. We buffer vectors and transfer more than 1 vector later internally but it is hidden from caller.
/** | ||
* Close the transfer | ||
*/ | ||
abstract public void close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how we are ensuring that close method is always called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it is caller's responsibility to call the close. Any idea to enforce it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me make it as closable class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got your point. I think it should be refactored so that there is no dependency between methods.
@navneet1v I captured the all comments related with code refactoring in #1810. Please update in the issue if I missed any. I will follow up after completing this feature implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for a big PR like this, it would be helpful to give a high level summary of what you had to change in the implementation and why. For instance, explaining the different components changed, like why we are changing the free api to take a boolean. This will make review a lot easier and there will be less back and forth.
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldMapperUtil.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/memory/NativeMemoryAllocation.java
Show resolved
Hide resolved
612a612
to
29815af
Compare
I think there are multiple refactoring suggested in the code and my worry is there are new features that we are working on including Disk based vector search and reducing memory foot print during index creation they all touch the same code path. Doing refactoring later will impact those features. |
We can discuss offline. My plan is working on refactoring after 2.16 release. Therefore, the delay won't be much between now and later. |
Signed-off-by: Heemin Kim <[email protected]>
Ran |
5d008cd
into
opensearch-project:feature/binary-format
Signed-off-by: Heemin Kim <[email protected]>
…n Faiss Engine (#1829) * Add faiss custom patch to support search parameter in binary index (#1815) Signed-off-by: Heemin Kim <[email protected]> * Add binary format support with HNSW method in Faiss Engine (#1781) Signed-off-by: Heemin Kim <[email protected]> --------- Signed-off-by: Heemin Kim <[email protected]>
…n Faiss Engine (opensearch-project#1829) * Add faiss custom patch to support search parameter in binary index (opensearch-project#1815) Signed-off-by: Heemin Kim <[email protected]> * Add binary format support with HNSW method in Faiss Engine (opensearch-project#1781) Signed-off-by: Heemin Kim <[email protected]> --------- Signed-off-by: Heemin Kim <[email protected]> (cherry picked from commit fe1d86f)
…n Faiss Engine (#1829) (#1834) * Add faiss custom patch to support search parameter in binary index (#1815) Signed-off-by: Heemin Kim <[email protected]> * Add binary format support with HNSW method in Faiss Engine (#1781) Signed-off-by: Heemin Kim <[email protected]> --------- Signed-off-by: Heemin Kim <[email protected]> (cherry picked from commit fe1d86f)
…n Faiss Engine (opensearch-project#1829) * Add faiss custom patch to support search parameter in binary index (opensearch-project#1815) Signed-off-by: Heemin Kim <[email protected]> * Add binary format support with HNSW method in Faiss Engine (opensearch-project#1781) Signed-off-by: Heemin Kim <[email protected]> --------- Signed-off-by: Heemin Kim <[email protected]>
Description
What this PR has
Add binary format support with HNSW method in Faiss engine
What this PR does not have
Changes
UNDEFINED
space type. This is needed because the space type is parsed before we know about data type and we are going to have a different default space type perdata_type
;hamming
for binary andl2
for all other. Therefore, I had to assignUNDEFINED
space type when user does not provide any space type first. Then later inside mapper, I set proper default space type based on data type if the space type isUNDEFINED
isBinary
parameter inJNIService.free
method. Becausefaiss::Index
andfaiss::BinaryIndex
does not share common parent class, we cannot determine if it isfaiss::Index
orfaiss::BinaryIndex
using cast. Therefore, we need to know if it is binary index or not. The reason why we not having a separatefreeBinaryIndex
method is because it is single delete method in cpp side whereas,createIndex
has separate method for binary index likecreateBinaryIndex
.IndexEntryContext
. This data will be used when loading an index into memory to determine if it is binary index or not. For creation and query, we use vector data type value infieldInfo
to determine if it is binary index or not.VectorTransfer
,KNNIterator
, andbyteQueryVector
variable insideQuery
was needed as binary vector will havebyte[]
type when non binary vector will havefloat[]
type. Refactoring will be needed to enhance maintainability which is captured in [FEATURE]Refactoring of code for better maintainability #1810Testing
Response
Issues Resolved
#1767
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.