Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the keys smaller #23

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Make the keys smaller #23

wants to merge 1 commit into from

Conversation

irevoire
Copy link
Member

Make the keys as small as possible by:

  • Getting rid of the padding and the INTEGER_KEY thingy
  • Making the index only 7-bit and shoving the mode as the last bit of the index
  • Having only two modes instead of three, now the metadata is the last node of the tree nodes

Performances results:

baseline / main branch:

Starts to insert document in the database ...
Took 181.275834ms to insert the vectors
Took 217.843506416s to build
Took 218.02478225s in total to insert and build

Louis's query took 3.873666ms

smol_keys:

Starts to insert document in the database ...
Took 116.265167ms to insert the vectors
Took 201.482862333s to build
Took 201.5991275s in total to insert and build

Louis's query took 6.476875ms

And even though the query time is pretty small, the 3.8ms vs 6.4ms seems pretty consistent

Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the README to specify that users can use a u7 for their index, please?

And I am very worried about the x2 on the query time. Can you measure that just to understand the reason why? We can't double the query time to gain 10% of indexing time...

dimensions: usize,
) -> Result<Writer<D>> {
if index > 0b0111_1111 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a function for that, instead?

let node = lazy_node.decode().unwrap();
writeln!(f, "Tree {}: {node:?}", key.item)?;
}
_ => unreachable!("Impossiburuu"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ => unreachable!("Impossiburuu"),
_ => unreachable!(),

database: heed::Database<KeyCodec, U>,
) -> Result<Reader<'t, D>> {
if index > 0b0111_1111 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a non_meta function here, please?

@@ -28,9 +28,15 @@ pub struct Reader<'t, D: Distance> {
impl<'t, D: Distance> Reader<'t, D> {
pub fn open<U>(
rtxn: &'t RoTxn,
index: u16,
index: u8,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an infromation on what's the limit of this u8, please? Just a single doc line, we will refine that later.

Comment on lines +89 to +92
let item_bytes = self.item.to_be_bytes();
debug_assert_eq!(item_bytes.len(), output.len() - 1);

output[1..].copy_from_slice(&item_bytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let item_bytes = self.item.to_be_bytes();
debug_assert_eq!(item_bytes.len(), output.len() - 1);
output[1..].copy_from_slice(&item_bytes);
let item_bytes = self.item.to_be_bytes();
output[1..].copy_from_slice(&item_bytes);

This function will panic if the two slices have different lengths.
https://doc.rust-lang.org/stable/std/primitive.slice.html#method.copy_from_slice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants