
Extracting EVO representations rather than logits #32

Open
amoskalev opened this issue Mar 8, 2024 · 5 comments

@amoskalev

Hi, thanks for your amazing work!

How can I extract representations rather than logits from the model?

I am using the Hugging Face version, and I see the model returns 'logits' and 'past_key_values'. Could you please explain what's in past_key_values and whether any of it can be used as a sequence representation? Or could you suggest other ways to access the model's representations?

@davidkell

davidkell commented Mar 16, 2024

Here's how I'm currently solving this (adapted from the usage example in the README):

from evo import Evo
import torch

device = 'cuda:0'

evo_model = Evo('evo-1-131k-base')
model, tokenizer = evo_model.model, evo_model.tokenizer
model.to(device)
model.eval()

# Monkey-patch the unembed function with the identity.
# This removes the final projection from the embedding space back into tokens,
# so the "logits" returned by the model are now the final-layer embeddings.
# See the source for unembed: https://huggingface.co/togethercomputer/evo-1-131k-base/blob/main/model.py#L339

from torch import nn

class CustomEmbedding(nn.Module):
  def unembed(self, u):
    return u

model.unembed = CustomEmbedding()

# end custom code

sequence = 'ACGT'
input_ids = torch.tensor(
    tokenizer.tokenize(sequence),
    dtype=torch.int,
).to(device).unsqueeze(0)

embed, _ = model(input_ids) # (batch, length, embed dim)

print('Embed: ', embed)
print('Shape (batch, length, embed dim): ', embed.shape)

# You can now use the embedding for downstream classification tasks.
# You probably want to aggregate over the position dimension,
# e.g. mean pooling: embed.mean(dim=1), or the final-token embedding: embed[:, -1, :]

Note that this is for the model object returned by the evo package, which is an instance of StripedHyena. If you are using Hugging Face directly, it is wrapped in StripedHyenaModelForCausalLM, so you need to do model.backbone.unembed = CustomEmbedding() instead.
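For the Hugging Face path, a minimal sketch might look like the following (assuming the usual trust_remote_code loading pattern from the model card; CustomEmbedding is the class defined above):

from transformers import AutoConfig, AutoModelForCausalLM

model_name = 'togethercomputer/evo-1-131k-base'

config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
hf_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
)
hf_model.eval()

# The StripedHyena module lives under .backbone, so patch unembed there
hf_model.backbone.unembed = CustomEmbedding()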

@seyonechithrananda

Thanks @davidkell!

@zhongwang

@davidkell I tried your code on an A100 40GB using the evo-8k model. Embedding the 4-letter sequence in the example costs over 400MB of GPU RAM, and the model itself needs 13GB. The embedding dimension is 4096, so I don't understand why it costs so much memory; a 4x4096 BF16 tensor should only take 32KB, right? I tried to embed a 2kb sequence but always ran out of CUDA memory. Has anyone had a similar problem?

@davidkell

davidkell commented Apr 16, 2024

I had a similar experience. I was able to get inference working for 2k sequences on an A100 80GB (e.g. available on Paperspace), although around 2.5-3k I would get OOM. I haven't looked in depth at what is driving the memory requirement.
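One thing that may be worth checking (I haven't verified this on Evo specifically) is whether the forward pass is running under autograd: wrapping it in torch.inference_mode() stops PyTorch from keeping activations around for a backward pass, which can reduce memory noticeably. A small sketch, reusing the model and tokenizer set up above (embed_sequence is just an illustrative helper name):

import torch

@torch.inference_mode()
def embed_sequence(model, tokenizer, sequence, device='cuda:0'):
    # Tokenize and run a single forward pass without autograd bookkeeping
    input_ids = torch.tensor(
        tokenizer.tokenize(sequence),
        dtype=torch.int,
    ).to(device).unsqueeze(0)
    embed, _ = model(input_ids)  # (batch, length, embed dim)
    return embed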

@davidkell

davidkell commented Apr 16, 2024

Quoting from issue #24:

Prompting with longer sequences requires sharding for the model, which is currently not supported

So I think if you want to generate embeddings for longer sequences, you will need to manually shard the model across GPUs, set up CPU offloading, or something like that.
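If you want to try offloading via the Hugging Face checkpoint, something along these lines might work, assuming accelerate is installed and that the custom StripedHyena code tolerates device_map (I haven't confirmed that it does, and offloaded layers will be slow):

import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_name = 'togethercomputer/evo-1-131k-base'

config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='auto',  # spread layers across available GPUs, spilling the rest to CPU
)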
