
MixFormer attention question (please...) #89

Open
NJiHyeon opened this issue Aug 13, 2023 · 1 comment

Comments

@NJiHyeon

Hello.
Thank you for sharing such a good model.
A question came up while I was studying the MixFormer code.
When splitting the queries, keys, and values in the Attention class, why is it done with torch.split(q, [t_h * t_w * 2, s_h * s_w], dim=2) rather than torch.split(q, [t_h * t_w, s_h * s_w], dim=2)?
I would like to know exactly why the template size is multiplied by two.
Also, would the code still run with torch.split(q, [t_h * t_w, s_h * s_w], dim=2)?
Thank you!

@yutaocui
Collaborator

Hi, thanks for your interest. We use torch.split(q, [t_h * t_w * 2, s_h * s_w], dim=2) because two templates are employed during training: a static template (i.e. the first given one) and an online template.
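A minimal sketch of why the factor of two is needed, with made-up shapes (the actual MixFormer feature sizes may differ): the tokens along dim=2 are concatenated as [static template | online template | search region], so the template part occupies t_h * t_w * 2 positions. Note that torch.split with a list of sizes requires the sizes to sum to the dimension's length, so splitting with [t_h * t_w, s_h * s_w] on this layout would raise a RuntimeError rather than silently mis-split.

```python
import torch

# Illustrative shapes only, not taken from the MixFormer config.
B, heads, dim = 2, 4, 16
t_h = t_w = 8        # template feature map size
s_h = s_w = 16       # search-region feature map size

# Token order along dim=2: [static template | online template | search region],
# so the template block is t_h * t_w * 2 tokens long.
n_tokens = t_h * t_w * 2 + s_h * s_w
q = torch.randn(B, heads, n_tokens, dim)

# The split used in the Attention class: both templates vs. the search region.
q_t, q_s = torch.split(q, [t_h * t_w * 2, s_h * s_w], dim=2)
print(q_t.shape)  # torch.Size([2, 4, 128, 16]) -> the two templates
print(q_s.shape)  # torch.Size([2, 4, 256, 16]) -> the search region
```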
