Possible small bug in code? #9

Open
fbragman opened this issue Mar 29, 2021 · 0 comments
fbragman commented Mar 29, 2021

Hi,

Thank you for making the code public. It's really nice to see!

I think the following line here

```python
self.norm = LayerNorm(hidden_features)
```

should be

```python
if concat:
    self.norm = LayerNorm(hidden_features * num_heads)
```

Also, apologies if I have missed this, but if the outputs of the multi-head attention are concatenated, shouldn't that be reflected in the size of the shared weights W at successive layers? Currently the input size is constant at d_in x d, but at intermediate layers it should be (d * K) x d, where K is the number of heads.

```python
self.W = clones(nn.Linear(in_features, hidden_features), num_of_heads)
```
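For concreteness, here is a minimal sketch of the sizing I have in mind. The `MultiHeadLayer` class is hypothetical and uses plain `nn.ModuleList` / `nn.LayerNorm` in place of the repo's `clones` / `LayerNorm` helpers; it only illustrates that with concatenation the norm and the next layer's `in_features` track `hidden_features * num_heads`:

```python
import torch
import torch.nn as nn


class MultiHeadLayer(nn.Module):
    """Hypothetical multi-head layer with optional concatenation of head outputs."""

    def __init__(self, in_features, hidden_features, num_heads, concat=True):
        super().__init__()
        self.concat = concat
        # One shared linear projection per head (plays the role of self.W).
        self.W = nn.ModuleList(
            [nn.Linear(in_features, hidden_features) for _ in range(num_heads)]
        )
        # Concatenated heads give a width of hidden_features * num_heads,
        # so the norm must be sized accordingly; averaged heads keep hidden_features.
        out_dim = hidden_features * num_heads if concat else hidden_features
        self.norm = nn.LayerNorm(out_dim)

    def forward(self, x):
        head_outputs = [w(x) for w in self.W]  # each: (N, hidden_features)
        if self.concat:
            out = torch.cat(head_outputs, dim=-1)           # (N, hidden_features * num_heads)
        else:
            out = torch.stack(head_outputs, dim=0).mean(0)  # (N, hidden_features)
        return self.norm(out)


# Stacking two layers: with concat=True, the second layer's in_features must be
# hidden_features * num_heads (d * K), not the original in_features (d_in).
d_in, d, K = 16, 8, 4
layer1 = MultiHeadLayer(d_in, d, K, concat=True)
layer2 = MultiHeadLayer(d * K, d, K, concat=True)
x = torch.randn(10, d_in)
print(layer2(layer1(x)).shape)  # torch.Size([10, 32])
```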
