[Alpha-VLLM Team] Add Lumina-T2X to diffusers #8652
base: main
Conversation
Co-authored-by: YiYi Xu <[email protected]>
Thank you! I did another round of review. I think we are close to merging once these are addressed.
- Removed extra input parameter
- Fixed typo on Feed_Forward

Co-authored-by: YiYi Xu <[email protected]>
Yes, this is called Grouped Query Attention, proposed in this paper, which improves training and inference efficiency.
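For readers unfamiliar with the technique being discussed: grouped-query attention shares each key/value head across a group of query heads, shrinking the KV cache while keeping the full count of query heads. This is a minimal NumPy sketch, not the PR's implementation; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, num_kv_heads):
    """Illustrative GQA. q: (num_heads, seq, dim); k, v: (num_kv_heads, seq, dim)."""
    num_heads, seq, dim = q.shape
    group = num_heads // num_kv_heads
    # Each group of `group` query heads shares one key/value head,
    # so we expand k and v along the head axis before standard attention.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dim)
    return softmax(scores, axis=-1) @ v
```

With `num_kv_heads == num_heads` this degenerates to ordinary multi-head attention; with `num_kv_heads == 1` it is multi-query attention.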
thanks! I think we can merge this soon
I have one question here that needs your input: #8652 (comment). Can you look into refactoring with get_1d_rotary_pos_embed, or create a similar method for Lumina?
The rest of the changes are just to make sure the code structure and naming conventions are consistent with other models - we can certainly help with these to finish it up!
return Transformer2DModelOutput(sample=output)

def precompute_freqs_cis(
I think we can use this instead?

diffusers/src/diffusers/models/embeddings.py (line 277 at 0bae6e4):
def get_1d_rotary_pos_embed(dim: int, pos: Union[np.ndarray, int], theta: float = 10000.0, use_real=False):
Yes! I have refactored this by adding a new function called get_2d_rotary_pos_embed_lumina that uses get_1d_rotary_pos_embed inside. I also added a few lines to get_1d_rotary_pos_embed to enable the context extrapolation proposed in our paper. (Note that this is a universal method for any model that uses RoPE, such as Lumina and Hunyuan. The added arguments are disabled by default, so they will not affect existing pipelines.)
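As background on what a 1D rotary embedding computes: each pair of channels gets a frequency, positions become angles, and hidden states are rotated pairwise. This is a hedged sketch of the general technique, not the diffusers implementation or the PR's extrapolation changes; function names here are hypothetical.

```python
import numpy as np

def rotary_cos_sin(dim, positions, theta=10000.0):
    """Return (cos, sin) of shape (len(positions), dim // 2), as in standard RoPE."""
    # one frequency per channel pair, geometrically spaced by `theta`
    freqs = 1.0 / (theta ** (np.arange(0, dim, 2)[: dim // 2] / dim))
    angles = np.outer(positions, freqs)
    return np.cos(angles), np.sin(angles)

def apply_rotary(x, cos, sin):
    """Rotate each consecutive channel pair of x (seq, dim) by its position angle."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is a pure rotation, the transform preserves vector norms; context extrapolation schemes typically rescale the positions or frequencies before this step.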
scale_msa, gate_msa, scale_mlp, gate_mlp = self.adaLN_modulation(adaln_input).chunk(4, dim=1)

# Self-attention
hidden_states = modulate(self.attn_norm1(hidden_states), scale_msa)
Can we make a LuminaLayerNormZero, similar to AdaLayerNormZero, and keep it in this file since it's specific to Lumina?
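To illustrate the AdaLayerNormZero pattern being requested: a conditioning embedding is projected into per-sample scale and gate vectors that modulate the normalized hidden states. This is a hypothetical NumPy sketch of the pattern only; the class name, shapes, and initialization are assumptions, not Lumina's actual module.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class AdaLNZeroSketch:
    """Toy adaLN-Zero: project a conditioning embedding into four
    modulation vectors (scale/gate for attention and MLP branches)."""
    def __init__(self, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        # small-init linear map: cond (batch, dim) -> 4 vectors of size dim
        self.w = rng.standard_normal((dim, 4 * dim)) * 0.02

    def __call__(self, x, cond):
        scale_msa, gate_msa, scale_mlp, gate_mlp = np.split(cond @ self.w, 4, axis=-1)
        # scale the normed hidden states for the attention branch
        h = layer_norm(x) * (1 + scale_msa[:, None, :])
        return h, gate_msa, scale_mlp, gate_mlp
```

Wrapping the norm and modulation in one module, as the reviewer suggests, keeps the transformer block's forward pass free of raw `chunk`/`modulate` plumbing.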
What does this PR do?
Add Lumina-T2X to diffusers
Fixes #8652
Before submitting
- See the documentation guidelines; here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.