Everyone, I have implemented InfiniAttention and Meta's multi-token prediction. #518

win10ogod · 2024-05-08T02:52:52Z

Is anyone willing to help reduce memory consumption?
I will post a PR.
(https://github.com/win10ogod/llama2-InfiniAttention.c/blob/master/model.py)
I commented out model_export(raw_model, os.path.join(out_dir, "model.bin"), version=0)
because I don't understand C, and export.py also has errors.

bityigoss · 2024-05-13T06:37:23Z

If we still tie the weights of lm_head in multi-token prediction, how do the heads output different token predictions? @win10ogod
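For context, here is a minimal sketch of one way a tied output projection can still produce different predictions per future position: each head applies its own transform to the shared trunk's hidden state before the shared projection. This is an assumption about a possible design, not a description of what the linked model.py does. The names `shared_unembed`, `head_transforms`, and `predict_logits` are hypothetical, a single matrix plus `tanh` stands in for a per-head transformer layer, and plain Python lists are used only to keep the example dependency-free.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Build a small random matrix as a list of rows."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, vocab_size, n_future = 4, 6, 3

# Hypothetical tied projection: one matrix shared by every prediction head.
shared_unembed = random_matrix(vocab_size, dim, rng)

# Hypothetical per-head parameters: one independent transform per future token.
# In a real model this would more likely be a full transformer block per head.
head_transforms = [random_matrix(dim, dim, rng) for _ in range(n_future)]


def predict_logits(trunk_hidden):
    """Return one logit vector per future position for a single hidden state."""
    logits_per_head = []
    for transform in head_transforms:
        # The head-specific step is what makes the predictions differ.
        head_hidden = [math.tanh(v) for v in matvec(transform, trunk_hidden)]
        # The projection to the vocabulary is the same for every head.
        logits_per_head.append(matvec(shared_unembed, head_hidden))
    return logits_per_head


trunk_hidden = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
logits = predict_logits(trunk_hidden)
for k, head_logits in enumerate(logits, start=1):
    best = max(range(vocab_size), key=lambda i: head_logits[i])
    print(f"head for token t+{k}: argmax token id = {best}")
```

If the heads had no parameters of their own and all fed the same hidden state into the tied projection, every head would return identical logits, which is the concern the question raises.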
