
Unable to run model.generate() for MoD model #4063

Open
1 task done
Zkli-hub opened this issue Jun 4, 2024 · 3 comments
Labels
pending (This problem is yet to be addressed)

Comments

@Zkli-hub

Zkli-hub commented Jun 4, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

I find that I cannot run the generate() function to run inference with the converted model. Can you help me?

Here is the error:

Reproduction

from transformers import AutoTokenizer, LlamaForCausalLM  # LlamaForCausalLM is imported but never used

# NOTE: LlamaMoDForCausalLM is never imported in this snippet; it is not a
# transformers class and presumably has to come from the MoD package.
model = LlamaMoDForCausalLM.from_pretrained("LLaMA-Factory/saves/llama2-7b-mod/full/sft_full_0")
tokenizer = AutoTokenizer.from_pretrained("LLaMA-Factory/saves/llama2-7b-mod/full/sft_full_0")

prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")

# generate() is the call that fails for the MoD model
generate_ids = model.generate(inputs.input_ids)
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
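For reference, LLaMA-Factory loads MoD checkpoints through the optional MoD package (its "mod" extra, i.e. mixture-of-depth) rather than through plain transformers classes, so a loading path along these lines may be closer to the supported one. The AutoMoDModelForCausalLM name and the import are assumptions based on that dependency, not verified API:

```python
# Hedged sketch: assumes the MoD package (pip install mixture-of-depth) exposes
# an AutoMoDModelForCausalLM entry point; both names are assumptions here.
from transformers import AutoTokenizer
from MoD import AutoMoDModelForCausalLM

path = "LLaMA-Factory/saves/llama2-7b-mod/full/sft_full_0"
model = AutoMoDModelForCausalLM.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

inputs = tokenizer("Hey, are you conscious? Can you talk to me?", return_tensors="pt")
generate_ids = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0])
```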

Expected behavior

Run inference with the trained MoD model (SFT based on llama2_mod).
[screenshot of the error attached in the original issue]

Others

No response

@hiyouga added the pending (This problem is yet to be addressed) label on Jun 5, 2024
@hiyouga
Owner

hiyouga commented Jun 5, 2024

cc: @mlinmg

@mlinmg
Contributor

mlinmg commented Jun 10, 2024

I'll try to reproduce it in the coming days; please send your transformers and MoD package versions.
Also, what model did you start with? You said llama2_mod, but I can't find it on HF.
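A quick way to collect those versions (the mixture-of-depth distribution name is an assumption; substitute whatever name the MoD package was installed under):

```python
import importlib.metadata
import transformers

print("transformers:", transformers.__version__)
# "mixture-of-depth" is an assumed distribution name; adjust if installed differently.
print("MoD:", importlib.metadata.version("mixture-of-depth"))
```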

@PhoebusSi

On an NPU (Ascend 910), DeepSpeed and the MoD method do not seem to be compatible. DeepSpeed with a non-MoD model runs, and a small MoD model without DeepSpeed also runs (so larger MoD models OOM because DeepSpeed cannot be used), but DeepSpeed plus a large MoD model does not run: it hangs at the first iteration and after a while times out with a broken pipe.

@mlinmg @hiyouga could you advise? Alternatively, is there another parallelization approach that is known to work with MoD?
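One generic thing to try while debugging the hang is a minimal ZeRO-2 DeepSpeed config, since stage-3 parameter partitioning is more likely to interact badly with MoD's dynamic token routing. This is only a sketch of a config one might pass to the HF Trainer, not a confirmed fix for the NPU issue:

```python
# Generic sketch, not a confirmed fix: a minimal ZeRO-2 config (stage 2 keeps
# full parameters on every rank, avoiding the stage-3 gather/partition hooks
# that dynamic-routing models are more likely to trip over).
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": "auto"},
}
```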
