
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. #270

Open
WeixuanXiong opened this issue Jun 20, 2024 · 1 comment


WeixuanXiong commented Jun 20, 2024

Running
torchrun --nproc_per_node=4 train.py --train_args_file train_args/sft/qlora/qwen2-7b-sft-qlora.json
to train Qwen2 + QLoRA + Unsloth (use_unsloth=true) fails with the following error:
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device()}you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}
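The error message asks for the quantized model to be loaded with a device_map that pins it to the GPU owned by the current process. A minimal sketch of that placement, assuming a torchrun launch (torchrun sets the LOCAL_RANK environment variable for each worker process). The helper name device_map_for_this_rank is hypothetical and is not part of Firefly, transformers, or accelerate:

```python
import os


def device_map_for_this_rank() -> dict:
    """Return a device_map that places the whole model on this process's GPU.

    Hypothetical helper. torchrun sets LOCAL_RANK for each worker; defaulting
    to 0 keeps the helper usable in a plain single-process run.
    """
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    # In a transformers device_map, the empty-string key means "the whole model",
    # and an integer value is a CUDA device index.
    return {"": local_rank}
```

The returned dict would be passed as device_map= to transformers' from_pretrained together with the quantization config. Per the maintainer's reply in this thread, this placement alone should not be expected to make Unsloth train on multiple GPUs.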

The parameters in qwen2-7b-sft-qlora.json are set as follows:
[image: screenshot of the qwen2-7b-sft-qlora.json settings, not preserved]

Full error output:
2024-06-20 01:48:35.195 | INFO | main:init_components:388 - Train model with sft task
2024-06-20 01:48:35.196 | INFO | main:load_sft_dataset:351 - Loading data with UnifiedSFTDataset
2024-06-20 01:48:35.196 | INFO | component.dataset:init:19 - Loading data: ./data/dummy_data.jsonl
2024-06-20 01:48:35.197 | INFO | component.dataset:init:22 - Use template "qwen" for training
2024-06-20 01:48:35.197 | INFO | component.dataset:init:23 - There are 33 data in dataset
2024-06-20 01:48:35.207 | INFO | main:main:426 - *** starting training ***
Traceback (most recent call last):
  File "/dfs/data/code/Firefly/train.py", line 439, in
    main()
  File "/dfs/data/code/Firefly/train.py", line 427, in main
    train_result = trainer.train()
  File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "", line 159, in _fast_inner_training_loop
  File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1202, in prepare
    result = tuple(
  File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1203, in
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1030, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1281, in prepare_model
    raise ValueError(
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device()}you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}

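Without Unsloth, single-node multi-GPU training works fine, and with Unsloth, single-node single-GPU training also works. The error only appears when Unsloth is combined with multiple GPUs. What is causing this?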

@yangjianxin1 (Owner)

Unsloth currently supports only single-GPU training.
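Given the maintainer's answer, a training script could reject the unsupported combination up front instead of letting it reach accelerate's device-placement check. A minimal sketch, assuming the use_unsloth value from the JSON train-args file has already been parsed into a bool and that the job is launched with torchrun (which sets WORLD_SIZE to the total number of processes). resolve_use_unsloth is a hypothetical helper, not a Firefly function:

```python
import os


def resolve_use_unsloth(use_unsloth: bool) -> bool:
    """Fail early if Unsloth is requested in a multi-process (multi-GPU) launch.

    Hypothetical check. torchrun sets WORLD_SIZE to the total number of
    processes; when it is unset (plain single-process run) it is treated as 1.
    """
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    if use_unsloth and world_size > 1:
        raise ValueError(
            "use_unsloth=true is only supported for single-GPU training, "
            f"but WORLD_SIZE={world_size}. Launch a single process or set use_unsloth=false."
        )
    return use_unsloth
```

With this check, the four-process torchrun command from the report would stop with a readable message. The remaining options are a single process (for example --nproc_per_node=1) with use_unsloth=true, or four processes with use_unsloth=false, which match the two configurations the reporter found to work.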
