ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. #270

WeixuanXiong · 2024-06-20T01:52:36Z

When I run
torchrun --nproc_per_node=4 train.py --train_args_file train_args/sft/qlora/qwen2-7b-sft-qlora.json
to train qwen2 + qlora + unsloth (use_unsloth=true), I get this error:
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device()}you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}

The parameters in qwen2-7b-sft-qlora.json are set as follows:

The full error output is:
2024-06-20 01:48:35.195 | INFO | main:init_components:388 - Train model with sft task
2024-06-20 01:48:35.196 | INFO | main:load_sft_dataset:351 - Loading data with UnifiedSFTDataset
2024-06-20 01:48:35.196 | INFO | component.dataset:init:19 - Loading data: ./data/dummy_data.jsonl
2024-06-20 01:48:35.197 | INFO | component.dataset:init:22 - Use template "qwen" for training
2024-06-20 01:48:35.197 | INFO | component.dataset:init:23 - There are 33 data in dataset
2024-06-20 01:48:35.207 | INFO | main:main:426 - *** starting training ***
Traceback (most recent call last):
File "/dfs/data/code/Firefly/train.py", line 439, in
main()
File "/dfs/data/code/Firefly/train.py", line 427, in main
train_result = trainer.train()
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "", line 159, in _fast_inner_training_loop
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1202, in prepare
result = tuple(
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1203, in
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1030, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1281, in prepare_model
raise ValueError(
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device()}you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}

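Without unsloth, single-node multi-GPU training works normally. With unsloth, single-node single-GPU training also works. The error only appears with unsloth plus multiple GPUs. What is causing this?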
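
For reference, the `device_map={'':torch.cuda.current_device()}` hint in the error message amounts to placing the whole model on the GPU that belongs to the current process. A minimal sketch of building such a map from the `LOCAL_RANK` environment variable that torchrun sets for each worker is below. The helper name `per_rank_device_map` is hypothetical and not part of Firefly, and nothing in this thread confirms that passing such a map resolves the error when unsloth is enabled.

```python
import os


def per_rank_device_map(env=None):
    """Return a device_map that puts the entire model on this process's GPU.

    Hypothetical helper for illustration only; not part of Firefly or unsloth.
    """
    env = os.environ if env is None else env
    # torchrun exports LOCAL_RANK for every worker it launches;
    # fall back to 0 for a plain single-process run.
    local_rank = int(env.get("LOCAL_RANK", "0"))
    # The empty-string key stands for "the whole model".
    return {"": local_rank}


if __name__ == "__main__":
    # The result could be passed as device_map=... when loading the model.
    print(per_rank_device_map())
```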

yangjianxin1 · 2024-06-21T03:30:39Z

unsloth currently supports single-GPU training only.
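
Given that limitation, one option is to fail early with a clearer message when unsloth is combined with a multi-process launch. The following is only a sketch: `check_unsloth_world_size` is a hypothetical helper, not an existing Firefly function, and it assumes the launcher is torchrun, which sets `WORLD_SIZE` to the total number of worker processes.

```python
import os


def check_unsloth_world_size(use_unsloth, env=None):
    """Raise early if unsloth is requested in a multi-process launch.

    Hypothetical guard for illustration; not part of Firefly or unsloth.
    Returns the detected world size when the configuration is acceptable.
    """
    env = os.environ if env is None else env
    # torchrun sets WORLD_SIZE; treat a missing value as a single process.
    world_size = int(env.get("WORLD_SIZE", "1"))
    if use_unsloth and world_size > 1:
        raise ValueError(
            f"use_unsloth=true with WORLD_SIZE={world_size}: "
            "unsloth training here is single-GPU only; "
            "launch one process or disable unsloth."
        )
    return world_size


if __name__ == "__main__":
    # Example: call before building the model and trainer.
    print(check_unsloth_world_size(use_unsloth=True))
```

Launching with `--nproc_per_node=1` keeps `WORLD_SIZE` at 1, which matches the single-GPU unsloth run reported as working above.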
