When running
torchrun --nproc_per_node=4 train.py --train_args_file train_args/sft/qlora/qwen2-7b-sft-qlora.json
to train qwen2 + qlora + unsloth (use_unsloth=true), I get the following error:
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device()}you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}

The parameters in qwen2-7b-sft-qlora.json are set as follows:
![image](https://private-user-images.githubusercontent.com/92519311/341241690-4d9dac90-8f7f-45f9-8100-660fe74fe032.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk5ODQ3OTUsIm5iZiI6MTcxOTk4NDQ5NSwicGF0aCI6Ii85MjUxOTMxMS8zNDEyNDE2OTAtNGQ5ZGFjOTAtOGY3Zi00NWY5LTgxMDAtNjYwZmU3NGZlMDMyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAzVDA1MjgxNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZjNTA0NDBmYjk2OTYxNmU2MmE5OGViNDhjMzZhNzA0ODZhNmIwMTg0MGMxZjU1OWRmYjZjNmRmY2QxMDIwZjgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.RK_2-LFj2v_wGxm7YSrOyXq7Fy-bQsCOhbc0aa7yyOs)
The full error output is as follows:
2024-06-20 01:48:35.195 | INFO | __main__:init_components:388 - Train model with sft task
2024-06-20 01:48:35.196 | INFO | __main__:load_sft_dataset:351 - Loading data with UnifiedSFTDataset
2024-06-20 01:48:35.196 | INFO | component.dataset:__init__:19 - Loading data: ./data/dummy_data.jsonl
2024-06-20 01:48:35.197 | INFO | component.dataset:__init__:22 - Use template "qwen" for training
2024-06-20 01:48:35.197 | INFO | component.dataset:__init__:23 - There are 33 data in dataset
2024-06-20 01:48:35.207 | INFO | __main__:main:426 - *** starting training ***
Traceback (most recent call last):
File "/dfs/data/code/Firefly/train.py", line 439, in <module>
main()
File "/dfs/data/code/Firefly/train.py", line 427, in main
train_result = trainer.train()
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "<string>", line 159, in _fast_inner_training_loop
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1202, in prepare
result = tuple(
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1203, in <genexpr>
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1030, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/dfs/data/hujh9/miniconda/envs/firefly/lib/python3.9/site-packages/accelerate/accelerator.py", line 1281, in prepare_model
raise ValueError(
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device()}you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}

Without unsloth, single-node multi-GPU training works fine. With unsloth, single-node single-GPU training also works fine. The error only appears with unsloth + multiple GPUs. What is causing this?
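
For reference, the error message's suggestion of `device_map={'': <current device>}` could be sketched as below for a `torchrun` launch. This is only an illustration, not Firefly's or unsloth's actual loading code: `per_process_device_map` is a hypothetical helper name, and it assumes each worker should use the GPU whose index equals the `LOCAL_RANK` environment variable that `torchrun` sets. Whether passing such a map resolves the unsloth + multi-GPU case is not confirmed here.

```python
import os
from typing import Dict, Mapping, Optional


def per_process_device_map(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Hypothetical helper: place the whole model (the "" key) on this worker's GPU."""
    env = os.environ if env is None else env
    # torchrun exports LOCAL_RANK for every worker it spawns;
    # fall back to 0 so a plain single-process run still works.
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return {"": local_rank}


if __name__ == "__main__":
    # The resulting dict is what would be passed as device_map=... when
    # loading the quantized model (assumption: the loader accepts this form).
    print(per_process_device_map())
```

With `--nproc_per_node=4`, the four workers would get `{"": 0}` through `{"": 3}`, so each process loads its own full copy of the quantized model onto its own GPU instead of all of them targeting the same device.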